Hacker News on BigQuery: Now with daily updates — So what are the top domains?

We published a copy of all Hacker News contents in BigQuery back in 2015. It was time for an update, and even better than that, how about daily updates? In this post let’s look at the HN favorite sources, and changes in the best times to post.

Most frequent domains on HN, 2017. Source: Hacker News copy shared in BigQuery
#standardSQL
SELECT REGEXP_EXTRACT(url, '//([^/]*)/?') domain, COUNT(*) c
FROM `bigquery-public-data.hacker_news.full`
WHERE url!='' AND EXTRACT(YEAR FROM timestamp)=2017
GROUP BY domain ORDER BY c DESC LIMIT 10
Most frequent domains with score>40 on HN, 2017. Source: Hacker News copy shared in BigQuery
#standardSQL
SELECT REGEXP_EXTRACT(url, '//([^/]*)/?') top_domains_2017, COUNT(*) count, COUNTIF(score>40) score_gt_40
FROM `bigquery-public-data.hacker_news.full`
WHERE url!='' AND EXTRACT(YEAR FROM timestamp)=2017
GROUP BY 1 ORDER BY 3 DESC LIMIT 10
Domains which have landed the most % of submissions in the HN frontpage, 2017 (score>40, count>30). Source: Hacker News copy shared in BigQuery
#standardSQL
SELECT REGEXP_EXTRACT(url, '//([^/]*)/?') top_domains_2017, COUNT(*) count, COUNTIF(score>40) score_gt_40
, ROUND(100*COUNTIF(score>40)/COUNT(*),2) chances_of_homepage
FROM `bigquery-public-data.hacker_news.full`
WHERE url!='' AND EXTRACT(YEAR FROM timestamp)=2017
GROUP BY 1
HAVING count>30
ORDER BY 4 DESC LIMIT 20
All domains with > 300 posts on HN, 2017, ranked by chances of getting >40 upvotes. Source: Hacker News copy shared in BigQuery
#standardSQL
SELECT REGEXP_EXTRACT(url, '//([^/]*)/?') top_domains_2017, COUNT(*) count, COUNTIF(score>40) score_gt_40
, ROUND(100*COUNTIF(score>40)/COUNT(*),2) chances_of_homepage
FROM `bigquery-public-data.hacker_news.full`
WHERE url!='' AND EXTRACT(YEAR FROM timestamp)=2017
GROUP BY 1
HAVING count>300
ORDER BY 4 DESC
All domains with > 1350 posts to HN in 2016, ranked by chances of getting >40 upvotes. Source: Hacker News copy shared in BigQuery
All domains with >(350, 280) posts to HN in (2008, 2009), ranked by chances of getting >40 upvotes. Source: Hacker News copy shared in BigQuery

Best time to submit for a front page chance

Hacker News: For each week, which was the hour with the highest chance of front page. US/Pacific time. Source: Hacker News copy shared in BigQuery
Top time to post on HN: Mondays vs Saturdays. US/Pacific time. Source: Hacker News copy shared in BigQuery
#standardSQL
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY month ORDER BY chances_of_homepage DESC) rn
FROM (
SELECT TIMESTAMP_TRUNC(timestamp, WEEK) month
, EXTRACT(HOUR FROM timestamp AT TIME ZONE 'US/Pacific') hour
, ROUND(100*COUNTIF(score>40)/COUNT(*),4) chances_of_homepage
, COUNT(*) c
FROM `bigquery-public-data.hacker_news.full`
WHERE type='story' AND EXTRACT(YEAR FROM timestamp)>2012
AND EXTRACT(DAYOFWEEK FROM timestamp)=2
GROUP BY 1,2
)
)
WHERE rn=1 ORDER BY 1,2

Want more?

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.