“OK Boomer” escalated quickly — a reddit+BigQuery report
Let’s use BigQuery to find the first time that someone commented “OK Boomer” on reddit. Turns out it happened hours and even years before the alleged first tweet. Was this all an attempt by teenagers to replace the word “old” with “boomer”? Let’s dig in.
Important update: I left Google and joined Snowflake in 2020 — so I’m unable to keep my older posts updated. If you want to try Snowflake, join us — I’m having a lot of fun ❄️.
“OK boomer” growth by subreddit:
The first “OK boomer”
According to Know Your Meme this tweet marks the first appearance of “ok boomer”:
And the author has been celebrating it:
But was this the real first appearance? 90 minutes earlier, on that same day — April 12, 2018 — someone else had immortalized “OK BOOMERS” on reddit:
It turns out there are 9 reddit comments even earlier than that, with the first “Ok boomer” appearing in September 2009:
And things didn’t escalate quickly back then. It took 9 years to accumulate the first 9 “OK boomer” comments on reddit, and then we have to jump ahead to October 2018. That’s when we start seeing an almost daily dose of the phrase:
And it had spread through different subreddits! From /r/Tinder, to /r/worldnews, to /r/xboxone, to /r/golang — redditors started embracing it.
And something happened in March 2019. Until then we had a spotty supply of the phrase, but on March 24 six different accounts decided to comment “OK boomer”. From that day on, the trend kept climbing through June 2019, with more than 30 accounts commenting “OK boomer” per day.
And then we have a gap in our data — you can make your own guess about how usage kept escalating to ~200 accounts commenting daily by mid October — with a huge peak on September 9, 2019:
What happened on September 9? This post by /r/teenagers:
Oh yeah. This was a petition by teenagers to replace the word “old” with “boomer” — and 30k upvotes agreed with it.
But for the real explosion in usage, you have to wait until October 30, 2019:
That’s a day after the NY Times reported on the phrase:
And what happened next? “OK boomer” escalated quickly. With a peak on November 6, when more than 12k accounts posted the phrase:
And you can find these comments all over reddit:
You can now play with the interactive dashboard, to find all sorts of patterns within these comments:
How-to
I used two different sources of data:
- Historical reddit archives stored in BigQuery, for all comments before June 2019.
- The live pushshift.io API for all newer comments, extracted as described in my previous post (see the quick sanity check below).
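Before combining the two, it’s worth checking that they line up in time. This is just a quick sanity-check sketch over the two extract tables used in the queries below — it only assumes the `ts` column those queries rely on:
SELECT 'archive' AS source, MIN(ts) first_ts, MAX(ts) last_ts, COUNT(*) comments
FROM `reddit_extracts.201906_okboomer_all`
UNION ALL
SELECT 'live', MIN(ts), MAX(ts), COUNT(*)
FROM `reddit_extracts.201910_live_okboomer`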
To extract all of the historical reddit comments, I used this query:
-- Comments from March 2018 onwards, out of the monthly reddit archive tables.
CREATE TABLE `reddit_extracts.201906_okboomer_all`
PARTITION BY fake_date
CLUSTER BY subreddit, ts
AS
SELECT TIMESTAMP_SECONDS(created_utc) ts, *, DATE('2000-01-01') fake_date
FROM `reddit_comments.2*`
WHERE REGEXP_CONTAINS(body, r'(?i)\bok boomer')
AND _table_suffix >= '018_03';

-- Everything up to February 2018, appended into the same table.
INSERT INTO `reddit_extracts.201906_okboomer_all`
SELECT TIMESTAMP_SECONDS(created_utc) ts, *, DATE('2000-01-01') fake_date
FROM `reddit_comments.2*`
WHERE REGEXP_CONTAINS(body, r'(?i)\bok boomer')
AND _table_suffix BETWEEN '0' AND '018_02'
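The REGEXP_CONTAINS filter is case insensitive and anchored at a word boundary. A couple of throwaway literal tests (illustrative strings only, not real comments) show what it does and doesn’t pick up:
SELECT
  REGEXP_CONTAINS('OK Boomer, have a seat', r'(?i)\bok boomer') AS mixed_case  -- true: (?i) makes it case insensitive
, REGEXP_CONTAINS('look boomer', r'(?i)\bok boomer') AS mid_word               -- false: no word boundary before "ok"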
Find the shared table here: console.cloud.google.com/bigquery?p=fh-bigquery&d=reddit_extracts&t=201906_okboomer_all&page=table
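For example, the earliest comments mentioned above come straight out of that shared table — a quick query over the public dataset (columns as produced by the extract query above):
-- The earliest "ok boomer" comments in the shared table
SELECT ts, subreddit, author, score, body
FROM `fh-bigquery.reddit_extracts.201906_okboomer_all`
ORDER BY ts
LIMIT 10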
To create the table summarizing live and archived comments for Data Studio:
CREATE OR REPLACE TABLE `reddit_extracts.201911_okboomer_day`
AS
-- Combine the live pushshift.io extract with the historical archive extract.
WITH data AS (
SELECT ts, author, subreddit, score, body
, permalink
FROM `reddit_extracts.201910_live_okboomer`
UNION ALL
SELECT ts, author, subreddit, score, body
, CONCAT('/r/',subreddit,'/comments/', REGEXP_REPLACE(link_id, 't3_', ''), '//', id, '/') permalink
FROM `reddit_extracts.201906_okboomer_all`
)
SELECT day, authors
, (SELECT STRING_AGG(value, ', ') FROM UNNEST(top_reddit)) top_subs
, top_comm
FROM (
-- One row per day: distinct authors, top 3 subreddits, and the highest scored comment.
SELECT TIMESTAMP_TRUNC(ts, DAY) day, COUNT(DISTINCT author) authors
, APPROX_TOP_COUNT(subreddit, 3) top_reddit
, ARRAY_AGG(STRUCT(score, subreddit, permalink, body) ORDER BY score DESC LIMIT 1)[OFFSET(0)] top_comm
FROM data
WHERE body NOT IN ('[deleted]', '[removed]')
AND REGEXP_CONTAINS(body, r'(?i)\bok boomer[s]?\b')
AND ts < TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)
GROUP BY 1
)
ORDER BY 1 DESC
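Once that table exists, the Data Studio report just reads from it. A quick peek at the most recent days (column names come from the query above):
-- Most recent days, with author counts and each day's top comment
SELECT day, authors, top_subs, top_comm.subreddit, top_comm.score
FROM `reddit_extracts.201911_okboomer_day`
ORDER BY day DESC
LIMIT 7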
The animated chart
I used a Data Studio custom visualization created by Michael Whitaker.
The query to count uniques uses some HLL magic for efficiency:
CREATE OR REPLACE TABLE `reddit_extracts.201911_okboomer_subgrowth`
AS
-- Combine the live pushshift.io extract with the historical archive extract.
WITH data AS (
SELECT ts, author, subreddit, score, body
, permalink
FROM `reddit_extracts.201910_live_okboomer`
UNION ALL
SELECT ts, author, subreddit, score, body
, CONCAT('/r/',subreddit,'/comments/', REGEXP_REPLACE(link_id, 't3_', ''), '//', id, '/') permalink
FROM `reddit_extracts.201906_okboomer_all`
)
-- One HLL sketch of distinct authors per day and subreddit.
, sketches AS (
SELECT TIMESTAMP_TRUNC(ts, DAY) day
, subreddit
, HLL_COUNT.INIT(author) sketch
, COUNT(DISTINCT author) authors
FROM data
WHERE body NOT IN ('[deleted]', '[removed]')
AND REGEXP_CONTAINS(body, r'(?i)\bok boomer[s]?\b')
GROUP BY 1,2
)
-- Subreddits where the phrase showed up before April 15, 2018.
, notable_early AS (
SELECT DISTINCT subreddit
FROM data
WHERE ts < '2018-04-15'
)
-- The 30 subreddits with the most distinct authors overall.
, overall_notable AS (
SELECT subreddit, COUNT(DISTINCT author) c
FROM data
GROUP BY 1
ORDER BY c DESC
LIMIT 30
)
SELECT *
FROM (
SELECT *, IFNULL(authors - LAG(authors) OVER(PARTITION BY subreddit ORDER BY day), authors) authors_diff
FROM (
-- Cumulative distinct authors per subreddit: merge every daily sketch up to each frame's day.
SELECT a.day
, CASE
WHEN subreddit IN (SELECT * FROM notable_early UNION ALL (SELECT subreddit FROM overall_notable)) THEN subreddit
-- ELSE '[other]'
END AS subreddit
, COUNT(*) c, HLL_COUNT.MERGE(sketch) authors
FROM (
-- Animation frames: quarterly before July 2019, then monthly, weekly, and finally daily.
SELECT DISTINCT CASE
WHEN day < '2019-07-01' THEN TIMESTAMP_TRUNC(day, QUARTER)
WHEN day < '2019-09-01' THEN TIMESTAMP_TRUNC(day, MONTH)
WHEN day < '2019-10-30' THEN TIMESTAMP_TRUNC(day, WEEK)
ELSE day
END AS day
FROM sketches) a
JOIN sketches b
ON b.day <= a.day
GROUP BY 1,2
)
)
WHERE subreddit IS NOT NULL
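To see why the sketches help: HLL_COUNT.INIT builds a small reusable summary of the distinct authors in each group, and HLL_COUNT.MERGE combines those summaries without re-scanning the raw comments — which is what makes the cumulative join above affordable. A minimal standalone illustration with toy values (not the real data):
-- Toy example: merging per-day sketches gives the cumulative distinct count
WITH toy AS (
  SELECT DATE '2019-11-01' day, 'alice' author UNION ALL
  SELECT DATE '2019-11-01', 'bob' UNION ALL
  SELECT DATE '2019-11-02', 'alice' UNION ALL
  SELECT DATE '2019-11-02', 'carol'
), daily AS (
  SELECT day, HLL_COUNT.INIT(author) sketch
  FROM toy
  GROUP BY day
)
SELECT HLL_COUNT.MERGE(sketch) distinct_authors_overall  -- ~3: alice, bob, carol
FROM daily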
Want more?
Check my previous post for more details on collecting live data from pushshift.io. Thanks to Jason Baumgartner for the constant supply of data!
I’m Felipe Hoffa, a Developer Advocate for Google Cloud. Follow me on @felipehoffa, find my previous posts on medium.com/@hoffa, and all about BigQuery on reddit.com/r/bigquery.