# of unique reddit accounts per day commenting “OK boomer”

“OK Boomer” escalated quickly — a reddit+BigQuery report

Let’s use BigQuery to find the first time that someone commented “OK Boomer” on reddit. Turns out it happened hours and even years before the alleged first tweet. Was this all an attempt by teenagers to replace the word “old” with “boomer”? Let’s dig in.

Felipe Hoffa
5 min read · Nov 9, 2019


Important update: I left Google and joined Snowflake in 2020 — so I’m unable to keep my older posts updated. If you want to try Snowflake, join us — I’m having a lot of fun ❄️.

“OK boomer” growth by subreddit:

# of unique accounts per sub that have commented “OK Boomer”

The first “OK boomer”

According to Know Your Meme this tweet marks the first appearance of “ok boomer”:

The first tweet for “Ok Boomer”. Source: Know Your Meme

And the author has been celebrating it:

But was this the real first appearance? 90 minutes earlier, that same day — April 12, 2018 — someone else had immortalized “OK BOOMERS” on reddit:

An “OK boomer” on reddit, 90 minutes before that first tweet.

Turns out there are 9 reddit comments even before that, with “Ok boomer” starting in September 2009:

“OK boomer” reddit prehistory: 9 comments between 2009 and April 12, 2018

And things didn’t escalate quickly back then. It took 9 years to accumulate the first 9 “OK boomer” comments on reddit, and then we have to jump to October 2018. That’s when we start seeing an almost daily dose of the phrase:

Waiting from April 2018 until October 2018 to find an almost daily dose of “OK Boomer” on reddit.

And it had spread through different subreddits! From /r/Tinder, to /r/worldnews, to /r/xboxone, to /r/golang — redditors started embracing it.

And something happened in March 2019. Until then we had a spotty supply of the phrase, but on March 24 six different accounts decided to comment “OK boomer”. From that day the trend keeps climbing through June 2019, reaching more than 30 accounts commenting “OK boomer” per day.

The embrace of “OK boomer” from Aug 2018 to Jun 2019 on reddit.

And then we have a gap in our data — you can make your own guess about how usage kept escalating to ~200 accounts commenting daily by mid-October — with a huge peak on September 9, 2019:

What happened on September 9? This post on /r/teenagers:

Oh yeah. This was a petition by teenagers to replace the word “old” with “boomer” — and 30k upvotes agreed.

But for the real explosion in usage, you have to wait until October 30, 2019:

October 30, 2019: 2,640 accounts post “OK boomer” on reddit

That’s a day after the NY Times reported on the phrase:

And what happened next? “OK boomer” escalated quickly, with a peak on November 6, when more than 12k accounts posted the phrase:

And you can find these comments all over reddit:

You can now play with the interactive dashboard to find all sorts of patterns within these comments:

Play with the interactive report, or load it full size.

How-to

I used two different sources of data: the historical reddit comments archived in BigQuery (the reddit_comments tables, built from pushshift.io dumps), and the live comments collected from pushshift.io into reddit_extracts.201910_live_okboomer (see my previous post, linked at the end, for how to collect those).

To extract all of the historical reddit comments, I used this query:

-- Extract the recent matches (2018-03 onwards) into a new clustered table.
CREATE TABLE `reddit_extracts.201906_okboomer_all`
PARTITION BY fake_date
CLUSTER BY subreddit, ts
AS
SELECT TIMESTAMP_SECONDS(created_utc) ts, *, DATE('2000-01-01') fake_date
FROM `reddit_comments.2*`
WHERE REGEXP_CONTAINS(body, r'(?i)\bok boomer')
AND _table_suffix >= '018_03'
;
-- Then append the older matches (everything before 2018-03) to the same table.
INSERT INTO `reddit_extracts.201906_okboomer_all`
SELECT TIMESTAMP_SECONDS(created_utc) ts, *, DATE('2000-01-01') fake_date
FROM `reddit_comments.2*`
WHERE REGEXP_CONTAINS(body, r'(?i)\bok boomer')
AND _table_suffix BETWEEN '0' AND '018_02'

Find the shared table here: console.cloud.google.com/bigquery?p=fh-bigquery&d=reddit_extracts&t=201906_okboomer_all&page=table
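For instance, to reproduce the “prehistory” above, you can list the earliest matches straight from that shared table (a quick sketch, assuming you query the shared fh-bigquery copy):

SELECT ts, subreddit, author, body
FROM `fh-bigquery.reddit_extracts.201906_okboomer_all`
WHERE body NOT IN ('[deleted]', '[removed]')
ORDER BY ts
LIMIT 10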

To create the table summarizing live and archived comments for Data Studio:

CREATE OR REPLACE TABLE `reddit_extracts.201911_okboomer_day`
AS
-- Combine the live comments with the archived extract, normalizing the permalink.
WITH data AS (
SELECT ts, author, subreddit, score, body
, permalink
FROM `reddit_extracts.201910_live_okboomer`
UNION ALL
SELECT ts, author, subreddit, score, body
, CONCAT('/r/',subreddit,'/comments/', REGEXP_REPLACE(link_id, 't3_', ''), '//', id, '/') permalink
FROM `reddit_extracts.201906_okboomer_all`
)
-- One row per day: unique authors, top 3 subreddits, and the highest-scored comment.
SELECT day, authors
, (SELECT STRING_AGG(value, ', ') FROM UNNEST(top_reddit)) top_subs
, top_comm
FROM (
SELECT TIMESTAMP_TRUNC(ts, DAY) day, COUNT(DISTINCT author) authors
, APPROX_TOP_COUNT(subreddit, 3) top_reddit
, ARRAY_AGG(STRUCT(score, subreddit, permalink, body) ORDER BY score DESC LIMIT 1)[OFFSET(0)] top_comm
FROM data
WHERE body NOT IN ('[deleted]', '[removed]')
AND REGEXP_CONTAINS(body, r'(?i)\bok boomer[s]?\b')
AND ts < TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)
GROUP BY 1
)
ORDER BY 1 DESC
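With that summary table in place, simple questions become one-liners. For example, a quick sketch to list the days with the most unique commenters, using the table just created:

SELECT day, authors, top_subs
FROM `reddit_extracts.201911_okboomer_day`
ORDER BY authors DESC
LIMIT 5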

The animated chart

I used a Data Studio custom visualization created by Michael Whitaker.

The query to count uniques uses some HLL magic for efficiency:

CREATE OR REPLACE TABLE `reddit_extracts.201911_okboomer_subgrowth`
AS
-- Same union of live + archived comments as in the previous query.
WITH data AS (
SELECT ts, author, subreddit, score, body
, permalink
FROM `reddit_extracts.201910_live_okboomer`
UNION ALL
SELECT ts, author, subreddit, score, body
, CONCAT('/r/',subreddit,'/comments/', REGEXP_REPLACE(link_id, 't3_', ''), '//', id, '/') permalink
FROM `reddit_extracts.201906_okboomer_all`
)
, sketches AS (
-- One HLL sketch of unique authors per (day, subreddit), mergeable later.
SELECT TIMESTAMP_TRUNC(ts, DAY) day
, subreddit
, HLL_COUNT.INIT(author) sketch
, COUNT(DISTINCT author) authors
FROM data
WHERE body NOT IN ('[deleted]', '[removed]')
AND REGEXP_CONTAINS(body, r'(?i)\bok boomer[s]?\b')
GROUP BY 1,2
), notable_early AS (
-- Subreddits that used the phrase before 2018-04-15.
SELECT DISTINCT subreddit
FROM data
WHERE ts < '2018-04-15'
), overall_notable AS (
-- The 30 subreddits with the most unique authors overall.
SELECT subreddit, COUNT(DISTINCT author) c
FROM data
GROUP BY 1
ORDER BY c DESC
LIMIT 30
)
SELECT *
FROM (
-- Per-period increase in unique authors (what the animated chart shows).
SELECT *, IFNULL(authors - LAG(authors) OVER(PARTITION BY subreddit ORDER BY day), authors) authors_diff
FROM (
SELECT a.day
, CASE
WHEN subreddit IN (SELECT * FROM notable_early UNION ALL (SELECT subreddit FROM overall_notable)) THEN subreddit
-- ELSE '[other]'
END AS subreddit
-- Cumulative uniques up to each period: merge every sketch with day <= period end.
, COUNT(*) c, HLL_COUNT.MERGE(sketch) authors
FROM (
-- Coarser time buckets early on (quarter, month, week), daily near the end.
SELECT DISTINCT CASE
WHEN day < '2019-07-01' THEN TIMESTAMP_TRUNC(day, QUARTER)
WHEN day < '2019-09-01' THEN TIMESTAMP_TRUNC(day, MONTH)
WHEN day < '2019-10-30' THEN TIMESTAMP_TRUNC(day, WEEK)
ELSE day
END AS day
FROM sketches) a
JOIN sketches b
ON b.day <= a.day
GROUP BY 1,2
)
)
WHERE subreddit IS NOT NULL
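If you haven’t used HLL before: HLL_COUNT.INIT() builds a small HyperLogLog++ sketch of the unique values in each group, and HLL_COUNT.MERGE() later combines those sketches into an approximate distinct count, so the cumulative uniques above never need to re-scan the raw comments. Here is a tiny self-contained sketch of the same pattern, with toy data instead of the reddit tables:

-- Toy version of the cumulative-uniques trick used above.
WITH comments AS (
  SELECT DATE '2019-11-01' AS day, author
  FROM UNNEST(['a', 'b', 'c']) author
  UNION ALL
  SELECT DATE '2019-11-02', author
  FROM UNNEST(['b', 'c', 'd', 'e']) author
), daily AS (
  -- One sketch of unique authors per day.
  SELECT day, HLL_COUNT.INIT(author) sketch
  FROM comments
  GROUP BY 1
)
-- Merge all sketches up to each day: ~3 uniques on day 1, ~5 cumulative on day 2.
SELECT a.day, HLL_COUNT.MERGE(b.sketch) approx_cumulative_uniques
FROM daily a
JOIN daily b
ON b.day <= a.day
GROUP BY 1
ORDER BY 1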

Want more?

Check my previous post for more details on collecting live data from pushshift.io. Thanks Jason Baumgartner for the constant supply of data!

I’m Felipe Hoffa, a Developer Advocate for Google Cloud. Follow me on @felipehoffa, find my previous posts on medium.com/@hoffa, and all about BigQuery on reddit.com/r/bigquery.
