Spurious correlations: Alphabet stock price and Wikipedia pageviews for Pokémon

Using my new super powers to automatically import stock prices into BigQuery, I went out to find which Wikipedia pages were most correlated with GOOG’s stock price during July 2016. Would you be surprised if I told you it was the list of Pokémon: Indigo League episodes?
SELECT CORR(a.req, b.close) corr, title, COUNT(*) c, SUM(req) requests
FROM (
  SELECT SUM(requests) req, title, COUNT(*) c, DAY(datehour) day, COUNT(*) OVER(PARTITION BY title) days
  FROM [fh-bigquery:wikipedia.pagecounts_201607]
  WHERE requests > 5
  AND language = 'en'
  GROUP BY title, day
  HAVING c=24
) a
JOIN (
  SELECT close, DAY(date) day
  FROM [fh-bigquery:public_dump.goog]
  WHERE MONTH(date)=7
) b
ON a.day=b.day
WHERE days>22
GROUP BY title
HAVING c>18
ORDER BY corr DESC

Query complete (20.6s elapsed, 305 GB processed)
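
If you want to see what the query computes without sending 305 GB through BigQuery, here is a minimal Python sketch of the same steps: sum the hourly requests per title and day, join them by day with the closing price, and take the Pearson correlation (which is what CORR returns). The function names and the tiny input rows are made up for illustration. This is not how BigQuery executes the query, and it skips the requests > 5, c=24 and days>22 filters.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlate_with_close(hourly_rows, close_by_day):
    """hourly_rows: iterable of (title, day, requests); close_by_day: {day: close}."""
    daily = defaultdict(lambda: defaultdict(int))
    for title, day, requests in hourly_rows:
        daily[title][day] += requests  # SUM(requests) ... GROUP BY title, day

    result = {}
    for title, per_day in daily.items():
        # Inner join on day: keep only days that also have a closing price.
        days = sorted(d for d in per_day if d in close_by_day)
        if len(days) < 2:
            continue
        corr = pearson([per_day[d] for d in days],
                       [close_by_day[d] for d in days])
        if corr is not None:
            result[title] = corr
    return result


# Made-up sample data, only to show the shape of the inputs.
sample_rows = [("Some_page", 1, 12), ("Some_page", 1, 9),
               ("Some_page", 2, 30), ("Some_page", 3, 25)]
sample_close = {1: 700.0, 2: 712.5, 3: 709.0}
print(correlate_with_close(sample_rows, sample_close))
```

Sorting the resulting dictionary by value, highest first, plays the role of ORDER BY corr DESC.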

To get the numbers to draw the time series for the chart:
SELECT a.day day, req List_of_Pokemon_episodes, close goog_close
FROM (
  SELECT SUM(requests) req, title, DAY(datehour) day
  FROM [fh-bigquery:wikipedia.pagecounts_201607]
  WHERE title='List_of_Pok%C3%A9mon:_Indigo_League_episodes'
  AND language = 'en'
  GROUP BY title, day
) a
LEFT JOIN (
  SELECT close, DAY(date) day
  FROM [fh-bigquery:public_dump.goog]
  WHERE MONTH(date)=7
) b
ON a.day=b.day
ORDER BY day

Warning 1
The correlation is funny, but spurious. With only 20 data points in a series, it is easy to find highly correlated series among hundreds of thousands of daily Wikipedia pageview time series. But it’s fun :).
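
To get a feel for how easily this happens, here is a small simulation sketch in Python. It assumes every series is plain independent Gaussian noise (real pageviews and stock prices are not, so this is a simplified stand-in), generates one 20-point "target" series plus a few thousand unrelated candidates, and reports the best correlation found. The function name, the number of candidates, and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_spurious_correlation(n_series=5000, n_days=20, seed=0):
    """Highest correlation between a random target and n_series unrelated series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_days)]  # stand-in for the stock
    best = -1.0
    for _ in range(n_series):
        # Each candidate is generated independently of the target,
        # so any correlation it shows is pure chance.
        candidate = [rng.gauss(0, 1) for _ in range(n_days)]
        best = max(best, pearson(candidate, target))
    return best


print(best_spurious_correlation())
```

Even though nothing here is related to anything else, the best match among thousands of candidates should come out well above 0.5, and the real dataset has far more candidates than that.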
Warning 2
The above query scans 305 GB of Wikipedia pageviews. This is within BigQuery’s monthly free terabyte of queries, but if I wanted to play more with this, I would first extract a summary of the pageviews I’m interested in into a much smaller table.
More?
Want more stories? Check my Medium, follow me on Twitter, and subscribe to reddit.com/r/bigquery. And try BigQuery: every month you get a full terabyte of analysis for free.
Disclaimer: I’m a Google employee and nothing in this post constitutes a recommendation on whether to buy, sell or hold shares of any particular stock.