Spurious correlations: Alphabet stock price and Wikipedia pageviews for Pokémon

Using my new superpowers to automatically import stock prices into BigQuery, I went looking for the Wikipedia pages whose pageviews correlated most with GOOG’s stock price during July 2016. Would you be surprised if I told you the winner was the list of Pokémon: Indigo League episodes?

SELECT CORR(a.req, b.close) corr, title, COUNT(*) c, SUM(req) requests
FROM (
  -- Daily requests per page; keep only days with 24 qualifying hourly rows
  SELECT SUM(requests) req, title, COUNT(*) c, DAY(datehour) day,
    COUNT(*) OVER(PARTITION BY title) days
  FROM [fh-bigquery:wikipedia.pagecounts_201607]
  WHERE requests > 5
    AND language = 'en'
  GROUP BY title, day
  HAVING c=24
) a
JOIN (
  -- GOOG closing price by day of month, for July
  SELECT close, DAY(date) day
  FROM [fh-bigquery:public_dump.goog]
  WHERE MONTH(date)=7
) b
ON a.day=b.day
WHERE days>22  -- pages with more than 22 complete days
GROUP BY title
HAVING c>18  -- and more than 18 days matched to a closing price
ORDER BY corr DESC
Query complete (20.6s elapsed, 305 GB processed)
The Wikipedia pages most correlated with GOOG’s stock price, July 2016

To get the numbers to draw the time series for the chart:

SELECT a.day day, req List_of_Pokemon_episodes, close goog_close
FROM (
  -- Daily requests for the single page of interest
  SELECT SUM(requests) req, title, DAY(datehour) day
  FROM [fh-bigquery:wikipedia.pagecounts_201607]
  WHERE title='List_of_Pok%C3%A9mon:_Indigo_League_episodes'
    AND language = 'en'
  GROUP BY title, day
) a
-- LEFT JOIN keeps days that have no closing price (close is NULL)
LEFT JOIN (
  SELECT close, DAY(date) day
  FROM [fh-bigquery:public_dump.goog]
  WHERE MONTH(date)=7
) b
ON a.day=b.day
ORDER BY day

Warning 1

The correlation is funny, but spurious. With only 20 data points per series, it is easy to find a highly correlated match among hundreds of thousands of daily Wikipedia pageview time series. But it’s fun :).
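
To get a feel for how easily this happens, here is a small simulation. It is only a sketch under my own assumptions, not the analysis above: it stands in for the stock price and the pageviews with independent random walks (Gaussian steps), and the names `pearson`, `random_walk` and `best_spurious_match` are made up for this illustration. It generates one 20-point target series, compares it with thousands of unrelated series, and reports the best correlation found.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """Cumulative sum of Gaussian steps; an assumed stand-in for a real series."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def best_spurious_match(n_points=20, n_candidates=10_000, seed=0):
    """Highest correlation between one target walk and many independent walks."""
    rng = random.Random(seed)
    target = random_walk(rng, n_points)  # plays the role of the stock price
    return max(
        pearson(target, random_walk(rng, n_points))  # an unrelated "page"
        for _ in range(n_candidates)
    )


if __name__ == "__main__":
    # None of the candidates has anything to do with the target,
    # yet the best match among them still looks impressive.
    print(f"best correlation found: {best_spurious_match():.3f}")
```

Every candidate series is generated independently of the target, so any high correlation the script prints is pure chance. The more candidates you search and the fewer points each series has, the better the best match will look.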

Warning 2

The query above scans 305 GB of Wikipedia pageviews. That fits within BigQuery’s free monthly terabyte of queries, but if I wanted to play more with this, I would first extract a summary of the pageviews I’m interested in into a much smaller table.

More?

Want more stories? Check my medium, follow me on twitter, and subscribe to reddit.com/r/bigquery. And try BigQuery — every month you get a full terabyte of analysis for free.

Disclaimer: I’m a Google employee and nothing in this post constitutes a recommendation on whether to buy, sell or hold shares of any particular stock.

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.
