The real top Stack Overflow questions

Update 2020 — Check my new post “Stack Overflow in 2023: Predicting with ARIMA and BigQuery

Here’s the interactive dashboard you can use to get the answers to your specific questions. Below, find the story of how it was created and why it matters.

Interactive dashboard. Play with it to find the top questions for any tag.

The top Stack Overflow questions — all time vs current

Let’s see first what questions have received the largest number of pageviews through time:

The top Stack Overflow questions of all time. What’s missing?

Some observations:

  • The top Stack Overflow question of all time — with more than 7 million views since its creation 9 years ago — is not even a programming question: “How do I undo the most recent commits in Git”.
  • From the top 10 questions, 4 are related to git, 3 to JavaScript, 1 to Java, 1 to Linux, and 1 to HTML. What’s missing?
  • There are no Python questions in the top 10 list. Surprised?

So let’s take a look at the top 10 question — not from all time, but from this last quarter. This is how the Stack Overflow world looks now:

The top Stack Overflow questions of 2018Q4. What tag is missing now?

Some observations:

  • The top question shows that people are still trying to figure out “How do I undo the most recent commits in Git” — with more than 400k pageviews this last quarter.
  • The programming world has changed: Now 4 of the top 10 questions are related to Python.
  • Java is gone — it doesn’t show up within the top 10 questions any more.

The trend is even more evident if you look at the tags for the top 30 questions:

Top 10 and top 30 Stack Overflow questions. Current vs all time

Java and SQL are gone from the top 30. Meanwhile Python now dominates the top 10 and top 30 Stack Overflow questions — by current pageviews.

So what are the top questions for each tag

Now you can look at what the top questions for any tag are. Before we take a look at them — can you guess the top questions for JavaScript, Python, Go?

JavaScript, Python, Go: Different challenges for each

Top 10 questions Q42018 for Python
Top 10 questions Q42018 for JavaScript
Top 10 questions Q42018 for Go

The first thing you can notice with this dashboard are the top current 10 questions for any tag, and how they’ve trended through time. Python devs keep asking about “Iterating over dictionaries using ‘for’ loops”, JavaScript devs ask “How do I remove a particular element from an array in JavaScript?”, and Go devs need to know “What is the best way to convert byte array to string?”.

TensorFlow struggles

Top 10 questions Q42018 for TensorFlow

People are really excited about machine learning, and TensorFlow is one of the top projects in the space. But most of the top questions show people struggling with issues like “TensorFlow not found using pipor the ability to work with their current CPU/GPU combos.

A worthy goal for the TensorFlow team: With this dashboard they’ll be able to track how these questions trend down through time — if they work out a foolproof way for devs to install TensorFlow with pip.

Kotlin top 10: Now and then

With the dashboard you can jump to see the top 10 questions at any given quarter. For example, let’s compare the top 10 for Kotlin in 2018Q4 vs 2017Q2:

Top 10 Kotling questions in 2017Q2 vs 2018 Q4

This looks good: Most top questions from 2017 disappeared into the void — and as a guess, let’s say that the team was great at solving common issues people had back then. And on the 2018Q4 side you can see how most current top questions are new, and not the same as many months ago.

Redis top questions — focusing on Java vs C#

This are the top 10 questions for Redis:

Top 10 questions for Redis

You can see that the current top question for Redis ask how to install it on Windows. If you are a Redis developer working to improve the Java libraries, this doesn’t tell you much — luckily our dashboard also allows us to filter for subtags.

Let’s look at the top 10 Redis+Java questions instead:

Top 10 questions for Redis + Java

If we filter for “redis” and “Java” then we see different questions: Most of these are related to Redis and Spring, for example. Which are different to the top question for Redis and C#:

Top 10 questions for Redis + C#

Oh yeah — the top question for Redis + Java asks for simple C# examples. Maybe developers can’t find these in the official docs?

One question, so many replies

With this dashboard you can also see how many answers each question has. For example, the top question “Convert a String to int in Rust?“ has 5 different answers. Go has 12 different replies to “What is the best way to convert byte array to string?” and 20 to “How to efficiently concatenate strings in Go?”:

Top questions for Rust and Go

Dig beyond the top 10

Why stop at the top 10? You can also go deeper into the top 30:

Top 21 to 30 questions for jQuery

How-to: Queries

You can find all this data in BigQuery. Every 3 months Stack Overflow publishes a snapshot of their latest data, and we make sure to have a fresh copy ready to be queried.

Top Stack Overflow questions, current vs all time

SELECT (
SELECT tag
FROM UNNEST(tags)
ORDER BY view_count DESC LIMIT 1
) tag
, * EXCEPT(tags)
FROM (
SELECT quarter_views , view_count
, ROW_NUMBER() OVER(ORDER BY quarter_views DESC) q_ranking
, ROW_NUMBER() OVER(ORDER BY view_count DESC) ranking
, ARRAY(
SELECT AS STRUCT tag, b.view_count
FROM UNNEST(tags) tag
JOIN `fh-bigquery.stackoverflow_archive_questions.merged_aux_tags` b
ON tag=b.tag
) tags, title
FROM `fh-bigquery.stackoverflow_archive_questions.merged`
WHERE quarter='2018-12-01'
AND view_count > 50000
)
WHERE q_ranking<30 OR ranking <30
ORDER BY 1 DESC

Finding the number of pageviews for each question through time

For this I started by storing a copy of each snapshot through time — and then proceeded to calculate the difference of pageviews between the snapshots.

Luckily the query looks pretty simple:

CREATE OR REPLACE TABLE `stackoverflow_archive_questions.merged` 
AS
SELECT
IFNULL(
view_count -
LAG(view_count) OVER(PARTITION BY id ORDER BY view_count)
, view_count) quarter_views, *
FROM (
SELECT PARSE_DATE('%Y%m',_table_suffix) quarter
, id, view_count
, SPLIT(tags, '|') tags
, score, creation_date, answer_count
, accepted_answer_id, title
FROM `fh-bigquery.stackoverflow_archive_questions.q*`
)

Top questions per tag

#standardSQL
SELECT title, quarter_views, view_count
FROM `fh-bigquery.stackoverflow_archive_questions.merged`
WHERE 'google-cloud-dataflow' IN UNNEST(tags)
AND quarter='2018-12-01'
ORDER BY quarter_views DESC
LIMIT 10
Top current questions for Dataflow

Top current questions that haven’t been updated in more than a year

#standardSQL    
WITH top_questions AS (
SELECT id, title, quarter_views, view_count
FROM `fh-bigquery.stackoverflow_archive_questions.merged`
WHERE 'google-cloud-dataflow' IN UNNEST(tags)
AND quarter='2018-12-01'
), latest_answer AS (
SELECT parent_id, DATE(MAX(COALESCE(last_edit_date, last_activity_date, creation_date))) answer_last_edit_date
FROM `bigquery-public-data.stackoverflow.posts_answers` b
GROUP BY parent_id
)
SELECT SUBSTR(title, 0,80) title, quarter_views, view_count, answer_last_edit_date
FROM top_questions a
JOIN latest_answer b
ON a.id=b.parent_id
WHERE DATE_DIFF(CURRENT_DATE(), answer_last_edit_date, DAY)>360
ORDER BY quarter_views DESC
LIMIT 10
Top current questions for Dataflow, that haven’t been updated in a year

Go beyond

Now you can go back to the top of this post to play with the dashboard. Please share what you find!

Then also check out what tags are trending and which ones are going down, with The real Stack Overflow trends:

Predict how long it takes for Stack Overflow to answer a question:

Next steps

Want more stories? Check my Medium, follow me on twitter, and subscribe to reddit.com/r/bigquery. And try BigQuery — every month you get a full terabyte of analysis for free.

--

--

--

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Testing with DynamoDB Toolbox

Rtl8111e Linux Driver Download

How to root Hitech Amaze s430 plus

Root LG Phone

EOS/WAX Archives Site OPEN

Kubernetes Cost Monitoring with Prometheus & Grafana

Screenshot of a Grafana dashboard to calculate costs

DevOps and Telco Softwarisation — simple CI/CD example [#2]

devops loop

Developing Responsive Emails With MJML Framework

Waving goodbye to Internet Explorer 11 in 2021

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Felipe Hoffa

Felipe Hoffa

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.

More from Medium

Google attacks Microsoft in the Field of Data Analytics and IoT

Big data and the Cloud: A Blog about the Implementation of Big Data in the Cloud

Automated Data Pipeline Testing using Great Expectations

Regulating for Data Protection: GDPR