The top weekend programming languages — based on GitHub’s activity

Felipe Hoffa
Feb 10, 2017

Stack Overflow published an article analyzing the “top weekend programming languages”. One of their data scientists — Julia Silge — did an awesome job, but she only analyzed Stack Overflow tags. Many questions were raised on reddit and Hacker News, and I’m going to use data from GitHub’s commits to find them an answer.

The top weekend languages 2016:

The top weekend languages 2016. Source: GitHub+GHTorrent+BigQuery

Rust, Glsl, D, Haskell, Common Lisp, Kicad, Emacs Lisp, Lua, Scheme, Julia, Elm, Eagle, Racket, Dart, Nsis, Clojure, Kotlin, Elixir, F#, Ocaml: Clearly 2016 was a year dedicated to playing with functional languages, up-and-coming paradigms, and scripting 3D worlds.

The top weekday languages 2016:

The top weekday languages 2016. Source: GitHub+GHTorrent+BigQuery

Nginx, Matlab, Processing, Vue, Fortran, Visual Basic, Objective-C++, Plsql, Plpgsql, Web Ontology Language, Smarty, Groovy, Batchfile, Objective-C, Powershell, Xslt, Cucumber, Hcl, Puppet, Gcc Machine Description: These are languages that people would rather code in at the office than in their free time, at least in 2016.

Who wants to write Puppet scripts during their weekend?

Charting some interesting languages through time

Weekend ranking for languages, 2010-2016. Source: GitHub+GHTorrent+BigQuery

Some notes from the hand-picked chart:

  • Rust used to be a weekday language. Not anymore: it quickly rose to become a weekend one.
  • The more popular Go grows, the more it settles as a weekday language #CorporateLife.
  • Puppet is the champion of weekday coders.
  • Ruby is slowly leaving the week and embracing the weekend.
  • R had a crazy 2014–2015 during weekends, but in 2016 it came back towards the week.
  • Haskell and Clojure: The eternal “I will learn this during the weekend” langs.
  • Arduino seems to be a popular weekend hobby, but it is also slowly being embraced during the week.
  • Python and C play comfortably in both camps.

By popular request, how other popular languages rank through time:

Weekend ranking for other popular languages, 2010–2016. Source: GitHub+GHTorrent+BigQuery

Let’s answer some reddit+HN questions:

/u/techmidrop says: “My mom says I can only use Rust from 4–6pm on Sundays”

You are not alone! Rust has been the top weekend language these last few years. But back in 2010, it showed up as one of the top weekday languages.

/u/mooglinux says: “Now I want to see a report from Github on which languages are most used in weekend commits vs weekday”

You’re welcome!

/u/MasterRaceLordGaben says: “Assembly for fun on weekends!? Who are these people?”

Students doing homework? Assembly didn’t show up as prominently here. It might be that they go to Stack Overflow in search of answers, but they are not really trying to push code.

/u/TheGoodPlaceJanet insists: “Wouldn’t analysing GitHub commits give a better perspective than SO tags?”

Hopefully this article does.

Queries and datasets

I quickly analyzed these datasets with BigQuery. You can try it too, even in the next 5 minutes — it’s that easy.

Ranking the top languages 2016:

#standardSQL
-- Rank languages by the ratio of weekend to weekday committers in 2016.
SELECT lang
  , ROUND(weekend/weekday,2) ratio
  , weekday, weekend
  , repos[OFFSET(0)].value sample_repo, repos[OFFSET(1)].value sample_repo_2
FROM (
  -- Pivot the weekday/weekend rows into one row per language.
  SELECT lang, month
    , MAX(IF(weekday,c,null)) weekday, MAX(IF(NOT weekday,c,null)) weekend
    , ANY_VALUE(repos) repos
  FROM (
    -- Count distinct committers per language, split by weekday vs. weekend.
    -- Note: `month` actually holds the year, since it truncates to YEAR.
    SELECT language lang, TIMESTAMP_TRUNC(created_at, YEAR) month
      -- DAYOFWEEK is 1 for Sunday and 7 for Saturday, so 2-6 is Monday-Friday.
      , EXTRACT(DAYOFWEEK FROM a.created_at) BETWEEN 2 AND 6 weekday
      , COUNT(DISTINCT committer_id) c
      , APPROX_TOP_COUNT(repo, 3) repos
    FROM `ghtorrent-bq.ght_2017_01_19.commits` a
    JOIN `fh-bigquery.github_extracts.ght_project_languages` b
    ON a.project_id=b.project_id
    WHERE b.percent>0.25
    AND EXTRACT(YEAR FROM a.created_at) BETWEEN 2016 AND 2016
    GROUP BY 1,2,3
    HAVING c>100
  )
  GROUP BY 1,2
)
WHERE (weekend+weekday)>1450
ORDER BY ratio DESC
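
To make the ranking logic in the query above easier to follow, here is a minimal Python sketch of the same idea on toy data: count distinct committers per language on weekdays and on weekends, then divide weekend by weekday. The function names and the sample commits are made up for illustration, and the sketch leaves out the query's size thresholds (`HAVING c>100` and the minimum total of 1450).

```python
from collections import defaultdict
from datetime import datetime


def bigquery_dayofweek(ts):
    """Mimic BigQuery's EXTRACT(DAYOFWEEK ...): 1 = Sunday ... 7 = Saturday."""
    # isoweekday() returns 1 for Monday through 7 for Sunday.
    return ts.isoweekday() % 7 + 1


def weekend_ratios(commits):
    """Return {lang: weekend/weekday ratio}, highest ratio first.

    `commits` is an iterable of (lang, committer_id, created_at) tuples.
    """
    # For each language keep two sets of committer ids: weekday and weekend.
    committers = defaultdict(lambda: {True: set(), False: set()})
    for lang, committer_id, created_at in commits:
        is_weekday = 2 <= bigquery_dayofweek(created_at) <= 6
        committers[lang][is_weekday].add(committer_id)

    ratios = {}
    for lang, groups in committers.items():
        weekday, weekend = len(groups[True]), len(groups[False])
        if weekday:  # skip languages with no weekday committers
            ratios[lang] = round(weekend / weekday, 2)
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical sample: 2016-01-02 was a Saturday, 2016-01-04 a Monday.
sample_commits = [
    ("rust", 1, datetime(2016, 1, 2)),
    ("rust", 2, datetime(2016, 1, 3)),
    ("rust", 1, datetime(2016, 1, 4)),
    ("puppet", 3, datetime(2016, 1, 4)),
    ("puppet", 4, datetime(2016, 1, 5)),
    ("puppet", 3, datetime(2016, 1, 2)),
]

if __name__ == "__main__":
    print(weekend_ratios(sample_commits))
```

Counting distinct committers rather than commits keeps one very active person from dominating a language's score, which is why the sketch stores ids in sets.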

Charting 2010–2016:

#standardSQL
-- Yearly weekend ranking, 2010-2016, for a hand-picked set of languages.
SELECT *
FROM (
  SELECT *, 40-rn inv_rank, ROW_NUMBER() OVER(PARTITION BY month ORDER BY ratio) weekend_rank,
    MAX(month) OVER(PARTITION BY lang) max_month
  FROM (
    SELECT lang, month
      , ROUND(weekend/weekday,2) ratio
      , weekday, weekend, weekday+weekend total
      , repos[OFFSET(0)].value sample_repo
      -- rn ranks languages within each year by total committers.
      , ROW_NUMBER() OVER(PARTITION BY month ORDER BY weekday+weekend DESC) rn
    FROM (
      -- Pivot the weekday/weekend rows into one row per language and year.
      SELECT lang, month
        , MAX(IF(weekday,c,null)) weekday, MAX(IF(NOT weekday,c,null)) weekend
        , ANY_VALUE(repos) repos
      FROM (
        -- Note: `month` actually holds the year, since it truncates to YEAR.
        SELECT language lang, TIMESTAMP_TRUNC(created_at, YEAR) month
          , EXTRACT(DAYOFWEEK FROM a.created_at) BETWEEN 2 AND 6 weekday
          , COUNT(DISTINCT committer_id) c
          , APPROX_TOP_COUNT(repo, 3) repos
        FROM `ghtorrent-bq.ght_2017_01_19.commits` a
        JOIN `fh-bigquery.github_extracts.ght_project_languages` b
        ON a.project_id=b.project_id
        WHERE b.percent>0.25
        AND language IN UNNEST(SPLIT('rust,haskell,c,clojure,arduino,ruby,python,go,r,puppet,xml'))
        AND EXTRACT(YEAR FROM a.created_at) BETWEEN 2010 AND 2016
        GROUP BY 1,2,3
      )
      GROUP BY 1,2
    )
  )
  WHERE rn<=40
)
ORDER BY 2,3 DESC

Extra queries:

To measure by file extensions, you can try:

#standardSQL
-- Derive a language from each changed file's extension, then count distinct author emails.
SELECT lang
  , EXTRACT(DAYOFWEEK FROM date) BETWEEN 2 AND 6 weekday
  , COUNT(DISTINCT email) c
  , APPROX_TOP_COUNT(repo, 3) repos
FROM (
  SELECT author.email
    , LOWER(REGEXP_EXTRACT(diff.new_path, r'\.([^\./\(~_ \- #]*)$')) lang
    , author.date
    , repo_name[OFFSET(0)] repo
  FROM `bigquery-public-data.github_repos.commits`, UNNEST(difference) diff
  WHERE EXTRACT(YEAR FROM author.date)=2016
)
-- Keep only short extensions that contain at least one letter.
WHERE lang IS NOT null
AND LENGTH(lang)<8
AND REGEXP_CONTAINS(lang, '[a-zA-Z]')
GROUP BY 1,2
HAVING c>100
ORDER BY c DESC
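
If you want to check what the extension regex in the query above keeps and drops before running it on the full dataset, this small Python sketch applies the same pattern and the same two filters (fewer than 8 characters, at least one letter) to a few file paths. The function name and the sample paths are hypothetical. BigQuery uses the RE2 engine rather than Python's `re`, so treat this as an approximation, although a pattern this simple should behave the same in both.

```python
import re

# Same pattern as in the query: the text after the last dot, as long as it
# contains none of the excluded characters (dot, slash, paren, ~, _, space, -, #).
EXTENSION_RE = re.compile(r'\.([^\./\(~_ \- #]*)$')


def extension_lang(path):
    """Return the lowercased file extension used as `lang`, or None if filtered out."""
    match = EXTENSION_RE.search(path)
    if match is None:
        return None
    lang = match.group(1).lower()
    # Mirror LENGTH(lang)<8 and REGEXP_CONTAINS(lang, '[a-zA-Z]').
    if len(lang) < 8 and re.search(r'[a-zA-Z]', lang):
        return lang
    return None


if __name__ == "__main__":
    for path in ["src/main.rs", "lib/Foo.JAVA", "Makefile", "backup/file.c_old"]:
        print(path, "->", extension_lang(path))
```

Files without an extension (such as `Makefile`) and extensions containing an underscore are dropped, so this approach undercounts some languages compared with the GHTorrent language data.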

To measure by pull requests:

#standardSQL
-- Same weekday/weekend split, counting distinct actors on pull request events.
SELECT lang
  , MAX(IF(weekday,c,null)) weekday, MAX(IF(NOT weekday,c,null)) weekend
  , ANY_VALUE(repos) repos
FROM (
  SELECT JSON_EXTRACT_SCALAR(payload, '$.pull_request.head.repo.language') lang
    , EXTRACT(DAYOFWEEK FROM created_at) BETWEEN 2 AND 6 weekday
    , APPROX_TOP_COUNT(repo.name, 3) repos
    , COUNT(DISTINCT actor.id) c
  FROM `githubarchive.year.2016`
  WHERE type='PullRequestEvent'
  GROUP BY 1,2
)
GROUP BY 1

Datasets:

For this post, I mainly used the newest GHTorrent import on BigQuery (thanks Georgios Gousios).

You might also want to check GitHub Archive, GitHub repos on BigQuery, and Stack Overflow on BigQuery.

Want more?

Want more stories? Check my Medium, follow me on twitter, and subscribe to reddit.com/r/bigquery. And try BigQuery — every month you get a full terabyte of analysis for free.


Felipe Hoffa

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.