The top GitHub projects per country

Let’s find the most popular GitHub projects for each country. A simple query can give us that result: Take every star given on GitHub and look for the country of origin of users who have chosen to show their country on their GitHub profile. But we are going to go deeper than that.

9 min readApr 5, 2017

--

Step 1: The most popular GitHub project per country — naive count

The first pass results are somehow monotone, and not surprising: FreeCodeCamp was the most starred GitHub project by far in 2016:

The most starred project per country on GitHub during 2016 — naive count.

Step 2: What about the 2nd most popular project for each country?

The 2nd most starred project per country on GitHub during 2016 — naive count.

The 2nd most starred project for each country shows more variety results:

Let’s skip the top 75 worldwide GitHub projects, and only then choose the most popular project per country:

The most starred project per country on GitHub during 2016 — taking out the top 75 GitHub worldwide projects.

A list of the most interesting GitHub projects for each country:

Let’s get a list of top projects for each country, this time with some rules:

  • 10 top projects per country
  • Removing projects in the GitHub global top 75.
  • We are looking for projects special to each country — if a project shows up in many countries top 10, we’ll skip it and look for a more “special” one.

And these are our results:

Joshua Levy appears on top of the US — Americans show a lot of attention to compensation. It’s great to see Kelsey Hightower in this list too with his Kubernetes in hard mode. On data — I love data, in this case coming from FiveThirtyEight. And thanks Alex Gaynor for teaching us “what happens when”, Ben Brown for chat bots, J Delaney for easy applications, and everyone else listed here and beyond!

China — they love WeChat — and they have their own version of FreeCodeCamp.

Germany — Daniel Quinn is Canadian who lives in the UK, nevertheless Germans starred his project paperless to the top.

UK — great to see Max Woolf and Kelsey Hightower again. And on top of the UK, Rich Harris’s Svelte.

India seems to pay a lot of attention to CS educational repositories.

Brazil — where the top project translates to “love Serenade” and its mission is to give “Artificial Intelligence for social control of public administration”. Tell me more, Irio Musskopf at Data Science Brigade.

Russia — yandex is big there, as well as YoptaScript (as in Ukraine).

France seems to be big into PHP. I’d love to learn more.

Japan — I don’t know much about these projects, but they seem very Japanese oriented.

Canada — seems not only the us cares about stock options compensation.

Australia — not many Australian exclusive projects show up, but some of the ones we already saw. Not sure why relax/relax is big here, as their top maintainers live in Portugal (Bruno Mota, for example).

Taiwan — where the top project mission is to “Make the Taiwan a Better Place to Work”. I like it.

Ukraine — top project samgozman/YoptaScript says “Скриптовый язык программирования для гопников и реальных пацанов”. Makes sense.

Spain — Karumi is a Spanish company, and their Android framework is the top project for Spain. Great to see open data as their #2.

(comments pending)

Netherlands — They also pay a lot of attention to Max’s big list of naughty strings.

(comments pending)

South Korea — open government on top. Also interesting to see Apache Zeppelin on top of this country ranking, as I met many of its developers in Seoul (hello Seoul Engineer!).

Italy — where you can find the constitutional reform on top of their GitHub projects (and maybe send pull requests?).

(comments pending)

(comments pending)

(comments pending)

Switzerland — not only Germany has danielquinn/paperless within the top, Switzerland too. But the top project is “an opensource, publicly owned evoting system. It is the result of the collaboration between the Geneva State Chancellery and the Geneva IT Department”. Good.

(comments pending)

(comments pending)

Portugal — Max Woolf again between the top projects with his big list of naughty strings.

(comments pending)

(comments pending)

(comments pending)

(comments pending)

(comments pending)

(comments pending)

(comments pending)

(comments pending)

(comments pending)

(comments pending)

(comments pending)

Romania — where a list of cool projects made in Romania is on top of the Romanian projects. Thanks IonicaBizau!

Ireland: Not only the US and Canada care about stock options compensation. Probably due to a big presence of American tech companies, Ireland also shows its interest.

(comments pending)

Israel — on top of this list we can see Arik Fraimovich’s re:dash — my favorite open source dashboarding tool. This project is not exclusive to Israel, but since Arik lives there that apparently makes an awareness impact.

(comments pending)

(comments pending)

Philippines — Rico Sta. Cruz lives there, and his project shows on top.

(comments pending)

(comments pending)

Chile — my home country! Great to see projects from Juan Antonio Karmy, Agustin Feuerhake, platanus, Yerko Palma, Juan Antonio Karmy on top.

More countries? Find the raw list on a spreadsheet.

Queries

Datasets

Most popular GitHub project per country, naive count:

#standardSQL
SELECT REGEXP_EXTRACT(a.name, r'/(.*)'), stars, iso3
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn=1
) a
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars DESC

Most popular GitHub projects per country, removing the top 75, special rank:

#standardSQL
SELECT REGEXP_EXTRACT(a.name, r'/(.*)'), stars, iso3
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND repo.id NOT IN (SELECT id FROM (
SELECT repo.id, COUNT(*) c
FROM `githubarchive.year.2016` a
WHERE a.type='WatchEvent'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 75
))
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn=1
) a
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars DESC

Most popular GitHub projects per country, removing the top 75:

#standardSQL
SELECT country, a.name, stars, description, stars_country, other_countries_sc
FROM (
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn2
FROM (
SELECT *, ROUND(SUM(1/(1+ROUND(rn/20,0))) OVER(PARTITION BY repo_id),1) other_countries_sc
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn, SUM(stars) OVER(PARTITION BY country_code) stars_country, repo_id
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND repo.id NOT IN (SELECT id FROM (
SELECT repo.id, COUNT(*) c
FROM `githubarchive.year.2016` a
WHERE a.type='WatchEvent'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 75
))
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn<300
)
)
WHERE other_countries_sc<3
)
WHERE rn2<11
) a
LEFT JOIN (SELECT url, ANY_VALUE(description) description FROM `ghtorrent-bq.ght_2017_01_19.projects` GROUP BY 1) b
ON a.name=REPLACE(url, 'https://api.github.com/repos/', '')
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars_country DESC, country_code, stars DESC

Want more?

Want more stories? Check my Medium, follow me on twitter, and subscribe to reddit.com/r/bigquery. And try BigQuery — every month you get a full terabyte of analysis for free.

--

--

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.