The top GitHub projects per country

Let’s find the most popular GitHub projects for each country. A simple query can give us that result: Take every star given on GitHub and look for the country of origin of users who have chosen to show their country on their GitHub profile. But we are going to go deeper than that.

Step 1: The most popular GitHub project per country — naive count

The most starred project per country on GitHub during 2016 — naive count.

Step 2: What about the 2nd most popular project for each country?

The 2nd most starred project per country on GitHub during 2016 — naive count.

Let’s skip the top 75 worldwide GitHub projects, and only then choose the most popular project per country:

The most starred project per country on GitHub during 2016 — taking out the top 75 GitHub worldwide projects.

A list of the most interesting GitHub projects for each country:

Queries

Datasets

Most popular GitHub project per country, naive count:

#standardSQL
SELECT REGEXP_EXTRACT(a.name, r'/(.*)'), stars, iso3
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn=1
) a
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars DESC

Most popular GitHub projects per country, removing the top 75, special rank:

#standardSQL
SELECT REGEXP_EXTRACT(a.name, r'/(.*)'), stars, iso3
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND repo.id NOT IN (SELECT id FROM (
SELECT repo.id, COUNT(*) c
FROM `githubarchive.year.2016` a
WHERE a.type='WatchEvent'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 75
))
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn=1
) a
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars DESC

Most popular GitHub projects per country, removing the top 75:

#standardSQL
SELECT country, a.name, stars, description, stars_country, other_countries_sc
FROM (
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn2
FROM (
SELECT *, ROUND(SUM(1/(1+ROUND(rn/20,0))) OVER(PARTITION BY repo_id),1) other_countries_sc
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn, SUM(stars) OVER(PARTITION BY country_code) stars_country, repo_id
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND repo.id NOT IN (SELECT id FROM (
SELECT repo.id, COUNT(*) c
FROM `githubarchive.year.2016` a
WHERE a.type='WatchEvent'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 75
))
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn<300
)
)
WHERE other_countries_sc<3
)
WHERE rn2<11
) a
LEFT JOIN (SELECT url, ANY_VALUE(description) description FROM `ghtorrent-bq.ght_2017_01_19.projects` GROUP BY 1) b
ON a.name=REPLACE(url, 'https://api.github.com/repos/', '')
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars_country DESC, country_code, stars DESC

Want more?

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store