Tracking the vaccination progress with Snowflake and Starschema

Starschema shares COVID-19 tracking data in the Snowflake Data Marketplace, including vaccination numbers from JHU and Our World in Data. Find here the basic SQL queries to use this data in your projects.

Felipe Hoffa
3 min readMar 12, 2021

World ranking

2021–03–11: Few countries have vaccinated more than 4% of their population. Gibraltar and Israel display outstanding numbers at almost 50%. They are followed by Seychelles (27%), Bermuda, Cayman Islands, Bahrain, USA, Serbia, Malta, Faeroe Islands, and Chile (6.64%).

% vaccinated per country, source OurWorldInData through Starschema in Snowflake

What’s interesting about this query:

  • Source is Our World in Data, and Starschema loads and shares this data in Snowflake.
  • The owid_vaccinations table reports interesting statistics like # of daily vaccinations, % of population vaccinated, and the specific vaccines each country is using.
  • To get only the latest day of data we use the qualify row_number()=1 instruction. This is especially important in a chart like this because each country reports their numbers in different schedules.
  • Some countries are missing in the chart: This because not all countries report people_fully_vaccinated_per_hundred. For example, the UAE is one of the leading countries in number of people vaccinated, but since we don’t know how many are fully vaccinated, they don’t show up on this particular chart.
select *
from public.owid_vaccinations
qualify row_number() over(partition by iso3166_1 order by date desc)=1
order by people_fully_vaccinated_per_hundred desc nulls last

USA states ranking

2021–03–12: Looking at the chart it looks like Alaska has given the second dose to 17% of its population, while DC less than 9%.

% vaccinated per state, source JHU through Starschema in Snowflake

What’s interesting about this query:

  • Source is JHU, and Starschema loads and shares this data in Snowflake.
  • The jhu_vaccine table doesn’t report percentage of people vaccinated, but we can get state demographics out of the demographics table.
  • The query uses data from the people_total_2nd_dose column, but there will be a vaccine that doesn’t require a 2nd dose. For now the people_total_2nd_dose is a good representation of fully vaccinated. You can also use data from the columns doses_admin_total and people_total, but there’s no column explicitly accounting for “fully vaccinated”.
  • To get only the latest day of data we use the qualify row_number()=1 instruction.
select people_total_2nd_dose/population ratio_vaccinated, *
from starschema_covid19.public.jhu_vaccines a
join (
select state, sum(total_population) population
from starschema_covid19.public.demographics
group by 1
) b
on a.stabbr=b.state
qualify row_number() over(partition by province_state order by date desc)=1
order by date desc, ratio_vaccinated desc

Learn more

Want more?

I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter, and check reddit.com/r/snowflake for the most interesting Snowflake news.

--

--

Felipe Hoffa
Felipe Hoffa

Written by Felipe Hoffa

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.

Responses (1)