Tracking the vaccination progress with Snowflake and Starschema
Starschema shares COVID-19 tracking data in the Snowflake Data Marketplace, including vaccination numbers from JHU and Our World in Data. Find here the basic SQL queries to use this data in your projects.
World ranking
2021–03–11: Few countries have vaccinated more than 4% of their population. Gibraltar and Israel display outstanding numbers at almost 50%. They are followed by Seychelles (27%), Bermuda, Cayman Islands, Bahrain, USA, Serbia, Malta, Faeroe Islands, and Chile (6.64%).
What’s interesting about this query:
- Source is Our World in Data, and Starschema loads and shares this data in Snowflake.
- The
owid_vaccinations
table reports interesting statistics like # of daily vaccinations, % of population vaccinated, and the specific vaccines each country is using. - To get only the latest day of data we use the
qualify row_number()=1
instruction. This is especially important in a chart like this because each country reports their numbers in different schedules. - Some countries are missing in the chart: This because not all countries report
people_fully_vaccinated_per_hundred
. For example, the UAE is one of the leading countries in number of people vaccinated, but since we don’t know how many are fully vaccinated, they don’t show up on this particular chart.
select *
from public.owid_vaccinations
qualify row_number() over(partition by iso3166_1 order by date desc)=1
order by people_fully_vaccinated_per_hundred desc nulls last
USA states ranking
2021–03–12: Looking at the chart it looks like Alaska has given the second dose to 17% of its population, while DC less than 9%.
What’s interesting about this query:
- Source is JHU, and Starschema loads and shares this data in Snowflake.
- The
jhu_vaccine
table doesn’t report percentage of people vaccinated, but we can get state demographics out of thedemographics
table. - The query uses data from the
people_total_2nd_dose
column, but there will be a vaccine that doesn’t require a 2nd dose. For now thepeople_total_2nd_dose
is a good representation of fully vaccinated. You can also use data from the columnsdoses_admin_total
andpeople_total
, but there’s no column explicitly accounting for “fully vaccinated”. - To get only the latest day of data we use the
qualify row_number()=1
instruction.
select people_total_2nd_dose/population ratio_vaccinated, *
from starschema_covid19.public.jhu_vaccines a
join (
select state, sum(total_population) population
from starschema_covid19.public.demographics
group by 1
) b
on a.stabbr=b.state
qualify row_number() over(partition by province_state order by date desc)=1
order by date desc, ratio_vaccinated desc
Learn more
- https://starschema.com/covid-19-data-set
- https://starschema.com/solutions/covid-19-modelling-analytics
- https://starschema.com/kb/working-with-the-covid-19-data-set (registration required)
- https://www.snowflake.com/datasets/starschema-covid-19-epidemiological-data/
Want more?
I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter, and check reddit.com/r/snowflake for the most interesting Snowflake news.