Let’s move a real-time streaming source (all the RSVPs from Meetup.com) into Snowflake, with the help of Google Cloud Pub/Sub.

Image generated by AI (by author)

Use case

We want to move a stream of data with ~4k messages per hour to Snowflake, using Google Pub/Sub as temporary storage.

This is a real use case, consisting of all worldwide RSVPs on meetup.com being published in real time. This data has been collected from Meetup through their official API.
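The pattern behind this pipeline is simple micro-batching: buffer incoming messages in temporary storage, then flush them to the warehouse in chunks. Here's a toy, dependency-free sketch of that idea (the class and names are mine, not the actual Pub/Sub or Snowflake APIs):

```python
class MicroBatcher:
    """Buffer streaming messages and flush them to a sink in batches,
    mimicking the Pub/Sub -> periodic-load-into-Snowflake pattern."""

    def __init__(self, batch_size, sink):
        self.batch_size = batch_size
        self.sink = sink      # callable that receives a list of messages
        self.buffer = []

    def publish(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered (e.g. on a timer or at shutdown)."""
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []
```

In the real pipeline the sink would be a `COPY INTO` load into Snowflake, and Pub/Sub handles the buffering durably.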

This is a quick demo of exporting the Hacker News archive from BigQuery into Snowflake: quick, easy, and a great way to show off Snowflake's semi-structured data support and its recursive SQL capabilities.

Image generated by AI (by author)

From BigQuery to Snowflake in less than 1 minute

Hacker News in BigQuery

If you are a BigQuery user, you can find the Hacker News archive table on console.cloud.google.com/bigquery?p=bigquery-public-data&d=hacker_news&t=full&page=table.

This table contains an updated copy of Hacker News, as seen on my previous posts (when I used to work with the BigQuery team). …

How to export GA4 from BigQuery into Snowflake, and translate the cookbook sample queries.

Picture generated by AI (VQGAN+CLIP)

Google Analytics 4 exports event data from individual user level for free into BigQuery. You might then be wondering “How do I move this data into Snowflake, to get all the benefits of the Data Cloud?” and “Is there a quick way to translate the sample BigQuery queries into Snowflake…

Building a decay function in SQL is not trivial, but fun. The best answer so far uses window functions, but can we do better with a JS UDTF in Snowflake? Find the results here.

A UDTF solving the decaying scores SQL puzzle (image by author)

Brittany Bennett and Claire Carroll nerd-sniped data Twitter with a fun SQL puzzle. Hours later Benn Eifert had a great solution with SQL window functions. Then TJ Murphy tested it in Snowflake and explained why window functions are better than joins. And now, it’s my turn to play. Can I do…
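For intuition, here's what an exponentially decaying score looks like outside SQL — a minimal Python sketch (the function name and half-life parameter are my own, not from the puzzle):

```python
from math import exp, log

def decayed_scores(events, half_life_days=7.0):
    """Sum event scores, decaying each one toward the latest day.

    events: list of (day, score) pairs, with day as an integer offset.
    A score loses half its weight every `half_life_days`.
    """
    latest = max(day for day, _ in events)
    lam = log(2) / half_life_days  # decay rate from half-life
    return sum(score * exp(-lam * (latest - day)) for day, score in events)
```

For example, a 10-point event from exactly one half-life ago contributes 5 points today.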

The most popular language on Reddit (other than English) will surprise you. To build this chart I analyzed almost a million Reddit comments with Snowflake and a Java UDTF (in less than 2 minutes).

The most popular languages on Reddit, after analyzing 1M comments: English, German(!), Spanish, Portuguese, French, Italian, Romanian(!), Dutch(!)…

Surprising results, compared to the number of native speakers:

Top languages on Reddit — full image + updated Tableau interactive (by author)

German is the second most popular language on Reddit, which is probably a big reason for Reddit opening an office in Berlin.

Predicting the future starts with understanding the past, and this is no different when working with weather data. Just think how much weather affects agriculture, construction, transportation, and even consumer behavior. Let’s focus on Weather Source, one of the premium weather data providers in the Data Marketplace.

You can use Snowflake to find correlations between your past data and historical weather data, and also as a source of weather forecasts for the future. …
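The correlation itself is standard statistics; here's a minimal plain-Python Pearson correlation for intuition (the sample series in the test are invented, not Weather Source data):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series,
    in [-1, 1]: +1 for a perfect linear relationship, -1 for inverse."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```

In practice you'd compute this directly in Snowflake with `CORR()` over a join of your sales table and the weather table.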

This example explores Snowflake’s geospatial capabilities, and how to enhance them using CARTO shared functions in the Snowflake Data Marketplace.

The Data Cloud not only allows you to share data, but also logic. Let’s check out the power of this by playing with GIS UDFs shared by CARTO.

Watch on YouTube

Random points around a country

Let’s start by plotting 1,000 random points around Italy:
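In the post this is done with CARTO's shared GIS UDFs in Snowflake SQL; as a rough Python analog, here's one way to scatter uniform random points around a center (all names, the 111 km/degree constant, and the flat-earth approximation are mine):

```python
import math
import random

def random_points_around(lat, lon, radius_km, n, seed=42):
    """Generate n pseudo-random points within radius_km of (lat, lon).

    Uses sqrt(u) for the radius so points are uniform over the disk's
    area, and scales longitude by cos(latitude) to correct for the
    convergence of meridians. A rough approximation, fine for plotting.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        r = radius_km * math.sqrt(rng.random())   # uniform over disk area
        theta = rng.uniform(0, 2 * math.pi)
        dlat = (r * math.sin(theta)) / 111.0      # ~111 km per degree of latitude
        dlon = (r * math.cos(theta)) / (111.0 * math.cos(math.radians(lat)))
        points.append((lat + dlat, lon + dlon))
    return points
```

Calling it with Rome's coordinates and a few hundred kilometers gives a cloud of points roughly covering Italy.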

I used to load Reddit comments into BigQuery; now it’s time to upgrade my pipelines to Snowflake, and to share some of the nice surprises I found. Let’s get started with 261GB of them.

Querying for the subreddits with the most comments

Loading and analyzing Reddit comments in Snowflake is fun, and a great way to show off its powers. These are some of the Snowflake features that delighted me, which I didn’t get with BigQuery:

  • Working with semi-structured data in Snowflake is fast, easy, and fun.
  • Snowflake understands JSON objects on load…

The Braze Engagement Benchmarks give Snowflake users access to industry-by-industry data on message engagement, app retention, user acquisition, and purchasing behavior, updated daily. All data in the Benchmarks is anonymized and aggregated, pulled from Braze's customer base of over 1,000 global brands across 14 major industries, and covers the past year from the current date. Here's how to query them.

Photo by Anne Nygård on Unsplash

Let’s say you have a health and fitness app, and you’re wondering: What’s the best day to send notifications to your users? It’s Monday:
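Under the hood, a question like this is just a group-by on day of week. A toy Python equivalent (the sample numbers in the test are invented, not Braze's):

```python
from collections import defaultdict
from datetime import date

def best_day(events):
    """events: iterable of (iso_date_string, engagement_count) pairs.
    Returns the weekday name with the highest total engagement."""
    totals = defaultdict(int)
    for day, count in events:
        weekday = date.fromisoformat(day).strftime("%A")
        totals[weekday] += count
    return max(totals, key=totals.get)
```

In Snowflake you'd get the same answer with `DAYNAME()` plus `GROUP BY` over the benchmark table.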

NOAA GSOD’s worldwide weather data is updated daily in Snowflake, and in this post we’ll make it even more useful. Check inside for pivots, geo-joins, finding the closest station to each city, and pattern matching with MATCH_RECOGNIZE().
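In the post the closest-station lookup is a geospatial join in Snowflake SQL; here's a rough, dependency-free Python analog using the haversine great-circle distance (function names and sample coordinates are mine):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def closest_station(city, stations):
    """city: (lat, lon); stations: dict of name -> (lat, lon).
    Returns the name of the nearest station."""
    return min(stations, key=lambda s: haversine_km(*city, *stations[s]))
```

Snowflake's `ST_DISTANCE` over `GEOGRAPHY` columns does the equivalent work server-side, at scale.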

Video on YouTube

The source

To access the daily NOAA GSOD weather data, just go to the Marketplace and create a database with Knoema’s Environment Data Atlas in your account.

Knoema’s Environment Data Atlas in Snowflake

With a couple of exploratory queries you’ll notice that:

  • Having this data automatically refreshed in your account is cool!
  • Making this table…

Felipe Hoffa

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.
