Watch this to see how we set-up and run dbt and BigQuery on the cloud shell

Get started with BigQuery and dbt, the easy way

Find here the quickest way to get started with dbt and BigQuery using only free offerings from Google Cloud.

I’m a big fan of dbt — an open source project that helps me build data pipelines around BigQuery using only SQL.

Get started with BigQuery and dbt

There’s a lot already writen about Bigquery and dbt. For example, there’s this official tutorial to set up dbt with BigQuery, with a lot more details than I do here (thanks Claire Carroll). The goal of this post is to share with you some GCP secrets to make the installation as easy as possible.

Step 1: Create a free Google Cloud account

Good news: You don’t need a credit card to have your own Google Cloud account. You’ll get a free terabyte of queries in BigQuery every month, and also a free shell environment you can use through your browser.

While creating your account, you’ll also create your first Google Cloud project. We’ll use the id of it later.

Step 2: Welcome to the free Cloud Shell

On console.cloud.google.com click on the “cloud shell” icon on top, and a shell environment will be ready for your use in a minute or so:

Find the Cloud Shell button
A cloud shell will open

Step 3: pip3 install dbt

Once in the cloud shell, installing dbt is really easy. To avoid problems skip installing the full dbt, but just install the dbt-bigquery parts with:

$ pip3 install --user --upgrade dbt-bigquery

Notes:

  • pip3 instead of pip, to make sure we are on the Python 3 world.
  • --user to avoid installing at the root level.
  • --upgrade just in case an older version of dbt was installed.

Step 3.1: debug

If you get an error like /usr/local/bin/python3.7: bad interpreter: No such file or directory, uninstall dbt and reinstall.

$ pip3 uninstall dbt-core
$ pip3 install --user --upgrade dbt-bigquery

Step 4: start your first dbt project

$ ~/.local/bin/dbt init first_project

Notes:

  • ~/.local/bin/ is the path to the just installed dbt. Consider adding this directory to the PATH env, to avoid the need of prepending it.

Step 4.1: try to run dbt

$ cd first_project/
$ ~/.local/bin/dbt run

That could have run! But it didn’t. We got an error message like Credentials in profile “default”, target “dev” invalid: Runtime Error instead. That means we still need to configure a way for dbt to connect and authenticate to BigQuery. The good news: This is really easy, since we are already in a Google Cloud environment.

Step 5: configure dbt to BigQuery

Use vim or your favorite shell editor:

$ vim ~/.dbt/profiles.yml  

In this file, replace the default configuration (which goes to redshift… why?), and instead point it to Bigquery:

default:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: the-id-of-your-project
dataset: temp

Notes:

  • oauth is all you need to say — the authentication libraries will recognize that it’s you connecting through dbt to bigquery within the cloud shell — no further configuration is needed.
  • the-id-of-your-project should be the id of your project
  • temp is the name of a dataset you must create in BigQuery to hold your tables. Note that if you are using BigQuery without a credit card, then the tables in your datasets can only live for a certain number of days.

Step 5: run dbt

~/.local/bin/dbt run

If everything has gone well, now we have run dbt! That means it created a sample view and a sample table in our provided dataset:

Check Claire’s tutorial for more details on what happens next.

Step 5.1: the magic cloud shell editor

While editing your dbt files and configuration, check out the ready-to-use cloud shell editor. Launch it by clicking the “open editor” button in the cloud shell:

Open the cloud shell editor

Or just click on this link: ssh.cloud.google.com/cloudshell/editor

Step 6: the magic of docs

dbt is really good about generating documentation for our pipelines, and you can generate and serve it with the following commands:

$ ~/.local/bin/dbt docs generate
$ ~/.local/bin/dbt docs serve

The really magic part is this: If you are running dbt in the cloud shell, and dbt gives you a localhost (127.0.0.1) URL to contact, you might think it won’t be easy to connect. But it is. Just click on the cloud shell link, and magic will happen:

Turns out the cloud shell can create http proxies to give you direct access to any web server running on it — and this immediately gives you access to the docs being served by dbt.

Isn’t that neat?

Next steps

  • Write all your SQL pipelines within the dbt environment, and make sure to keep them under version control.
  • Find a better way for BigQuery ML models creation.

How to run dbt every day on a schedule, in a serverless way? Check Mete Atamel solution with Cloud Run and Cloud Scheduler.

More resources

Hamza Khan medium.com/weareservian/bigquery-dbt-modern-problems-require-modern-solutions-b40faedc8aaf

Mark Rittman rittmananalytics.com/blog/2020/2/8/multichannel-attribution-bigquery-dbt-looker-segment

Lak Lakshmanan medium.com/google-cloud/loading-and-transforming-data-into-bigquery-using-dbt-65307ad401cd

Want more?

I’m Felipe Hoffa, a Developer Advocate for Google Cloud. Follow me on @felipehoffa, find my previous posts on medium.com/@hoffa, and all about BigQuery on reddit.com/r/bigquery.

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.