Get started with BigQuery and dbt, the easy way
Here’s the quickest way to get started with dbt and BigQuery, using only free offerings from Google Cloud.
Important update: I left Google and joined Snowflake in 2020 — so I’m unable to keep my older posts updated. If you want to try Snowflake, join us — I’m having a lot of fun ❄️.
I’m a big fan of dbt — an open source project that helps me build data pipelines around BigQuery using only SQL.
Get started with BigQuery and dbt
There’s a lot already written about BigQuery and dbt. For example, there’s this official tutorial to set up dbt with BigQuery, with far more detail than I give here (thanks Claire Carroll). The goal of this post is to share with you some GCP tricks that make the installation as easy as possible.
Step 1: Create a free Google Cloud account
Good news: You don’t need a credit card to have your own Google Cloud account. You’ll get a free terabyte of queries in BigQuery every month, and also a free shell environment you can use through your browser.
While creating your account, you’ll also create your first Google Cloud project. We’ll use its project id later.
Step 2: Welcome to the free Cloud Shell
On console.cloud.google.com, click the “Cloud Shell” icon at the top, and a shell environment will be ready for you in a minute or so.
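The Cloud Shell comes with gcloud already pointed at your project, so if you ever need the project id from Step 1 later, a command like this should print it:
$ gcloud config get-value project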
Step 3: pip3 install dbt
Once in the Cloud Shell, installing dbt is really easy. To avoid problems, skip the full dbt package and install just the BigQuery adapter:
$ pip3 install --user --upgrade dbt-bigquery
Notes:
- pip3 instead of pip, to make sure we are in the Python 3 world.
- --user to avoid installing at the root level.
- --upgrade just in case an older version of dbt was installed.
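To confirm the install worked, you can ask the freshly installed binary for its version (it lands in ~/.local/bin):
$ ~/.local/bin/dbt --version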
Step 3.1: debug
If you get an error like /usr/local/bin/python3.7: bad interpreter: No such file or directory, uninstall dbt and reinstall it:
$ pip3 uninstall dbt-core
$ pip3 install --user --upgrade dbt-bigquery
Step 4: start your first dbt project
$ ~/.local/bin/dbt init first_project
Notes:
- ~/.local/bin/ is the path to the freshly installed dbt. Consider adding this directory to your PATH, so you don’t need to prepend it to every command (see the snippet below).
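For example, assuming the Cloud Shell’s default bash setup, appending a line like this to ~/.bashrc puts dbt on your PATH for future sessions:
$ echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc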
Step 4.1: try to run dbt
$ cd first_project/
$ ~/.local/bin/dbt run
That could have run! But it didn’t. Instead we got an error message like Credentials in profile “default”, target “dev” invalid: Runtime Error. That means we still need to configure a way for dbt to connect and authenticate to BigQuery. The good news: this is really easy, since we are already in a Google Cloud environment.
Step 5: connect dbt to BigQuery
Use vim or your favorite shell editor:
$ vim ~/.dbt/profiles.yml
In this file, replace the default configuration (which points to Redshift… why?) so that it points to BigQuery instead:
default:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: the-id-of-your-project
      dataset: temp
Notes:
- oauth is all you need to say: the authentication libraries will recognize that it’s you connecting through dbt to BigQuery from within the Cloud Shell, so no further configuration is needed.
- the-id-of-your-project should be the id of your project.
- temp is the name of a dataset you must create in BigQuery to hold your tables (see the command below). Note that if you are using BigQuery without a credit card, the tables in your datasets can only live for a limited number of days.
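If you haven’t created that dataset yet, the bq command line tool that ships with the Cloud Shell can do it; something like this should work (swap in your own project id):
$ bq mk --dataset the-id-of-your-project:temp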
Step 6: run dbt
$ ~/.local/bin/dbt run
If everything went well, we have now run dbt! That means it created a sample view and a sample table in the dataset we configured.
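If you want to double-check from the shell, bq can list what dbt just created in that dataset (again, replace the project id with yours):
$ bq ls the-id-of-your-project:temp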
Check Claire’s tutorial for more details on what happens next.
Step 6.1: the magic cloud shell editor
While editing your dbt files and configuration, check out the ready-to-use Cloud Shell editor. Launch it by clicking the “Open Editor” button in the Cloud Shell.
Or just click on this link: ssh.cloud.google.com/cloudshell/editor
Step 7: the magic of docs
dbt is really good at generating documentation for our pipelines, and you can generate and serve it with the following commands:
$ ~/.local/bin/dbt docs generate
$ ~/.local/bin/dbt docs serve
The really magical part is this: if you are running dbt in the Cloud Shell, and dbt gives you a localhost (127.0.0.1) URL to open, you might think it won’t be easy to connect. But it is. Just click on the link in the Cloud Shell, and magic will happen.
Turns out the Cloud Shell can create HTTP proxies that give you direct access to any web server running on it, which immediately gives you access to the docs being served by dbt.
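If you prefer the Cloud Shell’s “Web Preview” button over clicking the printed link, you can serve the docs on the preview’s default port (this assumes your dbt version supports the --port flag):
$ ~/.local/bin/dbt docs serve --port 8080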
Isn’t that neat?
Next steps
- Write all your SQL pipelines within the dbt environment, and make sure to keep them under version control.
- Find a better way to create BigQuery ML models.
- Want to run dbt every day on a schedule, in a serverless way? Check out Mete Atamel’s solution with Cloud Run and Cloud Scheduler.
More resources
- Hamza Khan: medium.com/weareservian/bigquery-dbt-modern-problems-require-modern-solutions-b40faedc8aaf
- Mark Rittman: rittmananalytics.com/blog/2020/2/8/multichannel-attribution-bigquery-dbt-looker-segment
- Lak Lakshmanan: medium.com/google-cloud/loading-and-transforming-data-into-bigquery-using-dbt-65307ad401cd
Want more?
I’m Felipe Hoffa, a Developer Advocate for Google Cloud. Follow me on @felipehoffa, find my previous posts on medium.com/@hoffa, and all about BigQuery on reddit.com/r/bigquery.