Get started with BigQuery and dbt, the easy way
Here's the quickest way to get started with dbt and BigQuery, using only free offerings from Google Cloud.
I’m a big fan of dbt — an open source project that helps me build data pipelines around BigQuery using only SQL.
Get started with BigQuery and dbt
There’s a lot already written about BigQuery and dbt. For example, there’s this official tutorial to set up dbt with BigQuery, with a lot more detail than I give here (thanks Claire Carroll). The goal of this post is to share with you some GCP secrets to make the installation as easy as possible.
Step 1: Create a free Google Cloud account
Good news: You don’t need a credit card to have your own Google Cloud account. You’ll get a free terabyte of queries in BigQuery every month, and also a free shell environment you can use through your browser.
While creating your account, you’ll also create your first Google Cloud project. We’ll use its project id later.
See also my earlier post, “BigQuery without a credit card: Discover, learn and share”, if you have trouble signing up for BigQuery.
Step 2: Welcome to the free Cloud Shell
On console.cloud.google.com, click the “cloud shell” icon at the top, and a shell environment will be ready for your use in a minute or so:
Step 3: pip3 install dbt
Once in the cloud shell, installing dbt is really easy. To avoid problems, skip the full dbt package and install only the BigQuery parts with:
$ pip3 install --user --upgrade dbt-bigquery
pip3, to make sure we are in the Python 3 world.
--user, to avoid installing at the system level.
--upgrade, just in case an older version of dbt was already installed.
Step 3.1: debug
If you get an error like /usr/local/bin/python3.7: bad interpreter: No such file or directory, uninstall dbt and reinstall it:
$ pip3 uninstall dbt-core
$ pip3 install --user --upgrade dbt-bigquery
Step 4: start your first dbt project
$ ~/.local/bin/dbt init first_project
~/.local/bin/ is the path to the just-installed dbt. Consider adding this directory to your PATH environment variable, to avoid having to prepend it every time.
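A minimal way to do that in the cloud shell (assuming the default bash shell):

```shell
# Add dbt's install directory to PATH in ~/.bashrc for future sessions
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
# And make it take effect in the current shell too
export PATH="$HOME/.local/bin:$PATH"
```

After this, you can type dbt instead of ~/.local/bin/dbt in the commands below.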
Step 4.1: try to run dbt
$ cd first_project/
$ ~/.local/bin/dbt run
That could have run! But it didn’t. Instead, we got an error message like
Credentials in profile “default”, target “dev” invalid: Runtime Error. That means we still need to configure a way for dbt to connect and authenticate to BigQuery. The good news: this is really easy, since we are already in a Google Cloud environment.
Step 5: configure dbt to connect to BigQuery
Use vim or your favorite shell editor:
$ vim ~/.dbt/profiles.yml
In this file, replace the default configuration (which goes to Redshift… why?), and instead point it to BigQuery:
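A minimal sketch of what that configuration looks like (the top-level profile name must match the profile: entry in your dbt_project.yml, which dbt init sets to default; replace the placeholder values with your own):

```yaml
# ~/.dbt/profiles.yml
default:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: the-id-of-your-project   # your Google Cloud project id
      dataset: temp                     # dataset where dbt will create tables
      threads: 1
```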
oauth is all you need to say: the authentication libraries will recognize that it’s you connecting through dbt to BigQuery from within the cloud shell, so no further configuration is needed.
the-id-of-your-project should be the id of the Google Cloud project you created in step 1.
temp is the name of a dataset you must create in BigQuery to hold your tables (you can create it from the same shell with bq mk temp). Note that if you are using BigQuery without a credit card (the sandbox), tables in your datasets expire after 60 days.
Step 5.1: run dbt
If everything has gone well, now we have run dbt! That means it created a sample view and a sample table in our provided dataset:
Check Claire’s tutorial for more details on what happens next.
Step 5.2: the magic cloud shell editor
While editing your dbt files and configuration, check out the ready-to-use cloud shell editor. Launch it by clicking the “open editor” button in the cloud shell:
Or just click on this link: ssh.cloud.google.com/cloudshell/editor
Step 6: the magic of docs
dbt is really good about generating documentation for our pipelines, and you can generate and serve it with the following commands:
$ ~/.local/bin/dbt docs generate
$ ~/.local/bin/dbt docs serve
The really magic part is this: when dbt, running in the cloud shell, gives you a localhost (127.0.0.1) URL to visit, you might think connecting to it won’t be easy. But it is. Just click the link in the cloud shell, and magic will happen:
It turns out the cloud shell can create HTTP proxies that give you direct access to any web server running on it, which immediately gives you access to the docs being served by dbt.
Isn’t that neat?
Next steps:
- Write all your SQL pipelines within the dbt environment, and make sure to keep them under version control.
- Find a better way to create BigQuery ML models.
For scheduling these runs, see the follow-up post “Scheduled serverless dbt + BigQuery service”.