How to Test dbt CLI Changes

With the proper setup, you can use Spectacles to test the impact of dbt changes in Looker. Spectacles will tell you if your dbt changes will break anything in Looker before you merge them. For example, Spectacles might detect a schema change in a model that would break a key dashboard without a corresponding LookML change.

Spectacles has a native integration with dbt Cloud, but don't worry if you're using the dbt CLI instead: with a CI/CD workflow, you can still test those changes with Spectacles.

In this guide, we'll explain how to use our API to trigger a dbt-specific Spectacles run from the CI tool of your choice.

Prerequisites

This guide assumes that you've already done the following:

If you haven't met these prerequisites, read the guides linked above before continuing.

How it works

If you're using a CI service like GitHub Actions, CircleCI, or GitLab CI to run your dbt CI checks, you can use the Spectacles API to trigger SQL validation in Spectacles that will test the staging data created by your dbt change.

At a high level, the approach looks like this:

  1. [In your CI service] Run dbt to produce staging data unique to that change
  2. [In your CI service] Kick off a Spectacles run via our API with information about your staging data
  3. Using user attributes, Spectacles points the production version of your LookML to your staging data
  4. Spectacles runs tests to identify SQL errors

With this approach, we can tell you if there are any mismatches between the data generated by dbt in the PR and what Looker expects. Let's see how we can configure this approach.

Example for CircleCI

First, you'll need to add a step to your CI pipeline to trigger Spectacles via our API.

Add a step to your CI job after the dbt run or dbt build portion. This step should call the Spectacles API.

Here's an example of how this could look in CircleCI.

Add a Spectacles job to the pipeline

First, we run our dbt job, which includes dbt build. This command will build all the necessary tables and views required for our next step. Typically, these tables and views will get created in a schema that is unique to either the commit or the PR, e.g., dbt_ci_123abc.

Next, we've added a spectacles job that runs in a Python Docker container. It installs requests and runs a simple Python script to communicate with the Spectacles API.

/.circleci/config.yml
version: 2.1
jobs:
  dbt:
    docker:
      - image: cimg/python:3.8
    steps:
      - checkout
      - run:
          name: "Install dbt"
          command: |
            python -m venv venv
            . venv/bin/activate
            pip install dbt==0.19.1
      - run:
          name: "Run dbt"
          command: |
            . venv/bin/activate
            dbt deps
            dbt seed
            dbt build
  spectacles:
    docker:
      - image: cimg/python:3.8
    steps:
      - checkout
      - run:
          name: "Install requirements"
          command: |
            python -m venv venv
            . venv/bin/activate
            pip install requests
      - run:
          name: "Run Spectacles"
          command: |
            . venv/bin/activate
            python .circleci/spectacles.py
workflows:
  version: 2
  test-all:
    jobs:
      - dbt
      - spectacles:
          requires:
            - dbt

It's important that the spectacles job only runs if the dbt job succeeds, so we've added a requirement to our pipeline's workflows section to ensure this.

jobs:
  - dbt
  - spectacles:
      requires:
        - dbt

Making requests with a Python script

Finally, let's look at the Python script, which runs Spectacles via the API. This script assumes the following:

  • The user attribute that controls schema/dataset in Looker is called dbt_schema, and you have set it up in Spectacles.
  • You have set up your Spectacles API key as an environment variable in CircleCI called SPECTACLES_API_KEY.
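If you'd like to fail fast when the API key is missing (rather than sending an empty bearer token to the API), you could wrap the header construction in a small guard. This is a sketch; the `auth_headers` function name is our own, not part of the script below:

```python
import os


def auth_headers() -> dict:
    """Build Spectacles API request headers, failing fast if the key is unset."""
    api_key = os.getenv("SPECTACLES_API_KEY")
    if not api_key:
        raise RuntimeError(
            "SPECTACLES_API_KEY is not set; add it as an "
            "environment variable in your CircleCI project settings."
        )
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
```

This surfaces a clear configuration error in the CI logs instead of a confusing 401 response from the API.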

First, we define some IDs at the top of the file. You can find these IDs in the URLs of your project and Suite in Spectacles.

/.circleci/spectacles.py:1-9
import time
import requests
import os
import sys

# Define Spectacles IDs and API key
org_id = "WstGk40UEDhElCWTYi"
suite_id = "p0t0ZZE7iQuorvC4ME"
project_id = "JDmbNOfdWj0B50B6z6"

Next, we grab some environment variables (configured in CircleCI) that define our Spectacles API key and the commit hash of the change.

We set the API key in our headers as a bearer token.

/.circleci/spectacles.py:11-13
# Set the API key in header
api_key = os.getenv("SPECTACLES_API_KEY")
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

We'll also define the schema name where the dbt staging data lives. We're assuming the schema/dataset your dbt CI job builds into has the format dbt_ci_<commit sha> where <commit sha> refers to the first six characters of the git commit being tested.

You can find your current naming convention in the profiles.yml file you use for your dbt runs. Update this code with your pattern if it's different.
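For reference, a profiles.yml target that produces this naming scheme might look like the following. This is a sketch only: the profile name, target name, and warehouse type are placeholders, and your connection details will differ.

```yaml
# profiles.yml (sketch) -- builds into dbt_ci_<first six characters of commit SHA>
my_profile:
  target: ci
  outputs:
    ci:
      type: snowflake  # placeholder; use your warehouse type
      # CIRCLE_SHA1 is provided automatically by CircleCI
      schema: "dbt_ci_{{ env_var('CIRCLE_SHA1')[:6] }}"
      # ...connection details omitted...
```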

/.circleci/spectacles.py:16
schema_name = f"dbt_ci_{os.getenv('CIRCLE_SHA1')[:6]}"

Lastly, we'll make a request to the Spectacles API and monitor the run's status until it finishes, checking every five seconds. If the run doesn't pass, the script exits with code 100 so your CI tool marks the step as failed.

/.circleci/spectacles.py:18-49
# Create a run
create_run_url = "https://app.spectacles.dev/api/v1/runs"
payload = {
    "org_id": org_id,
    "suite_id": suite_id,
    "project_id": project_id,
    # This tells Spectacles to update that user attribute for the run
    "user_attributes": {"dbt_schema": schema_name},
}
create_run_response = requests.post(url=create_run_url, headers=headers, json=payload)
create_run_response.raise_for_status()
run_id = create_run_response.json()["run_id"]

run_status = "queued"
run_url = (
    f"https://app.spectacles.dev/api/v1/org/{org_id}/proj/{project_id}/run/{run_id}"
)

while run_status not in ["cancelled", "error", "passed", "failed"]:
    # Naively wait for 5 seconds to check
    time.sleep(5)

    # Get the run's results
    run_response = requests.get(url=run_url, headers=headers)
    run_status = run_response.json()["status"]

print(run_response.json()["url"])

if run_status != "passed":
    sys.exit(100)
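One caveat: the loop above polls indefinitely, so a stalled run would hang your CI job until the CI tool's own timeout kicks in. If you'd rather give up after a fixed period, the polling can be wrapped with a deadline. This is a sketch under our own assumptions: the helper name and timeout value are ours, and `get_status` stands in for the GET request against the run URL.

```python
import time


def poll_until_done(get_status, interval: float = 5.0, timeout: float = 1800.0) -> str:
    """Poll get_status() until the run reaches a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    `get_status` is a stand-in for fetching run_response.json()["status"].
    """
    deadline = time.monotonic() + timeout
    status = "queued"
    while status not in ("cancelled", "error", "passed", "failed"):
        if time.monotonic() > deadline:
            raise TimeoutError("Spectacles run did not finish within the timeout")
        time.sleep(interval)
        status = get_status()
    return status
```

A TimeoutError here fails the CI step just like the non-zero exit code does, but with a clearer reason in the logs.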

Here's the entire file.

/.circleci/spectacles.py
import time
import requests
import os
import sys

# Define Spectacles IDs and API key
org_id = "WstGk40UEDhElCWTYi"
suite_id = "p0t0ZZE7iQuorvC4ME"
project_id = "JDmbNOfdWj0B50B6z6"

# Set the API key in header
api_key = os.getenv("SPECTACLES_API_KEY")
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

# Define the schema of the dbt staging data
schema_name = f"dbt_ci_{os.getenv('CIRCLE_SHA1')[:6]}"

# Create a run
create_run_url = "https://app.spectacles.dev/api/v1/runs"
payload = {
    "org_id": org_id,
    "suite_id": suite_id,
    "project_id": project_id,
    # This tells Spectacles to update that user attribute for the run
    "user_attributes": {"dbt_schema": schema_name},
}
create_run_response = requests.post(url=create_run_url, headers=headers, json=payload)
create_run_response.raise_for_status()
run_id = create_run_response.json()["run_id"]

run_status = "queued"
run_url = (
    f"https://app.spectacles.dev/api/v1/org/{org_id}/proj/{project_id}/run/{run_id}"
)

while run_status not in ["cancelled", "error", "passed", "failed"]:
    # Naively wait for 5 seconds to check
    time.sleep(5)

    # Get the run's results
    run_response = requests.get(url=run_url, headers=headers)
    run_status = run_response.json()["status"]

print(run_response.json()["url"])

if run_status != "passed":
    sys.exit(100)