By default, Spectacles first tests entire explores, then tests each dimension in all failing explores.
For faster validation, you can run the SQL validator in fail fast mode. This explanation describes the default approach, then explains how fail fast differs from the default.
Default query mode
The default mode is best for identifying all of the errors on your branch, but requires some extra dimension-level queries to do so.
By default, Spectacles will run approximately one query per Explore, with all non-ignored dimensions selected. We'll refer to these queries as Explore queries.
Whenever an Explore query returns an error, Spectacles queries each dimension in the Explore individually and returns any errors associated with each dimension.
Why is this necessary? If Spectacles only runs the Explore query, it can only return the first error in the Explore, because the database query planner stops compiling after identifying any error. This means that if multiple dimensions in an explore contain SQL errors, you'll have to fix them one-by-one, re-running the SQL validator in between. Running dimension-by-dimension also means Spectacles can confidently associate each error with a dimension.
By default, Spectacles runs no more than 10 queries at a time to avoid overwhelming your Looker instance. You can change this value with the query concurrency flag in the CLI or in the SQL Validator section of the Suite page.
You can also restrict validation to a specific explore or model to reduce the query load on your instance.
Fail fast mode
If you have many explores or explores with many dimensions, you may want to skip these dimension-level queries to speed up runtimes by only running the Explore queries.
In fail-fast mode (
--fail-fast), Spectacles will run one query for each selected Explore.
Pros: Spectacles will run fewer queries and usually finishes more quickly, depending on the number of dimensions in your explores.
Cons: Because each explore is a single query, you will only see the first SQL error encountered in that query. This means Spectacles may find additional errors once you have fixed the first error.
Limitations with incremental mode
If you try to run SQL validation with
--incremental, you'll get an error. Spectacles doesn't allow incremental, fail fast runs.
For incremental SQL validation, we need to...
- Only run the Explore queries that have changed between our branch (the base ref) and production (the target ref)
- Only display the errors that are unique to our branch
The second part is the challenge: in order to deduplicate errors across the base ref and target ref, we need to know which dimensions they correspond to. Here's an example:
Imagine a situation where we have two dimensions with SQL errors in production (our target ref). We open a new branch,
hotfix (our base ref), off of production and fix exactly one error. If we run Spectacles in incremental mode, it will detect the differing SQL between the target and base ref and will run the Explore query on our base ref.
Because the Explore query still errors (remember, we only fixed one of the dimensions), Spectacles tests all of that Explore's dimensions for the base ref, only returning errors for dimensions that have differing SQL when compared to the target ref.
The only way to do this reliably is with dimension-level queries, which rules out fail fast mode. If we ran in fail fast mode, we wouldn't know if the first error on the base ref and the first error on the target ref corresponded to the same dimension. It's for this reason that incremental mode only works when fail fast is turned off.