September 13, 2023

Bigeye and dbt Labs partner to speed data issue detection and resolution

With the new partnership, Bigeye and dbt Labs help data teams build healthy, reliable data pipelines and find and fix data issues before they impact their business.

Kendall Lovett

We’re thrilled to announce that, in partnership with dbt Labs, Bigeye customers can now integrate with dbt Cloud to expand their pipeline monitoring capabilities and increase the reliability of critical data products powered by dbt Cloud.

dbt is a widely adopted, SQL-based transformation and modeling tool. The dbt Cloud platform comes with a host of simple yet powerful features for building and testing data pipelines. Now, with the combined power of Bigeye and dbt Cloud, customers can layer on data observability to track changes in data over time and find previously unidentified anomalies (i.e., unknown unknowns) across their environment.

By applying the dbt testing framework and Bigeye data observability together, customers can get a complete picture of the health of their dbt pipelines, find and fix issues faster, and deliver more reliable data to the business—all while significantly reducing the burden on the data platform and analytics engineering teams.

This new integration shows which dbt Cloud jobs power the data assets that Bigeye monitors, along with details about those jobs that make impact analysis and root cause analysis dramatically faster.

At Philo, we use dbt Cloud to transform our data in order to understand TV viewing habits. dbt is our primary data transformation tool and the Bigeye dbt Cloud integration helps us conduct root cause analysis to rapidly evaluate if a dbt job is contributing to a data anomaly. It helps our investigation process when we review alerts and anomalies to determine if the cause of the anomaly is a real change, a dbt model or job change, or another data source issue that needs to be fixed by the data team.

Scott Ziolko, Head of Data Science and Analytics, Philo

Key Benefits

Easier root cause analysis

The new integration brings context about dbt Cloud jobs directly into Bigeye to help you simplify root cause analysis and speed issue resolution. Now, when an issue occurs, analytics and engineering teams can see the most recent dbt job that updated a data asset and its status. Depending on the status of the dbt job, users can choose to perform deeper analysis directly in dbt or explore upstream pipeline dependencies in Bigeye’s lineage graph.
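
As a rough illustration of the lookup described above, the sketch below picks the most recent job run that touched a table and suggests where to start investigating. The `JobRun` record, its field names, the status strings, and the mapping from status to next step are all assumptions made for this example; they are not the Bigeye or dbt Cloud API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of one dbt Cloud job run that wrote to a table."""
    job_name: str
    table: str
    finished_at: datetime
    status: str  # assumed values: "success", "error", "cancelled"


def latest_run_for(table: str, runs: list[JobRun]) -> Optional[JobRun]:
    """Return the most recent run that updated `table`, or None if none did."""
    candidates = [r for r in runs if r.table == table]
    return max(candidates, key=lambda r: r.finished_at, default=None)


def suggest_next_step(run: Optional[JobRun]) -> str:
    """Map the latest run's status to a starting point (assumed mapping)."""
    if run is None:
        return "no dbt job found: explore upstream lineage"
    if run.status in {"error", "cancelled"}:
        return f"inspect job '{run.job_name}' in dbt Cloud"
    return "job succeeded: explore upstream lineage"


# Made-up sample data for illustration only.
runs = [
    JobRun("nightly_orders", "analytics.orders", datetime(2023, 9, 12, 2, 0), "success"),
    JobRun("hourly_orders", "analytics.orders", datetime(2023, 9, 12, 9, 0), "error"),
    JobRun("nightly_users", "analytics.users", datetime(2023, 9, 12, 3, 0), "success"),
]

latest = latest_run_for("analytics.orders", runs)
print(suggest_next_step(latest))
```

Here the later `hourly_orders` run is selected over the earlier nightly run, so the suggestion points at that job rather than at the lineage graph.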

Context on how dbt Cloud is connected to tables

Bigeye shows helpful metadata from dbt Cloud directly in the Bigeye catalog view. Now, when users view a table or schema, they can see whether any dbt Cloud jobs are updating it, how recently those jobs ran, and their status. Users can also click on the job to be taken directly to dbt Cloud for additional exploration or updates.

Investigate dbt Cloud jobs directly within Bigeye issues

Each time Bigeye detects a data anomaly or problem, an issue is created for resolution tracking and triage. While some tools simply provide a list of dbt jobs and leave it to the user to connect the dots, Bigeye presents the most recent dbt Cloud jobs that created or changed a data asset, allowing for faster triage and resolution.

Types of data issues Bigeye detects

Symptoms of data pipeline and modeling issues can vary widely, from data not arriving when expected, to fluctuations in the volume of data delivered, unintentional schema changes, incorrect data values, and many more. Any one of these issues, if not detected and resolved quickly, can have major impacts on your business operations and your customers.

Bigeye uses ML-driven anomaly detection to help teams quickly identify and resolve issues like these and many more. With Bigeye’s monitoring, root cause analysis, and lineage capabilities, data teams have a single place to track issues across distributed data pipelines and quickly identify and fix them, regardless of whether they originate from dbt or some other part of the pipeline.
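
To make the idea of a volume anomaly concrete, here is a deliberately simple statistical stand-in: a z-score check on daily row counts. This is not Bigeye's detection method, which the post describes only as ML-driven; the function name, threshold, and sample numbers are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_volume_anomaly(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag `today` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly flat, so any deviation counts as anomalous.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Made-up row counts for the previous seven days.
daily_row_counts = [10_120, 9_980, 10_050, 10_200, 9_900, 10_075, 10_010]

print(is_volume_anomaly(daily_row_counts, 10_090))  # a typical day
print(is_volume_anomaly(daily_row_counts, 4_300))   # a sharp drop in volume
```

A fixed threshold like this ignores seasonality and trends, which is one reason production systems tend to use learned models instead.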

Some of the common issues customers detect and resolve with Bigeye include:  

  • Failed job runs: Occasionally, a change in dbt may cause a dbt Cloud job to break or not run at all. In this case, Bigeye will both detect that the data is not updated when expected and show that the dbt job has failed or been canceled. This allows the data team to notify the impacted teams and take quick action to mitigate.
  • User error: dbt is becoming increasingly adopted by a wide range of users. Even teams that leverage best practices and CI/CD processes can sometimes introduce errors that slip past dbt tests. Bigeye monitors anomalies across all your tables, columns, and schemas automatically, creating a built-in safety net for catching any user-introduced issues. Read more about how our very own Bigeye team accidentally broke a dbt transform and caught it with Bigeye here.
  • Upstream issues: Failures can occur at any point in the pipeline, including in upstream tools such as Stitch or Fivetran. Because Bigeye is monitoring across your environment, you have a one-stop shop to find and identify issues at any stage of the pipeline, upstream or down.
  • Unintentional data changes: Another common scenario is when data is incorrectly changed in an upstream tool or system and passed through to dbt. Common examples include a schema change, a table name change, or an unintentional change to a primary key. Bigeye allows you to use referential integrity checks to ensure data meets your business requirements and alerts you to any issues before they make it downstream.
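
For readers unfamiliar with referential integrity checks, the sketch below shows the general technique in plain SQL, run through Python's built-in `sqlite3` module: find rows in a child table whose key has no match in the parent table. The `customers` and `orders` tables and their contents are invented for this example, and the query illustrates the concept rather than how Bigeye implements its checks.

```python
import sqlite3

# In-memory database with made-up sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 3);
    """
)


def orphaned_orders(connection: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (order_id, customer_id) pairs whose customer_id has no matching customer."""
    return connection.execute(
        """
        SELECT o.order_id, o.customer_id
        FROM orders AS o
        LEFT JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE c.customer_id IS NULL
        ORDER BY o.order_id
        """
    ).fetchall()


orphans = orphaned_orders(conn)
print(orphans)  # [(102, 3)]
```

Order 102 references customer 3, which does not exist, so it is the only row reported. A check like this would surface a primary key that changed upstream before the broken references reach downstream consumers.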

Bigeye dbt Cloud integration features

Fast, easy connection within the Bigeye user interface

Connecting Bigeye and dbt Cloud is as simple as providing your dbt credentials in the Bigeye connection wizard. No CLI or messy scripts required. Once connected, you’ll see information about all of your connected ETL tools in the Bigeye catalog.

View dbt Cloud jobs and status in the Bigeye catalog

By connecting Bigeye and your dbt Cloud account, you will be able to see the most relevant dbt Cloud jobs used to update tables, schemas, and columns from the Bigeye catalog. You can see helpful information such as which dbt Cloud job last created or updated a schema or table, when the job last ran, and whether or not it was successful. If you’d like to learn more, you can simply click the link in Bigeye to go directly to the corresponding job in dbt Cloud.

Issues integration with dbt Cloud for root cause analysis

Bigeye creates a new issue each time an anomaly or error is detected in your tables. With the dbt Cloud integration enabled, any dbt Cloud jobs associated with the table are automatically included in the issue.

If Bigeye has detected an anomaly in data values or the data seems suspect, you can now review the dbt Cloud job to either identify it as a root cause or quickly rule it out.

Users can also see when a job has failed, which gives immediate insight into pipeline health and points to a potential root cause of data quality issues.
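
The failed-job scenario can be thought of as combining two signals: a freshness check on the table and the status of the job that feeds it. The sketch below is one minimal way to express that combination. The function name, the status strings, and the returned messages are assumptions for illustration and do not reflect Bigeye's internal logic.

```python
from datetime import datetime, timedelta


def explain_stale_table(
    last_updated: datetime,
    now: datetime,
    expected_interval: timedelta,
    last_job_status: str,  # assumed values: "success", "error", "cancelled"
) -> str:
    """Combine a freshness check with the latest job status into a triage hint."""
    if now - last_updated <= expected_interval:
        return "fresh"
    if last_job_status in {"error", "cancelled"}:
        return "stale: latest dbt job did not complete"
    return "stale: dbt job succeeded, check upstream sources"


now = datetime(2023, 9, 13, 12, 0)

# Table expected to refresh hourly, last updated five hours ago after a failed job.
print(explain_stale_table(now - timedelta(hours=5), now, timedelta(hours=1), "error"))

# Same staleness, but the job reported success, so the cause is likely elsewhere.
print(explain_stale_table(now - timedelta(hours=5), now, timedelta(hours=1), "success"))
```

Separating the two signals this way lets a reviewer either confirm the dbt job as the cause or rule it out and look further upstream.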

With dbt Cloud and Bigeye, data teams can deliver better data to their internal and external customers, with confidence that it’s reliable and ready for business. This is just the beginning of our partnership and we’re committed to continuing to grow our combined offerings and deliver ongoing value to all of our joint users.

