Product
November 3, 2021

Deltas: Taking the pain out of moving data from A to B

Cloud migrations happened 24x faster during the pandemic, and now, more data than ever is being moved into data warehouses and lakes. All of this data movement highlights a fundamental challenge: It’s really hard to tell whether the data you moved from point A landed unbroken at point B.

Bigeye Staff

The 2020 pandemic accelerated a trend that was already in full swing: the push for digital transformation. “Digital transformation” is a nebulous term; it can mean anything from accelerating the adoption of AI and ML to moving to a modern data stack. But no matter what digital transformation means to you, one thing is likely: a lot of data is moving around, fast.

Data is constantly being updated, optimized, and migrated. Cloud migrations happened 24x faster during the pandemic, and now more data than ever is being moved into data warehouses and lakes with tools like Airbyte, Fivetran, and Matillion. You’re moving data from source systems to targets and from staging to production, and you’re probably starting to reverse-ETL it back into your applications. Data is moving and changing everywhere.

All of this data movement highlights a fundamental challenge: It’s really hard to tell whether the data you moved from point A landed unbroken at point B.

Can You Spot the Difference?

TL;DR: Comparing and validating replicated data is hard, but you can’t afford to leave the reliability of your data to chance. Either you’re spending precious time manually comparing datasets, or you’re hoping for the best; neither is acceptable when data matters to your business and you need to move fast.

Human error, communication breakdowns between teams, unintended side effects, finicky pipelines: all kinds of data problems happen. When these issues creep in, data quality suffers, and big, expensive projects like migrating from on-premises to the cloud can stall for weeks or months while you chase down the discrepancies.

Whether you’re writing custom SQL queries, hacking together one-off Python scripts, or laboriously pulling data into spreadsheets, comparing datasets is time-consuming and error-prone. Manual comparison and validation add significant time and cost, leaving you with a difficult choice: delay the project significantly, or take the risk and hope the data is okay.
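To see why this gets tedious, here’s a minimal sketch of the kind of one-off script teams end up writing by hand; the database files, table name, and columns are hypothetical stand-ins for your real source and target systems:

    import sqlite3

    def column_stats(conn, table, column):
        # Row count, null count, and distinct count for one column.
        query = (
            f"SELECT COUNT(*), "
            f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
            f"COUNT(DISTINCT {column}) FROM {table}"
        )
        rows, nulls, distinct = conn.execute(query).fetchone()
        return {"rows": rows, "nulls": nulls or 0, "distinct": distinct}

    def compare_table(source_conn, target_conn, table, columns):
        # Flag every column whose basic stats differ between source and target.
        for column in columns:
            src = column_stats(source_conn, table, column)
            tgt = column_stats(target_conn, table, column)
            if src != tgt:
                print(f"DRIFT in {table}.{column}: source={src} target={tgt}")

    # Hypothetical usage: the same table before and after a migration.
    source = sqlite3.connect("source.db")
    target = sqlite3.connect("target.db")
    compare_table(source, target, "orders", ["id", "customer_id", "amount"])

And this only covers three statistics on a handful of columns; every additional check, table, or database dialect means more hand-written code to run and maintain.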

A Better Way to Compare Datasets

At Bigeye, our goal is to make working with data easy. When we began talking about automatically comparing datasets, both our existing customers and teams who were checking out Bigeye for the first time responded enthusiastically. The signs were clear: this is a real pain for many data teams.

With Deltas, we set out to solve the two biggest problems that come with comparing datasets: speed and completeness.

  • Compare datasets in a fraction of the time: Deltas automatically maps the columns between your source and target data and intelligently applies data quality metrics from Bigeye’s library of more than 50 metrics. In seconds, Deltas generates reports that show exactly which metrics drifted across which dimensions, so you can pinpoint root causes and resolve issues quickly (see the sketch after this list).
  • Provide 10x the validation: Deltas applies Bigeye’s industry-leading observability to both datasets, regardless of the SQL dialect. We cover every column in your table — ensuring that nothing slips through the cracks.
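To make the idea concrete, here is a conceptual sketch of metric-based comparison (this is illustrative only, not Bigeye’s implementation): compute the same suite of metrics on every column of both datasets, then flag anything that drifted.

    from statistics import fmean

    def null_fraction(values):
        # Share of missing values in the column.
        return sum(v is None for v in values) / len(values) if values else 0.0

    def numeric_mean(values):
        # Mean of the numeric values, or None if the column has none.
        nums = [v for v in values if isinstance(v, (int, float))]
        return fmean(nums) if nums else None

    METRICS = {"count": len, "null_fraction": null_fraction, "mean": numeric_mean}

    def delta_report(source, target):
        # Naively "map" columns by name, apply every metric to both sides,
        # and collect each metric that drifted.
        drifted = []
        for column in sorted(source.keys() & target.keys()):
            for name, metric in METRICS.items():
                a, b = metric(source[column]), metric(target[column])
                if a != b:
                    drifted.append((column, name, a, b))
        return drifted

    # Toy example: one value went missing in transit.
    source = {"id": [1, 2, 3], "amount": [9.99, 5.00, 12.50]}
    target = {"id": [1, 2, 3], "amount": [9.99, None, 12.50]}
    for column, metric, a, b in delta_report(source, target):
        print(f"{column}.{metric}: source={a} target={b}")

The real product does the heavy lifting this toy version skips: automatic column mapping, SQL dialect differences, and a library of 50+ metrics rather than three.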

Bigeye was designed to be an extensible framework, which allows us to apply data observability to all kinds of exciting use cases. Deltas is a fast, robust tool that will make working with your data significantly easier. We can’t wait to show you more!

If you’d like to learn more about Deltas, check out the product page. Or, if you’re ready to see how Bigeye can help your team address data quality and create more reliable data pipelines, we’d love to give you a demo.

