Product
March 20, 2023

ELT vs. ETL: What's the difference?

ELT, ETL - what's the difference, and where is it important to integrate data observability? We explore.

Zach Behrman

Many of Bigeye’s customers are cutting-edge tech companies that build with the modern data stack. The modern data stack uses the data warehouse not only for data storage, but also for transformation.

Transforming in the warehouse makes it possible for analysts to write transformations in SQL, but it also means that data arrives in the warehouse without having been cleaned up first. In this post, we'll explain the difference between ETL and ELT, and show why data observability is even more important when you're using ELT.

ETL vs. ELT: What's the difference?

In the world of data warehousing, ETL and ELT are two common approaches for managing data pipelines. ETL (extract, transform, load) pulls data from various sources, standardizes its format, then loads it into a target warehouse. On the other hand, ELT (extract, load, transform) flips the order of operations: raw data loads into the target warehouse first, and then is transformed as needed.
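
To make the ordering concrete, here is a minimal sketch in Python. The function names and bodies are illustrative stand-ins, not any particular tool's API.

```python
# Minimal sketch of the two orderings. Function names and bodies are
# illustrative stand-ins, not any particular tool's API.

def extract(source):
    """Pull raw records from a source system (API, OLTP database, files)."""
    return [{"id": 1, "email": " USER@EXAMPLE.COM "}]

def transform(records):
    """Standardize formats, e.g. trim and lowercase email addresses."""
    return [{**r, "email": r["email"].strip().lower()} for r in records]

def load(records, table):
    """Write records to the target warehouse table (stubbed as a print)."""
    print(f"loading {len(records)} rows into {table}")

# ETL: transform in flight, so only cleaned data reaches the warehouse.
load(transform(extract("orders_api")), "analytics.orders")

# ELT: land the raw data first; the transform then runs inside the
# warehouse, typically as SQL (e.g. via a tool like dbt).
load(extract("orders_api"), "raw.orders")
```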

Bigeye monitoring with ELT and ETL setups

Data observability solutions like Bigeye can work with both ETL and ELT setups. Data reliability engineers routinely turn to Bigeye in both situations to monitor for data errors and anomalies.

With ETL setups, the transformation step happens first, so Bigeye monitors tables post-transformation, as they land in the data warehouse. Use Bigeye’s Metadata Metrics to track things like the following (a rough sketch of what these checks verify appears after the list):

  • Whether the data has landed
  • Volume of data that has landed
  • Freshness of data that has landed
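
For illustration, here is roughly what those three checks verify, written as a hand-rolled sketch rather than Bigeye’s actual API. Here, conn is assumed to be any DB-API connection to the warehouse, and the table, column, and thresholds are hypothetical.

```python
# Hand-rolled sketch of landed/volume/freshness checks. This is not Bigeye's
# API: conn is assumed to be a DB-API connection to the warehouse (the %s
# placeholder assumes a psycopg2-style driver), and the table, column, and
# thresholds are hypothetical.
from datetime import datetime, timedelta, timezone

def check_landed_table(conn, table="raw.orders", loaded_at_col="loaded_at",
                       max_staleness=timedelta(hours=6), min_rows=1000):
    cur = conn.cursor()

    # Freshness: how recently did the newest row load?
    cur.execute(f"SELECT MAX({loaded_at_col}) FROM {table}")
    latest = cur.fetchone()[0]
    fresh = latest is not None and (datetime.now(timezone.utc) - latest) <= max_staleness

    # Landed + volume: row count within the staleness window.
    cutoff = datetime.now(timezone.utc) - max_staleness
    cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {loaded_at_col} >= %s", (cutoff,))
    recent_rows = cur.fetchone()[0]

    return {"landed": recent_rows > 0,
            "freshness_ok": fresh,
            "volume_ok": recent_rows >= min_rows}
```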

Alternatively, Bigeye might monitor an upstream OLTP database like Postgres or MySQL, before the transformation step.

With ELT setups, on the other hand, transformations occur inside the data warehouse, so teams use Bigeye to monitor every step of the process.

Bigeye’s Metadata Metrics provide out-of-the-box monitoring for freshness and volume. Putting these checks in place automates monitoring for raw tables, ensuring that they have actually landed.

Alternatively, Bigeye Deltas can compare the application database against the snapshotted raw table in the data warehouse. A Delta tracks the two tables and checks that they match, so if anything goes wrong in the “load” step of the pipeline, Bigeye will send an alert.
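
As a simplified illustration of the idea (not Bigeye’s implementation), a delta-style check might compare row counts and a cheap fingerprint of the key column on both sides; the connections, table, and key column below are hypothetical.

```python
# Simplified illustration of a delta-style check: compare the source table
# and its warehouse replica, and alert on any mismatch. Not Bigeye's
# implementation; connections, table, and key column are hypothetical.

def tables_match(source_conn, warehouse_conn, table="orders", key="id"):
    results = {}
    for side, conn in (("source", source_conn), ("warehouse", warehouse_conn)):
        cur = conn.cursor()
        # Row count plus a cheap, order-independent fingerprint of the keys.
        cur.execute(f"SELECT COUNT(*), COALESCE(SUM({key}), 0) FROM {table}")
        results[side] = cur.fetchone()
    return results["source"] == results["warehouse"]

# Usage (with your own connections and alerting hook):
# if not tables_match(src, wh):
#     send_alert("load step mismatch between source and warehouse")
```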

Why data observability is vital for ELT setups (and less so for ETL)

With ELT setups, data transformation happens inside the data warehouse, so there is no opportunity to clean up duplicate records or incorrect formatting before the data lands in the warehouse.

In such a scenario, a data observability solution that monitors the raw “bronze” layer can go a long way toward catching pipeline breakages before they filter downstream, and toward reducing warehouse costs.

With modern data warehouses like Snowflake or Redshift, you pay not only for storage but also for compute: you are charged for every query and every transformation you run. By using a data observability solution to catch bad raw tables as they land in the data warehouse, you avoid re-running transformations on broken data, reducing the number of transformation runs and thus your ELT expenditures.
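
As a back-of-the-envelope illustration with entirely hypothetical numbers: every bad load that slips through forces the downstream models built on it to be re-run.

```python
# Hypothetical numbers, for illustration only.
models_downstream = 40       # transformation models built on the raw table
cost_per_model_run = 0.75    # warehouse compute cost ($) per model run
bad_loads_per_month = 4      # load failures that would go uncaught

wasted = models_downstream * cost_per_model_run * bad_loads_per_month
print(f"compute wasted re-running transformations: ${wasted:.2f}/month")
# -> compute wasted re-running transformations: $120.00/month
```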

Ideally, over time, data observability on your raw bronze layer should help inform your transformation models. If you are perennially getting low-quality data with nulls, duplicates, string lengths outside expected bounds, or malformed UUIDs, you may want to modify your transformation SQL to handle them.
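
For example, a quick profile of a raw table can surface exactly those issues. The sketch below uses hypothetical column names and thresholds, with plain Python standing in for whatever profiling your observability tool runs.

```python
# Profile a raw ("bronze") table for nulls, duplicates, overlong strings,
# and malformed UUIDs. Column names and thresholds are hypothetical.
import re

UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")

def profile_rows(rows, id_col="user_id", str_col="email", max_len=254):
    total = len(rows)
    ids = [r[id_col] for r in rows]
    return {
        "null_rate": sum(r[str_col] is None for r in rows) / total,
        "duplicate_ids": total - len(set(ids)),
        "overlong_strings": sum(r[str_col] is not None and len(r[str_col]) > max_len
                                for r in rows),
        "bad_uuids": sum(not UUID_RE.match(str(i)) for i in ids),
    }

rows = [
    {"user_id": "123e4567-e89b-12d3-a456-426614174000", "email": "a@example.com"},
    {"user_id": "not-a-uuid", "email": None},
]
print(profile_rows(rows))
# -> {'null_rate': 0.5, 'duplicate_ids': 0, 'overlong_strings': 0, 'bad_uuids': 1}
```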

