ELT vs. ETL: What's the difference?
ELT, ETL - what's the difference, and where is it important to integrate data observability? We explore.
Many of Bigeye’s customers are cutting-edge tech companies that build with the modern data stack. The modern data stack uses data warehouses: not only for data storage, but also for transformation.
Warehouse storage makes it possible for analysts to write transformations in SQL, but it also means that data arrives in the warehouse without having undergone a cleanup. In this post, we'll explain the difference between ETL and ELT, and show why data observability is even more important when you are using ELT.
ETL vs. ELT: What's the difference?
In the world of data warehousing, ETL and ELT are two common approaches for managing data pipelines. ETL (extract, transform, load) pulls data from various sources, standardizes its format, then loads it into a target warehouse. On the other hand, ELT (extract, load, transform) flips the order of operations: raw data loads into the target warehouse first, and then is transformed as needed.
Bigeye monitoring with ELT and ETL setups
Data observability solutions like Bigeye can work with both ETL and ELT setups. Data reliability engineers will routinely turn to Bigeye in both situations, to monitor for data errors and anomalies.
With ETL setups, the transformation step happens first, so Bigeye will monitor tables (post-transformation) as they land in the data warehouse. Use Bigeye’s Metadata Metrics to track things like:
- Whether the data has landed
- Volume of data that has landed
- Freshness of data that has landed
Alternatively, Bigeye might monitor an upstream OLTP database like Postgres or MySQL, before the transformation step.
ELT setups, on the other hand, with transformations occurring in the data warehouse, use Bigeye to monitor every step of the process.
Bigeye’s Metadata Metrics, provide out-of-the-box monitoring for freshness and volume. Putting this check in place automates monitoring for raw tables to ensure that they have actually landed.
Alternatively, Bigeye Deltas creates a Delta between the application database and the snapshotted raw table in the data warehouse. Deltas tracks two tables and checks that both are the same. That way, nothing in the “load” step of the data pipeline can go wrong. If it does, Bigeye will send an alert.
Why data observability is vital for ELT setups (and less so for ETL)
With ELT setups, data transformation happens inside the data warehouse. There isn't time to clean up duplicate records or incorrect formatting in the data before it lands in the data warehouse.
In such a scenario, a data observability solution that monitors the raw bronze layer can go a long ways toward sensing pipeline breakages before they filter downstream, and reducing warehouse costs.
With modern data warehouses like Snowflake or Redshift, not only do you pay for storage, but also for compute - i.e. you are charged for every query and every transformation that you run. By using a data observability solution to track raw tables as they land in the data warehouse, you reduce the number of transformations necessary, and thus your ELT expenditures.
Ideally, over time, data observability on your raw bronze layer of data should help inform your transformation models. If you are perennially getting low-quality with nulls, duplicates, strings lengths spanning outside what they should be, or UUIDs that aren’t what they should be, you may want to modify your transformation SQL to handle them.
Schema change detection