March 9, 2022

Introducing cross-table monitoring with Virtual Tables

Bigeye is unique among data observability tools because we don't just monitor broadly across your tables, but also monitor deeply into your most critical datasets.

Kate Wendell

Bigeye is unique among data observability tools in its ability to not only monitor widely across your tables but also monitor deeply into your most critical datasets. It’s nice to know that every table in your warehouse is getting refreshed on time, but it’s critical to know that you’re not missing any outages on the core datasets that drive your most important dashboards, in-product analytics, and machine learning models.

Now, with the release of Virtual Tables, we’re taking it even further. Teams frequently ask how Bigeye can help them monitor complex logic that spans multiple tables, and until now, that was a challenge. Not anymore!

The power of Virtual Tables

Bigeye has always made it easy to monitor any given table within your data source. Advanced options like customizable Metric Templates support complex cross-column logic.

Now it’s just as easy to monitor for conditions that involve more than one table. One common example is checking foreign keys. Let’s say some critical information about a user is in the dim_user table, and some additional important information is in the fact_orders table. With Virtual Tables, you can join these tables and check for business rules that could cause outages when violated.
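As a rough illustration of the foreign key example, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in warehouse. Only the table names dim_user and fact_orders come from this post; the column names and sample rows are assumptions, and the SELECT is simply the kind of statement one might save as a Virtual Table, not Bigeye's own syntax.

```python
import sqlite3

# Hypothetical schema: only the table names come from the article;
# every column name and row below is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO dim_user VALUES (1, 'US'), (2, 'DE');
    INSERT INTO fact_orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.0);
""")

# One row per order, with a flag marking orders whose user_id has no
# matching row in dim_user (a broken foreign key).
ORPHAN_CHECK_SQL = """
    SELECT o.order_id,
           o.user_id,
           CASE WHEN u.user_id IS NULL THEN 1 ELSE 0 END AS is_orphaned
    FROM fact_orders AS o
    LEFT JOIN dim_user AS u ON u.user_id = o.user_id
    ORDER BY o.order_id
"""

rows = conn.execute(ORPHAN_CHECK_SQL).fetchall()
orphaned_order_ids = [order_id for order_id, _, is_orphaned in rows if is_orphaned]
print(orphaned_order_ids)  # prints [12]
```

Because the query keeps one row per order instead of returning a single pass/fail value, a metric on the is_orphaned column can track how many orders are affected over time.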

Here are some monitoring goals data teams have asked us about that are now possible with Virtual Tables:

  • Ensuring uniqueness of composite keys: Create a virtual table that concatenates two or more attributes used as a composite key. Enable a duplicate metric to monitor duplicates in the table.
  • Measuring dimension aggregations: Write a join statement between a fact and dimension table to group rows by a dimension attribute, then monitor for anomalies in that grouping. For example, create a virtual table that aggregates sales by product each week and monitor for significant changes in the ratio between product categories or average revenue by product.
  • Validating type 2 slowly changing dimension tables: Ensure accurate timestamps for slowly changing dimension tables by using a Cartesian join of the table to itself, then add monitoring to validate active time periods.
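Two of these goals can be sketched with plain SQL. The example below again uses Python's sqlite3 module as a stand-in warehouse; the dim_customer_history table, its columns, and its rows are all hypothetical. The first query builds a concatenated composite key, and a simple Python count stands in for a duplicate metric. The second is a self-join restricted to the same customer (a narrower variant of the full Cartesian join mentioned above) that surfaces overlapping active periods, assuming half-open [valid_from, valid_to) intervals stored as ISO date strings.

```python
import sqlite3
from collections import Counter

# Hypothetical type 2 dimension table; names and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer_history (
        customer_id INTEGER, plan TEXT, valid_from TEXT, valid_to TEXT
    );
    INSERT INTO dim_customer_history VALUES
        (1, 'free', '2022-01-01', '2022-02-01'),
        (1, 'pro',  '2022-02-01', '9999-12-31'),
        (2, 'free', '2022-01-01', '2022-03-01'),
        (2, 'pro',  '2022-02-15', '9999-12-31'),
        (3, 'free', '2022-01-01', '9999-12-31'),
        (3, 'free', '2022-01-01', '9999-12-31');
""")

# Composite key: concatenate the attributes that should be unique together.
COMPOSITE_KEY_SQL = """
    SELECT customer_id || '|' || valid_from AS composite_key
    FROM dim_customer_history
"""
keys = [row[0] for row in conn.execute(COMPOSITE_KEY_SQL)]
# Stand-in for a duplicate metric: keys that appear more than once.
duplicate_keys = sorted(k for k, n in Counter(keys).items() if n > 1)

# SCD2 check: pair each row with later rows for the same customer and keep
# pairs whose active periods overlap (half-open intervals).
SCD2_OVERLAP_SQL = """
    SELECT a.customer_id, a.valid_from, a.valid_to, b.valid_from, b.valid_to
    FROM dim_customer_history AS a
    JOIN dim_customer_history AS b
      ON a.customer_id = b.customer_id AND a.rowid < b.rowid
    WHERE a.valid_from < b.valid_to AND b.valid_from < a.valid_to
"""
overlapping_customers = sorted({row[0] for row in conn.execute(SCD2_OVERLAP_SQL)})

print(duplicate_keys)         # prints ['3|2022-01-01']
print(overlapping_customers)  # prints [2, 3]
```

Customer 1 passes both checks because the second period starts exactly where the first one ends; customers 2 and 3 are flagged for overlapping periods, and customer 3 also has a duplicated key.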

How it works

With Virtual Tables you can encapsulate all your custom logic into something akin to a database view. Write a SQL statement, save it, and Bigeye will persist the results as if it were a normal database table in your Bigeye catalog.

This “view-like” object only exists in Bigeye. Because it’s never materialized to your database, you don’t need write permissions to create it, so anyone can set up monitoring for complex conditions without waiting on — or creating more work for — the data engineering team.

Once the Virtual Table is created, it appears in your Bigeye catalog alongside all your materialized data, and you’re ready to go. Autometrics, Autothresholds, Grouped Metrics, custom Metric Templates, and every other feature in Bigeye work out of the box on your Virtual Table exactly as if it were a materialized table.

Much more scalable, much less work

But why design Virtual Tables instead of just asking users to write custom SQL for each business rule? Scalability.

Virtual Tables allow you to define as much business logic as you need while keeping it all in a single object, separate from your monitoring configuration. Business logic goes in the Virtual Table, and the rest of your monitoring setup happens like usual, with Autometrics and Autothresholds eliminating the repetitive work.

With Virtual Tables, you don’t end up with an ever-growing pile of custom metrics, and you don’t have to contort your business logic into boolean conditions. Virtual Tables let you define numeric results that Bigeye’s Autothresholds anomaly detection system can monitor, giving you finer measurement and more context than simple pass-fail results.
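To make the difference between numeric and pass-fail results concrete, here is a small sketch using Python's sqlite3 module. The schema, column names, and sample rows are assumptions, and the code does not model how Autothresholds works; it only contrasts a boolean rule with a numeric daily rate computed by the same join.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY);
    CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, order_date TEXT);
    INSERT INTO dim_user VALUES (1), (2);
    INSERT INTO fact_orders VALUES
        (1, 1, '2022-03-01'), (2, 2, '2022-03-01'), (3, 1, '2022-03-01'), (4, 9, '2022-03-01'),
        (5, 2, '2022-03-02'), (6, 1, '2022-03-02'), (7, 9, '2022-03-02'), (8, 2, '2022-03-02'),
        (9, 9, '2022-03-03'), (10, 8, '2022-03-03'), (11, 1, '2022-03-03'), (12, 7, '2022-03-03');
""")

# Numeric result: the share of each day's orders with no matching user.
ORPHAN_RATE_SQL = """
    SELECT o.order_date,
           AVG(CASE WHEN u.user_id IS NULL THEN 1.0 ELSE 0.0 END) AS orphan_rate
    FROM fact_orders AS o
    LEFT JOIN dim_user AS u ON u.user_id = o.user_id
    GROUP BY o.order_date
    ORDER BY o.order_date
"""
daily_rates = conn.execute(ORPHAN_RATE_SQL).fetchall()

# Boolean rule for comparison: "no order may reference a missing user".
daily_pass = [(day, rate == 0) for day, rate in daily_rates]

print(daily_rates)  # prints [('2022-03-01', 0.25), ('2022-03-02', 0.25), ('2022-03-03', 0.75)]
print(daily_pass)   # prints [('2022-03-01', False), ('2022-03-02', False), ('2022-03-03', False)]
```

The boolean rule fails identically on all three days, while the numeric rate shows that the third day is markedly worse than the first two, which is the kind of context an anomaly detector can act on.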

And because Virtual Tables are designed to work as if they were materialized, they can even be combined with other Bigeye features like Deltas. Want to compare some business logic about user data in Postgres, and confirm that the same logic holds up once the data has landed in Snowflake? Write a Virtual Table to capture the business logic on each, then run a Delta on the pair of Virtual Tables. This saves you from having to write a long list of one-off checks to cover each condition.
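The idea of capturing the same business logic on both sides and then comparing the results can be sketched as follows. This is not Bigeye's Deltas implementation: two in-memory SQLite databases stand in for Postgres and Snowflake, and the users table, its columns, and the helper functions make_db and summarize are hypothetical.

```python
import sqlite3

# The shared business logic: active users per country.
SUMMARY_SQL = """
    SELECT country, COUNT(*) AS active_users
    FROM users
    WHERE is_active = 1
    GROUP BY country
"""

def make_db(rows):
    """Create an in-memory database holding a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, country TEXT, is_active INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    return conn

def summarize(conn):
    """Run the shared business logic and return {country: active_users}."""
    return dict(conn.execute(SUMMARY_SQL).fetchall())

# "source" stands in for Postgres, "target" for Snowflake.
source = make_db([(1, "US", 1), (2, "US", 1), (3, "DE", 1), (4, "DE", 0)])
target = make_db([(1, "US", 1), (2, "US", 1), (4, "DE", 0)])  # user 3 was lost in transit

source_summary = summarize(source)
target_summary = summarize(target)

# Report every country whose count differs between the two sides.
mismatches = {
    country: (source_summary.get(country, 0), target_summary.get(country, 0))
    for country in sorted(set(source_summary) | set(target_summary))
    if source_summary.get(country, 0) != target_summary.get(country, 0)
}
print(mismatches)  # prints {'DE': (1, 0)}
```

Defining the logic once and running it against both systems is what replaces the long list of one-off checks: only the comparison step needs to know there are two sides.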

How to get started with Virtual Tables

If you’d like to learn more, check out our documentation on Virtual Tables. Or if you’re ready to see how Bigeye can help your team address data quality and create more reliable data pipelines, we’d love to give you a demo.

