November 17, 2022

The five most common questions we get about anomaly detection

Users often have questions about anomaly detection. We sat down with our Sales team to discuss the anomaly detection questions they hear most often. In this blog post, we answer five of the most common.

Kyle Kirwan

At Bigeye, anomaly detection is a critical part of our product. But what exactly does it mean, and what does it entail?

Anomaly detection helps us monitor intelligently at scale. Rather than setting manual thresholds for thousands of metrics and hundreds of tables, we automatically identify anomalies in the data pipeline. Customers aren't inundated with false positive alerts. Anomaly detection algorithms are sophisticated enough to adapt to business trend changes and respond to feedback.

Perhaps because its intricacies can be abstract, users often have questions about anomaly detection. We sat down with our Sales team to discuss the anomaly detection questions they hear most often. In this blog post, we answer five of the most common.

1. What is anomaly detection?

Anomaly detection is the process of detecting data points, events, or other information that fall outside of a dataset’s normal behavior. Anomaly detection helps companies flag areas that might have issues in their data pipelines.

A data observability system can use automation to learn historical patterns from each data quality attribute. When abnormal behavior occurs, the system fires off alerts indicating that there's an issue. If sophisticated enough, a data observability system will be able to ignore behavior that is slightly off, but not indicative of a real problem. With hundreds or thousands of data quality attributes being tracked simultaneously, that's no easy task.

Naive techniques like Gaussian models, which simply flag values a set number of standard deviations above or below the historical mean, fall apart on many commonly occurring time series patterns. A good anomaly detection model is more dynamic. It adapts to various patterns that regularly occur in metadata attributes over time.
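
To make the weakness of the naive approach concrete, here is a minimal sketch of a standard-deviation check applied to a metric with a weekly pattern. The row counts, the function name `zscore_anomaly`, and the cutoff `k=3.0` are all illustrative assumptions, not Bigeye's model.

```python
from statistics import mean, stdev


def zscore_anomaly(history, value, k=3.0):
    """Flag value if it lies more than k standard deviations from the mean of history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > k


# Hypothetical daily row counts, Monday through Sunday: busy weekdays, quiet weekends.
week = [1000, 1010, 990, 1005, 995, 200, 210]
# Four weeks of history with a slight upward drift of 2 rows per week.
history = [v + 2 * i for i in range(4) for v in week]

# 1000 rows on a Saturday is about five times the usual weekend volume,
# yet the global check sees it as well within range.
global_flag = zscore_anomaly(history, 1000)

# Comparing only against earlier Saturdays (every 7th point, starting at index 5)
# catches the same value.
seasonal_flag = zscore_anomaly(history[5::7], 1000)

print(global_flag, seasonal_flag)  # prints: False True
```

The weekday/weekend swing inflates the global standard deviation so much that a badly wrong weekend value hides inside it. Accounting for the weekly pattern, here by the crude trick of comparing like days with like, is one small example of what "more dynamic" can mean.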

For a deeper dive into anomaly detection, read here.

2. What are the alternatives to anomaly detection?

The main alternative to anomaly detection for data observability is a simple, manual threshold. For example, suppose a metric tracks the percentage of successful Airflow jobs. If we set a manual threshold of 90%, we'll get an alert if more than 10% of Airflow jobs failed over a certain period of time.
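
The Airflow example can be sketched in a few lines. This is only an illustration: the function names and the list of job states are hypothetical, and in practice the states would come from your own Airflow metadata.

```python
def success_rate(job_states):
    """Return the percentage of jobs whose state is "success"."""
    if not job_states:
        raise ValueError("no jobs in the selected period")
    successes = sum(1 for state in job_states if state == "success")
    return 100.0 * successes / len(job_states)


def breaches_threshold(job_states, min_success_pct=90.0):
    """Alert when the success rate drops below the manually chosen threshold."""
    return success_rate(job_states) < min_success_pct


# Hypothetical job states for one period: 8 successes and 2 failures (80% success).
states = ["success"] * 8 + ["failed"] * 2
print(breaches_threshold(states))
```

The threshold itself (`min_success_pct`) is the part someone has to pick and maintain by hand for every metric.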

Simple, manual thresholds work for many scenarios. In particular, they work well in systems observability, where metrics usually remain constant over time. However, they also have a number of drawbacks:

  • They require manual setting and tuning, which is fine when you have one data metric but not when you have 1,000.
  • They are inaccurate for metrics that are not constant.  

3. How do I know I won't get too many alerts?

Since every business and data environment is unique, anomaly detection will inevitably produce some false positives. However, you can adjust the data observability system to suit your alert sensitivities.

For example, you might only want to get alerts for extreme changes to data batches, rather than small fluctuations. Or, a data team might want to understand specific fluctuations in features within the machine learning feature store, so that downstream automation runs with the most consistent inputs. You can adjust your data observability settings to suit your appetite for alerts.

We recommend that instead of monitoring all your tables at the same level, you follow a T-shaped monitoring strategy. This means that you track basic metadata metrics across all tables, but only go deep on your core tables.
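
As a rough sketch of what a T-shaped plan could look like, the snippet below assigns a small set of basic metrics to every table and a deeper set only to core tables. The metric names, table names, and the `plan_monitoring` helper are hypothetical and do not reflect Bigeye's configuration format.

```python
# Hypothetical metric names for illustration only.
BASIC_METRICS = ["freshness", "row_count"]
DEEP_METRICS = ["null_rate", "duplicate_rate", "distribution_shift"]


def plan_monitoring(all_tables, core_tables):
    """Map each table to its metrics: basic everywhere, deep on core tables only."""
    core = set(core_tables)
    return {
        table: BASIC_METRICS + (DEEP_METRICS if table in core else [])
        for table in all_tables
    }


plan = plan_monitoring(
    all_tables=["orders", "customers", "tmp_export"],
    core_tables=["orders"],
)
for table, metrics in plan.items():
    print(table, metrics)
```

The wide bar of the "T" is the basic coverage across all tables; the vertical bar is the deeper set of checks on the few tables that matter most.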

4. How much control over the anomaly detection engine do I have?

With Bigeye, you can control the width parameter on the Autothresholds. You can also give direct feedback to the anomaly detection system. This data is used to improve the model.

For example, let's say a data issue notification is fired, but the user deems that the data batch in question should pass. The user can inform Bigeye that the underlying data state is tolerable, or that the alert is a false positive. Bigeye will take this information into account so that similar behavior in the future will not trigger an alert. Additionally, in the case of good alerts, you can tell Bigeye to remove the data associated with confirmed anomalies and to make sure that it’s not propagated into future predictions.
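
The toy model below illustrates the two controls in general terms: a width parameter that scales the alert band, and feedback that decides whether a point joins the history used for future thresholds. It is a deliberately simple sketch under stated assumptions (a mean and standard deviation band, a hypothetical `ThresholdSketch` class) and is not how Bigeye's Autothresholds are implemented.

```python
from statistics import mean, stdev


class ThresholdSketch:
    """Toy band model: history mean +/- width * standard deviation."""

    def __init__(self, width=3.0):
        self.width = width  # larger width means a wider band and fewer alerts
        self.history = []

    def is_anomalous(self, value):
        mu, sigma = mean(self.history), stdev(self.history)
        lower, upper = mu - self.width * sigma, mu + self.width * sigma
        return not (lower <= value <= upper)

    def record(self, value, confirmed_anomaly=False):
        # Confirmed anomalies are kept out of the history so they do not
        # influence future bands; everything else is learned from.
        if not confirmed_anomaly:
            self.history.append(value)


model = ThresholdSketch(width=3.0)
for v in [100, 102, 98, 101, 99]:
    model.record(v)

print(model.is_anomalous(106))             # True: outside the initial band
model.record(106)                          # user feedback: this value is tolerable
print(model.is_anomalous(106))             # False: similar values no longer alert
model.record(500, confirmed_anomaly=True)  # a real issue: excluded from history
```

Widening the band (for example `ThresholdSketch(width=5.0)` on the same history) would also have let 106 pass without any feedback, which is the trade-off the width control exposes.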

5. How long does it take for the anomaly detection to start working?

We recommend at least five data points (for example, five days of daily data) before relying on anomaly detection, and the data accumulates over time. For tables that already contain past data, Bigeye has a backfill capability.
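
A warm-up check along these lines captures the idea. The helper name and the split between backfilled and newly collected points are assumptions made for illustration.

```python
MIN_POINTS = 5  # minimum from the text: five points, e.g. five days of daily data


def has_enough_history(backfilled, collected, min_points=MIN_POINTS):
    """Return True once backfilled plus newly collected points reach the minimum."""
    return len(backfilled) + len(collected) >= min_points


# A new table with three days of data is still warming up.
print(has_enough_history(backfilled=[], collected=[120, 118, 125]))
# A table with four backfilled points is ready after one new observation.
print(has_enough_history(backfilled=[110, 115, 112, 119], collected=[121]))
```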
