June 23, 2023

MLOps and data observability: What should you know?

How do MLOps and data observability interact and support each other? Here are nine things you should know about this partnership.

Liz Elfman

Best practices from data engineering are increasingly being incorporated into machine learning frameworks. After all, ML systems are similar to other consumers of data.

As ML teams set up processes, tooling, and infrastructure, they need access to healthy, reliable data pipelines. In this post, we cover nine things you should know about how data observability and MLOps interact and support your organization's entire data ecosystem.

1. Understand where observability sits in the ML lifecycle

Data observability monitors the internal workings of each table and pipeline in a data stack. Unlike traditional data quality assessment, which is based on pass/fail conditions, data observability continuously collects signals from datasets.

Similarly, when ML modeling is involved, it isn't enough to judge data at its ingestion point. Anywhere from ingestion to testing to validation, data can break, compromising output accuracy. For businesses, this can filter into bad decision-making and lost revenue. Ideally, you'll set up data observability across your entire MLOps lifecycle to safeguard against the snowballing consequences of ML models built on bad data.
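To make the contrast with pass/fail checks concrete, here is a minimal sketch of what "continuously collecting signals" can look like: a function that emits numeric health signals (row count, null rate, freshness lag) for a batch of records, to be tracked over time rather than judged once at ingestion. The record and field names are hypothetical.

```python
from datetime import datetime, timezone

def collect_signals(rows, ts_field="updated_at"):
    """Collect continuous health signals from a batch of records.

    Unlike a pass/fail check, these signals are tracked over time so
    anomalies show up as deviations from each signal's own history.
    (Illustrative sketch; the field names are hypothetical.)
    """
    now = datetime.now(timezone.utc)
    null_cells = sum(1 for r in rows for v in r.values() if v is None)
    total_cells = sum(len(r) for r in rows) or 1
    lags = [(now - r[ts_field]).total_seconds()
            for r in rows if r.get(ts_field) is not None]
    return {
        "row_count": len(rows),
        "null_rate": null_cells / total_cells,
        "freshness_lag_s": min(lags) if lags else float("inf"),
    }
```

Each signal feeds a time series; an observability system then flags batches whose signals deviate from that history, rather than applying a fixed pass/fail rule.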

2. Figure out which observability activities apply to your MLOps

Your MLOps framework is unique, so not all of the typical data observability activities apply to your organization. Data observability helps manage ML models in several ways. Which ones apply to you?

1. Input data monitoring: ML models depend heavily on input data. If there are issues in the data, such as missing values, outliers, or incorrect entries, these can significantly affect the model's performance. Data observability helps ensure the quality of the data by tracking and alerting about such anomalies.

2. Data drift signs: Over time, the statistical properties of input data to ML models may change, a phenomenon known as "data drift." This can degrade the model's performance, even if the model itself has not changed. Data observability tools can monitor the data for signs of drift, allowing teams to address the issue before it significantly impacts the model's performance.

3. Model performance monitoring: Even if an ML model was accurate at deployment, its performance can degrade over time due to changes in the data or the environment in which it operates. Data observability allows for the continuous monitoring of model performance and provides early warnings when model accuracy begins to drop.

4. Operational efficiencies: In many cases, data pipelines can break or fail silently, causing delays in data processing and ingestion. Data observability's real-time visibility into pipelines means data stays available when and where it is needed.

5. Upstream issue identification: Data observability can catch problems as far upstream as possible in your model training pipeline. Suppose, for example, that you discover 10,000 unusable images in your dataset only after retraining your model, wasting significant compute and time. Data observability prevents these situations by monitoring and alerting on the data before training begins.
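Of the activities above, data drift is the most amenable to a simple sketch. One common approach (an assumption here, not the only one) is the Population Stability Index, which compares a feature's current distribution against a baseline captured at training time:

```python
import math

def population_stability_index(baseline, current, bins=10):
    """PSI between a baseline sample and a current sample of one
    numeric feature. A common rule of thumb: < 0.1 stable, 0.1-0.25
    moderate drift, > 0.25 major drift. (Sketch; these thresholds
    are conventional, not universal.)
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0

    def hist(sample):
        counts = [0] * bins
        for x in sample:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Smooth empty bins so the log term below is always defined.
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]

    b, c = hist(baseline), hist(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
```

An observability job would compute this per feature on a schedule and alert when the score crosses the chosen threshold, addressing drift before it significantly degrades the model.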

3. Focus on a single SLI

What metrics should you monitor when you have a machine learning model in production? Numerous statistics, like KL divergences for features, or recall and precision, are frequently discussed in ML research, but they don't make sense as primary signals in an observability context.

Our suggestion is to start with a single “metric to rule them all” that is understandable to all stakeholders and directly tied to business value; for example, revenue. This is your service level indicator (SLI).

Once you’ve picked an SLI, the second question is: when that SLI changes, why is it changing? Now you can add additional metrics measuring data volume, freshness, correctness, model drift, or feature drift, to help you figure out why.

To summarize, changes in your SLI should trigger alerts, and other metrics are then used to debug what happened. Don't get overwhelmed by monitoring hundreds of irrelevant metrics across your system.

4. Don't only monitor models

ML models are not static systems. Data constantly flows in for both training and inference, and flows out via predictions, recommendations, and determinations. Outputs are labeled (by users or by an internal team), and used to improve the model.

While you may be tempted to monitor only model outputs like "accuracy" or "recall," it actually makes more sense to monitor the whole data pipeline. That way, you'll catch problems before they ever reach the model.

Some concrete examples of what this looks like in real life include:

  • Monitoring your production database and raw tables as they land in your data warehouse, prior to any transformations, for volume and freshness, to ensure that the "load" part of ELT has occurred correctly
  • Monitoring transformed tables in data warehouses to ensure that initial cleanup was performed correctly
  • Monitoring “feature” tables to ensure that feature calculations are correct
  • Logging all model outputs into the data warehouse and monitoring them to detect model errors and drift
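The bullets above amount to registering a small check at each stage of the pipeline, from raw load through feature tables to logged model outputs. A minimal sketch, with hypothetical stage names and thresholds:

```python
from datetime import datetime, timedelta, timezone

# Per-stage checks mirroring the list above: raw-load freshness and
# volume, transformed tables, feature tables, logged model outputs.
# Thresholds and stage names are illustrative assumptions.
def check_freshness(last_loaded_at, max_lag=timedelta(hours=1)):
    return datetime.now(timezone.utc) - last_loaded_at <= max_lag

def check_volume(row_count, expected, tolerance=0.2):
    return abs(row_count - expected) <= tolerance * expected

def run_pipeline_checks(stages):
    """Run every registered check and return the names that failed,
    so an issue in raw data is caught before it reaches the model."""
    return [name for name, (fn, args) in stages.items() if not fn(*args)]
```

A scheduler would call `run_pipeline_checks` after each load and transformation, so a failed "load" step surfaces as `raw.*` failures long before any model metric moves.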

5. Standardize metrics across models

If you have multiple models/pipelines in production, standardize the metrics you monitor for each one. Try not to vary your metrics across each unique pipeline. Standardized metrics enable comparison and benchmarking between models.
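One lightweight way to enforce a standard metric set is a single shared schema that every model's monitoring job must fill in. The field names below are an illustrative choice, not a prescribed standard:

```python
from dataclasses import dataclass, asdict

@dataclass
class StandardModelMetrics:
    """One fixed metric schema shared by every model in production,
    so models can be compared and benchmarked side by side.
    (Field names are illustrative assumptions.)"""
    model_name: str
    input_row_count: int
    input_null_rate: float
    feature_drift_psi: float
    prediction_volume: int
    sli_value: float

def to_dashboard_row(m: StandardModelMetrics) -> dict:
    """Flatten one model's metrics into a row for a shared dashboard."""
    return asdict(m)
```

Because every pipeline emits the same fields, a dashboard can sort all models by, say, `feature_drift_psi` instead of reconciling per-pipeline metric names.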

6. Tag and filter metrics for targeted insights

When you are monitoring a number of models, pipelines, and data sources, the quantity of data generated can be overwhelming. Tag metrics based on attributes such as model type, data source, pipeline stage, or team, so you can organize data and filter it in a way that delivers targeted insights.

For example, if you notice a dip in the performance of a particular model, filter all metrics by that model's tag. You'll quickly identify whether the problem is specific to that model, or part of a larger issue. Such a system also allows you to customize alerts based on specific tags, ensuring that the right people are notified about relevant issues.
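The filtering step described above can be sketched as a simple predicate over tagged metric records. The tag names and metric shapes here are hypothetical:

```python
def filter_metrics(metrics, **tags):
    """Return only the metric records whose tags match every given
    key/value pair, e.g. filter_metrics(all_metrics, model="churn_v2").
    (Sketch; the tag vocabulary is an assumption.)"""
    return [
        m for m in metrics
        if all(m.get("tags", {}).get(k) == v for k, v in tags.items())
    ]
```

The same predicate can drive alert routing: an alert rule scoped to `team="growth"` tags notifies only the people responsible for those pipelines.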

7. Share metrics dashboards across teams

Data observability is a team effort. While a single SLI may be aimed at high-level stakeholders, share more detailed metrics and dashboards across teams: data scientists, data engineers, and product managers. You'll give all stakeholders a view of what's happening in real time, fostering quicker decision-making and resolution of issues.

8. Consider commercial tools

While open-source tools like Prometheus and Grafana are great for getting started with data observability, as your system grows in complexity, you may find that you need more sophisticated tools.

Commercial vendors like Bigeye, Arize, and Weights & Biases offer powerful, purpose-built solutions for monitoring machine learning models, data, and systems. These tools come with features like automated anomaly detection, alerting, root cause analysis, and integration with popular data stack technologies, which can save you significant time and effort. They also provide advanced visualizations and dashboards that make it easier to understand and communicate about your data and models.

Finally, they often provide support and resources to help you get the most out of your observability efforts. There is a cost associated with these tools, but the benefits in terms of saved time, improved performance, and reduced risk can make them a worthwhile investment.

9. Foster a culture of continuous improvement

A prerequisite to both MLOps and data observability is ultimately a culture of continuous improvement. Use the insights gained from data observability to identify areas of improvement, whether it's a data pipeline that's consistently slow, an ML model that's degrading in performance, or a specific type of data issue that keeps recurring, and incorporate these insights into your planning and prioritization process. Encourage teams to focus on resolving these issues and improving the system's reliability and performance.

Neither data observability nor MLOps are one-time efforts, but instead are ongoing practices. Regularly review your metrics and alerts, refine your SLIs, and adjust your monitoring as your data pipelines and ML models evolve.

Common needs by role

Data engineers care about overall data flow: data is fresh and operating at full volume, and jobs are always running so data outages don't impact downstream systems.
  • Freshness and volume monitoring
  • Schema change detection
  • Lineage monitoring

Data scientists care about specific datasets in great detail, looking for outliers, duplication, and other, sometimes subtle, issues that could affect their analysis or machine learning models.
  • Freshness monitoring
  • Completeness monitoring
  • Duplicate detection
  • Outlier detection
  • Distribution shift detection
  • Dimensional slicing and dicing

Analytics engineers care about rapidly testing the changes they're making within the data model: moving fast without breaking things, and without spending hours writing tons of pipeline tests.
  • Lineage monitoring
  • ETL blue/green testing

Business intelligence analysts care about the business impact of data: understanding where they should spend their time digging in, and when they have a red herring caused by a data pipeline problem.
  • Integration with analytics tools
  • Anomaly detection
  • Custom business metrics
  • Dimensional slicing and dicing

Other stakeholders care about data reliability: customers and stakeholders don't want data issues to bog them down, delay deadlines, or provide inaccurate information.
  • Integration with analytics tools
  • Reporting and insights
