Engineering

How to calculate the ROI for data observability

On the one hand, businesses are more data driven than ever before. On the other hand, data pipelines are increasingly complex and error prone. Is it time to invest in data observability?

Kyle Kirwan

What is data observability?

Analogous to observability in software engineering, data observability refers to the practice of instrumenting your data systems to give a comprehensive view of what is going on in each component of your data stack at any given time.

You can read more about data observability and why it’s important here.

Convincing your organization that you need a data observability solution

Building a data observability practice in your organization often requires upfront investments – in engineering hours, process changes, and the purchase of technical solutions. Often, before leadership is willing to commit, they’ll want to understand the return on investment (“ROI”). Data teams looking to invest in data observability will need to prove that better quality, fresher data maps directly to increased revenue and/or cost savings. 

Calculating ROI

ROI is a generic performance metric that measures the efficiency of a particular investment: the return it generates relative to its cost, typically expressed as ROI = (return − cost) / cost. It's especially helpful when comparing multiple potential investments. (A short code sketch of the formula follows the list below.)

There are two components to calculating ROI:

  • calculating the return

  • calculating the initial investment/cost
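
The calculation itself is simple. Here is a minimal Python sketch, with purely illustrative dollar figures:

```python
def roi(annual_return: float, cost: float) -> float:
    """ROI as a percentage: (return - cost) / cost * 100."""
    return (annual_return - cost) / cost * 100

# Illustrative figures: $150,000/year in savings against an $80,000 investment
print(f"{roi(150_000, 80_000):.1f}%")  # 87.5%
```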

Since you’re trying to justify an investment to improve data, make sure that your argument is data-driven. This means starting with the most quantifiable impact: how will implementing data observability either increase revenue or decrease costs?

Here are a few examples of "pathways" that bad data might take to affect the company's bottom line:

  • If a data outage impacts a company's machine learning models, the loss of revenue can be significant. For example, a data outage that stops Uber's surge pricing algorithm from updating could plausibly cost millions in lost revenue over the course of a single hour.

  • Data quality issues might result in direct costs. For example, if the format of customer names and addresses is not validated, multiple mailers might be sent to the same actual customer, creating waste.

  • Data quality issues eat into developer productivity. Even before accounting for opportunity cost, the time engineers spend chasing down avoidable data reliability issues maps directly to salaries and equity compensation.

To ensure that you're quantifying the potential return in a comprehensive, methodical way rather than adding up random impacts, we recommend the following steps.

Step 1: Identify specific business issues within the company

Some examples here might include:

  • Users are registering for “new user” promo codes more than once.

  • Fraud detection is not catching fraudulent users.

  • Analytics dashboard showing orders is not up to date.

Step 2: Determine the cost of these specific business issues

The respective answers here might be:

  • Cost of users using “new user” promo codes when they should not be allowed to: $100,000/year

  • Cost of fraudulent users: $200,000/year

  • Cost of inadequate inventory in different locations due to lack of up-to-date analytics dashboard: $300,000/year

Step 3: Determine whether bad data is at the root of the issue

The respective answers here might be:

  • Yes, because there’s no validation on new user names or emails so there are duplicate entries of a single user in the database

  • Yes, because there’s missing data

  • Yes, because there’s often a delay in the transformation of data

Step 4: Set data SLAs to improve the quality of the data

The respective answers here might be:

  • The users database table must be deduplicated; all future writes must be standardized in format, and checked against existing entries.

  • Missing training data must be interpolated. 

  • Max delay from orders data being produced in Shopify to orders data at rest in Snowflake should be 24 hours. This should allow for timely inventory projections. (A minimal freshness check is sketched below.)
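
To make the freshness SLA concrete, here is a minimal Python sketch of a check against the 24-hour limit. The `latest_loaded_at` timestamp is an assumed input (in practice it would come from warehouse metadata or an observability tool), and the function name is illustrative:

```python
from datetime import datetime, timedelta, timezone

# Max allowed delay between orders landing in Shopify and at rest in Snowflake
FRESHNESS_SLA = timedelta(hours=24)

def orders_within_sla(latest_loaded_at: datetime) -> bool:
    """Return True if the newest orders row landed within the freshness SLA.

    Assumes `latest_loaded_at` is a timezone-aware UTC timestamp.
    """
    lag = datetime.now(timezone.utc) - latest_loaded_at
    if lag > FRESHNESS_SLA:
        print(f"SLA breach: orders data is {lag} old (limit: {FRESHNESS_SLA})")
        return False
    return True
```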

Step 5: Determine the updated cost of the issue to the business

The respective answers here might be (the arithmetic is sketched after this list):

  • This should reduce the cost of duplicate new user orders by 100%. 

    • Savings of $100,000/year.

  • This should bring the false negative rate down to 2% from 4%. 

    • Savings of $100,000/year.

  • This should bring the leftover inventory percentage down by 50%.

    • Savings of $150,000/year.
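
The arithmetic behind these estimates is a one-liner per issue. A quick sketch using the Step 2 costs, assuming losses scale linearly with each metric:

```python
# Step 2 costs ($/year)
promo_abuse_cost = 100_000
fraud_cost = 200_000
inventory_cost = 300_000

# Expected savings once the SLAs are met
promo_savings = promo_abuse_cost * 1.00          # duplicate promo use eliminated
fraud_savings = fraud_cost * (1 - 0.02 / 0.04)   # false negatives halved: 4% -> 2%
inventory_savings = inventory_cost * 0.50        # leftover inventory halved

print(promo_savings, fraud_savings, inventory_savings)  # 100000.0 100000.0 150000.0
```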

Less quantifiable metrics

While things like engineering time and software outages can be more or less mapped to dollars and cents, there are other potential “returns” for data observability that are less quantifiable but arguably even more significant. These include:

  • Improved ability to make good business decisions

  • Reduced PR and legal risk

  • Improved employee retention

Our recommendation is that you do not attempt to include these “soft” metrics in the quantitative calculation, as you would have to make potentially ungrounded estimates. However, you can include a qualitative writeup of them along with your final ROI report. This provides decision makers with an additional data point if they’re on the fence, and allows them to value the soft impact as they choose.

Calculating Investment/Cost

In addition to determining the return, data teams will also need to calculate the cost. A simple strategy for determining the cost is to examine three categories: 

People: the cost of data engineers to whom the issue will be assigned. 

Process: the cost of hiring, training, and general change management. 

Technology: data observability tool purchase, implementation, and maintenance as well as infrastructure like servers or databases. 

When evaluating all of these categories, it is important to consider both short- and long-term costs.
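
As a rough sketch, these categories roll up into a single cost figure. The split below is hypothetical, chosen to match the $80,000 implementation cost used in the case study:

```python
def total_cost(people: float, process: float, technology: float) -> float:
    """Roll up the three cost categories for a given period."""
    return people + process + technology

# Hypothetical year-one breakdown behind the $80,000 figure used below
year_one = total_cost(people=40_000, process=10_000, technology=30_000)
print(f"${year_one:,.0f}")  # $80,000
```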

Case Study 

Let's say that you are an e-commerce brand, and your business issues are those described above. Let's look at each issue in turn to determine the overall ROI of a data observability tool (a short script reproducing the arithmetic follows the breakdown):

Issue: Users are registering for “new user” promo codes more than once

  • Potential savings after observability tool implementation: $100,000

  • Implementation cost: $80,000

  • Net savings: $20,000

  • ROI: 25%

Issue: Fraud detection is not catching fraudulent users

  • Potential savings after observability tool implementation: $100,000

  • Implementation cost: $80,000

  • Net savings: $20,000

  • ROI: 25%

Issue: Analytics dashboard showing orders is not up to date

  • Potential savings after observability tool implementation: $150,000

  • Implementation cost: $80,000

  • Net savings: $70,000

  • ROI: 87.5%
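
A short script, reusing the ROI formula from earlier, reproduces these figures:

```python
# (potential savings, implementation cost) per issue, from the case study above
issues = {
    "Duplicate promo code registrations": (100_000, 80_000),
    "Missed fraudulent users": (100_000, 80_000),
    "Stale orders dashboard": (150_000, 80_000),
}

for name, (savings, cost) in issues.items():
    net = savings - cost
    print(f"{name}: net ${net:,}, ROI {net / cost:.1%}")
```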

Conclusion 

Before companies invest in data observability, they will often want to calculate the ROI. They can do this by enumerating business issues, determining their data quality root causes, and then setting SLAs that will ameliorate those issues. In arguments made to decision-makers, the quantitative ROI can be supplemented by the "intangible" effects of data quality improvements, e.g., developer morale and better business decision-making.
