Engineering - July 21, 2022

How to calculate the ROI for data observability

On the one hand, businesses are more data-driven than ever before. On the other hand, data pipelines are increasingly complex and error-prone. Is it time to invest in data observability?

Kyle Kirwan

What is data observability?

Analogous to observability in software engineering, data observability refers to the practice of instrumenting your data systems to give a comprehensive view of what is going on in each component of your data stack at any given time.

You can read more about data observability and why it’s important here.

Convincing your organization that you need a data observability solution

Building a data observability practice in your organization often requires upfront investments – in engineering hours, process changes, and the purchase of technical solutions. Often, before leadership is willing to commit, they’ll want to understand the return on investment (“ROI”). Data teams looking to invest in data observability will need to prove that better quality, fresher data maps directly to increased revenue and/or cost savings.

Calculating ROI

ROI is a generic performance metric that measures the efficiency of a particular investment: the return it generates relative to what it costs. It’s especially helpful when used to compare multiple potential investments.

There are two components to calculating ROI:

  • Calculating the return
  • Calculating the initial investment/cost
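
The formula itself is simple. Here is a minimal sketch in Python, using as example inputs the combined totals from the case study later in this post:

```python
def roi(total_return: float, cost: float) -> float:
    """Return on investment as a percentage: (return - cost) / cost * 100."""
    return (total_return - cost) / cost * 100

# Example inputs: $350,000/year in combined savings (see the case study below)
# against a combined $240,000 implementation cost.
print(f"{roi(350_000, 240_000):.1f}%")  # 45.8%
```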

Since you’re trying to justify an investment to improve data, make sure that your argument is data-driven. This means starting with the most quantifiable impact: how will implementing data observability either increase revenue or decrease costs?

Here are a few examples of “pathways” that bad data might take to affect the company’s bottom line:

  • If a data outage impacts a company’s machine learning models, the loss of revenue can be significant. For example, a data outage that prevents Uber’s surge pricing algorithm from updating could cost millions in lost revenue over the course of a single hour.
  • Data quality issues can create direct costs. For example, if the format of customer names and addresses is not validated, multiple mailers might be sent to the same customer, creating waste.
  • Data quality issues eat into developer productivity. Even without taking opportunity cost into account, the time that engineers spend chasing down data reliability issues they shouldn’t have to maps directly to salaries and equity compensation (a rough sizing sketch follows this list).
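
As a back-of-the-envelope illustration of that last point (the headcount, hours, and hourly rate below are assumptions made up for the example, not benchmarks):

```python
# Illustrative estimate of what chasing data issues costs in compensation alone.
# Every figure here is an assumption for the sake of the example.
engineers = 5            # engineers regularly pulled into data firefighting
hours_per_week = 6       # time each spends chasing data reliability issues
loaded_hourly_rate = 90  # salary + equity + benefits, per hour

annual_cost = engineers * hours_per_week * 52 * loaded_hourly_rate
print(f"${annual_cost:,.0f}/year")  # $140,400/year
```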

To ensure that you’re quantifying the potential return in a comprehensive, methodical way, rather than adding up random impacts, we recommend the following steps to calculate return.

Step 1: Identify specific business issues within the company

Some examples here might include:

  • Users are registering for “new user” promo codes more than once.
  • Fraud detection is not catching fraudulent users.
  • The analytics dashboard showing orders is not up to date.

Step 2: Determine the cost of these specific business issues

The respective answers here might be:

  • Cost of users using “new user” promo codes when they should not be allowed to: $100,000/year
  • Cost of fraudulent users: $200,000/year
  • Cost of inadequate inventory in different locations due to lack of up-to-date analytics dashboard: $300,000/year

Step 3: Determine whether bad data is at the root of each issue

The respective answers here might be:

  • Yes, because there’s no validation on new user names or emails, so there are duplicate entries for a single user in the database (a detection sketch follows this list).
  • Yes, because there’s missing data in the fraud model’s training set.
  • Yes, because there’s often a delay in the transformation of the data.
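
To make the first root cause concrete, here is a minimal sketch of how duplicate user entries might surface once email formats are normalized (the records and field names are hypothetical):

```python
from collections import Counter

# Hypothetical user records; in practice these would come from the users table.
users = [
    {"id": 1, "email": "Jane.Doe@example.com"},
    {"id": 2, "email": " jane.doe@example.com"},
    {"id": 3, "email": "john@example.com"},
]

def normalize(email: str) -> str:
    """Trim and case-fold so cosmetic variants of one address compare equal."""
    return email.strip().lower()

counts = Counter(normalize(u["email"]) for u in users)
duplicates = {email: n for email, n in counts.items() if n > 1}
print(duplicates)  # {'jane.doe@example.com': 2}
```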

Step 4: Set data SLAs to improve the quality of the data

The respective answers here might be:

  • The users database table must be deduplicated; all future writes must be standardized in format and checked against existing entries.
  • Missing training data must be imputed or backfilled.
  • The max delay between orders data being produced in Shopify and orders data at rest in Snowflake should be 24 hours. This should allow for timely inventory projections (a freshness check sketch follows this list).
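
A freshness SLA like that last one is straightforward to monitor. Here is a minimal sketch, assuming you can query the newest load timestamp from the warehouse (the `loaded_at` column and `orders` table names are hypothetical):

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(hours=24)  # max allowed Shopify -> Snowflake lag

def within_sla(latest_loaded_at: datetime) -> bool:
    """True if the newest orders row landed in the warehouse within the SLA."""
    lag = datetime.now(timezone.utc) - latest_loaded_at
    return lag <= FRESHNESS_SLA

# In practice latest_loaded_at would come from something like
# SELECT MAX(loaded_at) FROM orders; here we fake a 30-hour-old row.
stale = datetime.now(timezone.utc) - timedelta(hours=30)
print(within_sla(stale))  # False -> alert the on-call or open an incident
```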

Step 5: Determine the updated cost of the issue to the business

The respective answers here might be:

  • This should reduce the cost of duplicate new-user orders by 100%, for savings of $100,000/year.
  • This should bring the false negative rate down from 4% to 2%, for savings of $100,000/year (see the sketch after this list).
  • This should bring the leftover inventory percentage down by 50%, for savings of $150,000/year.
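
The arithmetic behind the fraud estimate, as a sketch (it assumes fraud losses scale linearly with the false negative rate):

```python
# If fraud losses scale linearly with the false negative rate (an assumption
# for this sketch), halving the rate halves the annual cost.
current_cost = 200_000  # $/year currently lost to uncaught fraudulent users
current_fn_rate = 0.04
target_fn_rate = 0.02

new_cost = current_cost * (target_fn_rate / current_fn_rate)
print(f"Savings: ${current_cost - new_cost:,.0f}/year")  # Savings: $100,000/year
```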

Less quantifiable metrics

While things like engineering time and software outages can be more or less mapped to dollars and cents, there are other potential “returns” for data observability that are less quantifiable but arguably even more significant. These include:

  • Better business decision-making
  • Reduced PR and legal risk
  • Improved employee retention

Our recommendation is that you do not attempt to include these “soft” metrics in the quantitative calculation, as you would have to make potentially ungrounded estimates. However, you can include a qualitative writeup of them along with your final ROI report. This provides decision makers with an additional data point if they’re on the fence, and allows them to value the soft impact as they choose.

Calculating Investment/Cost

In addition to determining the return, data teams will also need to calculate the cost. A simple strategy for determining the cost is to examine three categories:

  • People: the cost of the data engineers to whom the issues will be assigned.
  • Process: the cost of hiring, training, and general change management.
  • Technology: the purchase, implementation, and maintenance of a data observability tool, as well as infrastructure like servers or databases.

When evaluating all of these categories, it is important to consider both short- and long-term costs. (An example breakdown of the People category appears in the table at the end of this post.)

Case Study

Let’s say that you run an e-commerce brand with the business issues described above. Let’s walk through each issue to determine the overall ROI of a data observability tool:

Issue: Users are registering for “new user” promo codes more than once

  • Potential savings after observability tool implementation: $100,000
  • Implementation cost: $80,000
  • Net savings: $20,000
  • ROI: 25%

Issue: Fraud detection is not catching fraudulent users

  • Potential savings after observability tool implementation: $100,000
  • Implementation cost: $80,000
  • Net savings: $20,000
  • ROI: 25%

Issue: Analytics dashboard showing orders is not up to date

  • Potential savings after observability tool implementation: $150,000
  • Implementation cost: $80,000
  • Net savings: $70,000
  • ROI: 87.5%
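
The same one-line ROI calculation, applied to all three issues (a minimal sketch; the dollar figures are the ones above):

```python
# (annual savings, implementation cost) for each issue in the case study
issues = {
    "Duplicate promo-code redemptions": (100_000, 80_000),
    "Uncaught fraudulent users":        (100_000, 80_000),
    "Stale orders dashboard":           (150_000, 80_000),
}

for name, (savings, cost) in issues.items():
    roi_pct = (savings - cost) / cost * 100
    print(f"{name}: ROI = {roi_pct:.1f}%")
# Duplicate promo-code redemptions: ROI = 25.0%
# Uncaught fraudulent users: ROI = 25.0%
# Stale orders dashboard: ROI = 87.5%
```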

Conclusion

Before companies invest in data observability, they will often want to calculate the ROI. They can do this by enumerating business issues, determining which have root causes in data quality, and then setting SLAs that will ameliorate those issues. In arguments made to decision-makers, the quantitative ROI can be supplemented by the “intangible” effects of data quality improvements, e.g., improved developer morale and better business decision-making.

Example: estimating the People cost

| Resource | Monthly cost ($) | Number of resources | Time (months) | Total cost ($) |
| --- | --- | --- | --- | --- |
| Software/Data engineer | $15,000 | 3 | 12 | $540,000 |
| Data analyst | $12,000 | 2 | 6 | $144,000 |
| Business analyst | $10,000 | 1 | 3 | $30,000 |
| Data/product manager | $20,000 | 2 | 6 | $240,000 |
| Total cost | | | | $954,000 |
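
A minimal sketch of the arithmetic behind the table above:

```python
# Each row: (monthly cost per person in $, headcount, months on the project)
resources = {
    "Software/Data engineer": (15_000, 3, 12),
    "Data analyst":           (12_000, 2, 6),
    "Business analyst":       (10_000, 1, 3),
    "Data/product manager":   (20_000, 2, 6),
}

total = 0
for role, (monthly, count, months) in resources.items():
    cost = monthly * count * months
    total += cost
    print(f"{role}: ${cost:,}")
print(f"Total cost: ${total:,}")  # Total cost: $954,000
```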
How different roles use data observability

| Role | Goals | Common needs |
| --- | --- | --- |
| Data engineers | Overall data flow. Data is fresh and operating at full volume. Jobs are always running, so data outages don't impact downstream systems. | Freshness and volume monitoring, schema change detection, lineage monitoring |
| Data scientists | Specific datasets in great detail. Looking for outliers, duplication, and other—sometimes subtle—issues that could affect their analysis or machine learning models. | Freshness monitoring, completeness monitoring, duplicate detection, outlier detection, distribution shift detection, dimensional slicing and dicing |
| Analytics engineers | Rapidly testing the changes they’re making within the data model. Move fast and not break things—without spending hours writing tons of pipeline tests. | Lineage monitoring, ETL blue/green testing |
| Business intelligence analysts | The business impact of data. Understand where they should spend their time digging in, and when they have a red herring caused by a data pipeline problem. | Integration with analytics tools, anomaly detection, custom business metrics, dimensional slicing and dicing |
| Other stakeholders | Data reliability. Customers and stakeholders don’t want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools, reporting and insights |
