Thought leadership
-
January 30, 2023

Data observability catalyzes your digital transformation goals

At the heart of every digital transformation effort, organizations bet on new digital technologies to revolutionize how they operate. Without data observability, that revolution remains more of a pipe dream than a reality for most organizations.

Kyle Kirwan

At the heart of every digital transformation effort is one belief: that betting on new technologies will catalyze efficient, customer-centric, cost-effective operations.

In the past decade, digital transformation efforts have moved on-premises infrastructure to the cloud. They've turned batch processing of small amounts of data into real-time processing of seemingly infinite data streams. Data is transformed, manipulated, and stored in various places depending on the analytical, product, and budget needs of the company. As a result, CIOs and CTOs manage dozens of data ecosystems. To operate effectively in this environment, they need a clear understanding of data assets, pipelines, and business needs.

For most organizations, this comprehensive view remains more of a pipe dream than a reality. Digital transformation efforts generate a deluge of data, but most companies are not in a position to take full advantage of it. They don't always know where the right data is, and they frequently wrestle with data that is incomplete, stale, or simply wrong.

Where can data observability ease the pain?

Data observability means users, particularly data engineering teams, can monitor and understand the state of their data and its supply chain. It means identifying potential issues or delays in the flow of data without needing to make significant changes to the system. More concretely, data observability platforms generally provide some subset of the following (a minimal sketch of the first two appears after the list):

  • Monitoring - tracking volume, freshness, and quality
  • Anomaly detection
  • Service Level Agreements (SLAs)
  • Data lineage
  • Data governance
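
To make the first two items concrete, here is a minimal monitoring sketch. The `client` object with its `run_query` method, the table and column names, and the thresholds are all hypothetical illustrations, not any specific vendor's API:

```python
# Minimal freshness + volume monitoring sketch. The `client` object,
# table names, and thresholds are hypothetical placeholders.
import statistics
from datetime import datetime, timezone

def check_freshness(client, table: str, ts_column: str, max_lag_minutes: float) -> bool:
    """Return True if the newest row in `table` is recent enough.
    Assumes the query returns a timezone-aware datetime."""
    row = client.run_query(f"SELECT MAX({ts_column}) AS latest FROM {table}")[0]
    lag_minutes = (datetime.now(timezone.utc) - row["latest"]).total_seconds() / 60
    return lag_minutes <= max_lag_minutes

def check_volume(history: list[int], todays_count: int, z_threshold: float = 3.0) -> bool:
    """Simple anomaly check: flag today's row count if it sits far
    outside the recent daily counts in `history` (needs 2+ points)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return todays_count == mean
    return abs(todays_count - mean) / stdev <= z_threshold
```

A real platform automates checks like these across every table and learns thresholds from history rather than hardcoding them.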

With these tools, organizations can answer questions such as the following (example queries appear after the list):

  • Is customer data arriving on time?
  • Are there any duplicated transactions?
  • Is the decrease in average purchase size real or a data issue?
  • Will deleting a table from the data warehouse have any impact?
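
As an illustration, the duplicate-transaction and purchase-size questions often reduce to simple queries. The `transactions` table and its columns below are hypothetical placeholders for your own schema:

```python
# Hedged sketches of the queries behind two of the questions above.
# `transactions`, `transaction_id`, `order_total`, and `created_at`
# are hypothetical names, not a real schema.

DUPLICATE_TRANSACTIONS = """
    SELECT transaction_id, COUNT(*) AS n
    FROM transactions
    GROUP BY transaction_id
    HAVING COUNT(*) > 1
"""

DAILY_AVG_PURCHASE = """
    SELECT DATE(created_at) AS day, AVG(order_total) AS avg_purchase
    FROM transactions
    GROUP BY DATE(created_at)
    ORDER BY day
"""

def has_duplicate_transactions(client) -> bool:
    # Any returned rows mean the same transaction ID appears more than once.
    return len(client.run_query(DUPLICATE_TRANSACTIONS)) > 0
```

Comparing the daily averages against their recent trend is what separates a genuine drop in purchase size from a data issue.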

In general, having data observability in place helps prevent data quality issues or at least mitigates their impact on the business.

Data observability and digital transformation

Data observability is the perfect complement to digital transformation. Here's how it addresses some common digital transformation goals:

  1. Service improvement - Service improvement is always a cornerstone of digital transformation, whether it be customer-facing or internal. Data observability helps pinpoint blockers and issues in the data pipeline, so that end users and stakeholders get faster, more accurate service, without interruption. Ultimately, service improvement has a positive impact on revenue and profit.
  2. Internal collaboration - What have other teams already done? Who is responsible? What has changed since you last checked in? Digital transformation usually aims to create better, more focused internal collaboration. Data observability can provide shared visibility between teams that would otherwise have little contact.
  3. Process optimization - Redundant efforts cost money, and botched data can lead to massive inefficiencies. Through monitoring, alerting, and anomaly detection, data observability can provide teams with insight into where they need to refine and improve how data moves through the organization. Additionally, better data gets fed into automation and ML models, and everyone on the receiving end gets better results.
  4. Agility - Technical teams and business teams alike aim to increase their agility. The most agile organizations can adapt to new market trends at lightning speed. Data, if managed properly, can enable a more connected and flexible workplace. With more data-driven decision-making, learning and product development cycles shorten, and more analysis can help improve the direction of the business.
  5. New business models - Your data is your business, no matter your sector. As your data capabilities improve, new business models can overtake old ones. Most "disruption" that happens today is based on digital improvement and innovation. When you unlock the power of your data, you unlock your organization's ability to pivot into new features and other previously unexplored areas.

When should you implement data observability?

Most teams dealing with data quality issues start by implementing data tests or SQL checks of some kind on the inputs and outputs of their data pipelines. These checks are a good first step, but they are ultimately restrictive, difficult to manage, and hard to scale, as the sketch below illustrates.
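
Here is the kind of hand-written check such teams typically accumulate. The table and column names are hypothetical; the point is that every new table and failure mode needs another test like this, maintained by hand:

```python
# A typical hand-written pipeline check. `client`, `analytics.orders`,
# and `customer_id` are hypothetical placeholders.

def test_orders_output(client):
    # The output table should never be empty after a pipeline run.
    rows = client.run_query("SELECT COUNT(*) AS n FROM analytics.orders")
    assert rows[0]["n"] > 0, "orders table is empty"

    # Key foreign keys should never be null.
    nulls = client.run_query(
        "SELECT COUNT(*) AS n FROM analytics.orders WHERE customer_id IS NULL"
    )
    assert nulls[0]["n"] == 0, "orders rows missing customer_id"
```

Each check is static, so thresholds go stale as the data changes, and coverage only grows as fast as engineers can write tests, which is exactly the gap data observability platforms aim to close.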

In general, data observability solutions should be implemented sooner rather than later. A good time is when development is complete, the organization is ready to move into production, and at least one or two business lines are using data analytics and machine learning. By incorporating data observability at this stage, companies can set themselves up for high-quality data from the get-go.

More realistically, companies often turn to data observability after a severe data outage damages the bottom line or the company's reputation, or when the company is preparing for an IPO. While your data observability setup may be more complicated at that point, this is also when investing in it will pay the most high-profile dividends.

To determine whether you are ready for data observability, go through our 20-question checklist here.

Companies that have already implemented data observability

Data observability is a relatively new standalone concept, but versions of it have been implemented internally at companies like Uber, Netflix, Airbnb, and Lyft since the late 2010s.

Most of these data teams developed some sort of pipeline testing system first, before moving on to true data observability tools. Eventually, smaller companies with leaner technical teams also decided to invest in observability capabilities but didn't have the manpower to build them in-house. That's where data observability SaaS solutions entered the market.

Now, with access to off-the-shelf solutions, major enterprises such as PhonePe, Walmart, Oracle, and Verisk are investing heavily in data observability.

Getting executive buy-in for data observability

Now that you understand how data observability can help your digital transformation efforts, how do you go about getting executive buy-in for it at your company? The steps below provide some guidance:

Ask questions about the state of the company’s data systems. Examples include:

  • "Are you sure that our data is reliable?"
  • "How many outages have we suffered due to data issues?"

It may become apparent, after asking some of these questions, that the organization has already been suffering the effects of unreliable data. By highlighting these pain points and linking them to the capabilities of data observability, you can demonstrate value.

Then, cite real-world examples of what can happen without visibility into the state of your data systems. At Zillow in 2021, for example, a faulty valuation model produced catastrophic property valuations, forcing the company to write down the value of the homes it had purchased by more than $500 million. Errors like these illustrate the potential business impact of not having proper data observability in place.

Lastly, address misconceptions or confusion about data observability. Executives might note that your organization is already paying for a tool like Datadog. Clarify that Datadog was built as a standard IT observability platform to monitor web servers rather than data assets and data pipelines. Data observability tools are complementary to tools like Datadog.

For context on the build-in-house option discussed above, here is a rough estimate of what staffing such an effort can cost:

| Resource | Monthly cost ($) | Number of resources | Time (months) | Total cost ($) |
|---|---|---|---|---|
| Software/Data engineer | $15,000 | 3 | 12 | $540,000 |
| Data analyst | $12,000 | 2 | 6 | $144,000 |
| Business analyst | $10,000 | 1 | 3 | $30,000 |
| Data/product manager | $20,000 | 2 | 6 | $240,000 |
| Total cost | | | | $954,000 |
Different roles also look for different things from data observability:

| Role | Goals | Common needs |
|---|---|---|
| Data engineers | Overall data flow: data is fresh and operating at full volume, and jobs are always running, so data outages don't impact downstream systems. | Freshness and volume monitoring; schema change detection; lineage monitoring |
| Data scientists | Specific datasets in great detail: looking for outliers, duplication, and other (sometimes subtle) issues that could affect their analysis or machine learning models. | Freshness monitoring; completeness monitoring; duplicate detection; outlier detection; distribution shift detection; dimensional slicing and dicing |
| Analytics engineers | Rapidly testing the changes they're making within the data model: moving fast without breaking things, and without spending hours writing pipeline tests. | Lineage monitoring; ETL blue/green testing |
| Business intelligence analysts | The business impact of data: understanding where to spend time digging in, and when a result is a red herring caused by a data pipeline problem. | Integration with analytics tools; anomaly detection; custom business metrics; dimensional slicing and dicing |
| Other stakeholders | Data reliability: customers and stakeholders don't want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools; reporting and insights |
