Engineering
-
October 14, 2022

How to evaluate data observability platforms (RFI)

So you’re considering a data observability platform, but want to make sure you’re choosing the best one for your business. We’ve got some tips to help you run a thorough evaluation process.

Kendall Lovett

We wrote this article after running hundreds of POCs (proofs of concept)—some great, some painful—to help you run a thorough evaluation of your own. Read on to learn how to gather the right resources, define your success criteria, build a vendor shortlist, and choose the right data observability solution for your team.

☝️You can also download this handy Data Observability RFI (request for information) to help you get up and running quickly.

When to invest in data observability

The desire to be data driven is nothing new. In the current sense of the word, organizations have been striving for “data-drivenness” for almost 20 years now. And while new data tools and best practices have certainly made us more data driven than we were when Hadoop came hot-stepping onto the scene in ’06, many organizations still find themselves sitting on a big stack of data they either can’t use or aren’t sure how to. If you asked them why, they might say:

  • “I don’t know what this data is or how I’m supposed to use it.”
  • “We have way too much data and I don’t even know where to start.”
  • “I’m not confident the data is reliable.”

Many innovative companies have popped up in the last few years to try to tackle these challenges, but “I’m not confident the data is reliable” continues to plague even the coolest, most sophisticated DataOps teams. There’s just too much data, too many potential points of failure, and too few skilled humans to test all of it. Even if you could, the tests would still only look for things you anticipated, leaving you open to the “I’m trying to pull data for our APJ investor meeting and the dashboard is totally #@*!ed!!” type of issues you didn’t.

Is data observability right for you?

Enter data observability, a new category of tools designed to solve the data reliability problem. If you’ve recently asked yourself:

  • “How do I find data issues before my executives or customers do?”
  • “How can I make sure my replication job completes on time and all the data shows up looking like I meant it to?”
  • “How many more failed data merge attempts can I debug before I do something I’ll regret?”

Then you should be evaluating data observability solutions. But where do you start? Read on.

How to evaluate data observability solutions

Assemble your team

If you’re like most organizations we’ve worked with, you probably have plenty of data observability requirements floating around that aren’t well documented or defined. As a first step, we recommend creating an evaluation team that can gather those requirements and synthesize them before you begin looking at vendors. Choosing the right mix for your data observability evaluation team is critical to the success of your project.

Cover the critical roles in the evaluation process:

  • Practitioners: Ensure balanced representation from across the business by engaging one or two people from each group that might interact with your chosen solution. This could include data engineers, data scientists, ML engineers, analysts, and so on.
  • Evaluation leader: Appoint a project leader who’s close to both the technical work and the business requirements—such as a director of data engineering or a seasoned engineer. They’ll be able to balance competing requirements and ensure that whatever decision you reach is sound, both technically and strategically.
  • Executive sponsor: Get an executive sponsor such as a CTO or CDO to participate as well. This individual’s buy-in will go a long way toward ensuring your search is aligned with your organization’s strategic goals, and that you can actually get budget for a solution when the time comes.
  • Purchasing: Once the evaluation is over, you’ll want to move quickly to implement the solution you’ve selected. Procurement and purchasing can take much longer than many people think. Identifying early who will lead any legal reviews, procurement steps, etc. can save a lot of time later on.

Also consider a cross section of teams who will use the data observability platform:

  • Data engineering / data platform: Data engineering or data platform team engineers will almost certainly be core users of your data observability solution, and should be core decision makers in the selection process.
  • Analytics / data science: Data science and analytics teams are often subject matter experts and heavy users of data observability, once the data platform or data engineering teams have set up basic monitoring for pipeline freshness and volume.
  • Machine learning: Machine learning teams often adopt data observability to protect the pipelines feeding the training and serving data used to drive models that reach production. They’ll want to be familiar with the solution you choose.

Example team list

| Role | Goals | Common needs |
| --- | --- | --- |
| Data engineers | Overall data flow. Data is fresh and operating at full volume. Jobs are always running, so data outages don't impact downstream systems. | Freshness and volume monitoring; schema change detection; lineage monitoring |
| Data scientists | Specific datasets in great detail. Looking for outliers, duplication, and other—sometimes subtle—issues that could affect their analysis or machine learning models. | Freshness monitoring; completeness monitoring; duplicate detection; outlier detection; distribution shift detection; dimensional slicing and dicing |
| Analytics engineers | Rapidly testing the changes they’re making within the data model. Move fast and not break things—without spending hours writing tons of pipeline tests. | Lineage monitoring; ETL blue/green testing |
| Business intelligence analysts | The business impact of data. Understand where they should spend their time digging in, and when they have a red herring caused by a data pipeline problem. | Integration with analytics tools; anomaly detection; custom business metrics; dimensional slicing and dicing |
| Other stakeholders | Data reliability. Customers and stakeholders don’t want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools; reporting and insights |

Gather your requirements

Now that you’ve got your team together and—hopefully—bought them all lunch, it’s time to create a laundry list of wants and needs for your business. Try to tie each list item to a specific business challenge or initiative; this will make it easier to build a business case later. You may also want to give your requirements initial scores or weights to simplify prioritization, as in the sketch below.
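
To make that scoring concrete, here’s a minimal sketch of weighted requirement scoring in Python; the requirements, weights, and vendor names are purely illustrative:

```python
# Weighted scoring sketch: every requirement gets a 1-5 weight and each
# vendor a 0-5 score. All names and numbers here are illustrative.
requirements = {
    # requirement: (weight, {vendor: score})
    "Automated freshness monitoring": (5, {"Vendor A": 5, "Vendor B": 3}),
    "SOC 2 certification":            (4, {"Vendor A": 5, "Vendor B": 5}),
    "dbt integration":                (3, {"Vendor A": 4, "Vendor B": 2}),
}

totals = {}
for weight, scores in requirements.values():
    for vendor, score in scores.items():
        totals[vendor] = totals.get(vendor, 0) + weight * score

# Highest weighted total first.
for vendor, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{vendor}: {total}")
```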

Here’s a list of data observability categories we recommend using to guide your requirements gathering:

☝️Tip: This handy Data Observability RFI (request for information) contains the below criteria and more!

Deployment: Consider your deployment needs. Are you a nimble startup in need of a fully managed SaaS solution? A large healthcare provider looking for the flexibility of a single-tenant hosted option? Or a defense contractor with strict air-gap requirements? Also consider how the solution will connect to your existing stack and what performance impact it may have on your operations.

Security: Be sure to engage your IT security team early to understand your organization’s software security requirements. Look for criteria like SOC 2 certification, compliance with relevant industry regulations, and whether the vendor exports or stores any of your data in its platform.

Monitoring: Monitoring capabilities are the foundation of data observability. Use the requirements you outlined to determine which types of monitoring you’ll need and how much effort will be required to deploy and maintain them. Which monitors are available out of the box, and which will need to be custom? Can they be deployed automatically, or do they require manual setup? Do they work at both the table level and the field level? Can you easily add your own monitoring for custom business logic?
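
For a concrete sense of what out-of-the-box monitors automate, here’s roughly what one hand-written, table-level freshness and volume check looks like; it’s the kind of thing you’d otherwise write and maintain for every table. The table name, thresholds, and Postgres-flavored SQL are hypothetical:

```python
# One hand-rolled table-level check (hypothetical table and thresholds).
# Assumes a DB-API connection to Postgres, e.g. from psycopg2.connect().
FRESHNESS_AND_VOLUME_SQL = """
    SELECT
        EXTRACT(EPOCH FROM (NOW() - MAX(updated_at))) / 3600.0 AS hours_stale,
        COUNT(*) FILTER (WHERE updated_at > NOW() - INTERVAL '1 day')
            AS rows_last_day
    FROM analytics.orders
"""

def check_orders(conn, max_hours_stale=6, min_daily_rows=10_000):
    cur = conn.cursor()
    cur.execute(FRESHNESS_AND_VOLUME_SQL)
    hours_stale, rows_last_day = cur.fetchone()
    if hours_stale is None:
        return ["table is empty"]
    problems = []
    if hours_stale > max_hours_stale:
        problems.append(f"stale: last update {hours_stale:.1f}h ago")
    if rows_last_day < min_daily_rows:
        problems.append(f"low volume: {rows_last_day} rows in the last day")
    return problems  # an empty list means the table looks healthy
```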

Anomaly detection: If monitoring is the foundation of data observability, anomaly detection is the crown jewel. Mixed metaphor aside, anomaly detection should help you identify issues you didn’t know existed, a capability very difficult to build yourself. Look at what types of anomaly detection are available and the sophistication of the ML models used. How long does the model need to train before it can start detecting anomalies? Does it adapt to user feedback? Can you easily adjust the sensitivity levels? Is anomaly detection available on custom monitoring? Are simpler methods of detection such as standard deviation available?
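
As a baseline, the “simpler method” mentioned above can be as small as a z-score check on daily row counts; real platforms layer ML, seasonality handling, and feedback loops on top of this idea. The numbers below are toy data:

```python
# Flag a daily row count more than three standard deviations away from
# the trailing mean of recent days. Toy data; production detectors also
# handle trend, seasonality, and user feedback.
import statistics

def is_anomalous(history, today, z_threshold=3.0):
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today != mean
    return abs(today - mean) / stdev > z_threshold

row_counts = [10_120, 9_980, 10_340, 10_050, 9_870, 10_210]
print(is_anomalous(row_counts, today=4_500))   # True: likely an incident
print(is_anomalous(row_counts, today=10_100))  # False: within normal range
```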

Alerts and notifications: If APM tools have taught us anything, it’s that proper alerting will make or break a tool. If you don’t get alerted to critical issues, you’ve got a fire drill; if you’re constantly overloaded with false positives or non-issues, you’ve got a grumpy ops team and some nice data observability shelfware. Your solution should provide an optimized alerting system and allow you to easily fine-tune notifications to meet your needs.
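
When comparing vendors, it helps to know which tuning knobs to ask about. This hypothetical routing config (none of these keys come from any specific product) shows the kinds of controls worth probing for: severity-based routing, throttling, and suppression.

```python
# Hypothetical alert-routing rules, invented for illustration only.
alert_rules = [
    {
        "match": {"schema": "analytics", "severity": "critical"},
        "notify": ["#data-incidents", "pagerduty:data-oncall"],
        "throttle_minutes": 0,     # page immediately
    },
    {
        "match": {"schema": "staging"},
        "notify": ["#data-quality-digest"],
        "throttle_minutes": 1440,  # batch into one daily digest
        "suppress_if": "backfill_in_progress",
    },
]
```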

Resolution: Knowing about a problem is great; knowing how to resolve it is even better. Evaluate how the solution lets you track and manage issues, whether it provides root cause analysis and debugging, and whether it can show you how issues impact downstream tables and systems to help prioritize resolution and manage user expectations.

Integrations: Aside from a direct connection to your data warehouse or database, evaluate the current tools in your data stack and consider how a data observability solution will interact with—and add value to—each. These may include analytics tools like Tableau, data catalogs like Alation, and ingestion, transformation, and orchestration tools like Fivetran, dbt, and Airflow.

Reporting: Another key benefit of data observability is gaining insight into how your data pipelines are performing over time. Ensure the solution you choose can tell you how much of your pipeline is in or out of monitoring coverage, how many issues have been detected, acknowledged, and resolved, and whether issues are trending up or down over time. You should also be able to export metadata to a third-party database or warehouse for additional analysis.
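
If the platform can export issue metadata to your warehouse, trend reporting becomes a query you own. The schema below (observability.issues with detected_at, resolved_at, and status columns) is hypothetical, Postgres-flavored SQL:

```python
# Weekly issue trend over exported metadata. Table and column names are
# hypothetical; adjust to whatever schema the vendor actually exports.
ISSUE_TREND_SQL = """
    SELECT
        DATE_TRUNC('week', detected_at) AS week,
        COUNT(*) AS issues_detected,
        COUNT(*) FILTER (WHERE status = 'resolved') AS issues_resolved,
        AVG(EXTRACT(EPOCH FROM resolved_at - detected_at) / 3600.0)
            AS avg_hours_to_resolve
    FROM observability.issues
    GROUP BY week
    ORDER BY week
"""
```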

Configuration and management: While simplicity of configuration and advanced features like REST API access, a CLI, or SDK may not be top of mind in your initial search, they will be critical to your success as data observability usage and requirements grow. Ensuring your solution can provide simple, scalable, Terraform-like “as code” options will pay dividends when you want to deploy monitoring on hundreds of tables or integrate blue-green testing into a version-controlled pipeline.
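
As a sketch of what “as code” management buys you, here’s a loop over a hypothetical SDK client. The ObservabilityClient class and its methods are invented for illustration, so check each vendor’s real SDK or API against this pattern:

```python
# "Monitoring as code" sketch. The SDK below is hypothetical; the point
# is that hundreds of tables get covered by a loop, not by UI clicks.
from my_observability_sdk import ObservabilityClient  # hypothetical package

client = ObservabilityClient(api_key="...")

for table in client.list_tables(schema="analytics"):
    # Idempotent: create the monitor if missing, leave it alone if present.
    client.ensure_monitor(table=table, metric="freshness", schedule="hourly")
    client.ensure_monitor(table=table, metric="row_count", schedule="hourly")
```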

Selecting vendors to evaluate

What you’re looking for in a solution will ultimately be driven by the business requirements you defined in the previous step. If your team is primarily made up of data engineers who need to know that data in the warehouse is fresh and complete, then you’ll want to ensure your chosen solution can query the metadata and expose broad operational health metrics across the data warehouse with little to no setup or management on your part. If you have data science and analytics teams who need to ensure data adheres to business-logic quality rules before it’s fed into an analytics dashboard or ML application, then you’ll want a solution that provides deep business-logic-driven metrics such as variance or outlier detection. If you need both, then we recommend adding Bigeye to your shortlist. :)

Stay tuned for our next article in this series where we’ll provide you with battle-tested knowledge on what to do, and what not to do, to ensure you have a successful POC. Or, if you’re ready to get started now, download our free Data Observability RFI below.

Want to run a great POC?

Skip the work and download our free Data Observability RFI (Request for Information). It contains a comprehensive checklist of features, and makes it easy to add additional criteria to score vendors on, so your team can make the best decision.

✔  Pre-built data observability features list

✔  Grouped capabilities to help you focus on what matters most to you

✔  Compare multiple vendors during your evaluation

Get the guide now.

