Resources

Bigeye's State of Data Quality Report
We're pleased to announce the results of our 2023 State of Data Quality survey. Findings underscore the need for automation and better communication between data producers and consumers. Check it out!
Data quality
Driving marketing performance with reliable testing data
Learn how JustAnswer uses Bigeye to ensure optimal marketing results and drive customer acquisition and revenue performance for their business.

Connecting people to on-demand expert advice
JustAnswer is the world's largest Q&A website that connects people to real, qualified experts. JustAnswer allows anyone to get immediate expert advice on topics ranging from legal matters and healthcare to electronics and software, and even tutoring and home repairs—all for a low monthly fee. The JustAnswer website is a powerful sales and marketing tool, and the marketing teams are constantly running A/B tests across the site to improve customer experience, increase acquisition, and optimize revenue. As such, it's critical to the JustAnswer business that tests are always accurate and reliable.

Challenge
JustAnswer has multiple teams running concurrent A/B tests across various aspects of the website—sometimes dozens at once. This creates the potential for tests to encounter a variety of issues that can compromise the integrity of results and force the team to scrap the experiment entirely. These include interference from website updates, overlapping tests, unexpected data infrastructure issues, and data quality problems. Before Bigeye, each team would monitor the health of their tests using a custom dashboard developed internally. This required the test owners to continually check the dashboard for issues, with a high likelihood they wouldn't find them until after the test was already compromised. This solution was also inflexible and difficult for business users to understand. In addition, new metrics had to be created manually in SQL, and the data model driving the dashboard was complex and prone to breaking. This resulted in lost investment from compromised tests, too much engineering time spent building and fixing test monitoring dashboards, frustrated business users, and difficulty meeting JustAnswer's website optimization goals.

Solution
After evaluating several data observability solutions, JustAnswer selected Bigeye to help monitor their testing data. They appreciated Bigeye's flexible anomaly detection features, the ability to easily create custom business-logic checks and group them into collections, and the fact that Bigeye supported on-prem data sources like SQL Server in addition to cloud data warehouses. Now, with Bigeye, the JustAnswer marketing teams no longer need to manually monitor tests from a dashboard. Instead, the data engineering team has set up collections of predefined data checks in Bigeye that are deployed automatically when a new test launches. If one team decides to run a pricing test, they simply create the experiment and Bigeye automatically applies the correct pre-established data checks based on the experiment ID. As soon as the test starts, Bigeye monitors for over 30 issue types—including dramatic price anomalies, unexpected changes in traffic patterns between different versions of the experiment, and data infrastructure issues that could cause the experiment to fail. If Bigeye detects an issue, it sends an immediate alert to Microsoft Teams so the analyst can react before the entire test is compromised. In addition, the JustAnswer engineering team uses the Bigeye REST API to automate monitoring without needing to log into the Bigeye UI. This includes automatically deploying Bigeye monitoring on new tests, adjusting Bigeye alert thresholds based on traffic patterns, and automatically disabling Bigeye if the experiment is stopped early to eliminate potential false alerts. Finally, the data quality liaison for each product team uses Bigeye's fundamental data quality checks to monitor data freshness and volume, ensuring the underlying data is correct and up to date at all times.

Results
With Bigeye, the JustAnswer data engineering team has eliminated the need to develop and maintain their own quality monitoring tool, saving an estimated 320 hours in development and maintenance time alone. In addition, they've empowered over 40 business users to monitor their own data with simple, ML-driven business-logic checks. Bigeye now automatically monitors over 50 experiments a month, eliminating the need for each team to spend half a day setting up monitoring. In total, business users now spend 100 hours less per month on data quality and are able to invest that time into feature development and improving business performance.
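For readers curious what that REST-driven automation might look like in practice, here is a minimal Python sketch. The base URL, endpoint paths, payload fields, environment variable, and collection name are hypothetical placeholders rather than Bigeye's documented API; the sketch only illustrates the deploy-on-launch and disable-on-early-stop pattern described above.

```python
# Hypothetical sketch of automating monitoring through a REST API.
# Endpoint paths, payload fields, and auth scheme are assumptions for
# illustration only; consult the actual Bigeye API documentation.
import os
import requests

BIGEYE_API_URL = "https://app.bigeye.com/api/v1"   # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['BIGEYE_API_TOKEN']}"}  # assumed env var


def deploy_monitoring(experiment_id: str, collection: str = "pricing-test-checks") -> None:
    """Apply a pre-built collection of data checks when a new test launches."""
    resp = requests.post(
        f"{BIGEYE_API_URL}/collections/{collection}/deploy",   # assumed endpoint
        headers=HEADERS,
        json={"experimentId": experiment_id},
        timeout=30,
    )
    resp.raise_for_status()


def disable_monitoring(experiment_id: str) -> None:
    """Turn monitoring off if the experiment stops early, avoiding false alerts."""
    resp = requests.post(
        f"{BIGEYE_API_URL}/experiments/{experiment_id}/disable-monitoring",  # assumed endpoint
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    deploy_monitoring("pricing-test-042")   # invented experiment ID
```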
Business-logic monitoring, Professional services, Self-service data quality
Managing big data with a nimble team
Learn how the SimpleRose team uses Bigeye to deliver reliable data for large-scale data operations.

Using data to solve complex business optimization problems
SimpleRose develops powerful optimization software that can assist a wide range of organizations, from manufacturers to airlines, in solving high-value logistics, scheduling, and production problems that result in hundreds of millions of dollars in improvements to the bottom line. SimpleRose is a small firm that ingests and manipulates large volumes of data—reaching over one hundred terabytes—to both power their product and inform strategic business decision making. The data operation at SimpleRose brings raw data into S3 from multiple internal and external sources and runs ETL jobs to normalize and load the data into Redshift, with Talend providing additional orchestration and transformation.

Challenge
SimpleRose has a small, agile data team that maintains high-volume, business-critical data pipelines. Given the company's reliance on data, the leadership team needed 100% assurance that the data was correct and complete. For SimpleRose Principal Data Architect Nick Heidke, this meant manually writing and maintaining an ever-expanding list of pipeline tests, an approach that would quickly become unscalable. In addition, the team had invested time and resources into creating a curated, single-source-of-truth data mart for business users and analysts. The data team worried that one ETL failure or bug could create inaccurate data, leading users to lose trust in the data mart and bypass it completely by pulling data directly from source databases. This would result in inconsistent data proliferation, governance issues, and a host of other problems. The SimpleRose team needed a fast, agile solution to these challenges so they could move on to other, higher-level business needs.

Solution
Nick knew traditional data quality tooling wouldn't work for SimpleRose, given the large amount of rule creation required and the cost of purchasing and maintaining a tool. While researching other options, he came across the concept of data observability. Nick had always assumed he would have to think through and define his own tests, and he was immediately intrigued by the idea of using AI/ML to profile data and automatically apply data quality and pipeline monitoring. The SimpleRose team evaluated several data observability solutions and found that Bigeye had the right mix of capabilities, including Redshift support, out-of-the-box automatic data pipeline and quality checks, and the ability to easily create custom business-logic checks and granular monitoring. The team also appreciated how easy it was to get Bigeye up and running, as well as the price point. In just a few days, the SimpleRose team had deployed broad operational coverage of their data warehouse and applied numerous out-of-the-box data quality checks and alerts to their critical business tables. Nick and his team were also able to start creating their own custom checks to monitor specific business logic. If anything slipped through the cracks and caused a data issue, the team would not only find out about it before business users and fix it, they could also create new checks in Bigeye to ensure the issue didn't occur again and harden the pipeline even further.

Results
For SimpleRose, Bigeye reduced the time spent on data pipeline monitoring by 50% while increasing pipeline and data quality monitoring coverage by over 90%, which factors out to over $100,000 in operational savings per year. Even more important to Nick, the organization's trust in and utilization of data has grown dramatically—allowing the business to make better, faster decisions and deliver a superior customer experience.
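A custom business-logic check of the kind described here usually reduces to a scheduled SQL assertion. The sketch below, written in Python against Redshift, uses an invented analytics.orders table, rule, and threshold purely for illustration; it is not SimpleRose's actual logic, and Bigeye manages such checks without hand-rolled scripts.

```python
# Minimal sketch of a custom business-logic check against Redshift.
# Table, column names, and the 0.1% threshold are invented for illustration.
import psycopg2

CHECK_SQL = """
    SELECT 100.0 * SUM(CASE WHEN total_amount < 0 THEN 1 ELSE 0 END)
           / NULLIF(COUNT(*), 0)
    FROM analytics.orders
    WHERE loaded_at > DATEADD(day, -1, GETDATE())
"""


def run_check(conn_params: dict, max_violation_pct: float = 0.1) -> bool:
    """Return True if the share of rule-violating rows stays under the threshold."""
    with psycopg2.connect(**conn_params) as conn, conn.cursor() as cur:
        cur.execute(CHECK_SQL)
        violation_pct = cur.fetchone()[0] or 0.0
    return violation_pct <= max_violation_pct
```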
Big data, Startup
Embracing risk without sacrificing reliability
Impira uses Bigeye as the safety net for its dynamic go-to-market strategy — ensuring that the team can move fast with reliable data.

About Impira: Unlocking and unifying data
Impira is a venture-backed startup whose platform automatically extracts data from unstructured documents through flexible machine learning. The company's no-code, self-service platform makes it easy for companies of all sizes to create their own automations to unlock and unify their data. With customers that span from Airbnb and Stitch Fix to Colgate and Tom's of Maine, Impira relies on accurate data to drive its go-to-market strategy.

Challenge: Move fast and break things
Impira analyzes a huge amount of data to fuel its self-service go-to-market strategy. By examining "aha" moments in the customer journey, the product team can see how customers flow through the implementation process, quickly iterate, and improve the customer experience. Impira is moving fast and continually making important decisions based on product usage data. Uncaught problems in that usage data could seriously impact go-to-market decisions, causing the team to focus resources on the wrong part of the customer journey. Such a critical part of the business depends on reliable data, and the longer an issue goes undetected, the more problems it creates down the road. Before implementing Bigeye, Impira's go-to-market team identified data issues manually, creating firefighting moments for the data engineering team and slowing down the decision-making process. The team needed more freedom to make key decisions with the assurance that the data was right.

Solution: Creating a safety net with Bigeye
The Impira team quickly went to work applying Bigeye's built-in metrics for measuring freshness, duplicates, and cardinalities. They also enabled Autothresholds, which dynamically set and adjust alert thresholds, giving them 24/7 automated monitoring and alerting in place of their manual spot checks. Now, Bigeye continuously monitors Impira's customer usage data, alerting the data team to any issues before the data is used by the product management team. The early detection they get from Autothresholds-based alerts also ensures that one issue doesn't balloon into more issues — saving valuable engineering time and ensuring that the product management team can move fast with reliable data. Bigeye is able to detect a wide range of issues, from cardinality explosions created by a bad SQL join to very subtle bugs that violate uniqueness expectations. The Impira team told us that issues that now take one day to resolve previously took four days of manual investigation, plus even more time cleaning up everything else that was affected while the issue went undetected.

Result: Empowered to embrace risk
"We believe in embracing risk. When you are moving as fast as we are, you have to get over the idea that there is a world of zero outages. With Bigeye, those outages won't slow us down. We know Bigeye will detect any issues before they affect the business," said Ankur Goyal, CEO of Impira. Impira is a small, fast-moving team. With Bigeye automating their previously manual spot checking, they get hours back to invest in their go-to-market effort, and the confidence that the decisions they make with their customer journey data are backed by fresh, reliable data.
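To make the uniqueness checks mentioned above concrete, here is a small sketch of what a duplicate metric measures. The usage_events table, event_id column, and SQLite connection are stand-ins invented for the example; Bigeye computes metrics like this automatically and applies Autothresholds rather than a hard-coded assertion.

```python
# Sketch of a duplicate (uniqueness) check, expressed as a plain query.
# Table and column names are invented; SQLite is only a stand-in engine.
import sqlite3

DUPLICATE_SQL = """
    SELECT COUNT(*) - COUNT(DISTINCT event_id)   -- extra rows sharing an event_id
    FROM usage_events
"""


def duplicate_count(conn: sqlite3.Connection) -> int:
    """Number of rows that violate the uniqueness expectation on event_id."""
    return conn.execute(DUPLICATE_SQL).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect("usage.db")           # placeholder database file
    assert duplicate_count(conn) == 0, "uniqueness expectation violated"
```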
Machine learning, Startup
Keeping third-party data fresh
SignalFire uses Bigeye to monitor millions of data points and fuel their investment strategies with reliable data.

About SignalFire: A data-driven venture capital firm
SignalFire is a venture capital firm built from the ground up with a data-driven investing strategy. The firm leverages millions of data points to make smart investment decisions. That data also powers Beacon Talent, SignalFire's innovative AI-based recruiting platform that tracks the world's top engineers, data scientists, product managers, designers, and business leaders.

Challenge: Where third-party data can't go stale
Much of the data SignalFire leverages comes from external sources. The data lands in a wide range of formats, often unstructured, and has varying degrees of quality. The data engineering team uses this wide variety of data to build Beacon Talent. Because those products are core to SignalFire's strategy, the firm maintains a high bar for data quality: any loss of data, degradation, or deviation from the norm must be resolved. Before implementing Bigeye, the data engineering team ran ad-hoc queries to check on potential issues with the data. But at the scale that SignalFire operates, the data engineering and data science teams needed tools that would allow them to trust all of the data, not just some of it.

Solution: Keeping data fresh with Bigeye's freshness monitoring
With Bigeye pointed at their MySQL database, SignalFire's data engineers have an easy way to continuously monitor their huge volumes of data. Bigeye is tailor-built to make monitoring easy at scale, enabling them to monitor the millions of data sets they ingest. Tony Ho, SignalFire's Director of Engineering, was able to deploy Bigeye's built-in freshness tracking metrics on the third-party data his team depends on. Bigeye's Autothresholds automatically detect and set alerting thresholds for him, without needing to configure the expected update timings for each data source. In the first week of implementation, Bigeye caught an issue that SignalFire said would normally have taken days to notice.

Result: 2 million+ data points validated automatically
Now that he has Bigeye watching data ingestion, Tony says he can sleep easy knowing that the data science team trusts the data they're working with. Whether it's an API token going stale or a vendor missing their scheduled update, when something doesn't show up on time, Tony knows about it before the rest of his team does. Slack notifications tell him when something goes wrong, and a quick look at his freshness metrics in Bigeye tells the rest of the story.
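The freshness metric at the heart of this setup is essentially "time since the newest row arrived." Below is a rough Python/MySQL sketch with an invented schema, table, and column; Bigeye tracks this automatically and sets the alert threshold via Autothresholds, so no such script is actually required.

```python
# Sketch of a freshness metric on MySQL: hours since the most recent row landed.
# Schema, table, and column names are invented; assumes the table is non-empty.
import pymysql

FRESHNESS_SQL = """
    SELECT TIMESTAMPDIFF(HOUR, MAX(ingested_at), NOW())
    FROM vendor_feeds.company_signals
"""


def hours_since_last_load(conn_params: dict) -> int:
    """Age of the newest row, in hours."""
    conn = pymysql.connect(**conn_params)
    try:
        with conn.cursor() as cur:
            cur.execute(FRESHNESS_SQL)
            return cur.fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Alert (for example, to Slack) if a vendor missed its scheduled update.
    params = {"host": "db.internal", "user": "readonly", "password": "...", "database": "vendor_feeds"}
    if hours_since_last_load(params) > 24:
        print("freshness alert: vendor feed is stale")
```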
FinServ, Growth
A Strong Data Culture Gets Stronger with Trusted Data
With Bigeye, Udacity has one place to understand data quality and has reduced detection times from 3+ days to under 24 hours.

About Udacity: Better learning with data
More than 1.6 million people are advancing their careers with Udacity's programs. The e-learning school offers massive open online courses on everything from data science and AI to product management and cybersecurity. Supporting that growing customer base is a strong data culture that needs high-quality data for decision making and data science.

Challenge: Scattered testing
Maintaining the reliability of important data pipelines — including data from microservice-based systems and event-based data — is critical for supporting Udacity's business analysts and data scientists. Before implementing Bigeye, Udacity's data engineering team had started tracking data quality with tools like Airflow and Apache Spark. But the team wasn't satisfied with the scattered coverage these tools provided; they also require a non-trivial amount of effort to configure and don't provide a way to detect the "unknown unknowns" that can cause trouble for data pipelines. Relying solely on these tests made it difficult to catch subtle but important warning signs, like outliers.

Solution: Automating broad coverage
With Bigeye, detection times are down from 3+ days to under 24 hours, and the Udacity data engineering team has a single place to understand the quality of their data pipelines. They're able to monitor hundreds of datasets in a cloud-native data lake with better coverage and faster detection than the testing approach had achieved. The data engineering team is automatically alerted when there are issues with their data pipelines, without needing to inspect tests scattered across multiple tools. And because Bigeye's observability approach makes it simple to get broad coverage, they can catch issues they wouldn't have thought to write tests for. This allows the team to proactively address issues before business analysts or data scientists are affected, reducing overhead and building greater trust in the underlying data that fuels Udacity's data culture.

Result: Data pipelines that Simon can be confident in
"I've been in data for many years. As a data engineer, the biggest embarrassment is when the customer discovers that something has gone wrong and says, 'What has happened here?' With Bigeye, I am confident that I have the ultimate answer as to whether the data is good or not." Now that Simon has Bigeye monitoring his data pipelines, he knows that he'll be the first to know about any issues in Udacity's data, not his customers. The fear of an outlier slipping through undetected and causing a great deal of damage is gone. If something happens with the data, Simon knows about it before it affects anyone downstream.
Enterprise, e-learning
RFI: Data Observability Platforms
Use this RFI template to quickly evaluate data observability vendors. It contains a comprehensive checklist of features, and makes it easy to add additional criteria to score vendors on, so your team can make the best decision.
Data observability
Data quality survey template
Prior to making an investment in data observability products or data quality initiatives, companies often want to conduct a data quality survey. Below, we present a template for such a survey that can be filled out for your own use.

Understanding Bigeye Autothresholds
A detailed overview of how Bigeye Autothresholds alert data teams to the issues that matter most.
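As background for this overview: the core idea behind dynamic thresholds is that alert bounds are derived from a metric's own recent history instead of being fixed by hand. The sketch below is an illustrative rolling mean plus or minus k standard deviations, not Bigeye's actual Autothresholds algorithm; the variable names and the example row counts are invented.

```python
# Illustrative-only sketch of dynamic alert thresholds: bounds adapt to the
# metric's recent history instead of being hard-coded.
from statistics import mean, stdev


def dynamic_bounds(history: list[float], k: float = 3.0) -> tuple[float, float]:
    """Lower/upper alert bounds derived from the metric's recent values."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(history: list[float], latest: float, k: float = 3.0) -> bool:
    low, high = dynamic_bounds(history, k)
    return not (low <= latest <= high)


if __name__ == "__main__":
    row_counts = [10_120, 10_340, 9_980, 10_410, 10_050, 10_220, 10_300]
    print(is_anomalous(row_counts, latest=6_500))   # True: a sudden drop in volume
```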
Anomaly detection
Reliability at scale for highly-sensitive industries
About Crux: Solving tough data problems at scale
Crux solves some of the toughest data problems facing companies today. Customers turn to Crux's team to accelerate the ingestion of data from hundreds of suppliers by removing obstacles and helping companies reach their business goals. From ingesting data from multiple sources, to processing it into standardized formats, to delivering data where it needs to go, Crux tackles complex, multi-faceted data problems at huge scale: managing 50,000+ data pipelines across 160+ different sources. Adding to the complexity, Crux predominantly operates in highly sensitive industries, such as financial services, where any issue in the ingestion or delivery of data poses a significant risk.

Challenge: A best-in-class solution
The Crux team is made up of data engineers, analysts, and scientists who are experts at accelerating the flow of data. Because of the level of complexity, the scale, and the high stakes for their customers, the team demands best-in-class solutions — any loss of reliability could mean increased risk. When evaluating a data observability solution to complement their core capabilities, Crux needed one built for scale that was highly reliable and customizable enough to work within their data engineering workflows and multiple delivery destinations.

Solution: Customizable, scalable, and reliable data observability
With Bigeye, Crux found a data observability platform that could keep up in all of the areas that matter most:

Designed for data engineers: Crux is a team with deep expertise and needed a data observability solution that matched the depth of that expertise. Bigeye enables Crux's data engineers to customize to their heart's content with full API access and templates for creating custom metrics.

Built for scale: With 50,000+ pipelines, scale is the name of the game at Crux, and Bigeye delivers performance at their scale without racking up a huge infrastructure bill.

Reliable: If Crux's customers experience issues, they could lose revenue or be exposed to risk. The Crux team stress-tested Bigeye's performance and came out confident in the platform's reliability under pressure.

Result: Pressure-tested, data engineer-approved data observability
"Bigeye brings scale and reliability to data quality monitoring, helping us meet the needs of our customers with a platform that balances automation and flexibility," said Jason Taylor, Head of Applied Innovation at Crux. With Bigeye, Crux has a platform that can grow with their data engineering team, complement their core strengths with best-in-class data observability, and be relied upon even when facing the most complex data challenges.
Growth, Software
The data observability dictionary
A printable asset of the key definitions, differentiations, and components of data observability that you'll encounter in the wild.

Evaluating data observability platforms
This guide distills thousands of hours of work from dozens of experts and customers into the best way to run a thorough data observability evaluation. Check it out!

Bringing Reliability to CI/CD Data Pipelines
Mayan uses Bigeye to validate data pipeline merges and deliver self-serve analytics dashboards at scale.

About Mayan: Delivering powerful analytics to Amazon sellers
Mayan provides Amazon sellers with powerful self-service analytics dashboards—giving them deep insights into how to optimize ad spend and more effectively grow their business. Mayan powers its advanced analytics with a modern data stack that includes Snowflake as the primary data warehouse, Airbyte and Fivetran for extraction, dbt for transformation, and Looker to create and embed custom client dashboards. Mayan handles the scale and complexity of their operation by running all data model changes through a data-as-code continuous integration (CI) process with GitLab.

Challenge: Building trust in data pipelines
Mayan's small centralized data engineering team sought to optimize their data model testing process and provide self-service access to data analysts by implementing a CI data pipeline with GitLab and dbt. While this data-as-code approach helped simplify and streamline data model testing, the team lacked confidence in the results due to the frequency of bugs in dbt jobs and their inability to validate a successful merge or pinpoint the cause of a failure. Each time a test failed, the engineering team had to go through a slow, manual debugging process, and data analysts didn't have visibility into what changes would help their merge request succeed. As a result, data merges took an average of 4 to 5 days to complete, with some taking over two weeks. Mayan needed a way to increase throughput and reduce toil on the team by monitoring the success or failure of tests and immediately identifying the point of failure for fast, easy resolution.

Solution: Blue-green deployment tests
After evaluating several solutions, Mayan selected Bigeye's data observability platform to help monitor and identify issues in data pipeline CI tests. With Bigeye, Mayan is now able to use blue-green testing to compare the staging and production tables and get instant insight into whether the ELT job did what they expected. If there's an issue, the engineering team can now identify the exact point of failure and correct it—no manual debugging required. In addition, the team now has a historical view of merge performance over time, allowing them to track trends and provide feedback to analysts on changes they can make to help ensure their merge requests pass the first time.

Results: 60% reduction in time to deploy and insights into the health of data pipelines over time
By implementing Bigeye for data model testing validation, Mayan was able to reduce the average time to merge updates into a data model from 4-5 days down to 1. This 60% reduction in time to push changes allows data analysts to move faster, increases confidence in the data, and frees up the data engineering team to focus on high-value projects instead of chasing ETL bugs. Looking forward, the Mayan team plans to use Bigeye anomaly detection to observe the health of their analytics data and monitor the quality of training data being fed into ML applications.
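A blue-green comparison of this sort boils down to checking that the candidate (staging) table agrees with production on key aggregates before a merge is allowed. The sketch below works against any DB-API cursor into the warehouse; the table names, the compared metrics, and the 1% tolerance are invented purely for illustration, and Bigeye's comparison features provide this without hand-written queries.

```python
# Sketch of a blue-green style check between a production table and its staging
# counterpart after a dbt run in CI. Table names, metrics, and tolerance are
# invented for illustration only.
COMPARE_SQL = """
    SELECT
        (SELECT COUNT(*)      FROM analytics.orders)          AS prod_rows,
        (SELECT COUNT(*)      FROM analytics_staging.orders)  AS staging_rows,
        (SELECT AVG(ad_spend) FROM analytics.orders)          AS prod_avg_spend,
        (SELECT AVG(ad_spend) FROM analytics_staging.orders)  AS staging_avg_spend
"""


def merge_is_safe(cursor, tolerance: float = 0.01) -> bool:
    """Fail the CI job if staging drifts from production by more than the tolerance."""
    cursor.execute(COMPARE_SQL)
    prod_rows, staging_rows, prod_avg, staging_avg = cursor.fetchone()
    row_drift = abs(staging_rows - prod_rows) / max(prod_rows, 1)
    spend_drift = abs(staging_avg - prod_avg) / max(abs(prod_avg), 1e-9)
    return row_drift <= tolerance and spend_drift <= tolerance
```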
Growth, Services
The Data Observability Team Deployment Guide
In this guide, we have taken our collective learnings and created a path to building observability according to the unique structure of your data team.

Eckerson: Deep Dive on Data Quality Automation
Traditional techniques for ensuring data quality break at scale. As organizations deal with the ever-growing volume and velocity of data, data engineering teams can’t keep up, leading business users to view their reports and dashboards with skepticism.
Data quality
451 Research: Data quality's contemporary disruption via engineering methodology
Borrowing concepts from software engineering, data observability applies observability principles to the data layer itself, primarily to ensure downstream data quality and integrity.
Data quality
451 Research: Bigeye doesn't blink on data observability, bringing in fresh funding
From 451: 'Data observability' specialist Bigeye is treating data quality more like a continuous engineering and monitoring challenge rather than an ad hoc remediation challenge.
Data observability
The Field Guide to Trustworthy Data in Snowflake
In this guide, we have taken our collective learnings and created a path to building trust in your Snowflake data.