With Bigeye, Udacity has one place to understand data quality and has reduced detection times from 3+ days to under 24 hours.
About Udacity: Better learning with data
More than 1.6 million people are advancing their careers with Udacity’s programs. The e-learning school offers massive open online courses on everything from data science and AI to product management and cybersecurity. Supporting that growing customer base is a strong data culture that needs high-quality data for decision making and data science.
Challenge: Scattered testing
Maintaining the reliability of important data pipelines — including data from microservice-based systems and event-based data — is critical for supporting Udacity’s business analysts and data scientists.
Before implementing Bigeye, Udacity’s data engineering team had started tracking data quality with tools like Airflow and Apache Spark. But the team wasn’t satisfied with the scattered coverage these tools provided. They also require a non-trivial amount of effort to configure and don’t provide a way to detect the “unknown unknowns” that can cause trouble for data pipelines. Relying solely on these tests made it difficult to catch subtle but important warning signs, like outliers.
Solution: Automating broad coverage
With Bigeye, detection times are down from 3+ days to under 24 hours, and the Udacity data engineering team has a single place to understand the quality of their data pipelines. They’re able to monitor hundreds of datasets in a cloud-native data lake with better coverage and faster detection than the testing approach had achieved.
The data engineering team is automatically alerted when there are issues with their data pipelines without needing to inspect tests scattered across multiple tools. And because Bigeye’s observability approach makes it simple to get broad coverage, they can catch issues they wouldn’t have thought to write tests for. This allows the team to proactively address issues before business analysts or data scientists are affected, reducing overhead and building greater trust in the underlying data that fuels Udacity’s data culture.
Result: Data pipelines that Simon can be confident in
“I’ve been in data for many years. As a data engineer, the biggest embarrassment is when the customer discovers that something has gone wrong and says, ‘What has happened here?’ With Bigeye, I am confident that I have the ultimate answer as to whether the data is good or not.”
Now that Bigeye is monitoring his data pipelines, Simon knows that he, not his customers, will be the first to learn about any issues in Udacity’s data. The fear of an outlier slipping through undetected and causing a great deal of damage is gone. If something happens with the data, Simon knows about it before it affects anyone downstream.