Bigconfig empowers data teams to implement data reliability at scale
Modern data teams manage their infrastructure, pipelines, and analytics tools as code. Bigeye has partnered with dozens of top data teams to create Bigconfig, a declarative, YAML-based monitoring-as-code solution that lets teams continue this convention by integrating data monitoring as code into their workflows.
Bigconfig provides Bigeye customers with the scalability, repeatability, and governance that enterprise data engineering teams need. It comes with conveniences like saved metrics, dynamic tags with wildcard asset identifiers, and Autothresholds. Our solution lets data teams apply standardized monitoring across thousands of tables to ensure full coverage and early detection of incidents.
This post will help your team get started. It covers creating your first Bigconfig, tips to maximize coverage, and details on advanced features.
Monitor business-critical datasets with table deployments
The simplest way to get started with Bigconfig is with table deployments. Most data teams have a handful of business-critical datasets: tables that power reports for the executive team, or training data for a production pricing model. We recommend using a table deployment in Bigconfig to create and manage metrics on these datasets. Table deployments let you control and customize these metrics with fine-grained precision, so even subtle changes in your data's patterns are detected.
The good news is that Bigconfig makes it easy to customize what you want and automatically configures the rest.
Start with table metrics to ensure the dataset is updating on time with expected row counts.
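As a rough illustration of what a freshness check measures, the Python sketch below computes hours since last load from a load timestamp. It assumes you already have the table's last load time as a timezone-aware datetime (for example, from warehouse metadata); Bigconfig computes this metric for you, so the function name and inputs here are hypothetical.

```python
from datetime import datetime, timezone


def hours_since_last_load(last_loaded_at, now=None):
    """Hours elapsed since the table last received data.

    last_loaded_at is assumed to be a timezone-aware datetime, e.g. taken
    from warehouse metadata (hypothetical input, for illustration only).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_loaded_at).total_seconds() / 3600


last_load = datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc)
check_time = datetime(2024, 5, 1, 18, 30, tzinfo=timezone.utc)
print(hours_since_last_load(last_load, check_time))  # 12.5
```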
Then apply column-level metrics to ensure the accuracy of the data. You can choose from Bigeye's 60+ predefined metrics and customize them at the field level by filtering rows with conditions, grouping by relevant dimensions, or defining custom thresholds. You don't have to define every attribute by hand, though: Bigconfig automatically applies workspace defaults to any you leave unspecified. For example, Autothresholds automatically analyze a metric's history and alert you to anomalies, so there's no need to determine acceptable ranges yourself.
If your table loads incrementally, we recommend setting a row creation time so that metrics are computed over a window of recently created rows. This optimizes query performance and improves anomaly detection.
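To give a feel for how history-based thresholds work, here is a minimal sketch that flags a value falling outside the mean plus or minus k standard deviations of past values. This z-score rule is a simple stand-in chosen for illustration; it is not Bigeye's Autothresholds algorithm, which this post does not describe, and the row counts are made up.

```python
from statistics import mean, stdev


def z_score_bounds(history, k=3.0):
    """Return (lower, upper) bounds as mean +/- k standard deviations.

    A deliberately simple stand-in for learned thresholds; NOT Bigeye's
    Autothresholds algorithm.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    return mu - k * sigma, mu + k * sigma


def is_anomalous(history, value, k=3.0):
    """Flag a new metric value that falls outside the bounds from history."""
    lower, upper = z_score_bounds(history, k)
    return not (lower <= value <= upper)


# Hypothetical daily row counts for one table.
row_counts = [10_120, 10_340, 9_980, 10_210, 10_400, 10_050, 10_290]
print(is_anomalous(row_counts, 10_180))  # False: within the usual range
print(is_anomalous(row_counts, 4_500))   # True: a sharp drop
```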
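The sketch below shows why windowing on row creation time helps: the count only considers rows created inside one window instead of the whole table. The table name, the created_at column, and the one-day window are hypothetical, and Bigeye generates its own SQL; this only illustrates the idea.

```python
from datetime import datetime, timedelta, timezone


def windowed_count_sql(table, created_at_col, window_end, window=timedelta(days=1)):
    """Build a row-count query restricted to one window of row creation time.

    table and created_at_col are hypothetical names. Identifiers are
    interpolated directly, so only pass trusted values.
    """
    window_start = window_end - window
    return (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE {created_at_col} >= '{window_start.isoformat()}' "
        f"AND {created_at_col} < '{window_end.isoformat()}'"
    )


end = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(windowed_count_sql("analytics.orders", "created_at", end))
```

Timestamp literal syntax varies by warehouse, so a real implementation would need to adapt the ISO strings accordingly.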
Finally, if you find yourself repeating the same metric across multiple columns, save it in saved_metric_definitions. Saved metrics stay consistent across datasets, and you can apply each one with a single line of code.
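Here is a minimal sketch of how a saved metric reference might resolve into a full definition by layering workspace defaults, the saved definition, and inline overrides. The key name saved_metric_id, the metric names, and the default values are all assumptions made for illustration, not Bigconfig's actual schema or defaults.

```python
# Hypothetical workspace defaults, applied to any attribute a metric omits.
WORKSPACE_DEFAULTS = {"threshold": "autothreshold", "lookback_days": 7}

# Hypothetical saved definitions, keyed by ID so they can be reused anywhere.
SAVED_METRICS = {
    "id_not_null": {"metric_type": "percent_null"},
    "id_unique": {"metric_type": "duplicate_count"},
}


def resolve_metric(ref):
    """Expand a metric reference into a full definition.

    Precedence, lowest to highest: workspace defaults, then the saved
    definition, then any inline overrides on the reference itself.
    """
    resolved = dict(WORKSPACE_DEFAULTS)
    saved_id = ref.get("saved_metric_id")
    if saved_id is not None:
        resolved.update(SAVED_METRICS[saved_id])
    resolved.update({k: v for k, v in ref.items() if k != "saved_metric_id"})
    return resolved


print(resolve_metric({"saved_metric_id": "id_not_null"}))
print(resolve_metric({"saved_metric_id": "id_unique", "lookback_days": 30}))
```

Keeping the definition in one place means a change to a saved metric propagates to every column that references it.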
Implement broad, standardized monitoring across your warehouse with tag deployments
Once you've monitored business-critical tables with table deployments, use tags to extend broad coverage across all datasets. Tags let you deploy metrics across your warehouse consistently and automatically, so your data observability scales with your data.
For example, we recommend tracking the consistency of table updates with hours-since-last-load and row count checks on all tables. Primary key and ID fields should be monitored for NULLs, duplicates, and proper formatting in every table, and emails and other contact information can be monitored the same way. Finally, make sure critical KPIs have distribution checks to catch anomalies.
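The checks recommended for ID fields can be sketched in plain Python over an in-memory list of values. The ID format (three capital letters, a hyphen, six digits) is a hypothetical convention; in Bigeye these checks run as metrics against the warehouse rather than in application code.

```python
import re


def check_id_column(values, pattern=r"[A-Z]{3}-\d{6}"):
    """Count NULLs, duplicates, and badly formatted IDs in one column.

    The default pattern is a hypothetical ID format; substitute your own.
    """
    regex = re.compile(pattern)
    non_null = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(non_null),
        # Each repeat of an already-seen value counts as one duplicate.
        "duplicate_count": len(non_null) - len(set(non_null)),
        "bad_format_count": sum(1 for v in non_null if not regex.fullmatch(v)),
    }


ids = ["ORD-000001", "ORD-000002", None, "ORD-000002", "ord-3"]
print(check_id_column(ids))
# {'null_count': 1, 'duplicate_count': 1, 'bad_format_count': 1}
```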
To do this, first define your tags with a list of column selectors. Tag definitions are designed to match common semantic standards in your warehouse. You can include wildcards to match columns dynamically across your entire warehouse, within a specific schema, or in any dataset with a given prefix or suffix.
Next, deploy metrics on these tags in tag_deployments. You can reuse saved metrics or define metrics inline as needed. Autothresholds train each metric on the specific dataset it monitors, so there's no need for tedious threshold definitions or maintenance.
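The sketch below models wildcard column selectors with Python's fnmatch over fully qualified names of the form warehouse.schema.table.column. The tag names, selectors, and naming layout are hypothetical, and Bigconfig's own matching rules may differ; note that fnmatch's * also matches dots, so it is looser than a segment-by-segment matcher.

```python
from fnmatch import fnmatchcase

# Hypothetical tag definitions: each tag is a list of wildcard selectors over
# fully qualified names of the form warehouse.schema.table.column.
TAG_DEFINITIONS = {
    "ID_FIELDS": ["*.*.*.id", "*.*.*.*_id"],
    "EMAIL_FIELDS": ["*.*.*.*email*"],
    "STAGING_TABLES": ["*.staging.*.*"],
}


def columns_for_tag(tag, columns):
    """Return every column whose name matches at least one of the tag's selectors."""
    selectors = TAG_DEFINITIONS[tag]
    return [c for c in columns if any(fnmatchcase(c, s) for s in selectors)]


columns = [
    "prod.sales.orders.id",
    "prod.sales.orders.customer_id",
    "prod.crm.contacts.work_email",
    "prod.staging.orders_raw.amount",
]
print(columns_for_tag("ID_FIELDS", columns))
# ['prod.sales.orders.id', 'prod.sales.orders.customer_id']
```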
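To show how a tag deployment fans out, this sketch expands each deployed tag into concrete (column, metric) pairs. The tag, metric names, and data structures are hypothetical and only mirror the concepts described in this post.

```python
from fnmatch import fnmatchcase

# Hypothetical tag definitions and deployments; names are illustrative only.
TAGS = {"ID_FIELDS": ["*.*.*.id", "*.*.*.*_id"]}
TAG_DEPLOYMENTS = [
    {"tag": "ID_FIELDS", "metrics": ["percent_null", "duplicate_count"]},
]


def expand_deployments(columns):
    """Turn tag deployments into a sorted list of (column, metric) pairs.

    A set is used so a column matched by several tags gets each metric once.
    """
    planned = set()
    for deployment in TAG_DEPLOYMENTS:
        selectors = TAGS[deployment["tag"]]
        for column in columns:
            if any(fnmatchcase(column, s) for s in selectors):
                for metric in deployment["metrics"]:
                    planned.add((column, metric))
    return sorted(planned)


cols = ["prod.sales.orders.id", "prod.sales.orders.amount", "prod.sales.refunds.order_id"]
for pair in expand_deployments(cols):
    print(pair)
```

Because the expansion is recomputed from the selectors, a new table whose columns match would pick up the same metrics on the next run, which is what lets coverage grow with the warehouse.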
Integrate Bigconfig into your DevOps Workflow
Implementing monitoring as code makes it easy to integrate Bigconfig into your existing workflows. First, we recommend versioning your Bigconfig YAML in git so that changes are governed by pull request reviews and approvals. You can then automate Bigconfig by defining tasks in the CI/CD tool of your choice, such as GitHub Actions, Bamboo, or Jenkins. For example, you could create a GitHub Actions workflow that:
- Automatically runs a Bigconfig Plan after a pull request is made for changes to your Bigconfig YAML files
- Automatically runs a Bigconfig Apply after that pull request is approved and merged.
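The plan-then-apply flow above comes down to a small decision on each pull request event. The sketch models it in Python using GitHub's pull_request activity types (opened, synchronize, reopened, closed); in practice you would express this with workflow triggers, and the step that actually invokes a Bigconfig plan or apply is deliberately left out rather than guessing at CLI flags.

```python
def ci_action(event, merged=False):
    """Map a pull request event to a Bigconfig step (illustrative only).

    "plan" previews changes for reviewers whenever the PR is opened or
    updated; "apply" deploys them only once the PR has been merged.
    """
    if event in {"opened", "synchronize", "reopened"}:
        return "plan"
    if event == "closed" and merged:
        return "apply"
    # Closed without merging, or any other event: do nothing.
    return None


print(ci_action("opened"))               # plan
print(ci_action("closed", merged=True))  # apply
```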
Finally, you could automatically run Plan and Apply when other code files, like dbt YAML, are released by triggering tasks off changes to those files. This ensures that metrics are automatically enabled on new tables and views.
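Triggering on other files can be sketched as a path filter over the files a change touches. The directory layout and glob patterns below are hypothetical; with fnmatch, * also matches /, so dbt/models/*.yml covers nested model folders as well.

```python
from fnmatch import fnmatchcase

# Hypothetical repository layout; adjust the globs to match your own.
TRIGGER_PATTERNS = [
    "bigconfig/*.yml",
    "bigconfig/*.yaml",
    "dbt/models/*.yml",
]


def should_run_bigconfig(changed_files):
    """Return True if any changed file should trigger a Bigconfig plan/apply."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in TRIGGER_PATTERNS
    )


print(should_run_bigconfig(["dbt/models/marts/schema.yml"]))  # True
print(should_run_bigconfig(["README.md"]))                    # False
```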
Stay tuned for our next blog, where we'll further discuss using the Bigeye CLI to integrate data observability into your CI/CD data pipeline.
Schema change detection