Product

May 23, 2023

Monitoring Stripe data with Bigeye

Liz Elfman

Many companies rely on Stripe data to make key business decisions. One way to reinforce the quality of this data is to use a data observability tool. In this blog post, we'll demonstrate how to quickly spin up deep data monitoring on your critical Stripe datasets with Bigeye.

Understanding Stripe datasets

Depending on which Stripe product you use and how you’re using it, you'll produce different types of raw Stripe tables. For example, a SaaS company might use the Stripe subscription product, while an e-commerce company might use Stripe to accept payment for orders. A lending company might use Stripe for credit notes. Below, we show how raw Stripe data might look as it arrives in your data warehouse through an ETL tool like Fivetran.

It's common to ingest raw Stripe data into your data warehouse or lake, then have your data engineering team perform transformations and grant your BI teams access for analysis. Incorrect financial data reaching your BI teams can lead to disastrous business decisions.

There are two critical tables for every organization that uses Stripe:

Balance transaction: a running log of every transaction hitting your Stripe account
Customer: a running log of all customer information

Other tables, containing data like payment method cards, payment intents, and payouts, may also be available as metadata on the account itself.

Generally speaking, you will do some preliminary cleanup of the raw Stripe data. Then, the data will be aggregated into higher-level tables that make business metrics like sales, revenue, and refunds easily accessible.

How to monitor Stripe data with Bigeye

We suggest taking a programmatic approach to deploying data monitoring on your critical datasets. Why?

Stripe tables are repetitive: many of the same columns appear across multiple tables. As a result, when deploying metrics against Stripe data, it’s useful to use Bigconfig, which allows you to specify that all columns with certain names should be monitored in a certain way.

Bigconfig

Bigconfig consists of two components: tag definitions and metric definitions. Tag definitions allow you to use selectors with wildcards to identify repetitive columns like transaction IDs, amounts, and currencies that occur across multiple tables. For example, a tag definition named OBJECT_IDS can state that any column whose name matches the form SAAS*.STRIPE_RAW.*.id belongs in the OBJECT_IDS tag.
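To make the wildcard idea concrete, here is a minimal Python sketch of how a selector such as SAAS*.STRIPE_RAW.*.id could be matched against fully qualified column names. It only illustrates the concept and is not Bigeye's implementation: the AMOUNTS tag, the function names, and the segment-by-segment matching rule are assumptions made for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical tag definitions: tag name -> list of wildcard selectors.
# OBJECT_IDS comes from the text above; AMOUNTS is invented for illustration.
TAG_SELECTORS = {
    "OBJECT_IDS": ["SAAS*.STRIPE_RAW.*.id"],
    "AMOUNTS": ["SAAS*.STRIPE_RAW.*.amount*"],
}


def selector_matches(selector: str, column_path: str) -> bool:
    """Match a dotted selector against a dotted column path, one segment at a time."""
    pattern_parts = selector.split(".")
    path_parts = column_path.split(".")
    # Assumption: a selector must have the same number of segments as the path.
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for pat, part in zip(pattern_parts, path_parts))


def tags_for(column_path: str) -> list[str]:
    """Return every tag whose selectors match the given column path."""
    return [
        tag
        for tag, selectors in TAG_SELECTORS.items()
        if any(selector_matches(s, column_path) for s in selectors)
    ]


if __name__ == "__main__":
    for path in (
        "SAAS_PROD.STRIPE_RAW.CUSTOMER.id",
        "SAAS_PROD.STRIPE_RAW.REFUND.amount",
        "SAAS_PROD.STRIPE_RAW.CUSTOMER.email",
    ):
        print(path, "->", tags_for(path))
```

Matching segment by segment keeps a wildcard from spilling across schema or table boundaries, which is one reasonable reading of how such selectors are meant to behave.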

Metric definitions allow you to apply certain kinds of metrics to each tag, without having to enable the appropriate metrics column by column. This simplifies your monitoring deployment.

Metric definitions also allow you to create custom metrics. For instance, you might want to track your refund data, grouping it by currency and calculating the average refund per currency.
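As a rough illustration of what such a custom metric computes, the Python sketch below groups refunds by currency and averages the amounts. The sample rows and the function name are made up for this example, and amounts are assumed to be integers in each currency's smallest unit (for example, cents). In practice the calculation would run as a query in your warehouse rather than in application code.

```python
from collections import defaultdict

# Made-up refund rows; amounts are assumed to be in the smallest currency unit.
refunds = [
    {"id": "re_1", "currency": "usd", "amount": 1000},
    {"id": "re_2", "currency": "usd", "amount": 3000},
    {"id": "re_3", "currency": "eur", "amount": 500},
]


def average_refund_by_currency(rows):
    """Return a dict mapping each currency to its average refund amount."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        totals[row["currency"]] += row["amount"]
        counts[row["currency"]] += 1
    return {currency: totals[currency] / counts[currency] for currency in totals}


if __name__ == "__main__":
    print(average_refund_by_currency(refunds))
```

Tracking the average per currency, rather than one global average, avoids mixing amounts that are not comparable.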

Once you have defined your tags and your metrics, you can apply certain metrics to certain tags in a deployment.

In Bigconfig, you can also create collections to organize the tag and metric definitions. For example, you might create one collection for all sales data, one for customer data integrity, and another for general data integrity. Each collection defined in the Bigconfig file maps to a collection in the Bigeye UI. Each collection can be configured to send notifications to specific individuals.

What metrics should you monitor on your Stripe data?

You’ll notice that in the UI above, we’ve created several collections. These collections roughly correspond to three types of metrics that we suggest you monitor on your Stripe data:

General data integrity/Balance transaction integrity metrics: These collections include checks that all transaction IDs are unique, non-null, and matching across multiple tables.

Accurate sales data metrics: These collections include checks that numerical values like number of sales made, number of refunds issued, chargebacks, invoices, fees, etc., all make sense. We recommend grouping these metrics by currency to ensure each currency is operating correctly.

Customizable metrics: Finally, Bigeye’s metric templates allow you to define custom checks. For example, maybe one of your Stripe datasets contains some JSON data. Checking only whether that JSON data is empty might not give granular enough guarantees that specific key/value pairs are present. Instead, with Bigeye, you can define queries that expand the JSON data into specific columns so that you can check specific values.
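The logic behind two of these checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Bigeye evaluates metrics: the function names, the sample IDs, and the JSON keys are all hypothetical. The first function covers the integrity checks (null, duplicate, and unmatched IDs); the second expands a JSON value and reports which required keys are absent or null.

```python
import json
from collections import Counter


def id_integrity_report(ids, reference_ids):
    """Summarize null, duplicate, and unmatched IDs for one ID column."""
    non_null = [i for i in ids if i is not None]
    counts = Counter(non_null)
    return {
        "null_count": len(ids) - len(non_null),
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
        # IDs present in this column but absent from the table it should match.
        "missing_from_reference": sorted(set(non_null) - set(reference_ids)),
    }


def missing_json_keys(raw_json, required_keys):
    """Expand a JSON string and list the required keys that are absent or null."""
    data = json.loads(raw_json) if raw_json else {}
    if not isinstance(data, dict):
        data = {}
    return [key for key in required_keys if data.get(key) is None]


if __name__ == "__main__":
    print(id_integrity_report(["txn_1", "txn_2", "txn_2", None], ["txn_1"]))
    print(missing_json_keys('{"order_id": "A1", "plan": null}', ["order_id", "plan", "region"]))
```

A plain emptiness check would pass the JSON value in the usage example, even though two of the three required keys carry no usable value.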

Step-by-step instructions for monitoring your Stripe data

Add the data warehouse that contains your Stripe data as a data source in Bigeye.
Copy the Stripe Bigconfig template into a local YAML file.
Follow the instructions in the recipe to add your own custom tag definitions and metric definitions.
Follow the instructions here to install the Bigconfig CLI tool.
Apply the Bigconfig to your dataset using a single CLI command.

Conclusion

Monitoring your Stripe data with Bigeye helps ensure the integrity of your financial data. Bigconfig makes the process easy: you define monitoring once, apply it across repetitive columns, and get notified when issues arise. As we continue to expand our turnkey coverage of other popular SaaS data sources, like HubSpot, we look forward to offering even more low-friction service to our users.

If you would like to try this out, contact us now. You just need a read-only account with access to your Stripe data, or you can use our dummy Stripe datasets to give it a spin!
