Engineering
May 15, 2023

Data mesh: A people challenge, not a technical challenge

Data mesh is fundamentally NOT a technical issue. Figure out your people and your processes; your data governance comes together from there. Let's explore.

Egor Gryaznov

Creating a decentralized data architecture (technically speaking, a "data mesh") sounds, for all intents and purposes, like a technical challenge. How do you organize, store, and give access to data by specific business domains, so that ownership and accountability rest with the data's producers?

Data mesh is a great approach to decentralizing data governance, but data governance isn't really a technical issue. It's all about the processes around managing the data: documentation, data reliability, quality, and understanding. A data mesh isn't just about technology; it's about the teams and people. Creating a data mesh is a people and process problem, and it should be treated as such.

What is a data mesh?

Data mesh and data contracts share a similar origin story. A data mesh is a distributed data architecture in which business units own and publish their data (their "domain") for others to consume. Organizations adopt a data mesh to eliminate the bottlenecks that come with centralizing data from many systems in one place. By distributing data across domains, the theory goes, teams can evolve their own datasets more easily, and consumers can more easily self-serve data from a specific domain.

Data contracts are similar: they are agreements between data producers and data consumers that spell out who is responsible for the data and how it is intended to be used.
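To make the idea concrete, here is a minimal sketch of a data contract expressed in code, assuming a contract is simply a named set of fields and types that the producing team promises to deliver. The `DataContract` class, the `orders` dataset, the `checkout-team` owner, and the field names are hypothetical illustrations, not a standard format or any specific tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Hypothetical data contract: the producer promises these fields and types."""
    dataset: str
    owner: str  # the producing team accountable for the data
    fields: dict[str, type] = field(default_factory=dict)

    def violations(self, record: dict) -> list[str]:
        """Return the problems with a single record (an empty list if it conforms)."""
        problems = []
        for name, expected in self.fields.items():
            if name not in record:
                problems.append(f"missing field '{name}'")
            elif not isinstance(record[name], expected):
                actual = type(record[name]).__name__
                problems.append(f"field '{name}' is {actual}, expected {expected.__name__}")
        return problems


orders_contract = DataContract(
    dataset="orders",
    owner="checkout-team",
    fields={"order_id": str, "amount_cents": int, "currency": str},
)

good = {"order_id": "A-1", "amount_cents": 1999, "currency": "USD"}
bad = {"order_id": "A-2", "amount_cents": "19.99"}  # wrong type, missing currency

print(orders_contract.violations(good))  # []
print(orders_contract.violations(bad))   # two violations
```

Note that the code only detects a violation and names an owner; someone on that team still has to care enough to fix it.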

Both concepts come down to the same principle: people need processes in place that force them to care about their data output, because individual product teams generally don't want to own their data or be held accountable for it.

Creating a culture of caring about data

If nobody wants to care about their data output and how it impacts other teams, who is responsible for that accountability? The culture of caring about data needs to start from within the product and engineering organization.

It may need to come from the top down, for example from the VP of Engineering or the CTO. A powerful message from leadership sounds like this: "If your data is broken, I will come directly to your team, not the data team. It is on you to fix it."

When accountability is the mandate, you can create a culture wherein teams care about the data messes they make. Clean up a huge mess a few times, and you’re a lot less likely to create such a big mess in the first place.
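In tooling terms, that mandate might translate into routing data incidents to the producing team rather than to the central data team. This is a minimal sketch, assuming a hypothetical ownership registry (`OWNERS`) that maps each dataset to its owning team; the team names, the `Incident` type, and the fallback rule are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """A detected data problem, e.g. from a freshness or schema check."""
    dataset: str
    description: str


# Hypothetical ownership registry: each dataset maps to the team that produces it.
OWNERS = {
    "orders": "checkout-team",
    "users": "identity-team",
}

# Fallback when a dataset has no registered owner (itself an ownership gap to close).
CENTRAL_DATA_TEAM = "data-platform-team"


def route(incident: Incident) -> str:
    """Return the team that should be paged: the producing team, not the data team."""
    return OWNERS.get(incident.dataset, CENTRAL_DATA_TEAM)


print(route(Incident("orders", "amount_cents arrived as strings")))  # checkout-team
print(route(Incident("legacy_events", "table is 3 days stale")))  # data-platform-team
```

The routing rule is trivial on purpose: the hard part is getting each team to accept that the page is theirs.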

Data mesh is a people challenge

The promise of data mesh is that you can move faster when individual teams are responsible for their own domains. A data mesh promises smaller, cheaper data teams and self-service capabilities for individuals.

Data contracts promise your stakeholders clear definitions of what the data looks like and how it will change. But the existence of these agreements doesn't change the fact that if a team doesn't care, it will break the contract. If there are no consequences, nothing discourages the team from breaking it in the name of shipping new features faster.
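What "breaking the contract" means in practice can be sketched as a comparison between the schema a contract promises and the schema a producer is about to ship. This minimal sketch assumes a schema is a plain mapping from column name to type name, and treats removed columns and changed types as breaking while new columns are allowed; the `breaking_changes` function and those rules are assumptions for illustration, not any particular tool's behavior.

```python
def breaking_changes(contract: dict[str, str], proposed: dict[str, str]) -> list[str]:
    """List the ways `proposed` would break consumers relying on `contract`.

    Assumed rules: dropping a promised column or changing its type is breaking;
    adding a new column is not.
    """
    problems = []
    for column, col_type in contract.items():
        if column not in proposed:
            problems.append(f"removed column '{column}'")
        elif proposed[column] != col_type:
            problems.append(f"'{column}' changed from {col_type} to {proposed[column]}")
    return problems


promised = {"user_id": "string", "signup_date": "date", "plan": "string"}
# A feature team renames 'plan' and widens 'signup_date' to a timestamp.
shipped = {"user_id": "string", "signup_date": "timestamp", "plan_tier": "string"}

for problem in breaking_changes(promised, shipped):
    print(problem)
```

A check like this could run in CI and block a merge, but it is only as strong as the team's willingness to respect it: with no consequences, a failing check can simply be overridden or ignored.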

Data mesh and data contracts are great ways for central data teams to say, "Please stop dumping all your problems on me." Teams know their own data best; they work with it every day. So it makes sense that, given the right tools, they use them to manage their data and create reliable data sources. But the tools can't do that magically; your team has to care.

Left to their own devices, most individual teams care mainly about shipping faster, not about their schema changes or downstream impact. Only a culture of accountability, strongly enforced by management, unlocks the positive outcomes of data mesh.

Final thoughts

It's tempting to try to solve organizational problems with technology. We saw the same pattern with cloud warehouses, where teams en masse decided, "No one needs to manage servers and databases anymore!" In reality, Snowflake bills grew incredibly expensive as teams traded efficiency and performance for ease of use.

Tools alone are not a panacea. They won't automatically solve everything across the data landscape. If your teams aren't encouraged to think about data modeling and efficient query patterns, you'll move fast, but at the expense of cost and performance.

With a data mesh and data contracts, you can centralize much of the governance and tooling in the central data team and distribute ownership of the data to the rest of the organization. Enforcing those contracts, and making sure teams actually practice owning their data, is a different beast entirely, and you can't do it without a culture of accountability. Good luck!

Resource | Monthly cost per resource | Number of resources | Time (months) | Total cost
Software/Data engineer | $15,000 | 3 | 12 | $540,000
Data analyst | $12,000 | 2 | 6 | $144,000
Business analyst | $10,000 | 1 | 3 | $30,000
Data/product manager | $20,000 | 2 | 6 | $240,000
Total cost | | | | $954,000
Role | Goals | Common needs
Data engineers | Overall data flow. Data is fresh and operating at full volume. Jobs are always running, so data outages don't impact downstream systems. | Freshness + volume monitoring; Schema change detection; Lineage monitoring
Data scientists | Specific datasets in great detail. Looking for outliers, duplication, and other (sometimes subtle) issues that could affect their analysis or machine learning models. | Freshness monitoring; Completeness monitoring; Duplicate detection; Outlier detection; Distribution shift detection; Dimensional slicing and dicing
Analytics engineers | Rapidly testing the changes they're making within the data model. Move fast and not break things, without spending hours writing tons of pipeline tests. | Lineage monitoring; ETL blue/green testing
Business intelligence analysts | The business impact of data. Understand where they should spend their time digging in, and when they have a red herring caused by a data pipeline problem. | Integration with analytics tools; Anomaly detection; Custom business metrics; Dimensional slicing and dicing
Other stakeholders | Data reliability. Customers and stakeholders don't want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools; Reporting and insights
