Engineering

May 15, 2023

Data mesh: A people challenge, not a technical challenge

Egor Gryaznov

Creating a decentralized data architecture – technically speaking, a “data mesh” – sounds, for all intents and purposes, like a technical challenge. How do you organize, store, and give access to data by specific business domains, thereby giving ownership and accountability to its producers?

Data mesh is a great approach to decentralizing data governance, but data governance isn’t really a technical issue. It’s all about the processes around managing the data: documentation, data reliability, quality, and understanding. A data mesh isn’t just about technology; it’s about the teams and people. Creating a data mesh is a people and process problem, and should be treated as such.

What is a data mesh?

Data mesh and data contracts share a similar origin story. A data mesh is a distributed data architecture in which business units own and publish their data (their “domain”) for others to consume. Organizations implement a data mesh when they want to eradicate the bottlenecks associated with centralizing data from multiple systems. The theory is that distributing the data across multiple domains makes it easier for teams to evolve their own data sets and easier for consumers to self-serve data from a specific domain.

Data contracts are similar: they are agreements under which various data producers and data consumers agree on the intended responsibility and usage of data.
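To make the idea concrete, here is a minimal sketch of what such an agreement could look like in code. It is only an illustration, not a standard format: the `DataContract` class, the `violations` helper, and the `orders` dataset with its `checkout` owner are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical contract: who owns a dataset and what each record must contain."""
    dataset: str
    owner_team: str
    required_fields: dict  # field name -> expected Python type


def violations(contract, record):
    """Return readable contract violations for one record (empty list if none)."""
    problems = []
    for name, expected_type in contract.required_fields.items():
        if name not in record:
            problems.append(f"missing field '{name}'")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"field '{name}' should be {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return problems


# Illustrative contract: the (assumed) checkout team publishes the orders dataset.
orders_contract = DataContract(
    dataset="orders",
    owner_team="checkout",
    required_fields={"order_id": str, "amount_cents": int},
)

good = {"order_id": "A-1", "amount_cents": 1250}
bad = {"order_id": "A-2", "amount_cents": "12.50"}

print(violations(orders_contract, good))
print(violations(orders_contract, bad))
```

Note that the contract names an owning team alongside the schema. The check can tell you that a record is wrong, but it is the ownership field that says who is expected to fix it.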

Both of these concepts come down to the same principle: people need processes in place that force them to care about their data output. Left to themselves, individual product teams don’t want to claim ownership of their data or be held accountable for it.

Creating a culture of caring about data

If nobody wants to care about their data output and how it impacts other teams, who is accountable for it? The culture of caring about data needs to start from within the product and engineering organization.

It may need to come from the top down; for example, from the VP of Engineering or the CTO. A powerful message from leadership might be: “If your data is broken, I will come directly to your team, not the data team. It is on you to fix it.”

When accountability is the mandate, you can create a culture wherein teams care about the data messes they make. Clean up a huge mess a few times, and you’re a lot less likely to create such a big mess in the first place.

Data mesh is a people challenge

The promise of data mesh is that you can move faster when individual teams are responsible for their domains. A data mesh promises cheaper, smaller data teams and self-service capabilities to the individual.

Data contracts promise your stakeholders clear definitions for how data looks and how it changes. But the existence of these agreements doesn’t change the fact that if the team doesn’t care, they’ll break the contract. If there are no consequences, they have no incentive to honor it when breaking it lets them build new features faster.
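As a rough illustration of how a contract can at least surface a break, the sketch below compares a published schema with a proposed one and lists the changes that could hurt consumers. The `breaking_changes` function and the column names are assumptions made up for this example, and the rule it applies (removed columns and type changes are breaking, added columns are not) is one simple convention rather than a universal definition.

```python
def breaking_changes(old_schema, new_schema):
    """Compare two {column: type_name} mappings and list consumer-breaking changes."""
    changes = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            # Consumers reading this column would start failing.
            changes.append(f"removed column '{column}'")
        elif new_schema[column] != old_type:
            changes.append(
                f"column '{column}' changed type {old_type} -> {new_schema[column]}"
            )
    # Columns that only exist in new_schema are treated as non-breaking additions.
    return changes


# Hypothetical schemas for a user-signup dataset.
published = {"user_id": "string", "signup_ts": "timestamp", "plan": "string"}
proposed = {"user_id": "int", "signup_ts": "timestamp", "referrer": "string"}

for change in breaking_changes(published, proposed):
    print(change)
```

A check like this can run automatically before a release, but it only reports the problem. Whether the team stops to fix it or ships anyway is still a matter of incentives.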

Data mesh and data contracts are great ways for central data teams to say, “please stop dumping all your problems on me.” Teams know their own data best; they’re working with it every day. So it makes sense that, given the right tools, those teams use them to manage their data and create reliable data sources. But the tools can’t magically do that for you; your team has to care.

Individual teams mostly only care about shipping faster. They don’t care about their schema changes or downstream impact. Only a culture of accountability, which is strongly enforced by management, allows for the positive outcomes of data mesh.

Final thoughts

It’s tempting for people to make the mistake of attempting to solve organizational problems with technology. We saw the same pattern happen with cloud warehouses, where teams en masse decided, “No one needs to manage servers and databases anymore!” But in reality, the Snowflake bills grew incredibly expensive as teams sacrificed efficiency and performance for ease of use.

Tools are not panaceas on their own. They won’t automatically solve everything across the data landscape. If your teams aren’t encouraged to think about data modeling and efficient querying patterns, you’ll move fast, but at the expense of cost and performance.

You can centralize a lot of governance and tooling at the central data team and distribute the ownership of the data to the rest of the organization with a data mesh and data contracts. Enforcing the contracts and compliance with the practice of owning your data is a totally different beast, and you can’t do that without a culture of accountability. Good luck!

about the author

Egor Gryaznov

Co-founder and Field CTO, Bigeye

Egor Gryaznov is Bigeye’s Co-founder and Field CTO. Before starting Bigeye, he helped build and scale Uber’s data platform, including the pipelines behind its massive A/B testing system. He also kicked off Uber’s first SQL bootcamp, teaching hundreds of engineers how to get hands-on with data. Uber is also where Egor met his future co-founder, Kyle Kirwan, and the two eventually teamed up to start Bigeye. At Bigeye, Egor works directly with customers to bring modern data observability practices into their organizations and make sure their data stays reliable at scale.
