The House Of Data Series: Data Quality
This paper focuses on the five dimensions of data quality, how to measure and monitor them, the role of data stewardship, and what data quality means for AI Trust. It does not cover pipeline architecture or data observability tooling in depth — those are addressed in the DataOps whitepaper.

House of Data Series
Every strong data program is built like a house. Data Architecture forms the foundation — the platforms, pipelines, and operating model that everything else depends on. Seven domain pillars rise from that foundation, each one essential to a complete data program: Data Quality, Privacy, Data Security, DataOps, Compliance, Data Enablement, and Data Consumption. Data Literacy runs across all seven as a connecting beam, ensuring people at every level can read, interpret, and act on data. At the top, People & Leadership sets the direction, accountability, and culture that holds the whole structure together.
This series of whitepapers covers each component of the House of Data in depth. Each paper was written by a practitioner with direct experience in that domain. Together, they form a practical guide to building data programs that earn — and keep — trust.
This paper covers Data Quality — the first pillar of the House of Data, and the one that underpins every other capability. Without accurate, complete, and consistent data, pipelines can't be trusted, analytics mislead rather than inform, and AI systems encode problems rather than solve them. Data quality isn't a cleanup task. It's a continuous discipline.
Introduction
Data has become foundational to how organizations operate, compete, and scale. It powers financial reporting, customer engagement, supply chains, regulatory compliance, analytics, and increasingly, automated decision-making through AI. As reliance on data has grown, so has the cost of getting it wrong.
Despite modern data platforms and significant investment, many organizations still struggle with a basic question: can we trust the data we are using to run the business? Conflicting reports, unexpected pipeline failures, and late-stage remediation remain common. In AI-driven use cases, these challenges are amplified. Poor data quality no longer results only in flawed insights, but in automated outcomes that are difficult to detect, explain, or reverse.
Data quality is not about making all data perfect. It is about ensuring that data is fit for business use. That means defining what "good" looks like for the data that matters most, detecting when data falls below acceptable thresholds, addressing issues efficiently, and preventing the same failures from recurring. When data meets those expectations, it should be explicitly recognized as suitable for use.
This paper examines eleven focus areas that, in practice, have the greatest impact on improving data quality outcomes. These areas span organizational roles, identification of critical data, definition of quality expectations, visibility into data health, operational response to issues, and transparency for leadership. Together, they form a practical path from reactive data cleanup toward proactive, scalable data quality management.
Data quality
Data quality defined
Data quality is a core component of successful data programs. To deliver value, data must be "fit for business" or "fit for purpose." The goal is to understand whether data meets business readiness thresholds. When it is not fit for purpose, fix it. When it does meet the business need, certify it as "good for use."
DAMA definition: The planning, implementation, and control of activities that apply data quality management techniques to data in order to assure it is fit for consumption and meeting the needs of data consumers. (Source: DMBOK, 2017)
Gartner definition: Data quality refers to the usability and applicability of data used for an organization's priority use cases, including AI and machine learning initiatives. Data quality is usually one of the goals of effective data management and data governance. Yet too often organizations treat it like an afterthought.
McGilvray definition: The degree to which information and data can be a trusted source for any and/or all required uses.
Practical definition: Data quality is the level at which data can be trusted for the purpose for which it was collected. It needs to be addressed while in-flight and at-rest, and all users of the data should follow the mantra of "See It, Say It, and Sort It."
Data quality terms of interest
The following vocabulary forms the working foundation for any serious data quality program. Each term represents a concept teams will encounter when building, running, or improving quality capabilities.
Focus areas of data quality
Each of those terms is important to data quality, and each could take several pages to fully explore. This paper dives into eleven of them -- the eleven most critical to supporting data quality progress in an organization.
- People
- Critical Data Objects and Critical Data Elements (CDEs)
- Defining data quality dimensions
- Defining a data quality process
- Developing DQ rules for CDEs
- Profiling CDEs
- Building DQ reports (data at-rest)
- Rolling out a data triage process for DQ
- Piloting in-flight DQ checks
- Implementing a business process for data circuit breakers
- Rolling out a "State of Data" report for leadership
1. People
People are the key to data programs, and the data quality focus area is no different. Three main points about people should be understood.
Everyone has a role to play in data quality.
- Leaders (business, data, and technical) need to understand the importance of data in their organization and how critical good data is for operations, AI, digital transformation, analytics, and execution. Leaders need to pay attention to data quality efforts, set expectations, support funding requests, and have the ability to align resources to prevent "garbage in, garbage out" challenges.
- Data analysts need to understand the importance of good data for the things they do. Rather than simply complaining that data is bad, they should provide specifics, speak up when they see a problem, and do their part to support those trying to improve data.
- Data engineers should think about data quality in everything they do when designing, building, and maintaining solutions. They should create solutions that help with data quality, ask for business input related to quality, and build solutions with data quality at their core.
- Data quality specialists work to bring people together, listen to input, build solutions, and promote the help they receive. They should not be afraid to ask for help, to teach others, and to raise the profile and benefits of high-quality data.
- Data stewards bring it all together and promote data quality activities across the organization. They build out teams across the organization that can walk and talk the benefits of data quality and genuinely believe in a community of practice working together for better data to run a better business.
See It, Say It, Sort It.
Establish a system that provides a mechanism to record issues as you find them, brings people together, and gives people the help they need. Build out the notion that if you see something, learn something, or need something, you write it down. The worst problems encountered in business occur when someone knows something is a problem but does not take the time to share it with others. As you see something wrong, write it down ("say it") and then get it fixed ("sort it"). This makes all the difference.
Who's who in data quality.
Provide a register that is easy to find and tells people who the experts are for data quality, and for data overall. This should include references to functional expertise, technical capabilities, business processes, audit concerns, and overall execution. Knowing who people are, recording challenges as they arise, and understanding that everyone has a role to play in data will help the organization move forward. Get the people right and everything is easier.
2. Critical Data Objects and Critical Data Elements
For years, data governance practitioners have spoken of CDEs, or Critical Data Elements. This concept is of particular importance for data quality, but it helps to take it a step further, because it actually covers two different things:
- Critical Data Elements (CDEs) are the fields in files or tables that are critical to business processes and analytics. If they are wrong, the business is in serious trouble.
- Critical Data Objects are the tables or files that hold CDEs. It is only with an understanding of both that firms can manage data quality work effectively.
As part of your data program, record what your CDEs are. Use this to manage the work of getting and keeping data of high quality and report on the level of data quality for these critical objects. Firms can have millions of columns in their data, but a smaller number will be critical. Know the difference. Focusing first on CDEs can be a game changer for all organizations.
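The register of CDEs described above can be as simple as a small structured list. The sketch below is a minimal, hypothetical illustration in Python; the object names, field names, and steward groups are invented for the example, not taken from any real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CriticalDataElement:
    object_name: str   # the Critical Data Object (table or file) holding the field
    field_name: str    # the Critical Data Element itself
    owner: str         # steward group accountable for its quality

# Illustrative register entries -- a real one would live in a catalog or repository.
cde_register = [
    CriticalDataElement("customer", "customer_id", "customer-data-stewards"),
    CriticalDataElement("customer", "email", "customer-data-stewards"),
    CriticalDataElement("orders", "order_total", "finance-data-stewards"),
]

def cdes_for_object(register, object_name):
    """Return the critical fields recorded for one Critical Data Object."""
    return [c.field_name for c in register if c.object_name == object_name]

print(cdes_for_object(cde_register, "customer"))  # ['customer_id', 'email']
```

Even a register this small answers the two questions the section raises: which objects are critical, and which fields inside them must be managed first.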
3. Defining data quality dimensions
Data quality dimensions have been at the core of data management for the last 25 years. It is important to define what your data quality dimensions are and use them for building out better data quality rules, reporting on the quality of your data, and building improvement plans based on those details.
The typical six dimensions most data experts start with are listed below in the recommended sequence to address. Completeness is often addressed first and last: first for critical elements to make a complete record, and last to bring together aspects of other data quality dimension details.
While this is the most common list to work with, there are many other ways to look at this. The DAMA organization in the Netherlands performed a project that categorized a wider variety of data quality dimensions. The conceptual groupings below can provide insight when a firm works to define its own dimensions. (Source: DAMA DMBOK 2.0)
When defining data quality dimensions for your organization, right-size the list. Find a set comprehensive enough to meet your needs, but small enough to socialize across the organization. The goal is to use these dimensions to drive improvement.
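As a concrete illustration of using dimensions to drive measurement, two of the most common ones, completeness and uniqueness, can be scored with plain Python. The field values below are illustrative assumptions, not data from the paper.

```python
def completeness(values):
    """Fraction of records where the field is populated."""
    populated = [v for v in values if v not in (None, "")]
    return len(populated) / len(values)

def uniqueness(values):
    """Fraction of populated values that are distinct."""
    populated = [v for v in values if v not in (None, "")]
    return len(set(populated)) / len(populated)

# Illustrative sample: one missing value and one duplicate.
emails = ["a@x.com", "b@x.com", "a@x.com", None]
print(round(completeness(emails), 2))  # 0.75
print(round(uniqueness(emails), 2))    # 0.67
```

Scores like these, computed per dimension and per CDE, are the raw material for the dimension reports and trending discussed later in the paper.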
4. Defining a data quality process
Data quality often gets a bad reputation and can be a difficult area to advance. Experience has shown that when people understand a common process and work together to follow it, data quality can improve in an efficient manner.
Danette McGilvray wrote the seminal book on data quality process, Executing Data Quality Projects. The great gift to the data industry from that work was "The Ten Step Process." The original steps are:

These ten steps have evolved in recent years with more focus on data catalogs and data observability. The following evolution is more prescriptive and leads to more proactive action. Automated data observability checks are implemented first and expanded to custom checks when necessary.

5. Developing DQ rules for CDEs
Documenting the business definition of what "business fit" means for a given field is very important. That does not mean a firm should invest time in defining the data quality rule for every field in their systems. Common sense must rule. Most firms will start by documenting their CDEs and potentially expand from there.
This task involves meeting with subject matter experts for a field, reviewing the data and data profile when available, and documenting the specifics of a rule: what makes a field meet the definition of "business fit" for your organization. The DQ rule should include:
- Field name -- the technical name of the field in the file or database
- Business name (if different) -- the business common name of the field
- Data type -- the technical detail of the field (string, number, date, etc.)
- Data size/scale -- the length of the data field
- Business definition -- describes what the field holds from a business perspective
- Business specification -- one or two sentences describing what this data field needs to contain
An example business DQ rule:
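As a stand-in illustration, a hypothetical rule for a customer email field, following the six fields listed above, might be recorded and enforced like this. Every name and value here is invented for the example.

```python
import re

# Hypothetical DQ rule record following the structure described in this section.
dq_rule = {
    "field_name": "cust_email",
    "business_name": "Customer Email Address",
    "data_type": "string",
    "data_size": 254,
    "business_definition": "The primary email address used to contact the customer.",
    "business_specification": (
        "Must be populated for every active customer and match the pattern "
        "local-part@domain; placeholder values such as 'none@none.com' are not allowed."
    ),
}

def passes_rule(value):
    """Check one value against the specification above (illustrative)."""
    if not value or value.lower() == "none@none.com":
        return False
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None

print(passes_rule("jane.doe@example.com"))  # True
print(passes_rule("none@none.com"))         # False
```

The point of the structure is that the business specification is written in plain language first; the executable check is derived from it, not the other way around.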
6. Profiling CDEs
Profiling is technology that, with the click of a button, lets an end user read a table or file and get a report summarizing the characteristics of the data source. This typically includes:
- Sample data and frequency
- Number of nulls
- Number of probable duplicates
- System data type, size, and scale
- Inferred data type, size, and scale
- Patterns and frequency
- Minimum and maximum of data value
- Minimum and maximum length of data value
- Average length of data value
- Average value of data field
Use profiling technology to profile each table or file that holds CDEs. By running the profile and paying particular attention to data patterns, nulls and duplicates, and sample values, the writing of data quality rules can be expedited.
This profiling activity helps answer key questions: Is this data really populated? What is the general hygiene of this data? What are the formats of the data, and does it need more cleanup? By increasing the understanding of CDEs, teams are set up for a more robust definition of CDE DQ rules.
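The kinds of statistics a profiler produces can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not a substitute for profiling tooling; the phone-number sample is an invented example.

```python
from collections import Counter
import re

def profile_column(values):
    """Compute a minimal column profile: nulls, distincts, lengths, top pattern."""
    populated = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in populated]
    # Generalize each value into a pattern: digits -> 9, letters -> A.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v))) for v in populated
    )
    return {
        "null_count": len(values) - len(populated),
        "distinct_count": len(set(populated)),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "top_pattern": patterns.most_common(1)[0][0],
    }

phones = ["555-0100", "555-0101", "5550102", None]
print(profile_column(phones))
```

Here the pattern summary immediately surfaces the hygiene question the section describes: most values follow `999-9999`, but one does not, which is exactly the kind of finding that sharpens a DQ rule.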
7. Building DQ reports (data at-rest)
There is an important need that is often missed: the development of reports that support data quality initiatives. This does not need to be done in a dedicated data quality tool -- it is most effective when done with resident BI and analytics software packages. These reports tend to fall into three categories:
- Tactical spreadsheet-level reports that illustrate which rows of data have issues and what those issues are. These reports are for data experts and act like audit reports that say what is wrong and what needs to be done to fix it.
- DQ dimension reports -- graphical reports that show where data stands across tables or functional boundaries. They show overall trending and help illustrate where more work is needed.
- Executive-level reports that show the current state and the historical reference, designed to show leadership where things were, where they are now, and what the focus is moving forward.
These reports give data teams what they need to action DQ situations and are often the basis of capabilities inside DQ dashboards.
[Screenshots: executive-level report, DQ dimension report, and tactical audit report examples]
8. Rolling out a data triage process
A critical part of data quality is the build-out of data stewardship and functional support for data. As your data quality program grows, there will be a need to fix data in operational systems. Some practitioners refer to this as "shifting left." The idea is to fix data at the source when possible so it does not need to be fixed and patched across multiple data pipelines.
The roll-out of a data triage process typically includes:
- A help desk notification or workflow that captures requests and allows assignment to people who can address those challenges
- Aging reports that show outstanding requests approaching or exceeding service level agreement timeframes
- Tooling so that business users can make the necessary updates in a timely manner, reducing the lift required to make changes
- Reporting that generates volumetrics to track requests, completions, and the time taken to complete the work
- A program to recognize top performers who provide support for data triage activities
Note: some firms get caught in a cycle of pushing these requests into help desk software like ServiceNow or JIRA. It is worth asking whether those platforms are the right ones for this type of activity, or whether the work should happen in software packages closer to the users of data.
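The aging report in the deliverables list above can be sketched simply. The ticket IDs, SLA length, and dates below are illustrative assumptions.

```python
from datetime import date

SLA_DAYS = 10  # illustrative service level agreement for triage requests

def aging_report(open_requests, as_of):
    """Return (request_id, age_in_days) for requests at or past the SLA, oldest first."""
    breaches = [
        (req_id, (as_of - opened).days)
        for req_id, opened in open_requests
        if (as_of - opened).days >= SLA_DAYS
    ]
    return sorted(breaches, key=lambda r: r[1], reverse=True)

open_requests = [
    ("DQ-101", date(2024, 5, 1)),
    ("DQ-102", date(2024, 5, 12)),
    ("DQ-103", date(2024, 5, 14)),
]
print(aging_report(open_requests, as_of=date(2024, 5, 15)))
# [('DQ-101', 14)] -- only DQ-101 has aged past the 10-day SLA
```

Whatever platform hosts the workflow, the underlying logic is the same: compare open-request age against the agreed SLA and surface the breaches.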
9. Piloting in-flight DQ checks
In-flight DQ checks are the technical processes built inside data pipelines that use a variety of techniques to identify when data being processed fails to meet the established DQ business rules. These processes are built with capabilities including reference data and data quality or data observability software, and they generate defects and restrict loading if data is not fit for purpose.
The general process: data is read from a source system, a DQ check runs for validity, data that passes is transformed and loaded, while discards are flagged separately before the data reaches the target system.
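The pass/discard split at the heart of that process can be sketched as follows. The rule and records are illustrative; real implementations would sit inside pipeline or observability tooling.

```python
def run_inflight_check(rows, rule):
    """Split incoming rows into those that pass the DQ rule and those discarded."""
    passed, discards = [], []
    for row in rows:
        (passed if rule(row) else discards).append(row)
    return passed, discards

# Illustrative rule: order_total must be present and non-negative.
def order_total_rule(row):
    return row.get("order_total") is not None and row["order_total"] >= 0

rows = [{"order_total": 25.0}, {"order_total": -3.0}, {"order_total": None}]
passed, discards = run_inflight_check(rows, order_total_rule)
print(len(passed), len(discards))  # prints: 1 2
```

The discards are not silently dropped; they become the input to the circuit-breaker business process described in the next focus area.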

10. Implementing a business process for data circuit breakers
The in-flight DQ check is helpful, but what happens after is key. In most cases, building a business process that reviews data discards, modifies data in-flight or at the source, and notifies key personnel of challenges is critical.
As you build out your in-flight DQ solutions, remember the business users you support. Do not let the process hold data back without a resolution path. That pattern is a top reason business users lose confidence in data quality and in the technical teams that support them.
Key deliverables for implementing a business process for circuit breakers include:
- Building discard reports
- Circulating discard reports
- Providing oversight for discard follow-ups
- Reporting on exceptions and trends over time
Note: this area of business process for discard processing should be viewed as a specialty use case closely related to data triage.
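A minimal sketch of the circuit-breaker decision itself: if a batch's discard rate exceeds an agreed threshold, the load halts and someone is notified, rather than partial data landing silently. The threshold value and notify hook are illustrative assumptions.

```python
DISCARD_THRESHOLD = 0.05  # illustrative: trip the breaker above a 5% discard rate

def circuit_breaker(passed_count, discard_count, notify):
    """Return True if the batch is safe to load; otherwise halt and notify."""
    total = passed_count + discard_count
    discard_rate = discard_count / total if total else 0.0
    if discard_rate > DISCARD_THRESHOLD:
        notify(f"Load halted: discard rate {discard_rate:.1%} exceeds threshold")
        return False  # do not load; route discards to the review process
    return True       # safe to load

alerts = []
ok = circuit_breaker(passed_count=900, discard_count=100, notify=alerts.append)
print(ok, alerts)
# False ['Load halted: discard rate 10.0% exceeds threshold']
```

The important design point from this section is the resolution path: tripping the breaker must trigger review and follow-up, not just block data indefinitely.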
11. Rolling out a "State of Data" report for leadership
As your data program moves forward and gains the attention of leaders, it is vital to include them and share progress. Transparency on data quality progress is critical to grow the trust and confidence of senior management.
Build a "State of Data" report that gets refreshed quarterly. It should cover a wide range of topics related to data, data governance, and data quality. In the data quality space, one or two slides that tell the story of data quality progress are recommended. Those slides often include:
- Number of data quality actions identified
- Number of data quality actions completed
- Trending reports on the improvement of data by DQ dimension across quarters
- Number of people trained on DQ topics
- Next steps for the next 30, 60, and 90 days
The goal is to provide leadership with a consistent picture of where data stands each quarter, so leaders spend their time asking the right questions:
- What are your top priorities for the next quarter?
- What assistance do you need with resourcing and priorities?
- What are the benefits from our recent investments?
- What should I be most concerned about?
- What is next?
Role of data quality in AI trust
AI is widely regarded as an efficient way to automate business processes. This can be done using machine learning (ML), natural language processing (NLP), or generative AI (GenAI). But all these models, just like traditional analytics, have one requirement: they need quality data.
A focus on quality both at-rest and in-flight is required to have trusted data and processes. Agents operating on low-quality data can provide incorrect outcomes, cause business interruptions, and risk the reputation of your enterprise.
It is critical to have continuous monitoring, regular review, and efficient stewardship strategies for creating and maintaining high-quality data that a firm can trust. You cannot trust AI without first trusting your data.
Bigeye's role in data quality
Bigeye, as a data observability tool, brings data quality forward in an actionable format. It addresses a large number of data quality capabilities, including data quality dimensions, checks, reports and dashboards, data profiling, alerts, and issue management.
Summary
Data quality is critical to using data to run your business. This paper covered a wide range of topics: from data quality dimensions and data quality checks to profiling, data triage, and the eleven focus areas that move programs from reactive to proactive. It is vital to your business to have data that you can trust. Data quality is where that trust is built.
References
Caballero, I., Verboon, N., & Piattini, M. (2020). Dimensions of data quality (Version 1.2). DAMA-NL. https://dama-nl.org/wp-content/uploads/2020/09/DDQ-Dimensions-of-Data-Quality-Research-Paper-version-1.2-d.d.-3-Sept-2020.pdf
Khatri, V., & Brown, C. V. (2010). Data governance: The missing approach to data quality. California Management Review, 52(2), 86–103. https://www.proquest.com/openview/4b405a8360f99610460c0640fc680668
Lee, Y. W., Strong, D. M., Kahn, B. K., & Wang, R. Y. (2002). AIMQ: A methodology for information quality assessment. Information & Management, 40(2), 133–146.
Naumann, F. (2002). Quality-driven query answering for integrated information systems. Springer.
Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211–218. https://doi.org/10.1145/505248.506010
Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33. https://doi.org/10.1080/07421222.1996.11518099
Strong, D. M., Lee, Y. W., & Wang, R. Y. (1997). Data quality in context. Communications of the ACM, 40(5), 103–110. https://doi.org/10.1145/253769.253804
Taleb, I., Serhani, M. A., & Dssouli, R. (2024). A review of data quality dimensions. Procedia Computer Science, 232, 187–196. https://www.sciencedirect.com/science/article/pii/S187705092400365X
Trehan, A. (2024). An intelligent approach to data quality management: AI-powered quality monitoring in analytics. https://www.researchgate.net/publication/387298750
Zhang, Y., et al. (2022). Data quality challenges in deep learning. The VLDB Journal, 31, 1–23. https://doi.org/10.1007/s00778-022-00775-9
McGilvray, D. (2021). Executing data quality projects: Ten steps to quality data and trusted information (2nd ed.). Academic Press. https://www.amazon.com/Executing-Data-Quality-Projects-Information/dp/0128180153
Sambasivan, N., et al. (2020). Data quality and explainable AI. ACM Digital Library. https://dl.acm.org/doi/10.1145/3386687
What are the five dimensions of data quality?
Completeness (are all expected records present?), conformity (does the data match the expected format or schema?), consistency (does the same value appear consistently across systems?), uniqueness (are there duplicates where there shouldn't be?), and timeliness (did the data arrive when it was supposed to?). Each dimension can fail independently, which is why checking one doesn't substitute for checking all five.
What is data stewardship and why does it matter for quality?
Data stewardship assigns ownership of specific datasets to specific people or teams. When a quality issue surfaces, stewardship answers the question "whose problem is this?" before anyone has to spend time figuring it out. Without clear ownership, even a well-instrumented quality program degrades into alert fatigue and unresolved incidents.
How does data quality affect AI?
AI models are trained on historical data and make predictions based on ongoing feeds. Quality issues that would surface quickly in a dashboard — a null rate that doubled, a schema field that changed — can silently corrupt a model's training data or inference inputs. The consequences compound over time. A model trained on six months of skewed data doesn't just give one wrong answer; it encodes that skew into its weights.

