How Does Bigeye Handle Your Sensitive Data? Our Field CTO Breaks It Down.
TL;DR: By default, Bigeye stores only encrypted metadata and aggregate metrics. Optional features that access raw data (like table previews and AI debugging) can be controlled through Data Restricted mode, feature flags, and RBAC permissions.


As co-founder and Field CTO at Bigeye, I've found that nearly every enterprise security team, data leader, and compliance officer wants to understand the same core aspects of our platform: what data gets stored, how they can control raw data access, and which features depend on row-level information.
These are exactly the right things to ask. After recently completing a comprehensive breakdown of these topics for a Fortune 100 customer, I thought it would be valuable to share these answers more broadly. Here's what every enterprise should know about data security and control within Bigeye.
What data does Bigeye collect and store by default?
One of the first things enterprise customers want to understand is exactly what data Bigeye collects from their environment. The answer has two distinct components that work together to provide comprehensive data observability while maintaining security controls.
The data that Bigeye receives by default, regardless of which features are enabled, is as follows:
- Metadata from connected sources, which includes structural information about your data environment. Some examples include: table and column names, lineage relationship information showing how column A feeds into column B, BI report names, and ETL job names and steps. This metadata forms the foundation of Bigeye's catalog and lineage capabilities, giving you visibility into your data landscape without exposing the actual data values.
- For any monitored source, Bigeye also receives aggregate metrics that provide statistical insight into pipeline and data quality. Examples include counts of null values in particular columns, min/max/average values of columns, and time since a table was last updated. These metrics power Bigeye's core monitoring and alerting capabilities. (A sketch of both record types follows this list.)
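To make this concrete, here's a rough sketch of what those two kinds of records might look like. The field names are invented for illustration and are not Bigeye's actual schema.

```python
# Hypothetical shapes of the two record types Bigeye stores by default.
# All field names here are invented for illustration, not Bigeye's schema.

metadata_record = {
    "source": "snowflake_prod",
    "table": "orders",
    "columns": ["order_id", "state", "total_usd"],  # names only, no values
    "lineage": {"total_usd": ["payments.amount"]},  # column A feeds column B
    "bi_reports": ["Weekly Revenue"],               # report names, not contents
}

aggregate_metrics_record = {
    "table": "orders",
    "column": "state",
    "metric": "null_count",      # an aggregate statistic, never row values
    "value": 42,
    "hours_since_refresh": 3.5,  # freshness signal
}
```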
How are these metrics calculated?
Here's the crucial technical detail: these metrics are computed efficiently, directly on the source, either via SQL query pushdown or inside Bigeye's agent. Both approaches keep all of the logic executing within your environment, so no raw data needs to be sent back to Bigeye's cloud. The computation happens where your data lives, and only the aggregate results are transmitted.
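As a minimal sketch of what pushdown looks like, the snippet below computes a null count and min/max/average entirely inside the source database, so only a single aggregate row ever leaves it. It uses Python's built-in sqlite3 as a stand-in source; the table and query are invented for this example, and Bigeye's actual pushdown SQL will differ by warehouse dialect.

```python
import sqlite3

# Stand-in "source" database living inside your environment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, state TEXT, total_usd REAL);
    INSERT INTO orders VALUES (1, 'NM', 9.5), (2, NULL, 12.0), (3, 'TX', 3.25);
""")

# The entire computation is pushed down as one SQL statement; the client
# never fetches raw rows, only the one-row aggregate result.
null_count, min_total, max_total, avg_total = conn.execute("""
    SELECT
        COUNT(*) - COUNT(state) AS null_count_state,
        MIN(total_usd),
        MAX(total_usd),
        AVG(total_usd)
    FROM orders
""").fetchone()

print(null_count, min_total, max_total, avg_total)  # 1 3.25 12.0 8.25
```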
How does Bigeye connect to your data sources?
Bigeye offers two deployment architectures that determine where connections to your data sources originate.
In our agent architecture, the agent lives inside your environment and connects to your sources from within your network. Information is then sent back to the control plane in Bigeye's cloud. This approach gives you complete network-level control over data source access while enabling comprehensive observability.
In the agentless architecture, connections to your sources originate from Bigeye's cloud and pull information back directly. While this can simplify initial setup, it requires opening network access from external systems to your data sources.
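To make the agent flow concrete, here's a conceptual sketch. The function, endpoint, and payload are invented for illustration (this is not Bigeye's agent code); the point is that the metric query runs against the source from inside your network, and the only traffic crossing the perimeter is an outbound call carrying aggregates.

```python
import json
import urllib.request

def run_agent_cycle(source_conn, control_plane_url: str) -> None:
    """One hypothetical agent cycle: compute locally, ship aggregates out."""
    # 1. Aggregates are computed against the source inside your network;
    #    raw rows never leave it.
    (null_count,) = source_conn.execute(
        "SELECT COUNT(*) - COUNT(state) FROM orders"
    ).fetchone()

    # 2. Only the aggregate result is sent outbound to the control plane.
    payload = json.dumps({"table": "orders", "null_count_state": null_count})
    request = urllib.request.Request(
        control_plane_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(request)
```

Because the agent only ever dials out, there's no need to open inbound network access from Bigeye's cloud to your sources.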
Which architecture do enterprise customers choose?
Virtually all enterprise customers use the agent model, and there are compelling reasons why. The agent-based approach provides the security and control that enterprise environments require, keeping all data processing within your network perimeter while still enabling full platform functionality.
Are there any times when Bigeye accesses raw data?
There are a number of powerful features in Bigeye that benefit from accessing raw data. These include:
- Table previews in the catalog enable users to see actual data samples directly within Bigeye's interface. Bigeye never stores this data, and it is scoped to the user's request. Access can also be controlled at a finer grain through RBAC if you want to limit which users or teams can use the feature.
- Issue debug previews allow you to view the specific rows causing a particular data quality issue directly in Bigeye. The debug query is always generated, but its results are only fetched and shown in the UI when a user clicks the preview button (which is controlled by separate feature flags as well). This gives you control over when raw data gets transmitted for debugging purposes.
- bigAI-powered issue resolutions and descriptions leverage AI to provide more actionable insights when problems occur. Bigeye automatically executes the debug query (see the previous bullet) and uses AI to generate enhanced issue resolutions and descriptions. The query results are never stored, but the descriptions and resolution steps may include values from the result (e.g., "the duplicate values come from rows with a 'New Mexico' value in the state column").
The key distinction is that some features require storing raw data (like grouped metrics), while others transmit raw data temporarily but don't store it permanently (like AI-enhanced issue descriptions). Each feature can be controlled independently through feature flags and RBAC permissions.
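As a sketch of what that independent control can look like, the snippet below gates a table preview behind both a workspace-level feature flag and an RBAC permission check. The flag and permission names are hypothetical, not Bigeye's actual identifiers.

```python
# Hypothetical two-layer gate for a raw-data feature: a workspace-level
# feature flag AND a per-role RBAC permission must both allow it.

FEATURE_FLAGS = {
    "table_previews": True,
    "debug_query_preview": False,      # raw rows are never fetched when off
    "ai_issue_descriptions": False,
}

ROLE_PERMISSIONS = {
    "data_engineer": {"view_table_preview", "run_debug_preview"},
    "viewer": set(),
}

def can_preview_table(role: str) -> bool:
    # Both layers must agree before any raw data is shown.
    return (
        FEATURE_FLAGS["table_previews"]
        and "view_table_preview" in ROLE_PERMISSIONS.get(role, set())
    )

print(can_preview_table("data_engineer"))  # True
print(can_preview_table("viewer"))         # False
```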
How do I prevent Bigeye from accessing any raw data?
For organizations with the strictest data handling requirements, Bigeye offers Data Restricted mode. When this mode is enabled, no raw data will ever be sent back to Bigeye's systems under any circumstances. This provides absolute certainty around data transmission while maintaining core data observability capabilities.
Data Restricted mode works by disabling all the features described above that could potentially result in raw data transmission. You'll still get comprehensive monitoring through metadata analysis and aggregate metrics, but you'll lose access to table previews, grouped metrics, debug query previews, and AI-enhanced issue descriptions.
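Conceptually, Data Restricted mode acts as a master switch that overrides every per-feature flag, along these lines (again, the names are invented for illustration):

```python
DATA_RESTRICTED_MODE = True

RAW_DATA_FEATURES = {
    "table_previews": True,
    "grouped_metrics": True,
    "debug_query_preview": True,
    "ai_issue_descriptions": True,
}

def raw_data_feature_enabled(name: str) -> bool:
    # When Data Restricted mode is on, every raw-data feature is off,
    # regardless of its individual flag. Metadata collection and aggregate
    # metrics are unaffected and keep flowing.
    if DATA_RESTRICTED_MODE:
        return False
    return RAW_DATA_FEATURES.get(name, False)

assert not raw_data_feature_enabled("table_previews")
```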
The trade-off is straightforward: maximum data restriction in exchange for reduced functionality in specific areas. For many organizations, particularly those in heavily regulated industries, this trade-off makes perfect sense. The core value of data observability—monitoring, alerting, and lineage tracking—remains fully intact.
How do you choose the right configuration for your organization?
The decision between default mode and Data Restricted mode isn't binary, and it doesn't have to be permanent. We work with customers to understand their specific compliance requirements, risk tolerance, and functional needs to determine the right configuration.
Consider your compliance landscape first. Do you operate under regulations that place specific restrictions on raw data transmission or data residency? Understanding these requirements helps establish the boundaries for your implementation.
Evaluate which observability features are critical for your use cases. If your team relies heavily on debugging capabilities and AI-enhanced issue resolution, operating in default mode with selective feature controls might be the right approach. If your primary needs center on monitoring, alerting, and lineage tracking, Data Restricted mode could provide the security posture you need without sacrificing core functionality.
Think about your network security preferences and risk management approach. Most enterprise customers prefer keeping all data source connections within their network perimeter, which points toward the agent-based architecture regardless of which data handling mode you choose.
Can you customize these settings?
The flexibility built into Bigeye's architecture means you can fine-tune permissions, enable specific features based on user roles, and adjust your configuration as your needs evolve. We're always happy to discuss these technical details in more depth and help you design the right architecture for your specific requirements.
Final Thoughts:
Data observability shouldn't require compromising on security or compliance. The key is understanding exactly how your platform handles data at a technical level and having granular control over those processes. Bigeye's approach gives you multiple layers of control, from deployment architecture to feature-level permissions to complete data restriction, so you can implement the data observability capabilities your organization needs while maintaining the security posture your compliance requirements demand.
Have questions about implementing secure data observability in your environment? Feel free to reach out to me directly on LinkedIn (this is exactly the kind of technical discussion I love having) or request a demo to get a full walkthrough of the platform from our team.

