How Does Bigeye Handle Your Sensitive Data? Our Field CTO Breaks It Down.
TL;DR: By default, Bigeye stores only encrypted metadata and aggregate metrics. Optional features that access raw data (like table previews and AI debugging) can be controlled through Data Restricted mode, feature flags, and RBAC permissions.


As co-founder and Field CTO at Bigeye, I've found that nearly every enterprise security team, data leader, and compliance officer wants to understand the same core aspects of our platform: what data gets stored, how they can control raw data access, and which features depend on row-level information.
These are exactly the right things to ask. After recently completing a comprehensive breakdown of these topics for a Fortune 100 customer, I thought it would be valuable to share these answers more broadly. Here's what every enterprise should know about data security and control within Bigeye.
What data does Bigeye collect and store by default?
One of the first things enterprise customers want to understand is exactly what data Bigeye collects from their environment. The answer has two distinct components that work together to provide comprehensive data observability while maintaining security controls.
The data that Bigeye receives by default, regardless of which features are enabled, is as follows:
- Metadata from connected sources, which includes structural information about your data environment. Some examples include: table and column names, lineage relationship information showing how column A feeds into column B, BI report names, and ETL job names and steps. This metadata forms the foundation of Bigeye's catalog and lineage capabilities, giving you visibility into your data landscape without exposing the actual data values.
- For any monitored source, Bigeye also receives aggregate metrics that provide statistical insight into pipeline and data quality. Examples include counts of null values in particular columns, min/max/average values of columns, and time since a table was last updated. These metrics power Bigeye's core monitoring and alerting capabilities. (A sketch of both record types follows this list.)
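To make this concrete, here's a rough sketch of what those two kinds of records might look like. The field names are invented for illustration and are not Bigeye's actual schema.

```python
# Hypothetical shapes of the two record types Bigeye stores by default.
# All field names here are invented for illustration, not Bigeye's schema.

metadata_record = {
    "source": "snowflake_prod",
    "table": "orders",
    "columns": ["order_id", "state", "total_usd"],  # names only, no values
    "lineage": {"total_usd": ["payments.amount"]},  # column A feeds column B
    "bi_reports": ["Weekly Revenue"],               # report names, not contents
}

aggregate_metrics_record = {
    "table": "orders",
    "column": "state",
    "metric": "null_count",      # an aggregate statistic, never row values
    "value": 42,
    "hours_since_refresh": 3.5,  # freshness signal
}
```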
How are these metrics calculated?
Here's the crucial technical detail: these metrics are computed efficiently, directly on the source, either via SQL query pushdown or inside Bigeye's agent. Both approaches keep all of the logic executing within your environment, so no raw data needs to be sent back to Bigeye's cloud. The computation happens where your data lives, and only the aggregate results are transmitted.
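As a minimal sketch of what pushdown looks like, the snippet below computes a null count and min/max/average entirely inside the source database, so only a single aggregate row ever leaves it. It uses Python's built-in sqlite3 as a stand-in source; the table and query are invented for this example, and Bigeye's actual pushdown SQL will differ by warehouse dialect.

```python
import sqlite3

# Stand-in "source" database living inside your environment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, state TEXT, total_usd REAL);
    INSERT INTO orders VALUES (1, 'NM', 9.5), (2, NULL, 12.0), (3, 'TX', 3.25);
""")

# The entire computation is pushed down as one SQL statement; the client
# never fetches raw rows, only the one-row aggregate result.
null_count, min_total, max_total, avg_total = conn.execute("""
    SELECT
        COUNT(*) - COUNT(state) AS null_count_state,
        MIN(total_usd),
        MAX(total_usd),
        AVG(total_usd)
    FROM orders
""").fetchone()

print(null_count, min_total, max_total, avg_total)  # 1 3.25 12.0 8.25
```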
How does Bigeye connect to your data sources?
Bigeye offers two deployment architectures that determine where connections to your data sources originate.
In our agent architecture, the agent lives inside your environment and connects to your sources from within your network. Information is then sent back to the control plane in Bigeye's cloud. This approach gives you complete network-level control over data source access while enabling comprehensive observability.
In the agentless architecture, connections to your sources originate from Bigeye's cloud and pull information back directly. While this can simplify initial setup, it requires opening network access from external systems to your data sources.
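To make the agent flow concrete, here's a conceptual sketch. The function, endpoint, and payload are invented for illustration (this is not Bigeye's agent code); the point is that the metric query runs against the source from inside your network, and the only traffic crossing the perimeter is an outbound call carrying aggregates.

```python
import json
import urllib.request

def run_agent_cycle(source_conn, control_plane_url: str) -> None:
    """One hypothetical agent cycle: compute locally, ship aggregates out."""
    # 1. Aggregates are computed against the source inside your network;
    #    raw rows never leave it.
    (null_count,) = source_conn.execute(
        "SELECT COUNT(*) - COUNT(state) FROM orders"
    ).fetchone()

    # 2. Only the aggregate result is sent outbound to the control plane.
    payload = json.dumps({"table": "orders", "null_count_state": null_count})
    request = urllib.request.Request(
        control_plane_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(request)
```

Because the agent only ever dials out, there's no need to open inbound network access from Bigeye's cloud to your sources.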
Which architecture do enterprise customers choose?
Virtually all enterprise customers use the agent model, and there are compelling reasons why. The agent-based approach provides the security and control that enterprise environments require, keeping all data processing within your network perimeter while still enabling full platform functionality.
Are there any times when Bigeye accesses raw data?
There are a number of powerful features in Bigeye that benefit from accessing raw data. These include:
- Table previews in the catalog enable users to see actual data samples directly within Bigeye's interface. Bigeye never stores this data, and it is scoped to the user's request. Access can also be controlled at a finer grain through RBAC if you want to limit which users or teams can use the feature.
- Issue debug previews allow you to view the specific rows causing a particular data quality issue directly in Bigeye. The debug query is always generated, but its results are only fetched and shown in the UI when a user clicks the preview button (which is controlled by separate feature flags as well). This gives you control over when raw data gets transmitted for debugging purposes.
- bigAI-powered issue resolutions and descriptions leverage AI to provide more actionable insights when problems occur. Bigeye automatically executes the debug query (see the previous bullet) and uses AI to generate enhanced issue resolutions and descriptions. The query results are never stored, but the descriptions and resolution steps may include values from the result (e.g., "the duplicate values come from rows with a 'New Mexico' value in the state column").
The key distinction is that some features require storing raw data (like grouped metrics), while others transmit raw data temporarily but don't store it permanently (like AI-enhanced issue descriptions). Each feature can be controlled independently through feature flags and RBAC permissions.
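As a sketch of what that independent control can look like, the snippet below gates a table preview behind both a workspace-level feature flag and an RBAC permission check. The flag and permission names are hypothetical, not Bigeye's actual identifiers.

```python
# Hypothetical two-layer gate for a raw-data feature: a workspace-level
# feature flag AND a per-role RBAC permission must both allow it.

FEATURE_FLAGS = {
    "table_previews": True,
    "debug_query_preview": False,      # raw rows are never fetched when off
    "ai_issue_descriptions": False,
}

ROLE_PERMISSIONS = {
    "data_engineer": {"view_table_preview", "run_debug_preview"},
    "viewer": set(),
}

def can_preview_table(role: str) -> bool:
    # Both layers must agree before any raw data is shown.
    return (
        FEATURE_FLAGS["table_previews"]
        and "view_table_preview" in ROLE_PERMISSIONS.get(role, set())
    )

print(can_preview_table("data_engineer"))  # True
print(can_preview_table("viewer"))         # False
```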
How do I prevent Bigeye from accessing any raw data?
For organizations with the strictest data handling requirements, Bigeye offers Data Restricted mode. When this mode is enabled, no raw data will ever be sent back to Bigeye's systems under any circumstances. This provides absolute certainty around data transmission while maintaining core data observability capabilities.
Data Restricted mode works by disabling all the features described above that could potentially result in raw data transmission. You'll still get comprehensive monitoring through metadata analysis and aggregate metrics, but you'll lose access to table previews, grouped metrics, debug query previews, and AI-enhanced issue descriptions.
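Conceptually, Data Restricted mode acts as a master switch that overrides every per-feature flag, along these lines (again, the names are invented for illustration):

```python
DATA_RESTRICTED_MODE = True

RAW_DATA_FEATURES = {
    "table_previews": True,
    "grouped_metrics": True,
    "debug_query_preview": True,
    "ai_issue_descriptions": True,
}

def raw_data_feature_enabled(name: str) -> bool:
    # When Data Restricted mode is on, every raw-data feature is off,
    # regardless of its individual flag. Metadata collection and aggregate
    # metrics are unaffected and keep flowing.
    if DATA_RESTRICTED_MODE:
        return False
    return RAW_DATA_FEATURES.get(name, False)

assert not raw_data_feature_enabled("table_previews")
```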
The trade-off is straightforward: maximum data restriction in exchange for reduced functionality in specific areas. For many organizations, particularly those in heavily regulated industries, this trade-off makes perfect sense. The core value of data observability—monitoring, alerting, and lineage tracking—remains fully intact.
How do you choose the right configuration for your organization?
The decision between default mode and Data Restricted mode isn't binary, and it doesn't have to be permanent. We work with customers to understand their specific compliance requirements, risk tolerance, and functional needs to determine the right configuration.
Consider your compliance landscape first. Do you operate under regulations that place specific restrictions on raw data transmission or data residency? Understanding these requirements helps establish the boundaries for your implementation.
Evaluate which observability features are critical for your use cases. If your team relies heavily on debugging capabilities and AI-enhanced issue resolution, operating in default mode with selective feature controls might be the right approach. If your primary needs center on monitoring, alerting, and lineage tracking, Data Restricted mode could provide the security posture you need without sacrificing core functionality.
Think about your network security preferences and risk management approach. Most enterprise customers prefer keeping all data source connections within their network perimeter, which points toward the agent-based architecture regardless of which data handling mode you choose.
Can you customize these settings?
The flexibility built into Bigeye's architecture means you can fine-tune permissions, enable specific features based on user roles, and adjust your configuration as your needs evolve. We're always happy to discuss these technical details in more depth and help you design the right architecture for your specific requirements.
Final Thoughts:
Data observability shouldn't require compromising on security or compliance. The key is understanding exactly how your platform handles data at a technical level and having granular control over those processes. Bigeye's approach gives you multiple layers of control, from deployment architecture to feature-level permissions to complete data restriction, so you can implement the data observability capabilities your organization needs while maintaining the security posture your compliance requirements demand.
Have questions about implementing secure data observability in your environment? Feel free to reach out to me directly on LinkedIn (this is exactly the kind of technical discussion I love having) or request a demo to get a full walkthrough of the platform from our team.

