Thought leadership · March 25, 2025

AI for Data Observability: Designing for Privacy, Access, and Risk

9 min read

TL;DR: AI is adding powerful new features to data observability tools, but it’s also introducing a new class of risk. That makes access controls, infrastructure choices, and output visibility more important than ever. In this article, we walk through how Bigeye built bigAI to be secure by design, and what questions data leaders should be asking before trusting AI features with sensitive data.

Adrianna Vidal

From flagging anomalies to recommending resolutions, AI is starting to play a real role in data operations. And while it’s speeding things up, removing guesswork, and making sense of noisy metrics, it also introduces a new class of risk: privacy, compliance, and control.

Generative AI is text-based. That means even seemingly harmless summaries or suggestions might leak more than you realize. If an AI sees raw data, it might describe it too well. It might surface something a user shouldn’t be able to see, like the exact values contained in the anomalous rows. And because many models are black boxes, it’s hard to control how much caution the model will take with sensitive data.

In this article, we’ll break down the potential dangers of AI features within your data observability platforms, what “secure by design” looks like, and the best questions to ask vendors. 

The Intersection of AI and Security in Data Platforms

AI needs data to generate anything useful: descriptions, recommendations, or predictions. In data observability tools, that means feeding metadata and sometimes row-level data into the model.

The potential issue? Outputs are text. And text can end up in lots of places.

Even when you don’t expose raw data directly, AI can reflect sensitive attributes in ways that are hard to spot or redact. If it sees values like employee salaries, transaction amounts, or patient IDs, it might mention them when describing an issue. Not maliciously. Just... helpfully.

This is where large language models (LLMs) get tricky. They:

  • Can reconstruct or infer sensitive patterns
  • Provide limited transparency into how their outputs are formed
  • Don’t include built-in access control

Once sensitive insights or even raw data gets included in the model’s output, that output could go into chat messages, emails, Jira tickets, etc.

Real World Example: bigAI Issue Descriptions

Bigeye generates issue summaries to help teams understand and resolve data problems. If enabled, the system can use row-level data to identify what’s wrong, and then explain it in plain English.

But not everyone in an org should see that level of detail.

So Bigeye generates two versions:

  • Enhanced descriptions: includes row-level insights
  • Reduced descriptions: uses metadata only

That way, users without data access won’t receive sensitive info by accident. It’s the difference between saying, "Revenue dropped in Q4" and "Revenue dropped due to 4 specific rows from our enterprise client dataset."
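
To make this concrete, here’s a minimal sketch, in Python, of how permission-aware description selection can work. It’s illustrative only, not Bigeye’s actual code, and names like IssueDescriptions and view_row_level_data are hypothetical.

```python
# Minimal sketch of permission-aware description selection.
# Hypothetical names; not Bigeye's implementation.
from dataclasses import dataclass


@dataclass
class IssueDescriptions:
    enhanced: str  # may reference row-level values
    reduced: str   # built from metadata only


def description_for(viewer_permissions: set[str], d: IssueDescriptions) -> str:
    """Return the enhanced description only if the viewer can see row-level data."""
    if "view_row_level_data" in viewer_permissions:
        return d.enhanced
    return d.reduced


descriptions = IssueDescriptions(
    enhanced="Revenue dropped due to 4 specific rows from our enterprise client dataset.",
    reduced="Revenue dropped in Q4.",
)

print(description_for({"view_row_level_data"}, descriptions))  # enhanced version
print(description_for(set(), descriptions))                    # reduced version
```

The point is that the sensitive, row-level text only ever reaches viewers whose permissions already allow them to query that data directly.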

Infrastructure and Hosting Considerations

AI features are only as secure as the infrastructure they run on. If your data observability platform is analyzing sensitive data, you need to know exactly where that data lives, how it's processed, and what’s happening behind the scenes.

Bigeye’s AI system, bigAI, is built with these questions in mind. And the answers start with where everything is hosted: AWS Bedrock.

AWS Bedrock is a managed service that gives Bigeye access to industry-leading foundation models like Claude from Anthropic—without ever sending data outside your AWS environment.

Here’s what that looks like in practice:

  • All data stays in-region. Bigeye and AWS Bedrock run in the same AWS environment, so data never crosses geographic or network boundaries.
  • No API calls to external vendors. Even though Bigeye uses third-party models, those models are hosted inside our cloud. No calls are routed to Anthropic or OpenAI.
  • No internet exposure. Data doesn’t leave the platform to get insights back. Everything happens within our controlled infrastructure.
  • Model isolation via deep copy. AWS Bedrock provides a dedicated instance (or "deep copy") of the foundation model that’s only used by Bigeye. It’s like having your own local version of Claude, running privately and securely.
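
For the curious, here’s a rough sketch of what an in-region call to a Bedrock-hosted Claude model can look like with boto3. The region, model ID, and prompt are placeholders, and this isn’t Bigeye’s implementation; the point is simply that the request goes to AWS Bedrock inside your cloud environment rather than to a vendor’s public API.

```python
# Illustrative only: calling a Bedrock-hosted model in-region via boto3.
# Region, model ID, and prompt are example values.
import json

import boto3

# The client targets a specific AWS region; nothing is routed to
# Anthropic's or OpenAI's public endpoints.
bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {"role": "user",
             "content": "Summarize this data anomaly using metadata only: ..."}
        ],
    }),
)

summary = json.loads(response["body"].read())["content"][0]["text"]
print(summary)
```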

And because both Bigeye and AWS Bedrock are ISO 27001 and SOC 2 Type II certified, you don’t have to just take our word for it. You get assurance that our infrastructure meets the highest standards for security, availability, and data integrity.

Prefer to self-host? No problem.

Some teams need complete, end-to-end control. That’s why Bigeye also supports self-hosting, with optional setup for bigAI.

In a self-hosted environment, you can:

  • Choose where your data resides
  • Control how and when models are used

This setup is especially useful for high-security industries like finance, healthcare, and government—or any organization with strict data sovereignty requirements.

The bottom line? Whether you're cloud-native or on-prem, Bigeye gives you the flexibility to deploy AI observability in a way that aligns with your security model.

Where your data lives matters. And we’ve built bigAI to respect that—by design.

Addressing Common Concerns from Security and Legal Teams

Chances are you've had at least one meeting derailed by questions from your security or legal team. Totally fair—generative AI introduces a new class of risk, and those teams are the ones who have to account for it.

So let’s tackle the top three concerns we hear all the time. 

Is my data used to train models?

No. Neither Bigeye nor AWS uses your data to train any AI models, now or in the future. Here's how it works:

  • The models available through AWS Bedrock are pre-trained. That means they’re already fully baked before you ever interact with them.

  • When you send a prompt to bigAI, it's processed by a deep copy of the model hosted inside our AWS environment.

What if someone without access receives an alert?

They won’t see anything they’re not supposed to. 

Bigeye has strict role-based access controls (RBAC) that apply to all AI outputs. If a user doesn’t have permission to view row-level data, they’ll only see reduced AI descriptions—ones that rely on metadata only.

And critically: enhanced AI outputs are never included in alerts or notifications. Even if a user is subscribed to issue alerts, they'll never get sensitive context in their inbox or Slack channel.
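
As a hypothetical illustration of that rule, an alert builder might look like the sketch below. The function and field names are invented, but the principle matches what’s described above: notifications carry only the metadata-based description plus a link back to the app, where RBAC decides how much the viewer sees.

```python
# Hypothetical sketch: alert payloads never embed enhanced, row-level text.
def build_alert_payload(issue_id: str, reduced_description: str) -> dict:
    """Build a notification body using only the metadata-based description."""
    return {
        "issue_id": issue_id,
        "description": reduced_description,  # metadata-only summary
        # Viewers follow the link; what they see there is governed by RBAC.
        "link": f"https://app.example.com/issues/{issue_id}",
    }


print(build_alert_payload("issue-123", "Revenue dropped in Q4."))
```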

Does AI increase compliance risk?

If it’s poorly implemented? Definitely. But if it's built with security at the core? It can actually reduce your risk surface.

Here’s how Bigeye approaches this:

  • All data stays in the platform. Because Bigeye runs a private copy of the foundation model, the model provider can’t see customer data, gather telemetry from the model or prompts, or access your data in any other way.

  • Access-aware outputs. AI-generated summaries are filtered by the viewer’s permission level, so users without access to row-level data never see raw values in bigAI’s output.

  • Compliant by default. This design meets the requirements of our SOC 2 Type II and ISO 27001 certifications.

You get the same level of auditability, access control, and policy enforcement you’d expect from any other system handling sensitive data.

So yes, AI introduces new considerations. But with the right design, it doesn’t have to mean new headaches. 

Whether you're evaluating a new tool or auditing your current stack, these are the questions you should be asking.

Questions to ask potential vendors:

  • Where is your AI model hosted? If it’s not within your cloud region or the model is making external API calls, that could be a red flag. 
  • What data types go into it? Be clear on whether the system uses row-level data, metadata, or both.
  • Are outputs permission-aware? AI descriptions should change based on user access. If they don’t, that’s a serious problem.
  • Can I restrict AI to metadata only? You should be able to control the level of data AI touches, period.

Red flags to watch for:

  • Outputs that aren’t filtered by role or permissions
  • Lack of clarity on where the models are run or how data is handled
  • Vendors using your data to improve their models for other customers

We believe AI and data security aren’t in conflict. In the right hands, and with the right architecture, they can strengthen each other.

So if you’re planning to bring AI into your observability stack, ask the hard questions. Expect clear answers. And don’t settle for a black box.

Your data deserves better. And now, it can have both intelligence and accountability.

