Product

May 19, 2023

Agent or agentless data observability? The choice is yours.


Kendall Lovett


Typically, “deployment model” is not the first thing data teams think about when they start evaluating a data observability platform. They’re often just excited about the prospect of reducing manual data quality rules, discovering unknown anomalies in their data pipelines, and understanding the impact and root cause of data issues.

It doesn’t take long after the initial excitement, however, for teams to start thinking about how this will actually work in practice. How will the tool get access to our data? How involved will the implementation be? What are the security ramifications of instituting data observability?

Among these common questions is, “do we need to deploy an agent?” Bigeye customers are pleased to learn the answer is, “no, but you have the option if you need it.”

In this post, we’ll explore both agent and agentless options to help you better understand the benefits of each.

Bigeye standard agentless model

Bigeye is a SaaS application, built to be deployed quickly and easily across your data pipeline with no agent required. Bigeye has native connectors to the common cloud data warehouses and lakes (Snowflake, Databricks, Redshift, Athena, BigQuery, etc.), along with a host of on-premises and traditional databases such as Oracle, SAP HANA, MySQL, and Microsoft SQL Server, and other common data stack tools.

The Bigeye app can be managed through an intuitive UI, programmatically as code, or entirely through the REST API. This is great news for data teams that include both data engineers who prefer to work from the command line and data analysts who prefer a dynamic UI.

Let’s take a look at some of the security considerations for a standard Bigeye deployment.

No raw data leaves your environment

Bigeye uses only aggregated statistics and metadata to perform monitoring and anomaly detection and doesn’t store any raw data from your systems. Some helpful, optional features, like auto-generated debug queries, will return row-level data so you can view it in your browser. This data is never persisted, and each of these features can be completely disabled in advanced settings if security and compliance rules require it.
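To make the idea of "aggregated statistics only" concrete, here is a minimal Python sketch. It is not Bigeye's implementation: the `collect_aggregates` function, the `orders` table, and the choice of metrics are all hypothetical, and an in-memory SQLite database stands in for a real warehouse. The point is only that the query returns summary values, never individual rows.

```python
import sqlite3

def collect_aggregates(conn, table, column):
    """Return summary statistics for one column (hypothetical helper).

    Only aggregate values are returned; no row-level data leaves the database.
    Table and column names are interpolated into SQL, so in real use they
    would have to come from a trusted allowlist.
    """
    query = (
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"MIN({column}), MAX({column}), AVG({column}) "
        f"FROM {table}"
    )
    row_count, null_count, min_v, max_v, avg_v = conn.execute(query).fetchone()
    return {
        "row_count": row_count,
        "null_count": null_count,
        "min": min_v,
        "max": max_v,
        "avg": avg_v,
    }

# Illustrative data: an in-memory table standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, None), (3, 30.0)]
)
stats = collect_aggregates(conn, "orders", "amount")
```

A monitoring service could track values like `stats["null_count"]` over time and flag anomalies without ever seeing the underlying order records.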

Connections are read only

Bigeye uses a read-only JDBC service account to query your data, just like most BI and analytics tools. This means Bigeye can’t modify your data. In a standard agentless deployment, data source credentials are stored on Bigeye AWS servers, encrypted at rest, and cannot be accessed by Bigeye engineers. We’ll discuss credential management for agent deployments in detail a little later.

Encryption

All information handled by Bigeye is encrypted. Aggregated data in transit is sent over HTTPS (TLS), and data at rest is encrypted with AES-256.

SOC 2 Type II

Bigeye maintains a SOC 2 Type II audit report, and our SRE team performs regular penetration tests and stringent security reviews.

Bigeye agent model

For most customers, Bigeye’s standard agentless approach provides the right balance of built-in security, scalability, and ease of use. For organizations with more stringent security requirements, however, the Bigeye agent model is a great alternative. Running an agent has two primary benefits: it removes the need to expose raw data to Bigeye’s SaaS platform, and it removes the need to allow incoming connections from outside your datacenter or VPC. This enables customers in highly regulated industries to get the benefits of data observability while staying within their security protocols.

How it works

With the agent model, Bigeye agents run natively in your environment while the Bigeye SaaS application is hosted in a Bigeye-managed VPC. As illustrated in the diagram below, Bigeye agents communicate with Bigeye in a “pull-only” fashion by periodically polling a secure work queue for instructions.

When an agent has work to carry out, it connects to your data source, runs the necessary queries, and returns aggregated results to Bigeye. Bigeye can then use the results to perform anomaly detection, send alerts, and create durable event records.
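The pull-only flow described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `WorkQueue` class is an in-process stand-in for the secure work queue (in a real deployment the agent would reach it over an outbound HTTPS connection), and the task format, `run_agent_once`, and the `users` table are hypothetical. A real agent would also wait between polls rather than loop immediately.

```python
import queue
import sqlite3

class WorkQueue:
    """Hypothetical stand-in for the SaaS-side work queue."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.results = []

    def submit(self, task):
        # Called on the SaaS side to schedule work for an agent.
        self._tasks.put(task)

    def poll(self):
        # Called by the agent; returns a task, or None if nothing is waiting.
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None

    def report(self, task_id, value):
        # Called by the agent to send an aggregated result back.
        self.results.append((task_id, value))

def run_agent_once(work_queue, conn):
    """One polling cycle: pull a task, query locally, report the aggregate.

    Returns True if a task was processed, False if the queue was empty.
    """
    task = work_queue.poll()
    if task is None:
        return False
    value = conn.execute(task["sql"]).fetchone()[0]
    work_queue.report(task["id"], value)
    return True

# Local data source that only the agent can reach (illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])

wq = WorkQueue()
wq.submit({"id": "row-count-users", "sql": "SELECT COUNT(*) FROM users"})

# The agent initiates every interaction; nothing connects in to the agent.
while run_agent_once(wq, conn):
    pass
```

Note that the SaaS side never opens a connection to the agent or the data source; it only places work on the queue and waits for the agent to fetch it.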

While one agent is typically sufficient, larger enterprises may choose to deploy multiple agents across different environments or for different use cases. In this scenario, Bigeye agents operate asynchronously, running queries and returning results to Bigeye’s SaaS platform independently. This allows Bigeye to remain performant even in large-scale deployments.
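As a rough sketch of how several agents might work independently, the following Python example runs one thread per environment. The environment names, task format, and `agent` function are hypothetical, and a list of numbers stands in for the result of a local query. Each agent drains its own tasks and reports results without coordinating with the others.

```python
import queue
import threading

def agent(name, tasks, results, lock):
    """Drain this agent's task queue and report one aggregate per task.

    Each task is (task_id, values), where values stands in for data that
    only this agent's environment can query.
    """
    while True:
        try:
            task_id, values = tasks.get_nowait()
        except queue.Empty:
            return
        aggregate = sum(values) / len(values)
        # The lock protects the shared results store from concurrent writes.
        with lock:
            results[task_id] = (name, aggregate)

# Hypothetical environments, each with its own pending work.
environments = {
    "prod": [("prod-avg", [10, 20, 30])],
    "staging": [("staging-avg", [1, 2, 3, 4])],
}

results = {}
lock = threading.Lock()
threads = []
for name, task_list in environments.items():
    q = queue.Queue()
    for task in task_list:
        q.put(task)
    t = threading.Thread(target=agent, args=(name, q, results, lock))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
```

Because no agent waits on another, adding an environment adds capacity rather than contention, which is the property the paragraph above describes.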

Beyond the native security of the Bigeye SaaS app described in the agentless section, the Bigeye agent provides some important additional security measures.

No inbound connections to your network

In the agent model, Bigeye agents make only outbound network connections. This allows the Bigeye agent to query a secure work queue for instructions without allowing any incoming connections from outside the network.

Additional encryption protocols

We’ve designed the Bigeye agent to be as secure as possible. In addition to the TLS layer mentioned above, we use asymmetric encryption in both directions to make sure that no data, aggregated or otherwise, is ever at risk in transit.

No raw data is stored in Bigeye agents

As in the agentless approach, Bigeye uses only aggregated statistics and metadata to perform monitoring and anomaly detection and does not collect any raw data. The Bigeye agent does not store any raw data either; it simply collects aggregated statistics about your data and transmits them back to Bigeye. In some cases, the agent can query and return raw data to assist in debugging or root cause analysis, but this data is never written to durable storage, and each of these features can be disabled if required.

No two data stacks are the same, and Bigeye’s agent and agentless options are built to provide robust flexibility and governance, no matter how stringent the security needs or how bespoke the environment.

Talk to our team to learn more about Bigeye's flexible deployment options.


