Agent or agentless data observability? The choice is yours.
Bigeye is the only data observability platform that offers both agentless and agent-based models, allowing organizations to choose the deployment option that best fits their needs.
Typically, “deployment model” is not the first thing data teams think about when they start evaluating a data observability platform. They’re often just excited about the prospect of reducing manual data quality rules, discovering unknown anomalies in their data pipelines, and understanding the impact and root cause of data issues.
It doesn’t take long after the initial excitement, however, for teams to start thinking about how this will actually work in practice. How will the tool get access to our data? How involved will the implementation be? What are the security ramifications of instituting data observability?
Among these common questions is, “do we need to deploy an agent?” Bigeye customers are pleased to learn the answer is, “no, but you have the option if you need it.”
In this post, we’ll explore both agent and agentless options to help you better understand the benefits of each.
Bigeye standard agentless model
Bigeye is a SaaS application built to be deployed quickly and easily across your data pipeline, with no agent required. Bigeye has native connectors to all the common cloud data warehouses and lakes (Snowflake, Databricks, Redshift, Athena, BigQuery, and more), along with a host of on-premises and traditional databases such as Oracle, SAP HANA, MySQL, and Microsoft SQL Server, as well as other common data stack tools.
The Bigeye app can also be managed through an intuitive UI, programmatically as code, or entirely by REST API. This is great news for data teams that include both data engineers who prefer to work from the command line and data analysts who prefer a dynamic UI.
Let’s take a look at some of the security considerations for a standard Bigeye deployment.
No raw data leaves your environment
Bigeye uses only aggregated statistics and metadata to perform monitoring and anomaly detection, and it doesn’t store any raw data from your systems. Some helpful, optional features, like auto-generated debug queries, return row-level data so you can view it in your browser. This data is never persisted, and each of these features can be completely disabled in advanced settings if security and compliance rules require it.
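To make the difference between aggregated statistics and raw data concrete, here is a minimal sketch in Python. It uses the standard-library `sqlite3` module as a stand-in for a warehouse connection. The helper name `collect_column_metrics` and the specific metrics (row count, null count, distinct count) are illustrative assumptions and are not Bigeye’s actual query set. The point is that the query returns a few numbers per column rather than any row values.

```python
import sqlite3


def collect_column_metrics(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Return aggregate statistics for one column; no row values are fetched.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly, so they must come from trusted metadata, not user input.
    """
    query = (
        f'SELECT COUNT(*), '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END), '
        f'COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    )
    row_count, null_count, distinct_count = conn.execute(query).fetchone()
    return {
        "row_count": row_count,
        "null_count": null_count or 0,  # SUM over zero rows yields NULL
        "distinct_count": distinct_count,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "a@example.com"), (2, None), (3, "a@example.com"), (4, "b@example.com")],
    )
    print(collect_column_metrics(conn, "orders", "email"))
    # {'row_count': 4, 'null_count': 1, 'distinct_count': 2}
```

Only the returned dictionary would need to leave the database host. The email addresses themselves are never fetched by the monitoring query.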
Connections are read only
Bigeye uses a read-only JDBC service account to query your data, just like most BI and analytics tools. This means Bigeye can’t modify your data. In a standard agentless deployment, data source credentials are stored on Bigeye AWS servers, encrypted at rest, and cannot be accessed by Bigeye engineers. We’ll discuss credential management for agent deployments in detail a little later.
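How a read-only service account is provisioned depends on your warehouse. Typically it is a role granted only read privileges. As a locally runnable analogy, the Python sketch below opens a SQLite database through a `mode=ro` URI. It shows that reads succeed while the database itself rejects writes. SQLite is only a stand-in for a warehouse here, and `open_read_only` and `demo` are hypothetical names.

```python
import os
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write attempt fails."""
    # The "mode=ro" URI parameter asks SQLite to open the file read-only.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def demo() -> tuple[int, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "warehouse.db")

        # Set up some data with an ordinary read-write connection.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE events (id INTEGER)")
        setup.executemany("INSERT INTO events VALUES (?)", [(1,), (2,), (3,)])
        setup.commit()
        setup.close()

        ro = open_read_only(path)
        try:
            count = ro.execute("SELECT COUNT(*) FROM events").fetchone()[0]
            try:
                ro.execute("DELETE FROM events")
                write_blocked = False
            except sqlite3.OperationalError:
                # SQLite reports "attempt to write a readonly database".
                write_blocked = True
        finally:
            ro.close()
        return count, write_blocked


if __name__ == "__main__":
    print(demo())  # (3, True)
```

The same principle applies to a warehouse service account. The restriction is enforced by the database, not by the client, so even a buggy or compromised client cannot modify data.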
All information handled by Bigeye is encrypted. Aggregated data is sent over HTTPS, so it is encrypted in transit with TLS. Data at rest is encrypted with AES-256.
SOC 2 Type II
Bigeye maintains a SOC 2 Type II audit report and our SRE team performs regular penetration tests and stringent security reviews.
Bigeye agent model
For most customers, Bigeye’s standard agentless approach provides the right balance of built-in security, scalability, and ease of use. For organizations with more stringent security requirements, however, the Bigeye agent model is a great alternative. Running an agent has two primary benefits: it removes the need to expose raw data to Bigeye’s SaaS platform, and it removes the need to allow incoming connections from outside your datacenter or VPC. This lets customers in highly regulated industries get the benefits of data observability without stepping outside their security protocols.
How it works
With the agent model, Bigeye agents run natively in your environment while the Bigeye SaaS application is hosted in a Bigeye-managed VPC. As illustrated in the diagram below, Bigeye agents communicate with Bigeye in a “pull-only” fashion by periodically polling a secure work queue for instructions.
When an agent has work to carry out, it connects to your data source, runs the necessary queries, and returns aggregated results to Bigeye. Bigeye can then use the results to perform anomaly detection, send alerts, and create durable event records.
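The pull-only flow can be sketched as a simple polling loop. The Python below is a minimal illustration built on several assumptions. An in-process `queue.Queue` stands in for the secure work queue; a real agent would reach that queue with outbound HTTPS requests. SQLite stands in for your data source. The `WorkItem` shape and the `run_agent` function are hypothetical and do not represent Bigeye’s agent API. The sketch shows the direction of traffic: the agent asks for work, runs an aggregate query locally, and sends back only the resulting numbers.

```python
import queue
import sqlite3
from dataclasses import dataclass


@dataclass
class WorkItem:
    """Hypothetical instruction pulled from the work queue."""
    metric_id: str
    sql: str  # expected to be an aggregate query returning one value


def run_agent(work_queue: queue.Queue, results: list, conn: sqlite3.Connection,
              poll_timeout: float = 0.1, max_idle_polls: int = 3) -> None:
    """Poll for work, execute it against the local data source, report aggregates.

    The agent only ever pulls from the queue; nothing pushes into the agent.
    It stops after max_idle_polls consecutive empty polls so the demo ends;
    a long-running agent would keep polling on an interval instead.
    """
    idle = 0
    while idle < max_idle_polls:
        try:
            item = work_queue.get(timeout=poll_timeout)
        except queue.Empty:
            idle += 1
            continue
        idle = 0
        value = conn.execute(item.sql).fetchone()[0]
        # Only the metric id and its aggregate value are reported back.
        results.append({"metric_id": item.metric_id, "value": value})
        work_queue.task_done()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (None,), (5.0,)])

    work = queue.Queue()
    work.put(WorkItem("orders.row_count", "SELECT COUNT(*) FROM orders"))
    work.put(WorkItem("orders.amount_nulls",
                      "SELECT COUNT(*) - COUNT(amount) FROM orders"))

    results = []
    run_agent(work, results, conn)
    print(results)
    # [{'metric_id': 'orders.row_count', 'value': 3}, {'metric_id': 'orders.amount_nulls', 'value': 1}]
```

Because the agent initiates every exchange, the network only needs to allow outbound traffic from the agent to the queue.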
While one agent is typically sufficient, larger enterprises may choose to deploy multiple agents across different environments or for different use cases. In this scenario, Bigeye agents operate asynchronously, running queries and returning results to Bigeye’s SaaS platform independently. This allows Bigeye to remain performant even in large-scale deployments.
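Here is a rough sketch of several agents working independently. Each hypothetical agent below has its own work queue and its own data source, an in-memory SQLite database created inside its thread. Each agent posts aggregate results to a shared, thread-safe results queue as it finishes them. Threads in one Python process stand in for agents that would really run as separate deployments. The names `agent`, `run_all`, and `ENVIRONMENTS` are illustrative assumptions.

```python
import queue
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical environments, each with its own rows and pending metric queries.
ENVIRONMENTS = {
    "prod": {"rows": [(1,), (2,), (3,)], "queries": ["SELECT COUNT(*) FROM t"]},
    "staging": {"rows": [(7,)],
                "queries": ["SELECT COUNT(*) FROM t", "SELECT MAX(v) FROM t"]},
}


def agent(env: str, work: queue.Queue, rows: list, results: queue.Queue) -> int:
    """One agent: drain its own queue against its own data source, report aggregates."""
    # Each thread creates its own connection; connections are not shared across threads.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", rows)
    done = 0
    while True:
        try:
            sql = work.get_nowait()
        except queue.Empty:
            break
        results.put((env, sql, conn.execute(sql).fetchone()[0]))
        done += 1
    conn.close()
    return done


def run_all() -> list:
    results: queue.Queue = queue.Queue()
    with ThreadPoolExecutor(max_workers=len(ENVIRONMENTS)) as pool:
        futures = []
        for env, cfg in ENVIRONMENTS.items():
            work: queue.Queue = queue.Queue()
            for sql in cfg["queries"]:
                work.put(sql)
            futures.append(pool.submit(agent, env, work, cfg["rows"], results))
        for f in futures:
            f.result()  # re-raise any error from an agent
    collected = []
    while not results.empty():
        collected.append(results.get_nowait())
    # Completion order across agents is not deterministic, so sort for display.
    return sorted(collected)


if __name__ == "__main__":
    for row in run_all():
        print(row)
```

Neither agent waits on the other. Results arrive in whatever order the agents finish, which is why the sketch sorts them before printing.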
In addition to the security built into the Bigeye SaaS app, described in the agentless section above, the Bigeye agent provides several further security measures.
No inbound connections to your network
In the agent model, the only network connections are outbound connections initiated by Bigeye agents. The agent polls a secure work queue for instructions, so no incoming connections from outside the network need to be allowed.
Additional encryption protocols
We’ve designed the Bigeye agent to be as secure as possible. In addition to the TLS layer mentioned above, we use asymmetric encryption in both directions to make sure that no data, aggregated or otherwise, is ever at risk in transit.
No raw data is stored in Bigeye agents
As in the agentless approach, Bigeye uses only aggregated statistics and metadata to perform monitoring and anomaly detection and does not collect any raw data. The Bigeye agent does not store any raw data either; it simply collects aggregated statistics about your data and transmits them back to Bigeye. In some cases, the agent can query and return raw data to assist in debugging or root cause analysis, but this data is never written to durable storage, and each of these features can be disabled if required.
No two data stacks are the same and Bigeye’s agent and agentless options are built to provide robust flexibility and governance—no matter how stringent the security needs or how bespoke the environment.
Talk to our team to learn more about Bigeye's flexible deployment options.