Rashmi Ramesh
Thought leadership
April 10, 2026

The Enterprise AI Stack, Explained

8 min read

Enterprise AI systems today span seven interconnected layers, from cloud infrastructure up through analytics and BI. Understanding what each layer does and what it depends on is the prerequisite for making coherent infrastructure decisions. This guide maps each layer, the tools that operate in it, and how the layers connect.


The seven-layer framework in this article was explored in a live session with practitioners from across the enterprise AI ecosystem. Joining Rashmi Ramesh, VP of Product at Bigeye, were practitioners representing Google Cloud, Kong, ThoughtSpot and Bigeye: the infrastructure, connectivity, analytics, and trust layers. Watch the full conversation here.

Enterprise AI spending reached $37 billion in 2025, up from $11.5 billion the year before, according to Menlo Ventures' annual state of generative AI report. The investment is real and accelerating. So is the complexity. Most enterprise AI systems today involve multiple models, multiple data sources, multiple agent frameworks, and multiple teams working across all of them. The technical infrastructure required to hold that together spans more than a single vendor, and more than a single product category.

This guide maps the enterprise AI stack layer by layer: what each one does, what it provides to the layers above it, and what organizations are building or buying at each level.

The seven layers covered here are:

  1. Cloud infrastructure
  2. Enterprise data platforms
  3. Model garden
  4. Agent orchestration
  5. AI trust platform
  6. Connectivity and APIs
  7. Analytics and business intelligence

The enterprise AI stack at a glance

The table below is a simplified snapshot, not a comprehensive directory. It reflects the tools we currently see most often in production enterprise AI environments. Every layer has more tooling options than are listed here, and the landscape shifts quickly: new tools launch regularly, existing ones reposition, and enterprises experiment constantly. Treat it as a starting point for orientation.

| Layer | What it does | Tooling |
| --- | --- | --- |
| Cloud infrastructure | Provides the compute, storage, and memory required to run AI systems at scale. | Google Cloud, AWS, Azure |
| Enterprise data platforms | Centralizes and processes enterprise data so it can be reliably used by AI systems. This is the foundation for building trustworthy AI. | Snowflake, Databricks, SharePoint, Box, S3 |
| Model garden | Provides access to foundation models that power reasoning and decision-making. | Google, OpenAI, Anthropic, Mistral, Meta, Google Vertex, Azure AI Foundry, AWS Bedrock, Databricks Mosaic |
| Agent orchestration | Defines how agents plan, reason, and take actions to complete tasks. | LangGraph, LlamaIndex, ThoughtSpot |
| AI trust platform | Ensures the data used by AI systems is accurate, secure, and compliant. This layer is critical for managing risk in production AI. | Bigeye, Rubrik, Airia |
| Connectivity and APIs | Controls, secures, and routes interactions between applications, data, and models. This is the control plane for how AI systems operate. | Kong, Apigee, Cloudflare AI Gateway |
| Analytics & business intelligence | Transforms AI outputs into insights that business users can understand and act on. | ThoughtSpot, Tableau, Sigma |

Cloud infrastructure

Cloud infrastructure is the foundation: the compute, storage, memory, and networking required to train, run, and scale AI systems. In practice, this means GPU and TPU access for model inference, elastic scaling to handle variable AI workloads, low-latency networking for real-time retrieval-augmented generation (RAG) pipelines, and database services that support both structured and vector-based data.

IBM defines the infrastructure layer as providing "the computational power, physical storage and tools necessary to develop, train and operate AI models effectively," with AI accelerators as the critical component that "dramatically reduce training time for complex models." Google Cloud's architecture blueprint adds a CI/CD layer on top of infrastructure as an explicit component, reflecting how much enterprise AI depends on automated deployment pipelines rather than manual release processes.

The three major public cloud platforms — Google Cloud, AWS, and Azure — account for the majority of enterprise AI infrastructure. IDC reported that cloud deployments account for 84% of AI infrastructure spending, with hyperscalers driving 87% of that. AWS holds roughly 30% of the global cloud market, Azure approximately 20%, and Google Cloud approximately 13%, Synergy Research Group found.

The infrastructure layer sets the constraints for every layer above it: latency budgets, throughput limits, cost per inference, and data residency requirements all flow from decisions made here.
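As a rough illustration of how those constraints compound, here is a back-of-the-envelope sketch of how per-token pricing and call volume turn into a monthly inference budget. The prices and volumes are hypothetical placeholders, not any provider's actual rates:

```python
# Illustrative arithmetic: per-token prices and daily call volume set
# the inference budget that every layer above infrastructure inherits.
# All prices below are hypothetical placeholders.

def cost_per_inference(input_tokens: int, output_tokens: int,
                       price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost in dollars for a single model call."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

def monthly_budget(calls_per_day: int, unit_cost: float, days: int = 30) -> float:
    """Projected monthly spend for a steady daily call volume."""
    return calls_per_day * unit_cost * days

unit = cost_per_inference(1500, 400, price_in_per_1k=0.003, price_out_per_1k=0.015)
print(round(unit, 4))                           # cost of one call
print(round(monthly_budget(10_000, unit), 2))   # spend at 10k calls/day
```

Small changes to prompt length or routing policy move this number materially, which is why the cost constraint set here propagates upward through the stack.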

Enterprise data platforms

The enterprise data platform layer centralizes, processes, and governs enterprise data so it can be used by AI systems. This covers data warehouses, data lakehouses, storage systems, and the pipelines that move data between them. It also includes the governance mechanisms that determine what data is accessible, to whom, and under what classification.

This layer must handle unstructured data alongside structured data. IDC estimates that up to 90% of enterprise data is unstructured. AI systems trained on or querying enterprise data need access to both types, and the data platform layer is where that unification happens.

Databricks and Snowflake are the dominant lakehouse platforms in this layer. Databricks' 2025 State of AI report found that the ratio of experimental to production AI models has narrowed from 16:1 in 2023 to 5:1 in early 2024, and that 31% of use cases reached full production in 2025 — double the prior year. The data platform's governance and accessibility directly influence that production rate. Models that can't access complete, governed, lineage-traced data can't reliably move from pilot to production.

Tools commonly used in this layer include Databricks, Snowflake, Amazon S3, SharePoint, Box, Fivetran, Airbyte, and dbt for pipeline orchestration.

Model garden

The model garden layer provides access to foundation models and includes the management layer for routing requests, monitoring usage, and evaluating performance across them. Most enterprise AI programs today use multiple foundation models for different tasks rather than a single model for everything.

According to a16z's third annual CIO survey, 44% of enterprises use Anthropic Claude in production, 78% use OpenAI GPT-4-series models, and Google Gemini has grown to 21% enterprise share from 7% in 2023. Meta's Llama models are the preferred open-source option for 46% of enterprises running self-hosted workloads. Menlo Ventures' 2025 data shows Anthropic capturing approximately 40% of enterprise model spending.

The model garden management layer handles three functions that become important at scale: routing requests to the appropriate model based on task type and cost, monitoring token usage and latency across models, and managing version upgrades without breaking production applications. Google Cloud's Model Garden within Vertex AI and Azure AI Foundry are examples of platforms that combine model access with this management infrastructure.
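The routing function described above can be sketched in a few lines. The model names, prices, and task categories below are illustrative placeholders, not a real catalog or any platform's API:

```python
# Toy sketch of model-garden routing: pick the cheapest catalog entry
# that handles the task type and fits under a cost ceiling. Names and
# prices are invented for illustration.

from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str
    cost_per_1k_tokens: float
    good_for: set

CATALOG = [
    ModelSpec("large-reasoning-model", 0.015, {"analysis", "code"}),
    ModelSpec("mid-general-model", 0.003, {"summarize", "chat", "analysis"}),
    ModelSpec("small-fast-model", 0.0004, {"classify", "extract", "chat"}),
]

def route(task: str, max_cost_per_1k: float) -> ModelSpec:
    """Cheapest model that handles the task within the cost ceiling."""
    candidates = [m for m in CATALOG
                  if task in m.good_for and m.cost_per_1k_tokens <= max_cost_per_1k]
    if not candidates:
        raise ValueError(f"no model fits task={task!r} under ${max_cost_per_1k}/1k")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

print(route("chat", 0.01).name)       # → small-fast-model
print(route("analysis", 0.02).name)   # → mid-general-model
```

A production router would also weigh latency and quality scores, but the shape is the same: a policy function between callers and the catalog, so model swaps don't touch application code.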

Providers and platforms represented in this layer include Google, OpenAI, Anthropic, Mistral, and Meta, along with managed model platforms such as Google Vertex AI, Azure AI Foundry, AWS Bedrock, and Databricks Mosaic.

Agent orchestration

Agent orchestration defines how AI agents plan, reason, and take actions to complete tasks. This layer manages tool usage, multi-step workflows, agent memory, and inter-agent communication. It's where the application logic of AI systems lives.

Salesforce's 2025 enterprise architecture framework identifies this as one of the four new AI-specific layers enterprises need that traditional IT architecture didn't include. Their "Agentic Layer" description covers agent runtime environments, reasoning engines, memory and context stores, and interoperability protocols like Model Context Protocol (MCP) and agent-to-agent (A2A) communication. Oracle's reference architecture uses LangChain at this layer for orchestrating requests across their LLM routing module.

The open-source frameworks dominating this layer are LangGraph (LangChain's production-grade multi-agent framework, with 80,000-plus GitHub stars), LlamaIndex (strong for RAG-heavy pipelines), and CrewAI (role-based multi-agent automation, reporting 60% of Fortune 500 as customers). Microsoft's AutoGen and Semantic Kernel are merging into a unified agent framework, expected in general availability in early 2026. ThoughtSpot has also introduced agentic analytics capabilities at this layer for multi-step analysis workflows.
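Stripped of framework specifics, the plan-act-observe loop these frameworks implement looks roughly like the sketch below. The planner and tool here are toy stand-ins, not LangGraph, LlamaIndex, or CrewAI APIs:

```python
# Framework-agnostic sketch of an agent loop: the planner proposes an
# action, the runtime executes the matching tool, and the observation
# is appended to memory until the planner signals completion. The
# planner and tool are hard-coded stand-ins for model calls.

def lookup_revenue(region: str) -> str:
    return f"revenue[{region}] = 1.2M"        # stub tool

TOOLS = {"lookup_revenue": lookup_revenue}

def toy_planner(goal: str, memory: list) -> dict:
    """Stand-in for a model call: one tool step, then finish."""
    if not memory:
        return {"action": "lookup_revenue", "arg": "EMEA"}
    return {"action": "finish", "arg": memory[-1]}

def run_agent(goal: str, max_steps: int = 5) -> str:
    memory: list = []
    for _ in range(max_steps):
        step = toy_planner(goal, memory)
        if step["action"] == "finish":
            return step["arg"]
        observation = TOOLS[step["action"]](step["arg"])
        memory.append(observation)            # agent memory: prior observations
    raise RuntimeError("step budget exhausted")

print(run_agent("report EMEA revenue"))       # → revenue[EMEA] = 1.2M
```

Everything the orchestration layer adds — tool registries, memory stores, multi-agent handoffs, step budgets — is elaboration on this loop.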

AI trust platform

The AI trust platform layer ensures the data feeding AI systems is accurate, classified, and governed, and that model behavior in production aligns with the policies the organization has committed to. It provides visibility into data quality across pipelines, tracks what AI agents are accessing and from which data sources, identifies sensitive data, and enforces access and usage policies.

IBM, Oracle, Salesforce, and Google Cloud all include an explicit governance layer in their enterprise AI architectures. IBM's watsonx.governance "supports organizations in implementing comprehensive AI lifecycle governance." Salesforce's architecture includes a Trust/Safety/Governance Hub as a component of its AI/ML layer. Oracle's OCI Enterprise AI Governance provides "enterprise-grade security, compliance, and access controls at every stage." Google Cloud uses Dataplex, IAM, and Security Command Center to cover this function.

Despite this consensus, the governance layer is the least mature in practice. Only 28% of enterprises have a formal AI governance framework in production, and only one in five companies has a mature governance model for autonomous agents, according to Deloitte's 2026 state of AI report. EU AI Act Article 10 requirements for high-risk AI systems apply from August 2026, creating a compliance timeline for organizations that haven't built this layer yet.

Part of what makes governance difficult to implement at enterprise scale is that data doesn't live in one place. The governance capabilities built into watsonx.governance, Salesforce's Trust Hub, or Oracle's OCI controls are designed around their own environments. An organization running AI on data spread across Snowflake, Databricks, legacy Oracle databases, SharePoint, and a dozen other sources needs governance that spans all of them, not just the portion within one vendor's reach. Most platform-native tools stop at their own perimeter. Bigeye takes a different approach: it connects to 50+ data sources across cloud warehouses, legacy databases, ETL platforms, and BI tools, so observability, lineage, and governance controls apply to the full data estate an AI system actually touches. That scope matters because AI agents don't confine themselves to a single platform either. For a deeper look, see Bigeye's AI Trust Platform page.

This layer is where data observability, data lineage, sensitive data classification, and runtime policy enforcement converge into a single governance function rather than a set of disconnected controls.
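A minimal sketch of what one such converged check might look like, with hard-coded thresholds standing in for the learned baselines a real trust platform would derive from historical profiles:

```python
# Toy data-quality check combining three signals a trust platform
# monitors: freshness, row volume, and null rate on key columns.
# Thresholds are hard-coded for illustration; in practice they are
# learned from historical table profiles.

from datetime import datetime, timedelta

def check_table(last_loaded: datetime, row_count: int,
                null_rate: float, now: datetime) -> list:
    issues = []
    if now - last_loaded > timedelta(hours=24):
        issues.append("freshness: last load older than 24h")
    if not 90_000 <= row_count <= 110_000:     # expected daily volume band
        issues.append(f"volume: {row_count} rows outside expected band")
    if null_rate > 0.02:
        issues.append(f"quality: null rate {null_rate:.1%} above 2% limit")
    return issues

now = datetime(2026, 4, 10, 9, 0)
print(check_table(datetime(2026, 4, 10, 3, 0), 101_000, 0.004, now))  # → []
# three issues: stale load, low volume, high null rate
print(check_table(datetime(2026, 4, 8, 3, 0), 40_000, 0.09, now))
```

The point of convergence is that these signals feed one policy decision, such as blocking an agent from querying a table that fails its checks, rather than three separate dashboards.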

Connectivity and APIs

The connectivity and API layer controls, secures, and routes interactions between applications, data, and AI models. It is the control plane for how AI systems communicate internally and externally. In traditional architectures, this was API management: authentication, rate limiting, and traffic routing. In AI-driven architectures, the scope has expanded significantly.

AI systems generate a different traffic profile than traditional applications. Models make API calls autonomously, orchestration layers coordinate across services, and agentic workflows can trigger thousands of downstream calls from a single user request. Gartner predicts that 70% of organizations building multi-model applications will use dedicated AI gateways by 2028.

The practical requirements at this layer include: routing requests across multiple model providers to balance cost and performance, enforcing token budget controls, maintaining a full audit trail of what the model accessed and when, detecting and blocking prompt injection, and preventing vendor lock-in by abstracting model-specific APIs behind a common interface. Without this layer, organizations typically encounter unpredictable inference costs, no consistent audit trail, and no visibility into the full scope of model activity.
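To make those requirements concrete, here is a toy in-process sketch of two of them: a per-client token budget and an audit trail of every call. This illustrates the responsibilities, not the API of Kong, Apigee, or any real gateway product:

```python
# Toy AI-gateway sketch: every forwarded call is checked against a
# token budget and recorded in an audit log, including calls that were
# denied. A real gateway sits on the network path and adds routing,
# auth, and prompt-injection screening on top.

import time

class ToyAIGateway:
    def __init__(self, token_budget: int):
        self.remaining = token_budget
        self.audit_log = []                    # who called what, when, cost

    def forward(self, client: str, model: str, tokens: int) -> bool:
        allowed = tokens <= self.remaining
        self.audit_log.append({"ts": time.time(), "client": client,
                               "model": model, "tokens": tokens,
                               "allowed": allowed})
        if allowed:
            self.remaining -= tokens
        return allowed

gw = ToyAIGateway(token_budget=10_000)
print(gw.forward("billing-agent", "model-a", 6_000))   # → True
print(gw.forward("billing-agent", "model-a", 6_000))   # → False (over budget)
print(len(gw.audit_log))                               # → 2 (denial logged too)
```

Logging denied calls alongside allowed ones is the detail that makes the audit trail useful for governance: it shows what agents attempted, not just what succeeded.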

Kong AI Gateway was the first dedicated AI gateway (launched February 2024) and is built for production-grade LLM traffic governance and MCP server support. Apigee (Google Cloud), Cloudflare AI Gateway, MuleSoft AI Gateway, and the open-source LiteLLM proxy are also popular options in this layer.

Analytics and business intelligence

The analytics and BI layer transforms AI outputs into insights that business users can understand and act on. It handles the last-mile problem: getting AI-generated results to the people who make decisions, without requiring them to write SQL or interpret raw model outputs.

The BI layer functions differently when it sits on top of AI rather than static data. Traditional BI queries a fixed schema. AI-native BI retrieves dynamic, model-generated outputs and needs to present them in a way that users trust enough to act on. Salesforce's architecture identifies this as the "Experience Layer," where the interface must support non-technical users accessing AI-driven results through conversational interfaces and natural language search.

Hex has grown from 9% of enterprise BI spend in 2023 to 25% in 2025 (Ramp Velocity data), reflecting a broader shift toward combined exploration and dashboarding environments with built-in AI SQL assistance. ThoughtSpot pairs search-driven analytics with its agentic "Spotter" capability, letting business users ask multi-step analytical questions in plain language and get AI-generated answers without writing SQL or waiting on a data team: an interface model quite unlike traditional dashboard BI. Sigma Computing, Looker, Tableau Einstein, and Power BI Copilot all represent different approaches to adding AI to the BI interface.

How the seven layers connect

Each layer passes something to the one above it. Cloud infrastructure provides the compute and storage that data platforms run on. Data platforms provide the governed, accessible data that models train on and query against. The model garden provides the intelligence that orchestration frameworks direct. Agent orchestration defines the workflows that the trust platform monitors and governs. The connectivity layer manages the traffic flows that orchestration generates. The BI layer surfaces what the agents and models produce.

The dependencies flow in both directions. The AI trust platform needs lineage visibility that extends through the data platform layer to the model layer. The API layer's audit trail feeds back into governance reporting. The BI layer's output quality depends on what the model received from the data layer.

Understanding where each layer's responsibility begins and ends is what makes it possible to have the right conversations across teams, and allocate investment where it will actually have impact.

about the author

Rashmi Ramesh

Vice President of Product, Bigeye

Rashmi is a seasoned product executive with deep expertise in cybersecurity, data platforms, observability, and AI. She has led product strategy and execution across startups, scale-ups, and large enterprises, building and scaling both platforms and applications that translate complex systems into intuitive, high-impact products.

Prior to her current role, Rashmi held leadership positions at SentinelOne, Tanium, and Cisco, where she drove product initiatives spanning security and network and application observability. She has built products from the ground up and scaled them into widely adopted platforms, leading global teams across APAC, Israel, Europe, and the US through both early product definition and the complexities of growth and maturity.

Rashmi is also an advocate for the next generation of product leaders. She is the creator of the “Path to CPO” podcast and an “AI Product Management” series, where she explores leadership journeys and emerging trends in AI. She holds an MBA from The Wharton School.


