How to track AI agent costs and token usage
TL;DR: AI agents consume 5-30x more tokens per task than standard chatbots. Per-token prices have fallen roughly 80% since mid-2023, yet average enterprise AI spending grew 483% from 2024 to 2026, because agents consume tokens in fundamentally different ways than chatbots do. For most enterprise AI programs, the challenge is attribution: knowing which agent, team, workflow, or conversation is driving spend. Native platform dashboards show total usage but offer limited per-agent, per-workflow attribution. OpenAI provides two attribution dimensions, user and project. Anthropic's Enterprise Analytics API, launched in March 2026, adds per-user attribution but doesn't reach per-request granularity. Snowflake's AI observability tools can track Cortex function costs by user and model, but agent-level attribution across a multi-agent environment requires additional tooling. In the FinOps Foundation's State of FinOps 2026 survey, 98% of practitioners now manage AI spend, and FinOps for AI is the top forward-looking priority. Getting it right requires a visibility layer that attributes cost to the specific agent and user behind each interaction, not just to the platform. This article covers why AI agent costs are hard to predict, what adequate attribution looks like, and how the Agent Trust Hub provides per-agent, per-user, and per-conversation cost visibility.

AI agent cost tracking is the practice of attributing AI infrastructure spending to the specific agents, users, workflows, and conversations that generated it. Most enterprise AI programs can tell you their total monthly token spend. Far fewer can tell you which agent drove a cost spike, which team is consuming disproportionately, or whether a given workflow is generating spend in proportion to the business value it delivers.
That attribution problem is new. Traditional cloud cost management dealt with compute and storage resources that scaled predictably with workload. AI agent costs scale with autonomy: the more steps an agent takes, the more context it accumulates, and the more it gets called in parallel across a workforce, the faster token consumption compounds in ways that standard infrastructure cost models don't anticipate.
Why AI agent costs are hard to predict
Two structural features of agentic AI create cost unpredictability that doesn't exist in simpler AI use cases.
The agentic multiplier. Gartner's March 2026 analysis found that agentic AI models require 5-30x more tokens per task than standard chatbots. The reason is architectural: a reasoning agent doesn't just send a prompt and receive a completion. It sends the full accumulated context, including system prompt and conversation history, to the model at every step. By step 20 of a multi-step task, the agent has paid for the original context 20 times over, plus every intermediate result it picked up along the way. Goldman Sachs projects a 24-fold increase in enterprise token consumption by 2030, driven largely by agentic workflows.
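The arithmetic behind that compounding can be sketched in a few lines. This is a minimal illustration, not a billing model: the function name and every number in it are assumptions chosen for the example, it counts input tokens only, and it assumes the agent resends its whole context each step with no caching or trimming.

```python
def cumulative_input_tokens(base_context: int, added_per_step: int, steps: int) -> int:
    """Total input tokens billed when the full context is resent at every step.

    All three parameters are illustrative assumptions, not measured values.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context           # the whole accumulated context is billed again
        context += added_per_step  # this step's results join the context
    return total


# Hypothetical task: 2,000-token starting context, 500 tokens added per step.
single_call = cumulative_input_tokens(2_000, 500, steps=1)
twenty_steps = cumulative_input_tokens(2_000, 500, steps=20)
print(single_call, twenty_steps)
```

With these made-up inputs, the 20-step run bills 135,000 input tokens against 2,000 for a single call. Setting `added_per_step` to zero isolates the effect the paragraph describes: the starting context alone is billed 20 times.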
Parallel deployment across a workforce. An individual running one agent for one task generates manageable spend. An organization where 84% of developers are using agentic coding tools daily generates a very different cost profile. Uber disclosed in May 2026 that its AI budget for the full year was consumed in four months, with the company's COO publicly questioning whether the link between agent spend and consumer-facing output was visible enough to justify the rate. Microsoft reported similar pressures in the same period, announcing changes to how it allocates agentic coding tool access to manage costs. Both situations trace back to the same issue: agent usage scaled faster than the attribution infrastructure to understand where the spend was going.
What native platform tools expose
Every major AI platform has added cost monitoring capabilities in the past year. It's worth understanding what each one covers and where it stops.
OpenAI. The usage dashboard shows total token consumption, monthly spend projection, and model usage breakdown. Attribution is available on two dimensions: user and project. Custom tags, environment labels, or workflow identifiers aren't supported. When a cost spike appears in the dashboard, identifying whether it came from a specific feature, agent, or team requires querying application-level logs outside of OpenAI's tooling.
Anthropic. The Admin Usage and Cost API supports basic token and cost tracking by workspace and API key. The Enterprise Analytics API, launched March 2026, adds per-user attribution: named user consumption, individual token and cost data, and engagement patterns including Claude Code sessions and conversation counts. It provides 90 days of history (from January 1, 2026) across nine endpoints. Its limitations are no per-request granularity, a 3-day delay on engagement data, and enterprise-tier-only access.
Snowflake Cortex. The CORTEX_AI_FUNCTIONS_USAGE_HISTORY view, generally available since March 2026, tracks Cortex AI function consumption by function, model, user, role, and warehouse. This is meaningful for tracking aggregate Cortex spend but doesn't provide agent-level attribution across a multi-agent environment where agents are invoked by different users running different workflows. Practitioners have reported significant single-query costs. One documented case involved a Cortex AI query generating a $5,000 bill with no prior warning because cost visibility at the individual query level required explicit instrumentation.
Databricks. Unity AI Gateway is the most mature native cost attribution layer among major platforms. It tracks by identity (user or service principal), endpoint tags, custom request tags, model, and provider. Budget controls per user, per use case, per workspace, and per account are available with hard limits and automated throttling. Every request is logged to Unity Catalog system tables with actual dollar costs. Some features of Unity AI Gateway are still in beta.
The shared limitation across all of these is cross-provider attribution. An agent that calls Anthropic, OpenAI, and a Snowflake Cortex function in a single workflow produces costs split across three separate billing systems with no unified view of the combined cost of that workflow.
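One way to picture the missing unified view is a small roll-up over charges pulled from each billing system. The sketch below assumes the application attaches its own `workflow_id` tag to every call and that each provider's charges have already been exported into a common shape. The `ProviderCharge` type, its field names, and the provider labels are hypothetical and do not correspond to any vendor's export format.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderCharge:
    provider: str     # hypothetical label, e.g. "anthropic" or "snowflake_cortex"
    workflow_id: str  # assumed application-side tag, not a provider field
    cost_usd: float


def cost_by_workflow(charges: Iterable[ProviderCharge]) -> dict[str, dict]:
    """Combine charges from separate billing systems into one total per workflow."""
    by_provider: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for charge in charges:
        by_provider[charge.workflow_id][charge.provider] += charge.cost_usd
    return {
        workflow: {"by_provider": dict(costs), "total_usd": sum(costs.values())}
        for workflow, costs in by_provider.items()
    }
```

The design point is that the join key has to come from the application: none of the three billing systems knows the other two were part of the same workflow.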
What good AI agent cost attribution looks like
Organizations with mature AI cost programs attribute consumption at four levels.
By agent. Each deployed agent has an identity, and token consumption is tracked against that identity continuously. When a cost spike occurs, you can identify which agent is responsible and whether its behavior has changed. An agent that enters a reasoning loop, over-queries a data source, or expands its task scope beyond the original intent is visible at the agent level before it manifests as a line item on a monthly bill.
By user. Different users interact with agents differently. Power users who run long multi-turn conversations consume more than users running discrete queries. Attribution at the user level supports chargeback to business units and identifies which teams are consuming disproportionately relative to their use cases.
By workflow. A workflow is a defined sequence of agent tasks: generating a weekly report, processing a batch of customer inquiries, running a data analysis pipeline. Workflow-level attribution makes it possible to evaluate ROI at the business process level (whether the value a workflow delivers justifies what it costs to run) rather than aggregating costs into a single platform spend figure that's impossible to connect to outcomes.
By conversation. Individual conversation-level attribution supports debugging, anomaly detection, and per-interaction cost auditing. When a specific interaction generates unusual cost, conversation-level logs identify what the agent was asked, what it did, and why it consumed the resources it did.
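The four levels above amount to four keys on every usage record. As a rough sketch, assuming each interaction is logged with all four identifiers (the `UsageEvent` type and its field names are illustrative, not a schema from any product), the same log can be rolled up along whichever dimension a question calls for:

```python
from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass

# The four attribution levels, expressed as hypothetical field names.
DIMENSIONS = ("agent_id", "user_id", "workflow_id", "conversation_id")


@dataclass(frozen=True)
class UsageEvent:
    agent_id: str
    user_id: str
    workflow_id: str
    conversation_id: str
    input_tokens: int
    output_tokens: int


def rollup(events: Iterable[UsageEvent], dimension: str) -> dict[str, int]:
    """Sum total tokens per distinct value of one attribution dimension."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    totals: Counter[str] = Counter()
    for event in events:
        totals[getattr(event, dimension)] += event.input_tokens + event.output_tokens
    return dict(totals)
```

Calling `rollup(events, "agent_id")` answers which agent drove a spike, while `rollup(events, "user_id")` supports chargeback. Both read the same records, so the totals always reconcile.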
FinOps for AI: the emerging discipline
The FinOps Foundation's State of FinOps 2026 report surveyed 1,192 practitioners representing $83 billion in annual cloud spend. Ninety-eight percent now manage AI spend as part of their FinOps responsibilities, up from 31% two years earlier. FinOps for AI is the top forward-looking priority in the survey.
The category has expanded beyond cloud cost management into a broader function covering AI spend attribution, ROI forecasting, and chargeback governance. The core FinOps for AI challenge is connecting token consumption to business outcomes: teams need to demonstrate that what they're spending on AI agents is generating value proportional to the spend. Only 39% of organizations can attribute any measurable business impact to AI spend (Deloitte, 2026). The attribution infrastructure is what makes that connection possible.
Platform-level cost management tools (CloudZero, Finout, Apptio Cloudability) have added AI modules that ingest OpenAI, Anthropic, and cloud AI spending into unified dashboards. LLM-native observability tools (Langfuse, Helicone, Portkey) provide per-request token tracking and cost attribution for teams that instrument their applications directly. The emerging requirement for agentic AI environments is a layer above these: attribution by agent identity, across providers and platforms, tied to the specific conversations and workflows those agents are executing.
How the Agent Trust Hub tracks costs by agent, user, workflow, and conversation
Bigeye's Agent Trust Hub logs every agent interaction with token consumption attributed to the specific agent and user. For organizations running Snowflake Intelligence, Databricks Genie, or Claude Code, this means token usage is visible in the agent registry alongside the other signals the Hub tracks: which tables the agent accessed, what queries it ran, and what data trust conditions applied at the time of the interaction.
The visibility the Agent Trust Hub provides is different from what platform billing dashboards offer. Attribution goes to the agent identity, not just to the platform user or API key. Conversations are linked to the agents that ran them and the data those agents accessed. Usage patterns by agent, by user, and by workflow are visible in a single view across all connected agents, without requiring separate queries into each platform's native cost monitoring tools.
This attribution layer sits alongside the data trust signals that the Agent Trust Hub provides through Data Classification, Data Lineage, and AI Guardian. Cost visibility and access control share the same registry: teams can see what an agent spent and whether the data it was spending that budget to access was trustworthy and authorized, in the same place.
Why do AI agents cost so much more than chatbots?
The primary driver is context accumulation. A chatbot sends a prompt and receives a completion. An AI agent planning a multi-step task sends its full accumulated context, including the system prompt, conversation history, and all intermediate results, to the language model at every step. By step 20, the agent has paid for the original context 20 times. Gartner estimates agentic models require 5-30x more tokens per task than chatbots as a result. Parallel deployment multiplies this: an organization where hundreds of employees are running agentic coding sessions or agentic analytics workflows simultaneously compounds that multiplier across the workforce.
What is the inference cost paradox?
Token prices fell roughly 80% between mid-2023 and early 2026. Enterprise AI spending rose roughly 320% in the same period. The paradox resolves when you account for the agentic multiplier: organizations didn't run the same workloads at a lower price; they deployed agents that each consume far more tokens per task, and they deployed many more agents across many more workflows. Lower per-token cost with dramatically higher token consumption per task produces dramatically higher total bills. The organizations controlling costs are the ones routing workloads to appropriately sized models rather than sending every request to frontier models.
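The two percentages can be combined into an implied change in token volume. The sketch below is back-of-the-envelope arithmetic only. It assumes spend is simply price times volume, with one blended per-token price and all spend going to tokens, which is a simplification, and the function name is made up for the example.

```python
def implied_volume_multiple(price_change_pct: float, spend_change_pct: float) -> float:
    """Multiple by which token volume changed, assuming spend = price x volume."""
    price_ratio = 1 + price_change_pct / 100  # an 80% drop leaves 0.2x the old price
    spend_ratio = 1 + spend_change_pct / 100  # a 320% rise is 4.2x the old spend
    return spend_ratio / price_ratio


# Figures quoted above: prices down roughly 80%, spending up roughly 320%.
volume_multiple = implied_volume_multiple(-80, 320)
print(f"{volume_multiple:.1f}x")
```

Under that simplification, the quoted figures imply token volume grew about 21-fold: spending at 4.2 times its old level, divided by prices at one fifth of theirs.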
What's the difference between LLM cost monitoring and AI agent cost tracking?
LLM cost monitoring tracks spending at the API layer: how many tokens were consumed per request, what the cost per token was, and what the total spend is for a given project or workspace. It answers "what did we spend?" AI agent cost tracking attributes that spending to the specific agents, users, workflows, and conversations behind it. It answers "what spent it and why?" The second question is what enterprise FinOps for AI programs require: without it, cost optimization is guesswork, chargeback is impossible to administer fairly, and runaway agent behavior is invisible until it appears as an unexpected monthly bill.
How should organizations set AI agent cost budgets?
Practical cost budgeting for AI agents requires baselines before it can produce useful limits. Start by deploying attribution infrastructure to understand what each agent type actually costs per interaction and per workflow. That baseline makes it possible to set meaningful per-team, per-agent, and per-workflow token budgets, and to detect when an agent is consuming significantly above its baseline before the overage grows into a large bill. Organizations that set limits upfront without baselines tend to set them either too high to catch runaway spend or too low to accommodate normal agent behavior. Running attributed cost monitoring for four to six weeks before hardening budget limits produces significantly better outcomes.
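A baseline check of that kind can be sketched as follows. It is one simple approach among many rather than a recommended policy: it flags any cost more than a chosen number of standard deviations above the historical mean. The function names and the default of three standard deviations are assumptions made for the example.

```python
from statistics import mean, pstdev


def spend_threshold(history: list[float], sigmas: float = 3.0) -> float:
    """Alert threshold: baseline mean plus `sigmas` standard deviations.

    `history` holds per-interaction (or per-day) costs gathered during the
    baselining period; the 3-sigma default is an assumed starting point.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations to form a baseline")
    return mean(history) + sigmas * pstdev(history)


def exceeds_baseline(cost: float, history: list[float], sigmas: float = 3.0) -> bool:
    """True when a new cost sits above the baseline threshold."""
    return cost > spend_threshold(history, sigmas)
```

A limit derived this way moves with each agent's observed behavior, which is why it needs several weeks of attributed history before it means anything.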