Bigeye Staff
Thought leadership
April 15, 2026

AI in Financial Services: How the Largest Institutions Are Using It Right Now

7 min read

From fraud detection processing 143 billion transactions a year to agentic AI rewriting compliance workflows, here is how financial services is deploying AI at enterprise scale, and what enterprises rolling out their own initiatives need to know.

Get Data Insights Delivered
Join hundreds of data professionals who subscribe to the Data Leaders Digest for actionable insights and expert advice.
Join The AI Trust Summit on April 16
A one-day virtual summit on the controls enterprise leaders need to scale AI where it counts.

Every card transaction you make, whether a tap, a swipe, or an online checkout, is assessed by AI in under 50 milliseconds. Mastercard's Decision Intelligence Pro runs that evaluation against more than a trillion data points, 143 billion times a year. Rather than checking against a static rules list, the model scores every transaction in real time against behavioral baselines that shift as fast as the fraud does.

Financial services AI is already in production at a scale most industries haven't reached. JPMorgan has 450 AI use cases in production and estimates $1.5 billion in cumulative business value from AI. HSBC processes up to 1.2 billion transactions monthly through AI-powered AML systems. The question this sector has moved past is whether AI works at scale. The open questions are about how to govern it, what happens when it fails, and what regulators now require.

What follows is a use-case-by-use-case walkthrough of how financial services is actually deploying AI, and what enterprises rolling out their own initiatives need to account for at each step.

Fraud detection is where financial services AI has the deepest production track record

Fraud detection was the first function where AI reached genuine enterprise scale in financial services, and it remains the most mature. The production results are specific enough to serve as benchmarks.

Mastercard's Decision Intelligence Pro delivers a 20% average improvement in fraud detection across issuers and reduces false positive alerts by more than 85%. Among issuers using the system, 42% saved more than $5 million in fraud losses over two years. The mechanism: each transaction is scored against behavioral baselines built from more than a trillion data points, and the model learns as fraud patterns shift rather than waiting for a human to update a rules list.
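The difference between a static rules list and a behavioral baseline can be sketched in a few lines. This is an illustrative toy, not Mastercard's model; the $1,000 threshold, the z-score cutoff, and the cardholder history are all invented:

```python
from statistics import mean, stdev

def rule_score(amount, limit=1000):
    """Static rule: flag any charge over a fixed dollar threshold."""
    return amount > limit

def baseline_score(amount, history, z_cutoff=3.0):
    """Behavioral baseline: flag amounts far from this cardholder's own history."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_cutoff

# A $900 charge passes the static rule but is wildly anomalous for a
# cardholder whose history is small grocery-sized purchases.
history = [12, 18, 25, 14, 30, 22, 16, 28]
print(rule_score(900))               # False: under the fixed $1,000 limit
print(baseline_score(900, history))  # True: far outside this cardholder's baseline
```

The same charge that a fixed threshold waves through is exactly the kind of outlier a per-behavior baseline catches, which is why the baseline approach also cuts false positives: a $900 charge from a cardholder who routinely spends $900 doesn't fire at all.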

HSBC's AML monitoring system processes between 980 million and 1.2 billion transactions monthly, detects two to four times more suspicious activity than its predecessor, and has cut false positive alert rates by 60%. AML reviews that previously took weeks now take days. That compression matters operationally: analysts spend less time clearing alerts that shouldn't have fired and more time on patterns that warrant investigation.

Enterprise consideration: More than half of fraud now involves AI on the attacker side: synthetic identities, AI-generated phishing, and automated account takeover at scale. The institutions with strong AI defenses aren't just performing better against the same threat landscape. They're defending against a qualitatively different one. For enterprises deploying fraud AI, the data freshness question is critical: a model scoring transactions against stale behavioral baselines produces confidently wrong outputs. Real-time data lineage monitoring is what keeps the model's inputs as current as the attacks it's designed to catch.
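A minimal freshness gate on the baseline data a scoring job reads might look like the sketch below. The one-hour staleness budget is an invented number for illustration, not a recommendation; the right budget depends on how fast the fraud patterns move:

```python
from datetime import datetime, timedelta, timezone

def baselines_are_fresh(last_loaded_at, max_staleness=timedelta(hours=1)):
    """Return True if the behavioral-baseline table was refreshed recently
    enough to score live transactions against."""
    age = datetime.now(timezone.utc) - last_loaded_at
    return age <= max_staleness

now = datetime.now(timezone.utc)
# A baseline refreshed 10 minutes ago is safe to score against;
# one refreshed 6 hours ago should block scoring or page the on-call.
print(baselines_are_fresh(now - timedelta(minutes=10)))  # True
print(baselines_are_fresh(now - timedelta(hours=6)))     # False
```

The point of wiring a check like this into the pipeline, rather than trusting the upstream job's schedule, is that the model fails loudly when its inputs go stale instead of scoring confidently against yesterday's behavior.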

Credit underwriting AI delivers faster decisions, with a bias exposure regulators are now enforcing

Credit underwriting was an early target for AI deployment because the inputs are relatively structured and the outcome variable, default risk, is well defined. AI models can evaluate more signals than traditional scorecards, including cash flow patterns, alternative data, and behavioral indicators, and return decisions faster. The operational upside is real.

The governance risk is equally real, and the Apple Card case made it visible. In 2019, David Heinemeier Hansson and Steve Wozniak publicly reported that Apple Card credit limits appeared to be set dramatically lower for women, in some cases 20 times lower than for spouses with identical or weaker financial profiles. The New York Department of Financial Services launched an investigation. Goldman Sachs, which underwrote the card, was ultimately unable to fully explain the algorithm's outputs.

Gender wasn't a direct input to the model. But correlated proxy variables (spending patterns, account histories, behavioral signals that track with gender without naming it) produced discriminatory outcomes anyway. The AI Incident Database documents this as the canonical case of proxy variable bias at production scale. It's why the CFPB now requires specific, individualized explanations for AI-driven credit denials. Generic adverse action codes are no longer sufficient when an algorithm made the decision. The bureau's position: "There are no exceptions to the federal consumer financial protection laws for new technologies."

Enterprise consideration: Explainability in credit AI is both a regulatory requirement and an operational signal. If your team can't trace why a model produced a specific output, you can't assess whether the training data that produced it was representative, or whether a data distribution shift has made its outputs less accurate over time. Both questions need an answer before a regulator asks them.
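Data distribution shift of the kind described here is commonly measured with the Population Stability Index (PSI), which compares a binned score distribution at training time against the same bins today. A minimal sketch with invented distributions; the 0.25 threshold is a widely used rule of thumb, not a regulatory number:

```python
from math import log

def psi(expected, actual):
    """Population Stability Index between two binned distributions,
    each given as proportions summing to 1."""
    return sum((a - e) * log(a / e)
               for e, a in zip(expected, actual)
               if e > 0 and a > 0)

# Score distribution bucketed into quintiles: uniform at training time,
# skewed toward high scores in the most recent scoring window.
training_dist = [0.20, 0.20, 0.20, 0.20, 0.20]
current_dist  = [0.05, 0.10, 0.20, 0.30, 0.35]

shift = psi(training_dist, current_dist)
print(round(shift, 3))   # ~0.4, well above the 0.25 rule of thumb
print(shift > 0.25)      # True: investigate before trusting the model
```

A drift check like this answers the second half of the question above: it tells you when the population the model scores today no longer resembles the one it was trained on, which is the precondition for its explanations still being meaningful.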

Document processing: what JPMorgan and Goldman built

Document review is one of the most productive applications of NLP and large language models in financial services: extracting structured data from loan agreements, regulatory filings, trade confirmations, and earnings reports at a speed and scale that manual review can't approach.

JPMorgan's COiN system processes 12,000 commercial loan documents in seconds, replacing work that previously consumed an estimated 360,000 attorney and loan officer hours per year. The value is consistency as much as speed: an AI system reviewing loan documents applies the same criteria to every document, without fatigue or variance across reviewers. Goldman Sachs cut pitchbook drafting time by approximately half after rolling out its GS AI Assistant firmwide.

Enterprise consideration: Document processing AI depends on the quality and structure of the documents it's trained on and the data it extracts into. Inconsistent source formatting, incomplete document sets, and poor extraction validation all degrade output quality in ways that aren't always visible until a downstream process fails. This is an area where data quality monitoring on extracted outputs as well as source documents is worth building into the deployment from the start.
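A sketch of that output-side validation is below. The field names and plausibility ranges are hypothetical; the pattern (required-field and range checks on every extracted record, before it lands anywhere downstream) is the point:

```python
def validate_extraction(record):
    """Return a list of data quality issues in one extracted loan record.
    Field names and the rate range are illustrative assumptions."""
    issues = []
    for field in ["borrower", "principal", "maturity_date", "interest_rate"]:
        if not record.get(field):
            issues.append(f"missing field: {field}")
    rate = record.get("interest_rate")
    if isinstance(rate, (int, float)) and not (0 < rate < 0.5):
        # Catches a classic extraction failure: percent vs. fraction confusion.
        issues.append("interest_rate outside plausible range")
    return issues

record = {"borrower": "Acme Corp", "principal": 2_500_000,
          "maturity_date": "2031-06-30", "interest_rate": 7.1}
print(validate_extraction(record))  # flags 7.1: likely a percent, not a fraction
```

Checks like this fail at extraction time rather than when a downstream pricing or covenant calculation quietly consumes a rate that is 100x too large.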

The largest banks are running AI at workforce scale, not team scale

Internal AI deployment at the major institutions has moved from team-level pilots to organization-wide infrastructure. The distinction matters because it changes the governance question entirely.

JPMorgan's LLM Suite is used by 200,000 employees. Wells Fargo's Fargo assistant handled 245 million customer interactions in 2024. These aren't tools a data team manages in isolation. They're operational infrastructure spanning every function, every geography, and every level of the organization, and the governance of that data access is an enterprise-wide question.

Enterprise consideration: At workforce scale, the governance question shifts from "is this tool safe to deploy" to "can we monitor what it's doing across 200,000 users." That requires real-time visibility into what data the AI is accessing, for what purpose, and whether access patterns match what was approved at deployment. Integration coverage spanning the enterprise's fragmented data estate, often 50 or more systems, is the prerequisite for governance that actually covers what's running.

Compliance and surveillance: AI is now inside the function that regulators watch most closely

AI is being deployed to automate the compliance function itself: monitoring trading activity for market manipulation, reviewing communications for violations, interpreting regulatory text, and generating filings. The function most scrutinized by regulators is now one of the heaviest AI users, which means the governance requirements compound.

The governance failure in this space has a canonical case. Knight Capital's $440 million loss in 45 minutes in 2012 came from a deployment error that activated legacy code with no kill switch, no real-time behavioral monitoring, and no escalation protocol. The firm lost 75% of its market value in two days and was acquired within weeks. SR 11-7 now requires all three of those missing controls for every AI and ML model a financial institution runs, including vendor-provided ones.

Enterprise consideration: SR 11-7's requirement for independent model validation means the institution bears full accountability for every model in production regardless of source. Purchased vendor models are subject to the same documentation and validation requirements as in-house builds. For enterprises deploying compliance AI, this means governance infrastructure needs to cover the full model inventory, not just the ones the internal team built.

Agentic AI is the wild west, and governance frameworks haven't fully caught up

Agentic AI changes the shape of the governance problem. Unlike a model that returns a single output for human review, an agent executes a sequence of decisions, accesses multiple data sources in a single task, and may produce outputs that aren't reviewed before they have downstream effects.

The governance gap here is specific. Most model risk frameworks, including SR 11-7, were written with point-in-time model outputs in mind. An agent that accesses your CRM, your contract database, and your customer records in a single workflow and makes decisions across all three requires the same lineage and auditability documentation as a credit model, applied to a more complex and dynamic data access pattern. Most governance frameworks in financial services haven't been updated to address this explicitly.

Enterprise consideration: Before deploying agentic AI in production, the data access scope needs to be defined and enforced at the infrastructure level, not just described in a design document. What data can this agent access, under what conditions, and how is that access logged? Those questions need answers that survive the first incident review.
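One way to enforce scope at the infrastructure level rather than in a design document is to put an allowlist and an audit log in the only code path the agent can use to read data. A minimal sketch with hypothetical agent and source names; a production version would sit in the data access layer, not in the agent's own process:

```python
class ScopedDataAccess:
    """Enforce and log an agent's approved data scope at the access layer."""

    def __init__(self, agent_id, allowed_sources):
        self.agent_id = agent_id
        self.allowed = set(allowed_sources)
        self.audit_log = []

    def read(self, source, query):
        permitted = source in self.allowed
        # Every attempt is logged, including denials: that record is what
        # survives the first incident review.
        self.audit_log.append({"agent": self.agent_id, "source": source,
                               "query": query, "allowed": permitted})
        if not permitted:
            raise PermissionError(f"{self.agent_id} may not read {source}")
        return f"results from {source}"  # placeholder for a real connector

access = ScopedDataAccess("contract-review-agent", {"crm", "contracts"})
access.read("contracts", "latest amendments for Acme Corp")
try:
    access.read("customer_pii", "all records")   # outside approved scope
except PermissionError as e:
    print(e)
print(len(access.audit_log))  # both attempts logged, including the denial
```

The design choice that matters is that denial and logging happen in the same place: the agent cannot reach a source the allowlist doesn't name, and the question "what did this agent touch, and when" has an answer that doesn't depend on the agent's own cooperation.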

What enterprises rolling out financial services AI need in place

The sector's documented deployments converge on a consistent finding: the production challenges in financial services AI are predominantly about data, not models. The IIF-EY Annual Survey found 96% of financial services firms cite data quality as their primary AI production challenge, with only 30% reporting full visibility into where their data resides.

The regulatory frameworks reinforce this. SR 11-7 requires documented training data provenance and independent validation. The CFPB requires specific explainability for AI-driven credit decisions. The EU AI Act, enforceable from August 2026, requires structured data governance documentation, human oversight mechanisms, and conformity assessments for credit scoring, fraud detection, and AML systems. All are classified as high-risk AI under the act. The technical requirements include training data provenance, quality validation procedures, and documented monitoring of model behavior in production.

What all three demand, in different jurisdictional language, is the same underlying capability: knowing what data trained the model, what it accessed, whether that data was accurate and properly governed, and being able to demonstrate all of it on demand. Citigroup's experience makes the cost of that gap concrete. $136 million in fines and $11.8 billion in remediation came from data infrastructure that couldn't satisfy examiners asking basic governance questions, not from any model producing a wrong output. That distinction matters: four years of remediation to answer questions that should have been answerable on day one.

If your team is building the data governance foundation for financial services AI, Bigeye's AI Trust Platform covers data quality, data lineage, sensitivity classification, and AI governance across the full enterprise data estate.

about the author

Bigeye Staff

Bigeye Staff represents the collective voice of the Bigeye team. Each article is informed by the expertise of individual contributors and strengthened through collaboration across our engineers, data experts, and product leaders, reflecting our shared mission to help teams build trust in their data.

