The CISO's Guide to Not Getting Blindsided by EU GPAI Rules: Part I, Foundations
TL;DR: The EU AI Act creates compliance requirements for organizations using General-Purpose AI models like GPT-4. Most companies think their AI vendor handles compliance, but the rules actually require detailed documentation of your own data pipelines, copyright compliance, and operational processes. This article breaks down the gap between what executives think compliance means versus what's actually required.
Your CEO just forwarded you an article about the EU AI Act with one line: "Are we ready for this?"
Here's what they're really asking, and why your answer matters more than you think. They're not asking about policies or documentation. They're asking if the AI initiatives they've been touting to investors are about to become liability landmines. They're asking if that competitive advantage they've been claiming is about to evaporate under regulatory scrutiny. And they're asking if you understand the difference between having AI and having compliant AI.
Yes, August 2nd has already passed. The European Union's Artificial Intelligence Act, the world's first comprehensive AI regulation, is already in effect, and its obligations for General-Purpose AI (GPAI) model providers have applied since August 2, 2025. But if you think that means the scramble is over, you're missing what's actually happening. The real work is just beginning, because organizations are only now discovering what these requirements mean in practice. The gap between the headline understanding and the technical reality is massive.
The disconnect I'm seeing is stark. Executives think AI compliance means their vendor has the right certifications and their legal team has updated the privacy policy. Meanwhile, the technical requirements demand:
- Column-level data lineage tracking across hybrid cloud and on-premise systems,
- Documentation of every transformation in your data pipeline,
- Proof that no copyrighted material is inadvertently flowing through your prompts.
The questions I'm fielding reveal a fundamental misunderstanding, which is exactly why they're so hard to answer. Can't we just get a certificate? Doesn't our AI vendor handle this? How much will the compliance software cost? These questions assume AI compliance is a checkbox exercise. It's not. It's a fundamental rethinking of how data flows through your organization.
Understanding the EU AI Act’s Code of Practice
Let's break down what these regulations actually say.
What the Act Says (Official Language)
To help contextualize how compliance will evolve under the EU Artificial Intelligence Act, the European Commission published an official Code of Practice outlining voluntary commitments for General Purpose AI providers. Below is a summary excerpted directly from the EU’s materials.
- Purpose: The AI Act Code of Practice (introduced in Article 56) is a set of guidelines for compliance with the AI Act. It is a crucial tool for ensuring compliance with the EU AI Act obligations, especially in the interim period between when General Purpose AI (GPAI) model provider obligations came into effect (August 2025) and the adoption of standards (August 2027 or later). Though they are not legally binding, GPAI model providers can adhere to the Code of Practice to demonstrate compliance with GPAI model provider obligations until European standards come into effect.
- Process: The Code was developed through a multi-stakeholder process, involving academic and independent experts, GPAI model providers, downstream deployers, members of civil society, and more.
- Content: The Code has three chapters. The first two, Transparency and Copyright, apply to all GPAI model providers. The third, Safety and Security chapter, only applies to providers of GPAI models with systemic risk. For each chapter, the Code lays down certain commitments and corresponding measures for how providers can live up to the commitments.
- Implementation: The Commission and the EU AI Board have confirmed that the GPAI Code is an adequate voluntary tool for providers of GPAI models to demonstrate compliance with the AI Act. Namely, the Code adequately covers the obligations provided for in Articles 53 and 55 of the AI Act relevant to providers of GPAI models and GPAI models with systemic risk.
You can read the full text and follow updates on the AI Act Code of Practice directly from the European Commission’s website and the EU AI Office. These resources include the official Code, implementation guidance, and information about upcoming standardization timelines through 2027.
What This Means in Plain English
Purpose: This is a playbook for AI companies to show they're following the rules. It's technically "voluntary," but it's the most concrete official guidance available. The rules are live now, but official standards won't exist until 2027 at the earliest, so this Code is what everyone's working from.
Process: The EU didn't write these rules in isolation. It brought together AI companies, researchers, advocacy groups, and regulators to hash out what compliance should look like.
Content: The Code covers three areas. Two apply to every GPAI provider: Transparency (documenting how your AI works) and Copyright (proving you're not using stolen content). The third, Safety and Security, applies only to the biggest models, meaning those classified as posing systemic risk.
Implementation: The EU has officially given this Code its blessing as a recognized way to prove you're compliant with the AI Act's requirements for General-Purpose AI models.
The bulk of the rules are directed at AI providers, but they create compliance obligations that flow down to every organization using those AI systems. Here's why:
You're not just a customer - you're a "deployer" under the regulations. When you use GPT-4 or Claude in your business operations, you become what the EU AI Act calls a "deployer" of the AI system. This means you have your own set of compliance obligations that are separate from and additional to what your AI provider must do.
Your data creates your liability. While OpenAI or Anthropic might be compliant with the model requirements, the regulations also reach how you use the model with your data. Copyright risk, for example, doesn't stop at the training data the provider used. It extends to the operational data you're sending through the model in your prompts and workflows.
The "general-purpose" nature creates shared responsibility. Because these are foundation models designed to be adapted for countless use cases, the regulations acknowledge that compliance can't be fully handled by the provider alone. Your specific implementation, your data sources, your use case, and your operational context all create compliance obligations that only you can address.
Think of it like AWS compliance. Amazon can be SOC 2 compliant for its infrastructure, but that doesn't automatically make your application compliant. You still need to implement proper security controls, data handling procedures, and monitoring for your specific use of their services. The EU AI Act works similarly - your AI provider handles their piece, but you're responsible for your operational piece.
The regulations essentially say: "We can't regulate every possible use of these powerful AI systems, so we'll regulate both the providers who build them AND the organizations who deploy them in business-critical ways."
So, what does this mean for your organization?
The Supply Chain Problem That Nobody's Mapping
When you integrate a GPAI model into your operations, you're accountable for understanding how your data flows through it and what happens to that data. Consider a typical enterprise implementation where GPT-4 gets integrated into customer service workflows. The assumption is that OpenAI's compliance is sufficient. But map the actual data flow:
- Customer data from the CRM (containing PII)
- Transaction history from the data warehouse
- Support ticket text from Zendesk
- Product documentation from Confluence
- All flowing through custom middleware before hitting the API
Each of these touchpoints creates compliance obligations under the GPAI rules. Organizations need to document not just that they use GPT-4, but how data moves through their entire pipeline, what transformations occur, and whether any copyrighted or personal data is being inadvertently included in prompts.
The real wake-up call? When your middleware vendor - often a small startup - has no documentation about data handling, no audit logs, and no ability to prove data isn't being retained. Under the EU AI Act, that vendor's gap becomes your liability.
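One way to make that mapping concrete is to keep the pipeline inventory as data rather than as a diagram. This is a minimal Python sketch. It assumes you record three things for each hop: whether it carries personal data, whether written data-handling documentation exists, and whether it produces audit logs you can actually retrieve. The `Hop` class, the `compliance_gaps` helper, and the flag values are hypothetical illustrations. They are not a standard or a vendor API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Hop:
    """One touchpoint on the path from source systems to the model API (hypothetical schema)."""
    name: str
    kind: str                   # "source", "middleware", or "model_api"
    contains_pii: bool = False
    documented: bool = False    # written data-handling documentation exists
    audit_log: bool = False     # produces logs you can actually retrieve
    feeds: list[str] = field(default_factory=list)  # downstream hop names


def compliance_gaps(hops: list[Hop]) -> list[str]:
    """Describe every hop that lacks documentation or retrievable audit logs."""
    gaps = []
    for hop in hops:
        missing = [label for label, present in (("documentation", hop.documented),
                                                ("audit log", hop.audit_log))
                   if not present]
        if missing:
            note = " (carries PII)" if hop.contains_pii else ""
            gaps.append(f"{hop.name}: missing {', '.join(missing)}{note}")
    return gaps


# Hypothetical inventory of the customer service flow described above.
pipeline = [
    Hop("CRM", "source", contains_pii=True, documented=True, audit_log=True, feeds=["middleware"]),
    Hop("data warehouse", "source", documented=True, audit_log=True, feeds=["middleware"]),
    Hop("Zendesk", "source", contains_pii=True, documented=True, audit_log=True, feeds=["middleware"]),
    Hop("Confluence", "source", documented=True, audit_log=True, feeds=["middleware"]),
    Hop("middleware", "middleware", contains_pii=True, feeds=["model API"]),
    Hop("model API", "model_api", documented=True, audit_log=True),
]

for gap in compliance_gaps(pipeline):
    print(gap)  # middleware: missing documentation, audit log (carries PII)
```

With these sample flags, the only hop flagged is the middleware. That is exactly the gap that tends to surface late. A spreadsheet could hold the same inventory. The advantage of keeping it as data is that you can regenerate the gap report every time the pipeline changes.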
Data Lineage Requirements That Span Beyond Model Boundaries
The GPAI technical documentation requirements also extend to tracing data from origin to output. That isn't just "We pull from database X." It's column-level precision about transformations, filtering, and augmentation.
Consider what this actually means in practice. You're using an AI model for revenue forecasting. The regulations require you to document:
- Which specific data fields feed into the model
- How those fields are transformed (aggregations, normalizations, encodings)
- What filtering logic is applied and why
- Whether any synthetic data augmentation occurs
- How the model's outputs flow into downstream systems
- What decisions are made based on those outputs
Think about organizations with 40+ data sources feeding their AI systems, crossing between modern cloud warehouses and legacy on-premises databases. A retail company might discover their AI-powered demand forecasting pulls from a 15-year-old inventory system that includes supplier data with unclear copyright status. Without documentation of how that data transforms through three different ETL tools before reaching the model, they're non-compliant.
The GPAI rules don't care that the model itself is compliant if you can't prove your data pipeline is clean.
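Column-level lineage is ultimately a graph problem. Each edge records which source column produced which target column, through which transformation, in which tool. The sketch below walks a model input back to its origin. It assumes lineage edges have already been collected from your ETL jobs into a simple list, and that the graph has no cycles. The table, column, and tool names are hypothetical stand-ins for the retail scenario.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Column-level lineage: `target` was derived from `source` by `transform`."""
    source: str     # "system.table.column"
    target: str
    transform: str  # e.g. "AVG by week", "min-max normalization"
    tool: str       # ETL tool or job that performed the step


def trace_upstream(column: str, edges: list[Edge]) -> list[list[Edge]]:
    """Return every edge path from an origin column down to `column`.

    Assumes the lineage graph is acyclic.
    """
    incoming = [e for e in edges if e.target == column]
    if not incoming:
        return [[]]  # `column` is an origin: nothing further upstream
    return [path + [edge]
            for edge in incoming
            for path in trace_upstream(edge.source, edges)]


# Hypothetical lineage for the demand forecasting example.
edges = [
    Edge("legacy_inv.stock.qty_on_hand", "staging.stock.qty", "cast to integer", "etl_tool_a"),
    Edge("staging.stock.qty", "warehouse.stock_weekly.avg_qty", "AVG by week", "etl_tool_b"),
    Edge("warehouse.stock_weekly.avg_qty", "features.demand.qty_norm", "min-max normalization", "etl_tool_c"),
    Edge("legacy_inv.supplier.notes", "features.demand.supplier_text", "strip HTML", "etl_tool_b"),
]

for path in trace_upstream("features.demand.qty_norm", edges):
    steps = [f"{e.target} [{e.transform} via {e.tool}]" for e in path]
    print(" -> ".join([path[0].source] + steps))
```

The printed path runs from the legacy inventory column through three ETL hops to the model feature. That is the chain the retail company above could not document. Dedicated lineage tools do this at scale. The point here is the shape of the record you need for each hop.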
Why "We Use Claude/GPT" Isn't a Compliance Strategy
This is the misconception that needs addressing: buying enterprise AI doesn't equal compliance. The model provider's compliance covers their infrastructure and base model training. Your compliance obligations cover:
- Your specific implementation: How you've configured the model, what guardrails you've implemented, how you handle responses
- Your data practices: What data you're sending to the model, how you're ensuring no copyright infringement in your prompts, how you're protecting personal information
- Your use case documentation: Proving your AI use doesn't fall under prohibited categories, documenting risk assessments for your specific application
- Your output handling: How model outputs are stored, processed, and used in decision-making
Consider a healthcare company using Claude for medical record summarization with enterprise-grade security. The compliance gaps that typically emerge:
- No documentation of prompt engineering practices
- No monitoring of what PII is being sent in prompts
- No audit trail of model responses
- No process for ensuring summaries aren't introducing clinical errors
- No way to prove they aren't sending copyrighted medical literature in their context windows
The model provider's compliance certificate becomes worthless for these specific obligations.
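A thin wrapper around every model call can narrow the monitoring and audit-trail gaps in the healthcare example. This is a minimal sketch under several assumptions. `call_model` is a placeholder for whatever vendor SDK you use. The two regular expressions are deliberately simplistic stand-ins for real PII detection. The JSON Lines file stands in for whatever log store you actually run.

```python
import hashlib
import json
import os
import re
import tempfile
import time

# Deliberately simple patterns for illustration; production PII detection
# needs far broader coverage (names, addresses, record numbers, and so on).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text):
    """Return the names of PII patterns that match anywhere in `text`."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audited_call(prompt, call_model, log_path):
    """Screen the prompt, call the model, and append one audit record.

    `call_model` is any function that takes a prompt string and returns a
    response string; it is a placeholder for your vendor's SDK.
    """
    record = {"ts": time.time(), "prompt_sha256": sha256(prompt),
              "pii_detected": find_pii(prompt)}
    response = call_model(prompt)
    record["response_sha256"] = sha256(response)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return response


# Usage with a stand-in model function and a temporary log file.
log_path = os.path.join(tempfile.mkdtemp(), "prompt_audit.jsonl")
reply = audited_call("Summarize the visit notes for jane@example.com",
                     lambda prompt: "Summary: routine follow-up.", log_path)
```

The log stores SHA-256 hashes rather than raw text. That keeps the audit log from becoming yet another store of patient data. It still lets you prove which prompt and response a record refers to, provided you retain the originals elsewhere. Whether to keep full responses is a retention decision to make with legal, not a default.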
The Technical Reality Check
Let me be clear about what organizations are actually facing. GPAI compliance requires:
- Tracking every data source that touches AI systems, including third-party APIs, internal databases, and user inputs
- Documenting every transformation between source and model
- Proving respect for copyright in training data AND operational data
- Maintaining audit logs that can demonstrate compliance retroactively
- Monitoring for drift that could indicate compliance issues
The reality is stark: most organizations think they're buying AI, but they're actually building a data governance program with AI attached.
This isn't vendor FUD - it's the technical reality of the regulations. Your AI vendor's compliance is table stakes. Your compliance is about proving everything that happens in your environment, with your data, for your use cases. And that's a fundamentally different challenge than most organizations are prepared for.
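"Audit logs that can demonstrate compliance retroactively" implies logs nobody can quietly edit after the fact. One common technique is a hash chain. Each record's hash covers the previous record's hash, so altering a past entry breaks every link after it. This is a minimal in-memory Python sketch of that idea. The `AuditChain` name and the event fields are illustrative assumptions.

```python
import hashlib
import json


def record_hash(prev_hash, entry):
    """Hash the previous record's hash together with this entry's canonical JSON."""
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditChain:
    """Append-only log in which each record's hash covers the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.records = []

    def append(self, entry):
        prev = self.records[-1]["hash"] if self.records else self.GENESIS
        digest = record_hash(prev, entry)
        self.records.append({"entry": entry, "prev": prev, "hash": digest})
        return digest

    def verify(self):
        """Recompute every link; editing, reordering, or deleting a non-final record fails."""
        prev = self.GENESIS
        for rec in self.records:
            if rec["prev"] != prev or record_hash(prev, rec["entry"]) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True


chain = AuditChain()
chain.append({"event": "pipeline_run", "job": "demand_forecast_etl", "rows": 1200})
chain.append({"event": "schema_change", "table": "stock_weekly", "column": "avg_qty"})
print(chain.verify())  # True

chain.records[0]["entry"]["rows"] = 999  # simulate an after-the-fact edit
print(chain.verify())  # False
```

A chain only proves tampering if the latest hash is stored somewhere the editor can't also rewrite, such as a separate system or write-once storage. Without that, someone could recompute the whole chain or truncate it from the end.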
The Grace Period Trap
The regulations say models already on the market before August 2, 2025 don't need full compliance until August 2, 2027. Two years to get your house in order. But this grace period is creating dangerous complacency.
Here's the trap: every day you operate without proper documentation, you're accumulating technical debt that becomes exponentially harder to address retroactively. You can't recreate historical data lineage six months from now. You can't retroactively prove what copyrighted content wasn't in your training data. When 2027 arrives, you'll need to prove compliance for the entire grace period - how do you document two years of data flows you never tracked?
Consider this scenario playing out across the industry: a company discovers their AI vendor has no data lineage documentation. No record of training data sources. No audit trail of customer input processing. The vendor says they'll work on compliance before 2027. Guess who's responsible for that gap today? Under the GPAI rules, it's the organization using the model. That grace period isn't protecting you from liability - it's just delaying the reckoning.
The Enforcement Timeline Reality
Enforcement starts August 2, 2026, meaning regulators will have a full year to investigate before the grace period ends. They're not waiting until 2027 to build cases.
Based on GDPR patterns, expect regulators to prioritize companies with publicized AI incidents and organizations in regulated industries where AI impacts are sensitive.
The test case risk for highly visible companies is real. If you've made AI central to your investor story or you're in a consumer-facing industry with high AI adoption, you're a candidate. Regulators need precedent-setting cases. They'll choose companies where violations are clear and enforcement will be visible. Organizations treating the grace period as a free pass? They're painting targets on themselves.
What Compliance Should Look Like
In a perfect world, GPAI compliance would mean complete data lineage from source to model output - every field, every transformation, every decision point documented and traceable. You'd know exactly which database table contributed to today's revenue forecast, how that data was cleaned, what filters were applied, and how it influenced the model's output. This isn't just documentation; it's active, queryable, real-time visibility.
Real-time monitoring of data quality metrics would catch issues before they compound. Schema changes, data drift, unusual patterns - all flagged immediately. Not after your model has been making bad predictions for a week. Your monitoring would track freshness, completeness, accuracy, and consistency across every data source feeding your AI systems. Automated alerts would fire when copyright-risk content appears in training data or when PII shows up where it shouldn't.
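Two of those checks are simple enough to sketch: schema change detection, and freshness plus completeness monitoring. The Python below assumes you can snapshot each table's `{column: type}` schema and pull recent rows as dictionaries. The thresholds and sample data are hypothetical. Drift detection would need a statistical comparison of distributions, which this sketch does not cover.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def schema_diff(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column: type} snapshots of the same table."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "retyped": sorted(c for c in shared if previous[c] != current[c]),
    }


def quality_alerts(rows: list[dict], required: list[str], last_loaded: datetime,
                   max_age: timedelta, max_null_rate: float = 0.01,
                   now: datetime | None = None) -> list[str]:
    """Flag a stale load and required columns whose null rate exceeds the limit."""
    if now is None:
        now = datetime.now(timezone.utc)
    alerts = []
    age = now - last_loaded
    if age > max_age:
        alerts.append(f"stale: last load {age} ago")
    for col in required:
        nulls = sum(1 for row in rows if row.get(col) is None)
        null_rate = nulls / len(rows) if rows else 1.0
        if null_rate > max_null_rate:
            alerts.append(f"{col}: {null_rate:.1%} null")
    return alerts


old = {"sku": "text", "qty": "integer", "supplier_id": "integer"}
new = {"sku": "text", "qty": "numeric", "region": "text"}
print(schema_diff(old, new))
# {'added': ['region'], 'removed': ['supplier_id'], 'retyped': ['qty']}

rows = [{"sku": "A1", "qty": 5}, {"sku": "A2", "qty": None}]
print(quality_alerts(rows, ["sku", "qty"],
                     last_loaded=datetime(2025, 9, 1, 6, 0, tzinfo=timezone.utc),
                     max_age=timedelta(hours=24),
                     now=datetime(2025, 9, 2, 12, 0, tzinfo=timezone.utc)))
# ['stale: last load 1 day, 6:00:00 ago', 'qty: 50.0% null']
```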
Automated documentation would capture every transformation without human intervention. Each aggregation, normalization, and encoding step would be logged with its business justification. Changes to data pipelines would trigger automatic updates to your compliance documentation. No more discovering that someone modified an ETL job six months ago without telling anyone.
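One way to keep documentation from lagging behind code is to declare the business justification where the transformation is defined and log every run automatically. The decorator below is a hypothetical pattern, not a feature of any particular ETL tool. `documented_transform`, `TRANSFORM_REGISTRY`, and `RUN_LOG` are illustrative names. In practice the registry and run log would be written to a catalog rather than kept in memory.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory stores; a real setup would write these to a catalog.
TRANSFORM_REGISTRY = {}  # step name -> documentation
RUN_LOG = []             # one entry per executed step


def documented_transform(justification, owner):
    """Register a transformation with its business justification and log each run."""
    def decorator(func):
        TRANSFORM_REGISTRY[func.__name__] = {
            "justification": justification,
            "owner": owner,
            "description": (func.__doc__ or "").strip(),
        }

        @functools.wraps(func)
        def wrapper(rows, *args, **kwargs):
            result = func(rows, *args, **kwargs)
            RUN_LOG.append({
                "step": func.__name__,
                "ran_at": datetime.now(timezone.utc).isoformat(),
                "rows_in": len(rows),
                "rows_out": len(result),
            })
            return result
        return wrapper
    return decorator


@documented_transform("Cancelled orders never shipped and would inflate demand.",
                      owner="demand-forecasting team")
def drop_cancelled(rows):
    """Keep only orders whose status is not 'cancelled'."""
    return [row for row in rows if row["status"] != "cancelled"]


orders = [{"status": "shipped"}, {"status": "cancelled"}, {"status": "shipped"}]
kept = drop_cancelled(orders)
print(RUN_LOG[-1]["rows_in"], RUN_LOG[-1]["rows_out"])  # 3 2
```

The registry is rebuilt from the code itself, so a decorated step cannot exist without a recorded justification and owner. The row counts in the run log make silent changes to filtering logic visible.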
Perfect visibility into third-party data usage would mean your vendors provide the same level of transparency you maintain internally. You'd see how they process your data, what they retain, how they transform it. Their black boxes would become glass boxes, with audit logs you can access on demand. Every API call would be logged, every data transfer tracked, every third-party model interaction documented.
What You're Actually Dealing With
But here's reality: your most critical data probably lives in legacy systems with no API access. That 15-year-old ERP system that runs your supply chain? It has a nightly batch export to CSV if you're lucky. The mainframe handling transaction processing? It speaks COBOL and nothing else. These systems weren't built for observability - they were built to run transactions, and they do that well. But they're compliance black holes.
You're juggling multiple data sources across cloud and on-prem environments. Your customer data is in Salesforce, your financial data is in SAP, your product data is in a PostgreSQL database that someone set up five years ago, and your real-time data is streaming through Kafka. Each system has its own authentication, its own access patterns, its own quirks. Creating unified lineage across this chaos isn't just technically challenging - it requires political capital to get different teams to cooperate.
Vendor black boxes are everywhere. Your AI vendor provides a confidence score but won't explain how it's calculated. Your data enrichment service adds fields but won't detail their sources. Your middleware processes data but provides no visibility into transformations. You're accountable for their compliance, but they give you nothing to work with.
Manual processes compound the problem. Data scientists copy files to their laptops for analysis. Business analysts download CSVs for quick checks that become permanent workflows. Someone manually uploads forecasts every Monday. These human touchpoints break lineage, introduce errors, and create compliance nightmares. They don't scale, but they're so embedded in how work gets done that nobody wants to address them.
The Pragmatic Path Forward
Start with critical AI-dependent business processes - the ones that would hurt most if regulators came knocking. Maybe it's your customer service chatbot, your demand forecasting, or your fraud detection. Pick one. Map it completely. This becomes your template for everything else.
Don't try to boil the ocean. Map dependencies you can actually control. Focus on data flows within your direct influence. Yes, you need to document vendor dependencies, but start where you have leverage. Build your monitoring and documentation capabilities on systems you own before wrestling with third-party challenges. Create a solid foundation that you can extend rather than a house of cards that collapses under its own complexity.
Build monitoring where it matters most. You don't need to monitor every field in every database. Focus on data feeding AI systems, especially anything touching personal or potentially copyrighted information. Monitor transformation points where data quality issues compound. Track schema changes that could break downstream AI systems. Be strategic about where you invest your monitoring resources.
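A rough way to decide where monitoring goes first is to score fields against the criteria in that paragraph. This sketch assumes your data catalog can tell you four things about each field: whether it feeds an AI system, holds personal data, carries copyright risk, or is the output of a transformation. The field names and weights are hypothetical and should be tuned to your own risk appetite.

```python
from dataclasses import dataclass


@dataclass
class FieldInfo:
    name: str
    feeds_ai: bool
    personal_data: bool = False
    copyright_risk: bool = False
    transform_point: bool = False  # produced by a join, aggregation, or other derivation


def monitoring_priority(f):
    """Score a field for monitoring; 0 means skip. Weights are illustrative."""
    if not f.feeds_ai:
        return 0
    return 1 + 2 * f.personal_data + 2 * f.copyright_risk + f.transform_point


catalog = [
    FieldInfo("crm.contacts.email", feeds_ai=True, personal_data=True),
    FieldInfo("kb.articles.body", feeds_ai=True, copyright_risk=True, transform_point=True),
    FieldInfo("warehouse.orders.weekly_total", feeds_ai=True, transform_point=True),
    FieldInfo("hr.badges.door_id", feeds_ai=False, personal_data=True),
]

watchlist = [f.name for f in sorted(catalog, key=monitoring_priority, reverse=True)
             if monitoring_priority(f) > 0]
print(watchlist)
# ['kb.articles.body', 'crm.contacts.email', 'warehouse.orders.weekly_total']
```

Fields that never reach an AI system score zero, even if they hold personal data. That reflects the paragraph's point about being strategic, not a claim that such fields need no protection elsewhere.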
Document everything, even the gaps. This is crucial: regulators respect organizations that understand their limitations.
- Document what you can't monitor and why.
- Note where vendor transparency is lacking.
- Record where manual processes exist and your plans to address them.
- Create a risk register of compliance gaps with timelines for resolution.
Showing you understand your compliance gaps and are actively working to address them is far better than pretending they don't exist. This documentation becomes your roadmap and your defense. It proves you're taking compliance seriously even when perfect compliance isn't achievable.
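The risk register doesn't need special tooling to start. It needs owners and dates. Here is a minimal Python sketch that assumes one record per gap, with categories matching the bullets above. The entries, owners, and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Gap:
    description: str
    category: str       # e.g. "vendor transparency", "manual process", "unmonitored system"
    owner: str
    target_date: date
    plan: str
    resolved: bool = False


def overdue(register, today):
    """Open gaps whose target resolution date has already passed."""
    return [g for g in register if not g.resolved and g.target_date < today]


# Hypothetical entries mirroring the gap types listed above.
register = [
    Gap("Middleware vendor provides no retrievable audit logs", "vendor transparency",
        "platform team", date(2026, 3, 31), "Contract addendum requiring log access"),
    Gap("Weekly forecast uploaded by hand from an analyst laptop", "manual process",
        "FP&A", date(2025, 12, 15), "Replace with a scheduled job that records lineage"),
    Gap("Mainframe transactions lack field-level lineage", "unmonitored system",
        "data engineering", date(2026, 6, 30), "Document nightly export; trace from staging"),
]

for gap in overdue(register, today=date(2026, 1, 5)):
    print(f"OVERDUE: {gap.description} (owner: {gap.owner}, due {gap.target_date})")
```

Reviewing the overdue list on a fixed cadence is what turns the register from a document into evidence that you're actively working the gaps.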
About the Author: Joan Pepin is Bigeye's Chief Information Security Officer, bringing 27 years of cybersecurity leadership to our team. She's held CISO and CSO roles at companies like Sumo Logic, Auth0, and Nike Digital, and co-founded her own security startup. As AI becomes central to data operations, she's been at the forefront of developing risk frameworks that protect businesses while enabling innovation.