3 min read

The compliance problem with LLMs


Our enterprise clients can't send data to OpenAI. This is not a preference. For several of them, it's a legal constraint.

The specific friction

GDPR is the primary issue. When you send data containing personal information to an external API, you're transferring that data to a data processor. That transfer requires a Data Processing Agreement with the processor, and the processor has to meet compliance requirements consistent with your own data handling obligations.

OpenAI offers a DPA. The issue isn't the existence of the DPA. The issue is the data residency question. Several of our clients have contractual commitments to their own enterprise clients that data doesn't leave the EU. The OpenAI API routes through US infrastructure. That's a problem for those clients regardless of what the DPA says.

The second issue is less formal but equally blocking: enterprise legal and procurement teams that are not yet comfortable with external AI providers processing their operational data. Not a regulatory constraint. A risk management one. Until the client's legal team is comfortable with the vendor's security posture, data handling practices, and audit trail, the integration doesn't move forward. That process takes time.

What this means in practice

The AI-powered features we're building for client-facing products can't use OpenAI or any external LLM API in several of our deployment contexts. Take the anomaly explanation feature I described in February: if it sends authentication event data to an external API, it's blocked for some clients.

We're evaluating two paths.

The first is on-premise or private cloud deployment of open-source models. LLaMA and its derivatives are now capable enough for the summarization and explanation tasks we need. Running them on our own infrastructure keeps the data within the deployment environment. The operational cost is higher. The capability is lower than GPT-4. For the tasks we're using it for, the capability gap is acceptable.
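One practical detail that makes this path less painful: most self-hosted serving stacks (vLLM and Ollama, for example) expose an OpenAI-compatible HTTP endpoint, so the application code barely changes when the model moves in-house. A minimal sketch, assuming a hypothetical in-cluster endpoint at `http://llm.internal:8000/v1/chat/completions` — the host, model name, and prompt text are all illustrative, not our actual deployment:

```python
import json
import urllib.request

# Hypothetical in-cluster endpoint; data never leaves the deployment environment.
LOCAL_ENDPOINT = "http://llm.internal:8000/v1/chat/completions"

def build_chat_payload(prompt: str, model: str = "llama-3-8b-instruct") -> dict:
    """Build an OpenAI-compatible chat completion request for a self-hosted model."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarize the anomaly for an operations analyst."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # keep explanations stable and terse
    }

def explain_anomaly(prompt: str) -> str:
    """POST to the local endpoint. No external API is involved."""
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The value of the OpenAI-compatible shape is that moving a deployment between a hosted model and a self-hosted one becomes a configuration change, not a rewrite.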

The second is waiting for the enterprise LLM providers to resolve the compliance questions more cleanly. Azure OpenAI Service runs on Microsoft's infrastructure with EU region options and a compliance posture that some of our clients' legal teams are more comfortable with. This path requires less infrastructure work on our end and produces better model quality. The timeline depends on our clients' procurement processes.

The broader pattern

This is the gap between the AI landscape as it appears to developers building in permissive environments and the AI landscape as it appears to technical teams inside regulated industries.

Healthcare, finance, logistics for regulated goods, anything touching European enterprise with significant DPA exposure: these contexts have constraints that make the simple API integration model unusable. The tooling that assumes you can route data to an external API doesn't apply.

The teams building in these spaces aren't behind the curve. They're solving a different problem. The AI capability story for regulated industries runs about 12-18 months behind the general market, not because the technology isn't there but because the compliance infrastructure isn't.

What I'd tell other founders building for enterprise

Map the data handling requirements of your target client before you commit to an AI architecture. If the client can't send certain categories of data outside their infrastructure, an architecture that depends on external API calls won't work for them. Find this out in the first conversation, not after you've built the integration.
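One way to make that first-conversation question concrete in the architecture is to capture each client's constraints as data and let them gate the routing, so operational data can never reach a backend the client's contracts forbid. A minimal sketch with invented names (`ClientPolicy`, `choose_backend`) — in reality the decision runs through legal review, not just a flag:

```python
from dataclasses import dataclass

@dataclass
class ClientPolicy:
    """Data handling constraints captured during procurement, not after integration."""
    name: str
    data_may_leave_infra: bool   # can operational data go to an external API at all?
    eu_residency_required: bool  # contractual commitment that data stays in the EU

def choose_backend(policy: ClientPolicy) -> str:
    """Pick the only LLM backend the client's constraints actually allow."""
    if not policy.data_may_leave_infra:
        return "self-hosted"       # on-prem / private cloud open-source model
    if policy.eu_residency_required:
        return "azure-openai-eu"   # EU-region managed service, pending legal sign-off
    return "external-api"          # the permissive-environment default
```

A client with strict residency terms never reaches the external path, and the constraint lives in one reviewable place instead of being scattered through integration code.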

The clients who are most valuable to have are often the ones with the most constraints. Building for the constraints is a moat, not a limitation.

With gusto, Fatih.