AI Vendor Lock-In: How to Build Without Getting Trapped
Three years ago, a mid-sized fintech company decided to build its entire document processing pipeline on a single AI provider's API. The integration was deep: custom fine-tuned models, proprietary prompt formats, and storage tied to the vendor's data formats. When that vendor raised prices by 40% in early 2025, the team discovered that switching would cost about as much as rebuilding the entire system from scratch. They paid the price hike.
This story isn't unusual. As AI becomes core infrastructure, the lock-in patterns that enterprise software buyers have dealt with for decades are showing up in AI procurement too, but with some new wrinkles specific to how AI systems are built.
Here's how to avoid that trap.
Why AI lock-in is different from traditional software lock-in
With traditional SaaS, lock-in usually comes from data formats, workflow integrations, and user habits. With AI, there are additional layers.
Model-specific behavior. A prompt that works well on GPT-4o may produce noticeably different results on Claude 3.5 Sonnet or Gemini 1.5 Pro. If you've tuned a production workflow around one model's quirks, your "switching cost" now includes re-evaluating and re-tuning prompts. The more optimization you've done, the more you're anchored.
Fine-tuning. If you've fine-tuned a model with a vendor, that fine-tuned model lives on their infrastructure. Some vendors let you export the weights, but often you can't (OpenAI, for example, does not let you export fine-tuned GPT-4o weights). And even when you can export, a fine-tuned model from one provider doesn't simply transfer to another.
Embedding incompatibility. If your vector database is populated with embeddings from OpenAI's text-embedding-3-large, you can't just swap in Cohere or Voyage AI embeddings without regenerating your entire vector store. Different embedding models produce vectors in different spaces; you can't mix them.
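One way to keep incompatible vectors from being mixed by accident is to tag the store with the model that produced it. The sketch below is a toy in-memory store, not any vector database's API; `TaggedVectorStore` and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedVectorStore:
    """Toy in-memory store that refuses vectors from a different embedding model."""

    model_id: str       # the one embedding model this store accepts
    dimensions: int     # vector length that model produces
    rows: dict[str, list[float]] = field(default_factory=dict)

    def add(self, doc_id: str, vector: list[float], model_id: str) -> None:
        # Matching dimensions are not enough: two models can emit vectors of
        # the same length that live in unrelated spaces, so check the model ID.
        if model_id != self.model_id:
            raise ValueError(
                f"store holds {self.model_id!r} vectors, got {model_id!r}; "
                "re-embed the source documents instead of mixing models"
            )
        if len(vector) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )
        self.rows[doc_id] = vector

    def __len__(self) -> int:
        return len(self.rows)
```

A migration then becomes an explicit step: create a new store tagged with the new model and refill it from the source documents.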
Proprietary API patterns. Tool call formats, function schemas, multi-turn conversation structures, and streaming protocols differ enough between providers that code written for one rarely just works on another.
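Tool definitions are a concrete case. The sketch below converts a function tool from the shape OpenAI's chat completions API expects into the shape Anthropic's Messages API expects, as those formats are publicly documented at the time of writing. The helper name `openai_tool_to_anthropic` is hypothetical, and the sketch only covers the tool definition itself, not tool-call responses or streaming.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Convert one OpenAI-style function tool definition to Anthropic's shape."""
    if tool.get("type") != "function":
        raise ValueError("this sketch only handles tools of type 'function'")
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # OpenAI calls the JSON Schema "parameters"; Anthropic calls it "input_schema".
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


openai_tool = {
    "type": "function",
    "function": {
        "name": "get_invoice",
        "description": "Look up an invoice by ID.",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(openai_tool)
```

The schema itself carries over unchanged; it is the envelope around it that differs, which is exactly the kind of detail an abstraction layer hides.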
The abstraction layer approach
The most practical mitigation for most teams is building behind an abstraction layer that lets you swap providers without rewriting application code.
LiteLLM
LiteLLM is a Python library (with a proxy server option) that presents a unified OpenAI-compatible interface for 100+ models. You write code to the OpenAI SDK format, and LiteLLM handles translation to whatever provider is actually running the request.
from litellm import completion

# Switch between providers by changing this one string
response = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Summarize this contract."}],
)
Change model="anthropic/claude-3-5-sonnet-20241022" to model="gpt-4o" and the same code works, provided the corresponding API key is set in the environment. LiteLLM also handles retries, fallbacks, and logging, and the proxy server adds load balancing and budget controls.
This is probably the fastest path to provider independence for teams already writing Python.
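To make the fallback idea concrete, here is a hand-rolled sketch of what a fallback chain amounts to. It is not LiteLLM's own fallback mechanism; the function `complete_with_fallback` is a hypothetical helper, and the completion function is passed in so that `litellm.completion` (or a test stub) can be used interchangeably.

```python
from typing import Any, Callable, Sequence


def complete_with_fallback(
    complete: Callable[..., Any],
    models: Sequence[str],
    messages: list[dict],
) -> tuple[str, Any]:
    """Try each model in order; return (model_used, response) from the first success."""
    errors = []
    for model in models:
        try:
            return model, complete(model=model, messages=messages)
        except Exception as exc:  # broad on purpose: any provider failure triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stub standing in for a real completion call, so the sketch runs offline.
def fake_complete(model: str, messages: list[dict]) -> str:
    if model.startswith("anthropic/"):
        raise TimeoutError("simulated outage")
    return f"answer from {model}"


used, reply = complete_with_fallback(
    fake_complete,
    ["anthropic/claude-3-5-sonnet-20241022", "gpt-4o"],
    [{"role": "user", "content": "Summarize this contract."}],
)
```

In production you would likely narrow the caught exceptions to timeouts and rate limits, so that a malformed request fails fast instead of being retried on every provider.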
LangChain / LangGraph
LangChain's chat model abstractions follow a similar philosophy. Classes such as ChatOpenAI and ChatAnthropic implement a common chat model interface, so the underlying model is swappable without changing the chain logic. LangChain is more opinionated and adds overhead, but if you're already using it for orchestration, you get model portability as part of the package.
For complex multi-step agents, LangGraph adds stateful workflow management on top, still with the model-swappable architecture underneath.
Portkey
Portkey takes the proxy approach further, adding production features: semantic caching, detailed observability, automatic fallbacks, and a UI for managing routing rules. If your load is large enough to make caching meaningful (semantic cache hit rates of 30-40% aren't uncommon for repetitive enterprise workloads), Portkey's value goes beyond just abstraction.
The routing rules are useful for lock-in avoidance specifically. You can write rules like "send requests requiring long context to Gemini, send everything else to Claude, fall back to GPT-4o if either fails." This keeps you from being dependent on any single provider's availability.
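The rule quoted above can be sketched in plain Python. This is not Portkey's configuration syntax; the `Rule` class, the `LONG_CONTEXT_TOKENS` threshold, and the model identifier strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

LONG_CONTEXT_TOKENS = 100_000  # hypothetical cutoff for "requires long context"


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]
    models: tuple[str, ...]  # ordered: primary first, then fallbacks


# Rules are evaluated top to bottom; the first match wins.
RULES = [
    Rule(
        "long-context",
        lambda req: req["prompt_tokens"] > LONG_CONTEXT_TOKENS,
        ("gemini/gemini-1.5-pro", "gpt-4o"),
    ),
    Rule(
        "default",
        lambda req: True,
        ("anthropic/claude-3-5-sonnet-20241022", "gpt-4o"),
    ),
]


def route(request: dict) -> tuple[str, ...]:
    """Return the ordered list of models to try for this request."""
    for rule in RULES:
        if rule.matches(request):
            return rule.models
    raise LookupError("no routing rule matched")
```

Keeping the rules as data rather than scattered if-statements means a provider can be added or removed by editing one list.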
Data portability: what to demand from vendors
Model access is one thing. Your data is another, and often more important.
When evaluating AI vendors, ask these specific questions before signing:
Can I export my fine-tuned model weights? Most vendors will say no for their frontier models. This is a significant lock-in factor if fine-tuning is part of your plan. Open-source models fine-tuned and self-hosted give you full portability, at the cost of operational complexity.
What format are my fine-tuning datasets stored in? Even if you can't export model weights, you should always be able to export your training data in a standard format (JSONL is typical). Confirm this contractually.
How do I export conversation history and logs? If you're storing user interactions via the vendor's API, understand what export mechanisms exist. API access to your own logs isn't the same as a bulk export capability.
Are embeddings I generate portable? No, technically, because embedding spaces are model-specific. But if you keep your source documents and maintain a reproducible pipeline, you can regenerate embeddings with a new provider. The question is whether your data preparation pipeline is vendor-agnostic.
What's the data retention and deletion policy? If you send data through a vendor API, understand what they retain, for how long, and how you can verify deletion.
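For the training-data question in particular, it helps to keep your own JSONL copy rather than relying on a vendor export. A minimal sketch, assuming chat-style records with a `messages` list; the helper names `export_jsonl`, `load_jsonl`, and `round_trips` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def export_jsonl(records: list[dict], path: Path) -> int:
    """Write one JSON object per line; return the number of records written."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


def load_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def round_trips(records: list[dict]) -> bool:
    """Check that an export can be read back without loss."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "train.jsonl"
        export_jsonl(records, path)
        return load_jsonl(path) == records


examples = [
    {"messages": [
        {"role": "user", "content": "Classify: 'Wire transfer failed'"},
        {"role": "assistant", "content": "payments"},
    ]},
]
```

If the source of truth for training data lives in a file like this under your control, a vendor's export policy matters much less.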
Multi-model architecture: routing instead of choosing
The most resilient approach isn't picking one provider and protecting against future switching costs. It's designing for multiple providers from day one.
Different models are genuinely better at different tasks. GPT-4o is strong at following complex JSON schemas. Claude 3.5 Sonnet is known for long-document analysis and nuanced instruction following. Gemini 1.5 Pro has a massive context window useful for certain retrieval tasks. Building a routing layer that sends each task type to the best model for that task, rather than making one model do everything, produces better results and incidentally eliminates single-provider dependency.
A practical starting point: categorize your AI workloads into 3-4 task types. Map each to a primary and fallback model. Build the routing logic once. You've now sharply reduced vendor risk while also improving quality.
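The task-to-model map can be as simple as a dictionary, plus a check that no task has its primary and fallback on the same provider. The task names and the "provider/model" labels below are illustrative assumptions, not identifiers from any particular SDK.

```python
# Each task type gets a primary and a fallback model, labeled "provider/model".
ROUTES = {
    "structured_output": {
        "primary": "openai/gpt-4o",
        "fallback": "anthropic/claude-3-5-sonnet",
    },
    "long_document_analysis": {
        "primary": "anthropic/claude-3-5-sonnet",
        "fallback": "google/gemini-1.5-pro",
    },
    "large_context_retrieval": {
        "primary": "google/gemini-1.5-pro",
        "fallback": "openai/gpt-4o",
    },
}


def provider_of(model: str) -> str:
    """Extract the provider prefix from a 'provider/model' label."""
    return model.split("/", 1)[0]


def single_provider_tasks(routes: dict) -> list[str]:
    """List tasks whose primary and fallback share a provider (a lock-in smell)."""
    return [
        task
        for task, r in routes.items()
        if provider_of(r["primary"]) == provider_of(r["fallback"])
    ]


def models_for(task: str, routes: dict = ROUTES) -> list[str]:
    """Return the models to try for a task, primary first."""
    entry = routes[task]
    return [entry["primary"], entry["fallback"]]
```

Running `single_provider_tasks` in CI is a cheap way to notice when a config change quietly reintroduces a single-provider dependency.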
This is more work upfront. It pays off in two ways: you're not held hostage to one provider's pricing or reliability, and your outputs improve because tasks go to models suited for them.
Contract clauses that actually matter
If you're negotiating an enterprise AI contract, these provisions are worth fighting for:
Price stability clauses. API pricing can change. A clause locking prices for 12-24 months with a cap on increases (say, no more than CPI + 5% per year) gives your budget process predictability.
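As a worked example of how such a cap behaves, the sketch below reads "CPI + 5%" as additive percentage points on the current price. That reading is an assumption (a real contract should spell out the formula), and the function name and figures are hypothetical.

```python
def capped_price(
    current_price: float,
    proposed_price: float,
    cpi_rate: float,
    margin: float = 0.05,
) -> float:
    """Return the price the vendor may charge under a 'CPI + margin' annual cap."""
    ceiling = current_price * (1 + cpi_rate + margin)
    # The vendor gets whichever is lower: what it asked for, or the ceiling.
    return min(proposed_price, ceiling)


# A vendor proposes a 40% hike (10.00 -> 14.00) in a year with 3% CPI.
# The cap limits the new price to roughly 10.80, an 8% increase.
new_price = capped_price(current_price=10.00, proposed_price=14.00, cpi_rate=0.03)
```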
Data portability guarantees. Write in specific language about what you can export, in what format, within what timeframe. Vague language about "reasonable cooperation" isn't enough.
Exit assistance. Ask for a transition period (90-180 days) during which the vendor provides migration support if you decide to leave. This is more negotiable than you might expect with smaller vendors.
Termination for convenience. Make sure you can exit without cause. Some enterprise AI contracts include penalty clauses for early termination; remove them if possible.
Model version continuity. Understand what happens when the model you've integrated is deprecated. How much notice do you get? Is there a migration path?
The open-source hedge
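One underused option is open-source (more precisely, open-weight) models as a partial hedge. Models like Llama 3 70B, Mistral Large, and Mixtral are capable enough for many production tasks and can be self-hosted on cloud infrastructure you control. Check each model's license before committing: terms vary, and some weights are released under licenses that restrict commercial use.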
Self-hosting isn't free: you pay for GPU infrastructure, and you take on model maintenance. But for specific high-volume, privacy-sensitive, or cost-intensive workloads, self-hosting a capable open-source model eliminates vendor dependency entirely for that task.
A common architecture: use a proprietary frontier model for complex reasoning tasks where quality matters most, and route high-volume, simpler tasks (classification, extraction, summarization of short text) to a self-hosted open-source model. The cost savings on the high-volume tasks often offset the infrastructure overhead.
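Whether the savings offset the overhead is a break-even calculation. A rough sketch, assuming a fixed monthly hosting cost plus a small per-token running cost for the self-hosted model; the function name and every figure are hypothetical placeholders, not real prices.

```python
def breakeven_tokens_per_month(
    api_price_per_million: float,
    fixed_hosting_per_month: float,
    selfhost_marginal_per_million: float = 0.0,
) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API."""
    saving_per_million = api_price_per_million - selfhost_marginal_per_million
    if saving_per_million <= 0:
        raise ValueError("self-hosting never breaks even at these prices")
    return fixed_hosting_per_month / saving_per_million * 1_000_000


# Hypothetical inputs: API at 5.00 per million tokens, GPUs at 4000 per month,
# and 1.00 per million tokens in marginal self-hosting cost.
threshold = breakeven_tokens_per_month(
    api_price_per_million=5.00,
    fixed_hosting_per_month=4000.00,
    selfhost_marginal_per_million=1.00,
)
```

With these placeholder inputs the threshold is about one billion tokens per month; below that volume the API stays cheaper. The calculation ignores engineering time for model maintenance, which the paragraph above notes is a real cost.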
A checklist before you build
Before you commit your architecture to any provider:
- Are you writing to a standard interface (OpenAI-compatible API) or a proprietary SDK?
- Do you have a fallback model configured for production workloads?
- Can you export your training data and logs independently?
- Have you stress-tested what a 50% price increase would cost you at your expected scale?
- Does your contract have a price stability clause and exit assistance?
- Is at least one workload running on a different provider, so you have an active second relationship?
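The price stress test in the checklist is simple arithmetic once you have volume estimates. A minimal sketch with hypothetical token volumes and per-million-token prices; substitute your own numbers.

```python
def monthly_cost(
    input_tokens: float,
    output_tokens: float,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Monthly API spend given token volumes and per-million-token prices."""
    return (
        input_tokens / 1_000_000 * price_in_per_million
        + output_tokens / 1_000_000 * price_out_per_million
    )


def price_shock(baseline_monthly: float, increase: float = 0.50) -> dict:
    """Show what a fractional price increase does to monthly and annual spend."""
    shocked = baseline_monthly * (1 + increase)
    return {
        "baseline_monthly": baseline_monthly,
        "shocked_monthly": shocked,
        "extra_per_year": (shocked - baseline_monthly) * 12,
    }


# Hypothetical scale: 2B input and 0.5B output tokens per month.
baseline = monthly_cost(2_000_000_000, 500_000_000, 2.50, 10.00)
impact = price_shock(baseline, increase=0.50)
```

Comparing `extra_per_year` against your estimated migration cost tells you how much leverage a vendor actually has over you.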
The teams that avoid lock-in aren't doing anything exotic. They're using the same abstraction tools that exist for any software dependency, and they're asking the same procurement questions that any experienced software buyer asks. The difference is doing it before the system is built, not after you're already paying the 40% premium.