AI Tools Compared by Privacy 2026: Data Retention, Opt-Out, On-Prem
Privacy in AI tools is one of those topics where the marketing language and the actual policy often diverge significantly. Every provider claims to protect your data. What they mean by that varies a lot once you read the actual terms.
This article covers the actual policies of the major providers: what gets retained, for how long, whether your data trains their models, and what your options are if you need stronger guarantees.
What you're actually asking when you ask about privacy
There are a few distinct questions bundled under "AI privacy," and they have different answers:
- Does my data train their models? Is the text I send used to improve future versions of the model?
- How long do they retain my conversations? Even if they don't train on it, how long does the data sit on their servers?
- Who can see my conversations? Can employees access them? Under what circumstances?
- Can I get it deleted? If I ask for deletion, does it actually happen?
- What happens on business/enterprise plans vs. consumer plans? Most providers have different policies for different tiers.
These are separate questions, and conflating them leads to bad decisions. A provider might guarantee no training data use but still retain conversations for 90 days. Another might offer full deletion on request but have a broader employee access policy than you'd expect.
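One way to keep these questions from blurring together is to record each answer separately for every provider and plan you evaluate. The following is a minimal sketch, not a standard schema: the class name, field names, and the "ExampleAI" provider are all hypothetical, chosen only to mirror the questions listed above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PrivacyAnswers:
    """One record per provider *and* plan tier; None means 'not yet verified'."""
    provider: str
    plan: str                                   # consumer vs. business/enterprise
    trains_on_data: Optional[bool] = None       # Does my data train their models?
    retention_days: Optional[int] = None        # How long are conversations kept?
    employee_access: Optional[str] = None       # Who can see them, and when?
    deletion_on_request: Optional[bool] = None  # Can I get it deleted?

    def unanswered(self) -> list[str]:
        """Names of the questions that still have no verified answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical provider: no training on your data, but 90-day retention.
example = PrivacyAnswers(
    provider="ExampleAI", plan="business",
    trains_on_data=False, retention_days=90,
)
print(example.unanswered())  # ['employee_access', 'deletion_on_request']
```

A "no training" guarantee fills in exactly one field here. The remaining gaps are the ones to chase down in the provider's terms before relying on the tool.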
OpenAI / ChatGPT
Consumer plans (ChatGPT Free, Plus): By default, OpenAI uses conversations from consumer accounts to improve their models. You can opt out in settings (Settings > Data Controls > "Improve the model for everyone"). This opt-out is per-account and persists, but it applies to new conversations going forward, not retroactively.
Retention: OpenAI retains conversation data for 30 days after account deletion. Active conversations are retained as long as the account is active (unless you delete them manually, which you can do per-conversation or as a bulk delete through the interface).
ChatGPT Team and Enterprise: Team plan conversations are not used for training by default. Enterprise contracts include explicit DPA (Data Processing Agreement) terms with defined retention periods. Enterprise can negotiate for shorter retention or specific deletion schedules.
OpenAI's track record has been mixed on privacy. They've had internal incidents where employee access to user data was reportedly broader than users expected. Their current policies are clearer than they were two years ago, but if you're handling genuinely sensitive data, the terms require careful reading.
Zero Data Retention option: For API users, OpenAI offers a Zero Data Retention (ZDR) policy where inputs and outputs are not stored at all after the API call completes. This requires a paid enterprise agreement and has pricing implications. It's not available on consumer plans.
Anthropic / Claude
Consumer plans (Free, Pro): Claude Pro's default settings allow Anthropic to use conversations for safety research and model improvement. You can opt out via the Privacy settings in Claude.ai, but the opt-out mechanism is less prominent than it should be; you have to look for it.
Business and Team plans: On Claude Business and Team plans, Anthropic does not train on your organization's conversations by default. No opt-out is required on your side. The contractual commitment is in the Terms of Service for these tiers.
Retention: Anthropic retains conversations for 30 days by default for safety and trust purposes. Enterprise contracts can negotiate different terms.
Anthropic's privacy posture: Anthropic has been more proactive about transparency than some competitors. Their privacy documentation is detailed and regularly updated. They have a clear process for data deletion requests. For the Business tier, you get explicit contractual guarantees that your code, documents, and conversations aren't used for training.
One notable point: Anthropic's safety-focused positioning, including its constitutional AI approach, gives it strong internal incentives to avoid the kinds of data misuse that would make the news. Its business model is also less dependent on advertising or data brokering than that of some competitors, which aligns its interests with user privacy reasonably well.
Google / Gemini
Consumer (Gemini Free, Gemini Advanced via Google One): Google's privacy situation is more complex because Gemini is integrated with your Google account, which already has extensive data. By default, Gemini conversations are stored in your Google Account activity (similar to search history) and may be reviewed by human raters for quality.
You can turn off Gemini Apps Activity in your Google Account settings. When off, conversations aren't saved to your account and aren't used for training. But "off" means you also lose personalization features and conversation history.
The integration with Google's broader data ecosystem is a feature or a liability depending on your perspective. Gemini can access your Gmail and Drive with your permission, which is genuinely useful. But it also means your AI interactions are part of Google's data profile on you.
Google Workspace with Gemini: For paid Google Workspace users with Gemini, the policy is clearer: workspace data isn't used for training AI models by default. Google publishes a specific "Customer Data Protection Commitments" document for Workspace that's worth reading if your team uses Gemini through Workspace.
On-premise / data residency: Google Cloud offers Gemini API access through Vertex AI with data residency controls. Enterprise customers can specify that data stays within particular geographic regions and isn't used for training. This is a proper enterprise privacy option, not just policy language.
Microsoft / Copilot and Azure OpenAI
Microsoft's AI privacy situation splits into consumer (Copilot) and enterprise (Azure OpenAI).
Copilot (consumer): Similar to other consumer AI tools: conversations are retained and may be reviewed for quality. You can manage this in the Bing privacy controls in your Microsoft account.
Microsoft 365 Copilot (business): This is where Microsoft has put real work into enterprise privacy. M365 Copilot operates on your organization's Microsoft 365 tenant and is subject to your organization's data governance settings. Conversations stay within the Microsoft 365 compliance boundary, which means they're subject to your organization's retention policies, eDiscovery, and audit logging.
For organizations already in the Microsoft 365 ecosystem with IT policies around data governance, this is the most integrated enterprise AI privacy story available. You're not trusting a new provider; you're extending existing controls.
Azure OpenAI: Through Azure, organizations get OpenAI models with Azure's data handling guarantees. No training on customer data by default, data residency options, private endpoints, and compliance certifications (SOC 2, ISO 27001, HIPAA, etc.). This is the right path for regulated industries using OpenAI models.
Self-hosted and on-premises options
For teams where cloud-based AI is off the table entirely, self-hosting is the privacy guarantee that doesn't require trusting any vendor's policy.
Ollama: Run Llama, Mistral, and other open-weight models locally on your own hardware. Zero data leaves your machine. Ollama is free, open source, and works on Mac, Linux, and Windows. The constraint is that local hardware limits model size: a 70B-parameter model needs 40+ GB of VRAM to run efficiently, so you either need serious GPUs or you run smaller, less capable models.
LM Studio: A user-friendly front-end for running local models. Better UI than Ollama for non-technical users. Same fundamental privacy story: nothing leaves your machine.
vLLM / TGI (Text Generation Inference): Open-source inference servers designed for production deployment on your own infrastructure. More complex to set up but optimized for throughput and designed for teams that need to run their own AI backend.
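The VRAM figure quoted for a 70B model can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter, plus some headroom for the KV cache and activations. The sketch below is a rough estimate only. The 20% overhead is an assumption, not a measured value, and real usage depends on context length, batch size, and the inference runtime.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


def estimated_vram_gb(params_billion: float, bits_per_param: int,
                      overhead: float = 0.2) -> float:
    """Weights plus an ASSUMED fractional overhead for KV cache and activations."""
    return weight_memory_gb(params_billion, bits_per_param) * (1 + overhead)


# Compare full-precision-ish (16-bit) weights with 8-bit and 4-bit quantization.
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{estimated_vram_gb(70, bits):.0f} GB")
```

Under these assumptions, a 70B model lands a little above 40 GB only when quantized to 4 bits; at 16-bit precision the same model needs several times that, which is why local deployments usually run quantized weights.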
What you give up with self-hosting:
- Model quality. The best open-weight models are good, but Llama 3.3 70B isn't GPT-4o or Claude 3.7 Sonnet. For most business tasks the gap has narrowed substantially, but it exists.
- Maintenance burden. You're responsible for updates, reliability, and scaling.
- Features. No built-in tooling, plugins, or integrations that the hosted providers offer.
For teams handling confidential or regulated data where no amount of contractual guarantees is sufficient, local deployment is the only option that provides true data isolation.
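To make "nothing leaves your machine" concrete, here is a minimal sketch of talking to a locally running Ollama server over its HTTP API, using only the Python standard library. It assumes Ollama is listening on its default address (localhost:11434) and that a model tagged `llama3.3` has already been pulled; both are assumptions about your setup. The loopback check and the helper names are illustrative, not part of Ollama itself.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Ollama's default local address; adjust if your install listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def build_local_request(prompt: str, model: str = "llama3.3",
                        url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a generate request, refusing non-loopback hosts."""
    host = urlparse(url).hostname
    if host not in LOOPBACK_HOSTS:
        raise ValueError(f"refusing to send prompt to non-local host: {host}")
    # "stream": False asks for a single JSON reply instead of a token stream.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.3") -> str:
    """Send the prompt to the local server and return the generated text."""
    req = build_local_request(prompt, model)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize this contract clause: ...")` only works while the local server is running. The hostname guard is a small safety net against a misconfigured URL quietly sending prompts to a remote endpoint; it does not replace network-level controls.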
A practical privacy tier list
For people trying to quickly assess provider options by privacy level:
Highest privacy (appropriate for sensitive commercial/regulated data):
1. Self-hosted open models (Ollama, LM Studio)
2. Azure OpenAI or Google Vertex AI with ZDR options
3. Microsoft 365 Copilot for M365-based organizations
Strong privacy (appropriate for most professional use):
4. Anthropic Claude Business/Team tier
5. OpenAI ChatGPT Enterprise with DPA
6. Google Workspace with Gemini
Reasonable privacy (consumer plans with opt-out):
7. Claude Pro with training opt-out enabled
8. ChatGPT Plus with "Improve the model" disabled
9. Gemini Advanced with Apps Activity disabled
Default / limited privacy:
10. Any free-tier AI tool without reviewing and adjusting default settings
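If you want to apply this list inside an internal policy check, the tiers translate naturally into a lookup table. The sketch below simply encodes the list above; the `Tier` enum, the `OPTIONS` table, and the `acceptable` helper are hypothetical names for illustration, and the tier assignments are this article's judgment rather than any vendor's classification.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Lower number = stronger privacy, following the list above."""
    HIGHEST = 1
    STRONG = 2
    REASONABLE = 3
    DEFAULT = 4


OPTIONS = {
    "Self-hosted open models (Ollama, LM Studio)": Tier.HIGHEST,
    "Azure OpenAI or Google Vertex AI with ZDR options": Tier.HIGHEST,
    "Microsoft 365 Copilot (M365-based organizations)": Tier.HIGHEST,
    "Anthropic Claude Business/Team": Tier.STRONG,
    "OpenAI ChatGPT Enterprise with DPA": Tier.STRONG,
    "Google Workspace with Gemini": Tier.STRONG,
    "Claude Pro, training opt-out enabled": Tier.REASONABLE,
    "ChatGPT Plus, 'Improve the model' disabled": Tier.REASONABLE,
    "Gemini Advanced, Apps Activity disabled": Tier.REASONABLE,
    "Any free tier with unreviewed default settings": Tier.DEFAULT,
}


def acceptable(minimum: Tier) -> list[str]:
    """Options whose tier is at least as strong as `minimum`."""
    return [name for name, tier in OPTIONS.items() if tier <= minimum]


# Example: a team whose data requires at least the "strong" tier.
for name in acceptable(Tier.STRONG):
    print(name)
```

Keeping the table in one place makes it easy to revise when a provider changes its terms, which happens often enough that any hard-coded ranking should be treated as a snapshot.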
What "zero data retention" actually means in practice
A few providers offer Zero Data Retention (ZDR) options, where data is processed but not stored after the response is generated. This is a meaningful privacy upgrade over standard retention.
What ZDR covers: the input you send and the output the model generates are not written to persistent storage after the call completes.
What ZDR doesn't cover: logging at the infrastructure layer (network logs, API gateway logs), compliance monitoring that may retain metadata, and in some implementations, short-term caching that technically counts as temporary storage.
For most privacy use cases, ZDR is sufficient. For legal matters, healthcare data, or national security contexts, you need legal review of the specific provider's ZDR implementation, not just acceptance of the marketing claim.
Checking your current settings
Most people reading this are already using AI tools with the default settings. If you haven't checked your privacy configuration recently:
- ChatGPT: Settings > Data Controls > "Improve the model for everyone" (turn off to opt out)
- Claude.ai: Settings > Privacy > "Improve Claude for everyone" (turn off to opt out)
- Gemini: myaccount.google.com > Data and privacy > Gemini Apps Activity (turn off to stop saving)
- Copilot: Settings in the Microsoft account privacy dashboard
For any tool you use for professional work, spending 5 minutes confirming your settings match your expectations is worth it.
For teams that need to evaluate AI tools for security alongside privacy, the detailed data handling documentation is worth requesting from any vendor before committing to an enterprise contract.