developer-toolsapi Status: active

Portkey

AI gateway and observability platform: route, monitor, and control LLM API calls

Portkey is an AI gateway that sits in front of your LLM API calls to add routing, fallbacks, caching, observability, and guardrails without requiring framework changes. The open-source gateway supports over 200 LLM providers. Teams use Portkey to make production LLM deployments resilient, observable, and cost-controlled.

Portkey launched in 2023 with a focus that differs slightly from pure observability tools like Helicone and LangFuse. The core thesis was that production LLM deployments need more than logging: they need active traffic management. Routing, fallbacks, caching, and guardrails are infrastructure concerns that shouldn't require writing custom code in every application.

The result is a platform that spans two categories. It's an observability tool in that it logs and analyzes LLM requests. It's also a gateway in that it actively participates in request routing and can modify, cache, or block requests based on configured rules.

The gateway model

Portkey's gateway sits between your application and one or more LLM providers. Your application sends requests to Portkey's endpoint rather than to the provider directly. Portkey applies routing logic, checks caches, runs guardrails, and forwards the request to the appropriate provider. The response comes back through Portkey, which logs it and returns it to your application.

This adds one network hop compared to calling providers directly. The latency is typically 10-50ms, comparable to other proxy-based tools. For most LLM applications, this is acceptable. For very latency-sensitive use cases at the edge of model inference times, it's worth measuring.

The benefit is that the gateway logic (fallbacks, caching, routing) is configured in Portkey rather than in application code. A change to fallback behavior doesn't require a code deployment. It means caching rules can be adjusted without touching your application. It means routing logic is visible and auditable in one place rather than scattered across services.

Provider support

Portkey's gateway supports over 200 LLM providers and model endpoints. The provider library covers every major API: OpenAI, Anthropic, Google, Mistral, Cohere, Amazon Bedrock, Azure OpenAI, and dozens of smaller providers and open-source model hosting services.

The practical value is that switching providers or routing traffic between them doesn't require changes in your application. Your application sends requests to Portkey with a model identifier. Portkey's routing config determines which provider and endpoint actually receives the request. Changing the routing config in Portkey's dashboard updates the routing without touching application code.

For teams evaluating a new model or gradually migrating from one provider to another, this operational flexibility is valuable. You can route 5% of traffic to the new model, observe quality and cost in Portkey's dashboard, and adjust the percentage up or down without redeploying.

Fallbacks and reliability

Fallback configuration is one of Portkey's most concrete advantages over simpler observability tools. You define a priority order of providers or models. When the first choice returns an error, rate limit response, or timeout, Portkey automatically retries with the next option in the chain.

The fallback logic handles several common failure modes. Provider outages happen. Rate limits hit during traffic spikes. Specific model versions occasionally return errors. Building retry and fallback logic directly in an application requires careful error handling, testing, and maintenance. Portkey centralizes this logic so you configure it once.

The fallback chain is configurable per use case. A latency-sensitive endpoint might fall back to a faster, cheaper model rather than waiting for the primary provider to recover. A quality-critical endpoint might prefer to wait or try an equivalent-quality alternative.

Request caching

Portkey supports exact and semantic caching for LLM responses. Exact caching is straightforward: identical prompts return the cached response without hitting the provider. Semantic caching uses vector similarity to match prompts that are close in meaning and return cached responses for near-matches.

For applications with high prompt repetition. support chatbots that answer the same questions repeatedly, classification systems that process similar inputs, document processing pipelines; caching can reduce provider costs and improve response times significantly.

Exact caching is safe by definition; the cached response is identical to what the provider returned for the same input. Semantic caching requires more care. The similarity threshold needs to be set appropriately for the use case. For factual questions, a high similarity threshold is safe. For creative or generation tasks, semantic caching may return responses that are close but not quite right for the specific input variation.

Guardrails

Portkey's guardrails feature applies input and output filters to LLM requests. Input guardrails can check for PII in prompts before they're sent to providers, block prompts that contain specific patterns, and enforce content policies. Output guardrails can check model responses for policy violations before returning them to users.

The guardrail system is useful for applications with compliance requirements. Healthcare applications that need to ensure patient data doesn't go to external LLM providers. Financial services that need to block specific types of advice. Consumer applications that need content moderation.

Guardrails are configured in Portkey's dashboard as rules that apply to specific virtual keys, providers, or request patterns. Adding a new guardrail doesn't require application code changes; the filtering happens at the gateway level.

Observability

The observability layer logs every request and response with full prompt and completion capture, token counts, cost estimates, latency metrics, and provider/model information. The dashboard shows aggregate views by time period, model, virtual key, and custom metadata.

Per-user and per-organization cost tracking works via metadata attached to requests. This is similar to Helicone's property tagging. You attach a user identifier to each request header, and Portkey groups cost and usage data by that identifier in the analytics views.

One area where Portkey's observability is thinner than LangSmith is evaluation infrastructure. Portkey logs requests and lets you analyze them, but it doesn't have LangSmith's dataset management and systematic evaluation tooling. For teams that need both gateway capabilities and deep evaluation infrastructure, using Portkey for routing and Langfuse or LangSmith for evaluation is a common combination.

Virtual API keys and access control

Virtual API keys are a security feature worth highlighting. In most LLM applications, the actual provider API keys live in environment variables or secrets managers and are used directly by the application. If one key leaks, you rotate it and hope nothing was exploited in the interim.

Portkey's virtual keys sit in front of real keys. Applications have virtual keys. Real provider keys live in Portkey's vault. Portkey maps virtual keys to real keys when forwarding requests. If a virtual key is compromised, you revoke it in Portkey's dashboard and the underlying provider key is unaffected.

This also enables fine-grained access control. Different teams or services get different virtual keys. Each virtual key can have spending limits, per-minute rate limits, and provider restrictions. A virtual key for a development environment might have a $10/month spending cap. A key for a specific microservice might only be allowed to call specific models.

Open-source gateway

The Portkey gateway is MIT-licensed and available on GitHub. The open-source version handles provider routing, request forwarding, and basic logging. Advanced features like the observability dashboard, semantic caching, and guardrails are available in the cloud version.

Self-hosting the open-source gateway gives teams control over where request data flows. For organizations that can't route production prompts through a third-party cloud service, self-hosting the gateway while using the cloud dashboard for analytics is a hybrid option. For teams that want full data sovereignty, self-hosting the complete stack (available in the enterprise tier) keeps everything on-premises.

Comparison with Helicone

Both Portkey and Helicone use a proxy architecture and provide observability, but they emphasize different things. Helicone puts cost monitoring and per-user tracking front and center. Portkey's emphasis is on reliability features (fallbacks, caching, guardrails) with observability as a second pillar.

For teams whose primary concern is understanding and controlling costs, Helicone's cost analytics are more developed. For teams whose primary concern is production reliability and multi-provider flexibility, Portkey's gateway features are more complete. Teams that need both often use both, or pick the one that solves the more pressing problem first.

Getting started

Portkey's integration path is similar to other proxy tools: change your API base URL to Portkey's endpoint, add your Portkey API key as a header, and start seeing logs in the dashboard. Provider keys are stored as virtual keys in Portkey's vault.

The routing and fallback configuration happens in Portkey's dashboard as gateway configs. A basic fallback setup takes a few minutes to configure. Guardrails and semantic caching take longer to set up correctly but are well-documented.

The free tier's 10,000 requests per month is usable for small applications and integration testing. For production use, Developer at $49/month is the practical starting point.

Key features

AI gateway: single API endpoint for 200+ LLMs via unified interface
Automatic fallback routing when primary provider fails or rate limits
Load balancing across multiple providers or model versions
Request caching to reduce costs and latency for repeated prompts
Full observability with request logging, cost tracking, and latency monitoring
Guardrails: input and output filtering for content safety and PII detection
Virtual API keys with permission scopes and per-key rate limits
Prompt management with versioning and production/staging environments

Pros and cons

Pros

+ Unified API for 200+ LLM providers reduces vendor lock-in
+ Automatic fallbacks and retries add reliability without application code changes
+ Request caching can meaningfully reduce costs for applications with repeated queries
+ Open-source gateway with self-hosting support for data control
+ Virtual API keys with scoped permissions reduce credential exposure risk

Cons

− Free tier of 10,000 requests is limited for active production use
− Guardrails and advanced routing features require paid plans
− Adds proxy latency similar to other gateway-based tools

Who is Portkey for?

Production LLM apps needing multi-provider redundancy and failover
Teams controlling LLM API access across multiple teams or services
Applications with high prompt repetition that benefit from semantic caching
Organizations needing input/output guardrails for safety compliance

Alternatives to Portkey

If Portkey isn't quite the right fit, the closest alternatives are helicone , langsmith , and langfuse . See our full Portkey alternatives page for side-by-side comparisons.

Frequently Asked Questions

What is Portkey?

Portkey is an AI gateway and LLM observability platform. It works as a proxy between your application and LLM provider APIs, adding routing logic, fallbacks, caching, and observability. The core open-source gateway supports over 200 LLM providers via a unified API. You point your application at Portkey instead of directly at OpenAI or Anthropic, and Portkey handles routing, retries, cost tracking, and logging.

What does the fallback feature do?

Portkey's fallback feature automatically routes requests to a backup provider or model if the primary one fails, rate limits, or returns an error. You configure a fallback chain in Portkey's routing config. for example, try GPT-4o first, fall back to Claude Sonnet if OpenAI is unavailable, fall back to Gemini 1.5 Pro if both fail. Then Portkey handles the retry and fallback logic without changes to your application code. For production applications where LLM API availability affects user experience, fallback routing adds resilience without building it yourself.

How does Portkey's caching work?

Portkey supports two types of caching. Simple caching stores exact prompt matches and returns the cached response immediately, bypassing the LLM provider. Semantic caching uses embeddings to match similar but not identical prompts, returning cached responses for queries that are close enough in meaning. Semantic caching is more complex to configure but can significantly reduce costs for applications where users ask similar questions in different phrasings. Both cache types reduce latency to near-zero for cache hits.

What are Portkey virtual API keys?

Virtual API keys in Portkey are proxy keys that map to real provider API keys stored securely in Portkey's vault. You give your applications virtual keys rather than real provider keys. If a virtual key is compromised, you revoke it in Portkey without rotating the underlying provider key. You can set spending limits, rate limits, and provider restrictions per virtual key. This is useful for organizations where multiple teams or services need LLM API access but you want centralized control over usage and cost.

Is the Portkey gateway open source?

Yes. The Portkey gateway is open source at github.com/Portkey-AI/gateway under the MIT license. The gateway handles provider routing, request transformation, and the core proxy functionality. The observability dashboard, prompt management, and advanced features like semantic caching and guardrails are cloud-only or enterprise features. Self-hosting the open-source gateway gives you the routing and reliability benefits without sending request data to Portkey's cloud.

Related agents

Anthropic Computer Use

Claude's computer-use capability that powers desktop and browser agents

Featured

autonomouscomputer-use Paid

Anthropic Skills

Pre-built and custom skills for Claude that extend what Claude can do in Claude Code

developer-toolsproductivity Free tier

AssemblyAI

Speech-to-text API and audio intelligence platform with LLM-powered analysis via LeMUR

speech-to-textaudio-intelligence Free tier

206 ★ — 0.0%