Helicone
LLM observability and cost monitoring for production AI applications
Helicone is an LLM observability platform that sits between your application and LLM provider APIs as a proxy. Change one URL in your code and every request gets logged, tracked, and analyzed in Helicone's dashboard. Teams use it to monitor costs, debug slow or failed requests, track usage per user, and manage prompt versions across environments.
AI applications have a cost problem that becomes visible as soon as you move past prototyping. In development, a few thousand API calls to OpenAI or Anthropic are cheap enough to ignore. In production, with real users making real requests, costs compound in ways that aren't always obvious until the invoice arrives.
Helicone was built to make that cost curve visible before it becomes a surprise. The company launched in 2023 out of Y Combinator with the observation that every team deploying LLM applications reinvents the same basic monitoring infrastructure. Helicone provides it as a service so you don't have to.
The proxy approach
Helicone's core design choice is the proxy model. Instead of adding an SDK and instrumenting your code, you change your API base URL from the provider's endpoint to Helicone's proxy endpoint. Your application code stays the same. Helicone intercepts the request, logs it, forwards it to the actual provider, receives the response, logs that too, and returns it to your application.
The integration typically takes five minutes. The tradeoff is the proxy latency: each request travels an additional network hop to Helicone's servers before reaching the provider. In practice this adds 10-50ms depending on your geographic proximity to Helicone's infrastructure. For most LLM applications where model inference takes 300ms to several seconds, this overhead is small enough to ignore.
The advantage is that instrumentation requires no code changes and no SDK upgrades. When Helicone adds new features, they're available immediately without redeployment. If you need to disable Helicone for any reason, you revert the URL and your application is back to direct provider calls.
What gets logged
Every request through the proxy is logged in full. This means the complete prompt sent to the model, the complete response received, the model name, the token counts, the inferred cost, the latency in milliseconds, and any custom headers you attached.
Having full request/response logging makes debugging dramatically easier. When a user reports a bad response, you can pull up the exact prompt that generated it. When you notice latency spikes, you can look at the actual requests that were slow and find patterns. When you're trying to understand why token usage jumped last week, you can diff the prompt content before and after.
The logging is opt-in-to-exclusion rather than opt-in. By default, everything is logged. If you have specific requests containing sensitive data that shouldn't be stored, you can pass a header to exclude individual requests from logging.
Cost tracking and breakdowns
The cost dashboard is where Helicone earns its keep for most teams. It shows total spend by time period, broken down by model, endpoint, user, and any custom properties you define.
Custom properties are particularly useful. You can attach metadata to requests. which product feature triggered this call, which A/B test variant the user is in, which customer organization the request belongs to. Helicone stores these properties and lets you filter cost reports by them. This turns cost monitoring from "we spent $400 this week" into "we spent $240 on the document summarization feature, $90 on the chatbot, and $70 on background processing."
For teams building multi-tenant products, the per-user cost tracking is essential. You can see which customers are the heaviest LLM users, whether your pricing covers your costs at different usage levels, and where to set usage limits if you need to cap exposure.
Prompt versioning
One of the less obvious problems in production LLM applications is prompt management. Prompts start as strings in code. Then someone tweaks one and pushes it to production without any record of what changed. Then a different team member changes it again. Three months later, nobody knows what version of the prompt is running in production, whether the change two weeks ago improved or hurt quality, or what prompt produced that particularly good response from last Tuesday.
Helicone's prompt management feature gives prompts first-class versioning. You define prompts in Helicone, give them names and versions, and reference them by ID in your application. Helicone tracks which prompt version generated which response. You can roll back to a previous version, compare performance across versions, and run A/B tests by assigning users to different prompt variants.
This is a meaningful operational improvement for teams that iterate on prompts as part of their product development cycle. The alternative is ad-hoc version control in code, which works poorly for prompts because the iteration cycle is faster and more experimental than regular code.
Alerts and rate limiting
Helicone supports cost alert thresholds. You set a dollar limit, and Helicone notifies you when spend approaches it. You can set limits by time period, by user, or by custom property. For startups where an unexpected traffic spike could mean an unexpected $5,000 invoice, cost alerts are table stakes that Helicone provides out of the box.
Rate limiting by user is also supported. You can set a maximum number of requests or a maximum cost per user per time period. This is useful for free tier management in consumer products, for preventing abuse, and for enforcing fair use policies in multi-tenant applications.
Open source and self-hosting
The full Helicone platform is open source at github.com/Helicone/helicone. The repository includes the proxy server (written in TypeScript), the web dashboard, and the data pipeline for processing logs.
Self-hosting is documented with a Docker Compose setup for smaller deployments and Kubernetes configurations for larger ones. Teams with data residency requirements. particularly in the EU where GDPR makes storing user interaction data in US infrastructure complicated, find the self-hosted path valuable.
The open-source codebase also means the proxy can be audited. For teams concerned about a third-party proxy handling their LLM API keys and prompt data, the ability to read and review the code that touches those credentials is a meaningful trust factor.
How Helicone compares to LangSmith
LangSmith, built by the LangChain team, is Helicone's closest competitor. The two products have meaningful differences in emphasis.
Helicone is stronger on cost monitoring, particularly per-user cost tracking and cost alerting. Its integration model is simpler: change a URL instead of adding an SDK. The open-source posture is more complete, with a fully self-hostable version.
LangSmith is stronger on evaluation. It has more developed tools for running systematic evaluations of LLM application quality, tracking metrics over time, and managing evaluation datasets. For teams doing serious model evaluation and quality benchmarking, LangSmith's eval infrastructure is more mature.
Many teams use Helicone for cost and request monitoring, and a separate eval tool for quality measurement. The two are not mutually exclusive.
Pricing for real applications
The free tier of 10,000 requests per month is enough for a small production application or active development. At 10,000 requests, you're probably running a tool that a handful of people use daily or doing heavy integration testing.
Growth at $80/month includes 200,000 requests. For a product with hundreds of daily active users making a few LLM calls each, this is the right tier. Pro at $200/month scales to larger teams and higher request volumes.
The pricing model is by request volume, not by token or dollar spend. This means costs are predictable relative to your application's request rate, regardless of which models you're using or how long your prompts are.
Getting started
The setup is deliberately minimal. Change your base URL to oai.helicone.ai for OpenAI or the equivalent Helicone endpoint for your provider, add your Helicone API key as a request header, and you're done. The dashboard starts populating within seconds.
The custom properties feature is worth setting up early. Adding a one-line header to your requests that identifies the feature or workflow that generated them pays off immediately when you're debugging cost spikes and trying to figure out which part of your application is responsible.
For teams evaluating Helicone against LangSmith or LangFuse, the simplest approach is to try Helicone first given the near-zero integration cost. If you find that you need more sophisticated evaluation tooling than Helicone provides, that's the right signal to investigate the alternatives.
Key features
- One-line integration via proxy URL change, no SDK required
- Real-time cost tracking per model, user, and custom property
- Request and response logging with full prompt and output capture
- Latency monitoring and percentile breakdowns by model and endpoint
- User segmentation: track costs and usage per end-user or organization
- Prompt versioning and management with A/B testing support
- Alert thresholds for cost spikes and error rate increases
- Open-source with self-hosting option for data privacy requirements
Pros and cons
Pros
- + Proxy-based integration requires changing one URL, no SDK changes or instrumentation
- + Open source with a self-hosted path for teams with data residency requirements
- + Real-time cost dashboards broken down by model, user, and custom dimensions
- + Prompt management with versioning solves the scattered prompt string problem
- + Free tier of 10,000 requests per month is usable for small production apps
Cons
- − Proxy architecture adds latency (typically 10-50ms per request)
- − Advanced features like A/B testing and custom evaluations require paid plans
- − Weaker evaluation and testing tools than LangSmith for teams running systematic evals
Who is Helicone for?
- Production LLM apps that need cost visibility and per-user tracking
- Teams debugging latency spikes and unexpected model failures
- Startups monitoring API spend before costs get out of control
- Developers managing prompt versions across staging and production
Alternatives to Helicone
If Helicone isn't quite the right fit, the closest alternatives are langsmith , langfuse , and portkey . See our full Helicone alternatives page for side-by-side comparisons.
Frequently Asked Questions
What is Helicone and how does it work?
Does Helicone see my API keys?
How much latency does the Helicone proxy add?
Can Helicone track costs per user?
Is Helicone open source?
Related agents
Anthropic Computer Use
Claude's computer-use capability that powers desktop and browser agents
Anthropic Skills
Pre-built and custom skills for Claude that extend what Claude can do in Claude Code
AssemblyAI
Speech-to-text API and audio intelligence platform with LLM-powered analysis via LeMUR