Agentbrisk
developer-toolsapiproductivity Status: active

Helicone

LLM observability and cost monitoring for production AI applications


Helicone is an LLM observability platform that sits between your application and LLM provider APIs as a proxy. Change one URL in your code and every request gets logged, tracked, and analyzed in Helicone's dashboard. Teams use it to monitor costs, debug slow or failed requests, track usage per user, and manage prompt versions across environments.

AI applications have a cost problem that becomes visible as soon as you move past prototyping. In development, a few thousand API calls to OpenAI or Anthropic are cheap enough to ignore. In production, with real users making real requests, costs compound in ways that aren't always obvious until the invoice arrives.

Helicone was built to make that cost curve visible before it becomes a surprise. The company launched in 2023 out of Y Combinator with the observation that every team deploying LLM applications reinvents the same basic monitoring infrastructure. Helicone provides it as a service so you don't have to.

The proxy approach

Helicone's core design choice is the proxy model. Instead of adding an SDK and instrumenting your code, you change your API base URL from the provider's endpoint to Helicone's proxy endpoint. Your application code stays the same. Helicone intercepts the request, logs it, forwards it to the actual provider, receives the response, logs that too, and returns it to your application.

The integration typically takes five minutes. The tradeoff is the proxy latency: each request travels an additional network hop to Helicone's servers before reaching the provider. In practice this adds 10-50ms depending on your geographic proximity to Helicone's infrastructure. For most LLM applications where model inference takes 300ms to several seconds, this overhead is small enough to ignore.

The advantage is that instrumentation requires no code changes and no SDK upgrades. When Helicone adds new features, they're available immediately without redeployment. If you need to disable Helicone for any reason, you revert the URL and your application is back to direct provider calls.

What gets logged

Every request through the proxy is logged in full. This means the complete prompt sent to the model, the complete response received, the model name, the token counts, the inferred cost, the latency in milliseconds, and any custom headers you attached.

Having full request/response logging makes debugging dramatically easier. When a user reports a bad response, you can pull up the exact prompt that generated it. When you notice latency spikes, you can look at the actual requests that were slow and find patterns. When you're trying to understand why token usage jumped last week, you can diff the prompt content before and after.

The logging is opt-in-to-exclusion rather than opt-in. By default, everything is logged. If you have specific requests containing sensitive data that shouldn't be stored, you can pass a header to exclude individual requests from logging.

Cost tracking and breakdowns

The cost dashboard is where Helicone earns its keep for most teams. It shows total spend by time period, broken down by model, endpoint, user, and any custom properties you define.

Custom properties are particularly useful. You can attach metadata to requests. which product feature triggered this call, which A/B test variant the user is in, which customer organization the request belongs to. Helicone stores these properties and lets you filter cost reports by them. This turns cost monitoring from "we spent $400 this week" into "we spent $240 on the document summarization feature, $90 on the chatbot, and $70 on background processing."

For teams building multi-tenant products, the per-user cost tracking is essential. You can see which customers are the heaviest LLM users, whether your pricing covers your costs at different usage levels, and where to set usage limits if you need to cap exposure.

Prompt versioning

One of the less obvious problems in production LLM applications is prompt management. Prompts start as strings in code. Then someone tweaks one and pushes it to production without any record of what changed. Then a different team member changes it again. Three months later, nobody knows what version of the prompt is running in production, whether the change two weeks ago improved or hurt quality, or what prompt produced that particularly good response from last Tuesday.

Helicone's prompt management feature gives prompts first-class versioning. You define prompts in Helicone, give them names and versions, and reference them by ID in your application. Helicone tracks which prompt version generated which response. You can roll back to a previous version, compare performance across versions, and run A/B tests by assigning users to different prompt variants.

This is a meaningful operational improvement for teams that iterate on prompts as part of their product development cycle. The alternative is ad-hoc version control in code, which works poorly for prompts because the iteration cycle is faster and more experimental than regular code.

Alerts and rate limiting

Helicone supports cost alert thresholds. You set a dollar limit, and Helicone notifies you when spend approaches it. You can set limits by time period, by user, or by custom property. For startups where an unexpected traffic spike could mean an unexpected $5,000 invoice, cost alerts are table stakes that Helicone provides out of the box.

Rate limiting by user is also supported. You can set a maximum number of requests or a maximum cost per user per time period. This is useful for free tier management in consumer products, for preventing abuse, and for enforcing fair use policies in multi-tenant applications.

Open source and self-hosting

The full Helicone platform is open source at github.com/Helicone/helicone. The repository includes the proxy server (written in TypeScript), the web dashboard, and the data pipeline for processing logs.

Self-hosting is documented with a Docker Compose setup for smaller deployments and Kubernetes configurations for larger ones. Teams with data residency requirements. particularly in the EU where GDPR makes storing user interaction data in US infrastructure complicated, find the self-hosted path valuable.

The open-source codebase also means the proxy can be audited. For teams concerned about a third-party proxy handling their LLM API keys and prompt data, the ability to read and review the code that touches those credentials is a meaningful trust factor.

How Helicone compares to LangSmith

LangSmith, built by the LangChain team, is Helicone's closest competitor. The two products have meaningful differences in emphasis.

Helicone is stronger on cost monitoring, particularly per-user cost tracking and cost alerting. Its integration model is simpler: change a URL instead of adding an SDK. The open-source posture is more complete, with a fully self-hostable version.

LangSmith is stronger on evaluation. It has more developed tools for running systematic evaluations of LLM application quality, tracking metrics over time, and managing evaluation datasets. For teams doing serious model evaluation and quality benchmarking, LangSmith's eval infrastructure is more mature.

Many teams use Helicone for cost and request monitoring, and a separate eval tool for quality measurement. The two are not mutually exclusive.

Pricing for real applications

The free tier of 10,000 requests per month is enough for a small production application or active development. At 10,000 requests, you're probably running a tool that a handful of people use daily or doing heavy integration testing.

Growth at $80/month includes 200,000 requests. For a product with hundreds of daily active users making a few LLM calls each, this is the right tier. Pro at $200/month scales to larger teams and higher request volumes.

The pricing model is by request volume, not by token or dollar spend. This means costs are predictable relative to your application's request rate, regardless of which models you're using or how long your prompts are.

Getting started

The setup is deliberately minimal. Change your base URL to oai.helicone.ai for OpenAI or the equivalent Helicone endpoint for your provider, add your Helicone API key as a request header, and you're done. The dashboard starts populating within seconds.

The custom properties feature is worth setting up early. Adding a one-line header to your requests that identifies the feature or workflow that generated them pays off immediately when you're debugging cost spikes and trying to figure out which part of your application is responsible.

For teams evaluating Helicone against LangSmith or LangFuse, the simplest approach is to try Helicone first given the near-zero integration cost. If you find that you need more sophisticated evaluation tooling than Helicone provides, that's the right signal to investigate the alternatives.

Key features

  • One-line integration via proxy URL change, no SDK required
  • Real-time cost tracking per model, user, and custom property
  • Request and response logging with full prompt and output capture
  • Latency monitoring and percentile breakdowns by model and endpoint
  • User segmentation: track costs and usage per end-user or organization
  • Prompt versioning and management with A/B testing support
  • Alert thresholds for cost spikes and error rate increases
  • Open-source with self-hosting option for data privacy requirements

Pros and cons

Pros

  • + Proxy-based integration requires changing one URL, no SDK changes or instrumentation
  • + Open source with a self-hosted path for teams with data residency requirements
  • + Real-time cost dashboards broken down by model, user, and custom dimensions
  • + Prompt management with versioning solves the scattered prompt string problem
  • + Free tier of 10,000 requests per month is usable for small production apps

Cons

  • − Proxy architecture adds latency (typically 10-50ms per request)
  • − Advanced features like A/B testing and custom evaluations require paid plans
  • − Weaker evaluation and testing tools than LangSmith for teams running systematic evals

Who is Helicone for?

  • Production LLM apps that need cost visibility and per-user tracking
  • Teams debugging latency spikes and unexpected model failures
  • Startups monitoring API spend before costs get out of control
  • Developers managing prompt versions across staging and production

Alternatives to Helicone

If Helicone isn't quite the right fit, the closest alternatives are langsmith , langfuse , and portkey . See our full Helicone alternatives page for side-by-side comparisons.

Frequently Asked Questions

What is Helicone and how does it work?
Helicone is an LLM observability platform that works as a proxy between your application and LLM provider APIs like OpenAI, Anthropic, and others. You change your API base URL to route through Helicone's servers. Helicone logs every request and response, tracks costs in real time, measures latency, and displays everything in a web dashboard. The proxy approach means you don't need to add an SDK or change how your application calls the LLM; just update the URL.
Does Helicone see my API keys?
Helicone's proxy needs to forward your requests to the LLM provider, which means your API key passes through the proxy. Helicone handles this by accepting your key via a request header that it forwards to the provider without storing it. For teams with strict security requirements, Helicone offers a self-hosted deployment option and an enterprise tier with additional security documentation. The open-source codebase allows you to audit how keys are handled.
How much latency does the Helicone proxy add?
Helicone adds roughly 10-50ms of latency per request due to the proxy routing. For most applications, this is imperceptible. LLM responses typically take hundreds of milliseconds to several seconds, so a 30ms proxy overhead is a small fraction of total response time. For very latency-sensitive applications where you're already at the lower bound of model response times, this could be a consideration.
Can Helicone track costs per user?
Yes. Helicone's user segmentation feature lets you attach a user identifier to each request via a request header. Helicone then groups costs and usage by that identifier in the dashboard. For B2C applications where you're billing users based on AI usage, or for multi-tenant SaaS products where you want to track which customers are using the most tokens, this gives you the breakdown you need without building cost tracking yourself.
Is Helicone open source?
Yes. Helicone's core platform is open source at github.com/Helicone/helicone under the Apache 2.0 license. The repository includes the proxy server, the dashboard, and the data pipeline. Self-hosting is documented and supported. Most teams use the hosted cloud service for convenience, but self-hosting is viable for teams with data residency or compliance requirements.

Related agents

Search