AI Cost Attribution: Who Pays for AI Inside Your Organization
There's a moment that happens in most companies 18-24 months into their AI adoption: someone from finance shows up at the engineering all-hands with a slide showing that AI API spend grew 340% in the past year, and asks who's responsible for it. The engineers look at each other. Nobody has a good answer.
This isn't an unusual situation. AI costs are diffuse by default. Multiple teams use the same API keys. Different features call different models. Some of that spend is delivering real value; some of it is powering a feature that 12 users have opened once. Without cost attribution, you can't tell which is which.
Cost attribution means knowing, for every dollar you spend on AI, what team spent it, on what feature, and what it produced. Getting to that state requires some infrastructure work, but it's not a long project. Here's how to do it.
The attribution problem in concrete terms
Say you have three teams: product, customer success, and data science. All three use your company's OpenAI API key. At the end of the month, OpenAI sends one bill for $47,000.
Product thinks they used about $15,000 of that (their new AI writing assistant). Customer success used the API for their case summarization tool, probably another $10,000. Data science is running nightly batch analysis jobs. And there's $22,000 of unknown origin, probably split between one-off experiments, the AI features that aren't tracked, and that data engineering pipeline someone set up six months ago.
This is the attribution gap. You're spending real money, but you can't tell which parts of your organization are responsible for which portions of the bill.
Three things go wrong without attribution: budget planning is impossible (you can't plan what you can't measure), there's no incentive for teams to optimize (they're not paying for it), and you have no way to evaluate ROI on specific features.
The tagging approach: metadata on every request
The most direct path to attribution is tagging. Every API call includes metadata that identifies its origin.
For OpenAI and most other providers, you can pass custom metadata or user identifiers with each request. These don't affect the response but get logged and can be used for cost analysis.
With the OpenAI SDK in Python:
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
# Custom metadata for attribution
extra_headers={
"X-Team": "customer-success",
"X-Feature": "case-summarization",
"X-Environment": "production"
}
)
The exact field names depend on the provider and whether you're using a proxy layer. If you're routing through a proxy like LiteLLM or Portkey, you set metadata at the proxy layer and it gets applied to all requests from a given service.
The tagging taxonomy matters. You need at least:
- Team: which organizational unit made the request
- Feature: which product feature or use case triggered it
- Environment: production vs staging vs development (so dev experiments don't pollute your production cost analysis)
Optional but useful: user ID (for per-user cost analysis in multi-tenant products), request type (if you have multiple AI functions within one feature), and a job ID for batch workloads.
Helicone: observability and cost tracking with minimal setup
Helicone is an AI observability proxy that sits between your application and the AI provider. All requests route through Helicone, which logs them, adds latency measurement, and tracks costs. The setup is a one-line change: you swap the API base URL in your SDK initialization for Helicone's proxy URL.
from openai import OpenAI
client = OpenAI(
base_url="https://oai.helicone.ai/v1",
default_headers={
"Helicone-Auth": f"Bearer {HELICONE_API_KEY}",
"Helicone-Property-Team": "customer-success",
"Helicone-Property-Feature": "case-summarization"
}
)
Helicone's dashboard shows cost by model, by time, and by the custom properties you've tagged. You can filter costs to any combination of tags: "show me production spend from the customer success team on the case-summarization feature over the last 30 days."
The free tier covers up to 100,000 requests per month. At volume, Helicone's pricing is around $20/month per 1 million requests, which is negligible compared to model costs.
For multi-provider setups, Helicone supports routing to Anthropic, Gemini, and others through the same proxy, with unified cost reporting across providers.
Langfuse: attribution with deeper LLM observability
Langfuse covers similar ground to Helicone but with a stronger focus on trace-level observability. It's open-source and can be self-hosted, which matters for companies with data residency requirements.
Langfuse's attribution model is built around traces and observations. A trace represents a single user-facing operation (a document summary request, a chat turn, a batch processing job). Each trace has associated cost data, metadata, and can include user feedback if you wire that up.
The advantage of the trace model is granularity. If one user request triggers five model calls (a planning call, a tool call, a synthesis call, etc.), Langfuse shows you the cost breakdown within that trace, not just the total. This matters when you're debugging why one feature is more expensive than expected.
from langfuse import Langfuse
langfuse = Langfuse()
# Create a trace with attribution metadata
trace = langfuse.trace(
name="contract-review",
user_id="user_1234",
metadata={
"team": "legal",
"feature": "contract-review",
"document_type": "nda"
}
)
Langfuse generates cost reports by any combination of trace metadata. For internal billing, you can export monthly cost summaries per team/feature combination and feed them into your finance process.
Self-hosted Langfuse (via Docker or Kubernetes) is free. The cloud version has a free tier and paid plans starting around $49/month.
Setting usage caps by team
Cost visibility is more useful when combined with caps. A monthly cap gives teams a soft budget constraint that drives optimization without requiring constant manual oversight.
The mechanic: when a team has used 80% of their monthly AI budget allocation, they get an automated alert. When they hit 100%, either requests are queued or the team lead gets a second alert requiring approval to continue. Hard blocks at 100% are generally too disruptive for production workloads; soft alerts with escalation work better.
Both Helicone and Portkey support budget limits at the API key or project level. You create a separate API key for each team, set a monthly spend cap on that key, and configure alert thresholds. The teams use their assigned key, and you get per-key cost reports automatically.
For self-hosted LiteLLM proxy users, budget controls are configured in the proxy's config:
model_list:
- model_name: gpt-4o
litellm_params:
model: gpt-4o
api_key: sk-...
general_settings:
master_key: sk-...
user_settings:
customer_success_team:
max_budget: 2000 # USD per month
budget_duration: "monthly"
soft_budget: 1600 # Alert at 80%
With this configuration, the customer success team's requests fail gracefully (or trigger an alert) when they've spent $2,000 in a month.
Internal billing: making the money actually move
For most organizations, the first step is just visibility: teams can see what they're spending, but money doesn't actually move between budgets. This is enough to change behavior.
The more mature approach is internal billing: actual accounting transfers between cost centers based on AI usage. At the end of each month, finance runs a report from your attribution system, calculates each team's AI spend, and allocates it against their budget in the general ledger.
This sounds bureaucratic, but it produces real effects. When AI costs hit a team's budget, the team lead has a direct incentive to ask "is this feature worth what we're spending on it?" and "are we using the right model for this task?" Teams that absorb AI costs as a line item in their budget treat AI spend as a business decision, not as a free resource.
The internal billing process doesn't require a formal chargeback system. A monthly spreadsheet with per-team AI costs, reviewed in team budget reviews, accomplishes most of the same behavioral effects.
What to do with attribution data
The point of cost attribution isn't the reporting itself. It's the decisions it enables.
Feature ROI analysis. For any significant AI feature, compare monthly AI cost to the business value it produces (revenue attributed, time saved, errors avoided). A feature that costs $3,000 per month in AI and saves 200 hours of engineer time at $150/hour is generating 10x return. A feature that costs $3,000 per month and is used by 8 people twice a week is probably not.
Model selection optimization. Attribution by model shows you where you're spending the most. If one feature is using GPT-4o for tasks that GPT-4o mini handles just as well, that's a direct cost reduction opportunity. For most classification and extraction tasks, smaller models are within a few percentage points of the quality of larger models at 10-20x lower cost.
Anomaly detection. Attribution data establishes a baseline for each feature's normal cost. If a feature's daily cost doubles without a corresponding increase in usage, something has changed: a prompt is longer, an edge case is triggering expensive retry logic, or a bug is causing duplicate requests.
Budget planning. Next year's AI budget is much easier to defend and calibrate when you have a clean cost-by-feature breakdown from this year. You can project costs based on feature roadmap and usage growth, rather than guessing.
Getting cost attribution working doesn't require a large project. Adding one metadata header to your API calls, routing through an observability proxy, and setting up monthly cost reports is 2-3 days of engineering work. The payoff is months of meaningful data that makes AI spending a first-class financial decision rather than an unexplained line item on your cloud bill.