Agentbrisk

AI Token Tracking in 2026: Per-User, Per-Feature, Per-Org Attribution

April 5, 2026 · Editorial Team · 6 min read · ai-infrastructure, token-tracking, cost-monitoring

Token tracking is the foundational layer underneath every other LLM cost and usage metric. Before you can tell a customer how much of their plan they've consumed, before you can calculate your unit economics, before you can identify which feature is driving cost growth, you need accurate token counts attributed to the right entity: the right user, the right feature, the right organization.

This sounds simple. In practice, it's one of those "obvious in theory, fiddly in execution" problems. Multi-step agents make multiple API calls per user action. Streaming responses report token counts at the end of the stream, not the beginning. Cached tokens cost less than uncached tokens on some providers, so raw token counts aren't enough to compute cost. Background agents don't run on behalf of any specific user, but their usage still needs to be accounted for somewhere.

This guide covers the patterns that work, the tools that help, and the data model you'll need.


Why token counts from the API aren't enough

Non-streaming responses from the major LLM APIs include token usage in the response object. For Anthropic:

response = client.messages.create(...)
print(response.usage.input_tokens)   # tokens in the prompt
print(response.usage.output_tokens)  # tokens in the response

For OpenAI:

response = client.chat.completions.create(...)
print(response.usage.prompt_tokens)
print(response.usage.completion_tokens)
print(response.usage.total_tokens)

The numbers are there. The problem is attribution. These numbers tell you what a single API call consumed. They don't tell you why the call happened, whose action triggered it, or what feature it served.
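Before attribution can happen, the two field-naming schemes above need to land in one shape. A minimal sketch, assuming the usage object has already been converted to a plain dict; the `Usage` class and `normalize_usage` function are hypothetical names, not part of either SDK:

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Usage:
    """Provider-neutral token counts for one API call."""
    input_tokens: int
    output_tokens: int


def normalize_usage(provider: str, usage: Mapping[str, Any]) -> Usage:
    """Map a provider's usage field names onto one shared shape."""
    if provider == "anthropic":
        return Usage(usage["input_tokens"], usage["output_tokens"])
    if provider == "openai":
        return Usage(usage["prompt_tokens"], usage["completion_tokens"])
    raise ValueError(f"unknown provider: {provider}")


# Both calls produce the same record type, whatever the provider named its fields.
print(normalize_usage("anthropic", {"input_tokens": 120, "output_tokens": 30}))
print(normalize_usage("openai", {"prompt_tokens": 120, "completion_tokens": 30}))
```

Everything downstream (storage, cost calculation, budgets) can then work with one record type.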

If a user sends one message in your chat interface and that triggers a planning call, two tool calls, and a synthesis call, you have four separate API responses with four sets of token counts. The sum of those four is what you want to attribute to that user's action. Assembling that sum requires your application to know that all four calls belong together.


The attribution data model

A practical token tracking schema has a few key tables. The exact names don't matter; the relationships do.

LLM calls table. One row per API call. Columns: call ID, timestamp, model, input tokens, output tokens, cached input tokens (where the provider distinguishes these), user ID (nullable), session ID, feature ID, environment, cost (computed from tokens and model pricing).

Sessions table. Groups of related calls. A session is a user interaction: one message from a user that triggers N internal calls. Columns: session ID, user ID, org ID, feature ID, start time, end time, total input tokens, total output tokens, total cost.

User token budgets table (if you have per-user limits or billing). Columns: user ID, org ID, period (month), allocated tokens, consumed tokens, updated at.

The sessions table is the key building block for attribution. Every time a user action starts, you create a session ID and pass it through to every downstream LLM call. At the end of the interaction, you aggregate the session. This gives you per-user-action cost and token usage without losing the individual call data.
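As a rough illustration of the relationship between the two core tables, here is a sketch using SQLite from the standard library. The column set is trimmed from the lists above, and the table names, index name, and `session_totals` helper are assumptions made for the example, not a prescribed schema:

```python
import sqlite3

# Trimmed-down version of the calls and sessions tables described above.
SCHEMA = """
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    user_id TEXT,
    org_id TEXT,
    feature_id TEXT
);
CREATE TABLE llm_calls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT REFERENCES sessions(id),
    model TEXT NOT NULL,
    input_tokens INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    cost REAL NOT NULL
);
CREATE INDEX idx_llm_calls_session ON llm_calls(session_id);
"""


def session_totals(conn: sqlite3.Connection, session_id: str) -> dict:
    """Aggregate every call that carries the given session ID."""
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), "
        "COALESCE(SUM(output_tokens), 0), "
        "COALESCE(SUM(cost), 0.0) "
        "FROM llm_calls WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return {"input_tokens": row[0], "output_tokens": row[1], "cost": row[2]}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO sessions VALUES (?, ?, ?, ?)", ("s1", "u1", "o1", "chat"))
conn.executemany(
    "INSERT INTO llm_calls (session_id, model, input_tokens, output_tokens, cost) "
    "VALUES (?, ?, ?, ?, ?)",
    [("s1", "model-a", 900, 150, 0.01), ("s1", "model-a", 1200, 300, 0.02)],
)
print(session_totals(conn, "s1"))
```

The individual call rows stay in place after aggregation, so a session total can always be traced back to the calls behind it.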


Implementing context propagation

The session ID needs to flow from the user-facing layer down to every LLM call, even if they go through multiple hops.

In Python, within a single process, the cleanest approach is a context variable (contextvars works for synchronous code and for asyncio tasks alike):

import uuid
from contextvars import ContextVar
from datetime import datetime, timezone
from typing import Optional

# `db` and `calculate_cost` are assumed to be provided by your application.

current_session_id: ContextVar[Optional[str]] = ContextVar(
    'session_id', default=None
)

def now() -> datetime:
    """Timezone-aware UTC timestamp for all tracking rows."""
    return datetime.now(timezone.utc)

def start_session(user_id: str, feature: str) -> str:
    session_id = str(uuid.uuid4())
    # Make the ID visible to every call made in this context
    current_session_id.set(session_id)
    # Write session start to DB
    db.sessions.insert({
        "id": session_id,
        "user_id": user_id,
        "feature": feature,
        "started_at": now()
    })
    return session_id

def record_llm_call(model: str, input_tokens: int, output_tokens: int):
    # None here means the call ran outside any session (e.g. a background job)
    session_id = current_session_id.get()
    cost = calculate_cost(model, input_tokens, output_tokens)
    db.llm_calls.insert({
        "session_id": session_id,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost": cost,
        "created_at": now()
    })

If you're using an observability platform like Langfuse, you can set the session ID on every trace (Langfuse has a dedicated session ID field for this). Langfuse's dashboard lets you group traces by session ID and aggregate costs.

For distributed systems, context propagation is harder. Context variables don't cross process boundaries, so if your LLM calls happen in a worker process separate from your web process, you need to pass the session ID explicitly, either as part of the task payload or via a distributed tracing header like the W3C Trace Context.
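The task-payload option can be sketched as follows. This assumes a JSON-serialized job passed through some queue; `build_task_payload` and `handle_task` are hypothetical names, and the queue itself is left out:

```python
import json
from contextvars import ContextVar
from typing import Optional

current_session_id: ContextVar[Optional[str]] = ContextVar(
    "session_id", default=None
)


def build_task_payload(prompt: str) -> str:
    """Web process: serialize the task together with the active session ID."""
    return json.dumps({"session_id": current_session_id.get(), "prompt": prompt})


def handle_task(raw_payload: str) -> Optional[str]:
    """Worker process: restore the session ID before any LLM call runs."""
    payload = json.loads(raw_payload)
    token = current_session_id.set(payload["session_id"])
    try:
        # LLM calls made here would see the restored session ID.
        return current_session_id.get()
    finally:
        # Reset so the ID does not leak into the next task on this worker.
        current_session_id.reset(token)
```

Resetting the variable in `finally` matters on long-lived workers, where a leftover session ID would silently attribute the next job to the wrong user.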


Cached tokens: accounting for them correctly

Anthropic's prompt caching feature returns different token types in the usage object:

response.usage.cache_creation_input_tokens  # tokens written to cache
response.usage.cache_read_input_tokens      # tokens read from cache
response.usage.input_tokens                 # non-cached input tokens

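Cache read tokens cost 0.1x the normal input token price. Cache write tokens cost 1.25x with the default 5-minute cache lifetime (the premium for creating the cache entry); longer cache lifetimes are priced higher. If you price only input_tokens, or lump all three counts together at the base input rate, your cost numbers will be wrong whenever caching is active.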

Your cost calculation function needs to handle all three cases:

def calculate_cost_anthropic(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cache_creation_tokens: int = 0,
    cache_read_tokens: int = 0,
) -> float:
    pricing = ANTHROPIC_PRICING[model]
    return (
        input_tokens * pricing["input"]
        + output_tokens * pricing["output"]
        + cache_creation_tokens * pricing["cache_write"]
        + cache_read_tokens * pricing["cache_read"]
    ) / 1_000_000  # pricing is per 1M tokens

Keep your pricing table updated when providers change rates. Hard-coding these numbers without version tracking is a common source of cost calculation drift.
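One way to get version tracking is to store each price change with an effective date and look up the version that applied when the call was made. A sketch under stated assumptions: the model name and every price below are placeholders rather than real provider rates, and `PriceVersion`, `price_on`, and `cost_on` are hypothetical names:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceVersion:
    effective_from: date
    input: float        # USD per 1M tokens
    output: float
    cache_write: float
    cache_read: float


# Placeholder numbers for a hypothetical model, NOT real provider rates.
PRICE_HISTORY = {
    "example-model": [
        PriceVersion(date(2025, 1, 1), input=4.00, output=20.00,
                     cache_write=5.00, cache_read=0.40),
        PriceVersion(date(2026, 1, 1), input=3.00, output=15.00,
                     cache_write=3.75, cache_read=0.30),
    ]
}


def price_on(model: str, day: date) -> PriceVersion:
    """Return the latest version whose effective date is on or before `day`."""
    candidates = [p for p in PRICE_HISTORY[model] if p.effective_from <= day]
    if not candidates:
        raise LookupError(f"no pricing for {model} on {day}")
    return max(candidates, key=lambda p: p.effective_from)


def cost_on(model: str, day: date, input_tokens: int, output_tokens: int,
            cache_creation_tokens: int = 0, cache_read_tokens: int = 0) -> float:
    """Price a call with the rates that were in effect on the day it ran."""
    p = price_on(model, day)
    return (
        input_tokens * p.input
        + output_tokens * p.output
        + cache_creation_tokens * p.cache_write
        + cache_read_tokens * p.cache_read
    ) / 1_000_000
```

Because historical versions are kept, recomputing last quarter's costs after a price change gives the same answer it gave at the time.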


Per-org and per-user token budgets

If your product charges for AI usage (either directly or as part of a tier), you need to enforce budgets. This means checking remaining budget before each call and updating the budget atomically after each call.

A minimal budget enforcement pattern:

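def check_and_consume_budget(
    org_id: str,
    user_id: str,  # unused below; add a per-user lookup here for per-user limits
    estimated_tokens: int,
) -> bool:
    """Returns True if the call is allowed, False if over budget."""
    # `db` is your application's database layer. get_for_update is assumed to
    # take a row lock (SELECT ... FOR UPDATE) so that two concurrent requests
    # can't both pass the check against the same remaining budget.
    with db.transaction():
        budget = db.budgets.get_for_update(
            org_id=org_id,
            period=current_month()
        )
        if budget.consumed + estimated_tokens > budget.allocated:
            return False
        budget.consumed += estimated_tokens
        db.budgets.update(budget)
    return True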

The input portion of estimated_tokens can be computed before the call by counting the prompt tokens with a tokenizer or a provider's token-counting endpoint. Output tokens aren't known until the response completes, which means you need a post-call adjustment. The standard pattern: reserve the input tokens plus a buffer for output (say, the maximum output tokens you allow on the request), then release the unused portion after the call completes.
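The reserve-then-release pattern might look like the sketch below. It is in-memory only: a `threading.Lock` stands in for the database row lock, and the `Budget` class with its `reserve` and `settle` methods is a hypothetical illustration rather than any library's API:

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Budget:
    allocated: int
    consumed: int = 0
    # Stand-in for a database row lock in this in-memory sketch.
    _lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False
    )

    def reserve(self, input_tokens: int, max_output_tokens: int) -> Optional[int]:
        """Reserve input plus worst-case output; None means over budget."""
        reserved = input_tokens + max_output_tokens
        with self._lock:
            if self.consumed + reserved > self.allocated:
                return None
            self.consumed += reserved
            return reserved

    def settle(self, reserved: int, actual_tokens: int) -> None:
        """Swap the reservation for the actual usage once the call completes."""
        with self._lock:
            self.consumed += actual_tokens - reserved


budget = Budget(allocated=10_000)
reserved = budget.reserve(input_tokens=1_000, max_output_tokens=4_000)
# ... the LLM call runs and reports 350 output tokens ...
budget.settle(reserved, actual_tokens=1_000 + 350)
print(budget.consumed)
```

A failed or cancelled call should also be settled, with whatever usage the provider actually billed, so that reservations never stay stuck in the consumed total.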

For soft limits (warning users they're at 80% of their budget, not hard-cutting them off), you can do this asynchronously: let the call proceed, write the actual token count to the budget table after completion, and send a notification if the cumulative total crosses a threshold.
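For the notification step, the detail that is easy to get wrong is sending the warning on every call after the threshold rather than once. A small sketch of a crossing check, assuming you know the consumed total before and after the update (`crossed_threshold` is a hypothetical helper):

```python
def crossed_threshold(
    consumed_before: int,
    consumed_after: int,
    allocated: int,
    threshold: float = 0.8,
) -> bool:
    """True only for the update that moves usage from below the threshold
    to at or above it, so the notification fires once per period."""
    limit = allocated * threshold
    return consumed_before < limit <= consumed_after
```

With a 10,000-token budget, an update from 7,900 to 8,100 triggers the warning, while a later update from 8,100 to 8,200 does not.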


Tools that help with token tracking

Langfuse. Its native session and user tracking lets you group traces by session ID and user ID, then aggregate token costs in the dashboard. Langfuse's API lets you query total cost per user per time period, which is useful for per-seat reporting.

Helicone. Custom properties on requests (set via headers) let you attach user ID, org ID, and feature name to every call. Helicone's dashboard then shows cost breakdowns by any of these dimensions without extra aggregation work on your side. It's one of the quickest ways to get per-user cost visibility with minimal code.
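A sketch of building those headers in one place. The header names follow Helicone's documented `Helicone-User-Id` and `Helicone-Property-<Name>` convention as I understand it, so check them against the current Helicone docs; the `attribution_headers` helper and the property names `Org-Id` and `Feature` are assumptions for this example:

```python
def attribution_headers(user_id: str, org_id: str, feature: str) -> dict[str, str]:
    """Build per-request attribution headers for a Helicone-proxied call."""
    return {
        "Helicone-User-Id": user_id,
        # Custom properties: the suffix after "Helicone-Property-" is your choice.
        "Helicone-Property-Org-Id": org_id,
        "Helicone-Property-Feature": feature,
    }


print(attribution_headers("user-42", "org-7", "chat"))
```

The resulting dict would be sent alongside your usual Helicone authentication header on each request, for example through the per-request extra-headers option that the official provider SDKs expose.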

LiteLLM. LiteLLM is available as an SDK and as a proxy server that normalizes calls across different LLM providers. The proxy includes budget management features: you can set spend budgets and token or request rate limits per key, per user, or per team, and LiteLLM enforces them before forwarding calls to the provider. For multi-provider setups, this is useful because you get unified token tracking across Anthropic, OpenAI, and others.

Custom database. For teams with complex attribution requirements (multi-level orgs, fine-grained feature billing, custom reporting for enterprise customers), the tools above are often not flexible enough. A custom token tracking table with good indexing on (user_id, org_id, feature, period) and a nightly aggregation job is often the most practical path.


The report nobody knows they need

Once you have per-user, per-feature token data, build this report: total LLM cost per business outcome, segmented by feature.

For each major feature in your product, what does one completed user action cost in LLM tokens and dollars? What's the range (p50, p90, p99 cost per completed action)? Which features have cost distributions that are stable vs. highly variable?

This report tells you:

  • Where to focus prompt optimization effort (high-cost or high-variance features)
  • Whether your per-seat or per-usage pricing covers your costs (margin per feature)
  • Which features have runaway edge cases (p99 cost much higher than p50 = some inputs are very expensive)
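The core of the report is a percentile calculation per feature. A minimal sketch, assuming you can export one row per completed action as a (feature, cost) pair; it uses the nearest-rank percentile definition, and `percentile` and `cost_report` are hypothetical names:

```python
from collections import defaultdict
from typing import Iterable, Sequence, Tuple


def percentile(sorted_values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceiling of pct * n / 100
    return sorted_values[rank - 1]


def cost_report(rows: Iterable[Tuple[str, float]]) -> dict:
    """rows: (feature, cost of one completed action) pairs."""
    by_feature = defaultdict(list)
    for feature, cost in rows:
        by_feature[feature].append(cost)
    report = {}
    for feature, costs in by_feature.items():
        costs.sort()
        report[feature] = {
            "count": len(costs),
            "p50": percentile(costs, 50),
            "p90": percentile(costs, 90),
            "p99": percentile(costs, 99),
        }
    return report
```

A feature whose p99 is many times its p50 is a candidate for input limits or a cheaper fallback path on the expensive cases.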

Most teams don't have this report. Building it takes about a day of engineering once you have the underlying token data. It's one of the highest-ROI internal analytics tools you can build for an AI product.


The cost monitoring platforms that build dashboards on top of this token data are compared in the AI cost monitoring guide.
