AI Agent Costs: Real Monthly Bills Broken Down by Usage Tier
Most articles about AI agent costs stay abstract. They quote per-token prices and leave you to do the math yourself. That math is harder than it sounds, because real AI agent costs have four distinct components: infrastructure, LLM API calls, tool/API calls, and observability. Each scales differently, and the ratio between them shifts dramatically as you grow.
Let me break this down with three real customer profiles. These aren't invented examples. They're composites of actual production deployments I've seen in 2026, with specific numbers rounded but roughly accurate.
The four cost buckets
Before the profiles, you need to understand what you're actually paying for.
LLM tokens are the obvious one. You send text in (input tokens) and get text back (output tokens). Output tokens cost more because the model generates them one at a time, while input tokens are processed in parallel. At current pricing, Claude 3.5 Sonnet is $3 per million input tokens and $15 per million output tokens via the Anthropic API, GPT-4o is $2.50/$10, and Claude 4 Opus is $15/$75. The per-token price is only part of the story, though. Real agents resend the system prompt, conversation history, and retrieved context with every call, so each call bills far more input tokens than the new message alone.
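To see why quoted prices understate real spend, here is a minimal sketch that bills a multi-turn conversation in which every call resends the system prompt and the full history, with no prompt caching. The dictionary keys are illustrative labels rather than provider model IDs, and the token counts in the example (a 2,000-token system prompt, 200-token user turns, 600-token replies) are hypothetical.

```python
# List prices per million tokens (input, output) in USD, as quoted above.
# Keys are illustrative labels, not provider model IDs.
PRICES: dict[str, tuple[float, float]] = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
    "claude-4-opus": (15.00, 75.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at list price."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def conversation_cost(model: str, system_tokens: int, turns: int,
                      user_tokens: int, reply_tokens: int) -> tuple[int, float]:
    """Bill a multi-turn conversation in which every call resends the system
    prompt plus the full history so far (no prompt caching).

    Returns (total input tokens billed, total dollar cost)."""
    history = 0
    billed_input = 0
    cost = 0.0
    for _ in range(turns):
        input_tokens = system_tokens + history + user_tokens
        billed_input += input_tokens
        cost += call_cost(model, input_tokens, reply_tokens)
        # This turn's user message and reply are resent on every later call.
        history += user_tokens + reply_tokens
    return billed_input, cost


billed, cost = conversation_cost("claude-3.5-sonnet", system_tokens=2_000,
                                 turns=4, user_tokens=200, reply_tokens=600)
print(billed, round(cost, 4))  # 13600 0.0768
```

Over four turns the user types 800 tokens, but the conversation bills 13,600 input tokens, because the growing history is paid for again on every call.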
Infrastructure covers compute for your orchestration layer, databases for memory and state, vector stores for retrieval, queues for async work, and network egress. For small deployments this is negligible. For large ones it rivals the LLM costs.
Tool calls are often overlooked. Every time your agent calls an external API, you pay for it. Web search APIs run $0.002 to $0.005 per query. Code execution sandboxes (like E2B or Modal) charge for compute time. Browser automation, email sending, and database queries all have their own pricing, and it adds up.
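One way to keep tool spend visible is to record it per call, alongside token usage. The class below is a hypothetical helper, not any library's API. The $0.003 search price sits inside the range above; the per-second sandbox rate is a made-up placeholder, so substitute your vendors' actual pricing.

```python
from collections import defaultdict


class ToolCostLedger:
    """Hypothetical helper that accumulates spend per external tool."""

    def __init__(self) -> None:
        self._spend: dict[str, float] = defaultdict(float)

    def record(self, tool: str, units: float, unit_price: float) -> None:
        # "units" is whatever the vendor bills on: queries, sandbox seconds, etc.
        self._spend[tool] += units * unit_price

    def total(self) -> float:
        return sum(self._spend.values())

    def by_tool(self) -> dict[str, float]:
        return dict(self._spend)


ledger = ToolCostLedger()
for _ in range(1_000):  # 1,000 agent jobs
    ledger.record("web_search", units=4, unit_price=0.003)
    ledger.record("sandbox_seconds", units=30, unit_price=0.0001)  # placeholder rate
print(f"${ledger.total():,.2f}")  # $15.00
```

After 1,000 jobs the ledger totals $15.00, four-fifths of it from search, which is exactly the kind of line item that never shows up if you only track tokens.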
Observability is the cost people forget until they're debugging a production incident and have no idea what happened. Logging, tracing, and monitoring for an agentic system with complex multi-step flows costs real money, especially at scale. LangSmith, Langfuse, Helicone, and similar tools charge based on trace volume.
Profile 1: The $500/month startup
This is a team of two engineers building an internal automation agent for a seed-stage company. The agent triages customer support tickets: it drafts responses to common tickets from a knowledge base and routes complex ones to the right person.
Typical usage:
- 200 tickets processed per day
- Average conversation: 4 LLM calls, ~3,000 input tokens, ~600 output tokens per call
- Knowledge base retrieval: 2 vector search queries per conversation
- Roughly 800 LLM calls per day, 24,000 per month
Monthly bill breakdown:
- LLM API (Claude 3.5 Sonnet): $432/month (input: ~72M tokens at $3 + output: ~14.4M at $15)
- Vector database (Pinecone or Weaviate free/starter tier): $0-25/month
- Infrastructure (one small EC2/Cloud Run instance, Redis for state): $40/month
- Observability (Langfuse open-source self-hosted): $10/month (just hosting)
- Slack API, email API: free tier
- Total: roughly $480-$510/month
That lands almost exactly on the $500/month budget, and the LLM line is roughly 85-90% of it. There is little room left for extras like a staging environment, occasional calls to a more capable model on edge cases, or a buffer for traffic spikes.
The main cost lever at this scale is model choice. If they used Claude 4 Opus for everything, the LLM bill alone would be over $2,000/month (72M input tokens at $15 plus 14.4M output tokens at $75, about $2,160). Using a tiered approach where Sonnet handles routine responses and a more capable model only handles escalations keeps costs sensible without sacrificing quality on the cases that need it.
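Token arithmetic like this is easy to get wrong, so here is a small sketch that derives the monthly LLM bill from the usage figures listed for this profile (200 tickets/day, 4 calls per ticket, ~3,000 input and ~600 output tokens per call) over an assumed 30-day month, and compares all-Sonnet, all-Opus, and tiered routing. The 10% escalation share in the tiered case is hypothetical, chosen only to show the shape of the savings.

```python
# List prices per million tokens (input, output) in USD, as quoted earlier.
SONNET = (3.00, 15.00)
OPUS = (15.00, 75.00)


def monthly_llm_cost(calls: int, input_per_call: int, output_per_call: int,
                     price: tuple[float, float]) -> float:
    """Monthly LLM spend for a fixed number of calls at the given price."""
    price_in, price_out = price
    input_millions = calls * input_per_call / 1_000_000
    output_millions = calls * output_per_call / 1_000_000
    return input_millions * price_in + output_millions * price_out


# Usage figures for this profile, over an assumed 30-day month.
calls = 200 * 4 * 30  # tickets/day x LLM calls per ticket x days
input_tokens, output_tokens = 3_000, 600

all_sonnet = monthly_llm_cost(calls, input_tokens, output_tokens, SONNET)
all_opus = monthly_llm_cost(calls, input_tokens, output_tokens, OPUS)

# Tiered routing: a hypothetical 10% of calls are escalations sent to Opus.
escalation_share = 0.10
escalated = round(calls * escalation_share)
tiered = (monthly_llm_cost(calls - escalated, input_tokens, output_tokens, SONNET)
          + monthly_llm_cost(escalated, input_tokens, output_tokens, OPUS))

for label, cost in [("all Sonnet", all_sonnet), ("all Opus", all_opus),
                    ("tiered", tiered)]:
    print(f"{label:>10}: ${cost:,.0f}/month")
```

Because Opus costs five times Sonnet on both input and output, every percentage point of traffic you escalate adds four times that point's Sonnet cost, which is why the escalation share is the number worth watching.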
Profile 2: The $5k/month growing company
A 25-person B2B SaaS company using AI agents for sales prospecting, technical documentation generation, and customer onboarding automation. Three distinct agent workflows running in production.
Sales prospecting agent:
- Runs 500 prospect research jobs per week
- Each job: web search (4 queries at $0.003 each), 3 LLM calls, one enrichment API call
- LLM calls use GPT-4o for the synthesis step
Documentation agent:
- Generates and updates docs from code commits and changelogs
- 200 documentation updates per week
- Higher output token count: 2,000 output tokens per job
Onboarding agent:
- Guides new customers through setup flows
- 80 new customers per month, 6-session average onboarding
- Real-time chat, so latency matters (using faster models)
Monthly bill breakdown:
- LLM API (mix of GPT-4o and Claude 3.5 Sonnet): $1,800/month
- Web search API (Exa or Serper): $300/month (100k+ queries)
- Infrastructure (two app servers, PostgreSQL, Redis, vector DB): $600/month
- E2B code execution sandbox (used by the docs agent for code verification): $200/month
- Observability (Langfuse cloud): $150/month
- Misc APIs (email enrichment, LinkedIn, company data): $700/month
- Total: roughly $3,750-$4,200/month
The surprise here is how much the enrichment and search APIs cost relative to the LLM calls. Teams often budget only for LLM costs and underestimate the tool call stack. At this scale, the third-party data APIs are the second-largest cost category.
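Grouping this profile's line items into the four buckets defined at the top of the article makes the point concrete. This sketch treats the search API, the E2B sandbox, and the misc data APIs as tool calls, which is a judgment call about where each line belongs, and uses the low end of the stated total.

```python
# Profile 2 line items (USD/month), grouped into the four cost buckets.
profile_2 = {
    "llm": 1_800,
    "tool_calls": 300 + 200 + 700,  # web search + E2B sandbox + misc data APIs
    "infrastructure": 600,
    "observability": 150,
}

total = sum(profile_2.values())
for bucket, cost in sorted(profile_2.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{bucket:<15} ${cost:>6,}  {cost / total:6.1%}")
print(f"{'total':<15} ${total:>6,}")
```

It prints the LLM at 48.0% of the $3,750 total, tool calls at 32.0%, infrastructure at 16.0%, and observability at 4.0%.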
The main optimization available is caching. The documentation agent sends similar system prompts and context repeatedly. With prompt caching enabled on the Anthropic API, cache hit rates of 70-80% are achievable on the repeated context. Cache reads are billed at a small fraction of the normal input price, so at those hit rates the effective input cost of the repeated context drops sharply.
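Here is a minimal sketch of the caching arithmetic, assuming Anthropic's published multipliers at the time of writing: writing a prefix to the default five-minute cache costs 1.25x the base input price, and reading it back costs 0.1x. In the API, the cacheable prefix is marked with a `cache_control` field on a content block; the sketch only models the billing. The 5,000-token prefix, 1,000 fresh tokens per call, and call volume are hypothetical.

```python
def input_cost_no_cache(prefix_tokens: int, fresh_tokens: int, calls: int,
                        price_per_m: float) -> float:
    """Input spend when the shared prefix is resent at full price every call."""
    return (prefix_tokens + fresh_tokens) * calls * price_per_m / 1_000_000


def input_cost_cached(prefix_tokens: int, fresh_tokens: int, calls: int,
                      hit_rate: float, price_per_m: float,
                      write_multiplier: float = 1.25,
                      read_multiplier: float = 0.10) -> float:
    """Input spend when the shared prefix is cached.

    On a miss the prefix is written to the cache at a premium; on a hit it is
    read at a discount. The per-call (fresh) tokens always pay full price."""
    hits = calls * hit_rate
    misses = calls - hits
    prefix_units = prefix_tokens * (hits * read_multiplier + misses * write_multiplier)
    fresh_units = fresh_tokens * calls
    return (prefix_units + fresh_units) * price_per_m / 1_000_000


# Hypothetical docs-agent workload on Claude 3.5 Sonnet ($3 per million input tokens).
print(f"no cache: ${input_cost_no_cache(5_000, 1_000, 1_000, 3.0):.2f}")       # $18.00
print(f"75% hits: ${input_cost_cached(5_000, 1_000, 1_000, 0.75, 3.0):.2f}")   # $8.81
```

At a 75% hit rate, input spend for these 1,000 calls falls from $18.00 to about $8.81. The savings come only from the cached prefix, so they shrink as the fresh, per-call part of the prompt grows.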
Profile 3: The $50k/month scale-up
A 150-person company where AI agents are core to the product, not just internal tools. The platform does AI-assisted financial analysis for mid-market CFOs: competitive benchmarking, variance analysis, board report generation, and real-time cash flow monitoring with alert generation.
Scale:
- 2,000 active customer accounts
- 40,000 agent jobs per day across all agent types
- Multiple agent types: research, analysis, report generation, monitoring
- Strict latency SLAs on monitoring agents (responses in under 10 seconds)
Monthly bill breakdown:
- LLM API (Claude 4 Opus for analysis/reports, Sonnet for monitoring/alerts): $18,000/month
- Financial data APIs (Bloomberg, FactSet data hooks, SEC Edgar): $8,000/month
- Infrastructure (multi-region Kubernetes, RDS, Redis Cluster, Qdrant): $9,000/month
- Observability (Datadog with LLM observability add-on): $4,500/month
- Storage (S3 for report artifacts, 12TB): $280/month
- Misc (Stripe fees, auth, CDN): $2,000/month
- Total: roughly $41,780-$44,000/month
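Dividing the bill by volume gives unit costs that are easier to reason about than the headline total. This sketch uses the low end of the range (the sum of the line items above), 40,000 jobs per day, 2,000 accounts, and an assumed 30-day month.

```python
def unit_costs(line_items: dict[str, float], jobs_per_day: int,
               accounts: int, days: int = 30) -> dict[str, float]:
    """Spread a monthly bill across jobs and customer accounts."""
    total = sum(line_items.values())
    jobs = jobs_per_day * days
    return {
        "total": total,
        "per_job": total / jobs,
        "per_account": total / accounts,
        "llm_per_job": line_items["llm"] / jobs,
    }


# Profile 3 line items (USD/month), low end of the stated range.
profile_3 = {
    "llm": 18_000,
    "financial_data_apis": 8_000,
    "infrastructure": 9_000,
    "observability": 4_500,
    "storage": 280,
    "misc": 2_000,
}

costs = unit_costs(profile_3, jobs_per_day=40_000, accounts=2_000)
print(f"total ${costs['total']:,.0f}, per job ${costs['per_job']:.4f}, "
      f"per account ${costs['per_account']:.2f}, LLM per job ${costs['llm_per_job']:.4f}")
```

That comes to roughly 3.5 cents per job and about $21 per account per month, of which the LLM line is 1.5 cents per job.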
At this scale, infrastructure has become a major cost center. Running multi-region for reliability, with proper auto-scaling and redundancy, is not cheap. The team evaluated whether to run their own GPU inference for some workloads and decided against it: the operational overhead of managing GPU infrastructure exceeds the cost savings until you're significantly larger.
The observability cost is also notable. At 40,000 agent jobs per day, you're producing enormous trace volumes. Without good observability, debugging any production issue is nearly impossible. They've accepted the $4,500/month as non-negotiable for a production-critical product.
Where costs actually scale fastest
If you're planning growth from one tier to the next, don't expect the cost structure to scale linearly. Each bucket behaves differently: some grow faster than you expect, and others can be kept in check.
Observability costs scale roughly linearly with job volume, but at higher tiers you need more sophisticated tooling, not just more volume handling. Moving from Langfuse community edition to a fully managed observability stack like Datadog with its LLM observability add-on is a step-function increase.
Tool call costs are treacherous. Web search and data APIs often have per-query pricing with no volume discount until you're at enterprise scale. A 10x increase in jobs produces a 10x increase in search API costs with no offsetting efficiency gain.
Infrastructure can actually scale sub-linearly if you're smart about it. Batching async jobs, using spot/preemptible compute, and sharing infrastructure across agent types keeps this from blowing up.
LLM costs are the most controllable with smart model routing. Tiering your calls (fast/cheap model for simple tasks, expensive model for tasks that need quality) is the single highest-impact optimization available to teams at any scale.
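Model routing can start as a few explicit rules. The sketch below is one hypothetical rule set: the tier names are placeholders for whatever model IDs you use, the high-stakes task kinds are borrowed loosely from the Profile 3 workloads, and escalating after a failed validation check is an assumed policy rather than something any particular framework provides.

```python
from dataclasses import dataclass

# Hypothetical tier names; substitute the model IDs you actually call.
CHEAP_MODEL = "fast-tier"
EXPENSIVE_MODEL = "quality-tier"

# Hypothetical task kinds this sketch treats as needing the stronger model.
HIGH_STAKES_KINDS = {"board_report", "variance_analysis"}


@dataclass
class Task:
    kind: str
    failed_checks: int = 0  # times a cheap-tier answer failed validation


def choose_model(task: Task) -> str:
    """Default to the cheap tier; escalate only when a rule says quality matters."""
    if task.kind in HIGH_STAKES_KINDS:
        return EXPENSIVE_MODEL
    if task.failed_checks > 0:
        # Retry on the stronger model after the cheap one's output was rejected.
        return EXPENSIVE_MODEL
    return CHEAP_MODEL


print(choose_model(Task("alert")))                    # fast-tier
print(choose_model(Task("board_report")))             # quality-tier
print(choose_model(Task("alert", failed_checks=1)))   # quality-tier
```

Keeping the rules explicit makes the escalation rate easy to log and audit, which matters because that rate, more than the per-token price, determines what the tiered setup actually costs.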
One number that surprises teams making the jump from $5k to $50k/month: expect roughly a 10x increase in support and reliability costs (engineering time, incident response, on-call overhead) alongside the roughly 10x increase in the direct bill. The operational cost of running production AI agents is real even when it doesn't show up on the AWS bill.