Cost Analysis · April 24, 2026 · 8 min read

What does a 10,000-user chatbot actually cost to run in 2026?

A concrete breakdown of API costs for a realistic customer-support chatbot across six mainstream LLMs — plus the architectural decisions that move the number by 10×.


Drew Thacker

DrewIs Intelligence LLC

Everyone wants to build an AI chatbot. Few people want to sit down and do the math on what it actually costs to run one for real users. So let's do it.

The scenario

We're modeling a mid-size SaaS company's customer support chatbot with realistic numbers:

  • 10,000 active users per month
  • Each user averages 3 conversations per month
  • Each conversation averages 6 turns (user message + assistant reply)
  • System prompt: 800 tokens (company persona, rules, tool definitions)
  • Conversation history grows across turns, averaging 1,200 tokens of context per call
  • Output length averages 350 tokens per reply

That gives us roughly 180,000 calls per month (10,000 × 3 × 6).
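The back-of-envelope math is easy to script so you can swap in your own traffic. A minimal sketch of the scenario above (the variable names are just for illustration):

```python
# Workload assumptions from the scenario above.
USERS = 10_000
CONVOS_PER_USER = 3            # conversations per user per month
TURNS_PER_CONVO = 6            # one turn = user message + assistant reply
INPUT_TOKENS_PER_CALL = 1_200  # system prompt + history, on average
OUTPUT_TOKENS_PER_CALL = 350

calls_per_month = USERS * CONVOS_PER_USER * TURNS_PER_CONVO
input_tokens = calls_per_month * INPUT_TOKENS_PER_CALL
output_tokens = calls_per_month * OUTPUT_TOKENS_PER_CALL

print(calls_per_month)  # 180000
print(input_tokens)     # 216000000 -> 216M input tokens/month
print(output_tokens)    # 63000000  -> 63M output tokens/month
```

Those two token totals — 216M in, 63M out — drive every number below.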

The raw numbers

Without any optimization, running this workload on the flagship tier costs real money:

| Model | Input cost | Output cost | Monthly total |
|---|---|---|---|
| Claude Opus 4.7 | $1,080 | $1,575 | $2,655 |
| GPT-5.4 | $540 | $945 | $1,485 |
| Claude Sonnet 4.6 | $648 | $945 | $1,593 |
| Gemini 3 Pro | $432 | $756 | $1,188 |
| Gemini 3 Flash | $108 | $189 | $297 |
| DeepSeek V3.2 | $30 | $18 | $48 |

The gap between the cheapest and most expensive option is 55×. In absolute terms, Claude Opus 4.7 costs over $2,600 per month more than DeepSeek V3.2 — money that either comes out of your margin or doesn't.
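The totals above can be reproduced from per-million-token rates back-solved from the table itself (e.g. $1,080 of input cost over 216M input tokens implies $5/M for Claude Opus 4.7). A sketch for a few rows — treat the rates as derived illustrations, not quoted list pricing:

```python
# Token volumes from the scenario: 180k calls/month.
INPUT_TOKENS = 180_000 * 1_200   # 216M
OUTPUT_TOKENS = 180_000 * 350    # 63M

# (input $/M tokens, output $/M tokens), back-solved from the table.
rates = {
    "Claude Opus 4.7":   (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3 Flash":    (0.50,  3.00),
}

def monthly_cost(in_rate, out_rate):
    """Monthly spend given per-million-token input/output rates."""
    return INPUT_TOKENS / 1e6 * in_rate + OUTPUT_TOKENS / 1e6 * out_rate

for model, (i, o) in rates.items():
    print(f"{model}: ${monthly_cost(i, o):,.0f}")
# Claude Opus 4.7: $2,655 / Claude Sonnet 4.6: $1,593 / Gemini 3 Flash: $297
```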

Now apply prompt caching

Most of that 1,200-token average input is identical on every call: the system prompt, rules, and tool definitions. That's roughly 800 tokens of cacheable prefix, or 67% of every input token you pay for.

With caching enabled on providers that support it:

| Model | Without caching | With 67% cached | Savings |
|---|---|---|---|
| Claude Opus 4.7 | $2,655 | $1,988 | -$667 (25%) |
| GPT-5.4 | $1,485 | $1,160 | -$325 (22%) |
| Claude Sonnet 4.6 | $1,593 | $1,203 | -$390 (24%) |
| Gemini 3 Pro | $1,188 | $1,058 | -$130 (11%) |

Caching is free leverage. Turn it on.
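The caching math itself is simple. This sketch assumes a flat 90%-off cache-read discount — a round number picked for illustration, not any provider's actual rate; real read discounts and cache-write surcharges vary by provider, which is why the table's savings range from 11% to 25%:

```python
def cached_input_cost(base_input_cost, cached_frac, read_discount):
    """Monthly input spend when a fraction of each prompt hits the cache.

    base_input_cost: input spend with no caching ($/month)
    cached_frac:     share of input tokens served from cache (0..1)
    read_discount:   discount on cached reads (0.90 = 90% off, assumed)
    """
    cached = base_input_cost * cached_frac * (1 - read_discount)
    uncached = base_input_cost * (1 - cached_frac)
    return cached + uncached

# Claude Opus 4.7: $1,080/month of input, 67% cacheable prefix.
print(cached_input_cost(1080, 0.67, 0.90))  # roughly $429 instead of $1,080
```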

Now apply model routing

Here's where it gets interesting. Not every turn needs a flagship model.

Roughly 60% of customer support queries are simple: account questions, order status, basic FAQ lookups, sentiment triage. A cheap model handles these with negligible quality loss.

Only about 40% of queries need flagship reasoning: complex multi-step problems, ambiguous requests, anything requiring nuanced tone or judgment.

With a two-tier router (DeepSeek V3.2 for simple + Claude Sonnet 4.6 for complex):

  • 60% of 180,000 = 108,000 calls at ~$0.00027/call = $29
  • 40% of 180,000 = 72,000 calls at ~$0.00885/call = $637
  • Total: ~$666/month

Compare that to running everything on Claude Sonnet 4.6: $1,593/month. Compare to Claude Opus 4.7 everywhere: $2,655/month.
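The blended cost is a one-liner worth keeping around. This sketch uses the per-call costs implied by the raw-numbers table ($48 / 180,000 calls for DeepSeek V3.2, $1,593 / 180,000 for Claude Sonnet 4.6); swap in your own routing split to see how sensitive the total is to it:

```python
# Two-tier routing: cheap model for simple turns, flagship for the rest.
CALLS = 180_000
SIMPLE_FRAC = 0.60       # share of turns the router sends to the cheap tier
COST_SIMPLE = 0.00027    # $/call, DeepSeek V3.2 ($48 / 180k calls)
COST_COMPLEX = 0.00885   # $/call, Claude Sonnet 4.6 ($1,593 / 180k calls)

simple = CALLS * SIMPLE_FRAC * COST_SIMPLE
complex_ = CALLS * (1 - SIMPLE_FRAC) * COST_COMPLEX
print(round(simple), round(complex_), round(simple + complex_))
```

Even shifting the split from 60/40 to 50/50 only moves the total by about $60/month — the routing win is robust to a mediocre classifier.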

The takeaway

The difference between an expensive chatbot and a cheap one isn't which model you pick — it's how you compose multiple models.

The engineering team that ships "we use Claude Opus 4.7" pays $2,655/month. The team that ships "we route by complexity, cache our system prompts, cap output tokens, and run evals against our real tickets" pays $500–700/month for better user outcomes.

Do the math before you pick. Your future CFO will thank you.


Numbers in this post are based on listed API pricing as of April 2026. Your mileage will vary based on your actual traffic patterns, prompt design, and negotiated enterprise rates. Use The Token Meter's calculator to plug in your own numbers.