Feb 22, 2026 · Rob Kopel

LLM Pricing Isn't Falling

Everyone knows LLM prices are in freefall. Here, I've plotted every model released since 2023 with a line of best fit.

[Chart: price per MTok for each model released since 2023, with line of best fit]

That's a bit weird — model pricing doesn't seem to be decreasing at all. Well, that's OK, because we're getting more intelligence per token. So this should be reflected in Artificial Analysis's cost to run a benchmark... right?

[Chart: all models priced at $3 input / $15 output per MTok — same price, very different bills]

OK, that's facetious, but there's a real point here. Anthropic's Economic Index found that, contrary to most chatbot usage, API usage is mostly automation tasks: managing inboxes, extracting invoice fields, generating sales content. These are solved problems running at scale. From experience, a smarter output token doesn't save you anything when the input PDF is 25,000 tokens either way. And the reliability gain from Sonnet over Haiku is almost always worth more than having staff reroute bad triages.
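To make that concrete, here's a quick sketch of one invoice-extraction call at the $3 input / $15 output per MTok price point. The token counts are illustrative assumptions, not measurements:

```python
# Cost of one extraction call at $3 input / $15 output per MTok.
IN_PRICE = 3.00 / 1_000_000    # $ per input token
OUT_PRICE = 15.00 / 1_000_000  # $ per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call."""
    return input_tokens * IN_PRICE + output_tokens * OUT_PRICE

# A 25,000-token PDF in; output counts are hypothetical.
verbose = call_cost(25_000, 800)  # chattier model
terse = call_cost(25_000, 200)    # "smarter", more concise model

print(f"verbose: ${verbose:.4f}")  # $0.0870
print(f"terse:   ${terse:.4f}")    # $0.0780
# Input dominates either way: $0.075 of each bill is just the PDF.
```

However clever the output gets, roughly 90% of the bill is the input PDF you had to send regardless.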

Small Models Are Getting More Expensive

We also need to talk about small models. It's no secret that the big three have been jacking up small-model prices over time.

[Chart: budget and small models from each provider — OpenAI (mini), OpenAI (nano), Anthropic, Google — output $/MTok, log scale]

What often isn't discussed is that they are also getting more and more frustrating to deal with. Gemini 3 Flash is reasoning only, the opposite of what I'd want from a small ultra-fast model. GPT-5 nano has oddly long (often 30+ second) response times. Haiku has decided to leave the budget tier altogether.

Stop Comparing Across Tiers

And lastly, we need to stop talking about thousand-fold cost reductions: comparing o3-preview down to GPT-5 is a comparison across tiers, not a like-for-like price cut. It's frustrating.

[Chart: frontier models only — output $/MTok, log scale]

It's Only Getting Worse

Last year classifying an invoice cost a fixed prompt. A thousand tokens in, a couple out. Done. Now I boot up an agent and it takes a screenshot of my browser, opens my finance software, takes another screenshot, clicks the recent transactions tab, takes another screenshot, reads the line items, classifies each one. A million tokens for what a thousand used to do. A heavy day of agentic coding blows through 300 million.
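The blow-up isn't linear, because every screenshot lands in context and the whole growing context gets re-sent on each turn. Here's a back-of-envelope sketch; the system-prompt size, per-step token count, and step count are all hypothetical round numbers, not measurements:

```python
# Back-of-envelope: why an agentic run burns ~1000x a fixed prompt.
# All numbers below are illustrative assumptions.

# Last year: one fixed classification prompt per invoice.
fixed_prompt = 1_000 + 2      # ~1k tokens in, a couple out

# Now: an agent loop. The system prompt and tool schemas ride along
# on every turn, each screenshot/tool result is appended to context,
# and the full context is billed again at every step.
base_context = 8_000          # system prompt + tool definitions
per_step = 4_000              # screenshot + tool result + model turn
steps = 20                    # clicks, reads, retries

total = 0
context = base_context
for _ in range(steps):
    context += per_step
    total += context          # entire context re-sent each turn

print(f"fixed prompt: ~{fixed_prompt:,} tokens")
print(f"agent loop:   ~{total:,} tokens")  # ~1,000,000
```

Linear growth in steps, quadratic growth in billed tokens — which is how a thousand-token task becomes a million-token one.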

And it's accelerating. Agent teams spin up multiple instances working in parallel, each with its own context window, each burning tokens independently — roughly 7x the consumption of a solo session. Scheduled tasks run overnight unattended. Even where the per-token price does fall, nobody's bill does.