LLM Pricing Isn't Falling
Everyone knows LLM prices are in freefall. Here, I've plotted every model released since 2023 with a line of best fit.
That's a bit weird: it doesn't look like model pricing is decreasing at all. Well, that's OK, because we're getting more intelligence per token. So this should show up in Artificial Analysis's cost to run a benchmark... right?
OK, that's facetious, but there's a real point here. Anthropic's economic index found that, contrary to most chatbot usage, API usage is mostly automation tasks: managing inboxes, extracting invoice fields, generating sales content. These are solved problems running at scale. In my experience, a smarter output token doesn't save you anything here when the input PDF is 25,000 tokens either way. And the reliability gain from Sonnet over Haiku is almost always worth more than having staff reroute bad triages.
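To make that concrete, here's a quick per-document cost sketch for an input-heavy extraction task. The per-million-token prices are illustrative placeholders I've made up for the example, not any provider's real rates:

```python
def doc_cost(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one document: tokens scaled by price per million."""
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m

# A 25,000-token invoice PDF producing a ~200-token structured answer.
# Prices below are hypothetical, chosen only to contrast a cheap tier
# against a smart tier.
cheap = doc_cost(25_000, 200, in_price_per_m=0.25, out_price_per_m=1.25)
smart = doc_cost(25_000, 200, in_price_per_m=3.00, out_price_per_m=15.00)

print(f"cheap tier: ${cheap:.4f}/doc")
print(f"smart tier: ${smart:.4f}/doc")
```

Under these assumed prices the input side is roughly 96% of the bill on both tiers, which is the point: when the prompt dwarfs the completion, output-token efficiency gains barely move the per-document cost.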
Small Models Are Getting More Expensive
We also need to talk about small models. It's no secret that the big three have been jacking up small-model prices over time.
What often isn't discussed is that they are also getting more and more frustrating to deal with. Gemini 3 Flash is reasoning only, the opposite of what I'd want from a small ultra-fast model. GPT-5 nano has oddly long (often 30+ second) response times. Haiku has decided to leave the budget tier altogether.
Stop Comparing Across Tiers
Lastly, we need to stop this practice of talking about thousand-fold cost reductions by comparing o3-preview down to GPT-5. That's a frontier reasoning preview priced against a later, cheaper tier, not a like-for-like comparison, and it's frustrating.
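Here's the shape of that distortion, with deliberately made-up prices (none of these are real rates for the models named above):

```python
# All three prices are hypothetical, for illustration only.
frontier_2024 = 600.0    # $/M output tokens: early frontier reasoning preview
mainstream_2025 = 10.0   # $/M output tokens: later mainstream model
mainstream_2024 = 12.0   # $/M output tokens: mainstream tier at the earlier date

# Dividing across tiers manufactures a dramatic headline number.
cross_tier = frontier_2024 / mainstream_2025        # 60x "reduction"

# Comparing within the same tier shows the actual trend.
like_for_like = mainstream_2024 / mainstream_2025   # 1.2x

print(f"cross-tier: {cross_tier:.0f}x, like-for-like: {like_for_like:.1f}x")
```

Same data, two stories: the headline number measures the gap between product tiers, not a price decline over time.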
It's Only Getting Worse
Last year, classifying an invoice took one fixed prompt: a thousand tokens in, a couple out. Done. Now I boot up an agent and it takes a screenshot of my browser, opens my finance software, takes another screenshot, clicks the recent transactions tab, takes another screenshot, reads the line items, and classifies each one. A million tokens for what a thousand used to do. A heavy day of agentic coding blows through 300 million.
And it's accelerating. Agent teams spin up multiple instances working in parallel, each with its own context window, each burning tokens independently, roughly 7x the consumption of a solo session. Scheduled tasks run overnight unattended. The per-token price keeps falling. Nobody's bill is.
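Running the numbers from the paragraphs above, with a hypothetical 10x price cut thrown in for contrast:

```python
# Figures taken from the text; the 10x price cut is an assumption for contrast.
CLASSIC_TOKENS = 1_000        # last year's fixed-prompt invoice classification
AGENT_TOKENS = 1_000_000      # the same task via a screenshot-driven agent

blowup = AGENT_TOKENS / CLASSIC_TOKENS       # 1000x more tokens per task

PRICE_DROP = 10                              # hypothetical per-token price cut
bill_multiplier = blowup / PRICE_DROP        # the bill still grows 100x

solo_day = 300_000_000                       # heavy agentic coding day
team_day = solo_day * 7                      # the ~7x parallel-team figure

print(f"{blowup:.0f}x tokens per task, {bill_multiplier:.0f}x the bill "
      f"after a {PRICE_DROP}x price cut")
print(f"agent-team day: {team_day / 1e9:.1f}B tokens")
```

Even granting a generous price cut, token consumption growing three orders of magnitude swamps it, which is why falling per-token prices and rising bills coexist.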