What is the difference between an LLM router and an AI inference marketplace?

An LLM router or gateway is a convenience layer for an app or agent builder: it gives you one endpoint and key across many providers and fails over when one breaks. It picks among posted rates but cannot make providers compete for a specific job. A two-sided marketplace clears each job at a price set by live competition, surfaces specialist models on value, and gives providers a say in which work they take and at what price.

Does an LLM router optimize for cost?

Not in the way a market does. A router can sort a static rate card and pick the cheaper listed endpoint, but it hands your request to a provider at that provider's posted price. It never creates competition for your individual job, so it cannot capture the clearing price that a two-sided marketplace produces.

Why does a marketplace surface specialist models a router does not?

A router exposes the models its operator chose to integrate, which biases usage toward a few well-known frontier models. A marketplace matches every job to whatever model is the best value for it, including small fine-tuned or purpose-built models. In the LoRA Land study, fine-tuned models beat GPT-4 by about 10 points on average across 31 specialized tasks, and 25 of the fine-tuned models were served from a single GPU.


Essay · June 29, 2026

A router finds you a model. A marketplace finds you a price.

By Federico Enni · 9 min read

When teams first wire up AI, they hit the same wall: every model lives behind a different API, with a different key, a different schema, and a different way of falling over at 3am. The LLM router — also sold as a gateway — was the answer. One endpoint, one key, many models, automatic failover when a provider has a bad night. It is a genuinely useful tool, and most serious AI stacks should have one. But it solves the integration problem, not the economics, and those are not the same problem. A router decides where your request goes. It was never built to decide what your request should cost.

A router picks a model from a list it was handed. A marketplace makes a price from sellers who compete. Those are different machines, and only one of them works on your bill.

What a router actually solves

A router or gateway is a convenience layer for an app or agent builder, and a good one earns its place. It collapses a dozen integrations into a single interface, normalises the request format, handles keys and rate limits, and reroutes around a provider that goes down. The value it delivers is reach and reliability: every model you might want, reachable through one door, with the failover plumbing already written. That is real engineering relief, and the scale of what now sits behind that one door is striking. OpenRouter, one widely used gateway, reports access to more than 300 models from over 70 providers, and says it processed roughly 100 trillion tokens in the year to 30 November 2025.¹
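The failover behaviour described above can be sketched in a few lines. This is a minimal illustration, not any gateway's actual implementation: the `ProviderError` type, the `route_with_failover` function and the two stand-in providers are all hypothetical names invented for the example.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider callable when its upstream API fails (hypothetical)."""


# Hypothetical shape: a provider is a (name, callable) pair. The callable takes
# a prompt and returns text, or raises ProviderError.
Provider = tuple[str, Callable[[str], str]]


def route_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order and return (name, response) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    # Stand-in for a provider having a bad night.
    raise ProviderError("503 from upstream")


def healthy(prompt: str) -> str:
    # Stand-in for a provider that answers normally.
    return f"echo: {prompt}"


served_by, reply = route_with_failover(
    "hello", [("provider-a", flaky), ("provider-b", healthy)]
)
print(served_by, reply)
```

Note what the loop decides: which provider answers. Nothing in it touches what the answer costs.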

That reach is the whole pitch, and it is worth having. But notice what kind of problem it is. Reach is about access — can you call this model at all, and will the call still work when something breaks. None of that touches the question of whether the call was priced well. A gateway can put a thousand models within reach and still pass every one of them through to you at exactly the rate its provider chose to post.

Routing is not pricing

The thing a router optimises is which endpoint receives your request, not what that request clears at. When a job arrives, the router consults a fixed set of integrations its operator wired up and forwards the work to one of them at that provider's posted rate. Even a "cost-aware" router is only ever sorting a static rate card — picking the cheaper number from a list it did not create. It never makes the providers compete for your specific job, because there is no mechanism underneath it that could.
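To make the "static rate card" point concrete, here is a minimal sketch of what cost-aware routing amounts to. The provider names and rates are invented for illustration, and `cheapest_listed` is a hypothetical helper, not a real router API.

```python
# Hypothetical posted rates in USD per million tokens. Names and numbers are made up.
RATE_CARD: dict[str, float] = {
    "provider-a": 0.60,
    "provider-b": 0.45,
    "provider-c": 0.90,
}


def cheapest_listed(rate_card: dict[str, float]) -> tuple[str, float]:
    """Return the (provider, posted_rate) pair with the lowest posted rate."""
    provider = min(rate_card, key=rate_card.get)
    return provider, rate_card[provider]


# The selection takes no job as input, so every job gets the same answer,
# whether it is ten tokens or an overnight batch of ten billion.
small_realtime_call = cheapest_listed(RATE_CARD)
large_overnight_batch = cheapest_listed(RATE_CARD)
print(small_realtime_call, large_overnight_batch)
```

The function is a sort over numbers someone else posted. No provider ever learns that a particular job exists, so none can sharpen its price to win it.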

That gap matters more every quarter, because the underlying prices are not stable: they are in freefall. Epoch AI found that the cost of running a fixed task on a language model has fallen by a median of about 50x per year, with the rate ranging from 9x to 900x depending on the benchmark.² Stanford's AI Index puts a single line on the same trend: the cost to query a model at GPT-3.5 quality dropped from $20.00 to $0.07 per million tokens between November 2022 and October 2024, a fall of more than 280-fold in under two years.³ A static integration list cannot track a market moving that fast in that many directions. A market tracks it by construction, because every job clears against the prices sellers are offering at that moment.
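The Stanford figures can be checked with a few lines of arithmetic. The sketch below uses only the two prices and two dates quoted above; the one assumption is that the decline was a steady exponential over the 23 months from November 2022 to October 2024, which is a simplification.

```python
start_price = 20.00  # USD per million tokens, November 2022
end_price = 0.07     # USD per million tokens, October 2024
months = 23          # November 2022 to October 2024

# Overall fold reduction between the two endpoints.
fold = start_price / end_price

# Equivalent yearly rate, assuming a steady exponential decline (a simplification).
annualised = fold ** (12 / months)

print(f"{fold:.0f}-fold overall, about {annualised:.0f}x per year")
```

Under that assumption the annual rate comes out at roughly 19x, which sits inside the 9x to 900x range Epoch AI reports across benchmarks.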

Most of your work is a job, not a prompt

The deeper reason a router can't price well is that the unit it moves is too thin to price against. A router forwards a request — a prompt and a destination. But the work behind that request usually has far more shape than a prompt can carry: a deadline you can tolerate, a ceiling you won't exceed, and a use case that defines what "good" even means. "Classify last night's support backlog by morning" is not a prompt you wait on. It is a job you hand off, and almost none of its real character survives the trip through a router.

The moment you describe work as a job — deadline, ceiling, use case — it stops being a request to forward and becomes an order to fill.

And an order is something a market can act on. Once a job carries its own terms, you can let sellers bid to fill it inside your window and under your ceiling, instead of accepting whichever posted rate the router happened to forward you to. The choice of which operations to run this way is yours — the latency-tolerant pipelines, the bulk enrichment, the overnight evaluation runs — and each one you convert from prompt to job is a line item you can now shop rather than simply pay.
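One way to picture the shift from prompt to order is as a data structure. The sketch below is illustrative only: `Job`, `Bid` and `fill` are hypothetical names, the prices are invented, and "cheapest eligible bid wins" is an assumed selection rule, not a description of any real matching engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Job:
    use_case: str        # what "good" means for this work
    deadline: datetime   # the latest finish time the buyer can tolerate
    ceiling_usd: float   # the most the buyer will pay


@dataclass(frozen=True)
class Bid:
    provider: str
    price_usd: float
    finish_by: datetime


def fill(job: Job, bids: list[Bid]) -> Optional[Bid]:
    """Return the cheapest bid that meets the deadline and the ceiling, or None."""
    eligible = [
        b for b in bids
        if b.price_usd <= job.ceiling_usd and b.finish_by <= job.deadline
    ]
    return min(eligible, key=lambda b: b.price_usd, default=None)


now = datetime(2026, 6, 29, 22, 0)
job = Job("support-ticket classification", now + timedelta(hours=9), ceiling_usd=40.0)
bids = [
    Bid("provider-a", 35.0, now + timedelta(hours=12)),  # misses the deadline
    Bid("provider-b", 38.0, now + timedelta(hours=6)),
    Bid("provider-c", 31.0, now + timedelta(hours=8)),
    Bid("provider-d", 52.0, now + timedelta(hours=2)),   # over the ceiling
]
winner = fill(job, bids)
print(winner)
```

The deadline and the ceiling do real work here: they rule out the fastest bid and one of the cheaper ones, and the job goes to the lowest price among sellers who can actually meet its terms.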

The models a router never shows you

Reach also turns out to be narrower in practice than it looks on paper. A gateway exposes the models its operator chose to integrate, and usage clusters hard around the famous few, so most teams quietly standardise on a handful of frontier models they already trust. The catalogue, meanwhile, is fragmenting the other way: on OpenRouter, programming alone grew from about 11% of token usage in early 2025 to more than half, and open-weight models climbed to roughly 30% of usage by late 2025.¹ The interesting work is concentrating exactly where specialist models tend to win.

And they do win. In Predibase's LoRA Land study, 310 small fine-tuned models were measured across 31 tasks; the fine-tuned models beat GPT-4 by about 10 points on average and outscored their own untuned base models by 34 points, and 25 of them were served together from a single GPU.⁴ A model of that kind, trained for one narrow job, can deliver better answers at a fraction of the cost of a general-purpose giant. A router will not surface it unless someone integrated it first. A marketplace surfaces it because it won on value, and that includes models an enterprise would never have known to look for.

The side a router forgets: the provider

Every router shares one blind spot, and it is structural: a router sees only one side of the market. To a gateway, a provider is a static integration: a destination that takes whatever traffic gets routed to it, at the rate it posted, with no visibility into the demand on the other side and no say in which workloads it handles. The provider is something the router reads a price off of, not something that participates.

To a router, a provider is a price to read off a list. In a market, a provider is a participant who sets one.

That missing seat is expensive for everybody, because the supply side is sitting on enormous idle capacity. Cast AI's 2026 study of tens of thousands of Kubernetes clusters found average GPU utilisation of just 5% — the silicon spends roughly 95% of its life waiting.⁵ A provider with that much slack would happily take more work, at the right price, on the right schedule — but a router gives it no instrument to express any of that. It can only wait to be routed to.

A marketplace hands that instrument over. On Keld, a provider uses Keld Trade to post the models it wants to sell, against the use cases it chooses, at prices it is happy with — and to move those asks as its own capacity and the live book shift. It is opting in to specific demand, not absorbing whatever a router points at it. Underneath, Keld Flow takes the jobs it wins and slices them into micro-batches paced to the provider's real-time fleet limits, so matched work lands in lockstep with available compute rather than spiking a cluster. Idle GPUs become yield, and the provider keeps its hand on every lever. That is a relationship a router has no way to offer.
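The pacing idea, slicing a won job into micro-batches sized to whatever the fleet can absorb right now, can be sketched generically. This is not Keld Flow's implementation or API. `micro_batches` is a hypothetical function, and the per-tick capacity readings stand in for whatever real-time fleet signal a provider exposes.

```python
from typing import Iterator, Sequence


def micro_batches(
    items: Sequence[str], capacity_per_tick: Sequence[int]
) -> Iterator[list[str]]:
    """Yield one batch per tick, sized to that tick's free capacity.

    Stops when the items run out or when the capacity readings run out,
    whichever comes first. A tick with no free capacity yields an empty batch.
    """
    start = 0
    for free_slots in capacity_per_tick:
        if start >= len(items):
            break
        size = max(free_slots, 0)  # treat a negative reading as no capacity
        yield list(items[start:start + size])
        start += size


records = [f"record-{i}" for i in range(10)]
# Hypothetical free-slot readings from the fleet, one per scheduling tick.
batches = list(micro_batches(records, capacity_per_tick=[4, 0, 3, 8]))
print([len(b) for b in batches])
```

The job never asks for more than the fleet reported free at that tick, so work arrives in step with available compute instead of as a spike.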

	Router / Gateway	Marketplace (Keld)
What it optimises	Reach and reliability — call any model, fail over when one breaks	Price, deadline and fit — clears each job against live supply
Who it serves	The buyer only (demand side)	Both sides — buyers and providers, on one book
How price is set	The provider's posted rate on a static integration	A clearing price set by competition for your specific job
The provider's role	A fixed endpoint that takes whatever it's routed	A participant that posts models, use cases and prices it chooses
Model discovery	The models the operator wired up	Any model that wins on value — including specialists you'd never integrate

A router and a marketplace answer different questions. The first is "how do I reach a model?" The second is "what should this job cost, and who should run it?"

Two sides, one clearing price

This is the line between the two tools. A router is a one-sided pipe that moves your request to a model. A marketplace is a two-sided book where buyers post jobs with ceilings and deadlines, providers post models with prices, and the matching engine clears the two against each other. Because both sides act with intent, the price a job settles at is the real price — what a willing seller will take to run this work, on time, today — rather than a number copied off a rate card.
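A toy version of a two-sided book shows how the pieces fit together. Everything here is an assumption made for illustration: the `Ask` and `Order` types, the greedy first-come matching rule, and the choice to settle at the seller's ask. Deadlines are left out to keep it short, and real matching engines are considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Ask:
    provider: str
    use_case: str
    price_usd: float  # price per job the provider is willing to accept
    capacity: int     # how many more jobs it can take


@dataclass(frozen=True)
class Order:
    job_id: str
    use_case: str
    ceiling_usd: float


def clear(orders: list[Order], asks: list[Ask]) -> dict[str, tuple[str, float]]:
    """Greedy match in arrival order.

    Each order takes the cheapest ask for its use case that still has capacity
    and sits at or below its ceiling. Returns job_id -> (provider, price).
    Mutates the asks' remaining capacity as jobs are filled.
    """
    fills: dict[str, tuple[str, float]] = {}
    for order in orders:
        candidates = [
            a for a in asks
            if a.use_case == order.use_case
            and a.capacity > 0
            and a.price_usd <= order.ceiling_usd
        ]
        if not candidates:
            continue  # the order stays open on the book
        best = min(candidates, key=lambda a: a.price_usd)
        best.capacity -= 1
        fills[order.job_id] = (best.provider, best.price_usd)
    return fills


asks = [
    Ask("provider-a", "classification", 30.0, capacity=1),
    Ask("provider-b", "classification", 34.0, capacity=2),
    Ask("provider-c", "summarisation", 20.0, capacity=1),
]
orders = [
    Order("job-1", "classification", ceiling_usd=40.0),
    Order("job-2", "classification", ceiling_usd=40.0),
    Order("job-3", "classification", ceiling_usd=32.0),
    Order("job-4", "summarisation", ceiling_usd=15.0),
]
fills = clear(orders, asks)
print(fills)
```

Both sides shape the outcome. The first job takes the cheapest seller's only slot, the second pays the next ask up, and the last two stay open because no seller with capacity will meet their ceilings.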

Why this is a market, not a menu · 2025–2026

300+ models from 70+ providers reachable through a single gateway: the choice surface a router exposes today

~50x / yr: median annual fall in the cost of a fixed language-model task, prices a static list cannot keep up with

5%: average GPU utilisation across tens of thousands of clusters, idle supply a marketplace can put to work

Sources: OpenRouter State of AI (to Nov 30, 2025); Epoch AI inference price trends; Cast AI 2026 State of Kubernetes Optimization Report. Retrieved June 29, 2026.

For the buyer, that clearing mechanism is what turns the freefall in prices and the abundance of specialist models from a number on a chart into a number on your invoice. You get inference that is better suited to the job — because the job's use case, not the operator's integration list, decides the model — at a price the market set rather than one a provider posted. For the provider, the same mechanism is what turns idle silicon into revenue on terms it controls. The neutrality is the point: Keld does not steer inference demand to a favoured seller or settle on anything but a job's stated terms, which is exactly why the price you pay is the honest clearing price, with nobody's thumb on the scale.

You don't have to rip out your router

None of this asks you to tear down what works. Keep your gateway for the real-time, latency-critical calls it handles well. The shift is to stop treating every piece of AI work as an undifferentiated request and start treating the complex, long-running jobs as orders. Map your spend in Keld Atlas to see which jobs those are, point them at the marketplace through the same SDKs and plugins you already use, and let each one clear against the best-value provider that can meet its deadline and ceiling. A router got every model within reach. A market is what finally gets each job to the right one, at the right price, and lets the providers on the other side choose to be there. If you build or host models, that other side is open to you too.

Sources

1. OpenRouter, "State of AI": 300+ models across 70+ providers; ~100 trillion tokens processed in the year to Nov 30, 2025; programming token share rising from ~11% to over half; open-weight models ~30% of usage by late 2025. Retrieved June 29, 2026. openrouter.ai/state-of-ai
2. Epoch AI, "LLM inference price trends": the cost of a fixed task fell a median of ~50x per year (range 9x–900x across benchmarks). Published March 12, 2025; retrieved June 29, 2026. epoch.ai
3. Stanford HAI, "AI Index Report 2025," Research & Development chapter: cost to query a GPT-3.5-quality model fell from $20.00 to $0.07 per million tokens (Nov 2022–Oct 2024). Retrieved June 29, 2026. hai.stanford.edu
4. Justin Zhao et al. (Predibase), "LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report," arXiv:2405.00732, 2024: fine-tuned models beat GPT-4 by ~10 points on average across 31 tasks and base models by 34 points; 25 models served on a single A100 80GB GPU. Retrieved June 29, 2026. arxiv.org/abs/2405.00732
5. Cast AI, "2026 State of Kubernetes Optimization Report": average GPU utilisation of 5% across tens of thousands of analysed clusters. Released April 21, 2026; retrieved June 29, 2026. cast.ai