Patterns

Pattern 10 of 26

Routing and Intent Detection

Match the task to the model, not the other way around

Not every request deserves the same model or the same agent. Routing classifies incoming requests and sends each one to the right handler. Simple queries go to a fast, cheap model. Complex ones go to an expensive, powerful one. Domain-specific requests go to a specialized agent. The goal is matching cost and capability to actual difficulty, rather than treating every request identically.

Why it matters

If you run everything through your most capable model, you are probably overpaying by a lot. RouteLLM showed cost reductions of 40-85% with minimal quality loss just by routing well. Most requests in a production system are simple. A small routing layer that identifies the complex ones and escalates them pays for itself very quickly.

Deep Dive

Routing is a classification step that sits at the front of your agent pipeline. The router looks at the incoming request and decides which handler should take it: a cheap fast model for simple factual queries, an expensive powerful model for tasks that need deep reasoning, a specialized agent for domain-specific work. The decision can be based on content, cost constraints, latency requirements, or a learned preference model. The core insight is that most real-world requests are not uniformly difficult, and treating them as if they are wastes money without improving quality.
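A minimal sketch of that front-of-pipeline classifier. The model names and keyword rules here are illustrative assumptions, not a real API; in production the `classify` step would be a small learned model or an LLM call rather than keyword matching.

```python
# Hypothetical handler names -- illustrative only, not real model IDs.
CHEAP_MODEL = "small-fast-model"
EXPENSIVE_MODEL = "large-reasoning-model"
SPECIALIST_AGENT = "legal-domain-agent"

def classify(request: str) -> str:
    """Toy heuristic classifier. A real router would use a learned
    preference model or an LLM call; keyword rules stand in here."""
    text = request.lower()
    if any(kw in text for kw in ("contract", "liability", "clause")):
        return "legal"
    if any(kw in text for kw in ("plan", "prove", "multi-step", "analyze")):
        return "complex"
    return "simple"

def route(request: str) -> str:
    """Map the classified category to a handler."""
    handlers = {
        "simple": CHEAP_MODEL,
        "complex": EXPENSIVE_MODEL,
        "legal": SPECIALIST_AGENT,
    }
    return handlers[classify(request)]

print(route("What year was Python released?"))        # -> small-fast-model
print(route("Plan a multi-step migration of our DB"))  # -> large-reasoning-model
```

The dispatch table is the whole trick: the decision of which handler runs is a plain data lookup you can inspect, log, and override, rather than something buried inside one monolithic prompt.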

RouteLLM from the LMSYS team, presented at ICLR 2025, is the most rigorous demonstration of this. They trained a small classifier on human preference data to predict which model a person would prefer for a given query. Using that classifier for routing, they achieved 40-85% cost reductions with minimal quality loss. The intuition behind it is straightforward: most queries do not require the best model available. A routing layer that reliably identifies the minority that do is worth building.

You do not need a learned model to get most of the benefit. Anthropic's December 2024 agents post recommends a simpler approach: add a classification step at the start of any workflow where different request types need different handling. Ask the model to classify the intent into a small set of categories, then route based on the result. I have done this in production and it works. The important thing is making the routing step explicit and observable. When a monolithic agent implicitly decides how to handle a request, you cannot see or debug that decision. When routing is a separate, logged step, you can.
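One way to make the routing step explicit and observable, sketched under assumptions: the intent labels are made up for this example, and `classify_intent` is a stand-in for the LLM classification call the Anthropic post describes (prompting the model to pick exactly one label from a fixed set).

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("router")

# Hypothetical intent set for a support workflow.
INTENTS = ("faq", "refund", "technical", "other")

def classify_intent(request: str) -> str:
    """Stand-in for an LLM classification call that returns exactly
    one label from INTENTS. Keyword rules simulate it here."""
    text = request.lower()
    if "refund" in text or "charged" in text:
        return "refund"
    if "error" in text or "crash" in text:
        return "technical"
    if text.endswith("?"):
        return "faq"
    return "other"

def handle(request: str) -> str:
    intent = classify_intent(request)
    # The routing decision is a separate, logged event you can
    # inspect and debug -- not an implicit choice inside one prompt.
    log.info(json.dumps({"event": "route", "intent": intent,
                         "request": request}))
    return intent

handle("I was charged twice and want a refund")
```

Because every decision emits a structured log line, you can audit misroutes after the fact and measure how often each handler is chosen, which is exactly what an implicit, in-prompt decision denies you.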

In the Wild

NVIDIA LLM Router
Vercel AI SDK Middleware
Hierarchical Agent Routing

Go Deeper

PAPER — RouteLLM: Learning to Route LLMs with Preference Data
ARTICLE — RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing
GUIDE — Building Effective Agents
DOCS — Ticket Routing Use Case Guide

Related Patterns