Model routing for builders.
Use the right model for the part.
The bleeding edge isn't using the smartest model everywhere. It's knowing which model to reach for at each step of the build, and which one to skip, so you keep latency, cost, and quality on the right side of the tradeoff.
"Pick the best model" was last year's answer.
Frontier models can do almost anything, but they do it badly when you point them at the wrong job. Coding-specialized agents beat general models on diff-shaped work. Fast extraction models beat frontier reasoning models on bulk classification. Local models beat everything when the data can't leave the machine. The operator skill is matching the task to the lane, not memorizing model names.
Match the part of the build to the model class that's actually good at it.
Lanes are named in task-category terms — frontier reasoning, coding-specialized, fast extraction, local, separate verifier — so the routing stays correct even when the specific best model changes. The "Use" and "Avoid" notes hold up across model generations.
Ambiguous product planning
Use: a frontier reasoning model. Long context, careful tradeoffs, the kind of thinking that has to weigh five things at once. Avoid: a fast extraction model. You'll get a confident answer that misses the constraint that mattered.
Fast code edits in a known codebase
Use: a coding-specialized agent wired to your repo. Diff-shaped output, tool use, lint and test feedback. Avoid: a general chat model — it'll write code that looks right and breaks at the import line.
Repetitive extraction or classification
Use: a fast extraction model — small, cheap, predictable. Tag a thousand emails, pull entities out of transcripts, classify support tickets. Avoid: the frontier reasoning model. You're paying ten times the price for the same JSON.
Private or offline experiments
Use: a local or open-weight model. Sensitive data, internal documents you can't send to a cloud API, the cheap repetitive sub-task you don't want metered. Avoid: assuming local is the default. Most of your work belongs in a hosted model; see "Know when, not first" below.
Final verification
Use: a separate verifier — a second model, sometimes a smaller one, sometimes a different vendor, that grades the first model's output against a checklist. Avoid: the same model that produced the work. The grader needs to be independent.
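The lanes above can be sketched as a small routing table. This is a minimal sketch under stated assumptions, not a prescribed implementation: the `Lane` enum, the `TASK_LANES` and `LANE_MODELS` tables, the task-category keys, and the `route` and `pick_verifier` helpers are all hypothetical names chosen for illustration, and the model identifiers are placeholders for whatever currently leads each category.

```python
from enum import Enum


class Lane(Enum):
    """Task-category lanes (hypothetical names mirroring the lane labels on this page)."""
    FRONTIER_REASONING = "frontier_reasoning"
    CODING_SPECIALIZED = "coding_specialized"
    FAST_EXTRACTION = "fast_extraction"
    LOCAL = "local"
    VERIFIER = "verifier"


# Hypothetical task categories, each mapped to the lane that fits it.
TASK_LANES = {
    "product_planning": Lane.FRONTIER_REASONING,
    "code_edit": Lane.CODING_SPECIALIZED,
    "bulk_classification": Lane.FAST_EXTRACTION,
    "private_data": Lane.LOCAL,
    "final_check": Lane.VERIFIER,
}

# Placeholder model identifiers: the only table that changes when a lane's best model changes.
LANE_MODELS = {
    Lane.FRONTIER_REASONING: "frontier-model-placeholder",
    Lane.CODING_SPECIALIZED: "coding-agent-placeholder",
    Lane.FAST_EXTRACTION: "small-fast-model-placeholder",
    Lane.LOCAL: "local-open-weight-placeholder",
    Lane.VERIFIER: "verifier-model-placeholder",
}


def route(task_category: str) -> str:
    """Return the model configured for the lane that fits this task category."""
    lane = TASK_LANES[task_category]
    return LANE_MODELS[lane]


def pick_verifier(producer_model: str, candidates: list[str]) -> str:
    """Pick the first candidate that is not the model that produced the work."""
    for model in candidates:
        if model != producer_model:
            return model
    raise ValueError("no independent verifier available")
```

With these placeholder tables, `route("code_edit")` returns `"coding-agent-placeholder"`, and `pick_verifier` refuses to hand grading back to the producing model. Keying the routing on lanes rather than model names means a model swap touches only `LANE_MODELS`; the "Use" and "Avoid" reasoning encoded in `TASK_LANES` stays put.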
Three TeamVince projects, three routing decisions.
Routing only becomes real once you can point to the place in a working build where you chose lane A over lane B. Each card links to the case study so you can read what actually shipped.
ADHDOS — dual-model routing
Gemini handles fast routing and planning; Claude covers nuance-heavy emotional and decomposition tasks. Two lanes inside one product because the parts have different shapes. Read the case study →
RefereeOS — synthesis lane, deterministic floor
Five specialized agents run as deterministic Python checks against the same JSON. The Area Chair runs through autogen.ConversableAgent with Gemini because multi-turn reasoning over the whole board is the part that needs a frontier model. Read the case study →
Heartbridge — foundation lane
Built with Lovable for rapid prototyping, then re-architected in Factory AI using Opus to lay a scalable foundation. The MVP and the foundation are different jobs — use a different model for each. Read the case study →
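The "deterministic floor, synthesis lane" shape from the RefereeOS card can be sketched as plain checks over one shared JSON document, followed by a single synthesis step. This is a minimal sketch, not RefereeOS's code: the three checks and the JSON fields (`title`, `abstract`, `references`) are hypothetical, and the synthesis step is a plain callable standing in for the model-backed Area Chair. RefereeOS uses autogen.ConversableAgent with Gemini there; this sketch calls neither.

```python
import json
from typing import Callable


# Hypothetical deterministic checks: each reads the same parsed JSON and
# returns a list of findings. No model is called on this floor.
def check_title(doc: dict) -> list[str]:
    return [] if doc.get("title", "").strip() else ["missing title"]


def check_abstract_length(doc: dict) -> list[str]:
    words = len(doc.get("abstract", "").split())
    return [f"abstract too long ({words} words)"] if words > 250 else []


def check_references(doc: dict) -> list[str]:
    return [] if doc.get("references") else ["no references listed"]


CHECKS = [check_title, check_abstract_length, check_references]


def review(raw_json: str, synthesize: Callable[[dict[str, list[str]]], str]) -> str:
    """Run every deterministic check, then hand the combined board to one synthesis step."""
    doc = json.loads(raw_json)
    board = {check.__name__: check(doc) for check in CHECKS}
    # In a real build, only this call would go to a frontier model.
    return synthesize(board)


def stub_synthesize(board: dict[str, list[str]]) -> str:
    """Stand-in for the model-backed synthesis step: it just lists the findings."""
    issues = [finding for findings in board.values() for finding in findings]
    if not issues:
        return "no issues"
    return f"{len(issues)} issue(s): " + "; ".join(issues)


paper = json.dumps({"title": "Routing study", "abstract": "Short abstract.", "references": []})
print(review(paper, stub_synthesize))  # prints: 1 issue(s): no references listed
```

The split keeps the cheap, repeatable checks independent of any model, so only the part that genuinely needs multi-turn reasoning over the whole board pays for a frontier call.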
Know when, not first.
Don't start with local models. The synchronous loop you're building works in a hosted frontier model, and that's the right starting point. Local and open-weight models are a lane you reach for once you've identified a sub-task that genuinely belongs there.
Reach for it when
The data is sensitive enough that it can't leave the machine — patient records, internal contracts, unreleased product docs. Or the sub-task is repetitive enough that the per-call cost of a hosted model starts to matter.
Skip it when
You're shipping a first version. Local setup, model selection, and GPU plumbing trade your build time for a problem you haven't proven you have yet. Hosted is faster for v1 — local can come later for the specific sub-task that justifies it.
Practice it on the side
Pull down a small open-weight model and route one minor task to it — a private inbox tagger, an offline transcript summarizer. The routing muscle is worth building before you need it for a real deliverable.
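The "reach for it when" and "skip it when" rules above can be sketched as a small decision function. This is a minimal sketch under assumptions: `SubTask`, `choose_lane`, and the `monthly_cost_threshold` parameter are hypothetical, the per-call cost is whatever you measure rather than a quoted price, and the sketch assumes that data which cannot leave the machine overrides the "ship hosted first" rule, since a hosted API isn't an option for that data at all.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    """Hypothetical description of a sub-task you are considering moving to a local model."""
    name: str
    data_can_leave_machine: bool
    calls_per_month: int
    hosted_cost_per_call: float  # measured from your own billing; an assumption, not a quoted price


def choose_lane(task: SubTask, v1_shipped: bool, monthly_cost_threshold: float) -> str:
    """Return 'local' or 'hosted' following the reach-for / skip rules."""
    # Data that cannot leave the machine rules out a hosted API entirely.
    if not task.data_can_leave_machine:
        return "local"
    # Before v1 ships, local setup spends build time on an unproven problem.
    if not v1_shipped:
        return "hosted"
    # After v1, move only the sub-tasks whose metered cost has started to matter.
    monthly_cost = task.calls_per_month * task.hosted_cost_per_call
    return "local" if monthly_cost > monthly_cost_threshold else "hosted"
```

For example, an inbox tagger making 50,000 calls a month at a hypothetical 0.002 per call costs about 100 per month hosted, so with a threshold of 50 it moves to local once v1 has shipped, and stays hosted before that.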
Routing decisions come after you can ship the loop.
Don't optimize the model mix before you've shipped a synchronous v1 in a single hosted model. The harness teaches the loop. The scope filter checks whether the project belongs in any agent at all. This page is the next step up.
Start with the harness
If you haven't yet, grab the harness: AGENTS.md, CLAUDE.md, five slash commands, and the scope filter. Ship a v1 in a single hosted model first.
Run the scope filter
Use the scope filter on the project. If it can't ship in 14 days in one model, splitting it across three won't fix that.
Then route the parts
Once v1 is shipping, find the two parts that are clearly mismatched — the bulk extraction running on frontier reasoning, the diff work running on a chat model — and move them to the right lane. Or turn the workflow into a routine once you trust each lane.
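Finding the mismatched parts amounts to a small audit: list each part of the workflow with its task category and the lane it currently runs in, then flag the ones that don't match. This is a minimal sketch; the `FITTING_LANE` table, its category and lane strings, and `find_mismatches` are hypothetical names for illustration. Categories the table doesn't know are skipped rather than guessed.

```python
# Hypothetical map from task category to the lane that fits it.
FITTING_LANE = {
    "bulk_extraction": "fast_extraction",
    "diff_edit": "coding_specialized",
    "planning": "frontier_reasoning",
}


def find_mismatches(parts: list[tuple[str, str, str]]) -> list[str]:
    """Return a note for each (part, category, current_lane) running in the wrong lane."""
    notes = []
    for part, category, current_lane in parts:
        fitting = FITTING_LANE.get(category)
        # Unknown categories are skipped; only known mismatches are flagged.
        if fitting is not None and fitting != current_lane:
            notes.append(f"{part}: move from {current_lane} to {fitting}")
    return notes


workflow = [
    ("ticket tagger", "bulk_extraction", "frontier_reasoning"),
    ("repo fixer", "diff_edit", "general_chat"),
    ("roadmap draft", "planning", "frontier_reasoning"),
]
for note in find_mismatches(workflow):
    print(note)
```

On this sample workflow the audit flags the two cases named above, the bulk extraction on frontier reasoning and the diff work on a chat model, and leaves the planning step where it is.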
The lanes don't change. The names do.
Where this page names specific models — Claude Sonnet, Claude Opus, Codex, Gemini Flash, Gemini Pro — read them as examples of a category, not commitments. The lane labels (frontier reasoning, coding-specialized, fast extraction, local, verifier) hold up as the models shuffle underneath. Last reviewed May 2026.
I'll scope the lanes with you.
Picking the right model for each part is one of the highest-leverage moves on a build, and one of the easiest to get wrong on your own. Bring a project; we'll name the lanes, pick a model for each, and ship the v1.