Model Routing for Builders

Use the right model for the part.

The bleeding edge isn't using the smartest model everywhere. It's knowing which model to reach for in each step of the build (and which one to skip) so you keep latency, cost, and quality on the right side of the curve.

Free walkthrough. Five lanes. Worked examples from TeamVince builds.

Why this matters now

"Pick the best model" was last year's answer.

The frontier models can do anything, badly, if you point them at the wrong job. Coding-specialized agents beat general models on diff-shaped work. Fast extraction models beat frontier reasoning on bulk classification. Local models beat everything when the data can't leave the machine. The operator skill is matching the task to the lane, not memorizing model names.

Five tasks, five lanes

Match the part of the build to the model class that's actually good at it.

Lanes are named in task-category terms (frontier reasoning, coding-specialized, fast extraction, local, separate verifier) so the routing stays correct even when the specific best model changes. The "Use" and "Avoid" notes hold up across model generations.

Ambiguous product planning

Use: a frontier reasoning model. Long context, careful tradeoffs, the kind of thinking that has to weigh five things at once. Avoid: a fast extraction model. You'll get a confident answer that missed the constraint that mattered.

Fast code edits in a known codebase

Use: a coding-specialized agent wired to your repo. Diff-shaped output, tool use, lint and test feedback. Avoid: a general chat model. It'll write code that looks right and breaks at the import line.

Repetitive extraction or classification

Use: a fast extraction model (small, cheap, predictable). Tag a thousand emails, pull entities out of transcripts, classify support tickets. Avoid: the frontier reasoning model. You're paying ten times the price for the same JSON.

Private or offline experiments

Use: a local or open-weight model. Sensitive data, internal documents you can't send to a cloud API, the cheap repetitive sub-task you don't want metered. Avoid: assuming local is the default. Most of your work belongs in a hosted model; see the "Local and open-weight" section below.

Final verification

Use: a separate verifier. That means a second model, sometimes a smaller one, sometimes one from a different vendor, that grades the first model's output against a checklist. Avoid: the same model that produced the work. The grader needs to be independent.
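The five lanes above can be written down as a small routing table. This is a minimal sketch, not a TeamVince implementation: the task labels, the `route` function, and the placeholder model strings are all hypothetical names chosen for illustration. The point it shows is the separation between lanes (stable) and the model that currently fills each lane (swappable).

```python
from enum import Enum


class Lane(Enum):
    FRONTIER_REASONING = "frontier reasoning"
    CODING_SPECIALIZED = "coding-specialized"
    FAST_EXTRACTION = "fast extraction"
    LOCAL = "local"
    VERIFIER = "separate verifier"


# Task categories mapped to lanes. The keys are illustrative labels.
ROUTES = {
    "product_planning": Lane.FRONTIER_REASONING,
    "code_edit": Lane.CODING_SPECIALIZED,
    "extraction": Lane.FAST_EXTRACTION,
    "classification": Lane.FAST_EXTRACTION,
    "private_experiment": Lane.LOCAL,
    "final_verification": Lane.VERIFIER,
}

# Hypothetical placeholders: which concrete model fills each lane today.
# Swap these strings when the best model changes; ROUTES stays the same.
MODEL_FOR_LANE = {
    Lane.FRONTIER_REASONING: "your-frontier-model",
    Lane.CODING_SPECIALIZED: "your-coding-agent",
    Lane.FAST_EXTRACTION: "your-fast-model",
    Lane.LOCAL: "your-local-model",
    Lane.VERIFIER: "your-verifier-model",
}


def route(task_kind: str, data_must_stay_local: bool = False) -> Lane:
    """Return the lane for a task. Data that cannot leave the machine wins."""
    if data_must_stay_local:
        return Lane.LOCAL
    if task_kind not in ROUTES:
        raise ValueError(f"unknown task kind: {task_kind!r}")
    return ROUTES[task_kind]


print(route("classification").value)
print(route("product_planning", data_must_stay_local=True).value)
```

Keeping the model names in a separate mapping is a design choice: when a new model generation ships, only `MODEL_FOR_LANE` changes.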

Routing diagnostic

Start from the symptom, not the model name.

Most routing mistakes show up as a familiar failure pattern: slow runs, vague diffs, brittle JSON, private data questions, or answers you can't trust. Use the symptom to decide which lane to test next.

The build is slow and expensive

Check whether you're using frontier reasoning for a task that is really extraction, tagging, or formatting. Move the repetitive step to a fast extraction lane and keep the reasoning model for the judgment call.
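A quick back-of-the-envelope calculation makes the cost gap concrete. The sketch below assumes a flat per-token price and uses placeholder numbers, not real vendor pricing; the only figure borrowed from this page is the "ten times the price" ratio between the frontier and fast extraction lanes. Plug in your own call volume and prices.

```python
def run_cost(calls: int, tokens_per_call: int, price_per_million_tokens: float) -> float:
    """Cost of a batch under a flat per-token price (a simplification)."""
    return calls * tokens_per_call * price_per_million_tokens / 1_000_000


# Placeholder numbers for illustration only.
CALLS = 1_000            # e.g. a thousand emails to tag
TOKENS_PER_CALL = 2_000  # assumed prompt plus output size
FAST_PRICE = 1.0         # assumed price per million tokens, fast lane
FRONTIER_PRICE = 10.0    # ten times the fast lane, per the ratio in the text

fast = run_cost(CALLS, TOKENS_PER_CALL, FAST_PRICE)
frontier = run_cost(CALLS, TOKENS_PER_CALL, FRONTIER_PRICE)
print(f"fast lane: {fast:.2f}  frontier lane: {frontier:.2f}  difference: {frontier - fast:.2f}")
```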

The code looks right but breaks

Route the edit through a coding-specialized agent inside the repo, with the relevant tests and lint command in view. Chat-shaped code is fine for sketching; diff-shaped code needs repo feedback.

The output sounds confident but wrong

Add a separate verifier with a checklist, source requirement, or tiny eval set. Don't ask the same model to grade its own work when the decision matters.
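A checklist verifier can start as plain code before any second model is involved. The following is a minimal sketch under stated assumptions: the `Check` type, the three example checks, and the optional `grader` callable (standing in for a call to a second, independent model) are all hypothetical and not taken from any TeamVince build.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    passed: Callable[[str], bool]


# Illustrative checklist for a summary that must cite a source.
CHECKLIST = [
    Check("non-empty", lambda out: bool(out.strip())),
    Check("cites a source", lambda out: "Source:" in out),
    Check("under 120 words", lambda out: len(out.split()) <= 120),
]


def verify(
    output: str,
    checklist: List[Check],
    grader: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return the names of failed checks. An empty list means the output passes.

    `grader` is a caller-supplied function wrapping a different model than
    the one that produced `output`; it is skipped when not provided.
    """
    failures = [check.name for check in checklist if not check.passed(output)]
    if grader is not None and not grader(output):
        failures.append("independent grader rejected")
    return failures


print(verify("Revenue grew 4%. Source: Q3 report", CHECKLIST))
print(verify("Revenue grew 4%.", CHECKLIST))
```

Deterministic checks run first because they are free; the model grader only adds the judgment the checklist cannot express.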

The data cannot leave the machine

That's when local or open-weight models enter the conversation. Keep the private sub-task small: summarize, tag, redact, or extract. Don't rebuild the whole workflow locally unless the whole workflow needs it.
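One way to keep the private sub-task small is to run only the sensitive step on the machine and hand the sanitized result to the hosted lane. The sketch below uses two crude regular expressions as a stand-in for the local step; in practice a local or open-weight model would replace `redact`. The patterns, the placeholder tokens, and the `private_then_hosted` helper are assumptions for illustration, and the patterns will miss or over-match real-world data.

```python
import re
from typing import Callable

# Rough patterns, good enough to show the shape of the step.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Local step: mask emails and phone numbers before anything leaves the machine."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def private_then_hosted(text: str, hosted_step: Callable[[str], str]) -> str:
    """Run the private step locally, then pass only the redacted text onward.

    `hosted_step` is a caller-supplied function wrapping the hosted model call.
    """
    return hosted_step(redact(text))


print(redact("Mail jane@example.com or call +1 (555) 010-9999."))
```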

The agent keeps drifting

Before switching models, check the harness: AGENTS.md, allowed actions, acceptance criteria, and a review command. Routing won't fix a task the agent cannot read clearly.

You don't know if it improved

Stop tuning by vibes. Save five representative inputs, write the expected behavior, and run the same check after each routing change. That is enough eval for a first pass.
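A first-pass eval along these lines fits in a few dozen lines. This is a sketch, assuming a ticket-classification task: the five tickets, their labels, and the `keyword_baseline` function (a stand-in for a model call so the example runs offline) are made up for illustration. Swap in a function that calls whichever lane you are testing and compare the pass rate before and after the routing change.

```python
from typing import Callable

# Five representative inputs with the expected behavior written down.
EVAL_SET = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I open settings", "bug"),
    ("How do I export my data?", "how-to"),
    ("Please cancel my subscription", "billing"),
    ("Login page shows a blank screen", "bug"),
]


def run_eval(classify: Callable[[str], str]) -> float:
    """Run the same check against any routing candidate; returns the pass rate."""
    passed = 0
    for text, expected in EVAL_SET:
        got = classify(text)
        ok = got == expected
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}  {text!r} -> {got!r} (expected {expected!r})")
    return passed / len(EVAL_SET)


def keyword_baseline(text: str) -> str:
    """Stand-in for a model call so the sketch runs without network access."""
    lowered = text.lower()
    if "charged" in lowered or "subscription" in lowered:
        return "billing"
    if "crash" in lowered or "blank" in lowered:
        return "bug"
    return "how-to"


score = run_eval(keyword_baseline)
print(f"pass rate: {score:.0%}")
```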

What this looks like in real builds

Three TeamVince projects, three routing decisions.

Routing only becomes real once you can point at where in a working build you chose lane A over lane B. Each card links to the case study so you can read what was actually shipped.

ADHDOS: dual-model routing

Gemini handles fast routing and planning; Claude covers nuance-heavy emotional and decomposition tasks. Two lanes inside one product because the parts have different shapes. Read the case study →

RefereeOS: synthesis lane, deterministic floor

Five specialized agents run as deterministic Python checks against the same JSON. The Area Chair runs through autogen.ConversableAgent with Gemini because multi-turn reasoning over the whole board is the part that needs a frontier model. Read the case study →

Heartbridge: foundation lane

Built with Lovable for rapid prototyping, then re-architected in Factory AI using Opus to lay a scalable foundation. The MVP and the foundation are different jobs, so use a different model for each. Read the case study →

Local and open-weight

Know when, not first.

Don't start here. The synchronous loop you're building works in a hosted frontier model, and that's the right starting point. Local and open-weight models are a lane you reach for once you've identified a sub-task that genuinely belongs there.

Reach for it when

The data is sensitive enough that it can't leave the machine: patient records, internal contracts, unreleased product docs. Or the sub-task is repetitive enough that the per-call cost of a hosted model starts to matter.

Skip it when

You're shipping a first version. Local setup, model selection, and GPU plumbing trade your build time for a problem you haven't proven you have yet. Hosted is faster for v1, and local can come later for the specific sub-task that justifies it.

Practice it on the side

Pull down a small open-weight model and route one minor task to it: a private inbox tagger, an offline transcript summarizer. The routing muscle is worth building before you need it for a real deliverable.

Where this fits

Routing decisions come after you can ship the loop.

Don't optimize the model mix before you've shipped a synchronous v1 in a single hosted model. The harness teaches the loop. The scope filter checks whether the project belongs in any agent at all. This page is the next step up.

Start with the harness

If you haven't yet, grab the harness: AGENTS.md, CLAUDE.md, five slash commands, and the scope filter. Ship a v1 in a single hosted model first.

Run the scope filter

Use the scope filter on the project. If it can't ship in 14 days in one model, splitting it across three won't fix that.

Then route the parts

Once v1 is shipping, find the two parts that are clearly mismatched (the bulk extraction running on frontier reasoning, the diff work running on a chat model) and move them to the right lane. Or turn the workflow into a routine once you trust each lane.

A note on specific models

The lanes don't change. The names do.

Where this page names specific models (Claude Sonnet, Claude Opus, Codex, Gemini Flash, Gemini Pro), read them as examples of a category, not commitments. The lane labels (frontier reasoning, coding-specialized, fast extraction, local, separate verifier) hold up as the models shuffle underneath. Last reviewed May 2026.

Want help building a routing setup?

I'll scope the lanes with you.

Picking the right model for each part is one of the highest-leverage moves on a build, and one of the easiest to get wrong on your own. Bring a project; we'll name the lanes, pick a model for each, and ship the v1.

1:1 coaching → · Consulting build → · Next cohort →