The agent harness.
The part that makes agents reliable.
An agent harness is everything around the model that makes it do the right thing: the context you hand it, the limits you set, and the checks its work has to pass before it reaches you. Models keep changing. The harness is the part that compounds.
What is an agent harness?
An agent harness is the layer around the model: the instructions, files, tools, and gates that turn a general model into something that does your specific work the same way every time. The model is the engine. The harness is the car you actually drive.
Context
What the agent reads before it acts. AGENTS.md and CLAUDE.md, the repo, the docs, the one example that shows the house style. Get this wrong and nothing downstream saves you. Write it light →
Restraint
What the agent is not allowed to do. The scope of the task, the actions it can take by name, the gate it can't skip. Restraint is what lets you stop watching every step. Run the scope filter →
Power
The tools, commands, and model choices that let it do real work without wandering off. The right model for the part, the slash command for the repeatable move. See model routing →
Six parts make a harness hold.
Miss one and the agent still runs. It just quietly does the wrong thing. Name all six and you have something you can hand a teammate. Each part below is a working piece on this site.
Context files
AGENTS.md and CLAUDE.md are the two files the agent reads first. Keep the shared rules in AGENTS.md so any tool can read them, and keep both short. The free starter ships both. Get the starter →
Scope
Right-size the work before the agent touches it. A v1 that can't ship in two weeks is a v1 the harness can't keep honest. Pick the lane, name the fallback. Run the scope filter →
The build loop
Scope, spec, build, diff-review, verify, ship, write up. The synchronous loop you run by hand until the diff looks the same every time. This is the muscle everything else sits on. The build loop →
Model routing
The right model for each part. Frontier reasoning for the plan, a coding model for the diff, something fast and cheap for extraction, a verifier to check the work. One agent, several engines. Model routing →
Async routines
Once you trust the manual loop, turn it into something that runs on a trigger, with a checkpoint artifact and an approval line you can review. Routines come after the loop, not before. Async routines →
Review gates
The check the work passes before it reaches you. Tests, a lint, a self-evaluation, a diff you actually read. Cheap, deterministic, run every time. No gate means you are the gate, every time. The context audit →
A harness compounds. A clever prompt doesn't.
A prompt fixes one reply. A harness fixes the next thousand. Every correction you make becomes a rule the agent keeps, and the rules outlive whatever model you happen to be using this month.
Models change, the harness stays
Swap Claude Code for Codex, or this month's model for next month's, and your AGENTS.md, your gates, and your routing carry straight over. The harness is the stable part. Why context compounds →
Corrections become rules
Every time you catch the agent doing the wrong thing, the fix goes into the harness as a rule. Same mistake doesn't come back. That's the whole flywheel: review, write it down, never repeat it.
The team inherits it
A clever prompt lives in your head. A harness is the thing you can hand to someone else and have them ship the same quality on day one. What I teach →
Where harnesses go wrong.
Most broken harnesses fail the same three ways. None of them are about the model.
A 500-line CLAUDE.md
Past roughly 60 lines, instruction-following starts to drop. A giant context file reads like a sign nobody follows. Cut it to the rules that apply every time. Keep it light →
No gate
If nothing checks the work before you see it, you're the gate on every run. That feels safe and doesn't scale. Add one cheap, deterministic check the work has to pass first.
Trusting it unattended too early
Put a workflow on a trigger only after you've run it by hand and the diff looked the same three times. Automating a loop you don't trust just makes the mistakes faster. Routines come last →
Questions builders ask.
The four that come up in every cohort and every consulting call.
What is an agent harness?
The layer around the model: the files it reads, the tools it can use, the limits it works inside, and the checks its output passes before it reaches you. The model is the engine; the harness is what makes it drive your road.
AGENTS.md vs CLAUDE.md?
Same job, two audiences. AGENTS.md is the open file most coding agents read. CLAUDE.md is Claude Code's file. Keep the shared rules in AGENTS.md and let CLAUDE.md point at it, so a non-Claude agent isn't flying blind.
How long should my CLAUDE.md be?
Short. Past roughly 60 lines, models start skipping instructions. Keep the rules that apply every time and move the rest into Skills or docs the agent loads on demand.
Claude Code and Codex both?
One harness, both tools. The point of AGENTS.md is that the context and gates travel. Write them once, and moving from Claude Code to Codex is a model swap, not a rebuild.
Get the harness, then build the habit.
The free starter drops AGENTS.md, CLAUDE.md, a scope filter, and five slash commands into any project. The cohort is where you turn it into a habit that sticks, on one real tool you ship in four weeks.