Per-Phase Model Routing for AI Coding Agents in Pi

One model for every kind of work is the compromise you stopped noticing. Exploring wants a different brain than building. The harness should switch for you, not make you remember.

You pick a model at the start of a session and then you run it for everything. You use it to explore an unfamiliar codebase, to plan a change, to write the code, to review the diff, to fix what the review caught. Those are five different jobs, and you handed all of them to one model on a setting you chose before you knew what the work would be.

That is a compromise, and it costs you on both ends. On the easy phases you overpay, running a strong expensive model to do exploration that a cheap one handles fine. On the hard phases you underthink, running whatever you last selected instead of the careful model the job deserves. I got tired of paying that tax, so I wired the switch into the harness. This is how, and why it belongs there.

One model for everything is a compromise

Think about what each phase actually wants.

Exploring wants a cheap model with wide, roaming attention and plenty of reasoning budget. It is going to read a lot and commit to nothing. Planning wants a careful model at high effort that settles on an approach without touching a file. Building wants the opposite of exploring: a precise coding model that takes an approved plan and executes it with little ceremony and low reasoning overhead. Reviewing wants a skeptical model at high effort, looking for the thing that will break. Fixing wants precision again, scoped tight.

Run one model across all five and it fits none of them. This is not a small inefficiency. Over a week of real work it is the difference between a bill you resent and a bill you forget about, and between a review that catches the bug and one that waves it through.

One model, every phase

1. Overpay running a strong model on exploration
2. Underthink when the hard phase gets your default
3. Forget to switch, so the model rarely fits the work
4. The reasoning budget is wrong in both directions

One model per phase

1. A cheap model roams while you explore
2. A precise model executes the build
3. A careful model at high effort runs the review
4. The switch happens on its own, every time

Same work, two ways to assign the model.

OpenCode got this right with modes

The tool that taught me this was OpenCode. Its real value is not the models it ships but the idea of modes. You define a set of named agents, and each one carries its own model, its own temperature, its own step budget, its own permissions. You switch mode and the whole configuration switches with it.

Here is the shape of my own setup, trimmed to the point:

Modes, each with its own model

Explore reads and cannot edit. Build gets a coding model at low temperature and can write. The mode is the unit, and switching it switches everything.
Mode	Model	Temp	Can edit?
explore	glm-5.1	0.7	no
plan	deepseek-v4-pro	0.2	no
build	kimi-k2.6	0.1	yes
review	glm-5.1	0.0	no
fix	deepseek-v4-pro	0.1	yes

The mode is the unit of work. Switch it and you get the right model, the right creativity, and the right permissions in one move. It stops you from exploring with your build model, and from letting your explore model ship code.
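
To make "the mode is the unit" concrete, here is a minimal TypeScript sketch of the table above as data. The type names and the switchMode and canWrite helpers are hypothetical, invented for this illustration; this is not OpenCode's configuration format or API. Only the model names, temperatures, and edit permissions come from the table.

```typescript
// Hypothetical types for illustration; this is NOT OpenCode's config schema.
type ModeName = "explore" | "plan" | "build" | "review" | "fix";

interface Mode {
  model: string;
  temperature: number;
  canEdit: boolean;
}

// Values copied from the table above.
const modes: Record<ModeName, Mode> = {
  explore: { model: "glm-5.1", temperature: 0.7, canEdit: false },
  plan: { model: "deepseek-v4-pro", temperature: 0.2, canEdit: false },
  build: { model: "kimi-k2.6", temperature: 0.1, canEdit: true },
  review: { model: "glm-5.1", temperature: 0.0, canEdit: false },
  fix: { model: "deepseek-v4-pro", temperature: 0.1, canEdit: true },
};

// Switching mode hands back the whole bundle at once:
// model, temperature, and permissions travel together.
function switchMode(name: ModeName): Mode {
  return modes[name];
}

// A write is allowed only when the active mode grants edit permission.
function canWrite(active: Mode): boolean {
  return active.canEdit;
}
```

Because the permission lives on the same record as the model, there is no way to end up with the explore model and the build permissions at the same time.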

Pi has none of this. Pi is one loop and a /model command you drive by hand. I run pi as my daily agent for exactly the reasons in the pi guide: it is minimal and you can see all of it. But minimal meant it had no modes, and driving /model by hand meant I forgot, constantly. So I built the missing piece.

So I wired handoffs into pi

pi-skill-model-handoff is a small pi package. It reads two fields off whatever skill you load and applies them the instant the skill activates: the model, and the thinking level.

The insight is that the work already announces itself through a skill. When you start reviewing, you load your review skill. When you start building, you load your build skill. The skill already knows what kind of work it is, so the model and the reasoning effort belong on the skill. Put them there once, and the harness hands off automatically every time the skill loads. You stop managing model selection as a separate chore.

Wire it up in three steps

All three steps are required:
1. Install the package. Run pi install npm:@felipefontoura/pi-skill-model-handoff, then /reload or restart pi.
2. Add model and thinking to each skill. Two optional frontmatter fields in the skill's SKILL.md. Point each phase at the model and reasoning budget it deserves.
3. Load a skill and watch the handoff. Pi prints handoff active: <skill> the moment it switches. The change is visible, not magic behind your back.

The configuration lives in the skill’s frontmatter. Two fields, both optional:

---
name: review
description: Review code changes.
model: openai/gpt-5.5
thinking: high
---

model is any provider-qualified model pi can reach. thinking is one of off, minimal, low, medium, high, xhigh. That is the entire interface. No config file to maintain, no router to reason about. The skill you were going to load anyway now carries its own model and effort.
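
For a sense of how little there is to read, here is a rough TypeScript sketch of pulling those two fields out of a SKILL.md string. It is an assumption-laden illustration, not the package's actual source: readHandoffFields, HandoffFields, and isThinkingLevel are hypothetical names, and the parser only handles flat key: value lines rather than full YAML. The six thinking levels are the ones listed above.

```typescript
// Hypothetical helper for illustration; the package's real parser may differ.
const THINKING_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

interface HandoffFields {
  model?: string;
  thinking?: ThinkingLevel;
}

function isThinkingLevel(value: string): value is ThinkingLevel {
  return (THINKING_LEVELS as readonly string[]).indexOf(value) !== -1;
}

// Reads flat `key: value` lines between the opening `---` and the next `---`.
// Anything richer than that would need a real YAML parser.
function readHandoffFields(skillMd: string): HandoffFields {
  const lines = skillMd.split(/\r?\n/);
  if ((lines[0] ?? "").trim() !== "---") return {}; // no frontmatter at all

  const fields: HandoffFields = {};
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break; // closing fence ends the frontmatter
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim();
    const value = line.slice(sep + 1).trim();
    if (key === "model" && value !== "") fields.model = value;
    // Unknown thinking levels are ignored rather than guessed at.
    if (key === "thinking" && isThinkingLevel(value)) fields.thinking = value;
  }
  return fields;
}
```

Both fields stay optional in the result, so a skill that declares neither simply yields an empty object.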

What happens when a skill loads

Input: you load the skill for the work at hand.

READ: Read the two fields. The package reads model and thinking from the skill's frontmatter as it activates.
SWITCH: Apply model and effort. It sets the active model and the reasoning budget for the turns that follow. Nothing else changes.
CONFIRM: Print the handoff. Pi shows handoff active: <skill>, so you can see the switch happened and to what.

Output: the model follows the work, and you never touch /model again.
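
The read, switch, confirm sequence can be sketched as one small pure function. This is a simplified model under stated assumptions, not pi's extension API or the package's code: LoadedSkill, SessionState, and applyHandoff are hypothetical names, and staying silent when a skill declares neither field is my assumption about reasonable behavior. The handoff active: <skill> message format is the one described above.

```typescript
// Hypothetical shapes for illustration; pi's real extension API is not shown here.
interface LoadedSkill {
  name: string;
  model?: string;
  thinking?: string;
}

interface SessionState {
  model: string;
  thinking: string;
}

interface HandoffResult {
  session: SessionState;
  message: string | null; // line to print, or null when nothing changed
}

// READ the two fields, SWITCH whichever are present, CONFIRM with one line.
function applyHandoff(session: SessionState, skill: LoadedSkill): HandoffResult {
  // Assumption: a skill that declares neither field leaves the session untouched.
  if (skill.model === undefined && skill.thinking === undefined) {
    return { session, message: null };
  }
  const next: SessionState = {
    model: skill.model ?? session.model, // keep the current model if none declared
    thinking: skill.thinking ?? session.thinking, // same for the reasoning effort
  };
  return { session: next, message: `handoff active: ${skill.name}` };
}
```

Only the model and the thinking level appear in the returned state, which mirrors the "nothing else changes" promise.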

Here is the handoff map I run: the same phases I ran as OpenCode modes, now expressed as skills.

An example handoff map

The models are yours to choose. The point is that the phase carries them, so cheap exploration and precise building stop being a decision you forget to make.
Skill / phase	Model	Effort
explore	opencode-go/glm-5.1	high
plan	anthropic/claude-sonnet-4-5	high
build	anthropic/claude-sonnet-4-5	minimal
review	openai/gpt-5.5	high
fix	openai/gpt-5.5	medium
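
As a sketch of how this map turns into per-skill configuration, the snippet below holds the table as data and renders the frontmatter each SKILL.md would start with. The handoffMap and renderFrontmatter names are hypothetical, and the description field is left out because only the review skill's description appears in this article; a real skill would need its own.

```typescript
// The table above as data. Names here are illustrative, not part of the package.
const handoffMap: Record<string, { model: string; thinking: string }> = {
  explore: { model: "opencode-go/glm-5.1", thinking: "high" },
  plan: { model: "anthropic/claude-sonnet-4-5", thinking: "high" },
  build: { model: "anthropic/claude-sonnet-4-5", thinking: "minimal" },
  review: { model: "openai/gpt-5.5", thinking: "high" },
  fix: { model: "openai/gpt-5.5", thinking: "medium" },
};

// Renders the frontmatter lines for one skill (description omitted on purpose).
function renderFrontmatter(skill: string): string {
  const entry = handoffMap[skill];
  if (entry === undefined) throw new Error(`unknown skill: ${skill}`);
  return [
    "---",
    `name: ${skill}`,
    `model: ${entry.model}`,
    `thinking: ${entry.thinking}`,
    "---",
  ].join("\n");
}
```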

Tie it to skills, not to modes, on purpose

A mode is a concept you have to remember to enter. It is one more thing to manage next to the actual work. A skill is not. You load a skill because the work called for it, so hanging the model and the effort on the skill means the routing rides along for free. There is no separate switch to flip and forget.

That is one fewer knob, and one fewer knob is the whole design philosophy pi is built on. The best control is the one you do not have to remember to use.

Be honest about what it does not do

If you want a system that reads your message and auto-selects the skill, this is not that, and I would be wary of the version that is. The moment a harness starts guessing which phase you are in, you are back to debugging a black box that decided something for you. I would rather load the skill myself and know exactly what the model is about to become.

This is harness engineering, not a plugin trick

The package is small. The point is not the package. Model routing is a control mechanism, one of the parts that make up a harness, and I added it as a few lines I can read instead of waiting for a vendor to ship modes into a tool I do not control. That is the whole argument of harness engineering: the model is a commodity, and the advantage is in the code you wrap around it.

There is a cost dividend too. Route a cheap model to explore and a strong one only for the build, and the exploratory turns stop billing you at premium rates for work that never needed them. That is the same fight as the rest of the token bill, fought at the routing layer. A minimal harness did not lose to the feature-rich one. It let me add the one control I wanted and skip the twenty I did not.
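
The arithmetic behind that dividend is simple enough to sketch. Every number below is a made-up placeholder chosen to keep the math readable, not a real price or a measured token count, and all the names are hypothetical. Plug in your own provider's rates and your own usage.

```typescript
// All prices and token counts are PLACEHOLDERS, not real rates or measurements.
const STRONG_PRICE_PER_MILLION = 10; // hypothetical cost per million tokens
const CHEAP_PRICE_PER_MILLION = 1; // hypothetical cost per million tokens

interface PhaseUsage {
  phase: string;
  tokens: number;
  routedPricePerMillion: number; // price of the model this phase is routed to
}

// Hypothetical week: exploration reads far more tokens than the build writes.
const week: PhaseUsage[] = [
  { phase: "explore", tokens: 4000000, routedPricePerMillion: CHEAP_PRICE_PER_MILLION },
  { phase: "build", tokens: 1000000, routedPricePerMillion: STRONG_PRICE_PER_MILLION },
];

function cost(tokens: number, pricePerMillion: number): number {
  return (tokens / 1000000) * pricePerMillion;
}

// One model for everything: every phase is billed at the same rate.
function oneModelCost(usage: PhaseUsage[], pricePerMillion: number): number {
  return usage.reduce((sum, u) => sum + cost(u.tokens, pricePerMillion), 0);
}

// Per-phase routing: each phase is billed at its own model's rate.
function routedCost(usage: PhaseUsage[]): number {
  return usage.reduce((sum, u) => sum + cost(u.tokens, u.routedPricePerMillion), 0);
}

const single = oneModelCost(week, STRONG_PRICE_PER_MILLION); // 50 with these placeholders
const routed = routedCost(week); // 14 with these placeholders
```

With these placeholder inputs the saving comes entirely from the exploration tokens, which is the point: the more a phase reads without committing to anything, the more it matters which model reads it.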

Model handoffs, quick answers

What does pi-skill-model-handoff do?

It is a pi package that automatically switches the active model and the reasoning effort when you load a skill. Each skill declares a model and a thinking level in its SKILL.md frontmatter, and the package applies them the moment the skill activates, printing handoff active: <skill> so the switch is visible.

How do I install it?

Run pi install npm:@felipefontoura/pi-skill-model-handoff, then /reload or restart pi. After that, add the optional model and thinking fields to any skill's frontmatter and the handoff happens on load.

What can I set for the thinking level?

One of off, minimal, low, medium, high, or xhigh. Pair a high level with your exploration and review phases where reasoning pays off, and a low or minimal level with building, where the plan is already made and you want fast, precise execution.

Does it choose the skill for me based on my prompt?

No, and that is on purpose. The package is passive. Pi still decides which skill loads; the package only applies the model and effort after the skill is selected.

Routing that reads your prompt and guesses the phase would put a black box back in the loop. You pick the phase, and the harness handles the mechanical handoff.

How is this different from OpenCode modes?

OpenCode has modes as a built-in concept: named agents that each carry a model, temperature, and permissions. Pi has no modes. This package gives you the same per-phase model routing, but attached to skills you already load, as a small open-source package you can read and modify rather than a feature you wait for a vendor to ship.