Pi Coding Agent: The Minimal, Transparent AI CLI

Every mainstream AI coding agent added more in 2025. Pi went the other way, and that is the whole point.

The industry shipped hidden planners, parallel subagents, MCP integrations, and thousands of tokens of scaffolding in the system prompt. Agents got more capable on paper and harder to reason about in practice. The pi coding agent is the counter-bet: four tools, a 150-word system prompt, no hidden layers, and full transparency by default.

This is not a minimalist toy. Pi, built by Mario Zechner under earendil-works, holds its own on Terminal-Bench 2.0 and runs against any frontier model: Anthropic, OpenAI, Gemini, or local. I have it in my daily stack. I shipped 13 production apps in a crypto fintech, solo, with AI agents in 70 days. Pi was part of that harness.

If you have read harness engineering, you already know the frame: Agent = Model + Harness. Pi is the clearest demonstration available that the harness can be smaller than you think and still carry a serious workload.

The four-tool bet is the whole argument

Most AI coding agents hand the model a large toolkit. Shell commands, browser automation, file search, diff utilities, memory stores, dozens of MCP-connected services. The assumption is that more tools means more capability.

Pi ships with four:

read. Read a file.
write. Write a file.
edit. Apply a targeted edit to an existing file.
bash. Run a shell command.

Grep, find, and ls are available but off by default. That is the full list. The argument is that four well-described tools produce more reliable tool-calling than forty loosely described ones, because the model does not have to navigate noise when deciding which one to reach for. A model that always picks correctly from four beats a model that sometimes picks wrong from forty.

There is a harder point underneath that. Four tools covers almost every real coding task: read the repo, write code, edit existing code, run tests and builds. If you want a fifth tool, you already have it: bash. The surface is small not because the design ran out of ideas but because the problem genuinely does not require more.

A 150-word system prompt is a budget decision

Every word in a system prompt is a fixed cost you pay on every turn of every session, forever. A long system prompt does not expire. It sits in the context window alongside your code, your history, and your question, competing for attention.

Pi’s system prompt is around 150 words. The underlying logic is blunt: a frontier model already knows how to be a coding agent. It knows what a code review is. It knows what tests are for. It knows to ask before deleting things. You do not have to teach it. You have to get out of its way.

System-prompt economics

~150
words: 13k to 18k
tokens: 4
tools: 0
hidden layers

The system prompt is a fixed cost on every turn. Pi keeps it near zero.

Compare this to agents that front-load their prompt with tool descriptions, persona instructions, safety layers, and orchestration logic. A popular MCP server alone adds 13,000 to 18,000 tokens before you type your first message. Those tokens do not disappear. They sit in every turn, compressing the room you have left for actual work. If you want the longer treatment on where tokens go, the token-cost breakdown covers it in full.

A 150-word system prompt does not mean an unconstrained agent. It means the constraints live in the right place: the model’s pretraining, not a prompt you maintain.

What pi refuses to ship, and why each refusal is correct

No MCP by default. Model Context Protocol lets agents connect to external services. The idea is reasonable. The cost is not. A typical MCP server registers 13,000 to 18,000 tokens of tool descriptions into the context window on every session, before the model reads a line of your code. Pi leaves MCP out by default. If you want a specific integration, you wire it deliberately and the cost is explicit. See MCP vs CLI for the tradeoffs in full.

No hidden plan mode. Some agents run an internal planning step before surfacing a response. You see the output; you do not see the reasoning. Pi’s planning lives in PLAN.md and AGENTS.md, on disk, where you put it. The agent reads them as regular files. If you want to understand what the agent knows about the project, you open the AGENTS.md file. There is no abstraction between you and the plan.

No invisible subagents. Parallel subagents are a real performance improvement for some workloads. They are also a debugging nightmare when something goes wrong and you cannot tell which agent did what. Pi’s answer: invoke pi from a bash command. The subagent is a bash call. You see the invocation, you see the output, you can read the session file afterward. The orchestration is visible because you wrote it.

One loop, any model

Pi does not tie you to a provider. It works with Anthropic, OpenAI, Gemini, and local model endpoints. You supply the API key. You pick the model at session start, or switch with /model mid-session. The agent loop is identical regardless of what is on the other end.

This matters for cost more than vendor allegiance. When a task needs strong reasoning, run the best available model. When a task is mechanical (renaming, reformatting, boilerplate), run a smaller cheaper one. The loop does not change. The bill does.

Managed agent (locked provider)

01Model tied to the vendor's stack
02Pricing set by the platform
03Switching models means migrating the workflow
04No task-level cost optimization

Pi (bring your own key)

01Anthropic, OpenAI, Gemini, or local
02You pay the API rate directly
03Switch with /model, no migration
04Match the model to the task

Provider lock-in changes the economics of every session.

Give each phase its own model and effort

Here is the one thing OpenCode got right that pi leaves to you: modes. In OpenCode you switch to an explore mode or a build mode, and each one carries its own model, its own reasoning budget, its own permissions. Explore runs a cheap, wide model. Build runs a precise coding model. Review runs a careful one. Pi has none of that built in. It has one loop and a /model command you drive by hand.

So I wrote the missing piece as a package. pi-skill-model-handoff reads two fields off a skill and applies them the moment the skill loads: the model, and the thinking level. Load your explore skill and pi drops to a cheap, wide-ranging model. Load your build skill and it switches to a precise coding model at low reasoning. The mode follows the work, and you stop touching /model.

Install it in one line and point your skills at the models you want:

pi install npm:@felipefontoura/pi-skill-model-handoff

---
name: review
description: Review code changes.
model: openai/gpt-5.5
thinking: high
---

That is the whole interface. Two optional fields in a skill’s frontmatter, model and thinking (off, minimal, low, medium, high, xhigh). When the skill loads, pi prints handoff active: review and the switch is done.

An example handoff map

The models are yours to choose. The point is that the phase carries them, so cheap exploration and precise building stop being a manual decision you forget to make.
Skill / phase	Model	Effort
explore	opencode-go/glm-5.1	high
plan	anthropic/claude-sonnet-4-5	high
build	anthropic/claude-sonnet-4-5	minimal
review	openai/gpt-5.5	high
fix	openai/gpt-5.5	medium

The models are yours to choose. The point is that the phase carries them, so cheap exploration and precise building stop being a manual decision you forget to make.

Be honest about what it does not do. It is passive. Pi still decides which skill loads, and the package does not read your prompt to pick a phase for you. It applies the model and effort after the skill is chosen. That is a feature, not a gap. Routing that guesses your intent is the hidden magic pi exists to avoid. You pick the phase, the harness handles the handoff.

This is the payoff of a small, transparent harness in one example. OpenCode’s modes are a feature you wait for a vendor to ship and shape. Here the same capability is a package you can read in a sitting, install in one line, and bend to your own phases. The minimal harness did not lose to the feature-rich one. It let me add the one control I wanted and skip the twenty I did not.

History that does not disappear

Pi saves every session as a JSONL file. Open it in any text editor and you have a full record: every tool call, every model response, every branch. Auto-compaction runs when the context window gets close to the limit. Trigger it manually with /compact if you want to reduce token burn mid-session.

The session commands: /resume continues a past session, /fork branches from a specific point in history, /tree shows the branch structure. The loop itself has no arbitrary step cap. It runs until the model stops calling tools. On long tasks that means you do not have to restart because the agent hit a limit you did not set.

Who should run pi

Pi fits if

Required:
You work in a terminal.Pi has no GUI. If your workflow is keyboard and shell, this costs you nothing. If not, it is genuine friction.
Required:
You want to see what the agent is doing.Transparency is pi's core value. If you want automation and guardrails and do not care to inspect the loop, a managed agent with richer tooling is a reasonable choice.
Required:
You manage your own token spend.Bring-your-own-key means you pay API rates directly and can optimize per task. On a flat subscription plan where cost is not a variable you control, this advantage disappears.
Optional:
You want to switch models by task.Running one provider consistently is fine. The /model command is useful if you do multi-model work or want to compare outputs on the same task.
Anti-pattern:
You need managed enterprise tooling.Pi is an open-source CLI. It is not a managed SaaS product with audit logs, role-based access controls, or vendor SLAs. Wrong tool for that context.
Anti-pattern:
You want an IDE with a project management panel.Cursor, Windsurf, Kiro. All of them are better answers if you want the GUI experience. Pi is a loop in your terminal and nothing else.

Honest criteria. Pi is not the right tool for everyone.

The 13 apps I shipped in 70 days used pi as part of the harness. That is one real datapoint from one builder with a terminal-first workflow who had specific reasons to keep the loop visible. Your situation may be different.

Transparency is what you are actually buying

The four tools and the 150-word system prompt are not the product. They are consequences of one decision: make the agent something a programmer can fully understand.

When pi does something wrong, you have four places to look. A bad file read. A write that targeted the wrong path. A bash command that returned noise. A model response that misread the instruction. The debugging surface matches the tool surface exactly.

That sounds obvious. Most agents make it nearly impossible. The MCP layer has 18,000 tokens of tool descriptions you did not write. The hidden planner ran three steps before you saw the first output. The subagent pool dispatched work you did not observe. When something breaks, the failure lives inside a box you cannot open.

Pi opens the box. If you want to run it, it is on GitHub under earendil-works. Install it, watch the tool calls, read the session files. What you see is what runs.

Common questions

How do I install the pi coding agent?

Pi ships as an npm package. Install it globally with npm i -g @mariozechner/pi, then run pi to start a session. You will need an API key for whichever provider you use: Anthropic, OpenAI, Gemini, or a local model endpoint.

Does the pi coding agent support MCP?

Not by default. Pi deliberately excludes MCP out of the box because a typical MCP server adds 13,000 to 18,000 tokens of tool-description overhead to every session before you type a single character. If you want a specific MCP integration, you can wire it manually. The cost becomes explicit, not hidden.

Can I use pi with Claude Sonnet or Opus from Anthropic?

Yes. Pi is provider-agnostic. It works with Anthropic (including Sonnet and Opus), OpenAI, Gemini, and local model endpoints. You supply your own API key. Switch the active model during a session with the /model command.

How does pi handle long sessions that approach the context limit?

Pi auto-compacts when the context window gets close to the limit, keeping the session alive without forcing a restart. You can also trigger compaction manually with /compact at any point.

Full session history is saved as JSONL files. Use /resume to continue a past session, /fork to branch from a specific point in the history, and /tree to see the full branch structure. Nothing gets thrown away.

Is the pi coding agent production-ready?

Pi holds its own on Terminal-Bench 2.0, which is one objective measure. The more useful answer: I used it as part of the harness that shipped 13 production apps in a crypto fintech in 70 days, solo. That is a real workload across a sustained period, not a benchmark run. Whether it is the right fit for your production workflow depends on your tolerance for a terminal-only, GUI-free tool. The capability is there.

What is the difference between pi and Claude Code?

Pi is minimal and open-source: four tools, a short system prompt, no MCP by default, and full session transparency via JSONL. Claude Code is a managed agent product from Anthropic with richer built-in tooling, native MCP support, a permission and hook system, and ongoing feature development from the Anthropic team. If you want the full feature surface and are already on Anthropic's ecosystem, Claude Code is the natural choice. If you want a loop you can fully inspect, fork, and modify at the source level, pi is the better fit.