MCP vs CLI Tools for AI Agents: Which to Use and When

Every MCP server you install is a silent tax on every conversation you have after that. You pay it whether or not the tool gets called.

The instinct makes sense. A new capability appears. Someone publishes an MCP server for it. You install, configure, forget. Repeat until your tool list is fourteen entries long and every session opens by burning eighteen thousand tokens on schemas the model may never invoke that day.

I run an AI coding agent daily. I built a 13-app crypto fintech solo with AI agents in 70 days. In that sprint I learned one thing about MCP and CLI faster than everything else: the protocol is not the bottleneck the ecosystem pretends it is. The shell handles most of it cheaper, and you only notice the gap when you start watching where the tokens go.

MCP adds tokens before you type anything

Here is the mechanic. When your agent starts, every registered MCP server pushes its tool schemas into the context window. Not lazily. Not on demand. All of them, up front, every session, whether or not you use them today.

A popular MCP server can consume 13,000 to 18,000 tokens of context before you have typed a single character. That is roughly seven to nine percent of a 200k window, gone. Run three or four MCP servers and the cost adds up linearly: three servers of that size take roughly a fifth to a quarter of the window. You have spent a meaningful fraction of your entire context budget on schema definitions for capabilities that may never run in that session.
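The arithmetic is easy to check yourself. A minimal sketch, assuming a 200k-token window and the 13k to 18k per-server range quoted above; the function name and the simplification that every server costs the same are mine, for illustration only.

```python
def window_fraction(tokens_per_server: int, servers: int, window: int = 200_000) -> float:
    """Fraction of the context window taken by MCP schemas before any user input."""
    return tokens_per_server * servers / window

# Low and high ends of the quoted range, for one, three, and four servers.
for servers in (1, 3, 4):
    low = window_fraction(13_000, servers)
    high = window_fraction(18_000, servers)
    print(f"{servers} server(s): {low:.1%} to {high:.1%} of the window")
```

Swap in your own window size and measured schema sizes; the point is only that the cost scales with the number of registered servers, not with how often you call them.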

A CLI tool costs none of this. You invoke it through bash. The model pays zero tokens to know it exists. When the model needs to learn what flags it takes, it runs <tool> --help and the schema appears on demand, once, at the moment of use. That is progressive disclosure. MCP is the opposite: front-load everything, use some fraction of it, pay for all of it every turn.

The upfront token tax

13k-18k tokens per MCP server
0 tokens for CLI via bash
7-9% of a 200k window consumed

Approximate context cost before a single user message is typed.

The asymmetry matters most in long or repeated sessions. The model re-reads the full context on every turn. Those 18k schema tokens are not a one-time startup cost. They are a per-turn cost, in every session, whether the tool fires or not. Over a week of daily use, you can pay for that server dozens of times without it running once.
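To put a number on the per-turn effect, here is a rough sketch. The 30 turns per session and 7 sessions per week are hypothetical figures, and the model ignores any prompt caching a provider may apply, so treat the result as an upper bound on tokens processed rather than a bill.

```python
def schema_tokens_reprocessed(schema_tokens: int, turns_per_session: int, sessions: int) -> int:
    """Total schema tokens read when the full context is re-read on every turn."""
    return schema_tokens * turns_per_session * sessions

# One 18k-token server, 30 turns per session, one session a day for a week.
weekly = schema_tokens_reprocessed(18_000, turns_per_session=30, sessions=7)
print(f"{weekly:,} schema tokens processed in a week")
```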

--help is a perfectly good schema

People treat CLI as a fallback when no MCP server exists. That framing is backwards. A CLI tool with readable --help output is a better interface for an agent in most cases, not a worse one.

Here is why. The schema in an MCP server is static, written once by the server author. If it is verbose or redundant, you pay for that verbosity every session. A CLI’s --help is its schema too, but the agent fetches it only when relevant and only for the subcommand it needs. With most tools the model can run <tool> <subcommand> --help to drill in to exactly the right level of detail. No MCP server does that automatically.

This is what harness engineering calls progressive disclosure applied at the tool layer: give the model less up front, and let it ask for exactly what it needs at the moment it needs it.
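What the agent does at that moment is nothing exotic. A minimal sketch of fetching help text on demand, using Python's own `json.tool` module as a stand-in for whatever CLI you care about; the `fetch_help` helper is hypothetical, not part of any harness.

```python
import subprocess
import sys

def fetch_help(command: list[str], timeout: float = 10.0) -> str:
    """Run `command --help` and return whatever usage text it prints."""
    result = subprocess.run(
        [*command, "--help"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # some tools exit non-zero after printing help
    )
    # Most tools print help to stdout; a few use stderr instead.
    return result.stdout or result.stderr

# Stand-in for any CLI: the JSON formatter that ships with Python.
help_text = fetch_help([sys.executable, "-m", "json.tool"])
print(help_text)
```

The help text enters the context only after this call, and only for the one command that was asked about.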

MCP server

1. Loads all tool schemas into context at session start
2. Token cost is fixed per server per session, usage or not
3. Schema is static, written once by the server author
4. Multiple servers multiply the upfront cost linearly
5. Opaque: the model reads schemas you did not explicitly write

CLI via bash

1. Zero tokens until the tool is actually invoked
2. Token cost is proportional to actual usage
3. --help is the schema, loaded only on demand
4. Progressive disclosure: drill into subcommands as needed
5. Transparent: you read exactly what the model reads

MCP vs CLI: token behavior and schema access.

Pi’s stance: four tools and a shell

Pi, the open-source coding agent by Mario Zechner, ships with zero MCP servers by default. It gives the model four tools: read, write, edit, and bash. That is the entire tool surface.

This is a deliberate design decision, not an oversight. Zechner’s argument is that mainstream harnesses became opaque and unstable because their tool lists and injected context kept shifting behind the user’s back. The model is capable enough on its own. Every tool you add is an assumption that it will be used often enough to justify the overhead. Most of those assumptions do not hold.

The bash tool is the escape hatch that makes the rest unnecessary. Need git? Bash. Need grep, jq, curl, or a JSON transformer? Bash. Every CLI tool built in the last fifty years works through it immediately, with zero integration cost and zero schema overhead. The shell is an incredibly capable tool interface. Pi’s bet is that it covers the vast majority of what MCP servers are installed to provide, and it covers it at a fraction of the context cost.

Read the token cost breakdown for the full accounting of where tokens go across tool output, conversation history, and session re-explanation. MCP schema overhead fits into the tool-output layer of that same framework.

When MCP genuinely earns its place

None of this is an argument that MCP is wrong. The protocol solves real problems in specific situations, and those situations exist.

MCP earns its place when a capability has no clean CLI equivalent. Some APIs require OAuth flows, stateful sessions, or structured resource navigation that bash calls handle poorly. A properly built MCP server for an authenticated internal system is a legitimate trade. You pay the schema cost, you get something you genuinely could not replicate cleanly through a shell command. That is a fair bargain.

MCP also justifies its overhead when you use the tool in most sessions. If you open your agent and query a database in nine out of ten sessions, the upfront schema cost is amortized across real usage. The token-per-value ratio holds up. If you query it in one out of twenty sessions, you paid for it nineteen times for nothing, and a bash call on the one session you needed it would have cost you less in total.
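The amortization can be made concrete. A small sketch, assuming an 18k-token server and counting only the upfront schema load once per session; the function name is illustrative.

```python
def schema_tokens_per_used_session(schema_tokens: int, used: int, total: int) -> float:
    """Upfront schema tokens paid per session in which the server was actually called."""
    if used == 0:
        return float("inf")  # paid every session, never used
    return schema_tokens * total / used

frequent = schema_tokens_per_used_session(18_000, used=9, total=10)
rare = schema_tokens_per_used_session(18_000, used=1, total=20)
print(f"9 of 10 sessions: {frequent:,.0f} tokens per useful session")
print(f"1 of 20 sessions: {rare:,.0f} tokens per useful session")
```

The same server is eighteen times more expensive per useful session in the second case, which is the whole argument for auditing by usage.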

The recommendation is not to remove all MCP servers. It is to be intentional about which ones stay registered. Keep the ones you use in most sessions, or the ones covering capabilities with no CLI path. Cut the rest.

The audit

Running a quick audit of your MCP setup takes ten minutes and pays back immediately.

MCP audit

Required: List every MCP server currently registered in your agent
Required: Check your last 20 sessions for which servers were actually called
Required: Keep servers called in more than half your recent sessions
Required: For servers called rarely: check whether a CLI equivalent exists
Required: Replace CLI-equivalent servers with a bash invocation
Optional: Keep MCP servers covering capabilities with no CLI path
Anti-pattern: Installing just-in-case servers to cover every possible edge case
Anti-pattern: Adding a server without checking its token cost first

Run this before adding the next server, and before the next quarter begins.
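If you can export which servers each session called, the checklist reduces to a few lines. This is a sketch under assumptions: the server names, the session data, and the set of servers with a CLI equivalent are all hypothetical, and how you extract call logs depends on your harness.

```python
from collections import Counter

def audit(registered, sessions, cli_equivalents):
    """Classify each registered server using the checklist's thresholds."""
    # Count the sessions in which each server was called at least once.
    calls = Counter(name for called in sessions for name in set(called))
    verdicts = {}
    for server in registered:
        share = calls[server] / len(sessions) if sessions else 0.0
        if share > 0.5:
            verdicts[server] = "keep"
        elif server in cli_equivalents:
            verdicts[server] = "replace with bash"
        else:
            verdicts[server] = "keep only if needed (no CLI path)"
    return verdicts

# Hypothetical data: 20 recent sessions and the servers each one called.
sessions = [{"database"}] * 12 + [{"github"}] * 2 + [set()] * 6
verdicts = audit(
    registered=["database", "github", "internal-sso"],
    sessions=sessions,
    cli_equivalents={"github"},
)
for server, verdict in verdicts.items():
    print(f"{server}: {verdict}")
```

The more-than-half threshold comes straight from the checklist; tune it to your own tolerance.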

The audit is a small fix that keeps paying. Once you cut a server, you stop paying its overhead on every turn of every future session. The return is not a one-off saving. It recurs for as long as the server stays out.

The decision rule

MCP or CLI is not a philosophical debate. It is a token budget question with two inputs: how often do you use the capability, and does a CLI equivalent exist?

If the capability is load-bearing in most sessions, or has no clean CLI path, install the MCP server. Accept the overhead as the cost of a real capability gap. If the capability is occasional and a CLI handles it, use bash. The model figures out the flags. It always has.

The broader principle is the same one that makes pi’s minimal harness work: fewer assumptions about what the model will need, and lower cost on the assumptions that prove wrong. Give your agent a shell. Most of what you reach for MCP to provide, the shell already handles.

Common questions

Does MCP vs CLI matter if I have a large context window?

Yes. A larger window does not make waste free. It makes waste easier to ignore.

A server consuming 13k to 18k tokens per session still does so regardless of total window size. In a 200k window that is roughly seven to nine percent. The cost is not a one-time startup fee either: the model re-reads the full context on every single turn, so you pay those schema tokens again on every step of every task.

Context quality matters as much as quantity. Filling a large window with tool schemas the model will not use crowds out real working context: the spec, the diff, the error message the model needs to fix the bug right now. A bigger window lets you ignore the problem longer; it does not fix it.

Can I use both MCP and CLI in the same agent?

Yes, and most production setups do. The goal is not to ban MCP. It is to be intentional about which servers stay registered. Keep MCP for capabilities you use in most sessions, or ones with no CLI equivalent. Route everything else through bash. The two approaches are not mutually exclusive. The audit is about being deliberate rather than accumulating servers reflexively.

How do I estimate the token cost of an MCP server before installing it?

Check how many tools the server registers and how verbose their descriptions are. Each tool definition contributes to the upfront token count. A server registering eight tools with detailed parameter schemas will cost more than one with three simple tools.

The practical method: install the server in a test session, then ask your agent to show its full tool list or system prompt. Count those tokens. A rough estimate is 200 to 500 tokens per tool definition, plus any resource schemas. At that rate ten tools come to roughly 2,000 to 5,000 tokens, so a server in the 13k to 18k range is registering several dozen tools or carrying unusually heavy schemas.
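For a quick estimate without a tokenizer, you can size the serialized tool definitions directly. A rough sketch: the four-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, and the `query_database` tool is a made-up example shaped like an MCP tool entry (name, description, inputSchema).

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer

def estimate_tokens(tool_definitions: list[dict]) -> int:
    """Approximate the context cost of tool definitions from their serialized size."""
    serialized = json.dumps(tool_definitions)
    return len(serialized) // CHARS_PER_TOKEN

# Hypothetical tool definition, for illustration only.
tools = [
    {
        "name": "query_database",
        "description": "Run a read-only SQL query and return rows as JSON.",
        "inputSchema": {
            "type": "object",
            "properties": {"sql": {"type": "string", "description": "SQL to execute"}},
            "required": ["sql"],
        },
    },
]
print(estimate_tokens(tools))
```

For a number you can rely on, run the same serialized text through the tokenizer your model actually uses.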

What does 'progressive disclosure' mean for AI agent tools?

Progressive disclosure means the agent loads information about a tool only when it needs to use it, not before. CLI achieves this naturally: the model has no schema for a CLI tool until it runs --help, which loads just the relevant documentation at that moment. MCP is the inverse: all schemas load at session start, whether or not the model calls those tools. Progressive disclosure keeps the context window reserved for actual work rather than prefilled with capability definitions that may never matter in that session.

Does cutting MCP servers affect what my agent can do?

Only for the specific sessions where you would have used them, and even then only if no CLI path exists. If you cut a server and replace it with bash calls, the agent can still do everything it did before, through shell commands instead of the protocol. For standard development tasks the practical difference is invisible.

The genuine exception is capabilities with no CLI path: OAuth-based APIs with no token-based alternative, stateful resource systems, or structured data navigation that would require building a substantial custom tool to replicate. Keep MCP for those. For everything else, bash is faster to set up and cheaper to run at every turn.

Is the no-MCP approach practical for teams, not just solo builders?

For small teams doing standard development work, it is very practical. The bash tool covers the vast majority of common operations without any extra setup, and the transparency benefit is higher for teams because everyone can read exactly what the agent is given.

Larger teams with shared infrastructure have more legitimate MCP use cases: authenticated access to internal systems, structured resources that are inconvenient through raw CLI, or capabilities requiring persistent state between tool calls. The argument is not that four tools is the answer for every team at every scale. It is that the default instinct to add servers deserves more scrutiny than it usually gets before the install.