My agents had 157 tools and called none of them
Agents swore their tools did not exist. The execution log said otherwise: a cold-spawn race, not a tool problem. Modeling the latency where it actually lived became an open-source MCP pooler.
The goal was simple: get the agents to cite real data natively instead of shelling out or guessing. Every backend was wired. The agents had 157 tools in front of them. They used zero, and they told me why: “the tool does not exist,” “the upstream did not connect.” Confident, detailed, and completely wrong.
I spent an hour debugging a fiction
- I believed the agent. For an hour I debugged the world the agent described. The agent is an unreliable narrator. It reports a story, not a stack trace. The truth was sitting in the execution log the whole time.
- I blamed the tool count. I assumed 157 tools was too many and the search layer was confused. Both felt right until the log killed both. Every “tool does not exist” had a real cause underneath: a discovery race, a session that was not ready yet, a schema mismatch.
- I suspected aggregation overhead. Also wrong. A warm backend answers a
tools/listin about ten milliseconds. Measurement killed the theory.
Cold spawn beat the discovery window
The latency was cold-spawn. Every session was spawning seven backends from cold, and that cost between four and twenty-seven seconds. Discovery had a narrow window, under a second, and the cold spawn blew straight through it. The agent asked for its tools before the backends could answer, got nothing, and narrated a fiction about missing tools.
The fix followed from one reframe: warmth lives in the data layer, not in the
ephemeral client. The CLI client is one-shot, you cannot pool the process. So
put a persistent caching proxy in front of the data: one warm upstream session,
a cached tool listing, discovery back down to about fifty milliseconds, well
inside the window. Keep the admin catalog as a control plane, but stop routing
hot traffic through it. Control plane and data plane want different speeds, so
separate them instead of trading one for the other. That proxy shipped as
open-source: mcp-pooler, MIT.
One habit made the whole night clean: revert early and measure. Every test patch went in with a backup and a checksum, and came back out before the next one, so production stayed honest while I proved and killed theses.
Takeaway
When an agent reports a failure, do not debug the agent, read the execution log. The agent is the narrator; the log is the ground truth. And latency usually hides one layer below where it shows up: the symptom was “no tools,” the cause was a cold start.