A single agent holding everything timed out

One CEO agent holding all context timed out; splitting into focused Head agents with distinct cadences made the fleet cheaper and more robust at the same time.

The CEO agent timed out. Not because the task was hard. Because the context was too wide.

The design had one orchestrating agent holding the full company picture (all five motors, all four quadrants) and making decisions across all of them in a single run. That breadth is what caused the timeout: one agent, too much context, too many judgment calls in one pass.

The fix was to split it into focused Head agents, one per motor, each with its own narrow judgment and its own cadence. Library runs daily. Academy runs when a launch is near. Arsenal and Build run on demand. Each agent holds only what it needs to decide its piece of the work. The fleet got cheaper and more robust at the same time.
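As a minimal sketch of what "each with its own cadence" could look like, the snippet below decides which Head agents are due on a given day. Only the four head names and their cadences come from the text above (the fifth motor is not named here, so it is left out). `FleetState`, `HeadAgent`, `heads_due`, and the 7-day launch window are hypothetical names and values chosen for illustration, not the actual implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, FrozenSet, List, Optional

# Assumption: "a launch is near" means within this many days. Not from the source.
LAUNCH_WINDOW_DAYS = 7


@dataclass(frozen=True)
class FleetState:
    """Hypothetical snapshot of what the scheduler knows today."""
    today: date
    launch_date: Optional[date] = None        # next launch, if one is planned
    requested: FrozenSet[str] = frozenset()   # heads explicitly asked for today


@dataclass(frozen=True)
class HeadAgent:
    """One focused agent: a name plus the rule that says when it runs."""
    name: str
    should_run: Callable[[FleetState], bool]


def launch_is_near(state: FleetState) -> bool:
    if state.launch_date is None:
        return False
    days_left = (state.launch_date - state.today).days
    return 0 <= days_left <= LAUNCH_WINDOW_DAYS


def on_demand(name: str) -> Callable[[FleetState], bool]:
    # Runs only when someone has requested this head by name.
    return lambda state: name in state.requested


HEADS: List[HeadAgent] = [
    HeadAgent("Library", lambda state: True),   # daily
    HeadAgent("Academy", launch_is_near),       # when a launch is near
    HeadAgent("Arsenal", on_demand("Arsenal")), # on demand
    HeadAgent("Build", on_demand("Build")),     # on demand
]


def heads_due(state: FleetState) -> List[str]:
    """Return the names of the heads that should run for this state."""
    return [head.name for head in HEADS if head.should_run(state)]
```

On an ordinary day with no launch planned and no requests, `heads_due` returns only `Library`, which is the point of giving each head its own cadence: most days, most heads do not run.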

One rule came out of this. Same judgment: one agent with lazily loaded skills. Different judgments: different agents.
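The "lazily loaded skills" half of that rule can be sketched as follows: one agent keeps a registry of skills but only pulls a skill's text into context the first time a task needs it. `SkillRegistry`, `build_context`, and the idea that a skill is a block of prompt text are assumptions made for this example; the post does not say how skills are stored or loaded.

```python
from typing import Callable, Dict, List


class SkillRegistry:
    """Hypothetical registry: maps a skill name to a loader that returns its text."""

    def __init__(self, loaders: Dict[str, Callable[[], str]]) -> None:
        self._loaders = loaders
        self._loaded: Dict[str, str] = {}

    def get(self, name: str) -> str:
        # Load on first use only; later calls reuse the cached text.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    @property
    def loaded_names(self) -> List[str]:
        return sorted(self._loaded)


def build_context(base_prompt: str, registry: SkillRegistry, needed: List[str]) -> str:
    """Assemble the agent's context from the base prompt plus only the needed skills."""
    parts = [base_prompt] + [registry.get(name) for name in needed]
    return "\n\n".join(parts)
```

The agent's judgment stays in the base prompt, and skills that a run never asks for never enter the context. That keeps a single agent narrow without splitting it into several.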

What made this harder to see than it should have been: the AI assistant kept arguing against the split. The argument was cost. Five heads means five runs per day, which means five times the model spend. Convergence would save tokens: fewer agents, one unified decision loop.

The counter was correct: different cadences mean you rarely run all five on the same day. Focused context means each agent is less likely to get confused or time out. And the CEO timeout was live proof that the monolith was already failing. The assistant was optimizing for tokens. The operator was defending robustness.
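The cadence argument is simple arithmetic, sketched here with made-up numbers. The run counts below are purely illustrative assumptions, not measurements from this fleet, and only the four heads named earlier are included. The comparison is between "every head runs every day" and "each head runs on its own cadence".

```python
from typing import Dict

DAYS = 30  # length of the comparison window

# Hypothetical number of days each head actually runs in the window.
# Library is daily per the text; the other three values are invented for illustration.
run_days: Dict[str, int] = {
    "Library": 30,  # daily
    "Academy": 5,   # only near a launch
    "Arsenal": 4,   # on demand
    "Build": 3,     # on demand
}


def total_runs(run_days_by_head: Dict[str, int]) -> int:
    """Runs actually made when each head follows its own cadence."""
    return sum(run_days_by_head.values())


def naive_runs(n_heads: int, days: int) -> int:
    """Runs assumed by the 'N heads means N runs per day' argument."""
    return n_heads * days


with_cadences = total_runs(run_days)            # 30 + 5 + 4 + 3 = 42
every_head_daily = naive_runs(len(run_days), DAYS)  # 4 * 30 = 120
```

Under these assumed cadences the fleet makes 42 runs rather than 120, so the "N times the spend" estimate overstates the cost whenever most heads are idle on most days. The real ratio depends on the actual cadences and on per-run context size, neither of which is given here.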

AI models optimize for cost by default. They will keep proposing convergence: fewer agents, fewer calls, shorter context. The operator has to defend robustness. That defense does not show up in the model’s utility function.

Takeaway

Give each agent one judgment and a focused context; the fleet gets cheaper and more robust at the same time.