The 'light' mode that cost six times more

The template was called LIGHT. It was supposed to be the economical path. The full prompt clocked in at around 4.6k input tokens. The LIGHT prompt hit 28.6k.

Why? Without explicit constraints, the agent pulled three large skills from disk, around 90k characters of content, and injected all of it into context. The mode was not lighter. It was just less controlled, and less controlled meant more expensive.

It also broke a code path. A line that told the agent to use the terminal with curl was missing from the LIGHT template. Without it, the agent fell into a Python sandbox, hit a SyntaxError, and stalled. Two bugs for the price of one optimization.

The same session surfaced a second problem. Every agent call was injecting the full body of every skill into the prompt: 14k to 15k tokens per call, for skills that might never get invoked in that session. The fix was lazy loading. Inject only the frontmatter: name and description. Load the full skill body on demand, when the model actually calls for it. The cost dropped sharply.

Both problems had the same root. Someone guessed what would be lighter and did not measure it. The LIGHT template accumulated tokens because it constrained nothing. The skills loaded upfront because it felt safer than deferring. Both were wrong.

Token cost is invisible from inspection. A template named LIGHT can be the expensive one. A lazy-load that feels like added complexity can eliminate 15k tokens per call. The only way to know is to run it and measure.

Takeaway

Never assume the cost of a token optimization. Measure it: the path labeled “light” can be the expensive one.