The marketing numbers were the story someone wanted me to tell

The task was simple: pick the better agent runtime for a fleet running on a VPS where token cost matters. Two candidates, Hermes and OpenClaw. I had a first draft in hand, built on three claims. All three were wrong.

Before comparing anything, I had to split the word “powerful” into two distinct senses: reach (ecosystem breadth, integrations, community pull) and depth (agent loop primitives, execution backends, engineering pedigree). Conflating them would have answered the wrong question. I flagged the split and kept going.

Then the claims, and what the source corrected:

Claim: “One runtime has 3,200 tools.” That number exists, but it belongs to the MCP ecosystem as a whole, not to the runtime. The repo has no such count. A surface read of a marketing page had inflated one side of the comparison before it started.

Claim: “The star gap tells the story.” It does not. The counts are roughly 378k versus 194k, and the gap is real. What I initially read into it was that more stars means more trust and more capability. The code killed that reading. The larger project is a broad personal-assistant platform. The smaller one is a deep agent-engineering toolkit built by an LLM research lab to generate training trajectories for tool-calling models. Star counts measure surface gravity, not the thing I was choosing between.

Claim: “It spends more tokens.” Imprecise. The cost comes from heartbeat mode: an always-on loop that fires every few minutes and, without tuning, can use up to 100K tokens per cycle. With the right settings, the same loop drops to 2K to 5K tokens per cycle. Token cost is a configuration decision, not a property of the runtime.
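To make the size of that configuration gap concrete, here is a rough back-of-the-envelope sketch. The per-cycle figures (100K untuned, 2K to 5K tuned) come from the paragraph above; the 5-minute interval is my own assumption standing in for "every few minutes", and the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def daily_tokens(tokens_per_cycle: int, interval_minutes: float) -> int:
    """Tokens consumed per day by a heartbeat firing every `interval_minutes`."""
    cycles_per_day = MINUTES_PER_DAY / interval_minutes
    return round(cycles_per_day * tokens_per_cycle)


# Assumption: "every few minutes" is taken as 5 minutes, purely for illustration.
INTERVAL_MINUTES = 5

untuned = daily_tokens(100_000, INTERVAL_MINUTES)   # worst case, no tuning
tuned_low = daily_tokens(2_000, INTERVAL_MINUTES)   # tuned, low end
tuned_high = daily_tokens(5_000, INTERVAL_MINUTES)  # tuned, high end

print(f"untuned: {untuned:,} tokens/day")
print(f"tuned:   {tuned_low:,} to {tuned_high:,} tokens/day")
```

Under that assumed interval, the untuned ceiling works out to 28.8 million tokens a day per agent against roughly 0.6 to 1.4 million tuned, a 20x to 50x difference that comes entirely from settings.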

I read both repos directly with the gh CLI: actual file trees, source directories, referenced docs. The method is not exotic. Triangulate the claim, name your own bias out loud, then go to the primary source. In every case here, the primary source was the code.
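For anyone who wants to repeat the file-tree step, one possible sketch follows. It is not necessarily the exact commands used for this comparison: it assumes the gh CLI is installed and authenticated, it uses the GitHub REST git-trees endpoint through `gh api`, and "OWNER/REPO" is a placeholder, not a real slug. The helper names are hypothetical.

```python
import json
import subprocess


def tree_command(repo: str, ref: str = "HEAD") -> list[str]:
    """Build the gh invocation that lists every path in `repo` at `ref`."""
    return ["gh", "api", f"repos/{repo}/git/trees/{ref}?recursive=1"]


def top_level_dirs(tree_json: str) -> list[str]:
    """Return sorted top-level directory names from a git-trees API response."""
    entries = json.loads(tree_json)["tree"]
    return sorted(
        e["path"] for e in entries
        if e["type"] == "tree" and "/" not in e["path"]
    )


def fetch_top_level_dirs(repo: str, ref: str = "HEAD") -> list[str]:
    """Call gh (needs network and auth) and summarize the repo layout."""
    result = subprocess.run(
        tree_command(repo, ref), check=True, capture_output=True, text=True
    )
    return top_level_dirs(result.stdout)
```

Calling `fetch_top_level_dirs("OWNER/REPO")` with a real slug gives a quick picture of what a project actually contains, which is usually enough to check a claim like a tool count against the code before trusting a landing page.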

Takeaway

Marketing numbers are the story someone wants you to tell. Read the source and let the code correct you.