SotaZoo

May 2026 · Draft

AI agents, where they actually help

Draft — text in progress, not yet published.

Every product team has the same poster on the wall right now: 'AI agents in our workflow.' A lot of those agents won't be there in 18 months.

Here's the heuristic we use at SotaZoo to decide whether an agent earns its keep.

Where AI agents pay off

Matching under constraints. Problems with combinatorial structure — pairing players, scheduling resources, routing through a park — get worse for humans as the input grows, but stay tractable for an agent that can reason about thousands of options. TennisMatch sits in this bucket.
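To make "gets worse as the input grows" concrete, here is a minimal sketch in Python. It is illustrative only and is not how TennisMatch works: the function names, the player ratings, and the `blocked` constraint are all hypothetical. It counts the ways to split players into pairs, then searches exhaustively for the pairing with the smallest total rating gap.

```python
def count_pairings(n_players: int) -> int:
    """Ways to split an even number of players into unordered pairs: (n-1)!!"""
    if n_players < 0 or n_players % 2:
        raise ValueError("need a non-negative, even number of players")
    total = 1
    for k in range(n_players - 1, 0, -2):
        total *= k
    return total


def best_pairing(ratings, blocked=frozenset()):
    """Exhaustive search for the pairing with the smallest total rating gap.

    ratings: hypothetical mapping of player name -> skill rating.
    blocked: frozensets of two names that must not be paired (the constraint).
    Returns (total_gap, pairs); pairs is None if no valid pairing exists.
    """
    best = (float("inf"), None)

    def search(remaining, pairs, cost):
        nonlocal best
        if cost >= best[0]:
            return  # prune: this branch cannot beat the best pairing found so far
        if not remaining:
            best = (cost, pairs)
            return
        first, rest = remaining[0], remaining[1:]
        for i, partner in enumerate(rest):
            if frozenset((first, partner)) in blocked:
                continue
            gap = abs(ratings[first] - ratings[partner])
            search(rest[:i] + rest[i + 1:], pairs + [(first, partner)], cost + gap)

    search(sorted(ratings), [], 0)
    return best


if __name__ == "__main__":
    for n in (4, 8, 20):
        print(n, "players:", count_pairings(n), "possible pairings")
    demo = {"ana": 1500, "ben": 1520, "cy": 1800, "dee": 1790}
    print(best_pairing(demo))
    print(best_pairing(demo, blocked=frozenset({frozenset(("ana", "ben"))})))
```

Four players can be paired three ways, which a person can check by hand. Twenty players can be paired more than 650 million ways, and each added constraint makes the by-hand version harder still.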

Compressing the morning brief. Anywhere a senior person spends 60–90 minutes reading and synthesizing to produce a 5-minute summary, the agent does the synthesis 10× faster and the senior person spends the 5 minutes making the decision. That's the actual job of 'AI for research' — not replacing the analyst, but cutting the prep.

Personalized teaching loops. Static documentation is dead the moment a user has a real question. An agent that can address the user's specific situation, with context, is the obvious upgrade, which is why the Claude Code Handbook works better as a published handbook plus an in-IDE agent than either alone.

Repetitive admin with judgment. Triage, follow-up, scheduling, expense classification — the stuff that bores humans but requires light reasoning. Agents are good at this when the cost of a wrong choice is small and human-correctable.

Where AI agents waste time

Pretending to make strategy. Strategy decisions are about which futures to bet against and which to invite. They need taste, courage, and a stake in the outcome. Agents don't have a stake. Anyone shipping a 'strategic agent' is shipping a worse version of an experienced operator who has fallen out of the loop.

Replacing the thing being learned. A beginner gets good by wrestling with the hard parts themselves. Hand the hard parts to an agent and the beginner stays a beginner. Tools should expose the hard parts at the right cadence, not hide them.

Cargo-culting agentic workflows. Multi-agent dialogues, recursive plans, swarm orchestration: all impressive in demos and brittle in production. If a single well-prompted call gets you 90% of the result, nine more agents mostly add latency, cost, and surface area for failure.
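A back-of-the-envelope sketch of why long chains are brittle. It assumes the agents hand off to each other sequentially and fail independently, which real systems only approximate. The 95% per-step success rate and the 4-second step latency are made-up numbers for illustration, not measurements.

```python
def chain_success(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return per_step_success ** steps


def chain_latency(per_step_seconds: float, steps: int) -> float:
    """Total wall-clock time when the steps run one after another."""
    return per_step_seconds * steps


if __name__ == "__main__":
    # Hypothetical figures: each step is right 95% of the time and takes 4 seconds.
    for steps in (1, 3, 10):
        success = chain_success(0.95, steps)
        latency = chain_latency(4.0, steps)
        print(f"{steps:>2} step(s): {success:.0%} end-to-end success, {latency:.0f}s latency")
```

Under those assumptions, ten chained steps succeed end to end only about 60% of the time, even though each individual step looks reliable on its own.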

The test

Before shipping an agent, we ask: would a senior, skeptical operator look at the inputs and outputs and say, 'This actually changes what gets done'? If yes, the agent earns its slot. If the answer is 'this is a wrapper around what I would have asked GPT,' we don't ship it. There's enough of that already.