I went to lunch while it built

The lunch test. I've been rebuilding our website (the current one is the worst). For each page, an agent writes the spec (copy plus component structure) and the build runs in the background while I do something else. This week I went to lunch mid-build and came back to one small tweak before merging. The threshold that makes it work: when output is consistently 95–100% of the spec, your job stops being building and becomes writing specs. I recorded a live demo of the whole loop, from spec to proof. Demo

Writing loops is a ladder. Boris Cherny, who built Claude Code, says he doesn't prompt Claude anymore: "my job is to write loops." There's a command literally called /loop, and assuming that's what he means is the mistake – it's the bottom rung of five. The real work as you climb is verification design. If the output doesn't come with proof the task was done, you're back to reviewing everything by hand, and you haven't saved the time, just moved it. Blog post
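To make "verification design" concrete, here is a minimal Python sketch of a loop that only accepts output when it arrives with proof. It is not how /loop or Claude Code works internally. `build`, `verify`, `Proof` and `run_until_verified` are hypothetical names, and the stand-in functions in the usage example take the place of a real agent call and a real check (a test run, a screenshot diff, a lint pass).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proof:
    passed: bool    # did the output meet the spec?
    evidence: str   # what the check saw, e.g. a test summary


def run_until_verified(
    spec: str,
    build: Callable[[str], str],     # hypothetical: hands the spec to an agent
    verify: Callable[[str], Proof],  # hypothetical: checks the output, returns proof
    max_attempts: int = 3,
) -> tuple[str, Proof]:
    """Retry the build until verification passes, feeding failures back in."""
    feedback = ""
    for _ in range(max_attempts):
        output = build(spec + feedback)
        proof = verify(output)
        if proof.passed:
            return output, proof
        # Give the next attempt the evidence from the failed check.
        feedback = f"\n\nPrevious attempt failed verification: {proof.evidence}"
    raise RuntimeError(f"No verified output after {max_attempts} attempts")


# Usage with stand-in functions (a real setup would call an agent and a test suite).
def fake_build(prompt: str) -> str:
    return "<h1>Pricing</h1>" if "failed verification" in prompt else "<h1>Prcing</h1>"


def fake_verify(output: str) -> Proof:
    ok = "Pricing" in output
    return Proof(ok, "heading matches spec" if ok else "heading text is misspelled")


page, proof = run_until_verified("Build the pricing page", fake_build, fake_verify)
```

The design choice that matters is the return value: the loop hands back the evidence alongside the output, so the review step is reading the proof rather than redoing the check by hand.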

Everything is a spec. A PRD (product requirements doc) is just one type. A well-written bug ticket is a spec too: pass the ticket number to an agent and it can execute end to end, because the ticket says what done looks like. That's the test of whether something qualifies: can you hand it to the background and get proof back?

Don't bother with embeddings. For a markdown knowledge base the agent reads directly, vector search (similarity-based retrieval) is usually overkill. The agent navigates the way you would: search, then follow links to the right files. Claude Code handles even large codebases exactly this way. Add embeddings when retrieval actually starts failing, not when the doc count feels big.
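The search-then-follow-links pattern is simple enough to sketch in plain Python. This is an illustration of the idea under stated assumptions, not what Claude Code runs: `search` and `follow_links` are hypothetical helpers, and the sketch assumes the knowledge base is a folder of `.md` files that link to each other with relative markdown links.

```python
import re
from pathlib import Path

# Matches [text](some/file.md) and [text](some/file.md#anchor); captures the path.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def search(root: Path, term: str) -> list[Path]:
    """Return markdown files under root whose text contains term (case-insensitive)."""
    needle = term.lower()
    return sorted(
        p for p in root.rglob("*.md")
        if needle in p.read_text(encoding="utf-8").lower()
    )


def follow_links(start: Path, max_depth: int = 2) -> list[Path]:
    """Breadth-first walk over relative markdown links, starting from one file."""
    seen = [start.resolve()]
    frontier = [start.resolve()]
    for _ in range(max_depth):
        next_frontier = []
        for page in frontier:
            for target in LINK_RE.findall(page.read_text(encoding="utf-8")):
                linked = (page.parent / target).resolve()
                # Skip external URLs, broken links and pages already visited.
                if linked.is_file() and linked not in seen:
                    seen.append(linked)
                    next_frontier.append(linked)
        frontier = next_frontier
    return seen
```

A typical call would be `follow_links(search(Path("kb"), "pricing")[0])`, where `kb` is a placeholder folder name. There is no index to build or keep in sync, which is most of the appeal.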

On my radar

Fable 5 landed. Anthropic's new top model, out on Tuesday. My early read: slightly slower than Opus and double the usage cost, so I'm reserving it for strategic and long-running build work. It's included on standard plans until 23rd June – worth forming your own view before that window closes. Anthropic

The throttling row. Anthropic is restricting Fable for anyone it reckons is building competing AI – the analogy doing the rounds is AWS throttling your servers because you're building a competing cloud product. Last issue's lesson stands: keep your setup model-agnostic.

Desktop vs CLI cost gap. If Claude Desktop burns through usage faster than the terminal does, this is likely why: desktop loads every connected tool's schema up front (hundreds of tools can mean 100k+ tokens before you've typed a word), while the CLI loads them on demand. Fewer connectors means cheaper sessions.
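A back-of-envelope version of that arithmetic, in Python. The 500-tokens-per-schema average and the tool counts are assumptions picked for illustration, not measured figures; real schemas vary a lot in size, so treat the output as an order of magnitude.

```python
def schema_tokens(tool_count: int, avg_tokens_per_schema: int) -> int:
    """Tokens spent on tool schemas when tool_count schemas are loaded."""
    return tool_count * avg_tokens_per_schema


AVG_TOKENS_PER_SCHEMA = 500  # assumption for illustration only

# Up front: every connected tool's schema is loaded before the first message.
upfront = schema_tokens(200, AVG_TOKENS_PER_SCHEMA)

# On demand: only the tools a session actually reaches for get loaded.
on_demand = schema_tokens(3, AVG_TOKENS_PER_SCHEMA)

print(upfront)    # 100000
print(on_demand)  # 1500
```

Under those assumptions, 200 connected tools cost 100,000 tokens before the first prompt, while a session that touches three tools pays for 1,500.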
