The Gap Between Coding Agent Capabilities & Reality
Coding agents deliver incredible capabilities, but friction remains
Hi all,
Over the last 6 weeks or so, much of my thinking and writing has been about the gaps between the potential of current models and their real-world impact. Coding agents deliver incredible capabilities, but friction remains:
We still rely on cross-platform frameworks: For example, “Why is Claude an Electron App?” With cheap, powerful coding agents we could build fast, reliable, native apps everywhere. But people (including Anthropic!) still rely on cross-platform frameworks.
Specs & tests aren’t enough: Previously, I posited that we might ship software libraries with no code, only tests and specs. In practice, I don’t think this approach is sustainable, if only because the act of implementing the spec reveals the spec’s shortcomings. In reality, we need code, the spec, and tests. The challenge, as I cover in this presentation, is keeping all three in sync. Here’s the write-up, “Learnings from a No-Code Library.”
Slow feedback means we focus on personal tools: Code is so cheap and fast that the only feedback source that keeps pace is your own. So we spend our time building sprawling, idiosyncratic tools for ourselves. This has takeaways for open source builders, companies, and ourselves. Read, “The Cathedral, the Bazaar, and the Winchester Mystery House.”
Assembling context remains tricky & complex: When you type something and hit enter in Claude Code, far more than your words gets fed to the LLM. The leak of Claude Code’s source code gave us a peek at how the software assembles a prompt from a complex decision tree. I broke down how the system prompt is assembled in, “How Claude Code Builds a System Prompt.”
Code is cheap, but securing it is expensive: New assessments of Anthropic’s Mythos Preview, with its dramatic cyber claims, paint a worrying picture. The AI Security Institute’s findings suggest the model doesn’t hit diminishing returns when looking for exploits, which (if true) means that, to harden a system, we need to spend more tokens discovering exploits than attackers will spend exploiting them. As a result, “Cybersecurity Looks Like Proof of Work Now.”
Coding agents are absurdly capable. I continue to believe we could freeze model development today and still be eking out gains for years. But all the stuff around them – security, user feedback, harnesses, organizations, communities, and support – means having a magic code machine alone isn’t enough.
The first cars looked a lot like carriages because they had to drive on cart paths and use horse infrastructure. They were called “high wheelers.”

From 1907 to 1912, these things were the auto industry. Seventy-five different manufacturers produced them.
Only after paved roads, gas stations, drive-ins, driveways, and the assembly line did cars truly transform society.
Coding agents are in their high-wheeler era.
Art Break

I try to keep these art breaks off topic, but these pictures of an underground data center beneath Stockholm are too cool.
From Wikipedia:
Pionen is a data center deep below 30 meters of granite, with three physical datalinks into the mountain. Also, Pionen is located in Central Stockholm, with 1,100 square meters of space. Pionen features fountains, greenhouses, simulated daylight and a huge salt water fish tank. Its data center has two backup power generators, which are actually submarine engines.
According to the architects, the design is mostly inspired by the 1972 sci-fi flick, Silent Running (“and a bunch of Bond films with Ken Adams set design.”)
Until next time,