
Collin's Thoughts

April 21, 2026

The Only Transformation You Control

Not the model. The layer above it.

Anthropic shipped Claude Opus 4.7 last Wednesday.

Per the release, agentic coding jumped: 87.6% on SWE-bench Verified (up from 80.8% on 4.6) and 64.3% on SWE-bench Pro (up from 53.4%). Multidisciplinary reasoning went from 40.0% to 46.9% on Humanity's Last Exam without tools.

But not everything improved: agentic search on BrowseComp dropped to 79.3% from 4.6's 83.7%. Still, the gains land on the same shape of work 4.6 was already doing, which means fewer stuck loops and longer autonomous runs without you changing anything.

What the headlines skipped: a new tokenizer consumes 1.0–1.35x as many input tokens for the same content, and Claude Code now defaults to a higher effort tier, xhigh, which adds output tokens on later turns.

Price per million tokens held steady. Price per task probably didn't, and if you simply defaulted to the new model this week, that's the trade you signed up for.
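Back-of-the-envelope, with hypothetical prices and midpoint multipliers (nothing here is Anthropic's actual rate card): hold the per-token price fixed, apply a tokenizer multiplier to input and an effort multiplier to output, and the per-task bill moves even though the list price doesn't.

```python
def task_cost(input_tokens, output_tokens,
              price_in=5.0, price_out=25.0,         # $/M tokens, hypothetical
              tokenizer_mult=1.2, effort_mult=1.3):  # illustrative midpoints
    """Estimate per-task cost after a tokenizer/effort-tier change."""
    cost_in = input_tokens * tokenizer_mult * price_in / 1_000_000
    cost_out = output_tokens * effort_mult * price_out / 1_000_000
    return cost_in + cost_out

# Same task, same list price: the multipliers alone move the bill.
before = task_cost(200_000, 30_000, tokenizer_mult=1.0, effort_mult=1.0)
after = task_cost(200_000, 30_000)
print(f"before ${before:.2f}, after ${after:.2f}")
```

Under these made-up numbers the same task costs roughly 25% more, which is the gap between "price held steady" and "cost held steady."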

Why It Matters

Margaret Boden split creative work into three levels.

  • Combinatorial: recombine things that already exist, like every summarize-your-meeting tool shipped in the last two years.
  • Exploratory: find new points inside a known rule system, which is what agentic coding does when it iterates toward a passing test.
  • Transformational: change the rules of the space itself, which is what happens when someone invents cubism or calculus.

Every LLM release to date has been combinatorial or exploratory. 4.7 is solidly better than 4.6, and still not transformational.

Pete Koomen's essay AI Horseless Carriages expands on this. His argument is that most AI products are also combinatorial: bolt a language model onto an email client, onto a spreadsheet, onto a search bar, ship the same UI with a chat box glued to the side. Name an AI feature you used this week and it probably fits. Koomen's transformational reframe is that users should be able to edit the system prompt directly: "Most AI apps should be agent builders, not agents."

That's what a convention file is. CLAUDE.md and AGENTS.md are editable system prompts that live alongside the code. Teams treating them properly (reviewed, versioned, measured) have already done the transformational work at the workflow layer. When 4.7 landed, they could point to which workflows improved and which stalled. Teams still pasting prompts into Slack can't say what changed, only what they paid.
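A minimal sketch of what "treated properly" can look like. Every path, owner handle, and rule below is illustrative, not a prescribed schema; the point is that the file is reviewed, versioned, and tied to something you measure.

```markdown
# CLAUDE.md  (lives in the repo; changes go through PR review)

## Conventions
- Run `make test` before proposing an edit; never skip failing tests.
- Prefer small, single-purpose commits with descriptive messages.

## Boundaries
- Do not touch `infra/` or anything under `migrations/` without asking.

## Measurement
- Owner: @collin. Each revision notes which workflow it was meant to
  improve, so model upgrades can be compared against a baseline.
```

Because the file is versioned, "what changed when 4.7 landed" is a diff, not a guess.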

Not sure where you stand? Take the 2-minute AI Readiness Quiz →

When Capability Meets Weak Scaffolding

Opus 4.7 shipped with a reminder. Elliot Arledge woke up Saturday to find Claude had deleted project directories during an overnight session. I wrote up the three foundational protections worth locking down before handing any agent more capability:

Tweet from @CollinWilkins7 quote-tweeting Elliot Arledge about the Opus 4.7 incident, listing three foundational protections: scope permissions, learn git and commit often, and run automated backups.

More capable models expand your potential. The basics (scoped permissions, git discipline, real backups) are what keep a model upgrade from turning into a weekend disaster.

From the Mailbag

A subscriber named Maurizio wrote in last week asking whether the DOE framework (Directives, Orchestration, Execution) still matters now that Claude shipped Skills, Managed Agents, and Cowork. Fair question.

The gist: Skills are essentially a 1:1 replacement for the manual directives layer, with better ergonomics. If that were all DOE offered, it would be obsolete.

What I learned writing up the full answer: the value moved. Orchestration (multi-model routing, role separation, pipelines that aren't just Claude) is still custom work. The outer loops are what Skills can't touch: persistent memory, session recaps, weekly audits that run between sessions and let the system compound. A stronger harness is what protects you when the model wobbles, and the Hesamation analysis in Worth Reading is that wobble showing up in the data.

The Part Most People Miss

The next model release will be combinatorial too, and probably the one after that. Waiting for the transformational model is the wrong game. Transformation at the model layer is rare, and most teams wouldn't know what to do with it if they got it tomorrow. The transformation available right now is at the workflow layer: conventions, measured workflows, and owners who can name what changed. That's Boden's transformational move applied to operations instead of to the model, and it's the only lever every team controls.


Worth Reading

2025 AI Adoption Benchmark — Worklytics. Marketing teams at 70-85% AI tool adoption; finance and sales at 65-80%. Every function is already using AI. The differentiator isn't access anymore, it's depth of use.

financial-services-plugins — Anthropic. 41 purpose-built skills for equity research, investment banking, PE, and wealth management, installable without writing code. An example of the skills layer shifting what "AI adoption" looks like for non-engineering functions.

agents.md — Community. A catalog of real AGENTS.md and CLAUDE.md files. If you're at Level 1 and need a starting template, copy one of these and iterate from there.

Getting Started with Claude Design — Muzli. Independent walkthrough with a bullish take on Anthropic's new design collaborator; calls the design-to-code handoff "the game changer." Worth testing for an afternoon, but watch the token burn. If you've already wired Figma MCP into Claude Code, you'll likely get more throughput per dollar from that setup.

Claude performance under load: notes from an analysis — via @Hesamation. An AMD senior AI director analyzed Claude session logs from January through March and reported:

  • Median thinking depth dropped from ~2,200 to ~600 characters
  • API retries up 80x from February to March
  • Reads-per-edit fell from 6.6x to 2.0x (the model stopped investigating code before touching it)
  • "Should I continue?" bail-outs went from 0 to 173 in 17 days starting March 8
  • Self-contradictions in reasoning tripled
  • 5-7pm PST were the worst hours, late-night significantly better, pointing to GPU-load-sensitive throttling

Conventions like CLAUDE.md get ignored when there's less thinking budget to cross-check edits. Even a well-crafted system prompt needs a model that can afford to read it.


I just opened 5 AI Readiness Assessment spots at $99 (normally $250) while I build out case studies. You get a 1-hour working call, a written action plan with the 3-4 highest-ROI next steps for your team, a custom AI Adoption Playbook adapted to your stack, and a ready-to-commit AGENTS.md for your main repo. (First 5 only) Book one →

— Collin

Don't miss what's next. Subscribe to Collin's Thoughts:
collinwilkins.com
LinkedIn