Collin's Thoughts

June 17, 2026

The tools are already good enough

SpaceX keeps launching, Fable got restricted, and the rest of us need to stop spectating.

Everyone is still arguing about whether AI is "really" good yet, and I'm tired of it. People are waiting for the official AGI ribbon-cutting while the tools already sitting on their desks beat average human output on a big share of real work.

Two things happened this week that should settle it...

SpaceX Falcon 9

The most valuable company bought the coding tool

Look at what SpaceX did next. It went public at a record valuation, and its first big move after the IPO was a $60 billion deal for Anysphere, the company behind Cursor, the AI code editor a lot of engineers now write code in. You can argue about Elon all day, but don't bet against him.

My read on this: a $60 billion all-stock bet only makes sense if the plan is bigger than a code editor. SpaceX and Cursor have reportedly been training a model together, and the real target is a general-purpose agent app, something that competes with Claude Desktop and Codex for the whole surface, not just the IDE. When that interface lands, you want your skills, your context files, and your workflow to move with you instead of getting rebuilt from scratch.

People laughed at SpaceX landing boosters on barges right up until it was routine, and they make the same mistake with AI tools. They judge the whole category by the last bad answer from a lazy prompt, while someone else wraps the same model in context and a verification loop and quietly gets three times the work done.

Fable's short lifespan

The Fable story is messy. Anthropic launched Claude Fable 5 on June 9 and called it the most capable model it had ever made generally available. Three days later the US government issued an export-control directive, and Anthropic had to disable access for everyone while it complied.

I don't think the government handled it well. But notice what actually happened: a model release went from product launch to geopolitical incident in 72 hours. Governments don't move that fast over a toy. Whatever your view on the politics, that's a signal about the model's capability, not about the product launch. There are also some interesting theories that Anthropic planned this from the start...

Fable Fan Theory

The clawback complaints miss the point

I get why people are mad when a tool gets nerfed, metered, or pulled. But Fable being gone doesn't make Claude Code, or the other models, useless. A model getting pricier doesn't make Codex useless. A provider tightening limits doesn't make your stack incapable. It means you want a setup that swaps the model and keeps the workflow. That problem is already solved: OpenRouter Fusion does exactly this, and I walked through the setup last issue.
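To make "swap the model, keep the workflow" concrete, here is a minimal Python sketch. It does not use OpenRouter or any real provider API; `Model`, `EchoModel`, `UnavailableModel`, and `run_with_fallback` are hypothetical names for illustration. The context and task stay fixed, the model is just a parameter, and a pulled model falls through to the next one on the list.

```python
from typing import List, Protocol, Sequence, Tuple


class Model(Protocol):
    """Hypothetical interface: anything with a name that turns a prompt into text."""

    name: str

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Hypothetical stand-in for a working provider; it just echoes the prompt."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class UnavailableModel:
    """Hypothetical stand-in for a model that was pulled or rate-limited."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        raise RuntimeError(f"{self.name} is unavailable")


def build_prompt(context: str, task: str) -> str:
    # The workflow lives here: the same context and task text,
    # no matter which model ends up answering.
    return f"{context}\n\nTask: {task}"


def run_with_fallback(models: Sequence[Model], context: str, task: str) -> Tuple[str, str]:
    """Try each model in order; return (model name, output) from the first that works."""
    prompt = build_prompt(context, task)
    errors: List[str] = []
    for model in models:
        try:
            return model.name, model.complete(prompt)
        except RuntimeError as exc:
            errors.append(str(exc))
    raise RuntimeError("all models failed: " + "; ".join(errors))


if __name__ == "__main__":
    name, output = run_with_fallback(
        [UnavailableModel("model-a"), EchoModel("model-b")],
        context="House style: short sentences.",
        task="Draft the weekly update.",
    )
    print(name)  # model-b
```

The design choice is the point: the prompt builder never knows which model it's feeding, so losing one model changes a list entry, not the workflow.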

AGI is probably already here

By that, I mean the practical version: a general-purpose system that beats an average human on most bounded knowledge-work tasks when you give it context, tools, and a way to check the result.

Karpathy put the mechanism cleanly.

Old software automates what you can specify. New software automates what you can verify.

That's why engineering went first, and it had nothing to do with engineers being special. Code is checkable: a test passes or it doesn't. The moment a task has a right answer a machine can confirm, it becomes automatable. That is how capability spreads to every function with a clear pass/fail check, such as research with citations, financial reconciliation, data cleanup, and structured writing.

By that read, I think we may already be there for most of the work a knowledge team does in a week. That doesn't mean the tools are safe to run unsupervised, but average human output was never a high bar, and a well-scoped AI workflow clears it more often than people want to admit.
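The "automate what you can verify" idea boils down to a small loop: generate a candidate, run a check a machine can confirm, and only accept what passes. Here is a minimal Python sketch under that assumption. `generate_until_verified` and `matches_ledger` are hypothetical helpers, the ledger numbers are made up, and the list of drafts stands in for a model call rather than any real API.

```python
from typing import Callable, Optional


def generate_until_verified(
    generate: Callable[[int], str],
    verify: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate that passes `verify`, or None if none do.

    `generate` receives the attempt number so a real implementation could
    feed back what failed; here it only stands in for a model call.
    """
    for attempt in range(max_attempts):
        candidate = generate(attempt)
        if verify(candidate):
            return candidate
    return None


LINE_ITEMS = [120, 75, 305]  # hypothetical ledger entries


def matches_ledger(candidate: str) -> bool:
    # Machine-checkable: the reported total either reconciles or it doesn't.
    try:
        return int(candidate) == sum(LINE_ITEMS)
    except ValueError:
        return False


# Toy "model" outputs: a wrong total, a non-number, then the right one.
drafts = ["490", "about five hundred", "500"]
result = generate_until_verified(lambda i: drafts[i], matches_ledger)
print(result)  # 500
```

Nothing in the loop cares how good the generator is on any single try. The verifier does the quality control, which is why tasks with a clear pass/fail check get automated first.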

Benchmark

Check out this benchmark for a specific example: compare YOUR feel for a model against how the same model does in an omp setup. Gemini, GLM, and Kimi are closer to GPT and Opus than you think.

The move

Stop asking whether the next release qualifies as real AGI. Ask whether your current setup can clear five real tasks.

The "Already Good Enough" Test: take five real tasks from last week, whatever your actual work looks like, such as a first draft, a research pull, a spreadsheet or a bit of code, or a PowerPoint deck. Hand each one to your current AI setup and score it 0-2 on four questions:

Question Score
Did it produce a usable first pass? 0-2
Did it cut your time by at least half? 0-2
Was the review light enough to be worth it? 0-2
Could a teammate run the same workflow from your instructions? 0-2

That's a maximum of 40 points (5 tasks x 8 points each). Score 25 or higher and the model isn't your bottleneck; your workflow is. Score under 25 and you need to fix the inputs first.
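If you'd rather keep score in a script than on paper, here is a minimal Python sketch of the tally. The function name, the verdict strings, and the sample week are hypothetical; only the 0-2 scale, the four questions, the five tasks, and the 25-point cutoff come from the test itself.

```python
from typing import Sequence, Tuple

TASKS = 5
QUESTIONS = 4
MAX_PER_QUESTION = 2
THRESHOLD = 25  # 25 or higher: the model isn't the bottleneck
MAX_TOTAL = TASKS * QUESTIONS * MAX_PER_QUESTION  # 5 x 4 x 2 = 40


def score_test(scores: Sequence[Sequence[int]]) -> Tuple[int, str]:
    """Total a 5-task x 4-question score sheet and return (total, verdict)."""
    if len(scores) != TASKS:
        raise ValueError(f"expected {TASKS} tasks, got {len(scores)}")
    total = 0
    for task in scores:
        if len(task) != QUESTIONS:
            raise ValueError(f"expected {QUESTIONS} scores per task, got {len(task)}")
        if any(not 0 <= s <= MAX_PER_QUESTION for s in task):
            raise ValueError("each score must be 0, 1, or 2")
        total += sum(task)
    verdict = "workflow is the bottleneck" if total >= THRESHOLD else "fix the inputs first"
    return total, verdict


# Hypothetical week: each row is one task, scored on the four questions in order.
week = [
    [2, 2, 1, 1],
    [2, 1, 2, 2],
    [1, 1, 1, 0],
    [2, 2, 2, 1],
    [1, 0, 1, 1],
]
print(score_test(week), "out of", MAX_TOTAL)  # (26, 'workflow is the bottleneck') out of 40
```

Keeping the per-task rows, rather than one running number, also shows you which task and which question dragged the total down.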

If you scored low

If you ran the test and scored low, that's the work I can do. I help engineering and ops teams turn the current tools into usable workflows: context files, model routing, review loops, cost controls. Reply with the one task you want to stop doing by hand, and I'll tell you whether the missing piece is the model, the context, the tooling, or the verification loop.

Quote of the Week

"Every system is perfectly designed to get the results it gets." — Paul Batalden

If your current setup keeps handing you mediocre output, the system you built around it is doing exactly what it's designed to do. That's the good news. A workflow is a thing you can redesign, and you don't need a new model to start.

— Collin

Don't miss what's next. Subscribe to Collin's Thoughts:
collinwilkins.com
LinkedIn