The Moat That Wasn't: When Every AI Lab Ships the Same Model
The Observation Post
27 May 2026 · 8 min read
Two years ago, which model was best actually mattered. You had to pick one of GPT-4, Claude, Gemini, Llama, or DeepSeek, and the choice came with real trade-offs. Better at coding meant worse at creative writing. Cheaper meant dumber. Open meant less capable. These were real decisions with real consequences.
That era is ending. The flagship models from the major labs are now close enough that the question "which is smarter" is becoming hard to answer. Not because they're all excellent — though some are — but because the differences are smaller than the noise in any specific evaluation. The gap that existed between GPT-4 and everything else in 2023 is gone. What remains is a tight cluster of roughly comparable systems. The competitive dynamic has shifted from who can build the smartest model to who can serve it fastest, cheapest, and most reliably.
This is not a claim I make lightly. I have been tracking model benchmarks since before ChatGPT existed, and I was as sceptical as anyone when people started saying the frontier had converged. But the data is hard to argue with. On Chatbot Arena Elo scores, MMLU-Pro, GPQA, SWE-bench, and HumanEval, the spread between the top 5 models has shrunk from 20+ points to single digits. On some benchmarks, the gap between 1st and 5th place is smaller than the benchmark's own error bars.
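To make the error-bar point concrete, here is a rough sketch of the arithmetic, not a reproduction of any leaderboard's methodology. It treats a benchmark score as a simple proportion of questions answered correctly and assumes the two models' errors are independent, which is a simplification. The function names, the 200-question benchmark size, and the scores are all hypothetical, chosen only for illustration.

```python
import math


def accuracy_standard_error(accuracy: float, n_questions: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


def gap_within_noise(acc_a: float, acc_b: float, n_questions: int, z: float = 1.96) -> bool:
    """Return True if the score gap is no larger than z standard errors.

    Assumption: both models are scored on the same number of questions and
    their errors are treated as independent (a simplification).
    """
    se_a = accuracy_standard_error(acc_a, n_questions)
    se_b = accuracy_standard_error(acc_b, n_questions)
    # Standard error of the difference between two independent proportions.
    se_gap = math.sqrt(se_a**2 + se_b**2)
    return abs(acc_a - acc_b) <= z * se_gap


if __name__ == "__main__":
    # Hypothetical scores on a hypothetical 200-question benchmark.
    print(gap_within_noise(0.62, 0.58, 200))  # a 4-point gap
    print(gap_within_noise(0.80, 0.60, 200))  # a 20-point gap
```

Under these assumptions, the 4-point gap falls inside the noise band and the 20-point gap does not. On a small benchmark, a single-digit lead says very little about which model is actually better.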
The scale of AI capability has broadened significantly. But the question is shifting from "how smart" to "how fast, how cheap, and how reliable." (Image: CC BY-SA 4.0)
How We Got Here
Look at what happened to get to this point. In 2022, GPT-3.5 was the only game in town for fluent text generation. In 2023, GPT-4 arrived and the gap was enormous: Claude could match it on safety and tone, Gemini on multimodality, but nobody had the whole package. By early 2024, the gap had narrowed. By late 2024, DeepSeek had proven that a Chinese lab with a fraction of the budget could match frontier performance. By early 2025, the gap had collapsed.
The reasons are structural. The transformer architecture is well understood. The training recipes have been reverse-engineered. The scaling laws are public knowledge. Any well-funded lab can replicate the essential approach, and the improvements that separate one model from another — better data filtering, better RLHF tuning, better post-training — are getting harder to keep proprietary. Papers get published, weights leak, and the techniques diffuse faster than anyone can patent them.
The irony is that the open-source ecosystem accelerated this collapse. Meta releasing Llama 3 and Llama 4 as open weights meant that every lab on earth could start from a strong baseline and fine-tune upward. The gap between a fine-tuned open model and a proprietary flagship shrank from a chasm to a crack in about 18 months.
The Real Moats Were Never Intelligence
The thing that keeps getting airbrushed out of VC pitch decks: if your competitive advantage is "our model is smarter," you have about 6 months before someone catches up. The real defensible advantages look nothing like model quality.
Inference infrastructure. The difference between a 100ms response and a 500ms response is visible to users. The difference between $0.10 per million tokens and $0.50 is visible to CFOs. The labs that own their hardware — or have negotiated exclusive access to the best chips — can undercut everyone else on price while maintaining margins. This is why Microsoft's investment in OpenAI was never just about having the best model. It was about making sure the best model also ran cheapest on Azure.
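A quick back-of-the-envelope sketch shows why that price gap gets a CFO's attention. The two per-million-token prices are the ones quoted above. The monthly volume of 50 billion tokens and the function name are hypothetical, picked only to illustrate the scale.

```python
def monthly_inference_cost(tokens_per_month: int, usd_per_million_tokens: float) -> float:
    """Monthly spend in USD for a given token volume and per-million-token price."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Hypothetical workload: 50 billion tokens per month.
    tokens = 50_000_000_000

    cheap = monthly_inference_cost(tokens, 0.10)
    pricey = monthly_inference_cost(tokens, 0.50)

    print(f"At $0.10 per million tokens: ${cheap:,.0f} per month")
    print(f"At $0.50 per million tokens: ${pricey:,.0f} per month")
    print(f"Difference over a year: ${(pricey - cheap) * 12:,.0f}")
```

At that assumed volume the bill is $5,000 a month versus $25,000 a month for the same workload. Once the models are roughly interchangeable, a fivefold difference like that decides the purchase.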
Data flywheels. Every query processed is a training signal. Google has this in spades — every search, every YouTube transcript, every Google Doc. Meta has the social graph. But OpenAI, Anthropic, and Mistral have to buy or license their data. The companies with proprietary data pipelines can iterate faster and cheaper.
Integration depth. Apple's models don't need to be the best in the world. They just need to be good enough and deeply embedded in iOS, macOS, and the Apple ecosystem. Google's Gemini has the same advantage across Workspace, Android, and Chrome. A model that is 5% worse but lives in every text field on your phone is far more useful than a model that is 5% better behind an API.
Safety and trust. This is an underrated moat. Anthropic's entire positioning (safety first, Constitutional AI, transparent development) is a bet that enterprise buyers and regulators will pay a premium for a model they trust. It is working. Governments and financial institutions are signing contracts with Anthropic not because Claude is technically superior, but because Anthropic has the cleanest safety record.
What This Means for the Companies Involved
Each lab faces a different strategic question now that the intelligence gap is dead.
OpenAI has the most to lose. Their entire brand was built on being first and being best. They were the frontier. Now they are one of several. The pivot to reasoning models (o1, o3) was an attempt to widen the gap again, but DeepSeek replicated the reasoning approach within months. OpenAI's real bet is the consumer brand, since ChatGPT has more users than any competitor by a wide margin, and the enterprise agreements tied to Azure. That might be enough. It is not the commanding position it was two years ago.
Anthropic bet on safety as differentiation and it paid off. They own the high-trust market. The question is whether that market is big enough to sustain a $60B valuation. I suspect it is, but the ceiling is lower than OpenAI's mass-market ambition. Claude is the Volvo of AI models — safe, reliable, a bit boring, beloved by people who care about safety. That is a good business. It is not a trillion-dollar business.
Google should be winning this narrative and somehow keeps fumbling it. Gemini is technically strong — competitive on benchmarks, deeply multimodal, integrated into the entire Google ecosystem. But the product execution has been uneven. The launches have been messy. The brand trust is lower than it should be. If Google gets its act together, the distribution advantage is overwhelming. That is a big if.
Meta does not need to win on capability. They need Llama to be good enough that the ecosystem builds around it, which is exactly what is happening. Llama is the Linux of AI — not the most polished, not the most feature-rich, but open, everywhere, and good enough that the ecosystem becomes self-sustaining. Meta's moat is the community.
DeepSeek proved that a well-funded outsider can match the frontier. Export controls constrain their access to hardware, but their engineering is exceptional. The question for them is whether they can turn technical excellence into commercial traction outside China. The geopolitical headwinds are real, and most Western enterprises will think twice before routing data through a Chinese API.
Mistral is the wild card. Small, fast, European, open-by-default. If the EU regulation landscape makes it hard for American companies to operate freely, Mistral benefits. If it does not, Mistral stays a niche player. Their efficiency advantage is real — they ship models that compete with much larger ones using fewer parameters — but efficiency is a feature, not a business model.
The User Perspective
For people who actually use these models, convergence is mostly good news. You do not have to worry as much about picking the wrong horse. The quality floor is high across the board. Price competition is fierce. Models are getting faster. The ecosystem of tools, integrations, and SDKs is expanding rapidly.
But convergence has a downside. If every model converges, the differentiation moves up the stack to data, integration, and trust — which means the winners will be the companies that already have users, not the ones with the cleverest research. That is good for incumbents and bad for the startups that bet everything on being the smartest model in the room.
The AI-native startups that raised billions with "we have a better model" as their whole pitch are facing an existential question. If your model is 2% better today and someone matches it next quarter, what do you actually own? The ones that survive will have built something around the model: a workflow, a data advantage, a distribution channel. The ones that did not will quietly shut down.
A Generation of Applied AI
The model convergence story sounds like a letdown if you expected AGI to arrive in a blaze of exponential improvement. But I think this is the most interesting development in AI since the transformer was invented. When the model itself stops being the moat, the entire industry shifts from research to engineering. From "can we make it smarter" to "what can we actually build with it."
The value gets created downstream. Not in the next half-percent of MMLU, but in the applications, workflows, and systems that sit on top of models that are good enough and getting cheaper by the quarter. The companies that figure out what to do with these tools are going to make a lot more money than the companies that build them.
I keep coming back to the same thought. When GPT-3 came out in 2020, OpenAI was basically alone at the frontier. When GPT-4 launched in 2023, there were maybe three competitors anywhere near its capability. Today, there are at least six models that would have been the best in the world two years ago. The gap is gone. The moat is gone. What remains is a competitive landscape where the winners will be determined by execution, not intelligence. And that is a much healthier place to be.