The Observation Post logo

The Observation Post

Archives
Log in

The Open Model Revolution — When AI Stopped Needing Permission

The Observation Post

Tech · AI · Cyber · Defence

The Open Model Revolution — When AI Stopped Needing Permission

25 May 2026 · 8 min read

Open Model Revolution

The Quietest Revolution in AI

Something shifted in the last six months, and most people didn't notice.

Open-weight AI models — freely downloadable, runnable on your own hardware — crossed a threshold. They stopped being "almost as good" as the proprietary APIs and started being genuinely competitive. For some tasks, they're better. For privacy-sensitive use cases, they're the only rational choice.

The numbers tell the story. Llama 4 from Meta, Qwen 3 from Alibaba, DeepSeek V4, Mistral Large — every major lab now releases open-weight models that score within 2-5% of GPT-4o and Claude Opus on benchmarks. And they run on a single workstation, not a datacenter.

What Changed

Three things happened at once:

1. Training efficiency improved dramatically. DeepSeek proved you could train a frontier model for under $6M — a tenth of what OpenAI spends. Their Mixture-of-Experts architecture meant the model only activates 37B of its 671B parameters per token, slashing inference costs.

2. Quantization became practical. Running a 70B model at 4-bit precision uses 35GB of VRAM — that's a single NVIDIA RTX 6000 Ada, or two consumer GPUs. A year ago, you needed a cluster.

3. The API pricing dropped below cost. Providers like Together AI, Fireworks, and Groq now serve open models at prices that make running your own seem uneconomical — until you hit scale. At 10M tokens/day, self-hosting breaks even. At 100M/day, it's 5x cheaper.

Real Implications

For privacy: Healthcare, legal, finance, and defense organizations can now run state-of-the-art AI entirely within their own network. No data ever leaves. The EU's AI Act and India's DPDP Act make this more than a preference — it's becoming a compliance requirement.

For cost: A company processing 1 billion tokens per month pays roughly $5,000 on GPT-4o. Self-hosting DeepSeek V4: about $800 in GPU rental. That's an 84% savings, every month.

For control: When OpenAI changes their API, your app can break. When they deprecate a model, you migrate on their timeline. With open models, you freeze the version, fine-tune it on your data, and deploy it behind your own API — total control.

The Catch

Open models are not free in effort. You need someone who can set up vLLM or llama.cpp, configure GPU infrastructure, and handle updates. The total cost of ownership includes engineering time, not just GPU hours.

But the ecosystem is maturing fast. One-click deploys through RunPod, Banana, and Replicate. Managed inference APIs that let you bring your own model. Apple's MLX running 70B models on a MacBook. The friction is dropping every quarter.

What This Means

The debate isn't "open vs closed" anymore. It's "when do you make the switch." For most organizations, the answer is: start experimenting now, plan the migration for Q3 2026, and by Q4 you'll wonder why you ever paid per token.

The open model revolution isn't coming. It's already here, running on someone else's GPU while their competitors are still counting tokens.

Read on web →

The Observation Post — daily posts on tech, AI, and what matters.

Don't miss what's next. Subscribe to The Observation Post:
Powered by Buttondown, the easiest way to start and grow your newsletter.