Awesome Agents Weekly: Anthropic Overtakes OpenAI, MCP Under Fire

Your weekly roundup of the most important AI developments, benchmarks, and tools.

April 21, 2026

This week had two dominant stories running in parallel: a historic revenue flip at the top of the AI industry, and a wave of serious security findings across the agent ecosystem. Anthropic crossed $30B ARR and overtook OpenAI for the first time - while researchers and auditors found critical flaws in MCP servers, agent routers, and at least two major platforms. Claude Opus 4.7 also shipped, Amazon doubled down with a $25B commitment, and the Stanford AI Index dropped its bleakest transparency data yet.
Pick of the Week
Anthropic Passes OpenAI in Revenue at $30B ARR
This one matters beyond the number itself. For the first time since ChatGPT launched generative AI into the mainstream, a rival has outpaced OpenAI on revenue - and that rival is Anthropic, which has spent most of its life positioning itself as the safety-first alternative to the move-fast crowd. Crossing $30B ARR, ahead of OpenAI's $24B run rate, is both a commercial milestone and a signal about what enterprise buyers actually want. The timing - simultaneous with Amazon's additional $25B commitment and news of Dario Amodei meeting senior White House officials - makes it hard to read this as anything other than a genuine inflection point for the industry.
This Week on Awesome Agents
News
Industry and Business

Anthropic Passes OpenAI in Revenue at $30B ARR - Anthropic's annualized revenue crossed $30 billion in April, overtaking OpenAI's $24 billion run rate for the first time.
Amazon Bets $25B on Anthropic and 5GW of Trainium - Amazon adds up to $25B more to its Anthropic stake, with Anthropic committing over $100 billion to AWS infrastructure over the next decade.
Stanford 2026 AI Index: Cash In, Transparency Out - Global AI investment hit $581B in 2025 while foundation model transparency scores fell by a third.
AI Labs Are Losing Billions - Here's Who Really Pays - OpenAI burned $2.5B cash on $4.3B revenue in H1 2025; Anthropic cut gross margin forecasts from 50% to 40%.
Cursor Targets $50B Valuation - Enterprise Now Pays the Bills - Cursor is in advanced talks for a $2B+ raise at $50B pre-money, nearly double its November figure, with enterprise clients now driving 60% of revenue.
Factory Raises $150M to Scale Enterprise AI Droids - Factory closed a $150M Series C at $1.5B to expand autonomous agents handling full software development lifecycles.
TSMC Q1: $35.9B Record as AI Now Powers 61% of Revenue - AI and HPC now account for 61% of TSMC's wafer sales, with CoWoS packaging still fully booked.
74% of AI's Gains Flow to Just 20% of Firms - PwC - A PwC survey of 1,217 executives finds 74% of AI's economic returns concentrate in just 20% of companies, while 56% of CEOs report no measurable benefit.

Model Releases

Claude Opus 4.7 Is Here - Less Supervision, Better Vision - Anthropic releases Claude Opus 4.7 with 3x higher resolution vision, a new xhigh effort level, task budgets for cost control, and cyber safeguards.
Kimi K2.6 - Open Weights, 300 Agents, Top Coding Score - Moonshot AI releases Kimi K2.6 under Modified MIT with open weights, 300-agent swarm execution, and the top SWE-Bench Pro score among open models.
Alibaba's Qwen3.6-Max Ships Closed - Tops Six Coding Evals - Alibaba's first closed-weights flagship ranks third globally on the Artificial Analysis Intelligence Index while topping six coding benchmarks.
OpenAI Releases GPT-Rosalind for Drug Discovery - A frontier reasoning model for biology that outranked human experts on RNA prediction and competes directly with AlphaFold.
Physical Intelligence Launches π0.7 for Untrained Tasks - PI's robot model generalizes to never-trained tasks by recombining skills compositionally, matching specialist fine-tunes.
NVIDIA Lyra 2.0 - Explorable 3D Worlds from One Photo - NVIDIA's Spatial Intelligence Lab released Lyra 2.0, a 14B model that turns a single photograph into a navigable 3D environment under a research-only license.

Security

MCP's STDIO Flaw Puts 200K AI Servers at Risk - Ox Security found MCP's STDIO transport executes arbitrary OS commands before validating the server, exposing 200K+ instances across every major AI coding tool.
The Claw Security Ledger: 10 Products in the Dock - An audit of ten Claw-branded AI agent products found 11 live CVEs, 130 published advisories, 1,184 malicious marketplace skills, and one leaked SSL private key - concentrated almost entirely in a single vendor.
9 of 428 LLM Routers Were Secretly Hijacking Agent Calls - UC Santa Barbara researchers found 9 of 428 third-party LLM routers injecting malicious tool calls, draining crypto, and stealing AWS credentials from agent sessions.
Lovable Users Report Leak of Chats, Code, Credentials - Free Lovable accounts can still read other users' AI chat histories, source code, and database credentials on projects created before November 2025.
Vercel Breach Traced to AI Office Suite OAuth Token Theft - Vercel confirms an April 19 intrusion that pivoted from compromised Context.ai OAuth tokens into internal systems holding customer environment variables.
MCP Marketplace Audit: 32% of Servers Are Stale - An audit of 11,447 MCP servers across four registries found nearly a third haven't been touched in six months.

Policy and Geopolitics

NSA Uses Mythos Even as Pentagon Blacklists Anthropic - The NSA is running Anthropic's Mythos Preview while its parent department, the Pentagon, fights to keep Anthropic out of federal systems.
The Left Hand Bans What the Right Hand Deploys - The Trump administration is simultaneously suing Anthropic over a supply chain risk designation and sending Treasury officials to convince banks to use Claude.
Trump Says 'Who?' as His Own Staff Courts Anthropic - Dario Amodei met with Susie Wiles and Scott Bessent at the White House while Trump, on a Phoenix runway, said he had "no idea" about the meeting.
Japan Forms $6B AI Alliance to Rival US and China - SoftBank, Sony, Honda, and NEC formed Japan AI Foundation Model Development with $6.3B in government backing to build a trillion-parameter physical AI model.
Google Bids for Pentagon's Classified Gemini Contract - Google is negotiating to deploy Gemini on classified Pentagon networks - the same tier Anthropic was blacklisted for refusing to serve without safeguards.

Infrastructure and Developer Tools

Snap Fires 1,000 as AI Now Writes 65% of Its Code - Snap cut 16% of its workforce citing AI-produced code as the direct cause; the stock jumped, 1,000 employees didn't.
A $900 RTX 3090 Now Beats an M5 Max at LLM Inference - Researchers fused all 24 layers of Qwen 3.5-0.8B into a single CUDA kernel, delivering 1.8x the throughput of an M5 Max - the gap was software, not silicon.
Linux Kernel Finally Sets Rules for AI-Assisted Code - Linux 7.0 ships an official AI code policy: disclose AI tool usage with an Assisted-by tag and keep humans accountable for every line.
Anthropic Safety Overseer Gets Board Majority at Last - Anthropic's Long-Term Benefit Trust appointed Novartis CEO Vas Narasimhan, giving its independent safety overseers a board majority for the first time.
OpenAI Gives Codex Desktop Control and 111 Plugins - Codex now runs background computer use on Mac, adds in-app browsing, image generation via gpt-image-1.5, and 111 new plugins.
Claude Code Desktop Gets a Ground-Up Rebuild for Parallel Work - Anthropic rebuilt Claude Code's desktop app with an integrated terminal, in-app file editing, a diff viewer, SSH on Mac, and parallel session management.
Cal.com Closes Its Source Code, Blames AI Hackers - Cal.com moved its core codebase private after five years of open source, arguing AI tools make public code 5-10x easier to exploit.
Claude Beat Human Alignment Researchers - Then Failed - Nine Claude Opus 4.6 agents hit 97% on a core alignment benchmark vs. 23% for humans - then showed no statistically significant improvement in production.

Reviews

Claude Opus 4.7 Review: Coding Giant, Mixed Signals - Leads SWE-bench and agent benchmarks but regresses on web research, inflates token costs by up to 35%, and trades prose quality for literal instruction-following.
GPT-5.4-Cyber Review: Defensive AI, Controlled Access - A fine-tuned defensive security model with lowered refusal thresholds and binary reverse engineering, but access is identity-gated through the Trusted Access for Cyber program.
GLM-5.1 Review: Open-Source Model Tops SWE-Bench Pro - Z.ai's 754B open-weight model claims the top spot on SWE-Bench Pro without a single NVIDIA chip - here's how it holds up in practice.

Guides

How to Use AI for Travel Planning in 2026 - A beginner's guide to building full trip itineraries with AI, from destination selection to day-by-day schedules and packing lists.
How to Build AI Presentations - A Beginner's Guide - How to use Gamma, Canva, and PowerPoint Copilot to build polished decks in minutes, even without design experience.

Tools
This week brought a large directory refresh across 40+ tool categories. A few standout additions:

Best Open-Weights AI Models 2026 - Top picks by size tier, from 400B+ MoE giants to 1B edge models, with benchmark scores and deployment hardware.
Best AI Observability Tools 2026 - LangSmith, Langfuse, Arize Phoenix, WhyLabs, and more compared across LLM tracing, eval, and production monitoring.
Best Open-Source LLM Inference Servers 2026 - vLLM, SGLang, TGI, llama.cpp, and TensorRT-LLM benchmarked head to head.
Best AI Deep Research Tools 2026 - OpenAI, Claude, Perplexity, Gemini, Grok, Exa, and Elicit compared for accuracy and pricing.
Best AI Fine-Tuning Platforms 2026 - 14 managed and open-source platforms with verified pricing, supported methods, and a decision matrix.

Leaderboards
A full leaderboard refresh dropped across 20+ benchmark categories. Key updates:

Overall LLM Rankings: April 2026 - Comprehensive ranking combining reasoning, coding, knowledge, and cost-adjusted value across 12 frontier and open-weight models, updated with Claude Opus 4.7 and Qwen 3.6.
SWE-Bench Coding Agent Leaderboard 2026 - Pass rates, pricing, and scaffold notes for the top software engineering agents, updated with Claude Opus 4.7 and Kimi K2.6.
Jailbreak and Red-Team Resistance Leaderboard - How 14 frontier LLMs hold up against adversarial prompts, injection, and harmful-behavior elicitation across HarmBench, AdvBench, and AgentHarm.
Web Agent Benchmarks Leaderboard: Apr 2026 - Verified scores for browser-driving AI agents across WebArena, WebVoyager, BrowseComp, Mind2Web, and more.
Vision-Language Benchmarks: Image Reasoning Ranked - AI model rankings on MMMU, MathVista, ChartQA, DocVQA, and more, updated to reflect Claude Opus 4.7's vision improvements.

Science

LeCun's JEPA World Model Plans 47x Faster on One GPU - LeWorldModel strips JEPA world models to two loss terms, trains 15M parameters on a single GPU in hours, and plans roughly 47x faster than DINO-WM.
Distillation Leaks, Weak Agents, and Research Sabotage - New papers show distillation silently transfers unsafe behaviors, weak agents bottleneck multi-agent pipelines, and frontier AI can't reliably audit sabotaged ML research.
MoE Routing, Prompt Gambles, and Where Reasoning Breaks - Three papers challenge assumptions in MoE routing design, prompt optimization workflows, and LLM reasoning chains.
LLM Chaos, AI Peer Review, and Auto Fine-Tuning - Floating-point chaos in transformers, GPT-5 reviewing 22,977 AAAI papers, and an agent that automates LLM fine-tuning better than human experts.
Compact Contexts, Smarter Fine-Tuning, and the Solver Trap - A joint fix for KV cache bloat and attention cost, new evidence that fine-tuning belongs in the middle of a transformer, and why stronger reasoning hurts behavioral simulation.
MoE Myths, Context Compression, and Steering Proofs - Three papers challenge how we think about MoE expert routing, LLM context management, and the limits of activation steering.

Models

Claude Opus 4.7 - Anthropic's latest flagship with 3x higher resolution vision, xhigh effort level, and 13% better coding at unchanged pricing.
Kimi K2.6 - Moonshot AI's 1T-parameter MoE with 32B active per token, 300-agent swarm execution, and the top SWE-Bench Pro score among open weights.
Qwen3.6-Max-Preview - Alibaba's first closed-weights flagship with 256K context, topping six agentic coding benchmarks and ranking third on the global intelligence index.
Qwen 3.6-35B-A3B - A 35B sparse MoE activating only 3B parameters per token, scoring 73.4% on SWE-bench Verified with vision and video support under Apache 2.0.
GPT-5.4-Cyber - OpenAI's defensive security fine-tune, scoring 88.23% on professional CTFs, with access gated through the Trusted Access for Cyber program.
GPT-Rosalind - OpenAI's first domain-specific reasoning model for biology and drug discovery, with a 0.751 BixBench score, available as a US-only research preview.
Veo 3.1 - Google DeepMind's 4K video model with native audio, now free for every Google account at 10 clips per month via Google Vids.
Gemini 3.1 Flash TTS - Google's voice model with 30 voices, 70+ languages, 200+ inline audio tags, and Elo 1,211 on the Artificial Analysis TTS Arena.
EXAONE 4.5 - LG AI Research's 33B open-weight vision-language model with 262K context and STEM scores above GPT-5-mini, under a non-commercial research license.

Elena Marchetti, Senior AI Editor
Awesome Agents - AI news, benchmarks, and tools for practitioners
