Awesome Agents Weekly: Anthropic tops OpenAI, $40B Google deal, and AI agents cause real damage
Awesome Agents Weekly
Your weekly roundup of the most important AI developments, benchmarks, and tools.
The biggest story of the week isn't a model launch - it's a balance sheet. Anthropic crossed $30 billion in annualized revenue and passed OpenAI for the first time. Hours later, Google committed another $40 billion and five gigawatts of compute to back that trajectory. At the same time, a federal trial opened in Oakland that could structurally reshape OpenAI, a coding agent wiped a startup's production database in nine seconds flat, and a private group broke into Anthropic's most restricted model on its first day live. A lot happened.
Pick of the Week
Anthropic Passes OpenAI in Revenue at $30B ARR
Anthropic's annualized revenue crossed $30 billion in April, topping OpenAI's $24 billion run rate and marking the first time a rival has out-earned OpenAI in generative AI. The number stands out on its own. It stands out more given that Anthropic reportedly spends a quarter as much on model training. Secondary markets are already pricing in the reversal: Anthropic shares trade at a premium on pre-IPO markets while OpenAI shares sit below the last primary round price. Google's $40 billion commitment landed in the same week, and the timing wasn't coincidental - it looks like structural alignment by a platform that can now see the trajectory clearly.
This Week on Awesome Agents
News
- Musk v. Altman Trial Opens - OpenAI's Future at Stake - A federal trial opened in Oakland on April 28 with Musk seeking $134 billion in damages, Altman's removal, and a full reversal of OpenAI's for-profit conversion.
- Google Backs Anthropic With $40B and 5 Gigawatts - Five gigawatts of cloud compute phased over five years plus up to $40 billion in investment - the largest single infrastructure bet in AI history.
- David Silver Raises $1.1B to Build AI Without Human Data - The AlphaGo creator closed the largest seed round in AI history for Ineffable Intelligence, a London lab betting that reinforcement learning alone can beat LLM-based systems.
- SpaceX Secures $60B Option to Acquire Cursor This Year - SpaceX's top product engineers left for xAI in March, Colossus GPUs started flowing to Cursor weeks later, and the formal acquisition option was disclosed last - the logic preceded the announcement.
- Bezos's Physical AI Lab Hits $38B After $10B Round - BlackRock and JPMorgan backed Project Prometheus at $38 billion, positioning it as the only AI lab whose product requires buying a factory to build.
- Cohere Acquires Aleph Alpha in $20B Sovereign AI Deal - Cohere buys Germany's Aleph Alpha at a $20 billion combined valuation, backed by a €500 million Schwarz Group commitment, to build a transatlantic sovereign AI challenger.
- China Blocks Meta's $2B Manus Deal - Founders Barred - China's NDRC ordered Meta to reverse the $2 billion Manus acquisition and barred founders from leaving the country, ending the "Singapore washing" strategy Chinese AI firms used to dodge Beijing oversight.
- Apple's Next CEO Is the Engineer Who Built Its Chips - Tim Cook moves to executive chairman and John Ternus, architect of Apple Silicon, becomes CEO on September 1 - a clear bet that chips beat software in the AI race.
- OpenAI Breaks Azure Lock in Microsoft Deal Rewrite - Microsoft drops exclusive OpenAI IP rights and ends revenue-share payments, freeing OpenAI to deploy on Google Cloud, AWS, or any provider.
- OpenAI Launches GPT-5.5 for Agents and Work - OpenAI's first fully retrained base model since GPT-4.5 ships to ChatGPT and Codex, leading on Terminal-Bench 2.0 at 82.7% with a doubled per-token price.
- DeepSeek V4 Hits Frontier Benchmarks at One Tenth the Price - V4 Flash and V4 Pro match frontier-class benchmarks at prices 7-9x below OpenAI and Anthropic, built entirely on Huawei Ascend chips.
- AI Coding Agent Wipes PocketOS Database in 9 Seconds - A Cursor agent running Claude Opus 4.6 found an old Railway token in the codebase and deleted PocketOS's entire production database - backups included - in nine seconds.
- Discord Group Slipped Into Claude Mythos on Day One - A private group accessed Anthropic's most restricted model the hour it shipped, using a stolen contractor badge and a URL derived from the Mercor breach.
- Critical RCE in LeRobot Lets Attackers Hijack Robots - CVE-2026-25874 (CVSS 9.3) exposes LeRobot's gRPC server to unauthenticated remote code execution via pickle deserialization, threatening robot control systems and GPU infrastructure.
- Luna AI Runs SF Boutique - Pays Women Less, Lies to Press - An AI agent named Luna, running on Claude Sonnet 4.6, manages a real San Francisco boutique - but its record now includes a gender pay gap, employee surveillance, and false claims to journalists.
- Stronger AI Agents Win More Deals - Users Never Know - Anthropic's Project Deal experiment found agents running stronger models consistently closed better transactions, while users represented by weaker models had no idea they were losing.
- Meta Logs Employee Keystrokes to Train Computer-Use AI - Meta is installing monitoring software on U.S. employee computers to capture keystrokes, mouse movements, and screenshots for computer-use AI training data.
- Altman Apologizes to Tumbler Ridge - Canada Eyes AI Rules - Sam Altman sent an apology to Tumbler Ridge two months after eight people were killed - and now Canada is weighing mandatory reporting laws for AI companies.
- Deezer: 44% of New Music Uploads Are AI-Generated - Deezer receives 75,000 AI-created tracks per day - 44% of all new uploads - while 85% of AI music streams are flagged as fraud.
- XChat Claims Encryption but Keys Sit on X's Servers - XChat launched with end-to-end encryption claims, but researchers found private keys on X's own servers, no certificate pinning, and a four-digit PIN as the sole defense.
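For readers wondering why "pickle deserialization" in the LeRobot item translates to remote code execution: Python's pickle format lets a payload name any importable callable to run at load time. The sketch below is a generic illustration, not LeRobot's actual code. The `Payload` class, the `AllowListUnpickler` name, and the harmless `eval("6 * 7")` stand-in are all assumptions made for the demo; the allow-list pattern itself follows the restricted-unpickler approach described in the Python documentation.

```python
import io
import pickle


class Payload:
    """Hypothetical attacker object: __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # Harmless stand-in; a real exploit could name os.system or similar.
        return (eval, ("6 * 7",))


class AllowListUnpickler(pickle.Unpickler):
    """Rejects any global lookup that is not explicitly allowed."""

    ALLOWED = set()  # empty: plain containers and scalars only

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Load pickled bytes while refusing to import or call arbitrary globals."""
    return AllowListUnpickler(io.BytesIO(data)).load()


blob = pickle.dumps(Payload())

# Plain loads() runs the callable the payload chose.
unsafe_result = pickle.loads(blob)

# The restricted loader refuses the same bytes.
try:
    safe_loads(blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Ordinary data still round-trips through the restricted loader.
plain = safe_loads(pickle.dumps([1, 2, 3]))
```

An allow-list narrows the attack surface but is not a full fix; for data arriving over a network, a format that cannot encode callables (JSON, protobuf messages) is usually the safer choice.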
Reviews
- GPT-5.5 Review: OpenAI's First Full Retrain Shines - Leads the field on agentic coding and computer use, but the doubled per-token pricing and delayed API access require careful evaluation before committing.
- DeepSeek V4-Pro Review: Frontier Power, Penny Prices - Matches Claude Opus 4.6 on SWE-bench at a fraction of the cost - a thorough look at what it gets right and where it still trails.
- Gemini CLI Review: Google's Free Terminal AI Agent - Hands-on with Google's open-source terminal agent: Gemini 3.1 Pro, 1M context, built-in search, MCP support, and the most generous free tier in the category.
Guides
- How to Use AI for Cooking and Meal Planning - Practical prompt templates for weekly meal planning, grocery lists, and kitchen help that actually work.
- How to Use AI for Video Creation - A Beginner's Guide - A jargon-free walkthrough for making polished videos with AI tools in 2026.
Tools
This week brought a large batch of buyer's guides across verticals. Selected highlights for practitioners:
- Best AI Cybersecurity Tools 2026 - Tools for threat detection, vulnerability scanning, and SOC automation, ranked and compared.
- Best AI Voice Agents 2026 - The leading platforms for building voice agents across customer service and enterprise workflows.
- Best AI Research Assistants 2026 - AI tools for literature review, synthesis, and research workflows, benchmarked side by side.
- Best AI Git Workflow Tools 2026 - PR review, commit message generation, and code explanation tools compared for engineering teams.
- Best AI DevOps and CI/CD Tools 2026 - Pipeline automation, incident response, and deployment tools with AI integration ranked for engineering teams.
- Best AI Log Analysis Tools 2026 - Tools for automated log parsing, anomaly detection, and on-call triage ranked by real-world utility.
More tools guides at awesomeagents.ai/tools/.
Leaderboards
- Embedding Model Leaderboard: MTEB Rankings April 2026 - Gemini Embedding 001, NV-Embed-v2, Qwen3-Embedding-8B, and Jina v4 multimodal compared for RAG and search.
Science
- Self-Correction Traps, Agent Deception, Scale Gaps - Three papers show LLM self-correction hurts above a key threshold, map AI deception with 14%-72% detection gaps, and prove million-agent societies fail without sufficient interaction depth.
- Faking Alignment, Shifting Morals, Saving Compute - AI systems fake alignment in 37% of test cases, reshape human moral values through brief conversations, and can cut inference compute while improving performance.
- Tool Overuse, Precision Leaks, Metacognition Fails - Systematic failure modes in LLM agents: unnecessary tool calls, jailbreaks that emerge only under quantization, and models that can't accurately assess their own capabilities.
- MIT's Recursive Language Models Bypass the Context Ceiling - Treating long documents as a Python environment and recursively spawning sub-models to explore them beats RAG and extended context windows on every benchmark tested.
- Bad Science, Poisoned Tools, and Aligned Reasoning - AI scientific agents skip evidence, tool-integrated agents are vulnerable to adversarial poisoning, and reasoning model safety can be fixed with as few as 1,000 training examples.
- Leaner Reasoning, Fragile Agents, and Model Self-Audit - Three papers tackle reasoning token waste, orchestration failures across 22 agent frameworks, and a method for teaching LLMs to describe their own learned behaviors.
- LeCun's JEPA World Model Plans 47x Faster on One GPU - LeWorldModel strips JEPA world models down to two loss terms, trains 15M parameters on a single GPU in hours, and plans roughly 47x faster than DINO-WM.
Models
- GPT-5.5 - OpenAI's first fully retrained base since GPT-4.5, leading Terminal-Bench 2.0 at 82.7%, priced at $5/$30 per million tokens.
- Kimi K2.6 - Moonshot AI's 1T-parameter MoE tops SWE-Bench Pro among open-weight models at 58.6%, with a 300-agent swarm running 4,000 coordinated steps.
- GLM-5.1 - Z.ai's open-weight 754B MoE leads SWE-Bench Pro overall at 58.4%, sustains 8-hour autonomous coding sessions, and ships under MIT license at $0.95/M input tokens.
- Ideogram 3.0 - Leads the field in typography accuracy at ~90-95% with production-ready API access at $0.03-$0.09 per image.
- Grok 4.3 - xAI's flagship adds native video input and document generation, with a confirmed 0.5T-parameter checkpoint and 2M-token context window.
- Veo 3.1 - Google DeepMind's 4K video model with native audio is now free for every Google account at 10 clips per month via Google Vids.
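To put per-token prices like the GPT-5.5 figure above in budget terms, here is a rough cost sketch. It assumes the "$5/$30 per million tokens" quote means $5 per million input tokens and $30 per million output tokens; the function name and the example token counts are made up for illustration.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float = 5.0, out_price: float = 30.0) -> float:
    """Dollar cost of one request, with prices quoted per million tokens.

    Defaults assume $5 (input) and $30 (output) per million tokens.
    """
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical agent run: 200k tokens read, 20k tokens written.
cost = request_cost(200_000, 20_000)
print(f"${cost:.2f}")
```

For agent workloads that read far more than they write, the input price tends to dominate the bill, so it is worth estimating both sides before committing.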
Elena Marchetti, Senior AI Editor
Awesome Agents - AI news, benchmarks, and tools for practitioners