Awesome Agents Weekly: Verdict, IPO, and Anthropic Takes the Lead
Your weekly roundup of the most important AI developments, benchmarks, and tools.
The Oakland jury took less than two hours to reject every claim Elon Musk brought against OpenAI, and the trial testimony - Altman's revelation that Musk's opening ask was 90% equity - lands harder than the verdict itself. Anthropic had perhaps its strongest week to date: it surpassed OpenAI in business adoption for the first time, spent more than $300M to acquire SDK startup Stainless, and committed $200M to the Gates Foundation for global health AI. Cerebras hit the Nasdaq with a 68% debut pop, Google I/O unveiled Android XR glasses with four hardware partners, and arXiv started handing out year-long bans for unvetted AI-generated submissions.
Pick of the Week
Cerebras $5.5B IPO Pops 68% - Biggest US Tech Debut Since 2020
Cerebras raised $5.55B and surged 68% on Nasdaq, the largest US tech debut since Snowflake in 2020. The company's wafer-scale chips occupy a real niche - dramatically faster inference than GPU clusters for the right workloads - and market enthusiasm reflects genuine appetite for alternatives to Nvidia's supply-constrained ecosystem. A 200x revenue multiple and a backlog dominated by a single customer are the two facts that will matter more than the debut pop once the lockup period ends. If Cerebras can diversify its customer base before the euphoria fades, the valuation has a path; if it can't, the gap between the story and the financials will be hard to ignore.
This Week on Awesome Agents
News
- Musk Loses OpenAI Lawsuit - Jury Rejects All Claims - A unanimous Oakland jury dismissed all three counts in under two hours, finding them barred by the statute of limitations.
- Altman: Musk Wanted 90% of OpenAI From the Start - Altman testified that Musk's opening demand was 90% equity, a detail that reframes the entire founding partnership dispute.
- Anthropic Tops OpenAI in Business Adoption for First Time - Ramp's May 2026 AI Index shows 34.4% of businesses now pay for Anthropic, edging out OpenAI at 32.3% in a first-ever lead for the younger company.
- Anthropic Acquires Stainless SDK Startup for $300M - Anthropic bought the SDK automation startup behind developer tooling used by OpenAI, Google, and Cloudflare for more than $300M.
- Android XR Glasses Land with Samsung and Warby Parker - Google unveiled three Android XR smart glasses form factors at I/O 2026, backed by Samsung, XREAL, Warby Parker, and Gentle Monster as launch partners.
- Gemini Omni Leaks Before I/O - Inside Google's Video Plans - Google's next video model surfaced in the Gemini UI a week before I/O, showing editing-first features including in-chat watermark removal and object swapping.
- Open Agent Leaderboard: Model Beats Architecture - IBM Research tested 25 agent configurations across 6 real-world benchmarks and found backbone model choice matters 58x more than agent framework design.
- arXiv Hits Researchers With 1-Year Ban for AI Slop - arXiv is issuing one-year submission bans to authors whose papers contain verifiably unvetted AI output, as fabricated academic citations have increased tenfold since 2023.
- xAI Runs 46 Gas Turbines Near Memphis - NAACP Sues - xAI is operating 46 gas turbines at its Southaven data center, five above its state permit, as the NAACP seeks an emergency court order over Clean Air Act violations.
- Thinking Machines Builds AI That Listens While Talking - Mira Murati's startup unveiled TML-Interaction-Small, a 276B MoE model that reaches 0.40-second response latency by listening and generating speech at the same time.
- Anthropic and Gates Foundation Put $200M on Global AI - Anthropic commits $200M in grants, Claude credits, and technical support to the Gates Foundation for AI in global health, education, and agriculture over four years.
- ChatGPT Gets Bank Access - Day After Data Lawsuit Filed - OpenAI launched ChatGPT Personal Finance with read access to 12,000+ banks via Plaid one day after a class action alleged it had shared user conversations with Meta and Google.
- Microsoft Drops Claude Code for GitHub Copilot Desktop - Microsoft is canceling thousands of Claude Code licenses for engineers in its Experiences + Devices division and replacing them with GitHub Copilot Desktop.
- Google Turns to SpaceX for Orbital AI Data Centers - Bloomberg reports Google is in talks with SpaceX to launch Project Suncatcher satellites, TPU-equipped spacecraft designed to run ML workloads in low Earth orbit.
- Suleyman Claims AI Takes White-Collar Jobs in 18 Months - Microsoft AI CEO Mustafa Suleyman says professional jobs face automation within 18 months, but independent research paints a more complicated picture.
- US and China Agree to AI Guardrails at Beijing Summit - Trump and Xi announced plans for a formal AI safety channel at their Beijing summit, though expert skepticism and stalled chip deals complicate the picture.
- OpenAI May Sue Apple as ChatGPT-Siri Deal Implodes - OpenAI hired outside lawyers to explore a breach-of-contract case after its ChatGPT-Siri integration failed to generate anything near the expected revenue.
- Cisco Bets $9B on AI Networking, Cuts 4,000 Jobs - Cisco reported record quarterly revenue of $15.8B and immediately announced 4,000 layoffs, raising its FY2026 AI infrastructure order target from $5B to $9B.
- OpenAI Ships Codex Mobile App for iOS and Android - OpenAI's Codex coding agent arrives on iPhone and Android as a remote control for desktop sessions, with QR code pairing and live terminal output for its 4 million weekly users.
- Vapi Raises $50M After Amazon Ring Picks It Over 40 Rivals - Vapi closed a $50M Series B at a $500M valuation after Amazon Ring routed 100% of its inbound calls through the platform, chosen over 40 competing voice AI vendors.
- Notion 3.5 Turns the Workspace Into an Agent Hub - Notion 3.5's Developer Platform adds Workers, live database sync, and first-class Claude Code and Cursor support, turning the workspace into an AI agent orchestration layer.
- Raindrop Workshop Gives AI Agents a Local Debugger - Raindrop's MIT-licensed Workshop streams every token and tool call from your AI agent to a local browser dashboard, then lets Claude Code write and fix evaluations automatically.
Reviews
- Augment Cosmos Review: Building the Agent OS - Augment Cosmos enters public preview as a team-level operating system for AI-driven development, though $200 per developer per month is a steep bet on ambition over proven ROI.
- SubQ Review: 52x Faster, but Show Your Work - Subquadratic's SubQ claims the first linear-scaling LLM with a 12M-token window, but private beta access and a 17-point MRCR gap make independent verification the only test that matters.
Guides
- How to Use AI for Note-Taking: A Beginner's Guide - A practical walkthrough covering AI tools for meeting transcription, summarization, and building a searchable personal knowledge base.
- How to Use AI for Legal Documents - A Beginner's Guide - A jargon-free guide to reading, summarizing, and understanding legal documents with AI, without replacing your lawyer.
Tools
- Best GitHub Copilot Alternatives in 2026 - Seven tested alternatives ranked by use case, with verified pricing and honest trade-offs.
- Best Devin Alternatives in 2026 - Seven autonomous coding agent alternatives compared on autonomy, pricing, and workflow fit.
- Best Cursor Alternatives in 2026 - Seven verified Cursor alternatives compared on IDE support and real workload fit, from Windsurf to open-source tools like Cline and Aider.
- Best ChatGPT Alternatives in 2026 - Eight alternatives compared on pricing, context limits, and real-world performance, from Claude and Gemini to DeepSeek and self-hosted setups.
- Best Claude Alternatives in 2026 - Seven models compared on API cost, context window, coding performance, and data privacy.
- Best Perplexity Alternatives in 2026 - Seven alternatives compared on citation quality, pricing, and research depth, from ChatGPT Search to developer-focused Exa.
- Best Replit Agent Alternatives in 2026 - Seven alternatives compared on stack support, pricing, and deployment, from Bolt.new and Lovable to Cursor and Windsurf.
- Best Notion AI Alternatives in 2026 - Seven alternatives compared on AI depth, pricing, and data control, including open-source options like Obsidian and Anytype.
- Best Midjourney Alternatives in 2026 - Seven alternatives compared on image quality, pricing, and commercial rights, from FLUX.1 Pro and Ideogram 3.0 to free options.
- Best Jasper Alternatives in 2026 - Seven alternatives compared on price, writing quality, and workflow fit.
- Best AI-Integrated Offensive Security Tools in 2026 - Ten offensive security tools ranked by AI integration depth, from Burp Suite and Nuclei to Ghidra and Metasploit.
- Best AI Tools for Teachers and Educators in 2026 - A roundup of K-12 and higher education AI tools with current pricing and honest limitations.
- Best AI Tools for Lawyers in 2026 - Seven tools compared on pricing and task fit, from CoCounsel and Lexis+ AI to Briefpoint and EvenUp.
- Best AI Tools for Healthcare in 2026 - Seven tools for doctors compared on clinical accuracy and task fit, from OpenEvidence and Glass Health to Freed.
- Best AI Tools for Sales Teams and SDRs in 2026 - Seven tools compared on pricing, autonomy, and pipeline fit, from Gong and Salesloft to 11x.ai and Nooks.
- Best AI Tools for Recruiters and HR in 2026 - Seven tools compared on sourcing depth and workflow fit, from Manatal and Workable to Ashby and Humanly.
- Best AI Tools for Real Estate Agents in 2026 - Seven tools compared on pricing, workflow fit, and ROI, from lead nurturing to predictive seller targeting.
- Best AI Tools for Marketing Agencies in 2026 - Seven tools compared on pricing and white-label fit, from AgencyAnalytics to Madgicx and Supermetrics.
- Best AI Tools for Accountants and CPA Firms in 2026 - Seven tools compared on practice management, audit automation, and tax workflow.
- Best AI Tools for Construction in 2026 - Seven tools compared on workflow stage and ROI, from Procore and Autodesk Build to Togal.AI and OpenSpace.
Leaderboards
- GAIA Benchmark Leaderboard: Best AI Agents May 2026 - Rankings of AI models and agent frameworks on GAIA, which tests real-world multi-step tasks requiring web browsing, tool use, and multi-hop reasoning.
Science
- Self-Correcting Models, Smarter Monitors, AI Designs Itself - Three papers tackle critique dependency in LLMs, ensemble monitoring for AI control, and agents that autonomously discover better neural architectures.
- Physics Predicts AI Risk, Math Still Hard, Tokens Saved - A physics formula predicts AI behavioral shifts before they happen, LLMs still fail at 90% of graduate math formalization tasks, and a training-free method cuts synthetic data costs by up to 78%.
- Olympiad Gold, Broken Memories, and Attention Loss - A 30B model earns IMO gold, memory consolidation silently corrupts agents, and a new metric predicts when LLMs lose track of their instructions.
Models
- Gemini 2.5 Pro - Google DeepMind's flagship thinking model with a 1M-token context window, 84% GPQA Diamond, and native multimodal understanding across text, images, audio, and video.
- Claude Opus 4.5 - Anthropic's November 2025 flagship with top SWE-bench scores, a new effort parameter for reasoning control, and a 66% price cut from its predecessor.
- GPT-4.1 - OpenAI's coding-optimized API model with a 1M-token context window, 54.6% on SWE-bench Verified, and pricing of $2/$8 per million input/output tokens.
- Claude Haiku 4.5 - Anthropic's fastest model with 73.3% on SWE-bench Verified and first-in-family extended thinking and computer use at $1/$5 per million tokens.
- Claude 3.7 Sonnet - Anthropic's first hybrid reasoning model with togglable extended thinking, a 200K context window, and state-of-the-art SWE-bench performance.
- SU-01 - Shanghai AI Lab's 30B MoE reasoning model that earns gold-medal performance on IMO 2025, USAMO 2026, and IPhO 2024/2025.
- HiDream-O1-Image - An 8B open-source text-to-image model that outperforms 32B FLUX.2 [dev] across five major benchmarks.
- SubQ - The first LLM built on a fully subquadratic attention architecture, with a 12M-token research context and 52x faster inference than FlashAttention at 1M tokens.
Elena Marchetti, Senior AI Editor
Awesome Agents - AI news, benchmarks, and tools for practitioners