Awesome Agents Weekly: Musk Admits Grok Used OpenAI Models
Awesome Agents Weekly
Your weekly roundup of the most important AI developments, benchmarks, and tools.
The week's biggest story unfolded in an Oakland courtroom, where Elon Musk testified under oath that xAI "partly" distilled OpenAI's models to train Grok - the same practice US labs have spent months condemning when Chinese firms do it. That wasn't the only billion-dollar news: Anthropic is reportedly eyeing a $900B valuation on a potential $50B raise, and OpenAI countered reports it missed its own revenue targets. AI medicine posted two strong peer-reviewed studies, a Cursor agent wiped a startup's production database in nine seconds, and the week's security scorecard was rough for the AI tool stack.
Pick of the Week
Musk Admits xAI Distilled OpenAI Models for Grok
The admission came under oath on the first day of the Musk v. Altman trial, and the irony is hard to miss. Musk said xAI "partly" used OpenAI's models to bootstrap Grok's training - the same technique he and other US lab executives have framed as intellectual property theft when Chinese labs apply it to American models. "Partly" is doing a lot of work in that sentence, and Musk's lawyers will likely try to narrow what it means. But the testimony is sworn, public, and now part of a federal trial record that will be scrutinized long after the verdict. It also raises a genuine question about whether US courts will treat domestic distillation differently from the foreign distillation US labs have spent months condemning.
This Week on Awesome Agents
News
- OpenAI, Anthropic Race to Build Their Own Palantir - Both companies announced PE-backed enterprise deployment ventures on the same day, valued at a combined $11.5B and built on the forward-deployed engineer model Palantir made famous.
- Musk v. Altman Trial Opens - OpenAI's Future at Stake - A federal trial over OpenAI's nonprofit-to-for-profit conversion opened in Oakland with Musk seeking $134B in damages and Altman's removal.
- Anthropic Weighs $50B Raise - Could Top OpenAI - Anthropic is reportedly considering a raise at up to a $900B valuation, which would make it the world's most valuable private AI company.
- OpenAI Misses Revenue Targets - IPO in Doubt - The WSJ reported OpenAI missed multiple monthly revenue targets and its 1B user goal, sending chip stocks lower and raising fresh questions about its IPO timeline.
- Microsoft AI Revenue Hits $37B as Copilot Tops 20M Seats - Q3 FY2026 earnings put Microsoft's AI business at $37B annualized, up 123%, with Copilot enterprise seats at 20M and new contract signings up 228%.
- OpenAI Moves to AWS One Day After Microsoft Exclusivity Ends - The day after Microsoft's exclusive license expired, OpenAI launched GPT-5.4, Codex, and Managed Agents on Amazon Bedrock in limited preview.
- Mayo Clinic AI Spots Pancreatic Cancer 3 Years Early - REDMOD detects 73% of pancreatic cancers in CT scans that radiologists had read as normal, nearly double the 39% rate specialists achieve, with results confirmed across multiple institutions.
- OpenAI o1 Outperforms ER Doctors in Harvard Trial - A peer-reviewed Science study put o1 through 76 live ER cases and found 67.1% initial triage accuracy against 55% and 50% for expert physicians.
- David Silver Raises $1.1B to Build AI Without Human Data - The creator of AlphaGo closed a $1.1B seed round, the largest on record, for Ineffable Intelligence, a London lab building AI that learns purely through experience, with no human-labeled data.
- Huawei Eyes $12B as Nvidia Cedes China AI Market - Bernstein projects Nvidia's share of China's AI chip market will fall from 66% to 8% in 2026, while Huawei targets $12B in revenue, with ByteDance alone committing $5.6B in Ascend 950PR orders.
- Pentagon Clears 8 AI Firms for Classified Networks - The Pentagon signed AI agreements with eight firms for its most classified military networks, pointedly excluding Anthropic even as its Mythos model is reportedly already in use at the NSA.
- AI Coding Agent Wipes PocketOS Database in 9 Seconds - A Cursor agent powered by Claude Opus 4.6 found an old Railway token in the codebase and deleted PocketOS's entire production database - backups included - in under ten seconds.
- OpenAI Faces $1B Lawsuit After Ignoring Shooting Flags - Seven families filed federal lawsuits seeking over $1B, alleging that OpenAI ignored its own safety team's warnings before the Tumbler Ridge shooting.
- LiteLLM Exploited 36 Hours After Vulnerability Disclosure - Attackers hit CVE-2026-42208, a pre-auth SQL injection in the LiteLLM proxy, within 36 hours of the public advisory, targeting the database tables that store API keys for every upstream AI provider the proxy connects to.
- Critical RCE in LeRobot Lets Attackers Hijack Robots - CVE-2026-25874 (CVSS 9.3) exposes LeRobot's gRPC server to unauthenticated remote code execution via pickle deserialization, threatening robot control systems and GPU infrastructure.
- Chrome Installs 4 GB Gemini Nano Without Asking - Google Chrome silently installs a 4 GB Gemini Nano model on user devices, with no consent prompt, and re-downloads it if the user deletes it.
- Microsoft Sneaks Copilot Credit Into VS Code Commits - VS Code 1.118 added Copilot as a co-author on every git commit by default - even with AI features turned off - triggering developer backlash and a promised revert in 1.119.
- Academy's New Rules Bar AI Performances and Scripts - The Academy banned AI-generated actors and AI-authored scripts from Oscars eligibility, codifying union positions weeks after the WGA secured its new four-year studio deal.
- Mistral Ships Medium 3.5 With Cloud Coding Agents - Mistral released Medium 3.5, a 128B open-weights model scoring 77.6% on SWE-Bench Verified, with async cloud coding agents in Vibe that open pull requests while you're offline.
- Nemotron 3 Nano Omni Unifies Vision, Audio, Language - NVIDIA's open omni model activates 3B of 30B parameters and processes video, audio, and documents in a single pass, with up to 9.2x higher throughput than comparable open omni models.
- Nebius Buys Eigen AI for $643M to Own Inference - Nebius bought a 20-person MIT inference startup whose founders are behind AWQ and sparse-attention work, betting that knowing how to use Nvidia hardware better than anyone else is the real moat.
- Cisco Buys Astrix for $400M to Lock Down AI Agent Keys - Cisco closed its acquisition of Astrix Security, folding a non-human identity platform into Cisco Identity Intelligence to govern the API keys and OAuth tokens powering enterprise AI agents.
Reviews
- Claude Mythos Preview Review: Escaped Its Sandbox - Mythos Preview posts the highest SWE-bench score ever recorded, found thousands of real zero-days in production software, and during safety testing escaped its sandbox to email a researcher eating lunch outside.
- Qwen 3.6 Max Review: Alibaba's Coding Contender - Qwen3.6-Max-Preview tops six coding benchmarks and ranks third globally, but its closed-weights pivot and 256K context ceiling complicate an otherwise strong case.
Guides
- How to Make Music with AI - A Beginner's Guide - A practical walkthrough for creating your first AI-produced song using Suno and Udio, no musical training required.
- How to Use AI for Fitness and Workout Planning - A beginner's guide to building personalized workout plans using ChatGPT, Fitbod, and Freeletics.
Tools
- GPT-5.5 vs Claude Opus 4.7: Benchmarks and Pricing - A head-to-head comparison of April's two biggest model launches covering 1M-token context, agentic coding, and per-token pricing; one leads on math and long-context retrieval, the other on software engineering and vision.
Leaderboards
- Cost Efficiency Leaderboard: Best AI Performance Per Dollar - Updated May 2026 rankings with DeepSeek V4, GPT-5.5, and Kimi K2.6; DeepSeek V3.2 still holds the API value title at $0.28/M input tokens.
Models
- DeepSeek V4 - DeepSeek's latest open-weight MoE ships in two variants - V4-Pro at 1.6T total/49B active parameters and V4-Flash at 284B total/13B active - both with 1M-token context under the MIT license.
- Mistral Medium 3.5 - Mistral's 128B dense model with configurable reasoning, vision, and 77.6% SWE-Bench Verified; self-hostable on four GPUs.
- Nemotron 3 Nano Omni - NVIDIA's open omni-modal model processes text, images, audio, and video in a single inference loop with up to 9.2x higher throughput than comparable open omni models.
Science
- Async RL Speedups, Unsafe Robots, and Routing Math - Three papers cover a 2-4x async RL training speedup, a 54.4% safety violation rate in medical robot simulations, and a training-free routing trick that lifts math accuracy 3-7%.
- Prompt Traps, Swarm Failures, and AI-Discovered Physics - New research shows when few-shot examples hurt scientific reasoning, why homogeneous agent swarms lock in errors, and how an AI autonomously found a novel physical mechanism.
- Tool-Use Tax, Jailbreak Risk, and Robot Vision - Three papers: tools slow LLM agents under noisy prompts, jailbreaks barely reduce frontier model capabilities (Opus 4.6 lost only 7.7%), and interleaved text-vision traces push robot task success to 95.5%.
Elena Marchetti, Senior AI Editor
Awesome Agents - AI news, benchmarks, and tools for practitioners