Silv's AI Weekly: Gemini 3.5 Flash, Karpathy to Anthropic, Mythos kernel exploit
Hey — here's this week's AI roundup from silv.blog.
Curated from 225 tweets liked by @mattsilv, May 11 - May 19, 2026:
AI for Everyone
Gemini 3.5 Flash goes live at Google I/O at 3x the price of the previous Flash. Sub-200ms latency on most queries, benchmarked around 92% of GPT-5.5 on coding and reasoning. Available in AI Studio and the API today. Read more →
Gemini Omni, Spark, an AI pointer, and a new $100 Ultra tier. Omni does coherent 9-second video with character consistency and synchronized audio (live in the app). Spark is a 24/7 background agent rolling out to U.S. Ultra subscribers next week. The pointer lets you hover-and-talk to any element on screen. Google's top tier dropped from $250 to $200, with a new $100 entry point. Read more →
Andrej Karpathy joined Anthropic. Independent since OpenAI, he picked Anthropic with one line: "the next few years at the frontier of LLMs will be especially formative." He also committed to resuming his education work. Read more →
Anthropic passed OpenAI in business adoption for the first time, per Ramp's AI Index. 34.4% to 32.3% on actual corporate card spend from 50,000+ businesses. Anthropic's adoption quadrupled in a year; OpenAI's grew 0.3%. Ramp also flagged the risks — usage-based pricing misalignment, recent service issues, cheaper open-source inference platforms growing fast. Read more →
Researchers used Anthropic's Mythos model to bypass Apple's M5 Memory Integrity Enforcement. They went from bug to working kernel exploit in six days, then walked into Apple Park to deliver the report in person. The broader point: AI can now chain low-severity vulnerabilities into working exploits. The right response is faster regression test coverage, not faster patching. Read more →
The token number behind everything. Google's I/O chart: 9.7T tokens in May 2024, ~480T in May 2025, 3.2 quadrillion (3,200T) in May 2026. That is ~7x over the past year, on top of ~50x the year before. Every compute shortage and price increase you're reading about ladders up to that curve. Read more →
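For anyone who wants to check the growth multiples, here is a quick sketch of the arithmetic. It uses only the three figures quoted above, converted to trillions (3.2 quadrillion = 3,200T); the variable and function names are made up for illustration.

```python
# Monthly token counts as quoted from Google's I/O chart, in trillions.
# 3.2 quadrillion is written as 3200.0 trillion.
tokens_trillions = {"May 2024": 9.7, "May 2025": 480.0, "May 2026": 3200.0}


def yearly_multiples(series):
    """Return the growth multiple between each consecutive pair of data points."""
    labels = list(series)
    return {
        f"{prev} -> {curr}": series[curr] / series[prev]
        for prev, curr in zip(labels, labels[1:])
    }


multiples = yearly_multiples(tokens_trillions)
for span, multiple in multiples.items():
    print(f"{span}: {multiple:.1f}x")
# May 2024 -> May 2025: 49.5x
# May 2025 -> May 2026: 6.7x
```

So the ~7x figure describes the most recent year only; the year before that was closer to 50x.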
AI for Developers
Hermes Agent v0.14.0 adds native OAuth into X and xAI Premium+. If you already pay for Premium+, you get Grok models, image and video generation, and X semantic search in Hermes via hermes auth add xai-oauth — no separate API key. Codex landed as a runtime backend, plus a LINE messenger gateway and a native Windows beta. Hermes is now the only local agent platform with credentialed access to all three major labs at once. Read more →
Claude Managed Agents: self-hosted sandboxes and MCP tunnels. Run agent execution inside your perimeter (Cloudflare, Daytona, Modal, Vercel, Docker) with copy-paste cookbooks. MCP tunnels let agents reach your private MCP servers without public exposure. Hot-swappable tools on live sessions is the quality-of-life win. Read more →
Stitch by Google shipped streaming canvas and a DESIGN.md standard. Watch designs build in real time and redirect mid-generation. DESIGN.md is a portable agent-readable spec file Stitch can generate from a codebase, Figma file, or live website. One-click export to Netlify, Lovable, and Bolt. Read more →
Claude Code agent view turns it into a fleet. Run claude agents to see all running sessions, dispatch multiple in parallel, and reply inline to unblock them. The earlier /goal feature keeps Claude working until the task is done. Together they shift Claude Code from interactive assistant to parallel workers. Read more →
Anthropic's post on Claude Code at scale argues the harness matters more than the model. CLAUDE.md files at root and subdirectory level, hooks that propose updates back to your config after sessions, no RAG (Claude navigates live files like a developer), and a self-improving loop. The team that invests in configuration beats the team that doesn't. Read more →
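To make the "root and subdirectory level" idea concrete, here is a minimal sketch of how layered instruction files could be gathered for a given working directory, broadest scope first. This is an illustration only, not how Claude Code itself resolves CLAUDE.md files; the function name instruction_files and the example directory layout are hypothetical.

```python
from pathlib import Path


def instruction_files(repo_root: Path, working_dir: Path, name: str = "CLAUDE.md") -> list[Path]:
    """List instruction files from repo_root down to working_dir, broadest first.

    Hypothetical helper: it only shows the layering idea, where root-level
    guidance comes first and more specific subdirectory guidance follows.
    """
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    # Raises ValueError if working_dir is not inside repo_root.
    relative = working_dir.relative_to(repo_root)

    # Build the chain of directories: root, root/a, root/a/b, ...
    directories = [repo_root]
    current = repo_root
    for part in relative.parts:
        current = current / part
        directories.append(current)

    # Keep only the directories that actually contain an instruction file.
    return [d / name for d in directories if (d / name).is_file()]
```

Calling instruction_files(Path("."), Path("services/api")) in a repo that has a CLAUDE.md at the root and another in services/api would return both paths, root first, which is the order you would want if the more specific file is meant to refine the general one.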
Honorable Mentions
For everyone:
- ChatGPT personal finance launched for U.S. Pro users. Connect bank and card accounts, ask about your spending. (source)
- OpenAI and Google agreed on something: DALL-E images now carry both C2PA Content Credentials and Google's SynthID watermark, with a public verification tool. (source)
- Ramp's spend data is now queryable inside Claude, ChatGPT, Bloomberg, Perplexity, and Grok. Real corporate spend grounding for any pricing or category question. (source)
- KPMG deployed Claude to all 276,000 employees via its Digital Gateway platform. Full organizational rollout, not a pilot. (source)
- End-to-end encryption for RCS between Android and iPhone rolls out automatically. Not AI news, but the most practically significant thing this week for most people. (source)
For developers:
- Cursor Composer 2.5 is Cursor's most capable model yet, built on Kimi K2.5. Cursor claims ~10x cost-per-capability efficiency and Opus 4.7 parity (vendor-reported). Usage limits doubled for a week. (source)
- Browserbase launched browse.sh, an open-source catalog of agent skills for navigating specific websites reliably. Ramp, Lovable, Interaction, and Reducto contributed. (source)
- Anthropic acquired Stainless, the SDK and MCP server generation platform behind every official Anthropic SDK. Framed around expanding Claude's agent connectivity. (source)
- Prompt cache diagnostics in Claude Console — see exactly which part of your prompt changed on a cache miss and what the token cost was. (source)
- Claude Code Learning Mode makes Claude explain every decision while it works. Lydia Hallie's daily driver on side projects. A real counter to the skill-atrophy concern. (source)
- OpenAI's Codex Windows sandbox writeup is candid about the design constraints when Linux's lightweight isolation primitives aren't available. (source)
Try This Weekend
For everyone: Try Gemini 3.5 Flash in AI Studio with the same task you usually hand to GPT-5.5 or Claude, connect a financial account to ChatGPT and ask where your money went last month (U.S. Pro), point at things with Google's AI pointer experiments in AI Studio, or verify an AI image with OpenAI's public provenance tool.
For developers: Stand up a self-hosted Claude Managed Agents sandbox on Cloudflare, Modal, or Vercel using the cookbook, turn on Claude Code Learning Mode for one side-project session, generate a DESIGN.md for your product in Stitch (import from a codebase or live website), or install Hermes Agent v0.14.0 and run hermes auth add xai-oauth if you pay for X Premium+.
Read the full post with all sources and links