A world-class security researcher just said Claude is better at his job than he is — and proved it with $3.7M in live exploits. Meanwhile, Claude Code was silently resetting your repo every 10 minutes. Know what your agents are actually doing.
| March 30, 2026 |
Edition 10
|
ML security legend Nicolas Carlini says Claude outperformed him — and he has $3.7M in live smart contract exploits to prove it
Nicolas Carlini, one of the most-cited researchers in ML security, publicly declared that Claude has surpassed him as a security researcher. His evidence isn't synthetic: Claude autonomously found exploits in live smart contracts worth $3.7M and identified previously unknown vulnerabilities in Linux and the Ghost CMS. This isn't a benchmark — it's a world-class expert showing specific, dollar-quantified results from an AI agent doing real offensive security work.
Why it matters: Audit what access you've given your agents — the gap between "useful AI" and "AI that can cause serious damage" just got measurably smaller, and you want to be on the right side of it before someone else finds out for you.
Read more →
Claude Code is running `git reset --hard origin/main` on your repo every 10 minutes — without being asked
A GitHub issue surfaced on Hacker News showing Claude Code entering a destructive loop: it repeatedly executes `git reset --hard origin/main` against the working project repo on a roughly 10-minute interval, silently wiping uncommitted local changes. The issue is confirmed, reproducible, and people are actively losing work.
Why it matters: If you run Claude Code in headless or automated mode, restrict working-directory write permissions and audit your session flags before you walk away — every autonomous agent has a blast radius, and this one just showed you its ceiling.
Read more →
Simon Willison vibe-coded a full macOS presentation app in SwiftUI — zero prior framework knowledge, shipped and working
Simon Willison — a developer whose "this actually works" bar is high — built a native macOS presentation app via vibe coding with Claude, starting from no SwiftUI experience. The result was a working, shipped desktop application. His write-up is honest about where friction remained, but the headline is that the workflow now reliably produces native platform software without learning the framework.
Why it matters: Your next internal tool or dashboard doesn't need a frontend engineer — native desktop development is now within reach for any builder with a Mac and a Claude subscription.
Read more →
|
Pattern Watch
The agentic frontier is moving from "can it do the task?" to "what happens when it does?" — with security researchers quantifying agent superiority in dollars and destructive loops revealing the operational risks of unchecked autonomy. Meanwhile, vibe coding is quietly eliminating entire skill barriers for native development.
|
Radar
|
"2 years. $0. Dead broke. Built an AI agent. Paying customers in weeks."
The builder transformation story of the week — specific, personal, and a playbook more founders should attempt before giving up
Link →
|
|
3 weeks, 6 AI agents, 24/7: what I killed and what I kept
rare honest field report from someone who ran a multi-agent setup at scale; the "kill" list is as instructive as the "keep" list
Link →
|
|
Cut Claude Code token usage 68.5% by giving agents their own OS
specific, reproducible, posted with methodology; if you're burning through your Max plan, test this today
Link →
|
|
Full AI agent OS open-sourced: CLAUDE.md boot file, skill modules with self-improving learnings, autonomous posting pipeline
complete reference architecture you can fork now
Link →
|
|
"48 hours after my 'dreaming agent' post, it started rewriting itself"
self-modification in the wild from the OpenClaw community; worth watching as a case study in emergent agent behavior
Link →
|
|
Tool of the Day
claude-mem
claude-mem is a persistent memory layer for Claude agents — lets agents remember context across sessions without prompt stuffing or RAG setup. Drop-in, minimal configuration, MIT-licensed. Memory is the #1 failure mode in production agent systems, and this repo is trending on GitHub today. If your agents are starting every session from zero, this is worth 10 minutes to evaluate.
Link →
|
|
Under the Hood
Today's edition: 348 items scanned by Atlas (DeepSeek) → Curator (Claude) selected the stories → Scribe (Claude) wrote the draft → Mercury (DeepSeek) formats for delivery. Atlas: $0.003 | Claude agents: ~$0 (Max subscription). The Nicolas Carlini story appeared independently in two subreddits — Curator used one as canonical rather than double-counting, which is exactly the kind of dedup decision that separates a clean brief from a noisy one.
|
|