Archive • Practical AI • Buttondown

5 engineering loops I'd actually run.

June 24, 2026

This week's video covers the shift from one-shot prompts to recurring engineering loops, plus five concrete examples for dependencies, docs, releases, meeting prep, and RFC drift.

Stop giving your agent every tool

June 17, 2026

This week's video breaks down tool search, context bloat, and the search-load-call pattern for large agent tool surfaces. Plus two related reads on why tool retrieval is becoming a core architecture layer.

Treat your AI agent like a new engineer

June 10, 2026

This week's video walks through CreatorSignal and shows how repo-local onboarding context helps Codex make better engineering decisions.

When agent loops should become workflows

June 4, 2026

This week: where an agent loop should end and a workflow should begin, why routing is the unglamorous backbone of inbox triage, and how deterministic guardrails keep model judgment bounded.

The 4 levers that diagnose broken AI agents

May 27, 2026

This week: a practical framework for diagnosing agent failures, how to turn failed runs into eval cases, and a few resources for building better harnesses.

Approval gates that actually hold

May 20, 2026

Why 'always confirm before sending' doesn't hold (and what does), plus three more places agent governance is showing up this month: Mastra's runtime layer, OpenAI's on-device PII filter, and Anthropic's Wall Street agent rollout.

Your AI team doesn't need more people. It needs agents.

May 13, 2026

Stripe merges 1,300 agent-written PRs per week. Coinbase is restructuring around AI-native pods. This week: what 'supported by a fleet of agents' actually means in practice, plus Codex for Claude Code users and OpenAI's $4B deployment bet.

Your AI assistant doesn't need a bigger model. It needs colleagues.

May 6, 2026

This week: the supervisor + specialists pattern in Mastra, eight AI colleagues on one local model, a one-sentence prompt that turned four manual rebases into a routine, and three links worth your time.

I built Karpathy's LLM knowledge layer into my AI operator (live)

April 30, 2026

This week's video: live build of the Karpathy/Holmberg knowledge layer wired into Emma, plus why I've been running her on a local Qwen model since picking up a DGX Spark.

Governing AI agents without killing them

April 22, 2026

The governance post landed this week. Three failure patterns I've either hit or narrowly avoided, what governance-as-code actually looks like in TypeScript, and the scorecard that tells you which layer to fix first.

Your agent reported success. It failed.

April 15, 2026

This week's video on why agent logs aren't observability, a new doc-sentinel plugin for catching documentation drift, and three links on the AI-at-home use case.

I cut a production prompt by 28% (and the evals that made it safe)

April 8, 2026

How I used autoresearch to run 65 autonomous optimization iterations on a production agent prompt, plus the eval mental model that made any of it trustworthy.

I gave my AI agent access to my second brain

April 1, 2026

This week: giving a Mastra AI agent a persistent file system and hybrid search, the grep-over-files pattern, a new doc-audit plugin, and Boris Cherny's hidden Claude Code features.

Claude Code Channels: interact with your agent from anywhere

March 25, 2026

Anthropic shipped Channels for Claude Code. Text Claude from your phone, pipe in webhooks from CI or Cal.com, and let your agent react while you're away. Plus: four patterns that separate agent-ready codebases.

What Claude Code does in your terminal (and when to pause)

March 18, 2026

Claude Code isn't just for engineers. This week: a terminal guide for non-technical builders, the agent-ready plugin, and Tobi Lutke's Liquid performance PR.

Extending Claude Code's native worktree support

March 11, 2026

The WorktreeCreate hook for database isolation, the agent-skills and codebase-readiness open-source releases, and what agent-ready development actually looks like at scale.

Build your own personal AI agent (here's how I did it)

March 5, 2026

This week: building a personal AI agent from scratch with Mastra and TypeScript, plus Google's new Workspace CLI, Qwen's small model series, and how a two-person law firm is outrunning much larger competitors with Claude.

Audit Your CLAUDE.md

February 25, 2026

32% of my CLAUDE.md was dead weight. Plus: native git worktrees, an open-source roundup, and tonight's live stream.

How AI Agents Search Their Memory

February 18, 2026

Memory retrieval, agent loops, and a Claude Code plugin that audits your setup.

How AI Agents Remember Things

February 11, 2026

How agents remember, Claude Code from your phone, and Opus 4.6.