Are You Still Typing to Your AI Coding Agent?
We tend to treat AI chat interfaces like a command line—brief, terse, and functional. We type "fix bug in auth" instead of explaining why the auth is broken, what the user was doing, and which edge case we're worried about.
We do this because typing detailed context is high-friction. It takes effort. So we self-edit, we summarize, and we leave out the "obvious" context.
But we often overestimate what the AI knows. The gap between what we're thinking and what we actually type is where the results break down.
The Junior Engineer Mental Model
Think about how you'd delegate a complex task to a junior engineer. You wouldn't just slack them a single line of code and walk away. You'd have a conversation. You'd explain the architecture, the constraints, and the "gotchas" you learned the hard way last year.
Mitchell Hashimoto (co-founder of HashiCorp) compares using AI to "bowling with bumpers." The model provides the momentum, but you have to set the detailed guardrails. Give it a vague shove and it rolls into the gutter; set the lane up properly and you get a strike.
The problem is, typing out that level of context feels like writing documentation. Nobody wants to do it.
Speaking is Natural
This is where voice input becomes a workflow unlock. It isn't just about speed; it's about how we are wired to communicate.
Speaking is far more natural than writing. We've been doing it for hundreds of thousands of years. When you switch to voice, the friction disappears. You stop summarizing and start "brain dumping." You naturally explain the nuance. You mention that weird dependency issue. You describe the exact user experience you want.
You can comfortably speak around 150 words per minute; most developers type 40 to 60. Typing that same amount of context takes roughly three times as long and feels like a chore.
And the AI? It loves the rambling. Unlike a human coworker, it doesn't get impatient. It has an effectively unlimited attention span. It can parse a three-minute stream-of-consciousness explanation in seconds and extract exactly what it needs.
As Simon Willison notes, English is a "lossy" compression format, but good LLMs are excellent at filling in the gaps—provided you give them enough raw material to work with. Voice allows you to provide that raw material without the tax of typing it out.
"But I feel weird talking to my computer"
If voice is so much better, why aren't we all doing it?
For a long time, the tech wasn't there. But modern speech-to-text models (like Whisper) have largely solved the accuracy problem. They handle technical jargon, mixed languages, and "umms" remarkably well.
The real blocker now is social.
Talking to your monitor in an open-plan office feels performative. It feels awkward. You don't want to broadcast your debugging process to the entire room.
I've found a way around this—a "social camouflage" that lets you use voice input anywhere, even in a quiet coffee shop, without looking like "that guy" talking to his laptop.
Spoiler alert: It involves using AFK. I'll share exactly how it works in the next newsletter.
Until then, try talking to your agent when you're working from home. You might be surprised by how much smarter it gets.