Crouton Creations

April 29, 2026

Trust, but verify (with screenshots)

AI writes code fast. The bottleneck is verifying it. Here's what's working for me.

Howdy and welcome back. It's time for another update from Crouton Creations.

Agent Work Verification

These days AI can write code really fast. That's no surprise. Because of this, more and more of my time goes into verifying the work: did it do what I asked, does it actually work, does it all look OK? So I've been pushing on how much of the verification work the AI can do for me. For example, I now:

  1. Generally have the AI write anything reasonably long as an HTML-style report with slides, separate page sections, etc. HTML is a richer medium for communicating ideas, and much easier to scan than the plain terminal output I'd otherwise get.
  2. Have the agents generate screenshots for anything UI related, before and after when applicable, and a video running at human speed for critical user flows, so I can review quickly. When well prompted, this makes my first-pass reviews notably faster and also helps keep the agents from cheating or cutting corners as much.
  3. Define expected outputs up front, which gives me both an integration test and a verification system. Even a rough "here's what I expect when you're done" saves several iteration rounds.
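
A minimal sketch of the idea in item 3, in Python. Everything here is hypothetical and for illustration only: the `verify_outputs` helper, the `Mismatch` record, and the example keys are assumptions, not part of any particular tool. The expectations are written down before the agent starts, and the same dictionary then doubles as a check once the work is done.

```python
from dataclasses import dataclass

# Sentinel used when the agent's results lack an expected key (hypothetical).
MISSING = "<missing>"


@dataclass
class Mismatch:
    key: str
    expected: object
    actual: object


def verify_outputs(expected: dict, actual: dict) -> list:
    """Compare collected results against expectations written up front."""
    mismatches = []
    for key, want in expected.items():
        got = actual.get(key, MISSING)
        if got != want:
            mismatches.append(Mismatch(key, want, got))
    return mismatches


# Hypothetical expectations, defined before the agent begins.
expected = {"status_code": 200, "items_returned": 3, "first_title": "Hello"}

# Hypothetical results gathered from the finished work.
actual = {"status_code": 200, "items_returned": 2, "first_title": "Hello"}

report = verify_outputs(expected, actual)
for m in report:
    print(f"{m.key}: expected {m.expected!r}, got {m.actual!r}")
```

An empty list means every stated expectation was met; anything else is a short list of what to send back to the agent.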

All of this is moving in the same direction: I'm updating my process to incorporate these at every level. The design phase is now often HTML slides or a report with mockups in JSX or screenshots, architecture diagrams, and progressive depth of information on each area, so I can quickly drill down into something and pop back up to understand the work as a whole. The review process now generally starts with looking at screenshots, videos, and/or test output to verify the high-level expectations are met before I open the app. This has reduced the time I spend iterating and lets agents run longer on bigger tasks, which ultimately means I can tackle more in parallel.
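
One way a before/after review page could be assembled, sketched in Python with only the standard library. The `build_review_page` function, the page layout, and the screenshot paths are all assumptions made for illustration; an agent could just as well write the HTML directly.

```python
from html import escape


def build_review_page(title, shots):
    """Build one HTML page showing before/after screenshots side by side.

    `shots` is a list of (label, before_path, after_path) tuples.
    """
    sections = []
    for label, before, after in shots:
        sections.append(
            f"<section><h2>{escape(label)}</h2>"
            '<div style="display:flex;gap:1rem">'
            f'<figure><img src="{escape(before)}" width="480">'
            "<figcaption>Before</figcaption></figure>"
            f'<figure><img src="{escape(after)}" width="480">'
            "<figcaption>After</figcaption></figure>"
            "</div></section>"
        )
    body = "".join(sections)
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<title>{escape(title)}</title></head><body>"
        f"<h1>{escape(title)}</h1>{body}</body></html>"
    )


# Hypothetical screenshot paths produced by an agent run.
page = build_review_page(
    "Login flow review",
    [("Sign-in form", "shots/signin_before.png", "shots/signin_after.png")],
)
print(len(page) > 0)
```

Writing `page` to a file such as `review.html` gives a single artifact to open first, before launching the app itself.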

DevX Artifacts

The above is actually what led me to add Artifacts to DevX. It lets me store HTML, Markdown, screenshots, videos, or any other file I need to reference with the agent in a known location. I can get previews right in DevX, the agent has a tool call to open any of them for me in a side panel or overlay, and I can quickly grab the URLs/paths to drop into a chat conversation when referencing them. It makes working with all these assets quick and easy, and it's all agent/harness agnostic.

What I'm checking out

A few things on my radar lately:

  • Pi - My new agent harness. It's not opinionated, it's not noisy. I can configure it exactly to my workflow and I've been shifting more and more of my work over to it recently. Definitely check it out if you haven't.
  • Basic Memory - simple Markdown-based, indexed memory for agents. I now have a Syncthing folder to synchronize this across all my OpenClaw, Hermes, Claude, and coding agents. It's great for quickly ramping everyone up and keeping them in sync on projects and context.
  • NVIDIA Build - free (really) inference on a bunch of open source models, including the latest like DeepSeek v4. Kind of slow, but great for background jobs and playing with new models.

Thanks for your support

All replies go straight to my inbox, so reach out if any of this was helpful or interesting. And if you know someone else who might be interested, please forward this or have them sign up themselves here.

Thanks for reading!

--

Jon Fox

Don't miss what's next. Subscribe to Crouton Creations:
croutoncreations.com