Human Tech, with Jacob Lowe

April 18, 2026

🌚 The agent loop pattern that made a 2B model stop embarrassing itself

Hey y’all,

Quick thing before we get into the post: I'm looking for beta testers for Sandman.

Sandman is the on-device dream journal app I've been building and writing about. It runs a fine-tuned 2B model entirely on your Android phone: no cloud, no data leaving your device. The dream journal part works, the AI assistant part is getting good, and I want to get it in front of real people before I go wider.

If you have an Android phone and you're even a little interested in your dreams, I'd love to have you try it. Just reply to this email and I'll get you set up.


Also, I'm experimenting with embedding the full post directly in the newsletter going forward instead of just linking out. Let me know if you like that or if you'd rather just get the link. For now, here's the latest:


The agent loop pattern that made a 2B model stop embarrassing itself

Read it on the web

My last post was about adding tool calling to Sandman. This one is about the part that took the longest to actually get right: the agent loop.

My first version was one giant prompt. System instructions, tool definitions, dream context, conversation history: the model got all of it every turn. That works fine with big cloud models. A 2B model running on a phone does not have that luxury. It would call tools when it should have just responded, hallucinate tool syntax that almost parsed, and forget how to format tool calls mid-conversation. The same message would get routed correctly one time and completely wrong the next. I spent a long time tweaking that one prompt before I admitted the fundamental problem: I was asking a small model to do too many things at once.

The fix was splitting the loop into three focused steps.

Step 1: Router. A tiny ~300-token prompt that just classifies the request. Is this a question about a past dream? A symbol lookup? A conversational response? The router emits a structured tag like <lookup symbol="water" /> and nothing else. I run it at low temperature so the classification is effectively deterministic.
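
To make that concrete, here's a rough Kotlin sketch of the router step. None of this is Sandman's literal code: runModel() is a stand-in for the on-device inference call, the temperature is illustrative, and since I've only named three of the five routes here, the sketch only shows those three.

    // Stand-in for the on-device inference call; assume it returns the
    // model's raw text completion for the given prompt and temperature.
    fun runModel(prompt: String, temperature: Float): String = TODO("inference")

    // Only the three routes named above; the real router has five.
    sealed class Route {
        data class DreamQuestion(val query: String) : Route()
        data class SymbolLookup(val symbol: String) : Route()
        object Conversation : Route()
    }

    val ROUTER_PROMPT = """
        Classify the user's message. Output exactly one tag and nothing else:
        <dream_question query="..." /> | <lookup symbol="..." /> | <chat />

        Message: %s
    """.trimIndent()

    // Tiny attribute extractor for this sketch; the real app uses the far
    // more forgiving parser described later in the post.
    fun attr(tag: String, name: String): String =
        Regex("$name=\"([^\"]*)\"").find(tag)?.groupValues?.get(1) ?: ""

    fun route(message: String): Route {
        // Low temperature keeps the classification close to deterministic.
        val tag = runModel(ROUTER_PROMPT.format(message), temperature = 0.1f)
        return when {
            "<lookup" in tag -> Route.SymbolLookup(attr(tag, "symbol"))
            "<dream_question" in tag -> Route.DreamQuestion(attr(tag, "query"))
            else -> Route.Conversation
        }
    }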

Step 2: Execute. Pure function calls based on what the router decided. No model involved. Query the dream database, fetch memories, format the results as <tool_result> blocks.
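
In code this step is just a when-expression over the route. queryDreams() and lookupSymbol() are stand-ins for the real data layer:

    // Step 2 as a sketch: no model, just data access keyed off the route.
    fun queryDreams(query: String): String = TODO("dream database query")
    fun lookupSymbol(symbol: String): String = TODO("symbol lookup")

    fun execute(route: Route): String = when (route) {
        is Route.DreamQuestion -> wrap(queryDreams(route.query))
        is Route.SymbolLookup  -> wrap(lookupSymbol(route.symbol))
        Route.Conversation     -> ""  // plain chat: nothing to fetch
    }

    // Format fetched data the way the respond step expects it.
    fun wrap(data: String): String = "<tool_result>\n$data\n</tool_result>"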

Step 3: Respond. Now the model gets a completely different prompt: the persona prompt, focused just on writing a good response using the data Step 2 already gathered. By the time it gets here, it doesn't have to decide whether to use a tool. That already happened.
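
Sketched the same way. PERSONA_PROMPT stands in for the actual persona prompt, and the 0.7 temperature is my illustration, not a number I'm committing to:

    // Step 3 as a sketch: a different, persona-focused prompt that receives
    // the data Step 2 already fetched. PERSONA_PROMPT is a stand-in.
    val PERSONA_PROMPT = "You are a thoughtful dream companion..."  // stand-in

    fun respond(message: String, toolResults: String): String {
        val prompt = listOf(PERSONA_PROMPT, toolResults, "User: $message")
            .filter { it.isNotBlank() }
            .joinToString("\n\n")
        // Generation, not routing: a higher temperature is fine here.
        return runModel(prompt, temperature = 0.7f)
    }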

The router works because classification is a much easier job for a small model than open-ended generation. Five possible routes, structured output, low temperature. It's good at that. The response step works because the model only has to do one thing: be a good conversational partner with all the context it needs already in front of it.

There's one more piece. After Step 3 generates a response, I check if the model emitted any additional tool calls. If it did, I run them and loop through Step 3 once more with the new data. I cap it at one extra loop: latency on a phone is already noticeable, and letting a 2B model loop indefinitely is asking for trouble.
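
Tying the sketches above together, a whole turn comes out to roughly this. extractToolCalls() is the forgiving parser I get to below; runTool() and stripToolCalls() are more stand-ins:

    // Stand-ins: dispatch one extra tool call; remove tool tags from a reply.
    fun runTool(call: ToolCall): String = TODO("run one tool call")
    fun stripToolCalls(reply: String): String = TODO("strip tags from reply")

    // The full turn, capped at a single extra tool pass.
    fun handleTurn(message: String): String {
        val decision = route(message)              // Step 1: classify
        var results = execute(decision)            // Step 2: fetch data
        var reply = respond(message, results)      // Step 3: generate

        // If the reply itself asks for a tool, honor it exactly once.
        val extra = extractToolCalls(reply)
        if (extra.isNotEmpty()) {
            results += "\n" + extra.joinToString("\n") { runTool(it) }
            reply = respond(message, results)
        }
        return stripToolCalls(reply)
    }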

The thing I kept getting wrong along the way: too many routes in the router. I started with ten different intents and the model couldn't reliably distinguish between them. I collapsed them to five and accuracy jumped. With small models, fewer categories are almost always better.

I also spent too long fighting the tool output format. The model doesn't always produce perfect XML: mismatched closing tags, weird attribute ordering, sometimes a JSON wrapper for no reason. I eventually gave up trying to make the model behave and just wrote a very forgiving parser. I have 56 tests just for that parser now, each one a different way the model has mangled a tool call. Flexibility beats strictness here.
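
To give a flavor of what "forgiving" means, here's a heavily simplified sketch. The real parser covers far more mangling than this, but the shape is the same: regexes over the raw text rather than a strict XML parser, tolerant of unclosed tags, mixed quote styles, and noise around the call:

    // Lenient tool-call extraction: accept self-closed, unclosed, and
    // oddly-quoted tags instead of demanding well-formed XML.
    data class ToolCall(val name: String, val args: Map<String, String>)

    private val TAG = Regex("""<(\w+)([^>/]*)/?>""")
    private val ATTR = Regex("""(\w+)\s*=\s*["']?([^"'\s>]+)["']?""")

    fun extractToolCalls(text: String): List<ToolCall> =
        TAG.findAll(text)
            .filter { it.groupValues[1] != "tool_result" }  // skip our own blocks
            .map { m ->
                val args = ATTR.findAll(m.groupValues[2])
                    .associate { it.groupValues[1] to it.groupValues[2] }
                ToolCall(m.groupValues[1], args)
            }
            .toList()

The nice property: a call buried in a JSON blob like {"tool": "<lookup symbol=water>"} still comes out as lookup(symbol=water), because the regex never cares about the wrapper.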

The thing I keep coming back to: with big models, the prompt is important but the model can compensate for a mediocre one. With small models, the prompt is basically the entire product. Every token counts. The model will faithfully follow whatever you tell it to do, including the parts where your instructions are ambiguous or contradictory.

This is part of an ongoing series about building Sandman. If you want to read the whole thing or follow along as I keep going, it's all on the blog.

And again β€” if you have an Android phone and want to try Sandman, just hit reply.

Cheers,

Jacob
