Your AI assistant doesn't need a bigger model. It needs colleagues.
This week: the supervisor + specialists pattern in Mastra, eight AI colleagues on one local model, a one-sentence prompt that turned four manual rebases into a routine, and three links worth your time.
This week's video
My AI operator has eight agents. They all run on the same model (Qwen 3.6-27B), on one machine in my office. Different identities, different memory, different tools. There's a ninth agent too, but she's isolated from the rest of the team.
This week's video walks through the supervisor + specialists pattern in Mastra using the production system that runs my consulting business as the case study.
The thesis: one of the ceilings on a single agent isn't the model, it's identity. One agent doing CRM lookups, calendar triage, code review, and sales drafting tends to be mediocre at all four. Past a certain number of tools, quality plateaus and then regresses on edge cases. The usual reaction is to add more tools and write better instructions, which works for a while.
One alternative worth knowing is the supervisor + specialists pattern: split the work across multiple agents that share a model but not an identity. The supervisor delegates by tool call. Each specialist owns its domain. The pattern has its own costs (coordination overhead, more identities to maintain, harder debugging across spans), and over-extracting is the most common mistake I've seen builders make, including me. But past a certain complexity threshold, those costs are smaller than the precision loss from cramming everything into one agent.
In Mastra, the implementation is a single agents: property on the supervisor. Mastra auto-generates agent_<name> delegation tools and the supervisor calls them like any other tool. No HTTP, no orchestration framework, no extra primitives.
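Stripped of the framework, the mechanism reduces to something small. Mastra generates the delegation tools for you from the agents: property; this dependency-free sketch just shows the shape of what gets generated (the specialist names and tasks are illustrative, not from my actual fleet):

```typescript
// A hand-rolled sketch of supervisor + specialists delegation.
// In Mastra this wiring is auto-generated; here it's explicit.

type Tool = (task: string) => string;

// Each specialist owns one domain: its own identity, its own tools.
// Here a specialist is just a labelled function standing in for an agent.
const specialists: Record<string, Tool> = {
  crm: (task) => `[crm] handled: ${task}`,
  calendar: (task) => `[calendar] handled: ${task}`,
};

// The supervisor sees one delegation tool per specialist, named
// agent_<name>, and calls them like any other tool.
const supervisorTools: Record<string, Tool> = Object.fromEntries(
  Object.entries(specialists).map(([name, run]) => [`agent_${name}`, run]),
);

// Stand-in for the supervisor's model emitting a tool call.
function delegate(toolName: string, task: string): string {
  const tool = supervisorTools[toolName];
  if (!tool) throw new Error(`no such delegation tool: ${toolName}`);
  return tool(task);
}

console.log(delegate("agent_crm", "look up the Acme contact"));
// → [crm] handled: look up the Acme contact
```

The point of the shape: delegation is just another tool call, which is why no HTTP layer or orchestration framework is needed.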
The video also spends real time on when not to extract an agent, which is the part I had to learn the hard way. I went from one agent to fifteen and had to claw it back. The three rubrics I use now for the extraction decision (domain size, the "would you hire a human for this role" test, and whether the domain deserves its own memory) are in the video, along with the over-delegation trap that taught me the rules.
The other angle worth flagging is structural trust. Envoy is the ninth agent, the one I expose to my Practical AI community. She's safe to expose because she doesn't import the tools the rest of the fleet has. You can prompt an agent not to do something. You can also just not give it the tools.
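In code, that structural boundary is nothing more than what you leave out of the tool map. A minimal sketch, with hypothetical tool names standing in for the real ones:

```typescript
// Trust by construction: the exposed agent never receives the
// sensitive tools, so no prompt can talk it into using them.

const readInbox = (q: string) => `inbox results for ${q}`;
const answerCommunityQuestion = (q: string) => `answer: ${q}`;

// Internal fleet agents get the full toolbox.
const fleetTools = { readInbox, answerCommunityQuestion };

// The public-facing agent's toolbox structurally omits anything
// personal: there is nothing for a jailbreak to reach.
const envoyTools = { answerCommunityQuestion };

console.log("readInbox" in fleetTools);  // → true
console.log("readInbox" in envoyTools);  // → false
```

An instruction like "never read my inbox" lives in the prompt and can fail; an absent tool can't be called at all.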
Resources mentioned:
- Mastra supervisor pattern docs
- OpenClaw, origin of the SOUL.md / USER.md / AGENTS.md / MEMORY.md per-agent pattern
- Part 1: Build a Personal AI Agent with Mastra
- Part 2: Mastra Workspaces
The lever I didn't pull in the video: model tiering
The video runs eight agents on the same local model. That's a choice, not a requirement. Mastra lets you set the model per agent, and once you separate the supervisor from the specialists, you also separate the cost-quality tradeoff for each role.
Most of an agent fleet's token volume gets burned by the high-volume specialists doing classification, extraction, and routing. The supervisor handles fewer turns but makes the most consequential calls (which agent to delegate to, when to stop, how to merge sub-agent results). That asymmetry maps cleanly onto a mixed-model setup: a frontier model on the supervisor where decision quality matters most, smaller and cheaper models on the specialists where routine work doesn't justify the per-token cost. The same shape shows up in Anthropic's SDK guidance: Opus on the orchestrator, Haiku on the workers, with most of your token volume flowing through the cheaper tier.
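The economics are easy to sanity-check on the back of an envelope. The prices and token volumes below are hypothetical round numbers, not real rate cards:

```typescript
// Why tiering works: the expensive model handles few tokens,
// the cheap model handles many. All figures are made up for illustration.

const pricePerMTok = { frontier: 15.0, small: 0.25 }; // USD per million tokens

// Supervisor: few, consequential turns. Specialists: the bulk of the
// volume (classification, extraction, routing).
const monthlyTokens = { supervisor: 1_000_000, specialists: 200_000_000 };

const supervisorCost = (monthlyTokens.supervisor / 1e6) * pricePerMTok.frontier; // 15
const specialistCost = (monthlyTokens.specialists / 1e6) * pricePerMTok.small;   // 50

console.log({ supervisorCost, specialistCost, total: supervisorCost + specialistCost });
// Running all 201M tokens on the frontier model instead would cost
// 201 * 15 = 3015 in the same units: roughly 46x the mixed bill.
```

Most of the bill flows through the cheap tier, yet the consequential decisions still get the strong model.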
My fleet currently runs every agent on the same local model. The practical reason is that my hardware fits one larger model comfortably right now, and I'd rather keep inference on my own box than route everything through hosted APIs. Once I have more hardware to spread inference across, mixing a stronger model on the supervisor and a smaller one on the specialists is the first lever I'd reach for. If you're already running on hosted infrastructure, that lever's available to you today, and the supervisor pattern is the same in either world.
Routines: agents that work while you don't
Shifting gears. The Claude Code feature I keep coming back to right now is routines. A routine is a saved prompt that runs on a schedule, locally or on Anthropic's cloud, against a repository. Most of the attention on coding agents has been on the synchronous case: paired with you at the terminal. Routines are the asynchronous one. An agent that does small mechanical jobs while you're not at the keyboard.
This week's blog post is the smallest illustration of that shift I could write. I had a backend PR open all week, waiting on a maintenance window. Every other PR merging in the same window added a database evolution, which kept stealing the number mine was using. Rebase, renumber, push. Four times in one week. Same mechanical fix every time.
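The mechanical core of that fix is small enough to sketch. Assuming evolutions are numbered SQL files like 42.sql (a hypothetical stand-in for my actual schema layout), the renumbering step the routine automates is roughly:

```typescript
// After a rebase, find the highest evolution number already merged
// and move this PR's evolution to the next free slot.

function nextEvolutionNumber(mergedFiles: string[]): number {
  const numbers = mergedFiles
    .map((f) => /^(\d+)\.sql$/.exec(f))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => parseInt(m[1], 10));
  return (numbers.length ? Math.max(...numbers) : 0) + 1;
}

// Two other PRs merged 41.sql and 42.sql this week, so mine becomes 43.sql.
console.log(nextEvolutionNumber(["40.sql", "41.sql", "42.sql"])); // → 43
```

The routine wraps this in the rebase-and-push ceremony around it; the logic itself is trivial, which is exactly why it was a bad use of my time four times over.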
I described the problem to Claude in one sentence and let it write a routine to handle the rest.
The shift here is similar to what CI did for tests. Most of what a routine handles isn't hard. It's mechanical and frequent enough that doing it by hand burns more time than the work deserves. PR maintenance, dependency drift, doc audits, scheduled health checks, anything that boils down to "do this small thing if a condition is true." The bar is no longer "is this worth writing a script for?" It's "is this worth describing to an agent for a minute?"
If you're babysitting a long-running PR or repeating a small task more than twice a week, that's the shape of work routines are good at.
Curated links
AccessLint MCP. Cameron Cundiff shipped an MCP server (and Claude Code plugin) that helps coding agents build accessible web content. The recent additions are sourcemap support and Chrome CDP for DOM scans, which means deterministic a11y fixes with much less token consumption than parsing HTML in-context. Accessibility is one of those domains where agents either do it well from the start or never go back and add it. This shifts the cost curve in the right direction.
Anthropic's "Claude for creative work". Anthropic shipped a wave of connectors for creative tools: Ableton, Adobe, Affinity by Canva, Autodesk Fusion, Blender, Resolume, SketchUp, Splice. The interesting move isn't any single connector. It's that Anthropic is treating creative tools as a first-class agent surface, not a side experiment. Bridging tools in a pipeline (translating formats, syncing assets across design, 3D, and audio) is exactly the kind of multi-tool, structured-handoff work agents are good at. If you've been thinking about agents only inside the IDE, this is worth a look.
Matt Pocock's grill-me skill. An Agent Skill that grills you Socratically on a topic until you've actually internalized it. I've been using it for a few weeks to pressure-test a plan before I start implementing, and it's been one of the sharper ways I've found to land on a solid approach instead of a half-formed one. Most skills wrap a tool or a workflow. This one wraps a learning loop. Agent-as-tutor is one of the more underused skill shapes, and worth more attention than it's gotten so far.
Practical AI: a small Slack for builders
Quick note on the community side of all this. Practical AI is the small, invite-only Slack I've been building for engineers, designers, founders, product folks, and solo operators putting AI into real work. Builders comparing notes on what shipped, what broke, and the tradeoffs along the way. Skeptical of hype, serious about craft. Same tone as this newsletter, faster format.
The trust-boundary segment in this week's video is partly why Envoy exists in the first place: she's the agent I expose to the Practical AI Slack, structurally locked out of anything personal so the community can actually interact with her without me worrying about leakage.
If you've been reading along and that sounds like a room you'd want to be in, request an invite →. New members are coming in slowly on purpose, and I read every request.
If you're building a multi-agent system and want a second pair of eyes on where to draw the boundaries (which agents to extract, what each one should own, where the trust lines should live), reply to this email or book an intro call. Always happy to trade notes.
Damian
