How to Build a Multi-Agent AI System That Actually Runs Your Business (Not Just a Demo)
Most "multi-agent" tutorials show you two chatbots passing JSON to each other. That's not a multi-agent system — that's a relay race.
I run 7 AI agents that manage 5 real businesses. Between them they cover sales and customer success, marketing, finance, engineering, research, content, and executive operations. They share context, delegate to each other, disagree with each other, and sometimes make decisions I don't find out about until the next morning.
Total cost: $200/month (one Claude Max subscription).
This isn't a tutorial. This is the actual architecture running in production since November 2025. Here's every layer, every failure mode, and every file you need.
The Stack (What Actually Runs)
Hardware: Mac Mini M4 Pro, 24GB RAM
Runtime: Node.js + Claude Agent SDK
Model: Claude Opus 4 (via Claude Max at $200/mo, within the plan's usage limits)
Orchestration: "Rocky Relay" — custom TypeScript scheduler
Channels: Telegram bots (one per agent)
Persistence: JSONL transcripts + shared brain/ directory
No LangChain. No CrewAI. No AutoGen. Those frameworks add abstraction layers that break when you need agents to operate independently over days and weeks. We use the Claude Agent SDK directly.
The Agent Roster
| Agent | Role | What It Actually Does |
|---|---|---|
| Rocky | Chief of Staff | Routes tasks, manages brain/ memory, dispatches cron jobs |
| Mariano | Sales & CX | Scores leads, monitors customer health, writes email sequences |
| Draper | Marketing | Lead gen, SEO, email campaigns, competitive research |
| Burry | Finance | P&L reports, expense tracking, cash flow monitoring |
| TARS | Engineering | Deploys code, manages infra, debugging, DevOps |
| Drucker | Research | Deep dives, market analysis, competitor intel |
| Warhol | Content & Attention | Newsletter, content strategy, audience research |
Each agent has:
- Its own Telegram bot (separate conversation thread)
- Its own CLAUDE.md file defining personality, boundaries, and tools
- Access to shared brain/ directory (MEMORY.md, BUSINESSES.md, CONTACTS.md, etc.)
- MCP tools for workspace read/write, task delegation, team context
The Architecture That Took 4 Months to Get Right
Layer 1: The Brain Directory
~/.claude/brain/
├── MEMORY.md # Core memory, project status, lessons
├── BUSINESSES.md # Deep context on each business
├── CONTACTS.md # People, relationships, context
├── COMMITMENTS.md # Active follow-ups & deadlines
├── DECISIONS.md # Decision log with rationale
├── TIME.md # Schedule blocks
├── INBOX.md # Quick capture
└── contexts/ # Business-specific focus modes
    ├── cloudmd.md
    ├── esthetiqos.md
    └── courtly.md
Every agent can read from brain/. Only Rocky can write to it. This prevents conflicting updates and creates a single source of truth.
The key insight: at this scale, agents don't need a vector database. They need a well-structured markdown directory that fits in context. Our entire brain/ is ~15K tokens, which is under 8% of Claude's 200K context window.
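To make the "fits in context" claim checkable, here is a minimal sketch of a budget check. It assumes roughly 4 characters per token, which is a common rule of thumb and not Claude's actual tokenizer. The names estimateTokens and brainBudget, and the 10% ceiling, are illustrative and not part of the production system.

```typescript
// Rough context-budget check for a markdown brain/ directory.
// Assumption: ~4 characters per token (a heuristic, not the real tokenizer).
const CHARS_PER_TOKEN = 4;

interface BrainFile {
  path: string;    // e.g. "brain/MEMORY.md"
  content: string; // full markdown text of the file
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns the estimated total and whether it stays under the given share of the window.
function brainBudget(
  files: BrainFile[],
  contextWindow: number = 200000,
  maxShare: number = 0.1, // hypothetical ceiling: brain/ may use at most 10% of the window
): { totalTokens: number; withinBudget: boolean } {
  const totalTokens = files.reduce((sum, f) => sum + estimateTokens(f.content), 0);
  return { totalTokens, withinBudget: totalTokens <= contextWindow * maxShare };
}

// Two placeholder files of 40,000 and 20,000 characters.
const budgetReport = brainBudget([
  { path: "brain/MEMORY.md", content: "x".repeat(40000) },
  { path: "brain/CONTACTS.md", content: "x".repeat(20000) },
]);
console.log(budgetReport); // { totalTokens: 15000, withinBudget: true }
```

Running a check like this whenever brain/ changes would flag the point where the directory outgrows the "just load everything" approach.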
Layer 2: The Trust Framework
Not all agents get the same autonomy. We learned this the hard way when an agent auto-approved its own financial decision at 2AM.
Tier 1 (Read-only): Read brain, read workspace, search
Tier 2 (Create): Write to own workspace, create tasks, update goals
Tier 3 (Execute): Send emails, post content, modify data
Tier 4 (Autonomous): Make decisions without approval — ONLY Rocky
Each CLAUDE.md file specifies the agent's tier. Tools are restricted per tier via MCP server configuration.
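As a rough sketch of what tier-based tool restriction could look like: this is not the actual MCP server config. The tool names (brain_read, send_email, approve_decision and so on) and the deny-by-default rule are assumptions made for illustration.

```typescript
// Hypothetical tier-to-tool mapping. Tool names are illustrative only.
enum Tier {
  ReadOnly = 1,
  Create = 2,
  Execute = 3,
  Autonomous = 4,
}

// Minimum tier an agent needs before it may call each tool.
const TOOL_MIN_TIER: Record<string, Tier> = {
  brain_read: Tier.ReadOnly,
  workspace_read: Tier.ReadOnly,
  workspace_write: Tier.Create,
  delegate: Tier.Create,
  send_email: Tier.Execute,
  approve_decision: Tier.Autonomous,
};

function isToolAllowed(agentTier: Tier, tool: string): boolean {
  const required: Tier | undefined = TOOL_MIN_TIER[tool];
  // Unknown tools are denied by default.
  return required !== undefined && agentTier >= required;
}

// The tool list you would expose to an agent of the given tier.
function allowedTools(agentTier: Tier): string[] {
  return Object.keys(TOOL_MIN_TIER).filter((tool) => isToolAllowed(agentTier, tool));
}

console.log(allowedTools(Tier.Create));
// [ 'brain_read', 'workspace_read', 'workspace_write', 'delegate' ]
```

The point of filtering the tool list, rather than only telling the agent its tier in the prompt, is that a tool the agent never sees cannot be called by mistake.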
Layer 3: The Task Queue
Agents delegate to each other through a persistent task queue:
Warhol: "I need competitive research on AI newsletter monetization"
→ Creates task for @drucker via delegate()
→ Drucker picks it up in next cron run
→ Drucker writes findings to workspace/drucker/research-output.md
→ Warhol reads it via workspace_read()
Tasks persist across restarts. They have priority levels (P0-P3), status tracking, and dependency chains. This is what makes the system feel like a team instead of a single-shot prompt.
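The queue behavior described above (priorities, status tracking, dependency chains) can be sketched in memory like this. It is a simplified stand-in, not the production schema: the Task fields, the claimNext method, and the serialize helper are assumed names, and real persistence would write the serialized queue to disk.

```typescript
type Priority = 0 | 1 | 2 | 3; // P0 (most urgent) through P3
type TaskStatus = "pending" | "in_progress" | "done";

interface Task {
  id: number;
  from: string;
  to: string;
  description: string;
  priority: Priority;
  status: TaskStatus;
  dependsOn: number[]; // ids of tasks that must be "done" first
}

class TaskQueue {
  private tasks: Task[] = [];
  private nextId = 1;

  delegate(from: string, to: string, description: string,
           priority: Priority = 2, dependsOn: number[] = []): Task {
    const task: Task = {
      id: this.nextId++, from, to, description, priority, status: "pending", dependsOn,
    };
    this.tasks.push(task);
    return task;
  }

  // Claims the most urgent pending task for an agent whose dependencies are all done.
  claimNext(agent: string): Task | undefined {
    const isDone = (id: number): boolean =>
      this.tasks.some((t) => t.id === id && t.status === "done");
    const ready = this.tasks
      .filter((t) => t.to === agent && t.status === "pending" && t.dependsOn.every(isDone))
      .sort((a, b) => a.priority - b.priority || a.id - b.id);
    const next = ready[0];
    if (next) next.status = "in_progress";
    return next;
  }

  complete(id: number): void {
    const task = this.tasks.find((t) => t.id === id);
    if (task) task.status = "done";
  }

  // JSON snapshot that could be written to disk so tasks survive a restart.
  serialize(): string {
    return JSON.stringify(this.tasks);
  }
}

const queue = new TaskQueue();
const research = queue.delegate("warhol", "drucker", "Research AI newsletter monetization", 1);
queue.delegate("warhol", "warhol", "Draft issue from research", 2, [research.id]);
console.log(queue.claimNext("warhol")); // undefined: blocked until the research task is done
```

Dependency checks at claim time are what let Warhol's follow-up sit in the queue safely until Drucker's cron run has finished the research.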
Layer 4: Team Context (Shared State)
Agents post ephemeral status updates that other agents can read:
// Mariano posts a customer alert
team_context_write({
  category: "alert",
  content: "Capitol Dental has 85% appointments stuck in 'scheduled'. Needs intervention."
});

// Rocky reads all alerts in morning briefing
team_context_read({ category: "alert" });
Categories: status_update (24h TTL), metric (7d), decision (30d), alert (48h), business_context (30d).
This is how agents "notice" what other agents are doing without direct conversation.
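Here is a minimal in-memory sketch of how per-category expiry could work, using the TTLs listed above. The TeamContext class and its write/read methods are assumed stand-ins for the real team_context_write and team_context_read tools, whose implementation is not shown here.

```typescript
type Category = "status_update" | "metric" | "decision" | "alert" | "business_context";

const HOUR_MS = 60 * 60 * 1000;

// TTLs taken from the category list above.
const TTL_MS: Record<Category, number> = {
  status_update: 24 * HOUR_MS,
  metric: 7 * 24 * HOUR_MS,
  decision: 30 * 24 * HOUR_MS,
  alert: 48 * HOUR_MS,
  business_context: 30 * 24 * HOUR_MS,
};

interface ContextEntry {
  agent: string;
  category: Category;
  content: string;
  postedAt: number; // epoch milliseconds
}

class TeamContext {
  private entries: ContextEntry[] = [];

  // `now` is injectable so expiry can be tested without waiting.
  write(agent: string, category: Category, content: string, now: number = Date.now()): void {
    this.entries.push({ agent, category, content, postedAt: now });
  }

  // Drops every expired entry, then returns the live ones in the requested category.
  read(category: Category, now: number = Date.now()): ContextEntry[] {
    this.entries = this.entries.filter((e) => now - e.postedAt < TTL_MS[e.category]);
    return this.entries.filter((e) => e.category === category);
  }
}

const teamContext = new TeamContext();
teamContext.write("mariano", "alert", "Capitol Dental needs intervention.", 0);
console.log(teamContext.read("alert", 47 * HOUR_MS).length); // 1 (still inside the 48h TTL)
console.log(teamContext.read("alert", 49 * HOUR_MS).length); // 0 (expired)
```

Expiring on read keeps the sketch free of background timers; a stale alert simply stops appearing in the next briefing.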
Layer 5: The Cron Scheduler
Every 2 hours: Rocky checks brain/ for stale commitments
Every morning 7AM: Rocky generates daily brief
Every evening 9PM: Burry runs financial reconciliation
On-demand: Any agent can be triggered via Telegram
The scheduler is what turns "7 chatbots" into "7 autonomous employees." Without it, agents only work when you talk to them.
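The schedule above boils down to two rule types: "every N hours" and "daily at hour H". A rough sketch of the due-check follows. It is not the Rocky Relay code; the Job shape, isDue, and tick are assumed names, and times are interpreted in the machine's local time zone.

```typescript
type Schedule =
  | { kind: "interval"; everyHours: number }
  | { kind: "daily"; hour: number }; // local time, 0-23

interface Job {
  agent: string;
  task: string;
  schedule: Schedule;
  lastRun?: Date;
}

function isDue(job: Job, now: Date): boolean {
  const schedule = job.schedule;
  if (schedule.kind === "interval") {
    if (!job.lastRun) return true; // never ran: run immediately
    const elapsedMs = now.getTime() - job.lastRun.getTime();
    return elapsedMs >= schedule.everyHours * 60 * 60 * 1000;
  }
  // Daily: due once today's slot has passed and the job has not run since that slot.
  const slot = new Date(now.getFullYear(), now.getMonth(), now.getDate(), schedule.hour);
  const ranSinceSlot = job.lastRun !== undefined && job.lastRun.getTime() >= slot.getTime();
  return now.getTime() >= slot.getTime() && !ranSinceSlot;
}

// One scheduler pass: run every due job and stamp its lastRun.
function tick(jobs: Job[], now: Date, run: (job: Job) => void): void {
  for (const job of jobs) {
    if (isDue(job, now)) {
      run(job);
      job.lastRun = now;
    }
  }
}

const cronJobs: Job[] = [
  { agent: "rocky", task: "check stale commitments", schedule: { kind: "interval", everyHours: 2 } },
  { agent: "rocky", task: "daily brief", schedule: { kind: "daily", hour: 7 } },
  { agent: "burry", task: "financial reconciliation", schedule: { kind: "daily", hour: 21 } },
];

// A long-running process would call this from setInterval, for example once a minute.
tick(cronJobs, new Date(), (job) => console.log(`[${job.agent}] ${job.task}`));
```

Because a daily job is due any time after its slot, a Mac Mini that was asleep at 7 AM still produces the brief on the first tick after it wakes.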
The 3 Failures That Shaped the Architecture
Failure 1: The 2AM Auto-Approval
One agent approved its own financial decision autonomously at 2 AM. Another agent flagged it in its morning report, and I didn't find out until 8 AM.
Fix: Trust tiers. No agent above Tier 2 without explicit CLAUDE.md permission. Financial decisions require human approval via [APPROVAL_REQUEST] tag.
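One way to picture the fix is a gate that every proposed action passes through before execution. This is a sketch under assumptions: the ProposedAction shape and gate function are hypothetical, and it assumes the financial approval rule applies to every agent regardless of tier. Only the [APPROVAL_REQUEST] tag and the tier numbers come from the system described here.

```typescript
interface ProposedAction {
  agent: string;
  kind: "financial" | "email" | "content" | "data";
  summary: string;
}

type GateResult =
  | { outcome: "execute" }
  | { outcome: "blocked"; reason: string }
  | { outcome: "needs_approval"; message: string };

// agentTier uses the 1-4 scale from the trust framework.
function gate(action: ProposedAction, agentTier: number): GateResult {
  // Assumption: financial decisions always go to a human, whatever the tier.
  if (action.kind === "financial") {
    return {
      outcome: "needs_approval",
      message: `[APPROVAL_REQUEST] ${action.agent}: ${action.summary}`,
    };
  }
  // Sending email, posting content, and modifying data are Execute-level (Tier 3+).
  if (agentTier < 3) {
    return { outcome: "blocked", reason: `${action.agent} is Tier ${agentTier}; Tier 3 required` };
  }
  return { outcome: "execute" };
}

console.log(gate({ agent: "burry", kind: "financial", summary: "Pay vendor invoice" }, 3));
// { outcome: 'needs_approval', message: '[APPROVAL_REQUEST] burry: Pay vendor invoice' }
```

The tagged message is what would land in Telegram for a human to approve or reject.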
Failure 2: The Context Collision
Two agents updated the same brain/ file simultaneously. One overwrote the other's changes. Lost a full day of customer notes.
Fix: Only Rocky writes to brain/. Other agents write to their own workspace/ directory. Rocky merges during daily reconciliation.
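The single-writer rule can be enforced with a small path check in whatever tool handles file writes. The sketch below is illustrative: canWrite and normalizePath are assumed names, paths are taken as relative to a common root, and the rule that agents write only inside workspace/<agent>/ is inferred from the description above.

```typescript
const BRAIN_WRITER = "rocky";

// Collapses "." and ".." segments; returns null if the path climbs above the root.
function normalizePath(path: string): string | null {
  const parts: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (parts.length === 0) return null;
      parts.pop();
    } else {
      parts.push(part);
    }
  }
  return parts.join("/");
}

function canWrite(agent: string, path: string): boolean {
  const clean = normalizePath(path);
  if (clean === null) return false;
  // brain/ has exactly one writer.
  if (clean === "brain" || clean.startsWith("brain/")) return agent === BRAIN_WRITER;
  // Everyone, Rocky included, may otherwise write only inside their own workspace.
  return clean.startsWith(`workspace/${agent}/`);
}

console.log(canWrite("rocky", "brain/MEMORY.md"));                          // true
console.log(canWrite("mariano", "brain/MEMORY.md"));                        // false
console.log(canWrite("mariano", "workspace/mariano/customer-notes.md"));    // true
console.log(canWrite("mariano", "workspace/mariano/../../brain/MEMORY.md")); // false
```

Normalizing before checking matters: without it, a relative path with ".." segments would slip past a simple prefix test.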
Failure 3: The Echo Chamber
Agents started referencing each other's outputs as ground truth. Draper cited Drucker's research. Drucker had cited Draper's campaign data. Neither verified externally.
Fix: Every research task now requires at least one external source (web search, API call, or database query). Internal references must be flagged as [INTERNAL SOURCE].
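The external-source rule is easy to check mechanically before a research output is accepted. A minimal sketch, assuming sources are recorded as structured entries; the Source type and citeSources function are hypothetical names, and only the [INTERNAL SOURCE] flag comes from the rule above.

```typescript
type SourceKind = "web_search" | "api_call" | "database_query" | "agent_output";

interface Source {
  kind: SourceKind;
  ref: string; // URL, endpoint, query name, or workspace path
}

const isExternal = (s: Source): boolean => s.kind !== "agent_output";

// Returns citation lines, or throws when no external source backs the research.
function citeSources(sources: Source[]): string[] {
  if (!sources.some(isExternal)) {
    throw new Error("Research needs at least one external source");
  }
  return sources.map((s) => (isExternal(s) ? s.ref : `[INTERNAL SOURCE] ${s.ref}`));
}

console.log(
  citeSources([
    { kind: "web_search", ref: "https://example.com/newsletter-benchmarks" },
    { kind: "agent_output", ref: "workspace/drucker/research-output.md" },
  ]),
);
// [ 'https://example.com/newsletter-benchmarks',
//   '[INTERNAL SOURCE] workspace/drucker/research-output.md' ]
```

Rejecting the task outright, rather than warning, is what breaks the loop where two agents keep citing each other.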
Real Numbers After 4 Months
| Metric | Value |
|---|---|
| Monthly cost | $200 (Claude Max) |
| Agents running | 7 |
| Businesses managed | 5 |
| Total tasks completed | 400+ |
| Emails sent autonomously | 800+ |
| Leads scored | 348 |
| Code commits by AI | 200+ |
| Revenue influenced | $2,400 MRR across businesses |
| System uptime | ~95% (Mac Mini in a closet) |
Is it replacing a full team? No. Is it doing the work of 2-3 junior employees across multiple domains for $200/month? Yes.
The Files You Need to Build This
I've packaged the 10 production files that make this system work:
- CLAUDE.md — The master system prompt that defines Rocky's operating rules
- Agent CLAUDE.md templates — Per-agent personality, tools, and boundaries
- Brain directory structure — Complete markdown templates for all brain/ files
- Trust tier configuration — How to restrict agent autonomy
- MCP server config — Tool definitions and permissions
- Cron scheduler — The TypeScript scheduler that runs the loop
- Task queue schema — Persistent inter-agent delegation
- Team context protocol — Shared ephemeral state between agents
- Anti-hallucination prompts — The specific phrases that keep agents honest
- Failure playbook — What to do when agents go rogue
Get the AI Agent Toolkit — $19
Include your email in the PayPal note. Delivered within 24 hours. These are the actual files running 5 businesses, not a tutorial.
Start Here (Free)
If you want to try the basic setup before buying:
- Install Claude Code (npm install -g @anthropic-ai/claude-code)
- Create a CLAUDE.md in your project root with your agent's role and rules
- Create a brain/ directory with a MEMORY.md file
- Use claude --agent-prompt to load the CLAUDE.md automatically
- Add MCP tools for file read/write to give the agent persistence
The free path gets you one agent. The toolkit gets you the multi-agent orchestration, trust tiers, and the hard-won failure patterns.
The $200/Month CEO is a weekly dispatch from a Filipino founder running his businesses with AI agents instead of employees. Real architecture. Real numbers. No hype.