How We Wired 7 AI Agents to Talk to Each Other Without Losing Their Minds (or Ours)
Playbook #1 showed you the setup: 7 AI agents running on a single $200/month Claude Max subscription. This week, the question everyone asked was: "Okay, but how do they actually communicate?"
Fair question. Because 7 agents that can't coordinate are worse than no agents at all. They duplicate work, override each other's decisions, and create chaos you have to clean up manually — which defeats the entire point.
Here's exactly how we wired ours together. Every architecture decision, every anti-chaos mechanism, every lesson from agents that broke things at 2 AM.
The Communication Layer: Telegram as the War Room
We didn't build a custom messaging protocol. We didn't spin up a Kafka cluster. We used Telegram.
Every agent is a Telegram bot. They all sit in a private group chat called the War Room. When RJ (the human) types "@rocky please get TARS to score these 50 leads", the system parses the @mention, looks up the agent in the registry, and routes the message.
Why Telegram?
- Free, real-time, mobile-native. RJ can manage agents from his phone while walking between clinic demos in Cebu.
- Built-in threading. Conversations have natural boundaries.
- Bots are first-class citizens. The Telegram Bot API gives each agent its own identity, its own message stream, and its own permissions.
- Human-readable. You can literally scroll up and see what your agents said to each other. Try doing that with a custom message queue.
The stack: grammY framework (TypeScript) → 7 bot tokens from @BotFather → one private group chat.
The Routing Problem (and How We Solved It)
Here's the first thing that breaks when you have 7 bots in one group: every bot receives every message. Send one message, 7 bots try to respond. Chaos.
Solution 1: Message Deduplication
Every incoming message gets a dedup key: senderId + date + textHash. The first bot to process it wins. Everyone else ignores it. TTL: 60 seconds.
Message arrives → compute dedupKey → check processedMessages map
→ if exists: ignore
→ if new: process, add to map with 60s TTL
This prevents 7x duplicate responses to a single message.
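The dedup gate can be sketched in a few lines of TypeScript. This is an illustrative, in-process version (the names `dedupKey`, `processedMessages`, and `shouldProcess` are assumptions, not the production identifiers; a real multi-bot deployment would need a shared store for the "first bot wins" claim):

```typescript
import { createHash } from "crypto";

const TTL_MS = 60_000; // 60-second TTL from the text
const processedMessages = new Map<string, number>(); // dedupKey -> expiry time

// senderId + date + textHash, exactly as described above
function dedupKey(senderId: number, date: number, text: string): string {
  const textHash = createHash("sha256").update(text).digest("hex").slice(0, 16);
  return `${senderId}:${date}:${textHash}`;
}

// First caller to claim a key wins; everyone else ignores the message.
function shouldProcess(
  senderId: number,
  date: number,
  text: string,
  now: number = Date.now()
): boolean {
  const key = dedupKey(senderId, date, text);
  const expiry = processedMessages.get(key);
  if (expiry !== undefined && expiry > now) return false; // already claimed
  processedMessages.set(key, now + TTL_MS);
  return true;
}
```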
Solution 2: @Mention Routing
Only the mentioned agent responds. The router parses every message for @BotUsername or @agentname patterns, does a case-insensitive lookup against the agent registry, and routes only to matched agents.
"@draper score these leads" → detectMentions() →
lookup 'draper' in registry → enqueueQuery(toAgent: 'draper')
No mention? No response. This keeps the group chat clean.
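A minimal sketch of the mention parser, assuming a simple name-keyed registry (the registry contents and the `detectMentions` signature here are illustrative):

```typescript
// Hypothetical registry: lowercase agent name -> bot username
const registry: Record<string, string> = {
  rocky: "RockyBot",
  tars: "TARSBot",
  draper: "DraperBot",
};

// Case-insensitive lookup of @BotUsername or @agentname patterns
function detectMentions(text: string): string[] {
  const matches = text.match(/@(\w+)/g) ?? [];
  return matches
    .map((m) => m.slice(1).toLowerCase().replace(/bot$/, "")) // "@DraperBot" -> "draper"
    .filter((name) => name in registry);
}
```

No match against the registry means no route, which is what keeps unmentioned bots silent.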
Solution 3: Domain-Based Routing
Each agent has declared domains:
| Agent | Domains |
|---|---|
| Rocky (Chief of Staff) | coordination, email, calendar, investor |
| TARS (Engineering) | engineering, devops, infrastructure |
| Attia (Health) | health, fitness, nutrition |
| Burry (Finance) | finance, risk, accounting |
| Draper (Marketing) | sales, marketing, CRM |
| Mariano (Sales/CX) | sales, customer-success |
| Drucker (Research) | research, competitive-intel |
When a message mentions a domain but not a specific agent, the router can infer who should handle it. "Score these leads" → sales domain → Draper or Mariano.
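Domain inference can be sketched as two lookups: keywords to domains, domains to agents. The domain lists mirror the table above; the keyword lists are illustrative assumptions:

```typescript
// Declared domains, from the table above
const agentDomains: Record<string, string[]> = {
  rocky: ["coordination", "email", "calendar", "investor"],
  tars: ["engineering", "devops", "infrastructure"],
  attia: ["health", "fitness", "nutrition"],
  burry: ["finance", "risk", "accounting"],
  draper: ["sales", "marketing", "crm"],
  mariano: ["sales", "customer-success"],
  drucker: ["research", "competitive-intel"],
};

// Hypothetical keyword -> domain mapping (assumed, not the production list)
const domainKeywords: Record<string, string[]> = {
  sales: ["lead", "leads", "score", "pipeline"],
  finance: ["invoice", "payroll", "budget"],
  engineering: ["deploy", "bug", "server"],
};

function inferAgents(text: string): string[] {
  const words = text.toLowerCase().split(/\W+/);
  const hitDomains = Object.entries(domainKeywords)
    .filter(([, kws]) => kws.some((k) => words.includes(k)))
    .map(([domain]) => domain);
  return Object.entries(agentDomains)
    .filter(([, domains]) => domains.some((d) => hitDomains.includes(d)))
    .map(([agent]) => agent);
}
```

"Score these leads" hits the sales keywords, and both Draper and Mariano declare the sales domain, so both are candidates.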
The Thread System: How Agents Talk Without Infinite Loops
This is where most multi-agent systems die. Agent A asks Agent B a question. Agent B's response triggers Agent A to ask another question. Repeat until your token bill looks like a phone number.
We solved this with thread tracking and hop limits.
Every conversation creates a thread in SQLite:
threads table:
id, initiator, participants[], hop_count, max_hops(4),
status, created_at, completed_at
The 4-Hop Rule
Every time a message passes from one agent to another, the hop count increments. At 4 hops, the thread is killed:
Hop 0: RJ → Rocky: "Get TARS to score leads"
Hop 1: Rocky → TARS: "50 leads at ~/leads/batch-5.csv. Score by fit."
Hop 2: TARS → Rocky: "Done. Top 5 ready for demo."
Hop 3: Rocky → RJ: "TARS scored 50 leads. Here are the top 5..."
If the hop count hits 4, the system posts: "Thread hop limit reached. Use THREAD_COMPLETE or start a new thread."
This sounds aggressive. It is. And it's saved us from runaway agent loops more times than I can count. A 4-hop conversation forces agents to be concise and decisive. No "let me check with Drucker who'll check with Burry who'll loop back to me."
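The enforcement itself is tiny. A sketch with field names mirroring the threads table above, using in-memory state instead of SQLite:

```typescript
interface Thread {
  id: string;
  hopCount: number;
  maxHops: number; // 4 in our setup
  status: "active" | "completed" | "killed";
}

// Increment on every agent-to-agent pass; kill the thread at the limit.
// Returns the warning message to post, or null if the hop may proceed.
function recordHop(thread: Thread): string | null {
  if (thread.status !== "active") return null;
  thread.hopCount += 1;
  if (thread.hopCount >= thread.maxHops) {
    thread.status = "killed";
    return "Thread hop limit reached. Use THREAD_COMPLETE or start a new thread.";
  }
  return null;
}
```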
Cooldown Anti-Spam
Even within 4 hops, agents can fire too fast. We added a per-agent, per-thread cooldown:
lastActivity[agentName:threadId] → if < 2000ms since last activity, skip
Two seconds between responses. Enough time for Telegram to render, enough delay to prevent machine-gun message loops.
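The cooldown check from the pseudocode above, sketched in TypeScript (the `agentName:threadId` key format is from the text; the function name is illustrative):

```typescript
const COOLDOWN_MS = 2000; // 2 seconds between responses per agent, per thread
const lastActivity = new Map<string, number>();

// Returns true if the agent may respond now; records the activity if so.
function canRespond(agent: string, threadId: string, now: number): boolean {
  const key = `${agent}:${threadId}`;
  const last = lastActivity.get(key);
  if (last !== undefined && now - last < COOLDOWN_MS) return false; // too soon
  lastActivity.set(key, now);
  return true;
}
```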
The Query Queue: Priority, Concurrency, and Not Killing Claude
Seven agents can't all call Claude simultaneously on a single $200/month subscription. We'd hit rate limits in seconds.
Priority Levels
RJ (human) priority → bypass queue, run immediately
Event priority → jump ahead of agent requests
Agent priority → FIFO, max 2 concurrent
When RJ types something, it skips the line. When an agent needs something, it waits. Max 2 agents can call Claude at the same time. Max 5 items in queue — beyond that, the oldest non-RJ item gets dropped.
Worker Dispatch
Each query spawns a worker process running the Claude Agent SDK:
claude agent run --agent-dir /path --max-turns 50 \
--fork-session --session-id <sid> \
prompt.md
The session ID matters. When TARS continues a conversation it started earlier, it resumes from the same session — no re-reading files, no lost context.
Context Carry-Forward
Each agent in a thread sees the last 5 messages (max 2000 tokens) from previous hops:
getThreadContext(threadId) →
SELECT last 5 messages, 2000 token cap
Format: [Rocky]: "Score these 50 leads"
[TARS]: "Done. Top 5 ready."
→ Injected into next hop's prompt
This is how TARS knows what Rocky asked for, and Rocky knows what TARS found — without either agent re-reading the entire conversation history.
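The carry-forward logic can be sketched like this. The 5-message and 2000-token caps are from the text; the 4-characters-per-token estimate and the names here are illustrative, not the production tokenizer:

```typescript
interface ThreadMessage { agent: string; text: string; }

const MAX_MESSAGES = 5;
const MAX_TOKENS = 2000;

function getThreadContext(history: ThreadMessage[]): string {
  const recent = history.slice(-MAX_MESSAGES); // last 5 messages
  const lines: string[] = [];
  let tokens = 0;
  // Walk backwards so the newest messages survive the token cap.
  for (let i = recent.length - 1; i >= 0; i--) {
    const line = `[${recent[i].agent}]: "${recent[i].text}"`;
    const cost = Math.ceil(line.length / 4); // rough token estimate (assumed)
    if (tokens + cost > MAX_TOKENS) break;
    tokens += cost;
    lines.unshift(line);
  }
  return lines.join("\n"); // injected into the next hop's prompt
}
```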
Memory Architecture: What Agents Remember (and What They Forget)
Brain Files (Read on Startup)
Every agent session begins by reading a set of markdown files:
| File | Purpose | Who Reads |
|---|---|---|
| SOUL.md | Agent identity and core personality | All agents |
| USER.md | RJ's profile, businesses, contacts | Rocky |
| MEMORY.md | Long-term institutional memory | Rocky |
| AGENTS.md | Agent framework rules, session management | All agents |
| HEARTBEAT.md | Periodic check guidelines | Rocky |
| ROUTING.md | Model selection rules | All agents |
These files are the "soul" of each agent. Rocky reads MEMORY.md and knows that EsthetiqOS has 837 leads, that Skin Buffet replied to Cold Email 0, and that the next investor update is due Friday. TARS reads AGENTS.md and knows its autonomy boundaries.
Daily Logs
- memory/YYYY-MM-DD.md — Raw notes from each day's work
- transcripts/YYYY-MM-DD.jsonl — Searchable message history
Dynamic State Files
- BOT_UPDATES.md — Status board agents read during heartbeats
- INBOX.md — Action items and captures
- DECISIONS.md — Decision log with reasoning
- COMMITMENTS.md — Deadlines and follow-ups
The key insight: agents don't share a database. They share files. Markdown files that any agent can read and write. This is dead simple, human-auditable, and works with the Claude Agent SDK out of the box.
Autonomous Goal Work: How Agents Self-Direct
The War Room isn't just reactive (wait for @mentions). Each agent has autonomous work cycles.
The Cron Schedule
War Room Agents (Rocky, TARS, Attia, Burry, Draper, Mariano, Drucker):
- 2 cycles per day (10:30 AM, 2:30 PM Manila time)
- Staggered 3-minute offsets so they don't all hit Claude simultaneously

Venture Agents (Edison, Warhol, Grove):
- 12 cycles per day (every 2 hours)
- Results delivered via DM instead of group chat
What Happens in a Goal Work Cycle
Each agent receives this prompt:
[WAR ROOM CRON — GOAL WORK] You are autonomous. Do NOT ask permission.
1. Use goal_list to see active goals
2. Pick MOST URGENT goal (deadline or impact)
3. Take CONCRETE ACTIONS — research, build, analyze, write
4. Use goal_update to record what you did
5. If COMPLETE, use goal_complete
6. If more steps needed, end with [CONTINUE]
Only use [APPROVAL_REQUEST] for: sending external emails,
invoices, payments. Everything else — just do it.
This is bounded autonomy. The agent can do whatever it needs to accomplish its goals — research, file operations, API calls, CRM updates. But the moment it needs to send an email to a client or make a payment, it pauses and asks.
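The gate is just a scan of each cycle's output for the markers named in the prompt. The marker strings come from the text; this parser and its precedence order are an illustrative assumption:

```typescript
type CycleOutcome = "needs-approval" | "continue" | "done";

// Approval wins over everything: an agent that wants to send an external
// email, invoice, or payment must pause, even mid-goal.
function parseCycleOutput(output: string): CycleOutcome {
  if (output.includes("[APPROVAL_REQUEST]")) return "needs-approval"; // human gate
  if (output.includes("[CONTINUE]")) return "continue"; // schedule another cycle
  return "done";
}
```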
The LLM Fallback Chain: Staying Alive When Claude Goes Down
Running everything on one LLM is a single point of failure. Here's our fallback:
Primary: Claude Max (unlimited Opus 4.5)
↓ (on rate limit)
Fallback: Local qwen3:14b via Ollama (runs on Mac Mini M4 Pro)
↓ (on quality concern)
API escalation: DeepSeek V3 via OpenRouter
Heartbeat checks (low-stakes "is there anything urgent?") always run on local qwen3:14b. Free. Zero API cost. The $200/month budget goes entirely to the thinking-heavy tasks.
Model selection per context:
- Simple/fast tasks → local qwen (free)
- Coding tasks → DeepSeek Coder (cheap)
- Business tasks → Claude Opus (included in Max)
- Critical/legal → Claude Opus with verification pass
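The routing table above reduces to a small selector. The model names come from the text; the function shape and the degraded-mode behavior when Claude is rate-limited are illustrative assumptions:

```typescript
type TaskKind = "simple" | "coding" | "business" | "critical";
type Backend = "ollama/qwen3:14b" | "openrouter/deepseek" | "claude/opus";

function pickModel(kind: TaskKind, claudeAvailable: boolean): Backend {
  if (!claudeAvailable) {
    // Fallback chain: local model for cheap work, OpenRouter for the rest
    return kind === "simple" ? "ollama/qwen3:14b" : "openrouter/deepseek";
  }
  switch (kind) {
    case "simple": return "ollama/qwen3:14b";   // heartbeats: free, local
    case "coding": return "openrouter/deepseek"; // cheap coding model
    default:       return "claude/opus";         // business and critical work
  }
}
```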
What Broke (So You Don't Have To)
The Ghost Relay Incident
Telegram's long-polling sometimes creates "409 Conflict" errors — two instances of the same bot trying to receive updates. Result: messages disappear into a void. Nobody responds. We call this a "ghost relay."
Fix: Exponential backoff on connection + a 60-second watchdog that checks all bot PIDs. If a bot crashes, the watchdog restarts it. Self-healing infrastructure.
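The backoff half of the fix is a one-liner. Only "exponential backoff" is from the text; the 1-second base, doubling schedule, and 60-second cap here are assumptions chosen to line up with the watchdog window:

```typescript
// Delay before reconnect attempt N after a 409 Conflict (assumed parameters).
function backoffDelay(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
// attempt 0 -> 1s, 1 -> 2s, 2 -> 4s, ... capped at the 60s watchdog interval
```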
The Infinite Loop Near-Miss
Early on, we had no hop limit. Rocky asked TARS a question. TARS's response triggered Rocky to ask a follow-up. Rocky's follow-up triggered TARS again. The loop ran 23 times before we noticed.
Fix: The 4-hop rule. Aggressive but necessary.
The 3 AM Queue Overflow
During a particularly busy day, 6 agents tried to run goal work simultaneously. The queue overflowed. The oldest tasks got dropped — including Burry's payroll processing.
Fix: Priority queue with human messages always at the top. Staggered cron offsets so agents don't pile up. Queue depth monitoring with alerts.
The Cost Breakdown (Still $200/Month)
| Component | Cost | Notes |
|---|---|---|
| Claude Max | $200/mo | Unlimited Opus 4.5 for all 7 agents |
| Telegram API | $0 | Free forever |
| Mac Mini M4 Pro | $0/mo | One-time purchase, runs Ollama locally |
| Ollama (qwen3:14b) | $0 | Local inference, free |
| OpenRouter (fallback) | ~$2-5/mo | Only used during Claude outages |
| SQLite | $0 | Local database |
| Total | ~$202-205/mo | |
The reason this works at $200/month: Claude Max gives unlimited Opus 4.5 calls. Without that, 7 agents would cost $3,000-5,000/month in API fees. The entire system is architecturally designed around that one subscription.
Key Takeaways
- Use existing platforms for communication. Telegram, Slack, Discord — don't build custom message queues until you absolutely have to.
- Hop limits prevent runaway costs and loops. 4 hops forces agents to be concise. If a problem can't be solved in 4 exchanges, it probably needs human intervention anyway.
- Priority queues protect human responsiveness. The human should never wait behind agent-to-agent chatter.
- File-based memory is underrated. Markdown files that agents read and write are simpler, more auditable, and more debuggable than any vector database.
- Local models for heartbeats, cloud models for thinking. This single optimization cut our effective LLM costs by 40%.
- Watchdogs and self-healing aren't optional. If you're running agents on cron jobs, they WILL crash. Build the restart logic from day one.
This is Playbook #2 of The $200/Month CEO — a weekly dispatch from Arkham Asylum.
Playbook #1: How to Run 7 AI Agents on a Single $200/Month Claude Max Subscription
Subscribe for the War Room Report (Tuesdays) and The Playbook (Fridays): buttondown.com/the200dollarceo