The AI email metric most teams are tracking wrong
I Am AI issue #6
Here's a stat that should bother you: in 2024, 62% of marketing teams needed two or more weeks to build a single email campaign. By 2025, that number dropped to just 6%. (Source: Litmus/Knak)
AI email tools did that. The production speed problem is solved. But the performance gap between teams using AI email well and teams using it poorly? That's wider than ever.
Brenthaven saw a 122% increase in open rates and a 211% improvement in CTR with AI-powered email. (Source: Seventh Sense case study) Stitch Fix grew email revenue by 30% and order value by 40% through AI-powered product recommendations. (Source: Chief AI Officer) Meanwhile, most teams using AI email report... marginal improvements at best.
Same technology. Wildly different outcomes. After pulling apart every published case study I could find, a clear pattern emerged that explains the gap. And it's not what most marketers think.
The Pattern: It's Not the Copy. It's the Data Access.
The teams getting 100%+ lifts from AI email aren't getting better copy. They're feeding AI richer behavioral data.
Klaviyo's research makes this concrete: brands using AI-driven behavioral segments see a 14-45% increase in revenue per recipient compared to traditional segmentation. (Source: Klaviyo) That's not a copy improvement — that's a segmentation improvement. The AI is making better decisions about who gets what message and when, based on behavioral signals most teams aren't even collecting.
Think about what the winning case studies have in common. Stitch Fix's AI doesn't just write "Hey, check out these clothes." It references your specific style preferences, past purchase history, items you browsed but didn't buy, seasonal patterns in your behavior, and price sensitivity signals. Each email has five, six, seven dynamic data points informing the personalization.
Now compare that to how most teams deploy AI email: first name merge tag, maybe a product category they browsed, generic "come back" copy. Two data points. The AI is writing fine emails — they're just not relevant to anyone's actual situation.
The gap between two data points and seven isn't incremental. It's the difference between Brenthaven's 211% CTR lift and the "AI is helpful but not transformative" conclusion most teams reach.
Open Rate Is the Wrong Metric (And It's Actively Misleading You)
Most teams evaluate AI email tools by looking at open rate improvements. It's the first metric in every case study, every tool comparison, every vendor pitch.
Here's why that's a problem: open rate optimization is table stakes now. Every decent AI tool can write subject lines that get opened — it's one of the most data-rich, easily testable aspects of email, which makes it, in a sense, the easiest problem in AI email.
But opens don't pay the bills. Revenue per recipient does.
The Brenthaven case is instructive here. Their open rates went up 122% — impressive. But their CTR went up 211%. That means the AI wasn't just getting more people to open emails. It was getting dramatically more people to click through and engage with the content inside. That's a different kind of optimization — it's about message relevance, not just subject line curiosity.
When Klaviyo reports a 14-45% lift in revenue per recipient, they're measuring what actually matters: did the email lead to a purchase? That metric captures everything — subject line performance, content relevance, offer targeting, send timing — in a single number that connects directly to business outcomes.
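To make the difference concrete, here's a small sketch with hypothetical campaign numbers (all figures invented for illustration) showing how revenue per recipient can rank campaigns differently than open rate does:

```python
# Hypothetical campaign numbers to show why revenue per recipient
# can diverge from open rate. All figures are made up for the example.

def revenue_per_recipient(recipients: int, revenue: float) -> float:
    """Total attributed revenue divided by emails delivered."""
    return revenue / recipients

# Campaign A: great open rate, weak conversion.
a = {"recipients": 10_000, "opens": 4_500, "revenue": 2_000.0}
# Campaign B: modest open rate, strong conversion.
b = {"recipients": 10_000, "opens": 2_800, "revenue": 5_500.0}

for name, c in (("A", a), ("B", b)):
    open_rate = c["opens"] / c["recipients"]
    rpr = revenue_per_recipient(c["recipients"], c["revenue"])
    print(f"Campaign {name}: open rate {open_rate:.1%}, revenue/recipient ${rpr:.2f}")
```

Campaign A "wins" on opens by a wide margin; Campaign B earns more than twice as much per recipient. Only one of those facts matters to the P&L.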
If you're evaluating AI email tools and the primary metric in the pitch is open rate improvement, you're looking at the wrong scoreboard.
What Actually Makes an AI Email Tool Effective
Based on the pattern across published case studies, here's what separates the tools (and implementations) that produce massive lifts from the ones that produce marginal improvements. Ranked by impact, not by what's easiest to evaluate in a demo.
1. Data Integration Depth
This is the single biggest differentiator. How many behavioral signals can the tool actually ingest and use for personalization?
Count the dynamic variables in a test email. If it's using fewer than five data points per email — first name, product viewed, maybe a category — the tool is essentially a copywriter with a name merge tag. That's not where the 100%+ lifts come from.
The tools producing the biggest results have deep integration with event tracking: browse behavior, scroll depth, comparison actions, pricing page engagement, cart composition, time-of-day patterns, device preferences. Each additional behavioral signal makes the personalization more relevant, and relevance is what drives clicks and conversions.
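One way to run the "count the dynamic variables" check is to treat each email's personalization payload as a dictionary and count the populated fields. This is a toy sketch — the field names are hypothetical, not any vendor's actual schema:

```python
# Illustrative personalization payloads; field names are hypothetical,
# not any specific tool's schema.
shallow = {"first_name": "Ana", "product_viewed": "backpack"}

deep = {
    "first_name": "Ana",
    "product_viewed": "backpack",
    "category_browsed": "laptop bags",
    "cart_items": ["sleeve-13in"],
    "last_purchase": "2025-04-02",
    "price_band": "mid",
    "preferred_send_hour": 19,
}

def count_signals(payload: dict) -> int:
    """Count populated personalization fields in an email payload."""
    return sum(1 for v in payload.values() if v not in (None, "", []))

for name, p in (("shallow", shallow), ("deep", deep)):
    n = count_signals(p)
    verdict = "optimizer territory" if n >= 5 else "copywriter with a merge tag"
    print(f"{name}: {n} signals -> {verdict}")
```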
Supermetrics' research found that only 6% of marketers have fully implemented AI across their operations — while far more claim to be "using AI." (Source: Supermetrics) The gap between "using an AI tool" and "fully integrating AI into the data stack" explains most of the performance disparity.
2. Sequence Intelligence
Does the tool adjust timing and content based on engagement signals? Or does it just generate copy for a fixed schedule you built?
The highest-performing AI email systems don't operate on static Day 1 / Day 3 / Day 7 cadences. They adapt. If someone opens email one but doesn't click, the follow-up changes angle and timing. If they don't open at all, the system might shift channels entirely.
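The adaptive behavior described above can be sketched as a tiny decision rule. This is a toy illustration of the idea, not any vendor's actual logic:

```python
def next_touch(opened: bool, clicked: bool, days_since_send: int) -> dict:
    """Toy follow-up policy: adjust angle, timing, and channel
    based on engagement with the previous email."""
    if clicked:
        # Engaged: move to a conversion-focused message quickly.
        return {"channel": "email", "angle": "offer", "wait_days": 1}
    if opened:
        # Opened but didn't click: the subject worked, the body didn't.
        return {"channel": "email", "angle": "new_value_prop", "wait_days": 2}
    if days_since_send >= 3:
        # No open after several days: try a different channel entirely.
        return {"channel": "sms", "angle": "short_nudge", "wait_days": 0}
    # No open yet, but it's early: retry with a fresh subject line.
    return {"channel": "email", "angle": "resend_new_subject", "wait_days": 2}

print(next_touch(opened=True, clicked=False, days_since_send=2))
```

A static Day 1 / Day 3 / Day 7 cadence is the degenerate case where none of these branches exist.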
This is the approach Netflix takes across its recommendation and engagement systems — using AI to orchestrate the full sequence of touchpoints based on real-time behavioral signals, not just to generate individual pieces of content. (Source: TheAITrack) The orchestration is where the compound value emerges. Better individual emails produce linear improvements. Better orchestration produces exponential ones.
3. Revenue Per Recipient as the North Star
Not open rate. Not CTR. Revenue per recipient.
This is the only metric that connects email performance to business outcomes. Brenthaven's 122% open rate increase is a great headline. But if they'd only measured open rates, they'd have missed the fact that their CTR improvement was nearly double that — 211%. And without tracking all the way through to revenue, you can't know whether those clicks actually converted.
Klaviyo's 14-45% revenue per recipient improvement range is the right benchmark because it measures the full funnel, not just the top of it. When evaluating tools, ask specifically: can you show me revenue per recipient data, not just engagement metrics?
4. Time to Production
How fast can you go from brief to live campaign? This is where AI genuinely shines and where the production data is most dramatic.
The Litmus research showing the collapse from "62% of teams needed 2+ weeks" to "6% of teams needed 2+ weeks" tells the story clearly. (Source: Litmus/Knak) That's a real operational transformation. But speed is only valuable if the output quality justifies skipping the old review process. If the AI saves you 80% on production time but adds 40% in QA and editing, the net gain is smaller than the demo suggested.
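The net-gain caveat is easy to quantify. A quick sketch with a hypothetical 40-hour baseline, reading "adds 40%" as 40% of the original timeline coming back as QA and editing:

```python
# Hypothetical baseline: 40 hours from brief to live campaign.
baseline_hours = 40.0

production_saving = 0.80   # AI cuts production time by 80%...
qa_overhead = 0.40         # ...but adds 40% of the baseline back in QA/editing

new_hours = baseline_hours * (1 - production_saving) + baseline_hours * qa_overhead
net_saving = 1 - new_hours / baseline_hours

print(f"New total: {new_hours:.0f}h, net saving {net_saving:.0%}")
# 8h production + 16h QA = 24h total -> a 40% net saving, not 80%
```

Half the headline saving evaporates in the edit pass — which is why "what percentage ships without human editing" belongs in every evaluation.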
5. Copy Quality
Dead last. Not because it doesn't matter — it does. But because it matters far less than the four things above. And frankly, most AI tools write acceptably good email copy at this point. The differentiation isn't in the words anymore. It's in the data, the segmentation, the sequencing, and the measurement.
The Uncomfortable Implication
Here's the take, and some of you won't like it: your AI email tool matters less than your data layer.
The best AI writing beautiful copy on bad data will lose to average AI on rich behavioral data. Every published case study I've reviewed supports this. The teams with the biggest lifts had the deepest data integration — not the fanciest AI model or the most sophisticated copy generation.
Industry-wide, 62% of marketers struggle with unreliable data and 47% report difficulty connecting AI to existing tools. (Source: Gartner, Backlinko) Those aren't email problems. They're infrastructure problems. And no AI tool, no matter how good its copy generation, will fix them.
If you're spending months evaluating AI email tools but haven't audited your event tracking, behavioral data collection, and segmentation infrastructure, you're optimizing the wrong layer of the stack.
The Evaluation Framework (Use This)
If you're choosing an AI email tool — or reevaluating one you're already using — here's what to assess, in order:
- Data integration depth. Ask: how many behavioral signals can this tool ingest and use per email? Ask to see a real personalized email with all dynamic elements highlighted. Count them. Below five? It's a copywriter, not an optimizer.
- Sequence intelligence. Ask: does the system adjust email timing and content based on engagement signals? Or does it operate on a fixed schedule? Ask for a specific example of how the sequence adapts when a recipient doesn't open the first email.
- Revenue attribution. Ask: can you show me revenue per recipient data from a comparable customer? If the best metric they can offer is open rate improvement, that's a red flag.
- Time to production. Ask: from brief to live email, what's the typical timeline? What percentage of AI output ships without human editing?
- Copy quality. Ask for sample outputs on a brief similar to yours. But weight this last in your evaluation — it's the least predictive of actual performance.
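If you want to turn this checklist into a side-by-side comparison, a simple weighted score works. The weights below mirror the ranking in this framework but are a judgment call, not a standard — adjust them to your context:

```python
# Weights mirror the ranking in this framework; tune to taste.
WEIGHTS = {
    "data_integration": 0.35,
    "sequence_intelligence": 0.25,
    "revenue_attribution": 0.20,
    "time_to_production": 0.12,
    "copy_quality": 0.08,
}

def score_tool(ratings: dict) -> float:
    """Weighted score from 1-5 ratings on each criterion."""
    return sum(WEIGHTS[k] * ratings[k] for k in WEIGHTS)

# Hypothetical tools: A is data-deep, B is fast and writes well.
tool_a = {"data_integration": 5, "sequence_intelligence": 4,
          "revenue_attribution": 4, "time_to_production": 3,
          "copy_quality": 3}
tool_b = {"data_integration": 2, "sequence_intelligence": 2,
          "revenue_attribution": 2, "time_to_production": 5,
          "copy_quality": 5}

print(f"Tool A: {score_tool(tool_a):.2f}, Tool B: {score_tool(tool_b):.2f}")
```

With these weights, the tool with great copy and fast production still loses to the one with deep data integration — which is the whole argument of this issue in one number.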
What We're Exploring Next
Based on this research, we've started auditing our own behavioral data collection — mapping every event we track, identifying gaps in our email personalization signals, and evaluating how deeply our current tools can actually use that data.
Because the real takeaway from these case studies isn't "Tool X is better than Tool Y." It's that AI email performance is capped by data quality. The teams investing in their data layer are the ones seeing 100%+ lifts. Everyone else is optimizing subject lines and wondering why the results are underwhelming.
TL;DR
- Brenthaven: 122% open rate increase, 211% CTR improvement with AI email (Source)
- Stitch Fix: 30% email revenue growth, 40% order value increase (Source)
- Klaviyo research: 14-45% revenue per recipient lift from AI behavioral segmentation (Source)
- The #1 differentiator across case studies isn't copy quality — it's data integration depth
- Open rate is the wrong metric. Revenue per recipient captures the full funnel.
- Only 6% of marketers have fully implemented AI (Source: Supermetrics) — the gap between "using AI" and "integrating AI into the data stack" explains the performance disparity
- Production speed is solved (62% → 6% of teams needing 2+ weeks per campaign). The next battleground is personalization depth.