Stop reporting DORA. Start interpreting it.

June 8, 2026 · Issue 028

A quarterly DORA scorecard with no narrative attached has become the new "lines of code" — a vanity radiator that improves the same week your best engineers update their LinkedIn.

Walk into almost any engineering all-hands in 2026 and you will see the same slide. Four numbers, four sparklines, four upward arrows. Deployment frequency: up. Lead time: down. Change failure rate: down. MTTR: down. The room nods. The CTO clicks next.

Nothing on that slide tells you whether the org is healthier. It tells you the org is faster. Those are not the same thing, and pretending they are is the most expensive mistake senior TPMs and tech leaders are making this year.

The fix is not to throw DORA out. The fix is to stop treating DORA as a scorecard and start treating it as a thermometer — a reading you have to interpret in context before you decide whether to act, escalate, or leave well enough alone.

Deep Dive — Your DORA dashboard is a thermometer, not a treatment plan

Nicole Forsgren and the original DORA team have been clear about this for years: the four keys are indicators of performance across a wide set of practices, not the practices themselves, and emphatically not goals. The 2025 DORA report — the one that put AI adoption at 90% of respondents — reinforced the same idea in a different register: AI does not fix a team; it amplifies what is already there. Strong teams get stronger. Struggling teams generate more, worse, faster.

In 2026 that amplification is exactly why DORA-as-scoreboard fails.

Deployment frequency climbs because AI-assisted PRs are smaller and more numerous. Lead time drops because Copilot-shaped diffs sail through review. Change failure rate looks flat because incident classification policies have not caught up with the shape of AI-introduced regressions. The numbers go up and to the right, and the team's experience of work — cognitive load, focus time, friction in CI, time spent on toil — goes the other direction.

Abi Noda's reading of Bryan Finster's "How to Misuse & Abuse DORA Metrics" catalogues the predictable failure modes: comparing teams to each other on the four keys, setting numerical targets, and spending more effort pulling data into dashboards than improving anything. Each is a textbook instance of Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. Gergely Orosz has been beating the same drum in The Pragmatic Engineer: whatever you measure gets gamed, especially in the AI era, when "tokens shipped" becomes the new "lines of code."

So what should a senior TPM or engineering leader do instead? Three moves.

One: pair every DORA number with a SPACE or DevEx signal before you discuss it. The SPACE framework (Forsgren, Storey, Maddila, Zimmermann, Houck, Butler, 2021) was explicit that productivity has five dimensions (Satisfaction and well-being, Performance, Activity, Communication and collaboration, Efficiency and flow) and that any single metric will mislead. The DevEx framework (Noda, Storey, Forsgren, Greiler, 2023) collapsed that into three practical axes: feedback loops, cognitive load, flow state. Pair the activity metric (deployment frequency) with the experience metric (DXI score, focus-time minutes, perceived CI friction) and the picture clarifies in one slide. Throughput up + cognitive load up = a team that is being driven, not enabled.
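The pairing rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of SPACE or DevEx: the `PairedSignal` type, the `read_pair` function, and the 5% noise band are all assumptions made for the example, and the cognitive load figure is assumed to come from whatever survey instrument you already run.

```python
from dataclasses import dataclass


@dataclass
class PairedSignal:
    """One activity metric read next to one experience metric (hypothetical shape)."""
    team: str
    throughput_change_pct: float      # e.g. deployment frequency, quarter over quarter
    cognitive_load_change_pct: float  # e.g. survey-based; positive means load went up


def read_pair(signal: PairedSignal, noise_pct: float = 5.0) -> str:
    """Return a short reading; changes within +/- noise_pct are treated as flat."""
    faster = signal.throughput_change_pct > noise_pct
    slower = signal.throughput_change_pct < -noise_pct
    heavier = signal.cognitive_load_change_pct > noise_pct

    if faster and heavier:
        return "driven, not enabled: throughput up, cognitive load up"
    if faster:
        return "enabled: throughput up without added cognitive load"
    if slower and heavier:
        return "struggling: throughput down, cognitive load up"
    if slower:
        return "slowing: throughput down, find out what changed"
    return "no clear movement: look for a driver before acting"


# Illustrative numbers only.
platform = PairedSignal("Platform", throughput_change_pct=22.0, cognitive_load_change_pct=12.0)
print(f"{platform.team}: {read_pair(platform)}")
```

The value of the sketch is the function signature: it cannot be called with a throughput number alone, which is the discipline the slide should enforce.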

Two: adopt a unified frame so you stop arguing about which framework wins. The DX Core 4, introduced by Abi Noda and Laura Tacho in late 2024, intentionally folds DORA, SPACE, and DevEx into one frame with four dimensions: Speed (diffs per engineer per week), Effectiveness (DXI), Quality (change failure rate, straight from DORA), and Impact (percentage of engineering time on new capabilities). The point of Core 4 is not that it is a better dashboard. The point is that it forces a leader to read four different shapes of signal (activity, experience, reliability, strategic allocation) in the same sentence. You can no longer celebrate Speed while Impact rots underneath.
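One way to make "in the same sentence" literal is to compare two quarterly snapshots and emit a single line that covers all four dimensions. The sketch below is an assumption-laden illustration, not DX's tooling: `Core4Snapshot`, `read_core4`, and `one_sentence` are hypothetical names, and the figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Core4Snapshot:
    speed: float          # diffs per engineer per week
    effectiveness: float  # DXI score
    quality: float        # change failure rate as a fraction (lower is better)
    impact: float         # percent of engineering time on new capabilities


def direction(before: float, after: float, higher_is_better: bool = True) -> str:
    """Classify a change as better, worse, or flat."""
    if after == before:
        return "flat"
    improved = after > before if higher_is_better else after < before
    return "better" if improved else "worse"


def read_core4(prev: Core4Snapshot, curr: Core4Snapshot) -> dict[str, str]:
    """Read all four dimensions together; none can be reported alone."""
    return {
        "Speed": direction(prev.speed, curr.speed),
        "Effectiveness": direction(prev.effectiveness, curr.effectiveness),
        "Quality": direction(prev.quality, curr.quality, higher_is_better=False),
        "Impact": direction(prev.impact, curr.impact),
    }


def one_sentence(reading: dict[str, str]) -> str:
    """Join the four readings and flag Speed improving while Impact declines."""
    sentence = ", ".join(f"{name} {state}" for name, state in reading.items())
    if reading["Speed"] == "better" and reading["Impact"] == "worse":
        sentence += "; Speed is improving while Impact declines, so do not celebrate yet"
    return sentence + "."


# Illustrative numbers only.
last_quarter = Core4Snapshot(speed=3.1, effectiveness=68, quality=0.12, impact=55)
this_quarter = Core4Snapshot(speed=3.8, effectiveness=66, quality=0.12, impact=48)
print(one_sentence(read_core4(last_quarter, this_quarter)))
```

A real version would need a tolerance band per dimension; exact comparison is used here only to keep the sketch short.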

Three: require a written interpretation, not a number, in your operating reviews. This is the discipline most teams skip. A DORA number on a slide is a noun. A leadership team needs a verb. Replace the four-up DORA tile with a one-paragraph narrative: "Deployment frequency rose 22% QoQ, driven primarily by the migration off the monorepo pre-deploy gate. DXI fell two points in the same window, concentrated in the Platform pod. We believe the gate change improved throughput at the cost of CI predictability for one team; we are reverting the gate's parallelism setting next sprint and will re-measure." That paragraph is what your CTO actually needs. The four arrows are decoration.

If you remember nothing else: the DORA score is not the work. The interpretation is the work. A team that cannot tell you, in prose, why their numbers moved is a team that does not yet understand its own system. Help them get there before you escalate the trend line.

Try this week. Take last quarter's DORA scorecard. Delete every number. Write a one-paragraph narrative — driver, signal, hypothesis, planned action — for each of the four keys. Send it to your skip-level. If you cannot write the paragraph, you have your real status update: we are reporting numbers we do not yet understand.
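If it helps to make the exercise mechanical, the four-part narrative (driver, signal, hypothesis, planned action) can be held in a small record that refuses to render a confident paragraph when a part is blank. This is a sketch under assumptions: `KeyNarrative` and its methods are hypothetical names, and the example text is adapted from the sample narrative earlier in this issue.

```python
from dataclasses import dataclass, fields


@dataclass
class KeyNarrative:
    """Written interpretation for one DORA key (hypothetical structure)."""
    key: str
    driver: str
    signal: str
    hypothesis: str
    planned_action: str

    def missing(self) -> list[str]:
        """Names of the parts that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Return the paragraph, or the honest status update if parts are missing."""
        gaps = self.missing()
        if gaps:
            return (f"{self.key}: we are reporting a number we do not yet "
                    f"understand (missing: {', '.join(gaps)}).")
        return (f"{self.key}. Driver: {self.driver} Signal: {self.signal} "
                f"Hypothesis: {self.hypothesis} Planned action: {self.planned_action}")


deploy_frequency = KeyNarrative(
    key="Deployment frequency",
    driver="Migration off the monorepo pre-deploy gate.",
    signal="DXI fell two points in the same window, concentrated in the Platform pod.",
    hypothesis="",  # left blank on purpose to show the fallback
    planned_action="Revert the gate's parallelism setting next sprint and re-measure.",
)
print(deploy_frequency.render())
```

Filling in the hypothesis switches the output from the fallback line to the full paragraph, which is the version worth sending to a skip-level.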

Method — The Force Field Diagram (Lewin, 1943)

What it is. A two-column visual analysis, first published by Kurt Lewin in Defining the "Field at a Given Time" (Psychological Review, May 1943). You name a desired state at the top, then list the driving forces pushing toward it on one side and the restraining forces holding it back on the other. Lewin's claim: change happens not by adding more drivers but by weakening the restrainers.

When to use it. Anytime a metric moves and you do not yet know why. Also for any program where the obvious move is "push harder" — Force Field flips the analysis to "what is holding the system in equilibrium, and which restrainer is cheapest to remove?"

How to run it:

1. State the desired state in one sentence. Not "improve DORA" but "Reduce p50 lead time from 36h to 12h for the Checkout team without raising change failure rate."
2. Brainstorm driving forces: tools, incentives, recent investments, vocal stakeholders, recent wins. Five to nine items.
3. Brainstorm restraining forces: bottlenecks, on-call load, review queues, hidden coupling, cultural objections, missing skills. Five to nine items.
4. Score each force 1–5 for strength.
5. Decide: which one restrainer, if reduced by half, would shift the equilibrium most? That is your next sprint goal. Drivers are a distraction until restrainers are addressed.
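The scoring and decision steps above can be sketched as simple arithmetic. This is an illustration under stated assumptions, not Lewin's own formalism: it treats net pressure as the sum of driver scores minus the sum of restrainer scores, and the `Force` type, function names, and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Force:
    name: str
    strength: int  # 1 (weak) to 5 (strong)

    def __post_init__(self) -> None:
        if not 1 <= self.strength <= 5:
            raise ValueError(f"{self.name}: strength must be 1-5, got {self.strength}")


def net_pressure(drivers: list[Force], restrainers: list[Force]) -> float:
    """Positive: drivers outweigh restrainers. Negative: the system is held back."""
    return sum(d.strength for d in drivers) - sum(r.strength for r in restrainers)


def restrainer_to_halve(drivers: list[Force], restrainers: list[Force]) -> tuple[Force, float]:
    """Return the restrainer whose halving shifts the balance most, and the new net."""
    target = max(restrainers, key=lambda r: r.strength)
    return target, net_pressure(drivers, restrainers) + target.strength / 2


# Illustrative scores only.
drivers = [Force("new CI tooling", 3), Force("executive sponsorship", 2), Force("recent wins", 2)]
restrainers = [Force("security review queue", 5), Force("on-call load", 3), Force("hidden coupling", 2)]

target, new_net = restrainer_to_halve(drivers, restrainers)
print(f"Baseline net pressure: {net_pressure(drivers, restrainers)}")
print(f"Halve '{target.name}' first; net pressure becomes {new_net}")
```

Under this additive model the answer is always the strongest restrainer, so the code adds little beyond a forcing function for writing the scores down. If removal cost matters, as the "cheapest to remove" question suggests, a cost field and a gain-per-cost ranking would be the natural extension.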

When NOT to use it. When the system is in Cynefin's chaotic domain (active incident) or when the desired state is genuinely contested by stakeholders — you need a Polarity Map for the second case, not a Force Field.

Example. A platform team's DORA deployment frequency was stuck at 1.4/day for two quarters despite three tooling investments (drivers). Force Field surfaced that the dominant restrainer was a single mandatory security review queue with a 19-hour median wait. Reducing the queue — not adding a fourth tool — moved frequency to 4.2/day in six weeks.

Field Notes

How to Misuse & Abuse DORA Metrics — Abi Noda's walkthrough of Bryan Finster's paper is the cleanest single read on why your scorecard is failing you. Required reading before your next QBR.

2025 DORA Report: AI does not fix a team, it amplifies one — the headline finding (90% adoption, two hours/day median) matters less than the structural one: strong internal platforms and clear workflows determine whether AI shows up as ROI or as faster chaos.

Strong engineering foundations drive AI ROI (InfoQ on DORA's May 2026 follow-up) — confirms the amplifier thesis with a year of additional data. If your platform is shaky, AI's effect on your DORA numbers is a measurement artifact, not a result.

Events

Sep 15–16 · LDX3 New York (LeadDev) — first US edition of LeadDev's flagship; 2,000+ engineering leaders, heavy track on measurement and AI adoption.
Nov 9–10 · LeadDev Berlin — European engineering leadership track; strong content on org design and platform investment.
Dec 2–4 · TechLead Summit, Clearwater FL — smaller, executive-skewed; valuable for staff+/director-level peer conversations.

Reading

The SPACE of Developer Productivity · Forsgren, Storey, Maddila, Zimmermann, Houck, Butler, ACM Queue (2021) — the canonical paper; re-read the section on why activity alone misleads.
Introducing the DX Core 4 · Abi Noda, DX Newsletter (Dec 2024) — the unified frame; specifically the Speed/Effectiveness/Quality/Impact quadrant.
Goodhart's Law in Software Engineering · Hillel Wayne — the sharpest short essay on why every engineering metric gets gamed and what to do about it.

"When a measure becomes a target, it ceases to be a good measure."

— Marilyn Strathern's formulation of Goodhart's Law (1997)

Critical Path
