Public postmortem: API availability issues

Our public postmortem for the incident on November 6th, 2025.

Justin Duke
November 16, 2025

For around three hours on November 6th, 2025, Buttondown's API faced severe availability issues. The root cause was that our analytics caching fell behind: the analytics data associated with emails (number of clicks, deliveries, and so on) was not being cached on the email, so it was being calculated in band for every request to the analytics endpoint. On top of that, an errant code path in the email completion modal polled the API fairly aggressively, every few seconds, without exponential backoff on failures. Together, these meant that we drowned ourselves in redundant requests for a very computationally expensive operation.
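
To make the missing safeguard concrete, here is a minimal sketch of a polling loop that backs off exponentially while requests keep failing. It is an illustration and not our actual client code: the names `next_poll_delay` and `poll`, the 5-second base interval, and the 300-second cap are all assumptions chosen for the example.

```python
import random
import time
from typing import Callable, List


def next_poll_delay(consecutive_failures: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Seconds to wait before the next poll.

    With zero failures this is the normal polling interval (an assumed 5s);
    each consecutive failure doubles it, up to `cap`.
    """
    return min(cap, base * (2 ** consecutive_failures))


def poll(
    fetch: Callable[[], object],
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> List[float]:
    """Call `fetch` `rounds` times, backing off while it keeps failing.

    Returns the delays that were slept so the behavior is easy to inspect.
    """
    failures = 0
    delays: List[float] = []
    for _ in range(rounds):
        try:
            fetch()
            failures = 0  # a success resets the backoff
        except Exception:
            failures += 1
        delay = next_poll_delay(failures)
        # Scale into [0.5 * delay, delay] so many clients do not retry in lockstep.
        delay *= 0.5 + 0.5 * jitter()
        sleep(delay)
        delays.append(delay)
    return delays
```

The point of the design is that a failing endpoint sees less traffic from each client over time, not the same traffic every few seconds. The `sleep` and `jitter` parameters are injectable only so the loop can be exercised without real waiting.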

We've since tuned the analytics cacher and introduced a number of safeguards to make sure this doesn't happen again. In particular, we have either done or are in the process of doing the following:

  1. Added a circuit breaker at the per-route level. (We had circuit breakers for the entire API as well as for the entire API scoped to a given user, but we could have mitigated the pain much quicker here if we had been able to just temporarily disable the entire analytics endpoint.)
  2. Stopped generating analytics live. (This is the really painful one in retrospect, because we abstractly knew that this kind of thing could theoretically happen.)
  3. Introduced exponential backoff across the entire API. (We already have it for some more sensitive routes, such as email creation or subscriber creation, but we'll roll out a baseline for all routes as well.)
  4. Introduced per-route rate limiting across all routes. (We already have it for some more sensitive routes, such as email creation or subscriber creation, but we'll roll out a baseline for all routes as well.)
  5. Decoupled the API from the core subscriber-facing archives. (The impact of this issue briefly bled over into archives as well, purely due to the volume of requests that were failing over, which trampled the ability of our main routers to fulfill new requests, even ones that weren't hitting this problematic code path.)
  6. Started to serve all API requests from a standalone server in order to limit the blast radius for things like this going forward.
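
As an illustration of the first item, a per-route circuit breaker can be sketched roughly as below. This is a simplified, in-memory example under stated assumptions, not our production implementation: the class name `RouteCircuitBreaker`, the thresholds, and the route strings in the usage note are all hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, Dict


class RouteCircuitBreaker:
    """Tracks consecutive failures per route and rejects traffic to a route that keeps failing."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._failures: Dict[str, int] = defaultdict(int)
        self._opened_at: Dict[str, float] = {}

    def allow(self, route: str) -> bool:
        """Return True if requests to `route` should be served right now."""
        opened_at = self._opened_at.get(route)
        if opened_at is None:
            return True
        if self.clock() - opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the breaker and let traffic through again.
            # (A fuller version would use a half-open state with a single probe.)
            del self._opened_at[route]
            self._failures[route] = 0
            return True
        return False

    def record_success(self, route: str) -> None:
        self._failures[route] = 0

    def record_failure(self, route: str) -> None:
        self._failures[route] += 1
        if self._failures[route] >= self.failure_threshold:
            # Trip the breaker for this route only; other routes are unaffected.
            self._opened_at[route] = self.clock()
```

In use, a request handler would call `allow("/analytics")` before doing any expensive work and return an error immediately when it is `False`, while a route such as `"/subscribers"` keeps serving normally. Keeping the state keyed by route is what limits the blast radius to the one misbehaving endpoint.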
