Disjointed Thoughts About Incidents
Hey!
Welcome back to another week of musings. June is here, and we've arrived at the middle of the year. Where has all the time gone?
I hope you had a great weekend. I spent it mostly at home resting and recovering from past weeks of too much work.
Was this forwarded to you? You can subscribe here!
Things I enjoyed in the past week
- "The software industry: annealing, but wrong", a great post by Avery Pennarun (aka apenwarr; cofounder of Tailscale)
- "Guidelines for Respectful Use of AI" by Camille Fournier
- "Knowing about things is cheaper than knowing things" by Hillel Wayne
I've spent the last few weeks knee-deep in an incident that stopped a lot of progress across the org.
As part of that, I've been thinking a lot about incidents, incentives, and layoffs, among other things. Now that we reached a more stable position last Friday, I have a lot of thoughts circling in my mind, so this one will be a smorgasbord of things rather than a conclusion.
Incident Management is Hard
Great incident management is hard and takes time and effort. It requires real action from teams, not only before an incident to create capacity, but also afterward to improve the underlying factors that produced it.
Incentives Are Never Aligned
In large corporations, teams get so split apart that each tends to focus on a very narrow aspect of a function. To achieve a goal, you need to align 3, 4, or 5 teams before starting work, and even then, one of them might step away for a higher-priority project midway through yours.
This is more noticeable during incidents. Some teams might be pulled in, but they will try to get out as fast as possible if they are "not required". Other times their function is so narrow that there's no incentive for them to jump in and help.
Compare that with a startup, where "everything" might be a single person or a small team, so "all hands on deck" is a more literal thing.
Building Relationships Gets You Very Far
In large corporations, you can and need to build relationships during incidents, and even before and after them.
There are official organizational positions: who manages whom. But there are also unofficial positions: the people you actually need to convince to get a thing going.
Building trust helps a lot during an incident; people can help you out if they trust you'll have their back, or trust your decision-making during incident response.
Emotions Run High, Acknowledge Burnout
During incidents, emotions can run high due to the desire to restore the product to a working state. It's easy to use your position to force an action.
This is especially true in large corporations, where people without domain knowledge might be required to be present during triage calls.
Also, burned-out people might be either slow to respond or lash out. Sometimes, demotivated people will feel like the world is crashing down on them. During an incident, it's important to acknowledge the difficult path, but also to focus on the issue at hand and move forward.
Everyone Has The Best Intentions
I've come across people reaching out to help, or telling me what I should be doing, what should have happened, or what actions should be taken.
Sometimes advice is sound; other times it is devoid of context, and we have to learn to recognize this. Part of understanding this is recognizing that not everyone can participate effectively during incidents.
Accountability Doesn't Mean No Help
This is mostly about me. I thought that being accountable for resolving the incident also meant not asking for help.
Other times, during the incident, it's easy to forget to ask for help because deciding alone feels "faster" or "easier". That implicitly creates a bottleneck, as everyone asks you for updates or decisions.
Your turn!
Let me know what you think about incidents or working across teams during incidents by replying to this email!
Happy coding!