I wanted to care about software estimation
I'm the kind of person who hears about a good idea and immediately wants to try it. So when I read about reference class forecasting, and coincidentally heard some managers in my org discussing the problem of software projects going way over their time budgets, I eagerly suggested we try it.
Reference class forecasting is a familiar principle: you should use data to inform your estimate. People tend to underestimate how long a project will take when they're deep in its details. Due to optimism or overconfidence, "everyone thinks they're above average." I know I often underestimate. Reference class forecasting suggests you instead predict a project's delivery time from the actual delivery times of past "similar" projects. In mathy-softwarey lingo: collect a dataset of past projects' estimated and actual completion times, define a probability distribution over that data, and compare your estimate for a new project against the distribution, adjusting it to (hopefully) better match the truth.
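To make the mathy-softwarey version concrete, here's a minimal sketch of what that adjustment could look like in Python. The past projects and week counts are invented for illustration; the point is just that a gut estimate gets scaled by the empirical distribution of historical overruns.

```python
# A minimal sketch of reference-class-style adjustment.
# The historical data below is invented for illustration.
import statistics

# (estimated_weeks, actual_weeks) for past "similar" projects -- hypothetical numbers
past_projects = [
    (6, 9),
    (8, 14),
    (4, 5),
    (10, 18),
    (12, 15),
]

# How much longer did each project take than its estimate?
overrun_ratios = sorted(actual / estimated for estimated, actual in past_projects)

def adjusted_estimate(raw_estimate_weeks: float) -> dict:
    """Scale a gut estimate by percentiles of the historical overrun distribution."""
    return {
        "median": raw_estimate_weeks * statistics.median(overrun_ratios),
        "p90": raw_estimate_weeks * statistics.quantiles(overrun_ratios, n=10)[-1],
    }

print(adjusted_estimate(8))  # an 8-week gut estimate becomes 12 weeks at the median
```

The useful output is arguably the spread rather than a single number: a median and a pessimistic tail, instead of a lone gut estimate.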
"Similar" gives a bit of ambiguity here. How do you decide what class of projects to use to compare against this new project? That can be a problem, but somehow people are decently good at telling whether two projects are similar in scope and complexity. With enough examples of projects like "refactor codebase," and "migrate off deprecated dependency," and "extract business logic into a new service," the correct reference class to pick should be clear without needing to be too rigorous. After all, how many truly novel kinds of software features and projects are there?
This became the topic of an hours-long meeting in which, perhaps betraying my overconfidence and optimism, nothing got done. My suggestion was very concrete: for each sufficiently large project (expected to take at least a quarter), everyone involved would come up with an estimate of when it would be completed. We'd keep the predictions in a spreadsheet or database and, once we had enough, consult it for future estimates.
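For what it's worth, here's roughly the kind of record I had in mind. The field names are my own after-the-fact invention, not anything we agreed on.

```python
# A sketch of one row in the prediction log -- hypothetical fields, not a spec.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ProjectForecast:
    project: str                               # e.g. "migrate off deprecated dependency"
    estimator: str                             # who made the prediction
    predicted_completion: date                 # their estimate at kickoff
    actual_completion: Optional[date] = None   # filled in if/when the project ships

log = [
    ProjectForecast("extract business logic into a new service", "me", date(2021, 3, 31)),
]
```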
It was met with mostly crickets. Some people questioned whether the right metric was completion time (calendar time) or total engineer-months spent. That sort of makes sense, since priorities shift quickly, but it adds the complexity that people would have to track the effort they spend, which is harder than it sounds. Another question: what do you do with a project that gets abandoned? I suggested it would be good to know how often projects are abandoned.
Others seemed opposed to the idea of keeping a record of predictions at all. I might speculate that they were worried about bruised egos, but what they actually said was that they didn't think it was worth the work. Either they thought it wouldn't improve the estimates enough, or that the value of having better estimates was too low.
This surprised me. Why wouldn't better estimates be valuable? Is recording estimates really such a burden? It reminded me of the writing of Cassie Kozyrkov, who taught a series of excellent statistics courses internal to Google. Cassie stresses the importance of determining whether the question you want to answer actually matters. That is, will the decision maker commit to a different course of action if the answer changes? If so, then it's worth devoting the time to form hypotheses, carefully collect data, and do statistical analyses.
But more often than you'd expect, people have already decided what they want to do. Data will not change their minds. They may reject the need for data outright. Or they may appeal to data, but only for inspiration and reinforcement. When you need data to confirm your immutable belief, you can always torture the data enough to get a confession. Or at least, you can rationalize discrepancies away in a puff of smoke. In these situations, you, the person responsible for answering the question, are lucky. You don't need to do any hard work. You can do just what's necessary to placate the decision maker's need for a justification to do whatever it is they were going to do anyway.
In the end, we didn't start tracking project estimates. It seems that "how long will this feature take to deliver" is simply not an important enough question: we decide to build projects no matter how long they take, or give them up when priorities change. So I can provide carefree estimates and be happy that there are no consequences to being wrong.
Wrapped up in this is a familiar lesson. Talk is cheap, and people will often say they value something their actions contradict. More subtly, as a consequence, people like me, with bright-eyed, bushy-tailed enthusiasm and naiveté, grow jaded, lose trust, and lose motivation. I'm more willing to lean on institutional bloat, and to reply to "this is valuable" with a masked form of "prove it with your actions." I have to spend more effort reading between the lines and engaging in politics, and less time building useful software. Even if I had nothing else to do, I'd rather clean up old tech debt, which is more valuable than pretending to care about software estimation.
But then, I'm still interested in whether reference class forecasting works well for software! Do you have experience with reference class forecasting for software projects? If so, how did it go? Or does your experience line up with mine: people say they want better estimates, but their actions suggest it's not valuable enough to put in the work?