Evidence Based Scheduling - 1
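A change of pace for the next couple of days; I'll pick OKR back up the day after tomorrow.
About this article: I had only ever read a translated version, and today was the first time I sat down and read the original closely. To my surprise, the content differs quite a lot from what I remembered reading, so the time I spent looking up vocabulary was well worth it.
Joel's tone is always humorous, so I'm keeping the original text for readers comfortable in English to enjoy his writing, with my own rendering after each passage.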
Evidence Based Scheduling – Joel on Software
Software developers don’t really like to make schedules. Usually, they try to get away without one. “It’ll be done when it’s done!” they say, expecting that such a brave, funny zinger[^2] will reduce their boss to a fit of giggles[^1], and in the ensuing joviality[^3], the schedule will be forgotten.
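Software developers don't much like making schedules, and they usually try to get away without one. Asked by the boss, they answer, "It'll be done when it's done!", hoping that such a bold, funny one-liner will get the boss laughing and that the schedule will be forgotten in the merriment.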
Most of the schedules you do see are halfhearted attempts. They’re stored on a file share somewhere and completely forgotten. When these teams ship, two years late, that weird guy with the file cabinet in his office brings the old printout to the post mortem[^4], and everyone has a good laugh. “Hey look! We allowed two weeks for rewriting from scratch in Ruby!”
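Most of the schedules you do see are halfhearted efforts. They sit on some shared drive, completely forgotten. When the team finally ships two years late, the oddball with the filing cabinet in his office brings the old printout to the post mortem, and everyone laughs: "Hey, look! We budgeted two weeks to rewrite everything from scratch in Ruby!"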
Hilarious[^5]! If you’re still in business.
Hilarious! If you're still in business, that is.
You want to be spending your time on things that get the most bang for the buck[^6]. And you can’t figure out how much buck your bang is going to cost without knowing how long it’s going to take. When you have to decide between the “animated paperclip” feature and the “more financial functions” feature, you really need to know how much time each will take.
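You want your time to go where it pays off most, and you can't know what a feature costs without knowing how long it will take. So when you have to choose between the "animated paperclip" feature and the "more financial functions" feature, you really need to know how much time each one needs.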
Why won’t developers make schedules? Two reasons. One: it’s a pain in the butt[^7]. Two: nobody believes the schedule is realistic. Why go to all the trouble of working on a schedule if it’s not going to be right?
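Why won't developers make schedules? Two reasons. First, it's a real hassle. Second, nobody believes the schedule is realistic. If it isn't going to be right anyway, why go to the trouble?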
Over the last year or so at Fog Creek we’ve been developing a system that’s so easy even our grouchiest[^8] developers are willing to go along with it. And as far as we can tell, it produces extremely reliable schedules. It’s called Evidence-Based Scheduling, or EBS. You gather evidence, mostly from historical timesheet data, that you feed back into your schedules. What you get is not just one ship date: you get a confidence distribution curve, showing the probability that you will ship on any given date. It looks like this:
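Over the past year or so at Fog Creek, we have been developing a system so easy that even our grumpiest developers are willing to go along with it. As far as we can tell, it produces extremely reliable schedules. It's called Evidence-Based Scheduling, or EBS. You gather evidence, mostly from historical timesheet data, and feed it back into your schedules. What comes out is not a single ship date but a confidence distribution curve showing the probability of shipping on any given date, like this: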
The steeper the curve, the more confident you are that the ship date is real.
Here’s how you do it.
A steeper curve means a ship date you can trust more.
The steps below walk through how to use the system.
1) Break ‘er down
When I see a schedule measured in days, or even weeks, I know it’s not going to work. You have to break your schedule into very small tasks that can be measured in hours. Nothing longer than 16 hours.
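1) Break every work item down properly
When I see a schedule measured in days, let alone weeks, I know it isn't going to work. You have to break the schedule into tasks small enough to estimate in hours, and no task may exceed 16 hours.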
This forces you to actually figure out what you are going to do. Write subroutine foo. Create this dialog box. Parse the Fizzbott file. Individual development tasks are easy to estimate, because you’ve written subroutines, created dialogs, and parsed files before.
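This forces you to work out what you are actually going to do: write subroutine foo, create a dialog box, parse the Fizzbott file. Tasks like these are easy to estimate because you have written subroutines, created dialogs, and parsed files before.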
If you are sloppy[^9], and pick big three-week tasks (e.g., “Implement Ajax photo editor”), then you haven’t thought about what you are going to do. In detail. Step by step. And when you haven’t thought about what you’re going to do, you can’t know how long it will take.
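If you are sloppy and pick a big three-week task (say, "Implement Ajax photo editor"), you haven't thought about what you are going to do, in detail, step by step. And if you haven't thought that through, you can't know how long it will take.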
Setting a 16-hour maximum forces you to design the damn feature. If you have a hand-wavy[^10] three week feature called “Ajax photo editor” without a detailed design, I’m sorry to be the one to break it to you but you are officially doomed. You never thought about the steps it’s going to take and you’re sure to be forgetting a lot of them.
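A 16-hour ceiling forces you to actually design the feature. If all you have is a hand-wavy three-week item called "Ajax photo editor" with no detailed design, you are doomed: you never thought through the steps it will take, so you are certain to be forgetting many of them.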
2) Track elapsed time
It’s hard to get individual estimates exactly right. How do you account for interruptions, unpredictable bugs, status meetings, and the semiannual Windows Tithe Day when you have to reinstall everything from scratch on your main development box? Heck, even without all that stuff, how can you tell exactly how long it’s going to take to implement a given subroutine?
You can’t, really.
So, keep timesheets. Keep track of how long you spend working on each task. Then you can go back and see how long things took relative to the estimate. For each developer, you’ll be collecting data like this:
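2) Tracking the time spent
Getting each individual estimate exactly right is hard. How do you account for interruptions, unexpected bugs, status meetings, or the twice-yearly "Windows Tithe Day" when you have to reinstall everything on your main development machine? Even leaving all that aside, how could you say exactly how long a given subroutine will take to implement?
You can't, really.
So keep timesheets. Record how long you actually spend on each task, and you can go back and compare it with the original estimate. For each developer, you will collect data like this: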
Each point on the chart is one completed task, with the estimate and actual times for that task. When you divide estimate by actual, you get velocity: how fast the task was done relative to estimate. Over time, for each developer, you’ll collect a history of velocities.
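Each point on the chart is one completed task, with its estimated and actual time. Dividing estimate by actual gives the velocity: how fast the task was done relative to the estimate. Over time, you build up a velocity history for each developer.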
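The velocity calculation is simple enough to sketch in a few lines of Python. This is only an illustration of "estimate divided by actual"; the `CompletedTask` record and the function names are my own assumptions, not part of Joel's system.

```python
from dataclasses import dataclass


@dataclass
class CompletedTask:
    """One finished task from a timesheet (hypothetical record layout)."""
    name: str
    estimate_hours: float
    actual_hours: float


def velocity(task: CompletedTask) -> float:
    """Velocity = estimate / actual. Below 1.0 means the task overran."""
    if task.actual_hours <= 0:
        raise ValueError("actual_hours must be positive")
    return task.estimate_hours / task.actual_hours


def velocity_history(tasks: list[CompletedTask]) -> list[float]:
    """Velocities in the order the tasks were recorded."""
    return [velocity(t) for t in tasks]


# A task estimated at 4 hours that really took 8 has velocity 0.5.
timesheet = [
    CompletedTask("write subroutine foo", 4, 8),
    CompletedTask("create dialog box", 8, 8),
]
print(velocity_history(timesheet))  # [0.5, 1.0]
```

A velocity of 1.0 means the estimate was exactly right; anything below 1.0 means the task took longer than estimated.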
The mythical perfect estimator, who exists only in your imagination, always gets every estimate exactly right. So their velocity history is {1, 1, 1, 1, 1, …}
A typical bad estimator has velocities all over the map, for example {0.1, 0.5, 1.7, 0.2, 1.2, 0.9, 13.0}
Most estimators get the scale wrong but the relative estimates right. Everything takes longer than expected, because the estimate didn’t account for bug fixing, committee meetings, coffee breaks, and that crazy boss who interrupts all the time. This common estimator has very consistent velocities, but they’re below 1.0. For example, {0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6}
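The mythical perfect estimator, who exists only in your imagination, always estimates exactly right, so the velocity history is {1, 1, 1, 1, 1, …}
A typical bad estimator's velocities are scattered all over the place, for example {0.1, 0.5, 1.7, 0.2, 1.2, 0.9, 13.0}
Most estimators get the scale wrong but the relative sizes right. Everything takes longer than expected because the estimate left out bug fixing, meetings, coffee breaks, and the boss who keeps interrupting. Their velocities are very consistent but sit below 1.0, which means tasks take roughly 1.4 to 2 times as long as estimated, for example {0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6}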
As estimators gain more experience, their estimating skills improve. So throw away any velocities older than, say, six months.
If you have a new estimator on your team, who doesn’t have a track record, assume the worst: give them a fake history with a wide range of velocities, until they’ve finished a half-dozen real tasks.
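As estimators gain experience, their estimates get better, so discard any velocities older than, say, six months.
If a new estimator joins the team with no track record, assume the worst: give them a made-up history with a wide spread of velocities until they have finished about half a dozen real tasks.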
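These two rules can be sketched as a small filter. Everything here is an assumption made for illustration: the record layout, the 182-day cutoff standing in for "six months", and the reuse of the bad estimator's velocities as the made-up pessimistic history. The article does not say whether the half-dozen count should include older tasks; this sketch counts only recent ones.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=182)   # assumed stand-in for "six months"
MIN_REAL_TASKS = 6              # "a half-dozen real tasks"
# Assumed placeholder: the wide-ranging history of the "bad estimator".
PESSIMISTIC_HISTORY = [0.1, 0.5, 1.7, 0.2, 1.2, 0.9, 13.0]


def usable_velocities(records: list[tuple[date, float]], today: date) -> list[float]:
    """records holds (date finished, velocity) pairs for one developer.

    Drops velocities older than MAX_AGE, and falls back to the pessimistic
    history while the developer has too few recent tasks on record.
    """
    recent = [v for finished, v in records if today - finished <= MAX_AGE]
    if len(recent) < MIN_REAL_TASKS:
        return list(PESSIMISTIC_HISTORY)
    return recent
```

Returning a copy of the placeholder list keeps callers from accidentally modifying the shared default.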
3) Simulate the future
Rather than just adding up estimates to get a single ship date, which sounds right but gives you a profoundly wrong result, you’re going to use the Monte Carlo method to simulate many possible futures. In a Monte Carlo simulation, you can create 100 possible scenarios for the future. Each of these possible futures has 1% probability, so you can make a chart of the probability that you will ship by any given date.
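3) Simulating the future
Simply adding up the estimates to get a single ship date sounds right but gives a profoundly wrong answer. Instead, use the Monte Carlo method to simulate many possible futures. A Monte Carlo simulation can generate 100 possible scenarios, each with a 1% probability, so you can chart the probability of shipping by any given date.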
While calculating each possible future for a given developer, you’re going to divide each task’s estimate by a randomly-selected velocity from that developer’s historical velocities, which we’ve been gathering in step 2. Here’s one sample future:
To compute one possible future for a developer, divide each task's estimate by a velocity picked at random from that developer's history, the one gathered in step 2. The table below shows one such future:
| Estimate | 4 | 8 | 2 | 8 | 16 | |
| --- | --- | --- | --- | --- | --- | --- |
| Random Velocity | 0.6 | 0.5 | 0.6 | 0.6 | 0.5 | Total: |
| E/V | 6.7 | 16 | 3.3 | 13.3 | 32 | 71.3 |
Do that 100 times; each total has 1% probability, and now you can figure out the probability that you will ship on any given date.
Do this 100 times. Each total carries a 1% probability, which gives you the probability of shipping on any given date.
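A minimal sketch of this simulation for a single developer, in hours only, might look like the following. The function names are my own, and the article does not specify how the random velocity is drawn; here I assume a uniform pick from the history with `random.choice`.

```python
import random


def simulate_totals(estimates, velocities, rounds=100, rng=None):
    """Run `rounds` simulated futures and return the total hours of each, sorted.

    In every round, each task's estimate is divided by a velocity drawn
    at random from the developer's history.
    """
    if rng is None:
        rng = random.Random()
    totals = []
    for _ in range(rounds):
        total = sum(estimate / rng.choice(velocities) for estimate in estimates)
        totals.append(total)
    return sorted(totals)


def probability_done_within(totals, hours):
    """Fraction of simulated futures that finish within `hours`."""
    return sum(1 for t in totals if t <= hours) / len(totals)


# The five tasks from the sample future, with the common estimator's history.
estimates = [4, 8, 2, 8, 16]
history = [0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6]
totals = simulate_totals(estimates, history)
print(f"chance of finishing within 65 hours: {probability_done_within(totals, 65):.0%}")
```

With 100 rounds, each simulated total counts for 1%, so the fraction of totals at or below a given number of hours is the confidence of finishing by then.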
Now watch what happens:
In the case of the mythical perfect estimator, all velocities are 1. Dividing by a velocity which is always 1 has no effect. Thus, all rounds of the simulation give the same ship date, and that ship date has 100% probability. Just like in the fairy tales!
The bad estimator’s velocities are all over the map. 0.1 and 13.0 are just as likely. Each round of the simulation is going to produce a very different result, because when you divide by random velocities you get very different numbers each time. The probability distribution curve you get will be very shallow[^11], showing an equal chance of shipping tomorrow or in the far future. That’s still useful information to get, by the way: it tells you that you shouldn’t have confidence in the predicted ship dates.
The common estimator has a lot of velocities that are pretty close to each other, for example, {0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6}. When you divide by these velocities you increase the amount of time something takes, so in one iteration, an 8-hour task might take 13 hours; in another it might take 15 hours. That compensates for the estimator’s perpetual optimism. And it compensates precisely, based exactly on this developer’s actual, proven, historical optimism. And since all the historical velocities are pretty close, hovering around 0.6, when you run each round of the simulation, you’ll get pretty similar numbers, so you’ll wind up with[^12] a narrow range of possible ship dates.
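Now look at what happens in each case:
For the perfect estimator, every velocity is 1, and dividing by 1 changes nothing. Every round of the simulation yields the same ship date, with 100% probability. A fairy tale, in other words.
For the bad estimator, the velocities are all over the map, with 0.1 and 13.0 equally likely. Every round produces a very different result, so the probability curve comes out very shallow: shipping tomorrow and shipping in the distant future look equally likely. That is still useful to know, because it tells you not to trust the predicted ship dates.
For the common estimator, the velocities cluster closely, for example {0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6}. Dividing by them stretches each estimate: an 8-hour task might become 13 hours in one round and 15 in another. That compensates for the estimator's habitual optimism, and it does so precisely, based on this developer's own proven history. Because the velocities all hover around 0.6, each round gives similar numbers, and you end up with a narrow range of possible ship dates.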
In each round of the Monte Carlo simulation, of course, you have to convert the hourly data to calendar data, which means you have to take into account[^14] each developer’s work schedule, vacations, holidays, etc. And then you have to see, for each round, which developer is finishing last, because that’s when the whole team will be done. These calculations are painstaking[^13], but luckily, painstaking is what computers are good at.
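In every round of the Monte Carlo simulation, the hours have to be converted into calendar dates, which means taking each developer's work schedule, vacations, and holidays into account. Then, for each round, you have to see which developer finishes last, because that is when the whole team is done. The calculation is painstaking, but painstaking work is exactly what computers are good at.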
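The calendar step for one round could be sketched like this. It is a deliberately simplified model resting on my own assumptions: a fixed number of working hours per day, Monday-to-Friday weeks, and a per-developer set of days off. The function names are hypothetical as well.

```python
from datetime import date, timedelta


def finish_date(start, hours, hours_per_day=8.0, days_off=frozenset()):
    """Walk forward from `start`, spending `hours` on working days only.

    Weekends and any date in `days_off` (vacations, holidays) are skipped.
    """
    if hours_per_day <= 0:
        raise ValueError("hours_per_day must be positive")
    day = start
    remaining = hours
    while True:
        is_working_day = day.weekday() < 5 and day not in days_off
        if is_working_day:
            remaining -= hours_per_day
            if remaining <= 0:
                return day
        day += timedelta(days=1)


def team_ship_date(start, simulated_hours, days_off=None):
    """The team is done when the last developer finishes.

    `simulated_hours` maps developer name to that round's simulated total hours;
    `days_off` optionally maps developer name to a set of dates.
    """
    days_off = days_off or {}
    return max(
        finish_date(start, hours, days_off=days_off.get(dev, frozenset()))
        for dev, hours in simulated_hours.items()
    )


# One round: Bob's 71.3 simulated hours decide the ship date.
print(team_ship_date(date(2024, 1, 1), {"alice": 16, "bob": 71.3}))  # 2024-01-11
```

Running this once per simulation round and collecting the resulting dates gives the calendar version of the confidence curve.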