How many times a day do you estimate the time and effort required to complete a project? I bet it is a lot more than you think. Did you recently adjust your alarm clock? Before you did that, I expect that you, perhaps subconsciously, did the following:
- Set a deadline for yourself: Be at the office by 8:30.
- Set requirements: Must shower, eat breakfast, brush teeth, dress for success, arrive safely.
- Set stretch goals: Start a load of laundry before leaving, mail the RSVP to Aunt Petunia’s 75th birthday party on the way.
- Budgeted time for anomalies: The car is low on gas today, so plan to fill up; expect to hit the snooze button between 1 and 3 times.
All these things get tallied up in your head and you set your bedside alarm clock for 6:45.
There is a lot of thinking and planning behind this, but you don’t have to do it every single time you touch your clock. Years of experience have gone into the estimation process. You may have driven to the Parkway Office 5 days a week for the last 3 years. You know that if you leave at 8:05, you will get there just before 8:30. As an adult, you have showered, eaten breakfast, brushed, and dressed thousands of times. You know about how long each of these normally takes and where you can save time if you are running behind schedule. All of this makes you an expert at estimating how long it takes to go from asleep to the office on a normal Wednesday. By 8:30, or usually earlier, you will know with absolute certainty whether you made the right decision setting the alarm for 6:45.
Software engineers aren’t so lucky. They may make only a few estimates a month, so it takes a whole career to get good at it, if they ever do. In addition, an engineer is often asked to estimate a project that takes thousands of hours or, at the extreme, one that takes hundreds of people a couple of years to complete. Your alarm clock project had a staff of 1 and an estimated duration of 1 hour 45 minutes.
When a project is over, software engineers very rarely have enough data to know whether they were right. Sure, you know if the project shipped on the target date, but the project that ships is never exactly the one that was estimated. Customers can ask for requirement changes, they may drop some components to get the project done before a competitor ships, or a team member may find an affordable off-the-shelf component that eliminates 1000 man-hours budgeted into the estimate. With so many changes, it’s hard to tell in the end whether the estimate was any good.
That’s for the big projects. For small pieces of work, you actually can get good at estimating software development tasks. At least sort of good. To find out whether you are good, though, you need to collect metrics. I’m a big fan of metrics personally. Lord Kelvin was too. He said “If you can not measure it, you can not improve it.” How else can you tell whether the changes you make are helping or hurting? Lots of things feel right, but when you measure them, you find out they weren’t going quite as well as you thought. Many people see this phenomenon in their personal budget. Buying coffee at Starbucks every day feels like a very good idea and can’t possibly cost very much. Then you add up the numbers and find you are paying $1560 a year for Starbucks coffee. Armed with that number, you can make an informed decision and improve your use of money. Maybe you would be happier drinking 7‑Eleven coffee and taking a 5-day cruise to Nova Scotia next June. That’s a personal decision only you can make, but without the hard numbers it is difficult to realize the two choices are monetarily equivalent.
The NITRC project that I work on does in fact gather metrics on our small-scale estimation. During the planning stage of an iteration, every task is broken down into small pieces we call Feature Requests, or FRs for short. Each FR is then given an estimated level of effort, usually somewhere between 8 and 64 hours. When an FR is completed, the engineer who worked on it records the actual number of hours used. Periodically, our project manager tallies up the estimates and actuals to see how close our estimates really are. So far, this looks like a great system for a metrics-oriented engineer.
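The tally itself is simple arithmetic: sum the estimates, sum the recorded actuals, and compare. Here is a minimal Python sketch of that comparison. The FR numbers are made-up placeholders, not real NITRC data, and `tally` is a hypothetical helper, not our project manager’s actual tool.

```python
# Hypothetical sample of completed FRs: (estimated hours, recorded actual hours).
# These numbers are invented for illustration; they are not NITRC data.
completed_frs = [(16, 16), (32, 16), (8, 8), (16, 8), (64, 64)]


def tally(frs):
    """Return (total estimate, total actual, percent the actuals fall under)."""
    total_estimate = sum(est for est, _ in frs)
    total_actual = sum(act for _, act in frs)
    # Positive means the work came in under the estimate.
    percent_under = 100 * (total_estimate - total_actual) / total_estimate
    return total_estimate, total_actual, percent_under


est, act, under = tally(completed_frs)
print(f"estimated {est} h, recorded {act} h, {under:.1f}% under")
# prints: estimated 136 h, recorded 112 h, 17.6% under
```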
Here is where things got surprising for me: our estimates tended to be too conservative. If we estimated 200 hours for a batch of FRs, the actuals would consistently come in under 200 hours. Of course, that is good. It’s better to be under than over, at least most of the time. I know the guy who does the estimates pretty well, and I know he is good at his job. Given the great feedback he has, I was surprised he hadn’t been able to adjust his estimates and close the consistent percentage gap we were seeing.
The nuts and bolts of our metric gathering are a little unusual, though, and that got me thinking there may be something fishy going on with the setup. Here is how it works. For the estimates, we categorize things as:
- Easy, 8 hours
- Somewhat easy, 16 hours
- Average, 32 hours
- Difficult, 64 hours
- Very Difficult, 160 hours
Not a lot of granularity, but most of our FRs fall in the 8/16/32 hour range, so generally you can find an estimate you feel good about. The actuals use the same category scheme: when you complete an FR, you round your hours to the nearest entry and record it from a drop-down list. The one difference is that the actuals list also has entries for 2 hours and 4 hours for the really quick tasks.
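In code, recording an actual is just nearest-value rounding against the drop-down entries. This Python sketch uses hypothetical names (`ESTIMATE_BUCKETS`, `record_actual`) and assumes an exact tie, such as 12 hours sitting halfway between 8 and 16, goes to the smaller entry; the post’s system leaves that choice to the engineer.

```python
# Estimate categories from the post (hours).
ESTIMATE_BUCKETS = {
    "Easy": 8,
    "Somewhat easy": 16,
    "Average": 32,
    "Difficult": 64,
    "Very Difficult": 160,
}

# Actuals use the same values plus 2 and 4 hours for really quick tasks.
ACTUAL_BUCKETS = sorted({2, 4, *ESTIMATE_BUCKETS.values()})


def record_actual(hours_worked):
    """Round hours worked to the nearest drop-down entry.

    Assumption: because the list is sorted ascending and min() keeps the
    first minimum it finds, an exact tie goes to the smaller entry.
    """
    return min(ACTUAL_BUCKETS, key=lambda b: abs(b - hours_worked))


print(record_actual(21), record_actual(11))
# prints: 16 8
```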
After plugging in a few of my actuals, I realized that I could go over the estimate quite a bit without a penalty. If an FR has an estimate of 16 hours and I go 5 hours over, the 21 hours I spent are still closer to 16 than to 32, the next step up, so I get to mark the actual effort as right on the estimate. But if I go 5 hours under, the 11 hours round down to the 8 hour mark. Everyone cheers because the task is done early. Everyone except the poor technical lead, who is left scratching his head saying, “Why am I always budgeting more hours than we need?”
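The asymmetry falls out of where the rounding boundaries sit. Each bucket captures everything up to the midpoint with its neighbors, so wherever the bucket sizes double (4 through 32 hours here), the window above a bucket is twice as wide as the window below it. A small Python sketch, assuming the lowest bucket’s window starts at 0 hours and using a hypothetical `bucket_windows` helper:

```python
BUCKETS = [2, 4, 8, 16, 32, 64, 160]


def bucket_windows(buckets):
    """Map each bucket to the (low, high) range of true hours that round to it.

    Boundaries are midpoints between neighboring buckets. Assumption: the
    smallest bucket's window starts at 0 and the largest has no upper limit.
    """
    windows = {}
    for i, b in enumerate(buckets):
        low = (buckets[i - 1] + b) / 2 if i > 0 else 0.0
        high = (b + buckets[i + 1]) / 2 if i + 1 < len(buckets) else float("inf")
        windows[b] = (low, high)
    return windows


# Show how much slack each bucket allows below and above its own value.
for bucket, (low, high) in bucket_windows(BUCKETS).items():
    print(f"{bucket:>4} h bucket: {bucket - low:g} h under, {high - bucket:g} h over")
```

For the 16-hour bucket this gives a window of 12 to 24 hours: only 4 hours of slack under the estimate but 8 hours over it.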
To make sure my intuition was right, I whipped up a quick Python script to check it out. Sure, a decent statistician could probably prove it using the power of math, but I’m an engineer, so I ran a simulation instead. Here is a graph of the results:
The vertical axis is the percentage by which the reported actuals for the simulated batch of FRs fall below the estimate. Keep in mind that the total actual time simulated is pretty much right on the estimate. Because the reported actuals have to fall into specific buckets, there is a gap between what is reported and what really happened.
The horizontal axis is the value of the deviation I used for my simulated engineers. Basically, it represents how far off the estimates they would go. At a deviation of 0, every task is completed in exactly the estimated time. With a deviation of 0.1, most of the tasks are completed within 10% of the estimate. (I used a normal distribution, so roughly two-thirds, about 68%, of the tasks would be completed within 10% of the estimate.) Some will be under and some over, but the average will still be right on the estimate. The higher the deviation, the more randomness I am throwing into the system. The estimator is still right on average, but as the deviation grows, there are more outliers that take significantly longer or shorter than the time estimated.
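My original script isn’t reproduced here, but a simulation along these lines fits in a few dozen lines of Python. Treat this as an assumption-laden reconstruction, not the script behind the graph: each task’s true effort is `estimate * (1 + e)` with `e` drawn from a normal distribution (mean 0, standard deviation `deviation`), the true effort is rounded to the nearest actuals entry, and the batch mix is a hypothetical one weighted toward small FRs. The names `round_to_bucket` and `simulate_gap` are illustrative.

```python
import random

# Actuals drop-down entries from the post (hours).
ACTUAL_BUCKETS = [2, 4, 8, 16, 32, 64, 160]


def round_to_bucket(hours):
    """Nearest drop-down entry; ties go to the smaller entry (an assumption)."""
    return min(ACTUAL_BUCKETS, key=lambda b: abs(b - hours))


def simulate_gap(estimates, deviation, seed=0):
    """Percent by which reported actuals fall below the total estimate.

    True effort for each task is estimate * (1 + e), e ~ Normal(0, deviation),
    so the estimator is right on average. A normal draw can occasionally go
    below -1 (negative hours); those tasks simply round to the 2-hour entry.
    """
    rng = random.Random(seed)  # seeded for repeatable runs
    total_estimate = sum(estimates)
    total_reported = 0
    for estimate in estimates:
        true_hours = estimate * (1 + rng.gauss(0, deviation))
        total_reported += round_to_bucket(true_hours)
    return 100 * (total_estimate - total_reported) / total_estimate


if __name__ == "__main__":
    # Hypothetical batch weighted toward small FRs; not real project data.
    batch = [8, 16, 16, 32] * 5000
    for deviation in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        gap = simulate_gap(batch, deviation)
        print(f"deviation {deviation:.1f}: reported {gap:.1f}% below estimate")
```

The exact percentages depend heavily on the mix of estimates in the batch and on the tie-breaking rule, so this sketch shouldn’t be expected to reproduce the graph’s numbers; it only shows the mechanism.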
I think this graph does confirm my intuition. Even though the estimates are right on the money, the reported actuals tend to be below the estimates because of the quirks in the reporting system. The effect isn’t that big, though. Even at a really high standard deviation of 50%, the quirks in the system account for only a 7% difference. That’s not really enough to back up my original hypothesis. Sure, it contributes to the difference between estimates and reality, but not by much.
The system has an interesting mathematical quirk, but it probably doesn’t have a huge effect in real life. I’ll have to look at some other metrics to see what else can be done to improve our process. That’s the way science works. Make an educated guess. Test it out. Right or wrong, you still learn something. Lord Kelvin would be happy.