Today, I completed and deployed a minor release of TaskEstimation. Attendees at my express session on estimation at PHP Quebec earlier this month got to see the very first addition since the original release: the graphical representation. It seems that for most people without much statistics in their background, and most people who forgot it, R^{2} is not too meaningful. However, a visual representation of the dots and the regression line makes everything more obvious. The dots simply have to be close to the line.

I made this addition during the conference the day before I had to present. I must say I am really impressed by the possible output of eZ Components Graph. Hacking around in the code to produce a scatter plot was not too much trouble, but it did remind me how hard generating good graphs is. I have done it before and going back to my own code did not seem so attractive. Looking at the internals, I could see the same problems were faced: rendering graphics to multiple output formats without tangling the code is just hard.

I then remembered a voice in the back of my head. It was Steve McConnell repeating that “simplistic single-point estimates are meaningless because they don’t include any indication of the probability associated with the single point.” I knew I had to address this eventually; I just had to wrap my head around the maths involved and do it. In addition to the forecast, the regression tool now displays the confidence range for the desired probability. The range is also drawn on the graph.

It turns out Wikipedia was the most accurate source of information I had available at the time. It can be hard to find a basic formula when you don’t have a textbook nearby.

Looking scary? It’s really not that bad, except that a few details are missing and I suspect there is an error in the first one. I’d have to check more sources, but alpha alone does not make sense and using the alpha with the hat does look much more accurate graphically.

The major missing detail was how to calculate the t_{alpha/2,n-2} term. I wasn’t the only one searching for it: someone added a link to a table of values on Wikipedia while I was reading. After searching a while longer for a formula, because I don’t like hard-coded numbers, I settled for including the table itself. Those numbers are quite hard to obtain otherwise, and that much accuracy is not required anyway.
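To illustrate the table approach, here is a minimal Python sketch of a lookup. It is not TaskEstimation's code: the names `T_TABLE` and `t_critical` are hypothetical, and the values are the usual two-sided critical values from a textbook Student's t table, truncated to a few rows. When the exact degrees of freedom are not tabulated, the sketch falls back to the nearest smaller row, which gives a slightly wider and therefore more cautious range.

```python
import bisect

# Two-sided critical values of Student's t (standard textbook table, 3 decimals).
# Outer key: confidence level. Inner key: degrees of freedom (n - 2 for a
# simple linear regression). Only a handful of rows are included here.
T_TABLE = {
    0.90: {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943,
           7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812, 20: 1.725, 30: 1.697},
    0.95: {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
           7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 20: 2.086, 30: 2.042},
}


def t_critical(confidence, df):
    """Look up t_{alpha/2, df} for a tabulated confidence level.

    If df is not in the table, use the largest tabulated df below it.
    t shrinks as df grows, so this choice never understates the range.
    """
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    row = T_TABLE[confidence]
    tabulated = sorted(row)
    index = bisect.bisect_right(tabulated, df) - 1
    return row[tabulated[index]]
```

With ten data points there are eight degrees of freedom, so `t_critical(0.95, 8)` returns the tabulated 2.306. A request for 15 degrees of freedom falls back to the row for 10.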

TaskEstimation now provides nice confidence ranges to accompany the estimate, making the expected accuracy more obvious and taking into account that extrapolating your data beyond known values has more uncertainty.
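For readers who want to see why the range widens away from known values, here is a rough Python sketch of the standard textbook prediction interval for a simple linear regression. It is an illustration under my own assumptions, not the code TaskEstimation runs: the function name `prediction_interval` is hypothetical, and the t value is passed in by the caller (for example, read from a table with n - 2 degrees of freedom).

```python
import math


def prediction_interval(xs, ys, x0, t_value):
    """Return (low, forecast, high) for a new observation at x0.

    Uses the textbook formula
        forecast +/- t * s * sqrt(1 + 1/n + (x0 - mean_x)^2 / Sxx)
    where s is the residual standard error with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three (x, y) pairs")

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

    # Least-squares regression line.
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual standard error.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    forecast = intercept + slope * x0
    # The (x0 - x_mean)^2 term grows as x0 moves away from the data,
    # which is what makes extrapolated estimates less certain.
    half_width = t_value * s * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)
    return forecast - half_width, forecast, forecast + half_width
```

Calling it once with `x0` near the mean of the data and once well beyond the largest known value shows the second range coming out wider, even though the regression line and the t value are the same.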