Bad numbers

The most frequently quoted numbers in software engineering are probably those of the Standish Chaos reports, starting in 1995. To cut a long story short, they are the ones claiming that only 16% of software projects succeed. Personally, I never really believed that. If it were that bad, I wouldn’t be making a living from writing software today, 15 years later. The industry would have been shut down long before they even published the report. As I was reading Practical Software Estimation, which had the mandatory citation of the above, I came to ask myself why there was such a difference. I don’t have numbers on this, but I would be very surprised to hear any vendor claim less than a 90% success rate on projects. Seriously, with 16% odds of winning, you’re better off betting your company’s future on casino tables than on software projects.

The big question here is: what is a successful project? On what basis do they claim such a high failure rate?

Well, I probably don’t have the same definition of success and failure. I don’t consider a project a failure just because it’s a little behind schedule or a little over budget. From an economic standpoint, as long as it’s profitable and beats the opportunity cost, it was worth doing. Sometimes projects are canceled due to external factors. Even though we’d like to see the project development cycle as a black box in which you throw a fixed amount of money and get a product at the end, that’s not the way it is. The world keeps moving, and if something comes out and makes your project obsolete, canceling it and changing direction is the right thing to do. Starting the project was also the right thing to do based on the information available at the time. Sure, some money was lost, but if there are now cheaper ways to achieve the objective, you’re still winning. As Kent Beck reminds us, sunk costs are a trap.

When a project gets canceled, the executives might be quite happy because they see the course change. The people who spent evenings on it might not. Perspective matters when evaluating success or failure. However, when after-the-fact numbers are your only measure of success, that perspective gets forgotten. Looking back, it won’t matter that a hockey team put up a good fight in the third period. On the scoreboard, they lost, and the scoreboard is what everyone will look back to.

Luckily, I wasn’t the only one to question what Standish was smoking when they drew those alarming figures. In the latest issue of IEEE Software, I came across a very interesting title: The Rise and Fall of the Chaos Report Figures. In the article, J. Laurenz Eveleens and Chris Verhoef explain how the way the analysis is made inevitably leads to this kind of number. The data used by Standish is not publicly available, so verifying it directly is not possible (how many times do we have to ask for open data?). However, the authors were able to tap into other sources of project data to perform the same kind of analysis and come up with a very clear explanation of the results.

First off, my question was answered. Standish’s definition of success is pretty much delivering under budget relative to the initial estimate, with all requirements met. Failure is being canceled. Everything else goes into the Challenged bucket, including projects completing slightly over budget and projects whose scope changed. Considering that bucket contains half of the projects, questioning the numbers is fairly valid.
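To make that bucketing concrete, here is a minimal sketch of how the definition sorts projects, based only on the description above; the field names and the budget-only check are my own simplification, not Standish’s published methodology.

```python
from dataclasses import dataclass

@dataclass
class Project:
    estimated_cost: float        # initial budget estimate
    actual_cost: float           # what the project really cost
    all_requirements_met: bool
    cancelled: bool

def standish_bucket(p: Project) -> str:
    """Sort a project into the three Standish buckets as described above."""
    if p.cancelled:
        return "Failed"
    if p.actual_cost <= p.estimated_cost and p.all_requirements_met:
        return "Successful"
    # Everything else, from 5% over budget to a tripled budget with
    # changed scope, lands in the same catch-all bucket.
    return "Challenged"

# A project 5% over budget with every requirement delivered still counts
# as Challenged, right next to a project that blew its budget completely.
print(standish_bucket(Project(100, 105, True, False)))   # Challenged
print(standish_bucket(Project(100, 95, True, False)))    # Successful
print(standish_bucket(Project(100, 300, False, True)))   # Failed
```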

I remember a presentation by a QA director (not certain of the exact title) from Tata Consulting at the Montreal Software Process Improvement Network a few years ago in which they explained their quality process. They presented a graph of data collected from project post-mortems, where teams were asked to explain the causes of slippage or other types of failure (it was a long time ago, my memory is not that great), and there was a big column labeled Miscellaneous. At the time, I did not notice anything wrong with it. All survey graphs I had ever seen contained a big miscellaneous section. However, the presenter highlighted the fact that this was unacceptable, as it provided them with absolutely no information. In the next editions of the project survey, they replaced Miscellaneous with Oversight, a word no manager in their right mind would use to describe the cause of their failures. It turns out the subsequent results were more accurate about the causes. When information is unclear, you can’t just accept that it is unclear. You need to dig deeper and ask why.

The authors then explain how they used Barry Boehm’s long-understood Cone of Uncertainty and Tom DeMarco’s Estimation Quality Factor (published in 1982, long before the reports) to identify organizational bias in the estimates, and how aggregating data without considering that bias leads to absolutely nothing of value. As an example, they point out a graph from one organization containing hundreds of estimates, made at various moments in the projects, plotted as the ratio of forecast over actual. The graph is striking: practically no dots appear above the 1.0 line (there are a few very close to it). In other words, the company almost never overestimates a project. However, the cone is clearly visible in the graph and the correlation is very significant. The authors asked the right question: why was that? The company simply, and deliberately, used the lowest possible outcome as its estimate, which leads it to a 6% success rate under the Standish definition.
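As a rough sketch of that kind of check (not the authors’ actual analysis, and with made-up numbers), computing the forecast-to-actual ratio for each estimate immediately shows whether an organization ever overestimates; a cloud of points that never crosses 1.0 points to a deliberate low-balling policy rather than honest 50-50 estimates.

```python
# Sketch of the forecast/actual (f/a) ratio check. An unbiased estimator
# should land above and below 1.0 roughly as often; a shop that always
# quotes the lowest possible outcome never gets above it.
# The numbers below are invented for illustration only.

estimates = [
    (100, 180),   # (forecast, actual) in person-days, hypothetical projects
    (50, 95),
    (200, 260),
    (30, 70),
    (120, 150),
]

ratios = [forecast / actual for forecast, actual in estimates]
below_one = sum(r < 1.0 for r in ratios)

print("f/a ratios:", [round(r, 2) for r in ratios])
print(f"{below_one} of {len(ratios)} forecasts are underestimates")

if below_one == len(ratios):
    print("Systematic underestimation: a biased estimation policy, not bad luck.")
```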

I would be interested to see numbers on how many companies go for this approach rather than providing realistic (50-50) estimates.

Now, imagine companies that pad their estimates being added to the same data set. You get random correlations at best, because you’re comparing apples to oranges.

Actually, the article presents one of those cases of abusive (fraudulent?) padding. The company had used the Standish definition internally to judge performance, and the estimates were padded so badly that they barely reflected reality. Even at 80% completion, some estimates were off by two orders of magnitude. Yes, that means 10,000%. How can any sane decision be made from numbers like those? I have no idea. In fact, with estimates that random, you’re probably better off not wasting time on estimation at all. If anything, this is a great example of dysfunction.
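To illustrate why mixing these populations tells you nothing, here is a tiny simulation with entirely synthetic numbers (my own illustration, not data from the article): one shop systematically low-balls, the other pads heavily, and the aggregate figure describes neither.

```python
import random

random.seed(42)

def lowball(actual: float) -> float:
    """Quote the most optimistic outcome: 40-70% of the real cost."""
    return actual * random.uniform(0.4, 0.7)

def pad(actual: float) -> float:
    """Pad the estimate heavily: 2x to 20x the real cost."""
    return actual * random.uniform(2.0, 20.0)

actual_costs = [random.uniform(50, 500) for _ in range(100)]

# Half the projects come from a low-balling shop, half from a padding one.
mixed_ratios = [lowball(a) / a for a in actual_costs[:50]] + \
               [pad(a) / a for a in actual_costs[50:]]

average = sum(mixed_ratios) / len(mixed_ratios)
print(f"average forecast/actual ratio over the mixed data set: {average:.2f}")
# The single aggregate figure describes neither organization's estimation
# policy; averaging opposite, deliberate biases produces a number nobody
# can act on.
```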

The article concludes like this:

We communicated our findings to the Standish Group, and Chairman Johnson replied: “All data and information in the Chaos reports and all Standish reports should be considered Standish opinion and the reader bears all risk in the use of this opinion.”

We fully support this disclaimer, which to our knowledge was never stated in the Chaos reports.

It covers the general tone of the article. Beyond the entertainment value (yes, it’s the first time I’ve ever associated entertainment with scientific reading) brought by tearing apart the Chaos report, what I found the most interesting was the clear, accessible use of theory to analyze the data. I highly recommend reading it if you are a subscriber to the magazine or have access to the IEEE Digital Library. However, scientific publications sadly remain restricted in access.

One thought on “Bad numbers”

  1. I’d always suspected something like this with time and materials contract development, and it seems to be directly related to the size of the company or organization, although this is a personal conjecture without any readily available external justification. Also seemingly related to this are the project length and headcount. In sharp contrast with this are our contracts for skunk works projects, which have to quickly justify their existence in very clear, simple, direct and immediate measures.

