A Statistical Interlude | The Software Development Edge: Essays on Managing Successful Projects

At this point, I can attempt to figure out the right "scale" for the altitude. I can measure the edges along the base in familiar units:

Scope: Function points or features.
Quality: Inverse of the number of defects allowed.
Speed: Function points or features per month.
Frugality: "Inverse" dollars or person-months.

But what about that pesky probability of success, our altitude?

We know that "longer is better," that a higher altitude corresponds to a higher probability of success. But there is a slight problem with using probability, a percentage-based measurement, as the scale. For example, if I have a pyramid with an altitude corresponding to a 60 percent probability of success, I cannot, under the constant volume assumption, improve that percentage by cutting the area of the base in half in order to double the altitude. That would give us an absurd answer of "120 percent probability of success," and I know that probabilities must be between zero and 100 percent.
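The arithmetic behind the absurd 120 percent is easy to check. The Python sketch below assumes the usual pyramid volume formula V = B·h/3 (base area B, altitude h) and, purely for illustration, a base area of 1 in arbitrary units. The helper name `altitude_for_constant_volume` is hypothetical.

```python
def altitude_for_constant_volume(volume: float, base_area: float) -> float:
    """Altitude h of a pyramid with base area B and volume V, from V = B * h / 3."""
    return 3.0 * volume / base_area

base_area = 1.0   # hypothetical base area, arbitrary units
altitude = 0.60   # altitude read directly as a 60 percent probability of success
volume = base_area * altitude / 3.0  # held fixed by the constant volume assumption

# Halve the base area; constant volume forces the altitude to double.
new_altitude = altitude_for_constant_volume(volume, base_area / 2)
print(f"{new_altitude:.0%}")  # an impossible "probability" above 100 percent
```

Because the volume is proportional to B·h for any pyramid, the doubling does not depend on the shape of the base; only the probability scale is at fault.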

To resolve this conundrum, I must investigate how the outcomes of software development projects are distributed. Can I assume that these project outcomes are distributed according to the standard normal distribution, the well-known "bell curve"? The diagram in Figure 9.4 is worth a thousand words.

Figure 9.4. How software development project outcomes relate to the standard probability bell curve.

If you happen to be rusty on what a probability distribution function is, recall that the x-axis represents the outcome, and the y-axis represents how often that outcome occurs, which, properly normalized, is the probability density at that outcome. If you start from the left edge and sweep out the area under the curve, you measure the cumulative probability of an outcome at or below that point. In Figure 9.4, the percentages beneath the x-axis show how much of the total area lies between the x-axis coordinates that they span.
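"Sweeping out the area" can be made literal. The Python sketch below integrates the standard normal density from a far-left edge up to a point x with the trapezoid rule and compares the result with the cumulative distribution function from the standard library's `statistics.NormalDist`. The left edge of -8 and the step count are arbitrary choices for illustration; the area to the left of -8 is negligible.

```python
from statistics import NormalDist

def swept_area(x: float, left_edge: float = -8.0, steps: int = 10_000) -> float:
    """Approximate the area under the standard normal density from left_edge to x
    using the trapezoid rule."""
    pdf = NormalDist().pdf
    h = (x - left_edge) / steps
    total = 0.5 * (pdf(left_edge) + pdf(x))
    for i in range(1, steps):
        total += pdf(left_edge + i * h)
    return total * h

# The swept area should match the cumulative probability at each point.
for x in (-1.0, 0.0, 1.0):
    print(x, round(swept_area(x), 4), round(NormalDist().cdf(x), 4))
```

The two columns agree to four decimal places, which is the sense in which the area to the left of a point is the cumulative probability.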

Note that the distribution is "normalized" here, with the midpoint called µ, and the "width" of the distribution characterized by the standard deviation sigma (σ). The distribution extends to both plus and minus infinity, but note that the "tails" of the distribution past the plus and minus 3 sigma limits are quite small; the two tails share less than 0.3 percent of the entire area under the curve. The graph tells us that 68 percent of the projects will be either somewhat successful or somewhat unsuccessful, that only about 27.5 percent (95.5 percent minus 68 percent) will be either very successful or very unsuccessful, and that only 4.2 percent (99.7 percent minus 95.5 percent) will be either extremely successful or extremely unsuccessful. To get the relevant percentages for each of these, you can just divide by two, as there is symmetry around the middle. For example, you can predict that around 34 percent, approximately a third, of all projects will be somewhat successful.
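The band percentages quoted from the figure can be recomputed directly. This Python sketch uses the standard library's `statistics.NormalDist` with µ = 0 and σ = 1; the helper name `within` is introduced here only for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

def within(k: float) -> float:
    """Probability that an outcome falls within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

one, two, three = within(1), within(2), within(3)
print(f"within 1 sigma:           {one:.2%}")          # somewhat (un)successful
print(f"between 1 and 2 sigma:    {two - one:.2%}")    # very (un)successful
print(f"between 2 and 3 sigma:    {three - two:.2%}")  # extremely (un)successful
print(f"beyond 3 sigma (both):    {1 - three:.2%}")    # the two tails together
print(f"somewhat successful only: {one / 2:.2%}")      # half of the 1-sigma band, by symmetry
```

Running it prints approximately 68.27%, 27.18%, 4.28%, 0.27%, and 34.13%. The 27.5 and 4.2 in the text come from subtracting the rounded values 68, 95.5, and 99.7, so they differ slightly in the first decimal place.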

In many applications you assume that µ is zero, so the outcomes are centered on zero and extend symmetrically toward minus and plus infinity. But the standard normal distribution is also used to model things that are only positive, such as the heights of people. In that case, µ is shifted to represent the mean value of the distribution. What is the situation for project outcomes?

You can think of the x-axis as the net payoff or reward. Although I am sure many software development projects have had zero or negative payoff, it is hard to conceive of a project having a very large negative payoff.^[3] This is because all projects will be cancelled by management long before they even start to approach minus infinity! So the symmetrical standard normal distribution centered on zero with tails to infinity in both directions seems to be the wrong model. What about the "shifted standard normal"? Here again, there are lots of reasons to suspect that the distribution is skewed, with a very long (if not infinite) positive tail and a shorter, finite tail in the negative direction. For example, the total net payoff on a project such as MS-DOS is very, very large. On the other hand, it is hard to imagine a project not being cancelled long before its net payoff reached the mirror-image negative value! What we'd prefer is a distribution that has a finite limit on negative outcomes.
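To make the objection to the shifted normal concrete, the Python sketch below computes how much probability a normal distribution assigns to outcomes more than a given distance below its mean, and compares it with the mirror-image upper tail. The mean and sigma of 1.0 are hypothetical, in arbitrary payoff units, and `prob_below` and `prob_above` are illustrative helpers built on the standard library's `math.erfc` (used instead of `erf` so that deep-tail values do not round to zero).

```python
import math

def prob_below(floor: float, mu: float, sigma: float) -> float:
    """P(X < floor) for X ~ Normal(mu, sigma), via erfc for accuracy deep in the tail."""
    return 0.5 * math.erfc((mu - floor) / (sigma * math.sqrt(2.0)))

def prob_above(ceiling: float, mu: float, sigma: float) -> float:
    """P(X > ceiling) for X ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc((ceiling - mu) / (sigma * math.sqrt(2.0)))

# Hypothetical parameters: net payoff in arbitrary units, centered on a positive mean.
mu, sigma = 1.0, 1.0
for distance in (3.0, 6.0, 10.0):
    below = prob_below(mu - distance, mu, sigma)  # loss this far below the mean
    above = prob_above(mu + distance, mu, sigma)  # gain this far above the mean
    print(f"{distance:>4} sigma: below={below:.3e} above={above:.3e}")
```

However far below the mean a cancellation floor is placed, the shifted normal still gives a strictly positive probability of falling beneath it, exactly matching the upper tail. That symmetry is what the argument above says real project outcomes lack.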

^[3] Part of this has to do with the finite horizon of a project. If we ship a defective product, the company will suffer huge support costs post-deployment. But these costs are rarely charged back to the project. This is rather unfortunate, because it shifts the burden away from the place that originated the problem. True project cost would include some measure of the post-deployment support cost.