The software industry's estimation track record provides some interesting clues to the nature of software's estimation problems. In recent years, The Standish
Figure 3-2: Project outcomes
What's notable about The Standish Group's data is that it doesn't even have a category for early delivery! The best possible performance is meeting expectations "On Time/On Budget"—and the other options are all downhill from there.
Capers Jones
| Size in Function Points (and Approximate Lines of Code) | Early | On Time | Late | Failed ( |
|---|---|---|---|---|
| 10 FP (1,000 LOC) | 11% | 81% | 6% | 2% |
| 100 FP (10,000 LOC) | 6% | 75% | 12% | 7% |
| 1,000 FP (100,000 LOC) | 1% | 61% | 18% | 20% |
| 10,000 FP (1,000,000 LOC) | <1% | 28% | 24% | 48% |
| 100,000 FP (10,000,000 LOC) | 0% | 14% | 21% | 65% |

Source: Estimating Software Costs (Jones 1998).
As you can see from Jones's data, the larger a project, the less chance the project has of completing on time and the greater chance it has of failing outright.
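The trend in Jones's data can be checked with a few lines of arithmetic. The sketch below transcribes the percentages from the table and, for each project size, adds the early and on-time shares and compares them with the failure share. The names `OUTCOMES`, `summarize`, and `trend_holds` are illustrative only, and the "<1%" cell is recorded as 0.5 purely so that it can be used in a sum; that value is an assumption, not a figure from the source.

```python
# Outcome percentages by project size in function points, transcribed from
# the table above (Jones 1998).
# Assumption: the "<1%" early figure at 10,000 FP is stored as 0.5 so that
# it can take part in arithmetic; the source gives no exact value.
OUTCOMES = {
    10:      {"early": 11.0, "on_time": 81.0, "late": 6.0,  "failed": 2.0},
    100:     {"early": 6.0,  "on_time": 75.0, "late": 12.0, "failed": 7.0},
    1_000:   {"early": 1.0,  "on_time": 61.0, "late": 18.0, "failed": 20.0},
    10_000:  {"early": 0.5,  "on_time": 28.0, "late": 24.0, "failed": 48.0},
    100_000: {"early": 0.0,  "on_time": 14.0, "late": 21.0, "failed": 65.0},
}


def summarize(outcomes):
    """Return (size, early + on-time %, failed %) tuples, smallest size first."""
    return [
        (size, row["early"] + row["on_time"], row["failed"])
        for size, row in sorted(outcomes.items())
    ]


def trend_holds(summary):
    """True if the on-time-or-better share falls and the failed share rises
    at every step up in project size."""
    on_time = [entry[1] for entry in summary]
    failed = [entry[2] for entry in summary]
    falling = all(a > b for a, b in zip(on_time, on_time[1:]))
    rising = all(a < b for a, b in zip(failed, failed[1:]))
    return falling and rising


if __name__ == "__main__":
    for size, on_time_or_better, failed in summarize(OUTCOMES):
        print(f"{size:>7,} FP: on time or early {on_time_or_better:5.1f}%, "
              f"failed {failed:5.1f}%")
    print("Trend holds at every size step:", trend_holds(summarize(OUTCOMES)))
```

Under these assumptions, the combined early and on-time share drops at every step up in size while the failure share climbs, which is the pattern the table describes.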
Overall, a compelling number of studies have found results in line with those reported by The Standish Group and Jones: about one quarter of all projects are delivered on time, about one quarter are canceled, and about half are delivered late, over budget, or both (Lederer and Prasad 1992; Jones 1998; ISBSG 2001; Krasner 2003; Putnam and Myers 2003; Heemstra, Siskens and van der Stelt 2003; Standish Group 2004).
The reasons that projects
The number of projects that run late or over budget is one consideration; the degree to which those projects miss their targets is another. According to the first Standish Group survey, the average project schedule overrun was about 120% and the average cost
A more company-specific view of project outcomes is shown in the data reported by one of my
Figure 3-3: Estimation results from one organization. General industry data suggests that it is typical for estimates to be about 100% low, as this company's are. Data used by permission.
The points that are clustered on the "0" line on the left side of the graph represent projects for which the developers reported that they were done but which were found not to be complete when the software
The diagonal line represents perfect scheduling accuracy. Ideally, the graph would show data points clustering tightly around the diagonal line. Instead, nearly all of the 80 data points shown are above the line and represent project overruns. One point is below the line, and a handful of points are on the line. The line illustrates DeMarco's common definition of an "estimate"—the earliest date by which you could possibly be finished.
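The comparison against the diagonal can be expressed as a small calculation. The sketch below classifies each project as above, on, or below the line of perfect accuracy and computes its relative overrun. The function names and the sample data are hypothetical, invented for illustration; they are not the organization's data from the figure. The sketch also reads "100% low" as an actual result about twice the estimate, which is an interpretation rather than a definition given in the text.

```python
from collections import Counter


def relative_overrun(estimated, actual):
    """Return (actual - estimated) / estimated.

    A value of 1.0 means the actual result was twice the estimate, which is
    how this sketch interprets an estimate that is "100% low".
    """
    if estimated <= 0:
        raise ValueError("estimated must be positive")
    return (actual - estimated) / estimated


def classify(estimated, actual):
    """Place a project relative to the diagonal of perfect accuracy."""
    if actual > estimated:
        return "overrun"      # above the diagonal
    if actual < estimated:
        return "underrun"     # below the diagonal
    return "on_estimate"      # on the diagonal


# Hypothetical (estimated_days, actual_days) pairs, invented for this example.
SAMPLE_PROJECTS = [(20, 40), (30, 75), (50, 50), (60, 45), (10, 22)]


def summarize(projects):
    """Return outcome counts and the mean relative overrun for the projects."""
    counts = Counter(classify(est, act) for est, act in projects)
    mean = sum(relative_overrun(est, act) for est, act in projects) / len(projects)
    return counts, mean


if __name__ == "__main__":
    counts, mean = summarize(SAMPLE_PROJECTS)
    print(dict(counts))
    print(f"Mean relative overrun: {mean:.0%}")
```

A neutral estimation problem would put roughly as many projects below the diagonal as above it and give a mean relative overrun near zero. A mean well above zero, with most points classified as overruns, indicates a bias toward underestimation.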
We often speak of the software industry's estimation problem as though it were a neutral estimation problem—that is, sometimes we overestimate, sometimes we underestimate, and we just can't get our estimates right.
But the software industry does not have a neutral estimation problem. The industry data shows clearly that the software industry has an underestimation problem. Before we can make our estimates more accurate, we need to start making the estimates bigger. That is the key challenge for many organizations.