Iterative and Evolutionary Research
Evidence on the question of IID and evolutionary delivery comes from several studies by Alan MacCormack and others at Harvard Business School. In the first study [MacCormack01, MVI01] the question, "Does evolutionary development, rather than the waterfall model, result in better success?" was explored in a two-year in-depth analysis of projects. The report's conclusion?
And specifically on evolutionary feedback-based requirements and design,
The study identified four practices that were statistically correlated with the most successful projects:
Practices 1 and 2 are associated with all modern IID methods. Practice 4 is a key element in the UP.
In a follow-up study [MKCC03], MacCormack and colleagues examined the effect of eight practices on productivity and defects (reported by customers), including IID and releasing a partial system early for evaluation and evolutionary design. The projects ranged from application software to embedded systems, with median values of nine developers and a 14-month duration; 75% used iterative and evolutionary development, 25% the waterfall. A key conclusion of the study:
In contrast, early detailed design specifications were not particularly valuable:
And detailed design specs did not improve productivity. However, design reviews with peers did significantly reduce defect rates.
In the multivariate model of defect factors, the following iterative-related practices and their magnitude of impact were significant:
Similarly, in the model of productivity factors, over 50% of the variation in productivity was related to just two factors, both related to iterative practices:
In a study of productive software organizations [HC96], researchers at Bell Labs found a consistent pattern on highly successful projects:
A study published in 2001 summarized the results of research into over 400 projects spanning 15 years [CLW01]. Less than 5% of the code was actually useful or used. This high "software pollution" rate (reflecting unnecessary requirements and over-engineering within a waterfall lifecycle) was significantly reduced by adopting the short, iterative evolutionary delivery cycles of the Evo method, which cut release cycles from about six months on average to about two weeks.
In a survey of agile method results [Shine03], 88% of organizations cited improved productivity, and 84% improved quality. The most frequently used agile methods were Scrum and XP. Regarding cost of development, 46% reported no change and 49% reported that development was less expensive with agile methods. One of the more interesting results (predictable in terms of agile method claims) was the increase in business satisfaction with the new software: 83% claimed higher satisfaction overall, and 26% claimed "significantly better satisfaction." The most frequently cited positive feature of agile methods (48%) was the ability to "respond to change rather than follow a predefined plan."
Another large study [Standish98] illustrating the value of iterative-related practices is the Standish Group's CHAOS study of project failure and success factors, analyzing 23,000 projects in the 1998 version. In the CHAOS TEN list of the top ten factors for success, at least four of the top five factors are strongly related to IID practices (Table 6.1).
High user involvement is central to IID methods; short iterations with demos, reviews, evolutionary requirements refinement, and client-driven iterations are key practices.
Executive support is promoted by these practices and especially through the demonstration of early, tangible results; people like to be associated with projects that show quick value and progress.
Clear business objectives are supported by adaptive, client-driven iteration planning. By asking each iteration, "What is most valuable?" and then building it, the business objectives are clarified and realized, and the project stays aligned with them.
Of course, small milestones are at the heart of iterative methods.
To quote the study,
There is a significant body of research on project size indicating that smaller (and thus less complex) projects are more successful and productive. This is not direct proof of the value of iterative development, but it is very relevant to the IID practice of decomposing large projects into a series of small, short sub-project iterations.
A large study [Thomas01] of failure and success factors in over 1,000 UK IT projects found that 90% of the successful projects were less than 12 months in duration; indeed, 47% were less than 6 months. To quote,
The trend that the larger the project, the more likely it is to fail has been corroborated in a number of other studies. For example, in one study [Jones96], data from a large sample set show that 48% of 10,000 function point (FP) projects are cancelled, as are 65% of 100,000 FP projects.
Going back to early, fundamental work on size, the exploration of general systems theory in the 1950s by von Bertalanffy, Bateson, and others led to this fundamental conclusion [Bertalanfy68]:
More straightforward evidence that small is beautiful comes from a 23,000-project study [Standish98]; for example, see Figure 6.1 for project success versus duration.
Figure 6.1. success vs. duration
Success was defined as "The project is completed on time and on budget, with all features and functions as originally specified."
This trend was confirmed in a follow-up study spanning 35,000 projects [Standish00], regarding cost (another size measure) versus success (Table 6.2).
And, to reiterate a portion of the Standish conclusion,
Another interesting note on size in the Standish research was the decline in project failure rates, from 31% in the 1994 study to 23% in the 2000 study; this decline was correlated with smaller, shorter projects and smaller teams.
Direct smaller-size and evolutionary delivery evidence was presented in a previously cited study [CLW01]. The percentage of developed code that was ultimately found to be useful increased when the delivery cycle was reduced from around six months to about two weeks, as recommended in Evo.
A study by Boehm and Papaccio showed that a typical software project experienced a 25% change in requirements [BP88]. This trend is corroborated in another large study; as illustrated in Figure 6.2 [Jones97], software development is a domain of inventive high-change projects.
Figure 6.2. rates of change on software projects
Another measure of change is to investigate how much use is actually made of implemented features defined in early specifications. A large study [Johnson02] showed that 45% of features were never used (Figure 6.3).
Figure 6.3. actual use of requested features
The use of evolutionary requirements to address change is becoming more widespread. A study of 107 projects [CM95] showed that only 18% of the projects tried to complete the requirements in a single early step; 32% used two cycles of requirements refinement (with programming in between); and in 50% of the projects the requirements analysis was completed over three or more iterations.
The data in this section demonstrates that software development is a high-change domain. Practices or values that encourage early "complete" specifications or schedules are incongruous with this reality. Iterative and evolutionary practices that emphasize adaptability, and that take steps to provoke early change, are consistent with this research.
Waterfall Failure Research
In a study of failure factors on 1,027 IT projects in the UK [Thomas01] (only 13% did not fail), scope management related to attempting waterfall practices (including detailed up-front requirements) was the single largest contributing factor to failure, cited in 82% of the projects as the number one problem, with an overall weighted failure influence of 25%. To quote the study's conclusion,
Other significant evidence of failure in applying the waterfall comes from one of its most frequent users in the past, the U.S. Department of Defense (DoD). Most DoD projects were required by the standard DOD-STD-2167 to follow a waterfall lifecycle. A report on failure rates in a sample of earlier 2167-era DoD projects concluded that 75% of the projects failed or were never used [Jarzombek99]. Consequently, a task force was convened, chaired by Dr. Frederick Brooks, the well-known software engineering expert. The task force's report recommended replacing the waterfall with IID [DSB87]:
In another study of 6,700 projects, four out of the five key factors contributing to project failure were found to be associated with and aggravated by the waterfall model [Jones95], including the inability to deal with changing requirements and problems with late integration.
In 1996 Barry Boehm published a well-known paper summarizing failures of the waterfall [Boehm96], with advice to use a risk-reducing IID approach combined with three milestone anchor points around which to plan and control; this advice was eventually adopted in the UP.
There are several studies (covering thousands of projects) that shed light on the value of large, up-front specifications in a waterfall-oriented lifecycle.
One study [Jarzombek99] cited a 1995 DoD software project study (covering over $37 billion USD worth of projects) showing that 46% of the systems failed to meet the real needs so egregiously (although they met the specifications) that they were never successfully used, and another 20% required extensive rework to meet the true needs (rather than the specifications) before they could be used.
As mentioned earlier, another study [Johnson02] showed that 45% of features were never used with an additional 19% rarely used.
In the previously cited study of over 400 waterfall-oriented projects [CLW01] averaging six-month cycles, only 10% of the developed code was actually deployed, and of that, only 20% was used. The prime reasons included:
There is a productivity motivation to apply short iterations, even when there are up-front requirements.
A study [Solon02] of a 43,700-project sample set showed the following productivity differences between IID and waterfall:
Interestingly, the same study showed that among the waterfall projects, those that applied the model only "loosely" were significantly more productive than those that applied it "rigorously," indicating the waterfall's negative effect on productivity.
Another relevant study [Jones00] showed that as the size of project decreases (measured in language-independent function points), the monthly productivity of staff increases (Figure 6.4).
Figure 6.4. productivity vs. size
This data illustrates the motivation for organizing a project into small mini-project iterations with low function points per iteration, as the most dramatic productivity drop occurs in the lower function point range (under 1,000).
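To make the motivation concrete, here is a minimal back-of-the-envelope sketch. The productivity figures (12 vs. 6 FP per staff-month) and the 5,000 FP project size are assumed purely for illustration, not taken from [Jones00]; the point is only that when per-delivery productivity falls as delivery size grows, decomposing the same scope into small iterations reduces total effort.

```python
# Illustrative only: assumed productivity figures, shaped like the trend in
# the text (productivity in FP per staff-month falls as FP per delivery grow).
ASSUMED_PRODUCTIVITY = {  # FP per staff-month at a given delivery size
    500: 12.0,    # small mini-project iteration
    5000: 6.0,    # one large monolithic delivery
}

def staff_months(total_fp: int, fp_per_delivery: int) -> float:
    """Total staff-months to deliver total_fp in chunks of fp_per_delivery."""
    deliveries = total_fp / fp_per_delivery
    return deliveries * fp_per_delivery / ASSUMED_PRODUCTIVITY[fp_per_delivery]

monolith = staff_months(5000, 5000)   # one big waterfall-style delivery
iterative = staff_months(5000, 500)   # ten small mini-project iterations

print(monolith)   # about 833 staff-months
print(iterative)  # about 417 staff-months
```

Under these assumed rates the decomposed plan needs half the staff-months; the real curve in Figure 6.4 would give different numbers, but the direction of the effect is the same.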
Timeboxing by itself has been shown to have a productivity effect. DuPont, one of the earliest timebox pioneers, found developer productivity of around 80 function points per month with timeboxed iterations, but only 15 to 25 function points per month with other methods [Martin91].
Note the rate of 80 function points per month at DuPont compared to a high of 12 function points per month in Figure 6.4. This suggests that the combination of a low-complexity step with timeboxing has a higher productivity impact than a small step alone without timeboxing.
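A quick calculation shows the scale of the difference. The 80 and 15 to 25 FP/month rates are the DuPont figures quoted above; the 1,000 FP project size is a hypothetical value chosen only for illustration.

```python
# Hypothetical 1,000 FP project; productivity rates from the DuPont figures
# quoted above (80 FP/month timeboxed vs. 15-25 FP/month with other methods).
project_fp = 1_000

timeboxed_months = project_fp / 80   # 12.5 developer-months
other_low = project_fp / 25          # 40.0 developer-months
other_high = project_fp / 15         # about 67 developer-months

print(timeboxed_months, other_low, other_high)
```

On these figures, the timeboxed approach finishes the same scope in roughly a third to a fifth of the developer-months.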
In another study [Jones00], 47 factors that increase or decrease productivity were identified, including project complexity:
This indicates a productivity advantage by organizing projects in low-complexity mini-project iterations.
To reiterate the results of a study on productivity and iterative development [MKCC03], their conclusion was,
Quality and Defect Research
Broadly, defect reduction comes from avoiding defects before they occur (Deming's Total Quality Management principle) and from feedback (tests, evaluations, and so forth). IID methods can address both. For example, several methods promote a simple per-iteration process assessment or reflection by the team, to encourage regular process improvement and defect avoidance. Feedback is enabled by the emphasis on early development of the riskiest elements, per-iteration demos, and a test-early, test-often approach. The association of lower defect rates with iterative development is consistent with Deming's predictions, as IID illustrates the Deming/Shewhart Plan-Do-Study-Act (PDSA) cycle, and supports a culture of continuous improvement by measuring, reflecting, and adjusting each iteration.
Specifically, the study [MKCC03] showed that IID was correlated with lower defects. In other research [MVI01], it was shown that as the time lag between coding and testing decreased, defect rates likewise decreased. A study by Deck [Deck94] also shows a statistically significant reduction in defects using an iterative method. Large case-study research [Jones00] showed that defect rates increase non-linearly as project size grows (Figure 6.5).
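The phrase "increase non-linearly" can be made concrete with a small sketch. A power law is one common way to model such growth; the constants below are assumed purely for illustration and are not from [Jones00].

```python
# Illustrative only: an assumed power-law model of total defects vs. size,
# defects = A * size**B with B > 1, so defect *density* grows with size.
A, B = 0.05, 1.25   # assumed constants, not taken from [Jones00]

def defect_density(fp: int) -> float:
    """Defects per function point under the assumed model."""
    return (A * fp ** B) / fp

small = defect_density(100)
large = defect_density(10_000)
print(small, large)   # the larger project has the higher defect density
```

With any exponent B greater than 1, total defects grow faster than size, so defects per function point are higher for larger projects, consistent with the trend the study reports.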
Figure 6.5. defects vs. size
Although not statistically reliable, there are several single-case study reports of lower defect densities associated with iterative methods (e.g., [Manzo02]).