Results from the Experience


In this section, we evaluate the results of the XP adoption efforts by using metrics gathered from both the V2 and V3 projects. The results are summarized in Figure 30.1.

Figure 30.1. Version 2 versus Version 3 comparison of results


We must point out that these comparisons were not made in the context of a controlled, scientific study and are therefore anecdotal. Some comparisons are based on subjective criteria. We nevertheless believe even anecdotal evidence in this context to be relevant and valuable, particularly because of the rarity of scientific studies conducted in actual production environments.

Comparing V2 and V3 Functionality

For this evaluation, we need a baseline comparison of the overall functionality of the V2 and V3 products. In the absence of objective function-point measurements, we compare the two products subjectively.

In our subjective judgment, the V2 and V3 products were roughly equivalent in total functionality: the V2 product's features were individually more complex, whereas the V3 product targeted a wider set of simpler features.

Increased Rate of Development

The V2 project delivered its business value over a period of 20 months before the project was stopped because of the excessive costs of ownership.

The V3 project was suspended after nine months of development. At the established velocity, V3 would have delivered its total business value over a period of 12 months.

This result represents a 67% increase in the overall development velocity, as measured in terms of the rate of business value delivered over time.
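As a rough check on this figure, the arithmetic can be reproduced directly from the numbers above. The short sketch below (Python, purely illustrative) assumes, as stated earlier, that the two products represent an equivalent amount of total business value.

    # Velocity comparison, assuming V2 and V3 deliver equivalent total
    # business value (normalized here to 1.0 unit).
    total_value = 1.0

    v2_months = 20        # V2 delivered its value over 20 months
    v3_months = 12        # V3's projected schedule at its established velocity

    v2_rate = total_value / v2_months    # business value delivered per month
    v3_rate = total_value / v3_months

    increase = (v3_rate - v2_rate) / v2_rate
    print(f"Velocity increase: {increase:.0%}")    # prints 67%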

Reduced Development Costs

The V2 project employed a total of 21 developers over its lifetime, with team size ranging from three developers to a peak of 18. In all, the V2 project cost 207 developer-months of effort.

The V3 project began with two developers. After four months, the team size was increased to four developers. Overall, using the estimated schedule, the V3 project would have cost a total of 40 developer-months.

These results represent an 80% reduction in developer-month effort and its corresponding personnel and overhead costs. We note that the V3 team was staffed by senior developers, and their expertise probably contributed to the productivity gains.
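The effort figures are consistent with the staffing profile described above. A minimal sketch of the arithmetic, again purely illustrative, follows; the V3 figure assumes two developers for the first four months and four developers for the remaining eight months of the projected 12-month schedule.

    # Developer-month comparison.
    v2_effort = 207                  # developer-months logged on V2

    # V3: 2 developers for 4 months, then 4 developers for the remaining
    # 8 months of the projected 12-month schedule.
    v3_effort = 2 * 4 + 4 * 8        # = 40 developer-months

    reduction = (v2_effort - v3_effort) / v2_effort
    print(f"Effort reduction: {reduction:.1%}")    # prints 80.7%, the ~80% cited above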

Increased Correlation of Software to Business Needs

The V2 project delivered features and technological capabilities beyond the requirements of the customer, at the expense of delivering revenue-generating features.

The V3 project's use of the planning game focused delivery only on clearly identified business requirements. As the expertise of the customer team increased, the development effort increasingly correlated directly to specific revenue-generating opportunities.

Reduced Release Cycle

The V2 project was unable to produce meaningful production releases in cycles of less than two to three months. The quality assurance cycle alone normally lasted two or more weeks.

The V3 project delivered production-ready releases in iteration cycles of two weeks. Because of the increased clarity and prioritization of the planning game, meaningful feature releases were produced in cycles of one to three iterations, representing a substantial improvement in the time between production releases.

The reduction in the release cycle enabled product managers to flexibly and quickly respond to changing business conditions. This was dramatically demonstrated by the rapid succession of changes to the product priorities following the introduction of XP. In January 2001, at the start of the XP adoption, two product lines were under development. In February 2001, two additional product lines were initiated, including the V3 product. In June 2001, development was stopped on V2, and another product line was suspended. Finally, in October 2001, V3 development was suspended, leaving one remaining active product line.

Increased Product Quality

For both projects, defects found in acceptance testing were tracked using a defect-tracking database. V2's policy was to verbally report minor defects without tracking, while V3 mandated that all defects be formally logged. Acceptance tests on both projects were manually executed, but the V3 project also used a suite of 1,230 automated unit tests. Acceptance testing on the V2 project was performed sporadically, while on the V3 project acceptance testing was performed repeatedly on all iterations.

The V2 project logged a total of 508 defects over its 20-month life cycle. Of these defects, 182 were logged during a difficult two-month period from September through November 2000. Another 123 defects were logged during the final one-month testing cycle before V2's last production release in June 2001.

The V3 project logged a total of 114 defects over its entire nine-month duration, all of them minor. Severe defects were discovered and fixed by the developers before the end of each iteration. Assuming a linear increase in defects, V3 would have produced an estimated 152 defects had the project run to completion.

These results represent a 70% reduction in the number of defects discovered in acceptance testing. This reduction is even more significant when we take into account the lower defect severity levels.
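The defect comparison uses the same linear extrapolation described above; the sketch below simply restates that arithmetic.

    # Defect-count comparison using linear extrapolation for V3.
    v2_defects = 508                 # logged over V2's 20-month life cycle

    v3_defects_observed = 114        # logged over V3's nine months of development
    v3_months_observed = 9
    v3_months_planned = 12

    # Extrapolate V3's defect count to the full projected schedule.
    v3_defects_projected = v3_defects_observed * v3_months_planned / v3_months_observed   # = 152

    reduction = (v2_defects - v3_defects_projected) / v2_defects
    print(f"Projected defect reduction: {reduction:.0%}")    # prints 70%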

Increased Quality of Implementation

Quality of design and implementation is difficult to measure objectively, because even the choice of measurements is subject to debate in the software community. For these measurements, we chose a few relatively simple indicators of possible quality: total code size, the average size of classes, the average size of methods, and the average cyclomatic complexity of methods. The total-code-size metric includes all types of sources (Java, JSP, ASP, and HTML), while the remaining metrics cover only the Java sources.
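Of these indicators, cyclomatic complexity is perhaps the least familiar; it is conventionally computed as one plus the number of decision points in a method. The sketch below is a naive, purely illustrative approximation for Java sources; the keyword list and the regular-expression counting are assumptions of this example, not the tooling actually used on the projects.

    import re

    # Naive cyclomatic complexity: 1 + number of decision points in a
    # method body. The token list is a common approximation for Java;
    # it is illustrative only.
    DECISION_TOKENS = r"\b(?:if|for|while|case|catch)\b|\?|&&|\|\|"

    def cyclomatic_complexity(java_method_body: str) -> int:
        return 1 + len(re.findall(DECISION_TOKENS, java_method_body))

    # Example: one 'if' and one '&&' give a complexity of 3.
    sample = "if (a && b) { doSomething(); } else { doOther(); }"
    print(cyclomatic_complexity(sample))    # prints 3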

Because the V3 project was suspended before completion, it was necessary to estimate the final total-code-size measurement, assuming a continuation of the observed linear growth.

Table 30.1 summarizes the comparison of these metrics.

Although these metrics are subject to differing analyses, when we combine them with our subjective reviews of the code base, we feel they represent an improvement in the quality of the implementation. The reduction in code size is indicative of a simpler implementation, assuming delivery of comparable functionality. The presence of a larger number of smaller methods per class, combined with the reduced complexity of methods, suggests an improved division of responsibility and behavior across methods.


