Empirical Study Hypotheses


The overall "effectiveness" of a new methodology is multifaceted. As a result, in designing the case study, we define several hypotheses. Each of our six targeted areas of investigation is now stated in terms of a null hypothesis that assumes there are no significant differences between agile and more traditional methodologies. The empirical investigation will determine whether the data is convincing enough to reject these null hypotheses.

Productivity and Cycle Time

1. There is no productivity or cycle time difference between developers using an agile methodology and developers using a more traditional methodology.

To assess developer productivity, organizations need to measure things such as the amount of time necessary to complete a task in relation to a size measure, such as lines of code or function points. The results need to be compared with the productivity results of engineers on past projects or compared, in aggregate, with software engineers not using the new methodology. Cycle time is a matter of survival for many companies. As a result, the cycle times to complete tasks, releases, and projects must be compared.
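As a minimal sketch of the productivity comparison described above, the following Python fragment computes effort per function point for two groups of completed tasks. The task records, effort figures, and sizes are all hypothetical; in practice these would come from the organization's time-tracking and sizing data.

```python
# Sketch: comparing productivity (effort per unit size) across two
# groups of completed tasks. All figures below are illustrative.

def productivity(tasks):
    """Mean effort per function point over a list of
    (effort_hours, function_points) records."""
    total_hours = sum(h for h, fp in tasks)
    total_fp = sum(fp for h, fp in tasks)
    return total_hours / total_fp

# Hypothetical task records: (effort in hours, size in function points)
agile_tasks = [(12, 4), (20, 5), (9, 3)]
traditional_tasks = [(18, 4), (30, 5), (15, 3)]

print(productivity(agile_tasks))        # hours per function point, agile
print(productivity(traditional_tasks))  # hours per function point, traditional
```

The same shape of calculation applies to cycle time: sum the elapsed time per task, release, or project and compare it, in aggregate, across the two methodologies.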

Externally Visible Prerelease Quality

2. There is no difference in externally visible prerelease quality between code produced using an agile methodology and code produced using a more traditional methodology.

We can assess this statement by examining the number and severity of defects found by external functional testing and the amount of time spent fixing these prerelease defects. To do this, the development team needs to record the amount of time they spend fixing each defect once an external test group reports it.

Externally Visible Postrelease Quality

3. There is no difference in externally visible postrelease quality between code produced using an agile methodology and code produced using a more traditional methodology.

We can assess this statement by examining the number and severity of customer and field defects and the amount of time spent to fix these released defects. To do this, the developers need to record the amount of time they spend fixing a defect once the code has been released.
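The fix-time bookkeeping described above can be sketched as follows. The record layout (a severity label paired with fix hours) and the sample defects are illustrative assumptions, not a prescribed schema.

```python
# Sketch: aggregating postrelease defect-fix effort by severity.
# The (severity, fix_hours) record layout is an assumption.

from collections import defaultdict

def fix_effort_by_severity(defects):
    """Return {severity: (total_fix_hours, mean_fix_hours)}."""
    buckets = defaultdict(list)
    for severity, hours in defects:
        buckets[severity].append(hours)
    return {s: (sum(h), sum(h) / len(h)) for s, h in buckets.items()}

# Hypothetical postrelease defect records
released_defects = [("high", 8.0), ("high", 12.0), ("low", 1.5)]
print(fix_effort_by_severity(released_defects))
# {'high': (20.0, 10.0), 'low': (1.5, 1.5)}
```

The same aggregation, run separately for pre-agile and agile projects, gives the comparison the hypothesis calls for.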

Additionally, an important aspect of externally visible software quality is reliability. Measures of reliability widely used in software engineering include the number of failures discovered and the rate of discovery [Mendonça+2000]. Service Requests (SRs) are requests for changes (because of defects or the desire for enhanced functionality) by customers after product release. The literature on SRs has partially overlapped that of software reliability, because SRs often refer to occurrences of faults that also affect the reliability of software systems. Wood suggests that the models used for describing software reliability can be used also for the overall analysis of SRs, without any major loss of precision [Wood1996].

We also propose that the SRs be analyzed in relation to appropriate growth models to assess changes in product reliability. Software Reliability Growth Models (SRGMs), formal equations that describe the time of the discovery of defects, give additional insight into the defect behavior of the software product and the effort necessary for achieving the desired quality. The SRGMs can be used to compare the reliability of the pre-agile development process with the new methodology being introduced.
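As a concrete illustration of how an SRGM could be applied to SR data, the following Python sketch evaluates the Goel-Okumoto model, m(t) = a(1 - e^(-bt)), where a is the expected total number of defects and b the detection rate, and fits its parameters to hypothetical weekly cumulative SR counts by a coarse least-squares grid search. The data, parameter grids, and fitting approach are illustrative only; in practice, maximum-likelihood estimation with a statistical package would be used.

```python
import math

# Sketch of a simple SRGM: the Goel-Okumoto NHPP model,
# m(t) = a * (1 - exp(-b * t)). Data and grids below are hypothetical.

def goel_okumoto(t, a, b):
    """Expected cumulative defects discovered by time t."""
    return a * (1.0 - math.exp(-b * t))

def fit_grid(times, counts, a_range, b_range):
    """Least-squares fit of (a, b) over a coarse parameter grid."""
    best = None
    for a in a_range:
        for b in b_range:
            err = sum((goel_okumoto(t, a, b) - c) ** 2
                      for t, c in zip(times, counts))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]

# Hypothetical cumulative SR counts observed weekly after release
weeks = [1, 2, 3, 4, 5, 6]
counts = [9, 16, 22, 26, 29, 31]

a, b = fit_grid(weeks, counts,
                a_range=[30, 35, 40, 45],
                b_range=[0.1, 0.2, 0.3, 0.4])
print(a, b)  # fitted model parameters
```

Fitting the same model to SR data from a pre-agile release and an agile release lets the two reliability growth curves be compared directly.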

Responsiveness to Customer Changes

4. Developers using an agile methodology will be no more responsive to customer changes than developers using a more traditional methodology.

A significant professed advantage of agile methodologies is that the development team can improve its ability to adapt to customer changes and suggestions during product development. Midrelease, this happens when the developers ask for clarification on a requirement, improving their understanding of it. Postrelease, the development team can incorporate customer changes as each proposed change is prioritized into future product development.

Midrelease changes are very difficult to quantify. There is a fine line between simply clarifying a requirement and correcting or changing a requirement. The postrelease changes could be quantified by:

  • Counting the number of future requirements that relate to changing previously released functionality.

  • Counting the SRs that deal with altering previously released functionality.

  • Calculating the percentage of SRs that deal with altering previously released functionality that are handled by the development team, as Highsmith proposes [Auer+2002].

  • Recording the amount of time spent by the development team changing previously released functionality as a percentage of the total development effort.
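Several of these counts are straightforward to compute once each SR is recorded with a flag marking whether it alters previously released functionality. A minimal Python sketch, in which the SR records, field names, and effort figures are all hypothetical:

```python
# Sketch: quantifying postrelease responsiveness from service-request
# (SR) records. The boolean field and record layout are assumptions.

def change_sr_percentage(srs):
    """Percentage of SRs that alter previously released functionality."""
    change_srs = [sr for sr in srs if sr["alters_released"]]
    return 100.0 * len(change_srs) / len(srs)

def change_effort_percentage(srs, total_dev_hours):
    """Effort on change SRs as a percentage of total development effort."""
    change_hours = sum(sr["hours"] for sr in srs if sr["alters_released"])
    return 100.0 * change_hours / total_dev_hours

# Hypothetical SR records
srs = [
    {"alters_released": True,  "hours": 10},
    {"alters_released": False, "hours": 4},
    {"alters_released": True,  "hours": 6},
    {"alters_released": False, "hours": 2},
]
print(change_sr_percentage(srs))           # 50.0
print(change_effort_percentage(srs, 200))  # 8.0
```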

It would also be advantageous to run a short customer-satisfaction survey after each release. As an organization transitions to an agile methodology and learns to improve its responsiveness to and understanding of customer requirements, customer satisfaction may improve.

Internal Code Structure

5. There will be no difference in internal code structure between code produced using an agile methodology and code produced using a more traditional methodology.

We believe that the project source code contains a wealth of information about the quality built in by the team members. Based on a set of design metrics, appropriate software engineering models can be built to link internal design aspects of the software product with its defect behavior, future maintenance costs, and ability to be easily enhanced. Increasingly, object-oriented measurements are being used to evaluate and predict the quality of software [Harrison+1998]. A growing body of empirical results supports the theoretical validity of these metrics [Glasberg+2000; Basili+1996; Briand+1995; Schneidewind1992]. The validation of these metrics requires convincingly demonstrating that (1) the metric measures what it purports to measure (for example, a coupling metric really measures coupling) and (2) the metric is associated with an important external metric, such as reliability, maintainability, and fault-proneness [ElEmam2000]. Note that the validity of these metrics can sometimes be criticized [Churcher+1995].

As proposed by Succi in [Succi+2001], the relatively simple and well-understood CK metrics suite proposed by Chidamber and Kemerer [Chidamber+1998] can be used to assess the internal structure of product code. Riel also advocates these metrics in Object-Oriented Design Heuristics [Riel1996]. This set of six metrics shows good potential as a complete measurement framework in an object-oriented environment [Mendonça+2000]. Important quality data can be mined from the source code based on these metrics, preferably with the use of automated tools. These are the six CK metrics.

  • Depth of inheritance tree (DIT) for a class corresponds to the maximum length from the root of the inheritance hierarchy to the node of the observed class. The deeper a class is within the hierarchy, the greater the number of methods it is likely to inherit, making it more complex to predict its behavior. Deeper trees constitute greater design complexity but also greater potential for reuse of inherited methods.

  • Number of children (NOC) represents the number of immediate descendants of the class in the inheritance tree. The greater the number of children, the greater the likelihood of improper abstraction of the parent and of misuse of subclassing. However, the greater the number of children, the greater the reuse.

  • Coupling between objects (CBO) is defined as the number of other classes to which a class is coupled through method invocation or use of instance variables. Excessive coupling is detrimental to modular design and prevents reuse. The larger the number of couples, the higher the sensitivity to changes in other parts of the design; therefore, maintenance is more difficult.

  • Response for a class (RFC) is the cardinality of the set of all internal methods and external methods directly invoked by the internal methods. The larger the number of methods that can be invoked from a class through messages, the greater the complexity of a class. If a large number of methods can be invoked in response to a message, the testing and debugging of the class becomes complicated because it requires a greater level of understanding from the tester.

  • Number of methods (NOM) is a simplified version of the more general weighted methods count (WMC), a simplification commonly adopted in practice [Basili+1986]. The number of internal methods is counted instead of forming a weighted sum of methods based on complexity. The number and complexity of the methods involved predict how much time and effort is required to develop and maintain the class. The larger the number of methods in a class, the greater the potential impact on children, because children inherit all the methods defined in the class. Classes with large numbers of methods are also likely to be more application specific, limiting the potential for reuse.

  • Lack of cohesion in methods (LCOM) is defined as the number of pairs of noncohesive methods minus the count of cohesive method pairs, based on common instance variables used by the methods in a class. High cohesion indicates good class subdivision. Lack of cohesion or low cohesion increases complexity, thereby increasing the likelihood of errors during the development process. Classes with low cohesion could probably be subdivided into two or more classes with increased cohesion.
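As an illustration of how such metrics can be extracted, the following Python sketch computes two of the six CK metrics, NOM and LCOM, from a minimal class model in which each method is mapped to the set of instance variables it uses. A real tool would derive this mapping from the source code; the sample class is hypothetical.

```python
from itertools import combinations

# Sketch: computing NOM and LCOM from a minimal class model.
# A class is described as {method_name: set of instance variables used};
# the mapping would normally be extracted from source by an automated tool.

def nom(cls):
    """Number of methods (NOM): count of methods defined in the class."""
    return len(cls)

def lcom(cls):
    """Lack of cohesion in methods (LCOM, Chidamber-Kemerer form):
    noncohesive method pairs minus cohesive pairs, floored at zero."""
    p = q = 0
    for vars1, vars2 in combinations(cls.values(), 2):
        if vars1 & vars2:
            q += 1   # pair shares at least one instance variable
        else:
            p += 1   # pair is noncohesive
    return max(p - q, 0)

# Hypothetical class: methods mapped to the instance variables they use
account = {
    "deposit":  {"balance"},
    "withdraw": {"balance"},
    "audit":    {"log"},
}
print(nom(account))   # 3
print(lcom(account))  # 1
```

Here the audit method shares no state with the other two, so LCOM is positive, hinting that the class might be subdivided into two more cohesive classes, exactly the interpretation given in the bullet above.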

Job Satisfaction

6. Developers using an agile methodology will be no more satisfied with their job than developers using a more traditional methodology.

We view employee job satisfaction as being of prime importance because satisfied employees are less likely to leave their jobs. Improved job satisfaction therefore means less risk of losing team members during project development. To assess job satisfaction, we suggest administering a short survey to the development team before the transition to the agile methodology and then readministering it periodically as the transition proceeds.

In analyzing the data to assess these hypotheses, we must consider how well the software developers actually adhered to the practices of the chosen methodology. It is very important to set the results in the context of which practices most of the programmers actually performed. This is not to say that a team will succeed only by following the practices. Rather, we want to find relationships and draw conclusions knowing what the team actually did and how they adapted and enhanced the methodology to meet the requirements of their project.



Extreme Programming Perspectives
ISBN: 0201770059
Year: 2005
Pages: 445