The PTR Arrival and Backlog Projection Model

Near the end of the development cycle, a key question is whether the scheduled code-freeze date can be met without sacrificing quality. Will the PTR arrivals and backlog decrease to the predetermined desirable levels by the code-freeze date? The PTR submodel discussed earlier cannot answer this question because it is a tracking tool, not a projection tool. The exponential model and other reliability growth models based on system test data could in principle answer it, but they require data points well into the system test phase. Moreover, the analytical models may or may not be adequate, depending on their goodness of fit. For cases like this, other modeling approaches may be needed. Here we present an example that we call the PTR arrival and backlog projection models. Their purpose is to project the PTR arrivals and backlog at the end of the development process. Analytical models aside, our approach was to derive empirical models based on data from the current project. If we could capture the key explanatory variables in the models, we should be able to extract the underlying pattern in the data with a reasonable degree of confidence. For this purpose, the general linear model approach is readily available. From experience, we know that polynomial time terms combined with relevant variables usually form good projection models.

This model differs from the exponential model in two respects. First, the time frame covers all machine testing (all PTRs) after the code is integrated (part of unit test, component test, component regression test, and system test), whereas the exponential model applies only to defect arrivals during system test. Second, the data for this model are all PTR arrivals and the PTR backlog, whereas the exponential model includes only valid PTRs (defects).

In our model building, the following sets of predictor variables were tested and their relationships with PTR arrival and backlog were specified:

Chronological time: The rationale is to capture the chronological pattern of the development process. It is well known that software development has a life cycle of systematic processes. The specific time trend, however, varies among systems. It may be linear, a polynomial of second degree or higher, a Fourier series, or some other form.
Time lag variables: This set of variables is relevant because the data are of a time series nature and we need to assess the length of memory of these time series processes. Is this week's PTR count affected by the PTR counts of the preceding five weeks? Of the preceding four weeks? Or of the fourth and third preceding weeks but not the two most recent weeks? Does the process have memory at all? Testing this set of variables can answer questions like these.
Cumulative thousand lines of code (KLOC) integrated: This variable is important because code was not integrated at only one point in time. Throughout the development cycle, pieces of code were integrated into the system library for testing. The number of PTRs is strongly related to the size of the code being tested.
Significant activities such as the onset of component test, system test, and other events: This set of variables is dichotomous, with 1 denoting the presence of the event and 0 denoting its absence.
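
As a rough illustration of the "length of memory" question raised for the time lag variables, the sketch below computes the simple correlation between a weekly series and its own lagged values. This is a screening aid only, not the significance testing described in the text; the function name `lag_correlation` and the weekly counts are made up for illustration.

```python
def lag_correlation(series, lag):
    """Pearson correlation between series[t] and series[t - lag]."""
    current = series[lag:]
    lagged = series[:-lag]
    n = len(current)
    mean_c = sum(current) / n
    mean_l = sum(lagged) / n
    cov = sum((c - mean_c) * (l - mean_l) for c, l in zip(current, lagged))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_l = sum((l - mean_l) ** 2 for l in lagged)
    return cov / (var_c * var_l) ** 0.5


# Hypothetical weekly PTR arrivals (illustrative values, not project data).
weekly_arrivals = [35, 48, 60, 75, 92, 110, 118, 131, 125, 140, 122, 109, 95, 80]

for lag in range(1, 6):
    print(f"lag {lag}: r = {lag_correlation(weekly_arrivals, lag):.2f}")
```

A correlation that stays high at lag 1 but fades quickly at longer lags would suggest a short memory, in which case a single lag term may be enough to carry into the regression.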

Prior to statistical testing of significance, scatterplots were used to examine the patterns of bivariate relationships and to detect outliers (Figures 9.12 and 9.13). For PTR arrivals, a few obvious outliers were found, namely, the weeks of Thanksgiving, Christmas, and New Year's Day. The conspicuously low PTR arrivals for these weeks were apparently attributable to fewer working days and fewer programmers at work, which are artifacts of our calendar-time data. The values for these weeks were therefore replaced by the medians of the five consecutive data points centered on the weeks of interest. Likewise, values for the weeks of Memorial Day, Independence Day, and Labor Day were replaced, although they were not particularly low. For the backlog data, no adjustment was necessary because the data are cumulative.
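
The holiday-week adjustment can be sketched as follows. This is a minimal illustration under two assumptions of ours: the median is taken over the original (unadjusted) values, and the window is simply truncated if a holiday week falls at the very start or end of the series. The function name and the weekly counts are hypothetical.

```python
from statistics import median


def replace_with_centered_median(values, week_indexes, window=5):
    """Return a copy of `values` in which each listed week is replaced by the
    median of the `window` consecutive original points centered on it."""
    half = window // 2
    adjusted = list(values)
    for i in week_indexes:
        lo = max(0, i - half)                 # truncate the window at the edges
        hi = min(len(values), i + half + 1)
        adjusted[i] = median(values[lo:hi])   # uses original, unadjusted values
    return adjusted


# Hypothetical weekly PTR arrivals; weeks 2 and 6 stand in for holiday weeks.
weekly_arrivals = [120, 131, 58, 140, 150, 162, 49, 171, 166]
holiday_weeks = [2, 6]

adjusted = replace_with_centered_median(weekly_arrivals, holiday_weeks)
print(adjusted)
```

Because the window includes the holiday week itself, an unusually low value cannot become the median unless its neighbors are also low, which keeps the replacement close to the local level of the series.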

Figure 9.12. PTR Arrival by Week

graphics/09fig12.gif

Figure 9.13. PTR Arrival by KLOC Integrated

graphics/09fig13.gif

When the patterns of bivariate relationships were specified and separate significance tests performed, the independent variables were put together in a model and their net effects were estimated simultaneously by the method of least squares. For both the arrival and backlog data, several models were attempted and the final model was chosen based on the highest R² value.

The number of PTR weekly arrivals was found to be a linear combination of a cubic pattern of time, a quadratic pattern of KLOC, the number of arrivals in the preceding week, and the presence or absence of the system test:

graphics/09icon01.gif

The equation of the model is as follows:

graphics/09icon02.gif

The model was highly significant (F = 169.6, df1 = 7, df2 = 55, p = 0.0001), as were its component terms. All independent variables together accounted for 95.6% of the variation of the arrival data. This R² translates to a multiple correlation of 0.978 between the model and the actual data.
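
To make the structure of such a model concrete, the sketch below fits a regression of the same form (cubic time, quadratic KLOC, one-week lag, and a system-test indicator) by least squares and computes R², the multiple correlation, and the overall F statistic. Everything numeric in it is an assumption for illustration: the data are synthetic, the variables are rescaled to the range 0 to 1 for numerical stability, and the series length of 64 weeks is chosen only so that the degrees of freedom (7 and 55) match those reported above. The normal equations are solved directly to keep the sketch free of dependencies; in practice a statistics package would be used.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic weekly data (made up for illustration, not the project's data).
random.seed(1)
weeks = 64
t = [w / (weeks - 1) for w in range(weeks)]              # time, scaled to [0, 1]
increments = [random.uniform(0.0, 1.0) if w < 45 else 0.0 for w in range(weeks)]
total = sum(increments)
kloc, running = [], 0.0
for inc in increments:
    running += inc
    kloc.append(running / total)                         # cumulative KLOC, scaled
system_test = [1.0 if w >= 40 else 0.0 for w in range(weeks)]

arrivals, prev = [], 20.0
for w in range(weeks):
    value = (10.0 + 200.0 * t[w] * (1.0 - t[w]) + 20.0 * kloc[w]
             + 15.0 * system_test[w] + 0.4 * prev + random.gauss(0.0, 5.0))
    arrivals.append(value)
    prev = value

# Columns: intercept, t, t^2, t^3, KLOC, KLOC^2, last week's arrivals, system test.
rows = [[1.0, t[w], t[w] ** 2, t[w] ** 3, kloc[w], kloc[w] ** 2,
         arrivals[w - 1], system_test[w]] for w in range(1, weeks)]
y = arrivals[1:]                                         # first week lost to the lag

coef = fit_least_squares(rows, y)
fitted = [sum(c * x for c, x in zip(coef, row)) for row in rows]
mean_y = sum(y) / len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1.0 - sse / sst
df1 = len(coef) - 1                                      # predictors, excluding intercept
df2 = len(y) - len(coef)                                 # residual degrees of freedom
f_stat = (r2 / df1) / ((1.0 - r2) / df2)
multiple_r = r2 ** 0.5

print(f"R2 = {r2:.3f}, R = {multiple_r:.3f}, F({df1}, {df2}) = {f_stat:.1f}")
```

The same relationships can be checked against the reported figures: the square root of 0.956 is about 0.978, and (0.956 / 7) / (0.044 / 55) is about 171, close to the reported F of 169.6, with the small gap presumably due to rounding of R².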

Figure 9.14 compares the PTR arrival projection model with actual data points for the projection period. The model produces a projection that is accurate within one week in terms of when the PTR arrivals would decrease to the predetermined desirable level prior to code-freeze. A PTR backlog model was likewise established and the projection was borne out very well.

Figure 9.14. PTR Arrival Projection Model

Source: Kan, S. H., "Modeling and Software Development Quality," IBM Systems Journal, Vol. 30, No. 3, 1991, pp. 351-362. Copyright 1991 International Business Machines Corporation. Reprinted with permission from IBM Systems Journal.

graphics/09fig14.gif

This analysis shows that the PTR arrival and backlog processes at the end of the development cycle are predictable with fairly good accuracy. Both our models are sufficiently strong, explaining about 95% of the total variation of the dependent variables. Both series of projections were borne out amazingly well, and were within one week in estimating the time of meeting the criteria levels.

This approach can be used in similar situations where projections for future dates are needed. It is especially useful when analytical models are not applicable. For the projections to be accurate, however, the approach requires a fairly large number of data points, and the data collected must extend past the last inflection point of the process. Another key is to capture the significant variables in the model in order to obtain the highest R² possible. After the initial model is derived, it should be updated as new data points become available. It is advisable to attempt different projection scenarios based on differing assumptions, thereby giving a broader perspective for the assessment.
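
A projection under differing assumptions might be sketched as follows. The coefficients, starting values, target level, and KLOC scenarios are all hypothetical and are not the fitted values from this project; flooring the projection at zero is also our addition. The point is only to show how a fitted equation with a one-week lag term is iterated forward, with each projected value feeding the next week's lag.

```python
def project_arrivals(coef, last_week, last_arrival, kloc_path, system_test=1.0):
    """Iterate the equation forward one week at a time, feeding each
    projected value back in as the next week's lag term."""
    projected = []
    prev = last_arrival
    for step, kloc in enumerate(kloc_path, start=1):
        t = last_week + step
        y = (coef["const"] + coef["t"] * t + coef["t2"] * t ** 2
             + coef["t3"] * t ** 3 + coef["kloc"] * kloc
             + coef["kloc2"] * kloc ** 2 + coef["lag1"] * prev
             + coef["system_test"] * system_test)
        y = max(y, 0.0)                      # arrivals cannot be negative
        projected.append(y)
        prev = y
    return projected


def first_week_at_or_below(projected, last_week, target):
    """Return the first projected week whose value meets the target, or None."""
    for step, y in enumerate(projected, start=1):
        if y <= target:
            return last_week + step
    return None


# Hypothetical coefficients for illustration only.
coef = {"const": 132.0, "t": -3.0, "t2": 0.02, "t3": -0.0005,
        "kloc": 0.1, "kloc2": -0.00001, "lag1": 0.4, "system_test": 10.0}
last_week, last_arrival, target = 50, 120.0, 60.0
horizon = 12

# Scenario A: no further code integration. Scenario B: 200 more KLOC arrive.
base = project_arrivals(coef, last_week, last_arrival, [1000.0] * horizon)
high = project_arrivals(coef, last_week, last_arrival, [1200.0] * horizon)

base_week = first_week_at_or_below(base, last_week, target)
high_week = first_week_at_or_below(high, last_week, target)
print("no further integration:", base_week)
print("200 KLOC more integrated:", high_week)
```

Comparing the week in which each scenario reaches the target level gives a sense of how sensitive the projected date is to the assumptions, which is the broader perspective that the scenarios are meant to provide.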

At the beginning of a process when few data points are available, analytical models or models based on experience can be derived for management purposes. When sufficient data are available, the best model can be determined based on goodness-of-fit tests. Combined with graphic techniques, the modeling approach is a very useful tool for software project management.

Unlike the other models discussed, the PTR arrival and backlog projection models are really a modeling approach rather than a specific model. Statistical expertise, modeling experience, and a thorough understanding of the data are necessary in order to deal with issues pertaining to model assumptions, variable specification, and final model selection. A desirable outcome often depends on the model's R² and on the validity of the assumptions.
