To model software reliability, the following process, or a similar procedure, should be used.
To illustrate the modeling process with actual data, the following sections give step-by-step details on the example shown in Figures 8.1 and 8.2. Table 8.1 shows the weekly defect rate data.
Step 1
The data were weekly defect data from the system test, the final phase of the development process. During the test the software was under formal change control: any defects found were tracked by electronic problem tracking reports (PTRs), and any change to the code had to be made through the PTR process, which was enforced by the development support system. Therefore, the data were reliable. The density plot and cumulative plot of the data are shown in Figures 8.1 and 8.2 (ignore temporarily the fitted curves).
Table 8.1. Weekly Defect Arrival Rates and Cumulative Rates

| Week | Defects/KLOC Arrival | Defects/KLOC Cumulative |
|---|---|---|
| 1 | .353 | .353 |
| 2 | .436 | .789 |
| 3 | .415 | 1.204 |
| 4 | .351 | 1.555 |
| 5 | .380 | 1.935 |
| 6 | .366 | 2.301 |
| 7 | .308 | 2.609 |
| 8 | .254 | 2.863 |
| 9 | .192 | 3.055 |
| 10 | .219 | 3.274 |
| 11 | .202 | 3.476 |
| 12 | .180 | 3.656 |
| 13 | .182 | 3.838 |
| 14 | .110 | 3.948 |
| 15 | .155 | 4.103 |
| 16 | .145 | 4.248 |
| 17 | .221 | 4.469 |
| 18 | .095 | 4.564 |
| 19 | .140 | 4.704 |
| 20 | .126 | 4.830 |
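Figures 8.1 and 8.2 themselves are not reproduced here, but plots of the same shape can be regenerated from Table 8.1 in a few lines. The sketch below assumes Python with matplotlib (any plotting package would do) and simply draws the weekly arrival rates and their running total.

```python
import matplotlib.pyplot as plt

weeks = list(range(1, 21))
arrival = [0.353, 0.436, 0.415, 0.351, 0.380, 0.366, 0.308, 0.254, 0.192, 0.219,
           0.202, 0.180, 0.182, 0.110, 0.155, 0.145, 0.221, 0.095, 0.140, 0.126]
cumulative = [round(sum(arrival[:i + 1]), 3) for i in range(len(arrival))]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(weeks, arrival)                        # density view, as in Figure 8.1
ax1.set(xlabel="Week", ylabel="Defects/KLOC", title="Weekly defect arrival rate")
ax2.plot(weeks, cumulative, marker="o")        # cumulative view, as in Figure 8.2
ax2.set(xlabel="Week", ylabel="Defects/KLOC (cumulative)", title="Cumulative defect rate")
plt.tight_layout()
plt.show()
```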
Step 2
The data indicated an overall decreasing trend (with some noise, of course); therefore the exponential model was chosen. For other products we had used the delayed S and inflection S models, and the assumptions of the S models, specifically the delayed reporting of failures due to problem determination and the mutual dependence of defects, also seem to describe the development process correctly. However, the trend of the data did not show an increase-then-decrease pattern, so we chose the exponential model. We did try the S models for goodness of fit, but in this case they did not fit as well as the exponential model.
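For reference, the mean value (cumulative) functions of the three candidate model families can be written compactly. The sketch below uses the standard textbook forms (Goel-Okumoto exponential, Yamada delayed S, and inflection S); the parameter names K and lam mirror the notation used later in this step-by-step example, and psi is added here only as an illustrative inflection parameter.

```python
import math

def exponential(t, K, lam):
    """Goel-Okumoto exponential model: expected cumulative defects by time t."""
    return K * (1.0 - math.exp(-lam * t))

def delayed_s(t, K, lam):
    """Yamada delayed S-shaped model: allows for a lag between failure
    detection and failure reporting."""
    return K * (1.0 - (1.0 + lam * t) * math.exp(-lam * t))

def inflection_s(t, K, lam, psi):
    """Inflection S-shaped model: psi reflects the mutual dependence of
    defects (the larger psi, the more pronounced the inflection)."""
    return K * (1.0 - math.exp(-lam * t)) / (1.0 + psi * math.exp(-lam * t))
```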
Step 3
We used two methods for model estimation. In the first method, we used an SAS program similar to the one shown in Figure 7.5 in Chapter 7, which takes a nonlinear regression approach based on the DUD algorithm (Ralston and Jennrich, 1978). The second method relies on the Software Error Tracking Tool (SETT) developed by Falcetano and Caruso at IBM Kingston (Falcetano and Caruso, 1988). SETT implements the exponential model and the two S models via the Marquardt nonlinear least-squares algorithm. The results of the two methods are very close. From the DUD nonlinear regression method, we obtained the following values for the two parameters K and λ:

K = 6.597
λ = 0.0712
The asymptotic 95% confidence intervals for the two parameters are:
| Parameter | Lower | Upper |
|---|---|---|
| K | 5.643 | 7.552 |
| λ | 0.0553 | 0.0871 |
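Neither the SAS/DUD program nor SETT is reproduced here. As a rough modern equivalent, the sketch below fits the weekly arrival rates of Table 8.1 to the exponential density form K·λ·e^(-λt) with SciPy's general-purpose nonlinear least-squares routine; since DUD and Marquardt are also least-squares methods, the estimates should land close to K = 6.597 and λ = 0.0712. The printed intervals are only rough ±1.96 standard-error approximations of the asymptotic confidence limits.

```python
import numpy as np
from scipy.optimize import curve_fit

# Weekly defect arrival rates (second column of Table 8.1)
weeks = np.arange(1, 21)
arrival = np.array([0.353, 0.436, 0.415, 0.351, 0.380, 0.366, 0.308, 0.254,
                    0.192, 0.219, 0.202, 0.180, 0.182, 0.110, 0.155, 0.145,
                    0.221, 0.095, 0.140, 0.126])

def exp_density(t, K, lam):
    # Weekly arrival form of the exponential model
    return K * lam * np.exp(-lam * t)

# Starting values: K near the observed 20-week total, lam a small positive rate
(K_hat, lam_hat), cov = curve_fit(exp_density, weeks, arrival, p0=[5.0, 0.1])
se = np.sqrt(np.diag(cov))

print(f"K      = {K_hat:.3f}  (rough 95% CI {K_hat - 1.96 * se[0]:.3f} to {K_hat + 1.96 * se[0]:.3f})")
print(f"lambda = {lam_hat:.4f} (rough 95% CI {lam_hat - 1.96 * se[1]:.4f} to {lam_hat + 1.96 * se[1]:.4f})")
```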
Step 4
By fitting the estimated parameters from step 3 into the exponential distribution, we obtained the following specified model for the weekly defect arrival rate:

f(t) = K λ e^(-λt) = 6.597 × 0.0712 × e^(-0.0712t)

where t is the week number since the start of system test.
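As a quick check of the specified model, the expected weekly arrivals and their running total can be evaluated directly; the short sketch below reproduces, to within rounding, the model cumulative column (B) of Table 8.2 in the next step.

```python
import math

K, lam = 6.597, 0.0712   # parameters estimated in step 3

def f(t):
    # Expected defects/KLOC arriving in week t under the specified model
    return K * lam * math.exp(-lam * t)

total = 0.0
for week in range(1, 21):
    total += f(week)
    print(f"week {week:2d}: arrival {f(week):.3f}, cumulative {total:.3f}")
```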
Step 5
We conducted the Kolmogorov-Smirnov goodness-of-fit test (Rohatgi, 1976) between the observed number of defects and the expected number of defects from the model in step 4. The Kolmogorov-Smirnov test is recommended for goodness-of-fit testing of software reliability models (Goel, 1985). The test statistic is as follows:

D(n) = max | F*(x) - F(x) |

where n is the sample size, F*(x) is the normalized observed cumulative distribution at each time point (normalized means the total is 1), and F(x) is the expected cumulative distribution at each time point, based on the model. In other words, the statistic compares the normalized cumulative distributions of the observed rates and the expected rates from the model at each time point and takes the maximum absolute difference. If this maximum difference, D(n), is less than the established criterion, the model fits the data adequately.
Table 8.2 shows the calculation of the test. Column (A) is the third column in Table 8.1. Column (B) is the cumulative defect rate from the model. The F*(x) and F(x) columns are the normalizations of columns (A) and (B), respectively. The maximum of the last column is .02329. The critical value of the Kolmogorov-Smirnov statistic for n = 20 at the .05 significance level is .294 (Rohatgi, 1976, p. 661, Table 7). Because the D(n) value for our model, .02329, is less than .294, the test indicates that the model fits the data adequately.
Table 8.2. Kolmogorov-Smirnov Goodness-of-Fit Test

| Week | Observed Cumulative Defects/KLOC (A) | Model Cumulative Defects/KLOC (B) | F*(x) | F(x) | F*(x) - F(x) |
|---|---|---|---|---|---|
| 1 | .353 | .437 | .07314 | .09050 | .01736 |
| 2 | .789 | .845 | .16339 | .17479 | .01140 |
| 3 | 1.204 | 1.224 | .24936 | .25338 | .00392 |
| 4 | 1.555 | 1.577 | .32207 | .32638 | .00438 |
| 5 | 1.935 | 1.906 | .40076 | .39446 | .00630 |
| 6 | 2.301 | 2.213 | .47647 | .45786 | .01861 |
| 7 | 2.609 | 2.498 | .54020 | .51691 | .02329 |
| 8 | 2.863 | 2.764 | .59281 | .57190 | .02091 |
| 9 | 3.055 | 3.011 | .63259 | .62311 | .00948 |
| 10 | 3.274 | 3.242 | .67793 | .67080 | .00713 |
| 11 | 3.476 | 3.456 | .71984 | .71522 | .00462 |
| 12 | 3.656 | 3.656 | .75706 | .75658 | .00048 |
| 13 | 3.838 | 3.842 | .79470 | .79510 | .00040 |
| 14 | 3.948 | 4.016 | .81737 | .83098 | .01361 |
| 15 | 4.103 | 4.177 | .84944 | .86438 | .01494 |
| 16 | 4.248 | 4.327 | .87938 | .89550 | .01612 |
| 17 | 4.469 | 4.467 | .92515 | .92448 | .00067 |
| 18 | 4.564 | 4.598 | .94482 | .95146 | .00664 |
| 19 | 4.704 | 4.719 | .97391 | .97659 | .00268 |
| 20 | 4.830 | 4.832 | 1.00000 | 1.00000 | .00000 |

D(n) = .02329
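The D(n) calculation in Table 8.2 is easy to mechanize. The sketch below normalizes the observed and model cumulative rates so that their week-20 totals equal 1, takes the largest absolute gap, and compares it to the .294 critical value quoted above; the model column is recomputed from the fitted parameters rather than copied from the table.

```python
import numpy as np

K, lam = 6.597, 0.0712
weeks = np.arange(1, 21)

# Observed cumulative defects/KLOC, column (A) of Table 8.2
observed = np.array([0.353, 0.789, 1.204, 1.555, 1.935, 2.301, 2.609, 2.863,
                     3.055, 3.274, 3.476, 3.656, 3.838, 3.948, 4.103, 4.248,
                     4.469, 4.564, 4.704, 4.830])

# Model cumulative: running sum of the expected weekly arrivals K*lam*exp(-lam*t)
model = np.cumsum(K * lam * np.exp(-lam * weeks))

# Normalize both so the week-20 totals equal 1, then take the largest absolute gap
f_star = observed / observed[-1]
f_model = model / model[-1]
d_n = np.max(np.abs(f_star - f_model))

print(f"D(n) = {d_n:.5f}")                  # about .023 for these data
print("adequate fit" if d_n < 0.294 else "rejected at the .05 level")
```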
Step 6
We calculated the projected number of defects for the four years following completion of system test. The projection from this model was very close to the estimate from the Rayleigh model and to the actual field defect data.
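The projection itself follows directly from the fitted model: the expected number of latent defects at test exit is the model total K minus the expected cumulative at week 20, and the share surfacing in any later window comes from the same exponential form. The sketch below is only illustrative; it uses the continuous cumulative form K(1 - e^(-λt)) and assumes that field exposure is measured on the same weekly time scale as system test, neither of which is spelled out in the text.

```python
import math

K, lam = 6.597, 0.0712
end_of_test = 20           # last week of system test
horizon = 20 + 4 * 52      # four more years, expressed in weeks (an assumption)

def cumulative(t):
    # Continuous cumulative form of the exponential model
    return K * (1.0 - math.exp(-lam * t))

latent = K - cumulative(end_of_test)                       # defects/KLOC still latent at test exit
next_four_years = cumulative(horizon) - cumulative(end_of_test)

print(f"latent defects/KLOC at end of test     : {latent:.3f}")
print(f"projected defects/KLOC, next four years: {next_four_years:.3f}")
```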
At IBM Rochester we have been using these reliability modeling techniques to estimate the defect levels of software products for some years. We have found the Rayleigh model, the exponential model, and the two S-type models to have good applicability to the AS/400's development process and data. We also rely on cross-model reliability to assess the reasonableness of the estimates, and we use historical data to calibrate the models and adjust the estimates. Actual field defect data have confirmed the predictive validity of this approach; the differences between actual numbers and estimates are small.