The control chart is a powerful tool for achieving statistical process control (SPC). However, in software development it is difficult to use control charts in the formal SPC manner. It is a formidable task, if not impossible, to define the process capability of a software development process. In production environments, process capability is the inherent variation of the process in relation to the specification limits. The smaller the process variation, the better the process's capability. Defective parts are parts that are produced with values of parameters outside the specification limits. Therefore, direct relationships exist among specifications, process control limits, process variations, and product quality. The smaller the process variations, the better the product quality will be. Such direct correlations, however, do not exist or at least have not been established in the software development environment.
In statistical terms, process capability is defined:

$$C_p = \frac{USL - LSL}{6\sigma}$$
where USL and LSL are the upper and lower engineering specification limits, respectively, sigma is the standard deviation of the process, and 6 sigma represents the overall process variation.
If a unilateral specification is affixed to some characteristics, the capability index may be defined:

$$C_{pu} = \frac{USL - \mu}{3\sigma}$$

where μ is the process mean, or

$$C_{pl} = \frac{\mu - LSL}{3\sigma}$$
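As a quick illustration of the three indices, the following Python sketch evaluates them for an imaginary manufactured part. Python is chosen here only for illustration, and the function names and example values are hypothetical.

```python
def capability_index(usl: float, lsl: float, sigma: float) -> float:
    """Two-sided index: Cp = (USL - LSL) / (6 * sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (usl - lsl) / (6 * sigma)


def capability_upper(usl: float, mu: float, sigma: float) -> float:
    """One-sided index against the upper limit: Cpu = (USL - mu) / (3 * sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (usl - mu) / (3 * sigma)


def capability_lower(mu: float, lsl: float, sigma: float) -> float:
    """One-sided index against the lower limit: Cpl = (mu - LSL) / (3 * sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (mu - lsl) / (3 * sigma)


if __name__ == "__main__":
    # Hypothetical part dimension: spec 90..110, process mean 104, sigma 2.
    usl, lsl, mu, sigma = 110.0, 90.0, 104.0, 2.0
    print(f"Cp  = {capability_index(usl, lsl, sigma):.3f}")  # 20 / 12
    print(f"Cpu = {capability_upper(usl, mu, sigma):.3f}")   # 6 / 6
    print(f"Cpl = {capability_lower(mu, lsl, sigma):.3f}")   # 14 / 6
```

With these values Cp is about 1.67, yet Cpu is only 1.0 because the mean sits closer to the upper limit. A two-sided index alone can therefore hide a shifted process.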
In manufacturing environments where many parts are produced daily, process variation and process capability can be calculated in statistical terms, and control charts can be used on a real-time basis. Software differs from manufacturing in several respects, and these differences make it very difficult, if not impossible, to arrive at useful estimates of the process capability of a software development organization. The difficulties include:
Despite these issues, control charts are useful for software process improvement when they are used in a relaxed manner. That is, control charts in software are used not in terms of formal statistical process control and process capability, but as tools for improving consistency and stability. Often they are not used on a real-time basis for ongoing operations. They are more appropriately called pseudo-control charts.
There are many types of control chart. The most common are the X-bar and S charts for sample averages and standard deviations, and the X-bar and R charts for sample averages and sample ranges. There are also median charts, charts for individuals, the p chart for proportion nonconforming, the np chart for number nonconforming, the c chart for number of nonconformities, the u chart for nonconformities per unit, and so forth. For X-bar and S charts or X-bar and R charts, the quality characteristic is assumed to follow the normal distribution. For the p and np charts, the assumed distribution is the binomial distribution. For the c and u charts, the quality characteristic is assumed to follow the Poisson distribution. For details, see a text on statistical quality control (e.g., Montgomery (1985)).
The most appropriate charts for software applications are perhaps the p chart, when percentages are involved, and the u chart, when defect rates are used. The control limits are calculated as the value of the parameter of interest (X-bar or p, for example) plus/minus three standard deviations. One can also increase the sensitivity of the chart by adding a pair of warning limits, which are normally calculated as the value of the parameter plus/minus two standard deviations. Because the calculation of standard deviations differs among types of parameters, the formulas for control limits (and warning limits) also differ.
For example, control limits for defect rates (u chart) can be calculated as follows:

$$UCL = \bar{u} + 3\sqrt{\frac{\bar{u}}{n_i}}, \qquad LCL = \bar{u} - 3\sqrt{\frac{\bar{u}}{n_i}}$$

where $\bar{u}$, the value for the center line, is the cumulative defect rate (the weighted average of defect rates) across the subgroups, and $n_i$ is the size of subgroup $i$ used to calculate its defect rate (e.g., the number of lines of source code or the number of function points). The subgroups used as the unit for calculating and controlling defect rates can be program modules, components, design review sessions of similar length, design segments, code segments for inspections, and units of document review. Because $n_i$ appears in the formula, the control limits are calculated for each sample, so they differ for each data point (subgroup) in the control chart. The second approach is to base the control chart on an average sample size, resulting in an approximate set of control limits. This requires the assumption that future sample sizes (subgroup sizes) will not differ greatly from those previously observed. If this approach is used, the control limits will be constant and the resulting control chart will look less complex than one with variable limits (Montgomery, 1985). However, if the sample sizes vary greatly, the first approach should be used.
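The variable-limit and constant-limit approaches can be compared side by side in a short Python sketch. It assumes subgroup sizes in KLOC and uses made-up defect counts; the function names are hypothetical. Setting `k = 2` yields the warning limits mentioned earlier, and negative lower limits are truncated to zero, a common convention that is an assumption here rather than something stated in the text.

```python
import math
from typing import List, Sequence, Tuple


def u_chart_variable_limits(
    defects: Sequence[int], sizes: Sequence[float], k: float = 3.0
) -> Tuple[float, List[Tuple[float, float]]]:
    """Center line and per-subgroup (LCL, UCL) for a u chart.

    k = 3 gives control limits; k = 2 gives warning limits.
    A negative lower limit is truncated to zero (assumed convention).
    """
    u_bar = sum(defects) / sum(sizes)  # weighted average defect rate
    limits = []
    for n in sizes:
        half_width = k * math.sqrt(u_bar / n)  # limits depend on each n_i
        limits.append((max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, limits


def u_chart_constant_limits(
    defects: Sequence[int], sizes: Sequence[float], k: float = 3.0
) -> Tuple[float, Tuple[float, float]]:
    """Approximate constant limits based on the average subgroup size."""
    u_bar = sum(defects) / sum(sizes)
    n_avg = sum(sizes) / len(sizes)
    half_width = k * math.sqrt(u_bar / n_avg)
    return u_bar, (max(0.0, u_bar - half_width), u_bar + half_width)


if __name__ == "__main__":
    # Hypothetical components: test defects and size in KLOC.
    defects = [12, 30, 8, 20]
    kloc = [4.0, 6.0, 2.0, 8.0]
    u_bar, limits = u_chart_variable_limits(defects, kloc)
    _, warnings = u_chart_variable_limits(defects, kloc, k=2.0)
    print(f"center line = {u_bar:.2f} defects/KLOC")
    for d, n, (lcl, ucl), (lwl, uwl) in zip(defects, kloc, limits, warnings):
        rate = d / n
        if not lcl <= rate <= ucl:
            status = "outside control limits"
        elif not lwl <= rate <= uwl:
            status = "outside warning limits"
        else:
            status = "ok"
        print(f"rate={rate:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}  {status}")
    print("constant limits:", u_chart_constant_limits(defects, kloc)[1])
```

The per-subgroup version is the one to prefer when sizes vary a lot, as the text advises; the constant version trades accuracy for a simpler-looking chart.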
Control limits for percentages (e.g., an effectiveness metric) can be calculated as follows:

$$UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}, \qquad LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}$$

where $\bar{p}$, the center line, is the weighted average of the individual percentages and $n_i$ is the size of subgroup $i$. As with the u chart, either variable control limits or constant control limits (provided the sample sizes don't vary greatly) can be used. If the true value of $p$ is known, or is specified by management (e.g., a specific target for defect removal effectiveness), then $p$ should be used in the formulas instead of $\bar{p}$.
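A p chart sketch follows the same pattern. Here the proportion for each subgroup is assumed to be defects removed over defects present (an effectiveness-style metric), the counts are invented, and the optional `p_target` parameter (a hypothetical name) stands in for a known or management-specified value of p.

```python
import math
from typing import List, Optional, Sequence, Tuple


def p_chart_limits(
    successes: Sequence[int],
    totals: Sequence[int],
    p_target: Optional[float] = None,
    k: float = 3.0,
) -> Tuple[float, List[Tuple[float, float]]]:
    """Center line and per-subgroup (LCL, UCL) for a p chart.

    successes[i] / totals[i] is the proportion for subgroup i. If p_target
    is given (known or specified by management), it replaces p-bar.
    """
    if p_target is None:
        p_center = sum(successes) / sum(totals)  # weighted average proportion
    else:
        p_center = p_target
    limits = []
    for n in totals:
        half_width = k * math.sqrt(p_center * (1.0 - p_center) / n)
        # A proportion cannot leave [0, 1], so clip the limits to that range.
        limits.append((max(0.0, p_center - half_width), min(1.0, p_center + half_width)))
    return p_center, limits


if __name__ == "__main__":
    # Hypothetical counts: defects removed and defects present per subgroup.
    removed = [40, 45, 30]
    present = [50, 60, 40]
    for label, target in (("p-bar", None), ("target 0.85", 0.85)):
        center, limits = p_chart_limits(removed, present, p_target=target)
        print(f"{label}: center = {center:.1%}")
        for r, n, (lcl, ucl) in zip(removed, present, limits):
            print(f"  {r / n:.1%}  LCL = {lcl:.1%}  UCL = {ucl:.1%}")
```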
Many metrics from the software development process can be control charted, for instance, inspection defects per thousand lines of source code (KLOC) or per function point, testing defects per KLOC or per function point, phase effectiveness, and the defect backlog management index (as discussed in Chapter 4). Figure 5.12 shows a pseudo-control chart on testing defects per KLOC by component for a project at IBM Rochester, from which error-prone components were identified for further in-depth analysis and actions. In this case, the use of the control chart involved more than one iteration. In the first iteration, components with defect rates outside the control limits (particularly high ones) were identified. (Note that in this example the control chart is one-sided, with only the upper control limit.)
Figure 5.12. Pseudo-Control Chart of Test Defect Rate: First Iteration
In the second iteration, the previously identified error-prone components were removed and the data were plotted again, with a new control limit (Figure 5.13). This process of "peeling the onion" permitted the identification of the next set of potentially defect-prone components, some of which may have been masked on the initial charts. This process can continue for a few iterations. Priority of improvement actions as they relate to available resources can also be determined based on the order of iteration in which problem components are identified (Craddock, 1988). At each iteration, the out-of-control points should be removed from the analysis only when their causes have been understood and plans put in place to prevent their recurrence.
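The iterative "peeling the onion" procedure can be sketched as repeated one-sided u-chart screening. This Python sketch is an illustration under assumptions: the component data are invented, the function name is hypothetical, and the automatic removal of flagged components stands in for the human step the text requires (understanding causes and planning actions) before a point is dropped.

```python
import math
from typing import Dict, List, Tuple


def peel_the_onion(
    components: Dict[str, Tuple[int, float]], k: float = 3.0, max_rounds: int = 5
) -> List[List[str]]:
    """Repeated one-sided (upper-limit-only) u-chart screening.

    components maps a name to (defects, size). Each round recomputes the
    center line and per-component upper limits from the remaining
    components, reports those above their limit, and excludes them from
    the next round.
    """
    remaining = dict(components)
    rounds: List[List[str]] = []
    for _ in range(max_rounds):
        if not remaining:
            break
        total_defects = sum(d for d, _ in remaining.values())
        total_size = sum(s for _, s in remaining.values())
        u_bar = total_defects / total_size
        flagged = [
            name
            for name, (d, s) in remaining.items()
            if d / s > u_bar + k * math.sqrt(u_bar / s)
        ]
        if not flagged:
            break
        rounds.append(flagged)
        for name in flagged:
            del remaining[name]  # stands in for "cause understood, action planned"
    return rounds


if __name__ == "__main__":
    # Hypothetical data: eight components of 10 KLOC each.
    data = {"A": (80, 10.0), "B": (30, 10.0)}
    data.update({c: (10, 10.0) for c in "CDEFGH"})
    for i, names in enumerate(peel_the_onion(data), start=1):
        print(f"iteration {i}: {names}")
```

On these data the first iteration flags only A. B becomes visible in the second iteration once A no longer inflates the center line, which is the masking effect described above.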
Figure 5.13. Pseudo-Control Chart of Test Defect Rate: Second Iteration
Another example, also from IBM Rochester, is charting the inspection effectiveness by area for the several phases of reviews and inspections, as shown in Figure 5.14. Effectiveness is a relative measure expressed as a percentage, with the numerator being the number of defects removed in a development phase and the denominator the total number of defects found in that phase plus defects found later (for detailed discussion on this subject, see Chapter 6). In the figure, each data point represents the inspection effectiveness of a functional development area. The four panels represent high-level design review (I0), low-level design review (I1), code inspection (I2), and overall effectiveness combining all three phases (lower right). Areas with low effectiveness (below the warning and control limits), as well as those with the highest effectiveness, were studied and contributing factors identified. As a result of this control charting and subsequent work, the consistency of the inspection effectiveness across the functional areas was improved.
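The effectiveness values plotted in such a chart can be computed directly from the definition in the preceding paragraph. In this Python sketch the per-area counts are hypothetical, and it is assumed for simplicity that the "found later" count already contains only defects that escaped the phase in question.

```python
from typing import Dict, Tuple


def effectiveness(found_in_phase: int, found_later: int) -> float:
    """Phase effectiveness in percent, following the text's definition:
    defects removed in the phase / (defects found in the phase + found later).
    """
    total = found_in_phase + found_later
    if total == 0:
        raise ValueError("no defects recorded for this phase")
    return 100.0 * found_in_phase / total


if __name__ == "__main__":
    # Hypothetical code-inspection (I2) counts per functional area:
    # (defects found at I2, defects that escaped I2 and were found later).
    areas: Dict[str, Tuple[int, int]] = {
        "area 1": (120, 80),
        "area 2": (150, 50),
        "area 3": (90, 110),
    }
    for name, (found, later) in areas.items():
        print(f"{name}: {effectiveness(found, later):.1f}%")
```

Each area's value is one data point; limits for these percentages would come from a p chart with $n_i$ equal to the area's denominator.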
Figure 5.14. Pseudo-Control Chart of Inspection Effectiveness
In recent years, control charts in software applications have attracted attention. The importance of using quantitative metrics in managing software development is certainly more widely recognized now than previously. A related reason may be the promotion of quantitative management by the Capability Maturity Model (CMM) of the Software Engineering Institute (SEI) at Carnegie Mellon University. The concept and terminology of control charts are very appealing to software process improvement professionals. A quick survey of examples of control chart applications in software in the literature, however, confirmed the challenges discussed earlier. For instance, many of the control limits in the examples were too wide to be useful. In such cases, simple run charts combined with common-sense decision making would be more useful, and control charts might not be needed. There were also cases with a one-sided control limit or a lower control limit close to zero. Both types of cases were likely due to problems related to multiple common causes and sample size. The multiple common cause challenge was discussed earlier. With regard to sample size, again, a production environment with ongoing operations is better able to meet the challenge. In a production environment, the subgroup sample size can be chosen according to statistical considerations, such as specifying a sample large enough to ensure a positive lower control limit. In software environments, however, other factors often prevent operations from being based on statistical considerations. At the same time, it is encouraging that experts have recognized the problems, begun identifying the specific issues, started the discussions, and embarked on the process of mapping possible solutions (e.g., Layman et al., 2002).
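The requirement of a sample "large enough to ensure a positive lower control limit" can be made concrete. For a u chart, requiring $\bar{u} - 3\sqrt{\bar{u}/n} > 0$ gives $n > 9/\bar{u}$. For a p chart, requiring $\bar{p} - 3\sqrt{\bar{p}(1-\bar{p})/n} > 0$ gives $n > 9(1-\bar{p})/\bar{p}$. The Python sketch below computes these thresholds; the function names and the example rates are hypothetical.

```python
def min_size_u_chart(u_bar: float, k: float = 3.0) -> float:
    """Subgroup size must exceed this for LCL = u_bar - k*sqrt(u_bar/n) > 0."""
    if u_bar <= 0:
        raise ValueError("u_bar must be positive")
    # u_bar > k*sqrt(u_bar/n)  <=>  u_bar**2 > k**2 * u_bar / n  <=>  n > k**2 / u_bar
    return k * k / u_bar


def min_size_p_chart(p_bar: float, k: float = 3.0) -> float:
    """Subgroup size must exceed this for LCL = p_bar - k*sqrt(p_bar*(1-p_bar)/n) > 0."""
    if not 0.0 < p_bar < 1.0:
        raise ValueError("p_bar must lie strictly between 0 and 1")
    # p_bar**2 > k**2 * p_bar * (1 - p_bar) / n  <=>  n > k**2 * (1 - p_bar) / p_bar
    return k * k * (1.0 - p_bar) / p_bar


if __name__ == "__main__":
    # Hypothetical: 0.5 defects/KLOC needs subgroups larger than 18 KLOC.
    print(min_size_u_chart(0.5))
    # Hypothetical: a 25% proportion needs more than 27 items per subgroup.
    print(min_size_p_chart(0.25))
```

Low defect rates push the threshold up quickly, which is one reason software charts so often end up with a lower limit at or near zero.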
To make control charts more applicable and acceptable in the software environment, a high degree of ingenuity is required. Focused effort in the following three areas by experts of control charts and by software process improvement practitioners will yield fruitful results:
In general, data from software maintenance is easier for control charting because it meets the basic assumption of time-related sequential data. For the problem backlog example, even for software maintenance data (i.e., field problem backlog), we recommend using a metric in which the effect of a possible second common cause (such as the cyclical pattern of problem arrivals due to the delivery of new products to the customers) is partialled out. (Refer to the backlog management index discussed in section 4.3.1 in Chapter 4.)
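As a sketch of how such a metric might be charted, the backlog management index is computed here as problems closed divided by problem arrivals in the same month, times 100, which is the usual statement of the BMI (check it against section 4.3.1). Because maintenance data are time-ordered, an individuals (XmR) chart is used for illustration. That chart choice, the function names, and the monthly counts are assumptions, not taken from the text.

```python
from typing import List, Sequence, Tuple


def backlog_management_index(closed: Sequence[int], arrivals: Sequence[int]) -> List[float]:
    """Monthly BMI in percent: problems closed / problem arrivals * 100."""
    if any(a == 0 for a in arrivals):
        raise ValueError("BMI is undefined for a month with no arrivals")
    return [100.0 * c / a for c, a in zip(closed, arrivals)]


def individuals_chart(values: Sequence[float]) -> Tuple[float, float, float]:
    """(LCL, center, UCL) for an individuals (XmR) chart.

    Sigma is estimated from the average moving range; 2.66 = 3 / 1.128,
    where 1.128 is the d2 constant for moving ranges of two points.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


if __name__ == "__main__":
    # Hypothetical monthly field-problem counts.
    closed = [90, 110, 100, 95, 120]
    arrivals = [100, 100, 100, 100, 100]
    bmi = backlog_management_index(closed, arrivals)
    lcl, center, ucl = individuals_chart(bmi)
    print(f"BMI = {bmi}")
    print(f"LCL = {lcl:.1f}, center = {center:.1f}, UCL = {ucl:.1f}")
```

For these counts the center line is 103, with limits of about 63.1 and 142.9, so none of the five months signals. Because BMI is a ratio of closures to arrivals, it partly normalizes away swings in arrival volume.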
As another hypothetical example, we suggest that metrics related to defect removal effectiveness (see discussions in Chapter 6) are candidates for control charting for software development organizations that deliver a number of products or releases of products within a relatively short period of time. In this case, each product or release is a data point in the control chart. The data are still time-related and sequential, but the data points are farther apart in time, so one could call such charts macro-level pseudo-control charts. It is established in the software engineering literature that the higher the defect removal effectiveness, the better field quality a product will have. With a number of products or releases in the field, one can even establish an empirical correlation between the defect removal effectiveness values and actual field quality levels (using nonparametric statistics if the sample size is small). The results can be used to reset the center line of the control chart. The process capability of the organization can then be measured directly and expressed in SPC terms. When the process is under control, it means that the organization is able to keep delivering products that meet certain quality levels in the field. If a software development organization developed five products and provided two releases of each product each year, in one year there would be ten data points. Therefore, it would not take long to form such a control chart. For more data points and more granular control, the unit of observation can be applied to development teams so that a given project will have a number of data points. In addition to the overall defect removal effectiveness, this approach can be applied to specific effectiveness metrics such as inspection effectiveness and test effectiveness.
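The text suggests nonparametric statistics when only a few products or releases are available. Spearman's rank correlation is one common choice; the text does not name a specific test. This Python sketch implements it from scratch (Pearson correlation on average ranks), and the release data are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("a sample with all-equal values has no rank correlation")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical releases: defect removal effectiveness (%) and
    # field defect rate (defects per KLOC) observed after release.
    dre = [82.0, 75.0, 90.0, 70.0, 86.0]
    field = [0.7, 1.0, 0.4, 1.3, 0.6]
    print(f"Spearman rho = {spearman_rho(dre, field):.2f}")
```

These made-up numbers are perfectly inversely ranked, so rho is -1; real data would be noisier, and with very few points the significance of rho should be checked against a table of critical values.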
As a real-life example, Lipke (2002) successfully applied control chart techniques to two indicators in project management, based on empirical data at the Oklahoma City Air Logistics Center. The two indicators are the schedule performance index (SPI) and the cost performance index (CPI), which are expressed in earned value terminology in the project management literature. Simply put, the project schedule or cost is on target when the index is 1, ahead of plan when the index is higher than 1, and behind plan when the index is below 1. Such control charts are meaningful because, while the project is under way, as long as the two indexes are under control, the final outcome will be successful, in this case schedule-wise and cost-wise. Lipke also made adjustments to the indexes so that the assumptions of control charts were met.
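The two indexes use standard earned value quantities: SPI is earned value divided by planned value, and CPI is earned value divided by actual cost. The Python sketch below computes them and applies the interpretation given above. The figures and function names are hypothetical, and Lipke's adjustments for meeting control chart assumptions are not reproduced.

```python
def schedule_performance_index(earned_value: float, planned_value: float) -> float:
    """SPI = EV / PV (earned value over planned value)."""
    if planned_value <= 0:
        raise ValueError("planned value must be positive")
    return earned_value / planned_value


def cost_performance_index(earned_value: float, actual_cost: float) -> float:
    """CPI = EV / AC (earned value over actual cost)."""
    if actual_cost <= 0:
        raise ValueError("actual cost must be positive")
    return earned_value / actual_cost


def status(index: float, tol: float = 1e-9) -> str:
    """Interpret an index: 1 is on target, above 1 ahead, below 1 behind."""
    if abs(index - 1.0) <= tol:
        return "on target"
    return "ahead of plan" if index > 1.0 else "behind plan"


if __name__ == "__main__":
    # Hypothetical period-to-date values in the same currency units.
    ev, pv, ac = 90.0, 100.0, 80.0
    spi = schedule_performance_index(ev, pv)
    cpi = cost_performance_index(ev, ac)
    print(f"SPI = {spi:.3f} ({status(spi)})")  # 0.900, behind plan
    print(f"CPI = {cpi:.3f} ({status(cpi)})")  # 1.125, ahead of plan
```

A series of these per-period values would be the data points on the charts; for cost, "ahead of plan" means the work performed cost less than budgeted.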