5.5 Collecting measurements

Once an organization has decided which metrics to use, it can turn its attention to collecting the necessary measures. From the SQS's point of view, most measures will be related to defects.

5.5.1 Classification of defects

As defects are detected, analyzed, and corrected, a great deal of data becomes available to the software quality assurance practitioner. Classifying defects makes that data usable both to guide defect resolution now and to identify software development process weaknesses or predict problem areas in the future. This is the connection, or bridge, between software quality control (finding and fixing defects) and software quality assurance (analyzing and improving the development process). Defects can and do occur in any phase of the SLC. The data gathered through defect classification can direct additional testing of software, point out inherent anomalies in requirements or design, call attention to needed enhancements to operational software, and give guidance in the correction of current defects.

Defects can be classified according to their various basic characteristics (see Figure 5.1), which should include at least the following:

  • Severity of the defect if encountered in operation;

  • Priority of immediate repair;

  • Source (life-cycle phase) of the defect;

  • Type of defect;

  • Phase (life-cycle phase) in which the defect was found;

  • Method by which the defect was found;

  • Estimated and actual costs to repair the defect.
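
To make such a scheme concrete, each defect report can be captured as a structured record carrying all of these characteristics. The following is a minimal sketch in Python; the class, field, and category names (DefectRecord, Severity, Phase, and so on) are illustrative assumptions, not part of any particular tool or of the text.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = 1   # life-threatening or potential loss of property
    MAJOR = 2      # wrong results; blocks further work
    MINOR = 3      # wrong results; work can continue
    COSMETIC = 4   # no functional impact


class Phase(Enum):
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    CODE = "code"
    TEST = "test"
    OPERATION = "operation"


@dataclass
class DefectRecord:
    """One classified defect, following the characteristics listed above."""
    defect_id: str
    severity: Severity          # impact if not fixed immediately
    priority: int               # repair priority (1 = fix now)
    source_phase: Phase         # life-cycle phase where the defect entered
    defect_type: str            # e.g., "I/O", "arithmetic", "control"
    found_phase: Phase          # life-cycle phase in which it was found
    found_by: str               # detection method, e.g., "inspection", "unit test"
    est_repair_cost: float      # estimated cost to repair
    actual_repair_cost: float   # actual cost to repair, filled in after the fix
```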

The severity of a defect is an indication of the impact of not fixing it immediately. A defect that presents a life-threatening situation, or that could result in the loss of property if not fixed, is a very severe defect indeed. On the other hand, some defects may produce a wrong answer from a calculation but do not, for instance, hold up further testing until they are corrected. Such defects would be fairly nonsevere until they began to impact the test program itself. (This shows that severity can be a function of the situation as well as of the immediate impact.)

Related to, and sometimes dependent on, the severity of a defect is the repair priority that is assigned to it. Usually a life-threatening defect will be addressed immediately, and a noninterfering defect will be addressed when there is time. This, of course, is not a hard and fast rule. There will be occasions in which a severe defect can be isolated so that work can continue in other areas. Some defects may be of such complexity or wide-reaching effect that they cannot be repaired without extended study or serious impact on resources. These defects may be addressed immediately but require a solution that is a long time in coming. Recognition that work can continue while the defect in question is being solved can give it a lower priority. Other factors may affect the priority as well, not the least of which is visibility. A relatively minor screen format defect may become a top priority defect if it is in a highly visible demonstration that will affect the future funding of a software project.

A piece of classification data that is often overlooked is the source, or genesis, of the defect. This is an indication of where the original error was made and where the defect entered the product. It also points to areas in the SDLC that may profit from increased attention by software quality practitioners. When later correlation shows a high concentration of defects that can be traced back to the requirements, it is probably wise to spend more effort on the generation and review of requirements in future projects. Likewise, a preponderance of coding errors may indicate the need for better programmer training. By looking at the data collected on multiple projects, the quality assurance practitioner can suggest changes to management that affect the software development process. New data from projects begun after the process changes have been made can provide information on the effectiveness of those modifications.
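
Given records like the hypothetical DefectRecord sketched above, finding such concentrations is a simple tally by source phase:

```python
from collections import Counter


def defects_by_source(records):
    """Count defects by the life-cycle phase in which they originated."""
    return Counter(r.source_phase for r in records)


# A heavy concentration in one phase, e.g.,
#   defects_by_source(records).most_common(1) -> [(Phase.REQUIREMENTS, 42)]
# suggests strengthening that phase's work and reviews on future projects.
```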

The type of defect encountered is one indication of weakness in design, implementation, or even support software. I/O defects involve the transfer of data into or out of the object, module, or other part of the software system. These transfers may be internal to the system or external to the software, as with a key entry or printer action. When seen frequently, I/O-type defects may suggest an operating system that is difficult to use. Arithmetic defects are problems in computations that may indicate weak support routines or less-than-desirable coding practices. Arithmetic defects can also be caused by incorrect requirements, such as an incorrectly specified equation. Control defects occur primarily in decisions within the software. Improperly indexed loops, wrong exits from decision points, and improper transfers of control between objects or modules are examples of control defects. Control defects are often indicative of design or requirements deficiencies.

Two additional characteristics are less defect-based and more indicative of the benefit of the detection techniques being used. The phase in which defects are found, that is, in what part of the SLC, can be compared with the source phase to evaluate the various review methods being used. Capturing the method by which defects are found permits direct comparison of the efficiency of the different methods and can also indicate which defect detection methods are more successful against various types and sources of defects.
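
One common presentation of the first comparison is a two-way tally of source phase against detection phase; the second is a tally of detection method against defect type. A minimal sketch, again assuming the hypothetical DefectRecord fields:

```python
from collections import defaultdict


def phase_matrix(records):
    """Tally defects by (source phase, phase found).

    A defect found in the phase that produced it was caught early; entries
    far from that diagonal show defects that escaped one or more phases,
    which points at the reviews in between.
    """
    matrix = defaultdict(int)
    for r in records:
        matrix[(r.source_phase, r.found_phase)] += 1
    return dict(matrix)


def method_by_type(records):
    """Tally defects by (detection method, defect type) to compare methods."""
    counts = defaultdict(int)
    for r in records:
        counts[(r.found_by, r.defect_type)] += 1
    return dict(counts)
```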

Finally, the estimated and actual costs to repair lead to evaluations of the estimation techniques employed and can be useful in calculating the COQ (see Section 5.4.2.3).
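
A sketch of both uses, assuming the same hypothetical cost fields: the mean relative estimation error checks the estimation technique, and the summed actual cost is one failure-cost input to the COQ.

```python
def mean_estimation_error(records):
    """Mean relative error of repair-cost estimates (positive = underestimated)."""
    errors = [
        (r.actual_repair_cost - r.est_repair_cost) / r.est_repair_cost
        for r in records
        if r.est_repair_cost > 0
    ]
    return sum(errors) / len(errors) if errors else 0.0


def total_repair_cost(records):
    """Sum of actual repair costs: one failure-cost input to the COQ."""
    return sum(r.actual_repair_cost for r in records)
```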

5.5.2 Other defect measures

Certainly not all the measures will be restricted to defect classifications. Countless other defect-related measures can be made. The following list is not intended to be complete, but rather to suggest some potentially useful measures:

  • Number of defects;

  • Defect frequencies;

  • STRs (software trouble reports) open and resolved;

  • Time between defect detections;

  • Defects resulting from correction of previous defects;

  • Size of change;

  • Incorrect defect reports (incorrect information on STRs).

The number and frequencies of defects can be used to detect defect-prone products or processes. These measures are usually taken with reference to specific parts of the system or its documentation. Once the system is in production, these counts may be used to monitor the system's maturity or its need for maintenance. Modules or documents with higher-than-average defect counts may need redesigning or rewriting. In addition, high defect counts or frequencies in a particular product may require a company to redeploy its defect detection efforts.
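
Flagging defect-prone modules can be as simple as comparing each module's count against the average across modules. A minimal sketch, with an arbitrary threshold chosen purely for illustration:

```python
def defect_prone_modules(counts, threshold=1.5):
    """Return modules whose defect count exceeds `threshold` times the average.

    `counts` maps module name -> number of defects logged against it.
    The 1.5 multiplier is an illustrative choice, not a recommended value.
    """
    if not counts:
        return []
    average = sum(counts.values()) / len(counts)
    return sorted(m for m, c in counts.items() if c > threshold * average)


# "parser" stands out against an average of about five defects per module.
print(defect_prone_modules({"parser": 12, "ui": 3, "db": 4, "report": 2}))
```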

Defects tend to clump. A quality control adage is that if a defect is found, look in the same area for more. Since defect detection resources are always limited, this adage can give an organization clues as to where to concentrate quality control activities. High counts or frequencies spread more or less evenly across products may indicate a development process problem. The quality assurance practitioner should always be alert to process flaws that may be indicated by inordinate defect experience.

Open and resolved STR counts can be used to determine defect detection and correction productivity, identify poor defect analysis and isolation methods, detect flawed defect correction techniques, and so on. The number of resolved STRs can be compared to the number of newly opened, or still open, STRs to monitor correction activities.
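
The open-versus-resolved comparison reduces to a running backlog count. A minimal sketch, assuming each STR is recorded as an (opened, resolved) date pair with resolved left as None while the STR is still open:

```python
import datetime


def str_backlog(strs, as_of):
    """Count STRs opened on or before `as_of` that are not yet resolved."""
    return sum(
        1 for opened, resolved in strs
        if opened <= as_of and (resolved is None or resolved > as_of)
    )


# A backlog that grows week over week suggests that correction is not
# keeping pace with detection.
d = datetime.date
strs = [(d(2002, 3, 1), d(2002, 3, 5)), (d(2002, 3, 2), None)]
print(str_backlog(strs, d(2002, 3, 6)))  # -> 1
```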

The time between defect detections, whether indicated directly by date and time or via mean time to failure, can be used in several ways. Longer times may indicate a reduced defect level, or they may indicate reduced defect detection success or effort. Stable, or shortening, times might indicate that defects are being introduced as modifications are made, or that detection effort has increased or the detection process has improved.
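
The mean time between detections can be computed directly from a sorted list of detection timestamps. A minimal sketch:

```python
from datetime import timedelta


def mean_time_between_detections(timestamps):
    """Average gap between consecutive detections (timestamps pre-sorted).

    A lengthening gap may mean fewer defects remain, or merely that less
    detection effort is being applied, so interpret it alongside effort data.
    """
    if len(timestamps) < 2:
        return None
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps, timedelta()) / len(gaps)
```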

Defects resulting from the resolution of other defects are known to be frequent. This measure will aid in the identification of poor defect resolution processes or insufficient quality control of software modifications.
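
One way to track this is the fraction of all defects that were introduced by earlier fixes. A sketch, assuming each record is tagged during analysis with a hypothetical introduced_by field naming the fix that caused it (None for ordinary defects); this field is an assumption, not part of the record sketched earlier:

```python
def fix_induced_fraction(records):
    """Fraction of defects introduced while fixing another defect.

    A rising fraction points at a weak correction process or insufficient
    quality control of software modifications.
    """
    if not records:
        return 0.0
    induced = sum(1 for r in records if getattr(r, "introduced_by", None))
    return induced / len(records)
```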

The size of the change is often one of the comparative measures used to develop various metrics. In combination with other measures, size can be a normalizing factor. Do not compare data from small, short projects with data from large or long-schedule projects. Such comparisons are often invalid and can lead to erroneous conclusions. For example, if two projects both have 10 STRs opened per day, one might presume that the defect levels were about equal. However, when it is discovered that the first is a 3-month project involving two people and the second a 3-year project with 25 participants, a rather different conclusion about their respective defect levels will likely be drawn.
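
The example above amounts to normalizing the raw STR rate by effort before comparing. A minimal sketch of that arithmetic:

```python
def strs_per_person_day(strs_per_day, people):
    """Normalize a raw STR arrival rate by team size."""
    return strs_per_day / people


# The two projects from the text: both log 10 STRs per day, but per
# person the smaller project is far more defect-dense.
print(strs_per_person_day(10, 2))   # small project: 5.0 STRs per person-day
print(strs_per_person_day(10, 25))  # large project: 0.4 STRs per person-day
```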

Not all reported defects are defects. In some cases, the observed operation of the system is not wrong, just unexpected; for example, the expected results in the test case may themselves be incorrect, or the report may come from an inexperienced system user. In other cases, the data entered on the STR may be incorrect (e.g., a wrong module name, an incorrect document reference, or a wrong version). The quality assurance practitioner will want to determine what causes the incorrect STRs. Training users or defect reporters may be necessary, or better user documentation might be the answer. In any case, it is not productive to try to correct defects based on incorrect reports.

5.5.3 Nondefect measures

Defect analysis depends on defect data, but defect data alone is not sufficient for most metrics. Nondefect data is usually the basis for product and process metrics. In some cases, it forms the whole metric, as noted in Section 5.4.2.

Some nondefect measures are readily available and are in hard numbers. These include project size, budget and schedule figures, clock and processor time, number of people involved in an activity, and the like. These measures can be taken directly, and no interpretation of them is usually needed.

For the software quality practitioner, some measures are not available in hard numbers but rely on quantification of subjective data. These soft measures include customer impressions, perceived quality on some subjective scale, estimates of quality, and so on. Soft measures should be used with care, for there is often no precise way to quantify or validate them.

Derived measures are those that cannot be taken directly through either hard or soft means but must be inferred from other measures. One such derived measure might be quality, which is ranked good (a soft measure) because 90 users out of 100 (a hard measure) did not return survey forms and, thus, are presumed not to be dissatisfied. Great care must be exercised with measures such as these. Only organizations with significant experience in metrics should consider using derived measures.


