5.3 Measures of Software Quality

The majority of software quality attributes relate to the operation of the code in terms of, for example, performance, availability, and security. Two aspects of software quality, however, represent static attributes: software faults and software maintainability. We can measure maintainability, but we cannot measure software faults directly. If we had a tool that could peruse a source code base and identify its faults, the progress that could be made in software development would be astonishing. Sadly, we are only able to recognize faults when they cause problems or through intense scrutiny of individual source code modules.

5.3.1 Software Faults

Unfortunately, there has been no precise definition of just what a software fault is, a problem that we intend to ameliorate. [18] In the face of this difficulty, it has been rather difficult to develop meaningful associative models between faults and metrics. For our purposes, a fault is a physical characteristic of the system whose type and extent can be measured using the same ideas used to measure the properties of more traditional physical systems. People making errors in their tasks introduce faults into a system. These errors can be errors of commission or errors of omission. There are, of course, differing etiologies for each fault. Some faults are attributable to errors in the specification of requirements; some faults are directly attributable to errors committed in the design process; and finally, there are faults that are introduced directly into the source code. We will concern ourselves with source code faults in this chapter.

There are two major subdivisions of faults in our fault taxonomy: faults of commission and faults of omission. Faults of commission involve deliberate, albeit unwitting, implementation of a behavior that is not part of the specification or design. Faults of omission involve lapses wherein a behavior specified in the design was not implemented. It is important to make these distinctions, especially so the inspection protocol can be used as a checklist for specific faults that have been found in the past.

To count faults, there must be a well-defined method of identification that is repeatable, consistent, and identifies faults at the same level of granularity as our static source code measurements. In a careful examination of software faults over the years, we have observed that the overwhelming number of faults recorded as code faults are really design faults. Some software faults are really faults in the specification. The design implements the specification and the code implements the design. We must be very careful to distinguish among these fault categories.

There may be faults in the specification. The specification may not meet the customer's needs. If this problem first manifests itself in the code, it still is not a code fault. It is a fault in the program specification, or a specification fault. The software design may not implement the software requirements specification. Again, these design problems tend to manifest themselves during software testing. Any such design faults must be identified correctly as design faults. In a small proportion of faults, the problem is actually a code problem. In these isolated cases, the problem should be reported as a code fault.

We observed an example of this type of problem recently in a project on a large embedded software system. The program in question was supposed to interrogate a status register on a particular hardware subsystem for a particular bit setting. The code repeatedly misread this bit. This was reported as a software problem. What really happened was that the hardware engineers had implemented a hardware modification that shifted the position of the status bit in the status register. They had failed to notify the software developers of this material change in the hardware specification. The software system did exactly what it was supposed to do. It is just that this no longer met the hardware requirements. Yet the problem remains on record to this date as a software fault.

It is clear, then, that the etiology of the fault must be determined. The subject of this chapter is to identify and enumerate faults that occur in source code. We ought to be able to do this mechanically; that is, it should be possible to develop a tool that can count the faults for us. Further, some program changes to fix faults are substantially larger than others, and we would like our fault count to reflect that fact. If we have accidentally mistyped a relational operator such as "<" instead of ">", this is very different from having messed up an entire predicate clause of an if statement. The actual changes made to a code module are tracked for us in configuration control systems such as RCS or SCCS as code deltas. All we must learn to do is to classify the code deltas that we make as to the origin of the fix. That is, each change to each module should reflect a specific code fault fix, a design problem, or a specification problem. If we substantially change a code module, giving it a complete overhaul, and fail to record each fault as we repair it, we will pay the price of losing the ability to resolve individual faults for measurement purposes.

We will base our recognition and enumeration of software faults on the grammar of the language of the software system. Specifically, faults are to be found in statements, executable and nonexecutable. In the C programming language we will consider the following structures to be executable statements:

      <executable_statement> ::= <labeled_statement>
                                | <expression>
                                | <selection_statement>
                                | <iteration_statement>
                                | <jump_statement>

In very simple terms, these are the structures counted by our executable statements metric, Exec. If any of the tokens that comprise such a statement change, then each of the changed tokens will represent a contribution to the fault count.

Within the framework of nonexecutable statements there is:

      <declaration> ::= <declaration_specifiers> ';'
                      | <declaration_specifiers> <init_declarator_list> ';'

We will find faults within these statements. The granularity of measurement for faults will be in terms of tokens that have changed. Thus, if I typed the following statement in C:

      a = b + c * d; 

but I had meant to type:

      a = b + c/d; 

then there is but one token that I got wrong. In this example, there are eight tokens in each statement. There is one token that has changed. There is one fault. This circumstance is very different when wholesale changes are made to the statement. Consider that this statement:

      a = b + c * d; 

was changed to:

      a = b + (c * x) + sin(z); 

We are going to assume, for the moment, that the second statement is a correct implementation of the design and that the first is not. This is clearly a coding error. (Generally, when changes of this magnitude occur, they are design problems.) In this case there are eight tokens in the first statement and fifteen tokens in the second statement. This is a fairly substantial change in the code. Our fault recording methodology should reflect the degree of the change. This is not an unreasonable or implausible notion. If we are driving our car and the car ceases to run, we will seek to identify the problem or have a mechanic do so for us. The mechanic will perform the necessary diagnostics to isolate the problem. The fan belt may have failed. That is a single problem and a simple one. The fan belt may have failed because the bearing on the idler pulley failed. We expect that the mechanic will isolate all the problems and itemize the failed items on our bill. How much information would we have if the mechanic simply reported that the engine broke? Most of us would feel that we would like to know just exactly what pieces of the engine had failed and were subsequently replaced. We expect this level of granularity in reporting engine problems. We should expect the same level of granularity of reporting on code fixes.

The important consideration with this fault measurement strategy is that there must be some indication as to the amount of code that has changed in resolving a problem in the code. We have regularly witnessed changes to tens or even hundreds of lines of code recorded as a single "bug" or fault. The only really good index of the degree of the change is the number of tokens that have changed to ameliorate the original problem. To simplify and disambiguate further discussion, consider the following definitions.

  • Definition: A fault is an invalid token or bag of tokens in the source code that will cause a failure when the compiled code that implements the source code token is executed.

  • Definition: A failure is the departure of a program from its specified functionalities.

  • Definition: A defect is an apparent anomaly in the program source code.

Each line of text in each version of the program can be seen as a bag of tokens. That is, there may be multiple tokens of the same kind on each line of the text. When a software developer changes a line of code in response to the detection of a fault, either through normal inspection, code review processes, or as a result of a failure event in a program module, the tokens on that line will change. New tokens may be added; invalid tokens may be removed; the sequence of tokens may be changed. Enumeration of faults under this definition is simple and straightforward. Most important of all, this process can be automated. Measurement of faults can be performed very precisely, which will eliminate the errors of observation introduced by existing ad hoc fault reporting schemes.

An example is useful to show this fault measurement process. Consider the following line of C code:

      (1) a = b + c; 

There are six tokens on this line of code. They are B1 = {<a>, <=>, <b>, <+>, <c>, <;>}, where B1 is the bag representing this token sequence.

Now let us suppose that the design, in fact, required that the difference between b and c be computed; that is:

      (2) a = b - c; 

There will again be six tokens in the new line of code. This will be the bag B2 = {<a>, <=>, <b>, <->, <c>, <;>}. The bag difference is B1 - B2 = {<+>, <->}. The cardinality of B1 and B2 is the same. There are two tokens in the difference. Clearly, one token has changed from one version of the module to the other. There is one fault.

Now suppose that the new problem introduced by the code in statement (2) is that the order of the operations is incorrect. It should read:

      (3) a = c - b;

The new bag for this new line of code will be B3 = {<a>, <=>, <c>, <->, <b>, <;>}. The bag difference between (2) and (3) is B2 - B3 = { }. The cardinality of B2 and B3 is the same. This is a clear indication that the tokens are the same but the sequence has been changed. There is one fault, representing the incorrect sequencing of tokens in the source code.

Now suppose that we are converging on the correct solution; however, our calculations are off by 1. The new line of code will look like this:

      (4) a = 1 + c - b; 

This will yield a new bag B4 = {<a>, <=>, <1>, <+>, <c>, <->, <b>, <;>}. The bag difference between (3) and (4) is B4 - B3 = {<1>, <+>}. The cardinality of B3 is 6 and the cardinality of B4 is 8. Clearly, there are two new tokens. By definition, there are two new faults.

It is possible that a change will span multiple lines of code. All of the tokens in all of the changed lines so spanned will be included in one bag. This will allow us to determine just how many tokens have changed in the one sequence.
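Because the enumeration is defined purely in terms of token bags, it lends itself to automation. The following C fragment is a minimal sketch of such a tool, not a production implementation: the tokenizer is deliberately crude (it treats each run of alphanumeric characters as one token and every other non-blank character as its own token), and fault_count is simply one plausible formalization of the counting conventions illustrated in examples (1) through (4) above. All of the names in this sketch are ours, introduced only for illustration.

      /*
       * Minimal sketch of token-bag fault counting.  The rule applied here:
       * the fault count for a changed line is the larger of the two multiset
       * differences (tokens removed vs. tokens added), or one fault if the
       * bags are identical but the token sequence has changed.
       */
      #include <ctype.h>
      #include <stdio.h>
      #include <string.h>

      #define MAX_TOKENS    64
      #define MAX_TOKEN_LEN 32

      /* Crude tokenizer: alphanumeric runs are single tokens, every other
         non-blank character is its own one-character token.              */
      static int tokenize(const char *line, char tok[][MAX_TOKEN_LEN])
      {
          int n = 0;
          while (*line && n < MAX_TOKENS) {
              if (isspace((unsigned char)*line)) { line++; continue; }
              int len = 0;
              if (isalnum((unsigned char)*line) || *line == '_') {
                  while ((isalnum((unsigned char)*line) || *line == '_') &&
                         len < MAX_TOKEN_LEN - 1)
                      tok[n][len++] = *line++;
              } else {
                  tok[n][len++] = *line++;
              }
              tok[n][len] = '\0';
              n++;
          }
          return n;
      }

      /* Number of tokens in bag a that cannot be matched against bag b,
         i.e. the multiset difference |a - b|.                            */
      static int bag_minus(char a[][MAX_TOKEN_LEN], int na,
                           char b[][MAX_TOKEN_LEN], int nb)
      {
          int used[MAX_TOKENS] = {0};
          int unmatched = 0;
          for (int i = 0; i < na; i++) {
              int found = 0;
              for (int j = 0; j < nb; j++) {
                  if (!used[j] && strcmp(a[i], b[j]) == 0) {
                      used[j] = 1;
                      found = 1;
                      break;
                  }
              }
              if (!found)
                  unmatched++;
          }
          return unmatched;
      }

      /* Fault count for one changed line, per the counting conventions
         illustrated in the worked examples above.                        */
      static int fault_count(const char *old_line, const char *new_line)
      {
          char oldtok[MAX_TOKENS][MAX_TOKEN_LEN];
          char newtok[MAX_TOKENS][MAX_TOKEN_LEN];
          int n_old   = tokenize(old_line, oldtok);
          int n_new   = tokenize(new_line, newtok);
          int removed = bag_minus(oldtok, n_old, newtok, n_new);
          int added   = bag_minus(newtok, n_new, oldtok, n_old);

          if (removed == 0 && added == 0) {
              /* Same bag: check whether only the token sequence changed. */
              for (int i = 0; i < n_old; i++)
                  if (strcmp(oldtok[i], newtok[i]) != 0)
                      return 1;        /* one sequencing fault            */
              return 0;                /* identical line, no fault        */
          }
          return removed > added ? removed : added;
      }

      int main(void)
      {
          printf("%d\n", fault_count("a = b + c;", "a = b - c;"));     /* 1 */
          printf("%d\n", fault_count("a = b - c;", "a = c - b;"));     /* 1 */
          printf("%d\n", fault_count("a = c - b;", "a = 1 + c - b;")); /* 2 */
          printf("%d\n", fault_count("", "update (index);"));          /* 5 */
          return 0;
      }

Run against the statement revisions above, the sketch reports one fault for the operator change, one fault for the sequencing change, and two faults for the two new tokens; a wholly new line is charged with one fault per token, as in the last test case.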

The source code control system should be used as a vehicle for managing and monitoring the changes to code that are attributable to faults and to design modifications and enhancements. Changes to the code modules should be discrete. That is, multiple faults should not be fixed by one version of the code module. Each version of the module should represent exactly one enhancement or one defect.

We will take a simple example and trace the evolution of a source code program through three successive revisions in the UNIX RCS program. The sample program is repeated below with line numbers added for future reference; the RCS header information for its revisions is shown in Exhibit 13.

Exhibit 13: RCS Header Information

1.4
date   2005.02.01.22.17.38;
author John Doe;
next   1.3;

1.3
date   2005.01.22.22.01.31;
author John Doe;
next   1.2;

1.2
date   2005.01.20.21.54.08;
author Sam Lee;
next   1.1;

1.1
date   2005.01.15.21.49.29;
author Mary Roe;
next   ;

       1  int Sum(int upper)
       2  {
       3    int sum = 0;
       4    int index = 0;
       5
       6    label:
       7      if(index < upper)
       8        {
       9          index++;
      10          sum = sum + index;
      11          goto label;
      12        }
      13      return sum;
      14  }

The program above represents version 1.1 of the program. Successive updates to this will be 1.2, 1.3, etc. The RCS system will keep track of the version number, the date and time of the update, and the author of the RCS activity. An abridged version of the RCS module structure to record these data is shown in Exhibit 13.

The odd part of RCS is that the most recent version, in this case 1.4, is kept at the top of the list and the list is numbered chronologically backwards in time. Each version keeps a pointer to the next version in the table.

The actual changes to the source code at each version are shown in Exhibit 14. The RCS program will always keep the most recent version in the file. This is shown in the table entry beginning with, in this case, version 1.4. The second entry in the record for version 1.4 is an entry beginning with the word log and delimited by @s. This is the log comment introduced by the developer. In our proposed model, this log entry would begin with the word "fault" if the version increment were attributable to a fault fix, or the word "change" if it were attributable to a change in design or requirements. The initial log entry, version 1.1, is for neither a change nor a fault fix but is the title of the program.

Exhibit 14: RCS Text Information

1.4
log
@fault: fixed relational operator
@
text
@int Sum(int upper)
{
  int sum = 0;
  int index = 0;

  label:
    if(index > upper)
      {
        index++;
        sum = sum + index;
        goto label;
      }
    update (index);
    return sum;
}
@

1.3
log
@fault: inserted call to update function
@
text
@d7 1
a7 1
    if(index <= upper)
@

1.2
log
@fault: found a problem with a relational operator
@
text
@d13 1
@

1.1
log
@Initial revision
@
text
@d7 1
a7 1
    if(index < upper)
@

Following the log entry is the text entry. In the case of RCS, the topmost text entry is the most recent version of the program. Each of the subsequent table entries shows the changes that must be made to the most recent program to change it back to a previous version. All changes are made, in RCS, by adding or deleting entire lines. Thus, to return to version 1.3 from version 1.4, the text part of record 1.3 tells us to go to line 7 (relative to 1) of the program and delete one line. That is what the line d7 1 tells us. The next text line, a7 1, says that we must add one line, again at line 7. The text that must be added is on the following line. Thus, version 1.3 will look like this:

       1  int Sum(int upper)
       2  {
       3    int sum = 0;
       4    int index = 0;
       5
       6    label:
       7      if(index <= upper)
       8        {
       9          index++;
      10          sum = sum + index;
      11          goto label;
      12        }
      13      update (index);
      14      return sum;
      15  }

Line number 7 has been changed on version 1.3. Let

B2 = {<if>, <(>, <index>, <<=>, <upper>, <)>}

represent this bag of tokens. On version 1.4, the bag of tokens is:

B1 = {<if>, <(>, <index>, <>>, <upper>, <)>}

The bag difference is B2 - B1 = {<<=>, <>>}. The cardinality of B2 is 6 and the cardinality of B1 is 6. The cardinality of the bag difference is 2. Therefore, one token has changed and we will record one fault.

To return to version 1.2 from version 1.3, we see that we must delete line 13. All of the tokens on this line were placed there in remediation of a fault. The bag representing this line of tokens is:

B3 = {<update>, <(>, <index>, <)>, <;>}

There are five tokens on this line. There was no former version of this line in version 1.2. Therefore, all of the tokens on this line were put into the program to fix a defect in the program. We will then record five faults for this fix.

Finally, to return to the initial version, 1.1, of the program, we must delete line 7 and add a new line represented by the bag

B4 = {<if>, <(>, <index>, <<>, <upper>, <)>}

This is similar to the transition between versions 1.3 and 1.4. Only one token has changed. We will record one fault for this module version.

5.3.2 Software Maintainability

Our first task in learning to measure the maintainability of software will be to understand the attributes of maintainability that can be measured. It is clear, for example, that source code maintainability is directly related to the linkage between the source and a design. It is very difficult to fix something whose functionality you do not understand. The second attribute has to do with module coupling. If a module is called without arguments from one module, modifying this module will have little impact on the program as a whole. If, on the other hand, a module is called by many modules, calls many others, and has an extensive formal parameter list, then it is clearly woven tightly into the fabric of the program. Any changes to this module will have far-reaching consequences in the program operation. Thus, there are two significant aspects of maintainability that we wish to measure: (1) the ease of modification of the module, and (2) the impact of the module change on the program containing the module.

The principal criterion measure for maintainability will, of course, be the cost in staff resources of making changes to code. We would expect that our measures of maintainability would be directly related to the cost in human resources of making changes to systems.

5.3.2.1 Traceability.

Each statement in a programming language can be identified and counted by the set of counting rules established earlier for the executable statements metric Exec. We will have complete requirements traceability if and only if each source code statement can be mapped directly back to a design element. This will be possible, of course, only if there is an appropriate design database that is maintained as the code base changes. Again, the granularity of measurement will be the program module.

We will have complete requirements traceability if we can directly map each source code statement to a design database element. We will now define a Map metric to measure this mapping. For each module source code statement that is correctly mapped to a design element, we will increment the Map metric for that code module. A maintainable program, under this definition, will have the property that Exec = Map. The traceability of a system is then measured by the relationship between total executable statements and those executable statements that are traceable to design elements; that is:

      Traceability = Exec - Map 
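As a small illustration of how this measure might be collected, consider the hypothetical sketch below; the struct, function, and module names are ours, introduced only for illustration. A module whose traceability is zero has every executable statement mapped to a design element.

      #include <stdio.h>

      /* Hypothetical per-module measurements: exec is the Exec metric
       * (executable statement count) and map is the Map metric (executable
       * statements traced to a design element).                            */
      struct module_measure {
          const char *name;
          int exec;
          int map;
      };

      /* Traceability = Exec - Map: a value of zero means every executable
         statement in the module is traceable to the design database.      */
      static int traceability(const struct module_measure *m)
      {
          return m->exec - m->map;
      }

      int main(void)
      {
          struct module_measure modules[] = {
              { "mod_a",  9,  9 },    /* fully traceable               */
              { "mod_b", 40, 36 },    /* four untraceable statements   */
          };
          int n = (int)(sizeof modules / sizeof modules[0]);

          for (int i = 0; i < n; i++)
              printf("%-6s Exec=%2d Map=%2d Traceability=%d\n",
                     modules[i].name, modules[i].exec, modules[i].map,
                     traceability(&modules[i]));
          return 0;
      }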

5.3.2.2 Coupling.

It is clear that the modules that are tightly bound to other program modules will be difficult to modify. There are two distinct attributes that must be considered in this binding. First there is the linkage of each program module to other program modules. We can measure this with the coupling metrics defined earlier. These metrics can easily be classified as measures of program maintainability. If a program module is called by one and only one program module, and calls only one program module, then the impact of any change to this module is probably local to a small number of modules. If the control structure integrating a particular module is tightly woven into the control fabric of other modules, as is the case with the object-oriented programming metaphor, then the impact of modifying a module may be very great.

The second attribute of program module binding is that of data binding between modules. We have accounted for some of this potential data binding in the measurement of data structures in the formal parameter list. The most insidious and dangerous aspect of data binding among program modules deals with global variables. We have seen hundreds of program trouble reports that deal specifically with changes made to global variables with far-reaching side effects in a host of other program modules. In essence, then, the maintainability of a program module is inversely proportional to the amount of data in global variables.

The global data structures attribute Global_DS will be the data structure metric applied to all data identifiers that are declared outside the scope of the current module. In the case of the C (or C++) programming language, if the data declaration for an identifier is not in the function module being measured, then the data structures complexity of this identifier will be used to increase the Global_DS metric for the module.
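As a hypothetical illustration (the identifier names below are ours, and the actual weights depend on the data structure metric defined earlier), consider the following fragment. In measuring update_total, the identifiers total, history, and history_count are declared outside the scope of the function and would therefore contribute to its Global_DS value; the formal parameter amount and the local variable i would not.

      #include <stdio.h>

      /* File-scope (global) data: declared outside any function. */
      static double total;            /* scalar global   */
      static double history[100];     /* global array    */
      static int    history_count;    /* scalar global   */

      /* When update_total is measured, the references to total, history, and
       * history_count are to identifiers declared outside the scope of the
       * function, so their data structure complexity is added to the module's
       * Global_DS value.  The parameter amount and local variable i are
       * captured by the other data structure measures, not by Global_DS.    */
      void update_total(double amount)
      {
          int i = history_count;      /* local variable: not Global_DS */

          total += amount;            /* global scalar contributes      */
          if (i < 100) {
              history[i] = amount;    /* global array contributes       */
              history_count = i + 1;  /* global scalar contributes      */
          }
      }

      int main(void)
      {
          update_total(3.5);
          printf("total = %.1f after %d update(s)\n", total, history_count);
          return 0;
      }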

[18]Munson, J.C. and Nikora, A.P., Towards a Quantifiable Definition of Software Faults, Proceedings of the 2002 IEEE International Symposium on Software Reliability Engineering, November 2002.


