6.5 Principal Components Analysis as a Validation Tool

It is very easy to misuse software metrics if we do not understand exactly what they measure. There have been some ill-considered attempts to design software systems based on misleading information derived from metrics. The most notable of these relates to the use of McCabe's measure of cyclomatic complexity, V(g) = Edges - Nodes + 2. This metric is a good example of an inappropriately derived metric: it is a linear composite of the two metric primitives Edges and Nodes, attributes of the control flowgraph representation of a program module. We would strongly suspect that there is little or no information in V(g) above and beyond these metric primitives. We will see that this is exactly the case.
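To make the point concrete, here is a minimal Python sketch of the V(g) computation. The edge and node counts in the example are hypothetical, not taken from the text; the sketch simply shows that V(g) is nothing more than a fixed linear combination of its two primitives.

def cyclomatic_complexity(edges: int, nodes: int) -> int:
    """McCabe's V(g) = Edges - Nodes + 2 for a single connected module."""
    return edges - nodes + 2

# A hypothetical flowgraph with 9 edges and 7 nodes
# (three binary decision points) yields V(g) = 4.
print(cyclomatic_complexity(edges=9, nodes=7))  # -> 4

Because the function is a deterministic linear composite, any variation in V(g) across modules must already be present in the Edges and Nodes measurements themselves.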

For some unknown reason, magic values of cyclomatic complexity are now being incorporated into the requirements specifications of some software systems. For example, a specification might require that no program module in the system have a cyclomatic complexity greater than some arbitrary value of, say, 15. This is a clear example of how software measures can be misused in the design process. If we base software development decisions on a particular metric, then it behooves us to validate that metric first.

There are potentially catastrophic consequences associated with this univariate design criterion. First, there is little or no empirical evidence to suggest that a module whose cyclomatic complexity is greater than 15 is materially worse than one whose cyclomatic complexity is 14. Second, and most important, if we find in the process of designing a software module that it has a cyclomatic complexity greater than 15, the most obvious and common solution is to divide the module into two distinct modules. We will then certainly have two modules whose cyclomatic complexity is less than 15. The difficulty is that instead of one program module, we have created two, or possibly three, in its place. This increases the macro-level complexity of the system: we have decreased cyclomatic complexity, but we have increased coupling complexity among the new modules. The result of this ill-informed decision may well be that total system complexity increases. This, in turn, will probably lead to a concomitant increase in total faults.

Now let us take the first step in the validation process and see whether the V(g) metric does, in fact, contribute new information to our understanding of the complexity of a software module. For V(g) to contribute new information to our understanding of software complexity, we would expect that a new source of variation directly attributable to this metric would be visible to us when we perform the PCA on the data containing the new metric. This analysis is shown in Exhibit 9.

Exhibit 9: An Analysis of Cyclomatic Complexity

Metric     Size    Control
η1         0.67    0.35
η2         0.87    0.35
N1         0.92    0.29
N2         0.93    0.28
Exec       0.91    0.35
LOC        0.82    0.38
Nodes      0.47    0.83
Edges      0.45    0.84
V(g)       0.38    0.85
Paths      0.37    0.65
Cycles     0.12    0.84
Maxpath    0.37    0.90
Avepath    0.35    0.90
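As Exhibit 9 shows, V(g) loads on the same Control component as Edges and Nodes; no new source of variation appears. Loadings of this kind are produced by a standard principal components extraction of the metrics' correlation matrix. The following minimal Python sketch shows how such loadings are computed; the data matrix here is hypothetical random data, not the study's data, and the published exhibits may additionally reflect an orthogonal (e.g., varimax) rotation that is not shown.

import numpy as np

# Rows of X are program modules, columns are the 13 raw metrics
# (eta1, eta2, N1, N2, Exec, LOC, Nodes, Edges, V(g), Paths,
#  Cycles, Maxpath, Avepath). Hypothetical data for illustration.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 13))

# Standardize so the PCA operates on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)

# Eigendecomposition of the correlation matrix, largest eigenvalues first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: correlations between each metric and each retained component.
# In the study, the first two components correspond to the Size and
# Control domains of Exhibit 9.
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])
print(np.round(loadings, 2))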

From the attribute domain model, we can see that there are many distinct software complexity domains. If we make design decisions on the basis of incomplete measurements, we run the risk of creating bad designs. A design decision may reduce a measurement in one domain while causing a concomitant increase in measures in other complexity domains. The net result of such a univariate decision is that the total complexity of the software system may rise. While we may have fewer faults associated with control complexity, we run the risk of increasing the count of faults associated with coupling.

We now turn our attention to the Halstead software science metrics. Essentially all of the Halstead metrics were derived from the metric primitives η1, η2, N1, and N2. It should be clear by now that program vocabulary (η = η1 + η2) and program length (N = N1 + N2) do not represent new sources of variation. When we augment our original 12 metrics with these two new metrics and perform the PCA on the resulting set of 14 metrics, we can clearly see that the new metrics vary in precisely the same way as the metric primitives from which they were derived. The results of this analysis are shown in Exhibit 10.

Exhibit 10: Halstead Vocabulary and Program Length

Metric     Size    Control
η1         0.66    0.38
η2         0.87    0.36
η          0.87    0.38
N1         0.93    0.29
N2         0.93    0.28
N          0.93    0.28
Exec       0.91    0.34
LOC        0.80    0.39
Nodes      0.47    0.81
Edges      0.45    0.82
Paths      0.35    0.67
Cycles     0.13    0.85
Maxpath    0.37    0.91
Avepath    0.34    0.91
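The underlying reason is purely algebraic: a derived metric that is an exact linear composite of existing primitives cannot add a new dimension of variation to the data. This can be checked directly, as in the following minimal sketch with hypothetical data; appending the composite column N = N1 + N2 leaves the rank of the data matrix unchanged.

import numpy as np

# Hypothetical operator and operand counts for 200 modules.
rng = np.random.default_rng(1)
N1 = rng.lognormal(size=200)
N2 = rng.lognormal(size=200)
X = np.column_stack([N1, N2])

# Augment with Halstead program length N = N1 + N2.
X_aug = np.column_stack([X, N1 + N2])

# The composite column is perfectly collinear with its primitives,
# so the number of independent sources of variation is unchanged.
print(np.linalg.matrix_rank(X))      # -> 2
print(np.linalg.matrix_rank(X_aug))  # -> 2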

It should not be necessary to revisit the rest of the Halstead software science metrics. Because they are derivatives of the four metric primitives, we cannot expect them to disclose new sources of variation. As pointed out earlier in this chapter, the exceptions are the measures of effort based on the work of Stroud. The Halstead metrics based on Stroud's work do, in some cases, identify new sources of variation. They will, however, fail to meet our additional requirements for metric validity in that they do not provide predictive validity for our criterion measure of software faults.


