6.8 A Unitary Measure of Software Complexity

To simplify the structure of software complexity even further than the orthogonal domains produced by the principal component analysis, it would be useful if each of the program modules in a software system could be characterized by a single value representing some cumulative measure of complexity. The objective is to select a linear function g, such that g(x) = ax + b, where x is some unitary measure of program complexity, so that g is related to software faults, either directly or inversely. The more closely related x is to software faults, the more valuable the function g will be in the anticipation of software faults. Previous research has established that the fault index (FI) has properties that might be useful in this regard. The FI metric is a weighted sum of a set of uncorrelated attribute domain metrics.^[5],^[6] This metric represents each raw metric in proportion to the amount of unique variation contributed by that metric.

The FI of the factored program modules can be represented as follows:

$$FI_i = \sum_{j=1}^{m} \lambda_j d_{ji}$$

where m is the number of domains, \lambda_j is the eigenvalue associated with the j^th factor, and d_{ji} is the factor score of the i^th program module on the j^th domain. Each of the eigenvalues represents the relative contribution of its associated domain to the total variance explained by all of the domains. In essence, then, the FI metric is a sum of the individual domain metrics, each weighted by its eigenvalue, so that every raw complexity metric is represented in proportion to the amount of unique variation it contributes.

The eigenvalues for the domain scores presented in Exhibit 17 are 5.29 and 4.77 for the size and control domains, respectively. If we weight each of the columns of Exhibit 17 by these two values, we will get a new vector of FI values for the 20 modules. These FI values are shown in Exhibit 18.

Exhibit 18: FI Values for the 20 PASS Modules

Module	FI
1	19.03
2	-3.77
3	20.80
4	-3.02
5	7.77
6	11.62
7	-3.12
8	46.10
9	21.44
10	-0.93
11	-2.74
12	11.89
13	2.59
14	1.23
15	0.56
16	-0.02
17	-2.41
18	15.34
19	7.50
20	-0.29
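
The weighting step that produces values like those in Exhibit 18 can be sketched in a few lines of Python. This is only an illustration: the eigenvalues 5.29 and 4.77 come from the text, but the function name `fault_index` and the domain scores in `hypothetical_scores` are assumptions made for the example and are not the values of Exhibit 17.

```python
# Eigenvalues quoted in the text for the size and control domains.
EIGENVALUES = (5.29, 4.77)


def fault_index(domain_scores, eigenvalues=EIGENVALUES):
    """Return the eigenvalue-weighted sum of one module's domain scores."""
    if len(domain_scores) != len(eigenvalues):
        raise ValueError("need exactly one domain score per eigenvalue")
    return sum(weight * score for weight, score in zip(eigenvalues, domain_scores))


# Hypothetical (size, control) domain scores; NOT taken from Exhibit 17.
hypothetical_scores = {
    "A": (1.0, 1.0),
    "B": (-1.0, 0.0),
    "C": (0.0, 2.0),
}

# One FI value per module, rounded to two decimals as in Exhibit 18.
fi_values = {
    name: round(fault_index(scores), 2)
    for name, scores in hypothetical_scores.items()
}

for name, fi in fi_values.items():
    print(name, fi)
```

Because the weights are the eigenvalues, a unit change in the size domain moves FI slightly more than a unit change in the control domain, which reflects the larger share of variance attributed to size.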

The role of the fault index in software development is best understood in terms of classification. Through the use of the FI metric, individual programs and program modules can be ordered and grouped by this single measure. Complex programs are known to require a disproportionate amount of development effort and are also known to contain a disproportionate number of faults. The FI metric provides a simple mechanism for aggregating the many similar complexity metrics into a single metric that is a linear compound of the variance components of the set of metrics used to describe a program or a set of programs.
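
As a rough illustration of arranging modules by this single measure, the following Python sketch sorts the 20 modules of Exhibit 18 by their FI values. The FI values are those of the exhibit; the function names `rank_by_fi` and `top_modules`, and the choice of five modules as the group of interest, are assumptions made for the example rather than part of the method described here.

```python
# FI values copied from Exhibit 18, keyed by module number.
FI_BY_MODULE = {
    1: 19.03, 2: -3.77, 3: 20.80, 4: -3.02, 5: 7.77,
    6: 11.62, 7: -3.12, 8: 46.10, 9: 21.44, 10: -0.93,
    11: -2.74, 12: 11.89, 13: 2.59, 14: 1.23, 15: 0.56,
    16: -0.02, 17: -2.41, 18: 15.34, 19: 7.50, 20: -0.29,
}


def rank_by_fi(fi_by_module):
    """Return module numbers ordered from highest FI to lowest."""
    return sorted(fi_by_module, key=fi_by_module.get, reverse=True)


def top_modules(fi_by_module, count):
    """Return the `count` modules with the highest FI (count is an arbitrary cutoff)."""
    return rank_by_fi(fi_by_module)[:count]


ranking = rank_by_fi(FI_BY_MODULE)
print(top_modules(FI_BY_MODULE, 5))  # [8, 9, 3, 1, 18]
```

Module 8, with an FI of 46.10, stands well apart from the rest and would be the first candidate for closer scrutiny under this ordering.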

FI is not necessarily intended to represent the complete abstract complexity of a program. In fact, in a typical application, there may be several complexity domains that are systematically excluded from the computation of FI. FI is simply a surrogate for aspects of software quality that we cannot directly measure (e.g., software faults).

A careful distinction must be made between statistical relationship and causality. There is a direct relationship between complexity measures (more specifically, the relative complexity metric) and measures of program quality (i.e., the number of faults). This in itself does not imply that program complexity will cause program faults. Further, it is clear that the act of simplifying the structure of a program will not automatically decrease the number of faults in that program or the number of changes that will need to be made to it. Simple or complex, bad code is bad code. All things being equal with regard to programmer ability, FI is a good predictor of program modules of poor quality. The modules of high complexity, on examination, are generally found to contain labyrinthine control structures and bushy logic. In this sense, there is reason to believe that the use of complexity metrics to guide the preparation of test cases should contribute to the enhancement of the testing process.

Software systems are designed to implement each of their functionalities in one or more code modules. In some cases there is a direct correspondence between a particular program module and a particular functionality. That is, if the program is expressing that functionality, it will execute exclusively in the module in question. In most cases, however, there will not be this distinct traceability of functionality to modules; rather, the functionality will be expressed in many different code modules.

^[5] Munson, J.C. and Khoshgoftaar, T.M., Applications of a Relative Complexity Metric for Software Project Management, Journal of Systems and Software, 12, 283-291, 1990.

^[6] Munson, J.C. and Khoshgoftaar, T.M., The Relative Software Complexity Metric: A Validation Study, Proceedings of the Software Engineering 1990 Conference, Cambridge University Press, Cambridge, U.K., 1990, pp. 89-102.