11.5 Fractional Measures

Up to this point in our discussion of testing, we have regarded a program module as atomic. That is, it was not decomposable. As our need for increased resolution in the test activity begins to be an important factor, we can actually look within a program module and study how the parts of the module interact whenever it is invoked. One thing is clear for most program modules. Not all of the code in the module will execute when it is invoked. If the module is lacking in cohesion, only a very small subset of the executable statements in the module will, in fact, execute.

One thing is quite clear, however. Whatever subset of code that does execute when a module is called is a complete and viable program module in its own right. Thus, a module lacking in cohesion may well be seen as three or more very distinct program modules, depending on how the module is invoked. Each functionality, then, will effectively extract a subset of executable statements from the modules that are invoked in that functionality. We can, in fact, determine, a posteriori, which code segments have executed and measure just these code segments. Thus, if a functionality were to execute, say, 12 executable statements out of a total of 48 executable statements in a module, we could measure just the source code attributes of the 12 statements that did execute. This subset of the code would give us a very different view of the program module and its fault liability. It may very well be that the very code constructs that might make a module potentially fault prone are strictly avoided by functionality. The fractional FI of the code executed by that module by a particular functionality will consist of measurements of source code attributes of the subset of source code executed.

11.5.1 The FI of the XC Modules

For the purposes of exploring the notions of fractional complexity and its impact on the test process, a small open-source program called XC was chosen for an investigation on the assessment of test activity on fractional code segments. ^[5] XC is an asynchronous communications program for the UNIX environment. It supports xmodem, ymodem, xmodem-g, ymodem-g, and cis_b+ internal file transfer protocols. Use of external zmodem is also supported with auto-triggering of downloads. XC is a text-mode or command line application written in the C programming language.

For this study, the UX metric tool was employed to measure each of the 15 modules of the XC program. A set of nine metrics was obtained from the UX metric measurement tool. These metrics are shown in Exhibit 7. This is a reduced set of metrics obtained from the measurement process. They constitute the set of primitive measures that we have found to be the most closely related to software faults. These nine metrics were obtained for each of the fifteen program modules.

Exhibit 7: Metrics and Their Definitions

η₁	Halstead's count of the number of unique operators
η₂	Halstead's count of the number of unique operands
N₁	Halstead's count of the total number of operators
N₂	Halstead's count of the total number of operands
Stmt	Count of total noncommented source statements
LOC	Count of the total number of noncommented lines of code
Blks	Count of the total number of program blocks
Span	Average span of a variable reference
V(g)	Nodes - Edges + 2

While this reduction in the number of metrics has simplified the problem somewhat, we will further simplify the metric values for each module by computing their individual FI values, ρ_i. This static measure of FI for the 15 program modules, ρ, is shown in Exhibit 8.

Exhibit 8: FI Values for XC Modules

Module	7	1	6	4	9	3	8	13	15	14	11	10	12	2	5
ρ	119	118	112	104	102	102	101	101	94	93	93	92	90	90	89

In its capacity as a fault surrogate, the program modules shown in Exhibit 8 are ordered from the most complex to the least complex. We can see from this arrangement that modules 7, 1, and 6 have disproportionately high values of the FI. Previous validation studies support the conclusion that these are the modules that have the greatest potential for faults. ^[6] However, when we get to executing the actual modules, we will discover that not all of module 7, for example, ever executes when the module receives control. Some statements will execute and others will not. The design of module 6, on the other hand, may result in all of the code in this module being executed every time it is entered. Thus, there is a dynamic aspect to the complexity of the module. The dynamic complexity of a module, like module 7, may be much less than its static complexity.

It can be seen from Exhibit 8 that there is a great deal of variability in the complexity of these 15 modules. For example, module 7 has an FI of 119, two standard deviations above the mean. Conversely, module 5 has an FI of 89, one standard deviation below the mean. From these static measurements, and the high correlation of FI with software faults, it can be surmised that there are likely to be more faults in module 7 than in module 5. However, this fact does not indicate the likelihood of the system failing. Module 7 rarely, if ever, executes during normal operation. Module 5, on the other hand, executes frequently. Because of this great difference in the exposure each of these two modules receives during execution, we may well observe more failures attributable to faults in module 5 than we do faults in module 7. To develop an understanding of this failure potential, we must develop a means of characterizing the functional behavior of a system in terms of how it distributes its activity across a set of modules. This requires knowing what the program does, its functionalities, and how these different functionalities allocate time to the modules that comprise the system.

11.5.2 Fractional Complexity

In the usual computation of static software complexity, the complexity of the entire module is computed for each program module. This static measure may be very misleading. As the code in each program module executes, only a small number of the possible paths through the module may execute, in which case the complexity of the functionality has been exaggerated by an amount equal to the complexity of the code that did not execute. Hence, it only seems reasonable in the computation of dynamic program complexity to assess the complexity of just that code that executed for each of the distinct functionalities. As a program executes each of the sets of its functions, it will select a subset of statements from each module. The code in this reduced statement set will be a complete and fully functional program in its own right. It will not have the functionality of the entire program in that only a subset of the source code has been selected by the particular functionality. An example of this can be seen in Exhibits 9, 10, and 11. In Exhibit 9 we have a code segment of C code. To compute the FI of this segment we will measure the entire segment. In all, there are seven lines of code, five statements, and seven total operands. When this code is executed, however, there are two outcomes based on the predicate clause. In Exhibit 9 we will assume that the variable i is greater than j when the code is executed. If we measure just the code segment that executed, there are now five lines of code, four statements, and six total operands. In Exhibit 11 we will assume that the predicate is false. Now there are just two lines of code, two statements, and three total operands. The complexity of this code segment depends on the outcome of the evaluation of the predicate clause.

Exhibit 9: C Code Segment 1

 if (i > j)    {     i = i + j;     printf (i);    }   else    printf (i);

Exhibit 10: C Code Segment 2

 if (i > j)    {     i = i + j;     printf (i);    }

Exhibit 11: C Code Segment 3

 if (i > j)    printf (i);

In the dynamic program complexity view, the complexity of a program module will vary in accordance with the code that is selected for execution. Under this view, the complexity of the system will vary by function as well. The metric values for a single program module can no longer be characterized by a single vector of values. In this new view there will be a set of vectors of metrics values, one vector for each functionality. As each functionality is executed, a different subset of code will be excised by the functionality. Thus, every time a new functionality is executed, we will re-measure that code and extract a new set of raw metrics for that subset.

A significant problem arises in the computation of the orthogonal metric sets and, subsequently, FI, in that many different subsets of code will be selected by the varying functionalities. A solution to this problem is to use the original transformation matrix that was calculated for the domain metrics for the entire program. If this is the case, the raw metrics of each module will be bounded above by the raw metric values of the entire module. The FI of any of the subsets of the module code will also be bounded above by the FI of the entire module. Thus, the baseline established for the initial computation of FI will also serve as the baseline for the computation of the complexity of the functional code subsets. As was the case for the model of evolving software discussed in Chapter 8, the baseline established for fractional code measurement will consist of the set of means and standard deviations obtained for the original raw metrics and the transformation matrix. We standardize the raw metrics of the functional code subsets with these means and standard deviations of the full program measures.

Each distinct test scenario to a program will cause a subset of the program and each of its modules to execute. The source code subset of the original program represented by the code subset of a module m_i will have its own FI as the functionality changes. To differentiate between the system FI and the FI of each of the functional code subsets, this new FI of the i^th module executing the j^th function will be denoted as . This fractional FI value will represent the FI of just the code that was executed when the j^th functionality was exercised.

From a measurement perspective, it is not always easy to extract the code segments that have executed when a particular functionality is exercised. Necessarily, the software will have to be instrumented to permit this introspection. In some languages, such as C, there are available code coverage tools, such as ATAC, capable of isolating code sequences associated with particular tests. ^[7] When these code segments are so isolated, they can then be measured by existing metric analysis tools.

To demonstrate the concept of the fractional complexity of functionalities with the software test process, the XC system was monitored and measured by a modified version of the ATAC tool. Our modified ATAC tool was used to identify the specific C source statements that had executed for each test activity. The source code elements that were thus identified by the modified ATAC tool were measured by the UX metric tool.

From the standpoint of the highest-level granularity of functionality, the XC tool consists of six basic functions. These functionalities are summarized in Exhibit 12. At any given epoch of the execution of a functionality, it will be executing only one of the modules involved in expressing that functionality. As each functionality executed, it would exercise a subset of program modules. For the six basic functionalities of XC, the corresponding module subsets are depicted in Exhibit 13.

Exhibit 12: Functionalities of XC

Function	Functionality Description
f₁	Routing output
f₂	Cross-referencing reserved words
f₃	Processing nested include files
f₄	Generate listing only
f₅	Extended debugging facility
f₆	Cross-referencing user-defined identifiers

Exhibit 13: Functional Module Subsets of XC

F	M_c	M_i	M_p
f₁	{m₁}	{m₃–m₁₂, m₁₄, m₁₅}	{m₂, m₁₃}
f₂	{m₁}	{m₃–m₁₂, m₁₄, m₁₅}	{m₂, m₁₃}
f₃	{m₁}	{m₃–m₈, m₁₁, m₁₄, m₁₅}	{m₂}
f₄	{m₁}	{m₃–m₈, m₁₁, m₁₄, m₁₅}	{m₂}
f₅	{m₁}		{m₂–m₁₅}
f₆	{m₁}	{m₃–m₁₂, m₁₄, m₁₅}	{m₂, m₁₃}

A total of 12 test suites were designed to test this basic set of functionalities. These test functions are show in Exhibit 14. In this table are the descriptions of each of the tests, together with the functionalities exercised by that test in the column labeled T × F. The 12 test suites were designed by the test staff to ensure 100 percent block coverage in the code during the test. The net effect of executing all the program blocks is that all of the program functionality would also be exercised. Because the stopping criterion for the test staff was related to the block coverage, many exception conditions had to be exercised. In most cases (see Tests 7 through 11), the execution of the code that raised the exception conditions only accessed a limited number of program functionalities.

Exhibit 14: Test Suites for the XC Program Functionalities

Test No.	Function	T × F
1	Enable the debug option with the -d switch.	f₅, f₆
2	Flip the -d switch in the case statement in main to exercise the debug option of this program with the output being written to a file with the -o option. It also tests for cross referencing of reserved words -r and emits a printer initialization string with the -e option.	f₁, f₂, f₅, f₆
3	Overflow the hash table in function put_token that is limited to 749 identifiers.	f₆
4	Raise the use_err in main if the command line arguments are less than two.	f₅, f₆
5	Generate a listing only with the -1 option and to enable file inclusion with -i.	f₃, f₄, f₆
6	Raise the use_err in main for an illegal file name option.	f₁
7	Raise the use_err in main when the second argument is a '-' instead of a filename.	f₅
8	Raise the use_err in main when -o option is used without giving a legal filename.	f₁
9	Raise the use_err in main if the -o option is invoked followed by another switch with no file name in between.	f₁
10	Raise the use_err in main if a switch is used without preceding it with a '-'.	f₅
11	Raise the use_err in main if an illegal switch is used.	f₅
12	Raise the use_err in proc_file if an illegal or nonexistent filename is given as the second argument.	f₅, f₆

In the case of Test 1, all of the modules, except for module 2, are executed. The execution results of the Test 1 suite are shown in Exhibit 15. The average FI of only the modules that were involved in this execution was 100.7, slightly above the average system FI of 100. If we add the FI values of the modules involved, and then divide by the total number of modules in the system, we get an assessment of the exposure to the complexity of the system that was manifested by this test. If all of the modules are executed in their entirety, then there will be 100 percent exposure to the complexity of the system. In the case of Test 1, this computation yields a value of 97 percent. This appears at face value to be a realistic test.

Exhibit 15: Functional Complexity for Test 1

Module	ρ	Frequency	Profile	φ
1	118	1	0.00002	0.002
2	90	0	0.0	0.0
3	102	1	0.00002	0.002
4	104	1323	0.02923	3.039
5	89	25369	0.56055	49.888
6	112	13785	0.30459	34.114
7	119	1319	0.02914	3.467
8	101	958	0.02116	2.137
9	102	150	0.00331	0.337
10	92	150	0.00331	0.304
11	93	254	0.00561	0.521
12	90	808	0.01785	1.606
13	101	1	0.00002	0.002
14	93	20	0.00044	0.040
15	94	1118	0.02470	2.322
Total	1500	45257	1.0	97.787

As mentioned earlier, the failure of a software system is related not only to the complexity of a module, but also to the amount of time spent executing that module. To get a better assessment of the effectiveness of a software test, we need to incorporate this concept of time. For this purpose, we will examine the number of transitions within the system. At this point we will concern ourselves only with the number of times a specific module is called during a particular execution of the system. The frequency with which a module is called will be divided by the total number of calls in the entire system in order to determine the probability that at a given epoch of the process, execution will be taking place within that module. ^[8] This execution profile for Test 1 is shown in Exhibit 15. It represents the proportion of time spent in each of the 15 modules by executing Test 1. We can see, for example, that modules 5 and 6 will receive a disproportionate amount of time in this test, whereas modules 1, 2, 3, and 13 will receive little or no exposure. Similarly, the most complex module, module 7, also gets little exposure in this test, yet it is the one most likely to contain faults. The functional complexity of each of the modules under Test 1 can be found in the column labeled φ in Exhibit 15. It is derived by multiplying the FI, ρ_i, of each module by its probability of execution, or profile, under Test 1. Effectively, the functional complexity of Test 1 is the expected value of FI for the modules that are exercised by Test 1 under the Test 1 execution profile. The frequency data shown in Exhibit 15 was obtained from the C profiler and were extracted for each of the test executions.

By taking into consideration the execution frequency of each of these modules, a more refined assessment of the complexity exercised by Test 1 can be obtained. Now, instead of the initial average module complexity of , we take into account the proportion of time spent in each of the modules of varying complexity. This will yield an expected value for fractional complexity of φ⁽¹⁾ = 97.787. The functional complexity of the test is less than the average system FI. The view of the system as established by this test shows fewer potential faults than the actual number of faults in the whole system. However, even this estimate is still not realistic. It is based on the naive assumption that every time a module is called, the entire module (and thus all of its complexity) is executed. By measuring only the code of a module that executes during a given test, we can better assess the true potential for a test to expose the software to its potential faults. The measurement technique of fractional complexity was developed for this purpose.

11.6.3 Operational Complexity

It is now clear that each function has an associated execution profile. This execution profile, once again, can be expected to change across the set of program functionalities. This will result in a concomitant variation in the operational complexity of the function. As was the case in the definition of functional complexity, each functionality f_j. has an execution profile represented by the probabilities ρ^(j). The operational complexity for the execution of each function f_i is the expected value of the fractional complexity under an execution profile as follows:

This is the fractional complexity analog of the earlier concept of functional complexity. Operational complexity has the virtue that it reflects the concomitant variation in both the fractional complexity of the code selected by the execution of a function and the execution profile of that function.

11.6.4 Fractional and Operational Complexity of XC Tests

Each of the 12 tests of the system, as shown in Exhibit 14, was run again, this time under the control of the modified ATAC software system. This would permit the source code that was executed for each test to be extracted. This reduced set of code for each program module was then submitted to the metric analyzer and the nine metrics were computed for each of the twelve functional tests. These raw complexity measures were then standardized using the mean and standard deviation values obtained previously from the full code set. Thus, the metric data from each of the tests was standardized relative to the baseline or total code set. These standardized metric values were then transformed to domain metric values using the original baseline transformation data. Fractional complexities ξ_i were computed using the eigenvalues from the baseline data. Again, the fractional complexities are simply the FI of the reduced code sets from each test.

The fractional complexities ξ_i for each module on each test are shown in Exhibits 16 and 17. The data in these two tables represent the same set of modules. Exhibit 17 is abbreviated in that for these tests, modules 4 through 15 simply were not exercised. Each test consists of the sets M_i and M_p for each of the test functionalities. In the case of the data shown in these exhibits, the set M_c clearly consists of only one program module, in this case, the main program. A large number of table entries are zero. This is because the associated program modules did not execute on the particular test and, thus, none of the complexity of that module was exercised. In some cases, it can be seen that there is a high degree of variability in the observed complexity of a module on each execution. In other cases, the complexity of a module, when it is called, is relatively fixed, if not constant. It is the highly variable modules that will require the most attention during testing. The test activity with regard to module 7 is notable in this regard. This is the most complex module (ρ₇ = 119). Under Test 1 it is one of the least complex modules . Thus, Test 1 only executes a small fraction of the potential complexity of module 7. We can see that of all of the tests, only Test 3 begins to exercise the complexity of the code in module .

Exhibit 16: Fractional Complexities of Modules on Test Suites

	Test No.
	Static	1	2	3	4	5	6
Module	ρ	ξ	ξ	ξ	ξ	ξ	ξ
1	118	107	114	98	91	104	102
2	90	0	0	0	90	0	0
3	102	98	98	96	0	101	0
4	104	104	104	92	0	104	0
5	89	89	89	88	0	89	0
6	112	92	93	109	0	110	0
7	119	87	88	114	0	103	0
8	101	99	100	101	0	88	0
9	102	102	102	102	0	0	0
10	92	88	89	88	0	0	0
11	93	92	92	92	0	0	0
12	90	90	90	90	0	0	0
13	101	99	100	0	0	0	0
14	93	90	92	90	0	91	0
15	94	91	93	92	0	92	0

Exhibit 17: Fractional Complexities of Modules on Test Suites

	Test No.
	7	8	9	10	11	12
Module	ξ	ξ	ξ	ξ	ξ	ξ
1	92	98	99	93	97	98
2	90	90	90	90	90	0
3	0	0	0	0	0	90

Although Exhibit 17 incorporates one aspect of dynamic measurement, measuring only the code subsets associated with a given execution, it does not reflect the execution profile of the code. As an example, operational complexity for Test 1 will be computed using the fractional complexity data. The frequency data, and thus the profiles, are no different than in Exhibit 15. The only difference is that between the original FI, fractional complexity, and the resultant operational complexities. The operational complexity for Test 1, using fractional complexity, is provided in Exhibit 18.

Exhibit 18: Fractional and Operational Complexity for Test 1

Module	ξ	Frequency	Profile	ω
1	107	1	0.00002	0.002
2	0	0	0	0
3	98	1	0.00002	0.002
4	104	1323	0.02923	3.040
5	89	25369	0.56055	49.889
6	92	13785	0.30459	28.022
7	87	1319	0.02914	2.535
8	99	958	0.02116	2.095
9	102	150	0.00331	0.338
10	88	150	0.00331	0.291
11	92	254	0.00561	0.516
12	90	808	0.01785	1.607
13	99	1	0.00002	0.002
14	90	20	0.00044	0.040
15	91	1118
	94.857			90.626

To return to the previous case, under Test 1 the static FI of module 7 was ρ₇ = 119 .0 and its fractional complexity was . When we actually look within this module at the code that is actually expressed by the execution of Test 1, we find that the fraction of code executed by this test as measured by is really quite small. Further, the operational complexity is also small. Test 1 simply does not expose this module to the potential faults that it may contain. Further examination of Exhibit 18 shows exactly where Test 1 was focused. We can see that the operational complexity of modules 5 and 6 are quite a bit larger than other entries in this exhibit. More interesting is the fact that module 5 is the least complex module of the entire set. A considerable amount of the total test effort is focused on this module, which is least likely to contain faults. This observation is quite characteristic of normal test activity. The test process and test outcomes are not intuitive. A stress test of the software is one that will maximize the exposure to latent faults in the system. This stress can be measured directly with either functional or operational complexity. Altogether too often we have seen the stress testing of systems reflect the mental anguish of the tester as opposed to the exercise of the potentially fault-prone code.

Let us now examine other test possibilities. Our objective now will be to maximize our exposure to the faults in the system. A logical activity would be to identify a functionality that would generate a profile giving maximum exposure to those modules whose fractional complexity was large for that functionality. This is a research process. Our objective will be to understand the behavior of the system in this process — not to find faults. This is the principal metaphor shift in the measurement-based approach to testing.

Exhibit 19 summarizes four distinct measures of test activity as the XC system was driven through its set of functionalities. Each of these measures will provide a global assessment of just what the test did to exercise the code within each of the program modules. The second column in Exhibit 19, labeled , represents the average FI of just the set of program modules that were executed at least once. Only the first three tests provided reasonable exposure to the complete set of modules by this test criterion. At the other extreme is Test 6, with an average FI of 8. The exposure to the full system complexity of this test is clearly trivial.

Exhibit 19: Four Measures of Test Activity

	Static Measures		Dynamic Measures
Test No.			φ	ω
1	94	89	98	91
2	94	90	98	91
3	87	83	98	96
4	14	12	104	91
5	68	65	96	94
6	8	7	118	102
7	14	12	104	91
8	14	13	104	94
9	14	13	104	94
10	14	12	104	92
11	14	12	104	93
12	15	13	110	94

The third column in Exhibit 19, labeled , contains the average fractional complexity, again for only those modules that were executed at least once by each of the tests. According to this criterion, none of the tests were really good at exercising the modules to their full extent. The very best of the tests, Test 2, executed a code set whose average complexity was 90, a figure well below the average complexity (100) of all of the program modules.

The fourth and fifth columns in Exhibit 19 contain the dynamic measures of test outcomes. These two columns represent the expected values of FI and fractional complexity under each of the test profiles. The fourth column shows the values of functional complexity for each of the tests. According to this test criterion, Test 6 provided substantial exposure to fault-prone code. However, this fact is somewhat misleading in and of itself. Test 6 only executed program module 1, which had an FI of 118. The final column Exhibit 19 contains the operational complexity of each of the tests. If we exclude the anomalous result of Test 6, we can see that the best test of all of the test activity was on Test 3. This test provided a large exposure to the total system complexity while also yielding the largest operational complexity of all the full tests.

By computing fractional complexity, and then using it to calculate functional complexity, we get a more accurate assessment of Test 1. Where we first had an initial average module complexity of 94 under this test as was shown in Exhibit 18, then a functional complexity based on a module's full FI of 89, we now have a functional complexity of 98. So, by incorporating the dynamic aspects of the test, we obtain a quite different assessment of the average module complexity for Test 1. Furthermore, by dividing the total observed complexity for this execution (1019) by the total complexity for the full system (1500), fractional complexity estimates that 67.93 percent of the full complexity of the system was exercised by this test. Recall that using the FI for the full module provided an estimate of 97 percent. Both the assessment of the average module complexity and the percentage of the full complexity of the system are needed to assess the effectiveness of a test in terms of the potential fault exposure.

^[5]Munson, J.C. and Hall, G.A., Dynamic Program Complexity and Software Testing, Proceedings of the 1995 IEEE International Testing Conference, IEEE Computer Society Press, Los Alamitos, CA, 1995, pp. 730–773.

^[6]Horgan, J.R., London, S., and Lyu, M.R., Achieving Software Quality with Testing Coverage Measures, IEEE Computer, 27(9), 60–69, 1994.

^[7]Horgan, J.R., London, S., and Lyu, M.R., Achieving Software Quality with Testing Coverage Measures, IEEE Computer, 27(9), 60–69, 1994.

^[8]Munson, J.C., A Functional Approach to Software Reliability Modeling, in Quality of Numerical Software, Assessment and Enhancement, Boisvert, R.F., Ed., Chapman & Hall, London, 1997.