Up to this point in our discussion of testing, we have regarded a program module as atomic. That is, it was not decomposable. As our need for increased resolution in the test activity begins to be an important factor, we can actually look within a program module and study how the
One thing is quite clear, however. Whatever subset of code that does execute when a module is called is a complete and
For the purposes of exploring the notions of fractional complexity and its impact on the test process, a small
For this study, the UX metric tool was employed to measure each of the 15 modules of the XC program. A set of nine metrics was obtained from the UX metric measurement tool. These metrics are shown in Exhibit 7. This is a reduced set of metrics obtained from the measurement process. They
Exhibit 7: Metrics and Their Definitions
|
|
|
η 1 |
Halstead's count of the number of unique operators |
|
η 2 |
Halstead's count of the number of unique operands |
|
N 1 |
Halstead's count of the total number of operators |
|
N 2 |
Halstead's count of the total number of operands |
|
Stmt |
Count of total noncommented source statements |
|
LOC |
Count of the total number of noncommented lines of code |
|
Blks |
Count of the total number of program blocks |
|
Span |
Average span of a variable reference |
|
V ( g ) |
Nodes - Edges + 2 |
|
|
While this reduction in the number of metrics has simplified the problem somewhat, we will further simplify the metric values for each module by computing their individual FI values, ρ i . This static measure of FI for the 15 program modules, ρ , is shown in Exhibit 8.
Exhibit 8: FI Values for XC Modules
|
|
|
Module |
7 |
1 |
6 |
4 |
9 |
3 |
8 |
13 |
15 |
14 |
11 |
10 |
12 |
2 |
5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
ρ |
119 |
118 |
112 |
104 |
102 |
102 |
101 |
101 |
94 |
93 |
93 |
92 |
90 |
90 |
89 |
|
|
In its capacity as a fault surrogate, the program modules shown in Exhibit 8 are ordered from the most complex to the least complex. We can see from this arrangement that modules 7, 1, and 6 have disproportionately high values of the FI. Previous validation studies support the conclusion that these are the modules that have the greatest potential for faults. [6] However, when we get to executing the actual modules, we will discover that not all of module 7, for example, ever executes when the module receives control. Some statements will execute and others will not. The design of module 6, on the other hand, may result in all of the code in this module being executed every time it is entered. Thus, there is a dynamic aspect to the complexity of the module. The dynamic complexity of a module, like module 7, may be much less than its static complexity.
It can be seen from Exhibit 8 that there is a great deal of variability in the complexity of these 15 modules. For example, module 7 has an FI of 119, two standard deviations above the mean. Conversely, module 5 has an FI of 89, one standard deviation below the mean. From these static measurements, and the high correlation of FI with software faults, it can be surmised that there are likely to be more faults in module 7 than in module 5. However, this fact does not
In the usual computation of static software complexity, the complexity of the entire module is computed for each program module. This static measure may be very misleading. As the code in each program module executes, only a small number of the possible paths through the module may execute, in which case the complexity of the
functionality
has been exaggerated by an amount equal to the complexity of the code that did not execute. Hence, it only seems reasonable in the computation of dynamic program complexity to assess the complexity of just that code that executed for each of the distinct functionalities. As a program executes each of the sets of its functions, it will select a subset of statements from each module. The code in this reduced statement set will be a complete and fully functional program in its own right. It will not have the functionality of the entire program in that only a subset of the source code has been selected by the particular functionality. An example of this can be seen in Exhibits 9, 10, and 11. In Exhibit 9 we have a code segment of C code. To compute the FI of this segment we will measure the entire segment. In all, there are seven lines of code, five statements, and seven total operands. When this code is executed, however, there are two
Exhibit 9: C Code Segment 1
|
|
if (i > j) { i = i + j; printf (i); } else printf (i);
|
|
Exhibit 10: C Code Segment 2
|
|
if (i > j) { i = i + j; printf (i); }
|
|
Exhibit 11: C Code Segment 3
|
|
if (i > j) printf (i);
|
|
In the dynamic program complexity view, the complexity of a program module will vary in accordance with the code that is selected for execution. Under this view, the complexity of the system will vary by function as well. The metric values for a single program module can no longer be characterized by a single vector of values. In this new view there will be a set of vectors of metrics values, one vector for each functionality. As each functionality is executed, a different subset of code will be excised by the functionality. Thus, every time a new functionality is executed, we will re-measure that code and extract a new set of raw metrics for that subset.
A significant problem arises in the computation of the orthogonal metric sets and, subsequently, FI, in that many different subsets of code will be selected by the varying functionalities. A solution to this problem is to use the original transformation matrix that was calculated for the domain metrics for the entire program. If this is the case, the raw metrics of each module will be bounded above by the raw metric values of the entire module. The FI of any of the subsets of the module code will also be bounded above by the FI of the entire module. Thus, the baseline established for the initial computation of FI will also serve as the baseline for the computation of the complexity of the functional code subsets. As was the case for the model of evolving software discussed in Chapter 8, the baseline established for fractional code measurement will consist of the set of means and standard deviations obtained for the original raw metrics and the transformation matrix. We standardize the raw metrics of the functional code
Each distinct test scenario to a program will cause a subset of the program and each of its modules to execute. The source code subset of the original program represented by the code subset of a module
m
i
will have its own FI as the functionality changes. To differentiate between the system FI and the FI of each of the functional code subsets, this new FI of the
i
th
module executing the
j
th
function will be denoted as
. This
fractional
FI value will represent the FI of
just the code that was executed
when the
j
th
functionality was exercised.
From a measurement perspective, it is not always easy to extract the code segments that have executed when a particular functionality is exercised. Necessarily, the software will have to be instrumented to permit this introspection. In some languages, such as C, there are available code coverage tools, such as ATAC, capable of isolating code sequences associated with particular tests. [7] When these code segments are so isolated, they can then be measured by existing metric analysis tools.
To
From the standpoint of the highest-level granularity of functionality, the XC tool consists of six basic functions. These functionalities are summarized in Exhibit 12. At any given
Exhibit 12: Functionalities of XC
|
|
|
Function |
Functionality Description |
|---|---|
|
f 1 |
Routing output |
|
f 2 |
Cross-referencing reserved words |
|
f 3 |
Processing nested include files |
|
f 4 |
Generate listing only |
|
f 5 |
Extended debugging facility |
|
f 6 |
Cross-referencing
|
|
|
Exhibit 13: Functional Module Subsets of XC
|
|
|
F |
M c |
M i |
M p |
|---|---|---|---|
|
f 1 |
{m 1 } |
{m 3 –m 12 , m 14 , m 15 } |
{m 2 , m 13 } |
|
f 2 |
{m 1 } |
{m 3 –m 12 , m 14 , m 15 } |
{m 2 , m 13 } |
|
f 3 |
{m 1 } |
{m 3 –m 8 , m 11 , m 14 , m 15 } |
{m 2 } |
|
f 4 |
{m 1 } |
{m 3 –m 8 , m 11 , m 14 , m 15 } |
{m 2 } |
|
f 5 |
{m 1 } |
{m 2 –m 15 } |
|
|
f 6 |
{m 1 } |
{m 3 –m 12 , m 14 , m 15 } |
{m 2 , m 13 } |
|
|
A total of 12 test suites were designed to test this basic set of functionalities. These test functions are show in Exhibit 14. In this table are the descriptions of each of the tests, together with the functionalities exercised by that test in the column labeled T × F. The 12 test suites were designed by the test staff to ensure 100 percent block coverage in the code during the test. The net effect of executing all the program blocks is that all of the program functionality would also be exercised. Because the stopping criterion for the test staff was related to the block coverage, many exception conditions had to be exercised. In most cases (see Tests 7 through 11), the execution of the code that raised the exception conditions only accessed a limited number of program functionalities.
Exhibit 14: Test Suites for the XC Program Functionalities
|
|
|
Test No. |
Function |
T × F |
|---|---|---|
|
1 |
Enable the debug option with the -d switch. |
f 5 , f 6 |
|
2 |
Flip the -d switch in the case statement in main to exercise the debug option of this program with the output being written to a file with the -o option. It also tests for cross referencing of reserved words -r and emits a printer initialization string with the -e option. |
f 1 , f 2 , f 5 , f 6 |
|
3 |
Overflow the hash table in function put_token that is limited to 749 identifiers. |
f 6 |
|
4 |
Raise the use_err in main if the command line arguments are less than two. |
f 5 , f 6 |
|
5 |
Generate a listing only with the -1 option and to enable file inclusion with -i. |
f 3 , f 4 , f 6 |
|
6 |
Raise the use_err in main for an illegal file
|
f 1 |
|
7 |
Raise the use_err in main when the second argument is a '-' instead of a filename. |
f 5 |
|
8 |
Raise the use_err in main when -o option is used without giving a legal filename. |
f 1 |
|
9 |
Raise the use_err in main if the -o option is invoked followed by another switch with no file name in between. |
f 1 |
|
10 |
Raise the use_err in main if a switch is used without
|
f 5 |
|
11 |
Raise the use_err in main if an illegal switch is used. |
f 5 |
|
12 |
Raise the use_err in proc_file if an illegal or nonexistent filename is given as the second argument. |
f 5 , f 6 |
|
|
In the case of Test 1, all of the modules, except for module 2, are executed. The execution results of the Test 1 suite are shown in Exhibit 15. The average FI of only the modules that were involved in this execution was 100.7, slightly above the average system FI of 100. If we add the FI values of the modules involved, and then divide by the total number of modules in the system, we get an assessment of the exposure to the complexity of the system that was manifested by this test. If all of the modules are executed in their entirety, then there will be 100 percent exposure to the complexity of the system. In the case of Test 1, this computation yields a value of 97 percent. This appears at face value to be a realistic test.
Exhibit 15: Functional Complexity for Test 1
|
|
|
Module |
ρ |
Frequency |
Profile |
φ |
|---|---|---|---|---|
|
1 |
118 |
1 |
0.00002 |
0.002 |
|
2 |
90 |
|
0.0 |
0.0 |
|
3 |
102 |
1 |
0.00002 |
0.002 |
|
4 |
104 |
1323 |
0.02923 |
3.039 |
|
5 |
89 |
25369 |
0.56055 |
49.888 |
|
6 |
112 |
13785 |
0.30459 |
34.114 |
|
7 |
119 |
1319 |
0.02914 |
3.467 |
|
8 |
101 |
958 |
0.02116 |
2.137 |
|
9 |
102 |
150 |
0.00331 |
0.337 |
|
10 |
92 |
150 |
0.00331 |
0.304 |
|
11 |
93 |
254 |
0.00561 |
0.521 |
|
12 |
90 |
808 |
0.01785 |
1.606 |
|
13 |
101 |
1 |
0.00002 |
0.002 |
|
14 |
93 |
20 |
0.00044 |
0.040 |
|
15 |
94 |
1118 |
0.02470 |
2.322 |
|
Total |
1500 |
45257 |
1.0 |
97.787 |
|
|
As mentioned earlier, the failure of a software system is related not only to the complexity of a module, but also to the amount of time spent executing that module. To get a better assessment of the effectiveness of a software test, we need to
By taking into consideration the execution frequency of each of these modules, a more refined assessment of the complexity exercised by Test 1 can be obtained. Now, instead of the initial average module complexity of
, we take into account the proportion of time spent in each of the modules of varying complexity. This will yield an expected value for fractional complexity of
φ
(1)
= 97.787. The functional complexity of the test is less than the average system FI. The view of the system as established by this test shows fewer potential faults than the actual number of faults in the whole system. However, even this estimate is still not realistic. It is based on the naive assumption that every time a module is called, the entire module (and thus all of its complexity) is executed. By measuring only the code of a module that executes during a given test, we can better assess the true potential for a test to expose the software to its potential faults. The measurement technique of fractional complexity was developed for this purpose.
It is now clear that each function has an associated execution profile. This execution profile, once again, can be expected to change across the set of program functionalities. This will result in a concomitant variation in the
operational complexity
of the function. As was the case in the definition of functional complexity, each functionality
f
j
. has an execution profile represented by the probabilities
ρ
(j)
. The operational complexity
for the execution of each function
f
i
is the expected value of the fractional complexity under an execution profile as
This is the fractional complexity analog of the earlier concept of functional complexity. Operational complexity has the virtue that it reflects the concomitant variation in both the fractional complexity of the code selected by the execution of a function and the execution profile of that function.
Each of the 12 tests of the system, as shown in Exhibit 14, was run again, this time under the control of the modified ATAC software system. This would permit the source code that was executed for each test to be extracted. This reduced set of code for each program module was then submitted to the metric analyzer and the nine metrics were computed for each of the twelve functional tests. These raw complexity measures were then standardized using the mean and standard deviation values obtained previously from the full code set. Thus, the metric data from each of the tests was standardized relative to the baseline or total code set. These standardized metric values were then transformed to domain metric values using the original baseline transformation data. Fractional complexities ξ i were computed using the eigenvalues from the baseline data. Again, the fractional complexities are simply the FI of the reduced code sets from each test.
The fractional complexities
ξ
i
for each module on each test are shown in Exhibits 16 and 17. The data in these two tables represent the same set of modules. Exhibit 17 is abbreviated in that for these tests, modules 4 through 15 simply were not exercised. Each test consists of the sets
M
i
and
M
p
for each of the test functionalities. In the case of the data shown in these exhibits, the set
M
c
clearly consists of only one program module, in this case, the main program. A large number of table entries are zero. This is because the associated program modules did not execute on the particular test and, thus, none of the complexity of that module was exercised. In some cases, it can be seen that there is a high degree of variability in the
. Thus, Test 1 only executes a small fraction of the potential complexity of module 7. We can see that of all of the tests, only Test 3 begins to exercise the complexity of the code in module
.
Exhibit 16: Fractional Complexities of Modules on Test Suites
|
|
|
Test No. |
|||||||
|---|---|---|---|---|---|---|---|
|
Static |
1 |
2 |
3 |
4 |
5 |
6 |
|
|
Module |
ρ |
ξ |
ξ |
ξ |
ξ |
ξ |
ξ |
|
1 |
118 |
107 |
114 |
98 |
91 |
104 |
102 |
|
2 |
90 |
|
|
|
90 |
|
|
|
3 |
102 |
98 |
98 |
96 |
|
101 |
|
|
4 |
104 |
104 |
104 |
92 |
|
104 |
|
|
5 |
89 |
89 |
89 |
88 |
|
89 |
|
|
6 |
112 |
92 |
93 |
109 |
|
110 |
|
|
7 |
119 |
87 |
88 |
114 |
|
103 |
|
|
8 |
101 |
99 |
100 |
101 |
|
88 |
|
|
9 |
102 |
102 |
102 |
102 |
|
|
|
|
10 |
92 |
88 |
89 |
88 |
|
|
|
|
11 |
93 |
92 |
92 |
92 |
|
|
|
|
12 |
90 |
90 |
90 |
90 |
|
|
|
|
13 |
101 |
99 |
100 |
|
|
|
|
|
14 |
93 |
90 |
92 |
90 |
|
91 |
|
|
15 |
94 |
91 |
93 |
92 |
|
92 |
|
|
|
Exhibit 17: Fractional Complexities of Modules on Test Suites
|
|
|
Test No. |
||||||
|---|---|---|---|---|---|---|
|
7 |
8 |
9 |
10 |
11 |
12 |
|
|
Module |
ξ |
ξ |
ξ |
ξ |
ξ |
ξ |
|
1 |
92 |
98 |
99 |
93 |
97 |
98 |
|
2 |
90 |
90 |
90 |
90 |
90 |
|
|
3 |
|
|
|
|
|
90 |
|
|
Although Exhibit 17 incorporates one aspect of dynamic measurement, measuring only the code subsets associated with a given execution, it does not reflect the execution profile of the code. As an example, operational complexity for Test 1 will be computed using the fractional complexity data. The frequency data, and thus the profiles, are no different than in Exhibit 15. The only difference is that between the original FI, fractional complexity, and the resultant operational complexities. The operational complexity for Test 1, using fractional complexity, is provided in Exhibit 18.
Exhibit 18: Fractional and Operational Complexity for Test 1
|
|
|
Module |
ξ |
Frequency |
Profile |
ω |
|---|---|---|---|---|
|
1 |
107 |
1 |
0.00002 |
0.002 |
|
2 |
|
|
|
|
|
3 |
98 |
1 |
0.00002 |
0.002 |
|
4 |
104 |
1323 |
0.02923 |
3.040 |
|
5 |
89 |
25369 |
0.56055 |
49.889 |
|
6 |
92 |
13785 |
0.30459 |
28.022 |
|
7 |
87 |
1319 |
0.02914 |
2.535 |
|
8 |
99 |
958 |
0.02116 |
2.095 |
|
9 |
102 |
150 |
0.00331 |
0.338 |
|
10 |
88 |
150 |
0.00331 |
0.291 |
|
11 |
92 |
254 |
0.00561 |
0.516 |
|
12 |
90 |
808 |
0.01785 |
1.607 |
|
13 |
99 |
1 |
0.00002 |
0.002 |
|
14 |
90 |
20 |
0.00044 |
0.040 |
|
15 |
91 |
1118 |
||
|
|
94.857 |
|
90.626 |
|
|
To return to the previous case, under Test 1 the static FI of module 7 was
ρ
7
= 119 .0 and its fractional complexity was
. When we actually look within this module at the code that is actually
is really quite small. Further, the operational complexity
is also small. Test 1 simply does not expose this module to the potential faults that it may contain. Further examination of Exhibit 18 shows exactly where Test 1 was focused. We can see that the operational complexity of modules 5 and 6 are quite a bit larger than other entries in this exhibit. More interesting is the fact that module 5 is the least complex module of the entire set. A considerable amount of the total test effort is focused on this module, which is least likely to contain faults. This observation is quite characteristic of normal test activity. The test process and test outcomes are not intuitive. A stress test of the software is one that will maximize the exposure to latent faults in the system. This stress can be measured directly with either functional or operational complexity. Altogether too often we have seen the stress testing of systems reflect the mental anguish of the tester as opposed to the exercise of the potentially fault-prone code.
Let us now examine other test possibilities. Our objective now will be to maximize our exposure to the faults in the system. A logical activity would be to identify a functionality that would generate a profile giving maximum exposure to those modules whose fractional complexity was large for that functionality. This is a research process. Our objective will be to understand the behavior of the system in this process — not to find faults. This is the principal metaphor shift in the measurement-based approach to testing.
Exhibit 19 summarizes four distinct measures of test activity as the
XC
system was driven through its set of functionalities. Each of these measures will provide a global assessment of just what the test did to exercise the code within each of the program modules. The second column in Exhibit 19, labeled
, represents the average FI of just the set of program modules that were executed at least once. Only the first three tests provided reasonable exposure to the complete set of modules by this test criterion. At the other extreme is Test 6, with an average FI of 8. The exposure to the full system complexity of this test is clearly trivial.
Exhibit 19: Four Measures of Test Activity
|
|
|
Static Measures |
Dynamic Measures |
|||
|---|---|---|---|---|
|
Test No. |
|
|
φ |
ω |
|
1 |
94 |
89 |
98 |
91 |
|
2 |
94 |
90 |
98 |
91 |
|
3 |
87 |
83 |
98 |
96 |
|
4 |
14 |
12 |
104 |
91 |
|
5 |
68 |
65 |
96 |
94 |
|
6 |
8 |
7 |
118 |
102 |
|
7 |
14 |
12 |
104 |
91 |
|
8 |
14 |
13 |
104 |
94 |
|
9 |
14 |
13 |
104 |
94 |
|
10 |
14 |
12 |
104 |
92 |
|
11 |
14 |
12 |
104 |
93 |
|
12 |
15 |
13 |
110 |
94 |
|
|
The third column in Exhibit 19, labeled
, contains the average fractional complexity, again for only those modules that were executed at least once by each of the tests. According to this criterion, none of the tests were really good at exercising the modules to their full extent. The very best of the tests, Test 2, executed a code set whose average complexity was 90, a figure well below the average complexity (100) of all of the program modules.
The fourth and fifth columns in Exhibit 19 contain the dynamic measures of test outcomes. These two
By computing fractional complexity, and then using it to calculate functional complexity, we get a more accurate assessment of Test 1. Where we first had an initial average module complexity of 94 under this test as was shown in Exhibit 18, then a functional complexity based on a module's full FI of 89, we now have a functional complexity of 98. So, by incorporating the dynamic aspects of the test, we obtain a quite different assessment of the average module complexity for Test 1. Furthermore, by dividing the total observed complexity for this execution (1019) by the total complexity for the full system (1500), fractional complexity estimates that 67.93 percent of the full complexity of the system was exercised by this test. Recall that using the FI for the full module provided an estimate of 97 percent. Both the assessment of the average module complexity and the percentage of the full complexity of the system are needed to assess the effectiveness of a test in terms of the potential fault exposure.
[5] Munson, J.C. and Hall, G.A., Dynamic Program Complexity and Software Testing, Proceedings of the 1995 IEEE International Testing Conference , IEEE Computer Society Press, Los Alamitos, CA, 1995, pp. 730–773.
[6] Horgan, J.R., London, S., and Lyu, M.R., Achieving Software Quality with Testing Coverage Measures, IEEE Computer , 27(9), 60–69, 1994.
[7] Horgan, J.R., London, S., and Lyu, M.R., Achieving Software Quality with Testing Coverage Measures, IEEE Computer , 27(9), 60–69, 1994.
[8] Munson, J.C., A Functional Approach to Software Reliability Modeling, in Quality of Numerical Software, Assessment and Enhancement , Boisvert, R.F., Ed., Chapman & Hall, London, 1997.