6.7 Domain Metrics

One of the main problems encountered in working with raw metric data is that they are just that: data. There is very little information content in raw metric data. Take, for example, the 12 raw metrics obtained from one particular build of the PASS system. A sample of data from 20 program modules is shown in Exhibit 13. From these values alone it would be difficult, if not impossible, to draw any useful conclusions about, say, module 6 in relation to module 9, or any other module for that matter. If we were measuring a system of 10,000 modules on 30 metrics, we would have an even greater problem.

Exhibit 13: Raw Metric Values for 20 PASS Modules

start example

Module   η1   η2    N1    N2  Exec   LOC  Nodes  Edges  Paths  Cycles  Maxpath  Avepath
  1      45  282  1335   763   339   841    114    131  14652      13      168      136
  2      14   13    43    18    11    22      8      9      4       0        7        6
  3      47  259  2542  1154   559  1027    103    129   5001       4      143      129
  4       3    5     4     5     1   399     14     10      4       0       10       10
  5      34  117   867   371   212   248     78    103  23892       3       87       69
  6      35  157  1040   484   226   475    129    168  13512       4      112       97
  7      12   28    47    34    13   114     14     10      4       0       10       10
  8      45  331  3493  1760   733  1451    405    531   3129      14      441      429
  9      42  221  1365   667   377   740    235    310  50004      11      134      118
 10      26   62   274   109    46    69     24     31     46       0       22       17
 11      11   22    26    22     9   283     14     10      4       0       10       10
 12      38  154   836   427   203   286    117    156  50001       7      109       87
 13      24   48   289   145    86    92     51     68  16472       5       64       47
 14      37   82   321   177    70   197     34     35     40       1       32       25
 15      23   69   361   167    67    81     25     31    377       3       43       32
 16      23   56   212   111    64    81     26     33    637       2       45       34
 17      20   29    85    47    20    24     16     19     13       1       20       14
 18      35  191  1189   569   311   431    174    244   2231       6      150      137
 19      33  144   774   362   194   597    122    158    210       1       40       27
 20      25   51   205   101    60    79     31     39    444       1       36       30
Mean     16   50   300   154    70   138     34     44   7301       1       30       25
S.D.     13   70   594   290   131   229     55     74  16668       3       47       43

end example

We would like to convert the data shown in Exhibit 13 into information. The first step in this process is to understand each module in the context of the larger system. The last two rows of this table contain the means and standard deviations, respectively, for each metric across the entire software system. We can now look at module 6 and see that it has more than the average number of paths: module 6 has 13,512 paths, whereas the average module in this system has 7301 paths.

We can increase our understanding of the data represented in Exhibit 13 by converting the raw metric values to z-scores. The corresponding z-scores for each of the program modules are shown in Exhibit 14. Now our resolution on the data is beginning to improve. From this new perspective, module 6 is rather different from the majority of the other program modules: with the possible exceptions of the Paths and Cycles metrics, it is at least one standard deviation above the mean on every metric. Module 9 is another module that is strikingly greater than average on most attributes. Module 7, on the other hand, has negative z-scores across the board; it is below average on every program attribute.
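
The z-score conversion can be sketched as follows (in Python, which the text itself does not use). The means and standard deviations are the system-wide values from the last two rows of Exhibit 13; because those published statistics are rounded, the computed z-scores can differ from Exhibit 14 in the second decimal place.

```python
# A sketch (not from the text) of the z-score conversion: each raw metric
# value is centered on the system-wide mean and scaled by the system-wide
# standard deviation, both taken from the last two rows of Exhibit 13.

def z_scores(raw, means, std_devs):
    """Standardize one module's raw metric values."""
    return [round((x - m) / s, 2) for x, m, s in zip(raw, means, std_devs)]

# System-wide mean and standard deviation for the 12 metrics, in the order
# eta1, eta2, N1, N2, Exec, LOC, Nodes, Edges, Paths, Cycles, Maxpath, Avepath.
MEANS = [16, 50, 300, 154, 70, 138, 34, 44, 7301, 1, 30, 25]
STDS  = [13, 70, 594, 290, 131, 229, 55, 74, 16668, 3, 47, 43]

# Raw metric values for module 1 from Exhibit 13.
module_1 = [45, 282, 1335, 763, 339, 841, 114, 131, 14652, 13, 168, 136]

print(z_scores(module_1, MEANS, STDS))
```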

Exhibit 14: z-Scores for Raw Metric Values

start example

Module    η1     η2     N1     N2   Exec    LOC  Nodes  Edges  Paths  Cycles  Maxpath  Avepath
  1     2.21   3.31   1.74   2.10   2.06   3.06   1.46   1.17   0.44    3.94     2.94     2.61
  2    -0.12  -0.53  -0.43  -0.47  -0.45  -0.51  -0.48  -0.47  -0.44   -0.40    -0.50    -0.45
  3     2.36   2.98   3.78   3.45   3.74   3.87   1.26   1.14  -0.14    0.94     2.40     2.45
  4    -0.94  -0.65  -0.50  -0.52  -0.53   1.14  -0.37  -0.46  -0.44   -0.40    -0.43    -0.36
  5     1.38   0.95   0.96   0.75   1.09   0.48   0.80   0.79   1.00    0.61     1.21     1.02
  6     1.46   1.52   1.25   1.14   1.20   1.47   1.73   1.67   0.37    0.94     1.74     1.68
  7    -0.27  -0.32  -0.43  -0.42  -0.43  -0.11  -0.37  -0.46  -0.44   -0.40    -0.43    -0.36
  8     2.21   4.01   5.38   5.54   5.07   5.72   6.79   6.55  -0.25    4.28     8.76     9.50
  9     1.98   2.44   1.79   1.77   2.35   2.62   3.67   3.58   2.56    3.28     2.21     2.19
 10     0.78   0.17  -0.04  -0.16  -0.18  -0.30  -0.19  -0.17  -0.44   -0.40    -0.18    -0.18
 11    -0.34  -0.41  -0.46  -0.46  -0.46   0.63  -0.37  -0.46  -0.44   -0.40    -0.43    -0.36
 12     1.68   1.48   0.90   0.94   1.02   0.64   1.51   1.51   2.56    1.94     1.68     1.46
 13     0.63  -0.03  -0.02  -0.03   0.12  -0.20   0.30   0.32   0.55    1.27     0.72     0.51
 14     1.61   0.45   0.04   0.08   0.00   0.26  -0.01  -0.12  -0.44   -0.06     0.04    -0.01
 15     0.56   0.27   0.10   0.04  -0.02  -0.25  -0.17  -0.17  -0.42    0.61     0.27     0.17
 16     0.56   0.08  -0.15  -0.15  -0.04  -0.25  -0.15  -0.15  -0.40    0.27     0.31     0.19
 17     0.33  -0.31  -0.36  -0.37  -0.38  -0.50  -0.34  -0.33  -0.44   -0.06    -0.22    -0.26
 18     1.46   2.01   1.50   1.43   1.85   1.28   2.56   2.69  -0.30    1.61     2.55     2.62
 19     1.31   1.34   0.80   0.72   0.95   2.00   1.60   1.53  -0.43   -0.06     0.21     0.04
 20     0.71   0.01  -0.16  -0.18  -0.07  -0.26  -0.06  -0.07  -0.41   -0.06     0.12     0.12

end example

We discovered through the PCA of the 12 metrics listed in Exhibits 13 and 14 that there are only two distinct sources of variation. We would now like to transform the 12 raw metric values into their corresponding equivalents in the two new metric domains. Fortunately, the PCA technique produces a set of coefficients that will map the 12 metric z-scores shown in Exhibit 14 into the two new metric domains. The transformation matrix for the 12 metrics on the PASS data is shown in Exhibit 15.
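
The way PCA yields such a coefficient matrix can be sketched generically as follows. This is an illustration with NumPy on synthetic data, not the authors' actual computation; the scaling of each eigenvector by the reciprocal square root of its eigenvalue is one standard way to obtain factor scores with unit variance.

```python
import numpy as np

# A generic sketch of how PCA produces a coefficient matrix mapping metric
# z-scores into unit-variance domain (factor) scores: eigen-decompose the
# metric correlation matrix, keep the dominant components, and scale each
# eigenvector by 1/sqrt(its eigenvalue).

def domain_transform(z, n_domains=2):
    """Return the (metrics x n_domains) coefficient matrix for z-scores z."""
    r = np.corrcoef(z, rowvar=False)           # metric correlation matrix
    vals, vecs = np.linalg.eigh(r)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_domains]   # indices of the largest ones
    return vecs[:, top] / np.sqrt(vals[top])   # scale for unit-variance scores

# Synthetic demonstration: six metrics driven by two latent sources of
# variation (standing in for the "size" and "control" domains).
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
raw = np.hstack([latent[:, [0]] + 0.1 * rng.normal(size=(50, 3)),
                 latent[:, [1]] + 0.1 * rng.normal(size=(50, 3))])
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # population z-scores

scores = z @ domain_transform(z)   # domain scores: mean 0, std. dev. 1
```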

Exhibit 15: Transformation Matrix for z-Scores

start example

Metric     Size   Control
η1         0.14     -0.03
η2         0.22     -0.08
N1         0.26     -0.13
N2         0.26     -0.13
Exec       0.24     -0.10
LOC        0.20     -0.06
Nodes     -0.03      0.19
Edges     -0.04      0.20
Paths     -0.04      0.17
Cycles    -0.18      0.31
Maxpath   -0.10      0.26
Avepath   -0.11      0.27

end example

The z-scores for the PASS sample data are shown in Exhibit 14. This is a 20 × 12 matrix. When it is post-multiplied by the 12 × 2 matrix of coefficients shown in Exhibit 15, the result is a 20 × 2 matrix of factor scores, which we will call domain scores, for each of the 20 program modules. This product matrix of domain scores is shown in Exhibit 16. Note that each domain score has a mean of 0 and a standard deviation of 1, just like the z-scores. We have now reduced the data of Exhibit 13 to information. The program module that exhibits the largest size attribute is module 3. The most complex module from a control perspective is module 8.
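
The post-multiplication itself is just a matrix product. A minimal sketch in pure Python, using the Exhibit 15 coefficients and module 8's z-scores from Exhibit 14; agreement with Exhibit 16 is within the rounding of the published two-decimal coefficients.

```python
# Mapping one module's 12 metric z-scores into the two metric domains by
# post-multiplying with the Exhibit 15 coefficient matrix.

COEFF = [  # rows: the 12 metrics; columns: (Size, Control), from Exhibit 15
    (0.14, -0.03), (0.22, -0.08), (0.26, -0.13), (0.26, -0.13),
    (0.24, -0.10), (0.20, -0.06), (-0.03, 0.19), (-0.04, 0.20),
    (-0.04, 0.17), (-0.18, 0.31), (-0.10, 0.26), (-0.11, 0.27),
]

def domain_scores(z_row):
    """Return one module's (Size, Control) domain scores."""
    size = sum(z * c[0] for z, c in zip(z_row, COEFF))
    control = sum(z * c[1] for z, c in zip(z_row, COEFF))
    return round(size, 2), round(control, 2)

# z-scores for module 8 from Exhibit 14.
module_8 = [2.21, 4.01, 5.38, 5.54, 5.07, 5.72, 6.79, 6.55, -0.25, 4.28, 8.76, 9.50]
print(domain_scores(module_8))
```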

Exhibit 16: Domain Scores for the PASS Data

start example

Module   Size   Control
  1      1.74      2.06
  2     -0.36     -0.40
  3      3.78      0.17
  4     -0.24     -0.37
  5      0.76      0.78
  6      1.08      1.24
  7     -0.25     -0.38
  8      3.25      6.07
  9      1.42      2.92
 10      0.13     -0.34
 11     -0.16     -0.40
 12      0.53      1.91
 13     -0.34      0.92
 14      0.44     -0.23
 15     -0.01      0.12
 16     -0.12      0.13
 17     -0.30     -0.17
 18      1.10      2.00
 19      1.39      0.03
 20     -0.05     -0.01

end example

The size and control domain scores shown in Exhibit 16 both have a mean of 0 and a standard deviation of 1. We can therefore add them to create a new composite metric. This new metric is shown in the fourth column (Sum) of Exhibit 17; it is essentially a composite score for each program module on size and control complexity. Exhibit 17 has also been sorted by this new sum, and a new picture of the distribution of module complexity clearly emerges. Module 8 is, by far, the most complex of the 20 sample modules. If we have been very careful in our selection of metrics to include only those that are distinctly related to software faults, then the domain scores represented in Exhibit 17 are particularly relevant. A large score on the control domain, as is the case with module 8, indicates a real proclivity on the part of those writing module 8 to introduce control faults into that module.
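
Forming the composite and sorting by it can be sketched as follows, using the Exhibit 16 scores for four of the modules. (Sums recomputed this way can differ from Exhibit 17 in the last decimal place, because the published domain scores are themselves rounded.)

```python
# Add the two unit-variance domain scores to form the composite "Sum"
# metric of Exhibit 17, then rank the modules by it.

scores = {8: (3.25, 6.07), 2: (-0.36, -0.40), 9: (1.42, 2.92), 3: (3.78, 0.17)}

ranked = sorted(
    ((module, size, control, round(size + control, 2))
     for module, (size, control) in scores.items()),
    key=lambda row: row[3],
    reverse=True,  # most complex module first
)

for module, size, control, total in ranked:
    print(f"{module:>2}  {size:6.2f}  {control:6.2f}  {total:6.2f}")
```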

Exhibit 17: Sorted Domain Scores for the PASS Data

start example

Module   Size   Control    Sum
  8      3.25      6.07   9.31
  9      1.42      2.92   4.34
  3      3.78      0.17   3.95
  1      1.74      2.06   3.80
 18      1.10      2.00   3.10
 12      0.53      1.91   2.44
  6      1.08      1.24   2.32
  5      0.76      0.78   1.55
 19      1.39      0.03   1.42
 13     -0.34      0.92   0.58
 14      0.44     -0.23   0.21
 15     -0.01      0.12   0.12
 16     -0.12      0.13   0.01
 20     -0.05     -0.01  -0.06
 10      0.13     -0.34  -0.21
 17     -0.30     -0.17  -0.47
 11     -0.16     -0.40  -0.56
  4     -0.24     -0.37  -0.61
  7     -0.25     -0.38  -0.63
  2     -0.36     -0.40  -0.75

end example

The right-most column of Exhibit 17 is very revealing. Imagine that the program modules represented by this table constitute the entire system, and that we must ship this system sometime in the very near future. We would like to invest our test and inspection time wisely, so as to expose as many of the latent faults in the system as we can. We would be wise to invest our time in proportion to the likelihood of encountering faults in the code, for faults are not evenly distributed across the code: control faults are more likely to be found in modules whose control domain scores are high. The right-most column of the table (Sum) is our first cut at a fault surrogate; that is, a measure that varies in the same manner as software faults. There are other ways of creating surrogate fault measures, as we will now see.
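
One illustrative way to invest test time in proportion to the fault surrogate is sketched below. The allocation scheme shown here (shifting the Sum scores so every module receives a small positive weight, then splitting the budget proportionally) is an assumption for illustration only; the text does not prescribe a particular allocation formula.

```python
# A hypothetical proportional allocation of a fixed test budget, driven by
# the composite Sum fault surrogate of Exhibit 17. The shift plus the small
# constant 0.1 are arbitrary choices, not from the text, used only to keep
# every module's weight positive.

def allocate_hours(sums, budget_hours):
    """Split budget_hours across modules in proportion to shifted Sum scores."""
    shift = min(sums.values())
    weights = {m: s - shift + 0.1 for m, s in sums.items()}  # every weight > 0
    total = sum(weights.values())
    return {m: round(budget_hours * w / total, 1) for m, w in weights.items()}

# Composite Sum scores for four modules from Exhibit 17.
sums = {8: 9.31, 9: 4.34, 7: -0.63, 2: -0.75}
print(allocate_hours(sums, budget_hours=100))
```

Under this sketch, module 8 absorbs most of the budget, mirroring its standing as by far the most complex module.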



Software Engineering Measurement
ISBN: 0849315034
Year: 2003
Pages: 139