4.5 Empirical Validity

The concept of the empirical validity of a metric relates to two distinct considerations, both based on the statistical notion of variability. First, a metric must identify a unique source of variation not already present in any of the other metrics that we might be using. Second, this new source of variation must serve to explain additional variation in one or more of our criterion measures; that is, the metric must vary as a function of concomitant variation in a criterion measure. Both of these considerations may be determined only through experimentation. We must design, develop, and conduct carefully controlled experiments to test for empirical validity. This is a time- and resource-consuming process. We have often witnessed attempts to circumvent the application of scientific methodology in the validation of metrics. The most common of these is to hire a metric expert, or a panel of metric experts, to identify the appropriate metrics to use in an organization. Much of the measurement process is counter-intuitive: things simply do not work the way we think they should. The use of statistics and sound experimental methodology provides the best opportunity to identify a viable working set of metrics for all of our measurement domains.

In short, a new metric must contribute a significant increase in the proportion of variation explained in a criterion variable. If, for example, we are attempting to validate a new complexity metric for the prediction of software faults, then the fault prediction model built from the existing complexity metrics must fit significantly better with the addition of the new metric. A good example of the validation process can be found in the work by Anger et al. [2] or in our own work on the validation of the data structures metric [3].
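One standard way to operationalize "significantly better" is a nested-model comparison: fit a regression of faults on the existing metrics, refit with the candidate metric added, and test the increase in explained variance with a partial F-test. The sketch below is illustrative only; the data are synthetic, and the variable names and the use of the statsmodels library are assumptions, not part of the original text.

```python
# Minimal sketch of a partial F-test for validating a candidate metric.
# Synthetic data; names (existing, candidate, faults) are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
existing = rng.normal(size=(n, 2))        # metrics already in the model
candidate = rng.normal(size=n)            # new metric under validation
faults = 3 + existing @ np.array([1.5, 0.8]) + 0.6 * candidate + rng.normal(size=n)

# Reduced model: existing metrics only.  Full model: existing + candidate.
reduced = sm.OLS(faults, sm.add_constant(existing)).fit()
full = sm.OLS(faults, sm.add_constant(np.column_stack([existing, candidate]))).fit()

# Partial F-test on the nested models: does the candidate metric explain
# a significant amount of additional variation in the criterion?
f_stat, p_value, _ = full.compare_f_test(reduced)
print(f"R^2 reduced = {reduced.rsquared:.3f}, R^2 full = {full.rsquared:.3f}")
print(f"partial F = {f_stat:.2f}, p = {p_value:.4g}")
```

If the p-value is small, the candidate metric has contributed a distinct, useful source of variation; if the R-squared barely moves, it has not, however plausible it may seem on its face.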

As we learn to apply this empirical validation process, we soon discover that only a limited subset of the existing metrics meets these criteria. As we will see in Chapter 5, the Halstead software science metrics are a good example of this. [4] Consider the case of Halstead's program vocabulary, η = η1 + η2. From the standpoint of linear modeling, we simply cannot use the value of η because it is a linear compound of two other metrics, η1 and η2. We observe a similar circumstance with McCabe's cyclomatic complexity, which is a linear combination of the edges and nodes in a control flowgraph representation of a program (V(g) = Edges − Nodes + 2 for a flowgraph with single entry and exit nodes). The metric primitives of Nodes and Edges already contain all of the information. It is clear that a metric that is a simple linear derivative of two other metrics cannot identify sources of variation different from the metrics that comprise it. In essence, if you know that a person has 25 cents in change in one pants pocket and 50 cents in the other, you will learn nothing new from my telling you that he has 75 cents in his pants pockets.
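The point can be checked numerically. In the sketch below (synthetic data; the metric names are hypothetical), adding the vocabulary η to a model that already contains η1 and η2 does not raise the rank of the design matrix, and the proportion of explained variance is unchanged to machine precision.

```python
# A linear compound of metrics already in the model adds no information:
# the design matrix gains no rank and R^2 does not change.
import numpy as np

rng = np.random.default_rng(1)
n = 100
eta1 = rng.poisson(20, size=n).astype(float)   # unique operator count (synthetic)
eta2 = rng.poisson(35, size=n).astype(float)   # unique operand count (synthetic)
eta = eta1 + eta2                              # program vocabulary, a pure linear compound
faults = 0.3 * eta1 + 0.2 * eta2 + rng.normal(size=n)

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

print("rank without eta:", np.linalg.matrix_rank(np.column_stack([np.ones(n), eta1, eta2])))
print("rank with    eta:", np.linalg.matrix_rank(np.column_stack([np.ones(n), eta1, eta2, eta])))
print("R^2  without eta:", round(r_squared(np.column_stack([eta1, eta2]), faults), 6))
print("R^2  with    eta:", round(r_squared(np.column_stack([eta1, eta2, eta]), faults), 6))
```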

The world we live in is not a simple one. There is no one number that can characterize a human being. Human beings have many different attributes, such as height, weight, age, race, gender, hair color, eye color, and an astonishing array of genetic attributes. Each of these attributes can be measured individually. Samples from individuals measured on a single attribute follow some type of univariate probability distribution (probably not the normal distribution). Height, in and of itself, is not a good characterization of a person. To characterize a person more completely, we will need measures on a plethora of attribute dimensions simultaneously. Realistically, we must learn to deal with the world as it really is: it is a multivariate world. A command of multivariate statistical analysis is vital to our understanding of this world. In all likelihood, our criterion measures are also multivariate in nature. We would like our software to be maintainable, reliable, secure, and survivable, all at once. Our studies of empirical validity will most certainly not be a matter of simple correlation between a single metric and a single criterion measure. These empirical validation investigations will be complex multivariate experimental designs.
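One common first step in such a multivariate analysis is to resolve a set of correlated metrics into their distinct, orthogonal sources of variation before relating them to any criteria. The sketch below is only an illustration of that idea, using principal components computed by hand on synthetic data; the latent "size" and "control flow" domains are hypothetical and not drawn from the original text.

```python
# Sketch: six correlated raw metrics generated from two latent sources of
# variation; an eigen-decomposition of their correlation matrix recovers
# two dominant orthogonal components.  Synthetic, illustrative data only.
import numpy as np

rng = np.random.default_rng(2)
n = 300
size_domain = rng.normal(size=n)        # latent "size" source of variation
control_domain = rng.normal(size=n)     # latent "control flow" source of variation
metrics = np.column_stack([
    size_domain + 0.1 * rng.normal(size=n),
    2 * size_domain + 0.1 * rng.normal(size=n),
    size_domain + control_domain + 0.1 * rng.normal(size=n),
    control_domain + 0.1 * rng.normal(size=n),
    3 * control_domain + 0.1 * rng.normal(size=n),
    size_domain - control_domain + 0.1 * rng.normal(size=n),
])

# Standardize the metrics, then eigen-decompose their correlation matrix.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
order = np.argsort(eigvals)[::-1]
explained = eigvals[order] / eigvals.sum()
print("proportion of variance by component:", np.round(explained, 3))
# Two large components dominate: the six metrics measure only two things.
```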

[2] Anger, F.D., Munson, J.C., and Rodriguez, R.V., Temporal Complexity and Software Faults, Proceedings of the IEEE International Symposium on Software Reliability Engineering, IEEE Computer Society Press, Los Alamitos, CA, 1994.

[3] Munson, J.C. and Khoshgoftaar, T.M., The Measurement of Data Structure Complexity, Journal of Systems and Software, 20(3), 217–226, March 1993.

[4] Halstead, M.H., Elements of Software Science, Elsevier, New York, 1977.
