Chapter 7: Modeling with Metrics

7.1 Introduction

From the discussion in Chapter 5, it is clear that there are many different software attributes that we can measure. We can, for example, measure LOC, Exec, Nodes, etc. Knowledge of these specific attributes for a particular program module, by itself, does us little or no good. The problem is only aggravated when we have these data on thousands of program modules. We easily come to the same conclusion as most people who have mindlessly measured code: so what? We have invested resources to acquire knowledge that we simply cannot put to good use.

For a moment, then, let us turn the question around. What is it that we really want to know about the programs that we are building? It would be relatively easy to construct such a list. We merely have to open just about any software engineering textbook and start writing. Following is a partial list of what we would like to know about our programs:

  • Maintainability

  • Reliability

  • Availability

  • Interoperability

  • Portability

  • Security

Unfortunately, it is very difficult to measure these things. In general, we will refer to these "ilities" as software quality attributes. The software attributes that lead to an understanding of software quality are the things that we most wish to measure. At the top of this list, perhaps, is the number of faults in each code module. If we were to know this, then we could set about to eliminate them one by one until we had perfect code. Software faults, however, are very elusive. They escape even the closest scrutiny. We can never know, with any degree of certainty, how many faults there are in a code module. We can only find them if we look very closely during software inspections or when the faults express themselves as failure events. This problem is exacerbated by the fact that the code base is probably changing over time. New faults are continually being added and some are being removed. In plain terms, we cannot measure the very thing that we wish to know. We can track with some precision how many faults we have removed from a system but we cannot know exactly how many we put there in the first place.

This is our dilemma. There are software attributes that we can measure; unfortunately, these attributes are not what we want to know. There are also software attributes that we cannot measure, but these are the very things that we must know. We can, however, learn from our past experiences in software development. We cannot know, for example, how many faults there are in our new DEF software system. We do have a pretty good idea how many faults there were in our legacy ABC system that is very similar to the DEF system that we are now developing. There is reason to believe that, all other software process issues being equal, the faults that were introduced by our software process in the development of the DEF system will be similar to those introduced into the ABC system during its development. If we have kept good records on the fault tracking system for the faults that were found during the development of the ABC system, we can develop a functional relationship between the number of faults found in each of the modules of the ABC system and certain measurable software attributes such as Exec and Nodes. This functional relationship is a model for the software faults of the ABC system.
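The functional relationship described above can be sketched as an ordinary least-squares fit of fault counts against measurable attributes such as Exec and Nodes. The following is a minimal illustration only: the module metrics and fault counts are invented, and a real calibration would of course use the full fault-tracking data for the ABC system.

```python
import numpy as np

# Hypothetical per-module data from the legacy ABC system (values invented).
exec_stmts = np.array([120, 340, 85, 560, 230], dtype=float)  # executable statements (Exec)
nodes      = np.array([14, 42, 9, 71, 30], dtype=float)       # control-flow nodes (Nodes)
faults     = np.array([3, 9, 1, 15, 6], dtype=float)          # faults logged per module

# Design matrix with an intercept column: faults ~ b0 + b1*Exec + b2*Nodes.
X = np.column_stack([np.ones_like(exec_stmts), exec_stmts, nodes])

# Ordinary least-squares estimate of the model coefficients.
coeffs, *_ = np.linalg.lstsq(X, faults, rcond=None)
b0, b1, b2 = coeffs
print(f"faults ~ {b0:.3f} + {b1:.4f}*Exec + {b2:.4f}*Nodes")
```

The fitted coefficients summarize how faults varied with the measured attributes in the legacy system; the quality of this summary depends entirely on the quality of the fault records behind it.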

If we assume that the new DEF system is similar to the ABC system in terms of software engineering methodology, programming language, and development team, then we can apply the model that we developed for the ABC system to the new DEF system. This will permit us to predict both the location and the quantity of faults in the new DEF system. In this case we can use our knowledge of things that we can measure but do not really want to know about the DEF system to predict the things that we cannot measure but really want to know.

Thus, there are really two different kinds of metrics. There are those that can be set by us to a particular value, or at least directly observed. For modeling purposes, we will call these independent variables. Then there are metrics whose values are directly affected by changes in the independent variables. These metrics we will call dependent variables or criterion variables. The "ilities" will always be our criterion measures.

In our investigations into modeling, we will first develop models that explore linear relationships between one or more independent variables and a single criterion measure such as software faults. We will then study the particular case where the criterion measure has nominal values, using a technique called discriminant analysis. Next we will investigate a family of models called canonical correlation, wherein we will have multiple dependent variables. Finally, we will examine nonlinear relationships between our independent variables and a single dependent variable.

The predictive models that we develop in this chapter will allow us to map between the things that we can know and manipulate (i.e., the independent variables) and the things that we really want to know (i.e., the dependent variables). However, this measurement and modeling approach will work well if and only if we have good measurement data. If the data are weak, the predictive models that we develop will be equally weak. This is yet another example of the garbage-in and garbage-out problem.

One of the fundamental tenets of software measurement is that these measurements will disclose aspects of the quality of the system. In particular, we are interested in the use of metrics to determine the impact of change on the software quality of the system as the system changes over time. While we cannot know the numbers and locations of faults, we can build models based on observed relationships between faults and some other measurable software attributes. Software faults and other measures of software quality can be known only at the point that the software has finally been retired from service. Only then can it be said that all of the relevant faults have been isolated and removed from the software system. On the other hand, software complexity can be measured very early in the software life cycle. In some cases, these measures of software complexity can be extracted from design documents. Some of these measures are very good leading indicators of potential software faults.



Software Engineering Measurement
ISBN: 0849315034
Year: 2003
Pages: 139
