2.2 Measurement

Despite the fact that the theoretical foundation for software measurement lies in measurement theory, both practitioners and researchers have largely ignored this fact. The result is that much published work in software measurement is theoretically flawed. Some of this material may even be quite misleading. Let us begin our inquiry into software measurement with a brief look at the notion of measurement.

Measurement is the process by which numbers or symbols are assigned to objects, either real or abstract, that we observe in our intellectual environment. These objects we refer to as entities. An example of a real object is a person (human being) or an automobile. An example of an abstract object is a sorting algorithm. Each instance of these objects has certain properties or attributes. The car in my driveway is an example of an instance of the entity automobile. Each instance of the entity automobile would have the attributes of manufacturer (Ford, Chevrolet, etc.), size (compact, mid-size, van, etc.), weight, horsepower, etc. Measurement is also a process for standardization whereby we can make statements of equivalence among objects of many different types.

The process of identifying the attributes of an abstract object, on the other hand, is not nearly so simple. This fact is regrettable, in that most of the objects we will have to deal with in software engineering are abstract objects (i.e., requirements specifications, design rules, computer programs, software processes). Most of us have little or no training in the determination of the properties (attributes) of abstract objects. Thus, it will be very difficult for us to measure these attributes.

Measurement, in the most general sense, is the mapping of numbers (symbols) to attributes of objects in accordance with some prescribed rule. The property of the entities that determines the mapping according to that rule is called a magnitude, the measurable attribute. The number assigned to a particular object by the mapping rule is called its measure, the amount or degree of its magnitude. The mapping rule will define both the magnitude and the measure. The mapping must preserve intuitive and empirical observations about the attributes and entities.

Not all of these mappings are without ambiguity. In earlier, more simple times, the attribute gender of a person would map into one of two categories (male or female), corresponding to the physical properties of the individual. With our new understanding of genetics, the world is not so simple. In addition to males and females, we also now recognize super-males with XYY chromosomes and super-females with XXX chromosomes. What is really interesting is that sometimes a complete understanding of these attributes is vital, such as in sporting events; while at other times, it is irrelevant, such as assigning genders to restroom facilities.

Very frequently among the practitioners of software measurement in software engineering, the following question is asked: "What software metrics should we be using?" Or, "What is the best metric to use?" Let us rephrase these questions in a slightly different context. Suppose we were to ask an automotive engineer, "What is the best measure for this car?" Or, "What are the metrics I should use for this car?" Such questions would seem foolish to the automotive engineer. The engineer would know that each automobile has multiple attributes, all of which are necessary to describe the automobile. Red might be our favorite color. We would probably not use this as a sole criterion in the purchase of an automobile. Ferraris come in red, as do Fiats. There are enormous differences between these two types of vehicles with respect to other attributes, such as cost and performance.

2.2.1 Primitive and Derived Measurements

There are basically two types of measurement that we might employ in the software engineering discipline: primitive measures and derived measures. A primitive measurement is one that presupposes no others. It is atomic. It measures a single identifiable attribute of an entity. A derived measurement is one that is obtained from primitive measures through numerical calculation, or possibly indirect observation.

In the field of software metrics, Maurice Halstead, a pioneer in the field of software measurement, identified four primitive measures of a program. He observed that program tokens (terminal symbols in the syntax of the programming language) could be grouped into one of two classes: operators or operands. From this observation he then identified four primitive measures of these operators and operands as follows:

η₁, the number of unique operators in the program
η₂, the number of unique operands in the program
N_l, the total operator count
N₂, the total operand count

From this set of metric primitives, he then went on to create a set of derived measures. A subset of these derived metrics is as follows:

η = η₁ + η₂, the vocabulary size of a program
N = N₁ + N₂, the program length
V = N log₂ η, the program volume
, the estimated program length

There are many more of these derived metrics. It would be a waste of good paper to list any more of them. It is not clear just precisely what program attributes they are actually measuring, nor did Halstead attempt to validate the constructs that these metrics purported to measure.

Yet another context for the concept of a derived measurement is the circumstance when we are unable to measure directly an attribute we wish to know the value of. A case in point is the determination of the weight of individual gas molecules. We cannot determine these directly, but we may do so by employing Avogadro's law: that equal volumes of all gases, at the same temperature and pressure, contain equal numbers of molecules. We can take a gas whose molecular weight is known, weigh a sample of this gas, and then weigh a sample of a gas of equal volume whose molecular weight is not known. We can then derive the weight of the unknown gas by multiplication.

The most important thing we must know about derived metrics is that we cannot add apples and oranges. A new metric simply cannot be derived from two or more simple metrics. We recognize this fact in several aspects of our daily life. We do not recognize this fact in our analysis of software engineering problems. Consider the case of the city of Gold Hill, Colorado. Outside the town there is a sign that reads:

Gold Hill
Established	1859
Elevation	8463
Population	118
Total	10440

This is really a funny sign. It is obvious to us that the Total is meaningless. Now consider the following example from function point analysis.

Unadjusted Function Point Count
Number of external input types	i
Number of external output types	o
Number of inquiries	e
Number of external files	p
Number of internal files	f
Total	4 × i + 5 × o + 7 × p + 10 × f

Apparently the folks in Gold Hill do not have a lock on humorous metrics.

2.2.2 Scales

Some measures are intrinsically qualitative in nature. They will cause the mapping of an entity into a set of mutually exclusive classes. Automobiles, for example, can be seen to have the manufacturer attribute. This attribute may have one of the distinct values of Ford, Toyota, Chrysler, etc. Other types of measures will result in the assignment of a number or quantity for that attribute. Some of these numerical assignments will simply allow us to order instances of each entity. Other numerical assignments will allow us to perform arithmetic on the measures.

A scale of measurement determines the actual assignment of the numbers and symbols by the rules. The term "scale" may be taken to be the measuring instrument or the standard of measurement. It determines what operations among the numbers (symbols) assigned in a measurement will yield results significant for the particular attribute being measured. There are essentially two classes of scales: scales of intensive (qualitative) and extensive (quantitative) measurement. Qualitative scales permit us to determine, at most, a degree of the relationship between two or more instances of an entity. Quantitative scales, on the other hand, permit us to answer questions of how much or how many.

2.2.2.1 Qualitative Measurement Scales.

The nominal scale is one in which numbers (symbols) are assigned only as labels or names. The mappings on this scale are into mutually exclusive qualitative categories. An instance of the entity automobile may be perceived to belong to exactly one of the classes of automobile manufacturer. An instance of a human being will have the sexual property of male or female. Eye color and political affiliation are other examples of measures defined on a nominal scale.

Measures defined on nominal scales do not permit the use of arithmetic operators or relational operators. In this new age, one could get into serious trouble by presuming that males were in some sense better than females. Furthermore, the sum of two males is not a female. Sometimes, numbers will be used to represent this assignment. For example, we might let a variable x₁ = 0 represent the observation that a particular instance of the entity person is a male. Similarly, we might also let x₂ = 1 represents the observation that a particular instance of the same entity is a female. It would not be politically correct (nor accurate) for us to present the fact that x₁ < x₂.

Sometimes, these nominal scales lurk in the guises of numbers. In particular is the case of Social Security numbers. This personal attribute is strictly nominal. That someone has a Social Security number less than mine carries no essential information. Similarly, the numerical difference between two such numbers has no meaning. This fact is lost on many computer analysts and programmers who employ numerical conversions on these nominal data.

The partially ordered scale orders some relatively homogeneous subset of objects. The movie rating scale (G, PG-13, PG, R, and X) is an example of such a scale. The assignment of movies to each of these groups largely depends on preference dictated by cultural imperatives. In the United States, it is thought that children should be shielded from aspects of human sexuality. The exposure of gratuitous violence to these same children is not thought to be important. Other more pacific cultures might well accept human sexuality as quite normal and regard gratuitous violence as something to shield their children from. These people would rate the very same movies that we have classified to be X-rated as suitable for teenagers (PG-13) as long as these movies were free from violence. Similarly, the assignment of a person on the political scale of liberalism is very much a matter of perspective. A conservative politician to me might well be your view of a liberal renegade. It is also most unfortunate that this type of scale dominates the current standard for employee evaluation, the assessment of the worth of computer software, etc.

The ordinal scale provides for the numerical ordering of a set of objects, although this may be a weak ordering with the same number being assigned to two or more elements of the set. We generally associate this scale with the concept of ranking. With this scale it is possible to take a set of programs and rank them according to some criterion, such as cost, performance, or reliability. The ordinal scale allows us to use the set of relational operators (<, ≤, =, ≠, ≥, >) on the rank attributes of objects assigned values on an ordinal scale. Thus, it is possible to say that program A performs better than program B. Further, measures defined on this scale exhibit transitivity. Thus, if A > B, and B > C, then A > C.

What is lacking from the ordinal scale is some sense of the distance between objects of differing ranks. That is, if programmer A is the fifth ranked programmer in a company, programmer B is the sixth ranked programmer, and programmer C is the seventh ranked programmer, the differences between A and B may be inconsequentially small whereas the differences between B and C may be very large. If we are able to make such assertions that the differences between the sets of programmers are meaningful, then the scale is not just ordinal but constitutes an ordered metric. Not only are the objects ordered, but also the intervals between them are at least partially ordered.

Computer programmers can be assigned into classes of novice, programmer, lead programmer, and team leader. The presumption here is that a programmer at a given level can consistently perform a given programming task better than a programmer of lower rank. If we can establish that lead programmers consistently excel over programmers, whereas programmers generally excel over novices, then we are well on the way to establishing an ordered metric.

2.2.2.2 Quantitative Measurement Scales.

The interval scale is a scale that provides equal intervals from some arbitrary numerical point on the scale. An example of such a scale is the Fahrenheit (or Centigrade) temperature scale. Neither of these scales has an absolute zero point. While there is a 0° point on the Fahrenheit scale, a negative temperature does not represent the loss of anything nor does a positive temperature represent a gain in anything.

Interval scales do make use of the arithmetic operators (+, -). If the high temperature today is 60° and yesterday's high was 50°, then I can assert with meaning that today it is 10° warmer than yesterday. As an aside, it is interesting to note that multiplicative relationships can be used on the sums (differences) of these same temperatures. Thus, the difference in temperature between 50° and 70° is twice as great as the difference between 50° and 60°.

The ratio scale, on the other hand, does have an absolute zero point. The multiplying and exponentiation operators (x, ÷, ↑) can be used with measures on this scale. It is also true that any of the relational operators and adding operators can also be used. A ratio scale, for example, can be used to define the amount of money in my bank account. In this case, there is real meaning to the statement that I have a -$50 in my account. I am overdrawn. I owe the bank $50. If I have a current balance of $500, then the bank owes me $500. Further, if someone else has $1000 in his or her account, then he or she has twice as much as I do. If another individual has $5000 in his account, he has an order of magnitude more money than I do.

It is interesting to note that as I drive to my home, a measure of distance may be obtained from those little mileage signposts on the road. All of these signposts are measured from the southwest point in my state. This measure of distance is defined on an interval scale. The only zero on this scale is a very long distance from my home. My velocity relative to my home will be defined on a ratio scale. As long as I am driving toward my house, this velocity will be positive. As soon as I begin to drive away from my house, this velocity will be negative. If I stop, it becomes zero.