Level of Measurement | Fundamentals of Measurement Theory

We have seen that the path from theory to empirical hypothesis, and from theoretically defined concepts to operational definitions, is by no means direct. As the example illustrates, when we operationalize a definition and derive measurement indicators, we must consider the scale of measurement. For instance, to measure the quality of software inspection we may use a five-point scale to score the inspection effectiveness, or we may use a percentage to indicate the inspection coverage. In some cases, more than one measurement scale is applicable; in others, the nature of the concept and the resulting operational definition permit measurement only on a certain scale. In this section, we briefly discuss the four levels of measurement: nominal scale, ordinal scale, interval scale, and ratio scale.

Nominal Scale

The simplest operation in science, and the lowest level of measurement, is classification. In classifying we attempt to sort elements into categories with respect to a certain attribute. For example, if the attribute of interest is religion, we may classify the subjects of the study into Catholics, Protestants, Jews, Buddhists, and so on. If we classify software products by the development process models through which the products were developed, then we may have categories such as waterfall development process, spiral development process, iterative development process, object-oriented programming process, and others. In a nominal scale, the two key requirements are that the categories be jointly exhaustive and mutually exclusive. Mutually exclusive means that no subject can be classified into more than one category. Jointly exhaustive means that the categories together cover all possible values of the attribute, so that every subject falls into some category. If the attribute has more values than we are interested in, an "other" category is needed to make the categories jointly exhaustive.
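The two requirements can be illustrated with a minimal sketch. The category set, the `classify` helper, and the sample project labels below are assumptions made for illustration only; they are not taken from any real data set.

```python
from collections import Counter

# Hypothetical category set for the attribute "development process model".
CATEGORIES = {"waterfall", "spiral", "iterative", "object-oriented"}


def classify(process_model: str) -> str:
    """Return exactly one category for a raw label.

    Returning a single value keeps the categories mutually exclusive;
    the "other" fallback keeps them jointly exhaustive.
    """
    label = process_model.strip().lower()
    return label if label in CATEGORIES else "other"


# Made-up project labels, including one that fits no named category.
projects = ["Waterfall", "spiral", "Iterative", "cleanroom", "waterfall"]
counts = Counter(classify(p) for p in projects)
print(counts)
```

Counting is the only operation used here: the labels are names, so neither their order nor any arithmetic on them carries meaning.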

In a nominal scale, the names of the categories and their sequence bear no assumptions about relationships among categories. For instance, we place the waterfall development process in front of the spiral development process, but we do not imply that one is "better than" or "greater than" the other. As long as the requirements of mutual exclusivity and joint exhaustiveness are met, we have the minimal conditions necessary for the application of statistical analysis. For example, we may want to compare the values of attributes of interest, such as defect rate, cycle time, and requirements defects, across the different categories of software products.

Ordinal Scale

Ordinal scale refers to the measurement operations through which the subjects can be compared in order. For example, we may classify families according to socio-economic status: upper class, middle class, and lower class. We may classify software development projects according to the SEI maturity levels or according to a process rigor scale: totally adheres to process, somewhat adheres to process, does not adhere to process. Our earlier example of inspection effectiveness scoring is an ordinal scale.

The ordinal measurement scale is at a higher level than the nominal scale in the measurement hierarchy. Through it we are able not only to group subjects into categories, but also to order the categories. An ordinal scale is asymmetric in the sense that if A > B is true, then B > A is false. It has the transitivity property in that if A > B and B > C, then A > C.
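The asymmetry and transitivity properties can be sketched in code. The `Rigor` class below is a hypothetical encoding of the process rigor scale; the integer values serve only to define the order and are not meant as quantities.

```python
from enum import Enum
from functools import total_ordering


@total_ordering
class Rigor(Enum):
    """Hypothetical ordinal scale: values define order, not magnitude."""
    DOES_NOT_ADHERE = 1
    SOMEWHAT_ADHERES = 2
    TOTALLY_ADHERES = 3

    def __lt__(self, other):
        if isinstance(other, Rigor):
            return self.value < other.value
        return NotImplemented


a, b, c = Rigor.TOTALLY_ADHERES, Rigor.SOMEWHAT_ADHERES, Rigor.DOES_NOT_ADHERE

asymmetric = (a > b) and not (b > a)            # A > B implies not B > A
transitive = (a > b) and (b > c) and (a > c)    # A > B and B > C imply A > C
print(asymmetric, transitive)

# Arithmetic is deliberately left undefined for the categories.
subtraction_defined = True
try:
    a - b
except TypeError:
    subtraction_defined = False
print(subtraction_defined)
```

A plain `Enum` is used rather than `IntEnum` so that comparisons work while subtraction and addition raise an error, which matches what an ordinal scale permits.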

We must recognize that an ordinal scale offers no information about the magnitude of the differences between elements. For instance, for the process rigor scale we know only that "totally adheres to process" is better than "somewhat adheres to process" in terms of the quality outcome of the software product, and "somewhat adheres to process" is better than "does not adhere to process." However, we cannot say that the difference between the former pair of categories is the same as that between the latter pair. In customer satisfaction surveys of software products, the five-point Likert scale is often used with 1 = completely dissatisfied, 2 = somewhat dissatisfied, 3 = neutral, 4 = satisfied, and 5 = completely satisfied. We know only that 5 > 4, 4 > 3, 5 > 2, and so forth, but we cannot say how much greater 5 is than 4. Nor can we say that the difference between categories 5 and 4 is equal to that between categories 3 and 2. Indeed, moving customers from satisfied (4) to completely satisfied (5) may require very different actions and types of improvements than moving them from somewhat dissatisfied (2) to neutral (3).

Therefore, when we translate order relations into mathematical operations, we cannot use operations such as addition, subtraction, multiplication, and division. We can use "greater than" and "less than." However, in real-world applications of some specific types of ordinal scales (such as the Likert five-point, seven-point, or ten-point scales), the assumption of equal distance is often made and operations such as averaging are applied to these scales. In such cases, we should be aware that the measurement assumption is violated, and use extreme caution when interpreting the results of data analysis.
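As a rough illustration of this caution, the sketch below summarizes a set of made-up Likert responses in two ways: with order-based statistics (median and mode), which need only "greater than" and "less than" or counting, and with the mean, which is meaningful only under the equal-distance assumption. The response values are hypothetical.

```python
import statistics

# Hypothetical responses on the five-point scale
# (1 = completely dissatisfied ... 5 = completely satisfied).
responses = [5, 4, 4, 2, 1, 4, 3]

# Order-based summaries: valid for ordinal data.
# median_low always returns one of the observed categories.
median_category = statistics.median_low(responses)
modal_category = statistics.mode(responses)

# The mean treats the step from 1 to 2 as equal to the step from 4 to 5.
# It is computed here only to show what the equal-distance assumption buys.
mean_score = statistics.mean(responses)

print(median_category, modal_category, round(mean_score, 2))
```

Reporting the median or the share of responses in the top categories avoids the equal-distance assumption; if a mean is reported, the assumption should be stated alongside it.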

Interval and Ratio Scales

An interval scale indicates the exact differences between measurement points. The mathematical operations of addition and subtraction can be applied to interval scale data. For instance, assuming products A, B, and C are developed in the same language, if the defect rate of software product A is 5 defects per KLOC and product B's rate is 3.5 defects per KLOC, then we can say product A's defect level is 1.5 defects per KLOC higher than product B's defect level. An interval scale of measurement requires a well-defined unit of measurement that can be agreed on as a common standard and that is repeatable. Given a unit of measurement, it is possible to say that the difference between two scores is 15 units or that one difference is the same as a second. Assuming product C's defect rate is 2 defects per KLOC, we can thus say the difference in defect rate between products A and B is the same as that between B and C.
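The arithmetic in this example can be written out directly. The dictionary below simply restates the three defect rates given in the text; the variable names are illustrative.

```python
# Defect rates in defects per KLOC, as given in the text.
defect_rate = {"A": 5.0, "B": 3.5, "C": 2.0}

# Differences are meaningful because the unit (defects per KLOC) is well defined.
diff_ab = defect_rate["A"] - defect_rate["B"]
diff_bc = defect_rate["B"] - defect_rate["C"]

print(diff_ab, diff_bc, diff_ab == diff_bc)
```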

When an absolute or nonarbitrary zero point can be located on an interval scale, it becomes a ratio scale. The ratio scale is the highest level of measurement, and all mathematical operations can be applied to it, including division and multiplication. For example, we can say that product A's defect rate (5 defects per KLOC) is two and a half times product C's (2 defects per KLOC), because a defect rate of zero means that not a single defect exists in the product. Had the zero point been arbitrary, the statement would have been illegitimate. A good example of an interval scale with an arbitrary zero point is traditional temperature measurement (the Fahrenheit and centigrade scales). Thus we say that the difference between the average summer temperature (80 °F) and the average winter temperature (16 °F) is 64 °F, but we do not say that 80 °F is five times as hot as 16 °F. Fahrenheit and centigrade temperature scales are interval, not ratio, scales. For this reason, scientists developed the absolute temperature scale (the Kelvin scale, a ratio scale) for use in scientific activities.
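A short sketch shows why ratios are not meaningful on a scale with an arbitrary zero. It uses the standard Fahrenheit-to-centigrade and centigrade-to-kelvin conversions; the function names are illustrative. The same two temperatures give a different "ratio" on each scale, and only the kelvin ratio, taken from an absolute zero, has a physical interpretation.

```python
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0


def fahrenheit_to_kelvin(f: float) -> float:
    return fahrenheit_to_celsius(f) + 273.15


summer_f, winter_f = 80.0, 16.0

# The ratio depends on where each scale happens to put its zero.
ratio_f = summer_f / winter_f
ratio_c = fahrenheit_to_celsius(summer_f) / fahrenheit_to_celsius(winter_f)
ratio_k = fahrenheit_to_kelvin(summer_f) / fahrenheit_to_kelvin(winter_f)

print(ratio_f)            # 5.0 on the Fahrenheit scale
print(round(ratio_c, 2))  # negative on the centigrade scale
print(round(ratio_k, 2))  # roughly 1.13 on the absolute scale

# Differences, by contrast, agree across scales once units are converted.
diff_f = summer_f - winter_f
diff_k = fahrenheit_to_kelvin(summer_f) - fahrenheit_to_kelvin(winter_f)
print(diff_f, round(diff_k * 9.0 / 5.0, 6))
```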

Except for a few notable examples such as temperature, almost all interval measurement scales are, for practical purposes, also ratio scales. Once the size of the unit is established, it is usually possible to conceive of a zero unit.

For interval and ratio scales, the measurement can be expressed as either integer or noninteger data. Integer data are usually given in terms of frequency counts (e.g., the number of defects customers will encounter for a software product over a specified length of time).

We should note that the measurement scales are hierarchical. Each higher-level scale possesses all the properties of the lower ones. The higher the level of measurement, the more powerful the analysis that can be applied to the data. Therefore, in our operationalization process we should devise metrics that can take advantage of the highest level of measurement allowed by the nature of the concept and its definition. A higher-level measurement can always be reduced to a lower one, but not vice versa. For example, in our defect measurement we can always make various types of comparisons if the scale is in terms of actual defect rate. However, if the scale is in terms of excellent, good, average, worse than average, and poor, as compared to an industrial standard, then our ability to perform additional analysis of the data is limited.
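The one-way nature of this reduction can be sketched as follows. The benchmark value, the cut-off ratios, and the `to_ordinal` helper are all hypothetical choices made for illustration; the text does not prescribe any particular thresholds.

```python
# Hypothetical industry benchmark, in defects per KLOC.
INDUSTRY_BENCHMARK = 4.0


def to_ordinal(defect_rate: float, benchmark: float = INDUSTRY_BENCHMARK) -> str:
    """Reduce a ratio-scale defect rate to a five-category ordinal rating.

    The cut-offs are arbitrary illustrative thresholds.
    """
    relative = defect_rate / benchmark
    if relative <= 0.5:
        return "excellent"
    if relative <= 0.9:
        return "good"
    if relative <= 1.1:
        return "average"
    if relative <= 1.5:
        return "worse than average"
    return "poor"


# Products A, B, C use the rates from the text; D is a made-up extra product.
rates = {"A": 5.0, "B": 3.5, "C": 2.0, "D": 3.0}
ratings = {name: to_ordinal(rate) for name, rate in rates.items()}
print(ratings)
```

Products B and D receive the same rating although their defect rates differ, and nothing in the rating allows the original rates to be recovered. That loss is why the reduction cannot be reversed.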
