The isolation of cause and effect in controlled experiments is relatively easy. For example, suppose a headache medicine was administered to a sample of subjects who had headaches, and a placebo was administered to another group with headaches that was statistically no different from the first group. If, after a period of treatment, the headaches of the first group were reduced or gone while those of the second group persisted, then the curative effect of the medicine is clear.
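As a sketch of the statistical comparison such an experiment implies (the counts are hypothetical and the choice of test is ours, not the author's), a simple test of independence contrasts relief rates in the two groups:

```python
from scipy import stats

# Hypothetical counts (illustrative only): subjects whose headaches
# were relieved vs. not relieved in each randomized group.
medicine = [42, 8]   # relieved, not relieved
placebo = [15, 35]

# A chi-square test of independence asks whether relief rates differ
# between the two groups by more than chance alone would allow.
chi2, p, dof, expected = stats.chi2_contingency([medicine, placebo])
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```

Because random assignment makes the groups comparable beforehand, a significant difference here can be read causally; the rest of this section explains why the same reading is not automatic for observational data.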
For analysis with observational data, the task is much more difficult. Researchers (e.g., Babbie, 1986) have identified three criteria:

1. The cause must precede the effect in time, or the causal direction must be clear in logic.
2. The two variables must be empirically correlated with one another.
3. The observed correlation must not be explained by a spurious relationship with some third variable.
The first and second requirements simply state that, in addition to empirical correlation, the relationship has to be examined in terms of sequence of occurrence or deductive logic. Correlation is only a statistical tool and can be misused without the guidance of a logic system. For instance, it is possible to correlate the outcome of a Super Bowl (National Football Conference versus American Football Conference) with interesting artifacts such as fashion (skirt length, popular color, and so forth) and the weather. However, logic tells us that coincidence or spurious association cannot substantiate causation.
The third requirement is a difficult one. There are several types of spurious relationships, as Figure 3.8 shows, and it can be a formidable task to demonstrate that an observed correlation is not due to any of them. For this reason, it is much more difficult to establish causality with observational data than with experimental data. Nonetheless, ruling out spurious relationships is a necessary part of scientific reasoning, and doing so yields higher-quality findings from the data.
Figure 3.8. Spurious Relationships
In Figure 3.8, case A is the typical spurious relationship between X and Y, in which X and Y have a common cause Z. Case B is the case of an intervening variable, in which the real cause of Y is the intervening variable Z rather than X. In the strict sense, X is not a direct cause of Y; however, since X causes Z and Z in turn causes Y, one could claim causality if the sequence is not too indirect. Case C is similar to case A. However, instead of X and Y having a common cause as in case A, both X and Y are indicators (operational definitions) of the same concept C. It is logical for them to be correlated, but causality should not be inferred.
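Before turning to a concrete example of case C, a minimal simulation (our illustration, not from the original text; the coefficients and sample size are arbitrary) shows how a common cause Z in case A induces a correlation between X and Y even though neither causes the other, and how controlling for Z exposes the correlation as spurious:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Case A: Z is a common cause; X and Y have no direct causal link.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = 1.5 * z + rng.normal(size=n)

# X and Y are strongly correlated purely through Z.
print("corr(X, Y):", np.corrcoef(x, y)[0, 1])

# Regressing Z out of both variables removes the association,
# exposing the X-Y correlation as spurious.
x_res = x - np.polyfit(z, x, 1)[0] * z
y_res = y - np.polyfit(z, y, 1)[0] * z
print("corr(X, Y | Z):", np.corrcoef(x_res, y_res)[0, 1])
```

Regressing Z out of both variables is a simple stand-in for the partial-correlation checks an analyst would apply to observational data when a common cause is suspected.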
An example of a spurious relationship caused by two indicators measuring the same concept is Halstead's (1977) software science formula for program length:
N = n1 log2 n1 + n2 log2 n2

where

N = estimated program length
n1 = number of unique operators
n2 = number of unique operands
Researchers have reported high correlations between actual program length (actual lines of code count) and the length predicted by the formula, sometimes as high as 0.95 (Fitzsimmons and Love, 1978). However, as Card and Agresti (1987) show, both the formula and actual program length are functions of n1 and n2, so correlation exists by definition. In other words, both the formula and the actual lines of code count are operational measurements of the concept of program length. Moreover, one has to count n1 and n2 for the formula to work, and those counts are not available until the program is complete or nearly complete. Therefore, the relationship is not one of cause and effect, and the formula's usefulness as a predictor is limited.
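For concreteness, here is a minimal sketch of the length estimate (the operator and operand counts below are hypothetical). It also makes the practical limitation visible: n1 and n2 can be counted only from code that already exists, so the "prediction" arrives after the fact.

```python
import math

def halstead_length(n1: int, n2: int) -> float:
    """Halstead's estimated program length: N = n1*log2(n1) + n2*log2(n2)."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)

# Hypothetical counts for a small, already-written program.
n1, n2 = 20, 35  # unique operators, unique operands
print(f"Estimated length N = {halstead_length(n1, n2):.1f} tokens")
```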