7.3 Nonlinear Models

The nonlinear modeling process begins with a nonlinear hypothesis. Essentially two types of nonlinear models are used for software measurement data. First are the polynomial models of the form:

(34)

These models can grow quite complex when we have multiple independent variables. A model can be linear in a subset of independent variables and nonlinear in the rest. An alternate nonlinear model can take the form:

(35)

where α is a parameter of the model.

Nonlinear models of the second type have little place or value in our work in software measurement. They are of no utility because we cannot formulate a reasonable prior hypothesis that looks like this: Faults = 0.04 * LOC^1.62. We commonly see these models employed by researchers who are ignorantly beating their data with a big statistical stick. They are trying to find a model to fit the data. Finding a model that fits the data and finding one that provides good future predictive capability are two very different things. The activity of fitting a model to the data is an unfortunate result of the common availability of very sophisticated and inexpensive statistical analysis tools. An excellent set of mechanic's tools will not make a novice a good mechanic. Arming a common foot soldier with a tank will not make him an effective warfighter; it will just make him dangerous. W. Edwards Deming is on record as having held that nobody should be allowed access to raw data without a substantial background in statistics. We firmly believe that the use of statistical packages should be restricted to those individuals with a fair level of training in statistical analysis.

Polynomial models do have some practical validity. We can easily imagine circumstances when they are very representative of a reality that we have personally experienced. Consider the relationship between the feeling of euphoria and beer consumption. Just about every college freshman has conducted an ad hoc experiment on this relationship. When said college freshman attends his first keg party, he notices that his feeling of euphoria rises with every glass of beer consumed, up to a point. Then the incremental increase in euphoria begins to drop. At some point in the consumption process, the euphoria value will have peaked and will start decreasing. The final glass of beer consumed will make our freshman regret the day he was born. This phenomenon clearly warrants a nonlinear hypothesis. We can go straight there. For those of us who have had the experience, there will be no need to conduct this experiment twice, first with a linear model and then with the obvious nonlinear second-order polynomial model.
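A minimal sketch of such a second-order polynomial model follows. The observations are invented purely for illustration (they are not measurements), and the names glasses, euphoria, and peak are assumptions of this sketch. It fits euphoria = b0 + b1*glasses + b2*glasses^2 by least squares and, when b2 is negative, locates the peak at -b1 / (2*b2).

```python
import numpy as np

# Hypothetical, made-up observations: glasses consumed and a euphoria score.
glasses = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
euphoria = np.array([0.0, 3.1, 5.2, 6.8, 7.4, 7.1, 5.9, 3.8, 1.2])

# Least-squares fit of euphoria = b0 + b1*glasses + b2*glasses**2.
# np.polyfit returns coefficients from the highest power down.
b2, b1, b0 = np.polyfit(glasses, euphoria, 2)

# A negative b2 means the fitted curve rises, peaks, and then falls.
# The peak of a quadratic lies where its derivative b1 + 2*b2*x is zero.
peak = -b1 / (2.0 * b2)

print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}, peak at {peak:.2f} glasses")
```

The point of the sketch is that the second-order term is part of the hypothesis from the start; it is not added after the fact to improve the fit.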

Sometimes, nonlinear relationships with independent variables are not initially apparent. When we plot the residuals against each of the independent variables, we expect that these values will be randomly distributed about each of the independent variables. If we see evidence of second- or third-order effects in the residuals (see Exhibit 14), then we will carefully examine the data and the measurement processes to try to determine the source of the nonlinearity. Again, if we are satisfied that we have truly observed a nonlinear relationship between the dependent variable and one or more of the independent variables, we must formulate a new hypothesis to reflect this nonlinearity, design a new experiment, collect new data, and fit our new nonlinear model to the new data.
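The residual check described above can be sketched as follows. The data are synthetic and deliberately contain a second-order component; the variable names and the use of a simple correlation as a curvature indicator are assumptions of this sketch, not a prescribed test. A plot of residuals against x would show the same pattern visually.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a genuine second-order effect plus noise (illustrative only).
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 1.0, x.size)

# Fit the first-order (linear) model and compute its residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# If the linear model were adequate, the residuals would scatter randomly
# about zero across the range of x. A crude indicator of a second-order
# effect is the correlation between the residuals and the centered square of x.
curvature = (x - x.mean()) ** 2
r = np.corrcoef(curvature, residuals)[0, 1]

print(f"correlation of residuals with (x - mean)^2: {r:.3f}")
```

A large correlation here is only a diagnostic. It signals that the hypothesis should be revisited and tested against newly collected data, not that a quadratic should simply be refit to the same observations.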

Sometimes, the data we work with are of great magnitude. Tiny relative fluctuations in these data will dominate more subtle phenomena that we are also modeling. In this case, it might be genuinely useful to transform the raw data. A common transformation is the logarithmic transformation. This transformation is commonly used for measuring the relative strengths of earthquakes on the logarithmic Richter scale. If we wish to perform a suitable transformation on our data, then our experimental hypothesis must reflect this transformation. If, for example, we were to model the linear relationship between the frequency of canary chirps and local earthquakes measured on the Richter scale, then we would really be modeling the exponential relationship between chirps and the absolute energy of earthquakes. Stated another way, the models that we build that employ data transformations must make sense in terms of the inverse mapping of the transformed data.
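The inverse-mapping point can be illustrated with a small sketch. Everything here is hypothetical: the energies, the chirp rates, and the simplifying assumption that magnitude is just log10 of energy (a stand-in for a real magnitude scale, which has additional constants). The sketch fits a linear model on the transformed scale and then writes out what that model asserts on the raw scale.

```python
import numpy as np

# Hypothetical raw data spanning several orders of magnitude.
energy = np.array([1e3, 1e4, 1e5, 1e6, 1e7, 1e8])

# Assumed logarithmic transformation (stand-in for a magnitude scale).
magnitude = np.log10(energy)

# Hypothetical chirp rates, constructed to be exactly linear in magnitude.
chirps = 2.0 + 3.0 * magnitude

# Linear model on the transformed scale: chirps = b0 + b1 * magnitude.
b1, b0 = np.polyfit(magnitude, chirps, 1)

def energy_from_chirps(c):
    """Inverse mapping: the raw-scale relationship implied by the linear model."""
    # chirps = b0 + b1 * log10(energy)  implies  energy = 10 ** ((chirps - b0) / b1)
    return 10.0 ** ((c - b0) / b1)

print(b0, b1)
print(energy_from_chirps(chirps))
```

Each additional unit of chirp rate corresponds to multiplying the energy by a constant factor, 10 ** (1 / b1), which is the exponential relationship that the hypothesis has to justify.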