Section 4.7. Testing Non-Linear Relationships


4.7. Testing Non-Linear Relationships

Regression assumes that the relationships in the data are linear. This is usually a safe assumption, but sometimes you can get a more accurate model if you allow for non-linear relationships . We can use the solver to test the potential value of using non-linear terms in the model.

We start by inserting a row at the top of the worksheet, setting it up as shown in Figure 4-26.

Figure 4-26. Setup to test non-linear relationships


This is a classification problem . We are dividing the population of dogs into two groups. As long as we can set a good threshold, we do not care how well the model predicts the exact value.

This means the intercept is not adding any value. The model will do just as well without it because we are only interested the correlation between the score and the payout. If we substitute a zero for the intercept in cell I2, the performance of the model does not change.

The results in Figure 4-26 are exactly the same as in Figure 4-23. Only the threshold is different. In row one of columns F, G, and H we enter 1. We change the formula in F3 from =B3*F$1 to =(B3^F$1)*F$2), and fill this new formula across to column H and down to row 712.

At the top of the L column we add the formula =CORREL(I3:I451,E3:E451) to measure the correlation between the scores in column I and the payouts in column E. The value is 0.103. This means the scores are positively correlated with the payouts, but the correlation is not especially strong.

We want to see if changing the values in F1:H1 can increase the correlation. For this we use the Solver, which is on the Tools menu. If the Solver does not appear as one of the items on the Tools menu, it may be necessary to select Add-Ins and make sure the Solver Add-In is checked.

The Solver dialog is filled out as shown in Figure 4-27.

Figure 4-27. The Solver dialog


The target cell is L1. This is the cell with the correlation formula and is the value we want to improve. Equal to Max is selected because we want the highest value possible for correlation.

The By Changing Cells field is set to the range F1:H2. This means Solver is allowed to change the values in this range to get the maximum possible value in L1.

In Figure 4-28 the results are compared to the best results for the non-linear model.

Figure 4-28. Comparing results


In Item 1 the correlation is increased from 0.103 to 0.1976. This is a significant increase and could mean the model will now do a better job. In Item 2 the Profit for the model group has gone up from $22.20 to $31.80. That is great, but the test group in Item 3 tells a different story. In the test group the profit of $6.80 has turned into a loss of $3.40. This suggests that the additional flexibility of the non-linear terms has caused the model to over-train.

In this case the linear model is the right one to use.



Analyzing Business Data with Excel
Analyzing Business Data with Excel
ISBN: 0596100736
EAN: 2147483647
Year: 2006
Pages: 101
Authors: Gerald Knight

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net