Chapter 54: Forecasting in the Presence of Special Events | MicrosoftВ® Office ExcelВ® 2007: Data Analysis and Business Modeling (Bpg -- Other)

Overview

How can I determine whether specific factors influence customer traffic?
How can I evaluate forecast accuracy?
How can I check whether my forecast errors are random?

For a student project, we attempted to forecast the number of customers visiting the Eastland Plaza Branch of the Indiana University (IU) Credit Union each day. Interviews with the branch manager made it clear that the following factors affected the number of customers:

Month of the year
Day of the week
Whether the day was a faculty or staff payday
Whether the day before or the day after was a holiday

How can I determine whether specific factors influence customer traffic?
The data collected is contained in the Original worksheet in the file Creditunion.xlsx, shown in Figure 54-1 on the next page. If we try to run a regression on this data by using dummy variables (as described in Chapter 48, “Incorporating Qualitative Factors into Multiple Regression”), the dependent variable would be the number of customers arriving each day (the data in column E). We would need 19 independent variables:
- 11 to account for the month (12 months minus 1)
- 4 to account for the day of the week (5 business days minus 1)
- 2 to account for the types of paydays that occur each month
- 2 to account for whether a particular day follows or precedes a holiday
Figure 54-1: Data used to predict credit union customer traffic
Microsoft Office Excel 2007 allows only 15 independent variables, so it appears that we’re in trouble.
When a regression forecasting model requires more than 15 independent variables, we can use the Excel Solver feature to estimate the coefficients of the independent variables. We can also use Excel to compute the R-squared values between forecasts and actual customer traffic and the standard deviation for the forecast errors. To analyze this data, I created a forecasting equation by using a lookup table to “look up” the day of the week, the month, and other factors. Then I used Solver to choose the coefficients for each level of each factor that yields the minimum sum of squared errors. (Each day’s error equals actual customers minus forecasted customers.) Here are the particulars.
I began by creating indicator variables (in columns G through J) for whether the day is a staff payday (SP), faculty payday (FAC), before a holiday (BH), or after a holiday (AH). (See Figure 54-1.) For example, in cells G4, H4, and J4, I entered 1 to indicate that January 2 was a staff payday, faculty payday, and after a holiday. Cell I4 contains 0 to indicate that January 2 was not before a holiday.
Our forecast is defined by a constant (which helps to center the forecasts so that they will be more accurate), and effects for each day of the week, each month, a staff payday, a faculty payday, a day occurring before a holiday, and a day occurring after a holiday. I inserted trial values for all these parameters (the Solver changing cells) in the cell range O4:O26, shown in Figure 54-2. Solver will then choose values that make our model best fit the data. For each day, our forecast of customer count will be generated by the following equation:

Figure 54-2: Changing cells and customer forecasts
```
 Predicted customer count=Constant+(Month effect)+(Day of week effect)+(Staff payday effect, if any)+(Faculty payday effect, if any)+(Before holiday effect, if any)+(After holiday effect, if any)
```
Using this model, we compute a forecast for each day’s customer count by copying from K4 to K5:K257 the formula
```
 $O$26+VLOOKUP(B4,$N$14:$O$25,2)+VLOOKUP(D4,$N$4:$O$8,2) +G4*$O$9+H4*$O$10+I4*$O$11+J4*$O$12
```
Cell O26 picks up the constant term. VLOOKUP(B4,$N$14:$O$25,2) picks up the month coefficient for the current month, and VLOOKUP(D4,$N$4:$O$8,2) picks up the day of the week coefficient for the current week. G4*$O$9+H4*$O$10+I4*$O$11+ J4*$O$12 picks up the effects (if any) when the current day is SP, FAC, BH, or AH.
By copying from L4 to L5:L257 the formula (E4-K4)^2, I compute the squared error for each day. Then, in cell L2, I compute the sum of squared errors with the formula SUM(L4:L257).
In cell R4, I average the day of the week changing cells with the formula AVERAGE (O4:O8), and in cell R5, I average the month changing cells with the formula AVERAGE (O14:O25). Later, we’ll constrain the average month and day of the week effects to equal 0, which ensures that a month or day of the week with a positive effect has a higher than average customer count, and a month or day of the week with a negative effect has a lower than average customer count.
We can use the Solver settings shown in Figure 54-3 on the next page to choose our forecast parameters to minimize the sum of squared errors.

Figure 54-3: Solver Parameters dialog box for determining forecast parameters
Our Solver model changes the coefficients for the month, day of the week, BH, AH, SP, FAC, and the constant to minimize the sum of square errors. We also constrain the average day of the week and month effect to equal 0. Using the Solver, we obtain the results shown in Figure 54-2. For example, we find that Friday is the busiest day of the week and June is the busiest month. A staff payday raises our forecast (all else being equal-in the Latin, ceteris paribus) by 397 customers.
How can I evaluate forecast accuracy?
To evaluate the accuracy of the forecast, we compute the R-squared value between the forecasts and the actual customer count in cell J1. The formula we use is RSQ(E4:E257, K4:K257). This formula computes the percentage of the actual variation in customer count that is explained by our forecasting model. We find that our independent variables explain 77 percent of the daily variation in customer count.
We compute the error for each day in column M by copying from M4 to M5:M257 the formula E4–K4. A close approximation to the standard error of the forecast is given by the standard deviation of the errors. This value is computed in cell M1 by using the formula STDEV(M4:M257). Thus, approximately 68 percent of our forecasts should be accurate within 163 customers, 95 percent accurate within 326 customers, and so on.
Let’s try and spot any outliers. Recall that an observation is an outlier if the absolute value of our forecast error exceeds two times the standard error of the regression. Select the range M4:M257, and then click Conditional Formatting on the Home tab. Next, select New Rule and in the New Formatting Rule dialog box, choose Use A Formula To Determine Which Cells To Format. Fill in the rule description in the dialog box as shown in Figure 54-4. (For more information about conditional formatting, see Chapter 22, “Conditional Formatting.”)

Figure 54-4: Using conditional formatting to spot forecast outliers
After choosing a format with a red font, our conditional formatting settings will display in red any error that exceeds 2*(standard deviation of errors) in absolute error. Looking at the outliers, we find that we often underforecast the customer count for the first three days of the month. Also, during the second week in March (spring break), we overforecast, and the day before spring break, we greatly underforecast.
To remedy this problem, in the 1st Three Days worksheet, we added changing cells for each of the first three days of the month and for spring break and the day before spring break. We added trial values for these new effects in cells O26:O30. By copying from K4 to K5:K257 the formula
```
 $O$25+VLOOKUP(B4,$N$13:$O$24,2)+VLOOKUP(D4,$N$4:$O$8,2)+G4*$O$9+H4* $O$10+I4*$O$11+J4*$O$12+IF(C4=1,$O$26,IF(C4=2,$O$27,IF(C4=3,$O$28,0)))
```
we include the effects of the first three days of the month. (The term IF(C4=1,$O$26, IF(C4=2,$O$27,IF(C4=3,$O$28,0))) picks up the effect of the first three days of the month.) We manually entered the spring break coefficients in cells K54:K57. For example, in cell K52 we added +O29 to the formula, and in cells K53:K57, we added +O30.
After including our new changing cells in the Solver dialog box, we find the results shown in Figure 54-5 on the next page. Notice that the first three days of the month greatly increase customer count (probably because of government support and Social Security checks) and that spring break reduces customer count. Figure 54-5 also shows the improvement in our forecasting accuracy. We have improved our R squared value (RSQ)t o 87 percent and reduced our standard error to 122 customers.

Figure 54-5: Forecast parameters and forecasts including spring break and the first three days of the month
By looking at the forecast errors for the week 12/24 through 12/31 (see Figure 54-6), we see that we’ve greatly overforecasted the customer counts for the days in this week. We also underforecasted customer counts for the week before Christmas. Further examination of our forecast errors (often called residuals) also shows us the following:
- Thanksgiving is different than a normal holiday in that the credit union is far less busy than expected the day after Thanksgiving.
- The day before Good Friday is really busy because people leave town for Easter.
- Tax day (April 16) is also busier than expected.
- The week before Indiana University starts fall classes (last week in August) was not busy, probably because many staff and faculty take a “summer fling vacation” before the hectic onrush of the fall semester.
Figure 54-6: Errors for Christmas week
In the Final mode worksheet, I added changing cells to incorporate the effects of these factors. After adding the new parameters as changing cells, we ran Solver again. The results are shown in Figure 54-7. Our RSQ is up to 92 percent and our standard error is down to 98.61 customers! Note that the post-Christmas week reduced our daily customer count by 359; the day before Thanksgiving added 607 customers; the day after Thanksgiving reduced customer count by 161, and so on.

Figure 54-7: Final forecast parameters
Notice how we’ve improved our forecasting model by using outliers. If your outliers have something in common (like being the first three days of the month), include the common factor as an independent variable and your forecasting error will drop.
How can I check whether my forecast errors are random?
A good forecasting method should create forecast errors or residuals that are random. By random errors, I mean that our errors exhibit no discernible pattern. If forecast errors are random, the sign of your errors should change (from plus to minus or minus to plus) approximately half the time. Therefore, a commonly used test to evaluate the randomness of forecast errors is to look at the number of sign changes in the errors. If you have n observations, nonrandomness of the errors is indicated if you find either fewer than
or more than
changes in sign. In the Final Model worksheet, as shown in Figure 54-8, I determined the number of sign changes in our residuals by copying from cell P5 to P6:P257 the formula IF(M5*M4<0,1,0). A sign change in the residuals occurs if and only if the product of two consecutive residuals is negative. Therefore, our formula yields 1 whenever a change in the sign of the residuals occurs. There were 125 changes in sign. In cell P1, I computed
changes in sign as the cutoff for nonrandom residuals. Therefore we have random residuals.

Figure 54-8: Determing whether the residuals are random
A similar analysis was done to predict daily customer counts for dinner at a major restaurant chain. The special factors corresponded to holidays. We found Super Sunday (the day of the football Super Bowl) to be the least busy day and Valentine’s Day and Mother’s Day to be the busiest. Also, Saturday was the busiest day of the week for dinner and Friday was the busiest day of the week for lunch.