The approach analyzed above implies that the best forecasting method for each of the 300 items has to be determined individually within each of the three forecasting groups, as well as the individual weights for each item. A second experiment examined the effects of using the same forecasting methods for all 10 items. The method within each forecasting group that gave the best forecasting results for the largest number of items was chosen. This was Holt's method for the exponential smoothing group and the AR(1) model for the Box-Jenkins group. However, there was a tie between the linear and hyperbolic methods for the regression over time group, so both methods were included in the weightings.
In this case, the QP model gave weighted combinations of two or more forecasting methods for all but one of the 10 items. The weighting methods evaluated were the Fixed Weights, the Rolling Window Weights and the Simple Average Weights. The results of the evaluations are summarized, on the same criteria as before, in Tables 4a and 4b. To provide a direct comparison with the earlier results, the items are divided into the same samples A and B as before.
Table 4a: Sample A

| WEIGHTING METHOD | OP | BP[a] | WP[a] | MPI  | WPP   | SRMSE  |
|------------------|----|-------|-------|------|-------|--------|
| Simple Average   | 2  | 2     | 2     | 5.8% | 26.9% | 3755.9 |
| Fixed            | 3  | 5     | 0     | 7.0% | 14.8% | 3614.0 |
| Rolling Window   | 0  | 1     | 6     | 0.9% | 23.9% | 3818.0 |

[a] Best or worst of the three methods compared in the table.
Table 4b: Sample B

| WEIGHTING METHOD | OP | BP[a] | WP[a] | MPI   | WPP  | SRMSE |
|------------------|----|-------|-------|-------|------|-------|
| Simple Average   | 2  | 1     | 1     | 11.1% | 6.1% | 315.4 |
| Fixed            | 1  | 0     | 1     | 15.9% | 3.6% | 338.2 |
| Rolling Window   | 1  | 1     | 0     | 30.4% | 0.3% | 319.2 |

[a] Best or worst of the three methods compared in the table.
In comparing the three weighting methods, the results are very similar to those presented in the earlier analysis summarized in Tables 3a and 3b. The Fixed Weights method is by far the best for the 8-item sample A, but the worst for the 2-item sample B. The Simple Average Weights method is the best for the 2-item sample B but only the second best for the 8-item sample A. Interestingly, in comparing the results between Tables 3 and 4 for the different weighting methods and samples, there is little difference in performance. Over all 10 items the sum of root mean squared errors is 3952 (3922) for Fixed Weights, 4138 (4058) for Rolling Window Weights and 4071 (4033) for Simple Average Weights, the numbers in brackets being the earlier values. There is virtually no difference for any method. The hybrid method of using simple averaging if the QP model identified only a single weight at unity again gave a minor improvement. This result suggests that there is little to be gained by using different forecasting methods for the different items. There is then no need to find the best forecasting model in each forecasting group for all the 300 items that have to be stocked. Note, however, that the actual smoothing parameters in some of the models, Holt's method, for example, will differ from item to item.
Assuming that the forecasts can be considered unbiased, an F test applied to a pairwise comparison of the error variances of the combined forecasts and the Holt's forecasts for each item would appear to provide a formal test of the statistical significance of the results. The sum of squared errors is calculated over 12 monthly forecasts. The critical value at the 5% significance level for the F(12, 12) distribution is 2.69. This value implies that the root mean squared error for an item would have to be reduced by about 40% for the reduction to be statistically significant. This is quite a large value and is due to the small sample size. Table 5 shows that two of the items do pass this test.
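The link between the critical F value and the required error reduction can be checked directly. A minimal sketch, using only the 2.69 critical value quoted above:

```python
import math

# Critical value of F(12, 12) at the 5% significance level, as quoted in the text.
f_critical = 2.69

# The F statistic is a ratio of error variances (SSE_Holt / SSE_combined),
# so the corresponding ratio of root mean squared errors is its square root.
rmse_ratio = math.sqrt(f_critical)

# Proportional reduction in RMSE needed for the F test to reject at the 5% level.
reduction = 1 - 1 / rmse_ratio
print(f"RMSE must fall by {reduction:.1%}")  # roughly 39%, i.e. about 40%
```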
Table 5: Pairwise comparison of forecast errors by item

| ITEM   | Sum of Squared Errors (Holt's Method) | Sum of Squared Errors (Combined Forecasts) | F Statistic |
|--------|---------------------------------------|--------------------------------------------|-------------|
| SAV740 | 57,348                                | 40,591                                     | 1.41        |
| SAV091 | 4,438                                 | 4,387                                      | 1.01        |
| SAV763 | 19,207,862                            | 16,330,067                                 | 1.18        |
| SAV012 | 1,842,651                             | 1,800,108                                  | 1.02        |
| REM061 | 1,340,917                             | 1,286,603                                  | 1.04        |
| SAV739 | 284,149                               | 69,513                                     | 4.09        |
| CUA085 | 2,378,543                             | 835,275                                    | 2.85        |
| SAV013 | 2,036,928                             | 1,858,052                                  | 1.10        |
| CUA778 | 9,336,570                             | 8,091,448                                  | 1.15        |
| REM037 | 2,152,837                             | 2,144,509                                  | 1.00        |
The F-test considers each series in isolation. An important result is that the combination of forecasts reduces the sum of squared errors (SSE) for every one of the 10 items. The way the experiments have been structured means that the choice of weights and forecasting method is made by fitting the data over the first 48 months. The alternatives are then evaluated by their performance on a further set of data beyond the fitting period, months 49 to 60. The same treatment is applied to each of the 10 series. If the combination of weights had no consistent effect on performance, then one would expect the differences between the SSE for Holt's method and the SSE for the combined forecasts to be randomly positive or negative across the 10 items. Under a null hypothesis that the median of the differences between the two SSEs is zero, one would expect 5 positive differences (Holt's error bigger than the combined error) and 5 negative differences. The sampling distribution of the number of positive (or negative) signs is binomial with p = 0.5. This gives a one-tailed test of the alternative hypothesis that the median difference is greater than zero. The probability of observing zero negative differences for the 10 items under the null hypothesis is 0.001, from any standard binomial probability table. So looking at the results across all 10 items as a whole, rather than individually, there is a statistically significant reduction in the forecasting errors from using the combination of forecasts compared to using Holt's method alone.
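The 0.001 figure for this sign test follows directly from the binomial distribution. A minimal check:

```python
# Under the null hypothesis, each of the 10 SSE differences is positive with
# probability 0.5, so the count of positive signs is Binomial(n=10, p=0.5).
n = 10

# Probability of observing all 10 differences positive (zero negatives).
p_all_positive = 0.5 ** n
print(f"{p_all_positive:.4f}")  # 0.0010, the 0.001 quoted in the text
```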
The results in all the tables indicate that the Simple Average Weights method gives a worse performance than the Fixed Weights method. However, using some common set of weights for all items would save the effort of having to apply the quadratic programming model to all 300 items. The final experiment checks what loss or gain would result from this. The average weights over all 10 items for each of the four forecasting methods just analyzed were used. The actual values were 0.3938 for the AR(1) method, 0.2388 for Holt's method, 0.1441 for the hyperbolic and 0.2233 for the linear regression over time methods. The performance of this weighting method is given in Table 6. The results on both samples are almost midway between those for the Fixed Weights and those for the Simple Average Weights method in Tables 4a and 4b. The sum of the root mean squared errors over all 10 items is 4060, about 2.5% higher than for determining the individual Fixed Weights for each series separately for the same four forecasting methods. It is, however, about 7.6% better than using Holt's method alone for each item (see summary Table 7).
Table 6: Performance of the common fixed weights

| SAMPLE   | OP | MPI  | WPP   | SRMSE  |
|----------|----|------|-------|--------|
| Sample A | 3  | 5.8% | 22.1% | 3736.4 |
| Sample B | 2  | 5.7% | 4.4%  | 323.5  |

N.B. BP and WP criteria not applicable here.
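Applying the single common weight vector to an item is then just a weighted sum of the four methods' forecasts. A minimal sketch using the weights quoted above; the forecast values are illustrative only, not from the study:

```python
# Common weights from the QP model, averaged over all 10 items (from the text).
weights = {
    "AR(1)": 0.3938,
    "Holt": 0.2388,
    "Hyperbolic": 0.1441,
    "Linear": 0.2233,
}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # the weights sum to one

# Hypothetical one-month-ahead forecasts for a single item from each method.
forecasts = {"AR(1)": 105.0, "Holt": 98.0, "Hyperbolic": 101.0, "Linear": 99.5}

# Combined forecast: the weighted sum across the four methods.
combined = sum(weights[m] * forecasts[m] for m in weights)
print(round(combined, 2))  # 101.52
```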
Table 7: Summary of results over all 10 items

| FORECASTS                     | WEIGHTS           | Sum of Root Mean Squared Errors | Percentage Saving on Holt's |
|-------------------------------|-------------------|---------------------------------|-----------------------------|
| Holt's                        |                   | 4396                            |                             |
| Best over Past Year           |                   | 4076                            | 7.3%                        |
| Four Common Methods           | Fixed Common      | 4060                            | 7.6%                        |
| Four Common Methods           | Individual Fixed  | 3952                            | 10.1%                       |
| Four Common Methods           | Individual Rolling| 4138                            | 5.9%                        |
| Four Common Methods           | Hybrid            | 3929                            | 10.6%                       |
| Individual Three Best Methods | Individual Fixed  | 3922                            | 10.8%                       |
| Individual Three Best Methods | Individual Rolling| 4058                            | 7.7%                        |
| Individual Three Best Methods | Hybrid            | 3905                            | 11.2%                       |