The results of the experiments are summarized in two tables, Table 3a and Table 3b. They are based on the forecasts made for the test period, periods 49 to 60 inclusive, which are the last 12 months of the data. The results are split in this way because, for two of the items, the QP model for the Fixed Weights procedure selected only one forecasting method for the combination, assigning a weight of 1 to that method and 0 to the other two. Interestingly, the single method selected was different for each of the two items. The 8 items for which the quadratic programming model produced a genuinely combined forecasting model under the Fixed Weights method are denoted sample A, and their results are given in Table 3a. The other 2 items are denoted sample B, and their results are given in Table 3b.
Table 3a. Results for sample A (8 items)

WEIGHTING METHOD   OP      BP   WP   MPI    WPP     SRMSE
Simple Average     1       3    2    5.5%   18.0%   3698.7
Fixed              2       3    0    5.7%   13.8%   3570.1
Rolling Window     0       0    2    0%     14.9%   3718.0
Highest            4 [a]   2    4    —      24.3%   3725.0

[a] Highest Weights cannot outperform the best. This is the number of items for which the quadratic programming model's highest weight was in fact the best individual forecasting method.
Table 3b. Results for sample B (2 items)

WEIGHTING METHOD    OP   BP   WP   MPI    WPP    SRMSE
Simple Average      1    1    0    2.4%   2.0%   334.35
Rolling Window      0    1    0    0%     0.3%   339.82
Highest/Fixed [a]   0    0    2    3.1%   3.6%   351.32

[a] Note that the QP ends up with a solution of zero weights except for one at 1, so the Fixed Weights and Highest Weights methods give the same result.
From Table 3a, it can be seen that the Fixed Weights (FW) method is the best of the four on all of the indicators. For 2 of the 8 series, the method outperforms the actual best individual forecasting method. For the other 6 items the weights produced are probably not stable, and this is not remedied even by the Rolling Window Weights (RW) method: updating the weights using a rolling window does not outperform the best individual method in any of these 6 cases. RW does worse than FW in 7 of the 8 series. In the Highest Weight method, the highest weight is likely to be given to the best forecasting method over periods 37 to 48, the weight-estimation period; however, that choice may not generalize well to the test period. The Highest Weight method correctly chose the best forecasting method for the test period in only 4 of the 8 series. This indicates that the best forecasting method for an item may change over time, and may also indicate that no single forecasting model is adequate to model the series over different time periods. The Simple Average Weights method is the second best of the four, with the second lowest sum of root mean squared errors. However, the difference between the best and second-best methods is much greater than that between the second-best and the worst.
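To make the Fixed Weights procedure concrete, the sketch below (with invented demand data and hypothetical method names; the study's actual series and QP solver are not reproduced here) minimizes the sum of squared errors of the combined forecast over a 12-period weight-estimation window, subject to the weights being nonnegative and summing to 1. A 1%-step grid search over the weight simplex stands in for the quadratic programming solver.

```python
# Hypothetical data: actual demand and three methods' forecasts over a
# 12-period weight-estimation window (periods 37-48 in the study).
actuals = [120, 132, 101, 134, 90, 230, 210, 182, 141, 149, 172, 194]
forecasts = {
    "holt":   [118, 130, 108, 128, 96, 221, 205, 180, 150, 152, 168, 190],
    "ses":    [122, 128, 105, 130, 94, 215, 200, 178, 148, 150, 165, 188],
    "damped": [125, 135, 99, 138, 88, 236, 214, 185, 139, 146, 175, 199],
}
methods = list(forecasts)

def sse(weights):
    """Sum of squared errors of the weighted combined forecast."""
    total = 0.0
    for t, actual in enumerate(actuals):
        combined = sum(w * forecasts[m][t] for w, m in zip(weights, methods))
        total += (combined - actual) ** 2
    return total

# Search a 1%-step grid on the simplex {w_i >= 0, sum(w_i) = 1}; a real
# implementation would hand this quadratic objective to a QP solver.
best_w, best_sse = None, float("inf")
for i in range(101):
    for j in range(101 - i):
        w = (i / 100, j / 100, (100 - i - j) / 100)
        s = sse(w)
        if s < best_sse:
            best_w, best_sse = w, s

print("fixed weights:", best_w, "SSE:", round(best_sse, 1))
```

Because the corner points of the simplex are on the grid, the combined solution can never do worse in-sample than the best single method; the interesting question, as the results above show, is whether the weights remain good out of sample.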
Table 3b shows the results for the other 2 items, for which the Fixed Weights method does not suggest any combination of forecasts. When the forecasts were compared with the actuals over periods 37 to 48, used for the weights estimation in the Fixed Weights method, it was found that the forecasts of the 3 time series models were either all underestimating or all overestimating the values at the same time. Furthermore, for each item the 3 time series methods had quite significantly different in-sample sums of squared errors; for example, the second-best method typically had a sum of squared errors 25% higher than the best over the weight-estimation period. In these circumstances it is unlikely that any linear combination can really improve on the best individual method. From Table 3b, the bad worst-case performance of the Highest Weight method suggests that these demand series are not easily modelled by a single method. It should also be remembered that the forecasting method selected for each item was different, and came from a different forecasting group. The results also show that the forecasting method given the highest weight by the QP model, the best for the earlier periods, was not the best forecasting model for the last 12 periods, 49 to 60. This further supports the view that relying on a single forecasting model for these demand time series is undesirable. In these circumstances, combining forecasts over any moderate length of time may be promising, since it mitigates the effect of one very bad forecast that might occur with a single chosen method. For these two items the Simple Average and Rolling Window Weights methods are both useful and have similar sums of root mean squared errors. However, the Simple Average Weights method performs better on the OP and MPI indicators, and it is preferred, especially on grounds of ease of calculation.
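The corner solution for sample B can be reproduced in miniature. If all methods err in the same direction at the same time, the combined error is a convex mix of same-signed errors, so the constrained least-squares problem is minimized at a vertex of the simplex, i.e. with all weight on the single least-biased method. A sketch with invented numbers:

```python
# Invented example: every method over-forecasts in every period, mimicking the
# sample B finding that all three models erred in the same direction at once.
actuals = [100, 110, 105, 120, 115, 125]
forecasts = {
    "m1": [a + 2 for a in actuals],  # consistently 2 units high (best)
    "m2": [a + 5 for a in actuals],  # consistently 5 units high
    "m3": [a + 9 for a in actuals],  # consistently 9 units high
}
methods = list(forecasts)

def sse(weights):
    """Sum of squared errors of the weighted combined forecast."""
    total = 0.0
    for t, actual in enumerate(actuals):
        combined = sum(w * forecasts[m][t] for w, m in zip(weights, methods))
        total += (combined - actual) ** 2
    return total

# Same 1%-step simplex grid as a stand-in for the QP solver: the per-period
# combined error is 2*w1 + 5*w2 + 9*w3, which is smallest when w1 = 1.
best_w = min(
    ((i / 100, j / 100, (100 - i - j) / 100)
     for i in range(101) for j in range(101 - i)),
    key=sse,
)
print("weights:", best_w)  # → (1.0, 0.0, 0.0): all weight on one method
```

This is exactly the behaviour noted in the footnote to Table 3b: the QP returns zero weights except for a single weight of 1, collapsing Fixed Weights onto the Highest Weight choice.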
The choice of the best method for this sample of 10 items must take into account that it will be applied to the full 300 items stocked. The results for samples A and B are contradictory: the Simple Average Weights method gives the best results for the smaller sample B but does worse than Fixed Weights on the 8 items of sample A, while the Fixed Weights method does best on sample A but worst on sample B. There appears to be no simple and obvious way of distinguishing between the two samples, other than the QP model giving a weight of 1 to a single forecasting method. Combining all 10 items, the sums of the root mean squared errors are 3922 for Fixed Weights, 4033 for Simple Average Weights, 4058 for Rolling Window Weights and 4076 for the Highest Weight method, so Fixed Weights is the best method. The Simple Average Weights method has a root mean squared error (RMSE) value about 2.8% higher, and the RMSE values for the Rolling Window Weights and Highest Weight methods are about 3.5% and 3.9% higher, respectively, than for the best method. The best that could have been done in hindsight was to use the actual best individual forecasting method for each item over the test period, which would have given an RMSE value of 3837. The results thus show that the Fixed Weights method of combining forecasts gives a result only about 2.2% worse than this hindsight best, yet 3.8% better than using the best forecasting method over the previous 12 months. The particular forecasting method that most commonly appeared in the combinations was Holt's model; used alone, it would have given an RMSE of 4396, about 12% above the weighted linear combination models. These results suggest that there is value in combining the forecasts, but fail to suggest that there is anything to be gained by updating the weights every period.
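The percentage comparisons quoted above follow directly from the reported SRMSE totals and can be checked in a few lines (all figures are taken from the text; nothing here is new data):

```python
# Sums of root mean squared errors over all 10 items, as reported in the text.
srmse = {
    "Fixed Weights":          3922,
    "Simple Average Weights": 4033,
    "Rolling Window Weights": 4058,
    "Highest Weight":         4076,
}
hindsight_best = 3837  # best individual method per item, chosen in hindsight
holt_alone = 4396      # Holt's model applied to every item

fw = srmse["Fixed Weights"]
for name, value in sorted(srmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: {100 * (value - fw) / fw:+.1f}% relative to Fixed Weights")
print(f"Fixed Weights vs hindsight best: +{100 * (fw - hindsight_best) / hindsight_best:.1f}%")
print(f"Fixed Weights improvement over Highest Weight: "
      f"{100 * (srmse['Highest Weight'] - fw) / srmse['Highest Weight']:.1f}%")
print(f"Holt's model alone vs Fixed Weights: +{100 * (holt_alone - fw) / fw:.1f}%")
```

This reproduces the 2.8%, 3.5% and 3.9% margins over Fixed Weights, the 2.2% gap to the hindsight best, the 3.8% advantage over the Highest Weight method, and the roughly 12% penalty for using Holt's model alone.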