| C++ Neural Networks and Fuzzy Logic |
by Valluru B. Rao
M&T Books, IDG Books Worldwide, Inc.
ISBN: 1558515526 Pub Date: 06/01/95
The sizes of the input and output layers are fixed by the number of inputs and outputs we are using. In our case, the output is a single number: the expected change in the S&P 500 index 10 weeks from now. The size of the input layer is dictated by the number of inputs we have after preprocessing. You will see more on this soon. You can use either one or two hidden (middle) layers. It is best to choose the smallest number of neurons possible for a given problem, to allow for generalization; with too many neurons, the network tends to memorize the training patterns. We will use one hidden layer. The generally recommended size for the first hidden layer is between one-half and three times the size of the input layer. If a second hidden layer is present, it may have between three and ten times the number of output neurons. The best way to determine the optimum size is by trial and error.
NOTE: Try to make sure that you have enough training examples for your trainable weights. In other words, your architecture may be dictated by the number of training examples, or facts, that you have. Ideally, you want about 10 or more facts for each weight. A 10-10-1 architecture has 10 × 10 + 10 × 1 = 110 weights, so you should aim for about 1100 facts. The smaller the ratio of facts to weights, the more likely you will be undertraining your network, which leads to very poor generalization capability.
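The arithmetic in the note can be checked with a few lines of C++. The following is only a minimal sketch: the function names countWeights and suggestedFacts are hypothetical, and it assumes a fully connected feedforward network in which bias (threshold) terms are not counted, which matches the 110-weight figure given for the 10-10-1 example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count the connection weights in a fully connected feedforward network.
// `layers` holds the neuron count of each layer, input layer first.
// Assumption: bias (threshold) terms are not counted, as in the 10-10-1 example.
std::size_t countWeights(const std::vector<std::size_t>& layers) {
    assert(layers.size() >= 2);  // need at least an input and an output layer
    std::size_t total = 0;
    for (std::size_t i = 0; i + 1 < layers.size(); ++i) {
        total += layers[i] * layers[i + 1];  // every neuron connects to every neuron in the next layer
    }
    return total;
}

// Rule of thumb from the note: about 10 (or more) facts per weight.
std::size_t suggestedFacts(const std::vector<std::size_t>& layers,
                           std::size_t factsPerWeight = 10) {
    return countWeights(layers) * factsPerWeight;
}
```

For example, countWeights({10, 10, 1}) gives 110 and suggestedFacts({10, 10, 1}) gives 1100, the figures quoted in the note. Trying other layer sizes shows how quickly the number of facts you need grows as the hidden layer gets larger.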
We now begin the preprocessing effort. As mentioned before, this will likely be where you, the neural network designer, will spend most of your time.
Let's look at the raw data for the problem we want to solve. There are a couple of ways we can start preprocessing the data to reduce the number of inputs and enhance the variability of the data:
We are left with the following indicators:
Raw data for the period from January 4, 1980 to October 28, 1983 is taken as the training period, for a total of 200 weeks of data. The following 50 weeks are held in reserve as a test period, to see whether the predictions are valid outside the training interval. The last date of this period is October 19, 1984. Let's look at the raw data now. (The disk available with this book contains data covering the period from January 1980 to December 1992.) In Figures 14.3 through 14.5, you will see a number of these indicators plotted over the training plus test intervals:
Figure 14.3 The S&P 500 Index for the period of interest.
Figure 14.4 Long-term and short-term interest rates.
Figure 14.5 Breadth indicators on the NYSE: Advancing/Declining issues and New Highs/New Lows.
Table 14.1 shows a sample of a few lines of the raw data. Note that the order of parameters is the same as listed above.
For each of the five inputs, we want to use a function that highlights rate-of-change features. We will use the following function (as originally proposed by Jurik) for this purpose.
ROC(n) = (input(t) - BA(t - n)) / (input(t) + BA(t - n))
where input(t) is the input's current value and BA(t - n) is a five-unit block average of adjacent values centered around the value n periods ago.
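Although the preprocessing can be done in a spreadsheet, it may help to see the ROC function written out in code. The sketch below is one possible reading of the formula, not a definitive implementation: the names blockAverage and roc are hypothetical, and it assumes that "centered" means the block average covers the value n periods ago plus two values on either side of it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Five-unit block average centered on index c: the mean of x[c-2] .. x[c+2].
// Assumption: "centered" means two adjacent values on each side of the center.
double blockAverage(const std::vector<double>& x, std::size_t c) {
    assert(c >= 2 && c + 2 < x.size());  // the whole window must lie inside the series
    double sum = 0.0;
    for (std::size_t i = c - 2; i <= c + 2; ++i) {
        sum += x[i];
    }
    return sum / 5.0;
}

// ROC(n) at time t: (x[t] - BA(t - n)) / (x[t] + BA(t - n)).
// Requiring n >= 2 keeps the averaging window from reaching past time t.
double roc(const std::vector<double>& x, std::size_t t, std::size_t n) {
    assert(n >= 2 && t >= n + 2 && t < x.size());
    const double ba = blockAverage(x, t - n);
    const double denom = x[t] + ba;
    assert(denom != 0.0);  // the ratio is undefined when the denominator is zero
    return (x[t] - ba) / denom;
}
```

With this reading, a series that does not change gives ROC(n) = 0, and a series of positive values always gives a result between -1 and 1, because the difference is divided by the sum. Note that the earliest week for which ROC(10) can be computed is week 12 (counting from 0), since the window centered on week t - 10 reaches back two more weeks.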
Now we need to decide how many of these features we need. Since we are making a prediction 10 weeks into the future, we will also look back as far as 10 weeks. This will be ROC(10). We will also use one other rate of change, ROC(3). We have now added 5*2 = 10 inputs to our network, for a total of 15. All of the preprocessing can be done with a spreadsheet.
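To make the count of 15 inputs concrete, here is a rough sketch of how one input row could be assembled for a given week. Several things here are assumptions rather than details from the text: the names Series, buildInputRow, and usableWeeks are hypothetical, the ordering of values within the row is arbitrary, the block average is taken as two values on either side of its center, and usableWeeks supposes that both the ROC history and the 10-week-ahead target must come from the same series.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Series = std::vector<double>;  // one weekly indicator

// ROC(n) at week t, using a five-unit block average centered on week t - n.
// Assumption: the average covers weeks t-n-2 .. t-n+2.
double roc(const Series& x, std::size_t t, std::size_t n) {
    assert(n >= 2 && t >= n + 2 && t < x.size());
    double ba = 0.0;
    for (std::size_t i = t - n - 2; i <= t - n + 2; ++i) {
        ba += x[i];
    }
    ba /= 5.0;
    assert(x[t] + ba != 0.0);  // the ratio is undefined when the denominator is zero
    return (x[t] - ba) / (x[t] + ba);
}

// Build the input row for week t: the raw values, then ROC(3), then ROC(10).
// With 5 indicators this yields 5 + 5*2 = 15 inputs. The ordering is an assumption.
std::vector<double> buildInputRow(const std::vector<Series>& indicators, std::size_t t) {
    std::vector<double> row;
    row.reserve(indicators.size() * 3);
    for (const Series& s : indicators) row.push_back(s[t]);
    for (const Series& s : indicators) row.push_back(roc(s, t, 3));
    for (const Series& s : indicators) row.push_back(roc(s, t, 10));
    return row;
}

// Number of weeks that have both a full ROC(maxLag) history and a target
// `horizon` weeks ahead, when everything must come from `numWeeks` of data.
std::size_t usableWeeks(std::size_t numWeeks, std::size_t maxLag, std::size_t horizon) {
    const std::size_t first = maxLag + 2;  // earliest week with a complete averaging window
    if (numWeeks < first + horizon + 1) return 0;
    return numWeeks - horizon - first;     // weeks first .. numWeeks-1-horizon, inclusive
}
```

Under these assumptions, usableWeeks(200, 10, 10) returns 178: the first 12 weeks lack a complete ROC(10) window and the last 10 lack a target inside the 200-week span. That figure is worth keeping in mind when you compare the number of facts with the number of weights in the network.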
Copyright © IDG Books Worldwide, Inc.