generally small random values. The gradient term in Equation (3.5) can be further expanded to:
\[
\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial a_j}\,\frac{\partial a_j}{\partial x_j}\,\frac{\partial x_j}{\partial w_{ji}} = \delta_j\, f'(x_j)\, a_i
\tag{3.6}
\]
by the chain rule of calculus, where $f'(x_j)$ is the first derivative of the sigmoid function evaluated at the net input $x_j$ of neurone $j$, $a_i$ is the activity of neurone $i$, and $\delta_j$ is the error value occurring at neurone $j$. If neurone $j$ is in the output layer, the values of $\delta_j$ in Equation (3.6) are simply defined as:
\[
\delta_j = a_j - o_j
\tag{3.7}
\]
where $o_j$ is the desired output for neurone $j$. Combining Equations (3.5) to (3.7), the weights $w_{ji}$ for neurones $j$ in the output layer can therefore be updated. For the other layers, the $\delta_j$ are computed by:
\[
\delta_j = \sum_{k} \delta_k\, f'(x_k)\, w_{kj}
\tag{3.8}
\]
where $k$ runs over the neurones that receive output from neurone $j$. In other words, if neurone $j$ is in the $l$th layer, then $k$ denotes all neurones in the $(l+1)$th layer. The combination of Equations (3.5), (3.6) and (3.8) determines the adjustment for each weight. This rule for adjusting the weights associated with network connections is also known as the generalised $\delta$ (delta) rule. Sometimes a momentum term is added to Equation (3.5) to give:
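The generalised delta rule above can be sketched for a network with a single hidden layer. This is an illustrative example, not code from the text: the layer sizes, learning rate `eta`, input activities and targets are all assumed values chosen to make the sketch self-contained.

```python
# A minimal sketch of one back-propagation step following Eqs. (3.5)-(3.8).
# Network sizes, eta, inputs and targets are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def f(x):            # sigmoid activation function
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):      # first derivative of the sigmoid
    s = f(x)
    return s * (1.0 - s)

# Small network: 3 inputs -> 4 hidden neurones -> 2 outputs,
# weights initialised to small random values, as the text describes.
W1 = rng.normal(scale=0.1, size=(4, 3))   # hidden-layer weights w_ji
W2 = rng.normal(scale=0.1, size=(2, 4))   # output-layer weights w_kj
eta = 0.5                                 # learning rate (assumed)

a_in = np.array([0.2, 0.7, 0.1])          # input activities a_i
target = np.array([1.0, 0.0])             # desired outputs o_j

# Forward pass: net inputs x and activities a for each layer.
x_h = W1 @ a_in
a_h = f(x_h)
x_o = W2 @ a_h
a_o = f(x_o)

# Backward pass.
delta_o = a_o - target                      # Eq. (3.7): output-layer error
delta_h = W2.T @ (delta_o * f_prime(x_o))   # Eq. (3.8): back-propagated error

# Eqs. (3.5)+(3.6): weight adjustment  delta_w = -eta * delta_j * f'(x_j) * a_i
W2 -= eta * np.outer(delta_o * f_prime(x_o), a_h)
W1 -= eta * np.outer(delta_h * f_prime(x_h), a_in)
```

A single such step reduces the squared output error; repeating it over the training samples is the iterative descent on the error surface that the text describes.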
\[
\Delta w_{ji}(t+1) = -\eta\,\frac{\partial E}{\partial w_{ji}} + \alpha\, \Delta w_{ji}(t)
\tag{3.9}
\]
where $\alpha$ is the momentum coefficient ($0 \le \alpha < 1$) and $t$ indexes the training iteration. The momentum term is applied to avoid oscillation during the search for the minimum on the error surface, and can therefore speed up convergence.
The performance of a multilayer perceptron is controlled by several factors: the model-associated parameters, the network structure, and the nature of the training samples. It is very difficult to choose an optimal combination of these factors to construct an ideal network for a given classification task.
In the back-propagation algorithm, one or more parameters need to be defined by the user. The choice of values for these parameters can have a