
If the learning rate η is set too high, the outcome of the search will be poor. Conversely, if η is too small, the network training phase can be very time-consuming. The selection of the learning rate η may therefore be case-dependent. On the basis of a large number of trials, Kavzoglu (2001) recommends a value of 0.2 for the learning rate when no momentum term is used. Where a momentum term is used, the learning rate should be set to a value in the range 0.1–0.2, with the momentum term taking a value between 0.5 and 0.6. It is also possible to vary these parameters during the training process, as suggested by Pal and Mitra (1992).
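By way of illustration, the weight update with a momentum term can be written as Δw(t) = −η(∂E/∂w) + αΔw(t−1), where η is the learning rate and α the momentum. The following sketch (not part of the original text; the array shapes and the stand-in gradient are illustrative assumptions) shows a single update step:

```python
import numpy as np

def update_weights(w, grad, prev_delta, eta=0.2, alpha=0.5):
    # Back-propagation update with a momentum term:
    #   delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)
    # eta = 0.2 and alpha = 0.5 follow the ranges recommended by
    # Kavzoglu (2001); grad stands for dE/dw at the current step.
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

# Illustrative usage: a 3 x 4 weight matrix and a placeholder gradient.
rng = np.random.default_rng(0)
w = rng.uniform(-0.25, 0.25, size=(3, 4))
prev_delta = np.zeros_like(w)
grad = rng.standard_normal((3, 4))   # stand-in for a real error gradient
w, prev_delta = update_weights(w, grad, prev_delta)
```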

A further consideration is the selection of the initial values assigned to the weights on the inter-neurone links. These initial values are generally drawn at random from a specified range, and the choice can have a significant effect on classifier performance, as demonstrated by Ardö (1997), Blamire (1996) and Skidmore et al. (1997). Ardö (1997) found that classification accuracy varied from 59% to 70% with different initial weight sets drawn from the range [−1, +1]. Kavzoglu (2001) notes that classification performance improves when the initial weight range is small, and recommends that the initial weights be drawn from the range [−0.25, 0.25].
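A minimal sketch of this initialisation step, assuming a fully connected layer and NumPy (both illustrative choices, not prescribed by the text):

```python
import numpy as np

def init_weights(n_in, n_out, limit=0.25, seed=None):
    # Draw initial weights uniformly from [-limit, +limit].
    # limit = 0.25 follows Kavzoglu's (2001) recommended range; fixing
    # the seed makes the otherwise random draw repeatable, which matters
    # because different initial weight sets can shift final accuracy by
    # several percentage points (cf. Ardö, 1997).
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(n_in, n_out))

w_hidden = init_weights(7, 10, seed=42)  # e.g. 7 input bands, 10 hidden neurones
```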

Training using the back-propagation algorithm is an iterative process. If the network is trained for too many iterations, however, it ‘learns’ the specific characteristics of the training data and generalises poorly to the image data to be classified, while an insufficiently trained network leaves the decision boundaries in feature space in sub-optimal positions. The practice of simply allowing the network to train until some predetermined level of error is reached is therefore unsatisfactory. A cross-validation approach can be used instead, in which the available training data set is subdivided into a training subset and a validation subset. The network learns from the training subset and is stopped at a number of points during learning; at each stopping point it is used to classify the samples in the validation subset. Training continues until the classification error on the validation subset begins to rise. This strategy is rather more involved than a simple error-threshold stopping rule, but it has been found to be effective, provided it is recognised that the validation error may fluctuate in the early period of training; the training process should not be terminated in response to a transient rise in the validation error. The effects of over-training are considered further in Section 3.1.4.
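The early-stopping rule described above can be sketched as follows. The `net` object here exposes a hypothetical interface (`train_one_epoch`, `validation_error`, `get_weights`, `set_weights`) used only to keep the sketch short; the `patience` counter guards against stopping on the early fluctuations just mentioned:

```python
def train_with_early_stopping(net, train_set, val_set,
                              max_epochs=1000, patience=10):
    # Hold out a validation subset and stop when its error starts rising.
    # `patience` is the number of consecutive non-improving checks that
    # must accumulate before training is terminated, so that a transient
    # rise in validation error does not stop training prematurely.
    best_error = float("inf")
    best_weights = net.get_weights()
    checks_without_improvement = 0

    for epoch in range(max_epochs):
        net.train_one_epoch(train_set)
        error = net.validation_error(val_set)
        if error < best_error:
            best_error, best_weights = error, net.get_weights()
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                break                    # validation error has begun to rise

    net.set_weights(best_weights)        # restore the best network seen
    return net
```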

The coding scheme used for both the input and the output vectors requires some thought. Each component of the input vector can be normalised onto the interval [0, 1]; normalisation is believed to reduce noise effects (Lee et al., 1990). Several coding schemes, such as the binary code or the ‘spread’ approach, can be applied to the network outputs. For example, in the case of four output classes, only two neurones are needed in the output layer if the binary coding method is applied, as classes labelled 1, 2, 3 and 4 can be coded as 00, 01, 10 and 11, respectively.
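A small sketch of the two output codings for a four-class problem, together with the input normalisation step. The ‘spread’ coding is interpreted here as one neurone per class, an assumption based on the usual meaning of the term rather than a definition given in the text:

```python
import numpy as np

def binary_code(label, n_bits=2):
    # Binary coding: four classes (0-3) fit into two output neurones.
    return [(label >> bit) & 1 for bit in reversed(range(n_bits))]

def one_per_class_code(label, n_classes=4):
    # 'Spread'-style coding (assumed): one output neurone per class.
    code = [0] * n_classes
    code[label] = 1
    return code

def normalise(x):
    # Scale an input feature vector onto [0, 1] (assumes x is not constant).
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

for label in range(4):
    print(label, binary_code(label), one_per_class_code(label))
# 0 [0, 0] [1, 0, 0, 0]
# 1 [0, 1] [0, 1, 0, 0]
# 2 [1, 0] [0, 0, 1, 0]
# 3 [1, 1] [0, 0, 0, 1]
```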
