identified as part of the signal. Nielsen (1994) suggests that frequency-domain filtering (Mather, 1999a) be applied prior to the calculation of the noise covariance.

Other pertinent references to the derivation and use of MNFs are Green et al. (1988) and Lee et al. (1990), who discuss the mathematical details, while van der Meer and de Jong (2000) show how the results of spectral unmixing are enhanced by the orthogonalisation of end members. Nielsen et al. (1998) discuss the MAD transformation in the context of change detection. The use of the MNF and similar transforms is common in studies that utilise hyperspectral data, such as CASI (Jacobsen et al., 1998), AVIRIS (Harsanyi and Chang, 1994) and DAIS (de Jong and van der Werff, 1998; Kneubuehler et al., 1998). Mather (1999a) gives details of a similar approach to the orthogonalisation of a multispectral data set using image factor analysis.

2.2 Feature selection

The computational requirements of classification generally increase with the number of features used as input to the classification algorithm. For instance, in the case of n input features, the computational requirements of the maximum likelihood classifier (Section 2.3.4.3) are proportional to n². The computational cost therefore rises quadratically as n grows. The use of large artificial neural networks also results in increased training times.
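To illustrate where the n² term comes from, the following minimal sketch (synthetic, illustrative parameters, not code from the text) evaluates the quadratic form (x − μ)ᵀC⁻¹(x − μ) at the heart of the Gaussian maximum likelihood discriminant; even with the inverse covariance precomputed once per class, the matrix-vector product costs on the order of n² multiplications for every pixel classified:

```python
import numpy as np

# Minimal sketch: per-pixel cost of the Gaussian maximum likelihood
# discriminant. All values are illustrative, not from the text.
rng = np.random.default_rng(0)

n = 6                              # number of input features
x = rng.normal(size=n)             # one pixel's feature vector
mu = rng.normal(size=n)            # class mean vector
A = rng.normal(size=(n, n))
C = A @ A.T + n * np.eye(n)        # positive-definite class covariance

# Quadratic form (x - mu)^T C^{-1} (x - mu): the inverse can be
# precomputed once per class, but the matrix-vector product still
# requires O(n^2) operations per pixel per class.
C_inv = np.linalg.inv(C)
d = x - mu
mahalanobis_sq = d @ C_inv @ d
log_likelihood = -0.5 * (mahalanobis_sq
                         + np.log(np.linalg.det(C))
                         + n * np.log(2 * np.pi))
print(mahalanobis_sq, log_likelihood)
```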

Two approaches can be used to reduce the number of input features without sacrificing accuracy. One is to project the original feature space on to a subspace (i.e. a space of smaller dimensionality). This can be done using either an orthogonal transformation, as described in Section 2.1, or a self-organised feature map (SOM) neural network (Chapter 3).
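As a sketch of the first approach, the following minimal example (synthetic data; any orthogonal transformation of the kind described in Section 2.1 could play this role) projects an n-dimensional feature space onto its first m principal components:

```python
import numpy as np

# Minimal sketch: orthogonal projection of an n-dimensional feature
# space onto an m-dimensional subspace via principal components.
# The data are synthetic; in practice X would hold image pixels
# (rows) by spectral bands (columns).
rng = np.random.default_rng(1)
n_pixels, n, m = 1000, 6, 3

X = rng.normal(size=(n_pixels, n)) @ rng.normal(size=(n, n))

Xc = X - X.mean(axis=0)                  # centre each band
cov = np.cov(Xc, rowvar=False)           # n x n covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # sort descending by variance
W = eigvecs[:, order[:m]]                # first m principal axes

X_reduced = Xc @ W                       # projected features, (n_pixels, m)
print(X_reduced.shape)
```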

The second method is to use separability measures in the input feature space, and then to select the feature subset for which separability is a maximum. The aim is to reduce the dimensionality of the feature space without prejudicing classification accuracy. Two separability indices, the divergence index (Singh, 1984) and the B-distance (Haralick and Fu, 1983), are widely used. The divergence index D_ij between two classes i and j is derived from the likelihood ratio of the pair of classes. For multivariate Gaussian distributions, D_ij for classes i and j, computed in a sub-feature space of dimension m selected from the full p dimensions, is defined by:

D_{ij} = \frac{1}{2}\operatorname{tr}\left[(C_i - C_j)\left(C_j^{-1} - C_i^{-1}\right)\right] + \frac{1}{2}\operatorname{tr}\left[\left(C_i^{-1} + C_j^{-1}\right)(\mu_i - \mu_j)(\mu_i - \mu_j)^{T}\right] \quad (2.6)

where tr denotes the trace (sum of the diagonal elements) of the matrices, C_i and C_j are the covariance matrices of classes i and j, and μ_i and μ_j are the corresponding mean vectors, all computed in the m selected features.
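Equation (2.6) translates directly into a few lines of code. The following minimal sketch (with illustrative, synthetic class statistics, not values from the text) computes D_ij for a pair of Gaussian classes in an m-dimensional feature subspace:

```python
import numpy as np

def divergence(mu_i, mu_j, C_i, C_j):
    """Divergence D_ij between two Gaussian classes (Equation 2.6)."""
    Ci_inv = np.linalg.inv(C_i)
    Cj_inv = np.linalg.inv(C_j)
    d = (mu_i - mu_j).reshape(-1, 1)   # column vector of mean differences
    term1 = 0.5 * np.trace((C_i - C_j) @ (Cj_inv - Ci_inv))
    term2 = 0.5 * np.trace((Ci_inv + Cj_inv) @ (d @ d.T))
    return term1 + term2

# Illustrative class statistics in an m = 3 feature subspace.
mu_i = np.array([1.0, 2.0, 0.5])
mu_j = np.array([0.0, 1.5, 1.0])
C_i = np.diag([1.0, 0.8, 1.2])
C_j = np.diag([0.9, 1.1, 0.7])

print(divergence(mu_i, mu_j, C_i, C_j))   # larger values => more separable
```

In feature selection, D_ij would be evaluated for each candidate m-feature subset and each pair of classes, and the subset giving the greatest overall pairwise separability retained.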
