
levels per band. It is theoretically possible, therefore, for a pixel to occupy any of q^k discrete locations in feature space, where q is the number of quantization levels per band and k is the number of bands. This number is greater than the number of pixels making up a standard ETM+ image, so much of the feature space will be empty.
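As a rough illustration of this sparsity argument (the figures here are assumptions, not taken from the text: 8-bit quantization giving q = 256, k = 6 bands, and a scene size of roughly 6000 × 6600 pixels):

```python
# Feature-space sparsity: possible feature vectors vs. pixels in one scene.
# q = 256 levels, k = 6 bands, and the 6000 x 6600 scene size are all
# illustrative assumptions, not values from the text.
q, k = 256, 6
n_cells = q ** k                  # discrete locations in feature space
n_pixels = 6000 * 6600            # approximate pixels in one full scene
print(f"{n_cells:.2e} feature-space cells vs {n_pixels:.2e} pixels")
print(f"at most {100 * n_pixels / n_cells:.6f}% of cells can be occupied")
```

Even if every pixel fell into a distinct cell, only a vanishingly small fraction of the feature space would be occupied.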

Given a fixed training set size, an increase in the dimensionality of the feature space (e.g. by adding spectral bands) means that the number of parameters to be estimated by a statistical classifier also increases. The maximum likelihood decision rule (described below) uses the values of the mean vector and the variance-covariance matrix C. The former has k elements to be estimated, while the latter has k(k−1)/2 + k distinct elements, where k is the number of features, giving a total of k(k−1)/2 + 2k elements. As k increases, the number of parameters to be estimated grows disproportionately: for k = 6 the number of parameters is 27, while for k = 12 it rises to 90. It is well known that efficient estimation of statistical parameters requires a representative sample of sufficient size; consequently, as the number of parameters rises for a fixed sample size, the efficiency of the estimation decreases, which implies that the confidence limits for each estimate become wider. The effectiveness of the classifier will therefore begin to decrease once a certain number of dimensions is reached. This is known as the Hughes phenomenon (Hughes, 1968). It follows that, if satisfactory results are to be obtained from the classification of remotely sensed data, the relationship between dimensionality and training sample size must be borne in mind. The topic of sampling adequacy is considered further in Section 2.6.
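The parameter count quoted above is easily checked; a short sketch (the function name is my own):

```python
def n_ml_parameters(k):
    """Parameters to estimate for one class in a Gaussian maximum
    likelihood classifier: k mean-vector elements plus the
    k(k-1)/2 + k distinct elements of the symmetric covariance
    matrix, i.e. k(k-1)/2 + 2k in total."""
    return k * (k - 1) // 2 + 2 * k

# Matches the figures in the text: 27 for k = 6 and 90 for k = 12.
for k in (6, 12):
    print(k, n_ml_parameters(k))
```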

One way of mitigating the effects of high dimensionality is to determine a subset of the k-dimensional feature space that contains most of the pixels. This is the aim of orthogonal transforms. If the measurements made by two instruments are correlated, then there is a degree of redundancy present in the data. When the correlation between two sets of measurements reaches ±1.0, one set of measurements is completely redundant. The features used in remote sensing image classification are generally correlated, and a high proportion of the information content of the data can be represented in terms of m dimensions, where m < k. Landgrebe (1998) also claims that the data distribution in the reduced-space representation is more likely to be normally distributed than is the original data. This observation is an important one if statistical classifiers are used. Cortijo and Pérez de la Blanca (1999) suggest that where the sample size is small it may be advisable to compute a common covariance matrix, so that all training data are used, rather than a separate covariance matrix for each class.
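The common covariance matrix suggested by Cortijo and Pérez de la Blanca (1999) amounts to pooling: a weighted average of the per-class covariance estimates, computed from all training pixels together. A minimal NumPy sketch (the function and variable names are my own, not from the text):

```python
import numpy as np

def pooled_covariance(classes):
    """Common (pooled) covariance matrix across classes.

    `classes` is a list of arrays, one per class, each of shape (n_i, k).
    The estimate is the degrees-of-freedom-weighted average of the
    per-class sample covariance matrices, so every training pixel
    contributes to the single shared estimate.
    """
    dof = sum(len(c) - 1 for c in classes)
    return sum((len(c) - 1) * np.cov(c, rowvar=False)
               for c in classes) / dof
```

With small per-class samples this stabilizes the estimate by using every training pixel, at the cost of assuming the classes share a common covariance structure.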

Orthogonal transforms provide one such means of reducing feature space dimensionality; Nielsen (1994) provides a detailed survey of these methods. In essence, the aim of an orthogonal transform is to replace the original feature set with a derived or synthetic feature set. The covariances of the synthetic features are zero,
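The principal components rotation is the classic orthogonal transform of this kind. The sketch below (with synthetic random data standing in for image bands, an assumption rather than the authors' example) shows the synthetic features coming out mutually uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(1)
# Five correlated "bands" generated from two underlying sources,
# mimicking the redundancy typical of multispectral measurements.
X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 5))
Xc = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix define the orthogonal rotation.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Y = Xc @ eigvecs          # synthetic (principal component) features

# Covariances between the synthetic features are zero (to rounding
# error), and the variance concentrates in a few components; eigh
# returns eigenvalues in ascending order.
C = np.cov(Y, rowvar=False)
```

Because only two sources generated the five bands, nearly all of the variance ends up in two synthetic features: the m < k reduced representation described in the text.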

Classification Methods for Remotely Sensed Data, Second Edition
ISBN: 1420090720
Year: 2001
Pages: 354