
derivation of a valid confusion matrix. There are k = 7 classes, and the class wheat occupies approximately 46% of the map area (Π_wheat = 0.46). The value of chi-square for the probability level (0.05/7) = 0.00714 with one degree of freedom is 7.348. The minimum training data sample size is therefore 7.348 × 0.46 × (1 − 0.46)/(0.05)² = 730. Vieira (2000) notes that this value is quite close to the value of 30 × 3 = 90 samples per class (or 7 × 90 = 630 samples in total) suggested by Mather (1999a), whose rule of thumb is 30 pixels per waveband per class.
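As a rough illustration of this calculation, the sketch below (Python, assuming scipy is available; the function name is chosen for illustration only) evaluates the chi-square based sample-size rule described above. Note that scipy's computed percentile (about 7.24) is slightly smaller than the tabled value of 7.348 quoted in the text, so the two results differ by a few samples.

```python
from scipy.stats import chi2

def min_confusion_matrix_sample(num_classes, class_proportion, precision=0.05, alpha=0.05):
    """Chi-square based minimum sample size for a valid confusion matrix,
    as described in the text: n = B * Pi * (1 - Pi) / b**2, where B is the
    upper (alpha / k) point of the chi-square distribution with 1 df."""
    B = chi2.isf(alpha / num_classes, df=1)
    return B * class_proportion * (1.0 - class_proportion) / precision ** 2

# Worked example from the text: k = 7 classes, wheat covering 46% of the map.
# scipy gives B of roughly 7.24, slightly below the tabled 7.348 quoted above,
# so the computed n (about 719) falls a few samples short of the text's 730.
print(round(min_confusion_matrix_sample(7, 0.46)))
print(round(7.348 * 0.46 * (1 - 0.46) / 0.05 ** 2))  # reproduces the text's figure: 730
```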

The size of the sample required for statistically valid measures of classification accuracy is not the only criterion that should be considered. Statistically based classifiers require that certain parameters be estimated accurately. In the case of the ML classifier, these parameters are the mean vector and the variance-covariance matrix for each class. The required sample size is related to the number of features (the dimensionality of the feature space) and, as noted earlier, as the dimensionality of the data increases for a fixed sample size, the precision of the estimates of these parameters falls, leading to a loss of classifier efficiency. This is the Hughes phenomenon (Section 2.1). For low-dimensional data, the suggested minimum of 30 × the number of wavebands per class will give satisfactory results in most cases. However, as the dimensionality of the data increases (e.g. if hyperspectral data sets are used), the required sample size becomes unfeasibly large, and some method of dimensionality reduction, such as an orthogonal transform (Section 2.1) or the use of feature selection methods (Section 2.2), will be required.
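To make the dimensionality argument concrete, the following sketch (illustrative band counts, not taken from the text) contrasts the number of parameters a Gaussian ML classifier must estimate per class (mean vector plus symmetric covariance matrix) with the 30 × wavebands rule of thumb; the parameter count grows quadratically with dimensionality while the suggested sample size grows only linearly.

```python
def gaussian_ml_parameters(n_bands):
    """Parameters estimated per class by the ML (Gaussian) classifier:
    a mean vector of length p plus a symmetric p x p covariance matrix."""
    return n_bands + n_bands * (n_bands + 1) // 2

# Illustrative band counts (not from the text): a few multispectral cases
# and a hyperspectral case of roughly AVIRIS size.
for bands in (3, 7, 30, 220):
    params = gaussian_ml_parameters(bands)
    rule_of_thumb = 30 * bands  # suggested minimum training pixels per class
    print(f"{bands:>4} bands: {params:>6} parameters per class; "
          f"30 x bands rule gives {rule_of_thumb} samples")
```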

These remarks are directed towards the use of statistical classifiers, which (in the case of the ML classifier) operate by defining a model of the data distribution, such as the multivariate normal distribution, and then estimating the parameters of that model from the training data. El-Sheikh and Wacker (1980), Hsieh and Landgrebe (1998) and Raudys and Pikelis (1980) discuss these matters in depth.

Other classifiers, such as decision trees and artificial neural networks, are nonparametric: they do not involve the estimation of statistical parameters, but they do require training data sets that are large enough to represent the characteristics of each class. Evans (1998) notes that ‘Decision tree classifiers are susceptible to large changes in accuracy when only small changes are made in the composition of the training samples’, indicating that both the size and the nature of the training samples assume considerable importance when such classifiers are used. There is some evidence that artificial neural networks perform better than statistical classifiers even with small training data sets (Foody et al., 1995), while the selection of training data from inter-class boundary areas is said by Foody (1999a) to improve classifier performance.

Sample size may be related to the scale of observation (which determines the objects in the landscape that are deemed ‘significant’) and it also has
