cantly influenced by the presence of training sample data that are not representative.

The presence of outliers can be accommodated by the use of robust statistical estimators, which are less severely affected by outliers than are conventional estimators. Mather (1999a) provides details of a method of weighting the observations used in the calculation of means and variance-covariance matrices. The weights depend on the Mahalanobis distance of each observation from its class mean: observations lying far from the mean receive lower weights. The position of the class mean vector in feature space moves as the weights change, so the procedure iterates until the weight estimates converge. Because it relies on the Mahalanobis distance, this weighting algorithm is appropriate for training data sets that are to be used with statistical classifiers such as maximum likelihood.

An alternative approach uses cross-validation. The training data set is subdivided into a number of mutually exclusive groups, for example ten groups, each containing 10% of the training data. The classifier is trained on 90% of the available training data (i.e. nine subgroups combined) and is applied to the remaining 10%. If the label given to an observation by the classifier does not correspond to the class allocated by the analyst, then that observation is eliminated. The procedure is repeated, with each subset held out in turn, until all ten subsets have been labelled. A further refinement is to use several classifiers and to eliminate only those training observations that are mislabelled by all of the classifiers, or by a majority of them (Brodley and Friedl, 1996, 1999).
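Both screening procedures can be sketched in Python with NumPy. The sketch rests on several assumptions. The text does not give the exact weight function used by Mather (1999a), so the code substitutes a Huber-type weight, w = min(1, c/d), where d is the Mahalanobis distance and c is an arbitrary cut-off. `nearest_mean` is a hypothetical stand-in for whatever classifier the analyst actually uses. The `rule` parameter selects between eliminating observations mislabelled by all classifiers ("all") and eliminating those mislabelled by a majority.

```python
import numpy as np


def robust_class_stats(X, c=None, tol=1e-6, max_iter=100):
    """Iteratively reweighted mean vector and covariance matrix for one class.

    X is an (n, p) array of training pixels. The Huber-type weight
    w = min(1, c / d) is an illustrative assumption, not Mather's exact scheme.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if c is None:
        c = np.sqrt(p) + 1.5  # hypothetical cut-off on the Mahalanobis distance
    w = np.ones(n)  # first pass gives the conventional (unweighted) estimates
    for _ in range(max_iter):
        mean = w @ X / w.sum()
        diff = X - mean
        cov = (w[:, None] * diff).T @ diff / w.sum()
        # Mahalanobis distance of each observation from the current class mean
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff))
        # full weight near the mean, decreasing weight beyond the cut-off
        new_w = np.minimum(1.0, c / np.maximum(d, 1e-12))
        converged = np.max(np.abs(new_w - w)) < tol
        w = new_w
        if converged:  # weights are stable, so mean and cov have stopped moving
            break
    return mean, cov, w


def nearest_mean(train_X, train_y, test_X):
    """Hypothetical stand-in classifier: minimum distance to class means."""
    classes = np.unique(train_y)
    means = np.array([train_X[train_y == k].mean(axis=0) for k in classes])
    dist = ((test_X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def cv_filter(X, y, classifiers, k=10, rule="all", seed=0):
    """Boolean mask of training observations to keep after k-fold screening."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    wrong = np.zeros((len(classifiers), len(y)), dtype=bool)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # train on the other k - 1 groups
        for j, classify in enumerate(classifiers):
            labels = classify(X[train], y[train], X[test_idx])
            wrong[j, test_idx] = labels != y[test_idx]
    votes = wrong.sum(axis=0)  # number of classifiers disagreeing with the analyst
    if rule == "all":  # eliminate only if every classifier disagrees
        drop = votes == len(classifiers)
    else:  # majority rule
        drop = votes > len(classifiers) / 2
    return ~drop
```

With a single classifier, `keep = cv_filter(X, y, [nearest_mean])` reproduces the basic ten-fold procedure, and `X[keep], y[keep]` is the screened training set. Passing a list of several classifier functions gives the multi-classifier variant. The Huber-type weight never falls to exactly zero, so an outlier is down-weighted rather than discarded. A redescending weight function would reject gross outliers more strongly.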

2.7 Estimation of classification accuracy

No classification is complete until its accuracy has been assessed. In this context, the term accuracy means the level of agreement between labels assigned by the classifier and class allocations based on ground data collected by the user, known as test data. As noted already, ground data do not necessarily represent reality. Discrepancies arise from observation and recording errors, mislocation of test data sites, changes in land cover between the date of ground observation and the date of imaging, and so on. Where a separate set of test data is not available, accuracy can be assessed against the training data set. The resulting figure will, however, inevitably overstate the true accuracy, since the classifier is then evaluated on the very samples from which it was trained. The use of cross-validation methods is preferable in these circumstances (Section 2.4).
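A deliberately extreme case shows why accuracy measured on training data is overstated. The sketch below, in Python with NumPy, uses a 1-nearest-neighbour classifier, a hypothetical choice made only for illustration. It compares resubstitution accuracy, where the classifier labels its own training samples, with k-fold cross-validated accuracy, where each sample is labelled by a classifier trained without it. A 1-NN classifier always finds a training sample to be its own nearest neighbour. Its resubstitution accuracy is therefore 1.0 whenever no two samples share identical feature vectors, however much the classes overlap.

```python
import numpy as np


def one_nn(train_X, train_y, test_X):
    """Hypothetical classifier: label of the single nearest training sample."""
    d = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    return train_y[d.argmin(axis=1)]


def resubstitution_accuracy(X, y, classify):
    """Accuracy when the training data are reused as test data (biased upward)."""
    return float(np.mean(classify(X, y, X) == y))


def cv_accuracy(X, y, classify, k=10, seed=0):
    """k-fold cross-validated accuracy: each sample is labelled exactly once,
    by a classifier trained without it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        labels = classify(X[train], y[train], X[test_idx])
        correct += int(np.sum(labels == y[test_idx]))
    return correct / len(y)


# Two heavily overlapping synthetic classes
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
print(resubstitution_accuracy(X, y, one_nn))  # 1.0
print(cv_accuracy(X, y, one_nn))              # well below 1.0 for these classes
```

Less extreme classifiers show a smaller gap, but the direction of the bias is the same.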

Appropriate measures of classification accuracy quantify how well a classifier has performed. The methods considered in this section are based on analysis of the confusion matrix. Questions relating to the impact of sample size on accuracy assessment are considered in Section 2.6.2.
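A confusion matrix can be built by counting each pair of test-data class and classifier label. The sketch below does this in Python with NumPy; the function names are hypothetical. It assumes classes are coded as integers 0 to n_classes - 1, with reference (test-data) classes on the rows and classifier labels on the columns. Overall accuracy is then the sum of the diagonal divided by the total number of test samples.

```python
import numpy as np


def confusion_matrix(reference, predicted, n_classes):
    """Rows: reference (test data) class; columns: label from the classifier."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # add 1 at (reference[i], predicted[i]) for every test sample i
    np.add.at(cm, (np.asarray(reference), np.asarray(predicted)), 1)
    return cm


def overall_accuracy(cm):
    """Proportion of test samples on the diagonal (correctly labelled)."""
    return float(np.trace(cm) / cm.sum())


reference = np.array([0, 0, 0, 1, 1, 2, 2, 2])
predicted = np.array([0, 0, 1, 1, 1, 2, 0, 2])
cm = confusion_matrix(reference, predicted, 3)
print(cm.tolist())           # [[2, 1, 0], [0, 2, 0], [1, 0, 2]]
print(overall_accuracy(cm))  # 0.75
```

`np.add.at` is used instead of `cm[reference, predicted] += 1`. Fancy-index assignment applies repeated index pairs only once, whereas `np.add.at` accumulates every occurrence.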

The most common tool for assessing classification accuracy is the confusion (or error) matrix. A confusion matrix is a square

Classification Methods for Remotely Sensed Data, Second Edition
ISBN: 1420090720
Year: 2001
Pages: 354
