
Quinlan gives an example of a two-class problem, with fifteen cases each with four attributes. He shows that even this extremely simple case would generate 4 × 10⁶ possible trees. The tree is built without using backtracking, so decisions made at level n affect levels n + 1 and all subsequent levels.
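This greedy, non-backtracking construction can be sketched in Python as below: the tree is grown by repeatedly choosing the locally best test and recursing on the resulting subsets, and a decision taken at one level is never revisited. The helper `choose_best_test` (which would, for example, maximise the gain ratio described in the next paragraph) and the simple `Node` structure are illustrative assumptions, not part of C4.5 itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    test: Optional[tuple] = None     # (attribute index, threshold) at an internal node
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None      # class label at a leaf

def majority(labels):
    """Most frequent class label in a list."""
    return max(set(labels), key=labels.count)

def build_tree(samples, labels, choose_best_test, min_samples=2):
    """Greedy top-down induction: pick the locally best test, split, recurse.
    No backtracking -- a test chosen at level n is never reconsidered."""
    if len(set(labels)) == 1 or len(labels) < min_samples:
        return Node(label=majority(labels))
    attr, threshold = choose_best_test(samples, labels)
    left = [i for i, s in enumerate(samples) if s[attr] <= threshold]
    right = [i for i, s in enumerate(samples) if s[attr] > threshold]
    if not left or not right:        # the chosen test failed to split the data
        return Node(label=majority(labels))
    return Node(
        test=(attr, threshold),
        left=build_tree([samples[i] for i in left], [labels[i] for i in left],
                        choose_best_test, min_samples),
        right=build_tree([samples[i] for i in right], [labels[i] for i in right],
                         choose_best_test, min_samples),
    )
```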

Quinlan’s C4.5 procedure uses the concept of the information content of a message, which is expressed as −log₂(probability) bits. Thus, eight equally probable messages each contain −log₂(1/8) = 3 bits of information. The probability of a set of training data for a particular class is the relative frequency of the observations in that class (e.g. if training data class i contains 8 pixels and the total number of training data pixels is 100, then the probability of class i is 0.08). The concept of information is used to evaluate possible tests for the subdivision of the tree, and the test that produces the greatest information gain is selected. However, this gain criterion favours tests with many possible outcomes, so the measure is normalised by the information that would be generated if the n pixels forming the training data were allocated to separate classes; the normalised measure is known as the gain ratio.

C5.0, a development of C4.5, introduces the concept of boosting, which involves weighting the training class pixels. The weights given to incorrectly labelled (or unlabelled) pixels are increased relative to those of correctly labelled pixels, and the tree-generating process is repeated. This cycle is repeated a number of times. Friedl et al. (1999) suggest that no more than ten iterations are required, and their results show an improvement in accuracy of about 25% following boosting.
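The entropy and gain-ratio calculations, and a boosting-style reweighting step, can be sketched as follows. This is a minimal Python illustration, assuming class labels are supplied as plain lists; the weight-increase factor in `reweight` is an arbitrary illustrative choice, not the actual C5.0 update rule.

```python
import math
from collections import Counter

def entropy(labels):
    """Information content of a label set in bits: -sum p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, partition):
    """Gain ratio of a candidate test.

    `partition` is a list of label subsets, one per outcome of the test.
    The raw information gain is normalised by the information generated by
    the split itself, which penalises tests with many outcomes."""
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in partition if p)
    gain = entropy(labels) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partition if p)
    return gain / split_info if split_info > 0 else 0.0

def reweight(weights, misclassified, factor=2.0):
    """One boosting cycle: increase the weights of misclassified pixels and
    renormalise so the weights sum to 1. The factor of 2 is purely illustrative."""
    new = [w * (factor if bad else 1.0) for w, bad in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]

# Eight equally probable messages carry -log2(1/8) = 3 bits each:
print(entropy(list(range(8))))   # 3.0
```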

A decision tree developed in the manner described above will generally be complex, and some pruning will be required. Also, it may not be possible to separate some classes, which then occupy a common ‘leaf’. In such cases, the probability of each class can be output. Note that the decision boundaries are linear and are parallel to the axes of the feature space, so that this space is divided into a set of ‘hyper-rectangles’ (as sketched below). Murthy et al. (1994) consider the case for oblique decision boundaries, while Breiman et al. (1984) use linear combinations of features (rather than a single feature) to conduct tests in their classification and regression tree (CART). Turney (1995) uses a genetic algorithm to search for the optimum split at each internal node.
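The distinction between the two kinds of node test can be made concrete with a short sketch (the function names are illustrative). A C4.5-style test compares one feature with a threshold, so each decision boundary is perpendicular to a feature axis and the leaves correspond to hyper-rectangles; an oblique or CART-style test thresholds a linear combination of features, giving an arbitrarily oriented hyperplane.

```python
def axis_parallel_test(x, attr, threshold):
    """Univariate test: is feature `attr` of pixel vector x below the threshold?
    Boundaries of this form are parallel to the feature-space axes."""
    return x[attr] <= threshold

def oblique_test(x, weights, threshold):
    """Multivariate test on a linear combination of features:
    sum_j w_j * x_j <= threshold defines an arbitrarily oriented hyperplane."""
    return sum(w * xj for w, xj in zip(weights, x)) <= threshold
```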

Hansen et al. (1996) apply decision trees to land cover classification, and a number of other authors have followed their lead, including Hansen et al. (2000), Gahegan and West (1998), and Friedl et al. (1999). Safavian and Landgrebe (1991) present a survey of decision tree classifier methodology. A number of authors have compared decision trees with other classifiers, including Gahegan and West (1998), who find that the maximum likelihood (ML) classifier performs less well than a decision tree in labelling the pixels of an image of a complex test area. The thematic map produced by ML shows severe fragmentation of land cover classes, possibly because the multivariate normal distribution provides an inadequate model. The use of a decision tree (C4.5) shows a significant improvement in performance.
