In a random sampling scheme, the sampled pixels are chosen at random from the image. The term ‘without replacement’ means that no pixel is selected twice. Although this approach is easy to apply, it may not always be suitable in practice because small information classes may be omitted. Moreover, if the number of samples selected is not sufficiently large, some information classes may be under- (or over-) sampled. Hence, stratified unaligned sampling may be preferred: several unaligned subareas are chosen for each class, and random sampling is performed within each subarea. However, Congalton and Green (1999) note that the assumptions on which the computation of the kappa coefficient is based are satisfied only by random sampling. The kappa coefficient is discussed in Section 2.7.
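
To make the distinction concrete, the sketch below draws a simple random sample of pixel locations without replacement, and then a stratified random sample with a fixed number of pixels per class so that small classes are not omitted. It is a minimal illustration only: the classified map, the class labels and the sample sizes are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical classified map: labels 1-5 on a 500 x 500 pixel grid.
class_map = rng.integers(1, 6, size=(500, 500))

# Simple random sampling without replacement: 250 distinct pixel
# locations drawn from anywhere in the image.
flat_idx = rng.choice(class_map.size, size=250, replace=False)
rows, cols = np.unravel_index(flat_idx, class_map.shape)

# Stratified random sampling: 50 distinct pixels drawn from within each
# class, so that small information classes cannot be omitted.
strat_rows, strat_cols = [], []
for c in np.unique(class_map):
    r, col = np.nonzero(class_map == c)
    pick = rng.choice(r.size, size=min(50, r.size), replace=False)
    strat_rows.append(r[pick])
    strat_cols.append(col[pick])
strat_rows = np.concatenate(strat_rows)
strat_cols = np.concatenate(strat_cols)

print(len(rows), len(strat_rows))
```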

A third sampling scheme, cluster sampling, can also be applied. In cluster sampling, groups of contiguous, patch-like pixels, representing, for example, an agricultural field, are selected. This approach allows a large number of samples to be collected relatively quickly. However, the use of large clusters is not recommended, because the pixels within a group are not mutually independent: owing to the autocorrelation effect noted above, a sample of, say, 40 contiguous pixels does not represent 40 independent samples. Congalton (1988) suggested that no cluster should be larger than ten pixels.
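
A minimal sketch of cluster sampling is given below. It assumes that each cluster is grown by 4-connected region growing around a randomly chosen seed pixel and capped at ten pixels, in line with Congalton's (1988) recommendation; the classified map and the seed selection are hypothetical.

```python
import numpy as np

def grow_cluster(class_map, seed, max_size=10):
    """Grow a contiguous (4-connected) cluster of same-class pixels
    around a seed location, stopping at max_size pixels."""
    n_rows, n_cols = class_map.shape
    target = class_map[seed]
    cluster, frontier, visited = [], [seed], {seed}
    while frontier and len(cluster) < max_size:
        r, c = frontier.pop(0)
        cluster.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < n_rows and 0 <= nc < n_cols
                    and (nr, nc) not in visited
                    and class_map[nr, nc] == target):
                visited.add((nr, nc))
                frontier.append((nr, nc))
    return cluster

rng = np.random.default_rng(seed=1)
class_map = rng.integers(1, 6, size=(500, 500))       # hypothetical map
seed = (int(rng.integers(500)), int(rng.integers(500)))
print(grow_cluster(class_map, seed))                  # at most ten pixels
```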

2.6.2 Sample size, scale and spatial variability

Several authors (e.g. Hord and Brooner, 1976; van Genderen and Lock, 1977; Hay, 1979; Fitzpatrick-Lins, 1981; Rosenfield et al., 1982; Mather, 1999a) consider the question of sample size. Mather (1999a) suggests, as a rule of thumb, that the number of training data pixels per class should be at least thirty times the number of features. This rule is based on the notion that, in univariate studies, a sample size of thirty or more is considered ‘large’. The rule could be modified to state that the sample size per class should be at least thirty times the number of parameters to be estimated. More soundly based advice uses a theoretical model of the data distribution to predict the sample size required to achieve a specified level of accuracy (Fitzpatrick-Lins, 1981; Congalton, 1991).
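
As an illustration of the modified rule, the snippet below assumes a Gaussian maximum-likelihood classifier, for which each class requires d mean values and d(d + 1)/2 distinct covariance terms to be estimated from d features, and compares the simple ‘thirty times the number of features’ guideline with the parameter-based version. The classifier model and band counts are assumptions made for the example.

```python
def min_training_pixels(n_features, per_parameter=30):
    """Rule-of-thumb minimum training sample size per class, assuming a
    Gaussian maximum-likelihood classifier in which d mean values and
    d(d + 1)/2 distinct covariance terms are estimated for d features."""
    n_parameters = n_features + n_features * (n_features + 1) // 2
    return per_parameter * n_parameters

# Compare the simple '30 x features' rule with the parameter-based
# version for hypothetical 4-band and 7-band imagery.
for d in (4, 7):
    print(d, 30 * d, min_training_pixels(d))
```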

Congalton and Green (1999) present a method for estimating sample size that is based on the multinomial distribution. The sample size n is derived from the relationship n = B Πi(1 − Πi)/bi², where bi is the required precision (expressed as a proportion, so that 0.05 is equivalent to 5% precision), B is the upper (α/k) × 100th percentile of the chi-square distribution with one degree of freedom, k is the number of classes, Πi is the proportion of the area covered by class i, and (1 − α) is the required confidence level. Vieira (2000) gives the following example: at a confidence level of 95% and a desired precision of 0.05, find the sample size necessary for the
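
The relationship can be evaluated directly, as in the sketch below. This is an illustrative calculation using SciPy rather than Vieira's (2000) worked example; the eight classes and the worst-case class proportion Πi = 0.5 are assumed values.

```python
from scipy.stats import chi2

def multinomial_sample_size(k, pi_i=0.5, b_i=0.05, alpha=0.05):
    """Sample size n = B * Pi_i * (1 - Pi_i) / b_i**2, where B is the
    upper (alpha/k) x 100th percentile of the chi-square distribution
    with one degree of freedom."""
    B = chi2.ppf(1.0 - alpha / k, df=1)
    return B * pi_i * (1.0 - pi_i) / b_i ** 2

# Assumed illustration: 8 classes, 95% confidence (alpha = 0.05),
# 5% precision, worst-case class proportion of 0.5.
print(round(multinomial_sample_size(k=8)))
```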
