The iris data published by Fisher (1936) are widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on fifty iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica. The iris data are used in Example 25.1 through Example 25.3.
Example 25.4 and Example 25.5 use remote-sensing data on crops. In this data set, the observations are grouped into five crops: clover, corn, cotton, soybeans, and sugar beets. Four measures called X1 through X4 make up the descriptive variables.
In this example, several discriminant analyses are run with a single quantitative variable, petal width, so that density estimates and posterior probabilities can be plotted easily. The example produces Output 25.1.1 through Output 25.1.5. The GCHART procedure is used to display the sample distribution of petal width in the three species. Note the overlap between species I. versicolor and I. virginica that the bar chart shows. These statements produce Output 25.1.1:
proc format;
   value specname
      1='Setosa    '
      2='Versicolor'
      3='Virginica ';
run;

data iris;
   title 'Discriminant Analysis of Fisher (1936) Iris Data';
   input SepalLength SepalWidth PetalLength PetalWidth Species @@;
   format Species specname.;
   label SepalLength='Sepal Length in mm.'
         SepalWidth ='Sepal Width in mm.'
         PetalLength='Petal Length in mm.'
         PetalWidth ='Petal Width in mm.';
   symbol = put(Species, specname10.);
   datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;

pattern1 c=red    /*v=l1   */;
pattern2 c=yellow /*v=empty*/;
pattern3 c=blue   /*v=r1   */;
axis1 label=(angle=90);
axis2 value=(height=.6);
legend1 frame label=none;

proc gchart data=iris;
   vbar PetalWidth / subgroup=Species midpoints=0 to 25
                     raxis=axis1 maxis=axis2 legend=legend1 cframe=ligr;
run;
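The overlap that the bar chart shows can also be checked numerically. The following is a minimal sketch, not part of the original example, that tabulates the range of petal width within each species; it assumes only the iris data set and the Species format created above.

```sas
/* Range of petal width by species, to quantify the overlap */
/* seen in the bar chart (illustrative addition)            */
proc means data=iris n min max mean std;
   class Species;
   var PetalWidth;
run;
```

If the maximum for one species exceeds the minimum for another, no single cutoff on petal width can separate those two species without error.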
Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance

The DISCRIM Procedure

Observations   150     DF Total             149
Variables        1     DF Within Classes    147
Classes          3     DF Between Classes     2

Class Level Information

Species       Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa        Setosa                 50   50.0000     0.333333            0.333333
Versicolor    Versicolor             50   50.0000     0.333333            0.333333
Virginica     Virginica              50   50.0000     0.333333            0.333333

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Linear Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)}^{-1} \, (X - \bar{X}_{(X)j})$$

Posterior Probability of Membership in Each Species

$$\Pr(j \mid X) = \exp\bigl(-0.5\, D_j^2(X)\bigr) \Big/ \sum_k \exp\bigl(-0.5\, D_k^2(X)\bigr)$$

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.9610      0.0390
  9   Versicolor     Virginica  *              0.0000       0.0952      0.9048
 57   Virginica      Versicolor *              0.0000       0.9940      0.0060
 78   Virginica      Versicolor *              0.0000       0.8009      0.1991
 91   Virginica      Versicolor *              0.0000       0.9610      0.0390
148   Versicolor     Virginica  *              0.0000       0.3828      0.6172

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Species

From Species    Setosa   Versicolor   Virginica    Total
Setosa              50            0           0       50
                100.00         0.00        0.00   100.00
Versicolor           0           48           2       50
                  0.00        96.00        4.00   100.00
Virginica            0            4          46       50
                  0.00         8.00       92.00   100.00
Total               50           52          48      150
                 33.33        34.67       32.00   100.00
Priors         0.33333      0.33333     0.33333

Error Count Estimates for Species

           Setosa   Versicolor   Virginica    Total
Rate       0.0000       0.0400      0.0800   0.0400
Priors     0.3333       0.3333      0.3333

Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Linear Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}^{-1} \, (X - \bar{X}_j)$$

Number of Observations and Percent Classified into Species

           Setosa   Versicolor   Virginica    Total
Total          26           18          27       71
            36.62        25.35       38.03   100.00
Priors    0.33333      0.33333     0.33333
Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance

The DISCRIM Procedure

Observations   150     DF Total             149
Variables        1     DF Within Classes    147
Classes          3     DF Between Classes     2

Class Level Information

Species       Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa        Setosa                 50   50.0000     0.333333            0.333333
Versicolor    Versicolor             50   50.0000     0.333333            0.333333
Virginica     Virginica              50   50.0000     0.333333            0.333333

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Quadratic Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)j}^{-1} \, (X - \bar{X}_{(X)j}) + \ln \lvert \mathrm{COV}_{(X)j} \rvert$$

Posterior Probability of Membership in Each Species

$$\Pr(j \mid X) = \exp\bigl(-0.5\, D_j^2(X)\bigr) \Big/ \sum_k \exp\bigl(-0.5\, D_k^2(X)\bigr)$$

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.8740      0.1260
  9   Versicolor     Virginica  *              0.0000       0.0686      0.9314
 42   Setosa         Versicolor *              0.4923       0.5073      0.0004
 57   Virginica      Versicolor *              0.0000       0.9602      0.0398
 78   Virginica      Versicolor *              0.0000       0.6558      0.3442
 91   Virginica      Versicolor *              0.0000       0.8740      0.1260
148   Versicolor     Virginica  *              0.0000       0.2871      0.7129

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species

From Species    Setosa   Versicolor   Virginica    Total
Setosa              49            1           0       50
                 98.00         2.00        0.00   100.00
Versicolor           0           48           2       50
                  0.00        96.00        4.00   100.00
Virginica            0            4          46       50
                  0.00         8.00       92.00   100.00
Total               49           53          48      150
                 32.67        35.33       32.00   100.00
Priors         0.33333      0.33333     0.33333

Error Count Estimates for Species

           Setosa   Versicolor   Virginica    Total
Rate       0.0200       0.0400      0.0800   0.0467
Priors     0.3333       0.3333      0.3333

Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Quadratic Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}_j^{-1} \, (X - \bar{X}_j) + \ln \lvert \mathrm{COV}_j \rvert$$

Number of Observations and Percent Classified into Species

           Setosa   Versicolor   Virginica    Total
Total          23           20          28       71
            32.39        28.17       39.44   100.00
Priors    0.33333      0.33333     0.33333
Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure

Observations   150     DF Total             149
Variables        1     DF Within Classes    147
Classes          3     DF Between Classes     2

Class Level Information

Species       Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa        Setosa                 50   50.0000     0.333333            0.333333
Versicolor    Versicolor             50   50.0000     0.333333            0.333333
Virginica     Virginica              50   50.0000     0.333333            0.333333

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Squared Distance Function

$$D^2(X,Y) = (X - Y)' \, \mathrm{COV}^{-1} \, (X - Y)$$

Posterior Probability of Membership in Each Species

$$F(X \mid j) = n_j^{-1} \sum_i \exp\bigl(-0.5\, D^2(X, Y_{ji}) / R^2\bigr)$$

$$\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \Big/ \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$$

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.8827      0.1173
  9   Versicolor     Virginica  *              0.0000       0.0438      0.9562
 57   Virginica      Versicolor *              0.0000       0.9472      0.0528
 78   Virginica      Versicolor *              0.0000       0.8061      0.1939
 91   Virginica      Versicolor *              0.0000       0.8827      0.1173
148   Versicolor     Virginica  *              0.0000       0.2586      0.7414

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

From Species    Setosa   Versicolor   Virginica    Total
Setosa              50            0           0       50
                100.00         0.00        0.00   100.00
Versicolor           0           48           2       50
                  0.00        96.00        4.00   100.00
Virginica            0            4          46       50
                  0.00         8.00       92.00   100.00
Total               50           52          48      150
                 33.33        34.67       32.00   100.00
Priors         0.33333      0.33333     0.33333

Error Count Estimates for Species

           Setosa   Versicolor   Virginica    Total
Rate       0.0000       0.0400      0.0800   0.0400
Priors     0.3333       0.3333      0.3333

Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

           Setosa   Versicolor   Virginica    Total
Total          26           18          27       71
            36.62        25.35       38.03   100.00
Priors    0.33333      0.33333     0.33333
Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth

The DISCRIM Procedure

Observations   150     DF Total             149
Variables        1     DF Within Classes    147
Classes          3     DF Between Classes     2

Class Level Information

Species       Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa        Setosa                 50   50.0000     0.333333            0.333333
Versicolor    Versicolor             50   50.0000     0.333333            0.333333
Virginica     Virginica              50   50.0000     0.333333            0.333333

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Squared Distance Function

$$D^2(X,Y) = (X - Y)' \, \mathrm{COV}_j^{-1} \, (X - Y)$$

Posterior Probability of Membership in Each Species

$$F(X \mid j) = n_j^{-1} \sum_i \exp\bigl(-0.5\, D^2(X, Y_{ji}) / R^2\bigr)$$

$$\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \Big/ \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$$

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.8805      0.1195
  9   Versicolor     Virginica  *              0.0000       0.0466      0.9534
 57   Virginica      Versicolor *              0.0000       0.9394      0.0606
 78   Virginica      Versicolor *              0.0000       0.7193      0.2807
 91   Virginica      Versicolor *              0.0000       0.8805      0.1195
148   Versicolor     Virginica  *              0.0000       0.2275      0.7725

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

From Species    Setosa   Versicolor   Virginica    Total
Setosa              50            0           0       50
                100.00         0.00        0.00   100.00
Versicolor           0           48           2       50
                  0.00        96.00        4.00   100.00
Virginica            0            4          46       50
                  0.00         8.00       92.00   100.00
Total               50           52          48      150
                 33.33        34.67       32.00   100.00
Priors         0.33333      0.33333     0.33333

Error Count Estimates for Species

           Setosa   Versicolor   Virginica    Total
Rate       0.0000       0.0400      0.0800   0.0400
Priors     0.3333       0.3333      0.3333

Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

           Setosa   Versicolor   Virginica    Total
Total          25           18          28       71
            35.21        25.35       39.44   100.00
Priors    0.33333      0.33333     0.33333
In order to plot the density estimates and posterior probabilities, a data set called plotdata is created containing equally spaced values from -5 to 30, covering the range of petal width with a little to spare on each end. The plotdata data set is used with the TESTDATA= option in PROC DISCRIM.
data plotdata;
   do PetalWidth=-5 to 30 by .5;
      output;
   end;
run;
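As a quick check on the size of this grid, the number of equally spaced points from -5 to 30 in steps of 0.5 can be worked out by hand. The count agrees with the total of 71 test observations reported in the Classification Summary for Test Data: WORK.PLOTDATA tables.

```latex
$$N = \frac{30 - (-5)}{0.5} + 1 = 71$$
```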
The same plots are produced after each discriminant analysis, so a macro can be used to reduce the amount of typing required. The macro PLOT uses two data sets. The data set plotd, containing density estimates, is created by the TESTOUTD= option in PROC DISCRIM. The data set plotp, containing posterior probabilities, is created by the TESTOUT= option. For each data set, the macro PLOT removes uninteresting values (near zero) and does an overlay plot showing all three species on a single plot. The following statements create the macro PLOT:
%macro plot;
   /* Set near-zero density estimates to missing so they are not plotted */
   data plotd;
      set plotd;
      if setosa     < .002 then setosa     = .;
      if versicolor < .002 then versicolor = .;
      if virginica  < .002 then virginica  = .;
      label PetalWidth='Petal Width in mm.';
   run;

   symbol1 i=join v=none c=red    l=1 /*l=21*/;
   symbol2 i=join v=none c=yellow l=1 /*l= 1*/;
   symbol3 i=join v=none c=blue   l=1 /*l= 2*/;
   legend1 label=none frame;
   axis1 label=(angle=90 'Density') order=(0 to .6 by .1);

   proc gplot data=plotd;
      plot setosa*PetalWidth versicolor*PetalWidth virginica*PetalWidth
           / overlay vaxis=axis1 legend=legend1 frame cframe=ligr;
      title3 'Plot of Estimated Densities';
   run;

   /* Set near-zero posterior probabilities to missing */
   data plotp;
      set plotp;
      if setosa     < .01 then setosa     = .;
      if versicolor < .01 then versicolor = .;
      if virginica  < .01 then virginica  = .;
      label PetalWidth='Petal Width in mm.';
   run;

   axis1 label=(angle=90 'Posterior Probability') order=(0 to 1 by .2);

   proc gplot data=plotp;
      plot setosa*PetalWidth versicolor*PetalWidth virginica*PetalWidth
           / overlay vaxis=axis1 legend=legend1 frame cframe=ligr;
      title3 'Plot of Posterior Probabilities';
   run;
%mend;
The first analysis uses normal-theory methods (METHOD=NORMAL) assuming equal variances (POOL=YES) in the three classes. The NOCLASSIFY option suppresses the resubstitution classification results of the input data set observations. The CROSSLISTERR option lists the observations that are misclassified under cross validation and displays cross validation error-rate estimates. The following statements produce Output 25.1.2:
proc discrim data=iris method=normal pool=yes
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var PetalWidth;
   title2 'Using Normal Density Estimates with Equal Variance';
run;
%plot
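The Total entry under Error Count Estimates in the cross-validation summary is the prior-weighted average of the per-class rates. A worked check for this analysis, using the equal priors of 1/3 and the class rates shown in the output:

```latex
$$\text{Total} = \sum_j \mathrm{PRIOR}_j \cdot \mathrm{Rate}_j
  = \tfrac{1}{3}\,(0.0000 + 0.0400 + 0.0800) = 0.0400$$
```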
The next analysis uses normal-theory methods assuming unequal variances (POOL=NO) in the three classes. The following statements produce Output 25.1.3:
proc discrim data=iris method=normal pool=no
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var PetalWidth;
   title2 'Using Normal Density Estimates with Unequal Variance';
run;
%plot
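With a single variable, each within-class covariance matrix reduces to a scalar variance, so the quadratic distance function in the output can be written out by hand. In the sketch below, $x$ is a petal width, $\bar{x}_j$ is the mean for species $j$, and $s_j^2$ is the variance for species $j$; the symbol $s_j^2$ is introduced here for illustration and does not appear in the output.

```latex
$$D_j^2(x) = \frac{(x - \bar{x}_j)^2}{s_j^2} + \ln s_j^2$$
```

Under POOL=YES every class shares one pooled variance, so the logarithmic term is the same for all classes and is omitted from the linear distance function.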
Two more analyses are run with nonparametric methods (METHOD=NPAR), specifically kernel density estimates with normal kernels (KERNEL=NORMAL). The first of these uses equal bandwidths (smoothing parameters) (POOL=YES) in each class. The use of equal bandwidths does not constrain the density estimates to be of equal variance. The value of the radius parameter that, assuming normality, minimizes an approximate mean integrated square error is 0.48 (see the Nonparametric Methods section on page 1158). Choosing r = 0.4 gives a more detailed look at the irregularities in the data. The following statements produce Output 25.1.4:
proc discrim data=iris method=npar kernel=normal r=.4 pool=yes
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var PetalWidth;
   title2 'Using Kernel Density Estimates with Equal Bandwidth';
run;
%plot
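The quoted radius of 0.48 can be reproduced by hand under one assumption that is not stated in this example: that the Nonparametric Methods section gives the radius as $r = (A(K)/n)^{1/(p+4)}$ with $A(K) = 4/(2p+1)$ for a normal kernel. With $p = 1$ variable and $n = 50$ observations per class:

```latex
$$r = \left(\frac{4/(2p+1)}{n}\right)^{1/(p+4)}
    = \left(\frac{4/3}{50}\right)^{1/5} \approx 0.48$$
```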
Another nonparametric analysis is run with unequal bandwidths (POOL=NO). These statements produce Output 25.1.5:
proc discrim data=iris method=npar kernel=normal r=.4 pool=no
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var PetalWidth;
   title2 'Using Kernel Density Estimates with Unequal Bandwidth';
run;
%plot
In this example, four more discriminant analyses of iris data are run with two quantitative variables: petal width and petal length. The example produces Output 25.2.1 through Output 25.2.5. A scatter plot shows the joint sample distribution. See Appendix B, Using the %PLOTIT Macro, for more information on the %PLOTIT macro.
Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance

The DISCRIM Procedure

Observations   150     DF Total             149
Variables        2     DF Within Classes    147
Classes          3     DF Between Classes     2

Class Level Information

Species       Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa        Setosa                 50   50.0000     0.333333            0.333333
Versicolor    Versicolor             50   50.0000     0.333333            0.333333
Virginica     Virginica              50   50.0000     0.333333            0.333333

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Linear Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)}^{-1} \, (X - \bar{X}_{(X)j})$$

Posterior Probability of Membership in Each Species

$$\Pr(j \mid X) = \exp\bigl(-0.5\, D_j^2(X)\bigr) \Big/ \sum_k \exp\bigl(-0.5\, D_k^2(X)\bigr)$$

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.8453      0.1547
  9   Versicolor     Virginica  *              0.0000       0.2130      0.7870
 25   Virginica      Versicolor *              0.0000       0.8322      0.1678
 57   Virginica      Versicolor *              0.0000       0.8057      0.1943
 91   Virginica      Versicolor *              0.0000       0.8903      0.1097
148   Versicolor     Virginica  *              0.0000       0.3118      0.6882

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Species

From Species    Setosa   Versicolor   Virginica    Total
Setosa              50            0           0       50
                100.00         0.00        0.00   100.00
Versicolor           0           48           2       50
                  0.00        96.00        4.00   100.00
Virginica            0            4          46       50
                  0.00         8.00       92.00   100.00
Total               50           52          48      150
                 33.33        34.67       32.00   100.00
Priors         0.33333      0.33333     0.33333

Error Count Estimates for Species

           Setosa   Versicolor   Virginica    Total
Rate       0.0000       0.0400      0.0800   0.0400
Priors     0.3333       0.3333      0.3333

Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Linear Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}^{-1} \, (X - \bar{X}_j)$$

Number of Observations and Percent Classified into Species

           Setosa   Versicolor   Virginica    Total
Total       14507        16888       12858    44253
            32.78        38.16       29.06   100.00
Priors    0.33333      0.33333     0.33333
Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance

The DISCRIM Procedure

Observations   150     DF Total             149
Variables        2     DF Within Classes    147
Classes          3     DF Between Classes     2

Class Level Information

Species       Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa        Setosa                 50   50.0000     0.333333            0.333333
Versicolor    Versicolor             50   50.0000     0.333333            0.333333
Virginica     Virginica              50   50.0000     0.333333            0.333333

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Quadratic Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)j}^{-1} \, (X - \bar{X}_{(X)j}) + \ln \lvert \mathrm{COV}_{(X)j} \rvert$$

Posterior Probability of Membership in Each Species

$$\Pr(j \mid X) = \exp\bigl(-0.5\, D_j^2(X)\bigr) \Big/ \sum_k \exp\bigl(-0.5\, D_k^2(X)\bigr)$$

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7288      0.2712
  9   Versicolor     Virginica  *              0.0000       0.0903      0.9097
 25   Virginica      Versicolor *              0.0000       0.5196      0.4804
 91   Virginica      Versicolor *              0.0000       0.8335      0.1665
148   Versicolor     Virginica  *              0.0000       0.4675      0.5325

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species

From Species    Setosa   Versicolor   Virginica    Total
Setosa              50            0           0       50
                100.00         0.00        0.00   100.00
Versicolor           0           48           2       50
                  0.00        96.00        4.00   100.00
Virginica            0            3          47       50
                  0.00         6.00       94.00   100.00
Total               50           51          49      150
                 33.33        34.00       32.67   100.00
Priors         0.33333      0.33333     0.33333

Error Count Estimates for Species

           Setosa   Versicolor   Virginica    Total
Rate       0.0000       0.0400      0.0600   0.0333
Priors     0.3333       0.3333      0.3333

Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Quadratic Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}_j^{-1} \, (X - \bar{X}_j) + \ln \lvert \mathrm{COV}_j \rvert$$

Number of Observations and Percent Classified into Species

           Setosa   Versicolor   Virginica    Total
Total        5461         5354       33438    44253
            12.34        12.10       75.56   100.00
Priors    0.33333      0.33333     0.33333
Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure

Observations   150     DF Total             149
Variables        2     DF Within Classes    147
Classes          3     DF Between Classes     2

Class Level Information

Species       Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa        Setosa                 50   50.0000     0.333333            0.333333
Versicolor    Versicolor             50   50.0000     0.333333            0.333333
Virginica     Virginica              50   50.0000     0.333333            0.333333

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Squared Distance Function

$$D^2(X,Y) = (X - Y)' \, \mathrm{COV}^{-1} \, (X - Y)$$

Posterior Probability of Membership in Each Species

$$F(X \mid j) = n_j^{-1} \sum_i \exp\bigl(-0.5\, D^2(X, Y_{ji}) / R^2\bigr)$$

$$\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \Big/ \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$$

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7474      0.2526
  9   Versicolor     Virginica  *              0.0000       0.0800      0.9200
 25   Virginica      Versicolor *              0.0000       0.5863      0.4137
 91   Virginica      Versicolor *              0.0000       0.8358      0.1642
148   Versicolor     Virginica  *              0.0000       0.4123      0.5877

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

From Species    Setosa   Versicolor   Virginica    Total
Setosa              50            0           0       50
                100.00         0.00        0.00   100.00
Versicolor           0           48           2       50
                  0.00        96.00        4.00   100.00
Virginica            0            3          47       50
                  0.00         6.00       94.00   100.00
Total               50           51          49      150
                 33.33        34.00       32.67   100.00
Priors         0.33333      0.33333     0.33333

Error Count Estimates for Species

           Setosa   Versicolor   Virginica    Total
Rate       0.0000       0.0400      0.0600   0.0333
Priors     0.3333       0.3333      0.3333

Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

           Setosa   Versicolor   Virginica    Total
Total       12631         9941       21681    44253
            28.54        22.46       48.99   100.00
Priors    0.33333      0.33333     0.33333
Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth

The DISCRIM Procedure

Observations   150     DF Total             149
Variables        2     DF Within Classes    147
Classes          3     DF Between Classes     2

Class Level Information

Species       Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa        Setosa                 50   50.0000     0.333333            0.333333
Versicolor    Versicolor             50   50.0000     0.333333            0.333333
Virginica     Virginica              50   50.0000     0.333333            0.333333

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Squared Distance Function

$$D^2(X,Y) = (X - Y)' \, \mathrm{COV}_j^{-1} \, (X - Y)$$

Posterior Probability of Membership in Each Species

$$F(X \mid j) = n_j^{-1} \sum_i \exp\bigl(-0.5\, D^2(X, Y_{ji}) / R^2\bigr)$$

$$\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \Big/ \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$$

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7826      0.2174
  9   Versicolor     Virginica  *              0.0000       0.0506      0.9494
 91   Virginica      Versicolor *              0.0000       0.8802      0.1198
148   Versicolor     Virginica  *              0.0000       0.3726      0.6274

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

From Species    Setosa   Versicolor   Virginica    Total
Setosa              50            0           0       50
                100.00         0.00        0.00   100.00
Versicolor           0           48           2       50
                  0.00        96.00        4.00   100.00
Virginica            0            2          48       50
                  0.00         4.00       96.00   100.00
Total               50           50          50      150
                 33.33        33.33       33.33   100.00
Priors         0.33333      0.33333     0.33333

Error Count Estimates for Species

           Setosa   Versicolor   Virginica    Total
Rate       0.0000       0.0400      0.0400   0.0267
Priors     0.3333       0.3333      0.3333

Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

           Setosa   Versicolor   Virginica    Total
Total        5447         5984       32822    44253
            12.31        13.52       74.17   100.00
Priors    0.33333      0.33333     0.33333
Another data set is created for plotting, containing a grid of points suitable for contour plots. The large number of points in the grid makes the following analyses very time-consuming. If you attempt to duplicate these examples, begin with a small number of points in the grid.
data plotdata;
   do PetalLength=-2 to 72 by 0.25;
      h + 1;                  * Number of horizontal cells;
      do PetalWidth=-5 to 32 by 0.25;
         n + 1;               * Total number of cells;
         output;
      end;
   end;
   * Make variables to contain H and V grid sizes;
   call symput('hnobs', compress(put(h    , best12.)));
   call symput('vnobs', compress(put(n / h, best12.)));
   drop n h;
run;
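The test-data summaries for this example report 44253 grid points, which factors as 297 × 149, so the full-resolution grid is expensive to classify. As a starting point for duplicating the example, the following sketch builds a much coarser grid. The macro variable step and its value of 2 are assumptions made here for illustration, and the lower bounds of -2 and -5 are taken as the values consistent with the 297 × 149 grid.

```sas
/* Coarse trial grid; STEP is a hypothetical setting to tune */
%let step = 2;

data plotdata;
   do PetalLength = -2 to 72 by &step;
      h + 1;                       /* count horizontal cells */
      do PetalWidth = -5 to 32 by &step;
         n + 1;                    /* count all cells        */
         output;
      end;
   end;
   /* Store grid dimensions for the contour macro */
   call symput('hnobs', compress(put(h    , best12.)));
   call symput('vnobs', compress(put(n / h, best12.)));
   drop n h;
run;

%put Grid is &hnobs by &vnobs cells;
```

With a step of 2 the grid has 38 × 19 = 722 points. Once the plots look reasonable, the step can be reduced toward 0.25.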
A macro CONTOUR is defined to make contour plots of density estimates and posterior probabilities. Classification results are also plotted on the same grid.
%macro contour;
   data contour(keep=PetalWidth PetalLength symbol density);
      set plotd(in=d) iris;
      if d then density = max(setosa,versicolor,virginica);
   run;

   title3 'Plot of Estimated Densities';
   %plotit(data=contour, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=density white black, rgbtypes=contour,
           hnobs=&hnobs, vnobs=&vnobs, excolors=white,
           rgbround=-16 1 1 1, extend=close, options=noclip,
           types =Setosa Versicolor Virginica '',
           symtype=symbol symbol symbol contour,
           symsize=0.6 0.6 0.6 1,
           symfont=swiss swiss swiss solid)

   data posterior(keep=PetalWidth PetalLength symbol prob _into_);
      set plotp(in=d) iris;
      if d then prob = max(setosa,versicolor,virginica);
   run;

   title3 'Plot of Posterior Probabilities '
          '(Black to White is Low to High Probability)';
   %plotit(data=posterior, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=prob black white 0.3 0.999, rgbtypes=contour,
           hnobs=&hnobs, vnobs=&vnobs, excolors=white,
           rgbround=-16 1 1 1, extend=close, options=noclip,
           types =Setosa Versicolor Virginica '',
           symtype=symbol symbol symbol contour,
           symsize=0.6 0.6 0.6 1,
           symfont=swiss swiss swiss solid)

   title3 'Plot of Classification Results';
   %plotit(data=posterior, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=_into_ CXCCCCCC CXDDDDDD white,
           rgbtypes=contour, hnobs=&hnobs, vnobs=&vnobs,
           excolors=white,
           extend=close, options=noclip,
           types =Setosa Versicolor Virginica '',
           symtype=symbol symbol symbol contour,
           symsize=0.6 0.6 0.6 1,
           symfont=swiss swiss swiss solid)
%mend;
A normal-theory analysis (METHOD=NORMAL) assuming equal covariance matrices (POOL=YES) illustrates the linearity of the classification boundaries. These statements produce Output 25.2.2:
proc discrim data=iris method=normal pool=yes
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Normal Density Estimates with Equal Variance';
run;
%contour
A normal-theory analysis assuming unequal covariance matrices (POOL=NO) illustrates quadratic classification boundaries. These statements produce Output 25.2.3:
proc discrim data=iris method=normal pool=no
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Normal Density Estimates with Unequal Variance';
run;
%contour
A nonparametric analysis (METHOD=NPAR) follows, using normal kernels (KERNEL=NORMAL) and equal bandwidths (POOL=YES) in each class. The value of the radius parameter r that, assuming normality, minimizes an approximate mean integrated square error is 0.50 (see the "Nonparametric Methods" section). These statements produce Output 25.2.4:
proc discrim data=iris method=npar kernel=normal r=.5 pool=yes
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Kernel Density Estimates with Equal Bandwidth';
run;
%contour
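The choice r=.5 in these statements can be reproduced with a short calculation. The Python sketch below is not part of the SAS example. It assumes the normal-kernel rule r = (A/n)^(1/(p+4)) with A = 4/(2p+1), where p is the number of variables and n is the class sample size; that rule should be checked against the "Nonparametric Methods" section before relying on it. The function name `optimal_radius` is hypothetical.

```python
def optimal_radius(p, n):
    """Approximate MISE-optimal radius for a normal kernel (assumed rule)."""
    a = 4.0 / (2 * p + 1)          # kernel-dependent constant A (assumption)
    return (a / n) ** (1.0 / (p + 4))

# Two variables (PetalLength, PetalWidth) and 50 specimens per species
r = optimal_radius(p=2, n=50)
print(round(r, 2))
```

With p = 2 and n = 50 the result rounds to the 0.50 quoted in the text, which is some evidence that the assumed rule is the one intended.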
Another nonparametric analysis is run with unequal bandwidths (POOL=NO). These statements produce Output 25.2.5:
proc discrim data=iris method=npar kernel=normal r=.5 pool=no
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Kernel Density Estimates with Unequal Bandwidth';
run;
%contour
In this example, PROC DISCRIM uses normal-theory methods to classify the iris data used in Example 25.1. The POOL=TEST option tests the homogeneity of the within-group covariance matrices (Output 25.3.3). Since the resulting test statistic is significant at the 0.10 level, the within-group covariance matrices are used to derive the quadratic discriminant criterion. The WCOV and PCOV options display the within-group covariance matrices and the pooled covariance matrix (Output 25.3.2). The DISTANCE option displays squared distances between classes (Output 25.3.4). The ANOVA and MANOVA options test the hypothesis that the class means are equal, using univariate statistics and multivariate statistics; all statistics are significant at the 0.0001 level (Output 25.3.5). The LISTERR option lists the misclassified observations under resubstitution (Output 25.3.6). The CROSSLISTERR option lists the observations that are misclassified under cross validation and displays cross validation error-rate estimates (Output 25.3.7). The resubstitution error count estimate, 0.02, is not larger than the cross validation error count estimate, 0.0267, as would be expected because the resubstitution estimate is optimistically biased. The OUTSTAT= option generates a TYPE=MIXED (because POOL=TEST) output data set containing various statistics such as means, covariances, and coefficients of the discriminant function (Output 25.3.8).
The following statements produce Output 25.3.1 through Output 25.3.8:
proc discrim data=iris outstat=irisstat
             wcov pcov method=normal pool=test
             distance anova manova listerr crosslisterr;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
   title2 'Using Quadratic Discriminant Function';
run;

proc print data=irisstat;
   title2 'Output Discriminant Statistics';
run;
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Observations    150        DF Total               149
Variables         4        DF Within Classes      147
Classes           3        DF Between Classes       2

Class Level Information

              Variable                                            Prior
Species       Name          Frequency     Weight    Proportion    Probability
Setosa        Setosa               50    50.0000      0.333333       0.333333
Versicolor    Versicolor           50    50.0000      0.333333       0.333333
Virginica     Virginica            50    50.0000      0.333333       0.333333
Within-Class Covariance Matrices

Species = Setosa, DF = 49

Variable      Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
SepalLength   Sepal Length in mm.    12.42489796     9.92163265     1.63551020     1.03306122
SepalWidth    Sepal Width in mm.      9.92163265    14.36897959     1.16979592     0.92979592
PetalLength   Petal Length in mm.     1.63551020     1.16979592     3.01591837     0.60693878
PetalWidth    Petal Width in mm.      1.03306122     0.92979592     0.60693878     1.11061224

Species = Versicolor, DF = 49

Variable      Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
SepalLength   Sepal Length in mm.    26.64326531     8.51836735    18.28979592     5.57795918
SepalWidth    Sepal Width in mm.      8.51836735     9.84693878     8.26530612     4.12040816
PetalLength   Petal Length in mm.    18.28979592     8.26530612    22.08163265     7.31020408
PetalWidth    Petal Width in mm.      5.57795918     4.12040816     7.31020408     3.91061224

Species = Virginica, DF = 49

Variable      Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
SepalLength   Sepal Length in mm.    40.43428571     9.37632653    30.32897959     4.90938776
SepalWidth    Sepal Width in mm.      9.37632653    10.40040816     7.13795918     4.76285714
PetalLength   Petal Length in mm.    30.32897959     7.13795918    30.45877551     4.88244898
PetalWidth    Petal Width in mm.      4.90938776     4.76285714     4.88244898     7.54326531
Pooled Within-Class Covariance Matrix, DF = 147

Variable      Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
SepalLength   Sepal Length in mm.    26.50081633     9.27210884    16.75142857     3.84013605
SepalWidth    Sepal Width in mm.      9.27210884    11.53877551     5.52435374     3.27102041
PetalLength   Petal Length in mm.    16.75142857     5.52435374    18.51877551     4.26653061
PetalWidth    Petal Width in mm.      3.84013605     3.27102041     4.26653061     4.18816327

Within Covariance Matrix Information

              Covariance     Natural Log of the Determinant
Species       Matrix Rank    of the Covariance Matrix
Setosa                  4                   5.35332
Versicolor              4                   7.54636
Virginica               4                   9.49362
Pooled                  4                   8.46214
Test of Homogeneity of Within Covariance Matrices

Notation: K    = Number of Groups
          P    = Number of Variables
          N    = Total Number of Observations - Number of Groups
          N(i) = Number of Observations in the ith Group - 1

\[ V = \frac{\prod_i |\text{Within SS Matrix}(i)|^{N(i)/2}}{|\text{Pooled SS Matrix}|^{N/2}} \]

\[ RHO = 1.0 - \left[ \sum_i \frac{1}{N(i)} - \frac{1}{N} \right] \frac{2P^2 + 3P - 1}{6(P+1)(K-1)} \]

\[ DF = .5(K-1)P(P+1) \]

Under the null hypothesis:

\[ -2 \, RHO \, \ln \left[ \frac{N^{PN/2} \, V}{\prod_i N(i)^{PN(i)/2}} \right] \]

is distributed approximately as Chi-Square(DF).

Chi-Square     DF    Pr > ChiSq
140.943050     20        <.0001

Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.

Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.
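The chi-square value in this output can be approximately recovered from the log determinants reported in Output 25.3.2. The Python sketch below is not part of the SAS example. It relies on the observation that each SS matrix equals its degrees of freedom times the corresponding covariance matrix, so the test statistic reduces to RHO times [N ln|pooled COV| - SUM N(i) ln|COV(i)|]. The variable names are hypothetical, and the inputs are the rounded values printed above, so agreement is only to about three decimals.

```python
P, K = 4, 3                          # variables, groups
n_obs = [50, 50, 50]                 # observations per species
N_i = [n - 1 for n in n_obs]         # within-group degrees of freedom N(i)
N = sum(n_obs) - K                   # pooled degrees of freedom

# Natural logs of the covariance determinants (Setosa, Versicolor, Virginica)
ln_det = [5.35332, 7.54636, 9.49362]
ln_det_pooled = 8.46214

# Small-sample correction factor RHO
rho = 1.0 - (sum(1.0 / d for d in N_i) - 1.0 / N) \
    * (2 * P**2 + 3 * P - 1) / (6.0 * (P + 1) * (K - 1))

df = 0.5 * (K - 1) * P * (P + 1)

# -2 RHO ln[...] rewritten in terms of covariance log determinants
chi_square = rho * (N * ln_det_pooled
                    - sum(d * l for d, l in zip(N_i, ln_det)))

print(round(chi_square, 3), df)
```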
Pairwise Squared Distances Between Groups

\[ D^2(i|j) = (\bar{X}_i - \bar{X}_j)' \, COV_j^{-1} \, (\bar{X}_i - \bar{X}_j) \]

Squared Distance to Species

From Species        Setosa    Versicolor     Virginica
Setosa                   0     103.19382     168.76759
Versicolor       323.06203             0      13.83875
Virginica        706.08494      17.86670             0

Pairwise Generalized Squared Distances Between Groups

\[ D^2(i|j) = (\bar{X}_i - \bar{X}_j)' \, COV_j^{-1} \, (\bar{X}_i - \bar{X}_j) + \ln |COV_j| \]

Generalized Squared Distance to Species

From Species        Setosa    Versicolor     Virginica
Setosa             5.35332     110.74017     178.26121
Versicolor       328.41535       7.54636      23.33238
Virginica        711.43826      25.41306       9.49362
Univariate Test Statistics

F Statistics, Num DF=2, Den DF=147

                                       Total      Pooled     Between
                                    Standard    Standard    Standard                R-Square
Variable      Label                Deviation   Deviation   Deviation   R-Square   / (1-RSq)   F Value   Pr > F
SepalLength   Sepal Length in mm.     8.2807      5.1479      7.9506     0.6187      1.6226    119.26   <.0001
SepalWidth    Sepal Width in mm.      4.3587      3.3969      3.3682     0.4008      0.6688     49.16   <.0001
PetalLength   Petal Length in mm.    17.6530      4.3033     20.9070     0.9414     16.0566   1180.16   <.0001
PetalWidth    Petal Width in mm.      7.6224      2.0465      8.9673     0.9289     13.0613    960.01   <.0001

Average R-Square
Unweighted              0.7224358
Weighted by Variance    0.8689444

Multivariate Statistics and F Approximations

S=2   M=0.5   N=71

Statistic                       Value    F Value   Num DF   Den DF   Pr > F
Wilks' Lambda              0.02343863     199.15        8      288   <.0001
Pillai's Trace             1.19189883      53.47        8      290   <.0001
Hotelling-Lawley Trace    32.47732024     582.20        8    203.4   <.0001
Roy's Greatest Root       32.19192920    1166.96        4      145   <.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
Classification Results for Calibration Data: WORK.IRIS
Resubstitution Results using Quadratic Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_j)' \, COV_j^{-1} \, (X - \bar{X}_j) + \ln |COV_j| \]

Posterior Probability of Membership in Each Species

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

                                          Posterior Probability of Membership in Species
       From          Classified
Obs    Species       into Species         Setosa    Versicolor    Virginica
  5    Virginica     Versicolor  *        0.0000        0.6050       0.3950
  9    Versicolor    Virginica   *        0.0000        0.3359       0.6641
 12    Versicolor    Virginica   *        0.0000        0.1543       0.8457

* Misclassified observation
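The posterior probability formula in this output is a normalized exponential of the generalized squared distances. The Python sketch below is not part of the SAS example, and the function name `posterior` is hypothetical. As input it uses the "From Versicolor" row of generalized squared distances from Output 25.3.4, which, under equal priors, are the distances from the Versicolor mean vector to each species.

```python
import math

def posterior(d2):
    """Posterior probabilities from generalized squared distances D_j^2(X)."""
    # Subtract the smallest distance first so that the exponentials
    # cannot all underflow to zero when the distances are large.
    base = min(d2.values())
    weights = {k: math.exp(-0.5 * (v - base)) for k, v in d2.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Generalized squared distances from the Versicolor mean to each species
d2_versicolor_mean = {
    "Setosa": 328.41535,
    "Versicolor": 7.54636,
    "Virginica": 23.33238,
}
probs = posterior(d2_versicolor_mean)
print(max(probs, key=probs.get))
```

A point located at the Versicolor mean is assigned to Versicolor with posterior probability very close to 1; the misclassified observations listed above lie where the Versicolor and Virginica distances are much closer to each other.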
Classification Summary for Calibration Data: WORK.IRIS
Resubstitution Summary using Quadratic Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_j)' \, COV_j^{-1} \, (X - \bar{X}_j) + \ln |COV_j| \]

Posterior Probability of Membership in Each Species

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

Number of Observations and Percent Classified into Species

From Species      Setosa    Versicolor    Virginica     Total
Setosa                50             0            0        50
                  100.00          0.00         0.00    100.00
Versicolor             0            48            2        50
                    0.00         96.00         4.00    100.00
Virginica              0             1           49        50
                    0.00          2.00        98.00    100.00
Total                 50            49           51       150
                   33.33         32.67        34.00    100.00
Priors           0.33333       0.33333      0.33333

Error Count Estimates for Species

                  Setosa    Versicolor    Virginica     Total
Rate              0.0000        0.0400       0.0200    0.0200
Priors            0.3333        0.3333       0.3333
Classification Results for Calibration Data: WORK.IRIS
Cross validation Results using Quadratic Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_{(X)j})' \, COV_{(X)j}^{-1} \, (X - \bar{X}_{(X)j}) + \ln |COV_{(X)j}| \]

Posterior Probability of Membership in Each Species

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

                                          Posterior Probability of Membership in Species
       From          Classified
Obs    Species       into Species         Setosa    Versicolor    Virginica
  5    Virginica     Versicolor  *        0.0000        0.6632       0.3368
  8    Versicolor    Virginica   *        0.0000        0.3134       0.6866
  9    Versicolor    Virginica   *        0.0000        0.1616       0.8384
 12    Versicolor    Virginica   *        0.0000        0.0713       0.9287

* Misclassified observation
Classification Summary for Calibration Data: WORK.IRIS
Cross validation Summary using Quadratic Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_{(X)j})' \, COV_{(X)j}^{-1} \, (X - \bar{X}_{(X)j}) + \ln |COV_{(X)j}| \]

Posterior Probability of Membership in Each Species

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

Number of Observations and Percent Classified into Species

From Species      Setosa    Versicolor    Virginica     Total
Setosa                50             0            0        50
                  100.00          0.00         0.00    100.00
Versicolor             0            47            3        50
                    0.00         94.00         6.00    100.00
Virginica              0             1           49        50
                    0.00          2.00        98.00    100.00
Total                 50            48           52       150
                   33.33         32.00        34.67    100.00
Priors           0.33333       0.33333      0.33333

Error Count Estimates for Species

                  Setosa    Versicolor    Virginica     Total
Rate              0.0000        0.0600       0.0200    0.0267
Priors            0.3333        0.3333       0.3333
Discriminant Analysis of Fisher (1936) Iris Data Output Discriminant Statistics Sepal Sepal Petal Petal Obs Species _TYPE_ _NAME_ Length Width Length Width 1 . N 150.00 150.00 150.00 150.00 2 Setosa N 50.00 50.00 50.00 50.00 3 Versicolor N 50.00 50.00 50.00 50.00 4 Virginica N 50.00 50.00 50.00 50.00 5 . MEAN 58.43 30.57 37.58 11.99 6 Setosa MEAN 50.06 34.28 14.62 2.46 7 Versicolor MEAN 59.36 27.70 42.60 13.26 8 Virginica MEAN 65.88 29.74 55.52 20.26 9 Setosa PRIOR 0.33 0.33 0.33 0.33 10 Versicolor PRIOR 0.33 0.33 0.33 0.33 11 Virginica PRIOR 0.33 0.33 0.33 0.33 12 Setosa CSSCP SepalLength 608.82 486.16 80.14 50.62 13 Setosa CSSCP SepalWidth 486.16 704.08 57.32 45.56 14 Setosa CSSCP PetalLength 80.14 57.32 147.78 29.74 15 Setosa CSSCP PetalWidth 50.62 45.56 29.74 54.42 16 Versicolor CSSCP SepalLength 1305.52 417.40 896.20 273.32 17 Versicolor CSSCP SepalWidth 417.40 482.50 405.00 201.90 18 Versicolor CSSCP PetalLength 896.20 405.00 1082.00 358.20 19 Versicolor CSSCP PetalWidth 273.32 201.90 358.20 191.62 20 Virginica CSSCP SepalLength 1981.28 459.44 1486.12 240.56 21 Virginica CSSCP SepalWidth 459.44 509.62 349.76 233.38 22 Virginica CSSCP PetalLength 1486.12 349.76 1492.48 239.24 23 Virginica CSSCP PetalWidth 240.56 233.38 239.24 369.62 24 . PSSCP SepalLength 3895.62 1363.00 2462.46 564.50 25 . PSSCP SepalWidth 1363.00 1696.20 812.08 480.84 26 . PSSCP PetalLength 2462.46 812.08 2722.26 627.18 27 . PSSCP PetalWidth 564.50 480.84 627.18 615.66 28 . BSSCP SepalLength 6321.21 1995.27 16524.84 7127.93 29 . BSSCP SepalWidth 1995.27 1134.49 5723.96 2293.27 30 . BSSCP PetalLength 16524.84 5723.96 43710.28 18677.40 31 . BSSCP PetalWidth 7127.93 2293.27 18677.40 8041.33 32 . CSSCP SepalLength 10216.83 632.27 18987.30 7692.43 33 . CSSCP SepalWidth 632.27 2830.69 4911.88 1812.43 34 . CSSCP PetalLength 18987.30 4911.88 46432.54 19304.58 35 . CSSCP PetalWidth 7692.43 1812.43 19304.58 8656.99 36 . 
RSQUARED 0.62 0.40 0.94 0.93 37 Setosa COV SepalLength 12.42 9.92 1.64 1.03 38 Setosa COV SepalWidth 9.92 14.37 1.17 0.93 39 Setosa COV PetalLength 1.64 1.17 3.02 0.61 40 Setosa COV PetalWidth 1.03 0.93 0.61 1.11 41 Versicolor COV SepalLength 26.64 8.52 18.29 5.58 42 Versicolor COV SepalWidth 8.52 9.85 8.27 4.12 43 Versicolor COV PetalLength 18.29 8.27 22.08 7.31 44 Versicolor COV PetalWidth 5.58 4.12 7.31 3.91 45 Virginica COV SepalLength 40.43 9.38 30.33 4.91 46 Virginica COV SepalWidth 9.38 10.40 7.14 4.76 47 Virginica COV PetalLength 30.33 7.14 30.46 4.88 48 Virginica COV PetalWidth 4.91 4.76 4.88 7.54 49 . PCOV SepalLength 26.50 9.27 16.75 3.84 50 . PCOV SepalWidth 9.27 11.54 5.52 3.27 51 . PCOV PetalLength 16.75 5.52 18.52 4.27 52 . PCOV PetalWidth 3.84 3.27 4.27 4.19 53 . BCOV SepalLength 63.21 19.95 165.25 71.28 54 . BCOV SepalWidth 19.95 11.34 57.24 22.93 55 . BCOV PetalLength 165.25 57.24 437.10 186.77 56 . BCOV PetalWidth 71.28 22.93 186.77 80.41 57 . COV SepalLength 68.57 4.24 127.43 51.63 58 . COV SepalWidth 4.24 19.00 32.97 12.16 59 . COV PetalLength 127.43 32.97 311.63 129.56 60 . COV PetalWidth 51.63 12.16 129.56 58.10 61 Setosa STD 3.52 3.79 1.74 1.05 62 Versicolor STD 5.16 3.14 4.70 1.98 63 Virginica STD 6.36 3.22 5.52 2.75 64 . PSTD 5.15 3.40 4.30 2.05 65 . BSTD 7.95 3.37 20.91 8.97 66 . STD 8.28 4.36 17.65 7.62 67 Setosa CORR SepalLength 1.00 0.74 0.27 0.28 68 Setosa CORR SepalWidth 0.74 1.00 0.18 0.23 69 Setosa CORR PetalLength 0.27 0.18 1.00 0.33 70 Setosa CORR PetalWidth 0.28 0.23 0.33 1.00
Discriminant Analysis of Fisher (1936) Iris Data Output Discriminant Statistics Sepal Sepal Petal Petal Obs Species _TYPE_ _NAME_ Length Width Length Width 71 Versicolor CORR SepalLength 1.000 0.526 0.754 0.546 72 Versicolor CORR SepalWidth 0.526 1.000 0.561 0.664 73 Versicolor CORR PetalLength 0.754 0.561 1.000 0.787 74 Versicolor CORR PetalWidth 0.546 0.664 0.787 1.000 75 Virginica CORR SepalLength 1.000 0.457 0.864 0.281 76 Virginica CORR SepalWidth 0.457 1.000 0.401 0.538 77 Virginica CORR PetalLength 0.864 0.401 1.000 0.322 78 Virginica CORR PetalWidth 0.281 0.538 0.322 1.000 79 . PCORR SepalLength 1.000 0.530 0.756 0.365 80 . PCORR SepalWidth 0.530 1.000 0.378 0.471 81 . PCORR PetalLength 0.756 0.378 1.000 0.484 82 . PCORR PetalWidth 0.365 0.471 0.484 1.000 83 . BCORR SepalLength 1.000 0.745 0.994 1.000 84 . BCORR SepalWidth 0.745 1.000 0.813 0.759 85 . BCORR PetalLength 0.994 0.813 1.000 0.996 86 . BCORR PetalWidth 1.000 0.759 0.996 1.000 87 . CORR SepalLength 1.000 0.118 0.872 0.818 88 . CORR SepalWidth 0.118 1.000 0.428 0.366 89 . CORR PetalLength 0.872 0.428 1.000 0.963 90 . CORR PetalWidth 0.818 0.366 0.963 1.000 91 Setosa STDMEAN 1.011 0.850 1.301 1.251 92 Versicolor STDMEAN 0.112 0.659 0.284 0.166 93 Virginica STDMEAN 0.899 0.191 1.016 1.085 94 Setosa PSTDMEAN 1.627 1.091 5.335 4.658 95 Versicolor PSTDMEAN 0.180 0.846 1.167 0.619 96 Virginica PSTDMEAN 1.447 0.245 4.169 4.039 97 . 
LNDETERM 8.462 8.462 8.462 8.462 98 Setosa LNDETERM 5.353 5.353 5.353 5.353 99 Versicolor LNDETERM 7.546 7.546 7.546 7.546 100 Virginica LNDETERM 9.494 9.494 9.494 9.494 101 Setosa QUAD SepalLength 0.095 0.062 0.023 0.024 102 Setosa QUAD SepalWidth 0.062 0.078 0.006 0.011 103 Setosa QUAD PetalLength 0.023 0.006 0.194 0.090 104 Setosa QUAD PetalWidth 0.024 0.011 0.090 0.530 105 Setosa QUAD _LINEAR_ 4.455 0.762 3.356 3.126 106 Setosa QUAD _CONST_ 121.826 121.826 121.826 121.826 107 Versicolor QUAD SepalLength 0.048 0.018 0.043 0.032 108 Versicolor QUAD SepalWidth 0.018 0.099 0.011 0.097 109 Versicolor QUAD PetalLength 0.043 0.011 0.099 0.135 110 Versicolor QUAD PetalWidth 0.032 0.097 0.135 0.436 111 Versicolor QUAD _LINEAR_ 1.801 1.596 0.327 1.471 112 Versicolor QUAD _CONST_ 76.549 76.549 76.549 76.549 113 Virginica QUAD SepalLength 0.053 0.017 0.050 0.009 114 Virginica QUAD SepalWidth 0.017 0.079 0.006 0.042 115 Virginica QUAD PetalLength 0.050 0.006 0.067 0.014 116 Virginica QUAD PetalWidth 0.009 0.042 0.014 0.097 117 Virginica QUAD _LINEAR_ 0.737 1.325 0.623 0.966 118 Virginica QUAD _CONST_ 75.821 75.821 75.821 75.821
In this example, the remote-sensing data described at the beginning of the section are used. In the first PROC DISCRIM statement, the DISCRIM procedure uses normal-theory methods (METHOD=NORMAL) assuming equal variances (POOL=YES) in five crops. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The LIST option lists the resubstitution classification results for each observation (Output 25.4.2). The CROSSVALIDATE option displays cross validation error-rate estimates (Output 25.4.3). The OUTSTAT= option stores the calibration information in a new data set to classify future observations. A second PROC DISCRIM statement uses this calibration information to classify a test data set. Note that the values of the identification variable, xvalues, are obtained by rereading the x1 through x4 fields in the data lines as a single character variable. The following statements produce Output 25.4.1 through Output 25.4.3.
Discriminant Analysis of Remote Sensing Data on Five Crops
Using Linear Discriminant Function

The DISCRIM Procedure

Observations     36        DF Total                35
Variables         4        DF Within Classes       31
Classes           5        DF Between Classes       4

Class Level Information

              Variable                                            Prior
Crop          Name          Frequency     Weight    Proportion    Probability
Clover        Clover               11    11.0000      0.305556       0.305556
Corn          Corn                  7     7.0000      0.194444       0.194444
Cotton        Cotton                6     6.0000      0.166667       0.166667
Soybeans      Soybeans              6     6.0000      0.166667       0.166667
Sugarbeets    Sugarbeets            6     6.0000      0.166667       0.166667
Pooled Covariance Matrix Information

Covariance     Natural Log of the Determinant
Matrix Rank    of the Covariance Matrix
          4                  21.30189
Pairwise Generalized Squared Distances Between Groups

\[ D^2(i|j) = (\bar{X}_i - \bar{X}_j)' \, COV^{-1} \, (\bar{X}_i - \bar{X}_j) - 2 \ln PRIOR_j \]

Generalized Squared Distance to Crop

From Crop        Clover       Corn     Cotton   Soybeans   Sugarbeets
Clover          2.37125    7.52830    4.44969    6.16665      5.07262
Corn            6.62433    3.27522    5.46798    4.31383      6.47395
Cotton          3.23741    5.15968    3.58352    5.01819      4.87908
Soybeans        4.95438    4.00552    5.01819    3.58352      4.65998
Sugarbeets      3.86034    6.16564    4.87908    4.65998      3.58352
Linear Discriminant Function

\[ \text{Constant} = -.5 \, \bar{X}_j' \, COV^{-1} \, \bar{X}_j + \ln PRIOR_j \qquad \text{Coefficient Vector} = COV^{-1} \, \bar{X}_j \]

Linear Discriminant Function for Crop

Variable        Clover        Corn      Cotton    Soybeans   Sugarbeets
Constant     -10.98457    -7.72070   -11.46537    -7.28260     -9.80179
x1             0.08907    -0.04180     0.02462   0.0000369      0.04245
x2             0.17379     0.11970     0.17596     0.15896      0.20988
x3             0.11899     0.16511     0.15880     0.10622      0.06540
x4             0.15637     0.16768     0.18362     0.14133      0.16408
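To see how the linear discriminant function is used, the sketch below scores one observation and converts the scores to posterior probabilities. It is written in Python and is not part of the SAS example; the names `ldf` and `classify` are hypothetical. It assumes that all five constants and the Corn coefficient for x1 are negative, an assumption chosen because it reproduces the posterior probabilities listed for the first Corn observation (x1 through x4 = 16, 27, 31, 33); the sign of the very small Soybeans x1 coefficient cannot be confirmed this way.

```python
import math

# Constant and x1..x4 coefficients per crop.
# Assumption: constants and the Corn x1 coefficient carry a minus sign.
ldf = {
    "Clover":     (-10.98457, [0.08907, 0.17379, 0.11899, 0.15637]),
    "Corn":       (-7.72070,  [-0.04180, 0.11970, 0.16511, 0.16768]),
    "Cotton":     (-11.46537, [0.02462, 0.17596, 0.15880, 0.18362]),
    "Soybeans":   (-7.28260,  [0.0000369, 0.15896, 0.10622, 0.14133]),
    "Sugarbeets": (-9.80179,  [0.04245, 0.20988, 0.06540, 0.16408]),
}

def classify(x):
    """Return the winning crop and the posterior probabilities for x."""
    scores = {c: const + sum(b * v for b, v in zip(coef, x))
              for c, (const, coef) in ldf.items()}
    top = max(scores.values())               # stabilize the exponentials
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    post = {c: w / total for c, w in weights.items()}
    return max(post, key=post.get), post

label, post = classify([16, 27, 31, 33])
print(label, round(post["Corn"], 4))
```

The linear score equals -.5 times the generalized squared distance plus a term that is the same for every crop, so normalizing the exponentiated scores gives the same posterior probabilities as the distance-based formula.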
Classification Results for Calibration Data: WORK.CROPS
Resubstitution Results using Linear Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_j)' \, COV^{-1} \, (X - \bar{X}_j) - 2 \ln PRIOR_j \]

Posterior Probability of Membership in Each Crop

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

                                              Posterior Probability of Membership in Crop
                             Classified
xvalues        From Crop     into Crop        Clover     Corn   Cotton  Soybeans  Sugarbeets
16 27 31 33    Corn          Corn             0.0894   0.4054   0.1763    0.2392      0.0897
15 23 30 30    Corn          Corn             0.0769   0.4558   0.1421    0.2530      0.0722
16 27 27 26    Corn          Corn             0.0982   0.3422   0.1365    0.3073      0.1157
18 20 25 23    Corn          Corn             0.1052   0.3634   0.1078    0.3281      0.0955
15 15 31 32    Corn          Corn             0.0588   0.5754   0.1173    0.2087      0.0398
15 32 32 15    Corn          Soybeans   *     0.0972   0.3278   0.1318    0.3420      0.1011
12 15 16 73    Corn          Corn             0.0454   0.5238   0.1849    0.1376      0.1083
20 23 23 25    Soybeans      Soybeans         0.1330   0.2804   0.1176    0.3305      0.1385
24 24 25 32    Soybeans      Soybeans         0.1768   0.2483   0.1586    0.2660      0.1502
21 25 23 24    Soybeans      Soybeans         0.1481   0.2431   0.1200    0.3318      0.1570
27 45 24 12    Soybeans      Sugarbeets *     0.2357   0.0547   0.1016    0.2721      0.3359
12 13 15 42    Soybeans      Corn       *     0.0549   0.4749   0.0920    0.2768      0.1013
22 32 31 43    Soybeans      Cotton     *     0.1474   0.2606   0.2624    0.1848      0.1448
31 32 33 34    Cotton        Clover     *     0.2815   0.1518   0.2377    0.1767      0.1523
29 24 26 28    Cotton        Soybeans   *     0.2521   0.1842   0.1529    0.2549      0.1559
34 32 28 45    Cotton        Clover     *     0.3125   0.1023   0.2404    0.1357      0.2091
26 25 23 24    Cotton        Soybeans   *     0.2121   0.1809   0.1245    0.3045      0.1780
53 48 75 26    Cotton        Clover     *     0.4837   0.0391   0.4384    0.0223      0.0166
34 35 25 78    Cotton        Cotton           0.2256   0.0794   0.3810    0.0592      0.2548
22 23 25 42    Sugarbeets    Corn       *     0.1421   0.3066   0.1901    0.2231      0.1381
25 25 24 26    Sugarbeets    Soybeans   *     0.1969   0.2050   0.1354    0.2960      0.1667
34 25 16 52    Sugarbeets    Sugarbeets       0.2928   0.0871   0.1665    0.1479      0.3056
54 23 21 54    Sugarbeets    Clover     *     0.6215   0.0194   0.1250    0.0496      0.1845
25 43 32 15    Sugarbeets    Soybeans   *     0.2258   0.1135   0.1646    0.2770      0.2191
26 54 2 54     Sugarbeets    Sugarbeets       0.0850   0.0081   0.0521    0.0661      0.7887
12 45 32 54    Clover        Cotton     *     0.0693   0.2663   0.3394    0.1460      0.1789
24 58 25 34    Clover        Sugarbeets *     0.1647   0.0376   0.1680    0.1452      0.4845
87 54 61 21    Clover        Clover           0.9328   0.0003   0.0478    0.0025      0.0165
51 31 31 16    Clover        Clover           0.6642   0.0205   0.0872    0.0959      0.1322
96 48 54 62    Clover        Clover           0.9215   0.0002   0.0604    0.0007      0.0173
31 31 11 11    Clover        Sugarbeets *     0.2525   0.0402   0.0473    0.3012      0.3588
56 13 13 71    Clover        Clover           0.6132   0.0212   0.1226    0.0408      0.2023
32 13 27 32    Clover        Clover           0.2669   0.2616   0.1512    0.2260      0.0943
36 26 54 32    Clover        Cotton     *     0.2650   0.2645   0.3495    0.0918      0.0292
53 08 06 54    Clover        Clover           0.5914   0.0237   0.0676    0.0781      0.2392
32 32 62 16    Clover        Cotton     *     0.2163   0.3180   0.3327    0.1125      0.0206

* Misclassified observation
Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Linear Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_j)' \, COV^{-1} \, (X - \bar{X}_j) - 2 \ln PRIOR_j \]

Posterior Probability of Membership in Each Crop

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

Number of Observations and Percent Classified into Crop

From Crop       Clover      Corn    Cotton  Soybeans  Sugarbeets     Total
Clover               6         0         3         0           2        11
                 54.55      0.00     27.27      0.00       18.18    100.00
Corn                 0         6         0         1           0         7
                  0.00     85.71      0.00     14.29        0.00    100.00
Cotton               3         0         1         2           0         6
                 50.00      0.00     16.67     33.33        0.00    100.00
Soybeans             0         1         1         3           1         6
                  0.00     16.67     16.67     50.00       16.67    100.00
Sugarbeets           1         1         0         2           2         6
                 16.67     16.67      0.00     33.33       33.33    100.00
Total               10         8         5         8           5        36
                 27.78     22.22     13.89     22.22       13.89    100.00
Priors         0.30556   0.19444   0.16667   0.16667     0.16667

Error Count Estimates for Crop

                Clover      Corn    Cotton  Soybeans  Sugarbeets     Total
Rate            0.4545    0.1429    0.8333    0.5000      0.6667    0.5000
Priors          0.3056    0.1944    0.1667    0.1667      0.1667
Classification Summary for Calibration Data: WORK.CROPS
Cross validation Summary using Linear Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_{(X)j})' \, COV_{(X)}^{-1} \, (X - \bar{X}_{(X)j}) - 2 \ln PRIOR_j \]

Posterior Probability of Membership in Each Crop

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

Number of Observations and Percent Classified into Crop

From Crop       Clover      Corn    Cotton  Soybeans  Sugarbeets     Total
Clover               4         3         1         0           3        11
                 36.36     27.27      9.09      0.00       27.27    100.00
Corn                 0         4         1         2           0         7
                  0.00     57.14     14.29     28.57        0.00    100.00
Cotton               3         0         0         2           1         6
                 50.00      0.00      0.00     33.33       16.67    100.00
Soybeans             0         1         1         3           1         6
                  0.00     16.67     16.67     50.00       16.67    100.00
Sugarbeets           2         1         0         2           1         6
                 33.33     16.67      0.00     33.33       16.67    100.00
Total                9         9         3         9           6        36
                 25.00     25.00      8.33     25.00       16.67    100.00
Priors         0.30556   0.19444   0.16667   0.16667     0.16667

Error Count Estimates for Crop

                Clover      Corn    Cotton  Soybeans  Sugarbeets     Total
Rate            0.6364    0.4286    1.0000    0.5000      0.8333    0.6667
Priors          0.3056    0.1944    0.1667    0.1667      0.1667
data crops;
   title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
   input Crop $ 4-13 x1-x4 xvalues $ 14-24;
   datalines;
   Corn      16 27 31 33
   Corn      15 23 30 30
   Corn      16 27 27 26
   Corn      18 20 25 23
   Corn      15 15 31 32
   Corn      15 32 32 15
   Corn      12 15 16 73
   Soybeans  20 23 23 25
   Soybeans  24 24 25 32
   Soybeans  21 25 23 24
   Soybeans  27 45 24 12
   Soybeans  12 13 15 42
   Soybeans  22 32 31 43
   Cotton    31 32 33 34
   Cotton    29 24 26 28
   Cotton    34 32 28 45
   Cotton    26 25 23 24
   Cotton    53 48 75 26
   Cotton    34 35 25 78
   Sugarbeets22 23 25 42
   Sugarbeets25 25 24 26
   Sugarbeets34 25 16 52
   Sugarbeets54 23 21 54
   Sugarbeets25 43 32 15
   Sugarbeets26 54 2 54
   Clover    12 45 32 54
   Clover    24 58 25 34
   Clover    87 54 61 21
   Clover    51 31 31 16
   Clover    96 48 54 62
   Clover    31 31 11 11
   Clover    56 13 13 71
   Clover    32 13 27 32
   Clover    36 26 54 32
   Clover    53 08 06 54
   Clover    32 32 62 16
;

proc discrim data=crops outstat=cropstat
             method=normal pool=yes
             list crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
   title2 'Using Linear Discriminant Function';
run;
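The INPUT statement reads the same columns twice: once as four numbers and once as a single character string. The Python sketch below imitates that idea outside SAS. It is not part of the example, the function name `read_crop_line` is hypothetical, and it assumes each data line is indented three spaces so that the crop name occupies columns 4 through 13 and the four measurements occupy columns 14 through 24.

```python
def read_crop_line(line):
    """Imitate: input Crop $ 4-13 x1-x4 xvalues $ 14-24;"""
    crop = line[3:13].strip()                  # columns 4-13
    xvalues = line[13:24].strip()              # columns 14-24 as one string
    x = [float(v) for v in xvalues.split()]    # same field read as four numbers
    return crop, x, xvalues

crop, x, xvalues = read_crop_line("   Sugarbeets22 23 25 42")
print(crop, x, repr(xvalues))
```

Keeping the measurements as one string gives each observation a readable identifier, which is how xvalues is used by the ID statement.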
Now use the calibration information stored in the Cropstat data set to classify a test data set. The TESTLIST option lists the classification results for each observation in the test data set. The following statements produce Output 25.4.4 and Output 25.4.5:
data test;
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Soybeans  21 25 23 24
Cotton    29 24 26 28
Sugarbeets54 23 21 54
Clover    32 32 62 16
;

proc discrim data=cropstat testdata=test testout=tout testlist;
   class Crop;
   testid xvalues;
   var x1-x4;
   title2 'Classification of Test Data';
run;

proc print data=tout;
   title2 'Output Classification Results of Test Data';
run;
Discriminant Analysis of Remote Sensing Data on Five Crops
Classification of Test Data

The DISCRIM Procedure
Classification Results for Test Data: WORK.TEST
Classification Results using Linear Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_j)' \, COV^{-1} \, (X - \bar{X}_j) \]

Posterior Probability of Membership in Each Crop

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

                                              Posterior Probability of Membership in Crop
                             Classified
xvalues        From Crop     into Crop        Clover     Corn   Cotton  Soybeans  Sugarbeets
16 27 31 33    Corn          Corn             0.0894   0.4054   0.1763    0.2392      0.0897
21 25 23 24    Soybeans      Soybeans         0.1481   0.2431   0.1200    0.3318      0.1570
29 24 26 28    Cotton        Soybeans   *     0.2521   0.1842   0.1529    0.2549      0.1559
54 23 21 54    Sugarbeets    Clover     *     0.6215   0.0194   0.1250    0.0496      0.1845
32 32 62 16    Clover        Cotton     *     0.2163   0.3180   0.3327    0.1125      0.0206

* Misclassified observation
Classification Summary for Test Data: WORK.TEST
Classification Summary using Linear Discriminant Function

Generalized Squared Distance Function

\[ D_j^2(X) = (X - \bar{X}_j)' \, COV^{-1} \, (X - \bar{X}_j) \]

Posterior Probability of Membership in Each Crop

\[ \Pr(j|X) = \exp(-.5 \, D_j^2(X)) \, / \, \sum_k \exp(-.5 \, D_k^2(X)) \]

Number of Observations and Percent Classified into Crop

From Crop       Clover      Corn    Cotton  Soybeans  Sugarbeets     Total
Clover               0         0         1         0           0         1
                  0.00      0.00    100.00      0.00        0.00    100.00
Corn                 0         1         0         0           0         1
                  0.00    100.00      0.00      0.00        0.00    100.00
Cotton               0         0         0         1           0         1
                  0.00      0.00      0.00    100.00        0.00    100.00
Soybeans             0         0         0         1           0         1
                  0.00      0.00      0.00    100.00        0.00    100.00
Sugarbeets           1         0         0         0           0         1
                100.00      0.00      0.00      0.00        0.00    100.00
Total                1         1         1         2           0         5
                 20.00     20.00     20.00     40.00        0.00    100.00
Priors         0.30556   0.19444   0.16667   0.16667     0.16667

Error Count Estimates for Crop

                Clover      Corn    Cotton  Soybeans  Sugarbeets     Total
Rate            1.0000    0.0000    1.0000    0.0000      1.0000    0.6389
Priors          0.3056    0.1944    0.1667    0.1667      0.1667
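The total error rate of 0.6389 in this summary is not the raw fraction of misclassified test observations (3 of 5). The numbers are consistent with a prior-weighted sum of the per-crop rates, as the Python sketch below illustrates. It is not part of the SAS example; the function name `total_error` is hypothetical, and the priors are taken as the calibration proportions 11/36, 7/36, and 6/36.

```python
# Priors proportional to the calibration sample sizes (36 observations)
priors = {"Clover": 11 / 36, "Corn": 7 / 36, "Cotton": 6 / 36,
          "Soybeans": 6 / 36, "Sugarbeets": 6 / 36}

# (misclassified, total) per crop in the five-observation test data set
test_counts = {"Clover": (1, 1), "Corn": (0, 1), "Cotton": (1, 1),
               "Soybeans": (0, 1), "Sugarbeets": (1, 1)}

def total_error(priors, counts):
    """Sum of prior probability times the class-specific error rate."""
    return sum(priors[c] * wrong / n for c, (wrong, n) in counts.items())

weighted = total_error(priors, test_counts)
raw = (sum(w for w, _ in test_counts.values())
       / sum(n for _, n in test_counts.values()))
print(round(weighted, 4), raw)
```

When the priors are proportional to the sample sizes of the data being classified, as in the calibration summaries, the weighted total reduces to the overall fraction misclassified; here the test set has one observation per crop, so the two figures differ.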
Discriminant Analysis of Remote Sensing Data on Five Crops
Output Classification Results of Test Data

Obs  Crop        x1  x2  x3  x4  xvalues       Clover     Corn   Cotton  Soybeans  Sugarbeets  _INTO_
  1  Corn        16  27  31  33  16 27 31 33  0.08935  0.40543  0.17632   0.23918     0.08972  Corn
  2  Soybeans    21  25  23  24  21 25 23 24  0.14811  0.24308  0.11999   0.33184     0.15698  Soybeans
  3  Cotton      29  24  26  28  29 24 26 28  0.25213  0.18420  0.15294   0.25486     0.15588  Soybeans
  4  Sugarbeets  54  23  21  54  54 23 21 54  0.62150  0.01937  0.12498   0.04962     0.18452  Clover
  5  Clover      32  32  62  16  32 32 62 16  0.21633  0.31799  0.33266   0.11246     0.02056  Cotton
In this example, PROC DISCRIM uses normal-theory methods (METHOD=NORMAL) assuming unequal variances (POOL=NO) for the remote-sensing data of Example 25.4. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The CROSSVALIDATE option displays cross validation error-rate estimates. Note that the total error count estimate by cross validation (0.5556) is much larger than the total error count estimate by resubstitution (0.1111). The following statements produce Output 25.5.1:
proc discrim data=crops method=normal pool=no crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
   title2 'Using Quadratic Discriminant Function';
run;
Discriminant Analysis of Remote Sensing Data on Five Crops
Using Quadratic Discriminant Function

The DISCRIM Procedure

Observations     36        DF Total                35
Variables         4        DF Within Classes       31
Classes           5        DF Between Classes       4

Class Level Information

              Variable                                            Prior
Crop          Name          Frequency     Weight    Proportion    Probability
Clover        Clover               11    11.0000      0.305556       0.305556
Corn          Corn                  7     7.0000      0.194444       0.194444
Cotton        Cotton                6     6.0000      0.166667       0.166667
Soybeans      Soybeans              6     6.0000      0.166667       0.166667
Sugarbeets    Sugarbeets            6     6.0000      0.166667       0.166667
Within Covariance Matrix Information

              Covariance     Natural Log of the Determinant
Crop          Matrix Rank    of the Covariance Matrix
Clover                  4                  23.64618
Corn                    4                  11.13472
Cotton                  4                  13.23569
Soybeans                4                  12.45263
Sugarbeets              4                  17.76293
Pairwise Generalized Squared Distances Between Groups

\[ D^2(i|j) = (\bar{X}_i - \bar{X}_j)' \, COV_j^{-1} \, (\bar{X}_i - \bar{X}_j) + \ln |COV_j| - 2 \ln PRIOR_j \]

Generalized Squared Distance to Crop

From Crop         Clover         Corn       Cotton     Soybeans   Sugarbeets
Clover          26.01743         1320    104.18297    194.10546     31.40816
Corn            27.73809     14.40994    150.50763     38.36252     25.55421
Cotton          26.38544    588.86232     16.81921     52.03266     37.15560
Soybeans        27.07134     46.42131     41.01631     16.03615     23.15920
Sugarbeets      26.80188    332.11563     43.98280    107.95676     21.34645
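The diagonal of this table offers a simple check on the distance formula: when i = j the quadratic term vanishes, leaving ln|COV_j| - 2 ln PRIOR_j. The Python sketch below, which is not part of the SAS example and uses hypothetical variable names, recomputes the diagonal from the log determinants in the Within Covariance Matrix Information table and from priors proportional to the class frequencies.

```python
import math

counts = {"Clover": 11, "Corn": 7, "Cotton": 6,
          "Soybeans": 6, "Sugarbeets": 6}
ln_det = {"Clover": 23.64618, "Corn": 11.13472, "Cotton": 13.23569,
          "Soybeans": 12.45263, "Sugarbeets": 17.76293}
n = sum(counts.values())

# With i = j the Mahalanobis term is zero, so only the log determinant
# and the prior penalty remain.
diagonal = {c: ln_det[c] - 2.0 * math.log(counts[c] / n) for c in counts}

for crop, value in diagonal.items():
    print(f"{crop:<11}{value:10.5f}")
```

The large log determinant for Clover inflates every distance to Clover by a constant, which is one reason the off-diagonal entries in the Clover column are all close to its diagonal value.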
Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Quadratic Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_j)' \, COV_j^{-1} \, (X - \bar{X}_j) + \ln|COV_j| - 2 \ln PRIOR_j$$

Posterior Probability of Membership in Each Crop

$$\Pr(j|X) = \exp(-.5 \, D_j^2(X)) \,/\, \sum_k \exp(-.5 \, D_k^2(X))$$

Number of Observations and Percent Classified into Crop

From Crop      Clover     Corn   Cotton   Soybeans   Sugarbeets    Total
Clover              9        0        0          0            2       11
                81.82     0.00     0.00       0.00        18.18   100.00
Corn                0        7        0          0            0        7
                 0.00   100.00     0.00       0.00         0.00   100.00
Cotton              0        0        6          0            0        6
                 0.00     0.00   100.00       0.00         0.00   100.00
Soybeans            0        0        0          6            0        6
                 0.00     0.00     0.00     100.00         0.00   100.00
Sugarbeets          0        0        1          1            4        6
                 0.00     0.00    16.67      16.67        66.67   100.00
Total               9        7        7          7            6       36
                25.00    19.44    19.44      19.44        16.67   100.00
Priors        0.30556  0.19444  0.16667    0.16667      0.16667

Error Count Estimates for Crop

           Clover     Corn   Cotton   Soybeans   Sugarbeets    Total
Rate       0.1818   0.0000   0.0000     0.0000       0.3333   0.1111
Priors     0.3056   0.1944   0.1667     0.1667       0.1667
Classification Summary for Calibration Data: WORK.CROPS
Cross validation Summary using Quadratic Discriminant Function

Generalized Squared Distance Function

$$D_j^2(X) = (X - \bar{X}_{(X)j})' \, COV_{(X)j}^{-1} \, (X - \bar{X}_{(X)j}) + \ln|COV_{(X)j}| - 2 \ln PRIOR_j$$

Posterior Probability of Membership in Each Crop

$$\Pr(j|X) = \exp(-.5 \, D_j^2(X)) \,/\, \sum_k \exp(-.5 \, D_k^2(X))$$

Number of Observations and Percent Classified into Crop

From Crop      Clover     Corn   Cotton   Soybeans   Sugarbeets    Total
Clover              9        0        0          0            2       11
                81.82     0.00     0.00       0.00        18.18   100.00
Corn                3        2        0          0            2        7
                42.86    28.57     0.00       0.00        28.57   100.00
Cotton              3        0        2          0            1        6
                50.00     0.00    33.33       0.00        16.67   100.00
Soybeans            3        0        0          2            1        6
                50.00     0.00     0.00      33.33        16.67   100.00
Sugarbeets          3        0        1          1            1        6
                50.00     0.00    16.67      16.67        16.67   100.00
Total              21        2        3          3            7       36
                58.33     5.56     8.33       8.33        19.44   100.00
Priors        0.30556  0.19444  0.16667    0.16667      0.16667

Error Count Estimates for Crop

           Clover     Corn   Cotton   Soybeans   Sugarbeets    Total
Rate       0.1818   0.7143   0.6667     0.6667       0.8333   0.5556
Priors     0.3056   0.1944   0.1667     0.1667       0.1667
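The error count estimates can be recomputed from the classification counts. The sketch below (Python rather than SAS, offered only as an arithmetic check) takes the per-class rate as the fraction of a class's observations classified into some other crop, and the total as the prior-weighted sum of those rates. The function name `error_count_estimates` is hypothetical; the counts are copied from the cross validation summary and from the resubstitution summary that precedes it in this output.

```python
crops = ["Clover", "Corn", "Cotton", "Soybeans", "Sugarbeets"]

# Priors proportional to the class sample sizes (PRIORS PROP).
priors = [11 / 36, 7 / 36, 6 / 36, 6 / 36, 6 / 36]

# Rows: true crop; columns: crop classified into (same order as `crops`).
resubstitution = [
    [9, 0, 0, 0, 2],
    [0, 7, 0, 0, 0],
    [0, 0, 6, 0, 0],
    [0, 0, 0, 6, 0],
    [0, 0, 1, 1, 4],
]
cross_validation = [
    [9, 0, 0, 0, 2],
    [3, 2, 0, 0, 2],
    [3, 0, 2, 0, 1],
    [3, 0, 0, 2, 1],
    [3, 0, 1, 1, 1],
]


def error_count_estimates(confusion, class_priors):
    """Return (per-class error rates, prior-weighted total error rate)."""
    rates = [1 - row[j] / sum(row) for j, row in enumerate(confusion)]
    total = sum(p * r for p, r in zip(class_priors, rates))
    return rates, total


resub_rates, resub_total = error_count_estimates(resubstitution, priors)
cv_rates, cv_total = error_count_estimates(cross_validation, priors)

print("Resubstitution:  ", [round(r, 4) for r in resub_rates], round(resub_total, 4))
print("Cross validation:", [round(r, 4) for r in cv_rates], round(cv_total, 4))
```

Because the priors are proportional to the sample sizes, each total reduces to the overall fraction of misclassified observations: 4 of 36 under resubstitution and 20 of 36 under cross validation.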