Examples


The iris data published by Fisher (1936) are widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on fifty iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica. The iris data are used in Example 25.1 through Example 25.3.

Example 25.4 and Example 25.5 use remote-sensing data on crops. In this data set, the observations are grouped into five crops: clover, corn, cotton, soybeans, and sugar beets. Four measures called X1 through X4 make up the descriptive variables.

Example 25.1. Univariate Density Estimates and Posterior Probabilities

In this example, several discriminant analyses are run with a single quantitative variable, petal width, so that density estimates and posterior probabilities can be plotted easily. The example produces Output 25.1.1 through Output 25.1.5. The GCHART procedure is used to display the sample distribution of petal width in the three species. Note the overlap between species I. versicolor and I. virginica that the bar chart shows. These statements produce Output 25.1.1:

  proc format;
     value specname
        1='Setosa    '
        2='Versicolor'
        3='Virginica ';
  run;

  data iris;
     title 'Discriminant Analysis of Fisher (1936) Iris Data';
     input SepalLength SepalWidth PetalLength PetalWidth
           Species @@;
     format Species specname.;
     label SepalLength='Sepal Length in mm.'
           SepalWidth ='Sepal Width in mm.'
           PetalLength='Petal Length in mm.'
           PetalWidth ='Petal Width in mm.';
     symbol = put(Species, specname10.);
     datalines;
  50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
  63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
  59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
  65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
  68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
  77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
  49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
  64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
  55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
  49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
  67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
  77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
  50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
  61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
  61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
  51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
  51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
  46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
  50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
  57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
  71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
  49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
  49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
  66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
  44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
  47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
  74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
  56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
  49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
  56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
  51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
  54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
  61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
  68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
  45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
  55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
  51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
  63 33 60 25 3 53 37 15 02 1
  ;

  pattern1 c=red    /*v=l1   */;
  pattern2 c=yellow /*v=empty*/;
  pattern3 c=blue   /*v=r1   */;
  axis1 label=(angle=90);
  axis2 value=(height=.6);
  legend1 frame label=none;

  proc gchart data=iris;
     vbar PetalWidth / subgroup=Species midpoints=0 to 25
                       raxis=axis1 maxis=axis2 legend=legend1 cframe=ligr;
  run;
Output 25.1.1: Sample Distribution of Petal Width in Three Species
 
Output 25.1.2: Normal Density Estimates with Equal Variance

  Discriminant Analysis of Fisher (1936) Iris Data
  Using Normal Density Estimates with Equal Variance

  The DISCRIM Procedure

  Observations     150          DF Total               149
  Variables          1          DF Within Classes      147
  Classes            3          DF Between Classes       2

  Class Level Information

                Variable                                                    Prior
  Species       Name          Frequency       Weight    Proportion    Probability
  Setosa        Setosa               50      50.0000      0.333333       0.333333
  Versicolor    Versicolor           50      50.0000      0.333333       0.333333
  Virginica     Virginica            50      50.0000      0.333333       0.333333

  Classification Results for Calibration Data: WORK.IRIS
  Cross-validation Results using Linear Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_{(X)j})' COV_{(X)}^{-1} (X - \bar{X}_{(X)j})

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

                                          Posterior Probability of Membership in Species
        From          Classified
  Obs   Species       into Species        Setosa    Versicolor     Virginica
    5   Virginica     Versicolor *        0.0000        0.9610        0.0390
    9   Versicolor    Virginica  *        0.0000        0.0952        0.9048
   57   Virginica     Versicolor *        0.0000        0.9940        0.0060
   78   Virginica     Versicolor *        0.0000        0.8009        0.1991
   91   Virginica     Versicolor *        0.0000        0.9610        0.0390
  148   Versicolor    Virginica  *        0.0000        0.3828        0.6172

  * Misclassified observation

  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Linear Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_{(X)j})' COV_{(X)}^{-1} (X - \bar{X}_{(X)j})

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

  Number of Observations and Percent Classified into Species

  From
  Species           Setosa    Versicolor     Virginica       Total
  Setosa                50             0             0          50
                    100.00          0.00          0.00      100.00
  Versicolor             0            48             2          50
                      0.00         96.00          4.00      100.00
  Virginica              0             4            46          50
                      0.00          8.00         92.00      100.00
  Total                 50            52            48         150
                     33.33         34.67         32.00      100.00
  Priors           0.33333       0.33333       0.33333

  Error Count Estimates for Species

                    Setosa    Versicolor     Virginica       Total
  Rate              0.0000        0.0400        0.0800      0.0400
  Priors            0.3333        0.3333        0.3333

  Classification Summary for Test Data: WORK.PLOTDATA
  Classification Summary using Linear Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_j)' COV^{-1} (X - \bar{X}_j)

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

  Number of Observations and Percent Classified into Species

                    Setosa    Versicolor     Virginica       Total
  Total                 26            18            27          71
                     36.62         25.35         38.03      100.00
  Priors           0.33333       0.33333       0.33333
 
Output 25.1.3: Normal Density Estimates with Unequal Variance

  Discriminant Analysis of Fisher (1936) Iris Data
  Using Normal Density Estimates with Unequal Variance

  The DISCRIM Procedure

  Observations     150          DF Total               149
  Variables          1          DF Within Classes      147
  Classes            3          DF Between Classes       2

  Class Level Information

                Variable                                                    Prior
  Species       Name          Frequency       Weight    Proportion    Probability
  Setosa        Setosa               50      50.0000      0.333333       0.333333
  Versicolor    Versicolor           50      50.0000      0.333333       0.333333
  Virginica     Virginica            50      50.0000      0.333333       0.333333

  Classification Results for Calibration Data: WORK.IRIS
  Cross-validation Results using Quadratic Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_{(X)j})' COV_{(X)j}^{-1} (X - \bar{X}_{(X)j}) + \ln |COV_{(X)j}|

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

                                          Posterior Probability of Membership in Species
        From          Classified
  Obs   Species       into Species        Setosa    Versicolor     Virginica
    5   Virginica     Versicolor *        0.0000        0.8740        0.1260
    9   Versicolor    Virginica  *        0.0000        0.0686        0.9314
   42   Setosa        Versicolor *        0.4923        0.5073        0.0004
   57   Virginica     Versicolor *        0.0000        0.9602        0.0398
   78   Virginica     Versicolor *        0.0000        0.6558        0.3442
   91   Virginica     Versicolor *        0.0000        0.8740        0.1260
  148   Versicolor    Virginica  *        0.0000        0.2871        0.7129

  * Misclassified observation

  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Quadratic Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_{(X)j})' COV_{(X)j}^{-1} (X - \bar{X}_{(X)j}) + \ln |COV_{(X)j}|

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

  Number of Observations and Percent Classified into Species

  From
  Species           Setosa    Versicolor     Virginica       Total
  Setosa                49             1             0          50
                     98.00          2.00          0.00      100.00
  Versicolor             0            48             2          50
                      0.00         96.00          4.00      100.00
  Virginica              0             4            46          50
                      0.00          8.00         92.00      100.00
  Total                 49            53            48         150
                     32.67         35.33         32.00      100.00
  Priors           0.33333       0.33333       0.33333

  Error Count Estimates for Species

                    Setosa    Versicolor     Virginica       Total
  Rate              0.0200        0.0400        0.0800      0.0467
  Priors            0.3333        0.3333        0.3333

  Classification Summary for Test Data: WORK.PLOTDATA
  Classification Summary using Quadratic Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_j)' COV_j^{-1} (X - \bar{X}_j) + \ln |COV_j|

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

  Number of Observations and Percent Classified into Species

                    Setosa    Versicolor     Virginica       Total
  Total                 23            20            28          71
                     32.39         28.17         39.44      100.00
  Priors           0.33333       0.33333       0.33333
 
Output 25.1.4: Kernel Density Estimates with Equal Bandwidth

  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Equal Bandwidth

  The DISCRIM Procedure

  Observations     150          DF Total               149
  Variables          1          DF Within Classes      147
  Classes            3          DF Between Classes       2

  Class Level Information

                Variable                                                    Prior
  Species       Name          Frequency       Weight    Proportion    Probability
  Setosa        Setosa               50      50.0000      0.333333       0.333333
  Versicolor    Versicolor           50      50.0000      0.333333       0.333333
  Virginica     Virginica            50      50.0000      0.333333       0.333333

  Classification Results for Calibration Data: WORK.IRIS
  Cross-validation Results using Normal Kernel Density

  Squared Distance Function

  D^2(X,Y) = (X - Y)' COV^{-1} (X - Y)

  Posterior Probability of Membership in Each Species

  F(X|j) = n_j^{-1} \sum_i \exp(-.5 D^2(X, Y_{ji}) / R^2)

  Pr(j|X) = PRIOR_j F(X|j) / \sum_k PRIOR_k F(X|k)

                                          Posterior Probability of Membership in Species
        From          Classified
  Obs   Species       into Species        Setosa    Versicolor     Virginica
    5   Virginica     Versicolor *        0.0000        0.8827        0.1173
    9   Versicolor    Virginica  *        0.0000        0.0438        0.9562
   57   Virginica     Versicolor *        0.0000        0.9472        0.0528
   78   Virginica     Versicolor *        0.0000        0.8061        0.1939
   91   Virginica     Versicolor *        0.0000        0.8827        0.1173
  148   Versicolor    Virginica  *        0.0000        0.2586        0.7414

  * Misclassified observation

  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Normal Kernel Density

  Squared Distance Function

  D^2(X,Y) = (X - Y)' COV^{-1} (X - Y)

  Posterior Probability of Membership in Each Species

  F(X|j) = n_j^{-1} \sum_i \exp(-.5 D^2(X, Y_{ji}) / R^2)

  Pr(j|X) = PRIOR_j F(X|j) / \sum_k PRIOR_k F(X|k)

  Number of Observations and Percent Classified into Species

  From
  Species           Setosa    Versicolor     Virginica       Total
  Setosa                50             0             0          50
                    100.00          0.00          0.00      100.00
  Versicolor             0            48             2          50
                      0.00         96.00          4.00      100.00
  Virginica              0             4            46          50
                      0.00          8.00         92.00      100.00
  Total                 50            52            48         150
                     33.33         34.67         32.00      100.00
  Priors           0.33333       0.33333       0.33333

  Error Count Estimates for Species

                    Setosa    Versicolor     Virginica       Total
  Rate              0.0000        0.0400        0.0800      0.0400
  Priors            0.3333        0.3333        0.3333

  Classification Summary for Test Data: WORK.PLOTDATA
  Classification Summary using Normal Kernel Density

  Squared Distance Function

  D^2(X,Y) = (X - Y)' COV^{-1} (X - Y)

  Posterior Probability of Membership in Each Species

  F(X|j) = n_j^{-1} \sum_i \exp(-.5 D^2(X, Y_{ji}) / R^2)

  Pr(j|X) = PRIOR_j F(X|j) / \sum_k PRIOR_k F(X|k)

  Number of Observations and Percent Classified into Species

                    Setosa    Versicolor     Virginica       Total
  Total                 26            18            27          71
                     36.62         25.35         38.03      100.00
  Priors           0.33333       0.33333       0.33333
 
Output 25.1.5: Kernel Density Estimates with Unequal Bandwidth

  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Unequal Bandwidth

  The DISCRIM Procedure

  Observations     150          DF Total               149
  Variables          1          DF Within Classes      147
  Classes            3          DF Between Classes       2

  Class Level Information

                Variable                                                    Prior
  Species       Name          Frequency       Weight    Proportion    Probability
  Setosa        Setosa               50      50.0000      0.333333       0.333333
  Versicolor    Versicolor           50      50.0000      0.333333       0.333333
  Virginica     Virginica            50      50.0000      0.333333       0.333333

  Classification Results for Calibration Data: WORK.IRIS
  Cross-validation Results using Normal Kernel Density

  Squared Distance Function

  D^2(X,Y) = (X - Y)' COV_j^{-1} (X - Y)

  Posterior Probability of Membership in Each Species

  F(X|j) = n_j^{-1} \sum_i \exp(-.5 D^2(X, Y_{ji}) / R^2)

  Pr(j|X) = PRIOR_j F(X|j) / \sum_k PRIOR_k F(X|k)

                                          Posterior Probability of Membership in Species
        From          Classified
  Obs   Species       into Species        Setosa    Versicolor     Virginica
    5   Virginica     Versicolor *        0.0000        0.8805        0.1195
    9   Versicolor    Virginica  *        0.0000        0.0466        0.9534
   57   Virginica     Versicolor *        0.0000        0.9394        0.0606
   78   Virginica     Versicolor *        0.0000        0.7193        0.2807
   91   Virginica     Versicolor *        0.0000        0.8805        0.1195
  148   Versicolor    Virginica  *        0.0000        0.2275        0.7725

  * Misclassified observation

  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Normal Kernel Density

  Squared Distance Function

  D^2(X,Y) = (X - Y)' COV_j^{-1} (X - Y)

  Posterior Probability of Membership in Each Species

  F(X|j) = n_j^{-1} \sum_i \exp(-.5 D^2(X, Y_{ji}) / R^2)

  Pr(j|X) = PRIOR_j F(X|j) / \sum_k PRIOR_k F(X|k)

  Number of Observations and Percent Classified into Species

  From
  Species           Setosa    Versicolor     Virginica       Total
  Setosa                50             0             0          50
                    100.00          0.00          0.00      100.00
  Versicolor             0            48             2          50
                      0.00         96.00          4.00      100.00
  Virginica              0             4            46          50
                      0.00          8.00         92.00      100.00
  Total                 50            52            48         150
                     33.33         34.67         32.00      100.00
  Priors           0.33333       0.33333       0.33333

  Error Count Estimates for Species

                    Setosa    Versicolor     Virginica       Total
  Rate              0.0000        0.0400        0.0800      0.0400
  Priors            0.3333        0.3333        0.3333

  Classification Summary for Test Data: WORK.PLOTDATA
  Classification Summary using Normal Kernel Density

  Squared Distance Function

  D^2(X,Y) = (X - Y)' COV_j^{-1} (X - Y)

  Posterior Probability of Membership in Each Species

  F(X|j) = n_j^{-1} \sum_i \exp(-.5 D^2(X, Y_{ji}) / R^2)

  Pr(j|X) = PRIOR_j F(X|j) / \sum_k PRIOR_k F(X|k)

  Number of Observations and Percent Classified into Species

                    Setosa    Versicolor     Virginica       Total
  Total                 25            18            28          71
                     35.21         25.35         39.44      100.00
  Priors           0.33333       0.33333       0.33333
 

In order to plot the density estimates and posterior probabilities, a data set called plotdata is created containing equally spaced values from -5 to 30, covering the range of petal width with a little to spare on each end. The plotdata data set is used with the TESTDATA= option in PROC DISCRIM.

  data plotdata;
     do PetalWidth=-5 to 30 by .5;
        output;
     end;
  run;

The same plots are produced after each discriminant analysis, so a macro is used to reduce the amount of typing required. The macro PLOT uses two data sets. The data set plotd, containing density estimates, is created by the TESTOUTD= option in PROC DISCRIM. The data set plotp, containing posterior probabilities, is created by the TESTOUT= option. For each data set, the macro PLOT sets uninteresting values (near zero) to missing and draws an overlay plot showing all three species on a single plot. The following statements create the macro PLOT:

  %macro plot;
     /* Set near-zero density estimates to missing so they are not drawn */
     data plotd;
        set plotd;
        if setosa<.002 then setosa=.;
        if versicolor<.002 then versicolor=.;
        if virginica <.002 then virginica=.;
        label PetalWidth='Petal Width in mm.';
     run;

     symbol1 i=join v=none c=red    l=1 /*l=21*/;
     symbol2 i=join v=none c=yellow l=1 /*l= 1*/;
     symbol3 i=join v=none c=blue   l=1 /*l= 2*/;
     legend1 label=none frame;
     axis1 label=(angle=90 'Density') order=(0 to .6 by .1);

     /* Overlay the three estimated densities on one plot */
     proc gplot data=plotd;
        plot setosa*PetalWidth
             versicolor*PetalWidth
             virginica*PetalWidth
             / overlay vaxis=axis1 legend=legend1 frame
               cframe=ligr;
        title3 'Plot of Estimated Densities';
     run;

     /* Set near-zero posterior probabilities to missing */
     data plotp;
        set plotp;
        if setosa<.01 then setosa=.;
        if versicolor<.01 then versicolor=.;
        if virginica<.01 then virginica=.;
        label PetalWidth='Petal Width in mm.';
     run;

     axis1 label=(angle=90 'Posterior Probability')
           order=(0 to 1 by .2);

     /* Overlay the three posterior probability curves on one plot */
     proc gplot data=plotp;
        plot setosa*PetalWidth
             versicolor*PetalWidth
             virginica*PetalWidth
             / overlay vaxis=axis1 legend=legend1 frame
               cframe=ligr;
        title3 'Plot of Posterior Probabilities';
     run;
  %mend;

The first analysis uses normal-theory methods (METHOD=NORMAL) assuming equal variances (POOL=YES) in the three classes. The NOCLASSIFY option suppresses the resubstitution classification results of the input data set observations. The CROSSLISTERR option lists the observations that are misclassified under cross validation and displays cross validation error-rate estimates. The following statements produce Output 25.1.2:

  proc discrim data=iris method=normal pool=yes
               testdata=plotdata testout=plotp testoutd=plotd
               short noclassify crosslisterr;
     class Species;
     var PetalWidth;
     title2 'Using Normal Density Estimates with Equal Variance';
  run;
  %plot
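
With a single variable, the generalized squared distance and posterior probability printed in Output 25.1.2 reduce to scalar expressions. The block below is a sketch of that reduction rather than output from the procedure. The symbols $\bar{x}_j$ (mean petal width in species $j$) and $s_p^2$ (pooled within-class variance) are notation introduced here for illustration.

```latex
D_j^2(x) = \frac{(x - \bar{x}_j)^2}{s_p^2},
\qquad
\Pr(j \mid x) =
  \frac{\exp\left(-\tfrac{1}{2} D_j^2(x)\right)}
       {\sum_k \exp\left(-\tfrac{1}{2} D_k^2(x)\right)}
```

Because every species shares the same $s_p^2$ and the priors are equal, two species $j$ and $k$ have equal posterior probability where $D_j^2(x) = D_k^2(x)$, that is, at the midpoint $x = (\bar{x}_j + \bar{x}_k)/2$ of their means.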

The next analysis uses normal-theory methods assuming unequal variances (POOL=NO) in the three classes. The following statements produce Output 25.1.3:

  proc discrim data=iris method=normal pool=no
               testdata=plotdata testout=plotp testoutd=plotd
               short noclassify crosslisterr;
     class Species;
     var PetalWidth;
     title2 'Using Normal Density Estimates with Unequal Variance';
  run;
  %plot
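
To make the difference between POOL=YES and POOL=NO concrete, the following Python sketch evaluates the distance and posterior formulas from the output listings for one variable with equal priors. It is an illustration written outside SAS, not the procedure's implementation: the function name `normal_posteriors` and the dictionary input layout are assumptions made for this example, and it does not perform the leave-one-out step used for the cross-validation results.

```python
import math


def normal_posteriors(x, samples, pool=True):
    """Posterior class probabilities at x for one variable, equal priors.

    samples maps a class label to a list of training values. This layout
    is a hypothetical convenience, not a PROC DISCRIM data structure.
    """
    means = {k: sum(v) / len(v) for k, v in samples.items()}
    # Within-class sums of squared deviations from the class mean
    ss = {k: sum((y - means[k]) ** 2 for y in v) for k, v in samples.items()}

    if pool:
        # Pooled variance: total within-class SS over (N - number of classes)
        n_total = sum(len(v) for v in samples.values())
        pooled = sum(ss.values()) / (n_total - len(samples))
        var = {k: pooled for k in samples}
    else:
        # Separate variance for each class
        var = {k: ss[k] / (len(samples[k]) - 1) for k in samples}

    # Generalized squared distance; the log-variance term is added only
    # when the variances differ between classes (quadratic case).
    d2 = {
        k: (x - means[k]) ** 2 / var[k] + (0.0 if pool else math.log(var[k]))
        for k in samples
    }

    # Subtract the smallest distance before exponentiating for stability
    d_min = min(d2.values())
    weights = {k: math.exp(-0.5 * (d2[k] - d_min)) for k in samples}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

With pooled variances, only the distance to each class mean matters. With separate variances, a class with a larger spread can claim points that lie closer to another class's mean, which is how the two settings can classify the same observation differently.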

Two more analyses are run with nonparametric methods (METHOD=NPAR), specifically kernel density estimates with normal kernels (KERNEL=NORMAL). The first of these uses equal bandwidths (smoothing parameters) (POOL=YES) in each class. The use of equal bandwidths does not constrain the density estimates to be of equal variance. The value of the radius parameter that, assuming normality, minimizes an approximate mean integrated square error is 0.48 (see the Nonparametric Methods section on page 1158). Choosing r = 0.4 gives a more detailed look at the irregularities in the data. The following statements produce Output 25.1.4:

  proc discrim data=iris method=npar kernel=normal
               r=.4 pool=yes
               testdata=plotdata testout=plotp
               testoutd=plotd
               short noclassify crosslisterr;
     class Species;
     var PetalWidth;
     title2 'Using Kernel Density Estimates with Equal Bandwidth';
  run;
  %plot
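
The radius value of 0.48 quoted above can be reproduced with a short calculation. The formula is not shown in this section, so the sketch below assumes that the normal-kernel rule in the Nonparametric Methods section has the form given on the first line, with $p$ variables and $n_t$ observations in class $t$. Here $p = 1$ (petal width only) and $n_t = 50$ for every species.

```latex
r = \left( \frac{A(K_t)}{n_t} \right)^{1/(p+4)},
\qquad
A(K_t) = \frac{4}{2p+1} \quad \text{(normal kernel, assumed form)}

p = 1,\; n_t = 50:
\qquad
r = \left( \frac{4/3}{50} \right)^{1/5}
  = (0.02667)^{0.2}
  \approx 0.48
```

The value r = 0.4 used in the statements is therefore slightly smaller than this reference value, which gives somewhat less smoothing.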

Another nonparametric analysis is run with unequal bandwidths (POOL=NO). These statements produce Output 25.1.5:

  proc discrim data=iris method=npar kernel=normal
               r=.4 pool=no
               testdata=plotdata testout=plotp
               testoutd=plotd
               short noclassify crosslisterr;
     class Species;
     var PetalWidth;
     title2 'Using Kernel Density Estimates with Unequal Bandwidth';
  run;
  %plot
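
The kernel formulas in Output 25.1.4 and Output 25.1.5 can be sketched for one variable as follows. This Python illustration is written outside SAS and is not the procedure's code: the function name `kernel_posteriors` and the dictionary input layout are hypothetical, and equal priors are assumed. The sketch includes the usual normalizing constant of a normal kernel, which the output listing does not print, so that classes with different bandwidths remain comparable. That inclusion is an assumption of this example.

```python
import math


def kernel_posteriors(x, samples, r=0.4, pool=True):
    """Posterior class probabilities at x from normal-kernel densities.

    samples maps a class label to a list of training values (hypothetical
    layout). Equal priors are assumed, so they cancel in the ratio.
    """
    means = {k: sum(v) / len(v) for k, v in samples.items()}
    ss = {k: sum((y - means[k]) ** 2 for y in v) for k, v in samples.items()}

    if pool:
        # Equal bandwidth: one pooled within-class variance for all classes
        n_total = sum(len(v) for v in samples.values())
        pooled = ss_total = sum(ss.values()) / (n_total - len(samples))
        var = {k: pooled for k in samples}
    else:
        # Unequal bandwidth: each class uses its own variance
        var = {k: ss[k] / (len(samples[k]) - 1) for k in samples}

    dens = {}
    for k, values in samples.items():
        h2 = r * r * var[k]  # squared bandwidth of the kernel for class k
        kernel_sum = sum(math.exp(-0.5 * (x - y) ** 2 / h2) for y in values)
        # Average of normal kernels centered at the training values
        dens[k] = kernel_sum / (len(values) * math.sqrt(2.0 * math.pi * h2))

    total = sum(dens.values())
    if total == 0.0:
        # x is so far from every training value that all kernels underflow
        raise ValueError("all class densities are zero at x")
    return {k: d / total for k, d in dens.items()}
```

A smaller `r` makes each kernel narrower, so the estimated density follows individual training values more closely. This is the trade-off behind choosing r = 0.4 rather than a larger radius.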

Example 25.2. Bivariate Density Estimates and Posterior Probabilities

In this example, four more discriminant analyses of iris data are run with two quantitative variables: petal width and petal length. The example produces Output 25.2.1 through Output 25.2.5. A scatter plot shows the joint sample distribution. See Appendix B, Using the %PLOTIT Macro, for more information on the %PLOTIT macro.

Output 25.2.2: Normal Density Estimates with Equal Variance

  Discriminant Analysis of Fisher (1936) Iris Data
  Using Normal Density Estimates with Equal Variance

  The DISCRIM Procedure

  Observations     150          DF Total               149
  Variables          2          DF Within Classes      147
  Classes            3          DF Between Classes       2

  Class Level Information

                Variable                                                    Prior
  Species       Name          Frequency       Weight    Proportion    Probability
  Setosa        Setosa               50      50.0000      0.333333       0.333333
  Versicolor    Versicolor           50      50.0000      0.333333       0.333333
  Virginica     Virginica            50      50.0000      0.333333       0.333333

  Classification Results for Calibration Data: WORK.IRIS
  Cross-validation Results using Linear Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_{(X)j})' COV_{(X)}^{-1} (X - \bar{X}_{(X)j})

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

                                          Posterior Probability of Membership in Species
        From          Classified
  Obs   Species       into Species        Setosa    Versicolor     Virginica
    5   Virginica     Versicolor *        0.0000        0.8453        0.1547
    9   Versicolor    Virginica  *        0.0000        0.2130        0.7870
   25   Virginica     Versicolor *        0.0000        0.8322        0.1678
   57   Virginica     Versicolor *        0.0000        0.8057        0.1943
   91   Virginica     Versicolor *        0.0000        0.8903        0.1097
  148   Versicolor    Virginica  *        0.0000        0.3118        0.6882

  * Misclassified observation

  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Linear Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_{(X)j})' COV_{(X)}^{-1} (X - \bar{X}_{(X)j})

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

  Number of Observations and Percent Classified into Species

  From
  Species           Setosa    Versicolor     Virginica       Total
  Setosa                50             0             0          50
                    100.00          0.00          0.00      100.00
  Versicolor             0            48             2          50
                      0.00         96.00          4.00      100.00
  Virginica              0             4            46          50
                      0.00          8.00         92.00      100.00
  Total                 50            52            48         150
                     33.33         34.67         32.00      100.00
  Priors           0.33333       0.33333       0.33333

  Error Count Estimates for Species

                    Setosa    Versicolor     Virginica       Total
  Rate              0.0000        0.0400        0.0800      0.0400
  Priors            0.3333        0.3333        0.3333

  Classification Summary for Test Data: WORK.PLOTDATA
  Classification Summary using Linear Discriminant Function

  Generalized Squared Distance Function

  D_j^2(X) = (X - \bar{X}_j)' COV^{-1} (X - \bar{X}_j)

  Posterior Probability of Membership in Each Species

  Pr(j|X) = \exp(-.5 D_j^2(X)) / \sum_k \exp(-.5 D_k^2(X))

  Number of Observations and Percent Classified into Species

                    Setosa    Versicolor     Virginica       Total
  Total              14507         16888         12858       44253
                     32.78         38.16         29.06      100.00
  Priors           0.33333       0.33333       0.33333
 
Output 25.2.3: Normal Density Estimates with Unequal Variance
start example
  Discriminant Analysis of Fisher (1936) Iris Data   Using Normal Density Estimates with Unequal Variance   The DISCRIM Procedure   Observations     150          DF Total               149   Variables          2          DF Within Classes      147   Classes            3          DF Between Classes       2   Class Level Information   Variable                                                    Prior   Species       Name          Frequency       Weight    Proportion    Probability   Setosa        Setosa               50      50.0000      0.333333       0.333333   Versicolor    Versicolor           50      50.0000      0.333333       0.333333   Virginica     Virginica            50      50.0000      0.333333       0.333333  
  Discriminant Analysis of Fisher (1936) Iris Data   Using Normal Density Estimates with Unequal Variance   The DISCRIM Procedure   Classification Results for Calibration Data: WORK.IRIS   Cross   validation Results using Quadratic Discriminant Function   Generalized Squared Distance Function   2         _   1     _   D (X) = (X   X)' COV    (X   X) + ln COV   j          (X)j      (X)j    (X)j           (X)j   Posterior Probability of Membership in Each Species   2                    2   Pr(jX) = exp(   .5 D (X)) / SUM exp(   .5 D (X))   j        k           k   Posterior Probability of Membership in Species   From          Classified   Obs   Species       into Species        Setosa    Versicolor     Virginica   5   Virginica     Versicolor *        0.0000        0.7288        0.2712   9   Versicolor    Virginica  *        0.0000        0.0903        0.9097   25   Virginica     Versicolor *        0.0000        0.5196        0.4804   91   Virginica     Versicolor *        0.0000        0.8335        0.1665   148   Versicolor    Virginica  *        0.0000        0.4675        0.5325   * Misclassified observation  
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Normal Density Estimates with Unequal Variance

  The DISCRIM Procedure
  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Quadratic Discriminant Function

  Generalized Squared Distance Function

    $D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)j}^{-1} \, (X - \bar{X}_{(X)j}) + \ln |\mathrm{COV}_{(X)j}|$

  Posterior Probability of Membership in Each Species

    $\Pr(j \mid X) = \exp(-0.5 \, D_j^2(X)) \, / \, \sum_k \exp(-0.5 \, D_k^2(X))$

  Number of Observations and Percent Classified into Species

  From
  Species          Setosa    Versicolor     Virginica       Total
  Setosa               50             0             0          50
                   100.00          0.00          0.00      100.00
  Versicolor            0            48             2          50
                     0.00         96.00          4.00      100.00
  Virginica             0             3            47          50
                     0.00          6.00         94.00      100.00
  Total                50            51            49         150
                    33.33         34.00         32.67      100.00
  Priors          0.33333       0.33333       0.33333

  Error Count Estimates for Species

                   Setosa    Versicolor     Virginica       Total
  Rate             0.0000        0.0400        0.0600      0.0333
  Priors           0.3333        0.3333        0.3333
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Normal Density Estimates with Unequal Variance

  The DISCRIM Procedure
  Classification Summary for Test Data: WORK.PLOTDATA
  Classification Summary using Quadratic Discriminant Function

  Generalized Squared Distance Function

    $D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}_j^{-1} \, (X - \bar{X}_j) + \ln |\mathrm{COV}_j|$

  Posterior Probability of Membership in Each Species

    $\Pr(j \mid X) = \exp(-0.5 \, D_j^2(X)) \, / \, \sum_k \exp(-0.5 \, D_k^2(X))$

  Number of Observations and Percent Classified into Species

                   Setosa    Versicolor     Virginica       Total
  Total              5461          5354         33438       44253
                    12.34         12.10         75.56      100.00
  Priors          0.33333       0.33333       0.33333
end example
 
Output 25.2.4: Kernel Density Estimates with Equal Bandwidth
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Equal Bandwidth

  The DISCRIM Procedure

  Observations    150        DF Total               149
  Variables         2        DF Within Classes      147
  Classes           3        DF Between Classes       2

  Class Level Information

                Variable                                          Prior
  Species       Name          Frequency     Weight   Proportion   Probability
  Setosa        Setosa               50    50.0000     0.333333      0.333333
  Versicolor    Versicolor           50    50.0000     0.333333      0.333333
  Virginica     Virginica            50    50.0000     0.333333      0.333333
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Equal Bandwidth

  The DISCRIM Procedure
  Classification Results for Calibration Data: WORK.IRIS
  Cross-validation Results using Normal Kernel Density

  Squared Distance Function

    $D^2(X,Y) = (X - Y)' \, \mathrm{COV}^{-1} \, (X - Y)$

  Posterior Probability of Membership in Each Species

    $F(X \mid j) = n_j^{-1} \sum_i \exp(-0.5 \, D^2(X, Y_{ji}) / R^2)$

    $\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \, / \, \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$

                                          Posterior Probability of Membership in Species
         From          Classified
  Obs    Species       into Species       Setosa    Versicolor     Virginica
    5    Virginica     Versicolor *       0.0000        0.7474        0.2526
    9    Versicolor    Virginica  *       0.0000        0.0800        0.9200
   25    Virginica     Versicolor *       0.0000        0.5863        0.4137
   91    Virginica     Versicolor *       0.0000        0.8358        0.1642
  148    Versicolor    Virginica  *       0.0000        0.4123        0.5877

  * Misclassified observation
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Equal Bandwidth

  The DISCRIM Procedure
  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Normal Kernel Density

  Squared Distance Function

    $D^2(X,Y) = (X - Y)' \, \mathrm{COV}^{-1} \, (X - Y)$

  Posterior Probability of Membership in Each Species

    $F(X \mid j) = n_j^{-1} \sum_i \exp(-0.5 \, D^2(X, Y_{ji}) / R^2)$

    $\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \, / \, \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$

  Number of Observations and Percent Classified into Species

  From
  Species          Setosa    Versicolor     Virginica       Total
  Setosa               50             0             0          50
                   100.00          0.00          0.00      100.00
  Versicolor            0            48             2          50
                     0.00         96.00          4.00      100.00
  Virginica             0             3            47          50
                     0.00          6.00         94.00      100.00
  Total                50            51            49         150
                    33.33         34.00         32.67      100.00
  Priors          0.33333       0.33333       0.33333

  Error Count Estimates for Species

                   Setosa    Versicolor     Virginica       Total
  Rate             0.0000        0.0400        0.0600      0.0333
  Priors           0.3333        0.3333        0.3333
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Equal Bandwidth

  The DISCRIM Procedure
  Classification Summary for Test Data: WORK.PLOTDATA
  Classification Summary using Normal Kernel Density

  Squared Distance Function

    $D^2(X,Y) = (X - Y)' \, \mathrm{COV}^{-1} \, (X - Y)$

  Posterior Probability of Membership in Each Species

    $F(X \mid j) = n_j^{-1} \sum_i \exp(-0.5 \, D^2(X, Y_{ji}) / R^2)$

    $\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \, / \, \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$

  Number of Observations and Percent Classified into Species

                   Setosa    Versicolor     Virginica       Total
  Total             12631          9941         21681       44253
                    28.54         22.46         48.99      100.00
  Priors          0.33333       0.33333       0.33333
end example
 
Output 25.2.5: Kernel Density Estimates with Unequal Bandwidth
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Unequal Bandwidth

  The DISCRIM Procedure

  Observations    150        DF Total               149
  Variables         2        DF Within Classes      147
  Classes           3        DF Between Classes       2

  Class Level Information

                Variable                                          Prior
  Species       Name          Frequency     Weight   Proportion   Probability
  Setosa        Setosa               50    50.0000     0.333333      0.333333
  Versicolor    Versicolor           50    50.0000     0.333333      0.333333
  Virginica     Virginica            50    50.0000     0.333333      0.333333
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Unequal Bandwidth

  The DISCRIM Procedure
  Classification Results for Calibration Data: WORK.IRIS
  Cross-validation Results using Normal Kernel Density

  Squared Distance Function

    $D^2(X,Y) = (X - Y)' \, \mathrm{COV}_j^{-1} \, (X - Y)$

  Posterior Probability of Membership in Each Species

    $F(X \mid j) = n_j^{-1} \sum_i \exp(-0.5 \, D^2(X, Y_{ji}) / R^2)$

    $\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \, / \, \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$

                                          Posterior Probability of Membership in Species
         From          Classified
  Obs    Species       into Species       Setosa    Versicolor     Virginica
    5    Virginica     Versicolor *       0.0000        0.7826        0.2174
    9    Versicolor    Virginica  *       0.0000        0.0506        0.9494
   91    Virginica     Versicolor *       0.0000        0.8802        0.1198
  148    Versicolor    Virginica  *       0.0000        0.3726        0.6274

  * Misclassified observation
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Unequal Bandwidth

  The DISCRIM Procedure
  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Normal Kernel Density

  Squared Distance Function

    $D^2(X,Y) = (X - Y)' \, \mathrm{COV}_j^{-1} \, (X - Y)$

  Posterior Probability of Membership in Each Species

    $F(X \mid j) = n_j^{-1} \sum_i \exp(-0.5 \, D^2(X, Y_{ji}) / R^2)$

    $\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \, / \, \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$

  Number of Observations and Percent Classified into Species

  From
  Species          Setosa    Versicolor     Virginica       Total
  Setosa               50             0             0          50
                   100.00          0.00          0.00      100.00
  Versicolor            0            48             2          50
                     0.00         96.00          4.00      100.00
  Virginica             0             2            48          50
                     0.00          4.00         96.00      100.00
  Total                50            50            50         150
                    33.33         33.33         33.33      100.00
  Priors          0.33333       0.33333       0.33333

  Error Count Estimates for Species

                   Setosa    Versicolor     Virginica       Total
  Rate             0.0000        0.0400        0.0400      0.0267
  Priors           0.3333        0.3333        0.3333
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Kernel Density Estimates with Unequal Bandwidth

  The DISCRIM Procedure
  Classification Summary for Test Data: WORK.PLOTDATA
  Classification Summary using Normal Kernel Density

  Squared Distance Function

    $D^2(X,Y) = (X - Y)' \, \mathrm{COV}_j^{-1} \, (X - Y)$

  Posterior Probability of Membership in Each Species

    $F(X \mid j) = n_j^{-1} \sum_i \exp(-0.5 \, D^2(X, Y_{ji}) / R^2)$

    $\Pr(j \mid X) = \mathrm{PRIOR}_j \, F(X \mid j) \, / \, \sum_k \mathrm{PRIOR}_k \, F(X \mid k)$

  Number of Observations and Percent Classified into Species

                   Setosa    Versicolor     Virginica       Total
  Total              5447          5984         32822       44253
                    12.31         13.52         74.17      100.00
  Priors          0.33333       0.33333       0.33333
end example
 

Another data set is created for plotting, containing a grid of points suitable for contour plots. The large number of points in the grid makes the following analyses very time-consuming. If you attempt to duplicate these examples, begin with a small number of points in the grid.

  data plotdata;
     do PetalLength=-2 to 72 by 0.25;
        h + 1;    * Number of horizontal cells;
        do PetalWidth=-5 to 32 by 0.25;
           n + 1; * Total number of cells;
           output;
        end;
     end;
     * Make variables to contain H and V grid sizes;
     call symput('hnobs', compress(put(h    , best12.)));
     call symput('vnobs', compress(put(n / h, best12.)));
     drop n h;
  run;
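As a starting point for the advice above, the following is a minimal sketch of the same DATA step with a coarser grid. The macro variable STEP is a hypothetical name introduced here, not part of the original example. The ranges are assumed to be -2 to 72 for PetalLength and -5 to 32 for PetalWidth, which at a spacing of 0.25 give the 44,253 test observations reported in the output.

```sas
%let step = 1;   /* hypothetical grid spacing in mm; the full example uses 0.25 */

data plotdata;
   do PetalLength = -2 to 72 by &step;
      h + 1;                      * Number of horizontal cells;
      do PetalWidth = -5 to 32 by &step;
         n + 1;                   * Total number of cells;
         output;
      end;
   end;
   * Make variables to contain H and V grid sizes;
   call symput('hnobs', compress(put(h    , best12.)));
   call symput('vnobs', compress(put(n / h, best12.)));
   drop n h;
run;

%put NOTE: grid is &hnobs by &vnobs cells;
```

With a spacing of 1, the grid has 75 by 38 cells (2,850 points) instead of 297 by 149, so each PROC DISCRIM run scores far fewer test observations. Once the plots look reasonable, STEP can be reduced toward 0.25.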

A macro CONTOUR is defined to make contour plots of density estimates and posterior probabilities. Classification results are also plotted on the same grid.

  %macro contour;

     data contour(keep=PetalWidth PetalLength symbol density);
        set plotd(in=d) iris;
        if d then density = max(setosa,versicolor,virginica);
     run;

     title3 'Plot of Estimated Densities';

     %plotit(data=contour, plotvars=PetalWidth PetalLength,
             labelvar=_blank_, symvar=symbol, typevar=symbol,
             symlen=4, exttypes=symbol contour, ls=100,
             paint=density white black, rgbtypes=contour,
             hnobs=&hnobs, vnobs=&vnobs, excolors=white,
             rgbround=-16 1 1 1, extend=close, options=noclip,
             types  =Setosa Versicolor Virginica  '',
             symtype=symbol symbol     symbol     contour,
             symsize=0.6    0.6        0.6        1,
             symfont=swiss  swiss      swiss      solid)

     data posterior(keep=PetalWidth PetalLength symbol
                         prob _into_);
        set plotp(in=d) iris;
        if d then prob = max(setosa,versicolor,virginica);
     run;

     title3 'Plot of Posterior Probabilities '
            '(Black to White is Low to High Probability)';

     %plotit(data=posterior, plotvars=PetalWidth PetalLength,
             labelvar=_blank_, symvar=symbol, typevar=symbol,
             symlen=4, exttypes=symbol contour, ls=100,
             paint=prob black white 0.3 0.999, rgbtypes=contour,
             hnobs=&hnobs, vnobs=&vnobs, excolors=white,
             rgbround=-16 1 1 1, extend=close, options=noclip,
             types  =Setosa Versicolor Virginica  '',
             symtype=symbol symbol     symbol     contour,
             symsize=0.6    0.6        0.6        1,
             symfont=swiss  swiss      swiss      solid)

     title3 'Plot of Classification Results';

     %plotit(data=posterior, plotvars=PetalWidth PetalLength,
             labelvar=_blank_, symvar=symbol, typevar=symbol,
             symlen=4, exttypes=symbol contour, ls=100,
             paint=_into_ CXCCCCCC CXDDDDDD white,
             rgbtypes=contour, hnobs=&hnobs, vnobs=&vnobs,
             excolors=white,
             extend=close, options=noclip,
             types  =Setosa Versicolor Virginica  '',
             symtype=symbol symbol     symbol     contour,
             symsize=0.6    0.6        0.6        1,
             symfont=swiss  swiss      swiss      solid)

  %mend;

A normal-theory analysis (METHOD=NORMAL) assuming equal covariance matrices (POOL=YES) illustrates the linearity of the classification boundaries. These statements produce Output 25.2.2:

  proc discrim data=iris method=normal pool=yes
               testdata=plotdata testout=plotp testoutd=plotd
               short noclassify crosslisterr;
     class Species;
     var Petal:;
     title2 'Using Normal Density Estimates with Equal Variance';
  run;

  %contour

A normal-theory analysis assuming unequal covariance matrices (POOL=NO) illustrates quadratic classification boundaries. These statements produce Output 25.2.3:

  proc discrim data=iris method=normal pool=no
               testdata=plotdata testout=plotp testoutd=plotd
               short noclassify crosslisterr;
     class Species;
     var Petal:;
     title2 'Using Normal Density Estimates with Unequal Variance';
  run;

  %contour

A nonparametric analysis (METHOD=NPAR) follows, using normal kernels (KERNEL=NORMAL) and equal bandwidths (POOL=YES) in each class. The value of the radius parameter r that, assuming normality, minimizes an approximate mean integrated square error is 0.50 (see the Nonparametric Methods section on page 1158). These statements produce Output 25.2.4:

  proc discrim data=iris method=npar kernel=normal
               r=.5 pool=yes
               testdata=plotdata testout=plotp
               testoutd=plotd
               short noclassify crosslisterr;
     class Species;
     var Petal:;
     title2 'Using Kernel Density Estimates with Equal Bandwidth';
  run;

  %contour
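The value R=.5 used in these statements can be checked by hand. The sketch below assumes that the optimal-radius rule in the Nonparametric Methods section has the form $r = (A(K)/n)^{1/(p+4)}$ with $A(K) = 4/(2p+1)$ for a normal kernel. That form is recalled here rather than quoted from this example, so treat it as an assumption and confirm it against that section. Here p = 2 variables (the two petal measurements) and n = 50 observations per class.

```latex
\begin{align*}
A(K) &= \frac{4}{2p + 1} = \frac{4}{5} = 0.8 \\
r    &= \left(\frac{A(K)}{n}\right)^{1/(p+4)}
      = \left(\frac{0.8}{50}\right)^{1/6}
      = 0.016^{1/6}
      \approx 0.50
\end{align*}
```

Under that assumption the computed radius agrees with the value of 0.50 quoted in the text.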

Another nonparametric analysis is run with unequal bandwidths (POOL=NO). These statements produce Output 25.2.5:

  proc discrim data=iris method=npar kernel=normal
               r=.5 pool=no
               testdata=plotdata testout=plotp
               testoutd=plotd
               short noclassify crosslisterr;
     class Species;
     var Petal:;
     title2 'Using Kernel Density Estimates with Unequal Bandwidth';
  run;

  %contour

Example 25.3. Normal-Theory Discriminant Analysis of Iris Data

In this example, PROC DISCRIM uses normal-theory methods to classify the iris data used in Example 25.1. The POOL=TEST option tests the homogeneity of the within-group covariance matrices (Output 25.3.3). Since the resulting test statistic is significant at the 0.10 level, the within-group covariance matrices are used to derive the quadratic discriminant criterion. The WCOV and PCOV options display the within-group covariance matrices and the pooled covariance matrix (Output 25.3.2). The DISTANCE option displays squared distances between classes (Output 25.3.4). The ANOVA and MANOVA options test the hypothesis that the class means are equal, using univariate statistics and multivariate statistics; all statistics are significant at the 0.0001 level (Output 25.3.5). The LISTERR option lists the misclassified observations under resubstitution (Output 25.3.6). The CROSSLISTERR option lists the observations that are misclassified under cross validation and displays cross validation error-rate estimates (Output 25.3.7). The resubstitution error count estimate, 0.02, is not larger than the cross validation error count estimate, 0.0267, as would be expected because the resubstitution estimate is optimistically biased. The OUTSTAT= option generates a TYPE=MIXED (because POOL=TEST) output data set containing various statistics such as means, covariances, and coefficients of the discriminant function (Output 25.3.8).

The following statements produce Output 25.3.1 through Output 25.3.8:

  proc discrim data=iris outstat=irisstat
               wcov pcov method=normal pool=test
               distance anova manova listerr crosslisterr;
     class Species;
     var SepalLength SepalWidth PetalLength PetalWidth;
     title2 'Using Quadratic Discriminant Function';
  run;

  proc print data=irisstat;
     title2 'Output Discriminant Statistics';
  run;

Output 25.3.1: Quadratic Discriminant Analysis of Iris Data
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure

  Observations    150        DF Total               149
  Variables         4        DF Within Classes      147
  Classes           3        DF Between Classes       2

  Class Level Information

                Variable                                          Prior
  Species       Name          Frequency     Weight   Proportion   Probability
  Setosa        Setosa               50    50.0000     0.333333      0.333333
  Versicolor    Versicolor           50    50.0000     0.333333      0.333333
  Virginica     Virginica            50    50.0000     0.333333      0.333333
end example
 
Output 25.3.2: Covariance Matrices
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Within-Class Covariance Matrices

  Species = Setosa,     DF = 49

  Variable      Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
  SepalLength   Sepal Length in mm.    12.42489796     9.92163265     1.63551020     1.03306122
  SepalWidth    Sepal Width in mm.      9.92163265    14.36897959     1.16979592     0.92979592
  PetalLength   Petal Length in mm.     1.63551020     1.16979592     3.01591837     0.60693878
  PetalWidth    Petal Width in mm.      1.03306122     0.92979592     0.60693878     1.11061224
  ----------------------------------------------------------------------------------------------

  Species = Versicolor,     DF = 49

  Variable      Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
  SepalLength   Sepal Length in mm.    26.64326531     8.51836735    18.28979592     5.57795918
  SepalWidth    Sepal Width in mm.      8.51836735     9.84693878     8.26530612     4.12040816
  PetalLength   Petal Length in mm.    18.28979592     8.26530612    22.08163265     7.31020408
  PetalWidth    Petal Width in mm.      5.57795918     4.12040816     7.31020408     3.91061224
  ----------------------------------------------------------------------------------------------

  Species = Virginica,     DF = 49

  Variable      Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
  SepalLength   Sepal Length in mm.    40.43428571     9.37632653    30.32897959     4.90938776
  SepalWidth    Sepal Width in mm.      9.37632653    10.40040816     7.13795918     4.76285714
  PetalLength   Petal Length in mm.    30.32897959     7.13795918    30.45877551     4.88244898
  PetalWidth    Petal Width in mm.      4.90938776     4.76285714     4.88244898     7.54326531
  ----------------------------------------------------------------------------------------------
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Pooled Within-Class Covariance Matrix,     DF = 147

  Variable      Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
  SepalLength   Sepal Length in mm.    26.50081633     9.27210884    16.75142857     3.84013605
  SepalWidth    Sepal Width in mm.      9.27210884    11.53877551     5.52435374     3.27102041
  PetalLength   Petal Length in mm.    16.75142857     5.52435374    18.51877551     4.26653061
  PetalWidth    Petal Width in mm.      3.84013605     3.27102041     4.26653061     4.18816327

  Within Covariance Matrix Information

                                  Natural Log of the
                Covariance        Determinant of the
  Species       Matrix Rank        Covariance Matrix
  Setosa                  4                  5.35332
  Versicolor              4                  7.54636
  Virginica               4                  9.49362
  Pooled                  4                  8.46214
end example
 
Output 25.3.3: Homogeneity Test
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Test of Homogeneity of Within Covariance Matrices

  Notation: K    = Number of Groups
            P    = Number of Variables
            N    = Total Number of Observations - Number of Groups
            N(i) = Number of Observations in the ith Group - 1

    $V = \dfrac{\prod_i |\text{Within SS Matrix}(i)|^{N(i)/2}}{|\text{Pooled SS Matrix}|^{N/2}}$

    $\mathrm{RHO} = 1.0 - \left[ \sum_i \dfrac{1}{N(i)} - \dfrac{1}{N} \right] \dfrac{2P^2 + 3P - 1}{6(P+1)(K-1)}$

    $\mathrm{DF} = 0.5 \, (K-1) \, P \, (P+1)$

  Under the null hypothesis:

    $-2 \, \mathrm{RHO} \, \ln \left[ \dfrac{N^{PN/2} \, V}{\prod_i N(i)^{PN(i)/2}} \right]$

  is distributed approximately as Chi-Square(DF).

  Chi-Square        DF    Pr > ChiSq
  140.943050        20        <.0001

  Since the Chi-Square value is significant at the 0.1 level, the within
  covariance matrices will be used in the discriminant function.

  Reference: Morrison, D.F. (1976) Multivariate Statistical Methods
  p252.
end example
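The reported chi-square value can be reproduced from numbers already shown, which is a useful check on how the statistic is assembled. The sketch below takes K = 3, P = 4, N(i) = 49 for each species (matching DF = 49 in Output 25.3.2), and N = 147. It also uses the fact that, after the N and N(i) factors cancel, the statistic reduces to RHO times a difference of log determinants of the covariance matrices, whose values are listed in Output 25.3.2. The symbols S_i and S_pooled for the within-class and pooled covariance matrices are notation introduced here.

```latex
\begin{align*}
\mathrm{RHO} &= 1 - \left( \frac{3}{49} - \frac{1}{147} \right)
                \frac{2(4)^2 + 3(4) - 1}{6(4+1)(3-1)}
              = 1 - \frac{8}{147} \cdot \frac{43}{60}
              \approx 0.9610 \\[4pt]
\mathrm{DF}  &= 0.5\,(3-1)(4)(4+1) = 20 \\[4pt]
\chi^2       &= \mathrm{RHO} \left( N \ln|S_{\mathrm{pooled}}|
                - \sum_i N(i) \ln|S_i| \right) \\
             &\approx 0.9610 \left( 147 \times 8.46214
                - 49 \times (5.35332 + 7.54636 + 9.49362) \right) \\
             &\approx 0.9610 \times 146.663 \approx 140.94
\end{align*}
```

The result agrees with the displayed value of 140.943050 on 20 degrees of freedom, up to rounding of the log determinants.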
 
Output 25.3.4: Squared Distances
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Pairwise Squared Distances Between Groups

    $D^2(i \mid j) = (\bar{X}_i - \bar{X}_j)' \, \mathrm{COV}_j^{-1} \, (\bar{X}_i - \bar{X}_j)$

  Squared Distance to Species

  From
  Species           Setosa    Versicolor     Virginica
  Setosa                 0     103.19382     168.76759
  Versicolor     323.06203             0      13.83875
  Virginica      706.08494      17.86670             0

  Pairwise Generalized Squared Distances Between Groups

    $D^2(i \mid j) = (\bar{X}_i - \bar{X}_j)' \, \mathrm{COV}_j^{-1} \, (\bar{X}_i - \bar{X}_j) + \ln |\mathrm{COV}_j|$

  Generalized Squared Distance to Species

  From
  Species           Setosa    Versicolor     Virginica
  Setosa           5.35332     110.74017     178.26121
  Versicolor     328.41535       7.54636      23.33238
  Virginica      711.43826      25.41306       9.49362
end example
 
Output 25.3.5: Tests of Equal Class Means
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Univariate Test Statistics

  F Statistics,    Num DF=2,   Den DF=147

                                        Total     Pooled    Between
                                     Standard   Standard   Standard               R-Square
  Variable     Label                Deviation  Deviation  Deviation   R-Square   / (1-RSq)   F Value   Pr > F
  SepalLength  Sepal Length in mm.     8.2807     5.1479     7.9506     0.6187      1.6226    119.26   <.0001
  SepalWidth   Sepal Width in mm.      4.3587     3.3969     3.3682     0.4008      0.6688     49.16   <.0001
  PetalLength  Petal Length in mm.    17.6530     4.3033    20.9070     0.9414     16.0566   1180.16   <.0001
  PetalWidth   Petal Width in mm.      7.6224     2.0465     8.9673     0.9289     13.0613    960.01   <.0001

  Average R-Square

  Unweighted              0.7224358
  Weighted by Variance    0.8689444

  Multivariate Statistics and F Approximations

  S=2    M=0.5    N=71

  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.02343863     199.15         8       288    <.0001
  Pillai's Trace              1.19189883      53.47         8       290    <.0001
  Hotelling-Lawley Trace     32.47732024     582.20         8     203.4    <.0001
  Roy's Greatest Root        32.19192920    1166.96         4       145    <.0001

  NOTE: F Statistic for Roy's Greatest Root is an upper bound.
  NOTE: F Statistic for Wilks' Lambda is exact.
end example
 
Output 25.3.6: Misclassified Observations: Resubstitution
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Classification Results for Calibration Data: WORK.IRIS
  Resubstitution Results using Quadratic Discriminant Function

  Generalized Squared Distance Function

    $D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}_j^{-1} \, (X - \bar{X}_j) + \ln |\mathrm{COV}_j|$

  Posterior Probability of Membership in Each Species

    $\Pr(j \mid X) = \exp(-0.5 \, D_j^2(X)) \, / \, \sum_k \exp(-0.5 \, D_k^2(X))$

                                          Posterior Probability of Membership in Species
         From          Classified
  Obs    Species       into Species       Setosa    Versicolor     Virginica
    5    Virginica     Versicolor *       0.0000        0.6050        0.3950
    9    Versicolor    Virginica  *       0.0000        0.3359        0.6641
   12    Versicolor    Virginica  *       0.0000        0.1543        0.8457

  * Misclassified observation
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Classification Summary for Calibration Data: WORK.IRIS
  Resubstitution Summary using Quadratic Discriminant Function

  Generalized Squared Distance Function

    $D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}_j^{-1} \, (X - \bar{X}_j) + \ln |\mathrm{COV}_j|$

  Posterior Probability of Membership in Each Species

    $\Pr(j \mid X) = \exp(-0.5 \, D_j^2(X)) \, / \, \sum_k \exp(-0.5 \, D_k^2(X))$

  Number of Observations and Percent Classified into Species

  From
  Species          Setosa    Versicolor     Virginica       Total
  Setosa               50             0             0          50
                   100.00          0.00          0.00      100.00
  Versicolor            0            48             2          50
                     0.00         96.00          4.00      100.00
  Virginica             0             1            49          50
                     0.00          2.00         98.00      100.00
  Total                50            49            51         150
                    33.33         32.67         34.00      100.00
  Priors          0.33333       0.33333       0.33333

  Error Count Estimates for Species

                   Setosa    Versicolor     Virginica       Total
  Rate             0.0000        0.0400        0.0200      0.0200
  Priors           0.3333        0.3333        0.3333
end example
 
Output 25.3.7: Misclassified Observations: Cross validation
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Classification Results for Calibration Data: WORK.IRIS
  Cross-validation Results using Quadratic Discriminant Function

  Generalized Squared Distance Function

    $D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)j}^{-1} \, (X - \bar{X}_{(X)j}) + \ln |\mathrm{COV}_{(X)j}|$

  Posterior Probability of Membership in Each Species

    $\Pr(j \mid X) = \exp(-0.5 \, D_j^2(X)) \, / \, \sum_k \exp(-0.5 \, D_k^2(X))$

                                          Posterior Probability of Membership in Species
         From          Classified
  Obs    Species       into Species       Setosa    Versicolor     Virginica
    5    Virginica     Versicolor *       0.0000        0.6632        0.3368
    8    Versicolor    Virginica  *       0.0000        0.3134        0.6866
    9    Versicolor    Virginica  *       0.0000        0.1616        0.8384
   12    Versicolor    Virginica  *       0.0000        0.0713        0.9287

  * Misclassified observation
  Discriminant Analysis of Fisher (1936) Iris Data
  Using Quadratic Discriminant Function

  The DISCRIM Procedure
  Classification Summary for Calibration Data: WORK.IRIS
  Cross-validation Summary using Quadratic Discriminant Function

  Generalized Squared Distance Function

    $D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)j}^{-1} \, (X - \bar{X}_{(X)j}) + \ln |\mathrm{COV}_{(X)j}|$

  Posterior Probability of Membership in Each Species

    $\Pr(j \mid X) = \exp(-0.5 \, D_j^2(X)) \, / \, \sum_k \exp(-0.5 \, D_k^2(X))$

  Number of Observations and Percent Classified into Species

  From
  Species          Setosa    Versicolor     Virginica       Total
  Setosa               50             0             0          50
                   100.00          0.00          0.00      100.00
  Versicolor            0            47             3          50
                     0.00         94.00          6.00      100.00
  Virginica             0             1            49          50
                     0.00          2.00         98.00      100.00
  Total                50            48            52         150
                    33.33         32.00         34.67      100.00
  Priors          0.33333       0.33333       0.33333

  Error Count Estimates for Species

                   Setosa    Versicolor     Virginica       Total
  Rate             0.0000        0.0600        0.0200      0.0267
  Priors           0.3333        0.3333        0.3333
end example
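The Total entries in the Error Count Estimates tables are the per-species error rates weighted by the prior probabilities. As a short worked check using only the rates and priors displayed in Output 25.3.6 and Output 25.3.7 (the subscripts "resub" and "cv" are labels introduced here):

```latex
\begin{align*}
\text{Total}_{\text{resub}} &= \sum_j \mathrm{PRIOR}_j \times \mathrm{Rate}_j
  = \tfrac{1}{3}(0.00) + \tfrac{1}{3}(0.04) + \tfrac{1}{3}(0.02) = 0.0200 \\
\text{Total}_{\text{cv}} &= \tfrac{1}{3}(0.00) + \tfrac{1}{3}(0.06) + \tfrac{1}{3}(0.02) \approx 0.0267
\end{align*}
```

Because the priors are equal and each species has 50 observations, these totals coincide with the raw proportions of misclassified observations, 3/150 under resubstitution and 4/150 under cross validation.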
 
Output 25.3.8: Output Statistics from Iris Data
start example
  Discriminant Analysis of Fisher (1936) Iris Data
  Output Discriminant Statistics

                                                      Sepal       Sepal       Petal       Petal
  Obs    Species       _TYPE_      _NAME_            Length       Width      Length       Width
    1             .    N                             150.00      150.00      150.00      150.00
    2    Setosa        N                              50.00       50.00       50.00       50.00
    3    Versicolor    N                              50.00       50.00       50.00       50.00
    4    Virginica     N                              50.00       50.00       50.00       50.00
    5             .    MEAN                           58.43       30.57       37.58       11.99
    6    Setosa        MEAN                           50.06       34.28       14.62        2.46
    7    Versicolor    MEAN                           59.36       27.70       42.60       13.26
    8    Virginica     MEAN                           65.88       29.74       55.52       20.26
    9    Setosa        PRIOR                           0.33        0.33        0.33        0.33
   10    Versicolor    PRIOR                           0.33        0.33        0.33        0.33
   11    Virginica     PRIOR                           0.33        0.33        0.33        0.33
   12    Setosa        CSSCP       SepalLength       608.82      486.16       80.14       50.62
   13    Setosa        CSSCP       SepalWidth        486.16      704.08       57.32       45.56
   14    Setosa        CSSCP       PetalLength        80.14       57.32      147.78       29.74
   15    Setosa        CSSCP       PetalWidth         50.62       45.56       29.74       54.42
   16    Versicolor    CSSCP       SepalLength      1305.52      417.40      896.20      273.32
   17    Versicolor    CSSCP       SepalWidth        417.40      482.50      405.00      201.90
   18    Versicolor    CSSCP       PetalLength       896.20      405.00     1082.00      358.20
   19    Versicolor    CSSCP       PetalWidth        273.32      201.90      358.20      191.62
   20    Virginica     CSSCP       SepalLength      1981.28      459.44     1486.12      240.56
   21    Virginica     CSSCP       SepalWidth        459.44      509.62      349.76      233.38
   22    Virginica     CSSCP       PetalLength      1486.12      349.76     1492.48      239.24
   23    Virginica     CSSCP       PetalWidth        240.56      233.38      239.24      369.62
   24             .    PSSCP       SepalLength      3895.62     1363.00     2462.46      564.50
   25             .    PSSCP       SepalWidth       1363.00     1696.20      812.08      480.84
   26             .    PSSCP       PetalLength      2462.46      812.08     2722.26      627.18
   27             .    PSSCP       PetalWidth        564.50      480.84      627.18      615.66
   28             .    BSSCP       SepalLength      6321.21    -1995.27    16524.84     7127.93
   29             .    BSSCP       SepalWidth      -1995.27     1134.49    -5723.96    -2293.27
   30             .    BSSCP       PetalLength     16524.84    -5723.96    43710.28    18677.40
   31             .    BSSCP       PetalWidth       7127.93    -2293.27    18677.40     8041.33
   32             .    CSSCP       SepalLength     10216.83     -632.27    18987.30     7692.43
   33             .    CSSCP       SepalWidth       -632.27     2830.69    -4911.88    -1812.43
   34             .    CSSCP       PetalLength     18987.30    -4911.88    46432.54    19304.58
   35             .    CSSCP       PetalWidth       7692.43    -1812.43    19304.58     8656.99
   36             .    RSQUARED                        0.62        0.40        0.94        0.93
   37    Setosa        COV         SepalLength        12.42        9.92        1.64        1.03
   38    Setosa        COV         SepalWidth          9.92       14.37        1.17        0.93
   39    Setosa        COV         PetalLength         1.64        1.17        3.02        0.61
   40    Setosa        COV         PetalWidth          1.03        0.93        0.61        1.11
   41    Versicolor    COV         SepalLength        26.64        8.52       18.29        5.58
   42    Versicolor    COV         SepalWidth          8.52        9.85        8.27        4.12
   43    Versicolor    COV         PetalLength        18.29        8.27       22.08        7.31
   44    Versicolor    COV         PetalWidth          5.58        4.12        7.31        3.91
   45    Virginica     COV         SepalLength        40.43        9.38       30.33        4.91
   46    Virginica     COV         SepalWidth          9.38       10.40        7.14        4.76
   47    Virginica     COV         PetalLength        30.33        7.14       30.46        4.88
   48    Virginica     COV         PetalWidth          4.91        4.76        4.88        7.54
   49             .    PCOV        SepalLength        26.50        9.27       16.75        3.84
   50             .    PCOV        SepalWidth          9.27       11.54        5.52        3.27
   51             .    PCOV        PetalLength        16.75        5.52       18.52        4.27
   52             .    PCOV        PetalWidth          3.84        3.27        4.27        4.19
   53             .    BCOV        SepalLength        63.21      -19.95      165.25       71.28
   54             .    BCOV        SepalWidth        -19.95       11.34      -57.24      -22.93
   55             .    BCOV        PetalLength       165.25      -57.24      437.10      186.77
   56             .    BCOV        PetalWidth         71.28      -22.93      186.77       80.41
   57             .    COV         SepalLength        68.57       -4.24      127.43       51.63
   58             .    COV         SepalWidth         -4.24       19.00      -32.97      -12.16
   59             .    COV         PetalLength       127.43      -32.97      311.63      129.56
   60             .    COV         PetalWidth         51.63      -12.16      129.56       58.10
   61    Setosa        STD                             3.52        3.79        1.74        1.05
   62    Versicolor    STD                             5.16        3.14        4.70        1.98
   63    Virginica     STD                             6.36        3.22        5.52        2.75
   64             .    PSTD                            5.15        3.40        4.30        2.05
   65             .    BSTD                            7.95        3.37       20.91        8.97
   66             .    STD                             8.28        4.36       17.65        7.62
   67    Setosa        CORR        SepalLength         1.00        0.74        0.27        0.28
   68    Setosa        CORR        SepalWidth          0.74        1.00        0.18        0.23
   69    Setosa        CORR        PetalLength         0.27        0.18        1.00        0.33
   70    Setosa        CORR        PetalWidth          0.28        0.23        0.33        1.00
  Discriminant Analysis of Fisher (1936) Iris Data   Output Discriminant Statistics   Sepal       Sepal       Petal      Petal   Obs     Species      _TYPE_      _NAME_           Length       Width      Length      Width   71    Versicolor    CORR        SepalLength       1.000       0.526       0.754      0.546   72    Versicolor    CORR        SepalWidth        0.526       1.000       0.561      0.664   73    Versicolor    CORR        PetalLength       0.754       0.561       1.000      0.787   74    Versicolor    CORR        PetalWidth        0.546       0.664       0.787      1.000   75    Virginica     CORR        SepalLength       1.000       0.457       0.864      0.281   76    Virginica     CORR        SepalWidth        0.457       1.000       0.401      0.538   77    Virginica     CORR        PetalLength       0.864       0.401       1.000      0.322   78    Virginica     CORR        PetalWidth        0.281       0.538       0.322      1.000   79             .    PCORR       SepalLength       1.000       0.530       0.756      0.365   80             .    PCORR       SepalWidth        0.530       1.000       0.378      0.471   81             .    PCORR       PetalLength       0.756       0.378       1.000      0.484   82             .    PCORR       PetalWidth        0.365       0.471       0.484      1.000   83             .    BCORR       SepalLength       1.000   0.745       0.994      1.000   84             .    BCORR       SepalWidth   0.745       1.000   0.813   0.759   85             .    BCORR       PetalLength       0.994   0.813       1.000      0.996   86             .    BCORR       PetalWidth        1.000   0.759       0.996      1.000   87             .    CORR        SepalLength       1.000   0.118       0.872      0.818   88             .    CORR        SepalWidth   0.118       1.000   0.428   0.366   89             .    CORR        PetalLength       0.872   0.428       1.000      0.963   90             .    
CORR        PetalWidth        0.818   0.366       0.963      1.000   91    Setosa        STDMEAN   1.011       0.850   1.301   1.251   92    Versicolor    STDMEAN                       0.112   0.659       0.284      0.166   93    Virginica     STDMEAN                       0.899   0.191       1.016      1.085   94    Setosa        PSTDMEAN   1.627       1.091   5.335   4.658   95    Versicolor    PSTDMEAN                      0.180   0.846       1.167      0.619   96    Virginica     PSTDMEAN                      1.447   0.245       4.169      4.039   97             .    LNDETERM                      8.462       8.462       8.462      8.462   98    Setosa        LNDETERM                      5.353       5.353       5.353      5.353   99    Versicolor    LNDETERM                      7.546       7.546       7.546      7.546   100   Virginica     LNDETERM                      9.494       9.494       9.494      9.494   101   Setosa        QUAD        SepalLength   0.095       0.062       0.023      0.024   102   Setosa        QUAD        SepalWidth        0.062   0.078   0.006      0.011   103   Setosa        QUAD        PetalLength       0.023   0.006   0.194      0.090   104   Setosa        QUAD        PetalWidth        0.024       0.011       0.090   0.530   105   Setosa        QUAD        _LINEAR_          4.455   0.762       3.356   3.126   106   Setosa        QUAD        _CONST_   121.826   121.826   121.826   121.826   107   Versicolor    QUAD        SepalLength   0.048       0.018       0.043   0.032   108   Versicolor    QUAD        SepalWidth        0.018   0.099   0.011      0.097   109   Versicolor    QUAD        PetalLength       0.043   0.011   0.099      0.135   110   Versicolor    QUAD        PetalWidth   0.032       0.097       0.135   0.436   111   Versicolor    QUAD        _LINEAR_          1.801       1.596       0.327   1.471   112   Versicolor    QUAD        _CONST_   76.549   76.549   76.549   76.549   113   Virginica     QUAD        SepalLength 
  0.053       0.017       0.050   0.009   114   Virginica     QUAD        SepalWidth        0.017   0.079   0.006      0.042   115   Virginica     QUAD        PetalLength       0.050   0.006   0.067      0.014   116   Virginica     QUAD        PetalWidth   0.009       0.042       0.014   0.097   117   Virginica     QUAD        _LINEAR_          0.737       1.325       0.623      0.966   118   Virginica     QUAD        _CONST_   75.821   75.821   75.821   75.821  
end example
 

Example 25.4. Linear Discriminant Analysis of Remote-Sensing Data on Crops

In this example, the remote-sensing data described at the beginning of the section are used. In the first PROC DISCRIM statement, the DISCRIM procedure uses normal-theory methods (METHOD=NORMAL), assuming equal within-class covariance matrices (POOL=YES) for the five crops. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The LIST option lists the resubstitution classification results for each observation (Output 25.4.2). The CROSSVALIDATE option displays cross validation error-rate estimates (Output 25.4.3). The OUTSTAT= option stores the calibration information in a new data set to classify future observations. A second PROC DISCRIM statement uses this calibration information to classify a test data set. Note that the values of the identification variable, xvalues, are obtained by rereading the x1 through x4 fields in the data lines as a single character variable. The following statements produce Output 25.4.1 through Output 25.4.3.

Output 25.4.1: Linear Discriminant Function on Crop Data
start example
  Discriminant Analysis of Remote Sensing Data on Five Crops
  Using Linear Discriminant Function

  The DISCRIM Procedure

  Observations    36          DF Total               35
  Variables        4          DF Within Classes      31
  Classes          5          DF Between Classes      4

  Class Level Information

                Variable                                                Prior
  Crop          Name         Frequency      Weight   Proportion   Probability
  Clover        Clover              11     11.0000     0.305556      0.305556
  Corn          Corn                 7      7.0000     0.194444      0.194444
  Cotton        Cotton               6      6.0000     0.166667      0.166667
  Soybeans      Soybeans             6      6.0000     0.166667      0.166667
  Sugarbeets    Sugarbeets           6      6.0000     0.166667      0.166667

  Pooled Covariance Matrix Information

                    Natural Log of the
    Covariance      Determinant of the
    Matrix Rank     Covariance Matrix
              4              21.30189

  Pairwise Generalized Squared Distances Between Groups

  $D^2(i|j) = (\bar{X}_i - \bar{X}_j)' \, \mathrm{COV}^{-1} (\bar{X}_i - \bar{X}_j) - 2 \ln \mathrm{PRIOR}_j$

  Generalized Squared Distance to Crop

  From Crop         Clover        Corn      Cotton    Soybeans  Sugarbeets
  Clover           2.37125     7.52830     4.44969     6.16665     5.07262
  Corn             6.62433     3.27522     5.46798     4.31383     6.47395
  Cotton           3.23741     5.15968     3.58352     5.01819     4.87908
  Soybeans         4.95438     4.00552     5.01819     3.58352     4.65998
  Sugarbeets       3.86034     6.16564     4.87908     4.65998     3.58352

  Linear Discriminant Function

  $\mathrm{Constant} = -.5 \, \bar{X}_j' \, \mathrm{COV}^{-1} \bar{X}_j + \ln \mathrm{PRIOR}_j$
  $\mathrm{Coefficient\ Vector} = \mathrm{COV}^{-1} \bar{X}_j$

  Linear Discriminant Function for Crop

  Variable          Clover        Corn      Cotton    Soybeans  Sugarbeets
  Constant       -10.98457    -7.72070   -11.46537    -7.28260    -9.80179
  x1               0.08907    -0.04180     0.02462   0.0000369     0.04245
  x2               0.17379     0.11970     0.17596     0.15896     0.20988
  x3               0.11899     0.16511     0.15880     0.10622     0.06540
  x4               0.15637     0.16768     0.18362     0.14133     0.16408
end example
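
As a rough illustration of how the coefficients in Output 25.4.1 turn into posterior probabilities, the following DATA step scores the first Corn observation (16 27 31 33) by hand. It is only a sketch: the array and variable names are made up for this illustration, and the coefficients are typed in from the printed output, so the results can agree with PROC DISCRIM only up to rounding. The score for crop j is the constant plus the coefficient vector times x, and the posterior probability is exp(score for crop j) divided by the sum of exp(score) over all crops, because the quadratic term in x is the same for every crop and cancels.

```sas
/* Sketch only: hand-score one observation with the printed linear  */
/* discriminant functions. All names below are illustrative.        */
data _null_;
   /* first Corn observation in the crops data */
   x1 = 16; x2 = 27; x3 = 31; x4 = 33;

   array crop[5] $10 _temporary_
      ('Clover' 'Corn' 'Cotton' 'Soybeans' 'Sugarbeets');

   /* constants and coefficients copied from Output 25.4.1 */
   array c0[5] _temporary_ (-10.98457 -7.72070 -11.46537 -7.28260 -9.80179);
   array b1[5] _temporary_ (0.08907 -0.04180 0.02462 0.0000369 0.04245);
   array b2[5] _temporary_ (0.17379 0.11970 0.17596 0.15896 0.20988);
   array b3[5] _temporary_ (0.11899 0.16511 0.15880 0.10622 0.06540);
   array b4[5] _temporary_ (0.15637 0.16768 0.18362 0.14133 0.16408);
   array score[5] _temporary_;

   /* linear score for each crop and the normalizing sum */
   total = 0;
   do j = 1 to 5;
      score[j] = c0[j] + b1[j]*x1 + b2[j]*x2 + b3[j]*x3 + b4[j]*x4;
      total = total + exp(score[j]);
   end;

   /* posterior probability for each crop, written to the log */
   do j = 1 to 5;
      name      = crop[j];
      s         = score[j];
      posterior = exp(s) / total;
      put name $10. s 10.4 posterior 8.4;
   end;
run;
```

The posterior probabilities written to the log should land close to the first row of the resubstitution listing in Output 25.4.2, with Corn the largest at about 0.41.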
 
Output 25.4.2: Misclassified Observations: Resubstitution
start example
  Discriminant Analysis of Remote Sensing Data on Five Crops
  Using Linear Discriminant Function

  The DISCRIM Procedure
  Classification Results for Calibration Data: WORK.CROPS
  Resubstitution Results using Linear Discriminant Function

  Generalized Squared Distance Function

  $D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}^{-1} (X - \bar{X}_j) - 2 \ln \mathrm{PRIOR}_j$

  Posterior Probability of Membership in Each Crop

  $\Pr(j|X) = \exp(-.5 \, D_j^2(X)) \,/\, \sum_k \exp(-.5 \, D_k^2(X))$

  Posterior Probability of Membership in Crop

                           Classified
  xvalues      From Crop   into Crop       Clover      Corn    Cotton  Soybeans  Sugarbeets
  16 27 31 33  Corn        Corn            0.0894    0.4054    0.1763    0.2392      0.0897
  15 23 30 30  Corn        Corn            0.0769    0.4558    0.1421    0.2530      0.0722
  16 27 27 26  Corn        Corn            0.0982    0.3422    0.1365    0.3073      0.1157
  18 20 25 23  Corn        Corn            0.1052    0.3634    0.1078    0.3281      0.0955
  15 15 31 32  Corn        Corn            0.0588    0.5754    0.1173    0.2087      0.0398
  15 32 32 15  Corn        Soybeans   *    0.0972    0.3278    0.1318    0.3420      0.1011
  12 15 16 73  Corn        Corn            0.0454    0.5238    0.1849    0.1376      0.1083
  20 23 23 25  Soybeans    Soybeans        0.1330    0.2804    0.1176    0.3305      0.1385
  24 24 25 32  Soybeans    Soybeans        0.1768    0.2483    0.1586    0.2660      0.1502
  21 25 23 24  Soybeans    Soybeans        0.1481    0.2431    0.1200    0.3318      0.1570
  27 45 24 12  Soybeans    Sugarbeets *    0.2357    0.0547    0.1016    0.2721      0.3359
  12 13 15 42  Soybeans    Corn       *    0.0549    0.4749    0.0920    0.2768      0.1013
  22 32 31 43  Soybeans    Cotton     *    0.1474    0.2606    0.2624    0.1848      0.1448
  31 32 33 34  Cotton      Clover     *    0.2815    0.1518    0.2377    0.1767      0.1523
  29 24 26 28  Cotton      Soybeans   *    0.2521    0.1842    0.1529    0.2549      0.1559
  34 32 28 45  Cotton      Clover     *    0.3125    0.1023    0.2404    0.1357      0.2091
  26 25 23 24  Cotton      Soybeans   *    0.2121    0.1809    0.1245    0.3045      0.1780
  53 48 75 26  Cotton      Clover     *    0.4837    0.0391    0.4384    0.0223      0.0166
  34 35 25 78  Cotton      Cotton          0.2256    0.0794    0.3810    0.0592      0.2548
  22 23 25 42  Sugarbeets  Corn       *    0.1421    0.3066    0.1901    0.2231      0.1381
  25 25 24 26  Sugarbeets  Soybeans   *    0.1969    0.2050    0.1354    0.2960      0.1667
  34 25 16 52  Sugarbeets  Sugarbeets      0.2928    0.0871    0.1665    0.1479      0.3056
  54 23 21 54  Sugarbeets  Clover     *    0.6215    0.0194    0.1250    0.0496      0.1845
  25 43 32 15  Sugarbeets  Soybeans   *    0.2258    0.1135    0.1646    0.2770      0.2191
  26 54  2 54  Sugarbeets  Sugarbeets      0.0850    0.0081    0.0521    0.0661      0.7887
  12 45 32 54  Clover      Cotton     *    0.0693    0.2663    0.3394    0.1460      0.1789
  24 58 25 34  Clover      Sugarbeets *    0.1647    0.0376    0.1680    0.1452      0.4845
  87 54 61 21  Clover      Clover          0.9328    0.0003    0.0478    0.0025      0.0165
  51 31 31 16  Clover      Clover          0.6642    0.0205    0.0872    0.0959      0.1322
  96 48 54 62  Clover      Clover          0.9215    0.0002    0.0604    0.0007      0.0173
  31 31 11 11  Clover      Sugarbeets *    0.2525    0.0402    0.0473    0.3012      0.3588
  56 13 13 71  Clover      Clover          0.6132    0.0212    0.1226    0.0408      0.2023
  32 13 27 32  Clover      Clover          0.2669    0.2616    0.1512    0.2260      0.0943
  36 26 54 32  Clover      Cotton     *    0.2650    0.2645    0.3495    0.0918      0.0292
  53 08 06 54  Clover      Clover          0.5914    0.0237    0.0676    0.0781      0.2392
  32 32 62 16  Clover      Cotton     *    0.2163    0.3180    0.3327    0.1125      0.0206

  * Misclassified observation

  Classification Summary for Calibration Data: WORK.CROPS
  Resubstitution Summary using Linear Discriminant Function

  Number of Observations and Percent Classified into Crop

  From Crop        Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Clover                6          0          3          0          2         11
                    54.55       0.00      27.27       0.00      18.18     100.00
  Corn                  0          6          0          1          0          7
                     0.00      85.71       0.00      14.29       0.00     100.00
  Cotton                3          0          1          2          0          6
                    50.00       0.00      16.67      33.33       0.00     100.00
  Soybeans              0          1          1          3          1          6
                     0.00      16.67      16.67      50.00      16.67     100.00
  Sugarbeets            1          1          0          2          2          6
                    16.67      16.67       0.00      33.33      33.33     100.00
  Total                10          8          5          8          5         36
                    27.78      22.22      13.89      22.22      13.89     100.00
  Priors          0.30556    0.19444    0.16667    0.16667    0.16667

  Error Count Estimates for Crop

                   Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Rate             0.4545     0.1429     0.8333     0.5000     0.6667     0.5000
  Priors           0.3056     0.1944     0.1667     0.1667     0.1667
end example
 
Output 25.4.3: Misclassified Observations: Cross Validation
start example
  Discriminant Analysis of Remote Sensing Data on Five Crops
  Using Linear Discriminant Function

  The DISCRIM Procedure
  Classification Summary for Calibration Data: WORK.CROPS
  Cross-validation Summary using Linear Discriminant Function

  Generalized Squared Distance Function

  $D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)}^{-1} (X - \bar{X}_{(X)j}) - 2 \ln \mathrm{PRIOR}_j$

  Posterior Probability of Membership in Each Crop

  $\Pr(j|X) = \exp(-.5 \, D_j^2(X)) \,/\, \sum_k \exp(-.5 \, D_k^2(X))$

  Number of Observations and Percent Classified into Crop

  From Crop        Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Clover                4          3          1          0          3         11
                    36.36      27.27       9.09       0.00      27.27     100.00
  Corn                  0          4          1          2          0          7
                     0.00      57.14      14.29      28.57       0.00     100.00
  Cotton                3          0          0          2          1          6
                    50.00       0.00       0.00      33.33      16.67     100.00
  Soybeans              0          1          1          3          1          6
                     0.00      16.67      16.67      50.00      16.67     100.00
  Sugarbeets            2          1          0          2          1          6
                    33.33      16.67       0.00      33.33      16.67     100.00
  Total                 9          9          3          9          6         36
                    25.00      25.00       8.33      25.00      16.67     100.00
  Priors          0.30556    0.19444    0.16667    0.16667    0.16667

  Error Count Estimates for Crop

                   Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Rate             0.6364     0.4286     1.0000     0.5000     0.8333     0.6667
  Priors           0.3056     0.1944     0.1667     0.1667     0.1667
end example
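
The Total column of the error count estimates in Output 25.4.2 and Output 25.4.3 is the prior-weighted average of the per-crop error rates. The following DATA step is a minimal sketch of that arithmetic, with the misclassification counts read off the two summary tables. The array and variable names are invented for this illustration and are not part of PROC DISCRIM.

```sas
/* Sketch only: total error count estimate as a prior-weighted average. */
data _null_;
   /* class order: Clover Corn Cotton Soybeans Sugarbeets */
   array n[5]      _temporary_ (11 7 6 6 6);  /* class frequencies              */
   array resub[5]  _temporary_ (5 1 5 3 4);   /* misclassified, resubstitution  */
   array crossv[5] _temporary_ (7 3 6 3 5);   /* misclassified, cross validation */

   ntotal = 0;
   do j = 1 to 5;
      ntotal = ntotal + n[j];
   end;

   resub_total = 0;
   cv_total    = 0;
   do j = 1 to 5;
      prior       = n[j] / ntotal;   /* PRIORS PROP: prior = class proportion */
      resub_total = resub_total + prior * resub[j]  / n[j];
      cv_total    = cv_total    + prior * crossv[j] / n[j];
   end;

   put resub_total= 6.4 cv_total= 6.4;
run;
```

With proportional priors the weights cancel the class sizes, so each total reduces to the number of misclassified observations divided by 36: 18/36 = 0.5000 for resubstitution and 24/36 = 0.6667 for cross validation, matching the Total column in both outputs.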
 
   data crops;
      title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
      /* Column input: the data lines below begin in column 4, so Crop   */
      /* occupies columns 4-13. xvalues rereads columns 14-24, the x1    */
      /* through x4 fields, as a single character variable.              */
      input Crop $ 4-13 x1-x4 xvalues $ 14-24;
      datalines;
   Corn      16 27 31 33
   Corn      15 23 30 30
   Corn      16 27 27 26
   Corn      18 20 25 23
   Corn      15 15 31 32
   Corn      15 32 32 15
   Corn      12 15 16 73
   Soybeans  20 23 23 25
   Soybeans  24 24 25 32
   Soybeans  21 25 23 24
   Soybeans  27 45 24 12
   Soybeans  12 13 15 42
   Soybeans  22 32 31 43
   Cotton    31 32 33 34
   Cotton    29 24 26 28
   Cotton    34 32 28 45
   Cotton    26 25 23 24
   Cotton    53 48 75 26
   Cotton    34 35 25 78
   Sugarbeets22 23 25 42
   Sugarbeets25 25 24 26
   Sugarbeets34 25 16 52
   Sugarbeets54 23 21 54
   Sugarbeets25 43 32 15
   Sugarbeets26 54  2 54
   Clover    12 45 32 54
   Clover    24 58 25 34
   Clover    87 54 61 21
   Clover    51 31 31 16
   Clover    96 48 54 62
   Clover    31 31 11 11
   Clover    56 13 13 71
   Clover    32 13 27 32
   Clover    36 26 54 32
   Clover    53 08 06 54
   Clover    32 32 62 16
   ;

   proc discrim data=crops outstat=cropstat
                method=normal pool=yes
                list crossvalidate;
      class Crop;
      priors prop;
      id xvalues;
      var x1-x4;
      title2 'Using Linear Discriminant Function';
   run;
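
To see what the OUTSTAT= data set holds before reusing it, a short PROC PRINT step can help. The sketch below assumes that the PROC DISCRIM step above has already created WORK.CROPSTAT and that, as in the OUTSTAT= listing for the iris data, the discriminant function coefficients are identified by the _TYPE_ variable (here assumed to be 'LINEAR' for a pooled-covariance analysis).

```sas
/* Sketch only: print the stored linear discriminant coefficients.     */
/* Assumes WORK.CROPSTAT exists and uses _TYPE_='LINEAR' for the rows  */
/* that hold the linear discriminant functions.                        */
proc print data=cropstat noobs;
   where _TYPE_ = 'LINEAR';
run;
```

If the WHERE clause selects no observations, printing the whole data set without it shows which _TYPE_ values are present.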

Now use the calibration information stored in the Cropstat data set to classify a test data set. The TESTLIST option lists the classification results for each observation in the test data set. The following statements produce Output 25.4.4 and Output 25.4.5:

   data test;
      /* Same layout as the crops data: the data lines begin in column 4, */
      /* so Crop is in columns 4-13 and xvalues rereads columns 14-24.    */
      input Crop $ 4-13 x1-x4 xvalues $ 14-24;
      datalines;
   Corn      16 27 31 33
   Soybeans  21 25 23 24
   Cotton    29 24 26 28
   Sugarbeets54 23 21 54
   Clover    32 32 62 16
   ;

   proc discrim data=cropstat testdata=test testout=tout
                testlist;
      class Crop;
      testid xvalues;
      var x1-x4;
      title2 'Classification of Test Data';
   run;

   proc print data=tout;
      title2 'Output Classification Results of Test Data';
   run;
Output 25.4.4: Classification of Test Data
start example
  Discriminant Analysis of Remote Sensing Data on Five Crops
  Classification of Test Data

  The DISCRIM Procedure
  Classification Results for Test Data: WORK.TEST
  Classification Results using Linear Discriminant Function

  Generalized Squared Distance Function

  $D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}^{-1} (X - \bar{X}_j)$

  Posterior Probability of Membership in Each Crop

  $\Pr(j|X) = \exp(-.5 \, D_j^2(X)) \,/\, \sum_k \exp(-.5 \, D_k^2(X))$

  Posterior Probability of Membership in Crop

                           Classified
  xvalues      From Crop   into Crop       Clover      Corn    Cotton  Soybeans  Sugarbeets
  16 27 31 33  Corn        Corn            0.0894    0.4054    0.1763    0.2392      0.0897
  21 25 23 24  Soybeans    Soybeans        0.1481    0.2431    0.1200    0.3318      0.1570
  29 24 26 28  Cotton      Soybeans   *    0.2521    0.1842    0.1529    0.2549      0.1559
  54 23 21 54  Sugarbeets  Clover     *    0.6215    0.0194    0.1250    0.0496      0.1845
  32 32 62 16  Clover      Cotton     *    0.2163    0.3180    0.3327    0.1125      0.0206

  * Misclassified observation

  Classification Summary for Test Data: WORK.TEST
  Classification Summary using Linear Discriminant Function

  Number of Observations and Percent Classified into Crop

  From Crop        Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Clover                0          0          1          0          0          1
                     0.00       0.00     100.00       0.00       0.00     100.00
  Corn                  0          1          0          0          0          1
                     0.00     100.00       0.00       0.00       0.00     100.00
  Cotton                0          0          0          1          0          1
                     0.00       0.00       0.00     100.00       0.00     100.00
  Soybeans              0          0          0          1          0          1
                     0.00       0.00       0.00     100.00       0.00     100.00
  Sugarbeets            1          0          0          0          0          1
                   100.00       0.00       0.00       0.00       0.00     100.00
  Total                 1          1          1          2          0          5
                    20.00      20.00      20.00      40.00       0.00     100.00
  Priors          0.30556    0.19444    0.16667    0.16667    0.16667

  Error Count Estimates for Crop

                   Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Rate             1.0000     0.0000     1.0000     0.0000     1.0000     0.6389
  Priors           0.3056     0.1944     0.1667     0.1667     0.1667
end example
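
The total error count estimate of 0.6389 in Output 25.4.4 is not the raw fraction of misclassified test observations (3 of 5, or 0.6). As the Priors row suggests, it appears to be the per-crop error rates weighted by the prior probabilities carried over from the calibration data, which can be checked by hand from the printed values:

```latex
\[
\sum_j \mathrm{PRIOR}_j \cdot \mathrm{Rate}_j
  = 0.30556(1) + 0.19444(0) + 0.16667(1) + 0.16667(0) + 0.16667(1)
  = 0.6389
\]
```

With only one test observation per crop, each per-crop rate is either 0 or 1, so this total mainly reflects which crops happened to be misclassified.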
 
Output 25.4.5: Output Data Set of the Classification Results for Test Data
start example
  Discriminant Analysis of Remote Sensing Data on Five Crops
  Output Classification Results of Test Data

  Obs  Crop        x1  x2  x3  x4  xvalues       Clover     Corn   Cotton  Soybeans  Sugarbeets  _INTO_
    1  Corn        16  27  31  33  16 27 31 33  0.08935  0.40543  0.17632   0.23918     0.08972  Corn
    2  Soybeans    21  25  23  24  21 25 23 24  0.14811  0.24308  0.11999   0.33184     0.15698  Soybeans
    3  Cotton      29  24  26  28  29 24 26 28  0.25213  0.18420  0.15294   0.25486     0.15588  Soybeans
    4  Sugarbeets  54  23  21  54  54 23 21 54  0.62150  0.01937  0.12498   0.04962     0.18452  Clover
    5  Clover      32  32  62  16  32 32 62 16  0.21633  0.31799  0.33266   0.11246     0.02056  Cotton
end example
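
Because the TESTOUT= data set keeps both the actual crop and the predicted crop (the _INTO_ variable shown in Output 25.4.5), it can be cross-tabulated directly. The following step is a small sketch that assumes WORK.TOUT was created by the statements above; the choice of PROC FREQ and its table options is ours, not part of the original example.

```sas
/* Sketch only: actual crop (rows) by predicted crop (columns).   */
/* Assumes WORK.TOUT was created by the TESTOUT= option above.    */
proc freq data=tout;
   tables Crop*_INTO_ / norow nocol nopercent;
run;
```

The resulting counts should agree with the classification summary in Output 25.4.4, with off-diagonal cells marking the misclassified test observations.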
 

Example 25.5. Quadratic Discriminant Analysis of Remote-Sensing Data on Crops

In this example, PROC DISCRIM uses normal-theory methods (METHOD=NORMAL) without assuming equal within-class covariance matrices (POOL=NO) for the remote-sensing data of Example 25.4. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The CROSSVALIDATE option displays cross validation error-rate estimates. Note that the total error count estimate by cross validation (0.5556) is much larger than the total error count estimate by resubstitution (0.1111). The following statements produce Output 25.5.1:

   proc discrim data=crops
                method=normal pool=no
                crossvalidate;
      class Crop;
      priors prop;
      id xvalues;
      var x1-x4;
      title2 'Using Quadratic Discriminant Function';
   run;
Output 25.5.1: Quadratic Discriminant Function on Crop Data
start example
  Discriminant Analysis of Remote Sensing Data on Five Crops
  Using Quadratic Discriminant Function

  The DISCRIM Procedure

  Observations    36          DF Total               35
  Variables        4          DF Within Classes      31
  Classes          5          DF Between Classes      4

  Class Level Information

                Variable                                                Prior
  Crop          Name         Frequency      Weight   Proportion   Probability
  Clover        Clover              11     11.0000     0.305556      0.305556
  Corn          Corn                 7      7.0000     0.194444      0.194444
  Cotton        Cotton               6      6.0000     0.166667      0.166667
  Soybeans      Soybeans             6      6.0000     0.166667      0.166667
  Sugarbeets    Sugarbeets           6      6.0000     0.166667      0.166667

  Within Covariance Matrix Information

                                  Natural Log of the
                Covariance        Determinant of the
  Crop          Matrix Rank       Covariance Matrix
  Clover                  4                23.64618
  Corn                    4                11.13472
  Cotton                  4                13.23569
  Soybeans                4                12.45263
  Sugarbeets              4                17.76293

  Pairwise Generalized Squared Distances Between Groups

  $D^2(i|j) = (\bar{X}_i - \bar{X}_j)' \, \mathrm{COV}_j^{-1} (\bar{X}_i - \bar{X}_j) + \ln |\mathrm{COV}_j| - 2 \ln \mathrm{PRIOR}_j$

  Generalized Squared Distance to Crop

  From Crop         Clover        Corn      Cotton    Soybeans  Sugarbeets
  Clover          26.01743        1320   104.18297   194.10546    31.40816
  Corn            27.73809    14.40994   150.50763    38.36252    25.55421
  Cotton          26.38544   588.86232    16.81921    52.03266    37.15560
  Soybeans        27.07134    46.42131    41.01631    16.03615    23.15920
  Sugarbeets      26.80188   332.11563    43.98280   107.95676    21.34645

  Classification Summary for Calibration Data: WORK.CROPS
  Resubstitution Summary using Quadratic Discriminant Function

  Generalized Squared Distance Function

  $D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}_j^{-1} (X - \bar{X}_j) + \ln |\mathrm{COV}_j| - 2 \ln \mathrm{PRIOR}_j$

  Posterior Probability of Membership in Each Crop

  $\Pr(j|X) = \exp(-.5 \, D_j^2(X)) \,/\, \sum_k \exp(-.5 \, D_k^2(X))$

  Number of Observations and Percent Classified into Crop

  From Crop        Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Clover                9          0          0          0          2         11
                    81.82       0.00       0.00       0.00      18.18     100.00
  Corn                  0          7          0          0          0          7
                     0.00     100.00       0.00       0.00       0.00     100.00
  Cotton                0          0          6          0          0          6
                     0.00       0.00     100.00       0.00       0.00     100.00
  Soybeans              0          0          0          6          0          6
                     0.00       0.00       0.00     100.00       0.00     100.00
  Sugarbeets            0          0          1          1          4          6
                     0.00       0.00      16.67      16.67      66.67     100.00
  Total                 9          7          7          7          6         36
                    25.00      19.44      19.44      19.44      16.67     100.00
  Priors          0.30556    0.19444    0.16667    0.16667    0.16667

  Error Count Estimates for Crop

                   Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Rate             0.1818     0.0000     0.0000     0.0000     0.3333     0.1111
  Priors           0.3056     0.1944     0.1667     0.1667     0.1667

  Classification Summary for Calibration Data: WORK.CROPS
  Cross-validation Summary using Quadratic Discriminant Function

  Generalized Squared Distance Function

  $D_j^2(X) = (X - \bar{X}_{(X)j})' \, \mathrm{COV}_{(X)j}^{-1} (X - \bar{X}_{(X)j}) + \ln |\mathrm{COV}_{(X)j}| - 2 \ln \mathrm{PRIOR}_j$

  Posterior Probability of Membership in Each Crop

  $\Pr(j|X) = \exp(-.5 \, D_j^2(X)) \,/\, \sum_k \exp(-.5 \, D_k^2(X))$

  Number of Observations and Percent Classified into Crop

  From Crop        Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Clover                9          0          0          0          2         11
                    81.82       0.00       0.00       0.00      18.18     100.00
  Corn                  3          2          0          0          2          7
                    42.86      28.57       0.00       0.00      28.57     100.00
  Cotton                3          0          2          0          1          6
                    50.00       0.00      33.33       0.00      16.67     100.00
  Soybeans              3          0          0          2          1          6
                    50.00       0.00       0.00      33.33      16.67     100.00
  Sugarbeets            3          0          1          1          1          6
                    50.00       0.00      16.67      16.67      16.67     100.00
  Total                21          2          3          3          7         36
                    58.33       5.56       8.33       8.33      19.44     100.00
  Priors          0.30556    0.19444    0.16667    0.16667    0.16667

  Error Count Estimates for Crop

                   Clover       Corn     Cotton   Soybeans Sugarbeets      Total
  Rate             0.1818     0.7143     0.6667     0.6667     0.8333     0.5556
  Priors           0.3056     0.1944     0.1667     0.1667     0.1667
end example
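
The two totals quoted in the text of this example can be recovered from the off-diagonal counts in Output 25.5.1. Because PRIORS PROP makes each prior equal to the class frequency divided by 36, the prior-weighted error rate reduces to the number of misclassified observations over 36 (crops in the order Clover, Corn, Cotton, Soybeans, Sugarbeets):

```latex
\[
\text{resubstitution: } \frac{2 + 0 + 0 + 0 + 2}{36} = \frac{4}{36} \approx 0.1111,
\qquad
\text{cross validation: } \frac{2 + 5 + 4 + 4 + 5}{36} = \frac{20}{36} \approx 0.5556
\]
```

A gap of this size is plausibly what one would expect when each within-class covariance matrix (10 distinct elements for four variables) is estimated from as few as six observations, in which case the resubstitution estimate is optimistic.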
 



SAS/STAT 9.1 User's Guide (Vol. 2)
ISBN: B003ZVJDOK
EAN: N/A
Year: 2004
Pages: 92
