Example


Example 67.1. Performing a Stepwise Discriminant Analysis

The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on fifty iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica.

  proc format;
     value specname
        1='Setosa    '
        2='Versicolor'
        3='Virginica ';

  data iris;
     title 'Fisher (1936) Iris Data';
     input SepalLength SepalWidth PetalLength PetalWidth
           Species @@;
     format Species specname.;
     label SepalLength='Sepal Length in mm.'
           SepalWidth ='Sepal Width in mm.'
           PetalLength='Petal Length in mm.'
           PetalWidth ='Petal Width in mm.';
     datalines;
  50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
  63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
  59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
  65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
  68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
  77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
  49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
  64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
  55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
  49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
  67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
  77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
  50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
  61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
  61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
  51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
  51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
  46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
  50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
  57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
  71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
  49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
  49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
  66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
  44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
  47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
  74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
  56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
  49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
  56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
  51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
  54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
  61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
  68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
  45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
  55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
  51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
  63 33 60 25 3 53 37 15 02 1
  ;

The following analysis uses stepwise selection, which is the default method in PROC STEPDISC (METHOD=STEPWISE).

In the PROC STEPDISC statement, the BSSCP and TSSCP options display the between-class SSCP matrix and the total-sample corrected SSCP matrix. By default, the significance level of an F test from an analysis of covariance is used as the selection criterion. The variable under consideration is the dependent variable, and the variables already chosen act as covariates. The following SAS statements produce Output 67.1.1 through Output 67.1.8:

  proc stepdisc data=iris bsscp tsscp;
     class Species;
     var SepalLength SepalWidth PetalLength PetalWidth;
  run;

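The call above relies on the procedure's defaults. As a sketch, the selection method and the two significance levels reported in Output 67.1.1 can also be written out explicitly with the METHOD=, SLENTRY=, and SLSTAY= options. This variant is an illustration, not part of the original example, and is expected to select the same variables:

```sas
/* Sketch: same analysis with the default settings spelled out.
   The 0.15 thresholds are the values reported in Output 67.1.1. */
proc stepdisc data=iris method=stepwise slentry=0.15 slstay=0.15
              bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Lowering SLENTRY= below the Step 4 p-value of 0.0103 (for example, to 0.01) would presumably keep SepalLength out of the model, since all other entry and removal tests in this example have p-values below 0.0001.
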
Output 67.1.1: Iris Data: Summary Information

  Fisher (1936) Iris Data

  The STEPDISC Procedure

  The Method for Selecting Variables is STEPWISE

  Observations    150    Variable(s) in the Analysis      4
  Class Levels      3    Variable(s) will be Included     0
                         Significance Level to Enter   0.15
                         Significance Level to Stay    0.15

  Class Level Information

              Variable
  Species     Name        Frequency   Weight  Proportion
  Setosa      Setosa             50  50.0000    0.333333
  Versicolor  Versicolor         50  50.0000    0.333333
  Virginica   Virginica          50  50.0000    0.333333
 
Output 67.1.2: Iris Data: Between-Class and Total-Sample SSCP Matrices

  Fisher (1936) Iris Data

  The STEPDISC Procedure

  Between-Class SSCP Matrix

  Variable     Label                SepalLength   SepalWidth  PetalLength   PetalWidth
  SepalLength  Sepal Length in mm.   6321.21333  -1995.26667  16524.84000   7127.93333
  SepalWidth   Sepal Width in mm.   -1995.26667   1134.49333  -5723.96000  -2293.26667
  PetalLength  Petal Length in mm.  16524.84000  -5723.96000  43710.28000  18677.40000
  PetalWidth   Petal Width in mm.    7127.93333  -2293.26667  18677.40000   8041.33333

  Total-Sample SSCP Matrix

  Variable     Label                SepalLength   SepalWidth  PetalLength   PetalWidth
  SepalLength  Sepal Length in mm.  10216.83333   -632.26667  18987.30000   7692.43333
  SepalWidth   Sepal Width in mm.    -632.26667   2830.69333  -4911.88000  -1812.42667
  PetalLength  Petal Length in mm.  18987.30000  -4911.88000  46432.54000  19304.58000
  PetalWidth   Petal Width in mm.    7692.43333  -1812.42667  19304.58000   8656.99333
 
Output 67.1.3: Iris Data: Stepwise Selection Step 1

  Fisher (1936) Iris Data

  The STEPDISC Procedure

  Stepwise Selection: Step 1

  Statistics for Entry, DF = 2, 147

  Variable     Label                R-Square  F Value  Pr > F  Tolerance
  SepalLength  Sepal Length in mm.    0.6187   119.26  <.0001     1.0000
  SepalWidth   Sepal Width in mm.     0.4008    49.16  <.0001     1.0000
  PetalLength  Petal Length in mm.    0.9414  1180.16  <.0001     1.0000
  PetalWidth   Petal Width in mm.     0.9289   960.01  <.0001     1.0000

  Variable PetalLength will be entered.

  Variable(s) that have been Entered

  PetalLength

  Multivariate Statistics

  Statistic                                 Value  F Value  Num DF  Den DF  Pr > F
  Wilks' Lambda                          0.058628  1180.16       2     147  <.0001
  Pillai's Trace                         0.941372  1180.16       2     147  <.0001
  Average Squared Canonical Correlation  0.470686
 
Output 67.1.4: Iris Data: Stepwise Selection Step 2

  Fisher (1936) Iris Data

  The STEPDISC Procedure

  Stepwise Selection: Step 2

  Statistics for Removal, DF = 2, 147

  Variable     Label                R-Square  F Value  Pr > F
  PetalLength  Petal Length in mm.    0.9414  1180.16  <.0001

  No variables can be removed.

  Statistics for Entry, DF = 2, 146

                                     Partial
  Variable     Label                R-Square  F Value  Pr > F  Tolerance
  SepalLength  Sepal Length in mm.    0.3198    34.32  <.0001     0.2400
  SepalWidth   Sepal Width in mm.     0.3709    43.04  <.0001     0.8164
  PetalWidth   Petal Width in mm.     0.2533    24.77  <.0001     0.0729

  Variable SepalWidth will be entered.

  Variable(s) that have been Entered

  SepalWidth PetalLength

  Multivariate Statistics

  Statistic                                 Value  F Value  Num DF  Den DF  Pr > F
  Wilks' Lambda                          0.036884   307.10       4     292  <.0001
  Pillai's Trace                         1.119908    93.53       4     294  <.0001
  Average Squared Canonical Correlation  0.559954
 
Output 67.1.5: Iris Data: Stepwise Selection Step 3

  Fisher (1936) Iris Data

  The STEPDISC Procedure

  Stepwise Selection: Step 3

  Statistics for Removal, DF = 2, 146

                                     Partial
  Variable     Label                R-Square  F Value  Pr > F
  SepalWidth   Sepal Width in mm.     0.3709    43.04  <.0001
  PetalLength  Petal Length in mm.    0.9384  1112.95  <.0001

  No variables can be removed.

  Statistics for Entry, DF = 2, 145

                                     Partial
  Variable     Label                R-Square  F Value  Pr > F  Tolerance
  SepalLength  Sepal Length in mm.    0.1447    12.27  <.0001     0.1323
  PetalWidth   Petal Width in mm.     0.3229    34.57  <.0001     0.0662

  Variable PetalWidth will be entered.

  Variable(s) that have been Entered

  SepalWidth PetalLength PetalWidth

  Multivariate Statistics

  Statistic                                 Value  F Value  Num DF  Den DF  Pr > F
  Wilks' Lambda                          0.024976   257.50       6     290  <.0001
  Pillai's Trace                         1.189914    71.49       6     292  <.0001
  Average Squared Canonical Correlation  0.594957
 
Output 67.1.6: Iris Data: Stepwise Selection Step 4

  Fisher (1936) Iris Data

  The STEPDISC Procedure

  Stepwise Selection: Step 4

  Statistics for Removal, DF = 2, 145

                                     Partial
  Variable     Label                R-Square  F Value  Pr > F
  SepalWidth   Sepal Width in mm.     0.4295    54.58  <.0001
  PetalLength  Petal Length in mm.    0.3482    38.72  <.0001
  PetalWidth   Petal Width in mm.     0.3229    34.57  <.0001

  No variables can be removed.

  Statistics for Entry, DF = 2, 144

                                     Partial
  Variable     Label                R-Square  F Value  Pr > F  Tolerance
  SepalLength  Sepal Length in mm.    0.0615     4.72  0.0103     0.0320

  Variable SepalLength will be entered.

  All variables have been entered.

  Multivariate Statistics

  Statistic                                 Value  F Value  Num DF  Den DF  Pr > F
  Wilks' Lambda                          0.023439   199.15       8     288  <.0001
  Pillai's Trace                         1.191899    53.47       8     290  <.0001
  Average Squared Canonical Correlation  0.595949
 
Output 67.1.7: Iris Data: Stepwise Selection Step 5

  Fisher (1936) Iris Data

  The STEPDISC Procedure

  Stepwise Selection: Step 5

  Statistics for Removal, DF = 2, 144

                                     Partial
  Variable     Label                R-Square  F Value  Pr > F
  SepalLength  Sepal Length in mm.    0.0615     4.72  0.0103
  SepalWidth   Sepal Width in mm.     0.2335    21.94  <.0001
  PetalLength  Petal Length in mm.    0.3308    35.59  <.0001
  PetalWidth   Petal Width in mm.     0.2570    24.90  <.0001

  No variables can be removed.

  No further steps are possible.
 
Output 67.1.8: Iris Data: Stepwise Selection Summary

  Fisher (1936) Iris Data

  The STEPDISC Procedure

  Stepwise Selection Summary

                                                                                                              Average
                                                                                                              Squared
        Number                                              Partial                       Wilks'    Pr <    Canonical    Pr >
  Step      In  Entered      Removed  Label                R-Square  F Value  Pr > F      Lambda  Lambda  Correlation    ASCC
     1       1  PetalLength           Petal Length in mm.    0.9414  1180.16  <.0001  0.05862828  <.0001   0.47068586  <.0001
     2       2  SepalWidth            Sepal Width in mm.     0.3709    43.04  <.0001  0.03688411  <.0001   0.55995394  <.0001
     3       3  PetalWidth            Petal Width in mm.     0.3229    34.57  <.0001  0.02497554  <.0001   0.59495691  <.0001
     4       4  SepalLength           Sepal Length in mm.    0.0615     4.72  0.0103  0.02343863  <.0001   0.59594941  <.0001
 

In Step 1, the tolerance is 1.0 for each variable under consideration because no variables have yet entered the model. Variable PetalLength is selected because its F statistic, 1180.161, is the largest among all variables.
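
As a rough cross-check, the Step 1 statistics for PetalLength can be recomputed from the diagonal entries of the SSCP matrices in Output 67.1.2. The sketch below assumes the usual one-way analysis of variance relations: R-square is the between-class sum of squares divided by the total sum of squares, and the F statistic rescales R-square by the degrees of freedom 2 and 147. The DATA step and its variable names are illustrative and not part of the original example.

```sas
/* Sketch: recompute the Step 1 entry statistics for PetalLength
   from the PetalLength diagonal entries in Output 67.1.2. */
data _null_;
   between = 43710.28;   /* between-class sum of squares          */
   total   = 46432.54;   /* total-sample corrected sum of squares */
   n       = 150;        /* observations                          */
   c       = 3;          /* class levels                          */

   rsq    = between / total;                         /* R-Square        */
   f      = (rsq / (1 - rsq)) * ((n - c) / (c - 1)); /* F with 2, 147 DF */
   lambda = 1 - rsq;     /* Wilks' lambda with a single variable  */

   put rsq= 6.4 f= 8.2 lambda= 8.6;
run;
```

The values written to the log should agree with the R-Square, F Value, and Wilks' Lambda reported for PetalLength in Output 67.1.3, up to the rounding of the SSCP entries.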

In Step 2, with variable PetalLength already in the model, PetalLength is tested for removal before selecting a new variable for entry. Since PetalLength meets the criterion to stay, it is used as a covariate in the analysis of covariance for variable selection. Variable SepalWidth is selected because its F statistic, 43.035, is the largest among all variables not in the model and its associated tolerance, 0.8164, meets the criterion to enter. The process is repeated in Steps 3 and 4. Variable PetalWidth is entered in Step 3, and variable SepalLength is entered in Step 4.
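
The analysis of covariance behind the Step 2 entry test can be sketched with PROC GLM, which the original example does not use. Assuming the iris data set created above is available, the candidate variable SepalWidth is the response, PetalLength (already in the model) is the covariate, and Species is the classification effect. The Type III F test for Species, with 2 and 146 degrees of freedom, should correspond to the entry statistic for SepalWidth in Output 67.1.4.

```sas
/* Sketch: analysis of covariance for the Step 2 entry test of SepalWidth. */
proc glm data=iris;
   class Species;
   /* Candidate variable as response; entered variable as covariate */
   model SepalWidth = PetalLength Species / ss3;
run;
quit;
```

Replacing SepalWidth with SepalLength or PetalWidth in the MODEL statement gives the tests for the other two candidates at this step.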

Since no more variables can be added to or removed from the model, the procedure stops at Step 5 and displays a summary of the selection process.
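
The columns of the summary in Output 67.1.8 are related to one another, and a short DATA step can make that visible. The sketch below assumes two relations that are consistent with the displayed values: when a variable enters, Wilks' lambda is multiplied by one minus its partial R-square, and the average squared canonical correlation is Pillai's trace divided by the number of classes minus one. The array names are illustrative.

```sas
/* Sketch: rebuild the Wilks' lambda and ASCC columns of Output 67.1.8
   from the partial R-square values and Pillai's trace at each step. */
data _null_;
   /* Partial R-Square of the entered variable, Steps 1 to 4 */
   array prsq[4]   _temporary_ (0.9414 0.3709 0.3229 0.0615);
   /* Pillai's Trace after each entry, Steps 1 to 4 */
   array pillai[4] _temporary_ (0.941372 1.119908 1.189914 1.191899);

   classes = 3;
   lambda  = 1;                             /* no variables entered yet */
   do step = 1 to 4;
      lambda = lambda * (1 - prsq[step]);   /* update after the entry  */
      ascc   = pillai[step] / (classes - 1);
      put step= lambda= 10.6 ascc= 10.6;
   end;
run;
```

Because the partial R-square values are rounded to four decimals, the recomputed lambdas match the reported ones only to roughly three significant digits.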




SAS.STAT 9.1 Users Guide (Vol. 6)
Year: 2004
Pages: 127
