The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on fifty iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica.
proc format;
   value specname
      1='Setosa    '
      2='Versicolor'
      3='Virginica ';

data iris;
   title 'Fisher (1936) Iris Data';
   input SepalLength SepalWidth PetalLength PetalWidth Species @@;
   format Species specname.;
   label SepalLength='Sepal Length in mm.'
         SepalWidth ='Sepal Width in mm.'
         PetalLength='Petal Length in mm.'
         PetalWidth ='Petal Width in mm.';
   datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;
The discriminant analysis uses stepwise selection, the default method in PROC STEPDISC, to choose the variables that best discriminate among the three species.
In the PROC STEPDISC statement, the BSSCP and TSSCP options display the between-class SSCP matrix and the total-sample corrected SSCP matrix. By default, the significance level of an F test from an analysis of covariance is used as the selection criterion. The variable under consideration is the dependent variable, and the variables already chosen act as covariates. The following SAS statements produce Output 67.1.1 through Output 67.1.8:
proc stepdisc data=iris bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
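For readers who prefer to see the defaults spelled out, the same call can be sketched with the selection method and significance levels written explicitly. The values 0.15 are taken from the "Significance Level to Enter" and "Significance Level to Stay" lines of the output; this variant is an illustration and is not part of the original example.

```sas
/* Sketch: the same analysis with the default method and
   significance levels stated explicitly instead of implied. */
proc stepdisc data=iris method=stepwise slentry=0.15 slstay=0.15
              bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Writing the thresholds out makes it easier to see what would change if a stricter entry or stay criterion were wanted.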
Fisher (1936) Iris Data

The STEPDISC Procedure

The Method for Selecting Variables is STEPWISE

Observations     150     Variable(s) in the Analysis        4
Class Levels       3     Variable(s) will be Included       0
                         Significance Level to Enter     0.15
                         Significance Level to Stay      0.15

Class Level Information

             Variable
Species      Name          Frequency     Weight    Proportion
Setosa       Setosa               50    50.0000      0.333333
Versicolor   Versicolor           50    50.0000      0.333333
Virginica    Virginica            50    50.0000      0.333333
Fisher (1936) Iris Data

The STEPDISC Procedure

Between-Class SSCP Matrix

Variable     Label                SepalLength   SepalWidth  PetalLength   PetalWidth
SepalLength  Sepal Length in mm.   6321.21333   1995.26667  16524.84000   7127.93333
SepalWidth   Sepal Width in mm.    1995.26667   1134.49333   5723.96000   2293.26667
PetalLength  Petal Length in mm.  16524.84000   5723.96000  43710.28000  18677.40000
PetalWidth   Petal Width in mm.    7127.93333   2293.26667  18677.40000   8041.33333

Total-Sample SSCP Matrix

Variable     Label                SepalLength   SepalWidth  PetalLength   PetalWidth
SepalLength  Sepal Length in mm.  10216.83333    632.26667  18987.30000   7692.43333
SepalWidth   Sepal Width in mm.     632.26667   2830.69333   4911.88000   1812.42667
PetalLength  Petal Length in mm.  18987.30000   4911.88000  46432.54000  19304.58000
PetalWidth   Petal Width in mm.    7692.43333   1812.42667  19304.58000   8656.99333
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 1

Statistics for Entry, DF = 2, 147

Variable     Label                R-Square   F Value   Pr > F   Tolerance
SepalLength  Sepal Length in mm.    0.6187    119.26   <.0001      1.0000
SepalWidth   Sepal Width in mm.     0.4008     49.16   <.0001      1.0000
PetalLength  Petal Length in mm.    0.9414   1180.16   <.0001      1.0000
PetalWidth   Petal Width in mm.     0.9289    960.01   <.0001      1.0000

Variable PetalLength will be entered.

Variable(s) that have been Entered
PetalLength

Multivariate Statistics

Statistic                                 Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                          0.058628   1180.16        2      147   <.0001
Pillai's Trace                         0.941372   1180.16        2      147   <.0001
Average Squared Canonical Correlation  0.470686
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 2

Statistics for Removal, DF = 2, 147

Variable     Label                R-Square   F Value   Pr > F
PetalLength  Petal Length in mm.    0.9414   1180.16   <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 146

                                   Partial
Variable     Label                R-Square   F Value   Pr > F   Tolerance
SepalLength  Sepal Length in mm.    0.3198     34.32   <.0001      0.2400
SepalWidth   Sepal Width in mm.     0.3709     43.04   <.0001      0.8164
PetalWidth   Petal Width in mm.     0.2533     24.77   <.0001      0.0729

Variable SepalWidth will be entered.

Variable(s) that have been Entered
SepalWidth PetalLength

Multivariate Statistics

Statistic                                 Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                          0.036884    307.10        4      292   <.0001
Pillai's Trace                         1.119908     93.53        4      294   <.0001
Average Squared Canonical Correlation  0.559954
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 3

Statistics for Removal, DF = 2, 146

                                   Partial
Variable     Label                R-Square   F Value   Pr > F
SepalWidth   Sepal Width in mm.     0.3709     43.04   <.0001
PetalLength  Petal Length in mm.    0.9384   1112.95   <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 145

                                   Partial
Variable     Label                R-Square   F Value   Pr > F   Tolerance
SepalLength  Sepal Length in mm.    0.1447     12.27   <.0001      0.1323
PetalWidth   Petal Width in mm.     0.3229     34.57   <.0001      0.0662

Variable PetalWidth will be entered.

Variable(s) that have been Entered
SepalWidth PetalLength PetalWidth

Multivariate Statistics

Statistic                                 Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                          0.024976    257.50        6      290   <.0001
Pillai's Trace                         1.189914     71.49        6      292   <.0001
Average Squared Canonical Correlation  0.594957
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 4

Statistics for Removal, DF = 2, 145

                                   Partial
Variable     Label                R-Square   F Value   Pr > F
SepalWidth   Sepal Width in mm.     0.4295     54.58   <.0001
PetalLength  Petal Length in mm.    0.3482     38.72   <.0001
PetalWidth   Petal Width in mm.     0.3229     34.57   <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 144

                                   Partial
Variable     Label                R-Square   F Value   Pr > F   Tolerance
SepalLength  Sepal Length in mm.    0.0615      4.72   0.0103      0.0320

Variable SepalLength will be entered.

All variables have been entered.

Multivariate Statistics

Statistic                                 Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                          0.023439    199.15        8      288   <.0001
Pillai's Trace                         1.191899     53.47        8      290   <.0001
Average Squared Canonical Correlation  0.595949
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 5

Statistics for Removal, DF = 2, 144

                                   Partial
Variable     Label                R-Square   F Value   Pr > F
SepalLength  Sepal Length in mm.    0.0615      4.72   0.0103
SepalWidth   Sepal Width in mm.     0.2335     21.94   <.0001
PetalLength  Petal Length in mm.    0.3308     35.59   <.0001
PetalWidth   Petal Width in mm.     0.2570     24.90   <.0001

No variables can be removed.

No further steps are possible.
Fisher (1936) Iris Data

The STEPDISC Procedure

Stepwise Selection Summary

                                                                                                            Average
                                                                                                            Squared
      Number                                               Partial                       Wilks'    Pr <    Canonical    Pr >
Step      In  Entered      Removed  Label                R-Square  F Value  Pr > F      Lambda  Lambda  Correlation    ASCC
   1       1  PetalLength           Petal Length in mm.    0.9414  1180.16  <.0001  0.05862828  <.0001   0.47068586  <.0001
   2       2  SepalWidth            Sepal Width in mm.     0.3709    43.04  <.0001  0.03688411  <.0001   0.55995394  <.0001
   3       3  PetalWidth            Petal Width in mm.     0.3229    34.57  <.0001  0.02497554  <.0001   0.59495691  <.0001
   4       4  SepalLength           Sepal Length in mm.    0.0615     4.72  0.0103  0.02343863  <.0001   0.59594941  <.0001
In Step 1, the tolerance is 1.0 for each variable under consideration because no variables have yet entered the model. Variable PetalLength is selected because its F statistic, 1180.161, is the largest among all variables.
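The Step 1 entry statistics for PetalLength can be checked by hand from the diagonal entries of the two SSCP matrices. The DATA step below is a minimal sketch: it assumes that the Step 1 R-square is the between-class sum of squares divided by the total-sample sum of squares, and that the F statistic is the usual one-way ratio with 2 and 147 degrees of freedom. The variable names (between, total, rsquare, and so on) are chosen here for illustration.

```sas
/* Sketch: reproduce the Step 1 R-square and F value for PetalLength
   from the SSCP diagonals printed in the output. */
data _null_;
   between = 43710.28;   /* between-class SS for PetalLength          */
   total   = 46432.54;   /* total-sample corrected SS for PetalLength */
   ndf     = 2;          /* class levels minus 1                      */
   ddf     = 147;        /* observations minus class levels           */
   rsquare = between / total;
   f       = (rsquare / (1 - rsquare)) * (ddf / ndf);
   p       = 1 - probf(f, ndf, ddf);   /* upper-tail F probability */
   put rsquare= f= p=;
run;
```

The log should show values close to the printed R-Square of 0.9414 and F Value of 1180.16. Substituting the diagonal entries for another variable gives its Step 1 row in the same way.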
In Step 2, with variable PetalLength already in the model, PetalLength is tested for removal before selecting a new variable for entry. Since PetalLength meets the criterion to stay, it is used as a covariate in the analysis of covariance for variable selection. Variable SepalWidth is selected because its F statistic, 43.035, is the largest among all variables not in the model and its associated tolerance, 0.8164, meets the criterion to enter. The process is repeated in Steps 3 and 4. Variable PetalWidth is entered in Step 3, and variable SepalLength is entered in Step 4.
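The two Step 2 quantities quoted for SepalWidth can be checked against the printed output. The relations below are a sketch, not a quotation of the procedure's formulas: they assume the standard analysis-of-covariance F ratio for a partial R-square and assume that the tolerance is one minus the squared total-sample correlation between SepalWidth and PetalLength. The symbols $R_p^2$ and $t_{jk}$ (an entry of the Total-Sample SSCP Matrix) are introduced here for illustration.

```latex
% Entry F statistic for SepalWidth at Step 2, DF = (2, 146)
F = \frac{R_p^2}{1 - R_p^2} \cdot \frac{146}{2}
  = \frac{0.3709}{0.6291} \times 73
  \approx 43.04

% Tolerance of SepalWidth given PetalLength
\text{tolerance}
  = 1 - \frac{t_{jk}^2}{t_{jj}\, t_{kk}}
  = 1 - \frac{4911.88^2}{2830.69333 \times 46432.54}
  \approx 1 - 0.1836
  = 0.8164
```

Both results agree with the Step 2 entry table to the precision printed there.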
Since no more variables can be added to or removed from the model, the procedure stops at Step 5 and displays a summary of the selection process.
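The columns of the selection summary are linked by two simple relations, shown here as a sketch. They assume that each entered variable multiplies Wilks' lambda by one minus its partial R-square, and that the average squared canonical correlation is Pillai's trace divided by the number of classes minus one. Small differences in the last digit come from using the rounded four-decimal R-square values.

```latex
% Wilks' lambda after each step, from the partial R-square of the entered variable
\Lambda_1 = 1 - 0.9414 = 0.0586
\Lambda_2 = \Lambda_1 (1 - 0.3709) \approx 0.0369
\Lambda_3 = \Lambda_2 (1 - 0.3229) \approx 0.0250
\Lambda_4 = \Lambda_3 (1 - 0.0615) \approx 0.0234

% Average squared canonical correlation at Step 4, with 3 classes
\text{ASCC} = \frac{\text{Pillai's trace}}{3 - 1} = \frac{1.191899}{2} \approx 0.5959
```

These values match the Wilks' Lambda and Average Squared Canonical Correlation columns of the summary, which is a quick way to confirm that a transcribed output table is internally consistent.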