Getting Started


The data in this example are measurements on 159 fish caught in Lake Laengelmavesi in Finland; this data set is available from the Data Archive of the Journal of Statistics Education. For each of the seven species (bream, parkki, pike, perch, roach, smelt, and whitefish), the weight, length, height, and width of each fish are recorded. Three different length measurements are recorded: from the nose of the fish to the beginning of its tail, from the nose to the notch of its tail, and from the nose to the end of its tail. The height and width are recorded as percentages of the third length variable. PROC STEPDISC is used to select a subset of the six quantitative variables (weight, the three lengths, height, and width) that may be useful for differentiating between the fish species. The selected subset is then used with PROC CANDISC and PROC DISCRIM to develop discrimination models.
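As a small worked example of the percentage conversion, the first bream record in the program that follows has a third length of 30.0, a height percentage of 38.4, and a width percentage of 13.4. Using the variable names from that program (HtPct, WidthPct, Length3), the derived values work out as:

```latex
\begin{aligned}
\text{Height} &= \frac{\text{HtPct} \times \text{Length3}}{100} = \frac{38.4 \times 30.0}{100} = 11.52 \\
\text{Width}  &= \frac{\text{WidthPct} \times \text{Length3}}{100} = \frac{13.4 \times 30.0}{100} = 4.02
\end{aligned}
```

Height and Width are therefore expressed in the same units as Length3, which is why the two percentage variables can be dropped from the data set.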

The following program creates the data set fish and uses PROC STEPDISC to select a subset of potential discriminator variables. By default, PROC STEPDISC uses stepwise selection on all numeric variables that are not listed in other statements, and the significance levels for a variable to enter the subset and to stay in the subset are set to 0.15.

   proc format;
      value specfmt
         1='Bream'
         2='Roach'
         3='Whitefish'
         4='Parkki'
         5='Perch'
         6='Pike'
         7='Smelt';

   data fish (drop=HtPct WidthPct);
      title 'Fish Measurement Data';
      input Species Weight Length1 Length2 Length3 HtPct WidthPct @@;
      Height=HtPct*Length3/100;
      Width=WidthPct*Length3/100;
      format Species specfmt.;
      datalines;
   1  242.0 23.2 25.4 30.0 38.4 13.4 1  290.0 24.0 26.3 31.2 40.0 13.8
   1  340.0 23.9 26.5 31.1 39.8 15.1 1  363.0 26.3 29.0 33.5 38.0 13.3
   1  430.0 26.5 29.0 34.0 36.6 15.1 1  450.0 26.8 29.7 34.7 39.2 14.2
   1  500.0 26.8 29.7 34.5 41.1 15.3 1  390.0 27.6 30.0 35.0 36.2 13.4
   1  450.0 27.6 30.0 35.1 39.9 13.8 1  500.0 28.5 30.7 36.2 39.3 13.7
   1  475.0 28.4 31.0 36.2 39.4 14.1 1  500.0 28.7 31.0 36.2 39.7 13.3
   1  500.0 29.1 31.5 36.4 37.8 12.0 1     .  29.5 32.0 37.3 37.3 13.6
   1  600.0 29.4 32.0 37.2 40.2 13.9 1  600.0 29.4 32.0 37.2 41.5 15.0
   1  700.0 30.4 33.0 38.3 38.8 13.8 1  700.0 30.4 33.0 38.5 38.8 13.5
   1  610.0 30.9 33.5 38.6 40.5 13.3 1  650.0 31.0 33.5 38.7 37.4 14.8
   1  575.0 31.3 34.0 39.5 38.3 14.1 1  685.0 31.4 34.0 39.2 40.8 13.7
   1  620.0 31.5 34.5 39.7 39.1 13.3 1  680.0 31.8 35.0 40.6 38.1 15.1
   1  700.0 31.9 35.0 40.5 40.1 13.8 1  725.0 31.8 35.0 40.9 40.0 14.8
   1  720.0 32.0 35.0 40.6 40.3 15.0 1  714.0 32.7 36.0 41.5 39.8 14.1
   1  850.0 32.8 36.0 41.6 40.6 14.9 1 1000.0 33.5 37.0 42.6 44.5 15.5
   1  920.0 35.0 38.5 44.1 40.9 14.3 1  955.0 35.0 38.5 44.0 41.1 14.3
   1  925.0 36.2 39.5 45.3 41.4 14.9 1  975.0 37.4 41.0 45.9 40.6 14.7
   1  950.0 38.0 41.0 46.5 37.9 13.7
   2   40.0 12.9 14.1 16.2 25.6 14.0 2   69.0 16.5 18.2 20.3 26.1 13.9
   2   78.0 17.5 18.8 21.2 26.3 13.7 2   87.0 18.2 19.8 22.2 25.3 14.3
   2  120.0 18.6 20.0 22.2 28.0 16.1 2    0.0 19.0 20.5 22.8 28.4 14.7
   2  110.0 19.1 20.8 23.1 26.7 14.7 2  120.0 19.4 21.0 23.7 25.8 13.9
   2  150.0 20.4 22.0 24.7 23.5 15.2 2  145.0 20.5 22.0 24.3 27.3 14.6
   2  160.0 20.5 22.5 25.3 27.8 15.1 2  140.0 21.0 22.5 25.0 26.2 13.3
   2  160.0 21.1 22.5 25.0 25.6 15.2 2  169.0 22.0 24.0 27.2 27.7 14.1
   2  161.0 22.0 23.4 26.7 25.9 13.6 2  200.0 22.1 23.5 26.8 27.6 15.4
   2  180.0 23.6 25.2 27.9 25.4 14.0 2  290.0 24.0 26.0 29.2 30.4 15.4
   2  272.0 25.0 27.0 30.6 28.0 15.6 2  390.0 29.5 31.7 35.0 27.1 15.3
   3  270.0 23.6 26.0 28.7 29.2 14.8 3  270.0 24.1 26.5 29.3 27.8 14.5
   3  306.0 25.6 28.0 30.8 28.5 15.2 3  540.0 28.5 31.0 34.0 31.6 19.3
   3  800.0 33.7 36.4 39.6 29.7 16.6 3 1000.0 37.3 40.0 43.5 28.4 15.0
   4   55.0 13.5 14.7 16.5 41.5 14.1 4   60.0 14.3 15.5 17.4 37.8 13.3
   4   90.0 16.3 17.7 19.8 37.4 13.5 4  120.0 17.5 19.0 21.3 39.4 13.7
   4  150.0 18.4 20.0 22.4 39.7 14.7 4  140.0 19.0 20.7 23.2 36.8 14.2
   4  170.0 19.0 20.7 23.2 40.5 14.7 4  145.0 19.8 21.5 24.1 40.4 13.1
   4  200.0 21.2 23.0 25.8 40.1 14.2 4  273.0 23.0 25.0 28.0 39.6 14.8
   4  300.0 24.0 26.0 29.0 39.2 14.6
   5    5.9  7.5  8.4  8.8 24.0 16.0 5   32.0 12.5 13.7 14.7 24.0 13.6
   5   40.0 13.8 15.0 16.0 23.9 15.2 5   51.5 15.0 16.2 17.2 26.7 15.3
   5   70.0 15.7 17.4 18.5 24.8 15.9 5  100.0 16.2 18.0 19.2 27.2 17.3
   5   78.0 16.8 18.7 19.4 26.8 16.1 5   80.0 17.2 19.0 20.2 27.9 15.1
   5   85.0 17.8 19.6 20.8 24.7 14.6 5   85.0 18.2 20.0 21.0 24.2 13.2
   5  110.0 19.0 21.0 22.5 25.3 15.8 5  115.0 19.0 21.0 22.5 26.3 14.7
   5  125.0 19.0 21.0 22.5 25.3 16.3 5  130.0 19.3 21.3 22.8 28.0 15.5
   5  120.0 20.0 22.0 23.5 26.0 14.5 5  120.0 20.0 22.0 23.5 24.0 15.0
   5  130.0 20.0 22.0 23.5 26.0 15.0 5  135.0 20.0 22.0 23.5 25.0 15.0
   5  110.0 20.0 22.0 23.5 23.5 17.0 5  130.0 20.5 22.5 24.0 24.4 15.1
   5  150.0 20.5 22.5 24.0 28.3 15.1 5  145.0 20.7 22.7 24.2 24.6 15.0
   5  150.0 21.0 23.0 24.5 21.3 14.8 5  170.0 21.5 23.5 25.0 25.1 14.9
   5  225.0 22.0 24.0 25.5 28.6 14.6 5  145.0 22.0 24.0 25.5 25.0 15.0
   5  188.0 22.6 24.6 26.2 25.7 15.9 5  180.0 23.0 25.0 26.5 24.3 13.9
   5  197.0 23.5 25.6 27.0 24.3 15.7 5  218.0 25.0 26.5 28.0 25.6 14.8
   5  300.0 25.2 27.3 28.7 29.0 17.9 5  260.0 25.4 27.5 28.9 24.8 15.0
   5  265.0 25.4 27.5 28.9 24.4 15.0 5  250.0 25.4 27.5 28.9 25.2 15.8
   5  250.0 25.9 28.0 29.4 26.6 14.3 5  300.0 26.9 28.7 30.1 25.2 15.4
   5  320.0 27.8 30.0 31.6 24.1 15.1 5  514.0 30.5 32.8 34.0 29.5 17.7
   5  556.0 32.0 34.5 36.5 28.1 17.5 5  840.0 32.5 35.0 37.3 30.8 20.9
   5  685.0 34.0 36.5 39.0 27.9 17.6 5  700.0 34.0 36.0 38.3 27.7 17.6
   5  700.0 34.5 37.0 39.4 27.5 15.9 5  690.0 34.6 37.0 39.3 26.9 16.2
   5  900.0 36.5 39.0 41.4 26.9 18.1 5  650.0 36.5 39.0 41.4 26.9 14.5
   5  820.0 36.6 39.0 41.3 30.1 17.8 5  850.0 36.9 40.0 42.3 28.2 16.8
   5  900.0 37.0 40.0 42.5 27.6 17.0 5 1015.0 37.0 40.0 42.4 29.2 17.6
   5  820.0 37.1 40.0 42.5 26.2 15.6 5 1100.0 39.0 42.0 44.6 28.7 15.4
   5 1000.0 39.8 43.0 45.2 26.4 16.1 5 1100.0 40.1 43.0 45.5 27.5 16.3
   5 1000.0 40.2 43.5 46.0 27.4 17.7 5 1000.0 41.1 44.0 46.6 26.8 16.3
   6  200.0 30.0 32.3 34.8 16.0  9.7 6  300.0 31.7 34.0 37.8 15.1 11.0
   6  300.0 32.7 35.0 38.8 15.3 11.3 6  300.0 34.8 37.3 39.8 15.8 10.1
   6  430.0 35.5 38.0 40.5 18.0 11.3 6  345.0 36.0 38.5 41.0 15.6  9.7
   6  456.0 40.0 42.5 45.5 16.0  9.5 6  510.0 40.0 42.5 45.5 15.0  9.8
   6  540.0 40.1 43.0 45.8 17.0 11.2 6  500.0 42.0 45.0 48.0 14.5 10.2
   6  567.0 43.2 46.0 48.7 16.0 10.0 6  770.0 44.8 48.0 51.2 15.0 10.5
   6  950.0 48.3 51.7 55.1 16.2 11.2 6 1250.0 52.0 56.0 59.7 17.9 11.7
   6 1600.0 56.0 60.0 64.0 15.0  9.6 6 1550.0 56.0 60.0 64.0 15.0  9.6
   6 1650.0 59.0 63.4 68.0 15.9 11.0
   7    6.7  9.3  9.8 10.8 16.1  9.7 7    7.5 10.0 10.5 11.6 17.0 10.0
   7    7.0 10.1 10.6 11.6 14.9  9.9 7    9.7 10.4 11.0 12.0 18.3 11.5
   7    9.8 10.7 11.2 12.4 16.8 10.3 7    8.7 10.8 11.3 12.6 15.7 10.2
   7   10.0 11.3 11.8 13.1 16.9  9.8 7    9.9 11.3 11.8 13.1 16.9  8.9
   7    9.8 11.4 12.0 13.2 16.7  8.7 7   12.2 11.5 12.2 13.4 15.6 10.4
   7   13.4 11.7 12.4 13.5 18.0  9.4 7   12.2 12.1 13.0 13.8 16.5  9.1
   7   19.7 13.2 14.3 15.2 18.9 13.6 7   19.9 13.8 15.0 16.2 18.1 11.6
   ;

   proc stepdisc data=fish;
      class Species;
   run;
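The defaults described before the program can also be written out explicitly. The following is a minimal sketch, assuming the fish data set created above; it is intended to request the same analysis as the shorter PROC STEPDISC step, with the selection method, both significance levels, and the six candidate variables named rather than implied.

```sas
/* Same analysis with the defaults spelled out (a sketch, not required code). */
proc stepdisc data=fish
              method=stepwise   /* default selection method            */
              slentry=0.15      /* significance level to enter         */
              slstay=0.15;      /* significance level to stay          */
   class Species;
   /* All numeric variables not used in other statements */
   var Weight Length1 Length2 Length3 Height Width;
run;
```

Listing the variables in a VAR statement makes the candidate set explicit, which can help if other numeric variables are later added to the data set.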

PROC STEPDISC begins by displaying summary information about the analysis; see Figure 67.1. This information includes the number of observations with nonmissing values, the number of classes in the classification variable (specified by the CLASS statement), the number of quantitative variables under consideration, the significance criteria for variables to enter and to stay in the model, and the method of variable selection being used. The frequency of each class is also displayed.

  Fish Measurement Data

  The STEPDISC Procedure

  The Method for Selecting Variables is STEPWISE

  Observations      158     Variable(s) in the Analysis        6
  Class Levels        7     Variable(s) will be Included       0
                            Significance Level to Enter     0.15
                            Significance Level to Stay      0.15

  Class Level Information

               Variable
  Species      Name         Frequency       Weight    Proportion
  Bream        Bream               34      34.0000      0.215190
  Parkki       Parkki              11      11.0000      0.069620
  Perch        Perch               56      56.0000      0.354430
  Pike         Pike                17      17.0000      0.107595
  Roach        Roach               20      20.0000      0.126582
  Smelt        Smelt               14      14.0000      0.088608
  Whitefish    Whitefish            6       6.0000      0.037975

Figure 67.1: Summary Information
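The summary reports 158 observations although 159 fish were read, because one bream record has a missing Weight and only observations with nonmissing values are used. One way to confirm this before running the analysis is sketched below; it assumes the fish data set from the program above and uses PROC MEANS only as a convenient check.

```sas
/* Count nonmissing (N) and missing (NMISS) values per species. */
proc means data=fish n nmiss;
   class Species;
   var Weight Length1 Length2 Length3 Height Width;
run;
```

For bream, this should show one missing value for Weight, which accounts for the frequency of 34 rather than 35 in the class level information.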

For each entry step, the statistics for entry are displayed for all variables not currently selected; see Figure 67.2. The variable selected to enter at this step (if any) is displayed, as well as all the variables currently selected. Next are multivariate statistics that take into account all previously selected variables and the newly entered variable.

  Fish Measurement Data

  The STEPDISC Procedure
  Stepwise Selection: Step 1

  Statistics for Entry, DF = 6, 151

  Variable    R-Square    F Value    Pr > F    Tolerance
  Weight        0.3750      15.10    <.0001       1.0000
  Length1       0.6017      38.02    <.0001       1.0000
  Length2       0.6098      39.32    <.0001       1.0000
  Length3       0.6280      42.49    <.0001       1.0000
  Height        0.7553      77.69    <.0001       1.0000
  Width         0.4806      23.29    <.0001       1.0000

  Variable Height will be entered.

  Variable(s) that have been Entered

  Height

  Multivariate Statistics

  Statistic                                 Value  F Value  Num DF  Den DF  Pr > F
  Wilks' Lambda                          0.244670    77.69       6     151  <.0001
  Pillai's Trace                         0.755330    77.69       6     151  <.0001
  Average Squared Canonical              0.125888
  Correlation

Figure 67.2: Step 1: Variable HEIGHT Selected for Entry
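The Step 1 values are linked by simple arithmetic, which can serve as a check when reading the output. The relations below are the usual ones for a model with a single variable and are offered as a reading aid rather than quoted from the procedure's documentation. Here 6 is the number of classes minus one (7 - 1), 151 is the number of observations minus the number of classes (158 - 7), and the six-digit R-square for Height is taken from Pillai's trace, which equals R-square when only one variable is in the model.

```latex
\begin{aligned}
\Lambda &= 1 - R^2 = 1 - 0.755330 = 0.244670 \\
F &= \frac{R^2}{1 - R^2} \cdot \frac{151}{6}
   = \frac{0.755330}{0.244670} \cdot \frac{151}{6} \approx 77.69 \\
\text{ASCC} &= \frac{\text{Pillai's trace}}{7 - 1} = \frac{0.755330}{6} \approx 0.125888
\end{aligned}
```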

For each removal step (Figure 67.3), the statistics for removal are displayed for all variables currently entered. The variable to be removed at this step (if any) is displayed. If no variable meets the criterion to be removed and the maximum number of steps as specified by the MAXSTEP= option has not been attained, then the procedure continues with another entry step.

  Fish Measurement Data

  The STEPDISC Procedure
  Stepwise Selection: Step 2

  Statistics for Removal, DF = 6, 151

  Variable    R-Square    F Value    Pr > F
  Height        0.7553      77.69    <.0001

  No variables can be removed.

  Statistics for Entry, DF = 6, 150

               Partial
  Variable    R-Square    F Value    Pr > F    Tolerance
  Weight        0.7388      70.71    <.0001       0.4690
  Length1       0.9220     295.35    <.0001       0.6083
  Length2       0.9229     299.31    <.0001       0.5892
  Length3       0.9173     277.37    <.0001       0.5056
  Width         0.8783     180.44    <.0001       0.3699

  Variable Length2 will be entered.

  Variable(s) that have been Entered

  Length2 Height

  Multivariate Statistics

  Statistic                                 Value  F Value  Num DF  Den DF  Pr > F
  Wilks' Lambda                          0.018861   157.04      12     300  <.0001
  Pillai's Trace                         1.554349    87.78      12     302  <.0001
  Average Squared Canonical              0.259058
  Correlation

Figure 67.3: Step 2: No Variable is Removed; Variable Length2 Added
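The Step 2 output can be checked the same way. The sketch below assumes that the entry F statistic is computed from the partial R-square with 6 and 150 degrees of freedom (one denominator degree of freedom fewer than at Step 1, because Height is already in the model), and that Wilks' lambda is updated by multiplying by one minus the partial R-square of the entering variable. Small differences from the printed values come from using the four-digit partial R-square for Length2.

```latex
\begin{aligned}
F &= \frac{R_p^2}{1 - R_p^2} \cdot \frac{150}{6}
   = \frac{0.9229}{0.0771} \cdot 25 \approx 299.3 \\
\Lambda_2 &= \Lambda_1 \,(1 - R_p^2) = 0.244670 \times 0.0771 \approx 0.01886 \\
\text{ASCC} &= \frac{\text{Pillai's trace}}{7 - 1} = \frac{1.554349}{6} \approx 0.259058
\end{aligned}
```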

The stepwise procedure terminates either when no variable can be removed and no variable can be entered, or when the maximum number of steps specified by the MAXSTEP= option has been reached. In this example, no variables can be removed or entered at Step 7 (Figure 67.4). Steps 3 through 6 are not displayed in this document.

  Fish Measurement Data

  The STEPDISC Procedure
  Stepwise Selection: Step 7

  Statistics for Removal, DF = 6, 146

               Partial
  Variable    R-Square    F Value    Pr > F
  Weight        0.4521      20.08    <.0001
  Length1       0.2987      10.36    <.0001
  Length2       0.5250      26.89    <.0001
  Length3       0.7948      94.25    <.0001
  Height        0.7257      64.37    <.0001
  Width         0.5757      33.02    <.0001

  No variables can be removed.

  No further steps are possible.

Figure 67.4: Step 7: No Variables Entered or Removed
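To stop the selection earlier than the natural termination shown above, the MAXSTEP= option can be specified in the PROC STEPDISC statement. The following is a minimal sketch, assuming the fish data set from the earlier program; the value 4 is an arbitrary choice for illustration and is not taken from this example.

```sas
/* Limit the stepwise selection to at most four steps (illustrative value). */
proc stepdisc data=fish maxstep=4;
   class Species;
run;
```

With a limit this low, the procedure would be expected to stop before all six variables have been considered, so the resulting subset may differ from the one in the step summary.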

PROC STEPDISC ends by displaying a summary of the steps.

  Fish Measurement Data

  The STEPDISC Procedure

  Stepwise Selection Summary

                                                                                     Average
                                                                                     Squared
        Number                     Partial                       Wilks'    Pr <    Canonical    Pr >
  Step      In  Entered  Removed  R-Square  F Value  Pr > F      Lambda  Lambda  Correlation    ASCC
     1       1  Height              0.7553    77.69  <.0001  0.24466983  <.0001   0.12588836  <.0001
     2       2  Length2             0.9229   299.31  <.0001  0.01886065  <.0001   0.25905822  <.0001
     3       3  Length3             0.8826   186.77  <.0001  0.00221342  <.0001   0.38427100  <.0001
     4       4  Width               0.5775    33.72  <.0001  0.00093510  <.0001   0.45200732  <.0001
     5       5  Weight              0.4461    19.73  <.0001  0.00051794  <.0001   0.49488458  <.0001
     6       6  Length1             0.2987    10.36  <.0001  0.00036325  <.0001   0.51744189  <.0001

Figure 67.5: Step Summary

All six quantitative variables in the data set are found to have potential discriminatory power. These variables are used to develop discrimination models in both the CANDISC and DISCRIM procedure chapters.
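As a rough indication of how the selected variables carry over, the steps below pass all six variables to PROC CANDISC and PROC DISCRIM. This is a minimal sketch that assumes the fish data set from this section and uses only default options; the models developed in those chapters may specify additional options that are not shown here.

```sas
/* Canonical discriminant analysis on the selected variables. */
proc candisc data=fish;
   class Species;
   var Weight Length1 Length2 Length3 Height Width;
run;

/* Discriminant analysis on the same variables, default settings. */
proc discrim data=fish;
   class Species;
   var Weight Length1 Length2 Length3 Height Width;
run;
```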




SAS/STAT 9.1 User's Guide (Vol. 6)
Year: 2004
Pages: 127
