Getting Started


The data in this example are measurements of 159 fish caught in Lake Laengelmavesi, Finland. The species, weight, three different length measurements, height, and width of each fish are recorded. The complete data set is displayed in Chapter 67, 'The STEPDISC Procedure.' The STEPDISC procedure identified all the variables as significant indicators of the differences among the seven fish species.

  /* Map the numeric species codes to species names */
  proc format;
     value specfmt
        1='Bream'
        2='Roach'
        3='Whitefish'
        4='Parkki'
        5='Perch'
        6='Pike'
        7='Smelt';

  data fish (drop=HtPct WidthPct);
     title 'Fish Measurement Data';
     input Species Weight Length1 Length2 Length3 HtPct
           WidthPct @@;
     /* Height and width are read as percentages of Length3 */
     Height=HtPct*Length3/100;
     Width=WidthPct*Length3/100;
     format Species specfmt.;
     /* specfmt2. applies specfmt with a width of 2 (short plot symbol) */
     symbol = put(Species, specfmt2.);
     datalines;
  1  242.0 23.2 25.4 30.0 38.4 13.4
  1  290.0 24.0 26.3 31.2 40.0 13.8
  1  340.0 23.9 26.5 31.1 39.8 15.1
  1  363.0 26.3 29.0 33.5 38.0 13.3

     ...[155 more records]

  ;
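As a quick sanity check on the DATA step above, the first few observations can be listed. This is only a sketch; it assumes the fish data set was created exactly as shown. From the listed input values, the first record should give Height = 38.4*30.0/100 = 11.52 and Width = 13.4*30.0/100 = 4.02.

```sas
/* List the first four observations to confirm the derived variables.  */
/* HtPct and WidthPct are dropped, so only Height and Width appear.    */
proc print data=fish(obs=4);
   var Species Weight Length1 Length2 Length3 Height Width symbol;
run;
```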

The following program uses PROC CANDISC to find the three canonical variables that best separate the species of fish in the fish data and creates the output data set outcan. The NCAN= option requests that only the first three canonical variables be displayed. The %PLOTIT macro is invoked to create a plot of the first two canonical variables. See Appendix B, 'Using the %PLOTIT Macro,' for more information on the %PLOTIT macro.

  proc candisc data=fish ncan=3 out=outcan;
     class Species;
     var Weight Length1 Length2 Length3 Height Width;
  run;

  %plotit(data=outcan, plotvars=Can2 Can1,
          labelvar=_blank_, symvar=symbol, typevar=symbol,
          symsize=1, symlen=4, tsize=1.5, exttypes=symbol, ls=100,
          plotopts=vaxis=-5 to 15 by 5, vtoh=, extend=close);

PROC CANDISC begins by displaying summary information about the variables in the analysis. This information includes the number of observations, the number of quantitative variables in the analysis (specified with the VAR statement), and the number of classes in the classification variable (specified with the CLASS statement). The frequency of each class is also displayed.

                      Fish Measurement Data

                      The CANDISC Procedure

  Observations     158          DF Total               157
  Variables          6          DF Within Classes      151
  Classes            7          DF Between Classes       6

                     Class Level Information

               Variable
  Species      Name         Frequency       Weight    Proportion
  Bream        Bream               34      34.0000      0.215190
  Parkki       Parkki              11      11.0000      0.069620
  Perch        Perch               56      56.0000      0.354430
  Pike         Pike                17      17.0000      0.107595
  Roach        Roach               20      20.0000      0.126582
  Smelt        Smelt               14      14.0000      0.088608
  Whitefish    Whitefish            6       6.0000      0.037975

Figure 21.1: Summary Information
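The class frequencies in Figure 21.1 can be cross-checked directly against the input data. The step below is a sketch, not part of the original example; it assumes that PROC CANDISC analyzes only observations with no missing values in the VAR variables, which would also account for any difference between the number of records read and the number of observations reported.

```sas
/* Count complete cases per species. The WHERE statement keeps only   */
/* observations with no missing analysis variables (an assumption     */
/* about which observations PROC CANDISC uses).                       */
proc freq data=fish;
   where nmiss(Weight, Length1, Length2, Length3, Height, Width) = 0;
   tables Species;
run;
```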

PROC CANDISC performs a multivariate one-way analysis of variance (one-way MANOVA) and provides four multivariate tests of the hypothesis that the class mean vectors are equal. These tests, shown in Figure 21.2, indicate that not all of the mean vectors are equal (p < .0001).

                           Fish Measurement Data

                           The CANDISC Procedure

               Multivariate Statistics and F Approximations

                         S=6    M=-0.5    N=72

  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.00036325      90.71        36    643.89    <.0001
  Pillai's Trace              3.10465132      26.99        36       906    <.0001
  Hotelling-Lawley Trace     52.05799676     209.24        36    413.64    <.0001
  Roy's Greatest Root        39.13499776     984.90         6       151    <.0001

  NOTE: F Statistic for Roy's Greatest Root is an upper bound.

Figure 21.2: MANOVA and Multivariate Tests

The first canonical correlation is the greatest possible multiple correlation with the classes that can be achieved using a linear combination of the quantitative variables. The first canonical correlation, displayed in Figure 21.3, is 0.987463.

                      Fish Measurement Data

                      The CANDISC Procedure

                           Adjusted    Approximate        Squared
          Canonical       Canonical       Standard      Canonical
        Correlation     Correlation          Error    Correlation
  1        0.987463        0.986671       0.001989       0.975084
  2        0.952349        0.950095       0.007425       0.906969
  3        0.838637        0.832518       0.023678       0.703313
  4        0.633094        0.623649       0.047821       0.400809
  5        0.344157        0.334170       0.070356       0.118444
  6        0.005701         .             0.079806       0.000033

Figure 21.3: Canonical Correlations
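The four statistics in Figure 21.2 are all functions of the squared canonical correlations in Figure 21.3. The DATA step below is an illustrative sketch using the standard identities (Wilks' Lambda as the product of 1 - r², Pillai's Trace as the sum of r², Hotelling-Lawley Trace as the sum of the eigenvalues r²/(1 - r²), and Roy's Greatest Root as the largest eigenvalue). The data set and variable names are made up for this check, and because the inputs are the six-decimal values printed in the figure, the results should agree with Figure 21.2 only approximately.

```sas
/* Recompute the MANOVA statistics from the squared canonical         */
/* correlations printed in Figure 21.3 (rounded to six decimals).     */
data manova_check;
   array r2[6] _temporary_
      (0.975084 0.906969 0.703313 0.400809 0.118444 0.000033);
   Wilks = 1;
   Pillai = 0;
   HotellingLawley = 0;
   do i = 1 to 6;
      eigenvalue = r2[i] / (1 - r2[i]);
      Wilks = Wilks * (1 - r2[i]);
      Pillai = Pillai + r2[i];
      HotellingLawley = HotellingLawley + eigenvalue;
      if i = 1 then Roy = eigenvalue;   /* largest eigenvalue */
   end;
   keep Wilks Pillai HotellingLawley Roy;
run;

proc print data=manova_check noobs;
   format Wilks 10.8 Pillai HotellingLawley Roy 12.6;
run;
```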

Figure 21.4 displays a likelihood ratio test of the hypothesis that the current canonical correlation and all smaller ones are zero. The first line is equivalent to the Wilks' Lambda multivariate test in Figure 21.2.

           Test of H0: The canonical correlations in the
             current row and all that follow are zero

       Likelihood    Approximate
            Ratio        F Value    Num DF    Den DF    Pr > F
  1    0.00036325          90.71        36    643.89    <.0001
  2    0.01457896          46.46        25    547.58    <.0001
  3    0.15671134          23.61        16    452.79    <.0001
  4    0.52820347          12.09         9    362.78    <.0001
  5    0.88152702           4.88         4       300    0.0008
  6    0.99996749           0.00         1       151    0.9442

Figure 21.4: Likelihood Ratio Test
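Each likelihood ratio in Figure 21.4 is the product of 1 - r² over the canonical correlation in that row and all smaller ones, which is why the first row matches Wilks' Lambda. The following sketch reproduces the column from the squared canonical correlations printed in Figure 21.3; the data set and variable names are illustrative, and agreement is limited by the six-decimal rounding of the inputs.

```sas
/* Row k of the likelihood ratio column is the product of (1 - r2)    */
/* for canonical correlations k through 6.                            */
data lr_check;
   array r2[6] _temporary_
      (0.975084 0.906969 0.703313 0.400809 0.118444 0.000033);
   do Row = 1 to 6;
      LikelihoodRatio = 1;
      do i = Row to 6;
         LikelihoodRatio = LikelihoodRatio * (1 - r2[i]);
      end;
      output;
   end;
   keep Row LikelihoodRatio;
run;

proc print data=lr_check noobs;
   format LikelihoodRatio 10.8;
run;
```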

The first canonical variable, Can1, shows that the linear combination of the centered variables Can1 = −0.0006 × Weight − 0.33 × Length1 − 2.49 × Length2 + 2.60 × Length3 + 1.12 × Height − 1.45 × Width separates the species most effectively (see Figure 21.5).

                      Fish Measurement Data

                      The CANDISC Procedure

                   Raw Canonical Coefficients

  Variable              Can1              Can2              Can3
  Weight        -0.000648508      -0.005231659      -0.005596192
  Length1       -0.329435762      -0.626598051      -2.934324102
  Length2       -2.486133674      -0.690253987       4.045038893
  Length3        2.595648437       1.803175454      -1.139264914
  Height         1.121983854      -0.714749340       0.283202557
  Width         -1.446386704       0.907025481       0.741486686

Figure 21.5: Raw Canonical Coefficients
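To see how the raw coefficients produce the scores, Can1 can be recomputed by hand and compared with the value stored in outcan. This is a sketch under stated assumptions: the variables are centered at their means over the observations that received a score, the coefficient signs are those in the Can1 equation given in the text, and the names centered, check, Can1_check, and diff are made up for the illustration.

```sas
/* Center the analysis variables over the scored observations.        */
proc standard data=outcan(where=(Can1 is not missing))
              mean=0 out=centered;
   var Weight Length1 Length2 Length3 Height Width;
run;

/* Apply the raw canonical coefficients for Can1 (Figure 21.5).       */
data check;
   set centered;
   Can1_check = -0.000648508*Weight  - 0.329435762*Length1
                - 2.486133674*Length2 + 2.595648437*Length3
                + 1.121983854*Height  - 1.446386704*Width;
   diff = Can1 - Can1_check;
run;

/* If the assumptions hold, diff should be close to zero throughout.  */
proc means data=check n min max;
   var diff;
run;
```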

PROC CANDISC computes the means of the canonical variables for each class. The first canonical variable is the linear combination of the variables Weight, Length1, Length2, Length3, Height, and Width that provides the greatest difference (in terms of a univariate F-test) between the class means. The second canonical variable provides the greatest difference between class means while being uncorrelated with the first canonical variable.

                      Fish Measurement Data

                      The CANDISC Procedure

               Class Means on Canonical Variables

  Species                Can1              Can2              Can3
  Bream           10.94142464        0.52078394        0.23496708
  Parkki           2.58903743       -2.54722416       -0.49326158
  Perch           -4.47181389       -1.70822715        1.29281314
  Pike            -4.89689441        8.22140791       -0.16469132
  Roach           -0.35837149        0.08733611       -1.10056438
  Smelt           -4.09136653       -2.35805841       -4.03836098
  Whitefish       -0.39541755       -0.42071778        1.06459242

Figure 21.6: Class Means for Canonical Variables
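The class means in Figure 21.6 can also be obtained from the scores saved in the OUT= data set. The step below is a minimal sketch, assuming outcan was created by the PROC CANDISC step shown earlier; observations without scores are ignored by PROC MEANS.

```sas
/* Average the canonical variable scores within each species.         */
proc means data=outcan mean maxdec=8;
   class Species;
   var Can1 Can2 Can3;
run;
```

Because the scores are computed from centered variables, the frequency-weighted average of the class means is approximately zero for each canonical variable.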

A plot of the first two canonical variables (Figure 21.7) shows that Can1 discriminates between three groups: 1) bream; 2) whitefish, roach, and parkki; and 3) smelt, pike, and perch. Can2 best discriminates between pike and the other species.

Figure 21.7: Plot of First Two Canonical Variables
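Where the %PLOTIT macro is not available, a comparable scatter plot can be drawn with PROC SGPLOT. This is a substitute technique rather than the method used in this example, and it assumes SAS 9.2 or later, where PROC SGPLOT is available.

```sas
/* Scatter plot of the first two canonical variables, with one        */
/* marker group per species.                                          */
proc sgplot data=outcan;
   scatter x=Can1 y=Can2 / group=Species;
   xaxis label='Can1';
   yaxis label='Can2';
run;
```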



SAS/STAT 9.1 Users Guide, Volumes 1-7
ISBN: 1590472438
Year: 2004
Pages: 156
