Examples | SAS/STAT 9.1 User's Guide, Volume 3

Example 32.1. Balanced Data from Randomized Complete Block with Means Comparisons and Contrasts

The following example, reported by Stenstrom (1940), analyzes an experiment to investigate how snapdragons grow in various soils. To eliminate the effect of local fertility variations, the experiment is run in blocks, with each soil type sampled in each block. Since these data are balanced, the Type I and Type III SS are the same and are equal to the traditional ANOVA SS.

First, the standard analysis is shown, followed by an analysis that uses the SOLUTION option and includes MEANS and CONTRAST statements. The ORDER=DATA option in the second PROC GLM statement is used so that the ordering of coefficients in the CONTRAST statement can correspond to the ordering in the input data. The SOLUTION option requests a display of the parameter estimates, which are produced by default only if there are no CLASS variables. A MEANS statement is used to request a table of the means with two multiple comparison procedures. In experiments with focused treatment questions, CONTRAST statements are preferable to general means comparison methods. The following statements produce Output 32.1.1 through Output 32.1.6:

  title 'Balanced Data from Randomized Complete Block';
  data plants;
     /* each line: soil type, then stem lengths for blocks 1-3 */
     input Type $ @;
     do Block=1 to 3;
        input StemLength @;
        output;
     end;
     datalines;
  Clarion  32.7 32.3 31.5
  Clinton  32.1 29.7 29.1
  Knox     35.7 35.9 33.1
  O'Neill  36.0 34.2 31.2
  Compost  31.8 28.0 29.2
  Wabash   38.2 37.8 31.9
  Webster  32.5 31.1 29.7
  ;

  proc glm;
     class Block Type;
     model StemLength = Block Type;
  run;

  proc glm order=data;
     class Block Type;
     model StemLength = Block Type / solution;
     /* coefficients follow the ORDER=DATA level order named below */
     /*-----------------------------clrn-cltn-knox-onel-cpst-wbsh-wstr */
     contrast 'Compost vs. others'  Type -1   -1   -1   -1    6   -1   -1;
     contrast 'River soils vs. non' Type -1   -1   -1   -1    0    5   -1,
                                    Type -1    4   -1   -1    0    0   -1;
     contrast 'Glacial vs. drift'   Type -1    0    1    1    0    0   -1;
     contrast 'Clarion vs. Webster' Type -1    0    0    0    0    0    1;
     contrast "Knox vs. O'Neill"    Type  0    0    1   -1    0    0    0;
  run;

     means Type / waller regwq;
  run;

Output 32.1.1: Classes and Levels for Randomized Complete Blocks

  Balanced Data from Randomized Complete Block
  The GLM Procedure

  Class Level Information

  Class         Levels    Values
  Block              3    1 2 3
  Type               7    Clarion Clinton Compost Knox O'Neill Wabash Webster

  Number of Observations Read          21
  Number of Observations Used          21

Output 32.1.2: Analysis of Variance for Randomized Complete Blocks

  Balanced Data from Randomized Complete Block
  The GLM Procedure
  Dependent Variable: StemLength

                                  Sum of
  Source                     DF        Squares    Mean Square   F Value   Pr > F
  Model                       8    142.1885714     17.7735714     10.80   0.0002
  Error                      12     19.7428571      1.6452381
  Corrected Total            20    161.9314286

  R-Square     Coeff Var      Root MSE    StemLength Mean
  0.878079      3.939745      1.282668           32.55714

  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  Block                       2     39.0371429     19.5185714     11.86   0.0014
  Type                        6    103.1514286     17.1919048     10.45   0.0004

  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  Block                       2     39.0371429     19.5185714     11.86   0.0014
  Type                        6    103.1514286     17.1919048     10.45   0.0004

Output 32.1.3: Standard Analysis Again

  Balanced Data from Randomized Complete Block
  The GLM Procedure

  Class Level Information

  Class         Levels    Values
  Block              3    1 2 3
  Type               7    Clarion Clinton Knox O'Neill Compost Wabash Webster

  Number of Observations Read          21
  Number of Observations Used          21

Output 32.1.4: Contrasts and Solutions

  Balanced Data from Randomized Complete Block
  The GLM Procedure
  Dependent Variable: StemLength

  Contrast                   DF    Contrast SS    Mean Square   F Value   Pr > F
  Compost vs. others          1    29.24198413    29.24198413     17.77   0.0012
  River soils vs. non         2    48.24694444    24.12347222     14.66   0.0006
  Glacial vs. drift           1    22.14083333    22.14083333     13.46   0.0032
  Clarion vs. Webster         1     1.70666667     1.70666667      1.04   0.3285
  Knox vs. O'Neill            1     1.81500000     1.81500000      1.10   0.3143

                                              Standard
  Parameter                 Estimate             Error    t Value    Pr > |t|
  Intercept              29.35714286 B      0.83970354      34.96      <.0001
  Block     1             3.32857143 B      0.68561507       4.85      0.0004
  Block     2             1.90000000 B      0.68561507       2.77      0.0169
  Block     3             0.00000000 B       .                .         .
  Type      Clarion       1.06666667 B      1.04729432       1.02      0.3285
  Type      Clinton      -0.80000000 B      1.04729432      -0.76      0.4597
  Type      Knox          3.80000000 B      1.04729432       3.63      0.0035
  Type      O'Neill       2.70000000 B      1.04729432       2.58      0.0242
  Type      Compost      -1.43333333 B      1.04729432      -1.37      0.1962
  Type      Wabash        4.86666667 B      1.04729432       4.65      0.0006
  Type      Webster       0.00000000 B       .                .         .

  NOTE: The X'X matrix has been found to be singular, and a generalized inverse
        was used to solve the normal equations. Terms whose estimates are
        followed by the letter 'B' are not uniquely estimable.

Output 32.1.5: Waller-Duncan tests

  Balanced Data from Randomized Complete Block
  The GLM Procedure
  Waller-Duncan K-ratio t Test for StemLength

  NOTE: This test minimizes the Bayes risk under additive loss and certain other
        assumptions.

  Kratio                              100
  Error Degrees of Freedom             12
  Error Mean Square              1.645238
  F Value                           10.45
  Critical Value of t             2.12034
  Minimum Significant Difference   2.2206

  Means with the same letter are not significantly different.

  Waller Grouping          Mean      N    Type
              A          35.967      3    Wabash
              A
              A          34.900      3    Knox
              A
         B    A          33.800      3    O'Neill
         B
         B    C          32.167      3    Clarion
              C
         D    C          31.100      3    Webster
         D    C
         D    C          30.300      3    Clinton
         D
         D               29.667      3    Compost

Output 32.1.6: Ryan-Einot-Gabriel-Welsch Multiple Range Test

  Balanced Data from Randomized Complete Block
  The GLM Procedure
  Ryan-Einot-Gabriel-Welsch Multiple Range Test for StemLength

  NOTE: This test controls the Type I experimentwise error rate.

  Alpha                        0.05
  Error Degrees of Freedom       12
  Error Mean Square        1.645238

  Number of Means         2         3         4         5         6         7
  Critical Range  2.9876649 3.2838329 3.4396257 3.5402242 3.5402242 3.6653734

  Means with the same letter are not significantly different.

  REGWQ Grouping          Mean      N    Type
            A           35.967      3    Wabash
            A
       B    A           34.900      3    Knox
       B    A
       B    A    C      33.800      3    O'Neill
       B         C
       B    D    C      32.167      3    Clarion
            D    C
            D    C      31.100      3    Webster
            D
            D           30.300      3    Clinton
            D
            D           29.667      3    Compost

This analysis shows that stem length differs significantly among the soil types (F = 10.45, p = 0.0004). In addition, there are significant differences in stem length among the three blocks in the experiment (F = 11.86, p = 0.0014).
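Because the data are balanced, the Type I and Type III sums of squares in Output 32.1.2 are the textbook randomized-complete-block quantities. As a hedged cross-check outside SAS (an illustration only, not how PROC GLM computes them), the following minimal Python sketch recomputes them from the stem lengths in the DATALINES, using deviations of the block and soil-type means from the grand mean:

```python
# Stem lengths from the DATALINES: one row per soil type, one column per block.
stems = {
    "Clarion": [32.7, 32.3, 31.5],
    "Clinton": [32.1, 29.7, 29.1],
    "Knox":    [35.7, 35.9, 33.1],
    "O'Neill": [36.0, 34.2, 31.2],
    "Compost": [31.8, 28.0, 29.2],
    "Wabash":  [38.2, 37.8, 31.9],
    "Webster": [32.5, 31.1, 29.7],
}

def rcbd_anova(table):
    """Return (ss_block, ss_type, ss_error, f_block, f_type) for a balanced RCBD."""
    rows = list(table.values())
    t = len(rows)            # number of treatments (soil types)
    b = len(rows[0])         # number of blocks
    values = [y for row in rows for y in row]
    grand = sum(values) / len(values)
    type_means = [sum(row) / b for row in rows]
    block_means = [sum(row[j] for row in rows) / t for j in range(b)]
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_type = b * sum((m - grand) ** 2 for m in type_means)
    ss_error = ss_total - ss_block - ss_type          # additive model: no interaction
    mse = ss_error / ((t - 1) * (b - 1))
    f_block = (ss_block / (b - 1)) / mse
    f_type = (ss_type / (t - 1)) / mse
    return ss_block, ss_type, ss_error, f_block, f_type

ss_block, ss_type, ss_error, f_block, f_type = rcbd_anova(stems)
print(f"Block SS={ss_block:.7f}  Type SS={ss_type:.7f}  Error SS={ss_error:.7f}")
print(f"F(Block)={f_block:.2f}  F(Type)={f_type:.2f}")
```

The printed sums of squares and F values agree with Output 32.1.2 to the displayed precision.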

The GLM procedure is invoked again, this time with the ORDER=DATA option. This makes it easier to write accurate CONTRAST statements, because you know the order SAS is using for the levels of the variable Type. The standard analysis is displayed again.

Output 32.1.4 shows the tests for the contrasts that you specified as well as the estimated parameters. The contrast label, degrees of freedom, sum of squares, mean square, F value, and Pr > F are shown for each contrast requested. In this example, the contrast results show that at the 5% significance level,

the stem length of plants grown in compost soil is significantly different from the average stem length of plants grown in other soils
the stem length of plants grown in river soils is significantly different from the average stem length of those grown in nonriver soils
the average stem length of plants grown in glacial soils (Clarion and Webster) is significantly different from the average stem length of those grown in drift soils (Knox and O'Neill)
stem lengths for Clarion and Webster are not significantly different
stem lengths for Knox and O'Neill are not significantly different
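For a single-degree-of-freedom contrast with coefficients c_i on type means that each rest on n observations, the contrast sum of squares is n·L²/Σc_i², where L = Σc_i·ȳ_i, and the F value divides it by the error mean square. The sketch below is a hedged Python illustration of that formula (not SAS code); it takes the error mean square from Output 32.1.2 and leaves out the 2-df "River soils vs. non" contrast, which needs a joint test of both rows rather than this one-row formula.

```python
# Stem lengths per soil type (three blocks each), listed in ORDER=DATA sequence
# so that coefficient positions line up with the CONTRAST statements.
stems = [
    ("Clarion", [32.7, 32.3, 31.5]),
    ("Clinton", [32.1, 29.7, 29.1]),
    ("Knox",    [35.7, 35.9, 33.1]),
    ("O'Neill", [36.0, 34.2, 31.2]),
    ("Compost", [31.8, 28.0, 29.2]),
    ("Wabash",  [38.2, 37.8, 31.9]),
    ("Webster", [32.5, 31.1, 29.7]),
]
MSE = 1.6452381  # error mean square copied from Output 32.1.2 (12 df)

def contrast_ss(coefs, groups):
    """Single-df contrast SS for equal group sizes n: n * L**2 / sum(c**2)."""
    n = len(groups[0][1])
    means = [sum(ys) / n for _, ys in groups]
    estimate = sum(c * m for c, m in zip(coefs, means))
    return n * estimate ** 2 / sum(c * c for c in coefs)

contrasts = {
    "Compost vs. others":  [-1, -1, -1, -1, 6, -1, -1],
    "Glacial vs. drift":   [-1, 0, 1, 1, 0, 0, -1],
    "Clarion vs. Webster": [-1, 0, 0, 0, 0, 0, 1],
    "Knox vs. O'Neill":    [0, 0, 1, -1, 0, 0, 0],
}
results = {}
for label, coefs in contrasts.items():
    ss = contrast_ss(coefs, stems)
    results[label] = ss
    print(f"{label:22s} SS={ss:.8f}  F={ss / MSE:.2f}")
```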

In addition to the estimates for the parameters of the model, the results of t tests about the parameters are also displayed. The 'B' following the parameter estimates indicates that the estimates are biased and do not represent a unique solution to the normal equations.

The final two pages of output (Output 32.1.5 and Output 32.1.6) present results of the Waller-Duncan and REGWQ multiple comparison procedures. For each test, notes and information pertinent to the test are given on the output. The Type means are arranged from highest to lowest. Means with the same letter are not significantly different. For this example, while some pairs of means are significantly different, there are no clear equivalence classes among the different soils.
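With equal group sizes, the Waller-Duncan groupings in Output 32.1.5 reduce to one threshold: two means differ when their gap exceeds the minimum significant difference MSD = t*·sqrt(2·MSE/n). The following Python sketch is a hedged illustration that copies the critical t value from the output rather than deriving it (the Bayesian K-ratio critical value is not recomputed here) and then lists which soil-type pairs exceed the MSD.

```python
import math
from itertools import combinations

# Values copied from Output 32.1.5 (not recomputed here).
MSE = 1.645238        # error mean square, 12 df
T_CRIT = 2.12034      # Waller-Duncan critical value of t (K-ratio = 100)
N_PER_MEAN = 3        # blocks contributing to each soil-type mean

# Soil-type means: totals over the three blocks in the DATALINES, divided by 3.
means = {
    "Wabash": 107.9 / 3, "Knox": 104.7 / 3, "O'Neill": 101.4 / 3,
    "Clarion": 96.5 / 3, "Webster": 93.3 / 3, "Clinton": 90.9 / 3,
    "Compost": 89.0 / 3,
}

# MSD = critical t times the standard error of a difference of two means.
msd = T_CRIT * math.sqrt(2 * MSE / N_PER_MEAN)

significant = [
    (a, b) for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > msd
]
print(f"MSD = {msd:.4f}")
print(f"{len(significant)} of 21 pairs differ by more than the MSD")
for a, b in significant:
    print(f"  {a} vs. {b}: {abs(means[a] - means[b]):.3f}")
```

The standard-error factor sqrt(2·MSE/3) equals the 1.04729432 reported for the Type differences in Output 32.1.4, which is a convenient sanity check on the formula.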

Example 32.2. Regression with Mileage Data

A car is tested for gas mileage at various speeds to determine at what speed the car achieves the greatest gas mileage. A quadratic model is fit to the experimental data. The following statements produce Output 32.2.1 through Output 32.2.5:

  title 'Gasoline Mileage Experiment';
  data mileage;
     input mph mpg @@;
     datalines;
  20 15.4
  30 20.2
  40 25.7
  50 26.2  50 26.6  50 27.4
  55   .
  60 24.8
  ;

  proc glm;
     /* P and CLM: predicted values and 95% limits for the mean */
     model mpg=mph mph*mph / p clm;
     output out=pp p=mpgpred r=resid;
  run;

  axis1 minor=none major=(number=5);
  axis2 minor=none major=(number=8);
  symbol1 c=black i=none   v=plus;
  symbol2 c=black i=spline v=none;

  proc gplot data=pp;
     plot mpg*mph=1 mpgpred*mph=2 / overlay haxis=axis1
                                    vaxis=axis2;
  run;

Output 32.2.1: Observations for Standard Regression Analysis

  Gasoline Mileage Experiment
  The GLM Procedure

  Number of Observations Read          8

Output 32.2.2: Standard Analysis of Variance for Regression

  Gasoline Mileage Experiment
  The GLM Procedure
  Dependent Variable: mpg

                                  Sum of
  Source                     DF        Squares    Mean Square   F Value   Pr > F
  Model                       2    111.8086183     55.9043091     77.96   0.0006
  Error                       4      2.8685246      0.7171311
  Corrected Total             6    114.6771429

  R-Square     Coeff Var      Root MSE      mpg Mean
  0.974986      3.564553      0.846836      23.75714

  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  mph                         1    85.64464286    85.64464286    119.43   0.0004
  mph*mph                     1    26.16397541    26.16397541     36.48   0.0038

  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  mph                         1    41.01171219    41.01171219     57.19   0.0016
  mph*mph                     1    26.16397541    26.16397541     36.48   0.0038

                                    Standard
  Parameter         Estimate           Error    t Value    Pr > |t|
  Intercept     -5.985245902      3.18522249      -1.88      0.1334
  mph            1.305245902      0.17259876       7.56      0.0016
  mph*mph       -0.013098361      0.00216852      -6.04      0.0038

Output 32.2.3: Results of Requesting the P and CLM Options

  Observation        Observed        Predicted        Residual
  1               15.40000000      14.88032787      0.51967213
  2               20.20000000      21.38360656     -1.18360656
  3               25.70000000      25.26721311      0.43278689
  4               26.20000000      26.53114754     -0.33114754
  5               26.60000000      26.53114754      0.06885246
  6               27.40000000      26.53114754      0.86885246
  7 *                       .      26.18073770               .
  8               24.80000000      25.17540984     -0.37540984

                     95% Confidence Limits for
  Observation           Mean Predicted Value
  1                  12.69701317     17.06364257
  2                  20.01727192     22.74994119
  3                  23.87460041     26.65982582
  4                  25.44573423     27.61656085
  5                  25.44573423     27.61656085
  6                  25.44573423     27.61656085
  7 *                24.88679308     27.47468233
  8                  23.05954977     27.29126990

  * Observation was not used in this analysis

Output 32.2.4: Additional Results of Requesting the P and CLM Options

  Gasoline Mileage Experiment
  The GLM Procedure

  Sum of Residuals                         0.00000000
  Sum of Squared Residuals                 2.86852459
  Sum of Squared Residuals - Error SS      0.00000000
  PRESS Statistic                         23.18107335
  First Order Autocorrelation             -0.54376613
  Durbin-Watson D                          2.94425592

Output 32.2.5: Plot of Mileage Data

The overall F statistic is significant. The tests of mph and mph*mph in the Type I sums of squares show that both the linear and quadratic terms in the regression model are significant. The model fits well, with an R² of 0.97. The table of parameter estimates indicates that the estimated regression equation is

$$\widehat{\text{mpg}} = -5.9852 + 1.3052\,\text{mph} - 0.0131\,\text{mph}^2$$
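The experiment's stated goal is the speed at which mileage peaks. Because the fitted mph*mph coefficient in Output 32.2.2 is negative (-0.013098361), the fitted curve is concave, and its maximum lies where the derivative b1 + 2·b2·mph is zero, that is, at mph = -b1/(2·b2). The Python sketch below is a hedged illustration of that calculation using the printed estimates; it is a point estimate only and carries no confidence interval.

```python
# Estimated coefficients copied from the parameter-estimate table in Output 32.2.2.
b0, b1, b2 = -5.985245902, 1.305245902, -0.013098361

def predicted_mpg(mph):
    """Fitted quadratic: mpg = b0 + b1*mph + b2*mph**2."""
    return b0 + b1 * mph + b2 * mph ** 2

# A concave quadratic (b2 < 0) peaks where b1 + 2*b2*mph = 0.
best_mph = -b1 / (2 * b2)
best_mpg = predicted_mpg(best_mph)
print(f"Estimated best speed: {best_mph:.1f} mph, predicted mpg {best_mpg:.2f}")
# Cross-check against the P option: observation 7 (55 mph) in Output 32.2.3.
print(f"Prediction at 55 mph: {predicted_mpg(55):.4f}")
```

The estimated optimum is just under 50 mph, consistent with the largest predicted value in Output 32.2.3 occurring at the 50 mph observations.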

The P and CLM options in the MODEL statement produce the table shown in Output 32.2.3. For each observation, the observed, predicted, and residual values are shown. In addition, the 95% confidence limits for a mean predicted value are shown for each observation. Note that the observation with a missing value for mpg is not used in the analysis, but predicted and confidence limit values are shown.

The final portion of output gives some additional information on the residuals. The PRESS statistic gives the sum of squares of predicted residual errors, as described in Chapter 2, "Introduction to Regression Procedures." The first-order autocorrelation and the Durbin-Watson D statistic, which measures first-order autocorrelation, are also given.
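Both serial-correlation statistics are simple functions of the residuals taken in data order. The Python sketch below assumes the usual definitions (lag-1 cross-products, and squared successive differences, each divided by the residual sum of squares); with the seven residuals printed in Output 32.2.3 it reproduces the values in Output 32.2.4. A D value well above 2, as here, goes with negative first-order autocorrelation.

```python
# Residuals for the seven observations used in the fit (Output 32.2.3, data order).
# Observation 7 has a missing mpg value, so it has no residual and is skipped.
resid = [0.51967213, -1.18360656, 0.43278689, -0.33114754,
         0.06885246, 0.86885246, -0.37540984]

ss_resid = sum(e * e for e in resid)
# First-order autocorrelation: sum of lag-1 cross-products over the residual SS.
autocorr = sum(resid[t] * resid[t - 1] for t in range(1, len(resid))) / ss_resid
# Durbin-Watson D: sum of squared successive differences over the residual SS.
dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid))) / ss_resid

print(f"Sum of squared residuals = {ss_resid:.8f}")
print(f"First-order autocorrelation = {autocorr:.6f}")
print(f"Durbin-Watson D = {dw:.6f}")
```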

Output 32.2.5 shows the actual and predicted values for the data. The quadratic relationship between mpg and mph is evident.

Example 32.3. Unbalanced ANOVA for Two-Way Design with Interaction

This example uses data from Kutner (1974, p. 98) to illustrate a two-way analysis of variance. The original data source is Afifi and Azen (1972, p. 166). These statements produce Output 32.3.1 and Output 32.3.2.

  /*---------------------------------------------------------*/
  /* Note: Kutner's 24 for drug 2, disease 1 changed to 34.  */
  /*---------------------------------------------------------*/
  title 'Unbalanced Two-Way Analysis of Variance';
  data a;
     input drug disease @;
     do i=1 to 6;
        input y @;
        output;
     end;
     datalines;
  1 1 42 44 36 13 19 22
  1 2 33  . 26  . 33 21
  1 3 31 -3  . 25 25 24
  2 1 28  . 23 34 42 13
  2 2  . 34 33 31  . 36
  2 3  3 26 28 32  4 16
  3 1  .  .  1 29  . 19
  3 2  . 11  9  7  1 -6
  3 3 21  1  .  9  3  .
  4 1 24  .  9 22 -2 15
  4 2 27 12 12 -5 16 15
  4 3 22  7 25  5 12  .
  ;

  proc glm;
     class drug disease;
     model y=drug disease drug*disease / ss1 ss2 ss3 ss4;
  run;

Output 32.3.1: Classes and Levels for Unbalanced Two-Way Design

  Unbalanced Two-Way Analysis of Variance
  The GLM Procedure

  Class Level Information

  Class         Levels    Values
  drug               4    1 2 3 4
  disease            3    1 2 3

  Number of Observations Read          72
  Number of Observations Used          58

Output 32.3.2: Analysis of Variance for Unbalanced Two-Way Design

  Unbalanced Two-Way Analysis of Variance
  The GLM Procedure
  Dependent Variable: y

                                  Sum of
  Source                     DF        Squares    Mean Square   F Value   Pr > F
  Model                      11    4259.338506     387.212591      3.51   0.0013
  Error                      46    5080.816667     110.452536
  Corrected Total            57    9340.155172

  R-Square     Coeff Var      Root MSE        y Mean
  0.456024      55.66750      10.50964      18.87931

  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  drug                        3    3133.238506    1044.412835      9.46   <.0001
  disease                     2     418.833741     209.416870      1.90   0.1617
  drug*disease                6     707.266259     117.877710      1.07   0.3958

  Source                     DF     Type II SS    Mean Square   F Value   Pr > F
  drug                        3    3063.432863    1021.144288      9.25   <.0001
  disease                     2     418.833741     209.416870      1.90   0.1617
  drug*disease                6     707.266259     117.877710      1.07   0.3958

  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  drug                        3    2997.471860     999.157287      9.05   <.0001
  disease                     2     415.873046     207.936523      1.88   0.1637
  drug*disease                6     707.266259     117.877710      1.07   0.3958

  Source                     DF     Type IV SS    Mean Square   F Value   Pr > F
  drug                        3    2997.471860     999.157287      9.05   <.0001
  disease                     2     415.873046     207.936523      1.88   0.1637
  drug*disease                6     707.266259     117.877710      1.07   0.3958

Note the differences between the four types of sums of squares. The Type I sum of squares for drug essentially tests for differences between the expected values of the arithmetic mean response for different drugs, unadjusted for the effect of disease. By contrast, the Type II sum of squares for drug measures the differences between arithmetic means for each drug after adjusting for disease. The Type III sum of squares measures the differences between predicted drug means over a balanced drug-by-disease population, that is, between the LS-means for drug. Finally, the Type IV sum of squares is the same as the Type III sum of squares in this case, since there are data for every drug-by-disease combination.
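The difference between arithmetic means and LS-means is easy to see numerically. When every drug-by-disease cell has data and the model includes the interaction, the LS-mean for a drug is the unweighted average of its three cell means, while the arithmetic mean weights each cell by its number of nonmissing observations. The Python sketch below (a hedged illustration, not SAS code) computes both from the DATALINES; in these data the arithmetic means rank drug 1 above drug 2, while the LS-means rank drug 2 above drug 1.

```python
# Nonmissing y values for each (drug, disease) cell, from the DATALINES.
cells = {
    (1, 1): [42, 44, 36, 13, 19, 22], (1, 2): [33, 26, 33, 21],
    (1, 3): [31, -3, 25, 25, 24],     (2, 1): [28, 23, 34, 42, 13],
    (2, 2): [34, 33, 31, 36],         (2, 3): [3, 26, 28, 32, 4, 16],
    (3, 1): [1, 29, 19],              (3, 2): [11, 9, 7, 1, -6],
    (3, 3): [21, 1, 9, 3],            (4, 1): [24, 9, 22, -2, 15],
    (4, 2): [27, 12, 12, -5, 16, 15], (4, 3): [22, 7, 25, 5, 12],
}

def mean(xs):
    return sum(xs) / len(xs)

arith, lsm = {}, {}
for drug in range(1, 5):
    drug_cells = [cells[(drug, disease)] for disease in range(1, 4)]
    # Arithmetic mean: every observation counts once, so fuller cells weigh more.
    arith[drug] = mean([y for ys in drug_cells for y in ys])
    # LS-mean (all cells filled, interaction in the model): mean of the cell means.
    lsm[drug] = mean([mean(ys) for ys in drug_cells])
    print(f"drug {drug}: arithmetic mean {arith[drug]:8.4f}   LS-mean {lsm[drug]:8.4f}")
```

The LS-mean column matches the values that the LSMEANS statement reports for drug in Output 32.3.3.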

No matter which sum of squares you prefer to use, this analysis shows a significant difference among the four drugs, while the disease effect and the drug-by-disease interaction are not significant. As the previous discussion indicates, Type III sums of squares correspond to differences between LS-means, so you can follow up the Type III tests with a multiple comparisons analysis of the drug LS-means. Since the GLM procedure is interactive, you can accomplish this by submitting the following statements after the previous ones that performed the ANOVA.

     lsmeans drug / pdiff=all adjust=tukey;
  run;

Both the LS-means themselves and a matrix of adjusted p-values for pairwise differences between them are displayed; see Output 32.3.3.

Output 32.3.3: LS-Means for Unbalanced ANOVA

  Unbalanced Two-Way Analysis of Variance
  The GLM Procedure
  Least Squares Means
  Adjustment for Multiple Comparisons: Tukey-Kramer

                             LSMEAN
  drug        y LSMEAN       Number
  1         25.9944444            1
  2         26.5555556            2
  3          9.7444444            3
  4         13.5444444            4

  Unbalanced Two-Way Analysis of Variance
  The GLM Procedure
  Least Squares Means
  Adjustment for Multiple Comparisons: Tukey-Kramer

  Least Squares Means for effect drug
  Pr > |t| for H0: LSMean(i)=LSMean(j)

  Dependent Variable: y

  i/j              1             2             3             4
  1                         0.9989        0.0016        0.0107
  2           0.9989                      0.0011        0.0071
  3           0.0016        0.0011                      0.7870
  4           0.0107        0.0071        0.7870

The multiple comparisons analysis shows that drugs 1 and 2 have very similar effects (adjusted p = 0.9989), and that drugs 3 and 4 are also not significantly different from each other (adjusted p = 0.7870). Evidently, the main contribution to the significant drug effect is the difference between the 1/2 pair and the 3/4 pair.

Example 32.4. Analysis of Covariance

Analysis of covariance combines some of the features of both regression and analysis of variance. Typically, a continuous variable (the covariate) is introduced into the model of an analysis-of-variance experiment.

Data in the following example are selected from a larger experiment on the use of drugs in the treatment of leprosy (Snedecor and Cochran 1967, p. 422).

Variables in the study are

Drug	- two antibiotics (A and D) and a control (F)
PreTreatment	- a pre-treatment score of leprosy bacilli
PostTreatment	- a post-treatment score of leprosy bacilli

Ten patients are selected for each treatment (Drug), and six sites on each patient are measured for leprosy bacilli.

The covariate (a pretreatment score) is included in the model for increased precision in determining the effect of drug treatments on the posttreatment count of bacilli.

The following code creates the data set, performs a parallel-slopes analysis of covariance with PROC GLM, and computes Drug LS-means. These statements produce Output 32.4.1.

  data drugtest;
     /* five (Drug, PreTreatment, PostTreatment) triples per line */
     input Drug $ PreTreatment PostTreatment @@;
     datalines;
  A 11  6   A  8  0   A  5  2   A 14  8   A 19 11
  A  6  4   A 10 13   A  6  1   A 11  8   A  3  0
  D  6  0   D  6  2   D  7  3   D  8  1   D 18 18
  D  8  4   D 19 14   D  8  9   D  5  1   D 15  9
  F 16 13   F 13 10   F 11 18   F  9  5   F 21 23
  F 16 12   F 12  5   F 12 16   F  7  1   F 12 20
  ;

  proc glm;
     class Drug;
     model PostTreatment = Drug PreTreatment / solution;
     lsmeans Drug / stderr pdiff cov out=adjmeans;
  run;

  proc print data=adjmeans;
  run;

Output 32.4.1: Overall Analysis of Variance

  The GLM Procedure

  Class Level Information

  Class         Levels    Values
  Drug               3    A D F

  Number of Observations Read          30
  Number of Observations Used          30

  The GLM Procedure
  Dependent Variable: PostTreatment

                                  Sum of
  Source                     DF        Squares    Mean Square   F Value   Pr > F
  Model                       3     871.497403     290.499134     18.10   <.0001
  Error                      26     417.202597      16.046254
  Corrected Total            29    1288.700000

  R-Square     Coeff Var      Root MSE    PostTreatment Mean
  0.676261      50.70604      4.005778              7.900000

This model assumes that the slopes relating posttreatment scores to pretreatment scores are parallel for all drugs. You can check this assumption by including the class-by-covariate interaction, Drug*PreTreatment, in the model and examining the ANOVA test for the significance of this effect. This extra test is not shown in this example, but it is not significant, which justifies the equal-slopes assumption.

In Output 32.4.2, the Type I SS for Drug (293.6) gives the between-drug sum of squares that is obtained for the analysis-of-variance model PostTreatment = Drug. This measures the difference between arithmetic means of posttreatment scores for different drugs, disregarding the covariate. The Type III SS for Drug (68.5537) gives the Drug sum of squares adjusted for the covariate. This measures the differences between Drug LS-means, controlling for the covariate. The Type I test is highly significant (p = 0.001), but the Type III test is not (p = 0.1384). This indicates that, while there is a statistically significant difference between the arithmetic drug means, this difference is reduced to below the level of background noise when you take the pretreatment scores into account. From the table of parameter estimates, you can derive the least-squares prediction formula for estimating the posttreatment score from the pretreatment score and drug:

$$\widehat{\text{PostTreatment}} = -0.435 + 0.987\,\text{PreTreatment} + \begin{cases} -3.446 & \text{if Drug} = \text{A} \\ -3.337 & \text{if Drug} = \text{D} \\ 0 & \text{if Drug} = \text{F} \end{cases}$$
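Because Drug enters the model first, its Type I sum of squares is just the one-way between-group sum of squares, Σ n_i·(ȳ_i - ȳ)², computed while ignoring PreTreatment. The Python sketch below is a hedged check of that statement from the DATALINES; its F value uses the error mean square of the full covariance model, taken from Output 32.4.1.

```python
# PostTreatment scores by drug, from the DATALINES (second number of each triple).
post = {
    "A": [6, 0, 2, 8, 11, 4, 13, 1, 8, 0],
    "D": [0, 2, 3, 1, 18, 4, 14, 9, 1, 9],
    "F": [13, 10, 18, 5, 23, 12, 5, 16, 1, 20],
}

all_scores = [y for ys in post.values() for y in ys]
grand = sum(all_scores) / len(all_scores)
drug_means = {d: sum(ys) / len(ys) for d, ys in post.items()}

# One-way between-group SS: the covariate plays no part, matching Type I SS for Drug.
ss_drug = sum(len(post[d]) * (m - grand) ** 2 for d, m in drug_means.items())
# Type I F test: mean square for Drug over the full-model error mean square.
f_drug = (ss_drug / 2) / 16.046254

print(f"Drug means: {drug_means}")
print(f"Between-drug SS = {ss_drug:.4f}, F = {f_drug:.2f}")
```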

Output 32.4.2: Tests and Parameter Estimates

  The GLM Procedure
  Dependent Variable: PostTreatment

  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  Drug                        2    293.6000000    146.8000000      9.15   0.0010
  PreTreatment                1    577.8974030    577.8974030     36.01   <.0001

  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  Drug                        2     68.5537106     34.2768553      2.14   0.1384
  PreTreatment                1    577.8974030    577.8974030     36.01   <.0001

                                          Standard
  Parameter              Estimate             Error    t Value    Pr > |t|
  Intercept          -0.434671164 B      2.47135356      -0.18      0.8617
  Drug         A     -3.446138280 B      1.88678065      -1.83      0.0793
  Drug         D     -3.337166948 B      1.85386642      -1.80      0.0835
  Drug         F      0.000000000 B       .                .         .
  PreTreatment        0.987183811        0.16449757       6.00      <.0001

  NOTE: The X'X matrix has been found to be singular, and a generalized inverse
        was used to solve the normal equations.  Terms whose estimates are
        followed by the letter 'B' are not uniquely estimable.

Output 32.4.3 displays the LS-means, which are, in a sense, the means adjusted for the covariate. The STDERR option in the LSMEANS statement causes the standard error of the LS-means and the probability of getting a larger |t| value under the hypothesis H0: LS-mean = 0 to be included in this table as well. Specifying the PDIFF option causes all probability values for the hypothesis H0: LS-mean(i) = LS-mean(j) to be displayed, where the indexes i and j are numbered treatment levels.

Output 32.4.3: LS-means

  The GLM Procedure
  Least Squares Means

                 Post
            Treatment        Standard                  LSMEAN
  Drug         LSMEAN           Error    Pr > |t|      Number
  A         6.7149635       1.2884943      <.0001           1
  D         6.8239348       1.2724690      <.0001           2
  F        10.1611017       1.3159234      <.0001           3

  Least Squares Means for effect Drug
  Pr > |t| for H0: LSMean(i)=LSMean(j)

  Dependent Variable: PostTreatment

  i/j              1             2             3
  1                         0.9521        0.0793
  2           0.9521                      0.0835
  3           0.0793        0.0835

  NOTE: To ensure overall protection level, only probabilities associated with
        pre-planned comparisons should be used.
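Although the individual 'B' estimates are not unique, each LS-mean is an estimable combination of them: the fitted value for that drug at the overall mean PreTreatment score. The Python sketch below is a hedged illustration of that definition; it plugs the parameter estimates from Output 32.4.2 and the covariate mean from the DATALINES into intercept + drug effect + slope × mean(PreTreatment) and reproduces Output 32.4.3.

```python
# Parameter estimates copied from Output 32.4.2 (Drug F is the reference level).
intercept = -0.434671164
drug_effect = {"A": -3.446138280, "D": -3.337166948, "F": 0.0}
slope = 0.987183811

# PreTreatment scores from the DATALINES; LS-means use their overall mean.
pre = [11, 8, 5, 14, 19, 6, 10, 6, 11, 3,     # drug A
       6, 6, 7, 8, 18, 8, 19, 8, 5, 15,       # drug D
       16, 13, 11, 9, 21, 16, 12, 12, 7, 12]  # drug F
pre_mean = sum(pre) / len(pre)

# Fitted value for each drug with the covariate held at its overall mean.
lsmeans = {d: intercept + eff + slope * pre_mean for d, eff in drug_effect.items()}
print(f"Overall mean PreTreatment = {pre_mean:.4f}")
for d, value in lsmeans.items():
    print(f"Drug {d}: LS-mean {value:.7f}")
```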

The OUT= and COV options in the LSMEANS statement create a data set of the estimates, their standard errors, and the variances and covariances of the LS-means, which is displayed in Output 32.4.4.

Output 32.4.4: LS-means Output Data Set

  Obs   _NAME_          Drug    LSMEAN    STDERR   NUMBER      COV1       COV2       COV3
    1   PostTreatment   A       6.7150   1.28849      1       1.66022    0.02844   -0.08403
    2   PostTreatment   D       6.8239   1.27247      2       0.02844    1.61918   -0.04299
    3   PostTreatment   F      10.1611   1.31592      3      -0.08403   -0.04299    1.73165
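The covariance matrix in Output 32.4.4 has a simple structure in a one-way analysis of covariance with a common slope. Writing each LS-mean as ȳ_i - b·(x̄_i - x̄), where the drug means ȳ_i are uncorrelated with each other and with the within-group slope b, gives Cov(LSMEAN_i, LSMEAN_j) = [i = j]·MSE/n_i + (x̄_i - x̄)(x̄_j - x̄)·Var(b). The Python sketch below is a hedged check of that derivation; it takes MSE and the slope's standard error from the printed output rather than refitting the model.

```python
# Taken from the output (not recomputed): error mean square (Output 32.4.1)
# and the squared standard error of the PreTreatment slope (Output 32.4.2).
MSE = 16.046254
var_slope = 0.16449757 ** 2

# Mean PreTreatment score within each drug (10 patients each) and overall.
pre_means = {"A": 93 / 10, "D": 100 / 10, "F": 129 / 10}
n_per_drug = 10
overall = sum(pre_means.values()) / len(pre_means)  # valid because group sizes are equal

def lsmean_cov(i, j):
    """Cov(LSMEAN_i, LSMEAN_j) for a one-way ANCOVA with a common slope."""
    dev_i = pre_means[i] - overall
    dev_j = pre_means[j] - overall
    within = MSE / n_per_drug if i == j else 0.0   # Var of the raw drug mean
    return within + dev_i * dev_j * var_slope      # slope-adjustment term

drugs = list(pre_means)
for i in drugs:
    print(i, " ".join(f"{lsmean_cov(i, j):9.5f}" for j in drugs))
```

The signs follow the covariate means: drugs A and D both lie below the overall PreTreatment mean, so their LS-means covary positively, while each covaries negatively with drug F.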

The experimental graphics features of PROC GLM enable you to visualize the fitted analysis of covariance model.

  ods html;
  ods graphics on;

  /* DATA= is needed: the most recently created data set is ADJMEANS */
  proc glm data=drugtest;
     class Drug;
     model PostTreatment = Drug PreTreatment;
  run;

  ods graphics off;
  ods html close;

When you specify the experimental ODS GRAPHICS statement and fit an analysis of covariance model, the GLM procedure output includes an analysis of covariance plot, as in Output 32.4.5. For general information about ODS graphics, see Chapter 15, "Statistical Graphics Using ODS." For specific information about the graphics available in the GLM procedure, see the section "ODS Graphics" on page 1846.

Output 32.4.5: Analysis of Covariance Plot (Experimental)

The plot makes it clear that the control (drug F) has higher post-treatment scores across the range of pre-treatment scores, while the fitted models for the two antibiotics (drugs A and D) nearly coincide.

Example 32.5. Three-Way Analysis of Variance with Contrasts

This example uses data from Cochran and Cox (1957, p. 176) to illustrate the analysis of a three-way factorial design with replication, including the use of the CONTRAST statement with interactions, the OUTSTAT= data set, and the SLICE= option in the LSMEANS statement.

The object of the study is to determine the effects of electric current on denervated muscle. The variables are

Rep	the replicate number, 1 or 2
Time	the length of time the current is applied to the muscle, ranging from 1 to 4
Current	the level of electric current applied, ranging from 1 to 4
Number	the number of treatments per day, ranging from 1 to 3
MuscleWeight	the weight of the denervated muscle

The following code produces Output 32.5.1 through Output 32.5.4.

  data muscles;
     do Rep=1 to 2;
       do Time=1 to 4;
         do Current=1 to 4;
           do Number=1 to 3;
             input MuscleWeight @@;
             output;
           end;
         end;
       end;
     end;
     datalines;
  72 74 69 61 61 65 62 65 70 85 76 61
  67 52 62 60 55 59 64 65 64 67 72 60
  57 66 72 72 43 43 63 66 72 56 75 92
  57 56 78 60 63 58 61 79 68 73 86 71
  46 74 58 60 64 52 71 64 71 53 65 66
  44 58 54 57 55 51 62 61 79 60 78 82
  53 50 61 56 57 56 56 56 71 56 58 69
  46 55 64 56 55 57 64 66 62 59 58 88
  ;

  proc glm outstat=summary;
     class Rep Current Time Number;
     /* the bar operator expands to all main effects and interactions */
     model MuscleWeight = Rep Current|Time|Number;
     /* Time effect within Current 3: Current*Time cells 9-12 */
     contrast 'Time in Current 3'
              Time 1 0 0 -1 Current*Time 0 0 0 0 0 0 0 0 1 0 0 -1,
              Time 0 1 0 -1 Current*Time 0 0 0 0 0 0 0 0 0 1 0 -1,
              Time 0 0 1 -1 Current*Time 0 0 0 0 0 0 0 0 0 0 1 -1;
     contrast 'Current 1 versus 2' Current 1 -1;
     lsmeans Current*Time / slice=Current;
  run;

  proc print data=summary;
  run;

Output 32.5.1: Overall Analysis

  The GLM Procedure

  Class Level Information

  Class         Levels    Values
  Rep                2    1 2
  Current            4    1 2 3 4
  Time               4    1 2 3 4
  Number             3    1 2 3

  Number of Observations Read          96
  Number of Observations Used          96

  The GLM Procedure
  Dependent Variable: MuscleWeight

                                  Sum of
  Source                     DF        Squares    Mean Square   F Value   Pr > F
  Model                      48    5782.916667     120.477431      1.77   0.0261
  Error                      47    3199.489583      68.074246
  Corrected Total            95    8982.406250

  R-Square     Coeff Var      Root MSE    MuscleWeight Mean
  0.643805      13.05105      8.250712             63.21875

Output 32.5.2: Individual Effects and Contrasts

  The GLM Procedure
  Dependent Variable: MuscleWeight

  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  Rep                         1     605.010417     605.010417      8.89   0.0045
  Current                     3    2145.447917     715.149306     10.51   <.0001
  Time                        3     223.114583      74.371528      1.09   0.3616
  Current*Time                9     298.677083      33.186343      0.49   0.8756
  Number                      2     447.437500     223.718750      3.29   0.0461
  Current*Number              6     644.395833     107.399306      1.58   0.1747
  Time*Number                 6     367.979167      61.329861      0.90   0.5023
  Current*Time*Number        18    1050.854167      58.380787      0.86   0.6276

  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  Rep                         1     605.010417     605.010417      8.89   0.0045
  Current                     3    2145.447917     715.149306     10.51   <.0001
  Time                        3     223.114583      74.371528      1.09   0.3616
  Current*Time                9     298.677083      33.186343      0.49   0.8756
  Number                      2     447.437500     223.718750      3.29   0.0461
  Current*Number              6     644.395833     107.399306      1.58   0.1747
  Time*Number                 6     367.979167      61.329861      0.90   0.5023
  Current*Time*Number        18    1050.854167      58.380787      0.86   0.6276

  Contrast                   DF    Contrast SS    Mean Square   F Value   Pr > F
  Time in Current 3           3    34.83333333    11.61111111      0.17   0.9157
  Current 1 versus 2          1    99.18750000    99.18750000      1.46   0.2334

Output 32.5.3: Simple Effects of Time

  The GLM Procedure
  Least Squares Means

  Current*Time Effect Sliced by Current for MuscleWeight

                                  Sum of
  Current        DF         Squares     Mean Square    F Value    Pr > F
  1               3      271.458333       90.486111       1.33    0.2761
  2               3      120.666667       40.222222       0.59    0.6241
  3               3       34.833333       11.611111       0.17    0.9157
  4               3       94.833333       31.611111       0.46    0.7085

Output 32.5.4: Contents of the OUTSTAT= Data Set

  Obs  _NAME_        _SOURCE_             _TYPE_    DF       SS        F      PROB
    1  MuscleWeight  ERROR                ERROR     47  3199.49        .         .
    2  MuscleWeight  Rep                  SS1        1   605.01   8.8875   0.00454
    3  MuscleWeight  Current              SS1        3  2145.45  10.5054   0.00002
    4  MuscleWeight  Time                 SS1        3   223.11   1.0925   0.36159
    5  MuscleWeight  Current*Time         SS1        9   298.68   0.4875   0.87562
    6  MuscleWeight  Number               SS1        2   447.44   3.2864   0.04614
    7  MuscleWeight  Current*Number       SS1        6   644.40   1.5777   0.17468
    8  MuscleWeight  Time*Number          SS1        6   367.98   0.9009   0.50231
    9  MuscleWeight  Current*Time*Number  SS1       18  1050.85   0.8576   0.62757
   10  MuscleWeight  Rep                  SS3        1   605.01   8.8875   0.00454
   11  MuscleWeight  Current              SS3        3  2145.45  10.5054   0.00002
   12  MuscleWeight  Time                 SS3        3   223.11   1.0925   0.36159
   13  MuscleWeight  Current*Time         SS3        9   298.68   0.4875   0.87562
   14  MuscleWeight  Number               SS3        2   447.44   3.2864   0.04614
   15  MuscleWeight  Current*Number       SS3        6   644.40   1.5777   0.17468
   16  MuscleWeight  Time*Number          SS3        6   367.98   0.9009   0.50231
   17  MuscleWeight  Current*Time*Number  SS3       18  1050.85   0.8576   0.62757
   18  MuscleWeight  Time in Current 3    CONTRAST   3    34.83   0.1706   0.91574
   19  MuscleWeight  Current 1 versus 2   CONTRAST   1    99.19   1.4570   0.23344

The first CONTRAST statement examines the effects of Time within level 3 of Current. This is also called the simple effect of Time within Current*Time. Note that, since there are three degrees of freedom, it is necessary to specify three rows in the CONTRAST statement, separated by commas. Since the parameterization that PROC GLM uses is determined in part by the ordering of the variables in the CLASS statement, Current is specified before Time so that the Time parameters are nested within the Current*Time parameters; thus, the Current*Time contrast coefficients in each row are simply the Time coefficients of that row within the appropriate level of Current.

The second CONTRAST statement isolates a single-degree-of-freedom effect corresponding to the difference between the first two levels of Current. You can use such a contrast in a large experiment where certain preplanned comparisons are important but you want to take advantage of the additional error degrees of freedom available when all levels of the factors are considered.

The LSMEANS statement with the SLICE= option is an alternative way to test for the simple effect of Time within Current*Time. In addition to listing the LS-means for each current strength and length of time, it gives a table of F-tests for differences between the LS-means across Time within each Current level. In some cases, this can be a way to disentangle a complex interaction.

The output, shown in Output 32.5.2 and Output 32.5.3, indicates that the main effects for Rep, Current, and Number are significant (with p-values of 0.0045, <0.0001, and 0.0461, respectively), but Time is not significant, indicating that, in general, it doesn't matter how long the current is applied. None of the interaction terms are significant, nor are the contrasts. Notice that the row in the sliced ANOVA table corresponding to level 3 of Current matches the "Time in Current 3" contrast (SS = 34.83 on 3 DF, F = 0.17, p = 0.9157).

The SS, F statistics, and p-values can be stored in an OUTSTAT= data set, as shown in Output 32.5.4.
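As a cross-check on Output 32.5.4, the F column can be recomputed from the DF and SS columns: each F value is the effect's mean square divided by the error mean square from the ERROR row. The following Python sketch is only an illustration; the dictionary is a hypothetical stand-in for the data set's rows, filled with the full-precision sums of squares printed in Output 32.5.2.

```python
# Recompute the F column of the OUTSTAT= data set (Output 32.5.4).
# The dictionary is an illustrative stand-in for the data set's rows;
# the SS values are the full-precision ones printed in Output 32.5.2.
error_df, error_ss = 47, 3199.489583  # the _TYPE_=ERROR row

effects = {
    # _SOURCE_: (DF, SS)
    "Rep": (1, 605.010417),
    "Current": (3, 2145.447917),
    "Time": (3, 223.114583),
    "Number": (2, 447.437500),
    "Time in Current 3": (3, 34.83333333),
    "Current 1 versus 2": (1, 99.18750000),
}

mse = error_ss / error_df  # error mean square, about 68.074246


def f_value(df, ss):
    """F statistic for an effect tested against the model error: (SS/DF) / MSE."""
    return (ss / df) / mse


for source, (df, ss) in effects.items():
    print(f"{source:<20} DF={df:>2}  F={f_value(df, ss):.4f}")
```

The same division by the ERROR mean square applies to the Type I, Type III, and CONTRAST rows alike, because all of them are tested against the model error in this example.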

Example 32.6. Multivariate Analysis of Variance

The following example employs multivariate analysis of variance (MANOVA) to measure differences in the chemical characteristics of ancient pottery found at four kiln sites in Great Britain. The data are from Tubb et al. (1980), as reported in Hand et al. (1994).

For each of 26 samples of pottery, the percentages of oxides of five metals are measured. The following statements create the data set and invoke the GLM procedure to perform a one-way MANOVA. Additionally, it is of interest to know whether the pottery from one site in Wales (Llanederyn) differs from the samples from other sites; a CONTRAST statement is used to test this hypothesis.
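For a single-degree-of-freedom contrast in a one-way layout, the contrast sum of squares for each dependent variable is (sum of c_i times site mean_i) squared, divided by the sum of c_i squared over n_i. The Python sketch below applies that formula to the Al column of the data, with coefficients 1, 1, 1, -3 taken in the sorted order of the Site levels (AshleyRails, Caldicot, IslandThorns, Llanederyn). The helper name `contrast_ss` is hypothetical; the result can be compared with the contrast row of the univariate Al analysis.

```python
# Al values by site, copied from the pottery data step
al = {
    "AshleyRails": [17.7, 18.3, 16.7, 14.8, 19.1],
    "Caldicot": [11.8, 11.6],
    "IslandThorns": [18.3, 15.8, 18.0, 18.0, 20.8],
    "Llanederyn": [14.4, 13.8, 14.6, 11.5, 13.8, 10.9, 10.1,
                   11.6, 11.1, 13.4, 12.4, 13.1, 12.7, 12.5],
}

# Contrast coefficients in the sorted order of the Site levels
coef = {"AshleyRails": 1, "Caldicot": 1, "IslandThorns": 1, "Llanederyn": -3}


def contrast_ss(groups, coef):
    """Single-df contrast SS: (sum c_i * mean_i)^2 / sum(c_i^2 / n_i)."""
    estimate = sum(c * sum(groups[g]) / len(groups[g]) for g, c in coef.items())
    scale = sum(c * c / len(groups[g]) for g, c in coef.items())
    return estimate ** 2 / scale


ss = contrast_ss(al, coef)
mse_al = 2.1949156  # error mean square for Al (22 DF) from the univariate analysis
print(f"contrast SS = {ss:.8f}, F = {ss / mse_al:.2f}")
```

Because the coefficients sum to zero, the contrast compares the Llanederyn mean with the unweighted average of the other three site means, not with the pooled mean of their 12 samples.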

  data pottery;
     title1 "Romano-British Pottery";
     input Site $12. Al Fe Mg Ca Na;   /* site name in columns 1-12 */
     datalines;
  Llanederyn   14.4 7.00 4.30 0.15 0.51
  Llanederyn   13.8 7.08 3.43 0.12 0.17
  Llanederyn   14.6 7.09 3.88 0.13 0.20
  Llanederyn   11.5 6.37 5.64 0.16 0.14
  Llanederyn   13.8 7.06 5.34 0.20 0.20
  Llanederyn   10.9 6.26 3.47 0.17 0.22
  Llanederyn   10.1 4.26 4.26 0.20 0.18
  Llanederyn   11.6 5.78 5.91 0.18 0.16
  Llanederyn   11.1 5.49 4.52 0.29 0.30
  Llanederyn   13.4 6.92 7.23 0.28 0.20
  Llanederyn   12.4 6.13 5.69 0.22 0.54
  Llanederyn   13.1 6.64 5.51 0.31 0.24
  Llanederyn   12.7 6.69 4.45 0.20 0.22
  Llanederyn   12.5 6.44 3.94 0.22 0.23
  Caldicot     11.8 5.44 3.94 0.30 0.04
  Caldicot     11.6 5.39 3.77 0.29 0.06
  IslandThorns 18.3 1.28 0.67 0.03 0.03
  IslandThorns 15.8 2.39 0.63 0.01 0.04
  IslandThorns 18.0 1.50 0.67 0.01 0.06
  IslandThorns 18.0 1.88 0.68 0.01 0.04
  IslandThorns 20.8 1.51 0.72 0.07 0.10
  AshleyRails  17.7 1.12 0.56 0.06 0.06
  AshleyRails  18.3 1.14 0.67 0.06 0.05
  AshleyRails  16.7 0.92 0.53 0.01 0.05
  AshleyRails  14.8 2.74 0.67 0.03 0.05
  AshleyRails  19.1 1.64 0.60 0.10 0.03
  ;

  proc glm data=pottery;
     class Site;
     model Al Fe Mg Ca Na = Site;
     /* Site levels sort as AshleyRails, Caldicot, IslandThorns, Llanederyn */
     contrast 'Llanederyn vs. the rest' Site 1 1 1 -3;
     manova h=_all_ / printe printh;
  run;

After the summary information, displayed in Output 32.6.1, PROC GLM produces the univariate analyses for each of the dependent variables, as shown in Output 32.6.2 to Output 32.6.6. These analyses show that sites are significantly different for all oxides individually. You can suppress these univariate analyses by specifying the NOUNI option in the MODEL statement.

Output 32.6.1: Summary Information on Groups

  Romano-British Pottery

  The GLM Procedure

  Class Level Information

  Class         Levels    Values
  Site               4    AshleyRails Caldicot IslandThorns Llanederyn

  Number of Observations Read          26
  Number of Observations Used          26

Output 32.6.2: Univariate Analysis of Variance for Aluminum Oxide

  Romano-British Pottery

  The GLM Procedure

  Dependent Variable: Al

                                         Sum of
  Source                     DF         Squares    Mean Square   F Value   Pr > F
  Model                       3     175.6103187     58.5367729     26.67   <.0001
  Error                      22      48.2881429      2.1949156
  Corrected Total            25     223.8984615

  R-Square     Coeff Var      Root MSE       Al Mean
  0.784330      10.22284      1.481525      14.49231

  Source                     DF       Type I SS    Mean Square   F Value   Pr > F
  Site                        3     175.6103187     58.5367729     26.67   <.0001

  Source                     DF     Type III SS    Mean Square   F Value   Pr > F
  Site                        3     175.6103187     58.5367729     26.67   <.0001

  Contrast                   DF     Contrast SS    Mean Square   F Value   Pr > F
  Llanederyn vs. the rest     1     58.58336640    58.58336640     26.69   <.0001

Output 32.6.3: Univariate Analysis of Variance for Iron Oxide

  Romano-British Pottery

  The GLM Procedure

  Dependent Variable: Fe

                                         Sum of
  Source                     DF         Squares    Mean Square   F Value   Pr > F
  Model                       3     134.2216158     44.7405386     89.88   <.0001
  Error                      22      10.9508457      0.4977657
  Corrected Total            25     145.1724615

  R-Square     Coeff Var      Root MSE       Fe Mean
  0.924567      15.79171      0.705525      4.467692

  Source                     DF       Type I SS    Mean Square   F Value   Pr > F
  Site                        3     134.2216158     44.7405386     89.88   <.0001

  Source                     DF     Type III SS    Mean Square   F Value   Pr > F
  Site                        3     134.2216158     44.7405386     89.88   <.0001

  Contrast                   DF     Contrast SS    Mean Square   F Value   Pr > F
  Llanederyn vs. the rest     1     71.15144132    71.15144132    142.94   <.0001

Output 32.6.4: Univariate Analysis of Variance for Magnesium Oxide

  Romano-British Pottery

  The GLM Procedure

  Dependent Variable: Mg

                                         Sum of
  Source                     DF         Squares    Mean Square   F Value   Pr > F
  Model                       3     103.3505270     34.4501757     49.12   <.0001
  Error                      22      15.4296114      0.7013460
  Corrected Total            25     118.7801385

  R-Square     Coeff Var      Root MSE       Mg Mean
  0.870099      26.65777      0.837464      3.141538

  Source                     DF       Type I SS    Mean Square   F Value   Pr > F
  Site                        3     103.3505270     34.4501757     49.12   <.0001

  Source                     DF     Type III SS    Mean Square   F Value   Pr > F
  Site                        3     103.3505270     34.4501757     49.12   <.0001

  Contrast                   DF     Contrast SS    Mean Square   F Value   Pr > F
  Llanederyn vs. the rest     1     56.59349339    56.59349339     80.69   <.0001

Output 32.6.5: Univariate Analysis of Variance for Calcium Oxide

  Romano-British Pottery

  The GLM Procedure

  Dependent Variable: Ca

                                         Sum of
  Source                     DF         Squares    Mean Square   F Value   Pr > F
  Model                       3      0.20470275     0.06823425     29.16   <.0001
  Error                      22      0.05148571     0.00234026
  Corrected Total            25      0.25618846

  R-Square     Coeff Var      Root MSE       Ca Mean
  0.799032      33.01265      0.048376      0.146538

  Source                     DF       Type I SS    Mean Square   F Value   Pr > F
  Site                        3      0.20470275     0.06823425     29.16   <.0001

  Source                     DF     Type III SS    Mean Square   F Value   Pr > F
  Site                        3      0.20470275     0.06823425     29.16   <.0001

  Contrast                   DF     Contrast SS    Mean Square   F Value   Pr > F
  Llanederyn vs. the rest     1      0.03531688     0.03531688     15.09   0.0008

Output 32.6.6: Univariate Analysis of Variance for Sodium Oxide

  Romano-British Pottery

  The GLM Procedure

  Dependent Variable: Na

                                         Sum of
  Source                     DF         Squares    Mean Square   F Value   Pr > F
  Model                       3      0.25824560     0.08608187      9.50   0.0003
  Error                      22      0.19929286     0.00905877
  Corrected Total            25      0.45753846

  R-Square     Coeff Var      Root MSE       Na Mean
  0.564424      60.06350      0.095178      0.158462

  Source                     DF       Type I SS    Mean Square   F Value   Pr > F
  Site                        3      0.25824560     0.08608187      9.50   0.0003

  Source                     DF     Type III SS    Mean Square   F Value   Pr > F
  Site                        3      0.25824560     0.08608187      9.50   0.0003

  Contrast                   DF     Contrast SS    Mean Square   F Value   Pr > F
  Llanederyn vs. the rest     1      0.23344446     0.23344446     25.77   <.0001

The PRINTE option in the MANOVA statement displays the elements of the error matrix, also called the Error Sums of Squares and Crossproducts (SSCP) matrix; see Output 32.6.7. The diagonal elements of this matrix are the error sums of squares from the corresponding univariate analyses.

Output 32.6.7: Error SSCP Matrix and Partial Correlations

  Romano-British Pottery

  The GLM Procedure
  Multivariate Analysis of Variance

  E = Error SSCP Matrix

                  Al              Fe              Mg              Ca              Na
  Al    48.288142857    7.0800714286    0.6080142857    0.1064714286    0.5889571429
  Fe    7.0800714286    10.950845714    0.5270571429    -0.155194286    0.0667585714
  Mg    0.6080142857    0.5270571429    15.429611429    0.4353771429    0.0276157143
  Ca    0.1064714286    -0.155194286    0.4353771429    0.0514857143    0.0100785714
  Na    0.5889571429    0.0667585714    0.0276157143    0.0100785714    0.1992928571

  Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

  DF = 22          Al            Fe            Mg            Ca            Na

  Al         1.000000      0.307889      0.022275      0.067526      0.189853
                             0.1529        0.9196        0.7595        0.3856

  Fe         0.307889      1.000000      0.040547     -0.206685      0.045189
               0.1529                      0.8543        0.3440        0.8378

  Mg         0.022275      0.040547      1.000000      0.488478      0.015748
               0.9196        0.8543                      0.0180        0.9431

  Ca         0.067526     -0.206685      0.488478      1.000000      0.099497
               0.7595        0.3440        0.0180                      0.6515

  Na         0.189853      0.045189      0.015748      0.099497      1.000000
               0.3856        0.8378        0.9431        0.6515

The PRINTE option also displays the partial correlation matrix associated with the E matrix. In this example, none of the oxides are very strongly correlated; the strongest correlation (r = 0.488) is between magnesium oxide and calcium oxide.
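The partial correlations are the E matrix rescaled to a unit diagonal, r_jk = E_jk / sqrt(E_jj E_kk). The Python sketch below applies that rescaling to the E values printed in Output 32.6.7 (the Fe-Ca cross-product is negative, which is what makes the Fe-Ca correlation negative) and picks out the strongest pair. The function name `partial_correlations` is hypothetical.

```python
import math

names = ["Al", "Fe", "Mg", "Ca", "Na"]

# Error SSCP matrix E as printed in Output 32.6.7
E = [
    [48.288142857, 7.0800714286, 0.6080142857, 0.1064714286, 0.5889571429],
    [7.0800714286, 10.950845714, 0.5270571429, -0.155194286, 0.0667585714],
    [0.6080142857, 0.5270571429, 15.429611429, 0.4353771429, 0.0276157143],
    [0.1064714286, -0.155194286, 0.4353771429, 0.0514857143, 0.0100785714],
    [0.5889571429, 0.0667585714, 0.0276157143, 0.0100785714, 0.1992928571],
]


def partial_correlations(sscp):
    """Rescale an SSCP matrix to unit diagonal: r_jk = E_jk / sqrt(E_jj * E_kk)."""
    p = len(sscp)
    return [
        [sscp[j][k] / math.sqrt(sscp[j][j] * sscp[k][k]) for k in range(p)]
        for j in range(p)
    ]


R = partial_correlations(E)

# Off-diagonal pair with the largest absolute correlation
strongest = max(
    ((names[j], names[k], R[j][k]) for j in range(5) for k in range(j + 1, 5)),
    key=lambda t: abs(t[2]),
)
print(strongest)
```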

The PRINTH option produces the SSCP matrix for the hypotheses being tested ( Site and the contrast); see Output 32.6.8 and Output 32.6.9. Since the Type III SS are the highest level SS produced by PROC GLM by default, and since the HTYPE= option is not specified, the SSCP matrix for Site gives the Type III H matrix. The diagonal elements of this matrix are the model sums of squares from the corresponding univariate analyses.
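For a one-way MANOVA, both matrices can be computed directly from the data: an H entry is the sum over sites of n_g (mean_gj - mean_j)(mean_gk - mean_k), and an E entry is the pooled within-site cross-product of deviations from the site means, so H + E is the total corrected SSCP matrix. The Python sketch below uses only the Al and Fe columns of the pottery data to stay short; the helper names `h_entry` and `e_entry` are hypothetical, and the diagonal results can be compared with the Model and Error sums of squares of the Al and Fe analyses.

```python
# Al and Fe columns of the pottery data, grouped by Site
pottery = {
    "AshleyRails": {"Al": [17.7, 18.3, 16.7, 14.8, 19.1],
                    "Fe": [1.12, 1.14, 0.92, 2.74, 1.64]},
    "Caldicot": {"Al": [11.8, 11.6], "Fe": [5.44, 5.39]},
    "IslandThorns": {"Al": [18.3, 15.8, 18.0, 18.0, 20.8],
                     "Fe": [1.28, 2.39, 1.50, 1.88, 1.51]},
    "Llanederyn": {
        "Al": [14.4, 13.8, 14.6, 11.5, 13.8, 10.9, 10.1,
               11.6, 11.1, 13.4, 12.4, 13.1, 12.7, 12.5],
        "Fe": [7.00, 7.08, 7.09, 6.37, 7.06, 6.26, 4.26,
               5.78, 5.49, 6.92, 6.13, 6.64, 6.69, 6.44],
    },
}


def mean(xs):
    return sum(xs) / len(xs)


def grand_mean(data, var):
    return mean([x for group in data.values() for x in group[var]])


def h_entry(data, a, b):
    """Between-site SSCP entry: sum of n_g * (mean_ga - mean_a) * (mean_gb - mean_b)."""
    ga, gb = grand_mean(data, a), grand_mean(data, b)
    return sum(len(g[a]) * (mean(g[a]) - ga) * (mean(g[b]) - gb) for g in data.values())


def e_entry(data, a, b):
    """Within-site SSCP entry: cross-products of deviations from each site's means."""
    total = 0.0
    for g in data.values():
        ma, mb = mean(g[a]), mean(g[b])
        total += sum((x - ma) * (y - mb) for x, y in zip(g[a], g[b]))
    return total


for a, b in [("Al", "Al"), ("Al", "Fe"), ("Fe", "Fe")]:
    print(f"{a}-{b}: H = {h_entry(pottery, a, b):.6f}, E = {e_entry(pottery, a, b):.6f}")
```

The negative Al-Fe entry of H reflects the pattern in the data: sites with above-average aluminum (IslandThorns, AshleyRails) have below-average iron.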

Output 32.6.8: Hypothesis SSCP Matrix and Multivariate Tests for Overall Site Effect

  Romano-British Pottery

  The GLM Procedure
  Multivariate Analysis of Variance

  H = Type III SSCP Matrix for Site

                  Al              Fe              Mg              Ca              Na
  Al    175.61031868     -149.295533    -130.8097066    -5.889163736    -5.372264835
  Fe     -149.295533    134.22161582    117.74503516    4.8217865934    5.3259491209
  Mg    -130.8097066    117.74503516    103.35052703    4.2091613187    4.7105458242
  Ca    -5.889163736    4.8217865934    4.2091613187    0.2047027473     0.154782967
  Na    -5.372264835    5.3259491209    4.7105458242     0.154782967    0.2582456044

  Characteristic Roots and Vectors of: E Inverse * H, where
  H = Type III SSCP Matrix for Site
  E = Error SSCP Matrix

  Characteristic                Characteristic Vector  V'EV=1
            Root   Percent          Al            Fe            Mg            Ca            Na
      34.1611140     96.39  0.09562211    0.26330469    0.05305978    1.87982100    0.47071123
       1.2500994      3.53  0.02651891    0.01239715    0.17564390    4.25929785    1.23727668
       0.0275396      0.08  0.09082220    0.13159869    0.03508901    0.15701602    1.39364544
       0.0000000      0.00  0.03673984    0.15129712    0.20455529    0.54624873    0.17402107
       0.0000000      0.00  0.06862324    0.03056912    0.10662399    2.51151978    1.23668841

  MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Site Effect
  H = Type III SSCP Matrix for Site
  E = Error SSCP Matrix

  S=3    M=0.5    N=8

  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.01230091      13.09        15    50.091    <.0001
  Pillai's Trace              1.55393619       4.30        15        60    <.0001
  Hotelling-Lawley Trace     35.43875302      40.59        15     29.13    <.0001
  Roy's Greatest Root        34.16111399     136.64         5        20    <.0001

  NOTE: F Statistic for Roy's Greatest Root is an upper bound.

Output 32.6.9: Hypothesis SSCP Matrix and Multivariate Tests for Differences Between Llanederyn and the Rest

  Romano-British Pottery

  The GLM Procedure
  Multivariate Analysis of Variance

  H = Contrast SSCP Matrix for Llanederyn vs. the rest

                  Al              Fe              Mg              Ca              Na
  Al    58.583366402    -64.56230291    -57.57983466    -1.438395503    -3.698102513
  Fe    -64.56230291    71.151441323    63.456352116    1.5851961376    4.0755256878
  Mg    -57.57983466    63.456352116    56.593493386    1.4137558201    3.6347541005
  Ca    -1.438395503    1.5851961376    1.4137558201    0.0353168783    0.0907993915
  Na    -3.698102513    4.0755256878    3.6347541005    0.0907993915    0.2334444577

  Characteristic Roots and Vectors of: E Inverse * H, where
  H = Contrast SSCP Matrix for Llanederyn vs. the rest
  E = Error SSCP Matrix

  Characteristic                Characteristic Vector  V'EV=1
            Root   Percent          Al            Fe            Mg            Ca            Na
      16.1251646    100.00  0.08883488    0.25458141    0.08723574    0.98158668    0.71925759
       0.0000000      0.00  0.00503538    0.03825743    0.17632854    5.16256699    0.01022754
       0.0000000      0.00  0.00162771    0.08885364    0.01774069    0.83096817    2.17644566
       0.0000000      0.00  0.04450136    0.15722494    0.22156791    0.00000000    0.00000000
       0.0000000      0.00  0.11939206    0.10833549    0.00000000    0.00000000    0.00000000

  MANOVA Test Criteria and Exact F Statistics for the Hypothesis
  of No Overall Llanederyn vs. the rest Effect
  H = Contrast SSCP Matrix for Llanederyn vs. the rest
  E = Error SSCP Matrix

  S=1    M=1.5    N=8

  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.05839360      58.05         5        18    <.0001
  Pillai's Trace              0.94160640      58.05         5        18    <.0001
  Hotelling-Lawley Trace     16.12516462      58.05         5        18    <.0001
  Roy's Greatest Root        16.12516462      58.05         5        18    <.0001

Four multivariate tests are computed, all based on the characteristic roots and vectors of $\mathbf{E}^{-1}\mathbf{H}$. These roots and vectors are displayed along with the tests. Each of the four statistics is converted to an F statistic that is exact or approximate depending on the hypothesis; for Roy's greatest root with S > 1, the F value is only an upper bound. Note that the four tests all give the same results for the contrast, since it has only one degree of freedom. In this case, the multivariate analysis matches the univariate results: there is an overall difference between the chemical composition of samples from different sites, and the samples from Llanederyn are different from the average of the other sites.
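All four statistics are simple functions of the nonzero characteristic roots: Wilks' lambda is the product of 1/(1 + root), Pillai's trace is the sum of root/(1 + root), the Hotelling-Lawley trace is the sum of the roots, and Roy's statistic is the largest root. When the hypothesis has one degree of freedom (S = 1), the F statistic is exact: F = root (error DF - p + 1) / p on p and error DF - p + 1 degrees of freedom, where p is the number of dependent variables. The Python sketch below (function names are hypothetical) applies these formulas to the roots printed in Output 32.6.8 and Output 32.6.9.

```python
import math


def manova_criteria(roots):
    """Wilks, Pillai, Hotelling-Lawley, and Roy statistics from the nonzero
    characteristic roots of E^-1 H."""
    wilks = math.prod(1.0 / (1.0 + r) for r in roots)
    pillai = sum(r / (1.0 + r) for r in roots)
    hotelling_lawley = sum(roots)
    roy = max(roots)
    return wilks, pillai, hotelling_lawley, roy


def exact_f_one_df(root, p, error_df):
    """Exact F for a one-degree-of-freedom hypothesis (S = 1) with p dependent
    variables: F = root * (error_df - p + 1) / p on (p, error_df - p + 1) DF."""
    den_df = error_df - p + 1
    return root * den_df / p, p, den_df


# Nonzero roots from Output 32.6.8 (Site) and Output 32.6.9 (contrast)
site = manova_criteria([34.1611140, 1.2500994, 0.0275396])
contrast = manova_criteria([16.1251646])
f_contrast = exact_f_one_df(16.1251646, p=5, error_df=22)

print("Site:", [round(v, 6) for v in site])
print("Contrast:", [round(v, 6) for v in contrast], "F =", f_contrast)
```

The same exact-F formula applies to the repeated measures tests in Output 32.7.3, where p = 3 contrast variables and the error DF is 11; for example, 8.01087137 x 9 / 3 gives the F value of 24.03 for the Time effect.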

Example 32.7. Repeated Measures Analysis of Variance

This example uses data from Cole and Grizzle (1966) to illustrate a commonly occurring repeated measures ANOVA design. Sixteen dogs are randomly assigned to four groups. (One animal is removed from the analysis due to a missing value for one dependent variable.) Dogs in each group receive either morphine or trimethaphan (variable Drug) and have either depleted or intact histamine levels (variable Depleted) before receiving the drugs. The dependent variable is the blood concentration of histamine at 0, 1, 3, and 5 minutes after injection of the drug. Logarithms are applied to these concentrations to minimize correlation between the mean and the variance of the data.

The following SAS statements perform both univariate and multivariate repeated measures analyses and produce Output 32.7.1 through Output 32.7.7:

  data dogs;
     input Drug $12. Depleted $ Histamine0 Histamine1
           Histamine3 Histamine5;
     /* logs reduce the dependence of the variance on the mean */
     LogHistamine0=log(Histamine0);
     LogHistamine1=log(Histamine1);
     LogHistamine3=log(Histamine3);
     LogHistamine5=log(Histamine5);
     datalines;
  Morphine      N  .04  .20  .10  .08
  Morphine      N  .02  .06  .02  .02
  Morphine      N  .07 1.40  .48  .24
  Morphine      N  .17  .57  .35  .24
  Morphine      Y  .10  .09  .13  .14
  Morphine      Y  .12  .11  .10  .
  Morphine      Y  .07  .07  .06  .07
  Morphine      Y  .05  .07  .06  .07
  Trimethaphan  N  .03  .62  .31  .22
  Trimethaphan  N  .03 1.05  .73  .60
  Trimethaphan  N  .07  .83 1.07  .80
  Trimethaphan  N  .09 3.13 2.06 1.23
  Trimethaphan  Y  .10  .09  .09  .08
  Trimethaphan  Y  .08  .09  .09  .10
  Trimethaphan  Y  .13  .10  .12  .12
  Trimethaphan  Y  .06  .05  .05  .05
  ;

  proc glm;
     class Drug Depleted;
     model LogHistamine0--LogHistamine5 =
           Drug Depleted Drug*Depleted / nouni;
     /* 4 levels of Time at 0, 1, 3, 5 minutes; orthogonal polynomial contrasts */
     repeated Time 4 (0 1 3 5) polynomial / summary printe;
  run;

Output 32.7.1: Summary Information on Groups

  The GLM Procedure

  Class Level Information

  Class         Levels    Values
  Drug               2    Morphine Trimethaphan
  Depleted           2    N Y

  Number of Observations Read          16
  Number of Observations Used          15

Output 32.7.2: Repeated Measures Levels

  The GLM Procedure
  Repeated Measures Analysis of Variance

  Repeated Measures Level Information

                              Log          Log          Log          Log
  Dependent Variable   Histamine0   Histamine1   Histamine3   Histamine5

       Level of Time            0            1            3            5

Output 32.7.3: Multivariate Tests of Within-Subject Effects

  The GLM Procedure
  Repeated Measures Analysis of Variance

  MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Time Effect
  H = Type III SSCP Matrix for Time
  E = Error SSCP Matrix

  S=1    M=0.5    N=3.5

  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.11097706      24.03         3         9    0.0001
  Pillai's Trace              0.88902294      24.03         3         9    0.0001
  Hotelling-Lawley Trace      8.01087137      24.03         3         9    0.0001
  Roy's Greatest Root         8.01087137      24.03         3         9    0.0001

  MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Time*Drug Effect
  H = Type III SSCP Matrix for Time*Drug
  E = Error SSCP Matrix

  S=1    M=0.5    N=3.5

  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.34155984       5.78         3         9    0.0175
  Pillai's Trace              0.65844016       5.78         3         9    0.0175
  Hotelling-Lawley Trace      1.92774470       5.78         3         9    0.0175
  Roy's Greatest Root         1.92774470       5.78         3         9    0.0175

  MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Time*Depleted Effect
  H = Type III SSCP Matrix for Time*Depleted
  E = Error SSCP Matrix

  S=1    M=0.5    N=3.5

  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.12339988      21.31         3         9    0.0002
  Pillai's Trace              0.87660012      21.31         3         9    0.0002
  Hotelling-Lawley Trace      7.10373567      21.31         3         9    0.0002
  Roy's Greatest Root         7.10373567      21.31         3         9    0.0002

  MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Time*Drug*Depleted Effect
  H = Type III SSCP Matrix for Time*Drug*Depleted
  E = Error SSCP Matrix

  S=1    M=0.5    N=3.5

  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.19383010      12.48         3         9    0.0015
  Pillai's Trace              0.80616990      12.48         3         9    0.0015
  Hotelling-Lawley Trace      4.15915732      12.48         3         9    0.0015
  Roy's Greatest Root         4.15915732      12.48         3         9    0.0015

Output 32.7.4: Tests of Between-Subject Effects

  The GLM Procedure
  Repeated Measures Analysis of Variance

  Tests of Hypotheses for Between Subjects Effects

  Source                      DF     Type III SS     Mean Square    F Value    Pr > F
  Drug                         1      5.99336243      5.99336243       2.71    0.1281
  Depleted                     1     15.44840703     15.44840703       6.98    0.0229
  Drug*Depleted                1      4.69087508      4.69087508       2.12    0.1734
  Error                       11     24.34683348      2.21334850

Output 32.7.5: Sphericity Test

  The GLM Procedure
  Repeated Measures Analysis of Variance

  Sphericity Tests

                                        Mauchly's
  Variables                    DF       Criterion    Chi-Square    Pr > ChiSq
  Transformed Variates          5       0.1752641     16.930873        0.0046
  Orthogonal Components         5       0.1752641     16.930873        0.0046

Output 32.7.6: Univariate Tests of Within-Subject Effects

  The GLM Procedure
  Repeated Measures Analysis of Variance

  Univariate Tests of Hypotheses for Within Subject Effects

                                                                         Adj Pr > F
  Source               DF    Type III SS    Mean Square   F Value   Pr > F    G - G    H - F
  Time                  3    12.05898677     4.01966226     53.44   <.0001   <.0001   <.0001
  Time*Drug             3     1.84429514     0.61476505      8.17   0.0003   0.0039   0.0008
  Time*Depleted         3    12.08978557     4.02992852     53.57   <.0001   <.0001   <.0001
  Time*Drug*Depleted    3     2.93077939     0.97692646     12.99   <.0001   0.0005   <.0001
  Error(Time)          33     2.48238887     0.07522391

  Greenhouse-Geisser Epsilon    0.5694
  Huynh-Feldt Epsilon           0.8475

Output 32.7.7: Tests of Between-Subject Effects for Transformed Variables

  The GLM Procedure
  Repeated Measures Analysis of Variance
  Analysis of Variance of Contrast Variables

  Time_N represents the nth degree polynomial contrast for Time

  Contrast Variable: Time_1

  Source                      DF     Type III SS     Mean Square    F Value    Pr > F
  Mean                         1      2.00963483      2.00963483      34.99    0.0001
  Drug                         1      1.18069076      1.18069076      20.56    0.0009
  Depleted                     1      1.36172504      1.36172504      23.71    0.0005
  Drug*Depleted                1      2.04346848      2.04346848      35.58    <.0001
  Error                       11      0.63171161      0.05742833

  Contrast Variable: Time_2

  Source                      DF     Type III SS     Mean Square    F Value    Pr > F
  Mean                         1      5.40988418      5.40988418      57.15    <.0001
  Drug                         1      0.59173192      0.59173192       6.25    0.0295
  Depleted                     1      5.94945506      5.94945506      62.86    <.0001
  Drug*Depleted                1      0.67031587      0.67031587       7.08    0.0221
  Error                       11      1.04118707      0.09465337

  Contrast Variable: Time_3

  Source                      DF     Type III SS     Mean Square    F Value    Pr > F
  Mean                         1      4.63946776      4.63946776      63.04    <.0001
  Drug                         1      0.07187246      0.07187246       0.98    0.3443
  Depleted                     1      4.77860547      4.77860547      64.94    <.0001
  Drug*Depleted                1      0.21699504      0.21699504       2.95    0.1139
  Error                       11      0.80949018      0.07359002

The NOUNI option in the MODEL statement suppresses the individual ANOVA tables for the original dependent variables. These analyses are usually of no interest in a repeated measures analysis. The POLYNOMIAL option in the REPEATED statement indicates that the transformation used to implement the repeated measures analysis is an orthogonal polynomial transformation, and the SUMMARY option requests that the univariate analyses for the orthogonal polynomial contrast variables be displayed. The parenthetical numbers (0 1 3 5) determine the spacing of the orthogonal polynomials used in the analysis. The output is displayed in Output 32.7.1 through Output 32.7.7.
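Because the levels 0, 1, 3, and 5 are unequally spaced, the familiar tabled coefficients for equally spaced levels do not apply. One way to see what the spacing does is to build orthonormal polynomial contrasts by Gram-Schmidt orthogonalization of 1, t, t^2, t^3 evaluated at those levels. The Python sketch below does exactly that; it illustrates the idea and is not claimed to reproduce PROC GLM's internal computation (the signs of the rows, for example, are a convention of this sketch). The function name `orthonormal_polynomials` is hypothetical.

```python
import math


def orthonormal_polynomials(levels, degree):
    """Orthonormal polynomial contrasts of degree 1..degree for the given
    (possibly unequally spaced) levels, by Gram-Schmidt on 1, t, t^2, ..."""
    n = len(levels)
    basis = [[1.0 / math.sqrt(n)] * n]  # normalized constant vector
    for d in range(1, degree + 1):
        v = [float(t) ** d for t in levels]
        # Remove the components along every vector found so far (modified Gram-Schmidt)
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant; rows are linear, quadratic, cubic


contrasts = orthonormal_polynomials([0, 1, 3, 5], degree=3)
for name, row in zip(["Time_1", "Time_2", "Time_3"], contrasts):
    print(name, [round(c, 4) for c in row])
```

The linear row is simply the centered times (0, 1, 3, 5 minus their mean of 2.25) scaled to unit length, so the unequal spacing already shows up in the linear trend.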

The Repeated Measures Level Information table, displayed in Output 32.7.2, gives information on the repeated measures effect. In this example, the within-subject (within-dog) effect is Time, which has the levels 0, 1, 3, and 5.

The multivariate analyses for within-subject effects and related interactions are displayed in Output 32.7.3. For this example, the first table shows that the Time effect is significant. In addition, the Time*Drug*Depleted interaction is significant, as shown in the fourth table. This means that the effect of Time on the blood concentration of histamine is different for the four Drug*Depleted combinations studied. The Time*Drug and Time*Depleted interactions are also significant (p = 0.0175 and p = 0.0002, respectively).

Output 32.7.4 displays tests of hypotheses for between-subject (between-dog) effects. This section tests the hypotheses that Drug, Depleted, and their interaction have no effect on the dependent variables, while ignoring the within-dog effects. From this analysis, there is a significant between-dog effect for Depleted (p-value = 0.0229). The interaction and the main effect for Drug are not significant (p-values = 0.1734 and 0.1281, respectively).

Univariate analyses for within-subject (within-dog) effects and related interactions are displayed in Output 32.7.6. For this example, they lead to the same conclusions as the multivariate analyses; this is not always the case. Before the univariate analyses are used to draw conclusions about the data, however, you should examine the result of the sphericity test (requested with the PRINTE option in the REPEATED statement and displayed in Output 32.7.5). If the sphericity hypothesis is rejected, as it is here for the orthogonal components (p = 0.0046), use the adjusted G-G (Greenhouse-Geisser) or H-F (Huynh-Feldt) probabilities; in this example all four within-subject effects remain significant at the 0.01 level under either adjustment. See the "Repeated Measures Analysis of Variance" section for more information.
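The two epsilon estimates in Output 32.7.6 are related. The Greenhouse-Geisser estimate is computed from the covariance (or SSCP) matrix S of the k - 1 orthonormal contrast variables as (tr S)^2 / ((k - 1) tr S^2), and the Huynh-Feldt estimate is a bias-corrected function of it. The Python sketch below assumes the original Huynh-Feldt formula with N = 15 dogs, k = 4 time points, and rank 4 for the between-subject design (the four Drug*Depleted cells); under those assumptions it reproduces the printed 0.8475 from the printed 0.5694. Because the contrast covariance matrix itself is not printed in this example, `gg_epsilon` is exercised only on toy matrices. All function names are hypothetical.

```python
def gg_epsilon(S):
    """Greenhouse-Geisser epsilon from the covariance matrix S of the k-1
    orthonormal within-subject contrasts: (tr S)^2 / ((k-1) * tr(S^2))."""
    m = len(S)
    trace = sum(S[i][i] for i in range(m))
    trace_sq = sum(S[i][j] * S[j][i] for i in range(m) for j in range(m))
    return trace ** 2 / (m * trace_sq)


def hf_epsilon(gg, n_subjects, rank_x, k):
    """Huynh-Feldt epsilon from the Greenhouse-Geisser value (original formula):
    (N*(k-1)*gg - 2) / ((k-1) * (N - rank_x - (k-1)*gg))."""
    m = k - 1
    return (n_subjects * m * gg - 2) / (m * (n_subjects - rank_x - m * gg))


def adjusted_dfs(eps, num_df, den_df):
    """Epsilon-adjusted degrees of freedom behind the G-G and H-F p-values."""
    return eps * num_df, eps * den_df


# Values from this example: 15 dogs, 4 Drug*Depleted cells, 4 time points
gg = 0.5694
hf = hf_epsilon(gg, n_subjects=15, rank_x=4, k=4)
print(f"H-F epsilon = {hf:.4f}")
print("G-G adjusted DF for Time:", adjusted_dfs(gg, 3, 33))
```

Epsilon equals 1 when sphericity holds exactly and can fall as low as 1/(k - 1); the adjusted p-values come from the same F statistic referred to an F distribution with both degrees of freedom multiplied by epsilon.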

Output 32.7.7 is produced by the SUMMARY option in the REPEATED statement. If the POLYNOMIAL option is not used, a similar table is displayed using the default CONTRAST transformation. The linear, quadratic, and cubic trends for Time, labeled Time_1, Time_2, and Time_3, are displayed, and in each case the Source labeled Mean gives a test for the respective trend.

Example 32.8. Mixed Model Analysis of Variance Using the RANDOM Statement

Milliken and Johnson (1984) present an example of an unbalanced mixed model. Three machines, which are considered a fixed effect, and six employees, which are considered a random effect, are studied. Each employee operates each machine either one, two, or three different times. The dependent variable is an overall rating, which takes into account the number and quality of components produced.

The following statements form the data set and perform a mixed model analysis of variance by requesting the TEST option in the RANDOM statement. Note that the machine*person interaction is declared as a random effect; in general, when an interaction involves a random effect, it too should be declared as random. The results of the analysis are shown in Output 32.8.1 through Output 32.8.4.

  data machine;
     input machine person rating @@;   /* @@ reads several observations per line */
     datalines;
  1 1 52.0  1 2 51.8  1 2 52.8  1 3 60.0  1 4 51.1  1 4 52.3
  1 5 50.9  1 5 51.8  1 5 51.4  1 6 46.4  1 6 44.8  1 6 49.2
  2 1 64.0  2 2 59.7  2 2 60.0  2 2 59.0  2 3 68.6  2 3 65.8
  2 4 63.2  2 4 62.8  2 4 62.2  2 5 64.8  2 5 65.0  2 6 43.7
  2 6 44.2  2 6 43.0  3 1 67.5  3 1 67.2  3 1 66.9  3 2 61.5
  3 2 61.7  3 2 62.3  3 3 70.8  3 3 70.6  3 3 71.0  3 4 64.1
  3 4 66.2  3 4 64.0  3 5 72.1  3 5 72.0  3 5 71.1  3 6 62.0
  3 6 61.4  3 6 60.5
  ;

  proc glm data=machine;
     class machine person;
     model rating=machine person machine*person;
     random person machine*person / test;
  run;

Output 32.8.1: Summary Information on Groups

  The GLM Procedure
  Class Level Information
  Class         Levels    Values
  machine            3    1 2 3
  person             6    1 2 3 4 5 6
  Number of Observations Read          44
  Number of Observations Used          44

Output 32.8.2: Fixed-Effect Model Analysis of Variance

  The GLM Procedure
  Dependent Variable: rating
                                         Sum of
  Source                     DF        Squares    Mean Square   F Value   Pr > F
  Model                      17    3061.743333     180.102549    206.41   <.0001
  Error                      26      22.686667       0.872564
  Corrected Total            43    3084.430000
  R-Square     Coeff Var      Root MSE    rating Mean
  0.992645      1.560754      0.934111       59.85000
  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  machine                     2    1648.664722     824.332361    944.72   <.0001
  person                      5    1008.763583     201.752717    231.22   <.0001
  machine*person             10     404.315028      40.431503     46.34   <.0001
  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  machine                     2    1238.197626     619.098813    709.52   <.0001
  person                      5    1011.053834     202.210767    231.74   <.0001
  machine*person             10     404.315028      40.431503     46.34   <.0001

Output 32.8.3: Expected Values of Type III Mean Squares

  The GLM Procedure
  Source                Type III Expected Mean Square
  machine               Var(Error) + 2.137 Var(machine*person) + Q(machine)
  person                Var(Error) + 2.2408 Var(machine*person) + 6.7224 Var(person)
  machine*person        Var(Error) + 2.3162 Var(machine*person)

Output 32.8.4: Mixed Model Analysis of Variance

  The GLM Procedure
  Tests of Hypotheses for Mixed Model Analysis of Variance
  Dependent Variable: rating
  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  machine                     2    1238.197626     619.098813     16.57   0.0007
  Error                  10.036     375.057436      37.370384
  Error: 0.9226*MS(machine*person) + 0.0774*MS(Error)
  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  person                      5    1011.053834     202.210767      5.17   0.0133
  Error                  10.015     392.005726      39.143708
  Error: 0.9674*MS(machine*person) + 0.0326*MS(Error)
  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  machine*person             10     404.315028      40.431503     46.34   <.0001
  Error: MS(Error)           26      22.686667       0.872564

The TEST option in the RANDOM statement requests that PROC GLM determine the appropriate F-tests based on person and machine*person being treated as random effects. As you can see in Output 32.8.4, this requires that a linear combination of mean squares be constructed as the error term for both the machine and person hypotheses; the resulting F-tests therefore use Satterthwaite's approximation for the denominator degrees of freedom.
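The arithmetic behind the machine test in Output 32.8.4 can be reproduced by hand from Outputs 32.8.2 and 32.8.3, as sketched below. The symbols c_1, c_2, MS_den, and ν are introduced here only for illustration, and the values are rounded. The coefficients are chosen so that the expected value of the synthesized error term matches every part of E[MS(machine)] except Q(machine).

```latex
% Match the Var(machine*person) coefficient of E[MS(machine)]; c_1 + c_2 = 1 matches Var(Error)
c_1 = \frac{2.137}{2.3162} \approx 0.9226, \qquad c_2 = 1 - c_1 \approx 0.0774

MS_{\text{den}} = c_1\,MS(\text{machine*person}) + c_2\,MS(\text{Error})
  \approx 0.9226(40.431503) + 0.0774(0.872564) \approx 37.370

F = \frac{MS(\text{machine})}{MS_{\text{den}}}
  = \frac{619.098813}{37.370384} \approx 16.57

% Satterthwaite approximation to the denominator degrees of freedom (10 and 26 are the component df)
\nu = \frac{MS_{\text{den}}^{2}}
           {\dfrac{\bigl(c_1\,MS(\text{machine*person})\bigr)^{2}}{10}
            + \dfrac{\bigl(c_2\,MS(\text{Error})\bigr)^{2}}{26}}
    \approx 10.04
```

The person test follows the same pattern with c_1 = 2.2408/2.3162 ≈ 0.9674, giving F = 202.210767/39.143708 ≈ 5.17.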

Note that you can also use the MIXED procedure to analyze mixed models. The PROC MIXED statements that follow Output 32.8.5 reproduce the mixed model analysis of variance; the relevant part of their results is shown in Output 32.8.5.

Output 32.8.5: PROC MIXED Mixed Model Analysis of Variance (Partial Output)

  The Mixed Procedure
  Type 3 Analysis of Variance
                                      Sum of
  Source              DF         Squares     Mean Square
  machine              2     1238.197626      619.098813
  person               5     1011.053834      202.210767
  machine*person      10      404.315028       40.431503
  Residual            26       22.686667        0.872564
  Type 3 Analysis of Variance
  Source          Expected Mean Square
  machine         Var(Residual) + 2.137 Var(machine*person) + Q(machine)
  person          Var(Residual) + 2.2408 Var(machine*person) + 6.7224 Var(person)
  machine*person  Var(Residual) + 2.3162 Var(machine*person)
  Residual        Var(Residual)
  Type 3 Analysis of Variance
                                                             Error
  Source          Error Term                                    DF   F Value   Pr > F
  machine         0.9226 MS(machine*person)                 10.036     16.57   0.0007
                  + 0.0774 MS(Residual)
  person          0.9674 MS(machine*person)                 10.015      5.17   0.0133
                  + 0.0326 MS(Residual)
  machine*person  MS(Residual)                                  26     46.34   <.0001
  Residual        .                                              .         .        .

  proc mixed data=machine method=type3;
     class machine person;
     model rating = machine;
     random person machine*person;
  run;

The advantage of PROC MIXED is that it offers more versatility for mixed models; the disadvantage is that it can be less computationally efficient for large data sets. See Chapter 46, The MIXED Procedure, for more details.

Example 32.9. Analyzing a Doubly-multivariate Repeated Measures Design

This example shows how to analyze a doubly-multivariate repeated measures design by using PROC GLM with an IDENTITY factor in the REPEATED statement. Note that this differs from previous releases of PROC GLM, in which you had to use a MANOVA statement to get a doubly repeated measures analysis.

Two responses, Y1 and Y2, are each measured three times for each subject (pretreatment, posttreatment, and in a later follow-up). Each subject receives one of three treatments: A, B, or the control. In PROC GLM, you use a REPEATED factor of type IDENTITY to identify the different responses and another repeated factor to identify the different measurement times. The repeated measures analysis includes multivariate tests for time and treatment main effects, as well as their interactions, across responses. The following statements produce Output 32.9.1 through Output 32.9.3.

  data Trial;
     input Treatment $ Repetition PreY1 PostY1 FollowY1
                                  PreY2 PostY2 FollowY2;
     datalines;
  A        1  3 13  9  0  0  9
  A        2  0 14 10  6  6  3
  A        3  4  6 17  8  2  6
  A        4  7  7 13  7  6  4
  A        5  3 12 11  6 12  6
  A        6 10 14  8 13  3  8
  B        1  9 11 17  8 11 27
  B        2  4 16 13  9  3 26
  B        3  8 10  9 12  0 18
  B        4  5  9 13  3  0 14
  B        5  0 15 11  3  0 25
  B        6  4 11 14  4  2  9
  Control  1 10 12 15  4  3  7
  Control  2  2  8 12  8  7 20
  Control  3  4  9 10  2  0 10
  Control  4 10  8  8  5  8 14
  Control  5 11 11 11  1  0 11
  Control  6  1  5 15  8  9 10
  ;
  proc glm data=Trial;
     class Treatment;
     model PreY1 PostY1 FollowY1
           PreY2 PostY2 FollowY2 = Treatment / nouni;
     repeated Response 2 identity, Time 3;
  run;

Output 32.9.1: A Doubly-multivariate Repeated Measures Design

  The GLM Procedure
  Class Level Information
  Class          Levels    Values
  Treatment           3    A B Control
  Number of Observations Read          18
  Number of Observations Used          18

Output 32.9.2: Repeated Factor Levels

  The GLM Procedure
  Repeated Measures Analysis of Variance
  Repeated Measures Level Information
  Dependent Variable      PreY1   PostY1 FollowY1    PreY2   PostY2 FollowY2
  Level of Response           1        1        1        2        2        2
  Level of Time               1        2        3        1        2        3

Output 32.9.3: Within-subject Tests

  The GLM Procedure
  Repeated Measures Analysis of Variance
  MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Response Effect
  H = Type III SSCP Matrix for Response
  E = Error SSCP Matrix
  S=1    M=0    N=6
  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.02165587     316.24         2        14    <.0001
  Pillai's Trace              0.97834413     316.24         2        14    <.0001
  Hotelling-Lawley Trace     45.17686368     316.24         2        14    <.0001
  Roy's Greatest Root        45.17686368     316.24         2        14    <.0001
  MANOVA Test Criteria and F Approximations for the Hypothesis of no Response*Treatment Effect
  H = Type III SSCP Matrix for Response*Treatment
  E = Error SSCP Matrix
  S=2    M=-0.5    N=6
  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.72215797       1.24         4        28    0.3178
  Pillai's Trace              0.27937444       1.22         4        30    0.3240
  Hotelling-Lawley Trace      0.38261660       1.31         4    15.818    0.3074
  Roy's Greatest Root         0.37698780       2.83         2        15    0.0908
  NOTE: F Statistic for Roy's Greatest Root is an upper bound.
  NOTE: F Statistic for Wilks' Lambda is exact.
  MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Response*Time Effect
  H = Type III SSCP Matrix for Response*Time
  E = Error SSCP Matrix
  S=1    M=1    N=5
  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.14071380      18.32         4        12    <.0001
  Pillai's Trace              0.85928620      18.32         4        12    <.0001
  Hotelling-Lawley Trace      6.10662362      18.32         4        12    <.0001
  Roy's Greatest Root         6.10662362      18.32         4        12    <.0001
  MANOVA Test Criteria and F Approximations for the
  Hypothesis of no Response*Time*Treatment Effect
  H = Type III SSCP Matrix for Response*Time*Treatment
  E = Error SSCP Matrix
  S=2    M=0.5    N=5
  Statistic                        Value    F Value    Num DF    Den DF    Pr > F
  Wilks' Lambda               0.22861451       3.27         8        24    0.0115
  Pillai's Trace              0.96538785       3.03         8        26    0.0151
  Hotelling-Lawley Trace      2.52557514       3.64         8        15    0.0149
  Roy's Greatest Root         2.12651905       6.91         4        13    0.0033
  NOTE: F Statistic for Roy's Greatest Root is an upper bound.
  NOTE: F Statistic for Wilks' Lambda is exact.

The levels of the repeated factors are displayed in Output 32.9.2. Note that Response is 1 for all the Y1 measurements and 2 for all the Y2 measurements, while the three levels of Time identify the pretreatment, posttreatment, and follow-up measurements within each response. The multivariate tests for within-subject effects are displayed in Output 32.9.3.

The table for Response*Treatment tests for an overall treatment effect across the two responses; likewise, the tables for Response*Time and Response*Time*Treatment test for time and the treatment-by-time interaction, respectively. In this case, there is a strong main effect for time and possibly for the interaction, but not for treatment.

In previous releases (before the IDENTITY transformation was introduced), in order to perform a doubly repeated measures analysis, you had to use a MANOVA statement with a customized transformation matrix M. You might still want to use this approach to see details of the analysis, such as the univariate ANOVA for each transformed variate. The following statements demonstrate this approach by using the MANOVA statement to test for the overall main effect of time and specifying the SUMMARY option.

  proc glm data=Trial;
     class Treatment;
     model PreY1 PostY1 FollowY1
           PreY2 PostY2 FollowY2 = Treatment / nouni;
     manova h=intercept m=prey1 - posty1,
                          prey1 - followy1,
                          prey2 - posty2,
                          prey2 - followy2 / summary;
  run;

The M matrix used to perform the test for time effects is displayed in Output 32.9.4, while the results of the multivariate test are given in Output 32.9.5. Note that the test results are the same as for the Response*Time effect in Output 32.9.3.

Output 32.9.4: M Matrix to Test for Time Effect (Repeated Measure)

  The GLM Procedure
  Multivariate Analysis of Variance
  M Matrix Describing Transformed Variables
                PreY1     PostY1   FollowY1      PreY2     PostY2   FollowY2
  MVAR1             1         -1          0          0          0          0
  MVAR2             1          0         -1          0          0          0
  MVAR3             0          0          0          1         -1          0
  MVAR4             0          0          0          1          0         -1

Output 32.9.5: Tests for Time Effect (Repeated Measure)

  The GLM Procedure   Multivariate Analysis of Variance   Characteristic Roots and Vectors of: E Inverse * H, where   H = Type III SSCP Matrix for Intercept   E = Error SSCP Matrix   Variables have been transformed by the M Matrix   Characteristic               Characteristic Vector VEV=1   Root    Percent           MVAR1           MVAR2           MVAR3          MVAR4   6.10662362     100.00   0.00157729      0.04081620   0.04210209      0.03519437   0.00000000       0.00      0.00796367      0.00493217      0.05185236      0.00377940   0.00000000       0.00   0.03534089   0.01502146   0.00283074      0.04259372   0.00000000       0.00   0.05672137      0.04500208      0.00000000      0.00000000   MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Intercept Effect   on the Variables Defined by the M Matrix Transformation   H = Type III SSCP Matrix for Intercept   E = Error SSCP Matrix   S=1    M=1    N=5   Statistic                        Value    F Value    Num DF    Den DF    Pr > F   Wilks' Lambda               0.14071380      18.32         4        12    <.0001   Pillai's Trace              0.85928620      18.32         4        12    <.0001   Hotelling-Lawley Trace      6.10662362      18.32         4        12    <.0001   Roy's Greatest Root         6.10662362      18.32         4        12    <.0001

The SUMMARY option in the MANOVA statement creates an ANOVA table for each transformed variable as defined by the M matrix. MVAR1 and MVAR2 contrast the pretreatment measurement for Y1 with the posttreatment and follow-up measurements for Y1, respectively; MVAR3 and MVAR4 are the same contrasts for Y2. Output 32.9.6 displays these univariate ANOVA tables and shows that the contrasts are all strongly significant except for the pre-versus-post difference for Y2.
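As a check on the first table, the MVAR1 line can be reproduced from the Trial data. The column totals below were computed by hand from the data step, and the symbol d̄ is introduced only for illustration. Because each treatment has six subjects, the Type III hypothesis for Intercept (mean of the treatment means equals zero) reduces to a test on the overall mean of the transformed variable.

```latex
% MVAR1 = PreY1 - PostY1 for each of the N = 18 subjects (6 per treatment)
\bar{d} = \frac{\sum \text{PreY1} - \sum \text{PostY1}}{N}
        = \frac{95 - 191}{18} = -\frac{16}{3}

% Balanced groups: Type III SS for Intercept equals N \bar{d}^{2}
SS_{\text{Intercept}} = N\,\bar{d}^{2} = 18 \cdot \frac{256}{9} = 512

F = \frac{SS_{\text{Intercept}}/1}{SS_{\text{Error}}/15}
  = \frac{512}{339/15} = \frac{512}{22.6} \approx 22.65
```

The same reasoning gives MVAR4: the PreY2 and FollowY2 totals are 107 and 227, so d̄ = -120/18 and 18 d̄² = 800, matching the Type III SS in the last table.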

Output 32.9.6: Summary Output for the Test for Time Effect

  The GLM Procedure
  Multivariate Analysis of Variance
  Dependent Variable: MVAR1
  Source                      DF     Type III SS     Mean Square    F Value    Pr > F
  Intercept                    1     512.0000000     512.0000000      22.65    0.0003
  Error                       15     339.0000000      22.6000000
  The GLM Procedure
  Multivariate Analysis of Variance
  Dependent Variable: MVAR2
  Source                      DF     Type III SS     Mean Square    F Value    Pr > F
  Intercept                    1     813.3888889     813.3888889      32.87    <.0001
  Error                       15     371.1666667      24.7444444
  The GLM Procedure
  Multivariate Analysis of Variance
  Dependent Variable: MVAR3
  Source                      DF     Type III SS     Mean Square    F Value    Pr > F
  Intercept                    1      68.0555556      68.0555556       3.49    0.0814
  Error                       15     292.5000000      19.5000000
  The GLM Procedure
  Multivariate Analysis of Variance
  Dependent Variable: MVAR4
  Source                      DF     Type III SS     Mean Square    F Value    Pr > F
  Intercept                    1     800.0000000     800.0000000      26.43    0.0001
  Error                       15     454.0000000      30.2666667

Example 32.10. Testing for Equal Group Variances

This example demonstrates how you can test for equal group variances in a one-way design. The data come from the University of Pennsylvania Smell Identification Test (UPSIT), reported in O'Brien and Heft (1995). The study is undertaken to explore how age and gender are related to sense of smell. A total of 180 subjects, 20 to 89 years old, are exposed to 40 different odors: for each odor, subjects are asked to choose which of four words best describes the odor. The Freeman-Tukey modified arcsine transformation (Bishop et al. 1975) is applied to the proportion of correctly identified odors to arrive at an olfactory index. For the following analysis, subjects are divided into five age groups, coded 1 through 5 in the variable agegroup.

The following statements create a data set named upsit, containing the age group and olfactory index for each subject.

  data upsit;
     input agegroup smell @@;
     datalines;
  1 1.381  1 1.322  1 1.162  1 1.275  1 1.381  1 1.275  1 1.322
  1 1.492  1 1.322  1 1.381  1 1.162  1 1.013  1 1.322  1 1.322
  1 1.275  1 1.492  1 1.322  1 1.322  1 1.492  1 1.322  1 1.381
  1 1.234  1 1.162  1 1.381  1 1.381  1 1.381  1 1.322  1 1.381
  1 1.322  1 1.381  1 1.275  1 1.492  1 1.275  1 1.322  1 1.275
  1 1.381  1 1.234  1 1.105
  2 1.234  2 1.234  2 1.381  2 1.322  2 1.492  2 1.234  2 1.381
  2 1.381  2 1.492  2 1.492  2 1.275  2 1.492  2 1.381  2 1.492
  2 1.322  2 1.275  2 1.275  2 1.275  2 1.322  2 1.492  2 1.381
  2 1.322  2 1.492  2 1.196  2 1.322  2 1.275  2 1.234  2 1.322
  2 1.098  2 1.322  2 1.381  2 1.275  2 1.492  2 1.492  2 1.381
  2 1.196
  3 1.381  3 1.381  3 1.492  3 1.492  3 1.492  3 1.098  3 1.492
  3 1.381  3 1.234  3 1.234  3 1.129  3 1.069  3 1.234  3 1.322
  3 1.275  3 1.230  3 1.234  3 1.234  3 1.322  3 1.322  3 1.381
  4 1.322  4 1.381  4 1.381  4 1.322  4 1.234  4 1.234  4 1.234
  4 1.381  4 1.322  4 1.275  4 1.275  4 1.492  4 1.234  4 1.098
  4 1.322  4 1.129  4 0.687  4 1.322  4 1.322  4 1.234  4 1.129
  4 1.492  4 0.810  4 1.234  4 1.381  4 1.040  4 1.381  4 1.381
  4 1.129  4 1.492  4 1.129  4 1.098  4 1.275  4 1.322  4 1.234
  4 1.196  4 1.234  4 0.585  4 0.785  4 1.275  4 1.322  4 0.712
  4 0.810
  5 1.322  5 1.234  5 1.381  5 1.275  5 1.275  5 1.322  5 1.162
  5 0.909  5 0.502  5 1.234  5 1.322  5 1.196  5 0.859  5 1.196
  5 1.381  5 1.322  5 1.234  5 1.275  5 1.162  5 1.162  5 0.585
  5 1.013  5 0.960  5 0.662  5 1.129  5 0.531  5 1.162  5 0.737
  5 1.098  5 1.162  5 1.040  5 0.558  5 0.960  5 1.098  5 0.884
  5 1.162  5 1.098  5 0.859  5 1.275  5 1.162  5 0.785  5 0.859
  ;

Older people are more at risk for problems with their sense of smell, and this should be reflected in significant differences in the mean of the olfactory index across the different age groups. However, many older people also have an excellent sense of smell, which implies that the older age groups should have greater variability. In order to test this hypothesis and to compute a one-way ANOVA for the olfactory index that is robust to the possibility of unequal group variances, you can use the HOVTEST and WELCH options in the MEANS statement for the GLM procedure, as shown in the following code.

  proc glm data=upsit;
     class agegroup;
     model smell = agegroup;
     means agegroup / hovtest welch;
  run;

Output 32.10.1, Output 32.10.2, and Output 32.10.3 display the usual ANOVA test for equal age group means, Levene's test for equal age group variances, and Welch's test for equal age group means, respectively. The hypotheses of age effects on both the mean and the variance of the olfactory index are supported (p < .0001 in each test).
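For reference, the two less familiar statistics can be written out as follows. This is a sketch of the standard textbook definitions rather than a transcription of PROC GLM's internals; the symbols (g groups with sizes n_i, means ȳ_i, variances s_i², and the helper quantities w_i, W, ỹ, Λ) are introduced here for illustration. In this example g = 5 and the group sizes are 38, 36, 21, 43, and 42.

```latex
% Levene's test as in Output 32.10.2: one-way ANOVA on squared deviations from group means
z_{ij} = \bigl(y_{ij} - \bar{y}_{i}\bigr)^{2},
\qquad \text{df} = \bigl(g - 1,\ N - g\bigr) = (4,\ 175)

% Welch's ANOVA: weight each group mean by the inverse of its estimated variance
w_i = \frac{n_i}{s_i^{2}}, \quad W = \sum_{i=1}^{g} w_i, \quad
\tilde{y} = \frac{1}{W}\sum_{i=1}^{g} w_i\,\bar{y}_i, \quad
\Lambda = \sum_{i=1}^{g} \frac{(1 - w_i/W)^{2}}{n_i - 1}

F_{\text{Welch}} = \frac{\dfrac{1}{g-1}\sum_{i=1}^{g} w_i\,(\bar{y}_i - \tilde{y})^{2}}
                        {1 + \dfrac{2(g-2)}{g^{2}-1}\,\Lambda},
\qquad
\text{df} = \Bigl(g - 1,\ \frac{g^{2}-1}{3\Lambda}\Bigr)
```

Because the denominator degrees of freedom depend on the sample variances, they are generally fractional, as with the 78.7489 reported in Output 32.10.3.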

Output 32.10.1: Usual ANOVA Test for Age Group Differences in Mean Olfactory Index

  The GLM Procedure
  Dependent Variable: smell
  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  agegroup                    4     2.13878141     0.53469535     16.65   <.0001

Output 32.10.2: Levene's Test for Age Group Differences in Olfactory Variability

  The GLM Procedure
  Levene's Test for Homogeneity of smell Variance
  ANOVA of Squared Deviations from Group Means
                        Sum of        Mean
  Source          DF     Squares      Square    F Value    Pr > F
  agegroup         4      0.0799      0.0200       6.35    <.0001
  Error          175      0.5503     0.00314

Output 32.10.3: Welch's Test for Age Group Differences in Mean Olfactory Index

  The GLM Procedure
  Welch's ANOVA for smell
  Source            DF    F Value    Pr > F
  agegroup      4.0000      13.72    <.0001
  Error        78.7489

Example 32.11. Analysis of a Screening Design

Yin and Jillie (1987) describe an experiment on a nitride etch process for a single wafer plasma etcher. The experiment is run using four factors: cathode power (power), gas flow (flow), reactor chamber pressure (pressure), and electrode gap (gap). Of interest are the main effects and interaction effects of the factors on the nitride etch rate (rate). The following statements create a SAS data set named HalfFraction, containing the factor settings and the observed etch rate for each of eight experimental runs.

  data HalfFraction;
     input power flow pressure gap rate;
     datalines;
  0.8   4.5 125 275     550
  0.8   4.5 200 325     650
  0.8 550.0 125 325     642
  0.8 550.0 200 275     601
  1.2   4.5 125 325     749
  1.2   4.5 200 275    1052
  1.2 550.0 125 275    1075
  1.2 550.0 200 325     729
  ;

Notice that each of the factors has just two values. This is a common experimental design when the intent is to screen, from the many factors that might affect the response, the few that actually do. Since there are 2⁴ = 16 different possible settings of four two-level factors, this design with only eight runs is called a half fraction. The eight runs are chosen specifically to provide unambiguous information on main effects at the cost of confounding interaction effects with each other.
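The confounding can be derived from the runs themselves. In the sketch below, the letters A, B, C, D are shorthand introduced here for power, flow, pressure, and gap, each coded as -1 for its low level and +1 for its high level; the relation D = ABC was checked by hand against all eight rows of HalfFraction.

```latex
% In every run of HalfFraction, the coded gap level equals the product of the other three
D = ABC \quad\Longleftrightarrow\quad I = ABCD

% Multiply the defining relation by any effect, using A^2 = B^2 = C^2 = D^2 = I:
A = BCD, \quad B = ACD, \quad C = ABD, \quad D = ABC

AB = CD, \qquad AC = BD, \qquad BC = AD
```

Main effects are aliased only with three-factor interactions, which are usually assumed negligible, whereas the two-factor interactions are aliased in pairs.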

One way to analyze these data is simply to use PROC GLM to compute an analysis of variance, including both main effects and interactions in the model. The following statements demonstrate this approach.

  proc glm data=HalfFraction;
     class power flow pressure gap;
     /* main effects and all two-factor interactions */
     model rate=power|flow|pressure|gap@2;
  run;

The |@2 notation in the MODEL statement includes all main effects and all two-factor interactions between the factors. The output is shown in Output 32.11.1.

Output 32.11.1: Analysis of Variance for Nitride Etch Process Half Fraction

  The GLM Procedure
  Class Level Information
  Class         Levels    Values
  power              2    0.8 1.2
  flow               2    4.5 550
  pressure           2    125 200
  gap                2    275 325
  Number of Observations Read           8
  Number of Observations Used           8
  The GLM Procedure
  Dependent Variable: rate
                                         Sum of
  Source                     DF        Squares    Mean Square   F Value   Pr > F
  Model                       7    280848.0000     40121.1429       .        .
  Error                       0         0.0000          .
  Corrected Total             7    280848.0000
  R-Square     Coeff Var      Root MSE     rate Mean
  1.000000           .               .      756.0000
  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  power                       1    168780.5000    168780.5000       .        .
  flow                        1       264.5000       264.5000       .        .
  power*flow                  1       200.0000       200.0000       .        .
  pressure                    1        32.0000        32.0000       .        .
  power*pressure              1      1300.5000      1300.5000       .        .
  flow*pressure               1     78012.5000     78012.5000       .        .
  gap                         1     32258.0000     32258.0000       .        .
  power*gap                   0         0.0000          .           .        .
  flow*gap                    0         0.0000          .           .        .
  pressure*gap                0         0.0000          .           .        .
  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  power                       1    168780.5000    168780.5000       .        .
  flow                        1       264.5000       264.5000       .        .
  power*flow                  0         0.0000          .           .        .
  pressure                    1        32.0000        32.0000       .        .
  power*pressure              0         0.0000          .           .        .
  flow*pressure               0         0.0000          .           .        .
  gap                         1     32258.0000     32258.0000       .        .
  power*gap                   0         0.0000          .           .        .
  flow*gap                    0         0.0000          .           .        .
  pressure*gap                0         0.0000          .           .        .

Notice that there are no error degrees of freedom. This is because there are 10 effects in the model (4 main effects plus 6 interactions) but only 8 observations in the data set. This is another cost of using a fractional design: not only is it impossible to estimate all the main effects and interactions, but there is also no information left to estimate the underlying error variance in order to measure the significance of the effects that are estimable.
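The degrees-of-freedom bookkeeping behind this can be written out explicitly, as sketched below (n is the number of runs; each effect of a two-level factor carries one degree of freedom).

```latex
% Parameters requested by power|flow|pressure|gap@2: intercept, main effects, two-factor interactions
1 + \binom{4}{1} + \binom{4}{2} = 1 + 4 + 6 = 11 \;>\; n = 8

% At most n - 1 = 7 effect df can be estimated, which is the Model DF in Output 32.11.1
\text{df}_{\text{Error}} = n - 1 - \text{df}_{\text{Model}} = 8 - 1 - 7 = 0
```

Three of the requested interaction degrees of freedom are therefore lost to aliasing, which is why three interaction rows in the Type I table have DF equal to 0.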

Another thing to notice in Output 32.11.1 is the difference between the Type I and Type III ANOVA tables. The rows corresponding to main effects in each are the same, but no Type III interaction tests are estimable, while some Type I interaction tests are estimable. This indicates that there is aliasing in the design: some interactions are completely confounded with each other.

In order to analyze this confounding, you should examine the aliasing structure of the design by using the ALIASING option in the MODEL statement. Before doing so, however, it is advisable to code the design, replacing the low and high levels of each factor with the values -1 and +1, respectively. This puts each factor on an equal footing in the model and makes the aliasing structure much more interpretable. The following statements code the data, creating a new data set named Coded.

  data Coded; set HalfFraction;
     /* low level -> -1, high level -> +1 */
     power    = -1*(power   =0.80) + 1*(power   =1.20);
     flow     = -1*(flow    =4.50) + 1*(flow    =550);
     pressure = -1*(pressure=125)  + 1*(pressure=200);
     gap      = -1*(gap     =275)  + 1*(gap     =325);
  run;
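The DATA step codes each factor with indicator expressions. An equivalent way to state the same coding, shown here for illustration with symbols L (low level), H (high level), and v (original value) that are not part of the program, is the usual centering-and-scaling map.

```latex
% Map the low level L to -1 and the high level H to +1
x = \frac{v - (L + H)/2}{(H - L)/2}

% Example: power with L = 0.8 and H = 1.2
x = \frac{v - 1.0}{0.2}, \qquad 0.8 \mapsto -1, \qquad 1.2 \mapsto +1
```

The linear form also works for intermediate settings, which matters if center points are later added to the design; the indicator form in the DATA step assigns 0 to any value that is neither the low nor the high level.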

The following statements use the GLM procedure to reanalyze the coded design, displaying the parameter estimates as well as the functions of the parameters that they each estimate.

  proc glm data=Coded;
     model rate=power|flow|pressure|gap@2 / solution aliasing;
  run;

The parameter estimates table is shown in Output 32.11.2.

Output 32.11.2: Parameter Estimates and Aliases for Nitride Etch Process Half Fraction

  The GLM Procedure
  Dependent Variable: rate
                                          Standard
  Parameter              Estimate            Error   t Value   Pr > |t|   Expected Value
  Intercept           756.0000000                .       .          .     Intercept
  power               145.2500000                .       .          .     power
  flow                  5.7500000                .       .          .     flow
  power*flow           -5.0000000 B              .       .          .     power*flow + pressure*gap
  pressure              2.0000000                .       .          .     pressure
  power*pressure      -12.7500000 B              .       .          .     power*pressure + flow*gap
  flow*pressure       -98.7500000 B              .       .          .     flow*pressure + power*gap
  gap                 -63.5000000                .       .          .     gap
  power*gap             0.0000000 B              .       .          .
  flow*gap              0.0000000 B              .       .          .
  pressure*gap          0.0000000 B              .       .          .
  NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
        the normal equations. Terms whose estimates are followed by the letter 'B' are not
        uniquely estimable.

Looking at the Expected Value column, notice that, while each of the main effects is unambiguously estimated by its associated term in the model, the expected values of the interaction estimates are more complicated. For example, the relatively large effect (-98.75) corresponding to flow*pressure actually estimates the combined effect of flow*pressure and power*gap. Without further information, it is impossible to disentangle these aliased interactions; however, since the main effects of both power and gap are large and those for flow and pressure are small, it is reasonable to suspect that power*gap is the more active of the two interactions.
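The -98.75 estimate can be verified by hand. With ±1 coding and eight runs, each estimable coefficient is the average of the response weighted by its coded column. The sketch below uses the flow*pressure column in the run order of HalfFraction (the power*gap column is identical, which is exactly the aliasing); the symbols x_u and y_u are introduced only for illustration.

```latex
% Coded flow*pressure column, runs 1-8 in HalfFraction order (same as the power*gap column)
x = (+1,\,-1,\,-1,\,+1,\,+1,\,-1,\,-1,\,+1)

\hat{\beta}_{\text{flow*pressure}} = \frac{1}{8}\sum_{u=1}^{8} x_u\,y_u
  = \frac{(550 + 601 + 749 + 729) - (650 + 642 + 1052 + 1075)}{8}
  = \frac{2629 - 3419}{8} = -98.75

% Sum of squares for a single +/-1 column in an 8-run orthogonal design
SS = 8\,\hat{\beta}^{2} = 8 \times 98.75^{2} = 78012.5
```

The last line reproduces the Type I SS for flow*pressure in Output 32.11.1, and the same formula gives, for example, 8 × 145.25² = 168780.5 for power.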

Fortunately, eight more runs are available for this experiment (the other half fraction). The following statements create a data set containing these extra runs and add it to the previous eight, resulting in a full 2⁴ = 16-run replicate. Then PROC GLM displays the analysis of variance again.

  data OtherHalf;
     input power flow pressure gap rate;
     datalines;
  0.8   4.5 125 325     669
  0.8   4.5 200 275     604
  0.8 550.0 125 275     633
  0.8 550.0 200 325     635
  1.2   4.5 125 275    1037
  1.2   4.5 200 325     868
  1.2 550.0 125 325     860
  1.2 550.0 200 275    1063
  ;
  data FullRep;
     set HalfFraction OtherHalf;
  run;
  proc glm data=FullRep;
     class power flow pressure gap;
     model rate=power|flow|pressure|gap@2;
  run;

The results are displayed in Output 32.11.3.

Output 32.11.3: Analysis of Variance for Nitride Etch Process Full Replicate

  The GLM Procedure
  Class Level Information
  Class         Levels    Values
  power              2    0.8 1.2
  flow               2    4.5 550
  pressure           2    125 200
  gap                2    275 325
  Number of Observations Read          16
  Number of Observations Used          16
  The GLM Procedure
  Dependent Variable: rate
                                         Sum of
  Source                     DF        Squares    Mean Square   F Value   Pr > F
  Model                      10    521234.1250     52123.4125     25.58   0.0011
  Error                       5     10186.8125      2037.3625
  Corrected Total            15    531420.9375
  R-Square     Coeff Var      Root MSE     rate Mean
  0.980831      5.816175      45.13715      776.0625
  Source                     DF      Type I SS    Mean Square   F Value   Pr > F
  power                       1    374850.0625    374850.0625    183.99   <.0001
  flow                        1       217.5625       217.5625      0.11   0.7571
  power*flow                  1        18.0625        18.0625      0.01   0.9286
  pressure                    1        10.5625        10.5625      0.01   0.9454
  power*pressure              1         1.5625         1.5625      0.00   0.9790
  flow*pressure               1      7700.0625      7700.0625      3.78   0.1095
  gap                         1     41310.5625     41310.5625     20.28   0.0064
  power*gap                   1     94402.5625     94402.5625     46.34   0.0010
  flow*gap                    1      2475.0625      2475.0625      1.21   0.3206
  pressure*gap                1       248.0625       248.0625      0.12   0.7414
  Source                     DF    Type III SS    Mean Square   F Value   Pr > F
  power                       1    374850.0625    374850.0625    183.99   <.0001
  flow                        1       217.5625       217.5625      0.11   0.7571
  power*flow                  1        18.0625        18.0625      0.01   0.9286
  pressure                    1        10.5625        10.5625      0.01   0.9454
  power*pressure              1         1.5625         1.5625      0.00   0.9790
  flow*pressure               1      7700.0625      7700.0625      3.78   0.1095
  gap                         1     41310.5625     41310.5625     20.28   0.0064
  power*gap                   1     94402.5625     94402.5625     46.34   0.0010
  flow*gap                    1      2475.0625      2475.0625      1.21   0.3206
  pressure*gap                1       248.0625       248.0625      0.12   0.7414

With sixteen runs, the analysis of variance tells the whole story: all effects are estimable and there are five degrees of freedom left over to estimate the underlying error. The main effects of power and gap and their interaction are all significant, and no other effects are. Notice that the Type I and Type III ANOVA tables are the same; this is because the design is orthogonal and all effects are estimable.
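The F values in Output 32.11.3 follow directly from the ANOVA table, as the short computation below shows for the two gap-related effects (values rounded).

```latex
% 16 runs: 15 corrected df, of which 10 go to the model effects
\text{df}_{\text{Error}} = (16 - 1) - 10 = 5,
\qquad MS_{\text{Error}} = \frac{10186.8125}{5} = 2037.3625

F_{\text{power*gap}} = \frac{94402.5625}{2037.3625} \approx 46.34,
\qquad
F_{\text{gap}} = \frac{41310.5625}{2037.3625} \approx 20.28
```

With only 5 error degrees of freedom, these F statistics are referred to F(1, 5) distributions, which is why even fairly large F values yield p-values near 0.001 rather than below 0.0001.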

This example illustrates the use of the GLM procedure for the model analysis of a screening experiment. Typically, there is much more involved in performing an experiment of this type, from selecting the design points to be studied to graphically assessing significant effects, optimizing the final model, and performing subsequent experimentation. Specialized tools for this are available in SAS/QC software, in particular the ADX Interface and the FACTEX and OPTEX procedures. Refer to the SAS/QC User's Guide for more information.