Getting Started | SAS/STAT 9.1 User's Guide, Volumes 1-7

The following examples demonstrate how you can use the ANOVA procedure to perform analyses of variance for a one-way layout and a randomized complete block design.

One-Way Layout with Means Comparisons

A one-way analysis of variance considers one treatment factor with two or more treatment levels. The goal of the analysis is to test for differences among the means of the levels and to quantify these differences. If there are two treatment levels, this analysis is equivalent to a t test comparing two group means.

The assumptions of analysis of variance (Steel and Torrie 1980) are as follows:

- treatment effects are additive
- experimental errors
  - are random
  - are independently distributed
  - follow a normal distribution
  - have mean zero and constant variance
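These assumptions are often summarized in a single model equation. The following is a sketch in standard textbook notation; the symbols (y, mu, tau, epsilon, sigma, k, n_i) are introduced here for illustration and do not appear in the original text.

```latex
y_{ij} = \mu + \tau_i + \varepsilon_{ij},
\qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad i = 1, \dots, k, \quad j = 1, \dots, n_i
```

Here y_ij is the j-th response at treatment level i, mu is the overall mean, tau_i is the additive effect of level i, and the errors epsilon_ij are independent, normally distributed, with mean zero and constant variance sigma squared.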

The following example studies the effect of bacteria on the nitrogen content of red clover plants. The treatment factor is bacteria strain, and it has six levels. Five of the six levels consist of five different Rhizobium trifolii bacteria cultures combined with a composite of five Rhizobium meliloti strains. The sixth level is a composite of the five Rhizobium trifolii strains with the composite of the Rhizobium meliloti. Red clover plants are inoculated with the treatments, and nitrogen content is later measured in milligrams. The data are derived from an experiment by Erdman (1946) and are analyzed in Chapters 7 and 8 of Steel and Torrie (1980). The following DATA step creates the SAS data set Clover:

   title1 'Nitrogen Content of Red Clover Plants';
   data Clover;
      input Strain $ Nitrogen @@;
      datalines;
   3DOK1  19.4 3DOK1  32.6 3DOK1  27.0 3DOK1  32.1 3DOK1  33.0
   3DOK5  17.7 3DOK5  24.8 3DOK5  27.9 3DOK5  25.2 3DOK5  24.3
   3DOK4  17.0 3DOK4  19.4 3DOK4   9.1 3DOK4  11.9 3DOK4  15.8
   3DOK7  20.7 3DOK7  21.0 3DOK7  20.5 3DOK7  18.8 3DOK7  18.6
   3DOK13 14.3 3DOK13 14.4 3DOK13 11.8 3DOK13 11.6 3DOK13 14.2
   COMPOS 17.3 COMPOS 19.4 COMPOS 19.1 COMPOS 16.9 COMPOS 20.8
   ;

The variable Strain contains the treatment levels, and the variable Nitrogen contains the response. The following statements produce the analysis.

   proc anova data = Clover;
      class strain;
      model Nitrogen = Strain;
   run;

The classification variable is specified in the CLASS statement. Note that, unlike the GLM procedure, PROC ANOVA does not allow continuous variables on the right-hand side of the model. Figure 17.1 and Figure 17.2 display the output produced by these statements.
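For comparison, the same one-way model can be fit with PROC GLM, which accepts the same CLASS and MODEL statements. The following is a minimal sketch, assuming the Clover data set created above already exists; it is not part of the original example. For a one-way layout such as this one, the F test for Strain should agree with the PROC ANOVA result.

```sas
/* Sketch: fit the same one-way model with PROC GLM instead of PROC ANOVA. */
/* Assumes the Clover data set from the DATA step above already exists.    */
proc glm data=Clover;
   class Strain;              /* classification (treatment) variable */
   model Nitrogen = Strain;   /* response = treatment factor         */
run;
quit;                         /* end the interactive procedure       */
```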

   Nitrogen Content of Red Clover Plants

   The ANOVA Procedure

   Class Level Information

   Class         Levels    Values
   Strain             6    3DOK1 3DOK13 3DOK4 3DOK5 3DOK7 COMPOS

   Number of Observations Read          30
   Number of Observations Used          30

Figure 17.1: Class Level Information

   Dependent Variable: Nitrogen

                                         Sum of
   Source                     DF        Squares    Mean Square   F Value   Pr > F
   Model                       5     847.046667     169.409333     14.37   <.0001
   Error                      24     282.928000      11.788667
   Corrected Total            29    1129.974667

   R-Square     Coeff Var      Root MSE    Nitrogen Mean
   0.749616      17.26515      3.433463         19.88667

   Source                     DF       Anova SS    Mean Square   F Value   Pr > F
   Strain                      5    847.0466667    169.4093333     14.37   <.0001

Figure 17.2: ANOVA Table

The 'Class Level Information' table shown in Figure 17.1 lists the variables that appear in the CLASS statement, their levels, and the number of observations in the data set.

Figure 17.2 displays the ANOVA table, followed by some simple statistics and tests of effects.

The degrees of freedom (DF) column should be used to check the analysis results. The model degrees of freedom for a one-way analysis of variance are the number of levels minus 1; in this case, 6 − 1 = 5. The Corrected Total degrees of freedom are always the total number of observations minus one; in this case, 30 − 1 = 29. The sum of the Model and Error degrees of freedom equals the Corrected Total degrees of freedom.
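The entries in Figure 17.2 can also be checked by hand. The following worked arithmetic uses the standard definitions (a mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the model mean square to the error mean square); the symbols are introduced here for illustration only.

```latex
\begin{aligned}
\mathrm{DF}_{\text{Model}} + \mathrm{DF}_{\text{Error}} &= 5 + 24 = 29 = \mathrm{DF}_{\text{Corrected Total}} \\
\mathrm{SS}_{\text{Model}} + \mathrm{SS}_{\text{Error}} &= 847.046667 + 282.928000 = 1129.974667 \\
\mathrm{MS}_{\text{Model}} &= 847.046667 / 5 \approx 169.409333 \\
\mathrm{MS}_{\text{Error}} &= 282.928000 / 24 \approx 11.788667 \\
F &= 169.409333 / 11.788667 \approx 14.37 \\
R^2 &= 847.046667 / 1129.974667 \approx 0.7496
\end{aligned}
```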

The overall F test is significant (F = 14.37, p < 0.0001), indicating that the model as a whole accounts for a significant portion of the variability in the dependent variable. The F test for Strain is significant, indicating that some contrast between the means for the different strains is different from zero. Notice that the Model and Strain F tests are identical, since Strain is the only term in the model.

The F test for Strain ( F = 14.37, p < 0.0001) suggests that there are differences among the bacterial strains, but it does not reveal any information about the nature of the differences. Mean comparison methods can be used to gather further information. The interactivity of PROC ANOVA enables you to do this without re-running the entire analysis. After you specify a model with a MODEL statement and execute the ANOVA procedure with a RUN statement, you can execute a variety of statements (such as MEANS, MANOVA, TEST, and REPEATED) without PROC ANOVA recalculating the model sum of squares.

The following command requests means of the Strain levels with Tukey's studentized range procedure.

  means strain / tukey;

Results of Tukey's procedure are shown in Figure 17.3.

   The ANOVA Procedure

   Tukey's Studentized Range (HSD) Test for Nitrogen

   NOTE: This test controls the Type I experimentwise error rate, but it generally
         has a higher Type II error rate than REGWQ.

   Alpha                                   0.05
   Error Degrees of Freedom                  24
   Error Mean Square                   11.78867
   Critical Value of Studentized Range  4.37265
   Minimum Significant Difference        6.7142

   Means with the same letter are not significantly different.

   Tukey Grouping          Mean      N    Strain

             A           28.820      5    3DOK1
             A
        B    A           23.980      5    3DOK5
        B
        B    C           19.920      5    3DOK7
        B    C
        B    C           18.700      5    COMPOS
             C
             C           14.640      5    3DOK4
             C
             C           13.260      5    3DOK13

Figure 17.3: Tukey's Multiple Comparisons Procedure
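The Minimum Significant Difference in Figure 17.3 can be reproduced from the other values in the same table. The following is a sketch using the usual equal-sample-size form of Tukey's procedure; the symbols q (critical value of the studentized range), MSE (error mean square), and n (observations per strain) are notation assumed here, not taken from the original text.

```latex
\begin{aligned}
\mathrm{MSD} &= q \sqrt{\mathrm{MSE} / n}
             = 4.37265 \times \sqrt{11.78867 / 5} \approx 6.7142 \\
\bar{y}_{\text{3DOK1}} - \bar{y}_{\text{3DOK5}} &= 28.820 - 23.980 = 4.840 < 6.7142
  \quad \text{(not significantly different)} \\
\bar{y}_{\text{3DOK1}} - \bar{y}_{\text{3DOK7}} &= 28.820 - 19.920 = 8.900 > 6.7142
  \quad \text{(significantly different)}
\end{aligned}
```

Two strain means share a grouping letter exactly when their difference is smaller than this minimum significant difference.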

The multiple comparisons results indicate, for example, that

- strain 3DOK1 fixes significantly more nitrogen than all but 3DOK5
- even though 3DOK5 is not significantly different from 3DOK1, it is also not significantly better than all the rest

Although the experiment has succeeded in separating the best strains from the worst, clearly distinguishing the very best strain requires more experimentation.
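The note in Figure 17.3 mentions REGWQ as a method with a generally lower Type II error rate than Tukey's procedure. The following is a minimal sketch of how such a comparison could be requested, assuming the Clover data set created earlier; it is not part of the original example, and no output is shown for it here. The second MEANS request relies on the interactivity described above: the model is fit at the first RUN statement, and the MEANS statement is executed afterward without refitting.

```sas
/* Sketch: request the REGWQ multiple range test for the Strain means. */
/* Assumes the Clover data set from the earlier DATA step exists.      */
proc anova data=Clover;
   class Strain;
   model Nitrogen = Strain;
run;                          /* fits the model; procedure stays active */

   means Strain / regwq;      /* Ryan-Einot-Gabriel-Welsch range test   */
run;

quit;                         /* end the interactive procedure          */
```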

The experimental graphics features of PROC ANOVA enable you to visualize the distribution of nitrogen content for each treatment.

   ods html;
   ods graphics on;

   proc anova data = Clover;
      class strain;
      model Nitrogen = Strain;
   run;

   ods graphics off;
   ods html close;

When you specify the experimental ODS GRAPHICS statement and fit a one-way analysis of variance model, the ANOVA procedure output includes a box plot of the dependent variable values within each classification level of the independent variable. For general information about ODS graphics, see Chapter 15, 'Statistical Graphics Using ODS.' For specific information about the graphics available in the ANOVA procedure, see the section 'ODS Graphics' on page 460.

Figure 17.4: Box Plot of Nitrogen Content for each Treatment (Experimental)

Randomized Complete Block with One Factor

This example illustrates the use of PROC ANOVA in analyzing a randomized complete block design. Researchers are interested in whether three treatments have different effects on the yield and worth of a particular crop. They believe that the experimental units are not homogeneous, so a blocking factor is introduced that groups the units into blocks within which they are homogeneous. The three treatments are then randomly assigned within each block.

The data from this study are input into the SAS data set RCB:

   title1 'Randomized Complete Block';
   data RCB;
      input Block Treatment $ Yield Worth @@;
      datalines;
   1 A 32.6 112  1 B 36.4  130 1 C 29.5 106
   2 A 42.7 139  2 B 47.1  143 2 C 32.9 112
   3 A 35.3 124  3 B 40.1  134 3 C 33.6 116
   ;

The variables Yield and Worth are continuous response variables, and the variables Block and Treatment are the classification variables. Because the data for the analysis are balanced, you can use PROC ANOVA to run the analysis.
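One way to confirm that the design is balanced before using PROC ANOVA is to cross-tabulate the classification variables. The following is a minimal sketch, assuming the RCB data set created above; it is not part of the original example. With these data, each of the nine Block-by-Treatment cells should show a frequency of 1.

```sas
/* Sketch: inspect cell counts to confirm the design is balanced. */
/* Assumes the RCB data set from the DATA step above exists.      */
proc freq data=RCB;
   /* suppress percentages so that only the cell counts are printed */
   tables Block*Treatment / norow nocol nopercent;
run;
```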

The statements for the analysis are

   proc anova data=RCB;
      class Block Treatment;
      model Yield Worth=Block Treatment;
   run;

The Block and Treatment effects appear in the CLASS statement. The MODEL statement requests an analysis for each of the two dependent variables, Yield and Worth.

Figure 17.5 shows the 'Class Level Information' table.

   Randomized Complete Block

   The ANOVA Procedure

   Class Level Information

   Class          Levels    Values
   Block               3    1 2 3
   Treatment           3    A B C

   Number of Observations Read           9
   Number of Observations Used           9

Figure 17.5: Class Level Information

The 'Class Level Information' table lists the number of levels and their values for all effects specified in the CLASS statement. The number of observations in the data set is also displayed. Use this information to make sure that the data have been read correctly.

The overall ANOVA table for Yield in Figure 17.6 appears first in the output because it is the first response variable listed on the left side in the MODEL statement.

   Dependent Variable: Yield

                                         Sum of
   Source                     DF        Squares    Mean Square   F Value   Pr > F
   Model                       4    225.2777778     56.3194444      8.94   0.0283
   Error                       4     25.1911111      6.2977778
   Corrected Total             8    250.4688889

   R-Square     Coeff Var      Root MSE    Yield Mean
   0.899424      6.840047      2.509537      36.68889

Figure 17.6: Overall ANOVA Table for Yield

The overall F statistic is significant (F = 8.94, p = 0.0283), indicating that the model as a whole accounts for a significant portion of the variation in Yield and that you may proceed to tests of effects.

The degrees of freedom (DF) are used to ensure correctness of the data and model. The Corrected Total degrees of freedom are one less than the total number of observations in the data set; in this case, 9 − 1 = 8. The Model degrees of freedom for a randomized complete block are (b − 1) + (t − 1), where b = number of block levels and t = number of treatment levels. In this case, (3 − 1) + (3 − 1) = 4.
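The full degrees-of-freedom breakdown for this design can be written out as follows. This is a sketch that assumes one observation per block-treatment combination, as in the RCB data set; the Error formula is the standard one for such a design and is not stated in the original text.

```latex
\begin{aligned}
\mathrm{DF}_{\text{Model}} &= (b - 1) + (t - 1) = (3 - 1) + (3 - 1) = 4 \\
\mathrm{DF}_{\text{Error}} &= (b - 1)(t - 1) = 2 \times 2 = 4 \\
\mathrm{DF}_{\text{Corrected Total}} &= bt - 1 = 9 - 1 = 8 = 4 + 4
\end{aligned}
```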

Several simple statistics follow the ANOVA table. The R-Square indicates that the model accounts for nearly 90% of the variation in the variable Yield. The coefficient of variation (C.V.) is listed along with the Root MSE and the mean of the dependent variable. The Root MSE is an estimate of the standard deviation of the experimental error, that is, of the dependent variable after the model effects are accounted for. The C.V. is a unitless measure of variability.
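These statistics can be reproduced from the entries in Figure 17.6. The following worked arithmetic uses the usual definitions (R-Square as the model sum of squares over the corrected total sum of squares, Root MSE as the square root of the error mean square, and C.V. as 100 times Root MSE over the mean); the symbols are introduced here for illustration only.

```latex
\begin{aligned}
R^2 &= \frac{\mathrm{SS}_{\text{Model}}}{\mathrm{SS}_{\text{Corrected Total}}}
     = \frac{225.2777778}{250.4688889} \approx 0.8994 \\
\text{Root MSE} &= \sqrt{\mathrm{MS}_{\text{Error}}} = \sqrt{6.2977778} \approx 2.5095 \\
\text{C.V.} &= 100 \times \frac{\text{Root MSE}}{\bar{y}}
     = 100 \times \frac{2.509537}{36.68889} \approx 6.84
\end{aligned}
```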

The tests of the effects shown in Figure 17.7 are displayed after the simple statistics.

   Dependent Variable: Yield

   Source                     DF       Anova SS    Mean Square   F Value   Pr > F
   Block                       2     98.1755556     49.0877778      7.79   0.0417
   Treatment                   2    127.1022222     63.5511111     10.09   0.0274

Figure 17.7: Tests of Effects for Yield

For Yield, both the Block and Treatment effects are significant (F = 7.79, p = 0.0417 and F = 10.09, p = 0.0274, respectively) at the 0.05 significance level. From this you can conclude that blocking is useful for this variable and that some contrast between the treatment means is significantly different from zero.

Figure 17.8 shows the ANOVA table, simple statistics, and tests of effects for the variable Worth.
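The F values and p-values for Yield can be recomputed from the mean squares in Figure 17.6 and Figure 17.7. The following is a minimal sketch using a DATA step and the PROBF function (the cumulative F distribution); the variable names are hypothetical and chosen for illustration. The values written to the log should agree, up to rounding, with those shown in Figure 17.7.

```sas
/* Sketch: recompute the Yield F statistics and p-values by hand. */
/* Mean squares are copied from Figures 17.6 and 17.7.            */
data _null_;
   mse         = 6.2977778;           /* Error mean square (4 DF)      */
   f_block     = 49.0877778 / mse;    /* Block mean square / MSE       */
   f_treatment = 63.5511111 / mse;    /* Treatment mean square / MSE   */

   /* Upper-tail probability of F with 2 numerator and 4 denominator DF */
   p_block     = 1 - probf(f_block, 2, 4);
   p_treatment = 1 - probf(f_treatment, 2, 4);

   put f_block= p_block=;
   put f_treatment= p_treatment=;
run;
```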

   Dependent Variable: Worth

                                         Sum of
   Source                     DF        Squares    Mean Square   F Value   Pr > F
   Model                       4    1247.333333     311.833333      8.28   0.0323
   Error                       4     150.666667      37.666667
   Corrected Total             8    1398.000000

   R-Square     Coeff Var      Root MSE    Worth Mean
   0.892227      4.949450      6.137318      124.0000

   Source                     DF       Anova SS    Mean Square   F Value   Pr > F
   Block                       2    354.6666667    177.3333333      4.71   0.0889
   Treatment                   2    892.6666667    446.3333333     11.85   0.0209

Figure 17.8: ANOVA Table for Worth

The overall F test is significant (F = 8.28, p = 0.0323) at the 0.05 level for the variable Worth. The Block effect is not significant at the 0.05 level but is significant at the 0.10 level (F = 4.71, p = 0.0889). Generally, the usefulness of blocking should be determined before the analysis. However, since there are two dependent variables of interest, and Block is significant for one of them (Yield), blocking appears to be generally useful. For Worth, as with Yield, the effect of Treatment is significant (F = 11.85, p = 0.0209).

Issuing the following command produces the Treatment means.

   means Treatment;
   run;

Figure 17.9 displays the treatment means and their standard deviations for both dependent variables.

   The ANOVA Procedure

   Level of          ------------Yield-----------    ------------Worth-----------
   Treatment    N            Mean         Std Dev            Mean         Std Dev
   A            3      36.8666667      5.22908532      125.000000      13.5277493
   B            3      41.2000000      5.43415127      135.666667       6.6583281
   C            3      32.0000000      2.19317122      111.333333       5.0332230

Figure 17.9: Means of Yield and Worth
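The means and standard deviations in Figure 17.9 are ordinary per-treatment summary statistics, so they can be cross-checked outside PROC ANOVA. The following is a minimal sketch using PROC MEANS, assuming the RCB data set created earlier; it is not part of the original example.

```sas
/* Sketch: cross-check the treatment means and standard deviations. */
/* Assumes the RCB data set from the earlier DATA step exists.      */
proc means data=RCB n mean std;
   class Treatment;     /* one row of statistics per treatment level */
   var Yield Worth;     /* the two response variables                */
run;
```

For example, the Yield mean for treatment A is (32.6 + 42.7 + 35.3) / 3, which is approximately 36.8667, matching the first row of Figure 17.9.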