The following examples demonstrate how you can use the ANOVA procedure to perform analyses of variance for a one-way layout and a randomized complete block design.
A one-way analysis of variance considers one treatment factor with two or more treatment levels. The goal of the analysis is to test for differences among the means of the levels and to quantify these differences. If there are two treatment levels, this analysis is equivalent to a t test comparing two group means.
The assumptions of analysis of variance (Steel and Torrie 1980) are as follows:
- treatment effects are additive
- experimental errors
  - are random
  - are independently distributed
  - follow a normal distribution
  - have mean zero and constant variance
The following example studies the effect of bacteria on the nitrogen content of red clover plants. The treatment factor is bacteria strain, and it has six levels. Five of the six levels consist of five different Rhizobium trifolii bacteria cultures combined with a composite of five Rhizobium meliloti strains. The sixth level is a composite of the five Rhizobium trifolii strains with the composite of the Rhizobium meliloti. Red clover plants are inoculated with the treatments, and nitrogen content is later measured in milligrams. The data are derived from an experiment by Erdman (1946) and are analyzed in Chapters 7 and 8 of Steel and Torrie (1980). The following DATA step creates the SAS data set Clover:
title1 'Nitrogen Content of Red Clover Plants';
data Clover;
   input Strain $ Nitrogen @@;
   datalines;
3DOK1  19.4  3DOK1  32.6  3DOK1  27.0  3DOK1  32.1  3DOK1  33.0
3DOK5  17.7  3DOK5  24.8  3DOK5  27.9  3DOK5  25.2  3DOK5  24.3
3DOK4  17.0  3DOK4  19.4  3DOK4   9.1  3DOK4  11.9  3DOK4  15.8
3DOK7  20.7  3DOK7  21.0  3DOK7  20.5  3DOK7  18.8  3DOK7  18.6
3DOK13 14.3  3DOK13 14.4  3DOK13 11.8  3DOK13 11.6  3DOK13 14.2
COMPOS 17.3  COMPOS 19.4  COMPOS 19.1  COMPOS 16.9  COMPOS 20.8
;
The variable Strain contains the treatment levels, and the variable Nitrogen contains the response. The following statements produce the analysis.
proc anova data=Clover;
   class Strain;
   model Nitrogen = Strain;
run;
The classification variable is specified in the CLASS statement. Note that, unlike the GLM procedure, PROC ANOVA does not allow continuous variables on the right-hand side of the model. Figure 17.1 and Figure 17.2 display the output produced by these statements.
Nitrogen Content of Red Clover Plants

The ANOVA Procedure

Class Level Information

Class     Levels    Values
Strain         6    3DOK1 3DOK13 3DOK4 3DOK5 3DOK7 COMPOS

Number of Observations Read    30
Number of Observations Used    30
Dependent Variable: Nitrogen

                               Sum of
Source             DF         Squares     Mean Square    F Value    Pr > F
Model               5      847.046667      169.409333      14.37    <.0001
Error              24      282.928000       11.788667
Corrected Total    29     1129.974667

R-Square     Coeff Var      Root MSE    Nitrogen Mean
0.749616      17.26515      3.433463         19.88667

Source     DF        Anova SS     Mean Square    F Value    Pr > F
Strain      5     847.0466667     169.4093333      14.37    <.0001
The 'Class Level Information' table shown in Figure 17.1 lists the variables that appear in the CLASS statement, their levels, and the number of observations in the data set.
Figure 17.2 displays the ANOVA table, followed by some simple statistics and tests of effects.
The degrees of freedom (DF) column should be used to check the analysis results. The Model degrees of freedom for a one-way analysis of variance are the number of levels minus 1; in this case, 6 − 1 = 5. The Corrected Total degrees of freedom are always the total number of observations minus 1; in this case, 30 − 1 = 29. The sum of the Model and Error degrees of freedom equals the Corrected Total degrees of freedom.
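As a quick check of this bookkeeping against the ANOVA table, the three degrees-of-freedom entries can be written out as follows. The symbols k (number of strains) and N (number of observations) are notation introduced here for illustration only.

```latex
\[
\begin{aligned}
\mathrm{DF}_{\text{Model}} &= k - 1 = 6 - 1 = 5, \\
\mathrm{DF}_{\text{Error}} &= N - k = 30 - 6 = 24, \\
\mathrm{DF}_{\text{Corrected Total}} &= N - 1 = 30 - 1 = 29 = 5 + 24 .
\end{aligned}
\]
```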
The overall F test is significant (F = 14.37, p < 0.0001), indicating that the model as a whole accounts for a significant portion of the variability in the dependent variable. The F test for Strain is significant, indicating that some contrast between the means for the different strains is different from zero. Notice that the Model and Strain F tests are identical, since Strain is the only term in the model.
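The F value in the table can be reproduced from the sums of squares and degrees of freedom reported there. The following worked arithmetic uses only the printed values; the subscripted symbols are shorthand introduced here.

```latex
\[
\begin{aligned}
\mathrm{MS}_{\text{Model}} &= \frac{847.046667}{5} \approx 169.409333, \\
\mathrm{MS}_{\text{Error}} &= \frac{282.928000}{24} \approx 11.788667, \\
F &= \frac{\mathrm{MS}_{\text{Model}}}{\mathrm{MS}_{\text{Error}}}
   = \frac{169.409333}{11.788667} \approx 14.37 .
\end{aligned}
\]
```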
The F test for Strain ( F = 14.37, p < 0.0001) suggests that there are differences among the bacterial strains, but it does not reveal any information about the nature of the differences. Mean comparison methods can be used to gather further information. The interactivity of PROC ANOVA enables you to do this without re-running the entire analysis. After you specify a model with a MODEL statement and execute the ANOVA procedure with a RUN statement, you can execute a variety of statements (such as MEANS, MANOVA, TEST, and REPEATED) without PROC ANOVA recalculating the model sum of squares.
The following command requests means of the Strain levels with Tukey's studentized range procedure.
   means Strain / tukey;
run;
Results of Tukey's procedure are shown in Figure 17.3.
The ANOVA Procedure

Tukey's Studentized Range (HSD) Test for Nitrogen

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                      0.05
Error Degrees of Freedom                     24
Error Mean Square                      11.78867
Critical Value of Studentized Range     4.37265
Minimum Significant Difference           6.7142

Means with the same letter are not significantly different.

Tukey Grouping        Mean    N    Strain
          A         28.820    5    3DOK1
          A
     B    A         23.980    5    3DOK5
     B
     B    C         19.920    5    3DOK7
     B    C
     B    C         18.700    5    COMPOS
          C
          C         14.640    5    3DOK4
          C
          C         13.260    5    3DOK13
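The minimum significant difference in the Tukey output can be recovered from the other printed quantities. The sketch below uses the standard equal-sample-size form of Tukey's criterion, with q denoting the printed critical value of the studentized range and n = 5 the number of observations per strain; the symbols are introduced here for illustration.

```latex
\[
\mathrm{MSD} = q \sqrt{\frac{\mathrm{MS}_{\text{Error}}}{n}}
             = 4.37265 \sqrt{\frac{11.78867}{5}}
             \approx 4.37265 \times 1.53549
             \approx 6.7142 .
\]
```

Two strain means are declared different when they are more than 6.7142 apart. For example, 28.820 − 23.980 = 4.840 is below the threshold, so 3DOK1 and 3DOK5 share the letter A, whereas 28.820 − 19.920 = 8.900 exceeds it.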
The multiple comparisons results indicate, for example, that
- strain 3DOK1 fixes significantly more nitrogen than all but 3DOK5
- even though 3DOK5 is not significantly different from 3DOK1, it is also not significantly better than all the rest
Although the experiment has succeeded in separating the best strains from the worst, clearly distinguishing the very best strain requires more experimentation.
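If interval estimates of the pairwise differences are more useful than letter groupings, the CLDIFF option of the MEANS statement presents the Tukey results as confidence intervals for all pairwise differences between means. The following is a minimal sketch that assumes the Clover data set created earlier; it is an optional variation and not part of the analysis shown above.

```sas
proc anova data=Clover;
   class Strain;
   model Nitrogen = Strain;
   /* Tukey comparisons reported as confidence intervals */
   /* for each pairwise difference between strain means  */
   means Strain / tukey cldiff;
run;
quit;
```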
The experimental graphics features of PROC ANOVA enable you to visualize the distribution of nitrogen content for each treatment.
ods html;
ods graphics on;

proc anova data=Clover;
   class Strain;
   model Nitrogen = Strain;
run;

ods graphics off;
ods html close;
When you specify the experimental ODS GRAPHICS statement and fit a one-way analysis of variance model, the ANOVA procedure output includes a box plot of the dependent variable values within each classification level of the independent variable. For general information about ODS Graphics, see Chapter 15, 'Statistical Graphics Using ODS.' For specific information about the graphics available in the ANOVA procedure, see the section 'ODS Graphics.'
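The constant-variance assumption listed at the start of this example can also be examined within PROC ANOVA. The following is a minimal sketch, assuming the Clover data set created earlier: the HOVTEST=LEVENE option of the MEANS statement requests Levene's test for homogeneity of variance across the Strain levels, which is available for one-way models. This check is an optional addition and is not part of the analysis shown above.

```sas
proc anova data=Clover;
   class Strain;
   model Nitrogen = Strain;
   /* Levene's test of equal Nitrogen variance across strains */
   means Strain / hovtest=levene;
run;
quit;
```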
This example illustrates the use of PROC ANOVA in analyzing a randomized complete block design. Researchers are interested in whether three treatments have different effects on the yield and worth of a particular crop. They believe that the experimental units are not homogeneous, so a blocking factor is introduced that groups the experimental units into blocks within which they are homogeneous. The three treatments are then randomly assigned within each block.
The data from this study are input into the SAS data set RCB:
title1 'Randomized Complete Block';
data RCB;
   input Block Treatment $ Yield Worth @@;
   datalines;
1 A 32.6 112   1 B 36.4 130   1 C 29.5 106
2 A 42.7 139   2 B 47.1 143   2 C 32.9 112
3 A 35.3 124   3 B 40.1 134   3 C 33.6 116
;
The variables Yield and Worth are continuous response variables, and the variables Block and Treatment are the classification variables. Because the data for the analysis are balanced, you can use PROC ANOVA to run the analysis.
The statements for the analysis are
proc anova data=RCB;
   class Block Treatment;
   model Yield Worth = Block Treatment;
run;
The Block and Treatment effects appear in the CLASS statement. The MODEL statement requests an analysis for each of the two dependent variables, Yield and Worth.
Figure 17.5 shows the 'Class Level Information' table.
Randomized Complete Block

The ANOVA Procedure

Class Level Information

Class        Levels    Values
Block             3    1 2 3
Treatment         3    A B C

Number of Observations Read    9
Number of Observations Used    9
The 'Class Level Information' table lists the number of levels and their values for all effects specified in the CLASS statement. The number of observations in the data set is also displayed. Use this information to make sure that the data have been read correctly.
The overall ANOVA table for Yield in Figure 17.6 appears first in the output because it is the first response variable listed on the left side in the MODEL statement.
Dependent Variable: Yield

                               Sum of
Source             DF         Squares     Mean Square    F Value    Pr > F
Model               4     225.2777778      56.3194444       8.94    0.0283
Error               4      25.1911111       6.2977778
Corrected Total     8     250.4688889

R-Square     Coeff Var      Root MSE    Yield Mean
0.899424      6.840047      2.509537      36.68889
The overall F statistic is significant (F = 8.94, p = 0.0283), indicating that the model as a whole accounts for a significant portion of the variation in Yield and that you may proceed to tests of effects.
The degrees of freedom (DF) are used to ensure correctness of the data and model. The Corrected Total degrees of freedom are one less than the total number of observations in the data set; in this case, 9 − 1 = 8. The Model degrees of freedom for a randomized complete block are (b − 1) + (t − 1), where b is the number of block levels and t is the number of treatment levels. In this case, (3 − 1) + (3 − 1) = 4.
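The Error degrees of freedom in the table follow from the same quantities. With b = 3 blocks and t = 3 treatments, and one observation per block-treatment combination as in this data set, the full decomposition is:

```latex
\[
\begin{aligned}
\mathrm{DF}_{\text{Corrected Total}} &= bt - 1 = 9 - 1 = 8, \\
\mathrm{DF}_{\text{Model}} &= (b - 1) + (t - 1) = 2 + 2 = 4, \\
\mathrm{DF}_{\text{Error}} &= (b - 1)(t - 1) = 2 \times 2 = 4 = 8 - 4 .
\end{aligned}
\]
```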
Several simple statistics follow the ANOVA table. The R-Square indicates that the model accounts for nearly 90% of the variation in the variable Yield. The coefficient of variation (C.V., labeled Coeff Var in the output) is listed along with the Root MSE and the mean of the dependent variable. The Root MSE is an estimate of the standard deviation of the experimental error, that is, of Yield after the Block and Treatment effects are accounted for. The C.V. is a unitless measure of variability, expressed as a percentage of the mean.
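Each of these statistics can be reproduced from the entries of the ANOVA table for Yield. The worked arithmetic below uses only the printed values; the abbreviations SS and MS for sum of squares and mean square are shorthand introduced here.

```latex
\[
\begin{aligned}
R^2 &= \frac{\mathrm{SS}_{\text{Model}}}{\mathrm{SS}_{\text{Corrected Total}}}
     = \frac{225.2777778}{250.4688889} \approx 0.899424, \\
\text{Root MSE} &= \sqrt{\mathrm{MS}_{\text{Error}}}
     = \sqrt{6.2977778} \approx 2.509537, \\
\text{C.V.} &= 100 \times \frac{\text{Root MSE}}{\text{Yield Mean}}
     = 100 \times \frac{2.509537}{36.68889} \approx 6.840 .
\end{aligned}
\]
```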
The tests of the effects shown in Figure 17.7 are displayed after the simple statistics.
Dependent Variable: Yield

Source        DF        Anova SS     Mean Square    F Value    Pr > F
Block          2      98.1755556      49.0877778       7.79    0.0417
Treatment      2     127.1022222      63.5511111      10.09    0.0274
For Yield, both the Block and Treatment effects are significant at the 0.05 level (F = 7.79, p = 0.0417 and F = 10.09, p = 0.0274, respectively). From this you can conclude that blocking is useful for this variable and that some contrast between the treatment means is significantly different from zero.
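The effect tests for Yield use the Error mean square from the overall ANOVA table (6.2977778) as the denominator, and in this balanced design the two effect sums of squares add up to the Model sum of squares. As a check on the printed values:

```latex
\[
\begin{aligned}
F_{\text{Block}} &= \frac{49.0877778}{6.2977778} \approx 7.79, \\
F_{\text{Treatment}} &= \frac{63.5511111}{6.2977778} \approx 10.09, \\
\mathrm{SS}_{\text{Block}} + \mathrm{SS}_{\text{Treatment}}
  &= 98.1755556 + 127.1022222 = 225.2777778 = \mathrm{SS}_{\text{Model}} .
\end{aligned}
\]
```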
Figure 17.8 shows the ANOVA table, simple statistics, and tests of effects for the variable Worth.
Dependent Variable: Worth

                               Sum of
Source             DF         Squares     Mean Square    F Value    Pr > F
Model               4     1247.333333      311.833333       8.28    0.0323
Error               4      150.666667       37.666667
Corrected Total     8     1398.000000

R-Square     Coeff Var      Root MSE    Worth Mean
0.892227      4.949450      6.137318      124.0000

Source        DF        Anova SS     Mean Square    F Value    Pr > F
Block          2     354.6666667     177.3333333       4.71    0.0889
Treatment      2     892.6666667     446.3333333      11.85    0.0209
The overall F test is significant (F = 8.28, p = 0.0323) at the 0.05 level for the variable Worth. The Block effect is not significant at the 0.05 level but is significant at the 0.10 level (F = 4.71, p = 0.0889). Generally, the usefulness of blocking should be determined before the analysis. However, since there are two dependent variables of interest, and Block is significant for one of them (Yield), blocking appears to be generally useful. For Worth, as with Yield, the effect of Treatment is significant (F = 11.85, p = 0.0209).
Issuing the following command produces the Treatment means.
   means Treatment;
run;
Figure 17.9 displays the treatment means and their standard deviations for both dependent variables.
The ANOVA Procedure

Level of          ------------Yield-----------    ------------Worth-----------
Treatment    N           Mean         Std Dev            Mean         Std Dev
A            3     36.8666667      5.22908532      125.000000      13.5277493
B            3     41.2000000      5.43415127      135.666667       6.6583281
C            3     32.0000000      2.19317122      111.333333       5.0332230
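The MEANS statement above reports the treatment means without testing which of them differ. As in the one-way example, a multiple comparison option can be added to the MEANS statement. The following is a minimal sketch, assuming the RCB data set created earlier; the choice of Tukey's procedure here is an illustration and is not part of the original analysis.

```sas
proc anova data=RCB;
   class Block Treatment;
   model Yield Worth = Block Treatment;
   /* Tukey's studentized range comparisons of the     */
   /* Treatment means, reported for Yield and for Worth */
   means Treatment / tukey;
run;
quit;
```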