Examples | SAS/STAT 9.1 User's Guide, Volume 3

Example 34.1. One-Way ANOVA

This example deals with the same situation as in Example 57.1 on page 3536 of Chapter 57, The POWER Procedure.

Hocking (1985, p. 109) describes a study of the effectiveness of electrolytes in reducing lactic acid buildup for long-distance runners. You are planning a similar study in which you will allocate five different fluids to runners on a 10-mile course and measure lactic acid buildup immediately after the race. The fluids consist of water and two commercial electrolyte drinks, EZDure and LactoZap, each prepared at two concentrations, low (EZD1 and LZ1) and high (EZD2 and LZ2).

You conjecture that the standard deviation of lactic acid measurements given any particular fluid is about 3.75, and that the expected lactic acid values will correspond roughly to Table 34.7. You are least familiar with the LZ1 drink and hence decide to consider a range of reasonable values for that mean.

Table 34.7: Mean Lactic Acid Buildup by Fluid
Water	EZD1	EZD2	LZ1	LZ2
35.6	33.7	30.2	29 or 28	25.9

You are interested in four different comparisons, shown in Table 34.8 with appropriate contrast coefficients.

Table 34.8: Planned Comparisons
	Contrast Coefficients
Comparison	Water	EZD1	EZD2	LZ1	LZ2
Water versus electrolytes	4	-1	-1	-1	-1
EZD versus LZ	0	1	1	-1	-1
EZD1 versus EZD2	0	1	-1	0	0
LZ1 versus LZ2	0	0	0	1	-1

For each of these contrasts you want to determine the sample size required to achieve a power of 0.9 for detecting an effect with magnitude in accord with Table 34.7. You are not yet attempting to choose a single sample size for the study, but rather checking the range of sample sizes needed for individual contrasts. You plan to test each contrast at α = 0.025. In the interests of reducing costs, you will provide twice as many runners with water as with any of the electrolytes; that is, you will use a sample size weighting scheme of 2:1:1:1:1.
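
As a quick check on the effect magnitudes implied by Table 34.7, the contrast values can be tabulated before any power computation. The following DATA step is only an illustrative sketch: the data set name ContrastValues and the variable names are assumptions introduced here and are not part of the original example. It applies the Table 34.8 coefficients (4, -1, -1, -1, -1 for water versus electrolytes, and so on) to the conjectured means under both LZ1 scenarios.

```sas
/* Illustrative sketch: contrast values implied by Table 34.7.       */
/* Data set and variable names are hypothetical, chosen for clarity. */
data ContrastValues;
   Water = 35.6;
   EZD1  = 33.7;
   EZD2  = 30.2;
   LZ2   = 25.9;
   do LZ1 = 29, 28;                 /* the two conjectured LZ1 means */
      WaterVsOthers = 4*Water - EZD1 - EZD2 - LZ1 - LZ2;
      EZDvsLZ       = EZD1 + EZD2 - LZ1 - LZ2;
      EZD1vsEZD2    = EZD1 - EZD2;
      LZ1vsLZ2      = LZ1 - LZ2;
      output;
   end;
run;

proc print data=ContrastValues noobs;
   var LZ1 WaterVsOthers EZDvsLZ EZD1vsEZD2 LZ1vsLZ2;
run;
```

By hand, the four contrast values work out to 23.6, 9.0, 3.5, and 3.1 when the LZ1 mean is 29, and to 24.6, 10.0, 3.5, and 2.1 when it is 28. The LZ1 versus LZ2 contrast is the smallest in both scenarios, so it is the hardest of the four to detect.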

Before calling PROC GLMPOWER, you need to create the exemplary data set to specify means and weights for the design profiles:

  data Fluids;
     input Fluid $ LacticAcid1 LacticAcid2 CellWgt;
     datalines;
  Water      35.6        35.6        2
  EZD1       33.7        33.7        1
  EZD2       30.2        30.2        1
  LZ1        29          28          1
  LZ2        25.9        25.9        1
  ;
  run;

The variable LacticAcid1 represents the cell means scenario with the larger LZ1 mean (29), and LacticAcid2 represents the scenario with the smaller LZ1 mean (28). The variable CellWgt contains the sample size allocation weights.

Use the DATA= option in the PROC GLMPOWER statement to specify Fluids as the exemplary data set. The following statements perform the sample size analysis:

  proc glmpower data=Fluids;
     class Fluid;
     model LacticAcid1 LacticAcid2 = Fluid;
     weight CellWgt;
     /* Coefficient order follows the formatted class levels: */
     /* EZD1 EZD2 LZ1 LZ2 Water                               */
     contrast "Water vs. others" Fluid  -1 -1 -1 -1 4;
     contrast "EZD vs. LZ"       Fluid   1  1 -1 -1 0;
     contrast "EZD1 vs. EZD2"    Fluid   1 -1  0  0 0;
     contrast "LZ1 vs. LZ2"      Fluid   0  0  1 -1 0;
     power
        stddev = 3.75
        alpha  = 0.025
        ntotal = .
        power  = 0.9;
  run;

The CLASS statement identifies Fluid as a classification variable. The MODEL statement specifies the model and the two cell means scenarios LacticAcid1 and LacticAcid2. The WEIGHT statement identifies CellWgt as the weight variable. The CONTRAST statements specify the contrasts. Since PROC GLMPOWER processes class levels in order of formatted values, the contrast coefficients correspond to the following order: EZD1, EZD2, LZ1, LZ2, Water. The POWER statement specifies total sample size as the result parameter and provides values for the other analysis parameters (error standard deviation, alpha, and power).

Output 34.1.1 displays the results.

Output 34.1.1: Sample Sizes for One-Way ANOVA Contrasts

  The GLMPOWER Procedure

  Fixed Scenario Elements

  Weight Variable             CellWgt
  Alpha                         0.025
  Error Standard Deviation       3.75
  Nominal Power                   0.9

  Computed N Total

                                                  Test  Error  Actual      N
  Index  Dependent    Type      Source              DF     DF   Power  Total
      1  LacticAcid1  Effect    Fluid                4     25   0.958     30
      2  LacticAcid1  Contrast  Water vs. others     1     25   0.947     30
      3  LacticAcid1  Contrast  EZD vs. LZ           1     55   0.929     60
      4  LacticAcid1  Contrast  EZD1 vs. EZD2        1    169   0.901    174
      5  LacticAcid1  Contrast  LZ1 vs. LZ2          1    217   0.902    222
      6  LacticAcid2  Effect    Fluid                4     25   0.972     30
      7  LacticAcid2  Contrast  Water vs. others     1     19   0.901     24
      8  LacticAcid2  Contrast  EZD vs. LZ           1     43   0.922     48
      9  LacticAcid2  Contrast  EZD1 vs. EZD2        1    169   0.901    174
     10  LacticAcid2  Contrast  LZ1 vs. LZ2          1    475   0.902    480

The sample sizes range from 24 for the comparison of water versus electrolytes to 480 for the comparison of LZ1 versus LZ2, both assuming the smaller LZ1 mean. The sample size for the latter comparison is relatively large because the small mean difference of 28 - 25.9 = 2.1 is hard to detect. PROC GLMPOWER also includes the effect test for Fluid. Note that, in this case, it is equivalent to TEST=OVERALL_F in the ONEWAYANOVA statement of PROC POWER, since there is only one effect in the model.

The Nominal Power of 0.9 in the Fixed Scenario Elements table in Output 34.1.1 represents the input target power, and the Actual Power column in the Computed N Total table is the power at the sample size (N Total) adjusted to achieve the specified sample weighting. Note that all of the sample sizes are rounded up to multiples of 6 to preserve integer group sizes (since the group weights add up to 6). You can use the NFRACTIONAL option in the POWER statement to compute raw fractional sample sizes.
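
A minimal sketch of that option follows, assuming the Fluids data set created earlier in this example. It repeats the analysis for a single contrast with NFRACTIONAL added to the POWER statement, in the same position that the option occupies in Example 34.2. Restricting the run to one contrast is a simplification made here for brevity; the fractional results are not reproduced because they do not appear in the original output.

```sas
/* Sketch: request unrounded (fractional) sample sizes for one contrast. */
/* Assumes the Fluids exemplary data set defined earlier.                */
proc glmpower data=Fluids;
   class Fluid;
   model LacticAcid1 LacticAcid2 = Fluid;
   weight CellWgt;
   /* Coefficient order: EZD1 EZD2 LZ1 LZ2 Water */
   contrast "LZ1 vs. LZ2" Fluid 0 0 1 -1 0;
   power
      nfractional
      stddev = 3.75
      alpha  = 0.025
      ntotal = .
      power  = 0.9;
run;
```

Comparing the fractional totals with the values in Output 34.1.1 shows how much of each reported sample size is due to rounding up to a multiple of 6.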

Suppose you want to plot the required sample size for the range of power values from 0.5 to 0.95. First, define the analysis by specifying the same statements as before, but add the PLOTONLY option to the PROC GLMPOWER statement to disable the nongraphical results. Next, specify the PLOT statement with X=POWER to request a plot with power on the x-axis. (The result parameter, here sample size, is always plotted on the other axis.) Use the MIN= and MAX= options in the PLOT statement to specify the power range.

  proc glmpower data=Fluids plotonly;
     class Fluid;
     model LacticAcid1 LacticAcid2 = Fluid;
     weight CellWgt;
     /* Coefficient order: EZD1 EZD2 LZ1 LZ2 Water */
     contrast "Water vs. others" Fluid  -1 -1 -1 -1 4;
     contrast "EZD vs. LZ"       Fluid   1  1 -1 -1 0;
     contrast "EZD1 vs. EZD2"    Fluid   1 -1  0  0 0;
     contrast "LZ1 vs. LZ2"      Fluid   0  0  1 -1 0;
     power
        stddev = 3.75
        alpha  = 0.025
        ntotal = .
        power  = 0.9;
     plot x=power min=.5 max=.95;
  run;

See Output 34.1.2 for the resulting plot.

Output 34.1.2: Plot of Sample Size versus Power for One-Way ANOVA Contrasts

In Output 34.1.2, the line style identifies the test, and the plotting symbol identifies the cell means scenario. The plotting symbol locations identify actual computed powers; the curves are linear interpolations of these points. The plot shows that the required sample size is highest for the test of LZ1 versus LZ2 that was previously found to require the most resources, in either cell means scenario.

Note that some of the plotted points in Output 34.1.2 are unevenly spaced. This is because the plotted points are the rounded sample size results at their corresponding actual power levels. The range specified with the MIN= and MAX= values in the PLOT statement corresponds to nominal power levels. In some cases, actual power is substantially higher than nominal power. To obtain plots with evenly spaced points (but with fractional sample sizes at the computed points), you can use the NFRACTIONAL option in the POWER statement preceding the PLOT statement.

Finally, suppose you want to plot the power for the range of sample sizes you will likely consider for the study (the range of 24 to 480 that achieves 0.9 power for different comparisons). In the POWER statement, identify power as the result (POWER=.), and specify NTOTAL=24. Specify the PLOT statement with X=N to request a plot with sample size on the x-axis.

  proc glmpower data=Fluids plotonly;
     class Fluid;
     model LacticAcid1 LacticAcid2 = Fluid;
     weight CellWgt;
     contrast "Water vs. others" Fluid  -1 -1 -1 -1 4;
     contrast "EZD vs. LZ"       Fluid   1  1 -1 -1 0;
     contrast "EZD1 vs. EZD2"    Fluid   1 -1  0  0 0;
     contrast "LZ1 vs. LZ2"      Fluid   0  0  1 -1 0;
     power
        stddev = 3.75
        alpha  = 0.025
        ntotal = 24
        power  = .;
     plot x=n min=24 max=480;
  run;

Note that the value specified with the NTOTAL=24 option is not used. It is overridden in the plot by the MIN= and MAX= options in the PLOT statement, and the PLOTONLY option in the PROC GLMPOWER statement disables nongraphical results. But the NTOTAL= option (along with a value) is still needed in the POWER statement as a placeholder, to identify the desired parameterization for sample size.

See Output 34.1.3 for the plot.

Output 34.1.3: Plot of Power versus Sample Size for One-Way ANOVA Contrasts

Although Output 34.1.2 and Output 34.1.3 surface essentially the same computations for practical power ranges, they each provide a different quick visual assessment. Output 34.1.2 reveals the range of required sample sizes for powers of interest, and Output 34.1.3 reveals the range of achieved powers for sample sizes of interest.

Example 34.2. Two-Way ANOVA with Covariate

Suppose you can enhance the planned study discussed in Example 34.1 on page 1951 in two ways:

Incorporate results from races at two different altitudes (high and low).
Measure the body mass index of each runner before the race.

This is equivalent to adding a second fixed effect and a continuous covariate to your model.

Since lactic acid buildup is more pronounced at higher altitudes, you will include altitude as a factor in the model along with fluid, extending the one-way ANOVA to a two-way ANOVA. In doing so, you expect to lower the residual standard deviation from about 3.75 to 3.5 (in addition to generalizing the study results). You assume there is negligible interaction between fluid and altitude and plan to use a main-effects-only model. You conjecture that the mean lactic acid buildup follows Table 34.9.

Table 34.9: Mean Lactic Acid Buildup by Fluid and Altitude
	Fluid
Altitude	Water	EZD1	EZD2	LZ1	LZ2
High	36.9	35.0	31.5	30	27.1
Low	34.3	32.4	28.9	27	24.7

By including a measurement of body mass index as a covariate in the study, you hope to further reduce the error variability. The extent of this reduction in variability is commonly expressed in two alternative ways: (1) the correlation between the covariates and the response or (2) the proportional reduction in total R² incurred by the covariates. You prefer the former and guess that the correlation between body mass index and lactic acid buildup is between 0.2 and 0.3. You specify these estimates with the NCOVARIATES= and CORRXY= options in the POWER statement. The covariate is not included in the MODEL statement.

You are interested in the same four fluid comparisons as in Example 34.1, shown in Table 34.8 on page 1951, except this time you want to marginalize over the effect of altitude.

For each of these contrasts, you want to determine the sample size required to achieve a power of 0.9 to detect an effect with magnitude according to Table 34.9. You are not yet attempting to choose a single sample size for the study, but rather checking the range of sample sizes needed by individual contrasts. You plan to test each contrast at α = 0.025. You will provide twice as many runners with water as with any of the electrolytes, and you predict that you can study approximately 2/3 as many runners at the high altitude as at the low altitude. The resulting planned sample size weighting scheme is shown in Table 34.10. Since the scheme is only approximate, you use the NFRACTIONAL option in the POWER statement to disable the rounding of sample sizes up to integers satisfying the weights exactly.

Table 34.10: Approximate Sample Size Allocation Weights
	Fluid
Altitude	Water	EZD1	EZD2	LZ1	LZ2
High	4	2	2	2	2
Low	6	3	3	3	3

First, you create the exemplary data set to specify means and weights for the design profiles:

  data Fluids2;
     input Altitude $ Fluid $ LacticAcid CellWgt;
     datalines;
  High       Water      36.9       4
  High       EZD1       35.0       2
  High       EZD2       31.5       2
  High       LZ1        30         2
  High       LZ2        27.1       2
  Low        Water      34.3       6
  Low        EZD1       32.4       3
  Low        EZD2       28.9       3
  Low        LZ1        27         3
  Low        LZ2        24.7       3
  ;
  run;

The variables Altitude, Fluid, and LacticAcid specify the factors and cell means in Table 34.9. The variable CellWgt contains the sample size allocation weights in Table 34.10.

Use the DATA= option in the PROC GLMPOWER statement to specify Fluids2 as the exemplary data set. The following statements perform the sample size analysis:

  proc glmpower data=Fluids2;
     class Altitude Fluid;
     model LacticAcid = Altitude Fluid;
     weight CellWgt;
     /* Coefficient order: EZD1 EZD2 LZ1 LZ2 Water */
     contrast "Water vs. others" Fluid  -1 -1 -1 -1 4;
     contrast "EZD vs. LZ"       Fluid   1  1 -1 -1 0;
     contrast "EZD1 vs. EZD2"    Fluid   1 -1  0  0 0;
     contrast "LZ1 vs. LZ2"      Fluid   0  0  1 -1 0;
     power
        nfractional
        stddev      = 3.5
        ncovariates = 1
        corrxy      = 0.2 0.3 0
        alpha       = 0.025
        ntotal      = .
        power       = 0.9;
  run;

The CLASS statement identifies Altitude and Fluid as classification variables. The MODEL statement specifies the model, and the WEIGHT statement identifies CellWgt as the weight variable. The CONTRAST statements specify the contrasts in Table 34.8 on page 1951. As in Example 34.1, the order of the contrast coefficients corresponds to the formatted class levels (EZD1, EZD2, LZ1, LZ2, Water). The POWER statement specifies total sample size as the result parameter and provides values for the other analysis parameters. The NCOVARIATES= option specifies the single covariate (body mass index), and the CORRXY= option specifies the scenarios for its correlation with lactic acid buildup: the two conjectured values (0.2 and 0.3) and 0, which leaves the standard deviation unadjusted. Output 34.2.1 displays the results.

Output 34.2.1: Sample Sizes for Two-Way ANOVA Contrasts

  The GLMPOWER Procedure

  Fixed Scenario Elements

  Dependent Variable                      LacticAcid
  Weight Variable                            CellWgt
  Alpha                                        0.025
  Number of Covariates                             1
  Std Dev Without Covariate Adjustment           3.5
  Nominal Power                                  0.9

  Computed Ceiling N Total

                                            Adj
                                     Corr   Std  Test  Error  Fractional
  Index  Type      Source              XY   Dev    DF     DF     N Total
      1  Effect    Altitude           0.2  3.43     1     84   90.418451
      2  Effect    Altitude           0.3  3.34     1     79   85.862649
      3  Effect    Altitude           0.0  3.50     1     88   94.063984
      4  Effect    Fluid              0.2  3.43     4     16   22.446173
      5  Effect    Fluid              0.3  3.34     4     15   21.687544
      6  Effect    Fluid              0.0  3.50     4     17   23.055716
      7  Contrast  Water vs. others   0.2  3.43     1     15   21.720195
      8  Contrast  Water vs. others   0.3  3.34     1     14   20.848805
      9  Contrast  Water vs. others   0.0  3.50     1     16   22.422381
     10  Contrast  EZD vs. LZ         0.2  3.43     1     35   41.657424
     11  Contrast  EZD vs. LZ         0.3  3.34     1     33   39.674037
     12  Contrast  EZD vs. LZ         0.0  3.50     1     37   43.246415
     13  Contrast  EZD1 vs. EZD2      0.2  3.43     1    139  145.613657
     14  Contrast  EZD1 vs. EZD2      0.3  3.34     1    132  138.173983
     15  Contrast  EZD1 vs. EZD2      0.0  3.50     1    145  151.565917
     16  Contrast  LZ1 vs. LZ2        0.2  3.43     1    268  274.055008
     17  Contrast  LZ1 vs. LZ2        0.3  3.34     1    253  259.919126
     18  Contrast  LZ1 vs. LZ2        0.0  3.50     1    279  285.363976

  Computed Ceiling N Total

         Actual  Ceiling
  Index   Power  N Total
      1   0.902       91
      2   0.901       86
      3   0.903       95
      4   0.912       23
      5   0.908       22
      6   0.919       24
      7   0.905       22
      8   0.903       21
      9   0.910       23
     10   0.903       42
     11   0.903       40
     12   0.906       44
     13   0.901      146
     14   0.902      139
     15   0.901      152
     16   0.901      275
     17   0.900      260
     18   0.901      286

The sample sizes in Output 34.2.1 range from 21 for the comparison of water versus electrolytes (assuming a correlation of 0.3 between body mass index and lactic acid buildup) to 275 for the comparison of LZ1 versus LZ2 (assuming a correlation of 0.2). In the zero-correlation scenario, where the covariate does not reduce the error variability, the latter rises to 286. PROC GLMPOWER also includes the effect tests for Altitude and Fluid. Note that the required sample sizes for this study are lower than those for the study in Example 34.1.

Note that the error standard deviation has been reduced from 3.5 to 3.43 (when correlation is 0.2) or 3.34 (when correlation is 0.3) in the approximation of the effect of the body mass index covariate. The error degrees of freedom have also been automatically adjusted, lowered by 1 (the number of covariates).
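
The adjusted values are consistent with multiplying the unadjusted standard deviation by the square root of 1 minus the squared correlation. The following DATA step is a small sketch of that arithmetic; the data set name CovAdjust and the variable names are assumptions introduced here for illustration.

```sas
/* Sketch: covariate-adjusted error standard deviation,              */
/* AdjStdDev = StdDev * sqrt(1 - CorrXY**2).                         */
/* Data set and variable names are hypothetical.                     */
data CovAdjust;
   StdDev = 3.5;                    /* std dev without the covariate */
   do CorrXY = 0, 0.2, 0.3;         /* the three CORRXY= scenarios   */
      AdjStdDev = StdDev * sqrt(1 - CorrXY**2);
      output;
   end;
   format AdjStdDev 5.2;
run;

proc print data=CovAdjust noobs;
run;
```

Rounded to two decimals, the three scenarios give 3.50, 3.43, and 3.34, matching the Adj Std Dev column in Output 34.2.1.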

Suppose you want to plot the required sample size for the range of power values from 0.5 to 0.95. First, define the analysis by specifying the same statements as before, but add the PLOTONLY option to the PROC GLMPOWER statement to disable the nongraphical results. Next, specify the PLOT statement with X=POWER to request a plot with power on the x-axis. Sample size is automatically placed on the y-axis. Use the MIN= and MAX= options in the PLOT statement to specify the power range.

  proc glmpower data=Fluids2 plotonly;
     class Altitude Fluid;
     model LacticAcid = Altitude Fluid;
     weight CellWgt;
     /* Coefficient order: EZD1 EZD2 LZ1 LZ2 Water */
     contrast "Water vs. others" Fluid  -1 -1 -1 -1 4;
     contrast "EZD vs. LZ"       Fluid   1  1 -1 -1 0;
     contrast "EZD1 vs. EZD2"    Fluid   1 -1  0  0 0;
     contrast "LZ1 vs. LZ2"      Fluid   0  0  1 -1 0;
     power
        nfractional
        stddev      = 3.5
        ncovariates = 1
        corrxy      = 0.2 0.3 0
        alpha       = 0.025
        ntotal      = .
        power       = 0.9;
     plot x=power min=.5 max=.95;
  run;

See Output 34.2.2 for the plot.

Output 34.2.2: Plot of Sample Size versus Power for Two-Way ANOVA Contrasts

In Output 34.2.2, the line style identifies the test, and the plotting symbol identifies the scenario for the correlation between covariate and response. The plotting symbol locations identify actual computed powers; the curves are linear interpolations of these points. As in Example 34.1, the required sample size is highest for the test of LZ1 versus LZ2.

Finally, suppose you want to plot the power for the range of sample sizes you will likely consider for the study (the range of 21 to 275 that achieves 0.9 power for different comparisons). In the POWER statement, identify power as the result (POWER=.), and specify NTOTAL=21. Specify the PLOT statement with X=N to request a plot with sample size on the x-axis.

  proc glmpower data=Fluids2 plotonly;
     class Altitude Fluid;
     model LacticAcid = Altitude Fluid;
     weight CellWgt;
     /* Coefficient order: EZD1 EZD2 LZ1 LZ2 Water */
     contrast "Water vs. others" Fluid  -1 -1 -1 -1 4;
     contrast "EZD vs. LZ"       Fluid   1  1 -1 -1 0;
     contrast "EZD1 vs. EZD2"    Fluid   1 -1  0  0 0;
     contrast "LZ1 vs. LZ2"      Fluid   0  0  1 -1 0;
     power
        nfractional
        stddev      = 3.5
        ncovariates = 1
        corrxy      = 0.2 0.3 0
        alpha       = 0.025
        ntotal      = 21
        power       = .;
     plot x=n min=21 max=275;
  run;

The MIN=21 and MAX=275 options in the PLOT statement set the range of sample sizes on the x-axis. If the MIN= option were omitted, it would default to the value of 21 from the NTOTAL= option in the POWER statement.

See Output 34.2.3 for the plot.

Output 34.2.3: Plot of Power versus Sample Size for Two-Way ANOVA Contrasts

Although Output 34.2.2 and Output 34.2.3 surface essentially the same computations for practical power ranges, they each provide a different quick visual assessment. Output 34.2.2 reveals the range of required sample sizes for powers of interest, and Output 34.2.3 reveals the range of powers achieved for sample sizes of interest.