The following example, reported by Stenstrom (1940), analyzes an experiment to investigate how snapdragons grow in various soils. To eliminate the effect of local fertility variations, the experiment is run in blocks, with each soil type sampled in each block. Since these data are balanced, the Type I and Type III SS are the same and are equal to the traditional ANOVA SS.
First, the standard analysis is shown, followed by an analysis that uses the SOLUTION option and includes MEANS and CONTRAST statements. The ORDER=DATA option in the second PROC GLM statement is used so that the ordering of coefficients in the CONTRAST statement can correspond to the ordering in the input data. The SOLUTION option requests a display of the parameter estimates, which are produced by default only if there are no CLASS variables. A MEANS statement is used to request a table of the means with two multiple comparison procedures requested. In experiments with focused treatment questions, CONTRAST statements are preferable to general means comparison methods. The following statements produce Output 32.1.1 through Output 32.1.6:
title 'Balanced Data from Randomized Complete Block';
data plants;
   input Type $ @;
   do Block = 1 to 3;
      input StemLength @;
      output;
   end;
   datalines;
Clarion  32.7 32.3 31.5
Clinton  32.1 29.7 29.1
Knox     35.7 35.9 33.1
O'Neill  36.0 34.2 31.2
Compost  31.8 28.0 29.2
Wabash   38.2 37.8 31.9
Webster  32.5 31.1 29.7
;

proc glm;
   class Block Type;
   model StemLength = Block Type;
run;

proc glm order=data;
   class Block Type;
   model StemLength = Block Type / solution;

   /*----------------------------------clrn-cltn-knox-onel-cpst-wbsh-wstr */
   contrast 'Compost vs. others'  Type   -1   -1   -1   -1    6   -1   -1;
   contrast 'River soils vs. non' Type   -1   -1   -1   -1    0    5   -1,
                                  Type   -1    4   -1   -1    0    0   -1;
   contrast 'Glacial vs. drift'   Type   -1    0    1    1    0    0   -1;
   contrast 'Clarion vs. Webster' Type   -1    0    0    0    0    0    1;
   contrast "Knox vs. O'Neill"    Type    0    0    1   -1    0    0    0;
run;

   means Type / waller regwq;
run;
Balanced Data from Randomized Complete Block

The GLM Procedure

Class Level Information

Class    Levels    Values
Block         3    1 2 3
Type          7    Clarion Clinton Compost Knox O'Neill Wabash Webster

Number of Observations Read    21
Number of Observations Used    21
Balanced Data from Randomized Complete Block The GLM Procedure Dependent Variable: StemLength Sum of Source DF Squares Mean Square F Value Pr > F Model 8 142.1885714 17.7735714 10.80 0.0002 Error 12 19.7428571 1.6452381 Corrected Total 20 161.9314286 R-Square Coeff Var Root MSE StemLength Mean 0.878079 3.939745 1.282668 32.55714 Source DF Type I SS Mean Square F Value Pr > F Block 2 39.0371429 19.5185714 11.86 0.0014 Type 6 103.1514286 17.1919048 10.45 0.0004 Source DF Type III SS Mean Square F Value Pr > F Block 2 39.0371429 19.5185714 11.86 0.0014 Type 6 103.1514286 17.1919048 10.45 0.0004
Balanced Data from Randomized Complete Block The GLM Procedure Class Level Information Class Levels Values Block 3 1 2 3 Type 7 Clarion Clinton Compost Knox O'Neill Wabash Webster Number of Observations Read 21 Number of Observations Used 21
Balanced Data from Randomized Complete Block

The GLM Procedure

Dependent Variable: StemLength

Contrast               DF    Contrast SS    Mean Square   F Value   Pr > F
Compost vs. others      1    29.24198413    29.24198413     17.77   0.0012
River soils vs. non     2    48.24694444    24.12347222     14.66   0.0006
Glacial vs. drift       1    22.14083333    22.14083333     13.46   0.0032
Clarion vs. Webster     1     1.70666667     1.70666667      1.04   0.3285
Knox vs. O'Neill        1     1.81500000     1.81500000      1.10   0.3143

                                      Standard
Parameter            Estimate            Error   t Value   Pr > |t|
Intercept         29.35714286 B     0.83970354     34.96     <.0001
Block 1            3.32857143 B     0.68561507      4.85     0.0004
Block 2            1.90000000 B     0.68561507      2.77     0.0169
Block 3            0.00000000 B      .               .        .
Type Clarion       1.06666667 B     1.04729432      1.02     0.3285
Type Clinton      -0.80000000 B     1.04729432     -0.76     0.4597
Type Knox          3.80000000 B     1.04729432      3.63     0.0035
Type O'Neill       2.70000000 B     1.04729432      2.58     0.0242
Type Compost      -1.43333333 B     1.04729432     -1.37     0.1962
Type Wabash        4.86666667 B     1.04729432      4.65     0.0006
Type Webster       0.00000000 B      .               .        .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
Balanced Data from Randomized Complete Block

The GLM Procedure

Waller-Duncan K-ratio t Test for StemLength

NOTE: This test minimizes the Bayes risk under additive loss and certain other assumptions.

Kratio                                100
Error Degrees of Freedom               12
Error Mean Square                1.645238
F Value                             10.45
Critical Value of t               2.12034
Minimum Significant Difference     2.2206

Means with the same letter are not significantly different.

Waller Grouping      Mean    N    Type
A                  35.967    3    Wabash
A                  34.900    3    Knox
B A                33.800    3    O'Neill
B C                32.167    3    Clarion
D C                31.100    3    Webster
D C                30.300    3    Clinton
D                  29.667    3    Compost
Balanced Data from Randomized Complete Block

The GLM Procedure

Ryan-Einot-Gabriel-Welsch Multiple Range Test for StemLength

NOTE: This test controls the Type I experimentwise error rate.

Alpha                        0.05
Error Degrees of Freedom       12
Error Mean Square        1.645238

Number of Means            2            3            4            5            6            7
Critical Range     2.9876649    3.2838329    3.4396257    3.5402242    3.5402242    3.6653734

Means with the same letter are not significantly different.

REGWQ Grouping      Mean    N    Type
A                 35.967    3    Wabash
B A               34.900    3    Knox
B A C             33.800    3    O'Neill
B D C             32.167    3    Clarion
D C               31.100    3    Webster
D                 30.300    3    Clinton
D                 29.667    3    Compost
This analysis shows that the stem length is significantly different for the different soil types. In addition, there are significant differences in stem length between the three blocks in the experiment.
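The GLM procedure is invoked again, this time with the ORDER=DATA option. This enables you to write accurate contrast statements more easily because you know the order SAS is using for the levels of the variable Type. The standard analysis is displayed again, followed by the tests for the contrasts and the parameter estimates. At the 5% significance level, the contrast tests show the following: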
the stem length of plants grown in compost soil is significantly different from the average stem length of plants grown in other soils
the stem length of plants grown in river soils is significantly different from the average stem length of those grown in nonriver soils
the average stem length of plants grown in glacial soils (Clarion and Webster) is significantly different from the average stem length of those grown in drift soils (Knox and O'Neill)
stem lengths for Clarion and Webster are not significantly different
stem lengths for Knox and O'Neill are not significantly different
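The contrast sums of squares behind these conclusions can be cross-checked by hand. The following is a minimal sketch in Python, not part of the SAS program, and it assumes the balanced layout of this example (three observations per soil type), in which a contrast among Type means has sum of squares (sum of c_i × mean_i)² / (sum of c_i² / n). The names `stem`, `contrast_ss`, and the coefficient dictionaries are hypothetical; the coefficients follow the CONTRAST statements, with negative weights on the soils being compared against.

```python
# Stem lengths for blocks 1-3, keyed by soil type (from the DATA step).
stem = {
    "Clarion": [32.7, 32.3, 31.5],
    "Clinton": [32.1, 29.7, 29.1],
    "Knox":    [35.7, 35.9, 33.1],
    "O'Neill": [36.0, 34.2, 31.2],
    "Compost": [31.8, 28.0, 29.2],
    "Wabash":  [38.2, 37.8, 31.9],
    "Webster": [32.5, 31.1, 29.7],
}
N_BLOCKS = 3
MSE = 1.6452381  # error mean square (12 df), taken from the ANOVA table


def contrast_ss(coefs):
    """Sum of squares for a one-df contrast among equally replicated means."""
    estimate = sum(c * sum(stem[soil]) / N_BLOCKS for soil, c in coefs.items())
    return estimate ** 2 / (sum(c * c for c in coefs.values()) / N_BLOCKS)


# 'Compost vs. others': 6 on Compost, -1 on each of the six other soils.
compost_vs_others = {soil: -1 for soil in stem}
compost_vs_others["Compost"] = 6

# 'Glacial vs. drift': Clarion and Webster against Knox and O'Neill.
glacial_vs_drift = {"Clarion": -1, "Webster": -1, "Knox": 1, "O'Neill": 1}

ss_compost = contrast_ss(compost_vs_others)
ss_glacial = contrast_ss(glacial_vs_drift)
f_compost = ss_compost / MSE
f_glacial = ss_glacial / MSE

print(f"Compost vs. others: SS = {ss_compost:.4f}, F = {f_compost:.2f}")
print(f"Glacial vs. drift:  SS = {ss_glacial:.4f}, F = {f_glacial:.2f}")
```

The shortcut formula relies on the design being balanced; with unequal cell counts the contrast would have to be evaluated through the fitted model instead.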
In addition to the estimates for the parameters of the model, the results of t tests about the parameters are also displayed. The letter 'B' following the parameter estimates indicates that the estimates are biased and do not represent a unique solution to the normal equations.
The final two pages of output (Output 32.1.5 and Output 32.1.6) present results of the Waller-Duncan and REGWQ multiple comparison procedures. For each test, notes and information pertinent to the test are given on the output. The Type means are arranged from highest to lowest. Means with the same letter are not significantly different. For this example, while some pairs of means are significantly different, there are no clear equivalence classes among the different soils.
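To see how the letter groupings arise, the Waller-Duncan minimum significant difference can be rebuilt from the quantities printed in the output. This is a rough Python sketch rather than the procedure's own algorithm: it takes the critical t value (2.12034) as given from the output instead of deriving it, and the names `msd` and `differs` are hypothetical.

```python
import math

MSE = 1.645238      # error mean square from the output
N_PER_TYPE = 3      # observations per soil type
T_CRIT = 2.12034    # critical value of t, copied from the output

# Standard error of a difference between two Type means.
se_diff = math.sqrt(2 * MSE / N_PER_TYPE)
msd = T_CRIT * se_diff

# Type means as listed in the grouping table.
means = {
    "Wabash": 35.967, "Knox": 34.900, "O'Neill": 33.800, "Clarion": 32.167,
    "Webster": 31.100, "Clinton": 30.300, "Compost": 29.667,
}


def differs(a, b):
    """True if two soil types differ by more than the MSD."""
    return abs(means[a] - means[b]) > msd


print(f"SE of a difference = {se_diff:.5f}, MSD = {msd:.4f}")
print("Wabash vs. O'Neill differ:", differs("Wabash", "O'Neill"))
print("Knox vs. Clarion differ:  ", differs("Knox", "Clarion"))
```

Two means share a letter exactly when they are closer than the MSD, which is why Wabash and O'Neill both carry an A while Knox and Clarion have no letter in common.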
A car is tested for gas mileage at various speeds to determine at what speed the car achieves the greatest gas mileage. A quadratic model is fit to the experimental data. The following statements produce Output 32.2.1 through Output 32.2.5:
title 'Gasoline Mileage Experiment';
data mileage;
   input mph mpg @@;
   datalines;
20 15.4
30 20.2
40 25.7
50 26.2  50 26.6  50 27.4
55   .
60 24.8
;

proc glm;
   model mpg=mph mph*mph / p clm;
   output out=pp p=mpgpred r=resid;
axis1 minor=none major=(number=5);
axis2 minor=none major=(number=8);
symbol1 c=black i=none   v=plus;
symbol2 c=black i=spline v=none;
proc gplot data=pp;
   plot mpg*mph=1 mpgpred*mph=2 / overlay haxis=axis1
                                  vaxis=axis2;
run;
Gasoline Mileage Experiment The GLM Procedure Number of Observations Read 8
Gasoline Mileage Experiment

The GLM Procedure

Dependent Variable: mpg

                              Sum of
Source             DF        Squares    Mean Square   F Value   Pr > F
Model               2    111.8086183     55.9043091     77.96   0.0006
Error               4      2.8685246      0.7171311
Corrected Total     6    114.6771429

R-Square    Coeff Var    Root MSE    mpg Mean
0.974986     3.564553    0.846836    23.75714

Source     DF      Type I SS    Mean Square   F Value   Pr > F
mph         1    85.64464286    85.64464286    119.43   0.0004
mph*mph     1    26.16397541    26.16397541     36.48   0.0038

Source     DF    Type III SS    Mean Square   F Value   Pr > F
mph         1    41.01171219    41.01171219     57.19   0.0016
mph*mph     1    26.16397541    26.16397541     36.48   0.0038

                                Standard
Parameter        Estimate          Error   t Value   Pr > |t|
Intercept    -5.985245902     3.18522249     -1.88     0.1334
mph           1.305245902     0.17259876      7.56     0.0016
mph*mph      -0.013098361     0.00216852     -6.04     0.0038
Observation       Observed      Predicted       Residual
1              15.40000000    14.88032787     0.51967213
2              20.20000000    21.38360656    -1.18360656
3              25.70000000    25.26721311     0.43278689
4              26.20000000    26.53114754    -0.33114754
5              26.60000000    26.53114754     0.06885246
6              27.40000000    26.53114754     0.86885246
7 *              .            26.18073770      .
8              24.80000000    25.17540984    -0.37540984

               95% Confidence Limits for
Observation      Mean Predicted Value
1              12.69701317    17.06364257
2              20.01727192    22.74994119
3              23.87460041    26.65982582
4              25.44573423    27.61656085
5              25.44573423    27.61656085
6              25.44573423    27.61656085
7 *            24.88679308    27.47468233
8              23.05954977    27.29126990

* Observation was not used in this analysis
Gasoline Mileage Experiment

The GLM Procedure

Sum of Residuals                          0.00000000
Sum of Squared Residuals                  2.86852459
Sum of Squared Residuals - Error SS       0.00000000
PRESS Statistic                          23.18107335
First Order Autocorrelation              -0.54376613
Durbin-Watson D                           2.94425592
The overall F statistic is significant. The tests of mph and mph*mph in the Type I sums of squares show that both the linear and quadratic terms in the regression model are significant. The model fits well, with an R² of 0.97. The table of parameter estimates indicates that the estimated regression equation is

\[ \widehat{\mathrm{mpg}} = -5.9852 + 1.3052\,\mathrm{mph} - 0.0131\,\mathrm{mph}^2 \]
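Since the purpose of the experiment is to find the speed with the greatest mileage, the fitted quadratic can be maximized at its vertex, mph = -b1 / (2 × b2). The sketch below is an independent Python cross-check, not SAS output: it refits the quadratic by solving the normal equations with a small hand-written elimination routine (the helper `solve` and all variable names are hypothetical) and then locates the vertex. The observation with the missing response is left out, as PROC GLM does.

```python
# Observations with a nonmissing mpg value (the mph=55 row is excluded).
mph = [20, 30, 40, 50, 50, 50, 60]
mpg = [15.4, 20.2, 25.7, 26.2, 26.6, 27.4, 24.8]


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix columns: intercept, mph, mph*mph.
rows = [[1.0, float(x), float(x * x)] for x in mph]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(rows, mpg)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

# Vertex of the fitted parabola: the speed that maximizes predicted mpg.
best_speed = -b1 / (2 * b2)
best_mpg = b0 + b1 * best_speed + b2 * best_speed ** 2

print(f"mpg = {b0:.4f} + {b1:.4f}*mph + {b2:.4f}*mph^2")
print(f"estimated best speed = {best_speed:.1f} mph, predicted mpg = {best_mpg:.2f}")
```

The vertex falls just under 50 mph, inside the range of tested speeds, so the estimate is an interpolation rather than an extrapolation. The sketch gives only a point estimate and says nothing about its uncertainty.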
The P and CLM options in the MODEL statement produce the table shown in Output 32.2.3. For each observation, the observed, predicted, and residual values are shown. In addition, the 95% confidence limits for a mean predicted value are shown for each observation. Note that the observation with a missing value for mpg is not used in the analysis, but predicted and confidence limit values are shown.
The final portion of output gives some additional information on the residuals. The PRESS statistic gives the sum of squares of predicted residual errors, as described in Chapter 2, "Introduction to Regression Procedures." The First Order Autocorrelation and the Durbin-Watson D statistic, which measures first-order autocorrelation, are also given.
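The two autocorrelation summaries can be recomputed from the observed and predicted values listed for the seven observations used in the fit. This is a minimal Python sketch under the usual definitions, which are assumed here rather than quoted from the source: the first-order autocorrelation is the sum of e(t) × e(t-1) divided by the sum of e(t)², and Durbin-Watson D is the sum of (e(t) - e(t-1))² divided by the sum of e(t)².

```python
# Observed and predicted values for the seven observations used in the fit.
observed = [15.4, 20.2, 25.7, 26.2, 26.6, 27.4, 24.8]
predicted = [14.88032787, 21.38360656, 25.26721311, 26.53114754,
             26.53114754, 26.53114754, 25.17540984]

# Residual = observed - predicted, so some residuals are negative.
resid = [o - p for o, p in zip(observed, predicted)]
sse = sum(e * e for e in resid)

# First-order autocorrelation of the residuals in data order.
autocorr = sum(resid[i] * resid[i - 1] for i in range(1, len(resid))) / sse

# Durbin-Watson D: values well above 2 indicate negative autocorrelation.
dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid))) / sse

print(f"sum of residuals      = {sum(resid):.6f}")
print(f"sum of squared resid. = {sse:.6f}")
print(f"first-order autocorr. = {autocorr:.6f}")
print(f"Durbin-Watson D       = {dw:.6f}")
```

With only seven observations, these statistics are descriptive at best. The PRESS statistic is not reproduced here because it needs the leverage of each observation.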
This example uses data from Kutner (1974, p. 98) to illustrate a two-way analysis of variance. The original data source is Afifi and Azen (1972, p. 166). These statements produce Output 32.3.1 and Output 32.3.2.
/*---------------------------------------------------------*/
/* Note: Kutner's 24 for drug 2, disease 1 changed to 34.  */
/*---------------------------------------------------------*/
title 'Unbalanced Two-Way Analysis of Variance';
data a;
   input drug disease @;
   do i=1 to 6;
      input y @;
      output;
   end;
   datalines;
1 1 42 44 36 13 19 22
1 2 33  . 26  . 33 21
1 3 31 -3  . 25 25 24
2 1 28  . 23 34 42 13
2 2  . 34 33 31  . 36
2 3  3 26 28 32  4 16
3 1  .  .  1 29  . 19
3 2  . 11  9  7  1 -6
3 3 21  1  .  9  3  .
4 1 24  .  9 22 -2 15
4 2 27 12 12 -5 16 15
4 3 22  7 25  5 12  .
;

proc glm;
   class drug disease;
   model y=drug disease drug*disease / ss1 ss2 ss3 ss4;
run;
Unbalanced Two-Way Analysis of Variance The GLM Procedure Class Level Information Class Levels Values drug 4 1 2 3 4 disease 3 1 2 3 Number of Observations Read 72 Number of Observations Used 58
Unbalanced Two-Way Analysis of Variance The GLM Procedure Dependent Variable: y Sum of Source DF Squares Mean Square F Value Pr > F Model 11 4259.338506 387.212591 3.51 0.0013 Error 46 5080.816667 110.452536 Corrected Total 57 9340.155172 R-Square Coeff Var Root MSE y Mean 0.456024 55.66750 10.50964 18.87931 Source DF Type I SS Mean Square F Value Pr > F drug 3 3133.238506 1044.412835 9.46 <.0001 disease 2 418.833741 209.416870 1.90 0.1617 drug*disease 6 707.266259 117.877710 1.07 0.3958 Source DF Type II SS Mean Square F Value Pr > F drug 3 3063.432863 1021.144288 9.25 <.0001 disease 2 418.833741 209.416870 1.90 0.1617 drug*disease 6 707.266259 117.877710 1.07 0.3958 Source DF Type III SS Mean Square F Value Pr > F drug 3 2997.471860 999.157287 9.05 <.0001 disease 2 415.873046 207.936523 1.88 0.1637 drug*disease 6 707.266259 117.877710 1.07 0.3958 Source DF Type IV SS Mean Square F Value Pr > F drug 3 2997.471860 999.157287 9.05 <.0001 disease 2 415.873046 207.936523 1.88 0.1637 drug*disease 6 707.266259 117.877710 1.07 0.3958
Note the differences between the four types of sums of squares. The Type I sum of squares for drug essentially tests for differences between the expected values of the arithmetic mean response for different drugs, unadjusted for the effect of disease. By contrast, the Type II sum of squares for drug measures the differences between arithmetic means for each drug after adjusting for disease. The Type III sum of squares measures the differences between predicted drug means over a balanced drug × disease population, that is, between the LS-means for drug. Finally, the Type IV sum of squares is the same as the Type III sum of squares in this case, since there are data for every drug-by-disease combination.
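The difference between arithmetic means and LS-means can be made concrete. For this model, which contains the full drug-by-disease interaction, the LS-mean for a drug equals the unweighted average of its three cell means, while the arithmetic mean weights each cell by its count. The Python sketch below is an illustration, not SAS output. It uses the nonmissing responses from the DATA step, including the negative values -3, -6, -2, and -5, which are needed to reproduce the LS-means reported for this example. The names `cells`, `lsmean`, and `raw_mean` are hypothetical.

```python
# Nonmissing responses for each (drug, disease) cell.
cells = {
    (1, 1): [42, 44, 36, 13, 19, 22],
    (1, 2): [33, 26, 33, 21],
    (1, 3): [31, -3, 25, 25, 24],
    (2, 1): [28, 23, 34, 42, 13],
    (2, 2): [34, 33, 31, 36],
    (2, 3): [3, 26, 28, 32, 4, 16],
    (3, 1): [1, 29, 19],
    (3, 2): [11, 9, 7, 1, -6],
    (3, 3): [21, 1, 9, 3],
    (4, 1): [24, 9, 22, -2, 15],
    (4, 2): [27, 12, 12, -5, 16, 15],
    (4, 3): [22, 7, 25, 5, 12],
}
drugs, diseases = (1, 2, 3, 4), (1, 2, 3)

n_used = sum(len(y) for y in cells.values())

lsmean, raw_mean = {}, {}
for drug in drugs:
    # LS-mean: every disease cell gets equal weight.
    cell_means = [sum(cells[drug, d]) / len(cells[drug, d]) for d in diseases]
    lsmean[drug] = sum(cell_means) / len(cell_means)
    # Arithmetic mean: every observation gets equal weight.
    pooled = [y for d in diseases for y in cells[drug, d]]
    raw_mean[drug] = sum(pooled) / len(pooled)

for drug in drugs:
    print(f"drug {drug}: arithmetic mean = {raw_mean[drug]:.4f}, "
          f"LS-mean = {lsmean[drug]:.4f}")
```

The two kinds of mean coincide only when the cell counts are equal, which is why the Type I and Type III tests for drug differ in this unbalanced example.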
No matter which sum of squares you prefer to use, this analysis shows a significant difference among the four drugs, while the disease effect and the drug-by-disease interaction are not significant. As the previous discussion indicates, Type III sums of squares correspond to differences between LS-means, so you can follow up the Type III tests with a multiple comparisons analysis of the drug LS-means. Since the GLM procedure is interactive, you can accomplish this by submitting the following statements after the previous ones that performed the ANOVA.
lsmeans drug / pdiff=all adjust=tukey; run;
Both the LS-means themselves and a matrix of adjusted p-values for pairwise differences between them are displayed; see Output 32.3.3.
Unbalanced Two-Way Analysis of Variance

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

                          LSMEAN
drug        y LSMEAN      Number
1         25.9944444           1
2         26.5555556           2
3          9.7444444           3
4         13.5444444           4

Least Squares Means for effect drug
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: y

i/j         1         2         3         4
1                0.9989    0.0016    0.0107
2      0.9989              0.0011    0.0071
3      0.0016    0.0011              0.7870
4      0.0107    0.0071    0.7870
The multiple comparisons analysis shows that drugs 1 and 2 have very similar effects, and that drugs 3 and 4 are also not significantly different from each other. Evidently, the main contribution to the significant drug effect is the difference between the 1/2 pair and the 3/4 pair.
Analysis of covariance combines some of the features of both regression and analysis of variance. Typically, a continuous variable (the covariate) is introduced into the model of an analysis-of-variance experiment.
Data in the following example are selected from a larger experiment on the use of drugs in the treatment of leprosy (Snedecor and Cochran 1967, p. 422).
Variables in the study are
Drug | - two antibiotics (A and D) and a control (F) |
PreTreatment | - a pre-treatment score of leprosy bacilli |
PostTreatment | - a post-treatment score of leprosy bacilli |
Ten patients are selected for each treatment (Drug), and six sites on each patient are measured for leprosy bacilli.
The covariate (a pretreatment score) is included in the model for increased precision in determining the effect of drug treatments on the posttreatment count of bacilli.
The following code creates the data set, performs a parallel-slopes analysis of covariance with PROC GLM, and computes Drug LS-means. These statements produce Output 32.4.1.
data drugtest;
   input Drug $ PreTreatment PostTreatment @@;
   datalines;
A 11  6   A  8  0   A  5  2   A 14  8   A 19 11
A  6  4   A 10 13   A  6  1   A 11  8   A  3  0
D  6  0   D  6  2   D  7  3   D  8  1   D 18 18
D  8  4   D 19 14   D  8  9   D  5  1   D 15  9
F 16 13   F 13 10   F 11 18   F  9  5   F 21 23
F 16 12   F 12  5   F 12 16   F  7  1   F 12 20
;

proc glm;
   class Drug;
   model PostTreatment = Drug PreTreatment / solution;
   lsmeans Drug / stderr pdiff cov out=adjmeans;
run;

proc print data=adjmeans;
run;
The GLM Procedure Class Level Information Class Levels Values Drug 3 A D F Number of Observations Read 30 Number of Observations Used 30
The GLM Procedure Dependent Variable: PostTreatment Sum of Source DF Squares Mean Square F Value Pr > F Model 3 871.497403 290.499134 18.10 <.0001 Error 26 417.202597 16.046254 Corrected Total 29 1288.700000 R-Square Coeff Var Root MSE PostTreatment Mean 0.676261 50.70604 4.005778 7.900000
This model assumes that the slopes relating posttreatment scores to pretreatment scores are parallel for all drugs. You can check this assumption by including the class-by-covariate interaction, Drug*PreTreatment, in the model and examining the ANOVA test for the significance of this effect. This extra test is omitted in this example, but it is not significant, which justifies the equal-slopes assumption.
The GLM Procedure

Dependent Variable: PostTreatment

Source          DF      Type I SS    Mean Square   F Value   Pr > F
Drug             2    293.6000000    146.8000000      9.15   0.0010
PreTreatment     1    577.8974030    577.8974030     36.01   <.0001

Source          DF    Type III SS    Mean Square   F Value   Pr > F
Drug             2     68.5537106     34.2768553      2.14   0.1384
PreTreatment     1    577.8974030    577.8974030     36.01   <.0001

                                      Standard
Parameter           Estimate             Error   t Value   Pr > |t|
Intercept       -0.434671164 B      2.47135356     -0.18     0.8617
Drug A          -3.446138280 B      1.88678065     -1.83     0.0793
Drug D          -3.337166948 B      1.85386642     -1.80     0.0835
Drug F           0.000000000 B       .               .        .
PreTreatment     0.987183811        0.16449757      6.00     <.0001

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
The GLM Procedure
Least Squares Means

         PostTreatment     Standard                 LSMEAN
Drug            LSMEAN        Error    Pr > |t|     Number
A            6.7149635    1.2884943      <.0001          1
D            6.8239348    1.2724690      <.0001          2
F           10.1611017    1.3159234      <.0001          3

Least Squares Means for effect Drug
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: PostTreatment

i/j         1         2         3
1                0.9521    0.0793
2      0.9521              0.0835
3      0.0793    0.0835

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
The OUT= and COV options in the LSMEANS statement create a data set of the estimates, their standard errors, and the variances and covariances of the LS-means, which is displayed in Output 32.4.4.
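In a parallel-slopes model, each Drug LS-mean is the predicted posttreatment score at the overall mean of the covariate: intercept + Drug effect + slope × mean(PreTreatment). The following Python sketch, which is illustrative only, rebuilds the LS-means from the SOLUTION estimates and compares them with the unadjusted group means. It relies on the intercept and the Drug A and Drug D effects being negative, as the relationship between the estimates and the LS-means requires. The dictionary names are hypothetical.

```python
# (PreTreatment, PostTreatment) pairs for each drug, from the DATA step.
scores = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}

# Parameter estimates from the SOLUTION table (Drug F is the reference level).
intercept = -0.434671164
drug_effect = {"A": -3.446138280, "D": -3.337166948, "F": 0.0}
slope = 0.987183811

all_pre = [pre for pairs in scores.values() for pre, _ in pairs]
mean_pre = sum(all_pre) / len(all_pre)

# LS-mean: prediction for each drug at the overall mean pretreatment score.
lsmean = {d: intercept + drug_effect[d] + slope * mean_pre for d in scores}

# Unadjusted mean posttreatment score for each drug, for comparison.
raw_mean = {d: sum(post for _, post in pairs) / len(pairs)
            for d, pairs in scores.items()}

for d in scores:
    print(f"Drug {d}: unadjusted mean = {raw_mean[d]:.2f}, LS-mean = {lsmean[d]:.4f}")
```

The control group F started with higher pretreatment scores, so adjusting for the covariate pulls its mean down and the antibiotic means up. This is consistent with the Drug effect being significant in the Type I test but not in the Type III test.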
Obs   _NAME_           Drug     LSMEAN     STDERR   NUMBER       COV1       COV2       COV3
  1   PostTreatment    A        6.7150    1.28849        1    1.66022    0.02844   -0.08403
  2   PostTreatment    D        6.8239    1.27247        2    0.02844    1.61918   -0.04299
  3   PostTreatment    F       10.1611    1.31592        3   -0.08403   -0.04299    1.73165
The experimental graphics features of PROC GLM enable you to visualize the fitted analysis of covariance model.
ods html; ods graphics on; proc glm; class Drug; model PostTreatment = Drug PreTreatment; run; ods graphics off; ods html close;
When you specify the experimental ODS GRAPHICS statement and fit an analysis of covariance model, the GLM procedure output includes an analysis of covariance plot, as in Output 32.4.5. For general information about ODS graphics, see Chapter 15, "Statistical Graphics Using ODS." For specific information about the graphics available in the GLM procedure, see the section "ODS Graphics."
The plot makes it clear that the control (drug F) has higher post-treatment scores across the range of pre-treatment scores, while the fitted models for the two antibiotics (drugs A and D) nearly coincide.
This example uses data from Cochran and Cox (1957, p. 176) to illustrate the analysis of a three-way factorial design with replication, including the use of the CONTRAST statement with interactions, the OUTSTAT= data set, and the SLICE= option in the LSMEANS statement.
The object of the study is to determine the effects of electric current on denervated muscle. The variables are
Rep | the replicate number, 1 or 2 |
Time | the length of time the current is applied to the muscle, ranging from 1 to 4 |
Current | the level of electric current applied, ranging from 1 to 4 |
Number | the number of treatments per day, ranging from 1 to 3 |
MuscleWeight | the weight of the denervated muscle |
The following code produces Output 32.5.1 through Output 32.5.4.
data muscles;
   do Rep=1 to 2;
      do Time=1 to 4;
         do Current=1 to 4;
            do Number=1 to 3;
               input MuscleWeight @@;
               output;
            end;
         end;
      end;
   end;
   datalines;
72 74 69 61 61 65 62 65 70 85 76 61
67 52 62 60 55 59 64 65 64 67 72 60
57 66 72 72 43 43 63 66 72 56 75 92
57 56 78 60 63 58 61 79 68 73 86 71
46 74 58 60 64 52 71 64 71 53 65 66
44 58 54 57 55 51 62 61 79 60 78 82
53 50 61 56 57 56 56 56 71 56 58 69
46 55 64 56 55 57 64 66 62 59 58 88
;

proc glm outstat=summary;
   class Rep Current Time Number;
   model MuscleWeight = Rep Current|Time|Number;
   contrast 'Time in Current 3'
      Time 1 0 0 -1 Current*Time 0 0 0 0 0 0 0 0 1 0 0 -1,
      Time 0 1 0 -1 Current*Time 0 0 0 0 0 0 0 0 0 1 0 -1,
      Time 0 0 1 -1 Current*Time 0 0 0 0 0 0 0 0 0 0 1 -1;
   contrast 'Current 1 versus 2' Current 1 -1;
   lsmeans Current*Time / slice=Current;
run;

proc print data=summary;
run;
The GLM Procedure Class Level Information Class Levels Values Rep 2 1 2 Current 4 1 2 3 4 Time 4 1 2 3 4 Number 3 1 2 3 Number of Observations Read 96 Number of Observations Used 96 The GLM Procedure Dependent Variable: MuscleWeight Sum of Source DF Squares Mean Square F Value Pr > F Model 48 5782.916667 120.477431 1.77 0.0261 Error 47 3199.489583 68.074246 Corrected Total 95 8982.406250 R-Square Coeff Var Root MSE MuscleWeight Mean 0.643805 13.05105 8.250712 63.21875
The GLM Procedure Dependent Variable: MuscleWeight Source DF Type I SS Mean Square F Value Pr > F Rep 1 605.010417 605.010417 8.89 0.0045 Current 3 2145.447917 715.149306 10.51 <.0001 Time 3 223.114583 74.371528 1.09 0.3616 Current*Time 9 298.677083 33.186343 0.49 0.8756 Number 2 447.437500 223.718750 3.29 0.0461 Current*Number 6 644.395833 107.399306 1.58 0.1747 Time*Number 6 367.979167 61.329861 0.90 0.5023 Current*Time*Number 18 1050.854167 58.380787 0.86 0.6276 Source DF Type III SS Mean Square F Value Pr > F Rep 1 605.010417 605.010417 8.89 0.0045 Current 3 2145.447917 715.149306 10.51 <.0001 Time 3 223.114583 74.371528 1.09 0.3616 Current*Time 9 298.677083 33.186343 0.49 0.8756 Number 2 447.437500 223.718750 3.29 0.0461 Current*Number 6 644.395833 107.399306 1.58 0.1747 Time*Number 6 367.979167 61.329861 0.90 0.5023 Current*Time*Number 18 1050.854167 58.380787 0.86 0.6276 Contrast DF Contrast SS Mean Square F Value Pr > F Time in Current 3 3 34.83333333 11.61111111 0.17 0.9157 Current 1 versus 2 1 99.18750000 99.18750000 1.46 0.2334
The GLM Procedure Least Squares Means Current*Time Effect Sliced by Current for MuscleWeight Sum of Current DF Squares Mean Square F Value Pr > F 1 3 271.458333 90.486111 1.33 0.2761 2 3 120.666667 40.222222 0.59 0.6241 3 3 34.833333 11.611111 0.17 0.9157 4 3 94.833333 31.611111 0.46 0.7085
Obs _NAME_ _SOURCE_ _TYPE_ DF SS F PROB 1 MuscleWeight ERROR ERROR 47 3199.49 . . 2 MuscleWeight Rep SS1 1 605.01 8.8875 0.00454 3 MuscleWeight Current SS1 3 2145.45 10.5054 0.00002 4 MuscleWeight Time SS1 3 223.11 1.0925 0.36159 5 MuscleWeight Current*Time SS1 9 298.68 0.4875 0.87562 6 MuscleWeight Number SS1 2 447.44 3.2864 0.04614 7 MuscleWeight Current*Number SS1 6 644.40 1.5777 0.17468 8 MuscleWeight Time*Number SS1 6 367.98 0.9009 0.50231 9 MuscleWeight Current*Time*Number SS1 18 1050.85 0.8576 0.62757 10 MuscleWeight Rep SS3 1 605.01 8.8875 0.00454 11 MuscleWeight Current SS3 3 2145.45 10.5054 0.00002 12 MuscleWeight Time SS3 3 223.11 1.0925 0.36159 13 MuscleWeight Current*Time SS3 9 298.68 0.4875 0.87562 14 MuscleWeight Number SS3 2 447.44 3.2864 0.04614 15 MuscleWeight Current*Number SS3 6 644.40 1.5777 0.17468 16 MuscleWeight Time*Number SS3 6 367.98 0.9009 0.50231 17 MuscleWeight Current*Time*Number SS3 18 1050.85 0.8576 0.62757 18 MuscleWeight Time in Current 3 CONTRAST 3 34.83 0.1706 0.91574 19 MuscleWeight Current 1 versus 2 CONTRAST 1 99.19 1.4570 0.23344
The first CONTRAST statement examines the effects of Time within level 3 of Current. This is also called the simple effect of Time within Current*Time. Note that, since there are three degrees of freedom, it is necessary to specify three rows in the CONTRAST statement, separated by commas. Since the parameterization that PROC GLM uses is determined in part by the ordering of the variables in the CLASS statement, Current is specified before Time so that the Time parameters are nested within the Current*Time parameters; thus, the Current*Time contrast coefficients in each row are simply the Time coefficients of that row within the appropriate level of Current.
The second CONTRAST statement isolates a single degree of freedom effect corresponding to the difference between the first two levels of Current. You can use such a contrast in a large experiment where certain preplanned comparisons are important, but you want to take advantage of the additional error degrees of freedom available when all levels of the factors are considered.
The LSMEANS statement with the SLICE= option is an alternative way to test for the simple effect of Time within Current*Time. In addition to listing the LS-means for each current strength and length of time, it gives a table of F tests for differences between the LS-means across Time within each Current level. In some cases, this can be a way to disentangle a complex interaction.
The output, shown in Output 32.5.2 and Output 32.5.3, indicates that the main effects for Rep, Current, and Number are significant (with p-values of 0.0045, <0.0001, and 0.0461, respectively), but Time is not significant, indicating that, in general, it does not matter how long the current is applied. None of the interaction terms are significant, nor are the contrasts significant. Notice that the row in the sliced ANOVA table corresponding to level 3 of Current matches the 'Time in Current 3' contrast.
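Because the design is balanced, both the sliced test and the single-degree-of-freedom contrast can be reproduced from simple means. The Python sketch below is a cross-check under that balance assumption, not a general implementation of SLICE=. It rebuilds the factor levels in the same loop order as the DATA step, computes the sum of squares among the four Time means within one level of Current, and computes the sum of squares for the difference between the Current 1 and Current 2 means. The helper names are hypothetical, and the error mean square is copied from the ANOVA table.

```python
RAW = """
72 74 69 61 61 65 62 65 70 85 76 61
67 52 62 60 55 59 64 65 64 67 72 60
57 66 72 72 43 43 63 66 72 56 75 92
57 56 78 60 63 58 61 79 68 73 86 71
46 74 58 60 64 52 71 64 71 53 65 66
44 58 54 57 55 51 62 61 79 60 78 82
53 50 61 56 57 56 56 56 71 56 58 69
46 55 64 56 55 57 64 66 62 59 58 88
"""
MSE = 68.074246  # error mean square (47 df), from the ANOVA table

# Rebuild (rep, time, current, number, weight) in DATA step loop order.
values = iter(int(v) for v in RAW.split())
records = [(rep, time, current, number, next(values))
           for rep in range(1, 3)
           for time in range(1, 5)
           for current in range(1, 5)
           for number in range(1, 4)]


def mean(xs):
    return sum(xs) / len(xs)


def weights(current, time=None):
    """Muscle weights for one Current level, optionally one Time level."""
    return [w for (_, t, c, _, w) in records
            if c == current and (time is None or t == time)]


def slice_ss(current):
    """SS among the Time means within one level of Current (3 df)."""
    grand = mean(weights(current))
    return sum(len(weights(current, t)) * (mean(weights(current, t)) - grand) ** 2
               for t in range(1, 5))


# 'Current 1 versus 2': difference of two means based on 24 observations each.
diff = mean(weights(1)) - mean(weights(2))
ss_current_1_vs_2 = diff ** 2 / (1 / 24 + 1 / 24)

for c in range(1, 5):
    ss = slice_ss(c)
    print(f"Current {c}: slice SS = {ss:.4f}, F = {ss / 3 / MSE:.2f}")
print(f"Current 1 versus 2: SS = {ss_current_1_vs_2:.4f}, "
      f"F = {ss_current_1_vs_2 / MSE:.2f}")
```

The slice for Current 3 and the three-row contrast are the same hypothesis expressed two ways, so their sums of squares agree.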
The SS, F statistics, and p-values can be stored in an OUTSTAT= data set, as shown in Output 32.5.4.
The following example employs multivariate analysis of variance (MANOVA) to measure differences in the chemical characteristics of ancient pottery found at four kiln sites in Great Britain. The data are from Tubb et al. (1980), as reported in Hand et al. (1994).
For each of 26 samples of pottery, the percentages of oxides of five metals are measured. The following statements create the data set and invoke the GLM procedure to perform a one-way MANOVA. Additionally, it is of interest to know whether the pottery from one site in Wales (Llanederyn) differs from the samples from other sites; a CONTRAST statement is used to test this hypothesis.
data pottery;
   title1 "Romano-British Pottery";
   input Site $12. Al Fe Mg Ca Na;
   datalines;
Llanederyn   14.4 7.00 4.30 0.15 0.51
Llanederyn   13.8 7.08 3.43 0.12 0.17
Llanederyn   14.6 7.09 3.88 0.13 0.20
Llanederyn   11.5 6.37 5.64 0.16 0.14
Llanederyn   13.8 7.06 5.34 0.20 0.20
Llanederyn   10.9 6.26 3.47 0.17 0.22
Llanederyn   10.1 4.26 4.26 0.20 0.18
Llanederyn   11.6 5.78 5.91 0.18 0.16
Llanederyn   11.1 5.49 4.52 0.29 0.30
Llanederyn   13.4 6.92 7.23 0.28 0.20
Llanederyn   12.4 6.13 5.69 0.22 0.54
Llanederyn   13.1 6.64 5.51 0.31 0.24
Llanederyn   12.7 6.69 4.45 0.20 0.22
Llanederyn   12.5 6.44 3.94 0.22 0.23
Caldicot     11.8 5.44 3.94 0.30 0.04
Caldicot     11.6 5.39 3.77 0.29 0.06
IslandThorns 18.3 1.28 0.67 0.03 0.03
IslandThorns 15.8 2.39 0.63 0.01 0.04
IslandThorns 18.0 1.50 0.67 0.01 0.06
IslandThorns 18.0 1.88 0.68 0.01 0.04
IslandThorns 20.8 1.51 0.72 0.07 0.10
AshleyRails  17.7 1.12 0.56 0.06 0.06
AshleyRails  18.3 1.14 0.67 0.06 0.05
AshleyRails  16.7 0.92 0.53 0.01 0.05
AshleyRails  14.8 2.74 0.67 0.03 0.05
AshleyRails  19.1 1.64 0.60 0.10 0.03
;

proc glm data=pottery;
   class Site;
   model Al Fe Mg Ca Na = Site;
   contrast 'Llanederyn vs. the rest' Site 1 1 1 -3;
   manova h=_all_ / printe printh;
run;
After the summary information, displayed in Output 32.6.1, PROC GLM produces the univariate analyses for each of the dependent variables, as shown in Output 32.6.2 to Output 32.6.6. These analyses show that sites are significantly different for all oxides individually. You can suppress these univariate analyses by specifying the NOUNI option in the MODEL statement.
Romano-British Pottery The GLM Procedure Class Level Information Class Levels Values Site 4 AshleyRails Caldicot IslandThorns Llanederyn Number of Observations Read 26 Number of Observations Used 26
Romano-British Pottery The GLM Procedure Dependent Variable: Al Sum of Source DF Squares Mean Square F Value Pr > F Model 3 175.6103187 58.5367729 26.67 <.0001 Error 22 48.2881429 2.1949156 Corrected Total 25 223.8984615 R-Square Coeff Var Root MSE Al Mean 0.784330 10.22284 1.481525 14.49231 Source DF Type I SS Mean Square F Value Pr > F Site 3 175.6103187 58.5367729 26.67 <.0001 Source DF Type III SS Mean Square F Value Pr > F Site 3 175.6103187 58.5367729 26.67 <.0001 Contrast DF Contrast SS Mean Square F Value Pr > F Llanederyn vs. the rest 1 58.58336640 58.58336640 26.69 <.0001
Romano-British Pottery The GLM Procedure Dependent Variable: Fe Sum of Source DF Squares Mean Square F Value Pr > F Model 3 134.2216158 44.7405386 89.88 <.0001 Error 22 10.9508457 0.4977657 Corrected Total 25 145.1724615 R-Square Coeff Var Root MSE Fe Mean 0.924567 15.79171 0.705525 4.467692 Source DF Type I SS Mean Square F Value Pr > F Site 3 134.2216158 44.7405386 89.88 <.0001 Source DF Type III SS Mean Square F Value Pr > F Site 3 134.2216158 44.7405386 89.88 <.0001 Contrast DF Contrast SS Mean Square F Value Pr > F Llanederyn vs. the rest 1 71.15144132 71.15144132 142.94 <.0001
Romano-British Pottery The GLM Procedure Dependent Variable: Mg Sum of Source DF Squares Mean Square F Value Pr > F Model 3 103.3505270 34.4501757 49.12 <.0001 Error 22 15.4296114 0.7013460 Corrected Total 25 118.7801385 R-Square Coeff Var Root MSE Mg Mean 0.870099 26.65777 0.837464 3.141538 Source DF Type I SS Mean Square F Value Pr > F Site 3 103.3505270 34.4501757 49.12 <.0001 Source DF Type III SS Mean Square F Value Pr > F Site 3 103.3505270 34.4501757 49.12 <.0001 Contrast DF Contrast SS Mean Square F Value Pr > F Llanederyn vs. the rest 1 56.59349339 56.59349339 80.69 <.0001
Romano-British Pottery The GLM Procedure Dependent Variable: Ca Sum of Source DF Squares Mean Square F Value Pr > F Model 3 0.20470275 0.06823425 29.16 <.0001 Error 22 0.05148571 0.00234026 Corrected Total 25 0.25618846 R-Square Coeff Var Root MSE Ca Mean 0.799032 33.01265 0.048376 0.146538 Source DF Type I SS Mean Square F Value Pr > F Site 3 0.20470275 0.06823425 29.16 <.0001 Source DF Type III SS Mean Square F Value Pr > F Site 3 0.20470275 0.06823425 29.16 <.0001 Contrast DF Contrast SS Mean Square F Value Pr > F Llanederyn vs. the rest 1 0.03531688 0.03531688 15.09 0.0008
Romano-British Pottery The GLM Procedure Dependent Variable: Na Sum of Source DF Squares Mean Square F Value Pr > F Model 3 0.25824560 0.08608187 9.50 0.0003 Error 22 0.19929286 0.00905877 Corrected Total 25 0.45753846 R-Square Coeff Var Root MSE Na Mean 0.564424 60.06350 0.095178 0.158462 Source DF Type I SS Mean Square F Value Pr > F Site 3 0.25824560 0.08608187 9.50 0.0003 Source DF Type III SS Mean Square F Value Pr > F Site 3 0.25824560 0.08608187 9.50 0.0003 Contrast DF Contrast SS Mean Square F Value Pr > F Llanederyn vs. the rest 1 0.23344446 0.23344446 25.77 <.0001
The PRINTE option in the MANOVA statement displays the elements of the error matrix, also called the Error Sums of Squares and Crossproducts matrix. See Output 32.6.7. The diagonal elements of this matrix are the error sums of squares from the corresponding univariate analyses.
Romano-British Pottery The GLM Procedure Multivariate Analysis of Variance E = Error SSCP Matrix Al Fe Mg Ca Na Al 48.288142857 7.0800714286 0.6080142857 0.1064714286 0.5889571429 Fe 7.0800714286 10.950845714 0.5270571429 0.155194286 0.0667585714 Mg 0.6080142857 0.5270571429 15.429611429 0.4353771429 0.0276157143 Ca 0.1064714286 0.155194286 0.4353771429 0.0514857143 0.0100785714 Na 0.5889571429 0.0667585714 0.0276157143 0.0100785714 0.1992928571 Partial Correlation Coefficients from the Error SSCP Matrix / Prob > r DF = 22 Al Fe Mg Ca Na Al 1.000000 0.307889 0.022275 0.067526 0.189853 0.1529 0.9196 0.7595 0.3856 Fe 0.307889 1.000000 0.040547 0.206685 0.045189 0.1529 0.8543 0.3440 0.8378 Mg 0.022275 0.040547 1.000000 0.488478 0.015748 0.9196 0.8543 0.0180 0.9431 Ca 0.067526 0.206685 0.488478 1.000000 0.099497 0.7595 0.3440 0.0180 0.6515 Na 0.189853 0.045189 0.015748 0.099497 1.000000 0.3856 0.8378 0.9431 0.6515
The PRINTE option also displays the partial correlation matrix associated with the E matrix. In this example, none of the oxides are very strongly correlated; the strongest correlation (r = 0.488) is between magnesium oxide and calcium oxide.
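The partial correlations can be recovered directly from the entries of the E matrix. The following minimal sketch (plain Python, not part of the SAS example; the function and variable names are illustrative) recomputes the magnesium-calcium value from the three relevant Error SSCP entries printed in the output.

```python
import math

# Error SSCP entries for Mg and Ca, copied from the E matrix in the output.
e_mg_mg = 15.429611429
e_ca_ca = 0.0514857143
e_mg_ca = 0.4353771429

def partial_correlation(e_ij, e_ii, e_jj):
    """Correlation implied by an SSCP matrix: e_ij / sqrt(e_ii * e_jj)."""
    return e_ij / math.sqrt(e_ii * e_jj)

r_mg_ca = partial_correlation(e_mg_ca, e_mg_mg, e_ca_ca)
print(f"partial r(Mg, Ca) = {r_mg_ca:.6f}")
```

The result agrees with the Mg/Ca entry of the partial correlation table to the displayed precision.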
The PRINTH option produces the SSCP matrix for the hypotheses being tested (Site and the contrast); see Output 32.6.8 and Output 32.6.9. Since the Type III SS are the highest level SS produced by PROC GLM by default, and since the HTYPE= option is not specified, the SSCP matrix for Site gives the Type III H matrix. The diagonal elements of this matrix are the model sums of squares from the corresponding univariate analyses.
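Because the diagonals of H and E are the univariate model and error sums of squares, the univariate R-Square and F values can be rebuilt from them. The sketch below is an illustrative check in Python, not SAS code; the degrees of freedom (3 and 22) are taken from the univariate tables, and the helper name is an assumption of this sketch.

```python
# Diagonal entries of H (Type III SSCP for Site) and E (Error SSCP),
# copied from the printed matrices for two of the five oxides.
h_diag = {"Ca": 0.2047027473, "Na": 0.2582456044}
e_diag = {"Ca": 0.0514857143, "Na": 0.1992928571}
DF_MODEL, DF_ERROR = 3, 22  # from the univariate ANOVA tables

def univariate_summary(h, e, df_h=DF_MODEL, df_e=DF_ERROR):
    """Return (R-square, F value) from a model SS and an error SS."""
    r_square = h / (h + e)
    f_value = (h / df_h) / (e / df_e)
    return r_square, f_value

summary = {name: univariate_summary(h_diag[name], e_diag[name]) for name in h_diag}
for name, (r_square, f_value) in summary.items():
    print(f"{name}: R-Square = {r_square:.6f}, F = {f_value:.2f}")
```

The values match the Ca and Na tables (R-Square 0.799032 and 0.564424, F 29.16 and 9.50).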
Romano-British Pottery The GLM Procedure Multivariate Analysis of Variance H = Type III SSCP Matrix for Site Al Fe Mg Ca Na Al 175.61031868 149.295533 130.8097066 5.889163736 5.372264835 Fe 149.295533 134.22161582 117.74503516 4.8217865934 5.3259491209 Mg 130.8097066 117.74503516 103.35052703 4.2091613187 4.7105458242 Ca 5.889163736 4.8217865934 4.2091613187 0.2047027473 0.154782967 Na 5.372264835 5.3259491209 4.7105458242 0.154782967 0.2582456044 Characteristic Roots and Vectors of: E Inverse * H, where H = Type III SSCP Matrix for Site E = Error SSCP Matrix Characteristic Characteristic Vector VEV=1 Root Percent Al Fe Mg Ca Na 34.1611140 96.39 0.09562211 0.26330469 0.05305978 1.87982100 0.47071123 1.2500994 3.53 0.02651891 0.01239715 0.17564390 4.25929785 1.23727668 0.0275396 0.08 0.09082220 0.13159869 0.03508901 0.15701602 1.39364544 0.0000000 0.00 0.03673984 0.15129712 0.20455529 0.54624873 0.17402107 0.0000000 0.00 0.06862324 0.03056912 0.10662399 2.51151978 1.23668841 MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Site Effect H = Type III SSCP Matrix for Site E = Error SSCP Matrix S=3 M=0.5 N=8 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.01230091 13.09 15 50.091 <.0001 Pillai's Trace 1.55393619 4.30 15 60 <.0001 Hotelling-Lawley Trace 35.43875302 40.59 15 29.13 <.0001 Roy's Greatest Root 34.16111399 136.64 5 20 <.0001 NOTE: F Statistic for Roy's Greatest Root is an upper bound.
Romano-British Pottery The GLM Procedure Multivariate Analysis of Variance H = Contrast SSCP Matrix for Llanederyn vs. the rest Al Fe Mg Ca Na Al 58.583366402 64.56230291 57.57983466 1.438395503 3.698102513 Fe 64.56230291 71.151441323 63.456352116 1.5851961376 4.0755256878 Mg 57.57983466 63.456352116 56.593493386 1.4137558201 3.6347541005 Ca 1.438395503 1.5851961376 1.4137558201 0.0353168783 0.0907993915 Na 3.698102513 4.0755256878 3.6347541005 0.0907993915 0.2334444577 Characteristic Roots and Vectors of: E Inverse * H, where H = Contrast SSCP Matrix for Llanederyn vs. the rest E = Error SSCP Matrix Characteristic Characteristic Vector VEV=1 Root Percent Al Fe Mg Ca Na 16.1251646 100.00 0.08883488 0.25458141 0.08723574 0.98158668 0.71925759 0.0000000 0.00 0.00503538 0.03825743 0.17632854 5.16256699 0.01022754 0.0000000 0.00 0.00162771 0.08885364 0.01774069 0.83096817 2.17644566 0.0000000 0.00 0.04450136 0.15722494 0.22156791 0.00000000 0.00000000 0.0000000 0.00 0.11939206 0.10833549 0.00000000 0.00000000 0.00000000 MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Llanederyn vs. the rest Effect H = Contrast SSCP Matrix for Llanederyn vs. the rest E = Error SSCP Matrix S=1 M=1.5 N=8 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.05839360 58.05 5 18 <.0001 Pillai's Trace 0.94160640 58.05 5 18 <.0001 Hotelling-Lawley Trace 16.12516462 58.05 5 18 <.0001 Roy's Greatest Root 16.12516462 58.05 5 18 <.0001
Four multivariate tests are computed, all based on the characteristic roots and vectors of $E^{-1}H$. These roots and vectors are displayed along with the tests. All four tests can be transformed to variates that have F distributions under the null hypothesis. Note that the four tests all give the same results for the contrast, since it has only one degree of freedom. In this case, the multivariate analysis matches the univariate results: there is an overall difference between the chemical composition of samples from different sites, and the samples from Llanederyn are different from the average of the other sites.
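To make the dependence on the characteristic roots concrete, the following Python sketch (an illustration, not SAS output) computes the four criteria from the nonzero roots printed above using their standard definitions. The function name is hypothetical; the roots and the degrees of freedom (5 and 18) are copied from the output.

```python
import math

def manova_statistics(roots):
    """Four MANOVA criteria from the characteristic roots of E^{-1}H."""
    wilks = math.prod(1.0 / (1.0 + r) for r in roots)
    pillai = sum(r / (1.0 + r) for r in roots)
    hotelling_lawley = sum(roots)
    roy = max(roots)
    return wilks, pillai, hotelling_lawley, roy

# Nonzero roots copied from the output (zero roots do not change any criterion).
site_roots = [34.1611140, 1.2500994, 0.0275396]
contrast_roots = [16.1251646]

site_stats = manova_statistics(site_roots)
contrast_stats = manova_statistics(contrast_roots)

# With a single nonzero root, all four criteria lead to the same exact F:
# root * (denominator DF / numerator DF), here 18 / 5.
exact_f = contrast_roots[0] * 18 / 5

print("Site:    ", [round(s, 8) for s in site_stats])
print("Contrast:", [round(s, 8) for s in contrast_stats], "F =", round(exact_f, 2))
```

With one root, Wilks' Lambda is 1/(1 + root) and Pillai's Trace is root/(1 + root), which is why the four tests coincide for the one-degree-of-freedom contrast.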
This example uses data from Cole and Grizzle (1966) to illustrate a commonly occurring repeated measures ANOVA design. Sixteen dogs are randomly assigned to four groups. (One animal is removed from the analysis due to a missing value for one dependent variable.) Dogs in each group receive either morphine or trimethaphan (variable Drug ) and have either depleted or intact histamine levels (variable Depleted ) before receiving the drugs. The dependent variable is the blood concentration of histamine at 0, 1, 3, and 5 minutes after injection of the drug. Logarithms are applied to these concentrations to minimize correlation between the mean and the variance of the data.
The following SAS statements perform both univariate and multivariate repeated measures analyses and produce Output 32.7.1 through Output 32.7.7:
data dogs;
   input Drug $12. Depleted $ Histamine0 Histamine1 Histamine3 Histamine5;
   LogHistamine0=log(Histamine0);
   LogHistamine1=log(Histamine1);
   LogHistamine3=log(Histamine3);
   LogHistamine5=log(Histamine5);
   datalines;
Morphine      N  .04  .20  .10  .08
Morphine      N  .02  .06  .02  .02
Morphine      N  .07 1.40  .48  .24
Morphine      N  .17  .57  .35  .24
Morphine      Y  .10  .09  .13  .14
Morphine      Y  .12  .11  .10  .
Morphine      Y  .07  .07  .06  .07
Morphine      Y  .05  .07  .06  .07
Trimethaphan  N  .03  .62  .31  .22
Trimethaphan  N  .03 1.05  .73  .60
Trimethaphan  N  .07  .83 1.07  .80
Trimethaphan  N  .09 3.13 2.06 1.23
Trimethaphan  Y  .10  .09  .09  .08
Trimethaphan  Y  .08  .09  .09  .10
Trimethaphan  Y  .13  .10  .12  .12
Trimethaphan  Y  .06  .05  .05  .05
;

proc glm;
   class Drug Depleted;
   model LogHistamine0--LogHistamine5 =
         Drug Depleted Drug*Depleted / nouni;
   repeated Time 4 (0 1 3 5) polynomial / summary printe;
run;
The GLM Procedure Class Level Information Class Levels Values Drug 2 Morphine Trimethaphan Depleted 2 N Y Number of Observations Read 16 Number of Observations Used 15
The GLM Procedure Repeated Measures Analysis of Variance Repeated Measures Level Information Log Log Log Log Dependent Variable Histamine0 Histamine1 Histamine3 Histamine5 Level of Time 0 1 3 5
The GLM Procedure Repeated Measures Analysis of Variance MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Time Effect H = Type III SSCP Matrix for Time E = Error SSCP Matrix S=1 M=0.5 N=3.5 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.11097706 24.03 3 9 0.0001 Pillai's Trace 0.88902294 24.03 3 9 0.0001 Hotelling-Lawley Trace 8.01087137 24.03 3 9 0.0001 Roy's Greatest Root 8.01087137 24.03 3 9 0.0001 MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Time*Drug Effect H = Type III SSCP Matrix for Time*Drug E = Error SSCP Matrix S=1 M=0.5 N=3.5 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.34155984 5.78 3 9 0.0175 Pillai's Trace 0.65844016 5.78 3 9 0.0175 Hotelling-Lawley Trace 1.92774470 5.78 3 9 0.0175 Roy's Greatest Root 1.92774470 5.78 3 9 0.0175 MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Time*Depleted Effect H = Type III SSCP Matrix for Time*Depleted E = Error SSCP Matrix S=1 M=0.5 N=3.5 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.12339988 21.31 3 9 0.0002 Pillai's Trace 0.87660012 21.31 3 9 0.0002 Hotelling-Lawley Trace 7.10373567 21.31 3 9 0.0002 Roy's Greatest Root 7.10373567 21.31 3 9 0.0002 MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Time*Drug*Depleted Effect H = Type III SSCP Matrix for Time*Drug*Depleted E = Error SSCP Matrix S=1 M=0.5 N=3.5 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.19383010 12.48 3 9 0.0015 Pillai's Trace 0.80616990 12.48 3 9 0.0015 Hotelling-Lawley Trace 4.15915732 12.48 3 9 0.0015 Roy's Greatest Root 4.15915732 12.48 3 9 0.0015
The GLM Procedure Repeated Measures Analysis of Variance Tests of Hypotheses for Between Subjects Effects Source DF Type III SS Mean Square F Value Pr > F Drug 1 5.99336243 5.99336243 2.71 0.1281 Depleted 1 15.44840703 15.44840703 6.98 0.0229 Drug*Depleted 1 4.69087508 4.69087508 2.12 0.1734 Error 11 24.34683348 2.21334850
The GLM Procedure Repeated Measures Analysis of Variance Sphericity Tests Mauchly's Variables DF Criterion Chi-Square Pr > ChiSq Transformed Variates 5 0.1752641 16.930873 0.0046 Orthogonal Components 5 0.1752641 16.930873 0.0046
The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Adj Pr > F Source DF Type III SS Mean Square F Value Pr > F G - G H - F Time 3 12.05898677 4.01966226 53.44 <.0001 <.0001 <.0001 Time*Drug 3 1.84429514 0.61476505 8.17 0.0003 0.0039 0.0008 Time*Depleted 3 12.08978557 4.02992852 53.57 <.0001 <.0001 <.0001 Time*Drug*Depleted 3 2.93077939 0.97692646 12.99 <.0001 0.0005 <.0001 Error(Time) 33 2.48238887 0.07522391 Greenhouse-Geisser Epsilon 0.5694 Huynh-Feldt Epsilon 0.8475
The GLM Procedure Repeated Measures Analysis of Variance Analysis of Variance of Contrast Variables Time_N represents the nth degree polynomial contrast for Time Contrast Variable: Time_1 Source DF Type III SS Mean Square F Value Pr > F Mean 1 2.00963483 2.00963483 34.99 0.0001 Drug 1 1.18069076 1.18069076 20.56 0.0009 Depleted 1 1.36172504 1.36172504 23.71 0.0005 Drug*Depleted 1 2.04346848 2.04346848 35.58 <.0001 Error 11 0.63171161 0.05742833 Contrast Variable: Time_2 Source DF Type III SS Mean Square F Value Pr > F Mean 1 5.40988418 5.40988418 57.15 <.0001 Drug 1 0.59173192 0.59173192 6.25 0.0295 Depleted 1 5.94945506 5.94945506 62.86 <.0001 Drug*Depleted 1 0.67031587 0.67031587 7.08 0.0221 Error 11 1.04118707 0.09465337 Contrast Variable: Time_3 Source DF Type III SS Mean Square F Value Pr > F Mean 1 4.63946776 4.63946776 63.04 <.0001 Drug 1 0.07187246 0.07187246 0.98 0.3443 Depleted 1 4.77860547 4.77860547 64.94 <.0001 Drug*Depleted 1 0.21699504 0.21699504 2.95 0.1139 Error 11 0.80949018 0.07359002
The NOUNI option in the MODEL statement suppresses the individual ANOVA tables for the original dependent variables. These analyses are usually of no interest in a repeated measures analysis. The POLYNOMIAL option in the REPEATED statement indicates that the transformation used to implement the repeated measures analysis is an orthogonal polynomial transformation, and the SUMMARY option requests that the univariate analyses for the orthogonal polynomial contrast variables be displayed. The parenthetical numbers (0 1 3 5) determine the spacing of the orthogonal polynomials used in the analysis. The output is displayed in Output 32.7.1 through Output 32.7.7.
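The effect of the spacing values can be illustrated outside SAS. The sketch below builds orthonormal polynomial contrasts for the levels 0, 1, 3, and 5 by Gram-Schmidt orthogonalization of the powers of the level values. It shows the idea only: the function name is hypothetical, and the sign and scaling conventions that PROC GLM applies internally are not confirmed by this section and may differ.

```python
import math

def orthogonal_polynomials(levels, max_degree):
    """Orthonormal polynomial contrasts for the given spacing (Gram-Schmidt)."""
    n = len(levels)
    basis = [[1.0 / math.sqrt(n)] * n]  # degree 0: normalized constant vector
    for degree in range(1, max_degree + 1):
        vec = [float(x) ** degree for x in levels]
        # Remove the components along all lower-degree vectors.
        for b in basis:
            proj = sum(v * bi for v, bi in zip(vec, b))
            vec = [v - proj * bi for v, bi in zip(vec, b)]
        norm = math.sqrt(sum(v * v for v in vec))
        basis.append([v / norm for v in vec])
    return basis[1:]  # drop the constant; keep linear, quadratic, cubic

time_contrasts = orthogonal_polynomials([0, 1, 3, 5], 3)
for degree, contrast in enumerate(time_contrasts, start=1):
    print(f"degree {degree}:", [round(c, 4) for c in contrast])
```

Each contrast sums to zero and is orthogonal to the others. With equally spaced levels the linear contrast would be symmetric about the middle; with the spacing (0 1 3 5) it is proportional to the deviations of the times from their mean of 2.25.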
The Repeated Measures Level Information table gives information on the repeated measures effect; it is displayed in Output 32.7.2. In this example, the within-subject (within-dog) effect is Time , which has the levels 0, 1, 3, and 5.
The multivariate analyses for within-subject effects and related interactions are displayed in Output 32.7.3. In this example, the first table shows that the Time effect is significant. In addition, the Time*Drug*Depleted interaction is significant, as shown in the fourth table. This means that the effect of Time on the blood concentration of histamine is different for the four Drug*Depleted combinations studied.
Univariate analyses for within-subject (within-dog) effects and related interactions are displayed in Output 32.7.6. The results for this example are the same as for the multivariate analyses; this is not always the case. In addition, before the univariate analyses are used to make conclusions about the data, the result of the sphericity test (requested with the PRINTE option in the REPEATED statement and displayed in Output 32.7.5) should be examined. If the sphericity test is rejected, use the adjusted G-G or H-F probabilities. Here the test is rejected (Pr > ChiSq = 0.0046), so the adjusted probabilities are the ones to rely on. See the section "Repeated Measures Analysis of Variance" for more information.
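As a rough illustration of how the univariate table is assembled, the following Python sketch recomputes the F values from the printed mean squares and shows the degrees of freedom implied by each epsilon. It assumes the usual form of the adjustment, in which both the numerator and denominator degrees of freedom are multiplied by epsilon before the F probability is looked up. The probabilities themselves are not recomputed here, and all names are illustrative.

```python
# Values copied from the univariate within-subject table.
MS_ERROR_TIME = 0.07522391
DF_EFFECT, DF_ERROR = 3, 33
EPSILON = {"G-G": 0.5694, "H-F": 0.8475}

mean_squares = {
    "Time": 4.01966226,
    "Time*Drug": 0.61476505,
    "Time*Depleted": 4.02992852,
    "Time*Drug*Depleted": 0.97692646,
}

# Unadjusted F value: effect mean square over the Error(Time) mean square.
f_values = {effect: ms / MS_ERROR_TIME for effect, ms in mean_squares.items()}

# Adjusted degrees of freedom: both are shrunk by the same epsilon.
adjusted_df = {
    name: (eps * DF_EFFECT, eps * DF_ERROR) for name, eps in EPSILON.items()
}

for effect, f in f_values.items():
    print(f"{effect:20s} F = {f:.2f}")
for name, (num_df, den_df) in adjusted_df.items():
    print(f"{name}: numerator DF = {num_df:.4f}, denominator DF = {den_df:.4f}")
```

The F value is unchanged by the adjustment; only the reference distribution changes, which is why the adjusted probabilities are never smaller than the unadjusted ones.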
Milliken and Johnson (1984) present an example of an unbalanced mixed model. Three machines, which are considered a fixed effect, and six employees, which are considered a random effect, are studied. Each employee operates each machine either one, two, or three times. The dependent variable is an overall rating, which takes into account the number and quality of components produced.
The following statements form the data set and perform a mixed model analysis of variance by requesting the TEST option in the RANDOM statement. Note that the machine*person interaction is declared as a random effect; in general, when an interaction involves a random effect, it too should be declared as random. The results of the analysis are shown in Output 32.8.1 through Output 32.8.4.
data machine;
   input machine person rating @@;
   datalines;
1 1 52.0  1 2 51.8  1 2 52.8  1 3 60.0  1 4 51.1  1 4 52.3
1 5 50.9  1 5 51.8  1 5 51.4  1 6 46.4  1 6 44.8  1 6 49.2
2 1 64.0  2 2 59.7  2 2 60.0  2 2 59.0  2 3 68.6  2 3 65.8
2 4 63.2  2 4 62.8  2 4 62.2  2 5 64.8  2 5 65.0  2 6 43.7
2 6 44.2  2 6 43.0
3 1 67.5  3 1 67.2  3 1 66.9  3 2 61.5  3 2 61.7  3 2 62.3
3 3 70.8  3 3 70.6  3 3 71.0  3 4 64.1  3 4 66.2  3 4 64.0
3 5 72.1  3 5 72.0  3 5 71.1  3 6 62.0  3 6 61.4  3 6 60.5
;

proc glm data=machine;
   class machine person;
   model rating=machine person machine*person;
   random person machine*person / test;
run;
The GLM Procedure Class Level Information Class Levels Values machine 3 1 2 3 person 6 1 2 3 4 5 6 Number of Observations Read 44 Number of Observations Used 44
The GLM Procedure Dependent Variable: rating Sum of Source DF Squares Mean Square F Value Pr > F Model 17 3061.743333 180.102549 206.41 <.0001 Error 26 22.686667 0.872564 Corrected Total 43 3084.430000 R-Square Coeff Var Root MSE rating Mean 0.992645 1.560754 0.934111 59.85000 Source DF Type I SS Mean Square F Value Pr > F machine 2 1648.664722 824.332361 944.72 <.0001 person 5 1008.763583 201.752717 231.22 <.0001 machine*person 10 404.315028 40.431503 46.34 <.0001 Source DF Type III SS Mean Square F Value Pr > F machine 2 1238.197626 619.098813 709.52 <.0001 person 5 1011.053834 202.210767 231.74 <.0001 machine*person 10 404.315028 40.431503 46.34 <.0001
The GLM Procedure Source Type III Expected Mean Square machine Var(Error) + 2.137 Var(machine*person) + Q(machine) person Var(Error) + 2.2408 Var(machine*person) + 6.7224 Var(person) machine*person Var(Error) + 2.3162 Var(machine*person)
The GLM Procedure Tests of Hypotheses for Mixed Model Analysis of Variance Dependent Variable: rating Source DF Type III SS Mean Square F Value Pr > F machine 2 1238.197626 619.098813 16.57 0.0007 Error 10.036 375.057436 37.370384 Error: 0.9226*MS(machine*person) + 0.0774*MS(Error) Source DF Type III SS Mean Square F Value Pr > F person 5 1011.053834 202.210767 5.17 0.0133 Error 10.015 392.005726 39.143708 Error: 0.9674*MS(machine*person) + 0.0326*MS(Error) Source DF Type III SS Mean Square F Value Pr > F machine*person 10 404.315028 40.431503 46.34 <.0001 Error: MS(Error) 26 22.686667 0.872564
The TEST option in the RANDOM statement requests that PROC GLM determine the appropriate F-tests based on person and machine*person being treated as random effects. As you can see in Output 32.8.4, this requires that a linear combination of mean squares be constructed to test both the machine and person hypotheses; thus, F-tests using Satterthwaite approximations are used.
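The construction of these error terms can be followed by hand. The Python sketch below (illustrative only; the function name is hypothetical) chooses the weight on MS(machine*person) so that the denominator has the same expected mean square as the effect under the null hypothesis, and then applies the Satterthwaite formula for the approximate degrees of freedom. Inputs are the rounded coefficients and mean squares printed in the output, so the results agree with PROC GLM only up to rounding.

```python
# Mean squares and degrees of freedom copied from the ANOVA table.
MS_MP, DF_MP = 40.431503, 10    # machine*person
MS_ERR, DF_ERR = 0.872564, 26   # residual error
# Coefficient of Var(machine*person) in the expected MS of machine*person.
K_INTERACTION = 2.3162

def satterthwaite_test(ms_effect, k_effect):
    """Return (F, denominator MS, approximate denominator DF).

    k_effect is the coefficient of Var(machine*person) in the expected
    mean square of the effect being tested.
    """
    a = k_effect / K_INTERACTION  # weight on MS(machine*person)
    parts = [(a * MS_MP, DF_MP), ((1.0 - a) * MS_ERR, DF_ERR)]
    ms_denom = sum(part for part, _ in parts)
    df_denom = ms_denom ** 2 / sum(part ** 2 / df for part, df in parts)
    return ms_effect / ms_denom, ms_denom, df_denom

machine_test = satterthwaite_test(619.098813, 2.137)
person_test = satterthwaite_test(202.210767, 2.2408)

for name, (f, ms, df) in (("machine", machine_test), ("person", person_test)):
    print(f"{name}: F = {f:.2f}, error MS = {ms:.4f}, error DF = {df:.3f}")
```

The weights come out as roughly 0.9226 and 0.9674, matching the error-term formulas shown in the output, and the F values and degrees of freedom reproduce 16.57 on 10.036 DF and 5.17 on 10.015 DF.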
Note that you can also use the MIXED procedure to analyze mixed models. The following statements use PROC MIXED to reproduce the mixed model analysis of variance; the relevant part of the PROC MIXED results is shown in Output 32.8.5.
proc mixed data=machine method=type3;
   class machine person;
   model rating = machine;
   random person machine*person;
run;
The Mixed Procedure Type 3 Analysis of Variance Sum of Source DF Squares Mean Square machine 2 1238.197626 619.098813 person 5 1011.053834 202.210767 machine*person 10 404.315028 40.431503 Residual 26 22.686667 0.872564 Type 3 Analysis of Variance Source Expected Mean Square machine Var(Residual) + 2.137 Var(machine*person) + Q(machine) person Var(Residual) + 2.2408 Var(machine*person) + 6.7224 Var(person) machine*person Var(Residual) + 2.3162 Var(machine*person) Residual Var(Residual) Type 3 Analysis of Variance Error Source Error Term DF F Value Pr > F machine 0.9226 MS(machine*person) 10.036 16.57 0.0007 + 0.0774 MS(Residual) person 0.9674 MS(machine*person) 10.015 5.17 0.0133 + 0.0326 MS(Residual) machine*person MS(Residual) 26 46.34 <.0001 Residual . . . .
The advantage of PROC MIXED is that it offers more versatility for mixed models; the disadvantage is that it can be less computationally efficient for large data sets. See Chapter 46, The MIXED Procedure, for more details.
This example shows how to analyze a doubly-multivariate repeated measures design by using PROC GLM with an IDENTITY factor in the REPEATED statement. Note that this differs from previous releases of PROC GLM, in which you had to use a MANOVA statement to get a doubly repeated measures analysis.
Two responses, Y1 and Y2, are each measured three times for each subject (pretreatment, posttreatment, and in a later follow-up). Each subject receives one of three treatments: A, B, or the control. In PROC GLM, you use a REPEATED factor of type IDENTITY to identify the different responses and another repeated factor to identify the different measurement times. The repeated measures analysis includes multivariate tests for time and treatment main effects, as well as their interactions, across responses. The following statements produce Output 32.9.1 through Output 32.9.3.
data Trial;
   input Treatment $ Repetition PreY1 PostY1 FollowY1
                                PreY2 PostY2 FollowY2;
   datalines;
A        1  3 13  9  0  0  9
A        2  0 14 10  6  6  3
A        3  4  6 17  8  2  6
A        4  7  7 13  7  6  4
A        5  3 12 11  6 12  6
A        6 10 14  8 13  3  8
B        1  9 11 17  8 11 27
B        2  4 16 13  9  3 26
B        3  8 10  9 12  0 18
B        4  5  9 13  3  0 14
B        5  0 15 11  3  0 25
B        6  4 11 14  4  2  9
Control  1 10 12 15  4  3  7
Control  2  2  8 12  8  7 20
Control  3  4  9 10  2  0 10
Control  4 10  8  8  5  8 14
Control  5 11 11 11  1  0 11
Control  6  1  5 15  8  9 10
;

proc glm data=Trial;
   class Treatment;
   model PreY1 PostY1 FollowY1
         PreY2 PostY2 FollowY2 = Treatment / nouni;
   repeated Response 2 identity, Time 3;
run;
The GLM Procedure Class Level Information Class Levels Values Treatment 3 A B Control Number of Observations Read 18 Number of Observations Used 18
The GLM Procedure Repeated Measures Analysis of Variance Repeated Measures Level Information Dependent Variable PreY1 PostY1 FollowY1 PreY2 PostY2 FollowY2 Level of Response 1 1 1 2 2 2 Level of Time 1 2 3 1 2 3
The GLM Procedure Repeated Measures Analysis of Variance MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Response Effect H = Type III SSCP Matrix for Response E = Error SSCP Matrix S=1 M=0 N=6 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.02165587 316.24 2 14 <.0001 Pillai's Trace 0.97834413 316.24 2 14 <.0001 Hotelling-Lawley Trace 45.17686368 316.24 2 14 <.0001 Roy's Greatest Root 45.17686368 316.24 2 14 <.0001 MANOVA Test Criteria and F Approximations for the Hypothesis of no Response*Treatment Effect H = Type III SSCP Matrix for Response*Treatment E = Error SSCP Matrix S=2 M=-0.5 N=6 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.72215797 1.24 4 28 0.3178 Pillai's Trace 0.27937444 1.22 4 30 0.3240 Hotelling-Lawley Trace 0.38261660 1.31 4 15.818 0.3074 Roy's Greatest Root 0.37698780 2.83 2 15 0.0908 NOTE: F Statistic for Roy's Greatest Root is an upper bound. NOTE: F Statistic for Wilks' Lambda is exact. MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no Response*Time Effect H = Type III SSCP Matrix for Response*Time E = Error SSCP Matrix S=1 M=1 N=5 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.14071380 18.32 4 12 <.0001 Pillai's Trace 0.85928620 18.32 4 12 <.0001 Hotelling-Lawley Trace 6.10662362 18.32 4 12 <.0001 Roy's Greatest Root 6.10662362 18.32 4 12 <.0001 MANOVA Test Criteria and F Approximations for the Hypothesis of no Response*Time*Treatment Effect H = Type III SSCP Matrix for Response*Time*Treatment E = Error SSCP Matrix S=2 M=0.5 N=5 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.22861451 3.27 8 24 0.0115 Pillai's Trace 0.96538785 3.03 8 26 0.0151 Hotelling-Lawley Trace 2.52557514 3.64 8 15 0.0149 Roy's Greatest Root 2.12651905 6.91 4 13 0.0033 NOTE: F Statistic for Roy's Greatest Root is an upper bound. NOTE: F Statistic for Wilks Lambda is exact.
The levels of the repeated factors are displayed in Output 32.9.2. Note that Response is 1 for all the Y1 measurements and 2 for all the Y2 measurements, while the three levels of Time identify the pretreatment, posttreatment, and follow-up measurements within each response. The multivariate tests for within-subject effects are displayed in Output 32.9.3.
The table for Response*Treatment tests for an overall treatment effect across the two responses; likewise, the tables for Response*Time and Response*Time*Treatment test for time and the treatment-by-time interaction, respectively. In this case, there is a strong main effect for time and possibly for the interaction, but not for treatment.
In previous releases (before the IDENTITY transformation was introduced), in order to perform a doubly repeated measures analysis, you had to use a MANOVA statement with a customized transformation matrix M. You might still want to use this approach to see details of the analysis, such as the univariate ANOVA for each transformed variate. The following statements demonstrate this approach by using the MANOVA statement to test for the overall main effect of time and specifying the SUMMARY option.
proc glm data=Trial;
   class Treatment;
   model PreY1 PostY1 FollowY1
         PreY2 PostY2 FollowY2 = Treatment / nouni;
   manova h=intercept
          m=prey1 - posty1,
            prey1 - followy1,
            prey2 - posty2,
            prey2 - followy2 / summary;
run;
The M matrix used to perform the test for time effects is displayed in Output 32.9.4, while the results of the multivariate test are given in Output 32.9.5. Note that the test results are the same as for the Response*Time effect in Output 32.9.3.
The GLM Procedure
Multivariate Analysis of Variance

M Matrix Describing Transformed Variables

          PreY1   PostY1   FollowY1   PreY2   PostY2   FollowY2
MVAR1         1       -1          0       0        0          0
MVAR2         1        0         -1       0        0          0
MVAR3         0        0          0       1       -1          0
MVAR4         0        0          0       1        0         -1
The GLM Procedure Multivariate Analysis of Variance Characteristic Roots and Vectors of: E Inverse * H, where H = Type III SSCP Matrix for Intercept E = Error SSCP Matrix Variables have been transformed by the M Matrix Characteristic Characteristic Vector VEV=1 Root Percent MVAR1 MVAR2 MVAR3 MVAR4 6.10662362 100.00 0.00157729 0.04081620 0.04210209 0.03519437 0.00000000 0.00 0.00796367 0.00493217 0.05185236 0.00377940 0.00000000 0.00 0.03534089 0.01502146 0.00283074 0.04259372 0.00000000 0.00 0.05672137 0.04500208 0.00000000 0.00000000 MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Intercept Effect on the Variables Defined by the M Matrix Transformation H = Type III SSCP Matrix for Intercept E = Error SSCP Matrix S=1 M=1 N=5 Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.14071380 18.32 4 12 <.0001 Pillai's Trace 0.85928620 18.32 4 12 <.0001 Hotelling-Lawley Trace 6.10662362 18.32 4 12 <.0001 Roy's Greatest Root 6.10662362 18.32 4 12 <.0001
The SUMMARY option in the MANOVA statement creates an ANOVA table for each transformed variable as defined by the M matrix. MVAR1 and MVAR2 contrast the pretreatment measurement for Y1 with the posttreatment and follow-up measurements for Y1, respectively; MVAR3 and MVAR4 are the same contrasts for Y2. Output 32.9.6 displays these univariate ANOVA tables and shows that the contrasts are all strongly significant except for the pre-versus-post difference for Y2.
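The univariate tables for the transformed variables can be reproduced from the raw data. The Python sketch below (an illustrative check, not SAS code) forms MVAR1 = PreY1 - PostY1 and MVAR2 = PreY1 - FollowY1 and computes the intercept and error sums of squares. It relies on the design being balanced, in which case the Type III intercept sum of squares is the total sample size times the squared average of the treatment means; the function name is hypothetical.

```python
# Y1 measurements by treatment, copied from the Trial data set.
pre_y1 = {"A": [3, 0, 4, 7, 3, 10], "B": [9, 4, 8, 5, 0, 4],
          "Control": [10, 2, 4, 10, 11, 1]}
post_y1 = {"A": [13, 14, 6, 7, 12, 14], "B": [11, 16, 10, 9, 15, 11],
           "Control": [12, 8, 9, 8, 11, 5]}
follow_y1 = {"A": [9, 10, 17, 13, 11, 8], "B": [17, 13, 9, 13, 11, 14],
             "Control": [15, 12, 10, 8, 11, 15]}

def intercept_anova(first, second):
    """Return (intercept SS, error SS, F) for the variable first - second."""
    diffs = {g: [a - b for a, b in zip(first[g], second[g])] for g in first}
    group_means = {g: sum(d) / len(d) for g, d in diffs.items()}
    n_total = sum(len(d) for d in diffs.values())
    # Balanced design: intercept SS = N * (average of treatment means)^2.
    average_mean = sum(group_means.values()) / len(group_means)
    ss_intercept = n_total * average_mean ** 2
    # Error SS: squared deviations from each treatment's own mean.
    ss_error = sum((x - group_means[g]) ** 2 for g, d in diffs.items() for x in d)
    df_error = n_total - len(diffs)
    return ss_intercept, ss_error, ss_intercept / (ss_error / df_error)

mvar1 = intercept_anova(pre_y1, post_y1)
mvar2 = intercept_anova(pre_y1, follow_y1)
print("MVAR1:", [round(v, 4) for v in mvar1])
print("MVAR2:", [round(v, 4) for v in mvar2])
```

The results match the MVAR1 and MVAR2 tables (intercept SS 512 and 813.3889, error SS 339 and 371.1667, F 22.65 and 32.87).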
The GLM Procedure Multivariate Analysis of Variance Dependent Variable: MVAR1 Source DF Type III SS Mean Square F Value Pr > F Intercept 1 512.0000000 512.0000000 22.65 0.0003 Error 15 339.0000000 22.6000000 The GLM Procedure Multivariate Analysis of Variance Dependent Variable: MVAR2 Source DF Type III SS Mean Square F Value Pr > F Intercept 1 813.3888889 813.3888889 32.87 <.0001 Error 15 371.1666667 24.7444444 The GLM Procedure Multivariate Analysis of Variance Dependent Variable: MVAR3 Source DF Type III SS Mean Square F Value Pr > F Intercept 1 68.0555556 68.0555556 3.49 0.0814 Error 15 292.5000000 19.5000000 The GLM Procedure Multivariate Analysis of Variance Dependent Variable: MVAR4 Source DF Type III SS Mean Square F Value Pr > F Intercept 1 800.0000000 800.0000000 26.43 0.0001 Error 15 454.0000000 30.2666667
This example demonstrates how you can test for equal group variances in a one-way design. The data come from the University of Pennsylvania Smell Identification Test (UPSIT), reported in O'Brien and Heft (1995). The study is undertaken to explore how age and gender are related to sense of smell. A total of 180 subjects, 20 to 89 years old, are exposed to 40 different odors: for each odor, subjects are asked to choose which of four words best describes the odor. The Freeman-Tukey modified arcsine transformation (Bishop et al. 1975) is applied to the proportion of correctly identified odors to arrive at an olfactory index. For the following analysis, subjects are divided into five age groups.
The following statements create a data set named upsit , containing the age group and olfactory index for each subject.
data upsit;
   input agegroup smell @@;
   datalines;
1 1.381  1 1.322  1 1.162  1 1.275  1 1.381  1 1.275
1 1.322  1 1.492  1 1.322  1 1.381  1 1.162  1 1.013
1 1.322  1 1.322  1 1.275  1 1.492  1 1.322  1 1.322
1 1.492  1 1.322  1 1.381  1 1.234  1 1.162  1 1.381
1 1.381  1 1.381  1 1.322  1 1.381  1 1.322  1 1.381
1 1.275  1 1.492  1 1.275  1 1.322  1 1.275  1 1.381
1 1.234  1 1.105
2 1.234  2 1.234  2 1.381  2 1.322  2 1.492  2 1.234
2 1.381  2 1.381  2 1.492  2 1.492  2 1.275  2 1.492
2 1.381  2 1.492  2 1.322  2 1.275  2 1.275  2 1.275
2 1.322  2 1.492  2 1.381  2 1.322  2 1.492  2 1.196
2 1.322  2 1.275  2 1.234  2 1.322  2 1.098  2 1.322
2 1.381  2 1.275  2 1.492  2 1.492  2 1.381  2 1.196
3 1.381  3 1.381  3 1.492  3 1.492  3 1.492  3 1.098
3 1.492  3 1.381  3 1.234  3 1.234  3 1.129  3 1.069
3 1.234  3 1.322  3 1.275  3 1.230  3 1.234  3 1.234
3 1.322  3 1.322  3 1.381
4 1.322  4 1.381  4 1.381  4 1.322  4 1.234  4 1.234
4 1.234  4 1.381  4 1.322  4 1.275  4 1.275  4 1.492
4 1.234  4 1.098  4 1.322  4 1.129  4 0.687  4 1.322
4 1.322  4 1.234  4 1.129  4 1.492  4 0.810  4 1.234
4 1.381  4 1.040  4 1.381  4 1.381  4 1.129  4 1.492
4 1.129  4 1.098  4 1.275  4 1.322  4 1.234  4 1.196
4 1.234  4 0.585  4 0.785  4 1.275  4 1.322  4 0.712
4 0.810
5 1.322  5 1.234  5 1.381  5 1.275  5 1.275  5 1.322
5 1.162  5 0.909  5 0.502  5 1.234  5 1.322  5 1.196
5 0.859  5 1.196  5 1.381  5 1.322  5 1.234  5 1.275
5 1.162  5 1.162  5 0.585  5 1.013  5 0.960  5 0.662
5 1.129  5 0.531  5 1.162  5 0.737  5 1.098  5 1.162
5 1.040  5 0.558  5 0.960  5 1.098  5 0.884  5 1.162
5 1.098  5 0.859  5 1.275  5 1.162  5 0.785  5 0.859
;
Older people are more at risk for problems with their sense of smell, and this should be reflected in significant differences in the mean of the olfactory index across the different age groups. However, many older people also have an excellent sense of smell, which implies that the older age groups should have greater variability. In order to test this hypothesis and to compute a one-way ANOVA for the olfactory index that is robust to the possibility of unequal group variances, you can use the HOVTEST and WELCH options in the MEANS statement for the GLM procedure, as shown in the following code.
proc glm data=upsit;
   class agegroup;
   model smell = agegroup;
   means agegroup / hovtest welch;
run;
Output 32.10.1, Output 32.10.2, and Output 32.10.3 display the usual ANOVA test for equal age group means, Levene's test for equal age group variances, and Welch's test for equal age group means, respectively. The hypotheses of age effects for mean and variance of the olfactory index are both confirmed.
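The Levene table is titled "ANOVA of Squared Deviations from Group Means," which describes the computation: each observation is replaced by its squared deviation from its group mean, and an ordinary one-way ANOVA is run on those values. The Python sketch below illustrates this on a small hypothetical data set (three groups of three values, invented for the illustration and unrelated to the UPSIT data); the function names are also illustrative.

```python
def one_way_f(groups):
    """One-way ANOVA: return (F, between-group DF, within-group DF)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def levene_squared(groups):
    """Levene's test using squared deviations from the group means."""
    squared_deviations = []
    for g in groups:
        mean = sum(g) / len(g)
        squared_deviations.append([(x - mean) ** 2 for x in g])
    return one_way_f(squared_deviations)

# Hypothetical toy data with visibly increasing spread across groups.
toy_groups = [[1, 2, 3], [2, 4, 6], [1, 5, 9]]
levene_f, df_between, df_within = levene_squared(toy_groups)
print(f"Levene F = {levene_f:.4f} on ({df_between}, {df_within}) DF")
```

For the UPSIT data the same computation has 4 and 175 degrees of freedom (five groups, 180 subjects), as shown in the Levene table.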
The GLM Procedure Dependent Variable: smell Source DF Type I SS Mean Square F Value Pr > F agegroup 4 2.13878141 0.53469535 16.65 <.0001
The GLM Procedure Levene's Test for Homogeneity of smell Variance ANOVA of Squared Deviations from Group Means Sum of Mean Source DF Squares Square F Value Pr > F agegroup 4 0.0799 0.0200 6.35 <.0001 Error 175 0.5503 0.00314
The GLM Procedure Welch's ANOVA for smell Source DF F Value Pr > F agegroup 4.0000 13.72 <.0001 Error 78.7489
Yin and Jillie (1987) describe an experiment on a nitride etch process for a single wafer plasma etcher. The experiment is run using four factors: cathode power ( power ), gas flow ( flow ), reactor chamber pressure ( pressure ), and electrode gap ( gap ). Of interest are the main effects and interaction effects of the factors on the nitride etch rate ( rate ). The following statements create a SAS data set named HalfFraction , containing the factor settings and the observed etch rate for each of eight experimental runs.
data HalfFraction;
   input power flow pressure gap rate;
   datalines;
0.8    4.5  125  275   550
0.8    4.5  200  325   650
0.8  550.0  125  325   642
0.8  550.0  200  275   601
1.2    4.5  125  325   749
1.2    4.5  200  275  1052
1.2  550.0  125  275  1075
1.2  550.0  200  325   729
;
Notice that each of the factors has just two values. This is a common experimental design when the intent is to screen, from the many factors that might affect the response, the few that actually do. Since there are 2^4 = 16 different possible settings of four two-level factors, this design with only eight runs is called a half fraction. The eight runs are chosen specifically to provide unambiguous information on main effects at the cost of confounding interaction effects with each other.
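The structure of the half fraction can be seen by enumerating all 16 settings in coded form (low = -1, high = +1). The Python sketch below is an illustration only. The selection rule it uses, keeping the runs whose four coded levels multiply to +1, is an observation inferred from the eight runs in the data set rather than something stated in the source.

```python
from itertools import product

# All 2^4 = 16 settings of (power, flow, pressure, gap) in coded units.
full_factorial = list(product((-1, 1), repeat=4))

# Keep the runs whose coded levels multiply to +1 (inferred from the data).
half_fraction = [
    run for run in full_factorial if run[0] * run[1] * run[2] * run[3] == 1
]

# The eight runs of the HalfFraction data set, coded by hand:
# power 0.8/1.2, flow 4.5/550, pressure 125/200, gap 275/325 -> -1/+1.
observed_runs = {
    (-1, -1, -1, -1), (-1, -1, 1, 1), (-1, 1, -1, 1), (-1, 1, 1, -1),
    (1, -1, -1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, 1, 1, 1),
}

print(len(full_factorial), "possible runs;", len(half_fraction), "in the half fraction")
print("matches the data set:", set(half_fraction) == observed_runs)
```

Under this rule the product of any two factors equals the product of the other two in every run, which is the source of the confounding among two-factor interactions.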
One way to analyze these data is simply to use PROC GLM to compute an analysis of variance, including both main effects and interactions in the model. The following statements demonstrate this approach.
proc glm data=HalfFraction;
   class power flow pressure gap;
   model rate=power|flow|pressure|gap@2;
run;
The '@2' notation in the MODEL statement includes all main effects and two-factor interactions between the factors. The output is shown in Output 32.11.1.
The GLM Procedure Class Level Information Class Levels Values power 2 0.8 1.2 flow 2 4.5 550 pressure 2 125 200 gap 2 275 325 Number of Observations Read 8 Number of Observations Used 8 The GLM Procedure Dependent Variable: rate Sum of Source DF Squares Mean Square F Value Pr > F Model 7 280848.0000 40121.1429 . . Error 0 0.0000 . Corrected Total 7 280848.0000 R-Square Coeff Var Root MSE rate Mean 1.000000 . . 756.0000 Source DF Type I SS Mean Square F Value Pr > F power 1 168780.5000 168780.5000 . . flow 1 264.5000 264.5000 . . power*flow 1 200.0000 200.0000 . . pressure 1 32.0000 32.0000 . . power*pressure 1 1300.5000 1300.5000 . . flow*pressure 1 78012.5000 78012.5000 . . gap 1 32258.0000 32258.0000 . . power*gap 0 0.0000 . . . flow*gap 0 0.0000 . . . pressure*gap 0 0.0000 . . . Source DF Type III SS Mean Square F Value Pr > F power 1 168780.5000 168780.5000 . . flow 1 264.5000 264.5000 . . power*flow 0 0.0000 . . . pressure 1 32.0000 32.0000 . . power*pressure 0 0.0000 . . . flow*pressure 0 0.0000 . . . gap 1 32258.0000 32258.0000 . . power*gap 0 0.0000 . . . flow*gap 0 0.0000 . . . pressure*gap 0 0.0000 . . .
Notice that there are no error degrees of freedom. This is because there are 10 effects in the model (4 main effects plus 6 interactions) but only 8 observations in the data set. This is another cost of using a fractional design: not only is it impossible to estimate all the main effects and interactions, but there is also no information left to estimate the underlying error rate in order to measure the significance of the effects that are estimable.
In order to analyze this confounding, you should examine the aliasing structure of the design using the ALIASING option in the MODEL statement. Before doing so, however, it is advisable to code the design, replacing low and high levels of each factor with the values -1 and +1, respectively. This puts each factor on an equal footing in the model and makes the aliasing structure much more interpretable. The following statements code the data, creating a new data set named Coded.
data Coded; set HalfFraction;
   power    = -1*(power   =0.80) + 1*(power   =1.20);
   flow     = -1*(flow    =4.50) + 1*(flow    =550 );
   pressure = -1*(pressure=125 ) + 1*(pressure=200 );
   gap      = -1*(gap     =275 ) + 1*(gap     =325 );
run;
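The coding step can be restated as a small sketch outside SAS. The Python function below is a hypothetical helper written only for illustration; it maps the low level of a factor to -1 and the high level to +1 by using comparisons that evaluate to 1 or 0. The low and high settings are taken from the class level information, and the run being coded is an illustrative one.

```python
def code_level(value, low, high):
    """Return -1 for the low level and +1 for the high level of a factor."""
    # Each comparison evaluates to 1 (true) or 0 (false), so exactly one
    # of the two terms contributes for a value at the low or high level.
    return -1 * (value == low) + 1 * (value == high)

# Low and high settings of each factor, from the class level information.
levels = {
    "power": (0.8, 1.2),
    "flow": (4.5, 550),
    "pressure": (125, 200),
    "gap": (275, 325),
}

# An illustrative run at high power, low flow, high pressure, and high gap.
run = {"power": 1.2, "flow": 4.5, "pressure": 200, "gap": 325}
coded = {name: code_level(run[name], *levels[name]) for name in levels}
print(coded)
```

A value that matches neither level would be coded as 0 by this expression, so the approach is suitable only for two-level factors.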
The following statements use the GLM procedure to reanalyze the coded design, displaying the parameter estimates as well as the functions of the parameters that they each estimate.
proc glm data=Coded;
   model rate=power|flow|pressure|gap@2 / solution aliasing;
run;
The parameter estimates table is shown in Output 32.11.2.
The GLM Procedure

Dependent Variable: rate

                                  Standard
Parameter            Estimate        Error   t Value   Pr > |t|   Expected Value
Intercept         756.0000000            .         .          .   Intercept
power             145.2500000            .         .          .   power
flow                5.7500000            .         .          .   flow
power*flow          5.0000000 B          .         .          .   power*flow + pressure*gap
pressure            2.0000000            .         .          .   pressure
power*pressure     12.7500000 B          .         .          .   power*pressure + flow*gap
flow*pressure     -98.7500000 B          .         .          .   flow*pressure + power*gap
gap                63.5000000            .         .          .   gap
power*gap           0.0000000 B          .         .          .
flow*gap            0.0000000 B          .         .          .
pressure*gap        0.0000000 B          .         .          .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
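As a cross-check on this table, the estimates can be related to the sums of squares reported for the same eight runs. For a column coded as -1 and +1 that is orthogonal to the other estimable columns, the sum of squares equals the number of runs times the squared estimate. The Python sketch below is only an illustration of that arithmetic; the dictionary names are hypothetical, and the numbers are copied from the two tables.

```python
# Magnitudes of the estimates for the coded factors (8 runs). The sign of
# an estimate does not matter here because each estimate is squared.
estimates = {
    "power": 145.25,
    "flow": 5.75,
    "power*flow": 5.0,
    "pressure": 2.0,
    "power*pressure": 12.75,
    "flow*pressure": 98.75,
    "gap": 63.5,
}

# Type I sums of squares from the analysis of variance of the same runs.
type1_ss = {
    "power": 168780.5,
    "flow": 264.5,
    "power*flow": 200.0,
    "pressure": 32.0,
    "power*pressure": 1300.5,
    "flow*pressure": 78012.5,
    "gap": 32258.0,
}

n_runs = 8
computed_ss = {term: n_runs * b ** 2 for term, b in estimates.items()}
for term, ss in computed_ss.items():
    print(f"{term:15s} {ss:12.4f} {type1_ss[term]:12.4f}")

# The seven sums of squares add up to the model sum of squares.
print("total:", sum(computed_ss.values()))
```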
Looking at the Expected Value column, notice that, while each of the main effects is unambiguously estimated by its associated term in the model, the expected values of the interaction estimates are more complicated. For example, the relatively large effect (-98.75) corresponding to flow*pressure actually estimates the combined effect of flow*pressure and power*gap. Without further information, it is impossible to disentangle these aliased interactions; however, since the main effects of both power and gap are large and those for flow and pressure are small, it is reasonable to suspect that power*gap is the more active of the two interactions.
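The pairing of aliased interactions follows a simple rule that can be sketched in a few lines. The Python code below is only an illustration and is not part of the SAS analysis. It assumes that the design has the defining relation I = power*flow*pressure*gap, which is inferred from the Expected Value column rather than stated in the output. Multiplying an effect by the defining word and canceling squared factors gives its alias, which is the same as taking a symmetric difference of sets of factor names.

```python
from itertools import combinations

factors = ("power", "flow", "pressure", "gap")

# Assumed defining relation: I = power*flow*pressure*gap.
defining_word = frozenset(factors)

def alias(effect):
    """Multiply an effect by the defining word; squared factors cancel."""
    return frozenset(effect) ^ defining_word

def label(effect):
    """Write an effect in the order the factors appear in the model."""
    return "*".join(f for f in factors if f in effect)

# Alias of every two-factor interaction in the model.
pairs = {label(pair): label(alias(pair)) for pair in combinations(factors, 2)}
for term, partner in pairs.items():
    print(f"{term} is aliased with {partner}")

# Main effects are aliased only with three-factor interactions, which are
# not in the model, so their estimates are not confounded within this model.
print("power is aliased with", label(alias(("power",))))
```

Under this assumption the sketch reproduces the three pairs shown in the table: power*flow with pressure*gap, power*pressure with flow*gap, and flow*pressure with power*gap.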
Fortunately, eight more runs are available for this experiment (the other half fraction). The following statements create a data set containing these extra runs and add it to the previous eight, resulting in a full 2^4 = 16 run replicate. Then PROC GLM displays the analysis of variance again.
data OtherHalf;
   input power flow pressure gap rate;
   datalines;
0.8   4.5  125  325   669
0.8   4.5  200  275   604
0.8 550.0  125  275   633
0.8 550.0  200  325   635
1.2   4.5  125  275  1037
1.2   4.5  200  325   868
1.2 550.0  125  325   860
1.2 550.0  200  275  1063
;

data FullRep;
   set HalfFraction OtherHalf;
run;

proc glm data=FullRep;
   class power flow pressure gap;
   model rate=power|flow|pressure|gap@2;
run;
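It can be useful to confirm that the extra runs really are a half fraction of the 2^4 design. The Python sketch below is only an illustration; it codes the eight additional runs as -1 and +1 and checks the product of the coded levels in each run. The variable names are hypothetical, and the data are copied from the statements above.

```python
# The eight additional runs: (power, flow, pressure, gap, rate).
other_half = [
    (0.8, 4.5, 125, 325, 669),
    (0.8, 4.5, 200, 275, 604),
    (0.8, 550.0, 125, 275, 633),
    (0.8, 550.0, 200, 325, 635),
    (1.2, 4.5, 125, 275, 1037),
    (1.2, 4.5, 200, 325, 868),
    (1.2, 550.0, 125, 325, 860),
    (1.2, 550.0, 200, 275, 1063),
]

# Low level of each factor; any other value is treated as the high level.
low = (0.8, 4.5, 125, 275)

coded = [
    tuple(-1 if x == lo else 1 for x, lo in zip(run[:4], low))
    for run in other_half
]
products = [a * b * c * d for a, b, c, d in coded]

print("product of coded levels in each run:", products)
print("distinct factor combinations:", len(set(coded)))
```

Every product is -1 and the eight combinations are distinct, so these runs make up the half fraction with product -1. Combined with eight runs from the complementary fraction, they give all 16 factor combinations; that the first eight runs form the complementary fraction is an inference that this sketch does not check.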
The results are displayed in Output 32.11.3.
The GLM Procedure

Class Level Information

Class       Levels   Values
power            2   0.8 1.2
flow             2   4.5 550
pressure         2   125 200
gap              2   275 325

Number of Observations Read   16
Number of Observations Used   16

The GLM Procedure

Dependent Variable: rate

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model              10   521234.1250     52123.4125     25.58   0.0011
Error               5    10186.8125      2037.3625
Corrected Total    15   531420.9375

R-Square   Coeff Var   Root MSE   rate Mean
0.980831    5.816175   45.13715    776.0625

Source             DF     Type I SS    Mean Square   F Value   Pr > F
power               1   374850.0625    374850.0625    183.99   <.0001
flow                1      217.5625       217.5625      0.11   0.7571
power*flow          1       18.0625        18.0625      0.01   0.9286
pressure            1       10.5625        10.5625      0.01   0.9454
power*pressure      1        1.5625         1.5625      0.00   0.9790
flow*pressure       1     7700.0625      7700.0625      3.78   0.1095
gap                 1    41310.5625     41310.5625     20.28   0.0064
power*gap           1    94402.5625     94402.5625     46.34   0.0010
flow*gap            1     2475.0625      2475.0625      1.21   0.3206
pressure*gap        1      248.0625       248.0625      0.12   0.7414

Source             DF   Type III SS    Mean Square   F Value   Pr > F
power               1   374850.0625    374850.0625    183.99   <.0001
flow                1      217.5625       217.5625      0.11   0.7571
power*flow          1       18.0625        18.0625      0.01   0.9286
pressure            1       10.5625        10.5625      0.01   0.9454
power*pressure      1        1.5625         1.5625      0.00   0.9790
flow*pressure       1     7700.0625      7700.0625      3.78   0.1095
gap                 1    41310.5625     41310.5625     20.28   0.0064
power*gap           1    94402.5625     94402.5625     46.34   0.0010
flow*gap            1     2475.0625      2475.0625      1.21   0.3206
pressure*gap        1      248.0625       248.0625      0.12   0.7414
With sixteen runs, the analysis of variance tells the whole story: all effects are estimable and there are five degrees of freedom left over to estimate the underlying error. The main effects of power and gap and their interaction are all significant, and no other effects are. Notice that the Type I and Type III ANOVA tables are the same; this is because the design is orthogonal and all effects are estimable.
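The summary statistics in the sixteen-run analysis follow directly from the sums of squares, which the sketch below recomputes. The Python code is only an illustration of the arithmetic and not a replacement for PROC GLM; the variable names are hypothetical, and the inputs are copied from the output.

```python
import math

# Values copied from the analysis of variance for the sixteen runs.
model_ss, model_df = 521234.1250, 10
error_ss, error_df = 10186.8125, 5
total_ss = 531420.9375
rate_mean = 776.0625

mse = error_ss / error_df                  # error mean square
model_f = (model_ss / model_df) / mse      # overall F statistic
r_square = model_ss / total_ss             # proportion of variation explained
root_mse = math.sqrt(mse)                  # estimate of the error standard deviation
coeff_var = 100 * root_mse / rate_mean     # coefficient of variation, in percent

# Each effect has 1 degree of freedom, so its F value is SS / MSE.
power_f = 374850.0625 / mse
gap_f = 41310.5625 / mse
power_gap_f = 94402.5625 / mse

print(f"MSE = {mse:.4f}, model F = {model_f:.2f}")
print(f"R-Square = {r_square:.6f}, Root MSE = {root_mse:.5f}, Coeff Var = {coeff_var:.6f}")
print(f"F: power = {power_f:.2f}, gap = {gap_f:.2f}, power*gap = {power_gap_f:.2f}")
```

The p-values are not recomputed here because that requires the F distribution, which is not in the Python standard library.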
This example illustrates the use of the GLM procedure for the model analysis of a screening experiment. Typically, there is much more involved in performing an experiment of this type, from selecting the design points to be studied to graphically assessing significant effects, optimizing the final model, and performing subsequent experimentation. Specialized tools for this are available in SAS/QC software, in particular the ADX Interface and the FACTEX and OPTEX procedures. Refer to SAS/QC User's Guide for more information.