The following example, taken from Huntsberger and Billingsley (1989), compares two grazing methods using 32 steers. Half of the steers are allowed to graze continuously, while the other half are subjected to controlled grazing time. The researchers want to know whether these two grazing methods affect weight gain differently. The data are read by the following DATA step.
title 'Group Comparison Using Input Data Set of Summary Statistics';
data graze;
   length GrazeType $ 10;
   input GrazeType $ WtGain @@;
   datalines;
controlled  45   controlled  62   controlled  96   controlled 128
controlled 120   controlled  99   controlled  28   controlled  50
controlled 109   controlled 115   controlled  39   controlled  96
controlled  87   controlled 100   controlled  76   controlled  80
continuous  94   continuous  12   continuous  26   continuous  89
continuous  88   continuous  96   continuous  85   continuous 130
continuous  75   continuous  54   continuous 112   continuous  69
continuous 104   continuous  95   continuous  53   continuous  21
;
run;
The variable GrazeType denotes the grazing method: 'controlled' is controlled grazing and 'continuous' is continuous grazing. The dollar sign ($) following GrazeType makes it a character variable, and the trailing at signs (@@) tell the DATA step that there is more than one observation per input line. The MEANS procedure is invoked to create a data set of summary statistics with the following statements:
proc sort data=graze;
   by GrazeType;
run;

proc means data=graze noprint;
   var WtGain;
   by GrazeType;
   output out=newgraze;
run;
The NOPRINT option suppresses all printed output from the MEANS procedure. The VAR statement tells PROC MEANS to compute summary statistics for the WtGain variable, and the BY statement requests a separate set of summary statistics for each level of GrazeType. The OUT= option in the OUTPUT statement tells PROC MEANS to put the summary statistics into a data set called newgraze so that it can be used in subsequent procedures. This new data set is displayed in Output 77.1.1 by using PROC PRINT as follows:
proc print data=newgraze;
run;

Group Comparison Using Input Data Set of Summary Statistics

Obs    GrazeType     _TYPE_    _FREQ_    _STAT_     WtGain
  1    continuous       0        16      N          16.000
  2    continuous       0        16      MIN        12.000
  3    continuous       0        16      MAX       130.000
  4    continuous       0        16      MEAN       75.188
  5    continuous       0        16      STD        33.812
  6    controlled       0        16      N          16.000
  7    controlled       0        16      MIN        28.000
  8    controlled       0        16      MAX       128.000
  9    controlled       0        16      MEAN       83.125
 10    controlled       0        16      STD        30.535
The _STAT_ variable contains the names of the statistics, and the GrazeType variable indicates which group the statistic is from.
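When only published summary statistics are available, a data set of the same shape can be typed in directly instead of being produced by PROC MEANS. The following is a minimal sketch, not part of the original example. The data set name graze_summary is hypothetical, and the sketch assumes that PROC TTEST needs only the N, MEAN, and STD rows of _STAT_. The values are the rounded ones from Output 77.1.1, so downstream results may differ slightly in the last digit.

```sas
/* Hypothetical hand-entered summary data set (name graze_summary is illustrative). */
/* Assumption: only the N, MEAN, and STD rows are required by PROC TTEST.           */
data graze_summary;
   length GrazeType $ 10 _STAT_ $ 8;
   input GrazeType $ _STAT_ $ WtGain;
   datalines;
continuous N    16
continuous MEAN 75.188
continuous STD  33.812
controlled N    16
controlled MEAN 83.125
controlled STD  30.535
;
run;
```

Under that assumption, such a data set could be analyzed with the same CLASS and VAR statements that are used with newgraze.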
The following code invokes PROC TTEST using the newgraze data set, as denoted by the DATA= option.
proc ttest data=newgraze;
   class GrazeType;
   var WtGain;
run;
The CLASS statement contains the variable that distinguishes between the groups being compared, in this case GrazeType. The summary statistics and confidence intervals are displayed first, as shown in Output 77.1.2.
The TTEST Procedure

Statistics

                                  Lower CL               Upper CL    Lower CL
Variable    GrazeType       N       Mean        Mean       Mean      Std Dev     Std Dev
WtGain      continuous     16      57.171     75.188      93.204        .         33.812
WtGain      controlled     16      66.854     83.125      99.396        .         30.535
WtGain      Diff (1-2)              -31.2     -7.938      15.323      25.743      32.215

Statistics

                          Upper CL
Variable    GrazeType     Std Dev     Std Err    Minimum    Maximum
WtGain      continuous       .         8.4529       12        130
WtGain      controlled       .         7.6337       28        128
WtGain      Diff (1-2)     43.061      11.39
In Output 77.1.2, the Variable column states the variable used in computations and the GrazeType column specifies the group for which the statistics are computed. For each group, the sample size, mean, standard deviation, standard error, and minimum and maximum values are displayed. The confidence limits for the mean are also displayed; however, because summary statistics are used as input, the confidence limits for the standard deviation of each group are not calculated and appear as missing values (.).
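For comparison, the same analysis can be run on the raw graze data set instead of newgraze. This sketch uses only statements that already appear in this example. With observation-level input, PROC TTEST should be able to report the standard deviation confidence limits that are missing in Output 77.1.2.

```sas
/* Same CLASS and VAR statements as in the summary-statistics analysis, */
/* but reading the raw observations from the graze data set.            */
proc ttest data=graze;
   class GrazeType;
   var WtGain;
run;
```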
Output 77.1.3 shows the results of tests for equal group means and equal variances. A test statistic for the equality of means is reported for both the equal-variance and the unequal-variance case. Before deciding which test is appropriate, look at the test for equality of variances; this test does not indicate a significant difference between the two variances (F' = 1.23, p = 0.6981), so the pooled t statistic should be used. Based on the pooled statistic, the two grazing methods are not significantly different (t = -0.70, p = 0.4912). Note that this test assumes that the observations in both groups are normally distributed; this assumption can be checked in PROC UNIVARIATE using the raw data.
T-Tests

Variable    Method           Variances      DF    t Value    Pr > |t|
WtGain      Pooled           Equal          30      -0.70      0.4912
WtGain      Satterthwaite    Unequal      29.7      -0.70      0.4913

Equality of Variances

Variable    Method      Num DF    Den DF    F Value    Pr > F
WtGain      Folded F        15        15       1.23    0.6981
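The pooled t statistic and the folded F statistic can be checked by hand from the group summary statistics. The following DATA step is an illustrative sketch, not part of the original example. All variable names are made up for the illustration, and the inputs are the rounded values from Output 77.1.1, so the results should agree with the output only up to rounding.

```sas
/* Recompute the pooled t test and folded F test from summary statistics. */
/* Variable names are illustrative; inputs are rounded values.            */
data _null_;
   n1 = 16;  mean1 = 75.188;  std1 = 33.812;   /* continuous */
   n2 = 16;  mean2 = 83.125;  std2 = 30.535;   /* controlled */

   /* Pooled standard deviation and standard error of the difference */
   df = n1 + n2 - 2;
   sp = sqrt(((n1 - 1)*std1**2 + (n2 - 1)*std2**2) / df);
   se = sp * sqrt(1/n1 + 1/n2);

   /* Pooled t statistic for mean1 - mean2 and its two-sided p-value */
   t   = (mean1 - mean2) / se;
   p_t = 2 * (1 - probt(abs(t), df));

   /* Folded F: larger variance over smaller variance */
   if std1 >= std2 then do;
      f = (std1 / std2)**2;  ndf = n1 - 1;  ddf = n2 - 1;
   end;
   else do;
      f = (std2 / std1)**2;  ndf = n2 - 1;  ddf = n1 - 1;
   end;
   p_f = min(1, 2 * (1 - probf(f, ndf, ddf)));

   /* Write the results to the SAS log */
   put sp= se= t= p_t= f= p_f=;
run;
```

The sign of t depends on the order of subtraction; here the continuous group comes first, matching the Diff (1-2) row.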
This example examines children's reading skills. The data consist of Degree of Reading Power (DRP) test scores from 44 third-grade children and are taken from Moore (1995, p. 337). The scores are given in the following DATA step.
title 'One-Mean Comparison Using FREQ Statement';
data read;
   input score count @@;
   datalines;
40 2   47 2   52 2   26 1   19 2   25 2
35 4   39 1   26 1   48 1   14 2   22 1
42 1   34 2   33 2   18 1   15 1   29 1
41 2   44 1   51 1   43 1   27 2   46 2
28 1   49 1   31 1   28 1   54 1   45 1
;
run;
The following statements invoke the TTEST procedure to test if the mean test score is equal to 30. The count variable contains the frequency of occurrence of each test score; this is specified in the FREQ statement.
proc ttest data=read h0=30;
   var score;
   freq count;
run;
The output, shown in Output 77.2.1, contains the results.
One-Mean Comparison Using FREQ Statement

The TTEST Procedure

Statistics

                   Lower CL              Upper CL    Lower CL               Upper CL
Variable     N       Mean       Mean       Mean      Std Dev     Std Dev    Std Dev     Std Err    Minimum    Maximum
score       44     31.449     34.864     38.278      9.2788       11.23     14.229       1.693        14         54

T-Tests

Variable    DF    t Value    Pr > |t|
score       43       2.87      0.0063
The SAS log states that 30 observations and two variables have been read. However, the sample size given in the TTEST output is N=44. This is because the count variable is specified in the FREQ statement, so each observation is counted as many times as its count value indicates. The test is significant at the 5% level (t = 2.87, p = 0.0063), so you can conclude that the mean test score is different from 30.
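One way to see what the FREQ statement does is to expand the data so that each score appears once per child and then run the test without a FREQ statement. The following sketch is illustrative only; the data set name read_long and the loop index i are hypothetical names. The expanded data set has 44 observations, and the analysis should correspond to the one shown in Output 77.2.1.

```sas
/* Expand each (score, count) record into count separate observations. */
/* read_long and i are illustrative names.                             */
data read_long;
   set read;
   do i = 1 to count;
      output;
   end;
   keep score;
run;

/* Same one-sample test, now without a FREQ statement */
proc ttest data=read_long h0=30;
   var score;
run;
```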
When it is not feasible to assume that two groups of data are independent, and a natural pairing of the data exists, it is advantageous to use an analysis that takes the correlation into account. Utilizing this correlation results in higher power to detect existing differences between the means. The differences between paired observations are assumed to be normally distributed. Some examples of this natural pairing are
pre- and post-test scores for a student receiving tutoring
fuel efficiency readings of two fuel types observed on the same automobile
sunburn scores for two sunblock lotions, one applied to the individual's right arm, one to the left arm
political attitude scores of husbands and wives
In this example, taken from SUGI Supplemental Library User's Guide, Version 5 Edition, a stimulus is being examined to determine its effect on systolic blood pressure. Twelve men participate in the study. Their systolic blood pressure is measured both before and after the stimulus is applied. The following statements input the data:
title 'Paired Comparison';
data pressure;
   input SBPbefore SBPafter @@;
   datalines;
120 128   124 131   130 131   118 127
140 132   128 125   140 141   135 137
126 118   130 132   126 129   127 135
;
run;
The variables SBPbefore and SBPafter denote the systolic blood pressure before and after the stimulus, respectively.
The statements to perform the test follow.
proc ttest data=pressure;
   paired SBPbefore*SBPafter;
run;
The PAIRED statement is used to test whether the mean change in systolic blood pressure is significantly different from zero. The output is displayed in Output 77.3.1.
Paired Comparison

The TTEST Procedure

Statistics

                                  Lower CL              Upper CL    Lower CL               Upper CL
Difference                  N       Mean       Mean       Mean      Std Dev     Std Dev    Std Dev     Std Err    Minimum    Maximum
SBPbefore - SBPafter       12     -5.536     -1.833     1.8698      4.1288      5.8284     9.8958      1.6825        -9          8

T-Tests

Difference                 DF    t Value    Pr > |t|
SBPbefore - SBPafter       11      -1.09      0.2992
The variables SBPbefore and SBPafter are the paired variables with a sample size of 12. The summary statistics of the difference are displayed (mean, standard deviation, and standard error) along with their confidence limits. The minimum and maximum differences are also displayed. The t test is not significant (t = -1.09, p = 0.2992), indicating that the stimulus did not significantly affect systolic blood pressure.
Note that this test of hypothesis assumes that the differences are normally distributed. This assumption can be investigated using PROC UNIVARIATE with the NORMAL option. If the assumption is not satisfied, PROC NPAR1WAY should be used.
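The normality check described above can be sketched as follows. The data set name pressure_diff and the variable name Diff are hypothetical and are introduced only for this illustration. The DATA step computes the same before-minus-after difference that the PAIRED statement analyzes.

```sas
/* Compute the paired differences (pressure_diff and Diff are illustrative names) */
data pressure_diff;
   set pressure;
   Diff = SBPbefore - SBPafter;
run;

/* The NORMAL option requests tests for normality of the differences */
proc univariate data=pressure_diff normal;
   var Diff;
run;
```

With only 12 pairs, formal normality tests have limited power, so it is reasonable to also inspect the distribution of Diff informally.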