Examples | SAS/STAT 9.1 User's Guide, Volumes 1-7

Example 77.1. Comparing Group Means Using Input Data Set of Summary Statistics

The following example, taken from Huntsberger and Billingsley (1989), compares two grazing methods using 32 steers. Half of the steers are allowed to graze continuously, while the other half are subjected to controlled grazing time. The researchers want to know whether these two grazing methods affect weight gain differently. The data are read by the following DATA step.

  title 'Group Comparison Using Input Data Set of Summary Statistics';
  data graze;
     length GrazeType $ 10;
     input GrazeType $ WtGain @@;
     datalines;
  controlled  45   controlled  62
  controlled  96   controlled 128
  controlled 120   controlled  99
  controlled  28   controlled  50
  controlled 109   controlled 115
  controlled  39   controlled  96
  controlled  87   controlled 100
  controlled  76   controlled  80
  continuous  94   continuous  12
  continuous  26   continuous  89
  continuous  88   continuous  96
  continuous  85   continuous 130
  continuous  75   continuous  54
  continuous 112   continuous  69
  continuous 104   continuous  95
  continuous  53   continuous  21
  ;
  run;

The variable GrazeType denotes the grazing method: 'controlled' is controlled grazing and 'continuous' is continuous grazing. The dollar sign ($) following GrazeType makes it a character variable, and the trailing at signs (@@) tell the DATA step that there is more than one observation per input line. The MEANS procedure is invoked to create a data set of summary statistics with the following statements:

  proc sort;
     by GrazeType;
  proc means data=graze noprint;
     var WtGain;
     by GrazeType;
     output out=newgraze;
  run;

The NOPRINT option suppresses all printed output from the MEANS procedure. The VAR statement tells PROC MEANS to compute summary statistics for the WtGain variable, and the BY statement requests a separate set of summary statistics for each level of GrazeType. The OUT= option in the OUTPUT statement tells PROC MEANS to put the summary statistics into a data set called newgraze so that it can be used in subsequent procedures. This new data set is displayed in Output 77.1.1 by using PROC PRINT as follows:

  proc print data=newgraze;
  run;

Output 77.1.1: Output Data Set of Summary Statistics

  Group Comparison Using Input Data Set of Summary Statistics

  Obs    GrazeType     _TYPE_    _FREQ_    _STAT_     WtGain
    1    continuous       0        16       N         16.000
    2    continuous       0        16       MIN       12.000
    3    continuous       0        16       MAX      130.000
    4    continuous       0        16       MEAN      75.188
    5    continuous       0        16       STD       33.812
    6    controlled       0        16       N         16.000
    7    controlled       0        16       MIN       28.000
    8    controlled       0        16       MAX      128.000
    9    controlled       0        16       MEAN      83.125
   10    controlled       0        16       STD       30.535

The _STAT_ variable contains the names of the statistics, and the GrazeType variable indicates which group the statistic is from.
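When only published summary statistics are available, a data set with the same layout can be typed in directly rather than produced by PROC MEANS. The following is a minimal sketch, not part of the original example: the data set name newgraze_manual is hypothetical, and the sketch assumes that PROC TTEST needs only the N, MEAN, and STD rows for each group (the MIN and MAX rows are assumed to be optional).

```sas
/* Hypothetical data set holding summary statistics entered by hand.   */
/* Assumption: PROC TTEST requires only N, MEAN, and STD per group.    */
data newgraze_manual;
   length GrazeType $ 10 _STAT_ $ 8;
   input GrazeType $ _STAT_ $ WtGain;
   datalines;
continuous N     16
continuous MEAN  75.188
continuous STD   33.812
controlled N     16
controlled MEAN  83.125
controlled STD   30.535
;
run;

/* The character variable _STAT_ marks the input as summary statistics. */
proc ttest data=newgraze_manual;
   class GrazeType;
   var WtGain;
run;
```

Because the minimum and maximum are not supplied in this sketch, those columns would be expected to appear as missing in the resulting output.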

The following code invokes PROC TTEST using the newgraze data set, as denoted by the DATA= option.

  proc ttest data=newgraze;
     class GrazeType;
     var WtGain;
  run;

The CLASS statement contains the variable that distinguishes between the groups being compared, in this case GrazeType. The summary statistics and confidence intervals are displayed first, as shown in Output 77.1.2.

Output 77.1.2: Summary Statistics

  The TTEST Procedure

  Statistics

                                 Lower CL             Upper CL   Lower CL
  Variable   GrazeType       N       Mean      Mean       Mean    Std Dev   Std Dev
  WtGain     continuous     16     57.171    75.188     93.204          .    33.812
  WtGain     controlled     16     66.854    83.125     99.396          .    30.535
  WtGain     Diff (1-2)             -31.2    -7.938     15.323     25.743    32.215

  Statistics

                           Upper CL
  Variable   GrazeType      Std Dev    Std Err    Minimum    Maximum
  WtGain     continuous           .     8.4529         12        130
  WtGain     controlled           .     7.6337         28        128
  WtGain     Diff (1-2)      43.061      11.39

In Output 77.1.2, the Variable column states the variable used in computations, and the GrazeType column (the CLASS variable) specifies the group for which the statistics are computed. For each group, the sample size, mean, standard deviation, standard error, and minimum and maximum values are displayed. The confidence bounds for the mean are also displayed; however, since summary statistics are used as input, the confidence bounds for the standard deviation of the groups are not calculated.
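As a check, the confidence bounds for the mean of the continuous group can be reproduced by hand from the summary statistics. The following worked equation is a sketch that assumes the usual 95% t-based interval; the quantile 2.1314 for 15 degrees of freedom is taken from a standard t table and does not appear in the output.

```latex
\begin{align*}
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
  &= 75.188 \pm 2.1314 \times \frac{33.812}{\sqrt{16}} \\
  &= 75.188 \pm 2.1314 \times 8.4529 \\
  &\approx 75.188 \pm 18.017
\end{align*}
```

This gives approximately (57.171, 93.205), which agrees with the Lower CL Mean and Upper CL Mean columns up to rounding of the displayed mean. The intermediate value 8.4529 is the standard error reported in the Std Err column.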

Output 77.1.3 shows the results of tests for equal group means and equal variances. A test statistic for the equality of means is reported for both equal and unequal variances. Before deciding which test is appropriate, you should look at the test for equality of variances; this test does not indicate a significant difference in the two variances (F′ = 1.23, p = 0.6981), so the pooled t statistic should be used. Based on the pooled statistic, the two grazing methods are not significantly different (t = −0.70, p = 0.4912). Note that this test assumes that the observations in both groups are normally distributed; this assumption can be checked in PROC UNIVARIATE using the raw data.
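The pooled t statistic and the folded F statistic quoted above can be recovered from the group summary statistics. The following is a sketch under the standard textbook definitions, with group 1 taken to be continuous and group 2 controlled, matching the Diff (1-2) row of Output 77.1.2.

```latex
\begin{align*}
s_p &= \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}
     = \sqrt{\frac{15(33.812)^2 + 15(30.535)^2}{30}} \approx 32.215 \\
t   &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}
     = \frac{75.188 - 83.125}{32.215\sqrt{\dfrac{1}{16}+\dfrac{1}{16}}}
     \approx \frac{-7.937}{11.39} \approx -0.70,
     \qquad \mathrm{df} = n_1 + n_2 - 2 = 30 \\
F'  &= \frac{\max(s_1^2,\, s_2^2)}{\min(s_1^2,\, s_2^2)}
     = \frac{(33.812)^2}{(30.535)^2} \approx 1.23
\end{align*}
```

The values 32.215 and 11.39 match the Std Dev and Std Err entries for the Diff (1-2) row.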

Output 77.1.3: t Tests

  T-Tests

  Variable    Method           Variances      DF    t Value    Pr > |t|
  WtGain      Pooled           Equal          30      -0.70      0.4912
  WtGain      Satterthwaite    Unequal      29.7      -0.70      0.4913

  Equality of Variances

  Variable    Method      Num DF    Den DF    F Value    Pr > F
  WtGain      Folded F        15        15       1.23    0.6981
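The normality assumption noted above can be examined on the raw data. The following is a minimal sketch rather than part of the original example; it assumes the graze data set created at the start of this example is still available and uses the CLASS statement of PROC UNIVARIATE to obtain separate results for each grazing method.

```sas
/* Normality tests and descriptive statistics for each grazing method. */
/* Assumes the raw data set graze from the first DATA step exists.     */
proc univariate data=graze normal;
   class GrazeType;
   var WtGain;
run;
```

The NORMAL option requests tests for normality; these can be read alongside the descriptive statistics for each group before relying on the t test.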

Example 77.2. One-Sample Comparison Using the FREQ Statement

This example examines children's reading skills. The data consist of Degree of Reading Power (DRP) test scores from 44 third-grade children and are taken from Moore (1995, p. 337). Their scores are given in the following DATA step.

  title 'One-Mean Comparison Using FREQ Statement';
  data read;
     input score count @@;
     datalines;
  40 2   47 2   52 2   26 1   19 2
  25 2   35 4   39 1   26 1   48 1
  14 2   22 1   42 1   34 2   33 2
  18 1   15 1   29 1   41 2   44 1
  51 1   43 1   27 2   46 2   28 1
  49 1   31 1   28 1   54 1   45 1
  ;
  run;

The following statements invoke the TTEST procedure to test if the mean test score is equal to 30. The count variable contains the frequency of occurrence of each test score; this is specified in the FREQ statement.

  proc ttest data=read h0=30;
     var score;
     freq count;
  run;

The output, shown in Output 77.2.1, contains the results.

Output 77.2.1: TTEST Results

  One-Mean Comparison Using FREQ Statement

  The TTEST Procedure

  Statistics

                     Lower CL             Upper CL   Lower CL             Upper CL
  Variable       N       Mean      Mean       Mean    Std Dev   Std Dev    Std Dev   Std Err   Minimum   Maximum
  score         44     31.449    34.864     38.278     9.2788     11.23     14.229     1.693        14        54

  T-Tests

  Variable      DF    t Value    Pr > |t|
  score         43       2.87      0.0063

The SAS log states that 30 observations and two variables have been read. However, the sample size given in the TTEST output is N=44. This is due to specifying the count variable in the FREQ statement, so that each score is counted as many times as its frequency (the counts sum to 44). The test is significant (t = 2.87, p = 0.0063) at the 5% level, so you can conclude that the mean test score is different from 30.
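One way to see what the FREQ statement does is to expand the data so that each score appears once per occurrence and then run the test without a FREQ statement. The following is a sketch for illustration only; the data set name read_expanded and the loop index i are hypothetical names that are not part of the original example.

```sas
/* Write each score to the output data set count times. */
data read_expanded;            /* hypothetical data set name */
   set read;
   do i = 1 to count;
      output;
   end;
   drop i count;
run;

/* One-sample t test on the expanded data; no FREQ statement is needed. */
proc ttest data=read_expanded h0=30;
   var score;
run;
```

The expanded data set contains 44 observations, so this analysis should reproduce the results in Output 77.2.1. The FREQ statement avoids the extra DATA step and keeps the input compact.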

Example 77.3. Paired Comparisons

When it is not feasible to assume that two groups of data are independent, and a natural pairing of the data exists, it is advantageous to use an analysis that takes the correlation into account. Utilizing this correlation results in higher power to detect existing differences between the means. The differences between paired observations are assumed to be normally distributed. Some examples of this natural pairing are

pre- and post-test scores for a student receiving tutoring
fuel efficiency readings of two fuel types observed on the same automobile
sunburn scores for two sunblock lotions, one applied to the individual's right arm, one to the left arm
political attitude scores of husbands and wives

In this example, taken from SUGI Supplemental Library User's Guide, Version 5 Edition, a stimulus is being examined to determine its effect on systolic blood pressure. Twelve men participate in the study. Their systolic blood pressure is measured both before and after the stimulus is applied. The following statements input the data:

  title 'Paired Comparison';
  data pressure;
     input SBPbefore SBPafter @@;
     datalines;
  120 128   124 131   130 131   118 127
  140 132   128 125   140 141   135 137
  126 118   130 132   126 129   127 135
  ;
  run;

The variables SBPbefore and SBPafter denote the systolic blood pressure before and after the stimulus, respectively.

The statements to perform the test follow.

  proc ttest;
     paired SBPbefore*SBPafter;
  run;

The PAIRED statement is used to test whether the mean change in systolic blood pressure is significantly different from zero. The output is displayed in Output 77.3.1.

Output 77.3.1: TTEST Results

  Paired Comparison

  The TTEST Procedure

  Statistics

                                  Lower CL             Upper CL   Lower CL             Upper CL
  Difference                  N       Mean      Mean       Mean    Std Dev   Std Dev    Std Dev   Std Err   Minimum   Maximum
  SBPbefore - SBPafter       12     -5.536    -1.833     1.8698     4.1288    5.8284     9.8958    1.6825        -9         8

  T-Tests

  Difference                 DF    t Value    Pr > |t|
  SBPbefore - SBPafter       11      -1.09      0.2992

The variables SBPbefore and SBPafter are the paired variables with a sample size of 12. The summary statistics of the difference are displayed (mean, standard deviation, and standard error) along with their confidence limits. The minimum and maximum differences are also displayed. The t test is not significant (t = −1.09, p = 0.2992), indicating that the stimulus did not significantly affect systolic blood pressure.
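The paired t statistic is a one-sample t statistic computed on the 12 differences d = SBPbefore − SBPafter. As a sketch using the values displayed in Output 77.3.1:

```latex
\begin{align*}
\mathrm{SE}(\bar{d}) &= \frac{s_d}{\sqrt{n}} = \frac{5.8284}{\sqrt{12}} \approx 1.6825 \\
t &= \frac{\bar{d} - 0}{\mathrm{SE}(\bar{d})} = \frac{-1.833}{1.6825} \approx -1.09,
  \qquad \mathrm{df} = n - 1 = 11
\end{align*}
```

The negative sign reflects that, on average, blood pressure after the stimulus is slightly higher than before.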

Note that this test of hypothesis assumes that the differences are normally distributed. This assumption can be investigated using PROC UNIVARIATE with the NORMAL option. If the assumption is not satisfied, PROC NPAR1WAY should be used.
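A minimal sketch of the normality check described above follows. It is not part of the original example: the data set name pressure_diff and the variable name Diff are hypothetical, and the differences are computed explicitly because the normality assumption concerns the differences rather than the two original variables.

```sas
/* Compute the paired differences (hypothetical names). */
data pressure_diff;
   set pressure;
   Diff = SBPbefore - SBPafter;
run;

/* Tests for normality of the differences. */
proc univariate data=pressure_diff normal;
   var Diff;
run;
```

The default PROC UNIVARIATE output also includes sign and signed rank tests for the location of Diff, which may be a useful supplementary check when normality is in doubt.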