Getting Started | SAS/STAT 9.1 Users Guide, Volumes 1-7

One-Sample t Test

A one-sample t test can be used to compare a sample mean to a given value. This example, taken from Huntsberger and Billingsley (1989, p. 290), tests whether the mean length of a certain type of court case is 80 days using 20 randomly chosen cases. The data are read by the following DATA step:

  title 'One-Sample t Test';
  data time;
     input time @@;
     datalines;
   43  90  84  87  116   95  86   99   93  92
  121  71  66  98   79  102  60  112  105  98
  ;
  run;

The only variable in the data set, time, is assumed to be normally distributed. The trailing at signs (@@) indicate that there is more than one observation on a line. The following code invokes PROC TTEST for a one-sample t test:

  proc ttest h0=80 alpha=0.1;
     var time;
  run;

The VAR statement indicates that the time variable is being studied, while the H0= option specifies that the mean of the time variable should be compared to the value 80 rather than the default null hypothesis of 0. The ALPHA=0.1 option requests 90% confidence intervals rather than the default 95% confidence intervals. The output is displayed in Figure 77.1.

  One-Sample t Test

  The TTEST Procedure

  Statistics

                Lower CL            Upper CL  Lower CL            Upper CL
  Variable   N      Mean      Mean      Mean   Std Dev   Std Dev   Std Dev   Std Err   Minimum   Maximum
  time      20    82.447     89.85    97.253      15.2    19.146    26.237    4.2811        43       121

  T-Tests

  Variable      DF    t Value    Pr > |t|
  time          19       2.30      0.0329

Figure 77.1: One-Sample t Test Results

Summary statistics appear at the top of the output. The sample size (N), the mean and its confidence bounds (Lower CL Mean and Upper CL Mean), the standard deviation and its confidence bounds (Lower CL Std Dev and Upper CL Std Dev), and the standard error are displayed with the minimum and maximum values of the time variable. The test statistic, the degrees of freedom, and the p-value for the t test are displayed next; at the 10% significance level (α = 0.1), this test indicates that the mean length of the court cases is significantly different from 80 days (t = 2.30, p = 0.0329).
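
As a rough check on Figure 77.1, the t statistic and the 90% confidence limits for the mean can be reproduced by hand from the displayed summary statistics. This sketch assumes the usual one-sample formulas; the critical value 1.729 for a t distribution with 19 degrees of freedom comes from a standard t table, not from the output, and all results are rounded.

```latex
\begin{align*}
\mathrm{SE}(\bar{x}) &= \frac{s}{\sqrt{n}} = \frac{19.146}{\sqrt{20}} \approx 4.281 \\
t &= \frac{\bar{x} - \mu_0}{\mathrm{SE}(\bar{x})}
   = \frac{89.85 - 80}{4.2811} \approx 2.30,
   \qquad \mathrm{df} = n - 1 = 19 \\
\bar{x} \pm t_{0.95,\,19}\,\mathrm{SE}(\bar{x})
  &\approx 89.85 \pm 1.729 \times 4.2811 \approx (82.45,\ 97.25)
\end{align*}
```

The hand-computed limits agree with the Lower CL Mean and Upper CL Mean columns to the precision shown.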

Comparing Group Means

If you want to compare values obtained from two different groups, and if the groups are independent of each other and the data are normally distributed in each group, then a group t test can be used. Examples of such group comparisons include

test scores for two third-grade classes, where one of the classes receives tutoring
fuel efficiency readings of two automobile nameplates, where each nameplate uses the same fuel
sunburn scores for two sunblock lotions, each applied to a different group of people
political attitude scores of males and females

In the following example, the golf scores for males and females in a physical education class are compared. The sample sizes from each population are equal, but this is not required for further analysis. The data are read by the following statements:

  title 'Comparing Group Means';
  data scores;
     input Gender $ Score @@;
     datalines;
  f 75  f 76  f 80  f 77  f 80  f 77  f 73
  m 82  m 80  m 85  m 85  m 78  m 87  m 82
  ;
  run;

The dollar sign ($) following Gender in the INPUT statement indicates that Gender is a character variable. The trailing at signs (@@) enable the DATA step to read more than one observation per line.

You can use a group t test to determine if the mean golf score for the men in the class differs significantly from the mean score for the women. If you also suspect that the distributions of the golf scores of males and females have unequal variances, then submitting the following statements invokes PROC TTEST with options to deal with the unequal variance case.

  proc ttest cochran ci=equal umpu;
     class Gender;
     var Score;
  run;

The CLASS statement contains the variable that distinguishes the groups being compared, and the VAR statement specifies the response variable to be used in calculations. The COCHRAN option produces p-values for the unequal variance situation using the Cochran and Cox (1950) approximation. Equal tailed and uniformly most powerful unbiased (UMPU) confidence intervals for σ are requested by the CI= option. Output from these statements is displayed in Figure 77.2 through Figure 77.4.

  Comparing Group Means

  The TTEST Procedure

  Statistics

                                                                         UMPU
                             Lower CL            Upper CL  Lower CL  Lower CL
  Variable  Gender        N      Mean      Mean      Mean   Std Dev   Std Dev   Std Dev
  Score     f             7    74.504    76.857    79.211    1.6399    1.5634    2.5448
  Score     m             7    79.804    82.714    85.625     2.028    1.9335    3.1472
  Score     Diff (1-2)          -9.19    -5.857    -2.524    2.0522    2.0019    2.8619

  Statistics

                                UMPU
                            Upper CL  Upper CL
  Variable  Gender           Std Dev   Std Dev   Std Err   Minimum   Maximum
  Score     f                 5.2219    5.6039    0.9619        73        80
  Score     m                 6.4579    6.9303    1.1895        78        87
  Score     Diff (1-2)        4.5727    4.7242    1.5298

Figure 77.2: Simple Statistics

  T-Tests

  Variable    Method           Variances      DF    t Value    Pr > |t|
  Score       Pooled           Equal          12      -3.83      0.0024
  Score       Satterthwaite    Unequal      11.5      -3.83      0.0026
  Score       Cochran          Unequal         6      -3.83      0.0087

Figure 77.3: t Tests

  Equality of Variances

  Variable    Method      Num DF    Den DF    F Value    Pr > F
  Score       Folded F         6         6       1.53    0.6189

Figure 77.4: Tests of Equality of Variances

Simple statistics for the two populations being compared, as well as for the difference of the means between the populations, are displayed in Figure 77.2. The Variable column denotes the response variable, while the column for the CLASS variable (here, Gender) indicates the population corresponding to the statistics in that row. The sample size (N) for each population, the sample means (Mean), and lower and upper confidence bounds for the means (Lower CL Mean and Upper CL Mean) are displayed next. The standard deviations (Std Dev) are displayed as well, with equal tailed confidence bounds in the Lower CL Std Dev and Upper CL Std Dev columns and UMPU confidence bounds in the UMPU Lower CL Std Dev and UMPU Upper CL Std Dev columns. In addition, the standard error of the mean and the minimum and maximum data values are displayed.

The test statistics, associated degrees of freedom, and p-values are displayed in Figure 77.3. The Method column denotes which t test is being used for that row, and the Variances column indicates what assumption about variances is being made. The pooled test assumes that the two populations have equal variances and uses degrees of freedom n₁ + n₂ − 2, where n₁ and n₂ are the sample sizes for the two populations. The remaining two tests do not assume that the populations have equal variances. The Satterthwaite test uses the Satterthwaite approximation for degrees of freedom, while the Cochran test uses the Cochran and Cox approximation for the p-value.
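
The pooled statistic and the Satterthwaite degrees of freedom can be reproduced approximately from the standard deviations reported in Figure 77.2. This is a sketch under the standard textbook formulas; the subscripts 1 and 2 (for the f and m groups) are notation introduced here, and intermediate values are rounded.

```latex
\begin{align*}
s_p^2 &= \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
       = \frac{6(2.5448)^2 + 6(3.1472)^2}{12} \approx 8.190,
       \qquad s_p \approx 2.862 \\
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}}
   = \frac{76.857 - 82.714}{2.8619\sqrt{\tfrac{2}{7}}}
   \approx \frac{-5.857}{1.5298} \approx -3.83,
   \qquad \mathrm{df} = 12 \\
\mathrm{df}_{\text{Satt}} &=
   \frac{\left(\tfrac{s_1^2}{n_1} + \tfrac{s_2^2}{n_2}\right)^2}
        {\tfrac{(s_1^2/n_1)^2}{n_1 - 1} + \tfrac{(s_2^2/n_2)^2}{n_2 - 1}}
   \approx \frac{(0.9251 + 1.4150)^2}{(0.9251^2 + 1.4150^2)/6} \approx 11.5
\end{align*}
```

Because the two groups have the same size, the pooled and unequal-variance standard errors coincide, which is why all three rows of Figure 77.3 report the same t value and differ only in degrees of freedom and p-value.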

Examine the output in Figure 77.4 to determine which t test is appropriate. The 'Equality of Variances' test results show that the assumption of equal variances is reasonable for these data (the Folded F statistic F′ = 1.53, with p = 0.6189). If the assumption of normality is also reasonable, the appropriate test is the usual pooled t test, which shows that the average golf scores for men and women are significantly different (t = −3.83, p = 0.0024). If the assumption of equality of variances is not reasonable, then either the Satterthwaite or the Cochran test should be used.
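
For reference, the Folded F statistic is the ratio of the larger sample variance to the smaller one. The sketch below recomputes it from the Std Dev column of Figure 77.2, with subscripts m and f as illustrative labels for the two groups.

```latex
\[
F' = \frac{\max(s_f^2,\ s_m^2)}{\min(s_f^2,\ s_m^2)}
   = \frac{(3.1472)^2}{(2.5448)^2}
   \approx \frac{9.905}{6.476} \approx 1.53,
\qquad (\text{Num DF},\ \text{Den DF}) = (6,\ 6)
\]
```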

The assumption of normality can be checked using PROC UNIVARIATE; if the assumption of normality is not reasonable, you should analyze the data with the nonparametric Wilcoxon Rank Sum test using PROC NPAR1WAY.
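
A minimal sketch of those two follow-up steps is shown below. It assumes the scores data set created earlier in this section; the choice of the NORMAL and WILCOXON options is an illustration and not part of the original example.

```sas
/* Request tests for normality of Score within each Gender group. */
/* Assumes the scores data set from the DATA step shown earlier.  */
proc univariate data=scores normal;
   class Gender;
   var Score;
run;

/* Nonparametric alternative: Wilcoxon rank sum test by Gender. */
proc npar1way data=scores wilcoxon;
   class Gender;
   var Score;
run;
```

With only seven observations per group, formal normality tests have little power, so their results are best read alongside plots of the data.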