Hack 17. Compare Two Groups


Which is better? Which has more? Do people really differ? Quantitative questions like these dominate the polite conversations of our times. If you want some real evidence for your beliefs about the best, most, and least, you can use a statistical tool called the "t test" to support your point.

My Uncle Frank is full of opinions. Green M&Ms taste better than blue. Women never get speeding tickets. The Brady Bunch kids could sing better than the Partridge Family. Plaid is back. He can argue all day, spouting half-baked idea after half-baked idea. While I disagree with him on all four points (especially the position that plaid is back; after all, it never left!), I have only my opinions to fight with.

If only there were some scientific way to prove whether Uncle Frank is right or wrong! You no doubt recognize the rhetorical nature of my plea. After all, there are only about a gazillion statistical tools that exist to test hypotheses like these. One of the simplest tools is designed to test the simplest of claims. If the problem is deciding whether one group differs from another, the procedure known as an independent t test is the best solution.

Proving Uncle Frank Wrong (or Right)

To apply a t test to investigate one of Uncle Frank's theories, we have to compute a t value. Let's imagine that I decided to actually challenge Uncle Frank and collect some data to see whether he is right or wrong.

Uncle Frank believes that males get speeding tickets more frequently than females. To test this hypothesis, imagine that I select two groups of 15 drivers randomly [Hack #19] from his neighborhood. One group is female, and the other is male. I ask them some questions. Pretend that over the course of the last five years, the male group averaged 1.71 speeding tickets with a variance of .71. The female group averaged 1.35 speeding tickets with a variance of .25.

Variance measures the amount of variability in a group of numbers. It is calculated by finding the distance of each score in the group from the mean score, squaring those distances, and averaging them.
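The definition above can be sketched in a few lines of Python. (The division by the number of scores matches the "average the squared distances" description; sample statistics often divide by n - 1 instead. The ticket counts here are made up for illustration.)

```python
def variance(scores):
    """Average squared distance of each score from the group mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

# Hypothetical speeding-ticket counts for five drivers
tickets = [1, 2, 0, 3, 2]
print(variance(tickets))  # spread of this group around its mean of 1.6
```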


Here is the equation for producing a t value:

t = (mean1 - mean2) / sqrt(variance1/n1 + variance2/n2)

The larger the t value, the less likely that any differences found between your sample groups occurred by chance. Typically, t values larger than about 2 are big enough to reach the conclusion that the differences exist in the whole population, not just in your samples.

The t formula shown here works best when both groups have the same number of people in them. A similar formula that averages the variance information is used when there are unequal sample sizes.
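The "averaged variance" version mentioned in the note can be sketched as follows. This is a common pooled-variance formulation (the function name and argument order are my own); with equal group sizes it gives the same answer as the simpler formula.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """t for two independent groups of (possibly) unequal size,
    using a pooled (weighted-average) variance."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error

# With equal ns it matches the simpler formula: the ticket data give 1.42
print(round(pooled_t(1.71, .71, 15, 1.35, .25, 15), 2))
```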


Is there support for Uncle Frank's belief? To determine that, our calculations require the data in Table 2-9.

Table 2-9. Data for speeding ticket t test

               Group 1 (males)    Group 2 (females)
Mean           1.71               1.35
Variance       .71                .25
Sample size    15                 15


If we place those key values into our t formula, it looks like this:

t = (1.71 - 1.35) / sqrt(.71/15 + .25/15)

The calculations work out this way:

t = .36 / sqrt(.047 + .017) = .36 / sqrt(.064) = .36 / .253 = 1.42

In this case, a mean difference of .36 produces a t value of 1.42.
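You can check the arithmetic directly with a few lines of Python (a stdlib-only sketch; the variable names are mine):

```python
from math import sqrt

# Values from Table 2-9
mean_difference = 1.71 - 1.35                # .36
standard_error = sqrt(.71 / 15 + .25 / 15)   # about .253
t = mean_difference / standard_error
print(round(t, 2))
```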

Interpreting the t Value

Could our t value of 1.42 have occurred by chance? In other words, if the actual difference in the population is zero, could two samples drawn from that single population produce means that differ by that much?

Earlier, I mentioned that values of 2 or greater are typically required to reach this conclusion. Under this standard, we would conclude that there is no evidence that males really do get more tickets than females. They did in our sample, of course, but might not if we measured everybody (the whole population). There is no evidence that Uncle Frank is right. This is different in an important way from concluding that he is wrong, but it still means he should lose this particular argument.

Statistics is all about precision, though, so let's explore our 1.42 a little further. How big, exactly, would it need to be for us to conclude that Uncle Frank is actually right?

The answer, settled by custom, is that the t is big enough if a value that large would occur by chance 5 percent of the time or less. Fortunately, the chances of finding ts of various sizes when samples are drawn randomly from a single population have been determined by hard-working mathematicians using assumptions of the Central Limit Theorem [Hack #2]. The exact t value required for statistical significance depends on the total sample size in both groups combined. Table 2-10 provides t values that you must meet or beat to declare statistical significance at the .05 level.

Table 2-10. t values occurring by chance less than 5 percent of the time

Sample size in both groups combined    Critical t value
4                                      4.30
20                                     2.10
30                                     2.05
60                                     2.00
100                                    1.99
∞ (infinity)                           1.96


For sample sizes other than those shown in Table 2-10, you can figure out the rough t value you need to meet or beat by interpolating between the values shown. Also, the table assumes that you want to identify differences between groups in either direction, i.e., whether either group mean is larger than the other. This is what statisticians call a two-tailed test, and it is usually the comparison of interest.


Using Table 2-10, we see that a t value of 1.42 is less than the critical value for a total of 30 subjects. We need to see a t value greater than 2.05 to be confident that the sample differences we observed did not occur just by chance.
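Table 2-10 can be turned into a rough lookup by interpolating between the listed sample sizes. This is only a sketch of the table, not the real distribution; exact two-tailed critical values come from the t distribution itself (for example, SciPy's `scipy.stats.t.ppf(0.975, df)`).

```python
# (total sample size, two-tailed .05 critical t) pairs from Table 2-10
TABLE = [(4, 4.30), (20, 2.10), (30, 2.05), (60, 2.00), (100, 1.99)]

def critical_t(n):
    """Rough critical t for total sample size n, interpolated from Table 2-10."""
    if n <= TABLE[0][0]:
        return TABLE[0][1]
    for (n1, t1), (n2, t2) in zip(TABLE, TABLE[1:]):
        if n <= n2:
            # Linear interpolation between the two surrounding table rows
            return t1 + (t2 - t1) * (n - n1) / (n2 - n1)
    return 1.96  # beyond the table, approach the infinite-sample value

print(round(critical_t(30), 2))  # 2.05: our t of 1.42 falls short
```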

Why It Works

Social scientists use this comparison method all the time. Experimental and quasi-experimental designs often have two groups of people who are believed to be different in some way or another. You might be interested in the differences between Republicans and Democrats or girls and boys, or you might want to see if a group taking a new drug has fewer colds than a group not taking a drug at all.

Such designs produce two sets of scores, and those sets of values often differ, at least in the samples used. Researchers (and I, too, when it comes to proving Uncle Frank wrong) are more interested in whether there would be differences in the populations represented by the two samples.

The logic of inferential statistics is that a sample of scores represents a larger population of scores. If the samples differ on some variable, that difference might be reflected in the populations from which they were drawn. Or that difference might be due to errors resulting from the sampling.


A t test answers the question of whether any differences found between two samples are real (i.e., they probably exist in the populations from which the samples were drawn) or due to sampling error (i.e., they probably exist only in the samples). If the difference between the samples is too large to have occurred by chance, researchers conclude that there is a real difference between the populations.

The t test formula uses information about the shape of the sample distributions of scores. The needed information is the mean score on the research variable in each group, each group's variance, and the sample size of each group. The sample mean provides a good guess as to the population mean, the variances give an indication as to how much the sample mean might have varied from the population mean, and the sample size suggests the precision of the estimate. The difference between the two means is standardized and is expressed as a t value.

The way statisticians talk about real differences is "the two samples were likely drawn from different populations." The way you and I and researchers might talk about real differences is "Republicans and Democrats differ" or "the drug reduces the chance of getting a cold."


Where Else It Works

Numbers don't know where they come from. You can use t tests to look at differences in any two sets of numbers, whether those numbers describe people or things. In fact, the t test was first developed to determine the quality of an entire elevator full of grain used in beer production.

Instead of examining all the grain, a beer statistician (how's that for a dream job?) wanted a method that required looking only at a small sample, randomly drawn from the larger population of grain. The rest is history, and so we can say today that much of the work done by statistical researchers is literally driven by beer.




Statistics Hacks
Statistics Hacks: Tips & Tools for Measuring the World and Beating the Odds
ISBN: 0596101643
Year: 2004
Pages: 114
Authors: Bruce Frey
