Hack 20. Sample with a Touch of Scotch


When statisticians choose samples of people from populations, they are really sampling from continuous distributions of variables. Sampling is sometimes easier to understand, though, if you treat your variables as discrete objects, not continuous scores.

The most powerful statistical procedures use scores at the interval level of measurement or higher [Hack #7]. To sample scores from a population, social science researchers usually choose people, though, not scores. The people are then measured, which results in a sample of scores. So far, so good.

When discussing the sampling process, however, smart researchers sometimes sound not-so-smart when they refer to their sampling strategy. For example, if a researcher is interested in measuring the effects of some treatment on a continuous variable such as happiness, he might say (and think), "OK, first I need to get a sample full of happy and unhappy people." He, at least for the moment of the thought, is treating happiness as if it were a dichotomous variable.

Dichotomous is statistics jargon meaning "having only two values." For example, biological sex is a dichotomous variable.


He is referring to people as if they are either completely happy or entirely unhappy. In reality, of course, he thinks there is a large range of happiness scores that describe people, which is why he is using statistics that make the assumption of interval measurement.

He refers to his participants as either/or because doing so makes it easier for him to picture the representativeness of his sampling. It's a smart strategy: thinking of samples as representing big, discrete categories instead of more precise, continuous values sometimes makes questions about sampling easier to answer and justify.

A Sampling Problem

Here's a brainteaser that centers on a sampling question. A drunk, untenured statistician (I've met a few) is mixing drinks at a party. He is making a Scotch and soda for his department chair. The chair demands a drink with some exact proportion of Scotch to water (it doesn't matter what the specific request is; our hero never makes it that far).

The statistician starts with two glasses of the same size. One glass (the first glass) has two ounces of Scotch in it; the other (the second glass) has two ounces of water in it. He starts by pouring an ounce of water from the water glass into the Scotch. He apparently already screwed up, because he changes his mind and pours an ounce of the new mixture (three ounces of Scotch and water mixed up) back into the water glass. Both glasses now have two ounces of liquid in them, but the liquid in each glass is some mix of water and Scotch.

Nervously, the statistician attempts to start all over, but his department chair stops him. She says:

I have a proposition for you. We can't possibly know the exact proportion of Scotch and water in each glass right now, because we can't know how mixed up everything is. But if you can answer the following question correctly, I'll write a strong letter of support to your tenure committee. If not, well, I'm sure someone with your qualifications should have no trouble finding work in the hotel/motel or food service industry. Here's the question: right now, does the first glass have more water in it, or does the second glass have more Scotch in it?

Think of the question as a sampling issue. Does the first sample, the liquid in the first glass, have more water in it, or does the second sample, the liquid in the second glass, have more Scotch in it? Because both Scotch and water are made up of really small particles, it is difficult to picture how much of each liquid is represented in each sample. Even proportionately, we can't be sure how many water particles (or sampled scores that equal "water") are mixed into the sample of "Scotch" scores. Who knows how much water drifted down to the bottom of the first glass and stayed there while the liquid near the surface was poured back into the second glass? An intuitive answer is called for. Unfortunately, it is wrong.

The intuitive answer typically generated by smart people is that the first glass, the Scotch glass, has more water in it than the water glass has Scotch in it. This makes sense because pure water was poured into the Scotch, while some mix of water and Scotch was poured back into the water glass. Amazingly, this clever thinking leads us astray. The correct answer is that the proportions are equal! There is the same amount of water in the Scotch glass as there is Scotch in the water glass.

Using Metaphor to Solve the Problem

The solution to the sampling problem is clearer if we imagine that our variables are not tiny particles, but instead are large categories, such as blue and white marbles. Instead of a glass of Scotch, imagine a glass of 100 blue marbles. Instead of a glass of water, imagine a glass of 100 white marbles.

The glasses are big, so the marbles can get mixed together well. Think large glass fishbowls. This is necessary to ensure that random selection is possible, as was likely with the mixed-up liquids. Keep your eye on the marbles through each step of the mixing.

Our hero takes 50 white marbles from the second glass and mixes them into the first glass. The distribution of the two variables is now:


Sample 1

100 blue marbles, 50 white marbles


Sample 2

50 white marbles

Now, he (randomly, remember, to simulate the mixed liquids) takes any 50 marbles from the first glass and mixes them back into the second glass. Let's imagine a variety of possibilities.

If by chance he selects all the white marbles, they go back into the second glass and the distribution is now:


Sample 1

100 blue marbles


Sample 2

100 white marbles

If by chance he selects no white marbles and puts 50 blue marbles into the second glass, the distribution is:


Sample 1

50 blue marbles, 50 white marbles


Sample 2

50 white marbles, 50 blue marbles

Now imagine a more likely scenario: some of the marbles he randomly draws are white and some are blue. For example, he could draw out 10 white marbles and 40 blue marbles and place them in the second glass. In that case, the new distribution is:


Sample 1

60 blue marbles, 40 white marbles


Sample 2

60 white marbles, 40 blue marbles

Try this with any mix of marbles you wish, but remember you have to draw out a total of 50 marbles (to match the one ounce of mixture poured back, which is the same amount as the water originally poured in).

Notice that any mixture you try results in 100 marbles in each glass at the end. Also, most importantly, notice that the ratio of blue to white marbles in the first glass at the end is always equal to the ratio of white to blue marbles in the second glass. Any blue marble that is not in the second glass must be in the first glass, and any white marble that is not in the first glass must be in the second glass. Because each glass ends up with exactly 100 marbles, every white marble in the first glass takes the place of one blue marble, and that blue marble must now be in the second glass.
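
The marble experiment is easy to check by simulation. What follows is a minimal Python sketch, not something from the original hack; the function name mix_marbles and its parameters are made up for illustration. It carries out the two transfers, using a random draw for the second one, and reports how many marbles of each color end up in each glass.

```python
import random


def mix_marbles(n=100, moved=50, seed=None):
    """Simulate the two transfers between a blue glass and a white glass.

    Returns (blue_in_first, white_in_first, blue_in_second, white_in_second).
    """
    rng = random.Random(seed)
    first = ["blue"] * n      # stands in for the Scotch glass
    second = ["white"] * n    # stands in for the water glass

    # Step 1: move `moved` white marbles from the second glass into the first.
    first += second[:moved]
    second = second[moved:]

    # Step 2: draw `moved` marbles at random from the first glass
    # and put them into the second glass.
    rng.shuffle(first)
    drawn, first = first[:moved], first[moved:]
    second += drawn

    return (first.count("blue"), first.count("white"),
            second.count("blue"), second.count("white"))


if __name__ == "__main__":
    for trial in range(5):
        b1, w1, b2, w2 = mix_marbles(seed=trial)
        # The white count in glass 1 should match the blue count in glass 2.
        print(f"glass 1: {b1} blue, {w1} white | glass 2: {w2} white, {b2} blue")
```

Whatever the random draw produces, the number of white marbles in the first glass should match the number of blue marbles in the second, because both glasses finish with the same total.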

The same is true for Scotch and water. The correct answer is that the proportions will be equal, no matter how they were originally mixed up.
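
The same bookkeeping can be done for the liquids themselves. The sketch below is an illustration only; the function after_pouring and its parameter are hypothetical names. It treats the makeup of the returned ounce as an unknown share of water between 0 and 1, since the chair's point is that we cannot know how well the liquids mixed, and it uses exact fractions to avoid rounding.

```python
from fractions import Fraction


def after_pouring(water_share_returned):
    """Return (water_in_first_glass, scotch_in_second_glass) in ounces.

    water_share_returned is the fraction of the returned ounce that is water
    (0 means pure Scotch came back, 1 means pure water came back).
    """
    w = Fraction(water_share_returned)
    if not 0 <= w <= 1:
        raise ValueError("share must be between 0 and 1")

    # After the first pour: glass 1 holds 2 oz Scotch and 1 oz water,
    # glass 2 holds the remaining 1 oz of water.
    scotch_first, water_first = Fraction(2), Fraction(1)
    scotch_second, water_second = Fraction(0), Fraction(1)

    # Pour 1 oz back: w oz of it is water, the rest (1 - w) is Scotch.
    water_first -= w
    scotch_first -= 1 - w
    water_second += w
    scotch_second += 1 - w

    return water_first, scotch_second


if __name__ == "__main__":
    for share in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
        water_first, scotch_second = after_pouring(share)
        print(f"share {share}: water in glass 1 = {water_first}, "
              f"Scotch in glass 2 = {scotch_second}")
```

A share of 1/3 corresponds to a perfectly mixed first glass (one ounce of water in three ounces of liquid). Every other share gives a different amount, but the two quantities are both 1 - w ounces, so they always match.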

Where Else It Works

Real-life polling companies, which make their living and stake their reputations on the accuracy of election predictions, are also primarily concerned with the proportion of each sample that falls into each of several crucial categories. If people have just voted and there are two candidates, anyone who did not vote for candidate A voted for candidate B. Their absence in one category guarantees their presence in the other. Reporting predictions as percentages creates the potential for greater accuracy. It also allows for greater error, as a voter predicted to be in category A who ends up in category B has therefore produced error in both categories.
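
The two-category point can be shown with a little arithmetic. The following Python sketch uses invented numbers and a hypothetical helper, category_errors, purely for illustration; it is not a description of how any polling company works.

```python
def category_errors(predicted_a, actual_a, total):
    """Return prediction error, in percentage points, for candidates A and B.

    With only two candidates, everyone not counted for A is counted for B.
    """
    predicted_b = total - predicted_a
    actual_b = total - actual_a
    error_a = 100 * (predicted_a - actual_a) / total
    error_b = 100 * (predicted_b - actual_b) / total
    return error_a, error_b


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 voters, 520 predicted for A, 500 actually for A.
    print(category_errors(520, 500, 1000))
```

With these made-up figures, the 20 misplaced voters put the prediction for A two points too high and the prediction for B two points too low: one mistake, counted in both categories.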

When statistical social science researchers want to be convinced that their sample is representative of its population, their primary concern is always the proportions of characteristics in their sample, not the number of people with those characteristics. What matters most is that the proportions of each score for the key research variables are the same in the samples as in their populations.




Statistics Hacks: Tips & Tools for Measuring the World and Beating the Odds
ISBN: 0596101643
Year: 2004
Pages: 114
Authors: Bruce Frey
