Hack 30. Improve Your Test Score While Watching Paint Dry

If you don't like the score you just got on that important high-stakes test, maybe you should take the test again. Or should you?

We've already discussed how to measure anything precisely by applying concepts of reliability [Hack #6]. Reliability is the consistency with which a test assesses some outcome. In other words, a reliable test produces a stable score, and an unreliable test does not. Scores on a test that is less than perfectly reliable are partly due to random chance, so they can move around in ways that statisticians can predict. In particular, when you retake such a test, your score will tend to move toward the average score on that test. This effect is called regression toward the mean.

When you take a high-stakes test such as the SAT, ACT, GRE, LSAT, or MCAT, you often have the option of retaking it to try to improve your score. Your decision on whether it is worth the time, hard work, and money to try to improve your test score should be made with an understanding of the test's reliability and how much change is possible simply through regression to the mean.

Regressing to the Mean

First, let's make regression to the mean occur, so you'll believe that scores can change in a predictable direction for no reason other than the characteristics of the normal curve [Hack #23]. Seeing is believing, and I hope to make this invisible magical phenomenon happen before your eyes.

Give the true/false quiz shown in Table 3-8 to 100 of your closest friends. Well, OK, maybe 10 people, counting you. 1,000 would be even better, but I just need enough to prove to you that this regression thing happens. As we proceed, keep in mind that if we had 100 or 1,000 takers of this very difficult (or very easy) test, the results would be even more convincing.

Oh, and for this test, you don't have to see the actual questions themselves. Scores will change on this test without any change in the construct that is being measured [Hack #32]. So, all you can do on this quiz is guess. Because they are true/false questions, you will have a 50 percent chance of getting any question correct, and the average performance for your group of 10 test takers (or 100 if you are really serious about this...can you do at least 30 maybe?...anyone?) should be a score of 5 out of 10.

Table 3-8. Advanced Quantum Physics Quiz
Question	Circle Your Answer
1.	True or False
2.	True or False
3.	True or False
4.	True or False
5.	True or False
6.	True or False
7.	True or False
8.	True or False
9.	True or False
10.	True or False

Administer the Advanced Quantum Physics Quiz to all the people you were able to get. And when you and the others take this quiz, don't cheat by looking at the answer key, even though it is only inches away from your eyes right now (in Table 3-9)!

Table 3-9. Answer key for the Advanced Quantum Physics Quiz
1. True	2. True	3. False	4. False	5. True
6. False	7. False	8. True	9. True	10. False

Collect the completed tests (make sure they put their names on them) and score them up, using the answer key in Table 3-9.

Now, pick your highest scorer (this represents someone like you, perhaps, who scores higher than average on standardized tests such as the SAT) and the lowest scorer (this represents someone not like you, perhaps, who scores lower than average). Give these two people the quiz again (without them seeing the correct answers) and score them again.

Here's where regression to the mean kicks in. I am pretty sure, without knowing you or your friends or what their answers are, of two things:

The person who scored lowest the first time will score higher than he did before.
The person who scored highest the first time will score lower than she did before.

If it worked, then aha! I told you so. If it didn't work, I told you I was only "pretty sure." With a larger sample, it is much more likely to work.
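
If you can't round up enough friends, you can let a computer do the guessing. The Python below is a minimal sketch of the demonstration, assuming (as the quiz does) that every answer is a pure 50/50 guess; the function names are made up for illustration. It repeats the whole experiment thousands of times with 10 test takers per group, retests only the top and bottom scorers, and reports how often each moved toward the mean.

```python
import random

# Answer key from Table 3-9.
ANSWER_KEY = [True, True, False, False, True, False, False, True, True, False]


def score(answers):
    """Count how many answers match the key."""
    return sum(a == k for a, k in zip(answers, ANSWER_KEY))


def guess_quiz(rng):
    """A test taker who knows nothing: every answer is a coin flip."""
    return [rng.random() < 0.5 for _ in ANSWER_KEY]


def run_class(rng, n_takers=10):
    """Quiz n_takers people, then retest only the lowest and highest scorers."""
    first = [score(guess_quiz(rng)) for _ in range(n_takers)]
    low, high = min(first), max(first)
    # The retest is just another round of guessing for each of the two people.
    low_retest = score(guess_quiz(rng))
    high_retest = score(guess_quiz(rng))
    return low, low_retest, high, high_retest


def repeat_demo(n_classes=10_000, n_takers=10, seed=1):
    """Share of groups where the lowest went up and the highest went down."""
    rng = random.Random(seed)
    results = [run_class(rng, n_takers) for _ in range(n_classes)]
    lowest_went_up = sum(low_re > low for low, low_re, _, _ in results) / n_classes
    highest_went_down = sum(high_re < high for _, _, high, high_re in results) / n_classes
    return lowest_went_up, highest_went_down


if __name__ == "__main__":
    up, down = repeat_demo()
    print(f"lowest scorer improved on retest:  {up:.1%}")
    print(f"highest scorer dropped on retest: {down:.1%}")
```

With 10 takers per group, the lowest scorer improves in somewhere around 85 percent of the simulated groups, and the highest scorer drops about as often; most of the remaining cases are ties.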

Why It Works

What we expect is that scores below 5 (or whatever your group's mean was) will tend to move up toward the mean on a retest, and scores above 5 will tend to move down toward it. This may or may not have happened with your two test takers, but it is the most probable outcome.

Remember this was a test in which knowledge had no effect on scores. Scores were due entirely to chance both times. This effect occurs with real tests, though, even when knowledge does influence your score. That's because no real test is perfectly reliable, and chance plays some role in performance on every test. This demonstration just exaggerated the effect by presenting a test in which chance accounts for 100 percent of the test taker's score.

So, why are scores likely to change and move closer to the mean on second occasions? In the long run, with 100 or 1,000 sets of test scores, we would expect the outcomes to be something like the normal distribution. Just like flipping a coin (which can come up heads or tails, with a 50 percent chance of either), probabilities are associated with particular outcomes on a true/false test (or any test, for that matter). Table 3-10 shows the possible scores and the likelihood of a test taker receiving them for the Advanced Quantum Physics Quiz.

Table 3-10. Likely quiz score distribution
Score	Probability
0	0.001
1	0.010
2	0.044
3	0.117
4	0.205
5	0.246
6	0.205
7	0.117
8	0.044
9	0.010
10	0.001
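
Table 3-10 is a binomial distribution: 10 independent questions, each with a .5 chance of a correct guess. If you'd like to check the numbers (or try a quiz of a different length), a few lines of Python will reproduce them. The function name `quiz_score_distribution` is invented for this sketch.

```python
from math import comb


def quiz_score_distribution(n_items=10, p_correct=0.5):
    """Binomial probability of each possible score when every answer is a guess."""
    return {
        k: comb(n_items, k) * p_correct**k * (1 - p_correct) ** (n_items - k)
        for k in range(n_items + 1)
    }


for quiz_score, prob in quiz_score_distribution().items():
    print(f"{quiz_score:>2}  {prob:.3f}")
```

Rounded to three decimal places, the printed probabilities match the table.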

Why would extreme scores become less extreme with repeated testing? Compare the chance of getting the same extreme score twice (a 2 and then another 2) with the chance of getting a 2 (probability = .044) and then a 4 (probability = .205). The two administrations are independent, and the first score of 2 is common to both sequences, so the comparison comes down to .205 versus .044: a person who scored 2 the first time is almost five times as likely to score 4 on a second administration as to score 2 again. In fact, it is almost 95 percent certain that he will score higher than 2 (1 - .044 - .010 - .001 = .945).
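
Here is the same arithmetic as a short Python sketch, using the binomial formula behind Table 3-10 and treating the two administrations as independent (which they are when every answer is a guess). The helper `p_score` is hypothetical, named only for this example.

```python
from math import comb


def p_score(k, n=10, p=0.5):
    """Probability of exactly k correct guesses out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two independent administrations: probability of each two-test sequence.
p_2_then_2 = p_score(2) * p_score(2)
p_2_then_4 = p_score(2) * p_score(4)
print(f"P(2 then 2) = {p_2_then_2:.4f}")
print(f"P(2 then 4) = {p_2_then_4:.4f}")
print(f"ratio       = {p_2_then_4 / p_2_then_2:.2f}")

# Chance that the retest beats a first score of 2: 1 - P(0) - P(1) - P(2).
p_higher_than_2 = 1 - sum(p_score(k) for k in range(3))
print(f"P(retest > 2) = {p_higher_than_2:.3f}")
```

The ratio works out to 210/45, or about 4.67, and the chance of beating a first score of 2 is about .945.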

The phrase "regression toward the mean" gets its name from the famous Francis Galton (a half cousin of Charles Darwin), who studied the heights of parents and their children. He found that the children of unusually tall or unusually short parents tended, on average, to be closer to the mean height of the population than their parents were. While Galton called this observation "regression toward mediocrity" (Galton was not known to be a diplomat), we're a bit kinder. The effect has nothing to do with genetics and everything to do with (you guessed it) statistics.

With this test, in which scores were entirely due to chance, there is a 65.6 percent chance of scoring at or very near the mean (combining the probabilities of scores 4, 5, and 6, which are exactly the scores within one standard deviation of the mean of 5). With most tests, which have more items and produce approximately normal distributions, you have about a 68 percent chance of scoring within one standard deviation of the mean [Hack #23].
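
To see where the 65.6 and 68 percent figures come from, this sketch computes both: the exact chance of landing within one standard deviation of the mean on the 10-item guessing quiz (whose standard deviation is the square root of 10 × .5 × .5, about 1.58), and the area within one standard deviation of the mean under a normal curve, using `statistics.NormalDist` from Python's standard library.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 10, 0.5
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = n * p                # 5.0
sd = sqrt(n * p * (1 - p))  # about 1.58

# Quiz: which scores fall within one SD of the mean, and how likely are they?
within_one_sd = [k for k in range(n + 1) if abs(k - mean) <= sd]
p_quiz = sum(pmf[k] for k in within_one_sd)

# Normal curve: area between z = -1 and z = +1.
z = NormalDist()
p_normal = z.cdf(1) - z.cdf(-1)

print(within_one_sd, round(p_quiz, 3), round(p_normal, 3))
```

It picks out the scores 4, 5, and 6 and reports about .656 for the quiz versus about .683 for the normal curve.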

Predicting the Likelihood of a Higher Score

This is all very interesting, but how will it help you decide whether it is worth it to take a test a second time? Back to our original dilemma. Taking these important tests (such as college admissions tests) a second time takes more money, time, stress, and, perhaps, preparation, so one needs to be strategic in deciding when to try again.

Of course, you can do better on a test by actually increasing your level of whatever knowledge the test is measuring. You are likely to score higher if you prepare for an exam through study, taking practice exams or preparation courses, and so on. If you score very low, though, you are likely to do better without having done anything between test administrations, just because of regression to the mean. You can watch paint dry between testing times and your score will still probably increase. Lucky dog!

The likelihood that you will do better on a test by just taking it a second time depends on two things: your score the first time and the reliability of the test.

Your score: Because scores are likely (by chance alone) to move toward the mean, your chance of doing better on a second try depends on whether your first score was below or above the mean. Think of the mean as that big sucking sound you hear, pulling every score in the distribution toward it. Scores below the mean are more likely to increase than are scores above the mean.
Test reliability: Measurement statisticians express reliability as a number between 0 and 1 that represents the proportion of score variability that is not due to chance. The higher the reliability, then, the smaller the role chance plays in determining your score. Reliable scores are stable scores, and the super-sucking powers of the mean are no match for a reliable score.
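
The idea that reliability is the share of score variability not due to chance can be simulated. This is a minimal sketch of the classical test theory model, assuming each observed score is a person's true score plus independent, normally distributed chance error, scaled so that observed scores have a standard deviation of 1. It retests the top 10 percent of first-time scorers and measures how far they fall back; `retest_drop` is a hypothetical name.

```python
import random
from statistics import mean


def retest_drop(reliability, n=20_000, seed=0):
    """Average drop, in SD units, for the top 10% of first-test scorers on a retest.

    Model (an assumption): observed = true + error, with var(true) = reliability
    and var(error) = 1 - reliability, so observed scores have SD 1 and
    reliability is the share of variance not due to chance.
    """
    rng = random.Random(seed)
    true_sd, err_sd = reliability ** 0.5, (1 - reliability) ** 0.5
    people = []
    for _ in range(n):
        t = rng.gauss(0, true_sd)
        first = t + rng.gauss(0, err_sd)
        second = t + rng.gauss(0, err_sd)  # same true score, fresh chance error
        people.append((first, second))
    people.sort(reverse=True)              # highest first-test scores first
    top = people[: n // 10]
    return mean(f for f, _ in top) - mean(s for _, s in top)


for r in (0.5, 0.8, 0.95):
    print(f"reliability {r:.2f}: top 10% drop by {retest_drop(r):.2f} SD on retest")
```

Under this model the expected drop is (1 - reliability) times the top group's average first score, which is about 1.75 SD, so the printout should show a drop of roughly 0.9 SD at reliability .50, about 0.35 at .80, and under 0.1 at .95.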

Statisticians have developed a formula that gives you a good idea of how much wiggle room there is around your score. If there is plenty of room to grow, you might consider a second shot at it. A useful tool here is the standard error of measurement (SEM). Here's the formula for the standard error of measurement [Hack #6]:

$$\mathrm{SEM} = SD\sqrt{1 - r}$$

where SD is the standard deviation of the test's scores and r is the test's reliability.

Most standardized tests publish their levels of reliability and the expected standard deviation for the many hundreds of thousands of scores produced by the test during each administration. By plugging values for these tests into the standard error of measurement equation, one can get a general sense of the variation of scores from test to retest that might be possible without any real change in the person being measured.
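
As a rough sketch of that plugging-in step, the Python below computes the SEM and a band of one SEM on either side of an observed score. The standard deviation of 100, reliability of .90, and observed score of 520 are hypothetical values chosen only for illustration, not figures from any real test, and the function names are invented for this example.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def score_band(observed, sd, reliability, n_sem=1):
    """Observed score plus or minus n_sem standard errors of measurement."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - n_sem * sem, observed + n_sem * sem


# Hypothetical numbers, not taken from any real test's technical manual.
sem = standard_error_of_measurement(sd=100, reliability=0.90)
low, high = score_band(observed=520, sd=100, reliability=0.90)
print(f"SEM = {sem:.1f}; one-SEM band: {low:.0f} to {high:.0f}")
```

With those made-up numbers the SEM is about 31.6 points, so a retest score anywhere from about 488 to 552 would be well within the range that chance alone can produce.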

However, even the standard error is misleading for extreme scores. Very low and very high scores are likely to move a greater distance by chance alone than the standard error would suggest. The farther your score is from the mean, the harder it is to resist the mean's gravitational pull. Extreme scores cannot resist that pull unless the test is perfectly reliable.
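
One standard way to quantify that pull, assuming the classical model in which an observed score is a true score plus chance error, is the regression-based prediction usually credited to Truman Kelley: expected retest score = mean + reliability × (first score - mean). The predicted move, (1 - reliability) × (first score - mean), grows with distance from the mean, which is the point of the paragraph above. The test mean of 500 and reliability of .90 below are hypothetical, as is the function name.

```python
def expected_retest(observed, test_mean, reliability):
    """Regression-based (Kelley-style) prediction of a retest score:
    mean + reliability * (observed - mean)."""
    return test_mean + reliability * (observed - test_mean)


# Hypothetical test: mean 500, reliability 0.90.
for observed in (300, 450, 500, 550, 700):
    predicted = expected_retest(observed, test_mean=500, reliability=0.90)
    print(f"first score {observed}: expected retest {predicted:.0f} "
          f"(moves {predicted - observed:+.0f})")
```

A first score of 300 is expected to come back around 320, a move of 20 points, while a 550 is expected to slip only to about 545.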

In sum, here's some sound advice on how to decide whether to retake a test:

If you scored very high, relatively speaking, but not as high as you would like, it is probably not worth the trouble to take the test a second time.
If you scored very low (far below average), it is almost certain that you will score higher the second time. Try again. You might study a little this time, too.

Neil Salkind
