Hack 26. Give Raw Scores a Makeover | Statistics Hacks: Tips & Tools for Measuring the World and Beating the Odds

A raw score on a test has little or no meaning. Change that pitiful raw score to a "z score," though, and you will scarcely believe how much information is crammed into that one little super number.

It is surprising how little information is conveyed by that single raw score plastered at the top of something like a high school test. Here's what I mean. If I come home from school and tell my mom that I got a 16 on the big exam in school today, she'll probably say a few things, including "Why are you still living at home at age 42?" and "That's nice, dear. Is 16 good?"

When you just tell someone a raw score, very little real information has been shared. You don't know if 16 is good. You don't know if 16 is relatively high or low. Did most people get a 16 or higher, or did most people get something less than 16? Even if we know the range of scores on that test and the points possible and so on, we still can't compare performance on that test to performance on the past test or the next test or a test on some other subject. Raw scores are virtually meaningless.

Don't fret! You can still understand your performance and the performances of others. You can still make selection decisions and compare performance across people and across tests. There is still hope!

Raw scores can be changed into a new number that does all the things that the 97-pound weakling, the raw score, could never do. Raw scores can be transformed into a super number: a z score. Unlike a raw score, a z tells you whether the performance is above or below average, and how far above or below average it is. A z also allows you to compare performance across tests and occasions, and even between people.

Calculating z Scores

A z score is a raw score that has been transformed in such a way that the new number indicates how far above or below the mean the raw score is.

Here's the equation:

$$z = \frac{\text{raw score} - \text{mean}}{\text{standard deviation}}$$

To change a raw score into a z, subtract the mean from it and then divide by the standard deviation. The standard deviation of a distribution is, roughly speaking, the typical distance of each score from the mean [Hack #2].
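As a minimal sketch of that calculation in Python: the function name `z_score` and the list of class scores are made up for illustration, and the standard deviation is computed as a population standard deviation, which is an assumption about how the group is being treated.

```python
from statistics import mean, pstdev

def z_score(raw, mean_score, sd):
    """Return how many standard deviations `raw` lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean_score) / sd

# Hypothetical class results on the big exam (invented numbers, not from the text).
class_scores = [10, 12, 13, 14, 16, 19]

m = mean(class_scores)      # the class average
sd = pstdev(class_scores)   # population standard deviation of the class
z = z_score(16, m, sd)

print(f"mean = {m:.2f}, sd = {sd:.2f}, z for a raw score of 16 = {z:+.2f}")
```

With these invented scores, the 16 from the opening story turns out to be above the class average, which is exactly the kind of information the raw number alone could not convey.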

Understanding Performance

z scores typically take on a range of values between -3 and +3. Examine the top part of the z score equation and you might notice the following:

If the raw score is greater than the mean, the z will be positive.
If the raw score is below the mean, the z will be negative.
If the raw score is exactly the mean, the z will be 0.

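z scores tend to range between -3 and +3 because, in a normal distribution, nearly all scores (about 99.7 percent) fall within three standard deviations of the mean, so the distribution is effectively just six standard deviations wide [Hack #23].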
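The "six standard deviations wide" claim can be checked with a short sketch that uses `NormalDist` from Python's standard library (Python 3.8 or later). The helper name `share_within` is an illustrative choice, and the calculation assumes the scores are exactly normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} standard deviations: {share_within(k):.4f}")
```

Under that assumption, only about 3 scores in 1,000 land outside the -3 to +3 range.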

Smart measurement professionals use the z score trick when they report results. Instead of raw scores, they report scores based on z scores, known generically as standardized scores [Hack #27]. These standardized scores have known, stable characteristics. Therefore, if you know a scale's mean and standard deviation, you can turn its scores back into z scores and see how you did compared to other people.
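The round trip between z scores and a standardized scale might look like the sketch below. The scale used here (mean 500, standard deviation 100) is a hypothetical example chosen for easy arithmetic, and both function names are illustrative.

```python
def to_standardized(z, scale_mean, scale_sd):
    """Place a z score on a reporting scale with the given mean and standard deviation."""
    return scale_mean + z * scale_sd

def to_z(score, scale_mean, scale_sd):
    """Recover the z score from a standardized score on a known scale."""
    return (score - scale_mean) / scale_sd

# Hypothetical reporting scale: mean 500, standard deviation 100.
SCALE_MEAN, SCALE_SD = 500, 100

reported = to_standardized(1.5, SCALE_MEAN, SCALE_SD)
recovered = to_z(reported, SCALE_MEAN, SCALE_SD)

print(f"z = 1.5 is reported as {reported:.0f}; converting back gives z = {recovered:.2f}")
```

Because the conversion is a simple linear rescaling, no information is lost in either direction.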

To see how to use this formula to reveal hidden information about your performance, let's use the example of ACT tests. The American College Test is taken by juniors in many high schools across the U.S. and is required by many colleges for admission. It is a test of achievement and ability believed to predict performance in college.

Scores on any portion of the test range from 1 to 36. Though the actual test's descriptive statistics have drifted over the last few decades (as performance has improved), the official ACT mean is often reported as 18 with a standard deviation of 6. Imagine three students take the ACT and receive three different scores. We could use the mean and standard deviation from the ACT score distribution to transform them to z scores, as shown in Table 3-3.

Table 3-3. ACT raw scores and their z scores

Student    Raw score    z score
Zack       14           -0.67
Taylor     18           0.00
Isaac      24           +1.00

Zack's z is negative, so we know he scored below average. He scored about two-thirds of a standard deviation below the mean. Taylor's z of 0.00 means he performed exactly at the average compared to others who have taken the ACT over the years. Isaac did best, scoring a full standard deviation above the mean.

The actual ACT mean and standard deviation change every year the test is given. For the last few years, the real mean has been around 21 and the real standard deviation about 4.5.
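The conversions for the three students can be sketched as follows. The raw scores of 14, 18, and 24 are inferred from the z values quoted in the text (about -0.67, 0.00, and 1.00 with a mean of 18 and a standard deviation of 6), so treat them as assumptions. The helper name `z_table` is illustrative.

```python
def z_table(raw_scores, mean_score, sd):
    """Map each student's raw score to a z score for the given mean and standard deviation."""
    return {name: (raw - mean_score) / sd for name, raw in raw_scores.items()}

# Raw scores inferred from the z values discussed in the text (an assumption).
raw_scores = {"Zack": 14, "Taylor": 18, "Isaac": 24}

# Two sets of norms mentioned in the text: the often-reported figures and the recent ones.
norms = {
    "often-reported (mean 18, sd 6)": (18, 6),
    "recent (mean 21, sd 4.5)": (21, 4.5),
}

for label, (mean_score, sd) in norms.items():
    print(label)
    for name, z in z_table(raw_scores, mean_score, sd).items():
        print(f"  {name:<7} raw = {raw_scores[name]:>2}  z = {z:+.2f}")
```

Running the same raw scores against both sets of norms shows why the reference group matters: a raw score that sits exactly at the mean under one set of norms falls below the mean under the other.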

Identifying the Rarity of Your Performance

Though knowing how you scored in comparison with others who took the test is more useful than just knowing a raw score, the real interpretative power of z scores comes from their relationship to the normal curve. Figure 3-3 is a chart of the normal distribution, similar to the one shown in "See the Shape of Everything" [Hack #23].

Figure 3-3. z scores and the normal curve

The difference between the figure in "See the Shape of Everything" [Hack #23] and this one is that instead of showing the distance of each standard deviation from the mean, Figure 3-3 shows those values as z scores. By using knowledge of areas under the normal curve, you can learn even more from a z score. If the scores are normally distributed, there is a great deal you can say about the probability of scores in a certain range occurring.

The scores for the students shown in Table 3-3 can also be interpreted as the percentage of students they did better (or worse) than. Taylor's z of 0.00 means he did better than 50 percent of students. The kids' scores can also be expressed in a probabilistic sense. There was a 50 percent chance that Taylor would get a z of 0.00 or better. There is only about a 16 percent chance of getting a z of 1.00 or better on any test with normally distributed scores, so Isaac did well compared to other students who took the test.
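These percentages come from areas under the normal curve, which can be sketched with the standard library's `NormalDist` (Python 3.8 or later). The function names are illustrative, the z values are the ones discussed in the text, and the results hold only to the extent that the scores really are normally distributed.

```python
from statistics import NormalDist

def percent_below(z):
    """Percentage of a normal distribution that falls below the given z score."""
    return 100 * NormalDist().cdf(z)

def percent_at_or_above(z):
    """Percentage of a normal distribution at or above the given z score."""
    return 100 - percent_below(z)

# z scores discussed in the text for the three students.
z_scores = {"Zack": -0.67, "Taylor": 0.00, "Isaac": 1.00}

for name, z in z_scores.items():
    print(
        f"{name:<7} z = {z:+.2f}  "
        f"did better than {percent_below(z):.1f}%  "
        f"chance of this z or higher: {percent_at_or_above(z):.1f}%"
    )
```

The same two functions work for any z score, which is what makes the transformation so handy: once a score is expressed as a z, its rarity can be read straight off the normal curve.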

Why It Works

If converting raw scores to z scores so we can compare people to each other makes some sense to you, then you are not alone. For the last 100 years in the world of educational measurement, social scientists (and anyone who must evaluate human performance) have been attracted to the simplicity of norm-referenced interpretations. If we aren't sure what the score on a test really means, we can at least compare your score to how everyone else has done. We at least know whether you have more or less of whatever it is we just measured than other people have.

The alternative way to interpret educational and psychological scores is criterion-referenced. That approach requires knowing more about the trait or content that we have just measured and deciding beforehand how much is enough. Criterion-referenced measurement allows for everyone to get the same score as long as they meet the same criteria. Norm-referenced interpretation has been and continues to be the most popular interpretative method, while criterion-referenced interpretation has only recently started to catch on.