Hack 27. Standardize Scores

Surprisingly, none of those well-known high-stakes tests, such as the SAT or ACT or intelligence tests, ever reports your raw score. Instead, test reports have transformed that useless number into a more meaningful score, one that can be used to understand your performance compared to everyone else who ever took the same test. Once you understand "standardized" scores, you can calculate them yourself and even invent your own.

"Give Raw Scores a Makeover" [Hack #26] discusses the superpowers of z scores. These standardized scores take meaningless raw scores and add all sorts of information to them. That's all well and good, and anyone using this book can interpret z scores and make decisions based on that information.

If you want to interpret many score reports, though (such as those SAT results you just got), you will not see a z score reported anywhere, but instead some weirdo customized standardized score, used only by that company, which is kind of like a z score but different enough to be meaningless for the uninitiated.

Never fear. Here are the tools you need to both interpret these strange standardized scores and, if you want, even create your own (for when you report scores to other people from your own weirdo test that is just about to sweep the nation and make you as rich as Mr. ACT or Ms. IQ or whoever makes money from our test-based society).

Problems with z Scores

There is a certain, shall I say, ugliness to z scores that prevents their widespread use when reporting performance to test takers or their parents or the colleges and employers who are considering them. Instead, most test companies use the z score as the first step in creating a more attractive standardized score, which is then reported.

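A raw score is transformed into a z score using this formula:

$$z = \frac{X - \bar{X}}{s}$$

where X is the raw score, \bar{X} is the mean of all the raw scores, and s is their standard deviation.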

As described in greater detail in "Give Raw Scores a Makeover" [Hack #26], this equation creates z scores that tend to range between -3.00 and +3.00, with 0.00 as the average and a standard deviation equal to one. Though very useful as a tool for interpreting test performance, people don't like these numbers when they see them because of a few problems:

It can be negative. In fact, half of all z scores will be negative. It is hard to convince people who take tests that a negative score can be anything but bad news.
A score of 0.00 is the average score! If we can't explain to people that a negative number isn't necessarily a bad thing, imagine trying to convince parents that we expect little Billy to get zero on the big test and we are pleased when he does.
The highest score you can expect is a 3.00, and only about 1 out of 1,000 test takers will ever score that high. It seems like an awful lot of hard work in test preparation just to get a measly 3!

Measurement folks have searched for and found other standardized scales to report test performance that have more pleasing properties. The trick is to start with a z score, and then convert it onto some other scale with a mean and standard deviation that is friendlier.
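To see the first step in action, here is a minimal Python sketch that turns a set of raw scores into z scores. The ten raw scores are made up for illustration, and the sketch assumes the group itself is the whole comparison group (so it uses the population standard deviation); a real testing company would use its own norm group instead.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert raw scores to z scores using the group's own mean and SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD: assumes this group is the whole norm group
    return [(x - m) / sd for x in raw_scores]

# Hypothetical raw scores for ten test takers (invented for this example).
raw = [12, 15, 9, 20, 17, 14, 11, 18, 16, 13]
zs = z_scores(raw)

for x, z in zip(raw, zs):
    print(f"raw {x:>2} -> z {z:+.2f}")
```

Notice that every score below the group mean comes out negative, which is exactly the property that makes z scores awkward to report.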

Creating and Interpreting T Scores

One problem with z scores is that the mean is zero. Reporting zero as if it is an okay thing rubs some teachers, parents, and students the wrong way. We can solve that problem by moving back up the alphabet from a z to a T.

T scores are a transformation of z scores into a new distribution that has a mean of 50 and a standard deviation of 10. The equation for a T score runs the z score transformation backwards: multiply the z score by the new standard deviation, and then add the new mean. Here's the T score formula:

$$T = 50 + 10z$$

So, if little Billy's performance on a big test is average and he gets a z score of 0.00, instead of reporting that frightening score to his parents, we can transform it into a T:

$$T = 50 + 10(0.00) = 50$$

and report that Billy scored a 50. Congratulations! To make the score meaningful, a good teacher or school counselor would explain that T scores range from about 20 to 80, and 50 is average.
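If you have several z scores to convert, a short function saves the arithmetic. This is only a sketch: it assumes the standard T scale (mean 50, standard deviation 10), and the function name `t_score` is our own invention.

```python
def t_score(z):
    """Convert a z score to a T score (mean 50, standard deviation 10)."""
    return 50 + 10 * z

# A few z scores across the typical -3.00 to +3.00 range.
for z in (-3.0, -1.0, 0.0, 1.5, 3.0):
    print(f"z = {z:+.2f} -> T = {t_score(z):.0f}")
```

The extremes of the usual z range land on 20 and 80, which is where the "about 20 to 80" rule of thumb comes from.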

T scores are used on some test reports as a better alternative to z scores. Scores are essentially never negative (that would take a z score below -5.00), and the mean is a more substantial-seeming 50.

One popular test that reports scores using the T score distribution is the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), a psychological test that measures depression, schizophrenia, and so on. Mean scores on each MMPI-2 subscale are 50, with a standard deviation of 10. By putting each subtest score on the same scale, you can compare across traits and develop a profile of scores to understand the test taker more completely.

Creating Customized Standardized Scores

Test developers have found other ways of reporting standard scores. Table 3-4 lists many of the best-known high-stakes tests that most people have taken or will take someday.

Table 3-4. Common standardized score distributions
Test	Typical score range	Mean	Standard deviation
z scores	-3.00 to 3.00	0	1
T scores	20 to 80	50	10
American College Test (ACT)	1 to 36	18	6
SAT	200 to 800	500	100
Graduate Record Exam (GRE)	200 to 800	500	100
Graduate Management Admission Test (GMAT)	200 to 800	500	100
Law School Admission Test (LSAT)	120 to 180	150	10
Medical College Admission Test (MCAT)	1 to 15	8	2.5
Wechsler Intelligence Scales (IQ Test)	55 to 145	100	15
Stanford-Binet Intelligence Test (IQ Test)	52 to 148	100	16

Because performance on these tests is approximately normally distributed, you can interpret any of these scores by placing it against the normal curve and seeing whether your performance was average, unusually low, or unusually high [Hack #23].
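As a rough illustration of placing a score against the normal curve, the following Python sketch estimates the percentage of test takers at or below a given score. The means and standard deviations are copied from the table above; the `SCALES` dictionary and the `percentile` function are hypothetical names, and the result is only as good as the assumption that scores really follow a normal curve.

```python
from statistics import NormalDist

# (mean, standard deviation) pairs copied from the table above.
SCALES = {
    "T": (50, 10),
    "SAT": (500, 100),
    "LSAT": (150, 10),
    "Wechsler IQ": (100, 15),
}

def percentile(score, scale):
    """Approximate percent of test takers scoring at or below `score`,
    assuming scores on `scale` follow a normal curve."""
    mu, sigma = SCALES[scale]
    return 100 * NormalDist(mu, sigma).cdf(score)

print(f"SAT 500: {percentile(500, 'SAT'):.1f}th percentile")
print(f"Wechsler IQ 115: {percentile(115, 'Wechsler IQ'):.1f}th percentile")
```

A score at the mean of any of these scales sits at the 50th percentile, and a score one standard deviation above the mean sits at roughly the 84th.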

Create Your Own Standardized Score

For fun, you can create your own standardized score distribution with any mean and standard deviation you wish. Don't like your SAT score of 350? Transform it into a score within a distribution of your choosing.

Imagine, for example, that you'd prefer a distribution with a mean of 752,365 and a standard deviation of 216,456 (and who wouldn't?). Let's call this distribution the Frey Score Distribution. Generalizing the T score formula, you could transform your SAT score of 350 into a Frey score. Remember, you have to start with the z score for an SAT score of 350:

$$z = \frac{350 - 500}{100} = -1.50$$

and then transform it into a Frey score:

$$\text{Frey} = 752{,}365 + 216{,}456(-1.50) = 427{,}681$$

Now, doesn't a score of 427,681 sound better than a score of 350? Because you know the mean and standard deviation of the Frey distribution, the interpretation of both scores is the same: each is still below average, and each is still 1.5 standard deviations below its mean. You haven't changed reality, just the numbers you use to describe it.
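The same two-step conversion works between any pair of scales, so it can be wrapped in one small function. This is a sketch under the assumption that you know the mean and standard deviation of both scales; `rescale` is a name made up for this example.

```python
def rescale(score, old_mean, old_sd, new_mean, new_sd):
    """Move a score from one standardized scale to another via its z score."""
    z = (score - old_mean) / old_sd   # step 1: standardize
    return new_mean + new_sd * z      # step 2: place on the new scale

# An SAT score of 350 (mean 500, SD 100) on the Frey scale.
frey = rescale(350, 500, 100, 752_365, 216_456)
print(f"Frey score: {frey:,.0f}")

# The same SAT score expressed as a T score (mean 50, SD 10).
print(f"T score: {rescale(350, 500, 100, 50, 10):.0f}")
```

Whatever target scale you pick, the score stays 1.5 standard deviations below the mean; only the label changes.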

Why It Works

The distribution of z scores has a mean of 0 and a standard deviation of 1 because of the equation used. Subtracting the mean from each score in a distribution centers the new values around a mean of 0. Dividing each of those differences by the standard deviation then rescales them so that the standard deviation of the new distribution is 1.

If we want the scores we use to have a particular mean and standard deviation of our own choosing, we can take each z score and reverse engineer it, replacing the mean of 0 with anything we want and the standard deviation of 1 with anything we want.
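For readers who want to see the algebra, here is one way to write the argument out. The notation is our own: X_i are the n raw scores, \bar{X} is their mean, s is their standard deviation (computed by dividing by n), z_i is the z score for X_i, and M and D are whatever new mean and standard deviation you choose, with D positive.

```latex
\begin{aligned}
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n}\frac{X_i-\bar{X}}{s}
         = \frac{1}{s}\left(\frac{1}{n}\sum_{i=1}^{n}X_i-\bar{X}\right) = 0,\\
s_z^2   &= \frac{1}{n}\sum_{i=1}^{n} z_i^2
         = \frac{1}{s^2}\cdot\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2
         = \frac{s^2}{s^2} = 1,\\
Y_i     &= M + D\,z_i \;\Longrightarrow\;
           \bar{Y} = M + D\,\bar{z} = M,\qquad s_Y = D\,s_z = D.
\end{aligned}
```

The last line is the reverse engineering step: multiplying by D stretches the standard deviation from 1 to D, and adding M slides the mean from 0 to M.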

Understanding Norm-Referenced Scoring

We have talked about the information inherent in norm-referenced scoring and its intuitive appeal from a statistical perspective, but it is not the only way to produce meaningful scores, and it's not always the best method.

As discussed in "Give Raw Scores a Makeover" [Hack #26], there are really two philosophies from which you can choose when designing scoring systems and building tests:

Norm-referenced scoring: Driven by the philosophy that to best understand performance on a task (such as acting in a movie or taking the ACT), the level of performance for one person should be compared to how other people performed
Criterion-referenced scoring: Evaluates performance based on a set of criteria, such as a base of knowledge, a set of skills, instructional objectives, and diagnostic characteristics

If the norm-referenced approach makes sense to you, then you will want to use the tools presented here to interpret your performance on these common standardized tests.
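To make the contrast concrete, here is a small Python sketch that interprets the same raw score both ways. Everything in it is hypothetical: the group scores are invented, the 80 percent mastery cutoff is an arbitrary assumption, and the function names are ours.

```python
def norm_referenced(score, group_scores):
    """Percent of the comparison group scoring below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

def criterion_referenced(score, max_score, cutoff=0.80):
    """True if the score meets a fixed mastery cutoff (assumed here to be 80%)."""
    return score / max_score >= cutoff

# Hypothetical raw scores on a 20-point test.
group = [12, 15, 9, 20, 17, 14, 11, 18, 16, 13]

print(f"Scored higher than {norm_referenced(17, group):.0f}% of the group")
print(f"Met the mastery cutoff: {criterion_referenced(17, 20)}")
```

The first answer depends entirely on who else took the test; the second depends only on the standard that was set in advance.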