Because so many of the things we measure in the natural world share a known distributional shape, the "normal curve," we can use the precise details of that distribution to make predictions and answer all sorts of probability questions.
A variety of hacks in this book capitalize on statisticians' close personal relationship with the normal curve. "See the Shape of Everything" [Hack #23] shows how to use the normal curve to predict test performance in a general way. We can do better than that, though.
So much is known about the exact shape of this mystical curve that we can make precise predictions about the probability that scores in a certain range will be obtained. We can ask many other kinds of questions about test performance, and statistics can help us answer them before we ever take the test!
For these types of questions, a precise tool is needed. This hack provides that tool: a table of areas under the normal curve.
The Table of Areas Under the Normal Curve
The normal curve is defined by the mean and standard deviation of a distribution, and the shape of the curve is always the same, regardless of what we measure, as long as the scoring system allows scores to vary. The proportion of scores falling in any region beneath the curve, such as the region between the mean and a point a given number of standard deviations away from it, is known precisely.
This hack relies on a complicated-looking table, but it is so full of useful information that it will quickly become a primary tool in your hacker's toolbox. Without further ado, take a deep breath and look at Table 3-2.
Deciphering the Table
Before we use this nifty tool, we need to take a second deep breath and get the lay of the land. I have simplified the information in this table in a couple of ways. First, I have listed only a few of the values that could be computed. Many tables in statistics books list every value from a z of .00 to a z of 4.00 in increments of .01. That's a lot of information, so I have chosen to show only a glimpse of the most commonly needed values, including the z scores needed for 90 percent (1.65) and 95 percent (1.96) confidence intervals; see "Measure Precisely" [Hack #6] for more on confidence intervals.
I have also rounded the proportions to two decimal places. Finally, I used the symbol z in the table to indicate the distance from the mean in standard deviations. You can learn more about z scores in "Give Raw Scores a Makeover" [Hack #26].
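Where do 1.65 and 1.96 come from? A minimal sketch, assuming a two-sided interval and Python 3.8+ (for the standard library's `statistics.NormalDist`), computes the critical z for any confidence level by splitting the leftover probability equally between the two tails. The helper name `critical_z` is hypothetical, chosen for illustration.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-sided critical z for a confidence level such as 0.95 (hypothetical helper)."""
    # Half of the leftover probability goes in each tail of the curve.
    tail = (1 - confidence) / 2
    # inv_cdf returns the z below which (1 - tail) of the scores fall.
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

The unrounded 90 percent value is about 1.645, which tables commonly round to 1.65 (some list 1.64); the 95 percent value is about 1.96.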
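Now that you know how the table has been simplified, the first step toward using it to make probability predictions about performance or to answer statistical questions is to understand its four columns: the z score itself, the proportion of scores between the mean and z, the proportion of scores in the larger area (the side of z that contains the mean), and the proportion of scores in the smaller area (the tail beyond z).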
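Every row of a table like this can be regenerated from the normal cumulative distribution function. The sketch below is an illustration, not the book's own method; it assumes Python 3.9+ and assumes the column order z, mean-to-z, larger area, smaller area, with column labels taken from the names used elsewhere in this hack. The helper `table_row` is hypothetical.

```python
from statistics import NormalDist

def table_row(z: float) -> tuple[float, float, float, float]:
    """Return (z, mean-to-z, larger area, smaller area) for the standard normal curve."""
    smaller = 1 - NormalDist().cdf(abs(z))   # the tail beyond |z|
    between = 0.5 - smaller                  # from the mean (z = 0) out to z
    larger = 1 - smaller                     # everything on the mean's side of z
    return z, between, larger, smaller

# Print a few rows, rounded to two decimals like the table in this hack.
for z in (0.25, 1.00, 1.65, 1.96):
    print("  ".join(f"{value:.2f}" for value in table_row(z)))
```

Note that the "between" and "smaller area" columns always sum to .50, and the "larger" and "smaller" columns always sum to 1.00, which is a handy check when reading any printed table.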
Estimating the Chance of Scoring Above or Below Any Score
If you need to know your chances of getting into your college of choice, identify the score you need to beat on that school's admissions test, also known as the cut score. Once you know the score, find out the mean and standard deviation for the test. (All of this info is probably on the Web.) Convert the cut score to a z score [Hack #26], and then find that z score, or something close to it, in Table 3-2.
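Determine whether the cut score is above or below the mean. If the cut score is above the mean, your chance of scoring above it is shown in the "smaller area" column. If the cut score is below the mean, your chance of scoring above it is shown in the "larger area" column.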
For the chances of scoring below a given score, the process is the opposite of the options just mentioned. The chance of getting below a specific cut score that is below the mean is shown in the "smaller area" column. The chance of scoring below a given cut score that is above the mean is shown in the "larger area" column.
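The whole procedure (convert the cut score to z, then read the appropriate area) can be sketched in a few lines. This assumes scores really are normally distributed and uses Python's `statistics.NormalDist`; the function name `chances` and the test figures (mean 500, standard deviation 100, cut score 600) are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def chances(cut_score: float, mean: float, sd: float) -> tuple[float, float]:
    """Return (chance of scoring above, chance of scoring below) a cut score."""
    z = (cut_score - mean) / sd          # raw score -> z score
    below = NormalDist().cdf(z)          # area to the left of z
    return 1 - below, below              # above + below = 1

# Hypothetical admissions test: mean 500, SD 100, cut score 600 (z = 1.00).
above, below = chances(600, mean=500, sd=100)
print(f"above: {above:.2f}, below: {below:.2f}")  # prints: above: 0.16, below: 0.84
```

Because the cut score here is above the mean, the "above" answer matches the smaller-area column for z = 1.00 and the "below" answer matches the larger-area column.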
Estimating the Chance of Scoring Between Any Two Scores
The chance of getting a score within any range can be determined by looking at the proportion of scores that normally fall in that range.
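If you want to know what proportion of scores falls between any two points under the curve, define those points by their z scores and figure out the relevant proportion. Depending on whether both scores fall on the same side of the mean, one of two methods will give you the correct proportion between those points. If both scores are on the same side of the mean, subtract the smaller "Proportion of scores between the mean and z" from the larger one. If the scores are on opposite sides of the mean, add their two "between the mean and z" proportions together.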
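The two methods (subtract when both points are on the same side of the mean, add when they straddle it) can be sketched as follows. This is an illustration using Python's `statistics.NormalDist`; `mean_to_z` and `between` are hypothetical helper names that mimic reading the "between the mean and z" column.

```python
from statistics import NormalDist

def mean_to_z(z: float) -> float:
    """Proportion of scores between the mean and z (always non-negative)."""
    return abs(NormalDist().cdf(z) - 0.5)

def between(z1: float, z2: float) -> float:
    """Proportion of scores between two z scores, using the two table methods."""
    if z1 * z2 >= 0:
        # Same side of the mean (or one point at the mean): subtract.
        return abs(mean_to_z(z1) - mean_to_z(z2))
    # Opposite sides of the mean: add the two pieces.
    return mean_to_z(z1) + mean_to_z(z2)

print(f"{between(-1.0, 1.0):.4f}")  # straddles the mean
print(f"{between(1.0, 2.0):.4f}")   # both above the mean
```

Both branches agree with the direct calculation `cdf(z_high) - cdf(z_low)`; the two-method version simply mirrors what you would do by hand with a table that lists only positive z values.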
Producing Percentile Ranks
A third use of the table is to compute percentile ranks. You can read more about such norm-referenced scores in "Produce Percentiles" [Hack #24]. For scores above the mean, the percentile rank is "Proportion of scores between the mean and z" plus .50. For scores below the mean, the percentile rank is "Proportion of scores in the smaller area."
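The two percentile rules can be written directly as code. This sketch uses Python's `statistics.NormalDist` to stand in for the table lookup; the function name `percentile_rank` is hypothetical.

```python
from statistics import NormalDist

def percentile_rank(z: float) -> float:
    """Percentile rank of a z score, following the two table rules."""
    smaller = 1 - NormalDist().cdf(abs(z))   # "smaller area" column
    between = 0.5 - smaller                  # "between the mean and z" column
    if z >= 0:
        return 100 * (between + 0.5)         # above the mean: mean-to-z plus .50
    return 100 * smaller                     # below the mean: the smaller area

print(round(percentile_rank(1.0)), round(percentile_rank(-1.0)))  # prints: 84 16
```

A score one standard deviation above the mean sits at about the 84th percentile, and one standard deviation below sits at about the 16th.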
Determining Statistical Significance
Another use for these sorts of tables is to assign statistical significance [Hack #4] to differences in scores. By knowing the proportion of scores that fall a given distance from the mean or farther, you can assign a statistical probability to an outcome at least that extreme.
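As an illustration, the probability of a z at least as extreme as an observed one can be computed from the smaller-area tail. This sketch assumes a two-tailed test (a difference in either direction counts) and the conventional .05 cutoff; whether one or two tails is appropriate depends on your hypothesis. The helper `two_tailed_p` is hypothetical.

```python
from statistics import NormalDist

def two_tailed_p(z: float) -> float:
    """Probability of a z at least this far from 0 in either direction."""
    # Double the smaller-area tail to cover both sides of the curve.
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.00, 1.65, 2.58):
    p = two_tailed_p(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:.2f}: p = {p:.3f} ({verdict} at .05)")
```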
More usefully, other statistical values such as correlations and proportions can be converted to z scores, and this table can be used to compare those values to zero or to each other.
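One well-known way to turn correlations into z scores is the Fisher r-to-z transformation, which the text doesn't name but which is a standard technique for comparing two correlations from independent samples. The sketch below assumes independent samples of more than three cases each; the function name and the sample values (r = .50 and r = .30, each with n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Fisher r-to-z test for two independent correlations; returns (z, two-tailed p)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher transformation of each r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of the difference
    z = (z1 - z2) / se                             # difference expressed as a z score
    p = 2 * (1 - NormalDist().cdf(abs(z)))         # look the z up on the normal curve
    return z, p

# Hypothetical samples: r = .50 with n = 100 versus r = .30 with n = 100.
z, p = compare_correlations(0.50, 100, 0.30, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Once the difference is a z score, the same normal-curve areas used throughout this hack give its probability.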
Why It Works
"See the Shape of Everything" [Hack #23] provides a good picture of the normal curve. However, just by looking at the way these values change in Table 3-2, you can get a good sense of the normal distribution's shape. Near the mean, where the rows have smaller z scores, a goodly proportion of scores will fall. As you move further and further away from the mean, it takes wider and wider stretches of the z scale to contain the same proportion of scores.
For example, it takes a jump from a z of 1.65 to 4 just to cover that last 5 percent of the distribution. Near the mean, though, it requires only a jump from z = .12 to z = .25 to cover 5 percent of scores. The table demonstrates how common it is to be common and how rare it is to be scarce.
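The two 5 percent claims can be checked numerically. This sketch uses Python's `statistics.NormalDist` to compute the area between two z scores; the helper name `area` is hypothetical.

```python
from statistics import NormalDist

def area(z_lo: float, z_hi: float) -> float:
    """Proportion of the standard normal curve between z_lo and z_hi."""
    nd = NormalDist()
    return nd.cdf(z_hi) - nd.cdf(z_lo)

print(f"z 1.65 to 4.00: {area(1.65, 4.0):.4f}")   # a wide stretch out in the tail
print(f"z 0.12 to 0.25: {area(0.12, 0.25):.4f}")  # a narrow stretch near the mean
```

Both areas come out at roughly .05, even though the first span is about 18 times wider than the second on the z scale.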