Chapter 58: The Normal Random Variable | Microsoft® Office Excel® 2007: Data Analysis and Business Modeling

Overview

What are the properties of the normal random variable?
How do I use Excel to find probabilities for the normal random variable?
Can I use Excel to find percentiles for normal random variables?
Why is the normal random variable appropriate in many real-world situations?

What are the properties of the normal random variable?
In Chapter 55, “An Introduction to Random Variables,” you learned that continuous random variables can be used to model quantities such as the following:
- Price of Microsoft stock one year from now
- Market share for a new product
- Market size for a new product
- Cost of developing a new product
- Newborn baby’s weight
- Person’s IQ
Remember that if a discrete random variable (such as sales of blazers during 2006) can assume many possible values, we can approximate the value by using a continuous random variable as well. As I described in Chapter 55, any continuous random variable X has a probability density function (pdf). The pdf for a continuous random variable is a nonnegative function with the following properties (a and b are arbitrary numbers).
- The area under the pdf is 1.
- The probability that X<a equals the probability that X≤a. This probability is represented by the area under the pdf to the left of a.
- The probability that X>b equals the probability that X≥b. This probability is shown in the area under the pdf to the right of b.
- The probability that a<X<b equals the probability that a≤X≤b. This probability is the area under the pdf between a and b.
Thus, the area under a continuous random variable’s pdf represents probability. Also, the larger the value of the density function at X, the more likely the random variable will take on a value near X. For example, if the density function of a random variable at 20 is twice the density function of the random variable at 5, then the random variable is twice as likely to take on a value near 20 as a value near 5. For a continuous random variable, the probability that X equals a will always equal 0. For example, some people are from 5.99999 feet through 6.00001 feet tall, but no person can be exactly 6 feet tall. This explains why we can replace the less than sign (<) with the less than or equal to sign (≤) in our probability statements.
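The pdf properties above can be checked numerically. The following Python sketch is not part of the chapter's Excel files; it assumes a hypothetical density f(x)=2x on the interval from 0 through 1, chosen only because its areas are easy to verify by hand, and it approximates areas with the midpoint rule.

```python
def pdf(x):
    """Hypothetical density for illustration: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


# Property 1: the total area under the pdf is 1.
total_area = area(0.0, 1.0)

# The probability that a < X < b is the area between a and b.
p_between = area(0.2, 0.5)

# The density at 0.8 is twice the density at 0.4, so a value near 0.8
# is about twice as likely as a value near 0.4.
ratio_near = area(0.79, 0.81) / area(0.39, 0.41)

# The probability of landing in an ever smaller interval around 0.5 shrinks
# toward 0, which is why the probability that X equals 0.5 exactly is 0.
shrinking = [area(0.5 - eps, 0.5 + eps) for eps in (0.1, 0.01, 0.001)]

print(total_area, p_between, ratio_near, shrinking)
```

For this particular density the exact answers are 1, 0.21, 2, and the sequence 0.2, 0.02, 0.002, so the printed values should agree with them up to rounding error.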
Figure 58-1 displays the pdf for X=IQ of a randomly chosen person. The area under this pdf is 1. If we want to find the probability that a person’s IQ is less than or equal to 90, we simply find the area to the left of 90. If we want to find the probability that a person’s IQ is from 95 through 120, we find the area under the pdf from 95 through 120. If we want to find the probability that a person’s IQ is more than 130, we find the area under the density function to the right of 130.

Figure 58-1: IQ pdf
Actually, the density sketched in Figure 58-1 is an example of the normal random variable. The normal random variable is specified by its mean and standard deviation. IQs follow a normal random variable with μ=100 and σ=15. This is the pdf displayed in Figure 58-1. The normal random variable has the following properties:
- The most likely value of a normal random variable is μ (as indicated by the pdf peaking at 100 in Figure 58-1).
- As the value x of the random variable moves away from μ, the probability that the random variable is near x sharply decreases.
- The normal random variable is symmetric about its mean. For example, IQs near 80 are as likely as IQs near 120.
- A normal random variable has 68 percent of its probability within σ of its mean, 95 percent within 2σ of its mean, and 99.7 percent within 3σ of its mean. These measures should remind you of the rule of thumb I described in Chapter 37, “Summarizing Data by Using Descriptive Statistics.” In fact, the rule of thumb is based on the assumption that data is “sampled” from a normal distribution, which explains why the rule of thumb does not work as well when the data fails to exhibit a symmetric histogram.
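If you want to verify the symmetry and the 68, 95, and 99.7 percent figures outside Excel, the following Python sketch uses the standard library's statistics.NormalDist class with the IQ parameters from the text. The helper name prob_within is my own and is only illustrative.

```python
from statistics import NormalDist

# IQ model from the text: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def prob_within(dist, k):
    """Probability that the variable falls within k standard deviations of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


# Symmetry: the density at 80 matches the density at 120.
print(iq.pdf(80), iq.pdf(120))

# Probability within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(iq, k):.4f}")
```

The three probabilities come out near 0.6827, 0.9545, and 0.9973, and they do not depend on the particular mean and standard deviation chosen.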
For a larger σ, a normal random variable is more spread out about its mean. This pattern is illustrated in Figures 58-2 and 58-3.

Figure 58-2: Normal random variable pdf with a mean equal to 60 and a standard deviation equal to 5

Figure 58-3: Normal random variable pdf with a mean equal to 60 and a standard deviation equal to 15
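The effect of σ on the spread can also be seen numerically. This Python sketch (an illustration, not taken from the chapter's workbook) uses the two distributions from Figures 58-2 and 58-3 and compares how much probability each places within 5 units of the mean and how tall each pdf is at its peak.

```python
from statistics import NormalDist

narrow = NormalDist(mu=60, sigma=5)   # parameters of Figure 58-2
wide = NormalDist(mu=60, sigma=15)    # parameters of Figure 58-3


def prob_between(dist, a, b):
    """Area under the pdf of dist between a and b."""
    return dist.cdf(b) - dist.cdf(a)


# Probability of a value from 55 through 65 under each distribution.
p_narrow = prob_between(narrow, 55, 65)
p_wide = prob_between(wide, 55, 65)

# Height of each pdf at the mean; the wider distribution has the lower peak.
peak_narrow = narrow.pdf(60)
peak_wide = wide.pdf(60)

print(p_narrow, p_wide, peak_narrow, peak_wide)
```

With σ=5 roughly 68 percent of the probability lies from 55 through 65, while with σ=15 only about 26 percent does, and the peak of the pdf is one-third as high.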
How do I use Excel to find probabilities for the normal random variable?
Consider a normal random variable X with a mean μ and standard deviation σ. Suppose for any number x, we want to find the probability that X≤x, which is called the normal cumulative function. To find the probability in Microsoft Office Excel 2007 that X≤x, simply enter the formula NORMDIST(x,μ,σ,1). Of course the fourth argument of 1 could be replaced by TRUE.
The argument 1 tells Excel to compute the normal cumulative. If the last argument of the function is 0, Excel returns the value of the normal pdf at x rather than a probability.
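For readers who want to reproduce NORMDIST outside Excel, here is a minimal Python sketch. The function normdist is a hypothetical helper that I wrote to mirror the argument order of NORMDIST; it is built on the standard library's statistics.NormalDist and is not an Excel or Microsoft API.

```python
from statistics import NormalDist


def normdist(x, mean, sd, cumulative):
    """Hypothetical stand-in for Excel's NORMDIST(x, mean, sd, cumulative).

    cumulative=True (or 1) returns the probability that X <= x;
    cumulative=False (or 0) returns the height of the pdf at x.
    """
    dist = NormalDist(mean, sd)
    return dist.cdf(x) if cumulative else dist.pdf(x)


# Half of the probability lies at or below the mean.
print(normdist(100, 100, 15, True))

# The height of the pdf at the mean is a density, not a probability.
print(normdist(100, 100, 15, False))
```

The first call returns 0.5. The second returns about 0.0266, which is the height of the curve in Figure 58-1 at its peak.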
We can use the NORMDIST function to answer many questions concerning normal probabilities. You can find examples in the file Normalexamples.xlsx, which is shown in Figure 58-4, and in the following three scenarios.

Figure 58-4: Calculating normal probability
What fraction of people have an IQ of less than 90? Let X equal the IQ of a randomly chosen person. Then we seek the probability that X<90, which is equal to the probability that X≤90. Therefore, we can enter into cell C3 of the Normal worksheet the formula NORMDIST(90,100,15,1), and Excel returns 0.252. Thus, 25.2 percent of all people have an IQ less than 90.
What fraction of all people have IQs from 95 through 120? When finding the probability that a≤X≤b, we use the form (area under the normal density function to the left of b)–(area under normal density function to the left of a). Thus, we can find the probability that a≤X≤b by entering the formula NORMDIST(b,μ,σ,1)–NORMDIST(a,μ,σ,1). This fact is illustrated in Figure 58-5, where clearly the shaded area is (area to left of b)–(area to left of a). You can answer the question about IQs from 95 through 120 by entering into cell C4 of the worksheet Normal the formula NORMDIST(120,100,15,1)–NORMDIST(95,100,15,1). Excel returns the probability 0.539. So, 53.9 percent of all people have an IQ from 95 through 120.

Figure 58-5: Calculating the probability that a random variable is between a and b
What fraction of all people have IQs of at least 130? To find the probability that X≥b, we note from Figure 58-6 that the probability that X≥b equals 1–probability X<b. We can compute the probability that X≥b by entering the formula 1–NORMDIST(b,μ,σ,1). We seek the probability that X≥130. This equals 1–probability X<130. We enter in cell C5 of the worksheet Normal the formula 1–NORMDIST(130,100,15,1). Excel returns 0.023, so we know that 2.3 percent of people have an IQ of at least 130.

Figure 58-6: Calculating the probability that a random variable is greater than or equal to b
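The three IQ scenarios can be cross-checked in Python. This sketch assumes the same model as the text (mean 100, standard deviation 15) and uses statistics.NormalDist in place of NORMDIST; the variable names are my own.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Scenario 1: probability that X <= 90 (area to the left of 90).
p_below_90 = iq.cdf(90)

# Scenario 2: probability that 95 <= X <= 120
# (area to the left of 120 minus area to the left of 95).
p_95_to_120 = iq.cdf(120) - iq.cdf(95)

# Scenario 3: probability that X >= 130 (1 minus area to the left of 130).
p_at_least_130 = 1 - iq.cdf(130)

print(f"{p_below_90:.3f} {p_95_to_120:.3f} {p_at_least_130:.3f}")
```

The results should round to the 0.252, 0.539, and 0.023 that Excel returns in cells C3, C4, and C5.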
Can I use Excel to find percentiles for normal random variables?
Consider a given normal random variable X with mean μ and standard deviation σ. In many situations, we want to answer questions such as the following:
- A drug manufacturer believes that next year’s demand for its popular antidepressant will be normally distributed, with mean equal to 60 million days of therapy (DOT) and sigma equal to 5 million DOT. How many units of the drug should be produced this year if the company wants to have only a 1 percent chance of running out of the drug?
- Family income in Bloomington, Indiana, is normally distributed, with mean equal to $30,000 and sigma equal to $8,000. The poorest 10 percent of all families in Bloomington are eligible for federal aid. What should the aid cutoff be?
In our first example, we want to determine the 99th percentile of demand for the antidepressant. That is, we seek the number x such that there is only a 1 percent chance that demand will exceed x and a 99 percent chance that demand will be less than x. In our second example, we want the 10th percentile of family income in Bloomington. That is, we seek the number x such that there is only a 10 percent chance that family income will be less than x and a 90 percent chance that family income will exceed x.
Suppose we want to find the pth percentile (with p expressed as a decimal) of a normal random variable X with mean μ and standard deviation σ. Simply enter the formula NORMINV(p,μ,σ) into Excel. This formula returns the number x with the property that the probability that X≤x equals p, as we want. We now can solve our examples. You’ll find these exercises on the Normal worksheet in the file Normalexamples.xlsx.
For the drug manufacturing example, let X equal annual demand for the drug. We want a value x such that the probability that X≥x equals 0.01 or the probability that X<x equals 0.99. Again, we seek the 99th percentile of demand, which we find (in millions) by entering in cell C7 the formula NORMINV(0.99,60,5). Excel returns 71.63, so the company must produce 71,630,000 DOT. This assumes, of course, that the company begins the year with no supply of the drug on hand. If, for example, they had a beginning inventory of 10 million DOT, they would need to produce 61,630,000 DOT during the current year.
To determine the cutoff for federal aid, if X equals the income of a Bloomington family, we seek a value of x such that the probability that X≤x equals 0.10, or the 10th percentile of Bloomington family income. We find this value with the formula NORMINV(0.10,30000,8000). Excel returns $19,747.59, so aid should be given to all families with incomes smaller than $19,747.59.
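Both percentile examples can be reproduced in Python with the inv_cdf method of statistics.NormalDist, which plays the role of NORMINV here. This is a sketch under the same assumptions as the text; the 10 million DOT beginning inventory is the illustrative figure from the drug example.

```python
from statistics import NormalDist

# Drug demand in millions of DOT: mean 60, standard deviation 5.
demand = NormalDist(mu=60, sigma=5)
demand_p99 = demand.inv_cdf(0.99)   # 99th percentile of demand

# Production needed if 10 million DOT are already in inventory.
beginning_inventory = 10
production_needed = demand_p99 - beginning_inventory

# Family income in dollars: mean 30,000, standard deviation 8,000.
income = NormalDist(mu=30_000, sigma=8_000)
aid_cutoff = income.inv_cdf(0.10)   # 10th percentile of income

print(f"{demand_p99:.2f} {production_needed:.2f} {aid_cutoff:.2f}")
```

As a check, demand.cdf(demand_p99) returns 0.99 up to rounding, which confirms that inv_cdf simply inverts the cumulative probability.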
Why is the normal random variable appropriate in many real-world situations?
A well-known mathematical result called the Central Limit Theorem tells us that if we add together many (usually at least 30 is sufficient) independent random variables, their sum is approximately normally distributed. This result holds true even if the individual random variables are not normally distributed. Many quantities (such as measurement errors) are created by adding together many independent random variables, which explains why the normal random variable occurs often in the real world. Here are some other situations in which we can use the Central Limit Theorem.
- The total demand for pizzas during a month at a supermarket is normally distributed, even if the daily demand for pizzas is not.
- The amount of money we win if we play craps 1000 times is normally distributed, even though the amount of money we win on each individual play is not.
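A small simulation can make the Central Limit Theorem more concrete. The Python sketch below is only an illustration: it assumes each individual quantity follows an exponential distribution with mean 1 (a strongly skewed, clearly non-normal choice of mine, not one from the text), adds 30 of them, and checks how closely the sums follow the 68 and 95 percent figures of a normal random variable.

```python
import random
from math import sqrt

rng = random.Random(58)   # fixed seed so the run is repeatable
N_TERMS = 30              # number of independent quantities added together
TRIALS = 20_000           # number of simulated sums


def simulate_sum():
    """Add N_TERMS independent exponential draws, each with mean 1 and variance 1."""
    return sum(rng.expovariate(1.0) for _ in range(N_TERMS))


sums = [simulate_sum() for _ in range(TRIALS)]

# For independent terms, means add and variances add.
mean_of_sum = N_TERMS * 1.0
sd_of_sum = sqrt(N_TERMS * 1.0)

# Share of simulated sums within 1 and 2 standard deviations of the mean.
share_within_1sd = sum(abs(s - mean_of_sum) <= sd_of_sum for s in sums) / TRIALS
share_within_2sd = sum(abs(s - mean_of_sum) <= 2 * sd_of_sum for s in sums) / TRIALS

print(share_within_1sd, share_within_2sd)
```

The two shares should land close to 0.68 and 0.95, even though a single exponential draw looks nothing like a bell curve. The match is approximate, not exact, and it improves as more terms are added.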
Another important mathematical result tells us how to find the mean, variance, and standard deviations of sums of independent random variables. If we are adding together independent random variables X₁, X₂,…, X_n, where mean X_i=μ_i, and standard deviation X_i=σ_i, then the following are true:
1. Mean (X₁+X₂+…+X_n)=μ₁+μ₂+…+μ_n
2. Variance (X₁+X₂+…+X_n)=σ₁²+σ₂²+…+σ_n²
3. Standard deviation (X₁+X₂+…+X_n)=√(σ₁²+σ₂²+…+σ_n²)
We note that 1 is true even when the random variables are not independent. By combining 1 through 3 with the Central Limit Theorem, we can solve many complex probability problems, such as the demand for pizza. Our pizza solution is in the Central Limit worksheet in the file Normalexamples.xlsx, which is shown in Figure 58-7.

Figure 58-7: Using the Central Limit Theorem
Even though the daily demand for frozen pizzas is not normally distributed, we know from the Central Limit Theorem that the 30-day demand for frozen pizzas is approximately normally distributed. With a daily demand that has a mean of 45 pizzas and a standard deviation of 12 pizzas, 1 through 3 above imply the following:
- From 1, the mean of a 30-day demand equals 30(45)=1350.
- From 2, the variance of a 30-day demand equals 30(12)²=4320.
- From 3, the standard deviation of a 30-day demand equals √4320=65.73.
Thus, the 30-day demand for pizzas can be modeled as a normal random variable with a mean of 1350 and a standard deviation of 65.73. In cell D11, I compute the probability that at least 1400 pizzas are sold as the probability that our normal approximation is at least 1399.5 (note that a demand of 1399.6, for example, would round up to 1400) with the formula 1–NORMDIST(1399.5,D7,D9,TRUE). We find that the probability that at least 1400 pizzas are demanded in a 30-day period is 22.6 percent.
The number of pizzas that we must stock to have only a 1 percent chance of running out of pizzas is just the 99th percentile of our demand distribution. We determine the 99th percentile of our demand distribution (1503) in cell D12 with the formula NORMINV(0.99,D7,D9). Therefore, at the beginning of a month, we should bring our stock of pizzas up to 1503 if we want to have only a 1 percent chance of running out of pizzas.
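The whole pizza calculation can be reproduced in a few lines of Python. This sketch assumes the daily mean of 45 and daily standard deviation of 12 used in the calculations above; sum_of_independent is a hypothetical helper of mine that applies rules 1 through 3, and statistics.NormalDist stands in for NORMDIST and NORMINV.

```python
from math import sqrt
from statistics import NormalDist


def sum_of_independent(means, sds):
    """Mean, variance, and standard deviation of a sum of independent random variables."""
    total_mean = sum(means)                    # rule 1: means add
    total_var = sum(sd ** 2 for sd in sds)     # rule 2: variances add
    return total_mean, total_var, sqrt(total_var)   # rule 3: root of the variance


DAYS = 30
mean30, var30, sd30 = sum_of_independent([45] * DAYS, [12] * DAYS)

# Normal approximation to 30-day demand, justified by the Central Limit Theorem.
demand30 = NormalDist(mean30, sd30)

# Probability of at least 1400 pizzas, using 1399.5 because demand is a whole number.
p_at_least_1400 = 1 - demand30.cdf(1399.5)

# Stock level that leaves only a 1 percent chance of running out.
stock_99 = demand30.inv_cdf(0.99)

print(mean30, var30, round(sd30, 2), round(p_at_least_1400, 3), round(stock_99))
```

The output should match the worksheet: a mean of 1350, a variance of 4320, a standard deviation of 65.73, a probability of about 0.226, and a stock level of 1503.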