DISTRIBUTIONS THAT ARE NOT NORMAL
The normal distribution is often used as a reference for describing other distributions. A distribution is called skewed if it is not symmetric but instead has more cases (more of a "tail") toward one end of the distribution than the other. If the long tail is toward larger values, the distribution is called positively skewed, or skewed to the right. If the tail is toward smaller values, the distribution is negatively skewed, or skewed to the left. A variable such as income has a positively skewed distribution. That is because some incomes are very much above average and make a long tail to the right. Since incomes are rarely less than zero, the tail to the left is not so long.
If a larger proportion of cases fall into the tails of a distribution than into those of a normal distribution, the distribution has positive kurtosis. If fewer cases fall into the tails, the distribution has negative kurtosis. You can compute statistics that measure how much skewness and kurtosis a distribution has in comparison to a normal distribution. These statistics are zero if the distribution is exactly normal. Positive values for kurtosis indicate that the tails of a distribution are heavier than those of a normal distribution. Negative values indicate that a distribution has lighter tails than a normal distribution does. Of course, the measures of skewness and kurtosis for samples from a normal distribution will not be exactly zero. Because of variation from sample to sample, they will fluctuate around zero. To use the computer for the calculations, one only needs to issue the appropriate command, and the computer does the rest.
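The skewness and kurtosis statistics described above can be sketched with the standard moment-based formulas. This is a minimal illustration in plain Python; statistical packages often report slightly different bias-corrected versions of these estimators, but the behavior is the same: values near zero for normal data, a clearly positive skewness for a long-tailed variable such as income (here an exponential sample stands in for income).

```python
import random
import statistics

def skewness(xs):
    """Moment-based skewness: near zero for a symmetric distribution,
    positive when the long tail points toward larger values."""
    n = len(xs)
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (n * sd ** 3)

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores
    near zero; positive values mean heavier-than-normal tails."""
    n = len(xs)
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (n * sd ** 4) - 3

random.seed(1)
normal_sample = [random.gauss(0, 1) for _ in range(100_000)]
skewed_sample = [random.expovariate(1) for _ in range(100_000)]  # long right tail

print(skewness(normal_sample))        # fluctuates around zero
print(excess_kurtosis(normal_sample)) # fluctuates around zero
print(skewness(skewed_sample))        # clearly positive
print(excess_kurtosis(skewed_sample)) # clearly positive (heavy right tail)
```

Note that even for the truly normal sample the printed values are not exactly zero, which is the sample-to-sample fluctuation mentioned above.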
MORE ON THE DISTRIBUTION OF THE MEANS
It is understandable that certain variables, such as height and weight, have distributions that are approximately normal. We know that most of the world is pretty close to average and that the farther we move from average, the fewer people we find. But why does the distribution of sample means look like a normal distribution? This fact is explained by the Central Limit Theorem. The Central Limit Theorem says that for samples of a sufficiently large size, the real distribution of means is almost always approximately normal. The original variable can have any kind of distribution. It does not have to be bell-shaped in the least. ("Real" distribution means the one you would get if you took an infinite number of random samples. The "real" distribution is a mathematical concept. You can get a pretty good idea of what the "real" distribution looks like by taking a lot of samples and examining plots of their values, as we have been doing.)

Sufficiently large? What kind of language is that for a mathematical theorem? Actually, our statement of the Central Limit Theorem has several vague terms. You have to say what you are willing to consider "approximately normal" before you know what size sample is "sufficiently large." How large a sample you need depends on the way the variable is distributed. The important point is that the distribution of means gets closer and closer to normal as the sample size gets larger and larger, regardless of what the distribution of the original variable looks like. Ultimately, the means will look like a normal distribution. That is why the normal distribution is so important in data analysis. Your variable does not have to be normally distributed. Means that you calculate from samples will be normally distributed regardless. If the variable you are studying actually does have a normal distribution, then the distribution of means will be normal for samples of any size. The further from normal the distribution of your variable is, the larger the samples have to be for the distribution of the means to be approximately normal. This is a fundamental assumption under which Statistical Process Control charting operates.
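The Central Limit Theorem can be seen directly in a simulation sketch. Here a strongly skewed parent distribution (exponential, chosen only as an illustration) is sampled repeatedly; as the sample size grows, the skewness of the distribution of sample means shrinks toward zero, i.e., the means look more and more normal.

```python
import random
import statistics

def skewness(xs):
    """Moment-based skewness; near zero for an approximately normal distribution."""
    n = len(xs)
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (n * sd ** 3)

random.seed(2)

# Parent distribution: exponential, strongly skewed, not at all bell-shaped.
def sample_mean(n):
    return statistics.fmean(random.expovariate(1) for _ in range(n))

# For each sample size n, draw many samples and look at how skewed
# the resulting distribution of sample means is.
results = {}
for n in (2, 10, 50):
    means = [sample_mean(n) for _ in range(20_000)]
    results[n] = skewness(means)
    print(n, round(results[n], 2))  # skewness shrinks as n grows
```

The printed skewness decreases steadily with the sample size, even though the parent distribution never changes, which is exactly the "closer and closer to normal" behavior described above.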