Statistical inference based on normal distribution.
Estimation techniques based on normal distribution.
Real data distribution may not be normal.
Work with the means of sample clusters, not the individual values X_i.
CLT uses the normal distribution to infer the population parameters: mean μ and variance σ².
Mathematically, the mean of the cluster means may be represented by
$$\mu_{X_m} = \mu$$
whereas the variance of the means is represented as
$$\sigma^2_{X_m} = \frac{\sigma^2}{n}$$
where n = number of individual samples in a subject or cluster. If there are multiple clusters, then M = total number of clusters and nM = N = total number of individual samples.
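The two relations above can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical exponential population with rate 1 (so μ = 1 and σ² = 1), and the helper names `cluster_means` and `draw_exponential` are illustrative only.

```python
import random
import statistics

def cluster_means(population_draw, n, M, seed=0):
    """Draw M clusters of n samples each and return the M cluster means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(population_draw(rng) for _ in range(n))
        for _ in range(M)
    ]

def draw_exponential(rng):
    """Assumed population: exponential with rate 1, so mu = 1 and sigma^2 = 1."""
    return rng.expovariate(1.0)

n, M = 25, 20000  # cluster size and number of clusters (N = n * M samples)
means = cluster_means(draw_exponential, n, M)

mean_of_means = statistics.fmean(means)
variance_of_means = statistics.pvariance(means)

print(f"mean of means     = {mean_of_means:.4f} (theory: mu = 1.0)")
print(f"variance of means = {variance_of_means:.4f} (theory: sigma^2 / n = {1.0 / n:.4f})")
```

With these assumed settings the simulated mean of means should land close to 1 and the variance of the means close to 1/25, even though the exponential population is strongly skewed.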
For a cluster of n samples, we can use the standard normal distribution (SND) to determine:
The probabilities of the sample average, or,
The required number of samples, n, in a cluster such that its observed mean X_m is within a specified range around the true population mean μ.
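Both uses of the standard normal distribution listed above can be sketched in a few lines. This is an illustrative example under stated assumptions: the population standard deviation σ is taken as known, the tolerance `delta` and the numbers below are hypothetical, and the function names are not from the original notes.

```python
import math
from statistics import NormalDist

def prob_mean_within(sigma, n, delta):
    """P(|X_m - mu| <= delta) for a cluster of n samples (normal approximation)."""
    standard_error = sigma / math.sqrt(n)
    z = delta / standard_error
    snd = NormalDist()  # standard normal: mean 0, standard deviation 1
    return snd.cdf(z) - snd.cdf(-z)

def required_cluster_size(sigma, delta, confidence):
    """Smallest n such that P(|X_m - mu| <= delta) >= confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sigma / delta) ** 2)

# Hypothetical numbers: sigma = 10, tolerance of +/- 2 around the true mean.
p = prob_mean_within(sigma=10.0, n=25, delta=2.0)
n_needed = required_cluster_size(sigma=10.0, delta=2.0, confidence=0.95)

print(f"P(|X_m - mu| <= 2) with n = 25: {p:.4f}")
print(f"n needed for 95% confidence:   {n_needed}")
```

The first function standardizes the tolerance by the standard error σ/√n; the second inverts the same relation, solving z·σ/√n ≤ delta for n and rounding up.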
The cluster size n can be quite small, and the histogram of cluster mean values, X_m, will rapidly converge to a normal distribution regardless of the underlying population.
The Central Limit Theorem applies to any population distribution with finite variance, including discrete, continuous, and bimodal distributions.
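As a rough illustration of this convergence, the sketch below draws cluster means from a bimodal population and compares how many fall within 1 and 2 standard errors of μ against the standard normal values. The population (an equal mixture of N(-3, 1) and N(+3, 1)), the cluster size, and the helper names are assumptions chosen for the example.

```python
import random
import statistics
from statistics import NormalDist

def draw_bimodal(rng):
    """Assumed bimodal population: equal mixture of N(-3, 1) and N(+3, 1)."""
    center = -3.0 if rng.random() < 0.5 else 3.0
    return rng.gauss(center, 1.0)

def coverage(values, center, half_width):
    """Fraction of values within half_width of center."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)

rng = random.Random(1)
n, M = 30, 20000  # cluster size and number of clusters
means = [statistics.fmean(draw_bimodal(rng) for _ in range(n)) for _ in range(M)]

mu = 0.0                            # mixture mean
standard_error = (10.0 / n) ** 0.5  # mixture variance is 1 + 3^2 = 10

snd = NormalDist()
observed_by_k = {}
for k in (1, 2):
    observed_by_k[k] = coverage(means, mu, k * standard_error)
    expected = snd.cdf(k) - snd.cdf(-k)
    print(f"within {k} SE: observed {observed_by_k[k]:.3f}, normal {expected:.3f}")
```

If the cluster means are approximately normal, the observed fractions should sit near the normal values for 1 and 2 standard deviations, although no individual sample from this population is close to normally distributed.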
When discrete sampling is involved, the distribution of averages (i.e., the mean of clusters) must be used.
The variance of the means is a measure of the spread of the cluster means about the true mean.
The variance of the means gets smaller as n increases; the smaller the number of samples in a cluster, the larger the variance of the means.