The Central Limit Theorem and Law of Large Numbers
Two very important concepts for the project practitioner are the Law of Large Numbers and the Central Limit Theorem because they integrate much of what we have discussed and provide very useful and easily applied heuristics for managing the quantitative aspects of projects. Let's take them one at a time.
The Law of Large Numbers and Sample Average
The Law of Large Numbers deals with estimating expected value from a large number of observations of values of events from the same population. The Law of Large Numbers will be very valuable in the process of rolling wave planning, which we will address in the chapter on scheduling.
To proceed we need to define what is meant by the "sample average":
α(x) = (1/n) * (X1 + X2 + X3 + ... + Xn)
where α(x) is the "arithmetic average of a sample of observations of random variable X," using the lower case alpha from the Greek alphabet, and n is the number of observations in the sample. α(x) is itself a random variable with a distribution of possible values; α(x) will probably be a different value for each set of Xi that is observed.
We call α(x) a "sample average" because we cannot be sure that the Xi we observe make up the complete population; there may be other values that we do not have the opportunity to observe. Thus, the Xi in α(x) are but a sample of the total population. α(x) is not the expected value of X, since the probability weighting for each Xi is not in the calculation; that is, α(x) is an arithmetic average and a random variable, whereas E(X) is a probability-weighted average and a deterministic, nonrandom quantity.
Now here is where the Law of Large Numbers comes in. It can be proved, using a couple of inequalities (specifically Chebyshev's Inequality and Markov's Inequality), that as the number of observations in the sample gets very large, then:
α(x) ≈ E(X) = μ
This means that the sample average is approximately equal to the expected value, or mean, of the distribution of the population of X. Since the Xi are observations from the same distribution for the population X, E(Xi) = E(X). That is, every Xi shares the same population parameters: expected value and standard deviation.
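To see the Law of Large Numbers at work, here is a minimal simulation sketch. It assumes, purely for illustration, that a task duration follows a Triangular distribution with a hypothetical three-point estimate of 10, 12, and 20 days; the names `sample_average` and `EXPECTED` are ours, not part of any standard. The expected value of a Triangular distribution is the simple average of its three points, so we can compare α(x) against it as n grows.

```python
import random

# Hypothetical three-point estimate (assumption for illustration only)
LOW, MODE, HIGH = 10, 12, 20

# Expected value of a Triangular distribution: (low + mode + high) / 3
EXPECTED = (LOW + MODE + HIGH) / 3  # 14.0


def sample_average(n, rng):
    """Arithmetic average of n observations drawn from the Triangular population."""
    return sum(rng.triangular(LOW, HIGH, MODE) for _ in range(n)) / n


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    avg = sample_average(n, rng)
    print(f"n = {n:>7}: sample average = {avg:.3f}, "
          f"distance from E(X) = {abs(avg - EXPECTED):.3f}")
```

The distance from E(X) will not shrink on every single step, since α(x) is itself a random variable, but it tends toward zero as the sample gets large.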
Maximum Likelihood and Unbiased Estimators
Hopefully, you can see that the Law of Large Numbers simplifies matters greatly when it comes to estimating an expected value or the mean of a distribution. Without knowledge of the distribution, or knowledge of the probabilities of the individual observations, we can nevertheless approximate the expected value and estimate the mean by calculating the average of a "large" sample from the population. In fact, for many common distributions, the Normal among them, the sample average is the "maximum likelihood" estimator of the mean of the population. If the expected value of an estimator is equal to the parameter being estimated, then the estimator is said to be "unbiased." The sample average is an unbiased estimator of μ since the expected value of the sample average is also μ:
E[α(x)] = E(X) = μ
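The practical effect of being unbiased is that the estimator has no systematic tendency to overestimate or underestimate the parameter: on average, it lands on the parameter being estimated, whatever the sample size. Combined with the Law of Large Numbers, this means that as more and more observations are added to the sample, the estimate settles ever closer to the parameter. If there were a bias, the estimate might "wander off" target no matter how many observations were added.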
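The unbiasedness of the sample average can be worked out in one line. The sketch below relies only on two facts: the expected value of a sum is the sum of the expected values, and every Xi has E(Xi) = μ. The subscripted notation is ours.

```latex
\begin{aligned}
E[\alpha(x)]
  &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
   = \frac{1}{n}\,(n\,\mu)
   = \mu
\end{aligned}
```

Notice that nothing in this derivation requires n to be large; the result holds for a sample of any size.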
Working the problem the other way, if the project manager knows the expected value from a calculation using distributions and three-point estimates, then the project manager can deduce how far an observed Xi is likely to stray from it. In fact, using Chebyshev's Inequality we find that the probability of an Xi straying very far from the mean, μ, is bounded by a quantity that goes down by the square of the distance from the mean. With σ² denoting the variance of the population, the probability that the absolute distance of sample value Xi from the population mean is greater than some distance, y, is at most proportional to 1/y²:
p(|Xi - μ| ≥ y) ≤ σ²/y²
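Chebyshev's Inequality is easy to check numerically. The following is a minimal sketch, assuming a hypothetical Triangular population (10, 12, 20) and using the mean and variance of the simulated observations as stand-ins for μ and σ²; the helper name `tail_fraction` is ours.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: Triangular with low=10, mode=12, high=20
observations = [rng.triangular(10, 20, 12) for _ in range(50_000)]

# Stand-ins for the population mean and variance (assumption: the
# simulated set is treated as if it were the whole population)
mu = statistics.fmean(observations)
sigma2 = statistics.pvariance(observations, mu)


def tail_fraction(y):
    """Fraction of observations whose distance from the mean is at least y."""
    return sum(abs(x - mu) >= y for x in observations) / len(observations)


for y in (2.0, 3.0, 4.0, 5.0):
    bound = sigma2 / y ** 2
    print(f"y = {y}: observed fraction {tail_fraction(y):.4f}, "
          f"Chebyshev bound {bound:.4f}")
```

The observed fraction never exceeds the bound. The bound is usually quite loose, which is the price paid for an inequality that holds without any knowledge of the distribution.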
Sample Variance and Root-Mean-Square Deviation
There is one other consequence of the Law of Large Numbers that is very important in both risk management and rolling wave planning: the variance of the sample average is 1/n times the variance of the population:
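σ²[α(x)] = (1/n) * σ²(X)
where the population variance, σ²(X), can itself be estimated from the observations by the sample variance:
σ²(X) ≈ (1/n) * ∑ [Xi - α(x)]²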
Notice that even though the expected value of the sample average is the same as the expected value of the population, the variance of the sample average is reduced by the factor 1/n, and of course the standard deviation of the sample average is reduced by the factor √(1/n). The square root of the sample variance is often called the root-mean-square (RMS) deviation because it is the square root of the mean of the "squared distance."
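The 1/n factor can be derived in a few steps. The sketch below assumes the Xi are independent of one another, an assumption the argument needs so that the variance of the sum is the sum of the variances; the subscripted notation is ours.

```latex
\begin{aligned}
\sigma^2[\alpha(x)]
  &= \sigma^2\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2(X_i)
   = \frac{1}{n^2}\,\bigl(n\,\sigma^2(X)\bigr)
   = \frac{1}{n}\,\sigma^2(X)
\end{aligned}
```

The first step uses the rule that scaling a random variable by a constant scales its variance by the square of that constant.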
In effect, the sample average is less risky than the general population represented by the random variable X, and therefore α(x) is less risky than a single outcome, Xi, of the general population.
For all practical purposes, we have just made the case for diversification of risk: a portfolio is less risky than any single element, whether it is a financial stock portfolio, a project portfolio, or a work breakdown structure (WBS) of tasks.
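The diversification effect can be seen in a small simulation. This is a minimal sketch, assuming independent tasks that all share a hypothetical Triangular distribution (10, 12, 20); the function names are ours. It compares the observed standard deviation of the sample average with the standard deviation of a single outcome divided by √n.

```python
import random
import statistics

LOW, MODE, HIGH = 10, 12, 20   # hypothetical three-point estimate for one task
rng = random.Random(2024)      # fixed seed so the run is repeatable


def draw():
    """One observation Xi from the Triangular population."""
    return rng.triangular(LOW, HIGH, MODE)


def std_of_sample_average(n, trials=4_000):
    """Standard deviation of the sample average over many samples of size n."""
    averages = [statistics.fmean([draw() for _ in range(n)])
                for _ in range(trials)]
    return statistics.pstdev(averages)


# Standard deviation of a single outcome, estimated from a large sample
sigma_single = statistics.pstdev([draw() for _ in range(50_000)])

for n in (1, 4, 16, 64):
    observed = std_of_sample_average(n)
    predicted = sigma_single / n ** 0.5
    print(f"n = {n:>2}: observed {observed:.3f}, "
          f"predicted sigma/sqrt(n) = {predicted:.3f}")
```

Each fourfold increase in n cuts the standard deviation of the average roughly in half. If the tasks were strongly correlated rather than independent, the reduction would be smaller.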
Central Limit Theorem
The Central Limit Theorem is every bit as important as the Law of Large Numbers is to sampling or diversification: it simplifies matters regarding probability distributions to the point of heuristics in many cases. Here is what it is all about. Regardless of the distribution of the random variables in a sum or sample (for instance, X1 + X2 + X3 + ... + Xn with BETA or Triangular distributions), as long as they are independent, their distributions are all the same, and n is large, the distribution of the sum will be approximately Normal with a mean "n times" the mean of the unknown population distribution!
S = X1 + X2 + X3 + ... + Xn
S will have an approximately Normal distribution regardless of the distribution of X:
E(S) = E(X1 + X2 + X3 + ... + Xn) = n * E(Xi) = n * μ
where n is the number of terms in the sum (i = 1, 2, ..., n).
"Distribution of the sum will be Normal" means that the distribution of the sample average, as an example, is Normal with mean = μ, regardless of the distribution of the Xi. We do not have to have any knowledge whatsoever about the population distribution to say that a "large" sample average of the population is Normal. What luck! Now we can add up costs or schedule durations, or a host of other things in the project, and have confidence that their sum or their average is very nearly Normal regardless of how the cost packages or schedule tasks are distributed.
As a practical matter, even if a few of the distributions in a sum are not all the same, as they might not be in the sum of cost packages in a WBS, the distribution of the sum is so close to Normal that it really does not matter too much that it is not exactly Normal.
Once we are working with the Normal distribution, then all the other rules of thumb and tables of numbers associated with the Normal distribution come into play. We can estimate standard deviation from the observed or simulated pessimistic and optimistic values without calculating sample variance, we can work with confidence limits and intervals conveniently, and we can relate matters to others who have a working knowledge of the "bell curve."
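A simulation makes the Central Limit Theorem tangible. The sketch below assumes a hypothetical project of 30 independent tasks, each with the same skewed Triangular distribution (10, 12, 20), and checks the simulated project totals against two familiar Normal rules of thumb: about 68.3% of outcomes within one standard deviation of the mean and about 95.4% within two. All names and parameter values are ours, chosen for illustration.

```python
import random
import statistics

N_TASKS = 30                   # hypothetical number of tasks in the sum
LOW, MODE, HIGH = 10, 12, 20   # hypothetical three-point estimate per task
rng = random.Random(11)        # fixed seed so the run is repeatable

# Mean and variance of a single Triangular task
MU = (LOW + MODE + HIGH) / 3
VAR = (LOW**2 + MODE**2 + HIGH**2
       - LOW * MODE - LOW * HIGH - MODE * HIGH) / 18


def project_total():
    """Sum S of N_TASKS independent Triangular task durations."""
    return sum(rng.triangular(LOW, HIGH, MODE) for _ in range(N_TASKS))


totals = [project_total() for _ in range(20_000)]
mean_s = statistics.fmean(totals)
std_s = statistics.pstdev(totals)


def fraction_within(k):
    """Fraction of simulated totals within k standard deviations of the mean."""
    return sum(abs(s - mean_s) <= k * std_s for s in totals) / len(totals)


print(f"E(S): simulated {mean_s:.2f}, predicted n * mu = {N_TASKS * MU:.2f}")
print(f"std:  simulated {std_s:.2f}, "
      f"predicted sqrt(n) * sigma = {(N_TASKS * VAR) ** 0.5:.2f}")
print(f"within 1 std: {fraction_within(1):.3f} (Normal rule: about 0.683)")
print(f"within 2 std: {fraction_within(2):.3f} (Normal rule: about 0.954)")
```

Even though each task is noticeably skewed toward the pessimistic side, the total behaves very much like a bell curve.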
Chebyshev's Inequality: the probability that the absolute distance of sample value Xi from the population mean is greater than some distance, y, is bounded by a quantity that varies by 1/y²: p(|Xi - μ| ≥ y) ≤ σ²/y². Markov's Inequality applies to random variables that take only nonnegative values and to positive values of y, so the absolute distance is not a factor. It says that the probability of an observation being greater than y, regardless of the distance to the mean, is at most proportional to 1/y: p(X ≥ y) ≤ E(X) * (1/y).
Unlike the sample average, the sample variance, (1/n) ∑ [Xi - α(x)]², is not an unbiased estimator of the population variance because it can be shown that its expected value is not the variance of the population. To unbias the sample variance, it must be multiplied by the factor [n/(n-1)]. As "n" gets large, you can see that this factor approaches 1. Thus, for large "n", the bias in the sample variance vanishes and it becomes a practical estimator of the population variance.
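The bias in the sample variance shows up clearly when n is small. Here is a minimal sketch, assuming samples of only five observations from a hypothetical Triangular population (10, 12, 20); the names are ours. It averages the (1/n) form of the sample variance over many samples and then applies the n/(n-1) correction.

```python
import random
import statistics

N = 5              # deliberately small sample size to make the bias visible
TRIALS = 40_000
LOW, MODE, HIGH = 10, 12, 20   # hypothetical three-point estimate
rng = random.Random(5)         # fixed seed so the run is repeatable

# True variance of the Triangular population
TRUE_VAR = (LOW**2 + MODE**2 + HIGH**2
            - LOW * MODE - LOW * HIGH - MODE * HIGH) / 18


def sample_variance(xs):
    """(1/n) * sum of squared distances from the sample average (biased form)."""
    avg = sum(xs) / len(xs)
    return sum((x - avg) ** 2 for x in xs) / len(xs)


biased_values = []
for _ in range(TRIALS):
    xs = [rng.triangular(LOW, HIGH, MODE) for _ in range(N)]
    biased_values.append(sample_variance(xs))

mean_biased = statistics.fmean(biased_values)
mean_corrected = mean_biased * N / (N - 1)

print(f"population variance:            {TRUE_VAR:.3f}")
print(f"average of (1/n) form:          {mean_biased:.3f}")
print(f"after multiplying by n/(n-1):   {mean_corrected:.3f}")
```

With n = 5 the uncorrected form comes in at roughly four-fifths of the population variance, and the correction factor of 5/4 restores it. In Python's standard library, `statistics.pvariance` computes the (1/n) form and `statistics.variance` computes the corrected form.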