The Arithmetic of Operations on Statistics and Random Variables

When it comes to arithmetic, random variables are not much different from deterministic variables. We can add, subtract, multiply, and divide random variables. For instance, we can define a random variable Z = X + Y, or W = X². We can transform a random variable into a deterministic variable by calculating its expected value. However, many functional and logical operations on random variables depend on whether the variables are mutually exclusive or independent. For example, the expected value of a sum of random variables does not depend on their independence, but the variance of a sum does.

Similarly, there are operations on statistics that both inherit their properties from deterministic variables and acquire certain properties from the nature of randomness. For instance, the variance of a sum is the sum of variances if the random variables are independent, but the standard deviation of the sum is not the sum of the standard deviations.
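To make the rules concrete, the following sketch (Python with NumPy; the distributions, parameters, and sample size are illustrative assumptions, not from the text) simulates two independent random variables and checks that the mean of the sum equals the sum of the means, that the variance of the sum approximately equals the sum of the variances, and that the standard deviations do not simply add:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N = 100_000

# Two independent random variables (hypothetical task durations, in days)
x = rng.normal(loc=10.0, scale=2.0, size=N)
y = rng.triangular(left=5.0, mode=7.0, right=15.0, size=N)
z = x + y  # the sum is itself a random variable

print("E[X] + E[Y]     =", x.mean() + y.mean())
print("E[X + Y]        =", z.mean())          # matches, independent or not

print("Var(X) + Var(Y) =", x.var() + y.var())
print("Var(X + Y)      =", z.var())           # matches (approximately) because X and Y are independent

print("SD(X) + SD(Y)   =", x.std() + y.std()) # NOT the standard deviation of the sum
print("SD(X + Y)       =", z.std())           # equals the square root of the summed variances instead
```

The last two lines illustrate the rule just stated: to combine standard deviations, sum the variances first and then take the square root.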

Table 2-4 provides a summary of the most important operations for project managers.

Table 2-4: Operations on Random Variables and Statistics

| Item | All Arithmetic Operations | All Functional Operations with Random Variables as Arguments | Limiting Conditions |
| --- | --- | --- | --- |
| Random variables | Yes | Yes | |
| Probability density functions | Yes | Yes | |
| Cumulative probability density functions | Yes | Yes | If a random variable is dependent upon another, the functional expression is usually affected. |
| Expected value, or mean, or sample average, or arithmetic average | Yes | Yes | |
| Variance | Yes | Yes | If the random variables are not independent, then a covariance must be computed. |
| Standard deviation | Cannot add or subtract | Yes | To add or subtract standard deviations, first compute the sum of the variances and then take the square root. |
| Median | No | No | Median is calculated on the population or sample population of the combined random variables. |
| Mode or most likely | No | No | Most likely is taken from the population statistics of the combined random variables. |
| Optimistic and pessimistic random variables | Yes | Yes | None |



Probability Distribution Statistics

Most often we do not know every value and its probability. Thus we cannot apply the equations we have discussed to calculate statistics directly. However, if we know the probability distribution of values, or can estimate what the probability function might be, then we can apply the statistics that have been derived for those distributions. And, appropriately so for project management, we can do quite nicely using arithmetic approximations for the statistics rather than constantly referring to a table of values. Of course, electronic spreadsheets have much better approximations, if not exact values, so spreadsheets are a useful and quick tool for statistical analysis.

Three-Point Estimate Approximations

Quite useful results for project statistics are obtainable by developing three-point estimates that can be used in equations to calculate expected value, variance, and standard deviation. The three points commonly used are:

  • The most pessimistic value that still has some small probability of happening.

  • The most optimistic value that likewise has some small probability of happening.

  • Most likely value for any single instance of the project. The most likely value is the mode of the distribution.

It is not uncommon for the optimistic and most likely values to lie much closer to each other than either does to the pessimistic value. Many things can go wrong and drive the estimate toward the pessimistic value; usually, fewer things can go unexpectedly right. Table 2-5 provides the equations for calculating approximate values of the statistics for the most common distributions.

Table 2-5: Statistics for Common Distributions

| Statistic | Normal[*] | BETA[**] | Triangular | Uniform[***] |
| --- | --- | --- | --- | --- |
| Expected value or mean | O + [(P - O)/2] | (P + 4 * ML + O)/6 | (P + ML + O)/3 | O + [(P - O)/2] |
| Variance, σ² | (P - O)²/36 | (P - O)²/36 | [(O - P)² + (ML - O) * (ML - P)]/18 | (P³ - O³)/[3 * (P - O)] - (P + O)²/4 |
| Standard deviation, σ | (P - O)/6 | (P - O)/6 | Sqrt(VAR) | Sqrt(VAR) |
| Mode or most likely | O + [(P - O)/2] | By observation or estimation, the peak of the curve | By observation or estimation, the peak of the curve | Not applicable |

Note: O = optimistic value, P = pessimistic value, ML = most likely value.

[*]Formulas are approximations only to more complex functions.

[**]BETA formulas apply to the curve used in PERT calculations. PERT is discussed in Chapter 7. In general, a BETA distribution has four parameters: two set the endpoints of the range of the random variable, and two, α and β, determine the shape of the curve. Normally, fixing or estimating α and β then provides the means to calculate mean and variance. However, for the BETA used in PERT, the mean and variance formulas have been worked out such that α and β become the calculated parameters.

Since in most project situations the exact shape of the BETA curve does not need to be known, the calculation for α and β is not usually performed. If α and β are equal, then the BETA curve is symmetrical.

If the range of values of the BETA distributed random variable is normalized to a range of 0 to 1, then for means less than 0.5 the BETA curve will be skewed to the right; the curve will be symmetrical for mean = 0.5 and skewed left if the mean is greater than 0.5.

[***]In general, variance is calculated as Var(X) = E(X2) - [E(X)]2. This formula is used to derive the variance of the Triangular and Uniform distributions.

The variance formula for the Uniform simplifies algebraically to (P - O)²/12; the standard deviation is therefore (P - O)/√12, or approximately (P - O)/3.46.
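The formulas in Table 2-5 are easy to package for reuse. The sketch below is a minimal implementation in Python (the function names and the sample estimate are illustrative assumptions, not from the text):

```python
import math

def normal_stats(o: float, p: float) -> dict:
    """Approximate Normal statistics from Table 2-5 (O and P taken near the 3-sigma points)."""
    var = (p - o) ** 2 / 36
    return {"mean": o + (p - o) / 2, "variance": var, "std_dev": math.sqrt(var)}

def pert_beta_stats(o: float, ml: float, p: float) -> dict:
    """Approximate BETA (PERT) statistics from Table 2-5."""
    var = (p - o) ** 2 / 36
    return {"mean": (o + 4 * ml + p) / 6, "variance": var, "std_dev": math.sqrt(var)}

def triangular_stats(o: float, ml: float, p: float) -> dict:
    """Triangular statistics from Table 2-5."""
    var = ((p - o) ** 2 + (ml - o) * (ml - p)) / 18
    return {"mean": (o + ml + p) / 3, "variance": var, "std_dev": math.sqrt(var)}

def uniform_stats(o: float, p: float) -> dict:
    """Uniform statistics; the variance simplifies to (P - O)^2 / 12."""
    var = (p - o) ** 2 / 12
    return {"mean": o + (p - o) / 2, "variance": var, "std_dev": math.sqrt(var)}

# Hypothetical three-point estimate for a task, in days
o, ml, p = 8.0, 10.0, 18.0
for name, stats in (("Normal", normal_stats(o, p)),
                    ("BETA (PERT)", pert_beta_stats(o, ml, p)),
                    ("Triangular", triangular_stats(o, ml, p)),
                    ("Uniform", uniform_stats(o, p))):
    print(name, stats)
```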

It is useful to compare the more common distributions using identical three-point estimates. Figure 2-6 provides the illustration. Rules of thumb can be inferred from this illustration:

  • Comparing the Normal, BETA, and Triangular distributions for the same estimates of optimism and pessimism (and the same mode for the BETA and Triangular), the expected value becomes more pessimistic moving from the BETA to the Triangular to the Normal distribution.

  • The variance and standard deviation of the Normal and BETA distributions are about the same when the pessimistic and optimistic values are taken at the 3σ point. However, since the BETA distribution is not symmetrical, the significance of the standard deviation as a measure of spread around the mean is not as great as in the case of the symmetrical Normal distribution.

Figure 2-6: Statistical Comparison of Distributions.
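A small numerical check of the first rule of thumb, using a deliberately skewed, made-up three-point estimate (O = 8, ML = 10, P = 18 days):

```python
o, ml, p = 8.0, 10.0, 18.0  # illustrative values only

print("BETA (PERT) mean:", (o + 4 * ml + p) / 6)  # 11.0 -- closest to the most likely value
print("Triangular mean: ", (o + ml + p) / 3)      # 12.0
print("Normal mean:     ", o + (p - o) / 2)       # 13.0 -- the most pessimistic of the three
```

The expected value drifts toward the pessimistic end moving from the BETA to the Triangular to the Normal distribution, as the figure illustrates.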

In addition to the estimates given above, there are a couple of exact statistics about the Normal distribution that are handy to keep in mind:

  • 68.3% of the values of a Normal distribution fall within 1σ of the mean value.

  • 95.4% of the values of a Normal distribution fall within 2σ of the mean value, and the figure rises to 99.7% within 3σ of the mean value.

  • A process quality interpretation of 99.7% is that there are three errors per thousand events. If software coding were the object of the error measurement, then "three errors per thousand lines of code" probably would not be acceptable. At 6σ, the yield is 99.9998%, an error rate so small that it is more easily spoken of as "two errors per million events," about 1,000 times better than 3σ. [20]

[20]The Six Sigma literature commonly speaks of 3.4 errors per million events, not 2.0 errors per million. The difference arises from the fact that in the original program developed at Motorola, the mean of the distribution was allowed to "wander" 1.5σ from the expected mean of the distribution. This "wandering" increases the error rate from 2.0 to 3.4 errors per million events. An older shorthand way of referring to the corresponding 99.9998% yield is "five nines and an eight" or perhaps "about six nines."
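The 1σ, 2σ, and 3σ coverage figures above can be confirmed directly; here is a minimal sketch using only the Python standard library (the two-sided coverage within k standard deviations of the mean is erf(k/√2)):

```python
import math

def coverage_within_k_sigma(k: float) -> float:
    """Fraction of a Normal population falling within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    within = coverage_within_k_sigma(k)
    outside_per_thousand = (1 - within) * 1000
    print(f"{k} sigma: {within:.1%} within, about {outside_per_thousand:.1f} per thousand outside")
# Prints roughly 68.3%, 95.4%, and 99.7% -- matching the bullets above --
# with about 2.7 events per thousand falling outside the 3-sigma limits.
```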