Key Statistics Used in Projects


Strictly speaking, statistics are data: any collection of data, whether or not it is the result of analysis. We often hear, "What are the statistics on that event?" In other words, what are the numbers that are meaningful for understanding? Perhaps the most useful definition, however, treats statistics as the "methods and techniques whereby collections of data are analyzed to obtain understanding and knowledge." [13] Statistical methods are by and large methods of approximation and estimation. As such, they fit project management very well, since the methods of project management are often approximate and based only on estimates.

Informational data, of course, are quite useful to project managers and to members of the project management team for estimating and forecasting, measuring progress, assessing value earned, quantifying risk, and calculating other numerical phenomena of importance to the project. Statistical methods provide some of the tools for reducing such data to meaningful information which the team uses to make decisions.

Expected Value and Average

The best-known statistic is the "average" (more properly, the arithmetic average), which is arithmetically equal to a specific case of expected value. Expected value, E, is the most important statistic for project managers. The idea of expected value is as follows: In the face of uncertainty about a random variable that has values over a range, "expected value" [14] is the "best" single number to represent the range of estimated values of that random variable. "Best" means expected value is an unbiased maximum likelihood estimator for the population mean. [15] We will discuss unbiased maximum likelihood estimators more in subsequent paragraphs. The need to reduce a range of values to a single number comes up constantly. When presenting a budget estimate to project sponsors, more often than not, only one number will be acceptable, not a range. The same is true for schedule, resource hours, and a host of other estimated variables.

Mathematically, to obtain expected value we add up all the values that a random variable can take on, weighting or multiplying each by the probability that that value will occur. Sound familiar? Except for "weighting each value by the probability," the process is identical to calculating the arithmetic average. But wait: in the case of the arithmetic average, there actually are weights on each value, but every weight is 1/n. Calculating expected value, E:

  • $E(X) = p_1 X_1 + p_2 X_2 + p_3 X_3 + p_4 X_4 + \dots$

  • $E(X) = \sum_i p_i X_i$, summed over all values of "i"

where pi is the probability of specific value Xi occurring. If pi = 1/n, where "n" is the number of values in the summation, then E(X) is mathematically equal to the calculation of arithmetic average.
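Written out as a short worked step, using the same symbols as above and assuming there are n values in all, the equal-weights case looks like this:

```latex
E(X) = \sum_{i=1}^{n} p_i X_i
     = \sum_{i=1}^{n} \frac{1}{n} X_i
     = \frac{1}{n} \sum_{i=1}^{n} X_i
```

The last expression is simply the arithmetic average of the n values.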

Consider this example: a work package manager estimates a task might take 2 weeks to complete with a 0.5 probability, optimistically 1.5 weeks with a 0.3 probability, but pessimistically it could take 5 weeks with a 0.2 probability. What is the expected value of the task duration?

  • E(task duration D) = 0.3 * 1.5w + 0.5 * 2w + 0.2 * 5w

  • E(task duration D) = 2.45w

Check yourself on the "p"s:

p1 + p2 + p3 = 0.3 + 0.5 + 0.2 = 1
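The same arithmetic can be sketched in a few lines of Python. This is only an illustration: the names `outcomes` and `expected_value` are ours, not part of any standard project tool, and the function includes the check on the "p"s before it computes anything.

```python
import math

# Task-duration estimates from the example: (duration in weeks, probability).
outcomes = [(1.5, 0.3), (2.0, 0.5), (5.0, 0.2)]


def expected_value(outcomes):
    """Return the probability-weighted sum of the values."""
    total_p = sum(p for _, p in outcomes)
    # Check yourself on the "p"s: they must account for every outcome.
    if not math.isclose(total_p, 1.0):
        raise ValueError(f"probabilities sum to {total_p}, not 1")
    return sum(x * p for x, p in outcomes)


e_duration = expected_value(outcomes)
print(f"E(task duration D) = {e_duration:.2f} weeks")  # 2.45 weeks
```

Raising an error when the probabilities do not sum to 1 is a design choice for the sketch; a spreadsheet check serves the same purpose.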

There are a couple of key things of which to take note. E(X) is not a random variable. As such, E(X) will have a deterministic value. E(X) does not have a distribution. E(X) can be manipulated mathematically like any other deterministic variable; E(X) can be added, subtracted, multiplied, and divided.[16]
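A small sketch may help show when this arithmetic is exact and when it is only approximate, as the caution in note [16] warns. We assume, purely for illustration, two independent tasks A and B that each follow the duration estimates of the example above. Run in series, their expected values simply add. Run in parallel with a merge point, the expected finish is later than either task's expected value.

```python
from itertools import product

# Assumption for illustration: tasks A and B are independent and share the
# (duration in weeks, probability) estimates from the example.
task = [(1.5, 0.3), (2.0, 0.5), (5.0, 0.2)]


def expected(dist):
    """Probability-weighted sum of the values."""
    return sum(x * p for x, p in dist)


# Every pairing of outcomes, with its joint probability under independence.
joint = [((a, b), pa * pb) for (a, pa), (b, pb) in product(task, task)]

e_series = sum((a + b) * p for (a, b), p in joint)      # A then B
e_parallel = sum(max(a, b) * p for (a, b), p in joint)  # A and B merge

print(f"E(A) + E(B)     = {2 * expected(task):.3f}")  # 4.900
print(f"E(A + B)        = {e_series:.3f}")            # 4.900
print(f"max(E(A), E(B)) = {expected(task):.3f}")      # 2.450
print(f"E(max(A, B))    = {e_parallel:.3f}")          # 3.035
```

Addition is linear, so the first two figures agree. Taking the later of two finishes is not linear, so the last two figures do not.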

Transforming a space of random variables into a deterministic number is the source of power and influence of the expected value. This concept is very important to the project manager. Very often, the expected value is the number that the project manager carries to the project balance sheet as the estimate for a particular item. The range of values of the distributions that go into the expected value calculation constitutes both an opportunity (optimistic end of the range) and a threat (pessimistic end of the range) to the success of the project. The project manager carries the risk element to the risk portion of the project balance sheet.

If X is a continuous random variable, then the sum of all values of X morphs into integration as we saw before. We know that pi is the shorthand for the probability function f(X | Xi), so the expected value equation morphs to:

$E(X) = \int X \, f(X \mid X_i) \, dX$, integrated over all values of "X"

Fortunately, as a practical matter, project managers do not really need to deal with integrals and integral calculus. The equations are shown for their contribution to background. Most of the integrals have approximate formulas amenable to solution with arithmetic or tables of values that can be used instead.
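To illustrate how an integral gives way to arithmetic, here is a minimal sketch that approximates the expected value of a continuous variable by adding up many thin slices. The triangular density and its parameters (1.5, 2, and 5 weeks) are assumptions chosen only for illustration, and the function names are ours. For a triangular density, the simple formula (low + mode + high) / 3 gives the same answer with no integration at all.

```python
def triangular_pdf(x, low, mode, high):
    """Density of a triangular distribution (assumed here for illustration)."""
    if x < low or x > high:
        return 0.0
    if x <= mode:
        return 2.0 * (x - low) / ((high - low) * (mode - low))
    return 2.0 * (high - x) / ((high - low) * (high - mode))


def expected_value_continuous(pdf, low, high, steps=100_000):
    """Midpoint-rule approximation of the integral of x * pdf(x) dx."""
    width = (high - low) / steps
    total = 0.0
    for k in range(steps):
        x = low + (k + 0.5) * width  # midpoint of the k-th slice
        total += x * pdf(x) * width
    return total


low, mode, high = 1.5, 2.0, 5.0
e_numeric = expected_value_continuous(
    lambda x: triangular_pdf(x, low, mode, high), low, high
)
e_formula = (low + mode + high) / 3.0  # closed-form mean of a triangular density

print(f"numerical integration: {e_numeric:.4f} weeks")
print(f"arithmetic formula:    {e_formula:.4f} weeks")
```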

Mean or "μ"

Expected value is also called the "mean" of the distribution. [17] A common notation for mean is the Greek letter "μ". Strictly speaking, "μ" is the notation for the population mean when all values in the population range are known. If all the values in a population are not known, or cannot be measured, then the expected value of those values that are known becomes an estimate of the true population mean, μ. As such, the expected value calculated from only a sample of the population may differ somewhat from μ. We will discuss more about this problem later.
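The gap between a sample-based estimate and μ can be seen in a small simulation. This is a sketch under stated assumptions: we treat the task-duration estimates from the earlier example as the "true" population, draw random samples of different sizes with Python's standard `random` module, and compare each sample mean with μ. The seed value is arbitrary.

```python
import random

# Assumed "population": the task-duration estimates from the earlier example.
values = [1.5, 2.0, 5.0]
probs = [0.3, 0.5, 0.2]
mu = sum(x * p for x, p in zip(values, probs))  # population mean, 2.45 weeks

rng = random.Random(42)  # fixed, arbitrary seed so the run is repeatable
estimates = {}
for n in (10, 100, 10_000):
    sample = rng.choices(values, weights=probs, k=n)
    estimates[n] = sum(sample) / n  # sample mean as an estimate of mu
    print(f"n = {n:>6}: sample mean = {estimates[n]:.3f}, "
          f"error vs. mu = {estimates[n] - mu:+.3f}")
```

Small samples can miss μ by a wide margin; the estimate generally tightens as the sample grows.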

Variance and Standard Deviation

Variance and standard deviation are measures of the spread of values around the expected value. As a practical matter for project practitioners, the larger the spread, the less meaningful is the expected value per se.

Variance and standard deviation are related functionally:

$\mathrm{SD} = \sqrt{\mathrm{VAR}}$

where VAR (variance) is never negative and so, therefore, neither is SD (standard deviation). Commonly used notation is $\sigma = \mathrm{SD}$, $\sigma^2 = \mathrm{VAR}$.

Variance is a measure of distance or spread of a probable outcome from the expected value of the outcome. Whether the distance is "negative" or "positive" in some arbitrary coordinate system is not important for judging the significance of the distance. Thus we first calculate distance from the expected value as follows:

$\mathrm{Distance}^2 = [X_i - E(X)]^2$

The meaning of the distance equation is as follows: the displacement or distance of a specific value of X, say for example a value of "Xi", from the expected value is calculated as the square of the displacement of Xi from E(X). Figure 2-5 illustrates the idea. Now we must also account for the probability of X taking on the value of "Xi".

Figure 2-5: Variance and Standard Deviation.

Probabilistic distance = $p(X_i) \, [X_i - E(X)]^2$

Now, to obtain variance, we simply add up all the probabilistic distances:

$\mathrm{VAR}(X) = \sigma^2(X) = \sum_i p(X_i) \, [X_i - E(X)]^2$, summed over all "i"

which simplifies to:

$\mathrm{VAR}(X) = \sigma^2(X) = E(X^2) - [E(X)]^2$
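For readers who want to see the intermediate steps, the simplification can be worked as follows. We write $p_i$ for the probability of $X_i$ and use two facts: the probabilities sum to 1, and E(X) is a deterministic number that can be moved outside the summation.

```latex
\begin{aligned}
\sigma^2(X) &= \sum_i p_i \, [X_i - E(X)]^2 \\
            &= \sum_i p_i X_i^2 \;-\; 2\,E(X) \sum_i p_i X_i \;+\; [E(X)]^2 \sum_i p_i \\
            &= E(X^2) - 2\,[E(X)]^2 + [E(X)]^2 \\
            &= E(X^2) - [E(X)]^2
\end{aligned}
```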

To find the standard deviation, σ, we take the square root of the variance.

Let's return to the example of task duration used in the expected value discussion to see about variance. The durations and the probability of each duration are specified. Plugging those figures into the variance equation:

$\sigma^2(\text{task duration } D) = 0.3 \, (1.5 - 2.45)^2 + 0.5 \, (2 - 2.45)^2 + 0.2 \, (5 - 2.45)^2$

where 2.45 weeks is the expected value of the task duration from prior calculation, $\sigma^2(\text{task duration } D)$ = 1.67 weeks-squared, and $\sigma(\text{task duration } D)$ = 1.29 weeks. [18]
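The calculation is easy to check with a short script. This sketch computes the variance two ways, once from the definition (the sum of probabilistic distances) and once from the simplified form, and then takes the square root. The variable names are ours.

```python
import math

# Task-duration estimates from the example: (duration in weeks, probability).
outcomes = [(1.5, 0.3), (2.0, 0.5), (5.0, 0.2)]

mean = sum(x * p for x, p in outcomes)  # E(X), 2.45 weeks

# Definition: add up the probabilistic distances from the expected value.
var_definition = sum(p * (x - mean) ** 2 for x, p in outcomes)

# Simplified form: E(X^2) - [E(X)]^2.
var_shortcut = sum(p * x ** 2 for x, p in outcomes) - mean ** 2

sd = math.sqrt(var_definition)

print(f"variance (definition) = {var_definition:.2f} weeks-squared")  # 1.67
print(f"variance (shortcut)   = {var_shortcut:.2f} weeks-squared")    # 1.67
print(f"standard deviation    = {sd:.2f} weeks")                      # 1.29
```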

As the units suggest (weeks-squared in our example), variance often has no direct physical meaning, whereas standard deviation, which is expressed in the same units as the variable itself, usually does. [19]

Mode

The mode of a random outcome is the most probable or most likely outcome of any single occurrence of an event. If you look at the distribution of outcome values versus their probabilities, the mode is the value at the peak of the distribution curve. Outcomes tend to cluster around the mode. Many confuse the concept of mode, the most likely outcome of any single event, with expected value, which is the best estimator of outcome considering all possible values and their probabilities. Of course, if the distribution of values is symmetrical about the mode, then the expected value and the mode will be identical.
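The task-duration estimates from the expected value example make the distinction concrete. In the sketch below (the variable names are ours), the mode is found by picking the outcome with the largest probability. Because the estimates are skewed toward the pessimistic end, the mode and the expected value are not the same number.

```python
# Task-duration estimates from the earlier example: duration in weeks -> probability.
outcomes = {1.5: 0.3, 2.0: 0.5, 5.0: 0.2}

# Mode: the single most probable outcome.
mode = max(outcomes, key=outcomes.get)

# Expected value: every outcome weighted by its probability.
expected = sum(x * p for x, p in outcomes.items())

print(f"mode           = {mode} weeks")          # 2.0 weeks
print(f"expected value = {expected:.2f} weeks")  # 2.45 weeks
```

The long pessimistic tail (5 weeks at 0.2 probability) pulls the expected value above the mode.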

Median

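The median is the value that divides the distribution of outcomes in half: an outcome is as likely to fall at or below the median as at or above it. The median is not, in general, the midpoint between the most optimistic and the most pessimistic values; the two coincide only when the distribution is symmetrical. In the task duration example, the median is 2 weeks, because the cumulative probability first reaches one half at that value (0.3 + 0.5 = 0.8).

$P(X \le \text{Median}) \ge 1/2$ and $P(X \ge \text{Median}) \ge 1/2$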

[13]Balsley, Howard, Introduction to Statistical Method, Littlefield, Adams & Co., Totowa, NJ, 1964, pp. 3–4.

[14]Schuyler, John R., Decision Analysis in Projects, Project Management Institute, Newtown Square, PA, 1996, chap. 1, p. 11.

[15]"Best" may not be sufficiently conservative for some organizations, depending on risk attitude. Many project managers forecast with a more pessimistic estimate than the expected value.

[16]Caution: Strictly speaking, arithmetic operations on expected values are exact only when the quantities involved combine linearly, as in summations of cost or schedule durations. Nonlinear effects arise in schedules, for example, where parallel paths merge. In such cases, arithmetic operations are only approximate, and statistical simulations are best.

[17]You may also hear the term "moment" or "method of moments." Expected value is a "moment of X"; in fact, it is the "first moment of X." $E(X^n)$ is called the "nth moment of X." An in-depth treatment of moments requires more calculus than is within the scope of this book.

[18]"σ" is the lower case "s" in the Greek alphabet. It is pronounced "sigma."

[19]An exception to the idea that variance has no physical meaning comes from engineering. The variance of voltage is measured in power: VAR(voltage) = watts.