Probability Distributions for Project Managers



If we plot the probability (density) function (PDF) on a graph with the vertical axis as probability (probability density, for a continuous variable) and the horizontal axis as the value of X, then that plot is called a "distribution." The PDF is aptly named because it shows how values are distributed according to the probability that each value will occur, as illustrated in Figure 2-3. [9] Although the exact numerical values change from one plot to the next, the general patterns of various plots are recognizable and have well-defined attributes. To these patterns we give names: Normal distribution, BETA distribution, Triangular distribution, Uniform distribution, and many others. The attributes also have names, such as mean, variance, standard deviation, etc. These attributes are also known as statistics.

Figure 2-3: Probability Distribution.
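To make the named attributes concrete, here is a minimal Python sketch. It assumes a discrete random variable whose PDF is stored as a mapping from each value to its probability; the fair die and the helper names `mean`, `variance`, and `std_dev` are illustrative choices, not part of the text.

```python
from fractions import Fraction
import math

# Illustrative discrete PDF: each face of a fair die has probability 1/6
pdf = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(pdf):
    """Expected value: the sum of each value times its probability."""
    return sum(x * p for x, p in pdf.items())

def variance(pdf):
    """Probability-weighted average of squared distances from the mean."""
    mu = mean(pdf)
    return sum((x - mu) ** 2 * p for x, p in pdf.items())

def std_dev(pdf):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(float(variance(pdf)))

print(mean(pdf))                # 7/2
print(variance(pdf))            # 35/12
print(round(std_dev(pdf), 3))   # 1.708
```

Exact fractions are used so the mean and variance come out without rounding error; the standard deviation is irrational and is reported as a float.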

Uniform Distribution

The discrete Uniform distribution is illustrated in Figure 2-4. The toss of a coin and the roll of a single die are discrete Uniform distributions. The principal attribute is that each value of the random variable has the same probability. In engineering, it is often useful to have a random number generator to simulate seemingly random events, each event being assigned a unique number. It is highly desirable that the generated random numbers come from a discrete Uniform distribution so that no number, and thus no event, is more likely than another.

Figure 2-4: Common Distributions in Projects.
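A minimal Python sketch of the random-number-generator idea, assuming the standard library's `random.Random` stands in for an engineering simulator: draws from a discrete Uniform distribution over the values 1 through 6 should land on each value with roughly equal frequency. The function name `roll_die`, the seed, and the number of rolls are illustrative choices.

```python
import random
from collections import Counter

def roll_die(rng):
    """One roll of a fair six-sided die: a discrete Uniform draw from 1..6."""
    return rng.randint(1, 6)  # randint includes both endpoints

rng = random.Random(42)  # fixed seed so the run is repeatable
n_rolls = 60_000
counts = Counter(roll_die(rng) for _ in range(n_rolls))

for face in range(1, 7):
    # Each relative frequency should be close to 1/6, about 0.167
    print(face, round(counts[face] / n_rolls, 3))
```

With many rolls, no face should stand out; a face that came up far more often than 1/6 of the time would suggest the generator is not Uniform.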

If the random variable is continuous, or the values of the discrete random variable are so close together as to be approximately continuous, then, as with all continuous distributions, the vertical axis is scaled so that the "area under the curve" equals 1. Why so? This is just a graphical way of saying that if the probabilities for all values are integrated, the total comes to 1.

Recall: $\int_{a}^{b} f(X)\,dX = 1$

where a and b are the lowest and highest values X can take, dX represents an increment on the horizontal axis, and f(X) represents a value on the vertical axis. Vertical × horizontal = area. Thus, mathematical integration is an area calculation.
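The area interpretation can be checked numerically. This Python sketch approximates the integral by adding up thin rectangles of height f(X) and width dX (the midpoint rule, one of several possible numerical methods). The two example PDFs, a continuous Uniform on [2, 10] and a ramp f(X) = 2X on [0, 1], and the helper name `area_under` are assumptions for illustration.

```python
def area_under(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b by summing thin rectangles:
    height f(X) (vertical) times width dX (horizontal)."""
    dx = (b - a) / steps
    # Midpoint rule: evaluate f at the center of each slice
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(steps))

def uniform_pdf(x):
    """Continuous Uniform PDF on [2, 10]: constant height 1 / (10 - 2)."""
    return 1 / (10 - 2)

def ramp_pdf(x):
    """A ramp-shaped PDF on [0, 1]: f(X) = 2X."""
    return 2 * x

print(round(area_under(uniform_pdf, 2, 10), 6))  # 1.0
print(round(area_under(ramp_pdf, 0, 1), 6))      # 1.0
```

Any function whose total area comes out to something other than 1 is not a valid PDF until its vertical axis is rescaled.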

Triangular Distribution

The Triangular distribution is applied to continuous random variables. It is usually shown with a skew to one side or the other. Unlike the Uniform distribution, the Triangular distribution portrays situations in which not all outcomes are equally likely. Its tails are finite: each meets the horizontal value axis at a specific value.

There is little in nature that has a Triangular distribution. However, it is a good graphical and mathematical approximation to many events that occur in projects. Project management, like engineering, relies heavily on approximation for day-to-day practice. For instance, the approximate behavior of both cost work packages and schedule task durations can be modeled quite well with the Triangular distribution. The skew shows the imbalance in pessimism or optimism in the event.
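As a sketch of how a Triangular distribution might model a schedule task, assume a hypothetical task with an optimistic duration of 8 days, a most likely duration of 10 days, and a pessimistic duration of 16 days. The code evaluates the Triangular PDF (the helper `triangular_pdf` is illustrative) and samples durations with Python's `random.triangular(low, high, mode)`. Because the longer tail is on the pessimistic side, the mean, (8 + 10 + 16)/3 ≈ 11.33 days, falls to the right of the most likely value.

```python
import random

def triangular_pdf(x, low, mode, high):
    """Density of the Triangular distribution on [low, high] peaking at mode."""
    if x < low or x > high:
        return 0.0  # finite tails: zero density outside the endpoints
    if x < mode:
        return 2 * (x - low) / ((high - low) * (mode - low))    # rising side
    if x > mode:
        return 2 * (high - x) / ((high - low) * (high - mode))  # falling side
    return 2 / (high - low)  # the peak, at the most likely value

# Hypothetical task: optimistic 8 days, most likely 10, pessimistic 16
low, mode, high = 8, 10, 16

print(triangular_pdf(10, low, mode, high))  # 0.25, the peak height
print(triangular_pdf(16, low, mode, high))  # 0.0, the pessimistic endpoint

analytic_mean = (low + mode + high) / 3

# Monte Carlo check; note random.triangular takes (low, high, mode) in that order
rng = random.Random(7)
durations = [rng.triangular(low, high, mode) for _ in range(100_000)]
sim_mean = sum(durations) / len(durations)
print(round(analytic_mean, 2), round(sim_mean, 2))
```

The simulated mean should agree with the analytic mean to about two decimal places; both sit above the most likely value of 10 days, reflecting the pessimistic skew.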

The BETA Distribution

The BETA distribution has two parameters, typically denoted "a" and "b" in its PDF, that influence its shape quite dramatically. Depending on the values of "a" and "b", the BETA distribution can range all the way from approximately Uniform to approximately Normal. [10] However, the forms of the BETA distribution most useful to projects are asymmetrical curves that look something like rounded-off triangles. Indeed, it is reasonable to think of the Triangular distribution as an approximation to the BETA distribution. Apart from its rounded-off appearance, the BETA distribution looks in many respects like the Triangular distribution: each has a skew to one side or the other, and each has finite tails that come down to specific values on the horizontal value axis. Events that happen in nature rarely, if ever, have distinct endpoints. Mother Nature tends to smooth things out. Nevertheless, the BETA distribution approximates many natural events quite well.
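The effect of the two shape parameters can be explored with a short Python sketch of the standard BETA PDF on the interval [0, 1], written with `math.gamma`. It assumes a and b are both at least 1 so the density stays finite at the endpoints. With a = b = 1 the density is flat (Uniform); with a = 2 and b = 5 (values chosen only for illustration) the curve is skewed, with its peak at (a - 1)/(a + b - 2) = 0.2 and its mean at a/(a + b) ≈ 0.286. The helper name `beta_pdf` is hypothetical; `random.betavariate` is the standard library's BETA sampler.

```python
import math
import random

def beta_pdf(x, a, b):
    """Density of the standard BETA distribution on [0, 1] with shape parameters a, b.
    Assumes a >= 1 and b >= 1 so the density is finite at the endpoints."""
    if x < 0 or x > 1:
        return 0.0  # finite tails: nothing outside [0, 1]
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

# a = b = 1: the same density everywhere, i.e. the Uniform distribution on [0, 1]
print([round(beta_pdf(x, 1, 1), 3) for x in (0.1, 0.5, 0.9)])  # [1.0, 1.0, 1.0]

# a = 2, b = 5: an asymmetrical, rounded-off "triangle" with its tail to the right
a, b = 2, 5
beta_mode = (a - 1) / (a + b - 2)  # most likely value
beta_mean = a / (a + b)            # expected value
print(round(beta_mode, 3), round(beta_mean, 3))  # 0.2 0.286

# Sampling check with the standard library's BETA generator
rng = random.Random(11)
samples = [rng.betavariate(a, b) for _ in range(50_000)]
print(round(sum(samples) / len(samples), 3))  # close to beta_mean
```

To model a real quantity, a draw u from [0, 1] would be rescaled to low + (high - low) * u, where low and high are the finite endpoints.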

The Normal Distribution

The Normal distribution is a well-known shape, sometimes referred to as the "bell curve" for its obvious similarity to a bell. In some texts, it is referred to as the Gaussian distribution, after the 19th-century mathematician Carl Friedrich Gauss. [11] The Normal distribution is very important in the study of probability and statistics generally, and it is useful to the project manager for its rather accurate portrayal of many natural events and for its relationship to something called the "Central Limit Theorem," which we will address shortly.

Let's return to the coin toss experiment. The values of H and T are uniformly distributed: H and T can each take the value 1 or the value 0 with equal probability, 0.5. But consider this: the number of times T comes up in 100 tosses is itself a random variable. Let CT stand for this random variable. Like all random variables, CT has a distribution. CT's distribution is approximately Normal (strictly, it is Binomial, which for this many tosses is all but Normal), with 50 counts of T at the center. In the tails of the distribution are the counts of T that are unlikely to occur if the coin is fair.
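The claim about CT can be checked by simulation. This Python sketch assumes a fair coin simulated with `random.Random`, repeats the 100-toss experiment many times, and summarizes the resulting counts of T; it also computes the exact Binomial probability that CT equals 50 with `math.comb`. The number of repetitions, the seed, and the helper name `count_tails` are arbitrary choices. The expected spread, √(100 × 0.5 × 0.5) = 5, is the standard Binomial result √(np(1 - p)).

```python
import math
import random
import statistics

def count_tails(rng, tosses=100):
    """One experiment: CT, the number of times T comes up in `tosses` fair-coin tosses."""
    return sum(rng.random() < 0.5 for _ in range(tosses))

rng = random.Random(2024)
counts_of_t = [count_tails(rng) for _ in range(20_000)]  # 20,000 repetitions

print(round(statistics.mean(counts_of_t), 1))    # close to 50, the center
print(round(statistics.pstdev(counts_of_t), 1))  # close to 5

# Exact Binomial probability that CT is exactly 50
p_50 = math.comb(100, 50) / 2 ** 100
print(round(p_50, 4))  # 0.0796
```

Even the single most likely count, 50, occurs only about 8% of the time; the rest of the probability is spread symmetrically around it in the familiar bell shape.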

Theoretically, the Normal distribution's tails approach the horizontal axis asymptotically but never touch it. Thus the integration of the PDF must extend to "infinite" values along the horizontal axis in order to capture the full area under the curve that equals 1. As a practical matter, project managers and engineers get along with a good deal less than infinity along the horizontal axis. For most applications, the portion of the horizontal axis that encloses about 99% of the area does very nicely. In the "Six Sigma" method, as we will discuss, a good deal more of the horizontal axis is used, but still not infinity.
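How much of the horizontal axis is "enough" can be quantified. For a Normal distribution, the fraction of the area lying within ±k standard deviations of the mean is erf(k/√2), and Python exposes the error function as `math.erf`. This sketch (the helper name `area_within` is illustrative) tabulates a few values of k: ±1 standard deviation encloses about 68.3% of the area, ±2 about 95.4%, roughly ±2.576 about 99%, and ±3 about 99.7%.

```python
import math

def area_within(k):
    """Fraction of the area under a Normal curve within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 2.576, 3, 6):
    print(f"+/-{k} sigma: {area_within(k):.6%}")
```

The area never reaches exactly 100% for any finite k, which is the numerical face of the asymptotic tails described above.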

Other Distributions

There are many other distributions that are useful in operations, sales, engineering, etc. They are amply described in the literature, [12] and a brief listing is given in Table 2-3.

Table 2-3: Other Distributions

Distribution

General Application

Poisson

  • The Poisson distribution is used for counting the random arrival or occurrence of an event in a given time, area, distance, etc. For example, the random clicks of a Geiger counter or the random arrival of customers at a store or website is generally Poisson distributed.

  • The Poisson distribution has a parameter, λ, for arrival rate. As λ becomes large, the Poisson distribution is approximately Normal with μ = λ and σ² = λ.

Binomial

  • The Binomial distribution applies to events that have two outcomes, like the coin toss, where the outcomes are generally referred to as success or failure, true or false. If X is the number of successes in a series of n independent, repeated trials, then X has a Binomial distribution.

  • As n becomes large, the Binomial distribution is approximately Normal.

  • The number of heads in n tosses of a coin is Binomial, becoming all but Normal for large n.

Rayleigh

  • The Rayleigh distribution is an asymmetrical distribution over positive outcomes only. It approximates outcomes that tend to cluster around a most likely value but nonetheless have a small, nonzero probability of being very pessimistic.

  • The Rayleigh has a single parameter, "b", that is the most likely value of the outcome.

Student's t

  • The Student's t, or sometimes just t-distribution, is used in estimating confidence intervals when the variance of the population is unknown and the sample size is small.

  • The Student's t is closely related to the Normal distribution, being derived from it.

  • The Student's t has a parameter v, for "degrees of freedom." For large values of v, the Student's t is all but Normal.

Chi-square

  • Random variables of the form χ², the sum of the squares of n independent standard Normal variables, have the Chi-square distribution, named after the Greek letter chi (χ).

  • The Chi-square distribution is always positive and highly asymmetrical, appearing like a decaying exponential when a parameter, n, for degrees of freedom, is small.

  • The Chi-square finds application in hypothesis testing and in determining the distribution of the sample variance.
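Several entries in Table 2-3 say a distribution becomes approximately Normal for large parameter values. The Python sketch below compares probabilities side by side, assuming λ = 100 for the Poisson case and n = 100 fair-coin tosses for the Binomial case (both illustrative choices). It relies on the standard facts that a Poisson variable has mean and variance λ, and a Binomial variable has mean np and variance np(1 - p). The helper names `poisson_pmf`, `binomial_pmf`, and `normal_pdf` are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with arrival rate lam (computed in logs to avoid overflow)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def binomial_pmf(k, n, p):
    """P(X = k successes) in n independent trials, each with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Height of the Normal curve with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

lam = 100  # a "large" arrival rate
for k in (90, 100, 110):
    # Poisson vs Normal with mu = lam, sigma = sqrt(lam)
    print(k, round(poisson_pmf(k, lam), 4), round(normal_pdf(k, lam, math.sqrt(lam)), 4))

n, p = 100, 0.5  # 100 fair-coin tosses
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
for k in (45, 50, 55):
    # Binomial vs Normal with mu = np, sigma = sqrt(np(1 - p))
    print(k, round(binomial_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))
```

At these parameter values the paired columns agree to within about a thousandth; rerunning with a small λ or n (say 3) shows the approximation breaking down.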

[9]The probability function is also known as the "distribution function" or "probability distribution function."

[10]If the sum of "a" and "b" is a large number, then the BETA will be more narrow and peaked than a Normal; the ratio a/b controls the asymmetry of the BETA.

[11]Carl Friedrich Gauss (1777-1855) in 1801 published his major mathematical work, Disquisitiones Arithmeticae. Gauss was a theorist, an observer, astronomer, mathematician, and physicist.

[12]Downing, Douglas and Clark, Jeffery, Statistics the Easy Way, Barrons Educational Series, Hauppauge, NY, 1997, pp. 90–155.