Random Variables and Their Functions in Projects



Random Variables

So far, we have discussed random events (tails coming up on a coin toss) and probability spaces (a coin toss can only be heads or tails because there is nothing else in the probability space), but we have not formally talked about the numerical value of an event. If the number of heads or tails is counted, then the count per se is a random variable, CH for heads and CT for tails. [6] "Random variables" is the term we apply to numerical outcomes of events when the value cannot be known in advance of the event and where the value is a number within a range of numbers. When a random event is completed, like tossing a coin 100 times, then the random variables will have specific values obtained by counting, measuring, or observing. In the project world, task durations denoted D with values in hours, days, or weeks and cost figures denoted C with values in dollars, pesos, or euros are two among many random variables project managers will encounter.

If a variable is not a random variable, then it is a deterministic or single-point variable. A deterministic variable has no distribution of values but rather has one and only one value, and there is no uncertainty (risk) as to its measure.

As examples of random and deterministic variables, let us say that we measure cost with variable C and that C is a random variable with observed values of $8, $9, $10, $12, and $15. The range of values is therefore from $8 to $15. We do not know C with certainty, but we have an expectation that any value of C would be between $8 and $15. If C were deterministic and equal to $13.50, then $13.50 is the only value C can have. Thus, $13.50 is a risk-free measure of C.

If C were a random variable with values from $8 to $15, we would also be interested in the probability of C taking on a value of $8 or $12 or any of the other values. Thus we associate with C not only the range of values but also the probability that any particular value will occur.

Probability Functions

Random variables do not have deterministic values. In advance of a random outcome, like the uncertain duration or cost of a work package, the project team can only estimate the probable values, but the team will not know for sure what value is taken until the outcome occurs. Of course, we do not know for sure that the event will happen at all. The event itself can only be predicted probabilistically.

In the coin toss, let H = 1 if a toss comes up heads and 0 otherwise, and similarly for T. On any single toss, the two outcomes happen to be equally probable:

p(H = 1 on one toss) = p(T = 1 on one toss) = 0.5

But equal values may not be the case for all random variables in all situations.

p(D = 7 in one roll of two dice) = 1/6 = 0.167

p(D = 5 in one roll of two dice) = 1/9 = 0.111

where D = value of the sum of the two faces of the dice on a single roll.
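The two dice probabilities above can be checked by enumeration. The following is a minimal sketch, assuming two fair six-sided dice; the names `outcomes`, `counts`, and `pmf` are illustrative and do not come from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (first die, second die) combinations.
# Assumption: both dice are fair and six-sided.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many combinations produce each value of the sum D.
counts = Counter(a + b for a, b in outcomes)

# Probability function: p(D = d) = (combinations giving d) / 36.
pmf = {d: Fraction(n, len(outcomes)) for d, n in sorted(counts.items())}

for d, p in pmf.items():
    print(f"p(D = {d:2d}) = {p} = {float(p):.3f}")
```

Exact fractions are used so that the results compare directly with 1/6 and 1/9 rather than with rounded decimals.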

The probability function [7] is the mathematical relationship between a random variable's value and the probability of obtaining that value. In effect, the probability function creates a functional relationship between a probability and an event:

f(X | value) = p(X = some condition or value)

f(X | a) = p(X = a), where the "|" is the symbol used to mean "evaluated at" or "given a value of" the number "a". Example: f(H | true) = 0.5 from the coin toss.

Discrete Random Variables

So far, our examples of random variables have been discrete random variables. H or T could only take on discrete values on any specific toss: 1 or 0. On any given toss, we have no way of knowing what value H or T will take, but we can estimate or calculate what the probable outcomes are, and we can say for certain, because the random variables are discrete, that they will not take on in-between values. For sure, H cannot take on a value of 0.75 on any specific toss; only values of 1 (true) or 0 (false) are allowed. Sometimes knowing what values cannot happen is as important as knowing what values will happen.

Discrete random variables are quite useful in projects when counting things that have an atomic size. People, for instance, are discrete. There is no such thing as one-half a person. Many physical and tangible objects in projects fit this description. Sometimes actions by others are discrete random variables in projects. We may not know at the outset of a project if a regulation will be passed or a contract option exercised, but we can calculate the probability that an action will occur, yes or no.

Often there is no limit to the number of values that a random variable can take on within its allowed range, nor to how close one value can be to the next; values can be arbitrarily close together. The only requirement is that, over all values of the discrete random variable, the probabilities of occurrence sum to 1:

$$\sum_{i=1}^{n} f_i(X \mid a_i) = 1$$

where $f_i(X \mid a_i)$ is one of "n" probability functional values for the random variable, there being one functional value for each of the "n" values that X can take on in the probability space, and $a_i$ is the ith probable value of X.

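In the coin toss experiment, "n" = 2 and "a" could have one of two values: 1 or 0. In the dice roll, the probability space holds 36 equally likely combinations of the two faces, which yield 11 distinct values of the sum D (2 through 12); the values are shown in Table 2-1.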
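The sum-to-1 requirement is easy to test for any proposed discrete probability function. The sketch below is illustrative only: the helper `is_valid_pmf` and the regulation probabilities are hypothetical, chosen to mirror the yes/no project actions mentioned earlier.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if every probability lies in [0, 1] and they sum to 1."""
    probs = list(pmf.values())
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    return in_range and math.isclose(sum(probs), 1.0, abs_tol=tol)

# Coin toss: n = 2, with values 1 (true) and 0 (false).
coin = {1: 0.5, 0: 0.5}

# Hypothetical yes/no project action, such as a regulation being passed.
regulation = {"passed": 0.3, "not passed": 0.7}

# An incomplete estimate: the listed probabilities sum to only 0.8.
incomplete = {3: 0.1, 5: 0.3, 7: 0.4}

print(is_valid_pmf(coin), is_valid_pmf(regulation), is_valid_pmf(incomplete))
```

A small tolerance is used because decimal probabilities such as 0.3 and 0.7 are not represented exactly in floating point.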

Continuous Random Variables

As the number of values of X increases in a given range of values, the spacing between them becomes smaller, so small in the limit that one cannot distinguish between one unique value and another. The values' individual probabilities likewise become arbitrarily small in order not to violate the rule about all probabilities adding up to 1. Such a random variable is called a continuous random variable because there is literally no space between one value and another; one value flows continuously to the next. Curiously, the probability of a specific value is arbitrarily near but not equal to 0. However, over a small range, say from X1 to X1 + dX, the probability of X being in this range is not necessarily small. [8]

As the number of elements in the probability function becomes arbitrarily large, the summation $\sum$ morphs smoothly into the integral $\int$. The expression $\int_a^b f(X)\,dX$ means integrate f(X) over all continuous values of X from the lower limit a to the upper limit b:

$$\int_a^b f(X)\,dX = 1$$
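A numerical sketch can make the continuous case concrete. It assumes, purely for illustration, that a work package cost follows a triangular density between $8 and $15 with a most likely value of $10. The triangular shape, the $10 mode, and the helper names are assumptions, not taken from the text, and the density function requires low < mode < high.

```python
def triangular_pdf(x, low, mode, high):
    """Density of a triangular distribution; zero outside [low, high].

    Assumes low < mode < high.
    """
    if x < low or x > high:
        return 0.0
    if x <= mode:
        return 2.0 * (x - low) / ((high - low) * (mode - low))
    return 2.0 * (high - x) / ((high - low) * (high - mode))

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

LOW, MODE, HIGH = 8.0, 10.0, 15.0  # hypothetical cost figures in dollars

def cost_pdf(x):
    return triangular_pdf(x, LOW, MODE, HIGH)

# Integrating the density over the whole range gives (approximately) 1.
total = integrate(cost_pdf, LOW, HIGH)

# Probability of the cost landing in a narrow one-cent band at the mode.
narrow = integrate(cost_pdf, 10.0, 10.01)

print(f"total probability = {total:.6f}")
print(f"p(10.00 <= C <= 10.01) = {narrow:.6f}")
```

The narrow band carries only a small probability even at the most likely value, while the full range integrates to 1, which is the behavior described above.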

There are any number of continuous random variables in projects, or random variables that are so nearly continuous as to be reasonably thought of as continuous. The actual cost range of a work breakdown structure work package, discrete perhaps to the penny but for most practical applications continuous, is one example. Schedule duration range is another if measured to arbitrarily small units of time. Lifetime ranges of tools, facilities, and components are generally thought of as continuous.

Cumulative Probability Functions

It is useful in many project situations to think of the accumulating probability of an event happening. For instance, it might be useful to convey to the project sponsor that "...there is a 0.6 probability that the schedule will be 10 weeks or shorter." Since the maximum cumulative probability is 1, at some point the project manager can declare "...there is certainty, with probability 1, that the schedule will be shorter than x weeks."

We already have the function that will give us this information; we need only apply it. If we sum the probability functions of X over a contiguous range of values $a_i$, then we have what we want: $\sum_{i=m}^{n} f_i(X \mid a_i)$ accumulates the probabilities of the values between the limits "m" and "n".

Table 2-2 provides an example of how a cumulative probability function works for a discrete random variable.

Table 2-2: Cumulative Discrete Probability Function

| A: Outcome of Random Variable Di for an Activity Duration | B: Probability Density of Outcome Di | C: Cumulative Probability of Outcome Di |
|---|---|---|
| 3 days | 0.1 | 0.1 |
| 5 days | 0.3 | 0.4 |
| 7 days | 0.4 | 0.8 |
| 10 days | 0.15 | 0.95 |
| 20 days | 0.05 | 1.0 |

Di is an outcome of an event described by the random variable D for task duration.

The probability of a single-valued outcome is given in column B; the accumulating probability that the duration will be equal to or less than the outcome in column A is given in column C.
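The running total in column C of Table 2-2 can be reproduced with a few lines of code. This is a minimal sketch; the variable names and the helper `prob_at_most` are illustrative, and rounding is applied only to suppress floating-point noise.

```python
from itertools import accumulate

# Columns A and B of Table 2-2.
durations_days = [3, 5, 7, 10, 20]
probabilities = [0.1, 0.3, 0.4, 0.15, 0.05]

# Column C: running total of column B, rounded to remove float noise.
cumulative = [round(c, 10) for c in accumulate(probabilities)]

def prob_at_most(days):
    """P(D <= days): cumulative value of the largest outcome not exceeding days."""
    result = 0.0
    for d, c in zip(durations_days, cumulative):
        if d <= days:
            result = c
    return result

for d, p, c in zip(durations_days, probabilities, cumulative):
    print(f"{d:>2} days  p = {p:<5}  cumulative = {c}")

print("P(D <= 7 days) =", prob_at_most(7))
```

Because the variable is discrete, the cumulative probability is a step function: a query for 8 or 9 days returns the same value as for 7 days.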

Of course, for a continuous random variable, it is pretty much the same. We integrate from one limit of value to another to find the probability of the value of X falling in the range between the limits of integration. For our purposes, integration is nothing more than summation with arbitrarily small separation between values.
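Written out, the continuous case might be expressed as follows. The symbol F for the cumulative probability function, and x1 and x2 for the limits of integration, are notation assumed here for illustration; a and b are the lower and upper limits of the full range of X, as before.

```latex
F(x) = p(X \le x) = \int_{a}^{x} f(X)\, dX, \qquad F(b) = 1

p(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(X)\, dX = F(x_2) - F(x_1)
```

The second line says that the probability of landing between two limits is the difference of the cumulative values at those limits, just as subtracting two entries in column C of Table 2-2 gives the probability of the outcomes between them.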

[6] Italicized bold capital letters will be used for random variables.

[7] The probability function is often called the "probability density function." This name helps distinguish it from the cumulative probability function and also fits with the idea that the probability function really is a density, giving probability per value.

[8] "dX" is a notation used to mean a small, but not zero, value. Readers familiar with introductory integral calculus will recognize this convention.