Covariance and Correlation in Projects


It often arises in the course of executing projects that one or more random variables, or events, appear to bear on the same project problem. For instance, fixed costs that accumulate period by period and the overall project schedule duration are two random variables with obvious dependencies. Two statistical terms come into play when two or more variables are in the same project space: covariance and correlation.

Covariance

Covariance is a measure of how much one random variable depends on another. Typically, we think in terms of "if X gets larger, does Y also get larger or does Y get smaller?" The covariance will be negative for the latter and positive for the former. The value of the covariance is not particularly meaningful since it will be large or small depending on whether X and Y are large or small. Covariance is defined simply as:

COV(X,Y) = E(X * Y) - E(X) * E(Y)

If X and Y are independent, then E(X * Y) = E(X) * E(Y), and COV(X,Y) = 0.
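The definition can be checked numerically on a small discrete distribution. The Python sketch below is illustrative only: the function name covariance and the two small distributions are assumptions made for this example and do not come from the text. It builds one joint distribution as the product of two marginals (independent variables) and one in which X and Y always rise together.

```python
from itertools import product

def covariance(joint):
    """COV(X,Y) = E(X*Y) - E(X)*E(Y) for a joint pmf given as {(x, y): p}."""
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    return e_xy - e_x * e_y

# Hypothetical marginals; independence means p(x, y) = p(x) * p(y).
p_x = {1: 0.4, 2: 0.6}
p_y = {10: 0.3, 20: 0.7}
independent = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

# Hypothetical dependent case: Y is large exactly when X is large.
dependent = {(1, 10): 0.5, (2, 20): 0.5}

cov_independent = covariance(independent)  # zero, up to floating-point error
cov_dependent = covariance(dependent)      # positive: X and Y move together

print(abs(cov_independent) < 1e-9, cov_dependent > 0)
```

In the independent case the result is zero only up to floating-point rounding, which is why the check compares against a small tolerance rather than testing for exact equality.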

Table 2-7 provides a project situation of covariance involving the interaction of cost and schedule duration on a WBS work package. The example requires an estimate of cost given various schedule possibilities. Once these estimates are made, then an analysis can be done of the expected value and variance of each random variable, the cost variable, and schedule duration variable. These calculations provide all that is needed to calculate the covariance.

Table 2-7: Covariance and Correlation Example

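Table 2-7-A Cost * Duration Calculations

| Work Package Duration, D Value | Work Package Cost, $C | p(D * C) of a Joint Outcome | Joint Outcome, D * C | E(D * C) |
|---|---|---|---|---|
| 2 months | $10 | 0.1 | 20 | 2 |
| 2 months | $20 | 0.15 | 40 | 6 |
| 2 months | $60 | 0.05 | 120 | 6 |
| 3 months | $10 | 0.2 | 30 | 6 |
| 3 months | $20 | 0.3 | 60 | 18 |
| 3 months | $60 | 0.08 | 180 | 14.4 |
| 4 months | $10 | 0.02 | 40 | 0.8 |
| 4 months | $20 | 0.05 | 80 | 4 |
| 4 months | $60 | 0.05 | 240 | 12 |
| Totals: | | 1 | | 69.2 |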

Table 2-7-B Cost Calculations

| Work Package Cost, $C | p(C) of a Cost Outcome, Given All Schedule Possibilities | E(C), $ | σ^2 Variance and Standard Deviation |
|---|---|---|---|
| $10 | 0.32 | $3.2 | 62.7 = 0.32(10 - 24)^2 |
| $20 | 0.5 | $10 | 8 = 0.5(20 - 24)^2 |
| $60 | 0.18 | $10.8 | 233.3 = 0.18(60 - 24)^2 |
| Totals: | 1 | $24.00 | σC^2 = 304 |

σC = $17.44

Table 2-7-C Duration Calculations

| Work Package Duration, D Value | p(D) of a Schedule Outcome, Given All Cost Possibilities | E(D), months | σ^2 Variance and Standard Deviation |
|---|---|---|---|
| 2 months | 0.3 | 0.6 | 0.2 = 0.3(2 - 2.82)^2 |
| 3 months | 0.58 | 1.74 | 0.019 = 0.58(3 - 2.82)^2 |
| 4 months | 0.12 | 0.48 | 0.17 = 0.12(4 - 2.82)^2 |
| Totals: | 1 | 2.82 | σD^2 = 0.39 |

σD = 0.62 month

COV(D,C) = E(DC) - E(D) * E(C)

COV(D,C) = 69.2 - 2.82 * 24 = 1.52

Meaning: Because of the positive covariance, cost and schedule move in the same way; if one goes up, so does the other.

r(DC) = COV(D,C)/(σD * σC) = 1.52/(0.62 * $17.44) = 0.14

Meaning: Judging by a scale of -1 to +1, the "sensitivity" of cost to schedule is weak.
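The arithmetic in Table 2-7 can be reproduced in a few lines. The Python sketch below is one possible way to do it, not the author's method: the joint probabilities are taken from Table 2-7-A, while the variable names and the dictionary layout are assumptions made for this example.

```python
from math import sqrt

# Joint probabilities from Table 2-7-A: {(duration_months, cost_dollars): p}
joint = {
    (2, 10): 0.10, (2, 20): 0.15, (2, 60): 0.05,
    (3, 10): 0.20, (3, 20): 0.30, (3, 60): 0.08,
    (4, 10): 0.02, (4, 20): 0.05, (4, 60): 0.05,
}

# Expected values of the product and of each variable.
e_dc = sum(d * c * p for (d, c), p in joint.items())
e_d = sum(d * p for (d, _), p in joint.items())
e_c = sum(c * p for (_, c), p in joint.items())

# Variances of duration and cost about their own means.
var_d = sum((d - e_d) ** 2 * p for (d, _), p in joint.items())
var_c = sum((c - e_c) ** 2 * p for (_, c), p in joint.items())

# Covariance, then correlation as covariance normalized by the standard deviations.
cov_dc = e_dc - e_d * e_c
r_dc = cov_dc / (sqrt(var_d) * sqrt(var_c))

print(f"E(DC)={e_dc:.1f}  E(D)={e_d:.2f}  E(C)={e_c:.2f}")
print(f"VAR(D)={var_d:.2f}  VAR(C)={var_c:.0f}")
print(f"COV(D,C)={cov_dc:.2f}  r(D,C)={r_dc:.2f}")
```

Working from the joint distribution directly means the marginal probabilities in Tables 2-7-B and 2-7-C never have to be tabulated separately; they are implied by the sums over the joint outcomes.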

If the covariance of two random variables is not 0, then the variance of the sum of X and Y becomes:

VAR(X + Y) = VAR(X) + VAR(Y) + 2 * COV(X,Y)

The variance of a sum becomes a governing equation for the project management problem of shared resources, particularly people. If the random variable X describes the availability needed from one resource and Y the availability needed from another, then the total variance of the availability needed from the combined resources is given by the equation above. If the resources are not substitutes for one another, then the covariance will be positive in many cases, thereby broadening the availability need (that is, increasing the variance) and lengthening the schedule accordingly. This broadening phenomenon is the underlying principle behind the lengthening of schedules when they are "resource leveled." [24]
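The relation VAR(X + Y) = VAR(X) + VAR(Y) + 2 * COV(X,Y) can be verified numerically. The Python sketch below assumes a hypothetical joint distribution of the availability needs of two resources, in person-days; the numbers and names are invented for illustration and are not from the text. It computes the variance of the sum once directly and once from the equation.

```python
# Hypothetical joint pmf of availability needs (person-days) for resources X and Y.
joint = {
    (5, 4): 0.3, (5, 8): 0.1,
    (10, 4): 0.1, (10, 8): 0.5,
}

def expect(f):
    """Expected value of f(x, y) under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - e_x) ** 2)
var_y = expect(lambda x, y: (y - e_y) ** 2)
cov_xy = expect(lambda x, y: x * y) - e_x * e_y

# Variance of X + Y computed directly from the joint outcomes ...
e_sum = expect(lambda x, y: x + y)
var_sum_direct = expect(lambda x, y: (x + y - e_sum) ** 2)
# ... and from VAR(X) + VAR(Y) + 2 * COV(X,Y).
var_sum_formula = var_x + var_y + 2 * cov_xy

print(f"COV(X,Y)={cov_xy:.2f}")
print(f"direct={var_sum_direct:.2f}  formula={var_sum_formula:.2f}")
print(f"sum of variances alone={var_x + var_y:.2f}")
```

Because the covariance in this made-up case is positive, the variance of the combined need exceeds the sum of the two individual variances, which is the broadening effect described above.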

Correlation

Covariance does not directly measure the strength of the "sensitivity" of X on Y; judging the strength is the job of correlation. Sensitivity will tell us how much the cost changes if the schedule is extended a month or compressed a month. In other words, sensitivity is always a ratio, also called a density, as in this example: $cost change/month change. But if cost and time are random variables, what does the ratio of any single outcome among all the possible outcomes forecast for the future? Correlation is a statistical estimate of the effects of sensitivity, measured on a scale of -1 to +1.

The Greek letter rho, ρ, used with populations of data, and "r", used with samples of data, stand for the correlation between two random variables: r(X,Y). The usual way of referring to "r" or "ρ" is as the "correlation coefficient." Their values range from -1 to +1. A value of 0 means no linear correlation, whereas -1 means perfectly correlated but moving in opposite directions, and +1 means perfectly correlated and moving in the same direction.

The correlation function is defined as the covariance normalized by the product of the standard deviations:

r(X,Y) = COV(X,Y)/(σX * σY)

We can now rewrite the variance equation:

VAR(X + Y) = VAR(X) + VAR(Y) + 2 * ρ * σX * σY

Table 2-7 provides a project example of correlation.

[24]A Guide to the Project Management Body of Knowledge (PMBOK® Guide) — 2000 Edition, Project Management Institute, Newtown Square, PA, chap. 6, p. 76.