Covariance and Correlation in Projects
In the course of executing projects, it often happens that two or more random variables, or events, bear on the same project problem. For instance, fixed costs that accumulate period by period and the overall project schedule duration are two random variables with obvious dependencies. Two statistical terms come into play when two or more variables share the same project space: covariance and correlation.
Covariance
Covariance is a measure of how much one random variable depends on another. Typically, we think in terms of "if X gets larger, does Y also get larger or does Y get smaller?" The covariance will be negative for the latter and positive for the former. The value of the covariance is not particularly meaningful since it will be large or small depending on whether X and Y are large or small. Covariance is defined simply as:
COV(X,Y) = E(X * Y) - E(X) * E(Y)
If X and Y are independent, then E(X * Y) = E(X) * E(Y), and COV(X,Y) = 0.
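The independence property can be checked numerically. The following is a minimal sketch, assuming two small discrete distributions whose values are purely illustrative and not taken from the text; the helper name covariance is likewise hypothetical.

```python
from itertools import product

# Hypothetical marginal distributions (illustrative values, not from the text)
p_x = {1: 0.5, 2: 0.5}
p_y = {10: 0.25, 20: 0.75}

# Under independence, each joint probability is the product of the marginals
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}


def covariance(joint_pmf):
    """COV(X,Y) = E(X * Y) - E(X) * E(Y) for a joint pmf {(x, y): p}."""
    e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
    e_x = sum(x * p for (x, _), p in joint_pmf.items())
    e_y = sum(y * p for (_, y), p in joint_pmf.items())
    return e_xy - e_x * e_y


cov_independent = covariance(joint)
# Zero up to floating-point rounding, as the independence property predicts
print(abs(cov_independent) < 1e-12)
```

Note that the converse does not hold in general: a covariance of 0 does not by itself establish that X and Y are independent.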
Table 27 provides a project example of covariance involving the interaction of cost and schedule duration on a WBS work package. The example requires an estimate of cost given various schedule possibilities. Once these estimates are made, the expected value and variance of each random variable, cost and schedule duration, can be calculated. These calculations provide all that is needed to calculate the covariance.
Table 27: Covariance and Correlation Example
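Table 27A Cost * Duration Calculations

| Work Package Duration, D Value | Work Package Cost, $C | p(D * C) of a Joint Outcome | Joint Outcome, D * C | E(D * C) |
|---|---|---|---|---|
| 2 months | $10 | 0.1 | 20 | 2 |
| 2 months | $20 | 0.15 | 40 | 6 |
| 2 months | $60 | 0.05 | 120 | 6 |
| 3 months | $10 | 0.2 | 30 | 6 |
| 3 months | $20 | 0.3 | 60 | 18 |
| 3 months | $60 | 0.08 | 180 | 14.4 |
| 4 months | $10 | 0.02 | 40 | 0.8 |
| 4 months | $20 | 0.05 | 80 | 4 |
| 4 months | $60 | 0.05 | 240 | 12 |
| Totals: | | 1 | | 69.2 |
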
Table 27B Cost Calculations

| Work Package Cost, $C | p(C) of a Cost Outcome, Given All Schedule Possibilities | E(C), $ | σ^{2} Variance and Standard Deviation |
|---|---|---|---|
| $10 | 0.32 | $3.2 | 62.7 = 0.32(10 - 24)^{2} |
| $20 | 0.5 | $10 | 8 = 0.5(20 - 24)^{2} |
| $60 | 0.18 | $10.8 | 233.3 = 0.18(60 - 24)^{2} |
| Totals: | 1 | $24.00 | σ_{C}^{2} = 304; σ_{C} = $17.44 |

Table 27C Duration Calculations

| Work Package Duration, D Value | p(D) of a Schedule Outcome, Given All Cost Possibilities | E(D), months | σ^{2} Variance and Standard Deviation |
|---|---|---|---|
| 2 months | 0.3 | 0.6 | 0.2 = 0.3(2 - 2.82)^{2} |
| 3 months | 0.58 | 1.74 | 0.019 = 0.58(3 - 2.82)^{2} |
| 4 months | 0.12 | 0.48 | 0.17 = 0.12(4 - 2.82)^{2} |
| Totals: | 1 | 2.82 | σ_{D}^{2} = 0.39; σ_{D} = 0.62 month |

COV(D,C) = E(D * C) - E(D) * E(C)

COV(D,C) = 69.2 - 2.82 * 24 = 1.52

Meaning: Because of the positive covariance, cost and schedule move in the same way; if one goes up, so does the other. 

r(DC) = COV(D,C)/(σ_{D} * σ_{C}) = 1.52/(0.62 * $17.44) = 0.14 

Meaning: Judging by a scale of -1 to +1, the "sensitivity" of cost to schedule is weak.
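The Table 27 arithmetic can be reproduced in a few lines. This is a sketch rather than a prescribed method: the joint probabilities are taken from Table 27A, while the variable names are assumptions made for illustration.

```python
from math import sqrt

# Joint probabilities from Table 27A: {(duration in months, cost in $): p}
joint = {
    (2, 10): 0.10, (2, 20): 0.15, (2, 60): 0.05,
    (3, 10): 0.20, (3, 20): 0.30, (3, 60): 0.08,
    (4, 10): 0.02, (4, 20): 0.05, (4, 60): 0.05,
}

# Expected values of the product and of each variable
e_dc = sum(d * c * p for (d, c), p in joint.items())
e_d = sum(d * p for (d, _), p in joint.items())
e_c = sum(c * p for (_, c), p in joint.items())

# Variances; summing over the joint table is equivalent to using the marginals
var_d = sum(p * (d - e_d) ** 2 for (d, _), p in joint.items())
var_c = sum(p * (c - e_c) ** 2 for (_, c), p in joint.items())

# Covariance and correlation coefficient
cov_dc = e_dc - e_d * e_c
r_dc = cov_dc / (sqrt(var_d) * sqrt(var_c))

print(f"E(D*C) = {e_dc:.2f}, E(D) = {e_d:.2f}, E(C) = {e_c:.2f}")
print(f"VAR(D) = {var_d:.2f}, VAR(C) = {var_c:.2f}")
print(f"COV(D,C) = {cov_dc:.2f}, r(DC) = {r_dc:.2f}")
```

The printed values agree with the table totals to two decimal places, which is a useful check whenever the joint probability estimates are revised.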
If the covariance of two random variables is not 0, then the variance of the sum of X and Y becomes:
VAR(X + Y) = VAR(X) + VAR(Y) + 2 * COV(X,Y)
The variance of a sum becomes a governing equation for the project management problem of shared resources, particularly people. If the random variable X describes the availability need for one resource and Y the need for another, then the total variance of the availability need of the combined resources is given by the equation above. If resources are not substitutes for one another, then the covariance will be positive in many cases, thereby broadening the availability need (that is, increasing the variance) and lengthening the schedule accordingly. This broadening phenomenon is the underlying principle behind the lengthening of schedules when they are "resource leveled."^{[24]}
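The broadening effect of a positive covariance can be checked numerically. The sketch below assumes a hypothetical joint distribution of the availability need, in staff-days, for two resources; the numbers and names are illustrative only and do not come from the text.

```python
# Hypothetical joint pmf of availability need in staff-days: {(x, y): p}.
# The two needs tend to be high or low together, so the covariance is positive.
joint = {(5, 5): 0.4, (5, 10): 0.1, (10, 5): 0.1, (10, 10): 0.4}


def expect(f):
    """Expected value of f(x, y) under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - e_x) ** 2)
var_y = expect(lambda x, y: (y - e_y) ** 2)
cov_xy = expect(lambda x, y: x * y) - e_x * e_y

# Variance of the combined need, computed directly from the distribution of X + Y
e_sum = expect(lambda x, y: x + y)
var_sum_direct = expect(lambda x, y: (x + y - e_sum) ** 2)

# The same quantity from VAR(X) + VAR(Y) + 2 * COV(X,Y)
var_sum_formula = var_x + var_y + 2 * cov_xy

print(var_x + var_y, var_sum_direct, var_sum_formula)
```

With these assumed numbers the combined variance exceeds VAR(X) + VAR(Y) by twice the covariance, which is the broadening the paragraph describes.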
Correlation
Covariance does not directly measure the strength of the "sensitivity" of X to Y; judging the strength is the job of correlation. Sensitivity will tell us how much the cost changes if the schedule is extended a month or compressed a month. In other words, sensitivity is always a ratio, also called a density, as in this example: $cost change/month change. But if cost and time are random variables, what does the ratio of any single outcome among all the possible outcomes forecast for the future? Correlation is a statistical estimate of the effects of sensitivity, measured on a scale of -1 to +1.
The Greek letter rho, ρ, used on populations of data, and "r", used with samples of data, stand for the correlation between two random variables: r(X,Y). The usual way of referring to "r" or "ρ" is as the "correlation coefficient." As such, their values can range from -1 to +1. A value of 0 means no correlation, whereas -1 means highly correlated but moving in opposite directions, and +1 means highly correlated moving in the same direction.
The correlation function is defined as the covariance normalized by the product of the standard deviations:
r(X,Y) = COV(X,Y)/(σ_{X} * σ_{Y})
We can now rewrite the variance equation:
VAR(X + Y) = VAR(X) + VAR(Y) + 2 * ρ * σ_{X} * σ_{Y}
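The rewrite follows from solving the correlation definition for the covariance and substituting it into the variance of the sum. A short derivation sketch, using the same symbols as the text, with the limiting cases added for illustration:

```latex
\begin{align*}
\rho &= \frac{\mathrm{COV}(X,Y)}{\sigma_X \sigma_Y}
  \;\Longrightarrow\; \mathrm{COV}(X,Y) = \rho\,\sigma_X \sigma_Y \\
\mathrm{VAR}(X+Y) &= \mathrm{VAR}(X) + \mathrm{VAR}(Y) + 2\,\mathrm{COV}(X,Y) \\
  &= \sigma_X^2 + \sigma_Y^2 + 2\,\rho\,\sigma_X \sigma_Y \\[4pt]
\rho = +1:&\quad \mathrm{VAR}(X+Y) = (\sigma_X + \sigma_Y)^2 \\
\rho = 0:&\quad \mathrm{VAR}(X+Y) = \sigma_X^2 + \sigma_Y^2 \\
\rho = -1:&\quad \mathrm{VAR}(X+Y) = (\sigma_X - \sigma_Y)^2
\end{align*}
```

The limiting cases show the range of the effect: perfectly correlated variables add their standard deviations, uncorrelated variables add only their variances, and perfectly opposed variables partly cancel.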
Table 27 provides a project example of correlation.
^{[24]}A Guide to the Project Management Body of Knowledge (PMBOK® Guide) — 2000 Edition, Project Management Institute, Newtown Square, PA, chap. 6, p. 76.