Monte Carlo Simulation of the Network Performance



The arithmetic of finding expected value, standard deviation, and variance, at least to approximate values suitable for project management, is not hard when working with the most common distributions we have described so far in this book. Anyone with reasonable proficiency in arithmetic can do it, and with a calculator or spreadsheet the math is trivial. However, applying the manual method to a network of hundreds, thousands, or even tens of thousands of tasks is so tedious that the number of hand calculations is overwhelming and impractical. Moreover, the usual approach when applying manual methods is to work only with the expected value of the distribution. The expected value is the best single number in the face of uncertainty, to be sure, but if the probability distribution has been estimated, then the distribution is a much richer representation of the probable task performance than the expected value alone. Sensibly, whenever more information is available to the project manager, it is appropriate to apply the more complete information set to the project planning and estimating activities.

If working only with the expected values is a tedious undertaking on a complex network, consider the idea of working with many points from the probability distribution of each task on the network. You immediately come to the conclusion that it is not possible to do such a thing manually. Thus, we look to computer-aided simulation to assist the project manager in evaluating the project network. One immediate advantage is that all the information about task performance represented by the probability distribution is available and usable in the computer simulation. There are many simulation possibilities, but one popular choice, compatible with almost all scheduling programs and spreadsheets, is the Monte Carlo simulation.

The Monte Carlo Simulation

The concept of operations behind the Monte Carlo simulation is quite simple: by using a Monte Carlo computer simulation program, [4] the network schedule is "run," or calculated, many times, something we cannot usually do in real projects. Each time the schedule is run, a duration figure for each task is picked from the possible values within the optimistic-to-pessimistic range of the probability distribution for the task. For any given task, the duration value that is picked will usually differ from run to run. Perhaps the first time the schedule is calculated, the most pessimistic duration is picked. The next time the schedule is run, perhaps the most likely duration is picked. In fact, over a large number of runs, wherein each run means calculating the schedule differently according to the probabilistic outcomes of the task durations, a report of the durations picked for a single task would show that the values picked and their frequency of pick look just like the probability distribution we assigned to the task. The most likely value would be picked the most, and the most pessimistic or optimistic values would be picked least frequently. Table 7-1 shows such a report in histogram form. The histogram divides the range of duration values into discrete cells, and for each cell there is a count of the number of times a duration value within that cell occurred.
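The run-and-pick loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the task names, the three-point estimates, the two-path network, and the choice of a triangular distribution are all hypothetical, and a real scheduling add-in would work from the actual network.

```python
import random
from collections import Counter

# Hypothetical three-point estimates in days: (optimistic, most likely, pessimistic).
TASKS = {"A": (8, 10, 15), "B": (4, 5, 9), "C": (10, 12, 20)}

# Hypothetical network: A then B on one path, C on a parallel path,
# both merging at the finish milestone.
PATHS = [("A", "B"), ("C",)]


def run_schedule(rng):
    """Calculate the schedule once with freshly picked task durations."""
    durations = {
        name: rng.triangular(optimistic, pessimistic, most_likely)
        for name, (optimistic, most_likely, pessimistic) in TASKS.items()
    }
    # The finish milestone occurs when the longest path completes.
    finish = max(sum(durations[task] for task in path) for path in PATHS)
    return durations, finish


def simulate(runs=1000, seed=1):
    """Run the schedule many times; tally task A's picks and the finish values."""
    rng = random.Random(seed)
    picks = Counter()
    finishes = []
    for _ in range(runs):
        durations, finish = run_schedule(rng)
        picks[round(durations["A"])] += 1  # one-day histogram cells
        finishes.append(finish)
    return picks, finishes


picks, finishes = simulate()
for day in sorted(picks):
    print(f"{day:2d} days | {'#' * (picks[day] // 10)}")
```

The printed bars form a rough histogram of the durations picked for task A; with enough runs its shape should resemble the triangular distribution assigned to that task.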

Table 7-1: Monte Carlo Outcome for Tasks

"Standard" Normal Distribution of Outcome Milestone

| Normalized Outcome Value[a] (As Offset from the Expected Value) | Histogram Value * 100[b] | Cumulative Histogram * 100[b] (Confidence) |
|---|---|---|
| -3 | 0.110796 | 0.110796 |
| -2.75 | 0.227339 | 0.338135 |
| -2.5 | 0.438207 | 0.776343 |
| -2.25 | 0.793491 | 1.569834 |
| -2 | 1.349774 | 2.919607 |
| -1.75 | 2.156932 | 5.07654 |
| -1.5 | 3.237939 | 8.314478 |
| -1.25 | 4.566226 | 12.8807 |
| -1 | 6.049266 | 18.92997 |
| -0.75 | 7.528433 | 26.4584 |
| -0.5 | 8.80163 | 35.26003 |
| -0.25 | 9.6667 | 44.92673 |
| 0 | 9.973554 | 54.90029 |
| 0.25 | 9.6667 | 64.56699 |
| 0.5 | 8.80163 | 73.36862 |
| 0.75 | 7.528433 | 80.89705 |
| 1 | 6.049266 | 86.94632 |
| 1.25 | 4.566226 | 91.51254 |
| 1.5 | 3.237939 | 94.75048 |
| 1.75 | 2.156932 | 96.90741 |
| 2 | 1.349774 | 98.25719 |
| 2.25 | 0.793491 | 99.05068 |
| 2.5 | 0.438207 | 99.48888 |
| 2.75 | 0.227339 | 99.71622 |
| 3 | 0.110796 | 99.82702 |

[a]The outcome values lie along the horizontal axis of the probability distribution. For simplicity, the average value of the outcome (i.e., the mean or expected value) has been adjusted to 0 by subtracting the actual expected value from every outcome value: Adjusted outcomes = Actual outcomes - Expected value.

After adjusting for the mean, the adjusted outcomes are then "normalized" to the standard deviation by dividing the adjusted outcomes by the standard deviation: Normalized outcomes = Adjusted outcomes/σ.

After adjusting for the mean and normalizing to the standard deviation, we now have the "standard" Normal distribution.

[b]The histogram value is the area of the histogram cell, that is, the product of the horizontal width of the cell (outcome) times the vertical value (probability); the cumulative histogram, or cumulative probability, is the confidence that an outcome value, or a lesser value, will occur: Confidence = Probability(outcome ≤ Outcome value).

For better viewing, the cell area and the cumulative area have been multiplied by 100 to remove leading zeroes. The actual values are found by dividing the values shown by 100.
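As a cross-check on the arithmetic, the values in Table 7-1 appear consistent with the standard Normal density multiplied by a cell width of 0.25 and then by 100, with the confidence column as a running sum. The cell width is an inference from the tabulated numbers, not something the table states, and the function names below are hypothetical. A minimal sketch:

```python
import math

CELL_WIDTH = 0.25  # assumed spacing between tabulated outcome values


def standard_normal_density(z):
    """Density of the standard Normal distribution (mean 0, sigma 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def normalize(actual, expected_value, sigma):
    """Normalized outcome = (actual outcome - expected value) / sigma."""
    return (actual - expected_value) / sigma


def table_rows(z_min=-3.0, z_max=3.0, width=CELL_WIDTH):
    """Return (outcome, cell area * 100, cumulative area * 100) rows."""
    rows = []
    cumulative = 0.0
    steps = round((z_max - z_min) / width)
    for i in range(steps + 1):
        z = z_min + i * width
        cell = standard_normal_density(z) * width * 100
        cumulative += cell
        rows.append((z, cell, cumulative))
    return rows


rows = table_rows()
for z, cell, cumulative in rows:
    print(f"{z:6.2f}  {cell:9.6f}  {cumulative:9.5f}")
```

The computed rows agree with the tabulated ones to within a few millionths, which suggests small rounding differences in the original calculation.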

Monte Carlo Simulation Parameters

The project manager gets to control many aspects of the Monte Carlo simulation. Such control gives the project manager a fair amount of flexibility to obtain the analysis desired. A few of the parameters usually under project manager control follow. The software package actually used determines which of these parameters can be set, but typically:

  • The distribution applied to each task, a group of tasks, or "globally" to the whole network can be picked from a list of choices.

  • The distribution parameters can be specified, such as pessimistic and optimistic values, either in absolute value or as a percentage of the most likely value that is also specified.

  • The task or milestone (one or more) that is going to be the "outcome" of the analysis can be picked.

  • The number of runs can be picked. It is usually hard to obtain good results without at least 100 independent runs of the schedule. By independent we mean that all initial conditions are reset and there is no memory of results from one run to the next. For larger and more complex networks, running the schedule 100 times may take some number of minutes, especially if the computer is not optimized for such simulations. Better results are obtained with 1,000 or more runs. However, there is a practical trade-off regarding analysis time and computer resources. This trade-off is up to the project manager to handle and manage.
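The parameters listed above might be expressed as follows. This is a sketch only: the `ThreePoint` helper, the task names, the percentages, and the simple chain used as the outcome are hypothetical stand-ins for whatever the chosen software package actually exposes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """Triangular distribution parameters for one task (hypothetical helper)."""
    optimistic: float
    most_likely: float
    pessimistic: float

    @classmethod
    def from_percent(cls, most_likely, minus_pct, plus_pct):
        """Build the range as percentages below and above the most likely value."""
        return cls(
            optimistic=most_likely * (1 - minus_pct / 100),
            most_likely=most_likely,
            pessimistic=most_likely * (1 + plus_pct / 100),
        )

    def pick(self, rng):
        return rng.triangular(self.optimistic, self.pessimistic, self.most_likely)


# A "global" percentage rule applied to every task, with one per-task
# override given in absolute values.
most_likely_days = {"design": 20, "build": 40, "test": 15}
overrides = {"test": ThreePoint(12, 15, 30)}
specs = {
    task: overrides.get(task, ThreePoint.from_percent(days, minus_pct=10, plus_pct=25))
    for task, days in most_likely_days.items()
}

RUNS = 1000  # number of independent runs
rng = random.Random(7)
# Each run picks every duration afresh; nothing carries over between runs.
# Outcome here: the finish of a simple design -> build -> test chain.
outcomes = [sum(spec.pick(rng) for spec in specs.values()) for _ in range(RUNS)]
```

Raising `RUNS` trades computing time for smoother results, which is the trade-off left to the project manager.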

Monte Carlo Simulation Outcomes

At the outcome task of the simulation, the usual simulation products are graphical, tabular, and often presented as reports. Figure 7-9 shows typical data, including a "critical path and near-critical analysis" on paths that might be in the example network. The usual analysis products from a Monte Carlo simulation might include:

  • A probability density distribution, with absolute values of outcome value and a vertical dimension scaled to meet the requirement that the sum of all probabilities equals 1

  • A cumulative probability distribution, the so-called "S" curve, again scaled from 0 to 1 or 0 to 100% on the vertical axis and the value outcomes on the horizontal axis

  • Other statistical parameters, such as mean and standard deviation

Figure 7-9: Monte Carlo Outcomes.
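The analysis products listed above can be derived from the raw list of outcome values. The sketch below assumes the outcomes are already available as a list of numbers; the sample data are generated stand-ins, not real simulation output, and the function names and cell width are hypothetical.

```python
import random
import statistics


def summarize(outcomes, cell_width=1.0):
    """Return mean, standard deviation, per-cell probabilities, and the S curve."""
    n = len(outcomes)
    counts = {}
    for value in outcomes:
        cell_start = (value // cell_width) * cell_width
        counts[cell_start] = counts.get(cell_start, 0) + 1

    probabilities = []  # (cell start, share of runs landing in the cell)
    s_curve = []        # (cell end, share of runs below the cell end)
    cumulative = 0.0
    for cell_start in sorted(counts):
        share = counts[cell_start] / n
        cumulative += share
        probabilities.append((cell_start, share))
        s_curve.append((cell_start + cell_width, cumulative))

    return {
        "mean": statistics.fmean(outcomes),
        "stdev": statistics.stdev(outcomes),
        "probabilities": probabilities,
        "s_curve": s_curve,
    }


def confidence(outcomes, target):
    """Fraction of runs in which the outcome was at or below the target value."""
    return sum(1 for value in outcomes if value <= target) / len(outcomes)


# Stand-in data: hypothetical milestone outcomes in days.
rng = random.Random(3)
outcomes = [rng.gauss(100, 5) for _ in range(2000)]
report = summarize(outcomes)
print(f"mean {report['mean']:.1f}, standard deviation {report['stdev']:.1f}")
print(f"confidence of finishing within 105 days: {confidence(outcomes, 105):.0%}")
```

The per-cell shares sum to 1, and the last point of the S curve reaches 1 (100%), matching the scaling described in the list.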

The Near-Critical Path

Identification of near-critical paths that have a reasonable probability of becoming critical is a key outcome of Monte Carlo simulation. For many project managers, the near-critical path identification is perhaps the most important outcome. In fact, during the course of the Monte Carlo simulation, depending on the distributions applied to the critical tasks and the distributions applied to other paths, there will be many runs, perhaps very many runs, where the critical path identified by straightforward CPM calculations will not in fact be critical. Some other path, on a probabilistic basis, is critical for that particular run. Most Monte Carlo packages optimized for schedule applications keep careful record of all the paths that are, or become, critical during a session of runs. Appropriate reports are usually available.

Project managers can usually specify a threshold for reporting the near-critical paths. For example, perhaps the report contains only information about paths that have a 70% confidence, or higher, of becoming critical. Setting the threshold helps to separate the paths that are nearly critical and should be on a "watch list" along with the CPM-calculated critical path. If the schedule network is complex, such threshold reporting goes a long way in conserving valuable management time.
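One way to picture the record keeping is to count, for each path, the share of runs in which that path turned out to be the longest, and then report only the paths at or above a chosen threshold. The sketch below assumes a hypothetical network of three parallel paths with triangular task durations; commercial packages may define and report criticality differently.

```python
import random
from collections import Counter

# Hypothetical three-point estimates in days: (optimistic, most likely, pessimistic).
TASKS = {
    "A": (8, 10, 15), "B": (4, 5, 9),
    "C": (10, 14, 20),
    "D": (3, 4, 6), "E": (5, 6, 8),
}
# Hypothetical parallel paths that merge at the outcome milestone.
# With most likely durations, CPM would identify A-B (15 days) as critical.
PATHS = {"A-B": ("A", "B"), "C": ("C",), "D-E": ("D", "E")}


def criticality(runs=1000, seed=11):
    """Share of runs in which each path was the longest (critical) path."""
    rng = random.Random(seed)
    times_critical = Counter()
    for _ in range(runs):
        durations = {
            name: rng.triangular(optimistic, pessimistic, most_likely)
            for name, (optimistic, most_likely, pessimistic) in TASKS.items()
        }
        lengths = {
            path: sum(durations[task] for task in tasks)
            for path, tasks in PATHS.items()
        }
        times_critical[max(lengths, key=lengths.get)] += 1
    return {path: times_critical[path] / runs for path in PATHS}


def watch_list(index, threshold):
    """Paths whose share of critical runs meets the reporting threshold."""
    return sorted(path for path, share in index.items() if share >= threshold)


index = criticality()
for path, share in index.items():
    print(f"{path:4s} critical in {share:.0%} of runs")
print("watch list at a 10% threshold:", watch_list(index, 0.10))
```

The 10% threshold in the last line is purely illustrative; the reporting threshold is whatever value the project manager chooses.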

Convergence of Parameters in the Simulation

During the course of the session of 100 or more runs of the schedule, you may be able to observe the "convergence" of some of the statistical parameters to their final values. For instance, the expected value or standard deviation of the selected "outcome" task is going to change rapidly from the very first run to subsequent runs as more data are accumulated about the outcome distribution. After a point, if the parameter is being displayed, the project manager may well be able to see when convergence to the final value is "close enough." Such an observation offers the opportunity to stop the simulation manually when convergence is obtained. If the computer program does not offer such a real-time observation, there is usually some type of report that provides figures of merit about how well converged the reported parameters are to a final value.

There is no magic formula that specifies how close to the final value the statistical parameters like expected value, standard deviation, or variance should be for use in projects. Project managers get to be their own judge about such matters. Some trial and error may be required before a project manager is comfortable with final results.
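To make the idea of watching convergence concrete, the sketch below keeps a running mean and standard deviation and stops once the standard error of the mean falls below a tolerance. Using the standard error as the figure of merit is an assumption made here for illustration, not a rule from the text, and the tolerance, the run limits, and the stand-in outcome generator are hypothetical.

```python
import math
import random


class RunningStats:
    """Running mean and standard deviation, updated one run at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def stdev(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    @property
    def standard_error(self):
        return self.stdev / math.sqrt(self.n) if self.n > 0 else 0.0


def run_until_converged(sample_outcome, tolerance, min_runs=100, max_runs=10_000):
    """Add runs until the standard error of the mean is within tolerance."""
    stats = RunningStats()
    while stats.n < max_runs:
        stats.add(sample_outcome())
        if stats.n >= min_runs and stats.standard_error <= tolerance:
            break
    return stats


# Stand-in for one schedule run: a hypothetical outcome in days.
rng = random.Random(5)
stats = run_until_converged(lambda: rng.gauss(100, 5), tolerance=0.25)
print(f"stopped after {stats.n} runs: mean {stats.mean:.2f}, "
      f"standard deviation {stats.stdev:.2f}")
```

A tighter tolerance requires more runs, so the choice of "close enough" remains a judgment call for the project manager.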

Fixed Dates and Multiple Precedences in Monte Carlo Simulations

Fixed dates interfere with the Monte Carlo simulation by truncating the possible durations of tasks to fixed lengths or inhibiting the natural "shift-right" nature of merge points as discussed in the next section. Before running a simulation, the general rule of thumb is to go through your schedule and remove all fixed dates, then replace them with finish-to-start dependencies of project outcomes.

The same is usually said for precedences other than finish-to-start: redefine all precedences to finish-to-start before running the simulation. Depending on the sophistication of the package, Monte Carlo results may be strange or even incorrect if the schedule contains dependencies other than finish-to-start. Again, the general rule of thumb is to go through your schedule, remove all relationships other than finish-to-start, and replace them with an alternative network architecture of all finish-to-start dependencies of project outcomes. Although this rule may seem objectionable at first, the author has found that few networks really require other than finish-to-start relationships if planning is done at the proper granularity to identify all the points for finish-to-start, which obviates the need to use the other relationships.

[4] There are many PC and larger-system software packages that will run a Monte Carlo simulation on a data set. In this chapter, our focus is on the network schedule, so the easiest approach is to obtain a package that "adds in" or integrates with your scheduling software. Of course, Monte Carlo simulation is not restricted to just schedule analysis. Any set of distributions can be analyzed in this way. For instance, the cost data from the WBS, whereon each cost account has a probability distribution, are candidates for Monte Carlo simulation. For cost analysis, an add-in to a spreadsheet or a statistical software package would be ideal for running a Monte Carlo analysis of a cost data set.