"Statistics are no substitute for judgment."
—HENRY CLAY, U.S. Senator
Risk assessment provides a prioritized list of risks. Using the list, it becomes clear just how much trouble your project is in. An accumulation of significant scope risks may indicate that your project is literally impossible. Too many schedule or resource risks may indicate that your project is unlikely to complete within its constraints. Project risk management is a potent tool for transforming a seemingly impossible project into a merely challenging one.
Managing risk begins with your prioritized list of significant risks, but these details and statistics are just the starting point; you must then add your judgment and decide how to proceed. For each of the significant risks, you need to seek root causes to determine your best management strategy. For risks where the project team has influence over the root cause, you can develop and analyze ideas to reduce or eliminate the risk and then modify the project plans to incorporate these ideas wherever it is feasible. For risks that cannot be avoided or that remain significant, you can then develop contingency plans for recovery should the risk occur.
Most of the content of this chapter falls into the "Risk Response Planning" portion of the Planning Processes in the PMBOK Guide, but it also draws from other Planning Processes. The principal concepts this chapter covers include:
What, if anything, can be done about a risk depends a great deal on its causes. For each identified risk that is assessed as significant, you must determine the source and type of risk that it represents.
The process for cause-and-effect analysis is not a difficult one. For risk analysis, it begins with the listed risks and their descriptions. Effective root cause analysis depends on a consistent and thorough understanding of the project's risks, so it helps to start by reviewing the list with the project contributors so that they can describe each risk in their own words.
The next step is to brainstorm possible sources for the risk. Any brainstorming process will be effective so long as it is successful in determining conditions or events that may lead to the risk. You can begin with major cause categories (such as scope, schedule, or resource) or simply think about specific factors that may lead to the risk. However you begin the analysis, complete it by organizing the information into categories of root cause. As in any brainstorming, it is best to focus on the quantity of ideas, not the quality. Even an idea that seems very unlikely may trigger an important thought for someone participating in the analysis. Some redundancy between the categories is common, and removing it is a matter of personal choice.
Cause-and-effect analysis using fishbone diagrams, so called because of their appearance, was popularized by the Japanese quality movement guru Dr. Kaoru Ishikawa (they are also sometimes called Ishikawa diagrams). These diagrams may be used to display root causes of risk visually, allowing deeper understanding of the source and likelihood of potential problems. Once you have organized the ideas into a branching diagram similar to the one in Figure 8-1, review it to see whether that perspective on the risk stimulates any additional thinking. Note that the causes may themselves have multiple potential sources. Continue the root cause analysis process for each significant risk in the project.
Figure 8-1: Fishbone diagram example.
In dealing with risk, there are really only two rather simple options. In an advertisement some years ago, the options were demonstrated pictorially using an egg. On the left side of the picture was a falling egg, headed for a pillow held in a person's hand. On the right side was a fallen egg, broken and oozing over the flat, hard surface it had smashed into, with a second hand swooping in holding a paper towel. The left side was titled "Prevention" and the right side "Recovery." Management of risk in projects always involves these tactics—prevention to deal with causes, and recovery to deal with effects.
The three categories of project risk are controllable known risks, uncontrollable known risks, and unknown risks. All the significant listed project risks are known risks and are either under your control or not. For each of these risks it is possible to plan for response, at least in theory, and that is the topic of this chapter. The third category, unknown risks, is hidden, so specific planning is not generally of much use. The best method for managing unknown risk involves setting project reserves, in schedule or budget (or both), on the basis of the measured consequences of unanticipated problems on similar past projects. Keeping track of specific past problems also converts your past unknown risks into known risks. Managing unknown project risk is addressed in more detail in Chapter 10.
Root cause analysis not only makes known project risks more understandable but also shows you how to manage each risk. Depending on the root cause or causes, you can determine whether the risk arises from factors you can control and may therefore be preventable or whether it is due to uncontrollable causes. When the causes are out of your control, risk can only be managed through recovery. These strategies are summarized in Figure 8-2.
Figure 8-2: Risk management strategies.
Known controllable risks are at least partially under the control of the project team. Risks such as the use of a new technology, small increases in complexity or performance of a deliverable, or pressure to establish aggressive deadlines are examples of this. Working from an understanding of the root causes for these problems, you may be able to modify project plans to avoid or minimize the risk.
For known uncontrollable risks, the project team has essentially no influence on the source of the risk. Loss of key project staff members, business reorganizations, and external project factors such as weather are examples. For these problems, the best tactic is to deal with effects after the risk occurs, recovering with a contingency plan you prepared in advance.
It is common for a root cause analysis to uncover both some causes that you can control and some that you cannot, for the same risk. Responding to risks with several possible sources may require both replanning and preparation for recovery.
While the dichotomy between controllable and uncontrollable may seem simple, it often is not. The perceived root causes of a risk vary depending on the description of the risk. To take the example of the fishbone diagram in Figure 8-1, many of the root causes seem out of the control of the project team, as the risk is described as the loss of a particular person. If the exposure were redefined to be the loss of a particular skill set, which is probably more accurate, then the root causes would shift to ones that the project might influence through cross-training, negotiating for additional staff, or other actions.
Even when a risk seems to be uncontrollable, the venerable idea from quality analysis of "Ask why five times" may open up the perspective on the risk and reveal additional options for response. If weather, earthquakes, or other natural disasters are listed as risks to particular activities, probe deeper into the situation to ask why and how that particular problem has an impact on the project. The risk may be a consequence of a project assumption or a choice made in planning that could be changed, resulting in a better, less problematic project. Shifting the time, venue, infrastructure, or other parameters of risky activities may remove uncontrollable risks from your project, or at least diminish their impact.
Two basic options are available for risk management: dealing with causes and dealing with effects. There are, however, a number of variations on both of these themes. Dealing with causes involves risk prevention—either eliminating the risk (avoidance) or lowering its probability or potential impact (mitigation). Avoidance of risks means changing the project plan or approach to remove the root cause of the risk from your project. One way to avoid falling off a cliff is to stay away from cliffs. Mitigating actions rarely remove a risk completely, but they do serve to reduce it. Some mitigating actions reduce the probability of a risk event, such as checking the air pressure in your automobile tires before a long trip. Other mitigations reduce the risk impact, such as wearing a seat belt to minimize injury. Neither of these actions prevents the problem, but they do serve to reduce the overall risk by lowering the "loss" or the "likelihood."
Similarly, some risks are transferred to others. Many kinds of financial risks are transferred to insurance companies; you may purchase coverage that will compensate your losses in the event of a casualty that is covered by the policy. Again, this does not remove the risk, but it does reduce the financial impact should the risk occur. Transfer of risk deals with causes if the impact of the risk is primarily financial, but in most cases it is used to deal with risk effects—aiding in the recovery.
Dealing with the effect of a risk may be done either in advance (contingency planning) or after the fact (passive acceptance). Some risks are too minor or too expensive to consider preventing. For minor risks, acceptance may be appropriate; simply decide to deal with the consequences of the problem if and when it occurs. For more serious problems where avoidance, mitigation, and transfer are ineffective, impractical, or impossible, contingency planning is the best option.
For some risks, one of these ideas will be sufficient; for others, it may be necessary to use several.
As was discussed briefly in Chapter 6, each activity risk has a signal, perhaps more than one, indicating that the risk has crossed over from a possibility to a certainty. This signal, or trigger event, may be in advance of the risk or coincident with it. It may be visible to everyone involved in the project, or it may be subtle and hidden. For each risk, strive to define a trigger event that provides as much advance notification of the problem as possible. Consider this risk: "A key project team member quits." One possible trigger event might be the submission of a resignation letter. This is an obvious trigger, but it is a late one. There are earlier triggers to watch for, such as a drop in motivation, erratic attendance, frequent "personal" telephone calls, or even an uncharacteristic improvement in grooming and dress. These triggers are not foolproof, and they require more attention and effort to monitor, but they may also foreshadow other problems even if the staff member does not intend to leave.
In addition to one or more trigger events, identify the portions of the project plan where the risk is most probable, being as precise as possible. For some risks there may be a single exposure related to one specific activity; more general risks (such as loss of key staff members) may occur throughout the project.
Risk management decisions and plans are made in advance of the trigger event, and they include all actions related to avoidance, mitigation, or transfer, as well as preparation for any contingent actions.
Risk management responses that relate to recovery fall on the project timeline after the risk trigger but are used only if necessary. For each significant risk that you cannot remove from the project, assign an owner to monitor for the trigger event and to be responsible for implementing the contingency plan or otherwise working toward recovery. The risk management timeline is summarized in Figure 8-3.
Figure 8-3: Risk management timeline.
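One way to keep the trigger events, exposure, and owner for each risk together is a small record per risk. The sketch below is illustrative only: the class name `RiskEntry`, its field names, and the sample values are assumptions made for this example, not a standard risk register format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskEntry:
    """One row of a hypothetical risk register (all field names are assumptions)."""
    description: str
    triggers: List[str]             # warning signs, earliest first
    exposed_activities: List[str]   # where in the plan the risk is most probable
    owner: str                      # person who monitors for the triggers
    prevention_actions: List[str] = field(default_factory=list)  # planned before any trigger
    contingency_plan: str = ""      # used only after a trigger, and only if needed

    def is_ready(self) -> bool:
        """A risk that stays in the project needs a trigger, an owner, and a recovery plan."""
        return bool(self.triggers and self.owner and self.contingency_plan)


staff_loss = RiskEntry(
    description="A key project team member quits",
    triggers=["drop in motivation", "erratic attendance", "resignation letter"],
    exposed_activities=["throughout the project"],
    owner="project leader",
    prevention_actions=["cross-train a second contributor"],
    contingency_plan="reassign work to the cross-trained backup and request a replacement",
)

print(staff_loss.is_ready())  # prints True
```

Listing the triggers from earliest to latest makes it visible when a risk has only a late trigger, such as the resignation letter, and would benefit from an earlier one.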
After each risk is categorized and you have identified those risks for which the project team can influence some or all of the causes, you are ready to begin developing response possibilities for prevention, including avoidance, mitigation, and transfer. Analyze all the options you and your team develop, examining both the cost of the idea and its potential benefits. If good, cost-effective ideas are proposed, the best of them are candidates for inclusion in your draft project plan. Prevention ideas must earn their way into the project plan. Even excellent ideas that completely remove a risk should be bypassed if their overall cost exceeds the expected "loss times likelihood" of the risk.
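The cost test described above can be sketched as a simple comparison. This is a minimal illustration, assuming that probability and impact can be estimated as single numbers; the function names and the sample figures are hypothetical. Extending the test to mitigation ideas that leave some residual risk is also an assumption of this sketch.

```python
def expected_loss(probability: float, impact_cost: float) -> float:
    """Expected 'loss times likelihood' of a risk, in the units of impact_cost."""
    return probability * impact_cost


def prevention_is_justified(prevention_cost: float,
                            loss_before: float,
                            loss_after: float = 0.0) -> bool:
    """True when the idea costs no more than the expected loss it removes.

    loss_after is the expected loss that remains once the idea is in place;
    the default of 0.0 models an idea that removes the risk completely.
    """
    return prevention_cost <= loss_before - loss_after


# Hypothetical risk: 20 percent chance of a 50,000 loss.
before = expected_loss(0.20, 50_000)

# An avoidance idea costing 15,000 removes the risk entirely.
print(prevention_is_justified(15_000, before))  # prints False

# A mitigation idea costing 3,000 lowers the probability to 5 percent.
after = expected_loss(0.05, 50_000)
print(prevention_is_justified(3_000, before, after))  # prints True
```

In this example the avoidance idea is bypassed even though it eliminates the risk, because it costs more than the expected loss, while the cheaper mitigation earns its way into the plan.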
The final process step is to integrate all accepted risk prevention ideas into your preliminary project plan and review the plan for new risks or unintended consequences as a result of the changes.
Generating Ideas for Each Risk
There are many ways to develop risk responses. A common method is brainstorming with the project team. This is widely used in planning where it is advantageous to start with a number of possible choices. It is also useful to discuss risks with peers and others who may have relevant experiences, and it may be worthwhile to consult experts and specialists for types of risks that you are not familiar with.
Few known risks are completely novel, so it is quite possible that many of the risks you face have been addressed on earlier projects. A quick review of project retrospective analyses, final reports, "lessons learned," and other archived materials may provide information on what others did in response to similar risk situations they encountered. In addition to finding things that did not work and that are worth avoiding, you may find useful ideas that effectively deal with the risks you need to manage.
There are also many ideas available in the public domain, in papers, books, articles, and on the Web. References on project management, particularly those that are tailored to projects similar to yours, are filled with advice, and some of it may be very valuable. Life cycles and project management methodologies also provide direction and useful ideas for risk prevention and avoidance.
A number of possible preventive actions follow in the next several pages, including tactics for risk avoidance, mitigation, and transfer. These may be useful in seeding a brainstorming exercise or in planning for a specific response. The ideas listed here include some that may be appropriate only for particular kinds of technical projects, but many are useful for any sort of work.
Strategies for Avoiding Risks
Risk avoidance is the most effective way to deal with the causes of risks, because it obliterates them. Unfortunately, avoidance is not possible for all project risks, because many risks are tightly coupled with the requirements of technical projects. Avoiding risks in your project requires you to reconsider choices and decisions you made in defining and planning your project. Most of Chapters 3, 4, and 5 concerned use of project planning processes to identify risks. While some of the risks you discovered may be unavoidable consequences of your project, a review of the current state of your plan may turn up opportunities to replan the work in ways that remove specific serious risks. Tactics for avoiding scope risks suggested by the material in Chapter 3 include:
Many of your schedule risks are consequences of decisions you made in preparing your preliminary schedule. You may be able to remove sources of schedule risk using ideas covered in Chapter 4:
Resource risks may also be a consequence of choices you made in resource planning. Explore opportunities to avoid these risks using the concepts of Chapter 5:
Avoidance tactics are not limited to these ideas by any means. Anything that you can realistically do to eliminate the root cause of a risk has potential for risk avoidance.
General Risk Mitigation Strategies
While avoidance is very effective in managing risk, it can never deal with all your significant project risks; at least some of them are intrinsic to your project. Mitigation strategies that reduce the probability or impact of a potential problem are the next choice for risk management. Some generic ideas for risk mitigation include:
One of the least expensive and strongest preventive actions a project leader can take is to communicate more, and more effectively. Risks and risk consequences that are visible always affect the way that people work. If all the team members are aware of how painful the project will become if a risk occurs, they are likely to work, to the best of their ability, in ways that minimize the risk. Discuss risks regularly. Post risk lists on walls, on Web sites, and in other places people will see them. Include risk status information in status reporting. In addition to specific risk data, also establish unambiguous project documentation that is easily accessible to the whole team. Communication can significantly reduce risk probabilities. Communicate. Communicate. Communicate.
Another broad strategy for managing risk relates to project staffing. Difficult projects benefit from having a mix of specialists and generalists. Specialists are essential on technical projects because no one can know everything, and the specialist can generally complete assigned work in his or her specialty much faster than a generalist. However, a project team composed only of specialists is not very robust and tends to run into frequent trouble. The reason is that project planning on specialist-heavy projects is often intense and detailed for work in the specialists' areas and remarkably sketchy for other work. Also, such teams often lack broad problem-solving skills. Generalists on a project (nearly always including the project leader) are needed to fill in the gaps and ensure that as much of the project work as possible is initially made visible. Generalists are also best at finding and solving cross-disciplinary problems. As the head generalist, the project manager should always reserve at least a small percentage of his or her time for problem solving, helping out on troubled activities, and general fire-fighting. Even if the project leader has a solid grasp of all the technical project issues, it is useful to have other generalists on the team in case several things on the project go wrong at the same time. Generalists can reduce the time to solution for problems of all kinds and reduce schedule impact.
Managing project risk is always easier with friends in high places. Establish and work to sustain strong sponsorship for your project. While strong sponsorship does not ensure a risk-free project, weak (or no) upper-level sponsorship is a significant source of risk. Form a good working relationship with the project sponsor(s), and work to understand their expectations for project information. Reinforce the importance and value of the project regularly, and don't let sponsors forget about you. Plan to update your management frequently on project progress and challenges, and involve management early in problems and escalations that require more authority than the project team has. Validate project objectives with sponsors and customers, and work to set realistic expectations. Using your budget and staffing plans, get commitments for adequate funding and project talent. Strong sponsorship reduces timing impact of risks and reduces the probability of many kinds of resource risk.
Project risk also increases, particularly on lengthy projects, whenever the project team is disconnected from the ultimate customers for the deliverable. Establish and maintain contact with the end users or with people who can represent them. Seek strong user buy-in, and work with the customer to define the project scope and to validate all acceptance and testing criteria. Establish measurable criteria, and determine what will be required for the users to deem the project a success. Identify the individual or individuals who will have the final word on this and keep in contact with them. The probability of scope risk and the likelihood of late project schedule difficulties are both reduced by meaningful user involvement.
A final general strategy for lowering project risk is to set clear decision priorities for the project. Validate the priorities with both the sponsors and the end users, and ensure that the project priorities are well known to the project team. Base project decisions on the priorities, and know the impact of failing to meet each priority established for the project. This not only helps manage scope risks but also permits quick decisions within the project that minimize schedule impacts.
Mitigation Strategies for Scope and Technical Risks
Mitigation of scope risks involves shifts in approach and potential changes to the project objective. Ideas for mitigating scope risks include:
More on each of these ideas follows.
The most significant scope risks in the PERIL database are due to changes. Minimizing change risk involves the first two tactics—scope definition and change management. Scope definition, discussed in detail in Chapter 3, increases risk both when it is incomplete and when it is too inclusive.
Scope risk is high for projects with inadequate specifications. While it is true that thorough, clear definition of the deliverable is often difficult on technical projects, failure to define the results adequately leads to even greater difficulty. For any project to be ultimately successful, every specification must eventually be uncovered and met. If the project is allowed to meander toward this end, dragging along lists of options and working with unstable assumptions, changes will be frequent, expensive, and painful. Excessive change is an inevitable price of inadequate scope definition. Since project scope determines the work breakdown, inadequate scope definition results in unidentified activities, causing unpredictable impact to project timing, effort, and cost.
Closely inspect the list of features to be included to verify that all the requested requirements are in fact necessary. The Pareto principle can be applied effectively to most technical projects, defining an alternate scope that provides much, if not nearly all, of the project's value by implementing only the most essential capabilities. Such a project will be smaller, shorter, and less costly and represents lower risk. It is often possible to deliver base functionality and then extend it in a follow-on project at a lower total cost than that for a single project with more comprehensive scope. Viewing project scope as a continuum that may be expanded over time using a succession of less risky projects is a very effective tool for minimizing scope risk on technical projects. Evolutionary software development methodologies employ this principle to both manage scope risk and deliver useful functionality to users as quickly as possible.
Setting project scope more aggressively than is necessary is also a common source of scope risk. The "blot out the sun" technical project, which includes every possible feature, bell, whistle, and capability that seems possible (plus some that will be added later when they occur to the project team), is not only excessively long and expensive but also extremely difficult to manage. One common method used for the all-inclusive project is the concept of "musts" and "wants." The project team lists the absolute requirements as "musts" and also creates a list of desirable other inclusions as "wants." While this is appropriate during initial planning, maintaining the list of options throughout the project guarantees many project scope changes. As preliminary planning concludes and the time to commit to your project draws near, you need to be brutal. The list of features that it would be nice to have must shrink. Each listed "want" must either be added to project scope as a firm requirement or dropped. Any optional features carried into development will require effort, even early in the project, that would be better spent doing more essential project work. Work to freeze project scope using lists of what "is" and what "is not" in the project, instead of musts and wants. Scope definition adds all the "musts" to the "is" list and commits to their delivery. All "wants" are either accepted as requirements and added to the "is" list or are demoted to the "is not" list and are excluded from the project scope. A process that defines scope by freezing what "is" and what "is not" in the project deliverable at the conclusion of initial planning is a key tactic for mitigating this sort of change risk.
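The conversion from "musts" and "wants" to frozen "is" and "is not" lists can be sketched as follows. The function name `freeze_scope` and the sample features are hypothetical, and the sketch assumes the team has already decided which "wants" to promote.

```python
def freeze_scope(musts, wants, accepted_wants):
    """Split requirements into frozen 'is' and 'is not' lists.

    musts: absolute requirements, all of which go on the 'is' list.
    wants: desirable extras, each of which must be accepted or dropped.
    accepted_wants: the subset of wants promoted to firm requirements.
    """
    accepted = set(accepted_wants)
    unknown = accepted - set(wants)
    if unknown:
        raise ValueError(f"accepted items not on the wants list: {sorted(unknown)}")
    is_list = list(musts) + [w for w in wants if w in accepted]
    is_not_list = [w for w in wants if w not in accepted]
    return is_list, is_not_list


# Hypothetical feature lists for illustration.
is_list, is_not_list = freeze_scope(
    musts=["user login", "order entry"],
    wants=["dark mode", "CSV export", "voice control"],
    accepted_wants=["CSV export"],
)
print(is_list)      # prints ['user login', 'order entry', 'CSV export']
print(is_not_list)  # prints ['dark mode', 'voice control']
```

Every "want" ends up on exactly one of the two lists, so no optional features are carried into development.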
The second necessary tactic for reducing change risk is to uniformly apply an effective process for managing all changes to project scope. To manage risks on large, complex projects, the process is generally very formal, using forms, committees, and extensive written reporting. For technical projects done under contract, risk management also requires that the process be described in detail in the contract signed by the two parties. On smaller projects, even if it is less formal, there still must be uniform treatment of all proposed changes, considering both their benefits and their expected costs. For your project, adopt a process that rejects all changes that fail the cost-justification test. Also, ensure that the process provisionally rejects any change that impacts the project's overall objective, even when the change is proposed by the project's sponsor or customer. Such changes represent a new project and are acceptable only if all stakeholders agree to the shifts in the project, including any increases in cost and timing. A significant increase in scope with no adjustment in the resources or deadline nearly always represents unacceptably high risk. It is not enough to have a change management process; mitigating scope risks requires its disciplined use.
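The two screening rules above can be sketched as a uniform check applied to every proposed change. This is an illustration under stated assumptions: the names `ChangeRequest`, `Decision`, and `screen_change` are hypothetical, and the cost-justification test is modeled simply as expected benefit exceeding expected cost, which a real process would define in more detail.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    PROVISIONALLY_REJECT = "provisionally reject"


@dataclass
class ChangeRequest:
    title: str
    expected_benefit: float
    expected_cost: float
    alters_objective: bool = False        # does it shift the overall project objective?
    all_stakeholders_agree: bool = False  # agreement includes cost and timing increases


def screen_change(change: ChangeRequest) -> Decision:
    """Apply the same rules to every change, whoever proposed it."""
    # Assumed cost-justification test: benefit must exceed cost.
    if change.expected_benefit <= change.expected_cost:
        return Decision.REJECT
    # A change to the objective is effectively a new project.
    if change.alters_objective and not change.all_stakeholders_agree:
        return Decision.PROVISIONALLY_REJECT
    return Decision.ACCEPT


request = ChangeRequest("Add second product line", expected_benefit=90_000,
                        expected_cost=40_000, alters_objective=True)
print(screen_change(request).value)  # prints provisionally reject
```

Note that the record carries no field for who proposed the change; the screening outcome depends only on cost justification and stakeholder agreement.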
Scope risks are often hard to evaluate at the beginning of technical projects. One way to gain better insight is to schedule work during planning to examine feasibility and functionality questions as early as possible. Use prototypes, simulations, and models to evaluate concepts with users. Schedule early tests and investigations to verify whether untried technologies are likely to work. Plan for walkthroughs and scenario discussions in order to spread awareness and to identify potential problems and defects early enough to correct them. Also consider scale risks. Even if there are no problems during small-scale, limited tests, scope risks may still remain that will be visible only during full-scale production. Plan for at least some rudimentary tests of functionality in full-scale operation as early in the project as is practical. Schedule work to uncover issues and problems near the beginning of the project, and be prepared to make changes or to abandon the project based on what you learn.
While it is risky to defer difficult or unknown activities until late in the project, it may be impractical to begin with them. In order to get started, you may need to complete some simpler activities first and then move on to more complicated activities as you build expertise. In any case, however, develop your plans to schedule the more risk-prone activities as early in your project as you can.
Lack of skills on the project team also increases scope risk, so define exactly how you intend to acquire all needed expertise. If you intend to use outside consultants, plan to spend both time and effort in their selection, and ensure that the necessary funding to pay for them is in the project budget. If you need to develop new skills on the project team, identify the individuals involved and plan so that each contributor is trained, in advance, in all the needed competencies. If the project will use new tools or equipment, schedule installation and complete any needed training as early in the project as possible.
Scope problems also arise from faulty communications. If the project depends on a distributed team that speaks several languages, identify all the languages needed for project definition and planning documents, and plan for their translation and distribution. Confusion arising from project requirements that are misinterpreted or poorly translated can be expensive and very damaging, so verify that the project information has been clearly understood in discussions, using interpreters if necessary. It is also critical to provide written follow-up after meetings and telephone discussions.
Scope often depends on the quality and timely delivery of things the project receives from others. Mitigating these risks requires clear, carefully constructed specifications to minimize the possibility that the things that you get are consistent with the request but are inappropriate for the project's intended use. If you have little experience with a provider, it may be prudent to find and use a second source in addition to the first, even though this can increase the cost. A second source can lower the probability of the project getting stuck due to delivery of a failed component, a deliverable that has performance or quality problems, or dependence on "vaporware." The cost of a redundant source may be very small compared to the cost of a delayed project.
External factors also lead to scope risks. Natural disasters such as floods, earthquakes, and storms, as well as not-so-natural disasters like computer viruses, may cause loss of critical information, software, or necessary components. While there is no way to prevent the risks, provision for some redundancy, adequate frequent backups of computer systems, and reduced dependency on one particular location can minimize the impact of this sort of risk.
Finally, managing scope risk also requires tracking the initial definition with any and all changes approved during the project. You can significantly lower scope risk by adopting a process that tightly couples all accepted changes to the planning process, as well as making the consequences of scope decisions visible throughout the project.
Mitigation Strategies for Schedule Risks
Tactics for mitigating schedule risks include making additional investments in planning and revising your project approach. Some ideas to consider include:
More detail on each of these ideas follows.
The riskiest activities in the project tend to be the ones that have very significant worst-case estimates. For any activity where the most-likely estimate is a lot lower than what could plausibly occur, calculate an "expected" duration using the PERT formula. Use these estimates in project planning to provide some reserve for particularly risky work and to reduce the schedule impact.
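As a reminder of how the calculation works, the conventional PERT weighted average is (optimistic + 4 × most likely + pessimistic) / 6. The sketch below applies it; the function name and the sample estimates are hypothetical.

```python
def pert_expected_duration(optimistic: float, most_likely: float,
                           pessimistic: float) -> float:
    """Conventional PERT weighted average of three duration estimates."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("estimates must satisfy optimistic <= most_likely <= pessimistic")
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Hypothetical activity: most likely 10 days, but it could plausibly take 24.
print(pert_expected_duration(8, 10, 24))  # prints 12.0
```

Here the expected duration is two days longer than the most-likely estimate, and that difference is the reserve the plan carries for this activity.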
Project risk is lower when you schedule activities related to the highest priorities for the project as early as possible, moving activities of lower priority later in the project. For each scheduled activity, review the deliverables, and specify how and when each will be used. Wherever possible, schedule the work so that there is a time buffer between when each deliverable is complete and the start of the activities that require them. If there are any activities that produce deliverables that seem to be unnecessary, either validate their requirement with project stakeholders or remove the work from the project plan.
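The deliverable review described above can be sketched as a check of the time buffer before each use, plus a list of deliverables that nothing consumes. The function names, the data layout, and the sample dates are assumptions made for illustration.

```python
from datetime import date


def deliverable_buffers(finish_dates, needs):
    """Days between each deliverable's completion and the start of each use.

    finish_dates: deliverable name -> planned completion date
    needs: list of (deliverable name, consuming activity, planned start date)
    """
    return {
        (deliverable, consumer): (start - finish_dates[deliverable]).days
        for deliverable, consumer, start in needs
    }


def unused_deliverables(finish_dates, needs):
    """Deliverables that no activity consumes: validate them or remove the work."""
    used = {deliverable for deliverable, _, _ in needs}
    return sorted(set(finish_dates) - used)


# Hypothetical plan data.
finish_dates = {
    "interface spec": date(2024, 3, 1),
    "test rig": date(2024, 3, 15),
    "style guide": date(2024, 3, 20),
}
needs = [
    ("interface spec", "driver coding", date(2024, 3, 8)),
    ("test rig", "integration test", date(2024, 3, 15)),
]

buffers = deliverable_buffers(finish_dates, needs)
no_buffer = [pair for pair, days in buffers.items() if days <= 0]
print(no_buffer)                                 # prints [('test rig', 'integration test')]
print(unused_deliverables(finish_dates, needs))  # prints ['style guide']
```

In this example the integration test starts the day the test rig is due, so any slip in the rig delays the test directly, and the style guide should be confirmed with stakeholders or dropped.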
Many schedule risks are caused by delays that may be avoided through more proactive communication. Whenever decisions are needed, plan to remind the decision makers at least a week in advance and get commitment for a swift turnaround. If specialized equipment or access to limited services will be required, put an activity in the plan to review your needs with the people involved somewhat before the scheduled work. If scarce equipment for some kinds of project work is a chronic problem, propose adding capacity to lower the risk on your project, as well as for all other parallel work. The preventive maintenance schedules for production systems are generally determined well in advance. Inquiring during project planning about availability of needed services and then synchronizing plans with the maintenance schedules can reduce conflicts and delays.
New things (technology, hardware, systems, or software) are very common sources of delay. Manage risk by seeking alternatives using older, known capabilities unless using the new technology is an absolute project requirement. A "lower-tech" alternative may in some cases be a better choice for the project anyway, or it could serve as a standby option to be used if necessary in case an emerging technology proves not to be quite ready. Identify what you would need to do or change in the project to complete your work without the newer technology; this information can provide both pressure and motivation to the project team to do what is necessary to get the new technology to work.
One cause of significant delay is developing a specific design and then sending it out to be built or created before it can be tested. It may take weeks to get the tangible result of the design back, and if it has problems the entire cycle must be repeated, doubling the duration (or worse—it may not work the second time, either). In areas such as chip design, more than one chip will be made on each wafer, anyway, and it might be useful to design a number of slightly different versions that can all be fabricated at the same time. Most of the chips will be of the primary design, but other variations created at the same time can also be tested, thus increasing the chances of having a component that can be used to continue with project work. There are other cases where slightly different versions may be created in parallel, such as printed circuit boards, mechanical assemblies, and other newly designed hardware. While this could increase the project cost, protecting the project schedule is often a much higher priority. Varying the parameters of a design and evaluating the results is also useful for quickly understanding the principles involved. The deliverables created in current and future projects will benefit from this deeper understanding.
Delays due to shipping problems are significant on many projects and in many cases can be avoided simply by ordering or shipping items earlier in the project. Just because it is generally thought to take a week to ship a piece of equipment from San Jose, California, to Bangalore, India, does not mean you should wait until a week before it is needed in India to ship it. There are only two ways to get something done sooner—work faster or start earlier. With shipping, expediting may not always be effective, so it is prudent planning to request and send things that require physical transport well ahead of the need, particularly when it involves complex paperwork and international customs regulations.
Similarly, delay may result from the need to have new equipment or new skills for the project. The time necessary to get new equipment installed and running or to master new skills may prove longer than you think. If you underestimate how long it will take, project work that depends on the new hardware or skills could have to wait. Planning proactively for these project requirements removes many risks of this sort from your project (and, as mentioned earlier, it also lowers the chances that you might lose, or never get, the required funding). Estimate these activities conservatively, and schedule installations, upgrades, and training as early in your project as practical—well before they are needed.
Large projects are intrinsically risky. If a project requires more than twenty full-time staff members, explore the possibility of partitioning it into smaller projects responsible for subsystems, modules, or components that can be developed in parallel. However, when you decompose a large program into autonomous smaller projects, be sure to clearly define all interfaces between them both in terms of specifications required and timing. While the independent projects will be easier to manage and less risky, the overall program could be prone to late integration problems without adequate systems-level planning and strong interface controls.
Long projects are also risky. Work to break projects longer than a year into phases that produce measurable outputs. A series of evolutionary projects of short duration creates value sooner than a more ambitious longer project, and the shorter projects are more likely to fall within a reasonable planning horizon of six months or less. This is one of the most important aspects of evolutionary software development, such as extreme programming or other agile methodologies. Mandating delivery of intermediate results sooner is an effective way to lower both schedule and scope risk.
If a lengthy project must be undertaken as a whole, you can adopt a "rolling-wave" planning philosophy, planning the current and next phase in detail whenever you transition from one phase to the next and making adjustments to the project as you proceed to reflect what has been learned in the previous phase. Adjustments at phase transition include changes to plans for future phases, changes to the project deliverable, shifts in project staffing, and changes to other parameters of the project objective. Rolling-wave planning does mean that at the end of each phase, the project team needs to conduct a thorough project review and to be prepared to continue as planned, continue with changes, or abort the project.
Schedule risk also arises from time conflicts outside the project. Check the plan for critical project work that coincides with paid holidays, the end of financial reporting periods, times when people are likely to take vacations or otherwise be distracted, and so forth. Verify that intermediate project objectives and milestones are consistent with the personal plans of the staff members responsible for the work. On global projects, collect data for each region to minimize problems that may arise when part of the project team will be unavailable. When there are known project time conflicts with any of these nonproject factors, modify the plan to avoid them, either by accelerating the work to complete earlier or scheduling it to fall later.
Finally, commit to rigorous activity tracking throughout the project, and periodically schedule time to review your entire plan: the estimates, risks, work flow, project assumptions, and other data.
Mitigation Strategies for Resource Risks
As with schedule risks, there are many tactics for resource risk mitigation. Some ideas for minimizing resource risk include:
More on each of these tactics follows.
One of the most common avoidable resource risks on technical projects is required overtime. Starting a project with full knowledge that the deadline is not possible unless the team works overtime for much of the project's duration is a prescription for failure. Whenever the plan shows requirements for effort in excess of what is realistically available, rework the plan to eliminate the condition. Even on well-planned projects, there are always plenty of opportunities for people to stay late, work weekends and holidays, lose sleep, and otherwise devote time to the project from their side of the "work/life" balance. Technical projects get done because people want them to be successful and are willing to put in the extra effort that it requires. Projects that are planned to require overtime (or, even worse, are not planned at all and result in massive overtime) are in trouble from the beginning for two reasons: productivity is low and turnover is high. Projects that involve significant overtime are rarely motivating, especially in the long run. People strive to avoid these projects, not to participate in them. The people who are stuck on them do not work at full efficiency. Tom DeMarco and others have written about the phenomenon of the "mental undertime" that accompanies required overtime, so productivity suffers. In addition, at least some of the project team will be looking for somewhere else to work. If they are successful, the project will lose team members; if not, they are still expending time off the project, and the available effort for the project is further depleted.
Realistically, managing resource risk on technical projects nearly always requires at least some unplanned overtime. Unanticipated activities (no WBS can ever be absolutely comprehensive), coverage for staff members who have emergencies, execution of contingency plans, and recovery from all the risks either not identified or below the "cut line" for management—all of these may require overtime. What you want to avoid are project plans that contain significant planned overtime. If you use up all the reasonably available overtime to do scheduled work, any difficulties that arise may leave the project little choice but to crash and burn.
Resource risk is lower on projects whenever motivation is high. Motivation is a key factor in how people respond to overtime, and low motivation is frequently a root cause of many resource-related risks. Technical projects are nearly always difficult. When they are successful, it is not because they are simple; it is because people working on them want to be successful—they care about the project. Project leaders who are good at building teamwork and getting people working on the project to trust and care about each other are much more successful than project leaders who work at a distance, using just electronic mail and printed Gantt charts. Successful projects require effective teamwork, so plan to get the staff members together physically for at least a short start-up workshop. Particularly if they will spend most of the project separated as a "virtual" or a global team, having them meet face-to-face is a tried-and-true tactic for beginning a project with a cohesive, motivated team. Another method for connecting and motivating people uses mentoring to establish additional capabilities on the team where critical skills are scarce. The development of junior people is prudent risk management, and when project contributors want to develop new skills, being mentored can be very motivating for them. The senior people who serve as mentors may resist, but two benefits to them are a reduction in their work (and stress) when it can be assigned to others and the explicit recognition of their past accomplishments and experience. (Flattery will often get you to places that straightforward requests cannot.)
Teamwork across cross-functional project boundaries is also important. The more involvement in project planning, start-up or launch activities, and other meaningful work with others you plan early in the project, the more team cohesion there will be. People who know and trust one another will back one another up and help to solve one another's problems. People who do not know one another well tend to mistrust one another and create conflict, arguments, and unnecessary project problems. Working together to plan and initiate project work transforms it from the "project leader's project" to "our project."
Financial risk is also significant for many projects. For activities in the project that have a worst-case cost that is significantly higher than the most likely cost, use the PERT formula to estimate an "expected cost," and use this estimate to reflect the potential financial exposure. Use "expected costs" in determining the proposed project budget.
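A small sketch of how "expected costs" might roll up into a proposed budget follows. It again assumes the standard three-point PERT weighting; the activity names and dollar figures are invented for illustration.

```python
def pert_expected(optimistic: float, most_likely: float, pessimistic: float) -> float:
    """Three-point PERT estimate, assuming the standard (o + 4m + p) / 6 weighting."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Hypothetical activities: (name, best-case, most-likely, worst-case cost in dollars).
activities = [
    ("prototype tooling", 40_000, 50_000, 120_000),
    ("external test lab", 18_000, 20_000, 22_000),
    ("firmware license", 9_000, 10_000, 41_000),
]

most_likely_total = sum(likely for _, _, likely, _ in activities)
expected_total = sum(pert_expected(best, likely, worst)
                     for _, best, likely, worst in activities)

# The difference is the financial exposure that a budget built only
# on most-likely costs would leave uncovered.
exposure = expected_total - most_likely_total
print(f"most likely: ${most_likely_total:,.0f}")
print(f"expected:    ${expected_total:,.0f}")
print(f"exposure:    ${exposure:,.0f}")
```

Only the activities with lopsided worst cases move the total; the symmetric estimate for the test lab contributes nothing to the exposure.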
As with schedule risk, adequate sponsorship is essential to resource risk management. Get early commitment from the project's sponsor for staffing and for funding, on the basis of planning data (a detailed discussion of negotiating for this follows in Chapter 10). The priority of the project is also under the control of the project sponsor, so work to understand the relative priority of the project in his or her mind. Strive to obtain the highest priority that is realistic for your project (and document this in writing). If the project has more than one sponsor, determine who has the highest influence on the project. In particular, it is good to know who would be able to make a decision to cancel your project so that you can take good care of that person and keep him or her aware of your progress. It is also useful to know who in the organization above you would suffer the most serious consequences if your project does not go well, because these managers have a personal stake in your project, and they will likely be useful when risk recovery requires escalation.
Too little involvement of customers and end users in definition, design, and testing is also a potential resource risk, so obtain commitments early on for all activities that require it. Also, plan to provide reminders to them in advance of the project work that needs their participation.
Risks resulting from staffing gaps can be reduced or detected earlier through more effective communication. Assess the likelihood that project staff (including yourself) might join the project late because of ongoing responsibilities in prior projects that are delayed. Get credible status reports from these projects, and determine how likely it is that the people working on them will be available to work on your project. If the earlier projects are ending with a lot of stress and overtime, reflect the need for some recovery time and less aggressive estimates in your project plans for the affected team members. Also, plan to notify any contributors with part-time responsibilities on your project in advance of their scheduled work.
Loss of project staff due to safety problems is not common on technical projects, but a review of activities looking for known dangerous work is still a good idea. Modify plans for any activities that you suspect may have health or safety risks to minimize the exposure. You may be able to make changes to the environment, time, and place for the work or to the practices used that may mitigate the risk. Also consider the experience and skills of the staff that may be exposed to risks, and work to replace any contributors with too little relevant background.
For each activity where the people who will do the work are a potential risk source, involve them in developing the response. In addition to potentially helping you to find more, and better, ideas for prevention, this will tend to sensitize them to the impact of the problem and can greatly reduce the likelihood of the risk.
For new, challenging, or otherwise risky activities, strive to find experienced contributors who have a reputation for effective problem solving. While you cannot plan and schedule innovation, you can identify people who seem to be good at it.
Outsourcing is a large and growing source of resource risk on projects. The discussion in Chapter 5 includes a number of exposures, and mitigating these risks requires discipline and effort. For each contract with a service provider that your project depends upon, designate a liaison on the project team to manage the relationship. Do this also for other project teams in your own organization that you need to work with. If you plan to be the liaison, ensure that there is sufficient time allocated for you to do this in the resource plan (in addition to meeting all your other responsibilities). Involve the owner of each relationship in selection, negotiation, and finalization of the agreement. Ensure that the agreement is sufficiently formal (a contract with an external supplier, a "memo of understanding" or similar document for an internal supplier) and that it is specific as to both time and technical requirements for the work consistent with your project plan. Provide incentives and penalties in the agreement when appropriate, and, whenever possible, schedule the work to complete earlier than your absolute need.
With any project work performed outside the view of the project team, schedule reviews of early drafts of required documents. Also, participate in inspections and interim tests, and examine prototypes. Identify and take full advantage of any early opportunities to verify tangible evidence of progress. Plan to collect status information regularly, and work to establish a relationship that will make it more likely that you will get credible status, including bad news, throughout your project.
A significant risk situation on fee-for-service projects is a lack of involvement of the technical staff during the proposal and selling phases. When a project is scoped and a contract commitment is made before the project team has any awareness of the project, resource risks (not to mention schedule and scope risks) can be enormous. This "price to win the business" technique is far too common in selling fee-for-solution projects, and it often leads to seemingly large and attractive fixed-price revenue contracts that are later discovered to involve even larger and extremely unattractive costs. Some projects sold this way may even be impossible to deliver at all. Prevention of this risk would be reasonably easy using time-travel technology, by turning back the clock and involving the project team in setting the terms and conditions for any agreement. Since that is impossible, and since this risk may already be a certainty when the project team gets into project and risk planning, the only recourse is to mitigate the situation insofar as possible.
Minimizing the risks associated with committed projects based on little or no analysis requires the project team to initiate the processes of basic project and risk planning as quickly as it can, doing bottom-up planning based on the committed scope. Using best-effort planning information, uncover any expectations for timing and cost that must be shifted into line with reality. Timing expectations are visible to all, so any shifts there must be dealt with internally, as well as with the customer, which could require contract modifications. Resource and cost problems may be hidden from the customer, but they still require internal adjustment and commitment to a realistic budget for the project, even if it significantly exceeds the amount that can be recovered under the contract. If this is all done quickly enough, before everyone has mentally settled into expectations based on the "price to win" contract, it may even be possible to adjust the fees in the contract. While it may be tempting to adopt a "safe so far" attitude and hope for the miracle that would allow project delivery consistent with the flawed contract, delay nearly always makes things worse. The last, best chance to set realistic expectations for such a project is within a few days of its start. After this, the situation becomes progressively uglier and more expensive to resolve.
It is also important to document and make these "price to win" situations visible, in order to minimize the chances of future recurrence. Organizations that chronically pursue business like this rarely last long.
Finally, establish resource metrics for the project, and track them against realistic planning data. Track progress, effort, and funding throughout the project, and plan to act quickly when the trends show adverse variances against the plan.
Risk Transference
Risk transference is most effective when you are dealing with risks whose impact is primarily financial. The best-known form of transfer is insurance: for a fee, someone else bears the financial consequences of a risk. Transfer works to benefit both parties, because the purchaser of the insurance avoids the risk of a potentially catastrophic monetary loss in exchange for paying a small (in comparison) premium, and the seller of the insurance benefits by aggregating the fees collected to manage the risk in a large population of insurance buyers, who may be expected to have a stable and predictable "average" risk. In technical projects, this sort of transfer is not extremely common, but it is used. Unlike other strategies for mitigation, transfer does not actually do anything to lower the probability or diminish the nonfinancial impact of the risk. With transfer, the risk is accepted, and it either happens or it does not. However, any budgetary impact will fall outside the project, limiting the resource risk impact.
Transfer of scope and technical risk is often the justification for outsourcing, and this sometimes works very well. If the project team lacks a needed skill, hiring an expert or consultant to do the work transfers the activities to people who may be in a better position to get it done. Unfortunately, the risk does not actually transfer to the third party; the project still belongs to the team, so it still bears the risk of nonperformance. Should things not go well, the fact that the bill for services will not need to be paid will be of small consolation. Even the possibility of legal action is unlikely to help the project. This sort of transfer as a risk prevention strategy is very much a judgment call. In some cases, the risks accepted may significantly exceed the risks managed, no matter how well you write the contract.
Avoidance, mitigation, and transfer nearly always have costs, sometimes very significant ones. Before you adopt any ideas to avoid or reduce risks, some analysis is necessary. For each risk to be managed, estimate the expected consequences in quantitative terms. For each proposed option to deal with the risk, assess the marginal costs and timing impact involved. After comparing this data, cost-effective preventive actions dealing with risk causes can be integrated into the project plan.
Comparing Costs and Benefits
The first step in this analysis is to determine the expected cost of the risk, the "loss times likelihood." For this, you need the probability in numerical terms, as well as estimates of the risk impact in terms of financial, schedule, and possibly other factors.
For a risk that is assessed as "moderate" probability, the historical records may provide an estimated probability of 15 percent, about one chance in seven. The impact of risks is also difficult to estimate in many cases, as was discussed in Chapter 7, but some assessment is required. Whether the impact is in money, time, or both, it is weighted using the probability to derive an expected amount. For a risk that represents three weeks of schedule slip and $2 million in cost, the expected risk impact will be about one-half week (which is probably not that significant) and $300,000 (which would be, for most projects, very significant). In each case, this is 15 percent of the total impact, shown graphically in Figure 8-4.
Figure 8-4: Expected impact.
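The "loss times likelihood" arithmetic for this example can be checked with a few lines of code. This is only a sketch; the function name is hypothetical, and the inputs are the 15 percent probability, three-week slip, and $2 million cost from the example.

```python
def expected_impact(probability: float, impact: float) -> float:
    """Weight an impact estimate by the probability that the risk occurs."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


probability = 0.15  # the "moderate" probability taken from historical records

expected_slip_weeks = expected_impact(probability, 3)       # three weeks of slip
expected_cost = expected_impact(probability, 2_000_000)     # $2 million in cost

print(f"expected schedule impact: {expected_slip_weeks:.2f} weeks")
print(f"expected cost impact:     ${expected_cost:,.0f}")
```

The schedule figure works out to 0.45 weeks, which the text rounds to one-half week.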
The consequences of each idea for avoiding or mitigating the risk in time and money may be compared with the expected impact estimates to see whether they are cost-justified. If an idea only mitigates the risk (lowering the impact or probability of the problem), then the comparison is between the cost for mitigation and the difference between the "before and after" estimates for the risk.
Determining whether a preventive is justified is always a judgment call, and it may be a difficult one. It is made more so because the data are often not very precise or dependable and because it is human nature to want to prevent problems if possible. Just because you can prevent a risk, though, does not mean that you should. Seeking a risk-free project is illogical for two reasons. First, it isn't possible. All projects have some residual risk, no matter how much you do to avoid it. Second, a project with every possible risk prevention idea built into the plan will be far too expensive and time-consuming to ever get off the ground. For each potential idea that reduces project risk, compare the expected costs of the risk with the cost of prevention before building it into the project plan. In the case given, with the expected half-week of delay and $300,000 in expense, an idea that requires a week of effort and costs $1.5 million would most likely not be adopted, as this "cure" is nearly as bad as the relatively unlikely risk. This situation would be similar to paying more for insurance than the cost of the expected loss. A preventive that costs less and requires little effort, though, may very well represent a prudent plan modification. Even if some of the ideas you generate for risk prevention are not cost-justified, the same (or similar) approaches may still have application as contingency plans.
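The "before and after" comparison for an idea that only mitigates a risk can be sketched as follows. The class, the function, the reduced probability of 5 percent, and the $60,000 mitigation cost are all hypothetical; only the 15 percent, three-week, $2 million starting point comes from the example. The numeric comparison is an input to the judgment call, not a replacement for it.

```python
from dataclasses import dataclass


@dataclass
class RiskEstimate:
    probability: float       # chance the risk occurs, 0..1
    cost_impact: float       # dollars if it occurs
    schedule_impact: float   # weeks of slip if it occurs

    def expected_cost(self) -> float:
        return self.probability * self.cost_impact

    def expected_slip(self) -> float:
        return self.probability * self.schedule_impact


def mitigation_benefit(before: RiskEstimate, after: RiskEstimate) -> tuple[float, float]:
    """Reduction in expected cost and expected slip achieved by a mitigation idea."""
    return (before.expected_cost() - after.expected_cost(),
            before.expected_slip() - after.expected_slip())


before = RiskEstimate(probability=0.15, cost_impact=2_000_000, schedule_impact=3)
# Hypothetical mitigation that lowers the probability but not the impact.
after = RiskEstimate(probability=0.05, cost_impact=2_000_000, schedule_impact=3)

cost_saved, weeks_saved = mitigation_benefit(before, after)
mitigation_cost = 60_000  # hypothetical cost of the preventive work

# A candidate is worth a closer look only if it costs less than it is expected to save.
worth_considering = mitigation_cost < cost_saved
print(f"expected savings: ${cost_saved:,.0f} and {weeks_saved:.2f} weeks")
print(f"worth considering: {worth_considering}")
```

For an idea that avoids the risk entirely, the "after" estimate is simply zero, and the comparison reduces to the mitigation cost against the full expected impact.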
You will usually generate a number of cost-effective ideas, so the next step is to select ideas for implementation that can lower project risk impact or probability at justifiable cost, and integrate them into the preliminary project plan.
Updating the Plan
For each cost-justified risk avoidance, mitigation, or transfer idea, shifts in the project planning documents are necessary. Most ideas require additional or different work, so there are changes to the project WBS or revisions to effort and duration estimates for existing activities. Any added work requires staffing, so the profiles in the resource plan also require updating. If the resulting plan has problems meeting existing project constraints, additional replanning is required, which may create new risks. Before adoption, each idea for risk prevention must earn its way into the project by lowering, not increasing, project risk. Before any modifications, review the plan for unintended consequences, and document the justification for all additional project work.
Avoidance, mitigation, and transfer, when justified and added to the project, all serve to make a project less risky, but risks inevitably remain. You may have no influence on the root causes of some risks or may find no preventive action for them that is cost-effective. You may have mitigation strategies that help with other risks but still leave substantial residual risk. For most of the significant risks that remain, you should develop contingency plans, although for some cases you may decide to passively accept the risk.
Contingency planning deals with risk effects by generating plans for recovery or "fallback." The process for contingency planning is entirely the same as for any other project planning, and it should be conducted at the same level of detail and using the same methodologies and tools as other project planning.
For each risk managed with a contingency plan, you begin with the trigger event that signals the occurrence of the risk. The most effective risk trigger precedes the risk consequences by as much as possible. Early triggers increase the number of potential recovery options, and in some cases they may permit you to reduce the impact of the risk, so verify that the trigger you have is the best available.
Each risk also must be assigned to an owner, who will develop the initial contingency plan, monitor the project for the trigger event, and be responsible for maintaining the contingency plans. He or she must be particularly vigilant whenever the risk trigger is less obvious. If the risk should occur, the risk owner is responsible for beginning to execute the contingency plan, working toward project recovery. The owner of a project risk is most often the same person who owns the project activity related to the risk, but for risks with particularly severe, project-threatening consequences, the project leader may be a better choice.
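One way to keep the trigger, owner, and plan together for each risk is a simple register entry. The sketch below is illustrative only: the field names, dates, and example risk are hypothetical, and it treats "best available trigger" narrowly as the one giving the most warning before the consequences arrive.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ContingencyEntry:
    risk: str
    owner: str                 # person who monitors the trigger and starts recovery
    trigger: str               # observable event that signals the risk has occurred
    trigger_date: date         # earliest date the trigger could be observed
    consequence_date: date     # date the consequences would reach the project
    plan_steps: list[str] = field(default_factory=list)

    def warning_days(self) -> int:
        """Lead time between the trigger and the risk consequences."""
        return (self.consequence_date - self.trigger_date).days


def best_trigger(candidates: list[ContingencyEntry]) -> ContingencyEntry:
    """Pick the candidate whose trigger precedes the consequences by the most time."""
    return max(candidates, key=lambda entry: entry.warning_days())


# Two hypothetical triggers for the same risk: a late component delivery.
late_notice = ContingencyEntry(
    risk="custom component arrives late", owner="activity owner",
    trigger="shipment not received on due date",
    trigger_date=date(2025, 3, 17), consequence_date=date(2025, 3, 24),
    plan_steps=["switch to alternate supplier"],
)
early_notice = ContingencyEntry(
    risk="custom component arrives late", owner="activity owner",
    trigger="supplier misses interim status checkpoint",
    trigger_date=date(2025, 3, 3), consequence_date=date(2025, 3, 24),
    plan_steps=["expedite order", "switch to alternate supplier"],
)

chosen = best_trigger([late_notice, early_notice])
print(f"{chosen.trigger}: {chosen.warning_days()} days of warning")
```

The earlier trigger leaves room for more recovery steps, which mirrors the point that early triggers increase the number of recovery options.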
General Contingency Planning Strategies
Contingency planning for risks often starts with leftover ideas. Some ideas may have been considered for schedule compression (discussed in Chapter 6) but were not needed. Others might be risk prevention strategies that were not adopted in the preliminary baseline plan for cost or other reasons. While some of these ideas may be simply adopted as contingency plans without modification, in other cases they may need to be modified for "after the fact" use. Prevention strategies such as using an alternate source for components or schedule compression strategies such as expediting late project printing activities can be documented as contingency plans with no modification. Some risk avoidance ideas can serve as contingencies after minor changes. Dropping back to an older technology, for example, might require additional work to back out any dependencies on the failed newer technology, and other project change is likely to be required.
Contingency planning in itself is a powerful risk prevention tool, as the process of planning for recovery shows clearly how difficult and time-consuming it will be to recover from problems. This provides additional incentive for the project team to work in ways that make risks less likely to occur. You should strive to make risks and risk planning as visible as possible in project communication. Your project team can work to avoid only the potential problems that they are aware of.
Contingency Planning Strategies for Schedule Risks
Whenever a risk results in a significant delay, the contingency plan must seek an alternate version of the work flow that provides either a way to expedite work so that you can resume the project plan at some later point or an alternate way to complete the project that minimizes impact to the project deadline.
Recovery involves the same concepts and ideas used for schedule compression, discussed in Chapter 6. The baseline plan will require revision to make effort available for recovery immediately following the risk, so other work must be shifted, changed, or eliminated. You may be able to delay the start of less crucial planned activities, postponing them to later in the project. Any noncritical activity work that is simultaneous to or scheduled to follow the risk event may be interrupted or postponed to allow more focus on recovery. Some activity dependencies may be revised so that project activities are done out of the planned sequence, freeing contributors to work on the problem. In all of these cases, necessary activities shift later in the schedule, increasing the impact of future risks and creating new failure modes and exposures as more and more project work becomes schedule-critical.
It may even be possible to eliminate planned work if it is nonessential or to devise quicker approaches to project activities that could obtain similar, but possibly less satisfactory, results. Eliminating work and adopting "shortcuts" are generally best done as part of the main baseline plan, but for some projects it may be possible to defer these decisions until later in the work, using them on an as-needed basis.
"Crashing" project activities scheduled for later in the project to decrease their duration may also permit later starts that can free up project effort for recovery, if the project has sufficient budget reserve or access to the additional staffing to make this possible. Adding staff to the project to work on recovery may also be an option, but you should get specific commitment for any resources required. Also, include all training and project familiarization required as part of your baseline plan to minimize the disruption inevitable with new staff. Without adequate preparation, this tactic might delay your project even more.
It may not be possible to replan the project to protect the deadline, especially when the risk involves work near the scheduled end of the project. In this case, the goal of contingency planning is to minimize the unavoidable slippage and to provide the data necessary to document a new, later completion date.
A generic schedule contingency strategy involves establishing some schedule reserve for the project. Establishing schedule reserve is explored in more detail in Chapter 10.
Contingency Planning Strategies for Resource Risks
For risks that create significant resource increases, contingency planning involves revising the resource plans to protect the project budget, or at least to limit the damage. Again, the process for this parallels the discussion for dealing with resource constraints in Chapter 6.
The most common strategy is also one of the least attractive—working overtime and on weekends and holidays. This tried-and-true recovery method works adequately on most projects, provided the resource impact is minimal and project staffing is not already expected to work significantly beyond the normal workday and workweek. If the amount of additional effort required is very high, or if the project team is stretched too thin when the risk occurs, this contingency strategy may backfire and actually make things worse by lowering motivation and leading to higher staff turnover.
For some projects, there may be contributors who are assigned to the project but are underused during part of it. If this is the case, shifting work around in the schedule may allow them to assist with risk recovery and still effectively meet other commitments. This tactic, like dealing with schedule risks using float, tends to increase overall project risk later in the project.
Eliminating later work or substituting approaches other than those planned may also reduce the resources needed for work later in the project, but this is generally more appropriately handled as part of the initial plan. If the work is not essential, or if there is a quicker way to obtain an acceptable result, these choices ought to be reflected in the baseline plan, not viewed as potential jetsam to fling overboard if necessary.
Particularly for resource risks, it may be impossible to avoid damage to the overall resource plan and budget. All adverse variances increase the total project cost, so there may be few or no easy ways left to cut back other expenses to compensate. Minimizing the impact of risk recovery involves contingency planning that revises resource use in ways that protect the budget as much as possible. Tactics such as assigning additional staff to later critical path activities or "borrowing" people from other, lower-priority projects may have very little budget impact. Expediting external activities using incentive payments and outsourcing work planned for the project team may also be possible, but seek approval in advance for the additional cost as part of your contingency planning. If a contingency plan requires any training or other preliminary work to be effective, make these activities part of your baseline project plan.
A generic resource contingency strategy involves establishing a budget reserve for the project, similar to the schedule reserve discussed earlier. Budget reserve is discussed further in Chapter 10.
Contingency Planning Strategies for Scope Risks
Contingency planning for scope risks is not too complicated. The plans involve either protecting the specifications for the deliverable or reducing the scope requirements. Attempting to preserve the requirements is done by adding more work to the schedule (using tactics summarized previously), using more resource, or both. In most cases, it is very difficult to assess in advance the magnitude of change that this may require, as the level of difficulty in fulfilling requirements for technical projects is highly variable—from relatively trivial in some cases to impossible in others. Contingency plans for scope risks usually provide for some level of recovery effort, followed by a review to determine whether to continue, modify the scope, or abandon the project.
For many technical projects, scope risks are managed by modifying the project objective to provide most of the value of the project deliverable in a way that is consistent with schedule and resource objectives. The process for this, similar to that discussed in Chapter 6, starts with a prioritized list of specifications. It may be possible to drop some of the requirements entirely or to defer them to a later phase or project. There may also be potential for relaxing some of the requirements, making them easier to achieve. While this can be done effectively for some projects in advance, contingency planning for scope risks generally includes a review of project accomplishments and any shifts in assumptions, so your decisions on what to drop will be based on current data.
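The prioritize-then-trim process described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the `Requirement` record, the effort figures, and the simple rule of walking the list in priority order and keeping whatever still fits are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    name: str
    priority: int        # 1 = most important (assumed convention)
    effort_days: float   # estimated remaining effort (hypothetical figures)


def trim_scope(requirements, available_days):
    """Walk requirements in priority order, keeping each one that still
    fits in the remaining effort and deferring the rest."""
    kept, deferred = [], []
    remaining = available_days
    for req in sorted(requirements, key=lambda r: r.priority):
        if req.effort_days <= remaining:
            kept.append(req)
            remaining -= req.effort_days
        else:
            deferred.append(req)
    return kept, deferred


# Hypothetical data for illustration only.
backlog = [
    Requirement("Core transaction processing", 1, 40),
    Requirement("Reporting module", 2, 25),
    Requirement("Custom dashboards", 3, 30),
    Requirement("Export to spreadsheet", 4, 5),
]
kept, deferred = trim_scope(backlog, available_days=75)
print("keep:", [r.name for r in kept])
print("defer:", [r.name for r in deferred])
```

A rule this mechanical is only a starting point. The review of accomplishments and assumptions at the time the risk occurs should still drive the final decision on what to drop.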
Passive Acceptance
For some risks, it may not be possible, or worthwhile, to plan specifically for recovery. Acceptance, as a general risk management technique, includes both transfer and contingency planning, because in both of these situations the risk causes are not influenced and the risk either happens or does not. For transfer and for contingency planning, specific responses are provided in advance to assist in recovery. For some risks, though, neither of these options may be practical. When the consequences of a risk are sufficiently unclear, as may be the case for scope and some other risks in technical projects, planning for recovery in advance may be impossible. An example of this might be a stated requirement to use new technology or hardware for the project. In such a case, many potential problems, ranging from the trivial to a complete disaster, are possible.
When a specific risk response is not an option, there are a number of choices. If the risk is sufficiently serious, it may be the best course to abandon the project altogether as too risky or to consider a major change in the objective. For situations that are less damaging, you may choose to proceed with the project having no specific risk response, passively accepting the risk (and hoping for the best). If you adopt this alternative, it is prudent to document the risk as thoroughly as possible and to provide for some project-level schedule and budget reserves to be used in managing the passively accepted known risks, as well as your unknown project risks.
Document All Risk Plans
For risks with multiple potential consequences or particularly severe effects, you may want to generate more than one contingency plan. Before finalizing a contingency plan (or plans), review it for overall cost and probable effectiveness. If you do develop more than one response for a risk, prioritize the plans, putting first the plan you think will be most effective.
Document all contingency plans, and include the same level of detail as in the project plans: WBS, estimates, dependencies, schedule, resources required, the expected project impact, and any relevant assumptions. For each risk response plan, clearly specify the trigger event to detect that the risk has happened. Also, include the name of the owner who will monitor the risk trigger, maintain the contingency plan, and be responsible for its execution if the risk occurs.
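The elements listed above can be captured in a simple record. The following is a minimal sketch in Python, not a prescribed format: the class name `ContingencyPlan`, its field names, and the 0-to-1 `effectiveness` score are illustrative assumptions, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ContingencyPlan:
    """One documented response for a single risk (field names are illustrative)."""
    risk: str                 # short description of the risk
    trigger: str              # event that signals the risk has occurred
    owner: str                # person who monitors the trigger and executes the plan
    activities: list[str]     # WBS-level recovery activities
    added_days: float         # expected schedule impact
    added_cost: float         # expected cost impact
    effectiveness: float      # judged chance the plan works, 0.0 to 1.0 (assumed scale)
    assumptions: list[str] = field(default_factory=list)


def prioritize(plans: list[ContingencyPlan]) -> list[ContingencyPlan]:
    """Order plans for one risk: most effective first, cheaper first on ties."""
    return sorted(plans, key=lambda p: (-p.effectiveness, p.added_cost))


# Hypothetical example: two alternative plans for the same risk.
plans = [
    ContingencyPlan("Key supplier misses delivery", "No shipment notice by due date",
                    "A. Owner", ["Order from backup supplier"], 10, 20_000, 0.6),
    ContingencyPlan("Key supplier misses delivery", "No shipment notice by due date",
                    "A. Owner", ["Expedite by air freight"], 4, 35_000, 0.8),
]
ranked = prioritize(plans)
print([p.activities[0] for p in ranked])
```

Keeping the trigger and owner as required fields makes it harder to file a plan that nobody is watching.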
As part of the overall project documentation, document your risk response plan, and work to make the risks visible. One method for increasing risk awareness is to post a "top ten" risk list (revised periodically) either on the project Web site or with posters on the walls of project work areas. Ensure adequate distribution and storage of all risk plans, and plan to review risk management information at least quarterly.
Some projects define and maintain a risk register as part of their risk response plan. For each managed risk, the register includes:
Add risk plans to the other project documentation, and choose an appropriate location for storage that is available to all project contributors and stakeholders.
Some years ago, a large multinational company initiated a yearlong effort to establish a new European headquarters. Growth over the years had spread people, computers, and other hardware all over Geneva, Switzerland, and the inconvenience and expense for all of this had grown unacceptable. The goal was to consolidate all the people and infrastructure into a modern, new headquarters building. This effort involved a number of high-profile, risky projects, and I was asked to manage one of them.
One particularly risky aspect of the project involved moving two large, water-cooled mainframe computers out of the older data center where the systems had operated for some years and into a more modern center in the new headquarters building. In the new location, the systems would be co-located with all the other headquarters computers and the telecommunications equipment that tied them to other sites in Europe and around the world. Both systems were critical to the business, so each was scheduled to be moved over a three-day holiday weekend. It was essential that each system be fully functional in the old data center at the end of the week before the move and fully functional in the new data center before the start of business following the holiday, three days later.
Most of the risks were fairly mundane, and they were managed through thorough planning, adequate staffing, and extensive training, all committed months in advance. Other precautions, such as additional data backups, were also taken. The move itself was far from mundane, though, because the old data center, for some reason, had been established on the fifth floor of a fairly old building. The elevator in the building was very small, about one meter square, and could carry no more than the weight of three or four people (who had to be on very friendly terms). When the systems were originally moved into the building, a system-size door had been cut into the marble façade of the building, and a crane with a suspended box was used to move the systems into the data center. Over the years, upgrades and replacements had been moved in and out the same way.
Up to the time of this project, only older hardware being replaced had ever been moved out of the data center this way. In these cases, a mishap would not have affected operations, since the older systems were moved out only once the replacement systems were successfully moved in. For the relocation project, this was not the case. Both systems had to be moved out, transported, and reinstalled successfully, and any problem that started twenty meters in the air would result in a significant and expensive service interruption far longer than the allocated three days.
The new data center was, sensibly, at ground level; eliminating the need to suspend multimillion-dollar mainframes high in the air was one of the reasons the project was undertaken. Successful completion of the project would mean ground-level systems in the new data center and far easier maintenance for all future operations.
In addition to the obvious risk of a CPU plummeting to the ground, the short timing of the project also involved other exposures such as weather, wind, traffic, injuries to workers, problems with the crane, and many other potential difficulties. The assessment of risk for most of these situations resulted either in adjustments in staffing, shifts in the plan, or passive acceptance, because there was sufficient experience and people were confident that most of the potential problems could be managed during the move.
The one remaining risk that concerned all of us was that one of the mainframe computers might smash into the sidewalk. The consequences of this could not be managed during the three-day weekend, so a lot of analysis went into exploring ways to manage this risk.
Risk assessment was the subject of significant debate, particularly with regard to probability. Some thought it "low," saying, "This is Switzerland; we move skiers up the mountains this way all the time." Others, particularly people from the United States, were less optimistic. In the end, the consensus was "moderate." There was less debate on risk impact, which in this case had a very literal meaning. In addition to issues of cost and delay, there were significant other concerns such as safety, the large crater in the pavement, noise, and computer parts bouncing for blocks around.
The primary impact was in time and cost and was deemed "high," so considerable planning went into mitigating the risk. A number of ideas were explored, including disassembly of the system for movement in pieces using the elevator, building a lift along the side of the building (the two systems were to be moved a month apart, so this cost would have covered both), using padding or some sort of cushion for the ground, and a number of other even less practical ideas. The disassembly idea was considered seriously but was deemed inappropriate due to timing and the discouraging report from the vendor that "those systems do not always work right initially when we assemble them in the factory." The external lift idea was a good one, but hardware that could reach to the fifth floor was unavailable. A large net or cushion would have minimized the spread of debris but seemed unlikely to ensure system operation. It was not until the problem was reframed that the best idea emerged. The risk was not really the loss of that particular system; it was the loss of a usable system.
A plan to purchase a new system and install it in advance in the new data center would make the swift and successful move of the existing hardware unnecessary. Once operations were transferred to the new hardware, the old system could be lowered to the street and, if the move was successful, sold as used equipment. This was a very effective plan for avoiding the risk, but it had one problem: cost. The difference between the salvage value of the current machine and the purchase price of a new one was roughly $2 million. This investment was far higher than the expected consequences of the risk, so it was rejected as part of the plan. We decided to take as many precautions as possible and accept the risk.
All this investigation made the contingency planning easy, as the research we had done into acquiring a new system was really all that was necessary. We ordered a new system and got a commitment from the vendor to fill the order with the next machine built if there were any problems moving the existing system. (The vendor was happy to agree to this, as it was heavily involved in many aspects of the relocation.) Once the move had been completed successfully, the order could be canceled with no penalty.
The consequences documented for the contingency plan were that the system would be unavailable for about three weeks, and the cost of the replacement system would be roughly $3 million.
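The reasoning behind rejecting the avoidance plan can be illustrated with simple expected-value arithmetic. The sketch below is only an illustration: the account gives the two cost figures but no numeric probability for the "moderate" rating, so the probabilities tried here are assumptions, and the function names are hypothetical. It also ignores the cost of the three-week outage, which the account does not quantify.

```python
AVOIDANCE_COST = 2_000_000    # new system bought in advance, net of salvage (from the text)
REPLACEMENT_COST = 3_000_000  # replacement system if the move fails (from the text)


def expected_loss(probability: float, impact: float) -> float:
    """Expected monetary loss: probability of the risk times its cost impact."""
    return probability * impact


def break_even_probability(avoidance_cost: float, impact: float) -> float:
    """Probability above which paying to avoid the risk costs less than its expected loss."""
    return avoidance_cost / impact


# Assumed probabilities; the case study rates the risk only as "moderate".
for p in (0.05, 0.10, 0.25):
    loss = expected_loss(p, REPLACEMENT_COST)
    print(f"p={p:.2f}: expected loss ${loss:,.0f} vs. avoidance ${AVOIDANCE_COST:,.0f}")

threshold = break_even_probability(AVOIDANCE_COST, REPLACEMENT_COST)
print(f"break-even probability: {threshold:.2f}")
```

Under these assumptions, avoidance would pay for itself on hardware cost alone only if the chance of a drop exceeded roughly two in three, which is consistent with the decision to accept the risk and rely on a contingency plan instead.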
As it happened, the same staff and basic plan were employed for both mainframe moves, and both went without any incident. Although the contingency plan was not used, everyone felt that the risk planning had been a good investment. The process revealed clearly what we were facing, and it heightened our awareness of the overall risk. It uncovered many related smaller problems that were eliminated, which saved time and made the time-critical work required much easier. It also made all of us confident that the projects had been very carefully and thoroughly planned and that we would be successful. Even when risk management cannot eliminate all the risks, it is worthwhile to the project.
Key Ideas for Managing Activity Risks
Risk management represented one of the largest investments for the Panama Canal project. Of the risks mentioned in Chapter 7, most were dealt with in effective and, in several cases, innovative ways.
The risk of disease, so devastating on the earlier project, was managed through diligence, science, and sanitation. The scale and cost of this effort were significant, but so were the results. Widespread use of methods for mosquito control under the guidance of Dr. William Gorgas was effective on a scale never seen before. Specific tactics used, such as frequently applying thin films of oil on bodies of water and the disciplined dumping of standing water wherever it gathered (which in a rain forest was nearly everywhere), were so effective that their use worldwide in the tropics continues to this day. Once the program for insect control was in full effect, Panama was by far the healthiest place anywhere in the tropics. Yellow fever was eliminated. Malaria was rare, as were tuberculosis, dysentery, pneumonia, and a wide range of other diseases common at the time. Not only were the diseases spread by mosquitoes virtually eliminated, but also work went much faster without the annoyance of the omnipresent insects. Although some estimates put the cost at ten dollars for every mosquito killed, the success of the canal project depended heavily on Dr. Gorgas to ensure that the workers stayed healthy. This risk was managed thoroughly and well.
For the risk of frequent and sudden mud slides, there were no elegant solutions. As the work commenced, it seemed to many that "the more we dug, the more remained to be dug." Unfortunately, this was true; it proved impossible to use the original French plan for the trench in the Culebra Cut to have sides at forty-five degrees (a 1:1 slope). This angle created several problems, the largest of which was the frequent mud slides. In addition, the sides of the cut pressed down on the semisolid clay the excavators were attempting to remove, which squeezed it up in the center of the trench. The deeper the digging, the more the sides would sink and the center would rise; like a fluid, it would seek its level. The contingency plan was inelegant but ultimately effective: more digging. The completed canal had an average 4:1 slope, which minimized the mud slides and partially stabilized the flowing clay. This brute-force contingency plan not only resulted in the need to dispose of much more soil but also represented about triple the work. Erosion, flowing clay, and occasional mud slides continue to this day, and the canal requires frequent dredging to remain operational.
Dealing with the risks involved with building the enormous locks required a number of tactics. As with the mud slides, the massive concrete sides for the locks were handled by brute force and overengineering. Cement was poured at Panama on a scale never done before. The sides of the locks are so thick and so heavily reinforced that, even after nearly ninety years of continuous operation, with thousands of ship passages and countless earthquakes, there are very few cracks or defects. The locks still look much as they did when they were new.
The mechanical and electrical challenges were quite another matter. The locks were colossal machines with thousands of moving parts, many huge. Years of advance planning and experimentation led to ultimate success. The canal was a triumph of precision engineering and use of new steels. Vanadium alloy steels used were developed initially for automotive use, and they proved light and strong enough to serve in the construction of doors for the locks. Holding the doors tightly closed against the weight of the water in a filled lock required a lot of mass, mass that the engineers wanted to avoid moving each time the doors were opened or closed. To achieve this, the doors were made hollow. Whenever they are closed, they are filled with water before the lock is filled, providing the necessary mass. The doors are then drained before they are opened to allow the ships raised (or lowered) to pass through.
Even with this strategy, moving doors of this size and weight required the power of modern engines. The choice of electrical operation proved very difficult and required much innovation (the first all-electric factory in the United States was barely a year old at the time of this decision), but electricity did provide a number of advantages. With electric controls, the entire canal system can be controlled centrally. Scale models were built to show the positions of each lock in detail. The lock systems are all controlled using valves and switches on the model, and mechanical interlocks beneath the model prevent errors in operation, such as opening the doors on the wrong end of a lock or opening them before the filling or draining of water is complete. Complete status can be monitored for all twelve locks.
When George Goethals began to set all of this up, he realized that neither he nor anyone else had ever done anything like it. For most of the controls and the more than 1,000 electric motors the canal required, Goethals managed risk by bringing in outside help. He awarded a sizable contract to a rapidly growing U.S. company known for its expertise in electrical systems. Although it was still fairly small and not known internationally, the General Electric Company had started to attract worldwide attention by the time the Panama Canal opened. This was a huge contract for GE, and it was its first large government contract. Such a large-scale collaboration of private and public organizations was unknown prior to this project. The relationship between Goethals and GE served as the model for the Manhattan Project during World War II and for countless other modern projects in the United States and elsewhere. For good or ill, the modern military-industrial complex began in Panama.
Despite the project's success in dealing with most risks, explosives remained a significant problem throughout construction. As in many contemporary projects, loss of life and limbs while handling explosives was common. Although stringent safety precautions helped, the single largest cause of death on the second Panama Canal project was TNT, not disease. For this risk, the builders found no solutions or viable alternatives, so throughout the project they were quite literally "playing with dynamite."
Introduction