6.8 Best practice 8: Train personnel and practice disaster-recovery



In larger, established companies, this practice is usually (though not always) a given. In small companies, it needs to be the rule as well.

Training should start with the disaster-recovery plan and the execution of that plan. Personnel should be trained in all aspects of the system.

Understanding how the backup software works is not enough. Train operations personnel on Windows, Exchange, and the hardware platform on which they run. If your Exchange servers are deployed in a SAN or a clustered environment, those responsible for system recovery must understand that added complexity well. Ensure that personnel know the intricacies of the Exchange database engine, transaction-based storage, and how recovery is performed. Engage Microsoft PSS and other knowledgeable support resources in planning and in training your staff. Ensure that those responsible know which support resources are available, how to use them, and how to escalate issues when things go wrong. When a problem does occur, make sure that each player in the recovery scenario understands his or her role.

The best training ground for personnel is the disaster-recovery fire drill, and not just one drill but many. A recovery staff whose Exchange system manager springs a simulated disaster on them when there is no real emergency will be much better prepared when the situation is real. An actual emergency is the worst time to discover that your recovery procedures don’t work. Ideally, the procedures will have seen hours of QA and validation before this point; as an added measure, test your plan and procedures thoroughly anyway. Murphy’s Law says that when you most need something to work, it very possibly won’t.

Your procedures can be flawed (and therefore must be tested), and, despite your best efforts and failsafe measures, your backup itself could be useless. An exception like this is far better handled during a drill than during a real-world crisis. Don’t be afraid to test your backups periodically by restoring them onto a spare server or a deployed recovery server. The more practice your operations staff has recovering Exchange data, the more likely they are to respond accurately and quickly during a live outage. Validation of your procedures should work out most of the bugs, but Exchange fire drills remain invaluable.

Your organization should not neglect this point: implement a solid program to train all system managers, operators, and administrators on disaster-recovery plans and procedures. This training program should include escalation procedures, recovery scenarios, and periodic disaster-recovery drills that simulate the scenarios your Exchange administrators and operations staff are likely to encounter.

Finally, don’t forget about the hit-by-a-bus scenario. Make sure your training plan includes cross-training of staff so that vacations and absences don’t leave you unable to meet your SLAs for lack of trained personnel.
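The cross-training requirement lends itself to a simple mechanical check. The following is a minimal sketch, not a prescribed tool: it assumes you keep a record of which staff members have been trained (and drilled) on which recovery tasks, and it flags any task that fewer than two people can perform. The function name, the task names, and the two-person threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_coverage_gaps(training, required_tasks, min_trained=2):
    """Return the recovery tasks that fewer than min_trained people can perform.

    training: dict mapping a person's name to the list of tasks they are trained on.
    required_tasks: every task the disaster-recovery plan depends on.
    The result maps each under-covered task to the sorted list of people
    who are currently trained on it (possibly empty).
    """
    trained_by_task = defaultdict(set)
    for person, tasks in training.items():
        for task in tasks:
            trained_by_task[task].add(person)

    return {
        task: sorted(trained_by_task[task])
        for task in required_tasks
        if len(trained_by_task[task]) < min_trained
    }


if __name__ == "__main__":
    # Hypothetical staff and task names, for illustration only.
    training = {
        "alice": ["restore_database", "escalate_to_vendor"],
        "bob": ["restore_database"],
        "carol": ["rebuild_cluster_node"],
    }
    required_tasks = [
        "restore_database",
        "rebuild_cluster_node",
        "escalate_to_vendor",
    ]
    for task, people in find_coverage_gaps(training, required_tasks).items():
        print(f"{task}: only {len(people)} trained ({', '.join(people) or 'nobody'})")
```

Running a check like this after each drill, with the record updated to reflect who actually took part, gives an early warning when a single vacation or departure would leave a recovery task uncovered.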




Mission-Critical Microsoft Exchange 2003: Designing and Building Reliable Exchange Servers (HP Technologies)
ISBN: 155558294X
Year: 2003
Pages: 91
Authors: Jerry Cochran
