Recovering from a Disk Failure


Organizations create disaster-recovery plans and procedures to protect against a variety of system failures, but disk failures tend to be the most common in networking environments. The technology used to create processor chips and memory chips has improved drastically over the past couple of decades, minimizing the failure of system boards. The quality of hard drives has also improved drastically over the years, but because hard drives spin constantly and have the most moving parts of any component in a computer system, they still tend to be the component that fails most often.

The key to a fault-tolerant disk solution is building hardware fault tolerance into critical server drives so that they can be recovered if a disk fails. Information is stored on system, boot, and data volumes, each of which has different recovery needs.

Hardware-Based RAID Array Failure

Common uses of hardware-based disk arrays for Windows servers include RAID 1 (mirroring) for the operating system and RAID 5 (striped sets with parity) for separate data volumes. Some deployments use a single RAID 5 array for both the OS and data volumes, and RAID 0/1 (mirrored striped sets) has been used in more recent deployments.
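To illustrate why a RAID 5 array survives the loss of a single disk, the following minimal Python sketch shows the XOR parity idea behind striped sets with parity. It is only an illustration: the block contents and the `xor_blocks` helper are hypothetical, and a real controller performs this work in firmware across rotating parity stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread across three data disks (hypothetical contents).
data = [b"MAIL", b"LOGS", b"DATA"]

# The parity block for the stripe is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]: XOR the surviving
# data blocks with the parity block to rebuild the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because only one parity block exists per stripe, this reconstruction works for a single failed disk; a second simultaneous failure in the same array cannot be rebuilt this way.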

RAID controllers provide a firmware-based array-management interface, which can be accessed during system startup. This interface enables administrators to configure RAID controller options and manage disk arrays, and it should be used to repair or reconfigure disk arrays if a problem or disk failure occurs.

Many controllers also offer Windows-based applications that can be used to create and manage arrays. Of course, the operating system must be running before the Windows-based RAID controller application can be used. Follow the manufacturer's procedures for replacing a failed disk within a hardware-based RAID array.

NOTE

Many RAID controllers allow an array to be configured with a hot spare disk. This disk automatically joins the array when a single disk failure occurs. If several arrays are created on a single RAID controller card, hot spare disks can be defined as global and can be used to replace a failed disk on any array. As a best practice, hot spare disks should be defined for arrays.
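The difference between a hot spare assigned to one array and a global hot spare can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not any vendor's firmware logic: the `Array` class, the `pick_spare` function, the disk names, and the rule that a dedicated spare is used before a global one are all hypothetical, and real controllers may also match spares by size or drive type.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    name: str
    # Spares reserved for this array only (hypothetical model).
    dedicated_spares: list = field(default_factory=list)


def pick_spare(failed_array, global_spares):
    """Return the spare that would join the failed array, or None."""
    if failed_array.dedicated_spares:
        # Assumption: a dedicated spare is preferred over a global one.
        return failed_array.dedicated_spares.pop(0)
    if global_spares:
        # A global spare can replace a failed disk in any array.
        return global_spares.pop(0)
    return None


os_array = Array("os-mirror")
data_array = Array("data-raid5", dedicated_spares=["disk5"])
global_pool = ["disk9"]

first = pick_spare(data_array, global_pool)   # uses the dedicated spare
second = pick_spare(os_array, global_pool)    # falls back to the global spare
third = pick_spare(os_array, global_pool)     # no spares left
```

In this model the third failure finds no spare available, which is the situation the best practice of defining hot spares is meant to avoid.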


System Volume

If a system disk fails, the system can be left in a completely failed state. To prevent this problem, the administrator should always try to create the system disk on a fault-tolerant disk array such as RAID 1 or RAID 5. If the system disk was mirrored (RAID 1) in a hardware-based array, the operating system will boot and operate normally because the disk and partition referenced in the boot.ini file remain the same and remain accessible. If the RAID 1 array was created within the operating system using Disk Management or diskpart.exe, the mirrored disk can be accessed at bootup by choosing the second option in the boot.ini file during startup. If a disk failure occurs on a software-based RAID 1 array during regular operation, no system disruption should be encountered.
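As a rough illustration of what the "second option" in boot.ini refers to, the Python sketch below parses a sample boot.ini and lists which physical disk each startup entry points to. The sample file contents, the entry descriptions, and the `list_boot_entries` helper are assumptions made for illustration; the actual entries on a given server depend on its disk layout and should be read from that server's own boot.ini.

```python
import re

# Illustrative boot.ini contents for a software-mirrored system disk.
# The paths and descriptions are assumptions, not taken from a real server.
BOOT_INI = r"""
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect
multi(0)disk(0)rdisk(1)partition(1)\WINDOWS="Boot Mirror C: - secondary plex" /fastdetect
"""

# ARC path of the form multi(W)disk(X)rdisk(Y)partition(Z)\folder
ARC = re.compile(
    r"multi\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)(\\[^=]*)"
)


def list_boot_entries(text):
    """Return one dict per entry in the [operating systems] section."""
    entries = []
    in_os_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_os_section = line.lower() == "[operating systems]"
            continue
        if not in_os_section or "=" not in line:
            continue
        path, _, rest = line.partition("=")
        match = ARC.fullmatch(path.strip())
        if match:
            # The description is the quoted text after the equals sign.
            description = rest.split('"')[1] if '"' in rest else rest.strip()
            entries.append({
                "rdisk": int(match.group(3)),
                "partition": int(match.group(4)),
                "description": description,
            })
    return entries


entries = list_boot_entries(BOOT_INI)
```

In this sample, the first entry points to the first physical disk (rdisk 0) and the second entry to its mirror (rdisk 1), so selecting the second entry at startup boots from the surviving half of the mirror.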

Boot Volume

If Windows Server 2003 has been installed on the second or third partition of a disk drive, separate boot and system partitions will be created. Most manufacturers require that, for a system to boot from a volume other than the primary partition, that partition be marked active first. To satisfy this requirement without having to change the active partition, Windows Server 2003 always tries to load the boot files on the first or active partition during installation, regardless of which partition or disk the system files will be loaded on. If this drive or volume fails while the system volume is still intact, a boot disk can be used to boot into the OS and make the necessary modifications after the drive has been replaced.

Data Volume

A data volume is by far the simplest of all types of disks to recover. If an entire disk fails, simply replacing the disk, assigning the previously configured drive letter, and restoring the entire drive from backup will restore the data and permissions.

A few issues to watch out for include these:

  • Setting the correct permissions on the root of the drive

  • Ensuring that file shares still work as desired

  • Validating that data in the drive does not require a special restore procedure



Microsoft Exchange Server 2003 Unleashed (2nd Edition)
ISBN: 0672328070
EAN: 2147483647
Year: 2003
Pages: 393
Authors: Rand Morimoto
