Lesson 1: Implementing Disaster Protection | MCSE Training Kit(c) Microsoft Windows 2000 Accelerated 2000

A computer disaster is any event that renders a computer unable to start. This can include the destruction of the master boot record stored on a system device, the deletion of one or more operating system files, destruction of a computer's physical system device, or destruction of the computer itself. The term disaster protection refers to any effort to prevent computer disasters and minimize downtime in the event of system failure. You can achieve a level of disaster protection by configuring an uninterruptible power supply (UPS) and implementing fault-tolerant disk configurations.

After this lesson, you will be able to:

Configure a UPS to provide power if a local power source fails
Implement disk fault tolerance

Estimated lesson time: 40 minutes

Configuring an Uninterruptible Power Supply

Disaster recovery is the restoration of a computer so that you can log on and access system resources after a computer disaster has occurred. One common type of computer disaster is the loss of local power, which can result in damaged or lost data on a server or client computer. While companies usually protect servers against this type of disaster, you might also consider providing protection for client computers against power loss, depending on the reliability of your local power supply.

An uninterruptible power supply provides power if the local power fails and is usually rated to provide a specific amount of power for a specific period of time. In general, a UPS should provide power long enough for you to shut down a computer in an orderly way by quitting processes and closing sessions.
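To judge whether a UPS is rated to run long enough, you can compare a rough runtime estimate with the time an orderly shutdown takes. The following Python sketch assumes a simple linear model: usable battery energy in watt-hours divided by the load in watts. The battery capacity, load, inverter efficiency of 0.9, and safety factor of 2 are illustrative assumptions, not values from any particular UPS. Real batteries deliver less energy at high loads, so prefer the manufacturer's runtime chart when one is available.

```python
def estimated_runtime_minutes(battery_wh: float, load_w: float,
                              inverter_efficiency: float = 0.9) -> float:
    """Rough runtime estimate under a linear discharge model (an assumption)."""
    if load_w <= 0:
        raise ValueError("load must be positive")
    # Usable energy (Wh) divided by load (W) gives hours; convert to minutes.
    return battery_wh * inverter_efficiency / load_w * 60


def covers_shutdown(battery_wh: float, load_w: float, shutdown_minutes: float,
                    safety_factor: float = 2.0) -> bool:
    """True if the estimated runtime exceeds the shutdown time with a margin."""
    return estimated_runtime_minutes(battery_wh, load_w) >= shutdown_minutes * safety_factor


if __name__ == "__main__":
    # Hypothetical example: 200 Wh battery, 300 W server load, 10-minute shutdown.
    print(round(estimated_runtime_minutes(200, 300), 1))  # 36.0
    print(covers_shutdown(200, 300, 10))                  # True
```

The safety factor leaves room for battery aging and for the time users need to save their work after the first warning.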

NOTE
Before purchasing a UPS for use with Windows 2000, determine whether the proposed device is on the Windows 2000 Hardware Compatibility List (HCL).

Configuring Options for the UPS Service

Use the UPS tab of the Power Options Properties dialog box to configure the UPS service. You can access this dialog box by selecting Power Options in Control Panel. To configure the UPS service, you must specify the following information:

The COM port to which the UPS device is connected
The conditions that trigger the UPS device to send a signal, such as a power failure, low battery power, and remote shutdown by the UPS device
The time intervals for maintaining battery power, recharging the battery, and sending warning messages after a power failure
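These time intervals work together: after a power failure, users are warned periodically until the computer begins an orderly shutdown. The following Python sketch models that timeline. The `UpsSettings` fields and the `notification_schedule` function are hypothetical names chosen for illustration. They are not the actual settings or API of the Windows 2000 UPS service, and the options your UPS device offers may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UpsSettings:
    # Hypothetical settings, loosely modeled on the items listed above.
    com_port: str                       # port the UPS is connected to, e.g. "COM1"
    first_warning_delay_s: int          # delay before the first warning
    warning_interval_s: int             # delay between later warnings
    minutes_on_battery_before_shutdown: int


def notification_schedule(settings: UpsSettings,
                          low_battery_at_s: Optional[int] = None) -> List[Tuple[int, str]]:
    """Return (seconds after power failure, action) pairs up to shutdown.

    A low-battery signal, if it arrives first, moves the shutdown earlier.
    """
    if settings.warning_interval_s <= 0:
        raise ValueError("warning interval must be positive")
    shutdown_at = settings.minutes_on_battery_before_shutdown * 60
    if low_battery_at_s is not None:
        shutdown_at = min(shutdown_at, low_battery_at_s)
    events = []
    t = settings.first_warning_delay_s
    while t < shutdown_at:
        events.append((t, "warn users"))
        t += settings.warning_interval_s
    events.append((shutdown_at, "begin orderly shutdown"))
    return events


if __name__ == "__main__":
    settings = UpsSettings("COM1", first_warning_delay_s=5,
                           warning_interval_s=120,
                           minutes_on_battery_before_shutdown=5)
    for seconds, action in notification_schedule(settings):
        print(seconds, action)
```

With these example values, users are warned at 5, 125, and 245 seconds, and shutdown begins at 300 seconds. A low-battery signal at 100 seconds would cut the schedule short after the first warning.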

NOTE
The configuration options for the UPS service can vary depending on the specific UPS device attached to your computer. For details about possible settings, see the manufacturer's documentation included with the UPS device.

Testing a UPS Configuration

After you have configured the UPS service for your computer, test the configuration to ensure that your computer is protected from power failures. You can simulate a power failure by disconnecting the main power supply to the UPS device. During the test, the computer and peripherals connected to the UPS device should remain operational, messages should display, and events should be logged.

NOTE
You should not use a production computer to test the UPS configuration. You should use a spare computer or test computer. If you use a production computer, you could lose some of the data on the computer and possibly have to reinstall Windows 2000. Remember, when a computer suddenly stops, data can be lost or corrupted. The reason for having a UPS is to allow a graceful shutdown of the computers rather than an abrupt stop.

In addition, you should wait until the UPS battery reaches a low level to verify that an orderly shutdown occurs. Then, restore the main power source to the UPS device and check the event log to ensure that all actions were logged and there were no errors.

NOTE
Some UPS manufacturers provide their own UPS software to take advantage of the unique features of their UPS devices.

Implementing Disk Fault Tolerance

Fault tolerance is the ability of a computer or operating system to respond to a catastrophic event, such as a power outage or hardware failure, so that no data is lost and work in progress is not corrupted. Fully fault-tolerant systems using fault-tolerant disk arrays prevent the loss of data.

Although the data is available and current in a fault-tolerant system, you should still make backups to protect the information on hard disks from erroneous deletions, fire, theft, or other disasters. Disk fault tolerance is not an alternative to a backup strategy with offsite storage, which is the best insurance for recovering lost or damaged data.

If you experience the loss of a hard disk due to mechanical or electrical failure and have not implemented fault tolerance, your only option for recovering the data on the failed drive is to replace the hard disk and restore your data from a backup. However, the loss of access to the data while you replace the hard disk and restore your data can translate into lost time and money.

RAID Implementations

To maintain access to data during the loss of a single hard disk, Windows 2000 Server provides a software implementation of a fault tolerance technology known as redundant array of independent disks (RAID). RAID provides fault tolerance by implementing data redundancy. With data redundancy, a computer writes data to more than one disk, which protects the data in the event of a single hard disk failure.

You can implement RAID fault tolerance as either a software or hardware solution.

Software Implementations of RAID

Windows 2000 Server supports two software implementations of RAID: mirrored volumes (RAID 1) and striped volumes with parity (RAID 5), otherwise known as RAID-5 volumes. You can create new mirrored and RAID-5 volumes only on Windows 2000 dynamic disks.

With software implementations of RAID, there is no fault tolerance following a failure until the fault is repaired. If a second fault occurs before the data lost from the first fault is regenerated, you can recover the data only by restoring it from a backup.

NOTE
When you upgrade Windows NT 4.0 to Windows 2000, any existing mirror sets or stripe sets with parity are retained. Windows 2000 provides limited support for these fault tolerance sets, allowing you to manage and delete them.

Hardware Implementations of RAID

In a hardware solution, the disk controller interface handles the creation and regeneration of redundant information. Some hardware vendors implement RAID data protection directly in their hardware, as with disk array controller cards. Because these vendor-specific methods offload the work to dedicated hardware and bypass the operating system's fault tolerance software drivers, they offer performance improvements over software implementations of RAID. In addition, hardware implementations of RAID usually include extra features, such as additional fault-tolerant RAID configurations, hot swapping of failed hard disks, hot sparing for online failover, and dedicated cache memory for improved performance.

NOTE
The level of RAID supported in a hardware implementation depends on the hardware manufacturer.

Consider the following when deciding whether to use a software or hardware implementation of RAID:

Hardware fault tolerance is more expensive than software fault tolerance.
Hardware fault tolerance generally provides faster disk I/O than software fault tolerance.
Hardware fault tolerance solutions might limit equipment options to a single vendor.
Hardware fault tolerance solutions might implement hot swapping of hard disks to allow for replacement of a failed hard disk without shutting down the computer and hot sparing so that a failed disk is automatically replaced by an online spare.

Mirrored Volumes

A mirrored volume uses the Windows 2000 Server fault tolerance driver (Ftdisk.sys) to write the same data to a volume on each of two physical disks simultaneously, as shown in Figure 25.1. Each of these two volumes is a member of the mirrored volume. Implementing a mirrored volume helps to ensure the survival of data in the event that one member of the mirrored volume fails.

Figure 25.1 Mirrored volume
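The following Python sketch is a toy model of a two-member mirrored volume. Every write goes to both members, and a read can be served by whichever member is still healthy. The class and method names are hypothetical, and the in-memory dictionaries only stand in for disks. This illustrates the idea and is not how Ftdisk.sys is implemented.

```python
class MirroredVolume:
    """Toy model of a two-member mirrored volume (RAID 1).

    Each member is a dict mapping block numbers to bytes.
    """

    def __init__(self) -> None:
        self.members = [{}, {}]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # The same data is written to every healthy member.
        for i, member in enumerate(self.members):
            if not self.failed[i]:
                member[block] = data

    def read(self, block: int) -> bytes:
        # Any healthy member can satisfy a read.
        for i, member in enumerate(self.members):
            if not self.failed[i]:
                return member[block]
        raise IOError("both members have failed; restore from backup")

    def fail_member(self, i: int) -> None:
        # Simulate a disk failure: the member's contents are lost.
        self.failed[i] = True
        self.members[i] = {}


if __name__ == "__main__":
    volume = MirroredVolume()
    volume.write(0, b"payroll")
    volume.fail_member(0)
    print(volume.read(0))  # b'payroll', served by the surviving member
```

Losing one member leaves the data readable; losing both raises an error, which corresponds to having to restore from a backup.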

A mirrored volume can contain any partition, including the boot or system partition; however, both disks in a mirrored volume must be Windows 2000 dynamic disks.

Mirrored volumes can be striped across multiple disks. This configuration is often referred to as RAID 10, a combination of RAID 1 mirroring and RAID 0 striping. Unlike RAID 0, RAID 10 is a fault-tolerant RAID configuration because each disk in the stripe is also mirrored. RAID 10 improves disk input/output (I/O) performance by performing read and write operations across the stripe.
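The following Python sketch shows how a RAID 10 layout might place data and why it tolerates disk failures. It assumes a stripe unit of one block and numbers the disks so that mirror pair p consists of disks 2p and 2p + 1. Both the numbering and the function names are hypothetical.

```python
from typing import Set, Tuple


def raid10_location(logical_block: int, mirror_pairs: int) -> Tuple[int, int, Tuple[int, int]]:
    """Map a logical block to (pair, offset within pair, disks holding copies)."""
    pair = logical_block % mirror_pairs      # striping (RAID 0) across pairs
    offset = logical_block // mirror_pairs   # position on each disk of the pair
    return pair, offset, (2 * pair, 2 * pair + 1)  # mirroring (RAID 1) within a pair


def raid10_survives(failed_disks: Set[int], mirror_pairs: int) -> bool:
    """The volume survives as long as no pair has lost both of its disks."""
    return all(not {2 * p, 2 * p + 1} <= failed_disks for p in range(mirror_pairs))


if __name__ == "__main__":
    print(raid10_location(5, 3))        # (2, 1, (4, 5))
    print(raid10_survives({0, 2}, 2))   # True: each pair still has one copy
    print(raid10_survives({0, 1}, 2))   # False: pair 0 lost both copies
```

Consecutive blocks land on different pairs, which spreads I/O across the stripe. Each block still has two copies, so any single disk failure is survivable.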

Performance on Mirrored Volumes

Mirrored volumes can enhance read performance because the fault tolerance driver reads from both members of the volume at once. There can be a slight decrease in write performance because the fault tolerance driver must write to both members. When one member of a mirrored volume fails, performance returns to normal because the fault tolerance driver works with only a single partition.

Because disk space usage is only 50 percent (two members for one set of data), mirrored volumes can be expensive.

CAUTION
Deleting a mirrored volume deletes all the information stored on that volume.

Disk Duplexing

If the same disk controller controls both physical disks in a mirrored volume and the disk controller fails, neither member of the mirrored volume is accessible. You can install a second controller in the computer so that each disk in the mirrored volume has its own controller. This arrangement, called disk duplexing, can protect the mirrored volume against both controller failure and hard disk failure. Some hardware implementations of disk duplexing use two or more channels on a single disk controller card.

Disk duplexing reduces bus traffic and potentially improves read performance. Disk duplexing is a hardware enhancement to a Windows 2000 mirrored volume and requires no additional software configuration.
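The following Python sketch compares a mirrored volume with and without disk duplexing. It checks whether at least one member stays reachable when a component fails. A member is reachable only if both its disk and its controller work. The component names `disk1`, `disk2`, `ctrlA`, and `ctrlB` are hypothetical labels.

```python
from typing import Set


def mirror_survives(failed: Set[str], duplexed: bool) -> bool:
    """Check whether at least one mirror member is still reachable."""
    # Without duplexing, both disks hang off the same controller.
    controller_for = {"disk1": "ctrlA", "disk2": "ctrlB" if duplexed else "ctrlA"}
    return any(disk not in failed and ctrl not in failed
               for disk, ctrl in controller_for.items())


if __name__ == "__main__":
    print(mirror_survives({"ctrlA"}, duplexed=False))  # False
    print(mirror_survives({"ctrlA"}, duplexed=True))   # True
```

With a single controller, one controller failure makes both members unreachable. With duplexing, the same failure leaves the second member available.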

RAID-5 Volumes

Windows 2000 Server also supports fault tolerance through striped volumes with parity (RAID 5). Parity is a mathematical method of recording whether the number of 1 bits in each bit position across a series of numbers is odd or even. This information can be used to reconstruct the data if one number in the series is lost.
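RAID-5 parity is commonly computed with bitwise exclusive OR (XOR). Each parity bit is 1 when the count of 1 bits in that position is odd. XOR-ing the parity with the surviving values yields the one missing value. The following minimal Python sketch shows this on small integers. The function names are illustrative.

```python
from functools import reduce
from operator import xor
from typing import Iterable


def parity(values: Iterable[int]) -> int:
    """XOR of all values: each result bit is 1 when that bit position has an odd count of 1s."""
    return reduce(xor, values, 0)


def reconstruct(surviving: Iterable[int], parity_value: int) -> int:
    """Recover the single missing value by XOR-ing the parity with the survivors."""
    return reduce(xor, surviving, parity_value)


if __name__ == "__main__":
    data = [0b1011, 0b0110, 0b1100]
    p = parity(data)
    print(format(p, "04b"))  # 0001
    # Suppose the second value (0b0110) is lost.
    print(format(reconstruct([data[0], data[2]], p), "04b"))  # 0110
```

Only one missing value can be recovered this way. If two values are lost, the parity no longer carries enough information to rebuild either one.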

In a RAID-5 volume, Windows 2000 achieves fault tolerance by adding a parity-information stripe to each disk partition in the volume, as shown in Figure 25.2. If a single disk fails, Windows 2000 can use the data and parity information on the remaining disks to reconstruct the data that was on the failed disk.

Figure 25.2 RAID-5 parity-information stripes
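The following Python sketch lays out data blocks on a small RAID-5 volume and rebuilds a failed disk. Each stripe holds one parity block computed with bitwise XOR, and the parity block moves to a different disk in each stripe, so every disk holds some parity. The rotation order and the function names are illustrative assumptions. They are not a description of the exact layout Windows 2000 uses on disk.

```python
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)


def build_raid5(data_blocks: List[bytes], disks: int) -> List[List[bytes]]:
    """Lay out data on `disks` disks with one rotating parity block per stripe."""
    per_stripe = disks - 1
    if len(data_blocks) % per_stripe:
        raise ValueError("pad the data to a whole number of stripes")
    layout: List[List[bytes]] = [[] for _ in range(disks)]
    for s in range(len(data_blocks) // per_stripe):
        chunk = data_blocks[s * per_stripe:(s + 1) * per_stripe]
        parity_disk = (disks - 1 - s) % disks  # parity rotates across disks
        data_iter = iter(chunk)
        for d in range(disks):
            layout[d].append(xor_blocks(chunk) if d == parity_disk else next(data_iter))
    return layout


def rebuild_disk(layout: List[List[bytes]], failed: int) -> List[bytes]:
    """Regenerate every block of a failed disk from the surviving disks."""
    survivors = [column for d, column in enumerate(layout) if d != failed]
    return [xor_blocks([column[s] for column in survivors])
            for s in range(len(layout[0]))]


if __name__ == "__main__":
    blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE", b"FFFF"]
    layout = build_raid5(blocks, disks=3)
    print(rebuild_disk(layout, failed=1) == layout[1])  # True
```

Reading any block that lived on the failed disk means reading the matching block from every surviving disk. This is why read performance slows after a disk failure.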

Because of the parity calculation, write operations on a RAID-5 volume are slower than on a mirrored volume. However, RAID-5 volumes provide better read performance than mirrored volumes, especially with multiple controllers, because data is distributed among multiple drives. If a disk fails, read performance on a RAID-5 volume slows while Windows 2000 Server reconstructs the data for the failed disk by using parity information.

RAID-5 volumes have a cost advantage over mirrored volumes because disk space is used more efficiently. The more disks you have in the RAID-5 volume, the lower the relative cost of the redundant parity stripe. Table 25.1 shows how the proportion of disk space used for redundancy decreases as you add 2-gigabyte (GB) disks to the RAID-5 volume.

Table 25.1 RAID-5 redundancy vs. number of disks (2-GB disks)

Number of disks	Total disk space	Usable disk space	Redundancy (space used for parity)
3	6 GB	4 GB	33 percent
4	8 GB	6 GB	25 percent
5	10 GB	8 GB	20 percent

Software-level RAID-5 volumes have some restrictions. First, a RAID-5 volume requires a minimum of three drives and allows a maximum of 32 drives. Second, a software-level RAID-5 volume cannot contain the boot or system partition.
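Before creating a software-level RAID-5 volume, you can check a proposed configuration against these restrictions and against the dynamic-disk requirement described earlier in this lesson. The following Python function is a hypothetical planning aid and not part of Windows 2000. It returns a list of problems, and an empty list means the plan meets the stated rules. As the next paragraph notes, these rules do not apply to hardware RAID.

```python
from typing import List


def check_software_raid5(disk_count: int, dynamic_disks: bool,
                         includes_boot_or_system: bool) -> List[str]:
    """Return the restrictions a proposed software RAID-5 volume would violate."""
    problems = []
    if not 3 <= disk_count <= 32:
        problems.append("software RAID-5 volumes need 3 to 32 disks")
    if not dynamic_disks:
        problems.append("all disks must be Windows 2000 dynamic disks")
    if includes_boot_or_system:
        problems.append("the boot or system partition cannot be in a RAID-5 volume")
    return problems


if __name__ == "__main__":
    print(check_software_raid5(3, dynamic_disks=True, includes_boot_or_system=False))  # []
    print(check_software_raid5(2, dynamic_disks=True, includes_boot_or_system=True))
```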

The Windows 2000 operating system is not aware of RAID implementations in hardware. Therefore, the restrictions that apply to software-level RAID do not apply to hardware-level RAID configurations.

Mirrored Volumes versus RAID-5 Volumes

Mirrored volumes and RAID-5 volumes provide different levels of fault tolerance. Deciding which option to implement depends on the level of protection you require and the cost of hardware. The major differences between mirrored volumes (RAID 1) and RAID-5 volumes are performance and cost. Table 25.2 describes some differences between software-level RAID 1 and RAID 5.

Table 25.2 Differences between RAID 1 and RAID 5

Mirrored volumes (RAID 1)	Striped volumes with parity (RAID 5)
Supports FAT and NTFS	Supports FAT and NTFS
Can protect system or boot partition	Cannot protect system or boot partition
Requires 2 hard disks	Requires a minimum of 3 hard disks and allows a maximum of 32 hard disks
Has a higher cost per megabyte	Has a lower cost per megabyte
50 percent utilization	67 percent minimum utilization
Has good write performance	Has moderate write performance
Has good read performance	Has excellent read performance
Uses less system memory	Requires more system memory

Generally, mirrored volumes offer read and write performance comparable to that of single disks. RAID-5 volumes offer better read performance than mirrored volumes, especially with multiple controllers, because data is distributed among multiple drives. However, calculating parity information requires more system memory, which can slow write performance.

Mirroring uses only 50 percent of the available disk space, so it is more expensive in cost per megabyte (MB) than disks without mirroring. RAID 5 uses 33 percent of the available disk space for parity information when you use the minimum number of hard disks (three). With RAID 5, disk utilization improves as you increase the number of hard disks.

Implementing RAID Systems

The software-level fault tolerance features of Windows 2000 Server are available only on Windows 2000 dynamic disks. In Windows 2000 Server, you create software-level mirrored and RAID-5 volumes by using the Create Volume wizard in the Computer Management snap-in.

To create a volume by using the Create Volume wizard, access the Disk Management folder in the Computer Management snap-in. When you select the Disk Management folder, the details pane of the Computer Management window displays a text view of the physical disks in your computer and a graphical view (Figure 25.3).

Figure 25.3 Disk Management folder of the Computer Management snap-in

In the details pane, select an area of unallocated space. Then, from the Action menu, point to All Tasks and click Create Volume. Follow the steps in the Create Volume wizard to create a volume.

NOTE
Windows 2000 Advanced Server and Windows 2000 Datacenter Server support server clustering for an even higher level of fault tolerance. Clustering is beyond the scope of this training kit.

Lesson Summary

In this lesson you learned that you can achieve a level of disaster protection by configuring a UPS and by implementing disk fault tolerance. A UPS provides power in case the local power fails. In general, a UPS should provide power long enough to warn users connected to the server of the failure and then perform an orderly shutdown. You can configure the UPS service on the UPS tab of the Power Options Properties dialog box. After you have configured the UPS service for your computer, you should test the configuration to ensure that your computer is protected from power failures.

You also learned that in addition to a UPS for power protection, fault-tolerant RAID provides an additional level of data protection. You can use fault-tolerant RAID configurations to implement disk fault tolerance as either a software or hardware solution. A software-level mirrored volume uses the Windows 2000 Server fault tolerance driver (Ftdisk.sys) to write the same data to a volume on each of two physical disks simultaneously. Windows 2000 Server also supports fault tolerance through software-level striped volumes with parity (RAID 5). In a RAID-5 volume, Windows 2000 achieves fault tolerance by adding a parity-information stripe to each disk partition in the volume. Generally, mirrored volumes offer read and write performance comparable to that of single disks. RAID-5 volumes offer better read performance than mirrored volumes. RAID implementations benefit from using multiple controllers (disk duplexing) because disk I/O is distributed among multiple data channels to increase performance and fault tolerance. You create software-level mirrored and RAID-5 volumes by using the Create Volume wizard in the Computer Management snap-in.