Many organizations rely heavily on their computers, and once the computers are networked, they come to rely on those network communications as well. Depending on the type of organization using the network, an equipment failure or other service interruption can mean lost productivity, lost revenue, and, in some cases, even lost lives. This is why many networks have built-in fault-tolerance mechanisms. When the functions of a network are absolutely critical, such as in hospitals or airport control towers, the fault-tolerance mechanisms can be incredibly elaborate. In most cases, however, only a few key components are protected from outages due to hardware or software faults. This lesson examines some of the systems that you can use to protect a network from such disasters.
Many organizations must have their data available all the time to function. If a drive on a server fails, the data should be restorable from a backup, but the time lost replacing the drive and restoring the data can mean lost productivity that costs the company dearly. To provide a higher degree of data availability, there are a variety of hardware technologies that work in different ways to ensure that network data is continuously accessible. Some of these technologies are as follows:
Table 14.1 RAID Levels
RAID Level | RAID Technology | Description |
---|---|---|
0 | Disk striping | Enhances performance by writing data to multiple disk drives, one block at a time; provides no fault tolerance. |
1 | Disk mirroring and duplexing | Provides fault tolerance by maintaining duplicate copies of all data on two drives. Disk mirroring uses two drives connected to the same host adapter, and disk duplexing uses two drives connected to different host adapters. |
2 | Hamming error-correcting code (ECC) | Ensures data integrity by writing error-correcting code to a separate disk drive; rarely implemented. |
3 | Parallel transfer with shared parity | Provides fault tolerance by striping data at the byte level across a minimum of two drives and storing parity information on a third drive. If one of the data drives fails, its data can be restored using the parity information. |
4 | Independent data disks with shared parity | Identical to RAID 3, except that the data is striped across the drives at the block level. |
5 | Independent data disks with distributed parity | Provides fault tolerance by striping both data and parity across three or more drives, instead of using a dedicated parity drive, as in RAID 3 and RAID 4. |
6 | Independent disks with two-dimensional parity | Provides additional fault tolerance by striping data and two independent sets of parity information across four or more drives, so that the array can survive the failure of two drives. |
7 | Asynchronous RAID | Proprietary hardware solution that consists of a striped data array and a separate parity drive, plus a dedicated operating system that coordinates the disk storage activities. |
10 | Striping of mirrored disks | Combines RAID 0 and RAID 1 by striping data across mirrored pairs of disks, thus providing both fault tolerance and enhanced performance. |
53 | Striped array of arrays | Stripes data across multiple RAID 3 arrays, providing the same fault tolerance as RAID 3 with additional performance enhancement. |
0+1 | Mirroring of striped disks | Combines RAID 0 and RAID 1 in a different manner by mirroring the data stored on identical striped disk arrays. |
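
The parity-based levels in Table 14.1 (RAID 3, 4, and 5) depend on being able to rebuild a lost block from the surviving blocks and the parity information. The following Python fragment is a minimal sketch of that idea, assuming simple XOR parity calculated over one stripe. The function names, the block contents, and the three-drive layout are illustrative assumptions, not part of any particular RAID product.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(stripe, parity, failed_index):
    """Recompute the block on a failed drive from the survivors and the parity."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors + [parity])


# Hypothetical stripe: one 4-byte block on each of three data drives.
stripe = [b"NETW", b"ORKS", b"RAID"]
parity = xor_blocks(stripe)  # written to the parity drive

# Simulate the loss of the second data drive and rebuild its block.
recovered = rebuild_block(stripe, parity, failed_index=1)
print(recovered == stripe[1])
```

Because XOR is its own inverse, combining the parity block with every surviving data block yields the missing block. In RAID 5 the same calculation applies, except that the parity block for each stripe is stored on a different drive rather than on one dedicated parity drive.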
None of the data availability techniques described here is intended to be a replacement for regular backups using a device such as a tape drive. For more information about backing up network data, see Lesson 1: Backups, in Chapter 16, "Network Maintenance."
Data availability techniques are useful, but they do no good if the server running the disks malfunctions for some other reason. In addition to specialized data availability techniques, there are similar technologies designed to make servers more reliable. For example, some servers take the concept of hot swapping to the next level by providing redundant components, such as fan assemblies and various types of drives, that you can remove and replace without shutting down the entire computer. Of course, the ultimate solution for server fault tolerance is to have more than one server, and there are various solutions available that enable multiple computers to operate as one, so that if one server should fail, another can immediately take its place.
Novell NetWare SFT III is one of the first commercially successful server duplication technologies. NetWare SFT III is a version of NetWare that consists of two copies of the network operating system, plus a proprietary hardware connection that is used to link the two separate server computers, as shown in Figure 14.1. The servers run an application that synchronizes their activities. When a user saves data to one server volume, for example, the data is written to both servers at the same time. If one of the servers should malfunction for any reason, the other server instantaneously takes its place.
Figure 14.1 NetWare SFT III connects two servers, using one as a failover backup to the other
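
The behavior of a mirrored server pair can be modeled in a few lines of Python. This is only a conceptual sketch of synchronized writes and failover, assuming an in-memory dictionary in place of a real volume. The `Server` and `MirroredPair` classes are hypothetical and do not represent how NetWare SFT III is actually implemented.

```python
class Server:
    """A stand-in for one server; 'volume' maps file paths to contents."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.volume = {}


class MirroredPair:
    """Keeps two servers synchronized and fails over when one goes offline."""

    def __init__(self, primary, secondary):
        self.servers = [primary, secondary]

    def _online(self):
        return [s for s in self.servers if s.online]

    def write(self, path, data):
        # Every write goes to all servers that are currently running.
        targets = self._online()
        if not targets:
            raise RuntimeError("no server available")
        for server in targets:
            server.volume[path] = data

    def read(self, path):
        # The first running server answers, so a failure is invisible to the user.
        for server in self._online():
            return server.name, server.volume[path]
        raise RuntimeError("no server available")


pair = MirroredPair(Server("server-1"), Server("server-2"))
pair.write("/data/report.txt", "quarterly figures")

pair.servers[0].online = False           # simulate a malfunction of the first server
answered_by, contents = pair.read("/data/report.txt")
print(answered_by, contents)
```

Because the data was written to both servers before the failure, the second server can answer the read request without any restore operation.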
SFT III is designed solely to provide fault tolerance, but the next generation of this technology does more. Clustering is a technique for interconnecting multiple computers to form a unified computing resource (see Figure 14.2). In addition to providing fault tolerance, a cluster can also distribute the processing load for specific tasks among the various computers or balance the processing load by allocating client requests to different computers in turn. To increase the speed and efficiency of the cluster, administrators can simply connect another computer to the group, which adds its capabilities to those of the others. Both Microsoft and Novell support clustering, Microsoft with Windows 2000 Advanced Server or Microsoft Windows NT 4.0 Enterprise Edition and Novell with NetWare Cluster Services for NetWare 5.1.
Figure 14.2 A server cluster provides fault-tolerance, load-balancing, and process distribution services
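
Allocating client requests to different computers in turn is commonly called round-robin load balancing. The sketch below shows one way to combine it with fault tolerance, assuming that the cluster already knows which nodes have failed. The `Cluster` class and the node names are hypothetical and are not taken from the Microsoft or Novell clustering products.

```python
class Cluster:
    """Hands out nodes in rotation, skipping any that are marked as failed."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # position of the next node to try

    def add_node(self, node):
        # Connecting another computer adds its capacity to the rotation.
        self.nodes.append(node)
        self.healthy.add(node)

    def mark_failed(self, node):
        self.healthy.discard(node)

    def route(self):
        # Try each node at most once, starting where the last request left off.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy node available")


cluster = Cluster(["node-a", "node-b", "node-c"])
before = [cluster.route() for _ in range(4)]

cluster.mark_failed("node-b")            # simulate a server failure
after = [cluster.route() for _ in range(3)]
print(before, after)
```

Requests simply pass over the failed node, so clients continue to be served by the remaining computers while the failed one is repaired.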
Service interruptions on a network are not always the result of a computer or drive failure. Sometimes the network itself is to blame. For this reason, many larger internetworks are designed to include redundant components that enable traffic to reach a given destination in more than one way. If a network cable is cut or broken, or if a router or switch fails, redundant equipment enables data to take another path to its destination. There are several ways to provide redundant paths. Typically, you have at least two routers or switches connected to each network, so that the computers can use either one as a gateway to the other segments. For example, you can build an internetwork with two backbones, as shown in Figure 14.3. Each workstation can use either of the routers on its local segment as a gateway. You can also use this arrangement to balance the traffic on the two backbones by configuring half of the computers on each LAN to use one of the routers as their default gateway and the other half to use the other router.
Figure 14.3 Building a network with two backbones provides both fault tolerance and load balancing
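
The gateway arrangement described above can be sketched as follows. This example assumes two routers on a LAN and splits the workstations evenly between them, with each workstation falling back to the other router if its preferred gateway stops responding. The host names, the router addresses, and the `is_up` check are all hypothetical placeholders; a real network would rely on its operating system or routing protocol to detect a dead gateway.

```python
def assign_gateways(hosts, routers):
    """Alternate hosts between the routers so each carries about half the traffic."""
    return {host: routers[i % len(routers)] for i, host in enumerate(hosts)}


def active_gateway(preferred, routers, is_up):
    """Return the preferred router if it responds, otherwise any other live router."""
    if is_up(preferred):
        return preferred
    for router in routers:
        if router != preferred and is_up(router):
            return router
    return None  # no path off the local segment


# Hypothetical LAN with four workstations and two routers, one per backbone.
hosts = ["ws1", "ws2", "ws3", "ws4"]
routers = ["192.168.1.1", "192.168.1.2"]
gateways = assign_gateways(hosts, routers)

# Simulate the failure of the first router.
failed = {"192.168.1.1"}
in_use = {
    host: active_gateway(gw, routers, is_up=lambda r: r not in failed)
    for host, gw in gateways.items()
}
print(gateways)
print(in_use)
```

Under normal conditions the traffic is divided between the two backbones. When one router fails, every workstation ends up using the surviving router, trading load balancing for continued connectivity.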
Which disk technology (mirroring, duplexing, spanning, or striping) applies to each of the following statements?