Lesson 2: Providing Fault Tolerance

Many organizations rely heavily on their computers, and once the computers are networked, they come to rely on those network communications as well. Depending on the type of organization using the network, an equipment failure or other service interruption can mean lost productivity, lost revenue, and, in some cases, even lost lives. This is why many networks have built-in fault-tolerance mechanisms. When the functions of a network are absolutely critical, such as in hospitals or airport control towers, the fault-tolerance mechanisms can be incredibly elaborate. In most cases, however, only a few key components are protected from outages due to hardware or software faults. This lesson examines some of the systems that you can use to protect a network from such disasters.


After this lesson, you will be able to

  • Understand the various mechanisms used to make network data continuously available
  • Describe how clustering ensures the constant availability of vital network servers
  • Understand how to use redundant equipment to provide fault-tolerant network communications

Estimated lesson time: 15 minutes


Data Availability

Many organizations must have their data available all the time to function. If a drive on a server fails, the data should be restorable from a backup, but the time lost replacing the drive and restoring the data can mean lost productivity that costs the company dearly. To provide a higher degree of data availability, a variety of hardware and software technologies work in different ways to ensure that network data is continuously accessible. Some of these technologies are as follows:

  • Mirroring.  Disk mirroring is an arrangement in which two identical hard disk drives connected to a single host adapter always contain identical data. The two drives appear to users as one logical drive, and whenever anyone saves data to the mirror set, the computer writes it to both drives simultaneously. If one hard drive unit should fail, the other can take over immediately until the malfunctioning drive is replaced. Many operating systems, including Microsoft Windows 2000, Microsoft Windows NT, and Novell NetWare, support disk mirroring. The two main drawbacks of this technique are that the server provides only half of its available disk space to users, and that although mirroring protects against a drive failure, a failure of the host adapter or the computer can still render the data unavailable.
  • Duplexing.  Disk duplexing provides a higher degree of data availability by using duplicate host adapters as well as disk drives. Identical disk drives on separate host adapters maintain exact copies of the same data, creating a single logical drive, just as in disk mirroring, but in this case, the server can survive either a disk failure or a host adapter failure and still make its data available to users.
  • Volumes.  A volume is a fixed amount of data storage space on a hard disk or other storage device. On a typical computer, the hard disk drive may be broken up into multiple volumes to separate data into discrete storage units. For example, if you have a C and a D drive on your computer, these two letters can refer to two different hard drives or to two volumes on a single drive. Network servers function in the same way, but with greater flexibility. You can create multiple volumes on a single drive or create a single volume out of multiple drives. This latter technique is called drive spanning. You can use drive spanning to make all the storage space on multiple drives in a server appear to users as a single entity. The drawback of this technique is that if one of the hard drives containing part of the volume fails, the whole volume is lost.
  • Striping.  Disk striping is a method by which you create a single volume by combining the storage on two or more drives and writing data alternately to each one. Normally, a spanned volume stores whole files on each disk. When you use disk striping, the computer splits each file into multiple segments and writes alternate segments to each disk. This speeds up data access by enabling one drive to read a segment while the other drive's heads are moving to the next segment. When you consider that network servers might need to process dozens of file access requests at once (from various users), the speed improvement provided by disk striping can be significant. However, striped volumes are subject to the same problem as volumes that are spanned. If one drive in the stripe set fails, the entire volume is lost.
  • Redundant array of independent disks (RAID).  This is a comprehensive data availability technology with various levels that provide all of the functions described in the technologies previously listed. Higher RAID levels store error correction information along with the data, so that even if a drive in a RAID array fails, its data remains available from the other drives. Although RAID is available as a software product that works with standard disk drives, many high-end servers use dedicated RAID drive arrays, which consist of multiple hard drive units in a single housing, often with hot swap capability. Hot swapping means that you can remove and replace a malfunctioning drive without shutting off the other drives in the array. This enables the data to remain continuously available to network users, even while the support staff is dealing with a drive failure. The various RAID levels and their functions are listed in Table 14.1.
  • Network attached storage (NAS).  This technology uses a dedicated storage appliance that connects directly to the network and contains its own embedded operating system. The NAS appliance is essentially a multiplatform file server that computers on the network can access in a variety of ways.
  • Storage Area Networks (SANs).  A SAN is a separate network installed at a local area network (LAN) site that connects servers to disk arrays and other network storage devices, making it possible to use dedicated storage hardware arrays without overloading the client network with storage-related traffic. SANs typically use the Fibre Channel protocol to communicate, but they can theoretically use any network medium and protocol.
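The difference between mirroring, spanning, and striping is easiest to see by tracing where each segment of data ends up. The following is a minimal sketch, not a real storage API: drives are modeled as Python lists, and the function names (`mirror_write`, `span_write`, `stripe_write`, `surviving_segments`) and the `drive_capacity` parameter are invented for illustration.

```python
# Toy model of mirroring, spanning, and striping. Drives are plain lists of
# segments; every name here is illustrative, not part of any real product.

def mirror_write(drives, segments):
    """Mirroring/duplexing: every drive receives a full copy of the data."""
    for drive in drives:
        drive.extend(segments)

def span_write(drives, segments, drive_capacity):
    """Spanning: fill the first drive, then continue on the next one."""
    for i, segment in enumerate(segments):
        drives[i // drive_capacity].append(segment)

def stripe_write(drives, segments):
    """Striping: write alternate segments to each drive in turn."""
    for i, segment in enumerate(segments):
        drives[i % len(drives)].append(segment)

def surviving_segments(drives, failed_index):
    """Return the set of segments still readable after one drive fails."""
    survivors = set()
    for i, drive in enumerate(drives):
        if i != failed_index:
            survivors.update(drive)
    return survivors

data = ["s0", "s1", "s2", "s3"]

mirrored = [[], []]
mirror_write(mirrored, data)

spanned = [[], []]
span_write(spanned, data, drive_capacity=2)

striped = [[], []]
stripe_write(striped, data)

# Simulate losing drive 0 and check whether all of the data is still there.
mirror_ok = surviving_segments(mirrored, 0) == set(data)
span_ok = surviving_segments(spanned, 0) == set(data)
stripe_ok = surviving_segments(striped, 0) == set(data)
print(mirror_ok, span_ok, stripe_ok)  # True False False
```

The mirror keeps a complete copy on the remaining drive but stores only four segments on drives that together could hold eight, which is the half-capacity drawback described above. The spanned and striped volumes use all of the space, and each loses part of its data when a single drive fails.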

Table 14.1  RAID Levels

RAID Level | RAID Technology | Description
0 | Disk striping | Enhances performance by writing data to multiple disk drives, one block at a time; provides no fault tolerance.
1 | Disk mirroring and duplexing | Provides fault tolerance by maintaining duplicate copies of all data on two drives. Disk mirroring uses two drives connected to the same host adapter, and disk duplexing uses two drives connected to different host adapters.
2 | Hamming error-correcting code (ECC) | Ensures data integrity by writing error-correcting code to a separate disk drive; rarely implemented.
3 | Parallel transfer with shared parity | Provides fault tolerance by striping data at the byte level across a minimum of two drives and storing parity information on a third drive. If one of the data drives fails, its data can be restored using the parity information.
4 | Independent data disks with shared parity | Identical to RAID 3, except that the data is striped across the drives at the block level.
5 | Independent data disks with distributed parity | Provides fault tolerance by striping both data and parity across three or more drives, instead of using a dedicated parity drive, as in RAID 3 and RAID 4.
6 | Independent disks with two-dimensional parity | Provides additional fault tolerance by striping data and two independent sets of parity information across four or more drives, so that the array can survive the failure of two drives.
7 | Asynchronous RAID | Proprietary hardware solution that consists of a striped data array and a separate parity drive, plus a dedicated operating system that coordinates the disk storage activities.
10 | Striping of mirrored disks | Combines RAID 0 and RAID 1 by striping data across mirrored pairs of disks, thus providing both fault tolerance and enhanced performance.
53 | Striped array of arrays | Stripes data across multiple RAID 3 arrays, providing the same fault tolerance as RAID 3 with additional performance enhancement.
0+1 | Mirroring of striped disks | Combines RAID 0 and RAID 1 in a different manner by mirroring the data stored on identical striped disk arrays.
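To see how parity lets an array restore the data from a failed drive, consider the following minimal sketch. It assumes simple XOR parity, which is the usual basis for single-parity levels such as RAID 3, 4, and 5; the block contents and the helper name `xor_blocks` are made up for illustration, and real controllers work on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per data drive (contents are invented).
data_blocks = [b"NETW", b"ORK+", b"RAID"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive and rebuild its block from the rest.
lost_index = 1
remaining = [blk for i, blk in enumerate(data_blocks) if i != lost_index]
rebuilt = xor_blocks(remaining + [parity])

print(rebuilt == data_blocks[lost_index])  # True
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields exactly the missing block. This works for any one lost drive, which is why a single parity block protects against only a single drive failure.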

None of the data availability techniques described here is intended to be a replacement for regular backups using a device such as a tape drive. For more information about backing up network data, see Lesson 1: Backups, in Chapter 16, "Network Maintenance."

Server Availability

Data availability techniques are useful, but they do no good if the server running the disks malfunctions for some other reason. In addition to specialized data availability techniques, there are similar technologies designed to make servers more reliable. For example, some servers take the concept of hot swapping to the next level by providing redundant components, such as fan assemblies and various types of drives, that you can remove and replace without shutting down the entire computer. Of course, the ultimate solution for server fault tolerance is to have more than one server, and various solutions enable multiple computers to operate as one, so that if one server should fail, another can immediately take its place.

Novell NetWare SFT III is one of the first commercially successful server duplication technologies. NetWare SFT III is a version of NetWare that consists of two copies of the network operating system, plus a proprietary hardware connection that is used to link the two separate server computers, as shown in Figure 14.1. The servers run an application that synchronizes their activities. When a user saves data to one server volume, for example, the data is written to both servers at the same time. If one of the servers should malfunction for any reason, the other server instantaneously takes its place.

Figure 14.1  NetWare SFT III connects two servers, using one as a failover backup to the other

SFT III is designed solely to provide fault tolerance, but the next generation of this technology does more. Clustering is a technique for interconnecting multiple computers to form a unified computing resource (see Figure 14.2). In addition to providing fault tolerance, a cluster can also distribute the processing load for specific tasks among the various computers or balance the processing load by allocating client requests to different computers in turn. To increase the speed and efficiency of the cluster, administrators can simply connect another computer to the group, which adds its capabilities to those of the others. Both Microsoft and Novell support clustering, Microsoft with Windows 2000 Advanced Server or Microsoft Windows NT 4.0 Enterprise Edition and Novell with NetWare Cluster Services for NetWare 5.1.

Figure 14.2  A server cluster provides fault-tolerance, load-balancing, and process distribution services
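The combination of load balancing and failover that a cluster provides can be sketched in a few lines. This is a toy model under stated assumptions, not how Windows 2000 Advanced Server or NetWare Cluster Services is implemented: the `Cluster` class, its method names, and the server names are all hypothetical, and the balancing policy is a plain round-robin.

```python
class Cluster:
    """Toy cluster: allocates requests to nodes in turn, skipping failed ones."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # position of the next node in the rotation

    def mark_failed(self, node):
        """Take a malfunctioning node out of the rotation."""
        self.healthy.discard(node)

    def add_node(self, node):
        """Connecting another computer adds its capacity to the group."""
        self.nodes.append(node)
        self.healthy.add(node)

    def route(self, request):
        """Return the node that should handle the request."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes left in the cluster")
        while True:
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node

cluster = Cluster(["srv-a", "srv-b", "srv-c"])
first = [cluster.route(f"req{i}") for i in range(4)]
print(first)  # ['srv-a', 'srv-b', 'srv-c', 'srv-a']

cluster.mark_failed("srv-b")
after = [cluster.route(f"req{i}") for i in range(4)]
print(after)  # ['srv-c', 'srv-a', 'srv-c', 'srv-a']
```

Clients keep sending requests to the cluster as a whole, so the failure of srv-b only changes which of the remaining servers answers.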

Network Redundancy

Service interruptions on a network are not always the result of a computer or drive failure. Sometimes the network itself is to blame. For this reason, many larger internetworks are designed to include redundant components that enable traffic to reach a given destination in more than one way. If a network cable is cut or broken, or if a router or switch fails, redundant equipment enables data to take another path to its destination. There are several ways to provide redundant paths. Typically, you have at least two routers or switches connected to each network, so that the computers can use either one as a gateway to the other segments. For example, you can build an internetwork with two backbones, as shown in Figure 14.3. Each workstation can use either of the routers on its local segment as a gateway. You can also use this arrangement to balance the traffic on the two backbones by configuring half of the computers on each LAN to use one of the routers as their default gateway and the other half to use the other router.

Figure 14.3  Building a network with two backbones provides both fault tolerance and load balancing
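The default gateway arrangement described above can be sketched as follows. The host and router names are hypothetical, and the `effective_gateway` function models only the outcome of a failover; how a workstation actually detects that its router is down is outside the scope of this sketch.

```python
def assign_gateways(hosts, routers):
    """Alternate hosts between the routers so each carries about half."""
    return {host: routers[i % len(routers)] for i, host in enumerate(hosts)}

def effective_gateway(host, assignment, routers, failed):
    """Use the assigned router unless it has failed; then use another one."""
    preferred = assignment[host]
    if preferred not in failed:
        return preferred
    for router in routers:
        if router not in failed:
            return router
    raise RuntimeError("no working router on this segment")

hosts = ["ws1", "ws2", "ws3", "ws4"]
routers = ["router-a", "router-b"]

# Normal operation: traffic is split across the two backbones.
assignment = assign_gateways(hosts, routers)
print(assignment)

# router-a fails: every workstation reaches the other segments via router-b.
fallback = {
    host: effective_gateway(host, assignment, routers, failed={"router-a"})
    for host in hosts
}
print(fallback)
```

With both routers working, half of the workstations use each backbone. When one router fails, all traffic shifts to the other, which preserves connectivity at the cost of the load balancing.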

Exercise 1: Data Availability Technologies

Which disk technology (mirroring, duplexing, spanning, or striping) applies to each of the following statements?

  1. Enables a server to survive a drive failure
  2. Uses multiple hard drives to create a single logical hard drive
  3. Enables a server to survive a disk host adapter failure
  4. Stores a single file on multiple drives
  5. Causes an entire volume to be lost when one drive fails

Lesson Review

  1. Which of the following storage services is not provided by RAID?
    a. Data striping
    b. Tape backup
    c. Disk mirroring
    d. Error correction
  2. What services does a cluster of servers provide that Novell's NetWare SFT III does not?
  3. What additional hardware can you install to create redundant paths through the network?
    a. NICs
    b. Hubs
    c. Servers
    d. Routers

Lesson Summary

  • Networks often use data storage techniques such as mirroring, duplexing, spanning, striping, redundant array of independent disks (RAID), network attached storage (NAS), and Storage Area Networks (SANs) to increase the efficiency and fault tolerance of the network storage subsystem.
  • Redundant servers, possibly configured in clusters, enable a network to survive even a major server failure without interrupting user productivity.
  • Creating redundant paths through the network enables communications to continue, even in the event of a cable break or a router failure.


Network+ Certification Training Kit
Year: 2001
Pages: 105
