Providing Redundancy in Server Hardware


Redundancy is a principle of fault tolerance that has many applications for servers and networks. As you have learned in this chapter, you can use redundant network adapters, routers, and hard disks in a RAID array to provide automatic failover and recovery in the event of primary device failure.

However, other parts of a server should also be protected with redundancy whenever possible. Depending on the component, it may not always be possible to arrange for automatic failover, but you should provide for as much redundancy in installed equipment as possible, and you should have spare hardware readily at hand for other types of server components.

Major component areas to consider for built-in redundancy or, at minimum, redundancy through in-stock spare parts include the following:

  • Power supplies

  • Fans

  • Memory

  • Processors

The following sections provide details.

Note

Standard servers provide integrated redundancy for only select onboard components. However, fault-tolerant servers that incorporate replicated hardware that processes instructions at the same time as the primary hardware are available from Stratus Technologies (www.stratus.com), NEC Solutions (America), Inc. (www.necsam.com), and other vendors. Essentially, fault-tolerant servers provide two complete servers that operate in parallel in a single chassis. If the primary hardware fails in a fault-tolerant server, the replicated hardware takes over automatically.


Power Supplies

A power supply is one of the most vulnerable pieces of hardware in a server or other network device. Because power supplies are always carrying a load, they are usually among the first major components to fail. If you have no redundancy in place for this device, that failure will bring down everything drawing power from it.

Because a power supply failure is so severe, you can usually order major network equipment and almost all servers with dual (redundant) power supplies. It's a very good idea to spend the extra money on redundant power supplies for any major network device.
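The reasoning behind dual power supplies can be sketched as a simple capacity check: the supplies that remain after one fails must still be able to carry the full load. The following is a minimal illustration in Python, not a vendor sizing tool. The function name and the wattage figures are hypothetical, and real sizing should follow the manufacturer's specifications.

```python
def survives_single_failure(load_watts, supply_ratings_watts):
    """Return True if the load is still covered after any one supply fails.

    Both arguments are hypothetical illustration values: the total load in
    watts and a list with the rated output of each installed supply.
    """
    if len(supply_ratings_watts) < 2:
        # A single supply offers no redundancy at all.
        return False
    # Worst case: the supply with the largest rating is the one that fails.
    remaining = sum(supply_ratings_watts) - max(supply_ratings_watts)
    return remaining >= load_watts


# Hypothetical server drawing 450 watts.
print(survives_single_failure(450, [500, 500]))  # two supplies, either can carry the load
print(survives_single_failure(450, [500]))       # one supply, no redundancy
```

The check assumes the worst case, in which the largest supply is the one that fails, so a configuration that passes can lose any single unit.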

Note

If your server does not include an option for a redundant power supply (RPS) but you can use off-the-shelf power supplies in your server chassis, you can purchase an RPS from any one of a number of vendors as a retrofit.


For more information about RPSs, see "Redundant Power Supplies (RPSs)," p. 675.


Power supplies are relatively inexpensive, especially compared to the loss of productivity that results when a power supply failure takes a server offline, so you should consider keeping at least one spare power supply unit for your server on hand. This is especially important if your servers are rack mounted, because 3U, 2U, and 1U rack-mounted servers use power supplies customized to each form factor.

Fans

Keeping systems cool is a crucial step in keeping a network up and running. Electronics of all types generate heat, and all of them run much better the cooler they are. You can look at almost any electronic component and see at least one fan, and most computer equipment, including network devices such as routers and switches, has at least two fans.

Servers usually have four or more fans, and having a redundant fan in place is a very good idea. Each power supply usually contains one or two fans, and as you just read, you should have two power supplies. You will also have at least two system fans, and you should definitely have one fan for each processor in the system. Your video card might have a fan on it, and some systems even have fans on the system boards. The fans for which you should consider redundancy are the system fans: some servers support online redundant system fans that switch on automatically if another fan fails. You should also consider keeping a spare active heatsink (processor fan) onsite because most servers will not boot if the active heatsink's fan fails.

Note that whereas tower chassis can often use standard 80mm to 120mm fans, fans in 1U, 2U, and 3U chassis are usually special models that are not sold at normal computer parts stores. Similarly, active heatsink specifications for some server processors (such as the Intel Xeon built on the 90nm manufacturing process) may be substantially different from those used by desktop processors. You should keep a spare or two of each handy to avoid unnecessary downtime.

To determine whether an onboard fan is failing, you should check the system management software installed on your server or the BIOS System Monitor.
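The kind of check that system management software performs can be sketched as a comparison of each fan's reported speed against a minimum threshold. The following Python example is only an illustration under stated assumptions: the fan names, RPM readings, and thresholds are hypothetical, and the code does not read real sensors. Actual values come from your server's management software or BIOS.

```python
def find_failing_fans(readings_rpm, minimum_rpm):
    """Return the sorted names of fans that are missing or below their minimum.

    readings_rpm: dict of fan name -> reported speed (hypothetical data).
    minimum_rpm:  dict of fan name -> lowest acceptable speed (assumed values).
    """
    failing = []
    for name, minimum in minimum_rpm.items():
        rpm = readings_rpm.get(name)
        # A fan with no reading at all is treated as failed.
        if rpm is None or rpm < minimum:
            failing.append(name)
    return sorted(failing)


# Hypothetical readings for a server with two processor fans and two system fans.
readings = {"cpu0": 4200, "cpu1": 0, "system1": 2900, "system2": 1100}
minimums = {"cpu0": 2000, "cpu1": 2000, "system1": 1500, "system2": 1500}
print(find_failing_fans(readings, minimums))  # ['cpu1', 'system2']
```

Treating a missing reading as a failure is a deliberate choice in this sketch: a fan that has stopped reporting deserves the same attention as one that is spinning too slowly.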

Memory

As with most server hardware, you should also consider redundant memory. Although this is the least-used form of redundancy discussed in this chapter, it is yet another way of preventing downtime. Some servers come with a redundant memory bank, a feature known as memory sparing, but you must enable it in the system BIOS before it will work. In addition to memory sparing, some systems support hot-plug RAID memory, which creates a memory array similar to a RAID 5 disk array.

To learn more about hot-plug RAID memory and other methods used by servers to improve memory reliability, see "Advanced Error Correction Technologies," p. 391.
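The comparison between RAID memory and a RAID 5 disk array rests on parity: an extra block is computed from the data blocks so that any single lost block can be rebuilt from the survivors. The following Python sketch illustrates XOR parity, the mechanism RAID 5 uses. It is an illustration of the idea only. How a particular server's memory controller implements it in hardware is not described here, and the function names and sample bytes are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus the parity block."""
    return xor_parity(surviving_blocks + [parity])


# Hypothetical data blocks, standing in for the contents of three memory modules.
data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]
parity = xor_parity(data)

# Simulate losing the second block and rebuilding it from the others.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])  # True
```

Because XOR is its own inverse, combining the parity block with the surviving blocks cancels them out and leaves exactly the missing block.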





Upgrading and Repairing Servers
ISBN: 078972815X
EAN: 2147483647
Year: 2006
Pages: 240
