12.4 Server Clustering | Designing Storage Area Networks: A Practical Reference for Implementing Fibre Channel and IP SANs (2nd Edition)

As enterprise applications have shifted from mainframe and midrange systems to application and file servers, the reliable access to data that the legacy systems provided (and that required decades of engineering to accomplish) has been compromised. To make their products acceptable for enterprise use, server manufacturers have responded with more sophisticated designs that offer dual power supplies, dual LAN interfaces, multiple processors, and other features to enhance performance and availability. The potential failure of an individual component within a server is thus accommodated by using redundancy. Redundancy typically implies hardware features but may also include redundant software components, including applications. Extending this strategy, redundancy can also be provided simply by duplicating the servers themselves, with multiple servers running identical applications. In the case of failure of a hardware or software module within a server, you shift users from the failed server to one or more servers in a server cluster.

The software used to reassign users from one server to another with minimal disruption to applications is necessarily complex. Clustering software written for high-availability implementations can be triggered by the failure of a hardware, protocol, or application component. The recovery process must preserve user network addressing, login information, current status, open applications, open files, and so on. Clustering software may also include the ability to balance the load among active servers. In this way, in addition to providing failover support, a cluster can make fuller use of its servers to increase overall performance.
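
The reassignment step can be illustrated with a minimal sketch. It is not taken from any particular clustering product: the function name `reassign_sessions` and the session-count measure of load are assumptions made for illustration, and a real implementation would also have to carry over the addressing, login, and open-file state described above.

```python
import heapq


def reassign_sessions(sessions_by_server, failed):
    """Move the failed server's sessions to the least-loaded survivors.

    sessions_by_server maps a server name to a list of session IDs.
    Returns a new mapping that excludes the failed server.
    (Hypothetical helper; load is approximated by session count.)
    """
    survivors = {s: list(v) for s, v in sessions_by_server.items() if s != failed}
    if not survivors:
        raise RuntimeError("no surviving servers in the cluster")

    # Min-heap of (session count, server name); the name breaks ties.
    heap = [(len(v), s) for s, v in survivors.items()]
    heapq.heapify(heap)

    for session in sessions_by_server.get(failed, []):
        load, server = heapq.heappop(heap)
        survivors[server].append(session)
        heapq.heappush(heap, (load + 1, server))
    return survivors
```

Calling `reassign_sessions({"a": ["u1", "u2"], "b": ["u3"], "c": []}, "a")` places each orphaned session on whichever surviving server currently holds the fewest sessions, so failover and load balancing are handled in the same pass.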

Small clusters can be deployed with traditional parallel SCSI cabling for shared data, but generally they are limited to two servers. SANs allow server clusters to scale to very large shared data configurations, with more than a hundred servers in a single cluster.

Because the focus of clustering is to facilitate availability, deploying a server cluster on a SAN typically includes redundant paths from multiple servers to data. Software on each server must monitor the health of hardware components and applications and must be able to inform other servers in the cluster if a failure or loss of service occurs. This heartbeat status is usually propagated over a dedicated (and sometimes redundant) LAN interface. If redundant paths to data are provided, each server must also monitor the status of each SAN connection and must redirect storage traffic if a switch segment fails. In addition, the data itself may be secured via local or remote RAID mirroring, which provides a duplicate copy if a primary storage unit fails. This tiered strategy helps to ensure the availability of servers, access to data, and the data itself.
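
As a rough sketch of the heartbeat mechanism, the class below tracks when each peer was last heard on each heartbeat LAN link and presumes a peer failed only when every link has been silent for longer than a timeout. The class name, the link names, and the timeout value are all assumptions for illustration; timestamps are passed in explicitly so the logic is easy to follow, where a real monitor would more likely read a monotonic clock.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer on each LAN link."""

    def __init__(self, peers, links, timeout_s, now):
        self.peers = list(peers)
        self.links = list(links)
        self.timeout_s = timeout_s
        # Start every (peer, link) clock at `now` so that nothing is
        # flagged before a full timeout has elapsed.
        self.last_seen = {(p, l): now for p in self.peers for l in self.links}

    def record(self, peer, link, now):
        """Note that a heartbeat from `peer` arrived on `link` at time `now`."""
        self.last_seen[(peer, link)] = now

    def silent_links(self, peer, now):
        """Links on which `peer` has been quiet for longer than the timeout."""
        return [l for l in self.links
                if now - self.last_seen[(peer, l)] > self.timeout_s]

    def failed_peers(self, now):
        """Peers that are silent on every heartbeat link."""
        return [p for p in self.peers
                if len(self.silent_links(p, now)) == len(self.links)]
```

A peer that is silent on only one of two links shows up in `silent_links` but not in `failed_peers`, which separates a LAN fault from a server fault.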

In the site represented in Figure 12-6, a cluster of eight servers is configured with redundant data paths over two separate SAN segments. Two SAN switches are configured as primary and secondary paths between the clustered servers and RAID disk arrays. For this installation, the status heartbeat can be configured with redundant Ethernet links between servers, so that the failure of a single Ethernet link will not falsely trigger a condition in which each server would attempt to assume services for the others. Because the clustering software determines which components or applications on each server should be covered by failover, subsets of recovery policies can be defined within the eight-server cluster. In this configuration, all servers share a common database application, and subsets of three servers are configured for failover for specific user applications. The sample configuration can also be scaled to accommodate additional servers or storage by adding SAN switch ports via switch-to-switch links (E_Port for Fibre Channel, link aggregation for Gigabit Ethernet) or by installing director-class SAN switches.
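
The idea of recovery-policy subsets can be sketched as a simple lookup. The policy layout, the server names, and the function `plan_failover` are hypothetical and do not reflect the configuration syntax of any specific clustering package; each application simply lists, in order of preference, the servers allowed to host it.

```python
def plan_failover(policies, failed):
    """Pick a host for each application from its own failover subset.

    policies maps an application name to an ordered list of eligible
    servers (most preferred first); failed is the set of servers that
    are currently down. Returns application -> chosen server, or None
    when every server in that application's subset has failed.
    """
    plan = {}
    for app, candidates in policies.items():
        alive = [s for s in candidates if s not in failed]
        plan[app] = alive[0] if alive else None
    return plan


# Assumed layout modeled on the eight-server example: one shared
# database policy plus three-server subsets for user applications.
POLICIES = {
    "database": ["srv%d" % n for n in range(1, 9)],
    "app_a": ["srv1", "srv2", "srv3"],
    "app_b": ["srv4", "srv5", "srv6"],
}
```

With this layout, `plan_failover(POLICIES, {"srv1"})` moves the database and `app_a` to `srv2` and leaves `app_b` on `srv4`, because `srv1` is not a member of the `app_b` subset.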

Figure 12-6. A fully redundant server cluster with dual data paths

For server attachment, dual pathing is not simply a matter of installing multiple Fibre Channel HBAs or iSCSI adapters. The operating system (for example, Microsoft Windows) or the device drivers for the SAN adapter cards must be capable of monitoring link availability on each SAN connection and diverting storage traffic from one path to another in the event of link or adapter card failure. Similarly, failover from one storage array to another is determined by the mirroring or data replication method used. Storage arrays that provide disk-to-disk data replication without server intervention may allow instantaneous failover (for synchronous mirroring) or delayed failover (for asynchronous mirroring).
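
The path-diversion behavior expected of the operating system or adapter driver can be sketched as follows. This is a simplified model under stated assumptions, not the interface of any real multipathing driver: the class `MultipathDevice`, its method names, and the path labels are invented for illustration, and link health is assumed to be reported by some external monitor.

```python
class MultipathDevice:
    """A storage device reachable over several SAN paths in priority order."""

    def __init__(self, paths):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)  # first entry is the primary path
        self.healthy = {p: True for p in self.paths}

    def mark(self, path, up):
        """Record the link state reported for one path."""
        if path not in self.healthy:
            raise KeyError(path)
        self.healthy[path] = up

    def active_path(self):
        """Return the highest-priority path that is currently up."""
        for p in self.paths:
            if self.healthy[p]:
                return p
        raise RuntimeError("all paths to storage are down")


dev = MultipathDevice(["hba0-switchA", "hba1-switchB"])
dev.mark("hba0-switchA", False)   # primary link or adapter fails
secondary = dev.active_path()     # traffic is diverted to the other path
```

Because `active_path` always scans in priority order, marking the primary path up again returns traffic to it. Whether automatic failback of this kind is desirable is a policy decision that the sketch does not address.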