High Availability with HP Serviceguard


The Serviceguard name is used to describe a set of high-availability and disaster-tolerant solutions. These are all based on the Serviceguard HA product but provide different features depending on the goals of the solution. We will provide a very brief overview of why you might need a high-availability solution and then cover the products.

Note

This section only provides an overview of the Serviceguard suite. More details, and some usage examples, will be provided in Chapter 16, "Serviceguard."


High Availability and Disaster Tolerance

In this section we will provide a brief overview of what high availability is, how it differs from disaster tolerance, and why these solutions are as important as they are.

Why Are High Availability and Disaster Tolerance Important?

There are many sources for information about how costly a major service outage can be. The cost of a failure varies based on the industry and the nature of the application. A large number of factors impact the true cost of a failure. These include:

  • tarnished company reputation and customer loyalty

  • lost opportunities and revenue

  • idle or unproductive labor

  • cost of restoration

  • penalties

  • litigation

  • loss of stock value

  • loss of critical data

It is likely that you already know which applications in your environment would cause the company serious financial problems if they failed. The type of high availability or disaster tolerance you implement will often depend on the cost of the failover technology compared to the likely cost of a failure. We will describe the various options you have so you can make an educated decision.

High Availability vs. Disaster Tolerance

High availability typically requires providing redundancy within a datacenter to maintain service when there are hardware or software failures. This can also help minimize the damage done by human errors, which account for about 40% of all application failures. Service can normally be restored in only a few minutes.

Disaster tolerance involves providing redundancy between separate datacenters so that the service can be restored quickly in the event of a major disaster, which might include a fire, a flood, an earthquake, or terrorism. Service can typically be restored in from tens of minutes to a few hours.

There is a third solution, sometimes called disaster recovery. This typically involves sending staff to a separate facility with backup tapes. The disaster recovery facility might have spare equipment or may have systems that are used for lower-priority work, so they can be repurposed in the event of a disaster. Service recovery using this method can take days to weeks, depending on how similar the spare systems are to the original production systems.

Components of High-Availability Technology

Many components are necessary to provide a highly available infrastructure. Some are hardware, some are software, some are architecture, and some are processes. Some examples of the components include:

  • Single-system high-availability components: This involves purchasing systems, storage, and network equipment that have high availability built in through redundant internal components and online addition and replacement of components.

  • Multisystem availability: This includes clustering, load balancing, rapid failover, and recovery. This is what the Serviceguard product provides.

  • High availability through manageability: It is also possible to provide some level of high availability by ensuring that you are notified very quickly when a failure occurs and having manual processes in place that can quickly bring the service back online. This will never be as fast as an automated response, but it is often better than nothing.

  • Disaster tolerance: This generally involves providing data replication between separate sites to ensure that if a disaster were to impact the main production site, there would be an up-to-date copy of the data that could be brought up in a short period of time. This is what the Serviceguard Extended Campus Cluster, Metrocluster, and Continentalcluster products provide.

Now let's take a look at the anatomy of a Serviceguard cluster.

HP High-Availability Solutions

Installing high-availability software on a system is not sufficient to get high availability. It is also important that the hardware be configured to allow for a failure. This involves setting up redundant paths to all your mission-critical data and applications.

Serviceguard Concepts

A number of critical concepts will help you understand the rest of this section. These include:

  • Cluster: A cluster is a set of systems over which you will run your application packages. Multiple systems are configured into what is called a Serviceguard cluster. These systems must have connectivity to shared networks and storage to ensure that the packages can fail over from one system in the cluster to another.

  • Package: A Serviceguard package consists of the application processes and system resources necessary to allow the application to run. This includes the disks containing the application's data and the network IP address with which users access the application. (A sample package configuration is sketched after this list.)

  • Service: In Serviceguard terminology, a service can be one or more of the processes of the application, any service required by the package, and/or monitoring utilities that are watching the application. Serviceguard will monitor any process that is configured as a service and will attempt to restart the process if it exits. If the restart fails the configured number of times, the package will be failed over to another server.

  • Data network: These networks allow users to connect to the application. The cluster will be configured with one or more data networks. It is generally recommended in a high-availability architecture that there be dual network interfaces on each system that use different network infrastructures. This ensures that a single card failure doesn't cause an application failover.

  • Heartbeat network: A special network in a Serviceguard configuration that is used exclusively to allow the nodes in the cluster to monitor each other. This is how Serviceguard determines when a failure occurs that requires a restart of the application packages on another system in the cluster.

  • Relocatable package IP address: These are the addresses that allow users to connect to the application. Each package can have one or more relocatable IP addresses. When a failover occurs, the address of the package is moved to the new system so that users can connect to the application without having to know that it is running on a new system.

  • Shared storage: Any storage used by the application must be available to each system in the cluster that you want to run the application on. When a failover occurs, the storage is disconnected from the primary system and then connected to the failover system before the application is started on the new node.

  • Cluster lock disk/cluster lock logical unit number (LUN): One serious concern in a clustered environment is ensuring that the multiple systems that have connectivity to any shared disk don't all think they own the data. If this were to occur, it would be possible for the data to become corrupted. The risk in a Serviceguard environment is a false failover: two systems could lose network connectivity to each other while retaining storage connectivity. In this case, if Serviceguard started the application on the failover node while the original node was still writing to the disk storage, data corruption might occur. Serviceguard has a number of functions that were designed to ensure this doesn't happen. The cluster lock is one of these functions. It will be described in more detail later in this section.

  • Quorum server: As an alternative to the cluster lock disk or cluster lock LUN, it is also possible to run a quorum server to accomplish the same function. A single quorum server can support up to 50 different clusters and up to 100 different servers. The cluster lock disk is only supported on clusters of two to four nodes. Because the disk/LUN solution doesn't require a separate server, it is often used for two-node failover clusters. It is impractical to provide shared connectivity to a cluster lock disk/LUN from a larger cluster, and performance is the primary reason a quorum server is required on clusters of more than four nodes.
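
To make these concepts concrete, the following is a minimal sketch of what a legacy-style package configuration and control script might look like. All names, devices, and addresses are hypothetical, and the exact set of parameters varies by Serviceguard release; in practice the templates are generated with cmmakepkg and then edited.

    # Package configuration file (e.g., /etc/cmcluster/ordersdb/ordersdb.conf)
    PACKAGE_NAME      ordersdb
    NODE_NAME         node1                                 # primary node
    NODE_NAME         node2                                 # adoptive (failover) node
    RUN_SCRIPT        /etc/cmcluster/ordersdb/ordersdb.cntl
    HALT_SCRIPT       /etc/cmcluster/ordersdb/ordersdb.cntl
    SERVICE_NAME      ordersdb_mon                          # process Serviceguard monitors
    SUBNET            192.168.10.0                          # monitored data subnet

    # Excerpt from the package control script (ordersdb.cntl)
    VG[0]="vgorders"                                        # shared volume group to activate
    LV[0]="/dev/vgorders/lvol1"; FS[0]="/orders"            # file system to mount
    IP[0]="192.168.10.50"; SUBNET[0]="192.168.10.0"         # relocatable package IP
    SERVICE_NAME[0]="ordersdb_mon"
    SERVICE_CMD[0]="/opt/orders/bin/monitor_db"             # restarted if it exits
    SERVICE_RESTART[0]="-r 2"                               # restart attempts before failing over

In practice the package file is validated with cmcheckconf and distributed to the cluster with cmapplyconf.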

Hardware Architecture

High availability generally involves redundancy. The more redundant components you have, the higher the availability you will be able to achieve. Clearly, there is a point of diminishing returns. Figure 13-16 shows the architecture of a reasonable middle ground.

Figure 13-16. An Example of the Architecture of High-Availability Hardware


In this example, each system has dual network connections for data and a separate heartbeat LAN that also provides connectivity to the quorum server. In addition, each system has dual Fibre Channel cards, each of which is connected to a separate Fibre Channel switch, each of which is then connected to one or more storage devices. The key to this design is that no single failure will cause a failover. It would take the failure of two matched components on the same system before a failover would occur. Even then, since all the nodes in the cluster are connected to the same storage and networks, any of the other nodes can take over for the failed system.

Software Architecture

Now let's talk about the Serviceguard product. The key features of the Serviceguard cluster product include:

  • Each Serviceguard cluster can have up to 16 nodes in it.

  • Serviceguard is designed for use when all nodes are in a single datacenter.

  • Serviceguard provides for automatic failover of up to 150 application packages, 900 services, and 200 relocatable package IP addresses per cluster.

  • Disks can be connected to the systems in the cluster using either SCSI or Fibre Channel.

  • The cluster requires a single IP subnet for heartbeat networks, which can be Ethernet, fiber distributed data interface (FDDI), or Token Ring.

  • Serviceguard uses LVM or VxVM to manage disk volumes.

  • A cluster lock disk is required for clusters with two nodes, but it is optional for clusters with three or four nodes.

  • Alternatively, you can use a quorum server in place of a cluster lock disk to support clusters with up to 16 nodes.

We will discuss some of these features in more detail later when we describe the hardware and software architectures of a Serviceguard cluster.
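
To illustrate how several of these features appear in practice, here is a hedged sketch of the cluster configuration ASCII file that cmquerycl generates and that cmapplyconf distributes to the nodes. All names, devices, and values are hypothetical, and available parameters vary by release.

    # Created with something like: cmquerycl -v -C cluster.conf -n node1 -n node2
    CLUSTER_NAME             prod_cluster
    FIRST_CLUSTER_LOCK_VG    /dev/vglock        # cluster lock disk (2- to 4-node clusters)

    NODE_NAME                node1
      NETWORK_INTERFACE      lan0
      HEARTBEAT_IP           10.10.1.1          # dedicated heartbeat subnet
      FIRST_CLUSTER_LOCK_PV  /dev/dsk/c4t0d0

    NODE_NAME                node2
      NETWORK_INTERFACE      lan0
      HEARTBEAT_IP           10.10.1.2
      FIRST_CLUSTER_LOCK_PV  /dev/dsk/c4t0d0

    HEARTBEAT_INTERVAL       1000000            # microseconds
    NODE_TIMEOUT             2000000
    MAX_CONFIGURED_PACKAGES  10

    # Validate and distribute the binary cluster configuration:
    #   cmcheckconf -C cluster.conf
    #   cmapplyconf -C cluster.conf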

One other thing to consider when designing your cluster is how you want the applications to behave after they have failed over, particularly whether you want to have idle hardware that is able to accept the workload without any performance impact. There are three primary cluster types:

  • Active-Standby: This is where you have a hot standby node that is not running any packages. This model usually involves a matched pair of servers so that you can ensure that the failover node is ready to accept a failover at any time with no performance impact on the production workload. This doubles the hardware cost of the solution.

  • Active-Active: This is where each of the nodes in the cluster has one or more packages running on it at all times. When a failover occurs, the package startup scripts on the failover node can shut down the other packages running there (if they are test or development workloads, for example). Optionally, additional resources could be made available by using some of the other VSE solutions and a workload management tool.

  • Rolling Standby: This is where you have a multinode cluster in an n+1 configuration such that there is always one node in the cluster that is not running any packages. When a failover occurs, the standby node becomes the primary node for whatever workload fails over. When the failed node comes back up, it becomes the new standby node. This provides much of the benefit of an active-standby configuration but at a more reasonable cost.
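
Which node a package prefers after a failure, and whether it moves back when the original node returns, is controlled per package. The following excerpt is illustrative only; exact parameter values vary by release.

    # Package configuration excerpt -- hypothetical package
    FAILOVER_POLICY    CONFIGURED_NODE    # fail over to the next node in the NODE_NAME list
    # FAILOVER_POLICY  MIN_PACKAGE_NODE   # fail over to the node running the fewest packages
                                          # (useful in rolling-standby style clusters)
    FAILBACK_POLICY    MANUAL             # stay on the adoptive node until moved by hand
    # FAILBACK_POLICY  AUTOMATIC          # return to the primary node when it rejoins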

Protecting Against Split Brain

One serious concern in a clustered environment is ensuring that the applications that can run on multiple systems don't start up on more than one system at a time. This is a phenomenon called "split brain" in which a cluster breaks up into multiple smaller clusters, each of which thinks the rest of the cluster has failed. Figure 13-17 shows how this can occur.

Figure 13-17. How a Split-Brain Cluster Could Corrupt Data


This picture shows that the network connectivity to two of the nodes in the cluster is lost. However, there is no loss of connectivity to the storage. Therefore, node C thinks that node A has failed and starts the failover process. This includes connecting to the disks for A and starting the package. If this were to occur, then both of these nodes would be running the application and both would be connected to the same storage. This could cause data corruption. The Serviceguard product goes to great lengths to ensure that this can never happen.

The first thing Serviceguard does when it detects a failure is attempt to reform the cluster. This must happen before any packages are failed over. The reformation will fail unless the new cluster contains at least half of the systems that were in the original cluster. Since no two portions of the cluster can both hold more than half of the original nodes, Serviceguard can never reform more than one cluster. This resolves the split-brain problem for all but one special case: a cluster that splits into two equal halves.

Serviceguard uses several mechanisms as tie-breakers for this situation. The first is a cluster lock disk on HP-UX or a cluster lock LUN on Linux; these are used for smaller clusters. For larger clusters, a quorum server is used instead. In all cases the disk, LUN, or quorum server is connected to all of the nodes in the cluster. If a reforming cluster has exactly half of the nodes from the original cluster, it will attempt to acquire the cluster lock. If it succeeds, it completes the reformation and starts any packages that were running on the nodes that are no longer in the cluster. If it is unable to get the lock, it shuts down its packages and disconnects them from the shared storage. This ensures that no two nodes ever have the same shared storage in use at the same time.
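
In the cluster configuration file, the tie-breaker is specified either as a cluster lock volume group (with a lock physical volume under each node entry, as in the earlier sketch) or as a quorum server. A hedged sketch of the quorum server alternative, with a hypothetical hostname:

    # Quorum server in place of the FIRST_CLUSTER_LOCK_VG/FIRST_CLUSTER_LOCK_PV entries
    QS_HOST                  qshost.example.com    # system running the quorum server daemon
    QS_POLLING_INTERVAL      300000000             # microseconds between status checks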

Serviceguard Manager

Serviceguard also has a management utility called Serviceguard Manager. This is a Java graphical user interface for managing your Serviceguard clusters. Figure 13-18 shows some screenshots of Serviceguard Manager.

Figure 13-18. Screenshots of Serviceguard Manager


In this figure you can see that the left-hand pane shows a list of clusters and the right-hand pane provides status details of whichever cluster you have selected on the left. The details pane shows the cluster, the nodes in the cluster, and the packages running on each node. You can also see the status of each node and each package. There are mini-icons next to packages whenever something of interest is happening with that package, such as package shutdown or startup or the lack of an available active failover node for the package.

From the GUI you can perform virtually any operation you might want on a cluster; the equivalent command-line operations are sketched after this list. Some examples include:

  • start or halt a cluster

  • start or halt a node in a cluster

  • start or halt a package

  • move a package from one node in the cluster to another

  • edit a package or cluster configuration
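
A few representative commands, using a hypothetical cluster and the ordersdb package from the earlier sketch:

    cmviewcl -v                    # show cluster, node, and package status
    cmruncl                        # start the cluster on all configured nodes
    cmhaltcl -f                    # halt the entire cluster
    cmrunnode node2                # start cluster services on one node
    cmhaltnode -f node2            # halt a node, failing its packages over
    cmrunpkg -n node1 ordersdb     # start a package on a specific node
    cmhaltpkg ordersdb             # halt a package
    cmmodpkg -e ordersdb           # re-enable package switching after a manual move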

Serviceguard Manager works by connecting to a daemon called the object manager that is configured to monitor one or more clusters. You connect to the object manager by logging in from Serviceguard Manager. You can control who is allowed to log in to the object manager by editing the /etc/cmclnodelist file on the node running the object manager.

As you saw earlier in this chapter, this GUI has been integrated with the new VSE management suite. In the first release of the Virtualization Manager, the integration provides a context-sensitive launch of the current Serviceguard Manager product. This integration will become tighter in future releases.

HP Disaster-Tolerant Solutions

High-availability clusters are intended to provide nearly immediate recovery from a single point of failure. This is achieved through redundant hardware and Serviceguard software that recover from component or node failures, and such clusters are typically implemented in a single datacenter. For truly mission-critical applications, where any sustained outage poses a significant risk to the business, it is important to guard against multiple points of failure as well.

Disaster-tolerant clusters are capable of restoring service even after multiple failures or massive single failures in the primary datacenter. These solutions replicate the application data and provide the ability to move the application to an entirely different datacenter in a different part of the building, a different part of the city, or another city. The distance between the datacenters is dependent on the types of disasters you are trying to guard against and the technology used to replicate the data between the datacenters.

We will discuss three types of disaster-tolerant clusters in this section:

  • Extended Distance Cluster: These clusters can span from a mile or two up to 100 kilometers, depending on the technology used to connect the datacenters for data replication.

  • Metrocluster: A Metrocluster is similar to an Extended Distance Cluster, but it includes a fault-tolerant disk array and a third site as a cluster arbitrator.

  • Continentalcluster: A Continentalcluster involves two separate clusters in geographically dispersed locations that can be unlimited distances apart.

Extended Distance Cluster

An Extended Distance Cluster, sometimes called an Extended Campus Cluster, runs a cluster across multiple datacenters with high-speed networking between them. The distance between the datacenters is dependent on the technology used for data replication. Table 13-2 lists the technologies and the distances they support.

Table 13-2. Supported Distance Between Datacenters Is Dependent on the Networking Technology

Type of Link                                  Maximum Distance Supported
Fast/Wide SCSI                                25 meters
Gigabit Ethernet Twisted Pair                 50 meters
Short Wave Fibre Channel                      500 meters
Long Wave Fibre Channel                       10 kilometers
FDDI Networking                               50 kilometers
Finisar Gigabit Interface Converters          80 kilometers
Dense Wave Division Multiplexing (DWDM)       100 kilometers


It is possible to set up a two-datacenter solution or a three-datacenter solution. The key difference between these is that the two-datacenter solution is implemented with dual cluster lock disks and the three-datacenter solution uses a quorum server in the third datacenter.

Two-Datacenter Extended Distance Cluster

Because the two-datacenter solution requires the use of dual cluster lock disks, the size of the cluster is limited to a maximum of four nodes. In addition, the cluster must be split evenly; you can have a two-node cluster or a four-node cluster. Figure 13-19 shows the layout of a two-site cluster.

Figure 13-19. A Two-Site Extended Distance Serviceguard Cluster


You must have at least two network paths between the datacenters, and three different paths are recommended, to ensure continuous access to the two cluster lock disks should there be a network failure. Application data is mirrored between the two primary datacenters, and you must ensure that the mirrored copies are kept in different datacenters.

Three-Datacenter Extended Distance Cluster

The three-datacenter solution is architecturally very similar to the two-datacenter solution. The third site serves as a tie-breaker in case connectivity is lost between the other two sites. Figure 13-20 shows the layout of a three-site cluster.

Figure 13-20. A Three-Site Extended Distance Serviceguard Cluster


The third site can either run two nodes that are part of the cluster, called arbitrator nodes, or a quorum server. The arbitrator nodes are cluster members but cannot share the disks in either of the primary datacenters; they either run no packages at all or run packages that fail over locally but not to either of the other datacenters. Alternatively, the third site can run a quorum server, which brings several advantages: more usable nodes and lower overhead. Because two of the cluster's members are dedicated to arbitration, the maximum number of nodes available for the protected workload is 14, or seven in each primary datacenter. In addition, the overhead of the quorum server is quite low, so it can be run on a small Linux server or on a server that is running other workloads.

Metrocluster

The main difference between an Extended Distance Cluster and a Metrocluster is that the data replication between the two primary sites is handled by an EMC Symmetrix, HP XP, or EVA disk array. Figure 13-21 provides an example architecture for a Metrocluster.

Figure 13-21. An Example Three-Site Serviceguard Metrocluster


Notice in this figure that the CA/SRDF link is the primary difference between this architecture and the Extended Distance Cluster.

There are two Metrocluster products, Metrocluster/CA for HP disk arrays and Metrocluster/SRDF for EMC. The "CA" in "Metrocluster/CA" stands for "continuous access," which is a data replication product available with the XP and EVA disk arrays from HP. The "SRDF" in "Metrocluster/SRDF" stands for "Symmetrix remote data facility," which is the data-replication product available from EMC for the EMC Symmetrix disk arrays.

The Metrocluster/CA product is a set of scripts and utilities that simplifies the integration of the CA functionality with the Extended Distance Cluster. The Metrocluster/SRDF product is a similar set of utilities designed to assist in integrating the EMC SRDF product.

For more information on how to set up either of these products, see the "Designing Disaster Tolerant High Availability Clusters" document available on http://docs.hp.com.

Continentalclusters

Datacenters that are more than 100 kilometers apart require a very different architecture. The primary difference between an Extended Distance Cluster and a Continentalcluster is that with the Extended Distance Cluster you have a single cluster with nodes in multiple datacenters, whereas with the Continentalcluster there are two separate clusters. The approach is to allow one cluster to take over operation of the critical packages from the other cluster in the event of a disaster that takes down an entire cluster. The Continentalclusters product is a set of utilities that monitor geographically remote clusters and a command to start critical packages on the recovery cluster if the primary cluster is lost.

An example of the architecture of Continentalclusters is shown in Figure 13-22.

Figure 13-22. An Example Serviceguard Continental Cluster


As you can see in Figure 13-22, there are two distinct clusters running in separate datacenters connected by a wide-area network (WAN). Although the figure shows an active-standby configuration, it is not necessary for the recovery cluster to be idle. A supported configuration is one in which both clusters run packages under normal circumstances and each cluster monitors the health of the other. Continentalclusters uses physical or logical data replication with disk arrays, just as Metrocluster does.

Because of the higher likelihood that a spurious network error will cause a false alarm, Continentalclusters does not automate the failover. However, the product does provide a single command to initiate the failover manually. The process of failover would include:

  1. Continentalclusters detects a failure of the remote cluster: Each cluster monitors the health of the other cluster and generates alarms when a failure occurs.

  2. Continentalclusters sends a notification of the failure. This can be done through a text log, e-mail, an SNMP trap, or an OPCmsg (for OpenView Operations).

  3. You verify that the cluster has failed. It is critical that you ensure that the applications are not running on both clusters at the same time. This would involve contacting the WAN service provider, speaking with operations staff at the primary site, and discussing options with application owners.

  4. You issue the recovery command. The cmrecovercl command will halt the data replication between the clusters and start all the recovery packages on the local cluster.
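
A hedged sketch of what steps 3 and 4 might look like from the recovery cluster. Apart from cmrecovercl, which is described above, the status command shown is an assumption based on the Continentalclusters documentation, and the verification checks are necessarily manual.

    # On the recovery cluster, review the monitored state of the primary cluster
    cmviewconcl                    # assumed Continentalclusters status command

    # Only after confirming out-of-band (WAN provider, primary-site staff) that
    # the primary cluster is really down:
    cmrecovercl                    # halts replication and starts the recovery packages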

One thing to consider with Continentalclusters is that because the two clusters are physically distinct and managed separately, you will need to have administration processes in place to ensure that the versions of the applications on the primary and recovery clusters are kept in sync so that when a recovery is required, there aren't any surprises. This is true of all high-availability clusters, but it may be more of a challenge in this case because of the distance between the datacenters supported by Continentalclusters.


