High availability (HA) describes systems that run and are available to customers nearly all of the time.
Failover protection can be achieved by keeping a copy of your database on another machine that is perpetually rolling the log files forward. Log shipping is the process of copying whole log files to a standby machine, either from an archive device or through a user exit program running against the primary database. With this approach, the primary database is restored to the standby machine, using either the DB2 restore utility or the split mirror function. You can use the new suspended I/O support to initialize the new database quickly. The secondary database on the standby machine continuously rolls the log files forward.
If the primary database fails, any remaining log files are copied over to the standby machine. After a rollforward to the end of the logs and stop operation, all clients are reconnected to the secondary database on the standby machine.
Failover strategies are usually based on clusters of systems. A cluster is a group of connected systems that work together as a single system. Clustering allows servers to back each other up when failures occur by picking up the workload of the failed server.
IP address takeover (or IP takeover) is the ability to transfer a server IP address from one machine to another when a server goes down; to a client application, the two machines appear at different times to be the same server.
Failover software may use heartbeat monitoring or keepalive packets between systems to confirm availability. Heartbeat monitoring involves system services that maintain constant communication between all the servers in a cluster. If a heartbeat is not detected, failover to a backup system starts. End users are usually not aware that a system has failed.
For clarification and consistency with the naming convention throughout the book, a database node is now called a database partition, and when referencing a node name in the cluster, we refer to it as a server.
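To make the heartbeat idea concrete, the following is a minimal shell sketch of a missed-heartbeat counter. It is not how any particular cluster product implements monitoring; the function name monitor_peer, the check command, the miss threshold, and the polling interval are all assumptions chosen for illustration.

```shell
#!/bin/sh
# monitor_peer CHECK_CMD MAX_MISSES INTERVAL
# Runs CHECK_CMD repeatedly and returns once MAX_MISSES consecutive checks fail.
# All three arguments are illustrative assumptions, not product settings.
monitor_peer() {
    hb_check=$1
    hb_max=$2
    hb_interval=$3
    hb_misses=0
    while [ "$hb_misses" -lt "$hb_max" ]; do
        # $hb_check is left unquoted so that it may carry arguments.
        if $hb_check >/dev/null 2>&1; then
            hb_misses=0                      # heartbeat seen: reset the counter
        else
            hb_misses=$((hb_misses + 1))     # heartbeat missed
        fi
        if [ "$hb_misses" -lt "$hb_max" ]; then
            sleep "$hb_interval"
        fi
    done
    echo "no heartbeat after $hb_misses consecutive checks; failover would start here"
}
```

For example, monitor_peer "ping -c 1 srvr201" 3 5 would report a failure after three consecutive unanswered pings, checked five seconds apart. Requiring several consecutive misses rather than one is a common way to avoid failing over on a single dropped packet.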
The two most common failover strategies on the market are known as idle standby and mutual takeover, although the configurations associated with these terms may also be associated with different terms that depend on the vendor.
In an idle standby configuration, one system is used to run a DB2 instance, and the second system is idle, or in standby mode, ready to take over the instance if there is an operating system or hardware failure involving the first system. Overall system performance is not impacted, because the standby system is idle until needed.
In a mutual takeover configuration, each system is the designated backup for another system. Overall system performance may be impacted, because the backup system must do extra work following a failover: It must do its own work plus the work that was being done by the failed system.
Failover strategies can be used to fail over an instance, a single database partition, or multiple database partitions.
When designing and testing a cluster:
In order to implement a split mirror scenario with DB2 Universal Database (UDB) Enterprise Server Edition, it is very important to understand the following three concepts.
Split mirror is an identical and independent copy of disk volumes that can be attached to a different system and can be used in various ways, e.g., to populate a test system, as a warm standby copy of the database, and to offload backups from the primary machine.
A split mirror of a database includes the entire contents of the database directory, all the table space containers, the local database directory, and the active log directory, if it does not reside on the database directory. The active log directory needs to be split only for creating a clone database using the "snapshot" option of the "db2inidb" tool.
Suspend I/O Feature
When splitting the mirror, it is important to ensure that there is no page write occurring on the source database. One way to ensure this is to bring the database offline. But, due to the required downtime, this method is not a feasible solution in a true 24x7 production environment.
In an effort to provide continuous system availability during the split mirror process, DB2 UDB Enterprise Server Edition (ESE) provides a feature known as suspend I/O, which allows online split mirroring without shutting down the database. The suspend I/O feature ensures the prevention of any partial page write by suspending all write operations on the source database. While the database is in write suspend mode, all of the table space states change to a new state SUSPEND_WRITE, and all operations function normally.
However, some transactions may wait if they require disk I/O, such as flushing dirty pages from the buffer pool or flushing logs from the log buffer. These transactions will proceed normally, once the write operations on the database are resumed. The following command is used to suspend or resume write operations on the source database:
db2 set write <suspend | resume> for database
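The order of operations around a split can be sketched as a small shell function: connect, suspend writes, split the mirror, then resume writes whether or not the split succeeded. This is only a sketch under stated assumptions. DB_NAME, SPLIT_CMD, the DRY_RUN switch, and the function names are hypothetical, and the real split command depends entirely on the storage subsystem in use.

```shell
#!/bin/sh
# Sketch: suspend writes, split the mirror, then always resume writes.
# DB_NAME and SPLIT_CMD are hypothetical placeholders; the real split
# command is specific to the storage vendor.
DB_NAME="${DB_NAME:-sample}"
SPLIT_CMD="${SPLIT_CMD:-echo split-mirror-placeholder}"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

split_mirror_online() {
    run db2 connect to "$DB_NAME" || return 1
    if ! run db2 set write suspend for database; then
        run db2 terminate
        return 1
    fi
    # $SPLIT_CMD is intentionally unquoted so that it can carry arguments.
    run $SPLIT_CMD
    split_rc=$?
    # Resume writes regardless of whether the split succeeded.
    run db2 set write resume for database
    resume_rc=$?
    run db2 terminate
    [ "$split_rc" -eq 0 ] && [ "$resume_rc" -eq 0 ]
}
```

Running the function with DRY_RUN=1 prints the five commands without touching the database, which is a convenient way to review the sequence. The point of the structure is that a failed split never leaves the source database in write-suspend mode.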
The db2inidb Tool
The split mirror created using the suspend I/O feature continues to stay in a write-suspend mode until it is initialized to a useable state. To initialize the split mirror, you can invoke the db2inidb tool.
This tool can either perform a crash recovery on a split mirror image or can put it in a rollforward pending state, depending on the options provided in the db2inidb command, the syntax of which is as follows:
db2inidb <database_alias> as <snapshot | standby | mirror> [relocate using <config_file>]
The snapshot option clones the primary database to offload work from the source database, such as running reports, analysis, or populating a target system.
The standby option places the database in a rollforward pending state so that it can continue rolling forward through the logs, including new logs that are created by the source database and fetched from the source system.
The mirror option uses the mirrored system as a backup image to restore over the source system.
The relocate option allows the split mirror to be relocated in terms of the database name, database directory path, container path, log path, and the instance name associated with the database.
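As a small illustration of the syntax, the sketch below assembles a db2inidb command line and rejects any mode other than the three described above. The function name build_db2inidb_cmd is hypothetical, and the function only prints the command; it does not run it.

```shell
#!/bin/sh
# build_db2inidb_cmd ALIAS MODE [RELOCATE_FILE]
# Prints a db2inidb command line; MODE must be snapshot, standby, or mirror.
build_db2inidb_cmd() {
    ini_alias=$1
    ini_mode=$2
    ini_relocate=${3:-}
    case "$ini_mode" in
        snapshot|standby|mirror) ;;
        *) echo "unknown mode: $ini_mode" >&2; return 1 ;;
    esac
    if [ -n "$ini_relocate" ]; then
        echo "db2inidb $ini_alias as $ini_mode relocate using $ini_relocate"
    else
        echo "db2inidb $ini_alias as $ini_mode"
    fi
}
```

For example, build_db2inidb_cmd sample standby prints db2inidb sample as standby. The database alias sample is a placeholder.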
Common Usage of Suspend I/O and db2inidb
The combination of the suspend I/O feature and the db2inidb tool is necessary to bring the split mirror database into a functional state. With the functionalities of the three options (snapshot, standby, mirror) provided in the db2inidb tool, in conjunction with the suspend I/O feature, it is possible to create a fast snapshot of a database, which can be used to:
The suspend I/O feature is necessary to ensure that all DB2 data gets written out to the disk consistently (no partial page write) before splitting the mirror. This assures a well-defined state where the database can be recovered to later, using the db2inidb tool.
The db2inidb tool can either force the database to perform a crash recovery (when the snapshot option is specified) or put the database into a rollforward pending state (when the standby or mirror option is specified) to allow processing of additional log files.
High Availability through Log Shipping
Log shipping is the process of copying whole log files to a standby machine, either from an archive device or through a user exit program running against the primary database. The standby database is continuously rolling forward through the log files produced by the production machine. When the production machine fails, a switch over occurs, and the following takes place:
The standby machine has its own resources (i.e., disks) but must have the same physical and logical definitions as the production database. When using this approach, the primary database is restored to the standby machine by using the restore utility or the split mirror function.
To ensure that you are able to recover your database in a disaster recovery situation, consider the following:
High Availability through Online Split Mirror and Suspended I/O Support
Suspended I/O supports continuous system availability by providing a full implementation for online split mirror handling; that is, splitting a mirror without shutting down the database. A split mirror is an "instantaneous" copy of the database that can be made by mirroring the disks containing the data and splitting the mirror when a copy is required. Disk mirroring is the process of writing all of your data to two separate hard disks; one is the mirror of the other. Splitting a mirror is the process of separating the primary and secondary copies of the database.
If you would rather not back up a large database using the DB2 backup utility, you can make copies from a mirrored image by using suspended I/O and the split mirror function. This approach also:
The db2inidb command initializes the split mirror so that it can be used:
In a partitioned database environment, you do not have to suspend I/O writes on all partitions simultaneously. You can suspend a subset of one or more partitions to create split mirrors for performing offline backups. If the catalog partition is included in the subset, it must be the last partition to be suspended.
In a partitioned database environment, the db2inidb command must be run on every partition before the split image from any of the partitions can be used. The tool can be run on all partitions simultaneously, using the db2_all command.
Ensure that the split mirror contains all containers and directories that comprise the database, including the volume directory (each autonumbered directory within a volume).
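The ordering rule for the catalog partition can be captured in a few lines of shell. The sketch below only computes the order in which a chosen subset of partitions should be suspended; the function name suspend_order and its argument layout are assumptions made for illustration.

```shell
#!/bin/sh
# suspend_order CATALOG_PARTITION PARTITION...
# Prints the partitions in the order they should be suspended:
# the catalog partition, if it is in the list, always comes last.
suspend_order() {
    so_catalog=$1
    shift
    so_has_catalog=0
    for so_p in "$@"; do
        if [ "$so_p" = "$so_catalog" ]; then
            so_has_catalog=1
        else
            echo "$so_p"
        fi
    done
    if [ "$so_has_catalog" -eq 1 ]; then
        echo "$so_catalog"
    fi
    return 0
}
```

With catalog partition 0, suspend_order 0 0 1 2 prints 1, 2, and then 0, one per line. Each printed partition number would then be passed to whatever procedure you use to suspend writes on that partition.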
Split Mirror to Clone a Database
Clone the primary database to offload work from the source database, such as running reports, analysis, or populating a target system.
The following scenario shows how to create a clone database on the target system, using the suspend I/O feature. In this scenario, the split mirror database goes through a crash recovery initiated by the db2inidb tool with the snapshot parameter. A clone database generated in this manner can be used to populate a test database or to generate reports. Due to crash recovery, the clone database will start a new log chain; therefore, it will not be able to replay any future log files from the source database. A database backup taken from this clone database can be restored to the source database. However, it will not be able to roll forward through any log records generated after the database was split. Thus, it will be a version-level copy only.
Split Mirror as a Standby Database
Continue to roll forward through the logs, including new logs that are created by the source database and continually fetched from the source system.
The following scenario shows how to create a standby database on the target system, using the suspend I/O feature. In a warm standby database scenario, the log files of the source database will be applied on the target (standby) database. The standby database will be kept in a rollforward pending state until the rollforward has been stopped. A DB2 backup image taken on the clone database (DMS only) can be used for restoring on the source database for the purpose of performing a rollforward recovery by using the log files produced on the source database after the mirror was split. Please see the following steps:
Split Mirror as a Backup Image
Use the mirrored system as a backup image to restore over the source system.
The following scenario shows how to create a mirror database on the target system, using the suspend I/O feature. The purpose of this option is to provide the possibility of using a split mirror database for restoring on top of the source database, then to roll forward the log files of the source database. It is important to note that the split mirror must remain in the SUSPEND_WRITE state until it has been copied over on top of the source database.
Restore the Split Mirror Image
There is no "target" database in this scenario. The intent of this scenario is to use the mirror copy to restore on top of the "source" database to recover from a disk failure. The split mirror cannot be backed up using the DB2 backup utility, but it can be backed up using operating system tools. If the source database happens to crash, it can be restored with the split mirror image by copying it on top of the source database. Please see the following steps:
High Availability on AIX
Enhanced scalability (ES) is a feature of High Availability Cluster Multi-Processing (HACMP) for AIX. This feature provides the same failover recovery and has the same event structure as HACMP. Enhanced scalability also has other provisions:
The servers in HACMP ES clusters exchange messages called heartbeats or keepalive packets, by which each server informs the other server about its availability. A server that has stopped responding causes the remaining servers in the cluster to invoke recovery. The recovery process is called a server-down-event and may also be referred to as failover. The completion of the recovery process is followed by the reintegration of the server into the cluster. This is called a server-up-event.
There are two types of events: standard events that are anticipated within the operations of HACMP ES and user-defined events that are associated with the monitoring of parameters in hardware and software components. One of the standard events is the server-down-event. When planning what should be done as part of the recovery process, HACMP allows two failover options: hot (or idle) standby and mutual takeover.
When using HACMP, ensure that DB2 instances are not started at boot time by using the db2iauto utility, as follows:
db2iauto -off InstName
where InstName is the login name of the instance.
In a hot-standby configuration, the AIX server that is the takeover server is not running any other workload. In a mutual takeover configuration, the AIX server that is the takeover server is running other workloads.
Generally, in a partitioned database environment, DB2 UDB runs in mutual takeover mode with multiple database partitions on each server. One exception is a scenario in which the catalog partition is part of a hot-standby configuration.
When planning a large DB2 installation on an RS/6000 SP using HACMP ES, you need to consider how to divide the servers of the cluster within or between the RS/6000 SP frames. Having a server and its backup in different SP frames allows takeover in the event that one frame goes down (that is, the frame power/switch board fails). However, such failures are expected to be exceedingly rare because there are N+1 power supplies in each SP frame, and each SP switch has redundant paths, along with N+1 fans and power. In the case of a frame failure, manual intervention may be required to recover the remaining frames. This recovery procedure is documented in the SP Administration Guide. HACMP ES provides for recovery of SP server failures; recovery of frame failures is dependent on the proper layout of clusters within one or more SP frames.
Another planning consideration is how to manage big clusters. It is easier to manage a small cluster than a big one; however, it is also easier to manage one big cluster than many smaller ones. When planning, consider how your applications will be used in your cluster environment. If there is a single, large, homogeneous application running, for example, on 16 servers, it is probably easier to manage the configuration as a single cluster, rather than as eight two-server clusters. If the same 16 servers contain many different applications with different networks, disks, and server relationships, it is probably better to group the servers into smaller clusters. Keep in mind that servers integrate into an HACMP cluster one at a time; it will be faster to start a configuration of multiple clusters, rather than one large cluster. HACMP ES supports both single and multiple clusters, as long as a server and its backup are in the same cluster.
HACMP ES failover recovery allows predefined (also known as cascading) assignment of a resource group to a physical server. The failover recovery procedure also allows floating (or rotating) assignment of a resource group to a physical server. IP addresses and external disk volume groups, file systems, or NFS file systems, as well as application servers within each resource group, specify either an application or an application component, which can be manipulated by HACMP ES between physical servers by failover and reintegration. Failover and reintegration behavior is specified by the type of resource group created and by the number of servers placed in the resource group.
For example, consider a database partition in a partitioned database environment. If its log and table space containers were placed on external disks and other servers were linked to those disks, it would be possible for those other servers to access these disks and to restart the database partition (on a takeover server). It is this type of operation that is automated by HACMP. HACMP ES can also be used to recover NFS file systems used by DB2 instance main user directories.
Read the HACMP ES documentation thoroughly as part of your planning for recovery with DB2 UDB in a partitioned database environment. You should read the Concepts, Planning, Installation, and Administration guides, then build the recovery architecture for your environment. For each subsystem that you have identified for recovery, based on known points of failure, identify the HACMP clusters that you need, as well as the recovery servers (either hot standby or mutual takeover).
It is strongly recommended that both disks and adapters be mirrored in your external disk configuration. For DB2 servers that are configured for HACMP, care is required to ensure that each server can vary on the volume groups from the shared external disks. In a mutual takeover configuration, this arrangement requires some additional planning, so that the paired servers can access each other's volume groups without conflicts. In a partitioned database environment, this means that all container names must be unique across all databases.
One way to achieve uniqueness is to include the partition number as part of the name. You can specify a database partition expression for container string syntax when creating either SMS or DMS containers. When you specify the expression, the database partition number can be part of the container name or, if you specify additional arguments, the results of those arguments can be part of the container name. Use the argument $N ([blank]$N) to indicate the database partition expression. The argument must occur at the end of the container string.
Following are some examples of how to create containers using this special argument:
CREATE TABLESPACE TS1 MANAGED BY DATABASE USING
   (device '/dev/rcont $N' 20000)
The following containers would be used:
/dev/rcont0 on DATABASE PARTITION 0
/dev/rcont1 on DATABASE PARTITION 1
CREATE TABLESPACE TS3 MANAGED BY SYSTEM USING
   ('/TS3/cont $N%2', '/TS3/cont $N%2+2')
The following containers would be used:
/TS3/cont0 on DATABASE PARTITION 0
/TS3/cont2 on DATABASE PARTITION 0
/TS3/cont1 on DATABASE PARTITION 1
/TS3/cont3 on DATABASE PARTITION 1
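To check which container name a given partition will receive, the expansion can be imitated in shell. The sketch below is not DB2 code: the function name expand_container is hypothetical, only the + and % operators are handled, and operators are applied left to right, which matches the examples above but should be treated as an assumption.

```shell
#!/bin/sh
# expand_container CONTAINER_STRING PARTITION_NUMBER
# Replaces a trailing " $N" expression with its value for the given partition.
# Only "+" and "%" are supported, applied left to right (an assumption).
expand_container() {
    ec_marker=' $N'
    case "$1" in
        *"$ec_marker"*) ;;
        *) echo "$1"; return 0 ;;           # no expression: name is unchanged
    esac
    ec_path=${1%%"$ec_marker"*}             # text before " $N"
    ec_expr=${1#*"$ec_marker"}              # operators after "$N", may be empty
    ec_value=$2
    while [ -n "$ec_expr" ]; do
        ec_op=${ec_expr%"${ec_expr#?}"}     # first character is the operator
        ec_expr=${ec_expr#?}
        ec_num=${ec_expr%%[!0-9]*}          # leading digits are the operand
        ec_expr=${ec_expr#"$ec_num"}
        [ -n "$ec_num" ] || { echo "missing operand after $ec_op" >&2; return 1; }
        case "$ec_op" in
            +) ec_value=$((ec_value + ec_num)) ;;
            %) ec_value=$((ec_value % ec_num)) ;;
            *) echo "unsupported operator: $ec_op" >&2; return 1 ;;
        esac
    done
    echo "$ec_path$ec_value"
}
```

For instance, expand_container '/TS3/cont $N%2+2' 1 yields /TS3/cont3, in agreement with the TS3 example. Running the function for every partition number before issuing CREATE TABLESPACE is a quick way to confirm that no two partitions on the same server end up with the same container name.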
A script file, rc.db2pe, is packaged with DB2 UDB Enterprise Server Edition (and installed on each server in /usr/bin) to assist in configuring for HACMP ES failover or recovery in either hot standby or mutual takeover servers. In addition, DB2 buffer pool sizes can be customized during failover in mutual takeover configurations from within rc.db2pe. Buffer pool sizes can be configured to ensure proper resource allocation when two database partitions run on one physical server.
HACMP ES Event Monitoring and User-Defined Events
Initiating a failover operation if a process dies on a given server is an example of a user-defined event. Examples that illustrate user-defined events, such as shutting down a database partition and forcing a transaction abort to free paging space, can be found in the sqllib/samples/hacmp/es subdirectory.
A rules file, /usr/sbin/cluster/events/rules.hacmprd, contains HACMP events. Each event description in this file has the following nine components:
Each object requires one line in the event definition, even if the line is not used. If these lines are removed, HACMP ES Cluster Manager cannot parse the event definition properly, and this may cause the system to hang. Any line beginning with "#" is treated as a comment line.
The rules file requires exactly nine lines for each event definition, not counting any comment lines. When adding a user-defined event at the bottom of the rules file, it is important to remove the unnecessary empty line at the end of the file, or the server will hang.
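Because a miscounted rules file can hang the server, it may be worth running a sanity check before putting an edited file into service. The sketch below checks only the arithmetic described above, namely that the number of non-comment lines is a multiple of nine. The function name check_rules_file is hypothetical, and the check says nothing about whether the individual lines are valid.

```shell
#!/bin/sh
# check_rules_file FILE
# Succeeds when the count of non-comment lines is a multiple of nine.
# Blank lines are counted, because an unused component still occupies a line.
check_rules_file() {
    awk '
        /^#/ { next }            # comment lines are not counted
        { n++ }                  # every other line, blank or not, is counted
        END {
            if (n % 9 != 0) {
                printf "%d non-comment lines; expected a multiple of 9\n", n
                exit 1
            }
        }' "$1"
}
```

A stray empty line at the end of the file changes the count by one, so this check also catches the trailing-line problem described above.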
HACMP ES uses PSSP event detection to treat user-defined events. The PSSP Event Management subsystem provides comprehensive event detection by monitoring various hardware and software resources.
The process can be summarized as follows:
In Figure 3.1, both servers have access to the installation directory, the instance directory, and the database directory. The database instance db2inst is being actively executed on server 1. Server 2 is not active and is being used as a hot standby. A failure occurs on server 1, and the instance is taken over by server 2. Once the failover is complete, both remote and local applications can access the database within instance db2inst. The database will have to be manually restarted or, if AUTORESTART is on, the first connection to the database will initiate a restart operation. In the sample script provided, it is assumed that AUTORESTART is off and that the failover script performs the restart for the database.
Figure 3.1. Failover on a two-server HACMP cluster.
Partition Failover (Hot Standby)
In the following hot-standby failover scenario, we fail over a single partition of the instance instead of the entire instance. The scenario includes a two-server HACMP cluster as in the previous example, but the machine represents one of the partitions of a partitioned database server. Server 1 is running a single partition of the overall configuration, and server 2 is being used as the failover server. When server 1 fails, the partition is restarted on the second server. The failover updates the db2nodes.cfg file, pointing to server 2's host name and net name, then restarts the partition on the new server.
Following is a portion of the db2nodes.cfg file, both before and after the failover. In this example, database partition number 2 is running on server 1 of the HACMP machine, which has both a host name and a net name of srvr201. Server 2, srvr202, is running as a hot standby, ready to take over the execution of the partition if there is a failure on srvr201. After the failover, database partition number 2 is running on server 2 of the HACMP machine, which has both a host name and a net name of srvr202.
Before:
1 srvr101 0 srvr101
2 srvr201 0 srvr201 <= HACMP running on primary server
db2start dbpartitionnum 2 restart hostname srvr202 port 0 netname srvr202
After:
1 srvr101 0 srvr101
2 srvr202 0 srvr202 <= HACMP running on standby server
Multiple Logical Partition Database Failover
A more complex variation on the previous example involves the failover of multiple logical partition databases from one server to another. Again, we are using the same two-server HACMP cluster configuration as above. However, in this scenario, server 1, srvr201, is actively running three logical database partitions while server 2, srvr202, is running as a hot standby, ready to take over the execution of the partitions if there is a failure on srvr201. The setup is the same as that for the simple database partition failover scenario but in this case, when server 1 fails, each of the logical database partitions must be started on server 2. It is critical that the logical database partitions be started in the order that is defined in the db2nodes.cfg file: The logical database partition with port number 0 must always be started first.
Following is a portion of the db2nodes.cfg file, both before and after the failover. In this example, there are three logical database partitions defined on server 1 of a two-server HACMP cluster. After the failover, database partitions 2, 3, and 4 are running on server 2 of the HACMP machine, which has both a host name and a net name of srvr202.
Before:
1 srvr101 0 srvr101
2 srvr201 0 srvr201 <= HACMP running on the primary server
3 srvr201 1 srvr201 <= HACMP
4 srvr201 2 srvr201 <= HACMP
db2start dbpartitionnum 2 restart hostname srvr202 port 0 netname srvr202
db2start dbpartitionnum 3 restart hostname srvr202 port 1 netname srvr202
db2start dbpartitionnum 4 restart hostname srvr202 port 2 netname srvr202
After:
1 srvr101 0 srvr101
2 srvr202 0 srvr202 <= HACMP running on the standby server
3 srvr202 1 srvr202 <= HACMP
4 srvr202 2 srvr202 <= HACMP
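The restart commands for a hot-standby failover can be derived mechanically from the db2nodes.cfg file. The sketch below assumes the four-column layout shown in the examples (partition number, host name, logical port, net name) and assumes that the takeover server's net name equals its host name. The function name failover_commands is hypothetical, and the function only prints the commands, ordered so that logical port 0 comes first.

```shell
#!/bin/sh
# failover_commands NODES_FILE FAILED_HOST TAKEOVER_HOST
# Prints one db2start ... restart command per partition that was running on
# FAILED_HOST, sorted by logical port so that port 0 is started first.
failover_commands() {
    awk -v failed="$2" '$2 == failed' "$1" |
        sort -k3,3n |
        awk -v target="$3" '{
            printf "db2start dbpartitionnum %s restart hostname %s port %s netname %s\n",
                   $1, target, $3, target
        }'
}
```

Given the "Before" file above, failover_commands db2nodes.cfg srvr201 srvr202 prints the three db2start commands in port order. Sorting explicitly means that the result does not depend on the order of the lines in the file.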
Partition Failover (Mutual Takeover)
In this example, we are running two of the partitions of a multi-partitioned database system on the two separate servers of an HACMP configuration. The database partition for each server is created on the path /db2, which is not shared with other partitions. Following are the contents of the db2nodes.cfg file associated with the overall multi-partition instance before and after the failover. The srvr201 server crashes and fails over to srvr202. After the failover, the database partition that was executing on srvr201, which is defined as database partition number 2, starts up on srvr202. Because srvr202 is already running database partition number 3 for this database, database partition number 2 is started as a logical database partition on srvr202 with logical port 1.
Before:
1 srvr101 0 srvr101
2 srvr201 0 srvr201 <= HACMP failover server
3 srvr202 0 srvr202 <= HACMP
db2start dbpartitionnum 2 restart hostname srvr202 port 1 netname srvr202
After:
1 srvr101 0 srvr101
2 srvr202 1 srvr202 <= srvr201 failover to srvr202
3 srvr202 0 srvr202 <= HACMP
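In a mutual takeover, the logical port for the incoming partition has to be chosen so that it does not collide with a partition that is already running on the takeover server. One simple policy, sketched below, is to take one more than the highest port currently in use on that host. The function name next_free_port is hypothetical, and the four-column db2nodes.cfg layout is assumed as in the examples.

```shell
#!/bin/sh
# next_free_port NODES_FILE HOST
# Prints the lowest logical port above every port already used on HOST,
# or 0 when HOST has no partitions defined.
next_free_port() {
    awk -v host="$2" '
        BEGIN { free = 0 }
        $2 == host && $3 + 0 >= free { free = $3 + 1 }
        END { print free }' "$1"
}
```

For the "Before" file above, next_free_port db2nodes.cfg srvr202 prints 1, which is the port used in the db2start command for database partition 2.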
Scenario #1: Hot Standby with a Cascading Resource Group
In this HACMP configuration (hot standby with a cascading resource group), we use HACMP/ES 4.3 and DB2 UDB Enterprise Server Edition running on AIX 4.3.3. The cluster being defined is called dbcluster. This cluster has two servers (dbserv1 and dbserv2), one resource group (db2grp), and one application server (db2as). Because we want the resource group and the application server to be active on the dbserv1 server when there are no failovers, we will define the dbserv1 server in the resource group first. Each of these servers will have two network adapters and one serial port. The servers will have a shared external disk, with only one server accessing the disk at a time. Both servers will have access to a volume group (havg), three file systems (/home/db2inst1, /db1, and /home/db2fenc1), and a logical volume (/dev/udbdata).
If the dbserv1 server has a hardware or software failure, the dbserv2 server will acquire the resources that are defined in the resource group. The application server is then started on the dbserv2 server. In our case, the application server that is started is DB2 UDB ESE for the instance db2inst1. There are failures that would not cause the application to move to the dbserv2 server; these include a disk failure or a network adapter failure.
Here is one example of a failover: DB2 UDB ESE is running on a server called dbserv1; it has a home directory of /home/db2inst1, a database located on the /db1 file system, and a /dev/udbdata logical volume. These two file systems and the logical volume are in a volume group called havg. The dbserv2 server is currently not running any application except HACMP, but it is ready to take over from the dbserv1 server, if necessary. Suppose someone unplugs the dbserv1 server. The dbserv2 server detects this event and begins taking over resources from the dbserv1 server. These resources include the havg volume group, the three file systems, the logical volume, and the hostname dbs1.
Once the resources are available on the dbserv2 server, the application server start script runs. The instance ID can log on to the dbserv2 server (now called dbs1) and connect to the database. Remote clients can also connect to the database, because the hostname dbs1 is now located on the dbserv2 server.
Follow these steps to set up shared disk drives and the logical volume manager:
User Setup and DB2 Installation
Now that the components of the LVM are set up, DB2 can be installed. The db2setup utility can be used to install and configure DB2. To understand the configuration better, we will define some of the components manually and use the db2setup utility to install only the DB2 product and license.
All commands described in this chapter must be invoked by the root user. Although the steps used to install DB2 are outlined below, for complete details, please refer to the DB2 for UNIX Quick Beginnings guide and the DB2 Client Installation Guide.
Before any groups or IDs are created, ensure that the volume group is activated and that the file systems /home/db2inst1 and /home/db2fenc1 are mounted.
To install and configure DB2 on the dbserv1 server:
Once DB2 HACMP is configured and set up, any changes made (for example, to the IDs, groups, AIX system parameters, or the level of DB2 code) must be done on both servers.
Following are some examples:
The testing procedure itself is simple. First, connect to the cluster from a client machine; next, cause one of the points of failure to fail; then watch to ensure that the failover takes place properly and that the application is available and properly configured after failover. If the cluster is built using a cascading cluster configuration, check again after service has been restored to the original server. If the cluster is built using a rotating cluster configuration, bring up the original server again, then cause the second server to fail, which should restore the system to its original server.
When testing the availability of the application, be sure that accounts and passwords work as expected, that hostnames and IP addresses work as expected, that the data is complete and up to date, and that the handover is essentially transparent to the user.
Configure a remote machine to be able to connect to the highly available DB2 UDB database. A script can be easily written that will connect to our database, select some data from a table, record the results, and disconnect from the database. If these steps are set inside a loop that will run until interrupted by the operator, the procedure can be used to monitor the state of the cluster.
Keep in mind that the script should continue even if the database cannot be contacted. This way, when the database restarts, it will provide a benchmark for the length of time failover is expected to take. Here is a brief sample script that may be useful for testing an HACMP cluster:
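# Poll the database once a minute. The loop keeps running even when a
# connection attempt fails, so the outage during failover stays visible.
# Replace "database" with the alias of your highly available database.
while true
do
   db2 connect to database
   db2 "select count(*) from syscat.tables"
   db2 terminate
   sleep 60
done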
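If you want the test to yield a failover duration rather than raw command output, each probe can be reduced to a single timestamped status line. The sketch below is one way to do that; the function name probe_once, the UP and DOWN labels, and the output layout are assumptions made for illustration.

```shell
#!/bin/sh
# probe_once DB_ALIAS
# Attempts one connect-and-query cycle and prints "<timestamp> <alias> UP|DOWN".
# A failed probe is reported, not treated as fatal, so a calling loop continues.
probe_once() {
    if db2 connect to "$1" >/dev/null 2>&1 &&
       db2 "select count(*) from syscat.tables" >/dev/null 2>&1; then
        probe_status=UP
    else
        probe_status=DOWN
    fi
    db2 terminate >/dev/null 2>&1
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 $probe_status"
}
```

Calling probe_once from the polling loop in place of the three db2 commands, with the output appended to a log file, leaves a record in which the first DOWN line and the next UP line bracket the outage. A shorter sleep interval gives a finer measurement at the cost of more connections.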
Scenario #2: Mutual Takeover with a Cascading Resource Group
This configuration involves six database partitions and two clusters, each with mutual takeover and cascading resource groups. It uses HACMP/ES 4.3 and DB2 UDB ESE running on AIX 4.3.3.
The clusters being defined are named cl1314 and cl1516, with cluster IDs of 1314 and 1516, respectively. We arbitrarily selected these numbers because we are using SP servers 13, 14, 15, and 16.
The servers within a cluster will have a shared external disk (Table 3.1).
Table 3.1. Scenario #2 Configuration
In the initial target configuration, the db2nodes.cfg will have the following entries:
130 b_sw_013 0 b_sw_013
131 b_sw_013 1 b_sw_013
140 b_sw_014 0 b_sw_014
150 b_sw_015 0 b_sw_015
160 b_sw_016 0 b_sw_016
161 b_sw_016 1 b_sw_016
If one of the two servers within the cluster (for example, cl1314) has a failure, the other server in the cluster will acquire the resources that are defined in the resource group. The application server is then started on the server that has taken over the resource group. In our case, the application server that is started is DB2 UDB ESE for the instance svtha1.
In our example of a failover, DB2 UDB ESE is running on a server, clsrv13; it has an NFS mounted home directory and a database located on the /db1ha/svtha1/SRV130 and /db1ha/svtha1/SRV131 file systems. These file systems are in a volume group called havg1314. The clsrv14 server is currently running DB2 for partition 140 and is ready to take over from the clsrv13 server, if necessary. Suppose someone unplugs the clsrv13 server.
The clsrv14 server detects this event and begins taking over resources from the clsrv13 server. These resources include the havg1314 volume group, the file systems, and the hostname swserv13. Once the resources are available on the clsrv14 server, the application server start script runs. The instance ID can log on to the clsrv14 server (now with an additional hostname swserv13) and can connect to the database. Remote clients can also connect to the database, because the hostname swserv13 is now located on the clsrv14 server.
User Setup and DB2 Installation
Now that the components of the LVM are set up, DB2 can be installed. The db2setup utility can be used to install and configure DB2. To illustrate the configuration better, we will define some of the components manually and will use the db2setup utility to install only the DB2 product and license.
All commands described in this chapter must be invoked by the root user. Although the steps used to install DB2 are outlined below, for complete details, refer to the DB2 UDB ESE for UNIX Quick Beginnings guide and to the DB2 UDB ESE Installation and Configuration Supplement guide.
Before running db2icrt , make sure that the $HOME directory for the instance is available and the svtha1 id can write to the directory. Also make sure that a .profile file exists, because db2icrt will append to the file but will not create a new one.
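The checks described above can be scripted before db2icrt is run. This is a minimal sketch, not part of the DB2 installation tooling: the function name check_instance_home is made up, and because the writability test applies to whichever user runs it, it should be run as the instance owner (svtha1 in this example) rather than as root.

```shell
# Preflight sketch for db2icrt: verify that the instance home directory
# exists, is writable by the current user, and already contains a .profile.
check_instance_home() {
    home_dir=$1
    rc=0
    if [ ! -d "$home_dir" ]; then
        echo "missing directory: $home_dir"; rc=1
    elif [ ! -w "$home_dir" ]; then
        echo "not writable: $home_dir"; rc=1
    fi
    if [ ! -f "$home_dir/.profile" ]; then
        echo "missing: $home_dir/.profile"; rc=1
    fi
    return $rc
}
```

A typical call would be check_instance_home "$HOME" from a login session of the instance owner; a nonzero return status means that at least one prerequisite is not met.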
For this example, we are using the svtha1 id that already exists on the SP complex.
High Availability on the Windows Operating System
Microsoft Cluster Service (MSCS) is a feature of Windows NT Server, Windows 2000 Server, and Windows .NET Server operating systems. It is the software that supports the connection of two servers (up to four servers in DataCenter Server) into a cluster for high availability and easier management of data and applications.
MSCS can also automatically detect and recover from server or application failures. It can be used to move server workloads to balance machine utilization and to provide for planned maintenance without downtime.
DB2 MSCS Components
A cluster is a configuration of two or more servers, each of which is an independent computer system. The cluster appears to network clients as a single server.
The servers in an MSCS cluster are connected using one or more shared storage buses and one or more physically independent networks. A network that connects only the servers but does not connect the clients to the cluster is referred to as a private network. The network that supports client connections is referred to as the public network. There are one or more local disks on each server. Each shared storage bus attaches to one or more disks. Each disk on the shared bus is owned by only one server of the cluster at a time. The DB2 software resides on the local disk. DB2 database files (tables, indexes, log files, etc.) reside on the shared disks. Because MSCS does not support the use of raw partitions in a cluster, it is not possible to configure DB2 to use raw devices in an MSCS environment.
The DB2 Resource
In an MSCS environment, a resource is an entity that is managed by the clustering software. For example, a disk, an IP address, or a generic service can be managed as a resource. DB2 integrates with MSCS by creating its own resource type called DB2. Each DB2 resource manages a DB2 instance; when running in a partitioned database environment, each DB2 resource manages a database partition. The name of the DB2 resource is the instance name, although in a partitioned database environment the name consists of both the instance name and the partition number.
Pre-Online and Post-Online Scripts
You can run scripts both before and after a DB2 resource is brought online. These scripts are referred to as pre-online and post-online scripts. Pre-online and post-online scripts are .BAT files that can run DB2 and system commands.
When multiple instances of DB2 may be running on the same machine, you can use the pre-online and post-online scripts to adjust the configuration so that both instances can be started successfully. In the event of a failover, you can use the post-online script to perform manual database recovery. Post-online scripts can also be used to start any applications or services that depend on DB2.
The DB2 Group
Related or dependent resources are organized into resource groups. All resources in a group move between cluster servers as a unit. For example, in a typical DB2 single-partition cluster environment, there will be a DB2 group that contains the following resources:
The DB2 resource is configured to depend on all other resources in the same group, so the DB2 server can be started only after all other resources are online.
Two types of configuration are available:
In a partitioned database environment, the clusters do not all have to have the same type of configuration. You can have some clusters that are set up to use hot standby and others that are set up for mutual takeover. For example, if your DB2 instance consists of five workstations, you can have two machines set up to use a mutual takeover configuration, two to use a hot-standby configuration, and one machine not configured for failover support.
Hot standby configuration
In a hot standby configuration, one machine in the MSCS cluster provides dedicated failover support, and the other machine participates in the database system. If the machine participating in the database system fails, the database server on it will be started on the failover machine. If, in a partitioned database system, you are running multiple logical database partitions on a machine and it fails, the logical database partitions will be started on the failover machine.
Mutual takeover configuration
In a mutual takeover configuration, both workstations participate in the database system (i.e., each machine has at least one database server running on it). If one of the workstations in the MSCS cluster fails, the database server on the failing machine will be started to run on the other machine. In a mutual takeover configuration, a database server on one machine can fail independently of the database server on another machine.
Clustered Servers for High Availability
In this example, we define a dbclust cluster. The members of the cluster are serv1 and serv2. Clients communicate over the public network to the cluster through the IP address assigned to the cluster's host name. The cluster's host name can be assigned to only one member of the cluster at any given time but can move to any member of the cluster. The shared storage is accessible to all members in the cluster but can be assigned to only one member server at any given time. The member servers use a private network to check on the vitality of other members in the cluster (this check is called the server's heartbeat).
The basic components consist of two servers that establish a cluster when MSCS software is installed and configured on both servers. Prior to installing the MSCS software, these two servers must be able to communicate with each other over a network.
A dedicated private network between the two servers is highly recommended, so that they can communicate their heartbeats without interference from traffic on the public network. Both servers must also have access to shared storage.
The MSCS software configuration process will create several default cluster resource types. These include a network name, an IP address, and at least one physical disk that is referred to as the quorum drive and usually assigned the Q: drive letter. The network name represents the cluster's host name that is registered in DNS and assigned the IP address. The primary purpose of the cluster's network name is to manage the cluster by a DNS name.
Before Installing Microsoft Cluster Service
Prior to installing and configuring the MSCS software, there are a number of pre-installation tasks that need to be addressed.
Installing Microsoft Cluster Service
If MSCS was installed as part of the initial operating system load, you can start the Cluster Service Configuration Wizard by selecting Control Panel, Add/Remove Programs, Add/Remove Windows Components, Configure.
If MSCS was not installed as part of the initial operating system load, you can install it by selecting Control Panel, Add/Remove Programs, Add/Remove Windows Components, Components. The Cluster Service Configuration Wizard will start as part of this installation process.
Tasks to be performed during the installation and configuration using the Cluster Service Configuration Wizard:
As each individual node is added to the cluster, it will appear within the MSCS Cluster Administrator. In the left panel, we can see that both serv1 and serv2 are now members of the cluster dbclust.
After Installing Microsoft Cluster Service
Once the MSCS software has been installed on all servers within the cluster, we need to perform post-install tasks to verify that everything is in working order. To prepare for these tasks, we will consolidate all of the resources into one group and rename the cluster's quorum drive from Disk Q: to a more meaningful name.
The following is a list of tests that can be performed to verify that the Cluster Service is working properly.
Log on to the first server in the cluster, verify that the Cluster Group is currently online at this server, and open a Windows command prompt. Verify that you can ping the Cluster Group by IP address and name. Verify that you can access the quorum drive (Disk Q:). Move the Cluster Group to another member in the cluster and repeat.
Log on to a server that is not a member of the cluster, verify that the Cluster Group is currently online at the primary server, and open a Windows command prompt. Verify that you can ping the Cluster Group by IP address and name. Verify that you can access the quorum drive. Do this while moving the Cluster Group from one member of the cluster to another.
Log on to a client that will use the resources of this cluster, verify that the Cluster Group is currently online at the primary server, and open a Windows command prompt. Verify that you can ping the Cluster Group by IP address and cluster name. Verify that you can access the quorum drive.
Before Enabling DB2 MSCS Support
There are tasks that should be performed prior to enabling DB2 UDB HA support with MSCS.
Enabling DB2 MSCS Support
Enabling DB2 MSCS support includes the following:
High Availability on Sun Solaris
Although there are a number of methods to increase availability for a data service, the most common is an HA cluster. A cluster, when used for HA, consists of two or more machines, a set of private network interfaces, one or more public network interfaces, and some shared disks. This special configuration allows a data service to be moved from one machine to another. By moving the data service to another machine in the cluster, it should be able to continue providing access to its data. Moving a data service from one machine to another is called a failover.
The private network interfaces are used to send heartbeat messages, as well as control messages, among the machines in the cluster. The public network interfaces are used to communicate directly with clients of the HA cluster. The disks in an HA cluster are connected to two or more machines in the cluster, so that if one machine fails, another machine has access to them.
A data service running on an HA cluster has one or more logical public network interfaces and a set of disks associated with it. The clients of an HA data service connect via TCP/IP to the logical network interfaces of the data service only. If a failover occurs, the data service, along with its logical network interfaces and set of disks, is moved to another machine.
One of the benefits of an HA cluster is that a data service can recover without the aid of support staff, and it can do so at any time. Another benefit is redundancy. All of the parts in the cluster should be redundant, including the machines themselves. The cluster should be able to survive any single point of failure.
Even though HA data services can be very different in nature, they have some common requirements. Clients of an HA data service expect the network address and host name of the data service to remain the same and expect to be able to make requests in the same way, regardless of which machine the data service is on.
Consider a Web browser that is accessing an HA Web server. The request is issued with a URL (Uniform Resource Locator), which contains both a host name and the path to a file on the Web server. The browser expects both the host name and the path to remain the same after a failover of the Web server. If the browser is downloading a file from the Web server and the server is failed over, the browser will need to reissue the request.
Availability of a data service is measured by the amount of time the data service is available to its users. The most common unit of measurement for availability is the percentage of "up time"; this is often referred to as the number of nines:
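As a worked example of the number-of-nines measure, the sketch below converts an availability percentage into the downtime it permits per year. It assumes a 365-day year (525,600 minutes); the function name downtime_minutes_per_year is made up for this illustration.

```shell
# Allowed downtime per year, in minutes, for a given availability percentage.
# Assumes a 365-day year, that is, 365 * 24 * 60 = 525,600 minutes.
downtime_minutes_per_year() {
    LC_ALL=C awk -v pct="$1" \
        'BEGIN { printf "%.2f\n", (100 - pct) / 100 * 365 * 24 * 60 }'
}

for pct in 99 99.9 99.99 99.999; do
    echo "$pct% -> $(downtime_minutes_per_year "$pct") minutes/year"
done
```

Under this assumption, two nines (99%) permit about 3.65 days of downtime per year, three nines about 8.8 hours, four nines about 53 minutes, and five nines a little over 5 minutes.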
Hot standby is the simplest HA cluster topology. In this scenario, the primary machine is hosting the production database instance and associated resources. A second idle machine is available to host the production database instance and associated resources, should a failure occur on the primary machine. The second machine can also be running a workload (perhaps another DB2 instance) in order to maximize resource use.
In the mutual takeover case, you envision a cluster of N servers as N/2 pairs of servers. Server number n is responsible for failover support of server number n + 1 (and vice versa); server number n + 2 is responsible for failover support of server number n + 3 (and vice versa); and so on until you reach the Nth server.
Note that this scenario requires that N be an even number.
The advantage of this configuration is that in the normal (non-failure) case, all machines are hosting database resources and are performing productive work. The primary disadvantage is that, during the failure period (the period after one of the hardware resources has failed and before its repair), there is one server that is required to support, on average, twice the workload of any other physical server.
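The pairing can be made concrete with a short sketch that lists the mutual takeover pairs for servers numbered 1 through N and rejects an odd N. The names server1, server2, and so on, and the function name takeover_pairs, are placeholders rather than anything defined by the cluster software.

```shell
# Print the mutual-takeover pairs for servers numbered 1..N.
# Returns nonzero if N is not a positive even number.
takeover_pairs() {
    n=$1
    if [ "$n" -le 0 ] || [ $((n % 2)) -ne 0 ]; then
        echo "N must be a positive even number" >&2
        return 1
    fi
    i=1
    while [ "$i" -lt "$n" ]; do
        # Each server in a pair backs up the other.
        echo "server$i <-> server$((i + 1))"
        i=$((i + 2))
    done
}

takeover_pairs 4
```

With N = 4, the sketch reports two pairs: servers 1 and 2, and servers 3 and 4. During a failure in one pair, the surviving server of that pair carries both workloads while the other pair is unaffected.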
Mutual takeover (N + 1)
Single defined server serves as standby. This case relies on an (N + 1)-server cluster, with one defined server as the standby for all N servers. The advantage of this scenario is that there is no performance degradation during the failure period (the period after one of the hardware resources has failed and before its repair). The primary disadvantage is that approximately 1 / (N + 1) of the aggregate physical computing resource goes unused during normal operation.
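To see how the unused share shrinks as the cluster grows, the 1 / (N + 1) figure can be evaluated for a few cluster sizes. This sketch assumes N active servers plus one standby of equal capacity; the function name idle_percent is made up.

```shell
# Percentage of aggregate capacity left idle when one standby server
# backs up N active servers of equal capacity: 100 / (N + 1).
idle_percent() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.1f\n", 100 / (n + 1) }'
}

for n in 1 3 9; do
    echo "N=$n: $(idle_percent "$n")% idle"
done
```

With a single active server, half of the hardware is idle; with nine active servers, the idle share falls to one tenth.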
Pair + M (N + M)
M defined servers serve as the hot standby for each of the N servers. This case relies on an N server cluster, with M defined servers as the hot standby for each of the N servers. Essentially, this is the default cluster topology configured by the regdb2udb utility, where N is equal to the number of physical servers in the cluster and M is equal to N - 1. The prime advantage of this configuration is that the environment is fully redundant; up to N - 1 server failures can be tolerated while still maintaining full database access (subject, of course, to increased query response times due to capacity constraints when there are fewer than N servers in the cluster). In this way, DB2 UDB ESE, used in conjunction with Sun Cluster 3.0, ensures full database software redundancy and is most appropriate for environments requiring the highest degree of availability.
Another way to increase the availability of a data service is fault tolerance. A fault tolerant machine has all of its redundancy built in and should be able to withstand a single failure of any part, including CPU and memory. Fault-tolerant machines are most often used in niche markets and are usually expensive to implement. An HA cluster with machines in different geographical locations has the added advantage of being able to recover from a disaster affecting only a subset of those locations.
An HA cluster is the most common solution to increase availability because it is scalable, easy to use, and relatively inexpensive to implement.
Sun Cluster 3.0 provides HA by enabling application failover. Each server is periodically monitored, and the cluster software automatically relocates a cluster-aware application from a failed primary server to a designated secondary server. When a failover occurs, clients may experience a brief interruption in service and may have to reconnect to the server.
However, they will not be aware of the physical server from which they are accessing the application and the data. By allowing other servers in a cluster automatically to host workloads when the primary server fails, Sun Cluster 3.0 can significantly reduce downtime and increase productivity.
Sun Cluster 3.0 requires multihost disk storage. This means that disks can be connected to more than one server at a time. In the Sun Cluster 3.0 environment, multihost storage allows disk devices to become highly available. Disk devices that reside on multihost storage can tolerate single-server failures because there is still a physical path to the data through the alternate server. Multihost disks can be accessed globally through a primary server. If client requests are accessing the data through one server and that server fails, the requests are switched over to another server that has a direct connection to the same disks. A volume manager provides for mirrored or RAID 5 configurations for data redundancy of the multihost disks.
Currently, Sun Cluster 3.0 supports Solstice DiskSuite and VERITAS Volume Manager as volume managers. Combining multihost disks with disk mirroring and striping protects against both server failure and individual disk failure.
Global devices are used to provide cluster-wide HA access to any device in a cluster, from any server, regardless of the physical device location. All disks are included in the global namespace with an assigned device ID (DID) and are configured as global devices. Therefore, the disks themselves are visible from all cluster servers.
File Systems/Global File Systems
A cluster or global file system is a proxy between the kernel (on one server) and the underlying file system volume manager (on a server that has a physical connection to one or more disks). Cluster file systems are dependent on global devices with physical connections to one or more servers. They are independent of the underlying file system and volume manager. Currently, cluster file systems can be built on UFS, using either Solstice DiskSuite or VERITAS Volume Manager. The data becomes available to all servers only if the file systems on the disks are mounted globally as a cluster file system.
All multihost disks must be controlled by the Sun Cluster framework. Disk groups, managed by either Solstice DiskSuite or VERITAS Volume Manager, are first created on the multihost disk. Then they are registered as Sun Cluster disk device groups. A disk device group is a type of global device.
Multihost device groups are highly available. Disks are accessible through an alternate path if the server currently mastering the device group fails. The failure of the server mastering the device group does not affect access to the device group, except for the time required to perform the recovery and consistency checks. During this time, all requests are blocked (transparently to the application) until the system makes the device group available.
Resource Group Manager
The Resource Group Manager (RGM) provides the mechanism for HA and runs as a daemon on each cluster server. It automatically starts and stops resources on selected servers according to preconfigured policies. The RGM allows a resource to be highly available in the event of a server failure or reboot by stopping the resource on the affected server and starting it on another. The RGM also automatically starts and stops resource-specific monitors that can detect resource failures and relocate failing resources onto another server.
The term data service is used to describe a third-party application that has been configured to run on a cluster, rather than on a single server. A data service includes the application software and Sun Cluster 3.0 software that starts, stops, and monitors the application. Sun Cluster 3.0 supplies data service methods that are used to control and monitor the application within the cluster. These methods run under the control of the RGM, which uses them to start, stop, and monitor the application on the cluster servers. These methods, along with the cluster framework software and multihost disks, enable applications to become HA data services. As HA data services, they can prevent significant application interruptions after any single failure within the cluster, regardless of whether the failure is on a server, on an interface component, or in the application itself. The RGM also manages resources in the cluster, including network resources (logical host names and shared addresses) and application instances.
Resource Type, Resource, and Resource Group
A resource type is made up of the following:
The RGM uses resource type properties to manage resources of a particular type.
A resource inherits the properties and values of its resource type. It is an instance of the underlying application running on the cluster. Each instance requires a unique name within the cluster. Each resource must be configured in a resource group. The RGM brings all resources in a group online and offline together on the same server. When the RGM brings a resource group online or offline, it invokes callback methods on the individual resources in the group.
The servers on which a resource group is currently online are called its primary servers, or its primaries. A resource group is mastered by each of its primaries. Each resource group has an associated server list property, set by the cluster administrator, to identify all potential primaries or masters of the resource group.
High Availability with VERITAS Cluster Server
VERITAS Cluster Server (VCS) can be used to eliminate both planned and unplanned downtime. It can facilitate server consolidation and effectively manage a wide range of applications in heterogeneous environments.
VCS supports clusters of up to 32 servers in both Storage Area Network (SAN) and traditional client/server environments. VCS can protect everything from a single critical database instance to very large multi-application clusters in networked storage environments. This section provides a brief summary of the features of VCS.
VCS is an availability clustering solution that manages the availability of application services, such as DB2 UDB, by enabling application failover. The state of each individual cluster server and its associated software services is regularly monitored. When a failure occurs that disrupts the application service (in this case, the DB2 UDB service), VCS and/or the VCS HA-DB2 Agent detect the failure and automatically take steps to restore the service. This can include restarting DB2 UDB on the same server or moving DB2 UDB to another server in the cluster and restarting it on that server. If an application needs to be migrated to a new server, VCS moves everything associated with the application (e.g., network IP addresses, ownership of underlying storage) to the new server so that users will not be aware that the service is actually running on another server. They will still access the service using the same IP addresses, but those addresses will now point to a different cluster server.
When a failover occurs with VCS, users may or may not see a disruption in service. This will be based on the type of connection (stateful or stateless) that the client has with the application service. In application environments with stateful connections (such as DB2 UDB), users may see a brief interruption in service and may need to reconnect after the failover has completed. In application environments with stateless connections (such as NFS), users may see a brief delay in service but generally will not see a disruption and will not need to log back on.
By supporting an application as a service that can be automatically migrated between cluster servers, VCS can not only reduce unplanned downtime, but can also shorten the duration of outages associated with planned downtime (i.e., for maintenance and upgrades). Failovers can also be initiated manually. If a hardware or operating system upgrade must be performed on a particular server, DB2 UDB can be migrated to another server in the cluster, the upgrade can be performed, and DB2 UDB can then be migrated back to the original server.
Applications recommended for use in these types of clustering environments should be crash tolerant. A crash tolerant application can recover from an unexpected crash while still maintaining the integrity of committed data.
Crash tolerant applications are sometimes referred to as cluster friendly applications. DB2 UDB is a crash tolerant application.
When used with the VCS HA-DB2 Agent, VCS requires shared storage. Shared storage is storage that has a physical connection to multiple servers in the cluster. Disk devices resident on shared storage can tolerate server failures because a physical path to the disk devices still exists through one or more alternate cluster servers.
Through the control of VCS, cluster servers can access shared storage through a logical construct called disk groups. Disk groups represent a collection of logically defined storage devices whose ownership can be atomically migrated between servers in a cluster. A disk group can be imported to only a single server at any given time. For example, if Disk Group A is imported to Server1 and Server1 fails, Disk Group A can be exported from the failed server and imported to a new server in the cluster. VCS can simultaneously control multiple disk groups within a single cluster.
In addition to allowing disk group definition, a volume manager can provide for redundant data configurations, using mirroring or RAID 5, on shared storage. VCS supports VERITAS Volume Manager and Solstice DiskSuite as logical volume managers. Combining shared storage with disk mirroring and striping can protect against both server failure and individual disk or controller failure.
VERITAS Cluster Server Global Atomic Broadcast and Low Latency Transport
An interserver communication mechanism is required in cluster configurations so that servers can exchange information concerning hardware and software status, keep track of cluster membership, and keep this information synchronized across all cluster servers. The Global Atomic Broadcast (GAB) facility, running across a low-latency transport (LLT), provides the high-speed, low-latency mechanism used by VCS to do this. GAB is loaded as a kernel module on each cluster server and provides an atomic broadcast mechanism that ensures that all servers get status update information at the same time.
By leveraging kernel-to-kernel communication capabilities, LLT provides high-speed transport for all information that needs to be exchanged and synchronized between cluster servers. GAB runs on top of LLT. VCS does not use IP as a heartbeat mechanism but offers two other, more reliable options. GAB with LLT can be configured to act as a heartbeat mechanism, or a GABdisk can be configured as a disk-based heartbeat. The heartbeat must run over redundant connections. These connections can either be two private Ethernet connections between cluster servers or one private Ethernet connection and one GABdisk connection. The use of two GABdisks is not a supported configuration because the exchange of cluster status between servers requires a private Ethernet connection.
For more information about GAB or LLT, or how to configure them in VCS configurations, please consult the VERITAS Cluster Server User's Guide for Solaris.
Bundled and Enterprise Agents
An agent is a program that is designed to manage the availability of a particular resource or application. When an agent is started, it obtains the necessary configuration information from VCS, then periodically monitors the resource or application and updates VCS with the status. In general, agents are used to bring resources online, take resources offline, or monitor resources, and they provide four types of services: start, stop, monitor, and clean.
Start and stop are used to bring resources online or offline, monitor is used to test a particular resource or application for its status, and clean is used in the recovery process.
A variety of bundled agents are included as part of VCS and are installed when VCS is installed. The bundled agents are VCS processes that manage predefined resource types commonly found in cluster configurations (e.g., IP, mount, process, and share), and they help to simplify cluster installation and configuration considerably. There are over 20 bundled agents with VCS.
Enterprise agents tend to focus on specific applications, such as DB2 UDB. The VCS HA-DB2 Agent can be considered an Enterprise Agent, and it interfaces with VCS through the VCS Agent framework.
VCS Resources, Resource Types, and Resource Groups
A resource type is an object definition used to define resources within a VCS cluster that will be monitored. A resource type includes the resource type name and a set of properties associated with the resource that are salient from an HA point of view. A resource inherits the properties and values of its resource type, and resource names must be unique on a cluster-wide basis.
There are two types of resources: persistent and standard (non-persistent). Persistent resources are resources such as network interface controllers (NICs) that are monitored but are not brought online or taken offline by VCS. Standard resources are those whose online and offline status is controlled by VCS.
The lowest level object that is monitored is a resource, and there are various resource types (e.g., share, mount). Each resource must be configured into a resource group, and VCS will bring all resources in a particular resource group online and offline together. To bring a resource group online or offline, VCS will invoke the start or stop methods for each of the resources in the group. There are two types of resource groups: failover and parallel. An HA DB2 UDB configuration, regardless of whether it is partitioned or not, will use failover resource groups.
A "primary" or "master" server is a server that can potentially host a resource. A resource group attribute called systemlist is used to specify which servers within a cluster can be primaries for a particular resource group. In a two-server cluster, usually both servers are included in the systemlist, but in larger, multiserver clusters that may be hosting several HA applications, there may be a requirement to ensure that certain application services (defined by their resources at the lowest level) can never fail over to certain servers.
Dependencies can be defined between resource groups, and VCS depends on this resource group dependence hierarchy in assessing the impact of various resource failures and in managing recovery. For example, if the resource group ClientApp1 cannot be brought online unless the resource group DB2 has already been successfully started, resource group ClientApp1 is considered dependent on resource group DB2.
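The effect of such a dependency on start order can be illustrated with the standard tsort utility, which performs a topological sort. This is only an illustration of the ordering idea, not how VCS itself evaluates dependencies; the function name start_order is made up, and each input line is written as "prerequisite dependent".

```shell
# Derive a start order from resource group dependencies using tsort(1).
# Each input line names a prerequisite group followed by the group that
# depends on it; prerequisites are printed before their dependents.
start_order() {
    tsort
}

start_order <<'EOF'
DB2 ClientApp1
EOF
```

For the single dependency in the example, DB2 is listed before ClientApp1. Reversing the output gives a stop order in which dependents are taken offline before the groups they rely on.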
Logical Hostname/IP Failover
A logical hostname, together with the IP address to which it maps, must be associated with a particular DB2 UDB ESE instance. Client programs will access the DB2 database instance using this logical hostname instead of the physical hostname of a server in the cluster. This logical hostname is the entry point to the cluster, and it shields the client program from addressing the physical servers directly. That is, this logical hostname/IP address is cataloged from the DB2 TCP/IP clients (via the catalog TCP/IP node DB2 command).
This logical hostname is configured as a logical hostname resource and must be added to the same resource group as the instance resource. In the case of a failure, the entire resource group, including the instance and the logical host name, will be failed over to the backup server. This floating IP setup provides HA DB2 service to client programs.
Ensure that this hostname maps to an IP address and that this name-to-IP address mapping is configured on all servers in the cluster, preferably in /etc/inet/hosts on each server. More information on configuration for public IP addresses can be found in the Sun Cluster 3.0 Installation Guide.
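As a rough sketch of the client-side cataloging described above, the following function prints the DB2 command line processor commands instead of running them. The node alias hanode, the port 50000, and the database name sample are assumptions chosen for illustration; sc30 is the HA hostname used in the sample configuration later in this chapter.

```shell
#!/bin/sh
# Print (not execute) the DB2 CLP commands a client would use to catalog
# an instance through its logical hostname. All argument values used in
# the example call are illustrative assumptions.
catalog_cmds() {
    logical_host=$1   # HA hostname, never a physical server name
    node_alias=$2     # hypothetical local alias for the node entry
    port=$3           # hypothetical instance service port
    dbname=$4         # hypothetical database name
    echo "db2 catalog tcpip node $node_alias remote $logical_host server $port"
    echo "db2 catalog database $dbname at node $node_alias"
}

catalog_cmds sc30 hanode 50000 sample
```

Because only the logical hostname appears in the catalog entry, the client configuration does not need to change when the resource group moves to another server.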
Considerations for High Availability with DB2 ESE
One Logical Hostname
The DB2-HA package will create one logical hostname resource for a particular DB2 UDB ESE instance, and this logical hostname resource is added to the same resource group as the first partition in the instance (as defined by the first entry in the $INSTHOME/sqllib/db2nodes.cfg). In this case, client programs will use the logical hostname to access this DB2 UDB ESE instance. Therefore, this partition will be the coordinator partition (regardless of where that particular DB2 partition is physically hosted). This is the default install behavior of the DB2-HA package and is the most common configuration scenario.
DB2 UDB ESE is designed with symmetrical data access across partitions in the sense that client programs may access any database partition as an entry point to the DB2 UDB ESE instance and receive the same result sets from their queries, regardless of the coordinator partition used to process the query. Thus, a DB2 UDB ESE installation provides access redundancy (when the DB2 UDB ESE instance exists initially on more than one physical server). Here, the client program can access the DB2 UDB ESE instance through a round-robin selection of all available physical server names for the instance (or for a subset, provided that the subset contains at least two distinct physical servers). In the case of a failover, the DB2 UDB ESE instance can be accessed through any of the remaining healthy host names/IP addresses.
N Logical Hostnames
If the demands of the application require access to a particular DB2 UDB ESE partition, a logical hostname can be associated with each partition. Using Sun Cluster 3.0 administrative commands, each logical hostname resource can be added and grouped with its corresponding DB2 partition resource.
Consequently, the logical hostname/IP address will fail over together with its associated DB2 UDB ESE partition resource. Thus, connections to a logical hostname/IP address will always be associated with a connection to a particular DB2 UDB ESE coordinator partition.
Sun Cluster 3.0 DB2-HA Agent Packages
There are four methods that are used to control the way DB2 UDB is registered, removed, brought online, or taken offline in a Sun Cluster 3.0 environment. Note that, although there exist a number of other components in the package, only these four can be called directly.
regdb2udb
This method will register appropriate resources and resource groups for a specified instance. Note that it will not attempt to bring any resources online. This will usually be the first script called, because it performs all necessary steps to prepare DB2 UDB for Sun Cluster 3.0 control.
unregdb2udb
This method will execute the required Sun Cluster 3.0 commands to remove DB2-HA (including resources and groups registered for this instance) from the cluster. Essentially, this method is the inverse of regdb2udb and will generally be called if the instance is no longer required to be HA.
onlinedb2udb
This method will execute the required Sun Cluster 3.0 commands to bring a DB2-HA instance online. It will not create any resources or resource groups.
offlinedb2udb
This method will execute the required Sun Cluster 3.0 commands to bring a DB2-HA instance offline. It will not remove any resources or groups from the Sun Cluster 3.0 infrastructure.
Note the naming convention of the resources and resource groups and their structure. The instance we have made HA is clearly a two-database partition DB2 UDB instance. The partition numbers (also referred to as DB2 logical database partition numbers) are 0 and 1, and the instance name is db2inst1. For each partition, we can see that exactly one resource group is created, and within that resource group, there is exactly one resource (the HA hostname/IP address has been discussed earlier). This allows for fine-grained control of the movement of the DB2 UDB across the physical servers of the complex.
The naming system is rather mechanical and is chosen to ensure name uniqueness, regardless of the number of instances or partitions that are to be made HA.
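The naming convention is as follows:
DB2 resource: db2_<instance name>_<partition number>-rs (for example, db2_db2inst1_0-rs)
DB2 resource group: db2_<instance name>_<partition number>-rg (for example, db2_db2inst1_0-rg)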
Note the one-to-one mapping of DB2 resources to DB2 resource groups (and of a particular DB2 instance's logical partition resources and the Sun Cluster 3.0 resources).
In addition, there is one HA hostname/IP address. This is the address that, for example, will be used by clients to catalog the databases in this instance. This hostname/IP address (if present) is always associated with the first DB2 resource group for the instance (first when reading the db2nodes.cfg file top to bottom). For a single-partition instance, the address is associated with the only DB2 resource group defined for that instance.
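The one-to-one mapping can be illustrated with a short sketch that derives the expected names for each partition from a db2nodes.cfg-style listing. The name pattern is inferred from the names db2_db2inst1_0-rs and db2_db2inst1_0-rg used in the verification tests later in this section; the two-partition layout fed to the function is sample data, not output from a real cluster.

```shell
#!/bin/sh
# Derive the expected resource and resource group names for each partition
# listed in a db2nodes.cfg-style file read from standard input.
# Pattern assumed: db2_<instance>_<partition>-rs / -rg.
expected_names() {
    instance=$1
    # db2nodes.cfg columns: partition number, hostname, logical port
    awk -v inst="$instance" 'NF >= 2 {
        printf "partition %s on %s: resource db2_%s_%s-rs, group db2_%s_%s-rg\n",
               $1, $2, inst, $1, inst, $1
    }'
}

# Sample two-partition layout using the hostnames from this chapter.
expected_names db2inst1 <<'EOF'
0 sun-ha1 0
1 sun-ha2 0
EOF
```

Comparing this list with the scstat -p output is one quick way to confirm that every partition has its own resource and resource group.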
Sample Configuration Sun Cluster 3.x and DB2 UDB
We will create a simple DB2 UDB environment on a Sun Cluster platform. The DB2 UDB instance name is db2inst1, and the HA hostname is sc30.
It is presumed that:
When using Sun Cluster 3.0 or VCS, ensure that DB2 instances are not started at boot time by using the db2iauto utility, as follows:
db2iauto -off InstName
where InstName is the login name of the instance.
Installation of DB2 binary
The DB2 Universal Database setup utility will install the executable file on the path /opt/IBM/db2.
Prior to performing the install, you must ensure that this mount point is on a global device. This can be accomplished by mounting this path directly or by providing a symbolic link from this path to a global mount point.
For example, on one cluster server, run:
On remaining cluster servers, run:
ln -s /global/scdg2/scdg2/opt/IBM/db2 /opt/IBM/db2
Besides /opt/IBM/db2, /var/db2 can also be placed on a global file system. Some profile registry values and environment variables are stored in the files in /var/db2. Use the db2setup tool to create the instance. Ensure that the instance is not autostarted; the instance start and stop should be under the control of the Sun Cluster infrastructure. Additionally, the DB2 registry setting DB2SYSTEM should refer to the logical hostname, rather than the physical hostname.
The DB2 binary should be installed on a global shared file system. You must also take steps to ensure that the license key is available in the case of failover. You can achieve this in one of two ways:
The /etc/services file reserves a range of ports required for DB2 UDB communications. Ensure that the port range is sufficiently large to support all failover scenarios envisioned. For simplicity, we recommend that you configure the port range to be as large as the number of database partitions in the instance. You must configure the same port range for all cluster servers in the cluster.
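A minimal sketch of a port range check follows. It assumes the reserved range is bracketed by entries named DB2_<instance> and DB2_<instance>_END, which is how DB2 commonly writes them; verify the entry names in your own /etc/services before relying on it. The function names are hypothetical.

```shell
#!/bin/sh
# Report how many ports are reserved for an instance in a services file.
# Assumes the range is bracketed by DB2_<instance> and DB2_<instance>_END
# entries; adjust the names if your /etc/services differs.
port_range_size() {
    instance=$1
    services_file=$2
    awk -v first="DB2_$instance" -v last="DB2_${instance}_END" '
        $1 == first { split($2, a, "/"); lo = a[1] }
        $1 == last  { split($2, b, "/"); hi = b[1] }
        END {
            if (lo == "" || hi == "") { print 0 } else { print hi - lo + 1 }
        }' "$services_file"
}

# Succeeds when the reserved range covers every partition in the instance.
# Arguments: instance name, services file, number of partitions.
range_covers_partitions() {
    [ "$(port_range_size "$1" "$2")" -ge "$3" ]
}
```

For a two-partition instance, range_covers_partitions db2inst1 /etc/services 2 would be run on every server in the cluster, because the range must be identical everywhere.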
The entries that compose the db2nodes.cfg file determine the logical-to-physical mapping of each DB2 logical database partition to the appropriate physical host.
For each database partition that is expected to be subjected to significant disk activity, it is strongly recommended that each partition exist on a physical server with at least one local cluster file system mount point (i.e., for which that physical server is the primary of the cluster file system mount point). The cluster must be configured to give the user remote shell access from every server in the cluster to every server in the cluster (this step is required for multiple database partitions only). Generally, this is accomplished through the creation of a .rhosts file in the instance home directory. When this is completed, remote shell commands should proceed unprompted, and the db2_all command should execute without error.
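As an illustration of the remote shell setup, the sketch below prints one .rhosts line per distinct server named in a db2nodes.cfg-style listing. The "host user" line format is the standard .rhosts layout; the three-partition listing is sample data, and a real cluster may also need entries for additional interface names.

```shell
#!/bin/sh
# Emit one "<host> <user>" line per distinct server named in a
# db2nodes.cfg-style file read from standard input.
rhosts_entries() {
    owner=$1
    awk -v user="$owner" 'NF >= 2 && !seen[$2]++ { print $2, user }'
}

# Sample layout: two partitions on sun-ha1 and one on sun-ha2.
rhosts_entries db2inst1 <<'EOF'
0 sun-ha1 0
1 sun-ha1 1
2 sun-ha2 0
EOF
```

The resulting lines would go into the .rhosts file in the instance home directory.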
As instance owner, issue the following command:
db2_all date
This should return the correct date and time from each server. If not, you'll have to troubleshoot and solve this problem before proceeding.
For the instance in question, issue the following command (again, as the instance owner):
db2start
This should complete successfully. If it does not complete successfully at all servers, that likely means a configuration error. You must review the DB2 UDB ESE Quick Beginnings guide and resolve the problem before proceeding.
Next, attempt to stop the instance with the following command (again, as the instance owner):
db2stop
This should also complete successfully. Again, if it does not complete successfully at all servers, that likely means a configuration error. You must review the DB2 UDB ESE Quick Beginnings guide and resolve the problem before proceeding.
Once you've verified that the instance can be started and stopped, attempt to create the sample database (or an empty test database, if you prefer). Create the database on the path of the global device you plan to use for storage of the actual production database. When you're certain the create database command has completed successfully, remove the test or sample database, using the drop database command.
The instance is now ready to be made HA. Steps to configure:
First, use the regdb2udb utility to register the instance with the Sun Cluster 3.0 infrastructure. Use the scstat commands to investigate the status of the cluster. We should see the necessary resources and resource groups registered.
sun-ha1 # /opt/IBM/db2/V8.1/ha/sc30/util/regdb2udb -a db2inst1 -h sc30
sun-ha1 # scstat -p
The result of the regdb2udb processing is that the appropriate DB2 resources and resource groups are created and registered with Sun Cluster 3.0.
Next, use the onlinedb2udb utility to bring these registered resources online. Use the scstat commands to investigate the status of the cluster. We should see the necessary resources and resource groups online.
sun-ha1 # /opt/IBM/db2/V8.1/ha/sc30/util/onlinedb2udb -a db2inst1 -h sc30
sun-ha1 # scstat -p
The result of the online processing is that the db2inst1 instance (and its associated HA IP address) is online and under the control of Sun Cluster 3.0.
As a result of this, you should see the resources brought online at the appropriate server (for example, on the physical hostname sun-ha1, you should see that it hosts the HA IP address for sc30, as well as the processes for the instance db2inst1, database partition 0).
There are two more supplied scripts that we will now discuss: offlinedb2udb and unregdb2udb.
Typically, you may wish to take a DB2 instance offline in order to remove the DB2 resources from Sun Cluster 3.0 control. For example, you may wish to bring the database engine down for an extended period of time. Directly issuing the appropriate DB2 commands (for example, db2start, db2stop) will be ineffective, because Sun Cluster 3.0 will interpret the absence of resources caused by the successful completion of the db2stop command as a failure and attempt to restart the appropriate database instance resources.
Instead, you must bring the resources offline as follows:
sun-ha1 # offlinedb2udb -a db2inst1 -h sc30
sun-ha1 # scstat -p
As you can see from the scstat output, all resources are now offline. No resources associated with the instance db2inst1 should be present on either server, nor will Sun Cluster 3.0 take any action to protect this instance, should it be brought online manually (i.e., via db2start) and a failure occur.
Let's assume that you've decided to remove this instance permanently from Sun Cluster 3.0 monitoring and control. For this task, you may use the unregdb2udb method. Note that this method merely interfaces with Sun Cluster 3.0 in order to perform these tasks; the instance itself is neither dropped nor removed.
sun-ha1 # unregdb2udb -a db2inst1 -h sc30
sun-ha1 # scstat -p
Configuration of Multiple DB2 Instances
For each additional instance, including the DAS that you wish to make HA, you are required to execute the regdb2udb command to register the instance with Sun Cluster 3.0. If it is desired to enable multiple HA DB2 instances, each HA DB2 instance will require a distinct HA hostname/IP address. This HA hostname/IP address is identified uniquely with exactly one instance.
For the DAS instance, the DAS instance name is used as the instance name argument when running the regdb2udb command. For example:
sun-ha1 # /opt/IBM/db2/V8.1/ha/sc30/util/regdb2udb -a db2as -h daslogicalhostname
Cluster Verification Testing
Testing is an important aspect of an HA cluster. The purpose of testing is to gain some confidence that the HA cluster will function as envisioned for various failure scenarios. What follows is a set of minimum recommended scenarios for cluster testing and verification. These tests should be run regularly to ensure that the cluster continues to function as expected. Timing will vary, depending on production schedules, the degree to which the cluster state evolves over time, and management diligence.
Test 1
In this test, we perform Sun Cluster 3.0 management commands to ensure that the db2inst1 instance can be controlled correctly.
First, verify that the instance is accessible from the clients (or locally) and that various database commands complete successfully (for example, create database).
Take the db2_db2inst1_0-rs and sc30 resources offline, using the following:
scswitch -n -j db2_db2inst1_0-rs
scswitch -n -j sc30
Observe that the DB2 instance resources no longer exist on any server in the cluster, and the HA hostname sc30 is likewise inaccessible.
From the perspective of a client of this instance, the existing DB2 connections are closed, and new ones start up, pending the process to bring online the appropriate DB2 and IP resources.
Test 2
To return the resources to their previous states, bring them online with the following Sun Cluster 3.0 commands:
scswitch -e -j sc30
scswitch -e -j db2_db2inst1_0-rs
DB2 clients waiting in the previous test mode will be able to connect and resubmit transactions to pick up from the last failure. The client program must send retries to accomplish this.
Test 3
At this point, the cluster is again at its initial state. Test the failover of the DB2 instance and associated resources from sun-ha1 onto sun-ha2.
Bring the resources contained within the resource group db2_db2inst1_0-rg offline, using the commands described in Test 1.
Then move the containing resource group to the secondary server, using the following Sun Cluster 3.0 command:
scswitch -z -g db2_db2inst1_0-rg -h sun-ha2
Now attempt to enable the relevant resources, using the same commands described in Test 2.
You should see the DB2 resource for db2inst1 and the associated hostname/IP address now hosted by the secondary machine, sun-ha2. Verify by executing the scstat -p command.
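The manual move described above can be sketched as a single function. By default it only prints the scswitch and scstat commands, in the order given in the text; the RUN variable and the function name are hypothetical conveniences, not part of Sun Cluster or the DB2-HA package.

```shell
#!/bin/sh
# Print the Sun Cluster 3.0 commands for a manual move of one partition's
# resource group to another server, in the order described above.
# Set RUN="" to execute the commands instead of printing them.
RUN=${RUN-echo}

move_group() {
    resource=$1; hostname_res=$2; group=$3; target=$4
    $RUN scswitch -n -j "$resource"             # disable the DB2 resource
    $RUN scswitch -n -j "$hostname_res"         # disable the logical hostname
    $RUN scswitch -z -g "$group" -h "$target"   # switch the group over
    $RUN scswitch -e -j "$hostname_res"         # re-enable the logical hostname
    $RUN scswitch -e -j "$resource"             # re-enable the DB2 resource
    $RUN scstat -p                              # confirm where things are hosted
}

move_group db2_db2inst1_0-rs sc30 db2_db2inst1_0-rg sun-ha2
```

Printing first and executing second makes it easy to review the exact sequence before it touches a production cluster.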
Test 4
Here, test the failover capabilities of the Sun Cluster 3.0 software itself. Bring the resources back into their initial state (i.e., have the db2_db2inst1_0-rg hosted on sun-ha1).
Once the instance and its associated resources are hosted on the sun-ha1 machine, perform a power-off operation on that physical server. This will cause the internal heartbeat mechanism to detect a physical server failure, and the DB2 resources will be restarted on the surviving server.
Verify that the results are identical to those seen in Test 3 (i.e., the DB2 resources should be hosted on sun-ha2, and the clients should behave similarly in both cases).
Test 5
Bring the cluster back to its initial state. In this test, verify that the software monitoring is working as expected. To perform this test, you may issue commands as follows:
ps -ef | grep db2sysc
kill -9 <pid>
or
ps -ef | grep db2tcpcm
kill -9 <pid>
The Sun Cluster 3.0 monitor should detect that a required process is not running and attempt to restart the instance on the same server. Verify that this, in fact, does occur. The client connections should experience a brief delay in service while the restart process continues.
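When picking the process to kill, a plain ps -ef piped into grep also lists the grep process itself. The sketch below is one way to select the PID by instance owner and exact command name. The function name is hypothetical, the column handling is an assumption about typical ps -ef layouts, and the process listing is made-up sample data.

```shell
#!/bin/sh
# Extract the PIDs of a named DB2 process for one instance owner from
# "ps -ef" output on standard input. Matching the owner and the exact
# command name skips "grep db2sysc" style lines.
db2_pids() {
    owner=$1
    procname=$2
    awk -v user="$owner" -v name="$procname" '
        $1 == user {
            # STIME prints as two words ("Jan 01") for older processes on
            # some systems, which shifts the command from column 8 to 9.
            col = ($5 ~ /^[A-Z][a-z][a-z]$/) ? 9 : 8
            if ($col == name) print $2
        }'
}

# Sample listing; on a live system use: ps -ef | db2_pids db2inst1 db2sysc
db2_pids db2inst1 db2sysc <<'EOF'
     UID   PID  PPID  C    STIME TTY      TIME CMD
db2inst1  4711     1  0   Jan 01 ?        0:01 db2sysc
db2inst1  4720  4711  0 10:02:12 ?        0:00 db2tcpcm
db2inst1  4800  4790  0 10:05:40 pts/1    0:00 grep db2sysc
EOF
```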
Note that there is a high number of distinct testing scenarios that can be executed, limited only by your resources and imagination. Those discussed here are the minimum you should run to test the correct functioning of the cluster.
High Availability on HP-UX
HP MC/ServiceGuard monitors the health of each server and quickly responds to failures in a way that minimizes or eliminates application downtime. MC/ServiceGuard is able to detect and respond automatically to failures in the following components:
With HP MC/ServiceGuard, application services and all the resources needed to support the application are bundled into special entities called application packages. These application packages are the basic units that are managed and moved within an enterprise cluster. Packages simplify the creation and management of HA services and provide outstanding levels of flexibility for workload balancing.
Fast Detection of Failure, Fast Restoration of Applications
Within an enterprise cluster, HP MC/ServiceGuard monitors hardware and software components, detects failures, and responds by promptly allocating new resources to support mission-critical applications. The process of detecting the failure and restoring the application service is completely automated; no operator intervention is needed.
Recovery times for failures requiring the switch of an application to an alternate server will vary, depending on the software services being used by the application. For example, a database application that is using a logging facility would need to perform transaction rollbacks as part of the recovery process. The time needed to perform this transaction rollback would be part of the total time to recover the application. MC/ServiceGuard will detect the server failure, reconfigure the cluster, and begin executing the startup script for the application package on an alternate server in a short period of time.
Installation Outline for DB2 and MC/ServiceGuard
The following steps outline the installation process and configuration changes required for DB2 in an HA environment and summarize the full installation as documented.
Configuring the Cluster
All of the MC/ServiceGuard scripts developed during the certification process have been provided in this document. The following section describes the creation and use of these scripts.
Create the ASCII cluster template file:
cmquerycl -v -C /etc/cmcluster/cluster.ascii -n ptac171 -n ptac178
Modify the template (cluster.ascii) to reflect the environment and to verify the cluster configuration:
cmcheckconf -v -C /etc/cmcluster/cluster.ascii
Create the cluster by applying the configuration file. This will create the binary file cmclconfig and automatically distribute it among the servers defined in the cluster:
cmapplyconf -v -C /etc/cmcluster/cluster.ascii
Start the cluster and check the cluster status. Test the cluster halt also:
cmruncl -v -n ptac171 -n ptac178
cmviewcl -v
cmhaltcl -f -v
cmruncl -n ptac171 -n ptac178
Configuring a ServiceGuard Package (on a Single Server)
Create the db2inst1 package configuration file and tailor to the test environment. Do not include the second server at this stage.
cd /etc/cmcluster
mkdir db2inst1
cmmakepkg -p db2inst1.conf
# Edit db2inst1.conf
Create the db2inst1 package control script and tailor to the test environment. Do not include application startup/shutdown, service monitoring, or relocatable IP address at this stage.
cd db2inst1
cmmakepkg -s db2inst1.cntl
Shut down the cluster; verify and distribute the binary configuration files:
cmhaltcl -f -v
cmapplyconf -v -C /etc/cmcluster/cluster.ascii -P \
    /etc/cmcluster/db2inst1/db2inst1.conf
Test cluster and package startup. Shut down DB2 if running, unmount all logical volumes on /dev/db2, and deactivate the volume group first. Copy the db2inst1.cntl and db2inst1.ascii scripts into the /etc/cmcluster/db2inst1 directory.
cmruncl # Start cluster and package
cmviewcl -v # Check that package has started
Edit db2inst1.cntl and assign the dynamic IP address of the db2inst1 package.
cmhaltpkg db2inst1
vi db2inst1.cntl # Edit to add package IP
cmrunpkg -v db2inst1 # Start DB2 package
cmviewcl -v # Check package has started and clients
Enable switching to a local standby LAN card.
vi db2inst1.conf # Net switching enabled = YES
cmapplyconf -v -C /etc/cmcluster/cluster.ascii -P db2inst1.ascii
cmhaltcl -f -v
cmruncl -v
Configuring a ServiceGuard Package (Adding a Second Server)
Enable db2inst1 to switch to a second server by editing the package control file:
vi db2inst1.conf # add SERVER_NAME ptac178
cmapplyconf -v -C /etc/cmcluster/cluster.ascii -P db2inst1/db2inst1.conf
cmhaltcl -f -v
cmruncl -v
Test package switch to ptac178 and back to ptac171
cmhaltpkg db2inst1 # Halt the package
cmrunpkg -n ptac178 db2inst1 # Run package on ptac178; run DB2 and check application
cmmodpkg -e db2inst1 # Enable package switching
cmhaltpkg db2inst1 # Halt the package again
cmrunpkg -n ptac171 db2inst1 # Run package on ptac171 and test that DB2 runs here
cmmodpkg -e db2inst1 # Enable package switching
Configuring DB2 in the MC/ServiceGuard Environment
Once DB2 is installed and configured in the MC/ServiceGuard cluster, the DB2 package scripts can be configured.
In testing the MC/ServiceGuard integration with IBM engineers, the db2inst1.cntl file was configured such that the db2inst1 service has 0 restarts and will fail over to the adoptive server in the case of a software or hardware failure. This number can be changed to suit the needs of each install. It is recommended that the restart value be left at 0. DB2 is a robust product, and if there is a failure, the probability of a successful restart is low. To ensure a stable DB2 operating environment, it is suggested that MC/ServiceGuard be allowed to move the db2inst1 package to an adoptive server in the case of any failure.
The DB2 daemons that are monitored are db2sysc, db2tcpcm, db2srvlst, db2resyn, db2gds, and db2ipccm. Testing was performed in two configurations: one in which the full list of processes above was monitored and one in which only db2sysc was monitored. The results were the same. If any of the other DB2 processes failed, db2sysc failed as well. Because the MC/ServiceGuard monitor script for db2inst1 was monitoring the db2sysc process, the db2inst1 package was moved to the adoptive server in the case of any DB2 process failing.
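The idea of monitoring only db2sysc can be sketched as a simple loop that stays alive while the process exists and returns once it is gone. This is not the monitor script used in the testing described above. CHECK_CMD and INTERVAL are hypothetical variables introduced so the loop can be exercised without a running DB2 instance, and the default check is a plain process-list search.

```shell
#!/bin/sh
# Sketch of a service monitor loop: it keeps running while the db2sysc
# process exists and returns as soon as it disappears, which is the
# signal a cluster manager can act on. CHECK_CMD is a hypothetical hook
# that can be overridden for testing.
CHECK_CMD=${CHECK_CMD:-'ps -ef | grep -v grep | grep -q db2sysc'}
INTERVAL=${INTERVAL:-5}

monitor_db2sysc() {
    while sh -c "$CHECK_CMD"; do
        sleep "$INTERVAL"
    done
    echo "db2sysc not found; exiting so the package can fail over"
    return 1
}
```

With the restart count left at 0 as recommended, the exit of a monitor like this would lead directly to the package being moved to the adoptive server rather than to a local restart attempt.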