The Cluster Application Availability facility is a framework to manage and monitor an application to make it highly available. If an application would normally be restricted to running on one cluster member at a time, CAA can be used to relocate the application from one cluster member to another in order to keep the application running within the cluster at all times.
For example, if you have an application called "NaHA-Widget[2]" that is restricted to running on member1, and member1 is shut down (or crashes), what would happen to "NaHA-Widget"? It would no longer be available to your users.
If, however, the application were placed under the control of CAA and member1 were shut down (or crashed), "NaHA-Widget[3]" would automatically start up on another cluster member.
CAA monitors and manages resources. Resources can be applications, network interfaces, tape devices, or media changer devices. A resource is defined by creating a profile. Once the profile is created, it must be registered with the CAA Resource Manager (caad (8)) before it can be managed.
Figure 23-1 shows the CAA architecture.
Figure 23-1: The CAA Architecture
Each member has a Resource Manager that communicates with the other cluster members' Resource Managers. The Resource Manager monitors the various resource types and manages (starts, stops, relocates) the application resources when certain events occur or other criteria are met. Events can be received from EVM (see Table 23-2) or can result from direct intervention by the cluster administrator (e.g., running a caa_* command; see section 23.3.6). The term "other criteria" refers to an attribute value, defined within a resource's profile, being reached and causing the Resource Manager to take action. We will discuss this further in the following sections.
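To make the event-driven behavior concrete, here is a toy model in Python of what happens when a member leaves the cluster. It is a sketch under stated assumptions, not CAA's actual algorithm: the function name `relocate_on_member_leave` and the "least-loaded survivor" policy are invented for illustration only.

```python
def relocate_on_member_leave(placement, members, departed):
    """Toy model: move resources off a departed member.

    placement maps resource name -> hosting member. The policy used here
    (pick the survivor hosting the fewest resources, ties broken by name)
    is an assumption for illustration, not CAA's placement logic.
    """
    survivors = [m for m in members if m != departed]
    if not survivors:
        raise RuntimeError("no surviving cluster member")
    new_placement = dict(placement)
    for resource, host in placement.items():
        if host != departed:
            continue
        # Count how many resources each survivor currently hosts.
        load = {m: 0 for m in survivors}
        for current_host in new_placement.values():
            if current_host in load:
                load[current_host] += 1
        new_placement[resource] = min(survivors, key=lambda m: (load[m], m))
    return new_placement


before = {"NaHA-Widget": "member1", "clustercron": "member2"}
after = relocate_on_member_leave(before, ["member1", "member2", "member3"], "member1")
print(after)
```

In this model the application that was running on member1 ends up on a surviving member, which is the outcome the chapter describes for a resource under CAA control.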
EVM Event Subscriptions | |
---|---|
Attribute | Event |
CAA | clu.cnx.member.leave |
 | clu.cnx.member.join |
 | clu.cnx.quorum.loss |
 | clu.cnx.quorum.gain |
 | clu.member.add |
 | clu.member.delete |
 | hw.net.down |
 | hw.net.up |
Changer Resource | hw.state_change.media_changer |
 | hw.state_media_changer._hwid.* |
 | hw.deregistered.media_changer._hwid.* |
Network Resources | hw.net.niff.down |
 | hw.net.niff.up |
Tape Resource | hw.state_change.available |
 | hw.state_change.unavailable |
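The table can be read as a lookup from event name to subscriber. The Python sketch below is a reading aid only: the event names are copied from the table, the `subscribers` helper is a hypothetical name, and treating the trailing `*` as a shell-style wildcard is an assumption about how the `_hwid.*` entries are meant to match.

```python
from fnmatch import fnmatchcase

# Event patterns copied from the table; keys are the table's row labels.
# This is not how caad or EVM store subscriptions internally.
SUBSCRIPTIONS = {
    "CAA": [
        "clu.cnx.member.leave", "clu.cnx.member.join",
        "clu.cnx.quorum.loss", "clu.cnx.quorum.gain",
        "clu.member.add", "clu.member.delete",
        "hw.net.down", "hw.net.up",
    ],
    "Changer Resource": [
        "hw.state_change.media_changer",
        "hw.state_media_changer._hwid.*",
        "hw.deregistered.media_changer._hwid.*",
    ],
    "Network Resources": ["hw.net.niff.down", "hw.net.niff.up"],
    "Tape Resource": [
        "hw.state_change.available",
        "hw.state_change.unavailable",
    ],
}


def subscribers(event_name):
    """Return the row labels whose patterns match event_name (hypothetical helper)."""
    return [
        owner
        for owner, patterns in SUBSCRIPTIONS.items()
        if any(fnmatchcase(event_name, pattern) for pattern in patterns)
    ]


print(subscribers("hw.net.niff.down"))
```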
The Resource Manager only monitors and manages those resources that are in the CAA registry (/var/cluster/caa/registry/caa.reg*). In other words, once you create (or modify) a resource, you must register the resource with CAA.
For more information on the CAA Resource Manager, see the caad (8) reference page.
Resource Monitors are shared library plug-ins that the Resource Manager uses to monitor and control a particular resource type. Since CAA supports four resource types (as of this writing), the /var/cluster/caa/monitors directory contains four requisite resource monitors.
Note that as of V5.1A, there is a resource monitor registry (caa_type.reg) where resource monitors are registered with the Resource Manager. This registry is a text file, but do not attempt to edit it.
The resource monitor registry contains non-printable characters, so to see which resource monitors are registered in the caa_type.reg file, we recommend using the strings (1) command.
# strings /var/cluster/caa/registry/caa_type.reg
application
application.so
SCRIPTPATH=/var/cluster/caa/script
network
network.so
NONE
tape
tape.so
NONE
changer
changer.so
NONE
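If you save the strings output, the entries can be grouped for easier reading. The Python sketch below assumes, based solely on the sample output above, that the strings come in groups of three (resource type, shared library, option); that grouping and the field names `type`, `library`, and `option` are our own labels, not documented registry fields.

```python
def parse_monitor_registry(strings_output):
    """Group whitespace-separated strings into (type, library, option) records.

    Assumes three strings per resource monitor, as in the sample output.
    """
    tokens = strings_output.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected groups of three strings")
    records = []
    for i in range(0, len(tokens), 3):
        rtype, library, option = tokens[i:i + 3]
        records.append({
            "type": rtype,
            "library": library,
            # "NONE" appears to mean that no option is set.
            "option": None if option == "NONE" else option,
        })
    return records


SAMPLE_STRINGS = """application application.so SCRIPTPATH=/var/cluster/caa/script
network network.so NONE tape tape.so NONE changer changer.so NONE"""

for record in parse_monitor_registry(SAMPLE_STRINGS):
    print(record["type"], record["library"], record["option"])
```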
The changer, tape, and network resource monitors subscribe to EVM events in order to know when the monitored hardware component has failed or has become available (see Table 23-2).
The resource registry database is located in the /var/cluster/caa/registry directory. The file name may differ depending on the version of the TruCluster Server software that is installed but starts with "caa.reg".
Note | We have seen two file names as of this writing:
|
The resource registry database contains all of the information that the Resource Manager needs to monitor and manage the registered resources. The resource registry database must be updated whenever a resource's profile is modified. If a resource has not been added to the resource registry database, CAA will not know about it.
The resource registry database is a binary file; therefore, the information it contains cannot be easily gleaned by using the cat (1) command or your favorite editor. You could of course get some information by using the strings (1) command, but this would not dump all of the information contained therein.
The easiest way to get information from the resource registry database is to use the caa_stat (1) command.
# caa_stat -t
Name           Type           Target    State     Host
--------------------------------------------------------------
autofs         application    OFFLINE   OFFLINE
cluster_lockd  application    ONLINE    ONLINE    sheridan
clustercron    application    ONLINE    ONLINE    sheridan
dhcp           application    OFFLINE   OFFLINE
named          application    OFFLINE   OFFLINE
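When you want to filter this listing (for example, to see only ONLINE resources), the saved output can be parsed with a few lines of Python. This is a minimal sketch: `parse_caa_stat_t` is a hypothetical helper, and it assumes resource names contain no whitespace and that the Host column is empty for OFFLINE resources, as in the sample above.

```python
def parse_caa_stat_t(text):
    """Parse saved 'caa_stat -t' output into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the prompt line, header, dashed rule, and blank lines.
        if len(fields) < 4 or fields[0] in ("Name", "#"):
            continue
        name, rtype, target, state = fields[:4]
        host = fields[4] if len(fields) > 4 else None
        rows.append({"name": name, "type": rtype, "target": target,
                     "state": state, "host": host})
    return rows


SAMPLE_STAT_T = """\
Name           Type           Target    State     Host
--------------------------------------------------------------
autofs         application    OFFLINE   OFFLINE
cluster_lockd  application    ONLINE    ONLINE    sheridan
clustercron    application    ONLINE    ONLINE    sheridan
dhcp           application    OFFLINE   OFFLINE
named          application    OFFLINE   OFFLINE
"""

online = [r["name"] for r in parse_caa_stat_t(SAMPLE_STAT_T) if r["state"] == "ONLINE"]
print(online)
```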
You can get more in-depth information about a registered resource's attributes by using the "-p" option.
# caa_stat -p clustercron
NAME=clustercron
TYPE=application
ACTION_SCRIPT=clustercron.scr
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=60
DESCRIPTION=clustercron
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REBALANCE=
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60
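Because the "-p" output is a simple list of ATTRIBUTE=value pairs, it is easy to load into a dictionary if you want to compare profiles or check a single attribute. The Python sketch below works on saved output; `parse_profile` is a hypothetical helper name, and all values are kept as strings because the attribute types are not covered until later in the chapter.

```python
def parse_profile(text):
    """Parse saved 'caa_stat -p' output into an attribute dictionary."""
    profile = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines and anything that is not ATTRIBUTE=value.
        if not line or "=" not in line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        profile[key] = value  # empty values (e.g. HOSTING_MEMBERS=) stay ""
    return profile


SAMPLE_PROFILE = """\
NAME=clustercron
TYPE=application
ACTION_SCRIPT=clustercron.scr
CHECK_INTERVAL=60
HOSTING_MEMBERS=
PLACEMENT=balanced
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60
"""

profile = parse_profile(SAMPLE_PROFILE)
print(profile["NAME"], profile["PLACEMENT"], int(profile["CHECK_INTERVAL"]))
```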
We will discuss resource attributes in section 23.4.4, so don't be concerned if none of this makes sense at this stage. Our intent is to show you how to get to the information, not how to interpret it; that will come later in the chapter.
If you are a senior-level-cluster-guru-type-dude (a.k.a., Chief Troubleshooter in TruCluster Server Snoopology (CT2S2)) and would like to dump the raw contents of the resource registry database, you can use a relatively unknown (currently undocumented and hence unsupported) tool located in the /usr/sbin/cluster directory known as caa_dbConvert.
Here is an example of dumping the resource registry database (caa.reg.binaryDB) to a text file (caa.reg.txt) in the /tmp directory.
# cd /var/cluster/caa/registry
# /usr/sbin/cluster/caa_dbConvert DUMP caa.reg.binaryDB /tmp/caa.reg.txt
Although we have not shown the contents of the /tmp/caa.reg.txt, it does contain quite a bit of interesting information.
Note | If you see the following error when using the above-mentioned command, use the full pathname to indicate the resource registry database (or change your directory location as we did in the previous example). |
mmapFile::mapFile, caa.reg.binaryDB, open error
For example:
# caa_dbConvert DUMP /var/cluster/caa/registry/caa.reg.binaryDB /tmp/caa.reg.txt
The resource registry history database is also located in the /var/cluster/caa/registry directory. As with the resource registry database, the file name may differ depending on which version of the TruCluster Server software is installed but starts with "caa.his".
Note | We have seen two file names as of this writing: |
| – V5.0A, V5.1, V5.1A (unpatched) |
| – V5.1A (IPK and above), V5.1B |
The resource registry history database is used to track the failure history of a resource. This database, like the resource registry database, is binary in format, so the best approach to getting information from the database is to use the caa_stat command. Resource failure history can be retrieved using the "-v" option.
# caa_stat -v nicUP
NAME=nicUP
TYPE=network
FAILURE_THRESHOLD=2
FAILURE_COUNT=0 on molari
FAILURE_COUNT=0 on sheridan
TARGET=ONLINE on molari
TARGET=ONLINE on sheridan
STATE=ONLINE on molari
STATE=ONLINE on sheridan
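The "-v" output mixes resource-wide attributes with per-member values (those followed by "on member"). The Python sketch below separates the two from saved output. It is only an illustration: `parse_caa_stat_v` is a hypothetical helper, and the "VALUE on member" layout is inferred from the sample above.

```python
def parse_caa_stat_v(text):
    """Split saved 'caa_stat -v' output into resource-wide and per-member values."""
    resource = {}
    per_member = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line or line.startswith("#"):
            continue
        key, _, rest = line.partition("=")
        value, sep, member = rest.partition(" on ")
        if sep:
            # Per-member attribute, e.g. "FAILURE_COUNT=0 on molari".
            per_member.setdefault(member, {})[key] = value
        else:
            resource[key] = value
    return resource, per_member


SAMPLE_STAT_V = """\
NAME=nicUP
TYPE=network
FAILURE_THRESHOLD=2
FAILURE_COUNT=0 on molari
FAILURE_COUNT=0 on sheridan
TARGET=ONLINE on molari
TARGET=ONLINE on sheridan
STATE=ONLINE on molari
STATE=ONLINE on sheridan
"""

resource, per_member = parse_caa_stat_v(SAMPLE_STAT_V)
for member, values in per_member.items():
    print(member, values["STATE"], values["FAILURE_COUNT"])
```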
As with the resource registry database (if you're a CT2S2, that is), you can dump the resource registry history database using the caa_dbConvert command.
# /usr/sbin/cluster/caa_dbConvert DUMP caa.his.binaryDB /tmp/caa.his.txt
Here is an excerpt of the converted database showing the information for the nicUP resource.
...
__RESOURCE__,nicUP
2_FAILURE_HISTORY,1018678401 1018682922
...
The two numbers are timestamps in seconds since the UNIX epoch. You can convert them to see the last two times the nicUP resource failed:
# perl -e 'foreach $i (1018678401,1018682922)
> { printf ("%s\n", scalar localtime $i) };'
Sat Apr 13 02:13:21 2002
Sat Apr 13 03:28:42 2002
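The same conversion can be sketched in Python if you are examining a dumped history file away from the cluster. The function name `failure_times` is hypothetical, and the fixed UTC-4 offset is an assumption inferred from the output above (the system's time zone is not stated); pass a different `tz` for your own site.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the sample output corresponds to a UTC-4 local time zone.
EXAMPLE_TZ = timezone(timedelta(hours=-4))


def failure_times(history_value, tz=EXAMPLE_TZ):
    """Convert a space-separated list of epoch seconds to readable times."""
    return [
        datetime.fromtimestamp(int(stamp), tz).strftime("%a %b %d %H:%M:%S %Y")
        for stamp in history_value.split()
    ]


for when in failure_times("1018678401 1018682922"):
    print(when)
```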
Figure 23-2 shows the various locations where the majority of CAA-related directories and files are located. Note that we have not included each and every file or directory location. For an exhaustive list you can use the find (1) command.
Figure 23-2: CAA Directories and Files
# find / -name '*[Cc][Aa][Aa]*'
If you happen to have a cluster that is running a patched version of V5.1A, you will see a hybrid directory layout containing some of the files seen in V5.1B. This is due to the work that was done by the CAA Engineering group in support of the "Compaq Database Utility with Oracle9i Real Application Clusters".
Note that the log subdirectory under /var/cluster/caa is obsolete and should no longer be used.
CAA has a command-line interface as well as a graphical user interface (GUI). The easiest way to determine which CAA commands are available (or really what CAA information is available) is to use the man (1) command with the "-k" option (or the apropos (1) command).
# man -k caa
Another option is to use the sman script that we wrote. The sman command is essentially a section-based "man -k" command that also formats the output. The advantage of sman is that you can narrow your search criteria to only the sections in which you are interested. For example, if you only want commands, you can limit your search to sections 1 and 8.
# sman [18] caa
Section Reference Page          Description
------- ----------------------- -----------------------------------------
8       caa_balance             Finds the optimal member for an
                                application resource and relocates the
                                resource to that member if it is not
                                currently placed there.
8       caa_profile             Creates, validates, deletes, and updates
                                a Cluster Application Availability (CAA)
                                resource profile
8       caa_register            Registers a resource with Cluster
                                Application Availability (CAA)
8       caa_relocate            Relocates an application resource from
                                one cluster member to another
1       caa_report              reports availability statistics for
                                application resources
8       caa_start               Starts resources that have been
                                registered with Cluster Application
                                Availability (CAA).
1       caa_stat                Provides status on Cluster Application
                                Availability (CAA) resources within a
                                cluster.
8       caa_stop                Stops a Cluster Application Availability
                                (CAA) application resource
8       caa_unregister          Unregisters a Cluster Application
                                Availability (CAA) resource.
8       caad                    Cluster Application Availability (CAA)
                                daemon
Using the "all" keyword will search all man sections.
# sman all caa
Section Reference Page      Description
------- ------------------- -----------------------------------------
4       caa                 Cluster Application Availability (CAA)
                            information
...
The good news with CAA is that the commands you will be using all start with "caa_". All you need to remember is the action you want to perform.
Note that the GUI does not show up in the output of "man -k" or "sman". This is because the GUI is a sysman (8) application plug-in. Use the "sysman -list" command to find the list of sysman accelerators.
# sysman -list | grep -i caa
| Cluster Application Availability (CAA) Management [caa]
To manage CAA with sysman, use the "caa" accelerator:
# sysman caa
Note | The caa_balance (8) and caa_report (1) commands were added in V5.1B. |
[2]Not a Highly-Available Widget
[3]Now a Highly-Available Widget ☺