23.3 What is Cluster Application Availability?


The Cluster Application Availability facility is a framework to manage and monitor an application to make it highly available. If an application would normally be restricted to running on one cluster member at a time, CAA can be used to relocate the application from one cluster member to another in order to keep the application running within the cluster at all times.

For example, if you have an application called "NaHA-Widget"[2] that is restricted to running on member1, and member1 is shut down (or crashes), what would happen to "NaHA-Widget"? It would no longer be available to your users.

If, however, the application was placed under the control of CAA, and member1 is shut down (or crashes), then "NaHA-Widget"[3] would automatically start up on another cluster member.

CAA monitors and manages resources. Resources can be applications, network interfaces, tape devices, or media changer devices. A resource is defined by creating a profile. Once the profile is created, it must be registered with the CAA Resource Manager (caad (8)) before it can be managed.

23.3.1 CAA Architecture

Figure 23-1 shows the CAA architecture.

Figure 23-1: The CAA Architecture

23.3.2 The Resource Manager (caad)

Each member has a Resource Manager that communicates with the other cluster members' resource managers. The Resource Manager monitors the various resource types and manages (starts, stops, relocates) the application resources when certain events occur or other criteria are met. Events can be those received from EVM (see Table 23-2) or can result from direct intervention by the cluster administrator (e.g., running a caa_* command; see section 23.3.6). The term "other criteria" refers to the case where an attribute value, defined within a resource's profile, is reached, causing the Resource Manager to take action. We will discuss this further in the following sections.

Table 23-2: CAA Components EVM Event Subscription

 Component           Event
 ------------------  -------------------------------------
 CAA                 clu.cnx.member.leave
                     clu.cnx.member.join
                     clu.cnx.quorum.loss
                     clu.cnx.quorum.gain
                     clu.member.add
                     clu.member.delete
                     hw.net.down
                     hw.net.up
 Changer Resource    hw.state_change.media_changer
                     hw.state_media_changer._hwid.*
                     hw.deregistered.media_changer._hwid.*
 Network Resource    hw.net.niff.down
                     hw.net.niff.up
 Tape Resource       hw.state_change.available
                     hw.state_change.unavailable

The Resource Manager only monitors and manages those resources that are in the CAA registry (/var/cluster/caa/registry/caa.reg*). In other words, once you create (or modify) a resource, you must register the resource with CAA.

For more information on the CAA Resource Manager, see the caad (8) reference page.

23.3.3 Resource Monitors

Resource Monitors are shared library plug-ins that the Resource Manager uses to monitor and control a particular resource type. Since CAA supports four resource types (as of this writing), the /var/cluster/caa/monitors directory contains four requisite resource monitors.

Note that as of V5.1A, there is a resource monitor registry (caa_type.reg) where resource monitors are registered with the Resource Manager. This registry is a text file, but do not attempt to edit it.

The resource monitor registry does contain non-printable characters, so to see which resource monitors are listed in the caa_type.reg file, we recommend using the strings (1) command.

 # strings /var/cluster/caa/registry/caa_type.reg
 application application.so SCRIPTPATH=/var/cluster/caa/script
 network network.so NONE
 tape tape.so NONE
 changer changer.so NONE
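If you want to work with that listing in a script, the following is a minimal Python sketch. It assumes the strings output consists of three fields per monitor (resource type, shared library, and one parameter), which is how the listing above reads; the field labels "library" and "parameter" are our own names for illustration, not CAA terms.

```python
def parse_monitor_registry(strings_output):
    """Group whitespace-separated tokens into (type, library, parameter) triples.

    Assumption: every monitor entry has exactly three fields, as in the
    listing above. Real output could contain other stray strings.
    """
    tokens = strings_output.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected tokens in groups of three")
    return {
        tokens[i]: {"library": tokens[i + 1], "parameter": tokens[i + 2]}
        for i in range(0, len(tokens), 3)
    }


# Sample text copied from the listing above.
sample = """\
application application.so SCRIPTPATH=/var/cluster/caa/script
network network.so NONE
tape tape.so NONE
changer changer.so NONE
"""

monitors = parse_monitor_registry(sample)
for resource_type, info in monitors.items():
    print(resource_type, info["library"], info["parameter"])
```

Because the function splits on any whitespace, it does not matter whether strings prints each entry on one line or each field on its own line.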

The changer, tape, and network resource monitors subscribe to EVM events in order to know when the monitored hardware component has failed or has become available (see Table 23-2).

23.3.4 Resource Registry Database

The resource registry database is located in the /var/cluster/caa/registry directory. The file name may differ depending on the version of the TruCluster Server software that is installed but starts with "caa.reg".

Note

We have seen two file names as of this writing:

  • caa.reg: V5.0A, V5.1, V5.1A (unpatched)

  • caa.reg.binaryDB: V5.1A (IPK and above), V5.1B
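Since the file name varies by version, a script should look for the "caa.reg" prefix rather than a fixed name. Here is a minimal Python sketch of that idea; the function name find_registry_files is hypothetical, and the demonstration runs against a throwaway directory rather than a real cluster.

```python
import tempfile
from pathlib import Path


def find_registry_files(registry_dir="/var/cluster/caa/registry", prefix="caa.reg"):
    """Return the sorted names of files in registry_dir that start with prefix."""
    directory = Path(registry_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p.name for p in directory.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


# Demonstration in a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("caa.reg.binaryDB", "caa.his.binaryDB", "caa_type.reg"):
        (Path(tmp) / name).touch()
    demo = find_registry_files(tmp)
    print(demo)
```

Passing prefix="caa.his" instead would pick out the history database in the same way.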

The resource registry database contains all of the information that the Resource Manager needs to monitor and manage the registered resources. The resource registry database must be updated whenever a resource's profile is modified. If a resource has not been added to the resource registry database, CAA will not know about it.

The resource registry database is a binary file, so its contents cannot be easily read with the cat (1) command or your favorite editor. You could of course get some information by using the strings command, but this would not dump all of the information contained therein.

The easiest way to get information from the resource registry database is to use the caa_stat (1) command.

 # caa_stat -t
 Name             Type           Target     State     Host
 --------------------------------------------------------------
 autofs           application    OFFLINE    OFFLINE
 cluster_lockd    application    ONLINE     ONLINE    sheridan
 clustercron      application    ONLINE     ONLINE    sheridan
 dhcp             application    OFFLINE    OFFLINE
 named            application    OFFLINE    OFFLINE
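The tabular output is easy to post-process. The following Python sketch parses text shaped like the listing above into one record per resource; it assumes whitespace-separated columns with an optional Host column, and the function name parse_caa_stat_t is our own.

```python
def parse_caa_stat_t(text):
    """Turn `caa_stat -t` style text into a list of dicts, one per resource."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed rule, and the column header.
        if len(fields) < 4 or fields[:2] == ["Name", "Type"]:
            continue
        name, rtype, target, state = fields[:4]
        # The Host column is empty for resources that are not running.
        host = fields[4] if len(fields) > 4 else None
        rows.append({"name": name, "type": rtype, "target": target,
                     "state": state, "host": host})
    return rows


# Sample text copied from the listing above.
sample = """\
Name             Type           Target     State     Host
--------------------------------------------------------------
autofs           application    OFFLINE    OFFLINE
cluster_lockd    application    ONLINE     ONLINE    sheridan
clustercron      application    ONLINE     ONLINE    sheridan
dhcp             application    OFFLINE    OFFLINE
named            application    OFFLINE    OFFLINE
"""

rows = parse_caa_stat_t(sample)
online = [r["name"] for r in rows if r["state"] == "ONLINE"]
print(online)
```

A similar filter on rows where target and state differ would highlight resources that are not in the state CAA is trying to reach.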

You can get more in-depth information about a registered resource's attributes by using the "-p" option.

 # caa_stat -p clustercron
 NAME=clustercron
 TYPE=application
 ACTION_SCRIPT=clustercron.scr
 ACTIVE_PLACEMENT=0
 AUTO_START=1
 CHECK_INTERVAL=60
 DESCRIPTION=clustercron
 FAILOVER_DELAY=0
 FAILURE_INTERVAL=0
 FAILURE_THRESHOLD=0
 HOSTING_MEMBERS=
 OPTIONAL_RESOURCES=
 PLACEMENT=balanced
 REBALANCE=
 REQUIRED_RESOURCES=
 RESTART_ATTEMPTS=1
 SCRIPT_TIMEOUT=60

We will discuss resource attributes in section 23.4.4; so don't be concerned if none of this makes sense at this stage. Our intent is to show you how to get to the information, not how to interpret it – that will come later in the chapter.

If you are a senior-level-cluster-guru-type-dude (a.k.a., Chief Troubleshooter in TruCluster Server Snoopology (CT2S2)) and would like to dump the raw contents of the resource registry database, you can use a relatively unknown (currently undocumented and hence unsupported) tool located in the /usr/sbin/cluster directory known as caa_dbConvert.

Here is an example of dumping the resource registry database (caa.reg.binaryDB) to a text file (caa.reg.txt) in the /tmp directory.

 # cd /var/cluster/caa/registry
 # /usr/sbin/cluster/caa_dbConvert DUMP caa.reg.binaryDB /tmp/caa.reg.txt

Although we have not shown the contents of the /tmp/caa.reg.txt, it does contain quite a bit of interesting information.

Note

If you see the following error when using the above-mentioned command, use the full pathname to indicate the resource registry database (or change your directory location as we did in the previous example).

 mmapFile::mapFile, caa.reg.binaryDB, open error 

For example:

 # caa_dbConvert DUMP /var/cluster/caa/registry/caa.reg.binaryDB /tmp/caa.reg.txt 

23.3.4.1 Resource Registry History Database

The resource registry history database is also located in the /var/cluster/caa/registry directory. As with the resource registry database, the file name may differ depending on which version of the TruCluster Server software is installed but starts with "caa.his".

Note

We have seen two file names as of this writing:

  • caa.his: V5.0A, V5.1, V5.1A (unpatched)

  • caa.his.binaryDB: V5.1A (IPK and above), V5.1B

The resource registry history database is used to track the failure history of a resource. This database, like the resource registry database, is binary in format, so the best approach to getting information from the database is to use the caa_stat command. Resource failure history can be retrieved using the "-v" option.

 # caa_stat -v nicUP
 NAME=nicUP
 TYPE=network
 FAILURE_THRESHOLD=2
 FAILURE_COUNT=0 on molari
 FAILURE_COUNT=0 on sheridan
 TARGET=ONLINE on molari
 TARGET=ONLINE on sheridan
 STATE=ONLINE on molari
 STATE=ONLINE on sheridan
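Some lines in this output apply to the resource as a whole, while others are reported per member with an "on <member>" suffix. The Python sketch below separates the two; it assumes exactly the line shapes shown above, and the function name parse_caa_stat_v is our own. A value that itself contained " on " would be misread, so treat this as an illustration only.

```python
def parse_caa_stat_v(text):
    """Split `caa_stat -v` style text into general and per-member attributes."""
    general, per_member = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, _, rest = line.partition("=")
        value, sep, member = rest.partition(" on ")
        if sep:
            # e.g. "STATE=ONLINE on molari" is stored under member "molari".
            per_member.setdefault(member.strip(), {})[key] = value
        else:
            general[key] = value
    return general, per_member


# Sample text copied from the listing above.
sample = """\
NAME=nicUP
TYPE=network
FAILURE_THRESHOLD=2
FAILURE_COUNT=0 on molari
FAILURE_COUNT=0 on sheridan
TARGET=ONLINE on molari
TARGET=ONLINE on sheridan
STATE=ONLINE on molari
STATE=ONLINE on sheridan
"""

general, per_member = parse_caa_stat_v(sample)
for member, attrs in sorted(per_member.items()):
    print(member, attrs["STATE"], "failures:", attrs["FAILURE_COUNT"])
```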

As with the resource registry database (if you're a CT2S2, that is), you can dump the resource registry history database using the caa_dbConvert command.

 # /usr/sbin/cluster/caa_dbConvert DUMP caa.his.binaryDB /tmp/caa.his.txt 

Here is an excerpt of the converted database showing the information for the nicUP resource.

 ...
 __RESOURCE__,nicUP
 2_FAILURE_HISTORY,1018678401 1018682922
 ...

The two numbers are failure times expressed in seconds since the UNIX epoch. You can convert them to see the last two times the nicUP resource failed:

 # perl -e 'foreach $i (1018678401,1018682922)
 > { printf ("%s\n", scalar localtime $i) };'
 Sat Apr 13 02:13:21 2002
 Sat Apr 13 03:28:42 2002
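The same conversion can be sketched in Python. Because localtime depends on the time zone of the machine, this version states the zone explicitly. The UTC-4 offset is an assumption on our part, chosen because it reproduces the times printed above; the source does not say which zone the cluster was in.

```python
from datetime import datetime, timedelta, timezone

# Failure timestamps taken from the dumped history excerpt.
failure_history = [1018678401, 1018682922]

# Assumption: the output above was produced in a UTC-4 zone.
assumed_zone = timezone(timedelta(hours=-4))


def format_stamp(stamp, tz):
    """Format epoch seconds in the same layout that Perl's localtime prints."""
    return datetime.fromtimestamp(stamp, tz=tz).strftime("%a %b %d %H:%M:%S %Y")


for stamp in failure_history:
    print(format_stamp(stamp, timezone.utc), "UTC  |",
          format_stamp(stamp, assumed_zone), "(assumed UTC-4)")
```

In UTC the two failures fall at 06:13:21 and 07:28:42 on Sat Apr 13 2002, which is four hours ahead of the times in the Perl output.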

23.3.5 Directory Layout

Figure 23-2 shows the various locations where the majority of CAA-related directories and files are located. Note that we have not included each and every file or directory location. For an exhaustive list you can use the find (1) command.

Figure 23-2: CAA Directories and Files

 # find / -name '*[Cc][Aa][Aa]*' 

If you happen to have a cluster that is running a patched version of V5.1A, you will see a hybrid directory layout containing some of the files seen in V5.1B. This is due to the work that was done by the CAA Engineering group in support of the "Compaq Database Utility with Oracle9i Real Application Clusters".

Note that the log subdirectory under /var/cluster/caa is obsolete and should no longer be used.

23.3.6 CAA Commands

CAA has a command-line interface as well as a graphical user interface (GUI). The easiest way to determine what CAA commands are available (or really what CAA information is available) is to use the man (1) command with the "-k" option (or the apropos (1) command).

 # man -k caa 

Another option is to use the sman script that we wrote. The sman command is essentially a section-based "man -k" command that also formats the output. The advantage of sman is that you can narrow your search criteria to only the sections in which you are interested. For example, if you only want commands, you can limit your search to sections 1 and 8.

 # sman [18] caa
 Section   Reference Page          Description
 -------   ----------------------- -----------------------------------------
 8         caa_balance             Finds the optimal member for an
                                   application resource and relocates the
                                   resource to that member if it is not
                                   currently placed there.
 8         caa_profile             Creates, validates, deletes, and updates
                                   a Cluster Application Availability (CAA)
                                   resource profile
 8         caa_register            Registers a resource with Cluster
                                   Application Availability (CAA)
 8         caa_relocate            Relocates an application resource from
                                   one cluster member to another
 1         caa_report              reports availability statistics for
                                   application resources
 8         caa_start               Starts resources that have been
                                   registered with Cluster Application
                                   Availability (CAA).
 1         caa_stat                Provides status on Cluster Application
                                   Availability (CAA) resources within a
                                   cluster.
 8         caa_stop                Stops a Cluster Application Availability
                                   (CAA) application resource
 8         caa_unregister          Unregisters a Cluster Application
                                   Availability (CAA) resource.
 8         caad                    Cluster Application Availability (CAA)
                                   daemon
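To illustrate the idea of a section-based "man -k" filter, here is a rough Python approximation. It is not the sman script itself. It assumes each "man -k" line has the form "name (section) - description", which may differ on your system, and the function name filter_sections is hypothetical.

```python
import re

# Assumed line shape: "name (section) - description".
ENTRY = re.compile(r"^(?P<name>.+?)\s*\((?P<section>[^)]+)\)\s+-\s+(?P<desc>.*)$")


def filter_sections(apropos_lines, sections):
    """Keep entries whose section starts with one of the characters in sections.

    `sections` mirrors the "[18]" style argument: "18" means sections 1 and 8.
    """
    wanted = set(sections)
    hits = []
    for line in apropos_lines:
        match = ENTRY.match(line.strip())
        if match and match.group("section")[:1] in wanted:
            hits.append((match.group("section"), match.group("name"),
                         match.group("desc")))
    return sorted(hits, key=lambda entry: entry[1])


# Descriptions taken from the listing above; the line layout is assumed.
sample = [
    "caa (4) - Cluster Application Availability (CAA) information",
    "caa_stat (1) - Provides status on Cluster Application Availability "
    "(CAA) resources within a cluster.",
    "caad (8) - Cluster Application Availability (CAA) daemon",
]

hits = filter_sections(sample, "18")
for section, name, desc in hits:
    print(f"{section:<9} {name:<23} {desc}")
```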

Using the "all" keyword will search all man sections.

 # sman all caa
 Section  Reference Page      Description
 -------  ------------------- -----------------------------------------
    4     caa                 Cluster Application Availability (CAA)
                              information
 ...

The good news with CAA is that the commands you will be using all start with "caa_". All you need to remember is the action you want to perform.

Note that the GUI does not show up in the output of "man -k" or "sman". This is because the GUI is a sysman (8) application plug-in. Use the "sysman -list" command to find the list of sysman accelerators.

 # sysman -list | grep -i caa
         | Cluster Application Availability (CAA) Management [caa]

To manage CAA with sysman, use the "caa" accelerator.

 # sysman caa

Note

The caa_balance (8) and caa_report (1) commands were added in V5.1B.

[2] Not a Highly-Available Widget

[3] Now a Highly-Available Widget




TruCluster Server Handbook
TruCluster Server Handbook (HP Technologies)
ISBN: 1555582591
Year: 2005
Pages: 273
