Think of a resource as something that CAA manages and monitors. If you are familiar with earlier versions of TruCluster (ASE and Production Server (PS)), you will know resources as services. In fact, ASE had four service types: Disk, NFS, Tape, and User-Defined. Production Server added a fifth service, the Distributed Raw Disk (the old DRD) service.
TruCluster Server has four types of CAA resources as of this writing: Application, Changer, Network, and Tape.
Can we draw a parallel between the old ASE and Production Server services and CAA resources? Yes and no, but we have not lost any capability. In fact, as you will soon see, the CAA resource (along with the CFS, the new DRD, and Cluster Alias subsystem) adds more flexibility and capability.
In ASE and PS, the service was responsible for a lot more than starting and stopping an application. For example, a disk service was used to manage storage, optionally add and remove IP aliases, optionally import and deport LSM disk groups, and mount and unmount file systems. With TruCluster Server, the DRD manages the storage, the CLSM manages the disk groups, the CFS manages the file systems, and the CLUA manages the aliases.
So what does CAA manage? Resources.
What types of resources exist?
As of this writing, CAA supports four resource types:
Application
Changer
Network
Tape
Application resources must have a resource profile (see section 23.4.3) and an action script (see section 23.4.5).
Non-application resources must have a resource profile (of course, configured and functioning hardware is also important ☺).
Table 23-3 shows what states the resources can have and a description of each state as it relates to each resource. Note, each resource will have a TARGET state and a current STATE.
Resource States | |||
---|---|---|---|
Resource | State | Description of STATE[*] | Description of TARGET[*] |
Application | ONLINE | The application started by the resource is running. | The resource has been set to start (i.e., caa_start was executed). |
Application | OFFLINE | The application started by the resource is not running or the resource has been administratively stopped. | The resource has been set to stop (i.e., caa_stop was executed). |
Application | UNKNOWN | The stop entry point within the action script failed or reached the SCRIPT_TIMEOUT. To set the state back to OFFLINE, use "caa_stop -f resource_name". | Not applicable. |
Network | ONLINE | The network interface is functioning. | The network interface has not failed. |
Network | OFFLINE | The network interface has failed or is misconfigured. | The network resource has reached the FAILURE_THRESHOLD. Once the problem is resolved, the TARGET can be set ONLINE using the caa_start command. |
Tape & Changer | ONLINE | A direct connection to the device (via the SCSI or Fibre Channel bus) exists. | A direct connection to the device (via the SCSI or Fibre Channel bus) exists. |
Tape & Changer | OFFLINE | No direct connection to the device exists. This can be due to a path failure or simply because the cluster member does not have a direct connection to the bus where the device is connected (e.g., a private or semi-shared bus). | The changer or tape resource has reached the FAILURE_THRESHOLD. Once the problem is resolved, the TARGET can be set ONLINE using the caa_start command. |
[*] The STATE and TARGET can be seen using the caa_stat (1) or "sysman caa" commands. |
The profile for a resource contains the attributes that define the resource. The resource profile is an ASCII file so it can be edited using your favorite editor, although you may find it convenient to create it using the caa_profile command or using sysman with the "caa" keyword accelerator.
A resource profile is required for all resource types. Creating a resource profile is not the last step along the way. Once a profile has been created (and any time it is modified), it must be registered with the resource manager.
In the following sections we will show you the attributes that can, and must, be used for each resource type. We will also show you how to register a resource, update the registration when a profile is modified, and unregister a resource that is no longer needed.
Let's start by defining the resource attributes.
As of this writing there are four resource types. Each resource type uses certain attributes. Some attributes are required while others are optional. Some attributes are common to all resource types.
In the following sections, we will list each attribute, state whether or not it is required, identify its default value, and describe how it is used.
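Resource Attribute Category | Section |
Common Resource Attributes | 23.4.4.1 |
Application Resource Attributes | 23.4.4.2 |
Network Resource Attributes | 23.4.4.3 |
Tape/Media Changer Resource Attributes | 23.4.4.4 |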
For the most up-to-date resource attribute information, see the caa (4) reference page.
Table 23-4 lists the resource attributes that are common to each resource type.
Common Resource Attributes | |||
---|---|---|---|
Required? | Attribute | Default | Description |
√ | TYPE=resource-type | none | The resource-type can be one of: application, changer, network, tape. |
√ | NAME=resource-name | none | The resource-name must match the profile file name without the .cap extension, must not be empty or longer than 128 characters, and must not start with a period (see Table 23-11). |
x | DESCRIPTION = description | none | The description is a user-defined string that describes the resource. |
x | FAILURE_THRESHOLD = value | 0 | The value is the number of times that a resource may fail to start within a period of time (see FAILURE_INTERVAL) before CAA will mark the resource as OFFLINE and discontinue monitoring it. Both the FAILURE_THRESHOLD and the FAILURE_INTERVAL must be non-zero for monitoring to occur. A value of 0 (zero) disables failure threshold monitoring. |
x | FAILURE_INTERVAL = value | 0 | The value is the time (in seconds) over which the FAILURE_THRESHOLD is tallied and applied. Both the FAILURE_THRESHOLD and the FAILURE_INTERVAL must be non-zero for monitoring to occur. A value of 0 (zero) disables failure threshold monitoring. |
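The interplay between FAILURE_THRESHOLD and FAILURE_INTERVAL can be modeled with a short shell sketch. This is an illustration only, not CAA's implementation: the function name target_after_failures and its arguments are hypothetical, and the "at least THRESHOLD failures inside the window" rule is an assumption based on the nicUP walk-through in the section on starting resources, where the second failure against a threshold of 2 sets the target to OFFLINE.

```shell
#!/bin/sh
# Illustrative model only; CAA's internal bookkeeping is not shown in this chapter.
# Usage: target_after_failures THRESHOLD INTERVAL NOW FAILURE_TIME...
# Prints OFFLINE when at least THRESHOLD failures fall inside the last
# INTERVAL seconds before NOW, otherwise ONLINE (assumed rule, see lead-in).
target_after_failures() {
    threshold=$1
    interval=$2
    now=$3
    shift 3
    # Either value being 0 disables failure threshold monitoring.
    if [ "$threshold" -eq 0 ] || [ "$interval" -eq 0 ]; then
        echo ONLINE
        return 0
    fi
    count=0
    for t in "$@"; do
        # Count only the failures that are still inside the window.
        if [ $(( now - t )) -lt "$interval" ]; then
            count=$(( count + 1 ))
        fi
    done
    if [ "$count" -ge "$threshold" ]; then
        echo OFFLINE
    else
        echo ONLINE
    fi
}

# FAILURE_THRESHOLD=2, FAILURE_INTERVAL=600, failures at t=100 and t=400.
target_after_failures 2 600 400 100 400
```

With the same settings, two failures that are more than 600 seconds apart would leave the modeled target ONLINE, because the older failure has aged out of the window.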
Table 23-5 and Table 23-6 list the resource attributes specific to an Application resource.
Application Resource Attributes | |||
---|---|---|---|
Required? | Attribute | Default | Description |
√ | ACTION_SCRIPT = script | none | The script is the action script that CAA calls to start, stop, and check the application (see section 23.4.5). |
x | CHECK_INTERVAL = seconds | 60 | The number of seconds that will elapse before CAA calls the action script's check entry point. A value of 0 (zero) indicates no check will be performed. |
x | SCRIPT_TIMEOUT = seconds | 60 | The maximum number of seconds that an action script may take to complete before CAA returns an error status. |
x | PLACEMENT = policy | balanced | The policy can be one of: balanced, favored, restricted. |
Varies (see description) | HOSTING_MEMBERS = members | none | Where members is a white-space delimited list of members where the resource can or must run (see PLACEMENT). List the members in the order that they should be selected. If PLACEMENT is balanced, then HOSTING_MEMBERS is not used. |
x | ACTIVE_PLACEMENT = value | 0 | If value is set to 1, then the placement of the resource is reevaluated per the PLACEMENT attribute when a cluster member rejoins the cluster. See section 23.6.3. |
Application Resource Attributes (continued) | |||
---|---|---|---|
Required? | Attribute | Default | Description |
x | OPTIONAL_RESOURCES = reslist | none | Where reslist is a white-space delimited list of resources that a member should have available if possible. If some or even all of the listed resources are not available on the member, the resource will still start unless there is a member with a greater set of these optional resources available. The maximum number of resources in reslist is currently limited to 58. |
x | REQUIRED_RESOURCES = reslist | none | Where reslist is a white-space delimited list of resources that a member must have available in order for this resource to start. |
x | REBALANCE = time | none | Where time is in the form t:day:hour:min when reevaluation is to occur. |
x | AUTO_START = value | 0 | If value is set to 1, then start the application resource automatically after a member reboot, regardless of the state of the resource prior to the reboot. If value is set to 0 (zero), then start the application resource only if it was ONLINE before the reboot. |
x | RESTART_ATTEMPTS = value | 1 | The value is the number of times that the resource manager will attempt to restart the resource on the member before attempting to relocate the resource. |
x | FAILOVER_DELAY = seconds | 0 | The number of seconds that the resource manager will wait before attempting to relocate the application resource after it has failed. |
Table 23-7 lists the one attribute that is specific to the Network resource type.
Network Resource Attributes | ||
---|---|---|
Required? | Attribute | Description |
√ | SUBNET = xxx.xxx.xxx.xxx | The xxx.xxx.xxx.xxx is the network subnet. The network subnet is the bitwise AND of the IP address and the netmask. |
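The bitwise AND that yields the SUBNET value can be checked with a few lines of shell. The helper below is a minimal sketch; subnet_of is a hypothetical name, not a CAA command, and it expects the netmask in dotted-quad form (ifconfig may print it in hex, for example ffffff00 for 255.255.255.0).

```shell
#!/bin/sh
# subnet_of is a hypothetical helper, not a CAA command.
# It ANDs each octet of a dotted-quad address with the matching netmask octet.
subnet_of() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    set -- $ip $mask      # $1..$4 = address octets, $5..$8 = netmask octets
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Address 192.168.0.69 with netmask 255.255.255.0
subnet_of 192.168.0.69 255.255.255.0    # prints 192.168.0.0
```

The printed value is what would go into the SUBNET attribute of a network resource profile for that interface.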
Table 23-8 lists the one attribute that is specific to both the Tape and Media Changer resources.
Tape/Media Changer Resource Attributes | ||
---|---|---|
Required? | Attribute | Description |
√ | DEVICE_NAME = device-name | The device-name is the tape or media changer device-special file name. You can specify either the full path or only the device-special file name. |
Starting with TruCluster Server version 5.1A-IPK the TYPE_*.cap files in the /var/cluster/caa/template directory have been replaced with a new file format, the type definition file (*.tdf). This change was made primarily to facilitate ease of customization in adding user-defined attributes to application resources.
The type definition file defines the attributes for a resource. The following entries define a resource attribute.
attribute | The name of the attribute. |
switch | The caa_profile command switch that assigns a value to this attribute. |
type | The data type of the attribute. Valid data types include: boolean, file, internet_address, name_list, name_string, positive_integer, string. |
default | The default value of the attribute. |
required | Whether or not the attribute is required. |
For example, here is an excerpt from the application resource type definition file (application.tdf) that defines the AUTO_START attribute.
# grep -p AUTO_START application.tdf
#!==========================
attribute: AUTO_START
type: boolean
switch: -o as
default: 0
required: no
The "AUTO_START" resource attribute is not a required attribute. The value of the attribute is expected to be a Boolean with a default value of 0. The "-o as" switch is the command option that passes the AUTO_START attribute value to the caa_profile (8) command (see section 23.4.6 for more information).
Additionally, user-defined attributes can be added to the application resource type definition file. We will cover this topic in section 23.5.3.3.
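To see how the entries of a stanza fit together, the following sketch pulls the fields of one attribute out of type definition text. It assumes the stanza layout shown in the AUTO_START excerpt (one "key: value" pair per line); tdf_attribute is a hypothetical helper name, and the single-line output format is our own choice.

```shell
#!/bin/sh
# tdf_attribute is a hypothetical helper. It reads type definition text on
# standard input and prints the fields of one attribute stanza on one line.
tdf_attribute() {
    awk -v want="$1" '
        # Each "attribute:" line starts a stanza; note whether it is the one we want.
        $1 == "attribute:" { in_stanza = ($2 == want); next }
        in_stanza && $1 == "type:"     { type = $2 }
        in_stanza && $1 == "switch:"   { sub(/^[ \t]*switch:[ \t]*/, ""); sw = $0 }
        in_stanza && $1 == "default:"  { def = $2 }
        in_stanza && $1 == "required:" { req = $2 }
        END {
            if (type != "")
                printf "%s type=%s switch=\"%s\" default=%s required=%s\n", want, type, sw, def, req
        }
    '
}

# The AUTO_START stanza from application.tdf, fed in as a here-document.
tdf_attribute AUTO_START <<'EOF'
#!==========================
attribute: AUTO_START
type: boolean
switch: -o as
default: 0
required: no
EOF
```

On a cluster member the same function could read the real file instead, for example by redirecting application.tdf to its standard input.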
There are several ways that a resource profile can be created.
Use the caa_profile command.
Use the "sysman caa" command (see section 23.5.2 for an example).
Create a profile with your favorite editor.
Use a combination of the previous options.
This section will explore the first option.
The caa_profile command can be used to create a resource from the default profile template, located in /var/cluster/caa/template, or with the "-I" option to use a template located elsewhere.
There are many, many options to the caa_profile command, but they are logically categorized by primary, secondary, and tertiary options. The primary options are shown in Table 23-9.
The caa_profile(8) Command's Primary Options | |
---|---|
Option | Description |
-create | Create a resource profile |
-delete | Delete a resource profile |
-update | Modify a resource profile |
-template | Create a resource profile template |
-print | Print a resource profile |
-validate | Validate a resource profile to check that it contains no typos. |
There are many additional options to the "-create" switch that directly correlate to the resource attributes that were shown in section 23.4.4. We put together a chart to illustrate the parallels (see Table 23-10).
[Table 23-10: caa_profile "-create" options and their corresponding resource attributes]
Note that resource_name will become the value of the NAME resource attribute, while the string following the "-t" option will become the value of the TYPE resource attribute.
Also of note is the "-B" option that is used to indicate the name of the program (or application) that is to be managed by CAA. If starting your application happens to be more complex than executing one program, it would probably be easier to edit the generated script (or write your own) and not use the "-B" option.
Note | A resource is not managed or monitored by CAA until it is registered. See section 23.4.11 for more information. |
Using the caa_profile command or "sysman caa" instead of editing the profile manually can help you to avoid typos in the resource attribute names.
To validate a resource profile, use the caa_profile command with the "-validate" option.
# caa_profile -validate memberUP
What is validated? Table 23-11 illustrates the attributes that are validated.
CAA Resource Profile Validation | ||
---|---|---|
Attribute | Description | Error Message |
NAME | Must match the profile name exactly sans the .cap extension. | NAME attribute must be the same as filename |
| Must not be greater than 128 characters, empty, or start with a period (.). | Improper Name: .memberUP |
TYPE | Must be a valid resource type. Valid types are: application, changer, network, tape. This attribute is case sensitive. | Invalid Type: guitar |
PLACEMENT | Must be a valid placement policy. Valid placement policies are: balanced, favored, restricted. This attribute is case sensitive. | PLACEMENT invalid: restrict |
HOSTING_MEMBERS | If the placement policy is not balanced then this attribute must exist. | HOSTING_MEMBERS is required for: favored placement policy |
| | HOSTING_MEMBERS is required for: restricted placement policy |
| If the placement policy is balanced then this attribute must not exist. | No HOSTING_MEMBERS is needed for balanced placement policy |
ACTION_SCRIPT | If the profile is for an application then the profile must contain this attribute. | The ACTION_SCRIPT attribute of the resource profile must be set |
SUBNET | If the profile is for a network resource then the profile must contain this attribute. The actual subnet or hardware is not validated, although the subnet number is checked for correct format. | Invalid subset setting for network resource |
DEVICE_NAME | If the profile is for a changer or tape resource then the profile must contain this attribute. The hardware is not checked to see if it exists. | DEVICE_NAME must be set for changer |
| | DEVICE_NAME must be set for tape. |
AUTO_START | If the profile is for an application resource these attributes are checked. The attribute value must be a non-negative number. However, the maximum value is not checked. | AUTO_START Out of range: -1 |
| | FAILURE_INTERVAL out of range: 10 widgets |
OPTIONAL_RESOURCES, REQUIRED_RESOURCES | Attribute value must not contain a colon (:), semicolon (;), or a comma (,). However, the value is not checked to be a list of existing resources. | OPTIONAL_RESOURCES syntax error: nicUP; tapeDrive |
| | REQUIRED_RESOURCES syntax error: nicUP, tapeDrive |
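A few of the rules in Table 23-11 are simple enough to mimic in shell, which can be handy for a quick look at a hand-edited profile before running the real validation. The sketch below covers only the NAME, PLACEMENT, and HOSTING_MEMBERS rules. The helper names (profile_value, check_profile) and the messages they print are hypothetical, and an empty HOSTING_MEMBERS value is treated the same as a missing one, which is a simplification. It is not a replacement for "caa_profile -validate".

```shell
#!/bin/sh
# Hypothetical helpers that mimic three rules from Table 23-11.
profile_value() {
    # Print the value of attribute $2 from profile file $1 (empty if absent).
    sed -n "s/^$2=//p" "$1"
}

check_profile() {
    file=$1
    status=0
    name=$(profile_value "$file" NAME)
    placement=$(profile_value "$file" PLACEMENT)
    hosts=$(profile_value "$file" HOSTING_MEMBERS)

    # Rule: NAME must equal the file name without the .cap extension.
    if [ "$name" != "$(basename "$file" .cap)" ]; then
        echo "NAME does not match file name"
        status=1
    fi

    # The placement rules apply to application resources only.
    if [ "$(profile_value "$file" TYPE)" = application ]; then
        # PLACEMENT defaults to balanced when the profile omits it.
        case ${placement:-balanced} in
            balanced)
                if [ -n "$hosts" ]; then
                    echo "HOSTING_MEMBERS not allowed with balanced"
                    status=1
                fi ;;
            favored|restricted)
                if [ -z "$hosts" ]; then
                    echo "HOSTING_MEMBERS required for $placement"
                    status=1
                fi ;;
            *)
                echo "PLACEMENT invalid: $placement"
                status=1 ;;
        esac
    fi
    return $status
}

# Example: a scratch profile with a misspelled placement policy.
dir=$(mktemp -d)
printf 'NAME=memberUP\nTYPE=application\nPLACEMENT=restrict\n' > "$dir/memberUP.cap"
check_profile "$dir/memberUP.cap" || echo "profile rejected"
rm -rf "$dir"
```

The function returns a non-zero status when any rule fails, so it can be chained in front of other commands with "&&" in the same way the examples in this chapter chain CAA commands.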
Use the caa_profile command with the "-update" option (or use sysman – see section 23.5.2) to modify a resource profile. You also have the option of editing the profile using your favorite editor.
# caa_profile -h | grep update
caa_profile -update resource_name [option ...] [-o option,...] [-q]
You can use the majority of the options listed in Table 23-9. See the caa_profile (8) reference page for more information.
Note | Any time a resource profile is modified the resource registry database must be updated. See section 23.4.12 for more information. |
Printing a profile can be accomplished in three ways.
Use the cat (1) or more (1) command.
Use the button in "sysman caa" (see section 23.5.2).
Use the caa_profile command with the "-print" option.
# caa_profile -h | grep print
caa_profile -print [resource_name [...]] [-q]
# caa_profile -print nicUP
NAME=nicUP
TYPE=network
DESCRIPTION=nicUP
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
SUBNET=192.168.0.0
Deleting a profile can be accomplished in three ways:
Use the rm (1) command to delete the profile.
Use the button in "sysman caa" (see section 23.5.2).
Use the caa_profile command with the "-delete" option.
# caa_profile -delete nicUP
Note | You cannot delete a profile that is associated with a registered resource. You'll receive the following error: |
Can not delete profile for resource nicUP as it is currently registered.
See section 23.4.13 for more information on unregistering a resource.
Caution | Prior to V5.1B, using caa_profile with the "-delete" option to remove an application resource's profile will also delete the resource's action script! Note, "sysman" will do the same thing.
# caa_profile -delete memberUP
# ls /var/cluster/caa/script/memberUP.scr
ls: /var/cluster/caa/script/memberUP.scr not found |
For more information on the caa_profile command, see the caa_profile (8) reference page.
Once you have created and edited a resource profile, the resource must be registered with CAA before it can be managed or monitored.
Note | Before registering a resource with CAA you should always do the following:
|
You can register a profile using the caa_register command.
# caa_register myResource
For more information on registering resources, see the caa_register (8) reference page.
Any time that modifications are made to a resource profile, CAA must be notified. Registered profiles are stored in the caa.reg database in /var/cluster/caa/registry and not read from the profile directory.
If the profile is for an application resource, the profile can be updated using the caa_register command with the "-u" option.
# caa_register -u myApplicationResource
Note | Prior to V5.1B, if the resource is a non-application resource, then the resource must be unregistered and then registered. Since non-application resources are only used as a dependency for an application resource, this presents a bit of an inconvenience in that the application resource must also be unregistered and then registered. For example, say we have a network resource (nicUP) that is a REQUIRED_RESOURCE for an application resource (memberUP). When we attempt to unregister the profile we receive an error.
# caa_unregister nicUP
Can't unregister 'nicUP' because it is required by other resources.
Could not unregister resource nicUP. |
So we must unregister every resource that depends on nicUP.
Find the resources that require the nicUP resource.
# grep nicUP *.cap | grep -v nicUP.cap
memberUP.cap:REQUIRED_RESOURCES=nicUP
If the resource is running, stop it.
# caa_stat -a memberUP -r && caa_stop memberUP
Attempting to stop 'memberUP' on member 'molari'
Stop of 'memberUP' on member 'molari' succeeded.
Note that we used the "-r" option with the "-a" option to the caa_stat command to see if the resource was running.
If the resource is registered, unregister it.
# caa_stat -a memberUP -g && caa_unregister memberUP
Note that we used the "-g" option with the "-a" option to the caa_stat command to see if the resource was registered.
Unregister and register the non-application resource.
# caa_unregister nicUP && caa_register nicUP
Register and start the application resource.
# caa_register memberUP && caa_start memberUP
Attempting to start 'nicUP' on member 'molari'
Start of 'nicUP' on member 'molari' succeeded.
Attempting to start 'memberUP' on member 'molari'
Start of 'memberUP' on member 'molari' succeeded.
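The steps above can be sketched as a small shell function that prints the command sequence for a given non-application resource. This is a dry-run aid only: reregister_plan is a hypothetical name, the profile directory is passed in as an argument rather than assumed, and the function never runs any CAA command itself. Note that, like the walk-through, the printed plan starts every dependent at the end, so drop the caa_start for any resource that should stay stopped.

```shell
#!/bin/sh
# reregister_plan is a hypothetical helper. It only prints the commands from
# the steps above; it never runs them.
# Usage: reregister_plan RESOURCE PROFILE_DIR
reregister_plan() {
    dep=$1
    dir=$2
    dependents=""
    for f in "$dir"/*.cap; do
        [ -f "$f" ] || continue
        res=$(basename "$f" .cap)
        [ "$res" = "$dep" ] && continue
        # A dependent lists the resource in its REQUIRED_RESOURCES attribute
        # (a white-space delimited list), so split it into one name per line.
        if sed -n 's/^REQUIRED_RESOURCES=//p' "$f" | tr -s ' \t' '\n\n' | grep -Fqx "$dep"; then
            dependents="$dependents $res"
        fi
    done
    # Stop and unregister each dependent application resource.
    for res in $dependents; do
        echo "caa_stat -a $res -r && caa_stop $res"
        echo "caa_stat -a $res -g && caa_unregister $res"
    done
    # Unregister and register the non-application resource.
    echo "caa_unregister $dep && caa_register $dep"
    # Register and start the dependents again.
    for res in $dependents; do
        echo "caa_register $res && caa_start $res"
    done
}

# Example (the directory argument is whatever directory holds your *.cap files):
#   reregister_plan nicUP .
```

Printing the plan first makes it easy to review which application resources will be interrupted before anything is stopped.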
For more information on updating the resource registry, see the caa_register (8) reference page.
If a resource is no longer needed, it can be unregistered so that CAA will no longer manage or monitor the resource. To unregister, use the caa_unregister command.
# caa_unregister myResource
Note that the resource must be stopped before it can be unregistered.
# caa_unregister myResource
Could not unregister resource myResource.
Also note that this does not remove the profile or the action script from the /var/cluster/caa subdirectories; it merely removes the profile from the CAA registry database (caa.reg).
For more information on unregistering resources, see the caa_unregister (8) reference page.
Once you have registered a resource it must be started. When an application resource is started, the "start" entry point in the action script is called to start the application. To start a resource, use the caa_start command.
# caa_start memberUP
You can start a resource on a particular member using the "-c" option.
# caa_start -c sheridan memberUP
You can start all registered resources using the "-all" option.
# caa_start -all
You can modify user-defined resource attributes when starting a resource as well.
# caa_start USR_ALIAS_IP=16.60.45.10 aliasAPP
See section 23.5.3.3 for more information on user-defined resource attributes.
If you receive the following error, the "start" entry point failed (i.e., a non-zero value was returned), but the "stop" entry point ran successfully. Note, when the "start" entry point fails, the "stop" entry point is automatically called.
# caa_start memberUP
Attempting to start 'memberUP' on member 'molari'
Start of 'memberUP' on member 'molari' failed.
Attempting to start 'memberUP' on member 'sheridan'
Start of 'memberUP' on member 'sheridan' failed.
No more members to consider
Could not start resource memberUP.
Note that the target state remains ONLINE, yet the current state is OFFLINE.
# caa_stat -t -v memberUP
Name           Type           R/RA   F/FT   Target    State     Host
------------------------------------------------------------------------
memberUP       application    0/1    0/0    ONLINE    OFFLINE
If you receive the following error, both the "start" and "stop" entry points failed.
# caa_start memberUP
Attempting to start 'memberUP' on member 'molari'
'memberUP' on member 'molari' has experienced an unrecoverable failure.
Human intervention required to resume its availability.
Could not start resource memberUP.
Note that in this case, since the "stop" entry point also failed, CAA no longer knows the state of the resource.
# caa_stat -t -v memberUP
Name           Type           R/RA   F/FT   Target    State     Host
--------------------------------------------------------------------------
memberUP       application    0/1    0/0    ONLINE    UNKNOWN
A resource cannot be started from an UNKNOWN state. It must be forcefully stopped first (see section 23.4.15).
One final note on starting resources: non-application resources typically do not need to be started in order for them to be ONLINE. However, if a non-application resource reaches its FAILURE_THRESHOLD, its target state will be set to OFFLINE. If a non-application resource is in an OFFLINE target state, it can be set to an ONLINE target state using the caa_start command. Note, however, that you will need to correct the problem that forced the resource to an OFFLINE state in the first place before it will go ONLINE.
For example, we have a network resource (nicUP) that is currently ONLINE.
# caa_stat -t -v nicUP
Name           Type           R/RA   F/FT   Target    State     Host
-------------------------------------------------------------------------
nicUP          network        -      0/2    ONLINE    ONLINE    molari
nicUP          network        -      1/2    ONLINE    ONLINE    sheridan
We have set the FAILURE_THRESHOLD to 2 and the FAILURE_INTERVAL to 600. In other words, if nicUP fails twice within a ten-minute period, its target state will be set to OFFLINE.
# caa_stat -p nicUP
NAME=nicUP
TYPE=network
DESCRIPTION=nicUP
FAILURE_INTERVAL=600
FAILURE_THRESHOLD=2
SUBNET=192.168.0.0
We will literally pull the plug on the network associated with nicUP (which is our tu0 interface on sheridan).
# ifconfig tu0
tu0: flags=c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX>
     inet 192.168.0.69 netmask ffffff00 broadcast 192.168.0.255 ipmtu 1500
Note that the tu0 inet address is 192.168.0.69, and the network subnet nicUP is configured to use is 192.168.0.0.
The plug is pulled.
# caa_stat -t -v nicUP
Name           Type           R/RA   F/FT   Target    State     Host
-------------------------------------------------------------------------------
nicUP          network        -      0/2    ONLINE    ONLINE    molari
nicUP          network        -      1/2    ONLINE    OFFLINE   sheridan
Failure #1 has occurred. We plug the interface back in and wait for the state to return to ONLINE.
# caa_stat -t -v nicUP
Name           Type           R/RA   F/FT   Target    State     Host
-------------------------------------------------------------------------------
nicUP          network        -      0/2    ONLINE    ONLINE    molari
nicUP          network        -      1/2    ONLINE    ONLINE    sheridan
Okay, pull the plug again. This will induce the second failure within the FAILURE_INTERVAL, which will force the target state to OFFLINE.
# caa_stat -t -v nicUP
Name           Type           R/RA   F/FT   Target    State     Host
-------------------------------------------------------------------------------
nicUP          network        -      0/2    ONLINE    ONLINE    molari
nicUP          network        -      2/2    OFFLINE   OFFLINE   sheridan
Plug the interface back in. Note that the target state remains OFFLINE.
# caa_stat -t -v nicUP
Name           Type           R/RA   F/FT   Target    State     Host
-------------------------------------------------------------------------------
nicUP          network        -      0/2    ONLINE    ONLINE    molari
nicUP          network        -      0/2    OFFLINE   OFFLINE   sheridan
If we now attempt to start a resource that has a dependency on nicUP, note the results.
# caa_start memberUP
molari : Resource memberUP (application) cannot run on molari
sheridan : Resource nicUP (network) is not available on sheridan
Resource memberUP has placement error.
The reason for the first error message is that we restricted memberUP to run only on sheridan. We did this purely to illustrate that a resource that is dependent upon a REQUIRED_RESOURCE will be unable to start if the target state of the REQUIRED_RESOURCE is set to OFFLINE, which is illustrated by the second error message.
Here is a look at the pertinent entries in the memberUP profile:
# caa_profile -print memberUP | grep -E "^REQ|^HOST|^PLACE"
HOSTING_MEMBERS=sheridan
PLACEMENT=restricted
REQUIRED_RESOURCES=nicUP
To solve this problem, simply set the target state of the nicUP resource to ONLINE by using the caa_start command.
# caa_start nicUP memberUP
'nicUP' re-enabled on member 'sheridan'
Attempting to start 'nicUP' on member 'sheridan'
Start of 'nicUP' on member 'sheridan' succeeded.
Attempting to start 'memberUP' on member 'sheridan'
Start of 'memberUP' on member 'sheridan' succeeded.
# caa_stat -t -v nicUP memberUP
Name           Type           R/RA   F/FT   Target    State     Host
------------------------------------------------------------------------------
memberUP       application    0/1    0/0    ONLINE    ONLINE    sheridan
nicUP          network        -      0/2    ONLINE    ONLINE    molari
nicUP          network        -      0/2    ONLINE    ONLINE    sheridan
Note | When a dependency comes ONLINE, any dependents that have their TARGET set to ONLINE will also be started. |
For additional information regarding starting a resource, see the caa_start (8) reference page.
Unless an application resource is in an UNKNOWN state, you can stop it using the caa_stop command. Note, only application resources can be stopped.
# caa_stop memberUP
Attempting to stop 'memberUP' on member 'sheridan'
Stop of 'memberUP' on member 'sheridan' succeeded.
If the "stop" entry point fails, the resource is placed in an UNKNOWN state. In order to get the resource state back to OFFLINE, the resource must be forcefully stopped using the "-f" option.
Here's an example where we hacked the "stop" entry point to immediately exit with status of 1 (anything except a status of zero is considered a failure).
# caa_stop memberUP
Attempting to stop 'memberUP' on member 'sheridan'
'memberUP' on member 'sheridan' has experienced an unrecoverable failure.
Human intervention required to resume its availability.
# caa_stat -t memberUP
Name           Type           Target    State     Host
------------------------------------------------------------
memberUP       application    OFFLINE   UNKNOWN   sheridan
The "Human intervention required" message means that you will need to determine the cause of the failure (and may need to stop your application manually). Additionally, you may want to modify the action script to automatically handle (if possible) the situation that caused the failure to occur in the first place so that "human intervention" will not be required in the future.
Once the cause of the failure is addressed, you must use the caa_stop command with the "-f" option to get the resource state set to OFFLINE.
# caa_stop -f memberUP && caa_stat -t memberUP
Attempting to stop 'memberUP' on member 'sheridan'
Stop of 'memberUP' on member 'sheridan' succeeded.
Name           Type           Target    State     Host
------------------------------------------------------------
memberUP       application    OFFLINE   OFFLINE
You can now start the resource using the caa_start command (see section 23.4.14).
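When several resources are involved, it can help to pick out the ones stuck in UNKNOWN from the tabular caa_stat output. The filter below is a minimal sketch: unknown_resources is a hypothetical helper, and it assumes the five-column "caa_stat -t" layout shown above (Name, Type, Target, State, Host) with two header lines.

```shell
#!/bin/sh
# unknown_resources is a hypothetical helper. It reads "caa_stat -t" output on
# standard input and prints the name of each resource whose State is UNKNOWN.
unknown_resources() {
    # Columns without -v: Name Type Target State Host; skip the two header lines.
    awk 'NR > 2 && $4 == "UNKNOWN" { print $1 }'
}

# Sample input taken from the failed-stop example above.
unknown_resources <<'EOF'
Name           Type           Target    State     Host
------------------------------------------------------------
memberUP       application    OFFLINE   UNKNOWN   sheridan
EOF
```

On a cluster member the input would come from a pipe, for example "caa_stat -t memberUP | unknown_resources". Each name printed is a candidate for "caa_stop -f" once the cause of the failure has been addressed.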
You can modify user-defined resource attributes when stopping a resource as well.
# caa_stop USR_STOP_CODE=Because memberUP
See section 23.5.3.3 for more information on user-defined resource attributes.
For more information on stopping a resource, see the caa_stop (8) reference page.
An application resource will automatically relocate to an available cluster member if the member where it is running fails. However, there may be occasions when you would like to relocate the resource yourself. An application resource can be relocated to another member (including a non-favored member) by using the caa_relocate command.
# caa_relocate memberUP
Attempting to stop 'memberUP' on member 'sheridan'
Stop of 'memberUP' on member 'sheridan' succeeded.
Attempting to start 'nicUP' on member 'molari'
Start of 'nicUP' on member 'molari' succeeded.
Attempting to start 'memberUP' on member 'molari'
Start of 'memberUP' on member 'molari' succeeded.
Note that only application resources can be relocated.
If you want to relocate the application resource to a specific member, you can use the "-c" option (output not shown).
# caa_relocate -c sheridan memberUP
You can modify user-defined resource attributes when relocating a resource.
# caa_relocate USR_ALIAS_IP=192.168.0.74 aliasAPP
See section 23.5.3.3 for more information on user-defined resource attributes.
If an application resource's PLACEMENT is set to "restricted" and there are no other HOSTING_MEMBERS available, you will see the following error.
# caa_relocate memberUP
molari : Resource memberUP (application) cannot run on molari
Resource memberUP has placement error.
The resource will continue running on the member where it is currently placed.
A final note regarding relocation: you can use the "-s" option to relocate all the resources currently running on a given member to another member.
# caa_relocate -s molari
Attempting to stop 'memberUP' on member 'molari'
Stop of 'memberUP' on member 'molari' succeeded.
Attempting to start 'nicUP' on member 'sheridan'
Start of 'nicUP' on member 'sheridan' succeeded.
Attempting to start 'memberUP' on member 'sheridan'
Start of 'memberUP' on member 'sheridan' succeeded.
Attempting to stop 'cluster_lockd' on member 'molari'
Stop of 'cluster_lockd' on member 'molari' succeeded.
Attempting to start 'cluster_lockd' on member 'sheridan'
Start of 'cluster_lockd' on member 'sheridan' succeeded.
Note, resources with a PLACEMENT of "restricted" will not relocate unless another member in the HOSTING_MEMBERS attribute is available. Also, a resource's REQUIRED_RESOURCES must be available on the target member in order for the relocation to succeed.
For more information regarding resource relocation, see the caa_relocate (8) reference page.
Placement of an application resource, based on the load of a particular cluster member, can be accomplished using one of the methods shown in Table 23-12.
Application Resource Placement and Load-Balancing | ||||||
---|---|---|---|---|---|---|
Approach | | Description | V5.1B | V5.1A | V5.1 | V5.0A |
command | caa_start | The cluster administrator runs the caa_start command and the resource is optimally placed based on its PLACEMENT while modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies. | √ | √ | √ | √ |
command | caa_relocate | The cluster administrator runs the caa_relocate command and the resource is optimally placed based on its PLACEMENT while modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies. | √ | √ | √ | √ |
command | caa_balance | The cluster administrator runs the caa_balance command and the placement of resources is reevaluated by the Resource Manager. Load-balancing can be reevaluated for the set of application resources listed, for the application resources on the member listed, or for all application resources in the cluster. | √ | x | x | x |
cluster | formation | When the cluster is formed, all resources with a TARGET state of ONLINE, or a TARGET state of OFFLINE and AUTO_START set to 1, will be started and optimally placed based on their PLACEMENT while modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies. | √ | √ | √ | √ |
cluster | member join | When a member joins the cluster, any resource with an ACTIVE_PLACEMENT value of 1 will be reevaluated based on its PLACEMENT while modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies and possibly relocated. Note that AUTO_START is also evaluated (see formation). | √ | √ | √ | √ |
cluster | member leave | When a member leaves the cluster, any resource that was running on that member will be optimally placed on another member based on the resource's PLACEMENT while modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies. This can result in a resource being stopped rather than relocated to another member. | √ | √ | √ | √ |
time | | An application resource's placement is reevaluated for optimal placement at the time set in the REBALANCE attribute, if set. | √ | x | x | x |
There are three commands that cause CAA to evaluate the placement and balance of application resources:
The caa_balance command.
The caa_start command (covered in section 23.4.14).
The caa_relocate command (covered in section 23.4.16).
Starting in V5.1B, the caa_balance command can be used at any time to reevaluate the placement of application resources within the cluster.
# caa_balance -all
Attempting to stop 'cluster_lockd' on member 'alph11'
Stop of 'cluster_lockd' on member 'alph11' succeeded.
Attempting to start 'cluster_lockd' on member 'alph12'
Start of 'cluster_lockd' on member 'alph12' succeeded.
Resource clustercron is already well placed
Resource memberUP is already well placed
Resource powerUP is already well placed
clustercron is placed optimally. No relocation is needed.
memberUP is placed optimally. No relocation is needed.
powerUP is placed optimally. No relocation is needed.

# caa_balance -s molari
Attempting to stop 'cluster_lockd' on member 'alph12'
Stop of 'cluster_lockd' on member 'alph12' succeeded.
Attempting to start 'cluster_lockd' on member 'alph11'
Start of 'cluster_lockd' on member 'alph11' succeeded.
Resource memberUP is already well placed
Resource powerUP is already well placed
memberUP is placed optimally. No relocation is needed.
powerUP is placed optimally. No relocation is needed.

# caa_balance memberUP powerUP
Resource memberUP is already well placed
Attempting to stop 'powerUP' on member 'alph11'
Stop of 'powerUP' on member 'alph11' succeeded.
Attempting to start 'powerUP' on member 'alph12'
Start of 'powerUP' on member 'alph12' succeeded.
memberUP is placed optimally. No relocation is needed.
For more information on application resource load-balancing, see the caa_balance (8) reference page as well as the TruCluster Server Cluster Highly Available Applications guide.
When the cluster is formed, or a member joins or leaves the cluster, CAA will balance the application resource's load based on the following criteria:
When the cluster is formed, resources are started and load-balanced based on:
The AUTO_START attribute.
The ACTIVE_PLACEMENT attribute.
The PLACEMENT and HOSTING_MEMBERS attributes.
The REQUIRED_RESOURCES and OPTIONAL_RESOURCES attributes.
If a member joins the cluster, resources are load-balanced based on:
The AUTO_START attribute.
The ACTIVE_PLACEMENT attribute.
The PLACEMENT and HOSTING_MEMBERS attributes.
The REQUIRED_RESOURCES and OPTIONAL_RESOURCES attributes.
If a member leaves the cluster, resources fail over (and are load-balanced) based on:
The PLACEMENT and HOSTING_MEMBERS attributes.
The REQUIRED_RESOURCES and OPTIONAL_RESOURCES attributes.
CAA determines which resources to start as follows:
If a resource's TARGET state is ONLINE, start the resource.
If a resource's TARGET state is OFFLINE, but AUTO_START is set to 1, start the resource.
When CAA starts resources, it must place each resource on a member that meets the criteria set forth in the resource's profile, taking into account the number of resources currently running on each cluster member.
Determine the PLACEMENT of the resource. If PLACEMENT is "favored" or "restricted", then determine what HOSTING_MEMBERS are available. Note that you should list the members in the HOSTING_MEMBERS attribute in the order you want the resource placed.
PLACEMENT is "balanced"
No Dependencies
Place the resource on the member with the fewest number of online resources.
OPTIONAL_RESOURCES
Place the resource on the member that has the requisite resource(s) available AND has the fewest number of resources running. If there are no members that have the requisite resource(s) available, start on the member running the least number of resources.
REQUIRED_RESOURCES
Place the resource on the member that has the requisite resource(s) available AND has the fewest number of resources running. If no members have the requisite resource(s) available, do not start.
PLACEMENT is "favored"
No Dependencies
Place the resource on the favored member that is earliest in the HOSTING_MEMBERS list. If there are no favored members available, place the resource on a non-favored member.
OPTIONAL_RESOURCES
Place the resource on the favored member that has the requisite resource(s) available AND is the earliest member in the HOSTING_MEMBERS list.
If there are no favored members available, but there is a non-favored member with the requisite resource(s), then start the resource on the non-favored member.
If there are no members with the requisite resource(s) available, start the resource on a favored member anyway.
If there are no favored members available AND no non-favored members available with the requisite resource(s) available, start the resource on a non-favored member anyway.
REQUIRED_RESOURCES
Place the resource on the favored member that has the requisite resource(s) available AND is the earliest member in the HOSTING_MEMBERS list. If there are no favored members with the requisite resource(s) available, but there is a non-favored member with the requisite resource(s), then start the resource on the non-favored member. If no members have the requisite resource(s) available, do not start.
PLACEMENT is "restricted"
No Dependencies
Place the resource on the restricted member that is earliest in the HOSTING_MEMBERS list. If there are no restricted members available, do not start.
OPTIONAL_RESOURCES
Place the resource on the restricted member that has the requisite resource(s) available AND is the earliest member in the HOSTING_MEMBERS list. If no restricted members have the requisite resource(s) available, start on a restricted member anyway.
REQUIRED_RESOURCES
Place the resource on the restricted member that has the requisite resource(s) available AND is the earliest member in the HOSTING_MEMBERS list. If no restricted members have the requisite resource(s) available, do not start.
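The placement rules above can be sketched in a few lines of Python. This is only an illustration of the decision order described in this section, not CAA's actual implementation: the Member and Profile classes and the choose_member function are hypothetical names, and two points the text leaves open are filled in by assumption (a non-favored member is chosen by fewest online resources, and a member that has the optional resources is preferred over a favored member that lacks them).

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical view of one cluster member that is currently up."""
    name: str
    online_count: int                            # resources currently ONLINE here
    available: set = field(default_factory=set)  # dependency resources usable here


@dataclass
class Profile:
    """Hypothetical subset of an application resource profile."""
    placement: str = "balanced"                  # "balanced", "favored", "restricted"
    hosting_members: list = field(default_factory=list)
    required: list = field(default_factory=list)
    optional: list = field(default_factory=list)


def choose_member(profile, members):
    """Return the name of the member to start on, or None for 'do not start'."""
    def has_all(member, names):
        return all(n in member.available for n in names)

    # REQUIRED_RESOURCES is a hard constraint: drop members lacking any of them.
    eligible = [m for m in members if has_all(m, profile.required)]

    if profile.placement == "balanced":
        # Fewest online resources first.
        ordered = sorted(eligible, key=lambda m: m.online_count)
    else:
        by_name = {m.name: m for m in eligible}
        # Keep the order in which HOSTING_MEMBERS lists the members.
        listed = [by_name[n] for n in profile.hosting_members if n in by_name]
        if profile.placement == "restricted":
            ordered = listed
        else:
            # "favored": fall back to non-favored members (assumed: least loaded first).
            others = sorted(
                (m for m in eligible if m.name not in profile.hosting_members),
                key=lambda m: m.online_count,
            )
            ordered = listed + others

    if not ordered:
        return None

    # OPTIONAL_RESOURCES is only a preference: use it if any candidate qualifies,
    # otherwise start on the best remaining candidate anyway.
    preferred = [m for m in ordered if has_all(m, profile.optional)]
    return (preferred or ordered)[0].name
```

For example, given molari with three online resources and nicUP available, and sheridan with one online resource and no dependencies available, a "balanced" profile with no dependencies lands on sheridan, while the same profile with nicUP in OPTIONAL_RESOURCES lands on molari.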
Placement and load balancing will only take place if there are resources that are currently ONLINE that have an ACTIVE_PLACEMENT set to 1.
If a resource has the ACTIVE_PLACEMENT attribute set to 1, then the resource might relocate to the joining member. Whether or not the resource will actually relocate to the joining member is determined by CAA based on the resource's PLACEMENT, HOSTING_MEMBERS, OPTIONAL_RESOURCES, and REQUIRED_RESOURCES attributes as discussed in section 23.4.17.2.1.
This is the classic resource failover scenario. A resource will relocate to another cluster member as long as the placement policy for the resource is satisfied and the dependencies for the resource are available on another cluster member. The placement policy and dependency determination were discussed in section 23.4.17.2.1.
Also starting in V5.1B, the REBALANCE attribute was added to the application resource profile. The REBALANCE attribute contains a time specification denoting when the application resource should have its placement reevaluated. The time specification value has the following format:
t:day:hour:minute
day = 0-6 (where Sunday = 0)
hour = 0-23
minute = 0-59
An asterisk (*) can be used to designate every day, hour, or minute.
Multiple values can be comma delimited, although a range cannot be specified as of this writing.
For example, to have an application resource's placement reevaluated every Monday at 2:10 PM:
REBALANCE=t:1:14:10
To have an application resource's placement reevaluated every day at 20 minutes past every hour:
REBALANCE=t:*:*:20
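To make the format concrete, here is a small Python sketch that parses and range-checks a REBALANCE value according to the rules just described. The function name parse_rebalance and the dictionary it returns are our own illustration, not part of CAA.

```python
def parse_rebalance(spec):
    """Parse 't:day:hour:minute' into a dict of value lists.

    An asterisk is kept as the string '*'. Ranges such as '1-3' are rejected,
    since the format described above does not support them.
    """
    parts = spec.split(":")
    if len(parts) != 4 or parts[0] != "t":
        raise ValueError("expected t:day:hour:minute, got %r" % spec)

    limits = (("day", 6), ("hour", 23), ("minute", 59))
    result = {}
    for (name, highest), text in zip(limits, parts[1:]):
        if text == "*":
            result[name] = "*"
            continue
        # int() raises ValueError on anything that is not a plain number,
        # which also rejects empty fields and ranges like '1-3'.
        values = [int(item) for item in text.split(",")]
        for value in values:
            if not 0 <= value <= highest:
                raise ValueError("%s value %d is outside 0-%d" % (name, value, highest))
        result[name] = values
    return result
```

Calling parse_rebalance("t:1:14:10") yields day 1, hour 14, and minute 10, while a day value of 7 raises ValueError.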
Here's an example where we actually set the memberUP resource's REBALANCE attribute and update its registration.
Set the REBALANCE attribute to reevaluate every Monday, Wednesday, and Friday at Midnight and Noon.
# caa_profile -update memberUP -o bt="t:1,3,5:00,12:00"
Update the registration.
# caa_register -u memberUP
REBALANCE entries will be added to clustercron
Verify the change has taken place.
# caa_stat -p memberUP | grep REBALANCE
REBALANCE=t:1,3,5:00,12:00
Note: Application resource load balancing is accomplished using a cluster-wide cron (8) application resource (clustercron), which was introduced in V5.1B. A similar solution is documented in the "Using cron in a TruCluster Server Cluster" Best Practice, September, 2001. However, the clustercron application resource implementation we are discussing here is strictly for use by CAA and should not be used as a general-purpose cluster-wide cron.
Essentially, clustercron is implemented as a single-instance, high-availability application. Its job is to ensure that certain cluster-related tasks are run from cron. Whatever member clustercron is running on will have its root crontab file modified to run those cluster-related tasks.
For example, when we updated memberUP's registration, the following message was displayed:
REBALANCE entries will be added to clustercron
An entry was placed in the crontab of the member where clustercron is running, which instructs cron to execute the "caa_balance memberUP" command every Monday, Wednesday, and Friday at Midnight and Noon.
Let's prove it. First, see where clustercron is running.
# caa_stat -t clustercron
Name           Type           Target    State     Host
------------------------------------------------------------
clustercron    application    ONLINE    ONLINE    molari
Since clustercron is running on node molari, we'll search its crontab file for the memberUP application resource.
[molari] # /usr/bin/crontab -l | grep memberUP
00 00,12 * * 1,3,5 /usr/sbin/caa_balance memberUP #clustercronData
For information on how to create your own cluster-wide cron solution, see the Best Practice, "Using cron in a TruCluster Server Cluster," available at: http://www.tru64unix.compaq.com/docs/best_practices
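The crontab entry shown above suggests a simple field mapping: the minute, hour, and day-of-week fields of the REBALANCE value are reordered into cron's "minute hour day-of-month month day-of-week" layout. The Python sketch below reproduces that observed mapping. It is an illustration only; the function name is hypothetical, we do not know how CAA generates the entry internally, and the trailing #clustercronData marker is left out.

```python
def rebalance_to_crontab(resource, spec):
    """Build a crontab-style line from a REBALANCE value (t:day:hour:minute)."""
    parts = spec.split(":")
    if len(parts) != 4 or parts[0] != "t":
        raise ValueError("expected t:day:hour:minute, got %r" % spec)
    _, day, hour, minute = parts
    # cron field order: minute, hour, day of month, month, day of week.
    # REBALANCE carries no day-of-month or month, so both become '*'.
    return "%s %s * * %s /usr/sbin/caa_balance %s" % (minute, hour, day, resource)
```

Passing "memberUP" and "t:1,3,5:00,12:00" returns the same schedule and command that we found in molari's crontab.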
In V5.1B, a resource's availability can be obtained using the caa_report command. Run the command without any switches to see the statistics for every resource, or use the "-a" switch to name the resources for which statistics are returned.
# caa_report -a "memberUP powerUP"
Time report for period from earliest known begin-date to Wed Jul 3 00:02:10 2002
Application Availability Report for babylon5
Applications    starting/ending             uptime
---------------------------------------------------------------
memberUP        Tue Jul 2 23:40:44 2002     99.61 %
                Wed Jul 3 00:02:10 2002
powerUP         Tue Jul 2 23:41:06 2002     99.05 %
                Wed Jul 3 00:02:10 2002
The caa_report command can return only those resources that have non-zero statistics by using the "-o" switch. Additionally, using the caa_report command with the "-s" switch will return all resources where statistics exist.
An application's availability for a particular time period can be obtained using the "-b time" and "-e time" switches for a beginning and end time range respectively, where time is in the form of mm/dd/yy:hh:mm.
To see memberUP's availability from June 15, 2002 at Midnight to the present, you can use the following command:
# caa_report -a memberUP -b 6/15/02:00:00
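If you generate these time arguments from a script, a helper like the following Python sketch can build the mm/dd/yy:hh:mm string for the "-b time" and "-e time" switches. The function name is hypothetical. It leaves the month and day unpadded to match the example above; whether caa_report also accepts zero-padded values is not something we have verified.

```python
from datetime import datetime


def caa_report_time(moment):
    """Format a datetime as mm/dd/yy:hh:mm, with month and day unpadded."""
    return "%d/%d/%s:%s:%s" % (
        moment.month,
        moment.day,
        moment.strftime("%y"),   # two-digit year
        moment.strftime("%H"),   # 24-hour clock, zero-padded
        moment.strftime("%M"),   # minutes, zero-padded
    )
```

For Midnight on June 15, 2002, caa_report_time(datetime(2002, 6, 15, 0, 0)) returns 6/15/02:00:00.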
For more information, see the caa_report (1) reference page.
As you have probably noticed by now, a resource's status can be determined by using the caa_stat command. There are two primary forms of output generated by the caa_stat command: list output (multiple lines per resource) and tabular output (one line per resource).
List output is the default, whereas tabular output is achieved by using the "-t" switch. Note, there is an "-l" (lowercase "L") switch to the caa_stat command that can be used for list output, although it is implied.
Most of the output you have seen in this chapter has been in tabular form, primarily because the output is more compact.
Here is an example of the list output showing the following forms of output for the caa_stat command:
Default output.
Verbose output (the "-v" switch).
The in-memory profile (or currently registered attribute values) obtained using the "-p" switch.
Full output obtained using the "-f" switch (a combination of the "-p" and "-v" switches).
Default:
# caa_stat ARes
NAME=ARes
TYPE=application
TARGET=ONLINE
STATE=ONLINE on molari

Verbose:
# caa_stat -v ARes
NAME=ARes
TYPE=application
RESTART_ATTEMPTS=1
RESTART_COUNT=0
REBALANCE=t:*:1:0
FAILURE_THRESHOLD=0
FAILURE_COUNT=0
TARGET=ONLINE
STATE=ONLINE on molari

Currently Registered Attribute Values (CRAV):
# caa_stat -p ARes
NAME=ARes
TYPE=application
ACTION_SCRIPT=ARes.scr
ACTIVE_PLACEMENT=0
AUTO_START=0
CHECK_INTERVAL=60
DESCRIPTION=ARes
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REBALANCE=t:*:1:0
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60

Full (Verbose + CRAV):
# caa_stat -f ARes
NAME=ARes
TYPE=application
ACTION_SCRIPT=ARes.scr
ACTIVE_PLACEMENT=0
AUTO_START=0
CHECK_INTERVAL=60
RESTART_ATTEMPTS=1
RESTART_COUNT=0
DESCRIPTION=ARes
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REBALANCE=t:*:1:0
REQUIRED_RESOURCES=
SCRIPT_TIMEOUT=60
FAILURE_THRESHOLD=0
FAILURE_COUNT=0
TARGET=ONLINE
STATE=ONLINE on molari
There are a couple of ways to determine whether or not a resource is registered. You can use the caa_stat command followed by the resource name. If the resource is not registered you will receive an error.
# caa_stat aNonRegisteredResource
Could not find resource aNonRegisteredResource.
However, if you're writing a script, it is much easier to check the return status of the caa_stat command by using the "-g" switch with the "-a resource" switch.
# caa_stat -g -a memberUP ; echo $?
0
If the resource is registered, a zero is returned.
# caa_stat -g -a aNonRegisteredResource ; echo $?
1
If the resource is not registered, a value of "1" is returned.
You can use the caa_stat command followed by the resource name. If the resource's STATE is ONLINE, it's running.
You can use the caa_stat command with the "-r -a resource" switches to check the return status. This is particularly useful if you're writing a script.
# caa_stat -r -a memberUP ; echo $?
0
If the resource is running, a zero is returned. Conversely, if the resource is not running, a "1" is returned.
# caa_stop memberUP
Attempting to stop 'memberUP' on member 'alph12'
Stop of 'memberUP' on member 'alph12' succeeded.
# caa_stat -r -a memberUP ; echo $?
1
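If your script is written in Python rather than the shell, the same two exit-status checks can be wrapped as shown below. This is a minimal sketch: the function names are our own, and the only caa_stat switches used are the "-g -a resource" and "-r -a resource" forms shown above. The run parameter exists only so the functions can be exercised without a cluster.

```python
import subprocess


def _caa_stat_ok(flag, resource, run=subprocess.call):
    """Run 'caa_stat <flag> -a <resource>' and report whether it exited with 0."""
    return run(["caa_stat", flag, "-a", resource]) == 0


def is_registered(resource, run=subprocess.call):
    """True if 'caa_stat -g -a resource' returns 0 (the resource is registered)."""
    return _caa_stat_ok("-g", resource, run)


def is_running(resource, run=subprocess.call):
    """True if 'caa_stat -r -a resource' returns 0 (the resource is running)."""
    return _caa_stat_ok("-r", resource, run)
```

A script might call is_registered("memberUP") before attempting to start the resource, and is_running("memberUP") afterward to confirm the result.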
You can use the caa_stat command with the "-p" switch to see the currently registered attribute values for a given resource. This can be particularly useful in determining whether you remembered to update the registration of a resource after modifying its profile.
Note that what is actually registered may be different from what is in your profile.
Currently Registered (In Memory):
# caa_stat -p nicUP
NAME=nicUP
TYPE=network
DESCRIPTION=nicUP
FAILURE_INTERVAL=600
FAILURE_THRESHOLD=2
SUBNET=192.168.0.0

In Profile:
# caa_profile -print nicUP
NAME=nicUP
TYPE=network
DESCRIPTION=nicUP
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
SUBNET=192.168.0.0
Remember to update your registration when you modify your profile (see section 23.4.12).
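Spotting such differences by eye gets tedious with longer profiles. The Python sketch below compares the two listings for you. It assumes you have already captured the text printed by "caa_stat -p" and "caa_profile -print", and that each attribute appears on its own line as NAME=VALUE. Both function names are hypothetical.

```python
def parse_attributes(text):
    """Turn NAME=VALUE lines into a dict, ignoring lines without an '='."""
    attributes = {}
    for line in text.splitlines():
        name, separator, value = line.strip().partition("=")
        if separator:
            attributes[name] = value
    return attributes


def registration_drift(registered_text, profile_text):
    """Map each differing attribute to a (registered, in_profile) pair.

    An attribute missing from one side is reported with None for that side.
    """
    registered = parse_attributes(registered_text)
    profile = parse_attributes(profile_text)
    drift = {}
    for name in sorted(set(registered) | set(profile)):
        if registered.get(name) != profile.get(name):
            drift[name] = (registered.get(name), profile.get(name))
    return drift
```

Applied to the nicUP listings above, the result contains only FAILURE_INTERVAL and FAILURE_THRESHOLD, which tells you the profile was changed without the registration being updated.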
The caa_stat command with the "-c hostname" switch can be used to retrieve resource status for a particular cluster member.
# caa_stat -t -c molari
Name           Type           Target    State     Host
---------------------------------------------------------------
clustercron    application    ONLINE    ONLINE    molari
memberUP       application    ONLINE    ONLINE    molari
nicUP          network        ONLINE    ONLINE    molari
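Tabular output is also convenient to post-process. The following Python sketch turns captured "caa_stat -t" output into a list of records, assuming one resource per line with the five columns shown above. The function name is hypothetical, and treating a four-column line as a resource with no host (for example, one that is OFFLINE) is our assumption rather than documented behavior.

```python
def parse_caa_stat_table(text):
    """Parse captured 'caa_stat -t' output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and anything unexpected.
        if len(fields) not in (4, 5) or fields[0] == "Name":
            continue
        rows.append({
            "name": fields[0],
            "type": fields[1],
            "target": fields[2],
            "state": fields[3],
            # Assumed: the Host column may be empty when nothing is running.
            "host": fields[4] if len(fields) == 5 else None,
        })
    return rows
```

From the records you can, for instance, collect the names of all resources whose state differs from their target.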
For more information, see the caa_stat (1) reference page.