23.4 Resources



Think of a resource as something that CAA manages and monitors. If you are familiar with earlier versions of TruCluster (ASE and Production Server (PS)), we referred to resources as services. In fact, ASE had four service types: Disk, NFS, Tape, and User-Defined. Production Server added a fifth service, the Distributed Raw Disk (the old DRD) service.

TruCluster Server has four types of CAA resources as of this writing: Application, Changer, Network, and Tape.

Can we draw a parallel between the old ASE and Production Server services and CAA resources? Yes and no, but we have not lost any capability. In fact, as you will soon see, the CAA resource (along with the CFS, the new DRD, and Cluster Alias subsystem) adds more flexibility and capability.

In ASE and PS, the service was responsible for a lot more than starting and stopping an application. For example, a disk service was used to manage storage, optionally add and remove IP aliases, optionally import and deport LSM disk groups, and mount and unmount file systems. With TruCluster Server, the DRD manages the storage, the CLSM manages the disk groups, the CFS manages the file systems, and the CLUA manages the aliases.

So what does CAA manage? Resources.

What types of resources exist?

23.4.1 Resource Types

As of this writing, CAA supports four resource types:

  • Application

  • Changer

  • Network

  • Tape

Application resources must have a resource profile (see section 23.4.3) and an action script (see section 23.4.5).

Non-application resources must have a resource profile (of course, configured and functioning hardware is also important).

23.4.2 Resource States

Table 23-3 lists the states a resource can have and describes what each state means for each resource type. Note that each resource has both a TARGET state and a current STATE.

Table 23-3: CAA Resource States

| Resource | State | Description of STATE[*] | Description of TARGET[*] |
|---|---|---|---|
| Application | ONLINE | The application started by the resource is running. | The resource has been set to start (i.e., caa_start was executed). |
| Application | OFFLINE | The application started by the resource is not running or the resource has been administratively stopped. | The resource has been set to stop (i.e., caa_stop was executed). |
| Application | UNKNOWN | The stop entry point within the action script failed or reached the SCRIPT_TIMEOUT. To set the state back to OFFLINE, use "caa_stop -f resource_name". | Not applicable. |
| Network | ONLINE | The network interface is functioning. | The network interface has not failed. |
| Network | OFFLINE | The network interface has failed or is misconfigured. | The network resource has reached the FAILURE_THRESHOLD. Once the problem is resolved, the TARGET can be set ONLINE using the caa_start command. |
| Tape & Changer | ONLINE | A direct connection to the device (via the SCSI or Fibre Channel bus) exists. | A direct connection to the device (via the SCSI or Fibre Channel bus) exists. |
| Tape & Changer | OFFLINE | No direct connection to the device exists. This can be due to a path failure or simply because the cluster member does not have a direct connection to the bus where the device is connected (e.g., a private or semi-shared bus). | The changer or tape resource has reached the FAILURE_THRESHOLD. Once the problem is resolved, the TARGET can be set ONLINE using the caa_start command. |

[*] The STATE and TARGET can be seen using the caa_stat (1) or "sysman caa" commands.

23.4.3 Resource Profile

The profile for a resource contains the attributes that define the resource. The resource profile is an ASCII file so it can be edited using your favorite editor, although you may find it convenient to create it using the caa_profile command or using sysman with the "caa" keyword accelerator.

A resource profile is required for all resource types. Creating a resource profile is not the last step along the way, however. Once a profile has been created (and any time it is modified), it must be registered with the resource manager.

In the following sections we will show you the attributes that can, and must, be used for each resource type. We will also show you how to register a resource, update the registration when a profile is modified, and unregister a resource that is no longer needed.

Let's start by defining the resource attributes.

23.4.4 Resource Attributes

As of this writing there are four resource types. Each resource type uses certain attributes. Some attributes are required while others are optional. Some attributes are common to all resource types.

In the following sections, we will list each attribute, state whether or not it is required, identify its default value, and describe how it is used.

| Resource Attribute Category | Section |
|---|---|
| Common | 23.4.4.1 |
| Application | 23.4.4.2 |
| Network | 23.4.4.3 |
| Tape and Media Changer | 23.4.4.4 |

For the most up-to-date resource attribute information, see the caa (4) reference page.

23.4.4.1 Common Resource Attributes

Table 23-4 lists the resource attributes that are common to each resource type.

Table 23-4: CAA Common Resource Attributes

| Required? | Attribute | Default | Description |
|---|---|---|---|
| Yes | TYPE=resource-type | none | The resource-type can be one of: application, changer, network, tape. |
| Yes | NAME=resource-name | none | The resource-name must be a unique, user-defined name containing any alphanumeric character, a period, a comma, or an underscore. It cannot start with a period. It must be the same name as the prefix of the profile file. |
| No | DESCRIPTION = description | none | The description is a user-defined string that describes the resource. |
| No | FAILURE_THRESHOLD = value | 0 | The value is the number of times that a resource may fail to start within a period of time (see FAILURE_INTERVAL) before CAA will mark the resource as OFFLINE and discontinue monitoring it. Both the FAILURE_THRESHOLD and the FAILURE_INTERVAL must be non-zero for monitoring to occur. A value of 0 (zero) disables failure threshold monitoring. |
| No | FAILURE_INTERVAL = value | 0 | The value is the time (in seconds) over which the FAILURE_THRESHOLD is tallied and applied. Both the FAILURE_THRESHOLD and the FAILURE_INTERVAL must be non-zero for monitoring to occur. A value of 0 (zero) disables failure threshold monitoring. |

23.4.4.2 Application Resource Attributes

Table 23-5 and Table 23-6 list the resource attributes specific to an Application resource.

Table 23-5: CAA Application Resource Attributes – Part 1

| Required? | Attribute | Default | Description |
|---|---|---|---|
| Yes | ACTION_SCRIPT = script | none | The script is the name of the shell script that CAA will call when starting, stopping, and (optionally) checking the application resource. The script typically has the same name as was used in the NAME attribute (see Common Resource Attributes) with a ".scr" extension. Unless the script is located in the /var/cluster/caa/script directory, the full pathname must be used. The action script must include a start and a stop entry point. |
| No | CHECK_INTERVAL = seconds | 60 | The number of seconds that will elapse before CAA calls the action script's check entry point. A value of 0 (zero) indicates no check will be performed. |
| No | SCRIPT_TIMEOUT = seconds | 60 | The maximum number of seconds that an action script may take to complete before CAA returns an error status. |
| No | PLACEMENT = policy | balanced | The policy can be one of the following. balanced: place the resource such that the number of resources across the cluster is as balanced as possible (subject to dependency restrictions). favored: start the resource on a member listed in the HOSTING_MEMBERS attribute; if no members listed in HOSTING_MEMBERS are available, start on any available cluster member. restricted: start the resource on a member listed in the HOSTING_MEMBERS attribute; if no members listed in HOSTING_MEMBERS are available, do not start. |
| Varies (see description) | HOSTING_MEMBERS = members | none | Where members is a white-space delimited list of members where the resource can or must run (see PLACEMENT). List the members in the order that they should be selected. If PLACEMENT is balanced, then HOSTING_MEMBERS is not used. |
| No | ACTIVE_PLACEMENT = value | 0 | If value is set to 1, then the placement of the resource is reevaluated per the PLACEMENT attribute when a cluster member rejoins the cluster. See section 23.6.3. |

Table 23-6: CAA Application Resource Attributes – Part 2

| Required? | Attribute | Default | Description |
|---|---|---|---|
| No | OPTIONAL_RESOURCES = reslist | none | Where reslist is a white-space delimited list of resources that a member should have available if possible. If some or even all of the listed resources are not available on the member, the resource will still start unless there is a member with a greater set of these optional resources available. The maximum number of resources in reslist is currently limited to 58. |
| No | REQUIRED_RESOURCES = reslist | none | Where reslist is a white-space delimited list of resources that a member must have available in order for this resource to start. |
| No | REBALANCE = time | none | Where time is in the form t:day:hour:min and specifies when reevaluation is to occur. day is the day of the week (0 - 6), where 0 = Sunday. hour is the hour of the day (0 - 23). min is the minute of the hour (0 - 59). An asterisk (*) can be used as a wildcard to specify every day and/or every hour. |
| No | AUTO_START = value | 0 | If value is set to 1, then start the application resource automatically after a member reboot, regardless of the state of the resource prior to the reboot. If value is set to 0 (zero), then start the application resource only if it was ONLINE before the reboot. |
| No | RESTART_ATTEMPTS = value | 1 | The value is the number of times that the resource manager will attempt to restart the resource on the member before attempting to relocate the resource. |
| No | FAILOVER_DELAY = seconds | 0 | The number of seconds that the resource manager will wait before attempting to relocate the application resource after it has failed. |
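To make the interaction between the PLACEMENT and HOSTING_MEMBERS attributes more concrete, here is a minimal shell sketch of the selection rules as described in the tables. It is not how CAA is implemented: the function name pick_member and its three arguments are hypothetical, and the balanced case is simplified to "first available member" rather than a real comparison of resource counts.

```shell
# pick_member POLICY "HOSTING_MEMBERS" "AVAILABLE_MEMBERS"
# Prints the chosen member, or returns 1 if the resource cannot be placed.
# (Hypothetical helper for illustration only; not part of CAA.)
pick_member() {
    policy=$1
    hosting=$2
    available=$3
    if [ "$policy" != balanced ]; then
        # favored/restricted: try the hosting members in the order listed.
        for h in $hosting; do
            for a in $available; do
                if [ "$h" = "$a" ]; then
                    echo "$h"
                    return 0
                fi
            done
        done
        # restricted: never run outside HOSTING_MEMBERS.
        if [ "$policy" = restricted ]; then
            return 1
        fi
    fi
    # balanced, or favored with no hosting member available: take any
    # available member (the real balanced policy compares resource counts).
    for a in $available; do
        echo "$a"
        return 0
    done
    return 1
}

# favored: sheridan is preferred but down, so fall back to molari.
pick_member favored "sheridan" "molari"
```

With the restricted policy and the same arguments, the function prints nothing and returns a non-zero status, which corresponds to the resource not being started.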

23.4.4.3 Network Resource Attributes

Table 23-7 lists the one attribute that is specific to the Network resource type.

Table 23-7: CAA Network Resource Attribute

| Required? | Attribute | Description |
|---|---|---|
| Yes | SUBNET = xxx.xxx.xxx.xxx | The xxx.xxx.xxx.xxx is the network subnet. The network subnet is the bitwise AND of the IP address and the netmask. |

For example:

   IP address =  18. 32. 64.121
 & netmask    = 255.255.255.  0
   subnet     =  18. 32. 64.  0
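The bitwise AND can also be worked out with shell arithmetic. The following is a small sketch rather than anything CAA provides: subnet_of is a hypothetical helper name, and it assumes both arguments are plain dotted-decimal quads.

```shell
# subnet_of IP NETMASK
# Prints the subnet, i.e. the bitwise AND of each octet pair.
# (Hypothetical helper; not a CAA command.)
subnet_of() {
    ip=$1
    mask=$2
    # Split both dotted quads into octets: $1..$4 = IP, $5..$8 = netmask.
    old_ifs=$IFS
    IFS=.
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

subnet_of 18.32.64.121 255.255.255.0     # prints 18.32.64.0
```

The same calculation applied to an address of 192.168.0.69 with a netmask of 255.255.255.0 yields 192.168.0.0.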

23.4.4.4 Tape and Media Changer Resource Attribute

Table 23-8 lists the one attribute that is specific to both the Tape and Media Changer resources.

Table 23-8: CAA Tape and Media Changer Resource Attribute

| Required? | Attribute | Description |
|---|---|---|
| Yes | DEVICE_NAME = device-name | The device-name is the tape or media changer device-special file name. You can specify either the full path or only the device-special file name. For example: /dev/tape/tape0 or tape0; /dev/changer/mc0 or mc0. |

23.4.5 Type Definition File

Starting with TruCluster Server version 5.1A-IPK the TYPE_*.cap files in the /var/cluster/caa/template directory have been replaced with a new file format, the type definition file (*.tdf). This change was made primarily to facilitate ease of customization in adding user-defined attributes to application resources.

The type definition file defines the attributes for a resource. The following entries define a resource attribute.

  • attribute

– The name of the attribute.

  • switch

– This is the caa_profile command switch to assign a value to this attribute.

  • type

– The data type of the attribute. Valid data types include: boolean, file, internet_address, name_list, name_string, positive_integer, string.

  • default

– The default value of the attribute.

  • required

– Whether or not the attribute is required.

For example, here is an excerpt from the application resource type definition file (application.tdf) that defines the AUTO_START attribute.

 # grep -p AUTO_START application.tdf
 #!==========================
 attribute: AUTO_START
 type: boolean
 switch: -o as
 default: 0
 required: no

The "AUTO_START" resource attribute is not a required attribute. The value of the attribute is expected to be a Boolean with a default value of 0. The "-o as" switch is the command option that passes the AUTO_START attribute value to the caa_profile (8) command (see section 23.4.6 for more information).
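If you want to pull a single field out of a type definition entry from a script, something along the following lines may help. This is only a sketch: the sample entry is copied from the AUTO_START excerpt shown earlier, tdf_field is a hypothetical helper name, and the code assumes one "key: value" pair per line as in that excerpt.

```shell
# Sample entry copied from the AUTO_START excerpt.
entry='attribute: AUTO_START
type: boolean
switch: -o as
default: 0
required: no'

# tdf_field KEY
# Prints the value that follows "KEY: " in $entry.
# (Hypothetical helper; assumes one "key: value" pair per line.)
tdf_field() {
    printf '%s\n' "$entry" | awk -v key="$1" '
        index($0, key ": ") == 1 { print substr($0, length(key) + 3) }'
}

tdf_field switch      # prints: -o as
tdf_field required    # prints: no
```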

Additionally, user-defined attributes can be added to the application resource type definition file. We will cover this topic in section 23.5.3.3.

23.4.6 Creating a Resource Profile

There are several ways that a resource profile can be created.

  • Use the caa_profile command.

  • Use the "sysman caa" command (see section 23.5.2 for an example).

  • Create a profile with your favorite editor.

  • Use a combination of the previous options.

This section will explore the first option.

The caa_profile command can be used to create a resource from the default profile template, located in /var/cluster/caa/template, or with the "-I" option to use a template located elsewhere.

There are many, many options to the caa_profile command, but they are logically categorized by primary, secondary, and tertiary options. The primary options are shown in Table 23-9.

Table 23-9: CAA Profile Primary Options

| Option | Description |
|---|---|
| -create | Create a resource profile. |
| -delete | Delete a resource profile. |
| -update | Modify a resource profile. |
| -template | Create a resource profile template. |
| -print | Print a resource profile. |
| -validate | Validate a resource profile to determine that there are no typos. |

There are many additional options to the "-create" switch that directly correlate to the resource attributes that were shown in section 23.4.4. We put together a chart to illustrate the parallels (see Table 23-10).

Table 23-10: CAA Profile Create Options to Resource Attributes


Note that resource_name will become the value of the NAME resource attribute, while the string following the "-t" option will become the value of the TYPE resource attribute.

Also of note is the "-B" option that is used to indicate the name of the program (or application) that is to be managed by CAA. If starting your application happens to be more complex than executing one program, it would probably be easier to edit the generated script (or write your own) and not use the "-B" option.

Note

A resource is not managed or monitored by CAA until it is registered. See section 23.4.11 for more information.

23.4.7 Validating a Resource Profile

Using the caa_profile command or "sysman caa" instead of editing the profile manually can help you to avoid typos in the resource attribute names.

To validate a resource profile, use the caa_profile command with the "-validate" option.

 # caa_profile -validate memberUP 

What is validated? Table 23-11 illustrates the attributes that are validated.

Table 23-11: CAA Profile Validation

| Attribute | Description | Error Message |
|---|---|---|
| NAME | Must match the profile name exactly, sans the .cap extension. | NAME attribute must be the same as filename |
| NAME | Must not be greater than 128 characters, empty, or start with a period (.). | Improper Name: .memberUP |
| TYPE | Must be a valid resource type. Valid types are: application, changer, network, tape. This attribute is case sensitive. | Invalid Type: guitar |
| PLACEMENT | Must be a valid placement policy. Valid placement policies are: balanced, favored, restricted. This attribute is case sensitive. | PLACEMENT invalid: restrict |
| HOSTING_MEMBERS | If the placement policy is not balanced, then this attribute must exist. | HOSTING_MEMBERS is required for: favored placement policy |
| | | HOSTING_MEMBERS is required for: restricted placement policy |
| HOSTING_MEMBERS | If the placement policy is balanced, then this attribute must not exist. | No HOSTING_MEMBERS is needed for balanced placement policy |
| ACTION_SCRIPT | If the profile is for an application, then the profile must contain this attribute. | The ACTION_SCRIPT attribute of the resource profile must be set |
| SUBNET | If the profile is for a network resource, then the profile must contain this attribute. The actual subnet or hardware is not validated, although the subnet number is checked for correct format. | Invalid subset setting for network resource |
| DEVICE_NAME | If the profile is for a changer or tape resource, then the profile must contain this attribute. The hardware is not checked to see if it exists. | DEVICE_NAME must be set for changer |
| | | DEVICE_NAME must be set for tape. |
| AUTO_START, CHECK_INTERVAL, FAILOVER_DELAY, FAILURE_INTERVAL, FAILURE_THRESHOLD, RESTART_ATTEMPTS | If the profile is for an application resource, these attributes are checked. The attribute value must be a non-negative number. However, the maximum value is not checked. | AUTO_START Out of range: -1 |
| | | FAILURE_INTERVAL out of range: 10 widgets |
| OPTIONAL_RESOURCES, REQUIRED_RESOURCES | The attribute value must not contain a colon (:), semicolon (;), or comma (,). However, the value is not checked to be a list of existing resources. | OPTIONAL_RESOURCES syntax error: nicUP; tapeDrive |
| | | REQUIRED_RESOURCES syntax error: nicUP, tapeDrive |
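For readers who maintain profiles by hand, a few of these checks are easy to approximate in a script before running "caa_profile -validate". The sketch below covers only the NAME, TYPE, PLACEMENT, and HOSTING_MEMBERS rules. It is not the caa_profile implementation: get_attr and check_profile are hypothetical names, the messages simply echo the wording in Table 23-11, and the code assumes one KEY=value pair per line with no surrounding spaces.

```shell
# get_attr FILE KEY
# Prints the value of KEY=value from a profile file.
get_attr() {
    sed -n "s/^$2=//p" "$1"
}

# check_profile FILE
# Returns 0 if this subset of checks passes; otherwise prints a message
# and returns 1. (Hypothetical helper; not the real validator.)
check_profile() {
    file=$1
    name=$(get_attr "$file" NAME)
    rtype=$(get_attr "$file" TYPE)
    placement=$(get_attr "$file" PLACEMENT)
    members=$(get_attr "$file" HOSTING_MEMBERS)

    # NAME must match the file name without the .cap extension.
    if [ "$name" != "$(basename "$file" .cap)" ]; then
        echo "NAME attribute must be the same as filename"
        return 1
    fi
    case $rtype in
        application|changer|network|tape) ;;
        *) echo "Invalid Type: $rtype"; return 1 ;;
    esac
    # The placement rules only concern application resources.
    if [ "$rtype" != application ]; then
        return 0
    fi
    case ${placement:-balanced} in
        balanced)
            if [ -n "$members" ]; then
                echo "No HOSTING_MEMBERS is needed for balanced placement policy"
                return 1
            fi ;;
        favored|restricted)
            if [ -z "$members" ]; then
                echo "HOSTING_MEMBERS is required for: $placement placement policy"
                return 1
            fi ;;
        *) echo "PLACEMENT invalid: $placement"; return 1 ;;
    esac
    return 0
}
```

A call such as "check_profile ./memberUP.cap" would then report the first problem found, if any. The remaining checks in the table (ACTION_SCRIPT, SUBNET, DEVICE_NAME, numeric ranges, and the resource list syntax) are deliberately left out of this sketch.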

23.4.8 Updating a Resource Profile

Use the caa_profile command with the "-update" option (or use sysman – see section 23.5.2) to modify a resource profile. You also have the option of editing the profile using your favorite editor.

 # caa_profile -h | grep update
         caa_profile -update resource_name [option ...] [-o option,...] [-q]

You can use the majority of the options listed in Table 23-9. See the caa_profile (8) reference page for more information.

Note

Any time a resource profile is modified the resource registry database must be updated. See section 23.4.12 for more information.

23.4.9 Printing a Resource Profile

Printing a profile can be accomplished in three ways.

  • Use the cat (1) or more (1) command.

  • Use the button in "sysman caa" (see section 23.5.2).

  • Use the caa_profile command with the "-print" option.

 # caa_profile -h | grep print
         caa_profile -print [resource_name [...]] [-q]

 # caa_profile -print nicUP
 NAME=nicUP
 TYPE=network
 DESCRIPTION=nicUP
 FAILURE_INTERVAL=0
 FAILURE_THRESHOLD=0
 SUBNET=192.168.0.0

23.4.10 Deleting a Resource Profile

Deleting a profile can be accomplished in three ways:

  • Use the rm (1) command to delete the profile.

  • Use the button in "sysman caa" (see section 23.5.2).

  • Use the caa_profile command with the "-delete" option.

 # caa_profile -delete nicUP

Note

You cannot delete a profile that is associated with a registered resource. You'll receive the following error:

 Can not delete profile for resource nicUP as it is currently registered. 

See section 23.4.13 for more information on unregistering a resource.

Caution

Prior to V5.1B, using caa_profile with the "-delete" option to remove an application resource's profile will also delete the resource's action script! Note, "sysman" will do the same thing.

 # caa_profile -delete memberUP 
 # ls /var/cluster/caa/script/memberUP.scr
 ls: /var/cluster/caa/script/memberUP.scr not found

For more information on the caa_profile command, see the caa_profile (8) reference page.

23.4.11 Registering a Resource

Once you have created and edited a resource profile, the resource must be registered with CAA before it can be managed or monitored.

Note

Before registering a resource with CAA you should always do the following:

  • Validate the profile (see section 23.4.7).

  • If it is an Application Resource, see the "scripting tips" in section 23.5.1.

  • If it is a non-application resource make sure that it is configured.

You can register a profile using the caa_register command.

 # caa_register myResource 

For more information on registering resources, see the caa_register (8) reference page.

23.4.12 Updating a Registered Resource

Any time that modifications are made to a resource profile, CAA must be notified. Registered profiles are stored in the caa.reg database in /var/cluster/caa/registry and not read from the profile directory.

If the profile is for an application resource, the profile can be updated using the caa_register command with the "-u" option.

 # caa_register -u myApplicationResource

Note

Prior to V5.1B, if the resource is a non-application resource, then the resource must be unregistered and then registered. Since non-application resources are only used as a dependency for an application resource, this presents a bit of an inconvenience in that the application resource must also be unregistered and then registered.

For example, say we have a network resource (nicUP) that is a REQUIRED_RESOURCE for an application resource (memberUP). When we attempt to unregister the profile we receive an error.

 # caa_unregister nicUP
 Can't unregister 'nicUP' because it is required by other resources.
 Could not unregister resource nicUP.

So we must unregister every resource that depends on nicUP.

  1. Find the resources that require the nicUP resource.

     # grep nicUP *.cap | grep -v nicUP.cap
     memberUP.cap:REQUIRED_RESOURCES=nicUP

  2. If the resource is running, stop it.

     # caa_stat -a memberUP -r && caa_stop memberUP
     Attempting to stop 'memberUP' on member 'molari'
     Stop of 'memberUP' on member 'molari' succeeded.

    Note that we used the "-r" option with the "-a" option to the caa_stat command to see if the resource was running.

  3. If the resource is registered, unregister it.

     # caa_stat -a memberUP -g && caa_unregister memberUP

    Note that we used the "-g" option with the "-a" option to the caa_stat command to see if the resource was registered.

  4. Unregister and register the non-application resource.

     # caa_unregister nicUP && caa_register nicUP

  5. Register and start the application resource.

     # caa_register memberUP && caa_start memberUP
     Attempting to start 'nicUP' on member 'molari'
     Start of 'nicUP' on member 'molari' succeeded.
     Attempting to start 'memberUP' on member 'molari'
     Start of 'memberUP' on member 'molari' succeeded.
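When several application resources depend on the same non-application resource, it can help to write the sequence out before running it. The following sketch only prints the commands from steps 2 through 5 and executes nothing. The function name reregister_plan is hypothetical, and the list of dependents is assumed to come from a search like the one in step 1.

```shell
# reregister_plan DEPENDENCY DEPENDENT...
# Prints (does not run) the commands needed to re-register a
# non-application resource and the application resources that require it.
# (Hypothetical helper for illustration only.)
reregister_plan() {
    dep=$1
    shift
    # Steps 2 and 3: stop and unregister every dependent resource.
    for r in "$@"; do
        echo "caa_stat -a $r -r && caa_stop $r"
        echo "caa_stat -a $r -g && caa_unregister $r"
    done
    # Step 4: unregister and register the non-application resource.
    echo "caa_unregister $dep && caa_register $dep"
    # Step 5: register and start the dependents again.
    for r in "$@"; do
        echo "caa_register $r && caa_start $r"
    done
}

reregister_plan nicUP memberUP
```

Reviewing the printed plan first is a deliberate choice here: stopping an application resource affects availability, so you may prefer to run each line by hand.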

For more information on updating the resource registry, see the caa_register (8) reference page.

23.4.13 Unregistering a Resource

If a resource is no longer needed, it can be unregistered so that CAA will no longer manage or monitor the resource. To unregister, use the caa_unregister command.

 # caa_unregister myResource 

Note that the resource must be stopped before it can be unregistered.

 # caa_unregister myResource
 Could not unregister resource myResource.

Also note that this does not remove the profile or the action script from the /var/cluster/caa subdirectories; it merely removes the profile from the CAA registry database (caa.reg).

For more information on unregistering resources, see the caa_unregister (8) reference page.

23.4.14 Starting a Resource

Once you have registered a resource it must be started. When an application resource is started, the "start" entry point in the action script is called to start the application. To start a resource, use the caa_start command.

 # caa_start memberUP 

You can start a resource on a particular member using the "-c" option.

 # caa_start -c sheridan memberUP 

You can start all registered resources using the "-all" option.

 # caa_start -all 

You can modify user-defined resource attributes when starting a resource as well.

 # caa_start USR_ALIAS_IP=16.60.45.10 aliasAPP 

See section 23.5.3.3 for more information on user-defined resource attributes.

If you receive the following error, the "start" entry point failed (i.e., a non-zero value was returned), but the "stop" entry point ran successfully. Note, when the "start" entry point fails, the "stop" entry point is automatically called.

 # caa_start memberUP
 Attempting to start 'memberUP' on member 'molari'
 Start of 'memberUP' on member 'molari' failed.
 Attempting to start 'memberUP' on member 'sheridan'
 Start of 'memberUP' on member 'sheridan' failed.
 No more members to consider
 Could not start resource memberUP.

Note that the target state remains ONLINE, yet the current state is OFFLINE.

 # caa_stat -t -v memberUP
 Name         Type          R/RA        F/FT     Target    State      Host
 ------------------------------------------------------------------------
 memberUP     application   0/1         0/0      ONLINE    OFFLINE

If you receive the following error, both the "start" and "stop" entry points failed.

 # caa_start memberUP
 Attempting to start 'memberUP' on member 'molari'
 'memberUP' on member 'molari' has experienced an unrecoverable failure.
 Human intervention required to resume its availability.
 Could not start resource memberUP.

Note that in this case, since the "stop" entry point also failed, CAA no longer knows the state of the resource.

 # caa_stat -t -v memberUP
 Name           Type         R/RA       F/FT    Target        State    Host
 --------------------------------------------------------------------------
 memberUP       application  0/1        0/0     ONLINE        UNKNOWN

A resource cannot be started from an UNKNOWN state. It must be forcefully stopped first (see section 23.4.15).
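The two failure cases above can be summarized as a small decision function. This is a simplified sketch, not CAA's logic: state_after_start is a hypothetical name, it considers a single member only (no restart or relocation attempts), and it takes the exit statuses of the "start" and "stop" entry points as its arguments.

```shell
# state_after_start START_RC STOP_RC
# Prints the resulting current STATE on one member. A zero exit status
# means the entry point succeeded. (Hypothetical helper.)
state_after_start() {
    start_rc=$1
    stop_rc=$2
    if [ "$start_rc" -eq 0 ]; then
        echo ONLINE      # the start entry point succeeded
    elif [ "$stop_rc" -eq 0 ]; then
        echo OFFLINE     # start failed; the automatic stop succeeded
    else
        echo UNKNOWN     # both failed; a forced stop is needed
    fi
}

state_after_start 1 0    # prints OFFLINE
state_after_start 1 1    # prints UNKNOWN
```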

One final note on starting resources: non-application resources typically do not need to be started in order for them to be ONLINE. However, if a non-application resource reaches its FAILURE_THRESHOLD, its target state will be set to OFFLINE. If a non-application resource is in an OFFLINE target state, it can be set to an ONLINE target state using the caa_start command. Note, however, that you will need to correct the problem that forced the resource to an OFFLINE state in the first place before it will go ONLINE.

For example, we have a network resource (nicUP) that is currently ONLINE.

 # caa_stat -t -v nicUP
 Name       Type          R/RA       F/FT       Target     State     Host
 -------------------------------------------------------------------------
 nicUP      network       -          0/2        ONLINE     ONLINE    molari
 nicUP      network       -          1/2        ONLINE     ONLINE    sheridan

We have set the FAILURE_THRESHOLD to 2 and the FAILURE_INTERVAL to 600. In other words, if nicUP fails twice within a ten-minute period, its target state will be set to OFFLINE.

 # caa_stat -p nicUP
 NAME=nicUP
 TYPE=network
 DESCRIPTION=nicUP
 FAILURE_INTERVAL=600
 FAILURE_THRESHOLD=2
 SUBNET=192.168.0.0

We will literally pull the plug on the network associated with nicUP (which is our tu0 interface on sheridan).

 # ifconfig tu0
 tu0: flags=c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX>
      inet 192.168.0.69 netmask ffffff00 broadcast 192.168.0.255 ipmtu 1500

Note that the tu0 inet address is 192.168.0.69, and the network subnet nicUP is configured to use is 192.168.0.0.

The plug is pulled.

 # caa_stat -t -v nicUP
 Name           Type           R/RA          F/FT    Target     State    Host
 -------------------------------------------------------------------------------
 nicUP          network        -             0/2     ONLINE     ONLINE   molari
 nicUP          network        -             1/2     ONLINE     OFFLINE  sheridan

Failure #1 has occurred. We plug the interface back in and wait for the state to return ONLINE.

 # caa_stat -t -v nicUP
 Name           Type           R/RA          F/FT    Target     State    Host
 -------------------------------------------------------------------------------
 nicUP          network        -             0/2     ONLINE     ONLINE   molari
 nicUP          network        -             1/2     ONLINE     ONLINE   sheridan

Okay, pull the plug again. This will induce the second failure within the FAILURE_INTERVAL, which will force the target state to OFFLINE.

 # caa_stat -t -v nicUP
 Name           Type           R/RA          F/FT    Target     State    Host
 -------------------------------------------------------------------------------
 nicUP          network        -             0/2     ONLINE     ONLINE   molari
 nicUP          network        -             2/2     OFFLINE    OFFLINE  sheridan

Plug the interface back in. Note that the target state remains OFFLINE.

 # caa_stat -t -v nicUP
 Name           Type           R/RA          F/FT    Target     State    Host
 -------------------------------------------------------------------------------
 nicUP          network        -             0/2     ONLINE     ONLINE   molari
 nicUP          network        -             0/2     OFFLINE    OFFLINE  sheridan
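The way FAILURE_THRESHOLD and FAILURE_INTERVAL combine in this experiment can be modelled roughly as follows. Treat this as a sketch under stated assumptions rather than a description of CAA internals: target_after_failures is a hypothetical name, failure times are given in seconds, and the window is assumed to end at the most recent failure.

```shell
# target_after_failures THRESHOLD INTERVAL TIME...
# TIME values are failure timestamps in seconds, oldest first.
# Prints the resulting TARGET state. (Hypothetical helper; the exact
# windowing used by CAA is an assumption here.)
target_after_failures() {
    threshold=$1
    interval=$2
    shift 2
    # A zero threshold or interval disables failure threshold monitoring.
    if [ "$threshold" -eq 0 ] || [ "$interval" -eq 0 ]; then
        echo ONLINE
        return 0
    fi
    # Use the most recent failure as the end of the window.
    latest=0
    for t in "$@"; do
        latest=$t
    done
    # Count the failures that fall inside the window.
    count=0
    for t in "$@"; do
        if [ $(( latest - t )) -lt "$interval" ]; then
            count=$(( count + 1 ))
        fi
    done
    if [ "$count" -ge "$threshold" ]; then
        echo OFFLINE
    else
        echo ONLINE
    fi
}

# Two failures 120 seconds apart, FAILURE_THRESHOLD=2, FAILURE_INTERVAL=600.
target_after_failures 2 600 0 120    # prints OFFLINE
```

Had the second failure come 900 seconds after the first, only one failure would fall inside the 600-second window and the target would stay ONLINE in this model.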

If we now attempt to start a resource that has a dependency on nicUP, note the results.

 # caa_start memberUP
 molari : Resource memberUP (application) cannot run on molari
 sheridan : Resource nicUP (network) is not available on sheridan
 Resource memberUP has placement error.

The reason for the first error message is that we restricted memberUP to run only on sheridan. We did this purely to illustrate that a resource that is dependent upon a REQUIRED_RESOURCE will be unable to start if the target state of the REQUIRED_RESOURCE is set to OFFLINE, which is illustrated by the second error message.

Here is a look at the pertinent entries in the memberUP profile:

 # caa_profile -print memberUP | grep -E "^REQ|^HOST|^PLACE"
 HOSTING_MEMBERS=sheridan
 PLACEMENT=restricted
 REQUIRED_RESOURCES=nicUP

To solve this problem, simply set the target state of the nicUP resource to ONLINE by using the caa_start command.

 # caa_start nicUP memberUP
 'nicUP' re-enabled on member 'sheridan'
 Attempting to start 'nicUP' on member 'sheridan'
 Start of 'nicUP' on member 'sheridan' succeeded.
 Attempting to start 'memberUP' on member 'sheridan'
 Start of 'memberUP' on member 'sheridan' succeeded.

 # caa_stat -t -v nicUP memberUP
 Name       Type         R/RA     F/FT         Target        State      Host
 ------------------------------------------------------------------------------
 memberUP   application  0/1      0/0          ONLINE        ONLINE     sheridan
 nicUP      network      -        0/2          ONLINE        ONLINE     molari
 nicUP      network      -        0/2          ONLINE        ONLINE     sheridan

Note

When a dependency comes ONLINE, any dependents that have their TARGET set to ONLINE will also be started.

For additional information regarding starting a resource, see the caa_start (8) reference page.

23.4.15 Stopping a Resource

Unless an application resource is in an UNKNOWN state, you can stop it using the caa_stop command. Note, only application resources can be stopped.

 # caa_stop memberUP
 Attempting to stop 'memberUP' on member 'sheridan'
 Stop of 'memberUP' on member 'sheridan' succeeded.

If the "stop" entry point fails, the resource is placed in an UNKNOWN state. In order to get the resource state back to OFFLINE, the resource must be forcefully stopped using the "-f" option.

Here's an example where we hacked the "stop" entry point to immediately exit with status of 1 (anything except a status of zero is considered a failure).

 # caa_stop memberUP
 Attempting to stop 'memberUP' on member 'sheridan'
 'memberUP' on member 'sheridan' has experienced an unrecoverable failure.
 Human intervention required to resume its availability.

 # caa_stat -t memberUP
 Name         Type         Target      State         Host
 ------------------------------------------------------------
 memberUP     application  OFFLINE     UNKNOWN       sheridan

The "Human intervention required" means that you will need to determine the cause of the failure (and may need to stop your application manually). Additionally, you may want to modify the action script to automatically handle (if possible) the situation that caused the failure to occur in the first place so that "human intervention" will not be required in the future.
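As a rough illustration of an action script that avoids needless "stop" failures, consider the skeleton below. It is a sketch, not a generated CAA script: it is written as a function so that it can be tried outside a cluster, it assumes the entry point name arrives as the first argument, and a marker file (the hypothetical APP_FLAG) stands in for a real application.

```shell
# APP_FLAG is a stand-in for "the application is running" (hypothetical).
APP_FLAG=${APP_FLAG:-/tmp/demo_app.$$.running}

# action_script ENTRY_POINT
# ENTRY_POINT is start, stop, or check. A non-zero return is a failure.
action_script() {
    case $1 in
        start)
            # Stand-in for launching the application.
            touch "$APP_FLAG" ;;
        stop)
            # Treat "already stopped" as success so that the resource
            # does not end up in the UNKNOWN state unnecessarily.
            rm -f "$APP_FLAG" ;;
        check)
            # Non-zero status means the application is not running.
            [ -f "$APP_FLAG" ] ;;
        *)
            echo "usage: action_script {start|stop|check}" >&2
            return 1 ;;
    esac
}
```

The design point is in the "stop" branch: because "rm -f" succeeds even when the marker is already gone, stopping an application that has already exited is reported as a success rather than a failure requiring intervention.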

Once the cause of the failure is addressed, you must use the caa_stop command with the "-f" option to get the resource state set to OFFLINE.

 # caa_stop -f memberUP && caa_stat -t memberUP
 Attempting to stop 'memberUP' on member 'sheridan'
 Stop of 'memberUP' on member 'sheridan' succeeded.
 Name       Type         Target        State    Host
 ------------------------------------------------------------
 memberUP   application  OFFLINE       OFFLINE

You can now start the resource using the caa_start command (see section 23.4.14).

You can modify user-defined resource attributes when stopping a resource as well.

 # caa_stop USR_STOP_CODE=Because memberUP 

See section 23.5.3.3 for more information on user-defined resource attributes.

For more information on stopping a resource, see the caa_stop (8) reference page.

23.4.16 Relocating a Resource

An application resource will automatically relocate to an available cluster member if the member where it is running fails. However, there may be occasions when you want to relocate a resource yourself. An application resource can be relocated to another member (including a non-favored member) by using the caa_relocate command.

 # caa_relocate memberUP
 Attempting to stop 'memberUP' on member 'sheridan'
 Stop of 'memberUP' on member 'sheridan' succeeded.
 Attempting to start 'nicUP' on member 'molari'
 Start of 'nicUP' on member 'molari' succeeded.
 Attempting to start 'memberUP' on member 'molari'
 Start of 'memberUP' on member 'molari' succeeded.

Note, only application resources can be relocated.

If you want to relocate the application resource to a specific member, you can use the "-c" option (output not shown).

 # caa_relocate -c sheridan memberUP 

You can modify user-defined resource attributes when relocating a resource.

 # caa_relocate USR_ALIAS_IP=192.168.0.74 aliasAPP 

See section 23.5.3.3 for more information on user-defined resource attributes.

If an application resource's PLACEMENT is set to "restricted" and there are no other HOSTING_MEMBERS available, you will see the following error.

 # caa_relocate memberUP
 molari : Resource memberUP (application) cannot run on molari
 Resource memberUP has placement error.

The resource will continue running on the member where it is currently placed.

A final note regarding relocation: you can use the "-s" option to relocate all the resources currently running on a given member to another member.

 # caa_relocate -s molari
 Attempting to stop 'memberUP' on member 'molari'
 Stop of 'memberUP' on member 'molari' succeeded.
 Attempting to start 'nicUP' on member 'sheridan'
 Start of 'nicUP' on member 'sheridan' succeeded.
 Attempting to start 'memberUP' on member 'sheridan'
 Start of 'memberUP' on member 'sheridan' succeeded.
 Attempting to stop 'cluster_lockd' on member 'molari'
 Stop of 'cluster_lockd' on member 'molari' succeeded.
 Attempting to start 'cluster_lockd' on member 'sheridan'
 Start of 'cluster_lockd' on member 'sheridan' succeeded.

Note, resources with a PLACEMENT of "restricted" will not relocate unless another member listed in the HOSTING_MEMBERS attribute is available. Also, a resource's REQUIRED_RESOURCES must be available on the target member in order for the relocation to succeed.

For more information regarding resource relocation, see the caa_relocate (8) reference page.

23.4.17 Load-Balancing Resources

Placement of an application resource, based on the load of a particular cluster member, can be accomplished using one of the methods shown in Table 23-12.

Table 23-12: Application Resource Load-Balancing Options

Application Resource Placement and Load-Balancing

| Approach | Description | V5.1B | V5.1A | V5.1 | V5.0A |
|---|---|---|---|---|---|
| command: caa_start | The cluster administrator runs the caa_start command, and the resource is optimally placed based on its PLACEMENT, as modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies. | | | | |
| command: caa_relocate | The cluster administrator runs the caa_relocate command, and the resource is optimally placed based on its PLACEMENT, as modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies. | | | | |
| command: caa_balance | The cluster administrator runs the caa_balance command, and the placement of resources is reevaluated by the Resource Manager. Load-balancing can be reevaluated for the set of application resources listed, for the application resources on the member listed, or for all application resources in the cluster. | | x | x | x |
| cluster: formation | When the cluster is formed, all resources with a TARGET state of ONLINE, or a TARGET state of OFFLINE and AUTO_START set to 1, are started and optimally placed based on their PLACEMENT, as modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies. | | | | |
| cluster: member join | When a member joins the cluster, any resource with an ACTIVE_PLACEMENT value of 1 is reevaluated based on its PLACEMENT, as modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies, and possibly relocated. Note that AUTO_START is also evaluated (see formation). | | | | |
| cluster: member leave | When a member leaves the cluster, any resource that was running on that member is optimally placed on another member based on the resource's PLACEMENT, as modified by optional (OPTIONAL_RESOURCES) and/or required (REQUIRED_RESOURCES) dependencies. This can result in a resource being stopped rather than relocated to another member. | | | | |
| time | An application resource's placement is reevaluated for optimal placement at the time set in the REBALANCE attribute, if set. | | x | x | x |

23.4.17.1 Command-Initiated

There are three commands that cause CAA to evaluate the placement and balance of application resources:

  • The caa_balance command.

  • The caa_start command (covered in section 23.4.14).

  • The caa_relocate command (covered in section 23.4.16).

Starting in V5.1B, the caa_balance command can be used at any time to reevaluate the placement of application resources within the cluster.

  • Reevaluate all application resources in the cluster.

 # caa_balance -all
 Attempting to stop 'cluster_lockd' on member 'alph11'
 Stop of 'cluster_lockd' on member 'alph11' succeeded.
 Attempting to start 'cluster_lockd' on member 'alph12'
 Start of 'cluster_lockd' on member 'alph12' succeeded.
 Resource clustercron is already well placed
 Resource memberUP is already well placed
 Resource powerUP is already well placed
 clustercron is placed optimally. No relocation is needed.
 memberUP is placed optimally. No relocation is needed.
 powerUP is placed optimally. No relocation is needed.

  • Reevaluate the resources on a particular cluster member.

 # caa_balance -s molari
 Attempting to stop 'cluster_lockd' on member 'alph12'
 Stop of 'cluster_lockd' on member 'alph12' succeeded.
 Attempting to start 'cluster_lockd' on member 'alph11'
 Start of 'cluster_lockd' on member 'alph11' succeeded.
 Resource memberUP is already well placed
 Resource powerUP is already well placed
 memberUP is placed optimally. No relocation is needed.
 powerUP is placed optimally. No relocation is needed.

  • Reevaluate specific application resources.

 # caa_balance memberUP powerUP
 Resource memberUP is already well placed
 Attempting to stop 'powerUP' on member 'alph11'
 Stop of 'powerUP' on member 'alph11' succeeded.
 Attempting to start 'powerUP' on member 'alph12'
 Start of 'powerUP' on member 'alph12' succeeded.
 memberUP is placed optimally. No relocation is needed.

For more information on application resource load-balancing, see the caa_balance (8) reference page as well as the TruCluster Server Cluster Highly Available Applications guide.

23.4.17.2 Cluster-Initiated

When the cluster is formed, or a member joins or leaves the cluster, CAA will balance the application resource's load based on the following criteria:

  • When the cluster is formed, resources are started and load-balanced based on:

    • The AUTO_START attribute.

    • The ACTIVE_PLACEMENT attribute.

    • The PLACEMENT and HOSTING_MEMBERS attributes.

    • The REQUIRED_RESOURCES and OPTIONAL_RESOURCES attributes.

  • If a member joins the cluster, resources are load-balanced based on:

    • The AUTO_START attribute.

    • The ACTIVE_PLACEMENT attribute.

    • The PLACEMENT and HOSTING_MEMBERS attributes.

    • The REQUIRED_RESOURCES and OPTIONAL_RESOURCES attributes.

  • If a member leaves the cluster, resources fail over (and are load-balanced) based on:

    • The PLACEMENT and HOSTING_MEMBERS attributes.

    • The REQUIRED_RESOURCES and OPTIONAL_RESOURCES attributes.

23.4.17.2.1 Resource Placement and Load Balance at Cluster Formation

CAA determines which resources to start as follows:

  • If a resource's TARGET state is ONLINE, start the resource.

  • If a resource's TARGET state is OFFLINE, but AUTO_START is set to 1, start the resource.
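The two start rules above can be written as a small predicate. This is only an illustrative sketch of the decision as described here, not CAA's implementation; the function name starts_at_formation is hypothetical.

```sh
#!/bin/sh
# starts_at_formation TARGET AUTO_START
# Exit status 0 if a resource with this TARGET state and AUTO_START value
# would be started at cluster formation under the two rules above,
# 1 otherwise.
starts_at_formation() {
    target=$1
    auto_start=$2
    # Rule 1: a TARGET state of ONLINE always starts the resource.
    if [ "$target" = "ONLINE" ]; then
        return 0
    fi
    # Rule 2: a TARGET state of OFFLINE starts only when AUTO_START is 1.
    if [ "$target" = "OFFLINE" ] && [ "$auto_start" = "1" ]; then
        return 0
    fi
    return 1
}
```

For example, `starts_at_formation OFFLINE 1 && echo start` prints "start", while `starts_at_formation OFFLINE 0` returns a non-zero status.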

When CAA starts resources, it must place each resource on a member that meets the criteria set forth in the resource's profile, while also taking into account the number of resources currently running in the cluster.

Determine the PLACEMENT of the resource. If PLACEMENT is "favored" or "restricted", then determine what HOSTING_MEMBERS are available. Note that you should list the members in the HOSTING_MEMBERS attribute in the order you want the resource placed.

  • PLACEMENT is "balanced"

    • No Dependencies

      Place the resource on the member with the fewest number of online resources.

    • OPTIONAL_RESOURCES

      Place the resource on the member that has the requisite resource(s) available AND has the fewest number of resources running. If there are no members that have the requisite resource(s) available, start on the member running the least number of resources.

    • REQUIRED_RESOURCES

      Place the resource on the member that has the requisite resource(s) available AND has the fewest number of resources running. If no members have the requisite resource(s) available, do not start.

  • PLACEMENT is "favored"

    • No Dependencies

      Place the resource on the favored member that is earliest in the HOSTING_MEMBERS list. If there are no favored members available, place the resource on a non-favored member.

    • OPTIONAL_RESOURCES

      Place the resource on the favored member that has the requisite resource(s) available AND is the earliest member in the HOSTING_MEMBERS list.

      If there are no favored members available, but there is a non-favored member with the requisite resource(s), then start the resource on the non-favored member.

      If there are no members with the requisite resource(s) available, start the resource on a favored member anyway.

      If there are no favored members available AND no non-favored members available with the requisite resource(s) available, start the resource on a non-favored member anyway.

    • REQUIRED_RESOURCES

      Place the resource on the favored member that has the requisite resource(s) available AND is the earliest member in the HOSTING_MEMBERS list. If there are no favored members with the requisite resource(s) available, but there is a non-favored member with the requisite resource(s), then start the resource on the non-favored member. If no members have the requisite resource(s) available, do not start.

  • PLACEMENT is "restricted"

    • No Dependencies

      Place the resource on the restricted member that is earliest in the HOSTING_MEMBERS list. If there are no restricted members available, do not start.

    • OPTIONAL_RESOURCES

      Place the resource on the restricted member that has the requisite resource(s) available AND is the earliest member in the HOSTING_MEMBERS list. If no restricted members have the requisite resource(s) available, start on a restricted member anyway.

    • REQUIRED_RESOURCES

      Place the resource on the restricted member that has the requisite resource(s) available AND is the earliest member in the HOSTING_MEMBERS list. If no restricted members have the requisite resource(s) available, do not start.
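To make the "balanced" rules concrete, here is a minimal sketch of that one policy. It is an illustration of the rules as described above, not CAA's actual algorithm. The function name place_balanced and its input format (one line per available member: name, number of online resources, and a 1 or 0 flag for whether the requisite resources are available there) are assumptions made for this example, as is breaking ties in favor of the member listed first.

```sh
#!/bin/sh
# place_balanced DEP_MODE
# DEP_MODE: none, optional, or required (the kind of dependency, if any).
# Standard input: one line per available member: "name online_count deps_ok"
# where deps_ok is 1 if the requisite resource(s) are available there.
# Prints the chosen member, or nothing if the resource should not start.
place_balanced() {
    awk -v mode="$1" '
        {
            load = $2 + 0
            # Least-loaded member overall (first listed wins a tie).
            if (any == "" || load < any_n) { any = $1; any_n = load }
            # Least-loaded member that has the requisite resource(s).
            if ($3 + 0 == 1 && (ok == "" || load < ok_n)) { ok = $1; ok_n = load }
        }
        END {
            if (mode == "none")          pick = any
            else if (mode == "required") pick = ok
            else if (mode == "optional") pick = (ok != "" ? ok : any)
            if (pick != "") print pick
        }'
}
```

With the input lines "sheridan 3 1" and "molari 1 0", the sketch picks molari when there are no dependencies and sheridan when the dependency is optional or required, because only sheridan has the requisite resource available.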

23.4.17.2.2 Resource Placement and Load Balancing When a Member Joins the Cluster

Placement and load balancing take place only if there are resources currently ONLINE that have ACTIVE_PLACEMENT set to 1.

If a resource has the ACTIVE_PLACEMENT attribute set to 1, then the resource might relocate to the joining member. Whether or not the resource will actually relocate to the joining member is determined by CAA based on the resource's PLACEMENT, HOSTING_MEMBERS, OPTIONAL_RESOURCES, and REQUIRED_RESOURCES attributes as discussed in section 23.4.17.2.1.

23.4.17.2.3 Resource Placement and Load Balancing When a Member Leaves the Cluster

This is the classic resource failover scenario. A resource will relocate to another cluster member as long as the placement policy for the resource is satisfied and the dependencies for the resource are available on another cluster member. The placement policy and dependency determination were discussed in section 23.4.17.2.1.

23.4.17.3 Time-Initiated

Also starting in V5.1B, the REBALANCE attribute was added to the application resource profile. The REBALANCE attribute contains a time specification denoting when the application resource should have its placement reevaluated. The time specification value has the following format:

t:day:hour:minute

day = 0-6 (where Sunday = 0)
hour = 0-23
minute = 0-59

An asterisk (*) can be used to designate every day, hour, or minute.

Multiple values can be comma delimited, although a range cannot be specified as of this writing.

For example, to have an application resource's placement reevaluated every Monday at 2:10 PM:

 REBALANCE=t:1:14:10 

To have an application resource's placement reevaluated at 20 minutes past every hour, every day:

 REBALANCE=t:*:*:20 
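The format rules above can be checked before a value is put into a profile. The following is a sketch of such a check, not a CAA utility: the function name valid_rebalance is hypothetical, and it applies only the rules stated here (a leading "t", three colon-separated fields, "*" or comma-delimited values, no ranges).

```sh
#!/bin/sh
# valid_rebalance SPEC
# Exit status 0 if SPEC has the form t:day:hour:minute with day 0-6,
# hour 0-23, and minute 0-59. Each field may be "*" or a comma-delimited
# list of values. Ranges such as 1-5 are rejected.
valid_rebalance() {
    printf '%s\n' "$1" | awk -F: '
        # One field is valid if it is "*" or a comma list of integers
        # between 0 and max.
        function field_ok(f, max,    n, parts, i) {
            if (f == "*") return 1
            n = split(f, parts, ",")
            if (n == 0) return 0
            for (i = 1; i <= n; i++) {
                if (parts[i] !~ /^[0-9]+$/) return 0
                if (parts[i] + 0 > max) return 0
            }
            return 1
        }
        {
            ok = (NF == 4 && $1 == "t" &&
                  field_ok($2, 6) && field_ok($3, 23) && field_ok($4, 59))
        }
        END { exit(ok ? 0 : 1) }'
}
```

For example, `valid_rebalance 't:1:14:10'` and `valid_rebalance 't:*:*:20'` both succeed, while `valid_rebalance 't:1-5:0:0'` fails because it contains a range.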

Here's an example where we actually set the memberUP resource's REBALANCE attribute and update its registration.

  1. Set the REBALANCE attribute to reevaluate every Monday, Wednesday, and Friday at Midnight and Noon.

     # caa_profile -update memberUP -o bt="t:1,3,5:00,12:00"

  2. Update the registration.

     # caa_register -u memberUP
     REBALANCE entries will be added to clustercron

  3. Verify the change has taken place.

     # caa_stat -p memberUP | grep REBALANCE
     REBALANCE=t:1,3,5:00,12:00

Note

Application resource load balancing is accomplished using a cluster-wide cron (8) application resource (clustercron), which was introduced in V5.1B.

A similar solution is documented in the "Using cron in a TruCluster Server Cluster" Best Practice, September, 2001. However, the clustercron application resource implementation we are discussing here is strictly for use by CAA and should not be used as a general-purpose cluster-wide cron.

Essentially, clustercron is implemented as a single-instance, high-availability application. Its job is to ensure that certain cluster-related tasks are run from cron. Whatever member clustercron is running on will have its root crontab file modified to run those cluster-related tasks.

For example, when we updated memberUP's registration, the following message was displayed:

 REBALANCE entries will be added to clustercron 

An entry was placed in the crontab of the member where clustercron is running, which instructs cron to execute the "caa_balance memberUP" command every Monday, Wednesday, and Friday at Midnight and Noon.

Let's prove it. First, see where clustercron is running.

 # caa_stat -t clustercron
 Name         Type         Target          State     Host
 ------------------------------------------------------------
 clustercron  application  ONLINE          ONLINE    molari

Since clustercron is running on node molari, we'll search its crontab file for the memberUP application resource.

 [molari] # /usr/bin/crontab -l | grep memberUP
 00 00,12 * * 1,3,5 /usr/sbin/caa_balance memberUP #clustercronData
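The crontab entry above lines up field by field with the REBALANCE value: the minute and hour fields swap into cron order, the day becomes the day-of-week field, and the day-of-month and month fields are left as "*". The sketch below reproduces that mapping for illustration only. The function name rebalance_to_cron is hypothetical, and this is not how clustercron itself is implemented.

```sh
#!/bin/sh
# rebalance_to_cron SPEC
# Convert "t:day:hour:minute" into the five crontab time fields:
# minute hour day-of-month month day-of-week.
# Exit status is non-zero if SPEC does not have four colon-separated
# fields starting with "t".
rebalance_to_cron() {
    printf '%s\n' "$1" | awk -F: '
        NF == 4 && $1 == "t" {
            printf "%s %s * * %s\n", $4, $3, $2
            found = 1
        }
        END { exit(found ? 0 : 1) }'
}
```

Running `rebalance_to_cron 't:1,3,5:00,12:00'` prints `00 00,12 * * 1,3,5`, which matches the time fields of the entry found in molari's crontab.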

For information on how to create your own cluster-wide cron solution, see the Best Practice, "Using cron in a TruCluster Server Cluster," available at:

http://www.tru64unix.compaq.com/docs/best_practices

23.4.18 Resource Availability Statistics

In V5.1B, a resource's availability can be obtained using the caa_report command. You can see the statistics for every resource by using the command without any switches, or you can use the "-a" switch to choose the resources for which statistics are returned.

 # caa_report -a "memberUP powerUP"
 Time report for period from earliest known begin-date to Wed Jul 3 00:02:10 2002
          Application Availability Report for babylon5
 Applications          starting/ending             uptime
 ---------------------------------------------------------------
 memberUP             Tue Jul 2 23:40:44 2002      99.61 %
                      Wed Jul 3 00:02:10 2002
 powerUP              Tue Jul 2 23:41:06 2002      99.05 %
                      Wed Jul 3 00:02:10 2002

The caa_report command can return only those resources that have non-zero statistics by using the "-o" switch. Additionally, using the caa_report command with the "-s" switch will return all resources where statistics exist.

An application's availability for a particular time period can be obtained using the "-b time" and "-e time" switches for a beginning and end time range respectively, where time is in the form of mm/dd/yy:hh:mm.

To see memberUP's availability from midnight on June 15, 2002 to the present, you can use the following command:

 # caa_report -a memberUP -b 6/15/02:00:00

For more information, see the caa_report (1) reference page.

23.4.19 Resource Status

As you have probably noticed by now, a resource's status can be determined by using the caa_stat command. There are two primary forms of output generated by the caa_stat command: list output (multiple lines per resource) and tabular output (one line per resource).

List output is the default, whereas tabular output is achieved by using the "-t" switch. Note, there is an "-l" (lowercase "L") switch to the caa_stat command that can be used for list output, although it is implied.

Most of the output you have seen in this chapter has been in tabular form, primarily because the output is more compact.

Table 23-13 shows examples of list output from the caa_stat command in the following forms:

  • Default output.

  • Verbose output (the "-v" switch).

  • The in-memory profile (or currently registered attribute values) obtained using the "-p" switch.

  • Full output obtained using the "-f" switch (a combination of the "-p" and "-v" switches).

Table 23-13: caa_stat Output

Default

 # caa_stat ARes
 NAME=ARes
 TYPE=application
 TARGET=ONLINE
 STATE=ONLINE on molari

Verbose

 # caa_stat -v ARes
 NAME=ARes
 TYPE=application
 RESTART_ATTEMPTS=1
 RESTART_COUNT=0
 REBALANCE=t:*:1:0
 FAILURE_THRESHOLD=0
 FAILURE_COUNT=0
 TARGET=ONLINE
 STATE=ONLINE on molari

Currently Registered Attribute Values

 # caa_stat -p ARes
 NAME=ARes
 TYPE=application
 ACTION_SCRIPT=ARes.scr
 ACTIVE_PLACEMENT=0
 AUTO_START=0
 CHECK_INTERVAL=60
 DESCRIPTION=ARes
 FAILOVER_DELAY=0
 FAILURE_INTERVAL=0
 FAILURE_THRESHOLD=0
 HOSTING_MEMBERS=
 OPTIONAL_RESOURCES=
 PLACEMENT=balanced
 REBALANCE=t:*:1:0
 REQUIRED_RESOURCES=
 RESTART_ATTEMPTS=1
 SCRIPT_TIMEOUT=60

Full (Verbose+CRAV)

 # caa_stat -f ARes
 NAME=ARes
 TYPE=application
 ACTION_SCRIPT=ARes.scr
 ACTIVE_PLACEMENT=0
 AUTO_START=0
 CHECK_INTERVAL=60
 RESTART_ATTEMPTS=1
 RESTART_COUNT=0
 DESCRIPTION=ARes
 FAILOVER_DELAY=0
 FAILURE_INTERVAL=0
 HOSTING_MEMBERS=
 OPTIONAL_RESOURCES=
 PLACEMENT=balanced
 REBALANCE=t:*:1:0
 REQUIRED_RESOURCES=
 SCRIPT_TIMEOUT=60
 FAILURE_THRESHOLD=0
 FAILURE_COUNT=0
 TARGET=ONLINE
 STATE=ONLINE on molari

23.4.19.1 How Can I Determine If a Resource is Registered?

There are a couple of ways to determine whether or not a resource is registered. You can use the caa_stat command followed by the resource name. If the resource is not registered you will receive an error.

 # caa_stat aNonRegisteredResource
 Could not find resource aNonRegisteredResource.

However, if you're writing a script, it is much easier to check the return status of the caa_stat command by using the "-g" switch with the "-a resource" switch.

 # caa_stat -g -a memberUP ; echo $?
 0

If the resource is registered, a zero is returned.

 # caa_stat -g -a aNonRegisteredResource ; echo $?
 1

If the resource is not registered, a value of "1" is returned.

23.4.19.2 How Can I Determine If a Resource is Running?

You can use the caa_stat command followed by the resource name. If the resource's STATE is ONLINE, it's running.

You can use the caa_stat command with the "-r -a resource" switches to check the return status. This is particularly useful if you're writing a script.

 # caa_stat -r -a memberUP ; echo $?
 0

If the resource is running, a zero is returned. Conversely, if the resource is not running, a "1" is returned.

 # caa_stop memberUP
 Attempting to stop 'memberUP' on member 'alph12'
 Stop of 'memberUP' on member 'alph12' succeeded.
 # caa_stat -r -a memberUP ; echo $?
 1
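In a script, the registration check and the running check can be combined into one helper. The sketch below relies only on the return statuses described in this section ("-g -a" returns zero if registered, "-r -a" returns zero if running). The function name resource_status and the three strings it prints are made up for this example, and output from caa_stat is discarded because only the return status is used.

```sh
#!/bin/sh
# resource_status RESOURCE
# Prints one of: "not registered", "registered, not running", "running".
# Uses the return statuses described in this section:
#   caa_stat -g -a RESOURCE   returns 0 if the resource is registered
#   caa_stat -r -a RESOURCE   returns 0 if the resource is running
resource_status() {
    if ! caa_stat -g -a "$1" >/dev/null 2>&1; then
        echo "not registered"
    elif caa_stat -r -a "$1" >/dev/null 2>&1; then
        echo "running"
    else
        echo "registered, not running"
    fi
}
```

For example, after the caa_stop shown above, `resource_status memberUP` would print "registered, not running".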

23.4.19.3 How Can I Determine Which Attribute Values are Registered For a Given Resource?

You can use the caa_stat command with "-p" switch to see the currently registered attribute values for a given resource. Note, this can be particularly useful in determining whether you remembered to update the registration of a resource after modifying its profile.

Note that what is actually registered may differ from what is in your profile.

Currently Registered (In Memory):

 # caa_stat -p nicUP
 NAME=nicUP
 TYPE=network
 DESCRIPTION=nicUP
 FAILURE_INTERVAL=600
 FAILURE_THRESHOLD=2
 SUBNET=192.168.0.0

In Profile:

 # caa_profile -print nicUP
 NAME=nicUP
 TYPE=network
 DESCRIPTION=nicUP
 FAILURE_INTERVAL=0
 FAILURE_THRESHOLD=0
 SUBNET=192.168.0.0

Remember to update your registration when you modify your profile (see section 23.4.12).
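Spotting such differences by eye gets tedious for profiles with many attributes. The following sketch compares two files of NAME=VALUE lines, such as saved copies of the "caa_stat -p" and "caa_profile -print" output, and reports the attributes whose values differ. The function name profile_drift and its output format are assumptions made for this example.

```sh
#!/bin/sh
# profile_drift REGISTERED_FILE PROFILE_FILE
# Each file holds NAME=VALUE lines, for example saved output of
# "caa_stat -p resource" and "caa_profile -print resource".
# Prints one line per attribute whose value differs:
#   ATTRIBUTE registered=VALUE profile=VALUE
profile_drift() {
    awk '
        # Split each line at the first "=" so values containing "=" survive.
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
        }
        # First file: remember the registered value of each attribute.
        NR == FNR { reg[key] = val; next }
        # Second file: report attributes whose profile value differs.
        (key in reg) && reg[key] != val {
            printf "%s registered=%s profile=%s\n", key, reg[key], val
        }' "$1" "$2"
}
```

For the nicUP output above, this reports FAILURE_INTERVAL (600 registered, 0 in the profile) and FAILURE_THRESHOLD (2 registered, 0 in the profile).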

23.4.19.4 How Can I See Which Resources are Running On a Particular Cluster Member?

The caa_stat command with the "-c hostname" switch can be used to retrieve resource status for a particular cluster member.

 # caa_stat -t -c molari
 Name           Type          Target         State       Host
 ---------------------------------------------------------------
 clustercron    application   ONLINE         ONLINE     molari
 memberUP       application   ONLINE         ONLINE     molari
 nicUP          network       ONLINE         ONLINE     molari
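Because the tabular output has one resource per line, it is easy to tally how many ONLINE resources each member is hosting. The sketch below is a simple count over "caa_stat -t" output and makes no claim about how CAA itself weighs member load. The function name online_per_member is hypothetical, and the column order is assumed from the examples in this section.

```sh
#!/bin/sh
# online_per_member
# Reads "caa_stat -t" tabular output on standard input and prints
# "member count" for every member hosting at least one ONLINE resource,
# sorted by member name. The header and separator lines are skipped
# because their fourth column is never "ONLINE".
online_per_member() {
    awk '$4 == "ONLINE" && NF >= 5 { n[$5]++ }
         END { for (m in n) print m, n[m] }' | sort
}
```

For example, `caa_stat -t | online_per_member` would print one line per member, such as "molari 3" for the output shown above.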

For more information, see the caa_stat (1) reference page.




TruCluster Server Handbook (HP Technologies)
ISBN: 1555582591
Year: 2005
Pages: 273
