4.1 Implementing IBM Tivoli Workload Scheduler in an HACMP cluster

In this section, we describe the steps to implement IBM Tivoli Workload Scheduler in an HACMP cluster. We use the mutual takeover scenario described in 3.1.1, "Mutual takeover for IBM Tivoli Workload Scheduler" on page 64.

Note

In this section we assume that you have finished planning your cluster and have also finished the preparation tasks to install HACMP. If you have not finished these tasks, perform the steps described in Chapter 3, "Planning and Designing an HACMP Cluster", and the preparation tasks described in Chapter 3 "Installing HACMP". We strongly recommend that you install IBM Tivoli Workload Scheduler before HACMP, and confirm that IBM Tivoli Workload Scheduler runs without any problem.

It is important that you also confirm that IBM Tivoli Workload Scheduler is able to fallover and fallback between nodes, by manually moving the volume group between nodes. This verification procedure is described in "Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster" on page 202.

4.1.1 IBM Tivoli Workload Scheduler implementation overview

Figure 4-1 on page 185 shows a diagram of an IBM Tivoli Workload Scheduler implementation in a mutual takeover HACMP cluster. Using this diagram, we describe how IBM Tivoli Workload Scheduler can be implemented, and what you should be aware of. Though we do not describe a hot standby scenario for IBM Tivoli Workload Scheduler, the steps used to configure IBM Tivoli Workload Scheduler for a mutual takeover scenario also cover what needs to be done for a hot standby scenario.


Figure 4-1: IBM Tivoli Workload Scheduler implementation overview

To make IBM Tivoli Workload Scheduler highly available in an HACMP cluster, the IBM Tivoli Workload Scheduler instance should be installed on the external shared disk. This means that the /TWShome directory should reside on the shared disk and not on a locally attached disk. This is the minimum requirement for enabling HACMP to relocate the IBM Tivoli Workload Scheduler engine from one node to another, along with other system components such as external disks and service IP labels.
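
A quick way to confirm that /TWShome really resides on the shared volume group is to use the standard AIX LVM commands: lsfs shows the logical volume that backs the file system, and lslv reports the volume group that the logical volume belongs to (it should be the shared volume group, not rootvg). The mount point /usr/maestro and the logical volume name tws_lv below are assumptions from our scenario; substitute your own names:

    # lsfs /usr/maestro
    # lslv tws_lv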

When implementing IBM Tivoli Workload Scheduler in a cluster, there are certain items you should be aware of, such as the location of the IBM Tivoli Workload Scheduler engine and the IP address used in the IBM Tivoli Workload Scheduler workstation definition. Specifically for a mutual takeover scenario, you have more to consider, as multiple instances of IBM Tivoli Workload Scheduler may run on one node after a fallover.

Following are the considerations you need to keep in mind when implementing IBM Tivoli Workload Scheduler in an HACMP cluster. These considerations apply to the Master Domain Manager, Domain Manager, Backup Domain Manager, and FTA.

  • Location of IBM Tivoli Workload Scheduler engine executables

    As mentioned earlier, the IBM Tivoli Workload Scheduler engine should be installed on the external disk so that it can be serviced by HACMP. In order for the same instance of IBM Tivoli Workload Scheduler to process its jobs on another node after a fallover, the executables must be installed on the external disk. For Version 8.2, all files essential to IBM Tivoli Workload Scheduler processing are installed in the /TWShome directory. The /TWShome directory should reside on file systems on the shared disk.

    For versions prior to 8.2, IBM Tivoli Workload Scheduler executables should be installed in a file system with the mount point above the /TWShome directory. For example, if /TWShome is /usr/maestro/maestro, the mount point should be /usr/maestro.

    In a mutual takeover scenario, you may have a case where multiple instances of IBM Tivoli Workload Scheduler are installed on the shared disk. In such a case, make sure these instances are installed on separate file systems residing on separate volume groups.

  • Creating mount points on standby nodes

    Create a mount point for the IBM Tivoli Workload Scheduler file system on all nodes that may run that instance of IBM Tivoli Workload Scheduler. When configuring for a mutual takeover, make sure that you create mount points on every node for every IBM Tivoli Workload Scheduler instance that may run on it.

    In Figure 4-1 on page 185, nodes tivaix1 and tivaix2 may both have two instances of IBM Tivoli Workload Scheduler engine running in case of a node failure. Note that in the diagram, both nodes have mount points for TWS Engine1 and TWS Engine2.

  • IBM Tivoli Workload Scheduler user account and group account

    On each node, create an IBM Tivoli Workload Scheduler user and group for all IBM Tivoli Workload Scheduler instances that may run on the node. The user's home directory must be set to /TWShome.

    If an IBM Tivoli Workload Scheduler instance will fall over and fall back among several nodes in a cluster, make sure all of those nodes have the IBM Tivoli Workload Scheduler user and group defined to control that instance. In the mutual takeover scenario, you may have multiple instances running at the same time on one node. Make sure you create separate users for each IBM Tivoli Workload Scheduler instance in your cluster so that you are able to control them separately.

    In our scenario, we add user maestro and user maestro2 on both nodes because TWS Engine1 and TWS Engine2 should be able to run on both nodes. The same group accounts should be created on both nodes to host these users.

  • Netman port

    When there will be only one instance of IBM Tivoli Workload Scheduler running on a node, using the default port (31111) is sufficient.

    For a mutual takeover scenario, you need to set a different port number for each IBM Tivoli Workload Scheduler instance in the cluster. This is because several instances of IBM Tivoli Workload Scheduler may run on the same node, and no two IBM Tivoli Workload Scheduler instances on the same node should have the same netman port. In our scenario, we set the netman port of TWS Engine1 to 31111, and the netman port of TWS Engine2 to 31112.
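
    In IBM Tivoli Workload Scheduler 8.2, the netman port is recorded as the nm port option in each instance's localopts file. The following excerpts are a sketch based on our scenario, where /usr/maestro and /usr/maestro2 are the /TWShome directories of the two engines:

       # Excerpt from /usr/maestro/localopts (TWS Engine1)
       nm port =31111

       # Excerpt from /usr/maestro2/localopts (TWS Engine2)
       nm port =31112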

  • IP address

    The IP address or IP label specified in the workstation definition should be the service IP address or the service IP label for HACMP. If you plan a fallover or a fallback for an IBM Tivoli Workload Scheduler instance, it should not use an IP address or IP label that is bound to a particular node. (Boot address and persistent address used in an HACMP cluster are normally bound to one node, so these should not be used.) This is to ensure that IBM Tivoli Workload Scheduler instance does not lose connection with other IBM Tivoli Workload Scheduler instances in case of a fallover or a fallback.

    In our diagram, note that TWS_Engine1 uses a service IP address called tivaix1_service, and TWS_Engine2 uses a service IP address called tivaix2_service. These service IP addresses will move along with the IBM Tivoli Workload Scheduler instance from one node to another.

  • Starting and stopping IBM Tivoli Workload Scheduler instances

    IBM Tivoli Workload Scheduler instances should be started and stopped from HACMP application start and stop scripts. Create a custom script to start and stop each IBM Tivoli Workload Scheduler instance in your cluster; then, when configuring HACMP, associate your custom scripts with the resource groups in which your IBM Tivoli Workload Scheduler instances reside.

    If you put IBM Tivoli Workload Scheduler under the control of HACMP, it should not be started from /etc/inittab or by any means other than the HACMP application start and stop scripts.
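
    The following is a minimal sketch of what a start and stop script pair for TWS Engine1 might look like. The script names, the maestro user, and the /usr/maestro home directory are assumptions from our scenario; adapt the scripts, and add error handling and logging, for your own environment:

       #!/bin/ksh
       # start_tws1 - hypothetical HACMP application server start script for TWS Engine1.
       # The shared file system is expected to be mounted by the resource group before this runs.
       su - maestro -c "/usr/maestro/StartUp"          # launches netman
       su - maestro -c "conman 'start'"                # starts mailman, batchman, and jobman
       exit 0

       #!/bin/ksh
       # stop_tws1 - hypothetical HACMP application server stop script for TWS Engine1.
       su - maestro -c "conman 'unlink @; noask'"      # unlink remote workstations
       su - maestro -c "conman 'shut ;wait'"           # stop all local TWS processes, including netman
       exit 0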

  • Files installed on the local disk

    Though most IBM Tivoli Workload Scheduler executables are installed in the IBM Tivoli Workload Scheduler file system, some files are installed on local disks. You may have to copy these local files to the other nodes; a copy command sketch follows this list.

    For IBM Tivoli Workload Scheduler 8.2, copy the /usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a file.

    For IBM Tivoli Workload Scheduler 8.1, you may need to copy the following files to any node in the cluster that will host the IBM Tivoli Workload Scheduler instance:

    • /usr/unison/components

    • /usr/lib/libatrc.a

    • /usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a
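
    As a sketch, the 8.2 library could be copied to the standby node with a remote copy command of your choice; here we use scp with the host name tivaix2 from our cluster, and we assume the target directory already exists on that node:

       # scp -p /usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a tivaix2:/usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a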

  • Monitoring the IBM Tivoli Workload Scheduler process

    HACMP is able to monitor application processes, and it can be configured to initiate a cluster event based on application process failures. When considering monitoring TWS with HACMP's application monitoring, keep in mind that IBM Tivoli Workload Scheduler stops and restarts all its processes (excluding the netman process) every 24 hours. This recycling of the processes is initiated by the FINAL jobstream, which is set to run at a certain time every day.

    Be aware that if you configure HACMP to initiate an action in the event of a TWS process failure, this expected behavior of IBM Tivoli Workload Scheduler could be interpreted as a failure of the IBM Tivoli Workload Scheduler processes and could trigger unwanted actions. If you simply want to monitor process failures, we recommend that you use monitoring software (for example, IBM Tivoli Monitoring).

4.1.2 Preparing to install

Before installing IBM Tivoli Workload Scheduler in an HACMP cluster, define the IBM Tivoli Workload Scheduler group and user account on each node that will host IBM Tivoli Workload Scheduler. The following procedure presents an example of how to prepare for an installation of IBM Tivoli Workload Scheduler 8.2 on AIX 5.2. We assume that the IBM Tivoli Workload Scheduler file system has already been created as described in 3.2.3, "Planning and designing an HACMP cluster" on page 67.

In our scenario, we added a group named tivoli and users maestro and maestro2 on each node.

  1. Creating group accounts

    Execute the following steps on all the nodes on which an IBM Tivoli Workload Scheduler instance will run.

    1. Enter the following command; this will take you to the SMIT Groups menu:

       # smitty groups 

    2. From the Groups menu, select Add a Group.

    3. Enter a value for each of the following items:

      Group NAME

      Assign a name for the group.

      ADMINISTRATIVE Group

      true

      Group ID

      Assign a group ID. Assign the same ID for all nodes in the cluster.

      Figure 4-2 shows an example of adding a group. We added group tivoli with an ID 2000.


      Figure 4-2: Adding a group

  2. Adding IBM Tivoli Workload Scheduler users

    Perform the following procedures for all nodes in the cluster:

    1. Enter the following command; this will take you to the SMIT Users menu:

       # smitty users 

    2. From the Users menu, select Add a User.

    3. Enter values for the following items, then press Enter. Leave the other items as they are.

      User NAME

      Assign a name for the user.

      User ID

      Assign an ID for the user. This ID for the user should be the same on all nodes.

      ADMINISTRATIVE USER?

      false

      Primary GROUP

      Set the group that you have defined in the previous step.

      Group SET

      Set the primary group and the staff group.

      HOME directory

      Set /TWShome.

    Figure 4-3 shows an example of an IBM Tivoli Workload Scheduler user definition. In the example, we defined the maestro user.


    Figure 4-3: Defining a user

    4. After you have added the user, modify the $HOME/.profile of the user. Modify the PATH variable to include the /TWShome and /TWShome/bin directories. This enables you to run IBM Tivoli Workload Scheduler commands from any directory as long as you are logged in as the IBM Tivoli Workload Scheduler user. Also add the TWS_TISDIR variable. The value of TWS_TISDIR should be the /TWShome directory. TWS_TISDIR enables IBM Tivoli Workload Scheduler to display messages in the correct language codeset. Example 4-1 shows how these variables should be defined. In the example, /usr/maestro is the /TWShome directory.

      Example 4-1: An example .profile for TWSusr

      start example
       PATH=/TWShome:/TWShome/bin:$PATH
       export PATH
       TWS_TISDIR=/usr/maestro
       export TWS_TISDIR
      end example
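
    If you prefer the command line to SMIT, the same group and user could be created roughly as follows. The group ID 2000 matches our scenario; the user ID 3001 and the home directory /usr/maestro are example values that you should replace with your own:

       # mkgroup -a id=2000 tivoli
       # mkuser id=3001 pgrp=tivoli groups=tivoli,staff home=/usr/maestro maestro
       # passwd maestro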

4.1.3 Installing the IBM Tivoli Workload Scheduler engine

In this section, we show you the steps to install IBM Tivoli Workload Scheduler 8.2 Engine (Master Domain Manager) from the command line. For procedures to install IBM Tivoli Workload Scheduler using the graphical user interface, refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273.

In our scenario, we installed two TWS instances called TIVAIX1 and TIVAIX2 on a shared external disk. TIVAIX1 was installed from node tivaix1, and TIVAIX2 was installed from tivaix2. We used the following steps to do this.

  1. Before installing, identify the following items. These items are required when running the installation script.

    • workstation type - master

    • workstation name - The name of the workstation. This is the value for the host field that you specify in the workstation definition. It will also be recorded in the globalopts file.

    • netman port - Specify the listening port for netman. Remember that if you plan to have several instances of IBM Tivoli Workload Scheduler running on one machine, you must specify a different port number for each IBM Tivoli Workload Scheduler instance.

    • company name - Specify this if you would like your company name in reports produced by IBM Tivoli Workload Scheduler report commands.

  2. Log in to the node where you want to install the IBM Tivoli Workload Scheduler engine, as a root user.

  3. Confirm that the IBM Tivoli Workload Scheduler file system is mounted. If it is not mounted, use the mount command to mount the IBM Tivoli Workload Scheduler file system.

  4. Insert IBM Tivoli Workload Scheduler Installation Disk 1.

  5. Locate the twsinst script in the directory of the platform on which you want to run the script. The following is an example of installing a Master Domain Manager named TIVAIX1.

     # ./twsinst -new -uname twsusr -cputype master -thiscpu cpuname -master cpuname -port port_no -company company_name 

    Where:

    • twsusr - The name of the IBM Tivoli Workload Scheduler user.

    • master - The workstation type. Refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273, for other options.

    • cpuname - The name of the workstation. For -thiscpu, specify the name of the workstation that you are installing. For -master, specify the name of the Master Domain Manager. When installing the Master Domain Manager, specify the same value for -thiscpu and -master.

    • port_no - Specify the port number that netman uses to receive incoming messages from other workstations.

    • company_name - The name of your company (optional)

    Example 4-2 shows sample command syntax for installing Master Domain Manager TIVAIX1.

    Example 4-2: twsinst script example for TIVAIX1

    start example
     # ./twsinst -new -uname maestro -cputype master -thiscpu tivaix1 -master tivaix1 -port 31111 -company IBM 
    end example

    Example 4-3 shows sample command syntax for installing Master Domain Manager TIVAIX2.

    Example 4-3: twsinst script example for TIVAIX2

    start example
     # ./twsinst -new -uname maestro2 -cputype master -thiscpu tivaix2 -master tivaix2 -port 31112 -company IBM 
    end example

4.1.4 Configuring the IBM Tivoli Workload Scheduler engine

After you have installed the IBM Tivoli Workload Scheduler engine as a Master Domain Manager, perform the following configuration tasks. These are the minimum tasks that you should perform to get IBM Tivoli Workload Scheduler Master Domain Manager running. For instructions on configuring other types of workstation, such as Fault Tolerant Agents and Domain Managers, refer to Tivoli Workload Scheduler Job Scheduling Console User's Guide, SH19-4552, or Tivoli Workload Scheduler Version 8.2, Reference Guide, SC32-1274.

Checking the workstation definition

In order to have IBM Tivoli Workload Scheduler serviced correctly by HACMP in the event of a fallover, it must have the service IP label or the service IP address defined in its workstation definition. When installing a Master Domain Manager (master), the workstation definition is added automatically. After you have installed IBM Tivoli Workload Scheduler, check the workstation definition of the master and verify that the service IP label or the address is associated with the master.

  1. Log in to the master workstation as the TWSuser.

  2. Execute the following command; this opens a text editor with the master's CPU definition:

     $ composer "modify cpu=master_name" 

    Where:

    • master_name - the workstation name of the master.

    Example 4-4 and Example 4-5 show the workstation definitions for the workstations TIVAIX1 and TIVAIX2 that we installed. Notice that the value for NODE is set to the service IP label in each workstation definition.

    Example 4-4: Workstation definition for TIVAIX1

    start example
      CPUNAME TIVAIX1
        DESCRIPTION "MASTER CPU"
        OS UNIX
        NODE tivaix1_svc
        DOMAIN MASTERDM
        TCPADDR 31111
        FOR MAESTRO
          AUTOLINK ON
          RESOLVEDEP ON
          FULLSTATUS ON
      END
    end example

    Example 4-5: Workstation definition for TIVAIX2

    start example
      CPUNAME TIVAIX2
        DESCRIPTION "MASTER CPU"
        OS UNIX
        NODE tivaix2_svc
        DOMAIN MASTERDM
        TCPADDR 31112
        FOR MAESTRO
          AUTOLINK ON
          RESOLVEDEP ON
          FULLSTATUS ON
      END
    end example

  3. If the value for NODE is correctly set to the service IP label, close the workstation definition. If it is not set correctly, modify the file and save it.

Adding the FINAL jobstream

The FINAL jobstream is responsible for generating daily production files. Without this jobstream, IBM Tivoli Workload Scheduler is unable to perform daily job processing. IBM Tivoli Workload Scheduler provides a definition file that you can use to add this FINAL jobstream. The following steps describe how to add the FINAL jobstream using this file.

  1. Log in as the IBM Tivoli Workload Scheduler user.

  2. Add the FINAL schedule by running the following command.

     $ composer "add Sfinal" 

  3. Run Jnextday to create the production file.

     $ Jnextday 

  4. Check the status of IBM Tivoli Workload Scheduler by issuing the following command.

     $ conman status 

    If IBM Tivoli Workload Scheduler started correctly, the status should be Batchman=LIVES.

  5. Check that all IBM Tivoli Workload Scheduler processes (netman, mailman, batchman, jobman) are running. Example 4-6 illustrates checking for the IBM Tivoli Workload Scheduler process.

    Example 4-6: Checking for IBM Tivoli Workload Scheduler process

    start example
      $ ps -ef | grep -v grep | grep maestro
      maestro2 14484 31270   0 16:59:41      -  0:00 /usr/maestro2/bin/batchman -parm 32000
      maestro2 16310 13940   1 16:00:29  pts/0  0:00 -ksh
      maestro2 26950     1   0 22:38:59      -  0:00 /usr/maestro2/bin/netman
      maestro2 28658 16310   2 17:00:07  pts/0  0:00 ps -ef
          root 29968 14484   0 16:59:41      -  0:00 /usr/maestro2/bin/jobman
      maestro2 31270 26950   0 16:59:41      -  0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
      $
    end example
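
    To double-check that the FINAL jobstream made it into the database and into the current plan, composer and conman display commands similar to the following can be used (a sketch; the exact output varies):

      $ composer "display sched=FINAL"
      $ conman "ss @#FINAL"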

4.1.5 Installing IBM Tivoli Workload Scheduler Connector

If you plan to use JSC to perform administration tasks for IBM Tivoli Workload Scheduler, install the IBM Tivoli Workload Scheduler Connector. The IBM Tivoli Workload Scheduler Connector must be installed on any TMR server or Managed Node that is running the IBM Tivoli Workload Scheduler Master Domain Manager. Optionally, the Connector can be installed on any Domain Manager or FTA, provided that a Managed Node is also installed there.

Note

Tivoli Management Framework should be installed prior to the IBM Tivoli Workload Scheduler Connector installation. For instructions on installing a TMR server, refer to Chapter 5 or to Tivoli Enterprise Installation Guide Version 4.1, GC32-0804. In this section, we assume that you have already installed Tivoli Management Framework and have applied the latest set of fix packs.

Here we describe the steps to install Job Scheduling Services (a prerequisite to install IBM Tivoli Workload Scheduler Connector) and IBM Tivoli Workload Scheduler Connector by using the command line. For instructions on installing IBM Tivoli Workload Scheduler Connector from the Tivoli Desktop, refer to Tivoli Workload Scheduler Job Scheduling Console User's Guide, SH19-4552.

For our mutual takeover scenario, each node in our two-node HACMP cluster (tivaix1, tivaix2) hosts a TMR server. We installed IBM Tivoli Workload Scheduler Connector on each of the two cluster nodes.

  1. Before installing, identify the following items. These items are required when running the IBM Tivoli Workload Scheduler Connector installation script.

    • Node name to install IBM Tivoli Workload Scheduler Connector - This must be the name defined in the Tivoli Management Framework.

    • The full path to the installation image - For Job Scheduling Services, it is the directory with the TMF_JSS.IND file. For IBM Tivoli Workload Scheduler Connector, it is the directory with the TWS_CONN.IND file.

    • IBM Tivoli Workload Scheduler installation directory - The /TWShome directory.

    • Connector Instance Name - A name for the connector instance.

    • Instance Owner - The name of the IBM Tivoli Workload Scheduler user.

  2. Insert the IBM Tivoli Workload Scheduler Installation Disk 1.

  3. Log in on the TMR server as root user.

  4. Run the following command to source the Tivoli environment variables:

     # . /etc/Tivoli/setup_env.sh 

  5. Run the following command to install Job Scheduling Services:

     # winstall -c install_dir -i TMF_JSS nodename 

    Where:

    • install_dir - the path to the installation image

    • nodename - the name of the TMR server or the Managed Node that you are installing JSS on.

    The command performs a prerequisite verification, and you are then prompted to confirm whether to proceed with the installation.

    Example 4-7 illustrates the execution of the command.

    Example 4-7: Installing JSS from the command line

    start example
      # winstall -c /usr/sys/inst.images/tivoli/wkb/TWS820_1/TWS_CONN -i TMF_JSS tivaix1
      Checking product dependencies...
        Product TMF_3.7.1 is already installed as needed.
        Dependency check completed.
      Inspecting node tivaix2...
      Installing Product: Tivoli Job Scheduling Services v1.2
      Unless you cancel, the following operations will be executed:
        For the machines in the independent class:
          hosts: tivaix2
         need to copy the CAT (generic) to:
               tivaix2:/usr/local/Tivoli/msg_cat
        For the machines in the aix4-r1 class:
          hosts: tivaix2
         need to copy the BIN (aix4-r1) to:
               tivaix2:/usr/local/Tivoli/bin/aix4-r1
         need to copy the ALIDB (aix4-r1) to:
               tivaix2:/usr/local/Tivoli/spool/tivaix2.db
      Continue([y]/n)?
      Creating product installation description object...Created.
      Executing queued operation(s)
      Distributing machine independent Message Catalogs --> tivaix2
        Completed.
      Distributing architecture specific Binaries --> tivaix2
        Completed.
      Distributing architecture specific Server Database --> tivaix2
      ....Product install completed successfully.
        Completed.
      Registering product installation attributes...Registered.
    end example

  6. Verify that Job Scheduling Services was installed by running the following command:

     # wlsinst -p 

    This command lists all the Tivoli products installed in your environment. You should see "Tivoli Job Scheduling Services v1.2" in the list. Example 4-8 shows an example of the command output; the 10th product listed shows that JSS was installed successfully.

    Example 4-8: wlsinst -p command output

    start example
      # wlsinst -p
      Tivoli Management Framework 4.1
      Tivoli ADE, Version 4.1  (build 09/19)
      Tivoli AEF, Version  4.1  (build 09/19)
      Tivoli Java Client Framework 4.1
      Java 1.3 for Tivoli
      Tivoli Java RDBMS Interface Module (JRIM) 4.1
      JavaHelp 1.0 for Tivoli 4.1
      Tivoli Software Installation Service Client, Version 4.1
      Tivoli Software Installation Service Depot, Version 4.1
      Tivoli Job Scheduling Services v1.2
      Distribution Status Console, Version 4.1
      #
    end example

  7. To install IBM Tivoli Workload Scheduler Connector, run the following command:

     # winstall -c install_dir -i TWS_CONN twsdir=/TWShome iname=instance owner=twsuser createinst=1 nodename 

    Where:

    • install_dir - the path to the installation image.

    • twsdir - set this to /TWShome.

    • iname - the name of the IBM Tivoli Workload Scheduler Connector instance.

    • owner - the name of the IBM Tivoli Workload Scheduler user.

  8. Verify that IBM Tivoli Workload Scheduler Connector was installed by running the following command.

     # wlsinst -p 

    This command lists all the Tivoli products installed in your environment. You should see "Tivoli TWS Connector 8.2" in the list.

    Example 4-9 shows an example of the command output; the 11th product listed shows that IBM Tivoli Workload Scheduler Connector was installed successfully.

Example 4-9: wlsinst -p command output

start example
 # wlsinst -p
 Tivoli Management Framework 4.1
 Tivoli ADE, Version 4.1  (build 09/19)
 Tivoli AEF, Version  4.1  (build 09/19)
 Tivoli Java Client Framework 4.1
 Java 1.3 for Tivoli
 Tivoli Java RDBMS Interface Module (JRIM) 4.1
 JavaHelp 1.0 for Tivoli 4.1
 Tivoli Software Installation Service Client, Version 4.1
 Tivoli Software Installation Service Depot, Version 4.1
 Tivoli Job Scheduling Services v1.2
 Tivoli TWS Connector 8.2
 Distribution Status Console, Version 4.1
 #
end example

4.1.6 Setting the security

After you have installed the IBM Tivoli Workload Scheduler Connectors, apply changes to the IBM Tivoli Workload Scheduler Security file so that users can access IBM Tivoli Workload Scheduler through JSC. If you grant access to a Tivoli Administrator, then any operating system user associated with that Tivoli Administrator is granted access through JSC. For more information on the IBM Tivoli Workload Scheduler Security file, refer to Tivoli Workload Scheduler Version 8.2 Installation Guide, SC32-1273. To modify the Security file, follow the procedures described in this section.

For our scenario, we added the names of two Tivoli Administrators, Root_tivaix1-region and Root_tivaix2-region, to the Security file of each Master Domain Manager. Root_tivaix1-region is a Tivoli Administrator on tivaix1, and Root_tivaix2-region is a Tivoli Administrator on tivaix2. This makes each IBM Tivoli Workload Scheduler Master Domain Manager accessible from either of the two TMR servers. In the event of a fallover, the IBM Tivoli Workload Scheduler Master Domain Manager remains accessible from JSC through the Tivoli Administrator on the surviving node.

  1. Log in to the IBM Tivoli Workload Scheduler master as the TWSuser. The TWSuser is the user you used to install IBM Tivoli Workload Scheduler.

  2. Run the following command to dump the Security file to a text file.

     $ dumpsec > /tmp/sec.txt 

  3. Modify the dumped text file and save your changes. Add the names of the Tivoli Administrators to the LOGON clause.

    Example 4-10 illustrates such a security file. This security file grants full privileged access to the Tivoli Administrators Root_tivaix1-region and Root_tivaix2-region.

    Example 4-10: Example of a security file

    start example
      USER MAESTRO
              CPU=@+LOGON=maestro,root,Root_tivaix2-region,Root_tivaix1-region
      BEGIN
              USEROBJ   CPU=@   ACCESS=ADD,DELETE,DISPLAY,MODIFY,ALTPASS
              JOB       CPU=@   ACCESS=ADD,ADDDEP,ALTPRI,CANCEL,CONFIRM,DELDEP,DELETE,DISPLAY,KILL,MODIFY,RELEASE,REPLY,RERUN,SUBMIT,USE,LIST
              SCHEDULE  CPU=@   ACCESS=ADD,ADDDEP,ALTPRI,CANCEL,DELDEP,DELETE,DISPLAY,LIMIT,MODIFY,RELEASE,REPLY,SUBMIT,LIST
              RESOURCE  CPU=@   ACCESS=ADD,DELETE,DISPLAY,MODIFY,RESOURCE,USE,LIST
              PROMPT            ACCESS=ADD,DELETE,DISPLAY,MODIFY,REPLY,USE,LIST
              FILE      NAME=@  ACCESS=CLEAN,DELETE,DISPLAY,MODIFY
              CPU       CPU=@   ACCESS=ADD,CONSOLE,DELETE,DISPLAY,FENCE,LIMIT,LINK,MODIFY,SHUTDOWN,START,STOP,UNLINK,LIST
              PARAMETER CPU=@   ACCESS=ADD,DELETE,DISPLAY,MODIFY
              CALENDAR          ACCESS=ADD,DELETE,DISPLAY,MODIFY,USE
      END
    end example

  4. Verify your security file by running the following command. Make sure that no errors or warnings are displayed.

     $ makesec -v /tmp/sec.txt 

    Note

    Running the makesec command with the -v option only checks your security file for syntax errors. It does not update the security database.

    Example 4-11 shows the sample output of the makesec -v command:

    Example 4-11: Output of makesec -v command

    start example
      $ makesec -v /tmp/sec.txt
      TWS for UNIX (AIX)/MAKESEC 8.2 (9.3.1.1)
      Licensed Materials      Property of IBM
      5698-WKB (C) Copyright IBM Corp 1998,2003
      US Government User Restricted Rights
      Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
      MAKESEC:Starting user MAESTRO [/tmp/sec.txt (#2)]
      MAKESEC:Done with /tmp/sec.txt, 0 errors (0 Total)
      $
    end example

  5. If there are no errors, compile the security file with the following command:

     $ makesec /tmp/sec.txt 

    Example 4-12 illustrates output of the makesec command:

    Example 4-12: Output of makesec command

    start example
      $ makesec /tmp/sec.txt
      TWS for UNIX (AIX)/MAKESEC 8.2 (9.3.1.1)
      Licensed Materials      Property of IBM
      5698-WKB (C) Copyright IBM Corp 1998,2003
      US Government User Restricted Rights
      Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
      MAKESEC:Starting user MAESTRO [/tmp/sec.txt (#2)]
      MAKESEC:Done with /tmp/sec.txt, 0 errors (0 Total)
      MAKESEC:Security file installed as /usr/maestro/Security
      $
    end example

  6. When applying changes to the security file, the connector instance should be stopped to allow the change to take effect. Run the following commands to source the Tivoli environment variables and stop the connector instance:

      $ . /etc/Tivoli/setup_env.sh
      $ wmaeutil inst_name -stop "*"

    where inst_name is the name of the instance you would like to stop.

    Example 4-13 shows an example of wmaeutil command to stop a connector instance called TIVAIX1.

    Example 4-13: Output of wmaeutil command

    start example
      $ . /etc/Tivoli/setup_env.sh
      $ wmaeutil TIVAIX1 -stop "*"
      AWSBCT758I Done stopping the ENGINE server
      AWSBCT758I Done stopping the DATABASE server
      AWSBCT758I Done stopping the PLAN server
      $
    end example

    Note

    You do not need to manually restart the connector instance, as it is automatically started when a user logs in to JSC.

  7. Verify that the changes to the security file are in effect by running the dumpsec command. This dumps the current contents of the security file into a text file. Open the text file and confirm that the changes you made are reflected:

     $ dumpsec > filename 

    where filename is the name of the text file.

  8. Verify that the changes are effective by logging into JSC as a user you have added in the security file.

4.1.7 Add additional IBM Tivoli Workload Scheduler Connector instance

One IBM Tivoli Workload Scheduler Connector instance can only be mapped to one IBM Tivoli Workload Scheduler instance. In our mutual takeover scenario, one TMR server would be hosting two instances of IBM Tivoli Workload Scheduler in case a fallover occurs. An additional IBM Tivoli Workload Scheduler Connector instance is required on each node so that a user can access both instances of IBM Tivoli Workload Scheduler on the surviving node.

We added a connector instance to each node to control both IBM Tivoli Workload Scheduler Master Domain Managers, TIVAIX1 and TIVAIX2. To add an additional IBM Tivoli Workload Scheduler Connector instance, perform the following tasks.

Note

You must install the Job Scheduling Services and IBM Tivoli Workload Scheduler Connector products before performing these tasks.

  1. Log into a cluster node as root.

  2. Source the Tivoli environment variables by running the following command:

     # . /etc/Tivoli/setup_env.sh 

  3. List the existing connector instance:

     # wlookup -ar MaestroEngine 

    Example 4-14 on page 201 shows one IBM Tivoli Workload Scheduler Connector instance called TIVAIX1.

    Example 4-14: Output of wlookup command before adding additional instance

    start example
      # wlookup -ar MaestroEngine
      TIVAIX1 1394109314.1.661#Maestro::Engine#
    end example

  4. Add an additional connector instance:

     # wtwsconn.sh -create -n instance_name -t TWS_directory 

    where:

    • instance_name - the name of the instance you would like to add.

    • TWS_directory - the path where the IBM Tivoli Workload Scheduler engine associated with the instance resides.

    Example 4-15 shows output from the wtwsconn.sh command. We added a TWS Connector instance called TIVAIX2. This instance is for accessing the IBM Tivoli Workload Scheduler engine installed in the /usr/maestro2 directory.

    Example 4-15: Sample wtwsconn.sh command

    start example
      # wtwsconn.sh -create -n TIVAIX2 -t /usr/maestro2
      Scheduler engine created
      Created instance: TIVAIX2, on node: tivaix1
      MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro2
      MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro2
      MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro2
    end example

  5. Run the wlookup -ar command again to verify that the instance was successfully added. The IBM Tivoli Workload Scheduler Connector that you have just added should show up in the list.

     # wlookup -ar MaestroEngine 

    Example 4-16 shows that IBM Tivoli Workload Scheduler Connector instance TIVAIX2 is added to the list.

    Example 4-16: Output of wlookup command after adding additional instance

    start example
      # wlookup -ar MaestroEngine
      TIVAIX1 1394109314.1.661#Maestro::Engine#
      TIVAIX2 1394109314.1.667#Maestro::Engine#
    end example

4.1.8 Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster

When you have finished installing IBM Tivoli Workload Scheduler, verify that the IBM Tivoli Workload Scheduler instance is able to move from one node to another, and that it is able to run on the standby node(s) in the cluster.

It is important that you perform this task manually before applying fix packs, and also before you install HACMP. Making sure that IBM Tivoli Workload Scheduler behaves as expected before each major change simplifies troubleshooting in case you have issues with IBM Tivoli Workload Scheduler. If you apply IBM Tivoli Workload Scheduler fix packs and install HACMP, and then find out that IBM Tivoli Workload Scheduler behaves unexpectedly, it would be difficult to determine the cause of the problem. Though it may seem cumbersome, we strongly recommend that you verify IBM Tivoli Workload Scheduler behavior before you make a change to a system. The sequence of the verification is as follows.

  1. Stop IBM Tivoli Workload Scheduler on a cluster node. Log in as TWSuser and run the following command:

     $ conman "shut ;wait" 

  2. Migrate the volume group to another node. Refer to the volume group migration procedure described in "Define the shared LVM components" on page 94. (A command-line sketch of this move is shown after this procedure.)

  3. Start IBM Tivoli Workload Scheduler on the node by running the conman start command:

     $ conman start 

  4. Verify the batchman status. Make sure the Batchman status is LIVES.

     $ conman status 

  5. Verify that all IBM Tivoli Workload Scheduler processes are running by issuing the ps command:

     $ ps -ef | grep -v grep | grep maestro 

    Example 4-17 shows an example of ps command output. Check that netman, mailman, batchman and jobman processes are running for each IBM Tivoli Workload Scheduler instance installed.

    Example 4-17: Output of ps command

    start example
      $ ps -ef | grep -v grep | grep maestro
       maestro 26378 43010   1 18:46:58  pts/1  0:00 -ksh
          root 30102 34192   0 18:49:59      -  0:00 /usr/maestro/bin/jobman
       maestro 33836 38244   0 18:49:59      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE
       maestro 34192 33836   0 18:49:59      -  0:00 /usr/maestro/bin/batchman -parm 32000
       maestro 38244     1   0 18:49:48      -  0:00 /usr/maestro/bin/netman
       maestro 41214 26378   4 18:54:52  pts/1  0:00 ps -ef
      $
    end example

  6. If using JSC, log into the IBM Tivoli Workload Scheduler Master Domain Manager. Verify that you are able to see the scheduling objects and the production plan.
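
With HACMP not yet running, the volume group move in step 2 is entirely manual. As a sketch, it amounts to the following commands; the volume group name vg_tws1 and the mount point /usr/maestro are assumptions from our scenario:

     (On the node that currently hosts the instance)
     # umount /usr/maestro
     # varyoffvg vg_tws1

     (On the node that is taking over)
     # varyonvg vg_tws1
     # mount /usr/maestro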

4.1.9 Applying IBM Tivoli Workload Scheduler fix pack

When you have completed installing IBM Tivoli Workload Scheduler and IBM Tivoli Workload Scheduler Connector, apply the latest fix pack available. For instructions on installing the fix pack for IBM Tivoli Workload Scheduler engine, refer to the README file included in each fix pack. The IBM Tivoli Workload Scheduler engine fix pack can be applied either from the command line by using the twspatch script, or from the Java-based graphical user interface.

The IBM Tivoli Workload Scheduler Connector fix pack is applied from the Tivoli Desktop. Because instructions on applying the IBM Tivoli Workload Scheduler Connector fix pack are not documented in the fix pack README, we describe the procedure for installing IBM Tivoli Workload Scheduler Connector fix packs here.

Before applying any of the fix packs, make sure you have a viable backup.

Note

The same level of fix pack should be applied to the IBM Tivoli Workload Scheduler engine and the IBM Tivoli Workload Scheduler Connector. If you apply a fix pack to the IBM Tivoli Workload Scheduler engine, make sure you apply the same level of fix pack for IBM Tivoli Workload Scheduler Connector.

Applying IBM Tivoli Workload Scheduler Connector fix pack from Tivoli Desktop

Install the IBM Tivoli Workload Scheduler Connector fix pack as follows:

  1. Set the installation media. If you are using a CD, then insert the CD. If you have downloaded the fix pack from the fix pack download site, then extract the tar file in a temporary directory.

  2. Log in to TMR server using the Tivoli Desktop. Enter the host machine name, user name, and password, then press OK as seen in Figure 4-4 on page 205.


    Figure 4-4: Logging into IBM Tivoli Management Framework through the Tivoli Desktop

  3. Select Desktop -> Install -> Install Patch as seen in Figure 4-5.


    Figure 4-5: Installing the fix pack

  4. If the error message in Figure 4-6 on page 206 is shown, press OK and proceed to the next step.


    Figure 4-6: Error message

  5. In the Path Name field, enter the full path of the installation image, as shown in Figure 4-7. The full path should be the directory where U2_TWS.IND file resides.


    Figure 4-7: Specifying the path to the installation image

  6. In the Install Patch dialog (Figure 4-8 on page 207), select the fix pack from the Select Patches to Install list. Then make sure the node to install the fix pack is shown in the Clients to Install On list. Press Install.


    Figure 4-8: Install Patch

  7. Pre-installation verification is performed, and then you are prompted to continue or not. If there are no errors or warnings shown in the dialog, press Continue Install (Figure 4-9 on page 208).


    Figure 4-9: Patch Installation

  8. Confirm the "Finished Patch Installation" message, then press Close.

  9. Log in, as root user, to the node where you just installed the fix pack.

  10. Source the Tivoli environment variables:

     # . /etc/Tivoli/setup_env.sh 

  11. Verify that the fix pack was installed successfully:

     # wlsinst -P 

    For IBM Tivoli Workload Scheduler Connector Fix Pack 01, confirm that "Tivoli TWS Connector upgrade to v8.2 patch 1" is included in the list. For Fix Pack 02, confirm that "Tivoli TWS Connector upgrade to v8.2 patch 2" is included in the list. Example 4-18 on page 209 shows the output of the wlsinst command after installing Fix Pack 01.

    Example 4-18: Verifying the fix pack installation

    start example
      # wlsinst -P
      4.1-TMF-0008 Tier 2 3.7 Endpoint Bundles for Tier1 Gateways
      Tivoli Framework Patch 4.1-TMF-0013  (build 05/28)
      Tivoli Framework Patch 4.1-TMF-0014  (build 05/30)
      Tivoli Framework Patch 4.1-TMF-0015 for linux-ppc (LCF41) (build 05/14)
      Tivoli Management Agent 4.1 for iSeries Endpoint (41016)
      Tivoli Framework Patch 4.1-TMF-0034  (build 10/17)
      Java 1.3 for Tivoli, United Linux
      Tivoli Management Framework, Version 4.1 [2928] os400 Endpoint French language
      Tivoli Management Framework, Version 4.1 [2929] os400 Endpoint German language
      Tivoli Management Framework, Version 4.1 [2931] os400 Endpoint Spanish language
      Tivoli Management Framework, Version 4.1 [2932] os400 Endpoint Italian language
      Tivoli Management Framework, Version 4.1 [2962] os400 Endpoint Japanese language
      Tivoli Management Framework, Version 4.1 [2980] os400 Endpoint Brazilian Portuguese language
      Tivoli Management Framework, Version 4.1 [2984] os400 Endpoint DBCS English language
      Tivoli Management Framework, Version 4.1 [2986] os400 Endpoint Korean language
      Tivoli Management Framework, Version 4.1 [2987] os400 Endpoint Traditional Chinese language
      Tivoli Management Framework, Version 4.1 [2989] os400 Endpoint Simplified Chinese language
      Tivoli TWS Connector upgrade to v8.2 patch 1
      #
    end example

Best practices for applying IBM Tivoli Workload Scheduler fix pack

As of December 2003, the latest fix pack for IBM Tivoli Workload Scheduler 8.2 is 8.2-TWS-FP02. Because 8.2-TWS-FP02 is dependent on 8.2-TWS-FP01, we applied both fix packs. Here are some hints and tips for applying these fix packs.

Additional disk space required for backup files

Though not mentioned in the README for 8.2-TWS-FP01, a backup copy of the existing binaries is created under the home directory of the user applying the fix. The IBM Tivoli Workload Scheduler fix pack is applied as the root user, which means that the backup is created under the home directory of the root user.

Before applying the fix, confirm that you have enough space in that directory for the backup files; for UNIX systems, 25 MB is required. If you do not have enough space in that directory, the fix pack installation may fail with the message shown in Example 4-19. This example shows the failure message displayed when the fix pack installation is initiated from the command line.

Example 4-19: IBM Tivoli Workload Scheduler fix pack installation error

start example
 # ./twspatch -install -uname maestro2
 Licensed Materials Property of IBM TWS-WSH
 (C) Copyright IBM Corp 1998,2003
 US Government User Restricted Rights
 Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
 TWS for UNIX/TWSPATCH 8.2 Revision: 1.5
 AWSFAF027E Error: Operation INSTALL failed. For more details see the /tmp/FP_TWS_AIX_maestro2^8.2.0.01.log log file.
 #
end example

Check the fix pack installation log file. On UNIX systems, the fix pack installation log files are saved in the /tmp directory. These logs are named twspatchXXXXX.log, where XXXXX is a 5-digit random number. Example 4-20 shows the contents of the log file we received when we had insufficient disk space.

Example 4-20: The contents of /tmp/twspatchXXXXX.log

start example
 Tue Dec 2 19:24:53 CST 2003 DISSE0006E Operation unsuccessful: fatal failure. 
end example

If you do not have sufficient disk space on the desired directory, you can either add additional disk space, or change the backup directory to another directory with sufficient disk space. For instructions on how to change the backup directory, refer to the README file attached to 8.2-TWS-FP02.

Note

Changing the backup directory requires a modification of a file used by IBM Tivoli Configuration Manager 4.2, and changes may affect the behavior of TCM 4.2 if you have it installed on your system. Consult your IBM service provider for more information.

4.1.10 Configure HACMP for IBM Tivoli Workload Scheduler

After you complete the installation of the application server (IBM Tivoli Workload Scheduler, in this redbook) and then HACMP, you configure HACMP as you planned in 3.2.3, "Planning and designing an HACMP cluster" on page 67, so the application server can be made highly available.

Note

We strongly recommend that you install your application servers and ensure they function properly before installing HACMP. In the environment we used for this redbook, we installed IBM Tivoli Workload Scheduler and/or IBM Tivoli Management Framework as called for by the scenarios we implement. This section is specifically oriented towards showing you how to configure HACMP for IBM Tivoli Workload Scheduler in a mutual takeover cluster.

In this section we show how to configure HACMP specifically for IBM Tivoli Workload Scheduler. Configuration of HACMP 5.1 can be carried out through the HACMP menu of the SMIT interface, or by the Online Planning Worksheets tool shipped with the HACMP 5.1 software. In this and in the following sections, we describe the steps to configure HACMP using the SMIT interface to support IBM Tivoli Workload Scheduler. We walk you through a series of steps that are specifically tailored to make the following scenarios highly available:

  • IBM Tivoli Workload Scheduler

  • IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework

  • IBM Tivoli Management Framework (shown in Chapter 5, "Implement IBM Tivoli Management Framework in a cluster" on page 411)

Note

There are many other possible scenarios, and many features are not used by our configuration in this redbook and are not covered in the following sections. Any other scenario should be planned and configured using the HACMP manuals and IBM Redbooks; alternatively, consult your IBM service provider for assistance in planning and implementing other scenarios.

The Online Planning Worksheet is a Java-based worksheet that will help you plan your HACMP configuration. It generates a configuration file based on the information you have entered that can be directly loaded into a live HACMP cluster, and it also generates a convenient HTML page documenting the configuration. We do not show how to use this worksheet here; for a complete and detailed explanation of this worksheet, see Chapter 16, "Using Online Planning Worksheets", High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00.

Note

One of the major drawbacks when you use the Online Planning Worksheet is that certain HACMP configurations that are accepted by the tool might cause problems on a live HACMP cluster. The SMIT screens that we show in this redbook tend to catch these problems.

Our recommendation, as of HACMP Version 5.1, is to use the Online Planning Worksheet to create convenient HTML documentation of the configuration, and then manually configure the cluster through the SMIT screens.

Following is an overview of the steps we use to configure HACMP for our IBM Tivoli Workload Scheduler environment, and where you can find each step:

  • "Configure heartbeating" on page 213

  • "Configure HACMP topology" on page 219

  • "Configure HACMP service IP labels/addresses" on page 252

  • "Configure application servers" on page 223

  • "Configure application monitoring" on page 227

  • "Add custom start and stop HACMP scripts" on page 234

  • "Add a custom post-event HACMP script" on page 242

  • "Modify /etc/hosts and name resolution order" on page 250

  • "Configure HACMP networks and heartbeat paths" on page 254

  • "Configure HACMP resource groups" on page 257

  • "Configure HACMP resource groups" on page 257

  • "Configure cascading without fallback" on page 264

  • "Configure pre-event and post-event commands" on page 267

  • "Configure pre-event and post-event processing" on page 269

  • "Configure HACMP persistent node IP label/addresses" on page 272

  • "Configure predefined communication interfaces" on page 276

  • "Verify the configuration" on page 280

  • "Start HACMP Cluster services" on page 287

  • "Verify HACMP status" on page 292

  • "Test HACMP resource group moves" on page 294

  • "Live test of HACMP fallover" on page 298

  • "Configure HACMP to start on system restart" on page 300

  • "Verify IBM Tivoli Workload Scheduler fallover" on page 301

The details of each step are as follows.

Configure heartbeating

The configuration we used implements two heartbeat mechanisms: one over the IP network, and one over the SSA disk subsystem (called target mode SSA). Best practices call for implementing at least one non-IP point-to-point network for exchanging heartbeat keepalive packets between cluster nodes, in case the TCP/IP-based subsystem, networks, or network NICs fail. Available non-IP heartbeat mechanisms are:

  • Target Mode SSA

  • Target Mode SCSI

  • Serial (also known as RS-232C)

  • Heartbeating over disk (only available for enhanced concurrent mode volume groups)

In this section, we describe how to configure a target mode SSA connection between HACMP nodes sharing disks connected to SSA on Multi-Initiator RAID adapters (FC 6215 and FC 6219). The adapters must be at Microcode Level 1801 or later.

You can define a point-to-point network to HACMP that connects all nodes on an SSA loop. The major steps of configuring target mode SSA are:

  • "Changing node numbers on systems in SSA loop" on page 213

  • "Configuring Target Mode SSA devices" on page 215

The details of each step follows.

Changing node numbers on systems in SSA loop

By default, SSA node numbers on all systems are zero. These must be changed to unique, non-zero numbers on the nodes to enable target mode SSA.

To configure the target mode SSA devices:

  1. Assign a unique non-zero SSA node number to all systems on the SSA loop.

    Note

    The ID on a given SSA node should match the HACMP node ID, which is contained in the node_id field of the HACMP node ODM entry.

    The following command retrieves the HACMP node ID:

        odmget -q "name = node_name" HACMPnode 

    where node_name is the HACMP node name of the cluster node. In our environment, we used tivaix1 and tivaix2 as the values for node_name.

    Example 4-21 shows how we determined the HACMP node ID for tivaix1. Here we determined that tivaix1 uses node ID 1, based upon the information in the line highlighted in bold that starts with the string "node_id".

    Example 4-21: How to determine a cluster node's HACMP node ID

    start example
      [root@tivaix1:/home] odmget -q "name = tivaix1" HACMPnode | grep -p COMMUNICATION_PATH
      HACMPnode:
              name = "tivaix1"
              object = "COMMUNICATION_PATH"
              value = "9.3.4.194"
              node_id = 1
              node_handle = 1
              version = 6
    end example

    Note that we piped the output of the odmget command to the grep command to extract a single stanza. If you omit this part of the command string, multiple stanzas are displayed, all of which have the same node_id field.

  2. To change the SSA node number:

        chdev -l ssar -a node_number=number 

    where number is the new SSA node number. Best practice calls for using the same number as the HACMP node ID determined in the preceding step.

    Note

    If you are using IBM AIX General Parallel File System (GPFS), you must make the SSA node number match the HACMP cluster node ID.

    In our environment, we assigned SSA node number 1 to tivaix1 and SSA node number 2 to tivaix2.

  3. To show the system's SSA node number:

        lsattr -El ssar 

    Example 4-22 shows the output of this command for tivaix1, where the node number is highlighted in bold.

    Example 4-22: Show a system's SSA node number, taken from tivaix1

    start example
      [root@tivaix1:/home] lsattr -El ssar
      node_number 1 SSA Network node number True
    end example

Repeat this procedure on each cluster node, assigning a different SSA node number for each cluster node. In our environment, Example 4-23 shows that tivaix2 was assigned SSA node number 2.

Example 4-23: Show a system's SSA node number, taken from tivaix2

start example
 [root@tivaix2:/home] lsattr -El ssar
 node_number 2 SSA Network node number True
end example
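
For reference, the concrete commands behind these node numbers in our environment would look like the following sketch. If the SSA adapter or disks are in use, chdev may refuse the change; in that case the -P flag defers the change until the next reboot:

    (On tivaix1)
    # chdev -l ssar -a node_number=1

    (On tivaix2)
    # chdev -l ssar -a node_number=2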

Configuring Target Mode SSA devices

After enabling the target mode interface, run cfgmgr to create the initiator and target devices and make them available.

To create the initiator and target devices:

  1. Enter: smit devices. SMIT displays a list of devices.

  2. Select Install/Configure Devices Added After IPL and press Enter.

  3. Exit SMIT after the cfgmgr command completes.

  4. Ensure that the devices are paired correctly:

        lsdev -C | grep tmssa 

    Example 4-24 shows this command's output on tivaix1 in our environment.

    Example 4-24: Ensure that target mode SSA is configured on a cluster node, taken from tivaix1

    start example
      [root@tivaix1:/home] lsdev -C | grep tmssa
      tmssa2     Available               Target Mode SSA Device
      tmssar     Available               Target Mode SSA Router
    end example

    Example 4-25 shows this command's output on tivaix2 in our environment.

    Example 4-25: Ensure that target mode SSA is configured on a cluster node, taken from tivaix2

    start example
      # lsdev -C | grep tmssa
      tmssa1     Available              Target Mode SSA Device
      tmssar     Available              Target Mode SSA Router
    end example

    Note how each cluster node uses the same target mode SSA router, but different target mode SSA devices. The differences are highlighted in bold in the preceding figures. Cluster node tivaix1 uses target mode SSA device tmssa2, while cluster node tivaix2 uses tmssa1.

Repeat the procedures for enabling and configuring the target mode SSA devices for other nodes connected to the SSA adapters.

Configuring the target mode connection creates two target mode files in the /dev directory of each node:

  • /dev/tmssan.im - the initiator file, which transmits data

  • /dev/tmssan.tm - the target file, which receives data

where n is a number that uniquely identifies the target mode file. Note that this number is different from the SSA node number and the HACMP node ID from the preceding section. These numbers are deliberately set differently.

Example 4-26 shows the target mode files created in the /dev directory for tivaix1 in our environment.

Example 4-26: Display the target mode SSA files for tivaix1

start example
 [root@tivaix1:/home] ls /dev/tmssa*.im /dev/tmssa*.tm
 /dev/tmssa2.im  /dev/tmssa2.tm
end example

Example 4-27 shows the target mode files created in the /dev directory for tivaix2 in our environment.

Example 4-27: Display the target mode SSA files for tivaix2

start example
 [root@tivaix2:/home] ls /dev/tmssa*.im /dev/tmssa*.tm
 /dev/tmssa1.im*  /dev/tmssa1.tm*
end example

Note

On page 273, in section "Configuring Target Mode SSA Devices" of High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, these target mode SSA files are referred to as /dev/tmscsinn.im and /dev/tmscsinn.tm. We believe this is incorrect, because these are the files used for target mode SCSI heartbeating. This redbook shows what we believe are the correct file names. This includes the corrected unique identifiers, changed from two digits (nn) to one digit (n).

Testing the target mode connection

In order for the target mode connection to work, initiator and target devices must be paired correctly.

To ensure that devices are paired and that the connection is working after enabling the target mode connection on both nodes:

  1. Enter the following command on a node connected to the SSA disks:

        cat < /dev/tmssa#.tm 

    where # must be the number of the target node. (This command hangs and waits for the next command.)

    In our environment, on tivaix1 we ran the command:

        cat < /dev/tmssa2.tm 

  2. On the target node, enter the following command:

        cat filename > /dev/tmssa#.im 

    where # must be the number of the node on which you entered the first command, and filename is any short ASCII file.

    The contents of the specified file are displayed on the node on which you entered the first command.

    In our environment, on tivaix2 we ran the command:

        cat /etc/hosts > /dev/tmssa1.im 

    The contents of /etc/hosts on tivaix2 are then displayed in the terminal session on tivaix1.

  3. You can also check that the tmssa devices are available on each system:

        lsdev -C | grep tmssa 

Defining the Target Mode SSA network to HACMP

To configure the Target Mode SSA point-to-point network in the HACMP cluster, follow these steps.

  1. Enter: smit hacmp.

  2. In SMIT, select Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP Cluster and press Enter.

    SMIT displays a choice of types of networks.

  3. Select the type of network to configure (select tmssa because we are using target mode SSA) and press Enter. The Add a Serial Network screen is displayed as shown in Figure 4-10 on page 218.


    Figure 4-10: Filling out the Add a Serial Network to the HACMP Cluster SMIT screen

  4. Fill in the fields on the Add a Serial Network screen as follows:

    Network Name

    Name the network, using no more than 32 alphanumeric characters and underscores; do not begin the name with a numeric.

    Do not use reserved names to name the network. For a list of reserved names see High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

    Network Type

    Valid types are RS232, tmssa, tmscsi, diskhb. This is filled in for you by the SMIT screen.

  5. Press Enter to configure this network.

  6. Return to the Add a Serial Network SMIT screen to configure more networks if necessary.

For our environment, we configured net_tmssa_01 as shown in Figure 4-10. No other serial networks were necessary.

Configure HACMP topology

Complete the following procedures to define the cluster topology. You only need to perform these steps on one node. When you verify and synchronize the cluster topology, its definition is copied to the other nodes. To define and configure nodes for the HACMP cluster topology:

  1. Enter: smitty hacmp. The HACMP for AIX SMIT screen is displayed as shown in Figure 4-11.


    Figure 4-11: HACMP for AIX SMIT screen

  2. Go to Initialization and Standard Configuration -> Add Nodes to an HACMP Cluster and press Enter. The Configure Nodes to an HACMP Cluster (standard) SMIT screen is displayed as shown in Figure 4-12 on page 220.


    Figure 4-12: Configure nodes to an HACMP Cluster

  3. Enter field values on the Configure Nodes to an HACMP Cluster screen as follows:

    Cluster Name

    Enter an ASCII text string that identifies the cluster. The cluster name can include alpha and numeric characters and underscores, but cannot have a leading numeric. Use no more than 32 characters. It can be different from the hostname.

    Do not use reserved names. For a list of reserved names see Chapter 6, "Verifying and Synchronizing a Cluster Configuration", in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

    New nodes (via selected communication paths)

    Enter (or add) one resolvable IP Label (this may be the hostname), IP address, or Fully Qualified Domain Name for each new node in the cluster, separated by spaces.

    This path will be taken to initiate communication with the node (for example, NodeA, 10.11.12.13, NodeC.ibm.com). Use F4 to see the picklist display of the hostnames and/or addresses in /etc/hosts that are not already HACMP-configured IP Labels/Addresses.

    You can add node names or IP addresses in any order.

    Currently configured node(s)

    If nodes are already configured, they are displayed here.

    In our environment, we entered cltivoli in the Cluster Name field and tivaix1 tivaix2 in the New Nodes (via selected communication paths) field.

  4. Press Enter to configure the nodes of the HACMP cluster. A COMMAND STATUS SMIT screen displays the progress of the cluster node configurations.

    The HACMP software uses this information to create the cluster communication paths for the ODM. Once communication paths are established, HACMP runs the discovery operation and prints results to the SMIT screen.

  5. Verify that the results are reasonable for your cluster.

At this point HACMP does not know how to locate the cluster nodes; this step only reserves placeholders for them. The following steps fill out the remaining information that enables HACMP to associate actual computing resources, such as disks, processes, and networks, with these newly reserved cluster nodes.

Configure HACMP service IP labels/addresses

A service IP label/address is used to establish communication between client nodes and the server node. Services, such as a database application, are provided using the connection made over the service IP label. This connection can be node-bound or taken over by multiple nodes.

For the standard configuration, it is assumed that the connection will allow IP Address Takeover (IPAT) via aliases. The /etc/hosts file on all nodes must contain all IP labels and associated IP addresses that you want to discover.

Follow this procedure to define service IP labels for your cluster:

  1. Enter: smit hacmp.

  2. Go to HACMP -> Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Service IP Labels/Addresses and press Enter.

  3. Fill in field values as follows as shown in Figure 4-13:

    IP Label/IP Address

    Enter, or select from the picklist, the IP label/IP address to be kept highly available.

    Network Name

    Enter the symbolic name of the HACMP network on which this Service IP label/address will be configured. If you leave the field blank, HACMP fills in this field automatically with the network type plus a number appended, starting with 1 (for example, netether1).


    Figure 4-13: Enter service IP label for tivaix1

    Figure 4-13 shows how we entered the service address label for tivaix1. In our environment, we used tivaix1_svc as the IP label and net_ether_01 as the network name.

  4. Press Enter after filling in all required fields. HACMP now checks the validity of the IP interface configuration.

  5. Repeat the previous steps until you have configured all IP service labels for each network, as needed.

    In our environment, we created another service IP label for cluster node tivaix2, as shown in Figure 4-14. We used tivaix2_svc as the IP label and net_ether_01 as the network name. Note how we assigned the network name net_ether_01 in both cases, so that both sets of service IP labels are in the same HACMP network.


    Figure 4-14: Enter service IP labels for tivaix2

Configure application servers

An application server is a cluster resource used to control an application that must be kept highly available. Configuring an application server does the following:

  • Associates a meaningful name with the server application. For example, you could give an installation of IBM Tivoli Workload Scheduler a name such as itws. You then use this name to refer to the application server when you define it as a resource.

  • Points the cluster event scripts to the scripts that they call to start and stop the server application.

  • Allows you to then configure application monitoring for that application server.

We show you in "Add custom start and stop HACMP scripts" on page 234 how to write the start and stop scripts for IBM Tivoli Workload Scheduler.

Note

Ensure that the server start and stop scripts exist on all nodes that participate as possible owners of the resource group where this application server resides.

Complete the following steps to create an application server on any cluster node:

  1. Enter smitty hacmp.

  2. Go to Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Application Servers and press Enter. The Configure Resources to Make Highly Available SMIT screen is displayed as shown in Figure 4-15.


    Figure 4-15: Configure Resources to Make Highly Available SMIT screen

    Go to Add an Application Server and press Enter (Figure 4-16 on page 225).


    Figure 4-16: Configure Application Servers SMIT screen

  3. The Add Application Server SMIT screen is displayed as shown in Figure 4-17 on page 226. Enter field values as follows:

    Server Name

    Enter an ASCII text string that identifies the server. You will use this name to refer to the application server when you define resources during node configuration. The server name can include alphabetic and numeric characters and underscores. Use no more than 64 characters.

    Start Script

    Enter the name of the script and its full pathname (followed by arguments) called by the cluster event scripts to start the application server (maximum: 256 characters). This script must be in the same location on each cluster node that might start the server. The contents of the script, however, may differ.

    Stop Script

    Enter the full pathname of the script called by the cluster event scripts to stop the server (maximum: 256 characters). This script must be in the same location on each cluster node that may start the server. The contents of the script, however, may differ.


    Figure 4-17: Fill out the Add Application Server SMIT screen for application server tws_svr1

    As shown in Figure 4-17, in our environment on tivaix1 we named the instance of IBM Tivoli Workload Scheduler that normally runs on that cluster node tws_svr1. For the instance of IBM Tivoli Workload Scheduler on tivaix2, we name the corresponding application server object tws_svr2. Note that no mention is made of the cluster nodes when defining an application server. We only mention them to make you aware of the conventions we used in our environment.

    For the start script of application server tws_svr1, we entered the following in the Start Script field:

        /usr/es/sbin/cluster/utils/start_tws1.sh 

    The stop script of this application server is:

        /usr/es/sbin/cluster/utils/stop_tws1.sh 

    This is entered in the Stop Script field.

  4. Press Enter to add this information to the ODM on the local node.

  5. Repeat the procedure for all additional application servers.

    In our environment, we added a definition for application server tws_svr2, using the start script for the Start Script field:

        /usr/es/sbin/cluster/utils/start_tws2.sh 

    For tws_svr2, we entered the following stop script in the Stop Script field:

        /usr/es/sbin/cluster/utils/stop_tws2.sh 

    Figure 4-18 shows how we filled out the SMIT screen to define application server tws_svr2.


    Figure 4-18: Fill out the Add Application Server SMIT screen for application server tws_svr2

You only need to perform this on one cluster node. When you verify and synchronize the cluster topology, the new application server definitions are copied to the other nodes.

Configure application monitoring

HACMP can monitor specified applications and automatically take action to restart them upon detecting process death or other application failures.

Note

If a monitored application is under the control of the System Resource Controller (SRC), verify that its action and multi attributes are set to -O and -Q. The -O setting specifies that the subsystem is not restarted if it stops abnormally. The -Q setting specifies that multiple instances of the subsystem are not allowed to run at the same time. These values can be checked using the following command:

    lssrc -Ss Subsystem | cut -d : -f 10,11

If the values are not -O and -Q, they must be changed using the chssys command.
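
For example, to inspect and, if necessary, adjust these settings for a hypothetical SRC subsystem named mysubsys (the subsystem name here is only a placeholder), you could run:

    # Show the action (restart) and multi (multiple instance) fields; the two
    # colon-separated values should be -O and -Q.
    lssrc -Ss mysubsys | cut -d : -f 10,11

    # If they are not, change them with chssys:
    #   -O  do not restart the subsystem if it stops abnormally
    #   -Q  do not allow multiple instances of the subsystem
    chssys -s mysubsys -O -Q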

You can select either of two application monitoring methods:

  • Process application monitoring detects the death of one or more processes of an application, using RSCT Event Management.

  • Custom application monitoring checks the health of an application with a custom monitor method at user-specified polling intervals.

Process monitoring is easier to set up, because it uses the built-in monitoring capability provided by RSCT and requires no custom scripts. However, process monitoring may not be an appropriate option for all applications. Custom monitoring can monitor more subtle aspects of an application's performance and is more customizable, but it takes more planning, because you must create the custom scripts.

In this section, we show how to configure process monitoring for IBM Tivoli Workload Scheduler. Remember that an application must be defined to an application server before you set up the monitor.

For IBM Tivoli Workload Scheduler, we configure process monitoring for the netman process because it will always run under normal conditions. If it fails, we want the cluster to automatically fall over, and not attempt to restart netman.

Because netman starts very quickly, we only give it 60 seconds to start before monitoring begins. For cleanup and restart scripts, we will use the same scripts as the start and stop scripts discussed in "Add custom start and stop HACMP scripts" on page 234.

Tip

For more comprehensive application monitoring by HACMP, configure process monitoring for the IBM Tivoli Workload Scheduler processes batchman, jobman, mailman, and writer. Define application server resources for each of these processes before defining the process monitoring for them.

If you do this, be sure to use the cl_RMupdate command to suspend monitoring before Jnextday starts and to resume monitoring after Jnextday completes. Otherwise, the cluster will interpret the Jnextday-originated shutdown of these processes as a failure of the cluster node and inadvertently start a fallover.
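
One hedged way to script this suspend/resume pattern is sketched below. We have not verified the exact cl_RMupdate syntax here; the operation names suspend_appmon and resume_appmon, the application server names, and the wrapper approach itself are assumptions that you should check against your HACMP 5.1 documentation (the SMIT panels for suspending and resuming application monitoring can be used instead).

    #!/bin/ksh
    # Hypothetical wrapper around Jnextday processing. The cl_RMupdate operation
    # names below are assumptions -- confirm the exact syntax for your HACMP
    # release before relying on anything like this.
    PATH=${PATH}:/usr/es/sbin/cluster/utilities

    for appsvr in tws_svr1 tws_svr2      # application servers defined earlier
    do
            cl_RMupdate suspend_appmon ${appsvr}
    done

    # ... Jnextday runs here (for example, as the scheduled FINAL job stream) ...

    for appsvr in tws_svr1 tws_svr2
    do
            cl_RMupdate resume_appmon ${appsvr}
    done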

Set up your process application monitor as follows:

  1. Enter: smit hacmp.

  2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resources Configuration -> Configure HACMP Application Monitoring -> Configure Process Application Monitor -> Add Process Application Monitor and press Enter. A list of previously defined application servers appears.

  3. Select the application server for which you want to add a process monitor.

    In our environment, we selected tws_svr1, as shown in Figure 4-19.


    Figure 4-19: How to select an application server to monitor

  4. In the Add Process Application Monitor screen, fill in the field values as follows:

    Monitor Name

    This is the name of the application monitor. If this monitor is associated with an application server, the monitor has the same name as the application server. This field is informational only and cannot be edited.

    Application Server Name

    (This field can be chosen from the picklist. It is already filled in with the name of the application server you selected.)

    Processes to Monitor

    Specify the process(es) to monitor. You can type more than one process name. Use spaces to separate the names.

    Note

    To be sure you are using correct process names, use the names as they appear from the ps -el command (not ps -f), as explained in section "Identifying Correct Process Names" in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

    Process Owner

    Specify the user id of the owner of the processes specified above (for example: root). Note that the process owner must own all processes to be monitored.

    Instance Count

    Specify how many instances of the application to monitor. The default is 1 instance. The number of instances must match the number of processes to monitor exactly. If you put one instance, and another instance of the application starts, you will receive an application monitor error.

    Note

    This number must be 1 if you have specified more than one process to monitor (one instance for each process).

    Stabilization Interval

    Specify the time (in seconds) to wait before beginning monitoring. For instance, with a database application, you may wish to delay monitoring until after the start script and initial database search have been completed. You may need to experiment with this value to balance performance with reliability.

    Note

    In most circumstances, this value should not be zero.

    Restart Count

    Specify the restart count, that is the number of times to attempt to restart the application before taking any other actions. The default is 3.

    Note

    Make sure you enter a Restart Method if your Restart Count is any non-zero value.

    Restart Interval

    Specify the interval (in seconds) that the application must remain stable before resetting the restart count. Do not set this to be shorter than (Restart Count) x (Stabilization Interval). The default is 10% longer than that value. If the restart interval is too short, the restart count will be reset too soon and the desired fallover or notify action may not occur when it should.

    Action on Application Failure

    Specify the action to be taken if the application cannot be restarted within the restart count. You can keep the default choice notify, which runs an event to inform the cluster of the failure, or select fallover, in which case the resource group containing the failed application moves over to the cluster node with the next highest priority for that resource group.

    For more information, refer to "Note on the Fallover Option and Resource Group Availability" in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

    Notify Method

    (Optional) Define a notify method that will run when the application fails. This custom method runs during the restart process and during notify activity.

    Cleanup Method

    (Optional) Specify an application cleanup script to be invoked when a failed application is detected, before invoking the restart method. The default is the application server stop script defined when the application server was set up.

    Note

    With application monitoring, since the application is already stopped when this script is called, the server stop script may fail.

    Restart Method

    (Required if Restart Count is not zero.) The default restart method is the application server start script defined previously, when the application server was set up. You can specify a different method here if desired.

    In our environment, we entered the process /usr/maestro/bin/netman in the Process to Monitor field, maestro in the Process Owner field, 60 in the Restart Interval field, 0 in the Restart Count field, and fallover in the Action on Application Failure field; all other fields were left unchanged, as shown in Figure 4-20.


    Figure 4-20: Add Process Application Monitor SMIT screen for application server tws_svr1

    In our environment, the COMMAND STATUS SMIT screen displayed two warnings, as shown in Figure 4-21, which we could safely ignore because the default values applied are the desired values.


    Figure 4-21: COMMAND STATUS SMIT screen after creating HACMP process application monitor

  5. Press Enter when you have entered the desired information.

    The values are then checked for consistency and entered into the ODM. When the resource group comes online, the application monitor starts.

  6. Repeat the operation for remaining application servers.

    In our environment, we repeated the operation for application server tws_svr2. We entered the field values as shown in Figure 4-22 on page 234.


    Figure 4-22: Add Process Application Monitor SMIT screen for application server tws_svr2

    We entered the process /usr/maestro2/bin/netman in the Process to Monitor field, maestro2 in the Process Owner field, 60 in the Restart Interval field, 0 in the Restart Count field, and fallover in the Action on Application Failure field; all other fields were left unchanged.

Add custom start and stop HACMP scripts

For IBM Tivoli Workload Scheduler, custom scripts for HACMP are required to start and stop the application server. These are used when HACMP starts an application server that is part of a resource group, and gracefully shuts down the application server when a resource group is taken offline or moved. The stop script, of course, does not get an opportunity to execute if a cluster node is unexpectedly halted. We developed the following basic versions of the scripts for our environment. You may need to write your own version to accommodate your site's specific requirements.

Both of these example scripts are designed to recognize how they were called: each script examines the name under which it was invoked and performs certain actions based on that name. Our environment's design has two variable factors when starting and stopping IBM Tivoli Workload Scheduler:

  • Name of the TWSuser user account associated with a particular instance of IBM Tivoli Workload Scheduler. In our environment, there are two instances of the application, and the user accounts maestro and maestro2 are associated with these instances.

  • Path to the installation of each instance of IBM Tivoli Workload Scheduler, called the TWShome directory. In our environment, the two instances are installed under /usr/maestro and /usr/maestro2.

The scripts are designed so that when they are called with a name that follows a certain format, they will compute these variable factors depending upon the name. The format is start_twsn.sh and stop_twsn.sh, where n matches the cluster node number by convention.

  • When n equals 1, it is treated as a special case: TWSuser is assumed to be maestro, and TWShome is assumed to be /usr/maestro.

  • When n equals any other number, TWSuser is assumed to be maestron, and TWShome is assumed to be /usr/maestron. For example, if n is 4, TWSuser is maestro4 and TWShome is /usr/maestro4.

You need one pair of start and stop scripts for each instance of IBM Tivoli Workload Scheduler that will run in the cluster. For mutual takeover configurations like the two-node cluster environment we show in this redbook, you need each pair of start and stop scripts on each cluster node that participates in the mutual takeover architecture.

In our environment, we used the start script shown in Example 4-28 on page 236. Most of the script deals with starting correctly.

Example 4-28: Sample start script for IBM Tivoli Workload Scheduler under HACMP

start example
 #!/bin/sh
 #
 # Sample script for starting IBM Tivoli Workload Scheduler Version 8.2
 # under IBM HACMP Version 5.1.
 #
 # Comments and questions to Anthony Yen <sg24-6632-00@AutomaticIT.com>
 #
 #-----------------------------
 # User-Configurable Constants
 #-----------------------------
 #
 # Base TWShome path. Modify this to match your site's standards.
 #
 root_TWShome=/usr
 #
 # Base TWSuser. Modify this to match your site's standards.
 #
 TWSuser="maestro"
 #
 # Debugging directory. This just holds a flag file; it won't grow more than 1 KB.
 #
 DBX_DIR=/tmp/ha_cfg
 #-------------------
 # Main Program Body
 #-------------------
 #
 # Ensure debugging directory is available, create it if necessary
 if [ -d ${DBX_DIR} ] ; then
         DBX=1
 else
         mkdir ${DBX_DIR}
         rc=$?
         if [ $rc -ne 0 ] ; then
                 echo "WARNING: no debugging directory could be created, no debug"
                 echo "information will be issued..."
                 DBX=0
         else
                 DBX=1
         fi
 fi
 #
 # Determine how we are called
 CALLED_AS=`basename $0`
 #
 # Disallow being called as root name
 if [ "${CALLED_AS}" = "start_tws.sh" ] ; then
         echo "FATAL ERROR: This script cannot be called as itself. Please create a"
         echo "symbolic link to it of the form start_twsN.sh where N is an integer"
         echo "corresponding to the cluster node number and try again."
         exit 1
 fi
 #
 # Determine cluster node number we are called as.
 extracted_node_number=`echo ${CALLED_AS} | sed 's/start_tws\(.*\)\.sh/\1/g'`
 #
 # Set TWShome path to correspond to cluster node number.
 if [ ${extracted_node_number} -eq 1 ] ; then
         clusterized_TWShome=${root_TWShome}/${TWSuser}
         clusterized_TWSuser=${TWSuser}
 else
         clusterized_TWShome=${root_TWShome}/${TWSuser}${extracted_node_number}
         clusterized_TWSuser=${TWSuser}${extracted_node_number}
 fi
 echo "clusterized_TWShome = $clusterized_TWShome"
 echo "clusterized_TWSuser = $clusterized_TWSuser"
 if [ $DBX -eq 1 ] ; then
         echo "Script for starting TWS ${extracted_node_number} at "`date` > \
 ${DBX_DIR}/start${extracted_node_number}.flag
 fi
 echo "Starting TWS ${extracted_node_number} at "`date`
 su - ${clusterized_TWSuser} -c "./StartUp ; conman start"
 echo "Netman on TWS ${extracted_node_number} started, conman start issued"
 sleep 10
 echo "Process list of ${clusterized_TWSuser}-owned processes..."
 ps -ef | grep -v grep | grep ${clusterized_TWSuser}
 exit 0
end example

The key line that actually starts IBM Tivoli Workload Scheduler is towards the end of the script, which reads:

    su - ${clusterized_TWSuser} -c "./StartUp ; conman start" 

This means the su command will execute, as the TWSuser user account, the following command:

    ./StartUp ; conman start 

This is a simple command to start IBM Tivoli Workload Scheduler. Your site may require a different start procedure, so you can replace this line with your own procedure to start IBM Tivoli Workload Scheduler.

In our environment, we used a stop script that uses the same execution semantics as the start script described in the preceding discussion. The exact commands it runs depend upon the name under which the stop script is called.

Most of the script deals with stopping correctly. The script is oriented towards stopping the instance of IBM Tivoli Workload Scheduler on the cluster node, which in our environment is a Master Domain Manager. The key lines that actually stop IBM Tivoli Workload Scheduler are towards the end of the script; they are extracted and shown in Example 4-29 on page 238.

Example 4-29: Commands used by stop script to stop IBM Tivoli Workload Scheduler

start example
 su - ${clusterized_TWSuser} -c "conman 'unlink cpu=@ ; noask'"
 su - ${clusterized_TWSuser} -c "conman 'stop @ ; wait ; noask'"
 su - ${clusterized_TWSuser} -c "conman 'shutdown ; wait'"
     .
     .
     .
        wmaeutil ${connector} -stop "*"
end example

This means the su command will execute, as the TWSuser user account, the following command:

    conman 'unlink cpu=@ ; noask' 

This unlinks all CPUs in the scheduling network. This is followed by another su command that executes, as the TWSuser user account, the following command:

    conman 'stop @ ; wait ; noask' 

This stops the IBM Tivoli Workload Scheduler engine on all CPUs in the scheduling network. A third and final su command executes, as the TWSuser user account, the following command:

    conman 'shutdown ; wait' 

This stops the netman process of the instance of IBM Tivoli Workload Scheduler on the cluster node.

Finally, the wmaeutil command is executed within a loop that passes the name of each IBM Tivoli Workload Scheduler Connector found on the cluster node to each iteration of the command. This stops all Connectors associated with the instance of IBM Tivoli Workload Scheduler that is being stopped.

This is a simple set of commands to stop IBM Tivoli Workload Scheduler. Your site may require a different stop procedure, so you can replace these commands with your own procedure to stop IBM Tivoli Workload Scheduler. Example 4-30 shows our sample stop script.

Example 4-30: Sample stop script for IBM Tivoli Workload Scheduler under HACMP

start example
 #!/bin/ksh
 #
 # Sample script for stopping IBM Tivoli Workload Scheduler Version 8.2
 # under IBM HACMP Version 5.1.
 #
 # Comments and questions to Anthony Yen <sg24-6632-00@AutomaticIT.com>
 #
 #-----------------------------
 # User-Configurable Constants
 #-----------------------------
 #
 # Base TWShome path. Modify this to match your site's standards.
 #
 root_TWShome=/usr
 #
 # Base TWSuser. Modify this to match your site's standards.
 #
 TWSuser="maestro"
 #
 # Debugging directory. This just holds a flag file; it won't grow more than 1 KB.
 #
 DBX_DIR=/tmp/ha_cfg
 #-------------------
 # Main Program Body
 #-------------------
 #
 # Source in environment variables for IBM Tivoli Management Framework.
 if [ -d /etc/Tivoli ] ; then
         . /etc/Tivoli/setup_env.sh
 else
         echo "FATAL ERROR: Tivoli environment could not be sourced, exiting..."
         exit 1
 fi
 #
 # Ensure debugging directory is available, create it if necessary
 if [ -d ${DBX_DIR} ] ; then
         DBX=1
 else
         mkdir ${DBX_DIR}
         rc=$?
         if [ $rc -ne 0 ] ; then
                 echo "WARNING: no debugging directory could be created, no debug"
                 echo "information will be issued..."
                 DBX=0
         else
                 DBX=1
         fi
 fi
 #
 # Determine how we are called
 CALLED_AS=`basename $0`
 #
 # Disallow being called as root name
 if [ "${CALLED_AS}" = "stop_tws.sh" ] ; then
         echo "FATAL ERROR: This script cannot be called as itself. Please create a"
         echo "symbolic link to it of the form stop_twsN.sh where N is an integer"
         echo "corresponding to the cluster node number and try again."
         exit 1
 fi
 #
 # Determine cluster node number we are called as.
 extracted_node_number=`echo ${CALLED_AS} | sed 's/stop_tws\(.*\)\.sh/\1/g'`
 #
 # Set TWShome path to correspond to cluster node number.
 if [ ${extracted_node_number} -eq 1 ] ; then
         clusterized_TWShome=${root_TWShome}/${TWSuser}
         clusterized_TWSuser=${TWSuser}
 else
         clusterized_TWShome=${root_TWShome}/${TWSuser}${extracted_node_number}
         clusterized_TWSuser=${TWSuser}${extracted_node_number}
 fi
 #
 # Source IBM Tivoli Workload Scheduler environment variables.
 if [ -f ${clusterized_TWShome}/tws_env.sh ] ; then
         . ${clusterized_TWShome}/tws_env.sh
 else
         echo "FATAL ERROR: Unable to source ITWS environment from:"
         echo "    ${clusterized_TWShome}/tws_env.sh"
         echo "Exiting..."
         exit 1
 fi
 echo "clusterized_TWShome = $clusterized_TWShome"
 echo "clusterized_TWSuser = $clusterized_TWSuser"
 if [ $DBX -eq 1 ] ; then
         echo "Script for stopping TWS ${extracted_node_number} at "`date` > ${DBX_DIR}/start${extracted_node_number}.flag
 fi
 echo "Stopping TWS ${extracted_node_number} at "`date`
 su - ${clusterized_TWSuser} -c "conman 'unlink cpu=@ ; noask'"
 su - ${clusterized_TWSuser} -c "conman 'stop @ ; wait ; noask'"
 su - ${clusterized_TWSuser} -c "conman 'shutdown ; wait'"
 echo "Shutdown for TWS ${extracted_node_number} issued..."
 echo "Verify netman is stopped..."
 ps -ef | grep -v grep | grep ${clusterized_TWShome}/bin/netman > /dev/null
 rc=$?
 while ( [ ${rc} -ne 1 ] )
 do
         sleep 10
         ps -ef | grep -v grep | grep ${clusterized_TWShome}/bin/netman > /dev/null
         rc=$?
 done
 echo "Stopping all Connectors..."
 #
 # Identify all Connector object labels
 connector_labels=`wlookup -Lar MaestroEngine`
 for connector in ${connector_labels}
 do
         echo "Stopping connector ${connector}..."
         wmaeutil ${connector} -stop "*"
 done
 echo "Process list of ${clusterized_TWSuser}-owned processes:"
 ps -ef | grep -v grep | grep ${clusterized_TWSuser}
 exit 0
end example

To add the custom start and stop HACMP scripts:

  1. Copy both scripts to the directory /usr/es/sbin/cluster/utils on each cluster node.

  2. Run the commands in Example 4-31 to install the scripts. These create symbolic links to the scripts. When the script is called via one of these symbolic links, it will know which instance of IBM Tivoli Workload Scheduler to manage.

    Example 4-31: Commands to run to install custom HACMP start and stop scripts for IBM Tivoli Workload Scheduler

    start example
     ln -s /usr/es/sbin/cluster/utils/start_tws.sh /usr/es/sbin/cluster/utils/start_tws1.sh
     ln -s /usr/es/sbin/cluster/utils/start_tws.sh /usr/es/sbin/cluster/utils/start_tws2.sh
     ln -s /usr/es/sbin/cluster/utils/stop_tws.sh /usr/es/sbin/cluster/utils/stop_tws2.sh
     ln -s /usr/es/sbin/cluster/utils/stop_tws.sh /usr/es/sbin/cluster/utils/stop_tws1.sh
    end example

The symbolic links mean that no matter how many instances of IBM Tivoli Workload Scheduler you configure in a mutual takeover HACMP cluster, only two actual scripts need to be maintained. If you ensure that there are no unique variations between installations of IBM Tivoli Workload Scheduler, then maintaining the scripts among all installations is very easy. Only two scripts ever need to be modified, vastly simplifying maintenance and reducing copying errors.

Note

Keep in mind that, after a modification is made to either or both scripts, they need to be copied back to all the cluster nodes.
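
For example, assuming remote copy connectivity between the cluster nodes is already set up (the target node name below is simply the one from our environment), the updated master copies could be pushed from tivaix1 to tivaix2 as follows; scp works equally well if OpenSSH is installed:

    # Copy the updated scripts from tivaix1 to tivaix2
    rcp /usr/es/sbin/cluster/utils/start_tws.sh \
        /usr/es/sbin/cluster/utils/stop_tws.sh \
        tivaix2:/usr/es/sbin/cluster/utils/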

Tip

Console output from the start and stop scripts is sent to /tmp/hacmp.out on the cluster nodes. This is useful for debugging the start and stop scripts while you develop them.

Add a custom post-event HACMP script

IBM Tivoli Workload Scheduler presents a special case situation that HACMP can be configured to handle. If IBM Tivoli Workload Scheduler falls back to a cluster node, ideally it should fall back only after all currently running jobs have had a chance to finish.

For example, consider our environment of a two-node mutual takeover HACMP cluster, shown in Figure 4-23 when it is running normally. Here, cluster node tivaix1 runs an instance of IBM Tivoli Workload Scheduler we will call TWS Engine 1 from disk volume group tiv_vg1. Meanwhile, cluster node tivaix2 runs TWS Engine 2 from disk volume group tiv_vg2.


Figure 4-23: Normal operation of two-node mutual takeover HACMP cluster

Suppose cluster node tivaix2 suffers an outage, and falls over to tivaix1. This means TWS Engine2 now also runs on tivaix1, and tivaix1 picks up the connection to disk volume group tiv_vg2, as shown in Figure 4-24.


Figure 4-24: Location of application servers after tivaix2 falls over to tivaix1

Due to the sudden nature of a catastrophic failure, the jobs that are in progress on tivaix2 under TWS Engine2 when the disaster incident occurs are lost. When TWS Engine2 starts on tivaix1, you would perform whatever job recovery is necessary.

When tivaix2 is restored to service, it reintegrates with the cluster, but because we chose to use the Cascading WithOut Fallback (CWOF) feature, TWS Engine2 is not immediately transferred back to tivaix2 when it reintegrates with the cluster. This is shown in Figure 4-25, where tivaix2 is shown as available and back in the cluster, but TWS Engine2 is not shut down and transferred over to it yet.


Figure 4-25: State of cluster after tivaix2 returns to service and reintegrates with the cluster

Here is where the special case situation presents itself. If we simply shut down TWS Engine2 and transfer it back to tivaix2, any jobs that TWS Engine2 is currently running on tivaix1 can lose their job state information or, in the worst case, be interrupted in mid-execution (for example, if the jobs are executed from the same disk volume group as TWS Engine2, or use that volume group to read and write their data). This is shown in Figure 4-26 on page 245.


Figure 4-26: Running jobs under TWS Engine2 on tivaix1 prevent TWS Engine2 from transferring back to tivaix2

As long as there are running jobs under TWS Engine2 in the memory of tivaix1, moving TWS Engine2 to tivaix2 can cause undesirable side effects because we cannot move the contents of memory from one machine to another, only the contents of a disk volume group.

It is usually too inconvenient to wait for a lull in the jobs that are running under TWS Engine2 on tivaix1. In many environments there simply is no such "dead zone" in currently running jobs. When this occurs, the jobs currently executing on the cluster node in question (tivaix1, in this example) need to run through to completion, without any new jobs releasing on the cluster node, before moving the application server (TWS Engine2 ,in this example). The new jobs that are prevented from releasing will have a delayed launch time, but this is often the least disruptive approach to gracefully transferring an application server back to a reintegrated cluster node.

This process is called "quiescing the application server". For IBM Tivoli Workload Scheduler, as long as there are no currently running jobs on the cluster node itself that an instance of IBM Tivoli Workload Scheduler needs to move away from, all information that needs to be transferred intact is held on disk. This makes it easy and safe to restart IBM Tivoli Workload Scheduler on the reintegrated cluster node. The job state information that needs to be transferred can be thought of as "in hibernation" after no jobs are actively running.

We quiesce an instance of IBM Tivoli Workload Scheduler by raising the job fence of the instance on a cluster node high enough that all new jobs on the cluster node will not release. See IBM Tivoli Workload Scheduling Suite Version 8.2, General Information, SC32-1256, for more details on job fences. Raising the job fence does not affect currently running jobs.

We do not recommend using a job stream or CPU limit to quiesce the currently running jobs under an instance of IBM Tivoli Workload Scheduler on a CPU. Schedulers and users can still override a limit by forcing the priority of a job to GO, which can cause problems for falling back to a cluster node if a job is released at an inopportune time during the fallback.

Tip

While you can quiesce IBM Tivoli Workload Scheduler at any time, you still gain benefits from planning when during the production day you quiesce it. Quiesce when there is as little time left as possible for currently running jobs to complete, because the sooner currently running jobs complete, the less time new jobs will be kept on hold. Use the available reports in IBM Tivoli Workload Scheduler to predict when currently running jobs will complete.

It is very important to understand that when and how to quiesce an instance of IBM Tivoli Workload Scheduler is wholly dependent upon business considerations. When designing a schedule, collect information from the business users of IBM Tivoli Workload Scheduler about which jobs and job streams must not be delayed, which can be delayed if necessary, and for how long they can be delayed. This information is used to determine when to quiesce the server, and the impact of the operation. It can also be used to automate the decision and process of falling back an application server.

Some considerations external to IBM Tivoli Workload Scheduler usually affect this process as well. For example, if a database is used by jobs running on the cluster, or is hosted on the disk volume group that the application server uses, falling back would require shutting down the database. In some environments, this can be very time consuming, it can be difficult to obtain authorization for on short notice, or it can be simply unacceptable during certain times of the year (such as quarter-end processing periods). A highly available environment that takes these considerations into account is part of the design process of an actual production deployment. Consult your IBM service provider for advice on how to mitigate these additional considerations.


Figure 4-27: TWS Engine2 on tivaix1 is quiesced, only held jobs exist on tivaix1 under TWS Engine2. TWS Engine2 can now fall back to tivaix2

Once an instance of IBM Tivoli Workload Scheduler is quiesced on a CPU, all remaining jobs for that instance on that CPU are held, either because their dependencies have not yet been satisfied or because the job fence is set above their priority. This is shown in Figure 4-27, in which on tivaix1 only TWS Engine1 still has running jobs, while TWS Engine2's jobs are all held and their state is recorded to the production file on disk volume group tiv_vg2.

Due to the business and other non-IBM Tivoli Workload Scheduler considerations that affect the decision and process of quiescing an application server in preparation for falling it back to its original cluster node, we do not show in this redbook a sample quiesce script. In our lab environment, because we are not running actual production applications, our quiesce script simply exits.

However, when you develop your own quiesce script, we recommend that you design it as a script to be called as a post-event script for the node_up_complete event. Before raising the fence, the script should check for at least the following conditions:

  • All business conditions are met for raising the fence. For example, do not raise the fence if a business user still requires scheduling services for a critical job that needs to execute in the near future.

  • HACMP is already running on the cluster node to which the quiesced application server's resource group needs to fall back.

  • The cluster node is reintegrated within the cluster, but the resource group that normally belongs on the cluster node is not on that node. This prevents the quiescing process from accidentally running on a new node that joins the cluster and unnecessarily shutting down an application server, for example.

  • The resource group that falls back is in the ONLINE state on another cluster node. This prevents the quiescing from accidentally moving resource groups taken down for business reasons, for example.

Example 4-32 shows Korn shell script code that can be used to determine whether HACMP is running. It simply checks the status of the basic HACMP subsystems. You may need to modify it to suit your particular HACMP environment if other HACMP subsystems are used.

Example 4-32: How to determine in a script whether or not HACMP is running

start example
 PATH=${PATH}:/usr/es/sbin/cluster/utilities
 clstrmgrES=`clshowsrv clstrmgrES | grep -v '^Subsystem' | awk '{ print $3 }'`
 clinfoES=`  clshowsrv clinfoES   | grep -v '^Subsystem' | awk '{ print $3 }'`
 clsmuxpdES=`clshowsrv clsmuxpdES | grep -v '^Subsystem' | awk '{ print $3 }'`
 if [ "${clstrmgrES}" = 'inoperative' -o \
      "${clinfoES}"   = 'inoperative' -o \
      "${clsmuxpdES}" = 'inoperative' ] ; then
         echo "FATAL ERROR: HACMP does not appear to be running, exiting..."
         exit 1
 fi
end example

Example 4-33 shows the clRGinfo command and sample output from our environment. This can be used to determine whether or not a resource group is ONLINE, and if so, which cluster node it currently runs upon.

Example 4-33: Using the clRGinfo command to determine the state of resource groups in a cluster

start example
 [root@tivaix1:/home/root] clRGinfo
 -----------------------------------------------------------------------------
 Group Name     Type       State      Location
 -----------------------------------------------------------------------------
 rg1            cascading  OFFLINE    tivaix1
                           OFFLINE    tivaix2
 rg2            cascading  ONLINE     tivaix2
                           OFFLINE    tivaix1

 [root@tivaix1:/home/root] clRGinfo -s
 rg1:OFFLINE:tivaix1:cascading
 rg1:OFFLINE:tivaix2:cascading
 rg2:ONLINE:tivaix2:cascading
 rg2:OFFLINE:tivaix1:cascading
end example

You use conman's fence command to raise the job fence on a CPU. If we want to raise the job fence on cluster node tivaix1 for an instance of IBM Tivoli Workload Scheduler that is running as CPU TIVAIX2, we log into the TWSuser user account of that instance, then run the command:

    conman "fence TIVAIX2 ; go ; noask" 

In our environment, we would log into maestro2 on tivaix1. A quiesce script would run on the reintegrated cluster node and remotely log into the surviving node to perform the job fence operation.

Example 4-34 shows one way to have a shell script wait for currently executing jobs under an instance of IBM Tivoli Workload Scheduler on a CPU to exit. It is intended to be run as root user. It simply uses the su command to run a command as the TWSuser user account that owns the instance of IBM Tivoli Workload Scheduler. The command that is run lists all jobs in the EXEC state on the CPU TIVAIX1, then counts the number of jobs returned. As long as the number of jobs in the EXEC state is not equal to zero, the code waits for a minute, then checks the number of jobs in the EXEC state again. Again, a quiesce script would remotely run this code on the surviving node against the desired instance of IBM Tivoli Workload Scheduler.

Example 4-34: Wait for currently executing jobs to exit

start example
 num_exec_jobs=`su - maestro -c "conman sj TIVAIX1#@.@+state='exec' 2>/dev/null | \
 grep -v 'sj @#@.@+state=exec' | wc -l"`
 while ( [ ${num_exec_jobs} -ne 0 ] )
 do
         sleep 60
         num_exec_jobs=`su - maestro -c "conman sj TIVAIX1#@.@+state='exec' 2>/dev/null | \
 grep -v 'sj @#@.@+state=exec' | wc -l"`
 done
end example

If the implemented quiesce script successfully quiesces the desired instance of IBM Tivoli Workload Scheduler, it can also be designed to automatically perform the resource group move. A script would use the clRGmove command, as shown in Example 4-35, to move resource group rg2 to tivaix2:

Example 4-35: Move a resource group using the clRGmove command

start example
 /usr/es/sbin/cluster/utilities/clRGmove -s 'false' -m -i -g 'rg2' -n 'tivaix2' 
end example

This command can be run from any cluster node.
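
For illustration only, the fragments in Examples 4-32 through 4-35 could be chained together roughly as in the following sketch. As noted earlier, a complete quiesce script is necessarily site-specific: the business-condition check below is a stub, the remote invocation from the reintegrated node to the surviving node is omitted, and the resource group, node, CPU, and user names (rg2, tivaix1, tivaix2, TIVAIX2, maestro2) are simply the ones from our environment.

    #!/bin/ksh
    # Illustrative sketch only -- not a complete quiesce implementation.
    # Assumes it runs where the conman commands must execute (the surviving node).
    PATH=${PATH}:/usr/es/sbin/cluster/utilities

    # 1. Site-specific business checks go here (stub: always proceed).

    # 2. Proceed only if HACMP is running (see Example 4-32 for the full check).
    clstrmgrES=`clshowsrv clstrmgrES | grep -v '^Subsystem' | awk '{ print $3 }'`
    if [ "${clstrmgrES}" = 'inoperative' ] ; then
            exit 1
    fi

    # 3. Proceed only if resource group rg2 is ONLINE on tivaix1 (see Example 4-33).
    clRGinfo -s | grep '^rg2:ONLINE:tivaix1:' > /dev/null || exit 0

    # 4. Quiesce TWS Engine2 by raising its job fence.
    su - maestro2 -c "conman 'fence TIVAIX2 ; go ; noask'"

    # 5. Wait for jobs in the EXEC state to drain (see Example 4-34).
    num_exec_jobs=`su - maestro2 -c "conman sj TIVAIX2#@.@+state='exec' 2>/dev/null | \
    grep -v 'sj @#@.@+state=exec' | wc -l"`
    while [ ${num_exec_jobs} -ne 0 ]
    do
            sleep 60
            num_exec_jobs=`su - maestro2 -c "conman sj TIVAIX2#@.@+state='exec' 2>/dev/null | \
    grep -v 'sj @#@.@+state=exec' | wc -l"`
    done

    # 6. Move resource group rg2 back to tivaix2 (see Example 4-35).
    clRGmove -s 'false' -m -i -g 'rg2' -n 'tivaix2'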

In our environment, we copy our stub quiesce script to:

    /usr/es/sbin/cluster/sh/quiesce_tws.sh 

This script is copied to the same location on both cluster nodes tivaix1 and tivaix2. The stub does not perform any actual work, so it has no effect upon HACMP. In our environment, with CWOF set to true, the stub would have to run clRGmove to simulate quiescing. We still perform the quiescing manually as a result.
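
The stub itself might look like nothing more than the following sketch; it is only a placeholder to be replaced by a site-specific implementation:

    #!/bin/ksh
    # Stub post-event script for node_up_complete. It intentionally does nothing;
    # a production version would implement the checks and quiesce logic described
    # earlier in this section.
    exit 0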

Tip

Make sure the basic HACMP services work for straight fallover and fallback scenarios before customizing HACMP behavior.

In a production deployment, the quiesce script would be implemented and tested only after basic configuration and testing of HACMP is successful.

Modify /etc/hosts and name resolution order

The IP hostnames we use for HACMP are configured in /etc/hosts so that local name resolution can be performed if access to the DNS server is lost. In our environment, our /etc/hosts file is the same on both cluster nodes tivaix1 and tivaix2, as shown in Figure 4-28 on page 251.


Figure 4-28: File /etc/hosts copied to all cluster nodes of the cluster we used

Name resolution order is controlled by the following items, in decreasing order of precedence (the first line overrides the second line, which in turn overrides the third line):

  • Environment variable NSORDER

  • Host settings in the /etc/netsvc.conf file

  • Host settings in the /etc/irs.conf file

In our environment, we used the following line in /etc/netsvc.conf to set the name resolution order on all cluster nodes:

    hosts = local, bind 

The /etc/netsvc.conf file on all cluster nodes is set to this line.
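
If you want to test a different lookup order for a single session without editing /etc/netsvc.conf, you could override it with the NSORDER environment variable (a quick test aid, not a permanent configuration):

    # Force local /etc/hosts lookups ahead of DNS for this shell only
    export NSORDER=local,bind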

Note

In our environment, we used some IP hostnames that include underscores to test HACMP's handling of name resolution. In a live production environment, we do not recommend this practice.

Underscores are not officially supported in DNS, so some of the host entries we use for our environment can never be managed by strict DNS servers. The rules for legal IP hostnames are set by RFC 952:

  • http://www.ietf.org/rfc/rfc952.txt

RFC 1123 also sets the rules for legal IP hostnames:

  • http://www.ietf.org/rfc/rfc1123.txt

All the entries for /etc/hosts are drawn from the planning worksheets that you fill out when planning for HACMP.
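
The entries in Figure 4-28 are not reproduced in this text. Purely as an illustration, an /etc/hosts fragment for a cluster like ours might look like the following sketch; the IP addresses are invented, only the labels come from our environment, and the boot and persistent labels from your planning worksheets would be listed as well:

    # Hypothetical /etc/hosts fragment -- addresses are examples only
    127.0.0.1        loopback localhost
    192.168.100.101  tivaix1
    192.168.100.102  tivaix2
    192.168.100.111  tivaix1_svc
    192.168.100.112  tivaix2_svc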

Configure HACMP service IP labels/addresses

A service IP label/address is used to establish communication between client nodes and the server node. Services, such as a database application, are provided using the connection made over the service IP label. This connection can be node-bound or taken over by multiple nodes.

For the standard configuration, it is assumed that the connection will allow IP Address Takeover (IPAT) via aliases.

The /etc/hosts file on all nodes must contain all IP labels and associated IP addresses that you want to discover.

Follow this procedure to define service IP labels for your cluster:

  1. Enter: smit hacmp.

  2. Go to HACMP -> Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Service IP Labels/Addresses and press Enter.

  3. Fill in field values as follows as shown in Figure 4-29 on page 253:

    IP Label/IP Address

    Enter, or select from the picklist, the IP label/IP address to be kept highly available.

    Network Name

    Enter the symbolic name of the HACMP network on which this Service IP label/address will be configured. If you leave the field blank, HACMP fills in this field automatically with the network type plus a number appended, starting with 1 (for example, netether1).


    Figure 4-29: Enter service IP label for tivaix1

    Figure 4-29 shows how we entered the service address label for tivaix1. In our environment, we use tivaix1_svc as the IP label and net_ether_01 as the network name.

  4. Press Enter after filling in all required fields. HACMP now checks the validity of the IP Interface configuration.

  5. Repeat the previous steps until you have configured all IP service labels for each network, as needed.

    In our environment, we create another service IP label for cluster node tivaix2, as shown in Figure 4-30 on page 254.


    Figure 4-30: How to enter service IP labels for tivaix2

    We used tivaix2_svc as the IP label and net_ether_01 as the network name. Note how we assigned the network name net_ether_01 in both cases, so that both sets of service IP labels are in the same HACMP network.

Configure HACMP networks and heartbeat paths

The cluster should have more than one network, to avoid a single point of failure. Often the cluster has both IP and non-IP based networks in order to use different heartbeat paths. Use the Add a Network to the HACMP cluster SMIT screen to define HACMP IP and point-to-point networks. Running HACMP discovery before configuring is recommended, to speed up the process.

In our environment, we use IP-based networks, heartbeating over IP aliases, and point-to-point networks over Target Mode SSA. In this section we show how to configure IP-based networks and heartbeating using IP aliases. Refer to "Configure heartbeating" on page 213 for information about configuring point-to-point networks over Target Mode SSA.

Configure IP-Based networks

To configure IP-based networks, take the following steps:

  1. Enter: smit hacmp.

  2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP Cluster and press Enter.

  3. Select the type of network to configure and press Enter. The Add an IP-Based Network to the HACMP Cluster SMIT screen displays the configuration fields.

    In our environment, we selected ether for the type of network to configure.

  4. Enter the information as follows:

    Network Name

    If you do not enter a name, HACMP will give the network a default network name made up of the type of network with a number appended (for example, ether1). If you change the name for this network, use no more than 32 alphanumeric characters and underscores.

    Network Type

    This field is filled in depending on the type of network you selected.

    Netmask

    The netmask (for example, 255.255.255.0).

    Enable IP Takeover via IP Aliases

    The default is True. If the network does not support IP aliases, then IP Replacement will be used. IP Replacement is the mechanism whereby one IP address is removed from an interface, and another IP address is added to that interface. If you want to use IP Replacement on a network that does support aliases, change the default to False.

    IP Address Offset for Heartbeating over IP Aliases

    Enter the base address of a private address range for heartbeat addresses (for example 10.10.10.1). HACMP will use this address to automatically generate IP addresses for heartbeat for each boot interface in the configuration. This address range must be unique and must not conflict with any other subnets on the network.

    Refer to section "Heartbeat Over IP Aliases" in Chapter 3, Planning Cluster Network Connectivity in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, and to your planning worksheet for more information on selecting a base address for use by Heartbeating over IP Aliases.

    Clear this entry to use the default heartbeat method.

    In our environment, we entered the values for the IP-based network as shown in Figure 4-31. We used the network name of net_ether_01, with a netmask of 255.255.254.0 for our lab network, and set an IP address offset for heartbeating over IP aliases of 172.16.100.1, corresponding to the offset we chose during the planning stage. Because our lab systems use network interface cards capable of supporting IP aliases, we leave the flag Enable IP Address Takeover via IP Aliases toggled to Yes.


    Figure 4-31: Add an IP-Based Network to the HACMP Cluster SMIT screen

  5. Press Enter to configure this network.

  6. Repeat the operation to configure more networks.

    In our environment, this is the only network we configured, so we did not configure any other HACMP networks.

Configure heartbeating over IP aliases

In HACMP 5.1, you can configure heartbeating over IP Aliases to establish IP-based heartbeat rings over IP Aliases to run over your existing topology. Heartbeating over IP Aliases supports either IP Address Takeover (IPAT) via IP Aliases or IPAT via IP Replacement. The type of IPAT configured determines how HACMP handles the service label:

IPAT via IP Aliases

The service label, as well as the heartbeat alias, is aliased onto the interface.

IPAT via IP Replacement

The service label is swapped with the interface IP address, not the heartbeating alias.

Note

HACMP removes the aliases from the interfaces at shutdown. It creates the aliases again when the network becomes operational. The /tmp/hacmp.out file records these changes.

To configure heartbeating over IP Aliases, you specify an IP address offset when configuring an interface. See the preceding section for details. Make sure that this address does not conflict with addresses configured on your network.

When you run HACMP verification, the clverify utility verifies that:

  • The configuration is valid for the address range

  • All interfaces are the same type (for example, Ethernet) and have the same subnet mask

  • The offset address allots sufficient addresses and subnets on the network.

In our environment we use IPAT via IP aliases.

Configure HACMP resource groups

This creates a container to organize HACMP resources into logical groups that are defined later. Refer to High Availability Cluster Multi-Processing for AIX Concepts and Facilities Guide Version 5.1, SC23-4864, for an overview of types of resource groups you can configure in HACMP 5.1. Refer to the chapter on planning resource groups in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, for further planning information. You should have your planning worksheets in hand.

Using the standard path, you can configure resource groups that use the basic management policies. These policies are based on the three predefined types of startup, fallover, and fallback policies: cascading, rotating, concurrent.

In addition to these, you can also configure custom resource groups for which you can specify slightly more refined types of startup, fallover and fallback policies.

Once the resource groups are configured, if it seems necessary for handling certain applications, you can use the Extended Configuration path to change or refine the management policies of particular resource groups (especially custom resource groups).

Configuring a resource group involves two phases:

  • Configuring the resource group name, management policy, and the nodes that can own it

  • Adding the resources and additional attributes to the resource group.

Refer to your planning worksheets as you name the groups and add the resources to each one.

To create a resource group:

  1. Enter: smit hacmp.

  2. On the HACMP menu, select Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Add a Standard Resource Group and press Enter.

    You are prompted to select a resource group management policy.

  3. Select Cascading, Rotating, Concurrent or Custom and press Enter.

    For our environment, we used Cascading.

    Depending on the previous selection, you will see a screen titled Add a Cascading | Rotating | Concurrent | Custom Resource Group. The screen will only show options relevant to the type of the resource group you selected. If you select custom, you will be asked to refine the startup, fallover, and fallback policy before continuing.

  4. Enter the field values as follows for a cascading, rotating, or concurrent resource group (Figure 4-32 on page 259):

    Resource Group Name

    Enter the desired name. Use no more than 32 alphanumeric characters or underscores; do not use a leading numeric.

    Do not use reserved words. See "List of Reserved Words" in Chapter 6 of High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862. Duplicate entries are not allowed.

    Participating Node Names

    Enter the names of the nodes that can own or take over this resource group. Enter the node with the highest priority for ownership first, followed by the nodes with the lower priorities, in the desired order. Leave a space between node names (for example, NodeA NodeB NodeX).


    Figure 4-32: Configure resource group rg1

    If you choose to define a custom resource group, you define additional fields. We do not use custom resource groups in this redbook for simplicity of presentation.

    Figure 4-32 shows how we configured resource group rg1 in the environment implemented by this redbook. We use this resource group to contain the instances of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework normally running on tivaix1.

    Figure 4-33 shows how we configured resource group rg2 in our environment. We used this resource group to contain the instances of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework normally running on tivaix2.


    Figure 4-33: How to configure resource group rg2

Configure cascading without fallback, other attributes

We configured all resource groups in our environment for cascading without fallback (CWOF) so IBM Tivoli Workload Scheduler can be given enough time to quiesce before falling back. This is part of the extended resource group configuration.

We use this step to also configure other attributes of the resource groups, such as the associated shared volume group and filesystems.

To configure CWOF and other resource group attributes:

  1. Enter: smit hacmp.

  2. Go to Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Change/Show Resources for a Standard Resource Group and press Enter to display a list of defined resource groups.

  3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name, and Participating Node Names (Default Node Priority) fields filled in.

    Note

    SMIT displays only valid choices for resources, depending on the type of resource group that you selected. The fields are slightly different for custom, non-concurrent, and concurrent groups.

    If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.

  4. Enter the field values as follows:

    Service IP Label/IP Addresses

    (Not an option for concurrent or custom concurrent-like resource groups.) List the service IP labels to be taken over when this resource group is taken over. Press F4 to see a list of valid IP labels. These include addresses which rotate or may be taken over.

    Filesystems (empty is All for specified VGs)

    (Not an option for concurrent or custom concurrent-like resource groups.) If you leave the Filesystems (empty is All for specified VGs) field blank and specify the shared volume groups in the Volume Groups field below, all file systems in those volume groups will be mounted. If you leave the Filesystems field blank and do not specify the volume groups in the field below, no file systems will be mounted.

    You may also select individual file systems to include in the resource group. Press F4 to see a list of the file systems. In this case only the specified file systems will be mounted when the resource group is brought online.

    Filesystems (empty is All for specified VGs) is a valid option only for non-concurrent resource groups.

    Volume Groups

    (If you are adding resources to a non-concurrent resource group) Identify the shared volume groups that should be varied on when this resource group is acquired or taken over. Select the volume groups from the picklist, or enter the desired volume group names in this field. (A quick way to list the file systems in a shared volume group is shown after this procedure.)

    Pressing F4 will give you a list of all shared volume groups in the resource group and the volume groups that are currently available for import onto the resource group nodes.

    Specify the shared volume groups in this field if you want to leave the field Filesystems (empty is All for specified VGs) blank and to mount all file systems in the volume group.

    If you specify more than one volume group in this field, then all file systems in all specified volume groups will be mounted; you cannot choose to mount all filesystems in one volume group and not to mount them in another.

    For example, in a resource group with two volume groups (vg1 and vg2), if the field Filesystems (empty is All for specified VGs) is left blank, then all the filesystems in vg1 and vg2 will be mounted when the resource group is brought up.

    However, if the field Filesystems (empty is All for specified VGs) has only filesystems that are part of the vg1 volume group, then none of the filesystems in vg2 will be mounted, because they were not entered in the Filesystems (empty is All for specified VGs) field along with the filesystems from vg1.

    If you have previously entered values in the Filesystems field, the appropriate volume groups are already known to the HACMP software.

    Concurrent Volume Groups

    (Appears only if you are adding resources to a concurrent or custom concurrent-like resource group.) Identify the shared volume groups that can be accessed simultaneously by multiple nodes. Select the volume groups from the picklist, or enter the desired volume group names in this field.

    If you previously requested that HACMP collect information about the appropriate volume groups, then pressing F4 will give you a list of all existing concurrent capable volume groups that are currently available in the resource group, and concurrent capable volume groups available to be imported onto the nodes in the resource group.

    Disk fencing is turned on by default.

    Application Servers

    Indicate the application servers to include in the resource group. Press F4 to see a list of application servers.

    Note

    If you are configuring a custom resource group, and choose to use a dynamic node priority policy for a cascading-type custom resource group, you will see the field where you can select which one of the three predefined node priority policies you want to use.

    In our environment, we defined resource group rg1 as shown in Figure 4-34.


    Figure 4-34: Define resource group rg1

    For resource group rg1, we assigned tivaix1_svc as the service IP label, tiv_vg1 as the sole volume group to use, and tws_svr1 for the application server.

  5. Press Enter to add the values to the HACMP ODM.

  6. Repeat the operation for other resource groups to configure.

    In our environment, we defined resource group rg2 as shown in Figure 4-35.


    Figure 4-35: Define resource group rg2
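If you are unsure which file systems belong to a shared volume group before filling in the Filesystems and Volume Groups fields, the standard AIX lsvg command lists them. For example, for the volume group used by resource group rg1 (run this on a node where the volume group is currently varied on):

    lsvg -l tiv_vg1

The output lists each logical volume in the volume group along with its mount point, which you can compare against the values you plan to enter in SMIT.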

Configure cascading without fallback

We configured all resource groups in our environment for cascading without fallback (CWOF) so IBM Tivoli Workload Scheduler can be given enough time to quiesce before falling back. This is part of the extended resource group configuration. To configure CWOF:

  1. Enter: smit hacmp.

  2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resource Group Configuration -> Change/Show Resources and Attributes for a Resource Group and press Enter.

    SMIT displays a list of defined resource groups.

  3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name, Inter-site Management Policy, and Participating Node Names (Default Node Priority) fields filled in as shown in Figure 4-36.


    Figure 4-36: Set cascading without fallback (CWOF) for a resource group

    If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.

  4. Set the Cascading Without Fallback Enabled field to true by pressing Tab in the field until that value is displayed.

  5. Repeat the operation for any other applicable resource groups.

    In our environment, we applied the same operation to resource group rg2; all resources and attributes for resource group rg1 are shown in Example 4-36 on page 266.

    Example 4-36: All resources and attributes for resource group rg1

    start example
     [TOP]                                                  [Entry Fields]
       Resource Group Name                                 rg1
       Resource Group Management Policy                    cascading
       Inter-site Management Policy                        ignore
       Participating Node Names / Default Node Priority    tivaix1 tivaix2
       Dynamic Node Priority (Overrides default)          []                +
       Inactive Takeover Applied                           false             +
       Cascading Without Fallback Enabled                  true              +
       Application Servers                                [tws_svr1]         +
       Service IP Labels/Addresses                        [tivaix1_svc]      +
       Volume Groups                                      [tiv_vg1]          +
       Use forced varyon of volume groups, if necessary    false             +
       Automatically Import Volume Groups                  false             +
       Filesystems (empty is ALL for VGs specified)       [/usr/maestro]     +
       Filesystems Consistency Check                       fsck              +
       Filesystems Recovery Method                         sequential        +
       Filesystems mounted before IP configured            false             +
       Filesystems/Directories to Export                  []                 +
       Filesystems/Directories to NFS Mount               []                 +
       Network For NFS Mount                              []                 +
       Tape Resources                                     []                 +
       Raw Disk PVIDs                                     []                 +
       Fast Connect Services                              []                 +
       Communication Links                                []                 +
       Primary Workload Manager Class                     []                 +
       Secondary Workload Manager Class                   []                 +
       Miscellaneous Data                                 []
     [BOTTOM]
    end example

    For resource group rg2, all resources and attributes configured for it are shown in Example 4-37.

    Example 4-37: All resources and attributes for resource group rg2

    start example
     [TOP]                                                  [Entry Fields]
       Resource Group Name                                 rg2
       Resource Group Management Policy                    cascading
       Inter-site Management Policy                        ignore
       Participating Node Names / Default Node Priority    tivaix2 tivaix1
       Dynamic Node Priority (Overrides default)          []                +
       Inactive Takeover Applied                           false             +
       Cascading Without Fallback Enabled                  true              +
       Application Servers                                [tws_svr2]         +
       Service IP Labels/Addresses                        [tivaix2_svc]      +
       Volume Groups                                      [tiv_vg2]          +
       Use forced varyon of volume groups, if necessary    false             +
       Automatically Import Volume Groups                  false             +
       Filesystems (empty is ALL for VGs specified)       [/usr/maestro2]    +
       Filesystems Consistency Check                       fsck              +
       Filesystems Recovery Method                         sequential        +
       Filesystems mounted before IP configured            false             +
       Filesystems/Directories to Export                  []                 +
       Filesystems/Directories to NFS Mount               []                 +
       Network For NFS Mount                              []                 +
       Tape Resources                                     []                 +
       Raw Disk PVIDs                                     []                 +
       Fast Connect Services                              []                 +
       Communication Links                                []                 +
       Primary Workload Manager Class                     []                 +
       Secondary Workload Manager Class                   []                 +
       Miscellaneous Data                                 []
     [BOTTOM]
    end example

We also used this SMIT screen to review the resource groups and to configure any resources we may have missed earlier.

Configure pre-event and post-event commands

To define your customized cluster event scripts, take the following steps:

  1. Enter: smit hacmp.

  2. Go to HACMP Extended Configuration -> Extended Event Configuration -> Configure Pre- or Post-Events -> Add a Custom Cluster Event and press Enter.

  3. Enter the field values as follows:

    Cluster Event Command Name

    Enter a name for the command. The name can have a maximum of 31 characters.

    Cluster Event Description

    Enter a short description of the event.

    Cluster Event Script Filename

    Enter the full pathname of the user-defined script to execute.

    In our environment, we entered the cluster event name quiesce_tws in the Cluster Event Command Name field for the script we added in "Add a custom post-event HACMP script" on page 242. We entered the following pathname in the Cluster Event Script Filename field:

        /usr/es/sbin/cluster/sh/quiesce_tws.sh 

    Figure 4-37 shows how we entered these fields.


    Figure 4-37: Add a Custom Cluster Event SMIT screen

  4. Press Enter to add the custom event information to the local HACMP Object Data Manager (ODM).

  5. Go back to the HACMP Extended Configuration menu and select Verification and Synchronization to synchronize your changes across all cluster nodes.

Note

Synchronizing does not propagate the actual new or changed scripts; you must add these to each node manually.
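For example, after creating or changing the script on one node, you could copy it to the other cluster node manually. This assumes remote shell access (rcp/rsh) is configured between the nodes; substitute scp or another file transfer method as appropriate for your environment:

    rcp /usr/es/sbin/cluster/sh/quiesce_tws.sh tivaix2:/usr/es/sbin/cluster/sh/quiesce_tws.sh
    rsh tivaix2 chmod 755 /usr/es/sbin/cluster/sh/quiesce_tws.sh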

Configure pre-event and post-event processing

Complete the following steps to set up or change the processing for an event. In this step you indicate to the cluster manager to use your customized pre-event or post-event commands.

You only need to complete these steps on a single node. The HACMP software propagates the information to the other nodes when you verify and synchronize the nodes.

Note

When resource groups are processed in parallel, fewer cluster events occur in the cluster. In particular, only node_up and node_down events take place; events such as node_up_local or get_disk_vg_fs do not occur if resource groups are processed in parallel.

As a result, the use of parallel processing reduces the number of particular cluster events for which you can create customized pre- or post-event scripts. If you start using parallel processing for some of the resource groups in your configuration, be aware that your existing event scripts may not work for these resource groups.

For more information, see Appendix C, "Resource Group Behavior During Cluster Events" in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, and the chapter on planning events in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00.

To configure pre- and post-events for customized event processing, and specifically the quiesce_tws post-event script, follow these steps:

  1. Enter: smit hacmp.

  2. Select HACMP Extended Configuration -> Extended Event Configuration -> Change/Show Pre-defined HACMP Events to display a list of cluster events and subevents.

  3. Select an event or subevent that you want to configure and press Enter. SMIT displays the screen with the event name, description, and default event command shown in their respective fields.

    In our environment, we used node_up_complete as the event to configure.

  4. Enter field values as follows:

    Event Name

    The name of the cluster event to be customized.

    Description

    A brief description of the event's function. This information cannot be changed.

    Event Command

    The full pathname of the command that processes the event. The HACMP software provides a default script. If additional functionality is required, it is strongly recommended that you add pre- or post-event processing of your own design, rather than modify the default scripts or write new ones.

    Notify Command

    (Optional) Enter the full pathname of a user-supplied script to run both before and after a cluster event. This script can notify the system administrator that an event is about to occur or has occurred.

    The arguments passed to the command are: the event name, one keyword (either start or complete), the exit status of the event (if the keyword was complete), and the same trailing arguments passed to the event command.

    Pre-Event Command

    (Optional) If you have defined custom cluster events, press F4 for the list. Or, enter the name of a custom-defined event to run before the HACMP Cluster event command executes. This command provides pre-processing before a cluster event occurs.

    The arguments passed to this command are the event name and the trailing arguments passed to the event command. Remember that the Cluster Manager will not process the event until this pre-event script or command has completed.

    Post-Event Command

    (Optional) If you have defined custom cluster events, press F4 for the list. Or, enter the name of the custom event to run after the HACMP Cluster event command executes successfully. This script provides post-processing after a cluster event. The arguments passed to this command are the event name, event exit status, and the trailing arguments passed to the event command.

    Recovery Command

    (Optional) Enter the full pathname of a user-supplied script or AIX command to execute to attempt to recover from a cluster event command failure. If the recovery command succeeds and the retry count is greater than zero, the cluster event command is rerun. The arguments passed to this command are the event name and the arguments passed to the event command.

    Recovery Counter

    Enter the number of times to run the recovery command. Set this field to zero if no recovery command is specified, and to at least one (1) if a recovery command is specified.

    In our environment, we enter the quiesce_tws post-event command for the node_up_complete event, as shown in Figure 4-38.


    Figure 4-38: Add quiesce_tws script in Change/Show Cluster Events SMIT screen

  5. Press Enter to add this information to the HACMP ODM.

  6. Return to the HACMP Extended Configuration screen and synchronize your event customization by selecting the Verification and Synchronization option. Note that all HACMP event scripts are maintained in the /usr/es/sbin/cluster/events directory. The parameters passed to a script are listed in the script's header. If you want to modify the node_up_complete event itself, for example, you could customize it by locating the corresponding script in this directory.
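For example, to review the parameters documented in the header of the default node_up_complete event script before customizing around it:

    head -n 40 /usr/es/sbin/cluster/events/node_up_complete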

See Chapter 8, "Monitoring an HACMP Cluster" in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, for a discussion of event emulation to see how to emulate HACMP event scripts without actually affecting the cluster.

Configure HACMP persistent node IP label/addresses

A persistent node IP label is an IP alias that can be assigned to a network for a specified node. A persistent node IP label is a label which:

  • Always stays on the same node (is node-bound).

  • Co-exists with other IP labels present on an interface.

  • Does not require installing an additional physical interface on that node.

  • Is not part of any resource group.

Assigning a persistent node IP label for a network on a node allows you to have a node-bound address on a cluster network that you can use for administrative purposes to access a specific node in the cluster.
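Once the persistent label has been defined and the cluster configuration synchronized, a quick way to confirm that the persistent address is aliased onto one of the node's interfaces is to list the configured interface addresses (a generic check; the address to look for is the one you assigned to the persistent label):

    netstat -in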

Refer to "Configuring HACMP Persistent Node IP Labels/Addresses" in Chapter 3, "Configuring HACMP Cluster Topology and Resources (Extended)" in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, for information about persistent node IP labels prerequisites.

To add persistent node IP labels, follow these steps:

  1. Enter: smit hacmp.

  2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Persistent Node IP Label/Addresses -> Add a Persistent Node IP Label/Address and press Enter. The Select a Node SMIT dialog shows cluster nodes currently defined for the cluster.

  3. Select a node to add a persistent node IP label/address to and then press Enter, as shown in the following figure. The Add a Persistent Node IP Label/Address SMIT screen is displayed.

    In our environment, we start with cluster node tivaix1, as shown in Figure 4-39 on page 273.


    Figure 4-39: Select a Node SMIT dialog

  4. Enter the field values as follows:

    Node Name

    The name of the node on which the IP label/address will be bound.

    Network Name

    The name of the network on which the IP label/address will be bound.

    Node IP Label/Address

    The IP label/address to keep bound to the specified node.

    In our environment, we enter net_ether_01 for the Network Name field, and tivaix1 for the Node IP Label/Address field, as shown in Figure 4-40 on page 274.


    Figure 4-40: Add a Persistent Node IP Label/Address SMIT screen for tivaix1

    Note

    If you want to use any HACMP IP address over DNS, do not use underscores in the IP hostname, because DNS does not recognize underscores.

    The use of underscores in the IP hostnames in our environment was a way to ensure that they were never introduced into the lab's DNS server.

    We entered these values by pressing F4 to select them from a list. In our environment, the list for the Network Name field is shown in Figure 4-41 on page 275.


    Figure 4-41: Network Name SMIT dialog

    The selection list dialog for the Node IP Label/Address is similar.

  5. Press Enter.

In our environment, we also created a persistent node IP label for cluster node tivaix2, as shown in Figure 4-42 on page 276. Note that we entered the same Network Name field value.


Figure 4-42: Add a Persistent Node IP Label/Address SMIT screen for tivaix2

Configure predefined communication interfaces

In our environment, communication interfaces and devices were already configured in AIX and only needed to be defined to HACMP (that is, we did not use HACMP discovery).

To add predefined network interfaces to the cluster, follow these steps:

  1. Enter: smit hacmp.

  2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices and press Enter.

    A SMIT selector screen appears that lets you add previously discovered, or previously defined network interfaces:

    Add Discovered Communication Interfaces and Devices

    Displays a list of interfaces and devices which HACMP has been able to determine as being already configured to the operating system on a node in the cluster.

    Add Pre-defined Communication Interfaces and Devices

    Displays a list of all communication interfaces and devices supported by HACMP.

    Select the predefined option, as shown in Figure 4-43. SMIT displays a selector screen for the Predefined Communications Type.


    Figure 4-43: Select Add a Pre-defined Communication Interface to HACMP Cluster configuration

  3. Select Communication Interfaces as shown in Figure 4-44 and press Enter. The Select a Network SMIT selector screen appears.


    Figure 4-44: Select the Pre-Defined Communication type SMIT selector screen

  4. Select a network, as shown in Figure 4-45, and press Enter.


    Figure 4-45: Select a Network SMIT selector screen

The Add a Communication Interface screen appears. In our environment we only had one network, net_ether_01, and we selected that network.

  5. Fill in the fields as follows:

    Node Name

    The name of the node on which this network interface physically exists.

    Network Name

    A unique name for this logical network.

    Network Interface

    Enter the network interface associated with the communication interface (for example, en0).

    IP Label/Address

    The IP label/address associated with this communication interface which will be configured on the network interface when the node boots. The picklist filters out IP labels/addresses already configured to HACMP.

    Network Type

    The type of network media/protocol (for example, Ethernet, Token Ring, FDDI, and so on). Select the type from the predefined list of network types.

    Note

    The network interface that you are adding has the base or service function by default. You do not specify the function of the network interface as in releases prior to HACMP 5.1, but further configuration defines the function of the interface.

    In our environment, we enter the IP label tivaix1_bt1 for interface en0 on cluster node tivaix1 as shown in Figure 4-46 on page 279.


    Figure 4-46: Add a Communication Interface SMIT screen

  6. Repeat this operation for any remaining communication interfaces that you planned for earlier.

In our environment, we configured the communication interfaces shown in Table 4-1 to HACMP network net_ether_01. Note that the first row corresponds to Figure 4-46.

Table 4-1: Communication interfaces to configure for network net_ether_01

Network Interface    IP Label/Address                   Node Name
en0                  tivaix1_bt1 (192.168.100.101)      tivaix1
en1                  tivaix1_bt2 (10.1.1.101)           tivaix1
en0                  tivaix2_bt1 (192.168.100.102)      tivaix2
en1                  tivaix2_bt2 (10.1.1.102)           tivaix2

If you configure a Target Mode SSA network as described in "Configure heartbeating" on page 213, you should not have to configure the interfaces listed in Table 4-2; we only show this information so you can verify other HACMP communication interface configurations. For HACMP network net_tmssa_01, we configured the following communication interfaces.

Table 4-2: Communication devices to configure for network net_tmssa_01

Device Name           Device Path      Node Name
tivaix1_tmssa2_01     /dev/tmssa2      tivaix1
tivaix2_tmssa1_01     /dev/tmssa1      tivaix2
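Before defining the Target Mode SSA devices to HACMP, you can confirm that the tmssa device files exist on each node; for example (the device numbers differ by node, as in Table 4-2):

    ls -l /dev/tmssa*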

Verify the configuration

When all the resource groups are configured, verify the cluster components and operating system configuration on all nodes to ensure compatibility. If no errors are found, the configuration is then copied (synchronized) to each node in the cluster. If Cluster Services are running on any node, the configuration changes will take effect, possibly causing one or more resources to change state.

Complete the following steps to verify and synchronize the cluster topology and resources configuration:

  1. Enter: smit hacmp.

  2. Go to Initialization and Standard Configuration -> HACMP Verification and Synchronization and press Enter.

    SMIT runs the clverify utility. The output from the verification is displayed in the SMIT Command Status window. If you receive error messages, make the necessary changes and run the verification procedure again. You may see warnings if the configuration has a limitation on its availability (for example, only one interface per node per network is configured).

    Figure 4-47 on page 281 shows a sample SMIT screen of a successful verification of an HACMP configuration.


    Figure 4-47: COMMAND STATUS SMIT screen for successful verification of an HACMP Cluster configuration

It is useful to view the cluster configuration to document it for future reference. To display the HACMP Cluster, follow these steps:

  1. Enter: smit hacmp.

  2. Go to Initialization and Standard Configuration -> Display HACMP Configuration and press Enter.

    SMIT displays the current topology and resource information.

    The configuration for our environment is shown in Figure 4-48 on page 282.


    Figure 4-48: COMMAND STATUS SMIT screen for our environment's configuration

    If you want to obtain the same information from the command line, use the cltopinfo command as shown in Example 4-38.

    Example 4-38: Obtain the HACMP configuration using the cltopinfo command

    start example
     [root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/cltopinfo
     Cluster Description of Cluster: cltivoli
     Cluster Security Level: Standard
     There are 2 node(s) and 3 network(s) defined
     NODE tivaix1:
             Network net_ether_01
                     tivaix1_svc     9.3.4.3
                     tivaix2_svc     9.3.4.4
                     tivaix1_bt2     10.1.1.101
                     tivaix1_bt1     192.168.100.101
             Network net_tmssa_01
                     tivaix1_tmssa2_01       /dev/tmssa2
     NODE tivaix2:
             Network net_ether_01
                     tivaix1_svc     9.3.4.3
                     tivaix2_svc     9.3.4.4
                     tivaix2_bt1     192.168.100.102
                     tivaix2_bt2     10.1.1.102
             Network net_tmssa_01
                     tivaix2_tmssa1_01       /dev/tmssa1
     Resource Group rg1
             Behavior                 cascading
             Participating Nodes      tivaix1 tivaix2
             Service IP Label         tivaix1_svc
     Resource Group rg2
             Behavior                 cascading
             Participating Nodes      tivaix2 tivaix1
             Service IP Label         tivaix2_svc
    end example

The clharvest_vg command can also be used to gather more detailed configuration information, as shown in Example 4-39.

Example 4-39: Gather detailed shared volume group information with the clharvest_vg command

start example
 [root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clharvest_vg -w
 Initializing..
 Gathering cluster information, which may take a few minutes...
 Processing...
 Storing the following information in file /usr/es/sbin/cluster/etc/config/clvg_config
 tivaix1:
 Hdisk: hdisk0   PVID: 0001813fe67712b5  VGname: rootvg   VGmajor: active  Conc-capable: Yes  VGactive: No  Quorum-required: Yes
 Hdisk: hdisk1   PVID: 0001813f1a43a54d  VGname: rootvg   VGmajor: active  Conc-capable: Yes  VGactive: No  Quorum-required: Yes
 Hdisk: hdisk2   PVID: 0001813f95b1b360  VGname: rootvg   VGmajor: active  Conc-capable: Yes  VGactive: No  Quorum-required: Yes
 Hdisk: hdisk3   PVID: 0001813fc5966b71  VGname: rootvg   VGmajor: active  Conc-capable: Yes  VGactive: No  Quorum-required: Yes
 Hdisk: hdisk4   PVID: 0001813fc5c48c43  VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk5   PVID: 0001813fc5c48d8c  VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk6   PVID: 000900066116088b  VGname: tiv_vg1  VGmajor: 45      Conc-capable: No   VGactive: No  Quorum-required: Yes
 Hdisk: hdisk7   PVID: 000000000348a3d6  VGname: tiv_vg1  VGmajor: 45      Conc-capable: No   VGactive: No  Quorum-required: Yes
 Hdisk: hdisk8   PVID: 00000000034d224b  VGname: tiv_vg2  VGmajor: 46      Conc-capable: No   VGactive: No  Quorum-required: Yes
 Hdisk: hdisk9   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk10  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk11  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk12  PVID: 00000000034d7fad  VGname: tiv_vg2  VGmajor: 46      Conc-capable: No   VGactive: No  Quorum-required: Yes
 Hdisk: hdisk13  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 FREEMAJORS:     48...
 tivaix2:
 Hdisk: hdisk0   PVID: 0001814f62b2a74b  VGname: rootvg   VGmajor: active  Conc-capable: Yes  VGactive: No  Quorum-required: Yes
 Hdisk: hdisk1   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk2   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk3   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk4   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk5   PVID: 000900066116088b  VGname: tiv_vg1  VGmajor: 45      Conc-capable: No   VGactive: No  Quorum-required: Yes
 Hdisk: hdisk6   PVID: 000000000348a3d6  VGname: tiv_vg1  VGmajor: 45      Conc-capable: No   VGactive: No  Quorum-required: Yes
 Hdisk: hdisk7   PVID: 00000000034d224b  VGname: tiv_vg2  VGmajor: 46      Conc-capable: No   VGactive: No  Quorum-required: Yes
 Hdisk: hdisk16  PVID: 0001814fe8d10853  VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk17  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk18  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk19  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 Hdisk: hdisk20  PVID: 00000000034d7fad  VGname: tiv_vg2  VGmajor: 46      Conc-capable: No   VGactive: No  Quorum-required: Yes
 Hdisk: hdisk21  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
 FREEMAJORS:     48...
end example

Start HACMP Cluster services

After verifying the HACMP configuration, start HACMP Cluster services. Before starting HACMP Cluster services, verify that all network interfaces are configured with the boot IP labels. Example 4-40 on page 288 shows how to use the ifconfig and host commands on tivaix1 to verify that the configured IP addresses (192.168.100.101, 9.3.4.194, and 10.1.1.101 in the example, highlighted in bold) on the network interfaces correspond to boot IP labels or the persistent node IP label (the service IP labels are added only after HACMP starts).

Example 4-40: Configured IP addresses before starting HACMP Cluster services on tivaix1

start example
 [root@tivaix1:/home/root] ifconfig -a
 en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
         inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
         inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
         tcp_sendspace 131072 tcp_recvspace 65536
 en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
         inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
 lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
         inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
         inet6 ::1/0
         tcp_sendspace 65536 tcp_recvspace 65536
 [root@tivaix1:/home/root] host 192.168.100.101
 tivaix1_bt1 is 192.168.100.101, Aliases:    tivaix1
 [root@tivaix1:/home/root] host 9.3.4.194
 tivaix1 is 9.3.4.194, Aliases:    tivaix1.itsc.austin.ibm.com
 [root@tivaix1:/home/root] host 10.1.1.101
 tivaix1_bt2 is 10.1.1.101
end example

Example 4-41 shows the configured IP addresses before HACMP starts for tivaix2.

Example 4-41: Configured IP addresses before starting HACMP Cluster services on tivaix2

start example
 [root@tivaix2:/home/root] ifconfig -a
 en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
         inet 192.168.100.102 netmask 0xfffffe00 broadcast 192.168.101.255
         inet 9.3.4.195 netmask 0xfffffe00 broadcast 9.3.5.255
 en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
         inet 10.1.1.102 netmask 0xfffffe00 broadcast 10.1.1.255
         tcp_sendspace 131072 tcp_recvspace 65536
 lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
         inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
         inet6 ::1/0
         tcp_sendspace 65536 tcp_recvspace 65536
 [root@tivaix2:/home/root] host 192.168.100.102
 tivaix2_bt1 is 192.168.100.102
 [root@tivaix2:/home/root] host 9.3.4.195
 tivaix2 is 9.3.4.195, Aliases:    tivaix2.itsc.austin.ibm.com
 [root@tivaix2:/home/root] host 10.1.1.102
 tivaix2_bt2 is 10.1.1.102
end example

To start HACMP Cluster services:

  1. Enter: smit hacmp.

  2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Start Cluster Services and press Enter. The Start Cluster Services SMIT screen is displayed.

  3. Add all cluster nodes you want to start to the Start Cluster Services on these nodes field as a comma-separated list of cluster node names. Press Enter to start HACMP Cluster services on the selected cluster nodes. In our environment, we enter the cluster node names tivaix1 and tivaix2 as shown in Figure 4-49.


    Figure 4-49: Start Cluster Services SMIT screen

  4. The COMMAND STATUS SMIT screen displays the progress of the start operation, and will appear similar to Figure 4-50 on page 303 if successful.


    Figure 4-50: COMMAND STATUS SMIT screen displaying successful start of cluster services

    Check the network interfaces again after the start operation is complete. The service IP label and the IP addresses for heartbeating over IP aliases are populated into the network interfaces after HACMP starts.

    The service IP address is populated into any available network interface; HACMP selects which network interface. One IP address for heartbeating over IP aliases is populated by HACMP for each available network interface.

    Example 4-42 on page 291 shows the configured IP addresses on the network interfaces of tivaix1 after HACMP is started. Note that three new IP addresses are added into our environment, 172.16.100.2, 172.16.102.2, and 9.3.4.3, highlighted in bold in the example output.

    Example 4-42: Configured IP addresses after starting HACMP Cluster services on tivaix1

    start example
     [root@tivaix1:/home/root] ifconfig -a
     en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
             inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
             inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
             inet 172.16.100.2 netmask 0xfffffe00 broadcast 172.16.101.255
             tcp_sendspace 131072 tcp_recvspace 65536
     en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
             inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
             inet 172.16.102.2 netmask 0xfffffe00 broadcast 172.16.103.255
             inet 9.3.4.3 netmask 0xfffffe00 broadcast 9.3.5.255
     lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
             inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
             inet6 ::1/0
             tcp_sendspace 65536 tcp_recvspace 65536
     [root@tivaix1:/home/root] host 172.16.100.2
     host: 0827-803 Cannot find address 172.16.100.2.
     [root@tivaix1:/home/root] host 172.16.102.2
     host: 0827-803 Cannot find address 172.16.102.2.
     [root@tivaix1:/home/root] host 9.3.4.3
     tivaix1_svc is 9.3.4.3
    end example

    The IP addresses for heartbeating over IP aliases are 172.16.100.2 and 172.16.102.2. The service IP address is 9.3.4.3.

    In our environment we do not assign IP hostnames to the IP addresses for heartbeating over IP aliases, so the host commands for these addresses return an error.

    Example 4-43 shows the IP addresses populated by HACMP after it is started on tivaix2. The addresses on tivaix2 are 172.16.100.3, 172.16.102.3 for the IP addresses for heartbeating over IP aliases, and 9.3.4.4 for the service IP label, highlighted in bold.

    Example 4-43: Configured IP addresses after starting HACMP Cluster services on tivaix2

    start example
     [root@tivaix2:/home/root] ifconfig -a
     en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
             inet 192.168.100.102 netmask 0xfffffe00 broadcast 192.168.101.255
             inet 9.3.4.195 netmask 0xfffffe00 broadcast 9.3.5.255
             inet 172.16.100.3 netmask 0xfffffe00 broadcast 172.16.101.255
     en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
             inet 10.1.1.102 netmask 0xfffffe00 broadcast 10.1.1.255
             inet 172.16.102.3 netmask 0xfffffe00 broadcast 172.16.103.255
             inet 9.3.4.4 netmask 0xfffffe00 broadcast 9.3.5.255
             tcp_sendspace 131072 tcp_recvspace 65536
     lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
             inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
             inet6 ::1/0
             tcp_sendspace 65536 tcp_recvspace 65536
     [root@tivaix2:/home/root] host 172.16.100.3
     host: 0827-803 Cannot find address 172.16.100.3.
     [root@tivaix2:/home/root] host 172.16.102.3
     host: 0827-803 Cannot find address 172.16.102.3.
     [root@tivaix2:/home/root] host 9.3.4.4
     tivaix2_svc is 9.3.4.4
    end example

HACMP is now started on the cluster.

Verify HACMP status

Ensure that HACMP has actually started before starting to use its features. Log into the first node as root user and follow these steps:

  1. Enter: smit hacmp.

  2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Show Cluster Services and press Enter. The COMMAND STATUS SMIT screen is displayed with the current status of all HACMP subsystems on the current node, similar to Figure 4-51 on page 293.


    Figure 4-51: Current status of all HACMP subsystems on a cluster node

  3. You can also verify the status of each node on an HACMP Cluster by running the following command:

     /usr/es/sbin/cluster/utilities/clshowsrv -a 

    This produces output similar to Example 4-44.

    Example 4-44: Using the command line to obtain the current status of all HACMP subsystems on a cluster node

    start example
     $ /usr/es/sbin/cluster/utilities/clshowsrv -a
     Subsystem         Group            PID          Status
      clstrmgrES       cluster          16684        active
      clinfoES         cluster          12950        active
      clsmuxpdES       cluster          26856        active
      cllockdES        lock                          inoperative
    end example

Whether you use SMIT or the command line, at a minimum the following HACMP subsystems must be active on each node in the cluster: clstrmgrES, clinfoES, and clsmuxpdES. All other subsystems should be active if their services are required by your application(s).
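The same status information can also be obtained with the standard AIX lssrc command, since the HACMP daemons run under the System Resource Controller in the cluster group (the output columns match those in Example 4-44):

    lssrc -g cluster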

Repeat the procedure for all remaining nodes in the cluster. In our cluster, we repeated the procedure on tivaix2, and verified that the same subsystems are active.

Test HACMP resource group moves

Manually testing the movement of resource groups between cluster nodes further validates the HACMP configuration of the resource groups. If a resource group that was successfully moved manually later fails to fall over to a cluster node, you immediately know that the problem most likely lies in the HACMP fallover process rather than in the resource group configuration.

To test HACMP resource group moves, follow these steps:

  1. Enter: smit hacmp.

  2. Go to System Management (C-SPOC) -> HACMP Resource Group and Application Management -> Move a Resource Group to Another Node and press Enter to move a resource group. The Select a Resource Group SMIT dialog is displayed.

  3. Move the cursor to resource group rg1, as shown in Figure 4-52, and press Enter.


    Figure 4-52: Select a Resource Group SMIT dialog

  4. Move the cursor to destination node tivaix2, as shown in Figure 4-53 on page 295, and press Enter.


    Figure 4-53: Select a Destination Node SMIT dialog

  5. The Move a Resource Group SMIT dialog is displayed as in Figure 4-54 on page 296. Press Enter to start moving resource group rg1 to destination node tivaix2.


    Figure 4-54: Move a Resource Group SMIT screen

  6. A COMMAND STATUS SMIT screen displays the progress of the resource group move. It takes about two minutes to complete the resource group move in our environment (it might take longer, depending upon your environment's specific details).

    When the resource group move is complete, the COMMAND STATUS screen displays the results of the move. This is shown in Figure 4-55 on page 297, where we move resource group rg1 to cluster node tivaix2.


    Figure 4-55: COMMAND STATUS SMIT screen for moving a resource group

  7. Repeat the process of moving resource groups in comprehensive patterns to verify that all possible resource group moves can be performed by HACMP.

Table 4-3 lists all the resource group moves that we performed to test all possible combinations. (Note that you have already performed the resource group move listed in the first line of this table.)

Table 4-3: Resource group movement combinations to test

Resource Group   Destination Node   Resource Groups in tivaix1 after move   Resource Groups in tivaix2 after move
rg1              tivaix2            none                                    rg1, rg2
rg2              tivaix1            rg2                                     rg1
rg1              tivaix1            rg1, rg2                                none
rg2              tivaix2            rg1                                     rg2

Of course, if you add more cluster nodes to a mutual takeover configuration, you will need to test more combinations of resource group moves. We recommend that you automate the testing if possible for clusters of six or more cluster nodes.
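As a starting point for such automation, the following minimal ksh sketch (an illustration only, not part of HACMP or of the configuration described in this chapter) checks that a given resource group is ONLINE on a given node after a move, using the same clRGinfo -s output parsed in Example 4-46; the script name check_rg_online.sh is hypothetical:

    #!/bin/ksh
    # Usage: check_rg_online.sh <resource_group> <node>
    # Exits 0 if the resource group is ONLINE on the node, 1 otherwise.
    RG=$1
    NODE=$2
    CLRGINFO=/usr/es/sbin/cluster/utilities/clRGinfo

    # clRGinfo -s prints one colon-separated record per resource group/node pair;
    # field 1 is the resource group name (see Example 4-46).
    if $CLRGINFO -s | grep ONLINE | grep "$NODE" | awk -F':' '{ print $1 }' | \
       grep -x "$RG" > /dev/null
    then
        echo "PASS: resource group $RG is ONLINE on node $NODE"
        exit 0
    else
        echo "FAIL: resource group $RG is not ONLINE on node $NODE"
        exit 1
    fi

After each move in Table 4-3, the expected PASS and FAIL results follow directly from the last two columns of the table.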

Live test of HACMP fallover

After testing HACMP manually, perform a live test of its fallover capabilities.

Restriction:

Do not perform this procedure unless you are absolutely certain that all users are logged off the node and that restarting the node hardware is allowed. This procedure involves restarting the node, which can lead to lost data if it is performed while users are still logged into the node.

A live test ensures that HACMP performs as expected during fallover and fallback incidents. To perform a live test of HACMP in our environment:

  1. Make sure that HACMP is running on all cluster nodes before starting this operation.

  2. On the node you want to simulate a catastrophic failure upon, run the sync command several times, followed by the halt command:

        sync ; sync ; sync ; halt -q 

    This flushes disk buffers to the hard disks and immediately halts the machine, simulating a catastrophic failure. Running sync multiple times is not strictly necessary on modern AIX systems, but it is performed as a best practice measure. If the operation is successful, the terminal displays the following message:

        ....Halt completed.... 

    In our environment, we ran the halt command on tivaix2.

  3. If you are logged in remotely to the node, your remote connection is disconnected shortly after this message is displayed. To verify the success of the test, log into the node that will accept the failed node's resource group(s) and inspect the resource groups reported for that node using the lsvg, ifconfig and clRGinfo commands.

    In our environment, we logged into tivaix2, then ran the halt command. We then logged into tivaix1, and ran the lsvg, ifconfig, and clRGinfo commands to identify the volume groups, service label/service IP addresses, and resource groups that fall over from tivaix2, as shown in Example 4-45.

    Example 4-45: Using commands on tivaix1 to verify that tivaix2 falls over to tivaix1

    start example
     [root@tivaix1:/home/root] hostname
     tivaix1
     [root@tivaix1:/home/root] lsvg -o
     tiv_vg2
     tiv_vg1
     rootvg
     [root@tivaix1:/home/root] ifconfig -a
     en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
             inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
             inet 9.3.4.3 netmask 0xfffffe00 broadcast 9.3.5.255
             inet 9.3.4.4 netmask 0xfffffe00 broadcast 9.3.5.255
             tcp_sendspace 131072 tcp_recvspace 65536
     en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
             inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
             inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
     lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
             inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
             inet6 ::1/0
             tcp_sendspace 65536 tcp_recvspace 65536
     [root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clRGinfo
     -----------------------------------------------------------------------------
     Group Name     Type       State      Location
     -----------------------------------------------------------------------------
     rg1            cascading  ONLINE     tivaix1
                               OFFLINE    tivaix2
     rg2            cascading  OFFLINE    tivaix2
                               ONLINE     tivaix1
    end example

    Note how volume group tiv_vg2 and the service IP label/IP address 9.3.4.4, both normally found on tivaix2, fall over to tivaix1. Also note that resource group rg2 is listed in the OFFLINE state for tivaix2, but in the ONLINE state for tivaix1.

  4. If you would like a simple list of the resource groups that are in the ONLINE state on a specific node, run the short script shown in Example 4-46 on that node, replacing the string tivaix1 with the cluster node of your choice:

    Example 4-46: List resource groups in ONLINE state for a node

    start example
     /usr/es/sbin/cluster/utilities/clRGinfo -s | grep ONLINE | grep tivaix1 | \
     awk -F':' '{ print $1 }'
    end example

    In our environment, this script is run on tivaix1 and returns the results shown in Example 4-47 on page 300. This indicates that resource group rg2, which used to run on cluster node tivaix2, is now on cluster node tivaix1.

    Example 4-47: Obtain a simple list of resource groups that are in the ONLINE state on a specific node

    start example
     [root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clRGinfo -s | grep ONLINE | \
     > grep tivaix1 | awk -F':' '{ print $1 }'
     rg1
     rg2
    end example

  5. After the test, power back on the halted node.

    In our environment, we powered back on tivaix2.

  6. Start HACMP on the node that was halted after it powers back on. The node reintegrates back into the cluster.

  7. Verify that Cascading Without Fallback (CWOF) works.

    In our environment, we made sure that resource group rg2 still resides on cluster node tivaix1.

  8. Move the resource group back to its original node, using the preceding procedure for testing resource groups moves.

    In our environment, we moved resource group rg2 to tivaix2.

  9. Repeat the operation for other potential failure modes.

    In our environment, we tested halting cluster node tivaix1, and verified that resource group rg1 moved to cluster node tivaix2.

Configure HACMP to start on system restart

When you are satisfied with the verification of HACMP's functionality, configure AIX to automatically start the cluster subsystems when the node starts. The node then automatically joins the cluster when the machine restarts.

  1. Enter: smit hacmp.

  2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Start Cluster Services and press Enter to configure HACMP's cluster start attributes. The Start Cluster Services SMIT dialog is displayed as shown in Figure 4-56 on page 301.


    Figure 4-56: How to start HACMP on system restart

  3. In the Start now, on system restart or both field, press Tab to change the value to restart, as shown in Figure 4-56, then press Enter so the cluster subsystems will start when the machine restarts.

HACMP now starts on the cluster nodes automatically when the node restarts.
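To confirm that the restart entry was added, you can check /etc/inittab for the cluster startup entry (the exact entry label and command vary by HACMP release):

    grep -i cluster /etc/inittab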

Verify IBM Tivoli Workload Scheduler fallover

When you halt cluster nodes during the testing described in "Live test of HACMP fallover" on page 298, IBM Tivoli Workload Scheduler should also start appropriately when a resource group is moved. Once you verify that a resource group's disk and network resources have moved, you must verify that IBM Tivoli Workload Scheduler itself functions on its new cluster node (or in HACMP terms, verify that the application server resource of the resource group functions on the new cluster node).

In our environment, we perform the live test of HACMP operation at least twice: once to test HACMP resource group moves of disk and network resources in response to a sudden halt of a cluster node, and again while verifying IBM Tivoli Workload Scheduler is running on the appropriate cluster node(s).

To verify that IBM Tivoli Workload Scheduler is running during a test of a cluster node fallover from tivaix2 to tivaix1:

  1. Log into the surviving cluster node as any user.

  2. Run the following command:

     ps -ef | grep -v grep | grep maestro 

    The output should be similar to Example 4-48. Note that there are two instances of IBM Tivoli Workload Scheduler, because there are two instances of each of the processes batchman, netman, jobman, and mailman. Each pair of instances is made up of one process owned by the TWSuser user account maestro, and another owned by maestro2.

Example 4-48: Sample output of command to verify IBM Tivoli Workload Scheduler is moved by HACMP

start example
 [root@tivaix1:/home/root] ps -ef | grep -v grep | grep maestro
  maestro 13440 38764   0 15:56:41      -  0:00 /usr/maestro/bin/batchman -parm 32000
 maestro2 15712     1   0 18:57:44      -  0:00 /usr/maestro2/bin/netman
 maestro2 26840 15712   0 18:57:55      -  0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
  maestro 30738     1   0 15:56:29      -  0:00 /usr/maestro/bin/netman
     root 35410 13440   0 15:56:42      -  0:00 /usr/maestro/bin/jobman
     root 35960 40926   0 18:57:56      -  0:00 /usr/maestro2/bin/jobman
  maestro 38764 30738   0 15:56:40      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE
 maestro2 40926 26840   0 18:57:56      -  0:00 /usr/maestro2/bin/batchman -parm 32000
end example

The command should be repeated while testing that CWOF works. If CWOF works, then the output will remain identical after the halted cluster node reintegrates with the cluster.
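A quick way to count the instances during these checks is to count the netman processes; on the surviving node the count should be 2 while both resource groups are online there, and 1 after fallback (a simple check based on the output above):

    ps -ef | grep bin/netman | grep -v grep | wc -l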

The command should be repeated again to verify that falling back works. In our environment, after moving resource group rg2 back to the reintegrated cluster node so that tivaix1 and tivaix2 each host their original resource groups, the output of the command on tivaix1 shows just one set of IBM Tivoli Workload Scheduler processes, as shown in Example 4-49.

Example 4-49: IBM Tivoli Workload Scheduler processes running on tivaix1 after falling back resource group rg2 to tivaix2

start example
 [root@tivaix1:/home/root] ps -ef | grep -v grep | grep maestro
  maestro 13440 38764   0 15:56:41      -  0:00 /usr/maestro/bin/batchman -parm 32000
  maestro 30738     1   0 15:56:29      -  0:00 /usr/maestro/bin/netman
     root 35410 13440   0 15:56:42      -  0:00 /usr/maestro/bin/jobman
  maestro 38764 30738   0 15:56:40      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE
end example

The output of the command on tivaix2 in this case also shows only one instance of IBM Tivoli Workload Scheduler. The process IDs are different, but the processes are otherwise the same, as shown in Example 4-50.

Example 4-50: IBM Tivoli Workload Scheduler processes running on tivaix2 after falling back resource group rg2 to tivaix2

start example
 [root@tivaix2:/home/root] ps -ef | grep -v grep | grep maestro
 maestro2 17926 39660   0 19:02:17      - 0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
 maestro2 39660     1   0 19:02:06      - 0:00 /usr/maestro2/bin/netman
     root 47242 47366   0 19:02:19      - 0:00 /usr/maestro2/bin/jobman
 maestro2 47366 17926   0 19:02:18      - 0:00 /usr/maestro2/bin/batchman -parm 32000
end example

4.1.11 Add IBM Tivoli Management Framework

After IBM Tivoli Workload Scheduler is configured for HACMP and made highly available, you can add IBM Tivoli Management Framework so that the Job Scheduling Console component of IBM Tivoli Workload Scheduler can be used. In this section we show how to plan, install and configure IBM Tivoli Management Framework for a highly available installation of IBM Tivoli Workload Scheduler. The steps include:

  • "Planning for IBM Tivoli Management Framework" on page 303

  • "Planning the installation sequence" on page 312

  • "Stage installation media" on page 313

  • "Install base Framework" on page 315

  • "Load Tivoli environment variable in .profile files" on page 318

  • "Install Tivoli Framework components and patches" on page 318

  • "Add IP alias to oserv" on page 320

  • "Install IBM Tivoli Workload Scheduler Framework components" on page 322

  • "Create additional Connectors" on page 328

  • "Configure Framework access" on page 330

  • "Interconnect Framework servers" on page 331

  • "How to log in using the Job Scheduling Console" on page 339

The details of each step follow.

Planning for IBM Tivoli Management Framework

In this section we show the entire process of iteratively planning the integration of IBM Tivoli Management Framework into an HACMP environment specifically configured for IBM Tivoli Workload Scheduler. We show successively more functional configurations of IBM Tivoli Management Framework.

Note

While we discuss this process after showing you how to configure HACMP for IBM Tivoli Workload Scheduler in this redbook, in an actual deployment this planning occurs alongside the planning for HACMP and IBM Tivoli Workload Scheduler.

Configuring multiple instances of IBM Tivoli Management Framework on the same operating system image is not supported by IBM Support. In our highly available IBM Tivoli Workload Scheduler environment of mutual takeover nodes, this means we cannot use two or more instances of IBM Tivoli Management Framework on a single cluster node.

In other words, IBM Tivoli Management Framework cannot be configured as an application server in a resource group configured for mutual takeover in a cluster. At the time of writing, while the configuration is technically feasible and even demonstrated in IBM publications such as the IBM Redbook High Availability Scenarios for Tivoli Software, SG24-2032, IBM Support does not sanction this configuration.

Due to this constraint, we install an instance of IBM Tivoli Management Framework on a local drive on each cluster node. We then create Connectors for both resource groups on each instance of IBM Tivoli Management Framework.

The Job Scheduling Console is the primary component of IBM Tivoli Workload Scheduler that uses IBM Tivoli Management Framework. It uses the Job Scheduling Services component in IBM Tivoli Management Framework. The primary object for IBM Tivoli Workload Scheduler administrators to manage in the Job Scheduling Services is the Connector. A Connector holds the specific directory location that an IBM Tivoli Workload Scheduler scheduling engine is installed into. In our environment, this is /usr/maestro for TWS Engine1, which normally runs on tivaix1 and is configured in resource group rg1, and /usr/maestro2 for TWS Engine2, which normally runs on tivaix2 and is configured in resource group rg2.

In our environment, under normal operation the relationship of Connectors to IBM Tivoli Workload Scheduler engines and IBM Tivoli Management Framework on cluster nodes is as shown in Figure 4-57 on page 305.


Figure 4-57: Relationship of IBM Tivoli Workload Scheduler, IBM Tivoli Management Framework, Connectors, and Job Scheduling Consoles during normal operation of an HACMP Cluster

We use Job Scheduling Console Version 1.3 Fix Pack 1; best practice calls for using at least this level of the Job Scheduling Console because it addresses many user interface issues. Its prerequisite is the base installation of Job Scheduling Console Version 1.3 that came with your installation media for IBM Tivoli Workload Scheduler. If you do not already have it installed, download Fix Pack 1 from:

ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_1.3/1.3-JSC-FP01

You can use the environment in this initial configuration as is. Users can log into either TWS Engine1 or TWS Engine2 by logging into the corresponding service IP address. Users can even log into both, but that requires running two instances of the Job Scheduling Console. Figure 4-58 on page 306 shows the display of a user's Microsoft Windows 2000 computer running two instances of Job Scheduling Console. Each instance of the Job Scheduling Console is logged into a different cluster node as root user. To run two instances of Job Scheduling Console, simply run it twice.


Figure 4-58: Viewing multiple instances of IBM Tivoli Workload Scheduler on separate cluster nodes on a single display

Note how in the Job Scheduling Console window for Administrator Root_tivaix1-region (root@tivaix1), the scheduling engine for TIVAIX2 is unavailable. The engine for TIVAIX2 is marked by a small icon badge that looks like a red circle with a white "X" inside it, as shown in Figure 4-59 on page 307.


Figure 4-59: Available scheduling engines when logged into tivaix1 during normal operation

In the Job Scheduling Console window for Administrator Root_tivaix2-region (root@tivaix2), the reverse situation exists: the scheduling engine for TIVAIX1 is unavailable. The engine for TIVAIX1 is similarly marked unavailable as shown in Figure 4-60.


Figure 4-60: Available scheduling engines when logged into tivaix2 during normal operation

This happens because in our environment we actually configure two Connectors (one for each instance of IBM Tivoli Workload Scheduler) on each instance of IBM Tivoli Management Framework, as shown in Figure 4-61 on page 308.


Figure 4-61: How multiple instances of the Connector work during normal operation

If we do not configure multiple Connectors in this manner, then for example, when resource group rg2 on tivaix2 falls over to tivaix1, no Connector for TWS Engine2 will exist on tivaix1 after the fallover.

In normal operation, when a user logs into tivaix1, they use the Connector for TWS Engine1 (called Connector1 in Figure 4-61 on page 308). But on tivaix1 the Connector for TWS Engine2 does not refer to an active instance of IBM Tivoli Workload Scheduler on tivaix1 because /usr/maestro2 is already mounted and in use on tivaix2.

If resource groups rg1 and rg2 are running on a single cluster node, each instance of IBM Tivoli Workload Scheduler in each resource group requires its own Connector. This is why we create two Connectors for each instance of IBM Tivoli Management Framework. The Job Scheduling Console clients connect to IBM Tivoli Workload Scheduler through the IBM Tivoli Management Framework oserv process that listens on interfaces that are assigned the service IP labels.

For example, consider the fallover scenario where tivaix2 falls over to tivaix1. It causes resource group rg2 to fall over to tivaix1. As part of this resource group move, TWS Engine2 on /usr/maestro2 is mounted on tivaix1. Connector2 on tivaix1 then determines that /usr/maestro2 contains a valid instance of IBM Tivoli Workload Scheduler, namely TWS Engine2. IBM Tivoli Management Framework is configured to listen on both tivaix1_svc (9.3.4.3) and tivaix2_svc (9.3.4.4).

Because HACMP moves these service IP labels as part of the resource group, it makes both scheduling engines TWS Engine1 and TWS Engine2 available to Job Scheduling Console users who log into either tivaix1_svc or tivaix2_svc, even though both service IP labels in this fallover scenario reside on a single cluster node (tivaix1).
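
To confirm this from the command line after a fallover, check that both service IP addresses are configured on the surviving node and that the Framework oserv is listening on its port. The following is a minimal sketch, assuming the service IP addresses 9.3.4.3 and 9.3.4.4 and the default oserv port of 94 used in our environment.

     netstat -in | egrep '9\.3\.4\.3|9\.3\.4\.4'
     netstat -an | grep LISTEN | grep '\.94 '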

When a Job Scheduling Console session starts, the instance of IBM Tivoli Workload Scheduler it connects to creates authentication tokens for the session. These tokens are held in memory. When the cluster node that this instance of IBM Tivoli Workload Scheduler runs on falls over to another cluster node, these authentication tokens in memory are lost.

Note

Users working through the Job Scheduling Console against the instance of IBM Tivoli Workload Scheduler on the cluster node that fails must exit their session and log in through the Job Scheduling Console again. Because the service IP labels are still valid, users simply log into the same service IP label they originally used.

As far as Job Scheduling Console users are concerned, if a fallover occurs, they simply log back into the same IP address or hostname.

Figure 4-62 shows the fallover scenario where tivaix2 falls over to tivaix1, and the effect upon the Connectors.


Figure 4-62: Multiple instances of Connectors after tivaix2 falls over to tivaix1

Note how Job Scheduling Console sessions that were connected to 9.3.4.4 on port 94 used to communicate with tivaix2, but now communicate instead with tivaix1. Users in these sessions see an error dialog window similar to the following figure the next time they attempt to perform an operation.


Figure 4-63: Sample error dialog box in Job Scheduling Console indicating possible fallover of cluster node

Users should be trained to identify when this dialog indicates a cluster node failure. Best practice is to arrange for appropriate automatic notification whenever a cluster fallover occurs, whether by e-mail, pager, instant messaging, or other means, and to send another notification when the affected resource group(s) are returned to service. When Job Scheduling Console users receive the second notification, they can proceed to log back in again.
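
HACMP can call a site-supplied script when cluster events run, so one way to produce such a notification is a small mail script. The following is a minimal sketch only; the recipient address is hypothetical, and how the script is attached to cluster events (for example, as an event notify method) depends on your site's HACMP procedures.

     #!/bin/ksh
     # Hypothetical notification sketch: mail the operations team when invoked
     # by a cluster event. The event name and resource group are assumed to be
     # passed as arguments by the caller.
     EVENT=$1
     RG=$2
     echo "HACMP event ${EVENT} for resource group ${RG} on `hostname`" | \
             mail -s "HACMP cluster event on `hostname`" tws-operators@example.com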

After a resource group falls over, understanding when and how Connectors recognize a scheduling engine is key to knowing why certain scheduling engines appear only after certain actions.

Note

While Job Scheduling Console users from the failed cluster node who log in again will see both scheduling engines, Job Scheduling Console users on the surviving cluster node will not see both engines until at least one user logs into the instance of IBM Tivoli Workload Scheduler that fell over, and until they themselves log out and log back in.

The scheduling engine that falls over is not available to the Job Scheduling Console of the surviving node until two conditions are met, in the following order:

  1. A Job Scheduling Console session against the engine that fell over is started. In the scenario we are discussing where tivaix2 falls over to tivaix1, this means Job Scheduling Console users must log into tivaix2_svc.

  2. The Job Scheduling Console users who originally logged into tivaix1_svc (the users of the surviving node, in other words) log out and log back into tivaix1_svc.

When these conditions are met, Job Scheduling Console users on the surviving node see a scheduling engine pane as shown in Figure 4-64.


Figure 4-64: Available scheduling engines on tivaix1 after tivaix2 falls over to it

Only after a Job Scheduling Console session communicates with the Connector for a scheduling engine is the scheduling engine recognized by other Job Scheduling Console sessions that connect later. Job Scheduling Console sessions that are already connected will not recognize the newly-started scheduling engine because identification of scheduling engines only occurs once during Job Scheduling Console startup.

While the second iteration of the design is a workable solution, it is still somewhat cumbersome because it requires users who need to work with both scheduling engines to remember a set of rules. Fortunately, there is one final refinement to our design that helps address some of this awkwardness.

The TMR interconnection feature of IBM Tivoli Management Framework allows objects on one instance of IBM Tivoli Management Framework to be managed by another instance, and vice versa. In the environment we used for this redbook, we used a two-way interconnection between the IBM Tivoli Management Framework instances on the two cluster nodes to expose the Connectors on each cluster node to the other. Now when tivaix2 falls over to tivaix1, Job Scheduling Console users see the available scheduling engines shown in Figure 4-65.


Figure 4-65: Available Connectors in interconnected Framework environment after tivaix2 falls over to tivaix1

Note that we now define the Connectors by the cluster node and resource group they are used for. So Connector TIVAIX1_rg1 is for resource group rg1 (that is, scheduling engine TWS Engine1) on tivaix1. In Figure 4-65, we see that Connector TIVAIX1_rg2 is active. It is for resource group rg2 (that is, TWS Engine2) on tivaix1, and it is active only when tivaix2 falls over to tivaix1. Connector TIVAIX2_rg1 is used if resource group rg1 falls over to tivaix2. Connector TIVAIX2_rg2 would normally be active, but because resource group rg2 has fallen over to tivaix1, it is inactive in the preceding figure.

During normal operation of the cluster, the active Connectors are TIVAIX1_rg1 and TIVAIX2_rg2, as shown in Figure 4-66.


Figure 4-66: Available Connectors in interconnected Framework environment during normal cluster operation

In this section we show how to install IBM Tivoli Management Framework Version 4.1 into an HACMP Cluster configured to make IBM Tivoli Workload Scheduler highly available, with all available patches as of the time of writing. We specifically show how to install on tivaix1 in the environment we used for this redbook. Installing on tivaix2 is similar, except the IP hostname is changed where applicable.

Planning the installation sequence

Before installing, plan the sequence of packages to install. The publication Tivoli Enterprise Installation Guide Version 4.1, GC32-0804, describes in detail what needs to be installed.

Figure 4-67 on page 313 shows the sequence and dependencies of packages we planned for IBM Tivoli Management Framework Version 4.1 for the environment used for this redbook.


Figure 4-67: IBM Tivoli Framework 4.1.0 application and patch sequence and dependencies as of December 2, 2003

Stage installation media

We first stage the installation media on a hard disk for ease of installation. If your system does not have sufficient disk space to allow this, you can copy the media to a system that does have enough disk space and use Network File System (NFS), Samba, Andrew File System (AFS) or similar remote file systems to mount the media over the network.

In our environment, we created directories and copied the contents of the media and patches to the directories as shown in Table 4-4. The media was copied to both cluster nodes tivaix1 and tivaix2.
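
For example, the contents of one disc can be staged with commands like the following. This is a minimal sketch only; it assumes the CD device /dev/cd0 and the mount point /cdrom, so adjust the device, mount point, and target directory for each disc in Table 4-4.

     mkdir -p /usr/sys/inst.images/tivoli/fra/FRA410_1of2
     mount -v cdrfs -o ro /dev/cd0 /cdrom
     cp -Rp /cdrom/. /usr/sys/inst.images/tivoli/fra/FRA410_1of2
     umount /cdrom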

Table 4-4: Installation media directories used in our environment

  Sub-directory under /usr/sys/inst.images/   Description of contents or disc title (or electronic download)

  tivoli                                      Top level of installation media directory.
  tivoli/fra                                  Top level of IBM Tivoli Management Framework media.
  tivoli/fra/FRA410_1of2                      Tivoli Management Framework v4.1 1 of 2
  tivoli/fra/FRA410_2of2                      Tivoli Management Framework v4.1 2 of 2
  tivoli/fra/41TMFnnn                         Extracted tar file contents of patch 4.1-TMF-0nnn.
  tivoli/wkb                                  Top level of IBM Tivoli Workload Scheduler media
  tivoli/wkb/TWS820_1                         IBM Tivoli Workload Scheduler V8.2 1 of 2
  tivoli/wkb/TWS820_2                         IBM Tivoli Workload Scheduler V8.2 2 of 2
  tivoli/wkb/8.2-TWS-FP01                     IBM Tivoli Workload Scheduler V8.2 Fix Pack 1
  tivoli/wkb/JSC130_1                         Job Scheduling Console V1.3 1 of 2
  tivoli/wkb/JSC130_2                         Job Scheduling Console V1.3 2 of 2
  tivoli/wkb/1.3-JSC-FP01                     Job Scheduling Console V1.3 Fix Pack 1

You can download the patches for IBM Tivoli Management Framework Version 4.1 from:

  • ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_4.1

Note that we extracted the contents of the tar file of each patch into the corresponding patch directory, such that the file PATCH.LST is in the top level of the patch directory. For example, for patch 4.1-TMF-0008, we downloaded the tar file:

  • ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_4.1/4.1-TMF-0008/4.1-TMF-0008.tar

Then we expanded the tar file in /usr/sys/inst.images/tivoli/fra, resulting in a directory called 41TMF008. One of the files beneath that directory was the PATCH.LST file.
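
For example, the extraction for patch 4.1-TMF-0008 can be performed with commands like the following. This is a minimal sketch that assumes the downloaded tar file was saved in /tmp.

     cd /usr/sys/inst.images/tivoli/fra
     tar -xvf /tmp/4.1-TMF-0008.tar
     ls 41TMF008/PATCH.LST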

Example 4-51 shows the top two levels of the directory structure.

Example 4-51: Organization of installation media

start example
 [root@tivaix1:/home/root] ls /usr/sys/inst.images/tivoli/
 ./    ../   fra/
 [root@tivaix1:/home/root] ls /usr/sys/inst.images/tivoli/*
 /usr/sys/inst.images/tivoli/fra:
 ./            41TMF014/     41TMF017/     FRA410_1of2/
 ../           41TMF015/     41TMF032/     FRA410_2of2/
 41TMF008/     41TMF016/     41TMF034/
 /usr/sys/inst.images/tivoli/wkb:
 ./             1.3-JSC-FP01/  JSC130_1/      TWS820_1/
 ../            8.2-TWS-FP01/  JSC130_2/      TWS820_2/
end example

After staging the media, install the base product as shown in the following section.

Install base Framework

In this section we show how to install IBM Tivoli Management Framework so it is specifically configured for IBM Tivoli Workload Scheduler on HACMP. This enables you to transition the instances of IBM Tivoli Management Framework used for IBM Tivoli Workload Scheduler to a mutual takeover environment if that becomes a supported feature in the future. We believe the configuration as shown in this section can be started and stopped directly from HACMP in a mutual takeover configuration.

When installing IBM Tivoli Management Framework on an HACMP Cluster node in support of IBM Tivoli Workload Scheduler, use the primary IP hostname as the hostname for IBM Tivoli Management Framework. Add an IP alias later for the service IP label. Used together with the multiple Connector object configuration described in this chapter, this enables Job Scheduling Console users to connect through any instance of IBM Tivoli Management Framework, no matter which cluster nodes fall over.

IBM Tivoli Management Framework consists of a base install and various components. You must first prepare for the base install by performing the commands shown in Example 4-52 for cluster node tivaix1, in our environment.

Example 4-52: Preparing for installation of IBM Tivoli Management Framework 4.1

start example
 [root@tivaix1:/home/root] HOST=tivaix1
 [root@tivaix1:/home/root] echo $HOST > /etc/wlocalhost
 [root@tivaix1:/home/root] WLOCALHOST=$HOST
 [root@tivaix1:/home/root] export WLOCALHOST
 [root@tivaix1:/home/root] mkdir /usr/local/Tivoli/install_dir
 [root@tivaix1:/home/root] cd /usr/local/Tivoli/install_dir
 [root@tivaix1:/home/root] /bin/sh /usr/sys/inst.images/tivoli/fra/FRA410_1of2/WPREINST.SH
 to install, type ./wserver -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2
 [root@tivaix1:/home/root] DOGUI=no
 [root@tivaix1:/home/root] export DOGUI
end example

On tivaix2, we replace the IP hostname in the first command from tivaix1 to tivaix2 (that is, HOST=tivaix2).

After you prepare for the base install, perform the initial installation of IBM Tivoli Management Framework by running the command shown in Example 4-53 on page 316. You will see output similar to this example; depending upon the speed of your server, it will take 5 to 15 minutes to complete.

Example 4-53: Initial installation of IBM Tivoli Management Framework Version 4.1

start example
 [root@tivaix1:/home/root] sh ./wserver -y \
 -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 \
 -a tivaix1 -d \
 BIN=/usr/local/Tivoli/bin! \
 LIB=/usr/local/Tivoli/lib! \
 ALIDB=/usr/local/Tivoli/spool! \
 MAN=/usr/local/Tivoli/man! \
 APPD=/usr/lib/lvm/X11/es/app-defaults! \
 CAT=/usr/local/Tivoli/msg_cat! \
 LK=1FN5B4MBXBW4GNJ8QQQ62WPV0RH999P99P77D \
 RN=tivaix1-region \
 AutoStart=1 SetPort=1 CreatePaths=1 @ForceBind@=yes @EL@=None
 Using command line style installation.....
 Unless you cancel, the following operations will be executed:
    need to copy the CAT (generic) to:
          tivaix1:/usr/local/Tivoli/msg_cat
    need to copy the CSBIN (generic) to:
          tivaix1:/usr/local/Tivoli/bin/generic
    need to copy the APPD (generic) to:
          tivaix1:/usr/lib/lvm/X11/es/app-defaults
    need to copy the GBIN (generic) to:
          tivaix1:/usr/local/Tivoli/bin/generic_unix
    need to copy the BUN (generic) to:
          tivaix1:/usr/local/Tivoli/bin/client_bundle
    need to copy the SBIN (generic) to:
          tivaix1:/usr/local/Tivoli/bin/generic
    need to copy the LCFNEW (generic) to:
          tivaix1:/usr/local/Tivoli/bin/lcf_bundle.40
    need to copy the LCFTOOLS (generic) to:
          tivaix1:/usr/local/Tivoli/bin/lcf_bundle.40/bin
    need to copy the LCF (generic) to:
          tivaix1:/usr/local/Tivoli/bin/lcf_bundle
    need to copy the LIB (aix4-r1) to:
          tivaix1:/usr/local/Tivoli/lib/aix4-r1
    need to copy the BIN (aix4-r1) to:
          tivaix1:/usr/local/Tivoli/bin/aix4-r1
    need to copy the ALIDB (aix4-r1) to:
          tivaix1:/usr/local/Tivoli/spool/tivaix1.db
    need to copy the MAN (aix4-r1) to:
          tivaix1:/usr/local/Tivoli/man/aix4-r1
    need to copy the CONTRIB (aix4-r1) to:
          tivaix1:/usr/local/Tivoli/bin/aix4-r1/contrib
    need to copy the LIB371 (aix4-r1) to:
          tivaix1:/usr/local/Tivoli/lib/aix4-r1
    need to copy the LIB365 (aix4-r1) to:
          tivaix1:/usr/local/Tivoli/lib/aix4-r1
 Executing queued operation(s)
 Distributing machine independent Message Catalogs --> tivaix1  ..... Completed.
 Distributing machine independent generic Codeset Tables --> tivaix1  .... Completed.
 Distributing architecture specific Libraries --> tivaix1  ...... Completed.
 Distributing architecture specific Binaries --> tivaix1  ............. Completed.
 Distributing architecture specific Server Database --> tivaix1  .......................................... Completed.
 Distributing architecture specific Man Pages --> tivaix1  ..... Completed.
 Distributing machine independent X11 Resource Files --> tivaix1  ... Completed.
 Distributing machine independent Generic Binaries --> tivaix1  ... Completed.
 Distributing machine independent Client Installation Bundle --> tivaix1  ... Completed.
 Distributing machine independent generic HTML/Java files --> tivaix1  ... Completed.
 Distributing architecture specific Public Domain Contrib --> tivaix1  ... Completed.
 Distributing machine independent LCF Images (new version) --> tivaix1  ............. Completed.
 Distributing machine independent LCF Tools --> tivaix1  ....... Completed.
 Distributing machine independent 36x Endpoint Images --> tivaix1  ............ Completed.
 Distributing architecture specific 371_Libraries --> tivaix1  .... Completed.
 Distributing architecture specific 365_Libraries --> tivaix1  .... Completed.
 Registering installation information...Finished.
end example

On tivaix2 in our environment, we run the same command, except that we change tivaix1 to tivaix2 in the -a flag argument and in the RN region name argument.

Load Tivoli environment variable in .profile files

The Tivoli environment variables contain pointers to important directories that IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework use for many commands. Loading the variables in the .profile file of a user account ensures that these environment variables are always available immediately after logging into the user account.

Add the lines shown in Example 4-54 to the .profile files of the root and TWSuser user accounts on all cluster nodes so that all Tivoli environment variables for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework are sourced in at login.

Example 4-54: Load Tivoli environment variables

start example
 PATH=${PATH}:${HOME}/bin
 if [ -f /etc/Tivoli/setup_env.sh ] ; then
         . /etc/Tivoli/setup_env.sh
 fi
 if [ -f `maestro`/tws_env.sh ] ; then
         . `maestro`/tws_env.sh
 fi
end example

Also source these files from the command line, or log out and log back in, to activate the environment variables for the following sections.
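
To verify that the environment is loaded, you can check one of the variables set by /etc/Tivoli/setup_env.sh. The following is a minimal sketch; the path shown reflects the installation directories used in our environment, so expect a different value if you installed IBM Tivoli Management Framework elsewhere.

     [root@tivaix1:/home/root] . /etc/Tivoli/setup_env.sh
     [root@tivaix1:/home/root] echo $BINDIR
     /usr/local/Tivoli/bin/aix4-r1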

Install Tivoli Framework components and patches

After the base install is complete, you can install all remaining Framework components and patches by running the script shown in Example 4-55 on page 319.

Example 4-55: Script for installing IBM Tivoli Management Framework Version 4.1 with patches

start example
 #!/bin/ksh
 if [ -d /etc/Tivoli ] ; then
         . /etc/Tivoli/setup_env.sh
 fi
 reexec_oserv() {
         echo "Reexecing object dispatchers..."
         if [ `odadmin odlist list_od | wc -l` -gt 1 ] ; then
                 #
                 # Determine if necessary to shut down any clients
                 tmr_hosts=`odadmin odlist list_od | head -1 | cut -c 36-`
                 client_list=`odadmin odlist list_od | grep -v ${tmr_hosts}$`
                 if [ "${client_list}" = "" ] ; then
                         echo "No clients to shut down, skipping shut down of clients..."
                 else
                         echo "Shutting down clients..."
                         odadmin shutdown clients
                         echo "Waiting for all clients to shut down..."
                         sleep 30
                 fi
         fi
         odadmin reexec 1
         sleep 30
         odadmin start clients
 }
 HOST="tivaix1"
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRE130 $HOST
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JHELP41 $HOST
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JCF41 $HOST
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRIM41 $HOST
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i MDIST2GU $HOST
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISDEPOT $HOST
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISCLNT $HOST
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i ADE $HOST
 winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i AEF $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF008 -y -i 41TMF008 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF014 -y -i 41TMF014 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF015 -y -i 41TMF015 $HOST
 reexec_oserv
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF016 -y -i 41TMF016 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2928 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2929 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2931 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2932 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2962 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2980 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2984 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2986 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2987 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2989 $HOST
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF034 -y -i 41TMF034 $HOST
 reexec_oserv
 wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF032 -y -i JRE130_0 $HOST
end example

If you use this script on tivaix2, change the line that starts with the string "HOST=" so that tivaix1 is replaced with tivaix2.

This completes the installation of IBM Tivoli Management Framework Version 4.1.

After installing IBM Tivoli Management Framework, configure it to meet the requirements of integrating with IBM Tivoli Workload Scheduler over HACMP.

Add IP alias to oserv

Installing IBM Tivoli Management Framework using the primary IP hostname of the server binds the Framework server (also called oserv) to the corresponding IP address. It only listens for Framework network traffic on this IP address. This makes it easy to start IBM Tivoli Management Framework before starting HACMP.

In our environment, we also need oserv to listen on the service IP address. The service IP label/address is moved between cluster nodes along with its parent resource group, but the primary IP hostname remains on the cluster node to ease administrative access (that is why it is called the persistent IP label/address). Job Scheduling Console users depend upon using this IP address, not the primary IP hostname of the server, to access IBM Tivoli Workload Scheduler services.

As a security precaution, IBM Tivoli Management Framework only listens on the IP address it is initially installed against, unless this behavior is specifically disabled so that it can bind to other addresses. We show how to disable this behavior in this section.

To add the service IP label as a Framework oserv IP alias, follow these steps:

  1. Log in as root user on a cluster node.

    In our environment, we log in as root user on cluster node tivaix1.

  2. Use the odadmin command as shown in Example 4-56 on page 321 to verify the current IP aliases of the oserv, add the service IP label as an IP alias to the oserv, then verify that the service IP label is added to the oserv as an IP alias.

    Example 4-56: Add an IP alias to the Framework oserv server

    start example
     [root@tivaix1:/home/root] odadmin odlist
     Region           Disp  Flags  Port            IPaddr   Hostname(s)
     1369588498          1    ct-    94         9.3.4.194   tivaix1,tivaix1.itsc.austin.ibm.com
     [root@tivaix1:/home/root] odadmin odlist add_ip_alias 1 tivaix1_svc
     [root@tivaix1:/home/root] odadmin odlist
     Region           Disp  Flags  Port            IPaddr   Hostname(s)
     1369588498          1    ct-    94         9.3.4.194   tivaix1,tivaix1.itsc.austin.ibm.com
                                                  9.3.4.3   tivaix1_svc
    end example

    Note that the numeral "1" in the odadmin odlist add_ip_alias command should be replaced by the "dispatcher number" of your Framework installation.

    The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation. In Example 4-57, the dispatcher number is 7.

    Example 4-57: Identify the dispatcher number of a Framework installation

    start example
     [root@tivaix1:/home/root] odadmin odlist
     Region          Disp Flags    Port            IPaddr   Hostname(s)
     1369588498         7   ct-      94         9.3.4.194   tivaix1,tivaix1.itsc.austin.ibm.com
    end example

    The dispatcher number will be something other than "1" if you delete and reinstall Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.

  3. Use the odadmin command as shown in Example 4-58 to verify that IBM Tivoli Management Framework currently binds against the primary IP hostname, disable the feature, then verify that it is disabled.

    Note that the numeral "1" in the odadmin set_force_bind command should be replaced by the "dispatcher number" of your Framework installation.

    Example 4-58: Disable set_force_bind object dispatcher option

    start example
     [root@tivaix1:/home/root] odadmin | grep Force
     Force socket bind to a single address = TRUE
     [root@tivaix1:/home/root] odadmin set_force_bind FALSE 1
     [root@tivaix1:/home/root] odadmin | grep Force
     Force socket bind to a single address = FALSE
    end example

    The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation.

    In Example 4-59 on page 322, the dispatcher number is 7.

    Example 4-59: Identify the dispatcher number of a Framework installation

    start example
     [root@tivaix1:/home/root] odadmin odlist
     Region           Disp  Flags  Port            IPaddr   Hostname(s)
     1369588498          7    ct-    94         9.3.4.194   tivaix1,tivaix1.itsc.austin.ibm.com
    end example

    The dispatcher number will be something other than "1" if you delete and reinstall Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.

    Important:

    Disabling the set_force_bind variable can cause unintended side effects for installations of IBM Tivoli Management Framework that also run other IBM Tivoli server products, such as IBM Tivoli Monitoring and IBM Tivoli Configuration Manager. Refer to your IBM service provider for advice on how to address this potential conflict if you plan on deploying other IBM Tivoli server products on top of the instance of IBM Tivoli Management Framework that you use for IBM Tivoli Workload Scheduler.

    Best practice is to dedicate an instance of IBM Tivoli Management Framework for IBM Tivoli Workload Scheduler, typically on the Master Domain Manager, and not to install other IBM Tivoli server products into it. This simplifies these administrative concerns and does not affect the functionality of a Tivoli Enterprise environment.

  4. Repeat the operation on all remaining cluster nodes.

    For our environment, we repeated the operation on tivaix2, replacing tivaix1 with tivaix2 in the commands.

Install IBM Tivoli Workload Scheduler Framework components

After installing IBM Tivoli Management Framework, install the IBM Tivoli Workload Scheduler Framework. The components for IBM Tivoli Workload Scheduler Version 8.2 in the environment we use throughout this redbook are:

  • Tivoli Job Scheduling Services v1.2

  • Tivoli TWS Connector 8.2

There are separate versions for Linux environments. See Tivoli Workload Scheduler Job Scheduling Console User's Guide, SH19-4552, to identify the equivalent components for a Linux environment.

Best practice is to back up the Framework object database before installing any Framework components. This enables you to restore the object database to its original state before the installation in case the install operation encounters a problem.

Use the wbkupdb command as shown in Example 4-60 to back up the object database.

Example 4-60: Back up the object database of IBM Tivoli Management Framework

start example
 [root@tivaix1:/home/root] cd /tmp
 [root@tivaix1:/tmp] wbkupdb tivaix1 ; echo DB_`date +%b%d-%H%M`
 Starting the snapshot of the database files for tivaix1...
 ............................................................
 ..............................
 Backup Complete.
 DB_Dec09-1958
end example

The last line of the output is produced by the echo command; it returns the name of the backup file created by wbkupdb. All backup files are stored in the directory $DBDIR/../backups. Example 4-61 shows how to list all the available backup files.

Example 4-61 shows there are five backups taken of the object database on cluster node tivaix1.

Example 4-61: List all available object database backup files

start example
 [root@tivaix1:/home/root] ls $DBDIR/../backups
 ./              ../            DB_Dec08-1705   DB_Dec08-1716
 DB_Dec08-1723   DB_Dec08-1724  DB_Dec09-1829
end example

Tip

Backing up the object database of IBM Tivoli Management Framework requires that the current working directory that the wbkupdb command is executed from grants write permission to the current user and contains enough disk space to temporarily hold the object database.

A common reason wbkupdb fails is that the current working directory it is executed from either does not grant write permission to the user account running it, or does not have enough space to temporarily hold a copy of the object database directory.

Example 4-62 on page 324 shows how to verify there is enough disk space to run wbkupdb.

Example 4-62: Verifying enough disk space in the current working directory for wbkupdb

start example
 [root@tivaix1:/tmp] pwd
 /tmp
 [root@tivaix1:/tmp] du -sk $DBDIR
 15764 /usr/local/Tivoli/spool/tivaix1.db
 [root@tivaix1:/tmp] df -k /tmp
 Filesystem    1024-blocks    Free %Used    Iused %Iused Mounted on
 /dev/hd3          1146880  661388   43%      872    1%  /tmp
end example

In Example 4-62, the current working directory is /tmp. The du command in the example shows how much space the object database directory occupies. It is measured in kilobytes, and is 15,764 kilobytes in this example (highlighted in bold).

The df command in the example shows how much space is available in the current working directory. The third column, labeled "Free" in the output of the command, shows the available space in kilobytes. In this example, the available disk space in /tmp is 661,388 kilobytes. As long as the latter number is at least twice as large as the former, proceed with running wbkupdb.
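
Based on the output format shown in Example 4-62, this check can be scripted. The following is a minimal sketch, assuming the backup is taken from /tmp as in our environment; adjust the directory to wherever you run wbkupdb from.

     # Compare twice the size of the object database against the free space in /tmp
     DBSIZE=`du -sk $DBDIR | awk '{ print $1 }'`
     FREE=`df -k /tmp | tail -1 | awk '{ print $3 }'`
     if [ $FREE -ge `expr 2 \* $DBSIZE` ] ; then
             echo "Enough free space in /tmp to run wbkupdb"
     else
             echo "Not enough free space in /tmp to run wbkupdb"
     fi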

If the installation of these critical IBM Tivoli Workload Scheduler components fails, refer to your site's Tivoli administrators for assistance in recovering from the error, and direct them to the backup file created by wbkupdb (as reported by the echo command).

To install the IBM Tivoli Management Framework components for IBM Tivoli Workload Scheduler:

  1. Log in as root user on a cluster node.

    In our environment, we logged in as root user on tivaix1.

  2. Enter the winstall command as shown in Example 4-63 to install Job Scheduling Services.

    Example 4-63: Install Job Scheduling Services component on cluster node tivaix1

    start example
     [root@tivaix1:/home/root] winstall -c /usr/sys/inst.images/tivoli/wkb/TWS820_2/TWS_CONN \
      -y -i TMF_JSS tivaix1
     Checking product dependencies...
      Product TMF_3.7.1 is already installed as needed.
      Dependency check completed.
     Inspecting node tivaix1...
     Installing Product: Tivoli Job Scheduling Services v1.2
     Unless you cancel, the following operations will be executed:
       For the machines in the independent class:
         hosts: tivaix1
        need to copy the CAT (generic) to:
            tivaix1:/usr/local/Tivoli/msg_cat
       For the machines in the aix4-r1 class:
         hosts: tivaix1
        need to copy the BIN (aix4-r1) to:
              tivaix1:/usr/local/Tivoli/bin/aix4-r1
        need to copy the ALIDB (aix4-r1) to:
              tivaix1:/usr/local/Tivoli/spool/tivaix2.db
     Creating product installation description object...Created.
     Executing queued operation(s)
     Distributing machine independent Message Catalogs --> tivaix1   Completed.
     Distributing architecture specific Binaries --> tivaix1   Completed.
     Distributing architecture specific Server Database --> tivaix1
      ....Product install completed successfully.  Completed.
     Registering product installation attributes...Registered.
    end example

    Note

    Both IBM Tivoli Workload Scheduler Job Scheduling Console User's Guide Feature Level 1.2, SH19-4552 (released for IBM Tivoli Workload Scheduler Version 8.1), on page 26, and IBM Tivoli Workload Scheduler Job Scheduling Console User's Guide Feature Level 1.3, SC32-1257 (released for IBM Tivoli Workload Scheduler Version 8.2), on page 45, refer to an owner argument to pass to the winstall command to install the Connector.

    We believe this is incorrect, because the index files TWS_CONN.IND for both versions of IBM Tivoli Workload Scheduler do not indicate support for this argument, and using the argument produces errors in the installation.

  3. Enter the winstall command as shown in Example 4-64 on page 327 to install the Connector Framework resource. The command requires two IBM Tivoli Workload Scheduler-specific arguments, twsdir and iname.

    These arguments create an initial Connector object. Best practice is to create initial Connector objects on a normally operating cluster. The order in which Connector objects are created does not affect functionality. It is key, however, to ensure that the resource group containing the instance of IBM Tivoli Workload Scheduler for which the initial Connector is being created is in the ONLINE state on the cluster node you are working on.

    twsdir

    Enter the TWShome directory of an active instance of IBM Tivoli Workload Scheduler. The file system of the instance must be mounted and available.

    iname

    Enter a Connector name for the instance of IBM Tivoli Workload Scheduler.

    In our environment, we use /usr/maestro for twsdir, make sure it is mounted, and use TIVAIX1_rg1 as the Connector name for iname because we want to create an initial Connector object for resource group rg1 on tivaix1, as the cluster is in normal operation and resource group rg1 in the ONLINE state on tivaix1 is the normal state.

    Example 4-64: Install Connector component for cluster node tivaix1

    start example
     [root@tivaix1:/home/root] winstall -c \
     /usr/sys/inst.images/tivoli/wkb/TWS820_2/TWS_CONN -y -i TWS_CONN \
     twsdir=/usr/maestro iname=TIVAIX1_rg1 createinst=1 tivaix1
     Checking product dependencies...
      Product TMF_JSS_1.2 is already installed as needed.
       Product TMF_3.7.1 is already installed as needed.
      Dependency check completed.
     Inspecting node tivaix1...
     Installing Product: Tivoli TWS Connector 8.2
     Unless you cancel, the following operations will be executed:
       For the machines in the independent class:
         hosts: tivaix1
       For the machines in the aix4-r1 class:
         hosts: tivaix1
        need to copy the BIN (aix4-r1) to:
              tivaix1:/usr/local/Tivoli/bin/aix4-r1
        need to copy the ALIDB (aix4-r1) to:
              tivaix1:/usr/local/Tivoli/spool/tivaix1.db
     Creating product installation description object...Created.
     Executing queued operation(s)
     Distributing architecture specific Binaries --> tivaix1  .. Completed.
     Distributing architecture specific Server Database --> tivaix1
      ....Product install completed successfully.  Completed.
     Registering product installation attributes...Registered.
    end example

  4. Verify both Framework components are installed using the wlsinst command as shown in the following example. The strings "Tivoli Job Scheduling Services v1.2" and "Tivoli TWS Connector 8.2" (highlighted in bold in Example 4-65) should display in the output of the command.

    Example 4-65: Verify installation of Framework components for IBM Tivoli Workload Scheduler

    start example
     [root@tivaix1:/home/root] wlsinst -p
     Tivoli Management Framework 4.1
     Tivoli ADE, Version 4.1  (build 09/19)
     Tivoli AEF, Version 4.1   (build 09/19)
     Tivoli Java Client Framework 4.1
     Java 1.3 for Tivoli
     Tivoli Java RDBMS Interface Module (JRIM) 4.1
     JavaHelp 1.0 for Tivoli 4.1
     Tivoli Software Installation Service Client, Version 4.1
     Tivoli Software Installation Service Depot, Version 4.1
     Tivoli Job Scheduling Services v1.2
     Tivoli TWS Connector 8.2
     Distribution Status Console, Version 4.1
    end example

  5. Verify the installation of the initial Connector instance using the wtwsconn.sh command. Pass the same Connector name used for the iname argument in the preceding step as the value to the -n flag argument. Example 4-66 shows the flag argument value TIVAIX1_rg1 (highlighted in bold).

    In our environment we passed TIVAIX1_rg1 as the value for the -n flag argument.

    The output of the command shows the directory path used as the value for the twsdir argument in the preceding step, repeated on three lines (highlighted in bold in Example 4-66).

    Example 4-66: Verify creation of initial Connector

    start example
     [root@tivaix1:/home/root] wtwsconn.sh -view -n TIVAIX1_rg1
     MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro"
     MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro"
     MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro"
    end example

  6. Repeat the operation for the remaining cluster nodes.

    In our environment, we repeated the operation for cluster node tivaix2. We used /usr/maestro2 for the twsdir argument and TIVAIX2_rg2 for the iname argument.

Create additional Connectors

The initial Connector objects created as part of the installation of IBM Tivoli Workload Scheduler Framework components only address one resource group that can run on each cluster node. Create additional Connectors to address all possible resource groups that a cluster node can take over, on all cluster nodes.

To create additional Connector objects:

  1. Log in as root user on a cluster node.

    In our environment we log in as root user on cluster node tivaix1.

  2. Use the wlookup command to identify which Connector objects already exist on the cluster node, as shown in Example 4-67.

    Example 4-67: Identify which Connector objects already exist on a cluster node

    start example
     [root@tivaix1:/home/root] wlookup -Lar MaestroEngine
     TIVAIX1_rg1
    end example

    In our environment, the only Connector object that exists is the one created by the installation of the IBM Tivoli Workload Scheduler Framework components, TIVAIX1_rg1, highlighted in bold in Example 4-67.

  3. Use the wtwsconn.sh command to create an additional Connector object, as shown in Example 4-68. The command accepts the name of the Connector object to create for the value of the -n flag argument, and the TWShome directory path of the instance of IBM Tivoli Workload Scheduler that the Connector object will correspond to, as the value for the -t flag argument.

    Example 4-68: Create additional Connector object

    start example
     [root@tivaix1:/home/root] wtwsconn.sh -create -n TIVAIX1_rg2 -t /usr/maestro2
     Scheduler engine created
     Created instance: TIVAIX1_rg2, on node: tivaix1
     MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro2
     MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro2
     MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro2
    end example

    The corresponding resource group does not have to be in the ONLINE state on the cluster node. This step only creates the object, but does not require the presence of the resource group to succeed.

    In our environment we created the Connector object TIVAIX1_rg2 to manage resource group rg2 on tivaix1 in case tivaix2 falls over to tivaix1. Resource group rg2 contains scheduling engine TWS Engine2. TWS Engine2 is installed in /usr/maestro2. So we pass /usr/maestro2 as the value to the -t flag argument.

  4. Verify the creation of the additional Connector objects using the wtwsconn.sh command as shown in Example 4-69.

    Example 4-69: Verify creation of additional Connector object

    start example
     [root@tivaix1:/home/root] wtwsconn.sh -view -n TIVAIX1_rg2
     MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro2"
     MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro2"
     MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro2"
    end example

    Pass the name of a new Connector object as the value for the -n flag argument. The output displays the TWShome directory path you use to create the Connector object if the create operation is successful.

  5. Repeat the operation for all remaining Connector objects to create on the cluster node. Only create Connector objects for possible resource groups that the cluster node can take over. Using the examples in this section for instance, we would not create any Connector objects on tivaix1 that start with "TIVAIX2". So the Connector objects TIVAIX2_rg1 and TIVAIX2_rg2 would not be created on tivaix1. They are instead created on tivaix2. In our environment, we did not have any more resource groups to address, so we did not create any more Connectors on tivaix1.

  6. Repeat the operation on all remaining cluster nodes. In our environment we created the Connector object TIVAIX2_rg1 as shown in Example 4-70.

    Example 4-70: Create additional Connectors on tivaix2

    start example
     [root@tivaix2:/home/root] wtwsconn.sh -create -n TIVAIX2_rg1 -t /usr/maestro
     Scheduler engine created
     Created instance: TIVAIX2_rg1, on node: tivaix2
     MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro
     MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro
     MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro
     [root@tivaix2:/home/root] wtwsconn.sh -view -n TIVAIX2_rg1
     MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro"
     MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro"
     MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro"
    end example

If you make a mistake creating a Connector, remove the Connector using the wtwsconn.sh command as shown in Example 4-71.

Example 4-71: Remove a Connector

start example
 [root@tivaix1:/home/root] wtwsconn.sh -remove -n TIVAIX2
 Removed 'MaestroEngine' for 'TIVAIX2' instance
 Removed 'MaestroPlan' for 'TIVAIX2' instance
 Removed 'MaestroDatabase' for 'TIVAIX2' instance
end example

In Example 4-71 on page 329, the Connector TIVAIX2 is removed. The only value a Connector accepts when it is created is the TWShome directory of the instance of IBM Tivoli Workload Scheduler it communicates with. If the location of TWShome changes, for example because IBM Tivoli Workload Scheduler is moved, update the Connector to match the new directory. One straightforward way to do this is to remove the Connector and re-create it with the new directory as the value of the -t flag argument, as shown in Example 4-72; the directory in the example is a placeholder, so substitute the new TWShome directory of your instance.

Example 4-72: Change a Connector's directory value by removing and re-creating it

start example
 [root@tivaix1:/home/root] wtwsconn.sh -remove -n TIVAIX1_rg2
 [root@tivaix1:/home/root] wtwsconn.sh -create -n TIVAIX1_rg2 -t /new/TWShome/directory
end example

Configure Framework access

After you install IBM Tivoli Management Framework (see "Implementing IBM Tivoli Workload Scheduler in an HACMP cluster" on page 184), configure Framework access for the TWSuser accounts. This gives the TWSuser accounts full access to IBM Tivoli Management Framework so that you can add Tivoli Enterprise products such as the IBM Tivoli Workload Scheduler Plus Module and additional Scheduler Connectors.

In this redbook we show how to grant access to the root Framework Administrator object. The Tivoli administrators of some sites do not allow this level of access. Consult your Tivoli administrator if this is the case, because other levels of access can be arranged.

Use the wsetadmin command to grant this level of access to your TWSuser accounts. In our environment, we ran the following command as root user to identify which Framework Administrator object to modify:

    wlookup -ar Administrator 

This command returns output similar to that shown in Example 4-73, taken from tivaix1 in our environment.

Example 4-73: Identify which Framework Administrator object to modify to grant TWSuser account root-level Framework access

start example
 [root@tivaix1:/home/root] wlookup -ar Administrator
 Root_tivaix1-region  1394109314.1.179#TMF_Administrator::Configuration_GUI#
 root@tivaix1    1394109314.1.179#TMF_Administrator::Configuration_GUI#
end example

This shows that the root account is associated with the Administrator object called root@tivaix1. We then used the following command to add the TWSuser accounts to this Administrator object:

    wsetadmin -l maestro -l maestro2 root@tivaix1 

This grants root-level Framework access to the user accounts maestro and maestro2. Use the wgetadmin command as shown in Example 4-74 to confirm that the TWSuser accounts were added to the root Framework Administrator object. On the line that starts with the string "logins:" (line 3), the presence of the TWSuser accounts maestro and maestro2 (highlighted in bold) indicates that these accounts were successfully added to the Administrator object.

Example 4-74: Confirm TWSuser accounts are added to root Framework Administrator object

start example
 [root@tivaix1:/home/root] wgetadmin root@tivaix1
 Administrator: Root_tivaix1-region
 logins: root@tivaix1, maestro, maestro2
 roles:  global  super, senior, admin, user, install_client, install_product, policy
         security_group_any_admin        user
         Root_tivaix1-region    admin,  user, rconnect
 notice groups:  TME Administration, TME Authorization, TME Diagnostics, TME Scheduler
end example

Once these are added, you can use the wtwsconn.sh command (and other IBM Tivoli Management Framework commands) to manage Connector objects from the TWSuser user account. If you are not sure which Connectors are available, use the wlookup command to identify the available Connectors, as shown in Example 4-75.

Example 4-75: Identify available Connectors to manage on cluster node

start example
 [root@tivaix1:/home/root] wlookup -Lar MaestroEngine
 TIVAIX1
end example

In Example 4-75, the Connector called "TIVAIX1" (case is significant for Connector names) is available on tivaix1.
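
To confirm that a TWSuser account really has the access it needs, run a Framework command from that account. The following is a minimal sketch using the maestro account from our environment; the same Connector names shown earlier should be listed.

     su - maestro -c ". /etc/Tivoli/setup_env.sh; wlookup -Lar MaestroEngine"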

Interconnect Framework servers

The Connectors for each resource group are configured on each cluster node. Interconnect the Framework servers to be able to manage the Connectors on each cluster node from every other cluster node. Framework interconnection is a complex subject. We will show how to interconnect the Framework servers for our environment, but you should plan your interconnection if your installation of IBM Tivoli Workload Scheduler is part of a larger Tivoli Enterprise environment. Consult your IBM service provider for assistance with planning the interconnection.

Tip

When working with Tivoli administrators, be aware that they are used to hearing "Framework resources" called "managed resources". We use the term "Framework resource" in this redbook to point out that this is a concept applied to IBM Tivoli Management Framework, and to distinguish it from HACMP resources. It is not an official term, however, so when working with staff who are not familiar with HACMP we advise using the official term of "managed resources" to avoid confusion.

To interconnect the Framework servers for IBM Tivoli Workload Scheduler for our environment, follow these steps:

  1. Before starting, make a backup of the IBM Tivoli Management Framework object database using the wbkupdb command as shown in Example 4-76. Log on to each cluster node as root user, and run a backup of the object database on each.

    Example 4-76: Back up object database of IBM Tivoli Management Framework

    start example
     [root@tivaix1:/home/root] cd /tmp
     [root@tivaix1:/tmp] wbkupdb tivaix1
     Starting the snapshot of the database files for tivaix1...
     ............................................................
     ..............................
     Backup Complete.
    end example

  2. Temporarily grant remote shell access to the root user on each cluster node. Edit or create as necessary the .rhosts file in the home directory of the root user on each cluster node. (This is a temporary measure and we will remove it after we finish the interconnection operation.)

    In our environment we created the .rhosts file with the contents as shown in Example 4-77.

    Example 4-77: Contents of .rhosts file in home directory of root user

    start example
     tivaix1 root
     tivaix2 root
    end example

  3. Temporarily grant the generic root user account (root with no hostname qualifier) a Framework login on the root Framework account. Run the wsetadmin command as shown:

     wsetadmin -l root root@tivaix1 

    If you do not know your root Framework account, consult your Tivoli administrator or IBM service provider. (This is a temporary measure and we will remove it after we finish the interconnection operation.)

    In our environment the root Framework account is root@tivaix1, so we grant the generic root user account a login on this Framework account.

    Note

    If an interconnection is made under a user other than root, the /etc/hosts.equiv file also must be configured. Refer to "Secure and Remote Connections" in Tivoli Management Framework Maintenance and Troubleshooting Guide Version 4.1, GC32-0807, for more information.

  4. Run the wlookup commands on the cluster node as shown in Example 4-78 to determine the Framework objects that exist before interconnection, so you can refer back to them later in the operation.

    Example 4-78: Sampling Framework objects that exist before interconnection on tivaix1

    start example
     [root@tivaix1:/home/root] wlookup -Lar ManagedNode
     tivaix1
     [root@tivaix1:/home/root] wlookup -Lar MaestroEngine
     TIVAIX1_rg1
     TIVAIX1_rg2
    end example

    In our environment we ran the commands on tivaix1.

  5. Run the same sequence of wlookup commands, but on the cluster node on the opposing side of the interconnection operation, as shown in Example 4-79.

    Example 4-79: Sampling Framework objects that exist before interconnection on tivaix2

    start example
     [root@tivaix2:/home/root] wlookup -Lar ManagedNode
     tivaix2
     [root@tivaix2:/home/root] wlookup -Lar MaestroEngine
     TIVAIX2_rg1
     TIVAIX2_rg2
    end example

    In our environment we ran the commands on tivaix2.

  6. Interconnect the Framework servers in a two-way interconnection using the wconnect command as shown in Example 4-80 on page 334.

    Example 4-80: Interconnect the Framework servers on tivaix1 and tivaix2

    start example
     [root@tivaix1:/home/root] wconnect -c none -l root -m Two-way -r none tivaix2
     Enter Password for user root on host tivaix2:
    end example

    Refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806, for a complete description of how to use wconnect.

    Note

    While writing this redbook, we observed that the wconnect command behaves inconsistently when used in trusted host mode, especially with object databases that have been restored frequently. Therefore, we enabled trusted host access through .rhosts only as a precaution, and forced wconnect to require a password; used that way, it does not exhibit the same inconsistency.

    In our environment we configured an interconnection against tivaix2, using the root account of tivaix2 to perform the operation through the remote shell service, as shown in Example 4-80.

    Because we do not use interregion encryption (set during Framework installation in the wserver command arguments), we pass none to the -c flag. Because we do not use encryption in tivaix2's Tivoli region, we pass none to the -r flag.

    We log into tivaix2 and use the odadmin command to determine the encryption used in tivaix2's Tivoli region, as shown in Example 4-81. The line that starts with "Inter-dispatcher encryption level" displays the encryption setting of the Tivoli region, which is none in the example.

    Example 4-81: Determine the encryption used in the Tivoli region of tivaix2

    start example
     [root@tivaix2:/home/root] odadmin
     Tivoli Management Framework (tmpbuild) #1 Wed Oct 15 16:45:40 CDT 2003
     (c) Copyright IBM Corp. 1990, 2003. All Rights Reserved.
     Region = 1221183877
     Dispatcher = 1
     Interpreter type = aix4-r1
     Database directory = /usr/local/Tivoli/spool/tivaix2.db
     Install directory = /usr/local/Tivoli/bin
     Inter-dispatcher encryption level = none
     Kerberos in use = FALSE
     Remote client login allowed = version_2
     Install library path = /usr/local/Tivoli/lib/aix4-r1:/usr/lib:/usr/local/Tivoli/install_dir/iblib/aix4-r1:/usr/lib:/usr/local/Tivoli/lib/aix4-r1:/usr/lib
     Force socket bind to a single address = FALSE
     Perform local hostname lookup for IOM connections = FALSE
     Use Single Port BDT = FALSE
     Port range = (not restricted)
     Single Port BDT service port number = default (9401)
     Network Security = none
     SSL Ciphers = default
     ALLOW_NAT = FALSE
     State flags in use = TRUE
     State checking in use = TRUE
     State checking every 180 seconds
     Dynamic IP addressing allowed = FALSE
     Transaction manager will retry messages 4 times.
    end example

    Important:

    Two-way interconnection operations only need to be performed on one side of the connection. If you have two cluster nodes, you only need to run the wconnect command on one of them.

  7. Use the wlsconn and odadmin commands to verify the interconnection as shown in Example 4-82.

    Example 4-82: Verify Framework interconnection

    start example
     [root@tivaix1:/home/root] wlsconn
       MODE NAME             SERVER         REGION
     <----> tivaix2-region   tivaix2    1221183877
     [root@tivaix1:/home/root] odadmin odlist
     Region           Disp  Flags  Port            IPaddr   Hostname(s)
     1369588498          1    ct-    94         9.3.4.194   tivaix1,tivaix1.itsc.austin.ibm.com
                                                  9.3.4.3   tivaix1_svc
     1112315744          1    ct-    94         9.3.4.195   tivaix2,tivaix2.itsc.austin.ibm.com
    end example

    The output displays the primary IP hostname of the cluster node that was interconnected in the preceding step. In our environment, the primary IP hostname of cluster node tivaix2 (the value tivaix2) is found under the SERVER column of the output of the wlsconn command in Example 4-82. The same value is found under the Hostname(s) column in the output of the odadmin command, on the row that shows the Tivoli region ID of the cluster node.

    The Tivoli region ID is found by entering the odadmin command as shown in Example 4-83. It is on the line that starts with "Region =".

    Example 4-83: Determine Tivoli region ID of cluster node

    start example
     [root@tivaix2:/home/root] odadmin
     Tivoli Management Framework (tmpbuild) #1 Wed Oct 15 16:45:40 CDT 2003
     (c) Copyright IBM Corp. 1990, 2003. All Rights Reserved.
     Region = 1221183877
     Dispatcher = 1
     Interpreter type = aix4-r1
     Database directory = /usr/local/Tivoli/spool/tivaix2.db
     Install directory = /usr/local/Tivoli/bin
     Inter-dispatcher encryption level = none
     Kerberos in use = FALSE
     Remote client login allowed = version_2
     Install library path = /usr/local/Tivoli/lib/aix4-r1:/usr/lib:/usr/local/Tivoli/install_dir/iblib/aix4-r1:/usr/lib:/usr/local/Tivoli/lib/aix4-r1:/usr/lib
     Force socket bind to a single address = FALSE
     Perform local hostname lookup for IOM connections = FALSE
     Use Single Port BDT = FALSE
     Port range = (not restricted)
     Single Port BDT service port number = default (9401)
     Network Security = none
     SSL Ciphers = default
     ALLOW_NAT = FALSE
     State flags in use = TRUE
     State checking in use = TRUE
     State checking every 180 seconds
     Dynamic IP addressing allowed = FALSE
     Transaction manager will retry messages 4 times.
    end example

    In this example, the region ID is shown as 1221183877.

  8. Interconnecting Framework servers only establishes a communication path. The Framework resources that need to be shared between Framework servers have to be pulled across the servers using an explicit updating command.

    Sharing a Framework resource shares all the objects that the resource defines. This enables Tivoli administrators to securely control which Framework objects are shared between Framework servers, and control the performance of the Tivoli Enterprise environment by leaving out unnecessary resources from the exchange of resources between Framework servers. Exchange all relevant Framework resources among cluster nodes by using the wupdate command.

    In our environment we exchanged the following Framework resources:

    • ManagedNode

    • MaestroEngine

    • MaestroDatabase

    • MaestroPlan

    • SchedulerEngine

    • SchedulerDatabase

    • SchedulerPlan

    Use the script shown in Example 4-84 on page 337 to exchange resources on all cluster nodes.

    Example 4-84: Exchange useful and required resources for IBM Tivoli Workload Scheduler

    start example
     for resource in ManagedNode \
                     MaestroEngine MaestroDatabase MaestroPlan \
                     SchedulerEngine SchedulerDatabase SchedulerPlan
     do
         wupdate -r ${resource} All
     done
    end example

    Important:

    Unlike the wconnect command, the wupdate command must be run on all cluster nodes, even on two-way interconnected Framework servers.

    The SchedulerEngine Framework resource enables the interconnected scheduling engines to present themselves in the Job Scheduling Console. The MaestroEngine Framework resource enables the wmaeutil command to manage running instances of Connectors.
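
    As an illustration of the latter, a command of the following form can be used to stop the Connector services of one instance after its resources have been exchanged. The instance name is taken from our environment, and the argument form (-stop with "*" to select all services) is our assumption, so check the wmaeutil usage in your IBM Tivoli Workload Scheduler reference documentation before using it.

     # Stop all Connector services (engine, database and plan) for the
     # TIVAIX1_rg1 instance; "*" is assumed to select every service.
     wmaeutil TIVAIX1_rg1 -stop "*"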

    Tip

    Best practice is to update the entire Scheduler series (SchedulerDatabase, SchedulerEngine, and SchedulerPlan) and Maestro series (MaestroDatabase, MaestroEngine, and MaestroPlan) of Framework resources, if for no other reason than to deliver administrative transparency so that all IBM Tivoli Workload Scheduler-related Framework objects can be managed from any cluster node running IBM Tivoli Management Framework.

    It is much easier to remember that any IBM Tivoli Workload Scheduler-related Framework resource can be seen and managed from any cluster node running a two-way interconnected IBM Tivoli Management Framework server, than to remember a list of which resources must be managed locally on each individual cluster node, and which can be managed from anywhere in the cluster.

    In our environment, we ran the script in Example 4-84 on tivaix1 and tivaix2.

  9. Verify the exchange of Framework resources. Run the wlookup command as shown in Example 4-85 on the cluster node.

    Example 4-85: Verify on tivaix1 the exchange of Framework resources

    start example
     [root@tivaix1:/home/root] wlookup -Lar ManagedNode
     tivaix1
     tivaix2
     [root@tivaix1:/home/root] wlookup -Lar MaestroEngine
     TIVAIX1_rg1
     TIVAIX1_rg2
     TIVAIX2_rg1
     TIVAIX2_rg2
    end example

    Note the addition of Framework objects that used to only exist on the cluster node on the opposite side of the interconnection.

    In our environment, we ran the commands on tivaix1.

  10. Run the same sequence of wlookup commands, but on the cluster node on the opposite side of the interconnection, as shown in Example 4-86. The output from the commands should be identical to the same commands run on the cluster node in the preceding step.

    Example 4-86: Verify on tivaix2 the exchange of Framework resources

    start example
     [root@tivaix2:/home/root] wlookup -Lar ManagedNode
     tivaix1
     tivaix2
     [root@tivaix2:/home/root] wlookup -Lar MaestroEngine
     TIVAIX1_rg1
     TIVAIX1_rg2
     TIVAIX2_rg1
     TIVAIX2_rg2
    end example

    In our environment, we ran the commands on tivaix2.

  11. Log into both cluster nodes through the Job Scheduling Console, using the service IP labels of the cluster nodes and the root user account. All scheduling engines (corresponding to the configured Connectors) on all cluster nodes appear. The scheduling engines marked inactive are Connectors for resource groups that are not currently running on that cluster node.

    In our environment, the list of available scheduling engines for a cluster in normal operation was as shown in Figure 4-68.


    Figure 4-68: Available scheduling engines after interconnection of Framework servers

  12. Remove the .rhosts entries or delete the entire file if the two entries in this operation were the only ones added.

  13. Remove the configuration that allows any root user to access Framework. Enter the wsetadmin command as shown.

     wsetadmin -L root root@tivaix1 

  14. Set up a periodic job to exchange Framework resources using the wupdate command shown in the script in the preceding example. How frequently the job should run depends upon how often changes are made to the Connector objects. For most sites, best practice is a daily update about an hour before Jnextday. Timing it before Jnextday keeps the Framework resource update consistent with any changes to the installation location of IBM Tivoli Workload Scheduler, because such changes are often timed to occur right before Jnextday is run. A sample crontab entry is sketched below.
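
    For example, if Jnextday runs at 05:59, a root crontab entry along the following lines refreshes the Framework resources an hour earlier. The script path /usr/local/scripts/wupdate_all.sh is a hypothetical name for a script containing the loop from Example 4-84, and we assume the Tivoli environment is sourced from /etc/Tivoli/setup_env.sh.

     # Hypothetical root crontab entry: exchange Framework resources daily
     # at 04:59, one hour before a 05:59 Jnextday run.
     59 4 * * * . /etc/Tivoli/setup_env.sh && /usr/local/scripts/wupdate_all.sh >/dev/null 2>&1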

How to log in using the Job Scheduling Console

Job Scheduling Console users should log in using the service IP label of the scheduling engine they work with the most. Figure 4-69 shows how to log into TWS Engine1, no matter where it actually resides on the cluster, by using tivaix1_svc as the service label.


Figure 4-69: Log into TWS Engine1

Figure 4-70 on page 340 shows how to log into TWS Engine2.


Figure 4-70: Log into TWS Engine2

While using the IP hostnames also works during normal operation of the cluster, the IP hostnames are not transferred during an HACMP fallover. Therefore, Job Scheduling Console users must use the service IP label to reach an instance of IBM Tivoli Workload Scheduler that has fallen over to a foreign cluster node.

4.1.12 Production considerations

In this redbook, we present a very straightforward implementation of a highly available configuration of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework. An actual production deployment is considerably more complex. In this section, we identify some of the considerations that have to be managed in an actual deployment.

Naming conventions

In this redbook we used names selected to convey their product function as much as possible. However, this may lead to names that are inconvenient for users in a production environment.

The IP service labels in our environment, tivaix1_svc and tivaix2_svc, are the primary means by which Job Scheduling Console users specify which engine to log in to. For these users, the "_svc" string typically holds no significance. For two cluster nodes that implement Master Domain Manager servers, for example, we recommend using more meaningful names such as master1 and master2.

Connector names in this redbook emphasized the cluster node first. In an actual production environment, we recommend emphasizing the resource group first in the name. Furthermore, the name of the resource group would be more meaningful if it referred to its primary business function. For example, TIVAIX1_rg1 in the environment we used for this redbook would be changed to mdm1_tivaix1 for Master Domain Manager server 1. Job Scheduling Console users would then see in their GUI a list of resource groups in alphabetical order, in terms they already work with.

Dynamically creating and deleting Connectors

The inactive Connector objects do not have to remain in their static configurations. They only have to be created if a resource group falls over to a cluster node. For example, during normal operation of our environment, we do not use Connectors TIVAIX1_rg2 and TIVAIX2_rg1. If the Connectors can be dynamically created and deleted as necessary, then Job Scheduling Console users will only ever see active resource groups.

After a resource group is brought up on a cluster node, the rg_move_complete event is posted. A custom post-event script for this event can identify which resource group has moved, which cluster node it moved to, and which Connectors are extraneous as a result of the move. With that information, the script can create the appropriate new Connector and delete the old one. The result, for Job Scheduling Console users, is a GUI that presents only the scheduling engines that are active in the cluster at the moment they log into the scheduling network. A sketch of such a post-event script follows.
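
The sketch below shows one way such a post-event script could be structured. It is not a method documented in this redbook; the argument position of the resource group name, the wtwsconn.sh options, the resource-group-to-TWShome mapping, and the Connector naming convention are all assumptions for our environment, and must be verified against your HACMP and IBM Tivoli Workload Scheduler documentation before use.

  #!/usr/bin/ksh
  # Hypothetical HACMP post-event script for rg_move_complete.
  # Assumptions: $2 carries the resource group name, wtwsconn.sh creates
  # and removes Connectors, and the Tivoli environment comes from
  # /etc/Tivoli/setup_env.sh.
  . /etc/Tivoli/setup_env.sh

  RG=$2                            # resource group that finished moving
  NODE=$(/usr/es/sbin/cluster/utilities/get_local_nodename)

  case ${RG} in
    rg1) TWSHOME=/usr/maestro  ;;  # hypothetical install paths
    rg2) TWSHOME=/usr/maestro2 ;;
    *)   exit 0 ;;                 # not a TWS resource group
  esac

  NEW_CONN="$(echo ${NODE} | tr '[:lower:]' '[:upper:]')_${RG}"

  # Delete any stale Connector left over for this resource group, then
  # create the Connector that matches its new location.
  for conn in $(wlookup -Lar MaestroEngine | grep -i "_${RG}$"); do
      [ "${conn}" != "${NEW_CONN}" ] && wtwsconn.sh -remove -n ${conn}
  done
  wtwsconn.sh -create -n ${NEW_CONN} -t ${TWSHOME}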

Time synchronization

Best practice is to use a time synchronization tool to keep the clocks on all cluster nodes synchronized to a known time standard. One such tool we recommend is ntp, an Open Source implementation of the Network Time Protocol. For more information on downloading and implementing ntp for time synchronization, refer to:

  • http://www.ntp.org/

Network Time Protocol typically works by pulling time signals from the Internet or through a clock tuned to a specific radio frequency (which is sometimes not available in certain parts of the world). This suffices for the majority of commercial applications, even though using the Internet for time signals represents a single point of failure. Sites with extremely high availability requirements for applications that require very precise time keeping can use their own onsite reference clocks to eliminate using the Internet or a clock dependent upon a radio frequency as the single point of failure.
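
On AIX, a minimal setup might look like the following sketch. The server address is a placeholder for your site's NTP source, and we assume the xntpd subsystem shipped with AIX is used.

  # Add a site NTP server (placeholder address) to /etc/ntp.conf, then
  # start the xntpd daemon under the System Resource Controller.
  echo "server 192.0.2.10 prefer" >> /etc/ntp.conf
  startsrc -s xntpd
  # To start xntpd automatically at boot, uncomment its entry in /etc/rc.tcpip.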

Security

In this redbook we present a very simplified implementation, with only as many security details as necessary, so that the HACMP aspects are not obscured. In an actual production deployment, however, security is usually a large part of any planning and implementation. Be aware that some sites may not grant access to the Framework at the level that we show.

Some sites may also enforce a Framework encryption level across the Managed Nodes. This affects the interconnection of servers. Consult your IBM service provider for information about your site's encryption configuration and about how to interconnect in an encrypted Framework environment.

Other security considerations like firewalls between cluster nodes, firewalls between cluster nodes and client systems like Job Scheduling Console sessions, and so forth require careful consideration and planning. Consult your IBM service provider for assistance on these additional scenarios.

Monitoring

By design, failures of components in the cluster are handled automatically—but you need to be aware of all such events. Chapter 8, "Monitoring an HACMP Cluster", in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, describes various tools you can use to check the status of an HACMP Cluster, the nodes, networks, and resource groups within that cluster, and the daemons that run on the nodes.

HACMP software includes the Cluster Information Program (Clinfo), an SNMP-based monitor. HACMP for AIX software provides the HACMP for AIX MIB, which is associated with and maintained by the HACMP for AIX management agent, the Cluster SMUX peer daemon (clsmuxpd). Clinfo retrieves this information from the HACMP for AIX MIB through clsmuxpd.

Clinfo can run on cluster nodes and on HACMP for AIX client machines. It makes information about the state of an HACMP Cluster and its components available to clients and applications via an application programming interface (API). Clinfo and its associated APIs enable developers to write applications that recognize and respond to changes within a cluster.

The Clinfo program, the HACMP MIB, and the API are documented in High Availability Cluster Multi-Processing for AIX Programming Client Applications Version 5.1, SC23-4865.

Although the combination of HACMP and the inherent high availability features built into the AIX system keeps single points of failure to a minimum, there are still failures that, although detected, can cause other problems. See the chapter on events in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, for suggestions about customizing error notification for various problems not handled by the HACMP events.

Geographic high availability

An extension of cluster-based high availability is geographic high availability. As the name implies, these configurations increase the availability of an application even more when combined with a highly available cluster. They accomplish this by treating the cluster's entire site as a single point of failure, and introducing additional nodes in a geographically separate location. These geographically separate nodes can be clusters in themselves.

Consult your IBM service provider for assistance in planning and implementing a geographic high availability configuration.

Enterprise management

Delivering production-quality clusters often involves implementing enterprise systems management tools and processes to ensure the reliability, availability and serviceability of the applications that depend upon the cluster. This section covers some of the considerations we believe should be given extra attention when implementing a highly available cluster for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework.

Many IBM Tivoli products reduce the time needed to deliver the additional services that enable you to offer service level guarantees to the users of the cluster. For more information about these products, refer to:

  • http://www.ibm.com/software/tivoli/

We recommend that you consult your IBM Tivoli service provider for advice on other enterprise systems management issues that should be considered. The issues covered in this section represent only a few of the benefits available for delivery to users of the cluster.

Measuring availability

Availability analysis is a major maintenance tool for clusters. You can use the Application Availability Analysis tool to measure the amount of time that any of your applications is available. The HACMP software collects, time stamps, and logs the following information:

  • An application starts, stops, or fails

  • A node fails or is shut down, or comes online

  • A resource group is taken offline or moved

  • Application monitoring is suspended or resumed

Using SMIT, you can select a time period and the tool will display uptime and downtime statistics for a given application during that period.

The tool displays:

  • Percentage of uptime

  • Amount of uptime

  • Longest period of uptime

  • Percentage of downtime

  • Amount of downtime

  • Longest period of downtime

  • Percentage of time application monitoring was suspended

The Application Availability Analysis tool reports application availability from the HACMP Cluster infrastructure's point of view. It can analyze only those applications that have been properly configured so that they will be managed by the HACMP software.

When using the Application Availability Analysis tool, keep in mind that the statistics shown in the report reflect the availability of the HACMP application server, resource group, and (if configured) the application monitor that represent your application to HACMP.

The Application Availability Analysis tool cannot detect availability from an end user's point of view. For example, assume that you have configured a client-server application so that the server is managed by HACMP, and that after the server was brought online, a network outage severed the connection between the end-user clients and the server.

End users would view this as an application outage because their client software could not connect to the server, but HACMP would not detect it, because the server it was managing did not go offline. As a result, the Application Availability Analysis tool would not report a period of downtime in this scenario.

For this reason, best practice is to monitor everything that affects the entire user experience. We recommend using tools like IBM Tivoli Monitoring, IBM Tivoli Service Level Advisor, and IBM Tivoli NetView to perform basic monitoring and reporting of the end-user service experience.

Configuration management

When there are many nodes in a cluster, configuration management tools can shorten by hours or even days the time between when users request a new cluster node and when it is available with a fully configured set of highly available applications.

Configuration management tools also enable administrators to enforce the maintenance levels, patches, fix packs and service packs of the operating system and applications on the cluster nodes. They accomplish this by gathering inventory information and comparing against baselines established by the administrators. This eliminates the errors that are caused in a cluster by mismatched versions of operating systems and applications.

We recommend using IBM Tivoli Configuration Manager to implement services that automatically create a new cluster node from scratch, and enforce the software levels loaded on all nodes in the cluster.

Notification

Large, highly available installations are very complex systems, often involving multiple teams of administrators overseeing different subsystems. Proper notification is key to the timely and accurate response to problems identified by a monitoring system. We recommend using IBM Tivoli Enterprise Console and a notification server to implement robust, flexible and scalable notification services.

Provisioning

For large installations of clusters, serving many highly available applications, with many on demand cluster requirements and change requests each week, provisioning software is recommended as a best practice. In these environments, a commercial-grade provisioning system substantially lowers the administrative overhead involved in responding to customer change requests. We recommend using IBM Tivoli ThinkDynamic Orchestrator to implement provisioning for very complex and constantly changing clusters.

Practical lessons learned about high availability

While writing this redbook, a serial disk in the SSA disk tray we use in our environment failed. Our configuration does not use this disk for any of our volume groups, so we continued to use the SSA disk tray. However, the failed drive eventually impacted the performance of the SSA loop to the point that HACMP functionality was adversely affected.

The lesson we learned from this experience was that optimal HACMP performance depends upon a properly maintained system. In other words, using HACMP does not justify delaying normal preventative and corrective system maintenance.

Forced HACMP stops

We observed that forcing HACMP cluster services to stop may leave them in an inconsistent state. If there are problems starting the services again, we found that stopping them gracefully before attempting another start clears up the problem, as in the sequence sketched below.
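
For example, assuming the standard HACMP SMIT fastpaths, the recovery sequence is along these lines:

  # Stop cluster services gracefully on the affected node, confirm that
  # the cluster subsystems have settled, then start cluster services again.
  smitty clstop          # select a graceful shutdown
  lssrc -g cluster       # check the state of the cluster subsystems
  smitty clstart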

4.1.13 Just one IBM Tivoli Workload Scheduler instance

The preceding sections show you how to design, plan and implement a two-node HACMP Cluster for an IBM Tivoli Workload Scheduler Master Domain Manager in a mutual takeover configuration. This requires you to design your overall enterprise workload into two independent, or at most loosely coupled, sets of job streams. You can, however, opt to implement only a single instance of IBM Tivoli Workload Scheduler in a two-node cluster in a hot standby configuration.

Best practice is to use a mutual takeover configuration for Master Domain Managers. In this section, we discuss how to implement a single instance of IBM Tivoli Workload Scheduler in a hot standby configuration, which is appropriate for creating highly available Fault Tolerant Agents, for example.

Important:

Going from a mutual takeover, dual Master Domain Manager configuration to only one instance of IBM Tivoli Workload Scheduler doubles the risk exposure of the scheduling environment.

You can create a cluster with just one instance of IBM Tivoli Workload Scheduler by essentially using the same instructions, but eliminating one of the resource groups. You can still use local instances of IBM Tivoli Management Framework. With only one resource group, however, there are some other, minor considerations to address in the resulting HACMP configuration.

Create only one IBM Tivoli Workload Scheduler Connector on each cluster node. For example, if the single instance of IBM Tivoli Workload Scheduler is installed in /usr/maestro, normally runs on cluster node tivaix1, and has an IBM Tivoli Workload Scheduler Connector named PROD (for "production"), then all instances of IBM Tivoli Management Framework on the other cluster nodes also use an IBM Tivoli Workload Scheduler Connector with the same name ("PROD"), configured the same way. When the resource group containing the instance of IBM Tivoli Workload Scheduler falls over to another cluster node, the IP service label associated with the instance falls over with the resource group.

Configure the instances of IBM Tivoli Management Framework on the cluster nodes to support this IP service label as an IP alias for the Managed Node on each cluster node. Job Scheduling Console sessions can then connect to the corresponding service IP address even after a fallover event. A sketch of the commands involved follows.
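
The commands involved might look like the following sketch. The Connector name, installation path, dispatcher number, and service IP label are placeholders for our environment, and the wtwsconn.sh and odadmin odlist options should be verified against the scheduler and Framework reference manuals for your versions.

  # On each cluster node, create the identically named Connector for the
  # single IBM Tivoli Workload Scheduler instance (hypothetical name PROD,
  # installed in /usr/maestro).
  wtwsconn.sh -create -n PROD -t /usr/maestro

  # Register the service IP label as an alias of the local Managed Node, so
  # Job Scheduling Console logins to tivaix1_svc keep working after a
  # fallover (1 is assumed to be the local dispatcher number; check with
  # odadmin odlist).
  odadmin odlist add_ip_alias 1 tivaix1_svc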

Consult your IBM service provider if you need assistance with configuring a hot standby, single instance IBM Tivoli Workload Scheduler installation.

Complex configurations

In this redbook we show how to configure IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework on a cluster with two cluster nodes. More complex configurations include:

  • One instance of IBM Tivoli Workload Scheduler across more than two cluster nodes.

  • More than two instances of IBM Tivoli Workload Scheduler across more than two cluster nodes.

  • Multiple instances of IBM Tivoli Workload Scheduler on a single cluster node, in a cluster with multiple nodes.

The number of permutations of fallover scenarios increases with each additional cluster node beyond the two-node environment we show in this redbook. Best practice is to test each permutation.

Consult your IBM service provider if you want assistance with configuring a more complex configuration.


