5.3 CSM administration


In 5.1, "CSM concepts and architecture" on page 212, we touch on CSM management and administration as a basic introduction to the main features of CSM and how they function.

In this section, we examine these administration topics in detail by using examples and sample scenarios, and discuss the following areas:

  • Log file management

  • Node groups

  • Hardware control

  • Configuration File Manager (CFM)

  • Software maintenance

  • CSM monitoring

  • Diagnostic probes

  • Backing up CSM

  • Querying the CSM database

  • CSM problem determination and diagnostics

  • CSM hostname changes

5.3.1 Log file management

CSM writes to several different log files during installation and cluster management. These log files are available on the management server and the managed nodes, and they help you determine the status of a command or troubleshoot a CSM issue.

Most of the CSM log files on the management server are located in the /var/log/csm directory. Table 5-1 lists the log files on the management server and their purpose.

Table 5-1. Log files on the management server

  Log file                                          Purpose
  /var/log/csm/install.log                          csm.core install log
  /var/log/csm/installms.log                        Output of the installms command
  /var/log/csm/installnode.log                      Verbose output of the installnode command
  /var/log/csm/installnode.node.log.*               hmc_nodecond output for each node's installation
  /var/log/csm/csmsetupyast.log                     Output of the csmsetupyast command
  /var/log/csm/updatenode.log                       Output of the updatenode command
  /var/log/csm/smsupdatenode.log                    Output of the smsupdatenode command
  /var/log/csm/cfmerror.log                         CFM error log
  /var/log/csm/cfmchange.log                        Output of CFM file updates
  /var/log/csm/hw_logfile                           Hardware control daemon status log
  /var/log/csm/hmc[IP_address].log.*                HMC communication error messages
  /var/log/csm/hmc[IP_address].java_trace           Tracing for openCIMOM calls to the HMC
  /var/log/csm/hmc_logfile.314                      Tracing for libhmc_power.so
  /var/log/csm/getadapters/getadapters.node.log.*   Output of the getadapters command for each node
  /var/ct/RMstart.log                               Resource manager status log
  /var/ct/*.stderr                                  Standard errors of the RSCT daemons
  Linux log files in /var/log                       Refer to problem determination section 3
  Other Linux log files                             Refer to problem determination section 3

Table 5-2 on page 253 lists log files on managed nodes and their purpose.

Table 5-2. Log files on managed nodes

  Log file                          Purpose
  /var/log/csm/install.log          csm.core install log
  /var/log/csm/updatekernel.log     Kernel update log written when smsupdatenode runs

5.3.2 Node groups

Managed nodes can be grouped together by using the nodegrp command. Distributed commands can be issued against groups for common tasks, instead of running them on each node individually. The default node groups created at install time are shown in Example 5-27.

Example 5-27. nodegrp command
 # nodegrp
 ManagedNodes
 AutoyastNodes
 ppcSLES81Nodes
 AllNodes
 SuSE82Nodes
 SLES72Nodes
 pSeriesNodes
 SLES81Nodes
 LinuxNodes
 PreManagedNodes
 xSeriesNodes
 EmptyGroup
 APCNodes
 RedHat9Nodes
 MinManagedNodes

Node groups are created with the nodegrp command:

 # nodegrp -a lpar1,lpar2 testgroup

This creates a group called testgroup that includes the nodes lpar1 and lpar2. For more information, refer to the nodegrp man page.
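To verify the result, you can list the members of the new group. This assumes that running nodegrp with only a group name lists its members; check the nodegrp man page for the exact behavior:

 # nodegrp testgroup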

Distributed commands such as dsh can be run against node groups:

 # dsh -N testgroup date

5.3.3 Hardware control

The CSM hardware control feature is used to remotely control HMC-attached pSeries servers. Remote nodes can be powered on and off, their power status can be queried, and a remote console can be opened from the management server.

For the hardware control function, all pSeries servers must be connected to an HMC, and the HMC must be able to communicate with the management server. Figure 5-7 shows the hardware control design for a simple CSM cluster.

Figure 5-7. pSeries CSM cluster with hardware control using HMC


Hardware control uses openCIMOM (public software) and conserver software to communicate with the HMC and issue remote commands. The IBM.HWCTRLRM daemon subscribes to HMC openCIMOM events at startup and maintains their state. Conserver is started at boot time on the management server and reads its configuration from /etc/opt/conserver/conserver.cf.

The following hardware control commands are available on the management server:

rpower - Powers nodes on and off, and queries power status

rconsole - Opens a remote serial console for nodes

chrconsolecfg - Removes, adds, and rewrites conserver configuration file entries

rconsolerefresh - Refreshes conserver on the management server

getadapters - Obtains MAC addresses of remote nodes

lshwinfo - Collects node information from hardware control points

systemid - Stores the user ID and encrypted password required to access remote hardware

The rpower and rconsole commands are frequently used hardware control commands, so we discuss them in more detail here.

Remote power

Remote power commands access the CSM database for node attribute information.

The PowerMethod node attribute must be set to hmc for pSeries nodes.

HardwareControlPoint is the hostname or IP address of the Hardware Management Console (HMC).

HardwareControlNodeId is the hostname or IP address of the managed node that is attached to the HMC over a serial link.

Other node attributes, such as HWModel, HWSerialNum, and HWType, are obtained automatically using lshwinfo.

Remote power configuration is outlined in 5.2.5, "Installing CSM on the management server" on page 228.
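For example, to query the power status of all defined nodes and then power on a single node (rpower -a query appears later in Table 5-8; the single-node form shown here is an assumption based on the usual CSM node-list flags, so check the rpower man page):

 # rpower -a query
 # rpower -n lpar1 on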

Remote console

The remote console command communicates with the console server to open a remote console to a node over the management VLAN and serial connections. The HMC acts as the remote console server, listening for requests from the management server.

Using the rconsole command, only one read-write console, but multiple read-only consoles, can be opened to each node.
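For example, to open a console to node lpar1 in the current terminal window (the -t option is the one referred to in Table 5-8; the -n node flag is assumed to follow the same pattern as the other CSM commands):

 # rconsole -t -n lpar1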

5.3.4 Configuration File Manager (CFM)

Configuration File Manager (CFM) is the CSM component used to centralize and distribute files across the managed nodes in a management cluster. It is similar to file collections in IBM PSSP. Common files, such as /etc/hosts, are distributed across the cluster from the management server using a push mechanism driven by root's crontab and/or event monitoring. CFM uses rdist to distribute the files. Refer to 5.1.7, "CSM diagnostic probes" on page 220 for more information on hostname changes.

CFM uses /cfmroot as its main root directory, and copies all files to /etc/opt/csm/cfmroot through a symlink on the management server. File permissions are preserved during copying. Make sure that you have enough space in your root file system, or create /cfmroot on a separate partition and symlink it from /etc/opt/csm/cfmroot.

Example 5-28 shows cfmupdatenode usage.

Example 5-28. cfmupdatenode usage
 Usage: cfmupdatenode [-h] [-v | -V]
                [-a | -N node_group[,node_group] | --file file] [-b]
                [[-y] | [-c]] [-q [-s]] [-r remote_shell_path]
                [-t timeout] [-M number_of_max_children]
                [-d location_for_distfile] [-f filename] [[-n] node_list]

     -a         Files are distributed to all nodes. This option cannot be
                used with the -N or host positional arguments.
     -b         Backup. Preserve existing configuration file (on nodes) as
                "filename".OLD
     -c         Perform binary comparison on files and transfer them
                if they differ.
     -d distfile location
                cfmupdatenode will generate a distfile in the given (absolute)
                path and exit (without transferring files). This way the user
                can execute rdist with the given distfile and any options
                desired.
     -f filename
                Only update the given filename. The filename must be the
                absolute path name of the file and the file must reside in
                the cfmroot directory.
     --file filename
                Specifies a file that contains a list of node names. If the
                file name is "-", then the list is read from stdin. The file
                can contain multiple lines and each line can have one or more
                node names, separated by spaces.
     -h         Writes the usage statement to standard out.
     [-n] node_list
                Specifies a list of node hostnames, IP addresses, or node
                ranges on which to run the command. (See the noderange man
                page for information on node ranges.)
     -M number of maximum children
                Set the number of nodes to update concurrently.
                (The default is 32.)
     -N Node_group[,Node_group...]
                Specifies one or more node groups on which to run the command.
     -q         Queries for out-of-date CFM files across the cluster.
     -s         Reports which nodes are up to date by comparing last CFM
                update times. Must be called with the -q option.
     -r remote shell path
                Path to remote shell. (The default is the DSH_REMOTE_CMD
                environment variable, or /usr/bin/rsh.)
     -t timeout
                Set the timeout period (in seconds) for waiting for response
                from a remote process. (The default is 900.)
     -v | -V    Verbose mode.
     -y         Younger mode. Does not update files younger than master copy.

Note

CFM can be set up before running the installnode command, so that common files are distributed to the nodes at install time.


At CSM install time, root's crontab is updated with an entry to run cfmupdatenode every day at midnight. This can be changed to suit your requirements; an example follows the listing.

 # crontab -l | grep cfmupdate
 0 0 * * * /opt/csm/bin/cfmupdatenode -a 1>>/var/log/csm/cfmerror.log 2>>/var/log/csm/cfmerror.log
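For example, to run CFM every six hours instead of once a day, edit root's crontab (crontab -e) so that the entry reads:

 0 */6 * * * /opt/csm/bin/cfmupdatenode -a 1>>/var/log/csm/cfmerror.log 2>>/var/log/csm/cfmerror.log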

Some common features of CFM, along with usage examples, are described here.

  • In general, it is important to have a single /etc/hosts file across the management cluster. The CSM database and other commands resolve hostnames using either /etc/hosts or DNS. To keep a single copy of /etc/hosts, symlink /etc/hosts to /cfmroot/etc/hosts:

     # ln -s /etc/hosts /cfmroot/etc/hosts
  • Run the cfmupdatenode command to copy the hosts file to all managed nodes defined in the CSM database:

     # cfmupdatenode -a 
  • If you want a file to be different on the management server than on the managed nodes, copy or create the file in /cfmroot instead of symlinking it, and then distribute it to the nodes. Files in /cfmroot are not distributed to the management server itself.

      # cp /etc/file.1 /cfmroot/etc/file.1
      # touch /cfmroot/etc/file.1
      # cfmupdatenode -a
  • Files can be distributed to selected node groups only, instead of to all managed nodes, by creating the files with a ._groupname extension. For example, to distribute the ntp.conf file to node group group1, create the file with a ._group1 extension; the next time cfmupdatenode runs, it copies the ntp.conf file only to the group1 nodes.

      # cp /etc/ntp.conf /cfmroot/etc/ntp.conf._group1
      # cfmupdatenode -a
  • CFM customizations can be performed around file distribution by using pre-install and post-install scripts. Create script files with .pre and .post extensions in /cfmroot; when cfmupdatenode runs, the pre- and post-install scripts are executed accordingly. Example 5-29 shows a pre-install script that saves the existing file before the new copy is distributed, and a post-install script that restarts the service afterwards.

    Example 5-29. Example of cfmupdatenode
     Create the pre- and post-install scripts:
     root@ms# cat >/cfmroot/etc/ntp.conf.pre
     #!/bin/sh
     cp /etc/ntp.conf /etc/ntp.conf.`date`
     ^D
     root@ms# cat >/cfmroot/etc/ntp.conf.post
     #!/bin/sh
     /sbin/service ntp restart
     ^D
     root@ms# chmod 755 /cfmroot/etc/ntp.conf.pre /cfmroot/etc/ntp.conf.post
     root@ms# cp /etc/ntp.conf /cfmroot/etc/ntp.conf._group1
     root@ms# cfmupdatenode -a
  • To have event monitoring watch for CFM file modifications and push the files whenever they change, start the predefined condition and response as shown below:

     # startcondresp CFMRootModTimeChanged CFMModResp

Whenever a file in /cfmroot is modified, the changes are propagated to all managed nodes in the cluster.

Note

Use caution while enabling CFM event monitoring, as it can impact system performance.


User ID management with CFM

CFM can be used to implement centralized user ID management in your management domain. User IDs and passwords are maintained on the management server, stored under /cfmroot, and distributed to the nodes on the configured schedule.

Copy the following files to /cfmroot to set up effective user ID management:

  • /etc/passwd -> /cfmroot/etc/password_useridmgmt.group

  • /etc/shadow -> /cfmroot/etc/shadow_useridmgmt.group

  • /etc/group -> /cfmroot/etc/group_useridmgmt.group

  • /etc/hosts -> /cfmroot/etc/hosts_useridmgmt.group

Be aware that any ID and password changes made on the nodes will be lost once centralized user ID management is implemented. However, you can require users to change their passwords on the management server instead of on the nodes. Set up scripts or tools to centralize user ID creation and password changes by group on the management server, and disable password command privileges on the managed nodes. A minimal sketch of the distribution step follows.
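The sketch below uses the ._groupname convention described earlier with a hypothetical node group named usergrp; the group name, and the use of the plain file names rather than the suffixed names listed above, are assumptions to adapt to your own naming scheme:

 # cp /etc/passwd /cfmroot/etc/passwd._usergrp
 # cp /etc/shadow /cfmroot/etc/shadow._usergrp
 # cp /etc/group /cfmroot/etc/group._usergrp
 # cfmupdatenode -N usergrp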

CFM distributes files to managed nodes, but never deletes them. If a file needs to be deleted, delete it manually or with a dsh command from the management server. All CFM updates and errors are logged to files /var/log/csm/cfmchange.log and /var/log/csm/cfmerror.log.

For more information, refer to IBM Cluster Systems Management for Linux: Administration Guide, SA22-7873.

5.3.5 Software maintenance

The CSM Software Maintenance System (SMS) is used to install, query, update, and delete Linux RPM packages on the management server and managed nodes. These tasks are performed with the smsupdatenode command. The open source Autoupdate software is a prerequisite for using SMS.

SMS uses either install mode to install new RPM packages, or update mode to update existing RPM packages on cluster nodes. Preview or test mode only tests the update without actually installing the packages.

The SMS directory structure is based on /csminstall/Linux/InstallOSName/InstallOSVersion/InstallOSArchitecture, with RPMS, updates, and install subdirectories to hold the dependent RPMs, updates, and new install packages, respectively. A sample SMS directory structure for SLES 8.1 looks like the following:

  • /csminstall/Linux/SLES/8.1/ppc64/RPMS - contains all dependent RPM packages

  • /csminstall/Linux/SLES/8.1/ppc64/updates - contains all RPM package updates

  • /csminstall/Linux/SLES/8.1/ppc64/install - contains all new RPM packages that are not installed with the OS and need to be installed. Third-party software can also be placed in this subdirectory.

Copy the requisite RPM packages from the install or update CDs into the respective subdirectories.

Note

SMS is only for maintaining RPM packages. OS patch CDs cannot be used for updating OS packages.

Follow these steps to copy the RPM packages from the patch CDs to the respective subdirectories, and then run smsupdatenode:

  1. Mount the patch CD on /mnt/cdrom.

  2.  # cd /mnt/cdrom; cp `find . -name "*.rpm"` /csminstall/Linux/SLES/8.1/ppc64/updates

  3.  # smsupdatenode -v

Example 5-30 shows usage of smsupdatenode.

Example 5-30. smsupdatenode usage
 Usage: smsupdatenode [-h] [-a | -N node_group[,node_group] | --file file]
                [-v | -V] [-t | --test] [-q | --query [-c | --common]]
                [--noinsdeps] [-r "remote shell path"]
                [-i | --install packagename[,packagename]]
                [-e | --erase {--deps | --nodeps} packagename[,packagename]]
                [-p | --packages packagename[,packagename]] [[-n] node_list]
        smsupdatenode [--path pkg_path] --copy {attr=value... | hostname}

       -a       Run Software Maintenance on all nodes.
       --copy {attr=value... | hostname}
                Copy the distribution CD-ROMs corresponding to the given
                attributes or hostname to the correct /csminstall directory.
                If you give attr=value pairs they must come at the end of the
                command line. The valid attributes are:
                        InstallDistributionName
                        InstallDistributionVersion
                        InstallPkgArchitecture
                If a hostname is given, the distribution CD-ROMs, and
                destination directory, are determined by the node's
                attributes.
       -e | --erase {--deps | --nodeps} packagename[,packagename]
                Removes the RPM packages specified after either the --deps
                or --nodeps option.
            --deps
                Removes all packages dependent on the package targeted for
                removal.
            --nodeps
                Only removes this package and leaves the dependent packages
                installed.
       --file filename
                Specifies a file that contains a list of node names. If the
                file name is "-", then the list is read from stdin. The file
                can contain multiple lines and each line can have one or more
                node names, separated by spaces.
       -h       Writes the usage statement to standard out.
       [-n] node_list
                Specifies a list of node hostnames, IP addresses, or node
                ranges on which to run the command. (See the noderange man
                page for information on node ranges.)
       -i | --install packagename[,packagename]
                Installs the given RPM packages.
       -N Node_group[,Node_group...]
                Specifies one or more node groups on which to run the
                command.
       --noinsdeps
                Do not install RPM dependencies.
       -p | --packages packagename[,packagename]
                Only update the given packages. The user does not have to
                give the absolute path. It will be determined by looking under
                the directory structure corresponding to the node.
       --path pkg_path
                Specifies one or more directories, separated by colons, that
                contain copies of the distribution CD-ROMs. The default on a
                Linux system is /mnt/cdrom and the default on an AIX system is
                /dev/cd0. This flag may only be used with the --copy flag.
       -q | --query [-c | --common]
                Query all the RPMs installed on the target machines and report
                the RPMs installed that are not common to every node.
           -c | --common
                Also report the common set of RPMs (installed on every target
                node).
       -r "remote shell path"
                Path to use for remote commands. If this is not set, the
                default is determined by dsh.
       -t | --test
                Report what would be done by this command without making any
                changes to the target system(s).
       -v | -V  Verbose mode.

SMS writes its logs to /var/log/csm/smsupdatenode.log.

Kernel packages are updated like normal RPM packages using SMS. Once upgraded, the kernel cannot be backed out, so use caution when running the smsupdatenode command with any kernel packages (kernel* prefix).

Also, make sure to run lilo to reload the boot loader if you upgrade the kernel and want to boot the new kernel.
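For example, a cautious update might first use the test mode shown in Example 5-30 to preview the change on one node before applying it (the node name lpar1 is illustrative):

 # smsupdatenode -t -n lpar1
 # smsupdatenode -n lpar1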

5.3.6 CSM Monitoring

CSM uses the Reliable Scalable Cluster Technology (RSCT) infrastructure for event monitoring. RSCT has proven to provide a highly available and scalable infrastructure in products such as GPFS and PSSP.

CSM monitoring uses a condition-and-response-based system to monitor system resources such as processes, memory, CPU, and file systems. A condition is based on an event expression defined over a monitored resource attribute; when the event expression evaluates to true, an event is generated.

For example, file system utilization of /var is a resource attribute that can be monitored, and the condition can be the percent utilization of that resource. An event expression of /var > 90% means that if the /var file system fills above the 90% threshold, the expression is true and an event is generated. To prevent a flood of events, a re-arm expression can also be defined; once an event has been generated, no further events are generated until the re-arm expression becomes true.

A response is one or more actions performed when an event is triggered for a defined condition. Continuing the file system example, if we define a response that increases the file system by 1 MB and notifies the system administrator when /var goes above 90%, then once monitoring is started, the response actions are performed automatically whenever /var crosses that threshold.

A set of predefined conditions and responses is available after CSM installation. See IBM Cluster Systems Management for Linux: Administration Guide, SA22-7873, for more information.
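To see what is already defined, you can list the predefined conditions and responses with the RSCT lscondition and lsresponse commands, and display the details of a single condition by name (the condition name shown appears in Example 5-31):

 # lscondition
 # lsresponse
 # lscondition "NodeFullInstallComplete"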

Resource Monitoring and Control (RMC) and Resource Managers (RMs)

Resource Monitoring and Control (RMC) and Resource Managers (RM) are components of RSCT and are critical for monitoring.

  • RMC provides monitoring of system resources. The RMC daemon monitors resources and alerts RM.

  • RM can be defined as a process that maps resources and resource classes to commands for one or more resources. Resource classes contain resource attributes and descriptions and are available for query through the command line.

Table 5-3 lists available resource managers and their functions.

Table 5-3. Resource managers

  Resource manager    Function
  IBM.AuditRM         Audit logging
  IBM.ERRM            Event response resource manager
  IBM.HWCTRLRM        Hardware control
  IBM.DMSRM           Domain node management
  IBM.HostRM          Host resource management
  IBM.SensorRM        Sensor management
  IBM.FSRM            File system management

Table 5-4 lists the predefined resource classes, which can be displayed with the lsrsrc command.

Table 5-4. Predefined resource classes

  Resource class           Description of attribute
  IBM.Association          Persistent resources
  IBM.AuditLog             Event audit logging
  IBM.AuditLogTemplate     Template for audit logging
  IBM.Condition            Predefined conditions
  IBM.EthernetDevice       Primary Ethernet device
  IBM.EventResponse        Predefined responses
  IBM.Host                 Management server host
  IBM.FileSystem           File system attributes
  IBM.Program              -
  IBM.TokenRing            Token ring device
  IBM.Sensor               CFM root and MinManaged sensors
  IBM.ManagedNode          Managed node
  IBM.ManagementServer     Management server on nodes
  IBM.NodeAuthenticate     Node authentication
  IBM.PreManagedNode       PreManaged node classification
  IBM.NodeGroup            Node groups
  IBM.NetworkInterface     Defined network interface
  IBM.DmsCtrl              Domain control
  IBM.NodeHwCtrl           Node hardware control point attributes
  IBM.HwCtrlPoint          Hardware control point (HMC)
  IBM.HostPublic           -

Running lsrsrc -l Resource_class lists the detailed attributes of a resource class. Check the lsrsrc man page for more details.
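For example, to list all resource classes and then examine the file system resources in detail:

 # lsrsrc
 # lsrsrc -l IBM.FileSystem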

Customizing event monitoring

As explained, custom conditions and responses can be created and custom monitoring can be activated on one or more nodes as follows:

  • Create a custom condition with an event expression, such as monitoring the file system space used on node lpar1 only in the management domain:

     # mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" -E "PercentTotUsed < 85" -n lpar1 -m d "File system space used"

    wherein:

    -r specifies the resource class

    -e specifies the event expression

    -E specifies the re-arm expression

    -n specifies a node

    -m specifies the management scope and is required when -n is used; the value d used here selects the management domain

    The final quoted string ("File system space used") is the name of the new condition.

  • Create a custom response, such as e-mailing root through a notification script on Sunday, Monday, and Tuesday:

     # mkresponse -n "E-mail root" -d 1+2+3 -s "/usr/sbin/rsct/bin/notifyevent root" -e b "E-mail root any time"

    wherein:

    -n specifies the action name

    -d specifies the days of the week on which the action can run (1+2+3 is Sunday, Monday, and Tuesday)

    -s specifies the script or command to run

    -e specifies the type of event that triggers the action; b means both the event and the re-arm event

    The final quoted string ("E-mail root any time") is the name of the new response.

  • Link the newly created file system condition to the e-mail notification response:

      # mkcondresp "File system space used" "E-mail root any time"
  • Start the created condition and responses linked above:

     #startcondresp "File system space used" "E-mail root any time" 
  • List the condition and responses to check the status. State is listed as "Active" if started and "Not Active" if not started.

     #lscondresp 

Example 5-31 shows the output of lscondresp.

Example 5-31. lscondresp output
 # lscondresp
 Displaying condition with response information:
 Condition                        Response                          Node          State
 "NodeFullInstallComplete"        "RunCFMToNode"                    "mgmt_server" "Active"
 "NodeManaged"                    "GatherSSHHostKeys"               "mgmt_server" "Active"
 "UpdatenodeFailedStatusChange"   "UpdatenodeFailedStatusResponse"  "mgmt_server" "Active"
 "NodeChanged"                    "rconsoleUpdateResponse"          "mgmt_server" "Active"
 "NodeFullInstallComplete"        "removeArpEntries"                "mgmt_server" "Active"
 "FileSystem Space Used"          "E-mail root any time"            "mgmt_server" "Active"

If any file system on lpar1 exceeds 90% utilization, our newly created condition triggers an event and the response action e-mails root. The condition re-arms once the file system utilization drops back below 85%.

Multiple responses can be linked to a single condition, and a single response can be assigned to multiple conditions. For the condition in Example 5-31, other useful actions could include increasing the file system size, or deleting files older than 60 days to reclaim space; a sketch of adding such a response follows.
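As a hedged sketch, the following creates an additional response that runs a hypothetical cleanup script (the path /usr/local/sbin/clean_var.sh is an assumption) and links it to the same condition:

 # mkresponse -n "Clean /var" -s "/usr/local/sbin/clean_var.sh" "Clean /var file system"
 # mkcondresp "File system space used" "Clean /var file system"
 # startcondresp "File system space used" "Clean /var file system"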

5.3.7 Diagnostic probes

CSM diagnostic probes help you diagnose system problems using programs called probes. The CSM probemgr command runs probes to determine problems; users can also write their own diagnostic scripts and run them through probemgr.

All predefined probes are located in the /opt/csm/diagnostics/probes directory; a user-defined probe directory, specified with the -D option, is read before the predefined probes. Probes can depend on each other and are run in a defined order. Example 5-32 shows the usage of probemgr.

Example 5-32. Probemgr usage
 probemgr [-dh] [-c {0|10|20|127}] [-l {0|1|2|3|4}]
          [-e prb,prb,...] [-D dir] [-n prb]

    -h        Display usage information
    -d        Show the probe dependencies and the run order
    -c        Highest level of exit code returned by a probe that the
              probe manager permits before terminating. The default value
              is 10.
                0   - Success
                10  - Success with attention messages
                20  - Failure
                127 - Internal error
    -l        Indicates the message output level. The default is 3.
                0 - Show probe manager messages, probe trace messages,
                    probe explanation and suggested action messages, probe
                    attention messages and probe error messages
                1 - Show probe trace messages, probe explanation and
                    suggested action messages, probe attention messages
                    and probe error messages
                2 - Show probe explanation and suggested action messages,
                    probe attention messages and probe error messages
                3 - Show probe attention messages and probe error
                    messages
                4 - Show probe error messages only
    -e prb,.. List of probes to exclude when creating the probe dependency
              tree. This also means that those probes will not be run.
    -D dir    Directory where user-specified probes reside
    -n prb    Run the specified probe

Table 5-5 lists the default predefined probes and their dependencies.

Table 5-5. Probes and dependencies

  Probe                Dependent probes
  dsh                  ssh-protocol
  nfs                  network
  rmc                  network
  errm                 rmc
  fs-mounts            none
  network-ifaces       network-enabled
  dmsrm                rmc
  network-routes       network-enabled, network-ifaces
  network-hostname     none
  network-enabled      none
  network-ping         network-enabled, network-ifaces, network-routes
  network              network-enabled, network-hostname, network-ifaces,
                       network-routes, network-ipforward, network-ping
  network-ipforward    none
  ssh-protocol         none
  rsh-protocol         none

All probes are run from the management server using the probemgr command. For detailed information on each probe, refer to the probemgr man page.
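For example, to display the probe dependency tree and then run only the network probe with explanation-level output (flags taken from Example 5-32):

 # probemgr -d
 # probemgr -n network -l 2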

5.3.8 Querying the CSM database

CSM stores all cluster information, such as nodes and node attributes, in a database at a centralized location under the /var/ct directory. This database is accessed and modified using tools and commands, not directly with a text editor.

Table 5-6 on page 268 lists commands you can use to access the CSM database.

Table 5-6. CSM database commands

  Command          Purpose
  lsnode           Lists the defined nodes
  definenode       Adds/defines nodes
  chnode           Changes node definitions
  rmnode           Removes defined nodes
  smsupdatenode    Updates software on nodes
  installnode      Installs nodes
  csmsetupyast     Sets up the configuration for nodes to be installed
  cfmupdatenode    Distributes files
  rpower           Remote power
  rconsole         Remote console
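For example, to list the defined nodes, display all attributes of one node, and change a node attribute (the chnode usage mirrors the Mode reset mentioned in Table 5-8; the exact flags are assumptions, so check the man pages):

 # lsnode
 # lsnode -l lpar1
 # chnode lpar1 Mode=PreManaged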

5.3.9 Un-installing CSM

CSM is uninstalled by using the uninstallms command on the management server. Not all packages are removed by uninstallms. Table 5-7 identifies what is removed and what is not.

Table 5-7. Uninstallms features

  Item                                                       Status
  Node definitions                                           Removed
  Node group definitions                                     Removed
  Predefined conditions                                      Removed
  CSM packages                                               Removed
  CSM log files                                              Removed
  CSM package prerequisites                                  Not removed
  RSCT packages when rsct.basic is present                   Not removed
  RSCT packages when csm.client is present on mgmt. server   Not removed
  RSCT packages when no rsct.basic is installed              Removed
  /cfmroot                                                   Not removed
  /csminstall                                                Not removed
  /opt/csm                                                   Removed
  SSH public keys                                            Not removed

To completely erase CSM, manually clean up the packages and directories that the uninstallms command does not remove. Refer to IBM Cluster Systems Management for Linux: Planning and Installation Guide, Version 1.3.2, SA22-7853, for detailed information.
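As a hedged sketch of that manual cleanup, the rpm query lists any CSM-related packages left behind, and the rm removes the directories that Table 5-7 shows are left in place; verify that nothing under those directories is still needed before removing them:

 # rpm -qa | grep -i csm
 # rm -rf /csminstall /cfmroot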

5.3.10 Distributed Command Execution Manager (DCEM)

DCEM is a Cluster Systems Management graphical interface used to run a variety of tasks on networked computers. Currently, it is not available for pSeries machines.

5.3.11 Backing up CSM

Currently, CSM backup and restore features are not available for the pSeries Linux management server at version 1.3.2. These will be available in the near future.

5.3.12 CSM problem determination and diagnostics

CSM logs detailed information to various log files on the management server and on the managed nodes. These log files are useful in troubleshooting problems. In this section, we discuss some common problems that may be encountered while setting up and running CSM. For more detailed information and diagnostics, refer to the IBM Cluster Systems Management for Linux: Administration Guide, SA22-7873.

Table 5-8 lists common CSM problems and their fixes.

Table 5-8. Common CSM problems and fixes

Problem: installms fails
Fix: Make sure the requisite packages are copied to the reqs directory in the temporary CSM package directory.

Problem: rpower reports <unknown> status for a query
Fix: The management server event subscriptions have either expired or hung on the HMC. Refresh openCIMOM on the HMC or reboot the HMC.

Problem: rpower -a query reports "Hardware Control Socket Error"
Fix:
  • The Java path might have changed on the management server. Verify that the Java search path is /usr/lib/IBMJava2-1.3.1/jre/bin/java.
  • Restart the hardware control resource manager as follows:
      - stopsrc -s IBM.HWCTRLRM
      - startsrc -s IBM.HWCTRLRM -e "HC_JAVA_PATH=/usr/lib/IBMJava2-1.3.1"

Problem: Any other rpower or rconsole errors
Fix:
  • Stop and start RMC daemons such as IBM.HWCTRLRM on the management server.
  • Stop and start openCIMOM on the HMC by running "initCIMOM stop" and "initCIMOM start".
  • Reboot the HMC.
  • Start IBM.HWCTRLRM with trace hooks on to collect more data by running startsrc -s IBM.HWCTRLRM -e "HC_JAVA_VERBOSE=/tmp/jni.txt".
  • Run "rmcctrl -Z" and "rmcctrl -A" to stop/remove and add/start the RMC daemons.
  • Check IBM Cluster Systems Management for Linux: Hardware Control Guide, SA22-7856.

Problem: rconsole does not come up in the current window
Fix: Check the flags to make sure the -t option is specified.

Problem: rconsole fails with an "xinit failed to connect to console" error
Fix: The conserver.cf file is corrupted. Rewrite the configuration files and refresh conserver:
  • chrconsolecfg -a
  • rconsolerefresh -r

Problem: csmsetupyast fails with getadapters errors
Fix:
  • Run the getadapters command from the command line to populate the CSM node database.
  • Or use the chnode command to upload node network attributes such as InstallAdapterType, MAC address, and so on.
  • getadapters may fail if run on multiple nodes, so check the /var/log/csm/getadapters/getadapter*.* logs and fix any errors with locks.

Problem: installnode fails
Fix:
  • Run lsnode and verify whether duplicate node entries are listed.
  • Check the log files to find and fix any errors.
  • Check the node attribute Mode to make sure it is set to PreManaged. If it is set to Installing and installnode fails, reset it to PreManaged with the chnode command.
  • Check network cables for proper connectivity.
  • Check /etc/dhcpd.conf and restart dhcpd.
  • Check /etc/inetd.conf for tftp and restart inetd.
  • Open the read-only console and look for any packaging errors. installnode waits for input if any package errors are encountered. Open the read-write console to respond interactively to install options, then close the console.

Problem: updatenode fails
Fix: Check for hostname resolution errors.

Problem: dsh fails
Fix: Check for SSH authentication errors.

Problem: smsupdatenode fails
Fix: Check that the RPM packages are copied to the right directories.

Problem: Event monitoring fails
Fix: Check for proper network connectivity.

Refer to IBM Cluster Systems Management for Linux: Hardware Control Guide, SA22-7856, for more information on hardware control, HMC connectivity, and RMC issues.


