5.3 CSM administration

< Day Day Up >

In 5.1, "CSM concepts and architecture" on page 212, we touch on the topics of CSM management and administration as a basic introduction to the main features of CSM and how they function.

In this section, we examine these administration topics in detail by using examples and sample scenarios, and discuss the following areas:

Log file management
Managing Node groups
Hardware control
Cluster File Manager
Software Management System
CSM monitoring
Diagnostic probes
CSM backup
Querying CSM database
CSM problem determination and diagnostics
CSM hostname changes

5.3.1 Log file management

CSM logs to several different log files during installation and cluster management. These log files are available on the management server and managed nodes, and they help to determine the status of a command, or in troubleshooting a CSM issue.

Most of the CSM log files on the management server are located in the /var/log/csm directory. Table 5-1 lists the log files on the management server and their purpose.

Table 5-1. Log files on management server

Log File	Purpose
/var/log/csm/install.log	csm. core install log
/var/log/csm/installms.log	Output of installms command
/var/log/csm/installnode.log	Verbose output of installnode command
/var/log/csm/installnode.node.log.*	hmc_nodecond out of each node's installation
/var/log/csm/csmsetupyast.log	Output of csmsetupyast command
/var/log/csm/updatenode.log	Output of updatenode command
/var/log/csm/smsupdatenode.log	Output of smsupdatenode command
/var/log/csm/cfmerror.log	CFM error log
/var/log/csm/cfmchange.log	Output of CFM file updates
/var/log/csm/hw_logfile	HW control daemon status log
/var/log/csm/hmc[IP_address].log.*	HMC communication error messages
/var/log/csm/hmc[IP_address].java_trace	Tracing for openCIMOM calls to HMC
/var/log/csm/hmc_logfile.314	Tracing for libhmc_power.so
/var/log/csm/getadapters/getadapters.node.log.*	Output of getadapters command for each node
/var/ct/RMstart.log	Resource Manager status log
/var/ct/*.stderr	RSCT daemons standard errors
Linux log files in /var/logs	Refer to problem determination section 3.
Other Linux log files	Refer to problem determination section 3

Table 5-2 on page 253 lists log files on managed nodes and their purpose.

Table 5-2. Log files on managed nodes

Log file	Purpose
/var/log/csm/install.log	csm.core install log
/var/log/csm/updatekernel.log	Kernel update log running smsupdatenode

5.3.2 Node groups

Managed nodes can be grouped together by using the nodegrp command. Distributed commands can be issued against groups for common tasks , instead of performing them on each node. Default node groups created at install time are shown in Example 5-27.

Example 5-27. nodegrp command

 # nodegrp ManagedNodes AutoyastNodes ppcSLES81Nodes AllNodes SuSE82Nodes SLES72Nodes pSeriesNodes SLES81Nodes LinuxNodes PreManagedNodes xSeriesNodes EmptyGroup APCNodes RedHat9Nodes MinManagedNodes

Node groups are created with the nodegrp command:

 #nodegrp -a lpar1, lpar2 testgroup

This creates a group called test group which includes nodes lpar1 and lpar2. For more information, refer to the nodegrp man page.

Distributed commands such as dsh can be run on nodegroups:

 # dsh -w testgroup date

5.3.3 Hardware control

The CSM hardware control feature is used to remotely control HMC-attached pSeries servers. Remote nodes can be powered on, off, the power status can be queried, and you can open a remote console from the management server.

It is mandatory to have all pSeries servers connected to HMC, and to have the HMC communicate with the management server for the hardware control function. Figure 5-7 shows hardware control feature design for a simple CSM cluster.

Figure 5-7. pSeries CSM cluster with hardware control using HMC

graphics/05fig07.gif

Hardware control uses openCIMOM (public software) and conserver software to communicate to HMC to issue remote commands. The IBM.HWCTRLRM daemon subscribes and maintains state to HMC openCIMOM events during startup. Conserver is started at boot time on the management server and reads from a defined config file located at /etc/opt/conserver/conserver.cf.

The following hardware control commands are available on the management server:

r power Powers nodes on and off and queries power status

rconsole Opens a remote serial console for nodes

chrconsolecfg Removes , adds and re- writes conserver config file entries

rconsolerefresh Refreshes conserver on the management server

getadapters Obtains MAC addresses of remote nodes

lshwinfo Collects node information from Hardware Control points

systemid Stores userid and encrypted password required to access remote hardware

The rpower and rconsole commands are frequently used hardware control commands and we discuss them in detail here:

Remote power

Remote power commands access the CSM database for node attribute information.

PowerMethod Node attribute must be set to hmc to access pSeries nodes.

HardwareControlPoint is the hostname or IP address of the Hardware Management Console (HMC).

HardwareControlNodeId is the hostname or IP address of the managed node which is attached to the HMC over a serial link.

Other Node attributes such as HWModel, HWSerialNum, HWType are obtained automatically using lshwinfo .

Remote power configuration is outlined in 5.2.5, "Installing CSM on the management server" on page 228.

Remote console

The Remote console command communicates with the console server to open remote console to nodes using management VLAN and Serial connections. The HMC works as the remote console server listening for requests from the management server.

Only one read write console, but multiple read only consoles, can be opened to each node by using the rconsole command.

5.3.4 Configuration File Manager (CFM)

Configuration File Manager (CFM) is a CSM component to centralize and distribute files across management nodes in a management cluster. This is similar to file collections on IBM PSSP. Common files such as /etc/ hosts across the cluster are distributed from the management server using a push mechanism through root's crontab and/or event monitoring. CFM uses rdist to distribute files. Refer to 5.1.7, "CSM diagnostic probes" on page 220 for more information on hostname changes.

CFM uses /cfmroot as its main root directory, but copies all files to /etc/opt/csm/cfmroot with a symlink on the management server. File permissions are preserved while copying. Make sure that you have enough space in your root directory or create /cfmroot on a separate partition and symlink it from /etc/opt/csm/cfmroot.

Example 5-28 shows cfmupdatenode usage.

Example 5-28. cfmupdatenode usage

 Usage: cfmupdatenode [-h] [-v  -V]                 [-a  -N node_group[,node_group]  --file file ] [-b]                 [[-y]  [-c]] [-q [-s] ] [-r remote shell path]                 [-t timeout] [-M number of max children]                 [-d location for distfile] [-f filename] [[-n] node_list]      -a         Files are distributed to all nodes. This option cannot be                 used with the -N or host positional arguments.      -b         Backup. Preserve existing configuration file (on nodes) as                 "filename".OLD      -c         Perform binary comparison on files and transfer them                 if they differ.      -d distfile location                 cfmupdatenode will generate a distfile in the given (absolute)                 path and exit (without transferring files). This way the user                 can execute Rdist with the given distfile and any options                 desired.      -f  filename                 Only update the given filename. The filename must be the                 absolute path name of the file and the file must reside in                 the cfmroot directory      --file filename                 specifies a file that contains a list of nodes names. If the                 file name "-", then the list is read from stdin. The file                 can contain multiple lines and each line can have one or node                 names, separated by spaces.      -h         Writes the usage statement to standard out.      [-n]  node_list                 Specifies a list of node hostnames, IP addresses, or node                 ranges on which to run the command. (See the noderange man                 page for information on node ranges.)      -M  number of maximum children                 Set the number of nodes to update concurrently.                 (The default is 32.)      -N Node_group[,Node_group...]                 Specifies one or more node groups on which to run the command.      -q         Queries for out of date CFM files across the cluster.      -s         Reports which nodes are up to date by comparing last CFM                 update times. Must be called with the -q option.      -r remote  shell path.                 Path to remote shell. (The default is the DSH_REMOTE_CMD                 environment variable, or /usr/bin/rsh).      -t timeout                 Set the timeout period (in seconds) for waiting for response                 from a remote process. (The default is 900).      -v  V     Verbose mode.      -y         Younger mode. Does not update files younger than master copy.

Note

CFM can be set up prior to running the installnode command, and common files are distributed at install time while installing nodes.

At CSM install time, root's crontab is updated with an entry to run cfmupdatenode every day at midnight.This can changed to suit your requirements.

 #crontab -l grep cfmupdate 0 0 * * * /opt/csm/bin/cfmupdatenode -a 1>>/var/log/csm/cfmerror.log 2>>/var/log/csm/cfmerror.log

Some common features of CFM, along with usage examples, are described here.

In general, it is important to have a single /etc/hosts file across the management cluster. The CSM database and other commands do hostname resolution either using /etc/hosts or DNS. To keep a single copy of /etc/hosts, symlink /etc/hoststo /cfmroot/etc/hosts:
```
 # ln -s /etc/hosts /cfmroot/etc/hosts 
```
Run the cfmupdatenode command to copy the hosts file to all managed nodes defined in the CSM database:
```
 # cfmupdatenode -a 
```
If you want to have a file that is different on the management server and all managed nodes, copy or create the file to /cfmroot instead of symlinking it and then distributing it across to nodes. Files in /cfmroot are not distributed to the management server.
```
 # copy /etc/file.1 /cfmroot/etc/file.1 #touch /cfmroot/etc/file.1 #cfmupdatenode -a 
```
Files can be distributed to selected nodegroups only, instead of to all managed nodes, by creating the files with a ._groupname extension. To distribute ntp.conf file to, for example, nodegroup group1, create the file with ._group1 and when cfmupdatenode is run next time, it copies the ntp.conf file only to group1 nodes.
```
 # cp /etc/ntp.conf /cfmroot/etc/ntp.conf.group1 # cfmupdatenode -a 
```
CFM customizations can be done after distributing files using pre-install and post-install scripts. Create script files with pre- and post- extensions in cfmroot and when cfmupdatenode runs, pre- and post- scripts are run accordingly . Example 5-29 shows running a pre-install script to save the file and distribute the file, and then running the post script to reset permissions.

Example 5-29. Example of cfmupdatenode
```
 Create pre and post install scripts root@ms#cat >/cfmroot/etc/ntp.conf.pre #!/bin/sh cp /etc/ntp.conf /etc/ntp.conf.'date' ^D root@ms# cat >/cfmroot/etc/ntp.conf.post #!/bin/sh /sbin/service ntprestart ^D root@ms#chmod 755 /cfmroot/etc/ntp.conf.pre ntp.conf.post root@ms#cp /etc/ntp.conf /cfmroot/etc/ntp.conf._group1 root@ms#cfmupdatenode -a 
```
To have event monitoring monitor CFM file modifications and push the files whenever files are modified, start the condition and responses as below:
```
 #  startcondresp  CFMRootModTimeChanged CFMModResp 
```

Whenever a file in /cfmroot is modified, the changes are propagated to all managed nodes in the cluster.

Note

Use caution while enabling CFM event monitoring, as it can impact system performance.

User id management with CFM

CFM can be used to implement centralized user id management in your management domain. User ids and passwords are generated on the management server, stored under /cfmroot, and distributed to nodes as scheduled.

Copy the following files to /cfmroot to set up effective user id management:

/etc/passwd ----> /cfmroot/etc/password_useridmgmt.group
/etc/shadow------> /cfmroot/etc/shadow_useridmgmt.group
/etc/group---------> /cfmroot/etc/group_useridmgmt.group
/etc/hosts------> /cfmroot/etc/hosts_useridmgmt.group

Be aware that any id and password changes made on the nodes will be lost once centralized user id management is implemented. However, you can force users to change their passwords on the management server instead of on nodes. Set up scripts or tools to centralize user id creation and password change by group on the management server, and disable password command privileges on managed nodes.

CFM distributes files to managed nodes, but never deletes them. If a file needs to be deleted, delete it manually or with a dsh command from the management server. All CFM updates and errors are logged to files /var/log/csm/cfmchange.log and /var/log/csm/cfmerror.log.

For more information, refer to IBM Cluster Systems Management for Linux: Administration Guide , SA22-7873.

5.3.5 Software maintenance

The CSM Software Maintenance System (SMS) is used to install, query, update and delete Linux RPM packages on the management server and managed nodes. It is performed using the smsupdatenode command. Autoupdate open source software is a prerequisite for using SMS.

SMS uses either install mode to install new RPM packages, or update mode to update existing RPM packages on cluster nodes. Preview or test mode only tests the update without actually installing the packages.

The SMS directory structure includes /csminstall/Linux/InstallOSName/InstallOSVersion/InstallOSArchitecture/RPMS ../updates and ../install subdirectories to maintain all SMS RPMs, updates and install packages, respectively. Sample SMS directory structure on SuSE8.1 looks like the following:

/csminstall/Linux/SLES/8.1/ppc64/RPMS - contains all dependent RPM packages
/csminstall/Linux/SLES/8.1/ppc64/updates - -contains all RPM package updates
/csminstall/Linux/SLES/8.1/ppc64/install - contains all new RPM packages that are not installed with OS and need to be installed. All third party vendor software can also be placed in this subdirectory.

Copy the requisite RPM packages in the respective subdirectories from Install or Update CDs.

Note

SMS is only for maintaining RPM packages. OS patch CDs cannot be used for updating OS packages.

Follow these steps to copy the RPM packages from patch CDs to respective subdirectories, and then issue smsupdatenode :

Mount the Patch CD on /mnt/ cdrom .

 #cd /mnt/cdrom;cp 'find . -name *.rpm \ /csminstall/Linux/SLES/8.1/ppc64/updates '

```
 # smsupdatenode -v 
```

Example 5-30 shows usage of smsupdatenode.

Example 5-30. smsupdatenode usage

 Usage: smsupdatenode   [-h] [-a  -N node_group[,node_group]  --file file ]                 [-v  -V] [-t  --test] [-q  --query [-c  --common]]                 [--noinsdeps] [-r "remote shell path"]                 [-i  --install packagename[,packagename]]                 [-e  --erase {--deps  --nodeps} packagename[,packagename]]                 [-p  --packages packagename[,packagename]] [[-n] node_list] smsupdatenode   [--path pkg_path] --copy {attr=value...  hostname}        -a       Run Software Maintenance on all nodes.        --copy {attr=value...  hostname}                 Copy the distrobution CD-Roms corresponding to the given                 attributes or hostname to the correct /csminstall directory.                 If you give attr=value pairs they must come at the end of the                 command line. The valid attributes are:                         InstallDistributionName                         InstallDistributionVersion                         InstallPkgArchitecture                 If a hostname is given, the distribution CD-ROMs, and                 destination directory, are determined by the nodes                 attributes.        -e  --erase {--deps  --nodeps} packagename[,packagename]                 Removes the RPM packages specified after either the --deps                 or --nodeps option.             --deps                 Removes all packages dependent on the package targeted for                 removal.             --nodeps                 Only removes this package and leaves the dependent packages                 installed.        --file filename                 specifies a file that contains a list of nodes names. If the                 file name "-", then the list is read from stdin. The file                 can contain multiple lines and each line can have one or node                 names, separated by spaces.        -h       Writes the usage statement to standard out.        [-n] node_list                 Specifies a list of node hostnames,  IP addresses, or node                 ranges on which to run the command. (See the noderange man                 page for information on node ranges.)        -i  --install packagename[,packagename]                 Installs the given RPM packages.        -N Node_group[,Node_group...]                 Specifies one or more node groups on which to run the                 command.        --noinsdeps                 Do not install RPM dependencies.        -p  --packages packagename[,packagename]                 Only update the given packages. The user does not have to                 give the absolute path. It will be determined by looking under                 directory structure corresponding to the node.        --path pkg_path                 Specifies one or more directories, separated by colons, that                 contain copies of the distrobution CD-ROMs. The default on a                 Linux system is /mnt/cdrom and the default on an AIX system is                 /dev/cd0. This flag may only be used with the --copy flag.        -q  --query [-c  --common]                 Query all the RPMs installed on the target machines and report                 the RPMs installed that are not common to every node.            -c  --common                 Also report the common set of RPMs (installed on every target                 node).        -r "remote shell path"                 Path to use for remote commands. If this is not set, the default                 is determined by dsh.        -t  --test                 Report  what would be done by this command without making any                 changes to the target system(s)        -v  -V    Verbose mode.

SMS writes logs to /var/log/csm/smsupdatenode.log files.

Kernel packages are updated as normal RPM packages using SMS. Once upgraded, kernel cannot be backed out, so use caution while running smsupdatenode command with any kernel packages (kernel* prefix).

Also, make sure to run lilo to reload the boot loader if you upgrade kernel and wants to load the new kernel.

5.3.6 CSM Monitoring

CSM uses Reliable Scalable Cluster Technology Infrastructure (RSCT) for event monitoring. RSCT has been proven to provide highly available and scalable infrastructure in applications such as GPFS and PSSP.

CSM Monitoring uses a condition and response-based system to monitor system resources such as processes, memory, CPU and file systems. A condition can be a quantified value of a monitored resource attribute, and is based on a defined event expression. If an event expression is true, then an event is generated.

File system utilization(/var) is a resource to be monitored, and "condition" can be THE percent utilization on that resource. For example, /var >90% means if the /var file system increases above a 90% threshold value, then the event expression is true and an event is generated. To prevent flooding of generating events, a re-arm expression can be created. In this case, no event will be generated until the re-arm expression value is true.

A response can be one or more actions performed when an event is triggered for a defined condition. Considering the file system resource example, if we define that a response action is to increase the file system by 1 MB if /var reaches above 90% and to notify the system administrator, then after monitoring is started, whenever /var goes above 90%, a response action is performed automatically.

A set of predefined conditions and responses are available at CSM install. See the IBM Cluster Systems Management for Linux: Administration Guide SA22-7873, for more information.

Resource Monitoring and Control (RMC) and Resource Managers (RMs)

Resource Monitoring and Control (RMC) and Resource Managers (RM) are components of RSCT and are critical for monitoring.

RMC provides monitoring of system resources. The RMC daemon monitors resources and alerts RM.
RM can be defined as a process that maps resources and resource classes to commands for one or more resources. Resource classes contain resource attributes and descriptions and are available for query through the command line.

Table 5-3 lists available resource managers and their functions.

Table 5-3. Resource managers

Resource manager	Function
IBM.AuditRM	Audit Logging
IBM.ERRM	Event resource manager
IBM.HWCTRLRM	Hardware control
IBM.DMSRM	Domain node management
IBM.HostRM	Hostname management
IBM.SensorRM
IBM.FSRM	File System management

Table 5-4 lists predefined resource classes and can be obtained with the command lsrsrc .

Table 5-4. Predefined resource classes

Resource class	Description of attribute
IBM.Asociation	Persistent Resources
IBM.Auditlog	Event Audit logging
IBM.AuditlogTemplate	Template for audit logging
IBM.Condition	Pre-defined conditions
IBM.EthernetDevice	Primary Ethernet device
IBM.EventResponse	Pre-defined responses
IBM.Host	Management server host
IBM.FileSystem	File system attributes
IBM.Program
IBM.TokenRing	Token ring device
IBM.Sensor	CFM root and MinManaged
IBM.ManagedNode	Managed node
IBM.ManagementServer	Management server on nodes
IBM.NodeAuthenticate	Node authentication
IBM.PreManagedNode	PreManaged node classification
IBM.NodeGroup	Nodegroups
IBM.NetworkINterface	Defined network interface
IBM.DmsCtrl	Domain control
IBM.NodeHwCtrl	Node Hardware control point attributes
IBM.HwCtrlPoint	Hardware Control point (HMC)
IBM.HostPublic

lsrsrc -l Resource_class will list detailed attributes of each resource class. Check the man page of lsrsrc for more details.

Customizing event monitoring

As explained, custom conditions and responses can be created and custom monitoring can be activated on one or more nodes as follows :

Create a custom condition or event expression such as monitor file system space used on node lpar1 only in the management domain:
```
 #mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" -E "PercentTotUsed < 85" -n lpar1 -m d "File system space used 
```
wherein:

-r option is for resource class

-e is for creating an event expression

-E is for re-arm expression

-n is for specifying a node

-m is for management domain. Is a must if -n is used.

d is for description of condition
Create a custom response, such as e-mail root using a notification script to run Sunday, Monday and Tuesday.
```
 #mkresponse -n "E-mail root" -d " 1+2+3 -s "/usr/sbin/rsct/bin/notifyevent root" -e b "E-mail root any time" 
```
wherein,

-n is for action

-d days of week

-e type of event to run and b is for both event and re-arm event
Link the created file system space used condition and e-mail notification response
```
  #mkcondresp  -"File system space used" E-mail root any time" 
```

Start the created condition and responses linked above:

 #startcondresp "File system space used" "E-mail root any time"

List the condition and responses to check the status. State is listed as "Active" if started and "Not Active" if not started.
```
 #lscondresp 
```

Example 5-31 shows the output of lscondresp.

Example 5-31. lscondresp output

 #lscondresp Displaying condition with response information: Condition                      Response                         Node       State "NodeFullInstallComplete"      "RunCFMToNode"                   "mgmt_server" "Active" "NodeManaged"                  "GatherSSHHostKeys"              "mgmt_server" "Active" "UpdatenodeFailedStatusChange" "UpdatenodeFailedStatusResponse" "mgmt_server" "Active" "NodeChanged"                  "rconsoleUpdateResponse"         "mgmt_server" "Active" "NodeFullInstallComplete"      "removeArpEntries"               "mgmt_server" "Active"  "FileSystem Space Used"        "E-mail root any time"          "mgmt_server" "Active"

If any file system size exceeds 90% on lpar1, our newly created event triggers an an event as a response action by e-mailing root. Monitoring resumes once the file system is fixed back to 85%.

Multiple response actions can be defined to a single condition, and a single response can be assigned to multiple conditions. For Example 5-31, an action such as increasing the file system or deleting files older than 60 days from the file system to claim space could be other actions.

5.3.7 Diagnostic probes

CSM diagnostic probes help you diagnose system problems using programs called probes.The CSM command probemgr is useful in sending custom probes to determine problems; users write their own diagnostics scripts and call probemgr.

All predefined probes are located in the /opt/csm/diagnostics/probes directory, and probemgr can access the user-defined directory before reading the predefined probes called with a -D option. System problem diagnostics can be dependent on each other, and probes take a defined order while running. Example 5-32 shows usage of probemgr.

Example 5-32. Probemgr usage

 probemgr [-dh] [-c {01020127}] [-l {01234}]          [-e prb,prb,...] [-D dir] [-n prb]     -h        Display usage information     -d        Show the probes dependencies and the run order     -c        Highest level of exit code returned by a probe that the               probe manager permits before terminating.The defaule value               is 10                 0   - Success                 10  - Success with Attention Messages                 20  - Failure                 127 - Internal Error     -l        Indicates the message output level. The default is 3                 0 - Show probe manager messages, probe trace messages,                     probe explanation and suggested action messages, probe                     attention messages and probe error messages                 1 - Show probe trace messages, probe explanation and                     suggested action messages, probe attention messages                     and probe error messages                 2 - Show probe explanation and suggested action messages,                     probe attention messages and probe error messages                 3 - Show probe attention messages and probe error                     messages                 4 - Show probe error messages only     -e prb,.. List of probes to exclude when creating the probe dependency               tree. This also means that those probes will not be run     -D dir    Directory where user specified probes reside     -n prb    Run the specified probe

Table 5-5 lists the default pre-defined probes available and the probe dependencies.

Table 5-5. Probes and dependencies

Probe	Dependent probe
dsh	ssh-protocol
nfs	network
rmc	network
errm	rmc
fs-mounts	none
network-ifaces	network-enabled
dmsrm	rmc
network-routes	network-enabled network-ifaces
network-hostname	none
network-enabled	none
network-ping	network-enabled network-ifaces network-routes
network	network-enabled network-hostnamr network-ifaces network-routes network-ipforward network-ping
network-ipforward	none
ssh-protocol	none
rsh-protocol	none

All probes are run from the management server using the probemgr command. For detailed information on each probe, refer to probemgr man page.

5.3.8 Querying the CSM database

CSM stores all cluster information, such as nodes, attributes of nodes, and so on, in a database at a centralized location in the /var/ct directory. This database is accessed and modified using tools and commands, but not directly with a text editor.

Table 5-6 on page 268 lists commands you can use to access the CSM database.

Table 5-6. CSM database commands

Command	Purpose
lsnode	Lists nodes defined
definenode	Add/define nodes
chnode	Change node definitions
rmnode	Remove defined nodes
smsupdatenode	Software update the node
installnode	Install node
csmsetupyast	Setup config for the node to be installed
cfmupdatenode	Distribute files
rpower	Remote power
rconsole	Remote Console

5.3.9 Un-installing CSM

CSM is un-installed by using the uninstallms command on the management server. Not all packages are removed while running uninstallms. Table 5-7 identifies what is removed and what is not removed with uninstallms.

Table 5-7. Uninstallms features

Node definitions	Removed
Node group definitions	Removed
Predefined conditions	Removed
CSM packages	Removed
CSM log files	Removed
CSM Package prerequisites	Not removed
RSCT packages when rsct.basic is present	Not removed
RSCT packages when csm.client is present on mgmt. server	Not removed
RSCT packages when no rsct.basic installed	Removed
/cfmroot	Not removed
/csminstall	Not removed
/opt/csm	Removed
SSH public keys	Not removed

Clean up manually to remove all the packages and directories that are not removed with the uninstallms command to completely erase CSM. Refer to IBM Cluster Systems Management Guide for Linux: Planning and Installation Guide-Version 1.3.2 , SA22 7853, for detailed information

5.3.10 Distributed Command Execution Manager (DCEM)

DCEM is a Cluster Systems Management GUI interface used to run a variety of tasks on networked computers. Currently this is not available for pSeries machines.

5.3.11 Backing up CSM

Currently CSM backup and restore features are not available for pSeries Linux management server version 1.3.2. These will be available in the near future.

5.3.12 CSM problem determination and diagnostics

CSM logs detailed information in various log files on the management server and on managed nodes. These log files are useful in troubleshooting problems. In this section, we discuss some common and frequent problems which may be encountered while setting up and running CSM. For more detailed information and diagnostics, refer to the IBM Cluster Systems Management Guide for Linux: Administration Guide , SA22-7873.

Table 5-8 lists common CSM problems and their fixes.

Table 5-8. Common CSM problems and fixes

Problem	Fix
installms fails	Make sure to copy requisites to the reqs directory on the temporary CSM package folder
rpower reports <unknown> status for query	Management server event subscriptions either expired or hung on HMC. Refresh openCIMOM on HMC or reboot HMC.
rpower -a query reports "Hardware Control Socket Error"	The Java path might have changed on the management server. Verify that the Java search path is "/usr/lib/IBMJava2-1.3.1/jre/bin/java". Restart HWCTRL as follows: - stopsrc -s IBM.HWCTRLRM; - startsrc -s IBM.HWCTRLRM -e "HC_JAVA_PATH=/usr/lib/IBMJav a2-1.3.1"
Any other rpower or rconsole errors	Stop/start RMC daemons such as IBM.HWCTRLRM on the management server Stop/Start openCIMOM on the HMC by running "initCIMOM stop" and "initCIMOMstart" Reboot the HMC Start IBM.HWCTRLRM with trace hooks on to collect more data by running "startsrc -s IBM.HWCTRLRM -e "HC_JAVA_VERBOSE=/tmp/jni.txt" Run "rmcctl -A" and rmcctl -Z" to stop/remove and add/start RMC daemons Check IBM Cluster Systems Mgmt for Linux: Hardware Control Guide , SA22-7856
rconsole not coming up on the current window	Check the flags to make sure the `-t` option is specified
rconsole fails with "xinit failed to connect to console" error	Corrupted conserver.cf. Rewrite the config files and refresh conserver conserver chrconsolecfg -a rconsolerefresh -r
csmsetupyast fails with getadapters errors	Run getadapters command line to populate CSM node database Or use the chnode command to upload node network attributes such as InstallAdapterType, MAC address, and so on getadapaters may fail if run on multiple nodes, so check /var/log/csm/getadapters/getadapter. logs to check, and fix any errors with locks
installnode fails	Run lsnode and verify duplicate nodes entries are listed Verify log files to find/fix any errors Check for node attribute "Mode" to make sure it is set to PreManaged. If set to Installing and installnode fails, reset it to "PreManaged" with the chnode command Check network cables for proper connectivity Check /etc/dhcpd.conf and restart dhcpd Check /etc/inetd.conf for tftp and restart inetd Open the read only console and look for any packaging errors. Installnode waits for input if any package errors are encountered. Open the read-write console to interactively respond to install options and close the console.
updatenode fails	Check for hostname resolution errors
dsh fails	Check for SSH authentication errors
smsupdatenode fails	Check that RPM packages are copied to the right directories
Event monitoring fails	Check for proper network connectivity

Refer to IBM Cluster Systems Management for Linux: Hardware Control guide SA22-7856 for more information on hardware control and HMC connectivity and RMC issues.

< Day Day Up >

5.3.1 Log file management

Table 5-1. Log files on management server

Table 5-2. Log files on managed nodes

5.3.2 Node groups

Example 5-27. nodegrp command

5.3.3 Hardware control

Figure 5-7. pSeries CSM cluster with hardware control using HMC

Remote power

Remote console

5.3.4 Configuration File Manager (CFM)

Example 5-28. cfmupdatenode usage

Example 5-29. Example of cfmupdatenode

User id management with CFM

5.3.5 Software maintenance

Example 5-30. smsupdatenode usage

5.3.6 CSM Monitoring

Resource Monitoring and Control (RMC) and Resource Managers (RMs)

Table 5-3. Resource managers

Table 5-4. Predefined resource classes

Customizing event monitoring

Example 5-31. lscondresp output

5.3.7 Diagnostic probes

Example 5-32. Probemgr usage

Table 5-5. Probes and dependencies

5.3.8 Querying the CSM database

Table 5-6. CSM database commands

5.3.9 Un-installing CSM

Table 5-7. Uninstallms features

5.3.10 Distributed Command Execution Manager (DCEM)

5.3.11 Backing up CSM

5.3.12 CSM problem determination and diagnostics

Table 5-8. Common CSM problems and fixes