Through the node database, CSM allows nodes to be easily added and removed from the IBM Cluster 1350. Here we outline the steps required when the nodes that make up your cluster need attention.
In the unlikely event of hardware failure on a node, components will obviously need to be replaced. Report the fault to IBM according to the instructions you received with your cluster; an IBM Customer Engineer should be dispatched on-site to replace the failing components. If possible, ensure any important data is backed up before the engineer arrives. Here we provide some examples of situations and the CSM administrative tasks that should be performed.
If a hard disk has failed, the replacement will be entirely blank and therefore require a full node re-installation, including the Linux operating system and CSM.
Since only the disk was replaced, all the information in the CSM node database is still correct. The node may be installed by running the following commands:
# csmsetupks -n <node_hostname> -x
# installnode <node_hostname>
The installnode command will reboot the node and then start a network install. More information on installing nodes may be found in 5.3, "CSM installation on compute and storage nodes" on page 131.
For each node defined in the CSM database, CSM stores the service processor ID and MAC address of the first on-board Ethernet card. If the node's mainboard was replaced (or the entire node swapped-out), both these values may have changed and must be updated in the CSM database.
Automatic MAC address collection for the new node requires a reboot. To avoid problems with Linux, it is a good idea to leave the node switched off until the MAC has been collected; csmsetupks will boot the node and collect this information on its own.
The node's service processor ID is contained in the HWControlNodeId attribute in the CSM node database; the service processor ID on the node is commonly set to match the node host name. When the node is replaced, the service processor name may not be set correctly; by default, it matches the node's serial number.
To determine whether the service processor ID is set correctly, use the lshwinfo command.
Example 6-1 shows a listing of all the nodes connected to RSA mgrsa1. One of the nodes is listed as no_hostname and has SN#5502654 in the NodeId column; this is the node that was replaced. If your replacement node was swapped in from elsewhere, its NodeId may have been changed from the default, so you may need to inspect the lshwinfo output closely to spot it. Alternatively, the replacement node may already have the correct NodeId set, in which case you do not need to reset it.
Example 6-1: lshwinfo showing a replaced node on an RSA
[root@master /]# lsnode -a HWControlPoint node1
node1: mgrsa1.cluster.com
# lshwinfo -c mgrsa1.cluster.com -p xseries
#Hostname::PowerMethod::HWControlPoint::NodeId::LParId::HWType::HWModel::HWSerialNum
mgrsa1.cluster.com::xseries::mgrsa1.cluster.com::mgrsa1::::::::
no_hostname::xseries::mgrsa1.cluster.com::SN#5502654::::::::
node2.cluster.com::xseries::mgrsa1.cluster.com::node2::::::::
node3.cluster.com::xseries::mgrsa1.cluster.com::node3::::::::
[root@master /]#
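When an RSA hosts many nodes, the replaced entry can be picked out of the lshwinfo output mechanically rather than by eye. The following is a minimal sketch of our own (the temporary file name is our choice); it runs against a saved copy of output in the format shown in Example 6-1:

```shell
# Sketch: find replaced nodes (no_hostname entries) in saved lshwinfo output.
# The sample data mirrors Example 6-1; fields are separated by "::".
cat > /tmp/lshwinfo.out <<'EOF'
#Hostname::PowerMethod::HWControlPoint::NodeId::LParId::HWType::HWModel::HWSerialNum
mgrsa1.cluster.com::xseries::mgrsa1.cluster.com::mgrsa1::::::::
no_hostname::xseries::mgrsa1.cluster.com::SN#5502654::::::::
node2.cluster.com::xseries::mgrsa1.cluster.com::node2::::::::
EOF

# Print the NodeId (4th field) of every entry with no host name assigned.
awk -F'::' '$1 == "no_hostname" { print $4 }' /tmp/lshwinfo.out
```

On the sample data this prints SN#5502654, the NodeId of the replaced node.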
Although it is not absolutely required, we recommend that you set the service processor ID to match the node host name. The procedure for doing this can be found in 5.2.8, "System Management hardware configuration" on page 122. To determine which RSA to login to, use a command similar to:
# lsnode -a HWControlPoint node1
Once the service processor has restarted, you should see the no_hostname entry disappear from the lshwinfo output, as in Example 6-2.
Example 6-2: lshwinfo output after resetting the NodeID
[root@master /]# lsnode -a HWControlPoint node1
node1: mgrsa1.cluster.com
# lshwinfo -c mgrsa1.cluster.com -p xseries
#Hostname::PowerMethod::HWControlPoint::NodeId::LParId::HWType::HWModel::HWSerialNum
mgrsa1.cluster.com::xseries::mgrsa1.cluster.com::mgrsa1::::::::
node1.cluster.com::xseries::mgrsa1.cluster.com::node1::::::::
node2.cluster.com::xseries::mgrsa1.cluster.com::node2::::::::
node3.cluster.com::xseries::mgrsa1.cluster.com::node3::::::::
[root@master /]#
If you do not want to change the service processor name, change the HWControlNodeId attribute for the node so that it matches the NodeId column in the lshwinfo output. For example:

# chnode node1 HWControlNodeId=SN#5502654
If you have changed the dial-in user or password on the service processor from the default (not recommended), see Example 6-3 for an example of running systemid to save the user information that will allow CSM remote access to the node.
Example 6-3: Using systemid to store service processor login information
[root@master /]# systemid node1 USERID
Password:
Verifying, please re-enter password:
systemid: Entry created.
[root@master /]#
Confirm the hardware control is functioning correctly by querying the power status of the node, as in Example 6-4. Verify rpower returns the correct power status without error.
Example 6-4: Demonstrating correct power control function
[root@master /]# rpower -n node1 query
node1.cluster.com off
[root@master /]#
Now that the node hardware control has been set up correctly, CSM can automatically capture the MAC address of the first Ethernet card (InstallAdapterMacaddr) using the terminal server. First, we remove the old MAC from the node database using the chnode command, then use the csmsetupks command to capture the new MAC. Example 6-5 shows chnode setting the MAC address to null and csmsetupks booting the node with the special kernel and capturing the MAC address.
Example 6-5: Using csmsetupks for MAC address collection
[root@master /]# chnode node1 InstallAdapterMacaddr=
[root@master /]# csmsetupks -n node1 -x
Setting up PXE.
Collecting MAC Addresses for nodes node1.cluster.com
Attempting to use dsh to collect MAC addresses.
Cannot run dsh on node "node1.cluster.com"
Generating /etc/dhcpd.conf file for MAC address collection.
Setting up MAC Address Collection.
Creating the Ramdisk for MAC Address Collection.
8602+0 records in
8602+0 records out
mke2fs 1.27 (8-Mar-03)
8602+0 records in
8602+0 records out
66.6%
Running getmacs for nodes node1.cluster.com
node1.cluster.com: 00:09:6B:63:27:41
Setting up KickStart.
10684 blocks
Adding nodes to /etc/dhcpd.conf file for KickStart install: node1.cluster.com.
[root@master /]#
We can verify the MAC address was stored correctly in the node database by issuing the lsnode command, as in Example 6-6.
Example 6-6: Using lsnode to verify MAC address capture
[root@master /]# lsnode -a InstallAdapterMacaddr node1
node1: 00:09:6B:63:27:41
[root@master /]#
If the entire node was replaced, re-install the node using installnode, as for hard disk replacement. If the hard disks in the node still have Linux installed (they were swapped over or simply not replaced), the node should only need a reboot to come up into Linux correctly:
# rpower -n node1 reboot
Before you add new nodes into your CSM cluster you must decide which method you will use to add them. CSM supports adding the nodes via a fresh install (KickStart install) or by simply installing CSM onto a previously installed node (CSM only installation).
With a KickStart installation, the new node(s) will be network-installed just as they were when you first obtained the cluster, via network boot and the Red Hat KickStart install system. This will erase any data that was previously on the nodes.
CSM only installation adds nodes to the cluster without a re-install; CSM uses ssh or rsh to log on to the node and install and configure the CSM components. This will not over-write or otherwise affect any existing OS on the nodes, though the nodes must be running a level of Linux supported by the target CSM release.
KickStart installation is the preferred method of adding nodes to an IBM Cluster 1350 or CSM cluster. It will leave the node(s) in a known state; the configuration differences that can occur when performing CSM only installs can cause significant problems.
Although we discuss adding non-xSeries nodes to a CSM cluster, this may not be supported, especially with a Cluster 1350. If you are in any doubt, you should contact the IBM Help Center for support and licensing guidance.
When adding new xSeries nodes to your Cluster 1350, you must set up the BIOS as outlined in 5.3.1, "BIOS settings for compute and storage nodes" on page 132.
This entire procedure has much in common with the original installation. You may want to refer to 5.3, "CSM installation on compute and storage nodes" on page 131 for more information.
Because you are adding new nodes, you must allocate new IP addresses for them, perhaps on both cluster and IPC networks. You may also have added extra management hardware, which requires an IP address. You should allocate IP addresses before you start and ensure they are added in both /etc/hosts on the management node and your name server, as explained in 5.2.6, "Domain Name System (DNS) configuration" on page 114.
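Appending the new allocations to /etc/hosts can be scripted. The sketch below is our own and works entirely on scratch files (the file names and the one-entry-per-line format are our conventions, not CSM requirements); it skips addresses that are already present so it can be re-run safely:

```shell
# Sketch: merge newly allocated addresses into a hosts file,
# skipping any IP address that is already listed.
cat > /tmp/new-nodes.txt <<'EOF'
172.20.3.10 node10.cluster.com node10
172.20.3.11 node11.cluster.com node11
EOF

: > /tmp/hosts.new                     # stand-in for /etc/hosts in this sketch
while read -r line; do
    ip=${line%% *}                     # first field is the IP address
    grep -q "^$ip " /tmp/hosts.new || echo "$line" >> /tmp/hosts.new
done < /tmp/new-nodes.txt
```

Remember that the name server must be updated as well; this sketch covers only the hosts file side.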
If you have added any additional hardware to the cluster to support the new nodes, you must configure it now. This includes RSAs (see 5.2.8, "System Management hardware configuration" on page 122), terminal servers (5.2.7, "Install Terminal Server" on page 114), and Ethernet switches.
When adding new xSeries nodes to your Cluster 1350, you must set up the BIOS and the service processor name. BIOS setup is outlined in 5.3.1, "BIOS settings for compute and storage nodes" on page 132 and service processor setup is explained in 5.2.8, "System Management hardware configuration" on page 122. If you have modified the service processor dial-in settings (we do not recommend this), you must also run systemid, as in Example 6-3 on page 152.
Whether you will be performing a KickStart or CSM only installation, you must first define the nodes in the node database.
Before you start, verify there are no nodes in the PreManagedNodes group by running nodegrp -p PreManagedNodes. The CSM commands we will be using include a -P switch that operates on all currently PreManaged nodes; if the group is empty before you begin, -P will refer only to the newly defined nodes. If there are existing nodes in the PreManagedNodes group, replace -P in the commands below with -n <nodelist>, where <nodelist> is a comma-separated list of the host names of the nodes to be added (for example, node1,node2,node3), or define a group and use -N.
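Building the <nodelist> argument by hand is error-prone for more than a few nodes. A small sketch of our own, assuming you keep the new host names one per line in a file of your choosing:

```shell
# Sketch: turn a file of host names (one per line) into the
# comma-separated <nodelist> form expected by the CSM -n switch.
printf 'node1\nnode2\nnode3\n' > /tmp/newnodes.txt
nodelist=$(paste -sd, /tmp/newnodes.txt)
echo "$nodelist"
```

The resulting value can be passed directly, for example as csmsetupks -n "$nodelist" -x.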
We will be using lshwinfo to create a hostmap file, as described in "First method: Creating a host name mapping (hostmap) file" on page 134. This method will only work with IBM xSeries nodes; if you are adding different node-types, you will need to define the nodes using a nodedef file (see "Second method: Creating a node definition (nodedef) file" on page 135) or with the definenode command line.
lshwinfo will list all the nodes that are attached to one or more RSAs. If you supply it with the host names of any RSAs that have new nodes attached, you can create a hostmap suitable for passing to definenode. If there are existing nodes on the RSAs as well as new nodes, you may want to filter the output of lshwinfo for no_hostname:
# lshwinfo -c mgrsa1,mgrsa2 -p xseries | grep no_hostname > newnodes.map
Define the nodes into CSM using definenode. If you are adding a small number of nodes and are confident with the syntax of definenode, you can use the command line. The InstallMethod attribute should be set to kickstart for KickStart installs, but omitted for CSM only installation. For example:
# definenode -M newnodes.map -C mgesp1:1:0:12 ConsoleMethod=esp \
  PowerMethod=xseries InstallMethod=kickstart
If you are unsure of using definenode with hostmap files or you want to verify your settings, consider adding the -s switch. This will generate a nodedef file on stdout which can be redirected to a file and inspected and edited before passing it back to definenode -f as follows:
# definenode -s -M newnodes.map -C mgesp1:1:0:12 ConsoleMethod=esp \
  PowerMethod=xseries InstallMethod=kickstart > nodedef.new
# vi nodedef.new
# definenode -f nodedef.new
If your new nodes have a console connection, csmsetupks can now be used to collect the MAC addresses. If you have added a small number of nodes, you can run the following:
# csmsetupks -P -x
During the course of our research for this redbook, we found information that might be useful but that we were not able to verify in our test lab (due to power restrictions). If you have more than 20 nodes in your cluster, the following information might prove helpful.
If your nodes do not have a console connection, you will need to collect their MAC addresses manually. Boot the nodes one at a time and watch the syslog output. When the nodes send a DHCPDISCOVER packet, the dhcp server will log this to syslog and you can add it to the node database, as in Example 6-7.
Example 6-7: Using syslog to capture MAC addresses
[root@master /]# rpower -n node1 reset
[root@master /]# tail -f /var/log/messages
July 15 10:37:36 master dhcpd: DHCPDISCOVER from 00:09:6B:63:27:41 via eth0
July 15 10:37:36 master dhcpd: no free leases on subnet 172.20.0.0
^C
[root@master /]# chnode node1 InstallAdapterMacaddr=00:09:6B:63:27:41
[root@master /]#
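Copying MAC addresses out of syslog by hand invites typos. Extracting them can be scripted; the sketch below is our own and parses a sample line in the dhcpd message format shown in Example 6-7:

```shell
# Sketch: pull the MAC address out of a dhcpd DHCPDISCOVER syslog line.
# The sample line mirrors Example 6-7.
line='July 15 10:37:36 master dhcpd: DHCPDISCOVER from 00:09:6B:63:27:41 via eth0'
mac=$(echo "$line" | sed -n 's/.*DHCPDISCOVER from \([0-9A-Fa-f:]*\) via.*/\1/p')
echo "$mac"
```

The extracted value can then be fed to chnode, for example chnode node1 InstallAdapterMacaddr="$mac".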
Once you have collected the MAC addresses of all the nodes, run csmsetupks as follows. Because the MAC addresses are already known, the nodes will not be rebooted:
# csmsetupks -P -x
Now that KickStart has been set up, the nodes can be installed. To install all PreManaged (newly added) nodes, run:
# installnode -P
As with csmsetupks, installnode should not be run for groups of more than 20 nodes at a time. Section 5.3.7, "Running the installnode script" on page 143 and onward contains information on verifying and troubleshooting node installation.
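To stay within the 20-node limit, you can split a long node list into batches. This is a sketch of our own, with echo standing in for the real installnode command so it can be tried anywhere:

```shell
# Sketch: process a node list in batches of at most 20.
# echo stands in for installnode so the sketch runs on any machine.
seq -f 'node%g' 1 45 > /tmp/allnodes.txt

batches=0
while nodes=$(sed -n "$((batches * 20 + 1)),$(((batches + 1) * 20))p" /tmp/allnodes.txt) \
        && [ -n "$nodes" ]; do
    batch=$(printf '%s\n' "$nodes" | paste -sd, -)
    echo "installnode -n $batch"       # replace echo with the real command
    batches=$((batches + 1))
done
```

With 45 sample nodes this produces three batches of 20, 20, and 5 nodes. In practice you would wait for each batch to finish installing before starting the next.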
The process of adding new nodes to a cluster using the CSM only installation process assumes, of course, that the operating system has already been installed. The first step is to gather and define all hardware and network information, as described in "Hardware and network setup" on page 155. Secondly, the node should be made known to CSM and defined into the CSM database. For this, refer to "Defining new nodes into the cluster" on page 155.
Before we jump into the CSM installation procedure itself, we need to perform additional pre-installation steps. When performing a CSM only install, dsh is used to send commands to the nodes for the actual installation. The underlying remote shell (ssh or rsh) must be able to access the node. You can test this by running:
# dsh -n <hostname> date
In the above example, dsh must display the time and date from the remote node without a password prompt. If you are asked for a password, you must configure your remote shell.
Note that the first time you connect to a new host using ssh, you will be prompted to confirm the authenticity of the host; this is normal and will only occur once, as in Example 6-8.
Example 6-8: Using ssh to a new node for the first time
[root@master /]# dsh -n node5.cluster.com date
The authenticity of host 'node5.cluster.com (172.20.3.5)' can't be established.
RSA key fingerprint is 77:4a:6c:22:17:88:e2:28:53:db:66:42:48:ec:33:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node5.cluster.com,172.20.3.5' (RSA) to the list of known hosts.
root@node5.cluster.com's password:
Thu July 24 22:21:54 BST 2003
[root@master /]# dsh -n node5.cluster.com date
Thu July 24 22:21:55 BST 2003
[root@master /]#
The DSH_REMOTE_CMD environment variable determines which remote shell dsh uses. Determine which remote shell is in use by running:
# echo $DSH_REMOTE_CMD
If DSH_REMOTE_CMD is unset, the default is to use ssh.
To configure ssh, add the contents of /root/.ssh/id_rsa.pub from the master to /root/.ssh/authorized_keys on the target node. You can copy and paste the file, but be careful; it's one very long line that must not be broken up.
It may be easier to run the following command, which will prompt you for the node's root password:
# cat /root/.ssh/id_rsa.pub | ssh <hostname> 'mkdir -p /root/.ssh;cat >> /root/.ssh/authorized_keys'
To configure rsh, add the fully qualified cluster internal host name of the master node to /root/.rhosts on the target node. Note that the permissions on a .rhosts file must allow only root to read the file. This is all demonstrated in Example 6-9.
Example 6-9: Configuring remote rsh access
[root@master /]# cat >> /root/.rhosts
master.cluster.com
^D
[root@master /]# chmod 600 /root/.rhosts
[root@master /]# ls -l /root/.rhosts
-rw-------    1 root     root           23 July 15 14:23 .rhosts
[root@master /]#
Once dsh -n to the node works, install CSM on the node using the updatenode command, as shown in Example 6-10.
Example 6-10: Using updatenode to perform a CSM only installation
[root@master /]# lsnode -a Mode node1
node1: PreManaged
[root@master /]# updatenode node1
node1.cluster.com: Setting Management Server to master.cluster.com.
node1.cluster.com: Creating the Autoupdate provides database. This could take a few minutes.
node1.cluster.com: Updating RPMs
node1.cluster.com: Node Install - Successful.
[root@master /]# lsnode -a Mode node1
node1: Managed
[root@master /]# lsnode -p node1
node1: 1
As you can see in Example 6-10, when the node has been installed successfully, its mode should change from PreManaged to Managed.
Removing nodes from the CSM cluster is accomplished using the rmnode command. Single nodes may be deleted by listing them on the command line or groups of nodes may be deleted using the -N switch. Normally, rmnode simply de-configures CSM on the target node(s) and then removes them from the CSM nodes database. The -u switch will also uninstall the CSM RPMs (but not any non-IBM pre-requisites), as in Example 6-11.
Example 6-11: Uninstalling nodes with rmnode -u
[root@master /]# rmnode -u node3
node3.cluster.com: Uninstalling CSM ...
node3.cluster.com: rmnode.client: Cannot erase rpm package csm.director.agent.
Did not uninstall the Open Source prerequisites.
The remote shell (SSH or RSH) have not been unconfigured on the Managed Nodes.
To ensure security, you should unconfigure the remote shell on the nodes manually.
[root@master /]#
Do not worry about the csm.director.agent error message; the package is probably not installed on the node.
If GPFS is installed on the nodes, uninstallation of SRC and RSCT RPMs will fail. These should be removed manually when GPFS is uninstalled.
If you need to change any host names within the cluster, you must update the CSM nodes database and then the CSM configuration on the nodes.
When changing host names, you must ensure that the management node still has remote shell (ssh or rsh) access to the nodes. You may need to edit /root/.rhosts on all the nodes and/or edit the host names in /root/.ssh/known_hosts on the management node. Test dsh -n before attempting to run the updatenode command.
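Removing the stale entry for an old host name from known_hosts can be done with a one-line filter. The sketch below is our own and works on a scratch copy of the file with placeholder key data (ssh-keygen -R <hostname> -f <file> achieves the same on a real known_hosts file):

```shell
# Sketch: drop a stale entry for an old host name from a known_hosts copy.
# The key strings are placeholders, not real keys.
cat > /tmp/known_hosts <<'EOF'
oldname.cluster.com ssh-rsa AAAAexamplekey1
node2.cluster.com ssh-rsa AAAAexamplekey2
EOF
grep -v '^oldname\.cluster\.com ' /tmp/known_hosts > /tmp/known_hosts.new
mv /tmp/known_hosts.new /tmp/known_hosts
```

After this, the first ssh connection to the node under its new name will prompt to accept its key, as in Example 6-8.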
Before you change any host names on the cluster, you should update /etc/hosts and the name server with the new names and/or addresses.
If you have changed the host name of a single node, run the following sequence of commands on the management node:
# chnode <oldhostname> Hostname=<newhostname>
# updatenode -k <newhostname>
The updatenode command will re-sync the CSM information on that node.
If you have changed the host name of the management node, then run the following sequence of commands on the management node:
# chnode -a ManagementServer=<newhostname>
# updatenode -a -k
Here, the updatenode command will re-configure all the nodes in the cluster with the new management node hostname.