5.3 CSM installation on compute and storage nodes

Once the management server is installed, you can perform the installation of the compute and storage nodes. The storage node installation is, for the most part, the same as installing compute nodes. See 5.4, "Special considerations for storage node installation" on page 146 for some additional steps that are required when installing a storage node.

Compute and storage node installations can be done using one of two methods, depending upon their current state:

  • The first method, which will be discussed here, is the full installation. The KickStart install makes no assumptions about the state of the cluster node and performs a complete Linux and CSM installation. This is the appropriate method for a new compute or storage node or if you want to completely reinstall an existing node.

  • The second method is the CSM only installation. In this case, the node already has Linux installed and we just want to install and configure CSM. An overview of this procedure can be found in Chapter 6, "Cluster management with CSM" on page 149.

The following is an overview of installing CSM on compute and storage nodes using the full installation method:

  1. Configure the BIOS on the destination nodes.

  2. Assemble the node attributes using a hostmap file, a nodedef file, or command-line options.

  3. Populate the CSM database with node attributes using the definenode command.

  4. Verify the power method connectivity to all nodes.

  5. Optionally, customize the KickStart and firstboot templates.

  6. Prepare the management node to support installs by running the csmsetupks command.

  7. Perform the compute and storage node installs using the installnode command.

  8. Verify the success of the installation.

The following sections give the details of each one of the above steps.

5.3.1 BIOS settings for compute and storage nodes

You must set the boot order for each node in your cluster via the BIOS menu.

While the node is booting, press the F1 key when you are prompted to enter the BIOS menu. The boot order should be set as follows:

  1. Diskette

  2. CD-ROM

  3. Network

  4. Hard disk 0

The "boot failed count" option should also be disabled. You also need to disable "boot sector virus detection" to prevent false virus indications during operating system installation.

5.3.2 Preparing to run the definenode command

The first step in installing nodes into the cluster is to define the nodes to the management node using the definenode command. This command populates the CSM database with the information required to install and manage the compute and storage nodes.

This information is in the form of node attributes that describe the nodes. These attributes are supplied to the definenode command in one of three ways:

  • The first method is to use a host name mapping (hostmap) file. This is the recommended alternative if you have a large number of nodes to install.

  • The second method is to use a node definition (nodedef) file. We recommend this option for smaller clusters.

  • The third method is using the command line itself to enter the data. This method would be most useful for manipulating individual nodes and will be described in 5.3.3, "Running the definenode script" on page 136.

Additionally, the hostmap file may be used to generate a nodedef file. This is the procedure we will present here.

Several node attributes have valid default values. However, those in Table 5-2 need to be defined using a hostmap file, a nodedef file, and/or the definenode command line, based on your specific cluster implementation.

Table 5-2: Node attribute definitions

Node attribute        Definition
--------------------  ------------------------------------------------------------
Hostname              The name of the node being defined.
HWControlPoint        The name of the device that controls power for this node
                      (for example, rsa001.cluster.com).
HWControlNodeId       The host name of the node in which the RSA card is installed
                      (for example, node1).
PowerMethod           The type of power control being used (for example, xseries).
ConsoleMethod         How to reach the serial console of this node (currently esp
                      or mrv).
ConsoleServerName     The name of the device that provides console connectivity
                      (for example, esp001.cluster.com).
ConsoleServerNumber   For daisy-chained terminal server devices, which device in
                      the chain (starting at 1).
ConsolePortNum        The port on the terminal server that connects to this node
                      (in hex for ESP).
InstallMethod         Set to kickstart for this procedure.

First method: Creating a host name mapping (hostmap) file

Use the lshwinfo command to create a hostmap file. For example, to create a hostmap file called /tmp/hostmap.txt that lists the nodes associated with the hardware control point mgrsa1.cluster.com and the power method xseries, perform the following steps:

  1. Issue the lshwinfo command as follows:

     # lshwinfo -c mgrsa1.cluster.com -p xseries -o /tmp/hostmap.txt 

    The /tmp/hostmap.txt output file would look something like Example 5-23.

    Example 5-23: lshwinfo output

    start example
     [root@master root]# cat /tmp/hostmap.txt
     # Hostname::PowerMethod::HWControlPoint::NodeId::LParId::HWType::HWModel::HWSerialNum
     no_hostname::xseries::mgrsa1.cluster.com::node1::::::::
     no_hostname::xseries::mgrsa1.cluster.com::node2::::::::
     no_hostname::xseries::mgrsa1.cluster.com::node3::::::::
     no_hostname::xseries::mgrsa1.cluster.com::node4::::::::
     [root@master root]#
    end example

    Note 

    If you see a line where the fourth column in the lshwinfo output is a serial number instead of a host name, the ASM service processor ID has not been set correctly. See 5.2.8, "System Management hardware configuration" on page 122 for instructions on how to repair this. Then you can rerun the lshwinfo command to build a corrected hostmap file.

  2. Edit the /tmp/hostmap.txt file to replace the no_hostname references with the correct host names for each node.
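
    If your host names map one-to-one onto the NodeId column (nodeN becomes nodeN.cluster.com, as in our lab), this substitution can also be scripted instead of edited by hand. The following is a minimal sketch; the domain suffix and the scratch file name are assumptions for illustration:

     # awk -F'::' 'BEGIN{OFS="::"} /^#/{print; next} {$1=$4".cluster.com"; print}' /tmp/hostmap.txt > /tmp/hostmap.new
     # mv /tmp/hostmap.new /tmp/hostmap.txt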

As you can see in Example 5-24, the following attributes are defined in the hostmap file:

  • Hostname

  • HWControlPoint

  • HWControlNodeId

  • PowerMethod

Example 5-24: edited lshwinfo output

start example
 # Hostname::PowerMethod::HWControlPoint::NodeId::LParId::HWType::HWModel::HWSerialNum
 node1.cluster.com::xseries::mgrsa1.cluster.com::node1::::::::
 node2.cluster.com::xseries::mgrsa1.cluster.com::node2::::::::
 node3.cluster.com::xseries::mgrsa1.cluster.com::node3::::::::
 node4.cluster.com::xseries::mgrsa1.cluster.com::node4::::::::
end example

However, the following node attributes have not been defined:

  • ConsoleMethod

  • ConsoleServerName

  • ConsoleServerNumber

  • ConsolePortNum

  • InstallMethod

These attributes will need to be specified later on the definenode command line. Optionally, you can use the hostmap file to create a nodedef file and enter these values there manually.

Second method: Creating a node definition (nodedef) file

Tip 

The second method is also the recommended method of defining nodes.

The nodedef file gives you more fine-grained control over the node attributes. It is best to start from the sample provided with the CSM software, which is located in /opt/csm/install/nodedef.sample. You can also generate a nodedef file from a hostmap file using the following command:

 # definenode -M hostmap_file -s > nodedef_file 

where hostmap_file and nodedef_file represent the names of the hostmap and nodedef files. For example:

 # definenode -M /tmp/hostmap.txt -s > /tmp/nodedef.txt 

Note that attributes that are common across all nodes can be placed in the default stanza of your nodedef file. If an attribute appears in both the default stanza and a node's own definition, the node definition takes precedence.
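
For instance, in this hypothetical fragment the default console server applies to every node except node9.cluster.com, which overrides it (the node and server names are purely illustrative):

 default:
      ConsoleMethod=esp
      ConsoleServerName=mgesp1.cluster.com
 node9.cluster.com:
      ConsoleServerName=mgesp2.cluster.com
      ConsolePortNum=0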

Example 5-25 is the nodedef file used in our lab environment.

Example 5-25: Sample nodedef file

start example
 default:
      ManagementServer=master.cluster.com
      HWControlPoint=mgrsa1.cluster.com
      PowerMethod=xseries
      ConsoleSerialDevice=ttyS0
      ConsoleMethod=esp
      ConsoleServerName=mgesp1.cluster.com
      ConsoleServerNumber=1
      InstallOSName=Linux
      InstallCSMVersion=1.3.1
      InstallMethod=kickstart
      InstallDistributionVersion=7.3
 node1.cluster.com:
      HWControlNodeId=node1
      ConsolePortNum=0
 node2.cluster.com:
      HWControlNodeId=node2
      ConsolePortNum=1
 node3.cluster.com:
      HWControlNodeId=node3
      ConsolePortNum=2
 node4.cluster.com:
      HWControlNodeId=node4
      ConsolePortNum=3
end example

By using a nodedef file, whether built from scratch or generated from a hostmap file, you can supply all of the necessary node attributes. This simplifies the usage of the definenode command considerably and provides a documented definition of your node configuration. The nodedef file can then be preserved and used in the future to reinstall CSM on the cluster using the same configuration.

5.3.3 Running the definenode script

Using some or all of the hostmap and nodedef options discussed above, you should now be ready to run the definenode command, which populates the CSM database with the node information.

The following three sections describe functionally equivalent solutions to define nodes in a cluster based on the examples we used in this chapter.

First Method: Defining nodes using the hostmap file

When you use a hostmap file, the following node attributes are not defined in the file and need to be specified on the definenode command line:

  • ConsoleMethod

  • ConsoleServerName

  • ConsoleServerNumber

  • ConsolePortNum

  • InstallMethod

In order to define the nodes using the hostmap file, issue the following command:

 # definenode -M /tmp/hostmap.txt -C mgesp1:1:1:4 InstallMethod=kickstart ConsoleMethod=esp 

Second Method: Defining nodes using the nodedef file

Once you have completed all the required parameters in the nodedef file, you can define the nodes by typing:

 # definenode -f /tmp/nodedef.txt 

Third Method: Using the definenode CLI options

If you are working with a single node or a small group of nodes, you may use the definenode command line to supply all of the node attributes. The necessary command line options are as follows:

  • The first node (host name or IP address) is defined with the -n option.

  • The number of consecutive nodes to define is specified with the -c option.

  • The list of HWControlPoints is set with the -H option. The value should be entered as HWControlPoint:Count, where HWControlPoint is the device that controls power for the nodes, and Count specifies how many nodes are controlled by that device. Specify multiple HWControlPoints by separating them with commas.

  • The list of ConsoleServers is defined by using the -C option. The value should be entered as ConsoleServerName:ConsoleServerNumber:ConsolePortNum:Count, where Count is the number of nodes accessed by the specified ConsoleServer. See Table 5-2 on page 133 for definitions of the remaining fields.

  • Define the PowerMethod directly on the command line by adding PowerMethod=xseries to the end of your definenode command.

  • Likewise, define the InstallMethod attribute the same way by adding InstallMethod=kickstart.

Important: 

Using the command line options, you can only define contiguous nodes at one time. If you need to define nodes from node1 to node10 and node12 to node14, you have to run definenode twice or use a nodedef or hostmap file.
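
As a sketch, the node1 to node10 and node12 to node14 case above could be covered with two invocations; the hardware control point, console server, and console port values here are purely illustrative:

 # definenode -n node1 -c 10 -H mgrsa1:10 -C mgesp1:1:0:10 PowerMethod=xseries InstallMethod=kickstart
 # definenode -n node12 -c 3 -H mgrsa1:3 -C mgesp1:1:b:3 PowerMethod=xseries InstallMethod=kickstart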

For example, in order to define node1, node2, node3, and node4 manually using the command line, run the definenode command as follows:

 # definenode -n node1 -c 4 -H mgrsa1:4 -C mgesp1:1:0:4 PowerMethod=xseries InstallMethod=kickstart 

The output will be similar to Example 5-26; it looks much the same whether definenode is run with command-line options, a nodedef file, or a hostmap file.

Example 5-26: definenode output

start example
 Defining CSM Nodes:
 Defining Node "node1.cluster.com"("172.20.3.1")
 Defining Node "node2.cluster.com"("172.20.3.2")
 Defining Node "node3.cluster.com"("172.20.3.3")
 Defining Node "node4.cluster.com"("172.20.3.4")
end example

Upon successful completion of the definenode command, each new node should be placed into the PreManaged mode. This means the node has been defined to CSM but has not been installed and activated. All nodes in the PreManaged mode dynamically become members of the PreManagedNodes node group.

You can verify that all your nodes are in the PreManagedNodes group using the lsnode -N PreManagedNodes command. Likewise, lsnode -N ManagedNodes will show all nodes that have completed CSM installation. The output of the command is shown in Example 5-27.

Example 5-27: lsnode -N PreManagedNodes

start example
 [root@master /]# lsnode -N PreManagedNodes
 node4.cluster.com
 node3.cluster.com
 node2.cluster.com
 node1.cluster.com
 [root@master /]#
end example
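
It is also worth spot-checking the attributes that were actually stored for a node before going further; lsnode -l prints them in long form. For example:

 # lsnode -l node1.cluster.com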

Troubleshooting definenode

If definenode hangs or fails, it is most often due to host name resolution problems, such as:

  • You entered a node host name that does not exist.

  • You entered an incorrect HWControlPoint host name.

  • You entered an incorrect ConsoleServerName.

In most cases, if you think you entered a wrong definition for the nodes, you can remove them from the PreManagedNodes list with the rmnode -P nodename command and try again.
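
Because these failures are almost always related to name resolution, it can save time to confirm that every host name you supplied actually resolves before rerunning definenode. A quick check, using the names from our lab setup:

 # for h in node1.cluster.com mgrsa1.cluster.com mgesp1.cluster.com; do host $h; done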

5.3.4 Verify that rpower works

Before continuing, be sure that the remote power command is functioning properly by typing:

 # rpower -a query 

The results should be something like Example 5-28.

Example 5-28: rpower command output

start example
 [root@master root]# rpower -a query
 node1.cluster.com on
 node2.cluster.com on
 node3.cluster.com off
 node4.cluster.com on
 [root@master root]#
end example

Do not proceed until this command is working properly; otherwise, the CSM installation scripts used later in the install procedure will fail when they are unable to power cycle the nodes.

5.3.5 Customize the KickStart template (optional)

The next phase of the install uses the /opt/csm/install/kscfg.tmpl.RedHat.7.3 file as a template to generate a KickStart file for each node. You can easily modify it to add packages or modify options. Before changing this file, be sure to make a backup of the original file.

Modifying the KickStart template allows you to customize the following attributes for your compute and storage nodes:

  • Partition configuration

  • Root password

  • Predefined users

  • Services and their configuration

  • Log message destination (that is, to the local node or to the management node)

For example, we would recommend adding the stanza in Example 5-29 to your KickStart file after the Setup services section to automate NTP installation and configuration on all cluster nodes.

Example 5-29: Automate NTP installation using the KickStart template

start example
#
# Set up an ntpd.conf file
#
echo "server $MGMTSVR_HOSTNAME
driftfile /etc/ntp/drift" > /etc/ntp.conf
#
# Set up a step-tickers file
#
echo "$MGMTSVR_IP" > /etc/ntp/step-tickers
#
# Turn on NTP service
#
chkconfig --level 345 ntpd on
#
end example
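
In the same way, other items from the list above can be customized. For example, here is a hypothetical fragment that creates an additional local user and forwards log messages to the management node; the user name is illustrative, and $MGMTSVR_HOSTNAME is the same template variable used in Example 5-29:

 #
 # Create a local administrative user
 #
 /usr/sbin/useradd -m clusteradmin
 #
 # Forward syslog messages to the management node
 #
 echo "*.info @$MGMTSVR_HOSTNAME" >> /etc/syslog.conf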

5.3.6 Running the csmsetupks script

At this point, all nodes should have been defined in the CSM database and be in PreManagedNode status, and any customizations to the KickStart template have been completed. The next step in the installation process is to run the csmsetupks script, which performs the following tasks:

  • Copies the RPMs from the Red Hat Linux media or an alternate directory.

  • Starts DHCP, NFS, and TFTP daemons if they are not already running.

  • Creates an /etc/dhcpd.conf file for MAC address collection and for the full installation of the nodes.

  • Creates the files that are necessary for network boot in /tftpboot.

  • Gets the MAC addresses of all specified nodes.

  • Generates a KickStart configuration file for each node.

  • Creates a firstboot script for each node.

The csmsetupks command accepts the following attribute values:

  • Netmask=xxx.xxx.xxx.xxx

    The network mask for the cluster VLAN, which is used to communicate between the compute nodes and the management node.

  • Gateway=xxx.xxx.xxx.xxx

    The IP address the compute nodes should use as their default gateway.

  • Nameservers=xxx.xxx.xxx.xxx,yyy.yyy.yyy.yyy,...

    Specifies the IP addresses of the DNS name servers the compute nodes should use.

Attention: 

The csmsetupks script uses the Netmask, Gateway, and Nameservers attributes to build the /etc/dhcpd.conf configuration file. If these attributes are not specified on the csmsetupks command line, the default gateway, netmask, and nameserver for the management node will be used. If your management node has a network connection to your internal network (Public VLAN), it is likely that these defaults are not appropriate for the compute nodes. This may cause the csmsetupks script to fail or the nodes to hang during installation.

The csmsetupks command accepts the following command-line options:

-n

Used to specify particular nodes instead of the default, which is all PreManagedNodes that have an InstallMethod of kickstart.

-p

Used to specify the directory that contains the Red Hat Linux RPM files.

-x

Skips copying the Red Hat Linux RPM files (for example, when they have already been copied to /csminstall).

-v

Verbose mode.

In order to prepare the cluster nodes for a KickStart installation, type the following command:

 # csmsetupks -P Netmask=255.255.0.0 Gateway=172.20.0.1 Nameserver=172.20.0.1 

Example 5-30 shows the output of this command.

Example 5-30: csmsetupks

start example
 [root@master /]# csmsetupks -P Netmask=255.255.0.0 Gateway=172.20.0.1 Nameserver=172.20.0.1
 Copying Red Hat Images from /mnt/cdrom.
 Insert Red Hat Linux 7.3 disk 1.  Press Enter to continue.
 Insert Red Hat Linux 7.3 disk 2.  Press Enter to continue.
 Insert Red Hat Linux 7.3 disk 3.  Press Enter to continue.
 Setting up PXE.
 Collecting MAC Addresses for nodes node4.cluster.com
 Attempting to use dsh to collect MAC addresses.
 Cannot run dsh on node "node1.cluster.com"
 Cannot run dsh on node "node2.cluster.com"
 Cannot run dsh on node "node3.cluster.com"
 Cannot run dsh on node "node4.cluster.com"
 Generating /etc/dhcpd.conf file for MAC address collection.
 Setting up MAC Address Collection.
 Creating the Ramdisk for MAC Address Collection.
 8602+0 records in
 8602+0 records out
 mke2fs 1.27 (8-Mar-2003)
 8602+0 records in
 8602+0 records out
  66.6%
 Running getmacs for nodes node1.cluster.com node2.cluster.com node3.cluster.com node4.cluster.com
 node1.cluster.com: 00:09:6B:63:27:41
 node2.cluster.com: 00:02:55:67:57:CF
 node3.cluster.com: 00:02:55:C6:59:FD
 node4.cluster.com: 00:02:55:C6:5A:05
 Setting up KickStart.
 10684 blocks
 Adding nodes to /etc/dhcpd.conf file for KickStart install: node1.cluster.com node2.cluster.com node3.cluster.com node4.cluster.com.
end example

The csmsetupks script first tries a dsh (ssh) connection to gather the MAC addresses. If the nodes have already been installed (for example, when you re-install nodes or run the csmsetupks command again), their MAC addresses are already stored in the PreManagedNode database and csmsetupks does not attempt to collect them again.

Any nodes whose MAC addresses could not be obtained from the database or through dsh are rebooted by csmsetupks, which then connects to them through the serial console and captures the MAC addresses.

If, for some reason, csmsetupks is unable to capture the MAC addresses from some nodes, you can record the addresses manually in the node configuration database using the chnode command. The MAC address of a node can be gathered with either of these methods:

  • Capture the address as it is displayed on the console when the node first boots.

  • Monitor /var/log/messages on the management node as you boot the node in question. The node will attempt to boot via DHCP, and the dhcpd daemon on the management node will log the attempt as a DHCPDISCOVER, including the MAC address.

Once you have obtained the MAC addresses that csmsetupks failed to collect, run the following command for each of those nodes:

 # chnode -n <nodename> InstallAdapterMacaddr=<MACaddr> 

where <nodename> and <MACaddr> are the host name and MAC address of the node you want to define.
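
As an illustration of the second approach, using the node3 address captured in Example 5-30 (your log lines and values will differ):

 # grep DHCPDISCOVER /var/log/messages | tail -1
 # chnode -n node3.cluster.com InstallAdapterMacaddr=00:02:55:C6:59:FD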

Troubleshooting csmsetupks

The installation can hang during csmsetupks processing at different points.

If you run into a problem, the first thing to do is to restart the command with the -v option (verbose) and see if it helps you determine what has failed. If csmsetupks has already successfully copied the Red Hat Linux 7.3 RPMs to the /csminstall directory, you should use the -x option when retrying the command so the RPMs will not be copied again unnecessarily.

Tip 

If you forget to use the -x option when retrying the csmsetupks command, you can simply press Control-C when it prompts for the first Red Hat Linux disk, and then retry the command with -x.
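
A retry that skips the RPM copy and runs verbosely would then look like this, reusing the attribute values from the earlier example:

 # csmsetupks -P -x -v Netmask=255.255.0.0 Gateway=172.20.0.1 Nameserver=172.20.0.1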

5.3.7 Running the installnode script

At this point, all the pre-installation work is now complete, and all that is left to do is run the installnode script, which will actually install Linux and CSM on the specified nodes.

To install the nodes, type:

 # installnode -P 

The -P argument tells installnode to install all nodes that are in PreManaged mode and whose InstallMethod is set to kickstart, as shown in Example 5-31.

Example 5-31: KickStart installnode

start example
 [root@master /]# installnode -P
 Nodes to install:
         node1.cluster.com
         node2.cluster.com
         node3.cluster.com
         node4.cluster.com
 Rebooting Node for full install: node1.cluster.com
 Rebooting Node for full install: node2.cluster.com
 Rebooting Node for full install: node3.cluster.com
 Rebooting Node for full install: node4.cluster.com
 Status of nodes after the install:
 Node                           Status
 ---------------------------------------------------------
 node1.cluster.com            Rebooting and Installing Node.
 node2.cluster.com            Rebooting and Installing Node.
 node3.cluster.com            Rebooting and Installing Node.
 node4.cluster.com            Rebooting and Installing Node.
 [root@master /]#
end example

Note 

The installnode command will return to the prompt before all nodes have completed their installations. This is not an error.

Each node will reboot and run a KickStart install via NFS access to the management server. Then the node will again reboot and run the post-installation script, which will actually install the CSM software. While this script is finishing, it may take a few moments for the node to properly respond to cluster commands, such as dsh.

In order to follow the installation progress on each node, you can either run the rconsole -N PreManagedNodes command in an xterm window on the management node, or run the monitorinstall command to display a summary of the installation status. Example 5-32 shows a sequence of monitorinstall commands tracking the installation progress of our cluster.

Example 5-32: monitorinstall output sequence

start example
 [root@master /]# monitorinstall
 Node                           Status
 ---------------------------------------------------------
 node1.cluster.com            Rebooting and Installing Node.
 node2.cluster.com            Rebooting and Installing Node.
 node3.cluster.com            Rebooting and Installing Node.
 node4.cluster.com            Rebooting and Installing Node.
 [root@master /]# monitorinstall
 Node                           Status
 ---------------------------------------------------------
 node1.cluster.com            Rebooting to hard disk
 node2.cluster.com            Rebooting to hard disk
 node3.cluster.com            Rebooting and Installing Node.
 node4.cluster.com            Rebooting to hard disk
 [root@master /]# monitorinstall
 Node                           Status
 ---------------------------------------------------------
 node1.cluster.com            Installed
 node2.cluster.com            Rebooting to hard disk
 node3.cluster.com            Starting makenode to install CSM RPMs
 node4.cluster.com            Starting makenode to install CSM RPMs
 [root@master /]# monitorinstall
 Node                           Status
 ---------------------------------------------------------
 node1.cluster.com            Installed
 node2.cluster.com            Installed
 node3.cluster.com            Installed
 node4.cluster.com            Installed
 [root@master /]#
end example

Troubleshooting installnode

If you run into trouble during the installnode process, you can check the log file generated by the installnode command and the status log files generated for each node to help you determine what went wrong.

The installnode log file is /var/log/csm/installnode.log. This file contains all the output you would see on your console if you started installnode with the -v option.

The node status files are located in the /tftpboot/status directory. To see where the installation stopped and which tasks have completed, you can view the files directly or use the monitorinstall -l command to display all of the node status files.
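
For example, to follow the install log as it is written and then dump every node status file:

 # tail -f /var/log/csm/installnode.log
 # monitorinstall -l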

Also, you may want to recheck your definenode command to be sure you entered all information correctly. Something as simple as a typographical error in the InstallMethod field will prevent installnode from restarting the nodes.

5.3.8 Verifying compute and storage node installation

This section shows some basic commands to verify that the cluster is correctly installed.

  • To verify all nodes are reachable via the network, type lsnode -p. Nodes that return a 1 are present and accounted for. The output is shown in Example 5-33.

    Example 5-33: lsnode -p output

    start example
     [root@master /]# lsnode -p
     node1:  1
     node2:  1
     node3:  1
     node4:  1
     [root@master /]#
    end example

  • Verify that dsh is working by running the date command on all nodes with dsh -a date. The output is shown in Example 5-34.

    Example 5-34: dsh -a date output

    start example
     [root@master /]# dsh -a date
     node1.cluster.com: Mon July  19 23:42:09 CST 2003
     node3.cluster.com: Mon July  19 23:42:09 CST 2003
     node2.cluster.com: Mon July  19 23:42:09 CST 2003
     node4.cluster.com: Mon July  19 23:42:09 CST 2003
    end example

  • To verify the power status of all nodes, type rpower -a query. The output is shown in Example 5-35.

    Example 5-35: rpower -a query output

    start example
     [root@master /]# rpower -a query
     node1.cluster.com on
     node2.cluster.com on
     node3.cluster.com on
     node4.cluster.com on
     [root@master /]#
    end example
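
As an additional spot check, you can confirm that the CSM client packages actually landed on every node. This assumes the client RPM is named csm.client, as it is in the CSM 1.3 packaging; adjust the name if your CSM level differs:

 # dsh -a rpm -q csm.client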

5.3.9 Configuring NTP on your compute and storage nodes

If you have not already configured NTP on your compute and storage nodes using KickStart, as demonstrated in 5.3.5, "Customize the KickStart template (optional)" on page 139, you'll need to manually configure them now.

In order to manually configure the nodes, two files need to be created:

  • /etc/ntp.conf

  • /etc/ntp/step-tickers

Example 5-36 shows the content of the /etc/ntp.conf file.

Example 5-36: /etc/ntp.conf for compute and storage nodes

start example
 server master.cluster.com
 driftfile /etc/ntp/drift
end example

Example 5-37 shows the content of the /etc/ntp/step-tickers file. It should be a single line containing the cluster VLAN IP address of your management node.

Example 5-37: /etc/ntp/step-tickers content

start example
 172.20.0.1 
end example

On each compute and storage node, type the following commands to start the NTP service:

 # service ntpd start
 # chkconfig --level 345 ntpd on
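
Alternatively, the same configuration can be pushed to all nodes at once from the management node with dsh. The following is a sketch, assuming the management node's cluster VLAN address is 172.20.0.1 as shown above:

 # dsh -a 'echo -e "server master.cluster.com\ndriftfile /etc/ntp/drift" > /etc/ntp.conf'
 # dsh -a 'echo 172.20.0.1 > /etc/ntp/step-tickers'
 # dsh -a 'chkconfig --level 345 ntpd on; service ntpd start'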

Your compute and storage nodes are now fully installed.


