6.4 Running commands on the nodes


When working on a cluster, the administrator usually wants to run the same command across a number of nodes, often the entire cluster. CSM provides two ways of doing this: a simple command-line tool, dsh, and a Java GUI, DCEM.

6.4.1 Distributed shell (dsh)

The dsh utility is a lightweight command-line tool that allows parallel commands to be easily issued from the terminal.

Example 6-23 shows the use of the date command with dsh to find out the time on all the machines in the cluster.

Example 6-23: Displaying the time on all nodes in the cluster using dsh

start example
 [root@master /]# dsh -a date
 node4.cluster.com: Wed July  9 18:10:04 CDT 2003
 node3.cluster.com: Wed July  9 18:10:04 CDT 2003
 storage1.cluster.com: Wed July  9 18:10:04 CDT 2003
 node1.cluster.com: Wed July  9 18:10:04 CDT 2003
 node2.cluster.com: Wed July  9 18:10:04 CDT 2003
 [root@master /]#
end example

Note that the responses do not come back in any particular order. Example 6-24 shows how the sort command can be used to arrange the output of dsh in node order.

Example 6-24: Formatting dsh output with sort

start example
 [root@master /]# dsh -a date | sort
 node1.cluster.com: Wed July  9 18:10:44 CDT 2003
 node2.cluster.com: Wed July  9 18:10:44 CDT 2003
 node3.cluster.com: Wed July  9 18:10:44 CDT 2003
 node4.cluster.com: Wed July  9 18:10:44 CDT 2003
 storage1.cluster.com: Wed July  9 18:10:44 CDT 2003
 [root@master /]#
end example

Often, it is more useful to see which nodes return similar or different values. CSM provides dshbak for this purpose.

In Example 6-25, all the nodes produce the same output. Example 6-26 shows what happens when the outputs differ.

Example 6-25: Using dshbak to format the output of dsh

start example
 [root@master /]# dsh -a date | dshbak -c
 HOSTS -------------------------------------------------------------------------
 node1.cluster.com, node2.cluster.com, node3.cluster.com, node4.cluster.com, storage1.cluster.com
 -------------------------------------------------------------------------------
 Wed July  9 18:11:26 CDT 2003
 [root@master /]#
end example

Example 6-26: dsh and dshbak with different outputs

start example
 [root@master /]# dsh -a ls -l /tmp/file | dshbak -c
 HOSTS -------------------------------------------------------------------------
 node2.cluster.com, node3.cluster.com, node4.cluster.com, storage1.cluster.com
 -------------------------------------------------------------------------------
 -rw-r--r--    1 root     root            0 July  9 18:17 /tmp/file
 HOSTS -------------------------------------------------------------------------
 node1.cluster.com
 -------------------------------------------------------------------------------
 -rw-r--r--    1 root     root            4 July  9 18:17 /tmp/file
 [root@master /]#
end example
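
The grouping shown in Examples 6-25 and 6-26 can be approximated with a few lines of awk. The following is a minimal sketch and not dshbak's actual implementation: the function name group_by_output is hypothetical, the header format is simplified, and it assumes each node returns exactly one line in the "hostname: output" form that dsh prints.

```shell
#!/bin/sh
# Sketch only: approximates the grouping that `dshbak -c` performs.
# Assumes one line of output per node, in the form "hostname: output".
group_by_output() {
    awk '{
        host = $1
        sub(/:$/, "", host)          # strip the trailing colon from the host name
        out = $0
        sub(/^[^ ]+ /, "", out)      # keep everything after the host prefix
        if (out in hosts) hosts[out] = hosts[out] ", " host
        else { hosts[out] = host; order[++n] = out }
    }
    END {
        # Print each distinct output once, preceded by the hosts that produced it
        for (i = 1; i <= n; i++) {
            print "HOSTS: " hosts[order[i]]
            print order[i]
        }
    }'
}

# Sample input standing in for real dsh output (made-up values)
printf '%s\n' \
    'node1.cluster.com: size 4' \
    'node2.cluster.com: size 0' \
    'node3.cluster.com: size 0' | group_by_output
```

On a live cluster the sample printf would be replaced by a dsh invocation, but for everyday use dshbak itself remains the better choice because it also handles multi-line output per node.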

Note that in all the above examples, the piped commands (sort and dshbak) run on the management node. Many shell meta-characters, including pipe (|), semicolon (;), and redirection (<, > and >>), must be enclosed within quotes if you want the operation to occur on the cluster nodes instead of locally on the management node. For example:

 # dsh -av 'rpm -aq | grep glibc' 
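
To see why the quotes matter, it helps to look at what the local shell actually hands to a command. The sketch below uses a hypothetical stand-in function, show_args, in place of dsh; it only reports its arguments and runs nothing remotely.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for dsh: it only reports the
# arguments it was handed, so we can see what the local shell passed along.
show_args() {
    echo "received $# argument(s): $*"
}

# Unquoted: the local shell splits the line at the pipe, so show_args sees
# only "rpm -aq", and its output is piped into a tr that runs locally.
show_args rpm -aq | tr 'a-z' 'A-Z'

# Quoted: the whole pipeline arrives as a single argument, which dsh
# would pass on to the remote shell on each node.
show_args 'rpm -aq | grep glibc'
```

The first call reports two arguments and has its output upper-cased locally; the second reports a single argument that still contains the pipe.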

Tip 

If you need to dsh a command that includes special characters but you are unsure how to quote them correctly, create a script file in a shared directory and use dsh to run the script file. Alternatively, use DCEM, which does not suffer from the same special character problems (see 6.4.2, "Distributed command execution manager (DCEM)").
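
The script-file approach might look like the following sketch. The function name write_glibc_check, the script name glibc-check.sh, and the directory /nfs/scripts are all assumptions made for illustration; substitute a directory that is really mounted on every node. The dsh call is skipped when dsh or the directory is not present.

```shell
#!/bin/sh
# write_glibc_check DIR: write a small script into DIR and make it executable.
# The script name and its contents are illustrative assumptions.
write_glibc_check() {
    cat > "$1/glibc-check.sh" <<'EOF'
#!/bin/sh
# The pipe needs no extra quoting because it lives inside the script.
rpm -aq | grep glibc
EOF
    chmod +x "$1/glibc-check.sh"
}

# SHARED_DIR is hypothetical; it must be a directory mounted on every node.
SHARED_DIR="${SHARED_DIR:-/nfs/scripts}"
if command -v dsh >/dev/null 2>&1 && [ -d "$SHARED_DIR" ]; then
    write_glibc_check "$SHARED_DIR"
    dsh -av "$SHARED_DIR/glibc-check.sh"
fi
```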

A commonly used feature of dsh is the -v switch, which verifies (based on lsnode -p) each node's availability before connecting. This avoids waiting for the underlying remote shell (rsh or ssh) to time out. Example 6-27 shows what happens when dsh -v is used and a node is not responding.

Example 6-27: dsh -v with a down node

start example
 [root@master /]# dsh -av date
 dsh: node4.cluster.com Host is not responding. No command will be issued to this host
 node1.cluster.com: Wed July  9 18:25:43 CDT 2003
 node2.cluster.com: Wed July  9 18:25:43 CDT 2003
 node3.cluster.com: Wed July  9 18:25:43 CDT 2003
 storage1.cluster.com: Wed July  9 18:25:43 CDT 2003
 [root@master /]#
end example

Performing a large number of operations simultaneously can cause problems, such as placing excessive load on a file server. By default, dsh will attempt to run the specified commands in parallel on up to 64 nodes. This "fan-out" value may be changed by setting the DSH_FANOUT environment variable or by using the -f switch to dsh:

 # dsh -avf 16 rpm -i /nfs/*.rpm 
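
The idea behind a fan-out limit can be illustrated without a cluster. The sketch below is not how dsh is implemented; it uses xargs -P (available in GNU and BSD xargs, though not required by POSIX) to cap the number of commands running at once. The function name run_with_fanout is hypothetical, and echo stands in for the remote command.

```shell
#!/bin/sh
# run_with_fanout N CMD: read host names on stdin and run CMD once per host,
# with at most N instances running at a time. CMD sees the host name as $1.
run_with_fanout() {
    fanout=$1
    cmd=$2
    # -n 1 passes one host per invocation; the trailing "sh" becomes $0
    xargs -P "$fanout" -n 1 sh -c "$cmd" sh
}

# echo stands in for the remote command; output order is not guaranteed
# when jobs run in parallel, so sort restores node order.
printf '%s\n' node1 node2 node3 node4 |
    run_with_fanout 2 'echo "$1: done"' | sort
```

As with dsh, results from parallel jobs can arrive in any order, which is why the output is piped through sort.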

6.4.2 Distributed command execution manager (DCEM)

In contrast to the lightweight command line tool dsh, DCEM is a Java GUI that performs a similar task. DCEM allows you to construct command specifications for execution on multiple target machines, providing real-time status as commands are executed. You can enter the command definition, run-time options, and selected hosts and groups for a command. You have the option of saving this command specification to use in the future. You can create and modify groups of hosts to use as targets for a command directly from DCEM.

Start DCEM from the command line by running:

 # dcem 

Figure 6-1 shows the xosview windows from all four of our compute nodes displayed on the GNOME desktop of our management node.

Figure 6-1: DCEM - xosview from all nodes

The logs are saved in /root/dcem/logs and the commands in /root/dcem/scripts.

Example 6-28: DCEM logs

start example
 TIME:      July 19 23:42:58.581
 INFO:      Command Name:xterm
 Command: xterm -display master:0.0
 Successful Machines: node2.cluster.com node1.cluster.com node3.cluster.com node4.cluster.com
 Failed Machines:
 TIME:      July 19 23:44:01.444
 INFO:      Command Name:gnome-term
 Command: gnome-terminal -display master:0.0
 Successful Machines: node2.cluster.com node1.cluster.com node3.cluster.com node4.cluster.com
 Failed Machines:
 TIME:      July 19 23:45:45.385
 INFO:      Command Name:nxterm
 Command: nxterm -display master:0.0
 Successful Machines: node1.cluster.com node2.cluster.com node3.cluster.com node4.cluster.com
 Failed Machines:
end example

For a complete description of the Distributed Command Execution Manager functions, refer to the IBM Cluster Systems Management for Linux: Administration Guide, SA22-7873.






Linux Clustering with CSM and GPFS
ISBN: 073849870X
EAN: 2147483647
Year: 2003
Pages: 123
Authors: IBM Redbooks
