Managing a large network can be a daunting task. Even with the Unix utilities available for remote administration, making changes on many systems can be taxing. Scripting tools make life easier to some extent, but some tasks require hands- and eyes-on interaction.
Several system utilities allow you to execute the same command on multiple hosts. This form of loosely coupled clustering is useful for information gathering and some monitoring purposes. However, on some occasions, you not only need to run a process on multiple hosts, but you must also observe it and interact with the process to resolve host-specific issues. An administration shell script will save typing and minimize mistakes, but it's hard to write a script that will work correctly on every machine on a diverse network.
Wouldn't it be nice if there were a program that allowed you to interact with your remote hosts while running parallel commands? Enter ClusterIt.
5.13.1 Why ClusterIt?
ClusterIt is a set of tools written by Tim Rightnour, designed to place all of your network hosts at your fingertips. ClusterIt includes utilities for running a single command on all of the hosts in your cluster. It can also distribute tasks automatically to any available hosts in a defined group. ClusterIt relies on a remote login service, such as sshd, running on the target hosts, so you only need to install ClusterIt itself on the control host.
Scripts can also synchronize between task completions on different hosts. For example, you can set two hosts to compile an application and install it on the other machine. Neither host should begin the installation until the other host has finished compiling, but it is impossible to predict which host will finish first. ClusterIt defines barrier operations that can be included in a script to prevent passing a synchronization point until all hosts have caught up.
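The idea behind a barrier can be sketched locally with plain shell job control. This is only an illustration of the synchronization concept, not ClusterIt's own barrier tooling: the function names (compile_step, install_step, run_with_barrier), the host names, and the sleep durations are hypothetical stand-ins for real work on two machines.

```shell
#!/bin/sh
# Local illustration of a barrier: two "hosts" (background jobs) compile
# at different speeds, and neither install step starts until both finish.

compile_step() {
    # $1 = hypothetical host name, $2 = simulated compile time in seconds
    sleep "$2"
    echo "$1: compile done"
}

install_step() {
    # $1 = hypothetical host name
    echo "$1: installing"
}

run_with_barrier() {
    compile_step hostA 1 &
    compile_step hostB 0 &
    wait    # barrier: block here until every background compile has exited
    install_step hostA
    install_step hostB
}
```

Calling run_with_barrier prints both "compile done" lines before either "installing" line, no matter which simulated host finishes first.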
In most clustering systems for Unix, once you issue a command, you cannot interact with the hosts in the cluster individually; you only see the final output of each command run on each of the hosts. ClusterIt does not have this limitation, making it ideal for dealing with processes that need continual monitoring.
5.13.2 Installation and Configuration
Install ClusterIt from the NetBSD pkgsrc collection:
# cd /usr/pkgsrc/parallel/clusterit
# make install clean
It is also available in FreeBSD's /usr/ports/parallel/clusterit.
Before using any ClusterIt utility, you must create a list of machines in your cluster. Create the file ~/.cluster, containing a list of host names. Be sure not to put any whitespace after GROUP:, as in this example:
GROUP:setB
Bester
Brust
GROUP:setOther
Clarke
Dick
Niven
Pohl
Zelazny
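Because stray whitespace is easy to miss by eye, a small check can help. The following is a minimal sketch, not part of ClusterIt; the function name check_cluster_file is hypothetical, and it interprets the warning above as "no whitespace immediately after the colon in GROUP:".

```shell
#!/bin/sh
# Report lines in a ClusterIt host list that have whitespace right after
# "GROUP:". Offending lines are printed with their line numbers.
# Returns 1 if any such line is found, 0 otherwise.
check_cluster_file() {
    file=$1
    if grep -n 'GROUP:[[:space:]]' "$file"; then
        return 1
    fi
    return 0
}
```

For example, check_cluster_file "$HOME/.cluster" && echo "no stray whitespace found" confirms the file before you point ClusterIt at it.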
Set an environment variable to tell ClusterIt where to find the list of hosts, and set two more to specify ssh as the tool to start remote shells and terminals. Run these from the command line or add them to your shell startup file [Hack #1]. The export syntax shown here is for Bourne-style shells; under csh or tcsh, put the equivalent setenv commands in ~/.cshrc:
% export CLUSTER=$HOME/.cluster
% export RCMD_CMD=ssh
% export RLOGIN_CMD=ssh
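Before moving on, it can be worth confirming that the three variables are usable. The sketch below is an assumption-laden convenience, not something ClusterIt ships: the function name check_clusterit_env is hypothetical, and it assumes RCMD_CMD and RLOGIN_CMD each hold a bare command name such as ssh.

```shell
#!/bin/sh
# Verify that CLUSTER names a readable file and that RCMD_CMD and
# RLOGIN_CMD each name a command that can be found in PATH.
# Prints a message to stderr for each problem; returns 1 if any were found.
check_clusterit_env() {
    status=0
    if [ -z "${CLUSTER:-}" ] || [ ! -r "$CLUSTER" ]; then
        echo "CLUSTER is unset or does not point to a readable file" >&2
        status=1
    fi
    for var in RCMD_CMD RLOGIN_CMD; do
        # Indirectly read the value of the variable whose name is in $var
        eval "val=\${$var:-}"
        if [ -z "$val" ] || ! command -v "$val" >/dev/null 2>&1; then
            echo "$var is unset or names a command not found in PATH" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_clusterit_env in a fresh login shell shows whether the settings from your startup file actually took effect.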
5.13.3 Testing Noninteractive Commands
Now you're ready to issue commands to the cluster. You can run simple commands that require no interactivity from the command line with the dsh (distributed shell) command. Let's start by checking the version of the operating system on each of the hosts in a group:
% dsh -g setB uname -a
Bester: SunOS bester 5.7 Generic_106541-11 sun4u sparc SUNW,UltraSPARC-IIi-Engine
Brust: NetBSD brust 1.6ZC NetBSD 1.6ZC (GENERIC.MP) #1: Fri Sep 26 23:33:56 EDT 2003 david@pohl:/usr/obj/usr/src/sys/arch/i386/compile/GENERIC.MP i386
The -g groupname option specifies which hosts in the cluster should run this command. Every ClusterIt command allows you to specify a list of hosts, a named group of hosts, the entire cluster, or any of those options minus a list of excluded hosts.
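As a rough sketch of the "group minus excluded hosts" form, the wrapper below combines -g with an exclusion list. The -x option (a comma-separated list of hosts to skip) is taken from the dsh manual page as an assumption here; verify it against dsh(1) for the version you have installed. The function name dsh_group_except is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command on every host in a group except
# the ones listed.
#   $1 = group name, $2 = comma-separated hosts to exclude, rest = command
# Assumes dsh accepts -g (group) and -x (exclude); check dsh(1) to confirm.
dsh_group_except() {
    group=$1
    exclude=$2
    shift 2
    dsh -g "$group" -x "$exclude" "$@"
}
```

With the example host list, dsh_group_except setOther Dick,Niven uname -a would query only Clarke, Pohl, and Zelazny.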
As you can see, not much can go wrong with the uname command. Interestingly, the two hosts that I've chosen to use for examples are running different operating systems.
5.13.4 Using dvt
Many maintenance operations require different steps on machines running different operating systems. This is where dvt (distributed virtual terminal) shines: this ClusterIt command allows you to interact with several hosts simultaneously or individually.
Suppose that I want to install a Perl module on both of these example machines. First, I'll open the distributed terminals:
% dvt -g setB
Three terminal windows have opened on my screen: one window for each of the two hosts and one control window. Anything I type in the control window goes to all of the host windows, as if I had typed the same thing in each one. (I can also type within an individual host window, which will send my input only to that particular host.)
I have windows open to the hosts in the group now, but I'll need to be root to install the module.
In the control window, I'll type su. If the root password is the same on all the hosts, I can type it everywhere at once by typing in the control window. If the passwords differ between hosts, I'll have to activate each host window in turn, typing the appropriate password in each one.
For simplicity, imagine I've already copied the module to my home directory on each host. I now need to un-tar it, run Perl on the Makefile.PL, run make, and run make install:
# tar xzvf Perl-Package-1.0.tgz && cd Perl-Package-1.0 && perl \
  Makefile.PL && make && make install
If I knew that this command would work without any errors, I could have used dsh instead. However, any number of differences between these two machines could cause one or both to fail to complete this process. This Perl package may not have been tested on Solaris yet, or either machine could be missing some prerequisite package.
Since each host has its own window that I can view and type into, I can monitor the progress of the installation. If either host encounters a problem, I can focus my mouse on that window and manually correct and continue the process, without interfering with the other host.
5.13.5 Hacking the Hack
This technique is useful in several other situations. You can monitor a set of hosts by running ps, who, or top in several windows. You can diagnose network issues by running tcpdump on the source host, destination host, and any machines routing the packets in between the two.
An interesting way to troubleshoot networking is to have every host in your cluster ping or traceroute to the problem host. The missing route or mistyped filter rule quickly becomes obvious.
A sysadmin must troubleshoot all sorts of issues, including diagnosing name service troubles, NFS mount permissions, sysctl values, disk space, routing tables, backups, and logfiles. You can solve these problems more easily when you have a consolidated view of your systems.
5.13.6 See Also