Understanding ClusterKnoppix


ClusterKnoppix provides an easy-to-use openMosix master node and enables you to PXE boot your slave nodes. It was developed by Wim Vandersmissen and is available via its home page at http://bofh.be/clusterknoppix/.

OpenMosix is a clustering technology that enables a set of nodes to transparently share processes. If you start a bunch of processes that swamp one node, it passes them out to other nodes to share the load. This includes passing around IPC (Inter-Process Communication) file descriptors. You may know already just how involved this is. If you don't, just realize that it's really, really tricky.

Because openMosix enables any arbitrary process to be migrated, it provides a lot more power for general use. This lowers the bar for parallel programming: Any program that can split itself up into multiple processes can now take advantage of the cluster.

Are you ready to tackle this more powerful, slightly more complicated clustering technique? Then read on.

Setting Up ClusterKnoppix

ClusterKnoppix, once it boots up, drives much like any other Knoppix. Begin by starting the openMosix Terminal Server (from the KDE menu, select KNOPPIX > Services > Start openMosix Terminal Server). This starts a set of prompts, much like the setup for ParallelKnoppix. Following the defaults is a good choice, but don't start the client nodes just yet; there's a bit of command-line work to do first.

When running openMosix under ClusterKnoppix, you must do more manual bookkeeping up front. You need to run several commands to get the master node's openMosix setup running before you bring up the slave nodes. Otherwise, you'll have a cluster with no detected nodes (and a cluster of a single machine won't do you much good!). Start either omdiscd or tyd. These are openMosix discovery daemons that enable openMosix slave nodes to be "discovered" by the master node. omdiscd is a broadcast discovery daemon; it sends out broadcast packets to any machine on the network, asking "Are you my peer?" This can lead to some (serious) security problems, but on a private network it's not an issue. tyd uses (I am not making this up) the Terrence and Phillip protocol to ensure security. Basically, it uses unicast (directed) packets to keep things in sync. It also encrypts your data before it is sent, ensuring that nothing sensitive is visible on the wire.

Because tyd is more security conscious, it insists on a couple of things that you may or may not have in your network: a default gateway that works and a good grasp of iptables (the Linux kernel's software firewall and general packet-mangling framework). The latter is a real problem because the default rules tyd wants to use are overly restrictive. They make PXE booting slave nodes nearly impossible. To avoid the problem, always start tyd by telling it to initialize the packet filtering rules, and then to turn them off. This is done with the single command sudo tyd -f init -f off, as shown in the following steps.

I can see every foot of every network cable between my switch and my machines at home, so I just use omdiscd, with my laptop plugged in to the network, acting as a gateway. I use its wireless card as my outward-facing interface. Because I trust my network, and I control the firewall between my network and the outside world, I feel very safe using omdiscd. If I were running ClusterKnoppix in a computer lab, though, I would opt for the more secure, but more complicated, tyd setup.

You can use omdiscd and tyd together, which means there are no real problems with running one right after the other. Whichever you use, the following instructions should work for you. With the openMosix Terminal Server open, perform the following steps:

  1. Open a new shell terminal.

  2. Enter the following command:

     sudo omdiscd

     or, for the more secure setup:

     sudo tyd -f init -f off

  3. After the daemon starts, open the openMosix Monitoring application (click on the little white penguin on the KDE panel). Your screen should look similar to the one shown in Figure 7-3.

    Figure 7-3: After booting the slave nodes

  4. Boot up the slave nodes. As they join the cluster, they're appended to the list of available nodes.
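
The choice in step 2 can be written down as a small helper. This is only a sketch: the function name and the labels "trusted" and "shared" are invented here for illustration, and the function prints the command rather than running it.

```shell
#!/bin/sh
# Print the discovery-daemon command for a given kind of network.
# "trusted" and "shared" are hypothetical labels made up for this sketch.
discovery_command() {
    case "$1" in
        trusted) echo "sudo omdiscd" ;;
        shared)  echo "sudo tyd -f init -f off" ;;
        *)       echo "usage: discovery_command trusted|shared" >&2; return 1 ;;
    esac
}

# Show the command for a private home network; nothing is executed here.
discovery_command trusted
```

Printing the command instead of running it keeps the decision visible, which is handy if you later copy it into a startup script for the master node.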

After all the nodes are up and in the list, you can move on to the fun part: running applications on the cluster.

Note 

One quick aside on monitoring here. My kitchen cluster is made up of less-than-ideal hardware. I don't like running the overhead of the GUI monitor all the time, so I use mosmon on the command line. You can see it peeking out of the background in some of the screenshots; it's a great tool, and gives you a quick idea of your cluster's CPU utilization.

Exploring ClusterKnoppix Applications

Several ClusterKnoppix applications are available, although this chapter only introduces you to two: POV-Ray and John the Ripper. Search for others on your own and determine whether they'd help you out.

Using POV-Ray

POV-Ray is a ray-tracing computer rendering program. That is, it takes text files describing scenes, and renders them as three-dimensional spaces. Ray tracing is really neat because it works by simulating rays of light emanating from a light source, and tracks them as they bounce off surfaces. This creates stunningly realistic results by modeling real-world physics. Keeping track of all those light rays is also stunningly expensive in computational terms, which makes the program an attractive candidate for parallel processing.

To run POV-Ray as a parallel program, first start the PVM (Parallel Virtual Machine) daemon by running pvm and then quit out of the console it opens. Run your parallel POV-Ray raytrace with this command:

 povray -i /usr/share/doc/povray/povscn/level2/skyvase.pov +v1 +ft -x +a0.300 +r3 -q9 -mv2.0 -w1600 -h1200 -d +NT16 

The last parameter, +NT, is the number of processes to begin. In general, try out a few smaller runs with different numbers of processes; I find that using a couple more threads than CPUs usually works best. In addition, you might want to change -w and -h, which are the width and height (in pixels) of the output image; processing time increases quickly as the image gets larger. The remaining parameters are settings for POV-Ray's rendering engine and are beyond the scope of this chapter. For more information on what they do, check out the POV-Ray manual (man povray).
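
To try different sizes and process counts without retyping the whole line, a small helper along these lines may be useful. It is a sketch, not part of ClusterKnoppix: the function name is hypothetical, it reuses exactly the options from the command above, and it treats "CPUs plus two" as a starting guess based on the rule of thumb just mentioned.

```shell
#!/bin/sh
# Build (but do not run) the chapter's POV-Ray command line for a given
# CPU count and image size. The "CPUs + 2" process count is only a
# starting point, not a tuned value.
povray_command() {
    cpus=$1 width=$2 height=$3
    tasks=$((cpus + 2))
    echo "povray -i /usr/share/doc/povray/povscn/level2/skyvase.pov +v1" \
         "+ft -x +a0.300 +r3 -q9 -mv2.0 -w$width -h$height -d +NT$tasks"
}

# A small trial run for a hypothetical 4-CPU cluster.
povray_command 4 640 480
```

Once a small image renders in a reasonable time, raise the width and height and compare a few process counts before committing to a full-size render.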

Using John the Ripper

John the Ripper is a well-known password-auditing utility. It uses a brute-force attack to crack passwords, which makes it a great tool for checking that your users are picking good ones. Because John the Ripper enables you to obtain users' passwords, you should be aware of any privacy, policy, or legal issues with the use of this tool in your jurisdiction or place of employment.

John the Ripper, while not a "scientific" application, is a good example of clustering. It too is from an earlier generation of software, with a slightly convoluted build process. If you've ever compiled scientific software, you've probably run into these sorts of things. If you haven't, this program is a good, gentle introduction to the kinds of issues you'll have to deal with.

Understanding John

First, a bit of background on what John the Ripper does, for those who are new to it. Passwords are typically stored in hashed form, meaning that the original, plaintext password isn't available in a file anywhere; only a hash and a salt value are stored. Whenever someone types in his or her password, the system gets the salt value and combines it with the password given in a hashing process. The computer then compares the output to what's stored in the password file. If they match, it's almost certainly the correct password, so the user is authenticated.

In its brute-force attack, John the Ripper tries a bunch of possible passwords in sequence. It starts with "a", then "b", "c", and so on, working its way up to long, arbitrary strings of characters. Needless to say, this exhaustive search takes a long, long time. Take a ballpark figure of 40 character options available for each character of an 8-character password, and that's 40^8 possible passwords: roughly 6.5 × 10^12 unique strings to check!
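
The ballpark arithmetic is easy to reproduce in the shell. The sketch below counts the candidates for one fixed password length and estimates how long an exhaustive search would take. The function names are made up for this example, and the rate of 100,000 guesses per second is an arbitrary figure for illustration, not a measurement of John.

```shell
#!/bin/sh
# Count the candidate passwords of exactly $2 characters drawn from an
# alphabet of $1 symbols (alphabet^length), using repeated multiplication.
keyspace() {
    alphabet=$1 length=$2
    total=1 i=0
    while [ "$i" -lt "$length" ]; do
        total=$((total * alphabet))
        i=$((i + 1))
    done
    echo "$total"
}

# Whole days needed to try every candidate at $3 guesses per second.
# The rate you pass in is an assumption, not a benchmark.
days_to_exhaust() {
    echo $(( $(keyspace "$1" "$2") / $3 / 86400 ))
}

keyspace 40 8                 # prints 6553600000000
days_to_exhaust 40 8 100000   # prints 758
```

At that assumed rate a single machine needs about two years for the 8-character case alone, which is why both a dictionary and extra nodes are attractive.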

One way to speed up the process is to keep a dictionary of possible passwords, and work through that. Even with a dictionary, brute-force password discovery is a very computationally intensive process. Thankfully, each guess is totally independent of the other guesses, making this a perfect candidate for parallelization. John the Ripper doesn't include an explicitly parallel mode, but it does come with a built-in rule language and a rule that enables you to split up a cracking session.

Setting Up John

John the Ripper isn't included with ClusterKnoppix, so you need to download and compile the program, and download a dictionary to start from as well. First, create a build directory to work in. Because you typically want to audit passwords every few months, creating the build directory on a hard disk is recommended, so you won't have to rebuild John the Ripper every time you want to run it. If you usually have more than one program installed, make a generic src directory on your disk:

 mkdir /mnt/hdc4/src/ 

Follow that with the following:

 cd /mnt/hdc4/src 

Then fetch a new copy of the John source code:

 wget http://www.openwall.com/john/c/john-1.6.tar.gz 

After the file downloads, unpack the source code:

 tar xzvf john-1.6.tar.gz 

John is an older UNIX application, and it doesn't have a configure script. Instead, you must use a slightly old-fashioned build interface. First change directories (cd john-1.6/src/), and then run make, which will give you a list of options for build targets. This example uses Linux x86 with MMX, so the following command compiles the program:

 make linux-x86-mmx-elf 

You're going to see a bunch of errors, which is perfectly normal. Don't worry about installing the software: It's best to leave it in the src directory and use it from there, ensuring that you won't lose the program later. If you plan to use a more permanent master node, you can install it there, but the rest of this example assumes the binaries are left where John's makefile created them.
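
If you're unsure whether a node's CPU supports MMX, the flags line in /proc/cpuinfo will tell you. The following sketch picks a build target from that information. The target linux-x86-mmx-elf comes from the example above; linux-x86-any-elf is assumed here to be the non-MMX fallback, so confirm it against the list that make prints before relying on it. The function name is hypothetical.

```shell
#!/bin/sh
# Read /proc/cpuinfo-style text on stdin and print a John 1.6 build target.
# linux-x86-mmx-elf comes from the chapter; linux-x86-any-elf is an assumed
# fallback name, so check it against the list that make prints.
pick_target() {
    if grep -E '^flags' | grep -qw mmx; then
        echo "linux-x86-mmx-elf"
    else
        echo "linux-x86-any-elf"
    fi
}

# Example with a canned flags line; on a real node, use:
#   pick_target < /proc/cpuinfo
printf 'flags\t\t: fpu vme mmx sse\n' | pick_target
```

The word match (-w) keeps related flags such as mmxext from being mistaken for plain mmx.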

After the program compiles, download a dictionary to use. The de facto standard dictionary is all.gz, available at ftp://ftp.openwall.com/pub/wordlists/all.gz. Use the following to fetch it into the John directory and decompress it, because the cracking command used later reads the uncompressed file, all:

 cd /mnt/hdc4/src/john-1.6/
 wget ftp://ftp.openwall.com/pub/wordlists/all.gz
 gunzip all.gz 

Configure your copy of John the Ripper to tell it how many nodes you have. By default, John assumes you have two nodes: If this is the case for you, skip this paragraph. To change the number of nodes in the cluster, you need to manually edit the john.ini config file in the run/ directory. Open the file in your favorite text editor. Toward the bottom is a line reading total = 2;. Change the number 2 to the number of nodes you have.
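
To see why the node count matters, it helps to picture how a wordlist can be divided among nodes. The sketch below deals the words out round-robin, so that each node only tests its own share. It illustrates the idea with awk and is not John's own rule code; the function name and the 1-based node numbering are assumptions made for this example.

```shell
#!/bin/sh
# Keep only the words that belong to one node when a wordlist on stdin is
# dealt out round-robin. node ($1) is 1-based; total ($2) is the node count.
# This mirrors the idea of splitting by node count, not John's own code.
words_for_node() {
    awk -v node="$1" -v total="$2" '(NR - 1) % total == node - 1'
}

# Node 1 of 2 gets the 1st, 3rd, 5th, ... word.
printf '%s\n' apple bottle cherry date | words_for_node 1 2
```

With two nodes, the first receives apple and cherry and the second receives bottle and date. If the total is set too low for the number of nodes, some of them end up repeating work that another node has already done.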

With John the Ripper compiled, a dictionary installed, and the number of nodes configured, the installation is finished. Next you'll create a password, extract its hash, and finally crack that password.

Running John

Begin cracking by setting a new password for user knoppix on the ClusterKnoppix machine. For this example, choose something relatively simple, such as bottle, to ensure that you won't spend hours waiting for John to stumble across it. To set the password, run the following:

 sudo passwd knoppix 

Enter the new password twice when prompted. Then create an unshadowed version of the password file. In modern UNIX and Linux systems, the /etc/passwd file doesn't actually contain the password hashes. Those are stored in a shadow of the password file, which is readable only by root or the shadow user. To crack the password without running John as root, you must create a password file containing hashes that you, a normal user, can read. To store the unshadowed results in /tmp/unshadow, run the following:

 john-1.6/run/unshadow /etc/passwd /etc/shadow > /tmp/unshadow 
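
Conceptually, unshadowing just copies each account's hash from the shadow file into the second field of the matching passwd entry. The sketch below shows that merge with awk on throwaway files. It is a simplified stand-in for illustration, the function name and the hash value are made up, and the real unshadow utility remains the tool to use.

```shell
#!/bin/sh
# Merge a passwd-format file ($1) with a shadow-format file ($2): for each
# account, replace the placeholder in field 2 with the hash from the shadow
# file. A simplified stand-in for unshadow, for illustration only.
merge_shadow() {
    awk -F: -v OFS=: '
        NR == FNR { hash[$1] = $2; next }   # shadow file: remember hashes
        $1 in hash { $2 = hash[$1] }        # passwd file: fill in the hash
        { print }
    ' "$2" "$1"
}

# Demo on throwaway files with a made-up hash, not real account data.
tmp=$(mktemp -d)
printf 'knoppix:x:1000:1000:Knoppix User:/home/knoppix:/bin/bash\n' > "$tmp/passwd"
printf 'knoppix:FAKEHASH:12000:0:99999:7:::\n' > "$tmp/shadow"
merge_shadow "$tmp/passwd" "$tmp/shadow"
rm -r "$tmp"
```

The demo prints the passwd entry with FAKEHASH in place of the x placeholder, which is the combined shape John expects to read.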

This gives you a set of hashes to work with, and you can now begin cracking passwords. From within the john-1.6/run/ directory, run the following:

 ./john -rules -external:parallel -wordfile:../all /tmp/unshadow 

As shown in Figure 7-4, processing is distributed nicely among the nodes.

Figure 7-4: Running John on two nodes

The results should be displayed in the window from which you ran John. If you miss them for any reason, don't worry: John saves the password and hash pairs it finds to a file in the directory from which it ran, so it can simply look up those hashes during future runs, which is a real time-saver. To review your session, you can always use the -show option to get a copy of your results.

It is hoped that you're now comfortable building and running custom software on your openMosix cluster. Between this and the MPI system covered with ParallelKnoppix earlier in the chapter, you have the base knowledge required to set up many common parallel applications for your research or hobby. (If nothing else, you've got some bragging rights: How many people can say they've got a compute cluster at home?)



Hacking Knoppix (ExtremeTech)
ISBN: 0764597841
Year: 2007
Pages: 118
