Chapter 7: Clustering with Knoppix | Hacking Knoppix (ExtremeTech)

Overview

Knoppix and Knoppix-derived distributions can be used for large computational problems that would benefit from the use of multiple computers. This chapter introduces the concept of a cluster, which, for the sake of this discussion, is a group of systems networked together to run a single computational task.

A cluster is the opposite of a shared system, in which multiple users run concurrent tasks. Clusters run tasks, frequently called jobs, that are computationally intensive (often called expensive), such as those that scientists and mathematicians often need. This chapter focuses primarily on ParallelKnoppix. It is by far the easiest of the clustering setups, yet it still exposes most of the problems you might run into with a Knoppix-based ad hoc cluster. ParallelKnoppix is perfect for the basement supercomputer, which is a great hack. If your problem can be solved using MPI (Message Passing Interface, an industry standard for parallel computing), ParallelKnoppix is for you. If it can't, ParallelKnoppix is still a good learning experience: it presents a very gentle learning curve for handling all the basic components of clustering.
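To make the idea of one job spread over several workers concrete, here is a minimal bash sketch. It is a toy under stated assumptions, not how ParallelKnoppix or MPI works: background processes on a single machine stand in for cluster nodes, temporary files stand in for message passing, and the function names partial_sum and parallel_sum are hypothetical names chosen for this illustration.

```shell
#!/usr/bin/env bash
# Toy illustration only: background processes on ONE machine stand in for
# cluster nodes. A real ParallelKnoppix job would use MPI instead.

# partial_sum START END: print the sum of the integers START..END.
partial_sum() {
    local start=$1 end=$2 total=0 i
    for ((i = start; i <= end; i++)); do
        total=$((total + i))
    done
    echo "$total"
}

# parallel_sum N WORKERS: split 1..N into WORKERS chunks, sum each chunk in
# its own background process, then combine the partial results.
parallel_sum() {
    local n=$1 workers=$2
    local chunk=$(( (n + workers - 1) / workers ))   # ceiling division
    local tmpdir w start end part total=0
    tmpdir=$(mktemp -d)
    for ((w = 0; w < workers; w++)); do
        start=$((w * chunk + 1))
        end=$(((w + 1) * chunk))
        if ((end > n)); then end=$n; fi
        # Each "worker" writes its partial result to its own file.
        partial_sum "$start" "$end" > "$tmpdir/part.$w" &
    done
    wait    # block until every background worker has finished
    for ((w = 0; w < workers; w++)); do
        part=$(<"$tmpdir/part.$w")
        total=$((total + part))
    done
    rm -rf "$tmpdir"
    echo "$total"
}

parallel_sum 100 4    # prints 5050
```

The shape is the same one a real cluster job follows: divide the work, compute the pieces independently, then gather and combine the results. On a real cluster the pieces run on separate machines and the gathering step travels over the network.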

Note

Another type of cluster, the high-availability or fail-over cluster, involves pooling multiple computers, each of which is a candidate server for your filesystems, databases, or applications. If one cluster member fails, the others take over its services, usually in a way that is transparent to client systems accessing the data. That's a subject for another book. This chapter covers only parallel clusters.

Clustering is of special interest to system administrators, whose job is to configure the cluster with the libraries and services it needs and then to maintain the system. Parallel programming itself would fill another book. Even the administrator's task can be complex, and this chapter can only begin to cover it.

This chapter assumes that you're using Knoppix and its derived distros as a development platform, so, for example, $PATH values will be the defaults on Knoppix. Code will be written, whenever possible, as bash (Bourne Again Shell) scripts. The chapter also assumes that the purpose of your cluster is to do real work in an environment similar to a research group, in which maintaining the cluster falls to several system administrators rather than just one. The intent is to ensure that you have a maintainable system that doesn't depend on a single knowledgeable person. (It almost goes without saying that your system should be behind a firewall and be generally secure.)

After examining the basic concepts of clustering, you'll explore ParallelKnoppix and ClusterKnoppix distros and then take a look at some other science-related Knoppix derivations.