6 Managing Ensembles

6.6 Job Scheduling
The task of managing a Beowulf cluster has been limited by the functionality of existing software. Few software packages have been tailored to cluster management, which as a result has largely relied on traditional LAN management software. Even though a large number of supplementary software systems, such as schedulers, have been created in research labs, none has emerged as standard software. As a result of the growing popularity of commodity clustered computing, it is inevitable that these software packages will be improved, redesigned, or supplanted by better implementations. As that happens, the tools and functions available for cluster management will become greatly enriched.
Many Beowulf administrators are interested in better job scheduling functions. Beowulfs usually start out with only a few users in a single department, but as news about the system spreads to neighboring departments, more users are added to the system. Once that happens, it becomes important to keep user-developed applications from interfering with each other. This is usually done by funneling all user programs through a job scheduler, which decides in what order and on what processors to execute the programs. At this time, we know of at least 10 different job schedulers that have been tried on Beowulf systems, none of which is entirely satisfactory to its users. Discussions at recent conferences and workshops indicate that this situation may be about to change. The importance of this class of software to managing a Beowulf cluster is now widely recognized, and work is in progress to improve the situation. Within the next year, it is possible that there will be one or two job schedulers favored as standard Beowulf system administration software. Work is also underway to improve the status of system monitoring support, cluster partitioning, and account management.
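To make the scheduler's role concrete, the following is a minimal sketch of the decision described above: in what order, and on what processors, to run user programs. It is not modeled on any particular scheduler used on Beowulf systems. The names `Job` and `schedule_fifo` are hypothetical, the policy is strict first-come, first-served, and the sketch assumes that all jobs placed in one round finish before the next round begins, which a real scheduler would not require.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str          # hypothetical label for a user program
    nodes_needed: int  # number of processors the program asks for


def schedule_fifo(jobs, total_nodes):
    """Place jobs in strict arrival order.

    Returns a list of rounds. Each round maps a job name to the list of
    node ids it is given. Jobs in the same round never share a node, so
    user programs cannot interfere with each other.
    Assumption: a round ends only when every job in it has finished.
    """
    queue = deque(jobs)
    rounds = []
    while queue:
        free = list(range(total_nodes))  # every node is idle at round start
        placement = {}
        # Admit jobs from the head of the queue while they still fit.
        while queue and queue[0].nodes_needed <= len(free):
            job = queue.popleft()
            placement[job.name] = free[:job.nodes_needed]
            free = free[job.nodes_needed:]
        if not placement:
            # The head job can never run: it wants more nodes than exist.
            raise ValueError(
                f"{queue[0].name} needs more nodes than the cluster has"
            )
        rounds.append(placement)
    return rounds


if __name__ == "__main__":
    submitted = [Job("a", 4), Job("b", 2), Job("c", 4), Job("d", 1)]
    for number, placement in enumerate(schedule_fifo(submitted, 8), start=1):
        print(f"round {number}: {placement}")
```

Strict arrival order is the simplest policy to reason about, but it can leave processors idle: in the example, job `d` would fit alongside `a` and `b` in the first round, yet it waits behind `c`. Deciding when a later job may jump ahead is one of the policy choices on which schedulers differ.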
The world of Beowulf software development is about to start moving at the same rapid pace as general Linux development. You will have to be prepared to decide when you have a satisfactory and stable software environment to avoid playing the constant game of catch-up. It will help if you separate out one or two experimental nodes for software evaluation and testing. Before upgrading to a new kernel, make sure you've stressed it out on a single node for a week or two. Before installing that fancy new scheduler, test it extensively. The last thing you want is for your entire Beowulf to grind to a halt because a scheduler has a bug that swamps the entire system with more processes than it can handle. If your users demand a new compiler, install it in such a way that they still have access to the old compiler, in case the new one doesn't always do quite the right thing. If your production system is humming along just fine, don't mess with it. Users don't like system down

 



How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters (Scientific and Engineering Computation)
ISBN: 026269218X
Year: 1999
Pages: 134
