15.5 Administration Tools

Condor has a rich set of tools for the administrator. Table 15.2 gives an overview of the Condor commands typically used solely by the system administrator. Of course, many of the "user-level" Condor tools summarized in Table 15.1 can be helpful for cluster administration as well. For instance, the condor_status tool can easily display the status of all nodes in the cluster, including dynamic information such as the current load average and free virtual memory.

Table 15.2: Commands reserved for the administrator.

Command            Description
-----------------  ----------------------------------------------------
condor_checkpoint  Checkpoint jobs running on the specified hosts
condor_config_val  Query or set a given Condor configuration variable
condor_fetch_log   Retrieve daemon logs from a remote machine
condor_master_off  Shut down Condor and the condor_master
condor_off         Shut down Condor daemons
condor_on          Start up Condor daemons
condor_reconfig    Reconfigure Condor daemons
condor_restart     Restart the condor_master
condor_stats       Display historical information about the Condor pool
condor_userprio    Display and manage user priorities
condor_vacate      Vacate jobs that are running on the specified hosts

15.5.1 Remote Configuration and Control

All machines in a Condor pool can be remotely managed from a centralized location. Condor can be enabled, disabled, or restarted remotely using the condor_on, condor_off, and condor_restart commands, respectively. Additionally, any aspect of Condor's configuration file on a node can be queried or changed remotely via the condor_config_val command. Of course, not everyone is allowed to change your Condor configuration remotely. Doing so requires proper authorization, which is set up at installation time.
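A minimal Python sketch of driving these commands from a central script follows. It assumes only that condor_on, condor_off, condor_restart, and condor_reconfig accept target hostnames as arguments, and that condor_config_val accepts `-name <host>` to select a remote machine; verify both against the manual pages of your installed release. The helper functions and host names are hypothetical. Changing (rather than querying) a remote value is left out because the relevant options vary between versions. By default the script only prints the commands it would run.

```python
import shlex
import subprocess
from typing import Sequence

# Hypothetical mapping from an administrative action to the Condor tool
# described in the text. Each tool is assumed to take target hostnames
# as trailing arguments.
CONTROL_TOOLS = {
    "on": "condor_on",
    "off": "condor_off",
    "restart": "condor_restart",
    "reconfig": "condor_reconfig",
}


def control_command(action: str, hosts: Sequence[str]) -> list[str]:
    """Build the argv that enables, disables, restarts, or reconfigures hosts."""
    if action not in CONTROL_TOOLS:
        raise ValueError(f"unknown action: {action!r}")
    if not hosts:
        raise ValueError("at least one host is required")
    return [CONTROL_TOOLS[action], *hosts]


def query_config_command(host: str, variable: str) -> list[str]:
    """Build the argv that queries one configuration variable on a remote host."""
    return ["condor_config_val", "-name", host, variable]


def run(argv: list[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command in dry-run mode; otherwise run it."""
    if dry_run:
        return shlex.join(argv)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical node names; nothing is executed while dry_run is True.
    print(run(control_command("off", ["node07", "node08"])))
    print(run(query_config_command("node07", "MAX_JOBS_RUNNING")))
```

Keeping command construction separate from execution makes it easy to review a batch of remote actions before running them against the whole pool.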

Many aspects of Condor's configuration, including its scheduling policy, can be changed on the fly without requiring the pool to be shut down and restarted. This is accomplished with the condor_reconfig command, which asks the Condor daemons on a specified host to reread the Condor configuration files and act on the changes, on the fly where possible.
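Reconfiguration can be pictured as a diff between the old and new configuration, with each changed setting either applied immediately or deferred until a restart. The sketch below is hypothetical: it handles only simple `NAME = value` lines (names are upper-cased, since Condor treats configuration names case-insensitively), and the set of restart-only variables is supplied by the caller rather than taken from Condor's real rules. Treating NETWORK_INTERFACE as restart-only is purely an illustrative assumption.

```python
from typing import Optional


def parse_config(text: str) -> dict[str, str]:
    """Parse simple 'NAME = value' lines; lines starting with '#' are comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)  # split on the first '=' only
        settings[name.strip().upper()] = value.strip()
    return settings


def plan_reconfig(
    old: dict[str, str], new: dict[str, str], restart_only: set[str]
) -> tuple[dict[str, Optional[str]], dict[str, Optional[str]]]:
    """Split changed settings into (apply now, defer until restart)."""
    changed = {
        name: new.get(name)  # None means the setting was removed
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
    live = {k: v for k, v in changed.items() if k not in restart_only}
    deferred = {k: v for k, v in changed.items() if k in restart_only}
    return live, deferred


if __name__ == "__main__":
    before = parse_config("START = TRUE\nNETWORK_INTERFACE = 10.0.0.5\n")
    after = parse_config(
        "START = KeyboardIdle > 15 * $(MINUTE)\nNETWORK_INTERFACE = 10.0.0.6\n"
    )
    # Restart-only set is an assumption for illustration.
    print(plan_reconfig(before, after, {"NETWORK_INTERFACE"}))
```

A policy expression such as START lands in the "apply now" group, which mirrors the point above: scheduling policy can change without taking the pool down.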

15.5.2 Accounting and Logging

Condor keeps many statistics about what is happening in the pool. Each daemon can be asked to keep a detailed log of its activities; Condor will automatically rotate these log files when they reach a maximum size as specified by the administrator.
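Size-based rotation can be sketched as follows. The class name, the single `.old` generation, and the byte limit are assumptions for illustration; Condor's own rotation is controlled through its configuration files, and the details differ between releases.

```python
import os
import tempfile


class SizeRotatingLog:
    """Append-only log that rotates once a write would exceed max_bytes.

    Sketch only: rotation keeps one previous generation as '<path>.old',
    overwriting any earlier '.old' file.
    """

    def __init__(self, path: str, max_bytes: int) -> None:
        if max_bytes <= 0:
            raise ValueError("max_bytes must be positive")
        self.path = path
        self.max_bytes = max_bytes

    def _current_size(self) -> int:
        try:
            return os.path.getsize(self.path)
        except FileNotFoundError:
            return 0

    def write(self, message: str) -> None:
        data = (message.rstrip("\n") + "\n").encode("utf-8")
        size = self._current_size()
        # Rotate only a non-empty file; a single oversized record is still
        # written to the fresh file rather than dropped.
        if size > 0 and size + len(data) > self.max_bytes:
            os.replace(self.path, self.path + ".old")
        with open(self.path, "ab") as f:
            f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = SizeRotatingLog(os.path.join(tmp, "SchedLog"), max_bytes=64)
        for i in range(10):
            log.write(f"event {i}: negotiation cycle finished")
        print(sorted(os.listdir(tmp)))
```

Capping each file's size bounds the disk space a busy daemon's log can consume while keeping the most recent history available.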

In addition to the condor_history command, which allows users to view job ClassAds for jobs that have previously completed, the condor_stats tool can be used to query for historical usage statistics from a poolwide accounting database. This database contains information about how many jobs were being serviced for each user at regular intervals, as well as how many machines were busy. For instance, condor_stats could be asked to display the total number of jobs running at five-minute intervals for a specified user between January 15 and January 30.
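The kind of query described for condor_stats can be sketched over an in-memory list of accounting snapshots. The `Sample` record and the `running_jobs_series` function are hypothetical and do not reflect condor_stats's actual database format or command-line options. Each five-minute bucket reports the last snapshot seen in it, and empty buckets are omitted rather than guessed.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    """One hypothetical accounting snapshot: jobs running for a user at a time."""
    when: datetime
    user: str
    running_jobs: int


def running_jobs_series(
    samples: Iterable[Sample],
    user: str,
    start: datetime,
    end: datetime,
    step: timedelta = timedelta(minutes=5),
) -> list[tuple[datetime, int]]:
    """Return (bucket_start, running_jobs) pairs for `user` in [start, end)."""
    buckets: dict[datetime, Sample] = {}
    for s in sorted(samples, key=lambda s: s.when):
        if s.user != user or not (start <= s.when < end):
            continue
        index = (s.when - start) // step  # timedelta // timedelta -> int
        buckets[start + index * step] = s  # later snapshots overwrite earlier
    return [(b, buckets[b].running_jobs) for b in sorted(buckets)]
```

Calling it with a January 15 start and a January 30 end reproduces the example query in the text, given suitable snapshot data.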

The condor_view tool takes the raw information obtainable with condor_stats and converts it into HTML, complete with interactive charts. Figure 15.8 shows a sample display of the output from condor_view in a Web browser. With condor_view, the site administrator can quickly publish detailed, real-time usage statistics about the Condor pool on a Web site.

Figure 15.8: CondorView displaying machine usage.

15.5.3 User Priorities in Condor

The job queues in Condor are not strictly first-in, first-out. Instead, Condor implements priority queuing. Different users will get different-sized allocations of machines depending on their current user priority, regardless of how many jobs from a competing user are "ahead" of them in the queue. Condor can also be configured to perform priority preemption if desired. For instance, suppose user A is using all the nodes in a cluster, when suddenly a user with a superior priority submits jobs. With priority preemption enabled, Condor will preempt the jobs of the lower-priority user in order to immediately start the jobs submitted by the higher-priority user.
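A toy scheduling pass can make the contrast with first-in, first-out concrete: it orders idle jobs by their owner's priority instead of by submission order, and optionally preempts the worst-priority running job when the pool is full. It follows Condor's convention that a numerically lower priority value is better, but everything else (the `Job` type, one job per machine, choosing the newest job of the worst-priority user as the victim) is an assumption, and it ignores any fair-share limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int  # submission order; lower means submitted earlier
    user: str


def schedule(
    running: list[Job],
    idle: list[Job],
    machines: int,
    priority: dict[str, float],
    preempt: bool = True,
) -> list[Job]:
    """Return the jobs that run after one pass (lower priority value = better)."""
    running = list(running)
    # Consider idle jobs by owner priority first, submission order second.
    candidates = sorted(idle, key=lambda j: (priority[j.user], j.job_id))
    for job in candidates:
        if len(running) < machines:
            running.append(job)
            continue
        if not preempt:
            break  # pool is full and preemption is disabled
        # Victim: newest job of the user with the worst (largest) priority value.
        victim = max(running, key=lambda j: (priority[j.user], j.job_id))
        if priority[victim.user] <= priority[job.user]:
            break  # no running job belongs to a worse-priority user
        running.remove(victim)  # victim is preempted; requeueing is not modeled
        running.append(job)
    return running
```

With user A holding every machine and a better-priority user B submitting jobs, the preempting pass displaces A's jobs until B's are running, while `preempt=False` leaves the pool untouched.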

Starvation of the lower-priority users is prevented by a fair-share algorithm, which attempts to give all users the same amount of machine allocation time over a specified interval. In addition, the priority calculations in Condor are based on ratios instead of absolutes. For example, if Bill has a priority that is twice as good as that of Fred, Condor will not starve Fred by allocating all machines to Bill. Instead, Bill will get, on average, twice as many machines as will Fred because Bill's priority is twice as good.
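The ratio rule can be sketched by giving each user a share of the machines proportional to the reciprocal of their priority value, with lower values meaning better priority. Handing out rounding leftovers by the largest-remainder method is a choice made for this sketch, not necessarily Condor's. It assumes every user has enough jobs to fill their share and takes the priority values as given rather than deriving them from past usage.

```python
from fractions import Fraction


def fair_shares(machines: int, priority: dict[str, float]) -> dict[str, int]:
    """Split `machines` among users in proportion to 1 / priority value."""
    if not priority:
        return {}
    if any(p <= 0 for p in priority.values()):
        raise ValueError("priority values must be positive")
    # Exact rational arithmetic avoids float rounding in the shares.
    weights = {u: 1 / Fraction(p) for u, p in priority.items()}
    total = sum(weights.values())
    exact = {u: machines * w / total for u, w in weights.items()}
    shares = {u: int(x) for u, x in exact.items()}  # floor of each exact share
    leftover = machines - sum(shares.values())
    # Largest fractional remainder first; ties broken by user name.
    order = sorted(exact, key=lambda u: (-(exact[u] - shares[u]), u))
    for u in order[:leftover]:
        shares[u] += 1
    return shares
```

For example, `fair_shares(30, {"bill": 5.0, "fred": 10.0})` returns `{"bill": 20, "fred": 10}`: Bill's priority value is half of Fred's, so Bill receives twice as many machines, yet Fred is never starved.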

The condor_userprio command can be used by the administrator to view or edit a user's priority. It can also be used to override Condor's default fair-share policy and explicitly assign users a better or worse priority in relation to other users.
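One way to picture an administrator override is as a multiplicative factor on a user's priority value, which is how later HTCondor releases describe condor_userprio's priority factor. The sketch models only the arithmetic. The function names and the default factor of 1 are assumptions, and the actual condor_userprio options should be taken from its manual page.

```python
from fractions import Fraction


def effective_priorities(
    real: dict[str, float], factor: dict[str, float]
) -> dict[str, Fraction]:
    """Scale each user's priority value by an administrator-set factor.

    Users without an explicit factor get 1 (an assumption for this sketch).
    """
    return {u: Fraction(p) * Fraction(factor.get(u, 1)) for u, p in real.items()}


def pool_fractions(effective: dict[str, Fraction]) -> dict[str, Fraction]:
    """Fraction of the pool per user, proportional to 1 / effective priority."""
    weights = {u: 1 / p for u, p in effective.items()}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}
```

If Bill and Fred both have a priority value of 5 and the administrator doubles Fred's factor, Fred's effective value becomes 10, so Bill receives two-thirds of the pool and Fred one-third.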




Beowulf Cluster Computing With Linux 2003