2.2 Workload Control | System Performance Tuning (2002)

The next step in the realm of workloads is imposing workload management. This step is often technically straightforward, but it can carry with it political problems. Forcibly attempting to control the users of a system is often not a good idea, and in my opinion should be restricted to cases where education and attempting to secure the cooperation of your user base has failed abjectly. If you are getting some pressure from management to do this, I think you would do well to get it in writing.

Why do I issue these warnings? Forcing limits on users is significantly less of a problem in academic environments or in small companies, where everyone knows the systems administrator and it's easy to walk down to their office and ask what's going on. However, in large professional environments, or when the management of the users is divorced to a degree from the management of the administrators, things often get quite complex. Users (who, perhaps, designed the hardware or software that you are administering) often find it very frustrating to have their abilities restricted: it reduces their flexibility in solving their own problems. If they appeal up through management, you could find yourself trying to explain what seemed like a reasonable technical decision at the time to a few very irate senior vice presidents. You really need to be careful if you are going to be forceful.

On the other hand, user education is one of the most powerful techniques available to you to control workloads. In this section, I discuss user education and the closely related area of written performance agreements, as well as some of the more direct techniques for limiting system resource consumption.

2.2.1 Education

The most powerful tool available to you in order to control the workloads on your systems is user education. Enforcing strict CPU time or disk quotas, while effective, often adds to a "resentment of the mystery" phenomenon. This leaves users feeling rather like medieval serfs: there are certain things they just can't do, like encourage the rain in a dry season, and all they can do is go and beg some rather mysterious people who usually live in caves to try and fix the problem for them. The end result is that the users get very frustrated.

A much better solution is to explain to your users what the problem is, how their actions cause it, and how to solve it. Many times, this sort of forthright discussion will produce the results you'd like, with much less of a headache.

2.2.1.1 Usage and performance agreements

One very basic approach to user education is to develop a usage and performance agreement. This is roughly akin to the "service-level agreements" that are common in the networking world: it is a document that sets out exactly what both sides expect from each other. It can be as formal or as informal as you want it to be. I have seen lengthy documents that looked like they'd gone out for legal review, and half-page descriptions that summarized what the agreed-upon metrics for performance were. In both cases, and in many cases in between, everyone was happy with what they had. I personally prefer short, technically dense documents (I think it's easier to get users to read them), but that's just a matter of taste.

There are many things that can be set out in a usage and performance agreement:

What is the minimum acceptable level of performance? That's a pretty loaded question. Ask your users to define it. Are there a certain number of concurrent jobs or users that must be supported? For each job, what is the "maximum runtime"? Can you agree upon a simple test suite that will determine whether, in the general case, performance is "acceptable"?
What mechanisms are in place to measure performance? What does the load on the system look like, and what characteristics does it exhibit before failing? Is there a way for users to tell when a machine is "full"?
What are the operational windows of the machine? When can maintenance be performed?
How are you, the systems administrator, to be notified of problems? If you are out of town or otherwise unavailable, do you have a backup?
How are they, the users, to be notified of problems? Is an entry in the message of the day sufficient? Does an email alias need to be created for announcements?
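One way to make the "simple test suite" concrete is a script that times an agreed benchmark job against the maximum runtime written into the agreement. The following is only a minimal sketch: the function name check_runtime is hypothetical, the benchmark command is whatever you and your users agree on, and it assumes a date command that supports +%s (GNU and BSD date do; older Solaris date may not).

```shell
#!/bin/sh
# Hypothetical acceptance check for a usage and performance agreement.
# Usage: check_runtime MAX_SECONDS COMMAND [ARGS...]
# Assumes `date +%s` (seconds since the epoch) is available.
check_runtime() {
    max=$1
    shift
    start=$(date +%s)
    # Run the agreed benchmark job; a job that fails is never "acceptable".
    "$@" || return 2
    end=$(date +%s)
    elapsed=$((end - start))
    echo "elapsed=${elapsed}s limit=${max}s"
    # Exit status 0 means the run met the agreed maximum runtime.
    [ "$elapsed" -le "$max" ]
}
```

Run from cron against an agreed job, a nonzero exit status flags a run that broke the agreement. The one-second resolution is coarse, but it is adequate for jobs whose agreed runtimes are measured in minutes or hours.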

It's vitally important that you revisit these agreements periodically (say, once a quarter). The review may be very short if the document is still accurate; it takes longer when the work being done has grown more complex or has shifted to different problems.

It's been my experience that very few people actually take the time to create these sorts of agreements. This is, in my opinion, absurd. In addition to giving you solid information about what the users expect from their environment, it fosters good will and a cooperative spirit. Dealing with uncooperative users is one of the least pleasant aspects of being a systems administrator; unfortunately, it's probably part of your job. Take a few minutes to create an agreement with your users, and get management feedback to ensure your agreement is understood by everybody. It'll be enlightening.

2.2.2 The maxusers and the pt_cnt Parameters

When the Berkeley Software Distribution (BSD) of Unix was being developed, its developers created a single tunable that would scale the size of several internal kernel tables and buffers. The scaling factor was directly correlated to the number of time-sharing users that the system had to support, so the parameter was called maxusers. In the modern world, there is no direct relationship between the number of users that a system supports and the value of maxusers. However, Solaris still derives a set of parameters from the value of maxusers, and some of them are related to workload control, as described in Table 2-1.

Table 2-1. Parameters derived from maxusers

Variable	Kernel resource	Default setting
`max_nprocs`	Maximum number of processes systemwide	`10 + (16 * maxusers)`
`reserved_procs`	Number of process slots, systemwide, reserved for the root user	`5`
`maxuprc`	Maximum number of processes per non-root user	`max_nprocs - reserved_procs`
`ndquot`	Size of the quota table	`(10 * maxusers) + max_nprocs`
`ncsize`	Size of the DNLC (see Section 5.4.1.1)	`4 * (max_nprocs + maxusers) + 320`
`ufs_ninode`	Size of the inode cache (see Section 5.4.2.7 in Chapter 5)	Same as `ncsize`
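Because every default in Table 2-1 is simple arithmetic on maxusers, it can help to compute the results before changing maxusers. This sketch only evaluates the table's formulas. The helper name derived_params is hypothetical, and the kernel may apply further caps that the table does not show.

```shell
#!/bin/sh
# Evaluate the Table 2-1 default formulas for a given maxusers value.
# derived_params is a hypothetical helper; it reproduces only the table's
# arithmetic and ignores any additional caps the kernel may apply.
# Usage: derived_params MAXUSERS
derived_params() {
    maxusers=$1
    max_nprocs=$((10 + 16 * maxusers))
    reserved_procs=5
    maxuprc=$((max_nprocs - reserved_procs))
    ndquot=$((10 * maxusers + max_nprocs))
    ncsize=$((4 * (max_nprocs + maxusers) + 320))
    ufs_ninode=$ncsize                  # defaults to the same value as ncsize
    printf 'max_nprocs=%d maxuprc=%d ndquot=%d ncsize=%d ufs_ninode=%d\n' \
        "$max_nprocs" "$maxuprc" "$ndquot" "$ncsize" "$ufs_ninode"
}
```

For example, `derived_params 64` prints `max_nprocs=1034 maxuprc=1029 ndquot=1674 ncsize=4712 ufs_ninode=4712`.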

The maxusers parameter itself is automatically sized (unless you are running Solaris 2.3 or earlier). It is set to a value approximately equal to the number of megabytes of RAM in the system; the minimum limit is 8 and the maximum limit (without manual tuning) is 1024, which applies to any system with 1 GB or more of memory. You can set maxusers to a higher value manually in /etc/system, but it is limited to an overall maximum of 2048.
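The sizing rule just described can be approximated in a few lines. This is a sketch of the rule as stated, not of the kernel's exact code (the text says "approximately"). The name auto_maxusers is hypothetical, and the input is assumed to be physical memory in megabytes.

```shell
#!/bin/sh
# Hypothetical approximation of automatic maxusers sizing as described in
# the text: about one per megabyte of RAM, at least 8, and at most 1024
# unless set by hand in /etc/system (which has its own ceiling of 2048).
# Usage: auto_maxusers MEMORY_MB
auto_maxusers() {
    mb=$1
    if [ "$mb" -lt 8 ]; then
        mb=8            # lower bound
    elif [ "$mb" -gt 1024 ]; then
        mb=1024         # ceiling for automatic sizing (1 GB and up)
    fi
    echo "$mb"
}
```

On Solaris, `prtconf | grep 'Memory size'` reports physical memory in megabytes. On releases that ship mdb (Solaris 8 and later), `echo 'maxusers/D' | mdb -k` shows the value the running kernel actually chose.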

Contrary to popular belief, the maxusers parameter does not limit the number of concurrent logins to the system; it just sizes a number of parameters that are generally related to the number of users a system can support. Of course, many things factor into the number of supportable users; maxusers is just intended to be a general knob. The parameter that actually limits concurrent remote logins is pt_cnt, which sets the number of pseudo-terminals (the entries in /dev/pts) available for sessions such as telnet and rlogin. By default, this variable is set to 48, which can be a significant limit. There is a practical limit on the value of pt_cnt: the format of the utmp file imposes a limit of 3844 telnet and 3844 rlogin sessions. You should probably keep pt_cnt under about three thousand, in order to avoid hitting other, nontunable system limits. After setting pt_cnt, you will need to create the entries in /dev/pts; this is best accomplished by rebooting with the reconfiguration flag (touch /reconfigure; reboot, which is equivalent to boot -r).
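Both parameters are set in /etc/system with `set` lines. The helper below is a hypothetical sketch that prints those lines after applying the limits mentioned above: it caps maxusers at 2048 and warns when pt_cnt goes above roughly three thousand. It prints only. Appending its output to /etc/system and doing the reconfiguration reboot is left to you.

```shell
#!/bin/sh
# Hypothetical helper that prints /etc/system lines for maxusers and pt_cnt.
# Usage: etc_system_lines MAXUSERS PT_CNT
etc_system_lines() {
    maxusers=$1
    pt_cnt=$2
    if [ "$maxusers" -gt 2048 ]; then
        echo "note: maxusers capped at the overall maximum of 2048" >&2
        maxusers=2048
    fi
    if [ "$pt_cnt" -gt 3000 ]; then
        # Advice from the text: stay under about 3000 to avoid other limits.
        echo "warning: pt_cnt above ~3000 risks hitting nontunable limits" >&2
    fi
    printf 'set maxusers=%d\nset pt_cnt=%d\n' "$maxusers" "$pt_cnt"
}
```

A typical sequence would be to back up the file (`cp /etc/system /etc/system.orig`), append the lines (`etc_system_lines 1024 512 >> /etc/system`), and then run `touch /reconfigure; reboot` so that the new /dev/pts entries are created.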

2.2.3 Limiting Users

Unfortunately, there comes a time when you have to impose limits. Sometimes these limits are set up because user education is very difficult, or because one bad apple can spoil the cider. My personal approach to user limitations is that they are best thought of as an emergency brake: when a (generous) parameter is exceeded, it's healthier to stop whatever is causing the problem than to clean up after it later.

There are two basic approaches to limiting users: quotas and environment controls.

2.2.3.1 Quotas

One of the most common types of user limitations is disk quotas. Disk quotas exist to prevent a user from consuming more than their fair share of disk space, determined by what they're entitled to or what they've paid for.

Disk quotas are typically implemented in two stages. Once a user exceeds a specific threshold of utilization,^[5] usually called the hard limit, they are not able to write any more data to disk until they remove some files and their disk utilization drops below that limit. At some lower level of utilization, there is another limit, which is usually called the soft limit. Once the user exceeds the soft limit, they can continue to write data up until they run into the hard limit; however, a timer starts. When the timer (which is usually measured in days) expires, the user cannot write any more files until they remove data and drop below the soft limit. This provides a way for users to temporarily consume large amounts of disk space.

^[5] This utilization typically can either be measured in the amount of disk space consumed or in the number of files.
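The two-stage behavior can be written out as a small decision function. This is a sketch of the policy as described above, not of any particular system's quota code. The name quota_check is hypothetical. Usage and limits are assumed to be in whatever unit the quota counts (blocks or files), and whether the grace timer has expired is passed in rather than tracked.

```shell
#!/bin/sh
# Hypothetical model of two-stage (soft/hard) quota enforcement.
# Usage: quota_check USAGE SOFT HARD GRACE_EXPIRED(yes|no)
# Prints "ok", "warn" (over soft limit, within grace period) or "deny".
quota_check() {
    usage=$1
    soft=$2
    hard=$3
    grace_expired=$4
    if [ "$usage" -ge "$hard" ]; then
        echo deny            # at the hard limit: no further writes
    elif [ "$usage" -gt "$soft" ]; then
        if [ "$grace_expired" = yes ]; then
            echo deny        # over the soft limit and the timer has run out
        else
            echo warn        # over the soft limit, but still within grace
        fi
    else
        echo ok
    fi
}
```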

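Knowing who consumes what is the natural companion to limiting consumption, and process accounting, which records the resources used by every process, is quite straightforward to enable on both Solaris and Linux. On Solaris, you will need to install the accounting software packages (SUNWaccr and SUNWaccu) and link two startup files: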

 # ln /etc/init.d/acct /etc/rc0.d/K22acct
 # ln /etc/init.d/acct /etc/rc2.d/S22acct

Then add these lines to adm's crontab file:

 # min  hour  day  month  wkday  command
 0      *     *    *      *      /usr/lib/acct/ckpacct
 30     3     *    *      *      /usr/lib/acct/runacct 2> /var/adm/acct/nite/fd2log
 30     9     *    *      5      /usr/lib/acct/monacct

Then enable process accounting by rebooting or by running /etc/init.d/acct start.

On Linux, the process is similar. The first step is to modify the system startup scripts to turn on process accounting at boot:

 # enable process accounting
 if [ -x /sbin/accton ]
 then
     /sbin/accton /var/log/pacct
     echo "activating process accounting"
 fi

You then need to create the accounting record file /var/log/pacct . This file should be owned by root and world-readable:

 # touch /var/log/pacct
 # chown root /var/log/pacct
 # chmod 0644 /var/log/pacct

You can then either reboot or run /sbin/accton /var/log/pacct to turn on process accounting.

Enterprising users can always come up with ways around disk quotas. I have seen users mail files to themselves, find directories on other filesystems (where quotas weren't being enforced) to hide things in, and even "bounce" files around on the network between multiple machines via some homegrown software. The more reasonable your disk quotas are, the less you will have to deal with users creatively working around them. To return to the emergency brake analogy, think about the electronic speed governors installed on modern automobiles. If the governor is set to something like 140 miles per hour, most people will never bother to disable it;^[6] they never go that fast, so they may not even know it's there. However, if the speed governor is set to something like 80 miles per hour, a huge number of people will start looking for ways around the problem of their engine cutting off as they try to pass someone on the freeway! The upshot is that the more reasonable you are with disk quotas, the less of a headache they are to enforce.

^[6] As with disk quotas, an enterprising driver can always disable a speed governor.

2.2.3.2 Environmental limits

A more subtle way of controlling users' processes is to impose limits on how much of a system resource their applications can consume. This is implemented at the shell level via the limit command in csh or ulimit in ksh and sh. Table 2-2 summarizes the most common of these parameters, using their csh names, along with their typical defaults.

Table 2-2. Resource limits

Resource	Soft limit	Hard limit
`cputime`	Unlimited	Unlimited
`filesize`	Unlimited	Unlimited
`datasize`	2 GB	2 GB
`stacksize`	8 MB	2 GB
`coredumpsize`	Unlimited	Unlimited
`descriptors`	64	1024
`memorysize`	Unlimited	Unlimited

Users can raise their own soft limits up to the hard limit; only the superuser can raise a hard limit (an ordinary user can lower a hard limit but cannot raise it again). Older systems may have different limits imposed by hardware constraints on the data size and stack size parameters (for example, sun4c architecture machines are limited to 512 MB and 256 MB, respectively).

You can place changes to the limits in the default profiles for users' shells. One of the most common changes is to prevent core dumps by setting the coredumpsize limit to zero:

 csh%  limit coredumpsize 0
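In the Bourne and Korn shells the equivalent is ulimit, with one difference worth knowing about. csh's limit changes only the soft limit. ksh and bash change both the soft and hard limits when neither -S nor -H is given, so users could not raise the limit again afterward. The sketch below sets only the soft limit, in a system-wide profile such as /etc/profile. It assumes a shell whose ulimit accepts -S, -H, -c, and -n, as ksh, bash, and most modern sh implementations do. The show_limits function is a hypothetical convenience.

```shell
#!/bin/sh
# Sketch for a system-wide sh/ksh profile (for example /etc/profile).
# Assumes ulimit supports -S (soft), -H (hard), -c (core) and -n (descriptors).

# Disable core dumps by lowering only the soft limit, like csh's
# `limit coredumpsize 0`; users may raise it again up to the hard limit.
ulimit -S -c 0

# Hypothetical helper: report soft and hard values for two resources.
show_limits() {
    echo "coredumpsize soft=$(ulimit -S -c) hard=$(ulimit -H -c)"
    echo "descriptors soft=$(ulimit -S -n) hard=$(ulimit -H -n)"
}
```

A user who genuinely needs a core file can then run `ulimit -S -c "$(ulimit -H -c)"` to restore the soft limit to the hard limit.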

Remember, enterprising users can always find a way around these restrictions.

2.2.4 Complex Environments

In some complex environments, user workload management becomes a very big deal. You may have hundreds, if not thousands, of processors scattered across anything from one machine to a vast cluster of single-CPU workstations. The problem shifts away from avoiding overloading and toward obtaining optimal utilization across such a wide field of systems. This is a very complex area, and one that commercial vendors are just starting to approach. The leading idea now seems to be computational grids, which enable the distribution of work across geographically disparate areas. While a detailed discussion of these issues is beyond the scope of this text, I would like to offer a few points of advice. Chief amongst these is not to be afraid of gluing together tools into something that works for you. Some Perl wrapped around the process-accounting mechanism can produce nice summary statements of usage on a per-application or per-user basis, with "usage charges" assessed depending on when applications were run and on what machines. You can also, with some care, create simple batch-queuing programs: a user submits their job to a controlling process, which finds an idle system given a certain range of parameters and starts jobs on appropriate systems (e.g., certain machines are workstations and, if otherwise idle, can only be used between 8 P.M. and 6 A.M., or certain jobs require large amounts of physical memory and should only be run on machines with more than 2 GB of memory). Commercial software exists to perform these sorts of tasks.
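As an illustration of the dispatch rule in such a homegrown batch queue, the sketch below picks a host for a job. Everything about it is assumed: the host table format (one line per host giving name, class, memory in MB, and whether the host is currently idle), the name pick_host, and the mechanism that keeps the table current are all hypothetical. It encodes only the two example policies above: workstations are usable from 8 P.M. until 6 A.M., and each job has a minimum memory requirement.

```shell
#!/bin/sh
# Hypothetical dispatcher rule for a homegrown batch queue.
# Usage: pick_host HOUR NEED_MB < hosttable
# Host table lines (assumed format): NAME CLASS MEMORY_MB IDLE
#   e.g. "beta workstation 8192 yes"; lines starting with # are ignored.
# Prints the first eligible host with at least NEED_MB of memory;
# exits 1 if no host qualifies.
pick_host() {
    awk -v hour="$1" -v need="$2" '
        # Workstations may be borrowed only from 20:00 through 05:59.
        function off_hours(h) { return (h >= 20 || h < 6) }
        /^#/ || NF < 4       { next }    # skip comments and short lines
        $4 != "yes"          { next }    # host must be idle right now
        $3 + 0 < need + 0    { next }    # not enough physical memory
        $2 == "workstation" && !off_hours(hour + 0) { next }
        { print $1; found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}
```

Keeping the policy in a single awk program makes it easy to add rules. A real queue would also need to reserve the chosen host, handle jobs that fail, and refresh each host's idle status.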