In this book, I present a few "rules of thumb." As an old IBM technical bulletin says, rules of thumb come from people who live out of town and who have no production experience. [5]
Keeping that in mind, I've found that much of the conceptual basis behind system performance tuning can be summarized in five principles: understand your environment, nothing is ever free, throughput and latency are distinct measurements, resources should not be overutilized, and one must design experiments carefully.

1.2.1 Principle 0: Understand Your Environment

If you don't understand your environment, you can't possibly fix the problem. This is why there is such an emphasis on conceptual material throughout this text: even though the specific details of how an algorithm is implemented might change, or even the algorithm itself, the abstract knowledge of what the problem is and how it is approached is still valid. You have a much more powerful tool at your disposal if you understand the problem than if you know a solution without understanding why the problem exists. Certainly, at some level, problems are well beyond the scope of the average systems administrator: tuning network performance at the level discussed here does not really require an in-depth understanding of how the TCP/IP stack is implemented in terms of modules and function calls. If you are interested in the inner details of how the different variants of the Unix operating system work, there are a few excellent texts that I strongly suggest you read: Solaris Internals by Jim Mauro and Richard McDougall, The Design of the UNIX Operating System by Maurice J. Bach, and Operating Systems: Design and Implementation by Andrew S. Tanenbaum and Albert S. Woodhull (all published by Prentice Hall). The ultimate reference, of course, is the source code itself. As of this writing, the source code for all the operating systems I describe in detail (Solaris and Linux) is available for free.

1.2.2 Principle 1: TANSTAAFL!

TANSTAAFL means There Ain't No Such Thing As A Free Lunch. [6] At heart, performance tuning is about making trade-offs between various attributes. This usually consists of a list of three desirable attributes -- of which we can pick only two. One example comes from tuning the TCP network layer, where Nagle's algorithm provides a way to sacrifice latency, or the time required to deliver a single packet, in exchange for increased throughput, or how much data can be effectively pushed down the wire. (We'll discuss Nagle's algorithm in greater detail in Section 7.4.2.8.)
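To make the trade-off concrete, here is a minimal sketch of how an application can opt out of Nagle's algorithm on a per-socket basis. The TCP_NODELAY socket option is standard; the host name and port below are placeholders, not anything from this text.

    import socket

    # "mailhost" and port 25 are placeholders; any TCP peer will do.
    sock = socket.create_connection(("mailhost", 25))

    # Nagle's algorithm is enabled by default: small writes are
    # coalesced into fewer, larger segments, improving throughput
    # at the cost of per-packet latency. Setting TCP_NODELAY makes
    # the opposite trade: each write is sent immediately, lowering
    # latency but spending wire capacity on small packets.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

Interactive applications tend to prefer the low-latency setting, while bulk transfers are better served by the default; either way, you are choosing between the two attributes, not getting both.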
This principle often necessitates making real, significant, and difficult choices.

1.2.3 Principle 2: Throughput Versus Latency

In many ways, systems administrators evaluating computer systems are often like adolescent males evaluating cars. This is unfortunate. In both cases, there is a certain set of metrics, and we try to find the highest value for the most "important" metric: typically "bulk data throughput" for computers and "horsepower" for cars. The lengths to which some people will go to obtain maximum horsepower from their four-wheeled vehicles are often somewhat ludicrous. A slight change in perspective would reveal that there are other facets to the performance gem; a lot of effort is spent optimizing a single thing that may not actually be a problem. I would like to illustrate this by means of a simplistic comparison (please remember that we are comparing performance alone). Vehicle A puts out about 250 horsepower, whereas Vehicle B produces only about 95. One would assume that Vehicle A exhibits significantly "better" performance in the real world than Vehicle B. The astute reader may ask how much the vehicles involved weigh: Vehicle A weighs about 3,600 pounds, whereas Vehicle B weighs about 450. It is then apparent that Vehicle B is actually quite a bit faster (0-60 mph in about three and a half seconds, as opposed to Vehicle A's leisurely five and a half seconds). However, when we compare how fast the vehicles actually travel in crowded traffic down Highway 101 (which runs down the San Francisco Peninsula), Vehicle B wins by an even wider margin, since motorcycles in California are allowed to ride between the lanes ("lane splitting"). [7]

Perhaps the most often neglected consideration in this world of trade-offs is latency. For example, let's consider a fictional email server application -- call it SuperDuperMail. The SuperDuperMail marketing documentation claims that it is capable of handling over one million messages an hour. This might seem pretty reasonable: this rate is far faster than most companies need. In other words, the throughput is generally good. A different way of looking at the performance of this mail server would be to ask how long it takes to process a single message. After some pointed questions, the SuperDuperMail marketing department reveals that it takes half an hour to process a message. This appears contradictory: it would seem to indicate that the software can process at most two messages an hour. However, it turns out that the SuperDuperMail product is based internally on a series of bins, and moves messages to the next bin only when that bin is completely full. In this example, despite the fact that throughput is acceptable, the latency is horrible. Who wants to send a message to the person two offices down and have it take half an hour to get there?
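A quick back-of-the-envelope model shows how full-bin batching produces exactly this behavior. The bin size, stage count, and arrival rate below are hypothetical numbers I chose to reproduce the half-hour figure; the SuperDuperMail story doesn't specify them.

    # Deterministic sketch of a "forward the bin only when it is
    # full" pipeline. All specific figures here are hypothetical.
    def batched_latency(arrival_interval_s, bin_size, n_stages):
        # A message landing in a freshly emptied bin must wait for
        # (bin_size - 1) later arrivals before the bin is forwarded,
        # and it pays that wait at every stage.
        return n_stages * (bin_size - 1) * arrival_interval_s

    interval = 3600 / 1_000_000        # one million messages an hour
    latency = batched_latency(interval, bin_size=50_000, n_stages=10)
    print(f"per-message latency: {latency / 60:.0f} minutes")   # ~30

Note that the batching leaves throughput untouched (the same million messages still flow every hour), which is exactly why a throughput-only benchmark never notices the problem.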
1.2.4 Principle 3: Do Not Overutilize a Resource

Anyone who has driven on a heavily traveled highway has seen a very common problem in computing emerge in a different arena: the "minimum speed limit" signs often seem like a cruel joke! Clearly, many factors go into designing and maintaining an interstate, particularly one that is heavily used by commuters: the peak traffic is significantly different from the average traffic, funding is always a problem, and so on. Furthermore, adding another lane to the highway, assuming that space and funding are available, usually involves temporarily closing at least one lane of the active roadway, which invariably frustrates commuters even more. The drive to provide "sufficient" capacity is always there, but the barriers to implementing change are such that it often takes quite a while to add capacity. In the abstract, the drive to expand is strongest when collapse is imminent or already occurring. This attitude usually makes sense: why should we build excess capacity when we aren't fully using what we have? Unfortunately, there are some cases where complete utilization is not optimal. This is true in computing, yet people often push their systems to 100% utilization before considering upgrades or expansion. Overutilization is a dangerous thing. As a general rule of thumb, a resource should be no more than 70% busy or consumed at any given time: this provides a margin of safety before serious degradation occurs.
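Elementary queueing theory suggests why a figure like 70% is a sensible ceiling. For a single server with random arrivals (an M/M/1 queue, which is my illustration rather than anything derived in this text), the mean response time is the service time divided by (1 - utilization), so response time grows explosively as utilization approaches 100%. The 10 ms service time below is an arbitrary example value.

    # Mean response time in an M/M/1 queue: R = S / (1 - U),
    # where S is the service time and U the utilization.
    service_time_ms = 10.0  # arbitrary example value
    for u in (0.50, 0.70, 0.90, 0.95, 0.99):
        r = service_time_ms / (1.0 - u)
        print(f"utilization {u:4.0%}  mean response time {r:7.1f} ms")

At 70% utilization a request takes roughly three times its bare service time; at 95% it takes twenty times, and the curve only steepens from there. Holding something back as headroom is what keeps the system out of that regime.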
1.2.5 Principle 4: Design Tests Carefully

I would like to explain this principle by means of a simple example: transferring a 130 MB file over Gigabit Ethernet using ftp.

    % pwd
    /home/jqpublic
    % ls -l bigfile
    -rw-------   1 jqpublic staff  134217728 Jul 10 20:18 bigfile
    % ftp franklin
    Connected to franklin.
    220 franklin FTP server (SunOS 5.8) ready.
    Name (franklin:jqpublic): jqpublic
    331 Password required for jqpublic.
    Password: <secret>
    230 User jqpublic logged in.
    ftp> bin
    200 Type set to I.
    ftp> prompt
    Interactive mode off.
    ftp> mput bigfile
    200 PORT command successful.
    150 Binary data connection for bigfile (192.168.254.2,34788).
    226 Transfer complete.
    local: bigfile remote: bigfile
    134217728 bytes sent in 13 seconds (9781.08 Kbytes/s)
    ftp> bye
    221 Goodbye.

If you saw this sort of performance, you would probably be very upset: you were expecting a transfer rate on the order of 120 MB per second, and instead you are barely getting a tenth of that. What on earth happened? Is there a parameter that's not set properly somewhere? Did you get a bad network card? You could waste a lot of time searching for the answer to that question. The truth is that what you probably measured was either how fast you could push data out of /home, or how fast the remote host could accept that data. [8] The network layer is not the limiting factor here; the bottleneck is something else entirely: the CPU, the disks, the operating system, etc.
The moral of this example is to think very, very carefully when you design performance measurement experiments. All sorts of things may be going on just beneath the surface of a measurement, causing you to measure something that has little bearing on what you are actually interested in. If you really want to measure something, try to find a tool written specifically to test that component. Even then, be careful: it's very easy to get burned.
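For example, if the question is really "how fast can these two hosts move bytes over TCP?", the measurement tool should take the disks out of the picture entirely and send data straight from memory. The sketch below runs a sender and a discarding receiver over the loopback interface purely to show the shape of such a tool; in a real test you would run the two halves on the hosts under test, and a dedicated tool such as netperf does the same job far more rigorously.

    import socket
    import threading
    import time

    CHUNK = 64 * 1024               # reused in-memory buffer: no disk I/O
    TOTAL = 256 * 1024 * 1024       # bytes to send
    HOST, PORT = "127.0.0.1", 5001  # loopback for the sketch only

    def sink():
        # Accept one connection and discard everything it sends.
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        while conn.recv(CHUNK):
            pass
        conn.close()
        srv.close()

    threading.Thread(target=sink, daemon=True).start()
    time.sleep(0.2)  # crude: give the listener time to come up

    buf = b"\0" * CHUNK
    s = socket.create_connection((HOST, PORT))
    start = time.time()
    sent = 0
    while sent < TOTAL:
        s.sendall(buf)
        sent += CHUNK
    s.close()  # close() returning does not prove the receiver has
               # drained its buffers; good enough for a sketch
    elapsed = time.time() - start
    print(f"{sent / elapsed / 1e6:.1f} MB/s (memory to memory)")

Because both ends read from and write to memory, whatever number comes out is at least a measurement of the network path and the protocol stack, rather than of /home.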