9.2 Instant Tuning Recipes

This section presents some recipes for performance tuning. Of course, these recommendations are for general scenarios, so feel free to modify them for your environment.

The best recipe for any machine, be it a server or a workstation, is to keep the installed operating system up-to-date, and be vigilant about maintaining patches. Performance patches are quite common and sometimes have dramatic effects, or enable you to use a newer algorithm (e.g., priority paging in Solaris) on older operating systems.

9.2.1 Single-User Development Workstations

One of the most common tuning scenarios is a computer used by a single developer. Such users often require high performance for compilation and other tasks that are both processor- and filesystem-intensive, but they may not have the high-speed disks that you would expect on a large server.

9.2.1.1 Filesystems

My preferred method of filesystem layout is to break the system disk into two partitions, one for swap space and the other for everything else. The swap partition should be the lower-numbered one on the disk; for an explanation of why, see Section 5.4.10.1. However, lots of people will disagree with me, and this topic borders on the religious. If you have more than one local disk, you should place a swap partition on each of them, in order to spread out the load a bit. On Solaris, all filesystems should be mounted with the logging option, to improve recovery time in the event of a power failure (see Section 5.4.3).
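As a hedged sketch, the layout described above might look like this in /etc/vfstab, with swap on the lower-numbered slice and the logging option on the data filesystem; the device names are assumptions for illustration:

```
# swap on the lower-numbered slice, per the advice above
/dev/dsk/c0t0d0s0   -                    -   swap   -   no   -
# everything else on one large partition, mounted with logging
/dev/dsk/c0t0d0s1   /dev/rdsk/c0t0d0s1   /   ufs    1   no   logging
```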

NFS-mounted filesystems containing essentially invariant files, such as application programs, should be mounted read-only in order to avoid NFS writes of file access times. You should also use CacheFS, if it's available for your platform, to cache commonly used files -- it is particularly useful for application directories. It shouldn't be used for mail directories, and it may be useful for home directories, depending on the read/write mixture of the files. Any application that regularly performs large writes to an NFS filesystem shouldn't use CacheFS. You can read more about implementing CacheFS in Section 5.4.10.
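A hedged sketch of setting this up on a Solaris client: create a cache with cfsadmin, then mount a read-only application directory through it. The server name, export, and paths are assumptions.

```shell
# create the on-disk cache directory for CacheFS
cfsadmin -c /var/cache/fs

# mount an invariant application directory read-only through the cache;
# backfstype=nfs tells CacheFS what the back filesystem is
mount -F cachefs -o backfstype=nfs,cachedir=/var/cache/fs,ro \
    server:/export/apps /opt/apps
```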

9.2.1.2 Swap space

Choosing the amount of swap space is fairly difficult. If you don't have a recommendation from your application vendor, configure at least 128 MB of virtual memory. Using swap files is less orderly than swap partitions, and disk space is cheap, so more never hurts. Most desktop machines will never need more than 512 MB of paging space.
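If you find you've under-configured, Solaris lets you add paging space without repartitioning. A minimal sketch, with an assumed path and size:

```shell
# list the currently configured swap devices and their usage
swap -l

# create a 256 MB swap file and add it to the paging pool
mkfile 256m /export/swapfile
swap -a /export/swapfile
```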

9.2.1.3 Kernel tuning

Most operating systems are well configured for this sort of activity, so there's not much to do. The one exception is that if you are running Solaris 7 or earlier, you should turn on priority paging by adding the following lines to /etc/system:

 * enable priority paging on pre-Solaris 8 systems
 set priority_paging = 1

For a detailed explanation of priority paging, see Section 4.4.3.1.

9.2.2 Workgroup Servers

In this case, we'll consider a workgroup server as one that is performing a variety of tasks in support of a small department. These tasks might include email services, file storage, light web service, etc. In a Unix-only environment, directories will be exported via NFS; on Windows networks, we'll have to support SMB/CIFS with Samba. The server may also be the first-line proxy cache for the department's web browsers, as well as booting any thin clients, X terminals, or other network-reliant computers. The system is probably connected to the network via a few Fast Ethernet connections.

The workload mixture is probably going to be largely NFS or Samba traffic. The basic sendmail software supplied with Solaris is capable of supporting at least a few hundred users without much, if any, tuning. If you need to support more than that, you are probably outside of the workgroup server realm, but software products like iPlanet Messaging Server can support millions of users and offer much more efficient methods of storing messages. The web server usage is also probably going to be quite light. A uniprocessor UltraSPARC-II or faster Pentium 3-class system is easily capable of handling these tasks.

9.2.2.1 Memory

This sort of workload is not particularly memory-intensive: the largest active user program will probably be sendmail, and the rest of main memory will be a cache for the filesystems. Since most files will only be requested once due to client-side caching, the filesystem cache doesn't need to be large. Anything that is going to be reread frequently (e.g., applications) is an excellent candidate for being mounted via CacheFS on the clients.

As a rough guideline, start with 128 MB of memory for the system, then add about 64 MB of memory for each attached Fast Ethernet interface. If you are supporting X terminals or SunRays, you will need to add about 128 MB of memory for each one, depending on the workload that the clients will be running (one rule of thumb is to add as much memory to the server as you would install in a desktop machine, less 32 MB per client).
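The rule of thumb above reduces to simple arithmetic. A minimal sketch, where the interface and client counts are assumptions for illustration:

```shell
# Memory sizing rule of thumb from the text: 128 MB base, plus 64 MB per
# Fast Ethernet interface, plus 128 MB per X terminal or SunRay client.
base_mb=128
nics=2        # assumed: two Fast Ethernet interfaces
clients=4     # assumed: four X terminals or SunRays
total_mb=$(( base_mb + 64 * nics + 128 * clients ))
echo "${total_mb} MB"
```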

9.2.2.2 Disks

The two most important goals of disk configuration are ensuring the reliability of data and accelerating writes as much as possible. If you have a hardware RAID device (such as a Sun StorEdge A1000), configure it as a RAID 1+0 or RAID 5 device, depending on how important capacity is (see my recommendations in Section 6.6). If you are deploying software RAID, you will want to use the DiskSuite logging feature instead of the UFS logging option, as UFS logging decreases performance in this configuration. DiskSuite-based logging filesystems should use a dedicated log disk of at least 7,200 rpm. [2] Hardware-based disk arrays can be safely mounted with the logging option, as writes to the log will be cached in the array controller's NVRAM.

[2] This may seem like wasting a lot of space; really, it's not. We are interested in the performance of this drive, not its capacity.
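A hedged sketch of the DiskSuite logging setup described above, building a trans metadevice with the log on its own fast disk; all metadevice and slice names are assumptions:

```shell
# master device: the slice holding the data filesystem
metainit d20 1 1 c0t1d0s0

# log device on a dedicated 7,200 rpm (or faster) disk
metainit d30 1 1 c1t0d0s0

# trans (logging) metadevice: master + log
metainit d10 -t d20 d30

# build the filesystem on the trans device
newfs /dev/md/rdsk/d10
```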

You should mirror the system disk in this configuration.

9.2.2.3 Filesystems

I would create a single, large root partition, like in the workstation recipe. However, /var should reside on a separate partition; it will need to be write-accelerated. This doesn't have much to do with log files; instead, we're trying to improve the performance of operations like delivering mail to an inbox (a client rewriting the mail file on exit is a significant disk operation). /var should be fairly large; precisely how large depends on your usage patterns.

Create a separate filesystem for the user home directories.

9.2.2.4 Swap space

You should not need very much swap space for this sort of workload. A few hundred megabytes is more than sufficient.

9.2.2.5 Optimizing NFS

There are three important tuning changes to make for NFS servers: be sure you're using NFS Version 3, increase the number of NFS server threads, and, on Solaris systems, increase the TCP buffer sizes.

NFS Version 3 implements a few algorithmic changes that improve performance significantly, especially on file writes. All of your NFS clients should specify the vers=3 parameter when mounting their filesystems. You can use nfsstat to find out whether any Version 2 calls are being made on your server. For more details about NFS Version 3, see Section 7.5.
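A minimal sketch of both steps; the server name and paths are assumptions:

```shell
# on each client: force NFS Version 3 when mounting
mount -F nfs -o vers=3 server:/export/home /home

# on the server: nonzero counts in the "Version 2" section of the
# server-side statistics indicate clients still using the old protocol
nfsstat -s
```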

Picking an appropriate number of NFS kernel threads is important. The out-of-the-box default is acceptable, but certainly not conducive to maximum performance. You adjust this parameter by editing the line in /etc/init.d/nfs.server that starts nfsd; 128 is a reasonable number to start with:

 /usr/lib/nfs/nfsd -a 128 

On Solaris systems, you'll probably need to increase the TCP receive and transmit buffers, which are tuned by the tcp_recv_hiwat and tcp_xmit_hiwat variables. You will have to experiment with reasonable values, but a good starting point is to set tcp_xmit_hiwat to 56 KB (57,344) and tcp_recv_hiwat to 32 KB (32,768). You shouldn't set these values greater than 64,000, in order to avoid turning on TCP window scaling. Linux has equivalent parameters, but their default values are high enough that they don't need to be tuned. For more details, see Section 7.4.2.3.
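A hedged sketch of applying these starting points with ndd (note that ndd settings do not survive a reboot; to make them permanent, put the commands in an init script):

```shell
# starting points from the text: 56 KB transmit, 32 KB receive
ndd -set /dev/tcp tcp_xmit_hiwat 57344
ndd -set /dev/tcp tcp_recv_hiwat 32768

# verify the new values
ndd -get /dev/tcp tcp_xmit_hiwat
ndd -get /dev/tcp tcp_recv_hiwat
```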

9.2.2.6 Kernel tuning

You may want to increase the maxphys variable to enable the kernel to push more contiguous pieces of modified files out to disk. You can read more about maxphys in Section 4.4.4.

If your Solaris system is mostly performing NFS workloads and has less than 128 MB of memory, you should increase the directory name lookup cache and the inode cache. You can read more about these caches in Section 5.4.1.1 and Section 5.4.2.7 in Chapter 5. Increasing them to 8,000 entries each is sufficient:

 * increase the dnlc and inode caches for low-memory servers
 set ncsize = 8000
 set ufs:ufs_ninode = 8000

Pre-Solaris 8 systems must turn on priority paging; see the previous recommendations for single-user workstations, or read Section 4.4.3.1.

9.2.3 Web Servers

There seems to be a definite trend to overbuild web servers to provide static content. A single UltraSPARC-II or fast Pentium-3 processor can saturate a 100-megabit Ethernet segment with static content easily, and scalability is effectively linear up to at least four processors. As the Web moves towards being a more dynamically generated medium, however, processor performance becomes more of a concern. Unfortunately, there are no good rules of thumb for sizing dynamic HTTP workloads simply because the workloads vary so much.

9.2.3.1 Memory

Web servers require quite a bit of memory to operate quickly, because they tend to perform best with large TCP buffers (which consume memory quickly when there are many connections) and with extensive filesystem caching. As a rough guideline, I would start with 256 MB of memory, and then add the size of the files that are responsible for about 90% of the requests (you can derive this information by inspecting the log files). This will give you enough memory to cache the "hottest" pages.
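A hypothetical sketch of deriving that number from a Common Log Format access log: sum the response sizes of the most-requested URLs. The log contents, the field positions (7 = URL, 10 = response size), and the top-100 cutoff are all assumptions.

```shell
# a tiny stand-in access log; in practice, use your real one
cat > access_log <<'EOF'
10.0.0.1 - - [01/Jan/2002:00:00:00 -0500] "GET /index.html HTTP/1.0" 200 4096
10.0.0.2 - - [01/Jan/2002:00:00:01 -0500] "GET /index.html HTTP/1.0" 200 4096
10.0.0.3 - - [01/Jan/2002:00:00:02 -0500] "GET /big.jpg HTTP/1.0" 200 102400
EOF

# for each URL, record its size and count its hits; rank by hits, keep the
# top 100, and sum their sizes to estimate the cache memory needed
total=$(awk '{ size[$7] = $10; hits[$7]++ }
             END { for (u in size) print hits[u], size[u] }' access_log |
        sort -rn | head -100 |
        awk '{ t += $2 } END { print t }')
echo "$total bytes"
```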

Pre-Solaris 8 systems should turn on priority paging. See Section 4.4.3.1.

9.2.3.2 Disks

For very large web sites or those providing web service for a large number of sites, disk storage becomes an issue. A single 7,200 rpm disk can sustain about 50 static HTTP operations per second, so you should work out how many peak operations per second you intend to sustain and stripe disks until you reach that limit. RAID 5 arrays are an excellent choice; they provide some fault resilience while maintaining good performance (largely because the workload is very read-heavy).
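The sizing rule above can be sketched as integer arithmetic; the peak-load figure is an assumption for illustration:

```shell
# rule of thumb from the text: roughly 50 static HTTP operations per
# second per 7,200 rpm disk
peak_ops=600      # assumed peak static operations per second
ops_per_disk=50
disks=$(( (peak_ops + ops_per_disk - 1) / ops_per_disk ))   # round up
echo "$disks disks"
```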

On busy web sites, it may be necessary to stripe and mirror the directory containing the HTTP logs. These will grow fast, especially during peak load.

9.2.3.3 Filesystems

I would still advise creating a single, large root partition, just like in the workgroup server recipe. However, as noted previously, you should be sure to create a dedicated filesystem for the HTTP logs, and take steps to improve write performance.

Whether or not you mirror the system disk is up to you. In large installations where the web servers are a group of Netra T1s or other lightweight machines behind a load-balancing switch or a round-robin DNS entry, the failure of a single machine isn't a big problem if your infrastructure is set up to work around it. You can just wait for the machine to be replaced, reinstall it via automated tools, and put it back into service. This works best when your actual content is provided to the front-line web servers via an NFS backend.

9.2.3.4 Swap space

Again, web servers tend not to need a lot of swap space. 512 MB should be plenty, just to absorb anonymous memory requests that go unused. For details, see Section 4.3.2.1.

9.2.3.5 Networks

Network sizing is a bit trickier. We'll focus on static operations here; if your web server is serving a lot of dynamic content, you'll need to develop your own guidelines based on your particular workload. Table 9-1 is intended to provide a rough means of sizing your network capacity. For example, if you have a thousand users coming in via 56K dialup lines, you won't have to sustain more than 400 HTTP operations per second. Conversely, if you are trying to support 1,000 operations per second on a full-duplex Fast Ethernet link, you are probably doomed to fail.

Table 9-1. Network utilization for HTTP clients

Network connection               Bandwidth (per second)   Peak static HTTP operations/second
56K dialup                       56 Kbps                  0.4
ISDN                             128 Kbps                 1
768K SDSL (full-duplex)          768 Kbps                 5
T1                               1.5 Mbps                 10
Cable modem                      6 Mbps                   35
Ethernet                         10 Mbps                  60
DS3                              45 Mbps                  300
Fast Ethernet (full-duplex)      200 Mbps                 600
Gigabit Ethernet (full-duplex)   2,000 Mbps               5,000
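The dialup example above can be sketched with the per-client figure from Table 9-1; the user count is an assumption:

```shell
# 1,000 dialup users at 0.4 peak static operations/second each;
# work in tenths of an operation to avoid shell floating point
users=1000
tenths_per_user=4    # 0.4 ops/sec per 56K dialup client
peak_ops=$(( users * tenths_per_user / 10 ))
echo "$peak_ops ops/sec"
```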

There are two other issues you should be aware of regarding web server performance. The first is that SSL (that is, serving HTTPS) is an absolute performance killer: divide the number of sustainable HTTP operations per processor by ten. There is a new generation of hardware SSL accelerator cards coming onto the market now, but I have no experience with them and cannot provide you with any good advice.

The second issue to keep in mind is software products like Solaris's Network Cache Accelerator (NCA). These products run in kernel space and act as a caching mechanism for incoming (usually static) HTTP requests. This saves the trouble of copying the request from the kernel into user space, waking up the web server application, fetching the data from whatever cache (or, worse, disk) the page resides in, and then returning it to the kernel. NCA simply caches static web pages in a piece of kernel memory and transparently answers requests, much faster. A newer generation of NCA will probably support hooks for dynamic content retrieval. Another product in this same vein is Tux, a very high-performance web server for Linux that runs significant portions of itself in kernel space. The caveat is that these products sometimes exhibit stability issues, so you'll need to test them carefully before deploying them.

9.2.3.6 Kernel tuning

You will probably want to increase the default sizes of the incomplete connection queue and the completed connection queue. In Solaris, these tunables are called tcp_conn_req_max_q0 and tcp_conn_req_max_q, respectively, and default to 1,024 and 128. The sign that you should increase these variables is large values of the tcpListenDrop and tcpListenDropQ0 counters, which are viewable via netstat -sP tcp. tcp_conn_req_max_q0 probably shouldn't be set above about 10,000, and tcp_conn_req_max_q shouldn't need to be set larger than tcp_conn_req_max_q0. In Linux, the equivalent of the tcp_conn_req_max_q0 parameter is tcp_max_syn_backlog, which can be found in /proc/sys/net/ipv4/. For more information on the connection queues, see Section 7.4.2.1.
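A hedged sketch of checking for drops and raising the queues with ndd; the new values are illustrative assumptions within the limits discussed above, and ndd settings do not persist across reboots:

```shell
# look for queue overflows; nonzero counters suggest raising the queues
netstat -sP tcp | egrep 'tcpListenDrop'

# inspect the current queue sizes
ndd -get /dev/tcp tcp_conn_req_max_q0    # default 1024
ndd -get /dev/tcp tcp_conn_req_max_q     # default 128

# raise them (assumed values, kept under the ~10,000 ceiling)
ndd -set /dev/tcp tcp_conn_req_max_q0 4096
ndd -set /dev/tcp tcp_conn_req_max_q 1024
```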

You will also want to increase the TCP receive and send buffers, governed by the tcp_recv_hiwat and tcp_xmit_hiwat variables in Solaris. You will have to experiment with reasonable values, but I would start by increasing tcp_xmit_hiwat to 48 KB (49,152) and tcp_recv_hiwat to 32 KB (32,768). You shouldn't set these values greater than 64,000, in order to avoid turning on TCP window scaling. Linux web servers ship with these values already set appropriately for high performance, but you can tune them if you like. For more details, see Section 7.4.2.3.

9.2.3.7 Special case: proxy servers

The big difference between web servers and proxy servers is that proxy servers tend to be much more write-intensive (as they write cached pages to disk). The ideal configuration for a proxy server therefore involves a disk array with nonvolatile memory to cache and accelerate writes. A similar effect can be obtained by using a dedicated disk to log transactions to the cache filesystem.

If you are configuring a fast uniprocessor proxy server with, for example, eight 9 GB disks for data, I would stripe six of the disks with a 64 KB interlace size using DiskSuite or some other software RAID package. The remaining two disks would each have one 128 MB partition, be mirrored, and be used for storing the log information. Do not put anything else on those log disks: they are fully busy from a service-time point of view, even if they have spare storage and throughput available.
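A hedged DiskSuite sketch of that layout; all device and metadevice names are assumptions:

```shell
# six-way data stripe with a 64 KB interlace
metainit d10 1 6 c0t1d0s0 c0t2d0s0 c0t3d0s0 \
    c0t4d0s0 c0t5d0s0 c0t6d0s0 -i 64k

# one 128 MB slice on each of the two remaining disks, mirrored for the log
metainit d21 1 1 c1t1d0s0      # log slice, first side
metainit d22 1 1 c1t2d0s0      # log slice, second side
metainit d20 -m d21            # create the mirror...
metattach d20 d22              # ...and attach the second side
```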



System Performance Tuning (2002)