This section covers scalability problems and how to prevent them. This is more of a "don't do this" list, explaining limiting factors that can degrade performance or prevent the server from scaling. We will also investigate the proactive tuning of Apache for optimal performance.
Operating System Limits
Several operating system factors can prevent Apache from scaling. These factors are related to process creation, memory limits, and maximum simultaneous number of open files or connections.
The Unix ulimit command enables you to set several of the limits covered in this section on a per-process basis. Please refer to your operating system documentation for details on ulimit's syntax.
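As a rough illustration of the note above, the following sketch prints the soft and hard limits on open files for the current shell and then raises the soft limit inside a subshell. It assumes a shell whose ulimit accepts the -S, -H, and -n options (bash and most sh implementations do); option letters for other resources vary, so check your shell's documentation.

```shell
#!/bin/sh
# Show the soft (current) and hard (ceiling) limits on open files
# for this shell process. The -n option letter is an assumption that
# holds for bash and most sh implementations.
soft_nofile=$(ulimit -S -n)
hard_nofile=$(ulimit -H -n)
echo "open files: soft=$soft_nofile hard=$hard_nofile"

# Raise the soft limit up to the hard limit inside a subshell only,
# so the change does not outlive this example.
(
  ulimit -S -n "$hard_nofile"
  echo "soft limit inside subshell: $(ulimit -S -n)"
)
```

Limits set this way are inherited by child processes, so a limit raised in the shell that starts the server also applies to the server.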
Apache provides settings for preventing the number of server processes and threads from exceeding certain limits. These settings affect scalability because they limit the number of simultaneous connections to the Web server, which in turn affects the number of visitors that you can service simultaneously.
The Apache MPM settings are in turn constrained by OS settings limiting the number of processes and threads. How to change those limits varies from operating system to operating system. In Linux 2.0.x and 2.2.x kernels, it requires changing the NR_TASKS value defined in /usr/src/linux/include/linux/tasks.h and recompiling the kernel. In the 2.4.x series, the limit can be accessed at runtime from the /proc/sys/kernel/threads-max file. You can read the contents of the file with
cat /proc/sys/kernel/threads-max
and write to it using
echo value > /proc/sys/kernel/threads-max
In Linux (unlike most other Unix versions), threads are mapped onto processes, so the OS treats the two in much the same way.
In Solaris, those parameters can be changed in the /etc/system file. Those changes don't require rebuilding the kernel, but might require a reboot to take effect. You can change the total number of processes by changing the max_nprocs entry and the number of processes allowed for a given user with maxuproc.
Whenever a process opens a file (or a socket), a structure called a file descriptor is assigned until the file is closed. The OS limits the number of file descriptors that a given process can open, thus limiting the number of simultaneous connections the Web server can have. How those settings are changed depends on the operating system. On Linux systems, you can read or modify /proc/sys/fs/file-max (using echo and cat, as explained in the previous section). On Solaris systems, you must edit the value for rlim_fd_max in the /etc/system file. This change will require a reboot to take effect.
You can find additional information at http://httpd.apache.org/docs/misc/descriptors.html.
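To see how the per-process descriptor limit relates to the system-wide ceiling, a minimal sketch along the following lines can help. It assumes a Linux system with /proc mounted and a shell whose ulimit accepts -n; on other systems the script takes the fallback branch.

```shell
#!/bin/sh
# Compare the per-process descriptor limit with the system-wide ceiling.
# Assumes Linux with /proc mounted; other systems take the else branch.
per_process=$(ulimit -S -n)
echo "per-process soft limit: $per_process"

if [ -r /proc/sys/fs/file-max ] && [ -r /proc/sys/fs/file-nr ]; then
  system_max=$(cat /proc/sys/fs/file-max)
  echo "system-wide maximum:    $system_max"
  # file-nr holds three fields: allocated handles, unused handles, maximum.
  read -r allocated unused maximum < /proc/sys/fs/file-nr
  echo "handles allocated now:  $allocated"
else
  echo "no /proc/sys/fs/file-max here; consult your OS documentation"
fi
```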
Controlling External Processes
Apache provides several directives to control the amount of resources external processes use. This applies to CGI scripts spawned from the server and programs executed via Server Side Includes. Support for the following directives is available only on Unix and varies from system to system:
RLimitCPU Accepts two parameters: the soft limit and the hard limit for the amount of CPU time in seconds that a process is allowed. If the max keyword is used, it indicates the maximum setting allowed by the operating system. The hard limit is optional. The soft limit can be changed between restarts, and the hard limit specifies the maximum allowed value for that setting.
RLimitMEM The syntax is identical to RLimitCPU, but this directive specifies the amount (in bytes) of memory used per process.
RLimitNPROC The syntax is identical to RLimitCPU, but this directive specifies the number of processes.
These three directives are useful to prevent malicious or poorly written programs from running out of control.
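The effect of a CPU time limit can be observed from the shell. The sketch below is not an Apache configuration; it assumes that the shell's ulimit -t sets the same kind of operating system CPU limit that RLimitCPU requests for spawned programs, and it uses a deliberately runaway loop as a stand-in for a poorly written CGI script.

```shell
#!/bin/sh
# Run a busy loop under a 1-second soft CPU limit. The subshell keeps
# the limit from affecting the calling shell. Once the loop has used
# about one second of CPU time, the OS terminates it with a signal.
( ulimit -S -t 1; while :; do :; done ) 2>/dev/null
status=$?

# A nonzero status shows that the loop did not exit on its own.
echo "busy loop ended with status $status"
```

The shell may print its own message about the terminated subshell; the exact wording and the numeric status depend on the platform.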
Performance-Related Apache Settings
This section presents you with different Apache settings that affect performance.
File System Access
Accessing files on disk is expensive. You should try to minimize the number of disk accesses required for serving a request. Symbolic links, per-directory configuration files, and content negotiation are some of the factors that affect the number of disk accesses:
Symbolic links In Unix, a symbolic link (or symlink) is a special kind of file that points to another file. It is created with the Unix ln command and is useful for making a certain file appear in different places.
Two of the parameters that the Options directive allows are FollowSymLinks and SymLinksIfOwnerMatch.
By default, Apache won't follow symbolic links because they can be used to bypass security settings. For example, you can create a symbolic link from a public part of the Web site to a restricted file or directory not otherwise accessible via the Web. So, also by default, Apache needs to perform a check to verify that the file isn't a symbolic link. If SymLinksIfOwnerMatch is present, it will follow a symbolic link if the same user who created the symbolic link owns the target file. Because those tests must be performed for every path element and for every path that refers to a filesystem object, they can be expensive. If you control the content creation, you should add an Options +FollowSymLinks directive to your configuration and avoid the SymLinksIfOwnerMatch argument. In this way, the tests won't take place and performance isn't affected.
Per-directory configuration files As explained in Hour 2, "Installing and Configuring Apache," it is possible to have per-directory configuration files. These files, normally named .htaccess, provide a convenient way of configuring the server and allow for some degree of delegated administration. However, if this feature is enabled, Apache has to look for these files in each directory in the path leading to the file being requested, resulting in expensive filesystem accesses. If you don't need per-directory configuration files, you can disable this feature by adding AllowOverride None to your configuration. Doing so avoids the performance penalty of searching the filesystem for .htaccess files.
Content negotiation Apache can serve different versions of a file depending on client language or preferences. This can be accomplished with file extensions, but for every request, Apache must access the filesystem repeatedly looking for files with appropriate extensions. If you need to use content negotiation, make sure that you at least use a type-map file, minimizing accesses to disk.
Scoreboard file This is a special file that the main Apache process uses to communicate with its children in certain older operating systems. You can specify its location with ScoreBoardFile, but most modern platforms do not require this directive. If this file is required, you might find improved performance if you place it on a RAM disk. A RAM disk is a mechanism that allows a portion of the system memory to be accessed as a filesystem. The details on creating a RAM disk vary from system to system.
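To get a feel for the per-directory lookups described in this list, the following sketch enumerates the .htaccess locations that would be candidates for a single request, and counts the .htaccess files and symbolic links that actually exist under a directory. The function names and the request path are hypothetical, and the sketch assumes overrides are enabled from the filesystem root down; it does not inspect a real Apache configuration.

```shell
#!/bin/sh
# List every .htaccess location that would be a candidate for one request,
# assuming overrides are enabled from the filesystem root down.
# Expects an absolute path to the requested file.
htaccess_candidates() {
  dir=$(dirname "$1")
  while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    echo "$dir/.htaccess"
    dir=$(dirname "$dir")
  done
  echo "/.htaccess"
}

# Count the .htaccess files and symbolic links that exist under a directory.
count_htaccess() { find "$1" -type f -name .htaccess | wc -l | tr -d ' '; }
count_symlinks() { find "$1" -type l | wc -l | tr -d ' '; }

# /www/htdocs/docs/index.html is a hypothetical request path.
htaccess_candidates /www/htdocs/docs/index.html
```

Running count_htaccess and count_symlinks against your own document root gives a quick indication of whether AllowOverride None and Options +FollowSymLinks are safe choices for your content.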
Network and Status Settings
A number of network-related Apache settings can degrade performance:
HostnameLookups When HostnameLookups is set to on or double, Apache will perform a DNS lookup to capture the hostname of the client, introducing a delay. The default setting is HostnameLookups off. If you need to use the hostnames, you can always process the request logs with a log resolver later.
Accept mechanism Apache can use different mechanisms to control how Apache children arbitrate requests. The optimal mechanism depends on the specific platform and number of processors. You can find detailed tests and performance analysis at http://research.covalent.net/projects/osdl1.html. Additional information can be found at http://httpd.apache.org/docs-2.0/misc/perf-tuning.html.
mod_status This module collects statistics about the server, connections, and requests, which slows down Apache. For optimal performance, disable this module, or at least make sure that ExtendedStatus is set to off, which is the default.
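Resolving hostnames after the fact, as suggested for HostnameLookups, might look like the sketch below. It assumes the logresolve utility distributed with Apache is installed and on your PATH; the resolve_log function and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Resolve client IP addresses in an access log offline, so the server
# itself can keep running with HostnameLookups off.
# $1 is the existing log file, $2 is the file to write the resolved copy to.
resolve_log() {
  if command -v logresolve >/dev/null 2>&1; then
    # logresolve reads log lines on stdin and writes resolved lines to stdout.
    logresolve < "$1" > "$2"
  else
    echo "logresolve not found in PATH" >&2
    return 1
  fi
}

# Hypothetical usage:
#   resolve_log access_log access_log.resolved
```

Because the lookups happen outside the request cycle, any DNS delay affects only the log processing job and not the visitors being served.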