This section covers scalability problems and how to prevent them. It is more of a "don't do this" list, explaining the limiting factors that can degrade performance or prevent the server from scaling. It also looks at proactively tuning Apache for optimal performance.
Operating System Limits
Several operating system factors can prevent Apache from scaling. These factors relate to process creation, memory limits, and the maximum number of simultaneously open files or connections.
By the Way
The UNIX ulimit command enables you to set several of the limits covered in this section on a per-process basis. Refer to your operating system documentation for details on ulimit's syntax.
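As a minimal sketch of ulimit in use, the following commands inspect and raise the open-file limit for the current shell. The -S, -H, and -n options shown are the ones accepted by common shells such as bash; treat them as an assumption and check your own shell's documentation, because option letters vary.

```shell
# Soft limit: the value currently enforced.
# Hard limit: the ceiling up to which the soft limit can be raised
# without root privileges.
soft_nofile=$(ulimit -S -n)
hard_nofile=$(ulimit -H -n)
echo "open files per process: soft=$soft_nofile hard=$hard_nofile"

# Raise the soft limit to the hard limit for this shell and the
# processes started from it.
ulimit -S -n "$hard_nofile"
```

Limits set this way are inherited by child processes, so they apply to a server only if it is started from the same shell.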
Apache provides settings for preventing the number of server processes and threads from exceeding certain limits. These settings affect scalability because they limit the number of simultaneous connections to the web server, which in turn affects the number of visitors you can service simultaneously.
The Apache Multi-Processing Module (MPM) settings are in turn constrained by OS settings limiting the number of processes and threads. How to change those limits varies from operating system to operating system. In Linux 2.0.x and 2.2.x kernels, it requires changing the NR_TASKS value defined in /usr/src/linux/include/linux/tasks.h and recompiling the kernel. In the 2.4.x series, the limit can be read and changed at runtime through the /proc/sys/kernel/threads-max file. You can read the contents of the file with this command:
# cat /proc/sys/kernel/threads-max
You can set a new limit by writing to the file as root, replacing value with the number you want:
# echo value > /proc/sys/kernel/threads-max
In Linux (unlike most other UNIX versions), each thread maps to its own kernel task, so threads and processes look much alike from the point of view of the OS.
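To see how close a system is to the limit, you can compare the number of running threads with threads-max. The sketch below assumes a Linux 2.6 or later kernel, where each thread appears as a directory under /proc/<pid>/task/. The function name thread_headroom is made up for this example.

```shell
# Compare the number of threads currently running with the kernel-wide limit.
# Assumes Linux with /proc mounted; returns 1 and prints nothing elsewhere.
thread_headroom() {
    limit_file=/proc/sys/kernel/threads-max
    [ -r "$limit_file" ] || return 1
    limit=$(cat "$limit_file")
    # Each thread shows up as /proc/<pid>/task/<tid>.
    in_use=$(ls -d /proc/[0-9]*/task/[0-9]* 2>/dev/null | wc -l)
    echo "threads in use: $((in_use)) of $limit"
}

thread_headroom
```

The count is only a snapshot, because processes come and go while the directories are being listed.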
In Solaris, the process limits can be changed in the /etc/system file. Changes made there don't require rebuilding the kernel but might require a reboot to take effect. You can change the total number of processes by changing the max_nprocs entry and the number of processes allowed for a given user with maxuprc.
Whenever a process opens a file (or a socket), it is assigned a structure called a file descriptor, which it holds until the file is closed. The OS limits the number of file descriptors that a given process can open, thus limiting the number of simultaneous connections the web server can have. How those settings are changed depends on the operating system. On Linux systems, you can read or modify /proc/sys/fs/file-max, which sets the system-wide ceiling on open files; the per-process limit is controlled with ulimit. On Solaris systems, you must edit the value for rlim_fd_max in the /etc/system file. This change requires a reboot to take effect.
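On Linux, the companion file /proc/sys/fs/file-nr shows how many file handles are in use system-wide. The following sketch reads it and reports usage against the maximum. The function name file_handle_usage is made up for this example, and the code assumes a Linux system with /proc mounted.

```shell
# Report system-wide file handle usage on Linux.
# /proc/sys/fs/file-nr holds three numbers: allocated handles,
# allocated-but-unused handles, and the maximum (the file-max value).
file_handle_usage() {
    [ -r /proc/sys/fs/file-nr ] || return 1
    read -r allocated unused maximum < /proc/sys/fs/file-nr
    echo "file handles: $allocated allocated, limit $maximum"
}

file_handle_usage
```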
You can find additional information at http://httpd.apache.org/docs/2.0/vhosts/fd-limits.html.
Controlling External Processes
Apache provides several directives to control the amount of resources that external processes use. Such processes include CGI scripts spawned from the server and programs executed via server-side includes. They do not include PHP scripts run by the module version of PHP, because the module is part of the server process.
By the Way
Following the installation instructions in the initial chapters of this book will result in PHP being installed as a module. Thus, these directives will not apply in your situation, unless you modified the installation type on your own.
Support for the following Apache directives (used in httpd.conf) is available only on UNIX and varies from system to system:
RLimitCPU Accepts two parameters: the soft limit and the hard limit for the amount of CPU time, in seconds, that a process is allowed. Either parameter can be a number or the keyword max, which indicates the maximum setting allowed by the operating system. The hard limit is optional. The soft limit can be changed between restarts, and the hard limit specifies the maximum allowed value for that setting.
RLimitMem The syntax is identical to RLimitCPU, but this directive limits the amount of memory (in bytes) that each process can use.
RLimitNProc The syntax is identical to RLimitCPU, but this directive limits the number of processes per user.
These three directives are useful for preventing malicious or poorly written programs from running out of control.
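As a rough illustration, the three directives might appear in httpd.conf as follows. The numbers are placeholders chosen for this example, not recommendations; suitable values depend on your scripts and operating system.

```apache
# Illustrative values only.
# Soft limit of 30 CPU seconds per process, hard limit of 60.
RLimitCPU 30 60
# Soft limit of 64MB (67108864 bytes); hard limit set to the OS maximum.
RLimitMem 67108864 max
# Soft limit of 25 processes, hard limit of 50.
RLimitNProc 25 50
```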
Performance-Related Apache Settings
This section presents different Apache settings that affect performance.
File System Access
From a resource standpoint, accessing files on disk is expensive, so you should try to minimize the number of disk accesses required to serve a request. Symbolic links, per-directory configuration files, and content negotiation are some of the factors that affect the number of disk accesses:
Symbolic links In UNIX, a symbolic link (or symlink) is a special kind of file that points to another file. It is created with the UNIX ln command (using the -s option) and is useful for making a certain file appear in different places.
Two of the parameters that the Options directive allows are FollowSymLinks and SymLinksIfOwnerMatch. Unless FollowSymLinks is enabled, Apache won't follow symbolic links because they can be used to bypass security settings. For example, you can create a symbolic link from a public part of the website to a restricted file or directory not otherwise accessible via the Web. In that case, Apache must check that each file it serves isn't a symbolic link. If SymLinksIfOwnerMatch is present, Apache will follow a symbolic link only if the target file or directory is owned by the same user who owns the link.
Because those tests must be performed for every path element and for every path that refers to a filesystem object, they can be taxing on your system. If you control the content creation, you should add an Options +FollowSymLinks directive to your configuration and avoid the SymLinksIfOwnerMatch argument. In this way, the tests won't take place, and performance isn't affected.
Per-directory configuration files As explained in Chapter 3, "Installing and Configuring Apache," it is possible to have per-directory configuration files. These files, usually named .htaccess, provide a convenient way of configuring the server and allow for some degree of delegated administration. However, if this feature is enabled, Apache has to look for these files in each directory in the path leading to the file being requested, which means extra filesystem accesses on every request. If you don't need per-directory configuration files, you can disable this feature by adding AllowOverride None to your configuration. Doing so avoids the performance penalty of searching the filesystem for .htaccess files.
Content negotiation Apache can serve different versions of a file depending on the client's language or other preferences. This can be accomplished using specific language-related file extensions, but in that case, Apache must access the filesystem for every request, looking for files with matching extensions. If you need to use content negotiation, make sure that you at least use a type-map file, which minimizes disk accesses. Alternatives to Apache-based content negotiation for internationalization purposes can be found in Chapter 27, "Application Localization."
Scoreboard file This is a special file that the main Apache process uses to communicate with its child processes on older operating systems. You can specify its location using the ScoreBoardFile directive, but most modern platforms do not require the use of this file. If this file is required, you might find improved performance if you place it on a RAM disk. A RAM disk is a mechanism that allows a portion of the system memory to be accessed as a filesystem. The details on creating a RAM disk vary from system to system.
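The symbolic link and per-directory settings described above could be combined in httpd.conf along these lines. This is a sketch only: the directory path is a placeholder, and it assumes you control all the content under that directory.

```apache
# The path is a placeholder; substitute your own document root.
<Directory "/usr/local/apache2/htdocs">
    # Follow symbolic links without the extra ownership checks.
    Options +FollowSymLinks
    # Do not look for .htaccess files anywhere in this tree.
    AllowOverride None
</Directory>
```

Because Apache checks every element of the path, the same settings have to cover the parent directories as well for the tests to be avoided entirely.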
Network and Status Settings
A number of network-related Apache options can degrade performance:
HostnameLookups When HostnameLookups is set to On or Double, Apache performs a DNS lookup to capture the hostname of the client each time the client makes a request. This constant lookup introduces a delay into the response process. The default setting for this directive is Off. If you want to capture the hostname of the requestor, you can process the request logs offline later with a log resolver, such as the logresolve utility included with Apache, rather than in real time.
Accept mechanism Apache can use different mechanisms to control how its child processes arbitrate incoming requests. The optimal mechanism depends on the specific platform and number of processors. You can find additional information at http://httpd.apache.org/docs-2.0/misc/perf-tuning.html.
mod_status This module collects statistics about the server, connections, and requests, and collecting them slows down Apache. For optimal performance, disable this module, or at least make sure that ExtendedStatus is set to Off, which is the default.
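A configuration following this advice might contain the lines below. Both values match the defaults described above, so stating them explicitly mainly documents the intent. The ExtendedStatus line assumes that mod_status is loaded; if you have disabled the module, leave that line out.

```apache
# Skip the DNS lookup on each request.
HostnameLookups Off
# Keep mod_status from collecting per-request details.
ExtendedStatus Off
```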