Configuring Apache


Many people think of a Web server primarily as something that services Web page requests by opening files containing static HTML pages and sending their contents to clients. That is one thing a Web server does, of course, but it's not the only one. Another is to execute programs that generate pages. When you use Apache for this, you're generating dynamic content, because the pages generated by a program can differ each time the program runs. (As a trivial example, consider a program that simply displays the current time of day; it generates different output whenever the time changes.)

Serving dynamic content is more difficult than serving static pages because it involves writing programs to produce the content. It also involves configuring your Web server to know how to find and execute these programs. This section shows how to set up Apache so it knows how to do that, and lays the groundwork for some of the configuration modifications we'll need to make in later chapters.

The examples in this book assume that Apache is laid out using the following set of path names. If the paths on your system are different, just modify the instructions to match.

  • /usr/local/apache

    The root directory of the Apache layout (the Web server "root").

  • /usr/local/apache/bin

    The location for executable binary programs. (Actually, this directory doesn't just contain binaries; it contains scripts, too, which are executable text files.) The most important program here is httpd (HTTP daemon), which is what I'm referring to when I say "Apache" (see sidebar). Another important program in the bin directory is apachectl, a utility used to start and stop the httpd server.

  • /usr/local/apache/conf

    The location for Apache configuration files. The most important file here is httpd.conf, the main httpd configuration file. Older versions of Apache use three configuration files (httpd.conf, srm.conf, and access.conf). I'm going to assume you're using the current single-file arrangement. If you use all three files, you'll need to decide which ones to modify when you change Apache's configuration.

Why "Apache"?

The program known as Apache is called httpd due to its roots in the program of the same name that originated at the National Center for Supercomputing Applications (NCSA). After development of the original NCSA httpd stagnated, a group of volunteers took over and directed their efforts into improving it. Because of the numerous patches applied to this version of httpd as development continued, it became known as "a patchy server," hence "Apache."

  • /usr/local/apache/htdocs

    The root directory of the Web document tree (the "document root"). This is where you place documents that you want to make available through Apache.

  • /usr/local/apache/cgi-bin

    The directory for CGI programs the Web server can execute. This is where the Web scripts written in this chapter will be installed. The cgi-bin directory should be outside the document tree (that is, not anywhere under the document root) so that scripts cannot be requested in plain text. We'll discuss this issue in more detail in Chapter 9. For now, suffice it to say that you don't want remote clients to be able to read your scripts directly, because those scripts may provide clues for hacking your site.

  • /usr/local/apache/cgi-perl

    Another program directory. We'll use it specifically for Web scripts that run under mod_perl. We'll get to that in Chapter 3, "Improving Performance with mod_perl," so I won't say much more about it here. The one thing you should know is that most scripts developed from that chapter on will be installed in cgi-perl, because I'll assume you're going to use mod_perl. That means the URLs for those scripts will begin like this:

    http://www.snake.net/cgi-perl/...

    If you don't install mod_perl, you should put the scripts in the cgi-bin directory and adjust the URLs to use cgi-bin rather than cgi-perl:

    http://www.snake.net/cgi-bin/...

  • /usr/local/apache/lib/perl

    The directory for Perl library files. We'll use library files for common operations that are performed by multiple scripts, so that we don't have to write code for them explicitly in each script that uses them. For example, every script that accesses MySQL must connect to the server. Rather than list the connection parameters in each script, we'll make a simple call to a function stored in a library file.

  • /usr/local/apache/logs

    The log file directory. The logs are useful when errors occur because they help you figure out what you need to fix. You may also want to analyze the logs to find out how much traffic you're getting and what pages are requested. The default log filenames are access_log (where page requests are written) and error_log (where diagnostic information is written). If you have trouble getting a Web script to run, take a look in the error log to see whether it contains any useful error messages.

You'll notice that most of the directories in the preceding list are grouped under a single parent directory, /usr/local/apache. It's possible that the Apache components won't be so centralized on your system. For example, Red Hat Linux distributions like to scatter the pieces around more, putting configuration files under /etc/httpd, the server in /usr/sbin, and so forth. You should poke around on your system or ask your administrator what kind of Apache layout you have. In particular, you'll need to know the locations of the configuration file httpd.conf and the server httpd because you'll need to edit httpd.conf to control Apache's behavior, and you must restart the server to get it to read the modified configuration file. If you cannot edit httpd.conf directly, ask your administrator how to adapt the instructions in this chapter for use on your system.
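If you're not sure which layout you have, a short shell sketch such as the following can check a few likely locations. The candidate paths are assumptions based on the two layouts just described (the /etc/httpd/conf/httpd.conf path in particular is a guess at the Red Hat arrangement), and first_existing is a helper name made up for this example; add or change paths to suit your system.

```sh
#!/bin/sh
# Print the first path among the arguments that exists; fail if none does.
first_existing() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions, not an exhaustive list.
conf=$(first_existing /usr/local/apache/conf/httpd.conf /etc/httpd/conf/httpd.conf) \
    || conf="not found"
server=$(first_existing /usr/local/apache/bin/httpd /usr/sbin/httpd) \
    || server="not found"

echo "Configuration file: $conf"
echo "Server binary:      $server"
```

If both come back as "not found", your administrator is probably the quickest route to the answer.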

Configuring Apache for CGI Program Execution

In this section, we'll configure Apache to know how to execute CGI programs located in the cgi-bin directory. A CGI program is one with which Apache communicates using the Common Gateway Interface, a protocol that serves as a kind of contract between the program and the Web server and that allows each to know how the other can be expected to behave. The server sets up certain environment variables that provide information about the request received from the client; the CGI program uses this information to figure out how to process the request.
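To make the contract concrete, here is a minimal sketch of a CGI program written as a shell script rather than in Perl. It prints a header line, a blank line, and then a body that reports the current time along with a few of the standard CGI environment variables (REQUEST_METHOD, QUERY_STRING, and REMOTE_ADDR). The function name cgi_response is made up for this example, and the script assumes it will be installed as an executable file in a directory that Apache treats as containing CGI programs.

```sh
#!/bin/sh
# Minimal CGI sketch: report the time and a few request variables.

cgi_response() {
    # A CGI response is header lines, then a blank line, then the body.
    echo "Content-Type: text/plain"
    echo
    echo "Current time: $(date)"
    # The Web server sets these variables for each request; the
    # "not set" defaults appear only when the script is run by hand.
    echo "Request method: ${REQUEST_METHOD:-not set}"
    echo "Query string: ${QUERY_STRING:-not set}"
    echo "Client address: ${REMOTE_ADDR:-not set}"
}

cgi_response
```

Because the time is included in the body, the output differs from one request to the next, which is exactly what makes the content dynamic.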

To tell Apache that we want to use the cgi-bin directory for CGI programs, put the following line in your httpd.conf file, if it doesn't already contain such a line:

 ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/ 

The ScriptAlias directive tells Apache to associate a directory with executable programs. Thus, any URL received by the Web server that has /cgi-bin/xxx after the host name will be interpreted as a request to execute a program xxx in the directory /usr/local/apache/cgi-bin and send its output to the client that sent the request. If the directory named on the ScriptAlias directive does not yet exist, create it:

 % cd /usr/local/apache
 % mkdir cgi-bin

If you want to base the association on filename rather than location, you can do that, too. This is useful if you want to put scripts in the document tree (which can be dangerous if it results in remote users being able to read your scripts directly, but you can do that if you really want to).

One typical strategy is to specify that filenames with a certain suffix, such as .pl or .cgi, should be considered CGI programs. For example, the following lines can be used in httpd.conf to identify files in the document tree that have names ending with .pl as CGI scripts:

 <Files *.pl>
     SetHandler cgi-script
     AllowOverride None
     Options ExecCGI
 </Files>

Apache's Parent-Child Architecture

The next section describes how to start and stop Apache. For a better understanding of how that works, it's helpful to have some background on Apache's parent-child architecture. When Apache (httpd) starts up, the initial httpd process is the parent process. It reads the configuration file httpd.conf to determine how it should operate. Then it starts forking (spawning) child processes to handle client requests. Each child httpd handles a certain number of requests, and then dies. As each child dies, the parent httpd spawns a new child to take its place.

The number of children spawned initially is controlled by the StartServers configuration directive, and the minimum and maximum number of idle (spare) children the parent tries to keep around are controlled by MinSpareServers and MaxSpareServers. If more simultaneous requests come in than can be handled by the available children, the parent spawns more children, up to the limit specified by MaxClients. As children die, the parent spawns more if the number of idle children goes below MinSpareServers. The configuration parameters give you some control over the size of the child pool. Raising MaxClients allows more requests to be served simultaneously, but if you put it up too high, you run the risk of allowing too many processes for efficient machine operation. The administrator's task is to find the best balance.

The number of requests served by each child is controlled by the MaxRequestsPerChild directive. Setting it to 1 is inefficient because the parent ends up spawning children at a rate roughly equal to the number of requests. On the other hand, setting the value high (or to 0, which means "unlimited") can have negative consequences. If Apache contains code that has a memory leak, for example, a long-running child may grow large enough over time to take over the machine. The core Apache code isn't likely to do this because it's been pretty well scrutinized, but Apache can be extended with additional modules, and some of these may not have been so well inspected. An additional danger, if you're using mod_perl to allow Apache to execute Perl scripts directly, is that user scripts written in Perl can cause memory leaks (see Chapter 3). Having child httpd processes die after serving a limited number of requests is a safety precaution that guards against such problems. Just as with selecting a server pool size, the administrator's task is to find the proper balance. You want to spawn children relatively infrequently, but not so infrequently that each one gobbles up a huge amount of resources.
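Before tuning any of these values, it helps to see what your server is using now. The following shell sketch lists the pool-related directives that are active (not commented out) in a configuration file. The function name show_pool_settings is made up for this example, the default path assumes the layout used in this book, and the list includes MaxClients, the related directive that caps the total number of children.

```sh
#!/bin/sh
# Print the child-pool directives that are active in an Apache config file.
# Lines beginning with "#" are comments and are skipped by the pattern.
show_pool_settings() {
    conf_file="${1:-/usr/local/apache/conf/httpd.conf}"
    grep -E '^[[:space:]]*(StartServers|MinSpareServers|MaxSpareServers|MaxClients|MaxRequestsPerChild)[[:space:]]' "$conf_file"
}
```

For a different layout, pass the file name as an argument, for example show_pool_settings /etc/httpd/conf/httpd.conf. A directive that doesn't appear in the output is not set explicitly in that file.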

Configuring Apache Under Windows

Under Windows, Apache runs as a multithreaded process and does not spawn multiple children. Instead, a single child handles all requests, with each request processed using a separate thread. Consequently, much of the preceding discussion does not apply. You can control the number of simultaneous client connections by controlling the number of threads allowed. To do this, use the ThreadsPerChild directive, which under Windows is equivalent to the StartServers directive. The default value is 50; set it higher if you get a lot of traffic. For MaxRequestsPerChild, the recommended value is 0 because there is just one child. This value causes the child to never exit.

Starting and Stopping Apache Using apachectl

The apachectl script is a handy utility for checking changes to the Apache configuration file and for starting and stopping Apache itself. It understands several arguments, which are summarized in Table 2.1.

Table 2.1. Options for apachectl

 Option        Effect
 configtest    Performs syntax check on configuration file.
 start         Starts parent, which spawns children.
 stop          Parent and children die immediately.
 restart       Children die immediately; parent reinitializes and spawns a new set of children.
 graceful      Like restart, but children are allowed to service requests currently in progress before dying.

To start Apache, if it's not running currently, use apachectl start. Generally, after the server is running, you leave it alone. If you modify the configuration file, however, you need to restart the server to get it to notice the changes. A sensible precaution before restarting is to run apachectl configtest to perform a syntax check on the configuration file. This gives you a way to find errors before using the file with a live server. (If the file contains errors, the server won't restart anyway until you either back out the changes or fix the problem. In the meantime, your Web site will be dead and visitors can't use it.)

To restart a server that is already running, you have three options. The most drastic is to stop it cold by bringing it to a complete halt and starting a new parent. You do this with apachectl stop followed by apachectl start. A less drastic option is apachectl restart, which kills the children but lets the parent continue to run. The parent reinitializes itself, and then spawns a new set of children. The gentlest option is apachectl graceful. This is like apachectl restart, but any children currently servicing a request are not killed until the request has been satisfied.

A reasonable strategy after you've modified the configuration file and checked it with apachectl configtest is to try apachectl graceful to restart with the least disruption. However, some configuration changes require a complete shutdown and restart. If your changes don't seem to take effect, try apachectl stop followed by apachectl start.
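The check-then-restart habit is easy to wrap in a small shell function so the restart never happens when the syntax check fails. This is only a sketch: safe_reload and the APACHECTL variable are names made up for this example, the default path assumes the layout used in this book, and the function relies on apachectl configtest returning a nonzero exit status when it finds errors.

```sh
#!/bin/sh
# Path to apachectl; set APACHECTL beforehand if your layout differs.
APACHECTL="${APACHECTL:-/usr/local/apache/bin/apachectl}"

# Check the configuration file, then restart gracefully only if it passes.
safe_reload() {
    if "$APACHECTL" configtest; then
        "$APACHECTL" graceful
    else
        echo "configtest reported errors; leaving the running server alone" >&2
        return 1
    fi
}
```

If the new settings still don't seem to take effect after safe_reload succeeds, fall back on a full apachectl stop followed by apachectl start.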

Starting and stopping Apache must be done by a user with sufficient privileges, such as root. If you don't have the ability to restart the server, you'll need to ask your system administrator to do so.

In addition to being able to control httpd manually, you'll want it to start up at system boot time. The boot-time startup mechanism varies between systems, so you should consult your machine's documentation for instructions.

Alternatives to apachectl Under Windows

apachectl is a shell script, so it doesn't run under Windows. Instead, you can control Apache using the following commands:

 apache                  start Apache
 apache -k shutdown      stop Apache
 apache -k restart       restart Apache
 apache -t               check httpd.conf configuration file

Depending on how you installed Apache, you may also be able to start and stop it from the Start menu. For example, the Apache binary distribution installer adds a Start menu item for starting Apache. Other distributions may do this as well; NuSphere MySQL includes Apache, and its installer places a NuSphere group in the Start menu that has items for controlling Apache.

Under Windows NT, you have the additional option of running Apache as a service.

Verifying That Apache Is Serving Pages

When your Apache server is running, you should be able to enter the URL for your site into your browser and view your home page. For example, this URL requests the home page on www.snake.net:

http://www.snake.net/

Try requesting your own site's home page by substituting its name for www.snake.net. If you don't see the page or you get an error message, check the error log to see what the problem is.
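When you do need to look at the error log, the most recent entries are usually the relevant ones. A small sketch for displaying them follows; recent_errors is a name made up for this example, and the default path assumes the layout and default log filename used in this book.

```sh
#!/bin/sh
# Show the last few entries of the Apache error log.
# Arguments: log file (optional), number of lines (optional, default 20).
recent_errors() {
    log_file="${1:-/usr/local/apache/logs/error_log}"
    count="${2:-20}"
    tail -n "$count" "$log_file"
}
```

Request the page again in your browser, then run recent_errors to see whether the request added anything to the log.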

As you begin to install Web scripts later in the book, you'll need to substitute your own host name into the www.snake.net URLs that I use in the examples.



MySQL and Perl for the Web
ISBN: 0735710546
EAN: 2147483647
Year: 2005
Pages: 77
Authors: Paul DuBois
