Section 38.2. Configuring Apache | LPI Linux Certification in a Nutshell (In a Nutshell (OReilly))

38.2. Configuring Apache

The Apache web server is controlled by configuration files that live in the directory /opt/apache/conf. The main configuration file is httpd.conf, and like most configuration files for Unix services, it is a plain text file that you can edit with your favorite text editor. The configuration file is broken down into three main sections:

Global section: This contains configuration options that apply to every host on the server.
Main section: The configuration options here apply to any virtual host that is accessed and does not have its own section.
Virtual host section: Each virtual host that is hosted on the server gets its own configuration section.

You may be asking "What is a virtual host?" While the Apache web server is perfectly capable of serving the web pages for a single web site (such as www.example.com), its real strength lies in its ability to serve pages for many different web sites, all from the same machine. Each of these web sites is called a virtual host, because it appears that each of them has Apache all to itself. This allows companies or ISPs to host many (often thousands) of web sites on a single Apache installation. Virtual host syntax will be covered later in the chapter.

The global configuration section gives directives that affect the entire Apache installation, including any virtual hosts. Let's look at one of the simplest Apache httpd.conf files you'll ever see, Example 38-1. It's all global directives; we're not defining any virtual hosts. This is a complete httpd.conf file and is probably sufficient if you're just using Apache in a testing environment.

Example 38-1. Apache configuration file

 ServerRoot "/opt/apache"
 Listen 80
 LoadModule headers_module modules/mod_headers.so
 LoadModule proxy_module modules/mod_proxy.so
 LoadModule proxy_connect_module modules/mod_proxy_connect.so
 LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
 LoadModule proxy_http_module modules/mod_proxy_http.so
 LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
 LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
 LoadModule ssl_module modules/mod_ssl.so
 LoadModule dav_module modules/mod_dav.so
 LoadModule cgi_module modules/mod_cgi.so
 LoadModule rewrite_module modules/mod_rewrite.so
 User daemon
 Group daemon
 ServerAdmin you@example.com
 DocumentRoot "/opt/apache/htdocs"
 <Directory "/opt/apache/htdocs">
     Options Indexes FollowSymLinks
     AllowOverride None
     Order allow,deny
     Allow from all
 </Directory>
 DirectoryIndex index.html
 ErrorLog logs/error_log
 LogLevel warn
 LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
 CustomLog logs/access_log combined
 ScriptAlias /cgi-bin/ "/opt/apache/cgi-bin/"
 DefaultType text/plain
 TypesConfig conf/mime.types
 AddHandler cgi-script .cgi

Most of these lines are pretty self-explanatory. ServerRoot defines where you installed Apache; all directory paths given after this directive can be given relative to this root. Listen 80 tells Apache which TCP port to listen on for connections. The LoadModule lines ensure that the modules we compiled as DSOs get loaded into memory every time Apache starts. The User and Group directives dictate the security context under which the Apache binary will run. The ServerAdmin directive indicates the email address that is shown to web browsers when an error is encountered, and the DocumentRoot directive tells Apache where to look for pages to serve.

The <Directory> line indicates the start of a container object. Anything between the <Directory /opt/apache/htdocs> and </Directory> tags applies only to the directory /opt/apache/htdocs. As you can see, we're turning on some options and dictating who can request content from this directory (in this case, anyone). The Options directive is a complicated one and can be a potential security risk. For more information, visit the Apache documentation at http://httpd.apache.org/docs/2.2/mod/core.html#options. For our simple example here, Indexes and FollowSymLinks are two pretty safe and popular options.

DirectoryIndex defines what files Apache should look for when a directory is requested instead of a file. For example, if our site is www.example.com and our index page is /opt/apache/htdocs/index.html, we want to make sure that when someone requests either http://www.example.com or http://www.example.com/index.html they see the same resource. So the line:

 DirectoryIndex index.html

handles this for us. You can add additional filenames here if you want Apache to search for them (in order) every time it serves files out of a directory. A common global option would be:

 DirectoryIndex index.html index.htm index.php

to try and catch most of the different ways people would name their index files.
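The lookup order can be pictured with a small shell sketch. This is only a toy model of the behavior described above, not how Apache is implemented; the function name find_index and the scratch directory are made up for illustration.

```shell
#!/bin/sh
# Toy model of DirectoryIndex: print the first listed name that exists
# as a regular file in the directory. find_index is a hypothetical helper.
find_index() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -f "$dir/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1    # none of the listed names exists
}

# Demonstration in a scratch directory that holds only index.htm
demo=$(mktemp -d)
touch "$demo/index.htm"
find_index "$demo" index.html index.htm index.php
rm -rf "$demo"
```

Because index.html is missing from the scratch directory, the sketch falls through to the second name in the list, which mirrors the left-to-right search order of the directive.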

The next four lines deal with how and where Apache saves log files. All errors will go to the file defined in the ErrorLog directive, and LogLevel warn limits those entries to messages of severity warn or higher. All regular traffic will be logged to the file defined in the CustomLog directive, using the named format indicated, which is combined in this case. As you can see from the LogFormat directive, you have complete control over the formatting of your log files. However, because it's very common to use third-party tools to parse your Apache log files and generate reports for you, you probably want to stick with a standard format that these tools are expecting. The different standards for log file formatting are defined at http://httpd.apache.org/docs/2.2/logs.html#accesslog.
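One reason to stay with the combined format is that even simple tools can read it. The following sketch counts responses per HTTP status code with awk. The function name count_status and the three sample log entries are invented for illustration, and the script assumes every request line has the usual three parts (method, path, protocol), so that the status lands in whitespace-separated field 9.

```shell
#!/bin/sh
# Count responses per HTTP status code in a combined-format access log.
# With whitespace splitting, field 9 is %>s (the final status), as long
# as the quoted request line contains exactly three words.
count_status() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}

# Made-up sample entries, for illustration only
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.0.2.10 - - [04/Jan/2006:14:40:01 -0500] "GET /index.html HTTP/1.1" 200 1043 "-" "Mozilla/5.0"
192.0.2.11 - - [04/Jan/2006:14:40:05 -0500] "GET /missing.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"
192.0.2.10 - demouser [04/Jan/2006:14:41:12 -0500] "GET /protected/ HTTP/1.1" 200 512 "http://www.example1.com/" "Mozilla/5.0"
EOF
count_status "$sample"
rm -f "$sample"
```

On a real server you would point the function at the file named in your CustomLog directive, for example /opt/apache/logs/access_log.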

Finally, the last four lines deal with how Apache handles certain "special" content. The ScriptAlias directive tells Apache that when a file is requested from this directory, instead of sending the file directly to the requestor's web browser, the file is executed and the output is sent to the web browser. This is the most common way to allow public users to execute programs on your server. These programs are known as Common Gateway Interface (CGI) scripts. They are often written in scripting languages, such as Perl or Python, but they can also be written in C, Ruby, or any other language that you can run on your server.

The DefaultType and TypesConfig lines tell Apache what to do with certain files. Unlike operating systems such as Microsoft Windows, where file types are usually determined by filename extensions alone, Apache relies on MIME types to describe the content of a file. These configuration lines tell Apache where to find the MIME types definition file (which maps filename extensions to MIME types) and what the default MIME type should be for files that it serves. Finally, the AddHandler line allows you to have CGI programs anywhere in your web directory structure, not just limited to the cgi-bin directory. As long as your CGI program has a name that ends in .cgi, Apache will execute it and send the results to the web browser.
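To make the CGI idea concrete, here is a minimal sketch of a CGI program written as a shell script. The file name hello.cgi and the function name cgi_response are assumptions for illustration. The script would need to be executable, and a directory outside cgi-bin generally also needs the ExecCGI option enabled before Apache will run .cgi files there.

```shell
#!/bin/sh
# hello.cgi: minimal CGI sketch. A CGI program writes a header block,
# then a blank line, then the body, all to standard output.
cgi_response() {
    printf 'Content-Type: text/plain\r\n\r\n'
    # SERVER_NAME is one of the standard CGI environment variables;
    # the fallback text covers running the script by hand.
    printf 'Hello from %s\n' "${SERVER_NAME:-an unknown host}"
}

cgi_response
```

Running the script from a shell prompt is a quick way to check that it emits a header, a blank line, and a body before you ask Apache to execute it.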

The main section of the Apache configuration file is used to define default settings for requests that come in that do not match a virtual host. If you are running Apache without any virtual hosts (so it's serving content for only one domain name) this is the only other section you'll need. All your configuration options for that one domain will go here. If you have virtual host containers later, however, this section will be used only when no virtual host matches the request.

To understand that a little better, we need to discuss the difference between name-based and IP-based virtual hosts. Years ago, when the Web was young, if you wanted to host multiple web sites on one system, you had to have a unique IP address for every domain you wanted to host. For example, if you were hosting the domains www.example1.com and www.example2.com, the IP addresses for them might be 192.168.1.1 and 192.168.2.1, respectively. If that were the case, your virtual host containers might look something like this:

 <VirtualHost 192.168.1.1>
     DocumentRoot /opt/apache/www.example1.com
     ServerName www.example1.com
 </VirtualHost>
 <VirtualHost 192.168.2.1>
     DocumentRoot /opt/apache/www.example2.com
     ServerName www.example2.com
 </VirtualHost>

For every web request that came in on 192.168.1.1, Apache would serve files out of the /opt/apache/www.example1.com directory. For every web request that came in on 192.168.2.1, Apache would serve files out of the /opt/apache/www.example2.com directory.

This works fine, but the problem is that IP addresses are becoming harder and harder to come by. The solution is instead to use name-based virtual hosts, which allow multiple domain names to resolve to the same IP address, while still allowing Apache to figure out what domain is being requested and serve web pages from the appropriate place. A name-based virtual host setup looks like this:

 NameVirtualHost *:80
 <VirtualHost *:80>
     ServerName www.example1.com
     ServerAlias example1.com
     DocumentRoot /opt/apache/www.example1.com
 </VirtualHost>
 <VirtualHost *:80>
     ServerName www.example2.com
     ServerAlias example2.com
     DocumentRoot /opt/apache/www.example2.com
 </VirtualHost>

Notice that the first directive is NameVirtualHost. This tells Apache to expect name-based virtual hosts. We also tell Apache here to expect name-based requests coming in on port 80 (as specified by *:80). This is not required if your implementation of Apache is going to listen only on port 80 (if that's the case, you can just put NameVirtualHost *), but it's good practice anyway, because web servers often listen on additional ports.

In this case it's assumed that www.example1.com and www.example2.com both resolve to the same IP address. When a request comes in, Apache will look at the Host header of the HTTP request to determine what virtual host is being requested and will search the ServerName and ServerAlias directives to find a match. If a match is found, content is served from the corresponding DocumentRoot. If no match is found, the default virtual host is used.

Name-based virtual hosts treat the different sections of the configuration file in a special way you need to understand. In an IP-based setup, if there is no virtual host match (which means that a request came in on an IP address that was not defined in a virtual host container), the main section is used. However, with name-based virtual hosts, if no match is found, the first defined VirtualHost container becomes the default. Essentially, therefore, name-based virtual hosting overrides the main section of the Apache config file. Even if that section is there, it will never serve a request that arrives on an address and port covered by NameVirtualHost, although directives set there still act as defaults that the VirtualHost containers inherit.
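The selection rule can be summarized in a few lines of shell. This is a toy model of the two containers shown earlier, not an Apache tool; the function name select_docroot and the host www.unknown.example are hypothetical.

```shell
#!/bin/sh
# Toy model of name-based virtual host selection for the two containers
# shown earlier. select_docroot is a hypothetical helper.
select_docroot() {
    # Host names are case-insensitive, and the header may carry a port
    host=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    host=${host%%:*}
    case "$host" in
        www.example1.com|example1.com)
            echo /opt/apache/www.example1.com ;;
        www.example2.com|example2.com)
            echo /opt/apache/www.example2.com ;;
        *)
            # No ServerName or ServerAlias matched: the first
            # VirtualHost container acts as the default
            echo /opt/apache/www.example1.com ;;
    esac
}

select_docroot www.example2.com
select_docroot EXAMPLE1.com:80
select_docroot www.unknown.example
```

The last call shows the fallback: a name that matches no container is answered from the first container's DocumentRoot rather than from the main section.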

You can put any directive in a VirtualHost container that you would normally put in the main section. Here is a better example of what a VirtualHost container would often contain:

 <VirtualHost *:80>
     ServerName www.example1.com
     ServerAlias example1.com
     CustomLog /opt/apache/logs/www.example1.com.log combined
     DocumentRoot /opt/apache/www.example1.com/
     ScriptAlias /cgi-bin/ /opt/apache/www.example1.com/cgi-bin/
     Alias /image/ /opt/apache/image/
     Options FollowSymLinks
     ErrorDocument 404 /404.htm
 </VirtualHost>

In this example, we're defining a custom log file, a unique script alias, and a unique error document for this virtual host.

38.2.1. Access Control

It's common to use the Apache configuration to restrict access to certain directories. The two components of this process are the configuration directives and the username and password source.

Apache has the ability to query many different sources to get authentication information. It's very common to have Apache query your existing LDAP or MySQL database if you already have user account information stored there. You can even have Apache query your Microsoft Windows Active Directory server for authentication information. Setting up this kind of advanced authentication is beyond the scope of this book. However, simple authentication with a manually maintained file of usernames and passwords is easy to do and is sufficient for most situations.

The standard authentication mode for Apache is called basic authentication. Basic authentication reads a file created by the htpasswd program that ships with Apache. This program allows you to create and maintain a file containing usernames and passwords that Apache can use to enforce authentication on directories.

Here is an example of creating an authentication file with htpasswd:

 # /opt/apache/bin/htpasswd -c /opt/apache/password.list demouser
 New password:
 Re-type new password:
 Adding password for user demouser

Note that the password you type at the prompt is not echoed back to the screen. We can now look at the contents of this file:

 # cat /opt/apache/password.list
 demouser:RRm8uBOYINC.s

It contains our username (demouser), a colon as a delimiter, and the hashed version of the password we typed.
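Because the file is plain text with one user:hash pair per line, ordinary text tools are enough to inspect it. The sketch below lists the user names and tests whether a given user is present. Both function names are hypothetical, and the demonstration works on a scratch copy of the sample entry rather than on the real file.

```shell
#!/bin/sh
# Helpers for a password file in the user:hash layout shown above.
# Both function names are hypothetical.
list_users() {
    cut -d: -f1 "$1"
}

has_user() {
    # Exit status 0 if the exact user name appears in field 1
    awk -F: -v u="$2" '$1 == u { found = 1 } END { exit !found }' "$1"
}

# Demonstration with a scratch copy of the sample entry
pw=$(mktemp)
echo 'demouser:RRm8uBOYINC.s' > "$pw"
list_users "$pw"
has_user "$pw" demouser && echo "demouser is present"
rm -f "$pw"
```

A check like has_user can be handy in a script that decides whether an account still needs to be created with htpasswd.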

Now that we have this file in place, we can add security to one of our virtual hosts. Let's say that we want to allow all users to access the top level of www.example1.com without a password, but the subdirectory /protected will require a username and password and will restrict access to the demouser user listed in the file we just created (of course, we can add more users later). Here is our new VirtualHost container to accomplish this:

 <VirtualHost *:80>
     ServerName www.example1.com
     ServerAlias example1.com
     CustomLog /opt/apache/logs/www.example1.com.log combined
     DocumentRoot /opt/apache/www.example1.com/
     ScriptAlias /cgi-bin/ /opt/apache/www.example1.com/cgi-bin/
     Alias /image/ /opt/apache/image/
     Options FollowSymLinks
     ErrorDocument 404 /404.htm
     <Directory /opt/apache/www.example1.com/protected>
         AuthName "Authorized Users Only"
         AuthType Basic
         AuthUserFile /opt/apache/password.list
         require valid-user
     </Directory>
 </VirtualHost>

After we add this directory container to the Apache config file at /opt/apache/conf/httpd.conf, we must restart Apache (/opt/apache/bin/apachectl restart). Like most other Unix services, the Apache server reads its config file only at startup, so any change to the config file will go into effect only when Apache restarts.
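Since a typo in httpd.conf can keep Apache from coming back up, it is often worth checking the syntax before restarting. The sketch below wraps the two steps in a function. The name safe_restart is made up, and the default path assumes the /opt/apache installation used in this chapter. The configtest and restart arguments are standard apachectl commands.

```shell
#!/bin/sh
# Restart Apache only if the configuration passes a syntax check.
# safe_restart is a hypothetical helper; pass the path to apachectl
# as the first argument, or rely on this chapter's install location.
safe_restart() {
    ctl=${1:-/opt/apache/bin/apachectl}
    if "$ctl" configtest; then
        "$ctl" restart
    else
        echo "configuration test failed; Apache was not restarted" >&2
        return 1
    fi
}
```

You would call it as safe_restart /opt/apache/bin/apachectl after editing the file. If the syntax check fails, the running server is left alone with its old configuration.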

Now when we go to www.example1.com/protected in our web browser, we get a pop-up box that is entitled Authorized Users Only and asks for our username and password. If our authentication as demouser is successful, we will be shown the content. If not, we'll be asked three times (this behavior is client-specific) before finally being shown an error page stating that our authentication failed.

38.2.2. Third-Party Modules

As stated earlier, the existence of third-party modules is one of the reasons that the Apache web server has remained so popular. In this section, we will look at configuring four of the most popular modules: mod_php and mod_perl, which are third-party, and mod_ssl and mod_rewrite, which ship with Apache 2.x.

38.2.2.1. mod_php

The mod_php module integrates the PHP scripting language (the name is a recursive acronym for PHP: Hypertext Preprocessor) into Apache. Like most other web scripting languages, PHP works in this manner:

A PHP file is requested by a web client.
Apache loads the PHP script into memory and passes it to the PHP parsing engine.
The PHP parsing engine parses the script, generating HTML code.
The resulting HTML code is passed to the web browser via HTTP and is displayed to the end user.

This is the same model that is used for CGI scripts, but in this case the parser is not a separate program that Apache calls; it will be built into Apache itself.

As with the Apache web server, you have the choice of installing either the binary version or the source version of PHP. If you're using a binary version of Apache that accompanies your distribution, chances are good that the distribution also has a version of mod_php that will work with your version of Apache. If you're using a source installation of Apache, as we are in our examples, you'll need to download the source code for PHP and compile it.

You can get the latest version of PHP from http://www.php.net. As of this writing, the latest version is 5.1.1. Download the file php-5.1.1.tar.bz2 from a mirror listed on the php.net site and save it to a directory on your hard drive. The procedure for compiling PHP is very similar to the procedure for compiling Apache: we will run ./configure with options, then run make, and finally run make install.

The first step is to uncompress the PHP source code and run ./configure --help to see all the compile-time options available.

 # tar xjf php-5.1.1.tar.bz2
 # cd php-5.1.1
 # ./configure --help

There are too many compile-time options to list them all here. You can find detailed instructions on each configure option at http://www.php.net/manual/en/configure.php. A couple of important configuration options are described here:

--with-apxs2: Enable the building of PHP as a DSO for Apache 2.x instead of directly into the Apache binary. (The similarly named --with-apxs option targets Apache 1.3.) This is the recommended way to build PHP for the reasons stated earlier, primarily so that you can upgrade PHP independently of Apache.
--with-mysql: Enable PHP to connect to a MySQL database. One of the more popular platforms for running web applications has been dubbed LAMP (for Linux/Apache/MySQL/PHP). The tight integration with the popular MySQL database makes PHP a smart choice when you're looking to develop database-driven dynamic web sites. MySQL installation is not covered in this chapter.

Although there are many more options, these two will get you off and running with a basic PHP installation that you can become familiar with.

 # ./configure --with-apxs2=/opt/apache/bin/apxs --with-mysql

If this command generates any errors, check your settings and try again. It's common to see errors when you're attempting to compile PHP with modules for which you don't have the necessary libraries installed. For example, if you try to enable LDAP support in PHP with the --with-ldap flag, but you don't have any LDAP libraries or headers installed on your system, the configure script will detect this and fail.

Assuming the configure command completes successfully, you will now have a Makefile that will be read by the make command.

 # make
 # make install

The make install command not only copies all of the necessary PHP files onto your system, but also modifies your Apache httpd.conf file to load the PHP module. This line is added to /opt/apache/conf/httpd.conf:

 LoadModule php5_module        modules/libphp5.so

In order for Apache to know how to identify PHP scripts, you also need to add this line to the global section of your Apache httpd.conf file:

 AddType application/x-httpd-php .php .phtml

This tells Apache that any file ending in .php or .phtml will be assigned the MIME type application/x-httpd-php, which will be in turn handled by the PHP interpreter.

Now we're ready to restart Apache and verify that PHP is working.

 # /opt/apache/bin/apachectl stop
 # /opt/apache/bin/apachectl start

Since we're loading a new module, it's not sufficient just to tell Apache to reread its configuration file (which is what the restart option does). We actually need to stop it and then start it again.

The easiest way to verify our PHP installation is to create a simple PHP script and try to view it in our web browser. Save this text file in /opt/apache/www.example1.com/test.php:

 <?php phpinfo( ); ?>

Now go to http://www.example1.com/test.php in your browser. You should see a long table with a blue background that says PHP Version 5.1.1 at the top. You can scroll down this page to see all the different PHP settings that are currently set. Congratulations! You have a working installation of Apache and PHP!

38.2.2.2. mod_perl

Installing the mod_perl module is not much different from installing the mod_php module. mod_perl allows you to run Perl scripts from your web server using a Perl interpreter that is integrated into Apache.

Do you need mod_perl in order to run Perl scripts on your web server? No. You can just as easily configure Apache to run Perl scripts as CGI programs, calling your system's Perl interpreter every time a web request is made for a Perl script. However, this can potentially place a very high load on your web server, because every time a Perl script is requested, the web server must load the complete Perl interpreter in memory, pass the script to the interpreter, and then unload the interpreter from memory. It's much more efficient to keep the interpreter in memory all the time, which is what mod_perl provides. It also gives you much tighter integration with Apache itself, including the ability to extend the functionality of the web server with Perl code.

Download the latest version of mod_perl from http://perl.apache.org. As of this writing, the latest version is 2.0.2. Uncompress the file and go through the normal compile routine. The syntax is slightly different since this is a Perl package, but the concept is the same.

 # tar xvf mod_perl-2.0-current.tar.gz
 # cd mod_perl-2.0.2
 # perl Makefile.PL MP_APXS=/opt/apache/bin/apxs
 # make
 # make install

We are compiling mod_perl as an Apache DSO, just as we did for PHP. After the make install is complete, we need to add this line to our Apache httpd.conf file:

 LoadModule perl_module modules/mod_perl.so

Now we can stop and start Apache and verify that mod_perl is successfully loaded. In addition to the test script like the one we used earlier for PHP, you can also get information on the currently active modules by looking in the error_log after you restart Apache. Every time Apache starts, it writes a line to the error_log showing what version it is and what modules are loaded, along with their versions. After stopping and starting Apache, look in the error_log and you should see a line that reads something like this:

 [Wed Jan 04 14:39:18 2006] [notice] Apache/2.2.0 (Unix) mod_ssl/2.2.0 OpenSSL/0.9.7f DAV/2 PHP/5.1.1 mod_perl/2.0.2 Perl/v5.8.6 configured -- resuming normal operations

This tells us all of the modules that are configured. We see that we're using mod_perl 2.0.2 which was compiled against Perl Version 5.8.6.
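If you check this line often, a short script can pull the component list out of the log for you. The sketch below is one way to do it, with a made-up function name (list_components). It assumes the startup notice has the layout shown above and simply keeps every word that contains a slash.

```shell
#!/bin/sh
# Print one name/version pair per line from the most recent startup
# notice in an Apache error log. list_components is a hypothetical helper.
list_components() {
    grep 'resuming normal operations' "$1" | tail -n 1 |
        sed -e 's/^.*] //' \
            -e 's/ configured -- resuming normal operations.*$//' |
        tr ' ' '\n' | grep '/'
}

# Demonstration with the sample line from the text
log=$(mktemp)
cat > "$log" <<'EOF'
[Wed Jan 04 14:39:18 2006] [notice] Apache/2.2.0 (Unix) mod_ssl/2.2.0 OpenSSL/0.9.7f DAV/2 PHP/5.1.1 mod_perl/2.0.2 Perl/v5.8.6 configured -- resuming normal operations
EOF
list_components "$log"
rm -f "$log"
```

On the chapter's installation the log to inspect would be /opt/apache/logs/error_log.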

38.2.2.3. mod_ssl

mod_ssl allows Apache to encrypt HTTP connections with the Secure Sockets Layer encryption libraries. SSL was originally developed by Netscape and has become the de facto standard for secure, encrypted communication over the World Wide Web.

mod_ssl is an Apache module that allows Apache to interface with the OpenSSL libraries, an open source implementation of SSL.

Apache versions before 2.0 required that mod_ssl be downloaded and installed separately, similarly to what we did for mod_php and mod_perl. You can get the latest code from http://www.modssl.org. As of this writing, the latest version is 2.8.25 for Apache 1.3.34. Apache 2.0 and above have mod_ssl built in, so there is no extra compilation process; all you need to do is configure it.

In order to understand the process of encrypting your web traffic with mod_ssl, we need to go into the basics of certificate-based cryptography. SSL provides us with two basic pieces of connection security: authentication and encryption.

The authentication piece works like this: when your web browser connects to an SSL-enabled web site (usually by using the https:// protocol in your browser, which tells your browser to attempt to connect to the web server on TCP port 443 instead of port 80 and to expect an encrypted connection), the web server presents the web browser with a certificate. This certificate has detailed information about the web site, including the official fully qualified domain name, contact information, and the Certificate Authority that is vouching for the identity of this web site. Because this is still the authentication phase, we are concerned with proving that this web site really is who it claims to be. And because there is no easy way to trust this company, we instead look to a third party that we do trust in order to vouch for this company.

This is where the Certificate Authority (CA) comes into play. All of the major web browsers include a list of trusted Certificate Authorities. If a web site presents a certificate that has been signed by one of these trusted CAs, the browser accepts the certificate and the encrypted communication can begin. If the certificate was not signed by a trusted CA, the web browser will usually pop up a window warning the end user of this situation and asking if they'd like to proceed. (A lot of CAs are recognized by some popular web browsers but not others, so the warning box can often pop up for legitimate sites.) If the end user agrees to trust the CA that signed the certificate, an encrypted connection is made. If not, the connection is aborted.

Examples of trusted CAs are Verisign, Thawte Consulting, and GeoTrust, to name a few. In order to get a signed certificate from one of these companies, you create a certificate request and then send it to one of these companies. For a fee, they will sign the certificate and send you back a file you can use to handle the authentication piece on your web server. Let's go through the process of creating a certificate signing request and submitting it to a CA.

In order to accomplish this task, you must have the openssl program installed on your system. As of this writing, the latest version is 0.9.7. This should have come from your vendor in a standard package.

Create an RSA private key for your web server. This will be used during the encryption phase.
```
 # openssl genrsa -des3 -out server.key 1024 
```
You will be asked for a passphrase during this process. You cannot create the key without giving a passphrase. The passphrase will be required every time you attempt to use this key. As you can imagine, this can be a little problematic, because Apache reads this key every time it restarts. If you left this passphrase in place, you would have to type it every time you restarted Apache, which is unworkable if Apache is restarting from an unattended script. So the recommended second step is to remove this passphrase. You must understand the security implications of this! After you remove this passphrase, anyone who gets hold of the server.key file can masquerade as your web server over SSL connections. So keep the file safe from prying eyes.
To remove the passphrase, first make a copy of the key:
```
 # cp server.key server.key.passphrase 
```
Then, remove the passphrase:
```
 # openssl rsa -in server.key.passphrase -out server.key 
```
The server.key file now has no passphrase.
Create a certificate signing request (CSR) using your newly created RSA key. This file will be sent to a trusted CA for signing.
```
 # openssl req -new -key server.key -out server.csr 
```
You will be asked a series of questions about your organization. It is important to answer these questions correctly, as this information will be incorporated into your certificate and will be viewable by all web users visiting your site. In particular, make sure you use the correct fully qualified domain name by which your encrypted site will be referred. SSL certificates are specific to fully qualified domain names, so even though www.example1.com and example1.com give you the same content, you'll have to either decide on one to be your SSL site or buy a separate certificate for each.
Send the CSR to a trusted Certificate Authority.
Most CAs have web forms that you can use to submit your CSR. After they collect their fee from you (usually anywhere from $150 to $900), you will receive a signed certificate via email. Once you receive this email, store the certificate in a file called server.crt.
Configure Apache.
Now that you have an RSA private key and a signed certificate from a trusted CA, you're ready to set up Apache to handle encrypted SSL connections. You need to make changes to two places in the Apache configuration file. In the global section, enable Apache's SSL functionality and set some global variables. Then create a specific virtual host section for SSL connections. There are other ways to accomplish this with Apache, including running two copies of Apache, one listening on port 80 to serve unencrypted pages and the other listening on port 443 to serve encrypted pages, but the virtual host option is much more elegant and easier to maintain. Here are the lines required in the global section of Apache's httpd.conf file:
```
 # SSL Global Options
 Listen 443
 SSLPassPhraseDialog  builtin
 SSLSessionCache         dbm:/opt/apache/logs/ssl_scache
 SSLSessionCacheTimeout  300
 SSLMutex  file:/opt/apache/logs/ssl_mutex
 SSLRandomSeed startup builtin
 SSLRandomSeed connect builtin
```
More information on what SSL options are available can be found at http://httpd.apache.org/docs/2.2/mod/mod_ssl.html.

The easiest way to set up Apache to handle SSL connections is to create a VirtualHost container that handles all traffic coming in on port 443. Adding onto our example Apache httpd.conf from before for www.example1.com, we can create a container for SSL connections to www.example1.com that looks like this:

 <VirtualHost *:443>
     DocumentRoot "/opt/apache/www.example1.com"
     ScriptAlias /cgi-bin /opt/apache/cgi-bin
     ServerName www.example1.com
     ServerAdmin webmaster@example1.com
     ErrorLog /opt/apache/logs/ssl_error_log
     CustomLog /opt/apache/logs/ssl_access_log combined
     Options FollowSymLinks
     SSLEngine on
     SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
     SSLCertificateFile /opt/apache/ssl/server.crt
     SSLCertificateKeyFile /opt/apache/ssl/server.key
 </VirtualHost>

The SSLCipherSuite line lists all of the SSL encryption methods that you will support. This is a pretty standard list; consult the mod_ssl documentation for more encryption options.

Once you've restarted Apache, you are ready to serve pages over an encrypted connection. When you point a browser to https://www.example1.com, you'll have an SSL-secured web connection.
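If Apache will not start after you add these lines, one thing worth ruling out is a certificate that was issued for a different key. A common check is to compare the RSA modulus stored in each file, as in the sketch below. The function name key_matches_cert is made up, the paths are the ones used in this chapter, and the check assumes an RSA key with its passphrase already removed.

```shell
#!/bin/sh
# Compare the RSA modulus of a private key with that of a certificate.
# Returns 0 on a match, 1 on a mismatch, 2 if either file is unreadable.
# key_matches_cert is a hypothetical helper.
key_matches_cert() {
    key_mod=$(openssl rsa -noout -modulus -in "$1" 2>/dev/null) || return 2
    crt_mod=$(openssl x509 -noout -modulus -in "$2" 2>/dev/null) || return 2
    [ "$key_mod" = "$crt_mod" ]
}

# Run the check only where this chapter's files actually exist
key=/opt/apache/ssl/server.key
crt=/opt/apache/ssl/server.crt
if [ -r "$key" ] && [ -r "$crt" ]; then
    if key_matches_cert "$key" "$crt"; then
        echo "key and certificate belong together"
    else
        echo "key and certificate do NOT match" >&2
    fi
fi
```

The same comparison works for a self-signed certificate, since it is an ordinary X.509 file built from the same key.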

But what if you just want to set up an internal SSL site and don't want to have to pay a fee to a trusted CA? Can you still have an SSL-secured site? The answer is yes. Just because your web browser has a short list of Certificate Authorities it inherently trusts doesn't mean you can't sign your own certificate. Use this command to create a self-signed certificate file:

 # openssl req -new -x509 -nodes -sha1 -days 365 -key server.key \
   -out server.crt

Copy the resulting server.crt file to /opt/apache/ssl and restart Apache. Now when you point your web browser to https://www.example1.com, you'll probably get a box that looks something like Figure 38-1.

Figure 38-1. Browser accepting certificate from web server

As you can see, the default option is to accept this certificate temporarily for this session. If you select Accept this certificate permanently, you are telling the browser to accept this certificate from now on, even though it is not from a trusted CA. Clicking on Examine Certificate shows us Figure 38-2.

Figure 38-2. Certificate from web server displayed

You can see that this certificate contains the information we entered when we signed the certificate ourselves. So why use a trusted CA at all and pay the money when you can sign your own certificates? There are two reasons:

End users don't like to see this error box. It makes them nervous, and they may abort the connection.
Even though the rest of the world doesn't necessarily trust you, they probably trust one of the companies on the short list of trusted Certificate Authorities. So you're basically paying these companies to vouch for you.

SSL support has become a very common option for Apache users, as security issues become more important on the Web. We encourage anyone who operates a web site that takes personal information (even usernames and passwords) to consider the benefits of SSL encryption.

38.2.2.4. mod_rewrite

The mod_rewrite module was mentioned during our initial compilation of Apache. mod_rewrite allows you to apply regular expression patterns to HTTP requests and manipulate them as you see fit. This gives you the ability to redirect incoming requests, directing users to different pages or replacing content on the fly. mod_rewrite is one of the most powerful Apache modules and is widely used by webmasters wanting more control over the content and layout of their site.

The complete documentation for mod_rewrite can be found at http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html. Let's look at a few examples to see how mod_rewrite works.

In this example, we'll use our standard web site www.example1.com. Let's say our company just purchased another company called example2.com. We are now hosting web sites for both companies, but we want to redirect people who try to go to example2.com to example1.com. This is easy to do with a VirtualHost container, as we have seen before. We could just make sure our VirtualHost container looked something like this:

 <VirtualHost *:80>
     ServerName www.example1.com
     ServerAlias example1.com www.example2.com example2.com
     DocumentRoot /opt/apache/www.example1.com
     CustomLog /opt/apache/logs/www.example1.com.log combined
 </VirtualHost>

Now, any request that comes in for any of these four web sites (www.example1.com, example1.com, www.example2.com, or example2.com) will display the content that lives in /opt/apache/www.example1.com. However, we have two problems with this setup. First, if someone accesses this site with www.example2.com, the URL in his web browser stays www.example2.com throughout his visit to the site. Since we now own this company, we want to change that automatically to www.example1.com. Also, we may want to know how many people come into our site through www.example1.com and how many come in through www.example2.com. Since we have only one log file in this situation, we won't see a breakdown of traffic between the domains.

A better solution would be to use mod_rewrite. We set up a separate VirtualHost container for www.example2.com and use mod_rewrite to redirect traffic to www.example1.com:

 <VirtualHost *:80>
     ServerName www.example2.com
     ServerAlias example2.com
     DocumentRoot /opt/apache/www.example1.com
     CustomLog /opt/apache/logs/www.example2.com.log combined
     RewriteEngine On
     RewriteRule ^(.+) http://www.example1.com [R]
 </VirtualHost>

The ServerName and ServerAlias lines ensure that requests coming in for these domains are handled by this VirtualHost container (assuming we removed www.example2.com and example2.com from the example1.com VirtualHost container). The DocumentRoot can essentially be any valid directory, because we're not actually going to serve any content from it. The CustomLog will be a specific log for this VirtualHost so we know how many requests are coming in for www.example2.com. The RewriteEngine On line tells Apache to turn on the rewrite rule-processing engine for this VirtualHost. This is a necessary step, because the rewrite engine does incur some overhead, and you don't want Apache wasting memory on it if you're not going to use it. Finally, the RewriteRule line does the actual work for us. The format of the RewriteRule syntax is:

 RewriteRule pattern substitution [options]

The pattern can be any valid regular expression that will be applied to the requested URL. The substitution is what the URL is replaced with. The options are ways to modify the behavior of RewriteRule and are not always required. In our example, we want to redirect every request for www.example2.com to www.example1.com. We also want to ensure that if someone had bookmarked a deep-linked page of www.example2.com (for example, www.example2.com/products/1/a/product.html), she is redirected to the home page of www.example1.com. Since we want our initial regular expression to catch everything, we use a catchall regular expression:

 ^(.+)

The ^ character anchors the match at the start of the string. The parentheses group the rest of the expression and capture whatever it matches (the captured text is available in the substitution as $1, although we don't use it here). The period (.) matches any single character, and the plus sign (+) is a quantifier for the period, meaning one or more instances of the preceding element. So this regular expression essentially means "match anything."

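The substitution section is what we are going to replace the URL with. In this case, we want to replace every URL with our home page of www.example1.com. The only option we're using is the R option, which forces an external redirect: Apache sends the browser an HTTP redirect response (302 by default), so the address in the browser changes to the new URL. Because our substitution is an absolute URL on a different host, mod_rewrite would issue an external redirect here even without the option. R becomes necessary when the substitution is only a URL path, in which case Apache prepends http:// and the current hostname to build the redirect target. The option is not required if you're simply rewriting internally to another file in the same DocumentRoot.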
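If you would rather keep deep links working than send every visitor to the home page, a variation on the rule can carry the requested path across. The following is a minimal sketch, not part of the configuration above; it assumes the same paths exist on www.example1.com and that a permanent (301) redirect is what you want:

```apache
RewriteEngine On
# In a VirtualHost, the pattern is matched against the URL path, which
# begins with a slash. $1 holds whatever the parentheses captured.
RewriteRule ^/(.*)$ http://www.example1.com/$1 [R=301,L]
```

With this rule, a request for www.example2.com/products/1/a/product.html would be redirected to http://www.example1.com/products/1/a/product.html. The L option stops rule processing once this rule has matched.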

Another, more complicated example takes advantage of the RewriteCond expression to query more than one variable before rewriting occurs. RewriteCond is also used to query something other than the URL before a rewrite. In this example, we've done some site redesign, but we don't want to break any bookmarks that web users may already have for our site. In the past, people accessed our pages through a CGI script, so requests looked like this:

 http://www.example1.com/cgi-bin/go.cgi?page1

and

 http://www.example1.com/cgi-bin/go.cgi?page2

We have changed our web layout, so the URLs should now be:

 http://www.example1.com/page1.html

and

 http://www.example1.com/page2.html

How can we handle this with mod_rewrite so the old URLs still work?

 <VirtualHost *:80>
     ServerName www.example1.com
     ServerAlias example1.com
     DocumentRoot /opt/apache/www.example1.com
     CustomLog /opt/apache/logs/www.example1.com.log combined
     RewriteEngine On
     RewriteCond %{QUERY_STRING} ^page.+
     RewriteRule ^(.+) /%{QUERY_STRING}.html
 </VirtualHost>

In our RewriteCond line, we're looking at the value of the environment variable QUERY_STRING. The complete list of variables that you can query can be found in the mod_rewrite instructions at http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html and includes variables such as HTTP_USER_AGENT, REMOTE_USER, and REMOTE_HOST. In this case, we're interested in performing a rewrite only if our QUERY_STRING variable matches the regular expression:

 ^page.+

Again, this is similar to our last regular expression. The ^ means the start of the string, and the .+ means "one or more of any character," so we will perform a rewrite only if our query string starts with page and has at least one character after it. If that is the case, the RewriteRule will be executed. It says "replace any URL with /%{QUERY_STRING}.html." So a request for http://www.example1.com/cgi-bin/go.cgi?page1 would be served from /page1.html. Because the rule has no R option, this is an internal rewrite rather than a redirect: the browser still shows the old URL, and the original query string is passed along unchanged.
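The rule above applies to any request whose query string starts with page, whatever the requested path. A tighter variant is sketched here. It assumes that page names consist of the word page followed by digits, and that you want the browser to be sent to the new URL:

```apache
RewriteEngine On
# Capture the page name from the query string; %1 refers to this capture.
RewriteCond %{QUERY_STRING} ^(page[0-9]+)$
# Match only the old script. The trailing ? discards the old query string,
# and R=301 turns the rewrite into a permanent external redirect.
RewriteRule ^/cgi-bin/go\.cgi$ /%1.html? [R=301,L]
```

Redirecting updates the address shown in the browser and lets visitors refresh their bookmarks, at the cost of one extra round trip. An internal rewrite avoids that round trip but keeps the old URL visible.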

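Be sure to watch the Apache log files when you are troubleshooting your mod_rewrite rules. The error_log file reports problems such as configuration errors and missing files, and the access log shows which URLs were requested and which status codes were returned. Where your Apache version supports it, mod_rewrite's own RewriteLog traces each rule as it is evaluated, which gives you a good idea of exactly what is going on if Apache is not behaving as you think it should.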
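For rule-by-rule tracing, Apache versions up to 2.2 provide a dedicated rewrite log. A minimal sketch, assuming one of those versions (the log file name is an arbitrary choice):

```apache
# Write mod_rewrite trace output to its own file, relative to ServerRoot.
RewriteLog logs/rewrite_log
# 0 disables logging; higher values are more verbose and slow the server.
RewriteLogLevel 3
```

Apache 2.4 removed these two directives. There, the equivalent is a per-module level on the LogLevel directive, such as LogLevel alert rewrite:trace3, and the trace output goes to error_log. In either case, turn the tracing off again once the rules work, because it adds noticeable overhead.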

38.2.2.5. Apache performance tuning

So far, we have looked at how to install and configure the Apache web server. We have touched on many different options, including DSO, SSL, mod_php, and mod_rewrite. This final Apache section deals with performance tuning.

The Apache web server process follows the common Unix parent/child model. When you start the Apache web server, you are starting a parent process. This parent process will spawn a number of children to handle incoming HTTP requests. If the number of web requests exceeds the number of children currently available, Apache will spawn more, up to a specified limit. As you can imagine, the spawning of new children while HTTP requests are pending not only places overhead on the web server itself, but can adversely affect the web browsing experience, as web surfers sit and wait for the web server to honor their request. There are a number of options that you can set in the global httpd.conf section to maximize the performance of your Apache install:

StartServers: This option tells Apache how many children to spawn when it initially starts up. You want to ensure that this number is a few more than the average number of connections you see on your web site. If your web site averages 10 simultaneous HTTP requests at all times, it's probably a good idea to set StartServers to at least 15.
MinSpareServers: This option sets how many idle child processes should be kept around. Idle children are useful when traffic spikes. You don't want to be in a situation in which your web surfers have to wait for Apache to spawn new children to handle their requests. Making this value between 5 and 20 (depending upon how busy your site is) ensures that your site handles spikes in traffic gracefully.
MaxSpareServers: This option sets the maximum number of idle children. This is useful to save memory on your web server. If you have a sudden spike in traffic that causes 50 children to spawn, but then traffic goes back down to 5 connections, you don't want 45 idle children taking up memory and doing nothing. This value needs to be higher than MinSpareServers. It's usually a good idea to set it 5 to 10 higher than MinSpareServers.
MaxClients: This option sets how many simultaneous HTTP requests your server can handle. Once this value is reached, additional connection attempts are queued (up to a limit based on the ListenBacklog directive) until a child becomes free, so visitors see delays and, if the queue fills, errors on the web browser end. The purpose of this option is to prevent Apache from attempting to handle so many requests that it consumes all the resources on the server itself. By default, Apache won't let you set this higher than 256. In Apache 1.3, that limit can be extended only by modifying the code before you compile Apache; with the prefork MPM in Apache 2.x, you can raise it with the ServerLimit directive.
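Put together, these directives might look like the following in the global section of httpd.conf. The numbers are illustrative assumptions for a site that averages about 10 simultaneous requests, not recommendations; the right values depend on your traffic and available memory:

```apache
# Children started at launch: a few more than the average load.
StartServers       15
# Keep at least this many idle children ready for traffic spikes.
MinSpareServers    10
# Kill idle children beyond this number to reclaim memory.
MaxSpareServers    20
# Upper bound on simultaneous requests; keep it low enough that all
# children fit in physical memory.
MaxClients        150
```

A rough way to choose MaxClients is to divide the memory you can spare for Apache by the typical size of one child process, then leave some headroom so the server never starts swapping.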

Performance tuning can be a challenge. Careful monitoring of your Apache installation is essential in order to identify what the correct values should be for the parameters listed in this section. For more information on Apache monitoring, the mod_status module can be very useful. You can read its documentation at http://httpd.apache.org/docs/1.3/mod/mod_status.html.
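As a starting point for monitoring, the status page can be enabled with a few lines. This is a minimal sketch that assumes mod_status is available in your build (compiled in or loaded as a DSO) and uses the same Order/Allow/Deny access-control style as Example 38-1; the /server-status path is the conventional choice, not a requirement:

```apache
# Collect per-request details in addition to the basic counters.
ExtendedStatus On

<Location /server-status>
    SetHandler server-status
    # Restrict the status page to requests from the server itself.
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>
```

After a restart, requesting /server-status from the server itself shows how many children are busy and how many are idle, which is the information you need when adjusting StartServers, MinSpareServers, MaxSpareServers, and MaxClients.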