Configuring Apache




This section discusses configuring Apache after it has been installed. The previous section discussed configuring Apache's compile-time options, which really boils down to enabling or disabling specific features. The run-time options, however, govern the actual behavior of Apache as it runs.

Cross-Reference 

For more information on compile-time versus run-time options, see the section "Software Configuration Options" in Chapter 7.

Apache follows a typical server configuration model: it reads a configuration file that sets a variety of options, and it alters its behavior accordingly. Apache is a pretty complicated piece of software, however, so it comes with a lot of files. This section first summarizes the various files and directories relevant to Apache, and then discusses how to use these files to change the configuration.

Navigating the Installation Directory

The previous section installed Apache into the /opt/apache directory. A quick look in that directory will reveal a number of subdirectories. This section describes these directories, focusing on the important ones. After this "guided tour" of an Apache installation, the next section discusses how to configure the server. To get started, Table 11-2 lists the standard Apache installation subdirectories.

Table 11-2: Apache Installation Subdirectories

DIRECTORY   CONTENTS
bin         Program binaries for the server and various utilities
build       Utility scripts and other files used for building and adding DSO modules
cgi-bin     Web scripts used to dynamically generate web content
conf        Configuration files for the server
error       Files containing text messages the server displays when various errors occur
htdocs      Actual content files served up by the HTTP Server
icons       Small graphical icons referred to by the server when printing certain dynamically generated pages (such as directory listings)
include     Header files required to compile DSO modules and other programs
lib         Shared libraries (not DSO modules) required by the HTTP Server
logs        Files containing log information on various HTTP events
man         Standard Unix man pages for the server and utility programs
manual      The full Apache installation and usage manual in HTML format
modules     Dynamically loadable (DSO) module files

Some of the directories listed in Table 11-2 are covered individually in the following sections. Others are less important and are discussed as a group at the end of this section.

Programs in the bin Directory

As you probably expect from reading Chapter 3, the bin subdirectory of the top-level Apache installation directory contains binary programs. This includes both the actual Apache HTTP Server program (httpd) and several utility programs. Table 11-3 summarizes these programs.

Table 11-3: Programs in the Apache bin Directory

PROGRAM                        PURPOSE
httpd                          The Apache HTTP Server
ab                             A developer's utility for benchmarking the performance of an HTTP Server (such as Apache)
apachectl                      Provides a convenient way to start, stop, and manage the HTTP Server
apxs                           A developer's and administrator's utility used to compile and install new DSO modules into an existing Apache installation
checkgid                       Used internally by the Apache programs
dbmmanage, htdigest, htpasswd  Programs used to manage Apache user accounts and passwords in several formats
logresolve                     A convenient utility to resolve numeric IP addresses in Apache log files into actual human-readable hostnames
rotatelogs                     "Rotates" the log files by periodically closing the current log file and starting a new, time-stamped one

Server Programs

The httpd, ab, apachectl, and checkgid programs are all related to server management and performance testing. The httpd program is, of course, the Apache HTTP Server itself; this is the main event, so to speak. The apachectl program is actually a shell script, and it is used to start, stop, restart, and otherwise manage the httpd process. The apachectl program is similar to the service scripts used by the SysV init model on Red Hat Linux, as discussed in Chapters 3 and 4.

The checkgid and ab programs are less crucial. The checkgid program is simply a small utility program invoked by Apache; users and administrators don't need to worry about it (it doesn't even have a man page!). The ab program is the "Apache Bench" program, and it's used to benchmark the performance of an HTTP Server. (Even though ab is included with Apache, it works with other HTTP servers as well.) Administrators and developers use ab to test performance-tuning configurations and the performance of web applications. Generally, most users won't need to worry about ab unless they are investigating a performance problem with the installation.
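To make the numbers ab reports a little less mysterious, the following Python sketch mimics the core of what a benchmarking tool like ab does. It fires a fixed number of requests at a URL from a pool of concurrent workers and then summarizes the timing. It is only an illustration, not a replacement for ab, and the function names are hypothetical. The two "time per request" figures reflect my understanding of how ab computes them: one from the point of view of a single client, and one averaged across all concurrent clients. The real tool is invoked as, for example, ab -n 100 -c 10 http://localhost/ to send 100 requests, 10 at a time.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Fetch one URL, read the whole body, and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url) as response:
            response.read()
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still count as completed requests


def benchmark(url, requests=100, concurrency=10, fetcher=fetch):
    """Send `requests` GETs using `concurrency` threads and summarize the timing."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetcher, [url] * requests))
    elapsed = time.perf_counter() - start
    return {
        "complete_requests": len(statuses),
        "non_200_responses": sum(1 for status in statuses if status != 200),
        "requests_per_second": requests / elapsed,
        # Mean time per request as seen by one of the concurrent clients...
        "ms_per_request": concurrency * elapsed * 1000 / requests,
        # ...and averaged across all concurrent clients.
        "ms_per_request_all_clients": elapsed * 1000 / requests,
    }


if __name__ == "__main__":
    print(benchmark("http://localhost/", requests=100, concurrency=10))
```

The fetcher parameter exists only so that the timing logic can be tried without a running server.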

Administration Programs

In addition to the most important server program and support programs, there are also a few convenience utilities that are included to make users' lives easier. The apxs program is the "Apache Extension" utility. This program is used to compile and install Apache DSO modules. (Recall from the previous section that DSOs are files that Apache can load when it starts up to include certain features into the server.) DSOs provide a way to add functionality to Apache without having to recompile the entire server. If you find yourself needing to install a new Apache DSO module, apxs will be very useful. It's pretty simple to use, but if you have trouble, check out Apache's documentation on it.

The logresolve program is used to convert numeric IP addresses in Apache log files into host names. A typical Apache server gets a lot of hits from web browsers on the Internet. Each hit contains information about the client, including its IP address. Apache logs these hits (as well as other things, such as errors), but by default the log entries include only the IP address. This is because looking up each client's host name through the Domain Name System (DNS) is slow and generates needless network traffic. The logresolve program is included as an easy way to take Apache's log files (which contain only IP addresses) and "reformat" them with host names fetched from DNS. You can use logresolve any time you need or want to view your site's logs with more than just IP addresses.
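logresolve itself reads a log on standard input and writes the resolved version to standard output (for example, logresolve < access_log > access_log.resolved). The Python sketch below shows the same idea under two simplifying assumptions. First, the client address is the first space-separated field of each line, as in Apache's default log formats. Second, each distinct address is looked up only once and then cached, which keeps the number of DNS queries down. The function names and the script name in the usage comment are hypothetical, not part of Apache.

```python
import socket
import sys


def make_resolver(lookup=socket.gethostbyaddr):
    """Return a resolve(ip) function that caches results, so each address is looked up once."""
    cache = {}

    def resolve(ip):
        if ip not in cache:
            try:
                cache[ip] = lookup(ip)[0]  # gethostbyaddr returns (hostname, aliases, addresses)
            except OSError:
                cache[ip] = ip  # no reverse DNS entry: keep the numeric address
        return cache[ip]

    return resolve


def resolve_log(lines, resolve):
    """Yield each log line with its first field (the client address) replaced by a host name."""
    for line in lines:
        address, separator, rest = line.partition(" ")
        yield resolve(address) + separator + rest


if __name__ == "__main__":
    # Usage, like logresolve itself: python3 resolve_log.py < access_log > access_log.resolved
    sys.stdout.writelines(resolve_log(sys.stdin, make_resolver()))
```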

The rotatelogs program manages Apache's log files. Because the HTTP Server usually runs for long periods, it keeps logging various events (especially hits from web browsers) into its log files, and these files can become very large as a result. The rotatelogs command "rotates" Apache's log files: at a fixed interval, it closes the current log file and starts a fresh one whose name includes the time at which it was started. The user can then delete the old log files after reviewing them, in order to conserve disk space.

Note 

The rotatelogs command isn't normally run by hand from the command line. Instead, it's attached to Apache's logging mechanism through a piped-log entry in the configuration files, so that Apache writes its log data to rotatelogs rather than directly to a file. For more information, see the man page for rotatelogs included with the Apache installation.
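In httpd.conf, rotatelogs is typically attached with a piped-log line such as CustomLog "|/opt/apache/bin/rotatelogs /opt/apache/logs/access_log 86400" common, which starts a new file every 86,400 seconds (one day). The Python sketch below reproduces only the file-naming rule, as I understand it from the rotatelogs documentation. When the log name contains no strftime-style % codes, rotatelogs appends a ten-digit suffix holding the start of the current rotation period in seconds since the epoch, and that value is always a multiple of the rotation interval. The function name is hypothetical, and the sketch ignores the other options the real tool supports.

```python
import time


def rotated_log_name(base, rotation_seconds, now=None):
    """Return the file that rotatelogs-style rotation would be writing to at time `now`.

    The suffix is the start of the current period: `now` rounded down to a
    multiple of `rotation_seconds`, as a zero-padded 10-digit number.
    """
    if now is None:
        now = time.time()
    period_start = int(now) // rotation_seconds * rotation_seconds
    return f"{base}.{period_start:010d}"


if __name__ == "__main__":
    # Entries logged at any point during the same period land in the same file.
    print(rotated_log_name("/opt/apache/logs/access_log", 86400))
```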

User Database Management Programs

A common feature required by web sites is the restriction of access to certain pages or sets of pages to specific users. (These pages and sets of pages are known as realms.) Apache supports this functionality through various directives in its configuration files. However, another key component in supporting user authentication is, of course, a database of users.

There are any number of ways to set up a database of users. Commercial web sites, with which most of you will be familiar, use some kind of high-capacity relational database or directory to store user account information. The authentication that Apache supports out of the box, however, is more modest: it supports user databases stored on local disks, in a few different formats. (There are, of course, third-party modules that add support for high-capacity data stores to Apache; they're just not included with the standard Apache distribution.)

The three programs dbmmanage, htdigest, and htpasswd are utilities that manage these user databases. Which data format is best for a given application, and when to use Apache's basic mechanism rather than a high-capacity database, are advanced topics that aren't really relevant to this book. If you need more information, please consult the manual pages for these programs (included with the Apache installation) or the documentation on the web site at httpd.apache.org.
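To give a feel for what these user databases contain, the Python sketch below builds and checks a line in one of the formats htpasswd can write. Each line holds the user name, a colon, and then "{SHA}" followed by the Base64-encoded SHA-1 digest of the password, which is what htpasswd -s produces. htpasswd's default format is a different one (it varies by platform and version), and unsalted SHA-1 is weak by modern standards, so treat this purely as an illustration of the file layout. The function names and the user "alice" are hypothetical.

```python
import base64
import hashlib


def htpasswd_sha_entry(user, password):
    """Return a 'user:{SHA}...' line like the ones `htpasswd -s` writes.

    The hash is the Base64-encoded SHA-1 digest of the password (unsalted).
    """
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"


def check_sha_entry(line, user, password):
    """Return True if `line` is a {SHA} entry for `user` whose hash matches `password`."""
    name, _, stored_hash = line.rstrip("\n").partition(":")
    expected_hash = htpasswd_sha_entry(user, password).partition(":")[2]
    return name == user and stored_hash == expected_hash


if __name__ == "__main__":
    print(htpasswd_sha_entry("alice", "password"))
```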

Configuration Files in the conf Directory

The conf directory contains all configuration files for Apache. (In reality, the httpd program, which is the actual Apache HTTP Server program, can be told which configuration file to use through its -f command-line option, so the location of the configuration isn't hard-coded or forced to a specific value.) This section briefly discusses the contents of this directory.

The canonical name for Apache's configuration file is httpd.conf. (Again, however, this doesn't have to be the case; the httpd program can be passed an arbitrary file name.) A typical installation of Apache will have several sample httpd.conf files, such as a configuration tuned for maximum performance and one that enables SSL support, as well as a standard "vanilla" httpd.conf. As a result, there will usually be several files with a ".conf" extension in the conf subdirectory of the Apache installation; only one of these files actually needs to be there. Additionally, for each .conf file there is a -std.conf file. The -std files are copies of the original files; their purpose is to act as a backup in case a tinkering administrator really mucks up a .conf file. However, the -std files are also not required for Apache to run.

You can actually remove all the .conf and -std.conf files except for httpd.conf if you don't like them cluttering up the directory. That will leave the traditional Apache configuration file as a starting point to begin customizing the server. However, it definitely pays to take a look at the other sample files before removing them if you choose to do so; there's a lot of very valuable information and sample configurations in those files. Of course, there's still no substitute for reading the actual documentation. The sample configuration files are just that and are not comprehensive.

There are two other files in the conf subdirectory: mime.types and magic. MIME stands for Multipurpose Internet Mail Extensions, an industry standard for specifying the type of a file, such as "executable," "plain text," or "HTML." (Don't be confused by the "Mail" in MIME; the MIME standard can be used just about anywhere!) Apache's mime.types file is simply a mapping that assigns MIME types to file extensions. (For example, the extension ".txt" is mapped to the MIME type "text/plain".) Usually, you won't have to modify this file, since it contains reasonable defaults. If you do, carefully read the Apache documentation on it, since altering mime.types can have subtle consequences. The magic file serves a similar purpose, except that it assigns MIME types based on "magic numbers" (characteristic byte sequences) found in a file's contents rather than on the file's extension. This is useful for files whose extensions may be missing or unreliable, and is frequently used for items such as audio files.
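The following Python sketch shows the two lookups side by side, under simplifying assumptions. It parses text in the mime.types layout (a MIME type followed by zero or more extensions on each line, with # starting a comment). When a file's extension is unknown, it falls back to checking the first bytes of the file against a small table of magic numbers. That table is a tiny, hypothetical stand-in for the real magic file. Apache's actual interplay between its mod_mime and mod_mime_magic modules is more involved. The final text/plain fallback mirrors the DefaultType setting in Apache's stock configuration.

```python
import os

# A few well-known signatures; a hypothetical, tiny subset of what a magic file describes.
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def parse_mime_types(text):
    """Parse mime.types-style text into an {extension: mime_type} dictionary."""
    by_extension = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split on whitespace
        if len(fields) < 2:
            continue  # blank line, comment, or a type listed without extensions
        mime_type, *extensions = fields
        for ext in extensions:
            by_extension[ext.lower()] = mime_type
    return by_extension


def guess_type(filename, head, by_extension, default="text/plain"):
    """Try the file extension first; fall back to magic numbers in the file's first bytes."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in by_extension:
        return by_extension[ext]
    for prefix, mime_type in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return mime_type
    return default
```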

Log Files in the logs Directory

The logs subdirectory contains, intuitively enough, the log files for the Apache HTTP Server. Apache's logging mechanism is very flexible, and it can be configured to log a variety of events to any number of files in any format. Altering this configuration is seldom necessary, and so I don't cover it here. Fortunately, Apache has a default logging configuration that is almost always good enough. Table 11-4 summarizes the files involved in this default configuration.

Table 11-4: Default Apache Log Files

FILE NAME   CONTENTS
error_log   Errors related to the functioning of the server or to failed or erroneous requests from web browser clients; used for error debugging
access_log  Information on each request received from a client; used for gathering statistics on the web site and users' usage
httpd.pid   Contains the Unix process ID of the currently running httpd process
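Since access_log is described above as the raw material for site statistics, here is a minimal Python sketch of that kind of processing. It assumes the log uses the Common Log Format: host, identity, user, [time], "request", status, bytes. As far as I know, the stock httpd.conf uses this format for access_log. Lines written in the longer "combined" format also match, because the pattern only has to match the start of each line. The function names are hypothetical.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Return a dict of fields for one log line, or None if the line doesn't match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])  # "-" means no body
    return entry


def summarize(lines):
    """Count hits per client and per status code, and total the bytes sent."""
    hits_per_host = Counter()
    status_counts = Counter()
    total_bytes = 0
    for line in lines:
        entry = parse_clf(line)
        if entry is None:
            continue  # skip malformed lines
        hits_per_host[entry["host"]] += 1
        status_counts[entry["status"]] += 1
        total_bytes += entry["bytes"]
    return hits_per_host, status_counts, total_bytes
```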

Technically, the httpd.pid file isn't really a log file; it just contains the process ID (PID) of Apache. This file makes it easy to quickly figure out the PID in order to restart or shut down Apache. Also, depending on what platform Apache is running on and what options are enabled, there may be more or fewer files in the logs directory than shown in Table 11-4.
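For example, a script that needs to restart Apache can read the PID from httpd.pid and send the server a signal. The Python sketch below does just that. The signal meanings are the ones described in Apache's documentation on stopping and restarting the server: TERM to stop, HUP to restart, and USR1 for a graceful restart that lets in-progress requests finish. apachectl's stop, restart, and graceful commands come down to the same signals. The path assumes the /opt/apache installation from this chapter. The send parameter is a hypothetical hook that exists only so the function can be tried without signaling a real server.

```python
import os
import signal

PID_FILE = "/opt/apache/logs/httpd.pid"  # location used by this chapter's installation

# Signals httpd acts on, as described in Apache's stopping-and-restarting documentation.
APACHE_SIGNALS = {
    "stop": signal.SIGTERM,      # shut down immediately
    "restart": signal.SIGHUP,    # re-read the configuration; in-progress requests are cut off
    "graceful": signal.SIGUSR1,  # re-read the configuration; children finish current requests first
}


def read_pid(pid_file=PID_FILE):
    """Return the process ID recorded in Apache's PID file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def signal_apache(action, pid_file=PID_FILE, send=os.kill):
    """Send the signal for `action` to the running httpd and return its PID."""
    pid = read_pid(pid_file)
    send(pid, APACHE_SIGNALS[action])
    return pid


if __name__ == "__main__":
    # Typically needs to run as the same user that started httpd (often root).
    signal_apache("graceful")
```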

Content Directories

The point of the Apache HTTP Server is, of course, to serve up documents such as HTML web pages. Apache also supports many different programming languages and environments that web developers can use to create and run dynamic, interactive web applications. The Apache installation uses the htdocs and cgi-bin directories to contain the files that actually make up the site's content.

The htdocs directory is the home of "static" HTML files. These are files that don't have any dynamically generated content, such as tables of contents, default home pages, standard header and footer HTML files, and so on. The htdocs directory also contains any other static files needed by the site, such as graphical image files, tarball (.tar.gz) files that are downloaded from the site, and so on. Literally, any file placed into the htdocs directory (and that the httpd process has permission to access) becomes available for download over the web site. In other words, don't put anything in htdocs that you don't want people to be able to download!
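The reason everything in htdocs is downloadable is that, in the simplest case, the server just appends the path from the URL to the document root. The Python sketch below illustrates that mapping, assuming this chapter's /opt/apache/htdocs document root. It also refuses paths that would climb out of the document root with "..", a check that any real server, Apache included, has to make. It ignores aliases, symbolic links, and access rules, so it is a simplification rather than a description of Apache's internals. The function name is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit

DOCUMENT_ROOT = "/opt/apache/htdocs"  # this chapter's installation


def url_to_path(url, document_root=DOCUMENT_ROOT):
    """Map a request URL to a file path under the document root, or None if it escapes."""
    path = unquote(urlsplit(url).path)  # drop scheme, host, and query string; decode %xx
    # Join, then collapse ".", "..", and repeated slashes.
    candidate = posixpath.normpath(posixpath.join(document_root, path.lstrip("/")))
    if candidate != document_root and not candidate.startswith(document_root + "/"):
        return None  # e.g. "/../../etc/passwd" would resolve outside htdocs
    return candidate
```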

The cgi-bin directory contains scripts and programs that make use of the Common Gateway Interface (CGI) standard for dynamic web applications. CGI defines a standard, fixed way in which a web server (such as Apache) interacts with the programs used to generate content. Web developers write these programs in Perl, Python, C, or any other language that can read environment variables and standard input and write to standard output, which is essentially all CGI requires. As a result, the programs work with any web server that supports CGI, such as Apache. Apache's cgi-bin directory contains all the CGI programs used on the site. The directory is kept separate from the htdocs directory for security reasons and to make it easier to manage.
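As a concrete illustration, here is a minimal CGI program written in Python, in a hypothetical file named hello.py. Under CGI, the web server passes details of the request in environment variables such as QUERY_STRING and reads the program's standard output. That output must begin with header lines (at least Content-Type), followed by a blank line. Assuming the stock configuration that maps the /cgi-bin/ URL to the cgi-bin directory, copying the file there and making it executable should make it reachable as http://localhost/cgi-bin/hello.py?name=Ada. The name parameter is an invented example.

```python
#!/usr/bin/env python3
"""A minimal CGI program: greet the visitor named in the query string."""
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the complete CGI output (headers, blank line, body) for one request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"<html><body><p>Hello, {escape(name)}!</p></body></html>\n"
    # Header lines first, then an empty line that separates them from the body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server supplies the request in os.environ and sends our stdout to the browser.
    sys.stdout.write(cgi_response(os.environ))
```

Escaping the name before inserting it into the page keeps a visitor from injecting their own HTML through the query string.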

Note that the htdocs and cgi-bin directories are for the "standard" way of using Apache. Apache is extremely flexible and supports a wide variety of ways to serve up and dynamically generate content. For example, Apache can be used with web applications using Sun's Java Servlets technology, which defines a very different way that web application data is stored and served up. Java Servlet–based sites frequently don't use either htdocs or cgi-bin.

Documentation Directories

There are two directories with similar names in a standard Apache installation: manual and man. No, this wasn't a typo! These directories contain documentation for Apache, and for its related support utilities.

The manual directory contains a copy of the Apache user's manual, in HTML format. You can use any web browser to browse this manual, which is a copy of the documentation on Apache's web site. These files are not required for Apache to run, however, so if you're low on disk space they're safe to delete (though they are definitely useful to have around!). Depending on the configuration, you may also need to remove the reference to the manual directory in the httpd.conf file; check that file for details. At any rate, you can reach the manual on your own server by pointing your favorite browser at the URL http://localhost/manual/.

The man directory, meanwhile, contains Unix manual pages for the program binaries in the bin directory. This is where the manual pages for httpd, apachectl, and so on are located. However, if you've installed Apache into a directory such as /usr/local/apache or /opt/apache, you may not be able to access these pages, because the man program doesn't know to look in those directories. In order to access these manual pages, use a command similar to this one:

 man -M /opt/apache/man <command name> 

You'll have to substitute your actual installation directory for /opt/apache.

Note 

The -M option to the man program is actually a useful thing to know, in general! The option lets you specify an alternate directory for it to search for manual pages. It's very handy when you install software into a location other than /usr.

start sidebar
Using the MANPATH Variable

Using the -M parameter to the man command can be tedious if you check man pages frequently. Fortunately, the man program supports a shell environment variable named MANPATH. By setting this variable to a colon-separated list of paths containing man pages (such as /opt/apache/man), you can permanently add directories for the man program to search when it looks for manual pages. By adding this setting to the global user configuration scripts described in Chapters 4, 5, and 6 (for Red Hat Linux, Slackware Linux, and Debian GNU/Linux, respectively), you can make manual pages in nonstandard locations available to all users.

end sidebar
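To make the colon-separated search concrete, the Python sketch below walks a MANPATH-style string in order and returns the first matching page file. It deliberately simplifies what the real man program does. The section search order is an assumption, and only plain and .gz page files are considered. Empty entries in the path are simply skipped, although some versions of man treat them as "insert the default directories here". The function name is hypothetical.

```python
import os


def find_man_page(name, manpath, sections="18234567"):
    """Return the path of the first man page for `name` found along `manpath`, or None."""
    for directory in manpath.split(":"):
        if not directory:
            continue  # empty entries are ignored in this sketch
        for section in sections:  # assumed search order, not man's real one
            for suffix in ("", ".gz"):
                candidate = os.path.join(directory, "man" + section, f"{name}.{section}{suffix}")
                if os.path.isfile(candidate):
                    return candidate
    return None


if __name__ == "__main__":
    search_path = os.environ.get("MANPATH", "/usr/share/man") + ":/opt/apache/man"
    print(find_man_page("httpd", search_path))
```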

Other Directories

Several other directories are included in an Apache installation—namely, build, error, icons, include, lib, and modules. In most cases, these directories are used by Apache and the related tools, and users and administrators don't have to do anything in or with these directories. If you'd like more information on them, you can check the Apache web site and other documentation.

Customizing Apache

So far, I've discussed only the installation of Apache and very basic configuration tasks required to get it to run. Apache's purpose, however, is to run a web site, and so the next step is to discuss how to customize Apache to run your web site.

It's imperative to remember that Apache is a very sophisticated product, and it is capable of an amazingly wide range of behavior. The goal of this section is not to exhaustively describe how to customize Apache for every possible circumstance. Instead, the goal is to illustrate the mechanics of customizing Apache and also to demonstrate a particular methodology for configuring a server. After you read this section, you'll not only have a basic understanding of how to configure Apache, but you'll also understand how to go about figuring out other servers' configurations.

Perhaps the most common customization you'll need to do to an Apache HTTP Server is configuring specific directories. Generally, a given web site has many directories, and sometimes each directory needs specific behaviors or features. For example, most directories might be publicly accessible, but one directory needs to be "locked down" in terms of security and to require a password. To support this functionality, Apache allows you to specify properties or enable features for individual directories, by using directives within "Directory" blocks in the httpd.conf file.

A directive is, in Apache's parlance, a statement in the httpd.conf file that instructs the server to take some action. For example, a line that enables SSL/TLS encryption support would be a directive, as would a line that turns off the ability to list the contents of a directory. Apache directives, in turn, are enclosed in various "blocks" that determine the scope of their effect.

There are many types of these enclosing blocks, but the simplest is the Directory block. A Directory block has a definite beginning (marked by a <Directory path> line) and a definite ending (marked by a </Directory> line). Directives inside the block affect only the named directory (and any of its subdirectories).

A typical Apache httpd.conf configuration file contains an entry for the top-level directory on the site. This entry contains some directives that are applied to that root directory. Since they also apply to any children of the root directory, this effectively sets up site-wide defaults. The following text is an example Directory block that might be used for the Apache installation discussed in this chapter:

<Directory "/opt/apache/htdocs">
    Options FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
    DirectoryIndex index.html index.htm
</Directory>

The path /opt/apache/htdocs in the preceding Directory block points to the location in the installation where HTML content is to be placed, and the directives enclosed in the block are applied to that directory. Since all of the site's content is located under that directory, this block also effectively serves as a site-wide default. (Describing what each of those directives does would take too much time and isn't really relevant; see the Apache documentation mentioned earlier for details.)

It is possible to override these defaults on a case-by-case basis, however. For example, perhaps you wish to add files named home.html to the DirectoryIndex directive, but only for the directory /opt/apache/htdocs/home. The following Directory block will accomplish this; the DirectoryIndex directive will be updated for that specific directory, but all other directives will remain unchanged.

<Directory "/opt/apache/htdocs/home">
    DirectoryIndex index.html index.htm home.html
</Directory>
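The way the two Directory blocks above combine can be modeled in a few lines of Python. The sketch assumes each block is just a mapping from directive names to values. Blocks are applied from the shortest matching path to the longest, so a subdirectory's setting replaces its parent's while everything else is inherited. That ordering matches Apache's documented rule for Directory sections. Real merging, however, happens directive by directive (Options, for instance, can add or remove individual options with + and -), and other kinds of sections and .htaccess files also take part. Treat this as a model, not as Apache's algorithm; the names in it are hypothetical.

```python
import posixpath

# The two Directory blocks from the text, modeled as {directory: {directive: value}}.
DIRECTORY_BLOCKS = {
    "/opt/apache/htdocs": {
        "Options": "FollowSymLinks",
        "AllowOverride": "None",
        "Order": "allow,deny",
        "Allow": "from all",
        "DirectoryIndex": "index.html index.htm",
    },
    "/opt/apache/htdocs/home": {
        "DirectoryIndex": "index.html index.htm home.html",
    },
}


def effective_directives(directory, blocks=DIRECTORY_BLOCKS):
    """Merge every block that covers `directory`, from shortest path to longest."""
    directory = posixpath.normpath(directory)
    covering = [
        path for path in blocks
        if directory == path or directory.startswith(path.rstrip("/") + "/")
    ]
    merged = {}
    for path in sorted(covering, key=len):  # enclosing directories come first
        merged.update(blocks[path])  # a deeper block overrides only what it redefines
    return merged


if __name__ == "__main__":
    for d in ("/opt/apache/htdocs/news", "/opt/apache/htdocs/home/photos"):
        print(d, "->", effective_directives(d)["DirectoryIndex"])
```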

Most of the customization of an Apache site revolves around creating these Directory blocks (and other blocks) as appropriate, and then using the correct directives within each block. This model is actually very powerful, since it lets you customize the behavior of each individual area of the site. Unfortunately, this is as much detail as I can really go into here, since I just couldn't do justice to the full power of Apache's configuration system in the space I have. Read the Apache manual cited earlier for full details.

All this terminology about Directories and directives aside, what this really boils down to is that Apache provides you with a set of options (directives) and provides you with a mechanism (Directory blocks) to scope the options to different parts of the system. This is actually a very common model for configuration of server daemon software: Create scope blocks somehow, and then set options within each scope block.

In the Apache world, these scope blocks are created by pairs of <Directory> and </Directory> lines. In other programs, they might be set up through pairs of braces ({ and }) or simply through the positioning of lines in the file (as with the OpenSSH server). Even though these programs differ in their syntax, it's the same general idea in each case. When you find yourself trying to configure a new, unfamiliar program, try to identify where you establish "scopes" and how you set "options" within them.






Tuning and Customizing a Linux System
ISBN: 1893115275
Year: 2002
Pages: 159
