After the installation described in Chapter 1, you now have a shiny bright apache/httpd, and you're ready for anything. For our next step, we will be creating a number of demonstration web sites.
It might be a good idea to get a firm idea of what, in the Apache business, a web site is: it is a directory somewhere on the server, say, /usr/www/APACHE3/site.for_instance. It usually contains at least four subdirectories. The first three are essential:
conf: Contains the Config file, usually httpd.conf, which tells Apache how to respond to different kinds of requests.
htdocs: Contains the documents, images, data, and so forth that you want to serve up to your clients.
logs: Contains the log files that record what happened. You should consult .../logs/error_log whenever anything fails to work as expected.
cgi-bin: Contains any CGI scripts that are needed. If you don't use scripts, you don't need the directory.
In our standard installation, there will also be a file called go in the site directory, which contains a script for starting Apache.
Nothing happens until you start Apache. In this example, you do it from the command line. If your computer experience so far has been entirely with Windows or other Graphical User Interfaces (GUIs), you may find the command line rather stark and intimidating to begin with. However, it offers a great deal of flexibility and something which is often impossible through a GUI: the ability to write scripts (Unix) or batch files (Win32) to automate the executables you want to run and the inputs they need, as we shall see later.
If the conf subdirectory is not in the default location (and it usually isn't), you need a flag that tells Apache where it is.
httpd -d /usr/www/APACHE3/site.for_instance -f...
apache -d c:/usr/www/APACHE3/site.for_instance
Notice that the executable names are different under Win32 and Unix. The Apache Group decided to make this change, despite the difficulties it causes for documentation, because "httpd" is not a particularly sensible name for a specific web server and, indeed, is used by other web servers. However, it was felt that the name change would cause too many backward-compatibility issues on Unix, and so the new name is implemented only on Win32.
Also note that the Win32 version still uses forward slashes rather than backslashes. This is because Apache internally uses forward slashes on all platforms; therefore, you should never use a backslash in an Apache Config file, regardless of the operating system.
Once you start the executable, Apache runs silently in the background, waiting for a client's request to arrive on a port to which it is listening. When a request arrives, Apache either does its thing or fouls up and makes a note in the log file.
What we call "a site" here may appear to the outside world as hundreds of sites, because the Config file can invoke many virtual hosts.
When you are tired of the whole Web business, you kill Apache (see Section 2.3, later in this chapter), and the computer reverts to being a doorstop.
Various issues arise in the course of implementing this simple scheme, and the rest of this book is an attempt to deal with some of them. As we pointed out in the preface, running a web site can involve many questions far outside the scope of this book. All we deal with here is how to make Apache do what you want. We often have to leave the questions of what you want to do and why you might want to do it to a higher tribunal.
httpd (or apache) takes the following flags. (This is information you can evoke by running httpd -h):
Usage: httpd.20 [-D name] [-d directory] [-f file]
                [-C "directive"] [-c "directive"]
                [-v] [-V] [-h] [-l] [-L] [-t] [-T]
Options:
  -D name           : define a name for use in <IfDefine name> directives
  -d directory      : specify an alternate initial ServerRoot
  -f file           : specify an alternate ServerConfigFile
  -C "directive"    : process directive before reading config files
  -c "directive"    : process directive after reading config files
  -v                : show version number
  -V                : show compile settings
  -h                : list available command line options (this page)
  -l                : list compiled in modules
  -L                : list available configuration directives
  -t -D DUMP_VHOSTS : show parsed settings (currently only vhost settings)
  -t                : run syntax check for config files (with docroot check)
  -T                : run syntax check for config files (without docroot check)
-i : Installs Apache as an NT service.
-u : Uninstalls Apache as an NT service.
-s : Under NT, prevents Apache registering itself as an NT service. If you are running under Win95, this flag does not seem essential, but it would be advisable to include it anyway. This flag should be used when starting Apache from the command line, but it is easy to forget because nothing goes wrong if you leave it out. The main advantage is a faster startup (omitting it causes a 30-second delay).
-k shutdown|restart : Run from another console window, apache -k shutdown stops Apache gracefully, and apache -k restart stops it and restarts it gracefully.
The Apache Group seems to put in extra flags quite often, so it is worth experimenting with apache -? (or httpd -?) to see what you get.
You can't do much with Apache without a web site to play with. To embody our first shaky steps, we created site.toddle as a subdirectory, /usr/www/APACHE3/site.toddle, which you will find on the code download. Since you may want to keep your demonstration sites somewhere else, we normally refer to this path as ... /. So we will talk about ... /site.toddle. (Windows users, please read this as ...\site.toddle).
In ... /site.toddle, we created the three subdirectories that Apache expects: conf, logs, and htdocs. The README file in Apache's root directory states:
The next step is to edit the configuration files for the server. In the subdirectory called conf you should find distribution versions of the three configuration files: srm.conf-dist, access.conf-dist, and httpd.conf-dist.
As a legacy from the NCSA server, Apache will accept these three Config files. But we strongly advise you to put everything you need in httpd.conf and to delete the other two. It is much easier to manage the Config file if there is only one of them. From Apache v1.3.4-dev on, this has become Group doctrine. In earlier versions of Apache, it was necessary to disable these files explicitly once they were deleted, but in v1.3 it is enough that they do not exist.
The README file continues with advice about editing these files, which we will disregard. In fact, we don't have to set about this job yet; we will learn more later. A simple expedient for now is to run Apache with no configuration and to let it prompt us for what it needs.
The Configuration File
Before we start running Apache with no configuration, we would like to say a few words about the philosophy of the Configuration File. Apache comes with a huge file that, as we observe elsewhere, tries to tell you every possible thing the user might need to know about Apache. If you are new to the software, a vast amount of this will be gibberish to you. However, many Apache users modify this file to adapt it to their needs.
We feel that this is a VERY BAD IDEA INDEED. The file is so complicated to start with that it is very hard to see what to do. It is all too easy to make amendments and then to forget what you have done. The resulting mess then stays around, perhaps for years, being teamed with possibly incompatible Apache updates, until it finally stops working altogether. It is then very difficult to disentangle your input from the absolute original (which you probably have not kept and is now unobtainable).
It is much better to start with a completely minimal file and add to it only what is absolutely necessary.
The set-up process for Unix and Windows systems is quite different, so they are described in two separate sections as follows. If you're using Unix, read on; if not, skip to Section 2.4 later in this chapter.
We can point httpd at our site with the -d flag (notice the full pathname to the site.toddle directory, which will probably be different on your machine):
% httpd -d /usr/www/APACHE3/site.toddle
Since you will be typing this a lot, it's sensible to copy it into a script called go. This can go in /usr/local/bin or in each local site. We have done the latter since it is convenient to change it slightly from time to time. Create it by typing:
% cat > /usr/local/bin/go
test -d logs || mkdir logs
httpd -f `pwd`/conf/httpd$1.conf -d `pwd`
^d
^d is shorthand for Ctrl-D, which ends the input and gets your prompt back. This go will work on every site. It creates a logs directory if one does not exist, and it explicitly specifies paths for the ServerRoot directory (-d) and the Config file (-f). The command `pwd` finds the current directory with the Unix command pwd. The back-ticks are essential: they substitute pwd's value into the script. In other words, we will run Apache with whatever configuration is in our current directory. To accommodate sites where we have more than one Config file, we have used ...httpd$1... where you might expect to see ...httpd... The symbol $1 copies the first argument (if any) given to the command go. Thus ./go 2 will run the Config file called httpd2.conf, and ./go by itself will run httpd.conf.
Remember that you have to be in the site directory. If you try to run this script from somewhere else, pwd's return will be nonsense, and Apache will complain that it 'could not open document config file ...'.
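The way go chooses a Config file can be tried out in isolation. The following is only a sketch: the function name config_for is ours, invented for illustration, and it merely prints the path that go would hand to httpd -f without starting Apache.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apache or of the go script itself):
# print the Config file path that go would pass to httpd -f.
#   $1 = site directory (go uses the current directory for this)
#   $2 = optional suffix, the same argument you would give to go
config_for() {
    site_dir=$1
    suffix=$2            # empty when go is run with no argument
    printf '%s/conf/httpd%s.conf\n' "$site_dir" "$suffix"
}

config_for /usr/www/APACHE3/site.toddle      # the path used by ./go
config_for /usr/www/APACHE3/site.toddle 2    # the path used by ./go 2
```

Running it from the wrong directory simply yields a path that does not exist, which is the situation behind the "could not open document config file" complaint.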
Make go runnable, and run it by typing the following (note that you have to be in the directory .../site.toddle when you run go):
% chmod +x go
% go
If you get the error message:
go: command not found
you need to type:
% ./go
This launches Apache in the background. Check that it's running by typing something like this (arguments to ps vary from Unix to Unix):
% ps -aux
This Unix utility lists all the processes running, among which you should find several httpds.
Sooner or later, you have finished testing and want to stop Apache. To do this, you have to get the process identity (PID) of the program httpd using ps -aux:
USER     PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
root     701  0.0  0.8   396  240  v0  R+    2:49PM   0:00.00 ps -aux
root       1  0.0  0.9   420  260  ??  Is    8:13AM   0:00.02 /sbin/init --
root       2  0.0  0.0     0    0  ??  DL    8:13AM   0:00.04 (pagedaemon)
root       3  0.0  0.0     0    0  ??  DL    8:13AM   0:00.00 (vmdaemon)
root       4  0.0  0.0     0    0  ??  DL    8:13AM   0:02.24 (syncer)
root      35  0.0  0.3   204   84  ??  Is    8:13AM   0:00.00 adjkerntz -i
root      98  0.0  1.8   820  524  ??  Is    7:13AM   0:00.43 syslogd
daemon   107  0.0  1.3   820  384  ??  Is    7:13AM   0:00.00 /usr/sbin/portma
root     139  0.0  2.1   888  604  ??  Is    7:13AM   0:00.07 inetd
root     142  0.0  2.0   980  592  ??  Ss    7:13AM   0:00.27 cron
root     146  0.0  3.2  1304  936  ??  Is    7:13AM   0:00.25 sendmail: accept
root     209  0.0  1.0   500  296 con- I     7:13AM   0:00.02 /bin/sh /usr/loc
root     238  0.0  5.8 10996 1676 con- I     7:13AM   0:00.09 /usr/local/libex
root     239  0.0  1.1   460  316  v0  Is    7:13AM   0:00.09 -csh (csh)
root     240  0.0  1.2   460  336  v1  Is    7:13AM   0:00.07 -csh (csh)
root     241  0.0  1.2   460  336  v2  Is    7:13AM   0:00.07 -csh (csh)
root     251  0.0  1.7  1052  484  v0  S     7:14AM   0:00.32 bash
root     576  0.0  1.8  1048  508  v1  I     2:18PM   0:00.07 bash
root     618  0.0  1.7  1040  500  v2  I     2:22PM   0:00.04 bash
root     627  0.0  2.2   992  632  v2  I+    2:22PM   0:00.02 mince demo_test
root     630  0.0  2.2   992  636  v1  I+    2:23PM   0:00.06 mince home
root     694  0.0  6.7  2548 1968  ??  Ss    2:47PM   0:00.03 httpd -d /u
webuser  695  0.0  7.0  2548 2044  ??  I     2:47PM   0:00.00 httpd -d /u
webuser  696  0.0  7.0  2548 2044  ??  I     2:47PM   0:00.00 httpd -d /u
webuser  697  0.0  7.0  2548 2044  ??  I     2:47PM   0:00.00 httpd -d /u
webuser  698  0.0  7.0  2548 2044  ??  I     2:47PM   0:00.00 httpd -d /u
webuser  699  0.0  7.0  2548 2044  ??  I     2:47PM   0:00.00 httpd -d /u
To kill Apache, you need to find the PID of the main copy of httpd and then do kill <PID>; the child processes will die with it. In the previous example, the process to kill is 694, the copy of httpd that belongs to root. The command is this:
% kill 694
If ps -aux produces more printout than will fit on a screen, you can tame it with ps -aux | more; hit Return to see another line or Space to see another screen. It is important to make sure that the Apache process is properly killed: you can quite easily kill a child process by mistake, then start a new copy of the server with its children and a different Config file or Perl scripts, and so get yourself into a royal muddle.
To get just the lines from ps that you want, you can use:
ps awlx | grep httpd
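Picking the parent out of the listing can also be left to awk. The sketch below assumes the BSD-style ps -aux column layout shown earlier (user in column 1, PID in column 2, command starting at column 11); other systems order their columns differently, so treat the field numbers as assumptions. The function name root_httpd_pid is ours.

```shell
#!/bin/sh
# Hypothetical helper: read ps -aux style output on standard input and print
# the PID of the first httpd process that belongs to root (the parent copy).
# Assumes USER is field 1, PID is field 2, and the command starts at field 11.
root_httpd_pid() {
    awk '$1 == "root" && ($11 == "httpd" || $11 ~ /\/httpd$/) { print $2 }' | head -n 1
}

# Two sample lines in the format of the listing above:
printf '%s\n' \
  'root    694 0.0 6.7 2548 1968 ?? Ss 2:47PM 0:00.03 httpd -d /u' \
  'webuser 695 0.0 7.0 2548 2044 ?? I  2:47PM 0:00.00 httpd -d /u' |
  root_httpd_pid
```

On a live system you would feed it the real listing, for example ps -aux | root_httpd_pid, and check the result by eye before passing it to kill.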
Alternatively and better, since it is less prone to finger trouble, Apache writes its PID in the file ... /logs/httpd.pid (by default; see the PidFile directive), and you can write yourself a little script, as follows:
kill `cat /usr/www/APACHE3/site.toddle/logs/httpd.pid`
You may prefer to put more generalized versions of these scripts somewhere on your path. stop looks like this:
path=`pwd`
kill `cat $path/logs/httpd.pid`
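A slightly more defensive variant checks that the PID file exists before trying to kill anything. This is a sketch under our own assumptions: the function name stop_site is invented, and it expects the site directory as its argument rather than relying on the current directory.

```shell
#!/bin/sh
# Hypothetical helper: stop the Apache whose PID is recorded under a site
# directory. $1 is the site directory, e.g. /usr/www/APACHE3/site.toddle.
stop_site() {
    pidfile=$1/logs/httpd.pid
    if [ ! -f "$pidfile" ]; then
        # No PID file usually means Apache is not running from this site.
        echo "no PID file at $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    kill "$pid"          # plain kill, as in the scripts above
}
```

You would call it as stop_site /usr/www/APACHE3/site.toddle, or as stop_site `pwd` from inside the site directory.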
Or, if you don't plan to mess with many different configurations, use .../src/support/apachectl to start and stop Apache in the default directory. You might want to copy it into /usr/local/bin to get it onto the path, or add $apacheinstalldir/bin to your path. It uses the following flags:
usage: ./apachectl (start|stop|restart|fullstatus|status|graceful|configtest|help)
restart: Restart httpd if running by sending a SIGHUP or start if not running.
fullstatus: Dump a full status screen; requires lynx and mod_status enabled.
status: Dump a short status screen; requires lynx and mod_status enabled.
graceful: Do a graceful restart by sending a SIGUSR1 or start if not running.
configtest: Do a configuration syntax test.
When we typed ./go, nothing appeared to happen, but when we looked in the logs subdirectory, we found a file called error_log with the entry:
[<date>]:'mod_unique_id: unable to get hostbyname ("myname.my.domain")
In our case, this problem was due to the odd way we were running Apache, and it will only affect you if you are running on a host with no DNS or on an operating system that has difficulty determining the local hostname. The solution was to edit the file /etc/hosts and add the line:
10.0.0.2 myname.my.domain myname
where 10.0.0.2 is the IP number we were using for testing.
However, our troubles were not yet over. When we reran httpd, we received the following error message:
[<date>]--couldn't determine user name from uid
This means more than might at first appear. We had logged in as root. Because of the security worries of letting outsiders log in with superuser powers, Apache, having been started with root permissions so that it can bind to port 80, has attempted to change its user ID to -1. On many Unix systems, this ID corresponds to the user nobody: a supposedly harmless user. However, it seems that FreeBSD does not understand this notion, hence the error message. In any case, it really isn't a great idea to allow Apache to run as nobody (or any other shared user). If you are running several different services (ftp, mail, etc.) on the same machine under the same user, you run the risk of an attacker exploiting the fact that they share it.
The remedy is to create a new user, called webuser, belonging to webgroup. The names are unimportant. The main thing is that this user should be in a group of its own and should not actually be used by anyone for anything else. On most Unix systems, create the group first by running adduser -group webgroup then the user by running adduser. You will be asked for passwords for both. If the system insists on a password, use some obscure non-English string like cQuycn75Vg. Ideally, you should make sure that the newly created user cannot actually log in; how this is achieved varies according to operating system: you may have to replace the encrypted password in /etc/passwd, or remove the home directory, or perhaps something else. Having told the operating system about this user, you now have to tell Apache. Edit the file httpd.conf to include the following lines:
User webuser
Group webgroup
The following are the interesting directives.
The User directive sets the user ID under which the server will run when answering requests.
User unix-userid
Default: User #-1
Server config, virtual host
In order to use this directive, the standalone server must be run initially as root. unix-userid is one of the following:
A username: Refers to the given user by name
# followed by a user number: Refers to a user by his number
The user should have no privileges that allow access to files not intended to be visible to the outside world; similarly, the user should not be able to execute code that is not meant for httpd requests. However, the user must have access to certain things: the files it serves, for example, or mod_proxy's cache, when enabled (see the CacheRoot directive in Chapter 9).
The Group directive sets the group under which the server will answer requests.
Group unix-group
Default: Group #-1
Server config, virtual host
To use this directive, the standalone server must be run initially as root. unix-group is one of the following:
A group name: Refers to the given group by name
# followed by a group number: Refers to a group by its number
It is recommended that you set up a new group specifically for running the server. Some administrators use group nobody, but this is not always possible or desirable, as noted earlier.
Now, when you run httpd and look for the PID, you will find that one copy belongs to root, and several others belong to webuser. Kill the root copy and the others will vanish.
We found that when we built Apache "out of the box" using a GNU layout, some file defaults were not set up properly. If when you run ./go you get the rather odd error message on the screen:
fopen: No such file or directory
httpd: could not open error log file <path to site.toddle>/site.toddle/var/httpd/log/error_log
you need to add the line:
ErrorLog logs/error_log
to ...conf/httpd.conf. If, having done that, Apache fails to start and you get a message in .../logs/error_log:
.... No such file or directory.: could not open mime types log file <path to site.toddle>/site.toddle/etc/httpd/mime.types
you need to add the line:
TypesConfig conf/mime.types
to ...conf/httpd.conf. And if, having done that, Apache fails to start and you get a message in .../logs/error_log:
fopen: no such file or directory
httpd: could not log pid to file <path to site.toddle>/site.toddle/var/httpd/run/httpd.pid
you need to add the line:
PIDFile logs/httpd.pid
When you run Apache now, you may get the following error message:
httpd: cannot determine local hostname
Use ServerName to set it manually.
What Apache means is that you should put this line in the httpd.conf file:
ServerName <yourmachinename>
Finally, before you can expect any action, you need to set up some documents to serve. Apache's default document directory is ... /httpd/htdocs, which you don't want to use because you are at /usr/www/APACHE3/site.toddle, so you have to set it explicitly. Create ... /site.toddle/htdocs, and then in it create a file called 1.txt containing the immortal words "hullo world." Then add this line to httpd.conf:
DocumentRoot /usr/www/APACHE3/site.toddle/htdocs
The complete Config file, .../site.toddle/conf/httpd.conf, now looks like this:
User webuser
Group webgroup
ServerName my586
DocumentRoot /usr/www/APACHE3/APACHE3/site.toddle/htdocs/
#fix 'Out of the Box' default problems--remove leading #s if necessary
#ServerRoot /usr/www/APACHE3/APACHE3/site.toddle
#ErrorLog logs/error_log
#PIDFile logs/httpd.pid
#TypesConfig conf/mime.types
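If you would rather script the layout than create it by hand, something along the following lines should do. It is a sketch, not part of the Apache distribution: the function name make_site is ours, the directory is whatever you pass in, and the User, Group, and ServerName values are simply the ones used in this chapter and will need changing for your machine.

```shell
#!/bin/sh
# Hypothetical helper: lay out a minimal site like site.toddle.
# $1 is the directory to create the site in (our assumption: you choose it).
make_site() {
    site=$1
    mkdir -p "$site/conf" "$site/logs" "$site/htdocs" || return 1
    # One short Config file, holding only what Apache has asked for so far.
    cat > "$site/conf/httpd.conf" <<EOF
User webuser
Group webgroup
ServerName my586
DocumentRoot $site/htdocs/
EOF
    # A single document to serve.
    echo "hullo world" > "$site/htdocs/1.txt"
}
```

For example, make_site /tmp/site.demo builds the three subdirectories, the Config file, and 1.txt under /tmp/site.demo, ready for the go script.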
When you fire up httpd, you should have a working web server. To prove it, start up a browser to access your new server, and point it at http://<yourmachinename>/.
As we know, http means use the HTTP protocol to get documents, and / on the end means go to the DocumentRoot directory you set in httpd.conf.
Lynx is the text browser that comes with FreeBSD and other flavors of Unix; if it is available, type:
% lynx http://<yourmachinename>/
You see:
INDEX OF /
* Parent Directory
* 1.txt
If you move to 1.txt with the down arrow, you see:
hullo world
If you don't have Lynx (or Netscape, or some other web browser) on your server, you can use telnet :
% telnet <yourmachinename> 80
You should see something like:
Trying 192.168.123.2
Connected to my586.my.domain
Escape character is '^]'
Then type:
GET / HTTP/1.0 <CR><CR>
You should see:
HTTP/1.0 200 OK
Sat, 24 Aug 1996 23:49:02 GMT
Server: Apache/1.3
Connection: close
Content-Type: text/html

<HEAD><TITLE>Index of /</TITLE></HEAD><BODY>
<H1>Index of </H1>
<UL><LI> <A HREF="/"> Parent Directory</A>
<LI> <A HREF="1.txt"> 1.txt</A>
</UL></BODY>
Connection closed by foreign host.
This is a rare opportunity to see a complete HTTP message. The first lines are headers that are normally hidden by your browser. The stuff between the < and > is HTML, written by Apache, which, if viewed through a browser, produces the formatted message shown by Lynx earlier, and by Netscape or Microsoft Internet Explorer in the next chapter.
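The split between headers and HTML can be made visible with a couple of small filters. These are illustrative sketches only: http_status and http_headers are names we made up, and they assume a response whose first line has the form "HTTP/1.0 200 OK" and whose headers end at the first empty line.

```shell
#!/bin/sh
# Hypothetical helpers for looking at a raw HTTP response on standard input.

# Print the numeric status code from the first line (e.g. "HTTP/1.0 200 OK").
http_status() {
    awk 'NR == 1 { print $2 }'
}

# Print only the header section: everything before the first empty line.
# Carriage returns at line ends are stripped first.
http_headers() {
    awk '{ sub(/\r$/, "") } $0 == "" { body = 1 } !body { print }'
}

# A shortened sample response, with the line endings HTTP uses on the wire:
printf 'HTTP/1.0 200 OK\r\nServer: Apache/1.3\r\n\r\n<HEAD>...\n' | http_status
```

Saving the telnet session to a file and running it through http_headers shows just the part that a browser normally hides.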
To get a display of all the processes running, run:
% ps -aux
Among a lot of Unix stuff, you will see one copy of httpd belonging to root and a number that belong to webuser. They are similar copies, waiting to deal with incoming queries.
The root copy is still attached to port 80 (thus its children will be as well), but it is not listening. This is because it is root and has too many powers for this to be safe. It is necessary for this "master" copy to remain running as root because under the (slightly flawed) Unix security doctrine, only root can open ports below 1024. Its job is to monitor the scoreboard where the other copies post their status: busy or waiting. If there are too few waiting (default 5, set by the MinSpareServers directive in httpd.conf), the root copy starts new ones; if there are too many waiting (default 10, set by the MaxSpareServers directive), it kills some off. If you note the PID (shown by ps -ax, or ps -aux for a fuller listing; also to be found in ... /logs/httpd.pid) of the root copy and kill it with:
% kill PID
you will find that the other copies disappear as well.
It is better, however, to use the stop script described in Section 2.3 earlier in this chapter, since it leaves less to chance and is easier to do.
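The root copy's bookkeeping over waiting children, described above, boils down to a comparison against two limits. The following is a deliberately simplified illustration of that rule, not Apache's actual code: the function name spare_action is ours, and the defaults are the MinSpareServers and MaxSpareServers values quoted in the text.

```shell
#!/bin/sh
# Simplified illustration of the spare-server rule (not Apache source code).
#   $1 = number of child copies currently waiting
#   $2 = MinSpareServers (defaults to 5, as quoted in the text)
#   $3 = MaxSpareServers (defaults to 10, as quoted in the text)
spare_action() {
    idle=$1
    min=${2:-5}
    max=${3:-10}
    if [ "$idle" -lt "$min" ]; then
        echo start       # too few waiting: the root copy starts new ones
    elif [ "$idle" -gt "$max" ]; then
        echo kill        # too many waiting: the root copy kills some off
    else
        echo hold        # within limits: nothing to do
    fi
}

spare_action 3     # fewer than 5 waiting
spare_action 12    # more than 10 waiting
```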
If Apache is to work properly, it's important to correctly set the file-access permissions. In Unix systems, there are three kinds of permissions: read, write, and execute. They attach to each object in three levels: user, group, and other or "rest of the world." If you have installed the demonstration sites, go to ... /site.cgi/htdocs, and type:
% ls -l
-rw-rw-r-- 5 root bin 1575 Aug 15 07:45 form_summer.html
The first - indicates that this is a regular file. It is followed by three permission fields, each of three characters. They mean, in this case:
User (root): Read yes, write yes, execute no
Group (bin): Read yes, write yes, execute no
Other: Read yes, write no, execute no
When the permissions apply to a directory, the x execute permission means scan: the ability to see the contents and move down a level.
The permission that interests us is other, because the copy of Apache that tries to access this file belongs to user webuser and group webgroup. These were set up to have no affinities with root and bin, so that copy can gain access only under the other permissions, and the only one set is "read." Consequently, a Bad Guy who crawls under the cloak of Apache cannot alter or delete our precious form_summer.html; he can only read it.
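To check which three characters apply to whom, the mode string printed by ls -l can be sliced by position. A minimal sketch follows; perm_field is a name we invented, and it assumes the usual ten-character layout of one type character followed by three fields of three.

```shell
#!/bin/sh
# Hypothetical helper: print the three permission characters that apply to
# user, group, or other, given a mode string as shown by ls -l.
#   $1 = mode string, e.g. -rw-rw-r--
#   $2 = user, group, or other
perm_field() {
    mode=$1
    who=$2
    case $who in
        user)  printf '%s\n' "$mode" | cut -c2-4 ;;
        group) printf '%s\n' "$mode" | cut -c5-7 ;;
        other) printf '%s\n' "$mode" | cut -c8-10 ;;
        *)     echo "usage: perm_field MODE user|group|other" >&2
               return 1 ;;
    esac
}

# The field that governs what webuser may do to form_summer.html:
perm_field -rw-rw-r-- other
```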
We can now write a coherent doctrine on permissions. We have set things up so that everything in our web site, except the data vulnerable to attack, has owner root and group wheel. We did this partly because it is a valid approach, but also because it is the only portable one. The files on our CD-ROM with owner root and group wheel have owner and group numbers 0 that translate into similar superuser access on every machine.
Of course, this only makes sense if the webmaster has root login permission, which we had. You may have to adapt the whole scheme if you do not have root login, and you should perhaps consult your site administrator.
In general, on a web site everything should be owned by a user who is not webuser and a group that is not webgroup (assuming you use these terms for Apache configurations).
There are four kinds of files to which we want to give webuser access: directories, data, programs, and shell scripts. webuser must have scan permissions on all the directories, starting at root down to wherever the accessible files are. If Apache is to access a directory, that directory and all in the path must have x permission set for other. You do this by entering:
% chmod o+x <each-directory-in-the-path>
To produce a directory listing (if this is required by, say, an index), the final directory must have read permission for other. You do this by typing:
% chmod o+r <final-directory>
It probably should not have write permission set for other:
% chmod o-w <final-directory>
To serve a file as data (and this includes files like .htaccess; see Chapter 3), the file must have read permission for other:
% chmod o+r <file>
And, as before, deny write permission:
% chmod o-w <file>
To run a program, the file must have execute permission set for other:
% chmod o+x <program>
To execute a shell script, the file must have read and execute permission set for other:
% chmod o+rx <script>
For complete safety:
% chmod a=rx <script>
If the user is to edit the script, but it is to be safe otherwise:
% chmod u=rwx,og=rx <script>
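The rule that every directory on the way down needs x permission for other is easy to get wrong by missing one level. The sketch below only prints the chmod commands for each level of an absolute path so that you can review them before running any. The function name path_dirs is ours, and it does not print a command for / itself.

```shell
#!/bin/sh
# Hypothetical helper: for an absolute path, print one "chmod o+x" command per
# directory level, from the top down. Nothing is changed on disk.
path_dirs() {
    printf '%s\n' "$1" | awk -F/ '{
        prefix = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue      # skip the empty field before the leading slash
            prefix = prefix "/" $i
            print "chmod o+x " prefix
        }
    }'
}

path_dirs /usr/www/APACHE3/site.toddle/htdocs
```

Printing rather than executing is a deliberate choice here: changing permissions on system directories such as /usr is something you will want to look at first.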
Emboldened by the success of site.toddle, we can now set about a more realistic setup, without as yet venturing out onto the unknown waters of the Web. We need to get two things running: Apache under some sort of Unix and a GUI browser. There are two main ways this can be achieved:
Run Apache and a browser (such as Netscape or Lynx) on the same machine. The "network" is then provided by Unix.
Run Apache on a Unix box and a browser on a Windows 95/Windows NT/Mac OS machine, or vice versa, and link them with Ethernet (which is what we did for this book using FreeBSD).
We cannot hope to give detailed explanations for all possible variants of these situations. We expect that many of our readers will already be webmasters familiar with these issues, who will want to skip the following sidebar. Those who are new to the Web may find it useful to know what we did.
Our Experimental Micro Web
First, we had to install a network card on the FreeBSD machine. As it boots up, it tests all its components and prints a list on the console, which includes the card and the name of the appropriate driver. We used a 3Com card, and the following entries appeared:
...
1 3C5x9 board(s) on ISA found at 0x300
ep0 at 0x300-0x30f irq 10 on isa
ep0: aui/bnc/utp[*BNC*] address 00:a0:24:4b:48:23 irq 10
...
This indicated pretty clearly that the driver was ep0 and that it had installed properly. If you miss this at bootup, FreeBSD lets you hit the Scroll Lock key and page up until you see it; then hit Scroll Lock again to return to normal operation.
Once a card was working, we needed to configure its driver, ep0. We did this with the following commands:
ifconfig ep0 192.168.123.2
ifconfig ep0 192.168.123.3 alias netmask 0xFFFFFFFF
ifconfig ep0 192.168.124.1 alias
The alias command makes ifconfig bind an additional IP address to the same device. The netmask command is needed to stop FreeBSD from printing an error message (for more on netmasks, see Craig Hunt's TCP/IP Network Administration [O'Reilly, 2002]).
Note that the network numbers used here are suited to our particular network configuration. You'll need to talk to your network administrator to determine suitable numbers for your configuration. Each time we start up the FreeBSD machine to play with Apache, we have to run these commands. The usual way to do this is to add them to /etc/rc.local (or the equivalent location; it varies from machine to machine, but whatever it is called, it is run whenever the system boots).
If you are following the FreeBSD installation or something like it, you also need to install IP addresses and their hostnames (if we were to be pedantic, we would call them fully qualified domain names, or FQDN) in the file /etc/hosts :
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-not-vh.butterthlies.com
192.168.124.1 www.faraway.com
Note that www.butterthlies.com and sales.butterthlies.com both have the same IP number. This is so we can demonstrate the new NameVirtualHost directive in the next chapter. We will need sales-not-vh.butterthlies.com in site.twocopy. Note also that this method of setting up hostnames is normally only appropriate when DNS is not available. If you use this method, you'll have to do it on every machine that needs to know the names.
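To see at a glance which names share an address in a hosts file, a short awk filter is enough. This is a sketch with an invented function name, names_for_ip; it reads hosts-format text on standard input and assumes the address is the first field on each line.

```shell
#!/bin/sh
# Hypothetical helper: print every hostname listed for one IP address in
# hosts-format text read from standard input.
#   $1 = the IP address to look for
names_for_ip() {
    awk -v ip="$1" '$1 == ip {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break      # ignore a trailing comment
            print $i
        }
    }'
}

# The entries from this chapter; on a real machine use: names_for_ip ADDR < /etc/hosts
names_for_ip 192.168.123.2 <<EOF
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-not-vh.butterthlies.com
192.168.124.1 www.faraway.com
EOF
```

With the sample entries, the two names that share 192.168.123.2 are the ones printed.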
There is no point trying to run Apache unless TCP/IP is set up and running on your machine. A quick test is to ping some IP, and if you can't think of a real one, ping yourself:
>ping 127.0.0.1
If TCP/IP is working, you should see some confirming message, like this:
Pinging 127.0.0.1 with 32 bytes of data:
Reply from 127.0.0.1: bytes=32 time<10ms TTL=32
....
If you don't see something along these lines, defer further operations until TCP/IP is working.
It is important to remember that internally, Windows Apache is essentially the same as the Unix version and that it uses Unix-style forward slashes (/) rather than MS-DOS- and Windows-style backslashes (\) in its file and directory names, as specified in various files.
There are two ways of running Apache under Win32. In addition to the command-line approach, you can run Apache as a "service" (available on Windows NT/2000, or a pseudoservice on Windows 95, 98, or Me). This is the best option if you want Apache to start automatically when your machine boots and to keep Apache running when you log off.
To run Apache from a console window, select the Apache server option from the Start menu.
Alternatively (and under Win95/98, this is all you can do), click on the MS-DOS prompt to get a DOS session window. Go to the /Program Files/Apache directory with this:
>cd "\Program Files\apache"
The Apache executable, apache.exe, is sitting here. We can start it running, to see what happens, with this:
>apache -s
You might want to automate your Apache startup by putting the necessary line into a file called go.bat. You then only need to type:
>go
Since this is the same as for the Unix version, we will simply say "type go" throughout the book when Apache is to be started, and thus save lengthy explanations.
When we ran Apache, we received the following lines:
Apache/<version number>
Syntax error on line 44 of /apache/conf/httpd.conf
ServerRoot must be a valid directory
To deal with the first complaint, we looked at the file \Program Files\apache\conf\httpd.conf. This turned out to be a formidable document that, in effect, compresses all the information we try to convey in the rest of this book into a few pages. We could edit it down to something more lucid, but a sounder and more educational approach is to start from nothing and see what Apache asks for. The trouble with simply editing the configuration files as they are distributed is that the process obscures a lot of default settings. If and when someone new has to wrestle with it, he may make fearful blunders because it isn't clear what has been changed from the defaults. We suggest that you build your Config files from the ground up. To prevent this one from getting confused with them, rename it if you want to look at it:
>ren httpd.conf *.cnk
Otherwise, delete it, and delete srm.conf and access.conf :
>del srm.conf
>del access.conf
When you run Apache now, you see:
Apache/<version number>
fopen: No such file or directory
httpd: could not open document config file apache/conf/httpd.conf
And we can hardly blame it. Open edit:
and insert the line:
# new config file
The # makes this a comment without effect, but it gives the editor something to save. Run Apache again. We now see something sensible:
...
httpd: cannot determine local host name
use ServerName to set it manually
What Apache means is that you should put a line in the httpd.conf file:
ServerName <yourmachinename>
Now when you run Apache, you see:
>apache -s
Apache/<version number>
_
The _ here is meant to represent a blinking cursor, showing that Apache is happily running.
You will notice that throughout this book, the Config files always have the following lines:
...
User webuser
Group webgroup
...
These are necessary for Unix security and, happily, are ignored by the Win32 version of Apache, so we have avoided tedious explanations by leaving them in throughout. Win32 users can include them or not as they please.
You can now get out of the MS-DOS window and go back to the desktop, fire up your favorite browser, and access http://yourmachinename/. You should see a cheerful screen entitled "It Worked!," which is actually \apache\htdocs\index.html.
When you have had enough, hit ^C in the Apache window.
Alternatively, under Windows 95 and from Apache Version 1.3.3 on, you can open another DOS session window and type:
apache -k shutdown
This does a graceful shutdown, in which Apache allows any transactions currently in process to continue to completion before it exits. In addition, using:
apache -k restart
performs a graceful restart, in which Apache rereads the configuration files while allowing transactions in progress to complete.
To start Apache as a service, you first need to install it as a service. Multiple Apache services can be installed, each with a different name and configuration. To install the default Apache service named "Apache," run the "Install Apache as Service (NT only)" option from the Start menu. Once this is done, you can start the "Apache" service by opening the Services window (in the Control Panel), selecting Apache, then clicking on Start. Apache will now be running in the background. You can later stop Apache by clicking on Stop. As an alternative to using the Services window, you can start and stop the "Apache" service from the command line with the following:
NET START APACHE
NET STOP APACHE
See http://httpd.apache.org/docs-2.0/platform/windows.html#signalsrv for more information on installing and controlling Apache services.
Apache, unlike many other Windows NT/2000 services, logs any errors to its own error.log file in the logs folder within the Apache server root folder. You will not find Apache error details in the Windows NT Event Log.
Once Apache is running (either in a console window or as a service), it listens on port 80 (unless you changed the Listen directive in the configuration files). To connect to the server and access the default page, launch a browser and enter this URL: http://127.0.0.1
Once this is done, you can open the Services window in the Control Panel, select Apache, and click on Start. Apache then runs in the background until you click on Stop. Alternatively, you can open a console window and type:
>net start apache
To stop the Apache service, type:
>net stop apache
If you're running Apache as a service, you definitely will want to consider security issues. See Chapter 11 for more details.
Here we go over the directives again, giving formal definitions for reference.
ServerName gives the hostname of the server to use when creating redirection URLs, that is, if you use a <Location> directive or access a directory without a trailing /.
ServerName hostname
Server config, virtual host
It will also be useful when we consider Virtual Hosting (see Chapter 4).
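As a rough illustration of why the server needs to know its own name, the sketch below builds the kind of redirection URL a server might send when a directory is requested without a trailing /. It is not Apache's code: the function name trailing_slash_redirect, the host www.example.com, and the assumption of plain http on port 80 are all ours.

```python
def trailing_slash_redirect(server_name: str, path: str, port: int = 80) -> str:
    """Build a redirect URL for a directory requested without a trailing slash.

    server_name plays the role of the ServerName setting; the scheme and
    default port are assumptions made for this illustration.
    """
    # Only mention the port when it is not the default one.
    host = server_name if port == 80 else f"{server_name}:{port}"
    # Drop any trailing slashes, then add exactly one.
    return f"http://{host}{path.rstrip('/')}/"


if __name__ == "__main__":
    print(trailing_slash_redirect("www.example.com", "/docs"))
```

Without a ServerName, the host part of such a URL would have to be guessed, which is what the "cannot determine local host name" complaint is about.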
This directive sets the directory from which Apache will serve files.
DocumentRoot directory
Default: /usr/local/apache/htdocs
Server config, virtual host
Unless matched by a directive like Alias, the server appends the path from the requested URL to the document root to make the path to the document. For example, with DocumentRoot set to /usr/web, an access to http://www.my.host.com/index.html refers to /usr/web/index.html.
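The appending of the URL path to the document root can be sketched in a few lines of Python. This is only a model of the rule described above, not Apache's implementation; the function name map_url_to_file is hypothetical, and the sketch ignores Alias and every other directive that can change the mapping.

```python
import posixpath


def map_url_to_file(document_root: str, url_path: str) -> str:
    """Append the path part of a URL to the document root.

    The URL path is normalised first so that "." and ".." segments
    cannot climb above the document root.
    """
    # Force a single leading slash, then collapse "..", "." and "//".
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    # Strip a trailing slash from the root to avoid a doubled separator.
    return document_root.rstrip("/") + clean


if __name__ == "__main__":
    print(map_url_to_file("/usr/web", "/index.html"))
```

With a document root of /usr/web, the path /index.html maps to /usr/web/index.html, matching the example in the text.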
There appears to be a bug in the relevant Module, mod_dir, that causes problems when the directory specified in DocumentRoot has a trailing slash (e.g., DocumentRoot /usr/web/), so please avoid that. It is worth bearing in mind that the deeper DocumentRoot goes, the longer it takes Apache to check out the directories. For the sake of performance, adopt the British Army's universal motto: KISS (Keep It Simple, Stupid)!
ServerRoot specifies where the subdirectories conf and logs can be found.
ServerRoot directory
Default directory: /usr/local/etc/httpd
Server config
If you start Apache with the -f (file) option, you need to include the ServerRoot directive. On the other hand, if you use the -d (directory) option, as we do, this directive is not needed.
The ErrorLog directive sets the name of the file to which the server will log any errors it encounters.
ErrorLog filename|syslog[:facility]
Default: ErrorLog logs/error_log
Server config, virtual host
If the filename does not begin with a slash (/), it is assumed to be relative to the server root.
If the filename begins with a pipe (|), it is assumed to be a command to spawn as a process to handle the error log.
Apache 1.3 and above: using syslog instead of a filename enables logging via syslogd(8) if the system supports it. The default is to use syslog facility local7, but you can override this by using the syslog:facility syntax, where facility can be one of the names usually documented in syslog(1).
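The rules for interpreting the ErrorLog argument can be summarized in a short Python sketch. It models only the cases listed above (pipe, syslog with an optional facility, absolute path, path relative to the server root); the function name classify_error_log and its return format are our own invention, and the server root in the example is simply the default quoted for ServerRoot.

```python
import posixpath
from typing import Tuple


def classify_error_log(value: str, server_root: str) -> Tuple[str, str]:
    """Return (kind, target) for an ErrorLog argument.

    kind is "pipe", "syslog" or "file"; target is the command,
    the syslog facility, or the resolved file path respectively.
    """
    if value.startswith("|"):
        # Everything after the pipe is the command to spawn.
        return ("pipe", value[1:].strip())
    if value == "syslog" or value.startswith("syslog:"):
        # local7 is the default facility when none is given.
        facility = value.partition(":")[2] or "local7"
        return ("syslog", facility)
    if value.startswith("/"):
        # Absolute paths are used as they stand.
        return ("file", value)
    # Anything else is taken relative to the server root.
    return ("file", posixpath.join(server_root, value))


if __name__ == "__main__":
    print(classify_error_log("logs/error_log", "/usr/local/etc/httpd"))
```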
Your security could be compromised if the directory where log files are stored is writable by anyone other than the user who starts the server.
A useful piece of information about an executing process is its PID number. This is available under both Unix and Win32 in the PidFile, and this directive allows you to change its location.
PidFile file
Default file: logs/httpd.pid
Server config
By default, it is in ... /logs/httpd.pid. However, only Unix allows you to do anything easily with it; namely, to kill the process.
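On Unix, using the PID file usually amounts to reading the number and sending the process a signal. The following Python sketch shows the idea under the assumption that the file holds nothing but the PID as decimal text; the function names read_pid and stop_server are hypothetical, and the choice of SIGTERM is ours.

```python
import os
import signal


def read_pid(pid_file: str) -> int:
    """Read the process ID stored in a PID file."""
    with open(pid_file) as handle:
        # Assumes the file contains just the PID, possibly with a newline.
        return int(handle.read().strip())


def stop_server(pid_file: str) -> None:
    """Send SIGTERM to the process named in the PID file (Unix only)."""
    os.kill(read_pid(pid_file), signal.SIGTERM)
```

For a site rooted at /usr/www/APACHE3/site.for_instance, the file to pass would be the logs/httpd.pid beneath that directory, provided PidFile has not been changed.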
This directive sets the path and filename of the mime.types file if it isn't in the default location.
TypesConfig filename
Default: conf/mime.types
Server config
You may want to include material from elsewhere into the Config file. You either just paste it in, or you use the Include directive:
Include filename
Server config, virtual host, directory, .htaccess
Because it makes it hard to see what the Config file is actually doing, you probably will not want to use this directive until the file gets really complicated (see, for instance, Chapter 17, where the Config file also has to control the Tomcat Java module).
If you are using the DSO mechanism, you need quite a lot of stuff in your Config file.
In Apache v1.3 the order of these directives is important, so it is probably easiest to generate the list by doing an "out of the box" build using the flag --enable-shared=max. You will find /usr/etc/httpd/httpd.conf.default: copy the list from it into your own Config file, and edit it as you need.
LoadModule env_module libexec/mod_env.so
LoadModule config_log_module libexec/mod_log_config.so
LoadModule mime_module libexec/mod_mime.so
LoadModule negotiation_module libexec/mod_negotiation.so
LoadModule status_module libexec/mod_status.so
LoadModule includes_module libexec/mod_include.so
LoadModule autoindex_module libexec/mod_autoindex.so
LoadModule dir_module libexec/mod_dir.so
LoadModule cgi_module libexec/mod_cgi.so
LoadModule asis_module libexec/mod_asis.so
LoadModule imap_module libexec/mod_imap.so
LoadModule action_module libexec/mod_actions.so
LoadModule userdir_module libexec/mod_userdir.so
LoadModule alias_module libexec/mod_alias.so
LoadModule access_module libexec/mod_access.so
LoadModule auth_module libexec/mod_auth.so
LoadModule setenvif_module libexec/mod_setenvif.so
# Reconstruction of the complete module list from all available modules
# (static and shared ones) to achieve correct module execution order.
# [WHENEVER YOU CHANGE THE LOADMODULE SECTION ABOVE UPDATE THIS, TOO]
ClearModuleList
AddModule mod_env.c
AddModule mod_log_config.c
AddModule mod_mime.c
AddModule mod_negotiation.c
AddModule mod_status.c
AddModule mod_include.c
AddModule mod_autoindex.c
AddModule mod_dir.c
AddModule mod_cgi.c
AddModule mod_asis.c
AddModule mod_imap.c
AddModule mod_actions.c
AddModule mod_userdir.c
AddModule mod_alias.c
AddModule mod_access.c
AddModule mod_auth.c
AddModule mod_so.c
AddModule mod_setenvif.c
Notice that the list comes in three parts: LoadModules, then ClearModuleList, followed by AddModules to activate the ones you want. As we said earlier, it is all rather cumbersome and easy to get wrong. You might want to put the list in a separate file and then Include it (see the Include directive in this section). If you have left out a shared module that is required by a directive in your Config file, you will get a clear indication in an error message as Apache loads. For instance, if you use the directive ErrorLog without doing what is necessary for the module mod_log_config, this will trigger a runtime error message.
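Because the AddModule list has to be kept in step with the LoadModule list, it can help to derive one from the other. The Python sketch below does this under a loud assumption: that each shared object libexec/mod_xxx.so corresponds to a source file mod_xxx.c, as it does in the listing above. The function name add_module_lines is hypothetical, and statically compiled modules (mod_so.c in the listing) are not covered and must still be added by hand, in the right position.

```python
import posixpath
from typing import List


def add_module_lines(config_text: str) -> List[str]:
    """Derive AddModule lines from the LoadModule lines in a config.

    Assumes the naming pattern mod_xxx.so -> mod_xxx.c seen in the
    listing; the original order of the LoadModule lines is preserved.
    """
    result = []
    for raw in config_text.splitlines():
        parts = raw.split()
        # A LoadModule line has the form: LoadModule <identifier> <file>
        if len(parts) == 3 and parts[0] == "LoadModule":
            stem = posixpath.splitext(posixpath.basename(parts[2]))[0]
            result.append(f"AddModule {stem}.c")
    return result


if __name__ == "__main__":
    sample = (
        "LoadModule env_module libexec/mod_env.so\n"
        "LoadModule config_log_module libexec/mod_log_config.so\n"
    )
    print("\n".join(add_module_lines(sample)))
```

Comparing the generated lines with the AddModule section of your Config file is a quick way to spot a module that was loaded but never added.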
The LoadModule directive links in the object file or library filename and adds the module structure named module to the list of active modules.
LoadModule module filename
server config
mod_so
module is the name of the external variable of type module in the file and is listed as the Module Identifier in the module documentation. For example (Unix, and for Windows as of Apache 1.3.15):
LoadModule status_module modules/mod_status.so
For example (Windows prior to Apache 1.3.15, and some third party modules):
LoadModule foo_module modules/ApacheModuleFoo.dll
Note that all modules bundled with the Apache Win32 binary distribution were renamed as of Apache Version 1.3.15.
Win32 Apache modules are often distributed with the old style names, or even a name such as libfoo.dll. Whatever the name of the module, the LoadModule directive requires the exact filename.
The LoadFile directive links in the named object files or libraries when the server is started or restarted; this is used to load additional code that may be required for some modules to work.
LoadFile filename [filename] ...
server config
mod_so
filename is either an absolute path or relative to ServerRoot.
This directive clears the list of active modules.
ClearModuleList
server config
Abolished in Apache v2
It is assumed that the list will then be repopulated using the AddModule directive.
The server can have modules compiled in that are not actively in use. This directive can be used to enable the use of those modules.
AddModule module [module] ...
server config
mod_so
The server comes with a preloaded list of active modules; this list can be cleared with the ClearModuleList directive.
 On System V-based Unix systems (as opposed to Berkeley-based), the command ps -ef should have a similar effect.
 In fact, this problem was fixed for FreeBSD long ago, but you may still encounter it on other operating systems.
 Note that if you are on the same machine, you can use http://127.0.0.1/ or http://localhost/, but this can be confusing because virtual host resolution may cause the server to behave differently than if you had used the interface's "real" name.
 telnet is not really suitable as a web browser, though it can be a very useful debugging tool.