Our business has now expanded, and we have a team of salespeople. They need their own web site with different prices, gossip about competitors, conspiracies, plots, plans, and so on that is separate from the customers' web site we have been talking about. There are essentially two ways of doing this:
Run a single copy of Apache that maintains two or more web sites as virtual sites. This is the most common method.
Run two (or more) copies of Apache, each maintaining a single site. You may want to do this to optimize two versions of Apache in different ways: for instance, one serving images and the other running scripts.
On site.twocopy (see Section 4.3, later in this chapter) we run two different versions of Apache, each serving a different host. As we have said, you might want to do this to optimize the two versions in different ways. However, it is more common to run a number of virtual Apache servers that steer incoming requests on different URLs (usually with the same IP address) to different sets of documents. These might well be home pages for members of your organization or your clients.
In the first edition of this book, we showed how to do this for Apache 1.2 and HTTP 1.0. The result was rather clumsy, with a main host and a virtual host, but it coped with HTTP 1.0 clients. However, the setup can now be done much more neatly with the NameVirtualHost directive. The possible combinations of IP-based and name-based hosts can become quite complex. A full explanation with examples and the underlying theology can be found at http://www.apache.org/docs/vhosts, but several of the possible permutations are unlikely to be very useful in practice.
Name-based virtual hosting is by far the preferred method of managing virtual hosts. It takes advantage of the ability of HTTP 1.1-compliant browsers (or at least browsers that support the Host header, which is pretty much all of them at this point) to send the name of the site they want to access. At .../site.virtual/Name-based we have www.butterthlies.com and sales.butterthlies.com on 192.168.123.2. Of course, these sites must have their names registered in DNS (or, if you are dummying the setup as we did, included in /etc/hosts). The Config file is as follows:
User webuser
Group webgroup
NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerName www.butterthlies.com
ServerAdmin firstname.lastname@example.org
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
ServerName sales.butterthlies.com
ServerAdmin email@example.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/access_log
</VirtualHost>
The key directive is NameVirtualHost, which tells Apache that requests to that IP number will be subdivided by name. The ServerName directives do the subdividing: Apache compares the Host header of each request with the ServerName of each <VirtualHost> block on that address and serves the request from the block that matches. The <VirtualHost> sections are now identified by the name of the site we want them to serve. If the NameVirtualHost directive were left out, Apache would issue a helpful warning that www.butterthlies.com and sales.butterthlies.com were overlapping (i.e., rival interpretations of the same IP number) and that perhaps we needed a NameVirtualHost directive, which indeed we would.
The virtual sites can all share log files, as shown in the given Config file, or they can use separate ones.
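As a sketch of the second option, each <VirtualHost> block can simply name its own log files. The customers and sales subdirectories under logs are illustrative assumptions, not part of the sample sites, and they would have to exist before Apache starts:

```apache
NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
# Hypothetical per-site log directory
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/customers/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/customers/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
ServerName sales.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
# Hypothetical per-site log directory
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/sales/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/sales/access_log
</VirtualHost>
```

Separate files make it easier to see which site a problem belongs to, at the cost of more open file descriptors per server.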
NameVirtualHost allows you to specify the IP addresses of your name-based virtual hosts.
NameVirtualHost address[:port]
Server config
Optionally, you can add a port number. The IP address has to match with the IP address at the top of a <VirtualHost> block, which must include a ServerName directive followed by the registered name. The effect is that when Apache receives a request addressed to a named host, it scans the <VirtualHost> blocks having the same IP number that was declared with a NameVirtualHost directive to find one that includes the requested ServerName. Conversely, if you have not used NameVirtualHost, Apache looks for a <VirtualHost> block with the correct IP address and uses the ServerName in the reply. This prevents people from getting to hosts blocked by the firewall by using the IP of an open host and the name of a blocked one.
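A minimal sketch of the optional port form follows. It reuses the address and names from the example site; restricting the name-based hosts to port 80 is our assumption for illustration. Note that the address and port at the top of each <VirtualHost> block repeat the NameVirtualHost argument exactly:

```apache
# Name-based hosting on port 80 of one address only
NameVirtualHost 192.168.123.2:80

<VirtualHost 192.168.123.2:80>
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
</VirtualHost>

<VirtualHost 192.168.123.2:80>
ServerName sales.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
</VirtualHost>
```

A request that arrives on this address with a Host header matching neither ServerName is served by the first block listed, so it is worth putting the least sensitive site first.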
In the authors' experience, most of the Web still uses IP-based hosting, because although almost all clients use browsers that support HTTP 1.1, there is still a tiny portion that doesn't, and who wants to lose business unnecessarily? However, the Internet is running out of IP addresses, and people are gradually moving to name-based hosting.
This is how to configure Apache to do IP-based virtual hosting. The Config file is as follows:
User webuser
Group webgroup
# we don't need a NameVirtualHost directive

<VirtualHost 192.168.123.2>
ServerName www.butterthlies.com
ServerAdmin firstname.lastname@example.org
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
</VirtualHost>

<VirtualHost 192.168.123.3>
ServerName sales-IP.butterthlies.com
ServerAdmin email@example.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
</VirtualHost>
We don't need a NameVirtualHost directive, but we do need ServerName directives in each of the VirtualHost blocks. This setup responds nicely to requests to http://www.butterthlies.com and http://sales-IP.butterthlies.com. The way our machine was configured, it also served up the customers' page to a request on http://sales.butterthlies.com, which is to be expected since they share a common IP number. IP-based hosting is also the method to use for sites that use SSL (see Chapter 11 for more details): the basic issue is that certificate processing takes place before the server sees the Host header.
You can, of course, mix the two techniques. <VirtualHost> blocks that have been NameVirtualHosted will respond to requests to named servers; others will respond to requests to the appropriate IP numbers. This will also be important when we look at Apache SSL (see Chapter 11):
User webuser
Group webgroup
NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerAdmin firstname.lastname@example.org
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
ServerAdmin email@example.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
</VirtualHost>

<VirtualHost 192.168.123.3>
ServerAdmin firstname.lastname@example.org
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
ServerName sales-IP.butterthlies.com
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
</VirtualHost>
The two named sites are dealt with by the NameVirtualHost directive, whereas requests to sales-IP.butterthlies.com, which we have set up to be 192.168.123.3, are dealt with by the third <VirtualHost> block. It is important that the IP-numbered VirtualHost block comes last in the file so that a call to it falls through the named blocks.
This is a handy technique if you want to put a web site up for access, perhaps for testing, by outsiders, but you don't want to make the named domain available. Visitors surf to the IP number and enter your private site. The ordinary visitor is very unlikely to do this: she will surf to the named URL. Of course, you would only use this technique for sites that were not secret or compromising and could withstand inspection by strangers.
Port-based virtual hosting follows on from IP-based hosting. The main advantage of this technique is that it makes it possible for a webmaster to test a lot of sites using only one IP address/hostname or, in a pinch, host a large number of sites without using name-based hosts and without using lots of IP numbers. Unfortunately, most ordinary users don't like their web server having a funny port number, but this can also be very useful for testing or staging sites.
User webuser
Group webgroup
Listen 80
Listen 8080

<VirtualHost 192.168.123.2:80>
ServerName www.butterthlies.com
ServerAdmin email@example.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
</VirtualHost>

<VirtualHost 192.168.123.2:8080>
ServerAdmin firstname.lastname@example.org
DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
</VirtualHost>
The Listen directives tell Apache to watch ports 80 and 8080. If you set Apache going and access http://www.butterthlies.com, you arrive on port 80, the default, and see the customers' site; if you access http://www.butterthlies.com:8080, you get the salespeople's site. If you forget the port and go to http://sales.butterthlies.com, you arrive on the customers' site, because the two share an IP address in our dummied DNS.
To illustrate the possibilities, we will run two copies of Apache with different IP addresses on different consoles, as if they were on two completely separate machines. This is not something you want to do often, but on a heavily loaded site it may be useful to run two Apaches optimized in different ways. The different virtual hosts probably need very different configurations, such as different values for ServerType, User, TypesConfig, or ServerRoot (none of these directives can apply to a virtual host, since they are global to all servers, which is why you have to run two copies to get the desired effect). If you are expecting a lot of hits, you should avoid running more than one copy, as doing so will generally load the machine more.
You can find the necessary machinery in ... /site.twocopy. There are two subdirectories: customers and sales.
The Config file in ... /customers contains the following:
User webuser
Group webgroup
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.twocopy/customers/htdocs
BindAddress www.butterthlies.com
TransferLog logs/access_log
In .../sales the Config file is as follows:
User webuser
Group webgroup
ServerName sales.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.twocopy/sales/htdocs
Listen sales-not-vh.butterthlies.com:80
TransferLog logs/access_log
On this occasion, we will exercise the sales-not-vh.butterthlies.com URL. For the first time, we have more than one copy of Apache running, and we have to associate requests on specific URLs with different copies of the server. There are three more directives for making these associations:
BindAddress addr
Default addr: any
Server config
This directive forces Apache to bind to a particular IP address, rather than listening to all IP addresses on the machine. It has been abolished in Apache v2: use Listen instead.
Port port
Default port: 80
Server config
When used in the main server configuration (i.e., outside any <VirtualHost> sections) and in the absence of a BindAddress or Listen directive, the Port directive sets the port number on which Apache is to listen. This is for backward compatibility, and you should really use BindAddress or Listen.
When used in a <VirtualHost> section, this specifies the port that should be used when the server generates a URL for itself (see also ServerName and UseCanonicalName). It does not set the port on which the virtual host listens; that is done by the <VirtualHost> directive itself.
Listen hostname:port
Server config
Listen tells Apache to pay attention to more than one IP address or port. By default, it responds to requests on all IP addresses, but only to the port specified by the Port directive. It therefore allows you to restrict the set of IP addresses listened to and increase the set of ports.
Listen is the preferred directive; BindAddress is obsolete, since it has to be combined with the Port directive if any port other than 80 is wanted. Also, more than one Listen can be used, but only a single BindAddress.
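To make the comparison concrete, here is a minimal sketch of the two styles. The addresses come from the examples in this chapter; the choice of port 8080 for the first address is an assumption made purely for illustration:

```apache
# Older style (Apache 1.3 only): a single address, with the port set separately
# BindAddress 192.168.123.2
# Port 8080

# Preferred style: one Listen per address/port pair, as many as you need
Listen 192.168.123.2:8080
Listen 192.168.123.3:80
```

With the Listen lines in place, the server accepts connections only on those two address/port pairs and ignores any other addresses configured on the machine.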
There are some housekeeping directives to go with these three:
ListenBacklog number
Default: 511
Server config
ListenBacklog sets the maximum length of the queue of pending connections. Normally, doing so is unnecessary, but it can be useful if the server is under a TCP SYN flood attack, which simulates lots of new connection opens that don't complete. On some systems, this causes a large backlog, which can be alleviated by setting the ListenBacklog parameter. Only the knowledgeable should do this. See the backlog parameter in the manual entry for listen.
Back in the Config file, DocumentRoot (as before) sets the arena for our offerings to the customer. ErrorLog tells Apache where to log its errors, and TransferLog its successes. As we will see in Chapter 10, the information stored in these logs can be tuned.
ServerType [inetd|standalone]
Default: standalone
Server config
Abolished in Apache v2
The ServerType directive allows you to control the way in which Apache handles multiple copies of itself. The arguments are inetd or standalone (the default):
inetd
You might not want Apache to spawn a cloud of waiting child processes at all, but rather to start up a new one each time a request comes in and exit once it has been dealt with. This is slower, but it consumes fewer resources when there are no clients to be dealt with. However, this method is deprecated by the Apache Group as being clumsy and inefficient. On some platforms it may not work at all, and the Group has no plans to fix it. The utility inetd is configured in /etc/inetd.conf (see man inetd). The entry for Apache would look something like this:
http stream tcp nowait root /usr/local/bin/httpd httpd -d directory
standalone
The default; this allows the swarm of waiting child servers.
Having set up the customers, we can duplicate the block, making some slight changes to suit the salespeople. The two servers have different DocumentRoots, which is to be expected because that's why we set up two hosts in the first place. They also have different error and transfer logs, but they don't have to. You could have one transfer log and one error log, or you could write all the logging for both sites to a single file.
Type go on the server (this may require root privileges); then, on the client, as before, access http://www.butterthlies.com or http://sales.butterthlies.com/.
The files in ... /sales/htdocs are similar to those on ... /customers/htdocs, but altered enough so that we can see the difference when we access the two sites. index.html has been edited so that the first line reads:
<h1>SALESMEN Index to Butterthlies Catalogs</h1>
The file catalog_summer.html has been edited so that it reads:
<h1>Welcome to the great rip-off of '97: Butterthlies Inc</h1>
<p>All our worthless cards are available in packs of 20 at $1.95 a pack.
WHAT A FANTASTIC DISCOUNT! There is an amazing FURTHER 10% discount if you
order more than 100.
</p>
...
and so on, until the joke gets boring. Now we can throw the great machine into operation. From console 1, get into ... /customers and type go.
The first Apache is running. Now get into .../sales and again type go.
Now, as the client, you log on to http://www.butterthlies.com/ and see the customers' site, which shows you the customers' catalogs. Quit, and metamorphose into a voracious salesperson by logging on to http://sales.butterthlies.com/. You are given a nasty insight into the ugly reality beneath the smiling face of e-commerce!
An even neater method of managing Virtual Hosting is provided by mod_vhost_alias, which lets you define a single boilerplate configuration and then fills in the details at service time from the IP address and/or the Host header in the HTTP request.
All the directives in this module interpolate a string into a pathname. The interpolated string (called the "name") may be either the server name (see the UseCanonicalName directive for details on how this is determined) or the IP address of the virtual host on the server in dotted-quad format (xxx.xxx.xxx.xxx).
The interpolation is controlled by a mantra, %<code-letter>, which is replaced by some value you supply in the Config file. It's not unlike the controls for logging (see Chapter 10).
These are the possible formats:
%%
Insert a literal %.
%p
Insert the port number of the virtual host.
%N.M
Insert (part of) the name. N and M are numbers, used to specify substrings of the name. N selects from the dot-separated components of the name, and M selects characters within whatever N has selected. M is optional and defaults to zero if it isn't present. The dot must be present if and only if M is present. If we are trying to parse sales.butterthlies.com, the interpretation of N is as follows:
0    The whole name: sales.butterthlies.com
1    The first part: sales
2    The second part: butterthlies
-1    The last part: com
-2    The penultimate part: butterthlies
2+    The second and all subsequent parts: butterthlies.com
-2+    The penultimate and all preceding parts: sales.butterthlies
1+ and -1+    The same as 0: sales.butterthlies.com
If N or M is greater than the number of parts available, a single underscore is interpolated.
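The following sketch works through a few of these formats for one request. The /usr/www/vhosts directory layout is a hypothetical one chosen for illustration; the expansions listed in the comments follow from the rules described above:

```apache
UseCanonicalName Off

# For a request whose Host header is sales.butterthlies.com:
#   %0      sales.butterthlies.com
#   %1      sales
#   %-1     com
#   %2+     butterthlies.com
#   %2.1    b          (first character of the second part)
#   %2.-1   s          (last character of the second part)
#   %2.4+   terthlies  (fourth and subsequent characters of the second part)
#   %4      _          (there is no fourth part)
#
# So this (hypothetical) layout serves the request from
# /usr/www/vhosts/com/butterthlies/sales
VirtualDocumentRoot /usr/www/vhosts/%-1/%2/%1
```

Reversing the order of the parts like this groups all the hosts of one domain under a single directory, which can be convenient when each domain belongs to a different client.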
For simple name-based virtual hosts, you might use the following directives in your server-configuration file:
UseCanonicalName Off
VirtualDocumentRoot /usr/local/apache/vhosts/%0
A request for http://www.example.com/directory/file.html will be satisfied by the file /usr/local/apache/vhosts/www.example.com/directory/file.html.
On .../site.dynamic we have implemented a version of the familiar Butterthlies site, with a password-protected salesperson's department. The first Config file, .../conf/httpd1.conf, is as follows:
User webuser
Group webgroup
ServerName my586
UseCanonicalName Off
VirtualDocumentRoot /usr/www/APACHE3/site.dynamic/htdocs/%0

<Directory /usr/www/APACHE3/site.dynamic/htdocs/sales.butterthlies.com>
AuthType Basic
AuthName Darkness
AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups
Require group cleaners
</Directory>
Launch it with go 1; it responds nicely to http://www.butterthlies.com and http://sales.butterthlies.com.
There is an equivalent VirtualScriptAlias directive, but it insists on URLs containing .../cgi-bin/... (for instance, www.butterthlies.com/cgi-bin/mycgi). In view of the reputed horror some search engines have for "cgi-bin", you might prefer not to use it and to keep "cgi-bin" out of your URLs with this:
ScriptAliasMatch /(.*) /usr/www/APACHE3/cgi-bin/handler/$1
The effect should be that any visitor to <http://yourURL>/fred will call the script .../cgi-bin/handler and pass "fred" to it in the PATH_INFO environment variable.
If you have a very large number of virtual hosts, it's a good idea to arrange the files to reduce the size of the vhosts directory. To do this, you might use the following in your configuration file:
UseCanonicalName Off
VirtualDocumentRoot /usr/local/apache/vhosts/%3+/%2.1/%2.2/%2.3/%2
A request for http://www.example.isp.com/directory/file.html will be satisfied by the file /usr/local/apache/vhosts/isp.com/e/x/a/example/directory/file.html (because isp.com matches %3+, e matches %2.1, the first character of the second part of the name, example, and so on). The point is that most OSes are very slow if you have thousands of subdirectories in a single directory: this scheme spreads them out.
A more even spread of files can often be achieved by selecting from the end of the name, for example:
VirtualDocumentRoot /usr/local/apache/vhosts/%3+/%2.-1/%2.-2/%2.-3/%2
The example request would come from /usr/local/apache/vhosts/isp.com/e/l/p/example/directory/file.html. Alternatively, you might use:
VirtualDocumentRoot /usr/local/apache/vhosts/%3+/%2.1/%2.2/%2.3/%2.4+
The example request would come from /usr/local/apache/vhosts/isp.com/e/x/a/mple/directory/file.html.
For IP-based virtual hosting you might use the following in your configuration file:
UseCanonicalName DNS
VirtualDocumentRootIP /usr/local/apache/vhosts/%1/%2/%3/%4/docs
VirtualScriptAliasIP /usr/local/apache/vhosts/%1/%2/%3/%4/cgi-bin
A request for http://www.example.isp.com/directory/file.html would be satisfied by the file /usr/local/apache/vhosts/10/20/30/40/docs/directory/file.html if the IP address of www.example.isp.com were 10.20.30.40. A request for http://www.example.isp.com/cgi-bin/script.pl would be satisfied by executing the program /usr/local/apache/vhosts/10/20/30/40/cgi-bin/script.pl.
If you want to include the . character in a VirtualDocumentRoot directive, but it clashes with a % directive, you can work around the problem in the following way:
VirtualDocumentRoot /usr/local/apache/vhosts/%2.0.%3.0
A request for http://www.example.isp.com/directory/file.html will be satisfied by the file /usr/local/apache/vhosts/example.isp/directory/file.html.
The LogFormat directives %V and %A are useful in conjunction with this module. See Chapter 10.
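As a rough sketch of how these might be used, the following adds both fields to the front of a common-style log line, so that a single shared log can still be split by site afterwards. The format nickname vhost_common and the log file name are our own choices for illustration:

```apache
# %V = the server name (as determined by UseCanonicalName)
# %A = the local IP address the request arrived on
LogFormat "%V %A %h %l %u %t \"%r\" %s %b" vhost_common
CustomLog logs/access_log vhost_common
```

Because mod_vhost_alias hosts have no <VirtualHost> blocks of their own in which to set per-site logs, recording the name or address on each line is usually the simplest way to tell the sites apart.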
VirtualDocumentRoot interpolated-directory
Default: None
Server config, virtual host
Compatibility: VirtualDocumentRoot is only available in 1.3.7 and later.
The VirtualDocumentRoot directive allows you to determine where Apache will find your documents based on the value of the server name. The result of expanding interpolated-directory is used as the root of the document tree in a similar manner to the DocumentRoot directive's argument. If interpolated-directory is none, then VirtualDocumentRoot is turned off. This directive cannot be used in the same context as VirtualDocumentRootIP.
VirtualDocumentRootIP interpolated-directory
Default: None
Server config, virtual host
The VirtualDocumentRootIP directive is like the VirtualDocumentRoot directive, except that it uses the IP address of the server end of the connection instead of the server name.
VirtualScriptAlias interpolated-directory
Default: None
Server config, virtual host
The VirtualScriptAlias directive allows you to determine where Apache will find CGI scripts in a manner similar to how VirtualDocumentRoot does for other documents. It matches requests for URIs starting /cgi-bin/, much like the following:
ScriptAlias /cgi-bin/ ...
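A minimal sketch of VirtualScriptAlias used alongside VirtualDocumentRoot might look like this. The docs and cgi-bin subdirectory names are assumptions made for the example; any layout will do as long as the two directives point at different directories:

```apache
UseCanonicalName Off

# Documents and scripts live side by side under a per-host directory
VirtualDocumentRoot /usr/local/apache/vhosts/%0/docs
VirtualScriptAlias  /usr/local/apache/vhosts/%0/cgi-bin
```

With this in place, a request for http://www.example.com/cgi-bin/script.pl would run /usr/local/apache/vhosts/www.example.com/cgi-bin/script.pl, while other URLs on the same host would be served from the docs directory.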
VirtualScriptAliasIP interpolated-directory
Default: None
Server config, virtual host
The VirtualScriptAliasIP directive is like the VirtualScriptAlias directive, except that it uses the IP address of the server end of the connection instead of the server name.