Virtual Hosting

18.2 Virtual Hosting

Many folks want to have a web presence but don't have high-traffic web sites. For these people, providing a dedicated web server may be a waste, because they're paying many hundreds of dollars a month to lease a server that is mostly idle!

Many web hosters offer lower-cost web hosting services by sharing one computer between several customers. This is called shared hosting or virtual hosting. Each web site appears to be hosted by a different server, but they really are hosted on the same physical server. From the end user's perspective, virtually hosted web sites should be indistinguishable from sites hosted on separate dedicated servers.

For cost efficiency, space, and management reasons, a virtual hosting company wants to host tens, hundreds, or thousands of web sites on the same server, but this does not necessarily mean that 1,000 web sites are served from only one PC. Hosters can create banks of replicated servers (called server farms) and spread the load across the farm of servers. Because each server in the farm is a clone of the others, and hosts many virtual web sites, administration is much easier. (We'll talk more about server farms in Chapter 20.)

When Joe and Mary started their businesses, they might have chosen virtual hosting to save money until their traffic levels made a dedicated server worthwhile (see Figure 18-2).

Figure 18-2. Outsourced virtual hosting

18.2.1 Virtual Server Request Lacks Host Information

Unfortunately, there is a design flaw in HTTP/1.0 that makes virtual hosters pull their hair out. The HTTP/1.0 specification didn't give any means for shared web servers to identify which of the virtual web sites they're hosting is being accessed.

Recall that HTTP/1.0 requests send only the path component of the URL in the request message. If you try to get http://www.joes-hardware.com/index.html, the browser connects to the server www.joes-hardware.com, but the HTTP/1.0 request says "GET /index.html", with no further mention of the hostname. If the server is virtually hosting multiple sites, this isn't enough information to figure out what virtual web site is being accessed. For example, in Figure 18-3:

                If client A tries to access http://www.joes-hardware.com/index.html, the request "GET /index.html" will be sent to the shared web server.

                If client B tries to access http://www.marys-antiques.com/index.html, the identical request "GET /index.html" will be sent to the shared web server.

Figure 18-3. HTTP/1.0 server requests don't contain hostname information

As far as the web server is concerned, there is not enough information to determine which web site is being accessed! The two requests look the same, even though they are for totally different documents (from different web sites). The problem is that the web site host information has been stripped from the request.

As we saw in Chapter 6, HTTP surrogates (reverse proxies) and intercepting proxies also need site-specifying information.
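
The loss of host information can be reproduced in a few lines. The following is a minimal sketch, not a real browser implementation: the helper name http10_request_line is hypothetical, and it assumes the client sends only the path (and query) of the URL, as described above.

```python
from urllib.parse import urlsplit


def http10_request_line(url: str) -> str:
    """Build an HTTP/1.0-style request line that carries only the path."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the site root
    if parts.query:
        path += "?" + parts.query
    # The scheme and hostname in `parts` are discarded at this point.
    return f"GET {path} HTTP/1.0"


request_a = http10_request_line("http://www.joes-hardware.com/index.html")
request_b = http10_request_line("http://www.marys-antiques.com/index.html")

print(request_a)                # GET /index.html HTTP/1.0
print(request_b)                # GET /index.html HTTP/1.0
print(request_a == request_b)   # True
```

Once the two request lines compare equal, nothing in the message itself tells the shared server whether Joe's or Mary's page was wanted.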

18.2.2 Making Virtual Hosting Work

The missing host information was an oversight in the original HTTP specification, which mistakenly assumed that each web server would host exactly one web site. HTTP's designers didn't provide support for virtually hosted, shared servers. For this reason, the hostname information in the URL was viewed as redundant and stripped away; only the path component was required to be sent.

Because the early specifications did not make provisions for virtual hosting, web hosters needed to develop workarounds and conventions to support shared virtual hosting. The problem could have been solved simply by requiring all HTTP request messages to send the full URL instead of just the path component. HTTP/1.1 does require servers to handle full URLs in the request lines of HTTP messages, but it will be a long time before all legacy applications are upgraded to this specification. In the meantime, four techniques have emerged:

Virtual hosting by URL path

Adding a special path component to the URL so the server can determine the site.

Virtual hosting by port number

Assigning a different port number to each site, so requests are handled by separate instances of the web server.

Virtual hosting by IP address

Dedicating different IP addresses for different virtual sites and binding all the IP addresses to a single machine. This allows the web server to identify the site name by IP address.

Virtual hosting by Host header

Many web hosters pressured the HTTP designers to solve this problem. Enhanced versions of HTTP/1.0 and the official version of HTTP/1.1 define a Host request header that carries the site name. The web server can identify the virtual site from the Host header.

Let's take a closer look at each technique.

18.2.2.1 Virtual hosting by URL path

You can use brute force to isolate virtual sites on a shared server by assigning them different URL paths. For example, you could give each logical web site a special path prefix:

                Joe's Hardware store could be http://www.joes-hardware.com/joe/index.html.

                Mary's Antiques store could be http://www.marys-antiques.com/mary/index.html.

When the requests arrive at the server, the hostname information is not present in the request, but the server can tell them apart based on the path:

                The request for Joe's hardware is "GET /joe/index.html".

                The request for Mary's antiques is "GET /mary/index.html".

This is not a good solution. The "/joe" and "/mary" prefixes are redundant and confusing (we already mentioned "joe" in the hostname). Worse, the common convention of specifying http://www.joes-hardware.com or http://www.joes-hardware.com/index.html for the home page won't work, because neither URL carries the site prefix.

In general, URL-based virtual hosting is a poor solution and is seldom used.
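
To make the weakness concrete, here is a rough sketch of how a shared server might dispatch on a path prefix. The SITE_ROOTS table, the directory names, and the resolve_by_path function are illustrative assumptions, not part of any real server.

```python
# Hypothetical mapping from URL path prefix to a site's document directory.
SITE_ROOTS = {"joe": "/www/joe", "mary": "/www/mary"}


def resolve_by_path(request_path: str):
    """Return (site, file path) for a prefixed request, or None if no site matches."""
    segments = request_path.lstrip("/").split("/", 1)
    prefix = segments[0]
    if prefix not in SITE_ROOTS:
        return None                      # the server cannot tell which site is meant
    rest = segments[1] if len(segments) > 1 else ""
    return prefix, SITE_ROOTS[prefix] + "/" + (rest or "index.html")


print(resolve_by_path("/joe/index.html"))    # ('joe', '/www/joe/index.html')
print(resolve_by_path("/mary/index.html"))   # ('mary', '/www/mary/index.html')
print(resolve_by_path("/index.html"))        # None
```

The last call shows the problem with the usual home page convention: a request for "/index.html" has no prefix, so the server has nothing to match on.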

18.2.2.2 Virtual hosting by port number

Instead of changing the pathname, Joe and Mary could each be assigned a different port number on the web server. Instead of port 80, for example, Joe could get 82 and Mary could have 83. But this solution has the same problem: an end user would expect to find the resources without having to specify a nonstandard port in the URL.
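
A small sketch shows why this is awkward for users. The PORT_TO_SITE table and the site_for_url helper are hypothetical. A real server would look at the port the connection arrived on, but that port is whatever the user typed in the URL, which is what the sketch inspects.

```python
from urllib.parse import urlsplit

# Hypothetical port assignments, using the example numbers from the text.
PORT_TO_SITE = {82: "joe", 83: "mary"}


def site_for_url(url: str):
    """Return the site a URL would reach under port-based hosting, or None."""
    port = urlsplit(url).port or 80      # no explicit port means the HTTP default
    return PORT_TO_SITE.get(port)


print(site_for_url("http://www.joes-hardware.com:82/index.html"))   # joe
print(site_for_url("http://www.joes-hardware.com/index.html"))      # None
```

The URL people would naturally type, without a port, lands on port 80 and matches neither site.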

18.2.2.3 Virtual hosting by IP address

A much better approach (in common use) is virtual IP addressing. Here, each virtual web site gets one or more unique IP addresses. The IP addresses for all of the virtual web sites are attached to the same shared server. The server can look up the destination IP address of the HTTP connection and use that to determine what web site the client thinks it is connected to.

Say a hoster assigned the IP address 209.172.34.3 to www.joes-hardware.com, assigned 209.172.34.4 to www.marys-antiques.com, and tied both IP addresses to the same physical server machine. The web server could then use the destination IP address to identify which virtual site is being requested, as shown in Figure 18-4:

                Client A fetches http://www.joes-hardware.com/index.html.

                Client A finds the IP address for www.joes-hardware.com, getting 209.172.34.3.

                Client A opens a TCP connection to the shared web server at 209.172.34.3.

                Client A sends the request "GET /index.html HTTP/1.0".

                Before the web server serves a response, it notes the actual destination IP address (209.172.34.3), determines that this is a virtual IP address for Joe's web site, and fulfills the request from the /joe subdirectory. The page /joe/index.html is returned.

Figure 18-4. Virtual IP hosting

Similarly, if client B asks for http://www.marys-antiques.com/index.html:

                Client B finds the IP address for www.marys-antiques.com, getting 209.172.34.4.

                Client B opens a TCP connection to the web server at 209.172.34.4.

                Client B sends the request "GET /index.html HTTP/1.0".

                The web server determines that 209.172.34.4 is Mary's web site and fulfills the request from the /mary subdirectory, returning the document /mary/index.html.
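
The lookup in the last step of each walk-through can be sketched as follows. The VIRTUAL_IPS table and the resolve_by_destination_ip function are assumptions made for illustration. In a real Python server, the destination address of an accepted connection could be read with the socket's getsockname() method, which is noted here only in a comment so the sketch stays runnable without a network.

```python
# Hypothetical table of virtual IP addresses, using the addresses from the text.
VIRTUAL_IPS = {
    "209.172.34.3": "/joe",     # www.joes-hardware.com
    "209.172.34.4": "/mary",    # www.marys-antiques.com
}


def resolve_by_destination_ip(local_ip: str, request_path: str):
    """Map the IP address the client connected to onto a site subdirectory.

    `local_ip` is the destination address of the TCP connection; for an
    accepted socket `conn` it would typically be `conn.getsockname()[0]`.
    """
    docroot = VIRTUAL_IPS.get(local_ip)
    if docroot is None:
        return None                      # not one of our virtual addresses
    return docroot + request_path


# Both clients send the same request path, but to different addresses.
print(resolve_by_destination_ip("209.172.34.3", "/index.html"))   # /joe/index.html
print(resolve_by_destination_ip("209.172.34.4", "/index.html"))   # /mary/index.html
```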

Virtual IP hosting works, but it causes some difficulties, especially for large hosters:

                Computer systems usually have a limit on how many virtual IP addresses can be bound to a machine. Hosters that want hundreds or thousands of virtual sites to be hosted on a shared server may be out of luck.

                IP addresses are a scarce commodity. Hosters with many virtual sites might not be able to obtain enough virtual IP addresses for the hosted web sites.

                The IP address shortage is made worse when hosters replicate their servers for additional capacity. Different virtual IP addresses may be needed on each replicated server, depending on the load-balancing architecture, so the number of IP addresses needed can multiply by the number of replicated servers.

Despite the address consumption problems with virtual IP hosting, it is used widely.

18.2.2.4 Virtual hosting by Host header

To avoid excessive address consumption and virtual IP limits, we'd like to share the same IP address among virtual sites, but still be able to tell the sites apart. But as we've seen, because most browsers send just the path component of the URL to servers, the critical virtual hostname information is lost.

To solve this problem, browser and server implementors extended HTTP to provide the original hostname to servers. But browsers couldn't just send a full URL, because that would break many servers that expected to receive only a path component. Instead, the hostname (and port) is passed in a Host extension header in all requests.

In Figure 18-5, client A and client B both send Host headers that carry the original hostname being accessed. When the server gets the request for /index.html, it can use the Host header to decide which resources to use.

Figure 18-5. Host headers distinguish virtual host requests

Host headers were first introduced with HTTP/1.0+, a vendor-extended superset of HTTP/1.0. Host headers are required for HTTP/1.1 compliance. Host headers are supported by most modern browsers and servers, but there are still a few clients and servers (and robots) that don't support them.
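
As a rough illustration of Host-based dispatch, the sketch below parses a raw request and picks a subdirectory from the Host header. The VIRTUAL_HOSTS table and both function names are hypothetical, and the parsing is deliberately simplified: it ignores IPv6 literals, header continuation lines, and malformed input.

```python
# Hypothetical table of virtual hosts sharing one IP address.
VIRTUAL_HOSTS = {
    "www.joes-hardware.com": "/joe",
    "www.marys-antiques.com": "/mary",
}


def parse_request(raw: str):
    """Split a raw request into its target and a dict of lowercased header names."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    _method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return target, headers


def resolve_by_host(raw: str):
    """Return the file path for a request, chosen by its Host header, or None."""
    target, headers = parse_request(raw)
    host = headers.get("host", "")
    hostname = host.split(":", 1)[0].lower()   # drop an optional ":port" suffix
    docroot = VIRTUAL_HOSTS.get(hostname)
    return docroot + target if docroot else None


request_a = "GET /index.html HTTP/1.0\r\nHost: www.joes-hardware.com\r\n\r\n"
request_b = "GET /index.html HTTP/1.0\r\nHost: www.marys-antiques.com\r\n\r\n"
print(resolve_by_host(request_a))   # /joe/index.html
print(resolve_by_host(request_b))   # /mary/index.html
```

The two request lines are still identical, but the Host header now gives the server what it needs to tell the sites apart.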

18.2.3 HTTP/1.1 Host Headers

The Host header is an HTTP/1.1 request header, defined in RFC 2068. Virtual servers are so common that most HTTP clients, even if they are not HTTP/1.1-compliant, implement the Host header.

18.2.3.1 Syntax and usage

The Host header specifies the Internet host and port number for the resource being requested, as obtained from the original URL:

 Host = "Host" ":" host [ ":" port ] 

In particular:

                If the Host header does not contain a port, the default port for the scheme is assumed.

                If the URL contains an IP address, the Host header should contain the same address.

                If the URL contains a hostname, the Host header must contain the same name.

                If the URL contains a hostname, the Host header should not contain the IP address equivalent to the URL's hostname, because this will break virtually hosted servers, which layer multiple virtual sites over a single IP address.

                If the URL contains a hostname, the Host header should not contain another alias for this hostname, because this also will break virtually hosted servers.

                If the client is using an explicit proxy server, the client must include the name and port of the origin server in the Host header, not the proxy server. In the past, several web clients had bugs where the outgoing Host header was set to the hostname of the proxy, when the client's proxy setting was enabled. This incorrect behavior causes proxies and origin servers to misbehave.

                Web clients must include a Host header field in all request messages.

                Web proxies must add Host headers to request messages before forwarding them.

                HTTP/1.1 web servers must respond with a 400 status code to any HTTP/1.1 request message that lacks a Host header field.
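
On the client side, several of these rules amount to deriving the Host value directly from the URL being requested. The following is a minimal sketch under that assumption. The host_header_for function and the DEFAULT_PORTS table are hypothetical names, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit

# Default ports per scheme; a Host header may omit the port when it is the default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_for(url: str) -> str:
    """Derive the Host header value from the origin server's URL.

    The value always comes from the URL itself, never from a proxy setting
    and never from a DNS lookup, so hostnames stay hostnames and IP
    addresses stay IP addresses.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host component: {url!r}")
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return parts.hostname
    return f"{parts.hostname}:{port}"


print(host_header_for("http://www.joes-hardware.com/index.html"))        # www.joes-hardware.com
print(host_header_for("http://www.joes-hardware.com:8080/index.html"))   # www.joes-hardware.com:8080
print(host_header_for("http://209.172.34.3/index.html"))                 # 209.172.34.3
```

Because the function takes only the origin server's URL, a configured proxy cannot leak into the header, which is the bug described in the proxy rule above.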

Here is a sample HTTP request message used to fetch the home page of www.joes-hardware.com, along with the required Host header field:

  GET http://www.joes-hardware.com/index.html HTTP/1.0  
  Connection: Keep-Alive  
  User-Agent: Mozilla/4.51 [en] (X11; U; IRIX 6.2 IP22)  
  Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*  
  Accept-Encoding: gzip  
  Accept-Language: en  
  Host: www.joes-hardware.com

18.2.3.2 Missing Host headers

A small percentage of old browsers in use do not send Host headers. If a virtual hosting server is using Host headers to determine which web site to serve, and no Host header is present, it probably will either direct the user to a default web page (such as the web page of the ISP) or return an error page suggesting that the user upgrade her browser.

18.2.3.3 Interpreting Host headers

An origin server that isn't virtually hosted, and doesn't allow resources to differ by the requested host, may ignore the Host header field value. But any origin server that does differentiate resources based on the host must use the following rules for determining the requested resource on an HTTP/1.1 request:

1. If the URL in the HTTP request message is absolute (i.e., contains a scheme and host component), the value in the Host header is ignored in favor of the URL.

2. If the URL in the HTTP request message doesn't have a host, and the request contains a Host header, the value of the host/port is obtained from the Host header.

3. If no valid host can be determined through Steps 1 or 2, a 400 Bad Request response is returned to the client.
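
The three rules translate almost directly into code. This is a sketch rather than a complete implementation: effective_host is a hypothetical name, the headers dictionary is assumed to use lowercased header names, and a result of None stands for the case in which the server would answer with 400 Bad Request.

```python
from urllib.parse import urlsplit


def effective_host(request_target: str, headers: dict):
    """Apply the three rules; return "host[:port]" or None (meaning 400 Bad Request)."""
    parts = urlsplit(request_target)

    # Rule 1: an absolute URL in the request line wins over any Host header.
    if parts.scheme and parts.netloc:
        return parts.netloc.lower()

    # Rule 2: otherwise, take the host (and optional port) from the Host header.
    host = headers.get("host", "").strip()
    if host:
        return host.lower()

    # Rule 3: no valid host could be determined.
    return None


print(effective_host("http://www.joes-hardware.com/index.html",
                     {"host": "proxy.example.com"}))                      # www.joes-hardware.com
print(effective_host("/index.html", {"host": "www.marys-antiques.com"}))  # www.marys-antiques.com
print(effective_host("/index.html", {}))                                  # None
```

The first call also shows why Rule 1 helps with buggy clients: a wrong Host value (here the made-up proxy.example.com) is ignored when the request line carries the full URL.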

18.2.3.4 Host headers and proxies

Some browser versions send incorrect Host headers, especially when configured to use proxies. For example, when configured to use a proxy, some older versions of Apple and PointCast clients mistakenly sent the name of the proxy instead of the origin server in the Host header.

 



HTTP: The Definitive Guide
ISBN: 1565925092
EAN: 2147483647
Year: 2001
Pages: 294
