5.7 Step 4: Mapping and Accessing Resources

Web servers are resource servers. They deliver precreated content, such as HTML pages or JPEG images, as well as dynamic content from resource-generating applications running on the servers.

Before the web server can deliver content to the client, it needs to identify the source of the content, by mapping the URI from the request message to the proper content or content generator on the web server.

5.7.1 Docroots

Web servers support different kinds of resource mapping, but the simplest form of resource mapping uses the request URI to name a file in the web server's filesystem. Typically, a special folder in the web server filesystem is reserved for web content. This folder is called the document root, or docroot. The web server takes the URI from the request message and appends it to the document root.

In Figure 5-8, a request arrives for /specials/saw-blade.gif. The web server in this example has document root /usr/local/httpd/files. The web server returns the file /usr/local/httpd/files/specials/saw-blade.gif.

Figure 5-8. Mapping request URI to local web server resource

To set the document root for an Apache web server, add a DocumentRoot line to the httpd.conf configuration file:

 DocumentRoot /usr/local/httpd/files 

Servers are careful not to let relative URLs back up out of a docroot and expose other parts of the filesystem. For example, most mature web servers will not permit this URI to see files above the Joe's Hardware document root:

http://www.joes-hardware.com/../
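
The mapping and the "back up" check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name map_to_docroot is hypothetical, the docroot is the one from Figure 5-8, and a real server would also have to deal with percent-encoded characters, symbolic links, and similar details.

```python
import posixpath

DOCROOT = "/usr/local/httpd/files"  # document root from the Figure 5-8 example


def map_to_docroot(request_path, docroot=DOCROOT):
    """Return the filesystem path for request_path, or None if it escapes docroot.

    Hypothetical helper for illustration; not how any particular server does it.
    """
    # Append the request path to the docroot, then collapse "." and ".." segments.
    candidate = posixpath.normpath(
        posixpath.join(docroot, request_path.lstrip("/"))
    )
    # Accept only the docroot itself or paths that lie beneath it.
    if candidate == docroot or candidate.startswith(docroot + "/"):
        return candidate
    return None
```

With this sketch, map_to_docroot("/specials/saw-blade.gif") yields the file named in the text, while map_to_docroot("/../") yields None because the normalized path lies above the document root.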

5.7.1.1 Virtually hosted docroots

Virtually hosted web servers host multiple web sites on the same web server, giving each site its own distinct document root on the server. A virtually hosted web server identifies the correct document root to use from the IP address or hostname in the URI or the Host header. This way, two web sites hosted on the same web server can have completely distinct content, even if the request URIs are identical.

In Figure 5-9, the server hosts two sites: www.joes-hardware.com and www.marys-antiques.com. The server can distinguish the web sites using the HTTP Host header, or from distinct IP addresses.

                When request A arrives, the server fetches the file for /docs/joe/index.html.

                When request B arrives, the server fetches the file for /docs/mary/index.html.

Figure 5-9. Different docroots for virtually hosted requests

Configuring virtually hosted docroots is simple for most web servers. For the popular Apache web server, you need to configure a VirtualHost block for each virtual web site, and include the DocumentRoot for each virtual server (Example 5-3).

Example 5-3. Apache web server virtual host docroot configuration
 <VirtualHost www.joes-hardware.com> 
 ServerName www.joes-hardware.com 
 DocumentRoot /docs/joe 
 TransferLog /logs/joe.access_log 
 ErrorLog /logs/joe.error_log 
 </VirtualHost> 
 
 <VirtualHost www.marys-antiques.com> 
 ServerName www.marys-antiques.com 
 DocumentRoot /docs/mary 
 TransferLog /logs/mary.access_log 
 ErrorLog /logs/mary.error_log 
 </VirtualHost> 
 ... 

Look forward to Section 18.2 for much more detail about virtual hosting.
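
The lookup that a configuration like Example 5-3 implies can be sketched as follows. This is a simplified illustration, not Apache's actual matching logic: the table and function names are hypothetical, only the Host header is consulted (not the IP address), and IPv6 literals in the Host header are not handled.

```python
# Hypothetical table mirroring Example 5-3: Host header value -> document root.
VHOST_DOCROOTS = {
    "www.joes-hardware.com": "/docs/joe",
    "www.marys-antiques.com": "/docs/mary",
}


def docroot_for_host(host_header, default_docroot=None):
    """Pick a document root from the request's Host header."""
    # Hostnames are case-insensitive, and the header may carry a ":port" suffix.
    hostname = host_header.strip().lower().split(":", 1)[0]
    return VHOST_DOCROOTS.get(hostname, default_docroot)


def resolve(host_header, request_path):
    """Map (Host, path) to a file path; the same path differs per site."""
    docroot = docroot_for_host(host_header)
    if docroot is None:
        return None  # unknown site; a real server would fall back to a default
    return docroot + request_path
```

The same request path /index.html resolves under /docs/joe for one Host value and under /docs/mary for the other, which is the behavior shown in Figure 5-9.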

5.7.1.2 User home directory docroots

Another common use of docroots gives people private web sites on a web server. A typical convention maps URIs whose paths begin with a slash and tilde (/~) followed by a username to a private document root for that user. The private docroot is often the folder called public_html inside that user's home directory, but it can be configured differently (Figure 5-10).

Figure 5-10. Different docroots for different users
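
A minimal sketch of the /~username convention follows. The function name map_user_uri, the /home base directory, and the username in the usage note are assumptions made for illustration; only the public_html folder name comes from the text.

```python
import posixpath


def map_user_uri(request_path, home_base="/home", user_dir="public_html"):
    """Map '/~user/rest' to '<home_base>/user/<user_dir>/rest', else None.

    Hypothetical helper; home_base="/home" is an assumed layout.
    """
    if not request_path.startswith("/~"):
        return None
    # Split "/~bob/pics/cat.gif" into username "bob" and remainder "pics/cat.gif".
    username, _, rest = request_path[2:].partition("/")
    # Reject empty or suspicious usernames rather than guessing.
    if not username or username in (".", ".."):
        return None
    docroot = posixpath.join(home_base, username, user_dir)
    candidate = posixpath.normpath(posixpath.join(docroot, rest))
    # As with the main docroot, refuse paths that climb out of the user's docroot.
    if candidate == docroot or candidate.startswith(docroot + "/"):
        return candidate
    return None
```

Under these assumptions, a request for /~bob/index.html maps to /home/bob/public_html/index.html, and a request that does not begin with /~ is left to the server's ordinary docroot mapping.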

5.7.2 Directory Listings

A web server can receive requests for directory URLs, where the path resolves to a directory, not a file. Most web servers can be configured to take a few different actions when a client requests a directory URL:

                Return an error.

                Return a special, default, "index file" instead of the directory.

                Scan the directory, and return an HTML page containing the contents.

Most web servers look for a file named index.html or index.htm inside a directory to represent that directory. If a user requests a URL for a directory and the directory contains a file named index.html (or index.htm ), the server will return the contents of that file.

In the Apache web server, you can configure the set of filenames that will be interpreted as default directory files using the DirectoryIndex configuration directive. The DirectoryIndex directive lists all filenames that serve as directory index files, in preferred order. The following configuration line causes Apache to search a directory for any of the listed files in response to a directory URL request:

 DirectoryIndex index.html index.htm home.html home.htm index.cgi 

If no default index file is present when a user requests a directory URI, and if directory indexes are not disabled, many web servers automatically return an HTML file listing the files in that directory, with each file's size and modification date and a URI link to the file. This file listing can be convenient, but it also allows nosy people to find files on a web server that they might not normally find.

You can disable the automatic generation of directory index files with the Apache directive:

 Options -Indexes 
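
The three possible outcomes for a directory URL (index file, generated listing, or error) can be sketched as a single decision function. The function name and return shape are hypothetical; the filename list copies the DirectoryIndex line above, and the 403 status is an assumption about what a server might return when listings are disabled.

```python
import os

# Same preference order as the DirectoryIndex line above.
INDEX_NAMES = ["index.html", "index.htm", "home.html", "home.htm", "index.cgi"]


def resolve_directory(dir_path, index_names=INDEX_NAMES, allow_listing=True):
    """Decide how to answer a request whose path resolved to a directory.

    Returns ("file", path), ("listing", [names]), or ("error", 403).
    """
    # Try index files in preferred order; the first one that exists wins.
    for name in index_names:
        candidate = os.path.join(dir_path, name)
        if os.path.isfile(candidate):
            return ("file", candidate)
    # No index file: fall back to a generated listing only if it is enabled.
    if allow_listing:
        return ("listing", sorted(os.listdir(dir_path)))
    # Listings disabled (the effect of "Options -Indexes"): refuse the request.
    return ("error", 403)
```

Passing allow_listing=False models a server on which automatic directory indexes have been turned off.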

5.7.3 Dynamic Content Resource Mapping

Web servers also can map URIs to dynamic resources, that is, to programs that generate content on demand (Figure 5-11). In fact, a whole class of web servers called application servers connect web servers to sophisticated backend applications. The web server needs to be able to tell when a resource is a dynamic resource, where the dynamic content generator program is located, and how to run the program. Most web servers provide basic mechanisms to identify and map dynamic resources.

Figure 5-11. A web server can serve static resources as well as dynamic resources

Apache lets you map URI pathname components into executable program directories. When a server receives a request for a URI with an executable path component, it attempts to execute a program in a corresponding server directory. For example, the following Apache configuration directive specifies that all URIs whose paths begin with /cgi-bin/ should execute corresponding programs found in the directory /usr/local/etc/httpd/cgi-programs/:

  ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-programs/  

Apache also lets you mark executable files with a special file extension. This way, executable scripts can be placed in any directory. The following Apache configuration directive specifies that all web resources ending in .cgi should be executed:

  AddHandler cgi-script .cgi  
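
Taken together, the two directives give the server two rules for recognizing a dynamic resource: a path prefix and a file extension. The sketch below models only that decision and does not run any program. The names classify, SCRIPT_ALIASES, and CGI_EXTENSIONS are hypothetical, the paths are the ones used in the examples above, and docroot-escape checks are omitted for brevity.

```python
# Path prefix -> program directory, as in the ScriptAlias example above.
SCRIPT_ALIASES = {"/cgi-bin/": "/usr/local/etc/httpd/cgi-programs/"}
# Extensions marked executable, as in the AddHandler example above.
CGI_EXTENSIONS = (".cgi",)
DOCROOT = "/usr/local/httpd/files"


def classify(request_path):
    """Return ("cgi", program_path) or ("static", file_path)."""
    # Rule 1: a ScriptAlias-style prefix maps into a program directory.
    for prefix, program_dir in SCRIPT_ALIASES.items():
        if request_path.startswith(prefix):
            return ("cgi", program_dir + request_path[len(prefix):])
    file_path = DOCROOT + request_path
    # Rule 2: an AddHandler-style extension marks a file as executable
    # no matter which directory it lives in.
    if file_path.endswith(CGI_EXTENSIONS):
        return ("cgi", file_path)
    # Otherwise the resource is an ordinary static file.
    return ("static", file_path)
```

A request for /cgi-bin/search would be classified as a program in the cgi-programs directory, a request for /tools/form.cgi as a program inside the docroot, and a request for /index.html as a static file (the request paths here are made up for illustration).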

CGI is an early, simple, and popular interface for executing server-side applications. Modern application servers have more powerful and efficient server-side dynamic content support, including Microsoft's Active Server Pages and Java servlets.

5.7.4 Server-Side Includes (SSI)

Many web servers also provide support for server-side includes. If a resource is flagged as containing server-side includes, the server processes the resource contents before sending them to the client.

The contents are scanned for certain special patterns (often contained inside special HTML comments), which can be variable names or embedded scripts. The special patterns are replaced with the values of variables or the output of executable scripts. This is an easy way to create dynamic content.
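
As a rough sketch of the scan-and-replace step, the following code handles one kind of pattern: an HTML comment of the form <!--#echo var="NAME" -->, which is the variable-echo syntax used by Apache's server-side includes. The function name, the variable table passed in, and the "(none)" placeholder for unknown variables are assumptions of this sketch; real SSI processors support many more directives.

```python
import re

# Matches directives such as <!--#echo var="LAST_MODIFIED" -->.
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_includes(text, variables):
    """Replace each echo directive in text with the named variable's value."""
    def substitute(match):
        # Unknown variables become a visible placeholder instead of raising.
        return str(variables.get(match.group(1), "(none)"))

    return ECHO_DIRECTIVE.sub(substitute, text)
```

The server would run a step like this over the resource body before sending it, so the client only ever sees the substituted values, never the directives.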

5.7.5 Access Controls

Web servers also can assign access controls to particular resources. When a request arrives for an access-controlled resource, the web server can control access based on the IP address of the client, or it can issue a password challenge that the client must answer before it is granted access to the resource.

Refer to Chapter 12 for more information about HTTP authentication.
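
The two kinds of access control mentioned above can be sketched as a check that yields an HTTP status code. Everything in the policy table is hypothetical (the path prefixes, the network, and the authenticated flag standing in for a verified password), and the sketch leaves out how credentials are actually verified.

```python
import ipaddress

# Hypothetical policy: path prefix -> rule for resources under that prefix.
PROTECTED = {
    "/internal/": {"networks": [ipaddress.ip_network("10.0.0.0/8")]},
    "/members/": {"password": True},
}


def check_access(request_path, client_ip, authenticated=False):
    """Return 200 (allow), 401 (password challenge), or 403 (denied)."""
    for prefix, rule in PROTECTED.items():
        if not request_path.startswith(prefix):
            continue
        # IP-based control: the client address must fall in an allowed network.
        if "networks" in rule:
            address = ipaddress.ip_address(client_ip)
            if not any(address in net for net in rule["networks"]):
                return 403
        # Password-based control: challenge clients that have not authenticated.
        if rule.get("password") and not authenticated:
            return 401  # a real server would also send a WWW-Authenticate header
        return 200
    # Resources outside every protected prefix are open to all clients.
    return 200
```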

 



HTTP: The Definitive Guide
ISBN: 1565925092
Year: 2001
Pages: 294
