The web pages you see when you surf the Web (quit slacking, you!) are served up via the HyperText Transfer Protocol (HTTP) by an httpd daemon; the "d" at the end means daemon, a program that is always running in the background. [1]
Currently, Apache is the webserver of choice, and not just for Open Source bigots. As of this writing, Apache has more than 60 percent of the active site webserver market (see www.netcraft.com/survey/). Because it is so widely used, it is widely tested, and when a bug is discovered or a new Web feature is implemented, bug fixes and updates are almost instantaneous. Apache has a BSD-type Open Source license, making it attractive for both commercial and noncommercial applications. Its modular architecture makes it feasible to tailor Apache to the environment you want to serve. Examples of major sites using Apache are Amazon and Yahoo: people who know how to handle Web traffic. Apache originated, as did many things Web, [2] as an indirect offshoot of the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign (UIUC). [3]
In this chapter, we configure Apache, set up the necessary directories for a basic Web site, and add a few simple HTML files. We assume that you already know some basic HTML; if not, see the list of suggested books at the end of this chapter. HTML is easy to learn.
3.1.1 Apache Explained
Figure 3.1 depicts what happens when a user requests a web page from the Apache webserver.
Figure 3.1. Apache explained
The webserver recognizes an HTTP request by the URL of the thing requested or by the filename extension. For instance, if the URL www.example.com/content/chapter1/ were loaded into a browser, the webserver contacted (www.example.com) would receive a request that might look like this: [4]
    GET /content/chapter1/ HTTP/1.0
The server determines that the thing requested is underneath the document root, a directory where the HTML files reside. For the examples in this book, that is /var/www/html. The text /content/chapter1/ directs Apache to navigate to those directories underneath the document root and grab the HTML file named index.html (by default, the server looks for the file with this name, but this is configurable, as are most things related to Apache). The result is that the server grabs the file /var/www/html/content/chapter1/index.html, which is simply a text file.
It then takes the content of this file and prepends an important piece of information called the header. The header tells the client how to interpret the information that is to follow. For an HTML file, the header tells the client that what follows is text, which is to be interpreted as HTML code. The header is separated from the content that follows by a blank line. Of course, webservers can dish up more than HTML these days: music, streaming video, PDF, etc.
It's an instructive exercise to view the header, blank line, and body that the server serves up, and this can be done without a browser, in a shell window. (That's good to know if you are someplace that doesn't have a browser but does have a shell. This used to be more common, but now you are likely to find things the other way around.) This example connects to a server and asks for index.html in the directory /content/chapter1/:
    $ telnet www.not_a_real_web_server.com 80
    Trying 1.299.299.1
    Connected to www.not_a_real_web_server.com (1.299.299.1)
    Escape character is '^]'.
    GET /content/chapter1/ HTTP/1.0

    HTTP/1.1 200 OK
    Date: Thu, 17 Jan 2002 19:57:05 GMT
    Server: Acme Web Server Version 0.001b
    Connection: close
    Content-Type: text/html

    <html>
    <head>
    ....
When the server accepts the connection, it tells the client (us) so. Then we make the HTTP request:
    GET /content/chapter1/ HTTP/1.0
followed by a blank line. The webserver prints out some header stuff, including the content type text/html, followed by a blank line, followed by the contents of the HTML file.
Had a browser, instead of a Telnet session, made the same request, the browser would have taken the information in the header and then the body and rendered it appropriately. That's what browsers are programmed to do. That's it! Not so magical once the details are known.
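The request cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Apache is implemented: the function names (resolve_path, build_response, split_response, demo) are hypothetical, a throwaway temporary directory stands in for the document root /var/www/html, and the content type is hard-coded to text/html rather than looked up per file.

```python
import tempfile
from pathlib import Path

DEFAULT_INDEX = "index.html"  # the default file name mentioned in the text


def resolve_path(document_root: Path, url_path: str) -> Path:
    """Map a URL path such as /content/chapter1/ to a file under the document root."""
    root = document_root.resolve()
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse anything that escapes the document root (for example via "..").
    if target != root and root not in target.parents:
        raise PermissionError(f"{url_path} is outside the document root")
    # A request for a directory falls back to the default index file.
    if url_path.endswith("/") or target.is_dir():
        target = target / DEFAULT_INDEX
    return target


def build_response(document_root: Path, url_path: str) -> bytes:
    """Server side: prepend a header and a blank line to the file's contents."""
    body = resolve_path(document_root, url_path).read_bytes()
    header_lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/html",  # simplification: assumes every file is HTML
        f"Content-Length: {len(body)}",
        "Connection: close",
    ]
    # An empty line (CRLF CRLF on the wire) separates the header from the body.
    return "\r\n".join(header_lines).encode("ascii") + b"\r\n\r\n" + body


def split_response(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Client side: split at the blank line, then read the header fields."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    fields = {}
    for line in lines[1:]:  # lines[0] is the status line, e.g. "HTTP/1.1 200 OK"
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return fields, body


def demo() -> tuple[dict[str, str], bytes]:
    """Create a temporary document root and answer a request for /content/chapter1/."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        page = root / "content" / "chapter1" / DEFAULT_INDEX
        page.parent.mkdir(parents=True)
        page.write_text("<html><head></head><body>Chapter 1</body></html>")
        return split_response(build_response(root, "/content/chapter1/"))


if __name__ == "__main__":
    fields, body = demo()
    print(fields["content-type"])
    print(body.decode())
```

Running the script prints the content type the "client" read from the header, followed by the HTML body. A browser makes the same split at the blank line before deciding how to render what follows.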