3.1 Introduction | Open Source Development with LAMP: Using Linux, Apache, MySQL, Perl, and PHP

The web pages you see when you surf the Web (quit slacking, you!) are served up via the HyperText Transfer Protocol (HTTP) by an httpd daemon. The "d" at the end means daemon: a program that is always running in the background. ^[1]

^[1] In some distributions, the daemon is called apache instead of httpd.

Currently, Apache is the webserver of choice, and not just for Open Source bigots. As of this writing, Apache has more than 60 percent of the active site webserver market (see www.netcraft.com/survey/). Because it is so widely used, it is widely tested, and when a bug is discovered or a new Web feature is implemented, bug fixes and updates are almost instantaneous. Apache has a BSD-type Open Source license, making it attractive for both commercial and noncommercial applications. Its modular architecture makes it feasible to tailor Apache to the environment you want to serve. Examples of major sites using Apache are Amazon and Yahoo: people who know how to handle Web traffic.

Apache originated, as did many things Web, ^[2] as an indirect offshoot of the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign (UIUC). ^[3]

^[2] Tim Berners-Lee invented the World Wide Web at CERN, the European high-energy physics (HEP) laboratory. One of us is a high-energy physicist, and WWW was invented so that large HEP collaborations consisting of hundreds of scientists at dozens of locations could communicate results, data, software, and papers. The Open Source applications most of us use today, Netscape and Mozilla and Apache, originated from NCSA in one way or another.

^[3] The NCSA webserver was widely used, but eventually NCSA stopped supporting it. Many people began creating patches to add functionality and fix bugs (it was Open Source, after all). Eventually, developers decided to make it a full-blown non-NCSA project called Apache because it was based on "a patchy" bit of code. There's also a good story from these origins about Mosaic, Netscape, failed dotcoms, and monopolistic rulings about software companies. But not here, not now.

In this chapter, we configure Apache, set up the necessary directories for a basic Web site, and add a few simple HTML files. We assume that you already know some basic HTML; if not, see the list of suggested books at the end of this chapter. HTML is easy to learn.

3.1.1 Apache Explained

Figure 3.1 depicts what happens when a user requests a web page from the Apache webserver.

Figure 3.1. Apache explained

graphics/03fig01.gif

The webserver recognizes an HTTP request by the URL of the thing requested or by the filename extension. For instance, if the URL www.example.com/content/chapter1/ were loaded into a browser, the webserver contacted (www.example.com) would receive a request that might look like this: ^[4]

^[4] This example demonstrates the simpler HTTP protocol version 1.0. It is more likely that the version used will be 1.1, but 1.0 still works, and because it is simpler, we use it here.

 GET /content/chapter1/ HTTP/1.0
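To make the structure of that request line concrete, here is a minimal Python sketch that splits it into its three parts. The function name parse_request_line is hypothetical and chosen only for illustration; a real server such as Apache does far more validation than this.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, path, version)."""
    # A request line has three space-separated parts, for example:
    #   GET /content/chapter1/ HTTP/1.0
    method, path, version = line.strip().split(" ")
    return method, path, version


method, path, version = parse_request_line("GET /content/chapter1/ HTTP/1.0")
print(method)   # the request method
print(path)     # the path the server will look up
print(version)  # the protocol version the client speaks
```

The middle part, the path, is what the server uses to locate the file.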

The server determines that the thing requested is underneath the document root, a directory where the HTML files reside. For the examples in this book, that is /var/www/html. The text /content/chapter1/ directs Apache to navigate to those directories underneath the document root and grab the HTML file named index.html (by default, the server looks for the file with this name, but this is configurable, as are most things related to Apache).
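The lookup just described can be sketched in a few lines of Python. This is a simplified illustration, not Apache's actual logic: the name resolve_path is hypothetical, and the sketch assumes that a path ending in a slash names a directory, whereas a real server checks the filesystem.

```python
import posixpath

DOCUMENT_ROOT = "/var/www/html"  # the document root used in this book
DEFAULT_INDEX = "index.html"     # default file served for a directory


def resolve_path(url_path, document_root=DOCUMENT_ROOT, index=DEFAULT_INDEX):
    """Map the path part of a URL to a file name under the document root."""
    # Normalize away "." and ".." segments. Anchoring at "/" keeps the
    # result from climbing above the document root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    # Simplifying assumption: a trailing slash means a directory was
    # requested, so the default index file is appended.
    if url_path.endswith("/"):
        clean = posixpath.join(clean, index)
    return document_root + clean


print(resolve_path("/content/chapter1/"))
# prints /var/www/html/content/chapter1/index.html
```

In Apache itself, the default file name is set with the DirectoryIndex directive.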

The result is that the server grabs the file /var/www/html/content/chapter1/index.html, which is simply a text file. It then takes the content of this file and prepends an important piece of information called the header. The header tells the client how to interpret the information that is to follow. For an HTML file, the header tells the client that what follows is text, which is to be interpreted as HTML code. The header is separated from the content that follows by a blank line. Of course, webservers can dish up more than HTML these days: music, streaming video, PDF, etc.
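As a rough sketch of how a header is prepended to the file content, the following Python function assembles a response by hand. The name build_response and the choice of header lines are assumptions made for illustration; a real server sends more header lines (a date and a server name, for example).

```python
def build_response(body, content_type="text/html"):
    """Prepend a status line and header to the body, with a blank line between."""
    header_lines = [
        "HTTP/1.0 200 OK",
        "Connection: close",
        "Content-Type: " + content_type,
    ]
    # HTTP lines end with CRLF; an empty line marks the end of the header.
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body.encode("utf-8")


response = build_response("<html><body>Hello</body></html>")
print(response.decode("utf-8"))
```

The Content-Type line is what tells the client to treat the body as HTML rather than, say, plain text or a PDF.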

It's an instructive exercise to view the header, blank line, and body that the server serves up, and this can be done without a browser, from a shell window. (That's good to know if you are someplace that doesn't have a browser but does have a shell. This used to be more common, but now you are likely to find things the other way around.) This example connects to a server and asks for index.html in the directory /content/chapter1/:

 $ telnet www.not_a_real_web_server.com 80
 Trying 1.299.299.1
 Connected to www.not_a_real_web_server.com (1.299.299.1)
 Escape character is '^]'.
 GET /content/chapter1/ HTTP/1.0

 HTTP/1.1 200 OK
 Date: Thu, 17 Jan 2002 19:57:05 GMT
 Server: Acme Web Server Version 0.001b
 Connection: close
 Content-Type: text/html

 <html>
 <head>
 ....

When the server accepts the connection, it tells the client (us) so. Then we make the HTTP request:

 GET /content/chapter1/ HTTP/1.0

followed by a blank line. The webserver prints out some header stuff, including the content type text/html, followed by a blank line, followed by the contents of the HTML file. Had a browser, instead of a Telnet session, made the same request, the browser would have taken the information in the header and then the body and rendered it appropriately. That's what browsers are programmed to do.
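The same exchange can be scripted instead of typed into Telnet. The following is a minimal sketch using Python's standard socket module; the function name fetch is hypothetical, and the code assumes the server closes the connection once the response is sent, as it does for an HTTP/1.0 request like the one above.

```python
import socket


def fetch(host, path, port=80):
    """Send an HTTP/1.0 GET request and return (header, body) as bytes."""
    # The request line is followed by a blank line, just as in the
    # Telnet session.
    request = "GET {} HTTP/1.0\r\n\r\n".format(path)
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        # Read until the server closes the connection and recv()
        # returns no more data.
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    raw = b"".join(chunks)
    # The first blank line separates the header from the body.
    header, _, body = raw.partition(b"\r\n\r\n")
    return header, body
```

Calling fetch("www.example.com", "/content/chapter1/") against a server you are allowed to query returns the header and the body separately. Servers that host several sites on one address may also expect a Host line in the request, which this sketch leaves out to match the HTTP/1.0 example.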

That's it! Not so magical once the details are known.