C.2.1 The HTTP Server
On the Internet, communication is also handled by a TCP/IP connection. The Web is based on this model. The server side responds to client (browser) requests and provides feedback by sending back a document, by executing a CGI program, or by issuing an error message. The network protocol that the Web uses so that the server and client know how to talk to each other is the Hypertext Transfer Protocol, or HTTP. HTTP does not replace TCP/IP; it runs on top of it. HTTP objects are mapped onto the transport data units, a process that is beyond the scope of this discussion; it is a simple, straightforward process that goes unnoticed by the typical Web user. (See www.cis.ohio-state.edu/cgi-bin/rfc/rfc2068.html for a technical description of HTTP.)
HTTP was built for the Web to handle hypermedia information; it is object-oriented and stateless. In object-oriented terminology, the documents and files are called objects, and the operations associated with HTTP are called methods. When a protocol is stateless, neither the client nor the server stores information about the other; each manages only its own state information.
Once a TCP/IP connection is established between the Web server and client, the client requests some service from the server. Web servers normally listen on the well-known TCP port 80. The client tells the server what types of data it can handle by sending Accept headers with its requests. For example, one client may accept only HTML text, whereas another might accept sounds and images as well as text. The server tries to handle the request (requests and responses are in ASCII text) and sends back whatever information it can to the client (browser).
(Client's (Browser) Request)
GET /pub HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.0 Gold
Host: severname.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,*/*

(Server's Response)
HTTP/1.1 200 OK
Server: Apache/1.2b8
Date: Mon, 22 Jan 2001 13:43:22 GMT
Last-modified: Mon, 01 Dec 2000 12:15:33
Content-length: 288
Accept-Ranges: bytes
Connection: close
Content-type: text/html

<HTML><HEAD><TITLE>Hello World!</TITLE>
---continue with body---
</HTML>
Connection closed by foreign host.
The response states which HTTP version was used and gives a status code describing the result of the server's attempt (did it succeed or fail?), followed by a header and the data. The status line and header together indicate whether the request is okay, what type of data is being returned (for example, the content type may be text/html), and how many bytes are being sent. The data part contains the actual text being sent.
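To make the structure of a response concrete, the following minimal Python sketch splits a raw response into its status line, header fields, and data. The function name parse_response and the shortened sample text are assumptions made for illustration; a real client library handles many cases this sketch ignores.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    # A blank line (CRLF CRLF) separates the header part from the data part.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The first line is the status line: version, numeric code, reason phrase.
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


# Shortened, hypothetical sample modeled on the server response shown earlier.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/1.2b8\r\n"
    "Content-length: 288\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    "<HTML><HEAD><TITLE>Hello World!</TITLE>"
)

version, status, reason, headers, body = parse_response(raw)
print(version, status, reason)
print(headers["content-type"], headers["content-length"])
```

Lowercasing the header names is a design choice of this sketch; it lets Content-type and Content-Type be looked up the same way.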
The user then sees a formatted page on the screen, which may contain highlighted hyperlinks to some other page. Regardless of whether the user clicks on a hyperlink, once the document is displayed, that transaction is completed and the TCP/IP connection will be closed. Once closed, a new connection will be started if there is another request. What happened in the last transaction is of no interest to either client or server; in other words, the protocol is stateless.
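The one-connection-per-transaction pattern can be tried locally with Python's standard library. This is only a sketch under stated assumptions: the handler class HelloHandler, the helper fetch, and the path /pub are illustrative names, and the throwaway server listens on a free loopback port rather than port 80.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with the same small page."""

    def do_GET(self):
        body = b"<HTML><HEAD><TITLE>Hello World!</TITLE></HEAD></HTML>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch(port, path):
    """Open a new connection, make one request, and close the connection."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path, headers={"Accept": "text/html"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type"), resp.read()
    finally:
        conn.close()


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

first = fetch(port, "/pub")
second = fetch(port, "/pub")  # a separate connection; nothing carries over

server.shutdown()
server.server_close()
print(first[0], second[0])
```

Each call to fetch is a complete transaction. The server keeps no record of the first request when the second one arrives.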
HTTP is also used to communicate between browsers, proxies, and gateways to other Internet systems supported by FTP, Gopher, WAIS, and NNTP protocols.
C.2.2 HTTP Status Codes and the Log Files
When the server responds to the client, it sends information that includes the way it handled the request. Most Web browsers handle these codes silently if they fall in the range between 100 and 300. The codes within the 100 range are informational, indicating that the client's request is being processed. The most common status code is 200, indicating success, which means the information requested was accepted and fulfilled.
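The grouping of status codes by their first digit can be sketched in a few lines of Python. The function name status_class is an assumption for illustration; the reason phrases come from the standard library's http.HTTPStatus.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (200, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```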
Check your server's access log to see what status codes were sent by your server after a transaction was completed. The following example consists of excerpts taken from the Apache server's access log. This log reports information about each request handled by the server and the status code generated as a result of the request. The error log contains any standard error messages that a program would ordinarily send to the screen, such as syntax or compiler errors.
Table C.1. HTTP status codes
(From Apache's Access log)
susan - - [06/Jul/1997:14:32:23 -0700] "GET /cgi-bin/hello.cgi HTTP/1.0" 500 633
susan - - [16/Jun/1997:11:27:32 -0700] "GET /cgi-bin/hello.cgi HTTP/1.0" 200 1325
susan - - [07/Jul/1997:09:03:20 -0700] "GET /htdocs/index.html HTTP/1.0" 404 170
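Entries like these can be picked apart with a regular expression. The sketch below assumes the Common Log Format layout seen in the excerpt (host, identity, user, timestamp, request line, status, byte count); the names LOG_PATTERN and parse_log_line are illustrative, and a server configured with a different log format would need a different pattern.

```python
import re

# One named group per field of a Common Log Format entry.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)


def parse_log_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


lines = [
    'susan - - [06/Jul/1997:14:32:23 -0700] "GET /cgi-bin/hello.cgi HTTP/1.0" 500 633',
    'susan - - [16/Jun/1997:11:27:32 -0700] "GET /cgi-bin/hello.cgi HTTP/1.0" 200 1325',
    'susan - - [07/Jul/1997:09:03:20 -0700] "GET /htdocs/index.html HTTP/1.0" 404 170',
]

entries = [parse_log_line(line) for line in lines]
for entry in entries:
    print(entry["status"], entry["size"], entry["request"])
```

Read this way, the first entry shows a server error (500) for hello.cgi, the second a successful request (200) for the same script, and the third a request for a document that was not found (404).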
C.2.3 The URL (Uniform Resource Locator)
URLs are what you use to get around on the Web. You click on a hotlink and you are transported to some new page, or you type a URL in the browser's Location box and a file opens up or a script runs. A URL is a virtual address that specifies the location of pages, objects, scripts, and so on. It names an existing protocol such as HTTP, Gopher, FTP, mailto, file, Telnet, or news (see Table C.2). A typical URL for the popular Web HTTP protocol looks like this:
http://www.comp.com/dir/files/text.html
Table C.2. Web protocols.
The two basic pieces of information provided in the URL are the protocol, http, and the data needed by the protocol, www.comp.com/dir/files/text.html. The parts of the URL are further defined in Table C.3.
Table C.3. Parts of a URL.
The default HTTP network port is 80; if an HTTP server resides on a different network port, say 12345 on www.comp.com, then the URL becomes
Not all parts of a URL are necessary. If you are searching for a document in the Location box in the Netscape browser, the URL may not need the port number, parameters, query, or fragment parts. If the URL is part of a hotlink in the HTML document, it may contain a relative path to the next document, that is, relative to the root directory of the server. If the user has filled in a form, the URL may contain information appended after a question mark. The appearance of the URL really depends on what protocol you are using and what operation you are trying to accomplish.
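A short Python sketch shows how the optional parts behave when a URL is split. The function describe and the DEFAULT_PORTS table are assumptions for illustration, and the second URL, with port 12345 added to the earlier www.comp.com address, is a hypothetical example.

```python
from urllib.parse import urlsplit

# Well-known ports used when the URL does not name one (illustrative subset).
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def describe(url):
    """Split a URL into its parts, filling in the default port if none is given."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


plain = describe("http://www.comp.com/dir/files/text.html")
custom = describe("http://www.comp.com:12345/dir/files/text.html")  # hypothetical port

print(plain["port"], custom["port"])
print(plain["path"], repr(plain["query"]), repr(plain["fragment"]))
```

Parts that are absent from the URL come back as empty strings, and a missing port falls back to the protocol's default.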
http://www.cis.ohio-state.edu/htbin/rfc2068.html
http://127.0.0.1/Sample.html
ftp://oak.oakland.edu/pub/
file://opt/apache_1.2b8/htdocs/index.html
http://susan/cgi-bin/form.cgi?string=hello+there
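The last URL above carries form data after the question mark. As a minimal sketch, Python's urllib.parse can separate and decode that query string; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://susan/cgi-bin/form.cgi?string=hello+there"

parts = urlsplit(url)
# Everything after the question mark is the query string.
print(parts.path)
print(parts.query)

# parse_qs decodes the name=value pairs; a plus sign stands for a space.
fields = parse_qs(parts.query)
print(fields)
```

Each field maps to a list of values because a form may send the same name more than once.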
File URLs and the Server's Root Directory
If the protocol used in the URL is file, the file is assumed to be on the local machine. A full pathname followed by a filename is included in the URL. When the protocol is followed by a server name, all pathnames are relative to the document root of the server. The document root is the directory defined in the server's configuration file as the main directory for your Web server. The leading slash that precedes the path is not really part of the path, as it is in a UNIX absolute path, which starts at the root directory. Rather, the leading slash separates the path from the hostname. An example of a URL leading to documents in the server's root directory:
The full UNIX pathname for this might be
A shorthand method for linking to a document on the same server is called a partial or relative URL. For example, if a document at http://www.myserver/stories/webjoke.html contains a link to images/webjoke.gif, this is a relative URL. The browser will expand the relative URL to its absolute URL, http://www.myserver/stories/images/webjoke.gif, and make a request for that document if asked.
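The expansion a browser performs can be reproduced with urljoin from Python's standard library. This is a sketch of the resolution rule only; the second call, with a leading slash, is an added illustration that is not part of the example above.

```python
from urllib.parse import urljoin

base = "http://www.myserver/stories/webjoke.html"

# A relative URL is resolved against the directory of the base document.
absolute = urljoin(base, "images/webjoke.gif")
print(absolute)

# A path that starts with a slash is resolved against the server root instead.
from_root = urljoin(base, "/images/webjoke.gif")
print(from_root)
```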