Chapter 21. How URLs Work | How the Internet Works (8th Edition)

The web pages and the hosts that make up the World Wide Web must have unique locations so that your computer can locate and retrieve the pages. The unique identifier for a host is called the Internet Protocol (IP) address, and the unique identifier for a page is called the uniform resource locator (URL). A URL functions much like a postal or email address. Just as postal and email addresses list a name and a specific location, a URL, or web address, identifies the host computer, the location of the website on that host, and the name and file type of the web page, among other information.

A typical URL looks like this:

http://www.whitehouse.gov/president/index.html

If you were to interpret the instructions in this URL from left to right, they would translate to: "Use HTTP to contact the host computer named www.whitehouse.gov (the gov indicates a government agency), look in a directory called president, and retrieve the hypertext document with the filename index.html." The URL, or address, tells the browser which document to fetch and exactly where to find it on a specific remote host computer somewhere on the Internet.
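The same left-to-right reading can be done by a program. The following is a minimal sketch, assuming Python 3.9 or later, that uses the standard library's urllib.parse.urlsplit function to pull out the protocol, host, directory, and filename. The function name describe_url and the dictionary keys are hypothetical, chosen only to match the terms used in this chapter.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts discussed in this chapter (hypothetical helper)."""
    parts = urlsplit(url)
    # The path is everything after the host; its last segment is the filename.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,      # e.g. "http"
        "host": parts.hostname,        # e.g. "www.whitehouse.gov"
        "directory": directory + "/",  # e.g. "/president/"
        "filename": filename,          # e.g. "index.html" ("" if none is given)
    }


print(describe_url("http://www.whitehouse.gov/president/index.html"))
# {'protocol': 'http', 'host': 'www.whitehouse.gov', 'directory': '/president/', 'filename': 'index.html'}
```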

The first part of the URL indicates what type of transfer protocol will be used to retrieve the specified document. The most common request is for a hypertext document that uses Hypertext Transfer Protocol (HTTP).

The second portion of the URL refers to the specific host computer on which the document resides, which is to be contacted by the browser software. This part of the address is also called the domain name. See Chapter 5, "How Internet Addresses and Domains Work," for more information about domains.
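A domain name is read as a series of labels separated by dots, with the most general label (here, gov) on the right. The short Python sketch below, assuming Python 3.9 or later, extracts the host portion of a URL and splits it into those labels. The helper name domain_labels is hypothetical; turning the name into an IP address is a separate lookup step that this sketch does not perform.

```python
from urllib.parse import urlsplit


def domain_labels(url: str) -> list[str]:
    """Return the dot-separated labels of a URL's host name (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    return host.split(".")


labels = domain_labels("http://www.whitehouse.gov/president/index.html")
print(labels)      # ['www', 'whitehouse', 'gov']
print(labels[-1])  # gov  (the top-level domain, on the far right)
```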

The third part of the URL is the directory on the host computer that contains a specific website or multiple websites. This is always located after the first single slash in the URL and is essentially the subdirectory on the hard disk that houses the website. Subdirectories might also be indicated in this part of the address. For example, if the previous URL were changed to http://www.whitehouse.gov/history/presidents/, there would be two subdirectories: history and presidents.
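The directory portion can be listed by splitting the path on its slashes. This Python sketch, assuming Python 3.9 or later, uses a simplifying assumption that is not a universal rule: if the path does not end in a slash, its last segment is treated as a filename rather than a directory. The helper name directories is hypothetical.

```python
from urllib.parse import urlsplit


def directories(url: str) -> list[str]:
    """List the directory names in a URL path, outermost first (hypothetical helper)."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]  # drop empty pieces from extra slashes
    # Assumption: without a trailing slash, the last segment is a filename.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return segments


print(directories("http://www.whitehouse.gov/history/presidents/"))    # ['history', 'presidents']
print(directories("http://www.whitehouse.gov/president/index.html"))  # ['president']
```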

In the preceding example, the filename is index.html. This is always the last portion of the URL. If an address has no filename, the web server delivers a default document, which is usually named index.html. In other words, index.html is typically the document a web server delivers to the client when no other filename is listed. (Note that the last portion of the URL is sometimes not a filename; it can be other information a web server requires, such as codes needed to log on to the web server.)
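The default-document rule can be expressed in a few lines. The following Python sketch assumes the default name is index.html, stored in a hypothetical constant DEFAULT_DOCUMENT; real web servers let administrators configure this name, so treat it as an illustration of the convention rather than how any particular server is implemented. The helper name resolve_document is also hypothetical.

```python
from urllib.parse import urlsplit

# Assumed default; on real servers this name is a configuration setting.
DEFAULT_DOCUMENT = "index.html"


def resolve_document(url: str) -> str:
    """Return the path a server would serve, applying the default document."""
    path = urlsplit(url).path or "/"  # an empty path means the top directory
    if path.endswith("/"):
        # No filename was given, so fall back to the default document.
        return path + DEFAULT_DOCUMENT
    return path


print(resolve_document("http://www.whitehouse.gov/history/presidents/"))
# /history/presidents/index.html
```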

The illustration in this chapter shows the process necessary to request and retrieve a web document. When a document is requested for the first time in a web-browsing session, the browser must first locate the host computer before it can find the file. After that, the specific subdirectory and document are retrieved.
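Once the host has been located (in practice, by looking up its IP address, for example with Python's socket.getaddrinfo) and a connection has been opened, the browser asks for the document by sending an HTTP request. The Python sketch below only builds the text of a minimal HTTP/1.1 GET request from a URL; it does not perform the lookup or open a connection. Real browsers send many more headers, and the helper name build_get_request is hypothetical.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request (no network access)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    host = parts.hostname or ""
    if parts.port is not None:
        host += f":{parts.port}"
    # Request line, Host header naming the server, then a blank line ends the request.
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


print(build_get_request("http://www.whitehouse.gov/president/index.html"))
```

The request line carries the directory and filename, while the Host header carries the domain name, so both halves of the URL reach the server.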