As you learned in Chapter 1, the Web isn't stored on any single computer, and no company owns the Web. Instead, the individual pieces (Web sites) are scattered across millions of computers (Web servers). Only a subtle illusion makes all these Web sites seem to be part of a single environment. In reality, the Web is just a set of standards that let independent computers talk to each other.
So how does your favorite browser navigate this tangled network of computers to find the Web page you want? It's all in the URL, the Web site address you type into your browser.
A URL (Uniform Resource Locator) consists of several pieces. Some of these pieces are optional, because they can be filled in by the browser or Web server automatically. Others are always required. Figure 3-1 dissects the URL http://www.SellMyJunkForMillions.com/Buyers/listings.htm.
Altogether, the URL packs a lot of information into one place, including:
The protocol is the way you communicate over the Web. Technically, it's the way that request and response messages are transmitted across your Internet connection. Web pages always use HTTP (Hypertext Transfer Protocol), which means the protocol is always http:// or https://. (The latter sends HTTP over an encrypted connection, which protects sensitive information you type in, like credit card numbers or passwords.) In most browsers, you can get away without typing this part of the URL. For example, when you type www.google.com, your browser will automatically convert it to the full URL http://www.google.com.
The domain identifies the Web server, which is the computer that hosts the Web site you want to see. As a convention, these computers usually have names that start with www to identify them as Web servers, although this isn't always the case. As you'll discover in this chapter, the friendly-seeming domain name is really just a façade hiding a numeric address.
The path identifies the location on the Web server where the Web page is stored. This part of the URL can have as many levels as needed. For example, the path /MyFiles/Sales/2005/ refers to a MyFiles folder that contains a Sales folder that, in turn, contains a folder named 2005. Windows fans, take note: the slashes in the path portion of the URL are ordinary forward slashes, not the backward slashes used in Windows file paths (like c:\MyFiles\Current). This convention is designed to match the file paths used by Unix-based computers, which were the first machines to host Web sites. It's also the convention used in modern Macintosh operating systems (Mac OS X and later).
The file name is the last part of the path. Often, you can recognize it by the file extension .htm or .html, both of which stand for HTML.
The bookmark is an optional part of a URL that identifies a specific position in a page. You can recognize a bookmark because it always starts with the hash character (#), and is placed after the file name. For example, the URL http://www.LousyDeals.com/index.html#New includes the bookmark #New. When clicked, it takes the visitor to the section of the index.html page where the New bookmark is placed. You'll learn about bookmarks in Chapter 8.
The query string is an optional part of the URL that some Web sites use to send extra instructions from one Web page to another. You can identify the query string because it starts with a question mark (?) character, and is placed after the file name. To see a query string in action, surf to www.google.com and perform a search for "pet platypus". When you click the Search button, you're directed to a URL like http://www.google.ca/search?hl=en&q=pet+platypus&meta=. This URL is a little tricky to analyze, but if you look for the question mark in the URL you'll discover that you're on a page named "search". The information to the right of the question mark is a series of name=value settings separated by ampersands (&). Here, it indicates that you're performing an English language search (hl=en) for pages that match both the "pet" and "platypus" keywords (q=pet+platypus, where the plus sign stands in for a space). When you request this URL, a specialized Google Web application analyzes the query string to determine what type of search it needs to perform.
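To make the pieces listed above concrete, here is a minimal sketch that splits a URL into the same parts using Python's standard urllib.parse module. The dissect function and the dictionary keys it returns are names invented for this illustration; they simply mirror the terms used in this chapter.

```python
from urllib.parse import urlsplit, parse_qs

def dissect(url):
    """Split a URL into the pieces described in this chapter (illustrative helper)."""
    parts = urlsplit(url)
    # Everything up to the last slash is the folder path; the rest is the file name.
    folder, _, file_name = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,        # http or https
        "domain": parts.netloc,          # the Web server's name
        "path": folder + "/",            # folders on the Web server
        "file_name": file_name,          # last part of the path (may be empty)
        "bookmark": parts.fragment,      # text after the # character, if any
        "query": parse_qs(parts.query),  # name=value pairs after the ? character
    }

pieces = dissect("http://www.SellMyJunkForMillions.com/Buyers/listings.htm")
print(pieces["domain"], pieces["path"], pieces["file_name"])

search = dissect("http://www.google.ca/search?hl=en&q=pet+platypus&meta=")
print(search["query"])
```

Note that parse_qs turns the plus sign in pet+platypus back into a space and, by default, leaves out settings with no value (like meta= in this example).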
Clearly, the URL packs a lot of useful information into one place. But how does a browser actually use the URL to request the Web page you want? To understand how this works, it helps to take a peek behind the scenes (see Figure 3-2).
The following list of steps shows a breakdown of what the browser needs to do when you type http://www.SellMyJunkForMillions.com/Buyers/listings.htm into the address bar and hit Enter:
First, the browser needs to figure out what Web server to contact. It does this by extracting the domain from the URL.
In this example, the domain is www.SellMyJunkForMillions.com.
In order to find the Web server named www.SellMyJunkForMillions.com, the browser needs to convert the domain name into a more computer-friendly number, which is called the IP address. Every computer on the Web (Web servers and regular PCs alike) has an IP address. To find the IP address for the Web server, the browser looks up the Web server's domain name in a giant catalog called the DNS (Domain Name System).
An IP address looks like a set of four numbers separated by periods (or, in techy speak, dots). For example, the www.SellMyJunkForMillions.com Web site may have the IP address 17.202.99.125.
Using the IP address, the browser sends the request to the Web server.
The actual route that the message takes is difficult to predict. It may pass through a number of other computers on the way.
When the Web server receives the request, it looks at the path and file name in the URL.
In this case, the Web server sees that the request is for a file named listings.htm in a folder named Buyers. It looks up that file, and then sends it back to the Web browser. If the file doesn't exist, it sends back an error message instead.
The browser gets the HTML page it's been waiting for (the listings.htm file), and renders it for your viewing pleasure.
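The first few steps above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser is written: FAKE_DNS is a made-up stand-in for the real DNS catalog (a real program could ask the operating system instead, for example with socket.gethostbyname), plan_request is a hypothetical helper name, and the IP address is the example value used earlier in this section. The request text follows the basic shape of an HTTP/1.1 GET message.

```python
from urllib.parse import urlsplit

# Stand-in for the DNS catalog (an assumption for this sketch).
# Domain names aren't case-sensitive, so the key is stored in lowercase.
FAKE_DNS = {"www.sellmyjunkformillions.com": "17.202.99.125"}

def plan_request(url, dns=FAKE_DNS):
    """Return the IP address to contact and the HTTP request text to send."""
    parts = urlsplit(url)
    domain = parts.hostname        # Step 1: extract the domain (lowercased)
    ip_address = dns[domain]       # Step 2: convert the domain to an IP address
    path = parts.path or "/"       # No path at all means "ask for the root"
    # Step 3: the message the browser sends to the Web server at that IP address.
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {domain}\r\n"
        "\r\n"
    )
    return ip_address, request

ip, message = plan_request("http://www.SellMyJunkForMillions.com/Buyers/listings.htm")
print(ip)
print(message)
```

The remaining steps (the Web server finding listings.htm and the browser rendering it) happen after this message arrives, so they're left out of the sketch.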
The URL http://www.SellMyJunkForMillions.com/Buyers/listings.htm is a typical example. However, in the wild, you'll sometimes come across URLs that seem a lot simpler. For instance, consider http://www.amazon.com. It clearly specifies the domain name (www.amazon.com), but it doesn't include any information about the path or file name. So what's a Web browser to do?
When your URL doesn't include a file name, the browser just sends the request as is, and lets the Web server decide what to do. The Web server sees that you aren't requesting a specific file, and so it sends you the site's default Web page, which is often named index.htm or index.html. However, the Web administrator can configure the Web server to use any Web page file name as the default.
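As a rough sketch of that decision, the following Python function mimics how a Web server might pick the file to send back. Real Web servers handle this through their own configuration settings; the resolve function, the DEFAULT_PAGES tuple, and the set of site files are all assumptions made up for this example.

```python
# Default page names to try, in order (configurable, as on a real Web server).
DEFAULT_PAGES = ("index.htm", "index.html")

def resolve(request_path, site_files, default_pages=DEFAULT_PAGES):
    """Return the file to send back, or None if nothing matches (an error page)."""
    path = request_path or "/"     # An empty path is treated as the site root
    if not path.endswith("/"):
        # A specific file was requested: send it only if it exists.
        return path if path in site_files else None
    # No file name given: try each default page name in this folder.
    for name in default_pages:
        candidate = path + name
        if candidate in site_files:
            return candidate
    return None

site_files = {"/index.html", "/Buyers/listings.htm"}
print(resolve("", site_files))                      # request with no path or file name
print(resolve("/Buyers/listings.htm", site_files))  # request for a specific file
print(resolve("/Buyers/missing.htm", site_files))   # file that doesn't exist
```

Passing a different default_pages value plays the role of the Web administrator changing the default page name.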
Now that you understand how URLs work, you're ready to integrate your own pages into the fabric of the Web.