A Closer Look at URLs

URLs are so common now that they appear with little or no explanation on TV commercials and bubble gum wrappers. But the home page URLs you hear in the media are only a small subset of the many options available with this versatile form. The URL is defined in RFC 1738.

Not all URLs refer to HTTP. In fact, the URL form was devised as a universal method for several different Internet protocols. The protocol portion of the URL is referred to as the scheme. The scheme identifies a protocol and therefore tells the computer how to interpret the rest of the URL. The general format for a URL is described in RFC 1738 as

 <scheme>:<scheme-specific-part> 

Table 17.1 shows some of the scheme options defined in RFC 1738. Other schemes are also possible. In fact, some new schemes have been added in later RFCs.

Table 17.1. URL Schemes

Scheme    Description
ftp       File Transfer Protocol
http      Hypertext Transfer Protocol
gopher    The Gopher protocol
mailto    Electronic mail
news      Usenet news
nntp      Usenet news with NNTP access
telnet    Interactive session (see Hour 15)
wais      Wide area information servers
file      Host-specific filenames

As the <scheme-specific-part> term in the general form of the URL suggests, the structure of the URL may differ, depending on the URL's scheme. The computer first reads the scheme, and the scheme tells the computer how to interpret the rest of the URL. Because this hour focuses on HTTP, this section concentrates on the HTTP form of the URL. It is worth noting, however, that you'll also encounter other schemes as you browse the Web. The ftp scheme is another common variant. Most modern Web browsers are capable of recognizing alternative schemes such as ftp and responding to the URL accordingly.
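The idea of reading the scheme first can be sketched in a few lines of Python. This is only an illustration of the <scheme>:<scheme-specific-part> form, not a full URL parser; the function name split_scheme and the example.com addresses are assumptions made for the example.

```python
def split_scheme(url):
    """Split a URL into (scheme, scheme-specific-part) at the first colon.

    Hypothetical helper for illustration; it does no validation.
    """
    scheme, _, scheme_specific_part = url.partition(":")
    # Schemes are case-insensitive, so normalize to lowercase.
    return scheme.lower(), scheme_specific_part


# The scheme determines how the remainder should be interpreted.
for url in (
    "http://www.dobro.com/",
    "ftp://ftp.example.com/pub/readme.txt",   # assumed example host
    "mailto:someone@example.com",             # assumed example address
):
    scheme, rest = split_scheme(url)
    print(scheme, "->", rest)
```

Notice that the mailto URL has no double slash or host portion at all. Everything after the colon is specific to the scheme.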

The general form for an HTTP URL is

http://<host>[:<port>]/<path>[;<parameters>][?<search>]

<host> is the DNS name of the server (for example, www.dobro.com), and <path> is the path to the HTML document or other resource. The other options are less common and are less familiar to the average user. Those options include

  • <port> The port number of the daemon or service to which the browser is connecting. (See Hour 6, "The Transport Layer," for more on port numbers.) The port number reserved for HTTP servers is TCP port 80. If the port number is omitted, port 80 is assumed.

  • <parameters> Optional parameters supplied by the client. The user almost never has to enter parameters in order to access a Web site. However, parameters are sometimes passed to the server through scripts.

  • <search> Lets the client send a query string to the server. The user almost never enters a query into a URL by hand. Watch the URL box of your Web browser when you enter a search through one of the Internet search engines. You may see a query string transmitted to the search server through the URL.
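To see these pieces side by side, the following sketch uses the urlparse function from Python's standard urllib.parse module to take an HTTP URL apart. The sample URL, its port 8080, and the lang and q values are invented for the example, and describe_http_url is a hypothetical helper name.

```python
from urllib.parse import urlparse

DEFAULT_HTTP_PORT = 80  # assumed when the URL omits <port>


def describe_http_url(url):
    """Return the components of an HTTP URL as a dictionary."""
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "port": parts.port if parts.port is not None else DEFAULT_HTTP_PORT,
        "path": parts.path,
        "parameters": parts.params,
        "search": parts.query,
    }


# A made-up URL that exercises every optional component.
full = describe_http_url(
    "http://www.dobro.com:8080/techniques/repair/fix.html;lang=en?q=resonator"
)
# A typical URL with only a host and a path.
simple = describe_http_url("http://www.dobro.com/techniques/repair/fix.html")

print(full)
print(simple)
```

In the second call, the port comes back as 80 because the URL does not name one, and the parameters and search values are empty strings.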

By the Way

Complex URLs containing ports, parameters, and queries are sometimes used to reconfigure the Web server itself. The Web server must possess the necessary extensions and scripts to process the configuration request.


If a connection has already been established, it is not necessary to use the entire URL to identify a resource. HTTP permits the use of a relative URL, a form specified in RFC 1808 as a companion to RFC 1738. The relative URL gives the location of the resource as referenced from the current page or from a default <BASE> location defined in the document. For example, if you are already on the home page specified with the URL http://www.dobro.com, the relative URL to the file

http://www.dobro.com/techniques/repair/fix.html

is techniques/repair/fix.html.

The relative URL might seem like a confusing way to save a few bits and keystrokes, but it offers benefits in building and deploying Web sites. As shown in Figure 17.3, if the Webmaster uses relative URLs for the internal links within a Web site, the complete directory structure for the site can be copied to a different server without disrupting the integrity of the links.
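A short sketch shows why relative URLs survive a move. It uses urljoin from Python's standard urllib.parse module, which resolves a relative URL against a base in the way a browser does. The second server name, mirror.example.com, is an assumption made for the example.

```python
from urllib.parse import urljoin

# The link as the Webmaster would write it inside the page.
RELATIVE_LINK = "techniques/repair/fix.html"

# Resolved from the original home page.
original = urljoin("http://www.dobro.com/", RELATIVE_LINK)

# Resolved after the site is copied to a different (hypothetical) server.
mirror = urljoin("http://mirror.example.com/", RELATIVE_LINK)

print(original)
print(mirror)
```

The link text never changes. Only the base it is resolved against differs, so the copied site's internal links keep working.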

Figure 17.3. Relative URLs make a Web site portable.




Sams Teach Yourself TCP/IP in 24 Hours (4th Edition)
ISBN: 0672329964
EAN: 2147483647
Year: 2003
Pages: 259
Authors: Joe Casad
