One of the most important things to grasp when working on the Web is the format for URLs. A URL is basically an address on the Web, identifying each document uniquely (for example, http://www.oreilly.com/products.html). Since URLs are so fundamental to the Web, we discuss them here in a little detail.

The simple syntax for a URL is:

http://host/path

- host: The host to connect to, such as www.oreilly.com or www.google.com. (While many web servers run on hosts beginning with www, the www prefix is just a convention.)
- path: The document requested on that server. This is not the same as the filesystem path, as its root is defined by the server.

Most URLs follow this simple syntax. A more generalized syntax, however, is:

scheme://host/path/extra-path-info?query-info

- scheme: The protocol that connects to the site. For web sites, the scheme is http; for FTP, the scheme is ftp.
- extra-path-info and query-info: Optional information used by CGI programs. See Chapter 12.

HTML documents also often use a "shorthand" for linking to other documents on the same server, called a relative URL. An example of a relative URL is images/webnut.gif. The browser translates this into complete URL syntax before sending the request. For example, if http://www.oreilly.com/books/webnut.html contains a reference to images/webnut.gif, the browser reconstructs the relative URL as a full (or absolute) URL, http://www.oreilly.com/books/images/webnut.gif, and requests that document independently (if needed).

Often in this book, you'll see us refer to a URI, not a URL. A URI (Uniform Resource Identifier) is a superset of URL, in anticipation of different resource naming conventions being developed for the Web. For the time being, however, the only URI syntax in practice is URL; so while purists might complain, you can safely assume that "URI" is synonymous with "URL" and not go wrong (yet).
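
The URL components and the relative-URL resolution described above can be tried out with a short sketch. This one uses Python's standard urllib.parse module, which is our choice for illustration and not something this book relies on. The /cgi-bin/search URL and its query string are hypothetical. Note that urlsplit reports the path as a single string; which part of it counts as extra-path-info is decided by the server, not by the URL syntax.

```python
from urllib.parse import urlsplit, urljoin

# Hypothetical URL with a query string, used only for illustration.
parts = urlsplit("http://www.oreilly.com/cgi-bin/search?term=webnut")

print(parts.scheme)  # scheme: http
print(parts.netloc)  # host: www.oreilly.com
print(parts.path)    # path (including any extra path info): /cgi-bin/search
print(parts.query)   # query-info: term=webnut

# Resolve a relative URL against the URL of the document that contains it,
# roughly as a browser does before sending the request.
base = "http://www.oreilly.com/books/webnut.html"
absolute = urljoin(base, "images/webnut.gif")
print(absolute)      # http://www.oreilly.com/books/images/webnut.gif
```

The relative reference replaces only the last segment of the base path (webnut.html), which is why the result lands under /books/.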