Chapter 5: Developing with Uniform Resource Identifiers


As the Web began to take shape, many problems associated with its massive growth arose. Being able to identify objects or resources in a way that avoided conflicts was one key problem for the creators of the Web. These objects were often files such as documents, graphics, or programs. Uniform Resource Identifiers (URIs) were created to solve the problem of unique object identification by specifying a universal set of namespaces that can be used to identify all resources. URIs play a critical role in network development because users often need to either interact with or refer to resources that are represented by URIs.

This chapter covers the components of a URI and introduces you to System.Uri, the Microsoft Windows .NET Framework class used to represent a URI. We'll discuss the most common techniques used when manipulating URIs with System.Uri and then delve into the aspects that developers often struggle with, such as understanding the escaping logic, comparing URIs, exposing URIs in your application, and working with different URI schemes.

Key Components of a URI

A URI is defined in Request for Comments (RFC) 2396 as a compact string of characters for identifying an abstract or a physical resource. In general, a URI is made up of two parts: the scheme and the scheme-specific part.

In Figure 5-1, you can see that the scheme is often associated with protocols seen on the Web today, while the scheme-specific part identifies the resource. Many scheme-specific parts of a URI also have an authority, a path, and a query, but these parts are not required.

Figure 5-1: Syntax and examples of the required URI parts

In Figure 5-2, you'll notice that the authority is often used to contain what is commonly considered to be the host name or host address. The path might contain file names, and the query is used to specify name/value pairs of information.

Figure 5-2: Syntax and examples of common (but not required) URI parts
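
The two figures can be approximated in a few lines of code. The following is a minimal sketch that uses Python's standard urllib.parse module purely as an illustration (it is not System.Uri, and the variable names are our own). It first splits a URI into the two required parts and then into the common optional parts.

```python
from urllib.parse import urlsplit

uri = "http://www.contoso.com/products/list.aspx?name=soap"

# Required parts: everything before the first colon is the scheme,
# everything after it is the scheme-specific part.
scheme, scheme_specific = uri.split(":", 1)

# Common (but not required) parts found inside the scheme-specific part.
parts = urlsplit(uri)

print(scheme)           # http
print(scheme_specific)  # //www.contoso.com/products/list.aspx?name=soap
print(parts.netloc)     # www.contoso.com  (the authority)
print(parts.path)       # /products/list.aspx
print(parts.query)      # name=soap
```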

Scheme Component

The scheme determines the logic that's used for parsing and, where possible, for resolving the resource specified in the scheme-specific part. Scheme names are defined in lowercase. However, because URIs are not always machine generated, most applications will accept a scheme in a case-insensitive manner.
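
As a rough illustration of case-insensitive scheme handling, the sketch below compares the schemes of two URIs after normalizing them to lowercase. It uses Python's urllib.parse as a stand-in for a URI parser; the helper name same_scheme is hypothetical.

```python
from urllib.parse import urlsplit

def same_scheme(uri_a: str, uri_b: str) -> bool:
    """Return True if both URIs use the same scheme, ignoring case."""
    # urlsplit already lowercases the scheme; the explicit lower() calls
    # simply make the normalization step visible.
    return urlsplit(uri_a).scheme.lower() == urlsplit(uri_b).scheme.lower()

print(urlsplit("HTTP://www.contoso.com/").scheme)                          # http
print(same_scheme("HTTP://www.contoso.com/", "http://www.contoso.com/"))  # True
```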

Authority Component

The authority component of a URI is defined as the top hierarchical element of the URI that governs the remainder of the namespace defined by the URI. The authority component is often preceded by a double slash (//). For example, consider the following URI:

 http://www.contoso.com/products/list.aspx?name=soap 

The authority in this example is www.contoso.com, and it's responsible for the remainder of the namespace.

Many schemes designate protocols that include a default port number to be used when resolving the resource. If the port number is omitted from the authority, the default is used. In the preceding example, no port is specified, so the HTTP default of 80 applies. An authority could specify a non-default port number, as follows:

 http://www.contoso.com:8080/products/list.aspx?name=soap 
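
The default-port rule can be sketched as follows. This example uses Python's urllib.parse for illustration only; the DEFAULT_PORTS table and the effective_port helper are assumptions of ours, and the table lists just a few well-known schemes.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table covering only a few well-known schemes.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def effective_port(uri: str):
    """Return the explicit port if present, else the scheme's default."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    # Returns None when the scheme is not in the table.
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://www.contoso.com/products/list.aspx?name=soap"))       # 80
print(effective_port("http://www.contoso.com:8080/products/list.aspx?name=soap"))  # 8080
```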

Path Component

The path is used to further identify the resource within the scope of the specified scheme and authority (when an authority is present). The path is often preceded by a single slash (/) and might contain multiple path segments separated by a slash. For example, the following URI contains three path segments:

 http://www.contoso.com/products/new/list.aspx?name=soap 

In this example, the segments are products, new, and list.aspx.

Note

Although list.aspx looks like a file name, it is simply a path segment. The general syntax for a URI does not define the notion of a file and an extension. Developers should be careful about making any assumptions based on the notion that a file name and an extension can be reliably parsed from a URI. There's no guarantee that a segment that looks like a file and an extension is not just a directory with a dot in the middle.
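
A short sketch of path segmentation follows, again using Python's urllib.parse as an illustrative parser. The path_segments helper and the second URI (with a dotted directory name) are hypothetical; the second URI shows why a dot in a segment does not prove that the segment is a file.

```python
from urllib.parse import urlsplit

def path_segments(uri: str) -> list:
    """Split the path component into its non-empty segments."""
    return [segment for segment in urlsplit(uri).path.split("/") if segment]

print(path_segments("http://www.contoso.com/products/new/list.aspx?name=soap"))
# ['products', 'new', 'list.aspx']

# Hypothetical URI whose last segment is a directory with a dot in its name.
print(path_segments("http://www.contoso.com/archive/reports.2003/"))
# ['archive', 'reports.2003']
```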

Query Component

The query, also called the query string, is a string of information that's interpreted by the resource. Developers should be aware, however, that the query component is considered part of the URI, so its contents can determine which resource is obtained when the URI is resolved.
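
To make the point concrete, the sketch below parses a query string into name/value pairs and shows that two URIs differing only in their query are different URIs. It relies on Python's urllib.parse for illustration; the second URI and the variable names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

soap = "http://www.contoso.com/products/list.aspx?name=soap"
towels = "http://www.contoso.com/products/list.aspx?name=towels"  # hypothetical

# Name/value pairs carried by the query component.
soap_query = parse_qs(urlsplit(soap).query)
print(soap_query)  # {'name': ['soap']}

# Same path, but the query is part of the URI, so the URIs differ.
same_path = urlsplit(soap).path == urlsplit(towels).path
same_uri = soap == towels
print(same_path, same_uri)  # True False
```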

Note that some URI schemes, such as HTTP, support the notion of a fragment. A fragment is a string of information that's meant to be interpreted by the program resolving the URI and is not considered part of the URI. The fragment is separated from the URI by a crosshatch (#) character. For example, suppose the following URI and fragment are entered into a browser:

 http://www.contoso.com/products/new/list.htm#newproducts 

In this example, the newproducts fragment might be interpreted by the browser as a bookmark telling it which portion of the page it should display. The resource in this case is represented by http://www.contoso.com/products/new/list.htm. Because the fragment is not part of the URI, its contents are not used to determine the resource that's obtained if the resource can be resolved.
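
Separating the fragment from the URI might look like the following sketch, which uses the urldefrag function from Python's urllib.parse as an illustration. The variable names are our own.

```python
from urllib.parse import urldefrag

entered = "http://www.contoso.com/products/new/list.htm#newproducts"

# urldefrag returns the URI without the fragment, plus the fragment itself.
resource_uri, fragment = urldefrag(entered)

print(resource_uri)  # http://www.contoso.com/products/new/list.htm
print(fragment)      # newproducts
```

Only resource_uri identifies the resource to retrieve; the fragment stays with the program that resolves the URI.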

URI Types

You've probably noticed by now that we've been careful to point out that not all resources represented by a URI can be resolved. In fact, there are two principal types of URIs: URLs and URNs. The type that most people are familiar with is the Uniform Resource Locator (URL). URLs are the subset of URIs that identify a resource by indicating how the resource can be accessed. For example, most of the URIs displayed thus far in this chapter fit into the URL category because you can use them to access a resource. However, URIs can also be used to name things without necessarily describing how they are accessed. In the case where the URI defines a name, it's called a Uniform Resource Name (URN). The benefit of having a URN is that it can be used to identify a resource even after the resource ceases to exist or becomes unavailable. For example, the URI urn:people:santaclaus could be used to name Santa Claus, but it does not give us the information necessary to locate him.
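
The difference shows up when the two kinds of URI are parsed. In this sketch, which uses Python's urllib.parse for illustration only, the URN still has a scheme and a scheme-specific part, but it carries no authority that a program could contact.

```python
from urllib.parse import urlsplit

url_parts = urlsplit("http://www.contoso.com/products/list.aspx?name=soap")
urn_parts = urlsplit("urn:people:santaclaus")

print(url_parts.scheme, repr(url_parts.netloc))  # http 'www.contoso.com'
print(urn_parts.scheme, repr(urn_parts.netloc))  # urn ''
print(urn_parts.path)                            # people:santaclaus
```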

The one other type distinction that's important to understand for URIs is that of an absolute URI versus a relative URI. An absolute URI contains a scheme and a scheme-specific part. So far in this chapter, we've mostly been talking about absolute URIs. It's also possible to have a relative URI, which is a URI reference that's interpreted relative to some base URI. A relative URI does not contain a scheme. In most cases, a relative URI contains only the path and query components. Because the scheme and the authority are not present, they must be known through some other means. For example, an application downloading an HTML page might find an absolute URI in the document that serves as the base and then find relative links within the embedded HTML. If no such absolute URI is defined in the document, the application might assume that the original URI for the downloaded HTML page is the base URI.
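
Resolving relative URIs against a base can be sketched as follows, using urljoin from Python's urllib.parse as an illustration. The base is assumed to be the URI of the downloaded page, and the relative references are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# Assumed base: the original URI of the downloaded HTML page.
base = "http://www.contoso.com/products/new/list.htm"

def is_relative(reference: str) -> bool:
    """A relative URI reference carries no scheme."""
    return urlsplit(reference).scheme == ""

# Hypothetical relative links found in the page.
sibling = urljoin(base, "details.htm?name=soap")
rooted = urljoin(base, "/support/contact.htm")
parent = urljoin(base, "../list.aspx")

print(is_relative("details.htm?name=soap"))  # True
print(sibling)  # http://www.contoso.com/products/new/details.htm?name=soap
print(rooted)   # http://www.contoso.com/support/contact.htm
print(parent)   # http://www.contoso.com/products/list.aspx
```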




Network Programming for the Microsoft® .NET Framework (Pro-Developer)
ISBN: 073561959X
Year: 2003
Pages: 121
