Recipe 14.3. Parsing a URI

Problem

You need to split a uniform resource identifier (URI) into its constituent parts.

Solution

Construct a System.Uri object, passing the URI string to the constructor. The constructor parses the URI into its constituent parts and exposes them through the Uri properties. You can then display the pieces individually, as shown in Example 14-1.

Example 14-1. ParseUri method

    // Requires: using System; using System.Text;
    public static void ParseUri(string uriString)
    {
        try
        {
            // Use just one of the constructors for the System.Uri class.
            // This will parse it for us.
            Uri uri = new Uri(uriString);

            // Look at the information we can get at now...
            StringBuilder uriParts = new StringBuilder();
            uriParts.AppendFormat("AbsoluteURI: {0}{1}",
                                uri.AbsoluteUri, Environment.NewLine);
            uriParts.AppendFormat("AbsolutePath: {0}{1}",
                                uri.AbsolutePath, Environment.NewLine);
            uriParts.AppendFormat("Scheme: {0}{1}",
                                uri.Scheme, Environment.NewLine);
            uriParts.AppendFormat("UserInfo: {0}{1}",
                                uri.UserInfo, Environment.NewLine);
            uriParts.AppendFormat("Authority: {0}{1}",
                                uri.Authority, Environment.NewLine);
            uriParts.AppendFormat("DnsSafeHost: {0}{1}",
                                uri.DnsSafeHost, Environment.NewLine);
            uriParts.AppendFormat("Host: {0}{1}",
                                uri.Host, Environment.NewLine);
            uriParts.AppendFormat("HostNameType: {0}{1}",
                                uri.HostNameType.ToString(), Environment.NewLine);
            uriParts.AppendFormat("Port: {0}{1}", uri.Port, Environment.NewLine);
            uriParts.AppendFormat("Path: {0}{1}", uri.LocalPath, Environment.NewLine);
            uriParts.AppendFormat("QueryString: {0}{1}", uri.Query, Environment.NewLine);
            uriParts.AppendFormat("Path and QueryString: {0}{1}",
                                uri.PathAndQuery, Environment.NewLine);
            uriParts.AppendFormat("Fragment: {0}{1}", uri.Fragment, Environment.NewLine);
            uriParts.AppendFormat("Original String: {0}{1}",
                                uri.OriginalString, Environment.NewLine);
            uriParts.AppendFormat("Segments: {0}", Environment.NewLine);
            for (int i = 0; i < uri.Segments.Length; i++)
                uriParts.AppendFormat("    Segment {0}: {1}{2}",
                                i, uri.Segments[i], Environment.NewLine);

            // GetComponents can be used to get commonly used combinations
            // of URI information.
            uriParts.AppendFormat("GetComponents for specialized combinations: {0}",
                                Environment.NewLine);
            uriParts.AppendFormat("Host and Port (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.HostAndPort,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("HttpRequestUrl (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.HttpRequestUrl,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("HttpRequestUrl (escaped): {0}{1}",
                                uri.GetComponents(UriComponents.HttpRequestUrl,
                                UriFormat.UriEscaped), Environment.NewLine);
            uriParts.AppendFormat("HttpRequestUrl (safeunescaped): {0}{1}",
                                uri.GetComponents(UriComponents.HttpRequestUrl,
                                UriFormat.SafeUnescaped), Environment.NewLine);
            uriParts.AppendFormat("Scheme And Server (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.SchemeAndServer,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("SerializationInfo String (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.SerializationInfoString,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("StrongAuthority (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.StrongAuthority,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("StrongPort (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.StrongPort,
                                UriFormat.Unescaped), Environment.NewLine);

            // Write out our summary.
            Console.WriteLine(uriParts.ToString());
        }
        catch (ArgumentNullException e)
        {
            // uriString is a null reference.
            Console.WriteLine("Uri string object is a null reference: {0}", e);
        }
        catch (UriFormatException e)
        {
            // uriString is empty or not in a recognizable URI format.
            Console.WriteLine("Uri formatting error: {0}", e);
        }
    }

Discussion

The Solution code uses the Uri class to do the heavy lifting. The constructor for the Uri class can throw two types of exceptions: an ArgumentNullException and a UriFormatException. The ArgumentNullException is thrown when the uriString argument is null. The UriFormatException is thrown when the uriString argument is in an incorrect or indeterminate format. The following error conditions can cause a UriFormatException:

  • An empty Uri was passed in.

  • The scheme specified in the Uri is not correctly formed. See the Uri.CheckSchemeName method.

  • The URI passed in contains too many slashes.

  • The password specified in the passed-in URI is invalid.

  • The hostname specified in the passed-in URI is invalid.

  • The filename specified in the passed-in URI is invalid.

  • The username specified in the passed-in URI is invalid.

  • The host or authority name specified in the passed-in URI cannot be terminated by backslashes.

  • The port number specified in the passed-in URI is invalid or cannot be parsed.

  • The length of the passed-in URI exceeds 65,534 characters.

  • The length of the scheme specified in the passed-in URI exceeds 1023 characters.

  • There is an invalid character sequence in the passed-in URI.

No validation is performed to ensure that the username, host or authority name, password, or port number actually exist or are correct. The checks confirm only that each part is in the correct format according to the URI specification (RFC 2396).
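
When a malformed URI is an expected input rather than an exceptional one, the try/catch approach can be replaced with a Boolean check. The following is a minimal sketch, not part of the original recipe, built on Uri.TryCreate. The class name UriValidationSketch and the helper IsWellFormedAbsolute are hypothetical names chosen for illustration. It also illustrates the point above: a syntactically valid URI is accepted even if its host does not exist.

```csharp
using System;

public static class UriValidationSketch
{
    // Hypothetical helper: returns true when the string parses as an absolute URI.
    // Uri.TryCreate reports failure through its return value instead of throwing.
    public static bool IsWellFormedAbsolute(string uriString)
    {
        Uri parsed;
        return Uri.TryCreate(uriString, UriKind.Absolute, out parsed);
    }

    public static void Main()
    {
        // Only the format is checked; the host is never resolved or contacted.
        Console.WriteLine(IsWellFormedAbsolute("http://no-such-host.invalid:8080/index.htm"));

        // A port number outside the valid range is a format error.
        Console.WriteLine(IsWellFormedAbsolute("http://localhost:99999/"));

        // An empty string is rejected as well.
        Console.WriteLine(IsWellFormedAbsolute(""));
    }
}
```

If the parsed Uri is needed afterward, the helper could return it through an out parameter instead of discarding it.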


System.Uri provides methods to compare, parse, and combine URIs. It covers most URI manipulation needs and is used by other classes in the Framework wherever a URI is called for. The syntax for the pieces of a URI is this:

 [scheme]://[user]:[password]@[host/authority]:[port]/[path];[params]?[query string]#[fragment]
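
The combining and comparing mentioned above can be sketched briefly. This example is an illustration rather than code from the recipe. The class UriCombineSketch and its helpers Combine and SameResource are hypothetical names. It relies on the Uri(Uri, string) constructor to resolve a relative reference against a base URI, and on Uri.Equals, which ignores the fragment and works on the normalized (lowercased) scheme and host.

```csharp
using System;

public static class UriCombineSketch
{
    // Hypothetical helper: resolves a relative reference against a base URI.
    public static Uri Combine(string baseUri, string relative)
    {
        return new Uri(new Uri(baseUri), relative);
    }

    // Hypothetical helper: compares two URI strings after parsing them.
    public static bool SameResource(string first, string second)
    {
        return new Uri(first).Equals(new Uri(second));
    }

    public static void Main()
    {
        Uri combined = Combine("http://localhost:8080/www.abc.com/", "home.htm?item=1233");
        Console.WriteLine(combined.AbsoluteUri);

        // The fragment and the letter case of scheme and host do not affect equality.
        Console.WriteLine(SameResource(
            "http://localhost:8080/www.abc.com/home.htm#stuff",
            "HTTP://LOCALHOST:8080/www.abc.com/home.htm"));

        // A different path makes the URIs unequal.
        Console.WriteLine(SameResource(
            "http://localhost:8080/www.abc.com/home.htm",
            "http://localhost:8080/www.abc.com/other.htm"));
    }
}
```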

If you pass the following URI to ParseUri:

http://user:password@localhost:8080/www.abc.com/home%20page.htm?item=1233#stuff

it will display the following items:

 AbsoluteURI: http://user:password@localhost:8080/www.abc.com/home%20page.htm?item=1233#stuff
 AbsolutePath: /www.abc.com/home%20page.htm
 Scheme: http
 UserInfo: user:password
 Authority: localhost:8080
 DnsSafeHost: localhost
 Host: localhost
 HostNameType: Dns
 Port: 8080
 Path: /www.abc.com/home page.htm
 QueryString: ?item=1233
 Path and QueryString: /www.abc.com/home%20page.htm?item=1233
 Fragment: #stuff
 Original String: http://user:password@localhost:8080/www.abc.com/home%20page.htm?item=1233#stuff
 Segments:
     Segment 0: /
     Segment 1: www.abc.com/
     Segment 2: home%20page.htm
 GetComponents for specialized combinations:
 Host and Port (unescaped): localhost:8080
 HttpRequestUrl (unescaped): http://localhost:8080/www.abc.com/home page.htm?item=1233
 HttpRequestUrl (escaped): http://localhost:8080/www.abc.com/home%20page.htm?item=1233
 HttpRequestUrl (safeunescaped): http://localhost:8080/www.abc.com/home page.htm?item=1233
 Scheme And Server (unescaped): http://localhost:8080
 SerializationInfo String (unescaped): http://user:password@localhost:8080/www.abc.com/home page.htm?item=1233#stuff
 StrongAuthority (unescaped): user:password@localhost:8080
 StrongPort (unescaped): 8080
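
The parts listed in this output can also be consumed directly rather than printed. As a rough sketch (the class name UriPartsSketch and the method DescribePort are hypothetical, not part of the recipe), the following reads a few properties of a URI like the one in the output and contrasts an explicitly specified port with a scheme default, using IsDefaultPort and UriComponents.StrongPort.

```csharp
using System;

public static class UriPartsSketch
{
    // Hypothetical helper: returns the port as text, noting when it is the scheme default.
    public static string DescribePort(Uri uri)
    {
        // StrongPort yields a port even when the URI omits it and the scheme supplies a default.
        string port = uri.GetComponents(UriComponents.StrongPort, UriFormat.Unescaped);
        return uri.IsDefaultPort ? port + " (default)" : port;
    }

    public static void Main()
    {
        Uri uri = new Uri(
            "http://user:password@localhost:8080/www.abc.com/home%20page.htm?item=1233#stuff");

        Console.WriteLine(uri.Host);            // host name without the port
        Console.WriteLine(uri.AbsolutePath);    // escaped path
        Console.WriteLine(uri.LocalPath);       // unescaped path
        Console.WriteLine(uri.Segments.Length); // number of path segments

        // An explicit port versus one implied by the http scheme.
        Console.WriteLine(DescribePort(uri));
        Console.WriteLine(DescribePort(new Uri("http://localhost/home.htm")));
    }
}
```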

See Also

See the "Uri Class," "ArgumentNullException Class," and "UriFormatException Class" topics in the MSDN documentation.


