Web Network Programming | Microsoft Visual J# .NET (Core Reference) (Pro-Developer)


Sockets are the foundation of many higher-level APIs for building distributed applications that communicate over the Internet (or an intranet). They're lean and mean: they're fast because they offer only the minimum functionality needed to transport data from one computer to another. In this section, we'll examine some of the classes that Microsoft has layered on top of sockets for building systems that use upper-layer protocols, such as HTTP, to handle Web requests and to provide features that raw sockets lack. You'll learn about the approach that the .NET Framework Class Library adopts for defining the HTTP classes and about how the model can be extended to implement other application-defined protocols.

If you've built Web-based systems, you should be familiar with the cycle of operations for a typical exchange of messages between a client and a server. In a Web environment, the client is often a browser such as Microsoft Internet Explorer and the server is a Web server such as Microsoft Internet Information Services (IIS). For the moment, we'll ignore Web services in which the client might be another custom application.

The server resides on a computer that usually listens for incoming client requests using a server socket bound to a well-known port. (Port 80 is the well-known port for ordinary HTTP requests.) A Web server offers resources that a client can access by specifying an appropriate Uniform Resource Identifier (URI). A URI comprises several pieces: the protocol or scheme for communicating with the server, the address of the server computer itself, and an optional string identifying the particular resource needed. For example, the URI http://www.contentmaster.com/pages/framesets/mainex_fr.asp specifies the protocol as HTTP, the name of the server computer as www.contentmaster.com, and the requested resource as /pages/framesets/mainex_fr.asp.
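Because J# shares Java's syntax, this sketch is written in standard Java (java.* classes only) rather than against System.Net; it illustrates the idea, not the .NET API. It uses java.net.URI to pull the example URI apart into the pieces just described and to work out which port a client would connect to when the URI does not name one. The class name UriParts and the small table of default ports are assumptions made for illustration.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: splits an absolute URI into scheme, server, and resource.
class UriParts {
    static String[] split(String text) {
        URI uri = URI.create(text);
        return new String[] { uri.getScheme(), uri.getHost(), uri.getPath() };
    }

    // Port to connect to: an explicit port wins; otherwise use the scheme's well-known port.
    static int effectivePort(String text) {
        URI uri = URI.create(text);
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        switch (uri.getScheme().toLowerCase(Locale.ROOT)) {
            case "http":  return 80;
            case "https": return 443;
            case "ftp":   return 21;
            default:      return -1; // scheme not known to this sketch
        }
    }

    public static void main(String[] args) {
        String address = "http://www.contentmaster.com/pages/framesets/mainex_fr.asp";
        String[] parts = split(address);
        System.out.println("scheme:   " + parts[0]);               // http
        System.out.println("server:   " + parts[1]);               // www.contentmaster.com
        System.out.println("resource: " + parts[2]);               // /pages/framesets/mainex_fr.asp
        System.out.println("port:     " + effectivePort(address)); // 80
    }
}
```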

How the resource request is interpreted is entirely up to the server. In this case, the resource is an Active Server Pages (ASP) page. The server processes the request and sends back a reply, which in the case of an HTTP server is most likely encoded as an HTML or XML response. HTTP is usually described as connectionless, because each request and its reply form a self-contained exchange; before sending a request, the client uses TCP/IP services such as DNS to discover the IP address of the destination server. The client fires off a request, which the server can accept, deny, or not bother answering at all. If no reply is received within a reasonable period of time, most browsers will time out and display a default error page. If a reply is sent back, it might arrive as a single packet or as many, depending on the nature of the resource requested. The browser receives all this data, pieces it together, and displays the result.

Connection-oriented protocols, such as File Transfer Protocol (FTP), might also be used when you access data over the Web. By using higher-level protocols, you can also add features that are not available with the lower-level protocols. For example, even though sockets can identify the source of a request, they carry no information about the credentials of the process on the client computer that issued that request. Protocols such as HTTP and HTTPS can transmit information about the identity of the requesting process to a server; this information is transported through the underlying socket as ordinary data. (HTTPS uses Secure Sockets Layer (SSL) to encrypt this data; more on this later in the chapter.)
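To make the point about identity traveling as ordinary data concrete, here is a minimal standard-Java sketch (not .NET code) of one widely used mechanism, HTTP Basic authentication, in which the client sends an Authorization header containing the Base64-encoded user name and password. The class name BasicAuthHeader is hypothetical; the sample credentials are the example pair used in the HTTP Basic authentication RFCs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of an HTTP Basic "Authorization" header.
// Base64 is an encoding, not encryption, so anyone who can read the socket
// traffic can recover the credentials unless the connection uses HTTPS.
class BasicAuthHeader {
    static String value(String user, String password) {
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // prints "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
        System.out.println("Authorization: " + value("Aladdin", "open sesame"));
    }
}
```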

Pluggable Protocols

Besides HTTP, HTTPS, and FTP, client applications might want to use other protocols such as Gopher, News, or NNTP, and more protocols might be added in the future (and some might disappear). However, the basic model is often the same: the client requests a URI and waits for a server response that (it hopes) is in a format it expects. Rather than tie its design to a specific protocol, the .NET Framework Class Library uses a flexible, extensible system of pluggable protocols.

The System.Net namespace contains two abstract classes, WebRequest and WebResponse, that you can use as a basis for implementing the request/response model of the Web. The WebRequest class defines the basic framework needed to issue a request for a URI. The WebResponse class specifies the essential functionality exposed by the response to a Web request. Both classes are protocol-independent. The intention is that a developer can derive specialized versions of the classes to handle each required protocol. The .NET Framework Class Library already contains the HttpWebRequest and HttpWebResponse classes, which implement HTTP versions of these classes; they are also located in the System.Net namespace. Similarly, the FileWebRequest and FileWebResponse classes provide a file system implementation of these two classes (using the file: scheme).

Note

Do not confuse the HttpWebRequest and HttpWebResponse classes in System.Net with the HttpRequest and HttpResponse classes in System.Web. The System.Web classes are used primarily by ASP.NET code running on the Web server and are intended to process requests from clients and construct responses to send to clients. ASP.NET is covered in detail in Part V of this book.

When you define your own request class, you must register the scheme (such as http or ftp) that it handles by calling the static RegisterPrefix method of the WebRequest class. RegisterPrefix takes the prefix and an object that implements the IWebRequestCreate interface; that object's Create method is responsible for constructing instances of your request class. The WebRequest class maintains a small in-memory table of registered prefixes and their creators. A given prefix (such as ftp) can be registered against only a single request class. WebRequest.RegisterPrefix returns true if the prefix was successfully registered and false otherwise.

Note

The prefixes http , https , and file are automatically registered by .NET. The http and https prefixes are handled by the HttpWebRequest class, and the file prefix is handled by the FileWebRequest class. You cannot reregister these prefixes against your own request classes.

Registering Request Classes

An alternative to using WebRequest.RegisterPrefix to register a request class is to use the application configuration file. This file has an optional <webRequestModules> section in which you can specify prefixes and associated request classes. The default configuration setting is shown here:

    <configuration>
      <system.net>
        <webRequestModules>
          <add prefix="http" type="System.Net.HttpRequestCreator" />
          <add prefix="https" type="System.Net.HttpRequestCreator" />
          <add prefix="file" type="System.Net.FileWebRequestCreator" />
        </webRequestModules>
      </system.net>
    </configuration>

You create an instance of the request class by executing the static Create method of the WebRequest class, passing it the URI, including the scheme name, of the requested resource. (WebRequest acts as a factory for different request types.) For example, if you've defined your own class for processing FTP requests called FtpRequest, the following lines of code will register it with the ftp prefix and instantiate it to handle an FTP request to a server at the Fourth Coffee company:

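    import System.Net.*;

    // FtpRequest is our own request class. Because an instance is passed to
    // RegisterPrefix, it must also implement IWebRequestCreate.
    WebRequest.RegisterPrefix("ftp", new FtpRequest());
    WebRequest request = WebRequest.Create("ftp://ftp.fourthcoffee.com");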

The Create method examines the prefix specified in the URI and builds an instance of the request class registered against that prefix. It is common practice to use a WebRequest variable to refer to the underlying request object. The WebRequest class contains most of the functionality you're likely to need, and writing code in this way decouples it from the underlying protocol, making it easier to switch if the need arises. If a derived request object exposes additional properties and methods, you can cast the return value of Create to the appropriate type to access them:

    FtpRequest request = (FtpRequest)WebRequest.Create("ftp://ftp.fourthcoffee.com");
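The prefix registry and factory behavior described above can be modeled in a few lines. The following is a hypothetical standard-Java sketch, not the System.Net implementation: RequestSketch, FtpRequestSketch, and RequestRegistry are invented names, a java.util.function.Function plays the role of IWebRequestCreate, and lookup is simplified to an exact match on the URI scheme.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical base class standing in for WebRequest.
abstract class RequestSketch {
    final URI uri;

    RequestSketch(URI uri) {
        this.uri = uri;
    }
}

// Hypothetical protocol-specific request, standing in for a custom FtpRequest.
class FtpRequestSketch extends RequestSketch {
    FtpRequestSketch(URI uri) {
        super(uri);
    }
}

// Registry plus factory: at most one creator per prefix, chosen by URI scheme.
class RequestRegistry {
    private static final Map<String, Function<URI, RequestSketch>> creators = new HashMap<>();

    // Like RegisterPrefix: returns false if the prefix is already registered.
    static synchronized boolean registerPrefix(String prefix,
                                               Function<URI, RequestSketch> creator) {
        return creators.putIfAbsent(prefix.toLowerCase(Locale.ROOT), creator) == null;
    }

    // Like Create: inspect the URI's scheme and build the registered request type.
    static synchronized RequestSketch create(String address) {
        URI uri = URI.create(address);
        if (uri.getScheme() == null) {
            throw new IllegalArgumentException("URI has no scheme: " + address);
        }
        Function<URI, RequestSketch> creator =
                creators.get(uri.getScheme().toLowerCase(Locale.ROOT));
        if (creator == null) {
            throw new UnsupportedOperationException("No creator registered for " + uri.getScheme());
        }
        return creator.apply(uri);
    }
}
```

Registering a second creator for the same prefix returns false, and creating a request for a scheme nobody has registered fails; the .NET Create method reports that case by throwing NotSupportedException.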

Requesting and Receiving Data Using HTTP

Creating an HTTP connection is simply a matter of using a valid URI containing the http prefix to create a WebRequest object. The PageReader class in the PageReader.jsl sample file (available in the BasicWeb project) illustrates how to connect to an HTTP server, request a page of data, and process the response. The main method contacts the server at www.microsoft.com and requests the home page ms.htm:

    WebRequest request = WebRequest.Create("http://www.microsoft.com/ms.htm");

This statement creates an HttpWebRequest object, but it does not send anything to the server yet. The request is sent when you retrieve the reply using the GetResponse method of the request object. This method will block until the server replies (or the request times out), and then it will create a WebResponse object. By default, an HttpWebRequest object will wait up to 100 seconds for a reply. The program changes the timeout to 10 seconds by modifying the Timeout property, which is specified in milliseconds:

    request.set_Timeout(10000);   // 10 seconds
    WebResponse response = request.GetResponse();

Strictly speaking, the response variable in this example refers to an HttpWebResponse object, and you can cast the return value from GetResponse if you need to. (There are a few additional properties and methods specific to HTTP that you can access.) To retrieve the reply from the server, you read the data using the Stream object returned by the GetResponseStream method. A word of warning: because you can use the Stream class's Read method to populate a ubyte array, and you can determine how long the response is by querying the ContentLength property, you might be tempted to create a ubyte array big enough to hold the entire response and issue a single Read request to fetch it. This approach is unreliable. Read returns as soon as some data has arrived, so it can return fewer bytes than you asked for; if the Web server cannot deliver data to the underlying socket quickly enough, a single Read fetches only part of the response and the rest is never read. (ContentLength can also be -1 if the server does not send a Content-Length header.) You can end up missing a lot of data!

A more tolerant approach is to create a StreamReader based on the response stream and read the data a line at a time:

    StreamReader reader = new StreamReader(response.GetResponseStream());
    String pageData = reader.ReadLine();
    while (pageData != null)
    {
        Console.WriteLine(pageData);
        pageData = reader.ReadLine();
    }

Once all the data has been extracted, you should close the response object, as shown here:

 response.Close();

If you compile and run the program, it will display the HTML text for the Microsoft home page.
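For comparison, standard Java's HttpURLConnection follows the same lazy pattern as WebRequest: opening the connection object sends nothing, and the request goes out only when you ask for the response. The sketch below (a hypothetical helper named TimeoutSketch) prepares a request with a 10-second limit without touching the network. Note that Java splits the single Timeout setting into a connect timeout and a read timeout, and that a value of zero means wait indefinitely.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class TimeoutSketch {
    // Prepares (but does not send) a GET request with explicit time limits.
    // The request is sent only when the response is requested, for example
    // by calling getInputStream() or getResponseCode().
    static HttpURLConnection prepare(String address, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis); // time allowed to establish the connection
            conn.setReadTimeout(timeoutMillis);    // time allowed for each wait on the response
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```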

An HTML page can link to other resources, such as images and sounds. You must request these resources as well, using the URIs embedded in the HTML response received from the server. These resources usually consist of binary data, so you should not use a text-based stream to read them; use a BinaryReader instead. Alternatively, you can use the DownloadFile method of a WebClient object. (The WebClient class will be described shortly.)
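Whether you read text or binary data, the safe pattern is to keep calling Read until the stream reports that it has ended, because each call may return fewer bytes than you asked for. The following standard-Java sketch shows that loop; ReadAll and TricklingStream are hypothetical names, and TricklingStream merely imitates a server that delivers a response in small pieces. (Java's read returns -1 at the end of the stream, whereas the .NET Stream.Read method returns 0.)

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadAll {
    // Keeps reading until end of stream; each read() may return fewer bytes than requested.
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}

// Test stream that hands back at most 'chunk' bytes per read call.
class TricklingStream extends InputStream {
    private final byte[] data;
    private final int chunk;
    private int pos = 0;

    TricklingStream(byte[] data, int chunk) {
        this.data = data;
        this.chunk = chunk;
    }

    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) return 0;
        if (pos >= data.length) return -1;
        int n = Math.min(Math.min(len, chunk), data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }
}
```

A single read into a buffer sized for the whole response returns only the first piece, while the loop recovers every byte.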

In addition, before sending its final response, an HTTP 1.1 server can send one or more interim 100 (Continue) responses. A typical case is a request that carries a body: by sending an Expect: 100-continue header, the client can ask the server to confirm that it will accept the request before the body is transmitted, and the server answers with 100 Continue. You can handle these interim responses in your applications by assigning an HttpContinueDelegate to the ContinueDelegate property of the HttpWebRequest object. (Use set_ContinueDelegate in J#.) The method referenced by the delegate will be invoked automatically for each 100 Continue response received.

If you connect to the Internet using a Web proxy, you should create a WebProxy object and attach it to the request object so that the request is routed correctly. The following code fragment creates a WebProxy that routes requests through the proxy server listening on port 80 at the address http://myproxy. The Boolean value (true) indicates that the proxy should not be used for URIs on the local intranet.

    WebRequest request = ...;
    WebProxy proxy = new WebProxy("http://myproxy:80", true);
    request.set_Proxy(proxy);
    WebResponse response = request.GetResponse();   // uses the proxy

The WebProxy class has a highly overloaded constructor, and you can specify a range of different values and settings. One useful constructor allows you to provide an array of URIs that will bypass the proxy. If you want to ensure that the same proxy is used by all Web requests, you can use the GlobalProxySelection class instead of setting the Proxy property of each WebRequest object. This class specifies global proxy settings for use by all requests in the application. It exposes a public static property called Select that you can use to get or set the global proxy:

    WebProxy proxy = ...;
    GlobalProxySelection.set_Select(proxy);

Note that setting the Proxy property for a WebRequest object overrides the global proxy setting. You can also indicate that a proxy should not be used by setting the Proxy property to the value returned by the static GetEmptyWebProxy method of the GlobalProxySelection class:

    WebRequest request = ...;
    request.set_Proxy(GlobalProxySelection.GetEmptyWebProxy());
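The precedence rules above can be summarized in a small standard-Java sketch that uses java.net.Proxy, where Proxy.NO_PROXY serves as the equivalent of the empty proxy. The helper class ProxyChoice is hypothetical; it models the decision only and does not configure any real request.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

class ProxyChoice {
    // A proxy set on the request (even NO_PROXY) wins; otherwise the
    // application-wide proxy applies; with neither, connect directly.
    static Proxy effective(Proxy perRequest, Proxy global) {
        if (perRequest != null) {
            return perRequest;
        }
        return global != null ? global : Proxy.NO_PROXY;
    }

    // createUnresolved avoids a DNS lookup when the proxy object is built.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }
}
```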

Note

The System.Net namespace also includes the classes FileWebRequest and FileWebResponse. These are used for accessing local files using the file:// scheme. The methods you use to request and retrieve a file are similar to those described in this section for handling HTTP requests. We won't discuss these classes further.

Web Access and Security

As with sockets, using the Web classes requires the appropriate permission. Assemblies that work with the Web classes must be granted System.Net.WebPermission. Again, as with sockets, assemblies that you download from the Internet or an intranet, or that you load from a network share, are not granted this permission by default. You can, however, modify the code access security policy to grant it to them. The .NET Framework Configuration tool also allows you to selectively grant assemblies access to individual Web sites.

Posting Data

By default, the HttpWebRequest class uses the HTTP GET method when submitting a request to a URI. You can change the type of request by setting the Method property of an HttpWebRequest object before retrieving any response. (If you try to change it afterward, you'll trip an InvalidOperationException.) Valid values include HEAD, POST, PUT, DELETE, TRACE, and OPTIONS. We won't describe all these methods here, but we'll look at how to use the POST verb to submit data to a Web server.

The POST verb is commonly used when sending large volumes of data. (Large in this case means more than can be handled in the query string of a GET command.) It is often used with HTML forms. Data is sent to the Web server as a stream. The Web server often executes a program to read this stream, process it, and send back any results. This program might be a CGI script, an ASP page, or just about any other type of executable that can handle streamed input.

The server must be informed of the type of data in the stream and how much data to expect. The HTTP protocol does this by setting fields in the header at the start of the request with this information. The HTTP header is transmitted first, followed by the data stream. The WebRequest class exposes the HTTP header fields as properties. The following code fragment transmits the contents of the string variable data using an HTTP POST operation. The Method , ContentType , and ContentLength properties set the fields in the HTTP header with appropriate values. (For details on HTTP content types, see the HTTP 1.1 specification.)

The GetRequestStream method returns a stream that you can use for sending the data to the Web server. If you want to send a string, it makes sense to create a StreamWriter wrapper around this stream. Certain characters have special meanings in URIs and in streams posted to Web servers. These characters must be replaced with an appropriate escape sequence. For example, spaces should be replaced with +, and the & character should be replaced by the sequence %26. The static UrlEncode method of the System.Web.HttpUtility class does this and returns an encoded string that can be submitted safely to a Web server. In this example, the data string is encoded for transmission before being sent. Keep in mind that ContentLength must match the number of bytes actually transmitted, which is the length of the encoded string rather than the original one. (If data holds several name=value pairs, encode each name and value separately and then join them with = and &; otherwise, the separators themselves are escaped.) You should close the stream when all the data has been output.

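    import System.IO.*;
    import System.Net.*;
    import System.Web.*;   // HttpUtility: reference System.Web.dll

    String data = ...;
    WebRequest request = ...;
    request.set_Method("POST");
    request.set_ContentType("application/x-www-form-urlencoded");
    String encodedData = HttpUtility.UrlEncode(data);
    // Send the length of the encoded data; UrlEncode emits only ASCII
    // characters, so the character count equals the byte count.
    request.set_ContentLength(encodedData.length());
    StreamWriter writer = new StreamWriter(request.GetRequestStream());
    writer.Write(encodedData);
    writer.Close();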
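In standard Java, java.net.URLEncoder performs comparable form-data escaping. The following sketch (FormBody is a hypothetical helper name, and the field values are made up) encodes each field name and value separately, joins them with = and &, and computes the Content-Length value from the encoded bytes rather than from the original text.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBody {
    // Encodes names and values individually so that the '=' and '&'
    // separators survive while the same characters inside values are escaped.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Content-Length counts the bytes of the encoded body that is actually sent.
    static int contentLength(String encodedBody) {
        return encodedBody.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Fourth Coffee");
        fields.put("query", "beans & cups");
        String body = encode(fields);
        System.out.println(body);                // name=Fourth+Coffee&query=beans+%26+cups
        System.out.println(contentLength(body)); // 39
    }
}
```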

Tip

The classes in the System.Web namespace are implemented in the assembly System.Web.dll. Be sure to reference this assembly when you compile an application that uses this namespace. From the command line, use the /reference flag, as described in earlier chapters. If you're using Visual Studio .NET, choose Add Reference from the Project menu; in the Add Reference dialog box, select System.Web.dll and then click OK.

If all is well and the Web server understands the request, you can call the GetResponse method of the request object, open the response stream by calling GetResponseStream on the returned WebResponse, and process any reply as before. If the server does not understand the request, the content length indicated does not match the actual length of the content, or some other sort of error occurs, the attempt to retrieve the response from the server will throw a WebException. If the server did send back an error reply, the exception's Response property gives you access to it, including the HTTP status code.

The HttpWebRequest and HttpWebResponse Objects

Most of the time, plain WebRequest and WebResponse objects will provide an adequate interface for interacting with the Web using HTTP. However, the HttpWebRequest and HttpWebResponse classes contain some additional methods and properties that you can use if you're writing specialized code. In particular, the CookieContainer property of HttpWebRequest and the Cookies property of HttpWebResponse let you work with the cookies associated with a request or a response, and the ClientCertificates property of HttpWebRequest lets you get or set the collection of X.509 client certificates sent with a request.

Processing Requests Asynchronously

The examples shown so far send a request and receive a response synchronously. In the world of unreliable connections and indeterminate response times that is the Internet, sending and receiving data in this manner can tie up an application for a long time. The solution, as ever, is to use the thread pool and perform these operations asynchronously. (Because the application is no longer blocked while it waits, you can afford to give slow requests longer to succeed. Note, however, that the Timeout property does not apply to asynchronous requests; if you want to give up on an asynchronous request, you must cancel it yourself.)

The WebRequest class supplies the BeginGetRequestStream and BeginGetResponse methods, which follow the familiar asynchronous pattern: both methods return an IAsyncResult object and expect an AsyncCallback delegate that refers to a method to be executed by a thread from the thread pool, along with a user-defined state object. The matching EndGetRequestStream method blocks until the request stream has been established, and it returns a stream handle. The EndGetResponse method also blocks until a reply is received from the Web server, and it returns a WebResponse object you can use to read and process the reply:

    WebRequest request = ...;

    // Wait for a response asynchronously
    request.BeginGetResponse(new AsyncCallback(waitForResponse), request);

    ...

    private void waitForResponse(IAsyncResult ar)
    {
        // Extract the WebRequest object from the AsyncState property
        WebRequest request = (WebRequest)ar.get_AsyncState();

        // Wait for the response, blocking this pool thread if necessary
        WebResponse response = request.EndGetResponse(ar);

        // Read the response and process it a line at a time
        StreamReader reader = new StreamReader(response.GetResponseStream());
        String pageData = reader.ReadLine();
        while (pageData != null)
        {
            Console.WriteLine(pageData);
            pageData = reader.ReadLine();
        }
        reader.Close();
        response.Close();
    }

You can cancel an asynchronous call to BeginGetResponse by calling the Abort method of the request object.
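A rough standard-Java analogue of the Begin/End pattern uses CompletableFuture: the blocking work runs on a pool thread, and a callback receives the result without blocking the caller. The sketch below is hypothetical (AsyncFetch is an invented name, and the fetch is any Supplier you pass in). It also applies a time limit with orTimeout, which fails the returned future but does not stop the work itself, much as you must call Abort to cancel an outstanding Web request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

class AsyncFetch {
    // Runs 'fetch' (for example, a blocking read of a response) on a pool
    // thread and passes its result to 'callback'. If no result arrives within
    // timeoutMillis, the returned future completes with a TimeoutException
    // and the callback is not invoked; the fetch itself keeps running.
    static CompletableFuture<Void> begin(Supplier<String> fetch, Consumer<String> callback,
                                         Executor pool, long timeoutMillis) {
        return CompletableFuture.supplyAsync(fetch, pool)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .thenAccept(callback);
    }
}
```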

Using a WebClient Object

Just as the TcpClient class wraps the code for creating a client socket and setting its properties, the WebClient class does the same for a WebRequest. The WebClient class exposes methods that let you upload and download data to and from a Web server while hiding, for example, the complexities of setting the properties needed to perform a POST operation. You can send data to a server using the UploadData, UploadFile, and UploadValues methods. You can use the OpenWrite method to open the request stream to send data to the server. (You still need to encode the data to make sure it is not misinterpreted by the server.) The DownloadData and DownloadFile methods retrieve data from a Web server to a byte array or a local file on the client, and the OpenRead method returns a handle to the response stream.

The following code shows an alternative implementation of the main method of the PageReader class that uses a WebClient rather than a WebRequest to read the ms.htm page at www.microsoft.com:

    public static void main(String[] args)
    {
        WebClient client = new WebClient();

        // Send a request for the home page at www.microsoft.com
        StreamReader reader =
            new StreamReader(client.OpenRead("http://www.microsoft.com/ms.htm"));

        // Read the response and display it a line at a time
        String pageData = reader.ReadLine();
        while (pageData != null)
        {
            Console.WriteLine(pageData);
            pageData = reader.ReadLine();
        }

        // Close the stream
        reader.Close();
    }

HTTP Connection Management and Pooling

HTTP was originally designed to be a connectionless protocol because of the nature of client requests and the overall fragility of the Internet. Just because a client had issued a request, there was no guarantee that it would stay connected and issue another, or that the connection between the client and the server would not evaporate because of a switching error somewhere! These were valid concerns, but the overall cost of connecting and disconnecting each time a client sent a request to the same server became prohibitive. Consider a Web page containing embedded images and other resources: it requires a number of requests to transmit the entire content of the page to the client. Continual connecting and disconnecting began to affect the rate at which data could pour through the Internet as a whole, because a large proportion of the packets flowing around the Internet dealt with handling connections and routing rather than sending real data.

The HTTP 1.1 protocol addressed these issues and provided persistent connections. This involved the client setting a flag in the HTTP header of the initial request asking for a persistent connection, and when the client had finished it would send another HTTP header containing a close flag. (In fact, the default in HTTP 1.1 is to assume a persistent connection unless the client requests otherwise.) A server does not have to honor the request to keep a persistent connection open, however. (This doesn't result in an error, but a new connection must be established the next time the client communicates.) Also, most servers will time out and close a connection that has been inactive for a period of time.

HTTP 1.1 also supports pipelining. A client can send a series of requests without waiting for a response each time. The server should process these requests and send back responses in the same order that the requests were received. The client can then stream through the responses as they are received.

You can indicate that an HttpWebRequest object should use a persistent connection by setting its KeepAlive property (set_KeepAlive in J#) to true (which is the default value). Setting this property to false will send an HTTP header with the close flag set. You can ask that a request be pipelined by setting the Pipelined (set_Pipelined) property of the request object to true. Pipelining requires that KeepAlive also be set to true. These properties are not available through the WebRequest class.
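Pipelining works because responses come back in the order the requests were sent, so the client can match each response to its oldest outstanding request. This hypothetical standard-Java model (PipelinedConnection is an invented name, and nothing is sent over a network) shows that bookkeeping with a first-in, first-out queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class PipelinedConnection {
    private final Queue<String> outstanding = new ArrayDeque<>();

    // Sends a request without waiting for the responses to earlier ones.
    void send(String path) {
        outstanding.add(path);
    }

    // Responses arrive in request order, so each belongs to the oldest
    // outstanding request. Throws NoSuchElementException if none is pending.
    String receive(String responseBody) {
        String path = outstanding.remove();
        return path + " -> " + responseBody;
    }

    int pending() {
        return outstanding.size();
    }
}
```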

Persistent connections are a potentially expensive resource. The .NET Framework Class Library implements persistent HTTP connection pooling. When an HttpWebRequest object connects to a URI, a ServicePoint object is created that caches the connection. If another HttpWebRequest object accesses the same server (the resource could be different), the same connection will be used and shared by this second request. You can query the ServicePoint object used by an HttpWebRequest object using the ServicePoint property (get_ServicePoint). Remember that the connection is established only after a request object attempts to actually send data to or retrieve data from a Web server. You can also obtain a handle on a ServicePoint by calling the static FindServicePoint method of the ServicePointManager class, passing in a URI that specifies the Web server in question:

    ServicePoint pooledConnection =
        ServicePointManager.FindServicePoint(new Uri("http://www.microsoft.com"));

If there is currently no existing ServicePoint for the specified Web server, FindServicePoint will create one.
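The get-or-create behavior of FindServicePoint can be sketched with a concurrent map keyed by server. This is a hypothetical standard-Java model, not the .NET implementation: ServicePointSketch and ServicePointTable are invented names, and treating the scheme, lowercase host, and port as the identity of a server is an assumption.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ServicePoint: one object per server.
class ServicePointSketch {
    final String server;

    ServicePointSketch(String server) {
        this.server = server;
    }
}

class ServicePointTable {
    private static final Map<String, ServicePointSketch> points = new ConcurrentHashMap<>();

    // Returns the existing entry for the URI's server, creating one if needed.
    // The resource path is ignored, so different pages share one entry.
    static ServicePointSketch find(String address) {
        URI uri = URI.create(address);
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        int port = uri.getPort() != -1 ? uri.getPort() : ("https".equals(scheme) ? 443 : 80);
        String key = scheme + "://" + uri.getHost().toLowerCase(Locale.ROOT) + ":" + port;
        return points.computeIfAbsent(key, ServicePointSketch::new);
    }
}
```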

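By default, a ServicePoint allows no more than two concurrent connections to the same Web server, following the HTTP 1.1 specification's recommendation that a client maintain at most two simultaneous connections to any one server. If a third request is made to the same Web server while two are in progress, the request will block until one of the first two is closed or a timeout occurs. However, you can increase the ConnectionLimit property of a ServicePoint object to allow more concurrent connections: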

    // Increase the connection limit to 4
    pooledConnection.set_ConnectionLimit(4);

You can change the default connection limit for all ServicePoint objects by setting the DefaultConnectionLimit property of the ServicePointManager class. The new value affects only service points created after the property has been changed.
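A per-server connection limit behaves like a counting semaphore: each request must obtain a permit before using a connection and hands it back when done. The following hypothetical standard-Java sketch (ConnectionLimiter is an invented name) models the blocking described above, with a timeout on the wait.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class ConnectionLimiter {
    private final Semaphore permits;

    ConnectionLimiter(int limit) {
        permits = new Semaphore(limit, true); // fair: waiting requests are served in order
    }

    // Waits up to timeoutMillis for a free connection; false means the wait timed out.
    boolean acquire(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called when a request has finished with its connection.
    void release() {
        permits.release();
    }
}
```

With a limit of 2, a third caller waits until one of the first two releases its connection; constructing the limiter with 4 corresponds to raising ConnectionLimit.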

To examine the number of active connections a ServicePoint has, you can query its CurrentConnections property. As requests disconnect, the number of active connections using a ServicePoint can drop to zero. If a ServicePoint remains unused for longer than its maximum idle time, it will be recycled; the default idle time is 900,000 ms (15 minutes). You can modify this setting using the MaxIdleTime property:

    // Set the idle time to 30 s (30,000 ms)
    pooledConnection.set_MaxIdleTime(30000);
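The recycling rule can be expressed as a simple check: a service point with no active connections becomes eligible for recycling once it has been idle for its maximum idle time. The following hypothetical standard-Java sketch (IdleTracker is an invented name) takes the current time as a parameter so that the rule is easy to test; treating "exactly at the limit" as expired is an assumption.

```java
// Hypothetical model of idle-time recycling in the spirit of MaxIdleTime.
// All times are in milliseconds and are supplied by the caller.
class IdleTracker {
    private final long maxIdleMillis;
    private long lastActivityMillis;
    private int currentConnections;

    IdleTracker(long maxIdleMillis, long nowMillis) {
        this.maxIdleMillis = maxIdleMillis;
        this.lastActivityMillis = nowMillis;
    }

    void connect(long nowMillis) {
        currentConnections++;
        lastActivityMillis = nowMillis;
    }

    void disconnect(long nowMillis) {
        if (currentConnections > 0) {
            currentConnections--;
        }
        lastActivityMillis = nowMillis;
    }

    // Recyclable only when nothing is connected and the idle period has expired.
    boolean shouldRecycle(long nowMillis) {
        return currentConnections == 0 && nowMillis - lastActivityMillis >= maxIdleMillis;
    }
}
```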

You can limit the size of the ServicePoint pool by setting the MaxServicePoints property of the ServicePointManager class. This value defaults to zero, which means that there is no limit on the size of the pool.
