Recipe 14.8. Obtaining the HTML from a URL

Problem

You need to get the HTML returned from a web server in order to examine it for items of interest. For example, you could examine the returned HTML for links to other pages or for headlines from a news site.

Solution

Use the web communication methods set up in Recipes 14.5 and 14.6 (GenerateHttpWebRequest and VerifyResponse) to make the HTTP request and verify the response; then read the HTML from the stream returned by the GetResponseStream method of the HttpWebResponse object:

    // Requires: using System; using System.IO; using System.Net; using System.Text;
    public static string GetHtmlFromUrl(string url)
    {
        if (string.IsNullOrEmpty(url))
            throw new ArgumentNullException("url", "Parameter is null or empty");

        string html = "";
        // GenerateHttpWebRequest and VerifyResponse come from Recipes 14.5 and 14.6.
        HttpWebRequest request = GenerateHttpWebRequest(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (VerifyResponse(response) == ResponseCategories.Success)
            {
                // Get the response stream.
                Stream responseStream = response.GetResponseStream();
                // Use a stream reader that understands UTF8.
                using (StreamReader reader =
                    new StreamReader(responseStream, Encoding.UTF8))
                {
                    html = reader.ReadToEnd();
                }
            }
        }
        return html;
    }

Discussion

The GetHtmlFromUrl method gets a web page using the GenerateHttpWebRequest and GetResponse methods, verifies the response using the VerifyResponse method, and then, once it has a valid response, reads the HTML that was returned.

The GetResponseStream method of HttpWebResponse returns the body of the response message as a System.IO.Stream. To read that body as text, create a StreamReader over the response stream, passing Encoding.UTF8 so that UTF-8-encoded text is decoded correctly. Then call the StreamReader's ReadToEnd method, which reads all of the remaining content into the string variable html, and return that string. If the response was not verified as successful, the method returns an empty string.
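The recipe always decodes the body as UTF-8. As a minimal sketch that is not part of the original recipe, the helper below uses the same StreamReader/ReadToEnd pattern but picks the decoder from a charset name, such as the one a server declares in its Content-Type header and that HttpWebResponse exposes through its CharacterSet property. The class and method names (ResponseBodyReader, ResolveEncoding, ReadBody) are hypothetical, and falling back to UTF-8 when the charset is missing or unrecognized is an assumption about the behavior you want.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class; not part of the recipe or of the .NET Framework.
public static class ResponseBodyReader
{
    // Maps a charset name (for example, HttpWebResponse.CharacterSet) to an
    // Encoding. Assumption: UTF-8 is an acceptable fallback.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrEmpty(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers quote the value, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // GetEncoding throws ArgumentException for unknown or unsupported names.
            return Encoding.UTF8;
        }
    }

    // Reads an entire stream as text, as the recipe does with the response stream.
    // StreamReader(Stream, Encoding) still honors a byte order mark if one is present.
    public static string ReadBody(Stream body, string charset)
    {
        using (StreamReader reader = new StreamReader(body, ResolveEncoding(charset)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside GetHtmlFromUrl, the reader block could then be replaced by `html = ResponseBodyReader.ReadBody(response.GetResponseStream(), response.CharacterSet);`. Because the helper accepts any Stream, it can be tried out with a MemoryStream and no network connection.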

See Also

See the "HttpWebResponse.GetResponseStream Method," "Stream Class," and "StreamReader Class" topics in the MSDN documentation.



C# Cookbook