Recipe 17.2. Accessing Content Within an HTML Document


Problem

You need to extract some information from within a web page.

Solution

Sample code folder: Chapter 17\UseHTMLDOM

While you could use standard string-manipulation techniques to scan through a web page, doing so is tedious and error-prone. If the HTML content you need to parse has a consistent format with identifiable tags and elements, you can use Microsoft's Managed HTML Document Object Model (DOM) to traverse the HTML content as a set of objects.

Discussion

This recipe builds on the code developed in Recipe 17.1. Create a new Windows Forms project following the instructions in that recipe. Now add the following additional code to the form's code template:

 Private Sub WebContent_DocumentCompleted( _
       ByVal sender As Object, ByVal e As _
       System.Windows.Forms. _
       WebBrowserDocumentCompletedEventArgs) _
       Handles WebContent.DocumentCompleted
    ' ----- Extract the title and display it.
    MsgBox(WebContent.Document.Title)
 End Sub

Run the program, and as you browse from page to page, the title of each page will appear in a message box.

The Managed HTML DOM, made available through the WebBrowser control's Document property, provides object-based access to all elements of an HTML page, including links (via the Links property), cookies associated with the page (via the Cookie property, a single string), and the body content (via the Body property). You can search for specific elements by ID using the GetElementById() method.
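
The properties mentioned above can be sketched in a single handler. The following is a minimal illustration rather than the recipe's sample code: the form name DomSummaryForm, the address http://www.example.com/, and the element ID "header" are all hypothetical placeholders, and the browser control is created in code so that the listing stands on its own.

```vb
Imports System.Text
Imports System.Windows.Forms

' Hypothetical form name; make it the project's startup form to try it.
Public Class DomSummaryForm
    Inherits Form

    ' Named WebContent to match the control used in this recipe.
    Private WithEvents WebContent As New WebBrowser()

    Public Sub New()
        WebContent.Dock = DockStyle.Fill
        Me.Controls.Add(WebContent)
        ' Placeholder address; substitute the page you want to inspect.
        WebContent.Navigate("http://www.example.com/")
    End Sub

    Private Sub WebContent_DocumentCompleted( _
            ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles WebContent.DocumentCompleted
        Dim doc As HtmlDocument = WebContent.Document
        If doc Is Nothing Then Return

        ' ----- Collect a few facts about the loaded page.
        Dim summary As New StringBuilder()
        summary.AppendLine("Title: " & doc.Title)
        summary.AppendLine("Links: " & doc.Links.Count.ToString())
        summary.AppendLine("Cookie: " & doc.Cookie)

        ' ----- Body and its InnerText can be Nothing for some pages.
        If doc.Body IsNot Nothing AndAlso _
                doc.Body.InnerText IsNot Nothing Then
            summary.AppendLine("Body text length: " & _
                doc.Body.InnerText.Length.ToString())
        End If

        ' ----- "header" is an assumed ID; GetElementById returns
        '       Nothing when no element on the page has that ID.
        Dim header As HtmlElement = doc.GetElementById("header")
        If header IsNot Nothing Then
            summary.AppendLine("Header tag: " & header.TagName)
        End If

        MsgBox(summary.ToString())
    End Sub
End Class
```

Every lookup is checked against Nothing before use, because a page may lack a body, visible text, or the requested ID. On pages that contain frames, DocumentCompleted can fire more than once, so the message box may appear several times for a single navigation.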

Specific use of the Managed HTML DOM is beyond the scope of this book. Use the MSDN documentation supplied with Visual Studio to obtain information about the HtmlElement class and other classes used within the DOM.

See Also

Recipe 17.1 includes most of the code used in this recipe. Recipe 17.3 uses the HTML DOM to access links within a web page.




Visual Basic 2005 Cookbook: Solutions for VB 2005 Programmers (Cookbooks (OReilly))
ISBN: 0596101775
Year: 2006
Pages: 400
