You need to extract some information from within a web page.
Sample code folder: Chapter 17\UseHTMLDOM
While you could use standard string-manipulation techniques to scan through the raw text of a web page, doing so is tedious and fragile. If the HTML content you need to parse has a consistent format with identifiable tags and elements, you can instead use Microsoft's Managed HTML Document Object Model (DOM), exposed through the HtmlDocument and HtmlElement classes in the System.Windows.Forms namespace, to traverse the HTML content as a set of objects.
This recipe builds on the code developed in Recipe 17.1. Create a new Windows Forms project following the instructions in that recipe, and then add the following code to the form's code template:
Private Sub WebContent_DocumentCompleted( _
      ByVal sender As Object, ByVal e As _
      System.Windows.Forms. _
      WebBrowserDocumentCompletedEventArgs) _
      Handles WebContent.DocumentCompleted
   ' ----- Extract the title and display it.
   MsgBox(WebContent.Document.Title)
End Sub
Run the program, and as you browse from page to page, the title of each page will appear in a message box.
The Managed HTML DOM, made available through the WebBrowser control's Document property (an HtmlDocument object), provides object-based access to all elements of an HTML page, including its hyperlinks (via the Links property, an HtmlElementCollection), the cookies associated with the page (via the Cookie property, a single String), and the body content (via the Body property, an HtmlElement). You can locate a specific element by its id attribute using the GetElementById() method.
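The DOM members described above can be tried out without visiting a live site. The following is a minimal sketch, written as a standalone console program rather than as part of the recipe's form: it loads a hypothetical sample page from a string through the WebBrowser control's DocumentText property and then, in the DocumentCompleted handler, reads Title, enumerates Links, calls GetElementById(), and walks Body as a tree of HtmlElement objects. The page content and the names HtmlDomDemo, RunDemo, ShowBrowser, DescribeTree, and Results are made up for illustration. Because the WebBrowser control requires a single-threaded apartment with a message loop, RunDemo hosts the form on its own STA thread. The Cookie property is not read here, on the assumption that a page loaded from a string has no site cookies worth showing.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Threading
Imports System.Windows.Forms

Public Module HtmlDomDemo
    ' ----- Hypothetical sample page, loaded from a string so no network access is needed.
    Private Const SamplePage As String = _
        "<html><head><title>Sample Page</title></head><body>" & _
        "<p id=""intro"">Hello, DOM.</p>" & _
        "<a href=""http://example.com/one"">One</a> " & _
        "<a href=""http://example.com/two"">Two</a>" & _
        "</body></html>"

    ' ----- Lines of extracted information, filled in by the DocumentCompleted handler.
    Public ReadOnly Results As New List(Of String)

    Private WithEvents Browser As WebBrowser
    Private HostForm As Form

    ' ----- The WebBrowser control needs an STA thread with a message loop,
    '       so host it on a dedicated thread and wait for that thread to finish.
    Public Sub RunDemo()
        Dim uiThread As New Thread(AddressOf ShowBrowser)
        uiThread.SetApartmentState(ApartmentState.STA)
        uiThread.Start()
        uiThread.Join()
    End Sub

    Private Sub ShowBrowser()
        HostForm = New Form()
        Browser = New WebBrowser()
        Browser.Dock = DockStyle.Fill
        HostForm.Controls.Add(Browser)
        ' ----- Setting DocumentText loads the page; DocumentCompleted follows.
        Browser.DocumentText = SamplePage
        Application.Run(HostForm)
    End Sub

    Private Sub Browser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles Browser.DocumentCompleted
        Dim doc As HtmlDocument = Browser.Document

        Results.Add("Title: " & doc.Title)

        ' ----- Links is an HtmlElementCollection of the page's hyperlinks.
        For Each link As HtmlElement In doc.Links
            Results.Add("Link: " & link.GetAttribute("href"))
        Next

        ' ----- Look up a single element by its id attribute.
        Dim intro As HtmlElement = doc.GetElementById("intro")
        If intro IsNot Nothing Then Results.Add("Intro: " & intro.InnerText)

        ' ----- Body is the <body> element; walk it as a tree of HtmlElement objects.
        If doc.Body IsNot Nothing Then DescribeTree(doc.Body, 0)

        ' ----- Close the form after this handler returns, which ends Application.Run.
        HostForm.BeginInvoke(New MethodInvoker(AddressOf HostForm.Close))
    End Sub

    ' ----- Hypothetical helper: records each element's tag name, indented by depth.
    Private Sub DescribeTree(ByVal element As HtmlElement, ByVal depth As Integer)
        Results.Add(New String(" "c, depth * 2) & element.TagName)
        For Each child As HtmlElement In element.Children
            DescribeTree(child, depth + 1)
        Next
    End Sub

    Sub Main()
        RunDemo()
        For Each line As String In Results
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Compile it as a console application with a reference to System.Windows.Forms.dll. In the recipe's own form, the body of Browser_DocumentCompleted would go inside WebContent_DocumentCompleted instead, using WebContent.Document in place of Browser.Document, and no separate thread would be needed.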
Specific use of the Managed HTML DOM is beyond the scope of this book. Use the MSDN documentation supplied with Visual Studio to obtain information about the HtmlElement class and other classes used within the DOM.
Recipe 17.1 includes most of the code used in this recipe. Recipe 17.3 uses the HTML DOM to access links within a web page.