Recipe 17.3. Getting All Links from a Web Page

Problem


You want to build a list of the hyperlinks included in a specific web page.

Solution


Sample code folder: Chapter 17\ListWebLinks

Use the Managed HTML DOM to traverse the list of web page links as objects.

Discussion


This recipe's sample code builds a list of links from a web page. Create a new Windows Forms application, and add the following controls to Form1:

  • A TextBox control named WebAddress.

  • A Button control named ActGo. Set its Text property to Go.

  • A WebBrowser control named WebContent.

  • A ListBox control named WebLinks.

Add informational labels if desired, and arrange the controls to look like Figure 17-4.

Figure 17-4. Controls for the listing web links sample

Next, add the following source code to the form's class template:

    Private Class LinkDetail
       ' ----- Holds one link; the ListBox shows ToString().
       Public LinkURL As String
       Public LinkText As String

       Public Overrides Function ToString() As String
          Return LinkText
       End Function
    End Class

    Private Sub ActGo_Click(ByVal sender As System.Object, _
          ByVal e As System.EventArgs) Handles ActGo.Click
       ' ----- Jump to a new web page.
       If (Trim(WebAddress.Text) <> "") Then
          WebLinks.Items.Clear()
          WebContent.Navigate(WebAddress.Text)
       End If
    End Sub

    Private Sub WebContent_DocumentCompleted( _
          ByVal sender As Object, _
          ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) _
          Handles WebContent.DocumentCompleted
       ' ----- Build the list of links.
       Dim oneLink As HtmlElement
       Dim newLink As LinkDetail

       ' ----- Scan through all the links.
       For Each oneLink In WebContent.Document.Links
          ' ----- Build a new link entry.
          newLink = New LinkDetail
          If (oneLink.InnerText = "") Then
             newLink.LinkText = "[Image or Unknown]"
          Else
             newLink.LinkText = oneLink.InnerText
          End If
          newLink.LinkURL = oneLink.GetAttribute("href")

          ' ----- Add the link to the list.
          WebLinks.Items.Add(newLink)
       Next oneLink
    End Sub

    Private Sub WebLinks_DoubleClick(ByVal sender As Object, _
          ByVal e As System.EventArgs) Handles WebLinks.DoubleClick
       ' ----- Show the detail of a web link.
       Dim linkContent As LinkDetail

       If (WebLinks.SelectedIndex = -1) Then Return
       linkContent = CType(WebLinks.SelectedItem, LinkDetail)
       MsgBox("Display = " & linkContent.LinkText & vbCrLf & _
          "URL = " & linkContent.LinkURL)
    End Sub

Run the program, enter an address in the TextBox control, and click the Go button. The web page appears, as does the list of its links. Double-click a link to display its target URL, as shown in Figure 17-5.

Figure 17-5. Displaying the URL for a parsed web link
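
If you need only the target addresses rather than a display list, the same traversal of the document's Links collection can be wrapped in a small helper. The following is a minimal sketch, not part of the recipe's sample code: the LinkHelpers module and the CollectLinkUrls function are hypothetical names introduced here for illustration, and the choice to skip empty and repeated href values is an assumption about what a caller might want.

```vb
Imports System.Collections.Generic
Imports System.Windows.Forms

Public Module LinkHelpers
    ' Hypothetical helper (not in the sample code folder).
    ' Returns the distinct, non-empty href values of a loaded document.
    Public Function CollectLinkUrls(ByVal doc As HtmlDocument) _
          As List(Of String)
        Dim result As New List(Of String)

        ' ----- Nothing to scan if no document has been loaded yet.
        If (doc Is Nothing) Then Return result

        ' ----- Scan through all the links, as the recipe does.
        For Each oneLink As HtmlElement In doc.Links
            Dim target As String = oneLink.GetAttribute("href")

            ' ----- Assumption: ignore empty and repeated targets.
            If (String.IsNullOrEmpty(target) = False) AndAlso _
                  (result.Contains(target) = False) Then
                result.Add(target)
            End If
        Next oneLink

        Return result
    End Function
End Module
```

Calling CollectLinkUrls(WebContent.Document) from within the DocumentCompleted event handler should return the addresses once the page has finished loading; calling it earlier may return an empty or partial list.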

See Also

Recipe 17.2 discusses the general use of the Managed HTML Document Object Model.

