From parsing a Web page for links to adding your shortcut to the Favorites, this section contains a whole bundle of techniques for utilizing the Internet with your favorite programming language.
The WebBrowser control we became oh-so-familiar with in Visual Basic 6 has no .NET equivalent. To use it, we need to step back into the world of COM.
To add a WebBrowser control to a Windows form, right-click on the toolbox and select Customize Toolbox. Browse the list of available COM components and check the Microsoft Web Browser option, then click on OK. This will automatically create a wrapper for you, allowing you to use the COM component in .NET.
At the bottom of your toolbox control list, you'll now see an Explorer item. Draw an instance of this onto your form, and that's your browser window!
So, what can you do with it? Everything you could before. Let's review the most popular methods, most of which are self-explanatory:
AxWebBrowser1.Navigate("http://www.vbworld.com/")
AxWebBrowser1.GoBack()
AxWebBrowser1.GoForward()
AxWebBrowser1.Stop()
AxWebBrowser1.Refresh()
AxWebBrowser1.GoHome()   ' Visits the homepage
AxWebBrowser1.GoSearch() ' Visits the default search page
TOP TIP: It may be a neat control, but the WebBrowser is prone to generating whopping great big error messages for any silly little matter. As such, don't feel bad about using those old On Error Resume Next statements liberally.
We also have a number of particularly interesting properties:
strPageTitle = AxWebBrowser1.LocationName
strURL = AxWebBrowser1.LocationURL
AxWebBrowser1.Document... ' Accessing page HTMLDocument object
You'll also find that the browser supports a bundle of cool events, including DocumentComplete (which fires when any Web page has finished loading), BeforeNavigate2 (which fires before a page is visited; set the Cancel property to True to cancel the request), and ProgressChange (which fires whenever the progress bar in Internet Explorer would change).
That's all you need to get your favorite Web control into .NET. (See Figure 7-1 for my sample application.) Good luck!
TOP TIP: If you want to manipulate data inside a Web page, automatically filling out forms and extracting data, you'll need to do some heavy-duty work with the WebBrowser.Document object. Alternatively, check out the new WebZinc .NET component at www.webzinc.net for an easier solution.
Download supporting files at www.apress.com.
The files for this tip are in the Ch7 - Snatch HTML folder.
Need to visit a competitor's Web page and parse out the latest rival product prices? Looking to retrieve data from a company that hasn't yet figured out Web services? Whatever your motives, if you're looking to grab the HTML of a Web page, the following little function should be able to help.
Just call the following GetPageHTML function, passing in the URL of the page you want to retrieve. It'll return a string containing the HTML:
Public Function GetPageHTML(ByVal URL As String) As String
    ' Retrieves the HTML from the specified URL
    ' (the downloaded bytes are decoded as UTF-8)
    Dim objWC As New System.Net.WebClient()
    Return New System.Text.UTF8Encoding().GetString( _
        objWC.DownloadData(URL))
End Function
Here's an example of its usage:
strHTML = GetPageHTML("http://www.karlmoore.com/")
An extremely short function, but incredibly useful.
Download supporting files at www.apress.com.
The files for this tip are in the Ch7 - Snatch HTML with Timeout folder.
The function I demonstrated in the last tip ("How to Snatch the HTML of a Web Page") is great for many applications. You pass it a URL, and it'll work on grabbing the page HTML. The problem is that it will keep trying until it either times out or retrieves the page.
Sometimes, you don't have that luxury. Say you're running a Web site that needs to retrieve the HTML, parse it, and display results to a user. You can't wait two minutes for the server to respond, then download the page and feed it back to your visitor. You need a response within ten seconds, or not at all.
Unfortunately, despite numerous developer claims to the contrary, this cannot be done through the WebClient class, which exposes no timeout setting. Rather, you need to use some of the more in-depth System.Net classes to handle the situation. Here's my offering, wrapped into a handy little function:
Public Function GetPageHTML(ByVal URL As String, _
    Optional ByVal TimeoutSeconds As Integer = 10) _
    As String
    ' Retrieves the HTML from the specified URL,
    ' using a default timeout of 10 seconds
    Dim objRequest As System.Net.WebRequest
    Dim objResponse As System.Net.WebResponse = Nothing
    Dim objStreamRead As System.IO.StreamReader = Nothing
    Try
        ' Set up our Web request (Timeout is in milliseconds)
        objRequest = System.Net.WebRequest.Create(URL)
        objRequest.Timeout = TimeoutSeconds * 1000
        ' Retrieve data from request, decoding it as UTF-8
        objResponse = objRequest.GetResponse()
        objStreamRead = New System.IO.StreamReader( _
            objResponse.GetResponseStream(), _
            System.Text.Encoding.UTF8)
        ' Set function return value
        Return objStreamRead.ReadToEnd()
    Catch
        ' Error occurred grabbing data (including a timeout),
        ' so simply return an empty string
        Return ""
    Finally
        ' Close the reader and response, whether or not we succeeded
        If Not objStreamRead Is Nothing Then objStreamRead.Close()
        If Not objResponse Is Nothing Then objResponse.Close()
    End Try
End Function
Here, our code creates a WebRequest for the URL and sets its Timeout property, which is measured in milliseconds, to the number of seconds you asked for. If the server responds within the given timeframe, the response stream is read through a StreamReader, decoded as UTF-8 text, and passed back as the result of the function. If anything goes wrong, including a timeout, the function returns an empty string. You can use it a little like this:
strHTML = GetPageHTML("http://www.karlmoore.com/", 5)
Admittedly, this all seems like a lot of work just to add a timeout. But it does its job, and does it well. Enjoy!
TOP TIP: Remember, the timeout we've added is for our request to be acknowledged by the server, rather than for the full HTML to have been received.
Download supporting files at www.apress.com.
The files for this tip are in the Ch7 - Parse Links and Images folder.
So, you've retrieved the HTML of that Web page and now need to parse out all the links to use in your research database. Or maybe you've visited the page and want to make a note of all the image links, so you can download the images later.
Well, you have two options. You can write your own parsing algorithm, consisting of ten million InStr and Mid statements. They're often slow and frequently buggy, but they're a truly great challenge (always my favorite routines to write).
Alternatively, you can write a regular expression in VB .NET. This is where you provide an expression that describes how a link looks and which portion you want to retrieve (for a hyperlink, that's the text after <a href=" and before the next quotation mark). Then you run the expression and retrieve the matches. The problem is that regular expressions are difficult to formulate. (See Chapter 8, "The Hidden .NET Language," for more information.)
So, why not cheat? Below you'll find two neat little functions I've already put together using regular expressions. Just pass in the HTML from your Web page, and each will return an ArrayList object containing the link or image matches:
Public Function ParseLinks(ByVal HTML As String) As System.Collections.ArrayList
    ' Returns the href value of every <a> tag in the passed HTML
    Dim objRegEx As System.Text.RegularExpressions.Regex
    Dim objMatch As System.Text.RegularExpressions.Match
    Dim arrLinks As New System.Collections.ArrayList()
    ' Create regular expression: the URL is captured in group 1,
    ' whether or not it is wrapped in double quotes
    objRegEx = New System.Text.RegularExpressions.Regex( _
        "<a\s+[^>]*?href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>[^\s>]+))", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase Or _
        System.Text.RegularExpressions.RegexOptions.Compiled)
    ' Match expression to HTML
    objMatch = objRegEx.Match(HTML)
    ' Loop through matches and add group 1 to the ArrayList
    While objMatch.Success
        arrLinks.Add(objMatch.Groups(1).Value)
        objMatch = objMatch.NextMatch()
    End While
    ' Pass back results
    Return arrLinks
End Function

Public Function ParseImages(ByVal HTML As String) As System.Collections.ArrayList
    ' Returns the src value of every <img> tag in the passed HTML
    Dim objRegEx As System.Text.RegularExpressions.Regex
    Dim objMatch As System.Text.RegularExpressions.Match
    Dim arrImages As New System.Collections.ArrayList()
    ' Create regular expression: the image URL is captured in group 1,
    ' whether or not it is wrapped in double quotes
    objRegEx = New System.Text.RegularExpressions.Regex( _
        "<img\s+[^>]*?src\s*=\s*(?:""(?<1>[^""]*)""|(?<1>[^\s>]+))", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase Or _
        System.Text.RegularExpressions.RegexOptions.Compiled)
    ' Match expression to HTML
    objMatch = objRegEx.Match(HTML)
    ' Loop through matches and add group 1 to the ArrayList
    While objMatch.Success
        arrImages.Add(objMatch.Groups(1).Value)
        objMatch = objMatch.NextMatch()
    End While
    ' Pass back results
    Return arrImages
End Function
Here's a simplified example using the ParseLinks routine. The ParseImages routine works in exactly the same way:
Dim arrLinks As ArrayList = ParseLinks( _
    "<a href=""http://www.marksandler.com/"">" & _
    "Visit MarkSandler.com</a>")
' Loop through results
Dim intCount As Integer
For intCount = 0 To arrLinks.Count - 1
    MessageBox.Show(arrLinks(intCount).ToString)
Next
One word of warning: many Web sites use relative links. In other words, an image may refer to /images/mypic.gif rather than http://www.mysite.com/images/mypic.gif. You may wish to check for this in code (perhaps by looking for the http:// prefix); if the prefix isn't there, add it programmatically.
And that's all you need to know to successfully strip links and images out of any HTML. Best wishes!
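Rather than testing for the http prefix by hand, you can let the System.Uri class resolve a relative link against the address of the page it came from, which also copes with forms such as ../images/mypic.gif. Here's a minimal sketch, assuming you know the URL of the page you parsed. MakeAbsolute is a hypothetical helper name, not part of the Framework, and it ignores any <base href> tag the page may declare:

```vb
Public Function MakeAbsolute(ByVal PageURL As String, _
    ByVal Link As String) As String
    ' Hypothetical helper: resolves Link against the address of the
    ' page it was parsed from. Absolute links come back unchanged.
    ' Throws UriFormatException if either string is malformed.
    Dim objBase As New System.Uri(PageURL)
    Dim objFull As New System.Uri(objBase, Link)
    Return objFull.AbsoluteUri
End Function
```

For example, MakeAbsolute("http://www.mysite.com/products/index.html", "/images/mypic.gif") returns http://www.mysite.com/images/mypic.gif, while a bare mypic.gif resolves to http://www.mysite.com/products/mypic.gif.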
Download supporting files at www.apress.com.
The files for this tip are in the Ch7 - HTML to Text folder.
Whether you want to convert an HTML page into pure text so you can parse out that special piece of information, or you simply want to load a page from the Net into your own word processing package, this mini function could come in handy.
It's called StripTags and accepts an HTML string. Using a regular expression, it identifies all <tags>, removes them, and returns the modified string. Here's the code:
Public Function StripTags(ByVal HTML As String) As String
    ' Removes every <...> tag from the passed HTML,
    ' calling the shared Regex.Replace method directly
    Return System.Text.RegularExpressions.Regex.Replace( _
        HTML, "<[^>]*>", "")
End Function
Here's a simple example demonstrating how you could use this function in code (see Figure 7-2 for my sample application):
strData = StripTags("<body><b>Welcome!</b></body>")
I admit, it doesn't look like much, but this little snippet can be a true lifesaver, especially if you've ever tried doing it yourself using InStr and Mid statements. Have fun!
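StripTags leaves HTML entities such as &amp; and &lt; in the text. Here's a minimal sketch of a variant that also decodes them, assuming your project references System.Web.dll for HttpUtility.HtmlDecode; HTMLToText is a hypothetical name. Tags are stripped first and entities decoded second, so an escaped &lt;b&gt; in the page text survives as a literal <b> instead of being removed as a tag:

```vb
Public Function HTMLToText(ByVal HTML As String) As String
    ' Hypothetical variant of StripTags (assumes a reference to System.Web.dll)
    ' Step 1: remove every <...> tag
    Dim strText As String = _
        System.Text.RegularExpressions.Regex.Replace(HTML, "<[^>]*>", "")
    ' Step 2: turn entities such as &amp; and &lt; back into characters
    Return System.Web.HttpUtility.HtmlDecode(strText)
End Function
```

For example, HTMLToText("<b>Fish &amp; Chips</b>") returns "Fish & Chips".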
One of my early tasks when working with .NET was figuring out how to take a stream of data (in my case, an XML document) and post it to a CGI script, in code.
It wasn't easy. I ended up with two pages of code incorporating practically every Internet-related class in the .NET Framework. Months later, I've managed to refine this posting technique to just a few generic lines of code, and that's what I'd like to share with you in this tip.
The following chunk of code starts by creating a WebClient object and setting a number of headers (which you can change as appropriate). It then converts my string (MyData) into an array of bytes and uploads it directly to the specified URL. The server's response to this upload is then converted into a string, which you'll probably want to analyze for possible success or error messages.
' Set up WebClient object
Dim objWebClient As New System.Net.WebClient()
' Convert data to send into an array of bytes
' (UTF-8 keeps any non-ASCII characters in MyData intact)
Dim bytData As Byte() = System.Text.Encoding.UTF8.GetBytes(MyData)
' Add appropriate headers
With objWebClient.Headers
    .Add("Content-Type", "text/xml")
    .Add("Authorization", "Basic " & _
        Convert.ToBase64String( _
            System.Text.Encoding.ASCII.GetBytes( _
                "MyUsername:MyPassword")))
End With
' Upload data to page (CGI script, or whatever) and receive response
Dim objResponse As Byte() = objWebClient.UploadData( _
    "http://www.examplesite.com/clients/upload.cgi", _
    "POST", bytData)
' Convert response to a string
Dim strResponse As String = _
    System.Text.Encoding.UTF8.GetString(objResponse)
' Check response for data, errors, etc...
I initially used this code to submit details of new store locations automatically to mapping solution provider Multimap.com. It accessed the destination CGI script, providing all necessary credentials, streamed my own XML document across the wire, and then checked the XML response for any errors.
A few pointers here. Firstly, you can easily remove the Authorization header. This was included to demonstrate how you can upload to a protected source, which, although a common request, is not everyone's cup of tea. Secondly, the content type here is set to text/xml. You can change this to whatever content type you deem fit: text/html, for example, or perhaps application/x-www-form-urlencoded if you want to make the post look as though it were coming from a Web form (in which case the data itself must be URL-encoded name=value pairs). Finally, you don't always have to upload pure data like this; you can also upload files with the .UploadFile method, or simulate a true form post by submitting key/value pairs (such as text box names and their values) with the .UploadValues method.
Download supporting files at www.apress.com.
The files for this tip are in the Ch7 - Adding Favorites folder.
This is one of those cute little code snippets that you have a use for in practically every application. Applications that can do this look cool and intelligent, and it takes just a few simple lines of code. I'm talking about adding an Internet shortcut to the user's Favorites menu.
How do you do it? Well, the following procedure encompasses all the logic for you. It accepts a page title and a URL. Then it locates the current Favorites folder (which could vary greatly depending on the machine setup) and creates a .url file in that folder, named after the title you passed. Inside that file, it writes the little bit of text required for an Internet shortcut, alongside your URL. And that's it: shortcut created!
Here s the code:
Public Sub CreateShortcut(ByVal Title As String, ByVal URL As String)
    ' Creates a shortcut in the user's Favorites folder
    Dim strFavoriteFolder As String
    ' Retrieve the Favorites folder
    strFavoriteFolder = System.Environment.GetFolderPath( _
        Environment.SpecialFolder.Favorites)
    ' Create shortcut file, based on Title
    Dim objWriter As System.IO.StreamWriter = _
        System.IO.File.CreateText(strFavoriteFolder & _
        "\" & Title & ".url")
    ' Write URL to file
    objWriter.WriteLine("[InternetShortcut]")
    objWriter.WriteLine("URL=" & URL)
    ' Close file
    objWriter.Close()
End Sub
To finish off this snippet, here are a couple of interesting calls to this procedure (see Figure 7-3 to see the created shortcuts in Internet Explorer):
CreateShortcut("Karl Moore.com", "http://www.karlmoore.com/")
CreateShortcut("Send mail to Karl Moore", "mailto:karl@karlmoore.com")
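Because CreateShortcut uses the title directly as a file name, a title such as "VB: Tips & Tricks?" makes File.CreateText throw an exception, since Windows does not allow : or ? in file names. Here's a minimal sketch of a hypothetical helper, SafeShortcutTitle, that replaces the characters Windows reserves in file names with underscores before you pass the title in:

```vb
Public Function SafeShortcutTitle(ByVal Title As String) As String
    ' Hypothetical helper: replaces the characters Windows forbids
    ' in file names (\ / : * ? " < > |) with underscores
    Dim strInvalid As String = "\/:*?""<>|"
    Dim objBuilder As New System.Text.StringBuilder(Title)
    Dim intIndex As Integer
    For intIndex = 0 To objBuilder.Length - 1
        If strInvalid.IndexOf(objBuilder.Chars(intIndex)) >= 0 Then
            objBuilder.Chars(intIndex) = "_"c
        End If
    Next
    Return objBuilder.ToString()
End Function
```

Calling CreateShortcut(SafeShortcutTitle("VB: Tips & Tricks?"), "http://www.karlmoore.com/") would then create a file named "VB_ Tips & Tricks_.url".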
Download supporting files at www.apress.com.
The files for this tip are in the Ch7 - IP folder.
You may want to discover the IP address of your local machine for a number of reasons. You may, for example, be developing a messaging-style application using the .NET equivalent of the Winsock control, the Socket class (look up "Socket class" in the help index), and need to register the local IP in a central database somewhere.
So, how can you find out your IP address? The code is easy:
Dim objEntry As System.Net.IPHostEntry = _
    System.Net.Dns.GetHostByName( _
    System.Net.Dns.GetHostName)
Dim strIP As String = CType( _
    objEntry.AddressList.GetValue(0), _
    System.Net.IPAddress).ToString
Here, we pass our machine name to the GetHostByName function, which returns a valid IPHostEntry object. We then retrieve the first IP address from the entry's AddressList array and convert it to a string. Simple!
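Dns.GetHostByName is marked obsolete from .NET Framework 2.0 onward. Its replacements, such as Dns.GetHostAddresses, can also return IPv6 addresses, so the first entry is not necessarily the IPv4 address you want to register. Here's a minimal sketch, assuming .NET Framework 2.0 or later, that returns the first IPv4 address instead. GetLocalIPv4 is a hypothetical name, and on a machine with several network adapters, "first" is simply whatever order the resolver returns:

```vb
Public Function GetLocalIPv4() As String
    ' Hypothetical helper: returns the first IPv4 address of the
    ' local machine, or an empty string if none is found.
    ' Dns.GetHostAddresses requires .NET Framework 2.0 or later.
    Dim arrAddresses As System.Net.IPAddress() = _
        System.Net.Dns.GetHostAddresses(System.Net.Dns.GetHostName())
    For Each objAddress As System.Net.IPAddress In arrAddresses
        ' InterNetwork = IPv4 (InterNetworkV6 would be IPv6)
        If objAddress.AddressFamily = _
            System.Net.Sockets.AddressFamily.InterNetwork Then
            Return objAddress.ToString()
        End If
    Next
    Return ""
End Function
```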
Download supporting files at www.apress.com.
The files for this tip are in the Ch7 - IsConnectionAvailable folder.
Checking whether an Internet connection is available isn't always as easy as it sounds.
Admittedly, there is a Windows API call that can check whether a connection exists, but it's extremely fragile and returns incorrect results if the machine has never had Internet Explorer configured correctly. Oops.
The best method is to actually make a Web request and see whether it works. If it does, you've got your connection. The following neat code snippet does exactly that. Just call IsConnectionAvailable and check the return value:
Public Function IsConnectionAvailable() As Boolean
    ' Returns True if connection is available
    ' Replace www.yoursite.com with a site that
    ' is guaranteed to be online - perhaps your
    ' corporate site, or microsoft.com
    Dim objUrl As New System.Uri("http://www.yoursite.com/")
    ' Set up WebRequest
    Dim objWebReq As System.Net.WebRequest
    objWebReq = System.Net.WebRequest.Create(objUrl)
    Dim objResp As System.Net.WebResponse = Nothing
    Try
        ' Attempt to get response and return True
        objResp = objWebReq.GetResponse()
        Return True
    Catch ex As Exception
        ' Error (no connection, DNS failure, timeout), so return False.
        ' Note that HTTP errors such as 404 also end up here.
        Return False
    Finally
        ' Close the response only if one was actually received
        If Not objResp Is Nothing Then objResp.Close()
    End Try
End Function
Here's how you might use this function in your application:
If IsConnectionAvailable() = True Then
    MessageBox.Show("You are online!")
End If