Working with the Internet


From parsing a Web page for links to adding your shortcut to the Favorites, this section contains a whole bundle of techniques for utilizing the Internet with your favorite programming language.

Creating Your Own Web Browser

The WebBrowser control we became oh-so-familiar with in Visual Basic 6 has no .NET equivalent. To use it, we need to step back into the world of COM.

To add a WebBrowser control to a Windows form, right-click on the toolbox and select Customize Toolbox. Browse the list of available COM components and check the Microsoft Web Browser option, then click on OK. This will automatically create a wrapper for you, allowing you to use the COM component in .NET.

At the bottom of your toolbox control list, you'll now see an Explorer item. Draw an instance of this onto your form, and that's your browser window!

So, what can you do with it? Everything you could before. Let's review the most popular methods, most of which are self-explanatory:

 AxWebBrowser1.Navigate("http://www.vbworld.com/")
 AxWebBrowser1.GoBack
 AxWebBrowser1.GoForward
 AxWebBrowser1.Stop
 AxWebBrowser1.Refresh
 AxWebBrowser1.GoHome     ' Visits the homepage
 AxWebBrowser1.GoSearch   ' Visits the default search page
TOP TIP  

It may be a neat control, but the WebBrowser is prone to generating whopping great big error messages for any silly little matter. As such, don't feel bad for using those old On Error Resume Next statements liberally.

We also have a number of particularly interesting properties:

 strPageTitle = AxWebBrowser1.LocationName
 strURL = AxWebBrowser1.LocationURL
 AxWebBrowser1.Document...   ' Accessing page HTMLDocument object

You'll also find that the browser supports a bundle of cool events, including DocumentComplete (which fires when any Web page has finished loading), BeforeNavigate2 (which fires before a page is visited; set the Cancel property to True to cancel the request), and ProgressChange (which fires whenever the progress bar in Internet Explorer would change).

That's all you need to get your favorite Web control into .NET. (See Figure 7-1 for my sample application.) Good luck!

Figure 7-1: My Web browser application visiting some totally random Web site
TOP TIP  

If you want to manipulate data inside a Web page, automatically filling out forms and extracting data, you'll need to do some heavy-duty work with the WebBrowser.Document object. Alternatively, check out the new WebZinc .NET component at www.webzinc.net for an easier solution.

How to Snatch the HTML of a Web Page

Download supporting files at www.apress.com.

The files for this tip are in the Ch7 - Snatch HTML folder.

Need to visit a competitor's Web page and parse out the latest rival product prices? Looking to retrieve data from a company that hasn't yet figured out Web services? Whatever your motives, if you're looking to grab the HTML of a Web page, the following little function should be able to help.

Just call the following GetPageHTML function, passing in the URL of the page you want to retrieve. It'll return a string containing the HTML:

 Public Function GetPageHTML(ByVal URL As String) As String
     ' Retrieves the HTML from the specified URL
     Dim objWC As New System.Net.WebClient()
     Return New System.Text.UTF8Encoding().GetString( _
         objWC.DownloadData(URL))
 End Function

Here's an example of its usage:

 strHTML = GetPageHTML("http://www.karlmoore.com/") 

An extremely short function, but incredibly useful.

How to Snatch HTML, with a Timeout

Download supporting files at www.apress.com.

The files for this tip are in the Ch7 - Snatch HTML with Timeout folder.

The function I demonstrated in the last tip ("How to Snatch the HTML of a Web Page") is great for many applications. You pass it a URL, and it'll work on grabbing the page HTML. The problem is that it will keep trying until it eventually either times out or retrieves the page.

Sometimes, you don't have that luxury. Say you're running a Web site that needs to retrieve the HTML, parse it, and display results to a user. You can't wait two minutes for the server to respond, then download the page and feed it back to your visitor. You need a response within ten seconds, or not at all.

Unfortunately, despite numerous developer claims to the contrary, this cannot be done through the WebClient class. Rather, you need to use some of the more in-depth System.Net classes to handle the situation. Here's my offering, wrapped into a handy little function:

 Public Function GetPageHTML(ByVal URL As String, _
        Optional ByVal TimeoutSeconds As Integer = 10) _
        As String
     ' Retrieves the HTML from the specified URL,
     ' using a default timeout of 10 seconds
     Dim objRequest As System.Net.WebRequest
     Dim objResponse As System.Net.WebResponse = Nothing
     Dim objStreamReceive As System.IO.Stream
     Dim objEncoding As System.Text.Encoding
     Dim objStreamRead As System.IO.StreamReader
     Try
         ' Set up our Web request
         objRequest = System.Net.WebRequest.Create(URL)
         objRequest.Timeout = TimeoutSeconds * 1000
         ' Retrieve data from request
         objResponse = objRequest.GetResponse()
         objStreamReceive = objResponse.GetResponseStream()
         objEncoding = System.Text.Encoding.GetEncoding("utf-8")
         objStreamRead = New System.IO.StreamReader( _
             objStreamReceive, objEncoding)
         ' Return the page HTML
         Return objStreamRead.ReadToEnd()
     Catch
         ' Error occurred grabbing data, so return an empty string
         Return ""
     Finally
         ' Always close the response, even if an error occurred
         If Not objResponse Is Nothing Then
             objResponse.Close()
         End If
     End Try
 End Function

Here, our code creates objects to request the data from the Web, setting the absolute server timeout. If the machine responds within the given timeframe, the response is fed into a stream, converted into the UTF8 text format we all understand, and then passed back as the result of the function. You can use it a little like this:

 strHTML = GetPageHTML("http://www.karlmoore.com/", 5) 

Admittedly, this all seems like a lot of work just to add a timeout. But it does its job, and well. Enjoy!

TOP TIP

Remember, the timeout we've added is for our request to be acknowledged by the server, rather than for the full HTML to have been received.
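
If the download itself also needs a limit, one possible approach is sketched below. It assumes .NET Framework 1.1 or later and an http:// or https:// URL, so that the request can be cast to HttpWebRequest and its ReadWriteTimeout property set. The module name, the function name GetPageHTMLWithReadTimeout, and the ReadTimeoutSeconds parameter are my own inventions for illustration, not part of the original tip:

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Public Module HtmlFetcher
    ' Hypothetical variant of GetPageHTML that also limits stream reads
    Public Function GetPageHTMLWithReadTimeout(ByVal URL As String, _
            Optional ByVal TimeoutSeconds As Integer = 10, _
            Optional ByVal ReadTimeoutSeconds As Integer = 10) As String
        Dim objResponse As WebResponse = Nothing
        Try
            ' Assumes an HTTP(S) URL; for other schemes the cast fails
            ' and the Catch block returns an empty string
            Dim objRequest As HttpWebRequest = _
                CType(WebRequest.Create(URL), HttpWebRequest)
            ' Limit on waiting for the server to start responding
            objRequest.Timeout = TimeoutSeconds * 1000
            ' Limit on each individual read from the response stream
            objRequest.ReadWriteTimeout = ReadTimeoutSeconds * 1000
            objResponse = objRequest.GetResponse()
            Dim objReader As New StreamReader( _
                objResponse.GetResponseStream(), Encoding.UTF8)
            Return objReader.ReadToEnd()
        Catch
            ' Any failure, including a timeout, returns an empty string
            Return ""
        Finally
            If Not objResponse Is Nothing Then
                objResponse.Close()
            End If
        End Try
    End Function
End Module
```

Note that ReadWriteTimeout applies to each read operation rather than to the download as a whole, so a server that trickles data slowly could still take longer than the two limits combined.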

Tricks of Parsing a Web Page for Links and Images

Download supporting files at www.apress.com.

The files for this tip are in the Ch7 - Parse Links and Images folder.

So, you've retrieved the HTML of that Web page and now need to parse out all the links to use in your research database. Or maybe you've visited the page and want to make a note of all the image links, so you can download them at some later point.

Well, you have two options. You can write your own parsing algorithm, consisting of ten million InStr and Mid statements. They're often slow and frequently buggy, but they're a truly great challenge (always my favorite routines to write).

Alternatively, you can write a regular expression in VB .NET. This is where you provide an expression that describes how a link looks and what portion you want to retrieve (that is, the bit after <a href=" but before the next " for a hyperlink). Then you run the expression and retrieve matches. The problem with these is that they're difficult to formulate. (See Chapter 8, "The Hidden .NET Language," for more information.)

So, why not cheat? Following you'll find two neat little functions I've already put together using regular expressions. Just pass in the HTML from your Web page, and they'll return an ArrayList object containing the link/image matches:

 Public Function ParseLinks(ByVal HTML As String) As ArrayList
     ' Names are fully qualified below; add the following at
     ' the top of the class if you want to shorten them:
     ' - Imports System.Text.RegularExpressions
     Dim objRegEx As System.Text.RegularExpressions.Regex
     Dim objMatch As System.Text.RegularExpressions.Match
     Dim arrLinks As New System.Collections.ArrayList()
     ' Create regular expression: group <1> captures either a
     ' quoted href value or, failing that, an unquoted one
     objRegEx = New System.Text.RegularExpressions.Regex( _
         "<a\s[^>]*?href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>[^\s>]+))", _
         System.Text.RegularExpressions.RegexOptions.IgnoreCase Or _
         System.Text.RegularExpressions.RegexOptions.Compiled)
     ' Match expression to HTML
     objMatch = objRegEx.Match(HTML)
     ' Loop through matches and add <1> to ArrayList
     While objMatch.Success
         Dim strMatch As String
         strMatch = objMatch.Groups(1).ToString
         arrLinks.Add(strMatch)
         objMatch = objMatch.NextMatch()
     End While
     ' Pass back results
     Return arrLinks
 End Function

 Public Function ParseImages(ByVal HTML As String) As ArrayList
     ' Names are fully qualified below; add the following at
     ' the top of the class if you want to shorten them:
     ' - Imports System.Text.RegularExpressions
     Dim objRegEx As System.Text.RegularExpressions.Regex
     Dim objMatch As System.Text.RegularExpressions.Match
     Dim arrLinks As New System.Collections.ArrayList()
     ' Create regular expression: group <1> captures either a
     ' quoted src value or, failing that, an unquoted one
     objRegEx = New System.Text.RegularExpressions.Regex( _
         "<img\s[^>]*?src\s*=\s*(?:""(?<1>[^""]*)""|(?<1>[^\s>]+))", _
         System.Text.RegularExpressions.RegexOptions.IgnoreCase Or _
         System.Text.RegularExpressions.RegexOptions.Compiled)
     ' Match expression to HTML
     objMatch = objRegEx.Match(HTML)
     ' Loop through matches and add <1> to ArrayList
     While objMatch.Success
         Dim strMatch As String
         strMatch = objMatch.Groups(1).ToString
         arrLinks.Add(strMatch)
         objMatch = objMatch.NextMatch()
     End While
     ' Pass back results
     Return arrLinks
 End Function

Here's a simplified example using the ParseLinks routine. The ParseImages routine works in exactly the same way:

 Dim arrLinks As ArrayList = ParseLinks( _
     "<a href=""http://www.marksandler.com/"">" & _
     "Visit MarkSandler.com</a>")
 ' Loop through results
 Dim shtCount As Integer
 For shtCount = 0 To arrLinks.Count - 1
     MessageBox.Show(arrLinks(shtCount).ToString)
 Next

One word of warning: many Web sites use relative links. In other words, an image may refer to /images/mypic.gif rather than http://www.mysite.com/images/mypic.gif. You may wish to check for this in code (perhaps look for the existence of http); if the prefix isn't there, add it programmatically.

And that's all you need to know to successfully strip links and images out of any HTML. Best wishes!
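
For the relative-link problem mentioned above, one option that avoids string checks altogether is to let the System.Uri class do the resolving. The following is only a sketch: the module and function names are hypothetical, and it assumes you know the URL of the page the link came from.

```vbnet
Imports System

Public Module LinkHelpers
    ' Hypothetical helper: resolves a link found in a page against the
    ' URL of that page. Links that are already absolute come back
    ' pointing at the same address they started with.
    Public Function MakeAbsolute(ByVal PageURL As String, _
            ByVal Link As String) As String
        Dim objBase As New Uri(PageURL)
        Dim objFull As New Uri(objBase, Link)
        Return objFull.AbsoluteUri
    End Function
End Module
```

For example, MakeAbsolute("http://www.mysite.com/products/index.html", "/images/mypic.gif") returns http://www.mysite.com/images/mypic.gif.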

Converting HTML to Text, Easily

Download supporting files at www.apress.com.

The files for this tip are in the Ch7 - HTML to Text folder.

Whether you want to convert an HTML page into pure text so you can parse out that special piece of information, or you simply want to load a page from the Net into your own word processing package, this mini function could come in handy.

It's called StripTags and accepts an HTML string. Using a regular expression, it identifies all <tags>, removes them, and returns the modified string. Here's the code:

 Public Function StripTags(ByVal HTML As String) As String
     ' Removes tags from passed HTML using the shared
     ' Regex.Replace method (no Regex instance is needed)
     Return System.Text.RegularExpressions.Regex.Replace( _
         HTML, "<[^>]*>", "")
 End Function

Here's a simple example demonstrating how you could use this function in code (see Figure 7-2 for my sample application):

 strData = StripTags("<body><b>Welcome!</b></body>")

Figure 7-2: My sample application, retrieving HTML from www.bbc.co.uk, then converting it to text

I admit, it doesn't look like much, but this little snippet can be a true lifesaver, especially if you've ever tried doing it yourself using InStr and Mid statements. Have fun!

Real Code for Posting Data to the Web

One of my early tasks when working with .NET was figuring out how to take a stream of data (in my case, an XML document) and post it to a CGI script, in code.

It wasn't easy. I ended up with two pages of code incorporating practically every Internet-related class in the .NET Framework. Months later now, and I've managed to refine this posting technique to just a few generic lines of code. And that's what I'd like to share with you in this tip.

The following chunk of code starts by creating a WebClient object and setting a number of headers (which you can change as appropriate). It then converts my string (MyData) into an array of bytes, and then uploads direct to the specified URL. The server response to this upload is then converted into a string, which you'll probably want to analyze for possible success or error messages.

 ' Setup WebClient object
 Dim objWebClient As New System.Net.WebClient()
 ' Convert data to send into an array of bytes
 Dim bytData As Byte() = System.Text.Encoding.ASCII.GetBytes(MyData)
 ' Add appropriate headers
 With objWebClient.Headers
     .Add("Content-Type", "text/xml")
     .Add("Authorization", "Basic " & _
         Convert.ToBase64String( _
         System.Text.Encoding.ASCII.GetBytes( _
         "MyUsername:MyPassword")))
 End With
 ' Upload data to page (CGI script, or whatever) and receive response
 Dim objResponse As Byte() = objWebClient.UploadData( _
     "http://www.examplesite.com/clients/upload.cgi", _
     "POST", bytData)
 ' Convert response to a string
 Dim strResponse As String = _
     System.Text.Encoding.ASCII.GetString(objResponse)
 ' Check response for data, errors, etc...

I initially used this code to submit details of new store locations automatically to mapping solution provider Multimap.com. It accessed the destination CGI script, providing all necessary credentials, streamed my own XML document across the wire, and then checked the XML response for any errors.

A few pointers here. Firstly, you can easily remove the Authorization header. This was included to demonstrate how you can upload to a protected source, which, although a common request, is not everyone's cup of tea. Secondly, the content type here is set to text/xml. You can change this to whatever content type you deem fit: text/html for example, or perhaps application/x-www-form-urlencoded if you want to make the post look as though it were coming from a Web form. Finally, you don't always have to upload pure data like this; you can also upload files with the .UploadFile function, or simulate a true form post by submitting key/value pairs (such as text box names and related values) with the .UploadValues function.
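
To give a feel for the last of those options, here is a minimal sketch of a simulated form post using UploadValues. The module name, the PostForm function, the field names, and the URL in the usage comment are all made up for illustration; swap in whatever your target form expects.

```vbnet
Imports System
Imports System.Collections.Specialized
Imports System.Net
Imports System.Text

Public Module FormPoster
    ' Hypothetical helper: posts name/value pairs as a Web form would
    ' and returns the server response as a string
    Public Function PostForm(ByVal URL As String, _
            ByVal objFields As NameValueCollection) As String
        Dim objWebClient As New WebClient()
        ' UploadValues URL-encodes the pairs and sends them as
        ' application/x-www-form-urlencoded, so no Content-Type
        ' header is added here
        Dim bytResponse As Byte() = _
            objWebClient.UploadValues(URL, "POST", objFields)
        Return Encoding.ASCII.GetString(bytResponse)
    End Function

    ' Example usage (field names and URL are placeholders):
    '   Dim objFields As New NameValueCollection()
    '   objFields.Add("txtName", "Karl")
    '   objFields.Add("txtStore", "Leeds")
    '   Dim strResponse As String = PostForm( _
    '       "http://www.examplesite.com/clients/form.cgi", objFields)
End Module
```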

Adding a Web Shortcut to the Favorites

Download supporting files at www.apress.com.

The files for this tip are in the Ch7 - Adding Favorites folder.

This is one of those cute little code snippets that you have a use for in practically every application. Applications that can do this look cool and intelligent, and it takes just a few simple lines of code. I'm talking about adding an Internet shortcut to the user's Favorites menu.

How do you do it? Well, the following function encompasses all the logic for you. It accepts a page title and a URL. Then it locates the current Favorites folder (which could vary greatly depending on the machine setup) and creates a URL file in that folder, based on the title you passed. Inside that file, it includes a little required text for an Internet shortcut, alongside your URL. And that's it: shortcut created!

Here's the code:

 Public Sub CreateShortcut(ByVal Title As String, ByVal URL As String)
     ' Creates a shortcut in the user's Favorites folder
     Dim strFavoriteFolder As String
     ' Retrieve the Favorites folder
     strFavoriteFolder = System.Environment.GetFolderPath( _
         System.Environment.SpecialFolder.Favorites)
     ' Create shortcut file, based on Title
     Dim objWriter As System.IO.StreamWriter = _
         System.IO.File.CreateText(System.IO.Path.Combine( _
         strFavoriteFolder, Title & ".url"))
     ' Write URL to file
     objWriter.WriteLine("[InternetShortcut]")
     objWriter.WriteLine("URL=" & URL)
     ' Close file
     objWriter.Close()
 End Sub

To finish off this snippet, here are a couple of interesting calls to this procedure (see Figure 7-3 to see the created shortcuts in Internet Explorer):

 CreateShortcut("Karl Moore.com", "http://www.karlmoore.com/")
 CreateShortcut("Send mail to Karl Moore", "mailto:karl@karlmoore.com")

Figure 7-3: A couple of plug-plug Internet shortcuts added by my sample code
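
One thing to watch: the title becomes a file name, so a page title containing characters such as a colon or question mark would make the file creation fail. A small cleanup step along the following lines might help. It is a sketch only, assuming .NET Framework 2.0 or later (where Path.GetInvalidFileNameChars is available); the module and function names are hypothetical.

```vbnet
Imports System
Imports System.IO

Public Module ShortcutHelpers
    ' Hypothetical helper: replaces characters that are not allowed in
    ' file names with underscores, so the title can safely name a
    ' .url file
    Public Function SafeFileName(ByVal Title As String) As String
        Dim chrInvalid As Char
        For Each chrInvalid In Path.GetInvalidFileNameChars()
            Title = Title.Replace(chrInvalid, "_"c)
        Next
        Return Title
    End Function
End Module
```

You could then pass SafeFileName(strPageTitle) rather than the raw title as the first argument to CreateShortcut.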

Retrieving Your IP Address, and Why You May Want To

Download supporting files at www.apress.com.

The files for this tip are in the Ch7 - IP folder.

You may want to discover the IP address of your local machine for a number of reasons. You may, for example, be developing a messaging-style application using the .NET equivalent of the Winsock control, the Socket class (look up Socket class in the help index), and need to register the local IP in a central database somewhere.

So, how can you find out your IP address? The code is easy:

 Dim objEntry As System.Net.IPHostEntry = _
     System.Net.Dns.GetHostByName( _
     System.Net.Dns.GetHostName)
 Dim strIP As String = CType( _
     objEntry.AddressList.GetValue(0), _
     System.Net.IPAddress).ToString

Here, we pass our machine name to the GetHostByName function, which returns a valid IPHostEntry object. We then retrieve the first IP address from the entry's AddressList array and convert it to a string. Simple!
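
On machines with several network adapters, or with IPv6 enabled, the first entry in the list is not necessarily the address you are after. In addition, GetHostByName is marked obsolete from .NET Framework 2.0 onward. The sketch below shows one way to pick out the first IPv4 address instead, assuming .NET Framework 2.0 or later (for Dns.GetHostAddresses); the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Net
Imports System.Net.Sockets

Public Module IPHelpers
    ' Hypothetical helper: returns the first IPv4 address registered
    ' for this machine, or an empty string if none is found
    Public Function GetLocalIPv4() As String
        Dim objAddress As IPAddress
        For Each objAddress In Dns.GetHostAddresses(Dns.GetHostName())
            ' InterNetwork is the address family used for IPv4
            If objAddress.AddressFamily = AddressFamily.InterNetwork Then
                Return objAddress.ToString()
            End If
        Next
        Return ""
    End Function
End Module
```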

Is an Internet Connection Available?

Download supporting files at www.apress.com.

The files for this tip are in the Ch7 - IsConnectionAvailable folder.

Checking whether an Internet connection is available isn't always as easy as it sounds.

Admittedly, there is a Windows API call that can check whether a connection exists, but it's extremely fragile and returns incorrect results if the machine has never had Internet Explorer configured correctly. Oops.

The best method is to actually make a Web request and see whether it works. If it does, you've got your connection. The following neat code snippet does exactly that. Just call IsConnectionAvailable and check the return value:

 Public Function IsConnectionAvailable() As Boolean
     ' Returns True if connection is available
     ' Replace www.yoursite.com with a site that
     ' is guaranteed to be online - perhaps your
     ' corporate site, or microsoft.com
     Dim objUrl As New System.Uri("http://www.yoursite.com/")
     ' Set up WebRequest
     Dim objWebReq As System.Net.WebRequest
     objWebReq = System.Net.WebRequest.Create(objUrl)
     Dim objResp As System.Net.WebResponse = Nothing
     Try
         ' Attempt to get response and return True
         objResp = objWebReq.GetResponse()
         Return True
     Catch ex As Exception
         ' Error, so report that no connection is available
         Return False
     Finally
         ' Close the response only if one was received; after
         ' a failed request objResp is still Nothing
         If Not objResp Is Nothing Then
             objResp.Close()
         End If
     End Try
 End Function

Here's how you might use this function in your application:

 If IsConnectionAvailable() = True Then
     MessageBox.Show("You are online!")
 End If



The Ultimate VB .NET and ASP.NET Code Book
ISBN: 1590591062
Year: 2003
Pages: 76
Authors: Karl Moore
