12.1 Browser Helper Objects

only for RuBoard - do not distribute or recompile

Before looking at developing Browser Extensions for IE 5.0, we'll look at developing BHOs.

12.1.1 How BHOs Work

Browser helper objects are simple components, really. Once you know what you are doing, you could probably build one in about five minutes (minus any functionality, of course).

When Explorer first loads, it examines the following key for any browser helper objects that might be registered:

 HKEY_LOCAL_MACHINE\
     Software\
         Microsoft\
             Windows\
                 CurrentVersion\
                     Explorer\
                         Browser Helper Objects

Chances are, if you look in your registry right now, this key is not there, or if it does exist, it is empty. That's because Internet Explorer does not install a default set of browser helpers.

When BHOs are loaded, Explorer passes to the component what is known as a site pointer, which is actually an IUnknown interface pointer that the BHO can use to communicate with Explorer. Beyond this point, everything a BHO does is application-dependent. Typically, though, the BHO will be used to gain access to the current instance of Internet Explorer by querying the site pointer for IWebBrowser2. Once it has this interface, it has access to all of IE's events, as well as full access to Microsoft's Dynamic HTML Object Model. This means that whatever is being displayed in the browser is fully accessible to the BHO. So, as you might guess, BHOs can evolve from very simple components into complex entities rather quickly.

12.1.2 Browser Helper Interfaces

Browser helper objects are only required to implement one interface: IObjectWithSite. IObjectWithSite consists of two methods, SetSite and GetSite, as Table 12.1 shows.

Table 12.1. IObjectWithSite

Method	Description
`GetSite`	Returns the last site set with `SetSite`.
`SetSite`	Provides the `IUnknown` site pointer of Explorer.

12.1.2.1 GetSite

This function returns the last site set with SetSite. Its syntax is:

 HRESULT GetSite(REFIID riid, void** ppvSite);

with the following parameters:

riid: [in] The interface identifier whose pointer should be returned in ppvSite.
ppvSite: [in, out] Address of the interface pointer described by riid.

The job of GetSite is simply to return the site pointer passed in by Explorer via SetSite. It is provided as a means for additional objects to gain access to the site pointer; it is a hooking mechanism.

The interesting thing about this method is that its arguments look very much like those of QueryInterface. This is no coincidence. The only thing this method needs to do is forward the riid and ppvSite parameters to a QueryInterface call on the cached site pointer and return the result.

12.1.2.2 SetSite

When an instance of Explorer is fired up, the BHO is loaded and Explorer calls SetSite, passing in an IUnknown interface pointer. This is known as a site pointer. The syntax of the SetSite method is:

 HRESULT SetSite(IUnknown* pUnkSite);

Its single parameter is:

pUnkSite: [in] Site pointer.

A BHO typically will query the site pointer for the IWebBrowser2 interface. When you declare a variable as type InternetExplorer, this is really a reference to IWebBrowser2 (but we'll talk about that soon enough). This gives the browser helper access to the current instance of the web browser, including IE's event sink.

The IDL to define the IObjectWithSite interface is shown in Example 12.1. Notice that the IUnknown pointer has been replaced with IUnknownVB in the IDL definition. Here, this is done "just in case" for flexibility. Later, you might want to call QueryInterface, AddRef, or even Release on the site pointer. Who knows? Here, though, it's not really necessary. When we implement SetSite, we'll cache a copy of the site pointer in a private member that will be declared as IUnknownVB. Our private member will give us access to all of IUnknown's methods (which is necessary to properly implement GetSite).

Example 12.1. IDL for the IObjectWithSite Interface

 [
     uuid(FC4801A3-2BA9-11CF-A229-00AA003D7352),
     helpstring("IObjectWithSite Interface"),
     odl
 ]
 interface IObjectWithSite : IUnknown
 {
     HRESULT SetSite([in] IUnknownVB* pSite);
     HRESULT GetSite([in] REFIID priid,
                     [in, out] VOID * ppvObj);
 }

12.1.3 The Project

We begin by creating a new ActiveX DLL project called BHO and adding three references to our project:

Our type library
Microsoft Internet Controls
Microsoft HTML Object Library

The Microsoft HTML Object Library will most likely have to be added manually. It is in the system directory in a DLL named mshtml.dll.

We will call our class clsInetSpeak for reasons that will be apparent later, and the first thing we will do is implement IObjectWithSite:

 'clsInetSpeak
 Implements IObjectWithSite

Next we implement SetSite. First, we add two private member variables to the class. The first is of type IUnknownVB. This is used to save the site pointer passed in to us by Explorer. The second is of type InternetExplorer. This is used to manipulate the current running instance of Explorer. This is demonstrated in Example 12.2.

Example 12.2. SetSite Implementation

 'clsInetSpeak
 Implements IObjectWithSite

 Private m_pUnkSite As IUnknownVB
 Private WithEvents m_ie As InternetExplorer

 Private Sub IObjectWithSite_SetSite( _
     ByVal pSite As VBShellLib.IUnknownVB)

     If ObjPtr(pSite) = 0 Then
         CopyMemory m_ie, 0&, 4
         Exit Sub
     End If

     Set m_pUnkSite = pSite  'Save the site pointer for GetSite
     Set m_ie = pSite        'QueryInterface for IWebBrowser2

 End Sub

SetSite is also called when Explorer is closed. In that case the pSite argument contains a null pointer instead of the site pointer. When this happens, we need to overwrite the address of m_ie with 0 and exit the sub. We do not set m_ie equal to Nothing. This is speculation, but it appears that setting m_ie equal to Nothing here does not immediately release the component. Explorer thinks that it still has a valid reference and crashes. Overwriting the address with 0 prevents this from happening.

Notice in Example 12.2 that m_ie is declared WithEvents. This gives us access to a wide variety of events that are fired by Explorer. We'll talk about that shortly, but let's implement GetSite first. It's one line of code, so let's get it out of the way. Example 12.3 contains the code.

Example 12.3. GetSite

 Private Sub IObjectWithSite_GetSite( _
     ByVal priid As VBShellLib.REFIID, _
     ppvObj As VBShellLib.VOID)
     'Forward the request to the cached site pointer
     m_pUnkSite.QueryInterface priid, ppvObj
 End Sub

Can life get simpler than this? All we need to do here is call QueryInterface on the site pointer with the parameters passed in by the shell. We are done.

If we stop right here, the code we have so far is the minimal skeleton required to implement any BHO. You might want to save this somewhere as a template for creating future BHOs.

12.1.4 Registration

Registering browser helper objects is quite easy. Figure 12.1 shows the appropriate entry. {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx} represents the CLSID of the BHO.

Figure 12.1. Registering BHOs

Example 12.4 shows the registry script for the BHO that will be created in this chapter. It only contains one entry. You can modify this script easily for your own helper objects.

Example 12.4. Registry Script for BHO

 REGEDIT4

 [HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects\{D6862A22-1DD6-11D3-BB7C-444553540000}]

12.1.5 IWebBrowser2

We are actually going to do something with this BHO we are building, but first we need to discuss this Internet Explorer reference we have and all the things we can do with it. This should give you a better idea of the types of projects you can create with BHOs.

That reference we're holding to Internet Explorer is actually an IWebBrowser2 interface. Why is it important that you know this? Well, for one thing, the documentation for IWebBrowser2 is much better than for the Microsoft Internet Control, and the methods of the IWebBrowser2 interface correspond to the methods and properties of Internet Explorer. This is important because the interfaces we will be working with contain so many methods (over 70 of them) that it would be impractical to list them all here. But since the documentation for this interface is good, you should be able to figure out what most of the methods are for. The HTML object model interfaces that we reach through IWebBrowser2 are all documented in the Platform SDK and are listed as IHTMLxxxx. You can access them and view their type information in the Object Browser by adding a reference to the Microsoft HTML Object Library (MSHTML.TLB) to your project.

Most of the methods of IWebBrowser2 deal with explicit settings of the Explorer program itself: status bar text, menu visibility, view mode, current URL, etc. But there is one property that is especially important to us: Document. This property returns a reference to an IHTMLDocument2 interface. From this interface, we can navigate the entire Dynamic HTML object model, which is depicted in Figure 12.2. This means that our BHO has access to any element on a given web page. If we want to change the href property of an anchor tag that is nested four frames deep inside the third form on the page, no problem. If we want to execute JavaScript against the currently loaded page, we can do that too. In fact, we can programmatically change any element we want or strip information from any page we desire.

Figure 12.2. HTML object model

Let's take some of the confusion out of navigating the object model. We start by retrieving the value of the Document property of IWebBrowser2, which gives us an IHTMLDocument2 interface. From here, we can get the parent window of the document, the collection of frames inside of a document, or an object that is a part of the current document itself. Think about it in terms of a web page. A web page contains a main window. This window contains the document. The document can contain a collection of frames. Each one of these frames contains a window. Taking this frame window as a Window object brings us back to the top of the hierarchy.
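The walk through the hierarchy just described might look something like the following sketch. It assumes that m_ie is the InternetExplorer reference saved in SetSite and that the loaded document is an HTML page (the assignment to IHTMLDocument2 would fail for other document types). The event procedure is used only as a convenient place to run the code.

```vb
Private Sub m_ie_DocumentComplete(ByVal pDisp As Object, _
                                  URL As Variant)

    'IWebBrowser2 -> document
    Dim pDoc As IHTMLDocument2
    Set pDoc = m_ie.Document

    'document -> the window that contains it
    Dim pWnd As IHTMLWindow2
    Set pWnd = pDoc.parentWindow

    'document -> the collection of frame windows inside it
    Dim pFrames As IHTMLFramesCollection2
    Set pFrames = pDoc.frames

    'Debug.Print output is visible only when the project
    'is run from the Visual Basic IDE
    Debug.Print "Title:  "; pDoc.Title
    Debug.Print "Frames: "; pFrames.length

    'window -> document, which closes the loop
    Dim pSameDoc As IHTMLDocument2
    Set pSameDoc = pWnd.document

End Sub
```

Each frame in the collection is itself a window, so the same steps could be repeated from there.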

Once we have an IHTMLDocument2 interface, we can get to any element (<BODY>, <A>, <TABLE>, etc.) on a page. Each of these elements has a corresponding interface that is in the format IHTMLxxxx. As Figure 12.2 illustrates, there are several collections available to us. Let's take the "all" collection for example. The all property returns an IHTMLElementCollection interface that will allow us to iterate through every element on a web page. Once we have a specific element, we can then get any number of the IHTMLxxxx interfaces. The code fragment in Example 12.5 demonstrates this.

Example 12.5. Traversing an HTML Document

 Private WithEvents m_ie As InternetExplorer
 .
 .
 .
 Dim i As Long
 Dim pDocument As IHTMLDocument2
 Dim pElements As IHTMLElementCollection
 Dim pElement As IHTMLElement

 Set pDocument = m_ie.Document
 Set pElements = pDocument.all

 'Loop through each element
 For i = 0 To pElements.length - 1
     Set pElement = pElements.item(i)
     'IE reports tag names in uppercase, so normalize before comparing
     If UCase$(pElement.tagName) = "A" Then
         Dim pAnchor As IHTMLAnchorElement
         Set pAnchor = pElement
         pAnchor.href = "http://www.oreilly.com"
         .
         .
         .

As we loop through each element, we check the tag name. If the element is an anchor (an <A> tag), then we can QueryInterface for IHTMLAnchorElement. The code fragment in Example 12.5 would actually change the href of each anchor it came across to http://www.oreilly.com. You would be redirected to this URL if you were to then click on the link.

There are over 100 IHTMLxxxx interfaces documented in the Platform SDK, and each has methods specific to its function. It is well beyond the scope of this book to document them all. A good place to start is IHTMLWindow2 and IHTMLDocument2. You can navigate the entire Dynamic HTML Object Library from these two interfaces. Oddly enough, if you are running IE 4.0, these two interfaces are marked [hidden] in the library. If you have installed IE 5.0, all of the interfaces are visible.

12.1.6 Events

The Microsoft Internet Control provides 18 events for which we can add our code. Table 12.2 contains a complete list of the events provided by Internet Explorer. These events seemingly cover just about every situation you might envision. The events typically used in a browser extension are marked with an asterisk.

There is no event for a refresh. But the flexibility is there for you to handle this situation yourself. The NavigateComplete2 event is passed a parameter, URL, that identifies the resource to which the browser has navigated. You can store the value of this argument to a Private member variable. Then you can check the URL argument in the BeforeNavigate2 event against the stored value. If they are the same, a refresh has occurred.
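A minimal sketch of this refresh check follows. The member names m_sLastURL and m_bIsRefresh are hypothetical and are not part of the BHO built in this chapter. The sketch also takes the approach at face value: whether BeforeNavigate2 actually fires on a refresh may depend on the version of IE, so test it before relying on it.

```vb
'Hypothetical members, used only for this illustration
Private m_sLastURL As String      'URL of the last completed navigation
Private m_bIsRefresh As Boolean   'True when the same URL is requested again

Private Sub m_ie_NavigateComplete2(ByVal pDisp As Object, _
                                   URL As Variant)
    'Remember where the browser ended up
    m_sLastURL = CStr(URL)
End Sub

Private Sub m_ie_BeforeNavigate2(ByVal pDisp As Object, _
                                 URL As Variant, _
                                 Flags As Variant, _
                                 TargetFrameName As Variant, _
                                 PostData As Variant, _
                                 Headers As Variant, _
                                 Cancel As Boolean)

    'Same URL as the last completed navigation: treat it as a refresh
    m_bIsRefresh = (Len(m_sLastURL) > 0) And _
                   (StrComp(CStr(URL), m_sLastURL, vbTextCompare) = 0)

End Sub
```

Both events also fire for navigations inside frames, so on a framed page the stored URL may belong to a frame rather than to the top-level document.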

Table 12.2. DWebBrowserEvents2 Methods

Method	Fired When . . .
`StatusTextChange`	The status bar text has changed.
`ProgressChange`	Information on the progress of a download is updated.
`CommandStateChange`	The enabled state of a menu command changes.
`DownloadBegin` *	A navigation operation is starting, shortly after the BeforeNavigate2 event, unless the navigation is canceled.
`DownloadComplete` *	A navigation operation finishes, is stopped, or has failed.
`TitleChange`	The title of a document in the Web Browser control becomes available or has changed.
`PropertyChange`	The `IWebBrowser2::PutProperty` method changes the value of a property.
`BeforeNavigate2` *	The Web Browser control is about to navigate to a new URL.
`NewWindow2`	A new window is created for displaying a resource.
`NavigateComplete2` *	The browser has completed navigation to a new location.
`DocumentComplete` *	The document being navigated to has finished loading.
`OnQuit`	The Internet Explorer application is ready to quit.
`OnVisible`	The window for the WebBrowser should be shown or hidden.
`OnToolBar`	The ToolBar property has changed.
`OnMenuBar`	The MenuBar property has changed.
`OnStatusBar`	The StatusBar property has changed.
`OnFullScreen`	The FullScreen property has changed.
`OnTheaterMode`	The TheaterMode property has changed.

Okay, let's add some code to our BHO skeleton. This is going to be temporary code that we will rip out later. The purpose is just to show you some object model navigation techniques. This exercise will also demonstrate how one task can be accomplished several different ways.

If you have IE 5.0, undoubtedly you have seen the new feature that remembers values that have been entered previously. Say you go to log in to your Hotmail account. IE 5.0 will ask you if you would like to save the password information. If you say "yes," the next time you come to Hotmail, your previously entered information is available, and then you can log in automatically.

We're not going to do anything that fancy, but this example could be the foundation for such a component. All this component will do is fill in our login name when we go to Hotmail. Everything is hardcoded. This is just a beta, after all!

The first thing we need to do is to add a private Boolean member variable to our class, called m_bGoingToHotmail. Then we add some code to the BeforeNavigate2 event, as Example 12.6 shows.

Example 12.6. BeforeNavigate2 Example

 Private Sub m_ie_BeforeNavigate2(ByVal pDisp As Object, _
                                  URL As Variant, _
                                  Flags As Variant, _
                                  TargetFrameName As Variant, _
                                  PostData As Variant, _
                                  Headers As Variant, _
                                  Cancel As Boolean)

     If URL = "http://www.hotmail.com/" Then
         m_bGoingToHotmail = True
     Else
         m_bGoingToHotmail = False
     End If

 End Sub

The reason we are testing for this URL in the BeforeNavigate2 event is that at the time of this writing, when you go to Hotmail, you are immediately redirected to another URL. BeforeNavigate2 will allow us to capture the URL http://www.hotmail.com before the redirection, so we'll know we're at the right place. The result of this event procedure is that when you navigate to Hotmail, a flag indicating this fact is set. Hotmail can redirect us to wherever, and we still know we are heading there.

Now we just have to implement one more event, DocumentComplete. This event is called when the page has finished loading and the document is available. The HTML source for the Hotmail login page indicates that there is one form on the page, and that the login text box we are interested in is on the form. Example 12.7 shows one possible way of navigating to the text box and setting its value.

Example 12.7. DocumentComplete Event Procedure

 Private Sub m_ie_DocumentComplete(ByVal pDisp As Object, _
                                   URL As Variant)
     If (m_bGoingToHotmail = True) Then

         Dim i As Long

         Dim pDoc As IHTMLDocument2
         Set pDoc = m_ie.Document

         Dim pForms As IHTMLElementCollection
         Set pForms = pDoc.Forms

         Dim pForm As IHTMLFormElement
         Set pForm = pForms.Item(0)

         Dim pElements As Object
         Set pElements = pForm.elements

         Dim pElement As IHTMLElement
         Set pElement = pElements.Item("login")

         Dim pInput As IHTMLInputTextElement
         Set pInput = pElement
         pInput.Value = "oreilly"

     End If
 End Sub

Wow, that's a bunch of code for one simple task! Don't worry, we'll trim it down to size in a little while, but for now let's walk through it.

First, we get the current document by calling the Document property of m_ie and assigning the resulting object reference to pDoc. Then we get the collection of all the forms on the page and assign it to pForms. Since we have looked at the source to the page, we know there is only one form, so we can get the form directly without looping through the collection by calling:

 Set pForm = pForms.Item(0)

Now we get to an inconsistency in the model. We want to grab a collection of all the elements on the form, so we call the elements property of IHTMLFormElement. You might expect this to return an IHTMLElementCollection reference, but it does not. It returns an IDispatch interface, which is why pElements is declared As Object in Example 12.7.

Once we have all the elements on the form, we can get the element we are looking for by name with the call:

 Set pElement = pElements.Item("login")

We can then query pElement for IHTMLInputTextElement and set the value of the text box with the call:

 Dim pInput As IHTMLInputTextElement
 Set pInput = pElement
 pInput.Value = "oreilly"

Wait, don't run away. That was the convoluted, horribly inefficient way to achieve this task. Achieving this goal is really much easier than Example 12.7 shows.

Okay, delete the DocumentComplete code, and we'll try this again. Let's look at Example 12.8.

Example 12.8. Easier DocumentComplete Event Procedure

 Private Sub m_ie_DocumentComplete(ByVal pDisp As Object, _
                                   URL As Variant)
     If (m_bGoingToHotmail = True) Then

         Dim pDoc As IHTMLDocument2
         Set pDoc = m_ie.Document

         Dim pElements As IHTMLElementCollection
         Set pElements = pDoc.All

         Dim pInput As IHTMLInputTextElement
         Set pInput = pElements.Item("login")

         pInput.Value = "oreilly"

     End If
 End Sub

Once again, we get the current document by calling the Document property. But this time, instead of getting the elements of a form, we just grab the whole page by calling All. Once we have all the elements, we can get the element we are interested in directly by name.

We can also use JavaScript to achieve our goal. Example 12.9 illustrates this point.

Example 12.9. Yet Another DocumentComplete Event Procedure

 Private Sub m_ie_DocumentComplete(ByVal pDisp As Object, _
                                   URL As Variant)
     If (m_bGoingToHotmail = True) Then

         Dim pDoc As IHTMLDocument2
         Set pDoc = m_ie.Document

         Dim pWnd As IHTMLWindow2
         Set pWnd = pDoc.parentWindow

         Dim strJava As String
         strJava = "document.passwordform.login.value = 'oreilly';"

         pWnd.execScript strJava

     End If
 End Sub

That's kind of slick, isn't it? We get the current document as before. Then we get the current window by calling parentWindow. This returns an IHTMLWindow2 interface. IHTMLWindow2 is [hidden], by the way, if you're still running IE 4.0. Once we have that, we can execute JavaScript directly against the page by calling execScript. execScript takes a string that contains either JavaScript or VBScript, which can be specified by a second, optional parameter to the method.
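To illustrate the optional language parameter, here is a sketch of the same assignment written in VBScript instead of JavaScript. The helper name FillLoginWithVBScript is hypothetical. The form name passwordform and the field name login come from Example 12.9, and m_ie is assumed to hold the InternetExplorer reference saved in SetSite.

```vb
'Hypothetical helper, shown only to illustrate the second argument
Private Sub FillLoginWithVBScript()

    Dim pDoc As IHTMLDocument2
    Set pDoc = m_ie.Document

    Dim pWnd As IHTMLWindow2
    Set pWnd = pDoc.parentWindow

    'Doubled quotes embed a quote character in a VB string, so the
    'script text is: document.passwordform.login.value = "oreilly"
    Dim strScript As String
    strScript = "document.passwordform.login.value = ""oreilly"""

    'Name the script language explicitly in the second argument
    pWnd.execScript strScript, "VBScript"

End Sub
```

When the second argument is omitted, as in Example 12.9, the script is treated as JScript.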

You can execute two whole pages of JavaScript if you want. As long as each statement is delimited by a semicolon, the whole two pages could be shoved into a string and sent to execScript. But that's a little unwieldy. The easy way would be to chop up the string into manageable chunks and execute them one at a time:

 strJava = "document.passwordform.login.value = 'oreilly';"
 pWnd.execScript strJava
 strJava = "document.passwordform.submit();"
 pWnd.execScript strJava
 .
 .
 .

Before we continue, delete the code for BeforeNavigate2 and DocumentComplete, so we can have a fresh start. Ahhh, that's better.
