The Component Object Model | Programming Windows with MFC, Second Edition

COM is an acronym for Component Object Model. Simply put, COM is a way of building objects that is independent of any programming language. If you want the gory details, you can download the COM specification from Microsoft's Web site. But don't be too quick to pull out your browser: if this is your first exposure to COM, the specification might be a bit overwhelming. A better approach is to start slowly and allow yourself time to understand the big picture rather than risk getting mired in details that for the moment are unimportant.

C++ programmers are accustomed to writing classes that other C++ programmers can use. The problem with these classes is that only other C++ programmers can use them. COM tells us how to build objects in any programming language that can also be used in any programming language. In other words, COM transcends language-specific ways of building reusable objects and gives us a true binary standard for object architectures.

C++ classes have member functions; COM objects have methods. Methods are grouped into interfaces and are called through interface pointers. Interfaces exist to semantically bind together groups of related methods. For example, suppose you're writing a COM class that has methods named Add, Subtract, and CheckSpelling. Rather than make all three methods members of the same interface, you might assign Add and Subtract to an interface named IMath and CheckSpelling to an interface named ISpelling. (Prefacing interface names with a capital I for Interface is an almost universal COM programming convention.) Microsoft has predefined more than 100 interfaces that any COM object can support. These interfaces are called standard interfaces. User-defined interfaces such as IMath and ISpelling are custom interfaces. COM objects can use standard interfaces, custom interfaces, or a combination of the two.

Every COM object implements an interface named IUnknown. IUnknown contains just three methods:

Method Name	Description
QueryInterface	Returns a pointer to another interface
AddRef	Increments the object's reference count
Release	Decrements the object's reference count

One of the rules of COM says that given a pointer to an interface, a client can call any IUnknown method through that pointer as well as any methods that are specific to the interface. In other words, all interfaces must support the three IUnknown methods in addition to their own methods. This means that if you define an IMath interface with methods named Add and Subtract, the interface actually contains five methods: QueryInterface, AddRef, Release, Add, and Subtract. Most objects don't implement IUnknown as a separate interface. Because all interfaces include the IUnknown methods, most objects, if asked for an IUnknown pointer, simply return a pointer to one of their other interfaces.
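
To make the "five methods" point concrete, here is a portable C++ sketch of how such interfaces might be declared. It deliberately avoids `<windows.h>`. HRESULT, ULONG, GUID, and IID are minimal stand-ins for the SDK types, and the `__stdcall` calling convention is omitted. IMath, ISpelling, and their method signatures are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Minimal stand-ins for Windows SDK types (assumption: not the real headers).
using HRESULT = std::int32_t;
using ULONG = std::uint32_t;
struct GUID { std::uint32_t Data1; std::uint16_t Data2, Data3; std::uint8_t Data4[8]; };
using IID = GUID;

// Every COM interface begins with these three methods.
struct IUnknown {
    virtual HRESULT QueryInterface(const IID& iid, void** ppv) = 0;
    virtual ULONG AddRef() = 0;
    virtual ULONG Release() = 0;
};

// Hypothetical custom interfaces. Deriving from IUnknown means IMath really
// has five methods: QueryInterface, AddRef, Release, Add, and Subtract.
struct IMath : IUnknown {
    virtual HRESULT Add(int a, int b, int* pResult) = 0;
    virtual HRESULT Subtract(int a, int b, int* pResult) = 0;
};

struct ISpelling : IUnknown {
    virtual HRESULT CheckSpelling(const char* pszWord, bool* pCorrect) = 0;
};

// Interfaces are pure abstract: they describe methods but hold no state.
static_assert(std::is_abstract<IMath>::value, "IMath is an interface, not a class");
static_assert(std::is_abstract<ISpelling>::value, "ISpelling is an interface, not a class");
```

On mainstream compilers, single inheritance places the base's virtual functions first, so IMath's vtable begins with the three IUnknown entries followed by Add and Subtract. That layout is what lets any interface pointer be used to call the IUnknown methods.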

Figure 18-1 shows a schematic of a simple COM object. The sticks, or "lollipops" as they're sometimes called, represent the object's interfaces. The IUnknown lollipop is often omitted because it's understood that every COM object implements IUnknown.

Figure 18-1. A simple COM object.

I've been using human-readable names such as IMath to refer to interfaces, but in truth, interfaces are identified by number, not by name. Every interface is uniquely identified by a 128-bit value called an interface identifier, or IID. With 2^128 (roughly 3.4 × 10^38) possible values, the chances of you and me ever picking the same IID at random are virtually nil. Therefore, it doesn't matter if two people on different sides of the planet happen to define incompatible versions of a custom interface named IMath. What counts is that the two IMath interfaces have different IIDs.

Microsoft Visual C++ comes with two tools for generating IIDs. One is a command line utility named Uuidgen. The other is a GUI application named Guidgen. Both utilities do their best to maximize the randomness of the 128-bit numbers they generate, even factoring in variables such as your network card's Ethernet ID and the time of day. You can generate IIDs programmatically with the COM API function CoCreateGuid. The Guid in CoCreateGuid stands for globally unique identifier, a generic term that describes any 128-bit identifier. An IID is simply a special GUID.
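
The Windows SDK represents a GUID as a 16-byte structure. The portable sketch below mirrors that layout, compares two GUIDs bit for bit, and formats one in the brace-delimited form used in the registry. GuidToString is a hypothetical helper (the SDK's own is StringFromGUID2), and the IID value is made up for illustration; a real one would come from Guidgen, Uuidgen, or CoCreateGuid, none of which this sketch calls.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Portable mirror of the SDK GUID layout: 32 + 16 + 16 + 8 * 8 = 128 bits.
struct GUID {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};
static_assert(sizeof(GUID) == 16, "a GUID is exactly 128 bits");

// Two GUIDs identify the same thing only if all 128 bits match.
inline bool IsEqualGUID(const GUID& a, const GUID& b) {
    return std::memcmp(&a, &b, sizeof(GUID)) == 0;
}

// Hypothetical helper producing the registry-style string form.
std::string GuidToString(const GUID& g) {
    char buf[39];  // 38 characters plus the terminating null
    std::snprintf(buf, sizeof(buf),
        "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
        static_cast<unsigned>(g.Data1), g.Data2, g.Data3,
        g.Data4[0], g.Data4[1], g.Data4[2], g.Data4[3],
        g.Data4[4], g.Data4[5], g.Data4[6], g.Data4[7]);
    return buf;
}

// Made-up IID for the hypothetical IMath interface (not a registered value).
const GUID IID_IMath = { 0x12345678, 0x9ABC, 0xDEF0,
    { 0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF } };
```

Flipping even a single bit yields a different identifier, which is why two independently defined interfaces named IMath never collide as long as each was given its own freshly generated IID.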

Instantiating a COM Object

COM classes, like interfaces, are identified by 128-bit values. GUIDs that identify classes are called class IDs, or CLSIDs. All a client needs to know in order to instantiate an object is the object's CLSID. COM has an API of its own that includes activation functions for creating object instances. The most commonly used activation function is CoCreateInstance, which accepts a CLSID and returns an interface pointer to an object. The following statements instantiate the COM class whose CLSID is CLSID_Object and cache a pointer to the object's IMath interface in pMath:

    IMath* pMath;
    CoCreateInstance (CLSID_Object, NULL,
        CLSCTX_SERVER, IID_IMath, (void**) &pMath);

IID_IMath is simply a variable that holds IMath's 128-bit interface ID.

Once it has an interface pointer, a C++ client can call methods on that interface using the -> operator. The following statements call IMath::Add to add a pair of numbers:

    int sum;
    pMath->Add (2, 2, &sum);

Add doesn't return the sum of the two inputs directly; instead, it copies the result to an address specified by the caller—in this case, to the variable named sum. That's because COM methods return special 32-bit values called HRESULTs. An HRESULT tells the caller whether a call succeeded or failed. It can also provide detailed information about the nature of the failure if the call doesn't succeed. You might think that a method as simple as Add can never fail, but it could fail if the object that implements the method is running on a remote network server and the client is unable to contact the server because a cable has been disconnected. If that happens, the system steps in and returns an HRESULT informing the caller that the call didn't go through.
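
The portable sketch below illustrates the HRESULT conventions. A negative value (high bit set) means failure, and the low 16 bits carry a specific code. S_OK and E_POINTER use their real SDK values. The rest are stand-ins: SUCCEEDED and FAILED are written as functions rather than the SDK's macros, HResultFacility and HResultCode are hypothetical helpers modeled on HRESULT_FACILITY and HRESULT_CODE, and Add is a free function standing in for the hypothetical IMath::Add.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-ins for the SDK's HRESULT machinery (assumption: no <windows.h>).
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
// Failure codes have the severity (high) bit set, so they are negative.
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);

inline bool SUCCEEDED(HRESULT hr) { return hr >= 0; }
inline bool FAILED(HRESULT hr) { return hr < 0; }

// Hypothetical helpers that pull apart the fields of an HRESULT.
inline std::uint32_t HResultFacility(HRESULT hr) {
    return (static_cast<std::uint32_t>(hr) >> 16) & 0x1FFF;
}
inline std::uint32_t HResultCode(HRESULT hr) {
    return static_cast<std::uint32_t>(hr) & 0xFFFF;
}

// Stand-in for IMath::Add: the return value reports status, and the
// result travels back through a caller-supplied out-parameter.
HRESULT Add(int a, int b, int* pResult) {
    if (pResult == nullptr)
        return E_POINTER;  // caller gave us nowhere to put the sum
    *pResult = a + b;
    return S_OK;
}
```

Because status and result travel separately, the same calling pattern works whether the failure comes from the method itself (a null out-pointer here) or from the system reporting that a remote call never arrived.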

One aspect of COM that newcomers frequently find confusing is the fact that every externally creatable COM class (that is, every COM class that can be instantiated by passing a CLSID to CoCreateInstance) is accompanied by a class object. A class object is also a COM object. Its sole purpose in life is to create other COM objects. Passing a CLSID to CoCreateInstance appears to instantiate an object directly, but internally, CoCreateInstance first instantiates the object's class object and then asks the class object to create the object. Most class objects implement a special COM interface known as IClassFactory (or IClassFactory2, a newer version of the interface that is a functional superset of IClassFactory). A class object that implements IClassFactory is called a class factory. Given an IClassFactory interface pointer, a client creates an object instance by calling IClassFactory::CreateInstance. CreateInstance has been described as the COM equivalent of the new operator in C++.
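
The two-step activation can be modeled in portable C++. Everything here is hypothetical and heavily simplified. Strings stand in for CLSIDs, a std::map stands in for the class objects COM locates, IObject replaces IUnknown, and CreateInstance omits the outer-unknown and IID parameters that the real IClassFactory::CreateInstance takes. Reference counting is also left out, so the caller deletes the object directly, something a real COM client never does.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical object interface (reference counting omitted for brevity).
struct IObject {
    virtual ~IObject() = default;
    virtual int Add(int a, int b) = 0;
};

// Hypothetical, reduced analogue of IClassFactory.
struct IClassFactoryLike {
    virtual ~IClassFactoryLike() = default;
    virtual IObject* CreateInstance() = 0;  // COM's answer to operator new
};

// The COM class itself...
struct MathObject : IObject {
    int Add(int a, int b) override { return a + b; }
};

// ...and its class object, whose sole job is to create MathObjects.
struct MathClassFactory : IClassFactoryLike {
    IObject* CreateInstance() override { return new MathObject; }
};

// Stand-in for the class objects COM can locate by CLSID.
std::map<std::string, std::unique_ptr<IClassFactoryLike>>& ClassTable() {
    static std::map<std::string, std::unique_ptr<IClassFactoryLike>> table;
    return table;
}

// What CoCreateInstance does behind the scenes, in two steps.
IObject* CreateObject(const std::string& clsid) {
    auto it = ClassTable().find(clsid);   // 1. obtain the class object
    if (it == ClassTable().end())
        return nullptr;                   // unknown CLSID
    return it->second->CreateInstance();  // 2. ask it to create the object
}
```

The client calls only CreateObject and never sees MathClassFactory, just as a COM client calling CoCreateInstance never handles the class object itself.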

Not all COM classes are externally creatable. Some are intended only for private use and can't be instantiated with CoCreateInstance because they have no CLSIDs and no class factories. C++ programmers instantiate these objects by calling new on the C++ classes that implement the objects. Typically, these objects play a part in implementing a COM-based protocol such as drag-and-drop data transfers. Some of MFC's COM classes fit this profile. You'll learn more about them when we discuss the various COM and ActiveX technologies that MFC supports.

Object Lifetimes

C++ programmers are used to creating heap-based objects using the C++ new operator. They're also accustomed to calling delete to delete the objects that they create with new. COM differs from C++ in this respect, because clients create object instances but they don't delete them. Instead, COM objects delete themselves. Here's why.

Suppose two or more clients are using the same instance of an object. Client A creates the object, and Client B attaches to the object by somehow acquiring an interface pointer. If Client A, unaware that Client B exists, deletes the object, Client B is left with an interface pointer that no longer points to anything. Because a COM client typically doesn't know (and doesn't care) whether it's the sole user of an object or one of many users, COM leaves it up to the object to delete itself. Deletion occurs when an internal reference count maintained by the object drops to 0. The reference count is a running count of the number of clients holding pointers to the object's interfaces.

For COM classes implemented in C++, the reference count is typically stored in a member variable. The count is incremented when AddRef is called and decremented when Release is called. (Remember that because AddRef and Release are IUnknown methods, they can be called through any interface pointer.) Implementations of AddRef and Release are normally no more complicated than this:

    ULONG __stdcall CComClass::AddRef () {
        return ++m_lRef;
    }

    ULONG __stdcall CComClass::Release () {
        if (--m_lRef == 0) {
            delete this;    // Last reference released: the object destroys itself
            return 0;       // Don't touch m_lRef after delete this
        }
        return m_lRef;
    }

In this example, CComClass is a C++ class that represents a COM class. m_lRef is the member variable that holds the object's reference count. If every client calls Release when it's finished using an interface, the object conveniently deletes itself when the last client calls Release.
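
Here is a portable sketch of that lifetime rule with two clients sharing one object. This CComClass is hypothetical. Create plays the role of a class factory and calls AddRef on the pointer it hands out, and g_liveObjects exists only so the test can observe self-deletion. The counter is a plain integer, so this version is single-threaded. Real objects that can be called from several threads usually adjust the count with InterlockedIncrement and InterlockedDecrement.

```cpp
#include <cassert>
#include <cstdint>

using ULONG = std::uint32_t;  // stand-in for the Windows SDK typedef

int g_liveObjects = 0;  // test aid: how many CComClass objects exist

class CComClass {
public:
    // Hands out the first pointer, so it is responsible for the first AddRef.
    static CComClass* Create() {
        CComClass* p = new CComClass;
        p->AddRef();
        return p;
    }
    ULONG AddRef() { return ++m_lRef; }
    ULONG Release() {
        if (--m_lRef == 0) {
            delete this;  // last reference gone: the object deletes itself
            return 0;
        }
        return m_lRef;
    }
private:
    CComClass() { ++g_liveObjects; }
    ~CComClass() { --g_liveObjects; }  // private: clients cannot call delete
    ULONG m_lRef = 0;
};
```

Making the destructor private turns "clients don't delete COM objects" from a convention into a compile-time rule. The only path to destruction is the final Release.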

A bit of protocol is involved in using AddRef and Release. It's the responsibility of the object—not the client—to call AddRef whenever it hands out an interface pointer. However, it's the client's responsibility to call Release. Clients sometimes call AddRef themselves to indicate that they're making a copy of the interface pointer. In such cases, it's still up to the client (or whomever the client hands the copied interface pointer to) to call Release when the interface pointer is no longer needed.
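
Smart pointers can automate the client's half of this protocol. ATL's CComPtr and the compiler-supported _com_ptr_t do this for real interface pointers. The portable RefPtr below is a hypothetical, stripped-down illustration of the idea: copying it calls AddRef, and destroying it calls Release. Counted is a toy ref-counted class. Its RefCount accessor and the g_live counter exist only for testing, since real COM interfaces expose no way to read the count.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using ULONG = std::uint32_t;

int g_live = 0;  // test aid: number of live Counted objects

// Toy ref-counted object; starts with one reference owned by its creator.
class Counted {
public:
    Counted() { ++g_live; }
    ULONG AddRef() { return ++m_ref; }
    ULONG Release() {
        ULONG n = --m_ref;
        if (n == 0)
            delete this;
        return n;
    }
    ULONG RefCount() const { return m_ref; }  // test aid only
private:
    ~Counted() { --g_live; }
    ULONG m_ref = 1;
};

// Hypothetical smart pointer: copy means AddRef, destruction means Release.
template <class T>
class RefPtr {
public:
    RefPtr() = default;
    explicit RefPtr(T* p) : m_p(p) {}  // adopts a reference the caller already owns
    RefPtr(const RefPtr& other) : m_p(other.m_p) {
        if (m_p)
            m_p->AddRef();             // a new copy is a new reference
    }
    RefPtr& operator=(RefPtr other) {  // copy-and-swap: AddRef new, Release old
        std::swap(m_p, other.m_p);
        return *this;
    }
    ~RefPtr() {
        if (m_p)
            m_p->Release();            // this copy is no longer needed
    }
    T* operator->() const { return m_p; }
    T* get() const { return m_p; }
private:
    T* m_p = nullptr;
};
```

Every copy pairs its AddRef with exactly one Release, so the object is freed when the last RefPtr goes out of scope, however many copies were made along the way.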

Acquiring Interface Pointers

The CoCreateInstance example we examined earlier created an object and asked for an IMath interface pointer. Now suppose that the object also implements ISpelling. How would a client that holds an IMath pointer ask the object for an ISpelling pointer?

That's what the third of the three IUnknown methods is for. Given an interface pointer, a client can call QueryInterface through that pointer to get a pointer to any other interface that the object supports. Here's how it looks in code:

    IMath* pMath;
    HRESULT hr = CoCreateInstance (CLSID_Object, NULL,
        CLSCTX_SERVER, IID_IMath, (void**) &pMath);

    if (SUCCEEDED (hr)) { // CoCreateInstance worked.
        ISpelling* pSpelling;
        hr = pMath->QueryInterface (IID_ISpelling, (void**) &pSpelling);
        if (SUCCEEDED (hr)) {
            // Got the interface pointer!
            pSpelling->Release ();
        }
        pMath->Release ();
    }

Notice that this time, the client checks the HRESULT returned by CoCreateInstance to make sure that the activation request succeeded. Sometime after the object is created, the client uses QueryInterface to request an ISpelling pointer, once more checking the HRESULT rather than simply assuming that the pointer is valid. (The SUCCEEDED macro tells a client whether an HRESULT code signifies success or failure. A related macro named FAILED can be used to test for failure.) Both interfaces are released when they're no longer needed. When Release is called through the IMath pointer, the object deletes itself if no other clients are holding interface pointers.
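
Here is a portable sketch of the object's side of that exchange: one C++ class that implements two interfaces through multiple inheritance and shares a single reference count. IID_IUnknown uses its real value, {00000000-0000-0000-C000-000000000046}. The IMath and ISpelling IIDs, CMathSpeller, and its one-word spell checker are hypothetical, and the Windows types are minimal stand-ins. QueryInterface AddRefs every pointer it hands out, sets the output to null and returns E_NOINTERFACE for unsupported interfaces, and always answers a request for IUnknown with the same pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Minimal stand-ins for Windows SDK types (not the real headers).
using HRESULT = std::int32_t;
using ULONG = std::uint32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002u);
inline bool SUCCEEDED(HRESULT hr) { return hr >= 0; }

struct GUID { std::uint32_t Data1; std::uint16_t Data2, Data3; std::uint8_t Data4[8]; };
using IID = GUID;
inline bool operator==(const GUID& a, const GUID& b) {
    return std::memcmp(&a, &b, sizeof(GUID)) == 0;
}

const IID IID_IUnknown  = { 0, 0, 0, { 0xC0, 0, 0, 0, 0, 0, 0, 0x46 } };  // real value
const IID IID_IMath     = { 1, 0, 0, { 0, 0, 0, 0, 0, 0, 0, 1 } };        // made up
const IID IID_ISpelling = { 2, 0, 0, { 0, 0, 0, 0, 0, 0, 0, 2 } };        // made up

struct IUnknown {
    virtual HRESULT QueryInterface(const IID& iid, void** ppv) = 0;
    virtual ULONG AddRef() = 0;
    virtual ULONG Release() = 0;
};
struct IMath : IUnknown { virtual HRESULT Add(int a, int b, int* pSum) = 0; };
struct ISpelling : IUnknown { virtual HRESULT CheckSpelling(const char* w, bool* pOk) = 0; };

class CMathSpeller final : public IMath, public ISpelling {
public:
    // Plays the class factory's role: create, then hand out the requested interface.
    static HRESULT CreateInstance(const IID& iid, void** ppv) {
        CMathSpeller* p = new CMathSpeller;        // count starts at 0
        HRESULT hr = p->QueryInterface(iid, ppv);  // success bumps it to 1
        if (!SUCCEEDED(hr))
            delete p;                              // no one holds a pointer
        return hr;
    }
    HRESULT QueryInterface(const IID& iid, void** ppv) override {
        if (iid == IID_IUnknown)  // go through IMath: a direct cast would be ambiguous
            *ppv = static_cast<IUnknown*>(static_cast<IMath*>(this));
        else if (iid == IID_IMath)
            *ppv = static_cast<IMath*>(this);
        else if (iid == IID_ISpelling)
            *ppv = static_cast<ISpelling*>(this);
        else {
            *ppv = nullptr;       // never leave a stale pointer on failure
            return E_NOINTERFACE;
        }
        AddRef();                 // the object AddRefs every pointer it hands out
        return S_OK;
    }
    ULONG AddRef() override { return ++m_lRef; }
    ULONG Release() override {
        if (--m_lRef == 0) {
            delete this;
            return 0;
        }
        return m_lRef;
    }
    HRESULT Add(int a, int b, int* pSum) override { *pSum = a + b; return S_OK; }
    HRESULT CheckSpelling(const char* w, bool* pOk) override {
        *pOk = std::strcmp(w, "teh") != 0;  // toy dictionary: one known typo
        return S_OK;
    }
private:
    ~CMathSpeller() = default;
    ULONG m_lRef = 0;
};
```

Because both interfaces inherit IUnknown separately, the class ends up with two IUnknown subobjects. Routing every IUnknown request through IMath is what keeps the object's identity stable, which lets clients compare two IUnknown pointers to find out whether they refer to the same object.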

NOTE
There is no COM function that you can call to enumerate all of an object's interfaces. The assumption is that the client knows what interfaces an object supports, so it can call QueryInterface to obtain pointers to any and all interfaces. An object can publish a list of the interfaces that it supports using a mechanism known as type information. Some COM objects make type information available to their clients, and some don't. Certain types of COM objects, ActiveX controls included, are required to publish type information. You'll see why when we examine the ActiveX control architecture in Chapter 21.

COM Servers

If COM is to create objects in response to activation requests, it must know where to find each object's executable file. An executable that implements a COM object is called a COM server. The HKEY_CLASSES_ROOT\CLSID section of the registry contains information that correlates CLSIDs and executable files. For example, if a server named MathSvr.exe implements Math objects and a client calls CoCreateInstance with Math's CLSID, COM looks up the CLSID in the registry, extracts the path to MathSvr.exe, and launches the EXE. The EXE, in turn, hands COM a class factory, and COM calls the class factory's CreateInstance method to create an instance of the Math object.

COM servers come in two basic varieties: in-process and out-of-process. In-process servers (often referred to as in-proc servers) are DLLs. They're called in-procs because in the Win32 environment, a DLL loads and runs in the same address space as its client. EXEs, in contrast, run in separate address spaces that are physically isolated from one another. In most cases, calls to in-proc objects are very fast because they're little more than calls to other addresses in memory. Calling a method on an in-proc object is much like calling a subroutine in your own application.
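
The registry lookup described earlier can be sketched with an in-memory map standing in for HKEY_CLASSES_ROOT\CLSID. InprocServer32 (for DLLs) and LocalServer32 (for EXEs) are the real subkey names. FakeRegistry, FindServer, the sample CLSID strings, and the paths are hypothetical. The sketch assumes that when both entries are present the in-proc server wins, reflecting COM's preference for the most efficient context the caller allows. It ignores everything else real activation considers, such as the CLSCTX flags, threading models, and remote-server settings.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for HKEY_CLASSES_ROOT\CLSID: keys look like
// "{clsid}\\InprocServer32" or "{clsid}\\LocalServer32"; values are paths.
using FakeRegistry = std::map<std::string, std::string>;

enum class ServerKind { None, InProc, Local };

struct ServerLocation {
    ServerKind kind;
    std::string path;
};

// Finds the executable that implements a CLSID, preferring an in-proc DLL.
ServerLocation FindServer(const FakeRegistry& reg, const std::string& clsid) {
    auto dll = reg.find(clsid + "\\InprocServer32");
    if (dll != reg.end())
        return { ServerKind::InProc, dll->second };  // load into the client's process
    auto exe = reg.find(clsid + "\\LocalServer32");
    if (exe != reg.end())
        return { ServerKind::Local, exe->second };   // launch a separate process
    return { ServerKind::None, "" };                 // class is not registered
}
```

The kind of server found determines everything that follows: an in-proc DLL is mapped into the client's address space and called directly, while a local EXE is launched in its own process and reached through marshaled calls.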

Out-of-process servers (also known as out-of-proc servers) come in EXEs. One advantage to packaging COM objects in EXEs is that clients and objects running in two different processes are protected from one another if one crashes. The disadvantage is speed. Calls to objects in other processes are roughly 1,000 times slower than calls to in-proc objects because of the overhead incurred when a method call crosses process boundaries.

Microsoft Windows NT 4.0 introduced Distributed COM (DCOM), which gives out-of-proc servers the freedom to run on remote network servers. It's simple to take an out-of-proc server that has been written, tested, and debugged locally and deploy it on a network. (As of Windows NT 4.0 Service Pack 2, in-proc servers can also run remotely using a mechanism that relies on surrogate EXEs to host the DLLs.) CoCreateInstance and other COM activation functions are fully capable of creating objects that reside elsewhere on the network. Even legacy COM servers written before DCOM came into existence can be remoted with a few minor registry changes.

To differentiate out-of-proc servers that serve up objects on the same machine from out-of-proc servers that run on remote machines, COM programmers use the terms local server and remote server. A local server is an EXE that runs on the same machine as its client; a remote server, in contrast, runs elsewhere on the network. Although there are important structural differences between in-proc and out-of-proc servers, there are no differences between local and remote servers. Objects designed with DCOM in mind are often tweaked to leverage the operating system's underlying security model or to improve performance. But optimizations aside, the fact remains that local servers and remote servers share the exact same server and object architectures.

Location Transparency

One of COM's most powerful features is location transparency. Simply put, location transparency means that a client neither knows nor cares where an object lives. The exact same sequence of instructions that calls a method on an object running in the same address space as the client also calls a method on an object running in another process or even on another machine. A lot of magic goes on behind the scenes to make location transparency work, but COM handles the bulk of it.

When a method call goes out to an object in another process or on another machine, COM remotes the call. As part of the remoting process, COM marshals the method's parameters and return values. Marshaling comes in many forms, but the most common type of marshaling essentially reproduces the caller's stack frame in the call recipient's address space. Proxies and stubs carry out most marshaling and remoting. When a client is handed an interface pointer to an object running in a process other than its own, COM creates an interface proxy in the client process and an interface stub in the server process. Interface pointers held by the client are really interface pointers to the proxy, which implements the same interfaces and methods as the real object. When a client calls a method on the object, the call goes to the proxy, which uses some type of interprocess communication (IPC) to forward the call to the stub. The stub unpackages the method parameters, calls the object, and marshals any return values back to the proxy. Figure 18-2 illustrates the relationship between clients, objects, proxies, and stubs.
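
To demystify proxies and stubs, here is a toy, single-process model in portable C++. Every name is hypothetical, the IUnknown methods are omitted, and a std::function stands in for the interprocess channel. The proxy flattens the arguments of Add into a byte buffer. The stub rebuilds them, calls the real object, and flattens the HRESULT and the sum into a reply. Real COM proxies and stubs also handle interface-pointer parameters, memory ownership, and byte order between machines (through the NDR wire format), none of which this sketch attempts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;

// Interface implemented by both the real object and the proxy.
struct IMath {
    virtual ~IMath() = default;
    virtual HRESULT Add(int a, int b, int* pSum) = 0;
};

// The real object, notionally living in the server process.
struct MathObject : IMath {
    HRESULT Add(int a, int b, int* pSum) override { *pSum = a + b; return S_OK; }
};

using Packet = std::vector<std::uint8_t>;

// Copy a value into, or out of, a byte buffer.
template <class T> void Put(Packet& p, const T& v) {
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(&v);
    p.insert(p.end(), bytes, bytes + sizeof v);
}
template <class T> T Get(const Packet& p, std::size_t& pos) {
    T v;
    std::memcpy(&v, p.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Server side: unmarshal the request, call the object, marshal the reply.
struct MathStub {
    IMath* target;
    Packet Invoke(const Packet& request) {
        std::size_t pos = 0;
        int a = Get<int>(request, pos);
        int b = Get<int>(request, pos);
        int sum = 0;
        HRESULT hr = target->Add(a, b, &sum);
        Packet reply;
        Put(reply, hr);
        Put(reply, sum);
        return reply;
    }
};

// Client side: implements IMath, but forwards every call over the channel.
struct MathProxy : IMath {
    std::function<Packet(const Packet&)> channel;
    HRESULT Add(int a, int b, int* pSum) override {
        Packet request;
        Put(request, a);
        Put(request, b);
        Packet reply = channel(request);  // in COM: IPC or a network transport
        std::size_t pos = 0;
        HRESULT hr = Get<HRESULT>(reply, pos);
        *pSum = Get<int>(reply, pos);
        return hr;
    }
};
```

The client holds only an IMath pointer to the proxy. Nothing at the call site reveals that the object is on the far side of a channel, which is location transparency in miniature.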

Figure 18-2. Proxies and stubs.

Where do proxies and stubs come from? If an object uses only standard interfaces, COM supplies the proxies and stubs. If an object uses custom interfaces, it's up to the object implementor to provide the proxies and stubs in the form of a proxy/stub DLL. The good news is that you rarely need to write a proxy/stub DLL by hand. Visual C++ comes with a tool called the MIDL (Microsoft Interface Definition Language) compiler that "compiles" IDL (Interface Definition Language) files, producing the source code for proxy/stub DLLs. The bad news is that now you have to learn another language—IDL. IDL has been called the lingua franca of COM. The better you know your IDL, the better equipped you are to optimize the performance of local and remote servers. You can avoid IDL and MIDL altogether by using an alternative marshaling strategy known as custom marshaling, but custom marshaling is so difficult to implement correctly that proxies and stubs are the way to go unless you have clear and compelling reasons to do otherwise. You can opt for other ways to avoid writing proxies and stubs if you're willing to make a few trade-offs in flexibility and performance. One of those other ways is Automation, which we'll discuss in Chapter 20.

The key to location transparency is the fact that when clients communicate with objects in other processes, they don't know that they're really communicating through proxies and stubs. All a client knows is that it has an interface pointer and that method calls through that interface pointer work. Now you know why.

Object Linking and Embedding

Before there was COM, there was object linking and embedding, better known by the acronym OLE. OLE allows you to place content objects created by one application in documents created by another application. One use for OLE is to place Excel spreadsheets inside Word documents. (See Figure 18-3.) In such a scenario, Excel acts as an OLE server by serving up an embedded or linked spreadsheet object (a "content object") and Word acts as an OLE container by hosting the object.

Figure 18-3. A Microsoft Excel chart embedded in a Microsoft Word document.

OLE is a complex software protocol that describes how OLE servers talk to OLE containers and vice versa. Microsoft built OLE 1.0 on top of Dynamic Data Exchange (DDE). DDE proved to be a less than ideal IPC mechanism, so Microsoft invented COM to serve as the underlying IPC mechanism for OLE 2.0. For a long time, Microsoft affixed the OLE label to all new COM technologies: Automation became OLE Automation, ActiveX controls were named OLE controls, and so on. Microsoft even went so far as to say that OLE was no longer an acronym; it was a word. It wasn't until the term ActiveX was coined in 1995 that Microsoft reversed itself and said, in effect, "We've changed our minds; OLE once again stands for object linking and embedding." Despite this reversal, many programmers still (erroneously) use the terms COM and OLE interchangeably. They are not synonymous. COM is the object model that forms the foundation for all OLE and ActiveX technologies. OLE is the technology that allows you to place Excel spreadsheets inside Word documents. Get used to this new world order, and you'll avoid the confusion that has stricken so many programmers.

Just how does OLE use COM? When an OLE server such as Excel serves up a spreadsheet object to a container such as Word, it creates one or more COM objects that implement certain standard interfaces such as IOleObject and IViewObject. Word, too, creates COM objects that conform to published specifications. The architecture is generic in that it isn't limited to Word and Excel; any application can be an OLE container, an OLE server, or both. The container and the server communicate by calling methods through interface pointers. Thanks to location transparency, it doesn't matter that the container and the server are running in different processes, although some of OLE's COM interfaces must be implemented in proc to work around certain limitations of Windows. For example, when a container asks a server to draw an object in the container's window, the server must draw into the container's device context. Because device context handles aren't portable between processes, that part of the server must be implemented in proc.

Figure 18-4 shows a schematic of a simple embedding container. For each content object embedded in the container's document, the container implements a site object. At a minimum, a site object must implement the COM interfaces IOleClientSite and IAdviseSink. To talk to the container, the server calls methods through pointers to these interfaces. The simplicity of this diagram belies the inward complexity of real-life linking and embedding servers, but it nonetheless illustrates the role that COM plays as an enabling technology.

Figure 18-4. A simple embedding container.

For the record, linked objects and embedded objects are fundamentally different. Embedded objects are stored in the container's document file alongside the container's native document data. Linked objects, on the other hand, are stored in external files. The container's document stores only a link to the object, which is a fancy way of saying that the container stores the name of and path to the file that holds the object's data. Links can be more sophisticated than that. If you create a link to a range of cells in an Excel spreadsheet, for example, the link includes information identifying the range as well as the path to the file.

Active Documents

In my opinion, OLE is the least interesting of all the COM technologies that Microsoft has defined, so I won't cover it further in this book. (If you want to learn more about it, start with the OLE lessons in the Scribble tutorial that comes with Visual C++.) However, one COM-based technology that has grown out of OLE at least deserves mention because it is potentially very useful. That technology is Active Documents.

The Active Documents protocol is a superset of object linking and embedding. It permits Active Document containers such as Microsoft Internet Explorer to open document files created by Active Document servers such as Word and Excel. Ever notice how you can open a Word DOC file or an Excel XLS file inside Internet Explorer? Internet Explorer appears to understand the Word and Excel file formats. It doesn't. What's really happening is that Internet Explorer talks to Word or Excel through—you guessed it—COM interfaces. Word or Excel runs in the background (you can prove that by viewing the task list while a DOC or XLS file is open in Internet Explorer) and essentially takes over the interior of Internet Explorer's window. You're really using Word or Excel, although it certainly doesn't look that way.

Active Documents really pay off when you post a Word or an Excel document on a Web site. If the machine on which Internet Explorer is running has Word and Excel installed, you can view DOC and XLS files as effortlessly as you do HTML pages. That's Active Documents at work.

ActiveX

First there was OLE. Next there was COM. And then along came ActiveX. When Microsoft turned its attention to the Internet in 1995, the software giant coined the term ActiveX to refer to a suite of COM-based technologies designed to make the Internet—and the World Wide Web in particular—more interactive. ActiveX controls are probably the best-known ActiveX technology, but there are others. If "Active" is in the name, it's an ActiveX technology: ActiveX controls, ActiveX Data Objects (ADO), Active Server Pages (ASP), and Active Documents, to name but a few. The roster is growing every day.

The one thing all ActiveX technologies have in common is that they're all COM-based. ActiveX controls, for example, are COM objects that conform to the rules of behavior set forth in Microsoft's OLE control (OCX) specifications. Applications that host ActiveX controls also implement COM interfaces; officially, they're known as ActiveX control containers.

Writing a full-blown ActiveX control—that is, one that can be plugged into a Web page or displayed in a window or a dialog box—is not a trivial undertaking. The ActiveX control architecture is complex. A typical ActiveX control implements more than a dozen COM interfaces, some of which contain more than 20 methods. Even something as seemingly simple as plugging an ActiveX control into a dialog box is far more complex than most people realize. To host an ActiveX control, a dialog box has to be an ActiveX control container, and containers must implement a number of COM interfaces of their own.

Fortunately, MFC does an excellent job of wrapping ActiveX controls and control containers. Check a box in AppWizard, and any dialog box instantly becomes a control container. You don't have to write a single line of code because MFC provides all the necessary infrastructure. MFC also simplifies ActiveX control development. Writing an ActiveX control from scratch can easily require two months of development time, but do it with MFC and you can write a fully functional control in a matter of hours. Why? Because MFC provides stock implementations of COM's ActiveX control interfaces. All you have to do is override a virtual function here and there and add the elements that make your control different from the rest.