A Brief Overview of COM | The Guru's Guide to SQL Server™ Stored Procedures, XML, and HTML

If you've built many Windows applications, you probably have at least a passing familiarity with OLE and ActiveX. OLE originally stood for Object Linking and Embedding and represented the first generation of cross-application object access and manipulation in Windows. The idea was to have a document-centric view of the world where an object from one application could happily reside in and interact with another. OLE 1.0 used DDE (dynamic data exchange) to facilitate communication between objects. DDE is a message-based interprocess communication mechanism that's based on the Windows messaging architecture. DDE has a number of shortcomings (it's slow, inflexible, difficult to program, and so on), so the second version of OLE moved away from it.

The second iteration of OLE was rewritten to depend entirely on COM. And even though COM is more efficient and faster than DDE, OLE is still a bit of a bear to deal with. Why? Because it was the first-ever implementation of COM. We've learned a lot since then. That said, OLE provides functionality that's very powerful and very rich. It may be big, slow, and hard to code to, but that's not COM's fault; that has to do with how OLE itself was built.

ActiveX is also built on COM. The original and still primary focus of ActiveX is on Internet-enabled components. ActiveX is a set of technologies whose primary mission is to enable interactive content (hence, the "Active" designation) on Web pages. Formerly known as OLE controls or OCX controls, ActiveX controls are components you can insert into a Web page or Windows application to make use of packaged functionality provided by a third party.

COM is the foundation on which OLE and ActiveX controls are built. Through COM, an object can expose its functionality to other components and applications. In addition to defining an object's life cycle and how the object exposes itself to the outside world, COM also defines how this exposure works across processes and networks.

COM is Microsoft's answer to the fundamental questions: How do I expose the classes in my code to other applications in a language-neutral fashion? How do I provide an object-oriented way for users of my DLL to use it? How can people make use of my work without needing source code or header files?

Before COM

There was a time not so long ago in software development when it was quite normal to distribute full source code and/or header files with third-party libraries. To make use of these libraries, people simply compiled them (or included their header files) into applications. The end result was a single executable that might contain code from many different vendors. Because it was common for many developers to use the same third-party library, a version of the library might exist in the executables deployed with numerous products. Executables tended to be relatively large and there was little or no code sharing between them. Updating one of these third-party libraries required recompilation and/or relinking because the library was incorporated directly into the executable at compile time.

This all changed with Windows' introduction of DLLs. Almost overnight, it became quite common for third-party vendors to ship only header files and binaries. Instead of being able to deploy a single executable, the developer would end up distributing a sometimes sizeable collection of DLLs with his application. At runtime, it was up to the application to load (either implicitly or explicitly) the DLLs provided by the third-party vendor. As applications became more complex, it was not uncommon to see executables that required dozens of DLLs with complex interdependencies between them.

NOTE

This is, in fact, how Windows itself works. Windows is an executable with a large collection of DLLs. Windows apps make calls to the functions exposed by these DLLs.

This approach worked reasonably well, but it had several drawbacks. One of the main ones was that the interfaces to these DLLs weren't object-oriented, and therefore were difficult to extend and susceptible to being broken by even minor changes to an exposed function. If a vendor added a new parameter to a function in his third-party library, he might well break the code of everyone currently using that library. The approach most vendors took to address this was simply to create a new version of the function (often with an "Ex" suffix or something similar) that included the new parameter. The end result was call-level interfaces (CLIs) that became unmanageable very quickly. It was common for third-party libraries (and even Windows itself) to include multiple versions of the same function call in an attempt to be compatible with every version of the library that had ever existed. The situation quickly grew out of control and was exacerbated by the fact that there was no easy, direct method for users of these libraries to know which of the many versions of a given function should be used. Coding to these interfaces became a trial-and-error exercise that involved lots of scouring API manuals and guesswork.

Another big problem with this approach was the proliferation of multiple copies of the same DLL across a user's computer. Hard drive space was once much more expensive than it is now, so having multiple copies of a library in different places on an end user's system was something vendors sought to avoid. Unfortunately, their solution to the problem wasn't well thought out. Their answer was to put the DLLs their apps needed in the Windows system directory. This addressed the problem of having multiple copies of the same DLL, but introduced a whole host of others.

Chief among these were the problems inherent with conflicting versions of the same DLL. If vendor A and vendor B depended on different versions of a DLL produced by vendor C, there was a strong likelihood that one of their products would be broken by the other's version of the DLL. If the interface to the DLL changed even slightly between versions, it was quite likely that at least one of the apps would misbehave (if it worked at all) when presented with a version of the DLL it wasn't expecting.

Another problem with centralizing DLLs was the trouble that arose from centralized, yet unmanaged configuration information. In the days before the Windows registry, it was common to have a separate configuration file (usually with an .INI extension) for every application (and even multiple configuration files for some applications). These configuration files might include paths to DLLs of which the application made use, further complicating the task of unraveling DLL versioning problems. Because these configuration files were not managed by Windows itself, there was nothing to stop an application from completely wiping out a needed configuration file, putting entries into it that might break other applications, or completely ignoring it. These .INI files were simply text files that an application could use or not use as it saw fit.

The progression used by Windows to locate DLLs was logical and well documented; however, because an application might use Windows' LoadLibrary to grab a DLL from anywhere it pleased on a user's hard drive, the documented search order told you little about what code an application actually depended on. The app might pick up a load path from a configuration file that no one else even knew about, or it might just search the hard drive and load what it thought was the best version of the library. It was common for applications to have subtle interdependencies that made the applications themselves rather brittle. We had come full circle from the days of bloated executables and little or no code sharing: Now everyone depended on everyone else, with the installation of one app frequently breaking another.

The Dawn of COM

Microsoft's answer to these problems was COM. Simply put, COM provides an interface to third-party code that is:

Object-oriented
Centralized
Versioned
Language-neutral

Because it uses the system registry, the days of unmanaged/improperly used configuration information are gone. When an application instantiates a COM object (usually through a call to CreateObject()), Windows checks the system registry to find the object's location on disk and loads it. There's no guesswork, and multiple copies of the same object aren't allowed: each COM object lives in exactly one place on the system.
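The lookup just described can be pictured as a two-step table walk: a friendly name (a ProgID) maps to a class ID (CLSID), and the CLSID maps to the one registered server on disk. What follows is only a minimal in-memory sketch of that idea, not the Windows registry API. The dictionary layout is a simplification, and the ProgID Acme.Widget, the CLSID, and the DLL path are hypothetical placeholders.

```python
# Simplified in-memory model of the registry data COM consults.
# The ProgID, CLSID, and DLL path below are hypothetical placeholders.
REGISTRY = {
    "progids": {"Acme.Widget": "{11111111-2222-3333-4444-555555555555}"},
    "clsids": {
        "{11111111-2222-3333-4444-555555555555}": {
            "InprocServer32": r"C:\Program Files\Acme\widget.dll",
        },
    },
}


def resolve_server(prog_id: str) -> str:
    """Return the single registered server path for a ProgID."""
    try:
        clsid = REGISTRY["progids"][prog_id]
    except KeyError:
        raise LookupError(f"ProgID not registered: {prog_id}") from None
    return REGISTRY["clsids"][clsid]["InprocServer32"]


print(resolve_server("Acme.Widget"))
```

Because every client resolves the same name through the same table, every client ends up loading the same file.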

NOTE

Microsoft has recently introduced the concept of COM redirection and side-by-side deployment. This allows multiple versions of the same COM object to reside happily on the same system. This functionality has all the hallmarks of an afterthought and is only applicable in limited circumstances. (You can't, for example, use COM redirection to load different copies of an object into different Web applications on an IIS implementation. Although the Web pages may seem like different apps to users, there's actually just one application (IIS) in the scenario, and COM still limits a given app to just one copy of a particular object version.) The vast majority of COM applications still abide by the standard COM versioning constraints.

This isn't to say that you can't have multiple versions of an object on a system. COM handles this through multiple interfaces. Each new version of an object has its own interface and may as well be a completely separate object as far as its users are concerned. There may or may not be code sharing between the versions of the object. As an application developer, you try not to worry about this; you just code to the interface.

Lest I omit a very fundamental detail, an interface is similar to a class without a body or implementation. It's a programming construct that defines a functionality contract: a contract between the provider of the functionality and its users. By implementing an interface, the author of the object ensures that clients of the object can depend on a fixed set of functionality being present in the object. Regardless of what the object actually is, the client can code to the interface without being concerned about the details. If the author of the object ever needs to enhance his code in a way that might break client applications that depend on it, he can simply define a new interface and leave the old one intact.
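To make the contract idea concrete, here is a small sketch that uses Python abstract base classes as stand-ins for interfaces. It illustrates the concept only and is not how COM interfaces are declared. The names ISpellChecker, ISpellChecker2, SpellChecker, and old_client are all hypothetical.

```python
from abc import ABC, abstractmethod


class ISpellChecker(ABC):
    """Original contract: declares a method but supplies no implementation."""

    @abstractmethod
    def check(self, word: str) -> bool: ...


class ISpellChecker2(ABC):
    """Contract added in a later version; the original is left untouched."""

    @abstractmethod
    def check_in(self, word: str, language: str) -> bool: ...


class SpellChecker(ISpellChecker, ISpellChecker2):
    """One object that honors both the old and the new contract."""

    WORDS = {"en": {"hello", "world"}, "fr": {"bonjour"}}

    def check(self, word: str) -> bool:
        return self.check_in(word, "en")

    def check_in(self, word: str, language: str) -> bool:
        return word.lower() in self.WORDS.get(language, set())


def old_client(checker: ISpellChecker) -> bool:
    # Written against the original interface; unaware that a newer one exists.
    return checker.check("Hello")


print(old_client(SpellChecker()))
```

The old client keeps working after the object gains the second interface, because the first contract never changed.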

COM has its limitations (most of which are addressed in the forthcoming .NET Framework), but it is ubiquitous and fairly standardized. The world has embraced COM, so SQL Server includes a mechanism for working with COM objects from Transact-SQL.

Basic Architecture

The fundamental elements of COM are

Interfaces
Reference counting
QueryInterface
IUnknown
Aggregations
Marshaling

Let's talk about each of these separately.

Interfaces

From an OOP standpoint and also from the perspective of COM, an interface is a mechanism for exposing functionality, as I mentioned earlier. Typically, an object uses an interface to make its capabilities available to the outside world. When an object uses an interface, the object is said to implement that interface. Users of the object can interact with the interface without knowing what the object actually is, and a single object can implement multiple interfaces.

Generally speaking, to implement an interface, the methods exposed by the interface are linked to an object's methods. The interface itself requires no memory and really just specifies the functionality that an object implementing it must have.

Each COM interface is based on IUnknown, the fundamental COM interface. IUnknown allows navigation to the other interfaces exposed by the object.

Each interface has a unique interface ID (IID). This makes it easy to support interface versioning. A new version of a COM interface is actually a separate interface, with its own IID. The IIDs for the standard ActiveX, OLE, and COM interfaces are predefined.

Reference Counting

Unlike .NET and the Java Runtime, COM does not perform automatic garbage collection. Disposing of objects that are no longer needed is left to the developer. You use an object's reference count to determine whether the object can be destroyed.

The IUnknown methods AddRef and Release manage the reference count of interfaces on a COM object. When a client receives a pointer to a COM interface (a descendent of IUnknown), AddRef must be called on the interface. When the client has finished using the interface, it must call Release.

In its most primitive form, each AddRef call increments a counter variable inside its object and each Release call decrements it. When this count reaches zero, the interface no longer has any clients and can be destroyed.

You can also implement reference counting such that each reference to an object (as opposed to an interface implemented by the object) is counted. In this scenario, calls to AddRef and Release are delegated to a central reference count implementation. Release frees the whole object when its reference count reaches zero.
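The counting scheme is easy to model. The sketch below is a toy, not real COM: the class RefCounted and its destroyed flag are hypothetical, and setting the flag stands in for actually freeing the object. It assumes the object is handed to its creator with a count of one, and that add_ref and release return the new count.

```python
class RefCounted:
    """Toy object with a COM-style reference count (not real COM)."""

    def __init__(self):
        self.ref_count = 1      # assumption: the creator holds the first reference
        self.destroyed = False  # stand-in for the object's memory being freed

    def add_ref(self) -> int:
        self.ref_count += 1
        return self.ref_count

    def release(self) -> int:
        if self.ref_count == 0:
            raise RuntimeError("release called on a destroyed object")
        self.ref_count -= 1
        if self.ref_count == 0:
            self.destroyed = True  # last client is gone; the object can go away
        return self.ref_count


obj = RefCounted()
obj.add_ref()   # a second client takes a reference
obj.release()   # the first client is finished
obj.release()   # the second client is finished; the count reaches zero
print(obj.destroyed)
```

Every add_ref must be balanced by exactly one release. One release too few leaks the object, and one too many destroys it while someone is still using it.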

QueryInterface/IUnknown

The fundamental COM mechanism used to access an object's functionality is the QueryInterface method of the IUnknown interface. Because every COM interface is derived from IUnknown, every COM interface has an implementation of QueryInterface. QueryInterface queries an object using the IID of the interface to which the caller wants a pointer. If the object implements the requested interface, QueryInterface retrieves a pointer to it and also calls AddRef. If the object does not implement the requested interface, QueryInterface returns the E_NOINTERFACE error code.
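In outline, QueryInterface is a lookup keyed by IID: hand back a pointer and add a reference if the object supports the requested interface, otherwise report E_NOINTERFACE. Here is a simplified model of that behavior. The IID for IUnknown and the numeric value of E_NOINTERFACE are the ones Windows defines; the Greeter class, IID_IGreeter, IID_IPrinter, and the tuple return value are hypothetical simplifications.

```python
S_OK = 0
E_NOINTERFACE = 0x80004002  # HRESULT value Windows defines for this error

IID_IUnknown = "{00000000-0000-0000-C000-000000000046}"
IID_IGreeter = "{AAAAAAAA-0000-0000-0000-000000000001}"  # hypothetical IID
IID_IPrinter = "{AAAAAAAA-0000-0000-0000-000000000002}"  # hypothetical IID


class Greeter:
    """Toy object that supports IUnknown and a hypothetical IGreeter."""

    SUPPORTED = {IID_IUnknown, IID_IGreeter}

    def __init__(self):
        self.ref_count = 1

    def add_ref(self) -> int:
        self.ref_count += 1
        return self.ref_count

    def query_interface(self, iid):
        """Return (hresult, pointer); take a reference only on success."""
        if iid not in self.SUPPORTED:
            return E_NOINTERFACE, None
        self.add_ref()
        return S_OK, self


g = Greeter()
hr, ptr = g.query_interface(IID_IGreeter)    # supported: pointer plus a new reference
hr2, ptr2 = g.query_interface(IID_IPrinter)  # unsupported: error code, no pointer
print(hex(hr2), g.ref_count)
```

Note that the failed query leaves the reference count alone; only a successful one obliges the caller to call Release later.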

Aggregations

For those situations when an object's implementer wants to make use of the services offered by another (e.g., a third party) object and wants this second object to function as a natural part of the first one, COM supports the concepts of containment and aggregation.

By aggregation, I mean that the containing object creates the contained object as part of its construction process and exposes the interfaces of the contained object within its own interface. Some objects can be aggregated; some can't. An object must follow a specific set of rules to participate in aggregation.
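A rough sketch of the shape of aggregation follows. It is a toy model under stated assumptions, not the real COM protocol: the classes Car and Engine and the string IIDs are hypothetical. The outer object builds the inner one during its own construction and hands out the inner object's interface as if it were its own. The inner object, for its part, forwards interface queries back to the outer object, which is one of the rules an aggregatable object has to follow.

```python
IID_ICar = "ICar"        # hypothetical interface identifiers
IID_IEngine = "IEngine"


class Engine:
    """Inner (aggregated) object."""

    def __init__(self, outer):
        self._outer = outer  # the controlling outer object

    def start(self) -> str:
        return "engine started"

    def query_interface(self, iid):
        # An aggregated object delegates interface queries to its outer object.
        return self._outer.query_interface(iid)


class Car:
    """Outer (aggregating) object."""

    def __init__(self):
        self._engine = Engine(self)  # inner object created during construction

    def drive(self) -> str:
        return "driving"

    def query_interface(self, iid):
        if iid == IID_ICar:
            return self
        if iid == IID_IEngine:
            return self._engine  # expose the inner object's interface directly
        return None              # stand-in for E_NOINTERFACE


car = Car()
engine = car.query_interface(IID_IEngine)
print(engine.start())
print(engine.query_interface(IID_ICar) is car)
```

From the client's side there is just one object: whichever interface it starts from, it can navigate to the others.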

Marshaling

Marshaling enables the COM interfaces exposed by an object in one process to be accessed by another process. Through marshaling, COM either provides code or uses code provided by the implementer of the interface to pack a method's parameters into a format that can be shipped across processes or across the network to other machines and to unpack those parameters during the call. When the call returns, the process is reversed.

Marshaling is usually unnecessary when an interface is being used in the same process as the object that provides it. However, marshaling can still be required between threads.
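The pack, ship, and unpack cycle can be sketched in a few lines. This is only an illustration of the idea using Python's struct module; it does not reproduce COM's actual wire format. COM does use the terms proxy (client side) and stub (server side) for the two halves, but the functions proxy_add, stub, and real_add and the byte layout are hypothetical, and a plain function call stands in for the trip across the process boundary.

```python
import struct


def real_add(a: int, b: int) -> int:
    """The method as implemented inside the server process."""
    return a + b


def stub(request: bytes) -> bytes:
    """Server side: unpack the parameters, make the call, pack the result."""
    a, b = struct.unpack("<ii", request)
    return struct.pack("<i", real_add(a, b))


def proxy_add(a: int, b: int, transport) -> int:
    """Client side: pack the parameters, ship them, unpack the reply."""
    request = struct.pack("<ii", a, b)   # two 32-bit little-endian integers
    reply = transport(request)           # stand-in for crossing the boundary
    (result,) = struct.unpack("<i", reply)
    return result


print(proxy_add(2, 3, stub))
```

The caller of proxy_add never sees the byte packing, which is the point: the client codes to the interface and the plumbing stays out of sight.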

COM at Work

Practically speaking, COM objects are used through two basic means: early and late binding. When an application makes object references that are resolvable at compile time, the object is early bound. To early bind an object in Visual Basic, you add a reference to the library containing the object during development, then "Dim" specific instances of it. To early bind an object in tools like Visual C++ and Delphi, you import the object's type library and work with the interfaces it provides. In either case, you code directly to the interfaces exposed by the object as though they were interfaces you created yourself. The object itself may live on a completely separate machine and may be accessed via Distributed COM (DCOM) or may be marshaled by a transaction manager such as Microsoft Transaction Server or Component Services. Generally speaking, you don't care; you just code to the interface.

When references to an object aren't known until runtime, the object is late bound. You normally instantiate late bound objects via a call to CreateObject() and store the object instance in a variant. Because the compiler didn't know what object you were referencing at compile time, you may encounter bad method calls or nonexistent properties at runtime. This is the tradeoff with late binding: It's more flexible in that you can decide at runtime what objects to create and can even instantiate objects that didn't exist on the development system, but it's also more error prone. It's easy to make mistakes when you late bind objects because your development environment can't provide the same level of assistance that it can when it knows the objects you're dealing with.
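The late-binding tradeoff is easy to demonstrate in any language that can look up a method by name. The sketch below is an analogy rather than COM code: Calculator and invoke are hypothetical, and invoke plays the part of the by-name dispatch a late-bound client relies on (in COM, that job belongs to the IDispatch interface).

```python
class Calculator:
    def add(self, a, b):
        return a + b


def invoke(obj, method_name, *args):
    """Resolve a method by name at call time, as a late-bound client would."""
    method = getattr(obj, method_name, None)
    if not callable(method):
        raise AttributeError(f"object has no method {method_name!r}")
    return method(*args)


calc = Calculator()
print(invoke(calc, "add", 2, 3))

try:
    invoke(calc, "ad", 2, 3)  # the typo is detected only when this line runs
except AttributeError as exc:
    print("runtime failure:", exc)
```

Nothing checks the misspelled name until the call is actually made, which is exactly the class of error that early binding catches at compile time.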

Once you have an instance of an object, you call methods and access properties on it like any other object. COM supports the notion of events (although they're a bit more trouble to use than they should be), so you can subscribe and respond to events on COM objects as well.
