Taking COM out of Process | Programming Distributed Applications with Com and Microsoft Visual Basic 6.0 (Programming/Visual Basic)

So far, this chapter has described the interaction between a client and an object only in the scope of a single process running under a single thread of execution. When a client is bound to an in-process object, it can directly invoke methods through the use of function pointers that are stored in a vTable. The interaction is very efficient because the client code and the object code share the same thread, the same call stack, and the same set of memory addresses. Unfortunately, when the object runs in another process, none of these resources can be shared.

The function pointers stored in vTables have no meaning across process boundaries. A client can't use a remote function pointer to access an object in another process. How, then, can COM remote a method call from the client's process to the object's process? COM makes remote communication possible with a pair of helper objects called the proxy and the stub.

Figure 3-9 shows how the proxy and the stub are deployed. The proxy runs in the client's process, while the stub runs in the object's process. The proxy and the stub establish a communication channel using an interprocess communication mechanism called Remote Procedure Call (RPC). RPC is a connection-oriented protocol that is based on synchronous request/response pairs. The proxy and stub use the RPC channel to pass data such as method parameters back and forth during the execution of each method. However, in order to do this, the proxy and stub must serialize this data into a form that can be transmitted across process and host boundaries. The act of serializing method-related data for transmission across the proxy/stub layer is known as marshaling.

Figure 3-9 COM's remoting architecture requires that a proxy/stub layer be introduced between the client and the object. The proxy and the stub establish an RPC channel between them to transmit method-related data back and forth.

When the client invokes a method on the proxy, the proxy forwards the request to the stub. To properly transmit this request, the proxy marshals the method's inbound parameters and then transmits them to the stub across the RPC channel. When the stub receives the request, it unmarshals the inbound parameters and performs the actual call on the object. After the object completes the method and returns control back to the stub, the stub prepares a response by marshaling outbound parameters and a return value. The data is then transmitted back to the proxy. Finally, the proxy unmarshals the data in the response and returns control back to the client.
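
The request/response sequence just described can be sketched in a few lines of Python. This is only an illustration, not COM's actual implementation: the Calculator, Proxy, and Stub classes are hypothetical, pickle stands in for the real wire format, and a direct function call stands in for the RPC channel.

```python
import pickle


class Calculator:
    """The real object, which would live in the object's process."""

    def add(self, a, b):
        return a + b


class Stub:
    """Sits beside the object: unmarshals requests and makes the real call."""

    def __init__(self, target):
        self._target = target

    def handle(self, request_bytes):
        method_name, args = pickle.loads(request_bytes)      # unmarshal inbound parameters
        result = getattr(self._target, method_name)(*args)   # actual call on the object
        return pickle.dumps(result)                          # marshal the return value


class Proxy:
    """Sits beside the client: looks like the object but only forwards calls."""

    def __init__(self, channel):
        self._channel = channel  # assumption: a plain callable plays the RPC channel

    def add(self, a, b):
        request_bytes = pickle.dumps(("add", (a, b)))   # marshal inbound parameters
        response_bytes = self._channel(request_bytes)   # blocks until the response arrives
        return pickle.loads(response_bytes)             # unmarshal the response


stub = Stub(Calculator())
proxy = Proxy(stub.handle)

# The client calls the proxy exactly as it would call the object.
total = proxy.add(2, 3)
print(total)  # prints 5
```

Only byte strings cross the simulated boundary; the client never holds a reference to the Calculator itself.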

You should be asking yourself an important question: Who creates proxies and stubs? The answer is that they're usually created by the COM runtime. When a client calls CoCreateInstance to create an object in another process, the proxy and stub are created automatically by the system. By the time the client has successfully activated the object, the proxy and stub have already been created. However, the client is bound to the proxy instead of the object. And it's the stub, not the client, that gets bound to the object.

In most cases, the COM runtime simply detects the need for proxies and stubs and inserts them wherever they're needed. The idea is simple. When an interface reference is exported from the object's process, the COM runtime builds a stub. When the interface reference is imported into the client's process, the COM runtime builds a proxy. The communication channel that is established between the proxy and the stub makes it possible to remote method requests from the client to the object and back.

What about the case in which an object reference is passed across process boundaries in a COM method call? Again, the COM runtime automatically does the right thing. It knows to build a proxy/stub pair whenever the object and the client who receives the reference live in different processes. Things couldn't be any easier for you. While C++ programmers can get into sticky situations in which they're required to explicitly build proxies and stubs by calling functions in the COM library, Visual Basic programmers really never have to worry. Proxies and stubs always get built for them behind the scenes.

The best part about COM's remoting architecture is that neither the client nor the object can tell that calls are being remoted. The client thinks that the proxy is the object. The object thinks that the stub is the client. This allows COM programmers to write code for both clients and objects without regard to whether objects will be activated in-process or out-of-process. This powerful feature is known as location transparency.

You should note that there is a proxy/stub pair for each connected interface. This means that a client and an object can have two or more proxy/stub pairs connecting them at any one time. It makes sense that the proxy/stub pair is associated with the interface because it's the interface definition that describes the methods that need to be remoted.

The key to generating the code to build proxies and stubs lies in interface definitions. As you know, COM interfaces can be defined in either IDL source files or type libraries. In fact, IDL is a language that was originally created to solve the problem of marshaling function parameters from one machine to another. This is why IDL requires you to specify parameters with attributes such as [in], [out], and [in, out]. These parameter attributes tell the proxy and stub which direction to marshal all the relevant data for a remote method call.
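
To make the role of the direction attributes concrete, here is a minimal Python sketch. It assumes a made-up method named Lookup with one [in], one [out], and one [in, out] parameter; the SIGNATURES table and both helper functions are hypothetical stand-ins for what a proxy and stub would derive from an interface definition.

```python
import copy

# Hypothetical stand-in for an interface definition:
# method name -> list of (parameter name, set of directions).
SIGNATURES = {
    "Lookup": [
        ("key", {"in"}),
        ("value", {"out"}),
        ("hits", {"in", "out"}),
    ],
}


def marshal_request(method, args):
    """Proxy side: copy only the parameters that flow toward the object."""
    return {name: copy.deepcopy(args[name])
            for name, directions in SIGNATURES[method]
            if "in" in directions}


def marshal_response(method, args):
    """Stub side: copy only the parameters that flow back to the client."""
    return {name: copy.deepcopy(args[name])
            for name, directions in SIGNATURES[method]
            if "out" in directions}


def lookup_impl(key, hits):
    """The object's implementation: returns the [out] value and updated [in, out] counter."""
    table = {"alpha": 1}
    return table.get(key), hits + 1


client_args = {"key": "alpha", "value": None, "hits": 0}

request = marshal_request("Lookup", client_args)        # carries key and hits only
value, hits = lookup_impl(request["key"], request["hits"])
server_args = {"key": request["key"], "value": value, "hits": hits}
response = marshal_response("Lookup", server_args)      # carries value and hits only

client_args.update(response)  # the proxy writes the outbound data back for the client
```

The [in] parameter travels only in the request, the [out] parameter only in the response, and the [in, out] parameter in both.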

A type library is like an IDL source file in the sense that it can hold interface definitions. One of the main differences between the two is that a type library exists in a binary format, which makes it easier to parse at runtime. As it turns out, COM provides a service that can build proxies and stubs on an as-needed basis by examining interface definitions in a type library. This service is known as the universal marshaler. It goes by other names as well, such as the type library marshaler and the automation marshaler. However, the term automation marshaler can be somewhat misleading because the marshaler doesn't require that clients and objects communicate through IDispatch.

The Role of the Universal Marshaler

The proxy and the stub have their work cut out for them. They must work together to give both the client and the object the perception that they're running in a single process on a single thread. They create this illusion by constructing a call stack in the object's process that is identical to the one in the client's process. Any data sitting on the call stack in the client's process must be marshaled to the object's process. What's more, any pointers on the client's call stack require the proxy to marshal the data that the pointer refers to. The stub is responsible for unmarshaling all the data and setting up the call stack. This might involve unmarshaling data items that aren't stack-based and then setting up stack-based pointers that point to them.

As you can imagine, the code that accomplishes the marshaling behind a proxy/stub pair can become quite complicated. Fortunately, you can rely on the universal marshaler to build the required proxy/stub code. The universal marshaler is part of OLEAUT32.DLL. When you're dealing with interfaces that rely on the universal marshaler, the COM runtime calls upon code inside OLEAUT32.DLL whenever it determines that a proxy or a stub is needed.

You should note that C++ programmers often build their own custom proxy/stub code instead of using the universal marshaler. They have to deal with a few more issues that aren't a concern for Visual Basic programmers. It's safe to assume that the interfaces you implement in Visual Basic components will rely on the universal marshaler to build their proxies and stubs.

The universal marshaler uses interface definitions in type libraries to build proxies and stubs. Earlier in the chapter, you saw that Visual Basic also uses interface definitions in type libraries to generate direct vTable binding code at compile time. Now, you see that type libraries have another important role at runtime: They're the key to building interprocess connections between clients and objects.

Interfaces that rely on the universal marshaler are defined in type libraries and in IDL with the [oleautomation] attribute. This is what differentiates interfaces that use the universal marshaler from interfaces that provide their own custom proxy/stub code. When a type library is registered, a key is placed in the Registry that tells the universal marshaler where to find it. At registration time, configuration information is also added to the Registry for each IID declared with the [oleautomation] attribute.

Remember that every proxy/stub pair is based on a specific IID. When the COM runtime needs to build either a proxy or a stub, it examines the Registry key HKEY_CLASSES_ROOT\Interface to locate configuration information for the interface in question. Figure 3-10 shows the Registry entries for an oleautomation IID. There is a ProxyStubClsid32 key, which tells the COM runtime where to find the code to build proxies and stubs. For oleautomation interfaces, this is the CLSID for the universal marshaler.

Figure 3-10 The Registry settings for an interface marked with the [oleautomation] attribute tell the COM runtime to use the universal marshaler. They also tell the universal marshaler which type library holds the interface definition.

When the COM runtime determines that it needs to build a proxy or a stub for an oleautomation interface, it forwards the request to the universal marshaler. The universal marshaler then inspects the Registry to find the GUID for the type library associated with the IID. The universal marshaler then locates the type library using the information shown in Figure 3-11. Once the universal marshaler has the path, it can load the type library, read the interface definition, and build a proxy or a stub.
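
The lookup chain can be sketched as follows. A Python dictionary simulates the relevant slice of the Registry. The IID, the type library GUID, and the file path are made-up placeholders, and the version, locale, and win32 subkeys reflect the conventional TypeLib key layout rather than anything stated in the text, so treat those as assumptions. The CLSID is the value commonly documented for the universal marshaler.

```python
def reg_path(*parts):
    """Join key names into a Registry-style path."""
    return "\\".join(parts)


UNIVERSAL_MARSHALER_CLSID = "{00020424-0000-0000-C000-000000000046}"
IID = "{11111111-1111-1111-1111-111111111111}"    # placeholder interface ID
LIBID = "{22222222-2222-2222-2222-222222222222}"  # placeholder type library ID
HKCR = "HKEY_CLASSES_ROOT"

# Simulated Registry contents (hypothetical values).
REGISTRY = {
    reg_path(HKCR, "Interface", IID, "ProxyStubClsid32"): UNIVERSAL_MARSHALER_CLSID,
    reg_path(HKCR, "Interface", IID, "TypeLib"): (LIBID, "1.0"),
    reg_path(HKCR, "TypeLib", LIBID, "1.0", "0", "win32"): r"C:\Example\MyServer.dll",
}


def find_type_library(iid):
    """Follow IID -> ProxyStubClsid32 -> TypeLib GUID -> path on disk."""
    marshaler = REGISTRY[reg_path(HKCR, "Interface", iid, "ProxyStubClsid32")]
    if marshaler != UNIVERSAL_MARSHALER_CLSID:
        raise LookupError("this interface supplies its own proxy/stub code")
    libid, version = REGISTRY[reg_path(HKCR, "Interface", iid, "TypeLib")]
    return REGISTRY[reg_path(HKCR, "TypeLib", libid, version, "0", "win32")]


path = find_type_library(IID)
```

With the path in hand, the universal marshaler would load the type library and read the interface definition; that step is omitted here.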

Figure 3-11 The universal marshaler uses the TypeLib key to find the physical path to the type library. This path can point to either a stand-alone .TLB file or a COM server with an embedded type library.

In later chapters, you'll see that the COM+ runtime builds proxies and stubs in a few other situations as well. I'll also introduce contexts and apartments. A context is a set of objects running inside an apartment. An apartment is a set of contexts running inside a process. A client and an object often communicate through proxies and stubs even when they're running inside the same process. I'll explain why this is the case. For now, let's concentrate on proxies and stubs that allow method calls to flow across process and host boundaries.

It's important to see that the client and object both require a type library to successfully build a proxy/stub pair. When the client and object are running on different machines, both computers must have a local copy of the type library. Because the object always has a local version of the server, it can rely on the type library inside the server's binary image. The client computer, on the other hand, doesn't require the server. It needs only the type library. It's common practice to generate a stand-alone type library for installation on client machines when they'll be activating objects from across the network.

Observations About Out-of-Process COM

First and foremost, you should be ecstatic that the low-level details of out-of-process communication have been abstracted away and hidden beneath the covers. RPC is essential to COM, but the manner in which COM creates and uses an RPC connection doesn't require your attention. Imagine how much harder it would be to create a distributed application if you had to directly program against an interprocess communication layer such as RPC or sockets.

Next, you should note three important performance-related points about out-of-process COM. The first is that out-of-process method calls take much longer than in-process calls. Generally, you can expect an out-of-process call to take at least 1000 times as long as an in-process call with direct vTable binding. The proxy/stub layer requires thread switching and marshaling, so it adds a significant amount of overhead.

The second thing to note is that remote method calls on Visual Basic objects are always conducted synchronously. This is due to the synchronous nature of RPC. When the client invokes a method on a remote object, the calling thread in the client's process is blocked until the call returns. This means that the client must wait it out while the RPC request message travels to the stub, the stub executes the call, and, finally, the RPC response message makes its way back to the proxy. Although new asynchronous RPC support has been added to Windows 2000, you must program in C++ to take advantage of it. Every RPC-based COM method call you make to or from Visual Basic code is conducted synchronously.

The third key point is that objects created with Visual Basic can be passed only by reference and never by value. Don't be fooled into thinking that you can simply pass a Visual Basic object from one machine to another. Your methods can define parameters that are object references but not actual objects. The current version of Visual Basic lets you put the ByVal keyword in front of object types in parameter definitions, but these parameters are still interpreted with pass-by-reference semantics. When you have a reference to an out-of-process object, access to each method or property in the object requires an expensive round-trip.
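
The cost of per-property access can be illustrated with a small Python simulation. The CountingChannel class is a hypothetical stand-in for the proxy/stub layer that simply counts how many calls cross it, and the Customer class and its members are invented for the example.

```python
class Customer:
    """Hypothetical out-of-process object with three properties."""

    def __init__(self):
        self.name = ""
        self.city = ""
        self.phone = ""

    def set_name(self, value):
        self.name = value

    def set_city(self, value):
        self.city = value

    def set_phone(self, value):
        self.phone = value

    def update(self, name, city, phone):
        """Set every property in a single call."""
        self.name, self.city, self.phone = name, city, phone


class CountingChannel:
    """Stands in for the proxy/stub layer; every call is one simulated round-trip."""

    def __init__(self, target):
        self._target = target
        self.round_trips = 0

    def call(self, method, *args):
        self.round_trips += 1
        return getattr(self._target, method)(*args)


# One round-trip per property.
per_property = CountingChannel(Customer())
per_property.call("set_name", "Ann")
per_property.call("set_city", "Oslo")
per_property.call("set_phone", "555-0100")

# One round-trip for all of the data.
single_call = CountingChannel(Customer())
single_call.call("update", "Ann", "Oslo", "555-0100")

print(per_property.round_trips, single_call.round_trips)  # prints 3 1
```

Both clients leave the object in the same state, but the first pays for three trips across the proxy/stub layer and the second pays for one.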

Out-of-process objects created with Visual Basic are bound with proxies and stubs built by the universal marshaler. This technique for binding a client to an out-of-process object from the information in a type library is a form of standard marshaling. Many programmers who use languages other than Visual Basic also use standard marshaling because it's easy to set up.

C++ programmers can forgo standard marshaling in favor of custom marshaling. Those who are willing to write their own marshaling code by implementing a special interface named IMarshal can optimize the communication channel in ways that are impossible with standard marshaling. For instance, you can implement pass-by-value semantics with custom marshaling code. However, you should note a few important limitations with custom marshaling. First, it requires the use of C++ on the object side. Second, it's not supported for configured components in either COM+ or MTS. It can be used only with nonconfigured components.

As a Visual Basic programmer, you can't produce components that support custom marshaling. However, you might come into contact with one common nonconfigured component that does support custom marshaling. The ADO Recordset objects implement IMarshal. This means that you can define method parameters and return values in terms of the ADO Recordset component and all the data for the recordset (not just a reference) will flow over the network across the proxy/stub layer. Note that you must use disconnected, client-side recordsets in order for this to work. Marshaling ADO recordsets is one of several techniques for moving data around the network in an efficient manner. We'll revisit this topic in greater depth in Chapter 11.
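
The difference between passing a reference and passing the data itself can be sketched in Python. The RowSet class below is a hypothetical stand-in for a disconnected recordset, and its marshal and unmarshal methods are invented for the illustration; they don't correspond to the actual methods of IMarshal or to how ADO implements it.

```python
import pickle


class RowSet:
    """Hypothetical stand-in for a disconnected, client-side recordset."""

    def __init__(self, rows):
        self.rows = list(rows)

    def marshal(self):
        # Pass-by-value: ship the state itself rather than a reference to it.
        return pickle.dumps(self.rows)

    @classmethod
    def unmarshal(cls, payload):
        # Rebuild an independent object from the transmitted state.
        return cls(pickle.loads(payload))


server_copy = RowSet([("Ann", 30), ("Bob", 25)])

payload = server_copy.marshal()           # the bytes that would cross the network
client_copy = RowSet.unmarshal(payload)   # a local object in the client's process

# Reading or changing the client's copy involves no further remote calls
# and has no effect on the original.
client_copy.rows.append(("Cy", 41))
print(len(server_copy.rows), len(client_copy.rows))  # prints 2 3
```

After the single transfer, every access to the client's copy is a local call, which is what makes this technique attractive for moving data across the network.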

Object activation and location transparency

In the previous edition of this book, I explained how object activation works in an out-of-process server such as an ActiveX EXE. I've decided to omit these details from this edition because they're no longer relevant to developers who are using COM+ to create distributed applications. Chapter 6 describes how out-of-process and remote activation occur in a COM+ application. What's important to note is that clients and objects can't tell whether they're running in the same process or even on the same machine.

Figure 3-12 shows three different deployment scenarios. The components you create in an ActiveX DLL can be configured to run in-process, locally, or across the network. It's all transparent to client code as well as component code. You can move back and forth between all three deployment scenarios without ever rebuilding your servers or your client applications. Changing the location of the client or a component requires minimal effort using a simple administrative tool.

Figure 3-12 Location transparency eliminates the need for programmers to be concerned with the details of remote activation or interprocess communication.

The ability of programmers to write both client code and object code for an in-process relationship and have the code work automatically across process boundaries is one of the most powerful features of COM. This location transparency eliminates the need for programmers to be concerned with the grungy details of interprocess communication. It also means that components can be redeployed around the network with little or no impact on code. You can redirect a client that is programmed to activate a certain object from an in-process DLL so that it activates a remote object by making just a few minor adjustments. You don't have to rewrite a single line of code.

However, the mere fact that your component code automatically works in both in-process and out-of-process situations doesn't mean that it's efficient. Code written for in-process objects might not scale when deployed in an application that's running across the network. Quite a few coding techniques work adequately in in-process scenarios but result in unacceptable performance when the proxy/stub layer is introduced. Chapter 11 explains the importance of designing interfaces that work efficiently across the proxy/stub layer.

Summary

This chapter described the essential concepts and details of COM. You must have a firm understanding of COM before you can understand how all of the new features of COM+ have been layered on top of it. For the rest of the book, I'll use the term COM when I refer to the aspects of the Windows platform that I've covered in this chapter. When I use the term COM+, I'm talking about a runtime layer and a programming model specific to Windows 2000. The distinction between COM and COM+ can be somewhat fuzzy because COM+ is built on top of COM. When you're talking about Windows 2000, the COM runtime and the COM+ runtime are one and the same.

The original version of COM meets four important design requirements: It's based on binary reuse, it's object-oriented, it's language-independent, and it provides a foundation for interprocess communication. After reading this chapter, you know that COM's original architects had to deal with countless details in order to meet these requirements.

COM is based on a formal separation of interface from implementation. At the physical level, the vTable serves as the in-memory representation for an interface. Some clients access objects through direct vTable binding, while others, such as scripting clients, require automation objects that expose functionality through the IDispatch interface.

Visual Basic hides COM's goriest details. On the component side, Visual Basic builds in support for programmers who don't want to deal with the extra complexities involved with user-defined interfaces. On the client side, it builds in support for direct vTable binding by inspecting interface definitions in type libraries at compile time. You should have an appreciation for why objects implement IUnknown and IDispatch. But at the same time, you should be grateful that Visual Basic takes care of all the details for you behind the scenes.

The SCM is a dynamic class loader that clients use to create objects on demand. The requirements of interface-based programming don't allow servers to expose concrete classes. Instead, every server must support a somewhat complicated activation protocol. Clients make activation requests to the SCM, and the SCM forwards each request to the server. Once again, the Visual Basic compiler steps in and takes care of all these details behind the scenes. You don't have to do much in order for clients to be able to activate objects from your components.

The last part of this chapter discussed the fundamentals of COM's remoting layer. The proxy/stub layer provides the architecture for remoting method calls across process and host boundaries while keeping clients and objects ignorant of what's really going on. These details are the responsibility of the COM runtime and the universal marshaler. The universal marshaler creates the code for remoting method calls by inspecting an interface definition from a type library. You can thus conclude that interfaces are the key to seamless distribution in COM.