Understanding Recordset Marshaling | Ado Examples and Best Practices

Team-Fly

When you want to transport data from one tier to another, or when you are sharing ADO objects among threads, it's best to understand what COM and ADO do behind the scenes. This can also help you understand your options, and the limitations of using ADO in the middle tier.

COM marshaling involves two concepts:

Passing data from one process or thread to another
Synchronizing threads accessing a COM object

When passing Recordsets from place to place, ADO behaves very differently depending on whether or not you're asking it to move data between threads or parts of your own application (within a single process), or between two separate processes via a COM (or DCOM) link. Let's examine the various flavors of ADO marshaling and see how they behave in these two environments—in process and between processes.

ADO General Marshaling

By default, ADO provides standard COM marshaling. This means that ADO uses standard OLE automation types for its arguments, and interface pointers can be passed across processes without any special custom marshaling code. For example, a COM server can return a Connection or Recordset object to a client, and the ADO object's interface pointer can be passed across processes (in other words, you can pass a reference to the object). Any calls to methods of the interface cause a call from the client back to the existing server-side ADO object.

For the in-process case, ADO specializes the marshaling of its objects by using the COM Free-Threaded Marshaler (using CoCreateFreeThreadedMarshaler). This means that while ADO is marked for apartment-model threading by default, ADO skirts the apartment rules with the free-threaded marshaler and will have direct pass-through of any ADO object interfaces from one thread to another. No COM synchronization takes place, as ADO is thread-safe and does all of the necessary synchronization. Overall, this means that ADO does not use COM marshaling for the in-process scenarios.

Beyond Standard Marshaling

ADO's standard marshaling is sufficient for many, but not all scenarios. For example, standard marshaling across processes can hurt performance because for each call to the ADO object, an application makes a call across process. It would be better for performance if all of the data of a Recordset were returned to the client. Any subsequent calls such as fetching would be made locally in the client rather than back to the COM server where the Recordset was created.

ADO already provides such a mechanism for the Recordset object. To return a Recordset object from a server and have the data of the Recordset marshaled to the client, specify adUseClient for the Recordset object's CursorLocation property. The adUseClient value tells the Recordset object to use the Client Cursor Engine, which caches all of the records of the Recordset. Whenever a Recordset is sent across process boundaries with adUseClient set, a Recordset object (which also uses the client cursor engine) is constructed in the receiving process and it is populated with the sent records. At this point, the Recordset is said to be "disconnected" or "disassociated" from a database connection. The database connection is not carried along with the Recordset data when adUseClient is specified.

So, to make sure ADO knows that you plan to pass the Recordset to another tier, be sure to set the ActiveConnection to Nothing, just as we discussed in Chapter 6. By setting the ActiveConnection property to Nothing, we signal ADO to complete population so all qualifying rows become members before the Recordset is released to you and before it gets sent anywhere.

With adUseClient set, you always pass the data of the Recordset when sending the Recordset across process boundaries. This is typically called "passing by value." Without adUseClient, you will always get standard marshaling, and only a reference to the ADO Recordset object is passed across process boundaries—this is typically called "passing by reference."

Limitations of Marshaling Recordsets "By Value"

When ADO Recordset objects are passed by value, the client cursor engine marshals the Recordset data across the process boundaries. The newly created ADO Recordset object in the receiving process simply attaches to the Recordset stored in the client cursor engine cache.

However, there are a number of important issues here. Because the underlying OLE DB rowset (the data and metadata) is being marshaled, instead of the ADO Recordset, the properties of the Recordset object do not get marshaled. This means that properties such as Filter, Sort, and ActiveConnection are not carried from one process to the next. So, for example, a common pitfall is to set the Filter property of a Recordset and expect only the filtered records to be passed across the process boundaries. Instead, all of the original records of the query are passed. Furthermore, the cursor location is set to the beginning of the Recordset when it is marshaled—a new ADO Recordset is created in the receiving process and is attached to the marshaled rowset, causing the cursor to be positioned at the first record. The current cursor pointer is lost.

So, as a general rule of thumb, do not expect Recordset properties to be marshaled across process boundaries. The advantage to this is that ADO Recordset marshaling is lightweight, passing only the metadata (table name(s), column name, and data types, and some flags) and record data.

Marshaling Out of Process

When one tier modifies a Recordset object, it often makes sense to marshal back just the changes, instead of the whole rowset. However, it also makes a difference whether or not your code is marshaling across process boundaries. Remember that when you test a component that's coded as a class within your Visual Basic application, it's running in process. When it's running in another executable somewhere, it's running out of process. This section focuses on the situation where you're passing a Recordset to another tier—way out of your process.

If you pass an ADO Recordset in process, you are passing a reference to an object in memory, regardless of the setting of the CursorLocation property. When you pass an ADO Recordset out of process, the ADO libraries are loaded in both processes.^[3] The ADO libraries pass the information stored in the Recordset from the one process to the other. In the process, ADO converts the Recordset to ADTG format, thus reducing network overhead. Each process then has its own copy of the Recordset object, rather than just a reference to the initial object. This means that the other tier's copy of the data does not reflect changes on either end of the wire.

Let's look at a couple of examples. The first example passes the data from a server to a client. This is accomplished by having the client call an MTS component's method that returns a Recordset. This could also be a Web page that returns a Recordset (or an XML string or Stream—which we discuss in Chapter 9).

 Public Function GetAuthorsByISBN(ISBN as String) as Recordset Dim rs as ADODB.Recordset ' Do what it takes to fetch the rows for the Recordset … Set GetAuthorsByISBN = Rs End Function

Suppose you call this method out of process using this VB client code:

 Dim rs as Recordset Set rs = oServer.GetAuthorsByISBN(txtISBN.Text) ' Display and manipulate (add, change, delete) the rows for the Recordset …

In this case, the server-side Recordset is left unmodified.

At this point, we're ready for phase two. We need to send the changes back to the server so it can reconnect the Recordset to the data source and post the changes. This is the Microsoft Transaction Server component that posts the changes to the data source.

 Public Sub SaveAuthor (ByVal Rs as Recordset) On Error GoTo ErrorHandler Dim Cn as ADODB.Connection Set Cn = New Connection Cn.Open "provider=sqloledb;data source=(local);initial catalog=mydb", _ "admin", "pw" Set Rs.ActiveConnection = Cn ' And the UpdateBatch posts the changes to the database-or tries to… Rs.UpdateBatch Exit sub ErrorHandler:  ' Yada yada yada… End Sub

Of course, unless you have a "Yada" processor, you'll need to stick in robust collision-management error handlers here.

So what if I send 50 rows and the client only changes 1? Does ADO have to send back the whole megillah? Actually no—you can tell ADO to send just the changes back to the server to post to the data source. This saves processing time and network bandwidth. So, how do you do it? The ADO Recordset object exposes the MarshalOptions property. If you use adMarshalModifiedOnly, ADO only copies over the updated and new records. If you use adMarshalAll, all of the data is copied. It's that simple.

The bookmark, filter, and sort information are not used in determining what records are passed across process boundaries. All rows of the Recordset are passed, regardless of the current setting of the Filter property. In the resulting Recordset, the first record is the current record, regardless of the bookmark set in the other process. The resulting Recordset does not receive the value of the Sort property from the other process.

Marshaling Performance

Limiting the amount of data sent in both directions is the primary means of effecting performance improvements when marshaling. The server should return just the rows and columns absolutely necessary, if we need to return rows at all. There are plenty of cases where rows are not really called for, and simply passing variables back will do just fine. When the client sends data to the server for posting, it's essential that it send just the changes. Otherwise, it's kinda like asking your wife for a phone number and, in response, she throws the yellow pages at you.

Let's explore different options to send and receive data between tiers. The first scenario assumes you want to send more than a single set of values. In this case, you usually want to send a Recordset as it's designed to deal with multiple rows (sets) of related values (columns). The second scenario we examine focuses on passing single sets of data values such as those returned by OUTPUT parameters from a stored procedure.

Passing Sets of Data

The ADO Recordset object gives you a lot of functionality on the client side, but it needs to bring in all the metadata (DDL) to the client to recreate itself. When you remote a Recordset to the client, you automatically get access to an advanced object model that enables you to do rich data manipulation, such as data shaping (and reshaping), off-line sorting, and filtering, as well as automatic updating to the backend when the metadata is rich enough. Metadata includes column names and aliases, datatypes, precision or scale, primary key and foreign key info, base table info, updatability info, and more. Let's walk through some of the guiding principles that hobble or enhance marshaling performance. Sure, some of these points have been made before, but, as you've guessed, these are all intertwined in the fabric of a high-performance application or component.

If your application needs the flexibility and depth of structure provided by the Recordset, then by all means pass a Recordset. For example, if your application needs to morph to evolving data structures, the metadata in a Recordset always reflects the latest structural and schema changes. Your application can be coded to adapt automatically to these changes. However, if you don't expect changes or you want a lighter-weight approach that depends on stable (inviolate) schemas, passing the metadata is an unnecessary luxury you can do without.
The best way to pass efficient Recordsets is to construct small, focused Recordsets. To make sure your Recordsets contain just the rows and columns you need, write your SELECT statement carefully. SELECT *, as I said earlier, is a pariah—it is to be avoided. Choose just the columns you need for this particular query and no more. Make sure your WHERE clause focuses your query on just the rows you need now, not 10 minutes from now, and generally never more than a few hundred rows.
Remember to set the MarshalOptions property to ensure that just modified rows are sent back from the client to be posted.
Don't pass BLOBs from tier to tier unless you can't avoid it. In any case, only pass BLOBs for specific pictures or objects when called for—avoid including a BLOB column in a multirow Recordset. Examine alternative transport mechanisms for BLOBs, such as intranet or common file shares.
To avoid the metadata overhead of a Recordset, you can choose one of the other data formats, including CSV, Variant array, or arrays of user-defined types (UDTs). Each has varying degrees of pseudo-metadata handled by Visual Basic, but not nearly the rich DDL as persisted by the Recordset. For example, you'll be able to pick up a type definition out of a Variant, but it's a "fuzzy" definition. So, in most cases, you'll be depending on the discipline (you've heard of that, haven't you?) in your organization—the discipline that prevents one developer from changing a data structure without informing the rest of the team. If your organization is facing a lot of changes imposed by the client or the dictates of your business, perhaps this approach is problematic. In the "olden" days, this was standard operating procedure—no changes without the data committee agreeing.

When you marshal a CSV^[4] or a safe array (another form of multiple set data), the Recordset metadata does not get marshaled, but there is not a whole lot you can do with that data on the client. You won't have any of the Recordset features available to display, filter, sort, or munge the data. But that might not be necessary for your requirements. For large Recordsets, marshaling is not very expensive, but for small Recordsets the metadata portion can be proportionally heavy when compared to the data and CSV/safe arrays.

When you use XML to marshal the data, the marshaling technique again depends on whether you need the metadata on the client or not. If you have to marshal all the metadata, XML actually uses more space than the binary protocol—but only about 50 percent more. Marshaling Recordsets using the ADO binary format and XML has been tested, but XML compresses well, so the difference is not really significant in size or performance.

^[3]Just make sure that both ends of the transfer (client and server) are loaded with compatible versions of ADO (2.0 to 2.0 or 2.1 to 2.5).

^[4]CSV: A fancy acronym for a comma-delimited record. However, these are saved in Unicode, which doubles the amount of RAM (and disk) they require.

Team-Fly