Data Conversion Classes | Inside ATL (Programming Languages/C)

Because COM is a language-independent model for software development, in the early years of its use, developers tried to standardize on the primitive data types that would be passed from component to component. In COM, text characters are represented using the 16-bit OLECHAR data type. The architects of COM decided to use a 2-byte character instead of the 8-bit character representation more familiar to C and C++ developers so that COM could support all existing code pages, including Unicode. OLECHAR strings can be represented by using a buffer in which the end of the string is specified with a terminating null value, similar to single-byte character strings. (See Figure 5-1.) The only difference is that each character, including the null character, is represented using 2 bytes instead of 1.

Figure 5-1. Format of an OLECHAR string.
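
As a minimal sketch of the layout in Figure 5-1, the following portable C++ uses char16_t as a stand-in for OLECHAR so that it builds without the Windows headers. The names SketchOleChar, sketch_ole_strlen, and sketch_ole_byte_size are hypothetical and are not part of COM or ATL.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: char16_t stands in for the 16-bit OLECHAR type.
using SketchOleChar = char16_t;

// Counts characters up to, but not including, the 2-byte null terminator.
std::size_t sketch_ole_strlen(const SketchOleChar* text)
{
    assert(text != nullptr);
    std::size_t count = 0;
    while (text[count] != u'\0')
        ++count;
    return count;
}

// Bytes occupied by the string, including its 2-byte terminator.
std::size_t sketch_ole_byte_size(const SketchOleChar* text)
{
    return (sketch_ole_strlen(text) + 1) * sizeof(SketchOleChar);
}
```

Under these assumptions, the five-character string u"Hello" occupies 12 bytes: 10 for the text and 2 for the terminator.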

The use of simple null-terminated OLECHAR arrays isn't the preferred technique for passing text from one component to another. Instead, the de facto standard is to use the BSTR type, an array of OLECHARs that is length-prefixed as well as null-terminated. It is popular because the BSTR type, which stands for "Basic STRing," is the string data type both Visual Basic and the Java java.lang.String class use. Although the widespread use of the BSTR type in rapid application development (RAD) languages makes it a good choice as a COM data type, the BSTR type is foreign to most C and C++ developers. The first four bytes of a BSTR are a length prefix that holds the length of the text string in bytes, not counting the terminating null. This approach is advantageous because it allows developers to embed null characters inside the BSTR, but it poses an interesting problem. Because the first four bytes of a BSTR represent its length rather than the first two characters in the array, how are BSTRs and OLECHAR arrays used interchangeably in display functions? The answer is that the COM-provided BSTR allocation APIs, SysAllocString and SysReallocString, return a pointer to the first character in the string, not the first byte in the allocated array, as shown in Figure 5-2.

Figure 5-2. Format of a BSTR.
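
The layout in Figure 5-2 can be imitated in portable C++. The sketch below is not the COM allocator; it only mimics the arrangement of a 4-byte length prefix, the characters, and a 2-byte terminator, and it returns a pointer to the first character. All names (SketchBstr, sketch_alloc_bstr, sketch_bstr_len, sketch_scan_len, sketch_free_bstr) are hypothetical, and char16_t again stands in for OLECHAR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

using SketchOleChar = char16_t;     // stand-in for OLECHAR
using SketchBstr = SketchOleChar*;  // stand-in for BSTR

// Allocates [4-byte byte count][characters][2-byte null] and returns a
// pointer to the first character rather than to the start of the block.
SketchBstr sketch_alloc_bstr(const SketchOleChar* chars, std::uint32_t char_count)
{
    assert(chars != nullptr);
    const std::uint32_t byte_len =
        static_cast<std::uint32_t>(char_count * sizeof(SketchOleChar));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof(byte_len) + byte_len + sizeof(SketchOleChar)));
    if (block == nullptr)
        return nullptr;
    std::memcpy(block, &byte_len, sizeof(byte_len));         // length prefix
    std::memcpy(block + sizeof(byte_len), chars, byte_len);  // text, nulls included
    std::memset(block + sizeof(byte_len) + byte_len, 0, sizeof(SketchOleChar));
    return reinterpret_cast<SketchBstr>(block + sizeof(byte_len));
}

// Character count read from the prefix stored just before the pointer.
std::uint32_t sketch_bstr_len(SketchBstr s)
{
    if (s == nullptr)
        return 0;
    std::uint32_t byte_len = 0;
    std::memcpy(&byte_len,
                reinterpret_cast<unsigned char*>(s) - sizeof(byte_len),
                sizeof(byte_len));
    return static_cast<std::uint32_t>(byte_len / sizeof(SketchOleChar));
}

// Character count found by scanning for the first null, the way
// lstrlen-style functions measure a string.
std::uint32_t sketch_scan_len(const SketchOleChar* s)
{
    std::uint32_t count = 0;
    while (s[count] != u'\0')
        ++count;
    return count;
}

// Frees from the true start of the block, not from the character pointer.
void sketch_free_bstr(SketchBstr s)
{
    if (s != nullptr)
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

With a string that contains an embedded null, the prefix-based length and the scanned length disagree, and passing the character pointer straight to free would hand the allocator an address it never returned. Both effects are consequences of returning a pointer into the middle of the block.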

This technique turns out to be a mixed blessing for C++ developers. On one hand, it means that a BSTR can be passed into most functions that take a pointer to an OLECHAR array. On the other hand, BSTRs can't be created, freed, and manipulated using the familiar C++ run-time functions. Functions such as malloc, free, new, delete, lstrcat, and lstrlen don't work correctly when applied to BSTRs. The use of BSTR as the standard string data type in COM makes sense because it allows components developed in C++, Visual Basic, and Visual J++ to share text information, but it means extra work for C++ developers. Just as you must treat an interface pointer differently than a class pointer, you must treat a BSTR differently than a TCHAR*.

The CComBSTR Class

Fortunately, ATL provides support for dealing with BSTRs. The CComBSTR class encapsulates the functionality of a BSTR in much the same way as the MFC CString class encapsulates a TCHAR array, albeit with fewer features. Table 5-2 describes several of the most pertinent methods of this class.

Table 5-2. Frequently Used CComBSTR Methods.

CComBSTR Method	Description
CComBSTR	Various overloaded constructors allocate a new BSTR given an LPCOLESTR, an LPCSTR, or another CComBSTR.
~CComBSTR, Empty	Frees the encapsulated BSTR.
Attach, Detach	Attaches an existing BSTR to the class, or detaches it such that the destructor won't free it when the class goes out of scope. Detach is helpful when using the CComBSTR class to assign an [out] parameter.
operator BSTR, operator&	Allows the BSTR to be accessed directly. operator BSTR allows CComBSTR to be used in place of BSTR as an [in] parameter. operator& allows CComBSTR to be used in place of a BSTR* as an [out] parameter.
operator=, operator+=, operator<, operator==, operator>	Overloaded operators provide assignment, concatenation, and simple comparison of BSTRs.
LoadString	Allows you to initialize a BSTR with text stored in a string resource.
ToLower, ToUpper	Converts the BSTR to all lowercase or all uppercase using the language-safe CharLower and CharUpper Microsoft Win32 APIs, respectively.
WriteToStream, ReadFromStream	Writes the BSTR to, or reads it from, an IStream interface.

To developers transitioning from MFC to ATL, the CComBSTR class is frustrating because it doesn't offer nearly as many convenient features as does the CString class. A list of notable omissions is shown in Table 5-3. Simply stated, CComBSTR isn't meant as a full-blown replacement for string manipulation. It is simply a convenience class for converting from an LPCTSTR to a BSTR and for treating a BSTR as a class rather than with the COM SysXXXXString APIs. If you need to perform sophisticated string manipulation—which we'll conveniently define as any operation shown in Table 5-3—you should instead use the wstring class provided by the Standard Template Library (STL). Admittedly, the syntax of the STL string classes is quite different from CString, so they take some getting used to, but time spent mastering STL is a good investment.

Table 5-3. Notable CComBSTR Omissions.

Features Not Included in CComBSTR	Explanation
LPCSTR extraction	Several of the CComBSTR methods take an LPCSTR as input, allowing you to convert from a single-byte character string to a BSTR. However, there are no methods that allow you to convert back to an LPCSTR. The need to convert back to an LPCSTR often occurs in non-Unicode projects when you want to pass the string to a Win32 API that takes an LPCTSTR. The solution when developing for Microsoft Windows NT is to explicitly specify the Unicode version of the API, such as MessageBoxW instead of MessageBox, so that no conversion is required. Under Microsoft Windows 95 and Windows 98, or if your situation otherwise requires you to convert from BSTR to LPCSTR, you can use the _bstr_t class provided by the Visual C++ run-time library, which provides an LPCTSTR extraction operator.
String manipulation (including Replace, Insert, Delete, Remove, Find, Mid, Left, Right, and so on)	CComBSTR doesn't support these methods because they are beyond the scope of its role. To perform string manipulation on an array of wide characters, use the wstring class provided by STL.
Language-sensitive collation	The string comparison functions provided by CComBSTR (<, >, ==) perform byte-by-byte comparisons rather than language-specific collation. To perform collation, use the wstring class.

The following pseudocode shows the typical use of CComBSTR:

    HRESULT CMyObject::MyMethod(IOtherObject* pSomething)
    {
        CComBSTR bstrText(L"Hello");
        bstrText += " again";                     // LPCSTR conversion
        bstrText.ToUpper();
        pSomething->Display(bstrText);            // [in] parameter
        MessageBoxW(0, bstrText, L"Test", MB_OK); // Assumes Windows NT
        return S_OK;
    }

CComBSTR Gotchas

As you can see, CComBSTR significantly simplifies the use of BSTRs. Four uses of CComBSTR, however, require special care:

Freeing the BSTR explicitly

Using CComBSTR as an [out] parameter

Using a CComBSTR automatic variable in right-side assignments

Using a CComBSTR member variable in right-side assignments

Because CComBSTR exposes an operator BSTR method, there's nothing to prevent you from explicitly freeing the underlying BSTR, as shown here:

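    HRESULT CMyObject::MyMethod1()
    {
        CComBSTR bstrText(L"This is a test");
        ::SysFreeString(bstrText);                   // Frees the BSTR behind the wrapper's back

        MessageBoxW(NULL, bstrText, L"Test", MB_OK); // Uses the freed BSTR
        return S_OK;
    }   // The destructor calls SysFreeString a second time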

In this code, the BSTR beneath bstrText has already been freed, but there's nothing to stop you from still using it because bstrText hasn't yet gone out of scope. When it finally does go out of scope, SysFreeString will be called a second time. Preventing this "gotcha" would require removing the operator BSTR method from the class—but that would render CComBSTR nearly useless because you couldn't use it in place of BSTR for [in] parameters.

When passing a CComBSTR as an [out] parameter in place of a BSTR*, you must first call Empty to free the contents of the string, as shown here:

    HRESULT CMyObject::MyMethod2(ISomething* pSomething, /*[out]*/ BSTR* pbstr)
    {
        CComBSTR bstrText;

        bstrText = L"Some assignment";     // BSTR is allocated.

        bstrText.Empty();                  // Must call Empty before
        pSomething->GetText(&bstrText);    //  using as an [out] parameter.
        if(bstrText != L"Schaller")
            bstrText += "Hello";           // Convert from LPCSTR.
        return S_OK;
    }

Calling Empty before passing the CComBSTR as an [out] parameter is required because—following the COM rules for [out] parameters—the called method doesn't call SysFreeString before overwriting the contents of the BSTR. If you forget to call Empty, the contents of the BSTR immediately preceding the call will be leaked.
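
The leak can be made visible with a portable imitation that counts live allocations. This is only a sketch under stated assumptions: SketchOwner is a hypothetical stand-in for a BSTR wrapper (it is not CComBSTR), sketch_alloc and sketch_free stand in for the Sys*String APIs, and sketch_get_text plays the role of a COM method that fills an [out] parameter without freeing what was there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Live-allocation counter so that a leak shows up as a nonzero difference.
static int g_live_strings = 0;

// Hypothetical stand-in for an allocating string API.
char16_t* sketch_alloc(const char16_t* src)
{
    std::size_t n = 0;
    while (src[n] != u'\0')
        ++n;
    char16_t* copy = static_cast<char16_t*>(std::malloc((n + 1) * sizeof(char16_t)));
    assert(copy != nullptr);
    std::memcpy(copy, src, (n + 1) * sizeof(char16_t));
    ++g_live_strings;
    return copy;
}

// Hypothetical stand-in for the matching free API.
void sketch_free(char16_t* s)
{
    if (s != nullptr) {
        std::free(s);
        --g_live_strings;
    }
}

// Minimal owner exposing only the members this gotcha involves.
class SketchOwner
{
public:
    SketchOwner() : m_str(nullptr) {}
    ~SketchOwner() { sketch_free(m_str); }
    SketchOwner(const SketchOwner&) = delete;
    SketchOwner& operator=(const SketchOwner&) = delete;

    void Assign(const char16_t* src) { Empty(); m_str = sketch_alloc(src); }
    void Empty() { sketch_free(m_str); m_str = nullptr; }
    char16_t* Peek() const { return m_str; }
    char16_t** operator&() { return &m_str; }  // raw slot handed to the callee

private:
    char16_t* m_str;
};

// Overwrites the slot without freeing its previous contents, as a method
// with an [out] parameter does.
void sketch_get_text(char16_t** out)
{
    *out = sketch_alloc(u"from callee");
}

// Returns how many strings the call sequence leaked.
int sketch_leaked_after_call(bool call_empty_first)
{
    const int before = g_live_strings;
    char16_t* orphan = nullptr;
    {
        SketchOwner text;
        text.Assign(u"Some assignment");
        if (call_empty_first)
            text.Empty();
        else
            orphan = text.Peek();  // remembered only so the demo can clean up
        sketch_get_text(&text);
    }  // the destructor frees only the string the owner holds now
    const int leaked = g_live_strings - before;
    sketch_free(orphan);           // tidy up the deliberately leaked string
    return leaked;
}
```

In this imitation, sketch_leaked_after_call(true) reports no leak and sketch_leaked_after_call(false) reports one orphaned string, which mirrors the rule described above.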

The third CComBSTR gotcha is also fairly obvious, but it's much more dangerous than the first two. Examine the following code:

    HRESULT CMyObject::MyMethod3(/*[out, retval]*/ BSTR* pbstr)
    {
        CComBSTR bstrText(L"Hello");
        bstrText += " again";
        *pbstr = bstrText;        // No! Call Detach instead!
        return S_OK;
    }

The pointer to the BSTR encapsulated by bstrText is passed as an [out] parameter in the *pbstr = bstrText assignment statement. When bstrText goes out of scope just before returning from MyMethod3, the BSTR will be freed by the call to SysFreeString in the CComBSTR destructor. The caller will get a pointer to an already freed buffer, causing extremely undesirable results. Because bstrText is about to go out of scope, you must instead assign *pbstr to the output of your choice of the CComBSTR Copy or Detach methods. CComBSTR Copy makes a copy of the string; Detach simply removes the BSTR from the auspices of the wrapper class so that it won't be deleted when bstrText goes out of scope:

    HRESULT CMyObject::MyMethod4(/*[out, retval]*/ BSTR* pbstr)
    {
        CComBSTR bstrText(L"Hello");
        bstrText += " again";
        //*pbstr = bstrText.Copy();    // Better!
        *pbstr = bstrText.Detach();    // Much better!
        return S_OK;
    }

In this case, you'd be better off calling Detach instead of Copy for reasons of efficiency. Detach doesn't incur the unneeded overhead of creating an extra copy of the string. However, Copy is required when the contents of CComBSTR must still be used after making the assignment.
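
The cost difference between the two approaches can be illustrated by counting allocations in a portable imitation. As before, this is a hedged sketch: SketchOwner is a hypothetical wrapper with Copy and Detach members modeled on the behavior described above, not the ATL class itself, and sketch_alloc and sketch_free are stand-ins for the COM string APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

static int g_allocations = 0;  // total allocations made
static int g_live = 0;         // allocations not yet freed

// Hypothetical stand-in for an allocating string API.
char16_t* sketch_alloc(const char16_t* src)
{
    std::size_t n = 0;
    while (src[n] != u'\0')
        ++n;
    char16_t* copy = static_cast<char16_t*>(std::malloc((n + 1) * sizeof(char16_t)));
    assert(copy != nullptr);
    std::memcpy(copy, src, (n + 1) * sizeof(char16_t));
    ++g_allocations;
    ++g_live;
    return copy;
}

// Hypothetical stand-in for the matching free API.
void sketch_free(char16_t* s)
{
    if (s != nullptr) {
        std::free(s);
        --g_live;
    }
}

class SketchOwner
{
public:
    explicit SketchOwner(const char16_t* src) : m_str(sketch_alloc(src)) {}
    ~SketchOwner() { sketch_free(m_str); }
    SketchOwner(const SketchOwner&) = delete;
    SketchOwner& operator=(const SketchOwner&) = delete;

    // Returns a new string; the owner keeps its own.
    char16_t* Copy() const { return m_str ? sketch_alloc(m_str) : nullptr; }

    // Hands the string to the caller; the owner forgets it.
    char16_t* Detach()
    {
        char16_t* s = m_str;
        m_str = nullptr;
        return s;
    }

private:
    char16_t* m_str;
};

// Each helper plays the role of a method with an [out, retval] parameter.
void sketch_return_by_copy(char16_t** out)
{
    SketchOwner text(u"Hello again");
    *out = text.Copy();    // second allocation; the original is freed on exit
}

void sketch_return_by_detach(char16_t** out)
{
    SketchOwner text(u"Hello again");
    *out = text.Detach();  // ownership moves; the destructor has nothing to free
}
```

In both helpers the caller ends up owning exactly one string and must free it. The Copy path reaches that state with two allocations and one free, whereas the Detach path needs a single allocation.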

The final CComBSTR gotcha is the most subtle. In the following code, the CStringTest class uses CComBSTR as a member variable to store a BSTR property. The put_Text and get_Text interface methods allow the value to be modified.

    class CStringTest
    {
        CComBSTR m_bstrText;
    // IStringTest
    public:
        STDMETHOD(put_Text)(/*[in]*/ BSTR newVal)
        {
            m_bstrText = newVal;
            return S_OK;
        }
        STDMETHOD(get_Text)(/*[out, retval]*/ BSTR *pVal)
        {
            *pVal = m_bstrText;    // Oops! Call m_bstrText.Copy
                                   //  instead.
            return S_OK;
        }
    };

Can you see the bug? Because m_bstrText doesn't go out of scope at the end of the get_Text method, you might think you can reasonably assume that you don't need to call Copy when making the *pVal = m_bstrText assignment. This is not the case. According to the rules of COM, the caller is responsible for freeing the contents of an [out] parameter. Because *pVal points to the BSTR encapsulated by m_bstrText instead of a copy, both the caller and the m_bstrText destructor will attempt to delete the string.

Previous Versions of ATL: CComBSTR
The CComBSTR class was enhanced from ATL 2.1 to ATL 3.0. The ToLower, ToUpper, and LoadString methods are new, as are the operator overloads (+=, <, ==, and >). It's possible that future versions of ATL will continue to extend the functionality of CComBSTR until it approaches the feature set provided by CString, but we doubt it. The STL wstring class is the preferred alternative to CString when performing string manipulation.

The CComVariant Class

In addition to the BSTR, ATL provides support for another popular but less-than-C++-friendly COM data type called the VARIANT (or alternatively, VARIANTARG). A VARIANT is a catchall data structure used to represent all Automation-compatible data types including (but not limited to) floating point numbers, longs, dates, strings, interface pointers, and arrays. Structurally, a VARIANT consists of a union and a type variable that specifies which member of the union is in use. The VARIANT structure also contains three unused data members reserved for future use. The chief benefit of the VARIANT is that it allows RAD and scripting languages to treat all data types polymorphically, meaning that you can perform operations on a VARIANT without being concerned with the actual data type it represents. If necessary, those languages automatically call the COM-provided VariantChangeType API to perform the required conversion. Here's an example from Visual Basic:

    Sub Form_OnLoad
        Dim myValue
        myValue = "132"
        myValue = myValue + 68
        myValue = myValue & " dollars"
        MsgBox myValue    ' Displays "200 dollars"
    End Sub

This flexibility is appropriate for scripting languages because a single variable can be treated as a number at one moment and as a string the next. The necessary conversions required to make this code work correctly are conveniently hidden from the developer.
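
To see roughly what those hidden conversions involve, here is a small C++ sketch of a tagged value with explicit coercions. It is not the VARIANT structure or the VariantChangeType API: SketchVariant and every sketch_ function are hypothetical, only two types are handled, and the fields are kept separate rather than overlaid in a union to keep the example short.

```cpp
#include <cassert>
#include <string>

enum class SketchType { Number, Text };

// Simplified stand-in for a VARIANT: a type tag plus storage for each kind.
struct SketchVariant
{
    SketchType type;
    long number;       // meaningful when type == Number
    std::string text;  // meaningful when type == Text
};

SketchVariant sketch_from_number(long n)
{
    return SketchVariant{SketchType::Number, n, std::string()};
}

SketchVariant sketch_from_text(const std::string& s)
{
    return SketchVariant{SketchType::Text, 0, s};
}

// Coerces to a number, the job the text assigns to VariantChangeType.
long sketch_as_number(const SketchVariant& v)
{
    return v.type == SketchType::Number ? v.number : std::stol(v.text);
}

// Coerces to text.
std::string sketch_as_text(const SketchVariant& v)
{
    return v.type == SketchType::Text ? v.text : std::to_string(v.number);
}

// "+" coerces both operands to numbers.
SketchVariant sketch_add(const SketchVariant& a, const SketchVariant& b)
{
    return sketch_from_number(sketch_as_number(a) + sketch_as_number(b));
}

// "&" coerces both operands to text.
SketchVariant sketch_concat(const SketchVariant& a, const SketchVariant& b)
{
    return sketch_from_text(sketch_as_text(a) + sketch_as_text(b));
}

// Replays the three Visual Basic statements with every conversion spelled out.
std::string sketch_demo()
{
    SketchVariant value = sketch_from_text("132");
    value = sketch_add(value, sketch_from_number(68));
    assert(value.type == SketchType::Number);  // "+" produced a number
    value = sketch_concat(value, sketch_from_text(" dollars"));
    return value.text;
}
```

Calling sketch_demo returns "200 dollars", matching the Visual Basic result, but each coercion that the scripting language performs silently has to be requested by hand.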

Unlike scripting languages, C++ puts a higher priority on type safety and correctness than on flexibility and ease of use. C++ developers are encouraged to use strongly typed data rather than catchall types whenever possible. However, the genesis of COM, strongly influenced by what was then known as OLE Automation, has resulted in the need to support the VARIANT.

NOTE
OLE is a rich but complicated implementation of compound document-based features that uses COM as its underlying technology. At one time, OLE stood for object linking and embedding. Later, it stood for nothing. It is our completely unfounded belief that once the marketing types at Microsoft learned that the word olé in Spanish means "bravo!" or "good," they decided that the term "OLE Automation" was an oxymoron and so dropped the "OLE," leaving simply Automation.

Because the VARIANT can represent any one of a number of data types, the tasks of variable initialization, assignment, and comparison are especially tedious under C++. To offer some relief, ATL provides a CComVariant class that wraps the VARIANT in a manner similar to the way CComBSTR wraps the BSTR, except that CComVariant uses inheritance rather than composition. In other words, CComVariant inherits from VARIANT rather than having a VARIANT data member. This inheritance allows you to use CComVariant anywhere a VARIANT is required, which is a good thing. But be aware that CComVariant is subject to the same kinds of misuse as the CComBSTR class.
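
One way the CComBSTR ownership pitfalls can carry over to a wrapper that inherits from its raw structure is sketched below in portable C++. This is an illustration under stated assumptions, not ATL code: SketchRaw stands in for VARIANT, SketchWrapper for a wrapper that derives from it, and sketch_copy and sketch_clear play the roles of the VariantCopy and VariantClear APIs. All of these names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

static int g_live = 0;  // strings allocated and not yet freed

// Hypothetical stand-in for an allocating string API.
char16_t* sketch_alloc(const char16_t* src)
{
    std::size_t n = 0;
    while (src[n] != u'\0')
        ++n;
    char16_t* copy = static_cast<char16_t*>(std::malloc((n + 1) * sizeof(char16_t)));
    assert(copy != nullptr);
    std::memcpy(copy, src, (n + 1) * sizeof(char16_t));
    ++g_live;
    return copy;
}

// Hypothetical stand-in for the matching free API.
void sketch_free(char16_t* s)
{
    if (s != nullptr) {
        std::free(s);
        --g_live;
    }
}

// Stand-in for the raw structure: a type tag plus a payload.
struct SketchRaw
{
    int type;        // 0 = empty, 1 = text
    char16_t* text;  // owned string when type == 1
};

// Deep copy of the payload.
void sketch_copy(SketchRaw* dest, const SketchRaw* src)
{
    dest->type = src->type;
    dest->text = (src->type == 1) ? sketch_alloc(src->text) : nullptr;
}

// Releases the payload and resets the tag.
void sketch_clear(SketchRaw* v)
{
    if (v->type == 1)
        sketch_free(v->text);
    v->type = 0;
    v->text = nullptr;
}

// Wrapper that inherits from the raw structure and clears it on destruction.
class SketchWrapper : public SketchRaw
{
public:
    explicit SketchWrapper(const char16_t* s) { type = 1; text = sketch_alloc(s); }
    ~SketchWrapper() { sketch_clear(this); }
    SketchWrapper(const SketchWrapper&) = delete;
    SketchWrapper& operator=(const SketchWrapper&) = delete;
};

// Misuse: inheritance lets this assignment compile, but it copies only the
// pointer, and the wrapper's destructor frees that string on return.
void sketch_return_shallow(SketchRaw* out)
{
    SketchWrapper value(u"Hello");
    *out = value;
}

// Safer: give the caller its own deep copy.
void sketch_return_deep(SketchRaw* out)
{
    SketchWrapper value(u"Hello");
    sketch_copy(out, &value);
}
```

After sketch_return_shallow the caller holds a pointer to memory that has already been freed, the same hazard as assigning a local CComBSTR to an [out] parameter. After sketch_return_deep the caller owns a separate string and is responsible for clearing it.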

Because much of the functionality of CComVariant is the result of overloaded constructors and assignment operators, we won't provide an exhaustive list of its members. Instead, in Table 5-4 we've outlined the services provided by CComVariant.

CComVariant Functionality	Description
Overloaded constructors	CComVariant provides a type-specific constructor for each of the basic data types supported by the VARIANT. Notable exceptions are the DATE and SAFEARRAY types.
Assignment operators (operator=)	CComVariant provides type-specific assignment operator overloads for each of the data types supported by its constructors.
Comparison operators (operator==)	CComVariant provides a comparison operator that implements a switch statement on the union type. When comparing VARIANT BSTRs, operator== performs a string comparison, not a pointer comparison. When comparing VARIANT-encoded IDispatch and IUnknown interfaces, operator== performs a simple pointer comparison. Because the COM specifications expressly allow an object to return a different (but equivalent) IDispatch pointer value every time QueryInterface is called, it is possible that operator== will return FALSE when comparing IDispatch VARIANTs that point to the same object.
Type changing	CComVariant exposes a ChangeType method that allows the VARIANT type to change during its lifetime by calling the global VariantChangeType API.
Stream serialization	CComVariant implements ReadFromStream and WriteToStream serialization support. If the underlying VARIANT represents an IDispatch or IUnknown pointer, the WriteToStream method queries the pointer for IPersistStream. If IPersistStream is supported, CComVariant uses the OleSaveToStream and OleLoadFromStream APIs to handle object serialization.