The CComBSTR Smart BSTR Class
A Review of the COM String Data Type: BSTR
COM is a language-neutral, hardware-architecture-neutral model. Therefore, it needs a language-neutral, hardware-architecture-neutral text data type. COM defines a generic text data type, OLECHAR, that represents the text data COM uses on a specific platform. On most platforms, including all 32-bit Windows platforms, the OLECHAR data type is a typedef for the wchar_t data type. That is, on most platforms, the COM text data type is equivalent to the C/C++ wide-character data type, which contains Unicode characters.
COM also defines a text data type called BSTR. A BSTR is a length-prefixed string of OLECHAR characters. Most interpretive environments prefer length-prefixed strings for performance reasons. For example, a length-prefixed string does not require a scan for the terminating NUL character to determine the length of the string.
Therefore, when you pass a string to or receive a string from a method parameter of an interface defined by a C/C++ component, you'll often use the OLECHAR* data type. However, if you need to use an interface defined by another language, string parameters will frequently be the BSTR data type. The BSTR data type has a number of poorly documented semantics, which makes using BSTRs tedious and error prone for C++ developers.
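A BSTR has the following attributes: it is a pointer to the first character of a length-prefixed array of OLECHAR characters, with the length stored as an integer immediately preceding that character; the length prefix counts bytes, not characters, and excludes the terminating NUL character; the array is NUL-terminated but can also contain embedded NUL characters; a BSTR must be allocated and freed with the SysAllocString and SysFreeString family of functions; a NULL BSTR pointer represents an empty string; and a BSTR is not reference counted, so copying a BSTR means duplicating the string rather than copying the pointer.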
With all these special semantics, it would be useful to encapsulate these details in a reusable class. ATL provides such a class: CComBSTR.
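To make the length-prefixed layout concrete, here is a minimal portable sketch. It is a model only: ModelBstr, ModelAllocLen, ModelLen, ModelFree, and ScanLen are hypothetical names invented for illustration, not COM APIs, and the 4-byte byte-count prefix is an assumption that mirrors how the BSTR length prefix is documented. Real code should call SysAllocStringLen, SysStringLen, and SysFreeString instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for OLECHAR and BSTR; these are not the real COM types.
using ModelChar = char16_t;
using ModelBstr = ModelChar*;

// Allocate [4-byte length in bytes][characters][NUL terminator] and return a
// pointer to the first character, so the prefix sits just before the string.
ModelBstr ModelAllocLen(const ModelChar* src, std::uint32_t cch) {
    const std::uint32_t cb = cch * static_cast<std::uint32_t>(sizeof(ModelChar));
    auto* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + cb + sizeof(ModelChar)));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &cb, sizeof(cb));            // length prefix, in bytes
    unsigned char* chars = raw + sizeof(std::uint32_t);
    if (src != nullptr) {
        std::memcpy(chars, src, cb);              // embedded NULs are copied too
    } else {
        std::memset(chars, 0, cb);                // zero-filled here for determinism
    }
    std::memset(chars + cb, 0, sizeof(ModelChar)); // terminating NUL
    return reinterpret_cast<ModelBstr>(chars);
}

// Length in characters, read from the prefix. A null pointer is an empty string.
std::uint32_t ModelLen(ModelBstr s) {
    if (s == nullptr) return 0;
    std::uint32_t cb = 0;
    std::memcpy(&cb, reinterpret_cast<unsigned char*>(s) - sizeof(cb), sizeof(cb));
    return cb / static_cast<std::uint32_t>(sizeof(ModelChar));
}

// Release the whole block, starting at the prefix.
void ModelFree(ModelBstr s) {
    if (s != nullptr) {
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
    }
}

// Length of a NUL-terminated string, found by scanning for the first NUL.
std::uint32_t ScanLen(const ModelChar* s) {
    std::uint32_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}
```

With a seven-character source such as u"one\0two", ModelLen reports the full length from the prefix, whereas ScanLen stops at the embedded NUL. That gap is the source of most of the pitfalls discussed in the rest of this section.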
The CComBSTR Class
The CComBSTR class is an ATL utility class that is a useful encapsulation for the COM string data type, BSTR. The atlcomcli.h file contains the definition of the CComBSTR class. The only state the class maintains is a single public member variable, m_str, of type BSTR.
////////////////////////////////////////////////////
// CComBSTR
class CComBSTR {
public:
    BSTR m_str;
    ...
};
Constructors and Destructor
Eight constructors are available for CComBSTR objects. The default constructor simply initializes the m_str variable to NULL, which is equivalent to a BSTR that represents an empty string. The destructor destroys any BSTR contained in the m_str variable by calling SysFreeString. SysFreeString is documented to simply return when its input parameter is NULL, so the destructor can run on an empty object without a problem.
CComBSTR() { m_str = NULL; } ~CComBSTR() { ::SysFreeString(m_str); }
Later in this section, you will learn about
Probably the most frequently used constructor initializes a CComBSTR object from a pointer to a NUL-character-terminated array of OLECHAR characters, more commonly known as an LPCOLESTR:
CComBSTR(LPCOLESTR pSrc) { if (pSrc == NULL) m_str = NULL; else { m_str = ::SysAllocString(pSrc); if (m_str == NULL) AtlThrow(E_OUTOFMEMORY); } }
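You invoke the preceding constructor when you write code such as the following:
CComBSTR str1 (OLESTR("This is a string of OLECHARs")) ;
The previous constructor copies characters until it reaches the terminating NUL character. When you want to copy fewer characters, such as a prefix of a string, or when you want to copy from a string that contains embedded NUL characters, you must specify the number of characters to copy. In that case, use the following constructor: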
CComBSTR(int nSize, LPCOLESTR sz);
This constructor creates a BSTR with room for the number of characters specified by nSize; copies the specified number of characters, including any embedded NUL characters, from sz; and then appends a terminating NUL character. When sz is NULL, SysAllocStringLen skips the copy step, creating an uninitialized BSTR of the specified size. You invoke this constructor when you write code such as the following:
// str2 contains "This is a string"
CComBSTR str2 (16, OLESTR ("This is a string of OLECHARs"));
// Allocates an uninitialized BSTR with room for 64 characters
CComBSTR str3 (64, (LPCOLESTR) NULL);
// Allocates an uninitialized BSTR with room for 64 characters
CComBSTR str4 (64);
The CComBSTR class provides a special constructor for the str3 example in the preceding code, which doesn't require you to provide the NULL argument. The preceding str4 example shows its use. Here's the constructor:
CComBSTR(int nSize) {
    ...
    m_str = ::SysAllocStringLen(NULL, nSize);
    ...
}
One odd semantic feature of a BSTR is that a NULL pointer is a valid value for an empty BSTR string. For example, Visual Basic considers a NULL BSTR to be equivalent to a pointer to an empty string, that is, a string of zero length in which the first character is the terminating NUL character. To put it symbolically, Visual Basic considers IF p = "", where p is a BSTR set to NULL, to be true. The SysStringLen API properly implements the checks; CComBSTR provides the Length method as a wrapper:
unsigned int Length() const { return ::SysStringLen(m_str); }
You can also use the following copy constructor to create and initialize a CComBSTR object to be equivalent to an already initialized CComBSTR object:
CComBSTR(const CComBSTR& src) {
    m_str = src.Copy();
    ...
}
In the following code, creating the str5 variable invokes the preceding copy constructor to initialize the object:
CComBSTR str1 (OLESTR("This is a string of OLECHARs")) ;
CComBSTR str5 = str1 ;
Note that the preceding copy constructor calls the Copy method on the source CComBSTR object. The Copy method makes a copy of its string and returns the new BSTR . Because the Copy method allocates the new BSTR using the length of the existing BSTR and copies the string contents for the specified length, the Copy method properly copies a BSTR that contains embedded NUL characters.
BSTR Copy() const { if (!*this) { return NULL; } return ::SysAllocStringByteLen((char*)m_str, ::SysStringByteLen(m_str)); }
Two constructors initialize a CComBSTR object from an LPCSTR string. The single-argument constructor expects a NUL-terminated LPCSTR string. The two-argument constructor lets you specify the length of the LPCSTR string. Both expect ANSI characters and, through A2WBSTR, create a BSTR that contains the equivalent string in OLECHAR characters:
CComBSTR(LPCSTR pSrc) {
    ...
    m_str = A2WBSTR(pSrc);
    ...
}
CComBSTR(int nSize, LPCSTR sz) {
    ...
    m_str = A2WBSTR(sz, nSize);
    ...
}
The final constructor is an odd one. It takes an argument that is a GUID and produces a string containing the string representation of the GUID.
CComBSTR(REFGUID src);
This constructor is quite useful when building strings used during component registration. In a number of situations, you need to write the string representation of a GUID to the Registry. Some code that uses this constructor follows:
// Define a GUID as a binary constant
static const GUID GUID_Sample = { 0x8a44e110, 0xf134, 0x11d1,
{ 0x96, 0xb1, 0xBA, 0xDB, 0xAD, 0xBA, 0xDB, 0xAD } };
// Convert the binary GUID to its string representation
CComBSTR str6 (GUID_Sample) ;
// str6 contains "{8A44E110-F134-11D1-96B1-BADBADBADBAD}"
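As a rough, portable illustration of the brace-and-hyphen format that ends up in the Registry, the sketch below formats a GUID-shaped struct with snprintf. This is not how ATL does it (the CComBSTR constructor relies on the COM runtime); ModelGuid and FormatGuid are hypothetical names, and the uppercase hex digits are an assumption chosen to match what StringFromGUID2 emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical portable mirror of the Win32 GUID layout.
struct ModelGuid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Format as {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} with uppercase hex digits.
std::string FormatGuid(const ModelGuid& g) {
    char buf[39];  // 38 characters plus the terminating NUL
    std::snprintf(buf, sizeof(buf),
                  "{%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  static_cast<unsigned long>(g.data1),
                  static_cast<unsigned>(g.data2),
                  static_cast<unsigned>(g.data3),
                  static_cast<unsigned>(g.data4[0]),
                  static_cast<unsigned>(g.data4[1]),
                  static_cast<unsigned>(g.data4[2]),
                  static_cast<unsigned>(g.data4[3]),
                  static_cast<unsigned>(g.data4[4]),
                  static_cast<unsigned>(g.data4[5]),
                  static_cast<unsigned>(g.data4[6]),
                  static_cast<unsigned>(g.data4[7]));
    return std::string(buf);
}
```

Note that the first two bytes of the fourth field form their own group, and the remaining six bytes form the final group.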
Assignment
The CComBSTR class defines three assignment operators. The first one initializes a CComBSTR object using a different CComBSTR object. The second one initializes a CComBSTR object using an LPCOLESTR pointer. The third one initializes the object using an LPCSTR pointer. The following operator=() method initializes one CComBSTR object from another CComBSTR object:
CComBSTR& operator=(const CComBSTR& src) {
    if (m_str != src.m_str) {
        ::SysFreeString(m_str);
        m_str = src.Copy();
        if (!!src && !*this) {
            AtlThrow(E_OUTOFMEMORY);
        }
    }
    return *this;
}
Note that this assignment operator uses the Copy method, discussed earlier in this section, to make an exact copy of the specified CComBSTR instance. You invoke this operator when you write code such as the following:
CComBSTR str1 (OLESTR("This is a string of OLECHARs"));
CComBSTR str7 ;
str7 = str1; // str7 contains "This is a string of OLECHARs"
str7 = str7; // This is a NOP. Assignment operator
// detects this case
The second operator=() method initializes one CComBSTR object from an LPCOLESTR pointer to a NUL -character-terminated string.
CComBSTR& operator=(LPCOLESTR pSrc) {
    if (pSrc != m_str) {
        ::SysFreeString(m_str);
        if (pSrc != NULL) {
            m_str = ::SysAllocString(pSrc);
            if (!*this) {
                AtlThrow(E_OUTOFMEMORY);
            }
        } else {
            m_str = NULL;
        }
    }
    return *this;
}
Note that this assignment operator uses the SysAllocString function to allocate a BSTR copy of the specified LPCOLESTR argument. You invoke this operator when you write code such as the following:
CComBSTR str8 ;
str8 = OLESTR ("This is a string of OLECHARs");
It's quite easy to misuse this assignment operator when you're dealing with strings that contain embedded NUL characters. For example, the following code shows both a correct use and a misuse of the operator:
CComBSTR str9 ;
str9 = OLESTR ("This works as expected");
// BSTR bstrInput contains "This is part one
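To properly handle situations such as this one, you should call the AssignBSTR method instead. Its implementation closely resembles operator=(LPCOLESTR), except that it allocates with SysAllocStringByteLen and therefore copies the full length of the source BSTR: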
HRESULT AssignBSTR(const BSTR bstrSrc) {
    HRESULT hr = S_OK;
    if (m_str != bstrSrc) {
        ::SysFreeString(m_str);
        if (bstrSrc != NULL) {
            m_str = ::SysAllocStringByteLen((char*)bstrSrc, ::SysStringByteLen(bstrSrc));
            if (!*this) {
                hr = E_OUTOFMEMORY;
            }
        } else {
            m_str = NULL;
        }
    }
    return hr;
}
You can modify the code as follows:
CComBSTR str9 ;
str9 = OLESTR ("This works as expected");
// BSTR bstrInput contains
// "This is part one
The third operator=() method initializes one CComBSTR object using an LPCSTR pointer to a NUL -character-terminated string. The operator converts the input string, which is in ANSI characters, to a Unicode string; then it creates a BSTR containing the Unicode string.
CComBSTR& operator=(LPCSTR pSrc) {
    ::SysFreeString(m_str);
    m_str = A2WBSTR(pSrc);
    if (!*this && pSrc != NULL) {
        AtlThrow(E_OUTOFMEMORY);
    }
    return *this;
}
The final assignment methods are two overloaded methods called LoadString.
bool LoadString(HINSTANCE hInst, UINT nID) ;
bool LoadString(UINT nID) ;
The first loads the specified string resource nID from the specified module hInst (using the instance handle). The second loads the specified string resource nID from the current module using the global variable _AtlBaseModule.
CComBSTR Operations
Four methods give you access, in varying ways, to the internal BSTR string that is encapsulated by the CComBSTR class. The operator BSTR() method enables you to use a CComBSTR object in situations where a raw BSTR pointer is required. You invoke this method any time you cast a CComBSTR object to a BSTR implicitly or explicitly.
operator BSTR() const { return m_str; }
Frequently, you invoke this operator implicitly when you pass a CComBSTR object as a parameter to a function that expects a BSTR . The following code demonstrates this:
HRESULT put_Name (/* [in] */ BSTR pNewValue) ;
CComBSTR bstrName = OLESTR ("Frodo Baggins");
put_Name (bstrName); // Implicit cast to BSTR
The operator&() method returns the address of the internal m_str variable when you take the address of a CComBSTR object. Use care when taking the address of a CComBSTR object. Because the operator&() method returns the address of the internal BSTR variable, you can overwrite the internal variable without first freeing the string. This causes a memory leak. However, if you define the macro ATL_CCOMBSTR_ADDRESS_OF_ASSERT in your project settings, you get an assertion to help catch this error.
#ifndef ATL_CCOMBSTR_ADDRESS_OF_ASSERT
// Temp disable CComBSTR::operator& Assert
#define ATL_NO_CCOMBSTR_ADDRESS_OF_ASSERT
#endif

BSTR* operator&() {
#ifndef ATL_NO_CCOMBSTR_ADDRESS_OF_ASSERT
    ATLASSERT(!*this);
#endif
    return &m_str;
}
This operator is quite useful when you are receiving a BSTR pointer as the output of some method call. You can store the returned BSTR directly into a CComBSTR object so that the object manages the lifetime of the string.
HRESULT get_Name (/* [out] */ BSTR* pName);
CComBSTR bstrName ;
get_Name (&bstrName); // bstrName empty so no memory leak
The CopyTo method makes a duplicate of the string encapsulated by a CComBSTR object and copies the duplicate's BSTR pointer to the specified location. You must free the returned BSTR explicitly by calling SysFreeString.
HRESULT CopyTo(BSTR* pbstr);
This method is handy when you need to return a copy of an existing BSTR property to a caller. For example:
STDMETHODIMP SomeClass::get_Name (/* [out] */ BSTR* pName) {
// Name is maintained in variable m_strName of type CComBSTR
return m_strName.CopyTo (pName);
}
The Detach method returns the BSTR contained by a CComBSTR object. It empties the object so that the destructor will not attempt to release the internal BSTR . You must free the returned BSTR explicitly by calling SysFreeString .
BSTR Detach() { BSTR s = m_str; m_str = NULL; return s; }
You use this method when you have a string in a CComBSTR object that you want to return to a caller and you no longer need to keep the string. In this situation, using the CopyTo method would be less efficient because you would make a copy of a string, return the copy, and then discard the original string. Use Detach as follows to return the original string directly:
STDMETHODIMP SomeClass::get_Label (/* [out] */ BSTR* pName) {
CComBSTR strLabel;
// Generate the returned string in strLabel here
*pName = strLabel.Detach ();
return S_OK;
}
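The Attach method performs the inverse operation. It takes a BSTR and attaches it to the CComBSTR object, which then owns the BSTR and eventually frees it in its destructor. If the object already contains a different string, Attach frees that string before taking control of the new BSTR:
void Attach(BSTR src) {
    if (m_str != src) {
        ::SysFreeString(m_str);
        m_str = src;
    }
}
Use care when using the Attach method. You must have ownership of the BSTR you are attaching to a CComBSTR object because eventually the object will attempt to destroy the BSTR. For example, the following code is incorrect: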
STDMETHODIMP SomeClass::put_Name (/* [in] */ BSTR bstrName) {
// Name is maintained in variable m_strName of type CComBSTR
m_strName.Attach (bstrName); // Wrong! We don't own bstrName
return E_BONEHEAD;
}
More often, you use Attach when you're given ownership of a BSTR and you want a CComBSTR object to manage the lifetime of the string.
STDMETHODIMP SomeClass::get_Name (/* [out] */ BSTR* pName);
...
BSTR bstrName;
pObj->get_Name (&bstrName); // We own and must free the raw BSTR
CComBSTR strName;
strName.Attach(bstrName); // Attach raw BSTR to the object
You can explicitly free the string encapsulated in a CComBSTR object by calling Empty. The Empty method releases any internal BSTR and sets the m_str member variable to NULL. SysFreeString is documented to simply return when its input parameter is NULL, so you can call Empty on an empty object without a problem.
void Empty() { ::SysFreeString(m_str); m_str = NULL; }
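The ownership rules behind CopyTo, Detach, Attach, and Empty are easier to check with a counter that tracks live allocations. The sketch below is a portable model, not ATL code: OwnedString, TrackedAlloc, TrackedFree, and g_liveStrings are hypothetical names, and plain char strings stand in for BSTRs so that the example builds anywhere.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical allocator pair standing in for SysAllocString/SysFreeString.
// The counter makes leaks visible: it should return to zero.
static int g_liveStrings = 0;

char* TrackedAlloc(const char* src) {
    if (src == nullptr) return nullptr;
    char* p = static_cast<char*>(std::malloc(std::strlen(src) + 1));
    if (p != nullptr) {
        std::strcpy(p, src);
        ++g_liveStrings;
    }
    return p;
}

void TrackedFree(char* p) {
    if (p != nullptr) {  // like SysFreeString, a null argument is a no-op
        std::free(p);
        --g_liveStrings;
    }
}

// Minimal owner modeled on the CComBSTR methods discussed above.
class OwnedString {
public:
    OwnedString() : m_str(nullptr) {}
    explicit OwnedString(const char* src) : m_str(TrackedAlloc(src)) {}
    ~OwnedString() { TrackedFree(m_str); }
    OwnedString(const OwnedString&) = delete;
    OwnedString& operator=(const OwnedString&) = delete;

    // Hand the caller a duplicate; this object keeps its own string.
    bool CopyTo(char** out) const {
        if (out == nullptr) return false;
        *out = TrackedAlloc(m_str);
        return m_str == nullptr || *out != nullptr;
    }
    // Give up ownership; the caller must free the returned string.
    char* Detach() { char* s = m_str; m_str = nullptr; return s; }
    // Take ownership of a string the caller owns, freeing any current one.
    void Attach(char* src) {
        if (m_str != src) {
            TrackedFree(m_str);
            m_str = src;
        }
    }
    // Free the string now and leave the object empty.
    void Empty() { TrackedFree(m_str); m_str = nullptr; }
    const char* get() const { return m_str; }

private:
    char* m_str;
};
```

In this model, CopyTo raises the live count by one (the caller now owns a string), whereas Detach followed by Attach leaves the count unchanged because ownership merely moves from one holder to another.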
CComBSTR also provides two methods that convert a BSTR to and from a SAFEARRAY:
HRESULT BSTRToArray(LPSAFEARRAY *ppArray) { return VectorFromBstr(m_str, ppArray); } HRESULT ArrayToBSTR(const SAFEARRAY *pSrc) { ::SysFreeString(m_str); return BstrFromVector((LPSAFEARRAY)pSrc, &m_str); }
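As you can see, these methods merely serve as thin wrappers for the Win32 functions VectorFromBstr and BstrFromVector. BSTRToArray assigns each character of the encapsulated string to an element of a one-dimensional SAFEARRAY returned through ppArray; the caller is responsible for freeing that SAFEARRAY. ArrayToBSTR does the opposite: it frees any string the object already holds and then builds a BSTR from the elements of the SAFEARRAY.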
String Concatenation Using CComBSTR
Eight methods concatenate a specified string with a CComBSTR object: six overloaded Append methods, one AppendBSTR method, and the operator+=() method.
HRESULT Append(LPCOLESTR lpsz, int nLen); HRESULT Append(LPCOLESTR lpsz); HRESULT Append(LPCSTR); HRESULT Append(char ch); HRESULT Append(wchar_t ch); HRESULT Append(const CComBSTR& bstrSrc); CComBSTR& operator+=(const CComBSTR& bstrSrc); HRESULT AppendBSTR(BSTR p);
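The Append(LPCOLESTR lpsz, int nLen) method computes the sum of the length of the current string plus the specified nLen value, and allocates an empty BSTR of the correct size. It copies the original string into the new BSTR and then copies nLen characters of the lpsz string onto the end of the new BSTR. Finally, it frees the original string and replaces it with the new BSTR.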
CComBSTR strSentence = OLESTR("Now is ");
strSentence.Append(OLESTR("the time of day is 03:00 PM"), 9);
// strSentence contains "Now is the time "
The remaining overloaded Append methods all use the first method to perform the real work. They differ only in how each obtains the string and its length.
CComBSTR str11 (OLESTR("for all good men "));
// calls Append(const CComBSTR& bstrSrc);
strSentence.Append(str11);
// strSentence contains "Now is the time for all good men "
// calls Append(LPCOLESTR lpsz);
strSentence.Append(OLESTR("to come "));
// strSentence contains "Now is the time for all good men to come "
// calls Append(LPCSTR lpsz);
strSentence.Append("to the aid ");
// strSentence contains
// "Now is the time for all good men to come to the aid "
CComBSTR str12 (OLESTR("of their country"));
strSentence += str12; // calls operator+=()
// "Now is the time for all good men to come to
// the aid of their country"
When you call Append using a BSTR parameter, you are actually calling the Append(LPCOLESTR lpsz) method because, to the compiler, the BSTR argument is an OLECHAR* argument. Therefore, the method appends characters from the BSTR until it encounters the first NUL character. When you want to append the contents of a BSTR that possibly contains embedded NUL characters, you must explicitly call the AppendBSTR method. One additional method exists for appending an array that contains binary data:
HRESULT AppendBytes(const char* lpsz, int nLen);
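AppendBytes does not perform a conversion from ANSI to Unicode. The method uses SysAllocStringByteLen to properly allocate a BSTR of nLen bytes (not characters) and appends the result to the existing CComBSTR. You can't go wrong following these guidelines: when the parameter is a BSTR, use the AppendBSTR method to append the entire BSTR, regardless of whether it contains embedded NUL characters; when the parameter is an LPCOLESTR or an LPCSTR, use the Append method to append the NUL-character-terminated string.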
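The difference between appending a NUL-terminated string and appending a counted string can be reproduced with the standard library alone. The following sketch uses std::u16string in place of CComBSTR; AppendTerminated and AppendCounted are hypothetical helper names that only mimic the behavior of Append(LPCOLESTR) and of the length-aware Append and AppendBSTR methods.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Mimics Append(LPCOLESTR): the length is found by scanning for the first NUL,
// so anything after an embedded NUL is silently dropped.
void AppendTerminated(std::u16string& dst, const char16_t* src) {
    dst.append(src);
}

// Mimics Append(LPCOLESTR, int) and AppendBSTR: the caller supplies the
// character count, much as a BSTR's length prefix would.
void AppendCounted(std::u16string& dst, const char16_t* src, std::size_t cch) {
    dst.append(src, cch);
}
```

For a source buffer holding u"one\0two", AppendTerminated adds three characters and AppendCounted with a count of 7 adds all seven, including the embedded NUL. The counted form also covers the prefix case: appending the first 9 characters of "the time of day is 03:00 PM" to "Now is " yields "Now is the time ".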
Character Case Conversion
The two character case-conversion methods, ToLower and ToUpper, convert the internal string to lowercase or uppercase, respectively. In Unicode builds, the conversion is actually performed in-place using the Win32 CharLowerBuff API. In ANSI builds, the internal character string first is converted to MBCS and then CharLowerBuff is invoked. The resulting string is then converted back to Unicode and stored in a newly allocated BSTR. Any string data stored in m_str is freed using SysFreeString before it is overwritten. When everything works, the new string replaces the original string as the contents of the CComBSTR object.
HRESULT ToLower() {
    if (m_str != NULL) {
#ifdef _UNICODE
        // Convert in place
        CharLowerBuff(m_str, Length());
#else
        UINT _acp = _AtlGetConversionACP();
        ...
        int nRet = WideCharToMultiByte(_acp, 0, m_str, Length(), pszA, _convert, NULL, NULL);
        ...
        CharLowerBuff(pszA, nRet);
        nRet = MultiByteToWideChar(_acp, 0, pszA, nRet, pszW, _convert);
        ...
        BSTR b = ::SysAllocStringByteLen((LPCSTR) (LPWSTR) pszW, nRet * sizeof(OLECHAR));
        if (b == NULL)
            return E_OUTOFMEMORY;
        SysFreeString(m_str);
        m_str = b;
#endif
    }
    return S_OK;
}
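Note that these methods properly do case conversion even when the original string contains embedded NUL characters. Also note, however, that in ANSI builds the conversion is potentially lossy: a character cannot be converted when the local code page has no equivalent for the original Unicode character.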
CComBSTR Comparison Operators
The simplest comparison operator is operator!(). It returns true when the CComBSTR object is empty, and false otherwise.
bool operator!() const { return (m_str == NULL); }
There are four overloaded versions of the operator<() methods, four of the operator>() methods, and five of the operator==() and operator!=() methods. The additional overload for operator==() simply handles the special case of comparison to NULL. The code in all these methods is nearly the same, so I discuss only the operator<() methods; the comments apply equally to the operator>() and operator==() methods. These operators internally use the VarBstrCmp function. Previous versions of ATL did not properly compare two CComBSTRs that contain embedded NUL characters; these new operators handle the comparison correctly most of the time, so the following code works as expected. Later in this section, I discuss properly initializing CComBSTR objects with embedded NULs.
BSTR bstrIn1 =
SysAllocStringLen(
OLESTR("Here's part 1
In the first overloaded version of the operator<() method, the operator compares against a provided CComBSTR argument:
bool operator<(const CComBSTR& bstrSrc) const {
    return VarBstrCmp(m_str, bstrSrc.m_str, LOCALE_USER_DEFAULT, 0) == VARCMP_LT;
}
In the second overloaded version of the operator<() method, the operator compares against a provided LPCSTR argument. An LPCSTR isn't the same character type as the internal BSTR string, which contains wide characters. Therefore, the method constructs a temporary CComBSTR and delegates the work to operator<(const CComBSTR& bstrSrc), just shown.
bool operator<(LPCSTR pszSrc) const {
    CComBSTR bstr2(pszSrc);
    return operator<(bstr2);
}
The third overload for the operator<() method accepts an LPCOLESTR and operates very much like the previous overload:
bool operator<(LPCOLESTR pszSrc) const {
    CComBSTR bstr2(pszSrc);
    return operator<(bstr2);
}
The fourth overload for the operator<() method accepts an LPOLESTR; the implementation does a quick cast and calls the LPCOLESTR version to do the work:
bool operator<(LPOLESTR pszSrc) const {
    return operator<((LPCOLESTR)pszSrc);
}
CComBSTR Persistence Support
The last two methods of the CComBSTR class read and write a BSTR string to and from a stream. The WriteToStream method writes a ULONG count containing the number of bytes in the BSTR, including the terminating NUL character, to the stream, and then writes the BSTR characters immediately after the count:
HRESULT WriteToStream(IStream* pStream) {
    ATLASSERT(pStream != NULL);
    if (pStream == NULL)
        return E_INVALIDARG;
    ULONG cb;
    ULONG cbStrLen = ULONG(m_str ? SysStringByteLen(m_str)+sizeof(OLECHAR) : 0);
    HRESULT hr = pStream->Write((void*) &cbStrLen, sizeof(cbStrLen), &cb);
    if (FAILED(hr))
        return hr;
    return cbStrLen ? pStream->Write((void*) m_str, cbStrLen, &cb) : S_OK;
}
The ReadFromStream method reads a ULONG count of bytes from the specified stream, allocates a BSTR of the correct size, and then reads the characters directly into the BSTR string. The CComBSTR object must be empty when you call ReadFromStream; otherwise, you will receive an assertion from a debug build or will leak memory in a release build.
HRESULT ReadFromStream(IStream* pStream) {
    ATLASSERT(pStream != NULL);
    ATLASSERT(!*this); // should be empty
    ULONG cbStrLen = 0;
    HRESULT hr = pStream->Read((void*) &cbStrLen, sizeof(cbStrLen), NULL);
    if ((hr == S_OK) && (cbStrLen != 0)) {
        //subtract size for terminating NULL which we wrote out
        //since SysAllocStringByteLen overallocates for the NULL
        m_str = SysAllocStringByteLen(NULL, cbStrLen-sizeof(OLECHAR));
        if (!*this)
            hr = E_OUTOFMEMORY;
        else
            hr = pStream->Read((void*) m_str, cbStrLen, NULL);
        ...
    }
    if (hr == S_FALSE)
        hr = E_FAIL;
    return hr;
}
Minor Rant on BSTRs, Embedded NUL Characters in Strings, and Life in General
The compiler considers the types BSTR and OLECHAR* to be synonymous. In fact, the BSTR symbol is simply a typedef for OLECHAR*. For example, from wtypes.h:
typedef /* [wire_marshal] */ OLECHAR __RPC_FAR *BSTR;
This is more than somewhat brain damaged. An arbitrary BSTR is not an OLECHAR*, and an arbitrary OLECHAR* is not a BSTR. One is often misled in this regard because frequently a BSTR works just fine as an OLECHAR*.
STDMETHODIMP SomeClass::put_Name (LPCOLESTR pName) ;
BSTR bstrInput = ...
pObj->put_Name (bstrInput) ; // This works just fine... usually
SysFreeString (bstrInput) ;
In the previous example, because the bstrInput argument is defined to be a BSTR, it can contain embedded NUL characters within the string. The put_Name method, which expects an LPCOLESTR (a NUL-character-terminated string), will probably save only the characters preceding the first embedded NUL character. In other words, it will cut the string short. You also cannot use a BSTR where an [out] OLECHAR* parameter is required. For example:
STDMETHODIMP SomeClass::get_Name(OLECHAR** ppName) {
BSTR bstrOutput =... // Produce BSTR string to return
*ppName = bstrOutput ; // This compiles just fine
return S_OK ; // but leaks memory as caller
// doesn't release BSTR
}
Conversely, you cannot use an OLECHAR* where a BSTR is required. When it does happen to work, it's a latent bug. For example, the following code is incorrect:
STDMETHODIMP SomeClass::put_Name (BSTR bstrName) ;
// Wrong! Wrong! Wrong!
pObj->put_Name (OLESTR("This is not a BSTR!")) ;
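If the put_Name method calls SysStringLen to obtain the length of the BSTR, it will try to get the length from the integer preceding the string, but there is no such integer. Things get worse when the put_Name method is remoted (that is, lives out-of-process), because the marshaling code also relies on the length prefix to decide how many characters to transmit.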
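Because the compiler cannot tell the difference between a BSTR and an OLECHAR*, it's quite easy to inadvertently call a CComBSTR method that doesn't work correctly with a BSTR that contains embedded NUL characters.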
To construct a CComBSTR , you must specify the length of the string:
BSTR bstrInput =
SysAllocStringLen (
OLESTR ("This is part one
Assigning a BSTR that contains embedded NUL characters to a CComBSTR object never works. For example:
// BSTR bstrInput contains
// "This is part one
The simplest way to assign such a BSTR is to call the Empty and AppendBSTR methods:
str10.Empty(); // Ensure object is initially empty
str10.AppendBSTR (bstrInput); // This works!
In practice, although a BSTR can potentially contain embedded NUL characters, most of the time it doesn't. Of course, this means that, most of the time, you don't see the latent bugs caused by incorrect BSTR use.