Problem
You want to be able to handle the wide-character strings used by the Xerces library safely and easily. In particular, you want to be able to store strings returned by Xerces functions as well as to convert between Xerces strings and C++ standard library strings.
Solution
You can store wide-character strings returned by Xerces library functions using the template std::basic_string specialized for the Xerces wide-character type XMLCh:
typedef std::basic_string<XMLCh> XercesString;
To translate between Xerces strings and narrow-character strings, use the overloaded static method transcode( ) from the class xercesc::XMLString, defined in the header xercesc/util/XMLString.hpp. Example 14-4 defines two overloaded utility functions, toNative and fromNative, that use transcode( ) to translate from narrow-character strings to Xerces strings and vice versa. Each function has two variants, one that takes a C-style string and one that takes a C++ standard library string. These utility functions are all you'll need to convert between Xerces strings and narrow-character strings; once you define them, you'll never need to call transcode( ) directly.
Example 14-4. The header xerces_strings.hpp, for converting between Xerces strings and narrow-character strings
#ifndef XERCES_STRINGS_HPP_INCLUDED
#define XERCES_STRINGS_HPP_INCLUDED

#include <string>
#include <boost/scoped_array.hpp>
#include <xercesc/util/XMLString.hpp>

typedef std::basic_string<XMLCh> XercesString;

// Converts from a narrow-character string to a wide-character string.
inline XercesString fromNative(const char* str)
{
    boost::scoped_array<XMLCh> ptr(xercesc::XMLString::transcode(str));
    return XercesString(ptr.get( ));
}

// Converts from a narrow-character string to a wide-character string.
inline XercesString fromNative(const std::string& str)
{
    return fromNative(str.c_str( ));
}

// Converts from a wide-character string to a narrow-character string.
inline std::string toNative(const XMLCh* str)
{
    boost::scoped_array<char> ptr(xercesc::XMLString::transcode(str));
    return std::string(ptr.get( ));
}

// Converts from a wide-character string to a narrow-character string.
inline std::string toNative(const XercesString& str)
{
    return toNative(str.c_str( ));
}

#endif // #ifndef XERCES_STRINGS_HPP_INCLUDED
To convert between Xerces strings and std::wstrings, simply use the std::basic_string constructor taking a pair of iterators. For example, you can define the following two functions:
// Converts from a Xerces String to a std::wstring
std::wstring xercesToWstring(const XercesString& str)
{
    return std::wstring(str.begin( ), str.end( ));
}

// Converts from a std::wstring to a XercesString
XercesString wstringToXerces(const std::wstring& str)
{
    return XercesString(str.begin( ), str.end( ));
}
These functions rely on the fact that wchar_t and XMLCh are integral types, each of which can be implicitly converted to the other; they should work regardless of the size of wchar_t, as long as no values outside the range of XMLCh are used. You can define similar functions taking C-style strings as arguments by passing a pair of pointers, one to the first character and one just past the last, to the same iterator-range constructor.
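The C-style variants can be sketched as follows. This is a minimal illustration rather than part of Example 14-4. To keep it self-contained, it assumes char16_t as a stand-in for XMLCh (with Xerces you would include its headers and use XMLCh itself), and the helper xmlLength is a hypothetical name introduced here. Because the source and destination character types differ, the sketch hands a pair of pointers to the iterator-range constructor, which converts one character at a time.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Assumption: char16_t stands in for Xerces's XMLCh so that this
// sketch compiles without the Xerces headers.
typedef char16_t XMLCh;
typedef std::basic_string<XMLCh> XercesString;

// Hypothetical helper: counts the characters before the terminating null.
inline std::size_t xmlLength(const XMLCh* str)
{
    std::size_t n = 0;
    while (str[n] != 0)
        ++n;
    return n;
}

// Converts from a null-terminated Xerces string to a std::wstring.
inline std::wstring xercesToWstring(const XMLCh* str)
{
    return std::wstring(str, str + xmlLength(str));
}

// Converts from a null-terminated wide string to a XercesString.
inline XercesString wstringToXerces(const wchar_t* str)
{
    return XercesString(str, str + std::wcslen(str));
}
```

As with the std::basic_string versions, the result is only meaningful when every character value fits in the destination type.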
Discussion
Xerces uses null-terminated sequences of characters of type XMLCh to represent Unicode strings. XMLCh is a typedef for an implementation-defined integral type having a size of at least 16 bits, wide enough to represent almost all known characters in any language using a single character. Xerces uses the UTF-16 character encoding, which means that theoretically some Unicode characters must be represented by a sequence of more than one XMLCh; in practice, however, you can think of an XMLCh as directly representing a Unicode code point, i.e., the numerical value of a Unicode character.
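To make the UTF-16 caveat concrete, the following sketch shows how a code point above U+FFFF occupies two 16-bit units (a surrogate pair), while other code points occupy one. It is an illustration only: char16_t is assumed as a stand-in for XMLCh, appendCodePoint is a hypothetical helper that is not part of Xerces, and it does not check that its argument is a legal code point.

```cpp
#include <string>

// Assumption: char16_t stands in for Xerces's XMLCh.
typedef char16_t XMLCh;
typedef std::basic_string<XMLCh> XercesString;

// Appends the UTF-16 encoding of the code point cp to out.
// Code points above U+FFFF are split into a high surrogate
// (0xD800 plus the top 10 bits) and a low surrogate
// (0xDC00 plus the bottom 10 bits) of cp - 0x10000.
inline void appendCodePoint(XercesString& out, unsigned long cp)
{
    if (cp < 0x10000UL) {
        out.push_back(static_cast<XMLCh>(cp));
    } else {
        cp -= 0x10000UL;
        out.push_back(static_cast<XMLCh>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<XMLCh>(0xDC00 + (cp & 0x3FF)));
    }
}
```

For instance, U+00E9 is stored as the single unit 0x00E9, whereas U+1D11E is stored as the two units 0xD834 and 0xDD1E, so the length of a Xerces string is not always its number of characters.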
At one time, XMLCh was defined as a typedef for wchar_t, which meant you could easily store a copy of a Xerces string as a std::wstring. Currently, however, Xerces defines XMLCh as a typedef for unsigned short on all platforms. Among other things, this means that on some platforms XMLCh and wchar_t don't even have the same width. Since Xerces may change the definition of XMLCh in the future, you can't count on XMLCh to be identical to any particular type. So if you want to store a copy of a Xerces string, you should use a std::basic_string<XMLCh>.
When using Xerces you will frequently need to convert between narrow-character strings and Xerces strings; Xerces provides the overloaded function transcode( ) for this purpose. transcode( ) can convert a Unicode string to a narrow-character string in the "native" character encoding, or a narrow-character string in the "native" encoding to a Unicode string. What constitutes the native encoding is not precisely defined, however, so if you are programming in an environment where there are several commonly used character encodings, you may need to take matters into your own hands and perform your own conversion, either by using a std::codecvt facet or by using Xerces's pluggable transcoding services, described in the Xerces documentation. In many cases, however, transcode( ) is all you need.
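As a rough idea of what the std::codecvt route might look like, the sketch below widens a narrow string using the std::codecvt<wchar_t, char, std::mbstate_t> facet of a given locale. The function name widenWithCodecvt is hypothetical, the error handling is deliberately minimal, and the encoding actually applied depends entirely on the locale you pass in, so treat it as a starting point rather than a finished converter.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a narrow string to a std::wstring using
// the codecvt facet of loc (by default, the current global locale).
inline std::wstring widenWithCodecvt(const std::string& in,
                                     const std::locale& loc = std::locale( ))
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;

    if (in.empty( ))
        return std::wstring( );

    const Facet& facet = std::use_facet<Facet>(loc);

    // Each wide character consumes at least one byte of input,
    // so in.size( ) output slots are always enough.
    std::vector<wchar_t> out(in.size( ));
    std::mbstate_t state = std::mbstate_t( );
    const char* fromNext = 0;
    wchar_t* toNext = 0;

    std::codecvt_base::result r =
        facet.in(state,
                 in.data( ), in.data( ) + in.size( ), fromNext,
                 &out[0], &out[0] + out.size( ), toNext);

    if (r == std::codecvt_base::noconv)
        return std::wstring(in.begin( ), in.end( ));   // no conversion needed
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("narrow-to-wide conversion failed");

    return std::wstring(&out[0], toNext);
}
```

The resulting std::wstring could then be passed to a function such as wstringToXerces from the Solution to obtain a XercesString.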
The null-terminated string returned by transcode( ) is dynamically allocated using the array form of operator new; it's up to you to delete it using delete []. This presents a slight memory-management problem, since typically you will want to make a copy of the string or write it to a stream before you delete it, and these operations can throw exceptions. I've addressed this problem in Example 14-4 by using the template boost::scoped_array, which takes ownership of a dynamically allocated array and deletes it automatically when it goes out of scope, even if an exception is thrown. For example, look at the implementation of fromNative:
inline XercesString fromNative(const char* str)
{
    boost::scoped_array<XMLCh> ptr(xercesc::XMLString::transcode(str));
    return XercesString(ptr.get( ));
}
Here, ptr takes ownership of the null-terminated string returned by transcode( ) and frees it even if the XercesString constructor throws a std::bad_alloc exception.
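The ownership pattern can be tried out without Xerces or Boost. In the sketch below, fakeTranscode is a hypothetical stand-in for transcode( ) that simply returns a new[]-allocated copy of its argument, std::unique_ptr (C++11) plays the role that boost::scoped_array plays in Example 14-4, and the counting deleter exists only to make the cleanup observable. None of these names come from Xerces.

```cpp
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// Counts how many arrays have been released; for illustration only.
static int g_freed = 0;

// Deleter that frees the array with delete [] and records that it did so.
struct CountingArrayDeleter {
    void operator()(char* p) const
    {
        delete [] p;
        ++g_freed;
    }
};

// Hypothetical stand-in for transcode( ): returns a copy of s allocated
// with the array form of operator new. The caller owns the result.
inline char* fakeTranscode(const char* s)
{
    char* out = new char[std::strlen(s) + 1];
    std::strcpy(out, s);
    return out;
}

// Copies the transcoded string, optionally simulating a failure after
// the array has been allocated but before the function returns.
inline std::string copyOrThrow(const char* s, bool fail)
{
    std::unique_ptr<char[], CountingArrayDeleter> ptr(fakeTranscode(s));
    if (fail)
        throw std::runtime_error("simulated failure");
    return std::string(ptr.get( ));
}
```

Whether copyOrThrow returns normally or exits through the exception, the array is released exactly once when ptr goes out of scope, which is the guarantee fromNative and toNative rely on.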