Working with Xerces Strings

Problem

You want to be able to handle the wide-character strings used by the Xerces library safely and easily. In particular, you want to be able to store strings returned by Xerces functions as well as to convert between Xerces strings and C++ standard library strings.

Solution

You can store wide-character strings returned by Xerces library functions using the template std::basic_string specialized for the Xerces wide-character type XMLCh:

typedef std::basic_string<XMLCh> XercesString;

To translate between Xerces strings and narrow-character strings, use the overloaded static method transcode( ) from the class xercesc::XMLString, defined in the header xercesc/util/XMLString.hpp. Example 14-4 defines two overloaded utility functions, toNative and fromNative, that use transcode( ) to translate from Xerces strings to narrow-character strings and vice versa. Each function has two variants, one that takes a C-style string and one that takes a C++ standard library string. These utility functions are all you'll need to convert between Xerces strings and narrow-character strings; once you define them, you'll never need to call transcode( ) directly.

Example 14-4. The header xerces_strings.hpp, for converting between Xerces strings and narrow-character strings

#ifndef XERCES_STRINGS_HPP_INCLUDED
#define XERCES_STRINGS_HPP_INCLUDED

#include <string>
#include <boost/scoped_array.hpp>
#include <xercesc/util/XMLString.hpp>

typedef std::basic_string<XMLCh> XercesString;

// Converts from a narrow-character string to a wide-character string.
inline XercesString fromNative(const char* str)
{
 boost::scoped_array<XMLCh> ptr(xercesc::XMLString::transcode(str));
 return XercesString(ptr.get( ));
}

// Converts from a narrow-character string to a wide-character string.
inline XercesString fromNative(const std::string& str)
{
 return fromNative(str.c_str( ));
}

// Converts from a wide-character string to a narrow-character string.
inline std::string toNative(const XMLCh* str)
{
 boost::scoped_array<char> ptr(xercesc::XMLString::transcode(str));
 return std::string(ptr.get( ));
}

// Converts from a wide-character string to a narrow-character string.
inline std::string toNative(const XercesString& str)
{
 return toNative(str.c_str( ));
}

#endif // #ifndef XERCES_STRINGS_HPP_INCLUDED

To convert between Xerces strings and std::wstrings, simply use the std::basic_string constructor taking a pair of iterators. For example, you can define the following two functions:

// Converts from a XercesString to a std::wstring
std::wstring xercesToWstring(const XercesString& str)
{
 return std::wstring(str.begin( ), str.end( ));
}

// Converts from a std::wstring to a XercesString
XercesString wstringToXerces(const std::wstring& str)
{
 return XercesString(str.begin( ), str.end( ));
}

These functions rely on the fact that wchar_t and XMLCh are integral types, each of which can be implicitly converted to the other; they should work regardless of the size of wchar_t, as long as no values outside the range of XMLCh are used. You can define similar functions taking C-style strings as arguments, again using the iterator-pair constructor: a pointer to the first character and a pointer one past the last character serve as the iterators. (The std::basic_string constructor that takes a character array and a length works only when the source and destination character types are the same.)
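The C-style variants might look something like the following sketch. The overloads taking pointers are hypothetical additions, not part of Example 14-4, and the typedef of XMLCh as char16_t (which requires C++11) is only a stand-in so that the code compiles without Xerces; in real code XMLCh comes from the Xerces headers. Because the source and destination character types differ, the sketch passes a pair of pointers to the iterator-pair constructor, which converts one character at a time.

```cpp
#include <cassert>
#include <string>

// Stand-in for the Xerces character type (an assumption for this sketch);
// in real code, include the Xerces headers instead of defining XMLCh.
typedef char16_t XMLCh;
typedef std::basic_string<XMLCh> XercesString;

// Converts from a null-terminated Xerces string to a std::wstring.
std::wstring xercesToWstring(const XMLCh* str)
{
    const XMLCh* end = str + std::char_traits<XMLCh>::length(str);
    return std::wstring(str, end);
}

// Converts from a null-terminated wide string to a XercesString.
XercesString wstringToXerces(const wchar_t* str)
{
    const wchar_t* end = str + std::char_traits<wchar_t>::length(str);
    return XercesString(str, end);
}

// Usage: a round trip through both functions preserves the text.
void cStyleExample()
{
    const XercesString x = wstringToXerces(L"abc");
    assert(x.size() == 3);
    assert(xercesToWstring(x.c_str()) == L"abc");
}
```

As with the functions above, each value is copied unchanged, so this is appropriate only when every character fits in an XMLCh.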

Discussion

Xerces uses null-terminated sequences of characters of type XMLCh to represent Unicode strings. XMLCh is a typedef for an implementation-defined integral type having a size of at least 16 bits, wide enough to represent almost all known characters in any language using a single character. Xerces uses the UTF-16 character encoding, which means that some Unicode characters (those with code points above U+FFFF) must be represented by a sequence of two XMLChs, known as a surrogate pair; in practice, however, you can usually think of an XMLCh as directly representing a Unicode code point, i.e., the numerical value of a Unicode character.
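To make the UTF-16 point concrete, here is a minimal sketch of how a code point maps to one or two 16-bit units. The function appendCodePoint is a hypothetical helper, not part of Xerces, and XMLCh is again typedef'd to char16_t (C++11) purely as a stand-in for the real Xerces definition.

```cpp
#include <cassert>
#include <string>

// Stand-in for the Xerces character type (an assumption for this sketch).
typedef char16_t XMLCh;
typedef std::basic_string<XMLCh> XercesString;

// Appends the UTF-16 encoding of a Unicode code point to out. The code
// point is assumed to be valid: at most 0x10FFFF and not itself in the
// surrogate range 0xD800-0xDFFF.
void appendCodePoint(XercesString& out, unsigned long cp)
{
    if (cp < 0x10000UL) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<XMLCh>(cp));
    } else {
        // Needs a surrogate pair: a high unit followed by a low unit.
        cp -= 0x10000UL;
        out.push_back(static_cast<XMLCh>(0xD800UL + (cp >> 10)));
        out.push_back(static_cast<XMLCh>(0xDC00UL + (cp & 0x3FFUL)));
    }
}

// Usage: U+00E9 takes one unit; U+1D11E takes two.
void surrogateExample()
{
    XercesString s;
    appendCodePoint(s, 0x00E9UL);
    assert(s.size() == 1);
    appendCodePoint(s, 0x1D11EUL);
    assert(s.size() == 3);
    assert(s[1] == 0xD834 && s[2] == 0xDD1E);
}
```

A consequence is that the length of a Xerces string counts 16-bit units, which can exceed the number of characters when surrogate pairs are present.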

At one time, XMLCh was defined as a typedef for wchar_t, which meant you could easily store a copy of a Xerces string as a std::wstring. Currently, however, Xerces defines XMLCh as a typedef for unsigned short on all platforms. Among other things, this means that on some platforms XMLCh and wchar_t don't even have the same width. Since Xerces may change the definition of XMLCh in the future, you can't count on XMLCh to be identical to any particular type. So if you want to store a copy of a Xerces string, you should use a std::basic_string<XMLCh>.

When using Xerces you will frequently need to convert between narrow-character strings and Xerces strings; Xerces provides the overloaded function transcode( ) for this purpose. transcode( ) can convert a Unicode string to a narrow-character string in the "native" character encoding, or a narrow-character string in the "native" encoding to a Unicode string. What constitutes the native encoding is not precisely defined, however, so if you are programming in an environment where there are several commonly used character encodings, you may need to take matters into your own hands and perform your own conversion, either by using a std::codecvt facet or by using Xerces's pluggable transcoding services, described in the Xerces documentation. In many cases, however, transcode( ) is all you need.
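If you go the std::codecvt route, the conversion might be sketched as follows. The function fromLocale is a hypothetical name; the sketch assumes that the wide characters produced by the locale's facet are Unicode code points that fit in an XMLCh, which is not guaranteed on every platform, and it again uses char16_t (C++11) as a stand-in for the real XMLCh.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the Xerces character type (an assumption for this sketch).
typedef char16_t XMLCh;
typedef std::basic_string<XMLCh> XercesString;

// Converts a narrow-character string, encoded as the given locale expects,
// to a XercesString by way of the locale's codecvt facet.
XercesString fromLocale(const std::string& str,
                        const std::locale& loc = std::locale())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(loc);

    std::mbstate_t state = std::mbstate_t();
    // A conversion never yields more wide characters than input bytes;
    // the extra element keeps the buffer nonempty for empty input.
    std::vector<wchar_t> buf(str.size() + 1);
    const char* fromNext = 0;
    wchar_t* toNext = 0;

    const Facet::result res =
        facet.in(state, str.data(), str.data() + str.size(), fromNext,
                 &buf[0], &buf[0] + buf.size(), toNext);
    if (res != Facet::ok)
        throw std::runtime_error("narrow-to-wide conversion failed");

    // Copy each wide character into an XMLCh (assumed to fit).
    return XercesString(&buf[0], toNext);
}

// Usage: plain ASCII converts unchanged in the classic "C" locale.
void codecvtExample()
{
    const XercesString x = fromLocale("abc", std::locale::classic());
    assert(x.size() == 3);
    assert(x[0] == 'a' && x[1] == 'b' && x[2] == 'c');
}
```

The benefit over transcode( ) is that the caller chooses the locale, and therefore the encoding, explicitly.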

The null-terminated string returned by transcode( ) is dynamically allocated using the array form of operator new; it's up to you to delete it using delete []. This presents a slight memory-management problem, since typically you will want to make a copy of the string or write it to a stream before you delete it, and these operations can throw exceptions. I've addressed this problem in Example 14-4 by using the template boost::scoped_array, which takes ownership of a dynamically allocated array and deletes it automatically when it goes out of scope, even if an exception is thrown. For example, look at the implementation of fromNative:

inline XercesString fromNative(const char* str)
{
 boost::scoped_array<XMLCh> ptr(xercesc::XMLString::transcode(str));
 return XercesString(ptr.get( ));
}

Here, ptr takes ownership of the null-terminated string returned by transcode( ) and frees it even if the XercesString constructor throws a std::bad_alloc exception.
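The same ownership pattern can be observed without Xerces or Boost in the following sketch. It swaps in std::unique_ptr<char[]> (C++11) for boost::scoped_array and uses a hypothetical function makeCopy as a stand-in for transcode( ); the counting deleter exists only so the example can show that the array is freed on both the normal and the exceptional path.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

int g_arraysFreed = 0;  // counts deletions, for illustration only

// Deletes an array and records that it did so.
struct CountingArrayDeleter {
    void operator()(char* p) const { delete [] p; ++g_arraysFreed; }
};

// Hypothetical stand-in for transcode(): returns a copy of str allocated
// with the array form of operator new.
char* makeCopy(const char* str)
{
    char* p = new char[std::strlen(str) + 1];
    std::strcpy(p, str);
    return p;
}

// Takes ownership of the array, then either throws or returns a copy.
std::string copyThenMaybeThrow(const char* str, bool fail)
{
    std::unique_ptr<char[], CountingArrayDeleter> ptr(makeCopy(str));
    if (fail)
        throw std::runtime_error("simulated failure");
    return std::string(ptr.get());
}

// Usage: the array is freed whether or not an exception is thrown.
void ownershipExample()
{
    g_arraysFreed = 0;
    assert(copyThenMaybeThrow("abc", false) == "abc");
    assert(g_arraysFreed == 1);
    try {
        copyThenMaybeThrow("abc", true);
    } catch (const std::runtime_error&) {
    }
    assert(g_arraysFreed == 2);
}
```

Xerces also provides XMLString::release( ) for freeing strings returned by transcode( ); check the documentation for your version to see which form of deallocation it expects.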

C++ Cookbook