Unicode Strings

All character strings in the Windows 2000 operating system are stored internally as Unicode. The Unicode scheme uses 16 bits to represent each character and makes it easier to port applications and the OS itself to most languages of the world. Unicode is an industry standard (incidentally, the character coding standard for Java). More information can be found at the Web site http://www.Unicode.org. Unless otherwise noted, any character strings a driver sends to or receives from Windows 2000 will be Unicode. Note, however, that data transfer between a user's buffer and the device is not necessarily Unicode. Data transfers are considered to be binary and transparent to the I/O subsystem of Windows 2000.

Unicode String Data Types

Wide-character support is part of the C language standard. To use "wide" characters in a program, do the following:

Table 5.2. The UNICODE_STRING Data Structure

UNICODE_STRING, *PUNICODE_STRING
Field                   Contents
--------------------    ---------------------------------------------------------------
USHORT Length           Current string length in bytes
USHORT MaximumLength    Maximum string length in bytes
PWSTR Buffer            Pointer to driver-allocated buffer holding the real string data

  • Prefix Unicode string constants with the letter L. For example, L"some text" generates Unicode text, while "some text" produces 8-bit ANSI.

  • Use the wchar_t data type for Unicode characters. The DDK header files provide a typedef, WCHAR, for the standard wchar_t, and PWSTR, for wchar_t*.

  • Use the constant UNICODE_NULL to terminate a Unicode string. A UNICODE_NULL is defined to be 16 bits of zero.
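The three points above can be illustrated with a short user-mode sketch. It uses only standard C++. The constant kUnicodeNull is a local stand-in for the DDK's UNICODE_NULL, and the helper WideByteCount is a hypothetical name introduced here. Because wchar_t is 16 bits with Microsoft compilers but may be wider elsewhere, the sketch relies on sizeof(wchar_t) rather than a hard-coded 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Local stand-in for the DDK's UNICODE_NULL: a wide character with value zero.
const wchar_t kUnicodeNull = L'\0';

// The L prefix makes this a wide (wchar_t) string constant...
const wchar_t kWideText[] = L"some text";
// ...while the unprefixed form is an ordinary 8-bit char string.
const char kNarrowText[] = "some text";

// Hypothetical helper: bytes occupied by a wide string, terminator excluded.
std::size_t WideByteCount(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text) * sizeof(wchar_t);
}
```

Both constants hold nine characters plus a terminator, but the wide array occupies sizeof(wchar_t) times as many bytes as the narrow one.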

Windows 2000 system routines work with a Unicode structure, UNICODE_STRING, described in Table 5.2. The purpose of this structure is to make it easier to pass around Unicode strings and to help manage them. Although the standard C library provides Unicode string functions to perform common operations (e.g., wcscpy is equivalent to strcpy for wchar_t* data), this environment is not available to kernel-mode driver code.

Incidentally, the DDK also defines an ANSI_STRING structure. It is identical to the UNICODE_STRING structure except that its Buffer field is of type char* rather than PWSTR. Several Rtl conversion routines require the ANSI_STRING data type.
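As a rough illustration of the two counted-string layouts, the sketch below declares user-mode stand-ins for them. A real driver gets these types from the DDK headers; the typedefs here (including the use of char16_t for a 16-bit WCHAR) are assumptions made so that the code compiles outside the kernel. InitCounted is a hypothetical helper modeled on the documented behavior of RtlInitUnicodeString: it points the structure at existing storage and copies nothing.

```cpp
#include <cassert>
#include <cstddef>

// User-mode stand-ins for DDK types (assumptions, not the real headers).
typedef unsigned short USHORT;
typedef char16_t WCHAR;            // 16-bit character, as on Windows

struct UNICODE_STRING {
    USHORT Length;                 // current length in bytes, terminator excluded
    USHORT MaximumLength;          // capacity of Buffer in bytes
    WCHAR* Buffer;                 // the string data itself
};

struct ANSI_STRING {               // same layout, but an 8-bit buffer
    USHORT Length;
    USHORT MaximumLength;
    char*  Buffer;
};

// Hypothetical helper: describe an existing NULL-terminated wide string.
void InitCounted(UNICODE_STRING* dest, WCHAR* source) {
    assert(dest != nullptr);
    std::size_t chars = 0;
    if (source != nullptr) {
        while (source[chars] != 0) ++chars;
    }
    assert(chars * sizeof(WCHAR) < 0xFFFF);   // must fit in a USHORT
    dest->Length = static_cast<USHORT>(chars * sizeof(WCHAR));
    dest->MaximumLength =
        static_cast<USHORT>(source ? dest->Length + sizeof(WCHAR) : 0);
    dest->Buffer = source;
}
```

Note that both length fields count bytes, so a three-character string yields a Length of 6 and a MaximumLength of 8.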

Working with Unicode

The kernel provides a number of functions for working with ANSI and Unicode strings. These functions replace (albeit clumsily) the standard C library routines that work with Unicode. Table 5.3 presents several of these functions. The Windows 2000 DDK provides the authoritative list and usage of the functions and should be reviewed. Some of these functions have restrictions on the IRQLs from which they can be called, so care must be taken when using them. To be safe, it is best to restrict the use of all Rtl Unicode functions to PASSIVE_LEVEL IRQL.

Working with Unicode can be frustrating, primarily because the length in bytes of a Unicode string is twice its length in characters. C programmers are ingrained with the belief that one character equals one byte, but with Unicode the rule is changed. When working with Unicode, consider the following:

Table 5.3. Common Unicode Manipulation Functions

Unicode String Manipulation Functions
Function                          Description
------------------------------    -------------------------------------------------------------------
RtlInitUnicodeString              Initializes a UNICODE_STRING from a NULL-terminated Unicode string
RtlAnsiStringToUnicodeSize        Calculates number of bytes required to hold a converted ANSI string
RtlAnsiStringToUnicodeString      Converts an ANSI string to Unicode
RtlIntegerToUnicodeString         Converts an integer to Unicode text
RtlAppendUnicodeStringToString    Concatenates two Unicode strings
RtlCopyUnicodeString              Copies a source string to a destination
RtlUpcaseUnicodeString            Converts Unicode string to uppercase
RtlCompareUnicodeString           Compares two Unicode strings
RtlEqualUnicodeString             Tests equality of two Unicode strings

  • Remember that the number of characters in a Unicode string is not the same as the number of bytes. Be very careful about any arithmetic that calculates the length of a Unicode string.

  • Don't assume anything about the collating sequence of the characters or the relationship of uppercase and lowercase characters.

  • Don't assume that a table with 256 entries is large enough to hold the entire character set.
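The length arithmetic behind these cautions can be sketched in a few lines. The code below is illustrative only: WCHAR is a local stand-in for the DDK's 16-bit type, and the helper names (CharsFromBytes, BytesForChars, ElementCount) are hypothetical. The constants follow from the sizes of the fields involved: a 16-bit character has 65,536 possible values, and a USHORT byte count caps a counted string at 32,767 characters.

```cpp
#include <cassert>
#include <cstddef>

typedef char16_t WCHAR;   // stand-in for the DDK's 16-bit WCHAR

// Characters in a counted string, given its length in bytes.
std::size_t CharsFromBytes(std::size_t lengthInBytes) {
    assert(lengthInBytes % sizeof(WCHAR) == 0);
    return lengthInBytes / sizeof(WCHAR);
}

// Bytes to allocate for charCount characters plus a terminating null.
std::size_t BytesForChars(std::size_t charCount) {
    return (charCount + 1) * sizeof(WCHAR);
}

// Element count of an array; sizeof alone would give bytes, not characters.
template <typename T, std::size_t N>
constexpr std::size_t ElementCount(const T (&)[N]) {
    return N;
}

// A lookup table indexed by character needs this many entries, not 256.
const std::size_t kWcharValues = std::size_t(1) << (8 * sizeof(WCHAR));

// Longest string a USHORT byte count can describe, in characters.
const std::size_t kMaxCountedChars = 0xFFFF / sizeof(WCHAR);
```

For a buffer declared as WCHAR name[32], sizeof(name) is 64 while ElementCount(name) is 32; passing the wrong one of the two to a copy routine is the classic Unicode length bug.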

For convenience, this book provides a C++ class wrapper, CUString, for use with the UNICODE_STRING structure. This CUString class encapsulates a UNICODE_STRING structure, providing many constructors and conversion operators that in turn rely on the Rtl Unicode functions of the kernel. A portion of this CUString class declaration is listed below.

    // Unicode.h
    //
    #pragma once

    class CUString {
    public:
         CUString() {Init(); }  // constructor relies on
                                // internal Init function
         CUString(const char* pAnsiString);
         CUString(PCWSTR pWideString);
         ~CUString();          // destructor gives back
                               // buffer allocation
         void Init(); // performs "real" initialization
         void Free (); // performs real destruct
         // copy constructor (required)
         CUString(const CUString& orig);
         // assignment operator overload (required)
         CUString operator=(const CUString& rop);
         // comparison operator overload
         BOOLEAN operator==(const CUString& rop) const;
         // concatenation operator
         CUString operator+(const CUString& rop) const;
         // cast operator into wchar_t*
         operator PWSTR() const;
         // cast operator into ULONG
         operator ULONG() const;
         // converter: ULONG->CUString
         CUString(ULONG value);
         // buffer access operator
         WCHAR& operator[](int idx);
         USHORT Length() {return uStr.Length/2;}

    protected:
         UNICODE_STRING uStr; // W2K kernel structure for
                              // Unicode string
         enum ALLOC_TYPE {Empty, FromCode, FromPaged};
         ALLOC_TYPE  aType; // where buffer is allocated
    };

The disk included with this book supplies two files, Unicode.h and Unicode.cpp, that hold the declaration and implementation of the CUString class. The methods of this class assume that callers are at PASSIVE_LEVEL IRQL. Routines that allocate memory do so from the paged pool. The class is intended for convenience, not for heavy-duty string manipulation. Consider modifying the class implementation to allocate memory from a lookaside list if a driver requires more intense string manipulation.

Of course, at this point it might seem premature to be introducing portions of driver code. After all, building and loading a driver is not discussed until the next chapter. However, the code included for this chapter supplies a trivial, mock environment to test code such as the CUString class from the Win32 environment. Rtl stub functions rely on either Win32 or C runtime library functions to perform a reasonably faithful emulation of kernel runtime support. A simple Win32 console program is included to demonstrate the use of the CUString class. A portion of the test program is shown below.

    #include "DDKTestEnv.h"
    #include "Unicode.h"
    #include "stdio.h"

    int main(int argc, char* argv[]) {
         CUString strEmpty;
         CUString strOne("One");
         CUString strTwo(L"Two");

         CUString str2468("2468");
         ULONG ul2468 = str2468;

         CUString strxFF01("xFF01");
         ULONG ulxFF01 = strxFF01;
         CUString str2244(2244);
         CUString strOnePlusTwo = strOne + strTwo;

         wprintf(L"strOnePlusTwo: %s\n",
                   (PWSTR) strOnePlusTwo);
         printf("Conversion of str2468 into ULONG = %d\n",
                   ul2468);
         printf("Conversion of strxFF01 into ULONG = %x\n",
                   ulxFF01);
         wprintf(L"Conversion of 2244 into CUString = %s\n",
                   (PWSTR) str2244);
         wprintf(L"On the fly conversion of 3366 into "
                   "CUString = %s\n",
                   (PWSTR)(CUString)3366);
         printf("Test of buffer access operator []:\n");
         for (int i=0; i<strOnePlusTwo.Length(); i++) {
              wprintf(L"%c ", strOnePlusTwo[i]);
              strOnePlusTwo[i] = L'A' + i;
         }

         wprintf(L"\nAfter replacing buffer, "
                   "strOnePlusTwo = %s\n",
                   (PWSTR)strOnePlusTwo);
         ...

Two files, DDKTestEnv.h and DDKTestEnv.cpp, supply the declaration and implementation of the emulation environment. This environment is intended to be simple and to provide a testbed for some driver logic before entering the real kernel-mode environment.
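To give a feel for what a user-mode stub of a kernel string routine might look like, here is a minimal sketch. It is not the code from DDKTestEnv.cpp; every name in it (MockStatus, MockAppendUnicodeStringToString, and the stand-in typedefs) is hypothetical. It imitates the documented contract of RtlAppendUnicodeStringToString, which fails with STATUS_BUFFER_TOO_SMALL when the destination buffer cannot hold both strings, using only the C runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// User-mode stand-ins for DDK types (assumptions, not the real headers).
typedef unsigned short USHORT;
typedef char16_t WCHAR;

struct UNICODE_STRING {
    USHORT Length;          // bytes in use
    USHORT MaximumLength;   // bytes available in Buffer
    WCHAR* Buffer;
};

// Hypothetical status codes standing in for NTSTATUS values.
enum MockStatus { kMockSuccess, kMockBufferTooSmall };

// Hypothetical stub: append src to dest if dest's buffer is large enough.
MockStatus MockAppendUnicodeStringToString(UNICODE_STRING* dest,
                                           const UNICODE_STRING* src) {
    assert(dest != nullptr && src != nullptr);
    const std::size_t needed = std::size_t(dest->Length) + src->Length;
    if (needed > dest->MaximumLength) {
        return kMockBufferTooSmall;      // dest is left unchanged
    }
    if (src->Length != 0) {
        // Both Length fields are byte counts, so copy and offset in bytes.
        std::memcpy(reinterpret_cast<char*>(dest->Buffer) + dest->Length,
                    src->Buffer, src->Length);
    }
    dest->Length = static_cast<USHORT>(needed);
    // Null-terminate only when a spare character slot remains.
    if (needed + sizeof(WCHAR) <= dest->MaximumLength) {
        dest->Buffer[needed / sizeof(WCHAR)] = 0;
    }
    return kMockSuccess;
}
```

Because the stub depends on nothing but the C runtime, the same calling code can be exercised from a console program before it is moved into a driver.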

The Windows 2000 Device Driver Book: A Guide for Programmers (2nd Edition)
ISBN: 0130204315
EAN: 2147483647
Year: 2000
Pages: 156