All character strings in the Windows 2000 operating system are stored internally as Unicode. The Unicode encoding used by Windows 2000 represents each character with 16 bits, which makes it easier to port applications and the OS itself to most languages of the world. Unicode is an industry standard (incidentally, it is also the character coding standard for Java). More information can be found at the Web site http://www.Unicode.org.

Unless otherwise noted, any character strings a driver sends to or receives from Windows 2000 will be Unicode. Note, however, that data transferred between a user's buffer and the device is not necessarily Unicode. Data transfers are considered to be binary and transparent to the I/O subsystem of Windows 2000.

Unicode String Data Types

A wide-character data type, wchar_t, is now part of the C-language specification. To use "wide" characters in a program, perform the following:
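As a rough illustration of wide-character usage in ordinary user-mode code, the sketch below declares helpers that take wchar_t strings, which callers would typically supply as L-prefixed literals. The helper names WideCharCount and WideByteCount are hypothetical and exist only for this example. Note that wchar_t is 16 bits wide with Windows compilers but may be wider elsewhere, so the sketch uses sizeof(wchar_t) rather than assuming 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: number of characters (not bytes) in a wide string,
// excluding the terminating null character.
std::size_t WideCharCount(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);   // wide-character counterpart of strlen
}

// Hypothetical helper: number of bytes those characters occupy,
// again excluding the terminator.
std::size_t WideByteCount(const wchar_t* text) {
    return WideCharCount(text) * sizeof(wchar_t);
}
```

A call such as WideCharCount(L"Device") passes a wide literal; the same text without the L prefix would be a char string and would not compile against these signatures.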
Windows 2000 system routines work with a Unicode structure, UNICODE_STRING, described in Table 5.2. The purpose of this structure is to make it easier to pass Unicode strings around and to help manage them. Although the standard C library provides wide-character string functions for common operations (e.g., wcscpy is the wchar_t* equivalent of strcpy), that library is not available to kernel-mode driver code.

Incidentally, the DDK also defines an ANSI_STRING structure. It is identical to the UNICODE_STRING structure except that its Buffer field is of type char*. Several Rtl conversion routines require arguments of type ANSI_STRING.

Working with Unicode

The kernel provides a number of functions for working with ANSI and Unicode strings. These functions replace (albeit clumsily) the standard C library routines that work with Unicode. Table 5.3 presents several of these functions. The Windows 2000 DDK provides the authoritative list of the functions and their usage, and should be reviewed.

Some of these functions have restrictions on the IRQL from which they can be called, so care must be taken when using them. To be safe, it is best to restrict the use of all Rtl Unicode functions to PASSIVE_LEVEL IRQL.

Working with Unicode can be frustrating, primarily because the length in bytes of a Unicode string is twice its length in characters. C programmers are ingrained with the belief that one character equals one byte, but with Unicode that rule no longer holds. When working with Unicode, consider the following:
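The difference between byte counts and character counts is easiest to see in the structure itself. The following user-mode sketch mirrors the three fields of UNICODE_STRING (Length, MaximumLength, Buffer) in a stand-in type. MockUnicodeString, MockInit, and CharCount are hypothetical names, and char16_t is an assumption used here to model the 16-bit WCHAR on any platform. MockInit only approximates what RtlInitUnicodeString does; it is not the kernel routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// User-mode stand-in for UNICODE_STRING (assumption: char16_t models WCHAR).
struct MockUnicodeString {
    std::uint16_t Length;         // bytes in use, excluding any terminator
    std::uint16_t MaximumLength;  // bytes available in Buffer
    char16_t*     Buffer;         // not guaranteed to be null-terminated
};

// Hypothetical helper modeled on RtlInitUnicodeString: point the structure
// at an existing null-terminated string without copying it.
MockUnicodeString MockInit(const char16_t* source) {
    assert(source != nullptr);
    std::size_t chars = 0;
    while (source[chars] != u'\0') ++chars;
    assert((chars + 1) * sizeof(char16_t) <= 0xFFFF);  // must fit in 16 bits

    MockUnicodeString s;
    s.Length        = static_cast<std::uint16_t>(chars * sizeof(char16_t));
    s.MaximumLength = static_cast<std::uint16_t>(s.Length + sizeof(char16_t));
    s.Buffer        = const_cast<char16_t*>(source);   // never written here
    return s;
}

// The character count is the byte count divided by the character size.
std::size_t CharCount(const MockUnicodeString& s) {
    return s.Length / sizeof(char16_t);
}
```

For the five-character string u"Disk0", Length is 10 and MaximumLength is 12, while CharCount reports 5. Code that passes Length where a character count is expected (or the reverse) is off by a factor of two.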
For convenience, this book provides a C++ class wrapper, CUString, for use with the UNICODE_STRING structure. The CUString class encapsulates a UNICODE_STRING structure and provides many constructors and conversion operators that in turn rely on the Rtl Unicode functions of the kernel. A portion of the CUString class declaration is listed below.

    // Unicode.h
    //
    #pragma once

    class CUString {
    public:
        CUString() { Init(); }      // constructor relies on
                                    // internal Init function
        CUString(const char* pAnsiString);
        CUString(PCWSTR pWideString);
        ~CUString();                // destructor gives back
                                    // buffer allocation
        void Init();                // performs "real" initialization
        void Free();                // performs real destruct

        // copy constructor (required)
        CUString(const CUString& orig);
        // assignment operator overload (required)
        CUString operator=(const CUString& rop);
        // comparison operator overload
        BOOLEAN operator==(const CUString& rop) const;
        // concatenation operator
        CUString operator+(const CUString& rop) const;
        // cast operator into wchar_t*
        operator PWSTR() const;
        // cast operator into ULONG
        operator ULONG() const;
        // converter: ULONG->CUString
        CUString(ULONG value);
        // buffer access operator
        WCHAR& operator[](int idx);

        // length in characters (uStr.Length is in bytes)
        USHORT Length() { return uStr.Length/2; }

    protected:
        UNICODE_STRING uStr;        // W2K kernel structure for
                                    // Unicode string
        enum ALLOC_TYPE {Empty, FromCode, FromPaged};
        ALLOC_TYPE aType;           // where buffer is allocated
    };

The disk included with this book supplies two files, Unicode.h and Unicode.cpp, that hold the declaration and implementation of the CUString class. The methods of this class assume that callers are running at PASSIVE_LEVEL IRQL. Routines that allocate memory do so from the paged pool. The class is intended for convenience, not for heavy-duty string manipulation. Consider modifying the class implementation to allocate memory from a lookaside list if a driver requires more intensive string manipulation.
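To make the idea of such a wrapper concrete, here is a minimal user-mode sketch of a class that owns the buffer of a UNICODE_STRING-like structure. It is not the book's CUString: MiniUString and MockUnicodeString are hypothetical names, char16_t stands in for WCHAR, memory comes from new[] rather than the paged pool, and only a handful of operations are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for UNICODE_STRING; char16_t models the 16-bit WCHAR.
struct MockUnicodeString {
    std::uint16_t Length;         // bytes in use
    std::uint16_t MaximumLength;  // bytes allocated
    char16_t*     Buffer;
};

// Hypothetical wrapper for illustration only.
class MiniUString {
public:
    MiniUString() { Alloc(0); }
    MiniUString(const char* ascii) {             // widens 7-bit ASCII only
        assert(ascii != nullptr);
        const std::size_t n = std::strlen(ascii);
        Alloc(n);
        for (std::size_t i = 0; i < n; ++i)
            u_.Buffer[i] = static_cast<unsigned char>(ascii[i]);
    }
    MiniUString(const MiniUString& o) {
        Alloc(o.Length());
        std::memcpy(u_.Buffer, o.u_.Buffer, o.u_.Length);
    }
    MiniUString& operator=(const MiniUString& o) {
        if (this != &o) {
            MiniUString copy(o);                 // allocate first, then swap
            char16_t* old = u_.Buffer;
            u_ = copy.u_;
            copy.u_.Buffer = old;                // copy's destructor frees it
        }
        return *this;
    }
    ~MiniUString() { delete[] u_.Buffer; }

    MiniUString operator+(const MiniUString& r) const {
        MiniUString out(Reserve{Length() + r.Length()});
        std::memcpy(out.u_.Buffer, u_.Buffer, u_.Length);
        std::memcpy(out.u_.Buffer + Length(), r.u_.Buffer, r.u_.Length);
        return out;
    }
    bool operator==(const MiniUString& r) const {
        return u_.Length == r.u_.Length &&
               std::memcmp(u_.Buffer, r.u_.Buffer, u_.Length) == 0;
    }
    char16_t& operator[](std::size_t i) {
        assert(i < Length());
        return u_.Buffer[i];
    }
    // Length in characters; the structure itself stores bytes.
    std::size_t Length() const { return u_.Length / sizeof(char16_t); }

private:
    struct Reserve { std::size_t chars; };
    explicit MiniUString(Reserve r) { Alloc(r.chars); }

    void Alloc(std::size_t chars) {
        assert((chars + 1) * sizeof(char16_t) <= 0xFFFF);
        u_.Buffer = new char16_t[chars + 1]();   // zero-filled, so terminated
        u_.Length = static_cast<std::uint16_t>(chars * sizeof(char16_t));
        u_.MaximumLength =
            static_cast<std::uint16_t>((chars + 1) * sizeof(char16_t));
    }
    MockUnicodeString u_;
};
```

The copy constructor, assignment operator, and destructor appear together because the class owns a raw buffer; omitting any one of them would lead to a double free or a leak. A kernel-mode version would replace new[] and delete[] with pool allocation routines.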
Of course, at this point it might seem premature to introduce portions of driver code. After all, building and loading a driver is not discussed until the next chapter. However, the code included for this chapter supplies a trivial mock environment for testing code such as the CUString class from the Win32 environment. Rtl stub functions rely on either Win32 or C runtime library functions to perform a reasonably faithful emulation of kernel runtime support. A simple Win32 console program is included to demonstrate the use of the CUString class. A portion of the test program is shown below.

    #include "DDKTestEnv.h"
    #include "Unicode.h"
    #include "stdio.h"

    int main(int argc, char* argv[]) {
        CUString strEmpty;
        CUString strOne("One");
        CUString strTwo(L"Two");
        CUString str2468("2468");
        ULONG ul2468 = str2468;
        CUString strxFF01("xFF01");
        ULONG ulxFF01 = strxFF01;
        CUString str2244(2244);
        CUString strOnePlusTwo = strOne + strTwo;

        // In the Microsoft C runtime, %s in wprintf expects a wide string.
        wprintf(L"strOnePlusTwo: %s\n", (PWSTR) strOnePlusTwo);
        // ULONG is an unsigned long, so use the l length modifier.
        printf("Conversion of str2468 into ULONG = %lu\n", ul2468);
        printf("Conversion of strxFF01 into ULONG = %lx\n", ulxFF01);
        wprintf(L"Conversion of 2244 into CUString = %s\n",
                (PWSTR) str2244);
        wprintf(L"On the fly conversion of 3366 into "
                L"CUString = %s\n", (PWSTR)(CUString)3366);

        printf("Test of buffer access operator []:\n");
        for (int i=0; i<strOnePlusTwo.Length(); i++) {
            wprintf(L"%c ", strOnePlusTwo[i]);
            strOnePlusTwo[i] = L'A' + i;
        }
        wprintf(L"\nAfter replacing buffer, "
                L"strOnePlusTwo = %s\n", (PWSTR)strOnePlusTwo);
        ...

Two files, DDKTestEnv.h and DDKTestEnv.cpp, supply the declaration and implementation of the emulation environment. This environment is intended to be simple and to provide a testbed for some driver logic before entering the real kernel-mode environment.
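The stub approach can be sketched as follows. This is not the contents of DDKTestEnv.cpp; it is a hypothetical stand-in showing how one kernel routine, RtlAppendUnicodeStringToString, might be emulated on top of the C runtime. The names StubUnicodeString, StubStatus, StubAppend, and StubAppendSelfCheck are assumptions made for the example, char16_t stands in for WCHAR, and an unsigned 32-bit type stands in for NTSTATUS.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef std::uint32_t StubStatus;                    // stands in for NTSTATUS
const StubStatus kStubSuccess        = 0x00000000u;  // STATUS_SUCCESS
const StubStatus kStubBufferTooSmall = 0xC0000023u;  // STATUS_BUFFER_TOO_SMALL

// Stand-in for UNICODE_STRING; char16_t models the 16-bit WCHAR.
struct StubUnicodeString {
    std::uint16_t Length;         // bytes in use
    std::uint16_t MaximumLength;  // bytes available
    char16_t*     Buffer;
};

// Emulates the append contract with memcpy: fail without modifying the
// destination if the combined byte length would exceed MaximumLength.
StubStatus StubAppend(StubUnicodeString* dest, const StubUnicodeString* src) {
    assert(dest != nullptr && src != nullptr);
    const unsigned combined = static_cast<unsigned>(dest->Length) + src->Length;
    if (combined > dest->MaximumLength)
        return kStubBufferTooSmall;
    // Length is a byte offset, so do the pointer arithmetic on char*.
    std::memcpy(reinterpret_cast<char*>(dest->Buffer) + dest->Length,
                src->Buffer, src->Length);
    dest->Length = static_cast<std::uint16_t>(combined);
    return kStubSuccess;
}

// Small self-check: append "Two" to "One" in a six-character buffer,
// then attempt a second append that cannot fit.
bool StubAppendSelfCheck() {
    char16_t destBuf[6] = {u'O', u'n', u'e'};
    char16_t srcBuf[]   = u"Two";
    StubUnicodeString dest = {6, sizeof(destBuf), destBuf};
    StubUnicodeString src  = {6, sizeof(srcBuf), srcBuf};

    if (StubAppend(&dest, &src) != kStubSuccess) return false;
    if (dest.Length != 12) return false;
    if (std::memcmp(destBuf, u"OneTwo", 12) != 0) return false;
    // The buffer is now full; a further append must be rejected.
    if (StubAppend(&dest, &src) != kStubBufferTooSmall) return false;
    return dest.Length == 12;
}
```

Because the stub keeps the same field layout and the same byte-based length convention as the kernel structure, logic tested against it exercises the same length arithmetic that the real routine would require.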