Use WideCharToMultiByte with WC_NO_BEST_FIT_CHARS

For strings that require validation, such as filenames, resource names, and usernames, always use the WC_NO_BEST_FIT_CHARS flag with WideCharToMultiByte. This flag prevents the function from mapping characters to characters that appear similar but have very different semantics. In some cases, the semantic change can be extreme. For example, ∞ (infinity) maps to 8 (eight) in some code pages!

WC_NO_BEST_FIT_CHARS is available only on Microsoft Windows 2000, Microsoft Windows XP, and Microsoft Windows .NET Server 2003. If your code must run on earlier platforms, you can achieve the same effect by converting the resulting string back to the source encoding; that is, by calling WideCharToMultiByte to get the code-page (multibyte) string and then calling MultiByteToWideChar on that string to recover a UTF-16 string. Any code point that differs between the original and the recovered string is said to not round-trip. Any code point that does not round-trip is a best-fit character. The following sample outlines how to perform a round-trip:

/*
    RoundTrip.cpp : Defines the entry point for the console application.
*/
#include "stdafx.h"
#include <windows.h>
#include <strsafe.h>
#include <stdio.h>

/*
    CheckRoundTrip
    Returns TRUE if the given string round trips between Unicode
    and the given code page. Otherwise, it returns FALSE.
*/
BOOL CheckRoundTrip(
    DWORD uiCodePage,
    LPCWSTR wszString)
{
    BOOL fStatus = TRUE;
    BYTE *pbTemp = NULL;
    WCHAR *pwcTemp = NULL;

    try {
        //Determine if string length is < MAX_STRING_LEN
        //Handles null strings gracefully
        const size_t MAX_STRING_LEN = 200;
        size_t cchCount = 0;
        if (!SUCCEEDED(StringCchLength(wszString, MAX_STRING_LEN, &cchCount)))
            throw FALSE;

        pbTemp = new BYTE[MAX_STRING_LEN];
        pwcTemp = new WCHAR[MAX_STRING_LEN];
        if (!pbTemp || !pwcTemp) {
            printf("ERROR: No Memory!\n");
            throw FALSE;
        }

        ZeroMemory(pbTemp, MAX_STRING_LEN * sizeof(BYTE));
        ZeroMemory(pwcTemp, MAX_STRING_LEN * sizeof(WCHAR));

        //Convert from Unicode to the given code page.
        int rc = WideCharToMultiByte(
            uiCodePage,
            0,
            wszString,
            -1,
            (LPSTR)pbTemp,
            (int)MAX_STRING_LEN,
            NULL,
            NULL);
        if (!rc) {
            printf("ERROR: WC2MB Error = %lu, CodePage = %lu, String = %ws\n",
                   GetLastError(), uiCodePage, wszString);
            throw FALSE;
        }

        //Convert from the given code page back to Unicode.
        //The destination size is a count of WCHARs, not bytes.
        rc = MultiByteToWideChar(
            uiCodePage,
            0,
            (LPCSTR)pbTemp,
            -1,
            pwcTemp,
            (int)MAX_STRING_LEN);
        if (!rc) {
            printf("ERROR: MB2WC Error = %lu, CodePage = %lu, String = %ws\n",
                   GetLastError(), uiCodePage, wszString);
            throw FALSE;
        }

        //Get length of original Unicode string,
        //check it's equal to the conversion length.
        size_t Length = 0;
        StringCchLength(wszString, MAX_STRING_LEN, &Length);
        if (Length + 1 != (size_t)rc) {
            printf("Length %d != rc %d\n", (int)Length, rc);
            throw FALSE;
        }

        //Compare the original Unicode string to the converted string
        //and make sure they are identical.
        for (size_t ctr = 0; ctr < Length; ctr++) {
            if (pwcTemp[ctr] != wszString[ctr])
                throw FALSE;
        }
    }
    catch (BOOL iErr) {
        fStatus = iErr;
    }

    if (pbTemp) delete [] pbTemp;
    if (pwcTemp) delete [] pwcTemp;

    return (fStatus);
}

int _cdecl main(
    int argc,
    char* argv[])
{
    LPCWSTR s1 = L"\x00a9MicrosoftCorp";    // Copyright
    LPCWSTR s2 = L"To\x221e&Beyond";        // Infinity

    printf("1252 Copyright = %d\n", CheckRoundTrip(1252, s1));
    printf("437 Copyright = %d\n", CheckRoundTrip(437, s1));
    printf("1252 Infinity = %d\n", CheckRoundTrip(1252, s2));
    printf("437 Infinity = %d\n", CheckRoundTrip(437, s2));

    return (0);
}

The sample demonstrates that some characters cannot round-trip in some code pages. For example, consider the copyright symbol and the infinity sign in code pages 1252 (Windows code page Latin I, used for Western European languages) and 437 (the original MS-DOS code page): the copyright symbol exists in 1252 but not in 437, and the infinity symbol exists in 437 but not in 1252. The copyright string therefore round-trips only in 1252, and the infinity string only in 437.
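To show why a best-fit mapping breaks the round trip, the sketch below models the idea without calling the Windows API. Everything in it is an assumption made for illustration: ToyCodePage, Encode, Decode, RoundTrips, Toy1252, and Toy437 are hypothetical names, and the tables hold only the two characters discussed here (with best-fit targets 8 for infinity and c for copyright), not the real code page contents.

```cpp
#include <map>
#include <string>

// Toy model of a single-byte code page. ASCII (below 0x80) maps to itself.
struct ToyCodePage {
    std::map<wchar_t, unsigned char> exact;    // characters the page really contains
    std::map<wchar_t, unsigned char> bestFit;  // look-alike substitutions
};

// Encode: exact mapping first, then best fit, then '?' as the default character.
std::string Encode(const ToyCodePage& cp, const std::wstring& in) {
    std::string out;
    for (wchar_t ch : in) {
        if (ch < 0x80) { out += static_cast<char>(ch); continue; }
        auto e = cp.exact.find(ch);
        if (e != cp.exact.end()) { out += static_cast<char>(e->second); continue; }
        auto b = cp.bestFit.find(ch);
        out += (b != cp.bestFit.end()) ? static_cast<char>(b->second) : '?';
    }
    return out;
}

// Decode: only exact mappings are reversible; a best-fit byte decodes to
// whatever ordinary character it now looks like.
std::wstring Decode(const ToyCodePage& cp, const std::string& in) {
    std::wstring out;
    for (char c : in) {
        unsigned char byte = static_cast<unsigned char>(c);
        if (byte < 0x80) { out += static_cast<wchar_t>(byte); continue; }
        wchar_t ch = L'?';
        for (const auto& kv : cp.exact)
            if (kv.second == byte) ch = kv.first;
        out += ch;
    }
    return out;
}

// A string round-trips if decoding its encoded form gives back the original.
bool RoundTrips(const ToyCodePage& cp, const std::wstring& s) {
    return Decode(cp, Encode(cp, s)) == s;
}

ToyCodePage Toy1252() {          // has the copyright sign, best-fits infinity
    ToyCodePage cp;
    cp.exact[L'\x00a9'] = 0xA9;
    cp.bestFit[L'\x221e'] = '8';
    return cp;
}

ToyCodePage Toy437() {           // has the infinity sign, best-fits copyright
    ToyCodePage cp;
    cp.exact[L'\x221e'] = 0xEC;
    cp.bestFit[L'\x00a9'] = 'c';
    return cp;
}
```

In this model, encoding L"To\x221e&Beyond" with Toy1252() yields the bytes "To8&Beyond". Decoding them gives a string containing an ordinary 8, so the comparison with the original fails, which is the condition the round-trip check is designed to catch.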



Writing Secure Code, Second Edition
ISBN: 0735617228
EAN: 2147483647
Year: 2001
Pages: 286
