Section 3.7. World Domination, Fast | Open Sources 2.0: The Continuing Evolution

3.7. World Domination, Fast

I've heaped enough opprobrium on Win32. Let's give it a break and consider something the designers really did get right, and one of the advantages it has over POSIX. I'm talking about the early adoption of the Unicode standard in Win32. When Microsoft was creating Win32, one of the things it realized was that this couldn't just be another English-only, American- and European-centric standard. It had to be able to not only cope with, but also encourage, applications written in all world languages (never accuse Microsoft of thinking small in its domination of the computing world).

Given those criteria, its adoption of Unicode as the native character set for all the system calls in Win32 was a stroke of genius. Even though the Asian countries aren't particularly fond of Unicode, because it merges several character sets they consider separate into one set of code points, Unicode is the best way to cope with the requirements of internationalization and localization in application development.

To allow older MS-DOS and Win16 applications to run, the Win32 API is available in two different forms, selectable by a compiler #define of -DUNICODE (it also helps if you own the compiler market for Windows, as Microsoft does, as you can standardize tricks like this). The older code-page-based applications call Win32 libraries that internally convert any string arguments to 16-bit Unicode and then call the real Win32 library interface, which, like the Windows NT kernel, is Unicode only.

In addition, Win32 comes with a full set of library interfaces to split out the text messages an application may need to display into resource files so that ISVs can easily have them translated for a target market. This eases the internationalization and localization burdens considerably for vendors.

What is more useful, but not as obvious, is that making the Win32 standard natively use Unicode meant developers were immediately confronted with the requirements of multilingual code development. Many applications written in English-speaking (or Western European eight-bit character set-compatible) countries are badly written, making the assumption that a character will always fit within one byte. The early versions of Samba definitely made that mistake and retrofitting multibyte character set handling into old code is a real bear to get right. I know, because I was the person who first had to work on this for Samba (later I got some much-needed help from Andrew), so I may be a little touchy on this subject.

Whenever I did Win32 development, I immediately designed with non-English languages in mind, and wrote everything with the abstract type TCHAR (one of the few useful abstract types in Win32), which is selectable at compile time using the Unicode defined to be either wchar_t with Unicode turned on, or unsigned char with Unicode turned off. Getting yourself in the right multibyte character set mindset from the beginning eliminates a whole class of bugs that you get when having to convert a quick "English-only" hacked-up program into something maintainable for different languages. POSIX has been catching up over the years with the iconv() functionality to cope with character set conversions, and Sun designed gettext() interfaces for localization, but Win32 had it all right from the start.