Unicode

In the words of the Unicode Consortium (see http://www.unicode.org), Unicode provides a unique number for every character, regardless of platform, program, or language. This means that if you use Unicode to represent characters in your application, there is no ambiguity over what a character should be in the application's user interface, in the application's input devices, or in the application's data. It's a simple concept, but it is a fundamental building block for an international application. The good news for .NET developers is that the .NET Framework and Visual Studio use Unicode throughout, so Unicode is a given.

The numbers that identify characters are called code points. Unicode divides its code space into 17 ranges called planes, each containing 65,536 (64K) code points, for a total of 1,114,112. The first plane, plane 0 (also called the Basic Multilingual Plane, or BMP), holds the characters of most languages in common use. The 16 other planes accommodate scripts that are no longer in use (e.g., Gothic and Old Italic), mathematical and musical symbols, and additional characters for living languages.

The mapping of characters to code points is an ongoing process (Unicode 3.0 assigned nearly 50,000 characters; Unicode 4.0 assigned more than 94,000). Even once the Unicode Consortium has caught up with all the existing characters in the world, new characters and symbols will arise, and Unicode will need to be updated to include them. For this reason, each release of the Unicode specification carries a version number that identifies the characters and features it covers.

Because code points are just numbers, the .NET Framework can read and write any code point without prior knowledge of it. There is a limit, however, to how far such unknown code points can be handled: when the .NET Framework has no information about a code point, it cannot know how to sort, case, or render it. So although the .NET Framework 1.1 was released before the most recent version of Unicode, it can still read and write all the new code points; it simply cannot make decisions about how to process them. Displaying them is another matter, because that depends on fonts, which fall outside the .NET Framework.
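The plane arithmetic and the "code points are just numbers" point can be sketched in a few lines. This is a minimal illustration, written in Python rather than a .NET language purely for brevity. The names `plane_of` and `round_trips` are hypothetical helpers invented for this example, and the sample code points are arbitrary choices. Whether a given code point has a name depends on the Unicode database bundled with the runtime, which mirrors the versioning issue described above.

```python
import unicodedata

PLANE_SIZE = 0x10000                            # 65,536 code points per plane
PLANE_COUNT = 17                                # planes 0 through 16
MAX_CODE_POINT = PLANE_SIZE * PLANE_COUNT - 1   # 0x10FFFF


def plane_of(code_point: int) -> int:
    """Return the number (0-16) of the plane that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point // PLANE_SIZE


def round_trips(code_point: int) -> bool:
    """Write a code point as UTF-16 and read it back.

    No knowledge of what the character *is* is needed: the encoder
    only performs arithmetic on the number.
    """
    text = chr(code_point)
    return text.encode("utf-16-le").decode("utf-16-le") == text


if __name__ == "__main__":
    # Latin A (BMP), a Gothic letter, a musical symbol, and a code point
    # in plane 5 chosen as an example of one the runtime may know nothing about.
    for cp in (0x0041, 0x10330, 0x1D11E, 0x50000):
        name = unicodedata.name(chr(cp), "<no name in this Unicode database>")
        print(f"U+{cp:04X}  plane {plane_of(cp)}  "
              f"round-trips: {round_trips(cp)}  {name}")
```

A code point with no entry in the runtime's database still round-trips intact; what is missing is the information needed to sort, case, or render it.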

Unicode mapping tables map legacy code pages (covered in the next section) to their Unicode equivalents. Table 2.1 shows the versions of the Unicode mapping tables that the .NET Framework supports.

Table 2.1. .NET Framework Support for Unicode Mapping Tables
Framework	Unicode Mapping Tables Version
.NET Framework 2.0	4.0
.NET Framework 1.1	3.0
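
To make the idea of a mapping table concrete, the sketch below decodes the same byte under two legacy Windows code pages and prints the Unicode code point each one maps to. It uses Python's built-in codecs, which are that runtime's own mapping tables and not the ones shipped with the .NET Framework; the helper name `to_code_point` is hypothetical. In .NET, the comparable lookup would go through an `Encoding` object for the code page.

```python
def to_code_point(raw: bytes, code_page: str) -> str:
    """Decode a single byte with a legacy code page and report its code point."""
    character = raw.decode(code_page)
    return f"U+{ord(character):04X}"


if __name__ == "__main__":
    raw = bytes([0xE9])  # one byte, ambiguous until a code page is chosen
    for code_page in ("cp1252", "cp1251"):
        print(f"0xE9 in {code_page}: {to_code_point(raw, code_page)} "
              f"({raw.decode(code_page)})")
```

Under code page 1252 (Western European) the byte is the letter é, and under code page 1251 (Cyrillic) it is й. Once mapped to Unicode, the two characters have distinct code points and the ambiguity disappears.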
