Character Encodings | Developing and Implementing Windows®-based Applications with Visual Basic® .NET and Visual Studio® .NET Exam Cram™ 2 (Exam 70-306)

Because an application may need to work with many different character sets, the .NET Framework provides support for encodings through the System.Text.Encoding class. An encoding is a set of characters and their associated numerical values. The ASCII character set, for example, associates the unaccented Latin letters, digits, punctuation, and control codes with numerical values between 0 and 127.
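The idea of an encoding as a mapping between characters and numbers can be sketched in a few lines. The example below is written in Python purely as an illustration and does not use the System.Text.Encoding class; the sample strings are arbitrary choices.

```python
# Illustration in Python (not the .NET API): an encoding maps characters to numbers.
text = "Exam"

# ord() gives each character's numeric value; for these letters it matches ASCII.
code_points = [ord(ch) for ch in text]

# Encoding to ASCII yields one byte per character with those same values.
ascii_bytes = text.encode("ascii")

print(code_points)        # [69, 120, 97, 109]
print(list(ascii_bytes))  # [69, 120, 97, 109]

# Characters outside the 0-127 range cannot be represented in ASCII.
try:
    "caf\u00e9".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)      # False
```

Every value produced for the plain Latin letters falls between 0 and 127, while the accented character has no ASCII value at all.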

Unicode

The .NET Framework represents text as Unicode by default, storing it in the UTF-16 encoding, which is built from 2-byte (16-bit) code units. A single 16-bit unit can take 65,536 values, enough to cover the most common characters from most of the world's languages. Roughly one million additional code points are available for less common characters; each of these is stored as a pair of 16-bit units (a surrogate pair). Older versions of Windows and Microsoft development platforms made use of separate code pages, most of which specified a 256-character encoding specific to a particular language or locale.
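To make the 16-bit storage concrete, the following sketch (Python, used only for illustration and not the .NET API) lists the 16-bit units that UTF-16 uses for two characters. The helper name utf16_code_units is hypothetical. A common character occupies one unit, while a character from the additional range of roughly one million occupies two units, known as a surrogate pair.

```python
# Illustration in Python (not the .NET API): UTF-16 code units per character.
def utf16_code_units(ch):
    """Return the 16-bit code units used to store ch in UTF-16 (little-endian, no BOM)."""
    data = ch.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

common_char = "A"               # U+0041, fits in a single 16-bit unit
supplementary = "\U0001D11E"    # U+1D11E MUSICAL SYMBOL G CLEF, beyond the 16-bit range

print([hex(u) for u in utf16_code_units(common_char)])    # ['0x41']
print([hex(u) for u in utf16_code_units(supplementary)])  # ['0xd834', '0xdd1e']
```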

Converting Encodings

Although the .NET Framework works natively with the Unicode character set, it also includes support for the older encodings in order to provide backward compatibility with legacy applications. The System.Text namespace contains classes that may be used to convert characters from the Unicode (UTF-16) encoding to other encodings, and from those encodings back to Unicode.
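A conversion of this kind amounts to decoding bytes from one encoding and re-encoding them in another. The sketch below shows the round trip in Python as an illustration only; the convert helper is hypothetical, and Windows code page 1252 is assumed as the legacy encoding. In the .NET Framework the comparable operation is the shared Encoding.Convert method, which takes a source encoding, a destination encoding, and a byte array.

```python
# Illustration in Python (not the .NET API): convert bytes between encodings.
def convert(data, source, target):
    """Decode bytes from the source encoding, then re-encode them in the target."""
    return data.decode(source).encode(target)

utf16_bytes = "Se\u00f1or".encode("utf-16-le")   # 5 characters, 2 bytes each
legacy_bytes = convert(utf16_bytes, "utf-16-le", "cp1252")
round_trip = convert(legacy_bytes, "cp1252", "utf-16-le")

print(len(utf16_bytes))            # 10
print(len(legacy_bytes))           # 5
print(round_trip == utf16_bytes)   # True
```

The round trip is lossless here only because every character in the sample exists in the assumed code page; a character missing from the target encoding would be rejected or replaced.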

Table 8.1 lists the System.Text.Encoding base class and some of its more common subclasses that can be used to convert between character encodings.

Table 8.1. Encoding Subclasses Within the System.Text Namespace

Class	Use
ASCIIEncoding	Converts between Unicode and ASCII
Encoding	The general-purpose base class; its shared (static) Encoding.GetEncoding method returns encodings that can be used for legacy code page compatibility
UnicodeEncoding	Converts to and from UTF-16, which stores each 16-bit code unit as two consecutive bytes
UTF7Encoding	Converts to and from UTF-7, which represents Unicode using only 7-bit (ASCII) byte values
UTF8Encoding	Converts to and from UTF-8, which represents Unicode as a sequence of 8-bit bytes
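The differences between the encodings in Table 8.1 show up in the bytes each one produces for the same text. The sketch below is Python, used only for illustration; the codec names are stand-ins for the .NET classes rather than the classes themselves, and the sample string is an arbitrary choice.

```python
# Illustration in Python (not the .NET API); codec names stand in for the classes in Table 8.1.
text = "Caf\u00e9"  # four characters; the last one (e with acute accent) lies outside ASCII

codecs_by_class = {
    "ASCIIEncoding": "ascii",
    "UnicodeEncoding": "utf-16-le",
    "UTF7Encoding": "utf-7",
    "UTF8Encoding": "utf-8",
}

# errors="replace" substitutes "?" for characters the target encoding lacks.
encoded = {
    cls: text.encode(codec, errors="replace")
    for cls, codec in codecs_by_class.items()
}

for cls, data in encoded.items():
    print(cls, len(data), data)
```

The ASCII result loses the accented character, the UTF-16 result uses two bytes for every character, the UTF-7 result contains only byte values below 128, and the UTF-8 result uses one byte for each plain Latin letter and two for the accented one.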