Before Unicode, there was darkness. The darkness was code pages, a poor solution to the problem that ASCII was inadequate. ASCII uses the low 7 bits of a byte to define 128 characters: unaccented Latin letters, digits, punctuation, and control characters. With those 128 values taken, only 128 values remained in a single byte to represent all the other characters in the world. This was clearly impossible, so code pages were created. A code page defines one possible set of characters for the remaining 128 values. Within a code page, each character is identified by a number called a code point. For example, Windows code page 1252 is "Western Europe" and includes many accented Latin characters. Similarly, Windows code page 1256 includes Arabic characters. This solves the problem of representing a single character in a single byte, but it is a disaster for data integrity. To know what character a code point represents, you have to know which code page it belongs to. A file that was created for code page 1252 is meaningless (for the upper 128 values) if it is interpreted using code page 1256. Even worse, not all scripts (e.g., Armenian and Geez [Ethiopia]) have a code page, so representing such characters without Unicode support ranges from difficult to impossible. Mercifully, if your application does not interact with legacy systems and does not use the console, you will be spared from having to use code pages because the .NET Framework is based upon Unicode, not code pages. |
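The ambiguity described above is easy to reproduce. The following is a minimal sketch in Python (chosen only for brevity, not .NET code), assuming the standard `cp1252` and `cp1256` codecs correspond to the Windows code pages named in the text. The byte value 0xC7 and the variable names are picked purely for illustration.

```python
# One byte from the upper half of the single-byte range (values 128-255).
raw = bytes([0xC7])

# The same byte decodes to a different character under each code page.
as_western = raw.decode("cp1252")  # Windows code page 1252 (Western Europe)
as_arabic = raw.decode("cp1256")   # Windows code page 1256 (Arabic)

# Print the Unicode code points rather than the glyphs themselves,
# so the output does not depend on the console's own code page.
print(f"U+{ord(as_western):04X}", f"U+{ord(as_arabic):04X}")

# Bytes in the ASCII range (0-127) decode identically under both code pages.
ascii_bytes = b"Hello"
same_ascii = ascii_bytes.decode("cp1252") == ascii_bytes.decode("cp1256")
print(same_ascii)
```

Under code page 1252 the byte is the Latin letter Ç (U+00C7), while under code page 1256 it is the Arabic letter alef (U+0627). Nothing in the byte itself records which reading was intended, which is why the code page has to be known out of band.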