3.3. Windows Latin 1 and Other Windows Codes

The ISO 8859 character codes, which have been defined by international standards, have Microsoft-specific counterparts, which are here called "Windows codes." The main difference is that some code positions are reserved for control characters (and mostly unused) in ISO 8859 but assigned to various printable characters, especially punctuation marks, in Windows codes. Although defined only by a software vendor, the Windows codes are very important due to the market share of Microsoft.

3.3.1. Windows Latin 1

Microsoft defined its own Latin 1 encoding as different from ISO Latin 1, although only in the sense that some positions that are reserved for control codes in ISO Latin 1 (codes 128–159 decimal) are used for printable characters in Windows Latin 1. The main reason was very understandable: the inclusion of typographically correct quotation marks, as in “foo” and ‘foo’, and em dash (—) and en dash (–). The right single quote is also the typographically correct apostrophe. Some other characters were added as well.

Windows Latin 1 is one of the most commonly used encodings in the world. In most contexts where the default is said to be ISO Latin 1, it's really Windows Latin 1 (sometimes called WinLatin1). For example, if a web document is labeled as ISO-8859-1 but contains octets with values 128–159, browsers will generally display them according to Windows Latin 1. The practical reason is that most often this is what the document's author really meant.

However, the use of octets in the range 128–159 in any data to be processed by a program that expects ISO 8859-1 encoded data is an error, and it might cause problems. The octets might, for example, be ignored, or be processed in a manner that looks meaningful, or (in rare cases) be interpreted as control characters.

The encoding has been registered under the name windows-1252.
In practice, the name cp-1252, or cp1252, was widely used before the registration, and it can still be seen.

Windows Latin 1 is often referred to as the ANSI character set, but this is completely misleading. ANSI, the American National Standards Institute, never adopted the set as a standard. Microsoft started using the name because they based the design on a draft for an ANSI standard. Other Windows character codes have also been called "ANSI."

The Windows Latin 1 encoding has existed in somewhat different variants. The main difference in practice is that early versions did not include the euro sign, €. Table 3-4 presents the modern version of the characters in Windows Latin 1 that do not belong to ISO Latin 1. The table is grouped by character semantics and uses Unicode names for the characters. The names used in Microsoft documentation are partly different and vary by document.
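The characters that Windows Latin 1 adds in the range 128–159 can also be listed programmatically. The following is a minimal sketch in Python, assuming that Python's built-in `cp1252` codec reflects the modern version of the encoding; the function name `windows_1252_extras` is made up for this illustration. It also shows how the same octets come out when the data is treated as ISO 8859-1.

```python
import unicodedata


def windows_1252_extras():
    """Return (octet, character, Unicode name) for octets 128-159 defined in cp1252."""
    rows = []
    for value in range(128, 160):
        try:
            char = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a few positions in this range unassigned.
            continue
        rows.append((value, char, unicodedata.name(char)))
    return rows


for value, char, name in windows_1252_extras():
    print(f"{value:3d}  0x{value:02X}  U+{ord(char):04X}  {name}")

# The same octets decoded as ISO 8859-1 become C1 control characters,
# not quotation marks.
quoted = bytes([0x93, 0x66, 0x6F, 0x6F, 0x94])
as_windows = quoted.decode("cp1252")
as_iso = quoted.decode("iso-8859-1")
print([unicodedata.category(c) for c in as_windows])
print([unicodedata.category(c) for c in as_iso])
```

The listing is in code order rather than grouped by character semantics, and the names printed are Unicode names, not the names used in Microsoft documentation.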
3.3.2. Other Windows Character Codes

Microsoft has also defined other Windows-specific 8-bit character codes that resemble ISO 8859 encodings, such as Windows Latin 2, also known as Windows Central European or Windows East European. They, too, use the range of control codes (128–159) for added punctuation and other characters. In addition to this, the encodings may differ from the corresponding ISO 8859 encoding in other positions. In particular, Windows Latin 2 differs from ISO 8859-2 in several positions.

The Windows codes are widely used as de facto standards in many environments. If you travel to Central/Eastern Europe and use computers there, you will find that they very often have Windows Latin 2 as the default encoding.

The Windows codes are known as windows-1250 through windows-1258 in the official registry of character encodings; these names are often called MIME names of encodings, for reasons explained in Chapter 10. Moreover, there is windows-874, which has not been officially registered. In practice, somewhat different names are used, as shown in Table 3-5. Note that the numbering of windows-1250 etc. differs from the numbering of the corresponding ISO 8859 standards. The table also compares the codes with ISO 8859 codes; differences in the range 128–159 are not mentioned here.
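To see where a Windows code departs from its ISO 8859 counterpart outside the range 128–159, the two can be compared octet by octet. The sketch below assumes Python's built-in `cp1250` and `iso8859_2` codecs as stand-ins for Windows Latin 2 and ISO 8859-2; the helper `differing_positions` is a hypothetical name used only here.

```python
def differing_positions(windows_codec="cp1250", iso_codec="iso8859_2"):
    """Return (octet, windows_char, iso_char) for octets 160-255 that decode differently."""
    diffs = []
    for value in range(160, 256):
        octet = bytes([value])
        # errors="replace" keeps the loop going if a codec leaves a position undefined.
        win_char = octet.decode(windows_codec, errors="replace")
        iso_char = octet.decode(iso_codec, errors="replace")
        if win_char != iso_char:
            diffs.append((value, win_char, iso_char))
    return diffs


for value, win_char, iso_char in differing_positions():
    print(f"{value:3d}  windows: U+{ord(win_char):04X}  ISO: U+{ord(iso_char):04X}")

# For comparison: Windows Latin 1 and ISO Latin 1 agree everywhere above 159.
print(differing_positions("cp1252", "latin_1"))
```

Restricting the loop to 160–255 mirrors the convention of the table, which leaves differences in the range 128–159 aside.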
The windows-1258 encoding has no direct ISO 8859 counterpart, but its overall design is the same as in ISO 8859-1, with the added characters as in windows-1252 and with some modifications made to meet some needs of the Vietnamese language.

Names like cp1250 or cp-1250 (instead of windows-1250) are often used, but they are not official (registered). For detailed information, consult Microsoft's documentation "Code pages supported by Windows," http://www.microsoft.com/globaldev/reference/wincp.mspx.
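Software often accepts both the registered names and the unofficial ones. As one illustration, the sketch below asks Python's codec registry which internal name it uses for a given label. This shows the behavior of one implementation only, not of the official registry, and the helper `canonical_codec_name` is a hypothetical name introduced for the example.

```python
import codecs


def canonical_codec_name(label):
    """Return the name Python's codec registry uses for label, or None if it is unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


for label in ("windows-1250", "cp1250", "windows-1252", "cp1252", "windows-1258"):
    print(f"{label!r:16} -> {canonical_codec_name(label)!r}")
```

Other libraries and protocols may recognize a different set of labels, so data exchanged between systems is safest when labeled with the registered windows-125x name.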