Chapter 6: Text Encoding in Windows

Encoding is exceedingly important. Even back when Windows was still under development and only vague rumors about it were circulating, I often had to convert text from one encoding to another. I once even had to write a special driver for a printer that refused to understand the standard encoding.

Encoding Text Information

The American National Standards Institute (ANSI) introduced the American Standard Code for Information Interchange (ASCII). ASCII is usually described in terms of two code tables: basic and extended. The basic table, which is ASCII proper, includes codes from 0 to 127; the extended table adds the values from 128 to 255. The first 32 codes of the ASCII table (0 to 31) are the so-called control codes, reserved for controlling devices such as printers and terminals rather than for printable characters. Codes from 32 to 126 are used for English alphabetic characters, digits, punctuation marks, and other printable characters; code 127 is the DEL control code.
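The ranges described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_ascii` is hypothetical, and treating code 127 (DEL) as a control code is an assumption of the sketch, based on the usual reading of the ASCII table.

```python
def classify_ascii(code: int) -> str:
    """Return the part of the code table that a byte value (0-255) falls in."""
    if not 0 <= code <= 255:
        raise ValueError("a single-byte code must be in the range 0-255")
    if code < 32 or code == 127:
        return "control"    # codes 0-31, plus DEL (127)
    if code < 127:
        return "printable"  # codes 32-126: space, letters, digits, punctuation
    return "extended"       # codes 128-255: meaning depends on the code page


# Show the numeric code and the table region for a few sample characters.
for ch in ("\n", " ", "A", "7", "~"):
    print(repr(ch), ord(ch), classify_ascii(ord(ch)))
```

Note that the function can only say that a code above 127 belongs to the extended table; which character it stands for cannot be decided without knowing the encoding in use.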

The extended table was intended for encoding national language characters. Here, there is no common standard. The International Organization for Standardization (ISO) provides standard encodings for national language characters; however, each country also has national standards that are used more frequently. In Russia, for example, several encodings compete for the role of the standard. To encode Cyrillic characters, Microsoft introduced Windows code page 1251. The KOI8 encoding was inherited from the Soviet Union. The DOS encoding, CP-866, is generally used by Windows for displaying text in the console window. [i]
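To see how much these Cyrillic encodings disagree, the same word can be encoded with each of them. The sketch below uses Python's built-in codecs; the codec names `cp1251`, `koi8_r`, and `cp866` are Python's names for these tables (KOI8-R being the Russian variant of KOI8), and the sample word is an arbitrary choice for illustration.

```python
text = "Привет"  # sample word, "hello" in Russian

# Python codec names assumed to correspond to the encodings named in the text.
codecs_to_try = ("cp1251", "koi8_r", "cp866")

# Encode the same string with each single-byte table.
encoded = {name: text.encode(name) for name in codecs_to_try}

for name, raw in encoded.items():
    hex_bytes = " ".join(f"{b:02x}" for b in raw)
    print(f"{name:7} {hex_bytes}")

# Bytes written with one table and read with another do not raise an error;
# they silently decode to different characters.
misread = encoded["cp1251"].decode("cp866")
print("misread text equals original:", misread == text)
```

Each encoding uses one byte per letter, but the byte values differ in every case, which is why text saved by a Windows program tends to look garbled in a console window unless it is converted first.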

Single-byte encoding doesn't allow two or more national alphabets to be used simultaneously, because the 128 codes of the extended table can hold only one of them at a time. Furthermore, some national alphabets cannot be encoded using single-byte numbers at all, since they contain far more than 256 characters. Currently, a more universal encoding system is used, based on representing characters using two-byte numbers. This universal encoding system is known as Unicode. Note that despite the obvious advantages of this approach, two-byte encoding became popular only recently. The reason is straightforward: Unicode requires additional resources. This applies both to RAM, since every text string takes up twice as many bytes, and to CPU resources.
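Both points, the one-alphabet limit of a single-byte table and the doubled memory cost of two-byte encoding, can be checked with a short Python sketch. It assumes that Python's `utf-16-le` codec (two-byte units, little-endian, no byte-order mark) is a fair stand-in for the two-byte Unicode strings discussed here; the sample strings are arbitrary.

```python
russian = "Привет"             # Cyrillic sample
greek = "αβγ"                  # Greek sample
mixed = russian + " " + greek  # two alphabets in one string

# Code page 1251 covers Cyrillic but has no codes for Greek letters,
# so the mixed string cannot be stored in it.
try:
    mixed.encode("cp1251")
    fits_in_cp1251 = True
except UnicodeEncodeError:
    fits_in_cp1251 = False

# Assumption: "utf-16-le" models the two-byte encoding described in the text.
wide = mixed.encode("utf-16-le")

print("mixed text fits in cp1251:", fits_in_cp1251)
print("characters:", len(mixed), "bytes as two-byte Unicode:", len(wide))
print("Russian word, cp1251 bytes:", len(russian.encode("cp1251")))
print("Russian word, two-byte Unicode bytes:", len(russian.encode("utf-16-le")))
```

For these samples every character occupies exactly two bytes, so the Unicode form is twice the size of the single-byte form of the same text.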

[i] One of the most important tasks of the console window was running MS-DOS programs and displaying their output.



The Assembly Programming Master Book
ISBN: 8170088178
EAN: 2147483647
Year: 2004
Pages: 140
Authors: Vlad Pirogov
