Character Set Converters | Special Edition Using Java 2, Standard Edition

Characters are represented in a computer as binary numbers. The mapping between characters and those numbers is normally referred to as an encoding scheme. The most common scheme used for English text is the ISO Latin-1 encoding. The set of characters that an encoding can represent is said to be its character set. Usually, the first 128 codes (0 through 127) of an encoding correspond to the almost universally accepted ASCII character set, which includes the unaccented English letters, the digits, and the standard punctuation marks. Beyond that range, encoding schemes can vary radically, especially because some, such as the Chinese and Japanese encoding schemes, have character sets that bear little resemblance to the English set.
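As a rough illustration of how one character can map to different binary values under different encodings, the following sketch encodes the same two-character string with two of the encoding names Java accepts. The class name EncodingSketch and the helper hexBytes are made up for this example; only String.getBytes(String) is a library call.

```java
import java.io.UnsupportedEncodingException;

class EncodingSketch {
    // Hypothetical helper: encode text under the named encoding and
    // render the resulting bytes as two-digit hexadecimal values.
    static String hexBytes(String text, String encoding)
            throws UnsupportedEncodingException {
        byte[] bytes = text.getBytes(encoding);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            int value = bytes[i] & 0xFF; // treat the byte as unsigned
            if (value < 0x10) {
                sb.append('0');
            }
            sb.append(Integer.toHexString(value).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // 'A' is in the ASCII range; e-acute (U+00E9) is not.
        String text = "A\u00e9";
        System.out.println("ISO8859_1: " + hexBytes(text, "ISO8859_1")); // 41 E9
        System.out.println("UTF8:      " + hexBytes(text, "UTF8"));      // 41 C3 A9
    }
}
```

The ASCII letter comes out as the same byte in both cases, while the accented letter takes one byte in ISO Latin-1 and two in UTF-8.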

The SDK 1.3 supports the encodings shown in Table 24.4. The Java 2 Runtime Environment, Standard Edition, version 1.3 for Windows comes in two versions: US-only and international. The US-only version supports only the encodings shown in Table 24.4. The international version (which includes the lib/i18n.jar file) supports those encodings and many more, too many to list here. For the encodings supported in i18n.jar, see the internationalization documentation under the JAVA_HOME/docs/guide/intl directory.
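Because the set of available encodings depends on which runtime is installed, a program may want to check for an encoding before relying on it. One possible way, sketched here, is to attempt a trivial conversion and catch the UnsupportedEncodingException that String.getBytes(String) throws for an unknown name. The class EncodingProbe, the method isSupported, and the name "No_Such_Encoding" are placeholders invented for this example.

```java
import java.io.UnsupportedEncodingException;

class EncodingProbe {
    // Hypothetical helper: returns true if this runtime can convert
    // characters using the named encoding.
    static boolean isSupported(String encoding) {
        try {
            "".getBytes(encoding); // the name is looked up even for empty text
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A few names from the standard set, plus a deliberately bogus one.
        String[] names = { "ASCII", "ISO8859_1", "UTF8", "No_Such_Encoding" };
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " supported: " + isSupported(names[i]));
        }
    }
}
```

This approach uses only java.lang and java.io, so it does not assume anything about the runtime beyond the String class itself.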

Table 24.4. Standard Java Character Encodings

`ASCII`	American Standard Code for Information Interchange
`Cp1252`	Windows-Latin-1
`ISO8859_1`	ISO 8859-1, Latin alphabet No. 1
`UnicodeBig`	Sixteen-bit Unicode Transformation Format, big-endian byte order, with byte-order mark
`UnicodeBigUnmarked`	Sixteen-bit Unicode Transformation Format, big-endian byte order
`UnicodeLittle`	Sixteen-bit Unicode Transformation Format, little-endian byte order, with byte-order mark
`UnicodeLittleUnmarked`	Sixteen-bit Unicode Transformation Format, little-endian byte order
`UTF8`	Eight-bit Unicode Transformation Format
`UTF-16`	Sixteen-bit Unicode Transformation Format, byte order specified by a mandatory initial byte-order mark
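The names in the table can be passed to the character stream classes that do the actual conversion. The sketch below assumes the usual pairing of OutputStreamWriter (characters to bytes) and InputStreamReader (bytes to characters); the class ConverterRoundTrip and its encode and decode helpers are illustrative names, not part of the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class ConverterRoundTrip {
    // Convert characters to bytes using the named encoding.
    static byte[] encode(String text, String encoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, encoding);
        writer.write(text);
        writer.close(); // flushes any bytes still held by the converter
        return out.toByteArray();
    }

    // Convert bytes back to characters using the named encoding.
    static String decode(byte[] bytes, String encoding) throws IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), encoding);
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        reader.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String[] names = { "UTF8", "UnicodeBig", "UnicodeBigUnmarked" };
        for (int i = 0; i < names.length; i++) {
            byte[] bytes = encode("Java", names[i]);
            System.out.println(names[i] + ": " + bytes.length + " bytes, decodes to "
                    + decode(bytes, names[i]));
        }
    }
}
```

Comparing the byte counts for UnicodeBig and UnicodeBigUnmarked shows the effect of the byte-order mark: the marked form is two bytes longer for the same text.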