The first 128 Unicode characters (that is, characters 0 through 127) are identical to the ASCII character set. 32 is the ASCII space; therefore, 32 is the Unicode space. 33 is the ASCII exclamation point; therefore, 33 is the Unicode exclamation point, and so on. Table A-1 lists this character set.
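As a quick check of this equivalence, here is a minimal sketch that encodes each of the first 128 Unicode characters as US-ASCII and confirms that the resulting byte has the same value as the character. The class name AsciiRange and the helper sameInAscii() are my own illustrative names, not part of any API.

```java
import java.nio.charset.StandardCharsets;

class AsciiRange {

    // Returns true if encoding c as US-ASCII yields exactly one byte
    // whose value equals the character's Unicode value.
    static boolean sameInAscii(char c) {
        byte[] encoded = String.valueOf(c).getBytes(StandardCharsets.US_ASCII);
        return encoded.length == 1 && encoded[0] == c;
    }

    public static void main(String[] args) {
        System.out.println((int) ' '); // 32, the space
        System.out.println((int) '!'); // 33, the exclamation point

        // Check every character from 0 through 127.
        boolean allMatch = true;
        for (char c = 0; c < 128; c++) {
            allMatch &= sameInAscii(c);
        }
        System.out.println(allMatch); // true
    }
}
```

Characters above 127 cannot be represented in US-ASCII, so getBytes() substitutes a question mark for them and the helper returns false.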
In the first column, characters 0 through 31 are referred to as control characters because they're traditionally entered by holding down the control key and a letter key (on at least some dumb terminals). For instance, Ctrl-H is often ASCII 8, backspace. Ctrl-S is often mapped to ASCII 19, DC3, or XOFF. Ctrl-Q is often mapped to ASCII 17, DC1, or XON. Generally, each control character is entered by pressing the Control key and the printable character whose ASCII value is the ASCII value of the character you want plus 64 (or 96, if you count from the lowercase letters). Character 127, delete, is also a control character.
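The arithmetic is easy to sketch in code. The following example assumes the traditional mapping described above: keys with values 64 through 95 (@, the capital letters, and a few punctuation marks) are offset by 64, and lowercase letters are offset by 96. The class ControlCodes and the method controlCode() are hypothetical names used only for illustration; real terminals and keyboards may map keys differently.

```java
class ControlCodes {

    // Hypothetical helper: returns the control code traditionally
    // produced by pressing Ctrl together with the given key.
    static int controlCode(char key) {
        if (key >= 64 && key <= 95) {
            return key - 64; // '@', 'A'..'Z', '[', '\\', ']', '^', '_'
        }
        if (key >= 'a' && key <= 'z') {
            return key - 96; // lowercase letters
        }
        throw new IllegalArgumentException("No control code for key: " + key);
    }

    public static void main(String[] args) {
        System.out.println(controlCode('H')); // 8, backspace
        System.out.println(controlCode('S')); // 19, DC3 or XOFF
        System.out.println(controlCode('q')); // 17, DC1 or XON

        // Character 127, delete, also counts as a control character.
        System.out.println(Character.isISOControl(127)); // true
    }
}
```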
The common abbreviation for the character is given first, followed by its common meaning. Some of these codes are pretty much obsolete. For instance, I'm not aware of any modern system that actually uses characters 28 through 31 as file, group, record, and unit separators. Those control codes that are still used often have different meanings on different platforms. For example, character 10, the linefeed, originally meant "move the platen on the printer up one line," while character 13, the carriage return, meant "return the print-head to the beginning of the line." On paper-based teletype terminals, this could be used to position the print-head anywhere on a page and perhaps overtype characters that had already been typed. This no longer makes sense in an era of glass terminals and GUIs, so linefeed has come to mean a generic end-of-line character.
The next 128 Unicode characters (that is, 128 through 255) have the same values as the equivalent characters in the Latin-1 character set defined in ISO standard 8859-1. Latin-1, a slight variation of which is used by Windows, adds the various accented characters, umlauts, cedillas, upside-down question marks, and other characters needed to write text in most Western European languages. Table A-2 shows these characters. The first 128 characters in Latin-1 are the ASCII characters shown in Table A-1.
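One way to see this correspondence is to decode each possible byte value as ISO-8859-1 and compare the result to the byte's unsigned value. This is only a sketch; the class Latin1Range and the method decodeLatin1() are illustrative names of my own.

```java
import java.nio.charset.StandardCharsets;

class Latin1Range {

    // Decodes a single byte (given as 0..255) as ISO-8859-1 and
    // returns the Unicode value of the resulting character.
    static int decodeLatin1(int byteValue) {
        byte[] one = { (byte) byteValue };
        return new String(one, StandardCharsets.ISO_8859_1).codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(decodeLatin1(233)); // 233, e with acute accent
        System.out.println(decodeLatin1(191)); // 191, upside-down question mark

        // Every Latin-1 byte value maps to the Unicode character
        // with the same number.
        boolean allMatch = true;
        for (int b = 0; b < 256; b++) {
            allMatch &= (decodeLatin1(b) == b);
        }
        System.out.println(allMatch); // true
    }
}
```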
Characters 128 through 159 are nonprinting control characters, much like characters 0 through 31 of the ASCII set. Unicode does not specify any meanings for these 32 characters, but their common interpretations are listed in the table. On Windows, most of these positions are used for noncontrol characters not included in Latin-1. These alternate interpretations are given in Table A-3.
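The difference between the two interpretations shows up as soon as you decode the same byte with both encodings. The sketch below assumes that the Java runtime supports the windows-1252 charset, which is common but not guaranteed by the specification; the class C1Versus1252 and its decode() method are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class C1Versus1252 {

    // Decodes a single byte (given as 0..255) in the given charset and
    // returns the Unicode value of the resulting character.
    static int decode(int byteValue, Charset charset) {
        byte[] one = { (byte) byteValue };
        return new String(one, charset).codePointAt(0);
    }

    public static void main(String[] args) {
        // Assumption: this charset is available in the running VM.
        Charset windows = Charset.forName("windows-1252");

        // In Latin-1, byte 128 is a nonprinting control character.
        System.out.println(
            Integer.toHexString(decode(0x80, StandardCharsets.ISO_8859_1))); // 80

        // On Windows, the same byte is the euro sign.
        System.out.println(Integer.toHexString(decode(0x80, windows))); // 20ac

        // Byte 147 is a left double quotation mark on Windows.
        System.out.println(Integer.toHexString(decode(0x93, windows))); // 201c
    }
}
```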
Values beyond 255 encode characters from various other character sets. Where possible, character blocks describing a particular group of characters map onto established encodings for that set of characters by simple transposition. For instance, Unicode characters 884 through 1011 encode the Greek alphabet and associated characters like the Greek question mark (;). The core of this block is a direct transposition by 720 of characters from the upper half (128 through 255) of the ISO 8859-7 character set, which is in turn based on the Greek national standard ELOT 928. For example, the small letter delta, δ, Unicode character 948, is ISO 8859-7 character 228. A small epsilon, ε, Unicode character 949, is ISO 8859-7 character 229. In general, the Unicode value for a Greek character equals the ISO 8859-7 value for the character plus 720. Other character sets are included in Unicode in a similar fashion whenever possible.
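The offset of 720 can be checked for the two letters mentioned above. This sketch assumes the running VM supports the ISO-8859-7 charset; the class GreekOffset, the constant OFFSET, and the method decodeGreek() are names I made up for the example. The rule holds for the Greek letters but not for every byte in ISO 8859-7, so treat it as a rule of thumb rather than a conversion algorithm.

```java
import java.nio.charset.Charset;

class GreekOffset {

    // Offset between ISO 8859-7 values and Unicode values for Greek letters.
    static final int OFFSET = 720;

    // Decodes a single byte (given as 0..255) as ISO-8859-7 and
    // returns the Unicode value of the resulting character.
    static int decodeGreek(int byteValue) {
        byte[] one = { (byte) byteValue };
        // Assumption: this charset is available in the running VM.
        Charset greek = Charset.forName("ISO-8859-7");
        return new String(one, greek).codePointAt(0);
    }

    public static void main(String[] args) {
        int[] samples = { 228, 229 }; // small delta, small epsilon
        for (int b : samples) {
            System.out.println(b + " -> " + decodeGreek(b)
                + " (expected " + (b + OFFSET) + ")");
        }
    }
}
```

For these two samples, the decoded values are 948 and 949, matching the byte value plus 720.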
As much as I'd like to include complete tables for all Unicode characters, if I did so, this book would be little more than that table. For complete lists of all the Unicode characters and associated glyphs, the canonical reference is The Unicode Standard Version 4.0 by the Unicode Consortium, ISBN 0-321-18578-1. Updates to that book can be found at http://www.unicode.org/. Online charts can be found at http://unicode.org/charts.
About the Author
Elliotte Rusty Harold is originally from New Orleans, to which he returns periodically in search of a decent bowl of gumbo. However, he currently resides in the Prospect Heights neighborhood of Brooklyn with his wife, Beth, and cats Charm (named after the quark) and Marjorie (named after his mother-in-law). He's an adjunct professor of computer science at Polytechnic University, where he teaches Java, XML, and object-oriented programming. His Cafe au Lait web site (http://www.cafeaulait.org) is one of the most popular independent Java sites on the Internet, and his spin-off site, Cafe con Leche (http://www.cafeconleche.org), has become one of the most popular XML sites. He's currently working on the XOM library for XML, the Jaxen XPath engine, and the Amateur media player. His previous books include Java Network Programming (O'Reilly) and Processing XML with Java (Addison-Wesley).