Properties of Characters


Unicode contains about 100,000 characters and is still growing. To manage this multitude, we need to assign classifying and other properties to the characters. The Unicode standard defines a large number of properties, related to things like decomposition, collation (sorting), directionality, and line breaking, as well as Unicode normalization forms. Some of the properties are answers to simple questions like "Is the character a digit?" or (for letters) "What is the corresponding uppercase letter?" Many properties are more technical and intended for use in formal specifications and in programming.
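The simple questions mentioned above can be asked programmatically. The following is a minimal sketch, assuming Python and its standard unicodedata module; the function name describe is made up for this illustration, and str.isdigit() and str.upper() reflect Python's own reading of the Unicode data rather than anything the standard prescribes.

```python
import unicodedata

def describe(ch):
    """Collect a few simple property answers for one character.

    'describe' is a hypothetical helper for this sketch, not a library API.
    """
    return {
        # Formal character name; fall back to a placeholder if none exists.
        "name": unicodedata.name(ch, "<unnamed>"),
        # "Is the character a digit?" as Python interprets the question.
        "is_digit": ch.isdigit(),
        # "What is the corresponding uppercase letter?" (may be several
        # characters, or the character itself if there is no mapping).
        "uppercase": ch.upper(),
        # Two-letter General Category value, such as "Ll" or "Nd".
        "general_category": unicodedata.category(ch),
    }

if __name__ == "__main__":
    for ch in ("a", "5", '"'):
        print(describe(ch))
```

Note that an uppercase mapping is not always a single character, which is one reason such questions are better answered from the property data than from hand-written tables.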

This chapter concentrates on properties in a rigorous sense: properties defined for characters in the Unicode standard in an exact, objective, formalized manner. All the properties discussed here differ from purely verbal descriptions of characters in the standard, such as the description of possible glyph variation. For example, the description that the ASCII quotation mark " (U+0022) has a vertical glyph is surely relevant, but not formalized. The same applies to other similar notes in the text of the standard and the annotations in the code charts.

The Unicode standard designates some properties as normative. Such a property is prescriptive in the sense that if a conforming implementation uses the property, it must do so in accordance with its definition. The non-normative properties are called informative. Character properties, even normative properties, are not guaranteed to remain stable, and in practice, some properties have been changed between Unicode versions.
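Because property values can differ between versions, it helps to know which version of the data a program is using. As a hedged illustration, Python's unicodedata module exposes the version of its bundled database and also ships a frozen copy of the Unicode 3.2.0 data; the helper name category_then_and_now is an assumption for this sketch, and the "current" value depends on the Python build you run.

```python
import unicodedata

def category_then_and_now(ch):
    """Return the General Category of ch in Unicode 3.2.0 and in the
    database bundled with the running Python (hypothetical helper)."""
    old = unicodedata.ucd_3_2_0.category(ch)
    new = unicodedata.category(ch)
    return old, new

if __name__ == "__main__":
    # Version of the Unicode data compiled into this Python build.
    print("current data version:", unicodedata.unidata_version)
    print("frozen data version: ", unicodedata.ucd_3_2_0.unidata_version)
    for ch in ("a", "\u00aa"):
        print(f"U+{ord(ch):04X}", category_then_and_now(ch))
```

Comparing the two values for a range of characters is one way to see where a property has been revised or a code point newly assigned.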

The properties discussed here have different uses:

  • They help you to understand correctly the meaning and intended use of a character.

  • They specify default processing rules for characters. Programs can and should implement these rules, overriding them only when application-specific reasons make this sensible.

  • They are used to construct machine-readable information on characters. You can use such information through viewers that let you search and display it, and also through subroutine libraries that let you use the information in programs that you design.
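As an illustration of the last point, here is a minimal sketch of using a subroutine library to read machine-readable property data. It assumes Python's standard unicodedata module; the function property_record and the key names in its result are invented for this example.

```python
import unicodedata

def property_record(ch):
    """Gather several formal property values for one character.

    'property_record' and its key names are hypothetical; the values
    come from the Unicode database bundled with Python.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, ""),
        "general_category": unicodedata.category(ch),
        # Bidirectional class, relevant to directionality.
        "bidi_class": unicodedata.bidirectional(ch),
        # Canonical combining class (0 for non-combining characters).
        "combining_class": unicodedata.combining(ch),
        # Decomposition mapping as hexadecimal code points, or "".
        "decomposition": unicodedata.decomposition(ch),
    }

if __name__ == "__main__":
    # e with acute, combining acute accent, ASCII quotation mark
    for ch in ("\u00e9", "\u0301", "\u0022"):
        print(property_record(ch))
```

A viewer such as the one described next presents essentially the same kind of record, only through a graphical interface instead of a function call.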

Figure 5-1. Viewing characters and their properties in Uniview


Figure 5-1 illustrates the use of an online service, Uniview, for viewing some key properties of characters with a graphical user interface. In the figure, the character itself is shown on the right, with some property values listed under it. Uniview lets you browse and search characters by general category or other properties. Uniview is available at http://people.w3.org/rishida/scripts/uniview/.



Unicode Explained
ISBN: 059610121X
Year: 2006
Pages: 139
