27.2 HTML4 Entity Sets


HTML 4.0 predefines several hundred named entities, many of which are quite useful. For instance, the nonbreaking space is &nbsp;. XML, however, defines only five named entities:


&amp;

The ampersand (&)

&lt;

The less-than sign (<)

&gt;

The greater-than sign (>)

&quot;

The straight double quote (")

&apos;

The straight single quote (')
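One way to see the difference is to hand both kinds of reference to a nonvalidating parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The five predefined entity references need no DTD at all.
root = ET.fromstring("<p>&amp; &lt; &gt; &quot; &apos;</p>")
print(root.text)

# An HTML entity such as &nbsp; is undeclared in plain XML,
# so the parser reports a well-formedness error.
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    nbsp_error = None
except ET.ParseError as err:
    nbsp_error = err
print(nbsp_error)
```

The first document parses cleanly; the second is rejected because nothing has declared nbsp.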

Other needed characters can be inserted with character references in decimal or hexadecimal format. For instance, the nonbreaking space is Unicode character 160 (decimal). Therefore, you can insert it in your document as either &#160; or &#xA0;. If you really want to type it as &nbsp;, you can define this entity reference in your DTD. Doing so requires you to use a character reference:

 <!ENTITY nbsp "&#160;"> 
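The three spellings should be interchangeable once the entity is declared. Here is a short sketch that checks this, again assuming Python's xml.etree.ElementTree; the p element and its content are made up for the example, and the declaration is placed in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal character references name the same character.
decimal = ET.fromstring("<p>a&#160;b</p>").text
hexadecimal = ET.fromstring("<p>a&#xA0;b</p>").text

# Declaring nbsp in the internal DTD subset makes &nbsp; legal too.
doc = '<!DOCTYPE p [ <!ENTITY nbsp "&#160;"> ]><p>a&nbsp;b</p>'
named = ET.fromstring(doc).text

print(decimal == hexadecimal == named)
print(repr(named))
```

All three documents deliver the same text to the application: the letters a and b separated by U+00A0.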

The XHTML 1.0 specification includes three DTD fragments that define the familiar HTML entity references:

Latin-1 characters (http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent)

The non-ASCII, graphic characters included in ISO-8859-1 from code points 160 through 255, shown in Table 27-4

Special characters (http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent)

A few useful letters and punctuation marks not included in Latin-1

Symbols (http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent)

The Greek alphabet, plus various arrows, mathematical operators, and other symbols used in mathematics

Feel free to borrow these entity sets for your own use. They should be included in your document's DTD with these parameter entity references and PUBLIC identifiers:

 <!ENTITY % HTMLlat1 PUBLIC
    "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
 %HTMLlat1;
 <!ENTITY % HTMLspecial PUBLIC
    "-//W3C//ENTITIES Special for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
 %HTMLspecial;
 <!ENTITY % HTMLsymbol PUBLIC
    "-//W3C//ENTITIES Symbols for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
 %HTMLsymbol;

However, we do recommend saving local copies and changing the system identifier to match the new location, rather than downloading the files from http://www.w3.org every time you need to parse a document. You may import one, two, or all three of the sets, depending on what you need. There are no interdependencies among them.
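As an illustration of the local-copy approach, the sketch below points the system identifier at a file in a local directory and has the parser read it from there. It assumes Python's xml.parsers.expat module. The two-entity LAT1_EXCERPT string is only an abbreviated stand-in for a saved copy of xhtml-lat1.ent, and the names parse_text and load_local are hypothetical.

```python
import os
import tempfile
import xml.parsers.expat as expat

# Abbreviated stand-in for a saved copy of xhtml-lat1.ent
# (an assumption for this demo; the real file declares many more entities).
LAT1_EXCERPT = '<!ENTITY nbsp "&#160;">\n<!ENTITY copy "&#169;">\n'

# The system identifier is now a local file name instead of a w3.org URL.
DOC = """<!DOCTYPE p [
  <!ENTITY % HTMLlat1 PUBLIC
     "-//W3C//ENTITIES Latin 1 for XHTML//EN"
     "xhtml-lat1.ent">
  %HTMLlat1;
]>
<p>&copy;&nbsp;2003</p>"""


def parse_text(doc, entity_dir):
    """Return the character data of doc, reading entity sets from entity_dir."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask Expat to read external parameter entities such as %HTMLlat1;
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def load_local(context, base, system_id, public_id):
        # Look only in the local directory; never fetch over the network.
        path = os.path.join(entity_dir, os.path.basename(system_id))
        with open(path, "rb") as f:
            parser.ExternalEntityParserCreate(context).Parse(f.read(), True)
        return 1  # nonzero tells Expat the entity was handled

    parser.ExternalEntityRefHandler = load_local
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)
    return "".join(chunks)


with tempfile.TemporaryDirectory() as entity_dir:
    local_copy = os.path.join(entity_dir, "xhtml-lat1.ent")
    with open(local_copy, "w", encoding="utf-8") as f:
        f.write(LAT1_EXCERPT)
    text = parse_text(DOC, entity_dir)

print(repr(text))
```

With the local file in place, the parser should deliver the copyright sign, a nonbreaking space, and the year without contacting any server.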

Alternatively, you can just use the character references shown in Tables 27-4, 27-5, and 27-6.
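If you have existing text that already uses the HTML names, the substitution can be automated. The following is a rough sketch, assuming Python's standard html.entities.name2codepoint mapping of HTML 4 entity names to code points; the function name to_char_refs is hypothetical.

```python
import re
from html.entities import name2codepoint

# XML's own five entities must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_char_refs(text):
    """Replace HTML 4 entity references with decimal character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined and unknown names alone
        return "&#%d;" % name2codepoint[name]
    return ENTITY_REF.sub(replace, text)


print(to_char_refs("caf&eacute;&nbsp;&copy; &amp; &lt;"))
# prints: caf&#233;&#160;&#169; &amp; &lt;
```

The result needs no entity declarations at all, so it parses without a DTD.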

Table 27-4. The HTML Latin-1 entity set

XML in a Nutshell
XML in a Nutshell, Third Edition
ISBN: 0596007647
Year: 2003
Pages: 232




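Meaning                                                             XHTML entity reference  Hexadecimal character reference  Decimal character reference
Nonbreaking space                                                   &nbsp;                  &#xA0;                           &#160;
Inverted exclamation mark                                           &iexcl;                 &#xA1;                           &#161;
Cent sign                                                           &cent;                  &#xA2;                           &#162;
Pound sign                                                          &pound;                 &#xA3;                           &#163;
Currency sign                                                       &curren;                &#xA4;                           &#164;
Yen sign, Yuan sign                                                 &yen;                   &#xA5;                           &#165;
Broken vertical bar                                                 &brvbar;                &#xA6;                           &#166;
Section sign                                                        &sect;                  &#xA7;                           &#167;
Dieresis, spacing dieresis                                          &uml;                   &#xA8;                           &#168;
Copyright sign                                                      &copy;                  &#xA9;                           &#169;
Feminine ordinal indicator                                          &ordf;                  &#xAA;                           &#170;
Left-pointing double angle quotation mark, left-pointing guillemet  &laquo;                 &#xAB;                           &#171;
Not sign                                                            &not;                   &#xAC;                           &#172;
Soft hyphen, discretionary hyphen                                   &shy;                   &#xAD;                           &#173;
Registered trademark sign                                           &reg;                   &#xAE;                           &#174;
Macron, overline, APL overbar                                       &macr;                  &#xAF;                           &#175;
Degree sign                                                         &deg;                   &#xB0;                           &#176;
Plus-or-minus sign                                                  &plusmn;                &#xB1;                           &#177;
Superscript digit two, squared                                      &sup2;                  &#xB2;                           &#178;
Superscript digit three, cubed                                      &sup3;                  &#xB3;                           &#179;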