New Characters in XML Names

New Characters in XML Names

XML 1.1 expands the set of characters allowed in XML names (that is, element names, attribute names, entity names, ID-type attribute values, and so forth) to allow characters that were not defined in Unicode 2.0, the version that was extant when XML 1.0 was first defined. Unicode 2.0 is fully adequate to cover the needs of markup in English, French, German, Russian, Chinese, Japanese, Spanish, Danish, Dutch, Arabic, Turkish, Hebrew, Farsi, Thai, Hindi, and most other languages you're likely to be familiar with as well as several thousand you aren't. However, Unicode 2.0 did miss a few important living languages including Mongolian, Yi, Cambodian, Amharic, Dhivehi, and Burmese, so if you want to write your markup in these languages, XML 1.1 is worthwhile.

However, note that this is relevant only if we're talking about markup, particularly element and attribute names. It is not necessary to use XML 1.1 to write XML data, particularly element content and attribute values, in these languages. For example, here's the beginning of an Amharic translation of the Book of Matthew written in XML 1.0.

 <?xml version="1.0" encoding="UTF-8"> <book> graphics/03inl01.gif </verse> </chapter> </book> 

Here the element and attribute names are in English although the content and attribute values are in Amharic. On the other hand, if we were to write the element and attribute names in Amharic, we would need to use XML 1.1.

 <?xml version="1.1" encoding="UTF-8"> graphics/03inl02.gif 

This is plausible. A native Amharic speaker might well want to write markup like this. However, the loosening of XML's name character rules have effects far beyond the few extra languages they're intended to enable. Whereas XML 1.0 is conservative (everything not permitted is forbidden), XML 1.1 is liberal (everything not forbidden is permitted). XML 1.0 lists the characters you can use in names. XML 1.1 lists the characters you can't use in names. Characters XML 1.1 allows in names include:

  • Symbols like the copyright sign ( )

  • Mathematical operators such as ±

  • Superscript 7 ( 7 )

  • The musical symbol for a six-string fretboard

  • The zero-width space

  • Private-use characters

  • Several hundred thousand characters that aren't even defined in Unicode and probably never will be

XML 1.1's lax name character rules have the potential to make documents much more opaque and obfuscated .



Effective XML. 50 Specific Ways to Improve Your XML
Effective XML: 50 Specific Ways to Improve Your XML
ISBN: 0321150406
EAN: 2147483647
Year: 2002
Pages: 144

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net