5.4 XML-Defined Character Sets

     

An XML parser is required to handle the UTF-8 and UTF-16 encodings of Unicode (about which more follows). However, XML parsers are allowed to understand and process many other character sets. In particular, the specification recommends that processors recognize and be able to read these encodings:

UTF-8

UTF-16

ISO-10646-UCS-2

ISO-10646-UCS-4

ISO-8859-1

ISO-8859-2

ISO-8859-3

ISO-8859-4

ISO-8859-5

ISO-8859-6

ISO-8859-7

ISO-8859-8

ISO-8859-9

ISO-2022-JP

Shift_JIS

EUC-JP
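
As a rough illustration of how these names are used, the sketch below places an encoding name in the XML declaration, saves the document as bytes in that encoding, and hands the bytes to a parser. It assumes Python's standard xml.etree.ElementTree module as the parser; the function name parse_with_declared_encoding and the sample text are hypothetical, chosen only for this example. Only UTF-8, UTF-16, and ISO-8859-1 are tried here; other names from the list may or may not be accepted by a given parser.

```python
import xml.etree.ElementTree as ET


def parse_with_declared_encoding(text, encoding):
    """Build a one-element XML document in `encoding`, parse it, return its text."""
    # The encoding declaration names the character set the bytes are stored in.
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    document = f"{declaration}<name>{text}</name>"
    # Encode the whole document, declaration included, in the declared encoding.
    data = document.encode(encoding)
    # The parser reads the declaration and decodes the content accordingly.
    return ET.fromstring(data).text


sample = "Málaga"
utf8_result = parse_with_declared_encoding(sample, "UTF-8")
utf16_result = parse_with_declared_encoding(sample, "UTF-16")
latin1_result = parse_with_declared_encoding(sample, "ISO-8859-1")
```

The same text comes back from all three calls even though the bytes on disk would differ, because the declaration tells the parser how to decode them.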


Many XML processors understand other legacy encodings. For instance, processors written in Java often understand all character sets available in the Java virtual machine. For a list, see http://java.sun.com/products/j2se/1.4.2/docs/guide/intl/encoding.doc.html. Furthermore, some processors may recognize aliases for these encodings; both Latin-1 and 8859_1 are sometimes used as synonyms for ISO-8859-1. However, using these nonstandard names limits your document's portability. We recommend that you use standard names for standard encodings. For encodings whose standard name isn't given by the XML 1.0 specification, use one of the names registered with the Internet Assigned Numbers Authority (IANA), listed at ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.

Knowing the name of a character set and saving a file in that set does not mean that your XML parser can read such a file, however. XML parsers are only required to support UTF-8 and UTF-16. They are not required to support the hundreds of different legacy encodings used around the world.
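
To make the point about aliases concrete, here is a minimal sketch that asks one particular runtime whether two encoding names resolve to the same character set. It uses Python's codecs registry, which is not an XML parser and is only one example of an alias table; the helper name same_encoding is hypothetical. Another processor may accept a different set of aliases, which is why nonstandard names are less portable.

```python
import codecs


def same_encoding(name_a, name_b):
    """Return True if this runtime resolves both names to the same codec."""
    try:
        # codecs.lookup normalizes the name and returns the registered codec.
        return codecs.lookup(name_a).name == codecs.lookup(name_b).name
    except LookupError:
        # At least one name is not known to this runtime at all.
        return False


# "Latin-1" is accepted here as an alias for the standard name ISO-8859-1.
alias_matches = same_encoding("ISO-8859-1", "Latin-1")
# A made-up name is simply unknown, so no match can be established.
unknown_matches = same_encoding("ISO-8859-1", "no-such-encoding")
```

A name that one runtime recognizes as an alias can be rejected by another, whereas the standard name is the one most likely to be understood everywhere.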



XML in a Nutshell
XML in a Nutshell, Third Edition
ISBN: 0596007647
EAN: 2147483647
Year: 2003
Pages: 232
