Item 10. White Space Matters

XML defines white space as the Unicode characters space (0x20), carriage return (0x0D), line feed (0x0A), and tab (0x09), as well as any combination of them. Other invisible characters, such as the byte order mark and the nonbreaking space (0xA0), are treated the same as visible characters, such as A and $.
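To make the definition concrete, here is a minimal Python sketch of a check for XML white space. The names `XML_WHITESPACE` and `is_xml_whitespace` are hypothetical helpers introduced only for illustration; they are not part of any XML library.

```python
# The four characters XML treats as white space:
# space (0x20), tab (0x09), carriage return (0x0D), line feed (0x0A).
XML_WHITESPACE = frozenset("\x20\x09\x0D\x0A")


def is_xml_whitespace(text: str) -> bool:
    """Return True if text is non-empty and made up only of XML white space."""
    return bool(text) and all(ch in XML_WHITESPACE for ch in text)


# Any combination of the four characters counts as white space.
mixed = is_xml_whitespace(" \t\r\n")

# The nonbreaking space (0xA0) and the byte order mark (0xFEFF) do not.
nbsp = is_xml_whitespace("\u00a0")
bom = is_xml_whitespace("\ufeff")
```

Note that Python's built-in `str.isspace()` uses the broader Unicode definition and does count the nonbreaking space as white space, so it is not a safe substitute when the XML definition is what you need.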

White space is significant in XML character data. This can be a little surprising to programmers who are used to languages like Java where white space mostly isn't significant. However, remember that XML is a markup language, not a programming language. An XML document contains data, not code. The data parts of a program (that is, the string literals) are precisely where white space does matter in traditional code. Thus it really shouldn't be a huge surprise that white space is significant in XML.

For example, the following two shape elements are not the same.

 <shape>star</shape>
 <shape>      star </shape>

Depending on the context, a particular XML application may choose to treat these two elements the same. However, an XML parser will faithfully report all the data in both shape elements to the client application. If the client application chooses to trim the extra white space from the content of the second element, that's the client application's business. XML has nothing to do with it.
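The division of labor between parser and client can be sketched with Python's standard `xml.etree.ElementTree` module. The wrapper element `shapes` is an assumption added only so that both `shape` elements fit into one well-formed document; trimming with `str.strip()` is just one choice a client application might make.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element "shapes" holds the two shape elements.
doc = "<shapes><shape>star</shape><shape>      star </shape></shapes>"
root = ET.fromstring(doc)

# The parser reports the character data exactly as it appears.
first, second = (element.text for element in root.findall("shape"))
reported_equal = first == second

# Trimming is a decision the client application makes after parsing.
trimmed_equal = first.strip() == second.strip()

print(repr(first), repr(second), reported_equal, trimmed_equal)
```

Here the parser hands back two different strings, and they compare equal only after the client code trims them.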



Effective XML. 50 Specific Ways to Improve Your XML
ISBN: 0321150406
Year: 2002
Pages: 144
