2.5 References

     

The character data inside an element must not contain a raw, unescaped opening angle bracket (<). This character is always interpreted as beginning a tag. If you need to use this character in your text, you can escape it using the entity reference &lt;, the numeric character reference &#60;, or the hexadecimal numeric character reference &#x3C;. When a parser reads the document, it replaces any &lt;, &#60;, or &#x3C; references it finds with the actual < character. However, it will not confuse the references with the starts of tags. For example:

 <SCRIPT LANGUAGE="JavaScript">
   if (location.host.toLowerCase().indexOf("ibiblio") &lt; 0) {
     location.href="http://ibiblio.org/xml/";
   }
 </SCRIPT>
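To watch the replacement happen, here is a minimal sketch using the xml.etree.ElementTree parser from Python's standard library. The element name test and the sample text are invented for illustration, and the sketch exercises only this one parser, although any conforming parser should report the same character data.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same escaped character: the named entity
# reference, the decimal character reference, and the hexadecimal one.
documents = [
    "<test>a &lt; b</test>",
    "<test>a &#60; b</test>",
    "<test>a &#x3C; b</test>",
]

# After parsing, each element's text holds the literal < character.
parsed_texts = [ET.fromstring(document).text for document in documents]
print(parsed_texts)
```

All three documents yield the same text, so an application reading the parsed content cannot tell which form of reference the author used.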

Character data may not contain a raw, unescaped ampersand (&) either. This is always interpreted as beginning an entity reference. However, the ampersand may be escaped using the &amp; entity reference like this:

 <company>W.L. Gore &amp; Associates</company> 

The ampersand is code point 38, so it could also be written with the numeric character reference &#38;:

 <company>W.L. Gore &#38; Associates</company> 
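When XML is generated by a program, the escaping is usually left to a library function rather than done by hand. The following sketch assumes Python and its standard xml.sax.saxutils.escape helper; the variable names are illustrative only. It also shows that the parser rejects the raw ampersand and accepts both escaped forms.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

company_name = "W.L. Gore & Associates"

# A raw ampersand makes the document malformed, so parsing fails.
try:
    ET.fromstring("<company>" + company_name + "</company>")
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# escape() rewrites &, < and > as entity references.
escaped_name = escape(company_name)
from_entity = ET.fromstring("<company>" + escaped_name + "</company>").text

# The numeric character reference &#38; resolves to the same character.
from_numeric = ET.fromstring("<company>W.L. Gore &#38; Associates</company>").text

print(raw_is_well_formed, escaped_name, from_entity == from_numeric)
```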

Entity references such as &amp; and character references such as &#60; are markup. When an application parses an XML document, it replaces this particular markup with the actual character or characters the reference refers to.

XML predefines exactly five entity references. These are:


&lt;
    The less-than sign, a.k.a. the opening angle bracket (<)

&amp;
    The ampersand (&)

&gt;
    The greater-than sign, a.k.a. the closing angle bracket (>)

&quot;
    The straight double quotation mark (")

&apos;
    The apostrophe, a.k.a. the straight single quote (')
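The list above can be checked against a real parser. This is a small sketch, assuming Python's xml.etree.ElementTree; the element name c is arbitrary, and &nbsp; stands in for any entity name that XML does not predefine and that the document has not declared.

```python
import xml.etree.ElementTree as ET

# The five predefined entity references and the characters they stand for.
predefined = {
    "&lt;": "<",
    "&amp;": "&",
    "&gt;": ">",
    "&quot;": '"',
    "&apos;": "'",
}

# Parse each reference as the sole content of an element.
resolved = {
    reference: ET.fromstring("<c>" + reference + "</c>").text
    for reference in predefined
}

# Any other entity name must be declared before it can be used.
try:
    ET.fromstring("<c>&nbsp;</c>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

print(resolved == predefined, undeclared_accepted)
```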

Only &lt; and &amp; must be used instead of the literal characters in element content. The others are optional. &quot; and &apos; are useful inside attribute values where a raw " or ' might be misconstrued as ending the attribute value. For example, this image tag uses the &apos; entity reference to fill in the apostrophe in "O'Reilly":

 <image source='oreilly_koala3.gif' width='122' height='66'
        alt='Powered by O&apos;Reilly Books' />
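The effect of &apos; in a single-quoted attribute value can be sketched as follows, again assuming Python's standard library. The shortened image tags are illustrative variants of the one above. The quoteattr helper from xml.sax.saxutils is one way a program might pick delimiters automatically when writing attributes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# With &apos;, the apostrophe is part of the value, not its end.
image = ET.fromstring(
    "<image source='oreilly_koala3.gif' alt='Powered by O&apos;Reilly Books' />"
)
alt_text = image.get("alt")

# With a raw apostrophe, the value ends after "O" and the tag is malformed.
try:
    ET.fromstring("<image alt='Powered by O'Reilly Books' />")
    raw_apostrophe_accepted = True
except ET.ParseError:
    raw_apostrophe_accepted = False

# quoteattr() returns the value with delimiters that do not clash with it.
quoted = quoteattr(alt_text)

print(alt_text, raw_apostrophe_accepted, quoted)
```

Because this value contains an apostrophe but no double quote, quoteattr simply wraps it in double quotes, which is the other common way to avoid the problem.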

An unescaped greater-than sign (>) cannot be misinterpreted as closing a tag it wasn't meant to close, so it rarely needs to be escaped. &gt; is allowed mostly for symmetry with &lt;.

There is one unusual case where the greater-than sign does need to be escaped. The three-character sequence ]]> cannot appear in character data, because it is the delimiter that ends a CDATA section. Instead you have to write it as ]]&gt;.
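This case tends to come up when text such as program source is embedded in a document. The sketch below assumes Python's xml.sax.saxutils.escape, which escapes every > and therefore covers ]]> as a side effect. The sample expression and the element name code are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical character data that happens to contain the ]]> sequence.
raw_text = "a[b[0]]>c"

# escape() turns every > into &gt;, so ]]> becomes ]]&gt;.
escaped_text = escape(raw_text)

# Parsing the escaped form gives back the original text.
round_tripped = ET.fromstring("<code>" + escaped_text + "</code>").text

print(escaped_text, round_tripped)
```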


In addition to the five predefined entity references, you can define others in the document type definition. We'll discuss how to do this in Chapter 3.

Entity and character references can only be used in element content and attribute values. They cannot be used in element names, attribute names, or other markup. Text like &amp; or &#60; may appear inside a comment or a processing instruction, but in these places it is not resolved. The parser leaves it as literal text.
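The difference between the two kinds of location can be observed with a parser that keeps comments and processing instructions. This sketch assumes Python's xml.dom.minidom; the note element and the render processing instruction are made-up names.

```python
from xml.dom import minidom

# Hypothetical document: the same &amp; reference appears in a comment,
# in a processing instruction, and in ordinary element content.
source = "<note><!-- R&amp;D --><?render R&amp;D?>R&amp;D</note>"

root = minidom.parseString(source).documentElement
comment, instruction, text = root.childNodes

print(comment.data)      # comment body, reference left as written
print(instruction.data)  # processing instruction data, left as written
print(text.data)         # element content, reference resolved
```

Only the text node contains a real ampersand. The comment and the processing instruction still hold the five characters &amp; exactly as they were typed.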



XML in a Nutshell, Third Edition
ISBN: 0596007647
Year: 2003
Pages: 232
