Using Special Characters in XML


One of the frustrations developers encounter when using XML is trying to use special characters inside an otherwise well-formed XML document. For example, let's say we want to embed some text with an ampersand inside of a document:

 <company name="Baker & Associates" location="CA">   ... </company> 

An XML parser cannot process that document: a bare ampersand marks the start of an entity reference, so the attribute value is not well formed. This section shows you two different ways to deal with this problem: entity references and CDATA sections.
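To see the failure for yourself, you can hand the same markup to XmlParse(). The following is a minimal sketch, not a listing from this chapter; the variable names badXml and doc are made up for illustration.

```cfml
<!--- Hypothetical test string: the element shown above, with a bare ampersand. --->
<cfset badXml = '<company name="Baker & Associates" location="CA"></company>'>

<cftry>
  <!--- XmlParse() throws an exception when its input is not well formed. --->
  <cfset doc = XmlParse(badXml)>
  <cfoutput>Parsed successfully.</cfoutput>
  <cfcatch type="any">
    <!--- The exact message depends on the underlying XML parser. --->
    <cfoutput>Parse failed: #HTMLEditFormat(cfcatch.message)#</cfoutput>
  </cfcatch>
</cftry>
```

With the ampersand left unescaped, the cfcatch branch should be the one that runs.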

Entity References

If you've ever used a sequence such as &nbsp; or &amp; within HTML or another markup language, then you've used entity references. Entity references are a way to encode special characters so that they can be used within an XML document without affecting parsers' abilities to read them. Here's an example:

 <company name="Baker &amp; Associates" location="CA">   ... </company> 

An entity reference begins with an ampersand and ends with a semicolon. Following are some of the entity references you may have seen in HTML:

 &nbsp; (nonbreaking space)
 &amp; (ampersand)
 &iexcl; (inverted exclamation mark)
 &Uuml; (capital U with umlaut)

Whenever a browser sees one of these entity references, the browser knows to display the corresponding character instead.

HTML defines approximately 250 entity references. XML, however, predefines only five:

 &amp; (ampersand)
 &lt; (less-than sign)
 &gt; (greater-than sign)
 &quot; (double-quote)
 &apos; (single-quote)

Any other character (for example, a character above ASCII 127) can be written as a numeric character reference, which is the safest choice when you cannot rely on the document's encoding to carry the character directly. A character reference looks like one of these two examples:

 &#160; &#xa0; 

Both of these examples represent a nonbreaking space, character 160 (U+00A0). The &# prefix introduces a decimal number, and the &#x prefix introduces a hexadecimal (base 16) number.
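A quick way to confirm that the decimal and hexadecimal forms name the same character is to parse both and inspect the result. This is only a sketch; the sample element and the variable names are hypothetical. Note that a pound sign inside a CFML string must be doubled, which is why the references appear as &## in the code.

```cfml
<!--- "##" inside a CFML string produces one literal "#", so the parser
      receives &#160; followed by &#xa0;. --->
<cfset doc = XmlParse("<sample>&##160;&##xa0;</sample>")>
<cfset resolved = doc.sample.XmlText>

<cfoutput>
  Characters: #Len(resolved)#<br>
  First character code: #Asc(Left(resolved, 1))#<br>
  Last character code: #Asc(Right(resolved, 1))#
</cfoutput>
```

If the parser resolves both references as described, the two character codes reported should be identical.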

This issue of special character encoding is why you should always use XmlFormat() when putting data into an XML document. XmlFormat() automatically handles special characters by replacing them with the appropriate entity and character references before they go into the document.
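As a rough illustration of that advice, the sketch below escapes a value with XmlFormat() before placing it in an attribute. The variable names companyName and companyDoc are assumptions made for this example and do not come from the chapter's listings.

```cfml
<!--- Hypothetical value that contains a special character. --->
<cfset companyName = "Baker & Associates">

<!--- XmlFormat() escapes the ampersand before the text is parsed as XML. --->
<cfxml variable="companyDoc">
  <cfoutput><company name="#XmlFormat(companyName)#" location="CA"/></cfoutput>
</cfxml>

<!--- Show the serialized document; HTMLEditFormat() only makes it visible in a browser. --->
<cfoutput><pre>#HTMLEditFormat(ToString(companyDoc))#</pre></cfoutput>
```

Removing the XmlFormat() call from this sketch should cause the cfxml block to fail, because the raw ampersand would reach the parser.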

CDATA Sections

XmlFormat() is well suited for special characters that may occur in isolated areas of your XML document, but what if you have a section of your document that represents HTML markup with lots of special characters, or tags that don't parse correctly because they're not well formed?

In situations like these, XmlFormat() can go from being a blessing to being a curse, as a result of all the escaping that's done to the stored text. A good alternative is to use a CDATA section, as shown in Listing 14.8.

Listing 14.8. Namespaces.xml: A Portion of Listing 14.1, Using CDATA Rather Than Regular Text
 <company name="ABC MegaCorp, Inc." location="NY">
   <comments>
     <![CDATA[
       <P>A very large company with 4 divisions:
       <UL>
         <LI>Financial Services
         <LI>Baby Food
         <LI>Large Vehicles
         <LI>Fashion Consultation
       </UL>
     ]]>
   </comments>
   <employee ssn="123-45-6789">
     ...
   </employee>
   <employee ssn="541-29-8376">
     ...
   </employee>
 </company>

The content inside the CDATA section is unparsed, so it can contain any characters, even the special characters that are normally unusable in XML content. The only restriction is that you cannot use the sequence ]]> except to end the CDATA block.
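If your content might contain ]]> itself, one common workaround (a general XML technique, not something this chapter's listings use) is to split the sequence across two adjacent CDATA sections. The sketch below assumes hypothetical variable names and builds the markup as a string.

```cfml
<!--- Hypothetical content that happens to contain the forbidden sequence. --->
<cfset raw = "x = a[b[0]]>0">

<!--- End the first CDATA section after "]]" and start a new one before ">". --->
<cfset safe = Replace(raw, "]]>", "]]]]><![CDATA[>", "all")>

<cfset doc = XmlParse("<comments><![CDATA[" & safe & "]]></comments>")>

<!--- A parser treats the adjacent sections as one run of text. --->
<cfoutput>#HTMLEditFormat(doc.comments.XmlText)#</cfoutput>
```

The parsed text should match the original string, because the parser simply concatenates the two sections.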

To use CDATA within ColdFusion, use a node's xmlCData property rather than its xmlText property. The content you place into xmlCData is automatically placed into a CDATA section within the document, whereas content inside of xmlText is encoded and not put inside of a CDATA section:

 <cfset myXmlNode.xmlCData = "<P>This is content I do not want to escape"> 
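To compare the two properties side by side, the following sketch builds a small document from scratch and stores the same string both ways. The element names comments and summary, and the variable doc, are chosen only for illustration.

```cfml
<!--- Build a small document with two child elements. --->
<cfset doc = XmlNew()>
<cfset doc.XmlRoot = XmlElemNew(doc, "company")>
<cfset ArrayAppend(doc.company.XmlChildren, XmlElemNew(doc, "comments"))>
<cfset ArrayAppend(doc.company.XmlChildren, XmlElemNew(doc, "summary"))>

<!--- Same string, stored once as CDATA and once as regular text. --->
<cfset doc.company.comments.XmlCData = "<P>A very large company with 4 divisions:">
<cfset doc.company.summary.XmlText = "<P>A very large company with 4 divisions:">

<!--- Serialize the document so the difference is visible in a browser. --->
<cfoutput><pre>#HTMLEditFormat(ToString(doc))#</pre></cfoutput>
```

In the serialized output, the comments element should hold the text inside a CDATA section, while the summary element should hold the escaped form (&lt;P&gt; and so on).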

CDATA sections are also ideal for long passages of text with many special characters, because each escaped character takes four or more characters in place of one (&lt; is four characters long, and &quot; is six). The disadvantage to using a CDATA section is slightly more complex code.

From this point forward we will not use any more CDATA sections, in order to keep the focus on new topics.



Advanced Macromedia ColdFusion MX 7 Application Development
ISBN: 0321292693
Year: 2006
Pages: 240
Authors: Ben Forta, et al
