Section 15.7. Suppressing markup recognition | XML in Office 2003: Information Sharing with Desktop XML



15.7. Suppressing markup recognition

Sometimes, when you are creating an XML document, you want to protect certain characters from being interpreted as markup. Imagine, for example, that you are writing a user's guide to HTML. You would need a way to include an example of markup. Your first attempt might be to create a sample element and put the HTML inside it, as in Example 15-20.

Example 15-20. An invalid approach to HTML examples in XML

 <p>HTML documents must start with a DOCTYPE, etc. etc.
 This is an example of a small HTML document:
 <sample>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <HTML>
    <HEAD>
    <TITLE>A document's title</TITLE>
    </HEAD>
    <H1>A document's title</H1>
    </HTML>
 </sample>
 </p>

This will not work, however, because the angle brackets that are supposed to represent HTML markup will be interpreted as if they belonged to the XML document you are creating, not the mythical HTML document in the example. Your XML parser will complain that it is not appropriate to have an HTML DOCTYPE declaration in the middle of an XML document!
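You can see the failure concretely with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the choice of Python and the helper name `parses` are illustrative assumptions. The first string is a trimmed version of Example 15-20. The second string drops the DOCTYPE to show that, even when the fragment happens to be well-formed, the HTML tags become XML elements instead of sample text.

```python
import xml.etree.ElementTree as ET

# A trimmed version of Example 15-20: raw HTML markup inside <sample>.
invalid = """<p>This is an example of a small HTML document:
<sample>
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
  <HTML><H1>A document's title</H1></HTML>
</sample>
</p>"""

def parses(text):
    """Illustrative helper: True if text is well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A DOCTYPE declaration is not allowed inside element content.
print(parses(invalid))  # False

# Without the DOCTYPE the fragment parses, but <HTML> is now an element
# of the XML document itself, not text for the reader of the guide.
no_doctype = "<sample><HTML><H1>A document's title</H1></HTML></sample>"
sample = ET.fromstring(no_doctype)
print(sample[0].tag)  # HTML
print(sample.text)    # None: the markup produced elements, not text
```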

There are two solutions to this problem: CDATA sections and predefined entities.

15.7.1 CDATA sections

A construct called a CDATA section allows you to ask the parser to suspend markup recognition in a large chunk of text: "Hands off! This isn't meant to be interpreted."

CDATA stands for "character data". You can mark a section as being character data using the syntax shown in Example 15-21.

Example 15-21. Writing about HTML in a CDATA section

 <![CDATA[
 <HTML>
 This is an example from HTML for Dumbbells!
 <p>It may be a pain to write a book about HTML in HTML,
 but it is easy in XML!
 </HTML>
 ]]>

The opening delimiter <![CDATA[ marks the start of the CDATA section, and the closing delimiter ]]> marks its end. The closing delimiter is called CDEnd. It may be used only to close a CDATA section and must not appear anywhere else in the document's character data. This also means that a CDATA section cannot itself contain the string ]]>.
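The following sketch, again using Python's `xml.etree.ElementTree`, parses Example 15-21 and checks the CDEnd rule. A CDATA section must sit inside an element, so the example is wrapped in a `<sample>` element here. That wrapper and the helper name `is_well_formed` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Example 15-21, wrapped in a <sample> element to form a complete document.
doc = """<sample><![CDATA[
<HTML>
This is an example from HTML for Dumbbells!
<p>It may be a pain to write a book about HTML in HTML,
but it is easy in XML!
</HTML>
]]></sample>"""

sample = ET.fromstring(doc)

# Everything between <![CDATA[ and ]]> arrives as plain character data ...
print(sample.text.strip().splitlines()[0])  # <HTML>
# ... and no child elements were created from the HTML tags.
print(len(sample))  # 0

def is_well_formed(text):
    """Illustrative helper: True if text parses as XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# CDEnd (]]>) outside a CDATA section is rejected in ordinary content.
print(is_well_formed("<sample>a ]]> b</sample>"))  # False
```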

15.7.2 Predefined entities

Predefined entities allow an author to represent individual data characters that would otherwise be interpreted as markup. There are five of them, shown in Table 15-1, along with the markup interpretations that they avoid.

Table 15-1. Predefined entities

Entity reference	Character	Markup not recognized
&amp;	`&`	Entity or character reference
&lt;	`<`	Tag
&gt;	`>`	CDEnd
&apos;	`'`	Literal
&quot;	`"`	Literal
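Applying the table mechanically is straightforward. The sketch below is a hypothetical Python helper, `escape_markup`, that replaces each of the five characters with its predefined entity reference. It uses `str.translate`, which handles every character in a single pass, so the `&` inside an inserted reference is never escaped a second time. Escaping all five is more than strictly necessary in element content, but it mirrors the table. Python's standard library also provides `xml.sax.saxutils.escape`, which by default escapes only `&`, `<`, and `>`.

```python
import xml.etree.ElementTree as ET

# One row per predefined entity in Table 15-1.
PREDEFINED = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "'": "&apos;",
    '"': "&quot;",
})

def escape_markup(text):
    """Hypothetical helper: replace markup characters with entity references."""
    return text.translate(PREDEFINED)

html = "<TITLE>A document's title</TITLE>"
escaped = escape_markup(html)
print(escaped)
# &lt;TITLE&gt;A document&apos;s title&lt;/TITLE&gt;

# Round trip: the parser turns the references back into plain characters.
sample = ET.fromstring("<sample>" + escaped + "</sample>")
print(sample.text == html)  # True
```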

We can use references to the predefined entities to insert these characters, instead of typing them directly. Then they will not be interpreted as markup. Example 15-22 demonstrates this.

Example 15-22. Writing about HTML with predefined entities

 <p>HTML documents must start with a DOCTYPE, etc. etc.
 This is an example of a small HTML document:
 <sample>
    &lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    &lt;HTML>
    &lt;HEAD>
    &lt;TITLE>A document's title
    &lt;/TITLE>
    &lt;/HEAD>
    &lt;/HTML>
 </sample>
 </p>

When your XML parser parses the document, it will replace the entity references with actual characters. It will not interpret the characters it inserts as markup, but as "plain old data characters" (character data).
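The sketch below shows that replacement happening. It uses Python's `xml.etree.ElementTree` as an illustrative parser and closes the `<p>` element so that Example 15-22 forms a complete document. Only `<` needs escaping here. A lone `>` in content is not mistaken for markup, so the example leaves it as is.

```python
import xml.etree.ElementTree as ET

# Example 15-22, with the <p> element closed so it forms a complete document.
doc = """<p>HTML documents must start with a DOCTYPE, etc. etc.
This is an example of a small HTML document:
<sample>
   &lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
   &lt;HTML>
   &lt;HEAD>
   &lt;TITLE>A document's title
   &lt;/TITLE>
   &lt;/HEAD>
   &lt;/HTML>
</sample>
</p>"""

sample = ET.fromstring(doc).find("sample")

# Each &lt; has become a literal "<" in the character data ...
print(sample.text.split()[0])  # <!DOCTYPE
# ... and the parser created no elements for the HTML tags.
print(len(sample))  # 0
```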

Predefined entities and CDATA sections affect only how the parser reads the markup, not the data of the document that the markup represents. A < written as &lt; and a < written inside a CDATA section both reach the application as the same plain < character.
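As a quick check of this point, the Python sketch below parses three spellings of the same text with `xml.etree.ElementTree`: one using a CDATA section, one using predefined entities, and one mixing both. All three yield identical character data. ElementTree does not report where CDATA sections began or ended. Some other interfaces, such as the DOM with its CDATASection nodes, can expose that boundary, but the characters themselves are the same.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character data.
spellings = [
    "<sample><![CDATA[<B>bold & brave</B>]]></sample>",      # CDATA section
    "<sample>&lt;B&gt;bold &amp; brave&lt;/B&gt;</sample>",  # predefined entities
    "<sample>&lt;B>bold <![CDATA[&]]> brave&lt;/B></sample>", # a mix of both
]

texts = [ET.fromstring(s).text for s in spellings]
print(texts)
# ['<B>bold & brave</B>', '<B>bold & brave</B>', '<B>bold & brave</B>']
```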

