7.3. XML Document Syntax

Now let's look at some of the particulars of XML syntax using this simple XML document:

    <?xml version="1.0" encoding="US-ASCII" standalone="no"?>
    <!DOCTYPE accounts SYSTEM "simple.dtd">
    <accounts>
      <customer>
        <name>
          <firstname>Bobby</firstname>
          <lastname>Five</lastname>
        </name>
        <accountNumber>4456</accountNumber>
        <balance>111.32</balance>
      </customer>
      <!-- more customers will be added soon -->
      <?php print date('Fj, Y'); ?>
    </accounts>

7.3.1. XML Declaration

The first line of the example is the XML declaration.

    <?xml version="1.0" encoding="US-ASCII" standalone="no"?>

The XML declaration contains special information for the XML parser. First, the version attribute tells the parser that this is an XML document that conforms to Version 1.0 of the XML standard. (An XML 1.1 Recommendation also exists, but 1.0 is by far the most widely used value.)

In addition, the encoding attribute specifies which character encoding the document uses. By default, XML uses the UTF-8 encoding of the Unicode character set (the most complete character set, covering characters from most of the world's languages). Alternate encodings may also be specified, such as ISO-8859-1 (Latin-1), which contains characters from most Western European languages. Character encodings are discussed in more detail in Chapter 6.

Finally, the optional standalone="no" attribute informs the program that an outside DTD is needed to correctly interpret the document. If the value of standalone is yes, it means there is no DTD or the DTD is included in the document.

XML documents should begin with an XML declaration, but it is not required.

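As a rough illustration of how a parser sees these three attributes, the following Python sketch feeds a declaration to the Expat binding in the standard library. The function name read_declaration, the handler name on_decl, and the trimmed-down document are assumptions made for this example; they are not part of the original listing.

```python
import xml.parsers.expat

# A trimmed-down stand-in for the example document. It is passed as bytes
# so that the parser itself interprets the declared encoding.
DOC = (
    b'<?xml version="1.0" encoding="US-ASCII" standalone="no"?>\n'
    b'<accounts><customer/></accounts>'
)


def read_declaration(data):
    """Return (version, encoding, standalone), or None if no declaration."""
    found = []

    def on_decl(version, encoding, standalone):
        # Expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted).
        found.append((version, encoding, standalone))

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return found[0] if found else None


declaration = read_declaration(DOC)
no_declaration = read_declaration(b'<accounts/>')
```

Here declaration holds the values read from the first line, while no_declaration is None because the second document omits the declaration, which is permitted.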
7.3.2. Document Type Declaration

The example also includes a document type (DOCTYPE) declaration.

    <!DOCTYPE accounts SYSTEM "simple.dtd">

The purpose of the DOCTYPE declaration is to refer to the DTD against which the document should be compared for validity. The declaration identifies the root element of the document (accounts, in the example). It also provides a pointer to the DTD itself. DOCTYPE declarations are discussed in the "DTD Syntax" section later in this chapter and again in Chapter 9 as they apply to XHTML.

Together, the XML declaration and DOCTYPE are often referred to as the document prolog. For XML languages that don't use DTDs, the entire prolog is optional. For languages with DTDs, the DOCTYPE declaration is required for the document to validate.

7.3.3. Comments

You can leave notes within an XML document in the form of a comment. Comments begin with <!-- and end with -->. If you've used comments in HTML, this syntax should be familiar. The example document contains the comment:

    <!-- more customers will be added soon -->

Comments are not elements and, therefore, do not affect the structure of the document. They may be placed anywhere in a document except before an XML declaration or within a tag or another comment.

7.3.4. Processing Instructions

A processing instruction is a method for passing information to applications that may read the document. It may also include the program or script itself. Unlike comments, which are intended for humans, processing instructions are for computer programs or scripts. Processing instructions are indicated by <? at the beginning and ?> at the end of the instruction. The example document includes a processing instruction for a simple PHP command that displays the current date.

    <?php print date('Fj, Y'); ?>

7.3.5. Entity References

Isolated markup characters (in particular < and &) are not permitted in the flow of text in an XML document and must be escaped using either a numeric character reference or a predefined character entity. The > character is conventionally escaped as well. This is to avoid having the XML parser interpret any < symbol as the beginning of a new tag. In addition to using entity references in the content of the document, you must use them in attribute values.

XML defines five character entities for use in all XML languages, listed in Table 7-1. Other entities may be defined in a DTD.

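To make the escaping rule concrete, here is a minimal Python sketch using the standard library's xml.sax.saxutils helpers. The note element, its owner attribute, and the sample strings are hypothetical and chosen only for illustration.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical text that contains markup characters.
raw_text = 'if (a < b && b > 0) print "ok"'

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;.
escaped_text = escape(raw_text)

# quoteattr() escapes an attribute value and wraps it in suitable quotes.
attr = quoteattr('Five & "Sons"')

fragment = '<note owner=%s>%s</note>' % (attr, escaped_text)

# On parsing, the references are turned back into the original characters.
note = ET.fromstring(fragment)

# Without escaping, the same text is not well-formed and the parser rejects it.
try:
    ET.fromstring('<note>%s</note>' % raw_text)
    unescaped_parses = True
except ET.ParseError:
    unescaped_parses = False
```

After parsing, note.text and note.get("owner") contain the original, unescaped strings, so the escaping is visible only in the serialized document.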
If you have a document that uses a lot of special characters, such as an example of source code, you can tell the XML parser that the text is simple character data (CDATA) and should not be parsed. To protect content from parsing, enclose it in a CDATA section, indicated by <![CDATA[ ... ]]>. This XHTML example uses a CDATA section to display sample markup on a web page without requiring every < and > character to be escaped:

    <p>This is sample SMIL markup:</p>
    <![CDATA[
    <audio src="audio_file.mp3" begin="0s" />
    <seq>
      <img src="image_1.jpg" begin="0s" />
      <img src="image_2.jpg" begin="5s" />
    </seq>
    ]]>

The five reserved characters (listed in Table 7-1) are also put to frequent use when writing scripts (such as JavaScript), making it necessary to designate those blocks of content as CDATA so that XML parsers do not interpret their contents as markup.
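The effect of a CDATA section can be checked with a short Python sketch based on the standard library's xml.dom.minidom. The wrapper element pre and the shortened SMIL string are assumptions made for this example.

```python
from xml.dom import minidom

# A shortened stand-in for the SMIL sample; it contains raw < and > characters.
sample = '<seq> <img src="image_1.jpg" begin="0s" /> </seq>'

# Wrap the sample in a CDATA section inside a hypothetical <pre> element.
doc = minidom.parseString('<pre><![CDATA[' + sample + ']]></pre>')

node = doc.documentElement.firstChild

# minidom keeps the section as a CDATASection node holding the literal text.
is_cdata = node.nodeType == node.CDATA_SECTION_NODE
text = node.data

# Nothing inside the section was parsed, so no img elements exist in the tree.
img_count = len(doc.getElementsByTagName('img'))
```

The parser hands back the sample exactly as written, and the tags inside the section never become elements in the document tree.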