Valid XML Documents

XML documents that have been checked successfully against a declared structure are called valid documents; in particular, an XML document is considered valid if there is a document type definition (DTD) or XML schema associated with it and if the document complies with that DTD or schema. That's all there is to making a document valid. This chapter is all about creating basic DTDs. In the next chapter, I'll elaborate on the DTDs we create here and show how to declare entities, attributes, and notations.

You can find the formal rules for DTDs in the XML 1.0 recommendation, www.w3.org/TR/REC-xml, which also appears in Appendix A, "The XML Specification." The constraints that documents and DTDs must adhere to in order to create a valid document are marked with the text "Validity Constraint."

Note that DTDs are all about specifying the structure and syntax of XML documents (not their content). Various organizations can share a DTD to put an XML application into practice. We've seen quite a few examples of XML applications in Chapter 1, "Essential XML," and those applications can all be enforced with DTDs that the various organizations make public. We'll see how to create public DTDs in this chapter.

Most XML parsers, like the one in Internet Explorer, require XML documents to be well formed but not necessarily valid. Most parsers do not require a DTD, but if there is one, a validating parser will use it to check the XML document.
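As a minimal sketch of that distinction (not one of the book's listings), here is a document with no DTD at all. The element names are borrowed from the customer-order example shown later in this section. A nonvalidating parser should accept it because it is well formed, but it cannot be called valid, because there is nothing to validate it against:

```xml
<?xml version="1.0"?>
<!-- Well formed: one root element, every tag closed and properly nested. -->
<!-- Not valid: there is no <!DOCTYPE> declaration, so no DTD is associated. -->
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
    </CUSTOMER>
</DOCUMENT>
```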

IE with DTDs

To see how to get Internet Explorer to work with DTDs, see the section "Validating XML Documents with DTDs in Internet Explorer" in Chapter 7, "Handling XML Documents with JavaScript."

In fact, we saw a DTD at the end of the previous chapter. In that chapter, I set up an example XML document named ch02_01.xml that stored customer orders. At the end of the chapter, I used the DOMWriter program that comes with IBM's XML for Java package to translate the document into canonical XML. To run it through that program, I needed to add a DTD to the document. Here's what it looked like:

Listing ch03_01.xml
<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

In this chapter, I'm going to take this DTD apart to see what makes it tick. Actually, this DTD is a pretty substantial one; to get us started and to show how DTDs work in overview, I'll start with a mini-example first. Here it is:

Listing ch03_02.xml
<?xml version="1.0"?>
<!DOCTYPE THESIS [
    <!ELEMENT THESIS (P*)>
    <!ELEMENT P (#PCDATA)>
]>
<THESIS>
    <P>
        This is my Ph.D. thesis.
    </P>
    <P>
        Pretty good, huh?
    </P>
    <P>
        So, give me a Ph.D. now!
    </P>
</THESIS>

Note the <!DOCTYPE> declaration here. This is a document type declaration (not to be confused with the DTD itself, which is the document type definition). You use a document type declaration to indicate the DTD used for the document. The basic syntax is <!DOCTYPE rootname [ DTD ]> (there are other variations we'll see in this chapter), where rootname is the name of the document's root element and DTD is the document type definition you want to use. DTDs can be internal or external, as we'll see in this chapter. In this case, the DTD is internal:

<?xml version="1.0"?>
<!DOCTYPE THESIS [
    <!ELEMENT THESIS (P*)>
    <!ELEMENT P (#PCDATA)>
]>
<THESIS>
    <P>
        This is my Ph.D. thesis.
    </P>
    <P>
        Pretty good, huh?
    </P>
    <P>
        So, give me a Ph.D. now!
    </P>
</THESIS>

This DTD follows the World Wide Web Consortium (W3C) syntax conventions, which means that I specify the syntax for each element with an <!ELEMENT> declaration. Using this declaration, you can specify that the contents of an element can be parsed character data (#PCDATA), other elements that you've created, or both. In this example, I'm indicating that the <THESIS> element can contain only <P> elements, and that it can contain zero or more of them (which is what the * after P in <!ELEMENT THESIS (P*)> means).
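To illustrate the zero-or-more rule, here is a minimal sketch (not one of the book's listings) of a document that uses the same two declarations but has no <P> elements at all. Because P* permits zero occurrences, a validating parser should accept it:

```xml
<?xml version="1.0"?>
<!DOCTYPE THESIS [
    <!ELEMENT THESIS (P*)>
    <!ELEMENT P (#PCDATA)>
]>
<!-- An empty THESIS satisfies (P*): zero P elements is allowed. -->
<THESIS></THESIS>
```

Had the content model been written with + instead of * (that is, (P+), meaning one or more), this same document would fail validation, because at least one <P> element would then be required.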

In addition to defining the <THESIS> element, I define the <P> element so that it can hold only text (that is, parsed character data, which is pure text without any markup) with the term #PCDATA:

<?xml version="1.0"?>
<!DOCTYPE THESIS [
    <!ELEMENT THESIS (P*)>
    <!ELEMENT P (#PCDATA)>
]>
<THESIS>
    <P>
        This is my Ph.D. thesis.
    </P>
    <P>
        Pretty good, huh?
    </P>
    <P>
        So, give me a Ph.D. now!
    </P>
</THESIS>

In this way, I've specified the syntax of these two elements, <THESIS> and <P>. A validating XML processor can now validate this document using the DTD that the document supplies.
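For contrast, here is a sketch of a document that is well formed but should not pass validation against the same DTD. The <TITLE> and <B> elements are hypothetical names introduced only for this illustration; neither is declared in the DTD:

```xml
<?xml version="1.0"?>
<!DOCTYPE THESIS [
    <!ELEMENT THESIS (P*)>
    <!ELEMENT P (#PCDATA)>
]>
<THESIS>
    <P>This paragraph is fine.</P>
    <!-- Invalid: TITLE is not declared, and THESIS may contain only P elements. -->
    <TITLE>My Thesis</TITLE>
    <!-- Invalid: B is not declared, and P may contain only character data. -->
    <P>This paragraph has <B>nested</B> markup.</P>
</THESIS>
```

A nonvalidating parser would typically read this document without complaint, because every tag is closed and properly nested. A validating parser, on the other hand, should report that the content of <THESIS> and of the second <P> does not match the declared content models.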

And that's what a DTD looks like in overview; now it's time to dig into the full details. And we're going to take a look at all of them here and in the next chapter.



Real World XML
Real World XML (2nd Edition)
ISBN: 0735712867
EAN: 2147483647
Year: 2005
Pages: 440
Authors: Steve Holzner
