Chapter 3. Valid Documents: Creating Document Type Definitions
In the previous chapter, we saw how to create well-formed
XML documents. However, there's more to creating good XML documents
than the simple (although essential) requirement that they be
well-formed.
Valid XML Documents
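XML documents whose syntax has been checked successfully are
called valid documents; in particular, an XML document is valid
if it has a document type definition (DTD) associated with it and
if it complies with that DTD.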
You can find the formal rules for DTDs in the XML 1.0
recommendation, www.w3.org/TR/REC-xml, which also appears in
Appendix A, "The XML Specification." The constraints that documents
and DTDs must satisfy for a document to be valid are labeled
"Validity constraint" in that recommendation.
Note that DTDs are all about specifying the structure and syntax of XML documents (not their content). Various organizations can share a DTD to put an XML application into practice. We've seen quite a few examples of XML applications in Chapter 1, "Essential XML," and those applications can all be enforced with DTDs that the various organizations make public. We'll see how to create public DTDs in this chapter.
Most XML parsers, like the one in Internet Explorer, require
XML documents to be well-formed.
In fact, we saw a DTD at the end of the previous chapter. In
that chapter, I set up an example XML document named ch02_01.xml
that stored customer orders. At the end of the chapter, I used the
DOMWriter program that comes with IBM's XML for Java package to
translate the document into canonical XML. To run it through that
program, I needed to add a DTD to the document. Here's what it
looked like:
Listing ch03_01.xml
<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>
In this chapter, I'm going to take this DTD apart to see what makes it tick. Actually, this DTD is a pretty substantial one; to get us started and to show how DTDs work in overview, I'll start with a mini-example first. Here it is:
Listing ch03_02.xml
<?xml version="1.0"?>
<!DOCTYPE THESIS [
<!ELEMENT THESIS (P*)>
<!ELEMENT P (#PCDATA)>
]>
<THESIS>
<P>
This is my Ph.D. thesis.
</P>
<P>
Pretty good, huh?
</P>
<P>
So, give me a Ph.D. now!
</P>
</THESIS>
Note the <!DOCTYPE> declaration here. Strictly speaking, <!DOCTYPE>
is not an element; it is a document type declaration (DTDs are
document type definitions). You use document type declarations to
specify the DTD for a document:
<?xml version="1.0"?>
<!DOCTYPE THESIS [
<!ELEMENT THESIS (P*)>
<!ELEMENT P (#PCDATA)>
]>
<THESIS>
<P>
This is my Ph.D. thesis.
</P>
<P>
Pretty good, huh?
</P>
<P>
So, give me a Ph.D. now!
</P>
</THESIS>
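To make the parts of a document type declaration easier to pick out, here is a minimal annotated sketch. The NOTE element is hypothetical and is not one of this chapter's listings; the comments summarize what XML 1.0 requires of an internal DTD, so treat this as an illustration rather than a complete treatment.

```xml
<?xml version="1.0"?>
<!-- The name after DOCTYPE must match the document's root element. -->
<!DOCTYPE NOTE [
    <!-- Everything between [ and ] is the internal DTD subset. -->
    <!-- Each element used in the document gets its own declaration. -->
    <!ELEMENT NOTE (#PCDATA)>
]>
<!-- The root element, declared above to hold only text. -->
<NOTE>Remember the milk.</NOTE>
```

If the name in the declaration and the root element disagreed (say, DOCTYPE MEMO with a NOTE root), the document would still be well-formed, but a validating processor would report it as invalid.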
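The DTD in ch03_02.xml declares the <THESIS> element with the
content model (P*), which means that it can contain zero or more
<P> elements. In addition to defining the <THESIS> element, I
define the <P> element so that it can hold only text, that is,
parsed character data (which is pure text, without any markup),
with the #PCDATA keyword: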
<?xml version="1.0"?>
<!DOCTYPE THESIS [
<!ELEMENT THESIS (P*)>
<!ELEMENT P (#PCDATA)>
]>
<THESIS>
<P>
This is my Ph.D. thesis.
</P>
<P>
Pretty good, huh?
</P>
<P>
So, give me a Ph.D. now!
</P>
</THESIS>
In this way, I've specified the syntax of these two elements,
<THESIS> and <P>. A validating XML processor can now validate
this document using the DTD that it contains.