Chapter 3. Valid XML Documents: Creating Document Type Definitions

CONTENTS

Creating Document Type Declarations
Creating Document Type Definitions
A DTD Example
External DTDs
Using Document Type Definitions with URLs
Public Document Type Definitions
Using Both Internal and External DTDs
Namespaces and DTDs
Validating Against a DTD

Chapter 2, "Creating Well-Formed XML Documents," explained how to create well-formed XML documents. However, there's more to creating good XML documents than the simple (although essential) requirement that they be well-formed. Because you can create your own tags when you create an XML application, it's up to you to set their syntax. For example, can a <HOUSE> element contain plain text or only other elements such as <TENANT> or <OWNER>? Must a <BOOK> element contain a <PAGE_COUNT> element, or can it get by without one? It's up to you to decide. Defining your own custom XML syntax is not only good for making sure that your documents are legible; it can also be essential for programs that deal with documents via code.

XML documents whose syntax has been checked successfully are called valid documents; in particular, an XML document is considered valid if there is a document type definition (DTD) or XML schema associated with it and if the document complies with the DTD or schema. That's all there is to making a document valid. This chapter is all about creating basic DTDs. In the next chapter, I'll elaborate on the DTDs that we create here, showing how to declare entities, attributes, and notations.

You can find the formal rules for DTDs in the XML 1.0 recommendation, http://www.w3.org/TR/REC-xml (which also appears in Appendix A, "The XML 1.0 Specification"). The constraints that documents and DTDs must adhere to in order to create a valid document are marked with the text "Validity Constraint."

Note that DTDs are all about specifying the structure and syntax of XML documents (not their content). Various organizations can share a DTD to put an XML application into practice. We saw quite a few examples of XML applications in Chapter 1, "Essential XML," and those applications can all be enforced with DTDs that the various organizations make public. We'll see how to create public DTDs in this chapter.

Most XML parsers, like the one in Internet Explorer, require XML documents to be well-formed but not necessarily valid. (Most XML parsers do not require a DTD, but if there is one, validating parsers will use it to validate the XML document.)

In fact, we saw a DTD at the end of the previous chapter. In that chapter, I set up an example XML document named order.xml that stored customer orders. At the end of the chapter, I used the DOMWriter program that comes with IBM's XML for Java package to translate the document into canonical XML; to run it through that program, I needed to add a DTD to the document. Here's what it looked like:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$14.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

In this chapter, I'm going to take this DTD apart to see what makes it tick. Actually, this DTD is a pretty substantial one, so to get us started and to show how DTDs work in overview, I'll start with a mini-example first:

<?xml version="1.0"?>
<!DOCTYPE THESIS [
    <!ELEMENT THESIS (P*)>
    <!ELEMENT P (#PCDATA)>
]>
<THESIS>
    <P>
        This is my Ph.D. thesis.
    </P>
    <P>
        Pretty good, huh?
    </P>
    <P>
        So, give me a Ph.D. now!
    </P>
</THESIS>

Note the <!DOCTYPE> markup here. Although it looks like an element, it is actually a document type declaration (not to be confused with the DTD itself, which is the document type definition). You use a document type declaration to indicate the DTD used for the document. The basic syntax for the document type declaration is <!DOCTYPE rootname [DTD]> (there are other variations that we'll see in this chapter), where DTD is the document type definition that you want to use. DTDs can be internal or external, as we'll see in this chapter; in this case, the DTD is internal:

<?xml version="1.0"?>
<!DOCTYPE THESIS [
    <!ELEMENT THESIS (P*)>
    <!ELEMENT P (#PCDATA)>
]>
<THESIS>
    <P>
        This is my Ph.D. thesis.
    </P>
    <P>
        Pretty good, huh?
    </P>
    <P>
        So, give me a Ph.D. now!
    </P>
</THESIS>

This DTD follows the W3C syntax conventions, which means that I specify the syntax for each element with <!ELEMENT>. Using this declaration, you can specify that the contents of an element can be parsed character data (#PCDATA), other elements that you've created, or both. In this example, I'm indicating that the <THESIS> element can contain only <P> elements, and that it can contain zero or more of them, which is what the asterisk (*) after P in <!ELEMENT THESIS (P*)> means.

In addition to declaring the <THESIS> element, I declare the <P> element so that it can hold only text (that is, parsed character data, which is pure text without any markup), using the term #PCDATA:

<?xml version="1.0"?>
<!DOCTYPE THESIS [
    <!ELEMENT THESIS (P*)>
    <!ELEMENT P (#PCDATA)>
]>
<THESIS>
    <P>
        This is my Ph.D. thesis.
    </P>
    <P>
        Pretty good, huh?
    </P>
    <P>
        So, give me a Ph.D. now!
    </P>
</THESIS>

In this way, I've specified the syntax of these two elements, <THESIS> and <P>. A validating XML processor can now validate this document using the DTD that it supplies.

And that's what a DTD looks like in overview; now it's time to dig into the full details. We're going to take a look at all of them here and in the next chapter.

Creating Document Type Declarations

You define the syntax and structure of elements using a document type definition (DTD), and you declare that definition in a document using a document type declaration. We've seen that you use <!DOCTYPE> to create a document type declaration. This declaration can take several different forms, as you see here (URL is the URL of a DTD, and rootname is the name of the root element); we'll see all these forms in this chapter:

<!DOCTYPE rootname [DTD]>
<!DOCTYPE rootname SYSTEM URL>
<!DOCTYPE rootname SYSTEM URL [DTD]>
<!DOCTYPE rootname PUBLIC identifier URL>
<!DOCTYPE rootname PUBLIC identifier URL [DTD]>
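
As a rough illustration of these five forms, here is what each might look like with the placeholders filled in. This is only a sketch: the URL http://www.example.com/dtds/order.dtd, the public identifier -//Example//DTD Order 1.0//EN, and the <NOTE> element are made up for the example. In an actual declaration, the system and public identifiers are written in quotation marks. Each comment introduces a separate alternative; a real document can contain only one document type declaration.

```xml
<!-- 1. Internal DTD only -->
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT ANY>
]>

<!-- 2. External DTD, located by a (hypothetical) URL -->
<!DOCTYPE DOCUMENT SYSTEM "http://www.example.com/dtds/order.dtd">

<!-- 3. External DTD plus additional internal declarations -->
<!DOCTYPE DOCUMENT SYSTEM "http://www.example.com/dtds/order.dtd" [
<!ELEMENT NOTE (#PCDATA)>
]>

<!-- 4. Public DTD: (hypothetical) public identifier, then URL -->
<!DOCTYPE DOCUMENT PUBLIC "-//Example//DTD Order 1.0//EN"
    "http://www.example.com/dtds/order.dtd">

<!-- 5. Public DTD plus additional internal declarations -->
<!DOCTYPE DOCUMENT PUBLIC "-//Example//DTD Order 1.0//EN"
    "http://www.example.com/dtds/order.dtd" [
<!ELEMENT NOTE (#PCDATA)>
]>
```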

To use a DTD, you need a document type declaration, which means that you need <!DOCTYPE>. The <!DOCTYPE> declaration is part of a document's prolog (also called prologue). Here's how I add a document type declaration to the document order.xml that we developed in Chapter 2:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
    .
    .
    .
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            .
            .
            .
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Now it's up to us to supply the actual document type definition, the DTD, that's part of this <!DOCTYPE> declaration.

Creating Document Type Definitions

To introduce DTDs, I'll start with a DTD that's internal to the document whose syntax it specifies (we'll see how to create external DTDs later in the chapter). In this case, the DTD itself goes inside the square brackets in <!DOCTYPE> (note that I've set the standalone attribute to "yes" here because this document doesn't rely on any external resources):

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            .
            .
            .
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

When a DTD is in place and the document complies with it, you have a valid document (we'll see how to create this DTD in the following sections).

Having gotten <!DOCTYPE> in place, we're ready to start creating the DTD, starting with <!ELEMENT>.

Element Declarations

To declare the syntax of an element in a DTD, you use <!ELEMENT> like this: <!ELEMENT NAME CONTENT_MODEL>. Here, NAME is the name of the element that you're declaring; CONTENT_MODEL can be set to EMPTY or ANY, or it can hold mixed content (other elements as well as parsed character data) or child elements.

Here are a few examples. Note the expressions starting with % and ending with ;. Those expressions are parameter entity references, much like general entity references except that you use them in DTDs, not the body of the document (we'll see parameter entities in the next chapter):

<!ELEMENT direction (left, right, top?)>
<!ELEMENT CHAPTER (INTRODUCTION, (P | QUOTE | NOTE)*, DIV*)>
<!ELEMENT HR EMPTY>
<!ELEMENT p (#PCDATA | I)* >
<!ELEMENT %title; %content; >
<!ELEMENT DOCUMENT ANY>

We're going to see how to create <!ELEMENT> declarations like these in this and the next chapter. I'll start by declaring the root element of the example document for this chapter, order.xml:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT ANY>
]>
<DOCUMENT>
</DOCUMENT>

Notice that I'm specifying a content model of ANY here; see the next section for the details on this keyword.

ANY

When you declare an element with the content model ANY, the declared element can contain any type of content: any element declared in the DTD, as well as parsed character data. (Effectively, this means that the contents of elements that you declare with the ANY content model are not checked by XML validators, apart from the requirement that any child elements be declared themselves. See the later section "Validating Against a DTD" for details on XML validators.)

Here's how you specify a content model of ANY:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT ANY>
]>
<DOCUMENT>
</DOCUMENT>

However, giving an element the content model ANY is often not a good idea because it removes syntax checking. It's usually far better to specify an actual content model, and I'll start doing that with a child list of elements.
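
To make the effect of ANY a little more concrete, here is a minimal sketch of a document whose root element mixes text and a child element freely. The <NOTE> element is a hypothetical addition for this example; it is declared in the DTD because a child element that appears in the document still needs a declaration of its own.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT ANY>
<!ELEMENT NOTE (#PCDATA)>
]>
<DOCUMENT>
    Some text placed directly in the root element.
    <NOTE>A declared child element.</NOTE>
    More text after the child element.
</DOCUMENT>
```

Nothing in this DTD says how many <NOTE> elements may appear or where, which is exactly the checking that you give up with ANY.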

Child Element Lists

Besides using the content model ANY, you can specify that the element you're declaring contains another element by giving the name of that element in parentheses, like this:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
]>
<DOCUMENT>
    .
    .
    .
</DOCUMENT>

In this case, I'm indicating that the root element, <DOCUMENT>, can contain any number (including zero) of <CUSTOMER> elements (the asterisk after the parentheses is what allows any number of <CUSTOMER> elements; we'll see how that works in a page or two).

Because the <DOCUMENT> element can contain any number of <CUSTOMER> elements, I can now add a <CUSTOMER> element to the document, like this:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
]>
<DOCUMENT>
    <CUSTOMER>
    .
    .
    .
    </CUSTOMER>
</DOCUMENT>

However, this is not a valid document because I haven't declared the <CUSTOMER> element yet. I'll do that next.

#PCDATA

Say that we want to let the <CUSTOMER> element store some plain text; in particular, say that we want to store the name of a customer. All nonmarkup text is referred to as parsed character data in a DTD, and it is abbreviated as #PCDATA in element declarations. Parsed character data explicitly means text that does not contain markup, just simple character data.

The parsed character data is where you store the actual content of the document as plain text. Note, however, that this is the only way to specify the content of the document using DTDs; you can't say anything more about the actual type of content.

For example, even though you might be storing numbers, that data is only plain text as far as DTDs are concerned. This lack of precision is one of the reasons that XML schemas, the alternative to DTDs, were developed. With schemas, you can specify much more about the type of data you're storing, such as whether it's in integer, floating point, or even date format, and XML processors can check to make sure the data matches the format that it's supposed to be expressed in. I'll take a look at schemas in Chapter 5, "Creating XML Schemas." (Note, however, that schemas are new enough that there's relatively little software support for them at this point; Internet Explorer has support for schemas, but it implements them according to an old, and unfortunately very out-of-date, W3C note.)

Here's how I declare the <CUSTOMER> element so that it can contain PCDATA (and only PCDATA):

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
    .
    .
    .
    </CUSTOMER>
</DOCUMENT>

Now I can add text to a <CUSTOMER> element in the document like this:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        Sam Smith
    </CUSTOMER>
</DOCUMENT>

Note that elements that have been declared to hold PCDATA can hold only PCDATA; you cannot, for example, place another element in the <CUSTOMER> element the way that it has been declared now, so this document is not valid:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        Sam Smith
        <CREDIT_RATING>
            Lousy
        </CREDIT_RATING>
    </CUSTOMER>
</DOCUMENT>

The content model that supports both PCDATA and other elements inside an element is called the mixed content model, and I'll take a look at it in a few pages (the ANY content model also allows both text and elements, of course).

There's another thing to note here now that we're dealing with multiple declarations: the order in which you declare elements doesn't matter, so this DTD, where I've declared the <DOCUMENT> element after the <CUSTOMER> element, works just as well:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT CUSTOMER (#PCDATA)>
<!ELEMENT DOCUMENT (CUSTOMER)*>
]>
<DOCUMENT>
    <CUSTOMER>
    .
    .
    .
    </CUSTOMER>
</DOCUMENT>

Note that although the order of element declarations is not supposed to matter (and in practice, that's the way I've always seen it), some XML processors may demand that you declare an element before using it in another declaration.

It's also possible to declare elements in such a way that they can contain multiple children. In fact, you can specify the exact types of child elements that an element can enclose, and in what order those child elements must appear. I'll take a look at that now.

Dealing with Multiple Children

When you want to declare an element that can contain multiple children, you have several options. DTDs use a syntax to deal with multiple children that is much like working with regular expressions in languages such as Perl, in case you're familiar with that. Here's the syntax that you can use (here, a and b are child elements of the element you're declaring):

a+            One or more occurrences of a.
a*            Zero or more occurrences of a.
a?            a or nothing.
a, b          a followed by b.
a | b         a or b, but not both.
(expression)  Surrounding an expression with parentheses means that it's treated as a unit and may have the suffix operator ?, *, or +.

If you're not familiar with this kind of syntax, it's not much use asking why things are set up this way; this syntax has been around a long time, and W3C adopted it for DTDs because many people were familiar with it. If this looks totally strange to you, it's just one of the skills you'll have to master when writing DTDs; fortunately, it soon becomes second nature.

I'll now take a look at each of these listed possibilities in detail.
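
Before going through them one at a time, here is a small sketch that uses every operator from the list in a single declaration. All the element names here (<REPORT>, <TITLE>, <AUTHOR>, <ABSTRACT>, <SECTION>, and <APPENDIX>) are hypothetical and are not part of the order.xml example.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE REPORT [
<!-- One TITLE, then one or more AUTHORs, then an optional ABSTRACT,
     then any number of SECTION or APPENDIX elements in any mix -->
<!ELEMENT REPORT (TITLE, AUTHOR+, ABSTRACT?, (SECTION | APPENDIX)*)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT ABSTRACT (#PCDATA)>
<!ELEMENT SECTION (#PCDATA)>
<!ELEMENT APPENDIX (#PCDATA)>
]>
<REPORT>
    <TITLE>Quarterly Orders</TITLE>
    <AUTHOR>Sam Smith</AUTHOR>
    <AUTHOR>Polly Jones</AUTHOR>
    <SECTION>Produce</SECTION>
    <APPENDIX>Price list</APPENDIX>
    <SECTION>Bakery</SECTION>
</REPORT>
```

This document leaves out the optional <ABSTRACT> element and repeats the choice three times, both of which the declaration permits.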

One or More Children

If you must specify that the <DOCUMENT> element can contain only between 12 and 15 <CUSTOMER> elements, you'll have a problem when working with DTDs because the DTD syntax won't allow you to do that without getting very complex. However, you can specify that the <DOCUMENT> element must contain one or more <CUSTOMER> elements like this, using the + operator:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)+>
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        Sam Smith
    </CUSTOMER>
    <CUSTOMER>
        Fred Smith
    </CUSTOMER>
</DOCUMENT>

In this case, the XML processor now knows that you want the <DOCUMENT> element to contain one or more <CUSTOMER> elements, which makes sense if you want a useful document that actually contains some data. In this way, we've been able to specify the syntax of the <DOCUMENT> element in some more detail.
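
To give a feel for why a fixed range gets complex, here is one possible way to spell out a much smaller range, two to four <CUSTOMER> elements, by hand. This is only a sketch; it relies on sequences and parentheses, which are covered later in this chapter, and the third customer name is made up for the example.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!-- Two required CUSTOMER elements, then an optional third,
     which may in turn be followed by an optional fourth -->
<!ELEMENT DOCUMENT (CUSTOMER, CUSTOMER, (CUSTOMER, CUSTOMER?)?)>
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>Sam Smith</CUSTOMER>
    <CUSTOMER>Fred Smith</CUSTOMER>
    <CUSTOMER>Polly Jones</CUSTOMER>
</DOCUMENT>
```

Written the same way, a range of 12 to 15 would need twelve required names followed by three nested optional groups, which is legal but hard to read and maintain.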

Zero or More Children

Besides specifying one or more child elements, you can also declare elements so that they can enclose zero or more of a particular child element. This is useful if you want to allow an element to have a particular child element, or any number of such elements, but you don't want to force it to have that particular child element.

For example, a <CHAPTER> element might be capable of containing a <FOOTNOTE> element or even several <FOOTNOTE> elements, but you wouldn't necessarily want to force all <CHAPTER> elements to have <FOOTNOTE> elements. Using the * operator, you can do that.
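
A minimal sketch of that idea might look like the following. The <BOOK> wrapper element and the footnote text are assumptions made for this example, and the chapters hold nothing but footnotes to keep the DTD short.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE BOOK [
<!ELEMENT BOOK (CHAPTER)*>
<!ELEMENT CHAPTER (FOOTNOTE)*>
<!ELEMENT FOOTNOTE (#PCDATA)>
]>
<BOOK>
    <CHAPTER>
        <FOOTNOTE>First footnote.</FOOTNOTE>
        <FOOTNOTE>Second footnote.</FOOTNOTE>
    </CHAPTER>
    <CHAPTER>
    </CHAPTER>
</BOOK>
```

Both chapters satisfy the declaration: the first has two <FOOTNOTE> elements, and the second has none.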

The * operator means that the indicated child element can appear any number of times in the declared element (including zero times). Here's how I indicate that the <DOCUMENT> element can contain any number of <CUSTOMER> elements:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        Sam Smith
    </CUSTOMER>
    <CUSTOMER>
        Fred Smith
    </CUSTOMER>
</DOCUMENT>

Zero or One Child

Besides using + to specify one or more occurrences of a particular child element and * to specify zero or more occurrences of a child element, you can also use ? to specify zero or one occurrences of a child element. In other words, using ? indicates that a particular child element may be present in the element you're declaring, but it need not be.

For example, a <CHAPTER> element might be capable of containing one <OPENING_QUOTATION> element, but you wouldn't necessarily want to force all <CHAPTER> elements to have an <OPENING_QUOTATION> element. Using the ? operator, you can do that.
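
Sketched as a DTD, that might look like this. The <BOOK> wrapper element and the quotation text are assumptions made for this example.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE BOOK [
<!ELEMENT BOOK (CHAPTER)*>
<!ELEMENT CHAPTER (OPENING_QUOTATION)?>
<!ELEMENT OPENING_QUOTATION (#PCDATA)>
]>
<BOOK>
    <CHAPTER>
        <OPENING_QUOTATION>Well begun is half done.</OPENING_QUOTATION>
    </CHAPTER>
    <CHAPTER>
    </CHAPTER>
</BOOK>
```

A chapter with two <OPENING_QUOTATION> elements would not be valid under this declaration.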

Here's an example; in this case, I'm allowing the <DOCUMENT> element to contain only zero or one <CUSTOMER> element (rather a limited clientele):

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)?>
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        Sam Smith
    </CUSTOMER>
</DOCUMENT>

We've advanced a little in DTD power now by allowing multiple child elements, but so far, we've allowed only child elements of the same type in any one declared element. That's about to change.

DTD Sequences

You can specify exactly what child elements a particular element can contain, and in what order, by using a sequence. A sequence is a comma-separated list of element names that tells the XML processor what elements must appear, and in what order.

For example, say that we want to change the <CUSTOMER> element so that instead of containing only PCDATA, it can contain other elements. Here, I'll let the <CUSTOMER> element contain one <NAME> element, one <DATE> element, and one <ORDERS> element, in exactly that order. The resulting declaration looks like this:

<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>

I can break this down further, of course; for example, I can specify that the <NAME> element must contain exactly one <LAST_NAME> element and one <FIRST_NAME> element, in that order, like this:

<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>

White space doesn't matter, of course, so the same declaration could look like this:

<!ELEMENT   NAME        (LAST_NAME,      FIRST_NAME)>

Being able to specify the exact order that the elements in your document must take can be great when you're working with software that relies on such an order.
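
To see the ordering rule at work, consider this small sketch, which uses the <NAME> declaration from above as the root of a document of its own. The document is well-formed, but a validating processor should reject it because the two child elements appear in the wrong order.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE NAME [
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
]>
<!-- Not valid: FIRST_NAME appears before LAST_NAME -->
<NAME>
    <FIRST_NAME>Sam</FIRST_NAME>
    <LAST_NAME>Smith</LAST_NAME>
</NAME>
```

Swapping the two child elements makes the document valid again.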

Here's how I'll elaborate the order.xml document to include the previous two sequences as well as a third one that makes sure that the <ITEM> element contains exactly one <PRODUCT> element, one <NUMBER> element, and one <PRICE> element, in that order. The resulting DTD enforces the syntax of the order.xml document that we developed in the previous chapter, and you can see the whole document, complete with working DTD, here:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$14.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

You can use the same element in a sequence a number of times, if you want. For example, here's how I make sure that the <CUSTOMER> element holds exactly three <NAME> elements:

<!ELEMENT CUSTOMER (NAME,NAME,NAME)>

Here's another important note: You can use the +, *, and ? operators inside sequences. For example, here's how I specify that there can be one or more <NAME> elements for a customer, an optional <CREDIT_RATING> element, any number of <DATE> elements, and a single <ORDERS> element:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME+,CREDIT_RATING?,DATE*,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT CREDIT_RATING (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
            .
            .
            .

Using +, *, and ? inside sequences provides you with a lot of flexibility because now you can constrain how many times an element can appear in a sequence and even if it can be absent altogether.

Creating Subsequences with Parentheses

In fact, you can get even more powerful using the +, *, and ? operators inside sequences because, using parentheses, you can create subsequences (that is, sequences inside sequences).

For example, say that I wanted the <CUSTOMER> element to be capable of holding one or more <NAME> elements; for each <NAME> element, I also want to allow a possible <CREDIT_RATING> element. I can do that like this, creating the subsequence (NAME,CREDIT_RATING?) and allowing that subsequence to appear one or more times in the <CUSTOMER> element:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER ((NAME,CREDIT_RATING?)+,DATE*,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT CREDIT_RATING (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$14.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Defining subsequences like this, and using the +, *, and ? syntax, allows you to be very flexible when defining elements. Here's another example; in this case, I'm declaring an element named <COMMENTS> that must contain a <DATE> element and that then can contain one or more sequences of <TITLE>, <AUTHOR>, and <TEXT> elements:

<!ELEMENT COMMENTS (DATE,(TITLE,AUTHOR,TEXT)+)>
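
As a quick sketch of a document that satisfies this declaration, the following example assumes that <DATE>, <TITLE>, <AUTHOR>, and <TEXT> each hold only parsed character data (the declaration above doesn't say what they contain), and all the text content is made up.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE COMMENTS [
<!ELEMENT COMMENTS (DATE,(TITLE,AUTHOR,TEXT)+)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT TEXT (#PCDATA)>
]>
<COMMENTS>
    <DATE>October 15, 2001</DATE>
    <TITLE>First comment</TITLE>
    <AUTHOR>Sam Smith</AUTHOR>
    <TEXT>The tomatoes arrived on time.</TEXT>
    <TITLE>Second comment</TITLE>
    <AUTHOR>Polly Jones</AUTHOR>
    <TEXT>The bread did not.</TEXT>
</COMMENTS>
```

The subsequence appears twice here; leaving out, say, the second <AUTHOR> element would make the document invalid because each repetition must be complete.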

Choices

Besides using sequences, you can also use choices in DTDs. A choice lets you specify that one of a number of elements will appear at that particular location. Here's how a choice specifying one of the elements <a> or <b> or <c> looks: (a | b | c). When you use this expression, the XML processor knows that exactly one of the <a>, <b>, or <c> elements must appear at that point.
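
Here is a minimal sketch of that choice in a complete document. The <DOCUMENT> and <ENTRY> wrapper elements and the text content are assumptions made for this example.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (ENTRY)*>
<!-- Each ENTRY holds exactly one of a, b, or c -->
<!ELEMENT ENTRY (a | b | c)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>
]>
<DOCUMENT>
    <ENTRY>
        <a>first alternative</a>
    </ENTRY>
    <ENTRY>
        <c>third alternative</c>
    </ENTRY>
</DOCUMENT>
```

An <ENTRY> element that is empty, or that holds both an <a> and a <b> element, would not be valid under this declaration.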

I'll put choices to work in the order.xml example now; in this case, I'll specify that the <ITEM> element must enclose a <PRODUCT> element, a <NUMBER> element, and exactly one element from the list <PRICE>, <CHARGEACCT>, and <SAMPLE>:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT, NUMBER, (PRICE | CHARGEACCT | SAMPLE))>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT CHARGEACCT (#PCDATA)>
<!ELEMENT SAMPLE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <SAMPLE>No Charge</SAMPLE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <CHARGEACCT>299930</CHARGEACCT>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <CHARGEACCT>299931</CHARGEACCT>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <CHARGEACCT>299932</CHARGEACCT>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

As you might expect, you can use the +, *, and ? operators with choices as well. Here, I'm allowing one or more selections from the choice to appear in the <ITEM> element, and I'm allowing any one of those selections to be any number of <CHARGEACCT> elements:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT, NUMBER, (PRICE | CHARGEACCT* | SAMPLE)+)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT CHARGEACCT (#PCDATA)>
<!ELEMENT SAMPLE (#PCDATA)>
]>
    .
    .
    .
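
To make the effect of the content model (PRODUCT, NUMBER, (PRICE | CHARGEACCT* | SAMPLE)+) concrete, here is a small sketch of <ITEM> fragments checked against it by hand. The fragments reuse values from the order document; they are illustrations, not output from any particular validator.

```xml
<!-- Valid: PRODUCT and NUMBER, then one selection from the choice -->
<ITEM>
    <PRODUCT>Tomatoes</PRODUCT>
    <NUMBER>8</NUMBER>
    <PRICE>$1.25</PRICE>
</ITEM>

<!-- Valid: the choice repeats, so two CHARGEACCT elements and a SAMPLE can follow -->
<ITEM>
    <PRODUCT>Bread</PRODUCT>
    <NUMBER>12</NUMBER>
    <CHARGEACCT>299930</CHARGEACCT>
    <CHARGEACCT>299931</CHARGEACCT>
    <SAMPLE>No Charge</SAMPLE>
</ITEM>

<!-- Not valid: PRODUCT and NUMBER must come first, in that order -->
<ITEM>
    <PRICE>$2.95</PRICE>
    <PRODUCT>Asparagus</PRODUCT>
    <NUMBER>12</NUMBER>
</ITEM>
```

One consequence worth noticing: because CHARGEACCT* can match zero elements, an <ITEM> that holds only <PRODUCT> and <NUMBER> also satisfies this model. If every item must carry a price, a charge account, or a sample marker, writing CHARGEACCT+ inside the choice is probably closer to the intent.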

As you can see, DTD syntax enables you to specify syntax fairly exactly (unless you want to specify a range or an exact number of times that an element can appear, or its exact data type, of course). In fact, you can use two more content models as well: mixed content models and empty content models.

Mixed Content

It is actually possible to specify that an element can contain both PCDATA and other elements; such a content model is called mixed. To specify a mixed content model, just list #PCDATA along with the child elements that you want to allow:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT, NUMBER, PRICE)>
<!ELEMENT PRODUCT (#PCDATA | PRODUCT_ID)*>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT PRODUCT_ID (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>
                    <PRODUCT_ID>
                        124829548702121
                    </PRODUCT_ID>
                </PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$14.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

However, there is a big drawback to using the mixed content model: you can specify only the names of the child elements that can occur. You cannot set the child elements' order or number of occurrences. Inside the mixed content model, #PCDATA must come first, you cannot apply the +, *, or ? operators to the individual child elements, and the group as a whole must end with *.

Because of these severe restrictions, I suggest avoiding the mixed content model. You're almost always better off declaring a new element that can hold PCDATA and including that in a standard content model instead.
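
As a minimal sketch of that advice, the mixed <PRODUCT> declaration can be replaced by a choice between two text-holding elements. The element name PRODUCT_NAME is hypothetical, introduced only for this illustration, and the root element is cut down to a single <ITEM> to keep the example short.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE ITEM [
<!ELEMENT ITEM (PRODUCT, NUMBER, PRICE)>
<!-- PRODUCT_NAME is a hypothetical element that takes over the loose text -->
<!ELEMENT PRODUCT (PRODUCT_NAME | PRODUCT_ID)>
<!ELEMENT PRODUCT_NAME (#PCDATA)>
<!ELEMENT PRODUCT_ID (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<ITEM>
    <PRODUCT>
        <PRODUCT_NAME>Tomatoes</PRODUCT_NAME>
    </PRODUCT>
    <NUMBER>8</NUMBER>
    <PRICE>$1.25</PRICE>
</ITEM>
```

With this arrangement, the DTD can insist on exactly one name or one ID per product, which the mixed content model has no way to express.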

Why Use the Mixed Content Model?

One situation in which the mixed content model is useful is when you're translating simple text documents into XML: the mixed content model can handle the case in which part of the document is marked up in XML and part is still simple text.
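
Here is a small sketch of that situation, assuming a free-text memo in which only a few phrases have been tagged so far. The element names MEMO and CUSTOMER_NAME are hypothetical and do not appear elsewhere in this chapter.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE MEMO [
<!-- #PCDATA comes first, and the whole group ends with * -->
<!ELEMENT MEMO (#PCDATA | CUSTOMER_NAME | DATE)*>
<!ELEMENT CUSTOMER_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
]>
<MEMO>
    Called <CUSTOMER_NAME>Sam Smith</CUSTOMER_NAME> on
    <DATE>October 15, 2001</DATE> about the tomato order.
</MEMO>
```

The untagged words remain plain text, and more elements can be added to the group as the conversion proceeds.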

Empty Elements

The last remaining DTD content model is the empty content model. In this case, the elements that you declare cannot hold any content (either PCDATA or other elements).

Declaring an element to be empty is easy; you just use the keyword EMPTY, like this:

<!ELEMENT CREDIT_WARNING EMPTY>

Now you can use this new element, <CREDIT_WARNING>, like this:

<CREDIT_WARNING />
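
The following sketch shows the two forms a validating processor should accept for an element declared EMPTY. The <DOCUMENT> content model is simplified here so that the example stays short; that simplification is an assumption of this sketch, not part of the order document.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!-- Simplified root for this sketch only -->
<!ELEMENT DOCUMENT (CREDIT_WARNING)*>
<!ELEMENT CREDIT_WARNING EMPTY>
]>
<DOCUMENT>
    <!-- Valid: an empty-element tag -->
    <CREDIT_WARNING/>
    <!-- Valid: a start tag immediately followed by its end tag -->
    <CREDIT_WARNING></CREDIT_WARNING>
</DOCUMENT>
```

By contrast, placing anything between the start and end tags of <CREDIT_WARNING>, even a single space or a comment, makes the document invalid.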

Note that although empty elements cannot contain any content, they can have attributes (as the XHTML <img> element does); we'll see how to add attributes to element declarations in the next chapter. Here's how I declare and put the <CREDIT_WARNING> element to work:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (CREDIT_WARNING?,NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT, NUMBER, PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT CREDIT_WARNING EMPTY>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <CREDIT_WARNING />
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$14.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

DTD Comments

As you can see, DTDs can become fairly complex, especially in longer and more involved documents. To make things easier for the DTD author, the XML specification allows you to place comments inside DTDs.

DTD comments are just like normal XML comments; in fact, they are normal XML comments, and they're often stripped out by the XML processor. (The XML specification allows XML processors to discard comments, but some processors pass comments on to the underlying application.) Here's an example where I have added comments to order.xml:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!-- DOCUMENT is the root element -->
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!-- CUSTOMER stores customer data -->
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!-- NAME stores the customer name -->
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!-- LAST_NAME stores customer's last name -->
<!ELEMENT LAST_NAME (#PCDATA)>
<!-- FIRST_NAME stores customer's first name -->
<!ELEMENT FIRST_NAME (#PCDATA)>
<!-- DATE stores order date -->
<!ELEMENT DATE (#PCDATA)>
<!-- ORDERS stores customer orders -->
<!ELEMENT ORDERS (ITEM)*>
<!-- ITEM represents a customer purchase -->
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!-- PRODUCT represents a purchased product -->
<!ELEMENT PRODUCT (#PCDATA)>
<!-- NUMBER indicates the number of the item purchased -->
<!ELEMENT NUMBER (#PCDATA)>
<!-- PRICE is the item's price -->
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$14.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>
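
Two placement rules are easy to trip over when commenting a DTD, so here is a short sketch of them. The DTD is cut down to two declarations purely for illustration; the comment wording is made up.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!--
    Order DTD, simplified for this sketch.
    A comment can span several lines, but its text
    cannot contain two hyphens in a row.
-->
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!-- Comments go between declarations, never inside one -->
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>Sam Smith</CUSTOMER>
</DOCUMENT>
```

In other words, a comment can sit before or after an <!ELEMENT> declaration, but not between the element name and its content model.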

A DTD Example

Because being able to create DTDs is an essential XML skill these days (at least, until XML schemas are widely supported), I'll work through another example here.

This new example is a model for a book, complete with <CHAPTER>, <SECTION>, <PART>, and <SUBTITLE> elements. Here's what the document will look like:

<?xml version="1.0"?>
<!DOCTYPE BOOK [
    <!ELEMENT p (#PCDATA)>
    <!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
    <!ELEMENT OPENER       (TITLE_TEXT)*>
    <!ELEMENT TITLE_TEXT   (#PCDATA)>
    <!ELEMENT SUBTITLE     (#PCDATA)>
    <!ELEMENT INTRODUCTION (HEADER, p+)+>
    <!ELEMENT PART         (HEADER, CHAPTER+)>
    <!ELEMENT SECTION      (HEADER, p+)>
    <!ELEMENT HEADER       (#PCDATA)>
    <!ELEMENT CHAPTER      (CHAPTER_NUMBER, CHAPTER_TEXT)>
    <!ELEMENT CHAPTER_NUMBER (#PCDATA)>
    <!ELEMENT CHAPTER_TEXT (p)+>
]>
<BOOK>
    <OPENER>
        <TITLE_TEXT>
            All About Me
        </TITLE_TEXT>
    </OPENER>
    <PART>
        <HEADER>Welcome To My Book</HEADER>
        <CHAPTER>
            <CHAPTER_NUMBER>CHAPTER 1</CHAPTER_NUMBER>
            <CHAPTER_TEXT>
                <p>Glad you want to hear about me.</p>
                <p>There's so much to say!</p>
                <p>Where should we start?</p>
                <p>How about more about me?</p>
            </CHAPTER_TEXT>
        </CHAPTER>
    </PART>
</BOOK>

In this case, I'll start the DTD by declaring the <p> element, which I want to hold text only (that is, PCDATA, which you specify with #PCDATA):

<!ELEMENT p            (#PCDATA)>
        .
        .
        .

Next, I'll declare the <BOOK> element, which is the root element. In this case, the <BOOK> element contains an <OPENER> element, possibly a <SUBTITLE> element, possibly an <INTRODUCTION> element, and one or more sections or parts declared with the <SECTION> and <PART> elements:

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
        .
        .
        .

Now I will declare the <OPENER> element. This element will hold the title text for the book, which I'll store in <TITLE_TEXT> elements:

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER       (TITLE_TEXT)*>
        .
        .
        .

I'll declare the <TITLE_TEXT> element so that it contains plain text:

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER       (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT   (#PCDATA)>
        .
        .
        .

And I declare the <SUBTITLE> element, which must also contain PCDATA:

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER       (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT   (#PCDATA)>
<!ELEMENT SUBTITLE     (#PCDATA)>
        .
        .
        .

I'll set up the <INTRODUCTION> element so that it contains a <HEADER> element and so that it must contain one or more <p> elements. I'll then allow that sequence to repeat, like this:

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER       (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT   (#PCDATA)>
<!ELEMENT SUBTITLE     (#PCDATA)>
<!ELEMENT INTRODUCTION (HEADER, p+)+>
        .
        .
        .

Next, the <PART> element contains a <HEADER> and one or more <CHAPTER> elements:

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER       (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT   (#PCDATA)>
<!ELEMENT SUBTITLE     (#PCDATA)>
<!ELEMENT INTRODUCTION (HEADER, p+)+>
<!ELEMENT PART         (HEADER, CHAPTER+)>
        .
        .
        .

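In addition, I'll declare the <SECTION> element, which contains a <HEADER> and one or more <p> elements, and the <HEADER> element, which holds PCDATA. I'll also specify that the <CHAPTER> element must contain a <CHAPTER_NUMBER> and a <CHAPTER_TEXT> element: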

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER       (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT   (#PCDATA)>
<!ELEMENT SUBTITLE     (#PCDATA)>
<!ELEMENT INTRODUCTION (HEADER, p+)+>
<!ELEMENT PART         (HEADER, CHAPTER+)>
<!ELEMENT SECTION      (HEADER, p+)>
<!ELEMENT HEADER       (#PCDATA)>
<!ELEMENT CHAPTER      (CHAPTER_NUMBER, CHAPTER_TEXT)>
        .
        .
        .

The <CHAPTER_NUMBER> element contains parsed character data:

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER       (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT   (#PCDATA)>
<!ELEMENT SUBTITLE     (#PCDATA)>
<!ELEMENT INTRODUCTION (HEADER, p+)+>
<!ELEMENT PART         (HEADER, CHAPTER+)>
<!ELEMENT SECTION      (HEADER, p+)>
<!ELEMENT HEADER       (#PCDATA)>
<!ELEMENT CHAPTER      (CHAPTER_NUMBER, CHAPTER_TEXT)>
<!ELEMENT CHAPTER_NUMBER (#PCDATA)>
        .
        .
        .

Finally, the <CHAPTER_TEXT> element can contain <p> paragraph elements:

<!ELEMENT p            (#PCDATA)>
<!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER       (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT   (#PCDATA)>
<!ELEMENT SUBTITLE     (#PCDATA)>
<!ELEMENT INTRODUCTION (HEADER, p+)+>
<!ELEMENT PART         (HEADER, CHAPTER+)>
<!ELEMENT SECTION      (HEADER, p+)>
<!ELEMENT HEADER       (#PCDATA)>
<!ELEMENT CHAPTER      (CHAPTER_NUMBER, CHAPTER_TEXT)>
<!ELEMENT CHAPTER_NUMBER (#PCDATA)>
<!ELEMENT CHAPTER_TEXT (p)+>

And that's it; the DTD is finished. Here's how it looks in the <!DOCTYPE> declaration in the complete document:

<?xml version="1.0"?>
<!DOCTYPE BOOK [
    <!ELEMENT p (#PCDATA)>
    <!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
    <!ELEMENT OPENER       (TITLE_TEXT)*>
    <!ELEMENT TITLE_TEXT   (#PCDATA)>
    <!ELEMENT SUBTITLE     (#PCDATA)>
    <!ELEMENT INTRODUCTION (HEADER, p+)+>
    <!ELEMENT PART         (HEADER, CHAPTER+)>
    <!ELEMENT SECTION      (HEADER, p+)>
    <!ELEMENT HEADER       (#PCDATA)>
    <!ELEMENT CHAPTER      (CHAPTER_NUMBER, CHAPTER_TEXT)>
    <!ELEMENT CHAPTER_NUMBER (#PCDATA)>
    <!ELEMENT CHAPTER_TEXT (p)+>
]>
<BOOK>
    <OPENER>
        <TITLE_TEXT>
            All About Me
        </TITLE_TEXT>
    </OPENER>
    <PART>
        <HEADER>Welcome To My Book</HEADER>
        <CHAPTER>
            <CHAPTER_NUMBER>CHAPTER 1</CHAPTER_NUMBER>
            <CHAPTER_TEXT>
                <p>Glad you want to hear about me.</p>
                <p>There's so much to say!</p>
                <p>Where should we start?</p>
                <p>How about more about me?</p>
            </CHAPTER_TEXT>
        </CHAPTER>
    </PART>
</BOOK>
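
The sample document exercises only the <PART> branch of the DTD. As a sketch of the branches it leaves out, here is a second document that should also be valid against the same declarations, this time using the optional <SUBTITLE> and <INTRODUCTION> elements and a <SECTION>. The subtitle, headers, and paragraph text are invented placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE BOOK [
    <!ELEMENT p (#PCDATA)>
    <!ELEMENT BOOK         (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
    <!ELEMENT OPENER       (TITLE_TEXT)*>
    <!ELEMENT TITLE_TEXT   (#PCDATA)>
    <!ELEMENT SUBTITLE     (#PCDATA)>
    <!ELEMENT INTRODUCTION (HEADER, p+)+>
    <!ELEMENT PART         (HEADER, CHAPTER+)>
    <!ELEMENT SECTION      (HEADER, p+)>
    <!ELEMENT HEADER       (#PCDATA)>
    <!ELEMENT CHAPTER      (CHAPTER_NUMBER, CHAPTER_TEXT)>
    <!ELEMENT CHAPTER_NUMBER (#PCDATA)>
    <!ELEMENT CHAPTER_TEXT (p)+>
]>
<BOOK>
    <OPENER>
        <TITLE_TEXT>All About Me</TITLE_TEXT>
    </OPENER>
    <!-- Optional elements, in the order the BOOK content model requires -->
    <SUBTITLE>The Short Version</SUBTITLE>
    <INTRODUCTION>
        <HEADER>Before We Begin</HEADER>
        <p>This introduction has one header and one paragraph.</p>
    </INTRODUCTION>
    <!-- (SECTION | PART)+ lets sections and parts appear in any mix -->
    <SECTION>
        <HEADER>A Quick Overview</HEADER>
        <p>Sections hold paragraphs directly.</p>
    </SECTION>
    <PART>
        <HEADER>Welcome To My Book</HEADER>
        <CHAPTER>
            <CHAPTER_NUMBER>CHAPTER 1</CHAPTER_NUMBER>
            <CHAPTER_TEXT>
                <p>Glad you want to hear about me.</p>
            </CHAPTER_TEXT>
        </CHAPTER>
    </PART>
</BOOK>
```

Moving <SUBTITLE> after <INTRODUCTION>, or dropping both the <SECTION> and the <PART>, would make this document invalid, because the sequence in the <BOOK> content model fixes both the order and the minimum content.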

External DTDs

The DTDs I've created in this chapter so far have all been built into the documents that they are targeted for. However, you can also create and use external DTDs, where the actual DTD is stored in an external file (usually with the extension .dtd).

Using external DTDs makes it easy to create an XML application that can be shared by many people; in fact, that's the way many XML applications are supported. There are two ways to specify external DTDs: as private DTDs or as public DTDs. I'll take a look at private DTDs first.

Private DTDs are intended for use by people or groups privately, not for public distribution. You specify an external private DTD with the SYSTEM keyword in the <!DOCTYPE> declaration like this (notice also that because this document now depends on an external file, the DTD file order.dtd, I've changed the value of the standalone attribute from "yes" to "no"):

<?xml version = "1.0" standalone="no"?>
<!DOCTYPE DOCUMENT SYSTEM "order.dtd">
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
            .
            .
            .
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Here's the file order.dtd that holds the external DTD. Note that it simply holds the part of the document that was originally between the [ and ] in the <!DOCTYPE> declaration:

<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>

Using Document Type Definitions with URLs

The previous example just listed the name of an external DTD in the <!DOCTYPE> declaration, but if the DTD is not in the same directory on the Web site as the document itself, you can specify a URI (which, in practice, means a URL for today's XML processors) for the DTD like this:

<?xml version = "1.0" standalone="no"?>
<!DOCTYPE DOCUMENT SYSTEM
    "http://www.starpowder.com/dtd/order.dtd">
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
            .
            .
            .
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

This is also very useful, of course, if you're using someone else's DTD. In fact, there's a special way of using DTDs intended for public distribution.

Public Document Type Definitions

When you have a DTD that's intended for public use, you use the PUBLIC keyword instead of SYSTEM in the <!DOCTYPE> document type declaration. To use the PUBLIC keyword, you must also create a formal public identifier (FPI), and there are specific rules for FPIs:

The first field in an FPI specifies the connection of the DTD to a formal standard. For DTDs that you're defining yourself, this field should be "-". If a nonstandards body has approved the DTD, use "+". For formal standards, this field is a reference to the standard itself (such as ISO/IEC 13449:2000).
The second field must hold the name of the group or person that is going to maintain or be responsible for the DTD. In this case, you should use a name that is unique and that identifies your group easily (for example, W3C simply uses W3C).
The third field must indicate the type of document that is described, preferably followed by a unique identifier of some kind (such as Version 1.0). This part should include a version number that you'll update.
The fourth field specifies the language that your DTD uses (for example, for English you use EN). Note that two-letter language specifiers allow only a maximum of 26 x 26 = 676 possible languages; expect to see three-letter language specifiers in the near future.
Fields in an FPI must be separated by a double slash (//).

Here's how I can modify the previous example to include a public DTD, complete with its own FPI:

<?xml version = "1.0" standalone="no"?>
<!DOCTYPE DOCUMENT PUBLIC "-//starpowder//Custom XML Version 1.0//EN"
    "http://www.starpowder.com/steve/order.dtd">
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
            .
            .
            .
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Note the syntax of the <!DOCTYPE> declaration in this case: <!DOCTYPE rootname PUBLIC FPI URL>. Here's the external DTD, order.dtd, which is the same as in the previous example:

<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
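
For comparison, here is a widely used public identifier, the one for the XHTML 1.0 Strict DTD, with its four fields picked apart in a comment. The page content itself is a made-up minimal document, included only so that the sketch is complete.

```xml
<?xml version="1.0"?>
<!-- FPI fields, separated by double slashes:
     first field:  "-"
     second field: "W3C" (the group responsible for the DTD)
     third field:  "DTD XHTML 1.0 Strict" (document type and version)
     fourth field: "EN" (the language the DTD is written in)
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>FPI example</title>
    </head>
    <body>
        <p>Hello</p>
    </body>
</html>
```

As in the order example, the URL that follows the FPI tells a processor where to fetch the DTD if it does not recognize the public identifier.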

Using Both Internal and External DTDs

In fact, you can use both internal and external DTDs at the same time, using these forms of the <!DOCTYPE> declaration: <!DOCTYPE rootname SYSTEM URL [DTD]> for private external DTDs and <!DOCTYPE rootname PUBLIC FPI URL [DTD]> for public external DTDs. In this case, the external DTD is specified by URL and the internal one by DTD.

Here's an example where I've removed the declaration of the <PRODUCT> element from the external DTD order.dtd (note that the <ITEM> declaration still refers to <PRODUCT>):

<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>

Now I'll specify that I want to use this external DTD in the document's <!DOCTYPE> declaration but also add square brackets, [ and ], to enclose an internal DTD as well:

<?xml version = "1.0" standalone="no"?>
<!DOCTYPE DOCUMENT SYSTEM "order.dtd" [
    .
    .
    .
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>
                    <PRODUCT_ID>
                        198348209
                    </PRODUCT_ID>
                </PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            .
            .
            .
            <ITEM>
                <PRODUCT>
                    <PRODUCT_ID>
                        198348206
                    </PRODUCT_ID>
                </PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Next, I add the declaration of the <PRODUCT> element to the internal part of the DTD, like this:

<?xml version = "1.0" standalone="no"?>
<!DOCTYPE DOCUMENT SYSTEM "order.dtd" [
<!ELEMENT PRODUCT (PRODUCT_ID)>
<!ELEMENT PRODUCT_ID (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>
                    <PRODUCT_ID>
                        198348209
                    </PRODUCT_ID>
                </PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            .
            .
            .
            <ITEM>
                <PRODUCT>
                    <PRODUCT_ID>
                        198348206
                    </PRODUCT_ID>
                </PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

And that's all it takes; now this DTD uses both internal and external parts.
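
The same technique works with the PUBLIC form of the declaration. The sketch below reuses the FPI and URL from the public DTD example, and it assumes that the file at that URL is the version of order.dtd without the <PRODUCT> declaration; that assumption is made for this illustration only.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE DOCUMENT PUBLIC "-//starpowder//Custom XML Version 1.0//EN"
    "http://www.starpowder.com/steve/order.dtd" [
<!-- Internal part: supplies the declarations the external DTD leaves out -->
<!ELEMENT PRODUCT (PRODUCT_ID)>
<!ELEMENT PRODUCT_ID (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>
                    <PRODUCT_ID>198348209</PRODUCT_ID>
                </PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>
```

The internal part always goes last, inside the square brackets, after the FPI and the URL.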

If It's Both Internal and External, Which Takes Precedence?

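Theoretically, the internal DTD is supposed to take precedence over the external one; things were arranged that way to let you customize external DTDs as you like. In the XML 1.0 recommendation, however, that rule covers attribute and entity declarations: when the same attribute or entity is declared more than once, the first declaration is binding, and the internal DTD is treated as coming first. Element declarations are different, because declaring the same element more than once violates a validity constraint. That matches my experience, which is that most XML processors simply consider it an error if there is an element conflict between internal and external DTDs, and they usually just halt.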

Namespaces and DTDs

There's one more topic that I want to cover now that we're discussing the basics of creating DTDs: how to use namespaces when you're using DTDs. In fact, this will give us an introduction to the next chapter, where we'll work with declaring attributes as well as elements.

The important thing to recall is that as far as standard XML processors are concerned, namespace prefixes are just text prepended to tag and attribute names with a colon, so they change those tag and attribute names. This means that those names must be declared, with their prefixes, in the DTD.

Here's an example; I'll start with the easy case where I'm using a default namespace like this:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT xmlns="http://www.starpowder.com/dtd/">
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
    .
    .
    .
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Some validating XML processors aren't going to understand the xmlns attribute that you use to declare a namespace; to them, it's just another attribute that has not been declared. (See the later section "Validating Against a DTD" for details on XML validators.) This means that you must declare that attribute. Here, I'm using the <!ATTLIST> declaration (as we'll see how to do in the next chapter) to do that, indicating that the xmlns attribute has a fixed value, which I'm setting to the namespace identifier, "http://www.starpowder.com/dtd/":

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ATTLIST DOCUMENT
    xmlns CDATA #FIXED "http://www.starpowder.com/dtd/">
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT xmlns="http://www.starpowder.com/dtd/">
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
    .
    .
    .
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Now I'm free to use the xmlns attribute in the root element. That's all there is to setting up a default namespace when using DTDs.
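
To show what the #FIXED keyword buys you, here is a cut-down sketch of the same declaration at work. The <CUSTOMER> content model is simplified to plain text for brevity; that simplification is an assumption of this example.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ATTLIST DOCUMENT
    xmlns CDATA #FIXED "http://www.starpowder.com/dtd/">
<!-- Simplified for this sketch only -->
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<!-- Valid: the attribute value matches the #FIXED value exactly.
     Any other value for xmlns here would be a validity error. -->
<DOCUMENT xmlns="http://www.starpowder.com/dtd/">
    <CUSTOMER>Sam Smith</CUSTOMER>
</DOCUMENT>
```

In effect, the DTD now pins the document to one namespace: a validating processor should reject a document whose root element declares a different default namespace.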

However, if you want to use a namespace prefix throughout a document, the process is a little more involved. In this next example, I use the namespace prefix doc: for the namespace "http://www.starpowder.com/dtd/". To do that, I declare a new attribute, xmlns:doc, and use that attribute in the root element like this to set up the namespace:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ATTLIST DOCUMENT
    xmlns:doc CDATA #FIXED "http://www.starpowder.com/dtd/">
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT xmlns:doc="http://www.starpowder.com/dtd/">
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
    .
    .
    .
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Now I can use the doc: prefix throughout the document, where necessary:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE doc:DOCUMENT [
<!ELEMENT doc:DOCUMENT (doc:CUSTOMER)*>
<!ATTLIST doc:DOCUMENT
    xmlns:doc CDATA #FIXED "http://www.starpowder.com/dtd/">
<!ELEMENT doc:CUSTOMER (doc:NAME,doc:DATE,doc:ORDERS)>
<!ELEMENT doc:NAME (doc:LAST_NAME,doc:FIRST_NAME)>
<!ELEMENT doc:LAST_NAME (#PCDATA)>
<!ELEMENT doc:FIRST_NAME (#PCDATA)>
<!ELEMENT doc:DATE (#PCDATA)>
<!ELEMENT doc:ORDERS (doc:ITEM)*>
<!ELEMENT doc:ITEM (doc:PRODUCT,doc:NUMBER,doc:PRICE)>
<!ELEMENT doc:PRODUCT (#PCDATA)>
<!ELEMENT doc:NUMBER (#PCDATA)>
<!ELEMENT doc:PRICE (#PCDATA)>
]>
<doc:DOCUMENT xmlns:doc="http://www.starpowder.com/dtd/">
    <doc:CUSTOMER>
        <doc:NAME>
            <doc:LAST_NAME>Smith</doc:LAST_NAME>
            <doc:FIRST_NAME>Sam</doc:FIRST_NAME>
        </doc:NAME>
        <doc:DATE>October 15, 2001</doc:DATE>
        <doc:ORDERS>
            <doc:ITEM>
                <doc:PRODUCT>Tomatoes</doc:PRODUCT>
                <doc:NUMBER>8</doc:NUMBER>
                <doc:PRICE>$1.25</doc:PRICE>
            </doc:ITEM>
            <doc:ITEM>
                <doc:PRODUCT>Oranges</doc:PRODUCT>
                <doc:NUMBER>24</doc:NUMBER>
                <doc:PRICE>$4.98</doc:PRICE>
            </doc:ITEM>
        </doc:ORDERS>
    </doc:CUSTOMER>
    <doc:CUSTOMER>
        <doc:NAME>
            <doc:LAST_NAME>Jones</doc:LAST_NAME>
            <doc:FIRST_NAME>Polly</doc:FIRST_NAME>
        </doc:NAME>
        <doc:DATE>October 20, 2001</doc:DATE>
        <doc:ORDERS>
            <doc:ITEM>
                <doc:PRODUCT>Bread</doc:PRODUCT>
                <doc:NUMBER>12</doc:NUMBER>
                <doc:PRICE>$14.95</doc:PRICE>
            </doc:ITEM>
            <doc:ITEM>
                <doc:PRODUCT>Apples</doc:PRODUCT>
                <doc:NUMBER>6</doc:NUMBER>
                <doc:PRICE>$1.50</doc:PRICE>
            </doc:ITEM>
        </doc:ORDERS>
    </doc:CUSTOMER>
    <doc:CUSTOMER>
        <doc:NAME>
            <doc:LAST_NAME>Weber</doc:LAST_NAME>
            <doc:FIRST_NAME>Bill</doc:FIRST_NAME>
        </doc:NAME>
        <doc:DATE>October 25, 2001</doc:DATE>
        <doc:ORDERS>
            <doc:ITEM>
                <doc:PRODUCT>Asparagus</doc:PRODUCT>
                <doc:NUMBER>12</doc:NUMBER>
                <doc:PRICE>$2.95</doc:PRICE>
            </doc:ITEM>
            <doc:ITEM>
                <doc:PRODUCT>Lettuce</doc:PRODUCT>
                <doc:NUMBER>6</doc:NUMBER>
                <doc:PRICE>$11.50</doc:PRICE>
            </doc:ITEM>
        </doc:ORDERS>
    </doc:CUSTOMER>
</doc:DOCUMENT>

And that's all it takes; this document, complete with namespace, is now valid. This example has introduced us to a very important topic: declaring attributes in DTDs. I'll take a look at how that works in the next chapter.
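As a rough check of the prefixed version, the following Java sketch parses a shortened doc: document with the JDK's JAXP parser, with both DTD validation and namespace processing switched on. The class name PrefixedNamespaceCheck, the validatedRoot() helper, and the reduced content model are assumptions made for illustration only. The point is that the DTD matches the literal names (doc:DOCUMENT and so on), while the namespace-aware parser is what maps the doc: prefix to the URI.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper (not from the book); uses only the JDK's built-in JAXP parser.
class PrefixedNamespaceCheck {
    static final String NS = "http://www.starpowder.com/dtd/";

    // Shortened version of the prefixed listing: one customer, name only.
    static final String SAMPLE =
          "<?xml version=\"1.0\" standalone=\"yes\"?>\n"
        + "<!DOCTYPE doc:DOCUMENT [\n"
        + "<!ELEMENT doc:DOCUMENT (doc:CUSTOMER)*>\n"
        + "<!ATTLIST doc:DOCUMENT xmlns:doc CDATA #FIXED \"" + NS + "\">\n"
        + "<!ELEMENT doc:CUSTOMER (doc:NAME)>\n"
        + "<!ELEMENT doc:NAME (doc:LAST_NAME,doc:FIRST_NAME)>\n"
        + "<!ELEMENT doc:LAST_NAME (#PCDATA)>\n"
        + "<!ELEMENT doc:FIRST_NAME (#PCDATA)>\n"
        + "]>\n"
        + "<doc:DOCUMENT xmlns:doc=\"" + NS + "\">\n"
        + "  <doc:CUSTOMER><doc:NAME>"
        + "<doc:LAST_NAME>Smith</doc:LAST_NAME>"
        + "<doc:FIRST_NAME>Sam</doc:FIRST_NAME>"
        + "</doc:NAME></doc:CUSTOMER>\n"
        + "</doc:DOCUMENT>\n";

    // Returns the root element, or throws if the document is not valid.
    static Element validatedRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);     // check against the DTD
            factory.setNamespaceAware(true); // resolve prefixes to namespace URIs
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("document rejected: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Element root = validatedRoot(SAMPLE);
        System.out.println("qualified name: " + root.getTagName());
        System.out.println("local name:     " + root.getLocalName());
        System.out.println("namespace URI:  " + root.getNamespaceURI());
    }
}
```

Because the DTD declares doc:DOCUMENT as a plain name, the prefix used in the document has to match the one used in the DTD; the namespace URI only comes into play once the namespace-aware parser has resolved the prefix.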

Validating Against a DTD

How do you really know if your XML document is valid? One way is to check it with an XML validator, and there are plenty out there to choose from. As explained in Chapter 1, validators are packages that will check your XML and give you feedback. For example, if you have the XML for Java parser from IBM's AlphaWorks installed, you can use the DOMWriter program as a complete XML validator. In Chapter 1, I created this document, greeting.xml:

<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>

I tested this document using DOMWriter (we'll see more about the command-line syntax required in Chapter 11, "Java and the XML DOM"):

%java dom.DOMWriter greeting.xml
greeting.xml:
[Error] greeting.xml:2:11: Element type "DOCUMENT" must be declared
[Error] greeting.xml:3:15: Element type "GREETING" must be declared
[Error] greeting.xml:6:14: Element type "MESSAGE" must be declared.
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>

In this case, DOMWriter is complaining about the lack of a DTD in greeting.xml, which means that it can't check for the validity of the document.

Here's a list of some of the XML validators on the Web:

http://validator.w3.org/. The official W3C HTML validator. Although it's officially for HTML, it also includes some XML support. Your XML document must be online to be checked with this validator.
http://www.w3.org/People/Raggett/tidy/. Tidy is a beloved utility for cleaning up and repairing Web pages, and it includes limited support for XML. Your XML document must be online to be checked with this validator.
http://www.xml.com/xml/pub/tools/ruwf/check.html. This is XML.com's XML validator based on the Lark processor. Your XML document must be online to be checked with this validator.
http://www.ltg.ed.ac.uk/~richard/xml-check.html. The Language Technology Group at the University of Edinburgh's validator is based on the RXP parser. Your XML document must be online to be checked with this validator.
http://www.stg.brown.edu/service/xmlvalid/. This is an excellent XML validator from the Scholarly Technology Group at Brown University. This is the only online XML validator that I know of that allows you to check XML documents that are not online: you can use the Web page's file upload control to specify the file on your hard disk that you want to have uploaded and checked.

To see one of these online validators at work, take a look at Figure 3.1. There, I'm asking the XML validator from the Scholarly Technology Group to validate greeting.xml after I've added a DTD and purposely exchanged the order of the <MESSAGE> and </GREETING> tags:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (GREETING, MESSAGE)>
    <!ELEMENT GREETING (#PCDATA)>
    <!ELEMENT MESSAGE (#PCDATA)>
]>
<DOCUMENT>
    <GREETING>
        Hello From XML
    <MESSAGE>
    </GREETING>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>

Figure 3.2 shows the results; the validator indicates that there's a problem with these two tags.

Figure 3.1. Using an XML validator.

graphics/03fig01.gif

Figure 3.2. The results from an XML validator.

graphics/03fig02.gif

In general, then, you can use a validator to check your document, and there are plenty around. Validators can help a great deal as you're writing long and difficult XML documents because you can often check them at each development stage.
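If you'd rather check documents on your own machine, the validating parser built into the JDK (JAXP) can do a similar job. The sketch below is only an illustration under my own assumptions: the class name GreetingValidator and its greeting() and validate() helpers are hypothetical, and the whitespace in greeting.xml has been trimmed. It runs three variations of the greeting document: one without a DTD, one with the DTD, and one with the DTD and the <MESSAGE> and </GREETING> tags swapped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper (not from the book); uses only the JDK's built-in JAXP parser.
class GreetingValidator {
    static final String DTD =
          "<!DOCTYPE DOCUMENT [\n"
        + "    <!ELEMENT DOCUMENT (GREETING, MESSAGE)>\n"
        + "    <!ELEMENT GREETING (#PCDATA)>\n"
        + "    <!ELEMENT MESSAGE (#PCDATA)>\n"
        + "]>\n";

    // Builds a compact greeting document, optionally with the DTD and with swapped tags.
    static String greeting(boolean withDtd, boolean swapTags) {
        String body = swapTags
            ? "<GREETING>Hello From XML<MESSAGE></GREETING>"
              + "Welcome to the wild and woolly world of XML.</MESSAGE>"
            : "<GREETING>Hello From XML</GREETING>"
              + "<MESSAGE>Welcome to the wild and woolly world of XML.</MESSAGE>";
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + (withDtd ? DTD : "")
            + "<DOCUMENT>" + body + "</DOCUMENT>\n";
    }

    // Parses with DTD validation switched on and returns every problem reported.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Well-formedness (fatal) errors stop the parse and end up here.
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("no DTD:       " + validate(greeting(false, false)));
        System.out.println("with DTD:     " + validate(greeting(true, false)));
        System.out.println("swapped tags: " + validate(greeting(true, true)));
    }
}
```

Note that the swapped-tags version is not merely invalid: the overlapping tags mean that it is no longer well-formed, so the parser reports a fatal error and stops instead of continuing with validation.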
