The Issues with DTDs | Professional XML (Programmer to Programmer)

The previous chapter reviewed DTDs and how to use them with your XML documents. It described DTDs as a means of creating an XML vocabulary for your XML structures. Vocabulary is a valid word to describe how you define the structure of a document. Another word that is used just as often is schema, and the two terms are interchangeable. XML Schemas are another way to define a vocabulary for your XML documents.

If DTDs were defined with the XML specification and have been in use since the days of SGML, why the need for a new means of creating a vocabulary? The DTD method of defining a vocabulary for XML documents has some shortcomings that motivated the change.

To understand these issues, look at a sample of a DTD that is embedded within a simple XML document. This document is presented in Listing 6-1.

Listing 6-1: An XML document with an embedded DTD

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE Process [
      <!ELEMENT Address (#PCDATA)>
      <!ELEMENT City (#PCDATA)>
      <!ELEMENT Country (#PCDATA)>
      <!ELEMENT Item (#PCDATA)>
      <!ELEMENT Name (#PCDATA)>
      <!ELEMENT Order (Item, Quantity)>
      <!ELEMENT Process (Name, Address, City, State, Country, Order)>
      <!ELEMENT Quantity (#PCDATA)>
      <!ELEMENT State (#PCDATA)>
      ]>
      <Process>
         <Name>Bill Evjen</Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
         <Order>
            <Item>52-inch Plasma</Item>
            <Quantity>1</Quantity>
         </Order>
      </Process>

In this listing, you can see some obvious problems. The first is that a DTD definition looks nothing like XML. This makes the process of learning DTDs more difficult than it should be. After learning the syntax of XML, the XML author also needs to learn a second syntax that is quite different from it. XML Schemas, on the other hand, use standard XML syntax to create the vocabulary, which makes the transition to XML Schemas much simpler.
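To make the contrast concrete, here is a minimal sketch of how the vocabulary from Listing 6-1 might be expressed as an XML Schema. It is an illustration under assumptions rather than a definitive translation: the xs prefix is only a common convention, and every leaf element is typed as xs:string to mirror #PCDATA as closely as possible.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: one possible schema for the <Process> document in Listing 6-1 -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Process">
    <xs:complexType>
      <!-- xs:sequence plays the role of the comma-separated list in the DTD -->
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Address" type="xs:string"/>
        <xs:element name="City" type="xs:string"/>
        <xs:element name="State" type="xs:string"/>
        <xs:element name="Country" type="xs:string"/>
        <xs:element name="Order">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Item" type="xs:string"/>
              <xs:element name="Quantity" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The schema is itself a well-formed XML document, so the same editors and parsers that handle the instance document can also handle its vocabulary definition.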

Beyond the overall syntax provided via the DTD document, you can see that an element hierarchy is explicitly defined.

      <!ELEMENT Process (Name, Address, City, State, Country, Order)>

If another element besides <Process> requires the same set of subelements, DTDs give you no dedicated way to declare that set once and reuse it. The closest a DTD comes is a parameter entity, which works by textual substitution rather than as a named construct in the vocabulary. In practice, you end up declaring the same content model again. XML Schemas, on the other hand, allow you to create a named group of elements or attributes that can be reused throughout the declaration set.
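As a rough sketch of that reuse, the fragment below declares a named group once and references it from two elements. The group name ContactDetails and the second element <Customer> are hypothetical names chosen for illustration, and <Order> is simplified to a string to keep the example short.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical group: the shared subelements are declared once -->
  <xs:group name="ContactDetails">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Address" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
      <xs:element name="State" type="xs:string"/>
      <xs:element name="Country" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- <Process> pulls in the group and then adds its own <Order> child -->
  <xs:element name="Process">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="ContactDetails"/>
        <xs:element name="Order" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Hypothetical second element that reuses the same group unchanged -->
  <xs:element name="Customer">
    <xs:complexType>
      <xs:group ref="ContactDetails"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A change to the group, such as adding a postal code element, would then apply to every element that references it.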

DTDs also don't give you any extensive datatyping capabilities. Instead, you can perform only simple datatyping. For example, look at the following DTD statement:

      <!ELEMENT Quantity (#PCDATA)>

In this case, the <Quantity> element is declared as #PCDATA, which means that it can contain only parsed character data. You can't get much more detailed than that using DTDs. In the XML document that makes use of this element, you see that a number is provided as the content value.

      <Quantity>1</Quantity>

Although this content is valid for the #PCDATA declaration, so is any other text, such as the word "several". It would be better if the declaration were more explicit and precise about the possible values of the <Quantity> element. Using XML Schemas, you can specify that the contents of the <Quantity> element should be an int, double, long, or any of many other possible datatypes. This lets a validator reject a nonnumeric quantity before your application ever processes it.
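A minimal sketch of such a typed declaration follows. The choice of xs:int is an assumption made for illustration; a stricter built-in type such as xs:positiveInteger could be substituted if zero and negative quantities should also be rejected.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- <Quantity>1</Quantity> validates; <Quantity>several</Quantity> does not -->
  <xs:element name="Quantity" type="xs:int"/>
</xs:schema>
```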

Another weakness of DTDs is that they give you a limited model for defining cardinality, that is, the number of times an element can appear in the XML document. You do this using one of three available qualifiers. For instance:

      <!ELEMENT Mail (Name, Address+, ZipCode)>

In this case, the <Address> element is used as a child element of the <Mail> element, and the + qualifier specifies that the <Address> element must appear one or more times. Besides the + qualifier, you can also use the ? qualifier.

      <!ELEMENT Mail (Salutation?, Name, Address+, ZipCode)>

The ? qualifier states that the <Salutation> element is optional: it can appear at most once within the <Mail> element, or not at all. The last qualifier available in DTDs is the * qualifier.

      <!ELEMENT Mail (Salutation?, Name*, Address+, ZipCode)>

In this case, the <Name> element is defined as something that can appear zero or more times within the <Mail> element. You might look at these three qualifiers and wonder where the problem lies, but imagine that you wanted the <Address> element to appear exactly three times, no more and no less. What if you wanted the <Address> element to appear between two and five times in the XML document? In a DTD, the only option is to spell out every occurrence by hand, for example (Address, Address, Address), which quickly becomes unwieldy. XML Schemas let you state such limits directly, so you can be quite specific about how often items appear in the XML document. That level of precision is valuable in your vocabulary definitions.
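The two-to-five case can be sketched with the minOccurs and maxOccurs attributes, as shown below. This is an illustrative fragment that types every child as xs:string for brevity; both attributes default to 1 when omitted.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Mail">
    <xs:complexType>
      <xs:sequence>
        <!-- Equivalent of the DTD ? qualifier: zero or one occurrence -->
        <xs:element name="Salutation" type="xs:string" minOccurs="0"/>
        <!-- No attributes given, so exactly one occurrence is required -->
        <xs:element name="Name" type="xs:string"/>
        <!-- Between two and five occurrences, stated directly -->
        <xs:element name="Address" type="xs:string" minOccurs="2" maxOccurs="5"/>
        <xs:element name="ZipCode" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Setting both minOccurs and maxOccurs to 3 on <Address> would express the exactly-three case, and maxOccurs="unbounded" reproduces the open-ended behavior of the + and * qualifiers.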

As this chapter shows, XML Schemas give you much more control over the vocabulary of your documents and allow for a more specific validation process. That is why XML Schemas are the most popular method for validating XML documents.