As this book goes to press, the Schema Recommendation has been approved for two years, but I wouldn't be surprised if you still find some applications using DTDs instead of schemas. Let's take a quick look at DTDs, using the XML representation of our simple CSV address book file (limited to ten columns for brevity). Here's the instance document with two rows, followed by the DTD.
SimpleCSV with DTD (SimpleCSV1DTD.xml)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SimpleCSV SYSTEM "SimpleCSV1.dtd">
<SimpleCSV>
  <Row>
    <Column01>Jones</Column01>
    <Column02>Mary</Column02>
    <Column03>312 Renner Road</Column03>
    <Column04>Apartment C</Column04>
    <Column05>Richardson</Column05>
    <Column06>TX</Column06>
    <Column07>75080</Column07>
    <Column08>USA</Column08>
    <Column09>972-996-1051</Column09>
  </Row>
  <Row>
    <Column01>Smith</Column01>
    <Column02>Sue</Column02>
    <Column03>Highway 118</Column03>
    <Column05>Terlingua</Column05>
    <Column06>TX</Column06>
    <Column07>79852</Column07>
    <Column10>email@example.com</Column10>
  </Row>
</SimpleCSV>
The line that contains DOCTYPE is the document type declaration. It names the document type, SimpleCSV, which must match the name of the document's root Element, and says that the DTD for this document may be found at the URI specified after the SYSTEM keyword. (We'll talk more about URIs in the Understanding Namespaces section later in this chapter.)
Here's the SimpleCSV DTD.
DTD for SimpleCSV (SimpleCSV1.dtd)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT SimpleCSV (Row+)>
<!ELEMENT Row (Column01, Column02, Column03, Column04?, Column05, Column06, Column07, Column08?, Column09?, Column10?)>
<!ELEMENT Column01 (#PCDATA)>
<!ELEMENT Column02 (#PCDATA)>
<!ELEMENT Column03 (#PCDATA)>
<!ELEMENT Column04 (#PCDATA)>
<!ELEMENT Column05 (#PCDATA)>
<!ELEMENT Column06 (#PCDATA)>
<!ELEMENT Column07 (#PCDATA)>
<!ELEMENT Column08 (#PCDATA)>
<!ELEMENT Column09 (#PCDATA)>
<!ELEMENT Column10 (#PCDATA)>
The general form of an Element declaration is the keyword ELEMENT, followed by the Element name, followed by the Element content description in parentheses. The last ten lines describe the column Elements. They indicate that the columns have parsed character data (PCDATA) as content. The Row declaration specifies that a Row Element is a sequence of Elements named Column01 through Column10, in that order. The question marks after some of the columns indicate, in Extended Backus-Naur Form notation, that they are optional. The SimpleCSV declaration shows that the document is composed of a sequence of one or more Row Elements, the plus sign (+) indicating one or more.
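To make the content model concrete, here is a minimal Python sketch that restates the Row declaration as a regular expression over child Element names. It is not part of the original example and it is not a DTD validator: the function names are hypothetical, and it checks only the order and presence of Elements, not their content.

```python
import re
import xml.etree.ElementTree as ET

# Row content model from SimpleCSV1.dtd, rewritten as a regular expression
# over a list of child element names, each followed by one space. A "?" after
# a group plays the same role as "?" in the DTD: that element is optional.
ROW_MODEL = re.compile(
    r"Column01 Column02 Column03 (Column04 )?Column05 Column06 Column07 "
    r"(Column08 )?(Column09 )?(Column10 )?"
)


def row_matches_model(row):
    """Return True if the children of a Row element follow the DTD sequence."""
    names = "".join(child.tag + " " for child in row)
    return ROW_MODEL.fullmatch(names) is not None


def document_matches_model(xml_text):
    """Check the SimpleCSV (Row+) rule and the Row rule; text is ignored."""
    root = ET.fromstring(xml_text)
    rows = list(root)
    return (
        root.tag == "SimpleCSV"
        and len(rows) >= 1  # the "+" in (Row+): at least one Row
        and all(r.tag == "Row" and row_matches_model(r) for r in rows)
    )
```

Under these assumptions, the Smith row from the listing passes even though it omits Column04, Column08, and Column09, while a row that omits a mandatory column such as Column02 does not.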
If our document used the ColumnNumber Attribute discussed in Chapter 2, the instance document would look like this:
The relevant part of the DTD would look like this, assuming that use of the Attribute was required:
<!ELEMENT Column (#PCDATA)>
<!ATTLIST Column ColumnNumber CDATA #REQUIRED>
The general form of the ATTLIST declaration has the Element name followed by a series of individual Attribute declarations. Each of these declarations is a triple. The first member of the triple is the Attribute name. The second is the Attribute Type, which is usually a CDATA string or an enumeration. The third is a requirement designation (somewhat confusingly referred to in the Recommendation as the "default declaration"). The requirement designator can take on three basic forms: (1) the keyword #REQUIRED, meaning that the Attribute must be present; (2) the keyword #IMPLIED, meaning that it may be absent; and (3) a default value for the Attribute, declared in the requirement designator. In the third form, if the Attribute is not actually present in an instance document, the parser should return this default value as if the Attribute were present in the instance document with that value. The keyword #FIXED may precede the default value in the declaration, indicating that the Attribute must always have the default value.
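The default-value behavior can be observed with Python's standard expat bindings, which read declarations in a document's internal DTD subset. The sketch below is an illustration only. The Source and Version Attributes, the ColumnNumber values, and the function name are invented for the example, and the DTD is placed inline in the DOCTYPE so that no external file is needed. Note that expat is a non-validating parser, so it supplies defaults but does not enforce #REQUIRED or #FIXED.

```python
from xml.parsers import expat

# Hypothetical document: Source has a plain default, Version a #FIXED one.
DOC = """<!DOCTYPE SimpleCSV [
  <!ATTLIST Row
      Source  CDATA "unknown"
      Version CDATA #FIXED "1">
  <!ATTLIST Column ColumnNumber CDATA #REQUIRED>
]>
<SimpleCSV>
  <Row>
    <Column ColumnNumber="01">Jones</Column>
  </Row>
  <Row Source="import">
    <Column ColumnNumber="01">Smith</Column>
  </Row>
</SimpleCSV>"""


def row_attributes(xml_text):
    """Return the attributes the parser reports for each Row element."""
    rows = []

    def start(name, attrs):
        # attrs includes attributes defaulted from the ATTLIST declarations,
        # because expat's specified_attributes option is off by default.
        if name == "Row":
            rows.append(dict(attrs))

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return rows


for attrs in row_attributes(DOC):
    print(attrs)
```

The first Row carries no Attributes in the document, yet the parser reports both Source and Version with their declared defaults. The second Row keeps its own Source value and still receives the Version default.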
You may also encounter entities, which function like macros in programming languages. An entity is declared with a shorthand name and its contents. It can then be used in the DTD or an instance document, and an XML processor will replace the Entity reference with its content. We saw an example of this in the previous chapter with the predefined entities for special syntax characters.
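As a small illustration of entity replacement, the following Python sketch declares a hypothetical entity named state in the internal DTD subset and uses it alongside the predefined amp entity. The entity name, the sample values, and the function name are assumptions made for this example, not part of the SimpleCSV DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "state" is declared here; "amp" is predefined by XML.
DOC = """<!DOCTYPE SimpleCSV [
  <!ENTITY state "TX">
]>
<SimpleCSV>
  <Row>
    <Column03>Smith &amp; Jones Road</Column03>
    <Column06>&state;</Column06>
  </Row>
</SimpleCSV>"""


def column_text(xml_text):
    """Map each Column element name to its text after entity replacement."""
    root = ET.fromstring(xml_text)
    return {el.tag: el.text for el in root.iter() if el.tag.startswith("Column")}


print(column_text(DOC))
```

By the time the application sees the data, the Entity references are gone: Column06 holds the replacement text TX, and Column03 holds a literal ampersand.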
Those are the basics. If you need to delve any further, check my Web site for recommendations for a few good basic books on XML. The more technically inclined (or adventurous) may also want to go straight to the authoritative source on DTDs, the W3C XML 1.0 Recommendation.