Document Type Definitions (DTDs) specify the elements and attributes allowed in a document type and constrain the order in which elements must appear within that document type. A DTD is often a single file containing declarations that define the element types and attribute lists. (In theory, a DTD may span more than one file; however, the mechanism for including one file inside another, parameter entities, is outside the scope of this appendix.)
D.1.1 Element Type Declarations
An element is the actual instance of the structure as found in an XML document, whereas the element type defines the element, giving it a name and a structure. The form of an element type declaration is:
<!ELEMENT element-name contentspec>
The allowable content, contentspec, is defined in terms of a simple grammar that allows the expression of sequence, alternatives, and iteration within elements. For a formal definition of the element type declaration, see Section 3.2 of the XML 1.0 specification at http://w3.org/TR/REC-xml#NT-elementdecl. Table D-1 introduces the most common constructs.
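As a rough illustration of sequence, alternatives, and iteration, the following sketch declares a small set of element types. The element names (memo, to, from, body, and so on) are hypothetical and are not used elsewhere in this appendix.

```xml
<!-- Hypothetical element names, for illustration only -->

<!-- Sequence: a memo holds to, from, and body, in that order -->
<!ELEMENT memo (to, from, body)>

<!-- Text-only content -->
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>

<!-- Alternatives with iteration: one or more para or list elements, in any mix -->
<!ELEMENT body (para | list)+>

<!-- Occurrence indicators: ? is zero or one, + is one or more, * is zero or more -->
<!ELEMENT list (title?, item+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT item (#PCDATA)>

<!-- Mixed content: text interleaved with any number of emph elements -->
<!ELEMENT para (#PCDATA | emph)*>
<!ELEMENT emph (#PCDATA)>
```

A content particle with no occurrence indicator, such as to in the memo declaration, must appear exactly once.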
For a document to be valid, the DTD must provide an element type declaration for every element used in the document and the contents of all of those elements must conform to the content models specified in the element type declaration. Element type declarations leave off one important aspect of elements, however: attributes.
D.1.2 Attribute List Declarations
Inside a DTD, permissible attributes are specified on a per-element basis. An attribute list declaration takes this form:
<!ATTLIST element-name attribute-definitions >
In the attribute definitions, you have to identify the attribute's name and type, whether the attribute is optional or required, and, if necessary, the attribute's default value. Unlike elements, attributes can have default values, which are inserted by an application when it parses the XML document, even if they're not explicitly written in the document. Attributes can store all kinds of content, but the main types used are CDATA (character data, including entity and character references), ID (identifiers whose value must be unique within the document), and IDREF and IDREFS (which point to ID values). Attribute definitions may also specify a list of acceptable values rather than a generic type. Attribute types are only a subset of the XSD types described in Appendix C; all of them are textual. Table D-2 shows some common attribute definitions.
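The ID, IDREF, and IDREFS types can be sketched with a hypothetical employee element. The element and attribute names here are assumptions made up for illustration.

```xml
<!-- Hypothetical element and attribute names, for illustration only -->
<!ATTLIST employee
    id       ID      #REQUIRED
    manager  IDREF   #IMPLIED
    mentors  IDREFS  #IMPLIED>
<!-- id:      required, and its value must be unique within the document -->
<!-- manager: optional, and must match the ID value of some element -->
<!-- mentors: optional, a space-separated list of ID values -->
```

In a document, an element such as <employee id="e2" manager="e1" mentors="e1 e3"/> would be valid only if other elements carry the ID values e1 and e3. ID values must be XML names, so they cannot begin with a digit.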
Here's a complete attribute list declaration for a fictitious animal element, which must have a name, has either two or four legs (four by default), and, optionally, a notes attribute:
<!ATTLIST animal
    name   CDATA       #REQUIRED
    legs   (two|four)  "four"
    notes  CDATA       #IMPLIED >
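To see how this declaration plays out in a document, consider the following fragments. The attribute values are invented for illustration, and the sketch assumes that animal is declared elsewhere as an empty element.

```xml
<!-- Valid: name is supplied; a parser that reads the DTD reports legs="four" by default -->
<animal name="cat"/>

<!-- Valid: legs is one of the enumerated values, and the optional notes is present -->
<animal name="ostrich" legs="two" notes="Flightless bird"/>

<!-- Invalid: the required name attribute is missing -->
<animal legs="four"/>

<!-- Invalid: "six" is not in the (two|four) list -->
<animal name="ant" legs="six"/>
```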
While attributes can be very useful for annotations, Microsoft Office tends to use element content for information that's presented directly. You can certainly use attributes, but you may find it easier to stick with elements unless you have a particular reason to choose attributes.
D.1.3 Putting It Together
To demonstrate a complete DTD, we'll explore a document and its DTD. The document is shown in Example D-1, while the DTD is shown in Example D-2.
Example D-1. A valid XML document
<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE authors SYSTEM "http://example.com/authors.dtd">
<authors>
  <person abbrev="edd">
    <name>Edd Dumbill</name>
    <nationality>British</nationality>
  </person>
  <person abbrev="simonstl">
    <name>Simon St.Laurent</name>
    <nationality>American</nationality>
  </person>
  <person abbrev="vdv">
    <name>Eric van der Vlist</name>
    <nationality>French</nationality>
  </person>
</authors>
The DOCTYPE declaration at the top of Example D-1 assumes that the DTD file shown in Example D-2 has been placed on a web server at example.com. Note that the document type declaration specifies the root element of the document, not the DTD itself. (You could use the same DTD for documents that used person, name, or nationality as the root element of a valid document.)
Example D-2. The DTD for Example D-1
<!ELEMENT authors (person)* >
<!ELEMENT person (name,nationality)>
<!ATTLIST person abbrev CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT nationality (#PCDATA)>
The DTD defines the structures you find in the document. There is an authors element type that may contain zero or more person elements. In this document, we have three person elements. There is a person element type that must contain a name element followed by a nationality element. Each of the person elements in the document has those parts in that order. The person elements are required to have an attribute named abbrev, and all of them do. Finally, the name element type and the nationality element type can only hold textual content. All of the name and nationality elements here do that.
A validating XML 1.0 processor is required to check the input document against its DTD. If the document is not valid, errors are reported to the application, which typically rejects the document. Non-validating processors will accept the document even if it doesn't conform to the structures defined by the DTD, and use the DTD only for things like default values for attributes. Microsoft Office and most Microsoft tools use non-validating XML 1.0 parsers. (Schema validation is a separate process, defined long after XML 1.0 was finished.)
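For contrast, here is a sketch of a document that is well-formed but not valid against the DTD in Example D-2. The document is a deliberately broken variation on Example D-1, made up for illustration.

```xml
<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE authors SYSTEM "http://example.com/authors.dtd">
<authors>
  <!-- Invalid: the required abbrev attribute is missing,
       and nationality appears before name -->
  <person>
    <nationality>British</nationality>
    <name>Edd Dumbill</name>
  </person>
</authors>
```

A validating processor would report both problems. A non-validating processor would accept the document, because every tag is properly nested and closed.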
D.1.4 Other DTD Features
DTDs include a number of other features that aren't covered here. Parameter entities and conditional sections make it possible for developers to create more flexible DTDs, turning features on and off or reusing them. Documents can contain internal subsets in the DOCTYPE declaration, adding their own information to the document type declaration. Entity declarations make it possible for developers to create named references to content, making it simpler to include external files or characters not easily accessed from the keyboard. Notation declarations and unparsed entities make it possible to create metadata and include non-XML content, though these are rarely used. DTDs do not directly support namespaces or XML Schema datatypes.
While Microsoft Office applications can process these features when opening a file (except for notations and unparsed entities, which they ignore), all of the DOCTYPE information is removed when the document is saved back out. Because XSD provides no support at all for entities, you can't preserve the entity information from an XML DTD in a schema and use that with Office.
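The internal subset and entity declarations mentioned in this section can be sketched as follows. The entity name uk is a hypothetical choice, and the rest of the document is borrowed from Example D-1.

```xml
<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE authors SYSTEM "http://example.com/authors.dtd" [
  <!-- Internal subset: declarations local to this one document -->
  <!-- Hypothetical entity: a named reference to a piece of text -->
  <!ENTITY uk "British">
]>
<authors>
  <person abbrev="edd">
    <name>Edd Dumbill</name>
    <!-- The parser replaces &uk; with the text British -->
    <nationality>&uk;</nationality>
  </person>
</authors>
```

The square brackets inside the DOCTYPE declaration enclose the internal subset, which is read along with the external DTD named by the SYSTEM identifier.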