Chapter 4. XML Document Type Definitions



Document Type Definitions (DTDs) are important in data exchange. Parties exchanging data must agree on a format, and a DTD allows the specification of that format.

DTDs are used to specify the allowed syntax of an XML application [XML], including the values of entities and special properties of attributes—for example, that an attribute is a unique element identifier (ID). Familiarity with DTDs is useful because they are a fundamental part of XML parsing. In this book, we use DTDs to specify the syntax for XML signatures and some other XML security structures.
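An attribute is an ID because a DTD declares it to be one, not because of its name. This matters later, when signatures refer to elements by ID. The following minimal sketch uses Python's standard-library expat parser, which is our choice for illustration. The `collect_ids` function and the sample document are hypothetical. The sketch reads the attribute-list declarations to learn which attribute of each element has type ID, then collects the ID values found in the body. Expat does not validate, so the sketch checks uniqueness itself.

```python
from xml.parsers import expat

def collect_ids(xml_text):
    """Map each ID value to its element name, using only attributes the DTD types as ID."""
    id_attr_for = {}   # element name -> name of its ID-typed attribute
    ids = {}           # ID value -> element name
    duplicates = []    # ID values seen more than once (a validity error)

    def on_attlist(elname, attname, att_type, default, required):
        if att_type == "ID":
            id_attr_for[elname] = attname

    def on_start(name, attrs):
        attname = id_attr_for.get(name)
        if attname is not None and attname in attrs:
            value = attrs[attname]
            if value in ids:
                duplicates.append(value)
            ids[value] = name

    p = expat.ParserCreate()
    p.AttlistDeclHandler = on_attlist
    p.StartElementHandler = on_start
    p.Parse(xml_text, True)
    return ids, duplicates

DOC = """<!DOCTYPE list [
  <!ATTLIST item ref ID #IMPLIED>
]>
<list><item ref="a"/><item ref="b"/><other ref="c"/></list>"""

print(collect_ids(DOC))  # ({'a': 'item', 'b': 'item'}, [])
```

The `ref` attribute on `other` is not collected, because no declaration makes it an ID.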

Recently, the W3C devised a new method of syntax specification, called XML Schema [Schema], which is described in detail in Chapter 5. The XML Security standards, and this book, treat schemas as the more authoritative syntax specification. Although schemas provide a more precise description and are better suited to handling XML namespaces [Names], they do not eliminate the need for DTDs. Also, because schemas are such a recent addition to the XML arsenal, fewer tools are available for handling schemas than for working with DTDs.

Note: When the DTD of some signed XML specifies default attribute values, the expanded values of entities, and so on, it is usually necessary to sign the DTD as well, as discussed in Chapters 9 and 10 on canonicalization and signatures. Otherwise, an adversary could change the DTD and, in effect, change the meaning of the signed XML without breaking the signature.
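Any non-validating parser can reproduce the danger described in this note. The sketch below uses Python's expat module. The `payment` element, `amount` entity, and `currency` attribute are invented for illustration. In both runs the document body is byte-for-byte identical, and only the internal DTD subset differs. Even so, the parsed attribute value and text differ. This is not an XML Signature implementation. It only shows why a signature computed over the body alone would not detect the change.

```python
from xml.parsers import expat

BODY = "<payment>&amount;</payment>"  # the bytes the adversary leaves untouched

def parse_with_subset(internal_subset):
    """Parse BODY under the given internal DTD subset; return (attributes, text)."""
    seen = {"attrs": None, "text": []}

    def on_start(name, attrs):
        if name == "payment":
            seen["attrs"] = attrs   # includes defaults filled in from the DTD

    def on_text(data):
        seen["text"].append(data)   # entity references arrive already expanded

    p = expat.ParserCreate()
    p.StartElementHandler = on_start
    p.CharacterDataHandler = on_text
    p.Parse("<!DOCTYPE payment [" + internal_subset + "]>" + BODY, True)
    return seen["attrs"], "".join(seen["text"])

original = '<!ENTITY amount "100"><!ATTLIST payment currency CDATA "USD">'
altered = '<!ENTITY amount "100000"><!ATTLIST payment currency CDATA "EUR">'

print(parse_with_subset(original))  # ({'currency': 'USD'}, '100')
print(parse_with_subset(altered))   # ({'currency': 'EUR'}, '100000')
```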


If you are already familiar with DTDs, you can skip the rest of this chapter.


4.1 Introduction to DTDs

An XML document consists of a prolog and a body. The document prolog contains the XML declaration and the document type declaration for that document, both of which are optional. The document type declaration specifies the root element of the document, and it can specify the DTD. The document body contains the actual marked-up document.
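A parser reports these parts separately, which makes the split between prolog and body easy to see. The following sketch uses Python's standard-library expat, and `split_prolog` is a hypothetical helper. It records the XML declaration and the root element named by the document type declaration. It also records the first start tag, which begins the body. Expat is non-validating. It does not check that the declared root matches the actual root, and by default it does not fetch the `note.dtd` file named here.

```python
from xml.parsers import expat

def split_prolog(xml_text):
    """Report the prolog's XML and doctype declarations and the element that opens the body."""
    info = {"version": None, "encoding": None, "doctype_root": None,
            "system_id": None, "body_root": None}

    def on_xml_decl(version, encoding, standalone):
        info["version"], info["encoding"] = version, encoding

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["doctype_root"], info["system_id"] = name, system_id

    def on_start(name, attrs):
        if info["body_root"] is None:   # the first start tag begins the body
            info["body_root"] = name

    p = expat.ParserCreate()
    p.XmlDeclHandler = on_xml_decl
    p.StartDoctypeDeclHandler = on_doctype
    p.StartElementHandler = on_start
    p.Parse(xml_text, True)
    return info

full = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE note SYSTEM "note.dtd">\n<note>hi</note>'
bare = "<note>hi</note>"   # both prolog declarations are optional

print(split_prolog(full))
print(split_prolog(bare))
```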

The markup in an XML document describes the document's storage and logical structure and associates attributes and their values with its logical structures, as described in Chapter 3. XML DTDs provide a vocabulary and syntax for describing a document's structure. The XML Recommendation provides document type declarations to define constraints on the logical structure and to support the use of predefined storage units.

The XML Recommendation [XML] defines the XML document type declaration as containing or pointing to markup declarations that provide a grammar for a class of documents. Each markup declaration is one of four kinds:

  • It is an element type declaration (see Section 4.3).

  • It is an attribute-list declaration (see Section 4.4).

  • It is an entity declaration (see Section 4.5).

  • It is a notation declaration (see Section 4.6).

This grammar is known as a Document Type Definition. A DTD defines the allowable building blocks of an XML document; that is, it defines the document structure with a list of permissible elements, attributes, nestings, and so on.
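The following sketch shows the four kinds of markup declarations as a parser sees them. It registers one expat handler per kind and records each declaration in the internal subset in the order it appears. The sample DTD and the `markup_declarations` helper are made up for illustration. Only the handler names and their arguments come from Python's `xml.parsers.expat` module.

```python
from xml.parsers import expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report (#PCDATA)>
  <!ATTLIST report status (draft|final) "draft">
  <!ENTITY org "Example Org">
  <!NOTATION png SYSTEM "image/png">
]>
<report>&org;</report>"""

def markup_declarations(xml_text):
    """Return (kind, name) for each markup declaration, in document order."""
    found = []

    def on_element(name, model):
        found.append(("element", name))

    def on_attlist(elname, attname, att_type, default, required):
        found.append(("attlist", elname + "/" + attname))

    def on_entity(name, is_parameter_entity, value, base, system_id, public_id, notation_name):
        found.append(("entity", name))

    def on_notation(name, base, system_id, public_id):
        found.append(("notation", name))

    p = expat.ParserCreate()
    p.ElementDeclHandler = on_element
    p.AttlistDeclHandler = on_attlist
    p.EntityDeclHandler = on_entity
    p.NotationDeclHandler = on_notation
    p.Parse(xml_text, True)
    return found

for kind, name in markup_declarations(DOC):
    print(kind, name)
```

It prints one line per declaration: `element report`, `attlist report/status`, `entity org`, and `notation png`.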

Note: The abbreviation "DTD" can be a source of confusion, because it could stand for either of two different terms. The document type declaration includes everything between the string "<!DOCTYPE" and the matching ">". It can contain the Document Type Definition (DTD), or it can contain part of the DTD and point to the rest. When people talk about the document type declaration, they usually say "doctype declaration" for short [XML A].


The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, it can contain the markup declarations directly in an internal subset, or it can do both. The DTD for a document consists of both subsets taken together.

Because you can define a DTD within a document or reference and access it externally, a single DTD can apply to one document or to many documents. By convention, external DTDs are stored in plain-text files with the extension .dtd. For example:

mydtd.dtd
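The sketch below shows a document type declaration that points to an external subset and also carries an internal subset. The parser merges both. To keep the example self-contained, a hypothetical in-memory dictionary stands in for the file `mydtd.dtd`. A real application would read the file from disk or a URL. Expat does not load external subsets by default. The sketch therefore enables parameter-entity parsing and supplies the DTD text from its external-entity handler.

```python
from xml.parsers import expat

# Hypothetical stand-in for files on disk: one external DTD shared by many documents.
EXTERNAL_FILES = {
    "mydtd.dtd": "<!ELEMENT note (to, body)>"
                 "<!ELEMENT to (#PCDATA)>"
                 "<!ELEMENT body (#PCDATA)>",
}

def element_sources(xml_text):
    """Map each declared element type to the subset ('internal'/'external') declaring it."""
    sources = {}
    p = expat.ParserCreate()
    # Without this, expat never asks for the external subset at all.
    p.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_internal(name, model):
        sources.setdefault(name, "internal")

    def on_external(name, model):
        sources.setdefault(name, "external")

    def load_external(context, base, system_id, public_id):
        if system_id not in EXTERNAL_FILES:
            return 0                      # expat then reports a parse error
        sub = p.ExternalEntityParserCreate(context)
        sub.ElementDeclHandler = on_external
        sub.Parse(EXTERNAL_FILES[system_id], True)
        return 1                          # tell expat the entity was handled

    p.ElementDeclHandler = on_internal
    p.ExternalEntityRefHandler = load_external
    p.Parse(xml_text, True)
    return sources

both = ('<!DOCTYPE note SYSTEM "mydtd.dtd" [<!ELEMENT extra EMPTY>]>'
        "<note><to>A</to><body>B</body></note>")
external_only = '<!DOCTYPE note SYSTEM "mydtd.dtd"><note><to>C</to><body>D</body></note>'

print(element_sources(both))
print(element_sources(external_only))
```

Both documents share the one external DTD. Only the first adds a declaration of its own, in its internal subset.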

Element and attribute declarations form a framework against which a parser can test documents to see whether they meet the format described by the DTD. Declarations communicate information to the parser about document content, such as the following:

  • The allowable sequence and nesting of tags

  • Attribute values, including their types and defaults

  • Names of referenced external files, whether or not they contain XML, and the format of non-XML data that might be referenced

  • Entities that might be present

Although XML documents are not required to have a DTD, a large percentage of the XML specification [XML] deals with various sorts of declarations that are allowed in DTDs.

All XML documents must be well formed, as described in Chapter 3. A validating XML processor can use the DTD to validate a document—that is, not only to require it to be well formed but also to determine whether the document conforms to the definitions. If the document can be parsed successfully with a validating parser and its DTD, the document is valid. To be valid, a document's DTD must specify all of its structure. XML documents read by a nonvalidating parser do not have to be valid.