D.1 What Are DTDs?


Document Type Definitions (DTDs) express the allowed elements and attributes in a certain document type and constrain the order in which elements must appear within that document type. A DTD is often composed of a single file, which contains declarations defining the element types and attribute lists. (In theory, a DTD may span more than one file; however, the mechanism for including one file inside another, parameter entities, is outside the scope of this appendix.)

D.1.1 Element Type Declarations

An element is the actual instance of the structure as found in an XML document, whereas the element type defines the element, giving it a name and a structure. The form of an element type declaration is:

<!ELEMENT element-name contentspec>

The content allowed by contentspec is expressed in a simple grammar that supports sequence, alternatives, and iteration within elements. For a formal definition of the element type declaration, see Section 3.2 of the XML 1.0 specification at http://w3.org/TR/REC-xml#NT-elementdecl. Table D-1 introduces the most common constructs.

Table D-1. Element type content specifications

Content specification       Meaning
<!ELEMENT e (#PCDATA)>      The e element may contain character data, that is, text (and possibly entity and character references).
<!ELEMENT e EMPTY>          The e element has no content; that is, it can only appear as <e/> or <e></e>.
<!ELEMENT e ANY>            The e element may contain character data or any other element defined in the DTD.
<!ELEMENT e (a+)>           The e element must contain at least one a element and may contain multiple a elements. (The plus means "one or more.")
<!ELEMENT e (a,b*,c+)>      The e element must contain the following sequence: one a element, followed by zero or more b elements, followed by one or more c elements. (The asterisk means "zero or more.")
<!ELEMENT e (#PCDATA|b)*>   The e element may contain b elements or character data, and they can all be mixed together.
<!ELEMENT e (a|b|c)*>       The e element may contain zero or more a, b, or c elements, in any order.
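
As a rough illustration of how these content specifications combine, the following sketch declares a small DTD in a document's internal subset and follows it with an instance that conforms to it. The declarations for the child elements a, b, and c are assumptions added only so that the example is complete; they are not part of Table D-1.

```xml
<?xml version="1.0"?>
<!DOCTYPE e [
  <!-- Sequence: one a, then zero or more b, then one or more c -->
  <!ELEMENT e (a,b*,c+)>
  <!-- a holds text only -->
  <!ELEMENT a (#PCDATA)>
  <!-- b is always empty -->
  <!ELEMENT b EMPTY>
  <!-- c is mixed content: text and b elements in any order -->
  <!ELEMENT c (#PCDATA|b)*>
]>
<e>
  <a>first</a>
  <b/>
  <b/>
  <c>some text, then <b/> and more text</c>
  <c>last</c>
</e>
```

A validating parser should reject variants such as <e><c/></e> (the required a element is missing) or <e><a/><c/><b/></e> (a b element appears after c).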


For a document to be valid, the DTD must provide an element type declaration for every element used in the document, and the contents of all of those elements must conform to the content models specified in those declarations. Element type declarations leave out one important aspect of elements, however: attributes.

D.1.2 Attribute List Declarations

Inside a DTD, permissible attributes are specified on a per-element basis. An attribute list declaration takes this form:

<!ATTLIST element-name attribute-definitions >

In the attribute definitions, you have to identify the attribute's name and type, whether the attribute is optional or required, and, if necessary, the attribute's default value. You can specify default values for attributes (something you cannot do for elements); an application inserts these defaults when it parses the XML document, even if they're not explicitly written in the document. Attributes can store all kinds of content, but the main types used are CDATA (character data, including entity and character references), ID (identifiers whose value must be unique within the document), and IDREF and IDREFS (which point to a single ID value or to a space-separated list of ID values, respectively). Attribute definitions may also specify a list of acceptable values rather than a generic type. Attribute types are only a subset of the XSD types described in Appendix C; all of them are textual. Table D-2 shows some common attribute definitions.

Table D-2. Attribute definitions

Attribute definition                  Meaning
subject CDATA #REQUIRED               The subject attribute must always be present and it should contain only character data. It has no default value.
rating CDATA #IMPLIED                 The rating attribute is allowed, but not mandatory. It has no default value.
play (scissors|paper|stone) "stone"   The play attribute may take only the values scissors, paper, or stone. If it is not specified, it is assumed to take the default value stone.
color CDATA #FIXED "purple"           The color attribute must take the value purple. If it is not specified on the element, the processing application provides purple as a default value.
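
Table D-2 doesn't show the ID and IDREF types in use. The following is a minimal sketch, assuming a hypothetical team document; the element and attribute names (team, member, id, manager) are invented for illustration and don't come from any real vocabulary.

```xml
<?xml version="1.0"?>
<!DOCTYPE team [
  <!ELEMENT team (member+)>
  <!ELEMENT member (#PCDATA)>
  <!-- id must be unique in the document; manager must match some id -->
  <!ATTLIST member
      id      ID    #REQUIRED
      manager IDREF #IMPLIED>
]>
<team>
  <member id="m1">Alice</member>
  <!-- manager="m1" points at the member declared above -->
  <member id="m2" manager="m1">Bob</member>
</team>
```

A validating parser should report an error if two member elements shared the same id value, or if a manager attribute named an ID that doesn't appear anywhere in the document.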


Here's a complete attribute list declaration for a fictitious animal element, which must have a name, has either two or four legs (four by default), and, optionally, carries a notes attribute:

<!ATTLIST animal
       name CDATA #REQUIRED
       legs (two|four) "four"
       notes CDATA #IMPLIED >
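
To see this declaration at work, here is a small sketch that places it in an internal subset. The zoo wrapper element and the EMPTY content model for animal are assumptions made only to give the example a complete, valid shape.

```xml
<?xml version="1.0"?>
<!DOCTYPE zoo [
  <!-- zoo and the EMPTY content model are hypothetical scaffolding -->
  <!ELEMENT zoo (animal*)>
  <!ELEMENT animal EMPTY>
  <!ATTLIST animal
      name  CDATA      #REQUIRED
      legs  (two|four) "four"
      notes CDATA      #IMPLIED>
]>
<zoo>
  <!-- legs is omitted here, so the default value "four" applies -->
  <animal name="dog"/>
  <!-- all three attributes given explicitly -->
  <animal name="ostrich" legs="two" notes="flightless"/>
</zoo>
```

By contrast, <animal legs="six"/> would be invalid on two counts: the required name attribute is missing, and six is not one of the enumerated values.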

While attributes can be very useful for annotations, Microsoft Office tends to use element content for information that's presented directly. You can certainly use attributes, but you may find it easier to stick with elements unless you have a particular reason to choose attributes.

D.1.3 Putting it Together

To demonstrate a complete DTD, we'll explore a document and its DTD. The document is shown in Example D-1, while the DTD is shown in Example D-2.

Example D-1. A valid XML document
<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE authors SYSTEM "http://example.com/authors.dtd">
<authors>
    <person abbrev="edd">
        <name>Edd Dumbill</name>
        <nationality>British</nationality>
    </person>
    <person abbrev="simonstl">
        <name>Simon St.Laurent</name>
        <nationality>American</nationality>
    </person>
    <person abbrev="vdv">
        <name>Eric van der Vlist</name>
        <nationality>French</nationality>
    </person>
</authors>

The DOCTYPE declaration at the top of Example D-1 assumes that the DTD file shown in Example D-2 has been placed on a web server at example.com. Note that the document type declaration specifies the root element of the document, not the DTD itself. (You could use the same DTD for documents that used person, name, or nationality as the root element of a valid document.)

Example D-2. The DTD for Example D-1
<!ELEMENT authors (person)* >

<!ELEMENT person (name,nationality)>
<!ATTLIST person
    abbrev CDATA #REQUIRED>

<!ELEMENT name (#PCDATA)>
<!ELEMENT nationality (#PCDATA)>

The DTD defines the structures you find in the document. There is an authors element type that may contain zero or more person elements. In this document, we have three person elements. There is a person element type that must contain a name element followed by a nationality element. Each of the person elements in the document has those parts in that order. The person elements are required to have an attribute named abbrev, and all of them do. Finally, the name element type and the nationality element type can only hold textual content. All of the name and nationality elements here do that.

A validating XML 1.0 processor is required to check the input document against its DTD. If the document is not valid, errors are reported to the application, which typically rejects the document. Non-validating processors will accept the document even if it doesn't conform to the structures defined by the DTD, and just use the DTD for things like default values for attributes. Microsoft Office and most Microsoft tools use non-validating XML 1.0 parsers. (Schema validation is a separate process, defined long after XML 1.0 was finished.)
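
The difference between the two kinds of processor shows up with a document like the following sketch, which is well-formed but not valid. It reuses the declarations from Example D-2, moved into an internal subset so that the example stands alone; the content is a deliberately broken variation of Example D-1.

```xml
<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE authors [
  <!ELEMENT authors (person)*>
  <!ELEMENT person (name,nationality)>
  <!ATTLIST person
      abbrev CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT nationality (#PCDATA)>
]>
<authors>
  <!-- Problem 1: the required abbrev attribute is missing. -->
  <!-- Problem 2: nationality appears before name. -->
  <person>
    <nationality>British</nationality>
    <name>Edd Dumbill</name>
  </person>
</authors>
```

A validating processor should report both problems. A non-validating processor would typically hand the document to the application without complaint, since every element is properly nested and closed.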

D.1.4 Other DTD Features

DTDs include a number of other features that aren't covered here. Parameter entities and conditional sections make it possible for developers to create more flexible DTDs, turning features on and off or reusing them. Documents can contain an internal subset in the DOCTYPE declaration, adding their own information to the document type declaration. Entity declarations make it possible for developers to create named references to content, making it simpler to include external files or characters not easily accessed from the keyboard. Notation declarations and unparsed entities make it possible to create metadata and include non-XML content, though these are rarely used. DTDs do not support namespaces or XML Schema datatypes at all.
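
As one small illustration of these features, the sketch below uses an internal subset to declare two general entities and then references them in the content. The entity names (eacute, vdv) and the second person are hypothetical; the element declarations repeat Example D-2 so that the document stands alone.

```xml
<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE authors [
  <!ELEMENT authors (person)*>
  <!ELEMENT person (name,nationality)>
  <!ATTLIST person
      abbrev CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT nationality (#PCDATA)>

  <!-- A character that is awkward to type, given as a character reference -->
  <!ENTITY eacute "&#233;">
  <!-- A named shortcut for reusable text -->
  <!ENTITY vdv "Eric van der Vlist">
]>
<authors>
  <person abbrev="vdv">
    <name>&vdv;</name>
    <nationality>French</nationality>
  </person>
  <!-- Hypothetical entry, present only to show the eacute entity in use -->
  <person abbrev="rene">
    <name>Ren&eacute; Example</name>
    <nationality>French</nationality>
  </person>
</authors>
```

A parser typically expands &vdv; and &eacute; before the application sees the text, so the application receives the names with the replacement text already in place.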

While Microsoft Office applications can process these features when opening a file (except for notations and unparsed entities, which they ignore), all of the DOCTYPE information is removed when the document is saved back out. Because XSD provides no support at all for entities, you can't preserve the entity information from an XML DTD in a schema and use that with Office.



Office 2003 XML
ISBN: 0596005385
Year: 2003
Pages: 135
