Document Type Definitions (DTDs) are one of two main types of documents you can use to specify XML document structure. Section 19.6 presents W3C XML Schema documents, which provide an improved method of specifying XML document structure.
Creating a Document Type Definition
Figure 19.4 presented a simple business letter marked up with XML. Recall that line 5 of letter.xml references a DTD, letter.dtd (Fig. 19.9). This DTD specifies the business letter's element types and attributes, and their relationships to one another.
Figure 19.9. Document Type Definition (DTD) for a business letter.
 1
 2
 3
 4  <!ELEMENT letter ( contact+, salutation, paragraph+,
 5     closing, signature )>
 6
 7  <!ELEMENT contact ( name, address1, address2, city, state,
 8     zip, phone, flag )>
 9  <!ATTLIST contact type CDATA #IMPLIED>
10
11  <!ELEMENT name ( #PCDATA )>
12  <!ELEMENT address1 ( #PCDATA )>
13  <!ELEMENT address2 ( #PCDATA )>
14  <!ELEMENT city ( #PCDATA )>
15  <!ELEMENT state ( #PCDATA )>
16  <!ELEMENT zip ( #PCDATA )>
17  <!ELEMENT phone ( #PCDATA )>
18  <!ELEMENT flag EMPTY>
19  <!ATTLIST flag gender (M | F) "M">
20
21  <!ELEMENT salutation ( #PCDATA )>
22  <!ELEMENT closing ( #PCDATA )>
23  <!ELEMENT paragraph ( #PCDATA )>
24  <!ELEMENT signature ( #PCDATA )>
A DTD describes the structure of an XML document and enables an XML parser to verify whether an XML document is valid (i.e., whether its elements contain the proper attributes and appear in the proper sequence). DTDs allow users to check document structure and to exchange data in a standardized format. A DTD expresses the set of rules for document structure using an EBNF (Extended Backus-Naur Form) grammar. [Note: EBNF grammars are commonly used to define programming languages. For more information on EBNF grammars, please see en.wikipedia.org/wiki/EBNF or www.garshol.priv.no/download/text/bnf.html.]
Defining Elements in a DTD
The ELEMENT element type declaration in lines 4-5 defines the rules for element letter. In this case, letter contains one or more contact elements, one salutation element, one or more paragraph elements, one closing element and one signature element, in that sequence. The plus sign (+) occurrence indicator specifies that the DTD allows one or more occurrences of an element. Other occurrence indicators include the asterisk (*), which indicates an optional element that can occur zero or more times, and the question mark (?), which indicates an optional element that can occur at most once (i.e., zero or one occurrence). If an element does not have an occurrence indicator, the DTD allows exactly one occurrence.
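Because letter.dtd uses only the plus sign, the following minimal sketch shows all three occurrence indicators side by side. The memo vocabulary (memo, recipient, cc, subject and body) is hypothetical and does not appear in letter.dtd; the declarations are placed in an internal DTD subset only so that the example is self-contained.

```xml
<?xml version="1.0"?>
<!-- Hypothetical memo vocabulary; none of these names come from letter.dtd -->
<!DOCTYPE memo [
   <!-- one or more recipient, zero or more cc, at most one subject,
        exactly one body, in that sequence -->
   <!ELEMENT memo ( recipient+, cc*, subject?, body )>
   <!ELEMENT recipient ( #PCDATA )>
   <!ELEMENT cc ( #PCDATA )>
   <!ELEMENT subject ( #PCDATA )>
   <!ELEMENT body ( #PCDATA )>
]>
<memo>
   <recipient>Jane Doe</recipient>
   <recipient>John Doe</recipient>
   <!-- no cc and no subject elements: both are optional -->
   <body>The meeting moves to Friday.</body>
</memo>
```

Under these declarations, removing both recipient elements, adding a second subject element or placing body before the recipients would each make the document invalid, although it would remain well formed.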
The contact element type declaration (lines 7-8) specifies that a contact element contains child elements name, address1, address2, city, state, zip, phone and flag, in that order. The DTD requires exactly one occurrence of each of these elements.
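As an illustration, the sketch below shows one contact element that satisfies this declaration. The declarations are copied from Fig. 19.9 into an internal DTD subset, and contact is used as the root element only to keep the example short; both choices, like the sample data, are assumptions made for this sketch rather than part of letter.xml.

```xml
<?xml version="1.0"?>
<!-- Sketch: contact serves as the root here only to keep the example short -->
<!DOCTYPE contact [
   <!ELEMENT contact ( name, address1, address2, city, state,
      zip, phone, flag )>
   <!ATTLIST contact type CDATA #IMPLIED>
   <!ELEMENT name ( #PCDATA )>
   <!ELEMENT address1 ( #PCDATA )>
   <!ELEMENT address2 ( #PCDATA )>
   <!ELEMENT city ( #PCDATA )>
   <!ELEMENT state ( #PCDATA )>
   <!ELEMENT zip ( #PCDATA )>
   <!ELEMENT phone ( #PCDATA )>
   <!ELEMENT flag EMPTY>
   <!ATTLIST flag gender (M | F) "M">
]>
<contact type="sender">
   <!-- the eight children appear exactly once each, in the declared order -->
   <name>Jane Doe</name>
   <address1>Box 12345</address1>
   <address2>15 Any Ave.</address2>
   <city>Othertown</city>
   <state>Otherstate</state>
   <zip>67890</zip>
   <phone>555-4321</phone>
   <flag gender="F" />
</contact>
```

Swapping any two children (for example, placing city before address2) or repeating phone would violate the declared sequence.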
Defining Attributes in a DTD
Line 9 uses the ATTLIST attribute-list declaration to define an attribute named type for the contact element. Keyword #IMPLIED specifies that if the parser finds a contact element without a type attribute, the parser can choose an arbitrary value for the attribute or can ignore the attribute. Either way, the document will still be valid (if the rest of the document is valid); a missing type attribute will not invalidate the document. Other keywords that can be used in place of #IMPLIED in an ATTLIST declaration include #REQUIRED and #FIXED. Keyword #REQUIRED specifies that the attribute must be present in the element, and keyword #FIXED specifies that the attribute (if present) must have the given fixed value. For example,
<!ATTLIST address zip CDATA #FIXED "01757">
indicates that attribute zip (if present in element address) must have the value 01757 for the document to be valid. If the attribute is not present, then the parser, by default, uses the fixed value that the ATTLIST declaration specifies.
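The three keywords can be compared in one small document. In this sketch, the order element and its id and note attributes are hypothetical names introduced for illustration; only the zip declaration for address comes from the example above.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
   <!ELEMENT order ( address )>
   <!ATTLIST order id CDATA #REQUIRED>    <!-- must appear on every order -->
   <!ATTLIST order note CDATA #IMPLIED>   <!-- may be omitted -->
   <!ELEMENT address ( #PCDATA )>
   <!ATTLIST address zip CDATA #FIXED "01757">
]>
<order id="A-100">
   <!-- zip is omitted, so the parser supplies the fixed value 01757 -->
   <address>1 Main St.</address>
</order>
```

This document is valid as written. Removing id from the order start tag, or writing an address start tag with zip="02134", would make it invalid, whereas omitting note has no effect on validity.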
Character Data vs. Parsed Character Data
Keyword CDATA (line 9) specifies that attribute type contains character data (i.e., a string). A parser will pass such data to an application without modification.
Keyword #PCDATA (line 11) specifies that an element (e.g., name) may contain parsed character data (i.e., data that is processed by an XML parser). Elements with parsed character data cannot contain markup characters, such as less than (<), greater than (>) or ampersand (&). The document author should replace any markup character in a #PCDATA element with the character's corresponding character entity reference. For example, the character entity reference &lt; should be used in place of the less-than symbol (<), and the character entity reference &gt; should be used in place of the greater-than symbol (>). A document author who wishes to use a literal ampersand should use the entity reference &amp; instead; parsed character data can contain ampersands (&) only for inserting entities. See Appendix H for a list of other character entity references.
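A short sketch of these replacements in practice follows. The paragraph declaration matches Fig. 19.9, but using paragraph as the root element and the sample sentence itself are assumptions made to keep the example self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE paragraph [
   <!ELEMENT paragraph ( #PCDATA )>
]>
<!-- After parsing, the element's text is: Use x < y && y > z, per Smith & Sons. -->
<paragraph>Use x &lt; y &amp;&amp; y &gt; z, per Smith &amp; Sons.</paragraph>
```

Writing the less-than symbol or the ampersands literally inside the paragraph element would cause the parser to reject the document as not well formed.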
Defining Empty Elements in a DTD
Line 18 defines an empty element named flag. Keyword EMPTY specifies that the element does not contain any data between its start and end tags. Empty elements commonly describe data via attributes. For example, flag's data appears in its gender attribute (line 19). Line 19 specifies that the gender attribute's value must be one of the enumerated values (M or F) enclosed in parentheses and delimited by a vertical bar (|) meaning "or." Note that line 19 also indicates that gender has a default value of M.
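The following sketch shows the forms a flag element can take under these two declarations. The flags wrapper element is hypothetical and exists only so that several flag elements can appear in one self-contained document.

```xml
<?xml version="1.0"?>
<!DOCTYPE flags [
   <!ELEMENT flags ( flag+ )>   <!-- hypothetical wrapper element -->
   <!ELEMENT flag EMPTY>
   <!ATTLIST flag gender (M | F) "M">
]>
<flags>
   <flag gender="F" />          <!-- explicit enumerated value -->
   <flag gender="M"></flag>     <!-- start tag immediately followed by end tag -->
   <flag />                     <!-- gender is omitted and defaults to M -->
</flags>
```

A flag element with gender="X", or one that contains text between its start and end tags, would be invalid under these declarations.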
Well-Formed Documents vs. Valid Documents
In Section 19.3, we demonstrated how to use the Microsoft XML Validator to validate an XML document against its specified DTD. The validation revealed that the XML document letter.xml (Fig. 19.4) is well formed and valid: it conforms to letter.dtd (Fig. 19.9). Recall that a well-formed document is syntactically correct (i.e., each start tag has a corresponding end tag, the document contains only one root element, etc.), and a valid document contains the proper elements with the proper attributes in the proper sequence. An XML document cannot be valid unless it is well formed.
When a document fails to conform to a DTD or a schema, the Microsoft XML Validator displays an error message. For example, the DTD in Fig. 19.9 indicates that a contact element must contain the child element name. A document that omits this child element is still well formed, but is not valid. In such a scenario, Microsoft XML Validator displays the error message shown in Fig. 19.10.
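The sketch below is one such document: every start tag has a matching end tag and there is a single root element, so it is well formed, but the contact element lacks the required name child, so it is not valid against letter.dtd. The form of the DOCTYPE declaration and all of the sample data are assumptions made for illustration; they are not reproduced from Fig. 19.4.

```xml
<?xml version="1.0"?>
<!-- Assumes letter.dtd (Fig. 19.9) is in the same directory as this file -->
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter>
   <contact type="sender">
      <!-- the name element is omitted on purpose -->
      <address1>Box 12345</address1>
      <address2>15 Any Ave.</address2>
      <city>Othertown</city>
      <state>Otherstate</state>
      <zip>67890</zip>
      <phone>555-4321</phone>
      <flag gender="F" />
   </contact>
   <salutation>Dear Sir:</salutation>
   <paragraph>Please send us your latest price list.</paragraph>
   <closing>Sincerely</closing>
   <signature>Ms. Doe</signature>
</letter>
```

By contrast, deleting the salutation end tag would make the document not well formed, in which case a parser would reject it before validity could even be checked.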
Figure 19.10. XML Validator displaying an error message.