Document Type Definitions (DTDs)

Document Type Definitions (DTDs) are one of two main types of documents you can use to specify XML document structure. Section 19.6 presents W3C XML Schema documents, which provide an improved method of specifying XML document structure.

Software Engineering Observation 19.2

XML documents can have many different structures, and for this reason an application cannot be certain whether a particular document it receives is complete, ordered properly, and not missing data. DTDs and schemas (Section 19.6) solve this problem by providing an extensible way to describe XML document structure. Applications should use DTDs or schemas to confirm whether XML documents are valid.

Software Engineering Observation 19.3

Many organizations and individuals are creating DTDs and schemas for a broad range of applications. These collections, called repositories, are available free for download from the Web (e.g., www.xml.org, www.oasis-open.org).

 

Creating a Document Type Definition

Figure 19.4 presented a simple business letter marked up with XML. Recall that line 5 of letter.xml references a DTD, letter.dtd (Fig. 19.9). This DTD specifies the business letter's element types and attributes and their relationships to one another.

Figure 19.9. Document Type Definition (DTD) for a business letter.


 1  <!-- Fig. 19.9: letter.dtd -->
 2  <!-- DTD document for letter.xml -->
 3
 4  <!ELEMENT letter ( contact+, salutation, paragraph+,
 5     closing, signature )>
 6
 7  <!ELEMENT contact ( name, address1, address2, city, state,
 8     zip, phone, flag )>
 9  <!ATTLIST contact type CDATA #IMPLIED>
10
11  <!ELEMENT name ( #PCDATA )>
12  <!ELEMENT address1 ( #PCDATA )>
13  <!ELEMENT address2 ( #PCDATA )>
14  <!ELEMENT city ( #PCDATA )>
15  <!ELEMENT state ( #PCDATA )>
16  <!ELEMENT zip ( #PCDATA )>
17  <!ELEMENT phone ( #PCDATA )>
18  <!ELEMENT flag EMPTY>
19  <!ATTLIST flag gender (M | F) "M">
20
21  <!ELEMENT salutation ( #PCDATA )>
22  <!ELEMENT closing ( #PCDATA )>
23  <!ELEMENT paragraph ( #PCDATA )>
24  <!ELEMENT signature ( #PCDATA )>

A DTD describes the structure of an XML document and enables an XML parser to verify whether an XML document is valid (i.e., whether its elements contain the proper attributes and appear in the proper sequence). DTDs allow users to check document structure and to exchange data in a standardized format. A DTD expresses the set of rules for document structure using an EBNF (Extended Backus-Naur Form) grammar. [Note: EBNF grammars are commonly used to define programming languages. For more information on EBNF grammars, please see en.wikipedia.org/wiki/EBNF or www.garshol.priv.no/download/text/bnf.html.]

Common Programming Error 19.8

For documents validated with DTDs, any document that uses elements, attributes or relationships not explicitly defined by a DTD is an invalid document.

 

Defining Elements in a DTD

The ELEMENT element type declaration in lines 4-5 defines the rules for element letter. In this case, letter contains one or more contact elements, one salutation element, one or more paragraph elements, one closing element and one signature element, in that sequence. The plus sign (+) occurrence indicator specifies that the DTD allows one or more occurrences of an element. Other occurrence indicators include the asterisk (*), which indicates an optional element that can occur zero or more times, and the question mark (?), which indicates an optional element that can occur at most once (i.e., zero or one occurrence). If an element does not have an occurrence indicator, the DTD allows exactly one occurrence.
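A sequence content model with occurrence indicators behaves like a regular expression over child element names, which is one way to see what +, * and ? mean. The Python sketch below is only an illustration under a simplifying assumption: it handles a single comma-separated sequence of names (no | choices, no nested groups, no #PCDATA), and the helper names content_model_to_regex and children_match are hypothetical, not part of any XML library.

```python
import re


def content_model_to_regex(model: str) -> re.Pattern:
    """Compile a DTD sequence model such as "( contact+, salutation )"
    into a regex over space-terminated child element names.

    Simplifying assumption: the model is a single comma-separated
    sequence of names, each with an optional +, * or ? indicator
    (no | choices, no nested groups, no #PCDATA).
    """
    inner = model.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    parts = []
    for item in inner.split(","):
        item = item.strip()
        if not item:
            continue
        indicator = item[-1] if item[-1] in "+*?" else ""  # "" means exactly once
        name = item.rstrip("+*?").strip()
        # Match "name " as a unit so the indicator applies to the whole name.
        parts.append(f"(?:{re.escape(name)} ){indicator}")
    return re.compile("".join(parts))


def children_match(model: str, children: list) -> bool:
    """Return True if the ordered child element names satisfy the model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


LETTER_MODEL = "( contact+, salutation, paragraph+, closing, signature )"
print(children_match(LETTER_MODEL,
      ["contact", "contact", "salutation", "paragraph", "closing", "signature"]))  # True
print(children_match(LETTER_MODEL,
      ["contact", "paragraph", "closing", "signature"]))  # False: no salutation
```

The same helper accepts the contact model from lines 7-8, where every name lacks an indicator and therefore must appear exactly once. A real validating parser also handles choices, nested groups and mixed content, which this sketch deliberately omits.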

The contact element type declaration (lines 7-8) specifies that a contact element contains child elements name, address1, address2, city, state, zip, phone and flag, in that order. The DTD requires exactly one occurrence of each of these elements.

Defining Attributes in a DTD

Line 9 uses the ATTLIST attribute-list declaration to define an attribute named type for the contact element. Keyword #IMPLIED specifies that the attribute is optional and has no default value: if the parser finds a contact element without a type attribute, the application can choose a value for the attribute or can ignore the attribute. Either way, a missing type attribute does not invalidate the document (provided the rest of the document is valid). Other keywords that can be used in place of #IMPLIED in an ATTLIST declaration include #REQUIRED and #FIXED. Keyword #REQUIRED specifies that the attribute must be present in the element, and keyword #FIXED specifies that the attribute (if present) must have the given fixed value. For example,

 <!ATTLIST address zip CDATA #FIXED "01757">

indicates that attribute zip (if present in element address) must have the value 01757 for the document to be valid. If the attribute is not present, then the parser, by default, uses the fixed value that the ATTLIST declaration specifies.
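The three default declarations can be summarized as a small decision procedure. The following Python sketch is an illustration only: effective_value and InvalidAttribute are hypothetical names, the attributes are assumed to arrive as a plain dictionary, and for an absent #IMPLIED attribute this sketch simply supplies no value (returns None), which is one of the choices the text allows.

```python
from typing import Optional


class InvalidAttribute(Exception):
    """Raised when an attribute violates its (hypothetical) ATTLIST rule."""


def effective_value(attrs: dict, name: str, default_decl: str,
                    fixed: Optional[str] = None) -> Optional[str]:
    """Apply the ATTLIST default-declaration rules for one attribute.

    default_decl is "#IMPLIED", "#REQUIRED" or "#FIXED"; for "#FIXED",
    `fixed` holds the declared value. Returns the value the application
    sees, or None when an #IMPLIED attribute is absent.
    """
    present = name in attrs
    if default_decl == "#REQUIRED":
        if not present:
            raise InvalidAttribute(f"attribute '{name}' is required")
        return attrs[name]
    if default_decl == "#FIXED":
        if present and attrs[name] != fixed:
            raise InvalidAttribute(f"attribute '{name}' must be {fixed!r}")
        # An absent #FIXED attribute takes the declared value.
        return fixed
    if default_decl == "#IMPLIED":
        # Absence is allowed; this sketch supplies no value.
        return attrs.get(name)
    raise ValueError(f"unsupported default declaration: {default_decl}")


print(effective_value({}, "zip", "#FIXED", "01757"))   # 01757
print(effective_value({}, "type", "#IMPLIED"))         # None
```

Calling effective_value({"zip": "02115"}, "zip", "#FIXED", "01757") raises InvalidAttribute, mirroring a document that is invalid because zip has the wrong value.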

Character Data vs. Parsed Character Data

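Keyword CDATA (line 9) specifies that attribute type contains character data (i.e., a string). A parser passes such data to an application as plain text; apart from standard attribute-value normalization (such as expanding entity references and converting tabs and line breaks to spaces), the value is not modified.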

Software Engineering Observation 19.4

DTD syntax does not provide a mechanism for describing an element's (or attribute's) data type. For example, a DTD cannot specify that a particular element or attribute can contain only integer data.

Keyword #PCDATA (line 11) specifies that an element (e.g., name) may contain parsed character data (i.e., data that is processed by an XML parser). Parsed character data cannot contain the markup characters less than (<) or ampersand (&) as literal text; a literal greater than (>) is permitted but is commonly escaped as well. The document author should replace any such character in a #PCDATA element with its corresponding character entity reference. For example, the character entity reference &lt; should be used in place of the less-than symbol (<), and the character entity reference &gt; should be used in place of the greater-than symbol (>). A document author who wishes to use a literal ampersand should use the entity reference &amp; instead; in parsed character data, an ampersand (&) may appear only to begin an entity or character reference. See Appendix H for a list of other character entity references.

Common Programming Error 19.9

Using markup characters such as < and & literally in parsed character data is an error. Use character entity references (e.g., &lt;, &gt; and &amp;) instead.
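To see the escaping rule in practice, the short Python sketch below uses the standard library's xml.sax.saxutils.escape, which replaces &, < and > with &amp;, &lt; and &gt;, and then parses the result with xml.etree.ElementTree. The sample text and the paragraph element are illustrative assumptions; the point is that escaped text round-trips through the parser while a literal < does not.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Profits < costs & taxes > 0"
escaped = escape(raw)  # replaces &, < and > with &amp;, &lt; and &gt;
print(escaped)         # Profits &lt; costs &amp; taxes &gt; 0

# The escaped text is safe inside #PCDATA; parsing restores the original.
element = ET.fromstring(f"<paragraph>{escaped}</paragraph>")
print(element.text == raw)  # True

# A literal "<" in character data makes the document not well formed.
try:
    ET.fromstring("<paragraph>Profits < costs</paragraph>")
except ET.ParseError as err:
    print("not well formed:", err)
```

Escaping on output, rather than asking authors to remember entity references by hand, is the usual way applications avoid this error when generating XML.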

 

Defining Empty Elements in a DTD

Line 18 defines an empty element named flag. Keyword EMPTY specifies that the element does not contain any data between its start and end tags. Empty elements commonly describe data via attributes. For example, flag's data appears in its gender attribute (line 19). Line 19 specifies that the gender attribute's value must be one of the enumerated values (M or F) enclosed in parentheses and delimited by a vertical bar (|) meaning "or." Note that line 19 also indicates that gender has a default value of M.
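The rules in lines 18-19 can be sketched as a check on a parsed flag element. This Python example is an illustration under assumptions: check_flag is a hypothetical helper, and because xml.etree.ElementTree does not apply defaults from an external DTD, the sketch supplies the declared default "M" itself.

```python
import xml.etree.ElementTree as ET

GENDER_VALUES = ("M", "F")  # enumeration from <!ATTLIST flag gender (M | F) "M">
GENDER_DEFAULT = "M"        # declared default value


def check_flag(flag: ET.Element) -> str:
    """Check a flag element against the rules for
    <!ELEMENT flag EMPTY> and <!ATTLIST flag gender (M | F) "M">,
    returning the effective gender value."""
    # EMPTY: no child elements and no text content at all.
    if len(flag) > 0 or flag.text:
        raise ValueError("flag must be empty")
    # Absent attribute -> declared default (applied by hand in this sketch).
    gender = flag.get("gender", GENDER_DEFAULT)
    if gender not in GENDER_VALUES:
        raise ValueError(f"gender must be one of {GENDER_VALUES}, got {gender!r}")
    return gender


print(check_flag(ET.fromstring('<flag gender="F"/>')))  # F
print(check_flag(ET.fromstring('<flag/>')))             # M
```

A flag with gender="X", or one that contains text or child elements, raises ValueError in this sketch, corresponding to an invalid document.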

Well-Formed Documents vs. Valid Documents

In Section 19.3, we demonstrated how to use the Microsoft XML Validator to validate an XML document against its specified DTD. The validation revealed that the XML document letter.xml (Fig. 19.4) is well formed and valid; it conforms to letter.dtd (Fig. 19.9). Recall that a well-formed document is syntactically correct (i.e., each start tag has a corresponding end tag, the document contains only one root element, etc.), and a valid document contains the proper elements with the proper attributes in the proper sequence. An XML document cannot be valid unless it is well formed.

When a document fails to conform to a DTD or a schema, the Microsoft XML Validator displays an error message. For example, the DTD in Fig. 19.9 indicates that a contact element must contain the child element name. A document that omits this child element is still well formed, but is not valid. In such a scenario, the Microsoft XML Validator displays the error message shown in Fig. 19.10.

Figure 19.10. XML Validator displaying an error message.
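The difference between the two checks can be sketched without the Microsoft XML Validator. In the Python example below, xml.etree.ElementTree (a non-validating parser) detects documents that are not well formed, and a hypothetical check function then tests only the contact rule from lines 7-8 of letter.dtd. It is a sketch of the idea, not a full DTD validator.

```python
import xml.etree.ElementTree as ET

# Child sequence required by lines 7-8 of letter.dtd (exactly one of each).
CONTACT_CHILDREN = ["name", "address1", "address2", "city",
                    "state", "zip", "phone", "flag"]


def check(xml_text: str) -> tuple:
    """Return (well_formed, contacts_valid) for a document string.

    Only the contact rule is checked; this is a sketch, not a DTD validator.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # A document that is not well formed cannot be valid.
        return (False, False)
    contacts_valid = all(
        [child.tag for child in contact] == CONTACT_CHILDREN
        for contact in root.iter("contact")
    )
    return (True, contacts_valid)


# A contact that omits name: still well formed, but not valid.
no_name = ("<letter><contact><address1/><address2/><city/><state/>"
           "<zip/><phone/><flag/></contact></letter>")
print(check(no_name))  # (True, False)
```

The parse step alone cannot catch the missing name element; only a comparison against the declared structure can, which is why validation needs the DTD.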


    Visual C# How to Program
    Visual C# 2005 How to Program (2nd Edition)
    ISBN: 0131525239
    EAN: 2147483647
    Year: 2004
    Pages: 600
