XML borrows the concept of Document Type Definitions (DTDs) from SGML. A DTD is a formal description of a particular class of XML documents. It specifies the markup that may be used within that class of documents, and it does so using a markup declaration syntax. In other words, a DTD specifies what structures are permissible within an XML document: what names are to be used for the different types of element, where they may occur, the attributes that can be assigned to elements, and how they all fit together.
Every element that can appear within an XML document needs to be declared within that document's DTD with an element declaration statement. The basic structure of an element declaration statement looks like:
<!ELEMENT element_name (content_description)['?' | '*' | '+']>
where ?, *, and + are occurrence indicators (sometimes loosely described as wild cards): ? indicates that the preceding name (or associated content description) can occur zero or one time, * denotes zero or more times, and + denotes one or more times. A name with no indicator must occur exactly once.
Thus, a simple DTD for an XML document containing contact information, as in the example used in the previous sections, may look like:
<!ELEMENT person (name, company*)>
<!ELEMENT name (salutation?, first_name, middle_name*, last_name)>
<!ELEMENT salutation (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
The term #PCDATA that is often seen in DTDs refers to Parsed Character Data (i.e., essentially a string of text). If an element is declared as containing only #PCDATA, then it cannot have subelements (or child elements). So, for example, if you declare an element phone_number as <!ELEMENT phone_number (#PCDATA)>, you will have to supply the phone number in the corresponding XML document as a single string of text. You could not, in this instance, subdivide it further into separate elements corresponding to country_code, area_code, extension, and so forth. If XML attributes are permitted, they are declared, relative to the appropriate element name, via an <!ATTLIST> declaration.
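As a minimal sketch of these two points, the hypothetical declarations below show how phone_number could instead be declared with child elements, and how an attribute could be attached to it with <!ATTLIST>. The element local_number, the attribute type, and its three permitted values are assumptions made purely for illustration; they do not come from any DTD discussed in this chapter.

```xml
<!-- Hypothetical alternative: phone_number built from child elements
     rather than a single string of text -->
<!ELEMENT phone_number (country_code?, area_code, local_number, extension?)>
<!ELEMENT country_code (#PCDATA)>
<!ELEMENT area_code (#PCDATA)>
<!ELEMENT local_number (#PCDATA)>
<!ELEMENT extension (#PCDATA)>

<!-- Hypothetical attribute: an enumerated "type" that defaults to "work"
     when the document does not supply one -->
<!ATTLIST phone_number type (home | work | mobile) "work">
```

With declarations along these lines, a document could carry <phone_number type="mobile"> with the parts of the number in separate child elements, whereas the (#PCDATA) form permits only text.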
A DTD would typically be stored as a file (with a .dtd suffix), separate from the XML documents it describes. It could also be included within the XML document it describes. The location of the DTD that describes a given XML document is specified within that document via a Document Type Declaration. A typical Document Type Declaration would look like:
<!DOCTYPE contact_info SYSTEM "http://www.wownh.com/dtds/contactinfo.dtd">
The <!DOCTYPE> declaration usually follows the <?xml version="1.0"?> statement that kicks off an XML document. If the DTD is to be included within the XML document, then it is embedded as a part of the Document Type Declaration statement, as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person [
  <!ELEMENT person (name, company*)>
  <!ELEMENT name (salutation?, first_name, middle_name*, last_name)>
  .
]>
<person>
  <name>
    <salutation>Mr.</salutation>
    <first_name>Anura</first_name>
    ..
    ..
</person>
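Because the listing above is abbreviated, a complete, self-contained variant may help. The sketch below embeds the full contact DTD given earlier in this section and follows it with a document that should satisfy it. The names Jane, Ann, Marie, Doe, and Example Corp. are placeholder values chosen for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person [
  <!ELEMENT person (name, company*)>
  <!ELEMENT name (salutation?, first_name, middle_name*, last_name)>
  <!ELEMENT salutation (#PCDATA)>
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT middle_name (#PCDATA)>
  <!ELEMENT last_name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
]>
<person>
  <name>
    <!-- salutation is optional (?), so it is omitted here -->
    <first_name>Jane</first_name>
    <!-- middle_name may occur zero or more times (*) -->
    <middle_name>Ann</middle_name>
    <middle_name>Marie</middle_name>
    <last_name>Doe</last_name>
  </name>
  <!-- company may occur zero or more times (*) -->
  <company>Example Corp.</company>
</person>
```

Note that the name given in the <!DOCTYPE> declaration (person) matches the root element of the document. Assuming the file is saved as person.xml (a hypothetical name), a validating parser such as libxml2's xmllint, run as xmllint --valid --noout person.xml, would be expected to accept it.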
The DTD for the XML document describing a Wordsworth poem, as shown in Figure 2.2, provided by Rutgers State University of New Jersey, would look like:
<!ELEMENT POEM (TITLE, AUTHOR, STANZA*)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR (FIRSTNAME, LASTNAME)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ELEMENT STANZA (LINE*)>
<!ELEMENT LINE (#PCDATA)>
<!ATTLIST LINE N CDATA #REQUIRED>
As repeatedly mentioned in this chapter, the success of XML is totally contingent on both ends of a transaction sharing a common understanding of what an XML document represents. Without that mutual understanding, XML is merely unnecessary overhead. DTDs are one of the ways to achieve this common understanding. XML schema is the other. Because this mutual understanding is essential if XML is to be usable, there is now a relatively rich body of public DTDs for various industry sectors and applications (e.g., loan processing within the financial sector, property description within the context of real estate, engineering change [EC] management as it applies to supply chain management, and so forth). A collection of public DTDs for various industry sectors can be found at http://www.xml.org/xmlorg_registry. This site is maintained by the Organization for the Advancement of Structured Information Standards (OASIS), which is now endorsed and supported by the United Nations (UN). Public DTDs can also be found at http://www.schema.net and at Microsoft's BizTalk site at http://www.biztalk.org. Figure 2.7 shows a proposed DTD available at xml.org, submitted by the HR-XML Consortium, Inc., for describing a person's name in the context of XML-based HR applications.
<!-- Copyright 2000 The HR-XML Consortium (TM) -->
<!-- version 1.0 October 17 2000 -->
<!-- 11/05/2000 Changed all elements to UpperCamelCase -->
<!ELEMENT PersonName (FormattedName*, GivenName*, PreferredGivenName?, MiddleName?, FamilyName*, Affix*)>
<!ELEMENT FormattedName (#PCDATA)>
<!ATTLIST FormattedName type (presentation | legal | sortOrder) 'presentation'>
<!ELEMENT GivenName (#PCDATA)>
<!ELEMENT PreferredGivenName (#PCDATA)>
<!ELEMENT MiddleName (#PCDATA)>
<!ELEMENT FamilyName (#PCDATA)>
<!ATTLIST FamilyName primary (true | false | undefined) 'undefined'>
<!ELEMENT Affix (#PCDATA)>
<!ATTLIST Affix type (academicGrade | aristocraticPrefix | aristocraticTitle | familyNamePrefix | familyNameSuffix | formOfAddress | generation) #REQUIRED>
Since the roots of DTDs go back to SGML, their forte is that of describing conventional text documents. Consequently, DTDs just specify the structure of an XML document in terms of the elements that make up that document. DTDs, however, do not have a mechanism for expressing the content of elements in terms of specific data types. A DTD cannot be employed to specify numeric ranges for an element, to restrict which values may occur, or to check the text content. Furthermore, the syntax of DTDs is relatively complex and cumbersome, as you can see if you look at some of the public DTDs available at the Web sites mentioned above. In addition, and ironically, DTDs are written in their own special syntax rather than in XML. The bottom line here is that DTDs are now being superseded by XML schema, per a W3C recommendation, which can be found at: http://www.w3.org/TR/xmlschema-0/.
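The data-typing limitation can be illustrated with a small hypothetical document. The employee and age elements below are assumptions introduced only for this sketch. Because #PCDATA accepts any text, a validating parser has no basis on which to reject the nonsensical age value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE employee [
  <!ELEMENT employee (name, age)>
  <!ELEMENT name (#PCDATA)>
  <!-- A DTD can state only that age contains text; it cannot require
       an integer or restrict the value to a range such as 18 to 70 -->
  <!ELEMENT age (#PCDATA)>
]>
<employee>
  <name>Jane Doe</name>
  <!-- Structurally valid against the DTD above, although not a number -->
  <age>not a number</age>
</employee>
```

Any check that age is a sensible number would therefore have to be performed by the application that consumes the document.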
XML schemas are written in standard XML. The XML dialect used to create a schema is referred to as the XML Schema Definition (XSD) Language. An XML schema can provide a far more comprehensive and rigorous description of the contents of an XML document in a modular, typed, and object-oriented manner. Since schemas permit data types to be rigorously specified, developing a good XML schema will require more thought and work than was required to create a DTD, especially so if the XML document being described contains complex data types.
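As a rough illustration, the contact-information structure used earlier in this section might be expressed in XSD along the following lines. This is a sketch rather than a published schema: the type names nameType and salutationType, and the list of permitted salutations, are assumptions added to show a constraint on text content that a DTD cannot express.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Root element: one name followed by zero or more companies -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="nameType"/>
        <xs:element name="company" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- minOccurs and maxOccurs play the role of the DTD's ?, *, and + -->
  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="salutation" type="salutationType" minOccurs="0"/>
      <xs:element name="first_name" type="xs:string"/>
      <xs:element name="middle_name" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="last_name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Hypothetical restriction: only these salutations are accepted -->
  <xs:simpleType name="salutationType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Mr."/>
      <xs:enumeration value="Ms."/>
      <xs:enumeration value="Dr."/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

The sketch is noticeably longer than the equivalent DTD, which reflects the extra thought and work mentioned above, but the schema is itself an XML document and can be processed with ordinary XML tools.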
As with DTDs, industry- and application-specific XML schemas are already available, and many new ones are in the process of being defined. There are vendor-independent initiatives sponsored by the likes of OASIS, such as ebXML (electronic business XML, for e-business) and tpaML (Trading Partner Agreement Markup Language). Industry-specific XML dialects are also being promoted by the likes of RosettaNet.org, a self-funding, nonprofit consortium of major IT, electronic components, and semiconductor manufacturing companies.