Document Type Definitions | The Official XMLSPY Handbook

Historically, DTDs have been widely used to specify content models for XML documents. DTD syntax was predominantly inherited from Structured Generalized Markup Language (SGML), the technological predecessor of both HTML and XML. You can place a DTD at the top of every XML document that is meant to conform to it (an internal declaration), or you can save the DTD in a separate file and reference it with a special declaration near the top of the XML document (an external declaration). I highly recommend the use of external DTDs because they are reusable and are much less susceptible to versioning problems than internal declarations. In fact, internal declarations are so often problematic that I focus primarily on externally declared DTDs in this section.

In this section, you create a book markup language—a content model to describe books—and you express the content model as an external DTD. You then use the book DTD that you build to validate the book XML files, similar to the book file that you built in Chapter 2.

To create a new DTD in XMLSPY, choose File → New or click the New File button on the main toolbar and choose Document Type Definition from the Create New Document dialog box. Save the file to your local file system as book.dtd. You build this empty DTD file so that you can use it for validating book instance documents.

On the CD

For your convenience, all the files that you build in this chapter (and throughout the book) are included on the accompanying CD-ROM. The exclusive version of XMLSPY included on the CD automatically specifies the XMLSPY Handbook project as the default project. If the version of XMLSPY that you are using is not the one that came on this book’s CD (that is, you downloaded it from the Web or elsewhere), you can still access the example files. Simply copy the XMLSPY Handbook.spp file and all subdirectories (Ch2, Ch3, Ch4, and so on) from the CD to your local hard drive.

Figure 3-1 shows where—on the CD that accompanies this book—you can find all the files that you build in this chapter.

Figure 3-1: The project files for XMLSPY Handbook.

If you ever inadvertently lose your default project settings, you can recover them by opening the XMLSPY Handbook project files from the Project menu. To do this, choose Project → Open Project and locate the XMLSPY Handbook.spp file in the Program files\Altova\XMLSPY\Examples\xmlspyhandbook directory. Here are the files you will use in the next few sections:

book.dtd:	The book content model expressed as an external DTD
bookinstance.xml:	A book instance document, similar to the book XML files you built in Chapter 2

These files are the completed versions. However, I recommend that you manually re-create the files in a separate directory and refer to the completed files only in the event that you get stuck. You get more from the experience if you manually type the DTD and XML instance document as you go along. The following listing provides the code for the bookinstance.xml document. Create and save a new bookinstance.xml file to your local file system with the contents of this listing:

<?xml version="1.0" encoding="UTF-8"?>
<book isbn="5-2341-9384-2" language="English">
   <title>The XMLSPY Game</title>
   <logo file="cover.gif"/>
   <author-list>
      <author>
         <firstname>David</firstname>
         <lastname>Smith</lastname>
      </author>
   </author-list>
   <copyright>1999</copyright>
</book>

Note

If you are looking for the DTD specification, you might be surprised to find that there isn’t an official standalone DTD specification. However, DTDs are discussed as part of the XML 1.0 specification, and in a few other places here and there, such as the XHTML and HTML specifications. Appendix B provides a listing of important XML grammars and standards.

A DTD example

In this example, I define a DTD for describing books, such as the one described in bookinstance.xml. The first step in content model development is to list the design requirements. For this example, I have determined that a book has the following characteristics:

A required ISBN number
A required language attribute—either English, French, German, or Russian
Exactly one book title
An author list, containing at least one or possibly more authors, with each author having his or her respective first and last names
An optional logo file (multiple logos are permitted)
Optional copyright information

Figure 3-2 shows a tree representation of the preliminary design requirements for the book structure. Elements are shown as ovals, and attributes are shown as rectangles. A line joining two elements indicates a nested element, and a line joining an attribute and an element indicates that the attribute appears inside the indicated element. The numbers on the connecting lines indicate the cardinality or multiplicity of the relationship. For example, if both ends are labeled with a 1, it is a required element or attribute. You should design your DTD so that it matches your design criteria.

Figure 3-2: A representation of a book’s content model.

Here we take a quick peek at the DTD for the book sketched out in Figure 3-2. The following listing shows all the elements and attributes and one entity we just described. Don’t worry if it doesn’t all make sense yet because everything will be explained in the next few sections.

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT book (title, logo*, author-list, copyright?)>
<!ATTLIST book
          isbn NMTOKEN #REQUIRED
          language (English | French | German | Russian) #REQUIRED
>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author-list (author+)>
<!ELEMENT author (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT logo EMPTY>
<!ATTLIST logo
          file CDATA #REQUIRED
>
<!ELEMENT copyright (#PCDATA)>
<!ENTITY copy "&#xA9;">

Defining elements

DTDs have a very concise syntax for element declarations. Before declaring an element, you must determine how and what kind of information the element will represent. Typically, element declarations fall into one of these categories:

Empty element: Can contain optional attributes but no body content; essentially a placeholder
Text-only element: Contains only plain text content within the body of the element.
Container element: Contains nested child elements only
Mixed content element: Contains a mix of both child elements and textual content
Unspecified element: Content structure is unspecified; this element can cause errors and should be avoided

Table 3-1 provides a syntax example for each of the element declaration types just listed.

Table 3-1: SAMPLE ELEMENT DEFINITIONS USING DTDS
Type of Element	Example
Empty element	<!ELEMENT myelement EMPTY>
Text-only element	<!ELEMENT myelement (#PCDATA)>
Container element	<!ELEMENT myelement (child1, child2, child3)>
Mixed content element	<!ELEMENT myelement (#PCDATA | child)*>
Unspecified element	<!ELEMENT myelement ANY>

All element declarations start with <!ELEMENT myelement, in which myelement is a placeholder that should be replaced with the name of the element you wish to define. Pay special attention to the last part of the element declaration, which specifies the element type.

Defining element type information requires the use of special symbols known as occurrence operators. Occurrence operators specify the number of times an element can appear within a document, as well as the sequence in which the element must appear. Table 3-2 lists all the available occurrence operators that you can use inside an element definition to specify an element’s type. The type of element determines the number of times an element can appear in an XML document and the kind of information it can contain.

Table 3-2: OCCURRENCE OPERATORS THAT SPECIFY THE NUMBER OF TIMES AN ELEMENT CAN APPEAR
Occurrence Operator	Description	Numerical Range
?	Optional—once or not at all	0 or 1
*	Optional and repeatable	0 to ∞
+	At least once, optionally more	1 to ∞
	Required (default behavior, no symbol required)	Exactly once
∞ = infinity

As shown in Table 3-3, you can also use occurrence operators to specify the ordering or sequence in which element types can appear.

Table 3-3: OCCURRENCE OPERATORS THAT SPECIFY ALLOWABLE SEQUENCES OF ELEMENTS
Occurrence Operator	Description
,	Comma indicates that the child elements must appear in the same sequence in which they are defined.
|	Logical OR operator indicates a choice; one of two or more specified elements must appear in the document.

Occurrence operators are used together to define all sorts of possible element structures, for example:

<!ELEMENT book (title, logo*, author-list, copyright?)>

This code defines a book element that is a container element for a sequence of child elements. Occurrence operators are appended to the right of each child element; otherwise, they are required elements by default. The commas separating the child elements indicate the sequence in which the elements must appear; the child elements are either required, optional, or repeatable depending on the occurrence operator appended to the end of the element name. This example shows several concepts simultaneously, so don’t worry if it doesn’t make complete sense right now. I revisit this example after I explain the most common cases in a little more detail.
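As a rough illustration, the fragment below is one instance that should satisfy this declaration, assuming the child elements are declared as in the book.dtd listing shown earlier. The second logo file name, spine.gif, is made up for the example.

```xml
<book isbn="5-2341-9384-2" language="English">
   <!-- title: no operator, so exactly one is required -->
   <title>The XMLSPY Game</title>
   <!-- logo*: zero or more, so two logos are allowed here -->
   <logo file="cover.gif"/>
   <logo file="spine.gif"/>
   <!-- author-list: required, and it must follow any logo elements -->
   <author-list>
      <author>
         <firstname>David</firstname>
         <lastname>Smith</lastname>
      </author>
   </author-list>
   <!-- copyright?: optional, and omitted in this instance -->
</book>
```

Moving author-list above the logo elements, or adding a second title, would make the same fragment invalid.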

Empty Elements

Empty (placeholder) elements are widely used in HTML, such as in the image and horizontal rule tags: <img ... /> and <hr ... />. Empty is a reference to the fact that the element has no body content, although empty elements can and often do contain attributes. To specify an empty element, you do not need to use any of the occurrence operators because an empty element is completely empty! Simply use the special keyword EMPTY as the element type. For example, <!ELEMENT logo EMPTY> defines the logo element as an empty element.

Elements that contain only text

To define an element that contains only text, you specify the type as (#PCDATA), which stands for Parsed Character Data (essentially a plain text string). For example, <!ELEMENT firstname (#PCDATA)> defines the firstname element as containing parsed character data. Remember to include the opening and closing parentheses, or XMLSPY will report that the DTD has a syntax error. There is no way to constrain the contents of the text that appears within a text-only element using a DTD; however, you can accomplish this using an XML Schema. Again, occurrence operators are not required because there is no limit to how much text can be contained inside a (#PCDATA) element.

Cross-Reference

You find out how to edit XML Schemas in Chapters 4 and 5.

Required child element(s)

To specify one or more required child element(s), type the name of the child element(s), separated by commas as necessary and enclosed in parentheses. By default, the sequence of child elements appearing in an XML instance document must appear in the same sequence as specified in the comma-separated listing in the element type definition in order to be valid. As an example, the author element is defined as a sequence of two required child elements: firstname and lastname:

<!ELEMENT author (firstname, lastname)>

You know that these are required elements because no occurrence operator is explicitly specified and, therefore, they are required by default. The sequence is indicated by the comma separating firstname and lastname. If, in an instance document, lastname were to appear before firstname, a validating XML processor would report that the document was invalid.
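For instance, assuming the author declaration above, a fragment like the following is well formed but would be rejected by a validating processor:

```xml
<!-- Invalid against <!ELEMENT author (firstname, lastname)>:
     lastname appears before firstname -->
<author>
   <lastname>Smith</lastname>
   <firstname>David</firstname>
</author>
```

Swapping the two child elements back into the declared order makes the fragment valid again.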

Optional child elements

To make a child element optional, you can append the ? occurrence operator to the right of the optional child element’s name, as it appears in the sequence of one or more elements. Type the following to add an optional logo element to the book element:

<!ELEMENT book (title, author-list, logo?)>

Required and repeatable child elements

In the book element, the author-list element requires a minimum of one author, but it can potentially have many other coauthors. To specify this option, you can use the + occurrence operator, as follows:

<!ELEMENT author-list (author+)>

Optional and repeatable

The book element can contain a logo element that is both optional and repeatable. You specify these traits by using the * occurrence operator as follows:

<!ELEMENT book (title, author-list, logo*)>

Choice of two or more elements

DTDs support a choice construct that gives the author of an XML instance document the flexibility to choose one element out of several possible allowable elements. To specify a choice of several elements, you type the names of the permissible elements, separated by the logical OR occurrence operator (|), and enclosed in parentheses. For example, to specify an element called myelement as a choice between three child elements (choice1, choice2, and choice3), type the following:

<!ELEMENT myelement (choice1 | choice2 | choice3)>
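The following is a small self-contained sketch of how such a choice behaves. It assumes, purely for illustration, that each of the three child elements holds plain text, and it places the declarations in an internal subset only to keep the example in one file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE myelement [
   <!ELEMENT myelement (choice1 | choice2 | choice3)>
   <!-- Assumed content models for the three alternatives -->
   <!ELEMENT choice1 (#PCDATA)>
   <!ELEMENT choice2 (#PCDATA)>
   <!ELEMENT choice3 (#PCDATA)>
]>
<myelement>
   <!-- Valid: exactly one of the three permitted children appears -->
   <choice2>Only one alternative may be used here</choice2>
</myelement>
```

Adding a second child, or leaving myelement empty, would violate the declaration, because the choice group carries no occurrence operator and is therefore required exactly once.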

Specifying Mixed Content

An element that allows both text and child elements is said to have mixed content. There is no special operator to specify mixed content. Rather, it is a consequence of specifying a combination of occurrence operators. For example, you can define a paragraph element that allows text, as well as bold and italic child elements, as follows:

<!ELEMENT para (#PCDATA | bold | italic)*>

In plain English, this element declaration reads like this: The para element can contain either parsed character data, bold, or italic elements, repeatable from zero to an infinite number of times. For example, the following bit of XML code would be valid:

<para>Building DTDs is <italic>Easy</italic> with <bold>XMLSPY</bold>.</para>

Defining attributes

DTDs offer a concise syntax for defining one or more attributes, and the basic syntax is shown in Figure 3-3.

Figure 3-3: Defining a sequence of attributes in a DTD.

Note

As I mentioned in Chapter 2, there are no specific W3C guidelines on what to make an attribute versus an element. In general, however, information contained in an attribute tends to be about the document’s content, rather than being part of the content itself. As an example, the language attribute in the book element describes the information about the book but is not a part of the book element’s content. Of course, because there are no official guidelines, creating an additional language element would also work.

Attribute Types

DTDs support 10 different types of attributes. The following list describes each type, along with an example of how to use a couple of the most common types:

CDATA: Possibly the most commonly used attribute type. CDATA indicates that the attribute value consists of a character data string of arbitrary length, which can include any characters except the left angle bracket (<), an unescaped ampersand (&), and the quotation mark used to delimit the value; entity references are permitted. The logo element has a required file attribute of type CDATA, which you declare like this:
```
<!ATTLIST logo         file CDATA #REQUIRED >
```
ENTITY: Indicates that the attribute value is the name of an unparsed entity declared elsewhere in the DTD.
ENTITIES: Indicates that the attribute value is a list of such entity names, separated by whitespace.
Enumerated: Not an actual attribute type. If an attribute value must be equal to one value from a list of possible values, you can specify this behavior by typing
```
(option1|option2|option3| ...)
```
where optionx corresponds to a legal value. Each legal value is separated by the vertical bar (logical OR symbol), and the entire list is enclosed in parentheses.
ID: Specifies that an attribute is of type identifier, which implies that each occurrence of this field must have a unique value throughout the XML document. IDs are very helpful when you are navigating the XML document, particularly in XSLT stylesheet development.
IDREF: Specifies that an attribute is of type identifier reference. This type implies that it has a value that is a reference to another ID within the XML document. This is similar to the concept of a foreign key in relational databases.
IDREFS: Similar to an IDREF. An attribute of type IDREFS takes on multiple-element IDs as its value, with each ID separated by whitespace. You can use the IDREFS attribute to point to a list of related elements elsewhere in an XML document, or you can use it when performing database imports to an XML file in which a table contains multiple foreign key fields.
NMTOKEN: A token, as far as an XML parser is concerned, is a string of characters whose start and end is delimited by a token delimiter such as whitespace or a comma. As an example, the string
```
DTDs aren’t difficult
```
consists of three whitespace-delimited tokens: DTDs, aren’t, and difficult. An attribute of type NMTOKEN can be any single token made up only of XML name characters: letters, digits, hyphens, underscores, periods, and colons. Unlike an element name, a name token may begin with a digit, which is what allows an ISBN to qualify. Spaces are not permitted because the parser would interpret the space as a token delimiter signifying the termination of the token. Of the three tokens shown here, only DTDs and difficult could pass as an attribute value of type NMTOKEN. The aren’t token contains an apostrophe, which is not a legal name character. NMTOKEN is an ideal candidate for the isbn attribute of the book as shown in the following line:
```
<!ATTLIST book
          isbn NMTOKEN #REQUIRED
          ... >
```
NMTOKENS: A whitespace-delimited list of NMTOKEN values, which means that you can include spaces in the attribute value, but each individual token must still consist only of legal name characters. As an example, the string Illegal NMTOKEN value is a completely legal value for an attribute of type NMTOKENS because it consists of three whitespace-delimited name tokens.
NOTATION: A reference to a notation declared elsewhere in the DTD.
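To make the ID, IDREF, and IDREFS types more concrete, here is a minimal sketch. The team and member elements and the reports-to and mentors attributes are hypothetical names invented for this example and are not part of book.dtd; the declarations sit in an internal subset only so that the example fits in one file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE team [
   <!ELEMENT team (member+)>
   <!ELEMENT member (#PCDATA)>
   <!-- id must be unique in the document; the other two attributes
        must point at id values that actually exist -->
   <!ATTLIST member
             id         ID      #REQUIRED
             reports-to IDREF   #IMPLIED
             mentors    IDREFS  #IMPLIED>
]>
<team>
   <member id="m1">David Smith</member>
   <!-- IDREF: a single reference to another member's id -->
   <member id="m2" reports-to="m1">Maria Jones</member>
   <!-- IDREFS: several references, separated by whitespace -->
   <member id="m3" reports-to="m1" mentors="m1 m2">Lee Chan</member>
</team>
```

A validating parser should reject this document if two members shared the same id, or if reports-to named an id such as m9 that no member declares. Note that ID values must be legal XML names, so they cannot begin with a digit.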

Attribute Defaults

Three attribute default keywords are available and are described in the following list. The keyword tells the parser how to treat an attribute that is missing from an element; #FIXED is additionally followed by the fixed value in quotation marks:

#FIXED: This reserved word means that the specified attribute value is a fixed constant; if the attribute appears inside an element, the value must equal the default value specified. If an author includes another value, the XML parser will return an error. Even if the attribute were omitted from an element, the parser would assume the default value. For reasons of inflexibility, this construct is seldom used.
#IMPLIED: This keyword makes an attribute optional, with a null or undefined value when the attribute is not present. Use an implied attribute if you don’t want to force the author to include an attribute and you don’t have an option for a default value.

#REQUIRED: This keyword requires that an attribute be present inside an element; if the attribute is missing, the document is invalid. Note that you cannot define the actual attribute default value; you may only require that the attribute be present.

Table 3-4: EXAMPLE SYNTAX FOR SPECIFYING ATTRIBUTE DEFAULTS
Default Type	DTD Syntax	Explanation	Sample Valid Usage
#FIXED	<!ATTLIST sender company CDATA #FIXED "Altova">	Defines a company attribute for the sender element with the fixed value “Altova”	<sender company="Altova"/>
#IMPLIED	<!ATTLIST contact fax CDATA #IMPLIED>	Defines a fax attribute for the contact element; optional, with no default value	<contact fax="978-816-1606"/>
#REQUIRED	<!ATTLIST logo file CDATA #REQUIRED>	Defines a required file attribute for the logo element	<logo file="xmlspy.gif"/>
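The sketch below shows what happens when the #FIXED and #IMPLIED attributes from Table 3-4 are left out of an instance. The message wrapper element is an assumption added only so that the example forms a single document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message [
   <!-- message is a hypothetical wrapper, not part of the table -->
   <!ELEMENT message (sender, contact)>
   <!ELEMENT sender EMPTY>
   <!ATTLIST sender company CDATA #FIXED "Altova">
   <!ELEMENT contact EMPTY>
   <!ATTLIST contact fax CDATA #IMPLIED>
]>
<message>
   <!-- company omitted: the parser supplies the fixed value "Altova" -->
   <sender/>
   <!-- fax omitted: allowed, because the attribute is #IMPLIED -->
   <contact/>
</message>
```

Writing <sender company="Other"/> instead would make the document invalid, because a #FIXED attribute may only carry its declared value.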

Simplified Attribute Declaration Process

Follow these steps to declare an attribute:

Begin an attribute list declaration with <!ATTLIST.
Inside the declaration, first specify the associated element name—attributes can only exist within an element’s opening tag. Thus, the associated element is the element that is meant to contain the attributes being defined.
Define the attribute’s name (for example: attribute1 or attribute2).
Specify the attribute type—choose from one of the 10 different data types, or simply choose CDATA (character data), which is the most general type, specifying that the attribute will consist of regular character data with no tags but possibly with entities.
Enter the default value (defaultvalue in Figure 3-3)—this will be the value for the attribute if none is explicitly set. Use the keywords #FIXED, #REQUIRED, or #IMPLIED to specify additional information about attribute defaults.
Repeat this process for as many attributes as there are in a particular element and terminate the attribute list with a closing >. Note that XML attributes are un-ordered, so the order in which you declare your attributes for a particular element makes no difference. An XML validator does not enforce a specific order for attributes.
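Putting these steps together, a declaration for the book element might look like the following sketch. The isbn and language attributes come from the book.dtd listing; the format attribute is hypothetical and is included only to show a literal default value.

```xml
<!-- Attribute list for the book element: name, type, then default -->
<!ATTLIST book
          isbn     NMTOKEN                                #REQUIRED
          language (English | French | German | Russian)  #REQUIRED
          format   (hardcover | paperback)                "paperback">
```

With this declaration, a book element that omits format is treated as if it had been written with format="paperback", whereas omitting isbn or language makes the document invalid.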

Defining entities

DTDs offer a concise syntax for defining entities, which fall into two broad categories: general entities and parameter entities. The editing of both types of entities is supported by XMLSPY and is explained in the following sections.

General Entities

As you learned in Chapter 2, XML entities are often used to substitute a frequently occurring text string with a unique symbolic representation. Suppose that you are writing a book on XMLSPY 5, and you are saving the book’s contents into an industry standard XML-based book format, such as DocBook. To avoid having to write out XMLSPY 5 hundreds of times, you could define a shorthand notation such as &spy;. Whenever XMLSPY 5 is supposed to appear in the book text, you can simply type the entity notation instead. The XML processor substitutes the true value, XMLSPY 5, for the entity whenever it is processed.

In addition to letting you type fewer characters, entities have a huge added benefit in that they make a document easier to maintain. Suppose that one year from now, Altova releases a new version of XMLSPY called XMLSPY 2003. To update your book’s text files to include this new product name, you only have to change one entity definition, as opposed to searching the entire document to make the change. Other possible uses for general entities include a navigation menu bar for a Web site or header/footer information common to more than one page.

To declare a general entity, type the following line of code into your DTD:

<!ENTITY generalentityname "substitute text">

For example, the following line of code defines an entity called spy, referenced in a document as &spy;, that has the value of XMLSPY 5:

<!ENTITY spy "XMLSPY 5">

As shown in the following code, a general entity definition can contain nested entities:

<!ENTITY product "XMLSPY 5">
<!ENTITY edition "Enterprise Edition">
<!ENTITY spy "&product; &edition;">

In this case, &spy; resolves to XMLSPY 5 Enterprise Edition. Circular entity references (when two entity definitions both reference each other) are not permitted, however, and can cause unpredictable results.
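To see the substitution in context, the sketch below uses the nested entity declarations above in a small document. The para element and its text-only content model are assumptions made for this example, and the internal subset is used only to keep everything in one file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE para [
   <!-- Assumed text-only content model for this example -->
   <!ELEMENT para (#PCDATA)>
   <!ENTITY product "XMLSPY 5">
   <!ENTITY edition "Enterprise Edition">
   <!ENTITY spy "&product; &edition;">
]>
<para>This chapter was written with &spy;.</para>
```

After parsing, an application reading the para element sees the text "This chapter was written with XMLSPY 5 Enterprise Edition."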

General entities are also ideal for representing special symbols such as currency, mathematical, and other symbols that don’t fit on a 101-key English keyboard. In your XMLSPY 5 book DTD, you can use the following code to define an entity called copy with a value of “&#xA9;”, which is the character reference for the Unicode copyright symbol (©):

<!ENTITY copy "&#xA9;">

General entity definitions can be used in a DTD to specify shortcuts for text that will later be used as document content; they cannot function as part of a DTD’s internal structure or markup. The following example makes this point by illegally trying to use the entity &pcd;, whose value is the string “(#PCDATA)”:

<!ENTITY pcd "(#PCDATA)" >
...
<!ELEMENT firstname &pcd; >
<!ELEMENT lastname &pcd; >

It’s illegal to use the entity &pcd; in this way as a shortcut for defining an element. However, you can use &pcd; in a completely different way—inside a paragraph within your document:

... <paragraph>Element content often consists of Parsed Character Data - &pcd; ... </paragraph>

In this case, the &pcd; entity resolves to be part of the instance document’s content. Of course, there are often situations in which you might want to define an entity to help define the DTD itself. For this purpose, parameter entities allow you to define strings of substitutable text for use in specifying a DTD’s internal structure or markup. Parameter entities are the subject of the next section.

Parameter Entities

Parameter entities reference data which itself becomes a part of the DTD. They are not meant for substitution of anything that appears in the contents of an XML document. You declare a parameter entity by typing the following line of code into your DTD:

<!ENTITY % parameterentityname "substitute text">

The parameter entity declaration is the same as the general entity declaration except that it has a percent symbol before the entity name. To reference a parameter entity, simply type the percent sign (%), followed by the parameter entity’s name, and terminate with a semicolon. For example, the invalid general entity example of the previous section can be fixed using parameter entities as follows:

<!ENTITY % pcd "(#PCDATA)" >
...
<!ELEMENT firstname %pcd; >
<!ELEMENT lastname %pcd; >

The typical use of parameter entities in a DTD usually falls into one of two categories, both of which serve to modularize and increase the reusability of your DTD. The following sections explain these two categories.

SUBSTITUTING FREQUENTLY REOCCURRING MARKUP

If a long string of text is repeated in multiple locations throughout a DTD, it is an ideal candidate for a parameter entity. Creating a parameter entity can save typing and make updating the DTD easier if the string of text happens to change. As an example, suppose that there is an attribute field, states, that is an enumeration of two-letter codes representing U.S. states and territories:

<!ENTITY % states "( AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|GU|HI|IA|ID|
IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|
OK|OR|PA|PR|RI|SC|SD|TN|TX|UT|VA|VI|VT|WA|WI|WV|WY)">

Now suppose that a DTD contains several elements, all of which contain an attribute whose value must correspond to an official U.S. state or territory two-letter abbreviation. Rather than retyping the long enumeration of valid tokens, you could simply do something like this:

<!ELEMENT shipping-address ( ... ) >
<!ATTLIST shipping-address state %states; #REQUIRED>
<!ELEMENT billing-address ( ... ) >
<!ATTLIST billing-address state %states; #REQUIRED>
<!ELEMENT mailing-address ( ... ) >
<!ATTLIST mailing-address state %states; #REQUIRED>

In addition to the bonus of less typing, the DTD is more modular. If the company in this example ever expanded to offer services to both Canada and the United States, updating all address-related elements to support the two-letter abbreviations corresponding to Canadian provinces and territories would only require modifying the one parameter entity definition. Another example of this kind of substitution can be found in the HTML specification. The W3C has made extensive use of parameter entities to define attributes common to many elements, including name, title, ID, and various styles.

COMBINING DTDS USING PARAMETER ENTITIES

Another common use of parameter entities is to combine multiple DTDs. Suppose that you need to build a single DTD that draws from externally defined DTDs that are either publicly available or that you have previously made. As an example, a new DTD for an online store might contain the products and address structures from another DTD. Here you could accelerate DTD development by using an external parameter entity reference as shown in the following code:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % product SYSTEM "c:\dtds\product.dtd">
<!ENTITY % address SYSTEM "c:\dtds\address.dtd">
%product;
%address;
<!-- Define the rest of your DTD here -->
...

Again, just as in the DOCTYPE declaration, the use of the SYSTEM keyword here also tells the XML parser to fetch the DTD from an external location.

Note

The use of parameter entities in this example is similar to how you might use the C/C++ #include preprocessor directive to import a header file containing constants. Although parameter entities do increase the overall modularity of a DTD, if you find yourself making heavy use of parameter entities in your DTDs, I highly recommend the use of XML Schemas to achieve even greater flexibility and modularity.

Declaring a DTD

Now that I’ve explained all the syntax involved in creating a DTD, including defining elements, attributes, and entities, take a minute to type the contents of the book.dtd file (as it was listed earlier in the chapter) into your own book.dtd file.

After you complete the book.dtd file, you can use it to create and validate other book instance documents. To use an instance document in conjunction with a DTD (for validation or editing purposes), an instance document must tell the XML processor where to find the corresponding DTD. The line of code that does this is called a document type declaration.

Caution

Be careful not to confuse a document type declaration with a Document Type Definition—even though they have the same initials. A document type declaration is the line in an XML file that links the file with its document type definition. The document type definition file defines all the rules for that particular class of document.

XMLSPY can associate an instance document (bookinstance.xml) with its corresponding DTD (book.dtd). To do this, open your bookinstance.xml file in the main editing window so that it is the active document in the XMLSPY editing environment. Then choose DTD/Schema → Assign DTD, as shown in Figure 3-4.

Figure 3-4: Assigning a DTD to an XML document.

Note that XMLSPY automatically inserts the required document type declaration into the bookinstance.xml file, as shown in Figure 3-5.

Figure 3-5: XMLSPY automatically inserts the DTD at the top of the XML file.

The parts of the document type declaration have been labeled in Figure 3-5. The following list describes these parts in more detail.

<!DOCTYPE: The special symbol that starts a document type declaration.
book: The DTD name. The name must correspond to the name of the root element of the XML document that contains this document type declaration.
SYSTEM: The keyword that instructs the XML parser to look for an externally defined DTD at a specified location.
Path to the external DTD: The path that specifies a DTD location in the form of any Uniform Resource Identifier (URI), such as a path on the local file system or a Web URL.
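As a sketch, a document type declaration built from these four parts might look like the second line of the following document. The relative path is an assumption: it presumes that book.dtd sits in the same directory as the instance document, whereas XMLSPY may insert a full path to wherever you saved the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book isbn="5-2341-9384-2" language="English">
   <title>The XMLSPY Game</title>
   <logo file="cover.gif"/>
   <author-list>
      <author>
         <firstname>David</firstname>
         <lastname>Smith</lastname>
      </author>
   </author-list>
   <copyright>1999</copyright>
</book>
```

The declaration must come after the XML declaration and before the root element, and the name that follows <!DOCTYPE must match that root element, here book.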

Validating and editing XML documents using XMLSPY

At this point, you’ve built both a DTD and an instance document and have associated the instance document with the DTD using a document type declaration. Now, you’re ready to create, edit, and validate XML instance documents in XMLSPY.

Note

XMLSPY’s editing and validation support applies to any XML document with an associated content model; the content model can be expressed as either a DTD or an XML Schema. Therefore, although this section shows how to edit and validate XML documents with DTDs, the whole section also applies to XML Schemas.

In this section, you start by creating a new document called newbook.xml.

Choose File → New or click the New File button on the main toolbar. The Create New Document window appears.
Select XML Document and click OK. The New File dialog box shown in Figure 3-6 appears. It asks whether the new instance document you are creating should be based on a DTD or an XML Schema.
Select the DTD option and click OK. If you want instead to edit and validate an instance document using an XML Schema, you simply choose the XML Schema option.

Figure 3-6: The New File dialog box.
When XMLSPY asks you to choose the DTD (see Figure 3-7), browse to the book.dtd file that you built earlier, click OK, and you’re finished!

Figure 3-7: Open the DTD that you want to associate with the XML instance document.

When you create a new XML document based on a content model (in this case, a DTD), XMLSPY automatically inspects the DTD or XML Schema and creates all mandatory elements and attributes, filling in default values whenever possible, as shown in the following listing:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "C:\Program Files\Altova\XMLSPY\Examples\xmlspyhandbook\ch3\book.dtd">
<book isbn="" language="">
   <title></title>
   <author-list>
      <author>
         <firstname></firstname>
         <lastname></lastname>
      </author>
   </author-list>
</book>

XMLSPY includes several XML editing and XML document validation features that are grouped together into the category of intelligent editing and are available in both Text view and Grid view. The following list describes the intelligent editing features:

Code sensing: As you type new elements in your XML document, drop-down boxes appear helping you remember which elements are defined in your document (shown in Figure 3-8).
Auto-completion: XMLSPY will insert an element’s closing tag and mandatory child elements and attributes as you type.
Entry Helper windows: Three configurable helper windows showing you the elements, attributes, and entities that are in the scope of the cursor position—that is, elements, attributes and entities that may appear in the document at the present cursor location.
Document validation: Single-click document validation with error highlighting and reporting (shown in the next section).

Figure 3-8: XMLSPY offers intelligent editing support in both Text view and Grid view.

The code sensing and auto-completion features described in the preceding list are very straightforward. Entry Helper windows and document validation require some additional explanation and are discussed in the following sections.

ENTRY HELPER WINDOWS

Entry Helper windows for elements, attributes, and entities appear by default on the right side of the XMLSPY editing environment in both Text view and Grid view. If you click an element or attribute in an Entry Helper window, it is automatically inserted into the current document at the cursor position. One kind of Entry Helper window, the Elements window (shown in Figure 3-9), lists all the elements in the scope of the current XML document.

Figure 3-9: The Elements window displays all available elements.

The Attributes window is context-sensitive. It lists only the attributes that are in scope according to the position of the cursor. Two mandatory attributes of the book element are listed in the Attributes window shown in Figure 3-10.

Figure 3-10: A context-sensitive Attributes window shows attributes that are in scope.

The Entities window displays the five default XML entities, in addition to any general entities declared within your content model. The © general entity defined in the book.dtd file is resolved and displayed in the Entities window shown in Figure 3-11. What the window lists depends on the file you are editing: for a DTD file, it shows any parameter entities you have defined; for any other XML document, it shows the general entities and the built-in entities.

Figure 3-11: The Entities window displays all available entities.
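
As a rough illustration of the two kinds of entities that the Entities window lists for an instance document, the following sketch uses the five built-in entities next to one general entity declared in a DTD. The note element, the entity name copy, and its replacement value are assumptions made for this example, and the declarations sit in an internal subset only to keep the sketch in a single file; the actual declaration in book.dtd may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!-- A general entity; its name and value are assumptions for this sketch -->
  <!ENTITY copy "&#169;">
]>
<note>
  Built-in entities: &lt; &gt; &amp; &apos; &quot;
  Declared in the DTD: &copy; 2002
</note>
```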

Tip

If you are editing a DTD or an XML Schema while simultaneously working on an associated instance document in XMLSPY, you can force a manual refresh of Entry Helper windows after making changes to the content model. Choose XML → Update Helper Entries and make sure that the contents of the Entry Helper windows reflect your most current content model revision.

Remember, if you accidentally close an Entry Helper window and want to get it back, or if you are editing a big document and want to turn off the Entry Helper windows to free up more space, choose Window → Entry Helper. You can control all the functionality of Entry Helper windows with this command.

DOCUMENT VALIDATION

XMLSPY can validate your XML documents for you: simply click the green check mark button on the main toolbar, choose XML → Validate, or press the F8 key. If XMLSPY determines that an XML document is invalid, it highlights the line containing the error and displays an error message at the bottom of the main editing window explaining the situation. For example, in Figure 3-12, the validator has found an error and reports: Required attribute ‘isbn’ of parent element “book” is expected before first child element. Fix the error and click the Revalidate button located on the status bar at the bottom of the main editing area. Continue this process until the document validates without errors.

Figure 3-12: Troubleshooting an XML document with XMLSPY’s validator.
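
An instance document along the following lines would be expected to trigger the kind of error quoted above. This is only a sketch: it assumes that book.dtd sits next to the file, that it declares isbn as a required attribute of book, and that the sample values are acceptable to the rest of the content model.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "book.dtd">
<!-- The required isbn attribute is missing, so validation should fail -->
<book language="English">
   <title>The XMLSPY Game</title>
   <author-list>
      <author>
         <firstname>James</firstname>
         <lastname>Choi</lastname>
      </author>
   </author-list>
</book>
```

Adding an isbn attribute to the book start tag and revalidating should clear this particular error.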

Editing DTDs with XMLSPY

Earlier in this chapter, I made the case that you should develop your content model (either a DTD or an XML Schema) before editing any XML instance documents. One common question is: What happens if you create various XML document fragments before having formalized a content model, or if you inherited an XML application that doesn’t use DTDs or XML Schemas? First, you should write: “I will not create XML documents without an associated DTD or XML Schema” a hundred times on a whiteboard or self-administer some other hard punishment. Seriously, XMLSPY offers two ways to jump-start DTD- or XML-Schema–based content model development, both of which are described in the following sections.

Inferring a DTD by analyzing one or more instance documents

XMLSPY can inspect one or more XML document fragments and generate the corresponding DTD automatically. In the Chapter 3 project folder, you will find three related files named usecase1.xml, usecase2.xml, and usecase3.xml. They are all XML document fragments with no associated content model. In this example, I show how XMLSPY can automatically generate as much as 80% of the book.dtd file that you manually created earlier in this chapter. The use case files are listed in the following code:

Code listing for usecase1.xml:

<?xml version="1.0" encoding="UTF-8"?>
<book isbn="1-8745-3543-7" language="English">
   <title>The XMLSPY Game</title>
   <logo file="cover.gif"/>
   <author-list>
      <author>
         <firstname>James</firstname>
         <lastname>Choi</lastname>
      </author>
   </author-list>
   <copyright>1999</copyright>
</book>

Code listing for usecase2.xml:

<?xml version="1.0" encoding="UTF-8"?>
<book isbn="4-3241-4900-1" language="German">
   <title>XML the Hard way</title>
   <logo file="xml.gif"/>
   <author-list>
      <author>
         <firstname>Dilip</firstname>
         <lastname>Ogale</lastname>
      </author>
      <author>
         <firstname>Christopher</firstname>
         <lastname>Williams</lastname>
      </author>
   </author-list>
   <copyright>2002</copyright>
</book>

Code listing for usecase3.xml:

<?xml version="1.0" encoding="UTF-8"?>
<book isbn="5-3324-4678-2" language="Spanish">
   <title>Building XML portals</title>
   <author-list>
      <author>
         <firstname>Allyson</firstname>
         <lastname>Fluke</lastname>
      </author>
   </author-list>
   <copyright>2001</copyright>
</book>

The three use cases are all book-related instance documents with a few minor differences: usecase2.xml has two authors whereas the others have only one, and usecase3.xml has no logo element. These three use case documents contain enough information for XMLSPY to autogenerate a near-complete DTD. To do so, select all the use cases in the Project window by clicking each file icon while holding down the Shift key. With the three icons selected and highlighted, right-click anywhere on the highlighted region. Then choose Generate DTD/Schema from the context menu that appears (see Figure 3-13).

Figure 3-13: XMLSPY can autogenerate a content model from an instance document.

Next, the Generate DTD/Schema panel appears. Select DTD as the output language for the autogenerated content model and click OK.

Note

Although this example highlights the autogeneration of a DTD, the autogenerated content models can be expressed in many different schema dialects, including DTD, XML Schema, DCD, XML-Data (XDR), and Microsoft BizTalk Schema. You choose the desired output format in the Generate DTD/Schema configuration panel.

XMLSPY displays the autogenerated DTD as an external DTD file within the main editing window; the contents of the DTD are listed in the following code:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT author (firstname, lastname)>
<!ELEMENT author-list (author+)>
<!ELEMENT book (title, logo?, author-list, copyright)>
<!ATTLIST book
      isbn (1-8745-3543-7 | 4-3241-4900-1 | 5-3324-4678-2) #REQUIRED
      language (English | German | Spanish) #REQUIRED
>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT logo EMPTY>
<!ATTLIST logo
      file (cover.gif | xml.gif) #REQUIRED
>
<!ELEMENT title (#PCDATA)>

This autogenerated DTD is not an exact duplicate of the book.dtd file that you manually created earlier. But XMLSPY has inspected the three use case documents and has properly inferred some important aspects of your book content model, including the following:

An author-list consists of one or more authors.
logo is an optional, empty element.
isbn and language are required attributes.
The language attribute is an enumeration of possible values: English, German, Spanish.

Therefore, the autogenerated DTD is a reasonable content model approximation. You can manually refine the content model later by using your understanding of DTD syntax, as well as XMLSPY’s Text view or Grid view.
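
One refinement that the autogenerated listing invites: the isbn attribute and the file attribute of logo were each inferred as an enumeration of the sample values, so a fourth book with a new ISBN or cover image would fail validation. The following is a sketch of a hand-edited version, under the assumption that any text should be accepted for those two attributes; it is not necessarily identical to the book.dtd that you built earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hand-refined version of the autogenerated DTD (a sketch) -->
<!ELEMENT book (title, logo?, author-list, copyright)>
<!-- isbn was an enumeration of the three sample ISBNs; CDATA accepts any string -->
<!ATTLIST book
      isbn CDATA #REQUIRED
      language (English | German | Spanish) #REQUIRED
>
<!ELEMENT title (#PCDATA)>
<!ELEMENT logo EMPTY>
<!-- file was (cover.gif | xml.gif); file names are better left open -->
<!ATTLIST logo
      file CDATA #REQUIRED
>
<!ELEMENT author-list (author+)>
<!ELEMENT author (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
```

The language enumeration is kept as inferred; whether it should list more languages depends on the books you expect to describe.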

Note

The autogenerated content model could have resulted in an even closer approximation had there been more than three use case documents for XMLSPY to use as sample data.

Converting content models expressed in one language to another

XMLSPY includes translation utilities to convert a content model from one schema dialect to another. XMLSPY supports conversion between any two of the following schema dialects: DTD, DCD, XML-Data (XDR), Microsoft BizTalk Schema, and W3C XML Schema, for a total of 10 possible pairings. In this section, I show you how to convert a content model expressed in one language to another. First, open the content model that you wish to convert. Then, choose Convert → Convert DTD/Schema. The Convert DTD/Schema window appears (see Figure 3-14).

Next, select the desired target language for the content-model conversion and click OK. The new file appears as a new pane within the XMLSPY editing environment.

Figure 3-14: The Convert DTD/Schema window.

Note

Conversion of an XML Schema to a DTD will be lossy. In other words, a considerable amount of information stored in the XML Schema–based content model is lost when converting to a DTD due to limitations of the DTD specification.
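
To give a sense of what kind of information is lost, consider the following sketch of an XML Schema fragment that restricts the isbn attribute with a pattern. The pattern and the simplified, otherwise empty book element are assumptions made for this example, and the DTD line in the closing comment shows the nearest declaration that DTD syntax allows, not XMLSPY's actual conversion output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <!-- Hypothetical pattern: one digit, four digits, four digits, one digit -->
      <xs:attribute name="isbn" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="\d-\d{4}-\d{4}-\d"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
<!-- The nearest DTD declaration can only say that the attribute is required text:
     <!ATTLIST book isbn CDATA #REQUIRED> -->
```

The pattern constraint has no counterpart in DTD syntax, so any string would be accepted after conversion.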

Limitations of DTDs

The biggest strength of DTDs is clearly their simplicity and concise syntax. Regrettably, this conciseness comes at the price of a few deficiencies, which help motivate the next two chapters on XML Schemas.

Not XML

Document Type Definitions are written in a separate, non-XML meta-language inherited from SGML. A DTD is not itself a well-formed XML document, so it cannot be read or manipulated with the same XML tools that process your instance documents.

No Namespace Support

All declarations in a DTD are global. This means that every element in a valid XML instance document must match the single declaration of that element name in its DTD, wherever the element appears. The unfortunate side effect of global declarations is a naming collision whenever you need two different elements with the same name, even if they appear in separate contexts. This makes DTDs potentially dangerous in B2B applications that must combine the content models of various companies. XML Schema introduces support for namespaces, which enables a content author to overcome these naming limitations.
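
The following sketch illustrates the collision. It assumes a hypothetical variant of the book content model in which an author may carry a title (an honorific) in addition to the book's own title; the declarations are placed in an internal subset only to keep the example in one file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
  <!ELEMENT book (title, author)>
  <!ELEMENT author (title?, lastname)>
  <!ELEMENT lastname (#PCDATA)>
  <!-- One global declaration has to serve both the book title and the
       author's honorific; a second declaration of title with a different
       content model would be rejected as a duplicate. -->
  <!ELEMENT title (#PCDATA)>
]>
<book>
  <title>Building XML portals</title>
  <author>
    <title>Dr.</title>
    <lastname>Example</lastname>
  </author>
</book>
```

With a DTD, the usual workaround is to rename one of the elements (for example, honorific), which is exactly the kind of coordination that becomes difficult when the content models come from different companies.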

No Data Types

The absence of a mechanism for using existing predefined data types, as well as a mechanism for defining custom data types, makes it challenging to use DTD-based content models in conjunction with other strongly typed environments. These environments include relational databases, which contain an abundance of predefined SQL data types, as well as strongly typed programming languages such as Java and C++.
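
A short sketch makes the problem visible. The copyright element is declared as #PCDATA, as in the autogenerated DTD shown earlier, so a validator has no way to reject a value that is not a year; the reduced book content model and the sample text are assumptions made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
  <!ELEMENT book (copyright)>
  <!ELEMENT copyright (#PCDATA)>
]>
<book>
  <!-- Valid against the DTD, although it is clearly not a year -->
  <copyright>sometime last century</copyright>
</book>
```

An application that loads this value into a numeric or date column would have to perform its own checks after validation.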

Not Object-Oriented

DTDs were created before the widespread usage of object-oriented programming methodologies. Object-oriented programming languages have helped to advance modern software development practices, particularly through the use of encapsulation, inheritance, and polymorphism. It can be challenging to use DTDs in conjunction with modern object-oriented programming languages such as C++, Java, or C# because DTDs have no notion of object-oriented features.