Chapter 5: Document Type Definitions (DTDs)


You know that, as an XML document author, you can create the XML document in whatever structure you decide on. You are able to decide on your own element names, you can determine how the data within these elements is represented, and you can even dictate the complete hierarchy of the data represented in the document. The structure you decide on is referred to as a vocabulary. This open set of rules may seem like anarchy, but this is what gives XML its power. It is a creative environment that allows you to build a true representation of your data.

The openness of XML vocabularies does, however, call for a set of rules that defines the structure of an XML document. Once those rules are in place, you can use them to validate XML documents as they are created or read. If you want to consume an XML document, you need a way to run it through a validation process that confirms it abides by the established rules, so that it can be processed reliably. Otherwise, you have to check the document yourself by laboriously parsing it line by line.

The XML validation process is an important one. This book covers the three main ways to validate an XML document: Document Type Definitions (DTDs), XML Schemas, and RELAX NG. This chapter looks at DTDs and how you can create and work with them.

Why Document Type Definitions?

Validation is important. If you plan to share information or services using an XML document between two working processes, applications, or other entities, you must put in place a set of rules that defines the structure of the XML document that is to be passed. You should be able to use the rule definition to perform validation against any XML document.

For instance, suppose you have created an XML document like the one presented in Listing 5-1.

Listing 5-1: A simple XML document

      <?xml version="1.0" encoding="UTF-8" ?>
      <Process>
         <Name>Bill Evjen</Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
         <Order>
            <Item>52-inch Plasma</Item>
            <Quantity>1</Quantity>
         </Order>
      </Process>

If your application depends upon a structure like the preceding one, you don't want to receive an XML document that doesn't conform to that structure (for example, Listing 5-2).

Listing 5-2: An XML document that does not follow the prescribed structure

      <?xml version="1.0" encoding="UTF-8" ?>
      <Process>
         <Name>Bill Evjen</Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
         <Order>
            <Item>52-inch Plasma</Item>
            <Quantity>1</Quantity>
            <Type>New</Type>
         </Order>
      </Process>

As you look at the XML document presented in Listing 5-2, you can see that it doesn't follow the structure established in Listing 5-1. Listing 5-2 has an extra element (<Type>) that wasn't part of the original requirement. A departure as small as one extra element makes the XML document invalid and can break your consuming process. For this reason, you need a validation process.
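To make the idea concrete, the following Python sketch checks Listings 5-1 and 5-2 against a DTD. The DTD itself is an assumption written for this example (it does not appear in the text) and is placed in the document's internal subset; BODY, VALID_DOC, INVALID_DOC, and check_against_internal_dtd are hypothetical names. Python's standard expat module reads DTD declarations but does not validate against them, so the helper compares each element's children with its declared content model itself, and only for the simplest cases: undeclared elements, flat sequences without ?, *, or + quantifiers, and elements that may hold only text. It is a sketch of the idea, not a validating parser.

```python
import xml.parsers.expat
from xml.parsers.expat import model

# Hypothetical DTD for Listing 5-1, written into the document's internal subset.
DTD = """<!DOCTYPE Process [
  <!ELEMENT Process (Name, Address, City, State, Country, Order)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Address (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT State (#PCDATA)>
  <!ELEMENT Country (#PCDATA)>
  <!ELEMENT Order (Item, Quantity)>
  <!ELEMENT Item (#PCDATA)>
  <!ELEMENT Quantity (#PCDATA)>
]>
"""

# Body of Listing 5-1; {extra} is where the <Type> element of Listing 5-2 is injected.
BODY = """<Process>
  <Name>Bill Evjen</Name>
  <Address>123 Main Street</Address>
  <City>Saint Charles</City>
  <State>Missouri</State>
  <Country>USA</Country>
  <Order>
    <Item>52-inch Plasma</Item>
    <Quantity>1</Quantity>{extra}
  </Order>
</Process>
"""

VALID_DOC = '<?xml version="1.0"?>\n' + DTD + BODY.format(extra="")
INVALID_DOC = '<?xml version="1.0"?>\n' + DTD + BODY.format(extra="\n    <Type>New</Type>")


def check_against_internal_dtd(xml_text):
    """Return a list of problems; an empty list means none were detected.

    Checks only: undeclared elements, flat sequences such as (Item, Quantity)
    with no ?, * or + quantifiers, and text-only (#PCDATA) elements.
    """
    declared = {}  # element name -> expat content model (type, quant, name, children)
    stack = []     # open elements: (name, names of child elements seen so far)
    problems = []

    def on_element_decl(name, content_model):
        declared[name] = content_model

    def on_start(name, attrs):
        if name not in declared:
            problems.append(f"<{name}> is not declared in the DTD")
        if stack:
            stack[-1][1].append(name)  # record this element as a child of its parent
        stack.append((name, []))

    def on_end(name):
        _, children = stack.pop()
        ctype, _quant, _name, parts = declared.get(name, (None, None, None, ()))
        if ctype == model.XML_CTYPE_SEQ:
            expected = [part[2] for part in parts]  # names in declared order
            if children != expected:
                problems.append(f"<{name}> contains {children}, expected {expected}")
        elif ctype == model.XML_CTYPE_MIXED and not parts and children:
            problems.append(f"<{name}> may contain only text, found {children}")

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return problems


print(check_against_internal_dtd(VALID_DOC))
print(check_against_internal_dtd(INVALID_DOC))
```

Run against the Listing 5-1 body, the helper finds nothing to report; against Listing 5-2, it reports two problems: the undeclared <Type> element and the unexpected content of <Order>. A real validating parser catches the same kinds of violations while covering the full DTD language.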

The most common form of XML validation is done using XML Schemas. Why, then, would you ever want to learn about any other validation process? You should learn about the DTD format because it was the first method for validating the structure of XML documents, and it was used for quite some time. Although it has limitations, you may still encounter XML applications that depend on this type of validation. If you do encounter a DTD, you want to understand how to deal with it.

DTDs came from the SGML world. They were a good choice for defining XML documents because many SGML users had already used them to define their documents. Using DTDs in the new world of XML made the migration from SGML to XML that much easier.

DTDs, however, were not the best option for defining document structure. One problem was that they were difficult to learn. DTDs are not written using XML; their syntax is quite different, which means that an XML developer has to learn two syntaxes when working with XML documents. Another major difficulty is that this form of XML validation doesn't support namespaces, a feature that is extremely important in XML.

Even though the DTD format is not ideal, you will often see it used. In fact, many of the HTML documents that you deal with today use some form of DTD to define the permissible structure of the HTML document.

For instance, if you create a new HTML document in Microsoft's Visual Studio, you get the results presented in Listing 5-3.

Listing 5-3: A basic HTML file using a DTD to define its structure

      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml" >
      <head>
          <title>Untitled Page</title>
      </head>
      <body>
      </body>
      </html>

At the top of the HTML document, you can see a <!DOCTYPE> declaration on the first line (it is a declaration, not an element). Its URL, http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, shows where the DTD for this HTML document is located.
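The <!DOCTYPE> declaration in Listing 5-3 carries three pieces of information: the name of the root element (html), a public identifier (-//W3C//DTD XHTML 1.0 Transitional//EN), and a system identifier, which is the URL of the DTD file. As a minimal sketch, Python's standard-library expat parser reports all three through its StartDoctypeDeclHandler callback; read_doctype and XHTML_PAGE are hypothetical names used only for this example. The parser here does not download the DTD at that URL; it only reports what the declaration says.

```python
import xml.parsers.expat

# Listing 5-3, with the head and body compacted onto fewer lines.
XHTML_PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Untitled Page</title></head>
<body></body>
</html>
"""


def read_doctype(xml_text):
    """Return (root element name, public identifier, system identifier), or None."""
    found = {}

    # expat passes: doctype name, system id, public id, has-internal-subset flag
    def on_doctype(doctype_name, system_id, public_id, has_internal_subset):
        found["decl"] = (doctype_name, public_id, system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # reports the declaration; the external DTD is not fetched
    return found.get("decl")


print(read_doctype(XHTML_PAGE))
```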

If you pull up this particular DTD document, xhtml1-transitional.dtd, you find that it is quite large. Listing 5-4 shows an excerpt from this file covering the definitions of the heading elements (h1 through h6).

Listing 5-4: The Headings defined within the xhtml1-transitional.dtd document

      <!--=================== Headings =========================================-->

      <!--
        There are six levels of headings from h1 (the most important)
        to h6 (the least important).
      -->

      <!ELEMENT h1  %Inline;>
      <!ATTLIST h1
        %attrs;
        %TextAlign;
        >

      <!ELEMENT h2 %Inline;>
      <!ATTLIST h2
        %attrs;
        %TextAlign;
        >

      <!ELEMENT h3 %Inline;>
      <!ATTLIST h3
        %attrs;
        %TextAlign;
        >

      <!ELEMENT h4 %Inline;>
      <!ATTLIST h4
        %attrs;
        %TextAlign;
        >

      <!ELEMENT h5 %Inline;>
      <!ATTLIST h5
        %attrs;
        %TextAlign;
        >

      <!ELEMENT h6 %Inline;>
      <!ATTLIST h6
        %attrs;
        %TextAlign;
        >

This is just a small excerpt from the xhtml1-transitional.dtd file. In the HTML world (and the XHTML world), you can use a number of different DTDs to define the structure of your HTML document. The following list includes some of the DTDs available for HTML and XHTML.

  • HTML 4.01 Strict: http://www.w3.org/TR/html401/strict.dtd

  • HTML 4.01 Transitional: http://www.w3.org/TR/html401/loose.dtd

  • HTML 4.01 Frameset: http://www.w3.org/TR/html401/frameset.dtd

  • XHTML 1.0 Strict: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

  • XHTML 1.0 Transitional: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

  • XHTML 1.0 Frameset: http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd




Professional XML
Professional XML (Programmer to Programmer)
ISBN: 0471777773
EAN: 2147483647
Year: 2004
Pages: 215
