Item 8. Modularize DTDs | Effective XML: 50 Specific Ways to Improve Your XML

Large, monolithic DTDs are as hard to read and understand as large, monolithic programs. While DTDs are rarely as large as even medium-sized programs, they can nonetheless benefit from being divided into separate modules, one to a file. Furthermore, this allows you to combine different DTDs in a single application. For example, modularization allows you to include your own custom vocabularies in XHTML documents.

Modularization divides a DTD into multiple, somewhat independent units of functionality that can be mixed and matched to suit. A master DTD integrates all the parts into a single driver application. Parameterization is based on internal parameter entities. Modularization is based on external parameter entities. However, the techniques are much the same. By redefining various entities, you choose which modules to include where.
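The difference between the two kinds of parameter entity can be sketched in a few lines. The names below anticipate the examples later in this item; treat this as an illustration of the syntax rather than a complete DTD.

```xml
<!-- Internal parameter entity: the replacement text is given inline.
     This is the basis of parameterization. -->
<!ENTITY % Amount.content " #PCDATA ">

<!-- External parameter entity: the replacement text is read from
     another file. This is the basis of modularization. -->
<!ENTITY % stmnt-transaction.mod SYSTEM "stmnt-transaction.mod">

<!-- Referencing the external entity pulls that file's declarations
     into the DTD at this point. -->
%stmnt-transaction.mod;
```

In both cases the first declaration of an entity name is the one that counts, which is what makes overriding possible.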

I'll demonstrate with another variation of the hypothetical MegaBank statement application used in earlier items. This time we'll look at a statement that's composed of several more or less independent parts. It begins with a batch of information about the bank branch where the account is held, then the list of transactions, and finally a block of legal fine print covering things like the bank's privacy policy and where to write if there's a problem with the statement. This last part is written in XHTML. Example 8-1 is a complete bank statement that demonstrates all the relevant parts.

Example 8-1 A Full Bank Statement

<?xml version="1.0"?>
<!DOCTYPE Statement PUBLIC "-//MegaBank//DTD Statement//EN"
                           "statement.dtd">
<Statement xmlns="http://namespaces.megabank.com/">
  <Bank>
    <Logo href="logo.jpg" height="125" width="125"/>
    <Name>MegaBank</Name>
    <Motto>We Really Pretend to Care</Motto>
    <Branch>
      <Address>
        <Street>666 Fifth Ave.</Street>
        <City>New York</City>
        <State>NY</State>
        <PostalCode>10010</PostalCode>
        <Country>USA</Country>
      </Address>
    </Branch>
  </Bank>
  <Account>
    <Number>00003145298</Number>
    <Type>Savings</Type>
    <Owner>John Doe</Owner>
    <Address>
      <Street>123 Peon Way</Street>
      <Apt>28Z</Apt>
      <City>Brooklyn</City>
      <State>NY</State>
      <PostalCode>11239</PostalCode>
      <Country>USA</Country>
    </Address>
  </Account>
  <Date>2003-30-02</Date>
  <AccountActivity>
    <OpeningBalance>5266.34</OpeningBalance>
    <Transaction type="deposit">
      <Date>2003-02-07</Date>
      <Amount>300.00</Amount>
    </Transaction>
    <Transaction type="transfer">
      <Account>
        <Number>0000271828</Number>
        <Type>Checking</Type>
        <Owner>John Doe</Owner>
      </Account>
      <Date>2003-02-07</Date>
      <Amount>200.00</Amount>
    </Transaction>
    <Transaction type="deposit">
      <Date>2003-02-15</Date>
      <Amount>512.32</Amount>
    </Transaction>
    <Transaction type="withdrawal">
      <Date>2003-02-15</Date>
      <Amount>200.00</Amount>
    </Transaction>
    <Transaction type="withdrawal">
      <Date>2003-02-25</Date>
      <Amount>200.00</Amount>
    </Transaction>
    <ClosingBalance>5488.66</ClosingBalance>
  </AccountActivity>
  <Legal>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <body style="font-size: xx-small">
        <h1>Important Information About This Statement</h1>
        <p>
          In the event of an error in this statement,
          please submit complete details in triplicate to:
        </p>
        <p>
          MegaBank Claims Department
          3 Friday's Road
          Adamstown
          Pitcairn Island
        </p>
        <p>
          We'll get back to you in four to six weeks.
          (We're not sure which four to six weeks, but we do
          know it's bound to be some period of four to six
          weeks, sometime.)
        </p>
        <h2>Privacy Notice</h2>
        <p>
          You have none. Get used to it. We will sell your
          information to the highest bidder, the lowest
          bidder, and everyone in between. We'll give it
          away to anybody who can afford a self-addressed
          stamped envelope. We'll even trade it for a
          lifetime supply of Skinny Dip Thigh Cream&trade;.
          It's not like it costs us anything.
        </p>
      </body>
    </html>
  </Legal>
</Statement>

As usual, a real-world document would be considerably more complex and contain a lot more data, but this is enough to present the basic ideas.

Leaving aside comments, a modular DTD normally starts in a driver module. This is a DTD fragment that declares a couple of crucial entities, such as those defining the namespace URI and prefix, and then loads the other modules. For example, in the bank statement application, a driver DTD might look something like Example 8-2.

Example 8-2 The Statement DTD Driver Module

<!ENTITY % NS.prefixed "IGNORE" >
<!ENTITY % stmt.prefix "" >

<!ENTITY % stmnt-qnames.mod SYSTEM "stmnt-qnames.mod" >

<!ENTITY % stmnt-framework.mod SYSTEM "stmnt-framework.mod" >
%stmnt-framework.mod;

<!ENTITY % stmnt-structure.mod SYSTEM "stmnt-structure.mod" >
%stmnt-structure.mod;

<!ENTITY % stmnt-address.mod SYSTEM "stmnt-address.mod" >
%stmnt-address.mod;

<!ENTITY % stmnt-branch.mod SYSTEM "stmnt-branch.mod" >
%stmnt-branch.mod;

<!ENTITY % stmnt-transaction.mod SYSTEM "stmnt-transaction.mod" >
%stmnt-transaction.mod;

<!ENTITY % stmnt-legal.mod SYSTEM "stmnt-legal.mod" >
%stmnt-legal.mod;

Since all the modules are loaded via entity references, this allows local bank branches and subsidiaries to substitute their own modules by redefining those entities or by writing a different driver.
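As a rough sketch of such a substitution, a branch could redeclare one of the module entities in the internal DTD subset of its documents. The internal subset is read before the external driver, and the first declaration of an entity wins, so the driver's own declaration of the same name is ignored. The file name brooklyn-legal.mod is invented for this illustration, and the sketch assumes the driver of Example 8-2 is what statement.dtd contains.

```xml
<?xml version="1.0"?>
<!DOCTYPE Statement PUBLIC "-//MegaBank//DTD Statement//EN"
                           "statement.dtd" [
  <!-- Hypothetical local replacement for the legal module. Because
       this declaration is seen first, the driver's later declaration
       of stmnt-legal.mod has no effect. -->
  <!ENTITY % stmnt-legal.mod SYSTEM "brooklyn-legal.mod">
]>
<Statement xmlns="http://namespaces.megabank.com/">
  <!-- statement content as in Example 8-1 -->
</Statement>
```

The replacement module would still need to declare whatever elements the rest of the DTD expects from the legal module.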

In this case, the various parts have been loaded in the following order.

1. Specify whether namespace prefixes are used. (Here they aren't, by default.)
2. Define the location of the qualified names module. This will be loaded by the framework.
3. Define and load the framework, which is responsible for loading common DTD parts that cross module boundaries, such as the qualified names module, the entities module, and any common content models or attribute definitions shared by multiple elements.
4. Define and load the structure module, which is responsible for merging the different modules into a single DTD for a complete document.
5. Define the separate modules that comprise different, somewhat independent parts of the application. Here there are four:
   a. The address module
   b. The branch module
   c. The transaction module
   d. The legal module

Conditional sections can control which modules are or are not included. This makes the DTD a little harder to read, which is why I didn't show it this way in the first place, but conditional sections are essential for customizability. Example 8-3 demonstrates.

Example 8-3 The Conditionalized Statement DTD Driver Module

<!ENTITY % NS.prefixed "IGNORE" >
<!ENTITY % stmt.prefix "" >

<!-- Address Module   -->
<!ENTITY % stmnt-address.module "INCLUDE" >
<![%stmnt-address.module;[
<!ENTITY % stmnt-address.mod
     SYSTEM "stmnt-address.mod" >
%stmnt-address.mod;]]>

<!ENTITY % stmnt-branch.module "INCLUDE" >
<![%stmnt-branch.module;[
<!ENTITY % stmnt-branch.mod
     SYSTEM "stmnt-branch.mod" >
%stmnt-branch.mod;]]>

<!ENTITY % stmnt-qnames.mod
     SYSTEM "stmnt-qnames.mod" >

<!ENTITY % stmnt-framework.mod SYSTEM "stmnt-framework.mod" >
%stmnt-framework.mod;

<!ENTITY % stmnt-transaction.module "INCLUDE" >
<![%stmnt-transaction.module;[
<!ENTITY % stmnt-transaction.mod
     SYSTEM "stmnt-transaction.mod" >
%stmnt-transaction.mod;]]>

<!ENTITY % stmnt-legal.module "INCLUDE" >
<![%stmnt-legal.module;[
<!ENTITY % stmnt-legal.mod SYSTEM "stmnt-legal.mod" >
%stmnt-legal.mod;]]>

Not all pieces can be turned on or off. Generally, the framework and the qualified names modules are required and thus are not wrapped in conditional sections.
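To show how one of these switches might be used, here is a minimal sketch of a wrapper DTD that turns the legal module off and then loads the standard driver. The wrapper's file name and the entity name statement.dtd are hypothetical. The sketch also assumes that the conditionalized driver of Example 8-3 is stored as statement.dtd.

```xml
<!-- subsidiary-statement.dtd (hypothetical wrapper driver) -->

<!-- Declared before the standard driver is read, so this value wins
     over the driver's own "INCLUDE" declaration and the conditional
     section around the legal module is skipped. -->
<!ENTITY % stmnt-legal.module "IGNORE" >

<!-- Load the standard driver unchanged. -->
<!ENTITY % statement.dtd SYSTEM "statement.dtd" >
%statement.dtd;
```

Switching a module off removes only its declarations. Any content model that still mentions the module's elements, such as one that requires a Legal child, would have to be adjusted separately.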

Now let's take a look at what you might find inside the individual modules. The qualified names module (Example 8-4), which has not yet been loaded, only referenced, defines the names of elements that will be used in different content models using the parameterization techniques shown in Item 7.

Example 8-4 The Qualified Names Module

<!ENTITY % Statement.qname
     "%statement.prefix;%statement.colon;Statement">
<!ENTITY % Bank.qname
     "%statement.prefix;%statement.colon;Bank">
<!ENTITY % Date.qname
     "%statement.prefix;%statement.colon;Date">
<!ENTITY % Transaction.qname
     "%statement.prefix;%statement.colon;Transaction">
<!ENTITY % Amount.qname
     "%statement.prefix;%statement.colon;Amount">
<!ENTITY % Account.qname
     "%statement.prefix;%statement.colon;Account">
<!ENTITY % Number.qname
     "%statement.prefix;%statement.colon;Number">
<!ENTITY % Owner.qname
     "%statement.prefix;%statement.colon;Owner">
<!ENTITY % Type.qname
     "%statement.prefix;%statement.colon;Type">
<!ENTITY % OpeningBalance.qname
     "%statement.prefix;%statement.colon;OpeningBalance">
<!ENTITY % ClosingBalance.qname
     "%statement.prefix;%statement.colon;ClosingBalance">
<!ENTITY % Logo.qname
     "%statement.prefix;%statement.colon;Logo">
<!ENTITY % Name.qname
     "%statement.prefix;%statement.colon;Name">
<!ENTITY % Motto.qname
     "%statement.prefix;%statement.colon;Motto">
<!ENTITY % Branch.qname
     "%statement.prefix;%statement.colon;Branch">
<!ENTITY % Address.qname
     "%statement.prefix;%statement.colon;Address">
<!ENTITY % Street.qname
     "%statement.prefix;%statement.colon;Street">
<!ENTITY % Apt.qname
     "%statement.prefix;%statement.colon;Apt">
<!ENTITY % City.qname
     "%statement.prefix;%statement.colon;City">
<!ENTITY % State.qname
     "%statement.prefix;%statement.colon;State">
<!ENTITY % PostalCode.qname
     "%statement.prefix;%statement.colon;PostalCode">
<!ENTITY % Country.qname
     "%statement.prefix;%statement.colon;Country">
<!ENTITY % AccountActivity.qname
     "%statement.prefix;%statement.colon;AccountActivity">
<!ENTITY % Legal.qname
     "%statement.prefix;%statement.colon;Legal">
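
To see what these declarations make possible, consider a document that switches prefixes on from its internal DTD subset. This is only a sketch. It assumes that statement.prefix and statement.colon are the entities the qualified names module consults, as the listing above suggests, and it sets them directly rather than going through the driver's NS.prefixed switch, which none of the modules shown here consult. The prefix mb is made up for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE mb:Statement PUBLIC "-//MegaBank//DTD Statement//EN"
                              "statement.dtd" [
  <!-- The internal subset is read before statement.dtd, so these
       declarations take precedence over any later ones. -->
  <!ENTITY % statement.prefix "mb">
  <!ENTITY % statement.colon  ":">
]>
<mb:Statement xmlns:mb="http://namespaces.megabank.com/">
  <!-- mb:Bank, mb:Account, mb:Date, mb:AccountActivity,
       and mb:Legal children go here -->
</mb:Statement>
```

With these two values in place, %Statement.qname; expands to mb:Statement, %Bank.qname; to mb:Bank, and so on. With both entities empty, the same declarations yield the unprefixed names used in Example 8-1.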

The framework module shown in Example 8-5 actually loads the qualified names module. In addition, it loads any general modules used across the DTD, such as those for defining common attributes or character entities. In other words, the framework module defines those modules that cross other module boundaries.

Example 8-5 The Framework Module

<!ENTITY % stmnt-qnames.mod SYSTEM "stmnt-qnames.mod" >
%stmnt-qnames.mod;

<!ENTITY % stmnt-attribs.module "INCLUDE" >
<![%stmnt-attribs.module;[
<!ENTITY % stmnt-attribs.mod SYSTEM "stmnt-attribs.mod" >
%stmnt-attribs.mod;
]]>

<!ENTITY % stmnt-model.module "INCLUDE" >
<![%stmnt-model.module;[
%stmnt-model.mod;
]]>

<!ENTITY % stmnt-charent.module "INCLUDE" >
<![%stmnt-charent.module;[
<!ENTITY % stmnt-charent.mod SYSTEM "stmnt-charent.mod" >
%stmnt-charent.mod;
]]>

The structure module (Example 8-6) defines the root element. It provides the overall architecture of a statement document, that is, a Statement element that contains a variety of child elements mostly drawn from other modules.

Example 8-6 The Structure Module

<!ELEMENT %Statement.qname; (
  %Bank.qname;,
  %Account.qname;,
  %Date.qname;,
  %AccountActivity.qname;,
  %Legal.qname;
)>

<!ENTITY % NamespaceDeclaration
    "xmlns%statement.colon;%statement.prefix;">

<!ATTLIST %Statement.qname; %NamespaceDeclaration;
              CDATA #FIXED "http://namespaces.megabank.com/">

The final step is to define the individual modules for the different classes of content. Example 8-7 demonstrates the transaction module. It's very similar to the parameterized version of the DTD shown in Item 7.

Example 8-7 The Transaction Module

<!ELEMENT %AccountActivity.qname; (
    %OpeningBalance.qname;,
    (%Transaction.qname;)*,
    %ClosingBalance.qname;
)>

<!ENTITY  % OpeningBalance.content " #PCDATA ">
<!ELEMENT %OpeningBalance.qname; (%OpeningBalance.content;)>

<!ENTITY  % ClosingBalance.content " #PCDATA ">
<!ELEMENT %ClosingBalance.qname; (%ClosingBalance.content;)>

<!ENTITY  % Amount.content " #PCDATA ">
<!ELEMENT %Amount.qname; (%Amount.content;)>

<!ENTITY  % Date.content " #PCDATA ">
<!ELEMENT %Date.qname; (%Date.content;)>

<!ENTITY  % TransactionContent
    "%Account.qname;?, %Date.qname;, %Amount.qname;">
<!ELEMENT %Transaction.qname; ( %TransactionContent; )>

<!ENTITY % TypeAtt    "type">
<!ENTITY % type.extra "">
<!ATTLIST %Transaction.qname; %TypeAtt;
  (withdrawal | deposit | transfer %type.extra; )
  #REQUIRED
>

Notice how almost everything has been parameterized with entity references so that almost any piece can be changed independently of the other pieces. The other modules are similar. In general, the dependencies are limited and unidirectional. The modules depend on the framework and the document model, but not vice versa. This allows you to add and remove modules by adjusting the document model and the framework alone. You can change the individual parts of a module by redefining the various entities. Together with parameterization, this makes the DTD extremely flexible. Not all DTDs require this level of customizability, but for those that do, modularization is extremely powerful.
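
As one small illustration of redefining an entity, a document could extend the list of transaction types through the type.extra hook in the transaction module. The sketch below assumes that the enumeration in Example 8-7 separates its alternatives with vertical bars, as DTD syntax requires. The fee type is invented for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE Statement PUBLIC "-//MegaBank//DTD Statement//EN"
                           "statement.dtd" [
  <!-- Hypothetical extension. This declaration is read before the
       transaction module's empty default, so the type attribute's
       enumeration becomes withdrawal, deposit, transfer, or fee. -->
  <!ENTITY % type.extra "| fee">
]>
<Statement xmlns="http://namespaces.megabank.com/">
  <!-- statement content as in Example 8-1, where a Transaction
       element may now carry type="fee" -->
</Statement>
```

The leading vertical bar belongs in the entity value because the module's default value for type.extra is empty. That lets the enumeration stay well formed whether or not anything is added.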