Item 7. Parameterize DTDs | Effective XML: 50 Specific Ways to Improve Your XML

No one XML application can serve all uses. No one DTD can describe every necessary document. As obvious as this statement seems, there have been a number of failed efforts to develop DTDs that describe all possible documents in a given field, even fields as large as all business documents. A much more sensible approach is to design DTDs so they can be customized for different local environments. Elements and attributes can be added to or removed from particular systems. Names can be translated into the local language. Even content models can be adjusted to suit the local needs.

You cannot override attribute lists or element declarations in a DTD. However, you can override entity definitions, and this is the key to making DTDs extensible. If the attribute lists and element declarations are defined by reference to parameter entities, redefining the parameter entity effectively changes all the element declarations and attribute lists based on that parameter entity. When writing a parameterized DTD, almost everything of interest is defined as a parameter entity reference, including:

Element names
Attribute names
Element content models
Attribute types
Attribute lists
Namespace URIs
Namespace prefixes

The extra level of indirection allows almost any aspect of the DTD to be changed. For example, consider a simple XML application that describes bank statements. A traditional monolithic DTD for this application might look like the listing below.

    <!ELEMENT Statement (Bank, Account, Date, OpeningBalance,
                         Transaction*, ClosingBalance)>
    <!ATTLIST Statement
        xmlns CDATA #FIXED "http://namespaces.megabank.com/">
    <!ELEMENT Account (Number, Type, Owner)>
    <!ELEMENT Number         (#PCDATA)>
    <!ELEMENT Type           (#PCDATA)>
    <!ELEMENT Owner          (#PCDATA)>
    <!ELEMENT OpeningBalance (#PCDATA)>
    <!ELEMENT ClosingBalance (#PCDATA)>
    <!ELEMENT Bank           (#PCDATA)>
    <!ELEMENT Date           (#PCDATA)>
    <!ELEMENT Amount         (#PCDATA)>
    <!ELEMENT Transaction (Account?, Date, Amount)>
    <!ATTLIST Transaction
        type (withdrawal | deposit | transfer) #REQUIRED>

(I've stripped out most of the comments to save space. You can see them in Item 5 though. Of course, this example is a lot simpler than any real bank statement application would be.)

We can parameterize the DTD by defining each of the element and attribute names, content models, and types as parameter entity references. For example, the Number element could be defined like this:

    <!ENTITY % Number "Number">
    <!ELEMENT %Number; (#PCDATA)>

However, you normally shouldn't parameterize just the name. You should also parameterize the content model.

    <!ENTITY % Number.content " #PCDATA ">
    <!ELEMENT %Number; (%Number.content;)>

This is longer and less clear, but it is much more extensible. For example, suppose that in a particular special document you want to change the content model from simple PCDATA to a branch code followed by a customer code like this:

    <Number>
      <BranchCode>00003</BranchCode>
      <CustomerCode>145298</CustomerCode>
    </Number>

You can do this in the internal DTD subset by overriding the definition of the Number.content parameter entity.

    <!DOCTYPE Statement PUBLIC "-//MegaBank//DTD Statement//EN"
                               "modular_statement.dtd" [
      <!ENTITY % Number.content " BranchCode, CustomerCode ">
      <!ELEMENT BranchCode   (#PCDATA)>
      <!ELEMENT CustomerCode (#PCDATA)>
    ]>

Of course you can also do this in other DTDs that import the original DTD as well as in the internal DTD subset. You just need to be careful that the new entity definitions appear before the original entity definitions. (Declarations in the internal DTD subset are considered to come before everything in the external DTD subset.)
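
As a rough sketch of the importing approach, a customizing DTD could declare its overrides first and only then pull in the original DTD through an external parameter entity. The file name extended_statement.dtd and the entity name statement are assumptions made for this illustration; only modular_statement.dtd and Number.content come from the example above.

```xml
<!-- extended_statement.dtd (hypothetical file name) -->

<!-- 1. Override first. This definition is read first, so it wins. -->
<!ENTITY % Number.content " BranchCode, CustomerCode ">
<!ELEMENT BranchCode   (#PCDATA)>
<!ELEMENT CustomerCode (#PCDATA)>

<!-- 2. Then import the original DTD. Its own declaration of
        Number.content is now a redefinition and is ignored. -->
<!ENTITY % statement SYSTEM "modular_statement.dtd">
%statement;
```

If the import came before the override, the original definition of Number.content would already be bound and the new one would have no effect.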

The same principles apply for elements such as Transaction with more complicated content models. Every element name is replaced by a parameter entity reference that points to the name. Every content model is replaced by a parameter entity reference that points to the content model. For example, the complete parameterized declaration of Transaction might look like this:

    <!ENTITY % AccountElement     "Account">
    <!ENTITY % DateElement        "Date">
    <!ENTITY % AmountElement      "Amount">
    <!ENTITY % TransactionElement "Transaction">
    <!ENTITY % TransactionContent
                "%AccountElement;?, %DateElement;, %AmountElement;">
    <!ELEMENT %TransactionElement; ( %TransactionContent; )>
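
The attribute list of Transaction can be parameterized in the same spirit. What follows is only a sketch: the entity names TypeAttribute, TypeAttribute.type, and Transaction.attlist are assumptions chosen for illustration. Only the type attribute and its three values come from the monolithic DTD.

```xml
<!-- Hypothetical entity names. TransactionElement is the entity
     declared in the listing above. -->
<!ENTITY % TypeAttribute      "type">
<!ENTITY % TypeAttribute.type "(withdrawal | deposit | transfer)">
<!ENTITY % Transaction.attlist
  "%TypeAttribute; %TypeAttribute.type; #REQUIRED">

<!-- Expands to:
     <!ATTLIST Transaction type (withdrawal | deposit | transfer) #REQUIRED> -->
<!ATTLIST %TransactionElement; %Transaction.attlist;>
```

With this arrangement, a customization can rename the attribute, widen the list of allowed values, or replace the whole attribute list, depending on which entity it redefines.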

One of the most common adjustments to content models is adding an extra child element. While this can be done by redefining the complete content entity, it's even better to prepare for this in advance by defining an empty extra entity, as shown below.

    <!ENTITY % Transaction.extra "">
    <!ENTITY % TransactionContent
      "%AccountElement;?, %DateElement;, %AmountElement;
       %Transaction.extra;">

By default this adds nothing to the content model. However, redefining Transaction.extra allows new elements to be added.

    <!ELEMENT ApprovedBy (#PCDATA)>
    <!ENTITY % Transaction.extra ", ApprovedBy">
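
Put together, a document might opt in to the extra child from its internal DTD subset, roughly as follows. This sketch assumes that modular_statement.dtd contains the Transaction.extra and TransactionContent declarations shown above. The comment shows the content model a parser would then effectively see.

```xml
<!DOCTYPE Statement PUBLIC "-//MegaBank//DTD Statement//EN"
                           "modular_statement.dtd" [
  <!ELEMENT ApprovedBy (#PCDATA)>
  <!ENTITY % Transaction.extra ", ApprovedBy">
  <!-- Effective declaration once the external subset is read:
       <!ELEMENT Transaction (Account?, Date, Amount, ApprovedBy)> -->
]>
```

Note that this makes ApprovedBy a required child. Writing ", ApprovedBy?" instead would keep existing Transaction elements valid.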

This works even better with choices than with sequences thanks to the insignificance of order. With a sequence we can add the new elements only at the end. With a choice, the new elements can appear anywhere.
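
To make the contrast concrete, here is a sketch of the same trick applied to a repeated choice. The Entries, Deposit, Withdrawal, and Fee elements and the Entries.extra and Entries.content entities are all hypothetical. They are not part of the bank statement DTD and serve only to illustrate the pattern.

```xml
<!-- Customization, read first (for example, in the internal DTD subset). -->
<!ELEMENT Fee (#PCDATA)>
<!ENTITY % Entries.extra "| Fee">

<!-- Base DTD, read later. Its empty default for Entries.extra is ignored
     because the entity is already defined. -->
<!ENTITY % Entries.extra "">
<!ENTITY % Entries.content "Deposit | Withdrawal %Entries.extra;">
<!ELEMENT Deposit    (#PCDATA)>
<!ELEMENT Withdrawal (#PCDATA)>
<!ELEMENT Entries (%Entries.content;)*>

<!-- Effective model: (Deposit | Withdrawal | Fee)* -->
```

Because the choice repeats, a Fee element may appear before, between, or after the Deposit and Withdrawal elements, even though its name was appended at the end of the group.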