3.7 Parameter Entities | XML in a Nutshell, Third Edition

It is not uncommon for multiple elements to share all or part of the same attribute lists and content specifications. For instance, any element that's a simple XLink will have xlink:type and xlink:href attributes, and perhaps xlink:show and xlink:actuate attributes. In XHTML, a th element and a td element contain more or less the same content. Repeating the same content specifications or attribute lists in multiple element declarations is tedious and error-prone. It's entirely possible to add a newly defined child element to the declaration of some of the elements but forget to include it in others.

For example, consider an XML application for residential real estate listings that provides separate elements for apartments, sublets, coops for sale, condos for sale, and houses for sale. The element declarations might look like this:

 <!ELEMENT apartment (address, footage, rooms, baths, rent)>
 <!ELEMENT sublet    (address, footage, rooms, baths, rent)>
 <!ELEMENT coop      (address, footage, rooms, baths, price)>
 <!ELEMENT condo     (address, footage, rooms, baths, price)>
 <!ELEMENT house     (address, footage, rooms, baths, price)>

There's a lot of overlap between the declarations, i.e., a lot of repeated text. And if you later decide you need to add an additional element, available_date for instance, then you need to add it to all five declarations. It would be preferable to define a constant that can hold the common parts of the content specification for all five kinds of listings and refer to that constant from inside the content specification of each element. Then to add or delete something from all the listings, you'd only need to change the definition of the constant.

An entity reference is the obvious candidate here. However, general entity references are not allowed to provide replacement text for a content specification or attribute list, only for parts of the DTD that will be included in the XML document itself. Instead, XML provides a new construct exclusively for use inside DTDs, the parameter entity, which is referred to by a parameter entity reference. Parameter entities behave and are declared almost exactly like general entities. However, they use a % instead of an &, and they can only be used in a DTD, while general entities can only be used in the document content.

3.7.1 Parameter Entity Syntax

A parameter entity is declared much like a general entity. However, an extra percent sign, set off by whitespace on both sides, is placed between the <!ENTITY and the name of the entity. For example:

 <!ENTITY % residential_content "address, footage, rooms, baths">
 <!ENTITY % rental_content      "rent">
 <!ENTITY % purchase_content    "price">

A parameter entity is referenced in the same way as a general entity, only with a percent sign instead of an ampersand:

 <!ELEMENT apartment (%residential_content;, %rental_content;)>
 <!ELEMENT sublet    (%residential_content;, %rental_content;)>
 <!ELEMENT coop      (%residential_content;, %purchase_content;)>
 <!ELEMENT condo     (%residential_content;, %purchase_content;)>
 <!ELEMENT house     (%residential_content;, %purchase_content;)>

When the parser reads these declarations, it substitutes the entity's replacement text for the entity reference. Now all you have to do to add an available_date element to the content specification of all five listing types is add it to the residential_content entity like this:

 <!ENTITY % residential_content "address, footage, rooms,
                                 baths, available_date">

The same technique works equally well for attribute types and element names. You'll see several examples of this in the next chapter on namespaces and in Chapter 9.
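As a rough illustration of the attribute case, the XLink attributes mentioned at the start of this section could be collected into one parameter entity and shared by several ATTLIST declarations. This is only a sketch of a fragment from an external DTD: the entity name simple_link_atts and the elements listing_link and agent_link are hypothetical and are not part of the real estate example above.

```xml
<!-- Hypothetical entity holding the attributes shared by every simple XLink -->
<!ENTITY % simple_link_atts
  "xmlns:xlink   CDATA     #FIXED 'http://www.w3.org/1999/xlink'
   xlink:type    (simple)  #FIXED 'simple'
   xlink:href    CDATA     #REQUIRED
   xlink:show    (new | replace | embed | other | none)  #IMPLIED
   xlink:actuate (onLoad | onRequest | other | none)     #IMPLIED">

<!-- Hypothetical link elements that reuse the same attribute list -->
<!ELEMENT listing_link (#PCDATA)>
<!ATTLIST listing_link %simple_link_atts;>

<!ELEMENT agent_link (#PCDATA)>
<!ATTLIST agent_link %simple_link_atts;>
```

Adding or changing a link attribute then means editing the entity once rather than touching each ATTLIST declaration.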

This trick is limited to external DTDs. Internal DTD subsets do not allow parameter entity references to be only part of a markup declaration. However, parameter entity references can be used in internal DTD subsets to insert one or more entire markup declarations, typically through external parameter entities.
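The difference can be sketched with a small document. The file name listings.dtd is a hypothetical stand-in for an external file that contains complete declarations for apartment and its children. The first reference is permitted because it stands where a whole declaration could appear. The commented-out declarations show the kind of use that is not permitted in the internal DTD subset.

```xml
<?xml version="1.0"?>
<!DOCTYPE apartment [
  <!-- Allowed: the reference inserts one or more entire declarations.
       listings.dtd is a hypothetical file name. -->
  <!ENTITY % listings SYSTEM "listings.dtd">
  %listings;

  <!-- Not allowed here: a reference that is only part of a declaration.
  <!ENTITY % rental_content "rent">
  <!ELEMENT apartment (address, %rental_content;)>
  -->
]>
<apartment>
  <!-- content as permitted by the declarations in listings.dtd -->
</apartment>
```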

3.7.2 Redefining Parameter Entities

What makes parameter entities particularly powerful is that they can be redefined. If a document uses both internal and external DTD subsets, then the internal DTD subset can specify new replacement text for the entities. If ELEMENT and ATTLIST declarations in the external DTD subset are written indirectly with parameter entity references, instead of directly with literal element names, the internal DTD subset can change the DTD for the document. For instance, a single document could add a bedrooms child element to the listings by redefining the residential_content entity like this:

 <!ENTITY % residential_content "address, footage, rooms,
                                 bedrooms, baths, available_date">

In the event of conflicting entity declarations, the first one encountered takes precedence. The parser reads the internal DTD subset first. Thus, the internal definition of the residential_content entity is used. When the parser reads the external DTD subset, every declaration that uses the residential_content entity will contain a bedrooms child element it wouldn't otherwise have.
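Put together, a complete document that overrides the entity might look roughly like the following sketch. It assumes the declarations shown earlier are stored in a hypothetical external file named listings.dtd, that each child element there is declared as (#PCDATA), and that nothing in that file declares bedrooms, which is why the internal DTD subset declares it. The element values are made up for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE apartment SYSTEM "listings.dtd" [
  <!-- Read before listings.dtd, so this declaration takes precedence -->
  <!ENTITY % residential_content "address, footage, rooms,
                                  bedrooms, baths, available_date">
  <!-- Assumed not to be declared in the hypothetical listings.dtd -->
  <!ELEMENT bedrooms (#PCDATA)>
]>
<apartment>
  <address>123 Example Street</address>
  <footage>850</footage>
  <rooms>4</rooms>
  <bedrooms>2</bedrooms>
  <baths>1</baths>
  <available_date>2025-01-01</available_date>
  <rent>2000</rent>
</apartment>
```

Note that the entity declaration itself is legal in the internal DTD subset because it is a complete declaration. Only the references inside ELEMENT declarations have to live in the external subset.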

Modular XHTML, which we'll discuss in Chapter 7, makes heavy use of this technique to allow particular documents to select only the subset of HTML that they actually need.

3.7.3 External DTD Subsets

Real-world DTDs can be quite complex. The SVG DTD is over 1,000 lines long. The XHTML 1.0 strict DTD (the smallest of the three XHTML DTDs) is more than 1,500 lines long. And these are only medium-sized DTDs. The DocBook XML DTD is over 11,000 lines long. It can be hard to work with, comprehend, and modify such a large DTD when it's stored in a single monolithic file.

Fortunately, DTDs can be broken up into independent pieces. For instance, the DocBook DTD is distributed in 28 separate pieces covering different parts of the spec: one for tables, one for notations, one for entity declarations, and so on. These different pieces are then combined at validation time using external parameter entity references.

An external parameter entity is declared using a normal ENTITY declaration with a % sign just like a normal parameter entity. However, rather than including the replacement text directly, the declaration contains the SYSTEM keyword, followed by a URL to the DTD piece it wants to include. For example, the following ENTITY declaration defines an external entity called "names" whose content is taken from the file at the relative URL names.dtd. Then the parameter entity reference %names; inserts the contents of that file into the current DTD.

 <!ENTITY % names SYSTEM "names.dtd">
 %names;

You can use either relative or absolute URLs. In most situations, relative URLs are more practical.
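The two forms can be sketched side by side. The tables entity and the example.com address below are hypothetical and serve only to show the shape of an absolute URL. A relative URL is resolved against the location of the DTD file that contains the declaration, which is one reason relative URLs tend to be more practical: a set of DTD pieces can be moved to another directory or server together without editing any of the declarations.

```xml
<!-- Relative URL: resolved against the location of the DTD
     that contains this declaration -->
<!ENTITY % names SYSTEM "names.dtd">
%names;

<!-- Absolute URL: hypothetical address, for illustration only -->
<!ENTITY % tables SYSTEM "http://www.example.com/dtds/tables.dtd">
%tables;
```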