Section 15.5. Entities Breaking up is easy to do | XML in Office 2003: Information Sharing with Desktop XML



15.5. Entities: Breaking up is easy to do

XML allows flexible organization of document text. The XML constructs that provide this flexibility are called entities. They allow a document to be broken up into multiple storage objects and are important tools for reusing and maintaining text. Entities are used in many publishing-oriented applications of XML but are much less common in machine-to-machine applications.

In simple cases, an entity is like an abbreviation in that it is used as a short form for some text. We call the "abbreviation" the entity name and the long form the entity content. That content could be as short as a character or as long as a chapter. For instance, in an XML document, the entity XSL could have the phrase "Extensible Style Language" as its content. Using a reference to that entity is like using "XSL" as an abbreviation for that phrase – the parser replaces the reference with the content.

You create the entity with an entity declaration. Example 15-12 is an entity declaration for an abbreviation.

Example 15-12. Entity used as an abbreviation

    <!ENTITY XSL "Extensible Style Language">

Like other markup declarations, entity declarations occur in the document type declaration section of the document prolog (Example 15-13).

Example 15-13. Entity declarations occur in the document type declaration

 <!DOCTYPE mydoc ...[
   <!ENTITY XSL "Extensible Style Language">
   ...other markup declarations ...
 ]>

Note

You can use entities with schemas. In that case your "DTD" would consist solely of declarations needed for the entities.

Entities can be much more than just abbreviations. Another way to think of an entity is as a box with a label. The label is the entity's name. The content of the box is some sort of text or data. The entity declaration creates the box and sticks on a label with the name. Sometimes the box holds XML text that is going to be parsed (interpreted according to the rules of the XML notation), and sometimes it holds data, which should not be.

15.5.1 Parsed entities

If the content of an entity is XML text that the parser should parse, the XML spec calls it a parsed entity.

If the content of an entity is data that is not to be parsed, the XML spec calls it an unparsed entity.
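The distinction can be observed from a program. The following is a minimal sketch in Python, assuming the standard library's expat binding (xml.parsers.expat). The notation gif and the unparsed entity logo are hypothetical additions for illustration and do not appear in this section's examples. The declaration handler receives a notation name only for unparsed entities, which is enough to tell the two kinds apart.

```python
import xml.parsers.expat

# "logo" and the "gif" notation are hypothetical; "XSL" is from Example 15-12.
DOCUMENT = b"""<!DOCTYPE mydoc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY XSL "Extensible Style Language">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<mydoc/>"""


def classify_entities(document):
    """Map each declared general entity name to "parsed" or "unparsed"."""
    kinds = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Only unparsed entities are declared with NDATA, so only they
        # arrive here with a notation name.
        kinds[name] = "unparsed" if notation is not None else "parsed"

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return kinds


print(classify_entities(DOCUMENT))
```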

The abbreviation in Example 15-12 is a parsed entity. Parsed entities, being XML text, can also contain markup. Example 15-14 is a declaration for a parsed entity with some markup in it.

Example 15-14. Parsed entity with markup

 <!ENTITY XSL "<title>Extensible Style Language</title>">

Because the entity content in the example is in the entity declaration, the entity is called an internal entity. Only parsed entities can be internal entities.
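As a rough illustration of what "parsed" means here, the sketch below wraps the declaration from Example 15-14 in a small document and reads it with Python's xml.etree.ElementTree. The mydoc element and the reference inside it are assumptions added so that there is something to parse. Because the replacement text is parsed, the markup in it becomes a real element in the resulting tree rather than literal characters.

```python
import xml.etree.ElementTree as ET

# The declaration is Example 15-14; the surrounding document is assumed.
DOCUMENT = """<!DOCTYPE mydoc [
  <!ENTITY XSL "<title>Extensible Style Language</title>">
]>
<mydoc>&XSL;</mydoc>"""

root = ET.fromstring(DOCUMENT)

# The reference was replaced by parsed markup, so mydoc now has a child.
child_tags = [child.tag for child in root]
title_text = root.find("title").text

print(child_tags, title_text)
```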

15.5.2 External entities

The parser can also fetch content from somewhere on the Web and put that into the box. This is an external entity. For instance, it could fetch a chapter of a book and put it into an entity. This would allow you to reuse the chapter between books. Another benefit is that you could edit the chapter separately with a sufficiently intelligent editor. This would be very useful if you were working on a team project and wanted different people to work on different parts of a document at once. Example 15-15 illustrates.

Example 15-15. External entity declaration

 <!ENTITY intro-chapter SYSTEM "http://www.megacorp.com/intro.xml">

Entities also allow you to edit very large documents without running out of memory. Depending on your software and needs, each volume, or even each article, in an encyclopedia could be an entity.
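How the content of an external entity is obtained is up to the software. The sketch below, in Python with xml.parsers.expat, shows one possible arrangement: the parser reports the system identifier from Example 15-15, and the program supplies the content. To keep the example self-contained, a dictionary named STORAGE stands in for the Web; it, the book element, and the chapter text are all hypothetical.

```python
import xml.parsers.expat

# Hypothetical stand-in for the Web: system identifier -> entity content.
STORAGE = {
    "http://www.megacorp.com/intro.xml":
        b"<chapter>Introductory chapter text.</chapter>",
}

# The declaration is Example 15-15; the book element is assumed.
DOCUMENT = b"""<!DOCTYPE book [
  <!ENTITY intro-chapter SYSTEM "http://www.megacorp.com/intro.xml">
]>
<book>&intro-chapter;</book>"""


def parse_with_external_entities(document, storage):
    """Return (element names seen, concatenated text), entities included."""
    names, text = [], []
    parser = xml.parsers.expat.ParserCreate()

    def attach_handlers(p):
        p.StartElementHandler = lambda name, attributes: names.append(name)
        p.CharacterDataHandler = text.append
        p.ExternalEntityRefHandler = on_external

    def on_external(context, base, system_id, public_id):
        # Parse the entity's content with a child parser so that it is
        # included at the point of reference. Handles one level only.
        child = parser.ExternalEntityParserCreate(context)
        attach_handlers(child)
        child.Parse(storage[system_id], True)
        return 1  # a nonzero value tells expat to continue

    attach_handlers(parser)
    parser.Parse(document, True)
    return names, "".join(text)


print(parse_with_external_entities(DOCUMENT, STORAGE))
```

A real application would replace the dictionary lookup with a file read or a network fetch, and would need to decide which locations it is willing to trust.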

15.5.3 Entity references

An author or DTD designer refers to an entity through an entity reference. The XML parser replaces the reference with the content, as if the reference were an abbreviation and the content were the expanded phrase. This process is called entity inclusion or entity replacement. After the operation we say either that the entity reference has been replaced by the entity content or that the entity content has been included.

Which phrasing you use depends on whether you are speaking from the point of view of the entity reference or of the entity content. The content of parsed entities is called their replacement text.^[4]

^[4] If you are a programmer, you might think of entities as macros and call the process entity expansion.

Example 15-16 shows a parsed entity declaration and its associated references.

Example 15-16. Entity declaration

 <!DOCTYPE MAGAZINE[
   ...
   <!ENTITY title "Hacker Life">
   ...
 ]>
 <MAGAZINE>
   <TITLE>&title;</TITLE>
   ...
   <P>Welcome to the introductory issue of &title;. &title; is geared to today's modern hacker.</P>
   ...
 </MAGAZINE>

Anywhere in the document instance that the entity reference "&title;" appears, it is replaced by the text "Hacker Life". It is just as valid to say that "Hacker Life" is included at each point where the reference occurs.

The ampersand character starts all general entity references and the semicolon ends them. The text between is an entity name.
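The replacement can be checked with any XML parser. Here is a minimal sketch using Python's xml.etree.ElementTree on the document from Example 15-16, with the elided parts simply left out. The application never sees the references, only the included content.

```python
import xml.etree.ElementTree as ET

# Example 15-16 with the elided ("...") portions omitted.
DOCUMENT = """<!DOCTYPE MAGAZINE[
  <!ENTITY title "Hacker Life">
]>
<MAGAZINE>
  <TITLE>&title;</TITLE>
  <P>Welcome to the introductory issue of &title;. &title; is geared to today's modern hacker.</P>
</MAGAZINE>"""

magazine = ET.fromstring(DOCUMENT)

# Each &title; reference has already been replaced by the parser.
title = magazine.find("TITLE").text
paragraph = magazine.find("P").text

print(title)
print(paragraph)
```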

15.5.4 How entities are used

Here are some examples of what you can do with entities:

You could store every chapter of a book in a separate file and link them together as entities.
You could "factor out" often-reused text, such as a product name, into an entity so that it is consistently spelled and displayed throughout the document.
You could update the product name entity to reflect a new version. The change would be instantly visible anywhere the entity was used.
You could create an entity that would represent "legal boilerplate" text (such as a software license) and reuse that entity in many different documents.
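The product name case can be sketched in a few lines of Python. The manual document, the product entity, and the names "WidgetPro 1.0" and "WidgetPro 2.0" are all hypothetical. The point is only that the name is written once, in the declaration, and every reference follows it.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual; {name} is filled in by str.format before parsing.
# The sketch assumes names without quotes, ampersands, or percent signs.
TEMPLATE = """<!DOCTYPE manual [
  <!ENTITY product "{name}">
]>
<manual>
  <title>&product; User Guide</title>
  <p>Thank you for choosing &product;.</p>
</manual>"""


def all_text(product_name):
    """Parse the manual with the given product name; return each element's text."""
    root = ET.fromstring(TEMPLATE.format(name=product_name))
    return [element.text for element in root]


print(all_text("WidgetPro 1.0"))
print(all_text("WidgetPro 2.0"))
```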

Note

We have explained only the basics about entities. For the full story, see the XML Recommendation or The XML Handbook.

