An instance of XML is a hierarchical set of entities. Entities constitute the physical structure of an XML document. Figure 3-7 shows the different types of entities other than the special document entity.

Figure 3-7. Entity types

XML uses entities to represent special characters, to represent repetitious text, and to include the content of other files. These storage units of XML documents contain particular parts of the document [Harold]. An entity contains either text or binary data, but not both. It may be a file, a database record, or a network resource. Entities also contain either parsed or unparsed data. Parsed data consists of characters, some of which form the character data in the document and some of which form markup. Every entity except the document entity has a unique name. The document entity consists of the XML declaration, the DTD, and the root element, as shown in Example 3-11.

Example 3-11 Document entity

    <?xml version="1.0"?>
    <!DOCTYPE memo SYSTEM "InternalMemo.dtd">
    <memo> M </memo>

Entities are reusable chunks of data, much like macros, and are part of XML's inheritance from SGML. XML classifies entities into two categories: general entities and parameter entities. Each of those can be either an internal entity or an external entity. General external entities can be parsed or unparsed (see Figure 3-7).

You must declare an entity before you can refer to it through an entity reference. Entity references are part of the logical structure of an XML document and are described later in this chapter. Figure 3-8 declares that the entity xml has the text contents "Extensible Markup Language". The processor reads the entity reference "&xml;" and replaces each instance with "Extensible Markup Language".

Figure 3-8. An entity declaration and entity reference

3.6.1 General and Parameter Entities

General entities can appear anywhere in text or markup and are mostly used to represent larger chunks of data.
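As a minimal sketch of the declaration and reference described for Figure 3-8, the following Python snippet declares the general entity xml in an internal DTD subset and lets the parser in the standard library's xml.etree.ElementTree module expand each "&xml;" reference. The memo text and the names DOCUMENT, expand, and is_well_formed are illustrative assumptions, not part of the original figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo: the entity "xml" is declared in the internal DTD
# subset and then referenced twice in the element content.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY xml "Extensible Markup Language">
]>
<memo>&xml; uses entities. &xml; is derived from SGML.</memo>"""


def expand(document):
    """Parse the document and return the root element's text.

    The parser replaces every reference to a declared internal
    general entity with its replacement text.
    """
    return ET.fromstring(document).text


def is_well_formed(document):
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(expand(DOCUMENT))
    # A reference to an entity that was never declared is rejected.
    print(is_well_formed("<memo>&xml;</memo>"))
```

The second call illustrates the rule that an entity must be declared before it is referenced: without the declaration, the parser reports an error instead of guessing at the replacement text.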
External general entities can reference other documents, such as images and video clips. To insert entities into a document, you use entity references. You declare general entities in the DTD with the markup shown in Example 3-12. Note that the entity name is the abbreviation for the replacement text.

Example 3-12 An entity declaration and entity reference

    <!ENTITY name "definition or replacement text">

Parameter entities appear only in the DTD. A reference to a parameter entity begins with a percent sign ("%") before the entity name instead of the ampersand ("&") that is used by general entities.

3.6.2 Internal and External Entities

Internal entities define the entity replacement text and are stored completely within the XML document. An internal entity is a parsed entity. That is, the processor will parse the entity's replacement text as part of the document in which a reference to it occurs.

In contrast, an external entity is identified by a system or public identifier. External entities acquire their content from another source located through a URI [RFC 2396]. The content of an external entity is not part of the current document. An external entity may or may not be parsed. Unparsed entities let you reference non-XML data, such as an image, from a document. An entity can be referenced in many places, and its content is logically inserted at each of them.

If an entity is unparsed, a "notation" must identify the type of document referenced by the entity. You must use the keyword NDATA to introduce any unparsed entity such as an image document, as follows:

    <!ENTITY bsf1 SYSTEM "http://pics.bigstickfarm.com/bsf1.gif" NDATA GIF>

Figure 3-9 gives an example of each type of entity: internal, external, parsed, and unparsed.

Figure 3-9. Internal, external, parsed, and unparsed entity declarations

3.6.3 Entity References

An entity reference refers to the content of a named entity.
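An unparsed entity such as bsf1 from Section 3.6.2 is not invoked with an "&" reference; a document names it in an attribute declared with type ENTITY, and the parser only reports the declaration to the application. The sketch below uses the standard library's xml.parsers.expat module to collect what the parser reports. The notation's system identifier "image/gif", the logo attribute, and the names DOC and collect_declarations are assumptions made for illustration; only the bsf1 declaration comes from the text.

```python
import xml.parsers.expat

# Hypothetical document: the NOTATION declaration and the "logo"
# attribute are assumed; the ENTITY declaration follows Section 3.6.2.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE memo [
  <!NOTATION GIF SYSTEM "image/gif">
  <!ENTITY bsf1 SYSTEM "http://pics.bigstickfarm.com/bsf1.gif" NDATA GIF>
  <!ELEMENT memo EMPTY>
  <!ATTLIST memo logo ENTITY #IMPLIED>
]>
<memo logo="bsf1"/>"""


def collect_declarations(document):
    """Return (notations, unparsed_entities) reported by the parser."""
    notations = {}
    unparsed = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        # Called once for each <!NOTATION ...> declaration.
        notations[name] = system_id

    def on_unparsed(name, base, system_id, public_id, notation_name):
        # Called once for each <!ENTITY ... NDATA ...> declaration.
        unparsed[name] = (system_id, notation_name)

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed
    parser.Parse(document, True)
    return notations, unparsed


if __name__ == "__main__":
    notations, unparsed = collect_declarations(DOC)
    print(notations)
    print(unparsed)
```

The parser never retrieves the image itself. It hands the application the entity's system identifier and notation name, and the application decides how to process data of that type.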
Entity references point to parsed general entities and use the ampersand ("&") as a beginning delimiter and the semicolon (";") as an ending delimiter. Parameter-entity references use the percent sign ("%") and semicolon as corresponding delimiters. You invoke parsed entities by name using entity references.

To distinguish markup elements in a document, the XML specification reserves some characters to identify the start and end of markup. For instance, the ampersand character ("&") and the left angle bracket ("<") may normally appear in their literal form only when used as markup delimiters. XML provides an alternative method to represent these characters in the content of a document. Character references allow you to insert specific characters as described in Section 3.4.7.

When using entities and entity references, note the following points:
|