XML documents may be composed of multiple parsed entities, as you learned in Chapter 3. These external parsed entities may be DTD fragments or chunks of XML that will be inserted into the master document using external general entity references. In either case, the external parsed entity does not necessarily use the same character set as the master document. Indeed, one external parsed entity may be referenced in several different files, each of which is written in a different character set. Therefore, it is important to specify the character set for an external parsed entity independently of the character set that the including document uses.
To accomplish this task, each external parsed entity should have a text declaration. If present, the text declaration must be the very first thing in the external parsed entity. For example, this text declaration says that the entity is encoded in the KOI8-R character set:
<?xml version="1.0" encoding="KOI8-R"?>
The text declaration looks like an XML declaration. It has version information and an encoding declaration. However, a text declaration must not have a standalone declaration. Furthermore, the version information may be omitted, although the encoding declaration is required. A legal text declaration that specifies the encoding as KOI8-R might look like this:
<?xml encoding="KOI8-R"?>
However, this is not a legal XML declaration, because an XML declaration must include the version information.
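The grammatical difference between the two kinds of declaration can be checked mechanically. The following Python sketch transcribes the XMLDecl and TextDecl productions of the XML 1.0 Recommendation into regular expressions. The function names are hypothetical, and the sketch ignores details a real parser must handle, such as a byte order mark before the declaration.

```python
import re

# Building blocks transcribed from the XML 1.0 productions S, Eq,
# VersionInfo, EncodingDecl, and SDDecl.
S = r"[ \t\r\n]+"
EQ = r"[ \t\r\n]*=[ \t\r\n]*"


def quoted(pattern: str) -> str:
    """Match `pattern` enclosed in either double or single quotes."""
    return '(?:"' + pattern + '"|\'' + pattern + "')"


VERSION_INFO = S + "version" + EQ + quoted(r"1\.[0-9]+")
ENCODING_DECL = S + "encoding" + EQ + quoted(r"[A-Za-z][A-Za-z0-9._-]*")
SD_DECL = S + "standalone" + EQ + quoted("(?:yes|no)")
END = r"[ \t\r\n]*\?>"

# XMLDecl  ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
XML_DECL = re.compile(r"<\?xml" + VERSION_INFO
                      + "(?:" + ENCODING_DECL + ")?"
                      + "(?:" + SD_DECL + ")?" + END)

# TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
TEXT_DECL = re.compile(r"<\?xml" + "(?:" + VERSION_INFO + ")?"
                       + ENCODING_DECL + END)


def is_xml_declaration(decl: str) -> bool:
    """True if `decl` is a complete, legal XML declaration."""
    return XML_DECL.fullmatch(decl) is not None


def is_text_declaration(decl: str) -> bool:
    """True if `decl` is a complete, legal text declaration."""
    return TEXT_DECL.fullmatch(decl) is not None


for decl in ('<?xml version="1.0" encoding="KOI8-R"?>',
             '<?xml encoding="KOI8-R"?>',
             '<?xml version="1.0" encoding="KOI8-R" standalone="yes"?>'):
    print(decl, "text:", is_text_declaration(decl),
          "xml:", is_xml_declaration(decl))
```

The first declaration passes both checks. The version-less one passes only as a text declaration, and the one carrying `standalone` passes only as an XML declaration.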
Example 5-1 shows an external parsed entity containing several verses from Pushkin's The Bronze Horseman in Cyrillic script. The text declaration identifies the encoding as KOI8-R. Example 5-1 is not a well-formed XML document because it has no root element. It exists only for inclusion in other documents.
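The following Python sketch shows how an entity's own encoding is independent of the including document's encoding. It decodes a KOI8-R entity according to its text declaration and then splices the result into a UTF-8 master document. Several parts are assumptions rather than details from the text. The entity content, the `verse` and `poem` element names, and the `replacement_text` helper are all hypothetical. Splicing strings together is only a simplification of entity expansion. The encoding is also assumed to be ASCII-compatible, so the declaration can be found by scanning raw bytes, and byte order marks and UTF-16 detection are skipped.

```python
import re
import xml.etree.ElementTree as ET

# Simplified TextDecl pattern (optional version, required encoding) applied
# to raw bytes. Assumption: the entity's encoding is ASCII-compatible, as
# KOI8-R is, so the declaration's ASCII characters are recognizable.
TEXT_DECL = re.compile(
    rb"<\?xml(?:[ \t\r\n]+version[ \t\r\n]*=[ \t\r\n]*(?:\"1\.[0-9]+\"|'1\.[0-9]+'))?"
    rb"[ \t\r\n]+encoding[ \t\r\n]*=[ \t\r\n]*"
    rb"(?:\"([A-Za-z][A-Za-z0-9._-]*)\"|'([A-Za-z][A-Za-z0-9._-]*)')"
    rb"[ \t\r\n]*\?>"
)


def replacement_text(entity: bytes) -> str:
    """Decode an external parsed entity (hypothetical helper).

    The text declaration is recognized only at offset 0, since it must be
    the very first thing in the entity, and it is dropped because it is not
    part of the replacement text. Without one, UTF-8 is assumed.
    """
    match = TEXT_DECL.match(entity)
    if match is None:
        return entity.decode("utf-8")
    encoding = (match.group(1) or match.group(2)).decode("ascii")
    return entity[match.end():].decode(encoding)


# Hypothetical KOI8-R entity: two verse elements and no single root element.
entity = (
    '<?xml encoding="KOI8-R"?>\n'
    "<verse>На берегу пустынных волн</verse>\n"
    "<verse>Стоял он, дум великих полн</verse>\n"
).encode("KOI8-R")

text = replacement_text(entity)

# On its own the entity is not a well-formed document.
try:
    ET.fromstring(text)
except ET.ParseError as err:
    print("entity alone is rejected:", err)

# Spliced into a UTF-8 master document where an entity reference would
# appear, it becomes part of a well-formed tree.
master = ("<poem>" + text + "</poem>").encode("utf-8")
root = ET.fromstring(master)
for verse in root.iter("verse"):
    print(verse.text)
```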
Example 5-1. An external parsed entity with a text declaration identifying the character set as KOI8-R
External DTD subsets reside in external parsed entities and, thus, may have text declarations. Indeed, they should have text declarations if they're written in a character set other than UTF-8 or UTF-16. Example 5-2 shows a DTD fragment written in KOI8-R that might be used to validate Example 5-1 after it is included as part of a larger document.
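The following Python sketch shows what happens when a text declaration is ignored. Without a text declaration or external encoding information, a parser assumes UTF-8 (or UTF-16 when there is a byte order mark), so KOI8-R bytes are misread. The DTD content, the Cyrillic element names, and the `declared_encoding` helper are hypothetical and are not taken from Example 5-2. The regular expressions are deliberately loose stand-ins for a real DTD parser.

```python
import re

# Hypothetical DTD fragment whose element type names are Cyrillic
# ("поэма" = poem, "стих" = verse), stored in KOI8-R.
dtd_bytes = (
    '<?xml version="1.0" encoding="KOI8-R"?>\n'
    "<!ELEMENT поэма (стих)+>\n"
    "<!ELEMENT стих (#PCDATA)>\n"
).encode("KOI8-R")


def declared_encoding(entity: bytes) -> str:
    """Return the encoding named in a leading text declaration, else UTF-8.

    Deliberately loose: it only looks for encoding="..." inside an
    <?xml ...?> at the very start of the entity.
    """
    match = re.match(
        rb"<\?xml[^?]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']", entity
    )
    return match.group(1).decode("ascii") if match else "utf-8"


# Ignoring the text declaration and assuming UTF-8 fails, because KOI8-R
# Cyrillic letters (bytes 0xC0-0xFF) do not form valid UTF-8 sequences.
try:
    dtd_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("decoded as UTF-8:", err.reason)

# Honoring the text declaration recovers the declared element type names.
encoding = declared_encoding(dtd_bytes)
declared = re.findall(r"<!ELEMENT\s+(\S+)", dtd_bytes.decode(encoding))
print(encoding, declared)
```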
Example 5-2. A DTD with a text declaration identifying the character set as KOI8-R