Chapter 20. XML 1.0 Reference

CONTENTS

  •  20.1 How to Use This Reference
  •  20.2 Annotated Sample Documents
  •  20.3 XML Syntax
  •  20.4 Constraints
  •  20.5 XML Document Grammar

This chapter is intended to serve as a comprehensive reference to the Extensible Markup Language (XML) 1.0 W3C recommendation (Second Edition), dated 6 October 2000. We have made every effort to cover the contents of the official W3C document exhaustively. However, if you are implementing an XML parser, editor, or other tool, you should also review the latest revision of the recommendation on the Web at http://www.w3.org/TR/REC-xml.

20.1 How to Use This Reference

This chapter consists of examples of XML documents and DTDs, followed by detailed reference sections that describe every feature of the 1.0 specification and a listing of possible well-formedness and validity errors. The syntax elements of a valid XML document are introduced in the rough order in which they appear in an XML document. Each entry explains the syntactic structure, where it can be used, and the applicable validity and well-formedness constraints. Each reference section contains a description of the XML language structure, an informal syntax, and an example of the syntax's usage where appropriate.

20.2 Annotated Sample Documents

These examples are intended as a mnemonic aid for XML syntax and as a quick map from a specific instance of an XML language construct to its corresponding XML syntax reference section. The sample document and DTD incorporate features defined in the XML 1.0 and Namespaces in XML recommendations.

The sample XML application describes the construction of a piece of furniture. Within the figures, each distinct language construct is enclosed in a box, with the relevant reference section name provided as a callout. By locating a construct in the sample, then locating the associated reference section, you can quickly recognize and learn about unfamiliar XML syntax. Four files make up this sample application:

bookcase.xml

The document shown in Figure 20-1 uses furniture.dtd to describe a simple bookcase.

Figure 20-1. bookcase.xml

figs/xian2_2001.gif

furniture.dtd

The XML document type definition shown in Figure 20-2 provides a simple grammar for describing components and assembly details for a piece of furniture.

Figure 20-2. furniture.dtd

figs/xian2_2002.gif

bookcase_ex.ent

The external entity file shown in Figure 20-3 contains additional bookcase-specific elements for the bookcase.xml document.

Figure 20-3. bookcase_ex.ent

figs/xian2_2003.gif

parts_list.ent

Figure 20-4 contains an external parsed general entity example that contains the parts list for the bookcase example document.

Figure 20-4. parts_list.ent

figs/xian2_2004.gif

20.3 XML Syntax

For each section of this reference that maps directly to an XML language structure, an informal syntax reference describes that structure's form. The following conventions are used with these syntax blocks:

Format           Meaning

DOCTYPE          Bold text indicates literal characters that must appear as written within the document (e.g., DOCTYPE).
encoding-name    Italicized text indicates that the user must replace the text with real data. The item indicates what type of data should be inserted (e.g., encoding-name = UTF-8).
|                The vertical bar | indicates that only one out of a list of possible values can be selected.
[ ]              Square brackets indicate that a particular portion of the syntax is optional.

20.3.1 Global Syntax Structures

Every XML document is broken into two primary sections: the prolog and the document element. A few documents may also have comments or processing instructions that follow the root element in a sort of epilog (an unofficial term). The prolog contains structural information about the particular type of XML document you are writing, including the XML declaration and document type declaration. The prolog is optional, and if a document does not need to be validated against a DTD, it can be omitted completely. The only required structure in a well-formed XML document is the top-level document element itself.

The following syntax structures are common to the entire XML document. Unless otherwise noted within a subsequent reference item, the following structures can appear anywhere within an XML document.

Whitespace  

   

Whitespace is defined as a space, tab, or line break (a carriage return, line feed, or combination of the two). Whitespace serves the same purpose in XML as it does in most programming and natural languages: to separate tokens and language elements from one another. XML has simplified the task of determining which whitespace is significant to an application and which is not. To an XML parser, all whitespace in element content is significant and will be passed to the client application. Whitespace within tags (for instance, between attributes) is not significant. Consider the following example:

<p>  This sentence has extraneous


line breaks.</p>

After parsing, the character data from this example element is passed to the underlying application as:

  This sentence has extraneous


line breaks.

Though XML specifies that all whitespace in element content be preserved for use by the client application, an additional facility lets the XML author hint that the spacing and formatting of an element's character data should be preserved. For more information, see the discussion of the xml:space attribute under Special Attributes later in this chapter.
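The whitespace rules can be observed with almost any parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the client application (the variable names are illustrative only). It parses an element whose content contains extra spaces and line breaks, then an element with extra whitespace between its attributes:

```python
import xml.etree.ElementTree as ET

# Element content containing leading spaces and several line breaks.
source = "<p>  This sentence has extraneous\n\n\nline breaks.</p>"
text = ET.fromstring(source).text

# Whitespace inside a tag only separates attributes; it is not reported.
spaced = ET.fromstring('<p   a="1"\n\n     b="2"/>')
attributes = dict(spaced.attrib)

print(repr(text))     # every space and line break is still present
print(attributes)     # only the attribute names and values remain
```

The character data arrives with every space and line break intact, while the whitespace between the attributes leaves no trace in the parsed result.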

Names  

   

To ease the burden of those who write XML parsers, XML names must adhere to the following lexical conventions:

  • Begin with a letter, _, or : character.

  • After the first character, be composed only of letters, digits, ., -, _, and : characters.

In this context, a letter is any Unicode character that matches the Letter production from the EBNF grammar at the end of this chapter.

According to the XML 1.0 specification, the : character may be used freely within names, although the character is now officially reserved as part of the Namespaces in XML recommendation. Even if a document does not use namespaces, the colon should still not be used within identifiers, to maintain compatibility with namespace-aware parsers. See Section 20.3.4 in this chapter for more information about how namespace-aware identifiers are formed.

Names should also avoid starting with the three-letter sequence X, M, L, unless specifically sanctioned by an XML specification.
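The naming rules can be approximated in a few lines. The following sketch is a simplification, not the full Name production: it accepts only ASCII letters and digits, whereas real XML names may use any Unicode letter. The function name check_name and its return strings are hypothetical and exist only for illustration:

```python
import re

# ASCII-only approximation: a letter, _ or : first, then letters,
# digits, ., -, _ and : characters.
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")

def check_name(name: str) -> str:
    """Classify a candidate XML name (hypothetical helper)."""
    if _NAME.fullmatch(name) is None:
        return "invalid"
    if name[:3].lower() == "xml":
        return "reserved"      # names beginning with X, M, L in any case
    if ":" in name:
        return "legal, but the colon is reserved for namespaces"
    return "ok"

for candidate in ("furniture_item", "2nd_shelf", "xmlData", "part:no"):
    print(candidate, "->", check_name(candidate))
```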

Character References  

&#decimal-number; &#xhexadecimal-number; 
 

All XML parsers are based on the Unicode character set, no matter what the external encoding of the XML file is. It is theoretically possible to author documents directly in Unicode, but many text-editing, storage, and delivery systems still use the ASCII character set. To allow XML authors to include Unicode characters in their documents' content without forcing them to abandon their existing editing tools, XML provides the character reference mechanism.

A character reference allows an author to insert a Unicode character by number into the output stream produced by the parser to an XML application. Consider an XML document that includes the following character data:

&#xa9; 2002 O'Reilly &#38; Associates

In this example, the parser would replace each character reference with the actual Unicode character and pass the result to the client application:

© 2002 O'Reilly & Associates

Character references may not be used in element or attribute names, though they may be used in attribute values.
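As a quick check of this behavior, the sketch below feeds the same character data to Python's standard xml.etree.ElementTree parser. The enclosing notice element is an assumption added only so that the fragment forms a complete document:

```python
import xml.etree.ElementTree as ET

# Both the hexadecimal and the decimal form are resolved by the parser.
notice = ET.fromstring(
    "<notice>&#xa9; 2002 O'Reilly &#38; Associates</notice>"
).text

print(notice)
```

The application never sees the references themselves, only the copyright sign and ampersand they stand for.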

Predefined Entities  

   

Besides user-defined entity references, XML includes the five named entity references shown in Table 20-1 that can be used without being declared. These references are a subset of those available in HTML documents.

Table 20-1. Predefined entities

Entity    Character    XML declaration

&lt;      <            <!ENTITY lt "&#38;#60;">
&gt;      >            <!ENTITY gt "&#62;">
&amp;     &            <!ENTITY amp "&#38;#38;">
&apos;    '            <!ENTITY apos "&#39;">
&quot;    "            <!ENTITY quot "&#34;">

The &lt; and &amp; entities must be used wherever < or & appear in document content. The &gt; entity is frequently used wherever > appears in document content, but is only mandatory to avoid putting the sequence ]]> into content. &apos; and &quot; are generally used only within attribute values to avoid conflicts between the value and the quotes used to contain the value.

Though the parser must recognize these entities regardless of whether they have been declared, you can declare them in your DTD without generating errors.

The presence of these "special" predefined entities creates a conundrum within an XML document. Because it is possible to use these references without declaring them, it is possible to have a valid XML document that includes references to entities that were never declared. The XML specification actually encourages document authors to declare these entities to maintain the integrity of the entity declaration-reference rule. In practical terms, declaring these entities only adds unnecessary complexity to your document.
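In practice, escaping is usually delegated to a library routine rather than done by hand. The following sketch assumes Python's standard library, whose xml.sax.saxutils.escape function replaces &, <, and > with the corresponding predefined entity references. The sample string and the code and t element names are illustrative only:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if (a < b && c[d[0]]> 0)"

# escape() substitutes &amp;, &lt; and &gt; for the special characters.
escaped = escape(raw)

# Parsing the escaped text hands the original characters back.
round_trip = ET.fromstring("<code>" + escaped + "</code>").text

# All five predefined references are resolved without any declaration.
all_five = ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>").text

print(escaped)
print(round_trip == raw)
```

Note that the sample string contains the sequence ]]>, which is one case in which escaping > is mandatory rather than optional.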

CDATA (Character Data) Sections  

<![CDATA[unescaped character & markup data]]> 
 

XML documents consist of markup and character data. The < or & characters cannot be included inside normal character data without using a character or entity reference, such as &amp; or &#38;. By using a reference, the resulting < and & characters are not recognized as markup by the parser, but will become part of the data stream to the parser's client application.

For large blocks of character data, particularly if the data contains markup such as an HTML or XML fragment, the CDATA section can be used. Within a CDATA block, every character between the opening and closing delimiters is considered character data. Thus, special characters can be included in a CDATA section with impunity, except for the CDATA closing sequence, ]]>.

CDATA sections are very useful for tasks such as enclosing XML or HTML documents inside of tutorials explaining how to use markup, but it is difficult to process the contents of CDATA sections using XSLT, the DOM, or SAX as anything other than text.

CDATA sections cannot be nested. The character sequence ]]> cannot appear within data that is being escaped, or the CDATA block will be closed prematurely. This situation should not be a problem ordinarily, but if an application includes XML documents as unparsed character data, it is important to be aware of this constraint. If it is necessary to include the CDATA closing sequence in the data, close the open CDATA section, include the closing characters using character references to escape them, then reopen the CDATA section to contain the rest of the character data.
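The close, escape, and reopen technique is mechanical enough to automate. The sketch below is one possible implementation in Python. The helper name wrap_in_cdata and the example element are hypothetical, and the choice of &#62; to escape the final > is just one of several character references that would work:

```python
import xml.etree.ElementTree as ET

def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting the section around each ]]>."""
    # Close the section, emit ]]> using a character reference for >,
    # then reopen a new section for the remaining text.
    safe = text.replace("]]>", "]]>]]&#62;<![CDATA[")
    return "<![CDATA[" + safe + "]]>"

payload = "<b>if (a[b[0]]>1) & more</b>"
wrapped = wrap_in_cdata(payload)
recovered = ET.fromstring("<example>" + wrapped + "</example>").text

print(wrapped)
print(recovered == payload)
```

After parsing, the application receives a single run of character data identical to the original payload; the seams between the CDATA sections are invisible.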

Entity References  


   

An XML entity can best be understood as a macro replacement facility, in which the replacement can be either parsed (the text becomes part of the XML document) or unparsed. If unparsed, the entity declaration points to external binary data that cannot be parsed. Additionally, the replacement text for parsed entities can come from a string or the contents of an external file. During parsing, a parsed entity reference is replaced by the substitution text that is specified in the entity declaration. The replacement text is then reparsed until no more entity or character references remain.

To simplify document parsing, two distinct types of entities are used in different situations: general and parameter. The basic syntax for referencing both entity types is almost identical, but specific rules apply to where each type can be used.

Parameter Entity References  

%name;
 

When an XML parser encounters a parameter entity reference within a document's DTD, it replaces the reference with the entity's text. Whether the replacement text is included as a literal or included from an external entity, the parser continues parsing the replacement text as if it had always been a part of the document. This parsing has interesting implications for nested entity references:

<!ENTITY % YEAR "2001">
<!ENTITY COPYRIGHT "&#xa9; %YEAR;">
. . .
<copyright_notice>&COPYRIGHT;</copyright_notice>

After the necessary entity replacements are made, the previous example would yield the following canonical element:

<copyright_notice>© 2001</copyright_notice>

XML treats parameter entity references differently depending on where they appear within the DTD. References within the literal value of an entity declaration (such as Copyright &#xa9; %YEAR;) are valid only as part of the external subset. Within the internal subset, parameter entity references may occur only where a complete markup declaration could exist. In other words, within the internal subset, parameter references can be used only to include complete markup declarations.

Parameter entity references are recognized only within the DTD; therefore, the % character has no significance within character data and does not need to be escaped.

General Entity References  

&name;
 

General entity references are recognized only within the parsed character data in the body of an XML document. They may appear within the parsed character data between an element's start- and end-tags, or within the value of an attribute. They are not recognized within a document's DTD (except inside default values for attributes) or within CDATA sections.

The sequence of operations that occurs when a parsed general entity is included by the XML parser can lead to interesting side effects. An entity's replacement text is, in turn, read by the parser. If character or general entity replacements exist in the entity replacement text, they are also parsed and included as parsing continues.
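Nested replacement can be demonstrated with any parser that reads the internal DTD subset. The following sketch assumes Python's standard xml.etree.ElementTree module, which expands internal general entities. It deliberately uses two general entities (YEAR and COPYRIGHT, both illustrative) rather than a parameter entity, so that the whole example fits in an internal subset:

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE copyright_notice [
  <!ENTITY YEAR "2001">
  <!ENTITY COPYRIGHT "&#xa9; &YEAR;">
]>
<copyright_notice>&COPYRIGHT;</copyright_notice>"""

# &COPYRIGHT; is replaced first; the &YEAR; reference inside its
# replacement text is then expanded as parsing continues.
notice = ET.fromstring(document).text
print(notice)

# A reference to an entity that was never declared is an error.
try:
    ET.fromstring("<copyright_notice>&UNDECLARED;</copyright_notice>")
    undeclared_rejected = False
except ET.ParseError:
    undeclared_rejected = True
print(undeclared_rejected)
```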

Comments  

<!-- comment text --> 
 

Comments can appear anywhere in your document or DTD, outside of other markup tags. XML parsers are not required to preserve contents of comment blocks, so they should be used only to store information that is not a part of your application. In reality, most information you might consider storing in a comment block probably should be made an official part of your XML application. Rather than storing data that will be read and acted on by an application in a comment, as is frequently done in HTML documents, you should store it within the element structure of the actual XML document. Enhancing the readability of a complex DTD or temporarily disabling blocks of markup are effective uses of comments.

The character sequence -- cannot be included within a comment block, except as part of the closing --> delimiter. Because comments cannot be nested, commenting out a comment block is impossible. If large blocks of markup that include comments must be temporarily disabled, consider wrapping them in a CDATA section to cause the parser to read them as simple text instead of markup.
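These restrictions are enforced as well-formedness errors, which makes them easy to confirm. The sketch below assumes Python's standard xml.etree.ElementTree parser; the helper is_well_formed is hypothetical and simply reports whether parsing succeeded:

```python
import xml.etree.ElementTree as ET

def is_well_formed(source: str) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True

single_hyphen = is_well_formed("<a><!-- fine - comment --></a>")
double_hyphen = is_well_formed("<a><!-- bad -- comment --></a>")
nested = is_well_formed("<a><!-- outer <!-- inner --> outer --></a>")

# Disabling markup that contains a comment by wrapping it in CDATA.
disabled = ET.fromstring("<a><![CDATA[<!-- note --><b/>]]></a>")

print(single_hyphen, double_hyphen, nested)
print(disabled.text, len(disabled))
```

Inside the CDATA section, the comment and the b element are reported as ordinary text, so the a element ends up with no child elements at all.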

Processing Instructions  

<?target [processing-instruction data]?> 
 

Processing instructions provide an escape mechanism that allows an XML application to include instructions to an XML processor that are not part of the XML markup or character data. The processing instruction target can be any legal XML name, except xml in any combination of upper- and lowercase (see Chapter 2). Linking to a stylesheet to provide formatting instructions for a document is a common use of this mechanism. According to the principles of XML, formatting instructions should remain separate from the actual content of a document, but some mechanism must associate the two. Processing instructions are significant only to applications that recognize them.

The notation facility can indicate exactly what type of processing instruction is included, and each individual XML application must decide what to do with the additional data. No action is required by an XML parser when it recognizes that a particular processing instruction matches a declared notation. When this facility is used, applications that do not recognize the public or system identifiers of a given processing instruction target should realize that they could not properly interpret its data portion.
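To show how an application sees a processing instruction, the following sketch uses Python's standard xml.dom.minidom module to list the target and data of each processing instruction in a document's prolog. The stylesheet name bookcase.css is an assumption made up for this example:

```python
from xml.dom import minidom, Node

document = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="bookcase.css" type="text/css"?>'
    "<furniture_item/>"
)

# The parser reports the target and the raw data string. Interpreting
# the data (here, pseudo-attributes) is left to the application.
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```

The XML declaration does not appear in the list: despite its similar syntax, it is not a processing instruction.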

Character Encoding Autodetection

The XML declaration must be the very first item in a document so that the XML parser can determine which character encoding was used to store the document. A chicken-and-egg problem exists, involving the XML declaration's encoding="..." clause: the parser can't parse the clause if it doesn't know what character encoding the document uses. However, since the first five characters of your document must be the string <?xml (if it includes an XML declaration), the parser can read the first few bytes of your document and, in most cases, determine the character encoding before it has read the encoding declaration.
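The byte-sniffing step can be sketched as follows. This is a simplified illustration only: the function name sniff_encoding and its return labels are hypothetical, the four-byte UCS-4 encodings are ignored, and a real parser would go on to read the encoding declaration to pick the exact encoding within the detected family:

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Guess the encoding family from the first bytes of an XML entity."""
    # A byte-order mark, when present, settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    # Otherwise, look at how the leading "<?" of the declaration is stored.
    if data.startswith(b"\x00<\x00?"):
        return "UTF-16BE"
    if data.startswith(b"<\x00?\x00"):
        return "UTF-16LE"
    if data.startswith(b"\x4c\x6f\xa7\x94"):   # "<?xm" in EBCDIC
        return "EBCDIC"
    if data.startswith(b"<?xml"):
        return "ASCII-compatible"   # safe to read the encoding declaration
    # No declaration and no byte-order mark: UTF-8 is assumed.
    return "UTF-8"

sample = '<?xml version="1.0" encoding="UTF-16"?><doc/>'
print(sniff_encoding(sample.encode("utf-16-be")))
print(sniff_encoding(sample.encode("ascii")))
```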

XML Declaration  


<?xml version="1.0" [encoding="encoding-name"][ standalone="yes|no"]?> 
 

The XML declaration serves several purposes. It tells the parser what version of the specification was used, how the document is encoded, and whether the document is completely self-contained or has references to external entities.

The XML declaration, if included, must be the first thing that appears in an XML document. Nothing, except possibly a Unicode byte-order mark, may appear before this structure's initial < character.

Version Information  

... version="1.0" ...
 

The version information attribute denotes which version of the XML specification was used to create the current document. At this time, the only valid version is 1.0.

Encoding Declaration  

... encoding="encoding-name" ... 
 

The encoding declaration, if present, indicates which character-encoding scheme was used to store the document. Although all XML documents are ultimately handled as Unicode by the parser, the external storage scheme may be anything from a plain text file using the Latin-1 character set (ISO-8859-1) to a file with native Japanese characters.

The XML specification requires only that parsers recognize UTF-8 and UTF-16 encoded documents, but many parsers also support additional character encodings. For a thorough discussion of character-encoding schemes, see Chapter 26.

Standalone Declaration  

... standalone="yes|no" ... 
 

If a document is completely self-contained (the DTD, if there is one, is contained completely within the original document), then the standalone="yes" declaration may be used. If this declaration is not given, the value no is assumed, and all external entities are read and parsed. It is possible to convert any document in which standalone="no" to a standalone document by replacing each external entity reference with the text contained in the external entity file.

From the standpoint of an XML application developer, this flag has no effect on how a document is parsed. However, if it is given, it must be accurate. Setting standalone="yes" when a document does require DTD declarations that are not present in the main document file is a violation of XML validity rules.
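Most parser APIs expose the three parts of the XML declaration to the application. The sketch below assumes Python's xml.parsers.expat module, whose XmlDeclHandler callback receives the version, the encoding name, and a standalone flag (1 for yes, 0 for no, and -1 when the clause is omitted). The helper read_declaration is hypothetical:

```python
import xml.parsers.expat

def read_declaration(data: bytes) -> dict:
    """Return the fields of a document's XML declaration, if it has one."""
    found = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_declaration(version, encoding, standalone):
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)
    return found

full = read_declaration(
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?><doc/>'
)
minimal = read_declaration(b'<?xml version="1.0"?><doc/>')
print(full)
print(minimal["standalone"])

# Anything before the declaration, even a single space, is an error.
try:
    read_declaration(b' <?xml version="1.0"?><doc/>')
    misplaced_rejected = False
except xml.parsers.expat.ExpatError:
    misplaced_rejected = True
print(misplaced_rejected)
```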

20.3.2 DTD (Document Type Definition)

Chapter 2 explained the difference between well-formed and valid documents. Well-formed documents that include and conform to a given DTD are considered valid. Documents that include a DTD and violate the rules of that DTD are invalid. The DTD consists of the DOCTYPE declaration and both the internal subset (declarations contained directly within the document) and the external subset (declarations that are included from outside the main document).

Parameter Entities  

   

The parameter entity mechanism is a simple macro replacement facility that is only valid within the context of the DTD. Parameter entities are declared and then referenced from within markup or possibly from within other entity declarations. The source of the entity replacement text can be either a literal string or the contents of an external file. Parameter entities simplify maintenance of large, complex documents by allowing authors to build libraries of commonly used entity declarations.

Parameter Entity Declarations  

<!ENTITY % name "Replacement text.">
<!ENTITY % name SYSTEM "system-literal">
<!ENTITY % name PUBLIC "pubid-literal" "system-literal">
 

Parameter entities are declared within the document's DTD and must be declared before they are used. The declaration provides two key pieces of information:

  • The name of the entity, which is used when it is referenced

  • The replacement text, either directly or indirectly through a link to an external entity

Be aware that an XML parser performs some preprocessing on the replacement text before it is used in an entity reference. Most importantly, parameter entity references in the replacement text are recursively expanded before the final version of the replacement text is stored. Character references are also replaced immediately with the specified character. This replacement can lead to unexpected side effects, particularly when constructing parameter entities that declare other parameter entities. For full details of how entity replacement is implemented by an XML parser and what kinds of unexpected side effects can occur, see Appendix D of the XML 1.0 specification. The specification is available on the World Wide Web Consortium web site (http://www.w3.org/TR/REC-xml#sec-entexpand).

General Entities  

   

General entities are declared within the document type definition and then referenced within the document's text and attribute content. When the document is parsed, the entity's replacement text is substituted for the entity reference. The parser then resumes parsing, starting with the text that was just replaced.

General entities are declared within the DTD using a superset of the syntax used to declare parameter entities. Besides the ability to declare internal parsed entities and external parsed entities, you can declare external unparsed entities and associate an XML notation name with them.

Internal entities are used when the replacement text can be efficiently stored inline as a literal string. The replacement text within an internal entity is included completely in the entity declaration itself, obviating the need for an external file to contain the replacement text. This situation closely resembles the string replacement macro facilities found in many popular programming languages and environments:

<!ENTITY name "Replacement text">

There are two types of external entities: parsed and unparsed. When a parsed entity is referenced, the contents of the external entity are included in the document, and the XML parser resumes parsing, starting with the newly included text. When an unparsed entity is referenced, the parser supplies the application with the unparsed entity's URI, but it does not insert that data into the document or parse it. What to do with that URI is up to the application. Any entity declared with an XML notation name associated with it is an external unparsed entity, and any references to it within the document must be made using attribute values of type ENTITY or ENTITIES. External parsed entities are declared using the following forms:

<!ENTITY name SYSTEM "system-literal">
<!ENTITY name PUBLIC "pubid-literal" "system-literal">

Text Declarations

<?xml[ version="1.0"] encoding="encoding-name"?> 
 

Files that contain external parsed entities must include a text declaration if the entity file uses a character encoding other than UTF-8 or UTF-16. This declaration would be followed by the replacement text of the external parsed entity.

External parsed entities may contain only document content or a completely well-formed subset of the DTD. This restriction is significant because it indicates that external parameter entities cannot be used to play token-pasting games by splitting XML syntax constructs into multiple files, then expecting the parser to reassemble them.

Unparsed Entities  

   

It may be necessary at times to include data in your XML document that should not be parsed. For instance, your XML document may need to include pointers to graphics files that will be used by an application. These files are logically part of the document, but should not be parsed. The XML language allows you to declare external unparsed entities that can be included as attribute values within the content of your document:

<!ENTITY name SYSTEM "system-literal" NDATA notation_name>
<!ENTITY name PUBLIC "pubid-literal" "system-literal" NDATA notation_name>

To include unparsed entities, you must first declare a notation that will be referenced in the actual entity declaration:

<!NOTATION gif SYSTEM "images/gif">

Then declaring the entity itself is possible:

<!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>

As an unparsed general entity, it can be referenced only as an attribute value of type ENTITY or ENTITIES:

<picture src="bookcase_pic" type="gif"/>

When an XML parser parses this element, the information contained in the entity and notation declarations can be used to identify the actual type of data stored in the external entity. For example, a program could choose to display the contents of a GIF external entity on the screen, once the actual format is known.
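The following sketch shows one way an application might collect this information. It assumes Python's xml.parsers.expat module, which reports unparsed entity and notation declarations through its UnparsedEntityDeclHandler and NotationDeclHandler callbacks. The function collect_unparsed and the shape of the dictionaries it returns are hypothetical:

```python
import xml.parsers.expat

def collect_unparsed(source: str):
    """Gather unparsed entity and notation declarations from a document."""
    entities, notations = {}, {}
    parser = xml.parsers.expat.ParserCreate()

    def on_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    parser.UnparsedEntityDeclHandler = on_entity
    parser.NotationDeclHandler = on_notation
    parser.Parse(source, True)
    return entities, notations

document = """<!DOCTYPE picture [
  <!NOTATION gif SYSTEM "images/gif">
  <!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
]>
<picture src="bookcase_pic" type="gif"/>"""

entities, notations = collect_unparsed(document)

# Resolve the attribute value to a file and a data format.
system_id, notation = entities["bookcase_pic"]
print(system_id, notations[notation])
```

The parser never opens bookcase.gif itself; it only hands the identifiers to the application, which decides how to fetch and display the data.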

XLink and similar mechanisms are commonly used in place of unparsed entities.

External Subset  

   

The document type declaration can include part or all of the document type definition from an external file. This external portion of the DTD is referred to as the external DTD subset and may contain markup declarations, conditional sections, and parameter entity references. It must include a text declaration if the character encoding is not UTF-8 or UTF-16:

<?xml[ version="1.0"] encoding="encoding-name"?>

This declaration (if present) would then be followed by a series of complete DTD markup statements, including ELEMENT, ATTLIST, ENTITY, and NOTATION declarations, as well as conditional sections, and processing instructions. For example:

<!ELEMENT furniture_item (desc, %extra_tags; user_tags?, parts_list,
    assembly+)>
<!ATTLIST furniture_item
    xmlns CDATA #FIXED "http://namespaces.oreilly.com/furniture/"
>
...

Internal DTD Subset

   

The internal DTD subset is the portion of the document type definition included directly within the document type declaration between the [ and ] characters. The internal DTD subset can contain markup declarations and parameter entity references, but not conditional sections. A single document may have both internal and external DTD subsets, which, when taken together, form the complete document type definition. The following example shows the internal subset, which appears between the [ and ] characters:

<!DOCTYPE furniture_item SYSTEM "furniture.dtd"
[
<!ENTITY % bookcase_ex SYSTEM "bookcase_ex.ent">
%bookcase_ex;
<!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
<!ENTITY parts_list SYSTEM "parts_list.ent">
]>

Element Type Declaration


   

Element type declarations provide a template for the actual element instances that appear within an XML document. The declaration determines what type of content, if any, can be contained within elements with the given name. The following sections describe the various element content options available.

Since namespaces are not explicitly included in the XML 1.0 recommendation, element and attribute declarations within a DTD must give the complete (qualified) name that will be used in the target document. This means that if namespace prefixes will be used in instance documents, the DTD must declare them just as they will appear, prefixes and all. While parameter entities may allow instance documents to use different prefixes, this still makes complete and seamless integration of namespaces into a DTD-based application very awkward.

Empty Element Type  

<!ELEMENT name EMPTY>
 

Elements that are declared empty cannot contain content or nested elements. Within the document, empty elements may use one of the following two syntax forms:

<name [attribute="value"...]/>
<name [attribute="value"...]></name>

Any Element Type

<!ELEMENT name ANY>
 

This content specifier acts as a wildcard, allowing elements of this type to contain character data or instances of any valid element types that are declared in the DTD.

Mixed Content Element Type  

<!ELEMENT name (#PCDATA [| name]+)*>
<!ELEMENT name (#PCDATA)>
 

Element declarations that include the #PCDATA token can include text content mixed with other nested elements that are declared in the optional portion of the element declaration. If the #PCDATA token is used, it is not possible to limit the number of times or sequence in which other nested elements are mixed with the parsed character data. If only text content is desired, the asterisk is optional.

Constrained Child Nodes  

<!ELEMENT name (child_node_regexp)[? | * | +]>
 

XML provides a simple regular-expression syntax that can be used to limit the order and number of child elements within a parent element. This language includes the following operators:

Operator    Meaning

Name        Matches an element of the given name
( ... )     Groups expressions for processing as sets of sequences (using the comma as a separator) or choices (using | as a separator)
?           Indicates that the preceding name or expression can occur zero or one times at this point in the document
*           Indicates that the preceding name or expression can occur zero or more times at this point in the document
+           Indicates that the preceding name or expression must occur one or more times at this point in the document
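Because content models are so close to ordinary regular expressions, they can be translated into one almost token for token. The following sketch is an illustration of that idea only, not how validating parsers are required to work. The function names are hypothetical, and the translation ignores #PCDATA, EMPTY, ANY, and parameter entity references:

```python
import re

def content_model_to_regex(model: str):
    """Translate a simple content model into a compiled Python regex."""
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    parts = []
    for token in tokens:
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue                      # a sequence is plain concatenation
        elif token in ")|?*+":
            parts.append(token)           # same meaning in both notations
        else:
            # Each child element name is matched with a trailing space.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def children_match(model: str, children) -> bool:
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

model = "(desc, user_tags?, parts_list, assembly+)"
print(children_match(model, ["desc", "parts_list", "assembly", "assembly"]))
print(children_match(model, ["parts_list", "desc", "assembly"]))
```

The first list satisfies the model because user_tags is optional and assembly may repeat; the second fails because desc must come first.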

Attribute List Declaration  

<!ATTLIST element_name [attribute_name attribute_type default_decl]*>
 

In a valid XML document it is necessary to declare the attribute names, types, and default values that are used with each element type.

The attribute name must obey the rules for XML identifiers, and no duplicate attribute names may exist within a single declaration.

Attributes are declared as having a specific type. Depending on the declared type, a validating XML parser will constrain the values that appear in instances of those attributes within a document. The following table lists the various attribute types and their meanings:

Attribute type       Meaning

CDATA                Simple character data.
ID                   A unique ID value within the current XML document. No two ID attribute values within a document can have the same value, and no element can have two attributes of type ID.
IDREF, IDREFS        A single reference to an element ID (IDREF) or a list of such references (IDREFS), separated by spaces. Every IDREF token must match the value of an ID-type attribute that appears somewhere within the document.
ENTITY, ENTITIES     A single reference to a declared unparsed external entity (ENTITY) or a list of references (ENTITIES), separated by spaces.
NMTOKEN, NMTOKENS    A single name token value (NMTOKEN) or a list of name tokens (NMTOKENS), separated by spaces.

NOTATION Attribute Type  

... NOTATION (notation [| notation]*) ...
 

The NOTATION attribute mechanism lets XML document authors indicate that the character content of some elements obeys the rules of some formal language other than XML. The following short sample document shows how notations might be used to specify the type of programming language stored in the code_fragment element:

<?xml version="1.0"?>
<!DOCTYPE code_fragment
[
<!NOTATION java_code PUBLIC "Java source code">
<!NOTATION c_code PUBLIC "C source code">
<!NOTATION perl_code PUBLIC "Perl source code">
<!ELEMENT code_fragment (#PCDATA)>
<!ATTLIST code_fragment
          code_lang NOTATION (java_code | c_code | perl_code) #REQUIRED>
]>
<code_fragment code_lang="c_code">
    main( ) { printf("Hello, world."); }
</code_fragment>

Enumeration Attribute Type

... (name_token [| name_token]*) ...
 

This syntax limits the possible values of the given attribute to one of the name tokens from the provided list:

<!ELEMENT door EMPTY>
<!ATTLIST door
          state (open | closed | missing) "open">
. . .
<door state="closed"/>

Default Values

   

If an optional attribute is not present on a given element, a default value may be provided to be passed by the XML parser to the client application. The following table shows various forms of the attribute default value clause and their meanings:

Default value clause

Explanation

#REQUIRED

A value must be provided for this attribute.

#IMPLIED

A value may or may not be provided for this attribute.

[#FIXED ] "default value"

If this attribute has no explicit value, the XML parser substitutes the given default value. If the #FIXED token is provided, this attribute's value must match the given default value. In either case, the parent element always has an attribute with this name.

The #FIXED modifier indicates that the attribute may contain only the value given in the attribute declaration. Although redundant, it is possible to provide an explicit attribute value on an element when the attribute was declared as #FIXED. The only restriction is that the attribute value must exactly match the value given in the #FIXED declaration.
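
As a rough illustration of attribute defaulting, the following Python sketch feeds the door declaration from the enumeration example to the standard library's expat binding (a nonvalidating parser) and records the attributes the parser reports. The helper name door_attributes and the document template are assumptions made for this example, not part of XML 1.0.

```python
from xml.parsers import expat

# Internal DTD subset with a defaulted, enumerated attribute.
DOC_TEMPLATE = """<!DOCTYPE door [
  <!ELEMENT door EMPTY>
  <!ATTLIST door state (open | closed | missing) "open">
]>
{body}"""


def door_attributes(body: str) -> dict:
    """Parse one <door> element and return the attributes the parser reports."""
    seen = {}

    def on_start(name, attrs):
        # attrs holds explicit values plus defaults taken from the ATTLIST.
        seen.update(attrs)

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(DOC_TEMPLATE.format(body=body), True)
    return seen
```

Calling door_attributes('<door/>') should report the declared default for state, while an explicit state attribute overrides it. Because expat does not validate, it will not reject a value outside the enumeration.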

Special Attributes  

   

Some attributes are significant to XML and must be declared and implemented in a particular way:

xml:space

The xml:space attribute tells an XML application whether the whitespace within the specified element is significant:

<!ATTLIST element_name xml:space (default|preserve) default_decl>
<!ATTLIST element_name xml:space (default) #FIXED 'default' >
<!ATTLIST element_name xml:space (preserve) #FIXED 'preserve' >

xml:lang

The xml:lang attribute allows a document author to specify the human language of an element's character content. If used in a valid XML document, the document type definition must include an attribute declaration for the xml:lang attribute name. See Chapter 5 for an explanation of language support in XML.

Notation Declaration  

<!NOTATION notation_name SYSTEM "system-literal">
<!NOTATION notation_name PUBLIC "pubid-literal">
<!NOTATION notation_name PUBLIC "pubid-literal" "system-literal">
 

Notation declarations are used to provide information to an XML application about the format of the document's unparsed content. Notations are used by unparsed external entities, processing instructions, and some attribute values.

Notation information is not significant to the XML parser, but it is preserved for use by the client application. The public and system identifiers are made available to the client application so that it may correctly interpret non-XML data and processing instructions.

Conditional Sections  

   

The conditional section markup provides support for conditionally including and excluding content at parse time within an XML document's external subset. Conditional sections are not allowed within a document's internal subset. The following example illustrates a likely application of conditional sections:

<!ENTITY % debug 'IGNORE' >
<!ENTITY % release 'INCLUDE' >

<!ELEMENT addend (#PCDATA)>
<!ELEMENT result (#PCDATA)>

<![%debug;[
<!ELEMENT sum (addend+, result)>
]]>
<![%release;[
<!ELEMENT sum (result)>
]]>

20.3.3 Document Body

Elements are an XML document's lifeblood. They provide the structure for character data and attribute values that make up a particular instance of an XML document type definition. The !ELEMENT and !ATTLIST declarations from the DTD restrict the possible contents of an element within a valid XML document. Combining elements and/or attributes that violate these restrictions generates an error in a validating parser.

Start-Tags and End-Tags  

<element_name [attribute_name="attribute value"]*> ...</element_name> 
 

Elements that have content (either character data, other elements, or both) must start with a start-tag and end with an element end-tag.

Empty-Element Tags  

<element_name [attribute_name="attribute value"]*></element_name>
<element_name [attribute_name="attribute value"]* />
 

Empty elements have no content and are written using either the start- and end-tag syntax mentioned previously or the empty-element syntax. The two forms are functionally identical, but the empty-element syntax is more succinct and more frequently used.
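
The equivalence of the two forms can be checked with a short Python sketch that uses the standard library's ElementTree parser. The helper name is_empty is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET


def is_empty(xml_text: str) -> bool:
    """Return True if the root element has no character data and no children."""
    elem = ET.fromstring(xml_text)
    return elem.text is None and len(elem) == 0
```

Both '<door state="closed"></door>' and '<door state="closed"/>' should be reported as empty, whereas '<door> </door>' is not, because the space between the tags is character content.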

Attributes  

attribute_name="attribute value"
attribute_name='attribute value'
 

Elements may include attributes. The order of attributes within an element tag is not significant and is not guaranteed to be preserved by an XML parser. Attribute values must appear within either single or double quotation marks. Attribute values within a document must conform to the rules explained in Section 20.4.1 of this chapter.

Note that whitespace may appear around the = character.

The value that appears in the quoted string is tested for validity, depending on the attribute type provided in the !ATTLIST declaration for the element type. Attribute values can contain general entity references, but cannot contain references to external parsed entities. See Section 20.4.1 of this chapter for more information about attribute-value restrictions.
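
A minimal Python sketch, using the standard library's ElementTree parser, shows that quoting style, whitespace around the = character, and attribute order do not change the attribute set an application sees. The helper name attributes and the sample door element are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def attributes(xml_text: str) -> dict:
    """Return the root element's attributes as a plain dictionary."""
    return dict(ET.fromstring(xml_text).attrib)


# Same attributes: different quotes, spacing around '=', and order.
FIRST = """<door state = "closed" material='oak'/>"""
SECOND = """<door material="oak" state='closed'/>"""
```

Comparing attributes(FIRST) with attributes(SECOND) should show the same name/value pairs, since dictionary comparison ignores order.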

20.3.4 Namespaces

Although namespace support was not part of the original XML 1.0 recommendation, Namespaces in XML was approved less than a year later (January 14, 1999). Namespaces are used to distinguish the element and attribute names of a given XML application from those of other applications. See Chapter 4 for more detailed information.

The following sections describe how namespaces impact the formation and interpretation of element and attribute names within an XML document.

Unqualified Names  

name 
 

An unqualified name is an XML element or attribute name that is not associated with a namespace. This could be because it has no namespace prefix and no default namespace has been declared. All unprefixed attribute names are unqualified because they are never automatically associated with a default namespace. XML parsers that do not implement namespace support (of which there are very few) or parsers that have been configured to ignore namespaces will always return unqualified names to their client applications. Two unqualified names are considered to be the same if they are lexically identical.

Qualified Names  

[prefix:]local_part 
 

A qualified name is an element or attribute name that is associated with an XML namespace. There are three possible types of qualified names:

  • Unprefixed element names that are contained within the scope of a default namespace declaration

  • Prefixed element names

  • Prefixed attribute names

Unlike unqualified names, qualified names are considered the same only if their namespace URIs (from their namespace declarations) and their local parts match.
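
The comparison rule can be sketched in Python with the standard library's ElementTree parser, which reports a qualified element name as {namespace_URI}local_part. The helper name expanded_name and the urn:example URIs are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def expanded_name(xml_text: str) -> str:
    """Return the root element's name in ElementTree's {uri}local form."""
    return ET.fromstring(xml_text).tag


# Different prefixes, same URI and local part: the same qualified name.
SAME_A = '<a:table xmlns:a="urn:example:furniture"/>'
SAME_B = '<b:table xmlns:b="urn:example:furniture"/>'
# Same prefix and local part, different URI: a different qualified name.
OTHER = '<a:table xmlns:a="urn:example:data"/>'
```

The prefix itself never takes part in the comparison; only the URI it is bound to and the local part matter.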

Default Namespace Declaration  

xmlns="namespace_URI" 
 

When this attribute is included in an element's start-tag, that element (if its name is unprefixed) and any unprefixed elements contained within it are automatically associated with the namespace URI given. If the xmlns attribute is set to the empty string, any default namespace in effect is cancelled, and unprefixed elements within that scope are not associated with any namespace.

An important caveat about default namespace declarations is that they do not affect unprefixed attributes. An unprefixed attribute is never in any namespace, even if its containing element is.
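
The following Python sketch, based on the standard library's ElementTree parser, illustrates both points: a default namespace applies to unprefixed elements but not to unprefixed attributes, and xmlns="" cancels it. The sample document and the urn:example:furniture URI are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<table xmlns="urn:example:furniture" legs="4">'
    '<note xmlns="">plain</note>'
    '</table>'
)

root = ET.fromstring(DOC)
table_name = root.tag          # element picks up the default namespace
table_attrs = dict(root.attrib)  # unprefixed attribute stays unqualified
note_name = root[0].tag        # xmlns="" removes the default namespace
```

ElementTree writes qualified names as {namespace_URI}local_part, so an unqualified name is simply the bare local name.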

Namespace Prefix Declaration  

xmlns:prefix="namespace_URI" 
 

This declaration associates the namespace URI given with the prefix name given. Once it has been declared, the prefix may qualify the current element name, attribute names, or any other element or attribute name within the scope of the element that declares it. Nested elements may redefine a given prefix, using a different namespace URI if desired.

20.4 Constraints

In addition to defining the basic structures used in documents and DTDs, XML 1.0 defines a list of rules regarding their usage. These constraints put limits on various aspects of XML usage, and documents cannot in fact be considered to be "XML" unless they meet all of the well-formedness constraints. Parsers are required to report violations of these constraints, though only well-formedness constraint violations require that processing of the document halt completely. Namespace constraints are defined in Namespaces in XML, not XML 1.0.

20.4.1 Well-Formedness Constraints

Well-formedness refers to an XML document's physical organization. Certain lexical rules must be obeyed before an XML parser can consider a document well-formed. These rules should not be confused with validity constraints, which determine whether a particular document is valid when parsed using the document structure rules contained in its DTD. The Backus-Naur Form (BNF) grammar rules must also be satisfied. The following sections contain all well-formedness constraints recognized by XML Version 1.0 parsers, including actual text from the 1.0 specification.

PEs in Internal Subset  

   

Text from specification

In the internal DTD subset, parameter entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)

Explanation

Within the internal subset, parameter entity references may appear only between markup declarations, where they can include complete declarations. Using parameter entity references to build part of a markup declaration is legal only within the external DTD subset or an external parameter entity.

External Subset  

   

Text from specification

The external subset, if any, must match the production for extSubset.

Explanation

The extSubset production constrains what type of declaration may be contained in the external subset. This constraint generally means that the external subset of the DTD must only include whole declarations or parameter entity references. See the extSubset production in the EBNF grammar at the end of this chapter for specific limitations.

PE Between Declarations  

   

Text from specification

The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl.

Explanation

The replacement text of parameter entities may contain declarations that might not be allowed if the replacement text appeared directly. Parameter entity references in the internal subset cannot appear within declarations, but this rule does not apply to declarations that have been included via parameter entities.

Element Type Match  

   

Text from specification

The Name in an element's end-tag must match the element type in the start-tag.

Explanation

Proper element nesting is strictly enforced, and every start-tag must be matched by a corresponding end-tag. Elements written with the empty-element tag syntax do not require, and may not have, an end-tag.
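
A short Python sketch shows how a conforming parser reacts to mismatched tags. It relies on the standard library's ElementTree parser; the helper name well_formed is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET


def well_formed(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

Properly nested input such as '<a><b></b></a>' is accepted, while overlapping tags such as '<a><b></a></b>' or a missing end-tag such as '<a><b></a>' are rejected.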

Unique Att Spec  

   

Text from specification

No attribute name may appear more than once in the same start-tag or empty-element tag.

Explanation

Attribute names must be unique within a given element.

No External Entity References  

   

Text from specification

Attribute values cannot contain direct or indirect entity references to external entities.

Explanation

XML parsers report an error when asked to replace references to external parsed entities within attribute values.

No < in Attribute Values  

   

Text from specification

The replacement text of any entity referred to directly or indirectly in an attribute value (other than "&lt;") must not contain a <.

Explanation

This restriction is meant to simplify the task of parsing XML data. Since attribute values can't even appear to contain element data, simple parsers need not track literal strings. Just by recognizing < and > characters, simple parsers can check for proper markup formation and nesting.
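
The following Python sketch, using the standard library's ElementTree parser, contrasts an escaped and an unescaped < in an attribute value. The helper name attribute_value is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def attribute_value(xml_text: str, name: str) -> Optional[str]:
    """Return the named attribute of the root element, or None on a parse error."""
    try:
        return ET.fromstring(xml_text).get(name)
    except ET.ParseError:
        return None
```

With '<a x="1 &lt; 2"/>' the application receives the value 1 < 2, but a literal < inside the quotes makes the document ill-formed.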

Legal Character  

   

Text from specification

Characters referred to using character references must match the production for Char.

Explanation

Any characters that the XML parser generates must be real characters. A few character values in Unicode are not valid standalone characters.
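
As a brief illustration, this Python sketch asks the standard library's ElementTree parser to resolve a few character references. The helper name text_of is an assumption made for this example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def text_of(xml_text: str) -> Optional[str]:
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None
```

References such as &#x41; and &#9; resolve to legal characters, whereas &#0; and &#xFFFE; name code points outside the Char production and cause the parse to fail.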

Entity Declared  

   

Text from specification

In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with standalone='yes', for an entity reference that does not occur within the external subset or a parameter entity, the Name given in the entity reference must match that in an entity declaration that does not occur within the external subset or a parameter entity, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration. Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.

Explanation

This long constraint lists the only situations in which an entity reference may appear without a corresponding entity declaration. Since a nonvalidating parser is not obliged to read and parse the external subset, the parser must give the document the benefit of the doubt, if an entity could possibly have been declared.
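
The simplest case, a document with no DTD at all, can be sketched in Python with the standard library's ElementTree parser. The helper name parse_text and the nbsp entity are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_text(xml_text: str) -> Optional[str]:
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# The same reference, this time declared in the internal DTD subset.
DECLARED = '<!DOCTYPE a [<!ENTITY nbsp "&#160;">]><a>&nbsp;</a>'
```

Without a DTD, '<a>&amp;</a>' parses because amp is predefined, but '<a>&nbsp;</a>' is rejected. Once the entity is declared, as in DECLARED, the reference is replaced by its text.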

Parsed Entity  

   

Text from specification

An entity reference must not contain the name of an unparsed entity. Unparsed entities may be referred to only in attribute values declared to be of type ENTITY or ENTITIES.

Explanation

Since unparsed entities can't be parsed, don't try to force the parser to parse them.

No Recursion  

   

Text from specification

A parsed entity must not contain a recursive reference to itself, either directly or indirectly.

Explanation

Be careful how you structure your entities; make sure you don't inadvertently create a circular reference:

<!ENTITY a "&b;">
<!ENTITY b "&c;">
<!ENTITY c "&a;"> <!--wrong!-->
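
The circular declarations above become an error only when one of the entities is actually referenced. The following Python sketch wraps them in a small document and returns the message reported by the standard library's expat-based parser; the helper name parse_error and the root element r are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.parsers.expat import errors

CIRCULAR = (
    '<!DOCTYPE r ['
    '<!ENTITY a "&b;"><!ENTITY b "&c;"><!ENTITY c "&a;">'
    ']><r>&a;</r>'
)


def parse_error(xml_text: str) -> Optional[str]:
    """Return expat's error message for the text, or None if it parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # exc.code is the numeric expat error code.
        return errors.messages[exc.code]
    return None
```

For CIRCULAR the parser should report a recursive entity reference rather than looping forever.
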
In DTD  

   

Text from specification

Parameter entity references may only appear in the DTD.

Explanation

This constraint is self-evident because the % character has no significance outside of the DTD. Therefore, it is perfectly legal to have an element like this in your document:

<ok>%noproblem;</ok>

The text %noproblem; is passed on by the parser without generating an error.
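
This behavior is easy to confirm with a minimal Python sketch that uses the standard library's ElementTree parser; the variable name ok_text is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Outside the DTD, %name; is ordinary character data.
ok_text = ET.fromstring('<ok>%noproblem;</ok>').text
```

The parser hands the characters %noproblem; to the application unchanged.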

20.4.2 Validity Constraints

The following sections contain all validity constraints that are enforced by a validating parser. Each includes actual text from the XML 1.0 specification and a short explanation of what the constraint actually means.

Root Element Type  

   

Text from specification

The Name in the document type declaration must match the element type of the root element.

Explanation

The name provided in the !DOCTYPE declaration identifies the root element's name and must match the name of the root element in the document.
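
Python's standard library does not include a validating parser, but this particular constraint can be approximated by hand. The sketch below uses the expat binding to capture the name in the document type declaration and the name of the first element; the helper names doctype_and_root and root_matches_doctype are assumptions made for this illustration.

```python
from xml.parsers import expat


def doctype_and_root(xml_text: str):
    """Return (name from <!DOCTYPE ...>, name of the root element)."""
    names = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        names["doctype"] = name

    def on_start(name, attrs):
        # The first start-tag seen belongs to the root element.
        if names["root"] is None:
            names["root"] = name

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return names["doctype"], names["root"]


def root_matches_doctype(xml_text: str) -> bool:
    """Check the Root Element Type constraint for a document with a DOCTYPE."""
    doctype, root = doctype_and_root(xml_text)
    return doctype is not None and doctype == root
```

A document declared as <!DOCTYPE table ...> whose root element is chair parses without error in a nonvalidating parser, yet root_matches_doctype reports the mismatch.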

Proper Declaration/PE Nesting  

   

Text from specification

Parameter entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration is contained in the replacement text for a parameter entity reference, both must be contained in the same replacement text.

Explanation

This constraint means you can't create a parameter entity that completes one DTD declaration and begins another; the following XML fragment would violate this constraint:

<!ENTITY % finish_it ">">
<!ENTITY % bad "won't work" %finish_it; <!--wrong!-->

Standalone Document Declaration

   

Text from specification

The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:

  • attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or

  • entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or

  • attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or

  • element types with element content, if whitespace occurs directly within any instance of those types.

Explanation

This laundry list of potential standalone flag violations can be read to mean, "If you have an external subset in your DTD, ensure that your document doesn't depend on anything in it if you say standalone='yes' in your XML declaration." A more succinct interpretation would be, "If your document has an external DTD subset, just set standalone to no."

Element Valid  

   

Text from specification

An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds:

  • The declaration matches EMPTY and the element has no content.

  • The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional whitespace (characters matching the nonterminal S) between the start-tag and the first child element, between child elements, or between the last child element and the end-tag. Note that a CDATA section containing only whitespace does not match the nonterminal S, and hence cannot appear in these positions.

  • The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model.

  • The declaration matches ANY, and the types of any child elements have been declared.

Explanation

If a document includes a DTD with element declarations, make sure the actual elements in the document match the rules set down in the DTD.

Attribute Value Type  

   

Text from specification

The attribute must have been declared; the value must be of the type declared for it.

Explanation

All attributes used on elements in valid XML documents must have been declared in the DTD, including the xml:space and xml:lang attributes. If you declare an attribute for an element, make sure that every instance of that attribute has a value conforming to the type specified. (For attribute types, see Attribute List Declaration.)

Unique Element Type Declaration  

   

Text from specification

No element type may be declared more than once.

Explanation

Unlike entity and attribute declarations, only one declaration may exist for a particular element type.

Proper Group/PE Nesting  

   

Text from specification

Parameter entity replacement text must be properly nested with parenthesized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text.

For interoperability, if a parameter entity reference appears in a choice, seq, or Mixed construct, its replacement text should contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,).

Explanation

This constraint restricts the way parameter entities can be used to construct element declarations. It is similar to the Proper Declaration/PE Nesting constraint in that parameter entities may not be used to complete or open new parenthesized expressions. It prevents the XML author from hiding significant syntax elements inside parameter entities.

No Duplicate Types  

   

Text from specification

The same name must not appear more than once in a single mixed-content declaration.

Explanation

Don't list the same element type name more than once in the same mixed-content declaration.

ID  

   

Text from specification

Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

Explanation

No two attributes declared as type ID can have the same value. This constraint is not limited to a single element type; it applies globally across the entire document.

One ID per Element Type  

   

Text from specification

No element type may have more than one ID attribute specified.

Explanation

Each element can have at most one ID type attribute.

ID Attribute Default  

   

Text from specification

An ID attribute must have a declared default of #IMPLIED or #REQUIRED.

Explanation

To avoid potential duplication, you can't declare an ID attribute to be #FIXED or provide a default value for it.

IDREF  

   

Text from specification

Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e., IDREF values must match the value of some ID attribute.

Explanation

ID references must refer to actual ID attributes that exist within the document.
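
The ID and IDREF constraints can be approximated with a small Python sketch. The standard library's ElementTree parser does not expose declared attribute types, so the sketch simply assumes that an attribute named id is of type ID and one named ref is of type IDREF; those names, and the helper id_problems, are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def id_problems(xml_text: str, id_attr: str = "id", idref_attr: str = "ref"):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())

    # Count every ID value in the document, regardless of element type.
    ids = Counter(
        e.get(id_attr) for e in elements if e.get(id_attr) is not None
    )
    duplicates = sorted(value for value, count in ids.items() if count > 1)

    # An IDREF is dangling when no element carries that ID.
    refs = {e.get(idref_attr) for e in elements if e.get(idref_attr) is not None}
    dangling = sorted(ref for ref in refs if ref not in ids)
    return duplicates, dangling


SAMPLE = '<parts><part id="p1"/><part id="p1"/><use ref="p2"/><use ref="p1"/></parts>'
```

For SAMPLE, the value p1 is reported as a duplicate ID and p2 as a reference with no target. A validating parser would make the same checks using the types declared in the DTD.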

Entity Name  

   

Text from specification

Values of type ENTITY must match the Name production, and values of type ENTITIES must match Names; each Name must match the name of an unparsed entity declared in the DTD.

Explanation

Attributes declared to contain entity references must contain references to unparsed entities declared in the DTD.

Name Token  

   

Text from specification

Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.

Explanation

If an attribute is declared to contain a name token or a list of name tokens, the values must be legal XML name tokens.

Notation Attributes  

   

Text from specification

Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.

Explanation

Attributes that must contain notation names must contain names that reference notations declared in the DTD.

One Notation per Element Type  

   

Text from specification

No element type may have more than one NOTATION attribute specified.

Explanation

A given element can have only one attribute declared with the NOTATION attribute type. This constraint is provided for backward compatibility with SGML.

No Notation on Empty Element  

   

Text from specification

For compatibility, an attribute of type NOTATION must not be declared on an element declared EMPTY.

Explanation

Empty elements cannot have NOTATION attributes in order to maintain compatibility with SGML.

Enumeration  

   

Text from specification

Values of this type must match one of the Nmtoken tokens in the declaration.

Explanation

Assigning an enumerated type attribute a value that isn't listed in the enumeration given in the DTD is a validity error.

Required Attribute  

   

Text from specification

If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.

Explanation

Required attributes must appear in the document and have a value assigned to them if they are declared as #REQUIRED in the DTD.

Attribute Default Legal  

   

Text from specification

The declared default value must meet the lexical constraints of the declared attribute type.

Explanation

If you provide a default attribute value, it must obey the same rules that apply to a normal attribute value within the document.

Fixed Attribute Default  

   

Text from specification

If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.

Explanation

If you choose to provide an explicit value for a #FIXED attribute in your document, it must match the default value given in the attribute declaration.

Proper Conditional Section/PE Nesting  

   

Text from specification

If any of the "<![", "[", or "]]>" of a conditional section is contained in the replacement text for a parameter entity reference, all of them must be contained in the same replacement text.

Explanation

If you use a parameter entity to contain the beginning of a conditional section, the parameter entity must also contain the end of the section.

Entity Declared  

   

Text from specification

In a document with an external subset or external parameter entities with "standalone='no'", the Name given in the entity reference must match that in an entity declaration. For interoperability, valid documents should declare the entities amp, lt, gt, apos, quot, in the form specified in 4.6 Predefined Entities. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any attribute-list declaration containing a default value with a direct or indirect reference to that general entity.

Explanation

Parameter and general entity declarations must precede any references to these entities. All entity references must refer to previously declared entities. The specification also states that declaring the five predefined general entities (amp, lt, gt, apos, and quot) is a good idea. In reality, declaring the predefined general entities adds unnecessary complexity to most applications.

Notation Declared  

   

Text from specification

The Name must match the declared name of a notation.

Explanation

External unparsed entities must use a notation that is declared in the document.

Unique Notation Name  

   

Text from specification

Only one notation declaration can declare a given Name.

Explanation

Declaring two notations with the same name is illegal.

20.4.3 Namespace Constraints

The following list contains all constraints defined by the namespaces specification. Each includes actual text from the Namespaces in XML specification and a short explanation of what the constraint actually means.

Leading "XML"  

   

Text from specification

Prefixes beginning with the three-letter sequence x, m, l, in any case combination, are reserved for use by XML and XML-related specifications.

Explanation

Just like most other names in XML, namespace prefix names can't begin with xml unless they've been defined by the W3C.

Prefix Declared  

   

Text from specification

The namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e., an element in whose content the prefixed markup occurs). The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. The prefix xmlns is used only for namespace bindings and is not itself bound to any namespace name.

Explanation

You have to declare all namespaces before you can use them. The prefixes have no meaning without the declarations, so using a prefix without a declaration context is an error. The namespace with the prefix xml is permanently defined, so there is no need to redeclare it. The xmlns prefix used by namespace declarations is not considered a namespace prefix itself, and no declaration is needed for it.
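
A namespace-aware parser enforces this constraint directly. The Python sketch below uses the standard library's ElementTree parser; the helper name try_parse and the urn:example:furniture URI are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text: str):
    """Return the root element, or None if the parser rejects the text."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


undeclared = try_parse('<f:table/>')
declared = try_parse('<f:table xmlns:f="urn:example:furniture"/>')
# The xml prefix needs no declaration.
builtin = try_parse('<table xml:lang="en"/>')
```

The undeclared prefix is rejected, the declared one resolves to its URI, and xml:lang is reported under the permanently bound http://www.w3.org/XML/1998/namespace name.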

20.5 XML Document Grammar

The Extended Backus-Naur Form (EBNF) grammar, shown in the following section, was collected from the XML 1.0 Recommendation, Second Edition. It brings all XML language productions together in a single location and describes the syntax that is understood by XML 1.0-compliant parsers. Each production has been numbered and cross-referenced using superscripted numbers.

20.5.1 Extended Backus-Naur Form (EBNF) Grammar

20.5.1.1 Document
[1] document ::= prolog²² element³⁹ Misc²⁷*
20.5.1.2 Character range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
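
As an informal aid, production [2] translates almost directly into a regular expression character class. The Python sketch below is one such translation; the names XML_CHARS and is_legal_text are assumptions made for this illustration.

```python
import re

# Production [2]: zero or more characters, each matching Char.
XML_CHARS = re.compile(
    r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*"
)


def is_legal_text(text: str) -> bool:
    """Return True if every character in text matches the Char production."""
    return XML_CHARS.fullmatch(text) is not None
```

Tabs, line feeds, and carriage returns are the only control characters below #x20 that pass; #xFFFE and #xFFFF are excluded as well.
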
20.5.1.3 Whitespace
[3] S ::= (#x20 | #x9 | #xD | #xA)+
20.5.1.4 Names and tokens
[4] NameChar ::= Letter⁸⁴ | Digit⁸⁸ | '.' | '-' | '_' | ':' | CombiningChar⁸⁷ | Extender⁸⁹
[5] Name ::= (Letter⁸⁴ | '_' | ':') (NameChar⁴)*
[6] Names ::= Name⁵ (#x20 Name⁵)*
[7] Nmtoken ::= (NameChar⁴)+
[8] Nmtokens ::= Nmtoken⁷ (#x20 Nmtoken⁷)*
20.5.1.5 Literals
[9] EntityValue ::= '"' ([^%&"] | PEReference⁶⁹ | Reference⁶⁷)* '"' | "'" ([^%&'] | PEReference⁶⁹ | Reference⁶⁷)* "'"
[10] AttValue ::= '"' ([^<&"] | Reference⁶⁷)* '"' | "'" ([^<&'] | Reference⁶⁷)* "'"
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar¹³* '"' | "'" (PubidChar¹³ - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
20.5.1.6 Character data
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
20.5.1.7 Comments
[15] Comment ::= '<!--' ((Char² - '-') | ('-' (Char² - '-')))* '-->'
20.5.1.8 Processing instructions
[16] PI ::= '<?' PITarget¹⁷ (S³ (Char²* - (Char²* '?>' Char²*)))? '?>'
[17] PITarget ::= Name⁵ - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
20.5.1.9 CDATA sections
[18] CDSect ::= CDStart¹⁹ CData²⁰ CDEnd²¹
[19] CDStart ::= '<![CDATA['
[20] CData ::= (Char²* - (Char²* ']]>' Char²*))
[21] CDEnd ::= ']]>'
20.5.1.10 Prolog
[22] prolog ::= XMLDecl²³? Misc²⁷* (doctypedecl²⁸ Misc²⁷*)?
[23] XMLDecl ::= '<?xml' VersionInfo²⁴ EncodingDecl⁸⁰? SDDecl³²? S³? '?>'
[24] VersionInfo ::= S³ 'version' Eq²⁵ ("'" VersionNum²⁶ "'" | '"' VersionNum²⁶ '"')
[25] Eq ::= S³? '=' S³?
[26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
[27] Misc ::= Comment¹⁵ | PI¹⁶ | S³
20.5.1.11 Document type definition
[28] doctypedecl ::= '<!DOCTYPE' S³ Name⁵ (S³ ExternalID⁷⁵)? S³? ('[' intSubset²⁸ᵇ ']' S³?)? '>'
[28a] DeclSep ::= PEReference⁶⁹ | S³
[28b] intSubset ::= (markupdecl²⁹ | DeclSep²⁸ᵃ)*
[29] markupdecl ::= elementdecl⁴⁵ | AttlistDecl⁵² | EntityDecl⁷⁰ | NotationDecl⁸² | PI¹⁶ | Comment¹⁵
20.5.1.12 External subset
[30] extSubset ::= TextDecl⁷⁷? extSubsetDecl³¹
[31] extSubsetDecl ::= ( markupdecl²⁹ | conditionalSect⁶¹ | DeclSep²⁸ᵃ)*
20.5.1.13 Standalone document declaration
[32] SDDecl ::= S³ 'standalone' Eq²⁵ (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
20.5.1.14 Element
[39] element ::= EmptyElemTag⁴⁴ | STag⁴⁰ content⁴³ ETag⁴²
20.5.1.15 Start-tag
[40] STag ::= '<' Name⁵ (S³ Attribute⁴¹)* S³? '>'
[41] Attribute ::= Name⁵ Eq²⁵ AttValue¹⁰
20.5.1.16 End-tag
[42] ETag ::= '</' Name⁵ S³? '>'
20.5.1.17 Content of elements
[43] content ::= CharData¹⁴? ((element³⁹ | Reference⁶⁷ | CDSect¹⁸ | PI¹⁶ | Comment¹⁵) CharData¹⁴?)*
20.5.1.18 Tags for empty elements
[44] EmptyElemTag ::= '<' Name⁵ (S³ Attribute⁴¹)* S³? '/>'
20.5.1.19 Element type declaration
[45] elementdecl ::= '<!ELEMENT' S³ Name⁵ S³ contentspec⁴⁶ S³? '>'
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed⁵¹ | children⁴⁷
20.5.1.20 Element-content models
[47] children ::= (choice⁴⁹ | seq⁵⁰) ('?' | '*' | '+')?
[48] cp ::= (Name⁵ | choice⁴⁹ | seq⁵⁰) ('?' | '*' | '+')?
[49] choice ::= '(' S³? cp⁴⁸ ( S³? '|' S³? cp⁴⁸ )+ S³? ')'
[50] seq ::= '(' S³? cp⁴⁸ ( S³? ',' S³? cp⁴⁸ )* S³? ')'
20.5.1.21 Mixed-content declaration
[51] Mixed ::= '(' S³? '#PCDATA' (S³? '|' S³? Name⁵)* S³? ')*' | '(' S³? '#PCDATA' S³? ')'
20.5.1.22 Attribute-list declaration
[52] AttlistDecl ::= '<!ATTLIST' S³ Name⁵ AttDef⁵³* S³? '>'
[53] AttDef ::= S³ Name⁵ S³ AttType⁵⁴ S³ DefaultDecl⁶⁰
20.5.1.23 Attribute types
[54] AttType ::= StringType⁵⁵ | TokenizedType⁵⁶ | EnumeratedType⁵⁷
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
20.5.1.24 Enumerated attribute types
[57] EnumeratedType ::= NotationType⁵⁸ | Enumeration⁵⁹
[58] NotationType ::= 'NOTATION' S³ '(' S³? Name⁵ (S³? '|' S³? Name⁵)* S³? ')'
[59] Enumeration ::= '(' S³? Nmtoken⁷ (S³? '|' S³? Nmtoken⁷)* S³? ')'
20.5.1.25 Attribute defaults
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S³)? AttValue¹⁰)
20.5.1.26 Conditional section
[61] conditionalSect ::= includeSect⁶² | ignoreSect⁶³
[62] includeSect ::= '<![' S³? 'INCLUDE' S³? '[' extSubsetDecl³¹ ']]>'
[63] ignoreSect ::= '<![' S³? 'IGNORE' S³? '[' ignoreSectContents⁶⁴* ']]>'
[64] ignoreSectContents ::= Ignore⁶⁵ ('<![' ignoreSectContents⁶⁴ ']]>' Ignore⁶⁵)*
[65] Ignore ::= Char²* - (Char²* ('<![' | ']]>') Char²*)
20.5.1.27 Character reference
[66] CharRef ::= '&#' [0-9]+ ';'  | '&#x' [0-9a-fA-F]+ ';'
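
Production [66] can be exercised with a small Python sketch that recognizes both the decimal and hexadecimal forms. The names CHAR_REF and decode_char_ref are assumptions made for this illustration, and the sketch checks only the syntax of the reference, not whether the resulting character matches the Char production.

```python
import re

# Production [66]: '&#' digits ';' or '&#x' hex-digits ';' (lowercase x only).
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def decode_char_ref(ref: str) -> str:
    """Return the character named by a single character reference."""
    match = CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a character reference: {ref!r}")
    decimal, hexadecimal = match.groups()
    code_point = int(decimal) if decimal is not None else int(hexadecimal, 16)
    return chr(code_point)
```

Both &#65; and &#x41; decode to the letter A. A reference written with an uppercase X, such as &#X41;, does not match the production and is rejected.
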
20.5.1.28 Entity reference
[67] Reference ::= EntityRef68 | CharRef66
[68] EntityRef ::= '&' Name5 ';'
[69] PEReference ::= '%' Name5 ';'
20.5.1.29 Entity declaration
[70] EntityDecl ::= GEDecl71 | PEDecl72
[71] GEDecl ::= '<!ENTITY' S3 Name5 S3 EntityDef73 S3? '>'
[72] PEDecl ::= '<!ENTITY' S3 '%' S3 Name5 S3 PEDef74 S3? '>'
[73] EntityDef ::= EntityValue9 | (ExternalID75 NDataDecl76?)
[74] PEDef ::= EntityValue9 | ExternalID75
20.5.1.30 External entity declaration
[75] ExternalID ::= 'SYSTEM' S3 SystemLiteral11 | 'PUBLIC' S3 PubidLiteral12 S3 SystemLiteral11
[76] NDataDecl ::= S3 'NDATA' S3 Name5
20.5.1.31 Text declaration
[77] TextDecl ::= '<?xml' VersionInfo24? EncodingDecl80 S3? '?>'
20.5.1.32 Well-formed external parsed entity
[78] extParsedEnt ::= TextDecl77? content43
[79] extPE ::= TextDecl77? extSubsetDecl31
20.5.1.33 Encoding declaration
[80] EncodingDecl ::= S3 'encoding' Eq25 ('"' EncName81 '"' | "'" EncName81 "'" )
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
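Production [81] restricts encoding names to ASCII letters, digits, periods, underscores, and hyphens, and requires the first character to be a letter. A minimal check, with the hypothetical helper name is_enc_name, might look like this. It tests only the syntax of the name, not whether any parser actually supports the encoding.

```python
import re

# Transcription of production [81]: a letter, then letters, digits,
# '.', '_', or '-'.
ENC_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9._\-]*")


def is_enc_name(name):
    """Return True if `name` is a syntactically legal encoding name."""
    return ENC_NAME_RE.fullmatch(name) is not None
```

Under this check "UTF-8" and "ISO-8859-1" are legal, while "8859-1" is not, because it begins with a digit.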
20.5.1.34 Notation declarations
[82] NotationDecl ::= '<!NOTATION' S3 Name5 S3 (ExternalID75 | PublicID83) S3? '>'
[83] PublicID ::= 'PUBLIC' S3 PubidLiteral12
20.5.1.35 Characters
[84] Letter ::= BaseChar85 | Ideographic86 [85] BaseChar ::= [#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] |  [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] |  [#x0141-#x0148] | [#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0] |  [#x01F4-#x01F5] | [#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1] | #x0386 |  [#x0388-#x038A] | #x038C | [#x038E-#x03A1] | [#x03A3-#x03CE] | [#x03D0-#x03D6] |  #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3] | [#x0401-#x040C] |  [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481] | [#x0490-#x04C4] |  [#x04C7-#x04C8] | [#x04CB-#x04CC] | [#x04D0-#x04EB] | [#x04EE-#x04F5] |  [#x04F8-#x04F9] | [#x0531-#x0556] | #x0559 | [#x0561-#x0586] | [#x05D0-#x05EA] |  [#x05F0-#x05F2] | [#x0621-#x063A] | [#x0641-#x064A] | [#x0671-#x06B7] |  [#x06BA-#x06BE] | [#x06C0-#x06CE] | [#x06D0-#x06D3] | #x06D5 | [#x06E5-#x06E6] |  [#x0905-#x0939] | #x093D | [#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] |  [#x0993-#x09A8] | [#x09AA-#x09B0] | #x09B2 | [#x09B6-#x09B9] | [#x09DC-#x09DD] |  [#x09DF-#x09E1] | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] |  [#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] | [#x0A35-#x0A36] |  [#x0A38-#x0A39] | [#x0A59-#x0A5C] | #x0A5E | [#x0A72-#x0A74] | [#x0A85-#x0A8B] |  #x0A8D | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] | [#x0AAA-#x0AB0] | [#x0AB2-#x0AB3] |  [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 | [#x0B05-#x0B0C] | [#x0B0F-#x0B10] |  [#x0B13-#x0B28] | [#x0B2A-#x0B30] | [#x0B32-#x0B33] | [#x0B36-#x0B39] | #x0B3D |  [#x0B5C-#x0B5D] | [#x0B5F-#x0B61] | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] |  [#x0B92-#x0B95] | [#x0B99-#x0B9A] | #x0B9C | [#x0B9E-#x0B9F] | [#x0BA3-#x0BA4] |  [#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] |  [#x0C0E-#x0C10] | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] |  [#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] | [#x0C92-#x0CA8] |  [#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] | #x0CDE | [#x0CE0-#x0CE1] | [#x0D05-#x0D0C] |  [#x0D0E-#x0D10] | 
[#x0D12-#x0D28] | [#x0D2A-#x0D39] | [#x0D60-#x0D61] |  [#x0E01-#x0E2E] | #x0E30 | [#x0E32-#x0E33] | [#x0E40-#x0E45] | [#x0E81-#x0E82] |  #x0E84 | [#x0E87-#x0E88] | #x0E8A | #x0E8D | [#x0E94-#x0E97] | [#x0E99-#x0E9F] |  [#x0EA1-#x0EA3] | #x0EA5 | #x0EA7 | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 |  [#x0EB2-#x0EB3] | #x0EBD | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69] |  [#x10A0-#x10C5] | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] | [#x1105-#x1107] |  #x1109 | [#x110B-#x110C] | [#x110E-#x1112] | #x113C | #x113E | #x1140 | #x114C |  #x114E | #x1150 | [#x1154-#x1155] | #x1159 | [#x115F-#x1161] | #x1163 | #x1165 |  #x1167 | #x1169 | [#x116D-#x116E] | [#x1172-#x1173] | #x1175 | #x119E | #x11A8 |  #x11AB | [#x11AE-#x11AF] | [#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2] | #x11EB |  #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9] | [#x1F00-#x1F15] |  [#x1F18-#x1F1D] | [#x1F20-#x1F45] | [#x1F48-#x1F4D] | [#x1F50-#x1F57] | #x1F59 |  #x1F5B | #x1F5D | [#x1F5F-#x1F7D] | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE |  [#x1FC2-#x1FC4] | [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] |  [#x1FE0-#x1FEC] | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 | [#x212A-#x212B] |  #x212E | [#x2180-#x2182] | [#x3041-#x3094] | [#x30A1-#x30FA] | [#x3105-#x312C] |  [#xAC00-#xD7A3] [86] Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029] [87] CombiningChar ::= [#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] |  [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF | [#x05C1-#x05C2] |  #x05C4 | [#x064B-#x0652] | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF] |  [#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] | [#x0901-#x0903] | #x093C |  [#x093E-#x094C] | #x094D | [#x0951-#x0954] | [#x0962-#x0963] | [#x0981-#x0983] |  #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] | [#x09C7-#x09C8] | [#x09CB-#x09CD] |  #x09D7 | [#x09E2-#x09E3] | #x0A02 | #x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] |  [#x0A47-#x0A48] | [#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC |  
[#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03] | #x0B3C |  [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D] | [#x0B56-#x0B57] |  [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8] | [#x0BCA-#x0BCD] | #x0BD7 |  [#x0C01-#x0C03] | [#x0C3E-#x0C44] | [#x0C46-#x0C48] | [#x0C4A-#x0C4D] |  [#x0C55-#x0C56] | [#x0C82-#x0C83] | [#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] |  [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6] | [#x0D02-#x0D03] | [#x0D3E-#x0D43] |  [#x0D46-#x0D48] | [#x0D4A-#x0D4D] | #x0D57 | #x0E31 | [#x0E34-#x0E3A] |  [#x0E47-#x0E4E] | #x0EB1 | [#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] |  [#x0F18-#x0F19] | #x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F | [#x0F71-#x0F84] |  [#x0F86-#x0F8B] | [#x0F90-#x0F95] | #x0F97 | [#x0F99-#x0FAD] | [#x0FB1-#x0FB7] |  #x0FB9 | [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] | #x3099 | #x309A [88] Digit ::= [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] |  [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] |  [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] |  [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29] [89] Extender ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 |  #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]
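Character-class productions such as [88] and [89] are simply lists of code point ranges, so a tool can test membership with a table lookup. The sketch below copies the Digit and Extender ranges from the productions above; the names DIGIT_RANGES, EXTENDER_RANGES, is_xml_digit, and is_xml_extender are hypothetical, and the same approach would extend to BaseChar and CombiningChar with longer tables.

```python
# Inclusive (low, high) code point ranges from production [88] Digit.
DIGIT_RANGES = [
    (0x0030, 0x0039), (0x0660, 0x0669), (0x06F0, 0x06F9), (0x0966, 0x096F),
    (0x09E6, 0x09EF), (0x0A66, 0x0A6F), (0x0AE6, 0x0AEF), (0x0B66, 0x0B6F),
    (0x0BE7, 0x0BEF), (0x0C66, 0x0C6F), (0x0CE6, 0x0CEF), (0x0D66, 0x0D6F),
    (0x0E50, 0x0E59), (0x0ED0, 0x0ED9), (0x0F20, 0x0F29),
]

# Inclusive ranges from production [89] Extender; single code points are
# written as ranges whose two ends are equal.
EXTENDER_RANGES = [
    (0x00B7, 0x00B7), (0x02D0, 0x02D0), (0x02D1, 0x02D1), (0x0387, 0x0387),
    (0x0640, 0x0640), (0x0E46, 0x0E46), (0x0EC6, 0x0EC6), (0x3005, 0x3005),
    (0x3031, 0x3035), (0x309D, 0x309E), (0x30FC, 0x30FE),
]


def _in_ranges(char, ranges):
    code_point = ord(char)
    return any(low <= code_point <= high for low, high in ranges)


def is_xml_digit(char):
    """Return True if the single character matches production [88]."""
    return _in_ranges(char, DIGIT_RANGES)


def is_xml_extender(char):
    """Return True if the single character matches production [89]."""
    return _in_ranges(char, EXTENDER_RANGES)
```

These tables reflect the XML 1.0 productions as listed here, not the current Unicode database, so a character that Unicode classifies as a digit is not necessarily an XML Digit.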
XML in a Nutshell, 2nd Edition
ISBN: 0596002920
Year: 2001