[Definition: Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications.] Each attribute specification has a name and a value.

Element
    [39] element ::= EmptyElemTag | STag content ETag    [WFC: Element Type Match] [VC: Element Valid]

This specification does not constrain the semantics, use, or (beyond syntax) names of the element types and attributes, except that names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.
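As an illustration of the reserved-name rule, the following Python sketch tests whether a name starts with "xml" in any combination of upper and lower case. The function name `is_reserved_name` is hypothetical and is not defined by this specification; the sketch does not check that the argument is otherwise a legal Name.

```python
import re

# Mirrors the pattern (('X'|'x')('M'|'m')('L'|'l')) from the text above.
_RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def is_reserved_name(name):
    """Return True if `name` begins with a match to the reserved prefix."""
    return _RESERVED_PREFIX.match(name) is not None
```

A tool that generates element or attribute names could call such a check before emitting a name, so that it avoids the reserved space.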
3.1 Start-Tags, End-Tags, and Empty-Element Tags

[Definition: The beginning of every non-empty XML element is marked by a start-tag.]

Start-tag
    [40] STag ::= '<' Name (S Attribute)* S? '>'    [WFC: Unique Att Spec]
    [41] Attribute ::= Name Eq AttValue    [VC: Attribute Value Type] [WFC: No External Entity References] [WFC: No < in Attribute Values]

The Name in the start- and end-tags gives the element's type. [Definition: The Name-AttValue pairs are referred to as the attribute specifications of the element], [Definition: with the Name in each pair referred to as the attribute name] and [Definition: the content of the AttValue (the text between the ' or " delimiters) as the attribute value.] Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.
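The shape of productions [40] and [41], together with the Unique Att Spec constraint, can be sketched in Python as follows. This is a simplified illustration, not a conforming parser: the `_NAME` pattern only approximates the Name production, entity and character references inside values are left unexpanded and unchecked, and the function name `parse_start_tag` is hypothetical. Attributes are returned in a dictionary, which reflects the rule that their order is not significant.

```python
import re

# Rough approximation of the Name production (an assumption of this sketch).
_NAME = r"[A-Za-z_:][\w.:-]*"
_OPEN = re.compile(r"<(%s)" % _NAME)
# S Name Eq AttValue, where the value may not contain '<' or its own delimiter.
_ATTR = re.compile(r"\s+(%s)\s*=\s*(\"[^<\"]*\"|'[^<']*')" % _NAME)
_CLOSE = re.compile(r"\s*(/?)>")


def parse_start_tag(tag):
    """Return (element type, attribute dict, is_empty_element_tag)."""
    opened = _OPEN.match(tag)
    if opened is None:
        raise ValueError("not a start-tag or empty-element tag")
    name, pos = opened.group(1), opened.end()
    attributes = {}
    while True:
        attr = _ATTR.match(tag, pos)
        if attr is None:
            break
        attr_name, quoted = attr.group(1), attr.group(2)
        if attr_name in attributes:
            # WFC: Unique Att Spec
            raise ValueError("attribute %r specified more than once" % attr_name)
        attributes[attr_name] = quoted[1:-1]  # strip the ' or " delimiters
        pos = attr.end()
    closed = _CLOSE.match(tag, pos)
    if closed is None or closed.end() != len(tag):
        raise ValueError("malformed tag")
    return name, attributes, closed.group(1) == "/"
```

For example, `parse_start_tag('<termdef id="dt-dog" term="dog">')` yields the element type `termdef` and a two-entry attribute dictionary, while a tag that repeats an attribute name is rejected.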
An example of a start-tag:

    <termdef id="dt-dog" term="dog">

[Definition: The end of every element that begins with a start-tag must be marked by an end-tag containing a name that echoes the element's type as given in the start-tag:]

End-tag
    [42] ETag ::= '</' Name S? '>'

An example of an end-tag:

    </termdef>

[Definition: The text between the start-tag and end-tag is called the element's content:]

Content of Elements
    [43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag. [Definition: An empty-element tag takes a special form:]

Tags for Empty Elements
    [44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'    [WFC: Unique Att Spec]

Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag should be used, and should only be used, for elements which are declared EMPTY.

Examples of empty elements:

    <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
    <br></br>
    <br/>

3.2 Element Type Declarations

The element structure of an XML document may, for validation purposes, be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content.

Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor may issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.

[Definition: An element type declaration takes the form:]

Element Type Declaration
    [45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'    [VC: Unique Element Type Declaration]
    [46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children

where the Name gives the element type being declared.
Examples of element type declarations:

    <!ELEMENT br EMPTY>
    <!ELEMENT p (#PCDATA|emph)* >
    <!ELEMENT %name.para; %content.para; >
    <!ELEMENT container ANY>

3.2.1 Element Content

[Definition: An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S).] [Definition: In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear.] The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:

Element-content Models
    [47] children ::= (choice | seq) ('?' | '*' | '+')?
    [48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
    [49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'    [VC: Proper Group/PE Nesting]
    [50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'    [VC: Proper Group/PE Nesting]

where each Name is the type of an element which may appear as a child. Any content particle in a choice list may appear in the element content at the location where the choice list appears in the grammar; content particles occurring in a sequence list must each appear in the element content in the order given in the list. The optional character following a name or list governs whether the element or the content particles in the list may occur one or more (+), zero or more (*), or zero or one times (?). The absence of such an operator means that the element or content particle must appear exactly once. This syntax and meaning are identical to those used in the productions in this specification.

The content of an element matches a content model if and only if it is possible to trace out a path through the content model, obeying the sequence, choice, and repetition operators and matching each element in the content against an element type in the content model.
For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model. For more information, see E Deterministic Content Models.
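Because a content model uses the same sequence, choice, and repetition operators as a regular expression, the matching rule can be illustrated by translating a model into a Python regular expression over the sequence of child element types. The sketch below is only an illustration under several assumptions: parameter-entity references have already been expanded, names are approximated by a simple pattern, only element content (not mixed content) is handled, and the function names are hypothetical. It relies on backtracking, so it does not detect the non-deterministic models that the compatibility rule above forbids.

```python
import re

# A token is either an (approximated) Name or one of the grammar's operators.
_TOKEN = re.compile(r"\s*([A-Za-z_:][\w.:-]*|[(),|?*+])")


def content_model_to_regex(model):
    """Translate an element-content model such as '(front, body, back?)'."""
    model = model.strip()
    parts = []
    pos = 0
    while pos < len(model):
        match = _TOKEN.match(model, pos)
        if match is None:
            raise ValueError("unexpected text in content model at offset %d" % pos)
        token = match.group(1)
        pos = match.end()
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            pass  # a sequence is plain concatenation in the regex
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)
        else:
            # Each child is encoded as its name plus one trailing space, so
            # that 'p' can never match a prefix of a longer name like 'para'.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def matches_content_model(model, child_types):
    """Return True if the list of child element types matches the model."""
    encoded = "".join(child + " " for child in child_types)
    return content_model_to_regex(model).fullmatch(encoded) is not None
```

With the model `(head, (p | list | note)*, div2*)`, the child sequence `head, p, note, div2` matches, while a sequence that omits `head` does not.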
Examples of element-content models:

    <!ELEMENT spec (front, body, back?)>
    <!ELEMENT div1 (head, (p | list | note)*, div2*)>
    <!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>

3.2.2 Mixed Content

[Definition: An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements.] In this case, the types of the child elements may be constrained, but not their order or their number of occurrences:

Mixed-content Declaration
    [51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'    [VC: Proper Group/PE Nesting] [VC: No Duplicate Types]

where the Names give the types of elements that may appear as children. The keyword #PCDATA derives historically from the term "parsed character data."
Examples of mixed content declarations:

    <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
    <!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
    <!ELEMENT b (#PCDATA)>

3.3 Attribute-List Declarations

Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags; thus, the productions used to recognize them appear in 3.1 Start-Tags, End-Tags, and Empty-Element Tags. Attribute-list declarations may be used:

- To define the set of attributes pertaining to a given element type.
- To establish type constraints for these attributes.
- To provide default values for attributes.
[Definition: Attribute-list declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type:]

Attribute-list Declaration
    [52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
    [53] AttDef ::= S Name S AttType S DefaultDecl

The Name in the AttlistDecl rule is the type of an element. At user option, an XML processor may issue a warning if attributes are declared for an element type not itself declared, but this is not an error. The Name in the AttDef rule is the name of the attribute.

When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name in an attribute-list declaration, and at least one attribute definition in each attribute-list declaration. For interoperability, an XML processor may at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.

3.3.1 Attribute Types

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types have varying lexical and semantic constraints.
The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3.3 Attribute-Value Normalization.

Attribute Types
    [54] AttType ::= StringType | TokenizedType | EnumeratedType
    [55] StringType ::= 'CDATA'
    [56] TokenizedType ::= 'ID'          [VC: ID] [VC: One ID per Element Type] [VC: ID Attribute Default]
                         | 'IDREF'       [VC: IDREF]
                         | 'IDREFS'      [VC: IDREF]
                         | 'ENTITY'      [VC: Entity Name]
                         | 'ENTITIES'    [VC: Entity Name]
                         | 'NMTOKEN'     [VC: Name Token]
                         | 'NMTOKENS'    [VC: Name Token]
[Definition: Enumerated attributes can take one of a list of values provided in the declaration]. There are two kinds of enumerated types:

Enumerated Attribute Types
    [57] EnumeratedType ::= NotationType | Enumeration
    [58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'    [VC: Notation Attributes] [VC: One Notation Per Element Type] [VC: No Notation on Empty Element]
    [59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'    [VC: Enumeration]

A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.
For interoperability, the same Nmtoken should not occur more than once in the enumerated attribute types of a single element type.

3.3.2 Attribute Defaults

An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.

Attribute Defaults
    [60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)    [VC: Required Attribute] [VC: Attribute Default Legal] [WFC: No < in Attribute Values] [VC: Fixed Attribute Default]

In an attribute declaration, #REQUIRED means that the attribute must always be provided, #IMPLIED that no default value is provided. [Definition: If the declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value contains the declared default value; the #FIXED keyword states that the attribute must always have the default value. If a default value is declared, when an XML processor encounters an omitted attribute, it is to behave as though the attribute were present with the declared default value.]
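The defaulting behavior, combined with the rule in 3.3 that the first definition of an attribute is binding, can be sketched in Python as below. The `AttDef` record, the function names, and the use of an empty string for "plain default" are all assumptions of this sketch; declarations are taken to be already parsed and values already normalized. Raising `ValueError` stands in for reporting a validity error and is a design choice of the example, not something this specification prescribes.

```python
from typing import Dict, List, NamedTuple, Optional


class AttDef(NamedTuple):
    """One parsed attribute definition (hypothetical representation)."""
    name: str
    att_type: str
    default_kind: str  # "#REQUIRED", "#IMPLIED", "#FIXED", or "" for a plain default
    default_value: Optional[str] = None


def merge_attdefs(attlists: List[List[AttDef]]) -> Dict[str, AttDef]:
    """Merge the attribute-list declarations of one element type, in document order."""
    merged: Dict[str, AttDef] = {}
    for attlist in attlists:
        for attdef in attlist:
            # The first definition of an attribute is binding; later ones are ignored.
            merged.setdefault(attdef.name, attdef)
    return merged


def apply_defaults(specified: Dict[str, str], declared: Dict[str, AttDef]) -> Dict[str, str]:
    """Return the attributes as seen after declared defaults are supplied."""
    result = dict(specified)
    for name, attdef in declared.items():
        if name in result:
            if attdef.default_kind == "#FIXED" and result[name] != attdef.default_value:
                raise ValueError("attribute %r must have its fixed default value" % name)
        elif attdef.default_kind == "#REQUIRED":
            raise ValueError("required attribute %r is missing" % name)
        elif attdef.default_value is not None:
            # Behave as though the attribute were present with the default value.
            result[name] = attdef.default_value
        # "#IMPLIED" with no value: the attribute simply stays absent.
    return result
```

For an element declared with a `type` attribute defaulting to "ordered", `apply_defaults({}, declared)` returns `{"type": "ordered"}`, while an explicitly specified value is left unchanged.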
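Examples of attribute-list declarations:

    <!ATTLIST termdef
              id      ID      #REQUIRED
              name    CDATA   #IMPLIED>
    <!ATTLIST list
              type    (bullets|ordered|glossary)  "ordered">
    <!ATTLIST form
              method  CDATA   #FIXED "POST">

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

1. All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
   - For a character reference, append the referenced character to the normalized value.
   - For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
   - For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
   - For another character, append the character to the normalized value.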
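A minimal Python sketch of this normalization follows, including the additional step for non-CDATA attributes described in the next paragraph. It makes several assumptions: line breaks in the input are already normalized, the `entities` argument maps each entity name to its replacement text (the predefined entities are not built in), and bare "&" characters or other well-formedness errors are not diagnosed. The function name `normalize_attribute_value` is hypothetical.

```python
import re

# Hexadecimal character reference, decimal character reference, entity reference.
_REFERENCE = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([^;&\s]+);")
_WHITE_SPACE = "\x20\x0d\x0a\x09"


def normalize_attribute_value(raw, entities=None, is_cdata=True):
    """Return the normalized value of an attribute given its literal text."""
    entities = entities or {}

    def expand(text, in_progress):
        out = []
        pos = 0
        while pos < len(text):
            ref = _REFERENCE.match(text, pos)
            if ref is not None and ref.group(1) is not None:
                out.append(chr(int(ref.group(1), 16)))  # referenced character itself
                pos = ref.end()
            elif ref is not None and ref.group(2) is not None:
                out.append(chr(int(ref.group(2))))
                pos = ref.end()
            elif ref is not None:
                name = ref.group(3)
                if name in in_progress:
                    raise ValueError("entity %r refers to itself" % name)
                if name not in entities:
                    raise ValueError("undeclared entity %r" % name)
                # Recursively process the replacement text of the entity.
                out.append(expand(entities[name], in_progress | {name}))
                pos = ref.end()
            elif text[pos] in _WHITE_SPACE:
                out.append(" ")  # literal white space becomes #x20
                pos += 1
            else:
                out.append(text[pos])
                pos += 1
        return "".join(out)

    value = expand(raw, frozenset())
    if not is_cdata:
        # Discard leading and trailing #x20 and collapse runs of #x20.
        value = re.sub(" +", " ", value.strip(" "))
    return value
```

For instance, with an entity `d` whose replacement text is the single character #xD, the value `&d;A` normalizes to a space followed by `A` for a CDATA attribute, whereas `&#xD;A` keeps the carriage return.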
If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value. It also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; because the replacement text is recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read should be treated by a non-validating processor as if declared CDATA.

Following are examples of attribute normalization. Given the following declarations:

    <!ENTITY d "&#xD;">
    <!ENTITY a "&#xA;">
    <!ENTITY da "&#xD;&#xA;">

the attribute specifications in the left column below would be normalized to the character sequences of the middle column if the attribute a is declared NMTOKENS and to those of the right column if a is declared CDATA.
Note that the last example is invalid (but well-formed) if a is declared to be of type NMTOKENS.

3.4 Conditional Sections

[Definition: Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.]

Conditional Section
    [61] conditionalSect ::= includeSect | ignoreSect
    [62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'    [VC: Proper Conditional Section/PE Nesting]
    [63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'    [VC: Proper Conditional Section/PE Nesting]
    [64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
    [65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)
Like the internal and external DTD subsets, a conditional section may contain one or more complete declarations, comments, processing instructions, or nested conditional sections, intermingled with white space.

If the keyword of the conditional section is INCLUDE, then the contents of the conditional section are part of the DTD. If the keyword of the conditional section is IGNORE, then the contents of the conditional section are not logically part of the DTD. If a conditional section with a keyword of INCLUDE occurs within a larger conditional section with a keyword of IGNORE, both the outer and the inner conditional sections are ignored. The contents of an ignored conditional section are parsed by ignoring all characters after the "[" following the keyword, except conditional section starts "<![" and ends "]]>", until the matching conditional section end is found. Parameter entity references are not recognized in this process.

If the keyword of the conditional section is a parameter-entity reference, the parameter entity must be replaced by its content before the processor decides whether to include or ignore the conditional section.

An example:

    <!ENTITY % draft 'INCLUDE' >
    <!ENTITY % final 'IGNORE' >

    <![%draft;[
    <!ELEMENT book (comments*, title, body, supplements?)>
    ]]>
    <![%final;[
    <!ELEMENT book (title, body, supplements?)>
    ]]>
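The include/ignore behavior can be sketched in Python as a text-level filter over an external subset. This is a rough illustration under stated assumptions: parameter-entity values are supplied in a dictionary instead of being read from ENTITY declarations, and section ends are found purely by counting "<![" and "]]>" delimiters, which is how ignored sections are scanned but is only an approximation for included sections (a real processor parses their declarations). The function name `resolve_conditional_sections` is hypothetical.

```python
import re

# Section start: '<![' S? keyword S? '[' where the keyword may be a PE reference.
_SECTION_START = re.compile(r"<!\[\s*(%[^;\s]+;|INCLUDE|IGNORE)\s*\[")
_DELIMITER = re.compile(r"<!\[|\]\]>")


def _find_section_end(text, pos):
    """Return the offset just past the ']]>' closing a section whose contents begin at pos."""
    depth = 1
    for delim in _DELIMITER.finditer(text, pos):
        depth += 1 if delim.group() == "<![" else -1
        if depth == 0:
            return delim.end()
    raise ValueError("unterminated conditional section")


def resolve_conditional_sections(text, parameter_entities=None):
    """Drop IGNORE sections and unwrap INCLUDE sections, handling nesting."""
    parameter_entities = parameter_entities or {}
    out = []
    pos = 0
    while True:
        start = _SECTION_START.search(text, pos)
        if start is None:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:start.start()])
        keyword = start.group(1)
        if keyword.startswith("%"):
            # Replace the parameter entity by its content before deciding.
            keyword = parameter_entities[keyword[1:-1]].strip()
        end = _find_section_end(text, start.end())
        if keyword == "INCLUDE":
            inner = text[start.end():end - len("]]>")]
            out.append(resolve_conditional_sections(inner, parameter_entities))
        elif keyword != "IGNORE":
            raise ValueError("unknown conditional section keyword %r" % keyword)
        # For IGNORE nothing is emitted, including any nested INCLUDE sections.
        pos = end
```

Applied to the example above with `{"draft": "INCLUDE", "final": "IGNORE"}`, the result keeps only the declaration of `book` that includes `comments*`; swapping the two values keeps the other declaration instead.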