The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form symbol ::= expression Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lower case letter. Literal strings are quoted. Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters : #xN where N is a hexadecimal integer, the expression matches the character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated. The number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant for XML. [a-zA-Z], [#xN-#xN] matches any Char with a value in the range(s) indicated (inclusive). [abc], [#xN#xN#xN] matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets. [^a-z], [^#xN-#xN] matches any Char with a value outside the range indicated. [^abc], [^#xN#xN#xN] matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets. "string" matches a literal string matching that given inside the double quotes. 'string' matches a literal string matching that given inside the single quotes. These symbols may be combined to match more complex patterns as follows , where A and B represent simple expressions: (expression) expression is treated as a unit and may be combined as described in this list. A? matches A or nothing; optional A . A B matches A followed by B . This operator has higher precedence than alternation ; thus A B C D is identical to (A B) (C D) . A B matches A or B but not both. A - B matches any string that matches A but does not match B . A+ matches one or more occurrences of A. Concatenation has higher precedence than alternation; thus A+ B+ is identical to (A+) (B+) . A* matches zero or more occurrences of A. Concatenation has higher precedence than alternation; thus A* B* is identical to (A*) (B*) . Other notations used in the productions are: /* ... */ comment. [ wfc: ... ] well- formedness constraint; this identifies by name a constraint on well- formed documents associated with a production. [ vc: ... ] validity constraint; this identifies by name a constraint on valid documents associated with a production. |