6 Notation

The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form

 symbol ::= expression 

Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lower case letter. Literal strings are quoted.

Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

#xN

where N is a hexadecimal integer, the expression matches the character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated. The number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant for XML.

[a-zA-Z], [#xN-#xN]

matches any Char with a value in the range(s) indicated (inclusive).

[abc], [#xN#xN#xN]

matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.

[^a-z], [^#xN-#xN]

matches any Char with a value outside the range indicated.

[^abc], [^#xN#xN#xN]

matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.

"string"

matches a literal string matching that given inside the double quotes.

'string'

matches a literal string matching that given inside the single quotes.
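As an illustration of the character notation above, the following is a minimal Python sketch, not part of the specification. The helper names char_from_hex and class_to_regex are hypothetical. It assumes that a bracket expression can be carried over to Python's re module by rewriting each #xN token as a \UXXXXXXXX escape. It does not restrict matches to the Char production, so a negated class here can match code points that XML itself would reject.

```python
import re

def char_from_hex(token: str) -> str:
    """Return the character denoted by an EBNF '#xN' token (hypothetical helper)."""
    if not token.startswith("#x"):
        raise ValueError(f"not a #xN token: {token!r}")
    # int() ignores leading zeros, mirroring the rule that they are insignificant.
    return chr(int(token[2:], 16))

def class_to_regex(ebnf_class: str) -> str:
    """Rewrite the #xN tokens of a bracket expression as Python regex escapes."""
    return re.sub(
        r"#x([0-9A-Fa-f]+)",
        lambda m: "\\U{:08x}".format(int(m.group(1), 16)),
        ebnf_class,
    )

printable = re.compile(class_to_regex("[#x20-#x7E]"))
not_digit = re.compile(class_to_regex("[^#x30-#x39]"))
```

With these definitions, printable.fullmatch("A") succeeds while printable.fullmatch("\n") does not, and "#x41" and "#x0041" both denote the same character.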

These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:

(expression)

expression is treated as a unit and may be combined as described in this list.

A?

matches A or nothing; optional A.

A B

matches A followed by B. This operator has higher precedence than alternation; thus A B | C D is identical to (A B) | (C D).

A | B

matches A or B but not both.

A - B

matches any string that matches A but does not match B.

A+

matches one or more occurrences of A. Concatenation has higher precedence than alternation; thus A+ | B+ is identical to (A+) | (B+).

A*

matches zero or more occurrences of A. Concatenation has higher precedence than alternation; thus A* | B* is identical to (A*) | (B*).
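The precedence rules and the difference operator can be tried out with Python's re module, whose alternation bar and postfix operators happen to bind the same way as the notation described here. This is only an analogy under that assumption. The helpers matches and difference are hypothetical names, and difference applies the "matches A but does not match B" rule to a whole string.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

def difference(a: str, b: str, text: str) -> bool:
    """A - B: text matches A but does not match B."""
    return matches(a, text) and not matches(b, text)

# Concatenation binds tighter than alternation:
# "A B | C D" reads as "(A B) | (C D)", never as "A (B | C) D".
flat = "ab|cd"
grouped = "(?:ab)|(?:cd)"
wrong_reading = "a(?:b|c)d"

# Postfix operators apply to the nearest expression: "A+ | B+" is "(A+) | (B+)".
repeated = "a+|b+"
```

Here flat and grouped accept exactly "ab" and "cd", whereas wrong_reading accepts "abd" and "acd" instead. A call such as difference("[a-z]+", "xml", "xml") is false, while the same call with "xmlns" as the text is true.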

Other notations used in the productions are:

/* ... */

comment.

[ wfc: ... ]

well-formedness constraint; this identifies by name a constraint on well-formed documents associated with a production.

[ vc: ... ]

validity constraint; this identifies by name a constraint on valid documents associated with a production.
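A tool that reads productions written in this notation would usually need to separate the matching expression from its comments and constraint names. The sketch below shows one way to do that in Python. The function split_annotations and the sample right-hand side are hypothetical. It assumes that comment markers and constraint brackets never appear inside quoted literals, which a real grammar reader would have to verify.

```python
import re

# Matches either a /* ... */ comment or a [ wfc: ... ] / [ vc: ... ] annotation.
ANNOTATION = re.compile(
    r"/\*.*?\*/|\[\s*(wfc|vc)\s*:\s*(.*?)\s*\]",
    re.S | re.I,
)

def split_annotations(rhs: str):
    """Return (expression, constraints) for the right-hand side of a rule."""
    constraints = [
        (m.group(1).lower(), m.group(2))
        for m in ANNOTATION.finditer(rhs)
        if m.group(1)  # comments have no constraint kind, so skip them
    ]
    # Drop the annotations and collapse the whitespace they leave behind.
    expression = " ".join(ANNOTATION.sub(" ", rhs).split())
    return expression, constraints

# Hypothetical right-hand side, written in the notation of this section.
sample = "'&#' [0-9]+ ';' /* decimal form */ [ wfc: Legal Character ]"
expression, constraints = split_annotations(sample)
```

For the sample above, expression keeps only the three matching terms and constraints holds one pair naming the well-formedness constraint. The character class [0-9] is left alone because it does not begin with wfc: or vc:.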



Real World XML
Real World XML (2nd Edition)
ISBN: 0735712867
Year: 2005
Pages: 440
Authors: Steve Holzner
