9.5 Formal Generative Specification


Formally, XML canonicalization creates an XPath node-set data model of the XML document or subset, as described in Section 9.4, and then generates the external XML representation according to the rules given in this section. This generative process starts at the root node and uses the UTF-8 character encoding. Any other processing that produces the same external sequence of octets may be used in place of the specification given here.

Note that the result is not guaranteed to be well-formed XML. For example, the XPath node-set could, in addition to the always-present root node, contain only text or only attribute nodes. Nevertheless, XML canonicalization is most commonly used with node-sets that do yield well-formed XML. Such node-sets are always produced, for example, when either of the standard XML canonicalization XPath expressions given in Section 9.4.1 is applied to well-formed XML.

When the output of XML canonicalization is well formed, applying the same XML canonicalization again to that output does not change it. That is, the operation is idempotent. This property is valuable in a method of canonicalization because, in complex processing, it is difficult to avoid the possibility of canonicalizing data more than once.
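The idempotence property can be checked mechanically. The following is a minimal sketch that assumes Python 3.8 or later and uses the standard library's xml.etree.ElementTree.canonicalize function. That function implements the later C14N 2.0 specification, not the Canonical XML or Exclusive XML Canonicalization covered in this chapter, so it serves here only as a stand-in to illustrate the property. The sample document is invented for the example.

```python
from xml.etree.ElementTree import canonicalize

# Invented sample input: irregular spacing, single-quoted attributes,
# unsorted attributes, and an empty-element tag.
source = "<doc   b='2'  a='1'><empty/>text &amp; more</doc>"

once = canonicalize(source)   # first canonicalization
twice = canonicalize(once)    # canonicalize the canonical form again

# For well-formed output, the second pass should change nothing.
print(once == twice)
```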

9.5.1 The Root Node

The root node is the parent of the entire document or document subset. The output is the result of processing each of the child nodes under the root node in document order and, within element start tags, processing the namespace and attribute nodes having that element as a parent, in the order described below. Processing generates no XML declaration, DTD information, or byte order mark (BOM, an artifact of [Unicode]).

9.5.2 Element Nodes

If the element appears in the node-set, then output the following items in the order given here. If the element is not present in the node-set, then omit items 1, 2, 5, 7, 8, and 9 and output items 3, 4, and 6.

  1. An open angle bracket ("<")

  2. The element name (with the namespace prefix, if one is present)

  3. The result of processing the Element Node Namespace Axis section

  4. The result of processing the Element Node Attribute Axis section

  5. A close angle bracket (">")

  6. The result of processing the child nodes of the element that appear in the node-set (in document order)

  7. An open angle bracket and forward slash ("</")

  8. The element name (with the namespace prefix, if one is present)

  9. A close angle bracket (">")
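The nine items above can be sketched as a small recursive function. This is only an illustration under stated assumptions: the Element class and its field names are hypothetical, the namespace and attribute axes (items 3 and 4) are assumed to be serialized already, and text children are assumed to be escaped already.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Hypothetical stand-in for an element node in the node-set model."""
    name: str                       # qualified name, prefix included
    in_node_set: bool = True
    rendered_namespaces: str = ""   # item 3, assumed already serialized
    rendered_attributes: str = ""   # item 4, assumed already serialized
    children: List[Union["Element", str]] = field(default_factory=list)


def render_element(el: Element) -> str:
    parts = []
    if el.in_node_set:
        parts.append("<" + el.name)            # items 1 and 2
    parts.append(el.rendered_namespaces)       # item 3
    parts.append(el.rendered_attributes)       # item 4
    if el.in_node_set:
        parts.append(">")                      # item 5
    for child in el.children:                  # item 6, document order
        if isinstance(child, Element):
            parts.append(render_element(child))
        else:
            parts.append(child)                # text assumed already escaped
    if el.in_node_set:
        parts.append("</" + el.name + ">")     # items 7, 8, and 9
    return "".join(parts)


doc = Element("a", children=[
    Element("b", in_node_set=False, children=["x"]),
    Element("c", rendered_attributes=' id="1"'),
])
print(render_element(doc))
```

Because element b is not in the node-set, only its child text survives, while the empty element c is written with an explicit start and end tag pair.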

Element Node Namespace Axis

Process each node that appears in the element's namespace axis in alphabetical order by prefix, as described in Section 9.4.3. The processing of each namespace node is described in Section 9.5.4.

Element Node Attribute Axis

Process each node that appears in the element's attribute axis alphabetically by URI and local part, as described in Section 9.4.3. Section 9.5.3 describes the processing of each attribute node.
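The two orderings can be sketched with ordinary sort keys. Section 9.4.3 gives the authoritative ordering rules; this sketch assumes that plain code-point string comparison is an acceptable approximation, models the default namespace as an empty prefix, and uses a hypothetical Attr record.

```python
from typing import List, NamedTuple


class Attr(NamedTuple):
    """Hypothetical attribute record for illustration only."""
    namespace_uri: str   # "" for an attribute that is in no namespace
    local_name: str
    qname: str
    value: str


def sort_namespace_prefixes(prefixes: List[str]) -> List[str]:
    # The default namespace is modeled as "", so it sorts first.
    return sorted(prefixes)


def sort_attributes(attrs: List[Attr]) -> List[Attr]:
    # Namespace URI is the primary key; the local part is the secondary key.
    return sorted(attrs, key=lambda a: (a.namespace_uri, a.local_name))


attrs = [
    Attr("http://z.example", "a", "z:a", "1"),
    Attr("", "b", "b", "2"),
    Attr("http://a.example", "c", "p:c", "3"),
]
print([a.qname for a in sort_attributes(attrs)])
print(sort_namespace_prefixes(["b", "", "a"]))
```

Note that the sort uses the namespace URI, not the prefix, so renaming a prefix does not change the attribute order.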

9.5.3 Attribute Nodes

If the attribute node is not present in the node-set, output nothing. If the attribute node appears in the node-set, output the following items in the order given below. Namespace declarations are not considered to be attributes; Section 9.5.4 describes their handling. Special considerations apply to attributes in the xml Namespace.

  1. A space (x20) character

  2. The attribute name (with the namespace prefix, if one is present)

  3. The two-character sequence equals sign, double quote ("="")

  4. The XPath string value of the node modified as described below

  5. A double quote (""")

Modify the string value in item 4 with the following substitutions:

  • Replace ampersand ("&") with the five-byte character reference for ampersand ("&amp;").

  • Replace open angle bracket ("<") with the four-byte character reference for open angle bracket ("&lt;").

  • Replace double quote (""") with the six-byte character reference for double quote ("&quot;").

  • Replace horizontal tab (x09) with the five-byte character reference for horizontal tab ("&#x9;").

  • Replace new line (x0A) with the five-byte character reference for new line ("&#xA;").

  • Replace carriage return (x0D) with the five-byte character reference for carriage return ("&#xD;").
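The attribute output rules and substitutions above can be sketched as follows. The function names are hypothetical. The one ordering constraint worth noting is that the ampersand must be replaced first, so that the references introduced by later substitutions are not escaped a second time.

```python
def escape_attribute_value(value: str) -> str:
    # Ampersand goes first; otherwise the "&" in "&lt;" and the other
    # references added below would itself be escaped.
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        ('"', "&quot;"),
        ("\t", "&#x9;"),
        ("\n", "&#xA;"),
        ("\r", "&#xD;"),
    ]
    for raw, reference in replacements:
        value = value.replace(raw, reference)
    return value


def render_attribute(qname: str, value: str) -> str:
    # Items 1 through 5: space, name, equals sign and quote, value, quote.
    return " " + qname + '="' + escape_attribute_value(value) + '"'


print(render_attribute("title", 'a<b & "c"\t'))
```

The close angle bracket is not in the substitution list for attribute values, so it passes through unchanged.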

Special Handling of Attributes in the xml Namespace

The preceding description is accurate for Exclusive XML Canonicalization. In Canonical XML, however, we want to include ancestor environment characteristics that might affect the XML being canonicalized. For this reason, Canonical XML imports to each apex element (i.e., each element that is a child of the root node) all ancestor xml namespace attributes (e.g., xml:lang, xml:space, or xml:base) that are in scope and do not already appear at that apex element. Such imported attributes are output as described earlier in this section for other attributes.
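The import of xml namespace attributes in Canonical XML can be sketched as a merge over ancestor attribute sets. The function name and the dictionary representation are hypothetical, and the sketch assumes that "in scope" means the value from the nearest ancestor that sets the attribute.

```python
from typing import Dict, List


def import_xml_attributes(apex_attrs: Dict[str, str],
                          ancestors: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the apex element's attributes plus inherited xml:* attributes.

    `ancestors` lists each ancestor's attributes, nearest ancestor first,
    so the first value found for a name is the one in scope.
    """
    result = dict(apex_attrs)
    for attrs in ancestors:
        for name, value in attrs.items():
            # Import only xml namespace attributes that are not already set
            # at the apex element or by a nearer ancestor.
            if name.startswith("xml:") and name not in result:
                result[name] = value
    return result


merged = import_xml_attributes(
    {"id": "1", "xml:lang": "en"},
    [
        {"xml:space": "preserve", "class": "x"},
        {"xml:lang": "fr", "xml:space": "default",
         "xml:base": "http://example.org/"},
    ],
)
print(merged)
```

The apex element keeps its own xml:lang, takes xml:space from its parent, and takes xml:base from its grandparent; ordinary attributes such as class are never imported.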

9.5.4 Namespace Nodes

If a namespace node is not part of the node-set, output nothing. If it is part of the node-set, however, special criteria are applied to determine whether outputting the namespace node will have an effect. Nothing is output if the namespace node would have no effect.

Keep in mind that on input, XPath propagates a namespace declaration down to all of its descendants, creating namespace nodes for each one, until it reaches a node where the same prefix is given a different namespace value. (The same thing happens with the default namespace; it is just like a namespace assigned to the null prefix, except that when you use the null prefix, the colon is also omitted.)

XML canonicalization considers a namespace node to have no effect at the current node if it meets any of the following criteria:

  • It is a duplicate namespace declaration (same prefix and URI) of a namespace declaration at the first ancestor element in the node-set above the current node.

  • It is a declaration that the default namespace is a null URI and the current node is the apex output node.

  • It is a declaration of the "xml" prefix, as that should always be "http://www.w3.org/XML/1998/namespace".
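The three criteria above can be expressed as a single predicate. This is a sketch with hypothetical parameter names: the default namespace is modeled as an empty prefix, a null URI as an empty string, and the caller is assumed to have already looked up the URI bound to the same prefix at the first ancestor element in the node-set.

```python
from typing import Optional


def has_no_effect(prefix: str, uri: str,
                  nearest_ancestor_uri: Optional[str],
                  at_apex: bool) -> bool:
    """Return True if the namespace node would have no effect here.

    nearest_ancestor_uri is the URI declared for this prefix at the first
    ancestor element in the node-set, or None if there is no such declaration.
    """
    # Criterion 1: duplicate of the declaration at the nearest ancestor.
    if nearest_ancestor_uri is not None and nearest_ancestor_uri == uri:
        return True
    # Criterion 2: null default namespace at the apex output node.
    if prefix == "" and uri == "" and at_apex:
        return True
    # Criterion 3: the xml prefix always has the same fixed binding.
    if prefix == "xml":
        return True
    return False


print(has_no_effect("p", "http://example.org/p", "http://example.org/p", False))
print(has_no_effect("p", "http://example.org/p", "http://example.org/q", False))
```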

If a namespace node does have effect and appears in the node-set, it is output in the same way as an attribute node described in Section 9.5.3, with the following exceptions:

  • The namespace prefix fills the role of an attribute name.

  • "xmlns:" fills the role of a namespace prefix.

  • For the default namespace, "xmlns" fills the role of an attribute name with no prefix.
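A namespace node that has an effect is therefore written much like an attribute. The sketch below uses hypothetical function names, models the default namespace as an empty prefix, and repeats the attribute value substitutions of Section 9.5.3 so that it stands alone.

```python
def escape_attribute_value(value: str) -> str:
    # Same substitutions as for attribute values; ampersand goes first.
    for raw, reference in (("&", "&amp;"), ("<", "&lt;"), ('"', "&quot;"),
                           ("\t", "&#x9;"), ("\n", "&#xA;"), ("\r", "&#xD;")):
        value = value.replace(raw, reference)
    return value


def render_namespace(prefix: str, uri: str) -> str:
    # "xmlns:" acts as the prefix and the namespace prefix as the name;
    # the default namespace is written as a bare "xmlns".
    name = "xmlns:" + prefix if prefix else "xmlns"
    return " " + name + '="' + escape_attribute_value(uri) + '"'


print(render_namespace("ds", "http://www.w3.org/2000/09/xmldsig#"))
print(render_namespace("", "http://example.org/?a=1&b=2"))
```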

Which namespace nodes are considered to affect the output differs between Canonical XML and Exclusive XML Canonicalization as follows:

  • In Canonical XML, all namespace nodes at an apex element are considered to have an effect and appear in the node-set unless they are a declaration of the xml prefix or of a null default namespace URI.

  • In Exclusive XML Canonicalization, namespace nodes are considered to have an effect and appear in the node-set at any element where their namespace prefix is visible in the element name or the name of an attribute, unless they are a declaration of the xml prefix or of a null default namespace URI at an apex. Occurrences of the same namespace declaration on ancestor nodes have no effect unless they are accompanied by visible use of the prefix.

  • Namespace nodes are treated as specified in Canonical XML, rather than Exclusive XML Canonicalization, if their prefixes appear on the inclusive namespace prefix list parameter to Exclusive XML Canonicalization.

9.5.5 Text Nodes

If the text node is not part of the node-set, output nothing. If it appears in the node-set, output the XPath string value of the text node with the following substitutions:

  • Replace ampersand ("&") with the five-byte character reference for ampersand ("&amp;").

  • Replace open angle bracket ("<") with the four-byte character reference for open angle bracket ("&lt;").

  • Replace close angle bracket (">") with the four-byte character reference for close angle bracket ("&gt;").

  • Replace carriage return (x0D) with the five-byte character reference for carriage return ("&#xD;").
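The text node substitutions can be sketched as follows; the function name is hypothetical. Compared with attribute values, the close angle bracket is escaped, while the double quote, tab, and new line are left alone.

```python
def escape_text(value: str) -> str:
    # Ampersand goes first so the references added afterward
    # are not escaped a second time.
    for raw, reference in (("&", "&amp;"), ("<", "&lt;"),
                           (">", "&gt;"), ("\r", "&#xD;")):
        value = value.replace(raw, reference)
    return value


print(escape_text('1 < 2 && "q" > 0\r\n'))
```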

9.5.6 Processing Instruction Nodes

If the processing instruction (PI) does not appear in the node-set, output nothing. If the PI is part of the node-set, output the following in the order given:

  1. A new line character (x0A) if the PI is a child of the root node, there is at least one element node child of the root node, and the PI node appears after the last element node child of the root in document order

  2. The two-character opening PI sequence less than, question mark ("<?")

  3. The PI target name of the PI node

  4. If the string value of the PI node is not null, a space character (x20) and then that string value encoded as described in Section 9.5.5 for a text node

  5. The two-character closing PI sequence question mark, greater than ("?>")

  6. A new line character (x0A) if the PI is a child of the root node, there is at least one element node child of the root node, and the PI node appears before the first element node child of the root in document order
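The six items above can be sketched as one function. The names are hypothetical, and the three-part condition on root-level PIs is collapsed into a position argument that the caller is assumed to compute: "before" for a root-level PI ahead of the first root-level element, "after" for one following the last root-level element, and "inside" for every other case.

```python
def escape_text(value: str) -> str:
    # Text node substitutions from Section 9.5.5; ampersand goes first.
    for raw, reference in (("&", "&amp;"), ("<", "&lt;"),
                           (">", "&gt;"), ("\r", "&#xD;")):
        value = value.replace(raw, reference)
    return value


def render_pi(target: str, value: str, position: str = "inside") -> str:
    body = "<?" + target                      # items 2 and 3
    if value:
        body += " " + escape_text(value)      # item 4
    body += "?>"                              # item 5
    if position == "after":
        return "\n" + body                    # item 1
    if position == "before":
        return body + "\n"                    # item 6
    return body


print(repr(render_pi("xml-stylesheet", 'href="a.xsl"', "before")))
print(repr(render_pi("end", "", "after")))
```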

9.5.7 Comment Nodes

If the comment does not appear in the node-set, output nothing. If the comment is part of the node-set, output the following in the order given:

  1. A new line character (x0A) if the comment is a child of the root node, there is at least one element node child of the root node, and the comment node appears after the last element node child of the root in document order

  2. The four-character opening comment sequence of less than, exclamation point, hyphen, hyphen ("<!--")

  3. The XPath string value of the comment node encoded as described in Section 9.5.5 for a text node

  4. The three-character closing comment sequence of hyphen, hyphen, greater than ("-->")

  5. A new line character (x0A) if the comment is a child of the root node, there is at least one element node child of the root node, and the comment node appears before the first element node child of the root in document order
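The placement of the new line characters depends on where a comment sits relative to the root-level elements. The sketch below derives that from document order. The function names and the (kind, payload) pair representation are hypothetical, and element payloads are assumed to be serialized already.

```python
from typing import List, Tuple


def escape_text(value: str) -> str:
    # Text node substitutions from Section 9.5.5; ampersand goes first.
    for raw, reference in (("&", "&amp;"), ("<", "&lt;"),
                           (">", "&gt;"), ("\r", "&#xD;")):
        value = value.replace(raw, reference)
    return value


def render_root_children(children: List[Tuple[str, str]]) -> str:
    """children: (kind, payload) pairs in document order, where kind is
    "element" (payload already serialized) or "comment" (string value)."""
    element_positions = [i for i, (kind, _) in enumerate(children)
                         if kind == "element"]
    out = []
    for i, (kind, payload) in enumerate(children):
        if kind == "element":
            out.append(payload)
            continue
        text = "<!--" + escape_text(payload) + "-->"   # items 2, 3, and 4
        if element_positions and i > element_positions[-1]:
            text = "\n" + text                         # item 1
        elif element_positions and i < element_positions[0]:
            text = text + "\n"                         # item 5
        out.append(text)
    return "".join(out)


result = render_root_children([
    ("comment", " top "),
    ("element", "<doc></doc>"),
    ("comment", " end "),
])
print(repr(result))
```

When the root has no element children, neither new line is added, which matches the second condition in items 1 and 5.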



Secure XML: The New Syntax for Signatures and Encryption
ISBN: 0201756056
EAN: 2147483647
Year: 2005
Pages: 186
