Formally, XML canonicalization is defined as creating an XPath node-set data model of the XML document or document subset, as described in Section 9.4, and then generating the external XML representation according to the rules given in this section. This generative process starts at the root node and produces UTF-8 encoded output. Any other processing that produces the same sequence of octets may be used in place of the procedure specified here.
Note that the result is not guaranteed to be well-formed XML. For example, in addition to the always-present root node, the XPath node-set could contain only text nodes or only attribute nodes. Nevertheless, XML canonicalization is most commonly used with node-sets that do yield well-formed XML. Such node-sets are produced, for example, whenever either of the standard XML canonicalization XPath expressions given in Section 9.4.1 is applied to well-formed XML.
When the output of XML canonicalization is well-formed, applying the same XML canonicalization again to that output does not change it; that is, the operation is idempotent. This property is valuable in a canonicalization method because, in complex processing, it is difficult to rule out canonicalizing the same data more than once.
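Idempotence is easy to observe in practice. The sketch below uses Python's standard-library `xml.etree.ElementTree.canonicalize` (available since Python 3.8). Note that it implements C14N 2.0, a later W3C variant, not the Canonical XML or Exclusive XML Canonicalization described in this chapter, so it only illustrates the property, not the exact rules of this section.

```python
import xml.etree.ElementTree as ET

# A small, well-formed input with unsorted attributes, extra
# whitespace inside the start tag, and an empty-element tag.
source = '<a b="2"   a="1"><c/></a>'

once = ET.canonicalize(source)   # C14N 2.0, not Canonical XML 1.0
twice = ET.canonicalize(once)    # canonicalize the canonical output again

print(once)           # <a a="1" b="2"><c></c></a>
print(once == twice)  # True: the second pass changes nothing
```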
9.5.1 The Root Node
The root node is the parent of the entire document or document subset. The output is produced by processing each child node of the root node in document order and, within each element's start tag, processing the namespace and attribute nodes whose parent is that element, in the order described below. This processing generates no XML declaration, no DTD information, and no byte order mark (BOM, an artifact of [Unicode]).
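A minimal sketch of the root node's contribution, assuming each child of the root node has already been rendered to a string (the `encode_canonical` function and its input list are hypothetical). It concatenates the children in document order and encodes the result as UTF-8 without adding an XML declaration or a BOM. In Python the point to watch is the codec name, since `utf-8-sig` would prepend a BOM.

```python
import codecs

def encode_canonical(children_output):
    """Concatenate the rendered children of the root node, in document
    order, and encode them as UTF-8 with no BOM and no XML declaration.

    children_output is a hypothetical list of already-canonicalized
    strings, one per child of the root node that is in the node-set.
    """
    text = "".join(children_output)
    # The plain "utf-8" codec never writes a BOM; "utf-8-sig" would
    # prepend codecs.BOM_UTF8 (EF BB BF), which must not appear here.
    return text.encode("utf-8")

out = encode_canonical(['<?x y?>\n', '<doc>caf\u00e9</doc>'])
print(out)                               # b'<?x y?>\n<doc>caf\xc3\xa9</doc>'
print(out.startswith(codecs.BOM_UTF8))   # False
```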
9.5.2 Element Nodes
If the element appears in the node-set, then output the following items in the order given here. If the element is not present in the node-set, then omit items 1, 2, 5, 7, 8, and 9 and output items 3, 4, and 6.
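A minimal sketch of the element rule, assuming the nine items are those of the W3C Canonical XML 1.0 Recommendation: (1) `<`, (2) the element's QName, (3) the namespace axis, (4) the attribute axis, (5) `>`, (6) the child nodes, (7) `</`, (8) the QName again, and (9) `>`. The `Element` class and its fields are hypothetical; the namespace, attribute, and child output is taken as already rendered.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Element:
    # Hypothetical, simplified element model for illustration only.
    qname: str
    in_nodeset: bool
    ns_output: List[str] = field(default_factory=list)     # item 3, rendered
    attr_output: List[str] = field(default_factory=list)   # item 4, rendered
    child_output: List[str] = field(default_factory=list)  # item 6, rendered

def render_element(e: Element) -> str:
    items = [
        "<",                      # 1
        e.qname,                  # 2
        "".join(e.ns_output),     # 3
        "".join(e.attr_output),   # 4
        ">",                      # 5
        "".join(e.child_output),  # 6
        "</",                     # 7
        e.qname,                  # 8
        ">",                      # 9
    ]
    if not e.in_nodeset:
        # Omit items 1, 2, 5, 7, 8, 9; keep items 3, 4, 6.
        items = [items[2], items[3], items[5]]
    return "".join(items)

print(render_element(Element("doc", True, [' xmlns="urn:x"'], [' id="1"'], ["text"])))
# <doc xmlns="urn:x" id="1">text</doc>
print(render_element(Element("doc", False, [], [], ["text"])))  # text
```

Because items 5 through 9 are always emitted for an element in the node-set, an empty element comes out as a start-tag/end-tag pair such as `<e></e>`, never as `<e/>`.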
Element Node Namespace Axis
Process each node that appears in the element's namespace axis in alphabetical order by prefix, as described in Section 9.4.3. The processing of each namespace node is described in Section 9.5.4.
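A minimal sketch of the namespace-axis ordering, assuming, as in Canonical XML 1.0, that "alphabetical" means lexicographic by Unicode code point (so uppercase letters sort before lowercase ones) and that the default namespace, having an empty prefix, sorts first. Namespace nodes are modeled as hypothetical `(prefix, uri)` pairs.

```python
def order_namespace_nodes(ns_nodes):
    """Sort (prefix, uri) pairs by prefix.

    The default namespace is represented by the empty prefix "", which
    sorts first. Python compares str values by code point, which gives
    the lexicographic (not locale-aware) ordering assumed here.
    """
    return sorted(ns_nodes, key=lambda node: node[0])

nodes = [("xsl", "http://www.w3.org/1999/XSL/Transform"),
         ("", "urn:default"),
         ("a", "urn:a"),
         ("B", "urn:b")]
print([p for p, _ in order_namespace_nodes(nodes)])  # ['', 'B', 'a', 'xsl']
```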
Element Node Attribute Axis
Process each node that appears in the element's attribute axis alphabetically by URI and local part, as described in Section 9.4.3. Section 9.5.3 describes the processing of each attribute node.
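A minimal sketch of the attribute-axis ordering: namespace URI is the primary key, local name the secondary key, and an attribute in no namespace (empty URI) sorts first. The `(uri, local_name, qname, value)` tuples are a hypothetical model.

```python
def order_attribute_nodes(attrs):
    """Sort attribute nodes by namespace URI, then by local name.

    Each attribute is a hypothetical (uri, local_name, qname, value)
    tuple; uri is "" for an attribute in no namespace, so those come
    first. The prefix plays no part in the ordering.
    """
    return sorted(attrs, key=lambda a: (a[0], a[1]))

attrs = [
    ("urn:z", "attr", "a:attr", "1"),
    ("", "b", "b", "2"),
    ("urn:a", "attr", "z:attr", "3"),
    ("", "a", "a", "4"),
]
print([a[2] for a in order_attribute_nodes(attrs)])
# ['a', 'b', 'z:attr', 'a:attr']
```

Note that `z:attr` precedes `a:attr`: its namespace URI (`urn:a`) sorts before `urn:z`, whatever the prefixes happen to be.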
9.5.3 Attribute Nodes
If the attribute node is not present in the node-set, output nothing. If it appears in the node-set, output the following items in the order given below. Namespace declarations are not considered to be attributes; Section 9.5.4 describes their handling. Special considerations, described below, apply to attributes in the xml namespace.
Modify the string value in item 4 with the following substitutions:
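A minimal sketch of attribute output, assuming the items and substitutions are those of the W3C Canonical XML 1.0 Recommendation: a space, the QName, `="`, the modified value, and a closing `"`, where the value has `&`, `<`, and `"` replaced by `&amp;`, `&lt;`, and `&quot;`, and tab, line feed, and carriage return replaced by `&#x9;`, `&#xA;`, and `&#xD;`. The function names are hypothetical.

```python
def escape_attribute_value(value: str) -> str:
    # "&" must be replaced first so later substitutions are not re-escaped.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;")
                 .replace("\t", "&#x9;")
                 .replace("\n", "&#xA;")
                 .replace("\r", "&#xD;"))

def render_attribute(qname: str, value: str) -> str:
    # space, QName, '="', modified value, '"'
    return " " + qname + '="' + escape_attribute_value(value) + '"'

print(render_attribute("title", 'Tom & "Jerry" <3\tx\n'))
# prints:  title="Tom &amp; &quot;Jerry&quot; &lt;3&#x9;x&#xA;"
```

The `>` character is left alone in attribute values. Because an XML parser normalizes literal tabs and line breaks in attribute values to spaces, the last three substitutions in practice affect only whitespace that was written as a character reference.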
Special Handling of Attributes in the xml Namespace
The preceding description is accurate for Exclusive XML Canonicalization. In Canonical XML, however, we want to include characteristics of the ancestor environment that might affect the XML being canonicalized. For this reason, Canonical XML imports into each apex element (i.e., each element that is a child of the root node) all in-scope ancestor xml namespace attributes (e.g., xml:lang, xml:space, or xml:base) that do not already appear on that apex element. These imported attributes are output in the same way as the element's other attributes, as described earlier in this section.
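A minimal sketch of this import step for Canonical XML, using a hypothetical model in which attributes are QName-to-value maps and the ancestors are listed nearest first, so that the nearest in-scope value wins. Attributes the apex element already carries are never overridden.

```python
from typing import Dict, List

XML_ATTR_PREFIX = "xml:"

def import_xml_attributes(apex_attrs: Dict[str, str],
                          ancestors: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the apex element's attributes plus any in-scope xml:*
    attributes inherited from its ancestors (hypothetical model).

    apex_attrs maps QName -> value for the apex element itself.
    ancestors lists each ancestor's attribute map, nearest first.
    """
    result = dict(apex_attrs)
    for attrs in ancestors:                 # nearest first, so it wins
        for qname, value in attrs.items():
            if qname.startswith(XML_ATTR_PREFIX) and qname not in result:
                result[qname] = value
    return result

merged = import_xml_attributes(
    {"id": "sig1", "xml:lang": "fr"},
    [{"xml:space": "preserve"},
     {"xml:lang": "en", "xml:space": "default", "type": "x"}],
)
print(merged)  # {'id': 'sig1', 'xml:lang': 'fr', 'xml:space': 'preserve'}
```

The merged map is then output like any other attribute set, so the imported attributes take their place in the usual URI and local-name ordering.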
9.5.4 Namespace Nodes
If a namespace node is not part of the node-set, output nothing. If it is part of the node-set, however, special criteria are applied to determine whether outputting the namespace node will have an effect. Nothing is output if the namespace node would have no effect.
Keep in mind that on input, XPath propagates a namespace declaration down to all of the declaring element's descendants, creating a namespace node on each one, until it reaches an element where the same prefix is declared with a different namespace value. (The same thing happens with the default namespace, which behaves like a namespace bound to the null prefix, except that, because the prefix is null, no colon appears.)
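A minimal sketch of this propagation, computing the namespace nodes an element receives from the declarations on its ancestor-or-self chain. The model is hypothetical: each element's declarations are a prefix-to-URI map, with `""` standing for the default namespace. It also reflects two XPath 1.0 facts: every element has a namespace node for the `xml` prefix, and `xmlns=""` removes an inherited default namespace rather than creating a node.

```python
from typing import Dict, List

def in_scope_namespaces(declarations: List[Dict[str, str]]) -> Dict[str, str]:
    """Compute the namespace nodes XPath gives an element.

    declarations lists the xmlns declarations made on each element from
    the document element down to the element of interest (hypothetical
    model): prefix -> URI, with "" standing for the default namespace.
    """
    scope = {"xml": "http://www.w3.org/XML/1998/namespace"}  # always present
    for decls in declarations:
        for prefix, uri in decls.items():
            if prefix == "" and uri == "":
                scope.pop("", None)   # xmlns="" removes the default namespace
            else:
                scope[prefix] = uri   # a redeclaration overrides the inherited value
    return scope

chain = [
    {"": "urn:outer", "p": "urn:p1"},   # <doc xmlns="urn:outer" xmlns:p="urn:p1">
    {"p": "urn:p2"},                    #   <child xmlns:p="urn:p2">
    {"": ""},                           #     <leaf xmlns="">
]
print(in_scope_namespaces(chain))
# {'xml': 'http://www.w3.org/XML/1998/namespace', 'p': 'urn:p2'}
```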
XML canonicalization considers a namespace node to have no effect at the current node if it meets any of the following criteria:
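A simplified sketch of the "no effect" test for Canonical XML, assuming the rules of the W3C Canonical XML 1.0 Recommendation: a namespace node has no effect if the nearest ancestor element in the node-set already has a namespace node in the node-set with the same prefix and URI, and the node binding the `xml` prefix to its fixed URI is never output. It also models the one case where output is needed although no namespace node exists: emitting `xmlns=""` to undeclare a non-empty default namespace in effect at that ancestor. The maps and function name are hypothetical.

```python
from typing import Dict, Optional

XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"

def namespaces_with_effect(element_ns: Dict[str, str],
                           ancestor_ns: Optional[Dict[str, str]],
                           element_in_nodeset: bool) -> Dict[str, str]:
    """Select the namespace nodes worth outputting (simplified sketch).

    element_ns:  prefix -> URI for the element's namespace nodes in the node-set.
    ancestor_ns: the same map for the nearest ancestor element in the
                 node-set, or None if there is no such ancestor.
    """
    ancestor_ns = ancestor_ns or {}
    out = {}
    for prefix, uri in element_ns.items():
        if prefix == "xml" and uri == XML_NAMESPACE:
            continue                      # the xml prefix is never declared
        if ancestor_ns.get(prefix) == uri:
            continue                      # already in effect from the ancestor
        out[prefix] = uri
    # Undeclare an inherited non-empty default namespace with xmlns="".
    if (element_in_nodeset and "" not in element_ns
            and ancestor_ns.get("", "") != ""):
        out[""] = ""
    return out

anc = {"xml": XML_NAMESPACE, "": "urn:d", "p": "urn:p"}
elem = {"xml": XML_NAMESPACE, "p": "urn:p", "q": "urn:q"}
print(namespaces_with_effect(elem, anc, True))  # {'q': 'urn:q', '': ''}
```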
If a namespace node does have effect and appears in the node-set, it is output in the same way as an attribute node described in Section 9.5.3, with the following exceptions:
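A minimal sketch of rendering a namespace node, assuming, as in Canonical XML 1.0, that the exceptions amount to naming: the default namespace node is written with the name `xmlns`, and any other namespace node with `xmlns:` followed by its prefix. The value is escaped exactly as for an attribute. Function names are hypothetical.

```python
def escape_attribute_value(value: str) -> str:
    # Same substitutions as for attribute values; "&" goes first.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;")
                 .replace("\t", "&#x9;")
                 .replace("\n", "&#xA;")
                 .replace("\r", "&#xD;"))

def render_namespace_node(prefix: str, uri: str) -> str:
    """Output a namespace node like an attribute named xmlns:prefix,
    or plain xmlns for the default namespace (empty prefix)."""
    name = "xmlns:" + prefix if prefix else "xmlns"
    return " " + name + '="' + escape_attribute_value(uri) + '"'

print(render_namespace_node("ds", "http://www.w3.org/2000/09/xmldsig#"))
#  xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
print(render_namespace_node("", ""))   #  xmlns=""
```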
Which namespace nodes are considered to affect the output differs between Canonical XML and Exclusive XML Canonicalization as follows:
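The key difference can be sketched through the Exclusive XML Canonicalization notion of a "visibly utilized" prefix: one used in the element's own QName (an unprefixed element uses the default namespace) or in one of its attributes' QNames (an unprefixed attribute uses no namespace at all). Canonical XML considers every in-scope namespace node instead. The optional `inclusive_prefixes` argument is a hypothetical stand-in for the InclusiveNamespaces PrefixList, whose prefixes Exclusive XML Canonicalization treats as Canonical XML would.

```python
from typing import Iterable, Set

def prefix_of(qname: str) -> str:
    # "ds:Signature" -> "ds"; an unprefixed name maps to "" (default namespace).
    return qname.split(":", 1)[0] if ":" in qname else ""

def visibly_utilized(element_qname: str,
                     attribute_qnames: Iterable[str],
                     inclusive_prefixes: Iterable[str] = ()) -> Set[str]:
    """Prefixes whose namespace nodes Exclusive XML Canonicalization
    would consider for output on this element (sketch)."""
    used = {prefix_of(element_qname)}   # an unprefixed element uses the default
    for name in attribute_qnames:
        if ":" in name:                 # unprefixed attributes use no namespace
            used.add(prefix_of(name))
    used.update(inclusive_prefixes)
    return used

print(sorted(visibly_utilized("ds:Signature", ["Id", "wsu:Id"])))  # ['ds', 'wsu']
print(sorted(visibly_utilized("Body", ["a:b"])))                   # ['', 'a']
```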
9.5.5 Text Nodes
If the text node is not part of the node-set, output nothing. If it appears in the node-set, output the XPath string value of the text node with the following substitutions:
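A minimal sketch of text-node output, assuming the substitutions of the W3C Canonical XML 1.0 Recommendation: `&`, `<`, and `>` become `&amp;`, `&lt;`, and `&gt;`, and a carriage return (#xD) becomes `&#xD;`. Quotation marks and line feeds are left unchanged. The function name is hypothetical.

```python
def escape_text(value: str) -> str:
    # Replace "&" first so the other entity references are not re-escaped.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace(">", "&gt;")
                 .replace("\r", "&#xD;"))

print(repr(escape_text('if a < b && b > c: "ok"\r\n')))
# 'if a &lt; b &amp;&amp; b &gt; c: "ok"&#xD;\n'
```

Since XML parsers normalize line ends to #xA, a #xD can reach a text node only through a character reference such as `&#xD;`, and this substitution preserves it as one.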
9.5.6 Processing Instruction Nodes
If the processing instruction (PI) does not appear in the node-set, output nothing. If the PI is part of the node-set, output the following in the order given:
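A minimal sketch of PI output, assuming the rules of the W3C Canonical XML 1.0 Recommendation: `<?`, the target, a space and the string value only if the value is non-empty, then `?>`. For a PI whose parent is the root node, a line feed follows it if it precedes the document element and precedes it if it follows the document element. The `position` parameter is a hypothetical way to convey that placement.

```python
from typing import Optional

def render_pi(target: str, value: str, position: Optional[str] = None) -> str:
    """Render a PI in the node-set.

    position is hypothetical: "before" or "after" the document element
    for a PI whose parent is the root node, or None for a PI inside an
    element (no line feed is added then).
    """
    out = "<?" + target
    if value:                  # no trailing space when the value is empty
        out += " " + value
    out += "?>"
    if position == "before":
        out += "\n"
    elif position == "after":
        out = "\n" + out
    return out

print(repr(render_pi("xml-stylesheet", 'href="a.xsl" type="text/xsl"', "before")))
# '<?xml-stylesheet href="a.xsl" type="text/xsl"?>\n'
print(repr(render_pi("done", "", "after")))   # '\n<?done?>'
```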
9.5.7 Comment Nodes
If the comment does not appear in the node-set, output nothing. If the comment is part of the node-set, output the following in the order given:
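A minimal sketch of comment output, assuming the rules of the W3C Canonical XML 1.0 Recommendation: `<!--`, the string value unchanged, then `-->`, with the same line-feed placement as for PIs that are children of the root node. In the without-comments form of canonicalization, comments are simply not in the node-set. The `in_nodeset` and `position` parameters are hypothetical.

```python
from typing import Optional

def render_comment(value: str, in_nodeset: bool,
                   position: Optional[str] = None) -> str:
    """Render a comment node.

    position is hypothetical: "before" or "after" the document element
    for a comment whose parent is the root node, or None otherwise.
    """
    if not in_nodeset:
        return ""              # e.g. canonicalization without comments
    out = "<!--" + value + "-->"
    if position == "before":
        out += "\n"
    elif position == "after":
        out = "\n" + out
    return out

print(repr(render_comment(" header ", True, "before")))   # '<!-- header -->\n'
print(repr(render_comment(" trailer ", True, "after")))   # '\n<!-- trailer -->'
print(repr(render_comment(" dropped ", False)))           # ''
```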