What is canonicalization? It is the extraction of the "standard form" of some data and the discarding of "insignificant" aspects of the data's surface representations, usually by restricting all surface representation choices to a single option. For example, ordinary ASCII text files appear on modern computers with a variety of conventions to indicate end-of-line. If you want to calculate a signature over such a file and then verify it when the file moves to a different platform with a different end-of-line convention, the signer and verifier need to use the canonicalized file with a standard end-of-line.
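The end-of-line case can be sketched in a few lines of Python. This is an illustration only: the helper names `canonicalize_eol` and `digest` are hypothetical, the choice of LF as the standard end-of-line is an assumption, and SHA-256 stands in for the digest step of whatever signature scheme is actually used.

```python
import hashlib


def canonicalize_eol(data: bytes) -> bytes:
    """Map CRLF and lone CR line endings to a single LF (assumed standard form)."""
    # Replace CRLF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def digest(data: bytes) -> str:
    """Hash the canonical form, not the bytes as they happen to be stored."""
    return hashlib.sha256(canonicalize_eol(data)).hexdigest()


# The same two-line text under three end-of-line conventions.
lf_copy = b"line one\nline two\n"
crlf_copy = b"line one\r\nline two\r\n"
cr_copy = b"line one\rline two\r"

# Hashing the raw bytes gives different results on different platforms.
raw_digests_differ = (
    hashlib.sha256(lf_copy).digest() != hashlib.sha256(crlf_copy).digest()
)

# Hashing the canonical form gives one result for all three copies.
canonical_digests_match = digest(lf_copy) == digest(crlf_copy) == digest(cr_copy)
```

The signer and the verifier both call `digest`, so neither needs to know which convention the other platform uses.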
In principle, the standard form of data and the aspects considered insignificant depend on the particular application. However, for most types of data, such as ASCII text or XML, a standard canonicalization, perhaps with a few variations, tends to become widespread. Over time, many applications for which canonicalization is important adopt that standard. This chapter and the official documents [Canon, Exclusive] describe the canonicalizations that have been adopted so far for XML.
Getting just the right canonicalization is one of the most important, yet trickiest, aspects of digital authentication. Proper canonicalization of the data being signed is essential in all nontrivial cases to ensure robust and secure signatures. Less commonly known is the fact that it can also make encryption more useful. Thus, canonicalization of XML data is a significant consideration for any signature over, or encryption of, that data (see Sections 9.1 and 9.2), including the XML Digital Signature and XML Encryption standards.
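The effect on a digest can be seen with two XML documents that differ only in surface representation. The sketch below uses `xml.etree.ElementTree.canonicalize` from the Python standard library (Python 3.8 or later). Note the assumption: that function implements Canonical XML 2.0, a later revision than the canonicalizations described in this chapter, so it serves here only as a stand-in. The helper name `sha256_hex` is hypothetical, and SHA-256 stands in for the digest step of a signature.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Same information, different surface form: an XML declaration, attribute
# order, quote style, spacing inside the tag, and empty-element syntax.
doc_a = '<?xml version="1.0"?><doc b="2" a=\'1\'><empty/></doc>'
doc_b = '<doc a="1"   b="2"><empty></empty></doc>'


def sha256_hex(text: str) -> str:
    """Digest of the UTF-8 encoding of the given text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Canonical XML 2.0 output (stand-in for the canonicalizations in this chapter).
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# A digest over the raw text is sensitive to the insignificant differences...
raw_match = sha256_hex(doc_a) == sha256_hex(doc_b)

# ...while a digest over the canonical form is not.
canonical_match = sha256_hex(canon_a) == sha256_hex(canon_b)
```

Here `raw_match` is false and `canonical_match` is true, which is the property a signature over XML needs.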
Two primary versions of XML canonicalization exist, both of which produce as output a "printing" or "serialization" of the XML in the UTF-8 character encoding. The first, inclusive version, known as Canonical XML [Canon], incorporates into the output the XML context of the subdocument being canonicalized. The second version, known as Exclusive XML Canonicalization [Exclusive], excludes the XML context of the subdocument from the output, as far as it practically can.
Section 9.3 describes the standardized canonicalizations of XML viewed as the transformation of arbitrary, well-formed XML into XML output. Sections 9.4 and 9.5 give the full formal specifications of the XML canonicalizations, and Section 9.6 describes their limitations.
Figure 9-1 shows how later chapters depend on this chapter.
Figure 9-1. Chapter dependencies after the basics
Experience showed that, all too often, when signed XML moved from one document to another, signatures broke because the XML context had changed. This problem arose frequently in protocol applications, including those that use SOAP. To solve this problem, Exclusive XML Canonicalization was designed to minimize the dependence of a canonical form on context [Exclusive]. The XMLDSIG Working Group carried out the work on exclusive canonicalization.
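The dependence on context can be illustrated with a deliberately simplified model. The sketch below is not an implementation of [Canon] or [Exclusive]. It handles only a leaf element with no attributes and no character escaping, and the names `in_scope_namespaces`, `render`, and `payload_digest`, as well as the `urn:example:` namespaces, are hypothetical. It models one behavior only: the inclusive form emits every namespace declaration in scope at the signed element, including those inherited from its ancestors, while the exclusive form emits only the declarations the element visibly uses.

```python
import hashlib
from xml.dom import minidom


def in_scope_namespaces(element):
    """Collect prefix -> URI from xmlns declarations on element and its ancestors."""
    chain = []
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode
    scope = {}
    for ancestor in reversed(chain):  # outermost first, so inner declarations win
        for name, value in ancestor.attributes.items():
            if name == "xmlns":
                scope[""] = value
            elif name.startswith("xmlns:"):
                scope[name[len("xmlns:"):]] = value
    return scope


def render(element, exclusive):
    """Toy serialization of a leaf element together with its namespace context."""
    scope = in_scope_namespaces(element)
    if exclusive:
        # Keep only the prefix that the element name itself uses.
        tag = element.tagName
        used = tag.split(":")[0] if ":" in tag else ""
        scope = {p: u for p, u in scope.items() if p == used}
    decls = "".join(
        f' xmlns:{p}="{u}"' if p else f' xmlns="{u}"'
        for p, u in sorted(scope.items())
    )
    text = "".join(c.data for c in element.childNodes if c.nodeType == c.TEXT_NODE)
    return f"<{element.tagName}{decls}>{text}</{element.tagName}>"


def payload_digest(envelope, exclusive):
    """Digest of the rendered p:order element found inside the given envelope."""
    order = minidom.parseString(envelope).getElementsByTagName("p:order")[0]
    return hashlib.sha256(render(order, exclusive).encode("utf-8")).hexdigest()


# The same signed payload carried in two different envelopes.
PAYLOAD = '<p:order xmlns:p="urn:example:orders">widget</p:order>'
envelope_a = f'<a:env xmlns:a="urn:example:a">{PAYLOAD}</a:env>'
envelope_b = f'<b:msg xmlns:b="urn:example:b">{PAYLOAD}</b:msg>'

inclusive_stable = payload_digest(envelope_a, False) == payload_digest(envelope_b, False)
exclusive_stable = payload_digest(envelope_a, True) == payload_digest(envelope_b, True)
```

In this model `inclusive_stable` is false, because the payload inherits a different declaration from each envelope, while `exclusive_stable` is true, because only `xmlns:p` is emitted in both cases.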