Chapter 9. XML Canonicalization: The Key to Robustness

What is canonicalization? It is the extraction of the "standard form" of some data and the discarding of "insignificant" aspects of the data's surface representations, usually by restricting all surface representation choices to a single option. For example, ordinary ASCII text files appear on modern computers with a variety of conventions to indicate end-of-line. If you want to calculate a signature over such a file and then verify it when the file moves to a different platform with a different end-of-line convention, the signer and verifier need to use the canonicalized file with a standard end-of-line.
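The end-of-line case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names canonicalize_eol and digest are hypothetical, LF is arbitrarily chosen as the standard end-of-line, and a SHA-256 digest stands in for a full signature, since a signature is computed over such a digest.

```python
import hashlib


def canonicalize_eol(data: bytes) -> bytes:
    """Map CRLF and lone CR line endings to LF (the assumed standard form)."""
    # Replace CRLF first so that it does not turn into two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def digest(data: bytes) -> str:
    """Digest of the canonical form; a stand-in for signing (assumption)."""
    return hashlib.sha256(canonicalize_eol(data)).hexdigest()


# The same text as it might be stored on three different platforms.
unix_text = b"line one\nline two\n"
windows_text = b"line one\r\nline two\r\n"
old_mac_text = b"line one\rline two\r"

# Raw bytes differ, so digests over the raw files would not match.
assert hashlib.sha256(unix_text).digest() != hashlib.sha256(windows_text).digest()

# Digests over the canonical form agree on every platform.
assert digest(unix_text) == digest(windows_text) == digest(old_mac_text)
```

Signer and verifier must both apply the same canonicalization before digesting; if only one side does, verification fails even though the text is unchanged.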

In principle, the standard form of data and the aspects considered insignificant depend on the particular application. However, for most types of data, such as ASCII text or XML, a standard canonicalization, perhaps with a few variations, tends to become widespread. Over time, many applications for which canonicalization is important adopt that standard. This chapter and the official documents [Canon, Exclusive] describe the canonicalizations that have been adopted so far for XML.

Getting just the right canonicalization is one of the most important, yet trickiest, aspects of digital authentication. Proper canonicalization of the data being signed is essential in all nontrivial cases to ensure robust and secure signatures. Less commonly known is the fact that it can also make encryption more useful. Thus, canonicalization of XML data is a significant consideration for any signature over, or encryption of, such XML data (see Sections 9.1 and 9.2), including the XML Digital Signature and XML Encryption standards.

Two primary versions of XML canonicalization exist, both of which produce as output a "printing" or "serialization" of the XML in the UTF-8 character encoding. The first, inclusive version, known as Canonical XML [Canon], incorporates into the output the XML context of the subdocument being canonicalized. The second version, known as Exclusive XML Canonicalization [Exclusive], excludes the XML context of the subdocument from the output, as far as it practically can.
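The idea of reducing different surface forms to one serialization can be illustrated with Python's standard library. Note the assumption: xml.etree.ElementTree.canonicalize (Python 3.8 and later) implements the later C14N 2.0 specification, not the Canonical XML or Exclusive XML Canonicalization algorithms described in this chapter, so this is a sketch of the general effect rather than of either algorithm. The sample documents are invented for the illustration.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+, C14N 2.0

# Two surface representations of the same XML: attribute order, quote style,
# whitespace inside the start tag, and empty-element syntax all differ.
variant_a = '<doc b="2" a="1"><item/></doc>'
variant_b = "<doc a='1'   b='2'><item></item></doc>"

canonical_a = canonicalize(variant_a)
canonical_b = canonicalize(variant_b)

# Both variants collapse to one canonical string.
assert canonical_a == canonical_b

# A canonical form is defined as UTF-8 octets; canonicalize() returns a str,
# so encode explicitly before digesting or signing.
canonical_octets = canonical_a.encode("utf-8")
```

A digest or signature would then be computed over canonical_octets rather than over either original document.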

Section 9.3 describes the standardized canonicalizations of XML viewed as the transformation of arbitrary, well-formed XML into XML output. Sections 9.4 and 9.5 give the full formal specifications of the XML canonicalizations, and Section 9.6 describes their limitations.

Figure 9-1 shows how later chapters depend on this chapter.

Figure 9-1. Chapter dependencies after the basics

The Canonical XML specification originated in the W3C Core XML Group. It later became clear that XMLDSIG was the user most vitally concerned with Canonical XML. As a result, the canonicalization effort was transferred to the XMLDSIG Working Group, where that specification [Canon], which is now a full W3C Recommendation, was completed under the authorship of John Boyer. (John works in Victoria, British Columbia, and one of the XMLDSIG meetings occurred there. This meeting included high tea at the Empress Hotel, which is highly recommended.)

Actual experience showed that, all too often, when signed XML moved from one document to another, signatures broke due to changes in XML context. This problem arose frequently in protocol applications, including those that use SOAP. To solve this problem, Exclusive XML Canonicalization was designed to minimize the dependence of a canonical form on context [Exclusive]. The XMLDSIG Working Group carried out the work on exclusive canonicalization.

Secure XML: The New Syntax for Signatures and Encryption
ISBN: 0201756056
Year: 2005
Pages: 186