Security is a subtle area. Some problems can be solved in a general way, and those solutions are typically incorporated into standard security syntaxes such as those for ASN.1 [RFC 2630] and XML [XMLDSIG, XMLENC]. With application-specific questions, and particularly questions of exactly what information you need to authenticate or encrypt, more complex solutions are needed.
Questions of exactly what needs to be secured and how to do so robustly are deeply entwined with canonicalization. They are somewhat different for authentication and encryption.
Chapter 9 describes canonicalization as the transformation of the "significant" information in a message into a "standard" form, discarding "insignificant" information. For example, it might involve encoding into a standard character set or changing line endings into a standard encoding and discarding the information about the original character set or line ending encodings. Obviously, what is "standard" and what is "significant" vary with the application or protocol and can be tricky to determine. For a particular syntax, such as ASCII [ASCII], ASN.1 [ASN.1], or XML [XML], a standard canonicalization is often specified or developed through practice. This effort leads to the design of applications that assume such standard canonicalization and, in turn, reduces the need for customized, application-specific canonicalization.
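The line-ending and character-set example above can be sketched in a few lines of code. This is a toy illustration only: the `canonicalize` function and its rules are invented for this sketch and do not correspond to any standard canonical form.

```python
# Toy canonicalization: decode from the original character set and
# normalize line endings, discarding both pieces of "insignificant"
# information. Invented for illustration; not a standard algorithm.

def canonicalize(raw: bytes, charset: str = "utf-8") -> bytes:
    text = raw.decode(charset)                              # discard original character set
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # standard line endings
    return text.encode("utf-8")                             # re-encode in a standard way

# Two "insignificantly" different encodings of the same message...
a = "line one\r\nline two\n".encode("utf-8")
b = "line one\nline two\n".encode("utf-16")

# ...become identical after canonicalization.
assert canonicalize(a) == canonicalize(b, charset="utf-16")
```

Which details count as insignificant (here, the character set and line-ending convention) is exactly the application-specific decision the text describes.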
PAPER: From the paper point of view, canonicalization is suspect, if not outright evil. After all, if you have a piece of paper with writing on it, you can view any modification to "standardize" its format as an unauthorized change in the original message as created by the "author." With this perspective, digital signatures are viewed as authenticating signatures or seals or time stamps on the bottom of the "piece of paper." They do not justify and should not depend on changes in the message appearing above them. Similarly, encryption is seen as just putting the "piece of paper" in a vault that only certain people can open, and does not justify any standardization or canonicalization of the message.
PROTOCOL: From the protocol point of view, a pattern of bits is calculated; processed, stored, and communicated; and finally parsed and acted on. Most of these bits have never been seen and never will be seen by a person. In fact, many of the parts of the message will be artifacts of encoding, protocol structure, and computer representation, rather than anything intended for a person to see. In theory, it might be possible to convey the "original" idiosyncratic form of any digitally signed part unchanged through the computer processing, storage, and communications channels that implement the protocol, and to sign it usefully in that form. In practical systems of any complexity, however, achieving this goal is unreasonably difficult for most parts of messages. Even if it were possible, the result would be virtually useless, as you would still have to repeatedly test the equivalence of the local message form with the preserved original form. Thus signed data must be canonicalized as part of signing and verification to compensate for insignificant changes made in processing, storage, and communication. Even if, miraculously, an initial system design avoids all cases of signed message reconstruction based on processed data or reencoding based on the character set, line ending, capitalization, numeric representation, time zones, or whatever, later protocol revisions and extensions are certain to eventually require such reconstruction and/or reencoding. As a consequence, canonicalization is clearly a necessity for protocol applications. It is just a question of which canonicalization or canonicalizations to use.
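The failure mode described above can be made concrete with a toy example: a digest over the raw bits breaks when an intermediary makes an "insignificant" change, while a digest over a canonical form survives. The single line-ending rule here stands in for a real canonicalization.

```python
import hashlib

def canonical_digest(raw: bytes) -> str:
    # Toy canonicalization: normalize line endings before hashing.
    canon = raw.replace(b"\r\n", b"\n")
    return hashlib.sha256(canon).hexdigest()

original = b"Amount: 100\r\nTo: alice\r\n"
# A relay rewrites line endings in transit -- an "insignificant" change.
relayed = b"Amount: 100\nTo: alice\n"

# A digest over the raw bits no longer verifies...
assert hashlib.sha256(original).digest() != hashlib.sha256(relayed).digest()
# ...but a digest over the canonical form still does.
assert canonical_digest(original) == canonical_digest(relayed)
```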
E.4.2 Digital Authentication
PAPER: The paper-oriented view on authentication tends to focus on "digital signatures" and "forms." Because individuals taking this perspective are always worried about human third parties and about viewing the document in isolation, they want the "digital signature" characteristics of "non-repudiability" and the like. (See any standard reference on the subject for the special meaning of these terms in this context.) According to this point of view, you have a piece of paper or form that a person signs. Sometimes a signature covers only part of a form, but that's usually because a signature can cover only data that are already there. Normally, at least one signature covers the "whole" document/form. Thus the goal is to insert digital signatures into documents without changing the document type and even "inside" the data being signed (which requires a mechanism to skip the signature so that it does not try to sign itself). This view was well represented in the standardization of XML digital signatures, resulting in provisions for enveloped signatures and the enveloped signature transform algorithm.
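The "skip the signature so that it does not try to sign itself" mechanism can be sketched as follows. Real XMLDSIG applies the enveloped-signature transform to an XML node set; the regex stand-in below is only meant to show the underlying idea, not the actual algorithm.

```python
import hashlib
import re

def digest_without_signature(doc: str) -> str:
    # Sketch of the enveloped-signature idea: compute the digest over the
    # document *minus* the signature element, since a signature cannot
    # cover its own value. (Regex stand-in for the real XML transform.)
    stripped = re.sub(r"<Signature>.*?</Signature>", "", doc, flags=re.S)
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()

unsigned = "<Doc><Body>pay 100</Body></Doc>"
signed = "<Doc><Body>pay 100</Body><Signature>abc123</Signature></Doc>"

# Verification over the signed document reproduces the digest that was
# computed before the signature was inserted.
assert digest_without_signature(signed) == digest_without_signature(unsigned)
```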
PROTOCOL: From a protocol-oriented view, the right kind of authentication to use, whether a "digital signature" or a symmetric keyed authentication code, is just another engineering decision affected by questions of efficiency, the desired security model, and so on. Furthermore, the concept of signing a "whole" message seems bizarre (unless it is a copy being saved for archival purposes, in which case you might be signing an entire archive at once anyway). Typical messages consist of various pieces with various destinations, sources, and security requirements. Additionally, certain fields, such as hop counts, routing history, or local forwarding tags, can't be signed at all because they change as the message is communicated and processed. One protocol message commonly contains a mix of different kinds of authentication.
E.4.3 Canonicalization and Digital Authentication
For authenticating protocol system messages of practical complexity, you have three choices.
The only useful option is the second choice.
In terms of processing, transmission, and storage, encryption turns out to be much easier than signatures to implement. Why? The output of encryption is essentially arbitrary binary information, and it is clear from the very beginning that those bits need to be transferred to the destination in some absolutely clean way that does not change even one bit. Because the encrypted bits are, by definition, meaningless to humans, the paper-oriented person has no incentive to change them to make them more "readable." For this reason, appropriate techniques of encoding at the source, such as base-64 [RFC 2045], and decoding at the destination are always incorporated to protect or "armor" the encrypted data.
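The armoring step described above is a straightforward round trip, sketched here with Python's standard base64 module (random bytes stand in for real ciphertext):

```python
import base64
import os

# Encrypted output is arbitrary binary, so it is base64-"armored" for
# transport (as in RFC 2045) and decoded bit-for-bit at the destination.
# os.urandom stands in for real ciphertext in this sketch.
ciphertext = os.urandom(32)

armored = base64.b64encode(ciphertext)          # safe for text-based transports
recovered = base64.b64decode(armored)           # exact original bits restored
assert recovered == ciphertext
```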
While the application of canonicalization is more obvious with digital signatures, it may also apply to encryption, particularly the encryption of parts of a message. Sometimes elements of the environment containing the plain text data affect its interpretation. Consider the effects of the character encoding or bindings of dummy symbols. When the data is decrypted, the decryption may take place in an environment with different character encoding and dummy symbol bindings. With a plain text message part, it is usually clear which of these environmental elements should be incorporated in or conveyed with the message. An encrypted message part, however, is opaque. Thus you may need some canonical representation that incorporates such environmental factors.
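One way to handle the character-encoding case above is to fold that environmental information into a canonical representation before encrypting, so the decrypting environment needs no out-of-band knowledge of the original encoding. The helper below is hypothetical, invented for this sketch:

```python
# Hypothetical helper: canonicalize a message part to UTF-8 before it is
# encrypted, so the plaintext carries no dependence on the original
# environment's character encoding.

def to_canonical_plaintext(raw: bytes, charset: str) -> bytes:
    return raw.decode(charset).encode("utf-8")

latin1_part = "café".encode("latin-1")
# What actually gets encrypted is the canonical UTF-8 form.
plaintext = to_canonical_plaintext(latin1_part, "latin-1")
assert plaintext.decode("utf-8") == "café"
```

The same pattern extends to other environmental factors the text mentions, such as bindings of dummy symbols: resolve or record them in the plaintext before it becomes opaque.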
PAPER: From the paper perspective, you think about encryption of the entire document. Because signatures are always envisioned as human assent, people with this point of view vehemently assert that encrypted data should never be signed unless you know what the plain text means.
PROTOCOL: With the protocol perspective, messages are complex, composite, multilevel structures. Some pieces of them are forwarded multiple hops. Thus the design question becomes which fields should be encrypted by which techniques to which destinations and with which canonicalization scheme. It sometimes makes perfect sense to sign encrypted data you don't understand; for example, the signature could just be for integrity protection or a time stamp, as the protocol specifies.