Hashing Details and Requirements | Securing Web Services with WS-Security: Demystifying WS-Security, WS-Policy, SAML, XML Signature, and XML Encryption

< Day Day Up >

Encrypting large byte arrays is computationally expensive. You therefore have a strong need for a small and consistent size byte array created out of any arbitrary plaintext input. This way, you can limit the computation time spent performing the encryption, and many simplifications ensue downstream when the encrypted message is always of a fixed small size. However, it would be disastrous if two different messages mapped to the same exact byte array. This would mean that someone could substitute a new message for the original and fool the recipient into thinking the new fraudulent message is the correct one. It must be impossible to reverse the function, take the small output byte array, and regenerate the original message. Small fixed size, collision avoidance , and non- reversibility are characteristics of good hash functions.

Motivation for Using Hash Functions

Integrity (proof that a message has not been altered in any way) requires the use of a hash function. But functions that avoid duplicate values are exceedingly rare. The birthday paradox demonstrates this: With any random 23 people in a room, the chance that two have the same birthday is 50%. This example says that mapping a set of 365 possible input values (possible birthdays in the year) into 23 output values or keys (the people in the room) has a 50% chance of a collision into the same key.

A very common, easy-to-understand hash function is the modulo operation that gives the remainder after a division operation:

h ( K ) = K mod M

For example, the hash function f ( x ) = x mod 7 takes as input the infinite domain of integers and produces as output only the range of values 0, 1, 2, 3, 4, 5, and 6. If x were an entire message represented as an integer, this hash function would still produce only a value in the range 0..6. Unfortunately, several messages would end up producing the same output value; in other words, there would be many collisions.

Requirements for Digital Signature

Hash functions used in digital signatures must be highly collision resistant. The goal is to make finding two different inputs that produce the same output be intractable. If this were easy, an intruder would make that substitution, and the hash function would fail to provide the basic message integrity you require.

The requirements can be stated as follows for a hash function H ( M ) operating on an arbitrary-length message M that returns a fixed-length hash value h :

h = H ( M ), where h is of length m

What you require is an efficient, one-way algorithm that always generates unique results.

The idea is to hash the entire plaintext message and then to protect that hash value from being modified in any way. If the sender and receiver input the exact same message to the exact same hash algorithm and the original hash code is carefully protected during transmission, the recipient can check and verify the integrity of the message without the huge expense of trying to encrypt the entire message. That, in fact, is the basic outline of a digital signature, which is the basis for XML Signature, as you saw in Chapter 4, "Safeguarding the Identity and Integrity of XML Messages."

< Day Day Up >