Understanding the Message Digest | Java Security Solutions

A message digest (MD) is an algorithm that uses a hash function to create a digest. The digest is simply the fingerprint of the original message, and it is used to validate that the message has not been altered. To check the integrity of a message, its newly computed digest must be compared against the original digest, which the receiver must trust as untampered. For instance, if the message is M and a message digest (MD) is applied to it, a digest (D) is produced. This is illustrated in the following equation.

MD(M₁) = D₁

When the message needs to be validated again at a later time, the message as it then stands (M₂) is hashed to a new digest (D₂). If any data has changed in the message, even by one bit, the message digest must produce a different digest, as illustrated in the next equation:

MD(M₂) = D₂

Now the two digests are compared, and if there is a difference between them, the message that produced D₂ is considered invalid or altered.

Note	In the D₁ and D₂ comparison, D₁ must be trusted by the receiver as being the original digest, so it is up to the organization to keep it safe. One suggestion is to store D₁ in an LDAP server.
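The comparison of a trusted digest with a freshly computed one can be sketched with the standard `java.security.MessageDigest` class. This is a minimal illustration, not a prescribed implementation: the class name `DigestCheck`, its method names, and the sample messages are assumptions made for the example, and SHA-1 is used only because it is one of the algorithms covered in this chapter.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration only.
class DigestCheck {
    /** Computes the digest D = MD(M) with the named algorithm. */
    static byte[] digest(String algorithm, byte[] message) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).digest(message);
    }

    /** Returns true when the recomputed digest equals the trusted digest. */
    static boolean isUnaltered(String algorithm, byte[] message, byte[] trustedDigest)
            throws NoSuchAlgorithmException {
        return MessageDigest.isEqual(trustedDigest, digest(algorithm, message));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] original = "Transfer 100 to Alice".getBytes(StandardCharsets.UTF_8);
        byte[] trusted = digest("SHA-1", original);   // the trusted digest, stored safely

        byte[] tampered = "Transfer 900 to Alice".getBytes(StandardCharsets.UTF_8);
        System.out.println(isUnaltered("SHA-1", original, trusted)); // true
        System.out.println(isUnaltered("SHA-1", tampered, trusted)); // false
    }
}
```

The check is only as trustworthy as the stored digest: if an attacker can replace both the message and the trusted digest, the comparison still succeeds.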

Encryption and digests

Another use of the digest appears in protocols and formats such as SSL and X.509, where the digest is encrypted with a private key, then decrypted by the receiver with the matching public key and checked to detect corruption of the data. Because the private key is needed to encrypt the digest, only the owner of the private key can generate the encrypted digest. The owner of the private key is usually the initiator of the message, so this scenario works well. Any user who has a copy of the public key can decrypt the digest, but cannot encrypt it without the private key. This private-public key scenario is an example of a key pair.

Cross-Reference

Key pairs and associated uses are discussed in Chapter 8. Chapter 24 discusses X.509, and Chapter 22 discusses SSL.
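The private-key and public-key roles described above can be sketched with the standard `java.security.Signature` class, which hashes the message and applies the private key in one operation. The class name `SignedDigestDemo`, the choice of `SHA256withRSA`, and the 2048-bit key size are assumptions made for this example; they are not taken from the chapter.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical demo class for illustration only.
class SignedDigestDemo {
    /** Hashes the message and signs the digest with the private key. */
    static byte[] sign(PrivateKey key, byte[] message) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA"); // assumed algorithm
        signer.initSign(key);
        signer.update(message);
        return signer.sign();
    }

    /** Recomputes the digest and checks it against the signature with the public key. */
    static boolean verify(PublicKey key, byte[] message, byte[] signature)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(message);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);                     // assumed key size
        KeyPair pair = generator.generateKeyPair();

        byte[] message = "signed message".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(pair.getPrivate(), message);

        System.out.println(verify(pair.getPublic(), message, signature)); // true
    }
}
```

Anyone holding the public key can run `verify`, but only the holder of the private key can produce a signature that passes it.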

If you are familiar with serial communications and TCP/IP, this type of message integrity check may look familiar. Network protocols use checksums and Cyclic Redundancy Checks (CRCs) to confirm that the receiver got the message intact: TCP carries a 16-bit checksum in each segment, and link layers such as Ethernet add a 32-bit CRC to each frame. If the value that the receiver calculates doesn't match the value sent with the data, the data is discarded and must be retransmitted. Common CRC sizes are 12, 16, and 32 bits. First, the CRC treats the bits of the message as a polynomial, divides it by a fixed generator polynomial, and keeps the remainder as a check value of the desired bit size. Then, the CRC is used to detect errors in a transmission. The idea of using a digest for messages has been around for quite some time in other protocols; the algorithms have evolved over time.
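As a small illustration of the CRC idea, the JDK ships a 32-bit CRC in `java.util.zip.CRC32`. The sketch below assumes a hypothetical class name `CrcDemo` and uses the conventional test string `123456789`; it shows only error detection, not any particular network protocol.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class for illustration only.
class CrcDemo {
    /** Returns the CRC-32 check value of the data as an unsigned 32-bit number in a long. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] sent = "123456789".getBytes(StandardCharsets.US_ASCII);
        long checkValue = crc32(sent);
        System.out.println(Long.toHexString(checkValue)); // cbf43926

        // Simulate a single-bit transmission error.
        byte[] received = sent.clone();
        received[0] ^= 0x01;
        System.out.println(crc32(received) == checkValue); // false
    }
}
```

A CRC catches accidental corruption well, but it is not a one-way hash: it is easy to construct a different message with the same CRC, which is why security protocols rely on message digests instead.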

Many algorithms can be used to compute a message digest, such as MD2, MD4, MD5, SHA-0, SHA-1, RIPEMD-160, Tiger, and many more. When validating a message, the tester must know which algorithm was used. If the original digest was produced with MD5 and the message to be validated is hashed with SHA-1, the digests are different even if the messages are the same. An organization needs to establish standards for which algorithms it uses for the MD.
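The effect of mixing algorithms is easy to see by hashing the same message twice. The following sketch assumes a hypothetical class `AlgorithmMismatch` with a small hex-printing helper; the algorithm names `MD5` and `SHA-1` are the standard names accepted by `MessageDigest.getInstance`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class for illustration only.
class AlgorithmMismatch {
    /** Formats bytes as lowercase hexadecimal. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Hashes the message with the named algorithm and returns the digest as hex. */
    static String digestHex(String algorithm, String message) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        return hex(md.digest(message.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Same message, two algorithms: the digests differ in value and in length.
        System.out.println(digestHex("MD5", "abc"));   // 900150983cd24fb0d6963f7d28e17f72
        System.out.println(digestHex("SHA-1", "abc")); // a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```

MD5 yields 16 bytes and SHA-1 yields 20 bytes, so a mismatch in algorithm is usually visible from the digest length alone.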

Tip	Using the Java JDK 1.4 limits the MD primarily to MD5 and SHA-1, two of the most popular algorithms; these algorithms are discussed later in this chapter.

Differentiating MDs

Many characteristics are used to differentiate MDs. Each MD usually has a set of four or five initialization registers whose values are the first values used in the hash. The registers were originally optimized for 32-bit processing machines. The initialization values are important because they ensure that the input data is not the first value fed into the computation, so that even less can be known about the input data. When the algorithm is initialized, its buffers need to be zeroed out. When the digest is returned, the algorithm needs to be initialized again to start a new digest. Many algorithms use temporary buffers and can accept input data incrementally through an update method.
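In the Java API, the update method and the re-initialization after a digest is returned both appear on `MessageDigest`: `update` adds input in pieces, and `digest` returns the result and resets the object. The sketch below assumes a hypothetical class `IncrementalDigest` and arbitrary sample text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class for illustration only.
class IncrementalDigest {
    /** Feeds the chunks to the digest one at a time, then finishes the computation. */
    static byte[] digestInChunks(MessageDigest md, byte[]... chunks) {
        for (byte[] chunk : chunks) {
            md.update(chunk);
        }
        return md.digest(); // returns the digest and resets md for a new message
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");

        byte[] whole = md.digest("hello world".getBytes(StandardCharsets.UTF_8));

        // The same object can be reused because digest() re-initialized it.
        byte[] parts = digestInChunks(md,
                "hello ".getBytes(StandardCharsets.UTF_8),
                "world".getBytes(StandardCharsets.UTF_8));

        System.out.println(MessageDigest.isEqual(whole, parts)); // true
    }
}
```

Feeding the data in pieces is useful for large files or streams, because the whole message never has to be held in memory at once.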

One of the characteristics of the message digest is that it is a one-way hash. A one-way hash means that the input data cannot be recovered by looking at the digest or hash. After the algorithm is initialized, data can be input for the algorithm to compute. The data must not exceed the algorithm's maximum message size. The message digest breaks down the input data into blocks. Most algorithms use a 512-bit block size, but the block size is algorithm-specific. If the last piece of input data is smaller than the block size, the algorithm must pad the data to reach the correct block size. In algorithms such as MD5 and SHA-1, the length of the original message is stored at the end of the padding. After the input data is entered and formatted to the correct block size, each block goes through the algorithm's computations.
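As an illustration of padding and length encoding, the following sketch pads a message the way SHA-1 does: a single 0x80 byte, then zero bytes, then the original length in bits as a 64-bit big-endian number, so that the total is a multiple of 64 bytes (512 bits). The class name `Sha1Padding` is an assumption for the example, and other algorithms use different layouts (MD5, for instance, stores the length in little-endian order).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; follows the SHA-1 padding layout.
class Sha1Padding {
    /** Returns the message padded to a multiple of 64 bytes, SHA-1 style. */
    static byte[] pad(byte[] message) {
        long bitLength = (long) message.length * 8;

        // Room for the message, one 0x80 byte, and the 8-byte length,
        // rounded up to the next multiple of 64 bytes.
        int paddedLength = ((message.length + 1 + 8 + 63) / 64) * 64;
        byte[] padded = new byte[paddedLength]; // new arrays are zero-filled

        System.arraycopy(message, 0, padded, 0, message.length);
        padded[message.length] = (byte) 0x80;

        // Store the bit length in the last 8 bytes, most significant byte first.
        for (int i = 0; i < 8; i++) {
            padded[paddedLength - 1 - i] = (byte) (bitLength >>> (8 * i));
        }
        return padded;
    }

    public static void main(String[] args) {
        byte[] padded = pad("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(padded.length);     // 64
        System.out.println(padded[63]);        // 24 (the message is 24 bits long)
    }
}
```

A message of 56 bytes or more in its last block no longer leaves room for the marker byte and the length, so the padding spills into an additional block.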

Breaking down the algorithm

The algorithm is normally broken down into rounds and operations. A round is a set of like operations performed on the data block. For example, SHA-1 has four rounds, and each round has 20 steps. A step is a single transformation of the data, and each round applies a different kind of transformation. After the data has been hashed, the result needs to be compressed into a digest. In SHA-1, the compression takes each 512-bit block and folds it into a 160-bit value; other algorithms have different sizes. An example of the padding, initialization, and updates for SHA-1 is displayed in Figure 9-2. Message digests differ in their initial values, in the operations used in the computation, and in several other factors, but the basic flow remains the same.
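To make the round structure concrete, the sketch below lists the logical function and additive constant that SHA-1 uses in each of its four rounds of 20 steps, as defined in the SHA-1 standard. The class name `Sha1Rounds` and the method names are assumptions for the example; a complete implementation needs more than these two functions.

```java
// Hypothetical helper class for illustration only; values follow the SHA-1 specification.
class Sha1Rounds {
    /** Returns the round number (1 to 4) for a step numbered 0 to 79. */
    static int round(int step) {
        return step / 20 + 1;
    }

    /** The logical function applied to registers b, c, and d at the given step. */
    static int f(int step, int b, int c, int d) {
        if (step < 20) return (b & c) | (~b & d);            // round 1: choose
        if (step < 40) return b ^ c ^ d;                     // round 2: parity
        if (step < 60) return (b & c) | (b & d) | (c & d);   // round 3: majority
        return b ^ c ^ d;                                    // round 4: parity
    }

    /** The additive constant used at the given step. */
    static int k(int step) {
        if (step < 20) return 0x5A827999;
        if (step < 40) return 0x6ED9EBA1;
        if (step < 60) return 0x8F1BBCDC;
        return 0xCA62C1D6;
    }

    public static void main(String[] args) {
        for (int step = 0; step < 80; step += 20) {
            System.out.println("round " + round(step)
                    + " constant " + Integer.toHexString(k(step)));
        }
    }
}
```

Within a round every step uses the same function and constant; only the registers and the message word being mixed in change from step to step.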

Figure 9-2: The message digest process

The initial values in the five registers in SHA-1 initialize the chaining variables. The chaining variables are hashed with the first input message block. The result of that hash is used as the chaining variables for the next input message block, and this process continues until the application calls the final phase to turn the hash into the digest. The hash in SHA-1 is held in five integer registers until the final phase; on entering the final phase, the hash is converted to 20 bytes.
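The conversion from five integer registers to 20 bytes can be sketched with `java.nio.ByteBuffer`, which writes integers in big-endian order by default, the order SHA-1 uses. The class name `Sha1State` is an assumption for the example; the five initial values are the ones given in the SHA-1 standard.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class for illustration only.
class Sha1State {
    /** The five initial chaining values defined by SHA-1. */
    static final int[] INITIAL = {
        0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0
    };

    /** Serializes the registers, most significant byte first, into the digest bytes. */
    static byte[] toDigestBytes(int[] registers) {
        ByteBuffer buffer = ByteBuffer.allocate(4 * registers.length); // big-endian by default
        for (int register : registers) {
            buffer.putInt(register);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        // Five 32-bit registers always serialize to 5 * 4 = 20 bytes.
        System.out.println(toDigestBytes(INITIAL).length); // 20
    }
}
```

In a real computation the registers passed to `toDigestBytes` would hold the chaining variables after the last block, not the initial values shown here.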

Note

The general steps of a message digest algorithm can be described as:

Step 1: Initialization.

Step 2: Break the data input into the appropriate block size, padding if necessary.

Step 3: Append the length.

Step 4: Pass each block through the algorithm's rounds and operations.

Step 5: Compress the result into the digest.
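The five steps above can be traced in a compact, unoptimized sketch of SHA-1. It is meant only to show how the steps fit together and is not a replacement for the JDK's `MessageDigest`; the class name `Sha1Sketch` is an assumption for the example, and the `main` method checks the result against the JDK's own SHA-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical teaching sketch; use MessageDigest in real code.
class Sha1Sketch {
    static byte[] sha1(byte[] message) {
        // Step 1: initialize the five chaining registers.
        int h0 = 0x67452301, h1 = 0xEFCDAB89, h2 = 0x98BADCFE, h3 = 0x10325476, h4 = 0xC3D2E1F0;

        // Step 2: pad to a multiple of 64 bytes (0x80 marker, then zeros).
        int paddedLength = ((message.length + 1 + 8 + 63) / 64) * 64;
        ByteBuffer padded = ByteBuffer.allocate(paddedLength); // zero-filled, big-endian
        padded.put(message).put((byte) 0x80);
        // Step 3: append the original length in bits in the last 8 bytes.
        padded.putLong(paddedLength - 8, (long) message.length * 8);

        int[] w = new int[80];
        for (int block = 0; block < paddedLength; block += 64) {
            // Step 4: expand the block to 80 words, then run 4 rounds of 20 steps.
            for (int t = 0; t < 16; t++) {
                w[t] = padded.getInt(block + 4 * t);
            }
            for (int t = 16; t < 80; t++) {
                w[t] = Integer.rotateLeft(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
            }
            int a = h0, b = h1, c = h2, d = h3, e = h4;
            for (int t = 0; t < 80; t++) {
                int f, k;
                if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
                else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
                else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
                else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
                int temp = Integer.rotateLeft(a, 5) + f + e + k + w[t];
                e = d; d = c; c = Integer.rotateLeft(b, 30); b = a; a = temp;
            }
            // The result becomes the chaining variables for the next block.
            h0 += a; h1 += b; h2 += c; h3 += d; h4 += e;
        }

        // Step 5: convert the five registers into the 20-byte digest.
        return ByteBuffer.allocate(20)
                .putInt(h0).putInt(h1).putInt(h2).putInt(h3).putInt(h4).array();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] message = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] expected = MessageDigest.getInstance("SHA-1").digest(message);
        System.out.println(MessageDigest.isEqual(expected, sha1(message))); // true
    }
}
```

The sketch holds the whole padded message in memory for clarity; a production implementation processes one block at a time through an update method.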