The MessageDigest Class | Cryptographic Streams

The java.security.MessageDigest class is an abstract class that represents a hash code and its associated algorithm. Concrete subclasses (actually concrete subclasses of java.security.MessageDigestSpi, though the difference isn't relevant from a client's point of view) implement particular, professionally designed, well-known hash code algorithms. Thus, rather than constructing instances of this class directly, you ask the static MessageDigest.getInstance( ) factory method to provide an implementation of an algorithm with a particular name. Table 12-1 lists the standard names for several message digest algorithms. A stock installation of the JDK won't have all of these, but you can install more providers that support additional algorithms.

Table 12-1. Message digest algorithms
Name	Algorithm
SHA-1	Produces 20-byte digests; suitable for documents of less than 2^64 bits; recently compromised.
SHA	In Java, this is an alias for SHA-1. In other contexts, it refers to SHA-0, a compromised and withdrawn standard that Java has never supported. It sometimes also refers to the whole family of Secure Hash Algorithms as defined in the Secure Hash Standard (SHS), NIST FIPS 180-2; http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf.
SHA-256	Produces 32-byte digests; suitable for documents of less than 2^64 bits.
SHA-384	Produces 48-byte digests; suitable for documents of less than 2^128 bits.
SHA-512	Produces 64-byte digests; suitable for documents of less than 2^128 bits.
MD2	RSA Message Digest 2 as defined in RFC 1319 and RFC 1423 (RFC 1423 corrects a mistake in RFC 1319); produces 16-byte digests; suitable for use with digital signatures.
MD5	RSA Message Digest 5 as defined in RFC 1321; produces 16-byte digests; quite fast on 32-bit machines.
RipeMD160	RACE Integrity Primitives Evaluation Message Digest; produces 20-byte digests; designed in the open unlike the NSA-designed SHA formats.
Tiger	An algorithm invented by Ross Anderson and Eli Biham for efficiency on 64-bit platforms; produces 24-byte digests; used on the Gnutella file sharing network.
Whirlpool	An unpatented algorithm designed by Vincent Rijmen and Paulo S. L. M. Barreto; produces 64-byte digests; suitable for documents of less than 2^256 bits.

SHA-1 is the least common denominator, available pretty much anywhere the MessageDigest class is. It's also available in many non-Java software packages and is used by numerous protocols, including PGP, SSL, SSH, IPSec, and BitTorrent, as well as by various proprietary systems such as the Microsoft Xbox.

The last couple of years have seen a flurry of attacks on hash functions. New, more practical attacks seem to be published every few months; and as an old NSA saying goes, "Attacks always get better; they never get worse." SHA-1, MD2, MD4, MD5, RIPEMD-160, and related algorithms are weakening by the month. New protocols and applications should use one of the more recent, more secure algorithms such as SHA-256, SHA-512, or Whirlpool. If the attacks improve, some of the older protocols that depend on SHA-1 may need to be revised or replaced as well.

12.2.1. Calculating Message Digests

There are four steps to calculating a hash code for a file or other sequential set of bytes with a MessageDigest object; Figure 12-1 shows a flowchart for this process.

Figure 12-1. The four steps to calculating a message digest

1. Pass the name of the algorithm to the static MessageDigest.getInstance( ) factory method to get a new MessageDigest object.
2. Feed bytes into the update( ) method.
3. If more data remains, repeat step 2.
4. Invoke a digest( ) method to complete computation of the digest and return it as an array of bytes.

Once the digest( ) method has been invoked, the MessageDigest object is reset to its initial state. You can reuse it to calculate a new digest, starting again at step 2, but you cannot add more data to the digest you've already computed.
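The four steps can be sketched against an in-memory stream rather than a network connection. This is only an illustration: the class name DigestSteps and the helper digestOf( ) are hypothetical, and SHA-256 is assumed to be available from one of the installed providers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestSteps {

  // Hypothetical helper: digests everything that can be read from the stream.
  static byte[] digestOf(InputStream in, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    // Step 1: ask the factory method for an implementation.
    MessageDigest md = MessageDigest.getInstance(algorithm);
    byte[] buffer = new byte[128];
    int bytesRead;
    // Steps 2 and 3: feed bytes to update() until the stream is exhausted.
    while ((bytesRead = in.read(buffer)) >= 0) {
      md.update(buffer, 0, bytesRead);
    }
    // Step 4: finish the computation and return the digest.
    return md.digest();
  }

  public static void main(String[] args)
      throws IOException, NoSuchAlgorithmException {
    byte[] message = "Hello, digest!".getBytes(StandardCharsets.UTF_8);
    byte[] digest = digestOf(new ByteArrayInputStream(message), "SHA-256");
    System.out.println(digest.length + " bytes"); // prints "32 bytes"
  }
}
```

Because the helper accepts any InputStream, the same loop should work unchanged for files, sockets, or URLs.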

Example 12-1, URLDigest, is a simple program that uses the MessageDigest class to calculate the SHA-1 hash for a web page named on the command line. The main( ) method gets the input stream from a URL as discussed in Chapter 5. It then gets an SHA MessageDigest object named sha with the getInstance( ) factory method and repeatedly reads data from the input stream. All bytes read are passed to sha.update( ). When the stream is exhausted, the sha.digest( ) method is called; it returns the SHA hash of the URL as an array of bytes, which is then printed.

Example 12-1. URLDigest

import java.net.*;
import java.io.*;
import java.security.*;
import java.math.*;

public class URLDigest {

  public static void main(String[] args)
   throws IOException, NoSuchAlgorithmException {

    URL u = new URL(args[0]);
    InputStream in = u.openStream( );
    MessageDigest sha = MessageDigest.getInstance("SHA");
    byte[] data = new byte[128];
    while (true) {
      int bytesRead = in.read(data);
      if (bytesRead < 0) break; // end of stream
      sha.update(data, 0, bytesRead);
    }
    in.close( );
    byte[] result = sha.digest( );
    // print the digest as a list of signed bytes
    for (int i = 0; i < result.length; i++) {
      System.out.print(result[i] + " ");
    }
    System.out.println( );
    // print the digest as one large decimal integer
    System.out.println(new BigInteger(result));
  }
}

Here's a sample run. The digest is shown both as a list of bytes and as one very long integer. The java.math.BigInteger class converts the bytes to a decimal integer. This class was added to the core API precisely to support cryptography, where arithmetic with very large numbers is common.

$ java URLDigest http://www.oreilly.com/
54 9 -70 68 64 109 58 -80 -52 36 -69 51 -13 -90 40 -75 -114 78 59 76
308502434441296110421463252179020572520338045772

This output doesn't really mean anything to a human reader. However, if you were to run the program again, you'd get a different result if the web page had changed in some way. Even a small change that would be unlikely to be noticed by a human or an HTML parser (for instance, adding an extra space to the end of one line) would be picked up by the digest. If you only want to detect significant changes, you have to first filter the insignificant data from the stream in a predictable fashion before calculating the message digest.
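One way to filter insignificant data is sketched below. It assumes that "insignificant" means trailing spaces and tabs plus differences in line endings; the class NormalizedDigest and its method are hypothetical names, and a real application would have to decide for itself which differences do not matter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class NormalizedDigest {

  // Hypothetical filter: drops trailing spaces and tabs from every line and
  // normalizes each line ending to "\n" before the bytes reach the digest.
  static byte[] digestIgnoringTrailingWhitespace(String text)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    BufferedReader reader = new BufferedReader(new StringReader(text));
    String line;
    while ((line = reader.readLine()) != null) {
      String trimmed = line.replaceAll("[ \\t]+$", "");
      sha.update(trimmed.getBytes(StandardCharsets.UTF_8));
      sha.update((byte) '\n');
    }
    return sha.digest();
  }

  public static void main(String[] args)
      throws IOException, NoSuchAlgorithmException {
    byte[] a = digestIgnoringTrailingWhitespace("<p>Hello</p>\n<p>World</p>\n");
    byte[] b = digestIgnoringTrailingWhitespace("<p>Hello</p>   \r\n<p>World</p>\n");
    byte[] c = digestIgnoringTrailingWhitespace("<p>Hello!</p>\n<p>World</p>\n");
    System.out.println(Arrays.equals(a, b)); // true: only whitespace differs
    System.out.println(Arrays.equals(a, c)); // false: the text itself changed
  }
}
```

The filter has to be deterministic. If two runs normalize the same input differently, the digests no longer tell you anything about whether the content changed.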

12.2.2. Creating Message Digests

There are no public constructors in java.security.MessageDigest. Instead, you use one of two MessageDigest.getInstance( ) factory methods to retrieve an object configured with a particular algorithm.

public static MessageDigest getInstance(String algorithm)
 throws NoSuchAlgorithmException
public static MessageDigest getInstance(String algorithm, String provider)
 throws NoSuchAlgorithmException, NoSuchProviderException

For example:

MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
MessageDigest md2 = MessageDigest.getInstance("MD2", "Cryptix");

Each of these methods returns an instance of a MessageDigest subclass that's configured with the requested algorithm. These subclasses and the associated MessageDigestSpi classes that actually implement the algorithms are installed when you install a cryptographic provider.

Each provider offers a possibly redundant collection of message digest algorithms. The factory method design pattern used here allows for the possibility that a particular algorithm may be provided by different classes in different environments. For instance, the SHA-256 algorithm may be supplied by the sun.security.provider.SHA256 class on one system and by the cryptix.jce.provider.md.SHA256 class in another. Some standard algorithm names are listed in Table 12-1. If you request an algorithm that none of the installed providers can supply, getInstance( ) throws a NoSuchAlgorithmException. Most of the time, you're content to simply request an algorithm and let any provider that can fulfill your request provide it. However, if you want to specify a particular provider by name (for instance, because it has an especially fast native-code implementation of the algorithm), you can pass the provider name as the second argument to MessageDigest.getInstance( ). If the provider you request isn't found, getInstance( ) throws a NoSuchProviderException.
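The following sketch shows one way to cope with both exceptions. The helper firstAvailable( ) and the provider name "NoSuchProviderHere" are made up for illustration; Security.getAlgorithms( ) is a standard method that lists the names the installed providers supply for a given service.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;
import java.security.Security;

class DigestFactoryDemo {

  // Hypothetical helper: returns a digest for the first name in the
  // preference list that some installed provider can supply.
  static MessageDigest firstAvailable(String... algorithms)
      throws NoSuchAlgorithmException {
    for (String algorithm : algorithms) {
      try {
        return MessageDigest.getInstance(algorithm);
      } catch (NoSuchAlgorithmException ex) {
        // No provider supplies this one; try the next name.
      }
    }
    throw new NoSuchAlgorithmException(
        "None of the requested algorithms is installed");
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    // List the digest algorithms the installed providers offer.
    for (String name : Security.getAlgorithms("MessageDigest")) {
      System.out.println(name);
    }

    MessageDigest md = firstAvailable("Whirlpool", "SHA-512", "SHA-256");
    System.out.println(md.getAlgorithm() + " from "
        + md.getProvider().getName());

    try {
      // "NoSuchProviderHere" is a deliberately bogus provider name.
      MessageDigest.getInstance("SHA-256", "NoSuchProviderHere");
    } catch (NoSuchProviderException ex) {
      System.out.println("Provider not found: " + ex.getMessage());
    }
  }
}
```

The output depends on which providers are installed, so none is shown here.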

12.2.3. Feeding Data to the Digest

Once you have a MessageDigest object, you digest the data by passing bytes into one of its update( ) methods. If you're digesting some other form of data, such as Unicode text, you must first convert that data to bytes.

public void update(byte input)
public void update(byte[] input)
public void update(byte[] input, int offset, int length)
public final void update(ByteBuffer input) // Java 5

For example:

byte[] data = new byte[128];
int bytesRead = in.read(data);
sha.update(data, 0, bytesRead);

The first update( ) method takes a single byte as an argument. The second method takes an entire array of bytes. The third method takes the subarray of input beginning at offset and continuing for length bytes. The fourth method takes a ByteBuffer; all the bytes remaining in the buffer are digested.

In Java 5 and later, you can pass a ByteBuffer instead of an array of bytes. For the moment, you can think of a ByteBuffer as an object-oriented wrapper around an array that also keeps track of the current position within the array. We'll explore byte buffers in a couple of chapters. For now, the methods that operate directly on byte arrays will suffice.

You can call update( ) repeatedly as long as you have more data to feed it. Example 12-1 passed in bytes as they were read from the input stream. The only restriction is that the bytes should not be reordered between calls to update( ).
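To make this concrete, the sketch below digests the same bytes twice: once with a single update( ) call, and once split across the single-byte, subarray, and ByteBuffer variants. The class and method names are hypothetical, and the text is converted to bytes with UTF-8, which is one reasonable choice among several.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class UpdateVariants {

  // Digest the whole array with one update() call.
  static byte[] allAtOnce(byte[] data) throws NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    sha.update(data);
    return sha.digest();
  }

  // Digest the same bytes, in the same order, using three update() variants.
  static byte[] inPieces(byte[] data) throws NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    int i = 0;
    if (data.length > 0) {
      sha.update(data[i++]);                 // a single byte
    }
    int half = (data.length - i) / 2;
    sha.update(data, i, half);               // a subarray
    i += half;
    // a ByteBuffer: everything remaining in the buffer is digested
    sha.update(ByteBuffer.wrap(data, i, data.length - i));
    return sha.digest();
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    // Unicode text must be converted to bytes before it can be digested.
    byte[] data = "\u03c0 is roughly 3.14159".getBytes(StandardCharsets.UTF_8);
    System.out.println(Arrays.equals(allAtOnce(data), inPieces(data))); // true
  }
}
```

How the data is divided among the calls makes no difference; only the order of the bytes matters.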

12.2.4. Finishing the Digest

Digest algorithms cannot finish the calculation and return the digest until the last byte is received. When you are ready to finish the calculation and receive the digest, you invoke one of three overloaded digest( ) methods:

public byte[] digest( )
public byte[] digest(byte[] input)
public int digest(byte[] output, int offset, int length)
 throws DigestException

The first digest( ) method simply returns the digest as an array of bytes based on the data that was already passed in through update( ). For example:

byte[] result = sha.digest( );

The second digest( ) method receives one last chunk of data in the input array, then returns the digest. The third digest( ) method calculates the digest and places it in the array output beginning at offset and continuing for at most length bytes, then returns the number of bytes in the digest. If the digest has more than length bytes, this variant throws a DigestException. After you've called digest( ), the MessageDigest object is reset so it can be reused to calculate a new digest.
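The three variants can be compared side by side. In this sketch the class and method names are hypothetical, and SHA-256 is assumed to be installed; each method digests the same two chunks, so all three results should match.

```java
import java.nio.charset.StandardCharsets;
import java.security.DigestException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class DigestVariants {

  // Variant 1: update() with everything, then the no-argument digest().
  static byte[] viaUpdate(byte[] head, byte[] tail)
      throws NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    sha.update(head);
    sha.update(tail);
    return sha.digest();
  }

  // Variant 2: pass the last chunk directly to digest(byte[]).
  static byte[] viaFinalChunk(byte[] head, byte[] tail)
      throws NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    sha.update(head);
    return sha.digest(tail);
  }

  // Variant 3: have the digest written into a caller-supplied array.
  static byte[] viaOutputArray(byte[] head, byte[] tail, int offset)
      throws NoSuchAlgorithmException, DigestException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    sha.update(head);
    sha.update(tail);
    byte[] output = new byte[offset + sha.getDigestLength()];
    int written = sha.digest(output, offset, output.length - offset);
    return Arrays.copyOfRange(output, offset, offset + written);
  }

  public static void main(String[] args)
      throws NoSuchAlgorithmException, DigestException {
    byte[] head = "Hello, ".getBytes(StandardCharsets.UTF_8);
    byte[] tail = "world".getBytes(StandardCharsets.UTF_8);
    byte[] first = viaUpdate(head, tail);
    System.out.println(Arrays.equals(first, viaFinalChunk(head, tail)));     // true
    System.out.println(Arrays.equals(first, viaOutputArray(head, tail, 4))); // true
  }
}
```

The third variant is mostly useful when you want to avoid allocating a new array for every digest, for instance when filling in one field of a larger record.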

12.2.5. Reusing Digests

Creating a new message digest with MessageDigest.getInstance( ) carries some overhead. Therefore, when calculating digests for many different streams with the same algorithm, you should reset the digest and reuse it. The reset( ) method accomplishes this:

public void reset( )

In practice, you rarely call reset( ) directly because the digest( ) method invokes the reset( ) method after it's through. Once you've reset a message digest, all information you had previously passed into it through update( ) is lost.
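A short sketch of both behaviors follows, assuming SHA-256 is available. The names DigestReuse, digestAll( ), and digestAfterAbandoning( ) are hypothetical. The first method reuses one MessageDigest object for several messages; the second shows that an explicit reset( ) discards data that was already fed in.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class DigestReuse {

  // Reuse a single MessageDigest object for many messages.
  static byte[][] digestAll(String algorithm, byte[]... messages)
      throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algorithm);
    byte[][] results = new byte[messages.length][];
    for (int i = 0; i < messages.length; i++) {
      md.update(messages[i]);
      results[i] = md.digest(); // digest() leaves md reset for the next message
    }
    return results;
  }

  // Feed in some data, throw it away with reset(), then digest other data.
  static byte[] digestAfterAbandoning(byte[] abandoned, byte[] kept)
      throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    md.update(abandoned);
    md.reset();               // everything passed to update() so far is lost
    return md.digest(kept);
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    byte[] one = "first".getBytes(StandardCharsets.UTF_8);
    byte[] two = "second".getBytes(StandardCharsets.UTF_8);
    byte[][] digests = digestAll("SHA-256", one, two);

    byte[] fresh = MessageDigest.getInstance("SHA-256").digest(two);
    System.out.println(Arrays.equals(digests[1], fresh));                      // true
    System.out.println(Arrays.equals(digestAfterAbandoning(one, two), fresh)); // true
  }
}
```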

12.2.6. Comparing Digests

It's not all that hard to loop through two byte arrays to see whether or not they're equal. Nonetheless, if you do have two digests stored in byte arrays, the MessageDigest class does provide the simple static method MessageDigest.isEqual( ) that does the work for you. As you certainly expect, this method returns true if the two byte arrays are byte-for-byte identical or false otherwise.

public static boolean isEqual(byte[] digest1, byte[] digest2)

A little surprisingly, MessageDigest does not override equals( ). Therefore, md1.equals(md2) returns true if and only if md1 and md2 are both references to the same MessageDigest object.
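The difference between comparing digests and comparing MessageDigest objects can be seen in a few lines. This is a minimal sketch; the class DigestComparison and its sameContent( ) helper are hypothetical, and SHA-256 is assumed to be installed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestComparison {

  // Hypothetical helper: digests two strings with separate MessageDigest
  // objects and compares the resulting byte arrays.
  static boolean sameContent(String a, String b)
      throws NoSuchAlgorithmException {
    MessageDigest md1 = MessageDigest.getInstance("SHA-256");
    MessageDigest md2 = MessageDigest.getInstance("SHA-256");
    byte[] d1 = md1.digest(a.getBytes(StandardCharsets.UTF_8));
    byte[] d2 = md2.digest(b.getBytes(StandardCharsets.UTF_8));
    return MessageDigest.isEqual(d1, d2);
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    System.out.println(sameContent("same text", "same text"));   // true
    System.out.println(sameContent("same text", "other text"));  // false

    MessageDigest md1 = MessageDigest.getInstance("SHA-256");
    MessageDigest md2 = MessageDigest.getInstance("SHA-256");
    // equals() is inherited from Object, so it compares object identity.
    System.out.println(md1.equals(md2)); // false
    System.out.println(md1.equals(md1)); // true
  }
}
```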

Example 12-2 uses MessageDigest.isEqual( ) to compare the byte arrays returned by two MD5 digests, one for an original web page and one for a mirror copy of the page. The URLs to compare are passed in from the command line. It would not be hard to expand this to a general program that automatically checked a list of mirror sites to determine whether they needed to be updated.

Example 12-2. TrueMirror

import java.net.*;
import java.io.*;
import java.security.*;

public class TrueMirror {

  public static void main(String[] args)
   throws IOException, NoSuchAlgorithmException {

    if (args.length != 2) {
      System.err.println("Usage: java TrueMirror url1 url2");
      return;
    }
    URL source = new URL(args[0]);
    URL mirror = new URL(args[1]);
    byte[] sourceDigest = getDigestFromURL(source);
    byte[] mirrorDigest = getDigestFromURL(mirror);
    if (MessageDigest.isEqual(sourceDigest, mirrorDigest)) {
      System.out.println(mirror + " is up to date");
    }
    else {
      System.out.println(mirror + " needs to be updated");
    }
  }

  public static byte[] getDigestFromURL(URL u)
   throws IOException, NoSuchAlgorithmException {

    MessageDigest md5 = MessageDigest.getInstance("MD5");
    InputStream in = u.openStream( );
    byte[] data = new byte[128];
    while (true) {
      int bytesRead = in.read(data);
      if (bytesRead < 0) { // end of stream
        break;
      }
      md5.update(data, 0, bytesRead);
    }
    in.close( );
    return md5.digest( );
  }
}

Here's a sample run:

$ java TrueMirror http://www.cafeaulait.org/ http://www.ibiblio.org/javafaq/
http://www.ibiblio.org/javafaq/ is up to date

12.2.7. Accessor Methods

The MessageDigest class contains three getter methods that return information about the digest:

public final Provider getProvider( )
public final String getAlgorithm( )
public final int getDigestLength( )

The getProvider( ) method returns a reference to the instance of java.security.Provider that provided this MessageDigest implementation. The getAlgorithm( ) method returns a string containing the name of the digest algorithm as given in Table 12-1; for example, "SHA" or "MD2". Finally, getDigestLength( ) returns the length of the digest in bytes. Digest algorithms usually have fixed lengths. For instance, SHA-1 digests are always 20 bytes long. However, this method allows for the possibility of variable length digests. It returns 0 if the length of the digest is not yet available.
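As a brief illustration, the three getters can be combined into a one-line description of a digest. The class DigestInfo and its describe( ) method are hypothetical, and the algorithm names in main( ) are assumed to be available in the running JDK.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestInfo {

  // Hypothetical helper: summarizes a digest using its three getters.
  static String describe(MessageDigest md) {
    return md.getAlgorithm() + " (" + md.getDigestLength() + " bytes) from "
        + md.getProvider().getName();
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    String[] names = {"SHA-1", "SHA-256", "SHA-512", "MD5"};
    for (String name : names) {
      System.out.println(describe(MessageDigest.getInstance(name)));
    }
  }
}
```

The provider name in the output varies from one Java installation to the next, so no sample output is shown.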
