The Fundamentals of Cryptography


Cryptography is a branch of computer science concerned with protecting information. Aspects of data protection include making data private, verifying the authenticity of data, and verifying that data hasn't been tampered with.

Data Privacy

Data privacy is usually provided by using symmetric encryption, in which two or more parties share a secret, usually referred to as a key. An encryption algorithm, also called a cipher, and the key are used to encrypt the data. The data is then sent to the recipient, who uses the same encryption algorithm and the same key to decrypt the data. As long as the key is kept secret, the data remains private. If the key is compromised, an attacker can decrypt the data with ease.

The term symmetric refers to the requirement that the same key be used for encryption and decryption. Symmetric key systems are often referred to as private key systems because the keys used must remain private. Figure 15-1 shows an example of symmetric key cryptography.

A Cryptographic Maxim

Keep private keys private.

As a colleague once said, the value of a secret key is inversely proportional to its accessibility. Put another way, a secret shared by many people is no longer a secret.

Figure 15-1. Symmetric key cryptography.

The ability of a symmetric system to withstand attack comes from the strength of the algorithm and the size of the key. Some encryption systems are strong, others aren't. By strong, we mean that the system is well designed, has withstood previous attacks, and that any weaknesses in the algorithm are publicly known and well understood.

The size of the key is measured in bits; some cryptographic algorithms have 40-bit keys, some have 56-bit keys, some have 128-bit keys, and still others support variable-size keys. The larger the key, the longer it takes to perform an exhaustive check of all possible keys. The number of keys in a symmetric cipher, also called the keyspace, is 2^x, where x is the key size in bits. Using the same encryption algorithm, it takes 65,536 times longer to check all possible 56-bit keys than all possible 40-bit keys, because a 56-bit keyspace (2^56) is 2^16, or 65,536, times larger than a 40-bit keyspace (2^40).

Of course, an exhaustive key search might not be necessary if the cipher itself is weak.

What Is an Exhaustive Key Search?

An exhaustive key search is simply a test of every possible key available to an encryption algorithm. For example, let's say you receive a message and you know it was encrypted using a 56-bit cipher such as the Data Encryption Standard (DES). You don't know the key used to encrypt the message, so you muster all the computing power available to you and test all the keys, from 0 through 2^56 - 1, or 0 through 72,057,594,037,927,935, or, in hexadecimal, 0x0 through 0xFFFFFFFFFFFFFF. This is an example of an exhaustive key search.
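
To make the structure of the search concrete, here's a minimal sketch in Python. It uses a deliberately tiny 16-bit toy cipher (a keyed XOR stand-in, not DES) so that the loop finishes instantly; against a real 56-bit cipher the same loop would simply run for an impractically long time.

# Toy exhaustive key search. The "cipher" is a keyed XOR over a 16-bit
# keyspace, a stand-in for a real block cipher such as DES, used only to
# illustrate the structure of the search (it is NOT a secure cipher).

KEY_BITS = 16  # a real DES search would cover 56 bits

def toy_encrypt(key, plaintext):
    # XOR each byte of the plaintext with the two key bytes, alternating.
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(plaintext))

def exhaustive_search(ciphertext, known_plaintext):
    # Test every key in the keyspace until one decrypts to the expected text.
    for key in range(2 ** KEY_BITS):
        if toy_encrypt(key, ciphertext) == known_plaintext:  # XOR decrypts itself
            return key
    return None

secret_key = 0xBEEF
ciphertext = toy_encrypt(secret_key, b"ATTACK AT DAWN")
print(hex(exhaustive_search(ciphertext, b"ATTACK AT DAWN")))  # prints 0xbeef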

To put key size into perspective, Table 15-1 shows various key sizes along with the corresponding number of keys in the keyspace and the time it takes to check all keys at 1.6 million keys per second and at 10 billion keys per second. We chose 1.6 million keys per second because this is the approximate speed at which a 600-MHz Pentium III can test RC5 keys. (RC5 is a popular symmetric cipher developed by Ron Rivest of RSA Data Security.)

Table 15-1. Key sizes and the time required to test all keys in the keyspace.

Key Size (x)    Number of Keys (2^x)            Time to Check All Keys              Time to Check All Keys
                                                (at 1.6 Million Keys per Second)    (at 10 Billion Keys per Second)
40              1,099,511,627,776               8 days                              109 seconds
56              72,057,594,037,927,936          1,427 years                         83 days
64              18,446,744,073,709,551,616      365,338 years                       58.5 years
128             about 3.4 x 10^38               about 6.7 x 10^24 years             about 1.1 x 10^21 years

As you can see, a 40-bit symmetric key is utterly insecure. Even a lowly PC can break a 40-bit encryption system in a matter of a few days. A 56-bit system can be broken in days by a moderate to large amount of CPU power.

Remember that each extra bit in the key size doubles the keyspace. A 41-bit key has twice as many keys as a 40-bit key.
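
The arithmetic behind Table 15-1 is easy to reproduce. This short Python sketch computes, for the key sizes in the table, how long an exhaustive search takes at the two rates we assumed:

# Reproduce the arithmetic behind Table 15-1: the size of the keyspace and
# the time needed to test every key at a given rate.

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_search(key_bits, keys_per_second):
    keyspace = 2 ** key_bits                 # number of possible keys
    seconds = keyspace / keys_per_second     # worst case: test them all
    return seconds / SECONDS_PER_YEAR

for bits in (40, 56, 64, 128):
    print(f"{bits}-bit key: "
          f"{years_to_search(bits, 1.6e6):.3g} years at 1.6 million keys/sec, "
          f"{years_to_search(bits, 1e10):.3g} years at 10 billion keys/sec")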

Key-cracking competitions

Over the last few years, a number of cryptographic vendors have created contests in which they encrypt a message with a secret key and challenge the Internet community to recover the message. Distributed.net is one group that has taken these challenges seriously and in a number of cases has successfully revealed the key using exhaustive key searches.

What makes the distributed.net effort so interesting is that the keys were cracked using thousands of small computers rather than "Big Iron" machines such as mainframe computers. If small-key cryptographic algorithms are susceptible to attack by a loose collection of small computers, imagine how vulnerable they are to large, dedicated machines!

The process works like this: Each computer is assigned a small portion of the keyspace to check. If it doesn't find a probable key among its assigned chunk of keys, it informs a central key server that it hasn't found the key and downloads another set of keys to try. This process continues until the key is found.

For more information on key-cracking attempts, go to http://www.distributed.net.

How do you know when you've found the key?

This is an interesting question: if you're performing an exhaustive key search, how do you know when you have the right key? In some cases, it's easy to tell because many common file formats begin with a well-known header. For example, let's say that Alice sends Bob an encrypted photograph in Graphics Interchange Format (GIF). The first three bytes of a GIF file are the letters G, I, and F. If the program testing all keys finds these three letters at the start of the decrypted file, chances are good that it has found the key.
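
As a small illustration, a key-testing program might use a check like the following (the function name is ours) to flag a probable hit:

# Flag a probable hit during a key search by checking for a well-known
# file header. GIF files begin with "GIF87a" or "GIF89a".

def looks_like_gif(candidate_plaintext):
    return candidate_plaintext[:3] == b"GIF"

print(looks_like_gif(b"GIF89a......"))    # True: this key is worth a closer look
print(looks_like_gif(b"\x8f\x02\xa1..."))  # False: keep searching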

A Cryptographic Sanity Check

Let's assume for a moment that the entire keyspace of an encryption algorithm can be tested in eight days at a cost of $2 million. The total cost depends on a number of factors, including the cost of the hardware and software. You must consider two issues:

  • The value of the data
  • Whether the data is relevant after the key search

For example, why would you spend $2 million testing all keys if the data is worth only $17? Worse, why would you spend eight days testing all keys if the data is stale after seven days? Let's say that you set out to determine the content of a competitor's message. After eight days, you find that the message was a secret press announcement. However, it was secret for only seven days, after which it made the headlines in all the national newspapers. All of your efforts and expenses were wasted.

Symmetric key cryptographic algorithms

Examples of symmetric key encryption systems include the following:

  • DES, a common 56-bit algorithm
  • Triple-DES (3DES), which is DES used three times with two keys to create an effective 112-bit keyspace or DES used with three keys to create a 168-bit keyspace
  • RC2, RC4, and RC5, which are variable key-length ciphers developed by RSA Data Security
  • IDEA, a 128-bit system used in Pretty Good Privacy (PGP)
  • Skipjack, an 80-bit cipher used by the U.S. government for sensitive but unclassified data

In the overall scheme of cryptography systems, symmetric key systems offer reasonably quick encryption and decryption. However, there's one big problem: how do Alice and Bob agree on a key to use when they're sending secure messages to one another? We'll answer that question a little later. For the moment, let's see how Alice can send a message to Bob and be sure that the message isn't tampered with.
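
None of the ciphers listed above ships with Python's standard library, so the following sketch uses the third-party cryptography package's Fernet recipe (AES under the hood) purely to show the symmetric pattern: the same key both encrypts and decrypts.

# Symmetric encryption with a single shared key, using the third-party
# "cryptography" package (pip install cryptography). Fernet uses AES
# internally and stands in here for the ciphers listed above.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # the secret Alice and Bob must both hold
cipher = Fernet(key)

token = cipher.encrypt(b"Order 100 widgets")  # Alice encrypts with the key...
print(cipher.decrypt(token))                  # ...Bob decrypts with the same key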

Another Cryptographic Sanity Check

A symmetric key can be any value that falls within the bounds of the keyspace. For example, DES has a 56-bit key, so a DES key can be any value from 0 through 2^56 - 1. Passwords, on the other hand, tend to use a limited keyspace because they're normally restricted to printable characters, such as 0-9, a-z, A-Z, and punctuation, so that they're easy for people to remember.

Many cryptographic solutions take a password and use it directly as the key to a cryptographic algorithm such as DES. This is much less secure because the keyspace is no longer 2^56 keys in size; it's limited to the keys that can be spelled with 0-9, a-z, A-Z, and punctuation, which is a substantially smaller set than 2^56. If an attacker knows that the key is derived from a cleartext password, she knows that she doesn't have to test the entire keyspace.
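
The difference is easy to quantify. Assuming, purely for illustration, that passwords are drawn from the 95 printable ASCII characters:

# Compare the number of password-derived keys with the full 56-bit DES
# keyspace, assuming passwords built from the 95 printable ASCII characters.

full_keyspace = 2 ** 56

for length in (6, 7, 8):
    password_keys = 95 ** length
    print(f"{length}-character passwords: {password_keys:.2e} keys, "
          f"about {full_keyspace / password_keys:.0f} times smaller than 2^56")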

Another way to look at this problem is to consider what happens when a homeowner decides to buy strong locks for the front door and then keeps the key under the doormat. An attacker won't attempt to pick the lock; he'll first look for the key—under a flower pot, under a window ledge, or, of course, under the welcome mat.

Verifying Data Integrity

Verifying that data hasn't changed or been tampered with is an extremely important aspect of data security. If Alice places an order for 100 widgets, she wants to be sure that the order isn't changed by an attacker to an order for 150 widgets.

Cryptographic systems use hash functions, also called digest functions, to provide such integrity checks. Simply put, a hash function takes a large chunk of data and produces a much smaller digest, usually 128 bits or 160 bits in length. Digests are somewhat similar to cyclic redundancy checks (CRCs) in data communication. Their main characteristics include the following:

  • A small change in the originating data creates a massive change in the resulting digest.
  • It isn't feasible to create data whose digest matches a specific, given digest.
  • It's impossible to determine the data based on just the digest.

A good analogy for a digest is your thumbprint. A thumbprint uniquely identifies you but tells nothing about you or about what you know. It's also unlikely that you'll find someone else with a thumbprint that matches yours.
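
You can see the first property directly with Python's standard hashlib module (MD5 is used here only because the chapter discusses it):

# A one-character change in the data produces a completely different digest.
import hashlib

print(hashlib.md5(b"Order 100 widgets").hexdigest())
print(hashlib.md5(b"Order 150 widgets").hexdigest())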

It's common practice to hash all data before sending it to a recipient and then append the hash to the original data. On receipt, the recipient recalculates the hash of the data and checks it against the appended hash. If the two are the same, the data hasn't changed. Right? No, not really! There's nothing stopping an attacker from intercepting and changing the message, recalculating the hash, and appending the new hash to the changed message. However, we'll cover this issue in detail later when we discuss digital signatures.

Figure 15-2 shows Alice sending an order to Bob, and Figure 15-3 shows Bob verifying that the message was not tampered with.


Figure 15-2. Alice sends an order to Bob.


Figure 15-3. Bob verifies that the order wasn't tampered with.

A message authentication code, or MAC, also uses hash functions to provide evidence that data has been tampered with, and it additionally verifies the authenticity of the data.

For example, Alice and Bob agree on a secret key for their conversation. When Alice sends data to Bob, she hashes the data and then hashes the secret key with the resulting hash. The function looks like this:

 MAC = HASH( HASH( data ), secret ) 

Upon receiving the data, Bob rehashes the document and applies the secret key to the result. If the resulting MAC is the same as the MAC in the message, he knows that the data wasn't tampered with and that someone who knows the secret key sent the message.
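
Here's a sketch of that construction using Python's standard hashlib, reading HASH(HASH(data), secret) as hashing the concatenation of the inner digest and the secret; modern code would normally use the standard hmac module, which implements the closely related HMAC construction:

# The MAC construction described above: hash the data, then hash the inner
# digest together with the shared secret. Production code would normally
# use Python's standard hmac module (the HMAC construction) instead.
import hashlib

def make_mac(data, secret):
    inner = hashlib.sha1(data).digest()              # HASH(data)
    return hashlib.sha1(inner + secret).hexdigest()  # HASH(HASH(data), secret)

secret = b"shared between Alice and Bob"
mac = make_mac(b"Order 100 widgets", secret)

# Bob recomputes the MAC; it matches only if both the data and the secret match.
print(make_mac(b"Order 100 widgets", secret) == mac)  # True
print(make_mac(b"Order 150 widgets", secret) == mac)  # False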

Examples of common hash functions include

  • MD4, created by RSA Data Security, which yields a 128-bit digest
  • MD5, created by RSA Data Security, which yields a 128-bit digest
  • SHA-1, created by the National Institute of Standards and Technology (NIST), which produces a 160-bit digest

Hashing is a quick process, even with very large quantities of data.

Verifying Authenticity (Sort Of)

Another form of data encryption uses asymmetric keys; it's often called public key encryption. Rather than having two or more parties share one key, asymmetric systems use key pairs. Each party has two mathematically related keys, one called the private key and the other called the public key. Depending on the algorithm used, the two keys are usually two very large numbers. One feature of these systems is that knowing the public key doesn't mean you can easily determine the private key, and vice versa.

As the name implies, the public key can be made public, but the private key must remain private. In fact, the private key often doesn't leave the hardware on which it was generated.

Have Keys, Will Travel

It's imperative that the private keys in an asymmetric or public key system remain private. The foundation of a public key infrastructure (PKI) is the protection of private keys. However, this leads to a significant problem: what happens if users need access to their private keys from multiple workstations? There are a number of solutions, some convenient, some secure:

  • Store your private keys on each workstation. This leads to two problems: how to securely transfer the keys to each workstation and how to keep others from accessing the keys after the user logs off the workstation.
  • Restrict users to using only one workstation. This is difficult to enforce, especially in a shared-workstation scenario.
  • Store the keys on hardware devices such as PCMCIA cards or smartcards. This is a reasonable solution and is well supported by Windows 2000. In this scenario, the user need not store private key information on a computer. When a user wants to use another system, she must provide her hardware token in order to use her private keys. Fortezza PCMCIA cards are supported by Internet Information Services (IIS) 5 and Microsoft Internet Explorer 5; see the IIS documentation for further details about Fortezza. Smartcards are natively supported by Windows 2000, IIS 5, and Internet Explorer 5.

Public key cryptography also has the following important attributes:

  • Data encrypted with the public key can be decrypted only with the private key.
  • Data encrypted with the private key can be decrypted only with the public key.

If these two statements haven't sunk in, read them again!

If Alice receives a message from Bob and the message can be decrypted using Bob's public key, as shown in Figure 15-4, it must have been encrypted using Bob's private key and hence must have come from Bob because only Bob has access to Bob's private key. We've just proved the authenticity of a message, right? Actually, we haven't. How do we know that Bob's public key was used in the first place? We're on the right track, though. We'll explain how you can verify Bob's public key when we discuss certificates.


Figure 15-4. Bob sends a message to Alice in such a way that Alice knows it could have come only from Bob.

If Alice wants to send a private message to Bob (a message that only Bob can read), as shown in Figure 15-5, all she has to do is use Bob's public key (it's public, remember) to encrypt the message. Because only Bob has access to his private key and only the private key can decrypt the message, only Bob can decrypt the message. This is a way of sending a private message to Bob. (Actually, it isn't. How do we know it's Bob's public key? As just noted, however, we're on the right track. More about this later when we cover certificates.)

Two Simple Rules of Thumb

If you have a private key, you can send authenticated messages. If you have someone else's public key, you can send them encrypted messages.


Figure 15-5. Alice sends a message to Bob that only he can read.

The most famous example of a public key encryption system is RSA, named after the three inventors of the algorithm, Rivest, Shamir, and Adleman. One downside of most public key cryptography systems is that they're extremely slow, so it's impractical to encrypt and decrypt large amounts of data with them.
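
Here's a sketch of the encrypt-to-Bob scenario using RSA from the third-party cryptography package (the key size and OAEP padding are illustrative choices, not requirements from the text):

# Public key (asymmetric) encryption with RSA, via the third-party
# "cryptography" package. Data encrypted with Bob's public key can be
# decrypted only with Bob's private key.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bob_public = bob_private.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

ciphertext = bob_public.encrypt(b"For Bob's eyes only", oaep)  # Alice encrypts
print(bob_private.decrypt(ciphertext, oaep))                   # only Bob can decrypt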

Digital Signatures and Signing

Signing combines two cryptographic technologies: public key (asymmetric) encryption and hashing. The procedure is simple. Alice's software hashes the document, such as an e-mail message, and then encrypts the resulting digest with her private key. The resulting signature is appended to the message. When the recipient, Bob, receives the e-mail, his software hashes the document and uses Alice's public key to decrypt the signature, as shown in Figure 15-6. If the two hashes match, we know two things:

  • The document hasn't been tampered with. If it had, the hashes wouldn't match.
  • The message must have come from Alice. The signature decrypts with her public key, so it must have been created with the matching private key, and only Alice has access to that private key.

Figure 15-6. Alice signs a document using her private key so that anyone with her public key can verify that the document came from her and was not tampered with.

Again, we must ask how Bob knows that Alice is the sender. In other words, how does he really know that the public key belongs to Alice and no one else? We'll address this issue shortly.
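
A sketch of signing and verification with RSA from the same cryptography package follows; the library hashes the document and applies the padding internally, and the parameter choices are illustrative:

# Digital signature: Alice signs with her private key; anyone holding her
# public key can verify. Uses the third-party "cryptography" package.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

alice_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
alice_public = alice_private.public_key()

document = b"I agree to buy 100 widgets. -- Alice"
signature = alice_private.sign(document, padding.PKCS1v15(), hashes.SHA256())

try:
    alice_public.verify(signature, document, padding.PKCS1v15(), hashes.SHA256())
    print("Signature valid: the document came from Alice and wasn't altered.")
except InvalidSignature:
    print("Signature check failed: wrong key or tampered document.")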

Key Agreement

So how do Alice and Bob agree on a key? If asymmetric cryptography weren't so slow, Alice could simply encrypt the whole message using Bob's public key. However, encrypting anything beyond a small message that way is excruciatingly slow. Normally, Alice and Bob agree on a symmetric key for bulk encryption and use that key to encrypt the message. Among the many mechanisms for doing this, we'll cover one: using public key encryption for key agreement.

Alice wants to send a large secret message to Bob. To do so, she thinks up a key to use to encrypt the message (or better yet, she lets her computer generate a random number to act as the key). This key is often referred to as a session key because it's used only for the duration of the communication. She encrypts the session key using Bob's public key and sends the encrypted blob to Bob. This is a reasonably quick process because a session key is usually no more than a few hundred bits in length. Bob's software uses his private key to decrypt the blob and recover the session key. Alice then sends her messages to Bob encrypted with the session key, which Bob now knows.
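
Putting the pieces together, here's a sketch of the whole exchange using the same illustrative building blocks as the earlier examples (Fernet for the bulk data, RSA with OAEP to wrap the session key):

# Key agreement via public key encryption: Alice encrypts a random session
# key with Bob's public key, then uses the session key for the bulk data.
# Uses the third-party "cryptography" package.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bob_public = bob_private.public_key()

# Alice's side: pick a session key, encrypt the bulk data with it, and
# encrypt the session key itself with Bob's public key.
session_key = Fernet.generate_key()
bulk_ciphertext = Fernet(session_key).encrypt(b"A very large secret message...")
wrapped_key = bob_public.encrypt(session_key, oaep)

# Bob's side: recover the session key with his private key, then use it
# to decrypt the bulk data.
recovered_key = bob_private.decrypt(wrapped_key, oaep)
print(Fernet(recovered_key).decrypt(bulk_ciphertext))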

We've already mentioned the flaw here: how does Bob know that Alice sent the original message containing the key and how does Alice know that she's sending the key to Bob? In other words, how do Alice and Bob know that each other's public keys are valid? That is the purpose of certificates, our next subject.


