Keeping Data Secret: Cryptography, Codes, and Ciphers

Considering the extent to which we make use of various forms of cryptography in our day-to-day lives, most people display an immense lack of understanding of the implications of this use, and of the conclusions that may be drawn with respect to other encodings or encryptions that are proposed. Codes and ciphers are neither necessarily difficult nor necessarily secure. Few people realize that the alphabet that they use every day in writing is a written code substituting for spoken language. Many, however, will implicitly and comfortably trust a program or Web Site that claims that all of their personal information will be encoded to protect it from illegitimate access. If customers are willing to believe that "encoding" their information makes it secure, but aren't well-enough informed to question the validity of the claim that encoding their information actually makes it secure, what reason does an admittedly unscrupulous merchant have to actually invest in a secure encryption system? The lesson to be learned, which will be amplified upon later, is that no encoding or encryption should be considered to be secure if it hasn't been exposed to the light of public scrutiny and verified to be secure in practice. This section details several of the currently popular encoding and encryption systems used in computer security. Some of these apply directly to Mac OS X user passwords, which are covered in the next chapter because they are used in the available authentication systems. Others described here are used in other security systems you may encounter while using your computer, such as online merchant customer information systems, the Mac OS Keychain, or in encrypting files or network transmission such as email, and are interesting both by way of comparison and as a reference for these other environments.

The lack of general lay understanding of the cryptography field isn't at all aided by the fact that the language used is rather jargonistic, and the meanings of key terms are dependent on the background of the speaker and the context of the discussion. Depending on whether you ask a cryptologist or a mathematician the definition of the word "code," you'll get two different answers, and a computing security professional might give you a sideways twist on a third.

The cryptologist will tell you that a cipher is a modification of a message (not necessarily a textual message) by the algorithmic substitution of new data based on the content of the original message, and that a code is a cipher in which the substitutions are linguistic in nature. That is to say, that if you were speaking with one friend about a roommate's birthday, and needed to keep the content of your discussions private because the birthday boy might be listening, you might devise a code for discussing your preparations . You might choose to equate "cake" with "llama", "bake" with "spank", "invite" with " tickle " and "friend" with "badger." A conversation between you and your friend might come out something like this:

"Hi, Jim. Just checkingdid you spank the llama yet?"

"You bet, Mary, and I also remembered to tickle the badgers."

Which is unlikely to convey much meaning to your roommate beyond the fact that you're either speaking in code, or slightly nutty. Such a system, however, doesn't need to be particularly invisible to be quite effective. One regularly hears about the FBI's scanning of Internet traffic to find people transmitting information for the purposes of commission of a crime (http://www.fbi.gov/hq/lab/carnivore/carnivore.htm). It's reasonable to expect that they are looking for and flagging messages with potentially interesting words, phrases, or patterns in them. In light of the recent terrorist activity, it's likely that the FBI is flagging and examining communications containing such words as "bomb," or " nuclear ." People wishing to discuss the construction and delivery of a bomb to its target, however, could almost certainly develop a simple code based around baked goods and grandma's house that would completely remove all suspect words from their communications, while still conveying all the necessary information to a coconspirator about the progress and delivery schedule. This is, in fact, one of the reasons that granting the government broad invasive powers with respect to communications as a response to National Security needs is a Bad Idea. Such knee-jerk reactions decrease online users' freedoms and liberties, while affording no real increase in security. Ben Franklin had a few choice things to say about those who would try to make the exchange (http://www.brainyquote.com/quotes/quotes/b/q118446.html).

A particularly aptly constructed code, with substitutions very carefully chosen to avoid sounding out of place would allow two people to converse without necessarily raising suspicion that they were actually speaking in code. "You have a midterm", for example, would probably make a good substitution for "Invite the guests" in a college setting where "Remember, you have a midterm today" is unlikely to raise an eyebrow. Noun-for-noun, verb-for-verb or phrase-for-phrase substitutions, however, are not strictly required. You could just as well substitute the numbers "2" for "invite" and "7" for "friend," though "I also remembered to 2 the 7s" is somewhat more obviously encoded.

Speaking mathematically, however, a code doesn't need to be substituted at the level of linguistic components . Morse code, which substitutes intermixed long and short sound periods for letters , is legitimately a code in mathematical parlance, though it would be considered a simple substitution cipher, and not properly a code, by a cryptologist.

Regardless of the exact definition used, codes universally involve the substitution of linguistic or alphabetic components of a message with other linguistic or symbolic components through the use of a dictionary. This dictionary provides equivalences between words or characters in the unencoded, or plaintext , message, and words, character groups, or symbols to be transposed into the encoded version of the message. The dictionary may be derived in some algorithmic or automated fashion, but there just as easily may be no definable relationship between the unencoded and encoded versions other than through the existence of the dictionary. This implies that a code is likely to be unbreakable without the construction (or other acquisition) of a corresponding dictionary, but that once such a dictionary has been acquired , any message using that code is easily decodable through the use of the dictionary.

On the other hand, if you were corresponding with your friend via writing, you might find it easier to use a cipher to hide the content of your message. A cipher uses algorithmic substitution upon the message at the character level (or, more properly, at a level ignoring the linguistic structure), making it much easier to apply. A code dictionary listing equivalences for every possible word you might want to transmit without your roommate being able to decipher the message would probably be impractical to construct. With a cipher, however, construction of the dictionary is not necessary, as the substitutions are algorithmic in nature, and carried out at the letter or symbol level. You might, for example, choose to substitute every letter you wished to encode with the letter following it in the alphabet. This wouldn't be a particularly difficult cipher for your roommate to guess, and subsequently decipher, but regardless, the prior discussion in this form would be rendered like this:

"Hi, Jim. Just checkingdid you cblf the dblf yet?"

"You bet, Mary, and I also remembered to jowjuf the gsjfoet."

Because application of the cipher isn't limited to a predetermined dictionary of words, it would be equally easy to encipher the entire exchange by using this scheme, reducing the chance that your roommate would be able to make intelligent guesses at the content from the context:

"Ij, Kjn. Kvtu difdljoheje zpv cblf uif dblf zfu?"

"Zpv cfu, Nbsz, boe J bmtp sfnfncfsfe up jowjuf uif gsjfoet."

This is, in fact, a very simple method of enciphering text, credited in its initial form to none other than Julius Caesar, who, it is said, implemented it in the form of a pair of rotatable rings on a staff, each containing the alphabet in order. Setting any character in one ring above a different character in the other allows one to encipher and decipher text written at any offset from the plaintext versionproviding, of course, the offset is known.

A particularly common variant of this type of simple substitution cipher is the rot13 cipher, which, instead of transposing each character by the one following it, transposes for the one rotated 13 characters ahead. An appealing feature of this cipher is that because the English alphabet is 26 characters in length, the same algorithm that creates ( enciphers ) the enciphered text ( ciphertext ) can be used to convert the ciphertext back to the plaintext message. The rot13 cipher is so common in the Unix and Usenet News environment that the ability to encipher to and decipher from it is built into many Usenet News clients and email readers. For those lacking such capability, the tr command, or a short perl script can suffice with only a bit of extra cutting and pasting:

 %  tr 'a-zA-Z' 'n-za-mN-ZA-M'   Hi, Jim. Just checking - did you bake the cake yet?  Uv, Wvz. whfg purpxvat - qvq lbh onxr gur pnxr lrg?  You bet, Mary, and I also remembered to invite the friends.  Lbh org, Znel, naq V nyfb erzrzorerq gb vaivgr gur sevraqf. %  tr 'a-zA-Z' 'n-za-mN-ZA-M'   Uv, Wvz. whfg purpxvat - qvq lbh onxr gur pnxr lrg?  Hi, Jim. just checking - did you bake the cake yet?  Lbh org, Znel, naq V nyfb erzrzorerq gb vaivgr gur sevraqf.  You bet, Mary, and I also remembered to invite the friends.

If you prefer Perl, the following snippit of code will do the same:

 #!/usr/local/bin/perl while($line=<>) {  $line =~y/a-zA-Z/n-za-mN-ZA-M/;  print $line; }

The problem with simple substitution ciphers such as this is that the nonrandom patterns of the English language (or actually, it seems, any human language) allow an intelligent person in possession of a sufficient amount of enciphered text, to mount a very precise, and almost inevitably successful attack against the cipher algorithm. Many longtime denizens of the Usenet News hierarchy have become so familiar with the rot13 cipher that they can read many words in rot13 as easily as they can in plain English. This overwhelming weakness lies in the fact that the substitutions are carried out in a perfectly regular way, and really amount to nothing more than a definition of a new alphabet for the same language. The characters used look just like the characters used in common English writing, but in the rot13 alphabet, n is pronounced like the English a , o is pronounced like the English b , and so on. If one studies the language that has been encoded, patterns in letter usage begin to emerge, and it becomes relatively simple to determine the encoding algorithm, and by that, the original plaintext message. For example, if you examine letter frequency tables for the English language, you will find that the letter e is used far more frequently than any other letter. Not coincidentally, if you count up the various characters used in the ciphertext of the messages above, you'll find that the most prevalent letters are r , v , and g , with 13, 8, and 7 uses respectively. Depending on who you ask regarding letter frequency tables, they'll tell you that t , followed by a , o , and i (the latter 3 with very nearly identical frequency) are the next most frequently used letters in English. With these bits of data it isn't too difficult to start back-substituting letters into the ciphertext, and very quickly arrive at the deciphered plaintext message. Just working with the e , the second line of the ciphertext becomes (with bold uppercase plaintext predictions inserted):

"Lbh org, Znel, naq V nyfb e E z E zo E e E q gb vaivg E gu E sev E aqf."

How many three letter words do you know that end in 'e'? How many single-letter words do you know? If either v or g stands for t , which one is really it? Based on that, what do b and u stand for? As you can see, this type of cipher provides little real protection for a message, though it can make it more difficult to read for those insufficiently immersed in the alternate alphabet to be able to read it clearly.

There are a multitude of interesting Web resources devoted to codes and ciphers, one of the most immediately readable being "The Secret Language," by Ron Hipschman (http://www.exploratorium.edu/ronh/secret/secret.html). Ron provides a down-to-earth explanation of simple substitution ciphers, as well as transposition ciphers (a technique that is not covered in this book). He also includes a nice summary of frequency tables for single letters, letter pairs, initial letters, and common words through four letters in length.

Perhaps the most famous literary discussion of ciphers, however, is contained in Edgar Alan Poe's "The Gold Bug," in which the entire plot of the story hinges on the deciphering of a secret message. "The Gold Bug" can currently be downloaded from the Oxford Text Archive, available online at http://ota.ahds.ac.uk/ texts /1855.html.

To make ciphers more secure, a number of methods have been invented to add complexity to the resulting ciphertext, and thereby make it more difficult to decipher. Interestingly, the most successful of these have not relied on making the algorithm obtusely complex and/or obscure. Rather they have relied on the use of an algorithm, the action of which is permuted in some fashion by the use of a key . The algorithm is then able to be well-publicized, tested , and verified, but any given ciphertext cannot be decrypted by simple knowledge of the algorithm; any decryption requires access to the key.

One simple but important adaptation of the simple substitution cipher was invented in the 16th century by Blaise de Vigenere from the court of Henry III of France. It was believed to be essentially unbreakable for several centuries. The Vigenere cipher works very similarly to the initial cipher discussed in this chapter, where letters are substituted by the letter immediately following them in the alphabet. To make the message much more secure, however, the offset to the substitution letter is not fixed at "the following character" (an offset of 1), but instead may be any offset from 0 to 26. This makes deconvolution of the ciphertext message by examining letter frequency nearly impossible , as the first e in a plaintext message may be substituted with an f , but the second could be substituted with an o , or a z , or any other letter. Counting the occurrence of characters in the ciphertext is therefore almost useless. To decipher a ciphertext written in the Vigenere cipher, and indeed to encipher it, one needs a key.

Despite the difficulty in deciphering a message in this cipher, the algorithm itself is simple to explain. First, one must understand that although a simple substitution cipher essentially replaces one alphabet with another, this cipher replaces the alphabet in which a message is written with many others. To do this substitution, one constructs a matrix of alphabets as shown in Figure 4.1:

Figure 4.1. The 26 alphabets of the Vigenere cipher.

graphics/04fig01.jpg

A key is then chosen with which to encipher the message. Typically the key is a short word or phrase. The plaintext message is written down, and the key written out in a repeating fashion over the entire length of the plaintext message. If you pick "birthday" as the key for the previous exchange, the enciphering of the first line of the conversation would begin as shown in Figure 4.2.

Figure 4.2. A plaintext message and key layout for the Vigenere cipher.

Enciphering the message then proceeds on a character-by-character basis. For each character, the appropriate alphabet to use is looked up based on the corresponding letter of the key. For the first character, H , the key character is B ; therefore we look to the row of the matrix constructed in Figure 4.1 that starts with a B , and look up the entry that corresponds to the H column from the first row (that is to say, an I ). The second character is an i , enciphered from the alphabet starting with I , so it gets replaced with a q . The ciphertext is constructed in this fashion for the entire plaintext message. Note that the e 's are going to be encoded from the alphabets starting with B, H, and D , so the ciphertext is going to contain different characters for them, defeating simple attacks by character frequency in the ciphertext.

Special characters such as spaces can be ignored ( causing them to disappear in the output), absorb a character of the key, resolve to a default character from the alphabet, or be explicitly encoded into the cipher with the addition of more columns in the matrix shown in Figure 4.1. Especially for spaces, either ignoring them or encoding them into the alphabet matrix is a better idea than allowing them to absorb a key character: Following the encoding shown gives an attacker potentially useful hints through analysis of the likely words based on character count. If you add a space entry to the matrix in Figure 4.1 following the Z entry on each row, and remove the punctuation, the message then encodes as shown in Figure 4.3. Deciphering the ciphertext is exactly the reverse of the method used for enciphering the text.

Figure 4.3. The message enciphered with the Vigenere cipher and with "BIRTHDAY" used as the key.

graphics/04fig03.gif

Note that spaces still occur in the ciphertext, but that due to the space now being a valid encoding character in each of the new alphabets, a space in the ciphertext rarely corresponds to a space in the plaintext, defeating attempts to use word- size statistics against the encoding. In this case, the ciphertext character with the most occurrences is Q , but this character stands for i , a space, and y in different places in the ciphertext. It took until 1863 for an officer of the Prussian military to determine a method in which the length of the key could be predicted . If the length of the key can be guessed, then the ciphertext can be broken into <keylength> different cipher-subtexts, each encoded with a different simple substitution cipher. If enough raw ciphertext that uses the same key can be accumulated , then these subsets of the ciphertext can be successfully attacked by the use of character frequency tables.

Since that time, a multitude of ciphers have been proposed, used, and broken. The best of them to date are in use protecting your communications in Secure Shell (SSH, discussed in Chapter 14, "Remote Access: Secure Shell, VNC, Timbuktu, Apple Remote Desktop"), on secure Web sites (Chapter 15, "Web Server Security," and Appendix C, "Secure Web Development"), and in applications such as PGP and GPG (later in this chapter). The most obvious things all successful algorithms have in common are complex keys that must be shared to allow decryption but that must be protected to prevent unauthorized access, and widely published and studied algorithms, allowing cryptography experts to eliminate places where vulnerabilities, such as the ability to recover the key from the ciphertext, might occur.

To make semantic matters more confusing, although cryptography is properly the practice of employing codes and ciphers, the computing security world has taken to using the term encryption to mean only the application of specific types of ciphersthose being ones that require a key to be exchanged for decryption. Cryptanalysis is therefore the science and practice of studying cryptographic systems to discover and exploit their weaknesses, and cryptology is the umbrella science encompassing both cryptography and cryptanalysis. More confusingly, the cryptology field uses the terms encoding and encoded in a generic sense to mean the application of any cryptographic technique to transform plaintext data, including the application of ciphers. I could therefore legitimately have used the term encoded in this chapter where I have used the more precise term enciphered . The computing security field, however, prefers to use encoded to indicate an encoding (or enciphering), which specifically is not meant to obfuscate the content (such as "Morse code," or "Pascal (programming language) code"), and deprecates the use of encipher as a verb entirely, except in cases where cultural bias makes the use of encrypt unpalatable. This leaves , at least according to IETF RFC 2828 (http://www.ietf.org/rfc/rfc2828.txt), the discussion of proper ciphers that don't meet the RFC definition of an encryption scheme in undiscussable limbo.

Be that semantic soup as it may, we will adopt the only conglomeration of these terms that seems to cover the spectrum while remaining sensible for use in this book.

Encoding is used in the mathematical sense of transformation from one symbol set to another. This may be the encoding of a program's logic as commands in a formal program language, or the substitution of llama for cake . It also includes the possibility of symbolic transformations at the character level, such as the use of Morse code. In transmitting data over the Internet, it's also common to see information encoded from one form that's incompatible with a transport system into another that is. You'll therefore frequently see binary information, such as images, attached to pure-text transport systems such as email with the image encoded into a textual representation (such as Base64) that the mail transport system can handle.
Enciphering is used to indicate the application of a character-symbol-level transformation of a plaintext message with the intent to disguise the content.
Encryption is used as RFC 2828 recommends, to indicate the application of an algorithmic cipher that requires key exchange for decoding.

The next few sections briefly outline the currently prevalent encryption methods and how they apply to securing small data tokens such as passwords. While reading these, keep in mind that a message to be encoded (plaintext in the previous examples) does not literally need to be a textual message. Any data stream may be the message, including, but not limited to, images, network traffic, software executables, or binary data. Nor does the encrypted (ciphertext) output literally need to be textual in nature.

DES: Data Encryption Standard

DES, Data Encryption Standard, is an example of a symmetric , or secret key , cryptosystem. Symmetric cryptosystems use the same key for encryption and decryption.

DES was developed in the 1970s at IBM with some input from the National Security Agency (NSA) and the former National Bureau of Standards (NBS), which is now the National Institute of Standards and Technology (NIST). The U.S. government adopted DES as a standard in July 1977 and reaffirmed it in 1983, 1988, 1993, and 1999. It is defined in the Federal Information Processing Standards (FIPS) publication 46-3, available at http://csrc.nist.gov/ publications /fips/fips46-3/fips46-3.pdf.

DES is a 64-bit block cipher that uses a 56-bit key during execution. In other words, it encrypts data by breaking the message into 64-bit blocks and encrypting them with a 56-bit key. The same key is used for encryption and decryption. It is susceptible to brute-force attacks from today's hardware, and is considered to be a weak encryption algorithm now. In FIPS 46-3 the government allows its use only in legacy systems, and instead lists Triple DES as the FIPS-approved symmetric algorithm of choice.

Although the algorithm itself is more complex than can be explained in detail here, in basic form it can be understood in the following form:

The 56-bit key is extended to 64 bits with parity and error-correction data. This becomes key K .
The message is segmented into 64-bit blocks.
A block B of the message M is passed through a known permutation IP , which rearranges its bits in a predetermined fashion. The result is PB .
The PB is injected into a central encryption loop and a counter N is initialized .
The incoming block is split into left and right 32-bit halves L and R .
48 bits of the K are selected into Kn , based on the contents of K and iteration count N .
A cipher function F is applied to R , using Kn . The output is R' .
A replacement PB is assembled from R' and L , in that order ( R' becomes the leftmost 32 bits; L becomes the rightmost).
The process repeats an additional 15 times from step 5.
The final reassembled PB is reversed once more to PB' .
PB' is subjected to an inverse transformation, IP' , which is the reverse of the initial reordering IP . The output of this operation is 64 bits of the encrypted output for M .
The next block of 64 bits from M is chosen as B , and the algorithm repeats from step 3.

RSA Laboratories has thus far sponsored three DES-cracking challenges, each of which have been solved. Rocke Verser's group at the University of Illinois solved the first challenge in 1997, and the Electronic Frontier Foundation solved the second challenge in 1998. distributed.net and Electronic Frontier Foundation won the third contest in 1999. Information on the latter two contests can be found at http://www.eff.org/descracker.html and http://www.distributed.net/des/.

Interestingly, the strength of the basic algorithm behind DES has been affirmed by these cracking attempts, rather than deprecated as insecure . The insecurity lies in its use of only a 56-bit key, which can be attacked in a brute-force fashion quite easily by today's standards. With more than 20 years of scrutiny behind it, there still has been no algorithmic solution devised that can back-calculate the original message from the encrypted output, or guess the key without an exhaustive search of the entire key space. This, to say the least, is an impressive accomplishment.

DES is being replaced by the Advanced Encryption Standard (AES). AES was adopted as a standard in May 2002 and is defined in the FIPS publication 197, available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf. AES is a 128-bit symmetric block cipher that uses 128-, 192-, and 256-bit keys. The algorithm can handle additional block sizes and key lengths, but the standard has not adopted them at this time.

RSA: The Rivest-Shamir-Adelman Algorithm

RSA is an example of an asymmetric or public key cryptosystem. Asymmetric cryptosystems use the different keys for encryption and decryption. Encryption is done with a public key, but decryption is done with a private key. The basic notion is that each party who wishes to engage in secure communication of information generates a public and a private key. These keys are related in a precise mathematical fashion such that a piece of information to be exchanged between individual A and individual B that is encrypted with B's public key can only be decrypted with B's private key. Authentication of A as the sender can also be performed if a validation signature is attached in both plaintext, and encrypted with A's private key to the encrypted data that is sent to B. The encrypted portion of this signature can only be decrypted with A's public key, thereby validating A as the sender.

RSA was developed in 1977 by Ronald Rivest, Adi Shamir, and Leonard Adelman, and it takes its name from the initials of the developers' last names . The latest version of the documentation on RSA is available at http://www.rsasecurity.com/rsalabs/pkcs/pkcs-1/.

The RSA algorithm is based on factoring. The product of two large prime numbers similar in length, p and q , gives a number called the modulus , n . Choose another number, e , such that it is less than n and is relatively prime to ( p 1) x ( q 1) , which is to say that e and ( p 1) x ( q 1) have no factors larger than 1. Choose another number d , such that (( e x d )1) is divisible by ( p 1) x ( q 1) . The number e is known as the public exponent, and the number d is known as the private exponent. The public key is the pair of numbers ( n,e ) , and the private key is the pair ( n,d ) .

If individual A wishes to send a message M to individual B , A encrypts M into ciphertext C by exponentiation: C = ( M ^ e ) mod n , where e and n are the elements of B 's public key. ^ is the exponentiation operator, ( i ^ j ) is i raised to the power j . mod is the modulus mathematical operator; ( i mod j ) produces the remainder of i divided by j . B decodes the message also by exponentiation: M = ( C ^ d ) mod n , where d and n are B 's private key. The relationship between e and d is what causes this reversal by exponentiation to work, and allows correct extraction of M from C . Because only B knows the private key ( d,n ) , only B can extract the message.

To sign the message, A attaches a signature s to the ciphertext C , and also an encrypted signature es . es is generated by exponentiation with A 's private key: es = ( s ^ d ) mod n , where d and n are from A 's private key. B computes a decrypted signature ds by exponentiation: ds = ( es ^ e ) mod n , where e and n are from A 's public key, and compares ds with s . They will agree only if es was encrypted with A 's private key, and since only A possesses this key, agreement serves as validation that the message is from A .

This algorithm really is quite trivial in design, and relies on the fact that it is computationally quite difficult to factor large numbers. In practice, this works as follows (only using much larger values). We will calculate a key pair for two individuals A and B , and carry out a simple signed data exchange:

A picks p = 11 , q = 19 . n = p * q = 209 .
( p 1) x ( q 1) = 180 . 180 factors to 2 x 2 x 3 x 3 x 5 . Pick e = 11 . (11 is prime, and not a factor of 180. e being prime is not a requirement of the algorithm, but makes the calculations easier to demonstrate . It's also bad form to pick e equal to p or q in practice.)
((11 x d )1)"180 needs to be a whole number. Pick d = 131 . (131 x 11)1 = 1440 = 8 x 180 .
A 's public key is then (209,11) and A 's private key is (209,131) .
B picks p = 7 , q = 23 . n = p x q = 161 .
( p 1) x ( q 1) = 132 . 132 factors to 2 x 2 x 3 x 11 . Pick e = 7 . (Usually, one would want e to be smaller than ( p 1) x ( q 1) ), but still a large number. Again, 7 is chosen for convenience.)
((7 x d )1)"132 needs to be a whole number. Pick d = 151 . (151 x 7)1 = 1056 = 8 x 132 .
B 's public key is then (161,7) , and B 's private key is (161,151) .

Let us say that A wishes to send the (very short) message 13 to B . The following steps are taken:

B sends his public key to A ; A sends her public key to B .
A encrypts the message with B 's public key (161,7) as (13^7) mod 161 . 13^7 = 62748517 . 62748517/161 = 389742 with a remainder of 55 . 55 becomes the encrypted ciphertext C of the message.
A also wants to sign the message with the signature 17 . A encrypts the signature with her private key of (209,131) as es = (17^131) mod 209 .
17^131 = 154457393072366309211243453140265336130679022855532916933912066667508381014824689935939748335468475166201536661429828074002635558976064397547405836087319391984433. (Don't believe us? Install Perl's Math::BigFloat module (you need a more recent version than Apple provides please see http://www.cpan.org/; perl -MCPAN -e shell should get you well on your way) and try out the sample code in Listing 4.1 following this example).
es = 154457393072366309211243453140265336130679022855532916933912066667508381014824689935939748335468475166201536661429828074002635558976064397547405836087319391984433 mod 209 = 6.

A then has a ciphertext C for the message that consists of the value 55 , and a composite signature that consists of the nonencrypted signature 17 , and the encrypted signature 6 . These values are sent to B . No special care need be taken to protect them, because they are both encrypted and signed. They cannot be decrypted without the use of B 's private key and A 's public key. They cannot be modified without the use of A 's private key. Upon receipt, B applies the following steps:

To validate the message, B decrypts the encrypted portion of the signature with A 's public key (as it was encoded with A 's private key). ds = ( es ^ e ) mod n = (6^11) mod 209 = 362797056 mod 209 = 17 .
B compares ds = 17 to the nonencrypted portion of the signature (sent as the value 17 ), finds that they match, and can be comfortable that indeed the message came from A .
B then decrypts the ciphertext C by using his private key (as the message was encoded with his public key). M = ( C ^ d ) mod n = (55^151) mod 161 .
55^151 = 62339901772061144190031435235862186847664389389245399078404740141850360058606222097315541538269927995472819853970163068214487686533684900208227797646627070069088808092811324611434117366172479352641014470894910862931278714392513418118824120028875768184661865234375.
M = 62339901772061144190031435235862186847664389389245399078404740141850360058606222097315541538269927995472819853970163068214487686533684900208227797646627070069088808092811324611434117366172479352641014470894910862931278714392513418118824120028875768184661865234375 mod 161 = 13.

And, at this point, B has recovered A 's message of 13, verified that A is the sender by determining that A 's public key is the one necessary to properly decrypt the signature, and the communication is at an end.

Listing 4.1 A Public-Key Encryption Demo. (You will need to install a more current version of Math::BigFloat than Apple provides to run this code.)

 #!/usr/local/bin/perl use Math::BigFloat;  #This line will break if you haven't updated BigFloat Math::BigFloat->div_scale(2048); #  These modify their arguments #  $x->bmod($y);                # modulus ($x % $y) = remainder of $x/$y #  $x->bpow($y);                # power of arguments ($x raised to the $y) #  If you want to keep $x and operate on it at the same time, pass the #    operation through copy() first. $Ap = Math::BigFloat->new(11);  # Individual A's prime p $Aq = Math::BigFloat->new(19);  # A's prime q $Ae = Math::BigFloat->new(11);  # A's public exponent e $Ad = Math::BigFloat->new(131); # A's public exponent d                                 # in real use, Ae should not be                                 # equal to Ap or Aq. $Bp = Math::BigFloat->new(7);   # Individual B's prime p $Bq = Math::BigFloat->new(23);  # B's prime q $Be = Math::BigFloat->new(7);   # B's public exponent e $Bd = Math::BigFloat->new(151); # B's public exponent d $M  = Math::BigFloat->new(13);  # Message M $s  = Math::BigFloat->new(17);  # A's signature                                 # Message M and Signature s can be                                 # no larger in value than Min(An,Bn) print "RSA example: person A encrypts and signs a message to person B\n"; print "A's primes:\n"; print " Ap = $Ap\n"; print " Aq = $Aq\n"; print "A's exponents:\n"; print " public exponent  Ae = $Ae\n"; print " private exponent Ad = $Ad\n"; print "B's primes:\n"; print " Bp = $Bp\n"; print " Bq = $Bq\n"; print "B's exponents:\n"; print " public exponent  Be = $Be\n"; print " private exponent Bd = $Bd\n"; print "\n"; $An = $Ap->copy()->bmul($Aq);    # $An = $Ap*$Aq $Bn = $Bp->copy()->bmul($Bq);    # $Bn = $Bp*$Bq print "A's modulus Ap*Aq:\n"; print " An = Ap*Aq = $An\n"; print "B's modulus Bp*Bq:\n"; print " Bn = Bp*Bq = $Bn\n"; print "\n"; print "Message M:\n"; print " M = $M\n"; $C  = $M->copy()->bpow($Be);     # ciphertext temporary = (message ^ Be) print "Encryption begins:\n"; print " M^Be = $M^$Be\n"; print " M^Be = $C\n";       $C->bmod($Bn);             # ciphertext C = (message ^ Be) mod Bn print "Encryption finished:\n"; print " C = (M^Be) mod Bn = $C\n"; print "\n\n"; print "A signs C:\n"; print " s = $s\n"; print "Signature Encryption begins:\n"; $es  = $s->copy()->bpow($Ad);    # A encrypts s with A's private key (temp) print " s^Ad = $s^$Ad\n"; print " s^Ad = $es\n";       $es->bmod($An);            # encrypted sig es = (s ^ Ad) mod An print "Signature Encryption finished:\n"; print " es = (s^Ad) mod An = $es\n"; print "\n\n"; print "Full message reads;\n"; print "ciphertext    = $C\n"; print "signature     = $s\n"; print "encrypted sig = $es\n"; print "\n\n"; print "B begins decrypting:\n"; print "check the signature:\n"; $ds = $es->copy()->bpow($Ae);    # decrypt using A's public key print " es^Ae = $es^$Ae\n"; print " es^Ae = $ds\n";       $ds->bmod($An); print " ds = (es^Ae) mod An = $ds\n"; if($ds->bcmp($s) == 0)           # sig s and decrypted sig ds match if zero {   print " ds = $ds = $s = s : Signatures match - sender verified\n"; } else {   print " ds = $ds != $s = s :  Signatures do not match, message forged\n";   exit; } print "decrypt the message into dM:\n"; $dM = $C->copy()->bpow($Bd); print " C^Bd = $C^$Bd\n"; print " C^Bd = $dM\n";       $dM->bmod($Bn); print " dM = (C^Bd) mod Bn = $dM\n"; print "Message decrypted, sender verified, end of communication.\n"; print "\n\n"; exit;

CAUTION

The Perl code in Listing 4.1 requires a modern version of the Math::BigFloat package. You can install this through the CPAN module if you choose, by entering

  perl -MCPAN -e shell   install Math::BigFloat

This may die with an error complaining about Scalar.pm, as there is (as of this writing) an inconsistency in the archived packages and requirements. If you get this error, reissue the install command as

  force install Math::BigFloat

Typically, messages would be composed of ASCII (American Standard Code for Information Interchange) text or binary data. From the point of view of the algorithm, it makes no difference. ASCII, or any other digital storage code for textual data, provides a conversion between individual characters in the alphabet and numeric values. If the message were text such as " A! ", the numeric values that correspond to the alphabetic characters are what would be used. In ASCII, " A " has the decimal value 65 , and " ! " has the value 33 . The message can be considered to be a 2-byte value with " A " being the high-order byte, and " ! " being in the low-order byte.

In decimal, then, the value would be (65 x 256) + 33 = 16673 . This is the value to which A would apply B 's public key. One can directly encrypt any string that is less than the minimum value of A 's and B 's moduli . Data streams or strings greater than this value (longer than the bitstream length encoding this value) must be broken into blocks of this length or shorter and sequentially encoded.

NOTE

Although this example demonstrates the mechanism of RSA encryption, the practice is often somewhat different. For example, the nonencrypted portion of the signature should not be passed as plaintext, because if it is, then all an attacker must do is steal that signature and the encrypted version to be able to forge A 's signature on other messages. Instead, the nonencrypted version of the signature would typically be passed in the data encrypted with B 's public keyit is therefore no longer "nonencrypted," but is rather encrypted with a different key, preventing anyone not in possession of both A 's public key and B 's private key from making the comparison.

Also, it is common practice to use RSA as an encryption envelope, rather than to encrypt the entire message. In this use, some symmetric secret-key encryption algorithm (for example 3DES) is used to encrypt the main data stream, then the key used for this encryption, and a checksum calculated for the original data, is passed to RSA as data to encrypt and sign. Receipt of the RSA-encrypted information along with the 3DES data stream enables the recipient to recover the key necessary to decrypt the 3DES stream, the RSA signature verifies its origin, and the checksum verifies the transmission integrity.

If you would like to see what happens when someone tries to forge a signature block, change the value of $es between the encryption and decryption sections of the code.

RSA Laboratories recommends that for general corporate use, 1024-bit keys should be sufficient. For truly valuable data, 2048-bit keys should be used. For less valuable data, 768-bit keys may be appropriate. These key sizes actually refer to the size of the modulus, n . The primary reason to choose a shorter key length over a greater one is that as the keys become longer, the calculations required become more costly in terms of computational time. This makes using larger keys take longer. As machines become faster, this becomes less of an issue, and 1024- or 2048-bit keys are not problematic on modern hardware.

As with DES, RSA Laboratories has sponsored contests for cracking RSA. The most recent contest was the factorization of the 155-digit (512-bit) RSA Challenge, which was solved in 1999. It took 35.7 CPU years. Details on the contest are available at http://www.rsasecurity.com/rsalabs/challenges/factoring/rsa155.html. RSA Laboratories is currently sponsoring more RSA factoring challenges, ranging from a 576-bit challenge to a 2048-bit challenge.

The RSA algorithm is vulnerable to chosen plaintext attacks, where the attacker can have any text encrypted with the unknown key and then determine the key that was used to encrypt the text. It is also vulnerable to fault attacks in the cryptosystem itself, particularly in private key operations. Causing one bit of error at the proper point can reveal the private key.

Note that the strength of the algorithm relies on the fact that it is computationally exceedingly difficult to factor very large numbers. If one must consider every prime less than half some number n as a potential factor of n , the task quickly becomes astronomically intensive computationally, as n becomes large. With n on the order of 1024 bits, and well-chosen primes near 512 bits in size, the number of prime numbers that would need to be examined by a brute-force iteration through all primes is on the order of 10^150 . This is somewhere around 10^70 times more prime numbers that would need to be examined than there are electrons in the known universe.

It is interesting to compare the philosophy of the RSA algorithm with the DES algorithm. In the case of DES, the algorithm has no known weaknesses, but fails to be secure due to the limited length of the key. RSA, on the other hand has a very significant known weakness: If a method could be found for quickly factoring large numbers without resorting to brute-force methods, d could be recovered from e and n in the public key. RSA, therefore, is secure only as long as the keys are extremely large, and there is no known way to factor them quickly. So long as large numbers remain difficult to factor, however, RSA's weakness is unexploitable. Thankfully, although there are faster ways to factor numbers than the brute-force examination of every possible prime, none appear likely to bring this computational task into the realm of affordable desktop computing capabilities.

MD5: Message Digest Function

MD5 is a message digest function developed by Ronald Rivest in 1991. It is defined in RFC 1321, available at http://www.ietf.org/rfc/rfc1321.txt?number=1321.

MD5 is a one-way hash function that takes an arbitrary-length message and returns a 128-bit value. The algorithm works in five steps.

The message is padded so that its length plus 64 bits is divisible by 512. That is to say, it is extended an arbitrary length, until it is 64 bits shy of being an exact multiple of 512 bits in length.
A 64-bit representation of the length of the message before it was padded is appended to the result of the first step. If the message is longer than 2^64 (not a particularly likely occurrence, as this is 16,777,216 terabytes), the lower-order 64 bits of the length are used. The length of the result of this step is a multiple of 512 bits. As the length is a multiple of 512 bits, it is also a multiple of 16 32-bit words ( 16 x 32 = 512 ).
Four 32-bit words are used to initialize the message digest buffer.
Four functions are defined that each take three 32-bit words as input, and each returns a single 32-bit word as output. The message is passed, in 16-word blocks, through a predetermined Boolean combination of these four functions. The output of each iteration is 4 32-bit words, which initialize the buffer for the next iteration.
The final 4 words written to the buffer are combined into a single 128-bit value, which is returned as the result of the hash.

Upon consideration, it should be obvious that the operation is lossy, and does not constitute a compression or encryption of the data that can be reversed by application of a suitable key. No matter the size of the input, the result is always 128 bits of data. The utility in this is that it is improbable (1 in 2^128 likely) that two randomly chosen pieces of data will result in the same hash output. The unlikelihood of this occurring allows the MD5 hash to be used to predict whether two pieces of information are in fact the same, or more commonly, to determine whether a piece of information is the same as it once was.

NOTE

This does not imply that two documents cannot have the same MD5 checksum. The checksum usually contains less information than the document, and there are many more potential documents than MD5 checksums. In fact, there are an infinite number of possible documents if the document size is infinitely variable, whereas there are only a finite number of possible 128-bit MD5 checksums. This implies that there are many documents that would hash to any given MD5 checksum. The MD5 checksum space is large enough, however, that for practical purposes, you are unlikely to meet two documents that hash to the same value by chance. If you'd like an estimate of the probability that two documents in a large collection might hash to the same value, do some reading on the birthday problem (http://www.studyworksonline.com/cda/content/explorations/0,,NAV2-76_SEP949,00.shtml, http://www-stat. stanford .edu/~susan/surprise/Birthday.html). If my calculations are correct, if you have 17,884,898,788,718,400,000 documents in your database, there's about a 50% chance that two will hash to the same value via MD5. That's many fewer than 2^128, but its still a larger number of documents than most of us will ever see.

As an example of the first use, consider the task of trying to create a central data repository of all documents used in some place of business. If the environment is like most, there are probably hundreds of unique documents, but also dozens of copies of identical documents, kept by dozens of different people. If you were to try to build a database containing only one copy of each, it could become very time consuming to compare every bit of every document against every bit of every other document to determine those that are duplicates, and those that are new, unique, and need to be added to the database. Instead, you might calculate an MD5 checksum for each unique document as you add it to the database. An incoming document then could be checked for potential duplication in the database if its MD5 checksum was calculated and only the checksum was compared, rather than the entire document, against the checksums existing in the database. If the incoming document's checksum does not already exist in the database, it is definitely unique, as MD5 always hashes a given input to the same output. If the incoming document's checksum does already exist, it might be the same as an existing document, or it might be one of the many possible documents that hash to that particular MD5 checksum value. In this case, the document needs to be checked on a bit-for-bit basis with the possible duplicate, or duplicates, in the database to determine whether it needs to be added or discarded. Because you have only a 1 in 2^128 chance of any document hashing to the same value as another, it's unlikely that you will find many that have the same hash value, even if you store a very large number of documents.

This use is also commonly applied as a method to check whether a user has entered a password correctly. The system stores the MD5 checksum of each user's actual password in a table, and when a user attempts to authenticate, the MD5 checksum of the password is compared with the stored checksum value. If they match, the user has probably entered the proper password. Because many entered values will hash to every possible MD5 checksum, however, it's entirely possible with this scheme that the user has entered some other password that also hashes to the same checksum value. Possible, but unlikely.

In the second use, the MD5 checksum is used to verify that a piece of information has not been changed by some action. For example, if you are archiving valuable data, you might want to calculate MD5 checksums for the data and store them separately. When the data needs to be examined at a later date, MD5 checksums can again be calculated. If they agree, there is a high probability that the data has not been changed. If they disagree , something has changed in the data since the original checksum was calculated. This technique is useful both for verifying archival storage and for providing a method for others to verify that network transmissions have been received unaltered. For example, MD5 can be used to provide a digital signature for when a large file has to be compressed before being encrypted with a private key for a public key cryptosystem. Arriving at a duplicate checksum after decryption and decompression is strong evidence that the file has not been tampered with.

As a signature, MD5's weakness is that it is a hash, and does not produce a unique value for every possible input. Therefore, it is possible for an attacker to construct and insert a forged message that hashes to the same MD5 checksum as the original. Attacks on MD5 therefore are attacks to find collisions, multiple inputs that can produce the same hash. Structural algorithmic solutions to allow an attacker to do so easily have not at this time been discovered , but brute-force attacks have been demonstrated to be possible with a sufficient investment of computational resources. MD5 has not yet been declared an insecure checksum solution, but its strength is in doubt. In 1995 Hans Dobbertin, in "Alf Swindles Ann," CryptoBytes (3) 1 , showed how to defeat MD4, which shares many features with MD5, via hash collisions. P. van Oorschot and M. Wiener, in "Parallel Collision Search with Application to Hash Functions and Discrete Logarithms," Proceedings of 2nd ACM Conference on Computer and Communication Security (1994), calculate that for $10 million in computing hardware, a hash collision could be determined in 24 days or less (in 1994 dollars and time). Dobbertin, in "The Status of MD5 After a Recent Attack," CryptoBytes (2) 2 (1996), extends the techniques applied to break the MD4 hash in under a minute to the compression function of MD5. Each of these results suggests that although MD5 may remain suitable for applications such as the detection of possible database entry duplications, its utility as a seal against unauthorized data modification may be limited, or may be eliminated completely in the near future.

Other Encryption Algorithms

A plethora of other encryption algorithms and software is available, though only a few that have stood the test of time and public examination long enough to be considered reasonably secure. The following list includes a number of the more popular variants and brief descriptions of them:

3DES . Triple-DES is a variant of DES that uses DES three times. Encryption is generally performed in an encrypt-decrypt-encrypt sequence with three independent 168-bit keys. It is considered more secure than DES, but is slower.
Blowfish . A 64-bit block cipher with variable-length keys up to 448 bits. It was developed by Bruce Schneier and designed for 32-bit machines. It is much faster than DES. Dr. Dobb's Journal sponsored a cryptanalysis contest of Blowfish in April 1995. Results included an attack on a 3-round version of Blowfish (Blowfish has 16 rounds), a differential attack on a simplified variant of Blowfish, and the discovery of weak keys. The attack based on weak keys is effective only against reduced-round variants of Blowfish. For details on the results of the contest, see Schneier's paper at http://www.counterpane.com/bfdobsoyl.html. The original paper on Blowfish is available at http://www. counterpane .com/bfsverlag.html. Overall, Blowfish is generally considered to be secure.
CAST-128 . A 64-bit block cipher with variable-length keys, up to 128 bits. It is a DES-like encryption. CAST takes its name from the original developers, Carlisle Adams and Stafford Tavares. CAST-128 is also part of Pretty Good Privacy (PGP). CAST-256 is an extension that uses a 128-bit block size with keys up to 256 bits.
RC4 . RC4 is a variable key-size stream cipher designed by Ronald Rivest. The algorithm is fast and is used in a variety of products, including SSL and WEP implementations . The algorithm was considered to be a trade secret until code that was alleged to be RC4 was posted to a mailing list in the mid-1990s. This code, Alleged RC4, is known as arcfour.
Arcfour . Arcfour is a stream cipher that is compatible with RC4. It can be used with a variety of key lengths, with 128-bit keys being common. An expired draft on arcfour is available at http://www.mozilla.org/projects/security/pki/nss/draft-kaukonen-cipher-arcfour-03.txt.
DSA . Digital Signature Algorithm is a public key signature-only algorithm, based on the Diffie-Hellman discrete logarithm problem. It is the underlying algorithm of the DSS, Digital Signature Standard, which was endorsed by NIST as the digital authentication standard of the U.S. government in 1994. DSA is defined in FIPS 186-2, which is available at http://csrc.nist.gov/publications/fips/fips186-2/fips186-2-change1.pdf.
IDEA . Internation Data Encryption Algorithm, IDEA, is a redesign of an encryption algorithm called PES (Proposed Encryption Standard). PES was designed by Xuejia Lai and James Massey, and its revision, IDEA, was designed by Xuejia Lai, James Massey, and Sean Murphy. IDEA is a 64-bit block cipher that uses a 128-bit key. As with DES, the same algorithm is used for encryption and decryption. The design of IDEA enables it to be easily implemented in hardware or software. The software implementation is considered comparable to DES by some people, and faster than DES by others. Although a class of weak keys has been discovered for it, IDEA is generally considered to be secure.

Mac OS X Cryptography Applications

As Mac OS X matures, we are beginning to see a significant increase in the available cryptographic offerings for the platform. The first two of these we will present provide almost identical functionality: implementation of the Pretty Good Privacy (PGP) public-key encryption system. The third is rather different, in that it's really an interface to the built-in encryption capabilities that exist in the libraries on which OS X is based.

PGP

PGP is the 1991 brainchild of Phil Zimmerman. Although a rather clever programmer, Phil overestimated the good sense and underestimated the unbelievable lack of computing sophistication of the U.S. government and justice system when he developed and published a system for implementing public-key cryptography in a simple fashion for exchanging information through email. Phil essentially said, "Hey, cool! Implementing this public-key stuff in a relatively seamless and easy-to-use form isn't so toughwe can make this into a worldwide resource!" Shortly afterwards, the U.S. government slapped him with charges alleging that he was exporting weapons technology to enemies of the United States. It may seem peculiar, but what he was accused of was writing and distributing software containing the RSA algorithm in a way that someone might export it out of the country. Phil didn't export it, and even if he had, one might think that the fact that such algorithms were well known and could be implemented by any competent programmer on the planet would make it not all that big a deal to possibly make it available to someone who might export the binary. Not so. The government classified Phil's action in the same category as exporting tanks and nuclear weapons. The case was eventually dropped in 1996, but for years a large contingent of Internet and security-conscious individuals fought on Phil's behalf , both collecting money for his defense, and satirizing the situation in an attempt to make the justice system see just what folly the case actually was. http://people. qualcomm .com/ggr/about_pgp.html provides an insight into the thinking of the day. At the same time this was going on, it was established that books containing the source code or algorithms describing encryption technology could be exported (so we shouldn't have to ask you to help bail us out of jail for that bit of Perl we showed you previously) because they were literary works, but that nonreadable (that is, executable) versions were weapons. For a while, there was a T-shirt available with the source for the algorithm printed on it, and a bar-code that encoded the executable. The shirt was advertised with the slogan "For the price of this T-shirt, you too can become an International Arms Dealer." The manufacturer promised to refund anyone's money who was convicted, and many wondered whether you could be convicted of carrying a concealed weapon if you wore the T-shirt inside out.

Shortly after the government dropped its case against Zimmerman, he formed a company and began distributing PGP as a commercial application. There were some particularly nice features available, such as the ability to PGP-encrypt an entire disk partition and have anything written to that disk encrypted, but the company was soon absorbed by Network Associates, the former distributors of McAfee AntiVirus. This business group did a reasonable job of upkeep on the product until 2001, when it decided to drop PGP from its product line. For a while it looked like the very nice PGP product line was going to disappear, but thankfully the original principals of PGP Corporation managed to work out a deal to reacquire the rights in June of 2002, and development, including the production of a Mac OS X version, has begun once again.

Usage

Commercial and freeware versions of PGP are available at http://www.pgp.com/. Various bundled software packages are offered commercially, depending on your needs. A freeware PGP package is also available, with limited capabilities that are probably sufficient for the home individual. Most importantly, the freeware version allows you to make and manage keys and encrypt and decrypt files.

Download and install PGP. The documentation recommends rebooting the machine afterward. After it is installed, generate a set of keys for yourself. The option to generate a set of keys appears the first time you launch the application. If you do not make a set of keys then, you can either select New Keys from the Keys menu of the PGP application, or select New Keys from the PGPkeys window that opens when you launch PGP. You can choose to generate your keys in Expert mode. If you do not choose Expert mode, your keys are generated using the default parameters that you can see in Expert mode. The Setup portion for the Expert mode is shown in Figure 4.4.

Figure 4.4. Expert mode for the Setup step in key generation in PGP.

graphics/04fig04.jpg

In Normal mode, you need to supply only a name and email address. In the Expert mode, you also specify a key type, key size, and key expiration. The default key type is Diffie-Hellman/DSS. You can also choose RSA or RSA Legacy. The default key size is 2048 bits, but you can specify anything from 1024 to 4096. The default expiration is none, but you can specify an actual expiration date, if you prefer.

After setting up the key, you are asked to provide a passphrase to protect your private key. As you type your passphrase, a horizontal bar displays an approximation for the quality of your passphrase, much like a progress bar. Then your PGP key pair is generated.

In the PGPkeys window, which opens when you launch PGP, your new key is listed, as shown in Figure 4.5. From this window you can do a number of other things, including revoking your key, deleting it, sending your public key to a keyserver, exporting your keys to a file, and searching for public keys. If the PGPkeys window is not open when you need it, simply select that window under the Window menu.

Figure 4.5. Your new key appears in the PGPkeys window.

graphics/04fig05.jpg

In Preferences, you can specify a number of options, including editing the keyserver listing, shown in Figure 4.6, and specifying encryption algorithms and where your keys are stored.

Figure 4.6. The keyserver preferences in PGP.

graphics/04fig06.jpg

After you have your set of keys, you can experiment with encrypting and decrypting something. A convenient program to use for testing encryption and decryption is the TextEdit application. Type something and then select it. Under the Services option of the TextEdit menu, select PGP, then select Encrypt. Figure 4.7 shows how everything looks as you start to do this.

Figure 4.7. Encrypting text in TextEdit using PGP.

graphics/04fig07.jpg

Select a recipient in the next window that appears. If you haven't imported anyone else's public keys, then yours is the only option. The text is then encrypted, as shown in Figure 4.8.

Figure 4.8. An encrypted version of the text shown in Figure 4.7.

graphics/04fig08.jpg

To decrypt the encrypted text, select Decrypt from the PGP menu under the Services menu under the TextEdit window. Enter your passphrase. Then the decrypted text appears in a PGP window, as shown in Figure 4.9.

Figure 4.9. The encrypted text is shown decrypted in the PGP window.

graphics/04fig09.jpg

You can easily use PGP for encryption or decryption in applications that use the Services menu. As you might expect, Mail is one of those applications.

To exchange PGP-encrypted files or messages, you first need to get the public key for anyone with whom you want to exchange encrypted information. If the person's public key is on a public keyserver, search the keyserver, then drag the correct result from the search window to the PGPkeys window. If you have received the person's public key as a file some other way, choose Import in the PGPkeys window and select the correct file. When you are encrypting the data, select your recipient. Then send or exchange the data in whatever way was predetermined. The recipient can decrypt the data with his private key.

To receive encrypted data, you have to make your public key available to others. You can either send it to a keyserver from the PGPkeys window, or you can export your public key to a file and exchange it some other way. If you choose to send it to a keyserver, it does not matter which one you send it to. The keyservers regularly sync their databases.

GPG

GPG, Gnu Privacy Guard, is a Gnu replacement for the PGP suite of tools. GPG received a real boost from the apparent death of PGP, especially as far as Mac OS X development is concerned , and should be considered an interesting study in the Open Source community's response to a perceived lack in a software functionality niche. It's designed as a drop-in replacement for PGP, allowing the same content to be transmitted and decoded with the same key sets.

Usage

A version of GnuPG for Mac OS X is available from http://macgpg. sourceforge .net/. This site includes GnuPG and various GUI tools. The version of GPG that is available for Mac OS X 10.2 is slightly newer than the version that is available for Mac OS X 10.1. The site also has links to some other tools, such as Gpg Tools and tools for getting various mail programs to work with GPG.

From the Mac GnuGP site, download the version of GPG suitable to your version of Mac OS X and the quarterly files. Download separately anything that may have been updated after the quarterly download was made available.

If you install GPG by using the installer, it installs in /usr/local/bin by default, with documentation in /Library/Documentation/GnuPG/ . The documentation also suggests that you can install it via fink (http://fink.sourceforge.net/) if you prefer. fink will probably put it somewhere under the /sw hierarchy. Install the other applications as well.

After GPG is installed, you are ready to generate a key pair. You can do this either by selecting Generate under the Key menu in GPGkeys, or by running gpg --gen-key at the command line. GPGkeys is a hybrid application that opens a terminal and runs gpg --gen-key at the command line for you. When you are generating a key pair, you have to answer a few questions. You select an encryptional algorithm, the key size, and an expiration date. Then you provide a name, email address, and a comment, if any, to identify the key. Sample output follows:

 %  gpg --gen-key  gpg (GnuPG) 1.2.1; Copyright (C) 2002 Free Software Foundation, Inc. This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the file COPYING for details. gpg: keyring `/Users/sage/.gnupg/secring.gpg' created gpg: keyring `/Users/sage/.gnupg/pubring.gpg' created Please select what kind of key you want:    (1) DSA and ElGamal (default)    (2) DSA (sign only)    (5) RSA (sign only) Your selection?  1  DSA keypair will have 1024 bits. About to generate a new ELG-E keypair.               minimum keysize is  768 bits               default keysize is 1024 bits     highest suggested keysize is 2048 bits What keysize do you want? (1024) Requested keysize is 1024 bits Please specify how long the key should be valid.          0 = key does not expire       <n>  = key expires in n days       <n>w = key expires in n weeks       <n>m = key expires in n months       <n>y = key expires in n years Key is valid for? (0) Key does not expire at all Is this correct (y/n)?  y  You need a User-ID to identify your key; the software constructs the user id from Real Name, Comment and Email Address in this form:     "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>" Real name:  Sage Ray  Email address:  sageray@mac.com  Comment: You selected this USER-ID:     "Sage Ray <sageray@mac.com>" Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?  O  You need a Passphrase to protect your secret key. We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy. ++++++++++++++++++++++++++++++.++++++++++.+++++.+++++++++++++++++++++++++++++ +.++++++++++.+++++.+++++++++++++++++++++++++++++++++++.+++++>++++++++++...... ...+++++ We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy. ++++++++++.+++++++++++++++.+++++.++++++++++++++++++++++++++++++++++++++++.++++ +.++++++++++++++++++++.++++++++++++++++++++++++++++++..+++++>+++++>+++++...... .+++++^^^ gpg: /Users/sage/.gnupg/trustdb.gpg: trustdb created public and secret key created and signed. key marked as ultimately trusted. pub  1024D/190CF224 2002-12-31 Sage Ray <sageray@mac.com>      Key fingerprint = 8FD6 DC90 D32C F092 7F04  DD18 1D97 A19F 190C F224 sub  1024g/D5B2A9F2 2002-12-31

A basic configuration file is created in ~/.gnupg/gpg.conf . GPGPreferences is a GUI application that installs in the Other section of the System Preferences. As of this writing, GPGPreferences edits ~/.gnugp/options , but the version of GPG available for Mac OS X 10.2 no longer uses that file for preferences. However, you can watch changes it makes to the options file and make those changes to ~/.gnugp/gpg.conf instead.

You may want to consider specifying key searching behavior and a keyserver that GPG should search. You can uncomment a line for one of the ones listed in gpg.conf , or add a line for a different server of your preference. You can find some key servers listed in GPGPreferences, and you can also check http://www.pgpi.org/ for keyserver listings.

For a keyserver entry, you might have a line like this:

 keyserver ldap://keyserver.pgp.com

If you specify a keyserver, include options for keyserver functions, such as the following:

 keyserver-options auto-key-retrieve

The gpg.conf contains a list of keyserver options and their functions. The preceding option automatically fetches keys as needed from the keyserver when verifying signatures or when importing keys that have been revoked by a revocation key that is not present on the keyring.

After you have generated a key, it is added to the list of keys that GPGKeys can manage. Figure 4.10 shows a sample of the Public tab after generating a key pair. GPGKeys adds that key to your public keyring, which is ~/.gnupg/pubring.gpg .

Figure 4.10. A key now appears in GPGkeys.

graphics/04fig10.jpg

As with PGP, you can test encrypting and decrypting in TextEdit from the Services menu. Type some text in TextEdit and select it. Then select either the encryption option under the GPG menu or the encryption option under the Gpg Tools menu under Services, if you installed Gpg Tools. Although either method should work for encryption and decryption, we find the Gpg Tools interface to be slightly easier to work with. Figure 4.11 shows the encryption interface using Gpg Tools. Figure 4.12 shows a sample of the encryption done by GPG. This is an encryption of the same text that was used in the PGP example.

Figure 4.11. Starting the encryption process using the Gpg Tools interface to GPG.

graphics/04fig11.jpg

Figure 4.12. The text has been encrypted by GPG.

graphics/04fig12.jpg

As with PGP, you can easily use GPG for encryption or decryption in applications that use the Services menu, such as TextEdit or Mail. Remember that the Mac GnuPG site also has scripts available to allow certain other mail programs to work with GPG.

After you have tested that encryption and decryption are working for you, you are ready to exchange keys and share encrypted data. You can either make your public key available via a public key server or export it to a file and exchange it some other way. To send it to a keyserver, you can choose the Send to Keyserver option under the Key menu in GPGkeys.

As with PGP, to send encrypted data to someone, you need that person's public key. You can either get that person's public key from a keyserver or you can exchange it by some other means and import it. You can search a keyserver from GPGkeys and select the correct key to be retrieved. Following is sample output from importing a public key served by a keyserver.

 %  gpg --search-keys rumpf  gpgkeys: searching for "rumpf" from LDAP server keyserver.pgp.com Keys 1-10 of 11 for "rumpf" (1)     Rumpf Thomas <a9002596@unet.univie.ac.at>           4096 bit DSS/DH key 2A9F0F7893D0AE13, created 1997-08-26 (2)     Jean-Claude Rumpf <jcrumpf@pt.lu>           2048 bit DSS/DH key 43A5FBC5D172566C, created 1999-09-14 (3)     Johannes Rumpf <Joerumpf@t-online.de>           2048 bit DSS/DH key 74C5630780911F16, created 2000-05-01, expires 2001-01-02 (4)     Robert Wolfgang Rumpf <rumpf.1@osu.edu>           2048 bit DSS/DH key CA663A997E4B35DF, created 1999-07-14 (5)     Michael Rumpf <Michael.Rumpf@t-online.de>           2048 bit DSS/DH key DE0D93830EF28E0E, created 1999-11-14 (6)     Michael Rumpf <Michael.Rumpf@t-online.de>           2048 bit DSS/DH key 838DD3F7CBB774F2, created 1999-11-14 (7)     Beate Rumpf <BeRu-FoxAti@t-online.de>           2048 bit DSS/DH key 3C6D928B10B7807C, created 2000-07-07 (8)     Beate Rumpf <BeRu-FoxAti@t-online.de>           2048 bit DSS/DH key 712971B918247DDC, created 2000-07-08 (9)     Beate Rumpf <BR-FoxAti@web.de>           2048 bit DSS/DH key 29661EA322068AE3, created 2000-07-23 (10)    Patrick Rumpf <xp590@excite.com>           3072 bit DSS/DH key 73C41FE244791040, created 2000-08-03 Enter number(s), N)ext, or Q)uit >  4  gpg: key 7E4B35DF: public key "Robert Wolfgang Rumpf <rumpf.1@osu.edu>" imported gpg: Total number processed: 1 gpg:               imported: 1

If you exchange keys with individuals in a more personal fashion, simply select the appropriate key file by using the Import option in GPGkeys. If you have trouble, make sure the key does not have any extraneous information, such as mail headers.

PuzzlePalace

If you are not interested in installing PGP or GPG, but you are interested in some encryption capabilities, you might be interested in PuzzlePalace, a shareware package available from http://personalpages.tds.net/~brian_hill/puzzlepalace.html.

PuzzlePalace makes use of functionality available in OpenSSL, making it possible to exchange encrypted data with anyone who has access to OpenSSL. There is also an option to post-process the file with Base64 to send it as an attachment.

PuzzlePalace is easy to use. Just select the type of encryption you want to use and drag the file that you want to encrypt. If you only want to create a Base64-encoded file, and don't want to apply any additional encipherment, set the cipher to None. If you've selected encipherment, you are then asked to protect the file with a passphrase, and to retype the passphrase for verification. To use the file, drag it out of the PuzzlePalace window. Figure 4.13 shows the PuzzlePalace interface after it has just encrypted a file. The icon for an encrypted file has a lock.

Figure 4.13. A file has just been encoded with Triple DES encryption in PuzzlePalace.

graphics/04fig13.jpg

To decode the file, drag it to the PuzzlePalace window and enter the passphrase for the file. PuzzlePalace then decodes the file. To read it, just drag the file out of the window and read it. The icon for a decrypted file in the PuzzlePalace window is the file's normal icon.

Top