4.1. Introduction to Encryption

Let's suppose that you carry your laptop home from work every day, bring it back to the office the following morning, and then tether it to your desktop with a locking cable protected by a combination lock. You know how important it is to remember the lock combination, don't you? If you ever forget it, your laptop will end up married to your desk until you pry it free by cutting the cable. Maybe you remember numbers easily, but I don't. It's hard enough for me to even remember my own telephone number, let alone the plethora of secret numbers in my life: my Social Security number, bank account PIN, voice mail password, and anniversary (oops!). To make things easier, I have devised an ingenious method for remembering that lock combination: I have written down the code on a label and put that label on the lock itself!

And now you must be wondering if you would ever be able to trust me with something secure!

Like the rest of humanity, I have a brain that is part hard drive (disk) and part random-access memory (RAM), and numbers seem to go into RAM more often than not. After a period of usage, the numbers are conveniently aged out to make room for more (not unlike the System Global Area of an Oracle instance) and are forgotten. In computers, this process is expected and is built into the design. Database systems are designed to store information and make it accessible to users when asked. Historically, the assumption has been that users who demand access will already have been authenticated to establish that they are who they claim to be. The mere storage of sensitive information, therefore, has not been considered a potential security breach.

That may have been true at one time, but not today, with intruders seemingly everywhere: they may be curiosity seekers; they may be planning to sell account data to your competitors; or they may be seeking to disrupt your system for revenge. The attack might come from outside, via the Internet, or from inside your organization. (Indeed, research shows that most hacking does come from within.) As countless security breaches have shown, sensitive data clearly needs to be protected from anyone not authorized to see that data. What options does Oracle provide for that protection?

Pan back to my lock combination: it's 3451. Not being a complete idiot, I don't write that number on my lock. Instead, I have a secret number that I always remember, 6754, and using this number I modify the lock combination by adding the corresponding digits:

     3 + 6 = 9     4 + 7 = 11     5 + 5 = 10     1 + 4 = 5

The resulting numbers are 9, 11, 10, and 5. In my scheme, I use only single-digit numbers, so I wrap the double-digit numbers around the number 10, keeping only the last digit; hence, 10 becomes 0, 11 becomes 1, and so on. Using my secret key 6754, I have transformed the number 3451 into 9105. It's the latter number that I write on the combination lock, not the actual code. If I forget the combination, I will be able to read that number and use my magic number 6754 to reverse the logic I applied earlier and recover the number 3451 that opens the lock. The number 9105 is there for the whole world to see, but a thief still won't be able to open the lock unless he also knows the key, 6754.

In this way, I have encrypted the number represented by my lock combination. The number 6754 is the key to the encryption process. The type of encryption I've performed here is known as symmetric encryption because the same key is used to encrypt and decrypt. (In contrast, with asymmetric encryption, described later in this chapter, there are two distinct keys: a public key and a private key.) The logic I described to encrypt the code is a very simplistic implementation of an encryption algorithm.
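The lock-combination scheme is small enough to write out in full. The following is a minimal Python sketch of it (Python is used here purely for illustration; the function names are my own and are not part of any Oracle package). Encryption adds each key digit and wraps around 10; decryption subtracts the same digit and wraps the same way.

```python
def encrypt(combination: str, key: str) -> str:
    """Add each key digit to the matching combination digit, wrapping around 10."""
    return "".join(str((int(c) + int(k)) % 10) for c, k in zip(combination, key))


def decrypt(written: str, key: str) -> str:
    """Subtract each key digit, wrapping around 10, to undo encrypt()."""
    return "".join(str((int(w) - int(k)) % 10) for w, k in zip(written, key))


written = encrypt("3451", "6754")
print(written)                   # 9105
print(decrypt(written, "6754"))  # 3451
```

Because the same key, 6754, drives both functions, this is a symmetric scheme in miniature.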

4.1.1. Encryption Components

Let's summarize what we have learned so far. An encryption system has several basic components, as shown in Figure 4-1.

The algorithm
The key
The type of encryption (symmetric, in this case, because the same key is used both to encrypt and to decrypt)

Figure 4-1. Symmetric encryption components

Let's assume that a thief intent on stealing my laptop is trying to open the lock. What does she need in order to succeed? First, she has to know the algorithm; let's assume here that she knows it, perhaps because I boasted about my cleverness at work, or she read this book, or this algorithm is public knowledge. Second, she needs to learn the key. That is something I can protect. Even if the thief knows about the algorithm, I can still hide the key effectively. But as there are only 4 digits in the key, it takes at most 10⁴, or 10,000, attempts by the thief to guess the key. And because every key is equally likely, each attempt has a 1 in 10,000 chance of being right, so on average the thief will hit the right key after about 5,000 attempts. Can she do it? In this case, the thief will have to manually turn the wheels of the combination lock about 5,000 times. That's daunting, but theoretically possible. Suddenly, I don't feel so secure anymore.
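To make the arithmetic concrete, here is a small Python sketch of an exhaustive search over all four-digit keys. It assumes the thief simply sweeps from 0000 to 9999 in order; the function name is hypothetical and chosen only for this illustration.

```python
from itertools import product


def attempts_needed(secret: str) -> int:
    """Count the guesses an ordered 0000-to-9999 sweep makes before hitting secret."""
    candidates = product("0123456789", repeat=len(secret))
    for attempt, digits in enumerate(candidates, start=1):
        if "".join(digits) == secret:
            return attempt
    raise ValueError("secret must contain only digits")


keyspace = 10 ** 4
# Averaged over every possible secret: (1 + 2 + ... + 10000) / 10000
average = sum(range(1, keyspace + 1)) / keyspace

print(attempts_needed("6754"))  # 6755
print(keyspace, average)        # 10000 5000.5
```

The worst case is 10,000 guesses, but the average is only about half of that.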

What are the ways that I can protect my lock combination?

I can hide the algorithm.
I can make the key difficult to guess.
I can take both of these steps together.

The first option is impossible if I am using a publicly known algorithm. I could develop my own, but the time and effort may not be worth it. It might later be found out anyway, and changing an algorithm is a very difficult task. That rules out the third option, too, leaving the second option as the only viable one.

4.1.2. The Effects of Key Length

My lock combination is the digital equivalent of sensitive data. If an intruder wants to crack the encryption key, 10,000 iterations to guess the code is trivial: with a computer, he'll be able to crack it in under a second. What if I use an alphanumeric key instead of an all-numeric one? That gives 36 possible values (26 letters, ignoring case, plus 10 digits) for each character of the key, so the intruder will have to guess up to 36⁴, or 1,679,616, combinations. That is more difficult than 10,000, but still not beyond reach. The key must be strengthened, or "hardened," by making it longer than 4 characters. Table 4-1 shows how the maximum number of guesses required increases with the key length. The secret to hardening the key, therefore, is to make it longer.

Table 4-1. Alphanumeric key length and maximum number of guesses required to crack the key
Key length	Maximum number of guesses required
4	1,679,616
5	60,466,176
6	2,176,782,336
7	78,364,164,096
8	2,821,109,907,456
9	101,559,956,668,416
10	3,656,158,440,062,976
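The figures in Table 4-1 are simply powers of 36. As a quick check, this Python sketch (the helper name is my own) computes the size of the key space for any alphabet size and key length:

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct keys of the given length over the given alphabet."""
    return alphabet_size ** length


# Reproduce the table for alphanumeric keys of 4 to 10 characters.
for length in range(4, 11):
    print(f"{length}\t{keyspace(36, length):,}")
```

Each extra character multiplies the attacker's worst-case work by 36.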

Remember that computers think in terms of bits and bytes (i.e., binary numbers), not alphanumeric characters. Each position of a binary key can hold only 0 or 1, so a 10-digit binary key allows only 2¹⁰, or 1,024, combinations, an extremely easy number to handle. Practically speaking, a key must be much longer. The length of a key is described in bits, so a key of 64 binary digits is said to be a 64-bit key. Table 4-2 shows the relationship between key length and number of guesses required for a binary key.

Table 4-2. Binary key length and maximum number of guesses required to crack the key
Key length	Maximum number of guesses required
56	72,057,594,037,927,936
57	144,115,188,075,855,872
58	288,230,376,151,711,744
59	576,460,752,303,423,488
60	1,152,921,504,606,846,976
61	2,305,843,009,213,693,952
62	4,611,686,018,427,387,904
63	9,223,372,036,854,775,808
64	18,446,744,073,709,551,616
65	36,893,488,147,419,103,232

The longer the key, the more difficult it is to crack the encryption. But longer keys also extend the elapsed time needed to do encryption and decryption, as the CPU has to do more work. In designing an encryption infrastructure, you may need to make a compromise between key size (and therefore security) and performance.
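To get a feel for what these numbers mean, the following Python sketch converts a key length into a worst-case search time. The guessing rate of one billion keys per second is purely an assumption for illustration, not a measurement of any real attacker, and the names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
GUESSES_PER_SECOND = 1_000_000_000  # hypothetical attacker speed (an assumption)


def worst_case_years(bits: int, rate: int = GUESSES_PER_SECOND) -> float:
    """Years needed to try every key of the given bit length at the given rate."""
    return 2 ** bits / rate / SECONDS_PER_YEAR


for bits in (10, 56, 64, 128):
    years = worst_case_years(bits)
    print(f"{bits}-bit key: {2 ** bits:,} keys, about {years:.3g} years")
```

Whatever rate you assume, every additional bit doubles the time, which is why each row of Table 4-2 is twice the row above it.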

4.1.3. Symmetric Encryption Versus Asymmetric Encryption

In the earlier example, the same key is used to encrypt and decrypt. As I mentioned, this type of encryption is known as symmetric encryption. There is an inherent problem with this type of encryption: because the same key must be used to decrypt the data, the key must be made known to the recipient. The key, which is generally referred to as the secret key, either has to be known by the recipient before she receives the encrypted data (i.e., there needs to be a "knowledge-sharing agreement") or has to be sent as a part of the data transmission. For data at rest (on disk), the key will have to be stored as a part of the database in order for an application to decrypt it. There are obvious risks in this situation. A key that is being transmitted may be intercepted by an intruder, and a key that is stored in the database may be stolen.

To address this problem, another type of encryption is often used, one in which the key used to encrypt is different from the one used to decrypt. Because the keys differ, this is known as asymmetric encryption. Because two keys are generated (a public key and a private key), it is also known as public-key encryption. The public key, which is required for the encryption, is made known to the sender and, in fact, can be freely shared. The other key, the private key, is used only to decrypt the data encrypted by the public key and must be kept secret.

Let's see how public-key encryption might work in real life. As shown in Figure 4-2, John (on the left) is expecting a message from Jane (on the right). Here are the steps in the encryption process:

John generates two keys: a public key and a private key.
He sends the public key to Jane.
Jane has an original message (known as the cleartext) that she encrypts using the public key, and she sends the encrypted message to John.
John decrypts it using the private key he generated earlier.

Note carefully here that there is no exchange of decryption keys between the parties. The public key is sent to the sender, but because it cannot be used to decrypt the value, its theft poses no threat.

Figure 4-2. Basic asymmetric encryption

However, you should be aware of the effect of spoofing or phishing here, which can render this process of data encryption insecure. Here is a scenario:

John generates a public-private key pair and hands the public key over to Jane.
An intruder is sniffing the communication line and obtains John's public key. Sometimes that's not even necessary, as John may have made his public key available to the public intentionally.
The intruder creates another public-private key pair with his software (using John's name so the public key looks like it came from John).
The intruder sends Jane "his" new public key, the one he generated with his software, not the original one created by John. Jane does not know the difference; she thinks it is John's real public key.
Jane encrypts the message using this public key and sends the encrypted message to John.
However, the intruder is still sniffing the line and intercepts this message. He has the private key that matches the forged public key, so he can decrypt the message. In an instant, the intended security advantage is lost.
There is a slight problem, though. When John eventually gets the encrypted message and tries to decrypt it, he will be unsuccessful, because his private key does not match the public key that was used. He will get suspicious. To prevent this, the intruder just has to re-encrypt the message using John's real public key and pass on the encrypted message to him. John is unlikely to know that something like this has happened.

Scary? Of course. So, what's the solution? The solution is to somehow verify the authenticity of the public key and confirm that it really came from the person it claims to come from. This can be done using a fingerprint match. The topic is beyond the scope of this book, but essentially, before Jane encrypts with the public key, she checks the fingerprint of the key to make sure the key does indeed belong to John. (This discussion also highlights how the communication lines between the source and the destination must be highly secure.)
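As a rough illustration of the idea, the sketch below treats a fingerprint as a SHA-256 hash of the public key's bytes, which Jane compares against a value John gave her through a separate, trusted channel. This is a simplification under stated assumptions: the key bytes and function names are hypothetical, and real tools define their own fingerprint formats.

```python
import hashlib


def fingerprint(public_key_bytes: bytes) -> str:
    """Return a SHA-256 digest of the key as colon-separated hex pairs."""
    digest = hashlib.sha256(public_key_bytes).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Hypothetical key material; real keys would be the encoded public-key bytes.
johns_key = b"john-public-key-bytes"
intruders_key = b"intruder-public-key-bytes"

# John gives Jane this value over a separate, trusted channel (in person, say).
trusted_fingerprint = fingerprint(johns_key)


def is_johns_key(received_key: bytes) -> bool:
    """Accept a received key only if its fingerprint matches the trusted one."""
    return fingerprint(received_key) == trusted_fingerprint


print(is_johns_key(johns_key))      # True
print(is_johns_key(intruders_key))  # False
```

The check only helps if the trusted fingerprint itself reached Jane by a route the intruder could not tamper with.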

The key used to encrypt is not the key used to decrypt, so how does the decryption process know the key used during the encryption process? Recall that both keys are generated at the same time by the receiver, which ensures that there is a mathematical relationship between them. One is simply the inverse of the other: whatever one does, the other simply undoes it. The decryption process can therefore decipher the value without knowing the encryption key.
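The inverse relationship can be seen in a toy version of RSA, one well-known public-key algorithm (the text above does not name a specific algorithm, so treat the choice as an assumption). The primes here are absurdly small and the sketch omits everything that makes real RSA safe; it only shows that what the public key does, the private key undoes.

```python
# John generates both keys together from two tiny primes (toy values only).
p, q = 61, 53
n = p * q                 # 3233, shared by both keys
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)       # sent to Jane, safe to share
private_key = (d, n)      # never leaves John


def rsa_encrypt(message: int, key: tuple) -> int:
    """Jane's step: raise the message to the public exponent, modulo n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def rsa_decrypt(ciphertext: int, key: tuple) -> int:
    """John's step: raise the ciphertext to the private exponent, modulo n."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


cleartext = 65
ciphertext = rsa_encrypt(cleartext, public_key)
print(d)                                     # 2753
print(rsa_decrypt(ciphertext, private_key))  # 65
```

Note that the decryption function never sees the public exponent; the mathematical link between the two keys is enough.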

Because public and private keys are mathematically related, it is theoretically possible to derive the private key from the public key, although it is a rather laborious process that requires factoring an extremely large number. So, to reduce the risk of brute-force attacks, very long keys are used, typically 1,024 bits, instead of the 56-, 64-, 128-, or 256-bit keys used in symmetric encryption. Note that 1,024 bits is typical, not mandatory; shorter keys are also used.
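To see why key length matters here, consider a toy RSA-style public key (17, 3233); the algorithm and the numbers are assumptions chosen for illustration. With a modulus this small, simple trial division factors it instantly, and the private exponent follows directly.

```python
def factor(n: int) -> tuple:
    """Trial division: fine for a toy modulus, hopeless for a 1,024-bit one."""
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    raise ValueError("no odd factor found")


e, n = 17, 3233                     # toy public key, visible to everyone
p, q = factor(n)                    # the "laborious" step, trivial at this size
d = pow(e, -1, (p - 1) * (q - 1))   # recovered private exponent (Python 3.8+)

print(p, q)  # 53 61
print(d)     # 2753
```

The loop runs roughly up to the square root of the modulus, so its cost grows explosively as the key gets longer.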

Oracle provides asymmetric encryption at two points:

During transmission of data between the client and the database
During authentication of users

Both of these functions require use of Oracle's Advanced Security Option (ASO), an extra-cost option that is not provided by default. That tool simply enables asymmetric key encryption on those functions; it does not provide a simple ready-to-use interface that you can use to build a data-at-rest encryption solution.

The only developer-oriented encryption tools freely available in Oracle provide for symmetric encryption. For this reason, I focus on symmetric encryption, not asymmetric encryption, in this chapter.

Because asymmetric encryption systems use different keys to encrypt and decrypt, the sender never needs to know the key that will be used to decrypt. In contrast, symmetric encryption systems use the same key for both operations, so safeguarding the key when using such systems is very important.

4.1.4. Encryption Algorithms

There are many widely used and commercially available encryption algorithms, but we'll focus here on the symmetric key algorithms supported by Oracle for use in PL/SQL applications. The DES and Triple DES algorithms are supported by both of Oracle's built-in encryption packages: DBMS_CRYPTO and DBMS_OBFUSCATION_TOOLKIT; only DBMS_CRYPTO, introduced in Oracle Database 10g Release 1, supports AES, however.

Data Encryption Standard (DES)

Historically, DES has been the predominant standard used for encryption. It was developed more than 20 years ago for the National Bureau of Standards (which later became the National Institute of Standards and Technology, or NIST), and subsequently DES became a standard of the American National Standards Institute (ANSI). There is a great deal to say about DES and its history, but my purpose here is not to describe the algorithm but simply to summarize its adaptation and use inside the Oracle database. This algorithm requires a 64-bit key but discards 8 of those bits, using only 56. An intruder would have to try up to 2⁵⁶, or 72,057,594,037,927,936, combinations to guess the key.

DES was an adequate algorithm for quite a while, but the decades-old algorithm shows signs of age. Today's powerful computers can work through even the large number of combinations needed to expose the key.

Triple DES (DES3)

NIST went on to solicit development of another scheme based on the original DES that encrypts data twice or thrice, depending upon the mode used. An intruder trying to guess a key would face 2¹¹² and 2¹⁶⁸ combinations, in two-pass and three-pass encryption routines, respectively. DES3 uses a 128-bit or 192-bit key, depending on whether it is using a two-pass or three-pass scheme.

Triple DES is now also showing signs of age and, like DES, has become susceptible to determined attacks.

Advanced Encryption Standard (AES)

In November 2001, the Federal Information Processing Standards (FIPS) Publication 197 announced the approval of a new standard, the Advanced Encryption Standard, which became effective in May 2002. The full text of the standard can be obtained from NIST at http://csrc.nist.gov/CryptoToolkit/aes/round2/r2report.pdf (or visit the book's web site for the link).

Later in this chapter, I'll show how you can use these algorithms by specifying options or selecting constants in Oracle's built-in packages.

4.1.5. Padding and Chaining

When a piece of data is encrypted, it is not encrypted as a whole by the algorithm. It's usually broken into chunks of eight bytes each, and then each chunk is operated on independently. Of course, the length of the data may not be an exact multiple of eight; in such a case, the algorithm adds some bytes to the last chunk to make it exactly eight bytes long. This process is known as padding. This padding also has to be done right so an attacker won't be able to figure out what was padded and then guess the key from there. To securely pad the values, you can use a pre-developed padding method, which is available in Oracle, known as Public-Key Cryptography Standard #5 (PKCS#5). There are several other padding options that allow for padding with zeros and for no padding at all. Later in this chapter, I'll show how you can use padding by specifying options or selecting constants in Oracle's built-in packages.
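PKCS#5 padding follows a simple rule: if N bytes are needed to fill the last chunk, append N bytes that each have the value N, and if the data already fits exactly, append a whole extra chunk of eights so the padding can always be removed unambiguously. The Python sketch below illustrates the rule; it is not Oracle's implementation, and the function names are my own.

```python
BLOCK_SIZE = 8  # bytes per chunk, as described in the text


def pkcs5_pad(data: bytes) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of 8."""
    pad_len = BLOCK_SIZE - len(data) % BLOCK_SIZE  # always between 1 and 8
    return data + bytes([pad_len]) * pad_len


def pkcs5_unpad(padded: bytes) -> bytes:
    """Strip the padding, rejecting input that is not validly padded."""
    if not padded or len(padded) % BLOCK_SIZE:
        raise ValueError("padded data must be a non-empty multiple of 8 bytes")
    pad_len = padded[-1]
    if not 1 <= pad_len <= BLOCK_SIZE:
        raise ValueError("invalid padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]


print(pkcs5_pad(b"3451"))           # b'3451\x04\x04\x04\x04'
print(len(pkcs5_pad(b"12345678")))  # 16
print(pkcs5_unpad(pkcs5_pad(b"3451")))  # b'3451'
```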

When data is divided into chunks, there needs to be a way to connect those chunks back together, a process known as chaining. The overall security of an encryption system depends upon how chunks are connected and encrypted: independently or in conjunction with the adjacent chunks. Oracle supports the following chaining methods:

CBC: Cipher Block Chaining (the most common chaining method)
ECB: Electronic Code Book
CFB: Cipher Feedback
OFB: Output Feedback

Later in this chapter, I'll show how you can use these methods by specifying options or selecting constants in Oracle's built-in packages.
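The difference between encrypting chunks independently (ECB) and chaining them (CBC) can be sketched in a few lines of Python. The "cipher" here is a deliberately trivial stand-in (adding key bytes modulo 256), not DES or AES, and all names are hypothetical. CBC mixes each chunk with the previous encrypted chunk before encrypting it, and it needs a starting chunk, commonly called an initialization vector (IV), for the first one.

```python
BLOCK = 8  # bytes per chunk


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Stand-in for a real block cipher: add key bytes modulo 256 (insecure)."""
    return bytes((b + k) % 256 for b, k in zip(block, key))


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt_block."""
    return bytes((b - k) % 256 for b, k in zip(block, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def chunks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    """ECB: every chunk is encrypted on its own."""
    return b"".join(toy_encrypt_block(c, key) for c in chunks(data))


def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    """CBC: XOR each chunk with the previous ciphertext chunk, then encrypt."""
    out, prev = [], iv
    for c in chunks(data):
        prev = toy_encrypt_block(xor(c, prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    """Undo cbc_encrypt: decrypt each chunk, then XOR with the previous one."""
    out, prev = [], iv
    for c in chunks(data):
        out.append(xor(toy_decrypt_block(c, key), prev))
        prev = c
    return b"".join(out)


key = b"K3y-6754"                # 8-byte toy key
iv = bytes(range(BLOCK))         # illustrative IV; real ones should be random
plaintext = b"ACCT0001ACCT0001"  # two identical 8-byte chunks

ecb = ecb_encrypt(plaintext, key)
cbc = cbc_encrypt(plaintext, key, iv)
print(ecb[:BLOCK] == ecb[BLOCK:])             # True
print(cbc[:BLOCK] == cbc[BLOCK:])             # False
print(cbc_decrypt(cbc, key, iv) == plaintext)  # True
```

With ECB, repeated chunks in the input show up as repeated chunks in the output, which leaks structure to an attacker; chaining hides that repetition.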