Section 4.6. Cryptographic Hashing

4.6. Cryptographic Hashing

Encryption provides a way to ensure that only authorized people can see your data. It does so by disguising sensitive data. In some cases, however, you may not be interested in disguising data, but simply in protecting it from manipulation. Suppose that you have stored payment information for vendors. That data by itself may not be sensitive enough to require encryption, but you may want a way to ensure that someone does not alter the numbers to increase a payment amount. How can you do that? The answer lies in a process known as cryptographic hashing . Let's start by looking at a real-world example.

4.6.1. The Case of the Suspicious Sandwich

Let's suppose that you leave your sandwich open on your desk when you go over to the fax machine to pick up an important fax. When you come back, you feel somehow that the sandwich has shifted a little to the left. Has someone tampered with your sandwich, perhaps placing some barbiturate in it to knock you out so that he can steal your cool new wireless mouse? Or perhaps he's after the PL/SQL book hidden in your drawer? Maybe it's not drugs, but sand, in your sandwich? The possibilities spin madly through your mind, and you lose your appetite.

To calm your fears, you decide to challenge the integrity of the sandwich. Being the careful fellow you are, you had previously weighed the sandwich and recorded its weight out to the 10^th digit after the decimal point. Confronted with the possibility of an altered sandwich, you weigh it again and compare the results. They are exactly the same, down to that 10^th digit of precision.What a relief! If someone had actually changed the sandwich in any wayfor example, by adding barbiturates or sandthe weights would have been different, thus revealing the adulteration.

Pay close attention to the concepts presented here. You did not "hide" the sandwich (i.e., encrypt it); you simply created your own method of calculating a value that represents the sandwich. You could then compare the before and after values. The value you arrived at could have been based on any algorithmin this case, it was the weight of the sandwich.

If you were examining data, rather than sliced bread and meat, an algorithm could be used to generate the value. This process is known as hashing. It's different from encryption in that it is one-way onlyyou can decrypt encrypted data, but you can't de-hash hashed values. Hashing the same piece of data more than once produces the same value regardless of the number of times it is done. If data has been modified in any way, the generated hash value will not be the same, thus revealing the contamination.

There is always a theoretical risk that two different pieces of data will hash to the same value, but you can minimize the probability of this happening by using a sufficiently sophisticated hashing algorithm. One such algorithm is known as a Message Digest (MD). One MD variant known as MD5 was once a standard, but it did not prove secure enough to maintain its standardization; a newer standard called Secure Hash Algorithm Version 1 (SHA-1) is more often used today and is available in Oracle Database 10g.

4.6.2. MD5 Hashing in Oracle9i Database

Let's see how we can use hashing in real-life database administration. When a sensitive piece of information is sent to a different location, you might calculate the hash value beforehand and send it in another shipment or transmission. The recipient can then calculate the hash value of the received data and compare it against the hash value that you sent out.

In Oracle9i Database, the DBMS_OBFUSCATION_TOOLKIT package provides a hashing function for MD5, an implementation of the Message Digest protocol. To hash a string, I might specify:

     DECLARE        l_hash     VARCHAR2 (2000);        l_in_val   VARCHAR2 (2000);     BEGIN        l_in_val := 'Account Balance is 12345.67';        l_hash := DBMS_OBFUSCATION_TOOLKIT.md5 (input_string => l_in_val);        l_hash := RAWTOHEX (UTL_RAW.cast_to_raw (l_hash));        DBMS_OUTPUT.put_line ('Hashed Value = ' || l_hash);     END;     /

Here I provided a simple string "Account Balance is 12345.67", and got its hash value. The MD5 function returns a VARCHAR2 value, but as in the encryption process observed earlier, it contains control characters. Hence, I must convert it to RAW and then to hexadecimal for easy storage. The above code segment returns:

     Hashed Value = A09308E539C35C97CD612E918BA58B4C

In this example, you will note two important differences between hashing and encryption:

In hashing, the input string does not have to be padded to make it a certain length, whereas it does in encryption.
In hashing, there is no key. Because there is no key involved, there is no need to store or supply the key at any point of the send-receive process, which makes a hashing system extremely simple.

Because I may want to obtain this value and transmit it to the recipient, I can create a stored function that does that for us. Let's use the same code example and create the function, as follows.

     /* File on web: get_hash_val_9i.sql */     CREATE OR REPLACE FUNCTION get_hash_val (p_in VARCHAR2)        RETURN VARCHAR2     IS        l_hash   VARCHAR2 (2000);     BEGIN        l_hash :=           RAWTOHEX              (UTL_RAW.cast_to_raw                   (DBMS_OBFUSCATION_TOOLKIT.md5 (input_string      => p_in)                    )              );        RETURN l_hash;     END;

Let's generate some representative outputs from this function.

     BEGIN        DBMS_OUTPUT.put_line (   'Hashed = '                              || get_hash_val ('Account Balance is 12345.67')                             );        DBMS_OUTPUT.put_line (   'Hashed = '                              || get_hash_val ('Account Balance is 12345.67')                             );     END;

The output is:

     Hashed = A09308E539C35C97CD612E918BA58B4C     Hashed = A09308E539C35C97CD612E918BA58B4C

As you can see, the function returns an identical value each time for the same input string, a fact that can be used in validating the integrity of that particular piece of data. Note that here I am referring to the integrity of the data, not the database; the latter is assured by the Oracle database in enforcing constraints and in transactions. When a legitimate user updates a value that does not violate defined constraints, the data (but not the database) becomes corrupt. For example, if someone updates an account balance via ad hoc SQL from $12,345.67 to $21,345.67, that fact may not be detected at all unless the organization provides tracking capabilities.

If the hash value for a column like Social Security number was calculated beforehand and stored somewhere, and then after retrieval the recalculated hash value is compared against the stored one, it will signal a possibly malicious data manipulation because the hash values will not be the same. Let's see how this works.

     DECLARE        l_data   VARCHAR2 (200);     BEGIN        l_data := 'Social Security Number = 123-45-6789';        DBMS_OUTPUT.put_line ('Hashed = ' || get_hash_val (l_data));        --        -- someone manipulated the data and changed it        --        l_data := 'Social Security Number = 023-45-6789';        DBMS_OUTPUT.put_line ('Hashed = ' || get_hash_val (l_data));     END;

The output is:

     Hashed = 098D833A81B279E54992BFB1ECA6E428     Hashed = 6682A974924B5611FA9D809357ADE508

Note how the hash values differ. The resulting hash value will be different if the data was modified in any way, even if the value itself is unchanged. If a space, punctuation, or anything else is modified, the hash value will not be the same.

It is theoretically possible that two different input values will produce the same hash value. However, by relying on widely used algorithms such as MD5 and SHA-1, you are ensured that the probability of a hash conflict is usually a statistically remote 1 in 10³⁸ (depending on the algorithm chosen). If you cannot afford to take even that chance, you will need to write conflict-resolution logic around your use of the hash function.

4.6.3. SHA-1 Hashing in Oracle Database 10g

As mentioned earlier, the MD5 protocol is not considered sufficiently secure for modern data protection, and SHA-1 is often used instead. SHA-1 is not available in the DBMS_OBFUSCATION_TOOLKIT, but in Oracle Database 10g, you can use the HASH function available in the DBMS_CRYPTO package to perform SHA-1 hashing . Here is the declaration of the function:

     DBMS_CRYPTO.hash (        src in raw,        typ in pls_integer)      return raw;

Because HASH accepts only the RAW datatype as input, I have to convert the input character string to RAW using the technique described earlier for encryption.

     l_in := utl_i18n.string_to_raw (p_in_val, 'AL32UTF8');

This converted string can now be passed to the hash function.

In the second parameter typ (which must be declared in the PLS_INTEGER datatype ) you specify the algorithm to use for hashing. You can select any of the algorithms in Table 4-7.

Table 4-7. Hashing algorithms in DBMS_CRYPTO
Constant	Description
DBMS_CRYPTO.HASH_MD5	Message Digest 5
DBMS_CRYPTO.HASH_MD4	Message Digest 4
DBMS_CRYPTO.HASH_SH1	Secure Hashing Algorithm 1

For example, to get the hash value for a RAW datatype variable, you might write a function as follows:

     /* File on web: get_sha1_hash_val.sql */     CREATE OR REPLACE FUNCTION get_sha1_hash_val (p_in RAW)        RETURN RAW     IS        l_hash   RAW (4000);     BEGIN        l_hash := DBMS_CRYPTO.HASH (src => p_in, typ => DBMS_CRYPTO.hash_sh1);        RETURN l_hash;     END;     /

For MD5 hashing, you would change the value of the parameter typ from DBMS_CRYPTO.HASH_SH1 to DBMS_CRYPTO.HASH_MD5. In my function to get the hash value, I can make it generic enough to accept any algorithm.

Finally, because the return value is in RAW, I need to convert it to VARCHAR2, as follows.

     l_enc_val := rawtohex (l_enc_val, 'AL32UTF8');

Putting everything together, this is what the function looks like.

     /* File on web: get_hash_val_10g.sql */     CREATE OR REPLACE FUNCTION get_hash_val (        p_in_val      IN   VARCHAR2,        p_algorithm   IN   VARCHAR2 := 'SH1'     )        RETURN VARCHAR2     IS        l_hash_val    RAW (4000);        l_hash_algo   PLS_INTEGER;        l_in          RAW (4000);        l_ret         VARCHAR2 (4000);     BEGIN        l_hash_algo :=           CASE p_algorithm              WHEN 'SH1'                 THEN DBMS_CRYPTO.hash_sh1              WHEN 'MD4'                 THEN DBMS_CRYPTO.hash_md4              WHEN 'MD5'                 THEN DBMS_CRYPTO.hash_md5           END;        l_in := utl_i18n.string_to_raw (p_in_val, 'AL32UTF8');        l_hash_val := DBMS_CRYPTO.HASH (src => l_in, typ => l_hash_algo);        l_ret := rawtohex(l_hash_val);        RETURN l_ret;     END;

Let's see how we can use it in an example:

     SQL> SELECT get_hash_val ('Test')       2  FROM DUAL       3  /     GET_HASH_VAL('TEST')     ----------------------------------------     640AB2BAE07BEDC4C163F679A746F7AB7FB5D1FA

I am using this function to hash a VARCHAR2 value, returning a VARCHAR2 field that can be stored and transmitted. By default this function uses the SHA-1 algorithm, but it can accept the other two algorithms as well.

4.6.4. Other Uses of Hashing

Hashing has many uses beyond cryptographyfor example, in web programming and virus detection .

Web applications are stateless: they do not keep the connection open to the database server for the duration of the transaction. In other words, there is no concept of a "session," and therefore there is no locking of the type Oracle users rely upon. This means that there is no easy way to find out whether data on a web page has changed. But if a hash value is stored along with the data, a new hash value can be recalculated and compared with the stored value. If the two values do not agree, the data has changed.

Hashing is also helpful in determining whether data can be trusted. Consider the case of a virus that updates critical documents stored inside the database. This is not something that can be easily caught by a trigger. However, if the document contains a hash value, then by comparing a computed hash value with the stored value, you can determine whether the document has been tampered with and you will know whether you can trust that document.

4.6.5. Message Authentication Code in Oracle Database 10g

The type of hashing discussed so far in this chapter is a very helpful technique, but it has certain limitations:

Anyone can verify the authenticity of transmitted data by using the hash function. In certain types of ultra-secure systems, where only a particular recipient is expected to verify the authenticity of the message or data, this may not be appropriate.
Anyone can calculate the same hash value if the algorithm is known, and he can then update the values in the checksum columns, hiding the compromise in the data.
For the reason stated in the previous problem, the hash value cannot be stored along with the data in a reliable manner. Anyone with update privileges on the table can update the hash value as well. Similarly, someone can generate the hash value and update the data in transit. For this reason, the hash value cannot accompany the data. It has to travel separately, which adds to the complexity of the system.

These limitations can be overcome by a modified implementation of hashing, one in which the exclusivity of the hashing mechanism at the receiver's end can be ascertained by a password or key. This special type of hash value is known as a Message Authentication Code (MAC). The sender calculates the MAC value of the data using a predetermined key that is also known to the receiver but not sent with the data. The sender then sends the MAC to the receiver along with the data, not separate from it. After receiving the data, the receiver also calculates the MAC value using the same key and matches it with the MAC value sent with the data. This mechanism is shown schematically in Figure 4-9.

Figure 4-9. Message Authentication Code usage

Like hashing, MAC also follows the standard algorithms, MD5 and SHA-1. As with the HASH function, the parameter typ is used to specify the algorithm to be used. Select DBMS_CRYPTO.HMAC_MD5 or DBMS_CRYPTO.HMAC_SH1, respectively. Here is how I can calculate the MAC value of an input string using the SHA-1 algorithm.

     /* File on web: get_sha1_mac_val.sql */     CREATE OR REPLACE FUNCTION get_sha1_mac_val (p_in RAW, p_key RAW)        RETURN RAW     IS        l_mac   RAW (4000);     BEGIN        l_mac :=           DBMS_CRYPTO.mac (src      => p_in,                            typ      => DBMS_CRYPTO.hmac_sh1,                            key      => p_key);        RETURN l_mac;     END;     /

Using my hashing function as a model, I can also create my own generic MAC calculations.

     /* File on web: get_mac_val.sql */     CREATE OR REPLACE FUNCTION get_mac_val (        p_in_val      IN   VARCHAR2,        p_key         IN   VARCHAR2,        p_algorithm   IN   VARCHAR2 := 'SH1'     )        RETURN VARCHAR2     IS        l_mac_val    RAW (4000);        l_key        RAW (4000);        l_mac_algo   PLS_INTEGER;        l_in         RAW (4000);        l_ret        VARCHAR2 (4000);     BEGIN        l_mac_algo :=           CASE p_algorithm              WHEN 'SH1'                 THEN DBMS_CRYPTO.hmac_sh1              WHEN 'MD5'                 THEN DBMS_CRYPTO.hash_md5           END;        l_in := utl_i18n.string_to_raw (p_in_val, 'AL32UTF8');        l_key := utl_i18n.string_to_raw (p_key, 'AL32UTF8');        l_mac_val := DBMS_CRYPTO.mac (src => l_in, typ => l_mac_algo, key=>l_key);        l_ret := RAWTOHEX (l_mac_val);        RETURN l_ret;     END;

Let's test this function to get the MAC value of the data "Test Data" and the key "Key".

     SQL> SELECT get_mac_val ('Test Data','Key')       2  FROM DUAL       3  /     GET_MAC_VAL('TESTDATA','KEY')     -----------------------------------------     8C36C24C767E305CD95415C852E9692F53927761

Because a key is required to generate the checksum value, the MAC method provides more security than the hashing method. For example, in a banking application, the integrity of character data such as a Social Security number (SSN) in a bank account is important. Assume that the ACCOUNTS table looks like this.

     ACCOUNT_NO      NUMBER(10)     SSN             CHAR(9)     SSN_MAC         VARCHAR2(200)

When an account is created, the MAC value is calculated on the SSN field using a predetermined key, such as "A Jolly Good Rancher". The column SSN_MAC is updated by the statement:

     UPDATE accounts        SET ssn_mac = get_mac_val (ssn, 'A Jolly Good Rancher')      WHERE account_no = account_no;

Now assume that sometime afterward, an intruder updates the SSN field. If the SSN_MAC field contained the hash value of the column SSN, the intruder could calculate the hash value herself and update the column with the new value, as well. Later, when the hash value is calculated on the SSN column and compared to the stored SSN_MAC value, they would match, hiding the fact that the data was compromised! However, if the column contained the MAC value of the column, rather than the hash value, the calculation of the new value would require the key "A Jolly Good Rancher". Because the intruder does not know that, the updated value will not generate the same MAC value, thus revealing that the data was compromised.