4.6. Cryptographic HashingEncryption provides a way to ensure that only authorized people can see your data. It does so by disguising sensitive data. In some cases, however, you may not be interested in disguising data, but simply in protecting it from manipulation. Suppose that you have stored payment information for vendors. That data by itself may not be sensitive enough to require encryption, but you may want a way to ensure that someone does not alter the numbers to increase a payment amount. How can you do that? The answer lies in a process known as cryptographic hashing . Let's start by looking at a real-world example. 4.6.1. The Case of the Suspicious SandwichLet's suppose that you leave your sandwich open on your desk when you go over to the fax machine to pick up an important fax. When you come back, you feel somehow that the sandwich has shifted a little to the left. Has someone tampered with your sandwich, perhaps placing some barbiturate in it to knock you out so that he can steal your cool new wireless mouse? Or perhaps he's after the PL/SQL book hidden in your drawer? Maybe it's not drugs, but sand, in your sandwich? The possibilities spin madly through your mind, and you lose your appetite. To calm your fears, you decide to challenge the integrity of the sandwich. Being the careful fellow you are, you had previously weighed the sandwich and recorded its weight out to the 10th digit after the decimal point. Confronted with the possibility of an altered sandwich, you weigh it again and compare the results. They are exactly the same, down to that 10th digit of precision.What a relief! If someone had actually changed the sandwich in any wayfor example, by adding barbiturates or sandthe weights would have been different, thus revealing the adulteration. Pay close attention to the concepts presented here. You did not "hide" the sandwich (i.e., encrypt it); you simply created your own method of calculating a value that represents the sandwich. You could then compare the before and after values. The value you arrived at could have been based on any algorithmin this case, it was the weight of the sandwich. If you were examining data, rather than sliced bread and meat, an algorithm could be used to generate the value. This process is known as hashing. It's different from encryption in that it is one-way onlyyou can decrypt encrypted data, but you can't de-hash hashed values. Hashing the same piece of data more than once produces the same value regardless of the number of times it is done. If data has been modified in any way, the generated hash value will not be the same, thus revealing the contamination. There is always a theoretical risk that two different pieces of data will hash to the same value, but you can minimize the probability of this happening by using a sufficiently sophisticated hashing algorithm. One such algorithm is known as a Message Digest (MD). One MD variant known as MD5 was once a standard, but it did not prove secure enough to maintain its standardization; a newer standard called Secure Hash Algorithm Version 1 (SHA-1) is more often used today and is available in Oracle Database 10g. 4.6.2. MD5 Hashing in Oracle9i DatabaseLet's see how we can use hashing in real-life database administration. When a sensitive piece of information is sent to a different location, you might calculate the hash value beforehand and send it in another shipment or transmission. The recipient can then calculate the hash value of the received data and compare it against the hash value that you sent out. In Oracle9i Database, the DBMS_OBFUSCATION_TOOLKIT package provides a hashing function for MD5, an implementation of the Message Digest protocol. To hash a string, I might specify: DECLARE l_hash VARCHAR2 (2000); l_in_val VARCHAR2 (2000); BEGIN l_in_val := 'Account Balance is 12345.67'; l_hash := DBMS_OBFUSCATION_TOOLKIT.md5 (input_string => l_in_val); l_hash := RAWTOHEX (UTL_RAW.cast_to_raw (l_hash)); DBMS_OUTPUT.put_line ('Hashed Value = ' || l_hash); END; / Here I provided a simple string "Account Balance is 12345.67", and got its hash value. The MD5 function returns a VARCHAR2 value, but as in the encryption process observed earlier, it contains control characters. Hence, I must convert it to RAW and then to hexadecimal for easy storage. The above code segment returns: Hashed Value = A09308E539C35C97CD612E918BA58B4C In this example, you will note two important differences between hashing and encryption:
Because I may want to obtain this value and transmit it to the recipient, I can create a stored function that does that for us. Let's use the same code example and create the function, as follows. /* File on web: get_hash_val_9i.sql */ CREATE OR REPLACE FUNCTION get_hash_val (p_in VARCHAR2) RETURN VARCHAR2 IS l_hash VARCHAR2 (2000); BEGIN l_hash := RAWTOHEX (UTL_RAW.cast_to_raw (DBMS_OBFUSCATION_TOOLKIT.md5 (input_string => p_in) ) ); RETURN l_hash; END; Let's generate some representative outputs from this function. BEGIN DBMS_OUTPUT.put_line ( 'Hashed = ' || get_hash_val ('Account Balance is 12345.67') ); DBMS_OUTPUT.put_line ( 'Hashed = ' || get_hash_val ('Account Balance is 12345.67') ); END; The output is: Hashed = A09308E539C35C97CD612E918BA58B4C Hashed = A09308E539C35C97CD612E918BA58B4C As you can see, the function returns an identical value each time for the same input string, a fact that can be used in validating the integrity of that particular piece of data. Note that here I am referring to the integrity of the data, not the database; the latter is assured by the Oracle database in enforcing constraints and in transactions. When a legitimate user updates a value that does not violate defined constraints, the data (but not the database) becomes corrupt. For example, if someone updates an account balance via ad hoc SQL from $12,345.67 to $21,345.67, that fact may not be detected at all unless the organization provides tracking capabilities. If the hash value for a column like Social Security number was calculated beforehand and stored somewhere, and then after retrieval the recalculated hash value is compared against the stored one, it will signal a possibly malicious data manipulation because the hash values will not be the same. Let's see how this works. DECLARE l_data VARCHAR2 (200); BEGIN l_data := 'Social Security Number = 123-45-6789'; DBMS_OUTPUT.put_line ('Hashed = ' || get_hash_val (l_data)); -- -- someone manipulated the data and changed it -- l_data := 'Social Security Number = 023-45-6789'; DBMS_OUTPUT.put_line ('Hashed = ' || get_hash_val (l_data)); END; The output is: Hashed = 098D833A81B279E54992BFB1ECA6E428 Hashed = 6682A974924B5611FA9D809357ADE508 Note how the hash values differ. The resulting hash value will be different if the data was modified in any way, even if the value itself is unchanged. If a space, punctuation, or anything else is modified, the hash value will not be the same.
4.6.3. SHA-1 Hashing in Oracle Database 10gAs mentioned earlier, the MD5 protocol is not considered sufficiently secure for modern data protection, and SHA-1 is often used instead. SHA-1 is not available in the DBMS_OBFUSCATION_TOOLKIT, but in Oracle Database 10g, you can use the HASH function available in the DBMS_CRYPTO package to perform SHA-1 hashing . Here is the declaration of the function: DBMS_CRYPTO.hash ( src in raw, typ in pls_integer) return raw; Because HASH accepts only the RAW datatype as input, I have to convert the input character string to RAW using the technique described earlier for encryption. l_in := utl_i18n.string_to_raw (p_in_val, 'AL32UTF8'); This converted string can now be passed to the hash function. In the second parameter typ (which must be declared in the PLS_INTEGER datatype ) you specify the algorithm to use for hashing. You can select any of the algorithms in Table 4-7.
For example, to get the hash value for a RAW datatype variable, you might write a function as follows: /* File on web: get_sha1_hash_val.sql */ CREATE OR REPLACE FUNCTION get_sha1_hash_val (p_in RAW) RETURN RAW IS l_hash RAW (4000); BEGIN l_hash := DBMS_CRYPTO.HASH (src => p_in, typ => DBMS_CRYPTO.hash_sh1); RETURN l_hash; END; / For MD5 hashing, you would change the value of the parameter typ from DBMS_CRYPTO.HASH_SH1 to DBMS_CRYPTO.HASH_MD5. In my function to get the hash value, I can make it generic enough to accept any algorithm. Finally, because the return value is in RAW, I need to convert it to VARCHAR2, as follows. l_enc_val := rawtohex (l_enc_val, 'AL32UTF8'); Putting everything together, this is what the function looks like. /* File on web: get_hash_val_10g.sql */ CREATE OR REPLACE FUNCTION get_hash_val ( p_in_val IN VARCHAR2, p_algorithm IN VARCHAR2 := 'SH1' ) RETURN VARCHAR2 IS l_hash_val RAW (4000); l_hash_algo PLS_INTEGER; l_in RAW (4000); l_ret VARCHAR2 (4000); BEGIN l_hash_algo := CASE p_algorithm WHEN 'SH1' THEN DBMS_CRYPTO.hash_sh1 WHEN 'MD4' THEN DBMS_CRYPTO.hash_md4 WHEN 'MD5' THEN DBMS_CRYPTO.hash_md5 END; l_in := utl_i18n.string_to_raw (p_in_val, 'AL32UTF8'); l_hash_val := DBMS_CRYPTO.HASH (src => l_in, typ => l_hash_algo); l_ret := rawtohex(l_hash_val); RETURN l_ret; END; Let's see how we can use it in an example: SQL> SELECT get_hash_val ('Test') 2 FROM DUAL 3 / GET_HASH_VAL('TEST') ---------------------------------------- 640AB2BAE07BEDC4C163F679A746F7AB7FB5D1FA I am using this function to hash a VARCHAR2 value, returning a VARCHAR2 field that can be stored and transmitted. By default this function uses the SHA-1 algorithm, but it can accept the other two algorithms as well. 4.6.4. Other Uses of HashingHashing has many uses beyond cryptographyfor example, in web programming and virus detection . Web applications are stateless: they do not keep the connection open to the database server for the duration of the transaction. In other words, there is no concept of a "session," and therefore there is no locking of the type Oracle users rely upon. This means that there is no easy way to find out whether data on a web page has changed. But if a hash value is stored along with the data, a new hash value can be recalculated and compared with the stored value. If the two values do not agree, the data has changed. Hashing is also helpful in determining whether data can be trusted. Consider the case of a virus that updates critical documents stored inside the database. This is not something that can be easily caught by a trigger. However, if the document contains a hash value, then by comparing a computed hash value with the stored value, you can determine whether the document has been tampered with and you will know whether you can trust that document. 4.6.5. Message Authentication Code in Oracle Database 10gThe type of hashing discussed so far in this chapter is a very helpful technique, but it has certain limitations:
These limitations can be overcome by a modified implementation of hashing, one in which the exclusivity of the hashing mechanism at the receiver's end can be ascertained by a password or key. This special type of hash value is known as a Message Authentication Code (MAC). The sender calculates the MAC value of the data using a predetermined key that is also known to the receiver but not sent with the data. The sender then sends the MAC to the receiver along with the data, not separate from it. After receiving the data, the receiver also calculates the MAC value using the same key and matches it with the MAC value sent with the data. This mechanism is shown schematically in Figure 4-9. Figure 4-9. Message Authentication Code usageLike hashing, MAC also follows the standard algorithms, MD5 and SHA-1. As with the HASH function, the parameter typ is used to specify the algorithm to be used. Select DBMS_CRYPTO.HMAC_MD5 or DBMS_CRYPTO.HMAC_SH1, respectively. Here is how I can calculate the MAC value of an input string using the SHA-1 algorithm. /* File on web: get_sha1_mac_val.sql */ CREATE OR REPLACE FUNCTION get_sha1_mac_val (p_in RAW, p_key RAW) RETURN RAW IS l_mac RAW (4000); BEGIN l_mac := DBMS_CRYPTO.mac (src => p_in, typ => DBMS_CRYPTO.hmac_sh1, key => p_key); RETURN l_mac; END; / Using my hashing function as a model, I can also create my own generic MAC calculations. /* File on web: get_mac_val.sql */ CREATE OR REPLACE FUNCTION get_mac_val ( p_in_val IN VARCHAR2, p_key IN VARCHAR2, p_algorithm IN VARCHAR2 := 'SH1' ) RETURN VARCHAR2 IS l_mac_val RAW (4000); l_key RAW (4000); l_mac_algo PLS_INTEGER; l_in RAW (4000); l_ret VARCHAR2 (4000); BEGIN l_mac_algo := CASE p_algorithm WHEN 'SH1' THEN DBMS_CRYPTO.hmac_sh1 WHEN 'MD5' THEN DBMS_CRYPTO.hash_md5 END; l_in := utl_i18n.string_to_raw (p_in_val, 'AL32UTF8'); l_key := utl_i18n.string_to_raw (p_key, 'AL32UTF8'); l_mac_val := DBMS_CRYPTO.mac (src => l_in, typ => l_mac_algo, key=>l_key); l_ret := RAWTOHEX (l_mac_val); RETURN l_ret; END; Let's test this function to get the MAC value of the data "Test Data" and the key "Key". SQL> SELECT get_mac_val ('Test Data','Key') 2 FROM DUAL 3 / GET_MAC_VAL('TESTDATA','KEY') ----------------------------------------- 8C36C24C767E305CD95415C852E9692F53927761 Because a key is required to generate the checksum value, the MAC method provides more security than the hashing method. For example, in a banking application, the integrity of character data such as a Social Security number (SSN) in a bank account is important. Assume that the ACCOUNTS table looks like this. ACCOUNT_NO NUMBER(10) SSN CHAR(9) SSN_MAC VARCHAR2(200) When an account is created, the MAC value is calculated on the SSN field using a predetermined key, such as "A Jolly Good Rancher". The column SSN_MAC is updated by the statement: UPDATE accounts SET ssn_mac = get_mac_val (ssn, 'A Jolly Good Rancher') WHERE account_no = account_no; Now assume that sometime afterward, an intruder updates the SSN field. If the SSN_MAC field contained the hash value of the column SSN, the intruder could calculate the hash value herself and update the column with the new value, as well. Later, when the hash value is calculated on the SSN column and compared to the stored SSN_MAC value, they would match, hiding the fact that the data was compromised! However, if the column contained the MAC value of the column, rather than the hash value, the calculation of the new value would require the key "A Jolly Good Rancher". Because the intruder does not know that, the updated value will not generate the same MAC value, thus revealing that the data was compromised. |