5.5. Techniques to Make Scripts "Safe"

There is only one way to keep your scripts running safely: do not trust users. Although this may sound harsh, it's perfectly true. Not only might users "hack" your site, but they also do weird things by accident. It's the programmer's responsibility to make sure that these inevitable errors can't do serious damage. Thus, you need to deploy some techniques to save the user from insanity.

5.5.1. Input Validation

One essential technique to protect your web site from users is input validation, which is an impressive term that doesn't mean much at all. The term simply means that you need to check all input that comes from the user, whether the data comes from cookies, GET, or POST data.

First, turn off register_globals in php.ini and set error_reporting to the highest possible value (E_ALL | E_STRICT). Turning off register_globals stops the registration of request data (Cookie, Session, GET, and POST variables) as global variables in your script; the high error_reporting setting enables notices for uninitialized variables.

For different kinds of input, you can use different methods. For instance, if you expect a parameter passed with the HTTP GET method to be an integer, force it to be an integer in your script:

 <?php $product_id = (int) $_GET['prod_id']; ?>

Any value that does not begin with a number is converted to 0 (and a value such as 12abc is cut down to 12). But what if $_GET['prod_id'] doesn't exist? You will receive a notice because we turned the error_reporting setting up. A better way to validate the input would be

 <?php
 if (!isset($_GET['prod_id'])) {
     die("Error, product ID was not set");
 }
 $product_id = (int) $_GET['prod_id'];
 ?>
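The cast quietly turns a value such as 42abc into 42. If you would rather reject such input, a stricter check can be sketched with filter_var(), assuming PHP 5.2 or later, where the filter extension is bundled. The helper name get_int_param() is hypothetical and not part of PHP, and a static array stands in for $_GET:

```php
<?php
/* Hypothetical helper (not a PHP built-in): returns the parameter as an
 * integer, or null when it is missing or is not a whole number. */
function get_int_param($source, $name)
{
    if (!isset($source[$name])) {
        return null;
    }
    /* FILTER_VALIDATE_INT returns false for input such as "42abc" */
    $value = filter_var($source[$name], FILTER_VALIDATE_INT);
    return ($value === false) ? null : $value;
}

/* A static array stands in for $_GET here */
$input = array('prod_id' => '42', 'bad_id' => '42abc');

var_dump((int) $input['bad_id']);            /* the plain cast accepts it */
var_dump(get_int_param($input, 'prod_id'));  /* valid integer */
var_dump(get_int_param($input, 'bad_id'));   /* rejected */
var_dump(get_int_param($input, 'missing'));  /* not present */
```

Returning null for both "missing" and "malformed" keeps the sketch short; a real script may want to tell the two cases apart.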

However, if you have a large number of input variables, it can be tedious to write this code for each and every variable separately. Instead, you might want to create and use a function for this, as shown in the following example:

 <?php
 function sanitize_vars(&$vars, $signatures, $redir_url = null)
 {
     $tmp = array();

     /* Walk through the signatures and add them to the temporary
      * array $tmp */
     foreach ($signatures as $name => $sig) {
         if (!isset($vars[$name]) &&
             isset($sig['required']) && $sig['required'])
         {
             /* redirect if the variable doesn't exist in the array */
             if ($redir_url) {
                 header("Location: $redir_url");
             } else {
                 echo "Parameter $name not present and no redirect URL";
             }
             exit();
         }

         /* apply type to variable (an absent optional variable
          * starts out as null) */
         $tmp[$name] = isset($vars[$name]) ? $vars[$name] : null;
         if (isset($sig['type'])) {
             settype($tmp[$name], $sig['type']);
         }

         /* apply functions to the variables, you can use the standard PHP
          * functions, but also use your own for added flexibility. */
         if (isset($sig['function'])) {
             $tmp[$name] = call_user_func($sig['function'], $tmp[$name]);
         }
     }
     $vars = $tmp;
 }

 $sigs = array(
     'prod_id' => array('required' => true, 'type' => 'int'),
     'desc'    => array('required' => true, 'type' => 'string',
         'function' => 'addslashes')
 );

 sanitize_vars($_GET, $sigs,
     "http://{$_SERVER['SERVER_NAME']}/error.php?cause=vars");
 ?>

5.5.2. HMAC Verification

If you need to prevent bad guys from tampering with variables passed in the URL (such as for a redirect as shown previously, or for links that pass special parameters to the linked script), you can use a hash, as shown in the following script:

 <?php
 function create_parameters($array)
 {
     $data = '';
     $ret = array();

     /* For each variable in the array we add a string containing
      * "$key=$value" to an array and concatenate
      * $key and $value to the $data string. */
     foreach ($array as $key => $value) {
         $data .= $key . $value;
         $ret[] = "$key=$value";
     }

     /* We also add the md5sum of the $data as element
      * to the $ret array. */
     $hash = md5($data);
     $ret[] = "hash=$hash";

     return join('&amp;', $ret);
 }

 echo '<a href="script.php?' .
     create_parameters(array('cause' => 'vars')) . '">err!</a>';
 ?>

Running this script echoes the following link:

 <a href="script.php?cause=vars&amp;hash=8eee14fe10d3f612589cdef079c025f6">err!</a>

However, this URL is still vulnerable. Because nothing secret goes into the hash, an attacker can modify the variables and simply recompute the hash to match. We must do something better. We're not the first ones with this problem, so there is an existing solution: HMAC (Keyed-Hashing for Message Authentication). The HMAC method is proven to be stronger cryptographically, and should be used instead of home-cooked validation algorithms. The HMAC algorithm uses a secret key in a two-step hashing of plain text (in our case, the string containing the key/value pairs) with the following steps:

1.	If the key length is smaller than 64 bytes (the block size that most hashing algorithms use), we pad the key to 64 bytes with `\0`s; if the key length is larger than 64, we first use the hash function on the key and then pad it to 64 bytes with `\0`s.
2.	We construct `opad` (the 64-byte key XORed with 0x5C) and `ipad` (the 64-byte key `XOR`ed with 0x36).
3.	We create the "inner" hash by running the hash function with the parameter `ipad . plain text`. (Because we use an "iterative" hash function, like `md5()` or `sha1()`, we don't need to seed the hash function with our key and then run the seeded hash function over our plain text. Internally, the hash will do the same anyway, which is the reason we padded the key up to 64 bytes).
4.	We create the "outer" hash by running the hash function over `opad . inner_result`, that is, over the result obtained in step 3 with `opad` prepended.

Here is the formula to calculate HMAC, which should help you understand the calculation:

 H(K XOR opad, H(K XOR ipad, text))

With

H. The hash function to use
K. The key padded to 64 bytes with zeroes (0x0)
opad. The 64 bytes of 0x5Cs
ipad. The 64 bytes of 0x36s
text. The plain text for which we are calculating the hash
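To make the four steps concrete, they can be sketched in a few lines of PHP. This is an illustration under stated assumptions, not the PEAR implementation: hmac_sketch() is a hypothetical name, and the cross-check uses the built-in hash_hmac() function, which assumes PHP 5.1.2 or later. The key and text are the ones from a published HMAC-MD5 test case in RFC 2202.

```php
<?php
/* Illustrative only: hmac_sketch() is a hypothetical name. */
function hmac_sketch($key, $text, $func = 'md5')
{
    if (!in_array($func, array('md5', 'sha1'))) {
        return false;   /* both of these use a 64-byte block size */
    }

    /* Step 1: hash a key that is too long, then pad with \0 to 64 bytes */
    if (strlen($key) > 64) {
        $key = $func($key, true);       /* true = raw binary digest */
    }
    $key = str_pad($key, 64, "\0");

    /* Step 2: XOR the padded key with the two constants */
    $ipad = $key ^ str_repeat("\x36", 64);
    $opad = $key ^ str_repeat("\x5C", 64);

    /* Step 3: inner hash, kept as raw bytes */
    $inner = $func($ipad . $text, true);

    /* Step 4: outer hash, returned as a hexadecimal string */
    return $func($opad . $inner);
}

$digest = hmac_sketch('Jefe', 'what do ya want for nothing?');
echo $digest, "\n";

/* Cross-check against PHP's built-in implementation */
var_dump($digest === hash_hmac('md5', 'what do ya want for nothing?', 'Jefe'));
```

In practice you would call a tested implementation rather than this sketch; its only purpose is to show where the padding, the two XOR constants, and the two hash passes fit.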

Great, so much for the boring theory. Now let's see how we can use it with a PEAR class that was developed to calculate the hashes.

5.5.3. PEAR::Crypt_HMAC

The Crypt_HMAC class implements the algorithm as described in RFC 2104 and can be installed with pear install crypt_hmac. Let's look at it:

 <?php
 class Crypt_HMAC
 {
     /**
      * Constructor
      * Pass the key as first parameter and the method as second
      *
      * @param  string key    - Secret key used for the calculation
      * @param  string method - Hash function used for the calculation
      * @return void
      * @access public
      */
     function Crypt_HMAC($key, $method = 'md5')
     {
         if (!in_array($method, array('sha1', 'md5'))) {
             die("Unsupported hash function '$method'.");
         }
         $this->_func = $method;

         /* Pad the key as the RFC wishes (step 1). Passing true asks
          * md5() or sha1() for the raw digest, so the full digest is
          * kept for both functions. */
         if (strlen($key) > 64) {
             $key = $method($key, true);
         }
         if (strlen($key) < 64) {
             $key = str_pad($key, 64, chr(0));
         }

         /* Calculate the padded keys and save them (step 2) */
         $this->_ipad = substr($key, 0, 64) ^ str_repeat(chr(0x36), 64);
         $this->_opad = substr($key, 0, 64) ^ str_repeat(chr(0x5C), 64);
     }

First, we make sure that the requested underlying hash function is actually supported (for now, only the built-in PHP functions md5() and sha1() are supported). Then, we create a key, according to steps 1 and 2, as previously described. Finally, in the constructor, we pre-pad and XOR the key so that the hash() method can be used several times without losing performance by padding the key every time a hash is requested:

     /**
      * Hashing function
      *
      * @param  string data - string that will be hashed (steps 3 and 4)
      * @return string
      * @access public
      */
     function hash($data)
     {
         $func = $this->_func;
         /* inner hash as raw bytes (step 3), then the outer hash (step 4) */
         $inner  = $func($this->_ipad . $data, true);
         $digest = $func($this->_opad . $inner);
         return $digest;
     }
 }
 ?>

In the hash function, we use the pre-padded key. First, we hash the inner result. Then, we hash the outer result, which is the digest (a different name for hash) that we return.

Back to our original problem. We want to verify that no one tampered with our precious $_GET variables. Here is the second, more secure, version of our create_parameters() function:

 <?php
 require_once('Crypt/HMAC.php');

 /* The RFC recommends a key at least as long as the output of
  * the hash function you use (16 bytes for md5() and 20 for sha1()). */
 define('SECRET_KEY', 'Professional PHP 5 Programming Example');

 function create_parameters($array)
 {
     $data = '';
     $ret = array();

     /* Construct the string with our key/value pairs */
     foreach ($array as $key => $value) {
         $data .= $key . $value;
         $ret[] = "$key=$value";
     }

     $h = new Crypt_HMAC(SECRET_KEY, 'md5');
     $hash = $h->hash($data);
     $ret[] = "hash=$hash";

     return join('&amp;', $ret);
 }

 echo '<a href="script.php?' .
     create_parameters(array('cause' => 'vars')) . '">err!</a>';
 ?>

The output is

 <a href="script.php?cause=vars&amp;hash=6a0af635f1bbfb100297202ccd6dce53">err!</a>

To verify the parameters passed to the script, we can use this script:

 <?php
 require_once('Crypt/HMAC.php');

 define('SECRET_KEY', 'Professional PHP 5 Programming Example');

 function verify_parameters($array)
 {
     $data = '';

     /* Without a hash there is nothing to verify */
     if (!isset($array['hash'])) {
         return FALSE;
     }

     /* Store the hash in a separate variable and unset the hash from
      * the array itself (as it was not used in constructing the hash) */
     $hash = $array['hash'];
     unset($array['hash']);

     /* Construct the string with our key/value pairs */
     foreach ($array as $key => $value) {
         $data .= $key . $value;
     }

     $h = new Crypt_HMAC(SECRET_KEY, 'md5');

     /* Compare strictly, so that both digests are treated as strings */
     if ($hash !== $h->hash($data)) {
         return FALSE;
     } else {
         return TRUE;
     }
 }

 /* We use a static array here, but in real life you would be using
  * $array = $_GET or similar. */
 $array = array(
     'cause' => 'vars',
     'hash'  => '6a0af635f1bbfb100297202ccd6dce53'
 );

 if (!verify_parameters($array)) {
     die("Dweep! Somebody tampered with our parameters.\n");
 } else {
     echo "Good guys, they didn't touch our stuff!!";
 }
 ?>

The SHA1 hash method gives you more cryptographic strength, but both MD5 and SHA1 are adequate for the purpose of checking the validity of your parameters.
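If the PEAR package is not available, the same create-and-verify round trip can be sketched with PHP's built-in hash_hmac() function (assuming PHP 5.1.2 or later), here with SHA1. This swaps in hash_hmac() for Crypt_HMAC, so it is a substitution rather than the class described above. The function names, the DEMO_SECRET_KEY constant, and its value are all hypothetical.

```php
<?php
/* Hypothetical key for this sketch; use your own long random secret */
define('DEMO_SECRET_KEY', 'replace with your own long random secret');

/* Returns the parameters with a 'hash' element added */
function sign_parameters($params)
{
    ksort($params);     /* fixed order, so signing and verifying agree */
    $params['hash'] = hash_hmac('sha1', http_build_query($params),
        DEMO_SECRET_KEY);
    return $params;
}

/* Returns true only if the 'hash' element matches the other parameters */
function verify_signed_parameters($params)
{
    if (!isset($params['hash']) || !is_string($params['hash'])) {
        return false;
    }
    $hash = $params['hash'];
    unset($params['hash']);
    ksort($params);
    $expected = hash_hmac('sha1', http_build_query($params),
        DEMO_SECRET_KEY);
    return $hash === $expected;
}

$signed = sign_parameters(array('cause' => 'vars'));
var_dump(verify_signed_parameters($signed));    /* untouched parameters */

$signed['cause'] = 'something else';            /* simulate tampering */
var_dump(verify_signed_parameters($signed));
```

Hashing the output of http_build_query() instead of the bare concatenation of keys and values keeps the separators in the signed string, so that different key/value splits cannot produce the same input.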

5.5.4. Input Filter

With PHP 5, you can add hooks that process incoming data, but this feature is mainly targeted at advanced developers with a sound knowledge of C and some knowledge of PHP internals. These hooks are called by the SAPI layer, which handles the registering of incoming data into PHP. One application might be to run strip_tags() on all incoming data automatically. All this can also be done in user land with a function such as sanitize_vars(), but such a solution can only be enforced by writing a script that performs the desired processing and setting auto_prepend_file in php.ini to designate this script. Setting auto_prepend_file causes the processing script to be run at the beginning of every script. With an input filter hook, on the other hand, the server administrator can enforce a solution. For more information, see http://www.derickrethans.nl/sqlite_filter.php for an implementation of a filter that uses SQLite as an information source for filter rules.
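The user-land variant mentioned above might look like the following sketch. It assumes a prepend script that php.ini designates with auto_prepend_file; the file name, its path, and the function name strip_tags_deep() are hypothetical.

```php
<?php
/* prepend.php (hypothetical name), designated in php.ini with
 *   auto_prepend_file = /path/to/prepend.php
 * so that it runs at the beginning of every script. */

/* Hypothetical helper: applies strip_tags() to a value, or to every
 * element of a (possibly nested) array of values. */
function strip_tags_deep($value)
{
    return is_array($value)
        ? array_map('strip_tags_deep', $value)
        : strip_tags($value);
}

/* Clean all request data before the real script sees it */
$_GET    = strip_tags_deep($_GET);
$_POST   = strip_tags_deep($_POST);
$_COOKIE = strip_tags_deep($_COOKIE);
```

The recursion is there because request parameters written as name[] arrive in PHP as arrays rather than strings.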

5.5.5. Working with Passwords

Another application of hash functions is authenticating a password entered in a form on your web site against a password stored in your database. For obvious reasons, you don't want to store unencrypted passwords in your database. You want to prevent evil hackers who have access to your database (because the sysadmin blundered) from stealing passwords used by your clients. Because hash functions are not reversible, you can store the password hashed with a function like md5() or sha1() so the evil hackers can't get the password in plain text.

The example Auth class implements two methods, addUser() and authUser(), and makes use of the sha1() hashing function. The table scheme looks like this:

 CREATE TABLE users (
   email  VARCHAR(128) NOT NULL PRIMARY KEY,
   passwd CHAR(40) NOT NULL
 );

We use a length of 40 here, which is the same as the sha1() digest in hexadecimal characters:

 <?php
 class Auth
 {
     function Auth()
     {
         mysql_connect('localhost', 'user', 'password');
         mysql_select_db('my_own_bookshop');
     }

     public function addUser($email, $password)
     {
         $q = '
             INSERT INTO users(email, passwd)
                VALUES ("'. $email. '", "'. sha1($password).'")
         ';
         mysql_query($q);
     }

     public function authUser($email, $password)
     {
         $q = '
             SELECT * FROM users
             WHERE email="'. $email. '"
                 AND passwd ="'. sha1($password). '"
         ';
         $r = mysql_query($q);

         if (mysql_num_rows($r) == 1) {
             return TRUE;
         } else {
             return FALSE;
         }
     }
 }
 ?>

We didn't use addslashes() around the $email and $password variables earlier. We will do that in the script that calls the methods of this class:

 <?php
 /* Include our authentication class and sanitizing function */
 require_once 'Auth.php';
 require_once 'sanitize.php';

 /* Define our parameters */
 $sigs = array(
     'email'  => array('required' => TRUE, 'type' => 'string',
         'function' => 'addslashes'),
     'passwd' => array('required' => TRUE, 'type' => 'string',
         'function' => 'addslashes')
 );

 /* Clean up our input */
 sanitize_vars(&$_POST, $sigs);

 /* Instantiate the Auth class and add the user */
 $a = new Auth();
 $a->addUser($_POST['email'], $_POST['passwd']);

 /* or... we instantiate the Auth class and validate the user */
 $a = new Auth();
 echo $a->authUser($_POST['email'], $_POST['passwd']) ? 'OK' : 'ERROR';
 ?>

After the user is added to the database, something like this appears in your table:

 +--------+------------------------------------------+
 | user   | password                                 |
 +--------+------------------------------------------+
 | derick | 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 |
 +--------+------------------------------------------+

The first person who receives the correct password back from this sha1() hash can ask me for a crate of Kossu.
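On PHP 5.5 or later, the same store-a-hash-and-compare idea can also be sketched with the built-in password_hash() and password_verify() functions. This swaps sha1() for a salted, deliberately slow algorithm, so it is a substitution and not the method used by the Auth class. The resulting string is longer than 40 characters, so the passwd column would have to be wider for it to fit. The password in the sketch is made up.

```php
<?php
/* What addUser() would store in the passwd column */
$stored = password_hash('my secret password', PASSWORD_DEFAULT);

/* What authUser() would check after fetching $stored for the user */
var_dump(password_verify('my secret password', $stored));
var_dump(password_verify('a wrong guess', $stored));

/* Each call uses a fresh random salt, so two hashes of the same
 * password differ and cannot be compared inside the SQL query. */
var_dump($stored === password_hash('my secret password', PASSWORD_DEFAULT));
```

Because the salt is stored inside the hash string, the lookup would select the row by email only and leave the comparison to password_verify().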

5.5.6. Error Handling

During development, you probably want to code with error_reporting set to E_ALL | E_STRICT. Doing so helps you catch some bugs. If you have error_reporting set to E_ALL | E_STRICT, the executed script will show you errors like this:

 Warning: Call-time pass-by-reference has been deprecated - argument passed by value;  If you would like to pass it by reference, modify the declaration of sanitize_vars().  If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file.  However, future versions may not support this any longer.

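The reason for this is that we prefixed $_POST in the call to sanitize_vars() with the reference operator. Call-time pass-by-reference is deprecated, and it is not needed here because the declaration of sanitize_vars() already takes its first parameter by reference. The correct line is: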

 sanitize_vars($_POST, $sigs);

However, you definitely do not want error messages like these to show up on your production sites, where your customers can see them. Not only is it unsightly, but some debuggers show the full parameters, including username and password, which is information that should be kept private. PHP has features that make the experience much nicer for you, your customers, and visitors to the site. With the php.ini directives log_errors and display_errors, you can control where the errors appear. If you set display_errors to 0, errors are no longer sent to the browser. If you set the log_errors directive to 1, all errors are recorded in the location that you specify with the error_log directive. You can set error_log to syslog or to a file name.
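The same directives can also be set from a script with ini_set(), which is handy for trying them out. The following is a minimal sketch; the log file location is a hypothetical choice, and on a real server you would normally put these settings in php.ini instead.

```php
<?php
/* Hypothetical log location; pick a path outside the web root */
$log_file = sys_get_temp_dir() . '/php_errors_demo.log';

ini_set('display_errors', '0');   /* keep messages away from visitors */
ini_set('log_errors', '1');       /* record them instead */
ini_set('error_log', $log_file);  /* a file name, or 'syslog' */

/* With these settings, the warning is meant to end up in $log_file
 * rather than on the page */
trigger_error('Demo warning for the log', E_USER_WARNING);
```

Settings made with ini_set() only apply from that line onward, so errors raised before the call (parse errors, for instance) still follow the php.ini values.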

In some cases, recording errors in a file (rather than displaying them to the user) may not make the experience nicer for the visitors. Instead, it may result in an empty or broken page. In such cases, you may want to tell visitors that something went wrong, or you may want to hide the problem from visitors. PHP supports a customized error handler that can be set with set_error_handler(). This function accepts one parameter that can be either a string containing the function name for the error-handling function or an array containing a classname/methodname combination. The error-handling function should be defined like

 error_function($type, $error, $file, $line)

The $type is the type of error that is caught and can be either E_NOTICE, E_WARNING, E_USER_NOTICE, E_USER_WARNING, or E_USER_ERROR. No additional errors should be possible because the PHP code and the extensions are not supposed to emit other errors except parse errors or other low-level error messages. $error is the textual error message. $file and $line are the file name and line number on which the error occurred.

By using the error handler, you can tell the user in a nice way that something went wrong (for instance, in the layout of your site) or you can redirect the user to the main page (to hide the fact that something went wrong). The redirect, of course, will only work if no output was sent before the redirect, or if you have output_buffering turned on. Note that a user-defined error handler captures all errors, even if the error_reporting level tells PHP that not all errors should be shown.
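A minimal sketch of such a handler follows. The names friendly_error_handler() and remember_error() are hypothetical, and the in-memory list stands in for whatever logging you actually use; the handler signature is the one shown earlier in this section.

```php
<?php
/* Tiny in-memory store so the sketch needs no log file. Called with a
 * message it records it; called without, it returns all messages. */
function remember_error($message = null)
{
    static $messages = array();
    if ($message !== null) {
        $messages[] = $message;
    }
    return $messages;
}

/* Hypothetical handler name */
function friendly_error_handler($type, $error, $file, $line)
{
    /* Keep the details for the developer... */
    remember_error("[$type] $error in $file on line $line");
    /* ...and tell PHP the error was handled, so nothing is displayed */
    return true;
}

set_error_handler('friendly_error_handler');

/* Provoke an error on purpose */
trigger_error('Could not load the product list', E_USER_WARNING);

/* Tell the visitor in a nice way that something went wrong */
if (count(remember_error()) > 0) {
    echo "Sorry, something went wrong. Please try again later.\n";
}
```

A real handler would more likely pass the details to error_log() and either print the message inside the site layout or redirect to the main page, as described above.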