Moving on to the next aspect of our approach to security (inspecting components individually and looking at how to improve their security), we will begin by investigating the things we can do to keep our code safe. While we cannot show you everything you can use to cover all possible security threats (entire tomes have been devoted to these subjects), we can at least give some general guidelines and point you in the right direction. We will point out security concerns for specific technology areas that we will use in later chapters.

A Golden Rule

We will start with a golden rule that applies to security in any context: The most important thing we can do as application authors is filter all input that comes from external sources.

This does not mean that we should design a system with the assumption that all of our users are crooks; we still want to welcome them and encourage them to use our web application. However, we want to be sure that we are prepared for any misuse of our system.

If we filter effectively, we can substantially reduce the number of external threats and massively improve the robustness of our system. Even if we are pretty sure we trust the user, such as the CEO of our company or our bosses, we cannot be certain they do not have some type of spyware program that is modifying or sending new requests to our server.

Filtering Input

Given the importance of filtering the input we get from external customers, we should look at the ways in which we might do this.

Double Checking Expected Values

There are times when we will present the user with a range of possible values to choose for things such as shipping (ground, express, overnight), states (one of 50 states in the U.S.), and so on. Now, imagine if we had the following form:

<html>
<head>
<title> What be ye laddie?
</title>
</head>
<body>
  <form action='submit_form.php' method='POST'>
    <input type='radio' name='gender' value='Male'/>Male<br/>
    <input type='radio' name='gender' value='Female'/>Female<br/>
    <input type='radio' name='gender' value='Other'/>Unknown<br/>
    <input type='submit' value='submit'/>
  </form>
</body>
</html>

This form could look as shown in Figure 16-1. From this form, we might assume that whenever we query the value of $_POST['gender'] in submit_form.php, we would get the value "Male," "Female," or "Other," and we would be wrong.

Figure 16-1. A trivial gender entry form.

As we mentioned in Chapter 13, "Web Applications and the Internet," the web operates over HTTP. The form submission from our example would be sent to our server as a text message with a structure similar to the following:

POST /submit_form.php HTTP/1.1
Host: www.myhostnamehooray.com
User-Agent: WoobaBrowser/3.4 (Windows)
Content-Type: application/x-www-form-urlencoded
Content-Length: 11

gender=Male

However, there is nothing stopping someone from connecting to our web server and sending any values he wants for a form. Thus, somebody could send us the following:

POST /submit_form.php HTTP/1.1
Host: www.myhostnamehooray.com
User-Agent: WoobaBrowser/3.4 (Windows)
Content-Type: application/x-www-form-urlencoded
Content-Length: 22

gender=I+like+cookies.

If we were to write the following code

<?php
echo <<<EOM
<p align='center'>
  The user's gender is: {$_POST['gender']}.
</p>
EOM;
?>

we might find ourselves embarrassed later on. A better strategy is to verify that the incoming value is one of the expected/permitted values, as follows:

<?php
switch ($_POST['gender'])
{
  case 'Male':
  case 'Female':
  case 'Other':
    echo <<<EOM
<p align='center'>
  Congratulations! You are: '{$_POST['gender']}'.
</p>
EOM;
    break;

  default:
    echo <<<EOM
<p align='center'>
  <font color='red'>WARNING:</font>
  Invalid input value for gender specified.
</p>
EOM;
    break;
}
?>

There is a little bit more code involved, but we can be sure we are getting correct values; this becomes more important when we start handling values more financially sensitive than a user's gender. As a rule, we can never assume a value from a form will be within a set of expected values; we must check first.

Filtering Even Basic Values

HTML form elements have no types associated with them, and most pass strings (which may represent things such as dates, times, or numbers) to the server. Thus, if you have a numeric field, you cannot assume that it was entered as such. Even in environments where powerful client-side code can try to make sure that the value entered is of a particular type, there is no guarantee that the values will not be sent to the server directly, as in the "Double Checking Expected Values" section.

An easy way to make sure that a value is of the expected type is to cast or convert it to that type and use it, as follows:

$number_of_nights = (int)$_POST['num_nights'];
// non-numeric input casts to 0; negative values make no sense either
if ($number_of_nights <= 0)
{
  echo "ERROR: Invalid number of nights for the room!";
  exit;
}

If we have the user input a date in a localized format, such as "mm/dd/yy" for users in the United States, we can then write some code to verify it using the PHP function called checkdate.
This function takes a month, day, and year value (4-digit years), and indicates whether or not they form a valid date:

// explode() is used because split() was removed in PHP 7; '/' is a
// single byte in UTF-8, so this is safe on multibyte strings
$mmddyy = explode('/', $_POST['departure_date']);
if (count($mmddyy) != 3)
{
  echo "ERROR: Invalid Date specified!";
  exit;
}

// handle years like 02 or 95
if ((int)$mmddyy[2] < 100)
{
  if ((int)$mmddyy[2] > 50)
    $mmddyy[2] = (int)$mmddyy[2] + 1900;
  else if ((int)$mmddyy[2] >= 0)
    $mmddyy[2] = (int)$mmddyy[2] + 2000;
  // else it's < 0 and checkdate will catch it
}

// checkdate expects integers, so cast each part
if (!checkdate((int)$mmddyy[0], (int)$mmddyy[1], (int)$mmddyy[2]))
{
  echo "ERROR: Invalid Date specified!";
  exit;
}

By taking the time to filter and validate the input, we not only get the natural error checking that we should be doing in the first place (such as verifying whether a departure date for a plane ticket is a valid date), but we also improve the security of our system.

HTML Escaping

There are applications where you might take the input that a user has specified and display the input on a page. Pages where users can comment on a published article, or message board systems, are perfect examples of where this might occur. In these situations, we need to be careful that users do not inject malicious HTML markup into the text they input.

One of the easiest ways to do this is to use the htmlspecialchars or the htmlentities function. These functions take certain characters they see in the input string and convert them to HTML entities. An HTML entity is a special character sequence, begun with the ampersand character (&), that is used to indicate a special character that cannot be represented easily in HTML code. The entity name and a terminating semicolon (;) are supplied after the ampersand character. Optionally, an entity can be a character code specified by # and a decimal number, such as &#47; for the forward slash character (/).
Since all markup elements in HTML are demarcated by < and > characters, it could prove difficult to enter them in a string for output to the final content (since the browser will default to assuming they delineate markup elements). To get around this, we use &lt; and &gt;. Similarly, if we want to include the ampersand character in our HTML, we can use the entity &amp;. Single and double quotes are represented by &#39; and &quot;. Entities are converted into output by the HTML client and are thus not considered part of the markup.

The difference between htmlspecialchars and htmlentities is as follows: The former replaces only &, <, and >, plus single and double quotes depending on the value passed as its second parameter. The latter replaces anything that can be represented by a named entity with that entity. Examples of such entities are the copyright symbol ©, represented by &copy;, and the Euro currency symbol €, represented by &euro;. However, it will not convert characters that lack a named entity into numeric entities.

Both functions take a value to control the conversion of single and double quotes to entities as their second parameter, and both functions also take the character set in which the input string is encoded as their third parameter (which is vital for us, since we want this function to be safe on our UTF-8 strings). Possible values for the second parameter are

ENT_COMPAT: Double quotes are converted; single quotes are left alone.
ENT_QUOTES: Both single and double quotes are converted.
ENT_NOQUOTES: Neither single nor double quotes are converted.
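The effect of the second parameter can be seen in a short sketch. It uses the three quote-handling constants that PHP defines (ENT_COMPAT, ENT_QUOTES, and ENT_NOQUOTES); the sample string and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample containing a single quote, double quotes, and markup.
$sample = "Tom's \"quoted\" <b>";

// ENT_COMPAT converts double quotes only.
$compat = htmlspecialchars($sample, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES converts both single and double quotes.
$quotes = htmlspecialchars($sample, ENT_QUOTES, 'UTF-8');

// ENT_NOQUOTES leaves both kinds of quote untouched.
$noquotes = htmlspecialchars($sample, ENT_NOQUOTES, 'UTF-8');

echo $compat, "\n", $quotes, "\n", $noquotes, "\n";
```

In all three cases the angle brackets are converted; only the treatment of the quote characters changes. ENT_QUOTES is usually the safer choice when the result is placed inside an HTML attribute value.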
Consider the following text:

$input_str = <<<EOSTR

<p align='center'>
  The user gave us "15000€".
</p>

<script>
  // malicious JavaScript code goes here.
</script>

EOSTR;

If we ran it through the following PHP script (we will run the nl2br function, described in the "nl2br" section in Chapter 1, on the output string to ensure that it is formatted nicely in the browser)

<?php
$str = htmlspecialchars($input_str, ENT_NOQUOTES, "UTF-8");
echo nl2br($str);

$str = htmlentities($input_str, ENT_QUOTES, "UTF-8");
echo nl2br($str);
?>

we would see the following text output:

<br />
&lt;p align='center'&gt;<br />
  The user gave us "15000€".<br />
&lt;/p&gt;<br />
<br />
&lt;script&gt;<br />
  // malicious JavaScript code goes here.<br />
&lt;/script&gt;<br />
<br />
&lt;p align=&#039;center&#039;&gt;<br />
  The user gave us &quot;15000&euro;&quot;.<br />
&lt;/p&gt;<br />
<br />
&lt;script&gt;<br />
  // malicious JavaScript code goes here.<br />
&lt;/script&gt;<br />

It would look in the browser as follows:

<p align='center'>
  The user gave us "15000€".
</p>

<script>
  // malicious JavaScript code goes here.
</script>

<p align='center'>
  The user gave us "15000€".
</p>

<script>
  // malicious JavaScript code goes here.
</script>

Note that the htmlentities function replaced the symbol for the Euro (€) with an entity (&euro;), while htmlspecialchars left it alone.

For situations where we would like to permit users to enter some HTML, such as message board users who would like to use characters to control font, color, and style (bold or italics), we will have to pick our way through the strings to find those and not strip them out. We will do this through the use of regular expressions in Chapter 22, "Data Validation with Regular Expressions."

Making Strings Safe for SQL

Another reason we want to process our strings to make them safe is to prevent SQL injection attacks, which we mentioned briefly in Chapter 12, "PHP and Data Access."
In these attacks, the malicious user tries to take advantage of poorly protected code and user permissions to execute extra SQL code that we do not want executed. If we are not careful, a username of

kitty_cat; DELETE FROM users;

could become a problem for us. There are two ways to prevent this sort of security breach:
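Whatever defences are chosen, a widely used safeguard against this kind of injection is to send user input to the database as a bound parameter instead of splicing it into the SQL text. The sketch below is an illustration, not necessarily the approach this chapter has in mind. It assumes the PDO SQLite driver is available and uses an in-memory SQLite database with a hypothetical users table purely so that the example is self-contained; the function name find_user is also made up. The mysqli extension offers the same idea through mysqli::prepare and mysqli_stmt::bind_param.

```php
<?php
// Assumption: pdo_sqlite is installed. The table and data are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (user_name TEXT)');
$db->exec("INSERT INTO users (user_name) VALUES ('kitty_cat')");

// Look up a user by name. The value travels as a bound parameter,
// so it is never interpreted as part of the SQL statement.
function find_user(PDO $db, string $user_name): array
{
    $stmt = $db->prepare('SELECT user_name FROM users WHERE user_name = :name');
    $stmt->execute([':name' => $user_name]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// The hostile username from the text is treated as a literal string.
$hostile = 'kitty_cat; DELETE FROM users;';
$matches_hostile = find_user($db, $hostile);
$matches_real = find_user($db, 'kitty_cat');

// The table is still intact after the attempted injection.
$rows_left = (int)$db->query('SELECT COUNT(*) FROM users')->fetchColumn();
```

Because the hostile string is compared against user_name as data, it matches no rows and the DELETE is never run.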
The mysqli extension that ships with PHP5 has the added security advantage of allowing only a single query to execute with the mysqli_query or mysqli::query methods. To execute multiple queries, you have to use the mysqli_multi_query or mysqli::multi_query method, which helps us prevent the execution of more potentially harmful statements or queries.

Code Organization

Some would argue that any file not directly accessible to the user from the Internet should not find a place in the document root of the web site. For example, if the document root for our message board web site is /home/httpd/messageboard/www, we should place all of our .inc files and other files in a place such as /home/httpd/messageboard/code. When we want to include those files, we can simply write in our code:

require_once('../code/user_object.inc');

The reasons for this degree of caution come down to what happens when a malicious user makes a request for a file that is not a .php or .html file. Many web servers default to dumping the contents of that file to the output stream. Thus, if we were to keep user_object.inc in the public document root and the user requested it, he might see a full dump of our code in his web browser. This would let him see the implementation, get at any intellectual property we might have in this file, and potentially find exploits that we might have missed.

To fix this, we should be sure that the web server is configured to allow requests only for .php and .html files (see the "Securing your Web Server and PHP" section in Chapter 17), and that requests for other types of files return an error from the server.

Similarly, files such as password files, text files, configuration files, or special directories are best kept away from the public document root. Even if we think we have our web server configured properly, we might have missed something.
Or, if our web application is later moved to a new server that is not properly configured, we might be exposed to exploitation.

What Goes in Your Code

Many of the code snippets we have shown for accessing databases have included the database name, username, and user password in plain text, as follows:

$conn = @new mysqli("localhost", "bob", "secret", "somedb");

While this is convenient, it is insecure: anybody who got his hands on our .php file would have immediate access to our database with the full permissions of the user "bob."

It would be better to put the username and password in a file that is not in the document root of the web application and include it in our script, as follows:

<?php
// this is dbconnect.inc
$db_server = 'localhost';
$db_user_name = 'bob';
$db_password = 'secret';
$db_name = 'somedb';
?>

<?php
include('../code/dbconnect.inc');

$conn = @new mysqli($db_server, $db_user_name, $db_password, $db_name);
// etc
?>

We should think about doing the same thing for other sensitive data.

File System Considerations

As we will see in Chapter 24, "Files and Directories," PHP was designed with the ability to work with the local file system in mind. There are two concerns for us:
We will discuss the first problem further in Chapter 24, but basically we will have to be careful not to write files with open security permissions or place them in a location where other users of a multi-user operating system (such as Unix) could get access to them.

For the second, we will want to be extremely careful when we let users enter the name of a file they would like to see. If we had a directory in our document root (c:\webs\messageboard\documentroot) with a bunch of files we were granting the user access to and he input the name of the file he wanted to view, we could get into trouble if he asked to see

..\..\..\php\php.ini

This would let him learn about our PHP installation and see if there were any obvious weaknesses to exploit. The fix to this problem is easy: If we accept user input, we should filter it aggressively to avoid these problems. For the previous example, removing any instances of ..\ would help prevent this problem, as would rejecting any attempt at an absolute path, such as c:\mysql\my.ini.

Code Stability and Bugs

As we mentioned previously, your web application is likely neither to perform well nor to be terribly secure if the code has not been properly tested and reviewed, or if it is full of bugs. This should not be taken as an accusation, but rather as an admission that all of us are fallible, as is the code we write.

When a user connects to a web site, enters a word in the search dialog box (for instance, "defenestration"), and clicks on "Search," he is not going to have great confidence in the robustness or security of the site if the next thing he sees is:

¡Aiee! This should never happen. BUG BUG BUG !!!! See Deb!

If we plan for the stability of our application, we can effectively reduce the likelihood of problems due to human error. Ways in which we can do this are
Execution Quotes and exec

We briefly mentioned a feature in Chapter 2, "The PHP Language," called the shell command executor or execution quotes. This is basically a language operator through which you can execute arbitrary commands in a command shell (sh under Unix-like operating systems or cmd.exe under Windows) by enclosing the command in back quotes (`); notice that they are different from regular single quotes ('). The key is typically located in the upper-left of English language keyboards and can be quite challenging to find on other keyboard layouts.

Execution quotes return a string value with the text output of the program executed.

If we had a text file with a list of names and phone numbers in it, we might use the grep command to find a list of names that contain "Smith." grep is a Unix-like command that takes a string pattern to look for and a list of files in which to find it. It then returns the lines in the files that match the pattern.

grep [args] pattern files-to-search...

There are Windows versions of grep. Windows also ships with a program called findstr.exe, which can be used similarly. To find people named "Smith," we could execute the following:

<?php
// -i means ignore case
$users = `grep -i smith /home/httpd/www/phonenums.txt`;

// split the output lines into an array
// note that the \n should be \r\n on Windows!
$lines = explode("\n", trim((string)$users));
foreach ($lines as $line)
{
  // names and phone nums are separated by , char
  $namenum = explode(',', $line);
  if (count($namenum) < 2)
    continue;  // skip blank or malformed lines
  echo "Name: {$namenum[0]}, Phone #: {$namenum[1]}<br/>\n";
}
?>

However, as we also mentioned in Chapter 2, we will avoid using this operator. If you ever allow user input to the command placed in back quotes, you are opening yourself to all sorts of security problems, and you will need to filter the input heavily to ensure the safety of your system. At the very least, the escapeshellcmd function should be used. However, to be certain, you might want to restrict the possible input even more.
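As a sketch of what such restriction might look like, the following combines an allow-list check on the search term with escapeshellarg, which quotes a single argument (this is a different function from the escapeshellcmd mentioned above). The function name build_grep_command, the permitted character set, and the 40-character limit are assumptions made for illustration; the file path is the one used in the example above.

```php
<?php
// Build a grep command line for a user-supplied search term, or return
// null if the term contains anything outside a conservative allow-list.
function build_grep_command(
    string $term,
    string $file = '/home/httpd/www/phonenums.txt'
): ?string {
    // Allow-list: letters, spaces, apostrophes, and hyphens; 1 to 40 characters.
    if (!preg_match('/^[A-Za-z\' -]{1,40}\z/', $term)) {
        return null;
    }

    // -i ignores case, -F treats the term as a fixed string, and "--"
    // ends option parsing so a term starting with "-" is not read as a flag.
    return 'grep -i -F -- ' . escapeshellarg($term) . ' ' . escapeshellarg($file);
}

$safe = build_grep_command('smith');               // a command string
$rejected = build_grep_command('smith; rm -rf /'); // null: ";" and "/" are not allowed
```

Only a non-null result would ever be handed to the execution quotes operator; a rejected term should produce an error message for the user instead.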
Even worse, given that we normally want to run our web server and PHP in a context with lower permissions (we will see more about this in the following sections), we might find ourselves having to grant it more permissions to execute some of these commands, which could further compromise our security. Use of this operator in a production environment is something to be approached with a great amount of caution.

The exec and system functions are very similar to the execution quotes operator, except that they do not return the full output as a single string: exec returns only the last line of output (the complete output is available through its optional array parameter), while system prints the output directly and returns the last line. They share many of the same security concerns, and therefore come with the same warnings.
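Where a shell command only performs a simple task, one way to sidestep these concerns altogether is to do the work in PHP itself. The following sketch performs the same case-insensitive search as the grep example without starting a shell. The function name find_phone_entries and the sample data are hypothetical, and it assumes one "name,phone" record per line.

```php
<?php
// Return [name, phone] pairs for every line that contains $term,
// ignoring case. No shell is involved, so there is nothing to inject into.
function find_phone_entries(string $path, string $term): array
{
    $results = [];

    // FILE_IGNORE_NEW_LINES strips line endings; FILE_SKIP_EMPTY_LINES drops blank lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $results;
    }

    foreach ($lines as $line) {
        if (stripos($line, $term) === false) {
            continue;
        }
        // names and phone nums are separated by , char
        $parts = explode(',', $line, 2);
        if (count($parts) === 2) {
            $results[] = [trim($parts[0]), trim($parts[1])];
        }
    }
    return $results;
}

// Demonstration against a temporary file with made-up data.
$tmp = tempnam(sys_get_temp_dir(), 'phones');
file_put_contents($tmp, "John Smith,555-0100\nAnn Jones,555-0101\nSMITHERS Lee,555-0102\n");
$found = find_phone_entries($tmp, 'smith');
unlink($tmp);
```

This also removes the need to grant the web server any extra permissions to run external programs.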