Securing Your Code

Moving on to the next aspect of our approach to security, inspecting components individually and looking at how to improve their security, we will begin by investigating the things we can do to keep our code safe. While we cannot show you everything you can use to cover all possible security threats (entire tomes have been devoted to these subjects), we can at least give some general guidelines and point you in the right direction. We will point out security concerns for specific technology areas that we will use in later chapters.

A Golden Rule

We will start with a golden rule that applies to security in any context:

The most important thing we can do as application authors is filter all input that comes from external sources. This does not mean that we should design a system with the assumption that all of our users are crooks; we still want to welcome them and encourage them to use our web application. However, we want to be sure that we are prepared for any misuse of our system.

If we filter effectively, we can substantially reduce the number of external threats and massively improve the robustness of our system. Even if we are pretty sure we trust the user, such as the CEO of our company or our bosses, we cannot be certain they do not have some type of spyware program that is modifying or sending new requests to our server.

Filtering Input

Given the importance of filtering the input we get from external customers, we should look at the ways in which we might do this.

Double Checking Expected Values

There are times when we will present the user with a range of possible values to choose for things such as shipping (ground, express, overnight), states (one of 50 states in the U.S.), and so on. Now, imagine if we had the following form:

 <html>
 <head>
   <title> What be ye laddie? </title>
 </head>
 <body>
   <form action='submit_form.php' method='POST'>
     <input type='radio' name='gender' value='Male'/>Male<br/>
     <input type='radio' name='gender' value='Female'/>Female<br/>
     <input type='radio' name='gender' value='Other'/>Unknown<br/>
     <input type='submit' value='submit'/>
   </form>
 </body>
 </html>

This form could look as shown in Figure 16-1. From this form, we might assume that whenever we query the value of $_POST['gender'] in submit_form.php, we would get the value "Male," "Female," or "Other," and we would be wrong.

Figure 16-1. A trivial gender entry form.

As we mentioned in Chapter 13, "Web Applications and the Internet," the web operates over HTTP. The form submission from our example would be sent to our server as a text message with a structure similar to the following:

 POST /submit_form.php HTTP/1.1
 Host: www.myhostnamehooray.com
 User-Agent: WoobaBrowser/3.4 (Windows)
 Content-Type: application/x-www-form-urlencoded
 Content-Length: 11

 gender=Male

However, there is nothing stopping someone from connecting to our web server and sending any values he wants for a form. Thus, somebody could send us the following:

 POST /submit_form.php HTTP/1.1
 Host: www.myhostnamehooray.com
 User-Agent: WoobaBrowser/3.4 (Windows)
 Content-Type: application/x-www-form-urlencoded
 Content-Length: 22

 gender=I+like+cookies.

If we were to write the following code

 <?php
   echo <<<EOM
   <p align='center'>
     The user's gender is: {$_POST['gender']}.
   </p>
 EOM;
 ?>

we might find ourselves embarrassed later on. A better strategy is to verify that the incoming value is one of the expected/permitted values, as follows:

 <?php
   switch ($_POST['gender'])
   {
     case 'Male':
     case 'Female':
     case 'Other':
       echo <<<EOM
   <p align='center'>
     Congratulations!  You are: '{$_POST['gender']}'.
   </p>
 EOM;
       break;
     default:
       echo <<<EOM
   <p align='center'>
     <font color='red'>WARNING:</font> Invalid input value for gender
     specified.
   </p>
 EOM;
       break;
   }
 ?>

There is a little bit more code involved, but we can be sure we are getting correct values; this becomes more important when we start handling values more financially sensitive than a user's gender. As a rule, we can never assume a value from a form will be within a set of expected values; we must check first.
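When the list of permitted values grows, a switch statement becomes unwieldy. The following is a minimal sketch of the same check written as a lookup against an array; the helper name is_permitted_value and the variable names are our own and purely illustrative.

```php
<?php
// Hypothetical helper: returns true only if $value is a string that
// appears in the list of permitted choices.
function is_permitted_value($value, array $allowed)
{
    // The third argument requests strict comparison, so values such as
    // 0 or true can never match a string like 'Male' by type juggling.
    return is_string($value) && in_array($value, $allowed, true);
}

$allowed_genders = array('Male', 'Female', 'Other');
$gender = isset($_POST['gender']) ? $_POST['gender'] : null;

if (is_permitted_value($gender, $allowed_genders)) {
    // Safe to echo: the value is one of our own known strings.
    echo "Congratulations!  You are: '$gender'.\n";
} else {
    echo "WARNING: Invalid input value for gender specified.\n";
}
```

Keeping the permitted values in one array also means the form and the validation code can be generated from the same list.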

Filtering Even Basic Values

HTML form elements have no types associated with them, and most pass strings (which may represent things such as dates, times, or numbers) to the server. Thus, if you have a numeric field, you cannot assume that it was entered as such. Even in environments where powerful client-side code can try to make sure that the value entered is of a particular type, there is no guarantee that the values will not be sent to the server directly, as in the "Double Checking Expected Values" section.

An easy way to make sure that a value is of the expected type is to cast or convert it to that type and use it, as follows:

 $number_of_nights = (int)$_POST['num_nights'];
 // a non-numeric string casts to 0; negative values are rejected too
 if ($number_of_nights <= 0)
 {
   echo "ERROR: Invalid number of nights for the room!";
   exit;
 }

If we have the user input a date in a localized format, such as "mm/dd/yy" for users in the United States, we can then write some code to verify it using the PHP function called checkdate. This function takes a month, a day, and a year (as a 4-digit value) and indicates whether or not they form a valid date:

 // explode stands in for split, which was removed in PHP 7; it is safe
 // on UTF-8 input here because '/' is a single-byte delimiter
 $mmddyy = explode('/', $_POST['departure_date']);
 if (count($mmddyy) != 3)
 {
   echo "ERROR: Invalid Date specified!";
   exit;
 }
 // handle years like 02 or 95
 if ((int)$mmddyy[2] < 100)
 {
   if ((int)$mmddyy[2] > 50)
     $mmddyy[2] = (int)$mmddyy[2] + 1900;
   else if ((int)$mmddyy[2] >= 0)
     $mmddyy[2] = (int)$mmddyy[2] + 2000;
   // else it's < 0 and checkdate will catch it
 }
 // checkdate expects integers: month, day, year
 if (!checkdate((int)$mmddyy[0], (int)$mmddyy[1], (int)$mmddyy[2]))
 {
   echo "ERROR: Invalid Date specified!";
   exit;
 }

By taking the time to filter and validate the input, we can not only help ourselves out for natural error-checking that we should be doing in the first place (such as verifying whether a departure date for a plane ticket is a valid date), but we can also improve the security of our system.
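A cast is forgiving: (int)"3 nights" quietly becomes 3. Where we would rather reject such input outright, PHP's filter extension offers a stricter check. The following is a sketch under our own assumptions: the helper name parse_num_nights and the 1 to 365 range are illustrative, not requirements from any particular application.

```php
<?php
// Hypothetical helper: returns the number of nights as an integer, or
// null if the input is not a whole number between $min and $max.
function parse_num_nights($raw, $min = 1, $max = 365)
{
    $options = array(
        'options' => array('min_range' => $min, 'max_range' => $max),
    );
    // FILTER_VALIDATE_INT returns false unless the entire string is an
    // integer inside the requested range.
    $value = filter_var($raw, FILTER_VALIDATE_INT, $options);
    return ($value === false) ? null : $value;
}

$raw = isset($_POST['num_nights']) ? $_POST['num_nights'] : '';
$number_of_nights = parse_num_nights($raw);
if ($number_of_nights === null) {
    echo "ERROR: Invalid number of nights for the room!\n";
}
```

Returning null for every kind of failure keeps the calling code simple: it only has to distinguish a usable integer from everything else.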

HTML Escaping

There are applications where you might take the input that a user has specified and display the input on a page. Pages where users can comment on a published article or message board system are perfect examples of where this might occur. In these situations, we need to be careful that users do not inject malicious HTML markup into the text they input.

One of the easiest ways to do this is to use the htmlspecialchars or the htmlentities function. These functions take certain characters they see in the input string and convert them to HTML entities. An HTML entity is a special character sequence, begun with the ampersand character (&), that is used to indicate a special character that cannot be represented easily in HTML code. The entity name and a terminating semicolon (;) are supplied after the ampersand character. Optionally, an entity can be a numeric character code specified by # and a decimal number, such as &#47; for the forward slash character (/). Since all markup elements in HTML are demarcated by < and > characters, it could prove difficult to enter them in a string for output to the final content (since the browser will default to assuming they delineate markup elements). To get around this, we use &lt; and &gt;. Similarly, if we want to include the ampersand character in our HTML, we can use the entity &amp;. Single and double quotes are represented by &#39; and &quot;. Entities are converted into output by the HTML client and are thus not considered part of the markup.

The difference between htmlspecialchars and htmlentities is as follows: The former replaces only &, <, and >, plus single and double quotes as controlled by its second parameter. The latter replaces anything that can be represented by a named entity. Examples of such entities are the copyright symbol ©, represented by &copy;, and the Euro currency symbol €, represented by &euro;. However, it will not convert characters to numeric entities.

Both functions take a value to control the conversion of single and double quotes to entities as their second parameter, and both functions also take the character set in which the input string is encoded as their third parameter (which is vital for us, since we want this function to be safe on our UTF-8 strings). Possible values for the second parameter are

ENT_COMPAT Double quotes are converted to &quot; but single quotes are left untouched. This was the default value before PHP 8.1.
ENT_QUOTES Single quotes are converted to &#039; and double quotes to &quot;. PHP 8.1 and later convert both kinds of quotes by default.
ENT_NOQUOTES Neither single nor double quotes are converted by this function.

Consider the following text:

 $input_str = <<<EOSTR
   <p align='center'>
     The user gave us "15000€".
   </p>
   <script>
     // malicious JavaScript code goes here.
   </script>
 EOSTR;

If we ran it through the following PHP script (we will run the nl2br function on the output string to ensure that it is formatted nicely in the browser; see the "nl2br" section in Chapter 1)

 <?php
   $str = htmlspecialchars($input_str, ENT_NOQUOTES, "UTF-8");
   echo nl2br($str);
   $str = htmlentities($input_str, ENT_QUOTES, "UTF-8");
   echo nl2br($str);
 ?>

we would see the following text output:

 <br />
   &lt;p align='center'&gt;<br />
     The user gave us "15000€".<br />
   &lt;/p&gt;<br />
 <br />
   &lt;script&gt;<br />
     // malicious JavaScript code goes here.<br />
   &lt;/script&gt;<br />
 <br />
   &lt;p align=&#039;center&#039;&gt;<br />
     The user gave us &quot;15000&euro;&quot;.<br />
   &lt;/p&gt;<br />
 <br />
   &lt;script&gt;<br />
     // malicious JavaScript code goes here.<br />
   &lt;/script&gt;<br />

It would look in the browser as follows:

 <p align='center'>
 The user gave us "15000€".
 </p>
 <script>
 // malicious JavaScript code goes here.
 </script>
 <p align='center'>
 The user gave us "15000€".
 </p>
 <script>
 // malicious JavaScript code goes here.
 </script>

Note that the htmlentities function replaced the symbol for the Euro (€) with an entity (&euro;), while htmlspecialchars left it alone.
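Quote conversion matters most when user input lands inside an HTML attribute, where an unescaped quote would end the attribute early. The sketch below wraps htmlspecialchars in a small helper so the flags and character set are always supplied; the name h and the sample input are our own illustrations.

```php
<?php
// Hypothetical helper: escapes text for output in HTML content or in a
// quoted attribute value. ENT_QUOTES converts both kinds of quotes.
function h($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Sample input containing a single quote and markup characters.
$author = "O'Reilly <b>";

// Without escaping, the single quote would close the value attribute.
echo "<input type='text' name='author' value='" . h($author) . "'/>\n";
```

Calling one helper everywhere we print user-supplied text makes it harder to forget the second and third parameters.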

For situations where we would like to permit users to enter some HTML, such as message board users who would like to use characters to control font, color, and style (bold or italics), we will have to pick our way through the strings to find those and not strip them out. We will do this through the use of regular expressions in Chapter 22, "Data Validation with Regular Expressions."

Making Strings Safe for SQL

Another reason we want to process our strings to make them safe is to prevent SQL injection attacks, which we mentioned briefly in Chapter 12, "PHP and Data Access." In these attacks, the malicious user tries to take advantage of poorly protected code and user permissions to execute extra SQL code that we do not wish them to. If we are not careful, a username of

 kitty_cat; DELETE FROM users;

could become a problem for us.

There are two ways to prevent this sort of security breach:

Filter and escape all strings sent to database servers via SQL. The exact function you call differs for each server: for MySQL and PostgreSQL, the functions are mysqli_real_escape_string and pg_escape_string. PHP's mssql extension for Microsoft SQL Server provides no equivalent escaping function, so Microsoft SQL Server users, like Oracle users, should use bound variables, where escaping is handled for them.
Make sure that all input conforms to what you expect it to be. If our usernames are supposed to be up to 50 characters long and include only letters and numbers, then we can be sure that "; DELETE FROM users" is something we would not want to permit. Writing the PHP code to make sure input conforms to the appropriate criteria before we even send it to the database server means we can print a more meaningful error than the database would give us and reduce our risks.
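The second approach can be sketched with a single regular expression. The rule below (1 to 50 characters, letters and digits only) comes from the example above; the function name is_valid_username is our own, and a real application may well permit other characters.

```php
<?php
// Hypothetical helper: accepts only 1 to 50 ASCII letters and digits.
function is_valid_username($name)
{
    // \A and \z anchor the pattern to the whole string, so a trailing
    // newline or any appended SQL fragment causes the match to fail.
    return is_string($name)
        && preg_match('/\A[A-Za-z0-9]{1,50}\z/', $name) === 1;
}

$name = isset($_POST['username']) ? $_POST['username'] : '';
if (!is_valid_username($name)) {
    echo "ERROR: User names may contain only letters and numbers.\n";
}
```

Note that under this rule even the harmless-looking "kitty_cat" is rejected because of its underscore; the pattern should match whatever our application actually allows. Validation of this kind complements escaping or bound variables rather than replacing them.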

The mysqli extension that ships with PHP5 has the added security advantage of allowing only a single query to execute with the mysqli_query or mysqli::query methods. To execute multiple queries, you have to use the mysqli_multi_query or mysqli::multi_query method, which helps us prevent the execution of more potentially harmful statements or queries.

Code Organization

Some would argue that any file not directly accessible to the user from the Internet should not find a place in the document root of the web site. For example, if the document root for our message board web site is /home/httpd/messageboard/www, we should place all of our .inc files and other files in a place such as /home/httpd/messageboard/code. When we want to include those files, we can simply write in our code:

 require_once('../code/user_object.inc');

The reasons for this degree of caution come down to what happens when a malicious user makes a request for a file that is not a .php or .html file. Many web servers default to dumping the contents of that file to the output stream. Thus, if we were to keep user_object.inc in the public document root and the user requested it, he might see a full dump of our code in his web browser. This would let him see the implementation, get at any intellectual property we might have in this file, and potentially find exploits that we might have missed.

To fix this, we should be sure that the web server is configured to only allow the request of .php and .html files (see the "Securing your Web Server and PHP" section in Chapter 17), and that requests for other types of files should return an error from the server.

Similarly, files such as password files, text files, configuration files, or special directories are best kept away from the public document root. Even if we think we have our web server configured properly, we might have missed something. Or if our web application is moved to a new server that is not properly configured in the future, we might be exposed to exploitation.

What Goes in Your Code

Many of the code snippets we have shown for accessing databases have included the database name, username, and user password in plain text, as follows:

 $conn = @new mysqli("localhost", "bob", "secret", "somedb");

While this is convenient, it is slightly insecure because somebody could have immediate access to our database with the full permissions that the user "bob" has if he got his hands on our .php file.

It would be better to put the username and password in a file that is not in the document root of the web application and include it in our script, as follows:

 <?php
   // this is dbconnect.inc
   $db_server = 'localhost';
   $db_user_name = 'bob';
   $db_password  = 'secret';
   $db_name = 'somedb';
 ?>

 <?php
   include('../code/dbconnect.inc');
   $conn = @new mysqli($db_server, $db_user_name, $db_password,
                       $db_name);
   // etc
 ?>

We should think about doing the same thing for other sensitive data.

File System Considerations

As we will see in Chapter 24, "Files and Directories," PHP was designed with the ability to work with the local file system in mind. There are two concerns for us:

Will any files we write to the disk be visible to others?
If we expose this functionality to anybody else, will that person be able to access files we might not want him to, such as /etc/passwd?

We will discuss the first problem further in Chapter 24, but basically we will have to be careful not to write files with open security permissions or place them in a location where other users of a multi-user operating system (such as Unix) could get access to them.

For the second, we will want to be extremely careful when we let users enter the name of a file they would like to see. If we had a directory in our document root (c:\webs\messageboard\documentroot) with a bunch of files we were granting the user access to and he input the name of the file he wanted to view, we could get into trouble if he asked to see

 ..\..\..\php\php.ini

This would let him learn about our PHP installation and see if there were any obvious weaknesses to exploit. The fix to this problem is easy: If we accept user input, we should filter it aggressively to avoid these problems. For the previous example, removing any instances of ..\ would help prevent this problem, as would rejecting any attempt at an absolute path, such as c:\mysql\my.ini.
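Stripping ..\ by hand is easy to get wrong; a single pass over ....\\ leaves ..\ behind, for instance. One alternative, sketched here under our own assumptions, is to let realpath resolve the requested name and then confirm that the result still lies inside the directory we meant to expose. The function name resolve_user_file is hypothetical.

```php
<?php
// Hypothetical helper: returns the absolute path of $requested if it
// names an existing file inside $base_dir, or null otherwise.
function resolve_user_file($base_dir, $requested)
{
    $base = realpath($base_dir);
    if ($base === false) {
        return null;
    }
    // realpath() collapses "." and ".." segments, follows symbolic
    // links, and returns false when the target does not exist.
    $path = realpath($base . DIRECTORY_SEPARATOR . $requested);
    if ($path === false || !is_file($path)) {
        return null;
    }
    // The resolved path must still begin with the base directory.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $path;
}
```

Because the check runs on the resolved path rather than on the text the user typed, it does not depend on our anticipating every spelling of a parent-directory reference.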

Code Stability and Bugs

As we mentioned previously, your web application is neither likely to perform well nor be terribly secure if the code has not been properly tested, reviewed, or is full of bugs. This should not be taken as an accusation, but rather as an admission that all of us are fallible, as is the code we write.

When a user connects to a web site, enters a word in the search dialog box (for instance, "defenestration"), and clicks on "Search," he is not going to have great confidence in its robustness or security if the next thing he sees is:

 ¡Aiee!  This should never happen.  BUG BUG BUG !!!! See Deb!

If we plan for the stability of our application, we can effectively reduce the likelihood of problems due to human error. Ways in which we can do this are

Complete a thorough design phase of our product (possibly with prototypes). The more people with whom we review what we plan to do, the more likely we are to spot problems even before we begin. This is also a great time to do usability testing on our interface.
Allocate testing resources to our project. So many projects skimp on this or hire 1 tester for a project with 50 developers. Developers do not typically make good testers! They are very good at making sure their code works with the correct input, but they're less proficient at finding other problems. Major software companies have a ratio of developers to testers of nearly 1:1, and while it may not be likely that our bosses would pay for that many testers, some testing resources will be critical to the success of the application.
Have your developers use unit testing, a topic we will cover more in Chapter 29, "Development and Deployment." While this might not help us find all the bugs that a tester would, it will help keep the product from regressing, a phenomenon in which problems or bugs that were fixed are reintroduced due to other code changes. Developers should not be allowed to commit recent changes to the project unless all of the unit tests continue to succeed.
Monitor the application as it runs after it is deployed. By browsing through the logs on a regular basis and looking at user/customer comments, you should be able to see if any major problems or possible security holes are cropping up. If so, you can act to address them before they become more serious.

Execution Quotes and `exec`

We briefly mentioned a feature in Chapter 2, "The PHP Language," called the shell command executor or execution quotes. This is basically a language operator through which you can execute arbitrary commands in a command shell (sh under Unix-like operating systems or cmd.exe under Windows) by enclosing the command in back quotes (`). Notice that they are different from regular single quotes ( ' ). The key is typically located in the upper-left of English language keyboards and can be quite challenging to find on other keyboard layouts.

Execution quotes return a string value with the text output of the program executed.

If we had a text file with a list of names and phone numbers in it, we might use the grep command to find a list of names that contain "Smith." grep is a Unix command that takes a string pattern to look for and a list of files in which to find it. It then returns the lines in those files that match the pattern.

 grep [args] pattern files-to-search...

There are Windows versions of grep. Windows ships with a program called findstr.exe, which can be used similarly. To find people named "Smith," we could execute the following:

 <?php
   // -i means ignore case
   $users = `grep -i smith /home/httpd/www/phonenums.txt`;
   // split the output lines into an array
   // note that the \n should be \r\n on Windows!
   $lines = explode("\n", trim((string)$users));
   foreach ($lines as $line)
   {
     // names and phone nums are separated by , char
     $namenum = explode(',', $line);
     echo "Name: {$namenum[0]}, Phone #: {$namenum[1]}<br/>\n";
   }
 ?>

However, as we also mentioned in Chapter 2, we will avoid using this operator. If you ever allow user input to the command placed in back quotes, you are opening yourself to all sorts of security problems, and you will need to filter the input heavily to ensure the safety of your system. At the very least, the escapeshellcmd function should be used. However, to be certain, you might want to restrict the possible input even more.
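If a command truly must include user input, one way to restrict it is sketched below: the search term is first checked against a narrow pattern, and each value is then quoted with escapeshellarg, which protects a single argument (escapeshellcmd, by contrast, escapes a whole command string). The function name build_grep_command, the permitted character set, and the 40-character limit are all our own assumptions.

```php
<?php
// Hypothetical helper: builds a grep command line for a user-supplied
// search term, or returns null if the term contains anything other
// than letters, digits, spaces, or hyphens.
function build_grep_command($term, $file)
{
    if (!is_string($term)
        || preg_match('/\A[A-Za-z0-9 -]{1,40}\z/', $term) !== 1) {
        return null;
    }
    // -F treats the term as a fixed string rather than a pattern, and
    // "--" tells grep that no further options follow.
    // escapeshellarg() quotes each value as one shell argument.
    return 'grep -i -F -- ' . escapeshellarg($term)
         . ' ' . escapeshellarg($file);
}

$cmd = build_grep_command('smith', '/home/httpd/www/phonenums.txt');
if ($cmd === null) {
    echo "ERROR: Invalid search term!\n";
}
```

The sketch only builds the command string; whether to run it at all remains subject to the cautions in this section.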

Even worse, given that we normally want to run our web server and PHP in a context with lower permissions (we will see more about this in the following sections), we might find ourselves having to grant it more permissions to execute some of these commands, which could further compromise our security. Use of this operator in a production environment is something to be approached with a great amount of caution.

The exec and system functions are very similar to the execution quotes operator, and they also run the command through the system shell. They differ mainly in what they hand back: system prints the command's output and returns only its last line, while exec returns the last line and can optionally fill an array with every line of output. They share many of the same security concerns, and therefore come with the same warnings.