Section 5.7. SQL Injection Attacks | Building Scalable Web Sites: Building, Scaling, and Optimizing the Next Generation of Web Applications

5.7. SQL Injection Attacks

Displaying user-entered data directly on the pages of your application without filtering can be extremely dangerous, but it's certainly not the worst kind of hole you could encounter. Although we very rarely fork out to external programs within a web application (at least, we're hopefully not opening a shell and passing along a command string; the forking overhead gets painful when we have many requests), this was once the normal way of things and there was a fate much worse that user-entered data being displayed to usersuser-entered commands being executed at the command line.

Consider the following example code:

 # create a temporary file for request data $filename = "/tmp/app_data_$_POST[id].tmp"; system("touch $filename");

If someone passes in an ID of "foo; rm -rf /;", we're going to be blindly executing that command. If you're actually executing commands containing user-entered data, then take a look at the PHP built-in functions escapeshellcmd( ) and escapeshellarg( ).

Of course, we very rarely mess about with shell commands in practice. We store our data not in the filesystem but in a database, which we command using SQL. You'll be happy to know that SQL is just as vulnerable to user data-based attacks.

A SQL injection attack is defined as a situation in which you embed unescaped user data in SQL that you pass to a database for execution. Typically, all data in your SQL statements will come from user-entered data in some wayeither as data sent directly from the client (in the case of GET or POST variables) or as data that we once received from the client and stored unescaped, such as data we previously stored in the database.

Nothing illustrates the issue as well as an example, so consider this snippet of PHP source code:

 mysql_query("SELECT * FROM Frobs WHERE id=$HTTP_GET_VARS[id]");

The operation here is fairly straightforward. We want to select a single row from the Frobs table. The ID of the row in a GET query string parameter has been passed in. Pretty soon, a malicious user comes along and requests the following URL:

 page.php?id=1;DELETE+FROM+Frobs;

We blindly insert the ID value from the query string into our SQL and end up passing this along to the database:

 SELECT * FROM Frobs WHERE id=1; DELETE FROM Frobs;

Oh dear. We've just lost all our frobs. I hope you made a backup recently.

This genre of exploitable hole exists in many well-known web applications, both open and closed source. In every case, this could have been easily avoided by a little common sense and planning. Having SQL injection holes in a commercial application is inexcusableallowing user-entered data to delete your entire data store is many times worse than allowing one user to steal another's data. So what can you do about it?

5.7.1. Mitigating SQL Injection Attacks

An often-used example of SQL injection attacks is when an attacker tricks the application into running "DROP DATABASE Foo;". This action would delete the database named Foo, with all the tables of data it contained.

The most interesting thing to note about this example is that the web application in question had enough privileges to delete an entire database. An important rule of thumb when working with databases is to not grant more permissions than absolutely necessary. Many open and closed source web applications prompt you for database login credentials with the default username set to root. This is often because the application must create the database schema it's going to use automatically and so needs extra privileges at install time. But these extra privileges are still around later when users start using the application. Any exploit is going to allow an attacking user to gain root level access to your database.

With MySQL, this problem can easily be fixed by creating extra users in the mysql.users table with reduced permissions. The GRANT statement can be used to do this fairly easily, assuming you have a user with GRANT privileges (such as root) to run it to start with. The following SQL creates a new user named "foo" in MySQL with only read privileges:

 GRANT SELECT ON *.* TO "foo"@"%" IDENTIFIED BY "bar"; FLUSH PRIVILEGES;

The new user has the username foo, the password bar, and can only SELECT data (although from any table in any database).

Web applications shouldn't ever need CREATE, DROP, or ALTER privileges. If they do, you probably have something wrong with your basic design. Web applications shouldn't need to do anything outside of reading and writing data. For reading, that means SELECT privileges, while writing means INSERT, UPDATE, and DELETE rights. If you're connecting to a server that you're only going to read from, there's no need to grant write permissions. Most connections within a well architected web application can thus use a very restricted set of permissions, with slightly more being given out to code that needs to perform writes. A good set of MySQL grants for a web application are as follows:

 GRANT SELECT, INSERT, UPDATE, DELETE ON MyApp.* TO "www-rw"@"10.0.0.%" IDENTIFIED BY "password_1"; GRANT SELECT ON MyApp.* TO "www-r"@"10.0.0.%" IDENTIFIED BY "password_2"; FLUSH PRIVILEGES;

Here we create two accounts: one with read-only and one with both read and write permissions. All permissions are scoped to a single database and can only be accessed by machines on our local network. Any SQL injection holes in our application are going to only allow access to, in the best case, reads or, in the worst case, simple writes. Nobody is going to be dropped from our database.

5.7.2. Avoiding SQL Injection Attacks

Once we're protected from attackers dropping our database, we're still vulnerable to simple write attacks:

 DELETE FROM Frobs;

We obviously want to prevent this from happening, too. Luckily this is easy enough, so long as we untaint any data before we stuff it into our SQL. We can define a couple of functions in our database layer for converting tainted data into safe SQL. For strings, we want to escape any quotes or nulls with backslashes and change any existing backslashes into double backslashes. Once we've done that, we can surround the string in single quotes and we have a safe chunk of SQL. PHP has a couple of functions built in to do this: the generic AddSlashes( ), which escapes slashes and quotes, and the more complicated mysql_real_escape_string( ), which escapes a bunch more characters but is ultimately unnecessary (although useful for making logs easier to read). For integer data, we just need to check that the data really does constitute an integer, which we can do with the intval( ) function. Putting this together, we get the following two functions:

 function db_escape_str($data){         return "'".AddSlashes($data)."'"; }  function db_escape_int($data){          return intval($data); }

For our attempted attack string, 1;DELETE FROM Frobs;, the values returned become '1;DELETE FROM Frobs;' and 1, respectively, both of which can be safely included in a SQL statement.

Untainting float data is left as an exercise for the reader. It's also worth noting that when using LIKE or RLIKE constructs in your SQL, you'll need to further escape your data. The following two functions do all that is necessary to ensure no tainted data makes it through:

 function  db_escape_str_like($string){         return str_replace(array('%','_'), array('\\%','\\_'), $string); } function  db_escape_str_rlike($string){         return preg_replace("/([( ).\[\]*^\$])/", '\\\$1', $string); }

In this case we don't return the string with single quotes at either end, or else actually putting wildcards into the SQL would be impossible. Instead, we can use code constructs such as this:

 $foo = db_escape_str_like($_POST[widget]); $sql = "SELECT * FROM Frobs WHERE widget LIKE '%$foo%'";

So once we have our functions for escaping data, when do we use them? Every time we want to insert data into SQL statements, obviously, but where exactly in our code does that fit? The PHP magic_quotes_gpc directive causes all incoming user data to be automatically escaped. This is a really bad idea. Escaping all incoming data means that we need to unescape everything we display right back to users. This is easy enough, although annoying. The real danger comes from assuming that all your data is untainted. If we manipulate the user data in any way, there's a danger we might taint it. If we fetch data from the database, it's automatically tainted, so we can't use it in SQL again in the same script without escaping it. Having PHP escape all of your input creates a culture of assuming nothing is tainted and often leads to SQL injection holes.

The best approach is to always assume all of your data is tainted and escape it at the point that it's used. This "just in time" escaping technique helps make holes immediately obvious. If we look at a SQL statement in our application and the values used aren't being escaped on the few lines preceding it, then we know there's a mistake. We can go one step further and actually perform the escaping in the same command that executes the SQL. We just need to build a couple of wrapper functions around inserting and updating the database. We can then makes calls like this:

 db_insert('table_name', array(     'field_1' => db_escape_str($value_1),     'field_2' => db_escape_str($value_2),     'field_3' => db_escape_int($value_3), ));

By avoiding the action-at-a-distance effect of untainting data too early (the untainting should be physically close in the source to the use of the untainted data), we can easily glance at code and see if we're correctly escaping data. The procedure for escaping data can also constitute part of your coding standard. For instance, you can dictate that any variable name with the suffix '_sqlesc' has already been escaped for use in SQL. You have to be careful here that you identify the different kinds of untainting needed by various usages of datauntainting a string for SQL is different from untainting a string for a shell command or for displaying in HTML. Having a single '_untainted' suffix is not a good idea.

The Perl programmers among you are probably looking on and laughing. Perl's DBI has prepared statements baked in, which allow us to ignore all the problems of escaping data in SQL by doing it automatically. In PHP 5 we get both the mysqli extension and PDO, which both provide support for prepared statements. The mysqli extension (the "i" stands for improved) provides lots of new functions through which we can access the database. To perform a simple prepared SELECT statement, we can use the following code:

 if ($dbst = mysqli_prepare($dbh, "SELECT id FROM Frobs WHERE frob_type=?")) {         mysqli_stmt_bind_param($dbst, "s", $_GET[frob_type]);         mysqli_stmt_execute($dbst);         mysqli_stmt_store_result($dbst);         mysqli_stmt_bind_result($dbst, $id);         mysqli_stmt_fetch($dbst);         mysqli_stmt_close($dbst);         echo "Id was $id!"; }

Here we create the SQL first, with question marks delimiting where we want to stick our data. We then bind our data using mysqli_stmt_bind_param( ) and execute the statement. Once we've executed it, we can bind variables to the output columns and fetch back a row of results. This is all pretty ugly and far more syntax than we're used to with the simple mysql extension. However, it is quite easy to emulate the old style of code with a couple of helper functions:

 function db_query_prepare($dbh, $sql, $args){   $dbst = mysqli_prepare($dbh, $sql);   if (!$dbst){ return 0; }   foreach($args as $arg)     mysqli_stmt_bind_param($dbst, "s", $arg);   }   mysqli_stmt_execute($dbst);   mysqli_stmt_store_result($dbst);   return $dbst; } function db_fetch_array(&$dbst){   $data = mysqli_stmt_result_metadata($dbst);   $fields = array( );   $out = array( );   $fields[0] = &$dbst;   $count = 1;   while ($field = mysqli_fetch_field($data)) {     $fields[$count] = &$out[$field->name];     $count++;   }   call_user_func_array(mysqli_stmt_bind_result, $fields);   mysqli_stmt_fetch($dbst);   return $out; } $result = db_query_prepare($dbh, "SELECT id FROM Frobs WHERE frob_type=?",   array($_GET[frob_type])); $row = db_fetch_array($result); echo "Id is $row[id]!\n";

We hide the complexity of the call sequence and the multiple bind calls (both for the request and result) within our helper functions and expose a very simple interface to the rest of our code. The fewer moving parts we have in the rest of our code, the less there is to go wrong.

If we're using PDO, then it's even easier. We don't have to write the helper functions since somebody already thought of that for us. Our example code can be reduced to a few lines:

 $result = $dbh->prepare("SELECT id FROM Frobs WHERE frob_type=?"); $result->execute(array($_GET['frob_type'])); $row = $result->fetch( ); echo "Id is $row[id]!\n";

PDO also supports prepared statements for older versions of MySQL, by escaping the data and constructing the SQL in the manner we talked about performing manually earlier in this chapter. PDO then takes out all the effort and potential problems of escaping at the cost of adding an extension to PHP 5. But you do have to use PHP 5.

Once we've made sure we're safely receiving, filtering, storing, and fetching our data, we're ready to rock. Now's the time to go off and build your own killer application, take venture capital funding, and get rich. When you're done with that, turn the page into Chapter 6.