Recipe 19.7. Storing Web Input in a Database

Problem

Input obtained over the Web cannot be trusted and should not be entered into a database without taking the proper precautions.

Solution

Sanitize data values by using placeholders or a quoting function so that SQL statements you construct are valid and not subject to SQL injection attacks. Also, enabling strict SQL mode causes the MySQL server to reject values that are invalid for column data types.

Discussion

After you've extracted input parameter values in a web script and checked them to make sure they're valid, you're ready to use them to construct an SQL statement. This is actually the easy part of input processing, although it's necessary to take the proper precautions to avoid making a mistake that you'll regret. First, let's consider what can go wrong, and then see how to prevent problems.

Suppose that you have a search form that contains a keyword field and acts as a frontend to a simple search engine. When a user submits a keyword, you intend to use it to find matching rows in a table by constructing a statement like this:

SELECT * FROM mytbl WHERE keyword = 'keyword_val'

Here, keyword_val represents the value entered by the user. If the value is something like eggplant, the resulting statement is:

SELECT * FROM mytbl WHERE keyword = 'eggplant'

The statement returns all eggplant-matching rows, presumably generating a small result set. But suppose that the user is tricky and tries to subvert your script by entering the following value:

eggplant' OR 'x'='x

In this case, the statement becomes:

SELECT * FROM mytbl WHERE keyword = 'eggplant' OR 'x'='x'

That statement matches every row in the table! If the table is quite large, the input effectively becomes a form of denial-of-service attack, because it causes your system to divert resources away from legitimate requests into doing useless work. This type of attack is known as SQL injection because the user is injecting executable SQL code into your statement where you expect to receive only a nonexecutable data value. Likely results of SQL injection attacks include the following:

Extra load on the MySQL server
Out-of-memory problems in your script as it tries to digest the result set received from MySQL
Extra network bandwidth consumption as the script sends the results to the client

If your script generates a DELETE or UPDATE statement, the consequences of this kind of subversion can be much worse. Your script might issue a statement that empties a table completely or changes all of its rows, when you intended to allow it to affect only a single row.

The implication of the preceding discussion is that providing a web interface to your database opens you up to certain forms of security vulnerabilities. However, you can prevent these problems by means of a simple precaution that you should already be following: don't put data values literally into statement strings. Use placeholders or an encoding function instead. For example, in Perl you can handle an input parameter by using a placeholder:

$sth = $dbh->prepare ("SELECT * FROM mytbl WHERE keyword = ?"); $sth->execute (param ("keyword")); # ... fetch result set ...

Or by using quote⁠(⁠ ⁠ ⁠):

$keyword = $dbh->quote (param ("keyword")); $sth = $dbh->prepare ("SELECT * FROM mytbl WHERE keyword = $keyword"); $sth->execute (); # ... fetch result set ...

Either way, if the user enters the subversive value, the statement becomes harmless:

SELECT * FROM mytbl WHERE keyword = 'eggplant\' OR \'x\'=\'x'

As a result, the statement matches no rows rather than all rowsdefinitely a more suitable response to someone who's trying to break your script.

Placeholder and quoting techniques for Ruby, PHP, Python, and Java are similar, and have been discussed in Section 2.5. For JSP pages written using the JSTL tag library, you can quote input parameter values using placeholders and the <sql:param> tag (Section 17.3). For example, to use the value of a form parameter named keyword in a SELECT statement, do this:

<sql:query dataSource="${conn}" var="rs">   SELECT * FROM mytbl WHERE keyword = ?   <sql:param value="${param['keyword']}"/> </sql:query>

One issue not covered by placeholder techniques involves a question of interpretation: If a form field is optional, what should you store in the database if the user leaves the field empty? Perhaps the value represents an empty stringor perhaps it should be interpreted as NULL. One way to resolve this question is to consult the column metadata. If the column can contain NULL values, interpret an empty field as NULL. Otherwise, take an empty field to mean an empty string.

Placeholders and encoding functions apply only to SQL data values. One issue not addressed by them is how to handle web input used for other kinds of statement elements such as the names of databases, tables, and columns. If you intend to include such values into a statement, you must insert them literally, which means you should check them first. For example, if you construct a statement such as the following, you should verify that $tbl_name contains a reasonable value:

SELECT * FROM $tbl_name;

But what does "reasonable" mean? If you don't have tables containing strange characters in their names, it may be sufficient to make sure that $tbl_name contains only alphanumeric characters or underscores. An alternative is to issue a statement that determines whether the table actually exists. (You can check INFORMATION_SCHEMA or use SHOW TABLES.) This is more foolproof, at the cost of an additional statement.

A better option is to use an identifier-quoting routine, if you have one. This approach requires no extra statement because it renders any string safe for use in a statement. If the identifier does not exist, the statement simply fails as it should. Section 2.6 discusses this option further.

For additional protection in your web scripts, combine client-side checking of input values with strict server-side checking. You can set the server SQL mode to be restrictive about accepting input values so that it rejects values that don't match your table column data types. For discussion of the SQL mode and input value checking, see Section 10.20.

You Should Try to Break Your Scripts

The discussion in this recipe has been phrased in terms of guarding against other users from attacking your scripts. But it's not a bad idea to put yourself in the place of an attacker and adopt the mindset, "How can I break this application?" That is, consider whether there is some input you can submit to it that the application won't handle, and that will cause it to generate a malformed statement. If you can cause an application to misbehave, so can other people, either deliberately or accidentally. Be wary of bad input, and write your applications accordingly. It's better to be prepared than to just hope.

Recipe 19.7. Storing Web Input in a Database

Problem

Solution

Discussion

You Should Try to Break Your Scripts

See Also