18.8.1 Problem
Input obtained over the Web cannot be trusted and should not be placed into a query without taking the proper precautions.
18.8.2 Solution
Sanitize data values by using placeholders or a quoting function.
18.8.3 Discussion
After you've extracted input parameter values and checked them to make sure they're valid, you're ready to use them to construct a query. This is actually the easy part, though it's necessary to take the proper precautions to avoid making a mistake that you'll regret. First, let's consider what can go wrong, then see how to prevent the problem.
Suppose you have a search form containing a keyword field that acts as a frontend to a simple search engine. When a user submits a keyword, you intend to use it to find matching records in a table by constructing a query like this:
SELECT * FROM mytbl WHERE keyword = 'keyword_val'
Here, keyword_val represents the value entered by the user. If the value is something like eggplant, the resulting query is:
SELECT * FROM mytbl WHERE keyword = 'eggplant'
The query returns all eggplant-matching records, presumably generating a small result set. But suppose the user is tricky and tries to subvert your script by entering the following value:
eggplant' OR 'x'='x
In this case, the query becomes:
SELECT * FROM mytbl WHERE keyword = 'eggplant' OR 'x'='x'
That query matches every record in the table! If the table is quite large, the input effectively becomes a form of denial-of-service attack, because it causes your system to divert resources from legitimate requests into doing useless work.
If your script generates a DELETE statement, the consequences of this kind of subversion can be much worse: your script might issue a query that empties a table completely, when you intended to allow it to delete only a single record at a time.
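The subversion described above can be reproduced in a few lines. What follows is a minimal sketch, not code from this recipe: it is written in Python and uses the standard sqlite3 module with an in-memory database as a stand-in for MySQL so that it runs without a server. The unsafe_search function and the sample rows are hypothetical. It builds the query by pasting the input directly into the statement text, which is exactly the practice to avoid.

```python
import sqlite3

def unsafe_search(conn, keyword_val):
    """Hypothetical helper that shows what NOT to do."""
    # The input is inserted literally into the statement text.
    query = "SELECT * FROM mytbl WHERE keyword = '%s'" % keyword_val
    return query, conn.execute(query).fetchall()

# In-memory SQLite database standing in for MySQL (an assumption
# made so that the sketch is runnable as is).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytbl (id INTEGER PRIMARY KEY, keyword TEXT)")
conn.executemany(
    "INSERT INTO mytbl (keyword) VALUES (?)",
    [("eggplant",), ("tomato",), ("squash",)],
)

honest_query, honest_rows = unsafe_search(conn, "eggplant")
tricky_query, tricky_rows = unsafe_search(conn, "eggplant' OR 'x'='x")

print(tricky_query)
print(len(honest_rows), len(tricky_rows))  # 1 3
```

The honest keyword matches one row, but the tricky value turns the WHERE clause into a condition that is true for every row in the table.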
The implication is that providing a web interface to your database opens you up to certain forms of attack. However, you can prevent this kind of problem by means of a simple precaution that you should already be following: don't put data values literally into query strings. Use placeholders or an encoding function instead. For example, in Perl you can handle an input parameter like this using placeholders:
$keyword = param ("keyword");
$sth = $dbh->prepare ("SELECT * FROM mytbl WHERE keyword = ?");
$sth->execute ($keyword);
# ... fetch result set ...
Or like this using quote():
$keyword = param ("keyword");
$keyword = $dbh->quote ($keyword);
$sth = $dbh->prepare ("SELECT * FROM mytbl WHERE keyword = $keyword");
$sth->execute ();
# ... fetch result set ...
Either way, if the user enters the subversive value, the query becomes:
SELECT * FROM mytbl WHERE keyword = 'eggplant\' OR \'x\'=\'x'
The input is rendered harmless because the quote characters inside it are escaped, so the entire value is treated as a single string. The query matches no records rather than all records, which is definitely a more suitable response to someone who's trying to break your script.
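The same protection can be sketched in Python. This is an illustration under stated assumptions, not part of the recipe's own code: it uses the standard sqlite3 module with an in-memory database in place of MySQL, and the safe_search function and sample rows are hypothetical. sqlite3 uses ? as its placeholder marker, as the Perl example does; Python drivers for MySQL typically use %s instead.

```python
import sqlite3

def safe_search(conn, keyword_val):
    """Hypothetical helper: pass the input as a bound data value."""
    # The statement text is fixed; the driver supplies the value
    # separately, so quotes in the input cannot change the query.
    cursor = conn.execute(
        "SELECT * FROM mytbl WHERE keyword = ?", (keyword_val,)
    )
    return cursor.fetchall()

# In-memory SQLite database standing in for MySQL (an assumption
# made so that the sketch is runnable as is).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytbl (id INTEGER PRIMARY KEY, keyword TEXT)")
conn.executemany(
    "INSERT INTO mytbl (keyword) VALUES (?)",
    [("eggplant",), ("tomato",), ("squash",)],
)

honest_rows = safe_search(conn, "eggplant")
tricky_rows = safe_search(conn, "eggplant' OR 'x'='x")

print(len(honest_rows), len(tricky_rows))  # 1 0
```

The tricky value is compared as one literal string, and because no keyword in the table equals it, nothing is returned.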
Placeholder and quoting techniques for PHP, Python, and Java are similar, and are discussed in Recipe 2.7 and Recipe 2.8. For JSP pages written using the JSTL tag library, you can quote input parameter values using placeholders and the <sql:param> tag (Recipe 16.4). For example, to use the value of a form parameter named keyword in a SELECT statement, place the following statement in the body of a <sql:query> tag, followed by a <sql:param> tag whose value attribute refers to the keyword parameter:
SELECT * FROM mytbl WHERE keyword = ?
Placeholders and encoding functions apply only to SQL data values. One issue not addressed by them is how to handle web input used for other kinds of query elements such as the names of databases, tables, and columns. If you intend to insert such values into a query, you must insert them literally, which means you should check them first. For example, if you construct a query such as the following, you should verify that $tbl_name contains a reasonable value:
SELECT * FROM $tbl_name;
But what does "reasonable" mean? If you don't have tables containing strange characters in their names, it may be sufficient to make sure that $tbl_name contains only alphanumeric characters or underscores. An alternative is to issue a SHOW TABLES query to make sure that the table name in question is in the database. This is more foolproof, at the cost of an additional query.
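Both checks can be sketched as follows. This is a hedged Python illustration, not code from the recipe: the function names are hypothetical, and the standard sqlite3 module stands in for MySQL, so the list of tables comes from SQLite's sqlite_master catalog where a MySQL script would issue SHOW TABLES.

```python
import re
import sqlite3

# Accept only ASCII letters, digits, and underscores.
NAME_RE = re.compile(r"[A-Za-z0-9_]+")

def looks_reasonable(tbl_name):
    """Cheap check: the name contains only alphanumerics or underscores."""
    return NAME_RE.fullmatch(tbl_name) is not None

def table_exists(conn, tbl_name):
    """Stricter check: the name is one of the tables in the database."""
    # sqlite_master plays the role of SHOW TABLES in this sketch.
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return tbl_name in {row[0] for row in rows}

def select_all(conn, tbl_name):
    """Insert the table name literally, but only after checking it."""
    if not (looks_reasonable(tbl_name) and table_exists(conn, tbl_name)):
        raise ValueError("unacceptable table name: %r" % tbl_name)
    return conn.execute("SELECT * FROM " + tbl_name).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytbl (id INTEGER PRIMARY KEY, keyword TEXT)")
conn.execute("INSERT INTO mytbl (keyword) VALUES ('eggplant')")

rows = select_all(conn, "mytbl")
print(len(rows))  # 1
```

A name such as mytbl; DROP TABLE mytbl fails the first check, and a well-formed but nonexistent name fails the second. In both cases select_all raises ValueError before any query is built.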
Another issue not covered by placeholder techniques involves a question of interpretation: if a form field is optional, what should you store in the database if the user leaves the field empty? Perhaps the value represents an empty string, or perhaps it should be interpreted as NULL. One way to resolve this question is to consult the column metadata. If the column can contain NULL values, then interpret an empty field as NULL. Otherwise, take an empty field to mean an empty string.
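The metadata-based rule might look like the following Python sketch. It rests on several assumptions: the standard sqlite3 module stands in for MySQL, SQLite's PRAGMA table_info supplies the column metadata that a MySQL script would obtain some other way, and the profile table, the form dictionary, and both function names are hypothetical.

```python
import sqlite3

def nullable_columns(conn, tbl_name):
    """Return the names of the columns in tbl_name that may hold NULL."""
    # Each PRAGMA table_info row is
    # (cid, name, type, notnull, dflt_value, pk).
    # tbl_name is inserted literally, so it must come from trusted code.
    info = conn.execute("PRAGMA table_info(%s)" % tbl_name).fetchall()
    return {row[1] for row in info if row[3] == 0}

def interpret_field(value, col_name, nullable):
    """Map an empty field to NULL only if the column allows NULL."""
    if value == "" and col_name in nullable:
        return None  # stored as NULL
    return value     # empty string stays an empty string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (name TEXT NOT NULL, nickname TEXT)")
nullable = nullable_columns(conn, "profile")

# Hypothetical form submission in which both fields were left empty.
form = {"name": "", "nickname": ""}
values = [interpret_field(form[col], col, nullable)
          for col in ("name", "nickname")]
conn.execute("INSERT INTO profile (name, nickname) VALUES (?, ?)", values)

stored = conn.execute("SELECT name, nickname FROM profile").fetchone()
print(stored)  # ('', None)
```

The name column is declared NOT NULL, so its empty field is stored as an empty string. The nickname column allows NULL, so its empty field is stored as NULL.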
18.8.4 See Also
Several later sections in this chapter illustrate how to incorporate web input into queries. Recipe 18.9 shows how to upload files and load them into MySQL. Recipe 18.10 demonstrates a simple search application using input as search keywords. Recipe 18.11 and Recipe 18.12 process parameters in URLs.