Recipe 2.5. Handling Special Characters and NULL Values in Statements


Problem

You need to construct SQL statements that refer to data values containing special characters such as quotes or backslashes, or special values such as NULL. Or you are constructing statements using data obtained from external sources and want to avoid being subject to SQL injection attacks.

Solution

Use your API's placeholder mechanism or quoting function to make data safe for insertion.

Discussion

Up to this point in the chapter, our statements have used "safe" data values that require no special treatment. For example, we can easily construct the following SQL statements from within a program by putting the data values literally into the statement strings:

SELECT * FROM profile WHERE age > 40 AND color = 'green' INSERT INTO profile (name,color) VALUES('Gary','blue') 

However, some data values are not so easily handled and can cause problems if you are not careful. Statements might use values that contain special characters such as quotes, backslashes , binary data, or values that are NULL. The following discussion describes the difficulties caused by these types of values and the proper methods for handling them.

Suppose that you want to execute this INSERT statement:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('Alison','1973-01-12','blue','eggroll',4); 

There's nothing unusual about that. But if you change the name column value to something like De'Mont that contains a single quote, the statement becomes syntactically invalid:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De'Mont','1973-01-12','blue','eggroll',4); 

The problem is that there is a single quote inside a single-quoted string. To make the statement legal, the quote could be escaped by preceding it either with a single quote or with a backslash:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De''Mont','1973-01-12','blue','eggroll',4); 

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De\'Mont','1973-01-12','blue','eggroll',4); 

Alternatively, you could quote the name value itself within double quotes rather than within single quotes (assuming that the ANSI_QUOTES SQL mode is not enabled):

INSERT INTO profile (name,birth,color,foods,cats) VALUES("De'Mont",'1973-01-12','blue','eggroll',4); 

If you are writing a statement literally in your program, you can escape or quote the name value by hand because you know what the value is. But if a variable holds the name value, you don't necessarily know what the variable's value is. Worse yet, single quote isn't the only character you must be prepared to deal with; double quotes and backslashes cause problems, too. And if you want to store binary data such as images or sound clips in your database, a value might contain anythingnot just quotes or backslashes, but other characters such as nulls (zero-valued bytes). The need to handle special characters properly is particularly acute in a web environment where statements are constructed using form input (for example, if you're searching for rows that match search terms entered by the remote user). You must be able to handle any kind of input in a general way, because you can't predict in advance what kind of information a user will supply. In fact, it is not uncommon for malicious users to enter garbage values containing problematic characters in a deliberate attempt to compromise the security of your server. That is a standard technique for finding insecure scripts that can be exploited.

The SQL NULL value is not a special character, but it too requires special treatment. In SQL, NULL indicates "no value." This can have several meanings depending on context, such as "unknown," "missing," "out of range," and so forth. Our statements thus far have not used NULL values, to avoid dealing with the complications that they introduce, but now it's time to address these issues. For example, if you don't know De'Mont's favorite color, you can set the color column to NULLbut not by writing the statement like this:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De''Mont','1973-01-12','NULL','eggroll',4); 

Instead, the NULL value should have no enclosing quotes:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De''Mont','1973-01-12',NULL,'eggroll',4); 

If you were writing the statement literally in your program, you'd simply write the word "NULL" without enclosing quotes. But if the color value comes from a variable, the proper action is not so obvious. You must know something about the variable's value to be able to determine whether to enclose it within quotes when you construct the statement.

There are two general means at your disposal for dealing with special characters such as quotes and backslashes, and with special values such as NULL:

  • Use placeholders in the statement string to refer to data values symbolically, and then bind the data values to the placeholders when you execute the statement. Generally, this is the preferred method because the API itself will do all or most of the work for you of providing quotes around values as necessary, quoting or escaping special characters within the data value, and possibly interpreting a special value to map onto NULL without enclosing quotes.

  • Use a quoting function (if your API provides one) for converting data values to a safe form that is suitable for use in statement strings.

This section shows how to use these techniques to handle special characters and NULL values for each API. One of the examples demonstrated here shows how to insert a profile table row that contains De'Mont for the name value and NULL for the color value. However, the principles shown here have general utility and handle any special characters, including those found in binary data. (See Chapter 18 for examples showing how to work with images, which are one kind of binary data.) Also, the principles are not limited to INSERT statements. They work for other kinds of statements as well, such as SELECT. One of the other examples shown here demonstrates how to execute a SELECT statement using placeholders.

Special characters come up in other contexts that are not covered here:

  • The placeholder and quoting techniques described here are only for data values and not for identifiers such as database or table names. For discussion of the problem of quoting identifiers, refer to Section 2.6.

  • This section covers the issue of getting special characters into your database. A related issue not covered here is the inverse operation of transforming special characters in values returned from your database for display in various contexts. For example, if you're generating HTML pages that include values taken from your database, you have to convert < and > characters in those values to the HTML entities &lt; and &gt; to make sure they display properly. Section 17.4 discusses that topic.

Using placeholders

Placeholders enable you to avoid writing data values literally into SQL statements. Using this approach, you write the statement using placeholdersspecial characters that indicate where the values go. One common placeholder character is ?. For APIs that use that character, the INSERT statement would be rewritten to use placeholders like this:

INSERT INTO profile (name,birth,color,foods,cats) VALUES(?,?,?,?,?) 

You then pass the statement string to the database and supply the data values separately. The values are bound to the placeholders to replace them, resulting in a statement that contains the data values.

One of the benefits of using placeholders is that parameter binding operations automatically handle escaping of characters such as quotes and backslashes. This can be especially useful if you're inserting binary data such as images into your database or using data values with unknown content such as input submitted by a remote user through a form on a web page. Also, there is usually some special value that you can bind to a placeholder to indicate that you want an SQL NULL value in the resulting statement.

A second benefit of placeholders is that you can "prepare" a statement in advance and then reuse it by binding different values to it each time it's executed. Some database interfaces have this capability, which allows some preparsing or even execution planning to be done prior to executing the statement. For a statement that is executed multiple times later, this reduces overhead because anything that can be done prior to execution need be done only once, not once per execution. Prepared statements thus encourage statement reuse. Statements become more generic because they contain placeholders rather than specific data values. If you're executing an operation over and over, you may be able to reuse a prepared statement and simply bind different data values to it each time you execute it. If so, you gain a performance benefit, at least for database systems that support query planning. For example, if a program issues a particular type of SELECT statement several times while it runs, such a database system can construct a plan for the statement and then reuse it each time, rather than rebuild the plan over and over. MySQL doesn't build query plans in advance, so you don't get any performance boost from using prepared statements. However, if you port a program to a database that does not use query plans and you've written your program to use prepared statements, you can get this advantage of prepared statements automatically. You don't have to convert from nonprepared statements to enjoy that benefit.

A third benefit is that code that uses placeholder-based statements can be easier to read, although that's somewhat subjective. As you read through this section, compare the statements used here with those from Section 2.4 that did not use placeholders to see which you prefer.

Generating a List of Placeholders

You cannot bind an array of data values to a single placeholder. Each value must be bound to a separate placeholder. If you want to use placeholders for a list of data values that may vary in number, you must construct a list of placeholder characters. For example, in Perl, the following statement creates a string consisting of n placeholder characters separated by commas:

$str = join (",", ("?") x n); 

The x repetition operator, when applied to a list, produces n copies of the list, so the join⁠(⁠ ⁠ ⁠) call joins these lists to produce a single string containing n comma-separated instances of the ? character. This is handy when you want to bind an array of data values to a list of placeholders in a statement string because the size of the array indicates how many placeholder characters are needed:

$str = join (",", ("?") x @values); 

In Ruby, the * operator can be used to similar effect:

str = (["?"] * values.size).join(",") 

Another Perl method of generating a list of placeholders that is perhaps less cryptic looks like this:

$str = "?" if @values; for (my $i = 1; $i < @values; $i++) {   $str .= ",?"; } 

This method's syntax is less Perl-specific and therefore easier to translate into other languages. For example, the equivalent method in Python looks like this:

str = "" if len (values) > 0:   str = "?" for i in range (1, len (values)):   str = str + ",?" 


Using a quoting function

Some APIs provide a quoting function that takes a data value as its argument and returns a properly quoted and escaped value suitable for safe insertion into an SQL statement. This is less common than using placeholders, but it can be useful for constructing statements that you do not intend to execute immediately. However, you do need to have a connection open to the database server while you use such a quoting function because the proper quoting rules cannot be selected until the database driver is known. (Some database systems have different quoting rules from others.)

Perl

To use placeholders in Perl DBI scripts, put a ? in your SQL statement string at each location where you want to insert a data value, and then bind the values to the statement. You can bind values by passing them to do⁠(⁠ ⁠ ⁠) or execute⁠(⁠ ⁠ ⁠), or by calling a DBI method specifically intended for placeholder substitution.

With do⁠(⁠ ⁠ ⁠), you can add the profile row for De'Mont by passing the statement string and the data values in the same call:

my $count = $dbh->do ("INSERT INTO profile (name,birth,color,foods,cats)                        VALUES(?,?,?,?,?)",                       undef,                       "De'Mont", "1973-01-12", undef, "eggroll", 4); 

The arguments after the statement string should be undef followed by the data values, one value for each placeholder. (The undef argument that follows the statement string is a historical artifact, but it must be present.)

Alternatively, use prepare⁠(⁠ ⁠ ⁠) plus execute⁠(⁠ ⁠ ⁠). Pass the statement string to prepare⁠(⁠ ⁠ ⁠) to get a statement handle, and then use that handle to pass the data values via execute⁠(⁠ ⁠ ⁠):

my $sth = $dbh->prepare ("INSERT INTO profile (name,birth,color,foods,cats)                           VALUES(?,?,?,?,?)"); my $count = $sth->execute ("De'Mont", "1973-01-12", undef, "eggroll", 4); 

If you need to issue the same statement over and over again, you can use prepare⁠(⁠ ⁠ ⁠) once and call execute⁠(⁠ ⁠ ⁠) each time you need to run the statement.

In either case, the resulting statement generated by DBI is as follows:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De\'Mont','1973-01-12',NULL,'eggroll','4') 

Note how DBI adds quotes around data values, even though there were none around the ? placeholder characters in the original statement string. (The placeholder mechanism adds quotes around numeric values, too, but that's okay because the MySQL server performs type conversion as necessary to convert strings to numbers.) Also note the DBI convention that when you bind undef to a placeholder, DBI puts a NULL into the statement and correctly refrains from adding enclosing quotes.

You can use these methods for other types of statements as well. For example, the following SELECT statement uses a placeholder to look for rows that have a cats value larger than 2:

my $sth = $dbh->prepare ("SELECT * FROM profile WHERE cats > ?"); $sth->execute (2); while (my $ref = $sth->fetchrow_hashref ()) {   print "id: $ref->{id}, name: $ref->{name}, cats: $ref->{cats}\n"; } 

Another way to bind values to placeholders is to use the bind_param⁠(⁠ ⁠ ⁠) call. It takes two arguments: a placeholder position and a value to bind to the placeholder at that position. (Placeholder positions begin with 1, not 0.) The preceding INSERT and SELECT examples can be rewritten to use bind_param⁠(⁠ ⁠ ⁠) as follows:

my $sth = $dbh->prepare ("INSERT INTO profile (name,birth,color,foods,cats)                           VALUES(?,?,?,?,?)"); $sth->bind_param (1, "De'Mont"); $sth->bind_param (2, "1973-01-12"); $sth->bind_param (3, undef); $sth->bind_param (4, "eggroll"); $sth->bind_param (5, 4); my $count = $sth->execute (); 

my $sth = $dbh->prepare ("SELECT * FROM profile WHERE cats > ?"); $sth->bind_param (1, 2); $sth->execute (); while (my $ref = $sth->fetchrow_hashref ()) {   print "id: $ref->{id}, name: $ref->{name}, cats: $ref->{cats}\n"; } 

No matter which method you use for placeholders, don't put any quotes around the ? characters, not even for placeholders that represent strings. DBI adds quotes as necessary on its own. In fact, if you do put quotes around the placeholder character, DBI interprets it as the literal string constant "?", not as a placeholder.

The high-level retrieval methods such as selectrow_array⁠(⁠ ⁠ ⁠) and selectall_arrayref⁠(⁠ ⁠ ⁠) can be used with placeholders, too. Like the do⁠(⁠ ⁠ ⁠) method, the arguments are the statement string and undef, followed by the data values to bind to the placeholders that occur in the statement string. Here's an example:

my $ref = $dbh->selectall_arrayref (                   "SELECT name, birth, foods FROM profile                    WHERE id > ? AND color = ?",                   undef, 3, "green"); 

Perl DBI also provides a quote⁠(⁠ ⁠ ⁠) database handle method as an alternative to using placeholders. Here's how to use quote⁠(⁠ ⁠ ⁠) to create a statement string that inserts a new row in the profile table:

my $stmt = sprintf ("INSERT INTO profile (name,birth,color,foods,cats)                      VALUES(%s,%s,%s,%s,%s)",                     $dbh->quote ("De'Mont"),                     $dbh->quote ("1973-01-12"),                     $dbh->quote (undef),                     $dbh->quote ("eggroll"),                     $dbh->quote (4)); my $count = $dbh->do ($stmt); 

The statement string generated by this code is the same as when you use placeholders. The %s format specifiers are written without enclosing quotes because quote⁠(⁠ ⁠ ⁠) provides them automatically as necessary: non-undef values are inserted with quotes, and undef values are inserted as NULL without quotes.

Ruby

Ruby DBI uses ? as the placeholder character in SQL statements and nil as the value to use for binding an SQL NULL value to a placeholder.

To use placeholders with do, pass the statement string followed by the data values to bind to the placeholders:

count = dbh.do("INSERT INTO profile (name,birth,color,foods,cats)                 VALUES(?,?,?,?,?)",               "De'Mont", "1973-01-12", nil, "eggroll", 4) 

Alternatively, pass the statement string to prepare to get a statement handle, and then use that handle to invoke execute with the data values:

sth = dbh.prepare("INSERT INTO profile (name,birth,color,foods,cats)                    VALUES(?,?,?,?,?)") count = sth.execute("De'Mont", "1973-01-12", nil, "eggroll", 4) 

Regardless of how you construct the statement, DBI includes a properly escaped quote and a properly unquoted NULL value:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De\'Mont','1973-01-12',NULL,'eggroll',4) 

The approach that uses prepare plus execute is useful for a statement that is executed multiple times with different data values. For a statement to be executed just once, you can skip the prepare step. Pass the statement string to the database handle execute method, followed by the data values. The following example processes a SELECT statement this way:

sth = dbh.execute("SELECT * FROM profile WHERE cats > ?", 2) sth.fetch do |row|   printf "id: %s, name: %s, cats: %s\n", row["id"], row["name"], row["cats"] end sth.finish 

The Ruby DBI placeholder mechanism provides quotes around data values as necessary when they are bound to the statement string, so do not put quotes around the ? characters in the string.

The Ruby DBI quote⁠(⁠ ⁠ ⁠) database handle method is an alternative to placeholders. The following example uses quote⁠(⁠ ⁠ ⁠) to produce the INSERT statement for De'Mont:

stmt = sprintf "INSERT INTO profile (name,birth,color,foods,cats)                 VALUES(%s,%s,%s,%s,%s)",                dbh.quote("De'Mont"),                dbh.quote("1973-01-12"),                dbh.quote(nil),                dbh.quote("eggroll"),                dbh.quote(4) count = dbh.do(stmt) 

The statement string generated by this code is the same as when you use placeholders. The %s format specifiers are written without enclosing quotes because quote⁠(⁠ ⁠ ⁠) provides them automatically as necessary: non-nil values are inserted with quotes, and nil values are inserted as NULL without quotes.

PHP

The PEAR DB module allows placeholders to be used with the query⁠(⁠ ⁠ ⁠) method that executes SQL statements, or you can use prepare⁠(⁠ ⁠ ⁠) to prepare a statement, and execute⁠(⁠ ⁠ ⁠) to supply the data values and execute the prepared statement. PEAR DB uses ? as the placeholder marker in SQL statements and the PHP NULL as the value to use when binding an SQL NULL value to a placeholder.

With query⁠(⁠ ⁠ ⁠), pass the statement string followed by an array that contains the data values to bind to the placeholders:

$result =& $conn->query ("INSERT INTO profile (name,birth,color,foods,cats)                           VALUES(?,?,?,?,?)",                          array ("De'Mont","1973-01-12",NULL,"eggroll",4)); if (PEAR::isError ($result))   die ("Oops, the statement failed"); 

Alternatively, pass the statement string to prepare⁠(⁠ ⁠ ⁠) to get a statement object. Pass this object and the array of data values to execute⁠(⁠ ⁠ ⁠):

$stmt =& $conn->prepare ("INSERT INTO profile (name,birth,color,foods,cats)                           VALUES(?,?,?,?,?)"); if (PEAR::isError ($stmt))   die ("Oops, statement preparation failed"); $result =& $conn->execute ($stmt,                            array ("De'Mont","1973-01-12",NULL,"eggroll",4)); if (PEAR::isError ($result))   die ("Oops, statement execution failed"); 

The statement constructed either way includes a properly escaped quote and a properly unquoted NULL value:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De\'Mont','1973-01-12',NULL,'eggroll',4) 

If there is only one placeholder, the array of data values has only a single member:

$result =& $conn->query ("SELECT * FROM profile WHERE cats > ?", array (2)); if (PEAR::isError ($result))   die ("Oops, the statement failed"); while ($row =& $result->fetchRow (DB_FETCHMODE_ASSOC)) {   printf ("id: %s, name: %s, cats: %s\n",       $row["id"], $row["name"], $row["cats"]); } $result->free (); 

In that case, you can specify the data value without using an array. The following two query⁠(⁠ ⁠ ⁠) invocations are equivalent:

$result =& $conn->query ("SELECT * FROM profile WHERE cats > ?", array (2)); $result =& $conn->query ("SELECT * FROM profile WHERE cats > ?", 2) 

The PEAR DB placeholder mechanism provides quotes around data values as necessary when they are bound to the statement string, so do not put quotes around the ? characters in the string.

The PEAR DB quoteSmart⁠(⁠ ⁠ ⁠) method can be used instead of placeholders for quoting data values. The following example inserts the row for De'Mont after using quoteSmart⁠(⁠ ⁠ ⁠) to construct the statement string:

$stmt = sprintf ("INSERT INTO profile (name,birth,color,foods,cats)                   VALUES(%s,%s,%s,%s,%s)",                    $conn->quoteSmart ("De'Mont"),                    $conn->quoteSmart ("1973-01-12"),                    $conn->quoteSmart (NULL),                    $conn->quoteSmart ("eggroll"),                    $conn->quoteSmart (4)); $result =& $conn->query ($stmt); if (PEAR::isError ($result))   die ("Oops, the statement failed"); 

PEAR DB also has an escapeSimple⁠(⁠ ⁠ ⁠) quoting method, but it's inferior to quoteSmart⁠(⁠ ⁠ ⁠). For example, it doesn't handle NULL values properly.

Python

Python's MySQLdb module implements placeholders using format specifiers in the SQL statement string. To use placeholders, invoke the execute⁠(⁠ ⁠ ⁠) method with two arguments: a statement string containing format specifiers and a sequence containing the values to bind to the statement string. To add the profile table row for De'Mont, the code looks like this:

cursor = conn.cursor () cursor.execute ("""                 INSERT INTO profile (name,birth,color,foods,cats)                 VALUES(%s,%s,%s,%s,%s)                 """, ("De'Mont", "1973-01-12", None, "eggroll", 4)) 

Some of the Python DB-API driver modules support several format specifiers (such as %d for integers and %f for floating-point numbers). With MySQLdb, you should use a placeholder of %s to format all data values as strings. MySQL will perform type conversion as necessary. To place a literal % character into the statement, use %% in the statement string.

The parameter binding mechanism adds quotes around data values where necessary. DB-API treats None as logically equivalent to the SQL NULL value, so you can bind None to a placeholder to produce a NULL in the statement string. The statement that is sent to the server by the preceding execute⁠(⁠ ⁠ ⁠) call looks like this:

INSERT INTO profile (name,birth,color,foods,cats) VALUES('De\'Mont','1973-01-12',NULL,'eggroll',4) 

If you have only a single value val to bind to a placeholder, you can write it as a sequence using the syntax ( val ,). The following SELECT statement demonstrates this:

cursor = conn.cursor () cursor.execute ("SELECT * FROM profile WHERE cats = %s", (2,)) for row in cursor.fetchall ():   print row cursor.close () 

Python's placeholder mechanism provides quotes around data values as necessary when they are bound to the statement string, so do not put quotes around the %s format specifiers in the string.

With MySQLdb, an alternative method of quoting data values is to use the literal⁠(⁠ ⁠ ⁠) method. To produce the INSERT statement for De'Mont by using literal⁠(⁠ ⁠ ⁠), do this:

cursor = conn.cursor () stmt = """        INSERT INTO profile (name,birth,color,foods,cats)        VALUES(%s,%s,%s,%s,%s)        """ % \          (conn.literal ("De'Mont"), \          conn.literal ("1973-01-12"), \          conn.literal (None), \          conn.literal ("eggroll"), \          conn.literal (4)) cursor.execute (stmt) 

Java

JDBC provides support for placeholders if you use prepared statements. Recall that the process for issuing nonprepared statements in JDBC is to create a Statement object and then pass the statement string to one of the statement-issuing functions executeUpdate⁠(⁠ ⁠ ⁠) , executeQuery⁠(⁠ ⁠ ⁠), or execute⁠(⁠ ⁠ ⁠). To use a prepared statement instead, create a PreparedStatement object by passing a statement string containing ? placeholder characters to your connection object's prepareStatement⁠(⁠ ⁠ ⁠) method. Then bind your data values to the statement using set XXX ⁠(⁠ ⁠ ⁠) methods. Finally, execute the statement by calling executeUpdate⁠(⁠ ⁠ ⁠), executeQuery⁠(⁠ ⁠ ⁠), or execute⁠(⁠ ⁠ ⁠) with an empty argument list.

Here is an example that uses executeUpdate⁠(⁠ ⁠ ⁠) to issue an INSERT statement that adds the profile table row for De'Mont:

PreparedStatement s; int count; s = conn.prepareStatement (             "INSERT INTO profile (name,birth,color,foods,cats)"             + " VALUES(?,?,?,?,?)"); s.setString (1, "De'Mont");         // bind values to placeholders s.setString (2, "1973-01-12"); s.setNull (3, java.sql.Types.CHAR); s.setString (4, "eggroll"); s.setInt (5, 4); count = s.executeUpdate (); s.close ();   // close statement 

The set XXX ⁠(⁠ ⁠ ⁠) methods that bind data values to statements take two arguments: a placeholder position (beginning with 1, not 0) and the value to bind to the placeholder. Choose each value-binding call to match the data type of the column to which the value is bound: setString⁠(⁠ ⁠ ⁠) to bind a string to the name column, setInt⁠(⁠ ⁠ ⁠) to bind an integer to the cats column, and so forth. (Actually, I cheated a bit by using setString⁠(⁠ ⁠ ⁠) to treat the date value for birth as a string.)

One difference between JDBC and the other APIs is that you don't specify a special value to bind a NULL to a placeholder by specifying some special value (such as undef in Perl or nil in Ruby). Instead, you invoke a special method setNull⁠(⁠ ⁠ ⁠), in which the second argument indicates the type of the column (java.sql.Types.CHAR for a string, java.sql.Types.INTEGER for an integer, and so forth).

The set XXX ⁠(⁠ ⁠ ⁠) calls add quotes around data values if necessary, so do not put quotes around the ? placeholder characters in the statement string.

For a statement that returns a result set, the preparation process is similar, but you execute the prepared statement with executeQuery⁠(⁠ ⁠ ⁠) instead:

PreparedStatement s; s = conn.prepareStatement ("SELECT * FROM profile WHERE cats > ?"); s.setInt (1, 2);  // bind 2 to first placeholder s.executeQuery (); // ... process result set here ... s.close ();     // close statement 




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net