Overview of Administrative Duties


Perl DBI Overview

This section provides background information for DBIthe information you'll need for writing your own scripts and for understanding scripts written by others. If you're already familiar with DBI, you may want to skip directly to the section "Putting DBI to Work."

DBI Data Types

In some ways, using the Perl DBI API is similar to using the C client library described in Chapter 6, "Writing MySQL Programs Using C." When you use the C client library, you call functions and access MySQL-related data primarily by means of pointers to structures or to arrays. When you use the DBI API, you also call functions and use pointers to structures, except that functions are called "methods," pointers are called "references," pointer variables are called "handles," and the structures that handles point to are called " objects."

DBI uses several kinds of handles. These tend to be referred to in DBI documentation by the conventional names shown in Table 7.1. The way you use these handles will become clear as we go along. Several conventional names for non-handle variables are used as well (see Table 7.2). This chapter doesn't actually use every one of these variable names, but it's useful to know them when you read DBI scripts written by other people.

Table 7.1. Conventional Perl DBI Handle Variable Names

Name

Meaning

$dbh

A handle to a database object

$sth

A handle to a statement (query) object

$fh

A handle to an open file

$h

A "generic" handle; the meaning depends on context


Table 7.2. Conventional Perl DBI Non-Handle Variable Names

Name

Meaning

$rc

The return code from operations that return true or false

$rv

The return value from operations that return an integer

$rows

The return value from operations that return a row count

@ary

An array (list) representing a row of values returned by a query


A Simple DBI Script

Let's start with a simple script, dump_members.pl, that illustrates several standard concepts in DBI programming, such as connecting to and disconnecting from the MySQL server, issuing SQL statements, and retrieving data. This script produces output consisting of the Historical League's member list in tab-delimited format. The format is not so interesting in itself. At this point, it's more important see how to use DBI than to produce pretty output.

dump_members.pl looks like this:

 #! /usr/bin/perl -w # dump_members.pl - dump Historical League's membership list use strict; use DBI; my $dsn = "DBI:mysql:sampdb:localhost"; # data source name my $user_name = "sampadm";              # username my $password = "secret";                # password # connect to database my $dbh = DBI->connect ($dsn, $user_name, $password,                         { RaiseError => 1, PrintError => 0, AutoCommit => 1 }); # issue query my $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email,"     . "street, city, state, zip, phone FROM member ORDER BY last_name"); $sth->execute (); # read results of query while (my @ary = $sth->fetchrow_array ()) {     print join ("\t", @ary), "\n"; } $sth->finish (); $dbh->disconnect (); exit (0); 

To try out the script for yourself, either use the copy that's included in the sampdb distribution or create it using a text editor. (If you use a word processor, be sure to save the script as plain text. Don't save it in the word processor's native format.) You'll probably need to change at least some of the connection parameters, such as the hostname, database name, username, or password. (That will be true for other scripts in this chapter that name the connection parameters as well.) Later, in "Specifying Connection Parameters," we'll see how to get parameters from an option file instead of putting them directly in the script.

Now let's go through the script a piece at a time. The first line contains the standard where-to-find-Perl indicator:

 #! /usr/bin/perl -w 

This line is part of every script we'll discuss in this chapter; I won't mention it further.

It's a good idea to include in a script at least a minimal description of its purpose, so the next line is a comment to give anyone who looks at the script a clue about what it does:

 # dump_members.pl - dump Historical League's membership list 

Text from a '#' character to the end of a line is considered a comment. It's a useful practice to sprinkle comments throughout your scripts to explain how they work.

Next we have a couple of use statements:

 use strict; use DBI; 

use strict tells Perl to require you to declare variables before using them. You can write scripts without putting in a use strict line, but it's useful for catching mistakes, so I recommend you always include it. For example, if you declare a variable $my_var but then later erroneously refer to it as $mv_var, the following message will appear when you run the script in strict mode:

 Global symbol "$mv_var" requires explicit package name at line n 

When you see that, you think, "Huh? I never used any variable named $mv_var!" Then you look at line n of your script, see that you misspelled $my_var as $mv_var, and fix it. Without strict mode, Perl won't squawk about $mv_var; it simply creates a new variable by that name with a value of undef (undefined) and uses it without complaint. And you're left to wonder why your script doesn't work.

use DBI tells the Perl interpreter that it needs to pull in the DBI module. Without this line, an error occurs as soon as you try to do anything DBI-related in the script. You don't have to indicate which DBD-level driver module to use, though. DBI activates the right one for you when you connect to your database.

Because we're operating in strict mode, we must declare the variables the script uses, by means of the my keyword (think of it as though the script is saying "I am explicitly indicating that these are my variables"). The next section of the script sets up the variables that specify connection parameters, and then uses them to connect to the database:

 my $dsn = "DBI:mysql:sampdb:localhost"; # data source name my $user_name = "sampadm";              # username my $password = "secret";                # password # connect to database my $dbh = DBI->connect ($dsn, $user_name, $password,                         { RaiseError => 1, PrintError => 0, AutoCommit => 1 }); 

The connect() call is invoked as DBI->connect() because it's a method of the DBI class. You don't really have to know what that means; it's just a little object-oriented jargon to make your head hurt. (If you do want to know, it means that connect() is a function that "belongs" to DBI.) connect() takes several arguments:

  • The data source, also known as the "data source name," or "DSN." Data source formats are determined by the requirements of the particular DBD module you want to use. For the MySQL driver, allowable formats include either of the following:

     DBI:mysql:db_name DBI:mysql:db_name:host_name 

    The capitalization of DBI doesn't matter, but mysql must be lowercase. db_name represents the name of the database you want to use and host_name indicates the host where the server is running. If you omit the hostname, it defaults to localhost. (There actually are other allowable data source formats, which we'll discuss later in "Specifying Connection Parameters.")

  • The username and password for your MySQL account.

  • An optional argument indicating additional connection attributes. If it is given, this argument should be passed as a reference to a hash that specifies connection attribute names and values. The weird-looking construct we've specified creates a reference to a hash that enables the RaiseError attribute and disables PrintError. These settings cause DBI to check for database-related errors and exit with an error message if it detects one. (That's why you don't see error-checking code anywhere in the dump_members.pl script; DBI handles it all.) The section "Handling Errors" covers alternative methods of responding to errors.

    The attribute hash also enables the AutoCommit attribute. This is not strictly necessary currently, but it does make explicit that the script enables autocommit mode for transaction handling. The script doesn't include any explicit transactions, but there is some possibility that DBI will in the future require scripts to specify the AutoCommit attribute explicitly. Doing so in scripts now ensures that they are ready if such a change does occur.

If the connect() call succeeds, it returns a database handle, which we assign to $dbh. By default, connect() returns undef if it fails. However, because the script enables RaiseError, DBI will exit after displaying an error message if something goes wrong in the connect() call. (This is true for other DBI methods, too; I'll describe what they return to indicate an error, but they won't return at all if RaiseError is enabled.)

After connecting to the database, dump_members.pl issues a SELECT query to retrieve the membership list, and then executes a loop to process each of the rows returned. These rows constitute the result set. To perform a SELECT, you prepare the query first, and then execute it:

 # issue query my $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email,"     . "street, city, state, zip, phone FROM member ORDER BY last_name"); $sth->execute (); 

prepare() is invoked using the database handle; it passes the SQL statement to the driver for preprocessing before execution. Some drivers actually do something with the statement at this point. Others just remember it until you invoke execute() to cause the statement to be performed. The return value from prepare() is a statement handle, here assigned to $sth. The statement handle is used for all further processing related to the statement.

Notice that the query string itself has no terminating semicolon. You no doubt have developed the habit (through long hours of interaction with the mysql program) of terminating SQL statements with a ';' character. However, it's best to break yourself of that habit when using DBI, because semicolons often cause statements to fail with syntax errors. The same applies to adding \g or \G to statementsdon't. Those statement terminators are conventions of mysql and are not used when issuing statements in DBI scripts. The end of the query string implicitly terminates the query and no explicit terminator is necessary.

When you invoke a method without passing it any arguments, you can leave off the parentheses. These two calls are equivalent:

 $sth->execute (); $sth->execute; 

I prefer to include the parentheses because it makes the call look less like a variable reference to me. Your preference may be different.

After you call execute(), the rows of the membership list are available for processing. In the dump_members.pl script, the row-fetching loop simply prints the contents of each row as a tab-delimited set of values:

 # read results of query while (my @ary = $sth->fetchrow_array ()) {     print join ("\t", @ary), "\n"; } $sth->finish (); 

fetchrow_array() returns an array containing the column values of the current row, or an empty array when there are no more rows. Thus, the loop fetches successive rows returned by the SELECT statement and prints each one with tabs between column values. NULL values in the database are returned as undef values to the Perl script, but these print as empty strings, not as the word "NULL". undef column values also have another effect when you run the script; they result in warnings like this from the Perl interpreter:

 Use of uninitialized value in join at dump_members.pl line n. 

These warnings are triggered by the use of the -w option on the first line of the script. If you remove the option and run the script again, the warnings will go away. However, -w is useful for discovering problems (such as printing uninitialized variables!), so a better way to eliminate the warnings is to detect and deal with undef values. Some techniques for this are discussed in the "Handling Statements That Return a Result Set" section later in this chapter.

In the print statement, note that the tab and newline characters (represented as the \t and \n sequences, respectively) are enclosed in double quotes. In Perl, escape sequences are interpreted only when they occur within double quotes, not within single quotes. If single quotes had been used, the output would be full of literal instances of \t and \n.

After the row-fetching loop terminates, the call to finish() indicates that the statement handle is no longer needed and that any temporary resources allocated to it can be freed. In this script, the call to finish() is for illustrative purposes only. It's not actually necessary to invoke it here, because the row-fetching call will do so automatically when it encounters the end of the result set. finish() is more useful in situations where you fetch only part of the result set and do not reach its end (for example, if for some reason you fetch only the first row). The examples from this point on do not use finish() unless it's necessary.

Having printed the membership list, we're done, so we can disconnect from the server and exit:

 $dbh->disconnect (); exit (0); 

dump_members.pl illustrates a number of concepts that are common to most DBI programs, and at this point you could probably start writing your own DBI programs without knowing anything more. For example, to write out the contents of some other table, all you'd need to do is change the text of the SELECT statement that is passed to the prepare() method. And in fact, if you want to see some applications of this technique, you can skip ahead immediately to the part of the section "Putting DBI to Work" that discusses how to generate the member list for the Historical League annual banquet program and the League's printed directory. However, DBI provides many other useful capabilities. The next sections cover some of these in more detail so that you can see how to do more than run simple SELECT statements in your Perl scripts.

Handling Errors

dump_members.pl turned on the RaiseError error-handling attribute when it invoked the connect() method so that errors would automatically terminate the script with an error message rather than just returning error codes. It's possible to handle errors in other ways. For example, you can check for errors yourself rather than having DBI do it.

To see how to control DBI's error-handling behavior, let's take a closer look at the connect() call's final argument:

 # connect to database my $dbh = DBI->connect ($dsn, $user_name, $password,                         { RaiseError => 1, PrintError => 0, AutoCommit => 1 }); 

The two attributes relevant for error handling are RaiseError and PrintError:

  • If RaiseError is enabled (set to a non-zero value), DBI raises an exception if an error occurs in a DBI method. By default, this results in a call to die() to print a message and exit the script.

  • If PrintError is enabled, DBI calls warn() to print a message when a DBI error occurs, but the script continues executing.

By default, RaiseError is disabled and PrintError is enabled. In this case, if the connect() call fails, DBI prints a message but continues executing. Thus, with the default error-handling behavior that you get if you omit the fourth argument to connect(), you might check for errors like this:

 my $dbh = DBI->connect ($dsn, $user_name, $password)     or exit (1); 

If an error occurs, connect() returns undef to indicate failure, and that triggers the call to exit(). You don't have to print an error message because DBI already will have printed one.

If you were to explicitly specify the default values for the error-checking attributes, the call to connect() would look like this:

 my $dbh = DBI->connect ($dsn, $user_name, $password,                         { RaiseError => 0, PrintError => 1, AutoCommit => 1 })     or exit (1); 

That's more work to write out, but it's also more obvious to the casual reader what the error-handling behavior is.

To check for errors yourself and print your own messages, disable both RaiseError and PrintError:

 my $dbh = DBI->connect ($dsn, $user_name, $password,                         { RaiseError => 0, PrintError => 0, AutoCommit => 1 })     or die "Could not connect to server: $DBI::err ($DBI::errstr)\n"; 

The $DBI::err and $DBI::errstr, variables used in the code just shown are useful for constructing error messages. They contain the MySQL error code and error string, much like the C API functions mysql_errno() and mysql_error(). If no error occurred, $DBI::err will be 0 or undef, and $DBI::errstr will be the empty string or undef. (In other words, both variables will be false.)

If you want DBI to handle errors for you so that you don't have to check for them yourself, enable RaiseError and disable PrintError:

 my $dbh = DBI->connect ($dsn, $user_name, $password,                         { RaiseError => 1, PrintError => 0, AutoCommit => 1 }); 

This is by far the easiest approach, and it is how almost all scripts presented in this chapter are written. The reason for disabling PrintError when enabling RaiseError is to prevent the possibility of having error messages being printed twice. (If both attributes are enabled, the DBI handlers for both may be called under some circumstances.)

Enabling RaiseError may not be appropriate if you want to execute some sort of cleanup code of your own when the script exits, although in this case you may be able to do what you want by redefining the $SIG{__DIE__} signal handler. Another reason you might want to avoid enabling the RaiseError attribute is that DBI prints technical information in its messages, like this:

 disconnect(DBI::db=HASH(0x197aae4)) invalidates 1 active statement. Either destroy statement handles or call finish on them before disconnecting. 

That's useful information for a programmer, but it might not be the kind of thing you want to present to the everyday user. In that case, it can be better to check for errors yourself so that you can display messages that are more meaningful to the people you expect to be using the script. Or you might consider redefining the $SIG{__DIE__} handler here, too. That may be useful because it allows you to enable RaiseError to simplify error handling, but replace the default error messages that DBI presents with your own messages. To provide your own __DIE__ handler, do something like the following before executing any DBI calls:

 $SIG{__DIE__} = sub { die "Sorry, an error occurred\n"; }; 

You can also define a subroutine in the usual fashion and set the signal handler value using a reference to the subroutine:

 sub die_handler {     die "Sorry, an error occurred\n"; } $SIG{__DIE__} = \&die_handler; 

As an alternative to passing connection attributes literally in the connect() call, you can define them using a hash and pass a reference to the hash. Some people find that breaking out the attribute settings this way makes scripts easier to read and edit, but operationally both approaches are the same. Here's an example that shows how to use an attribute hash:

 my %attr = (     PrintError => 0,     RaiseError => 0,     AutoCommit => 1 ); my $dbh = DBI->connect ($dsn, $user_name, $password, \%attr)     or die "Could not connect to server: $DBI::err ($DBI::errstr)\n"; 

The following script, dump_members2.pl, illustrates how you might write a script when you want to check for errors yourself and print your own messages. dump_members2.pl processes the same query as dump_members.pl, but explicitly disables PrintError and RaiseError and then tests the result of every DBI call. When an error occurs, the script invokes the subroutine bail_out() to print a message and the contents of $DBI::err and $DBI::errstr before exiting:

 #! /usr/bin/perl -w # dump_members2.pl - dump Historical League's membership list use strict; use DBI; my $dsn = "DBI:mysql:sampdb:localhost"; # data source name my $user_name = "sampadm";              # username my $password = "secret";                # password my %attr =                              # error-handling attributes (     RaiseError => 0,     PrintError => 0,     AutoCommit => 1 ); # connect to database my $dbh = DBI->connect ($dsn, $user_name, $password, \%attr)     or bail_out ("Cannot connect to database"); # issue query my $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email,"     . "street, city, state, zip, phone FROM member ORDER BY last_name")     or bail_out ("Cannot prepare query"); $sth->execute ()     or bail_out ("Cannot execute query"); # read results of query while (my @ary = $sth->fetchrow_array ()) {     print join ("\t", @ary), "\n"; } !$DBI::err     or bail_out ("Error during retrieval"); $dbh->disconnect ()     or bail_out ("Cannot disconnect from database"); exit (0); # bail_out() subroutine - print error code and string, and then exit sub bail_out { my $message = shift;     die "$message\nError $DBI::err ($DBI::errstr)\n"; } 

bail_out() is similar to the print_error() function we used for writing C programs in Chapter 6, "Writing MySQL Programs Using C," except that bail_out() exits rather than returning to the caller. bail_out() saves you the trouble of writing out the values of $DBI::err and $DBI::errstr every time you want to print an error message. Also, by encapsulating error message printing into a subroutine, you can change the format of your error messages uniformly throughout your script simply by making a change to the subroutine.

The dump_members2.pl script has a test following the row-fetching loop. Because the script doesn't automatically exit if an error occurs in fetchrow_array(), it's prudent to determine whether the loop terminated because the result set was read completely (normal termination) or because an error occurred. The loop terminates either way, of course, but if an error occurs, output from the script will be truncated. Without an error check, the person running the script wouldn't have any idea that anything was wrong! If you're checking for errors yourself, be sure to test the result of your fetch loops.

Handling Statements That Return No Result Set

Statements that do not return rows, such as DELETE, INSERT, REPLACE, and UPDATE, are relatively easy to process compared to statements that do return rows, such as SELECT, DESCRIBE, EXPLAIN, and SHOW. To process a non-SELECT statement, pass it to do() using the database handle. The do() method prepares and executes the statement in one step. For example, to create a new member entry for Marcia Brown with an expiration date of June 3, 2007, you can do the following:

 $rows = $dbh->do ("INSERT INTO member (last_name,first_name,expiration)"                     . " VALUES('Brown','Marcia','2007-06-03')"); 

The do() method returns a count of the number of rows affected, undef if something goes wrong, and 1 if the number of rows is unknown for some reason. Errors can occur for various reasons. (For example, the statement might be malformed or you might not have permission to access the table.) For a non-undef return, watch out for the case in which no rows are affected. When this happens, do() doesn't return the number 0; instead, it returns the string "0E0" (Perl's scientific notation form of zero). "0E0" evaluates to 0 in a numeric context but is considered true in conditional tests so that it can be distinguished easily from undef. If do() returned 0, it would be more difficult to distinguish between the occurrence of an error (undef) and the "no rows affected" case. You can check for an error using either of the following tests:

 if (!defined ($rows)) {     print "An error occurred\n"; } if (!$rows) {     print "An error occurred\n"; } 

In numeric contexts, "0E0" evaluates as 0, so the following code will correctly print the number of rows for any non-undef value of $rows:

 if (!$rows) {     print "An error occurred\n"; } else {     $rows += 0; # force conversion to number if value is "0E0"     print "$rows rows affected\n"; } 

You could also print $rows using a %d format specifier with printf() to force an implicit conversion to a number:

 if (!$rows) {     print "An error occurred\n"; } else {     printf "%d rows affected\n", $rows; } 

The do() method is equivalent to using prepare() followed by execute(). This means that the preceding INSERT statement could be issued as follows rather than by invoking do():

 $sth = $dbh->prepare ("INSERT INTO member (last_name,first_name,expiration)"                     . " VALUES('Brown','Marcia','2007-06-03')"); $rows = $sth->execute (); 

Handling Statements That Return a Result Set

This section provides more information on several options that you have for executing row-fetching loops for SELECT statements (or for other SELECT-like statements that return rows, such as DESCRIBE, EXPLAIN, and SHOW). It also discusses how to get a count of the number of rows in a result, how to handle result sets for which no loop is necessary, and how to retrieve an entire result set all at once.

Writing Row-Fetching Loops

The dump_members.pl script retrieved data using a standard sequence of DBI methods: prepare() lets the driver preprocess the query, execute() begins executing the query, and fetchrow_array() fetches each row of the result set.

prepare() and execute() are fairly standard parts of processing any query that returns rows. However, for fetching the rows, fetchrow_array() is actually only one choice from among several available methods (see Table 7.3).

Table 7.3. DBI Row-Fetching Methods

Method Name

Return Value

fetchrow_array()

Array of row values

fetchrow_arrayref()

Reference to array of row values

fetch()

Same as fetchrow_arrayref()

fetchrow_hashref()

Reference to hash of row values, keyed by column name


The following examples show how to use each row-fetching method. The examples loop through the rows of a result set, and for each row, print the column values separated by commas. There are more efficient ways to write the code in some cases, but the examples are written the way they are for illustrative purposes (to show the syntax for accessing individual column values), not for efficiency.

Use fetchrow_array() as follows:

 while (my @ary = $sth->fetchrow_array ()) {     my $delim = "";     for (my $i = 0; $i < @ary; $i++)     {         $ary[$i] = "" if !defined ($ary[$i]);   # NULL value?         print $delim . $ary[$i];         $delim = ",";     }     print "\n"; } 

Each call to fetchrow_array() returns an array of row values, or an empty array when there are no more rows. The inner loop tests each column value to see if it's defined, and sets it to the empty string if not. This converts NULL values (which are represented by DBI as undef) to empty strings. It may seem that this is an entirely superfluous action; after all, Perl prints nothing for both undef and the empty string. The reason for the test is that if the script is run with the -w option, Perl will issue a "Use of uninitialized value" warning message if you attempt to print an undef value. Converting undef to the empty string eliminates the warnings. You'll see a similar construct used elsewhere throughout this chapter.

If you prefer to print another value for undef values, such as the string "NULL", just change the test a little bit:

 while (my @ary = $sth->fetchrow_array ()) {     my $delim = "";     for (my $i = 0; $i < @ary; $i++)     {         $ary[$i] = "NULL" if !defined ($ary[$i]);   # NULL value?         print $delim . $ary[$i];         $delim = ",";     }     print "\n"; } 

When working with an array of values, you can simplify the code a bit by using map to convert all the undef array elements at once:

 while (my @ary = $sth->fetchrow_array ()) {     @ary = map { defined ($_) ? $_ : "NULL" } @ary;     print join (",", @ary) . "\n"; } 

map processes each element of the array using the expression within the braces and returns an array of the resulting values.

An alternative to assigning the return value of fetchrow_array() to an array variable is to fetch column values into a set of scalar variables. This allows you to work with variable names that are more meaningful than $ary[0], $ary[1], and so forth. Suppose that you want to retrieve member name and email values into variables. Using fetchrow_array(), you could select and fetch rows like this:

 my $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email"                         . " FROM member ORDER BY last_name"); $sth->execute (); while (my ($last_name, $first_name, $suffix, $email)                                                 = $sth->fetchrow_array ()) {     # do something with variables } 

When you use a list of variables this way, you must make sure that the order of the columns selected by the query matches the order of the variables into which you fetch the values. DBI has no idea of the order in which columns are named by your SELECT statement, so it's up to you to assign variables correctly. You can also cause column values to be assigned to individual variables automatically when you fetch a row, using a technique known as "parameter binding." This is discussed further in "Placeholders and Prepared Statements."

If you fetch a single value into a variable, be careful how you write the assignment. If you write the beginning of your loop like this, it will work correctly:

 while (my ($val) = $sth->fetchrow->array ()) ... 

The value is fetched in list context, so the test will fail only when there are no more rows. But if instead you write the test like this, it will fail in mysterious ways:

 while (my $val = $sth->fetchrow->array ()) ... 

The difference here is that the value is fetched in scalar context, so if $val happens to be zero, undef, or the empty string, the test will evaluate as false and terminate the loop, even though you have not yet reached the end of the result set.

The second row-fetching method, fetchrow_arrayref(), is similar to fetchrow_array(), but instead of returning an array containing the column values for the current row, it returns a reference to the array, or undef when there are no more rows. Use it like this:

 while (my $ary_ref = $sth->fetchrow_arrayref ()) {     my $delim = "";     for (my $i = 0; $i < @{$ary_ref}; $i++)     {         $ary_ref->[$i] = "" if !defined ($ary_ref->[$i]);   # NULL value?         print $delim . $ary_ref->[$i];         $delim = ",";     }     print "\n"; } 

You access array elements through the array reference, $ary_ref. This is something like dereferencing a pointer, so you use $ary_ref->[$i] rather than $ary[$i]. To convert the reference to an array, use the @{$ary_ref} construct.

fetchrow_arrayref() is unsuitable for fetching variables into a list. For example, the following loop does not work:

 while (my ($var1, $var2, $var3, $var4) = @{$sth->fetchrow_arrayref ()}) {     # do something with variables } 

As long as fetchrow_arrayref() actually fetches a row, the loop functions properly. But when there are no more rows, fetchrow_arrayref() returns undef, and @{undef} isn't legal. (It's like trying to de-reference a NULL pointer in a C program.)

The third row-fetching method, fetchrow_hashref(), is used like this:

 while (my $hash_ref = $sth->fetchrow_hashref ()) {     my $delim = "";     foreach my $key (keys (%{$hash_ref}))     {         $hash_ref->{$key} = "" if !defined ($hash_ref->{$key}); # NULL value?         print $delim . $hash_ref->{$key};         $delim = ",";     }     print "\n"; } 

Each call to fetchrow_hashref() returns a reference to a hash of row values keyed on column names, or undef when there are no more rows. In this case, column values don't come out in any particular order, because members of Perl hashes are unordered. However, DBI keys the hash elements using the column names, so $hash_ref gives you a single variable through which you can access any column value by name. This means you can pull out the values (or any subset of them) in any order you want, and you don't have to know the order in which the columns were retrieved by the SELECT query. For example, to access the name and email columns, you can do this:

 while (my $hash_ref = $sth->fetchrow_hashref ()) {     my $delim = "";     foreach my $key ("last_name", "first_name", "suffix", "email")     {         $hash_ref->{$key} = "" if !defined ($hash_ref->{$key}); # NULL value?         print $delim . $hash_ref->{$key};         $delim = ",";     }     print "\n"; } 

fetchrow_hashref() is especially useful when you want to pass a row of values to a function without requiring the function to know the order in which columns are named in the SELECT statement. In this case, you would call fetchrow_hashref() to retrieve rows and write a function that accesses values from the row hash using column names.

Keep in mind the following caveats when you use fetchrow_hashref():

  • If you need every bit of performance, fetchrow_hashref() is not the best choice. It's less efficient than fetchrow_array() or fetchrow_arrayref().

  • By default, the column names are used as key values with the same lettercase as the column names written in the SELECT statement. In MySQL, column names are not case sensitive, so the query will work the same way no matter what lettercase you use to write column names. But Perl hash key names are case sensitive, which may cause you problems. To avoid potential lettercase mismatch problems, you can tell fetchrow_hashref() to force column names into a particular lettercase by passing it a NAME_lc or NAME_uc attribute:

     $hash_ref = $sth->fetchrow_hashref ("NAME_lc"); # use lowercase names $hash_ref = $sth->fetchrow_hashref ("NAME_uc"); # use uppercase names 

  • The hash contains one element per unique column name. If you're performing a join that returns columns from multiple tables with overlapping names, you won't be able to access all the column values. For example, if you issue the following query, fetchrow_hashref() will return a hash having only one element, name:

     SELECT a.name, b.name FROM a, b WHERE a.name = b.name 

    To avoid this problem, use aliases to make sure that each column has a distinct name. For example, if you rewrite the query as follows, fetchrow_hashref() will return a reference to a hash with two elements, name and name2:

     SELECT a.name, b.name AS name2 FROM a, b WHERE a.name = b.name 

Determining the Number of Rows Returned by a Query

How can you tell the number of rows returned by a SELECT or SELECT-like query? One way is to count the rows as you fetch them. In fact, this is the only portable way in DBI to know how many rows a SELECT query returned. If you're using the MySQL driver, you can call the rows() method using the statement handle after invoking execute(). But this is not portable to other database engines, and the DBI documentation explicitly discourages using rows() for SELECT statements. Even for MySQL, rows() doesn't return the correct result until you've fetched all the rows if you've set the mysql_use_result attribute. So you may as well just count the rows as you fetch them. (See Appendix H, "Perl DBI API Reference," for more information about mysql_use_result.)

Fetching Single-Row Results

It's not necessary to run a loop to get your results if the result set consists of a single row. Suppose that you want to write a script, count_members.pl, that tells you the current number of Historical League members. The code to perform the query looks like this:

 # issue query my $sth = $dbh->prepare ("SELECT COUNT(*) FROM member"); $sth->execute (); # read results of query my $count = $sth->fetchrow_array (); $sth->finish (); $count = "can't tell" if !defined ($count); print "$count\n"; 

The SELECT statement will return only one row, so no loop is required; we call fetchrow_array() just once. In addition, because we're selecting only one column, it's not even necessary to assign the return value to an array. When fetchrow_array() is called in a scalar context (where a single value rather than a list is expected), it returns one column of the row, or undef if no row is available. DBI does not define which element of the row fetchrow_array() returns in scalar context, but that's all right for the query just shown. It retrieves only a single value, so there is no ambiguity about what value is returned.

This code invokes finish() to free the result set, even though the set consists of just one row. (fetchrow_array() frees a result set implicitly when it reaches the end of the set, but that would happen here only if you called it a second time.)

Another type of query for which you expect at most a single record is one that contains LIMIT 1 to restrict the number of rows returned. A common use for this is to return the row that contains the maximum or minimum value for a particular column. For example, the following query prints the name and birth date of the president who was born most recently:

 my $query = "SELECT last_name, first_name, birth"         . " FROM president ORDER BY birth DESC LIMIT 1"; my $sth = $dbh->prepare ($query); $sth->execute (); my ($last_name, $first_name, $birth) = $sth->fetchrow_array (); $sth->finish (); if (!defined ($last_name)) {     print "Query returned no result\n"; } else {     print "Most recently born president: $first_name $last_name ($birth)\n"; } 

Other types of statements for which no fetch loop is necessary are those that use MAX() or MIN() to select a single value. But in all these cases, an even easier way to get a single-row result is to use the database handle method selectrow_array(), which combines prepare(), execute() and row fetching into a single call. It returns an array (not a reference), or an empty array if the query returned no row or an error occurred. The previous example can be rewritten like this using selectrow_array():

 my $query = "SELECT last_name, first_name, birth"         . " FROM president ORDER BY birth DESC LIMIT 1"; my ($last_name, $first_name, $birth) = $dbh->selectrow_array ($query); if (!defined ($last_name)) {     print "Query returned no result\n"; } else {     print "Most recently born president: $first_name $last_name ($birth)\n"; } 

Working with Complete Result Sets

When you use a fetch loop, DBI doesn't provide any way to process the rows in any order other than that in which they are returned by the loop. Also, after you fetch a row, the previous row is lost unless you take steps to maintain it in memory. These behaviors aren't always suitable. For example, they're undesirable if you need to make multiple passes through the rows to perform a statistical calculation. (You might go through the result set once to assess some general numeric characteristics of your data, and then step through the rows again performing a more specific analysis.)

It's possible to access your result set as a whole in a couple different ways. You can perform the usual fetch loop and save each row as you fetch it, or you can use a method that returns an entire result set all at once. Either way, you end up with a matrix containing one row per row in the result set, and as many columns as you selected. You can process elements of the matrix in any order you want, as many times as you want. The following discussion describes both approaches.

One way to use a fetch loop to capture the result set is to use fetchrow_array() and save an array of references to the rows. The following code does the same thing as the fetch-and-print loop in dump_members.pl, except that it saves all the rows, and then prints the matrix. It illustrates how to determine the number of rows and columns in the matrix and how to access individual members of the matrix:

 my @matrix = (); # array of array references while (my @ary = $sth->fetchrow_array ())   # fetch each row {     push (@matrix, [ @ary ]); # save reference to just-fetched row } # determine dimensions of matrix my $rows = scalar (@matrix); my $cols = ($rows == 0 ? 0 : scalar (@{$matrix[0]})); for (my $i = 0; $i < $rows; $i++)           # print each row {     my $delim = "";     for (my $j = 0; $j < $cols; $j++)     {         $matrix[$i][$j] = "" if !defined ($matrix[$i][$j]); # NULL value?         print $delim . $matrix[$i][$j];         $delim = ",";     }     print "\n"; } 

When determining the dimensions of the matrix, the number of rows must be determined first because calculation of the number of columns is contingent on whether the matrix is empty. If $rows is 0, the matrix is empty and $cols becomes 0 as well. Otherwise, the number of columns can be calculated as the number of elements in the first row, using the syntax @{$matrix[0]} to access the row as a whole.

The preceding example fetches each row as an array, and then saves a reference to it. You might suppose that it would be more efficient to call fetchrow_arrayref() instead to retrieve row references directly:

 my @matrix = (); # array of array references while (my $ary_ref = $sth->fetchrow_arrayref ()) {     push (@matrix, $ary_ref); # save reference to just-fetched row } 

That doesn't work, because fetchrow_arrayref() reuses the array to which the reference points. The resulting matrix is an array of references, each of which points to the same rowthe final row retrieved. Therefore, if you want to construct a matrix by fetching a row at a time, use fetchrow_array() rather than fetchrow_arrayref().

As an alternative to using a fetch loop, you can use one of the DBI methods that return the entire result set. For example, fetchall_arrayref() returns a reference to an array of references, each of which points to the contents of one row of the result set. That's a mouthful; it means that the return value is a reference to a matrix. To use fetchall_arrayref(), call prepare() and execute(), and then retrieve the result like this:

 # fetch all rows as a reference to an array of references my $matrix_ref = $sth->fetchall_arrayref (); 

You can determine the dimensions of the array and access its elements as follows:

 # determine dimensions of matrix my $rows = (!defined ($matrix_ref) ? 0 : scalar (@{$matrix_ref})); my $cols = ($rows == 0 ? 0 : scalar (@{$matrix_ref->[0]})); for (my $i = 0; $i < $rows; $i++)           # print each row {     my $delim = "";     for (my $j = 0; $j < $cols; $j++)     {         $matrix_ref->[$i][$j] = "" if !defined ($matrix_ref->[$i][$j]); # NULL?         print $delim . $matrix_ref->[$i][$j];         $delim = ",";     }     print "\n"; } 

fetchall_arrayref() returns a reference to an empty array if the result set is empty. The result is undef if an error occurs, so if you don't have RaiseError enabled, you must check the return value before you start using it.

The number of rows and columns is determined by whether the matrix is empty. If you want to access an entire row $i of the matrix as an array, use the syntax @{$matrix_ref->[$i]}.

It's certainly simpler to use fetchall_arrayref() to retrieve a result set than to write a row-fetching loop, although the syntax for accessing array elements becomes a little trickier. A method that's similar to fetchall_arrayref() but that does even more work is selectall_arrayref(). This method performs the entire prepare(), execute(), fetch loop sequence for you. To use selectall_arrayref(), pass your query directly to it using the database handle:

 # fetch all rows as a reference to an array of references my $matrix_ref = $dbh->selectall_arrayref ($query); # determine dimensions of matrix my $rows = (!defined ($matrix_ref) ? 0 : scalar (@{$matrix_ref})); my $cols = ($rows == 0 ? 0 : scalar (@{$matrix_ref->[0]})); for (my $i = 0; $i < $rows; $i++)           # print each row {     my $delim = "";     for (my $j = 0; $j < $cols; $j++)     {         $matrix_ref->[$i][$j] = "" if !defined ($matrix_ref->[$i][$j]); # NULL?         print $delim . $matrix_ref->[$i][$j];         $delim = ",";     }     print "\n"; } 

Checking for NULL Values

When you retrieve information from a database, you might need to distinguish between column values that are NULL and those that are zero or the empty string. This is easy to do because DBI returns NULL column values as undef. However, you must be sure to use the correct test. If you try the following code fragment, it prints "false!" all three times:

 $col_val = undef; if (!$col_val) { print "false!\n"; } $col_val = 0;     if (!$col_val) { print "false!\n"; } $col_val = "";    if (!$col_val) { print "false!\n"; } 

What that demonstrates is that the form of the test is unable to distinguish between undef, 0, and the empty string. The next fragment prints "false!" for both tests, indicating that the test cannot distinguish undef from the empty string:

 $col_val = undef; if ($col_val eq "") { print "false!\n"; } $col_val = "";    if ($col_val eq "") { print "false!\n"; } 

This fragment prints the same output, showing that the second test fails to distinguish 0 from the empty string:

 $col_val = ""; if ($col_val eq "") { print "false!\n"; } if ($col_val == 0)  { print "false!\n"; } 

To distinguish between undef (NULL) column values and non-undef values, use defined(). After you know a value doesn't represent NULL, you can distinguish between other types of values using appropriate tests. For example:

 if (!defined ($col_val)) { print "NULL\n"; } elsif ($col_val eq "")   { print "empty string\n"; } elsif ($col_val == 0)    { print "zero\n"; } else                     { print "other\n"; } 

It's important to perform the tests in the proper order because both the second and third comparisons are true if $col_val is an empty string. If you reverse the order of those comparisons, you'll incorrectly interpret empty strings as zero.

Quoting Issues

Thus far, we have constructed statements in the most basic way possible, using simple quoted strings. That causes a problem at the Perl lexical level when your quoted strings contain quoted values. You can also have problems at the SQL level when you want to insert or select values that contain quotes, backslashes, or binary data. If you specify a query as a Perl quoted string, you must escape any occurrences of the quoting character that occur within the query string itself:

 $query = 'INSERT INTO absence VALUES(14,\'2004-09-16\')'; $query = "INSERT INTO absence VALUES(14,\"2004-09-16\")"; 

Both Perl and MySQL allow you to quote strings using either single or double quotes, so you can sometimes avoid escaping by mixing quote characters:

 $query = 'INSERT INTO absence VALUES(14,"2004-09-16")'; $query = "INSERT INTO absence VALUES(14,'2004-09-16')"; 

However, you must take care that the strings will be interpreted as you want:

  • The two types of quotes are not equivalent in Perl. Variable references are interpreted only within double quotes. Therefore, single quotes are not very useful when you want to construct statements by embedding variable references in the statement string. For example, if the value of $var is 14, the following two strings are not equivalent:

     "SELECT * FROM member WHERE id = $var" 'SELECT * FROM member WHERE id = $var' 

    The resulting strings as Perl interprets them as follows. Clearly, the first string is more like something you'd want to pass to the MySQL server:

     "SELECT * FROM member WHERE id = 14" 'SELECT * FROM member WHERE id = $var' 

  • The two types of quotes are not always equivalent in MySQL. If the server is running with the ANSI_QUOTES SQL mode disabled, you can indeed use either single quotes or double quotes to quote a string. But if ANSI_QUOTES is enabled, strings must be quoted with single quotes; double quotes can be used only for quoting identifiers such as database or table names. Consequently, it's safest to quote strings with single quotes, because that works regardless of whether ANSI_QUOTES is on or off.

At the Perl level, an alternative to quoting strings with double quotes is to use the qq{} construct, which tells Perl to treat everything between qq{ and } as a double-quoted string. (Think of double-q as meaning "double-quote.") For example, these two lines are equivalent:

 $date = "2004-09-16"; $date = qq{2004-09-16}; 

When you use qq{}, you can construct statements without thinking so much about quoting issues because you can use quotes (single or double) freely within the query string without having to escape them. In addition, variable references are interpreted. Both properties of qq{} are illustrated by the following INSERT statement:

 $id = 14; $date = "2004-09-16"; $query = qq{INSERT INTO absence VALUES($id,'$date')}; 

You don't have to use '{' and '}' as the qq delimiters. Other forms, such as qq() and qq//, will work, too, as long as the closing delimiter doesn't occur within the string. I prefer qq{} because the '}' character is less likely than ')' or '/' to occur within the text of the query and be mistaken for the end of the query string. For example, ')' occurs within the INSERT statement just shown, so qq() would not be a useful construct for quoting the query string.

The qq{} construct can cross line boundaries, which is useful if you want to make the query string stand out from the surrounding Perl code:

 $id = 14; $date = "2004-09-16"; $query = qq{     INSERT INTO absence VALUES($id,'$date') }; 

This is also useful if you simply want to format your query on multiple lines to make it more readable. For example, the SELECT statement in the dump_members.pl script looks like this:

 $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email,"     . "street, city, state, zip, phone FROM member ORDER BY last_name"); 

With qq{}, it could be written like this instead:

 $sth = $dbh->prepare (qq{             SELECT                 last_name, first_name, suffix, email,                 street, city, state, zip, phone             FROM member             ORDER BY last_name         }); 

It's true that double-quoted strings can cross line boundaries, too. But I find that qq{ and } stand out better than two lone '"' characters and make the statement easier to read. This book uses both forms; see which you prefer.

The qq{} construct takes care of quoting issues at the Perl lexical level so that you can get quotes into a string easily without having Perl complain about them. However, you must also think about SQL-level syntax. Consider this attempt to insert a record into the member table:

 $last = "O'Malley"; $first = "Brian"; $expiration = "2007-09-01"; $rows = $dbh->do (qq{     INSERT INTO member (last_name,first_name,expiration)     VALUES('$last','$first','$expiration') }); 

The resulting string that do() sends to MySQL looks like this:

 INSERT INTO member (last_name,first_name,expiration) VALUES('O'Malley','Brian','2007-09-01') 

That is not legal SQL because a single quote occurs within a single-quoted string. We encountered a similar quoting problem in Chapter 6. There we dealt with the issue by using the mysql_real_escape_string() function. DBI provides a similar mechanism: For each quoted value that you want to use literally in a statement, call the quote() method and use its return value instead. The preceding example is more properly written as follows:

 $last = $dbh->quote ("O'Malley"); $first = $dbh->quote ("Brian"); $expiration = $dbh->quote ("2007-09-01"); $rows = $dbh->do (qq{     INSERT INTO member (last_name,first_name,expiration)     VALUES($last,$first,$expiration) }); 

Now the string that do() sends to MySQL looks like this, with the quote that occurs within the quoted string properly escaped:

 INSERT INTO member (last_name,first_name,expiration) VALUES('O\'Malley','Brian','2007-09-01') 

Note that when you refer to $last and $first in the query string, you do not add any surrounding quotes; the quote() method supplies them for you. If you add quotes yourself, your query will have too many of them, as shown by the following example:

 $value = "paul"; $quoted_value = $dbh->quote ($value); print "The quoted value is: $quoted_value\n"; print "The quoted value is: '$quoted_value'\n"; 

These statements produce the following output:

 The quoted value is: 'paul' The quoted value is: ''paul'' 

In the second case, the string contains too many quotes.

Placeholders and Prepared Statements

In the preceding sections, we constructed statements by putting values to be inserted or used as selection criteria directly into the query string. It's not necessary to do this. DBI allows you to use special markers called "placeholders" within a query string, and then supply the values to be used in place of those markers when the query is executed. This is called "binding the values to the query." One reason for doing this is that you get the character-quoting benefits of the quote() method without having to invoke quote() explicitly. Another reason is improved performance: If you're executing a statement over and over within a loop, you can prepare it first and then execute it multiple times. This avoids the overhead of preparing the statement before each execution.

As an illustration of how placeholders work, suppose that you're beginning a new semester at school and you want to clear out the student table for your gradebook and then initialize it with the new students by using a list of student names contained in a file. Without placeholders, you can delete the existing table contents and load the new names like this:

 $dbh->do (qq{ DELETE FROM student } );  # delete existing rows while (<>)                              # read each input line, {                                       # use it to add a new row     chomp;     $_ = $dbh->quote ($_);     $dbh->do (qq{ INSERT INTO student SET name = $_ }); } 

This approach requires that you handle special characters in the data values yourself by calling quote(). It's also inefficient, because the basic form of the INSERT query is the same each time, and do() calls prepare() and execute() each time through the loop. It's more efficient to call prepare() a single time to set up the INSERT statement before entering the loop and invoke only execute() within the loop. That avoids all invocations of prepare() but one. DBI allows us to do this as follows:

 $dbh->do (qq{ DELETE FROM student } );  # delete existing rows my $sth = $dbh->prepare (qq{ INSERT INTO student SET name = ? }); while (<>)                              # read each input line, {                                       # use it to add a new row     chomp;     $sth->execute ($_); } 

In general, if you find yourself calling do() inside a loop, it's better to invoke prepare() prior to the loop and execute() inside it. Note the '?' character in the INSERT query. That's the placeholder. When execute() is invoked, you pass the value to be substituted for the placeholder when the query is sent to the server. DBI automatically quotes special characters in the value, so there is no need to call quote().

Some things to note about placeholders:

  • Do not enclose the placeholder character in quotes within the query string. If you do, it will not be recognized as a placeholder.

  • Do not use the quote() method to specify placeholder values, or you will get extra quotes in the values you're inserting.

  • You can have more than one placeholder in a query string, but be sure to pass as many values to execute() as there are placeholder markers.

  • Each placeholder must specify a single value, not a list of values. For example, you cannot prepare and execute a statement like this:

     my $sth = $dbh->prepare (qq{             INSERT INTO member last_name, first_name VALUES(?) }); $sth->execute ("Adams,Bill,2007-09-19"); 

    You must do so as follows:

     my $sth = $dbh->prepare (qq{             INSERT INTO member last_name, first_name VALUES(?,?,?) }); $sth->execute ("Adams","Bill","2007-09-19"); 

  • To specify NULL as a placeholder value, use undef.

  • Placeholders and quote() are intended only for data values. Do not try to use a placeholder for keywords like SELECT or for identifiers like database, table, or column names. It won't work because the keyword or identifier will be placed into the query surrounded by quotes, and the query will fail with a syntax error.

For some database engines, you get another performance benefit from using placeholders, in addition to improved efficiency in loops. Certain engines cache prepared statements and possibly also the statement execution plan. That way, if the same statement is received by the server later, it can be reused and processed more quickly without the initial preparation overhead. Statement caching is especially helpful for complex SELECT statements because it may take some time to prepare the statement and generate a good execution plan. Placeholders give you a better chance of finding a statement in the cache because they make statements more generic than statements constructed by embedding specific data values directly in the statement string.

MySQL does not cache execution plans. MySQL has a query cache, but it operates by caching result sets for lexically identical query strings, not by caching execution plans. The query cache is discussed in Chapter 4, "Query Optimization."

Prior to version 4.1, MySQL does not cache prepared statements, either; parameter binding to placeholders takes place on the client side within the DBD::mysql module. However, as of MySQL 4.1, the binary protocol implemented in the C client library does allow statements to be prepared on the server side and for parameter binding to be handled by the server. DBD::mysql can take advantage of this capability.

To turn on server-side prepared statements and parameter binding, all you need to do is enable the mysql_server_prepare option. For example, given a database handle $dbh, this can be done as follows:

 $dbh->{mysql_server_prepare} = 1; 

To disable server-side prepared statements, reset the option to 0.

The mysql_server_prepare option requires MySQL 4.1.3 or later, and currently is available only in the DBD::mysql development repository. It should be present in DBD::mysql 3.0, which should be available soon if not by the time you read this.

Even if you don't use MySQL's server-side capabilities for prepared statements, it still can be beneficial to write your statements using placeholders: When you port a script for use with an engine that does cache execution plans, it will execute statements with placeholders more efficiently than those without.

The Phantom undef

Some DBI methods like do() and selectrow_array() that execute a query string allow you to provide placeholder values to be bound to any '?' characters in the query. For example, you can update a record like this:

[View full width]

my $rows = $dbh->do ("UPDATE member SET expiration = ? WHERE member_id = ?", undef, "2007-01-01", 14);

Or fetch a record like this:

 my $ref = $dbh->selectrow_arrayref (                 "SELECT * FROM member WHERE member_id = ?",                     undef, 14); 

Observe that in both cases, the placeholder values are preceded by an undef argument that appears to do nothing. The reason it's there is that, for query-execution methods that allow placeholder arguments, those arguments are preceded by another argument that can be used to specify query-processing attributes. Such attributes are rarely (if ever) used, but the argument still must be present, so just specify it as undef.


Binding Query Results to Script Variables

Placeholders allow you to substitute values into a query string at query execution time. In other words, you can parameterize the "input" to the query. DBI also provides a corresponding output operation called "parameter binding" that allows you to parameterize the "output" by retrieving column values into variables automatically when you fetch a row without having to assign values to the variables yourself.

Suppose that you have a query to retrieve names from the member table. You can tell DBI to assign the values of the selected columns to Perl variables. When you fetch a row, the variables are automatically updated with the corresponding column values, which makes the retrieval very efficient. Here's an example that shows how to bind the columns to variables and then access them in the fetch loop:

 my ($last_name, $first_name, $suffix); my $sth = $dbh->prepare (qq{             SELECT last_name, first_name, suffix             FROM member ORDER BY last_name, first_name }); $sth->execute (); $sth->bind_col (1, \$last_name); $sth->bind_col (2, \$first_name); $sth->bind_col (3, \$suffix); print "$last_name, $first_name, $suffix\n" while $sth->fetch (); 

bind_col() should be called after execute() and before fetching rows. Each call should specify a column number and a reference to the variable you want to associate with the column. Column numbers begin with 1.

As an alternative to individual calls to bind_col(), you can pass all the variable references in a single call to bind_columns():

 my ($last_name, $first_name, $suffix); my $sth = $dbh->prepare (qq{             SELECT last_name, first_name, suffix             FROM member ORDER BY last_name, first_name }); $sth->execute (); $sth->bind_columns (\$last_name, \$first_name, \$suffix); print "$last_name, $first_name, $suffix\n" while $sth->fetch (); 

bind_columns() should be called after execute() and before fetching rows.

Specifying Connection Parameters

The most direct way to establish a server connection is to specify all the connection parameters as arguments to the connect() method:

 my $dsn = "DBI:mysql:db_name:host_name"; my $dbh = DBI->connect ($dsn, user_name, password); 

If you leave out connection parameters, DBI attempts to determine what values to use as follows:

  • The DBI_DSN environment variable is used if set and the data source name (DSN) is undefined or is the empty string. The DBI_USER and DBI_PASS environment variables are used if set and the username and password are undefined (but not if they are the empty string). Under Windows, the USER variable is used if the username is undefined.

  • If you leave out the hostname, DBI attempts to connect to the local host.

  • If you specify undef or an empty string for the username, it defaults to your Unix login name. Under Windows, the username defaults to ODBC.

  • If you specify undef or an empty string for the password, no password is sent.

You can specify certain options in the DSN by appending them to the initial part of the string, each preceded by a semicolon. For example, you can use the mysql_read_default_file option to specify an option file pathname:

 my $dsn = "DBI:mysql:sampdb;mysql_read_default_file=/home/paul/.my.cnf"; 

When the script executes, it will read the file for connection parameters. Suppose that /home/paul/.my.cnf has the following contents:

 [client] host=localhost user=sampadm password=secret 

In this case, the connect() call will attempt to connect to the MySQL server on localhost and will connect as user sampadm with password secret. Under Unix, you can tell your script to use the option file that belongs to the person who happens to be running it by parameterizing the filename like this:

 my $dsn = "DBI:mysql:sampdb;mysql_read_default_file=$ENV{HOME}/.my.cnf"; 

$ENV{HOME} contains the pathname to the home directory of the user running the script, so the connection parameters that it uses will be pulled from that user's own option file. By writing a script in this way, you don't have to embed connection parameters literally in the script.

Using mysql_read_default_file causes the script to read only the named option file, which may be undesirable if you want it to look for parameters in system-wide option files as well (such as /etc/my.cnf under Unix or C:\my.cnf under Windows). To have the script read all the standard option files for connection parameters, use mysql_read_default_group instead. This option causes parameters in the [client] group to be used, as well as in the group that you specify in the option's value. For example, if you have options that are specific to your sampdb-related scripts, you can list them in a [sampdb] group and then use a data source value like this:

 my $dsn = "DBI:mysql:sampdb;mysql_read_default_group=sampdb"; 

If you want to read just the [client] group from the standard option files, specify the option like this:

 my $dsn = "DBI:mysql:sampdb;mysql_read_default_group=client"; 

For more details on options for specifying the data source string, see Appendix H. For more information on the format of MySQL option files, see Appendix F, "MySQL Program Reference."

One difficulty with using mysql_read_default_file on Windows is that file pathnames typically begin with a drive letter and a colon. That's a problem, because DBI interprets colon as the character that separates parts of the DSN string. It's possible to work around this, although the method is ugly:

  1. Use chdir() to change location to the root directory of the drive where the option file is located, so that pathnames specified without a drive letter will be interpreted relative to that drive.

  2. Specify the filename as the value of the mysql_read_default_file option in the DSN, but without the leading drive letter or colon.

  3. If it's necessary to leave the current directory undisturbed by the connect operation, save the current directory pathname before calling connect() and then chdir() back to it after connecting.

The following code fragment shows how to do this if you want to use the option file C:\my.cnf. (Note that backslashes in Windows pathnames are specified as slashes in Perl strings.)

 # save current directory pathname use Cwd; my $orig_dir = cwd (); # change to root dir of drive where file is located chdir ("C:/") or die "Cannot chdir: $!\n"; # connect using parameters in C:\my.cnf my $dsn = "DBI:mysql:sampdb:localhost;mysql_read_default_file=/my.cnf"; my $dbh = DBI->connect ($dsn, undef, undef,                         { RaiseError => 1, PrintError => 0, AutoCommit => 1 }); # change back to original directory chdir ($orig_dir) or die "Cannot chdir: $!\n"; 

Using an option file doesn't prevent you from specifying connection parameters in the connect() call (for example, if you want the script to connect as a particular user). Any explicit hostname, username, and password values specified in the connect() call override connection parameters found in the option file. For example, you might want your script to parse options such as --host and --user from the command line and use those values, if they are given, in preference to any found in an option file. That would be useful because it's the way the standard MySQL clients behave. Your DBI scripts would therefore be consistent with that behavior.

For the remaining command-line scripts that we develop in this chapter, I'll use some standard connection setup and teardown code. I'll just show it once here so that we can concentrate on the main body of each script as we write it:

 #! /usr/bin/perl -w use strict; use DBI; # parse connection parameters from command line if given use Getopt::Long; $Getopt::Long::ignorecase = 0; # options are case sensitive $Getopt::Long::bundling = 1;   # -uname = -u name, not -u -n -a -m -e # default parameters - all undefined initially my ($host_name, $password, $port_num, $socket_name, $user_name); GetOptions (     # =i means an integer argument is required after option     # =s or :s means string argument is required or optional after option     "host|h=s"      => \$host_name,     "password|p:s"  => \$password,     "port|P=i"      => \$port_num,     "socket|S=s"    => \$socket_name,     "user|u=s"      => \$user_name ) or exit (1); # solicit password if option was specified without option value if (defined ($password) && !$password) {     # turn off echoing but don't interfere with STDIN     open (TTY, "/dev/tty") or die "Cannot open terminal\n";     system ("stty -echo < /dev/tty");     print STDERR "Enter password: ";     chomp ($password = <TTY>);     system ("stty echo < /dev/tty");     close (TTY);     print STDERR "\n"; } # construct data source my $dsn = "DBI:mysql:sampdb"; $dsn .= ";host=$host_name" if $host_name; $dsn .= ";port=$port_num" if $port_num; $dsn .= ";mysql_socket=$socket_name" if $socket_name; $dsn .= ";mysql_read_default_group=client"; # connect to server my $dbh = DBI->connect ($dsn, $user_name, $password,                         { RaiseError => 1, PrintError => 0, AutoCommit => 1 }); 

This code initializes DBI, looks for connection parameters on the command line, and then makes the connection to the MySQL server using parameters from the command line or found in the [client] group in the standard option files. If you have your connection parameters listed in your option file, you won't have to enter them when you run a script that uses this code.

The final part of each script will be similar, too; it simply terminates the connection and exits:

 $dbh->disconnect (); exit (0); 

When we get to the Web programming section, "Using DBI in Web Applications," we'll modify the connection setup code a bit, but the basic idea will be similar.

The password-prompting code works only under Unix. For Windows, you'll probably want to put your MySQL password in the [client] group of one of the standard option files or else specify it directly on the command line.

There is one unfortunate difference between the way the standard MySQL clients and the Getopt module handle command-line options. With Getopt, you cannot specify a password option (--password or -p) without an argument unless it is either the last argument on the command line or is immediately followed by another option. Suppose that you have a script that expects a table name argument to follow the options. If invoked as follows, Getopt will interpret mytbl as the password value rather than prompting for a password:

 % ./myscript.pl -u paul -p mytbl 

To get the script to prompt for a password, put the -p option before the -u option:

 % ./myscript.pl -p -u paul mytbl 

Debugging

When you want to debug a malfunctioning DBI script, two techniques are commonly used, either alone or in tandem. First, you can sprinkle print statements throughout your script. This allows you to tailor your debugging output the way you want it, but you must add the statements manually. Second, you can use DBI's built-in tracing capabilities. This is more general and more systematic, and it occurs automatically after you turn it on. DBI tracing also shows you information about the operation of the driver that you cannot get otherwise.

Debugging Using Print Statements

Here's a common question: "I have a query that works fine when I execute it using the mysql program, but it doesn't work from my DBI script. How come?" It's not unusual to find that the DBI script really is issuing a different query than you think. If you print a query before executing it, you might be surprised to see what you're actually sending to the server. Suppose that a query as you type it into mysql looks like this:

 mysql> INSERT INTO member (last_name,first_name,expiration)     -> VALUES('Brown','Marcia','2007-06-03'); 

Then you try the same thing in a DBI script (leaving out the terminating semicolon, of course):

 $last = "Brown"; $first = "Marcia"; $expiration = "2007-06-03"; $query = qq{     INSERT INTO member (last_name,first_name,expiration)     VALUES($last,$first,$expiration) }; $rows = $dbh->do ($query); 

That doesn't work, even though it's the same query. Or is it? Try printing it:

 print "$query\n"; 

Here is the result:

 INSERT INTO member (last_name,first_name,expiration) VALUES(Brown,Marcia,2007-06-03) 

From this output, you can see that the query is not the same at all. There are no quotes around the column values in the VALUES() list. One way to specify the query properly is like this, using quote():

 $last = $dbh->quote ("Brown"); $first = $dbh->quote ("Marcia"); $expiration = $dbh->quote ("2007-06-03"); $query = qq{     INSERT INTO member (last_name,first_name,expiration)     VALUES($last,$first,$expiration) }; $rows = $dbh->do ($query); 

Alternatively, you can specify the query using placeholders and pass the values to be inserted into it as arguments to the do() method:

 $last = "Brown"; $first = "Marcia"; $expiration = "2007-06-03"; $query = qq{     INSERT INTO member (last_name,first_name,expiration)     VALUES(?,?,?) }; $rows = $dbh->do ($query, undef, $last, $first, $expiration); 

Unfortunately, when you do this, you cannot see what the complete query looks like by using a print statement because the placeholder values aren't evaluated until you invoke do(). When you use placeholders, tracing may be a more helpful debugging method.

Debugging Using Tracing

DBI offers a tracing mechanism that generates debugging information to help you figure out why a script doesn't work properly. Trace levels range from 0 (off) to 15 (maximum information). Generally, trace levels 1 through 4 are the most useful. For example, a level 2 trace shows you the text of queries you're executing (including the result of placeholder substitutions), the result of calls to quote(), and so forth. This can be of immense help in tracking down a problem.

You can control tracing from within individual scripts using the trace() method, or you can set the DBI_TRACE environment variable to affect tracing for all DBI scripts you run.

To use the trace() call, pass a trace level argument and optionally a filename. If you specify no filename, all trace output goes to STDERR; otherwise, it goes to the named file. The following call sets up a level 1 trace to STDERR:

 DBI->trace (1); 

This call sets up a level 2 trace to the trace.out file:

 DBI->trace (2, "trace.out"); 

To disable tracing, specify a trace level of zero:

 DBI->trace (0); 

When invoked as DBI->trace(), all DBI operations are traced. For a more fine-grained approach, enable tracing at the individual handle level. This is useful when you have a good idea where a problem in your script lies and you don't want to wade through the trace output for everything that occurs up to that point. For example, if you're having problems with a particular SELECT query, you can trace the statement handle associated with the query:

 $sth = $dbh->prepare (qq{ SELECT ... }); # create the statement handle $sth->trace (1);                         # enable tracing on the statement $sth->execute (); 

If you specify a filename argument to any trace() call, whether for DBI as a whole or for an individual handle, all trace output goes to that file.

As an alternative to the TRace() method, you can use the TRaceLevel attribute. This attribute allows you to set or get the trace level for a given handle:

 $dbh->{TraceLevel} = 3;               # set database handle trace level my $cur_level = $sth->{TraceLevel};   # get statement handle trace level 

To turn on tracing globally so that it takes effect for all DBI scripts that you run, set the DBI_TRACE environment variable from your shell. The syntax for this depends on the shell you use:

  • For csh or tcsh:

     % setenv DBI_TRACE value 

  • For sh, bash, or ksh:

     $ export DBI_TRACE=value 

  • For Windows:

     C:\> set DBI_TRACE=value 

The format of value is the same for all shells: a number n to turn on tracing at level n to STDERR; a filename to turn on level 2 tracing to the named file, or n=file_name to turn on level n tracing to the named file. Here are some examples, using tcsh syntax:

  • A level 1 trace to STDERR:

     % setenv DBI_TRACE 1 

  • A level 1 trace to the file TRace.out:

     % setenv DBI_TRACE 1=trace.out 

  • A level 2 trace to the file TRace.out:

     % setenv DBI_TRACE trace.out 

Using DBI_TRACE is advantageous in that you can enable DBI script tracing without making any changes to your scripts. But if you turn on tracing to a file from your shell, be sure to turn it off after you resolve the problem. Debugging output is appended to the trace file without overwriting it, so the file can become quite large if you're not careful. It's a particularly bad idea to define DBI_TRACE in a shell startup file such as .cshrc, .tcshrc, .login, or .profile!

To turn off DBI_TRACE for various command interpreters, use any of the commands shown:

  • For csh or tcsh:

     % setenv DBI_TRACE 0 % unsetenv DBI_TRACE 

  • For sh, bash, or ksh:

     $ unset DBI_TRACE $ export DBI_TRACE=0 

  • For Windows:

     C:\> unset DBI_TRACE C:\> set DBI_TRACE=0 

Using Result Set Metadata

You can use DBI to gain access to result set metadatathat is, descriptive information about the rows selected by a query. To get this information, access the attributes of the statement handle associated with the query that generated the result set. Some of these are standard DBI attributes that are available across all database drivers (such as NUM_OF_FIELDS, the number of columns in the result set). Others, which are MySQL-specific, are provided by DBD::mysql, the MySQL driver for DBI. These attributes, such as mysql_max_length, which tells you the maximum width of the values in each column, are not applicable to other database engines. To the extent that you use any of the MySQL-specific attributes, you risk making your scripts nonportable to other databases. On the other hand, they can make it easier to get the information you want.

You must ask for metadata at the right time. Generally, result set attributes are not available for a SELECT statement until after you've invoked prepare() and execute(). In addition, attributes may become invalid after you reach the end of the result set with a row-fetching function or after you invoke finish().

The following example shows how to use one of the MySQL-specific metadata attributes, mysql_max_length, in conjunction with the more general attributes NUM_OF_FIELDS, which indicates the number of columns in the result set, and NAME, which holds their names. We can combine the information provided by these attributes to write a script, tabular.pl, that produces output from SELECT queries in the same tabular (boxed) style that you get when you run the mysql client program in interactive mode. The main body of tabular.pl is as follows (you can replace the SELECT statement with any other; the output-writing routines are independent of the particular query):

 my $sth = $dbh->prepare (qq{     SELECT last_name, first_name, suffix, city, state     FROM president ORDER BY last_name, first_name }); $sth->execute (); # attributes should be available after this call # actual maximum widths of column values in result set my @wid = @{$sth->{mysql_max_length}}; # number of columns in result set my $ncols = $sth->{NUM_OF_FIELDS}; # adjust column widths if data values are narrower than column headings # or than the word "NULL" for (my $i = 0; $i < $ncols; $i++) {     my $name_wid = length ($sth->{NAME}->[$i]);     $wid[$i] = $name_wid if $wid[$i] < $name_wid;     $wid[$i] = 4 if $wid[$i] < 4; } # print tabular-format output print_dashes (\@wid, $ncols);               # row of dashes print_row ($sth->{NAME}, \@wid, $ncols);    # column headings print_dashes (\@wid, $ncols);               # row of dashes while (my $ary_ref = $sth->fetchrow_arrayref ()) {     print_row ($ary_ref, \@wid, $ncols);    # row data values } print_dashes (\@wid, $ncols);               # row of dashes 

After the query has been initiated with execute(), we can grab the metadata we need. $sth->{NUM_OF_FIELDS} is a scalar value indicating how many columns are in the result set. $sth->{NAME} and $sth->{mysql_max_length} give us the column names and maximum width of each column's values. The value of each of these two attributes is a reference to an array that contains an element for each column of the result set, in the order that columns are named in the query.

The remaining calculations are very much like those used for the stmt_exec program developed in Chapter 6. For example, to avoid misaligned output, we adjust the column width values upward if the name of a column is wider than any of the data values in the column.

The output functions, print_dashes() and print_row(), are written as follows. They too are similar to the corresponding code in stmt_exec:

 sub print_dashes { my $wid_ary_ref = shift;    # reference to array of column widths my $cols = shift;           # number of columns     print "+";     for (my $i = 0; $i < $cols; $i++)     {         print "-" x ($wid_ary_ref->[$i]+2) . "+";     }     print "\n"; } # print row of data.  (doesn't right-align numeric columns) sub print_row { my $val_ary_ref = shift;    # reference to array of column values my $wid_ary_ref = shift;    # reference to array of column widths my $cols = shift;           # number of columns     print "|";     for (my $i = 0; $i < $cols; $i++)     {         printf " %-*s |", $wid_ary_ref->[$i],                 defined ($val_ary_ref->[$i]) ? $val_ary_ref->[$i] : "NULL";     }     print "\n"; } 

The output from tabular.pl looks like this:

 +------------+---------------+--------+---------------------+-------+ | last_name  | first_name    | suffix | city                | state | +------------+---------------+--------+---------------------+-------+ | Adams      | John          | NULL   | Braintree           | MA    | | Adams      | John Quincy   | NULL   | Braintree           | MA    | | Arthur     | Chester A.    | NULL   | Fairfield           | VT    | | Buchanan   | James         | NULL   | Mercersburg         | PA    | | Bush       | George H.W.   | NULL   | Milton              | MA    | | Bush       | George W.     | NULL   | New Haven           | CT    | | Carter     | James E.      | Jr.    | Plains              | GA    | ... 

Our next script uses column metadata to produce output in a different format. This script, show_member.pl, allows you to take a quick look at Historical League member entries without entering any queries. Given a member's last name, it displays the selected entry like this:

 % ./show_member.pl artel last_name:  Artel first_name: Mike suffix: expiration: 2008-04-16 email:      mike_artel@venus.org street:     4264 Lovering Rd. city:       Miami state:      FL zip:        12777 phone:      075-961-0712 interests:  Civil Rights,Education,Revolutionary War member_id:  63 

You can also invoke show_member.pl using a membership number, or using a pattern to match several last names. The following commands show the entry for member 23 or the entries for members with last names that start with "C":

 % ./show_member.pl 23 % ./show_member.pl C% 

The main body of the show_member.pl script follows. It uses the NAME attribute to determine the labels to use for each row of output, and the NUM_OF_FIELDS attribute to find out how many columns the result set contains:

 my $count = 0;  # number of entries printed so far my @label = (); # column label array my $label_wid = 0; while (@ARGV)       # run query for each argument on command line {     my $arg = shift (@ARGV);     # default is to do a search by last name...     my $clause = "last_name LIKE " . $dbh->quote ($arg);     # ...but do ID search instead if argument is numeric     $clause = "member_id = " . $dbh->quote ($arg) if $arg =~ /^\d+$/;     # issue query     my $sth = $dbh->prepare (qq{         SELECT * FROM member         WHERE $clause         ORDER BY last_name, first_name     });     $sth->execute ();     # get column names to use for labels and     # determine max column name width for formatting     # (only do this the first time through the loop, though)     if ($label_wid == 0)     {         @label = @{$sth->{NAME}};         foreach my $label (@label)         {             $label_wid = length ($label) if $label_wid < length ($label);         }     }     # read and print query results     my $matches = 0;     while (my @ary = $sth->fetchrow_array ())     {         # print newline before 2nd and subsequent entries         print "\n" if ++$count > 1;         foreach (my $i = 0; $i < $sth->{NUM_OF_FIELDS}; $i++)         {             # print label             printf "%-*s", $label_wid+1, $label[$i] . ":";             # print value, if there is one             print " " . $ary[$i] if defined ($ary[$i]);             print "\n";         }         ++$matches;     }     print "\nNo match was found for \"$arg\"\n" if $matches == 0; } 

The purpose of show_member.pl is to show the entire contents of an entry, no matter what the fields are. By using SELECT * to retrieve all the columns and the NAME attribute to find out what they are, this script works without modification even if columns are added to or dropped from the member table.

If you just want to know what columns a table contains without retrieving any rows, you can issue this query:

 SELECT * FROM tbl_name WHERE 0 

The WHERE 0 clause is true for all rows, so executing the statement has the effect of generating column metadata but returning no rows. After invoking prepare() and execute() in the usual way for the statement, you can get the column names from @{$sth->{NAME}}. Be aware that although this little trick of using an "empty" query works for MySQL, it's not portable and doesn't work for all database engines.

For more information on the attributes provided by DBI and by DBD::mysql, see Appendix H. It's up to you to determine whether you want to strive for portability by avoiding MySQL-specific attributes, or take advantage of them at the cost of portability.

Performing Transactions

One way to perform transactions in a DBI script is to issue explicit SET autocommit, START TRANSACTION, COMMIT, and ROLLBACK statements. (These statements are described in "Performing Transactions," in Chapter 2, "MySQL SQL Syntax and Use.") However, DBI provides its own abstraction for performing transactional operations. This abstraction is expressed in terms of DBI methods and attributes, and takes care of issuing the proper transaction-related SQL statements for you automatically. It's also portable to other database engines that support transactions, whereas the SQL statements may not be.

To use the DBI transaction mechanism, several requirements must be satisfied:

  • You must have DBD::mysql 2.0900 or later.

  • Your MySQL server must support transaction-safe storage engines such as InnoDB or BDB. Chapter 2 describes how to determine whether this is true.

  • You application must use tables that are of a transaction-safe type. If they are not, use ALTER TABLE to change their type. For example, to change a given table tbl_name to be an InnoDB table, use this statement:

     ALTER TABLE tbl_name ENGINE = INNODB; 

Assuming that these assumptions are satisfied, the general procedure for transactional processing in DBI is as follows:

1.

Disable (or temporarily suspend) autocommit mode so that SQL statements won't be committed until you commit them yourself.

2.

Issue the queries that are part of a transaction, but do so within an eval block that executes with RaiseError enabled and PrintError disabled so that any errors will terminate the block without printing errors. If the block executes successfully, the last operation within it should be to commit the transaction.

3.

When the eval block finishes, check its termination status. If an error occurred, invoke rollback() to cancel the transaction and report the error if that's appropriate.

4.

Restore the autocommit mode and error-handling attributes as necessary.

The following example shows how to implement this approach. It's based on a scenario from Chapter 2 that showed how to issue transaction-related statements manually from the mysql client. The scenario is one in which you discover that you've mistakenly mixed up two scores for students in the score table and need to switch them: Student 8 has been given a score of 18, student 9 has been given a score of 13, and the scores should be the other way around. The two UPDATE statements needed to correct this problem are as follows:

 UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8; UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9; 

You want to update both records with the correct scores, but both updates should succeed as a unit. In the earlier chapter, the updates were surrounded by explicit SQL statements for setting the autocommit mode, committing, and rolling back. Within a Perl script that uses the DBI transaction mechanism, the updates are performed as follows:

 my $orig_re = $dbh->{RaiseError};   # save error-handling attributes my $orig_pe = $dbh->{PrintError}; my $orig_ac = $dbh->{AutoCommit};   # save auto-commit mode $dbh->{RaiseError} = 1;             # cause errors to raise exceptions $dbh->{PrintError} = 0;             # but suppress error messages $dbh->{AutoCommit} = 0;             # don't commit until we say so eval {     # issue the statements that are part of the transaction     my $sth = $dbh->prepare (qq{                 UPDATE score SET score = ?                 WHERE event_id = ? AND student_id = ?             });     $sth->execute (13, 5, 8);     $sth->execute (18, 5, 9);     $dbh->commit();                 # commit the transaction }; if ($@)                             # did the transaction fail? {     print "A transaction error occurred: $@\n";     # roll back, but use eval to trap rollback failure     eval { $dbh->rollback (); } } $dbh->{AutoCommit} = $orig_ac;      # restore auto-commit mode $dbh->{RaiseError} = $orig_re;      # restore error-handling attributes $dbh->{PrintError} = $orig_pe; 

The eval block does the work of performing the transaction, and its termination status is available in the $@ variable. If the UPDATE statements execute without error, the commit() function is invoked to commit the transaction, and $@ will be empty. If an error occurs, the eval block fails and $@ holds the error message. In that case, the code prints the message, and then cancels the transaction by invoking rollback(). (The rollback operation is placed within its own eval block to prevent it from terminating the script if it fails.)

In this chapter, DBI scripts generally use an error-handling mode in which RaiseError is enabled and PrintError is disabled. This means that they already will have the values required for performing transactions, and thus it really wouldn't have been necessary to save, set, and restore those attributes as shown in the example. However, doing so is an approach that will work even for circumstances under which you're not sure in advance what the error-handling attributes might be set to.



MySQL The definitive guide to using, programming, and administering MySQL 4. 1 and 5. 0
Mysql: the Definitive Guide to Using, Programming, and Administering Mysql 4.1 and 5.0
ISBN: B003A1PKHY
EAN: N/A
Year: 2004
Pages: 190
Authors: Paul Dubois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net