Using Good Database Programming Techniques

The following sections will introduce some of the most common techniques used by database programmers to ensure that their applications are robust (do not break down easily), portable (easy to move to new environments and platforms), and easy to maintain. Persistent connections are a useful technique when the application makes high numbers of connection requests that mostly come from the same source in a short period of time. It's also all too easy for inexperienced, hurried, or lazy programmers to create code that makes the tasks of subsequent programmers (and often themselves, too) more difficult than need be by not considering portability and maintainability or placing too much burden on the application instead of the database. By doing some simple planning at the beginning, you can avoid needless problems later.

Using Persistent Connections

MySQL has always been fairly quick to connect to, compared to other databases. However, connecting to the database is still a fairly heavy task, and, if you are doing a large number of connections in a short space of time (such as when connecting from a web server), you want to make this as easy on your resources as possible. Using persistent connections will keep the connection open after the script has completed. When the next connection request comes along, it will reuse the existing connection, saving the overhead. In PHP, you can use the mysql_pconnect() function to create a persistent connection:

mysql_pconnect($host,$user,$pass);   // the mysql_pconnect function creates a persistent connection to a   // mysql database, taking host, user and password parameters

You don't always want the connections to hang around too long, though. In most cases, the web server will clean up after itself. However, I have encountered a web server that was having problems, resulting in it restarting itself but not cleaning up the connections. The web server was set to allow 400 instances, and MySQL could take 750 connections. With the web server misbehaving and not cleaning up, it effectively doubled the number of connections it was making to the database server, allowing a potential 800. Suddenly the database server was running out of available connections. You can minimize the risk of persistent connections hanging around for too long by reducing the wait_timeout mysqld variable (or interactive_timeout, depending how you connect), which determines the amount of time MySQL allows a connection to remain inactive before closing it. (Chapter 10, "Basic Administration," and Chapter 13, "Configuring and Optimizing MySQL," explain how to set these variables.) The default is 28,800 seconds (8 hours). By reducing this to 600 seconds, I prevented the problem from recurring. Then there was just the web server to worry about!

Making Code Portable and Easy to Maintain

Some simple steps can vastly improve the flexibility of the code. These include keeping connection details separate and in a single location, as well as building database queries flexibly so that future changes in database structure do not needlessly affect the application.

Connecting

Most programming languages make it easy to connect to a database through native functions. For example, PHP has a host of functions for use with MySQL, such as mysql_ connect(), mysql_query(), and so on.

When programming a tiny application with one connection to the database, using the native classes, it's easy to use something simple to connect to MySQL (see Listing 5.1).

Listing 5.1: TOTALLY_IMPORTABLE.PHP

$db = mysql_pconnect("dbhostname.co.za", "db_app", "g00r002b");   // the mysql_pconnect function creates a persistent connection to a   // mysql database, taking host, user and password parameters   // where 'dbhostname' is the host, 'db_app' the user and    // 'g00r002b' the password if (!$db) {      echo "There was a problem connecting to the database.";      exit; }   // basic error checking - if the connection is not successful, print   // an error message and exit the script

Many examples you'll come across connect in this way because it's simple to understand and fine for small situations. But to use a database in a more serious application, you'll want to make it as portable, as easy to maintain, and as secure as possible.

Imagine you have 10 scripts that connect to the database. If all 10 scripts connected in the same way, and then one day you had to move the database to a new server or wanted to change your password, you'd have to change all 10 scripts.

Now imagine you had 100 scripts.

I inherited a situation like this once before, and, facing the possibility of the password having been compromised (with the password in hundreds of locations, it's more likely to be found as well), I had to do lots of grunt work! The best solution is to build the application correctly from the beginning. Place your database connection details in a separate location, which is then included in scripts that connect to the database. Then, when you need to change the details, you only need to change them in one place (and you know that nothing has been forgotten). Changing the password in hundreds of locations involves the risk of missing a location and only finding out when functionality fails.

The solutions shown in Listings 5.2 and 5.3 are better.

Listing 5.2: DB.INC

$host = "dbhostname.co.za"; $user = "db_app"; $pass = "g00r002b";

Listing 5.3: NOT_TOO_PORTABLE.PHP

require_once "$include_path/db.inc"; // includes the file containing the connection details, db.inc // located in the path: $include_path, which should be a safe location $db = mysql_pconnect($host, $user, $pass);   // the mysql_pconnect function creates a persistent connection to a   // mysql database, taking host, user and password parameters if (!$db) {      echo "There was a problem connecting to the database.";      exit; }   // basic error checking - if the connection is not successful, print   // an error message and exit the script

In this example, the password, hostname, and username are stored in a single file, so you only need to change the details in one place (db.inc).

Warning

If you're creating a web application, make sure that db.inc is not inside the web tree. (Your web server should usually not be able to serve .inc files, but in any case, you'll want to keep sensitive information as far removed as possible.)

A further improvement comes from a slightly higher level of abstraction. Imagine, with Listings 5.2 and 5.3, that management decides to move to another DBMS. It happens; MySQL is not ideal for every kind of situation, and, besides, management often makes strange decisions—such as another situation I walked into, where management had spent thousands on another database when all that was needed was to configure MySQL properly. (Luckily I managed to talk them out of it because, once again, the code was not very portable!)

To make your code as portable as possible, you only want to change the DBMS details in one place. This involves creating a second function that handles the connection details, as shown in Listings 5.4 and 5.5. The db_pconnect() function is placed in the file db.inc, and in portable.php this function is used to connect to the database instead of mysql_pconnect().

Listing 5.4: DB.INC

// this function connects to the database, and returns the connection function db_pconnect() {   $host = "dbhostname.co.za";   $user = "db_app";   $pass = "g00r002b";   return mysql_pconnect($host, $user, $pass);   }

Listing 5.5: PORTABLE.PHP

require_once "$include_path/db.inc";  // includes the file containing the connection details, db.inc  // located in the path: $include_path, which should be a safe location $db = db_pconnect($host, $user, $pass); if (!$db) {   echo "There was a problem connecting to the database.";   exit; }  // basic error checking - if the connection is not successful, print  // an error message and exit the script

Now, if you want to change from MySQL to another database, simply replace mysql_ pconnect() with the appropriate function, such as odbc_pconnect()

Note

This book is not trying to impose a programming style upon you. Each language has its own strengths and weaknesses. Java is a much more object-oriented language than PHP, for example, so the previous examples will not work well if just directly translated into Java. What's important is the principle of making your applications as easy to maintain ( by storing connection information in one location) and as portable ( by avoiding database-specific code) as possible.

Database Queries

It's acceptable to use shortcuts such as SELECT * when querying MySQL directly. But you should avoid this in your applications, as they make them less portable. Take a situation where there are three fields in the entrants database table; in order, they are id, first_name, and surname. Your programming code may look something like Listing 5.6.

Listing 5.6: TOTALLY_INFLEXIBLE_SELECT.PHP

 // assuming the connection $db has already been made $result = mysql_query("SELECT * FROM entrants",$db);   // run the query on the active connection while ($row = mysql_fetch_array($result,MYSQL_NUM)) {     // mysql_fetch_array when called with MYSQL_NUM     // as a parameter, returns a numerically indexed     // array, each element corresponding to a      // field returned from a returned row   $id = $row[0];     // Because the id is the first field in the database,     // it is returned as the first element of the array,      // which of course starts at 0   $first_name = $row[1];   $surname = $row[2];   // .. do some processing with the details }

This works at first. But now, somebody (it's always somebody else) makes a change to the database structure, adding a new field between first_name and surname, called initial. Your code does not need the initial. Suddenly your code does not work, as the initial is the third element of the array (or $row[2]), and it is stored as $surname. The actual surname, now in $row[3], is never accessed.

There are a number of problems with the totally_inflexible_select.php script that need to be addressed. Besides the fact that it will not work if the database structure is changed, the function used to return fields returns a numeric array, rather than an associative array. Your code is then less readable, as someone with no knowledge of the database structure cannot know what is being returned from the database. In PHP, you can correct this by using a function that returns an associative array, as shown in Listing 5.7.

Listing 5.7: INFLEXIBLE_SELECT.PHP

 // assuming the connection $db has already been made $result = mysql_query("SELECT * FROM entrants",$db); while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {     // mysql_fetch_array when called with MYSQL_ASSOC     // as a parameter, returns an associative array,      // with each key of the array being the name of a      // field returned from the row   $id = $row["id"];   $first_name = $row["first_name"];   $surname = $row["surname"];   // .. do some processing with the details }

This is better, because now, even after the initial field has been added to the database table, your code will work. It can handle some changes in database structure, and it is more readable. A programmer with no knowledge of the database structure will know what fields are being returned. But you can still make an improvement. By running a SELECT * query, you're asking MySQL to return all the fields from the table. Your code only needs to use three fields, so why waste the resources in returning all of them, with the extra disk input/output and greater pressure on your network this involves? Rather, just specify the fields you want to return. Not only does this waste fewer resources, but it also makes your code even more readable. In fact, in some instances returning the associative array is more resource hungry than returning a numeric array. In this case, you can still keep your code readable, even when returning a numeric array, by specifying the fields as shown in Listing 5.8.

Listing 5.8: FLEXIBLE.PHP

 // assuming the connection $db has already been made $result = mysql_query("SELECT id,first_name,surname FROM entrants",$db); while ($row = mysql_fetch_array($result,MYSQL_NUM)) {     // mysql_fetch_array when called with MYSQL_NUM     // as a parameter, returns a numerically indexed     // array, each element corresponding to a      // field returned from a returned row   $id = $row[0];   $first_name = $row[1];   $surname = $row[2];   // .. do some processing with the details }

The same principle applies to INSERT queries. Never use an INSERT query without a list of fields inside an application. For example, taking the original entrants table with three fields— id, first_name, and surname—you could use code as shown in Listing 5.9.

Listing 5.9: INFLEXIBLE_INSERT.PHP

  // assuming the connection $db has already been made $result = mysql_query("INSERT INTO entrants–  VALUES('$id','$first_name','$surname')",$db);

If the table structure changes, once again the code will break. By adding a field, initial, the number of fields being inserted will not match the number of fields in the table, and the query will fail.

The way to solve this is to specify the database fields you are inserting into, as shown in Listing 5.10.

Listing 5.10: FLEXIBLE_INSERT.PHP

  // assuming the connection $db has already been made $result = mysql_query("INSERT INTO entrants(id, first_name,–  surname) VALUES('$id','$first_name','$surname')",$db);

This also has the advantage of being more readable, especially considering that the field values will not always match the field names as in this example.

How Much Work Should the Database Server Do?

One of the perennial debates raging among architects is that of how much work the database server must do and how much the application should do.

MySQL developers initially came out heavily in favor of passing the burden to the application, partly as justification for not supporting features such as stored procedures and triggers and partly as a matter of principle. They were criticized for this, and the lack of these features led people to dismiss MySQL as a serious database—a tag MySQL is only now, with version 4, starting to shake off.

In general, the database should do as much work as it can. Take the following two examples, which produce the same output but are done in different ways. Listing 5.11 returns all the data, unsorted, and then uses the PHP sort() function to sort the data. Listing 5.12, on the other hand, uses MySQL's ORDER BY clause to sort the data.

Listing 5.11: WORK_THE_SCRIPT.PHP

  // assuming the connection $db has already been made $result = mysql_query("SELECT surname FROM entrants",$db); while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {     // mysql_fetch_array when called with MYSQL_ASSOC     // as a parameter, returns an associative array,      // with each key of the array being the name of a      // field returned from the row   $surname[] = $row["surname"];     // add the surname as the next element of the      // surname array (and creates the array if not yet done) }   sort($surname)   // the sort() function sorts the array   // … continue processing the sorted data

Listing 5.12: WORK_THE_DB.PHP

  // assuming the connection $db has already been made $result = mysql_query("SELECT surname FROM entrants ORDER BY surname",$db); while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {     // mysql_fetch_array when called with MYSQL_ASSOC     // as a parameter, returns an associative array,      // with each key of the array being the name of a      // field returned from the row   $surname[] = $row["surname"];      // add the surname as the next element of the      // surname array (and creates the array if not yet done)  }   // … continue processing the sorted data

Listing 5.12 makes a lot more sense. MySQL could (or should) have an index on the surname field if this is a commonly performed operation, and reading the data in order, from an index, is much quicker than reading the data in the unordered format and then using the application to perform the sort.

In fact, it's possible that reading the sorted data from the database will be quicker than reading the unsorted data (even before taking the sort() function into account) because the sorted data may only need to be read from the index, not the data file.

There are possible exceptions (perhaps situations where an index is not possible and the database server is the main bottleneck), but in the vast majority of cases, the technique shown in Listing 5.12 will be far superior.

A similar, more extreme (but still all too common) example is one where the application is performing the work of the WHERE clause, as Listing 5.13 demonstrates.

Listing 5.13: WORK_THE_SCRIPT2.PHP

  // assuming the connection $db has already been made $result = mysql_query("SELECT surname FROM entrants",$db); while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {     // mysql_fetch_array when called with MYSQL_ASSOC     // as a parameter, returns an associative array,      // with each key of the array being the name of a      // field returned from the row   if ($row["surname"] == 'Johnson') {     $johnson[] = $row["surname"];       // add the surname as the next element of the        // johnson array (and creates the array if not yet done)   }   elseif ($row["surname"] == 'Makeba') {     $makeba[] = $row["surname"];     // add the surname as the next element of the      // makeba array (and creates the array if not yet done)   } }   // … process the makeba and johnson arrays

A far better solution is to use the WHERE clause, as shown in Listing 5.14, and not waste time retrieving all the unwanted extra records.

Listing 5.14: WORK_THE_DB2.PHP

  // assuming the connection $db has already been made $result = mysql_query("SELECT surname FROM–  entrants WHERE surname = 'Makeba' OR surname='Johnson'",$db); while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {     // mysql_fetch_array when called with MYSQL_ASSOC     // as a parameter, returns an associative array,      // with each key of the array being the name of a      // field returned from the row   if ($row["surname"] == 'Johnson') {     $johnson[] = $row["surname"];       // add the surname as the next element of the        // johnson array (and creates the array if not yet done)   }   elseif ($row["surname"] == 'Makeba') {     $makeba[] = $row["surname"];       // add the surname as the next element of the        // makeba array (and creates the array if not yet done)   } }   // … continue processing the sorted data

You can write these code snippets more elegantly if you're processing many names, but the point is that Listing 5.14 is much more efficient because MySQL is doing the work, limiting the results returned and reducing the resources used.

Listing 5.15 demonstrates a solution often implemented by people from other database backgrounds. Because MySQL version 4.0 does not fully implement sub-selects (although this will be rectified in version 4.1), people often assume there is no alternative but using the application. For example, in a situation where you have two customer tables and want to see which customers from one table do not appear in the other table, the following standard ANSI query does not work in MySQL:

 SELECT first_name,surname FROM entrants WHERE code NOT IN  (SELECT code FROM referred_entrants);

For this reason, the sort of code shown in Listing 5.15 is all too frequently implemented.

Listing 5.15: WORK_THE_SCRIPT3.PHP

  // assuming the connection $db has already been made $result = mysql_query("SELECT code FROM entrants",$db); $codelist = "";   // initialise the list of codes while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {     // mysql_fetch_array when called with MYSQL_ASSOC     // as a parameter, returns an associative array,      // with each key of the array being the name of a      // field returned from the row   $codelist .= $row["code"].",";         // add the code, followed by a comma to the $codelist variable } $codelist = substr($codelist, 0, -1);   // removes the last comma, resulting in a list   // such as "1,3,4,8,12"; $result = mysql_query("SELECT first_name,surname FROM–  referred_entrants WHERE code NOT IN($codelist)",$db); while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {     // … process the entrant details  }

Listing 5.15 would function, but again it is placing far too much of a load on the application. Instead, with some thought, MySQL could perform a query to return the results, as Listing 5.16 demonstrates.

Listing 5.16: WORK_THE_DB3.PHP

    // assuming the connection $db has already been made $result = mysql_query("SELECT entrants.first_name,–   entrants.surname FROM entrants LEFT JOIN referred_entrants–   ON entrants.code = referred_entrants.code WHERE–   referred_entrants.code IS NULL",$db);  while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {       // … process the entrant details  }

When sub-selects are implemented (version 4.1 according the current schedule), the resulting code, shown in Listing 5.17, is even simpler.

Listing 5.17: WORK_THE_DB3_2.PHP

    // assuming the connection $db has already been made $result = mysql_query("SELECT first_name,surname FROM–  entrants WHERE code NOT IN (SELECT code FROM–  referred_entrants", $db); while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {       // … process the entrant details  }