Connecting a Database to the Web


Nothing we have discussed so far in this chapter is exclusively related to the web. The databases you have set up can be used by local users of the system, administered by automated scripts, and even accessed remotely by database clients on other machines if configured to do so. But most databases these days are being set up so that their content can be accessed by web browsers. Fortunately, because you have now learned all the fundamental concepts necessary for understanding web database connectivity, making that next step should be a breeze.

Using Perl and CGI to Access Database Content

As was discussed in Chapter 11, "Introduction to Perl Programming," and Chapter 26, "Configuring a Web Server," the Common Gateway Interface (CGI) is a standardized API by which data (stored in variables) is passed from web forms into back-end programs that process it and return it to the web server. Any Perl script can be made into a working CGI program simply by making sure of the following:

  • Its permissions are set correctly.

  • It's in a location accessible by the web server.

  • It prints out the appropriate HTTP headers before it outputs content.

The same goes for programs written in Python, C, or any other language. For the purposes of the following discussion, we'll use Perl.
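The three requirements above can be made concrete with a minimal sketch of a CGI-ready Perl script (the filename and greeting text are illustrative, not from the chapter):

```perl
#!/usr/local/bin/perl
# Minimal CGI sketch. The script must be executable by the web server
# (for example, chmod 755 hello.cgi), and it must live in a directory
# the server is configured to treat as a CGI location, such as cgi-bin.

# Build the full response: the Content-type header, a blank line to end
# the headers, then the page content. The header must come before any
# other output.
sub response {
    return "Content-type: text/html\n\n"
         . "<HTML><BODY>Hello from CGI</BODY></HTML>\n";
}

print response();
```

Requesting the script through the web server returns the HTML page; running it at the command line simply prints the header and content to the terminal.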

You have already seen how to create a Perl script that interfaces with a MySQL or PostgreSQL back end. And, you've seen (in Chapter 26) how to create a Perl script that can handle data submitted through a web form. Creating a script that does both of these things is a simple matter of merging the functionality of the two kinds of scripts. The sections that follow explain how to accomplish this merger.

Inserting Data into the Database

Let's say you have a web form that takes a user's information (the same fields that you saw earlier in the Users table) and passes this information to a CGI program that then inserts the information fields into the database. To accomplish this, you simply have to make sure the script is set up to handle form input as well as database connections. Here's how to do that:

#!/usr/local/bin/perl

use Mysql;

$dbh = Mysql->connect(undef, "PictureArchive", "frank", "franks-password");

read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
  ($name, $value) = split(/=/, $pair);
  $value =~ tr/+/ /;
  $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
  $value =~ s/~!/ ~!/g;
  $FORM{$name} = $value;
}

print "Content-type: text/html\n\n";


Now, when someone accesses the web form page (which refers to this script in the ACTION field of the <FORM> tag, as you saw in Chapter 26), the variables from that form's fields are available to you in the %FORM associative array.

The code in this block performs several operations on the form input to prevent the user from inserting malicious HTML code instead of legitimate names and ages. However, when dealing with databases, there is an additional caveat: You must ensure that the data submitted by the user is compatible with the functions that insert it into the database as well as with the database itself. This is where the $dbh->quote function comes into play.

In a loop similar to the one in Listing 29.1, you can quote the input strings in the following way:


$inserth = $dbh->query("INSERT INTO Users (Fullname,Email,Age) VALUES ("
    .$dbh->quote($FORM{'fullname'}).","
    .$dbh->quote($FORM{'e-mail'}).","
    .$dbh->quote($FORM{'age'}).")");


The $dbh->quote function will properly escape all quotation marks, apostrophes, and other special characters to ensure they don't interfere with the query. The text, as entered by the user, is inserted into the database without being truncated or translated, or even causing the script to fail (which would have happened if someone had entered a string with double quotes in it!).
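To make the escaping concrete, here is a naive, illustrative emulation of the quoting behavior. The real $dbh->quote is driver-specific and handles more cases; this sketch is only for demonstration and should never replace the driver's own method:

```perl
# Naive emulation of SQL string quoting (illustrative only; always use
# the driver's own $dbh->quote in real code).
sub naive_quote {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;   # escape backslashes first
    $s =~ s/'/\\'/g;     # escape apostrophes
    return "'$s'";       # wrap the result in single quotes
}

print naive_quote("O'Brien"), "\n";
```

A name like O'Brien becomes 'O\'Brien', which the database engine parses back to the original string instead of treating the apostrophe as the end of the value.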

Similar failures can occur if a user attempts to input data that's incompatible with the type of column it's supposed to go into. For instance, if Age is defined as an INT column, and a user enters "I'm not telling!" for his age, a value of zero will be entered; similarly, if a VARCHAR field is only 10 characters long, and the user enters a string longer than that, the string will be truncated upon insertion into the database. Still other conditions can cause the query operation to fail completely, returning an error, and the entire script will crash.

For this reason, it's important to implement your own consistency-checking code in your script before the data gets to the database. For instance, you can ensure that data going into an INT column is a nonzero integer by putting it into a conditional block, as shown here:

if (int($FORM{'age'})) {
  $updh = $dbh->query("UPDATE Users SET Age='$FORM{age}' where id='$userid'");
} else {
  print "Error!";
}


Caution

Be aware that using single quotes around field names in associative array elements, such as $FORM{'age'}, won't work in this kind of query (where the entire query string is double quoted and the field contents are single quoted). Perl lets you get away with leaving out the single quotes within the variable, as shown here, but this can be an invitation to trouble. It may be safer for you to reassign the contents of $FORM{'age'} to another variable (for instance, $age) before using it in the query.
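Following the caution above, here is a sketch of the safer pattern. The %FORM contents and the user ID are stand-ins for real form data; only the query string is built, so the construction can be seen in isolation:

```perl
my %FORM   = ( age => 42 );   # stand-in for parsed form input
my $userid = 7;               # hypothetical user ID

# Copy the hash element into a plain scalar before interpolating it,
# so the quoting inside the query string stays unambiguous.
my $age = $FORM{'age'};
my $sql = "UPDATE Users SET Age='$age' WHERE id='$userid'";

print "$sql\n";
```

The resulting string contains no ambiguous nested quoting, and the same $age scalar can be validated (for example, with int()) before it ever reaches the query.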


Another safeguard is to use the MAXLENGTH attribute in your HTML form input fields. For instance, if the length of a VARCHAR column is 64 bytes, you can use the following HTML code for the corresponding form input field:

<INPUT NAME="fullname" MAXLENGTH="64">


This will prevent the user from entering any more than 64 characters, and you won't have to worry about the input data getting truncated when it's inserted.
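Because MAXLENGTH is enforced only by the browser (a user can bypass it by submitting the form through other means), it's prudent to repeat the length check on the server. A sketch, assuming the same 64-byte VARCHAR column and a simulated over-long input:

```perl
my %FORM = ( fullname => 'x' x 100 );   # simulated over-long input

# Truncate deliberately on the server rather than letting the database
# truncate silently (or reject the row) on insertion.
my $fullname = length($FORM{'fullname'}) > 64
             ? substr($FORM{'fullname'}, 0, 64)
             : $FORM{'fullname'};

print length($fullname), "\n";   # 64
```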

Good coding practices dictate that error-trapping should be done as close as possible to the user. With web-to-database programming, you're dealing with three different layers of code separating the user from the data store: the front end (HTML and JavaScript), the CGI layer (Perl, Python, or C), and the back end (the database APIs and the built-in table-consistency safeguards). The best-designed web applications have as much error-checking implemented in HTML and JavaScript as possible; the CGI programming layer is then used to catch more esoteric cases. Ideally, the database itself should never be exposed to input errors that fall through from those upper layers.

Creating a Search Form

Now that you know how to insert data into the database from a web form, you're ready to implement something a little more generally useful: a search form. A single HTML input field ($FORM{'keywords'}) can take an arbitrary set of words, separated by spaces. The Perl code underlying it will then divide that string into its component words and search the database for matches against each one. If more than one word matches a certain entry, that entry should get a better "score," and the results are sorted according to that score. Even better, you want to add extra weight to an entry that matches all the keywords in order. Here are the basic steps for creating the back-end code to handle this type of search form:

  1. Check to make sure there is something in the input field (for example, "pictures of yosemite"). Then, create a temporary table, which is a table that exists only for the duration of this program and is dropped at the end of the $dbh session. (This table, called Queries, will hold all the queries, wildcards and all, you plan to execute.)

  2. Split the contents of the input field into the @querywords array: ("pictures", "of", "yosemite").

  3. Create $comboquery, a string of all the keywords joined by the % wildcard: pictures%of%yosemite. Once it is wrapped in % wildcards in step 4, this becomes the query term that matches entries containing all the keywords, in order. Because you search for this term as well as for each keyword individually, its score reflects all the matches of the individual words plus that of its combined match; therefore, it should be your topmost result. Push $comboquery onto the @querywords array.

  4. Go through each keyword and create a variable with that word surrounded by % wildcards (such as %yosemite%) and quote it to protect your database. Then, insert it into the temporary Queries table.

You now have a table full of queries ready to perform.

The following examples show how these steps are executed in Perl. We begin with the form-handling preamble discussed earlier; after that, the steps you have just seen are pointed out in comments in the code.

#!/usr/local/bin/perl

use Mysql;

$dbh = Mysql->connect(undef, "PictureArchive", "frank", "franks-password");

read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
  ($name, $value) = split(/=/, $pair);
  $value =~ tr/+/ /;
  $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
  $value =~ s/~!/ ~!/g;
  $FORM{$name} = $value;
}

print "Content-type: text/html\n\n";

# Here begins the handling of the form input.
if ($FORM{'keywords'}) {                           # Step 1
  $sth = $dbh->query("CREATE TEMPORARY TABLE Queries (Query VARCHAR(255))");
  @querywords = split(/\s/,$FORM{'keywords'});     # Step 2
  $comboquery = $FORM{'keywords'};                 # Step 3
  $comboquery =~ s/\s+/%/g;
  push (@querywords,$comboquery);
  foreach (@querywords) {                          # Step 4
    $_ = $dbh->quote('%'.$_.'%');
    $sth = $dbh->query("insert into Queries (Query) VALUES ($_)");
  }


Now, you can execute the query. Here, you're matching filenames in Pictures against each pattern in the temporary Queries table, and a unique picture is defined by both User_ID and Filename, so we'll group by both of those. The COUNT() of the matches is called Score, and you'll sort by that column in descending order:


$searchh = $dbh->query("SELECT Filename,User_ID,COUNT(Filename) AS Score
    FROM Pictures,Queries WHERE Filename LIKE Query
    GROUP BY Filename,User_ID ORDER BY Score DESC");


Finally, you can assign the contents of each result row to an array and print out some appropriate HTML code, which you can season to your taste:

  while (@searchdata = $searchh->fetchrow) {
    print "$searchdata[0] (Score: $searchdata[2])<BR>\n";
  }
}


Needless to say, a more sophisticated searching algorithm can certainly be constructed, and MySQL makes it easy to build a more efficient one. MySQL supports full-text indexes: you can define a FULLTEXT index on a TEXT or VARCHAR column and then perform specialized searches on that column with great speed and efficiency. Refer to the MySQL documentation (http://www.mysql.com/doc/en/Fulltext_Search.html) for details on how this is done.
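As a sketch of how such an index might be used (the table and column names follow this chapter's examples; see the MySQL documentation for the exact syntax and requirements, such as the table types that support FULLTEXT):

```sql
-- Add a full-text index on the column to be searched:
ALTER TABLE Pictures ADD FULLTEXT (Filename);

-- Search it; MATCH ... AGAINST returns a relevance score:
SELECT Filename, User_ID,
       MATCH(Filename) AGAINST('yosemite') AS Score
  FROM Pictures
 WHERE MATCH(Filename) AGAINST('yosemite')
 ORDER BY Score DESC;
```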

Note

Note that FULLTEXT searches ignore keywords shorter than four characters by default; and because the indexing code is tuned to work best on very large data sets, searches on small data sets can produce very strange results.


Session Locking and Data Integrity

If your web application has a lot of users accessing it at the same time, it's important to code your application's database operations to be as atomic as possible. As we discussed earlier, an atomic operation is one in which all commands and responses are executed as a single unit, so that no competing process can interfere and potentially change the state of the database while your operation is executing.

MySQL and PostgreSQL both support the concept of transactions, where instead of issuing queries in autocommit mode (where commands are sent to the database engine as soon as they're entered, and the response data is immediately returned), you can create an atomic operation by using the BEGIN and COMMIT commands, with any number of query operations in between. The use of these commands in a transaction is shown here ($now is assumed to be a Perl variable containing the current UNIX time):

BEGIN;
SELECT * FROM Users;
UPDATE User_Prefs SET Last_mod='$now' WHERE Last_mod > '$now';
COMMIT;


The changes are not made permanent until the COMMIT command is given, and while the transaction is in progress, the tables in use are locked. When locked, the tables cannot be modified by any other processes, which have to wait until this transaction is completed.

In order to use such transactions in MySQL, you must begin the session with the following command:

SET AUTOCOMMIT=0;


PostgreSQL will automatically drop out of autocommit mode at the first BEGIN statement. Also, if you use the Perl or Python modules to connect to PostgreSQL, it will be in non-autocommit mode automatically. In fact, you can choose to use the BEGIN and COMMIT methods in the query context or through the dedicated database API commands, which do the same thing. In MySQL, transactions are a much newer addition to the engine and are most easily executed directly as SQL query commands.

The ROLLBACK command can be used instead of a COMMIT to cancel any queries that have been issued since the BEGIN command. If, for instance, you're working at the database command line and you mistakenly issue a DELETE FROM Users command, as long as you started with a BEGIN statement you can enter ROLLBACK and the deletion will be reversed. Naturally, using the BEGIN command is a desirable (if somewhat onerous) habit to develop!
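For example, at the command line, such a recovery looks like this:

```sql
BEGIN;
DELETE FROM Users;            -- oops: meant to include a WHERE clause
ROLLBACK;                     -- the table returns to its state at BEGIN
SELECT COUNT(*) FROM Users;   -- all rows are still present
```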

Caution

If you're using transactions, don't forget to use COMMIT after you've made your (intentional) changes! It's an easy step to leave out of your code, but if you do, it won't necessarily be obvious what's going wrong when you go to debug the script.


Security Concerns

CGI programs that connect to database back ends (particularly MySQL, which is entirely password-based) have one big security weakness: The password for connecting to the database must be hard-coded into the script or read in from another file. However you choose to implement this, if the web server is running as the www user, it will execute the CGI program under the ownership of www; any file that contains the password must be readable by the www user.

If you put the password in a file that's readable only by root, the CGI program won't be able to get access to it. However, if you leave it hard-coded into a file that the www user can read, anyone on your system can write her own CGI program that can read that same file and print it out for all to see, or use it to connect to your database and issue a few DROP DATABASE commands. This is an inherent flaw in database security as it currently exists in the UNIX world, and there's no elegant solution to it.

The easiest workaround is to disallow regular users from running CGI scripts. You may also be able to come up with creative solutions using CGIwrap (which is covered in Chapter 30, "Network Security"). But what it ultimately comes down to is a question of what kind of security level you want to enforce. Do you trust your local users? If so, you can probably get away with living dangerously. But if you don't trust the users with accounts on your system not to try to hack the database access password out of your scripts, you probably shouldn't be running a database with sensitive or critical data on that same server.

However, if you're running PostgreSQL, you can implement security that does not depend on passwords. The configuration we discussed earlier using ident and sameuser will only allow a user to access his own database; it doesn't allow the user to access any other databases. This type of configuration helps to prevent local users from breaking into databases where they don't belong, and you don't have to worry about your password being exposed by a malicious CGI program.
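As a reminder, the relevant pg_hba.conf entry looks something like the following. This is a sketch of the ident-based pattern, not a verbatim line from the earlier section, and the exact syntax varies between PostgreSQL versions:

```
# Local connections: authenticate with ident, and let each system user
# connect only to the database whose name matches the username.
local   sameuser   all   ident sameuser
```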




FreeBSD 6 Unleashed
ISBN: 0672328755
Year: 2006
Pages: 355
Authors: Brian Tiemann