Storing and Retrieving Images | MySQL and Perl for the Web

only for RuBoard - do not distribute or recompile

Images are used in many Web applications. This section describes a couple of small scripts that provide you with the ability to load images into MySQL over the Web or from the command line. It also discusses a script that serves images over the Web by pulling them from the database. My reason for placing a section on image storage and retrieval at this point in the chapter is that we ll need image-serving capability for the next section, which demonstrates how to write an electronic greeting card application. One feature of that application is that it enables users to select a picture to display with the card when the recipient views it.

Obviously, to implement that capability, we ll need to be able to send images to the client s browser. And we ll need to supply the application with some images first so that it has something to send! So, here we are.

Images Aren t Special! (Part I)

Although the scripts presented in this section show how to perform storage and retrieval using images, the techniques can be adapted easily for working with any kind of data, not just images. The information here can help you construct a database of sound or video clips, PDF files, compressed data, and so forth.

For storing the images, we ll use a table called image that contains a descriptive name for the image, the image itself, a thumbnail (small version) of the image, and the image MIME type:

 CREATE TABLE image  (     name        VARCHAR(60) NOT NULL,   # descriptive name for image      UNIQUE (name),      image       BLOB NOT NULL,          # image data      thumbnail   BLOB NOT NULL,          # thumbnail data      mime_type   VARCHAR(20) NOT NULL    # image MIME type  )

The scripts we ll develop refer to images by name. The name column has a unique index so that we don t give the same name to two different images. image and thumbnail are BLOB columns, the usual type for binary data. (VARCHAR isn t really suitable for such data, because it has a maximum length of just 255 characters.) The mime_type column contains values that identify the image format. These will be values such as image/gif, image/jpeg, image/png, and so forth. The image type value applies to both the image and its thumbnail.

To populate the image table, we ll write an image-loading script (upload_image.pl) that allows images stored on your local disk to be uploaded using your browser for storage into MySQL. It works as follows:

The initial page presents a form containing a file-upload field for specifying the image file you want to transfer and a field for giving the image a descriptive name. If you have an image of the U.S. flag in a file named us_flag.jpg, for example, you might give it a descriptive name of Stars & Stripes.
When you submit the form, your browser will send the field values back to the Web server, including the contents of the image file. upload_image.pl receives this information, creates a thumbnail version of the image, and stores everything into the database. It also presents a confirmation page that reports the status of the upload operation and displays the image so that you can see that it really was received properly.

Where Should Images Be Stored?

One of the ongoing debates about images and databases is whether to store images in the database or whether to store them in the file system and store only the pathname in the database. upload_image.pl shows how to store images in the database so that you ll know how to do it if you want to, but I m not going to try to settle the debate. If you want more information about the pros and cons of each approach, search the MySQL mailing list archives.

The dispatch logic for upload_image.pl is similar to that of several previous applications, so I won t show it here. It invokes display_upload_form() to present the entry form and process_form() to handle submitted images.

display_upload_form() generates the image-selection form. The important thing you should notice about the code is that it uses start_multipart_form() rather than start_form(). File uploads require form contents to be encoded differently from regular forms (otherwise the file content transfer won t work properly):

 sub display_upload_form  {     print start_multipart_form (-action => url ()),              "Image file: ", br (),              filefield (-name => "image", -size => 60),              br (),              "Descriptive name for image: ", br (),              textfield (-name => "name",                          -value => "",                          -override => 1,                          -size => 60),              br (), br (),              submit (-name => "choice", -value => "Submit"),              end_form ();  }

When the user submits an image, process_form() is called. This function makes sure the name and image file were both supplied, creates the thumbnail from the image, stores everything in the database, and displays a confirmation page:

 sub process_form  { my $name = param ("name");              # image name  my $image = param ("image");            # image file  my @errors = ();  my $dbh;  my $mime_type;  my ($full, $thumb);  my $serve_url;      $image = "" unless defined (($image);# convert undef to empty string      $name = WebDB::trim ($name);         # trim extraneous whitespace from name      # check for required fields      push (@errors, "Please supply an image name") if $name eq "";      push (@errors, "Please specify an image file") if $image eq "";      if (@errors)      {         print p ("The following errors occurred:");          print ul (li (\@errors));          print p ("Please click your Browser's Back button to\n"                  . "return to the previous page and correct the problem.");          return;      }      # Form was okay; get image type and contents and create new record.      # Use REPLACE to clobber any old image with the same name.      $mime_type = uploadInfo ($image)->{'Content-Type'};      ($full, $thumb) = read_image_file ($image);      $dbh = WebDB::connect ();      $dbh->do (             "REPLACE INTO image              (name,image,thumbnail,mime_type)              VALUES(?,?,?,?)",                  undef,                  $name, $full, $thumb, $mime_type);      $dbh->disconnect ();      # Image was stored into database successfully.  Present confirmation      # page that displays both the full size and thumbnail images.      print p ("The image upload was successful.");      # encode the name with escape() for URL, but with escapeHTML() otherwise      $serve_url = sprintf ("serve_image.pl?name=%s", escape ($name));      $name = escapeHTML ($name);      $mime_type = escapeHTML ($mime_type);      print p ("Image name: $name"),              p ("MIME type: $mime_type"),              p ("Full size image:"),              img ({-src => $serve_url, -alt => $name}), "\n",              p ("Thumbnail image:"),              img ({-src => "$serve_url;thumbnail=1", -alt => $name}), "\n";      # Display link to main page so user can upload another image      print hr (), a ({-href => url ()}, "Upload next image");  }

process_form() validates the form by making sure that both the descriptive image name and the file pathname are present. We do the usual thing of trimming extraneous whitespace from the descriptive name. However, we don t do that for the pathname because that value legitimately could begin or end with spaces, and trimming it would change the name. (CGI.pm makes the file contents and information about the file available through the pathname; changing it would render the script unable to access the file.)

If any errors occur during validation, we indicate what they are and instruct the user to return to the preceding page to correct the problems. You may recall that in the discussion of form validation for the product-registration application, I discouraged the approach of having users click the Back button, favoring instead the method of redisplaying the form on the same page as the error messages. Aren t I contradicting that advice here? Yes, I am; and before you read the footnote that explains why, I invite you to consider why this might be.^[12]

^[12] The reason I don t follow my own advice here is that CGI.pm won t initialize the value of file-upload fields. This prevents script writers from trying to trick users into uploading specific files, but it also means you can t take advantage of CGI.pm s sticky form behavior for file fields. In fact, this isn t just a CGI.pm behavior; browsers themselves may refuse to honor a value attribute for a file-upload field, even if your script includes one by writing the HTML directly. This means we can t properly redisplay the form with the values submitted by the user, and therefore really don t have much choice but to ask the user to return to the previous page.

If the form contents check out okay, we get the image s MIME type using uploadInfo(), a CGI.pm function that provides information about the uploaded file, given the filename as an argument. (This function is described in Chapter 4, Generating and Processing Forms, in the section that discusses the sample form application.) The return value is a reference to a hash of file attributes and values. One of these attributes, Content-Type, gives us the image s MIME type.

Next, we read the image from the temporary file in which it is stored. read_image_file() (discussed shortly) reads the image file, creates the thumbnail, and returns both values. At this point, we have all the information we need to create a new image table record. The statement that adds the record uses REPLACE rather than INSERT to make it easy to overwrite an existing image with a new one. (INSERT would generate an error, and INSERT IGNORE would keep the old image and discard the new one. Neither behavior is desirable here.)

Images Aren t Special! (Part II)

I have the feeling that I should write a headline in GREAT BIG LETTERS that the do() statement used in the process_form() function answers the often-asked question, How, oh how, do I insert images into my database? What s the special trick? Well, actually, there isn t one. Images are inserted the same way as any other kind of data: Use placeholders or the quote() function. The usual thing that gives people trouble putting images in a database is the failure to properly escape the special characters that images usually contain. Images consist of binary data, so attempting to put an image directly into the query string without properly escaping its content almost certainly will fail. If you use a placeholder, or insert the image data into the query string after calling quote(), you ll have no problems.

If images seem special compared to text values, due to the need to escape special characters, that s a sign you re probably not processing text properly, either. With text values, you can often get away with not using placeholders or quote(), but that doesn t mean it s correct to do so. Text can contain special characters that cause problems, too such as quote characters. It s important to use placeholders or quote() for all data, not just images or other binary data. If you do that consistently, you ll likely find that the magic conceptual distinction between text and images disappears.

With the image safely stored away in the database, we can present a confirmation page to the user. Given that we re working with images, we may as well make this a graphical page. Therefore, we ll not only inform the user that the upload succeeded, but we'll also display the image and its thumbnail as well.

The image parts of the confirmation page are nothing more than <img> tags that reference an image stored in the database. If the image s name is My Image, for instance, the tags will look like this:

 <img src="/books/2/307/1/html/2/serve_image.pl?name=My%20Image" alt="My Image">  <img src="/books/2/307/1/html/2/serve_image.pl?name=My%20Image;thumbnail=1" alt="My Image">

When your browser sees each of these tags in the page, it will send requests to the Web server to retrieve the corresponding images. As the tags show, these requests are handled by another script, serve_image.pl. (We have yet to write this script, but we ll get to it soon.) serve_image.pl yanks an image out of the database and turns it into a valid image transfer to the browser. The <img> tags refer to the script without a leading pathname; we can get away with that if we install serve_image.pl in the same directory as upload_image.pl. The name parameter specifies which image serve_image.pl should return to the browser, and the absence or presence of the thumbnail parameter indicates whether it should return the full-size image or the thumbnail.

The last thing process_form() displays in the confirmation page is a link to the main upload_image.pl page so that the user can transfer another image if desired.

We still have to see how to read the contents of the uploaded image file from the temporary file where it s stored and how to produce a thumbnail from it. Let s return to read_image_file(), the function that actually does this. This function uses some of the capabilities of Image::Magick, a Perl module that allows sophisticated image manipulations to be performed. (You should obtain Image::Magick from the CPAN and install it if you don t already have it.)

We pass read_image_file() the value of the image parameter from the upload form. That parameter contains the name of the file. However, CGI.pm performs a little trick that also allows it to be treated as an open file handle pointing to the uploaded file, so we can use it to read and process the file:

 use Image::Magick;  sub read_image_file  { my $fh = shift;             # filename/file handle  my $img = new Image::Magick;  my ($full, $thumb);  my $err;      # read full-size image directly from upload file      (read ($fh, $full, (stat ($fh))[7]) == (stat ($fh))[7])          or error ("Can't read image file: $!");      # produce thumbnail from full-size image      $err = $img->BlobToImage ($full);      error ("Can't convert image data: $err") if $err;      $err = $img->Scale (geometry => "64x64");      error ("Can't scale image file: $err") if $err;      $thumb = $img->ImageToBlob ();      return ($full, $thumb);  }

To handle the image, we create a new Image::Magick object, then invoke a few of its methods after reading the contents of the file containing the full-size image. BlobToImage() converts the raw image data to a form that Image::Magick can use, and Scale() resizes the image to produce the thumbnail. The 64x64 argument to Scale() does not indicate the final pixel size of the resulting image; it indicates the boundary within which the resized image must fit. (That is, Scale() does not change the aspect ratio of the image, only its size.) After scaling the image, we call ImageToBlob() to retrieve the thumbnail as a string.^[13]

^[13] Here s something that may or may not affect you: I find that Image::Magick often crashes in read_image_file() at the ImageToBlob() call if the image is in GIF format and uses transparency.

read_image_file() uses error(), a small utility function that just displays an error message, closes the page, and exits:

 sub error  { my $msg = shift;      print p (escapeHTML ("Error: $msg")), end_html ();      exit (0);  }

Security and File Uploads

File-uploading operations have the potential to cause some security problems. See Chapter 9, Security and Privacy Issues, for a discussion of these problems and what you might want to do about them.

upload_image.pl is complete at this point, so you can install it and try it out right now if you like. Note that although the script should upload images properly, the confirmation page won t yet display the uploaded images. That s because we haven t yet written serve_image.pl, the script that handles requests to display images from the image table.

Before we create serve_image.pl, I want to take a slight detour, because I personally find it really tedious to upload image files one by one over the Web. That s convenient for occasional transfers; but when I have a pile of images, I d rather transfer them to a UNIX box and run a command-line script that loads them. Here s a command-line equivalent to the upload_image.pl script called load_image.pl :

 #! /usr/bin/perl -w  # load_image.pl - load an image file into the image table  use strict;  use lib qw(/usr/local/apache/lib/perl);  use Image::Magick;  use WebDB;  # Determine image file and image name.  Use basename of filename if no image  # name is given.  die "Usage: $0 image_file [ image_name ]\n" unless @ARGV >= 1 && @ARGV <= 2;  my $image_file = shift (@ARGV);  my $image_name = shift (@ARGV);  ($image_name = $image_file) =~ s| */|| if !defined $image_name;  # determine MIME type of image file from filename extension  my %mime_map = (     "gif" => "image/gif",      "jpg" => "image/jpeg",      "jpeg" => "image/jpeg",      "jpe" => "image/pjpeg",      "png" => "image/png"  );  my $mime_type = $mime_map{lc ($1)} if $image_file =~ /\.([^.]+)$/;  die "Cannot determine image MIME type\n" if !defined $mime_type;  # Read image file and generate thumbnail from image  my $img = new Image::Magick;  my ($err, $image_data, $thumbnail_data);  $err = $img->Read ($image_file);  die "Can't read image file: $err\n" if $err;  $image_data = $img->ImageToBlob ();  $err = $img->Scale (geometry => "64x64");  die "Can't scale image file: $err\n" if $err;  $thumbnail_data = $img->ImageToBlob ();  # Insert new record into the database image table  my $dbh = WebDB::connect ();  $dbh->do (         "REPLACE INTO image          (name,image,thumbnail,mime_type)          VALUES(?,?,?,?)",              undef,              $image_name, $image_data, $thumbnail_data, $mime_type);  $dbh->disconnect ();  warn "$image_name loaded\n";     # announce success of image storage operation  exit (0);

load_image.pl expects to find either one or two arguments on the command line. The first is the image filename. The second, if present, is the descriptive name to give to the image. (If not present, the filename itself is used as the descriptive name.) The script determines the image type from the filename suffix.

Of course, I don t really want to type in a bunch of load_image.pl commands at the shell prompt any more than I want to upload images over the Web one by one. So I FTP the images to a directory Images on my UNIX box, and then log in there and write a shell script load_images.sh that looks like this:

 #! /bin/sh  ./load_image.pl Images/blackcat.jpg "Black Cat"  ./load_image.pl Images/flowers.jpg "Flower Bouquet"  etc.

The basis of this script can be created using just a few commands:

 % ls Images > load_images.sh  % chmod +x load_images.sh  % vi load_images.sh  :1,$s/.*/.\/load_image.pl Image\/&/

All that needs to be added is the #! line at the beginning of the script and the descriptive names at the end of the command lines, and then I can load all the images easily by running load_images.sh. This is particularly useful when moving all the images to another machine, because the same script can be used there. (In other words, writing the script creates a repeatable action.)

If you don t have a shell account, you can t use this command-line approach. If you do have one, however, to my mind this method is much preferable to uploading images individually or typing a bunch of individual commands. If you re setting up the image table using the webdb distribution that accompanies this book, you ll find that its image directory includes the load_images.sh script and an Images subdirectory containing a set of sample images to use.

Serving Images

Now that we can get images into the database, how do we get them out again? This section shows how to write serve_image.pl, the script that retrieves an image from the image table and displays it in a Web page. We need this script so that the confirmation page generated by upload_image.pl can properly show the uploaded images. We ll also use serve_image.pl for image display in the electronic greeting card application developed later in this chapter.

Before we write this script, let s briefly go over the mechanism used to transfer images over the Web to browsers. Images are referenced from Web pages using <img> tags. Typically, the tag refers to a static file on the Web server host. For example, the following tag refers to the Powered by Apache image file located in the top directory of the document tree on the host www.snake.net:

 <img src="/books/2/307/1/html/2/http://www.snake.net/apache_pb.gif">

A browser retrieves the image by sending the URL named in the src attribute to the Web server. The server in turn satisfies the request by opening the file and sending it to the browser, preceded by some header information that allows the browser to make sense of the data. Typical headers are Content-Type: to specify the MIME type for the image format gif (image/gif, image/jpeg, and so forth), and Content-Length: to let the browser know how many data bytes to expect.

However, images can be served from sources other than files. If we use a script to duplicate the kind of output the Web server sends when it transfers a static image file, the browser won t care. The script can do this easily by reading a record from the image table and using it to generate a request response. The mime_type column value indicates what kind of Content-Type: header to send and the length of the image data provides a value for the Content-Length: header. We write the headers followed by the image data, and we re done. (Now you see why we store the MIME type in the image table.)

Naturally, we don t want to write a different script for each image, so we ll have serve_image.pl accept a name parameter at the end of the URL specifying the name of the image to display. Additionally, if the URL also includes a thumbnail parameter, we ll serve the thumbnail image rather than the full-size version. And as a final touch, let s give the script the capability to present a gallery page if we invoke it with a gallery parameter rather than an image name. In this case, the script will look up all the image names and descriptions and write an HTML page that includes an <img> tag for the thumbnail version of each one. The thumbnails will be clickable so that you can select any of them to see the corresponding full-size image. (In other words, serve_image.pl will write an HTML page that causes itself to be invoked in its image-serving capacity.) The URLs for invoking serve_image.pl in these various ways look like this:

 http://www.snake.net/cgi-perl/serve_image.pl?name=image_name  http://www.snake.net/cgi-perl/serve_image.pl?name=image_name;thumbnail=1  http://www.snake.net/cgi-perl/serve_image.pl?gallery=1

The dispatch logic for serve_image.pl extracts the URL parameters and determines what to do as follows:

 if (defined (param ("name")))  {     display_image (param ("name"), param ("thumbnail"));  }  elsif (defined (param ("gallery")))  {     display_gallery ()  }  else  {     error ("Unknown request type");  }

The image-serving code really is pretty trivial. It checks whether to use the full-size image or the thumbnail, and then looks up the appropriate record from the image table, determines the image length from the image data, and writes the headers followed by the data:

 sub display_image  { my ($name, $show_thumbnail) = @_;  my $col_name = (defined ($show_thumbnail) ? "thumbnail" : "image");  my ($dbh, $mime_type, $data);      $dbh = WebDB::connect ();      ($mime_type, $data) = $dbh->selectrow_array (                     "SELECT mime_type, $col_name FROM image WHERE name = ?",                      undef, $name);      $dbh->disconnect ();      # did we find a record?      error ("Cannot find image named $name") unless defined ($mime_type);      print header (-type => $mime_type, -Content_Length => length ($data)),              $data;  }

By default, the header() function writes a Content-Type: header with a value of text/html if you don t specify any type parameter. We need to override that with the MIME type of the image, otherwise the browser may misinterpret the output and try to display the image data as text. (You can see whether your browser makes a mess of images by removing the type parameter from the header() call and then requesting an image from your browser.)

If the gallery parameter is present in the URL, serve_image.pl generates an HTML page that displays the thumbnails for all the images in the image table:

 sub display_gallery  { my ($dbh, $sth);       print header (), start_html ("Image Gallery");      $dbh = WebDB::connect ();      $sth = $dbh->prepare ("SELECT name FROM image ORDER BY name");      $sth->execute ();      # we're fetching a single value (name), so we can call fetchrow_array()      # in a scalar context to get the value      while (my $name = $sth->fetchrow_array ())      {         # encode the name with escape() for the URL, with escapeHTML() otherwise          my $url = url () . sprintf ("?name=%s", escape ($name));          $name = escapeHTML ($name);          print p ($name),              a ({-href => $url},     # link for full size image                  # embed thumbnail as the link content to make it clickable                  img ({-src => "$url;thumbnail=1", -alt => $name})              ),              "\n";      }      $sth->finish ();      $dbh->disconnect ();      print end_html ();  }

For each image, display_gallery() displays the image name and an <img> tag for the thumbnail. The <img> tag is embedded within a hyperlink that takes the user to the full-size image; you can click any thumbnail to see the larger version.

The error() utility routine handles any problems by presenting a short error page. It differs from the version used in upload_image.pl slightly because it generates a complete HTML page:

 sub error  { my $msg = shift;      print header (),              start_html ("Error"),              p (escapeHTML ($msg)),              end_html ();      exit (0);  }

Suggested Modifications

upload_image.pl doesn t check whether the uploaded file really is an image. Is that a problem? If so, can you fix it?

It s possible to load into the image table images that some browsers may be unable to display. For example, older browsers likely won t know what to do with images in PNG (Portable Network Graphics) format. Modify the display_gallery() function of serve_image.pl to exclude images except those in formats the browser understands.

If you load lots of images into the image table, you d probably want to modify the gallery display code in serve_image.pl to split up the gallery into a multiple-page display. Techniques for multiple-page presentations are described in Chapter 7, Performing Searches.

serve_image.pl assumes that it is supposed to read images from the image table. If you use it to serve images on behalf of many different applications, you may find it limiting to share the image table among them all. Modify serve_image.pl to accept a table parameter on the URL so that applications can specify which table to use. To preserve compatibility with its original behavior, have the default table be image if no table parameter is present.

Modify upload_image.pl and load_image.pl to store image files in the file system and reference them from the database by storing the pathname in the image table. When you do this, can you toss serve_image.pl in the trashcan?

only for RuBoard - do not distribute or recompile