Recipe 18.6. Storing Images or Other Binary Data

Problem

You want to store images in MySQL.

Solution

That's not difficult, provided that you follow the proper precautions for encoding the image data.

Discussion

Web sites are not limited to displaying text. They can also serve various forms of binary data such as images, music files, PDF documents, and so forth. Images are a common kind of binary data, and because image storage is a natural application for a database, a very common question is "How do I store images in MySQL?" Many people will answer this question by saying, "Don't do it!" and some of the reasons are discussed in the sidebar "Should You Store Images in Your Database?" Because it's important to know how to work with binary data, this section does show how to store images in MySQL. Nevertheless, in recognition that that may not always be the best thing to do, the section also shows how to store images in the filesystem.

Although the discussion here is phrased in terms of working with images, the principles apply to any kind of binary data, such as PDF files or compressed text. In fact, they apply to any kind of data at all, including text. People tend to think of images as special somehow, but they're not.

One reason that image storage confuses people more often than does storing other types of information like text strings or numbers is that it's difficult to type in an image value manually. For example, you can easily use mysql to enter an INSERT statement to store a number like 3.48 or a string like Je voudrais une bicyclette rouge, but images contain binary data and it's not easy to refer to them by value. So you need to do something else. Your options are:

Use the LOAD_FILE⁠(⁠ ⁠ ⁠) function.
Write a program that reads in the image file and constructs the proper INSERT statement for you.

Should You Store Images in Your Database?

Deciding where to store images is a matter of trade-offs. There are advantages and disadvantages regardless of whether you store images in the database or in the filesystem:

Storing images in a database table bloats the table. With a lot of images, you're more likely to approach any limits your operating system places on table size. On the other hand, if you store images in the filesystem, directory lookups may become slow. To avoid this, you may be able to implement some kind of hierarchical storage or use a filesystem that has good lookup performance for large directories (such as the Reiser filesystem that is available on Linux).
Using a database centralizes storage for images that are used across multiple web servers on different hosts. Images stored in the filesystem must be stored locally on the web server host. In a multiple-host situation, that means you must replicate the set of images to the filesystem of each host. If you store the images in MySQL, only one copy of the images is required because each web server can get the images from the same database server.
When images are stored in the filesystem, they constitute in essence a foreign key. Image manipulation requires two operations: one in the database and one in the filesystem. This in turn means that if you require transactional behavior, it's more difficult to implementnot only do you have two operations, but they take place in different domains. Storing images in the database is simpler because adding, updating, or removing an image requires only a single row operation. It becomes unnecessary to make sure the image table and the filesystem remain in synchrony.
It can be faster to serve images over the Web from the filesystem than from the database, because the web server itself opens the file, reads it, and writes it to the client. Images stored in the database must be read and written twice. First, the MySQL server reads the image from the database and writes it to your web script. Then the script reads the image and writes it to the client.
Images stored in the filesystem can be referred to directly in web pages by means of <img> tag links that point to the image files. Images stored in MySQL must be served by a script that retrieves an image and sends it to the client. However, even if images are stored in the filesystem and accessible to the web server, you might still want to serve them through a script. This would be appropriate if you need to account for the number of times you serve each image (such as for banner ad displays where you charge customers by the number of ad impressions) or if you want to select an image at request time (such as when you pick an ad at random).
If you store images in the database, you need to use a data type such as a BLOB. This is a variable length type, so the table itself will have variable-length rows. For the MyISAM storage engine, operations on fixed-length rows are often quicker, so you may gain some table lookup speed by storing images in the filesystem and using fixed-length types for the columns in the image table.

Storing images with LOAD_FILE⁠(⁠ ⁠ ⁠)

The LOAD_FILE⁠(⁠ ⁠ ⁠) function takes an argument indicating a file to be read and stored in the database. For example, an image stored in /tmp/myimage.png might be loaded into a table like this:

INSERT INTO mytbl (image_data) VALUES(LOAD_FILE('/tmp/myimage.png'));

To load images into MySQL with LOAD_FILE⁠(⁠ ⁠ ⁠), certain requirements must be satisfied:

The image file must be located on the MySQL server host.
The file must be readable by the server.
You must have the FILE privilege.

These constraints mean that LOAD_FILE⁠(⁠ ⁠ ⁠) is available only to some MySQL users.

Storing images using a script

If LOAD_FILE⁠(⁠ ⁠ ⁠) is not an option, or you don't want to use it, you can write a short program to load your images. The program should either read the contents of an image file and create a row that contains the image data, or create a row that indicates where in the filesystem the image file is located. If you elect to store the image in MySQL, include the image data in the row-creation statement the same way as any other kind of data. That is, you either use a placeholder and bind the data value to it, or else encode the data and put it directly into the statement string.

The script shown in this recipe, store_image.pl, runs from the command line and stores an image file for later use. The script takes no side in the debate over whether to store images in the database or the filesystem. Instead, it demonstrates how to implement both approaches! Of course, this requires twice the storage space. To adapt this script for your own use, you'll want to retain only the parts that are appropriate for whichever storage method you want to implement. The necessary modifications are discussed at the end of this section.

The store_image.pl script uses an image table that includes columns for the image ID, name, and MIME type, and a column in which to store the image data:

CREATE TABLE image (   id    INT UNSIGNED NOT NULL AUTO_INCREMENT, # image ID number   name  VARCHAR(30) NOT NULL,                 # image name   type  VARCHAR(20) NOT NULL,                 # image MIME type   data  MEDIUMBLOB NOT NULL,                  # image data   PRIMARY KEY (id),                           # id and name are unique   UNIQUE (name) );

The name column indicates the name of the image file in the directory where images are stored in the filesystem. The data column is a MEDIUMBLOB, which is good for images smaller than 16 MB. If you need larger images, use a LONGBLOB column.

It is possible to use the name column to store full pathnames to images in the database, but if you put them all under the same directory, you can store names that are relative to that directory, and name values will take less space. That's what store_image.pl does. It needs to know the pathname of the image storage directory, which is what its $image_dir variable is for. You should check this variable's value and modify it as necessary before running the script. The default value reflects where I like to store images, but you'll need to change it according to your own preferences. Make sure to create the directory if it doesn't exist before you run the script, and set its access permissions so that the web server can read and write files there. You'll also need to check and possibly change the image directory pathname in the display_image.pl script discussed later in this chapter.

NOTE

The image storage directory should be outside the web server document tree. Otherwise, a user who knows or can guess the location may be able to upload executable code and cause it to run by requesting it with a web browser.

store_image.pl looks like this:

#!/usr/bin/perl # store_image.pl - read an image file, store in the image table and # in the filesystem.  (Normally, you'd store images only in one # place or another; this script demonstrates how to do both.) use strict; use warnings; use Fcntl;    # for O_RDONLY, O_WRONLY, O_CREAT use FileHandle; use Cookbook; # Default image storage directory and pathname separator # (CHANGE THESE AS NECESSARY) # The location should NOT be within the web server document tree my $image_dir = "/usr/local/lib/mcb/images"; my $path_sep = "/"; # Reset directory and pathname separator for Windows/DOS if ($^O =~ /^MSWin/i || $^O =~ /^dos/) {   $image_dir = "C:\\mcb\\images";   $path_sep = "\\"; } -d $image_dir or die "$0: image directory ($image_dir)\ndoes not exist\n"; # Print help message if script was not invoked properly (@ARGV == 2 || @ARGV == 3) or die <<USAGE_MESSAGE; Usage: $0 image_file mime_type [image_name] image_file = name of the image file to store mime_time = the image MIME type (e.g., image/jpeg or image/png) image_name = alternate name to give the image image_name is optional; if not specified, the default is the image file basename. USAGE_MESSAGE my $file_name = shift (@ARGV);  # image filename my $mime_type = shift (@ARGV);  # image MIME type my $image_name = shift (@ARGV); # image name (optional) # if image name was not specified, use filename basename # (allow either / or \ as separator) ($image_name = $file_name) =~ s|.*[/\\]|| unless defined $image_name; my $fh = new FileHandle; my ($size, $data); sysopen ($fh, $file_name, O_RDONLY)   or die "Cannot read $file_name: $!\n"; binmode ($fh);    # helpful for binary data $size = (stat ($fh))[7]; sysread ($fh, $data, $size) == $size   or die "Failed to read entire file $file_name: $!\n"; $fh->close (); # Save image file in filesystem under $image_dir.  (Overwrite file # if an old version exists.) my $image_path = $image_dir . $path_sep . $image_name; sysopen ($fh, $image_path, O_WRONLY|O_CREAT)   or die "Cannot open $image_path: $!\n"; binmode ($fh);    # helpful for binary data syswrite ($fh, $data, $size) == $size   or die "Failed to write entire image file $image_path: $!\n"; $fh->close (); # Save image in database table.  (Use REPLACE to kick out any old image # that has the same name.) my $dbh = Cookbook::connect (); $dbh->do ("REPLACE INTO image (name,type,data) VALUES(?,?,?)",           undef,           $image_name, $mime_type, $data); $dbh->disconnect ();

If you invoke the script with no arguments, it displays a short help message. Otherwise, it requires two arguments that specify the name of the image file and the MIME type of the image. By default, the file's basename (final component) is also used as the name of the image stored in the database and in the image directory. To use a different name, provide it using an optional third argument.

The script is fairly straightforward. It performs the following actions:

Check that the proper number of arguments was given and initialize some variables from them.
Make sure the image directory exists. If it does not, the script cannot continue.
Open and read the contents of the image file.
Store the image as a file in the image directory.
Store a row containing identifying information and the image data in the image table.

store_image.pl uses REPLACE rather than INSERT so that you can replace an old image with a new version having the same name simply by loading the new one. The statement specifies no id column value; id is an AUTO_INCREMENT column, so MySQL assigns it a unique sequence number automatically. Note that if you replace an image by loading a new one with the same name as an existing image, the REPLACE statement will generate a new id value. If you want to keep the old value, you should use INSERT ... ON DUPLICATE KEY UPDATE instead (Section 11.14). This will insert the row if the name doesn't already exist, or update the image value if it does.

The REPLACE statement that stores the image information into MySQL is relatively mundane:

$dbh->do ("REPLACE INTO image (name,type,data) VALUES(?,?,?)",           undef,           $image_name, $mime_type, $data);

If you examine that statement looking for some special technique for handling binary data, you'll be disappointed, because the $data variable that contains the image isn't treated as special in any way. The statement refers to all column values uniformly using ? placeholder characters and the values are passed at the end of the do⁠(⁠ ⁠ ⁠) call. Another way to accomplish the same result is to perform escape processing on the column values explicitly and then insert them directly into the statement string:

$image_name = $dbh->quote ($image_name); $mime_type = $dbh->quote ($mime_type); $data = $dbh->quote ($data); $dbh->do ("REPLACE INTO image (name,type,data)            VALUES($image_name,$mime_type,$data)");

Many people think image-handling is a lot more troublesome than it really is. If you properly handle image data in a statement by using placeholders or by encoding it, you'll have no problems. If you don't, you'll get errors. It's as simple as that. This is no different from how you should handle other kinds of data, even text. After all, if you insert into a statement a piece of text that contains quotes or other special characters without escaping them, the statement will blow up in your face. So the need for placeholders or encoding is not some special thing that's necessary only for imagesit's necessary for all data. Say it with me: "I will always use placeholders or encode my column values. Always. Always, always, always." (Having said that, I feel obliged to point out that if you know enough about a given valuefor example, if you're absolutely certain that it's an integerthere are times you can get away with breaking this rule. Nevertheless, it's never wrong to follow the rule.)

To try the script, change location into the apache/images directory of the recipes distribution. That directory contains the store_image.pl script, and some sample images are in its flags subdirectory (they're pictures of national flags for several countries). To store one of these images, run the script like this under Unix:

% ./store_image.pl flags/iceland.jpg image/jpeg

Or like this under Windows:

C:\> store_image.pl flags\iceland.jpg image/jpeg

store_image.pl takes care of image storage, and the next section discusses how to retrieve images to serve them over the Web. What about deleting images? I'll leave it to you to write a utility to remove images that you no longer want. If you are storing images in the filesystem, remember to delete both the database row and the image file that the row points to.

store_image.pl stores each image both in the database and in the filesystem for illustrative purposes, but of course that makes it inefficient. Earlier, I mentioned that if you use this script as a basis for your own applications, you should modify it to store images only in one placeeither in the database or in the filesystemnot in both places. The modifications are as follows:

To adapt the script to store images only in MySQL, there is no need to create an image directory, and you can delete the code that checks for that directory's existence and that writes image files there.
To adapt the script for storage only in the filesystem, drop the data column from the image table, and modify the REPLACE statement so it doesn't refer to that column.

These modifications also apply to the display_image.pl image processing script shown in Section 18.7.