Writing Your Own Export Programs | Importing and Exporting Data

10.18.1 Problem

MySQL's built-in export capabilities don't suffice.

10.18.2 Solution

Write your own utilities.

10.18.3 Discussion

When existing software doesn't do what you want, you can write your own programs to export data. This section shows how to write a Perl script, mysql_to_text.pl, that executes an arbitrary query and exports it in the format you specify. It writes output to the client host and can include a row of column labels (neither of which SELECT ... INTO OUTFILE can do). It produces multiple output formats more easily than by using mysql with a postprocessor, and it writes to the client host, unlike mysqldump, which can write only SQL-format output to the client.

mysql_to_text.pl is based on the Text::CSV_XS module, which you'll need to obtain if it's not installed on your system. Once it's installed, you can read the documentation like so:

% perldoc Text::CSV_XS

This module is convenient because all you have to do is provide an array of column values, and it will package them up into a properly formatted output line. This makes it relatively trivial to convert query output to CSV format. But the real benefit of using the Text::CSV_XS module is that it's configurable; you can tell it what kind of delimiter and quote characters to use. This means that although the module produces CSV format by default, you can configure it to write a variety of output formats. For example, if you set the delimiter to tab and the quote character to undef, Text::CSV_XS generates tab-delimited output. We'll take advantage of that flexibility in this section for writing mysql_to_text.pl, and later in Recipe 10.19 to write a file-processing utility that converts files from one format to another.

mysql_to_text.pl accepts several command-line options. Some of these are for specifying MySQL connection parameters (such as --user, --password, and --host). You're already familiar with these, because they're used by the standard MySQL clients like mysql. The script also can obtain connection parameters from an option file, if you specify a [client] group in the file. The other options that mysql_to_text.pl accepts are as follows:

--execute= query, -e query

Execute query and export its output.

--table= tbl_name, -t tbl_name

Export the contents of the named table. This is equivalent to using --execute to specify a query value of SELECT * FROM tbl_name.

--labels

Write an initial row of column labels.

--delim= str

Set the column delimiter sequence to str. The option value may consist of one or more characters. The default is to use tabs.

--quote= c

Set the column value quote character to c. The default is to not quote anything.

--eol= str

Set the end-of-line sequence to str. The option value may consist of one or more characters. The default is to use linefeeds.

The defaults for the --delim, --quote, and --eol options correspond to those used by LOAD DATA and SELECT ... INTO OUTFILE.

The final argument on the command line should be the database name, unless it's implicit in the query. For example, these two commands are equivalent; each exports the passwd table in colon-delimited format into a file named tmp:

% mysql_to_text.pl --delim=":" --table=passwd cookbook > tmp
% mysql_to_text.pl --delim=":" --table=cookbook.passwd > tmp

To generate CSV output with CRLF line terminators instead, use a command like this:

% mysql_to_text.pl --delim="," --quote=""" --eol="
" 
 --table=cookbook.passwd > tmp

That's a general description of how you use mysql_to_text.pl. Now let's discuss how it works. The initial part of the mysql_to_text.pl script declares a few variables, then processes the command-line arguments, using option-processing techniques developed in Recipe 2.11. (As it happens, most of the code in the script actually is devoted to processing the command-line arguments and getting set up to run the query!)

#! /usr/bin/perl -w
# mysql_to_text.pl - export MySQL query output in user-specified text format

# Usage: mysql_to_text.pl [ options ] [db_name] > text_file

use strict;
use DBI;
use Text::CSV_XS;
use Getopt::Long;
$Getopt::Long::ignorecase = 0; # options are case sensitive
$Getopt::Long::bundling = 1; # allow short options to be bundled

my $prog = "mysql_to_text.pl";

# ... construct usage message variable $usage (not shown) ...

# Variables for command line options - all undefined initially
# except for output structure, which is set to be tab-delimited,
# linefeed-terminated.
my $help;
my ($host_name, $password, $port_num, $socket_name, $user_name, $db_name);
my ($query, $tbl_name);
my $labels;
my $delim = "	";
my $quote;
my $eol = "
";

GetOptions (
 # =i means an integer argument is required after the option
 # =s means a string argument is required after the option
 # :s means a string argument is optional after the option
 "help" => $help, # print help message
 "host|h=s" => $host_name, # server host
 "password|p:s" => $password, # password
 "port|P=i" => $port_num, # port number
 "socket|S=s" => $socket_name, # socket name
 "user|u=s" => $user_name, # username
 "execute|e=s" => $query, # query to execute
 "table|t=s" => $tbl_name, # table to export
 "labels|l" => $labels, # generate row of column labels
 "delim=s" => $delim, # column delimiter
 "quote=s" => $quote, # column quoting character
 "eol=s" => $eol # end-of-line (record) delimiter
) or die "$usage
";

die "$usage
" if defined $help;

$db_name = shift (@ARGV) if @ARGV;

# One of --execute or --table must be specified, but not both
die "You must specify a query or a table name

$usage
"
 if !defined ($query) && !defined ($tbl_name);
die "You cannot specify both a query and a table name

$usage
"
 if defined ($query) && defined ($tbl_name);

# If table name was given, convert it to a query that selects entire table
$query = "SELECT * FROM $tbl_name" if defined ($tbl_name);

# convert defined/undefined state into true/false
$labels = defined ($labels);

# interpret special chars in the file structure options
$quote = interpret_option ($quote);
$delim = interpret_option ($delim);
$eol = interpret_option ($eol);

The interpret_option( ) function processes escape and hex sequences for the --delim, --quote, and --eol options. , , , and are interpreted as linefeed, carriage return, tab, and the ASCII NUL character. Hex values may be given in the form 0xnn (for example, 0x0d indicates a carriage return). The function is not shown here; you can examine the script source to see how it works.

After processing the command-line options, the script constructs the data source name (DSN) and connects to the server:

my $dsn = "DBI:mysql:";
$dsn .= ";database=$db_name" if $db_name;
$dsn .= ";host=$host_name" if $host_name;
$dsn .= ";port=$port_num" if $port_num;
$dsn .= ";mysql_socket=$socket_name" if $socket_name;
# read [client] group parameters from standard option files
$dsn .= ";mysql_read_default_group=client";

my $dbh = DBI->connect ($dsn, $user_name, $password,
 {PrintError => 0, RaiseError => 1});

The database name comes from the command line. Connection parameters can come from the command line or an option file. Use of MySQL option files is covered in Recipe 2.11.

After establishing a connection to MySQL, the script is ready to execute the query and produce some output. This is where the Text::CSV_XS module comes into play. First, we create a CSV object by calling new( ), which takes an optional hash of options that control how the object handles data lines. Then the script prepares and executes the query, prints a row of column labels (if the user specified the --labels option), and writes the rows of the result set:

my $csv = Text::CSV_XS->new ({
 sep_char => $delim,
 quote_char => $quote,
 escape_char => $quote,
 eol => $eol,
 binary => 1
 });

my $sth = $dbh->prepare ($query);
$sth->execute ( );
if ($labels) # write row of column labels
{
 $csv->combine (@{$sth->{NAME}}) or die "cannot process column labels
";
 print $csv->string ( );
}

my $count = 0;
while (my @val = $sth->fetchrow_array ( ))
{
 ++$count;
 $csv->combine (@val) or die "cannot process column values, row $count
";
 print $csv->string ( );
}

The sep_char and quote_char options in the name( ) call set the column delimiter sequence and quoting character. The escape_char option is set to the same value as quote_char so that instances of the quote character occurring within data values will be doubled in the output. The eol option indicates the line-termination sequence. Normally, Text::CSV_XS leaves it to you to print the terminator for output lines. By passing a non-undef eol value to new( ), the module adds that value to every output line automatically. The binary option is useful for processing data values that contain binary characters.

The column labels are available in $sth->{NAME} after invoking execute( ). Each line of output is produced using combine( ) and string( ). The combine( ) method takes an array of values and converts them to a properly formatted string. string( ) returns the string so we can print it.

Using the mysql Client Program

Writing MySQL-Based Programs

Record Selection Techniques

Working with Strings

Working with Dates and Times

Sorting Query Results

Generating Summaries

Modifying Tables with ALTER TABLE

Obtaining and Using Metadata

Importing and Exporting Data

Generating and Using Sequences

Using Multiple Tables

Statistical Techniques

Handling Duplicates

Performing Transactions

Introduction to MySQL on the Web

Incorporating Query Resultsinto Web Pages

Processing Web Input with MySQL

Using MySQL-Based Web Session Management

Appendix A. Obtaining MySQL Software

Appendix B. JSP and Tomcat Primer

Appendix C. References