Recipe 10.3. Specifying the Structure of the Datafile


Problem

You have a data file that's not in LOAD DATA's default format.

Solution

Use FIELDS and LINES clauses to tell LOAD DATA how to interpret the file.

Discussion

By default, LOAD DATA assumes that datafiles contain lines that are terminated by linefeed (newline) characters and that data values within a line are separated by tab characters. The following statement does not specify anything about the format of the datafile, so MySQL assumes the default format:

mysql> LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl;             

Two LOAD DATA clauses provide explicit information about the datafile format. A FIELDS clause describes the characteristics of fields within a line, and a LINES clause specifies the line-ending sequence. The following LOAD DATA statement indicates that the input file contains data values separated by colons and lines terminated by carriage returns:

mysql> LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl     -> FIELDS TERMINATED BY ':'     -> LINES TERMINATED BY '\r';             

Each clause follows the table name. If both are present, the FIELDS clause must precede the LINES clause. The line and field termination indicators can contain multiple characters. For example, \r\n indicates that lines are terminated by carriage return/linefeed pairs.

The LINES clause also has a STARTING BY subclause. It specifies the sequence to be stripped from each input record. Like TERMINATED BY, the sequence can have multiple characters. If TERMINATED BY and STARTING BY both are present in the LINES clause, they can appear in any order.

Note that for STARTING BY, everything up to the given sequence is stripped from each line. If you specify STARTING BY 'X' and an input line begins with abcX, all four leading characters are stripped.

If you use mysqlimport, command options provide the format specifiers. mysqlimport commands that correspond to the preceding two LOAD DATA statements look like this:

% mysqlimport --local cookbook mytbl.txt % mysqlimport --local --fields-terminated-by=":" --lines-terminated-by="\r" \                cookbook mytbl.txt             

The order in which you specify the options doesn't matter for mysqlimport.

You can use hex notation to specify arbitrary format characters for FIELDS and LINES clauses. This can be useful for loading datafiles that use binary format codes. Suppose that a datafile has lines with Ctrl-A between fields and Ctrl-B at the end of lines. The ASCII values for Ctrl-A and Ctrl-B are 1 and 2, so you represent them as 0x01 and 0x02:

FIELDS TERMINATED BY 0x01 LINES TERMINATED BY 0x02 

mysqlimport also understands hex constants for format specifiers. You may find this capability helpful if you don't like remembering how to type escape sequences on the command line or when it's necessary to use quotes around them. Tab is 0x09, linefeed is 0x0a, and carriage return is 0x0d. Here's an example that indicates that the datafile contains tab-delimited lines terminated by CRLF pairs:

% mysqlimport --local --lines-terminated-by=0x0d0a \                --fields-terminated-by=0x09 cookbook mytbl.txt             

When you import datafiles, don't assume that LOAD DATA (or mysqlimport) knows more than it does. It's important always to keep in mind that LOAD DATA knows nothing at all about the format of your datafile. And always make sure that you do know what its format is. If the file has been transferred from one machine to another, its contents may have been changed in subtle ways of which you're not aware.

Some LOAD DATA frustrations occur because people expect MySQL to know things that it cannot possibly know. LOAD DATA makes certain assumptions about the structure of input files, represented as the default settings for the line and field terminators, and for the quote and escape character settings. If your input doesn't match those assumptions, you need to tell MySQL about it.

When in doubt, check the contents of your datafile using a hex dump program or other utility that displays a visible representation of whitespace characters like tab, carriage return, and linefeed. Under Unix, programs such as od or hexdump can display file contents in a variety of formats. If you don't have these or some comparable utility, the transfer directory of the recipes distribution contains hex dumpers written in Perl, Ruby, and Python (hexdump.pl, hexdump.rb, and hexdump.py), as well as programs that display printable representations of all characters of a file (see.pl, see.rb, and see.py). You may find them useful for examining files to see what they really contain. In some cases, you may be surprised to discover that a file's contents are different from what you think. This is, in fact, quite likely if the file has been transferred from one machine to another:

  • An FTP transfer between machines running different operating systems typically translates line endings to those that are appropriate for the destination machine if the transfer is performed in text mode rather than in binary (image) mode. Suppose that you have tab-delimited linefeed-terminated records in a datafile that load into MySQL on a Unix system just fine using the default LOAD DATA settings. If you copy the file to a Windows machine with FTP using a text transfer mode, the linefeeds might be converted to carriage return/linefeed pairs. On that machine, the file will not load properly with the same LOAD DATA statement if its contents have been changed. Does MySQL have any way of knowing that? No. So it's up to you to tell it, by adding a LINES TERMINATED BY '\r\n' clause to the statement. Transfers between any two systems with dissimilar default line endings can cause these changes.

  • Data files pasted into email messages often do not survive intact. Mail software may wrap (break) long lines or convert line-ending sequences. If you must transfer a datafile by email, it's best sent as an attachment.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net