ProblemYou have a data file that's not in LOAD DATA's default format. SolutionUse FIELDS and LINES clauses to tell LOAD DATA how to interpret the file. DiscussionBy default, LOAD DATA assumes that datafiles contain lines that are terminated by linefeed (newline) characters and that data values within a line are separated by tab characters. The following statement does not specify anything about the format of the datafile, so MySQL assumes the default format: mysql> LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl; Two LOAD DATA clauses provide explicit information about the datafile format. A FIELDS clause describes the characteristics of fields within a line, and a LINES clause specifies the line-ending sequence. The following LOAD DATA statement indicates that the input file contains data values separated by colons and lines terminated by carriage returns: mysql> LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl -> FIELDS TERMINATED BY ':' -> LINES TERMINATED BY '\r'; Each clause follows the table name. If both are present, the FIELDS clause must precede the LINES clause. The line and field termination indicators can contain multiple characters. For example, \r\n indicates that lines are terminated by carriage return/linefeed pairs. The LINES clause also has a STARTING BY subclause. It specifies the sequence to be stripped from each input record. Like TERMINATED BY, the sequence can have multiple characters. If TERMINATED BY and STARTING BY both are present in the LINES clause, they can appear in any order. Note that for STARTING BY, everything up to the given sequence is stripped from each line. If you specify STARTING BY 'X' and an input line begins with abcX, all four leading characters are stripped. If you use mysqlimport, command options provide the format specifiers. mysqlimport commands that correspond to the preceding two LOAD DATA statements look like this: % mysqlimport --local cookbook mytbl.txt % mysqlimport --local --fields-terminated-by=":" --lines-terminated-by="\r" \ cookbook mytbl.txt The order in which you specify the options doesn't matter for mysqlimport. You can use hex notation to specify arbitrary format characters for FIELDS and LINES clauses. This can be useful for loading datafiles that use binary format codes. Suppose that a datafile has lines with Ctrl-A between fields and Ctrl-B at the end of lines. The ASCII values for Ctrl-A and Ctrl-B are 1 and 2, so you represent them as 0x01 and 0x02: FIELDS TERMINATED BY 0x01 LINES TERMINATED BY 0x02 mysqlimport also understands hex constants for format specifiers. You may find this capability helpful if you don't like remembering how to type escape sequences on the command line or when it's necessary to use quotes around them. Tab is 0x09, linefeed is 0x0a, and carriage return is 0x0d. Here's an example that indicates that the datafile contains tab-delimited lines terminated by CRLF pairs: % mysqlimport --local --lines-terminated-by=0x0d0a \ --fields-terminated-by=0x09 cookbook mytbl.txt When you import datafiles, don't assume that LOAD DATA (or mysqlimport) knows more than it does. It's important always to keep in mind that LOAD DATA knows nothing at all about the format of your datafile. And always make sure that you do know what its format is. If the file has been transferred from one machine to another, its contents may have been changed in subtle ways of which you're not aware. Some LOAD DATA frustrations occur because people expect MySQL to know things that it cannot possibly know. LOAD DATA makes certain assumptions about the structure of input files, represented as the default settings for the line and field terminators, and for the quote and escape character settings. If your input doesn't match those assumptions, you need to tell MySQL about it. When in doubt, check the contents of your datafile using a hex dump program or other utility that displays a visible representation of whitespace characters like tab, carriage return, and linefeed. Under Unix, programs such as od or hexdump can display file contents in a variety of formats. If you don't have these or some comparable utility, the transfer directory of the recipes distribution contains hex dumpers written in Perl, Ruby, and Python (hexdump.pl, hexdump.rb, and hexdump.py), as well as programs that display printable representations of all characters of a file (see.pl, see.rb, and see.py). You may find them useful for examining files to see what they really contain. In some cases, you may be surprised to discover that a file's contents are different from what you think. This is, in fact, quite likely if the file has been transferred from one machine to another:
|