15.2. Importing and Exporting Using SQL


This section discusses how to perform data import and export operations using SQL statements. LOAD DATA INFILE reads the records from a data file and inserts them into a table. SELECTINTO OUTFILE writes the record in a result set to a file.

The two statements are not quite opposites. LOAD DATA INFILE imports a file into a single table, whereas SELECTINTO OUTFILE can write a result set that may be produced by selecting from multiple tables.

15.2.1. Importing Data with LOAD DATA INFILE

LOAD DATA INFILE provides an alternative to INSERT for adding new records to a table. INSERT specifies data values directly in the text of the statement. LOAD DATA INFILE reads the values from a separate data file.

The simplest form of the LOAD DATA INFILE statement specifies only the name of the data file and the table into which to load the file:

 LOAD DATA INFILE 'file_name' INTO TABLE table_name; 

The filename is given as a quoted string. On Windows, the pathname separator character is '\', but MySQL treats the backslash as the escape character in strings. To deal with this issue, write separators in Windows pathnames either as '/' or as '\\'. To load a file named C:\mydata\data.txt, specify the filename as shown in either of the following statements:

 LOAD DATA INFILE 'C:/mydata/data.txt' INTO TABLE t; LOAD DATA INFILE 'C:\\mydata\\data.txt' INTO TABLE t; 

MySQL assumes, unless told otherwise, that the file is located on the server host, that it has the default file format (columns separated by tab characters and terminated by \n newline characters), and that each input line contains a value for each column in the table. However, LOAD DATA INFILE has clauses that give you control over each of those aspects of data-loading operations and more:

  • Which table to load

  • The name and location of the data file

  • Whether to ignore lines at the beginning of the data file

  • Which columns to load

  • Whether to skip or transform data values before loading them

  • How to handle duplicate records

  • The format of the data file

The syntax for LOAD DATA INFILE is as follows, where optional parts of the statement are indicated by square brackets:

 LOAD DATA [LOCAL] INFILE 'file_name'     [IGNORE | REPLACE]     INTO TABLE table_name     format_specifiers     [IGNORE n LINES]     [(column_list)]     [SET (assignment_list)] 

15.2.1.1 Specifying the Data File Location

LOAD DATA INFILE can read data files that are located on the server host or on the client host:

  • By default, MySQL assumes that the file is located on the server host. The MySQL server reads the file directly.

  • If the statement begins with LOAD DATA LOCAL INFILE rather than with LOAD DATA INFILE, the file is read from the client host on which the statement is issued. In other words, LOCAL means local to the client host from which the statement is issued. In this case, the client program reads the data file and sends its contents over the network to the server.

The rules for interpreting the filename are somewhat different for the server host and the client host. Without LOCAL in the LOAD DATA INFILE statement, MySQL looks for the data file located on the server host and interprets the pathname as follows:

  • If you refer to the file by its full pathname, the server looks for the file in that exact location.

  • If you specify a relative name with a single component, the server looks for the file in the database directory for the default database. (This isn't necessarily the database that contains the table into which you're loading the file.)

  • If you specify a relative pathname with more than one component, the server interprets the name relative to its data directory.

Suppose that the server's data directory is /var/mysql/data, the database directory for the test database is /var/mysql/data/test, and the file data.txt is located in that database directory. Using the filename interpretation rules just given, it's possible to refer to the data.txt file three different ways in a LOAD DATA INFILE statement:

  • You can refer to the file by its full pathname:

     LOAD DATA INFILE '/var/mysql/data/test/data.txt' INTO TABLE t; 

  • If test is the default database, you can refer to a file in the database directory using just the final component of its pathname:

     LOAD DATA INFILE 'data.txt' INTO TABLE t; 

  • You can refer to any file in or under the server's data directory by its pathname relative to that directory:

     LOAD DATA INFILE './test/data.txt' INTO TABLE t; 

If you use LOCAL to read a data file located locally on the client host, pathname interpretation is simpler:

  • If you refer to the file by its full pathname, the client program looks for the file in that exact location.

  • If you specify a relative pathname, the client program looks for the file relative to its current directory. Normally, this is the directory in which you invoked the program.

Suppose that a data file named data.txt is located in the /var/tmp directory on the client host and you invoke the mysql program while located in that directory. You can load the file into a table t using either of these two statements:

 LOAD DATA LOCAL INFILE '/var/tmp/data.txt' INTO TABLE t; LOAD DATA LOCAL INFILE 'data.txt' INTO TABLE t; 

The first statement names the file using its full pathname. The second names the file relative to the current directory. If you invoke the mysql program in the /var directory instead, you can still load the file using the same full pathname. However, the relative pathname to the file is different than when running the program in the /var/tmp directory:

 LOAD DATA LOCAL INFILE 'tmp/data.txt' INTO TABLE t; 

15.2.1.2 Skipping Data File Lines

To ignore the initial part of the data file, use the IGNORE n LINES clause, where n is an integer that indicates the number of input lines to skip. This clause commonly is used when a file begins with a row of column names rather than data values. For example, to skip the first input line, a statement might be written like this:

 LOAD DATA INFILE '/tmp/data.txt' INTO TABLE t IGNORE 1 LINES; 

15.2.1.3 Loading Specific Table Columns

By default, LOAD DATA INFILE assumes that data values in input lines are present in the same order as the columns in the table. If the data file contains more columns than the table, MySQL ignores the excess data values. If the data file contains too few columns, each missing column is set to its default value in the table. (This is the same way MySQL handles columns that aren't named in an INSERT statement.)

If input lines don't contain values for every table column, or the data values are not in the same order as table columns, add a comma-separated list of column names within parentheses at the end of the LOAD DATA INFILE statement. This tells MySQL how columns in the table correspond to successive columns in the data file. A list of columns is useful in two ways:

  • If the rows of the data file don't contain a value for every column in the table, a column list indicates which columns are present in the file. Suppose that a table named subscriber has the following structure:

     mysql> DESCRIBE subscriber; +---------+------------------+------+-----+---------+----------------+ | Field   | Type             | Null | Key | Default | Extra          | +---------+------------------+------+-----+---------+----------------+ | id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment | | name    | char(40)         | NO   |     |         |                | | address | char(40)         | NO   |     |         |                | +---------+------------------+------+-----+---------+----------------+ 

    Here, id is an AUTO_INCREMENT column. If you have a file /tmp/people.txt containing names and addresses and want MySQL to generate ID numbers automatically, load the file like this:

     LOAD DATA INFILE '/tmp/people.txt' INTO TABLE subscriber (name,address); 

    For any table column that isn't assigned a value from the data file, MySQL sets it to its default value. MySQL thus sets the id column to the next sequence value for each input line.

  • If the order of the columns in the data file doesn't correspond to the order of the columns in the table, a column list tells MySQL how to match up columns properly. For example, if the lines in people.txt contain addresses and names rather than names and addresses, the statement to load the file looks like this instead:

     LOAD DATA INFILE '/tmp/people.txt' INTO TABLE subscriber (address,name); 

Each item in the column list can be a table column name, as just described, or a user variable. Reasons for specifying user variables in the column list are discussed in Section 15.2.1.4, "Skipping or Transforming Column Values."

15.2.1.4 Skipping or Transforming Column Values

It is possible to skip columns in the data file, or to transform data values read from the file before inserting them into the table. These features are available by specifying user variables in the column list and the optional SET clause.

To assign an input data column to a user variable rather than to a table column, provide the name of a user variable in the column list. If you assign the column to a user variable but do nothing else with the variable, the effect is to ignore the column rather than to insert it into the table. Or, by including a SET clause, you can use expressions that transform the value before inserting it.

Suppose that you have a file named /tmp/people2.txt that was exported from a table similar to the subscriber table, and that it contains four columns for each subscriber: ID number, first name, last name, and address. The file's contents need to be transformed in two ways for loading into the subscriber table. First, the ID values are not compatible with those in the subscriber table and should be ignored. Second, the first name and last name should be concatenated with a space between. These transformations can be achieved as follows:

 LOAD DATA INFILE '/tmp/people2.txt' INTO TABLE subscriber (@skip,@first,@last,address) SET name=CONCAT(@first,' ',@last); 

15.2.1.5 LOAD DATA INFILE and Duplicate Records

When you add new records to a table with an INSERT or REPLACE statement, you can control how to handle new records containing values that duplicate unique key values already present in the table. You can allow an error to occur, ignore the new records, or replace the old records with the new ones. LOAD DATA INFILE affords the same types of control over duplicate records by means of two modifier keywords. However, its duplicate-handling behavior differs slightly depending on whether the data file is on the server host or the client host, so you must take the data file location into account.

When loading a file that's located on the server host, LOAD DATA INFILE handles records that contain duplicate unique keys as follows:

  • By default, an input record that causes a duplicate-key violation results in an error and the rest of the data file isn't loaded. Records processed up to that point are loaded into the table.

  • If you specify the IGNORE keyword following the filename, new records that cause duplicate-key violations are ignored and no error occurs. LOAD DATA INFILE processes the entire file, loads all records not containing duplicate keys, and discards the rest.

  • If you specify the REPLACE keyword after the filename, new records that cause duplicate-key violations replace any records already in the table that contain the duplicated key values. LOAD DATA INFILE processes the entire file and loads all its records into the table.

IGNORE and REPLACE are mutually exclusive. You can specify one or the other, but not both.

For data files located on the client host, duplicate unique key handling is similar, except that the default is to ignore records that contain duplicate keys rather than to terminate with an error. That is, the default is as though the IGNORE modifier is specified. The reason for this is that the client/server protocol doesn't allow transfer of the data file from the client host to the server to be interrupted after it has started, so there's no convenient way to abort the operation in the middle.

15.2.1.6 Information Provided by LOAD DATA INFILE

As LOAD DATA INFILE executes, it keeps track of the number of records processed and the number of data conversions that occur. Then it returns to the client an information string in the following format (the counts in each field will vary per LOAD DATA INFILE operation):

 Records: 174  Deleted: 0  Skipped: 3  Warnings: 14 

The fields have the following meanings:

  • Records indicates the number of input records read from the data file. This is not necessarily the number of records added to the table.

  • Deleted indicates the number of records in the table that were replaced by input records having the same unique key value as a key already present in the table. The value may be non-zero if you use the REPLACE keyword in the statement.

  • Skipped indicates the number of data records that were ignored because they contained a unique key value that duplicated a key already present in the table. The value may be non-zero if you use the IGNORE keyword in the statement.

  • Warnings indicates the number of problems found in the input file. These can occur for several reasons, such as missing data values or data conversion (for example, converting an empty string to 0 for a numeric column). The warning count can be larger than the number of input records because warnings can occur for each data value in a record. To see what caused the warnings, issue a SHOW WARNINGS statement after loading the data file.

15.2.1.7 Privileges Required for LOAD DATA INFILE

LOAD DATA INFILE requires that you have the INSERT privilege for the table into which you want to load data, as well as the DELETE privilege if you specify the REPLACE modifier. If the file is located on the client host, you must have read access for the file, but no additional MySQL privileges are required. However, if the data file is located on the server host, the server itself must have read access for the file. In addition, you must have the FILE privilege. FILE is an administrative privilege, so it's likely that to use LOAD DATA INFILE without LOCAL, you'll need to connect to the server as an administrative user such as root.

15.2.1.8 Efficiency of LOAD DATA INFILE

It is more efficient to load data with LOAD DATA INFILE than by using INSERT statements. For a data file that is located on the server host, the MySQL server reads the file directly, so the data values need not cross the network from the client to the server. But even for a data file located locally on the client host, LOAD DATA INFILE is more efficient than INSERT because there's less overhead for parsing data values and because the rows are loaded in a single operation. Some of the efficiency of loading multiple rows at once can be obtained with multiple-row INSERT syntax, but LOAD DATA INFILE still is more efficient.

15.2.2. Exporting Data with SELECT ... INTO OUTFILE

A SELECT statement normally creates a result set that the server returns to the client. For example, when you issue a SELECT using the mysql client, the server returns the result and mysql writes it in tabular format when run interactively or in tab-delimited format when run in batch mode.

A variation on SELECT syntax adds an INTO OUTFILE clause. This form of SELECT writes the result set directly into a file and thus is the complement of LOAD DATA INFILE. To use SELECT in this way, place the INTO OUTFILE clause before the FROM clause. For example, to write the contents of the Country table into a file named Country.txt, issue this statement:

 SELECT * INTO OUTFILE 'Country.txt' FROM Country; 

The name of the file indicates the location where you want to write it. MySQL interprets the pathname using the same rules that apply to LOAD DATA INFILE for files located on the server host. For example, given the statement just shown, the server writes the file into the database directory of the default database.

Use of INTO OUTFILE changes the operation of the SELECT statement in several ways:

  • The output produced by a SELECTINTO OUTFILE statement never leaves the server host. Instead of sending the result over the network to the client, the server writes it to a file on the server host. To prevent files from being overwritten, either accidentally or maliciously, the server requires that the output file not already exist.

  • The statement causes the server to write a new file on the server host, so you must connect to the server using an account that has the FILE privilege.

  • The file is created with filesystem access permissions that make it owned by the MySQL server but world-readable.

  • The output file contains one line per row selected by the statement. By default, column values are delimited by tab characters and lines are terminated with newlines, but you can control the output format by adding format specifiers after the filename, as described in Section 15.2.3, "Data File Format Specifiers."

The location and manner in which SELECTINTO OUTFILE creates the file has several implications:

  • If you want to access the file directly, you must have a login account on the server host or be otherwise able to access files on that host somehow. For some purposes, this limitation might not be a problem. For example, you don't need to access the file yourself to reload it later with LOAD DATA INFILE because the MySQL server can read it for you.

  • The file is world-readable, so anyone who has filesystem access on the server host can read it. You probably don't want to use SELECTINTO OUTFILE to create files that contain sensitive information, unless perhaps you're the only person with access to the machine.

  • The file is owned by the MySQL server, so you might not be able to remove it after you're done with it. It might be necessary to coordinate with the server administrator to arrange for removal of the file.

Without the OUTFILE keyword, SELECTINTO can be used to fetch a single row of data into variables. These can be user variables, stored routine variables, or local variables. See Section 18.5.4.2, "Assigning Variable Values with SELECT ... INTO."

15.2.3. Data File Format Specifiers

LOAD DATA INFILE and SELECTINTO OUTFILE assume a default data file format in which column values are separated by tab characters and records are terminated by newlines. If a data file to be read by LOAD DATA INFILE has different column separators or line terminators, you must indicate what the format is so that MySQL can read the file contents correctly. Similarly, if you want SELECTINTO OUTFILE to write a file with different separators or terminators, you'll need to indicate the format to use. It's also possible to control data value quoting and escaping behavior.

The format specifiers supported by LOAD DATA INFILE and SELECTINTO OUTFILE don't enable you to characterize individual columns in the data file. For example, you cannot indicate that column 3 is numeric or that column 17 contains dates. Instead, you define the general characteristics that apply to all column values: What characters separate column values in data rows, whether values are quoted, and whether there is an escape character that signifies special character sequences.

For LOAD DATA INFILE, format specifiers are listed after the table name. For SELECTINTO OUTFILE, they follow the output filename. The syntax for format specifiers is the same for both statements and looks like this:

 FIELDS     TERMINATED BY 'string'     ENCLOSED BY 'char'     ESCAPED BY 'char' LINES TERMINATED BY 'string' 

The FIELDS clause defines the formatting of data values within a line. The LINES clause defines the line-ending sequence. In other words, FIELDS indicates the structure of column values within records and LINES indicates where record boundaries occur.

The TERMINATED BY, ENCLOSED BY, and ESCAPED BY parts of the FIELDS clause may be given in any order. You need not specify all three parts. Defaults are used for any that are missing (or if the FIELDS clause itself is missing) :

  • Data values are assumed to be terminated by (that is, separated by) tab characters. To indicate a different value, include a TERMINATED BY option.

  • Data values are assumed to be unquoted. To indicate a quote character, include an ENCLOSED BY option. For LOAD DATA INFILE, enclosing quotes are stripped from input values if they're found. For SELECTINTO OUTFILE, output values are written enclosed within quote characters.

    A variation on ENCLOSED BY is OPTIONALLY ENCLOSED BY. This is the same as ENCLOSED for LOAD DATA INFILE, but different for SELECTINTO OUTFILE: The presence of OPTIONALLY causes output value quoting only for string columns, not for all columns.

  • The default escape character is backslash ('\'). Any occurrence of this character within a data value modifies interpretation of the character that follows it. To indicate a different escape character, include an ESCAPED BY option. MySQL understands the following special escape sequences:

    Sequence

    Meaning

    \N

    NULL value

    \0

    NUL (zero) byte

    \b

    Backspace

    \n

    Newline (linefeed)

    \r

    Carriage return

    \s

    Space

    \t

    Tab

    \'

    Single quote

    \"

    Double quote

    \\

    Backslash


    All these sequences except \N are understood whether they appear alone or within a longer data value. \N is understood as NULL only when it appears alone.

The default line terminator is the newline (linefeed) character. To indicate a line-ending sequence explicitly, use a LINES clause. Common line terminators are newline, carriage return, and carriage return/newline pairs. Specify them as follows:

 LINES TERMINATED BY '\n' LINES TERMINATED BY '\r' LINES TERMINATED BY '\r\n' 

Because newline is the default line terminator, it need be specified only if you want to make the line-ending sequence explicit. Newline terminators are common on Unix systems and carriage return/newline pairs are common on Windows.

The ESCAPED BY option controls only the handling of values in the data file, not how you write the statement itself. If you want to specify a data file escape character of '@', you'd write ESCAPED BY '@'. That doesn't mean you then use '@' to escape special characters elsewhere in the statement. For example, you'd still specify carriage return as the line termination character using LINES TERMINATED BY '\r', not using LINES TERMINATED BY '@r'.

Suppose that a file named /tmp/data.txt contains information in comma-separated values (CSV) format, with values quoted by double quote characters and lines terminated by carriage returns. To import the file into a table t, use this LOAD DATA INFILE statement:

 LOAD DATA INFILE '/tmp/data.txt' INTO TABLE t FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r'; 

To export information in that same format, use this SELECTINTO OUTFILE statement:

 SELECT * INTO OUTFILE '/tmp/data-out.txt' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r' FROM t; 

15.2.4. Importing and Exporting NULL Values

A NULL value indicates the absence of a value or an unknown value, which is difficult to represent literally in a data file. For import and export purposes, MySQL uses the convention of representing NULL values by \N:

  • For LOAD DATA INFILE, a \N appearing unquoted by itself as a column value is interpreted as NULL. MySQL users sometimes assume that an empty value in an input file will be handled as a NULL value, but that isn't true. MySQL converts an empty input value to 0, an empty string, or a "zero" temporal value, depending on the type of the corresponding table column.

  • For SELECTINTO OUTFILE, MySQL writes NULL values to the output file as \N.



MySQL 5 Certification Study Guide
MySQL 5.0 Certification Study Guide
ISBN: 0672328127
EAN: 2147483647
Year: 2006
Pages: 312

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net