Recipe 10.11. Preprocessing Input Values Before Inserting Them


Problem

You have values in a datafile that are not in a format that is suitable for loading into a table. For example, values are in the wrong units, or two input fields must be combined and inserted into a single column.

Solution

LOAD DATA has the capability of performing limited preprocessing of input values before inserting them. This enables you to map input data onto more appropriate values before loading them into your table.

Discussion

Section 10.10 shows how you can specify a column list for LOAD DATA to indicate how input fields correspond to table columns. The column list also can name user-defined variables, such that for each input record, the input fields are assigned to the variables. You can then perform calculations with those variables before inserting the result into the table. These calculations are specified in a SET clause that names one or more col_name = expr assignments, separated by commas.

Suppose that you have a datafile that has the following columns, with the first line providing column labels:

Date        Time        Name        Weight      State 2006-09-01  12:00:00    Bill Wills  200         Nevada 2006-09-02  09:00:00    Jeff Deft   150         Oklahoma 2006-09-04  03:00:00    Bob Hobbs   225         Utah 2006-09-07  08:00:00    Hank Banks  175         Texas 

Suppose also that the file must be loaded into a table that has these columns:

CREATE TABLE t (   dt         DATETIME,   last_name  CHAR(10),   first_name CHAR(10),   weight_kg  FLOAT,   st_abbrev  CHAR(2) ); 

There are several mismatches between the datafile fields and the table columns that must be addressed to be able to import the file:

  • The file contains separate date and time columns that must be combined into date-and-time values for insertion into the DATETIME column.

  • The file contains a name field, which must be split into separate first and last name values for insertion into the first_name and last_name columns.

  • The file contains a weight in pounds, which must be converted to kilograms for insertion into the weight_kg column. (The conversion factor is that 1 lb. equals .454 kg.)

  • The file contains state names, but the table contains two-letter abbreviations. The name can be mapped to the abbreviation by performing a lookup in the states table.

To handle these conversions, assign each input column to a user-defined variable, and write a SET clause to perform the calculations. (Remember to skip the first line that contains the column labels.)

mysql> LOAD DATA LOCAL INFILE 'data.txt' INTO TABLE t     -> IGNORE 1 LINES     -> (@date,@time,@name,@weight_lb,@state)     -> SET dt = CONCAT(@date,' ',@time),     ->     first_name = SUBSTRING_INDEX(@name,' ',1),     ->     last_name = SUBSTRING_INDEX(@name,' ',-1),     ->     weight_kg = @weight_lb * .454,     ->     st_abbrev = (SELECT abbrev FROM states WHERE name = @state);             

As a result of this import operation, the table contains these rows:

mysql> SELECT * FROM t; +---------------------+-----------+------------+-----------+-----------+ | dt                  | last_name | first_name | weight_kg | st_abbrev | +---------------------+-----------+------------+-----------+-----------+ | 2006-09-01 12:00:00 | Wills     | Bill       |      90.8 | NV        | | 2006-09-02 09:00:00 | Deft      | Jeff       |      68.1 | OK        | | 2006-09-04 03:00:00 | Hobbs     | Bob        |    102.15 | UT        | | 2006-09-07 08:00:00 | Banks     | Hank       |     79.45 | TX        | +---------------------+-----------+------------+-----------+-----------+ 

LOAD DATA can perform data value reformatting, as shown in this section. Other examples showing uses for this capability occur later in the chapter. For example, Section 10.33 uses it to perform rewriting of non-ISO dates to ISO format during data import. However, although LOAD DATA can map input values to other values, it cannot outright reject an input record that is found to contain unsuitable values. To do that, you can either preprocess the input file to remove these records or issue a DELETE statement after loading the file.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net