While we covered the creation of databases and tables in Chapter 9, "Designing and Creating Your Database," which we demonstrated by creating a message board database with tables for users, messages, forums, and replies, we did not cover how to put data into these tables. Data is added via the INSERT INTO statement. Unlike the specific information concerning the creation of databases and tables discussed in the previous chapter, much of the SQL introduced in this chapter will work on most major servers (though there will be minor differences that we will point out to readers).
The INSERT INTO Statement
In its most basic form, the INSERT INTO statement requires the name of the table into which you want to insert data and the set of values for each of the columns in the table.
INSERT INTO TableName VALUES(value1, value2, ..., valuen);
The statement also has the option of letting you specify for which columns you would like to specify data:
INSERT INTO TableName (colname1, colname2, ..., colnamen) VALUES(value1, value2, ..., valuen);
If we wanted to add a new message board user to our Users table, we could execute the following statement:
INSERT INTO Users VALUES(100, 'lijin', 'Li Jin La', 'firstname.lastname@example.org', null);
There are two interesting things to note about the previous query. The first is the value for the user_id field, which cannot be null since we declared the column as NOT NULL (PRIMARY KEY implies NOT NULL) in the CREATE TABLE statement. Thus, we had to find and enter a value that was not being used. The second is the null value in the birthdate field, which indicates that we do not have information for the user in that field.
However, it seems suboptimal to have to specify the user_id field at all since we declared the column as AUTO_INCREMENT. What we would like is a way to not have to specify a value and let the database engine fill it in for us. Fortunately, we can use the syntax for the INSERT INTO statement to state for which columns we are providing values:
INSERT INTO Users(user_name, full_name, user_email, birthdate) VALUES('kmuller', 'Klaus Müller', 'email@example.com', '1980-12-16');
In this example, we indicated that we would be providing values for all of the columns in our Users table except for the user_id field, which the database engine assigns for us.
Unfortunately, while most platforms do this work for us, somenotably Oraclerequire that you create a sequence to do this incrementing for you. An example would be as follows:
CREATE SEQUENCE userid_seq MINVALUE 1 MAXVALUE 9999999999 START WITH 1 INCREMENT BY 1;
To insert a record into the table, you have to call the nextval method on this sequence:
INSERT INTO Users VALUES(userid_seq.nextval, 'kmuller', 'Klaus Müller', 'firstname.lastname@example.org', '1980-12-16');
One problem that you might encounter is what happens if you try to insert a row without specifying a field that was declared as NOT NULL in the table schema:
INSERT INTO Users(full_name) VALUES("Bobo the Clown");
This statement executes without trouble in MySQL. Other database engines might refuse to create the row and complain that values have not been specified for the NOT NULL columns. However, even in the MySQL case, the username for the newly created record is '' (the empty string), which is probably not what we want. The server sees that the column is marked as NOT NULL, so it chooses the next best value.
We can prevent this from happening more than once by marking the column as UNIQUE, but we still want to prevent a user from having an empty username the first time. This shows again why it behooves us to do data validation in our script before we pass information on to the database server.
For fields that are numeric, the default non-null value chosen by the MySQL database engine is 0.
Bulk Data Insertion
If we wanted to insert a core set of values for one or more tables (for example, we might choose to populate a table of states and provinces with all the values for the USA and Canada) into our databases, it would get tedious to enter each of the possible values by hand with individual INSERT INTO statements. We could write a script that read a data file and in turn executed a number of these statements, but many databases also provide the ability to perform a bulk insertiona means of inserting multiple records of data in a single statement.
While we mentioned at the beginning of this chapter that much of the new SQL learned would be portable across database servers, bulk insertion is one of the exceptions. We will mention the MySQL syntax here and show the options for other servers in Appendix B, "Database Function Equivalents."
MySQL distinguishes between two types of bulk insertionthat performed by the mysql client program, and insertion performed by the server. The syntax for client bulk data insertion is as follows:
LOAD DATA LOCAL INFILE 'filename' INTO TABLE TableName (column1, column2, ..., columnn);
The filename is a file on the local file system to which the client program must have access. One nice thing about this command is that it performs individual insertions for the data, so the user connected to the server merely needs to have INSERT privileges. However, this command tends to be inefficient. The last line, which is the set of columns for which values should be sought and inserted, is optionalif unspecified, the query assumes that the text file contains data for each of the columns in the table.
By default, the text file is expected to have Unix-style line endings (for example, only the single \n character). Therefore, when inserting a file created on a Windows system, you need to tell the LOAD DATA statement about the line endings:
LOAD DATA LOCAL INFILE 'windows.txt' INTO TABLE sometable LINES TERMINATED BY '\r\n'
By default, fields are separated with TAB characters. To change which character is used to indicate the end of a field in the input file (for example, if you wanted to load field data that can contain TAB characters), you add the FIELDS clause and the TERMINATED BY subclause. To enclose or wrap fields, you need to indicate which character to use by adding the ENCLOSED BY subclause to the FIELDS clause, which can be prefixed by the keyword OPTIONALLY to tell MySQL not to worry about enclosing characters if they are missing. If we wanted to separate our fields with a vertical bar (|) character and let the user choose whether to wrap fields in double quotes or not, we could write
LOAD DATA LOCAL INFILE 'barred.txt' INTO TABLE sometable FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
We could fill our Forums table in our MessageBoard database with a few basic forums, which we would be put into a text file:
"General Discussion"|"All General Discussions go here" "n000bs"|"New Users Discussion Forum" "Technical Support"|"Get the help you need now!" "Fans of Britney Spears"|"Get the help you need now!"
We can now insert the contents of this file into our Forums table with the following query specifying that only the name and description column are in the input file:
mysql> LOAD DATA LOCAL INFILE 'forums.txt' -> INTO TABLE Forums -> FIELDS TERMINATED BY '|' -> OPTIONALLY ENCLOSED BY '"' -> LINES TERMINATED BY '\r\n' -> (name, description); Query OK, 4 rows affected (0.00 sec) Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
To have the server perform the bulk insertion, you merely leave out the LOCAL keyword, which causes the client to pass the command to the server. This creates two additional requirements for the insertion:
If we were to try to load a file for which we did not have permissions, we would see this error:
mysql> LOAD DATA INFILE '/home/marcw/no_perms.txt' -> INTO TABLE Forums -> FIELDS TERMINATED BY '|' -> OPTIONALLY ENCLOSED BY '"' -> LINES TERMINATED BY '\n' -> (name, description); ERROR 1085: The file '/home/marcw/no_perms.txt' must be in the database directory or must be readable by all.
Data Type Entry
In all of the previous examples, we have been entering string data and wrapping it in single quotes for the INSERT INTO statements. Although many clients accept double quotes, single quotes are the preferred means of inputting strings. Numeric data can be entered without quotes.
For dates, times, and timestamps, you need to be careful. Dates should be entered in the format
while times are entered as
where the hours are in 24-hour format and xx represents hundredths of seconds. You would be fine on most systems by using 12-hour time with the suffix AM or PM. Where you start to run into trouble is with values such as
In this case, it is difficult for the DBMS to know whether this is May 4, 2003, or May 4, 1903. Each database server has different rules for how it processes ambiguous dates, so you should be careful when moving code between them. For dates where the year is specified with a two-digit year instead of all four digits, the MySQL server uses the following rules:
Dates in the standard American format (mm/dd/yy) or European format (dd/mm/yy) are not correctly parsed by MySQL. In general, it is recommended that you stick with the standardized date formats that all servers will recognize and handle correctly.
Another interesting case happens when the user enters the date:
Versions of MySQL prior to version 5 permit these values and only check to make sure that the month is between the values 0 and 12, and the day is between 0 and 31. Versions of MySQL later than 5.0.2 generate warnings for these invalid dates unless the Server SQL Mode is configured to permit these valuesconsult the MySQL documentation for more information.