The SQL Standard description of the typical INSERT format is:
INSERT INTO <Table> [ ( <column> [,...] ) ] <query expression>
The keyword INTO is optional for Microsoft, MySQL, and Sybase, but there is no reason to leave it out and cause portability trouble.
Here are two examples of INSERT:
INSERT INTO Table1 (column1)
  VALUES ('The rain in Spain falls in the plain')

INSERT INTO Table1 (column1)
  SELECT column1 FROM Table2
If a particular column value appears frequently and is lengthy, then you can save network transmission by making the value the default for the column. For example:
CREATE TABLE Table1 (
  column1 VARCHAR(40) DEFAULT 'The rain in Spain',
  ... )

INSERT INTO Table1 DEFAULT VALUES
This trick results in some gain if the column value is at least 100 bytes long (GAIN: 6/8).
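To see the trick end to end, here is a minimal sketch using Python's sqlite3 module as a stand-in DBMS (sqlite3 is our assumption for illustration; the table and column names follow the book's example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declare the frequent, lengthy value as the column default.
conn.execute(
    "CREATE TABLE Table1 (column1 VARCHAR(40) DEFAULT 'The rain in Spain')"
)
# The INSERT ships no value at all; the DBMS fills in the default.
conn.execute("INSERT INTO Table1 DEFAULT VALUES")
row = conn.execute("SELECT column1 FROM Table1").fetchone()
print(row[0])  # -> The rain in Spain
```

The point of the technique is what's *not* in the INSERT statement: the 100-plus-byte literal never crosses the network because the server supplies it from the table definition.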
There is no point in reordering the INSERT columns so that they are in the same order as in the storage page; it doesn't improve performance at all. (And by the way, that order isn't necessarily the same as the defined order, as we discussed in Chapter 7, "Columns.") It does, however, help slightly if the primary key or unique columns appear first in the column list. That is, if this is your table definition:
CREATE TABLE Table1 (
  column1 DECIMAL(4,2),
  column2 VARCHAR(40) UNIQUE,
  column3 DATE,
  column4 INTEGER PRIMARY KEY)
then performance is slightly better if you do your INSERT like this:
INSERT INTO Table1 (column4, column2, column1, column3)
  VALUES (10, 'The Rain', 24.5, DATE '2001-01-01')
GAIN: 2/7
Don't do this for Sybase on a PRIMARY KEY; it shows a loss. The gain shown is for only seven DBMSs.
When there are many separate INSERT statements to execute, it's helpful if the statements are in order according to the value in some index. This increases the chance that the appropriate index page will be in cache.
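The ordering advice can be applied in the client before the statements are sent. A sketch, again using sqlite3 as an illustrative stand-in (the sample rows and their arrival order are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (column4 INTEGER PRIMARY KEY, column2 VARCHAR(40))"
)

rows = [(42, 'x'), (7, 'y'), (19, 'z')]  # arrives in arbitrary order

# Sort on the indexed column (here the primary key) before inserting,
# so consecutive statements touch neighboring index pages that are
# more likely to still be in the cache.
for column4, column2 in sorted(rows):
    conn.execute(
        "INSERT INTO Table1 (column4, column2) VALUES (?, ?)",
        (column4, column2),
    )

print(conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0])  # -> 3
```

An in-memory sort of the batch is cheap compared with the page faults it can avoid on the server side.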
It's always good for performance if several insertions are combined. That's because there's a per-statement overhead, including the time needed to look up the table name in a catalog, to check privileges, and to form a network packet. Several ways exist to perform a multiple-row INSERT (usually called a bulk INSERT or just a load).
The SQL-Standard way to do a bulk INSERT is to repeat the row-expression part of the statement multiple times, as in this example:
INSERT INTO Table1 (column1, column2, column3, column4, column5)
  VALUES (1, 2, 3, 4, 5),
         (2, 3, 4, 5, 6),
         (3, 4, 5, 6, 7)
Only IBM and MySQL support this feature, so this method is rare.
An alternate, but still SQL-Standard way to do a bulk INSERT is with a compound statement:
BEGIN
  INSERT INTO Table1 VALUES (1, 2, 3, 4, 5);
  INSERT INTO Table1 VALUES (2, 3, 4, 5, 6);
  INSERT INTO Table1 VALUES (3, 4, 5, 6, 7);
END
Only IBM, Microsoft, Oracle, and Sybase support this feature outside of a stored procedure.
Compound statements are an excellent idea in many situations, and IBM recommends their use quite strongly. In this particular case, where the same statement is being repeated, they're less than ideal because each repetition must be parsed.
It's also possible to do a bulk INSERT with an ANSI SQL subquery, if the data you want is already in the database:
INSERT INTO Table2 SELECT * FROM Table1 WHERE column1 < 100
MySQL doesn't support subqueries in general, but it does support this particular construct.
The nonstandard way to do a bulk INSERT is to use a special SQL-extension statement that reads data from a file, for example:
BULK INSERT ...
  BATCH SIZE = n
  INPUT FILE = 'xxx';   /* Microsoft stuff */
This is a common method, but because it's DBMS-specific, a discussion of it is beyond the scope of this book.
The final way to do a bulk INSERT is to load multiple values into an array and pass the entire array as a parameter for a function in an API such as ODBC or JDBC. In effect, you now have to do set management inside an application program, because there is no logical difference between an array of record values and a set of records. The difficulties with this approach are: (a) you need a lot of application memory, and (b) you need a special error-handling routine, because a single statement can generate hundreds of errors, one for each row that has a problem. And you must take care to stay within the maximum buffer size for a network, which can be as little as 128KB.
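In JDBC this approach corresponds to addBatch/executeBatch; Python's DB-API exposes the same idea as executemany. A sketch using sqlite3 (the chunk size of 1,000 rows is an arbitrary assumption standing in for the network buffer limit):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (column1 INTEGER, column2 INTEGER)")

# The "array" of record values, built in application memory.
rows = [(i, i * 2) for i in range(5000)]

CHUNK = 1000  # assumed batch size; the real limit depends on the network buffer
for start in range(0, len(rows), CHUNK):
    # One call per chunk instead of one statement per row.
    conn.executemany(
        "INSERT INTO Table1 (column1, column2) VALUES (?, ?)",
        rows[start:start + CHUNK],
    )

print(conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0])  # -> 5000
```

Note that chunking addresses both difficulties at once: it caps the application memory handed to the driver in one call and keeps each call under the network buffer limit.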
You can prepare for bulk INSERTs by dropping indexes, disabling constraints, and locking the table. Generally such drastic acts only make sense when the number of rows to add is huge, and some validity testing has already been done on the input values. After a huge bulk INSERT, it's a good idea to run the "statistics update" statement for your DBMS so that the optimizer knows about the new situation; sometimes it's necessary to update statistics even during the bulk INSERT, as well as after it.
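The drop-index / load / rebuild / update-statistics sequence can be sketched as follows, using sqlite3 for illustration (SQLite's statistics-update statement is ANALYZE; other DBMSs use their own equivalents, such as UPDATE STATISTICS or RUNSTATS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (column1 INTEGER)")
conn.execute("CREATE INDEX idx1 ON Table1 (column1)")

# 1. Drop the index so the load doesn't maintain it row by row.
conn.execute("DROP INDEX idx1")

# 2. Do the bulk INSERT (input values assumed to be pre-validated).
conn.executemany(
    "INSERT INTO Table1 (column1) VALUES (?)",
    [(i,) for i in range(10000)],
)

# 3. Rebuild the index in one pass, then refresh optimizer statistics.
conn.execute("CREATE INDEX idx1 ON Table1 (column1)")
conn.execute("ANALYZE")
```

As the text warns, this is only worthwhile for a huge load; for a handful of rows, the index rebuild costs more than it saves.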
The Bottom Line: INSERT
If a particular, lengthy, column value appears frequently in INSERT, save network transmission by making the value the default value for that column.
There is no point in reordering INSERT columns so that they are in the same order as in the storage page, though it does help slightly if the primary key or unique columns appear first in the column list.
When you're executing many separate INSERT statements, put the statements in order according to the value in some index.
It's always good for performance if several insertions are combined.
Prepare for bulk INSERTs by dropping indexes, disabling constraints, and locking the table.
After a huge bulk INSERT, run the "statistics update" statement for your DBMS so that the optimizer knows about the new situation.