INSERT: Populating Tables with Data | Microsoft SQL Server 2008 Bible

As you know, tables in a relational database denote entities — or at least they should. For example, each row in the CUSTOMER table holds information about a specific customer; a row in ORDER_HEADER represents a definite order, and so on. Usually, the appearance of a new "real-life" entity calls for inserting a new row. For example, you would need a new row in CUSTOMER table if ACME, Inc. obtained a new customer; you need to insert a row into ORDER_HEADER table when a customer makes an order; a new row has to be added to the PRODUCT table if ACME starts selling a new product, and so on.

The INSERT statement is used to add rows to a table, either directly or through an updateable view. The syntax differs slightly among SQL99, Oracle 9i, DB2 UDB 8.1, and MS SQL Server 2000, but it is possible to come up with some kind of a generic INSERT syntax that works with all our "big three" databases:

INSERT INTO <table_or_view_name> [(<column_name>,...)] { {VALUES (<literal> |             <expression> |            NULL |            DEFAULT,...)} |    {<select_statement>} }

We are going to concentrate on the generic INSERT functionality first, and then describe some of its SQL99 and vendor-specific only features.

Common INSERT statement clauses

One would typically insert values into one table at a time within one INSERT statement. (Oracle 9i allows you to perform multitable inserts, but that is rather advanced functionality not covered in this book.) The name of the table is provided in the table_or_view_name clause. (An updateable view name can be given instead of a table name.)

The column_name clause is optional; it determines the order in which column values of the row to be inserted are populated. It also allows you to skip values for columns you don't want to populate at that particular moment; such columns would either have NULL values or would be populated with column defaults specified with a CREATE (ALTER) TABLE statement.

Caution

You cannot skip any NOT NULL columns in your column_name clause; otherwise the RDBMS will give you an error.

The values to insert can either be specified using the VALUES clause or the resulting set from the select_statement clause (also called subquery). In the first case, usually only one row is inserted (exception is DB2 where you can specify multiple VALUES clauses within one INSERT statement; more about that in this chapter's section about DB2); in the second case, the RDBMS inserts as many rows as were returned by the select_statement — it can be zero, one, ten, or one thousand.

You have to list values for all table columns in the VALUES clause in the exact same order they are specified in the table definition if the column_name list was omitted.

Cross-References

Obtaining information about the internal structure of database objects is discussed in Chapter 13.

If the column_name list is present, you have to specify a corresponding value for each column listed.

Note

If the select_statement clause was used rather than the VALUES clause, its resulting set has to be organized either in table definition order (if column_list is not specified) or in column_list order.

Inserting values for specified columns

The situation when you want to insert a row with NULL values for certain columns is not unusual. As you know, NULL is used when value is unknown or nonapplicable. For example, suppose you know ACME starts selling a new product SPRUCE LUMBER 30 ×40 ×50, but certain properties of this product (price and weight) are still unknown. We can add a record to the PRODUCT table using the following INSERT statement:

INSERT INTO product  (  prod_id_n,   prod_num_s,   prod_description_s,   prod_status_s,  prod_brand_s,   prod_pltwid_n,   prod_pltlen_n )  VALUES  (  990,   '990',   'SPRUCE LUMBER 30X40X50',   'N',   'SPRUCE LUMBER',   4,   6 )

The following two statements insert two records into the PAYMENT_TERMS table (we'll use them later in our examples):

INSERT INTO payment_terms  (   payterms_id_n,   payterms_code_s,   payterms_desc_s,   payterms_discpct_n,  payterms_daystopay_n )  VALUES  (   27,   'N21531',   '2% 15 NET 30',   0.02,   31 )

INSERT INTO payment_terms  (   payterms_id_n,   payterms_code_s,   payterms_desc_s,   payterms_discpct_n,  payterms_daystopay_n  )  VALUES  (   28,   'N21530',   '2% 15 NET 30',   0.02,   30 )

If any of skipped columns has a default value, RDBMS uses this value rather than NULL:

INSERT INTO salesman  (   salesman_id_n,   salesman_code_s,   salesman_name_s )  VALUES  (   23,   '02',   'FAIRFIELD BUGS ASSOCIATION' )

Because the SALESMAN_STATUS_S column of the SALESMAN table has the default value of Y, the inserted row looks like that:

SELECT * FROM salesman  WHERE salesman_code_s = '02'     SALESMAN_ID_N SALESMAN_CODE_S SALESMAN_NAME_S             SALESMAN_STATUS_S  --------------------------------------------------------------------------            23 02              FAIRFIELD BUGS ASSOCIATION  Y

Here is another example using the CUSTOMER table:

INSERT INTO customer  (   cust_id_n,   cust_paytermsid_fn,   cust_salesmanid_fn,   cust_name_s,   cust_alias_s,   cust_credhold_s  )  VALUES  (   1,   27,   24,   'WILE SEAL CORP.',   'MNGA71396',   'Y' )

Since the CUST_STATUS_S field has the default value of Y, the value for the column gets populated implicitly:

SELECT cust_status_s FROM customer WHERE cust_alias_s = 'MNGA71396'     CUST_STATUS_S  ------------  Y

Inserting values for all columns

Assuming you have all the necessary values and want to populate all columns by the time you are ready to insert a row, you have a choice. You can either still list all column names with corresponding values, or the column_name clause can be completely skipped. The two statements below produce identical results:

INSERT INTO product  (   prod_id_n,   prod_price_n,   prod_num_s,   prod_description_s,   prod_status_s,  prod_brand_s,   prod_pltwid_n,   prod_pltlen_n,   prod_netwght_n,  prod_shipweight_n  ) VALUES  (   1880,   33.28,   '1880',   'STEEL NAILS 6''''',   'Y',   'STEEL NAILS',   5,   4,   38.39148,   42.39148 )

INSERT INTO product  VALUES  (   1880,   33.28,   '1880',   'STEEL NAILS 6''''',  'Y',   'STEEL NAILS',   5,   4,   38.39148,   42.39148 )

Note that the latter syntax requires values to be in exact order to match the column list.

Tip

Even though the second syntax is faster to type, the first one is easier to understand and maintain, so it makes more sense to use column names in your production code.

Inserting NULL and default values explicitly

In the examples above we insert literals; when a column name is skipped, the column is populated with NULL or with the column default value implicitly. But sometimes you may want to explicitly insert a NULL value into a column or make the RDBMS use the column's default value. That can be accomplished by using keywords NULL or DEFAULT, correspondingly. The statement below populates NULL in the PROD_PRICE_N column of the newly inserted row:

INSERT INTO product  (   prod_id_n,   prod_price_n,   prod_num_s,   prod_description_s,   prod_status_s,  prod_brand_s,   prod_pltwid_n,   prod_pltlen_n,   prod_netwght_n,  prod_shipweight_n  ) VALUES (   5786,   NULL,   '5786',   'CRATING MATERIAL 12X48X72',   'Y',   'CRATING MATERIAL',   5,   6,   20.37674,   23.37674 )

Here is another statement that inserts a row into the ORDER_HEADER table, explicitly populating multiple columns with nulls:

INSERT INTO order_header  VALUES  (   30670,   28,   NULL,   1,   24,   '523783',   NULL,    NULL,  NULL,   NULL,   'N',   'Y',   NULL,   'RR',    NULL )

Later, the NULL fields (for invoice number, notes, multiple dates, etc.) can be populated with the actual data using the UPDATE statement.

This statement creates a record in the SALESMAN table explicitly specifying to use a default value for the SALESMAN_STATUS_S column:

INSERT INTO salesman  VALUES  (   24,   '03',   'AMERICA GONZALES LIMITED',   DEFAULT )

Inserting values selected from other tables

A typical (but not the only) situation when you may want to insert values selected from other tables is when archiving on a periodic basis. For example, a table that holds shipments for a large company can grow dramatically over long periods. Assume a company that has a business rule stating it usually does not need any information about shipments that are older than 180 days on a regular basis, but still may need the old shipment data occasionally for special cases like auditing, etc. To improve performance, we can create the table SHIPMENT_ARCHIVE with exactly the same columns as in SHIPMENT (see Chapter 4 for details), and then, say once a month, insert into SHIPMENT_ARCHIVE all rows from SHIPMENT that are older than 180 days. The old records can then be removed from the SHIPMENT table using the DELETE statement (discussed later in this chapter). This statement archives shipment records older than 180 days using Oracle syntax:

INSERT INTO shipment_archive SELECT *  FROM shipment WHERE TRUNC(shipment_createdate_d) < TRUNC(SYSDATE) - 180;

Note

DB2 UDB and MS SQL Server have their own methods and functions to work with dates. See Chapter 10 for details.

Archiving Based on Complex Business Rules

The business rules set could be much more complicated than the one described in our archiving example; for instance, you may want to archive certain order information that is older than 90 days and is logically stored in two tables, ORDER_HEADER and ORDER_LINE. Assuming you need to archive order number, order date, customer id, product id, and shipped quantity, you create the archive table with appropriate columns first and then use the INSERT statement with a subquery to archive data (subqueries are discussed in Chapter 8):

CREATE TABLE order_archive AS  (SELECT  ordhdr_nbr_s,          ordhdr_orderdate_d,          ordhdr_custid_fn,           ordline_prodid_fn,           ordline_shipqty_n  FROM     order_header JOIN order_line  ON       ordhdr_id_n = ordline_ordhdrid_fn) DEFINITION ONLY INSERT   INTO order_archive SELECT   ordhdr_nbr_s,          ordhdr_orderdate_d,          ordhdr_custid_fn,           ordline_prodid_fn,           ordline_shipqty_n  FROM     order_header JOIN order_line  ON       ordhdr_id_n = ordline_ordhdrid_fn  WHERE    ordhdr_createdate_d < (CURRENT DATE - 90 DAYS)

This example is using DB2 syntax to create table ORDER_ARCHIVE that is slightly different from the other two RDBMS; please refer to Chapter 4 on how to create a table based on columns from other tables.

INSERT statement and integrity constraints

Inserting rows into a table obeys certain rules and restrictions. For example, all column values have to be of same or at least compatible data types and sizes with corresponding column definitions. There are some implementation-specific variations — for example, Oracle performs implicit conversions whenever possible (from character string data types to numeric, from dates to strings, etc.), and in DB2 you always have to explicitly convert values to a compatible data type — but in general there is no RDBMS that would allow you to insert a customer name into a numeric or date column. An error will be generated and the whole row (or even multiple rows) is rejected.

Caution

The problem with "one bad row" while performing a subquery-based insert is quite typical (and is usually quite annoying). Just imagine the situation where you have to select ten thousand rows, twenty columns in each, from multiple tables and insert the result into your destination table. Just one value in one row can cause the whole insert to fail if the value is bigger than the column definition allows. For example, if your destination table column is NUMERIC(5), an attempt to insert into this column any value greater than 99,999 will fail — as well as the whole INSERT statement that might have been running for a couple of hours before it did. The conclusion is simple: Be careful when designing your INSERT statements.

A similar problem happens when you try to insert a value that violates an integrity constraint. You cannot insert NULL values into NOT NULL columns; duplicate values will be rejected for UNIQUE and PRIMARY KEY columns, and so on. For example, the following statement would fail because of the CHECK constraint violation:

INSERT INTO salesman  (   salesman_id_n,    salesman_code_s,    salesman_name_s,   salesman_status_s  )  VALUES  (    26,   '07',   'ELMERSON INDUSTRIES INCORPORATED',  'A' )

The check constraint on SALESMAN_STATUS_S says the only two valid values for this column are Y and N, and we are trying to insert A. Simply change it to Y to make this statement work.

RDBMS error messages

Each vendor has its own specific set of error messages for certain events. The wording is different; what is common though — all our "big three" vendors mention the name of the violated constraint.

Oracle

ORA-02290: check constraint (ACME.CHK_SALESSTATUS) violated

DB2

SQL0545N The requested operation is not allowed because a row does not satisfy the check constraint  "ACME.SALESMAN.CHK_SALESSTATUS".

MS SQL Server

INSERT statement conflicted with COLUMN  CHECK constraint 'CHK_SALESSTATUS'.

It is much easier to identify and fix a problem fast if the constraint name is descriptive, that is, if it tells you right away which column in what table caused the problem.

INSERT statement vendor-related specifics

The generic INSERT statement syntax given at the beginning of this chapter includes the most important clauses common for our "big three" RDBMS as well as for SQL99 syntax. In this section, we will talk about some vendor-specific differences.

SQL99

The main difference between our generic syntax and one used as SQL99 standard is that SQL99 has an additional DEFAULT VALUES clause:

INSERT INTO <table_or_view_name> [(<column_name>,...)] { {VALUES (<literal> |             <expression> |            NULL,...)} |    {<select_statement>} |   {DEFAULT VALUES}}

The DEFAULT VALUES clause is explained in the "MS SQL Server" section of this chapter.

Note

Abstract Data Types (ADT)–related clauses are intentionally excluded from the syntax above; abstract data types are discussed in Chapter 17.

Oracle 9i

The INSERT statement in Oracle has many optional clauses. For example, it can return rows and store them in predeclared variables. (UPDATE and DELETE also have this functionality.) Also, you can perform multitable inserts (i.e., insert values derived from the returned rows into multiple tables within one insert statement) that can also be conditional — for example, rows that satisfy condition1 are inserted into TABLE1; rows that suit condition2 go to TABLE2, and so on. And you can use sequences in your INSERT statements, work with partitioned tables, and more.

Note

Tables (and indexes) in Oracle can be partitioned. A partitioned table consists of a number of parts called partitions, all of which have the same logical attributes. For example, all partitions in a table share the same column and constraint definitions, but may reside on different logical and/or physical devices. Partitioning is usually done by a DBA for performance improvement; see Oracle technical documentation for details.

Most of the options listed above are advanced and vendor-specific and therefore are out of the scope of this book. Using sequences in the INSERT statement, however, is quite typical for Oracle (primarily for populating the primary keys or other unique values).

The following example illustrates inserting a row into an ADDRESS table using sequence SEQ_ADDR:

CREATE SEQUENCE seq_addr;     INSERT INTO address  (   addr_id_n,   addr_custid_fn,   addr_salesmanid_fn,   addr_address_s,   addr_type_s,  addr_city_s,   addr_state_s,   addr_zip_s,   addr_country_s  )  VALUES  (   SEQ_ADDR.NEXTVAL,  NULL,   23,   '223 E FLAGLER ST.',   NULL,   'MIAMI',   'FL',   '33131',   'USA' );

Cross-References

Sequences are discussed in Chapters 4 and 5.

DB2 UDB 8.1

The main difference between DB2's INSERT and our generic syntax is the ability to insert multiple rows within a single VALUES clause of the INSERT statement. The following statement inserts four rows into STATUS table at once:

INSERT INTO status  (   status_id_n,    status_code_s,    status_desc_s  )  VALUES   ( 2, '20', 'COMPLETE'),  ( 6, '60', 'SHIPPED'),  ( 8, '70', 'INVOICED'),  ( 9, '80', 'CANCELLED')

You can also use sequences in your INSERT statements (using syntax slightly different than in Oracle):

CREATE SEQUENCE seq_addr     INSERT INTO address  (   addr_id_n,   addr_custid_fn,   addr_salesmanid_fn,   addr_address_s,   addr_type_s,  addr_city_s,   addr_state_s,   addr_zip_s,   addr_country_s  )  VALUES  (   NEXTVAL FOR SEQ_ADDR,  NULL,   23,   '223 E FLAGLER ST.',   NULL,   'MIAMI',   'FL',   '33131',   'USA' )

The last thing to mention is identity columns (discussed in Chapter 4). When a row is inserted into a table that has an identity column, DB2 generates a value for the identity column. This value is always generated for a GENERATED ALWAYS identity column — in fact, if you explicitly try to insert a value (other than DEFAULT) into such column, an error will be generated. For a GENERATED BY DEFAULT column, DB2 generates a value only if it is not explicitly specified with a VALUES clause or with a subquery.

MS SQL Server 2000

In addition to the generic INSERT syntax clauses, MS SQL Server allows you to use the DEFAULT VALUES clause and to manipulate with identity columns.

The DEFAULT VALUES clause can be used to insert into a table with each column having a value that can be used when no explicit value is specified (DEFAULT, IDENTITY, NULL OR TIMESTAMP). In other words, the DEFAULT VALUES option is used to add rows without supplying explicit values:

1> CREATE TABLE defaults 2> ( 3>   c1 int identity, 4>   c2 varchar(30) DEFAULT ('column default'), 5>   c3 timestamp, 6>   c4 int NULL 7> ) 8> 9> INSERT INTO defaults 10> DEFAULT VALUES 11> 12> SELECT * 13> FROM defaults 14> 15> GO (1 row affected)  c1          c2             c3                c4  ----------- -------------- ----------------- ------------            1 column default 0x00000000000002A9        NULL     (1 row affected)

Trying to explicitly insert values into an identity column generates an error unless you override the identity property of the column using Transact-SQL syntax (see below). In addition, you have to use the column_name list in your INSERT statement.

The following example illustrates this concept using the PAYMENT_TERMS table created with PAYTERMS_ID_N as an identity column (see Chapter 4):

-- Trying to insert with IDENTITY_INSERT option off fails 1> INSERT INTO payment_terms 2> ( 3>  payterms_id_n, 4>  payterms_code_s, 5>  payterms_desc_s, 6>  payterms_discpct_n, 7>  payterms_daystopay_n 8> ) 9> VALUES 10> ( 11>  26, 12>  'N30', 13>  'NET 30', 14>  0, 15>  30 16> ) 17> GO     Msg 544, Level 16, State 1, Server PD-TS-TEST1, Line 1  Cannot insert explicit value for identity column in table  'payment_terms' when IDENTITY_INSERT is set to OFF.

-- Set IDENTITY_INSERT option on; insert succeeds. 1> SET IDENTITY_INSERT payment_terms ON 2> GO     1> INSERT INTO payment_terms 2> ( 3>  payterms_id_n, 4>  payterms_code_s, 5>  payterms_desc_s, 6>  payterms_discpct_n, 7>  payterms_daystopay_n 8> ) 9> VALUES 10> ( 11>  26, 12>  'N30', 13>  'NET 30', 14>  0, 15>  30 16> ) 17> GO     (1 row affected)