Data Modification Guidelines

< Day Day Up >

All of the guidelines thus far in this chapter have explored ways to make retrieving data more efficient. But data also must be modified, and you need to ensure that data modification is performed efficiently , too. Under normal circumstances, you can modify data in a DB2 table in six ways:

Using an SQL UPDATE statement
Using an SQL INSERT statement
Using an SQL DELETE statement
Because of a referential constraint specifying ON DELETE CASCADE or ON DELETE SET NULL
Because a trigger is fired as the result of an UPDATE , INSERT , or DELETE , and the trigger issues a SQL data modification statement
Using the DB2 LOAD utility

This section provides tips for the efficient implementation of the first three methods . You can find guidelines for the others as follows :

For referential integrity, in Chapter 5
For triggers, in Chapter 8, "Using DB2 Triggers for Integrity"
For using the LOAD utility, as well as the other DB2 utilities, in Chapter 33, "Data Organization Utilities."

Limit Updating Indexed Columns

When you update columns in indexes, a corresponding update is applied to all indexes in which the columns participate. Updating can have a substantial impact on performance due to the additional I/O overhead.

Use `FOR UPDATE OF` Correctly

Specify only those columns that actually will or can be updated in the FOR UPDATE OF column list of a cursor. DB2 does not use any index that contains columns listed in the FOR UPDATE OF clause.

Consider Using `DELETE/INSERT` Instead of `FOR UPDATE OF`

If all columns in a row are being updated, use DELETE on the old row and use INSERT on the new one rather than the FOR UPDATE OF clause.

Update Multiple Rows

You have two options for updating data using the SQL UPDATE verb:

A cursor UPDATE using WHERE CURRENT OF
A direct SQL UPDATE

If the data does not have to be retrieved by the application before the update, use the direct SQL UPDATE statement.

A cursor UPDATE with the WHERE CURRENT OF option performs worse than a direct UPDATE for two reasons. First, the rows to be updated must be retrieved from the cursor a row at a time. Each row is fetched and then updated. A direct UPDATE affects multiple rows with one statement. Second, when using a cursor, you must add the overhead of the OPEN and CLOSE statement.

Update Only Changed Columns

UPDATE statements should specify only columns in which the value will be modified. For example, if only the ACSTAFF column of the DSN8810.PROJACT table should be changed, do not code the following:

 EXEC SQL      FETCH C1      INTO :HV-PROJNO, :HV-ACTNO, :HV-ACSTAFF,           :HV-ACSTDATE, :HV-ACENDATE END-EXEC. MOVE 4.5 TO HV-ACSTAFF. UPDATE DSN8810.PROJACT        SET PROJNO   = :HV-PROJNO,            ACTNO    = :HV-ACTNO,            ACSTAFF  = :HV-ACSTAFF,            ACSTDATE = :HV-ACSTDATE,            ACENDATE = :HV-ACENDATE WHERE CURRENT OF C1;

Although the host variables contain the same data currently stored in the table, you should avoid this type of coding. DB2 checks to see whether the data is different before performing the update. If none of the values are different from those already stored in the table, the update does not take place. Performance may suffer, though, because DB2 has to perform the value checking. You can avoid this situation by coding the UPDATE statement as follows:

 UPDATE DSN8810.PROJACT        SET ACSTAFF = :HV-ACSTAFF WHERE CURRENT OF C1;

Disregard this guideline when the application you are developing requires you to code a complicated check algorithm that DB2 can perform automatically. Because of the complexity of the code needed to check for current values, implementing this type of processing is not always feasible . Nevertheless, try to avoid specifying useless updates of this type when issuing interactive SQL.

Consider Dropping Indexes Before Large Insertions

When you execute a large number of INSERT s for a single table, every index must be updated with the columns and the appropriate RIDs (record IDs) for each inserted row. For very large insertions, the indexes can become disorganized, causing poor performance. Dropping all indexes for the table, performing the INSERT s, and then re-creating the indexes might be more efficient. The trade-offs to consider are

The overhead of updating indexes must be weighed against the index re-creation plus the rebinding of all application plans that used the indexes.
If indexes are required to enforce uniqueness, another method must be selected to enforce uniqueness before or after the indexes are rebuilt.
After the indexes are rebuilt, RUNSTATS must be executed before any binds are executed or dynamic SQL is allowed. RI rules may need to be checked.

Exercise Caution When Issuing Ad Hoc `DELETE` Statements

Be extremely careful when issuing SQL DELETE statements outside the control of an application program. Remember that SQL acts on a set of data, not just one row. All rows that qualify based on the SQL WHERE clause are updated or deleted. For example, consider the following SQL statement:

 DELETE FROM DSN8810.DEPT;

This SQL statement, called a mass DELETE , effectively deletes every row from the DEPT table. Normally, this result is undesirable.

Exercise Caution When Issuing Ad Hoc `UPDATE` Statements

When issuing an ad hoc UPDATE , take care to specify an appropriate WHERE clause. Consider the following SQL statement:

 UPDATE DSN8810.DEPT SET DEPTNAME = 'NEW DEPARTMENT';

This SQL statement changes the value of the DEPTNAME column for every row in the table to the value 'NEW DEPARTMENT' . This result occurs because no WHERE clause is coded to limit the scope of the UPDATE . Requests of this nature are not usually desirable and should be avoided.

Mass `DELETE` Versus `LOAD`

Sometimes you need to empty a table. You can do so by issuing a mass DELETE or by loading an empty data set. A mass DELETE usually is more efficient when you're using segmented table spaces. Loading an empty data set usually is more efficient when you're using simple or partitioned table spaces.

Consider Using a Self-Referencing `DELETE`

With DB2 Version 7 comes the ability to code a self-referencing sub- SELECT on searched UPDATE and DELETE statements. In previous releases of DB2, the WHERE clause cannot refer to the table (or view) being modified by the statement. For example, the following SQL is legitimate as of DB2 V7, and can be used to implement a 10% raise for employees who earn less than their department's average salary:

 UPDATE DSN8710.EMP E1 SET SALARY = SALARY * 1.10 WHERE SALARY < (SELECT AVG(SALARY)                 FROM   DSN8710.EMP E2                 WHERE  E1.WORKDEPT = E2.WORKDEPT);

DB2 will evaluate the complete subquery before performing the requested UPDATE .

Use `INSERT` and `UPDATE` to Add Long Columns

The maximum length of a string literal that can be inserted into DB2 is 255 characters . This restriction poses a problem when you must insert a LONG VARCHAR column in an ad hoc environment.

To get around this limitation, issue an INSERT followed immediately by an UPDATE . For example, if you need to insert 260 bytes of data into a LONG VARCHAR column, begin by inserting the first 255 bytes as shown:

 INSERT INTO  your.table  COLUMNS    (LONG_COL,  other columns  ) VALUES     ('   first 255 bytes of LONG_COL   ',  other values  );

Follow the INSERT with an UPDATE statement to add the rest of the data to the column, as in the following example:

 UPDATE  your.table  SET LONG_COL = LONG_COL  '   remaining 5 bytes of LONG_COL   ', WHERE KEY_COL = 'key value';

For this technique to be successful, a unique key column (or columns) must exist for the table. If each row cannot be uniquely identified, the UPDATE cannot be issued because it might update more data than you want.

CAUTION

Prior to DB2 V6 (and DB2 V5 with a retrofit APAR) the maximum length of a string literal that could be inserted into DB2 was limited to 254 characters, instead of 255.

Delete and Update with a Self-Referencing Sub- `SELECT`

The ability to code a self-referencing sub- SELECT on searched UPDATE and DELETE statements was added to DB2 Version 7. In previous releases of DB2, the WHERE clause cannot refer to the table (or view) being modified by the statement. For example, the following SQL is legitimate as of DB2 V7, and can be used to implement a 10% raise for employees who earn less than their department's average salary:

 UPDATE DSN8810.EMP E1 SET SALARY = SALARY * 1.10 WHERE SALARY < (SELECT AVG(SALARY)                 FROM   DSN8810.EMP E2                 WHERE  E1.WORKDEPT = E2.WORKDEPT);

DB2 will evaluate the complete subquery before performing the requested UPDATE .

`SELECT` from an `INSERT` to Retrieve Generated Values

DB2 Version 8 offers the intriguing new ability to SELECT from an INSERT statement. To understand why we might need to do so, let's first review some background data. In some cases, it is possible to perform actions that generate new data for an inserted row before it gets saved to disk. For example, a BEFORE TRIGGER might change data before it is even recorded to disk, but the application program will not have any knowledge of this change that is made in the trigger. Using general registers, identity columns, and user -defined defaults can produce similar effects.

What can be done if the program needs to know the final column values? Prior to DB2 V8, this is difficult and inefficient to implement. For example, to retrieve that last identity column value, a separate singleton SELECT statement is needed to retrieve the result of IDENTITY_VAL_LOCAL() into a host variable. The SELECT FROM INSERT syntax solves this problem. It allows you to both insert the row and retrieve the values of the columns with a single SQL statement. It performs very well, because it performs both the INSERT and the SELECT as a single operation. Consider the following example:

 SELECT PRSTDATE FROM  INSERT (PROJNO, DEPTNO, RESPEMP, PRSTDATE)  INTO DSN8810.PROJ  VALUES('000001', 'B01', '000020', CURRENT DATE);

The data is inserted as specified in the VALUES clause, and then retrieved as specified in the SELECT . Without the ability to select PRSTDATE , the program would have no knowledge of the value supplied to PRSTDATE , because it was assigned using CURRENT DATE . Instead, the program would have had to issue a separate SELECT statement to retrieve the actual date. With this new syntax, the program can retrieve the CURRENT DATE value that was just inserted into PRSTDATE without adding overhead.

List Columns for `INSERT`

When you are coding an INSERT statement in an application program, list the column names for each value you are inserting. Although you could merely align the values in the same order as the column names in the table, doing so leads only to confusion. Furthermore, if ALTER is used to add new columns to the table, every INSERT statement that does not explicitly list the columns being inserted will fail. The proper format is

 INSERT INTO DSN8810.DEPT         (DEPTNO,          DEPTNAME,          MGRNO,          ADMRDEPT)     VALUES         ('077',          'NEW DEPARTMENT',          '123456',          '123') ;

Consider Using Multi-Row `INSERT` s

When you need to INSERT multiple rows, consider deploying a multi-row INSERT to place more than one row at a time into the table. This capability is new as of DB2 Version 8. By inserting multiple rows at once, your request may become more efficient. More information on multi-row INSERT s is provided in Chapter 11.

< Day Day Up >

Limit Updating Indexed Columns

Use FOR UPDATE OF Correctly

Consider Using DELETE/INSERT Instead of FOR UPDATE OF

Update Multiple Rows

Update Only Changed Columns

Consider Dropping Indexes Before Large Insertions

Exercise Caution When Issuing Ad Hoc DELETE Statements

Exercise Caution When Issuing Ad Hoc UPDATE Statements

Mass DELETE Versus LOAD

Consider Using a Self-Referencing DELETE

Use INSERT and UPDATE to Add Long Columns

Delete and Update with a Self-Referencing Sub- SELECT

SELECT from an INSERT to Retrieve Generated Values

List Columns for INSERT

Consider Using Multi-Row INSERT s