Leveraging Advanced SQL | Intelligent Enterprises of the 21st Century

Some lesser-known SQL features exist that can dramatically improve performance. Namely, SELECT from INSERT, UPDATE, and DELETE, and the MERGE SQL statement. These SQL enhancements can greatly improve performance by unifying one or more SQL statements.

The SQL features discussed in this section (NEW TABLE, OLD TABLE, FINAL TABLE, and MERGE SQL statements) are currently supported on LUW and partially on zSeries. On zSeries, FINAL TABLE is supported with INSERT statements (you can learn more details in upcoming sections).

Combining SELECT with INSERT, UPDATE, and DELETE

Consider the procedure in Figure 10.1, which deletes an employee using an employee number and returns the name of the deleted employee as output. This procedure works on all platforms.

Figure 10.1. Fetching and deleting a row using two SQL statements.

 CREATE PROCEDURE delete_employee1 ( IN p_empno VARCHAR(6)                                   , OUT p_fullname VARCHAR(100))     LANGUAGE SQL     SPECIFIC delete_employee1                     -- applies to LUW and iSeries  --WLM ENVIRONMENT <env>                          -- applies to zSeries de: BEGIN     DECLARE EXIT HANDLER FOR NOT FOUND         SET p_fullname = 'employee not found';     SELECT firstnme || ' ' || lastname INTO p_fullname       FROM EMPLOYEE       WHERE EMPNO = p_empno;     DELETE FROM EMPLOYEE      WHERE EMPNO = p_empno; END de

To execute this procedure, you can use

 CALL delete_employee1 ('000210',?)

The key observation here is that the procedure must execute the same query twice: first to retrieve the employee name, and then again to delete the employee row (you can think of UPDATEs and DELETEs as SELECT statements with side effects). This procedure looks straight forward, and at first glance you would think it cannot be improved. However, as demonstrated in Figure 10.2, the procedure can be optimized to use just one SQL statement.

Figure 10.2. Fetching and deleting a row using a single SQL statement on LUW.

 CREATE PROCEDURE delete_employee2 ( IN p_empno VARCHAR(6)                                   , OUT p_fullname VARCHAR(100) )     LANGUAGE SQL     SPECIFIC delete_employee2 de: BEGIN     -- OLD TABLE clause currently valid on LUW only     DECLARE EXIT HANDLER FOR NOT FOUND         SET p_fullname = 'employee not found';     SELECT firstnme || ' ' || lastname INTO p_fullname       FROM OLD TABLE (DELETE FROM EMPLOYEE WHERE EMPNO = p_empno); --1 END de

NEW TABLE and OLD TABLE

Whenever a table is inserted into, updated, or deleted from, DB2 maintains something internally called transition tables. These transition tables are referred to as the NEW TABLE and OLD TABLE. In the case of INSERT or UPDATE, the NEW TABLE contains the new row to be inserted or the new value to use for update. In the case of UPDATE or DELETE, the OLD TABLE contains the old value before an update or the value that is to be deleted.

By using NEW TABLE or OLD TABLE with a SELECT statement, you can effectively combine SELECTs with INSERT, UPDATE, and DELETE for situations such the one described in the previous example. For clarity, this concept can be illustrated in the context of INSERT, UPDATE, and DELETE independently.

SELECT from DELETE is most useful when you need to be able to delete from a table and return information about the row that was deleted. You have already seen an example of this. Another good example is where a table is being used as a queue, and you must consume from the queue (delete the row) and then do something else with the de-queued data.

     CREATE TABLE queue (data int)     INSERT INTO queue VALUES (1),(2),(3)     SELECT *     FROM OLD TABLE (DELETE FROM (SELECT * FROM queue     FETCH FIRST ROW ONLY))

SELECT from UPDATE is useful when you need to know something about either the new or old values of the row being updated. A good example of this is where you give an employee a 10 percent raise but you also want to know his or her original salary.

     SELECT salary     FROM OLD TABLE     (UPDATE EMPLOYEE SET salary = salary * 1.1 WHERE EMPNO= '000340')

Similarly, if you wanted to retrieve the new salary, you can use NEW TABLE instead:

     SELECT salary     FROM NEW TABLE     (UPDATE EMPLOYEE SET salary = salary * 1.1 WHERE EMPNO= '000340')

What if you wanted to retrieve both the old salary and new salary in a single statement? This topic is discussed in the next subsection on INCLUDE columns.

SELECT from INSERT is useful if you want to know about the value being inserted and that value is not immediately available. A good example of this is where you are inserting into a table with an identity column. Normally, you would call the function IDENTITY_VAL_LOCAL to retrieve the value used for the most recently inserted identity value.

When more than one row is inserted using a single SQL statement, IDENTITY_VAL_LOCAL cannot return all generated identity values. In other words, if the table identity_tab was defined as follows:

      CREATE TABLE identity_tab      (id INT NOT NULL GENERATED ALWAYS AS IDENTITY, data VARCHAR(100))

the function IDENTITY_VAL_LOCAL cannot be used after statements, such as

      INSERT INTO identity_tab (data) VALUES ('A'), ('B'), ('C');

or (t2 has multiple rows)

      INSERT INTO identity_tab (data) SELECT data FROM t2

The procedure in Figure 10.3 demonstrates how you can use NEW TABLE to retrieve all generated identity values after a multi-row insert.

Figure 10.3. Using NEW TABLE to retrieve identity values for a multi-row insert on LUW.

 CREATE PROCEDURE insert_identities()     LANGUAGE SQL     SPECIFIC insert_identities ii: BEGIN -- This procedure requires a table defined as follows: -- CREATE TABLE identity_tab (id INT NOT NULL GENERATED ALWAYS AS IDENTITY, data VARCHAR(100));     DECLARE c1 CURSOR WITH RETURN TO CALLER FOR         SELECT id FROM NEW TABLE (INSERT INTO identity_tab (data) SELECT deptname FROM department);     OPEN c1; END ii

When you call this procedure, the department names are inserted into the identity_tab table, and identity values are generated for each row. The cursor returns to the client all the identity column values used for the multi-row insert.

Note

The INSERT operation is not executed until the cursor is opened.

INCLUDE

The previous section discussed how an UPDATE statement has access to both NEW TABLE and OLD TABLE values. The example demonstrated how you can retrieve the old or new salary of an employee after a salary increase of 10 percent. The SQL statements are repeated here for the purpose of introducing INCLUDE columns.

 SELECT salary FROM OLD TABLE (UPDATE EMPLOYEE SET salary = salary * 1.1 WHERE EMPNO= '000340') SELECT salary FROM NEW TABLE (UPDATE EMPLOYEE SET salary = salary * 1.1 WHERE EMPNO= '000340')

By using NEW TABLE and OLD TABLE alone, however, it is not possible to retrieve both the old and new salary value. An INCLUDE column can be used to fill this gap.

 SELECT salary as new_salary, old_salary   FROM NEW TABLE ( UPDATE employee INCLUDE (old_salary DECIMAL(9,2))                       SET salary = salary * 1.10,                           old_salary = salary                    WHERE empno='000340')

The INCLUDE clause in the nested UPDATE statement creates a new column that can be selected from the outer SELECT statement. You can see that the old_salary retrieves the old salary value while the table column salary is increased by 10 percent. If p_empno is 000340, the output would be

 NEW_SALARY  OLD_SALARY ----------- -----------    26224.00    23840.00

FINAL TABLE

In addition to OLD TABLE and NEW TABLE, there is also something called FINAL TABLE. When executing INSERT, UPDATE, or DELETE, there may still be AFTER triggers or referential constraints that result in further modification of data. FINAL TABLE can be used to ensure that such changes cannot occur.

Note

On zSeries, FINAL TABLE is supported for INSERT statements only.

Consider the following example where, using the same identity_tab table defined earlier in this section, an AFTER TRigger is created that sets all data values to NULL. That is, any data value provided on INSERT to table identity_tab always is wiped out. While this is not particularly useful, it clearly demonstrates the difference between NEW TABLE and FINAL TABLE.

 CREATE TRIGGER trig1     AFTER INSERT ON identity_tab REFERENCING NEW AS n     FOR EACH ROW MODE DB2SQL     UPDATE identity_tab SET data=null

If you used NEW TABLE after performing the INSERT, the SELECT would retrieve the inserted data value even though the AFTER trigger ultimately sets it to NULL.

 SELECT data FROM NEW TABLE (INSERT INTO identity_tab VALUES (default,'x')) DATA -------------------------------- x   1 record(s) selected.

However, when using FINAL table, the SQL statement is not allowed to succeed because DB2 sees the AFTER TRigger. This protects you from any unforeseen side effects not visible to the application.

 SELECT data  FROM FINAL TABLE (INSERT INTO identity_tab VALUES (default,'x')) SQL0989N AFTER trigger "TRIG1" attempted to modify a row in table "IDENTITY_TAB" that was modified by an SQL data change statement within a FROM clause. SQLSTATE=560C3

Deleting a Few Rows at a Time

When you want to delete a large amount of data, one major concern can be transaction log space consumption. There are not many situations more painful than starting a DELETE operation that runs for hours, only to be forced to roll back due to a log full condition.

Tip

On LUW, if you want to purge all data from a table rather than using DELETE, you can use ALTER TABLE table-name ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE to purge the table data without any transaction logging.

And even if you do have sufficient log space, you generally do not want to have a single transaction consume the majority of active online log files. The solution is to delete from the table in groups of rows with intermittent COMMITs. Figure 10.4 provides an example of how to perform deletes in this manner.

Figure 10.4. Deleting from a table in groups on LUW.

 CREATE PROCEDURE chunk_delete (OUT p_batches INT)     LANGUAGE SQL     SPECIFIC chunk_delete cd: BEGIN     -- This example is currently supported on LUW only     DECLARE SQLSTATE CHAR(5) DEFAULT '00000';     DECLARE v_cnt INT DEFAULT 0;     DECLARE rows_deleted INT DEFAULT 1;     REPEAT         SET v_cnt = v_cnt + 1;         DELETE FROM (SELECT 1 FROM EMP_ACT                         WHERE YEAR(EMENDATE) < YEAR(current date)                         FETCH FIRST 10 ROWS ONLY);         GET DIAGNOSTICS rows_deleted = ROW_COUNT;         COMMIT;         UNTIL rows_deleted = 0     END REPEAT;     SET p_batches = v_cnt; END cd

The emp_act table can grow quickly because employees log their project activity hours. As time passes, older data can be purged. In Figure 10.4, data that is older than the current year is being purged.

The key feature of this procedure is the DELETE from SELECT statement that limits its result set to a fixed number of rows using FETCH FIRST ... ROWS ONLY. In this example, the rows are deleted in groups of 10 and committed (however, in your own situations, you should find that significantly higher values in the hundreds or thousands will yield significantly better performance). This is repeated until no rows are left to delete as determined by GET DIAGNOSTICS.

MERGE Statement

The MERGE SQL statement is used to unify INSERT, UPDATE, and DELETE statements into one, with which may result in better performance.

Tip

MERGE is currently available on LUW. However, the procedure that implements merging of data without the MERGE statement (see Figure 10.6) works for all platforms.

To demonstrate, consider an example where you have a staging table that contains both updates to existing data and new data to be inserted into a primary table. The following tables and data are used to support the following examples.

Figure 10.5. Scenario setup for merging data from two tables.

 CREATE TABLE master_table (id INT NOT NULL, data VARCHAR(100)); CREATE UNIQUE INDEX ix1 ON master_table (id); ALTER TABLE master_table ADD CONSTRAINT master_pk PRIMARY KEY (id); CREATE TABLE stage_table LIKE master_table; INSERT INTO master_table VALUES (1,'a'); INSERT INTO master_table VALUES (2,'b'); INSERT INTO master_table VALUES (3,'c'); INSERT INTO master_table VALUES (4,'d'); INSERT INTO stage_table VALUES (1,'A'); INSERT INTO stage_table VALUES (3,'C'); INSERT INTO stage_table VALUES (10,'j'); INSERT INTO stage_Table VALUES (11,'k');

The table master_table in this case is the primary table, whereas stage_table is the staging table. If a row in stage_table also exists in master_table, the existing row in master_table should be updated. If a row in stage_table does not exist in master_table, it should be inserted into master_table. Figure 10.6 demonstrates how one might do this.

Figure 10.6. Merging data from two tables using INSERT and UPDATE statements.

 CREATE PROCEDURE old_merge ()     LANGUAGE SQL     SPECIFIC old_merge                 -- applies to LUW and iSeries   --WLM ENVIRONMENT <env>              -- applies to zSeries om: BEGIN     DECLARE v_id INT;     DECLARE v_data VARCHAR(100);     DECLARE SQLSTATE CHAR(5) DEFAULT '00000';     DECLARE cur CURSOR FOR         SELECT id, data FROM stage_table;                       -- (1)     DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'          -- Primary key violation. Perform update instead.          UPDATE master_table SET data = v_data WHERE id = v_id; -- (2)     OPEN cur;     FETCH cur INTO v_id, v_data;     WHILE (SQLSTATE = '00000') DO         INSERT INTO master_table VALUES (v_id, v_data);         -- (3)         FETCH cur INTO v_id, v_data;     END WHILE; END om

The procedure works by creating a cursor on Line (1) for table stage_table and attempts insert each row into master_table on Line (3). Should the insert fail due to a primary key violation, the row must already exist in master_table. The CONTINUE handler on Line (2) then performs an UPDATE instead using the primary key value of the row to be merged.

The procedure can be dramatically simplified and execute with greater efficiency using a MERGE statement. Compare Figure 10.6 with Figure 10.7.

Figure 10.7. Merging data from two tables using the MERGE statement on LUW.

 CREATE PROCEDURE new_merge()     LANGUAGE SQL     SPECIFIC new_merge nm: BEGIN     -- the MERGE statement is currently support on LUW only     MERGE INTO master_table as M     USING (SELECT id, data FROM stage_table) AS S        ON M.id=S.id     WHEN NOT MATCHED THEN         INSERT VALUES (S.id, S.data)     WHEN MATCHED THEN         UPDATE SET M.data=S.data; END nm

Now that you have seen a basic example, additional detail about the MERGE statement can be introduced. The MERGE statement can be broken down as follows:

 MERGE INTO <target> USING <source> ON <match-condition>  {WHEN [NOT] MATCHED [AND <predicate>]   THEN [UPDATE SET ...|DELETE|INSERT VALUES ....|SIGNAL ...]}  [ELSE IGNORE]

The <target> is the primary table and the <source> is a SELECT statement defining the source of rows to be merged into the primary table. In Figure 10.7, the source was simply the set of all rows from table stage_table. The source, however, can be any valid SELECT statement and might even join multiple tables.

The <match condition> is a unique column value that defines the relationship between the source and target tables. In the previous example, this was the primary key column shared by both master_table and stage_table.

Once the source and target and their relationship has been defined, you can decide what to do when there is or is not a match. You can even have multiple matched and not matched conditions.

MERGE has a number of intricate details that are best served by the DB2 SQL Reference. Refer to the SQL Reference for more examples or if you need additional information.