Application Development Guidelines

< Day Day Up >

The guidelines in this section aid you in coding more efficient DB2 application programs by

Coding efficient embedded SQL
Coding efficient host language constructs to process the embedded SQL
Reducing concurrency
Promoting the development of easily maintainable code

When you're designing a DB2 program, you can easily get caught up in programming for efficiency, thereby compromising the effectiveness of the program. Efficiency can be defined as "doing things right," whereas effectiveness can be defined as "doing the right thing."

Design embedded SQL programs to be as efficient as possible (following the guidelines in this book) without compromising the effectiveness of the program. Gauge program efficiency by the following criteria:

CPU time
Elapsed time
Number and type of I/Os
Lock wait time
Transaction throughput

For a thorough discussion of DB2 performance monitoring and tuning, consult Parts IV and V. Gauge program effectiveness by the following criteria:

User satisfaction
Expected results versus actual results
Integrity of the processed data
Capability to meet prearranged service-level requirements

Avoid "Black Boxes"

Often, DB2 professionals are confronted with the "black box" approach to database access. The basic idea behind a "black box" is that instead of having programs issue direct requests to the database, they will make requests of a "black box" data engine. This "black box" program is designed to accept parameter-driven requests and then issue common SQL statements to return results to the program.

So, a black box is a database access program that sits in between your application programs and DB2. It is designed so that all application programs call the black box for data instead of containing their own embedded SQL statements. The general idea behind such a contraption is that it will simplify DB2 development, because programmers will not need to know how to write SQL. Instead, the programmer just calls the black box program to request whatever data is required. SQL statements become calls ”and every programmer knows how to code a call, right?

Basically, proponents of the "black box" solution believe that access to data by calling a program with parameters is easier than learning SQL. But the "black box" approach is complete rubbish and should be avoided at all costs. The proper way to formulate requests for DB2 data is by coding well-designed, efficient SQL statements. A "black box" will never be able to completely mimic the functionality of SQL. Furthermore, the "black box" approach is sure to cause performance problems because it will have been coded for multiple users and will forgo the efficient SQL design techniques discussed in Chapter 2, "Data Manipulation Guidelines."

For example, what if the data access requirements of the programs calling the "black box" call for the following:

One program requires three columns from TABLE1
A second program requires two columns from TABLE1
Two additional programs require four columns from TABLE1
A fifth program requires all of the columns from TABLE1

In this case, the "black box" is almost sure to be designed with a single SQL SELECT that returns all of the columns from TABLE1 . But, based on which program calls the "black box," only the required rows would be returned. We know this is bad SQL design because we should always return the absolute minimum number of columns and rows per SQL SELECT statement to optimize performance. But even if four different SELECT statements were used by the "black box," if requirements change, so must the "black box." The additional maintenance required for the "black box" program adds unneeded administrative overhead. Furthermore, the "black box" program is a single-point-of-failure for any application that uses it.

Perhaps more importantly, using a "black box" always consumes more CPU than simply using embedded SQL statements in your program. The black box application requires additional lines of code to be executed than would be without the black box. It is elementary when you think about it. The CALL statement in the calling program is extra, and the code surrounding the statements in the black box that ties them together is extra. None of this is required if you just plug your SQL statements right into your application programs.

This extra code must be compiled and executed. When extra code is required (no matter how little or efficient it may be) extra CPU will be expended to run the application. More code means more work, and that means degraded performance.

All in all, "black boxes" provide no benefit but at significant cost. Application programs should be designed using SQL to access DB2 data. No "black boxes" should be allowed.

Code Modular DB2 Programs

You should design DB2 programs to be modular. One program should accomplish a single, well-defined task. If you need to execute multiple tasks, structure the programs so that tasks can be strung together by having the programs call one another. This approach is preferable to a single, large program that accomplishes many tasks for two reasons. First, single tasks in separate programs make the programs easier to understand and maintain. Second, if each task can be executed either alone or with other tasks, isolating the tasks in a program enables easier execution of any single task or list of tasks .

Minimize the Size of DB2 Programs

Code DB2 programs to be as small as possible. Streamlining your application code to remove unnecessary statements results in better performance. This recommendation goes hand-in-hand with the preceding one.

Consider Stored Procedures for Reuseability

When you're modularizing a DB2 application, do so with an eye toward reuseability. Whenever a particular task must be performed across many programs, applications, or systems, consider developing a stored procedure. A stored procedure, after it is created, can be called from multiple applications. However, when you modify the code, you need to modify only the stored procedure code, not each individual program.

For more information on stored procedures, refer to Chapter 15.

Consider User-Defined Functions for Reuseability

If your organization relies on business rules that transform data, consider implementing user-defined functions. Data transformation tasks that are performed by many programs, applications, or systems, can benefit from the reuseability aspects of user-defined functions. Consider developing user-defined functions for the business rule and then using it in subsequent SQL statements. This reuse is preferable to coding the business rule into multiple applications because

You can be sure the same code is being used in all programs.
You can optimize the performance of the UDF and impact multiple programs at once, instead of requiring massive logic changes in many programs.
When the rule changes, you need to modify the UDF once, not in each individual program.

For more information on user-defined functions, refer to Chapter 4, "Using DB2 User-Defined Functions and Data Types."

Be Aware of Active Database Constructs

You can create active DB2 databases using features such as referential integrity and triggers. An active database takes action based on changes to the state of the data stored in it. For example, if a row is deleted, subsequent activity automatically occurs (such as enforcing a DELETE CASCADE referential constraint or an INSERT trigger firing that causes other data to be modified).

You need to be aware of the active database features that have been implemented to appropriately code DB2 application programs. This awareness is required because you need to know the processes that the database itself will automatically perform so your application programs do not repeat the process.

Use Unqualified SQL

Use unqualified table, view, synonym, and alias names in application programs. This way, you can ease the process of moving programs, plans, and packages from the test environment to the production environment. If tables are explicitly qualified in an application program, and tables are qualified differently in test DB2 than they are in production DB2, programs must be modified before they are turned over to an operational production application.

When the program is bound, the tables are qualified by one of the following:

If neither the OWNER nor QUALIFIER parameter is specified, tables are qualified by the userid of the binding agent.
If only the OWNER is specified, tables are qualified by the token specified in the OWNER parameter.
If a QUALIFIER is specified, all tables are qualified by the token specified to that parameter.

Avoid `SELECT *`

Never use SELECT * in an embedded SQL program. Request each column that needs to be accessed. Also, follow the SQL coding recommendations in Chapter 2.

Filter Data Using the SQL `WHERE` Clause

Favor the specification of DB2 predicates to filter rows from a desired results table instead of the selection of all rows and the use of program logic to filter those not needed. For example, coding the embedded SELECT

 SELECT  EMPNO, LASTNAME, SALARY FROM    EMP WHERE   SALARY > 10000

is preferred to coding the same SELECT statement without the WHERE clause and following the SELECT statement with an IF statement:

 IF SALARY < 10000     NEXT SENTENCE ELSE  Process data  .

The WHERE clause usually outperforms the host language IF statement because I/O is reduced.

Use SQL to Join Tables

To join tables, favor SQL over application logic, except when the data retrieved by the join must be updated. In this situation, consider coding multiple cursors to mimic the join process. Base the predicates of one cursor on the data retrieved from a fetch to the previous cursor.

Listing 11.4 presents pseudo-code for retrieving data from a cursor declared with an SQL join statement.

Listing 11.4. Pseudo-code for Retrieving Data from an SQL Join

 EXEC SQL     DECLARE JOINCSR CURSOR FOR     SELECT  D.DEPTNO, D.DEPTNAME, E.EMPNO, E.SALARY     FROM    DEPT    D,             EMP     E     WHERE   D.DEPTNO = E.WORKDEPT END-EXEC. EXEC SQL     OPEN JOINCSR END-EXEC.  Loop until no more rows returned or error  EXEC SQL         FETCH JOINCSR         INTO :DEPTNO, :DEPTNAME, :EMPNO, :SALARY     END-EXEC     Process retrieved data  end of loop

The criteria for joining tables are in the predicates of the SQL statement. Compare this method to the application join example in Listing 11.5. The pseudo-code in this listing employs two cursors, each accessing a different table, to join the EMP table with the DEPT table using application logic.

Listing 11.5. Pseudo-code for Retrieving Data from an Application Join

 EXEC SQL     DECLARE DEPTCSR CURSOR FOR     SELECT  DEPTNO, DEPTNAME     FROM    DEPT END-EXEC. EXEC SQL     DECLARE EMPCSR CURSOR FOR     SELECT  EMPNO, SALARY     FROM    EMP     WHERE   WORKDEPT = :HV-WORKDEPT END-EXEC. EXEC SQL     OPEN DEPTCSR END-EXEC.  Loop until no more department rows or error  EXEC SQL         FETCH DEPTCSR         INTO :DEPTNO, :DEPTNAME     END-EXEC.     MOVE DEPTNO TO HV-WORKDEPT.     EXEC SQL         OPEN EMPCSR     END-EXEC.  Loop until no more employee rows or error  EXEC SQL             FETCH EMPCSR             INTO :EMPNO, :SALARY         END-EXEC.  Process retrieved data   end of loop   end of loop

Joining tables by application logic requires additional code and is usually less efficient than an SQL join. When data will be updated in a cursor-controlled fashion, favor application joining over SQL joining because the results of an SQL join are not always updated directly. When you're updating the result rows of an application join, remember to code FOR UPDATE OF on each cursor, specifying every column that can be updated. When you're only reading the data without subsequent modification, remember to code FOR READ ONLY (or FOR FETCH ONLY ) on the cursor.

Avoid Host Structures

Avoid selecting or fetching INTO a group -level host variable structure. Your program is more independent of table changes if you select or fetch into individual data elements. For example, code

 EXEC SQL     FETCH C1     INTO :DEPTNO,          :DEPTNAME:DEPTNAME-IND,          :MGRNO:MGRNO-IND,          :ADMDEPT:ADMRDEPT-IND END-EXEC.

instead of

 EXEC SQL     FETCH C1     INTO  :DCLDEPT:DEPT-IND END-EXEC.

Although the second example appears easier to code, the first example is preferred. Using individual host variables instead of host structures makes programs easier to understand, easier to debug, and easier to maintain.

Use `ORDER BY` to Ensure Sequencing

Always use ORDER BY when your program must ensure the sequencing of returned rows. Otherwise, the rows are returned to your program in an unpredictable sequence. The only way to guarantee a specific sequence for your results set is to use the ORDER BY clause on your SELECT statement.

Use `FOR READ ONLY` for Read-Only Access

Code all read-only SELECT cursors with the FOR READ ONLY (or FOR FETCH ONLY) cursor clause. Doing so tells DB2 that you will not be modifying data using the cursor. This makes the cursor non-ambiguous and provides DB2 with better options to optimize query performance.

Explicitly Code Literals

When possible, code literals explicitly in the SQL statement rather than move the literals to host variables and then process the SQL statement using the host variables. This technique gives the DB2 optimization process the best opportunity for arriving at an optimal access path .

Although DB2 offers an option to reoptimize SQL statements on the fly (that is, REOPT(VARS) ), explicit literal coding still should be considered when feasible . It should not, however, be a forced standard. As a general rule of thumb, if the value will not change for more than a year, consider coding it as a literal instead of using a host variable.

Use Temporary Tables to Simulate a Host Variable List

Sometimes the need arises to check a column for equality against a list of values. This can be difficult to do efficiently without using temporary tables. For example, suppose you have a list of twelve employee numbers for which you want names. You could code a loop that feeds the twelve values, one-by-one, into a host variable, say HVEMPNO , and execute the following SQL

 SELECT EMPNO, LASTNAME, FIRSTNME FROM   EMP WHERE  EMPNO = :HVEMPNO;

Of course, this requires twelve executions of the SQL statement. Wouldn't it be easier if you could supply the twelve values in a single SQL statement, as shown in the following?

 SELECT EMPNO, LASTNAME, FIRSTNME FROM   EMP WHERE  EMPNO IN (:HV1, :HV2, :HV3, :HV4, :HV5, :HV6,              :HV7, :HV8, :HV9, :HV10, :HV11, :HV12);

Well, that SQL is valid, but it requires twelve host variables. What if the number of values is not constant? If fewer than twelve values are supplied, you can put a non-existent value (low values, for example) in the remaining host variables and still be able to execute the SQL. But if more than twelve values are supplied, the statement has to be run multiple times ”exactly the situation we were trying to avoid.

Instead, declare and use a temporary table. For example

 EXEC SQL DECLARE userid.GTT   (COL_LIST  INTEGER NOT NULL);

Insert all the values for the list into the COL_LIST column of the temporary table ( GTT ) and issue the following SQL:

 SELECT EMPNO, LASTNAME, FIRSTNME FROM   EMP WHERE  EMPNO IN (SELECT COL_LIST FROM GTT);

Of course, each of the previous SELECT statements should be embedded in a cursor because multiple rows can be retrieved.

Remember, there are two types of temporary tables: declared and created. Refer to Chapter 5 in the section titled "Temporary Tables" for the pros and cons of both types of temporary tables before deciding which to use.

Joining Non-Relational Data Using SQL

Consider using temporary tables when you need to join non-relational data to DB2 data. Recall from Chapter 5 that there are two types of temporary tables supported by DB2: declared and created.

Temporary tables are quite useful for storing non-relational data in a relational, or tabular, format. For example, consider an application that needs to join employee information stored in an IMS database to employee information in the a DB2 table, such as the EMP table. One approach, of course, would be to retrieve the required data from the IMS database and join it using program logic to the DB2 data. However, you could also create a temporary table and INSERT the IMS data as it is retrieved into the temporary table. After the temporary table is populated , it can be joined to the EMP table using a standard SQL join.

This technique is not limited to IMS data. Any non-relational data source can be read and inserted into a temporary table, which can then be accessed using SQL for the duration of the unit of work.

Avoid Cursors If Possible

Whenever doing so is practical, avoid the use of a cursor. Cursors add overhead to an application program. You can avoid cursors, however, only when the program retrieves a single row from an application table or tables.

Code Cursors to Retrieve Multiple Rows

If you do not check for -811 SQLCODE s, always code a cursor for each SELECT statement that does not access tables either by the primary key or by columns specified in a unique index.

Use Scrollable Cursors to Move Within a Result Set

When writing a program that needs to move forward and backward through the SQL results set, use a scrollable cursor. In terms of both ease of development and performance, scrollable cursors are preferable to coding a scrolling capability using host language code.

Specify Isolation Level by SQL Statement

Individual SQL statements can specify a different, appropriate isolation level. Although each DB2 plan and package has an isolation level, you can override it for individual SQL statements by using the WITH clause. You can specify the WITH clause for the following types of SQL statements:

SELECT INTO
DECLARE CURSOR
INSERT
Searched DELETE
Searched UPDATE

Valid options are as follow:

RR and RR KEEP UPDATE LOCKS (Repeatable Read)
RS and RS KEEP UPDATE LOCKS (Read Stability)
CS (Cursor Stability)
UR (Uncommitted Read)

The KEEP UPDATE LOCKS clause was added as of DB2 V5. It indicates that DB2 is to acquire X locks instead of U or S locks on all qualifying rows or pages. Use KEEP UPDATE LOCKS sparingly. Although it can better serialize updates, it can reduce concurrency.

In Chapter 13, "Program Preparation," you can find additional guidance for each of the isolation levels.

Use the Sample Programs for Inspiration

IBM provides source code in several host languages for various sample application programs. This source code is in a PDS library named SYS1.DB2V8R1.DSNSAMP (or something similar) supplied with the DB2 system software. Samples of COBOL, PL/I, FORTRAN, Assembler, and C programs for TSO, CICS, and IMS are available in the aforementioned library.

Favor Complex SQL for Performance

When embedding SQL in application programs, developers are sometimes tempted to break up complex SQL statements into smaller, easier-to-understand SQL statements and combine them together using program logic. Avoid this approach. When SQL is properly coded, DB2 is almost always more efficient than equivalent application code when it comes to accessing and updating DB2 data.