< Day Day Up > |
The guidelines in this section aid you in coding more efficient DB2 application programs by
When you're designing a DB2 program, you can easily get caught up in programming for efficiency, thereby compromising the effectiveness of the program. Efficiency can be defined as "doing things right," whereas effectiveness can be defined as "doing the right thing." Design embedded SQL programs to be as efficient as possible (following the guidelines in this book) without compromising the effectiveness of the program. Gauge program efficiency by the following criteria:
For a thorough discussion of DB2 performance monitoring and tuning, consult Parts IV and V. Gauge program effectiveness by the following criteria:
Avoid "Black Boxes"Often, DB2 professionals are confronted with the "black box" approach to database access. The basic idea behind a "black box" is that instead of having programs issue direct requests to the database, they will make requests of a "black box" data engine. This "black box" program is designed to accept parameter-driven requests and then issue common SQL statements to return results to the program. So, a black box is a database access program that sits in between your application programs and DB2. It is designed so that all application programs call the black box for data instead of containing their own embedded SQL statements. The general idea behind such a contraption is that it will simplify DB2 development, because programmers will not need to know how to write SQL. Instead, the programmer just calls the black box program to request whatever data is required. SQL statements become calls ”and every programmer knows how to code a call, right? Basically, proponents of the "black box" solution believe that access to data by calling a program with parameters is easier than learning SQL. But the "black box" approach is complete rubbish and should be avoided at all costs. The proper way to formulate requests for DB2 data is by coding well-designed, efficient SQL statements. A "black box" will never be able to completely mimic the functionality of SQL. Furthermore, the "black box" approach is sure to cause performance problems because it will have been coded for multiple users and will forgo the efficient SQL design techniques discussed in Chapter 2, "Data Manipulation Guidelines." For example, what if the data access requirements of the programs calling the "black box" call for the following:
In this case, the "black box" is almost sure to be designed with a single SQL SELECT that returns all of the columns from TABLE1 . But, based on which program calls the "black box," only the required rows would be returned. We know this is bad SQL design because we should always return the absolute minimum number of columns and rows per SQL SELECT statement to optimize performance. But even if four different SELECT statements were used by the "black box," if requirements change, so must the "black box." The additional maintenance required for the "black box" program adds unneeded administrative overhead. Furthermore, the "black box" program is a single-point-of-failure for any application that uses it. Perhaps more importantly, using a "black box" always consumes more CPU than simply using embedded SQL statements in your program. The black box application requires additional lines of code to be executed than would be without the black box. It is elementary when you think about it. The CALL statement in the calling program is extra, and the code surrounding the statements in the black box that ties them together is extra. None of this is required if you just plug your SQL statements right into your application programs. This extra code must be compiled and executed. When extra code is required (no matter how little or efficient it may be) extra CPU will be expended to run the application. More code means more work, and that means degraded performance. All in all, "black boxes" provide no benefit but at significant cost. Application programs should be designed using SQL to access DB2 data. No "black boxes" should be allowed. Code Modular DB2 ProgramsYou should design DB2 programs to be modular. One program should accomplish a single, well-defined task. If you need to execute multiple tasks, structure the programs so that tasks can be strung together by having the programs call one another. This approach is preferable to a single, large program that accomplishes many tasks for two reasons. First, single tasks in separate programs make the programs easier to understand and maintain. Second, if each task can be executed either alone or with other tasks, isolating the tasks in a program enables easier execution of any single task or list of tasks . Minimize the Size of DB2 ProgramsCode DB2 programs to be as small as possible. Streamlining your application code to remove unnecessary statements results in better performance. This recommendation goes hand-in-hand with the preceding one. Consider Stored Procedures for ReuseabilityWhen you're modularizing a DB2 application, do so with an eye toward reuseability. Whenever a particular task must be performed across many programs, applications, or systems, consider developing a stored procedure. A stored procedure, after it is created, can be called from multiple applications. However, when you modify the code, you need to modify only the stored procedure code, not each individual program. For more information on stored procedures, refer to Chapter 15. Consider User-Defined Functions for ReuseabilityIf your organization relies on business rules that transform data, consider implementing user-defined functions. Data transformation tasks that are performed by many programs, applications, or systems, can benefit from the reuseability aspects of user-defined functions. Consider developing user-defined functions for the business rule and then using it in subsequent SQL statements. This reuse is preferable to coding the business rule into multiple applications because
For more information on user-defined functions, refer to Chapter 4, "Using DB2 User-Defined Functions and Data Types." Be Aware of Active Database ConstructsYou can create active DB2 databases using features such as referential integrity and triggers. An active database takes action based on changes to the state of the data stored in it. For example, if a row is deleted, subsequent activity automatically occurs (such as enforcing a DELETE CASCADE referential constraint or an INSERT trigger firing that causes other data to be modified). You need to be aware of the active database features that have been implemented to appropriately code DB2 application programs. This awareness is required because you need to know the processes that the database itself will automatically perform so your application programs do not repeat the process. Use Unqualified SQLUse unqualified table, view, synonym, and alias names in application programs. This way, you can ease the process of moving programs, plans, and packages from the test environment to the production environment. If tables are explicitly qualified in an application program, and tables are qualified differently in test DB2 than they are in production DB2, programs must be modified before they are turned over to an operational production application. When the program is bound, the tables are qualified by one of the following:
Avoid SELECT *Never use SELECT * in an embedded SQL program. Request each column that needs to be accessed. Also, follow the SQL coding recommendations in Chapter 2. Filter Data Using the SQL WHERE ClauseFavor the specification of DB2 predicates to filter rows from a desired results table instead of the selection of all rows and the use of program logic to filter those not needed. For example, coding the embedded SELECT SELECT EMPNO, LASTNAME, SALARY FROM EMP WHERE SALARY > 10000 is preferred to coding the same SELECT statement without the WHERE clause and following the SELECT statement with an IF statement: IF SALARY < 10000 NEXT SENTENCE ELSE Process data . The WHERE clause usually outperforms the host language IF statement because I/O is reduced. Use SQL to Join TablesTo join tables, favor SQL over application logic, except when the data retrieved by the join must be updated. In this situation, consider coding multiple cursors to mimic the join process. Base the predicates of one cursor on the data retrieved from a fetch to the previous cursor. Listing 11.4 presents pseudo-code for retrieving data from a cursor declared with an SQL join statement. Listing 11.4. Pseudo-code for Retrieving Data from an SQL JoinEXEC SQL DECLARE JOINCSR CURSOR FOR SELECT D.DEPTNO, D.DEPTNAME, E.EMPNO, E.SALARY FROM DEPT D, EMP E WHERE D.DEPTNO = E.WORKDEPT END-EXEC. EXEC SQL OPEN JOINCSR END-EXEC. Loop until no more rows returned or error EXEC SQL FETCH JOINCSR INTO :DEPTNO, :DEPTNAME, :EMPNO, :SALARY END-EXEC Process retrieved data end of loop The criteria for joining tables are in the predicates of the SQL statement. Compare this method to the application join example in Listing 11.5. The pseudo-code in this listing employs two cursors, each accessing a different table, to join the EMP table with the DEPT table using application logic. Listing 11.5. Pseudo-code for Retrieving Data from an Application JoinEXEC SQL DECLARE DEPTCSR CURSOR FOR SELECT DEPTNO, DEPTNAME FROM DEPT END-EXEC. EXEC SQL DECLARE EMPCSR CURSOR FOR SELECT EMPNO, SALARY FROM EMP WHERE WORKDEPT = :HV-WORKDEPT END-EXEC. EXEC SQL OPEN DEPTCSR END-EXEC. Loop until no more department rows or error EXEC SQL FETCH DEPTCSR INTO :DEPTNO, :DEPTNAME END-EXEC. MOVE DEPTNO TO HV-WORKDEPT. EXEC SQL OPEN EMPCSR END-EXEC. Loop until no more employee rows or error EXEC SQL FETCH EMPCSR INTO :EMPNO, :SALARY END-EXEC. Process retrieved data end of loop end of loop Joining tables by application logic requires additional code and is usually less efficient than an SQL join. When data will be updated in a cursor-controlled fashion, favor application joining over SQL joining because the results of an SQL join are not always updated directly. When you're updating the result rows of an application join, remember to code FOR UPDATE OF on each cursor, specifying every column that can be updated. When you're only reading the data without subsequent modification, remember to code FOR READ ONLY (or FOR FETCH ONLY ) on the cursor. Avoid Host StructuresAvoid selecting or fetching INTO a group -level host variable structure. Your program is more independent of table changes if you select or fetch into individual data elements. For example, code EXEC SQL FETCH C1 INTO :DEPTNO, :DEPTNAME:DEPTNAME-IND, :MGRNO:MGRNO-IND, :ADMDEPT:ADMRDEPT-IND END-EXEC. instead of EXEC SQL FETCH C1 INTO :DCLDEPT:DEPT-IND END-EXEC. Although the second example appears easier to code, the first example is preferred. Using individual host variables instead of host structures makes programs easier to understand, easier to debug, and easier to maintain. Use ORDER BY to Ensure SequencingAlways use ORDER BY when your program must ensure the sequencing of returned rows. Otherwise, the rows are returned to your program in an unpredictable sequence. The only way to guarantee a specific sequence for your results set is to use the ORDER BY clause on your SELECT statement. Use FOR READ ONLY for Read-Only AccessCode all read-only SELECT cursors with the FOR READ ONLY (or FOR FETCH ONLY) cursor clause. Doing so tells DB2 that you will not be modifying data using the cursor. This makes the cursor non-ambiguous and provides DB2 with better options to optimize query performance. Explicitly Code LiteralsWhen possible, code literals explicitly in the SQL statement rather than move the literals to host variables and then process the SQL statement using the host variables. This technique gives the DB2 optimization process the best opportunity for arriving at an optimal access path . Although DB2 offers an option to reoptimize SQL statements on the fly (that is, REOPT(VARS) ), explicit literal coding still should be considered when feasible . It should not, however, be a forced standard. As a general rule of thumb, if the value will not change for more than a year, consider coding it as a literal instead of using a host variable. Use Temporary Tables to Simulate a Host Variable ListSometimes the need arises to check a column for equality against a list of values. This can be difficult to do efficiently without using temporary tables. For example, suppose you have a list of twelve employee numbers for which you want names. You could code a loop that feeds the twelve values, one-by-one, into a host variable, say HVEMPNO , and execute the following SQL SELECT EMPNO, LASTNAME, FIRSTNME FROM EMP WHERE EMPNO = :HVEMPNO; Of course, this requires twelve executions of the SQL statement. Wouldn't it be easier if you could supply the twelve values in a single SQL statement, as shown in the following? SELECT EMPNO, LASTNAME, FIRSTNME FROM EMP WHERE EMPNO IN (:HV1, :HV2, :HV3, :HV4, :HV5, :HV6, :HV7, :HV8, :HV9, :HV10, :HV11, :HV12); Well, that SQL is valid, but it requires twelve host variables. What if the number of values is not constant? If fewer than twelve values are supplied, you can put a non-existent value (low values, for example) in the remaining host variables and still be able to execute the SQL. But if more than twelve values are supplied, the statement has to be run multiple times ”exactly the situation we were trying to avoid. Instead, declare and use a temporary table. For example EXEC SQL DECLARE userid.GTT (COL_LIST INTEGER NOT NULL); Insert all the values for the list into the COL_LIST column of the temporary table ( GTT ) and issue the following SQL: SELECT EMPNO, LASTNAME, FIRSTNME FROM EMP WHERE EMPNO IN (SELECT COL_LIST FROM GTT); Of course, each of the previous SELECT statements should be embedded in a cursor because multiple rows can be retrieved.
Joining Non-Relational Data Using SQLConsider using temporary tables when you need to join non-relational data to DB2 data. Recall from Chapter 5 that there are two types of temporary tables supported by DB2: declared and created. Temporary tables are quite useful for storing non-relational data in a relational, or tabular, format. For example, consider an application that needs to join employee information stored in an IMS database to employee information in the a DB2 table, such as the EMP table. One approach, of course, would be to retrieve the required data from the IMS database and join it using program logic to the DB2 data. However, you could also create a temporary table and INSERT the IMS data as it is retrieved into the temporary table. After the temporary table is populated , it can be joined to the EMP table using a standard SQL join. This technique is not limited to IMS data. Any non-relational data source can be read and inserted into a temporary table, which can then be accessed using SQL for the duration of the unit of work. Avoid Cursors If PossibleWhenever doing so is practical, avoid the use of a cursor. Cursors add overhead to an application program. You can avoid cursors, however, only when the program retrieves a single row from an application table or tables. Code Cursors to Retrieve Multiple RowsIf you do not check for -811 SQLCODE s, always code a cursor for each SELECT statement that does not access tables either by the primary key or by columns specified in a unique index.
When writing a program that needs to move forward and backward through the SQL results set, use a scrollable cursor. In terms of both ease of development and performance, scrollable cursors are preferable to coding a scrolling capability using host language code. Specify Isolation Level by SQL StatementIndividual SQL statements can specify a different, appropriate isolation level. Although each DB2 plan and package has an isolation level, you can override it for individual SQL statements by using the WITH clause. You can specify the WITH clause for the following types of SQL statements:
Valid options are as follow:
The KEEP UPDATE LOCKS clause was added as of DB2 V5. It indicates that DB2 is to acquire X locks instead of U or S locks on all qualifying rows or pages. Use KEEP UPDATE LOCKS sparingly. Although it can better serialize updates, it can reduce concurrency. In Chapter 13, "Program Preparation," you can find additional guidance for each of the isolation levels. Use the Sample Programs for InspirationIBM provides source code in several host languages for various sample application programs. This source code is in a PDS library named SYS1.DB2V8R1.DSNSAMP (or something similar) supplied with the DB2 system software. Samples of COBOL, PL/I, FORTRAN, Assembler, and C programs for TSO, CICS, and IMS are available in the aforementioned library. Favor Complex SQL for PerformanceWhen embedding SQL in application programs, developers are sometimes tempted to break up complex SQL statements into smaller, easier-to-understand SQL statements and combine them together using program logic. Avoid this approach. When SQL is properly coded, DB2 is almost always more efficient than equivalent application code when it comes to accessing and updating DB2 data. |
< Day Day Up > |