Batch Programming Guidelines


When coding batch DB2 programs, the following guidelines can help you create effective and efficient applications.

Favor Clustered Access

Whenever sequential access to table data is needed, process the table rows in clustered sequence. Doing so reduces I/O cost because each page needs to be read only once when the data is processed in clustering sequence.

Increase Parallel Processing

The architecture of IBM mainframes is such that multiple engines are available for processing. A batch program executing in a single, standard batch job can be processed by only a single engine. To maximize performance of CPU-bound programs, increase the parallelism of the program in one of two ways:

  • Program Cloning: Clone the program and stratify the data access. Stratifying data access refers to dividing data access into logical subsets that can be processed independently.

  • Query Parallelism: Utilize partitioned table spaces and bind the application program specifying DEGREE(ANY) to indicate that DB2 should try to use query I/O, CPU, and Sysplex parallelism (a bind sketch follows this list).
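For the query parallelism route, no program changes are needed; the degree is requested at bind time. A minimal sketch of such a bind, issued under the TSO DSN command processor (the plan and member names are illustrative):

 BIND PLAN(BATCHPLN) -
      MEMBER(BATCHPGM) -
      ACTION(REPLACE) -
      ISOLATION(CS) -
      DEGREE(ANY)

Rebinding later with DEGREE(1) turns parallelism back off without any code change.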

Using the first method, you, as the application developer, must physically create multiple clone programs. Each program clone must be functionally identical but will process a different subset of the data. For example, you could split a program that reads DSN8810.EMP to process employees into a series of programs that perform the same function, but each processes only a single department. The data can be stratified based on any consistent grouping of data that is comprehensive (all data to be processed is included) and non-overlapping (data in one subset does not occur in a different subset). For example, you can accomplish data stratification based on the following (a cursor sketch follows this list):

  • Unique key ranges

  • Table space partitions

  • Functional groupings (for example, departments or companies)
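For instance, clones stratified by department might differ only in the literal coded in the cursor's WHERE clause. A sketch using the sample employee table (the department values are illustrative):

 * Clone 1 processes only department A00
  EXEC SQL DECLARE EMPCSR CURSOR FOR
      SELECT   EMPNO, LASTNAME, SALARY
      FROM     DSN8810.EMP
      WHERE    WORKDEPT = 'A00'
      ORDER BY EMPNO
  END-EXEC.

 * Clone 2 is identical except for its predicate:
 *     WHERE    WORKDEPT = 'B01'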

Ensure that the data is stratified both programmatically and in the structure of the database. For example, if you're stratifying using partitioned table spaces, ensure that each job operates only on data from a single partition. If data from multiple partitions can be accessed in concurrent jobs, timeout and deadlock problems might occur. Refer to Chapter 5 for DDL recommendations for increasing concurrency.

Also note that concurrency problems can still occur. When data from one subset physically coexists with data from another subset, lockouts and timeouts can take place. DB2 usually locks at the page level. If data is stratified at any level other than the table space partition level, data from one subset can coexist on the same table space page as data from another subset.

Using the second method, DB2's inherent query parallelism feature, you can develop a single program. DB2 determines whether parallelism is of benefit. If you specify DEGREE(ANY), DB2 formulates the appropriate degree of parallelism for each query in the program. The primary benefits of allowing DB2 to specify parallelism are as follows:

  • The avoidance of code duplication. Only one program is required. DB2 itself handles the parallel query execution.

  • The ability of DB2 to determine the appropriate number of parallel engines per query (not per program).

  • The ability of DB2 to change the degree of parallelism on the fly. If the resources are not available to process parallel queries, DB2 can automatically "turn off" parallelism at runtime.

  • The ability of DB2 to enable Sysplex parallelism. With data sharing, you can add engines as capacity requirements increase; the cost of doing so is minimal, and DB2 automatically takes advantage of the additional engines.

  • Finally, if the nature of the data changes such that a change to the degree of parallelism is warranted, all that is required is a new bind. DB2 automatically formulates the degree of parallelism at bind time.

However, potential problems arise when you're using query parallelism instead of program cloning:

  • DB2 controls the number of parallel engines. The developer can exert no control. When program cloning is used, the number of parallel jobs is fixed and unchanging.

  • One program can contain multiple queries, each with a different degree. Although this can be considered a benefit, it can also be confusing to novice programmers.

  • DB2 I/O, CPU, and Sysplex parallelism are for read-only SQL. Updates, inserts, and deletes cannot be performed in parallel yet.

  • DB2 can "turn off" parallelism at runtime. Once again, though, this can be considered a benefit because DB2 is smart enough to disengage parallelism because of an overexerted system.

Both methods of achieving parallelism are viable for DB2 V6 and later releases (CPU parallelism requires at least V4; Sysplex parallelism requires at least V5). Whenever possible, favor DB2 parallelism over program cloning, because it is simpler to implement and modify.

Use LOCK TABLE with Caution

As a general rule, use the LOCK TABLE command with caution. Discuss the implications of this command with your DBA staff before deciding to use it.

Issuing a LOCK TABLE statement locks all tables in the table space containing the table specified. It holds all locks until COMMIT or DEALLOCATION. This statement reduces concurrent access to all tables in the table space affected by the command.

The preceding rule notwithstanding, LOCK TABLE can significantly improve an application program's processing time. If a significant number of page locks are taken during program execution, the addition of LOCK TABLE eliminates page locks, replacing them with table (or table space) locks. It thereby enhances performance by eliminating the overhead associated with page locks.
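For example, a batch program that will touch most of the rows in the sample employee table might issue the following before its main processing loop (the choice of table and mode is illustrative):

 EXEC SQL
     LOCK TABLE DSN8810.EMP IN EXCLUSIVE MODE
 END-EXEC.

IN SHARE MODE can be specified instead when the program only reads the data but must prevent concurrent updates.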

Balance the issuing of the LOCK TABLE command with the need for concurrent data access, the locking strategies in the DDL of the table spaces, and the plans being run.

NOTE

You can use LOCK TABLE to explicitly limit concurrent access. For example, issuing a LOCK TABLE statement in a batch program can prevent online transactions from entering data before the batch cycle has completed.


Parameterize Lock Strategies

If a batch window exists wherein concurrent access is not required, but a high degree of concurrency is required after the batch window, consider coding batch programs with dynamic lock-switching capabilities. For example, if the batch window extends from 2:00 a.m. to 6:00 a.m., and a batch DB2 update program must run during that time, make the locking parameter-driven or system-clock-driven.

The program can read the system clock and determine whether it can complete before online processing begins at 6:00 a.m. This decision should be based on the average elapsed time required for the program to execute. If the program can complete within the window, it should issue the LOCK TABLE statement; if not, it should use the normal locking strategy as assigned by the table space DDL. A flexible locking strategy increases performance and reduces the program's impact on the online world.

An alternative method is to let the program accept a parameter to control locking granularity. For example, the value TABLE or NORMAL can be passed as a parameter. If TABLE is specified, the program issues LOCK TABLE statements; otherwise, normal locking ensues. This parameter-driven approach requires manual intervention, however, and is not as easily implemented as the system-clock method. A sketch follows.
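A minimal COBOL sketch of the parameter-driven approach, assuming a hypothetical LOCK-PARM field populated from the JCL PARM (the table name is illustrative):

 * LOCK-PARM is a hypothetical input: 'TABLE' or 'NORMAL'
     IF LOCK-PARM = 'TABLE'
         EXEC SQL
             LOCK TABLE DSN8810.PROJACT IN EXCLUSIVE MODE
         END-EXEC
     END-IF.
 * When NORMAL is specified, no LOCK TABLE is issued and the
 * locking specified in the table space DDL applies.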

Periodically COMMIT Work in Batch Modification Programs

Favor issuing COMMITs in all DB2 programs where data is modified (INSERT, UPDATE, and DELETE). A COMMIT externalizes all modifications that occurred in the program since the beginning of the program or the last COMMIT. Failing to code COMMITs in a DB2 data modification program is sometimes referred to as "Bachelor Programming Syndrome": that is, fear of committing.

NOTE

COMMIT does not flush data from the DB2 buffer pool and physically apply the data to the table. It does, however, ensure that all modifications have been physically applied to the DB2 log, thereby ensuring data integrity and recoverability.


One important factor affecting the need for a COMMIT strategy is the amount of elapsed time required for the program to complete. The greater the amount of time needed, the more you should consider using COMMITs (to reduce rollback time and reprocessing time in the event of program failure). You can safely assume, however, that the elapsed time increases as the number of modifications increases.

Issuing COMMITs in an application program is important for three reasons. First, if the program fails, all the modifications are backed out to the last COMMIT point. This process could take twice the time it took to perform the modifications in the first place, especially if you are near the end of a program with no COMMITs that has performed hundreds or thousands of modification operations.

Second, if you resubmit a failing program that issues no COMMITs, the program redoes work unnecessarily.

Third, programs bound using the repeatable read page locking strategy or the RELEASE(COMMIT) table space locking strategy hold their respective page and table space locks until a COMMIT is issued. If no COMMITs are issued during the program, locks are not released until the auto-COMMIT when the program completes, thereby negatively affecting concurrent access. This can cause lock timeouts and lock escalation.

Given these considerations for COMMIT processing, the following situations should compel you to code COMMIT logic in your batch programs:

  • The modification program must run in a small batch processing window

  • Concurrent batch or online access must occur during the time the batch program is running

NOTE

If the concurrent batch or online access uses ISOLATION(UR), COMMIT processing is irrelevant. However, most processing requires accurate data and as such does not use ISOLATION(UR).


In some rare circumstances, you might be able to avoid issuing COMMITs in your batch DB2 programs. When modification programs without COMMITs fail, you can generally restart them from the beginning because database changes have not been committed. Additionally, COMMITs require resources. By reducing or eliminating COMMITs, you might conceivably be able to enhance performance (albeit at the expense of concurrency due to additional locks being held for a greater duration).

Before you decide to avoid COMMIT processing, remember that all cataloged sequential files must be deleted, any updated VSAM files must be restored, and any IMS modifications must be backed out before restarting the failing program. If the outlined situations change, you might need to retrofit your batch programs with COMMIT processing, a potentially painful process.

I recommend that you plan to issue COMMITs in every batch program. You can structure the logic so that the COMMIT processing is contingent on a parameter passed to the program. This approach enables an analyst to turn off COMMIT processing but ensures that all batch programs are prepared if COMMIT processing is required in the future. A sketch of this technique follows.
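A hedged sketch of such parameter-contingent COMMIT logic, assuming hypothetical COMMIT-PARM, WS-FETCH-COUNTER, and WS-COMMIT-FREQUENCY fields (the commit paragraph itself is sketched later in this chapter):

 * COMMIT-PARM is a hypothetical input: 'Y' enables COMMIT processing
     IF COMMIT-PARM = 'Y' AND
        WS-FETCH-COUNTER >= WS-COMMIT-FREQUENCY
         PERFORM 9000-COMMIT-PARAGRAPH
         MOVE ZERO TO WS-FETCH-COUNTER
     END-IF.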

Use Elapsed Time to Schedule COMMITs

Base the frequency of COMMITs on the information in Table 11.8 or on the elapsed time since the last COMMIT. Doing so provides a more consistent COMMIT frequency. If you insist on basing COMMIT processing on the number of rows processed instead of the elapsed time, consider estimating the elapsed time required to process a given number of rows and then correlate this time to Table 11.8 to determine the optimal COMMIT frequency.

Table 11.8. Recommendations for COMMIT Frequency

Each entry pairs an application requirement with the corresponding COMMIT recommendation:

  • No concurrent access required and unlimited time for reprocessing in the event of an abend: Code the program for COMMITs, but consider processing without COMMITs (using a parameter)

  • No concurrency required but limited reprocessing time available: COMMIT in batch approximately every 15 minutes

  • Limited batch concurrency required; no concurrent online activity: COMMIT in batch every 1 to 5 minutes (more frequently to increase concurrency)

  • Online concurrency required: COMMIT in batch every 5 to 15 seconds


Modify As Close As Possible to the COMMIT

You can reduce the amount of time that locks are held and thereby minimize lock timeouts by issuing INSERT, UPDATE, and DELETE statements as close as possible to your COMMIT statements. By limiting the lines of code that must be executed in between your data modification statements and your COMMIT statements, you can optimize DB2 locking. The quicker modified data is committed to the database (or rolled back), the shorter the amount of time that data will need to be locked out from other users.

Of course, programmers might balk at implementing this suggestion. Moving data modification code might not fall in line with the aesthetics of programming. However, from a data integrity perspective, all that matters is that the appropriate INSERT, UPDATE, and DELETE statements fall within the appropriate units of work. So, keep in mind that moving your data modification statements closer to the COMMIT statement can improve application performance.

Sequence Modification Statements Within the Unit of Work

You can minimize the number of deadlocks that occur by coding all data modification statements (INSERT, UPDATE, and DELETE) within each unit of work in a prespecified sequence; for example, code modifications in alphabetical order by table name. Doing so reduces the chance of deadlocks, because the order of the modification statements will be the same from program to program. A sketch follows.
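For instance, two programs that both modify the sample DEPT, EMP, and PROJ tables would each code their updates in that alphabetical order, so neither ever requests a lock the other acquired in the reverse order. A sketch with illustrative key values:

 -- Every program issues its modifications in alphabetical table order
 UPDATE DSN8810.DEPT
    SET MGRNO = '000010'
  WHERE DEPTNO = 'D11';

 UPDATE DSN8810.EMP
    SET WORKDEPT = 'D11'
  WHERE EMPNO = '000200';

 UPDATE DSN8810.PROJ
    SET RESPEMP = '000200'
  WHERE PROJNO = 'MA2100';

 COMMIT;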

Choose Meaningful Units of Work

A unit of work is a portion of processing that achieves data integrity, is logically complete, and creates a point of recovery. Units of work are defined by the scope of the COMMITs issued by your program. (All data modification that occurs between COMMITs is considered to be in a unit of work.) Use care in choosing units of work for programs that issue INSERT, UPDATE, or DELETE statements.

Choosing a unit of work that provides data integrity is of paramount importance for programs that issue COMMITs. For example, consider an application program that modifies the project start and end dates in tables DSN8810.PROJACT and DSN8810.EMPPROJACT. The relevant start and end date columns of DSN8810.PROJACT are

ACSTDATE

Estimated start date for the activity recorded in this row of the project activity table

ACENDATE

Estimated end date for the activity recorded in this row of the project activity table


The columns for DSN8810.EMPPROJACT are

EMSTDATE

Estimated start date when the employee will begin work on the activity recorded in this row of the employee project activity table

EMENDATE

Estimated end date when the employee will have completed the activity recorded in this row of the employee project activity table


The start and end dates in these two tables are logically related. A given activity for a project begins on a specified date and ends on a specified date. A given employee is assigned to work on each activity and is also assigned a start date and an end date for the activity.

Many employees can work on a single activity, but each employee can start and end his or her involvement with that activity at different times. The only stipulation is that the employees must begin and end their involvement within the start and end dates for that activity. Therein lies the relationship that ties these four columns together.

The unit of work for the program should be composed of the modifications to both tables. In other words, the program should not commit the changes to one table without committing the changes to the other table at the same time. If it does commit the changes to one but not the other, the implicit relationship between the dates in the two tables can be destroyed.

Consider the following situation. A project has a start date of 2002-12-01 and an end date of 2004-03-31. This information is recorded in the DSN8810.PROJACT table. Employees are assigned to work on activities in this project with start and end dates in the stated range. These dates are recorded in the DSN8810.EMPPROJACT table.

Later, you must modify the end date of the project to 2004-01-31. This new end date is earlier than the previous end date. Consider the status of the data if the program updates the end date in the DSN8810.PROJACT table, commits the changes, and then abends. The data in the DSN8810.EMPPROJACT table has not been updated, so the end dates are not synchronized. An employee can still be assigned an activity with the old end date. For this reason, you should be sure to group related updates in the same unit of work.
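A sketch of the correct unit of work, using illustrative project and date values, keeps both updates together ahead of the single COMMIT so they succeed or fail as one:

 UPDATE DSN8810.PROJACT
    SET ACENDATE = '2004-01-31'
  WHERE PROJNO = 'MA2100'
    AND ACENDATE > '2004-01-31';

 UPDATE DSN8810.EMPPROJACT
    SET EMENDATE = '2004-01-31'
  WHERE PROJNO = 'MA2100'
    AND EMENDATE > '2004-01-31';

 COMMIT;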


Consider Using External SAVEPOINT s

As of DB2 V7, you can set a SAVEPOINT within a transaction, without committing your work. You can think of a SAVEPOINT as a sub-UOW (unit of work) "stability" point. After the SAVEPOINT is set, you can use application logic to undo any data modifications and database schema changes that were made since the application set the SAVEPOINT. Using a SAVEPOINT can be more efficient, because you will not need to include contingency and what-if logic in your application code.

Remember, issuing a SAVEPOINT does not COMMIT work to DB2. It is simply a mechanism for registering milestones within a transaction or program. Let's learn by example. Consider the following pseudo-code:

 

 SAVEPOINT POINTX ON ROLLBACK RETAIN CURSORS;

 ...  subsequent processing  ...

 ROLLBACK TO SAVEPOINT POINTX;

The ROLLBACK will cause any data or schema changes made in the "subsequent processing" to be undone.

You can code multiple SAVEPOINTs within a UOW, and you can ROLLBACK to any SAVEPOINT (as long as you do not reuse the SAVEPOINT name). The UNIQUE keyword can be specified to ensure that the SAVEPOINT name is not reused within the unit of recovery.
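A sketch tying these options together (POINTX is an arbitrary name, and RELEASE SAVEPOINT simply frees the name for reuse within the unit of recovery):

 SAVEPOINT POINTX UNIQUE
     ON ROLLBACK RETAIN CURSORS
     ON ROLLBACK RETAIN LOCKS;

 -- ... work that might need to be undone ...

 ROLLBACK TO SAVEPOINT POINTX;

 RELEASE SAVEPOINT POINTX;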

There are two clauses that can be specified to further define the nature of the SAVEPOINT when a ROLLBACK is issued:

RETAIN CURSORS

Any cursors opened after the SAVEPOINT is set are not tracked and will not be closed when rolling back to that SAVEPOINT.

RETAIN LOCKS

Any locks acquired after the SAVEPOINT is set are not tracked and will not be released when rolling back to the SAVEPOINT.


Even if RETAIN CURSORS is specified, some of the cursors might not be usable. For example, if the ROLLBACK removes a row (that is, rolls back an INSERT) upon which the cursor was positioned, DB2 will raise an error.

NOTE

SAVEPOINTs can be very useful for stored procedures that need to maintain sub-UOW stability points.


Make Programs Restartable

In time-critical applications, DB2 batch programs that modify table data should be restartable if a system error occurs. To make a batch program restartable, you first create a DB2 table to control the checkpoint and restart processing for all DB2 update programs. A checkpoint is data written by an application program during its execution that identifies the status and extent of processing. The checkpoint is usually accomplished by storing the primary key of the table row being processed; the program updates this checkpoint information before each COMMIT point. During restart processing, the primary key information is read, enabling the program to continue from where it left off.

The following DDL illustrates a DB2 table (and an associated index) that can be used to support checkpoint and restart processing:

 

 CREATE TABLE CHKPT_RSTRT
     (PROGRAM_NAME       CHAR(8)       NOT NULL,
      ITERATION          CHAR(4)       NOT NULL,
      COMMIT_FREQUENCY   SMALLINT      NOT NULL,
      NO_OF_COMMITS      SMALLINT      NOT NULL WITH DEFAULT,
      CHECKPOINT_TIME    TIMESTAMP     NOT NULL WITH DEFAULT,
      CHECKPOINT_AREA    CHAR(254)     NOT NULL WITH DEFAULT,
      PRIMARY KEY (PROGRAM_NAME, ITERATION)
     ) IN DATABASE.TBSPACE;

 CREATE UNIQUE INDEX XCHKPRST
     ON CHKPT_RSTRT
     (PROGRAM_NAME, ITERATION)
     CLUSTER
     other parameters;

When a batch program is restarted after an abend, it can continue where it left off if it follows certain steps. This is possible because a checkpoint row was written indicating the last committed update, the time it was taken, and the key of the last row processed in the project activity table (PROJNO, ACTNO, and ACSTDATE).

The following steps show you the coding necessary to make a program restartable:

  1. Declare two cursors to SELECT rows to be updated in the PROJACT table. Code an ORDER BY for the columns of the unique index (PROJNO, ACTNO, and ACSTDATE). The first cursor should select the rows you want. It is used the first time the request is processed. For example,

     

     EXEC SQL DECLARE CSR1 CURSOR FOR
         SELECT   PROJNO, ACTNO, ACSTDATE,
                  ACSTAFF, ACENDATE
         FROM     PROJACT
         ORDER BY PROJNO, ACTNO, ACSTDATE
     END-EXEC.

    This statement reflects the needs of your application. The second cursor is for use after issuing COMMIT s and for restart processing. It must reposition the cursor at the row following the last row processed. You can reposition the cursor by using WHERE clauses that reflect the ORDER BY on the primary key (or the unique column combination). For example,

     

     EXEC SQL DECLARE CSR2 CURSOR FOR
         SELECT   PROJNO, ACTNO, ACSTDATE,
                  ACSTAFF, ACENDATE
         FROM     PROJACT
         WHERE    ((PROJNO = :CHKPT-PROJNO
                    AND ACTNO = :CHKPT-ACTNO
                    AND ACSTDATE > :CHKPT-ACSTDATE)
               OR  (PROJNO = :CHKPT-PROJNO
                    AND ACTNO > :CHKPT-ACTNO)
               OR  (PROJNO > :CHKPT-PROJNO))
         AND      PROJNO >= :CHKPT-PROJNO
         ORDER BY PROJNO, ACTNO, ACSTDATE
     END-EXEC.

    This cursor begins processing at a point other than the beginning of the ORDER BY list. Although technically you can use only the second cursor (by coding low values in the host variables the first time through), doing so is not recommended. The first cursor usually provides better performance than the second, especially when the second cursor is artificially constrained by bogus host variable values. However, if you can determine (using EXPLAIN or other performance monitoring techniques) that the first cursor provides no appreciable performance gain over the second, use only one cursor.

  2. SELECT the row from the CHKPT_RSTRT table for the program and iteration being processed. You can hard-code the program name into the program. Or, if the program can run in parallel with itself, it should be able to accept as parameter-driven input an iteration token, used for identifying a particular batch run of the program.

  3. If it is the first time through and CHECKPOINT_AREA contains data, the program is being restarted. Move the appropriate values from the CHECKPOINT_AREA to the host variables used in the second cursor and OPEN it. If it is the first time through and the program is not being restarted, OPEN the first PROJACT cursor.

  4. FETCH a row from the opened cursor.

  5. If the FETCH is successful, increment a WORKING-STORAGE variable that counts successful fetches.

  6. Perform the UPDATE for the PROJACT row that was fetched.

  7. If the fetch counter is equal to COMMIT_FREQUENCY, perform a commit paragraph (a sketch appears after these steps). This paragraph should increment and update NO_OF_COMMITS, update the CHECKPOINT_AREA column with the PROJNO, ACTNO, and ACSTDATE of the PROJACT row retrieved, and set CHECKPOINT_TIME to the current timestamp. It should then issue a COMMIT and reset the fetch counter to zero.

  8. After a COMMIT, cursors are closed unless you specified the WITH HOLD option. If the WITH HOLD option is not used, the program must switch cursors after the first COMMIT is executed (unless only the second cursor shown previously is used). Remember, the first time through, the program can use the CSR1 cursor shown previously; subsequently, it should always use CSR2.

  9. When update processing is complete, reset the values of the columns in the CHKPT_RSTRT table to their original default values.

    If the CHKPT_RSTRT row for the program is reread after each COMMIT, you can modify the COMMIT_FREQUENCY column on the fly. If you determine that too few or too many checkpoints have been taken, based on the state of the data and the time elapsed and remaining, you can update the COMMIT_FREQUENCY (using QMF, SPUFI, or some other means) for that program only. Doing so dynamically changes the frequency at which the program COMMITs.

    Incurring the extra read usually causes little performance degradation because the page containing the row usually remains in the buffer pool because of its frequent access rate.
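A sketch of the commit paragraph described in step 7, using the CHKPT_RSTRT table defined earlier (the paragraph and host variable names are illustrative; WS-CHECKPOINT-AREA would hold the PROJNO, ACTNO, and ACSTDATE of the last row processed):

 9000-COMMIT-PARAGRAPH.
 * The checkpoint update is part of the same unit of work as the
 * application updates, so both are committed (or rolled back) together
     EXEC SQL
         UPDATE CHKPT_RSTRT
         SET    NO_OF_COMMITS   = NO_OF_COMMITS + 1,
                CHECKPOINT_TIME = CURRENT TIMESTAMP,
                CHECKPOINT_AREA = :WS-CHECKPOINT-AREA
         WHERE  PROGRAM_NAME = :WS-PROGRAM-NAME
         AND    ITERATION    = :WS-ITERATION
     END-EXEC.
     EXEC SQL COMMIT END-EXEC.
     MOVE ZERO TO WS-FETCH-COUNTER.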

Following these nine steps enables you to restart your programs after a program failure. During processing, the CHKPT_RSTRT table is continually updated with current processing information. If the program abends, all updates, including updates to the CHKPT_RSTRT table, are rolled back to the last successful checkpoint. This way, the CHKPT_RSTRT table is synchronized with the updates made to the table. You can then restart the update program after you determine and correct the cause of the abend.

On restart, the CHKPT_RSTRT table is read, and the CHECKPOINT_AREA information is placed into a cursor that repositions the program to the data where the last update occurred.

Additional Notes on Restartability

If a restartable program uses the WITH HOLD option to prohibit cursor closing at COMMIT time, it can avoid the need to reposition the cursor constantly, thereby enabling more efficient processing. To be restartable, however, the program still requires a repositioning cursor so that it can bypass the work already completed.

When you specify the WITH HOLD option, the repositioning cursor is used only when the program is restarted, not during normal processing. Additional code and parameters are required to signal the program when to use the repositioning cursors.

Restartable programs using sequential input files can reposition the input files using one of two methods. The first way is to count the records read and place the counter in the CHKPT_RSTRT table. On restart, the table is read and multiple reads are issued (the number of reads equals READ_COUNTER). Alternatively, for input files sorted by the checkpoint key, the program can use the information in the CHECKPOINT_AREA to read to the appropriate record.

Restartable programs writing sequential output files must handle each output file separately. Most sequential output files can have their disposition modified to MOD in the JCL, allowing the restarted program to continue writing to them. For the following types of output files, however, you must delete or modify output file records before restarting:

  • Headers for report files with control break processing

  • Output files with different record types

  • Any output file requiring specialized application processing

NOTE

If you have a budget for ISV tools, one of the best investments is a checkpoint/restart tool. Such tools provide the ability to resume the processing of failed or interrupted batch applications from the most recent checkpoint rather than from the beginning of the job step. Many also enable you to control the checkpoint frequency outside the application. They save a lot of coding and are not typically very expensive. Examples of such tools include BMC Software's Application Restart Control and Softbase Systems' Database Rely.


Hold Cursors Rather Than Reposition

You also can use the concept of cursor repositioning for programs not coded to be restartable. If COMMIT s are coded in a program that updates data using cursors, you have two options for repositioning cursors. You can use the WITH HOLD option of the cursor, or you can code two cursors, an initial cursor and a repositioning cursor, as shown in the previous example.

The best solution is to code the WITH HOLD clause for each cursor that needs to be accessed after a COMMIT . WITH HOLD prohibits the closing of the cursor by the COMMIT statement and maintains the position of the cursor.
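For example, the hold option is coded directly on the DECLARE, as in this variation of the restart example's first cursor:

 EXEC SQL DECLARE CSR1 CURSOR WITH HOLD FOR
     SELECT   PROJNO, ACTNO, ACSTDATE,
              ACSTAFF, ACENDATE
     FROM     PROJACT
     ORDER BY PROJNO, ACTNO, ACSTDATE
 END-EXEC.

The cursor then remains open and positioned across COMMITs; only a ROLLBACK, an explicit CLOSE, or program termination closes it.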
