Locking Issues and Problems


At times, too much locking can be a problem, resulting in suspensions, timeouts, or even deadlocks.

Timeouts and Deadlocks

While waiting for a lock, a process can exceed an allowable amount of wait time that has been established. A systemwide option determines the maximum time a process can wait before it gets a resource-unavailable error. This value is called the RESOURCE TIMEOUT value and is calculated as follows.

  1. Divide RESOURCE TIMEOUT by DEADLOCK TIME.

  2. Round the result up to the next larger integer.

  3. Multiply this integer by the DEADLOCK TIME.

  4. Multiply the result by the appropriate factor.

Table 16-7 shows the resource timeout factors.

Table 16-7. Timeout Factors

Component

Factor

Notes

Transaction and queries

x 1

 

BIND

x 3

 

IMS BMP

x 4

 

IMS fast path

x 6

Nonmessage processing

Utility

x Utility timeout

Defaults to 6


The result is the timeout value, which is always greater than or equal to the RESOURCE TIMEOUT. When the timeout value has been reached, the requesting task is informed of the unavailable resource. Several conditions can cause a timeout value to be exceeded by a process, and each returns a different message to a different location.

The deadlock time is determined by system parameter DEADLOCK TIME (default is 5 seconds). This is the time interval between two successive scans for a deadlock. For every deadlock interval, the IRLM will verify whether there are deadlock situations. In those cases, DB2 will inform the taskusually the one that made the smaller number of changesthat a deadlock has occurred. The other tasks will continue without any problems. The main causes of a timeout are that a program holding a lock has not freed it soon enough and that a deadlock has occurred between DB2 and a non-DB2 resource.

The main causes for deadlocks are the following:

  • Two hot pages

  • Lock escalation

  • One hot page, promotion from S to X lock, and FOR UPDATE OF missing on the cursor

When a timeout or a deadlock has occurred, the following error codes and messages are sent to the application program.

  • 911 = "THE CURRENT UNIT OF WORK HAS BEEN ROLLED BACK DUE TO DEADLOCK OR TIMEOUT, REASON [reason-code], TYPE OF RESOURCE [resource-type], AND RESOURCE NAME [resource-name]." The reason code indicates whether a timeout or a deadlock has occurred. The application is rolled back to the previous commit. On receipt of the 911 return code, the application should terminate or retry.

  • 913 = "UNSUCCESSFUL EXECUTION CAUSED BY DEADLOCK OR TIMEOUT, REASON [reason-code], TYPE OF RESOURCE [resource-type], AND RESOURCE NAME [resource-name]." The reason code indicates whether a timeout or a deadlock has occurred. The current unit of work has not been rolled back. The application should either commit or roll back. Then the application should terminate or retry.

  • The following messages are sent to the console: DSNT376I and DSNT500I.

A 911 SQL return code means that a deadlock or a timeout occurred and that DB2 has issued a successful ROLLBACK to the last commit point. If the ROLLBACK was not successful, the application will receive a 913 SQL return code signifying that a ROLLBACK was not performed. A912 SQL return code occurs when the maximum number of lock requests has been reached for the database because insufficient memory was allocated to the lock list. With a 912 or a 913 return code, the application needs to issue a COMMIT or ROLLBACK before proceeding with any other SQL.

When a 911 occurs, the choice of whether to use retry logic depends on the individual application. If a large amount of work has been rolled back and if other non-DB2 files are present, it may be difficult to reposition everything and retry the unit of work. With most 911 situations, a restart processvendor supplied or user writtenis generally easier than a programmatic reposition and retry. If a small amount of work was lost, a simple retry could be performed up to a fixed number of times. It is important not to retry with a breakpoint, as the source of the problem that caused the negative codes might still exist.

With the 913 SQL return code, the single statement could be retried; if successful, the program could simply continue on. However, if repeated attempts at retry logic fail, the application probably needs to be rolled back.

Lock Escalation and Promotion

Lock Escalation

Lock escalation can occur when the value for LOCKMAX is reached. At this time, DB2 will release all the locks held and take a more comprehensive lockrelease several page or row locks to escalate to a table lock. DB2 tries to balance the number of locks on objects, based on the amount of concurrent access. The LOCKMAX option will then further determine when/whether escalation occurs.

NOTE

Lock escalation can occur for objects defined with LOCKSIZE ANY, PAGE, or ROW. ANY simply allows DB2 the choice of lock to take initially.


The value of LOCKMAX is set on the CREATE TABLESPACE statement to define how many locks can be held simultaneously on an object. Locks will be escalated when this number is reached. You can specifically set this value to a number or leave it at the default of SYSTEM, and it will use the number set by NUMLKTS.

NOTE

If you specify LOCKSIZE ANY, the default for LOCKMAX is SYSTEM. If you specify LOCKSIZE ROW, PAGE, TABLE, or TABLESPACE, the default for LOCKMAX is 0 (no escalation).


You can turn off lock escalation by using the LOCKMAX = 0 parameter on the CREATE TABLESPACE statement. If you choose to turn off lock escalation, be sure that applications accessing those objects are committing frequently and that you have adjusted NUMLKTS to allow for more locks to be taken; otherwise, you run the risk of hitting negative SQL codes when you reach the maximum number of locks. (Without escalation, you could hold more, smaller locks.) Lock escalation exists to protect you from taking excessive system resources, so if you turn it off, you will need to control it. If you are CPU-bound, it may not be a good idea to turn off lock escalation, because it will take DB2 longer to traverse long chains of locks, and it will take more CPU for IRLM latch activity.

Details about lock-escalation activity can be found in the DB2 logs identified by the DSNI031I message. This message will contain the resource that experienced the escalation and the details about the new state of the lock. Figure 16-1 shows the output of the DSNI031I message.

Figure 16-1. Lock escalation


Lock Promotion

A lock mode that will be used for any table, table space, or table space partition for an SQL statement is reflected in the PLAN_TABLE populated by EXPLAIN in the column TSLOCKMODE if the isolation can be determined at bind time. This is the lock mode that the SQL statement would use for the table or table space lock if and only if it had not been raised by a preceding SQL statement. For example, an SQL SELECT statement might have a TSLOCKMODE of IS, yet during execution, it could have been promoted to IX, and that is what the SELECT statement would use. Such lock promotion is generally not a concern but simply a fact.



DB2 for z. OS Version 8 DBA Certification Guide
DB2 for z/OS Version 8 DBA Certification Guide
ISBN: 0131491202
EAN: 2147483647
Year: 2003
Pages: 175
Authors: Susan Lawson

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net