At times, too much locking can be a problem, resulting in suspensions, timeouts, or even deadlocks. Timeouts and DeadlocksWhile waiting for a lock, a process can exceed an allowable amount of wait time that has been established. A systemwide option determines the maximum time a process can wait before it gets a resource-unavailable error. This value is called the RESOURCE TIMEOUT value and is calculated as follows.
Table 16-7 shows the resource timeout factors.
The result is the timeout value, which is always greater than or equal to the RESOURCE TIMEOUT. When the timeout value has been reached, the requesting task is informed of the unavailable resource. Several conditions can cause a timeout value to be exceeded by a process, and each returns a different message to a different location. The deadlock time is determined by system parameter DEADLOCK TIME (default is 5 seconds). This is the time interval between two successive scans for a deadlock. For every deadlock interval, the IRLM will verify whether there are deadlock situations. In those cases, DB2 will inform the taskusually the one that made the smaller number of changesthat a deadlock has occurred. The other tasks will continue without any problems. The main causes of a timeout are that a program holding a lock has not freed it soon enough and that a deadlock has occurred between DB2 and a non-DB2 resource. The main causes for deadlocks are the following:
When a timeout or a deadlock has occurred, the following error codes and messages are sent to the application program.
A 911 SQL return code means that a deadlock or a timeout occurred and that DB2 has issued a successful ROLLBACK to the last commit point. If the ROLLBACK was not successful, the application will receive a 913 SQL return code signifying that a ROLLBACK was not performed. A912 SQL return code occurs when the maximum number of lock requests has been reached for the database because insufficient memory was allocated to the lock list. With a 912 or a 913 return code, the application needs to issue a COMMIT or ROLLBACK before proceeding with any other SQL. When a 911 occurs, the choice of whether to use retry logic depends on the individual application. If a large amount of work has been rolled back and if other non-DB2 files are present, it may be difficult to reposition everything and retry the unit of work. With most 911 situations, a restart processvendor supplied or user writtenis generally easier than a programmatic reposition and retry. If a small amount of work was lost, a simple retry could be performed up to a fixed number of times. It is important not to retry with a breakpoint, as the source of the problem that caused the negative codes might still exist. With the 913 SQL return code, the single statement could be retried; if successful, the program could simply continue on. However, if repeated attempts at retry logic fail, the application probably needs to be rolled back. Lock Escalation and PromotionLock EscalationLock escalation can occur when the value for LOCKMAX is reached. At this time, DB2 will release all the locks held and take a more comprehensive lockrelease several page or row locks to escalate to a table lock. DB2 tries to balance the number of locks on objects, based on the amount of concurrent access. The LOCKMAX option will then further determine when/whether escalation occurs. NOTE Lock escalation can occur for objects defined with LOCKSIZE ANY, PAGE, or ROW. ANY simply allows DB2 the choice of lock to take initially. The value of LOCKMAX is set on the CREATE TABLESPACE statement to define how many locks can be held simultaneously on an object. Locks will be escalated when this number is reached. You can specifically set this value to a number or leave it at the default of SYSTEM, and it will use the number set by NUMLKTS. NOTE If you specify LOCKSIZE ANY, the default for LOCKMAX is SYSTEM. If you specify LOCKSIZE ROW, PAGE, TABLE, or TABLESPACE, the default for LOCKMAX is 0 (no escalation). You can turn off lock escalation by using the LOCKMAX = 0 parameter on the CREATE TABLESPACE statement. If you choose to turn off lock escalation, be sure that applications accessing those objects are committing frequently and that you have adjusted NUMLKTS to allow for more locks to be taken; otherwise, you run the risk of hitting negative SQL codes when you reach the maximum number of locks. (Without escalation, you could hold more, smaller locks.) Lock escalation exists to protect you from taking excessive system resources, so if you turn it off, you will need to control it. If you are CPU-bound, it may not be a good idea to turn off lock escalation, because it will take DB2 longer to traverse long chains of locks, and it will take more CPU for IRLM latch activity. Details about lock-escalation activity can be found in the DB2 logs identified by the DSNI031I message. This message will contain the resource that experienced the escalation and the details about the new state of the lock. Figure 16-1 shows the output of the DSNI031I message. Figure 16-1. Lock escalationLock PromotionA lock mode that will be used for any table, table space, or table space partition for an SQL statement is reflected in the PLAN_TABLE populated by EXPLAIN in the column TSLOCKMODE if the isolation can be determined at bind time. This is the lock mode that the SQL statement would use for the table or table space lock if and only if it had not been raised by a preceding SQL statement. For example, an SQL SELECT statement might have a TSLOCKMODE of IS, yet during execution, it could have been promoted to IX, and that is what the SELECT statement would use. Such lock promotion is generally not a concern but simply a fact. |