Data defined as "sharable" can be accessed by any DB2 system in the group. Several subsystems can read and write simultaneously against the same data. This sharing is controlled via inter-DB2 read/write interest. Changed data, or data with inter-DB2 R/W interest, is always cached in the group buffer pool; this data becomes group buffer pool dependent. DB2 uses the coupling facility to control the invalidation of changed pages in each member's local buffer pool.
Consistency of data among the DB2 members in the data sharing group is protected through both concurrency controls and coherency controls. To provide concurrency control among the members of the group, a new locking mechanism is used. Tuning locking is critical to performance in a data sharing environment.
Locking in a data sharing environment works differently from the locking we have been accustomed to. Data sharing introduces explicit hierarchical locking (EHL). (Prior to data sharing, hierarchical locking was implicit.) The only difference is that with EHL a token is kept that identifies parent/child relationships. A parent lock is a table space/partition lock, and a child lock is a page/row lock (Figure 9-5). The benefit of using EHL is that only the most restrictive lock moves to the coupling facility, reducing the number of calls to the coupling facility to control concurrency; those calls can create a great deal of overhead. Lock avoidance still works with EHL, and type 2 indexes work best. Even so, the use of uncommitted read (UR) should still be considered wherever possible.
Figure 9-5. Parent/child locks
With EHL, only the most restrictive parent lock is propagated until it is necessary for the child lock to be propagated (recorded in the coupling facility), thus lessening the amount of lock activity. For example, if the parent lock on a table space is intent exclusive (IX), a child lock in share (S) mode on a page would not have to be propagated to the coupling facility lock structure, because the lock on the parent is more restrictive. In short, we lock only what is necessary and negotiate locks in the coupling facility if there is conflict. Child locks are propagated only if the parent locks are in conflict.
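The propagation rule above can be sketched as a toy decision function. The mode names, the compatibility table, and the function are invented for illustration; the real decision is made by IRLM and XES, not by application code:

```python
# Toy model of the EHL rule: a child (page/row) lock is propagated to the
# coupling facility only when this member's parent (page set) lock
# conflicts with another member's parent lock. The compatibility table is
# a simplified illustration, not the actual IRLM matrix.

# Parent-lock mode pairs considered compatible enough that child locks
# need not be propagated. IX with IX is deliberately absent: two updating
# members must propagate their child locks.
COMPATIBLE_PARENTS = {
    ("IS", "IS"), ("IS", "IX"), ("IX", "IS"),
}

def must_propagate_child(my_parent_mode, other_members_parent_modes):
    """Return True if child locks must go to the CF lock structure."""
    for other in other_members_parent_modes:
        if (my_parent_mode, other) not in COMPATIBLE_PARENTS:
            return True
    return False

# Member A updates (IX parent) while member B only reads (IS parent):
# no parent conflict, so A's page locks stay local.
print(must_propagate_child("IX", ["IS"]))   # False
# If B also holds IX (both updating), child locks must be propagated.
print(must_propagate_child("IX", ["IX"]))   # True
```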
Local locks are the same in both non-data sharing and data sharing environments. These locks are requested on the local subsystem and provide only intra-DB2 concurrency control.
Global locks are the locks that a DB2 subsystem needs to make known to the group through the coupling facility. These locks are propagated to the coupling facility and provide intra-DB2 and inter-DB2 concurrency control. In a data sharing environment, almost all locks are global.
Whether a lock becomes global depends on whether the lock request is for an L-lock (logical) or P-lock (physical). Physical locks are owned by a DB2 member and are negotiable. Unlike normal transaction locks, P-locks are not used for concurrency but rather for coherency. Physical locks are of two types: page set and page.
Page set P-locks are used to track intersystem interest between DB2 members and to determine when a page set becomes GBP dependent. These locks have different modes, depending on the level of read/write interest on the page set among the DB2 members. A P-lock cannot be negotiated if it is retained (kept due to a subsystem failure). It is released when the page set or partition is closed. Few P-locks are taken; they are usually held for long periods of time.
Page P-locks are used to ensure physical consistency of a page when it is being modified; these locks work at the subpage level and are used in the same manner as latches in a non-data sharing environment. Page locks are also used when changes are being made to a GBP-dependent space map page.
Page set P-lock negotiation takes place when P-locks are noted as being incompatible. The two members with the incompatible locks will negotiate the lock mode so that both can still use the object. Because the P-lock is used for coherency, not concurrency, this negotiation does not sacrifice any data integrity.
The reason for P-lock negotiation is to lessen the number of locks propagated to the coupling facility. The most restrictive P-lock is taken first and then, if necessary, negotiated so that another process can have access to the page set. Page set P-locks are used to track interest in a page set and to know when it is necessary to begin propagation of child locks, because of the level of interest in the DB2 members for the page set.
P-locks and negotiation can be thought of as "I need to know what you are doing; here is what I am doing; let's find a way to work together, or do we have to take turns?" It can also be thought of as an activity indicator.
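The negotiation idea can be sketched as a toy function. The modes and the downgrade rule below are a simplified illustration of page set P-lock negotiation, not the actual IRLM logic:

```python
# Toy sketch of page set P-lock negotiation: because P-locks track
# coherency rather than concurrency, an exclusive holder can downgrade
# so that both members keep working with the object (which then becomes
# GBP dependent). Modes and rules are illustrative only.

def negotiate(holder_mode, requester_mode):
    """Return (new_holder_mode, requester_mode) after negotiation.
    In this sketch, an X holder drops to SIX when another member
    declares read interest (IS), and to IX when the other member
    declares write interest (IX)."""
    if holder_mode == "X" and requester_mode in ("IX", "IS"):
        new_holder = "SIX" if requester_mode == "IS" else "IX"
        return new_holder, requester_mode
    return holder_mode, requester_mode

print(negotiate("X", "IX"))  # ('IX', 'IX'): both members can now update
```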
L-locks occur in both data sharing and non-data sharing subsystems and can be local or global. Logical locks are owned by a transaction or program, are non-negotiable, and work like the normal locks in a single-subsystem environment to serialize access to objects. L-locks are controlled by each member's IRLM and are held from update to commit.
Logical locks are of two types. Parent L-locks are at the table space or partition (page set) level and are almost always propagated to determine whether a conflict exists with another member. Child L-locks can be on a table, data page, or row. Whether they are propagated depends on parent L-lock conflict: if no conflict exists, the child L-locks are not propagated to the coupling facility.
Two more types of locks introduced in data sharing are modified locks and retained locks. Modified locks identify a lock on a resource that is being updated. This includes any active X-type (X, IX, SIX) P-lock or L-lock. A modified lock is kept in the modified resource list in the lock structure of the coupling facility, regardless of the group buffer pool dependency of the object. Modified locks are used to create retained locks if the DB2 member holding the modified lock fails.
Retained locks are modified locks that are converted when a member of the group fails. Retained locks are necessary to preserve data integrity in the event of a failure. They are held when a DB2 subsystem fails; they belong to the failing member and must be resolved before other members are allowed access to the locked objects.
A performance/availability bottleneck may occur if proper procedures are not in place for recovering a failed DB2 member. Retained locks are held at the global lock manager (GLM) level and are owned by the local lock manager (LLM), not a transaction. Thus, only the DB2 member that had the lock can resolve it, so the failed subsystem must come up to resolve the lock. So, regardless of whether a transaction resumes activity in another subsystem, the locks are still retained, and the data is still not accessible by any other process, although readers using uncommitted read can still view the data. The failed DB2 can be restarted on the same system or on another system in the same group; it does not matter, as long as it comes up.
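A minimal sketch of the modified-to-retained conversion described above, with invented names (a toy model, not DB2's actual lock structures):

```python
# Toy model: when a member fails, its modified locks become retained
# locks, which block all access to the locked resources (except
# uncommitted readers) until the failed member restarts.

class LockTable:
    def __init__(self):
        self.modified = []   # (member, resource, mode) for active X-type locks
        self.retained = []   # locks preserved after a member failure

    def take_update_lock(self, member, resource):
        self.modified.append((member, resource, "X"))

    def member_fails(self, failed):
        # Modified locks owned by the failed member become retained.
        self.retained += [m for m in self.modified if m[0] == failed]
        self.modified = [m for m in self.modified if m[0] != failed]

    def can_access(self, resource, uncommitted_read=False):
        # Retained locks block everything except uncommitted read,
        # until the failed member comes up and resolves them.
        if uncommitted_read:
            return True
        return all(r[1] != resource for r in self.retained)

locks = LockTable()
locks.take_update_lock("DB2A", "TS1.PAGE9")
locks.member_fails("DB2A")
print(locks.can_access("TS1.PAGE9"))                         # False
print(locks.can_access("TS1.PAGE9", uncommitted_read=True))  # True
```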
Each local IRLM also keeps a local copy of retained locks for fast reference, so retained locks can survive a coupling facility failure.
It is critically important that retained locks be resolved immediately. The LIGHT(YES) option of the START DB2 command can be used to help perform this. The restart light option brings the DB2 subsystem up just enough to resolve the retained locks and then shuts it back down. During a restart light, DB2 does not accept connections. To set up restart light in the ARM policy, the following syntax would be used:
RESTART_METHOD(SYSTEM, STC, 'cmdprfx STA DB2, LIGHT(YES)')
To manually start DB2 with restart light, the following command would be used:
START DB2 LIGHT(YES)
Three types of lock contention can occur in a data sharing environment: real (IRLM) contention, in which the requested resources and lock modes genuinely conflict; XES contention, in which XES, which understands only share and exclusive modes, reports a conflict that the IRLMs can resolve as compatible; and false contention, in which two different resources hash to the same entry in the coupling facility lock table.
All forms of lock contention need to be monitored and minimized because the process of contention resolution in a data sharing environment can get expensive and can be detrimental to performance.
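As a toy illustration, false contention arises because the coupling facility lock structure maps resource names into a fixed number of lock table entries, so two unrelated resources can collide on the same entry. The hash function, entry count, and resource names below are invented for the sketch, not DB2 internals:

```python
# Toy illustration of false contention: different resources that hash to
# the same lock table entry look like contention until the IRLMs resolve
# it. A real lock structure has far more entries than this.

LOCK_TABLE_ENTRIES = 8  # deliberately tiny to force collisions

def lock_entry(resource_name):
    # Simplistic deterministic hash for illustration only.
    return sum(ord(c) for c in resource_name) % LOCK_TABLE_ENTRIES

def contention_kind(res_a, res_b):
    if res_a == res_b:
        return "real or XES contention"   # same resource: genuine conflict
    if lock_entry(res_a) == lock_entry(res_b):
        return "false contention"         # different resources, same hash class
    return "no contention"

print(contention_kind("TS1", "TS9"))  # false contention (both hash to entry 0)
```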
Group Buffer Pools
Group buffer pools are structures in the coupling facility of a data sharing environment. They allow data to be shared among multiple subsystems. When a member reads a page, the page goes into that member's virtual buffer pool and is registered in the group buffer pool. If the page is updated, force-at-commit processing caches the changed page in the group buffer pool and invalidates the copy in any other member's virtual buffer pool. This process, known as cross-invalidation, ensures that every member works with the most current page. Figure 9-6 shows how the registration of pages works.
Figure 9-6. Group buffer pool page registration
Page directory entries are used to check the existence of a page copy in the group buffer pool and to determine which members will need to be sent cross-invalidation messages. Only one directory entry is needed for a page, regardless of how many virtual buffer pools the page is cached in.
Interest for a page is registered in the page directory when a member reads a page into the local buffer pool from disk or into the group buffer pool for a group buffer pool-dependent page set. With coupling facilities at CFLEVEL=2 or higher, DB2 prefetch can register up to 32 pages with a single CF interaction; otherwise, registration is done on a page-by-page basis. When a page set or partition becomes GBP dependent, all changed pages in the local buffer pool are moved synchronously into the GBP. All these pages, clean and changed, are registered in the directory in the GBP.
Each group buffer pool has a ratio setting as well as a size setting. The ratio is a GBP setting that establishes the number of directory entries per data entry in the GBP. There is one directory entry for each page read on any DB2 member; only one entry is used per page, regardless of the number of members with interest. Without enough directory entries, when a new page needs to be registered it will claim a directory slot and deregister the existing page. The process that required the deregistered page will have to go to disk to reread and reregister the page. Depending on the number of times this occurs, significant overhead can develop. Use the DISPLAY GROUPBUFFERPOOL command to determine how many times this occurs.
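The directory-reclaim behavior described above can be sketched as a toy model. The class and method names, entry counts, and oldest-first victim choice are invented for illustration, not DB2 internals:

```python
# Toy sketch of GBP directory entry reclaim: with too few directory
# entries, registering a new page steals a slot from an existing page,
# which is deregistered and must later be reread from disk.

from collections import OrderedDict

class GroupBufferPoolDirectory:
    def __init__(self, n_entries):
        self.n_entries = n_entries
        self.entries = OrderedDict()   # page -> set of interested members
        self.reclaims = 0              # count of directory reclaims

    def register(self, page, member):
        if page in self.entries:
            # One entry per page, no matter how many members register it.
            self.entries[page].add(member)
            return
        if len(self.entries) >= self.n_entries:
            # Directory full: reclaim the oldest entry (its page is
            # deregistered and cross-invalidated).
            self.entries.popitem(last=False)
            self.reclaims += 1
        self.entries[page] = {member}

d = GroupBufferPoolDirectory(n_entries=2)
d.register("P1", "DB2A")
d.register("P1", "DB2B")   # same page, second member: still one entry
d.register("P2", "DB2A")
d.register("P3", "DB2A")   # directory full: P1 is deregistered
print(d.reclaims)          # 1
print("P1" in d.entries)   # False
```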
In order to change the ratio, you can issue the -ALTER GROUPBUFFERPOOL command. In the following example, the ratio is changed to be 20:1, or 20 directory entries for every data entry:
-ALTER GROUPBUFFERPOOL(GBP3) RATIO(20)
A few situations can cause the deregistration of a page. If buffers are stolen from the local buffer pool of a GBP-dependent page set, the pages are deregistered. This indicates a possible problem with the size and/or thresholds of the virtual buffer pool, because pages are falling off the LRU (least recently used) queue. This would not be a problem if the pages were never referenced again, but any page that is needed must be read back into the virtual buffer pool and reregistered.
If an existing directory entry must be reclaimed for new work, the page is marked invalid and deregistered and so must then be reread in from disk. This can happen if the group buffer pool is too small or the ratio is incorrect.
Sizing Group Buffer Pools
Sizing group buffer pools is not like sizing normal, virtual buffer pools. Group buffer pools are defined as structures in the coupling facility. They are given an initial size when they are defined in the CFRM policy and for performance and availability reasons should be created in a coupling facility separate from the one that holds the Lock and SCA structures.
Some standard rules of thumb exist for sizing, but most are generic at best. To size your group buffer pools well, you need a good understanding of the amount of sharing that will occur against the objects in each group buffer pool. In other words, object separation in virtual buffer pools matters even more when implementing group buffer pools; otherwise, your initial sizing estimates will be difficult to make.
Because no connection exists between the coupling facility and disk, DB2 must have a way to move changed pages out to disk. The castout process, performed by castout engines, moves the changed pages from the group buffer pool through a private area in the DBM1 address space (not a virtual buffer pool), and from there they are written to disk.
The castout process (Figure 9-7) is triggered when the number of changed pages exceeds the CLASST threshold (number of changed pages in the class queue) or the GBPOOLT threshold (number of changed pages in the GBP), or when a pseudo or physical close is performed on a data set by the last updating member. The CLASST threshold is similar to the VDWQT threshold on local buffer pools, and GBPOOLT is similar to the DWQT threshold. Castout can also be triggered if the GBPCHKPT threshold (number of minutes between GBP checkpoints) is reached and a group buffer pool checkpoint is taken.
Figure 9-7. Castout process
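The threshold logic that triggers castout can be sketched as a small function. The percentage values used below are invented for illustration and are not DB2's defaults; the actual values come from the group buffer pool's threshold settings:

```python
# Toy sketch of castout triggering: changed pages accumulate in the GBP
# until the per-page-set (CLASST-like) or pool-wide (GBPOOLT-like)
# percentage threshold is crossed. Threshold defaults here are invented.

def castout_needed(changed_in_class, class_total,
                   changed_in_gbp, gbp_total,
                   classt_pct=10, gbpoolt_pct=50):
    class_pct = 100 * changed_in_class / class_total
    gbp_pct = 100 * changed_in_gbp / gbp_total
    if class_pct >= classt_pct:
        return "castout: class queue over CLASST"
    if gbp_pct >= gbpoolt_pct:
        return "castout: pool over GBPOOLT"
    return "no castout"

# 12 of 100 pages in one class queue are changed (12% >= 10%): castout.
print(castout_needed(12, 100, 200, 1000))  # castout: class queue over CLASST
```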