Buffer Pools and Buffer Pool Queue Management
Buffer pools are areas of virtual storage that temporarily store pages of table spaces or indexes. When a program accesses a row of a table, DB2 places the page containing that row in a buffer. When a program changes a row, DB2 must eventually write the changed data in the buffer back to disk, normally when either a checkpoint or a write threshold is reached. The write thresholds are either a vertical threshold at the page-set level or a horizontal threshold at the buffer pool level.
The way buffer pools work is fairly simple by design, but it is tuning these simple operations that can make all the difference in the world to the performance of your applications. The data manager issues GETPAGE requests to the buffer manager to satisfy the request from the buffer pool instead of having to retrieve the page from disk. CPU is often traded for I/O in order to manage buffer pools efficiently. Buffer pools are maintained by the subsystem, but individual buffer pool design and use should be by object granularity and, in some cases, also by application.
DB2 buffer pool management by design allows the ability to alter and display buffer pool information dynamically, without requiring a bounce of the DB2 subsystem. This improves availability by allowing you to dynamically create new buffer pools when necessary and to dynamically modify or delete them. You may need to do ALTERs of buffer pools a couple times during the day because of varying workload characteristics.
Initial buffer pool definitions are set at installation/migration but are often difficult to configure properly at that time because the application processing against the objects is usually not well understood yet. Regardless of what is set at installation, you can use ALTER afterward to add or delete buffer pools, resize them, or change any of the thresholds. The buffer pool definitions are stored in the BSDS, and you can move objects between buffer pools via an ALTER INDEX/TABLESPACE and a subsequent START/STOP command on the object.
Virtual pools have three types of pages:

- Available pages: pages on an available queue (LRU, FIFO, or SLRU) that are candidates for stealing
- In-use pages: pages currently in use by a process; these are not available for stealing
- Updated pages: pages that are no longer in use but are not available for stealing until they have been externalized to disk
Pages are of four sizes: 4K, 8K, 16K, and 32K.
The 8K and 16K page sizes are logical constructs built from physical 4K control intervals (CIs), the same as the 32K page size. Index page sizes are 4K. Work file table spaces are 4K or 32K. A version 8 DSNZPARM called DSVCI allows the control interval to match the actual page size.
Asynchronous page writes per I/O change with each page size accordingly: 32 pages per I/O for 4K pages, 16 for 8K pages, 8 for 16K pages, and 4 for 32K pages.
With the larger page sizes, you can achieve better hit ratios and less I/O because you can fit more rows on a page. For instance, with a 2,200-byte row (perhaps for a data warehouse), a 4K page would hold only one row, but an 8K page could hold three rows: one more than two 4K pages would hold in the same space, with one less lock required. However, the larger page sizes should not be used as a band-aid for what may be a poor design. You may want to consider decreasing the row size, based on usage, to get more rows per page.
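The row-fit arithmetic above can be sketched as a quick calculation. The 100-byte per-page overhead used here is an assumed round number for illustration, not DB2's exact figure:

```python
# Hedged sketch: how many rows of a given length fit on a page.
# The overhead value is an assumption, not DB2's real header size.
def rows_per_page(page_size: int, row_len: int, overhead: int = 100) -> int:
    usable = page_size - overhead
    return usable // row_len

# The 2,200-byte row from the text:
print(rows_per_page(4096, 2200))   # 1 row per 4K page
print(rows_per_page(8192, 2200))   # 3 rows per 8K page
```

Three rows per 8K page versus two rows in two 4K pages is where the "one more row" in the text comes from.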
We can have up to 80 virtual buffer pools: up to 50 4K-page buffer pools (BP0 through BP49), up to 10 32K-page buffer pools (BP32K and BP32K1 through BP32K9), up to 10 8K-page buffer pools, and up to 10 16K-page buffer pools.
Pages used in the buffer pools are processed in two categories: random (pages read one at a time) or sequential (pages read via prefetch). These pages are queued separately, on the LRU (random least recently used) queue or the SLRU (sequential least recently used) queue. (Prefetch will steal only from the SLRU queue.)
The percentage of each queue in a buffer pool is controlled via the VPSEQT parameter (sequential steal threshold). This threshold becomes difficult to adjust and often requires two settings: for example, one setting for batch processing and another setting for online processing. Processing of batch and online data often differs. Batch is usually more sequentially processed, whereas online is processed more randomly.
DB2 breaks up these queues into multiple LRU chains, producing less overhead for queue management: the latch taken at the head of each queue (on the hash control block, which keeps the order of the pages on the queue) is held less often because the queues are smaller. Multiple subpools are created for a large virtual buffer pool, and DB2 controls the threshold so that no subpool exceeds 4,000 buffers. The LRU queue is managed within each subpool in order to reduce buffer pool latch contention when the degree of concurrency is high. Stealing of these buffers occurs in a round-robin fashion through the subpools.
FIFO (first-in, first-out) can be used instead of the default, LRU. With this method, the oldest pages are stolen, regardless of how recently they were referenced. This decreases the cost of a GETPAGE operation and reduces internal latch contention under high concurrency. FIFO should be used only where there is little or no I/O and where the table space or index is resident in the buffer pool. Keep objects managed with LRU and with FIFO in separate buffer pools; the method is set via the ALTER BUFFERPOOL command with a PGSTEAL option of FIFO. LRU is the PGSTEAL option default.
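A minimal sketch of the two stealing disciplines, illustrative only and not DB2 internals, shows why FIFO is cheaper on a GETPAGE hit: it does no queue maintenance at all, while LRU must reorder the queue on every rereference:

```python
from collections import OrderedDict, deque

# Toy LRU pool: a hit moves the page to the MRU end; a steal takes the LRU end.
class LruPool:
    def __init__(self, size):
        self.size, self.pages = size, OrderedDict()
    def getpage(self, pageno):
        if pageno in self.pages:
            self.pages.move_to_end(pageno)          # rereference: queue update
        else:
            if len(self.pages) >= self.size:
                self.pages.popitem(last=False)      # steal least recently used
            self.pages[pageno] = True

# Toy FIFO pool: a hit touches nothing; a steal takes the oldest arrival.
class FifoPool:
    def __init__(self, size):
        self.size, self.pages, self.order = size, set(), deque()
    def getpage(self, pageno):
        if pageno not in self.pages:                # no queue work on a hit
            if len(self.pages) >= self.size:
                self.pages.discard(self.order.popleft())  # steal oldest page
            self.pages.add(pageno)
            self.order.append(pageno)

lru, fifo = LruPool(3), FifoPool(3)
for p in [1, 2, 3, 1, 4]:      # page 1 is rereferenced before the steal
    lru.getpage(p)
    fifo.getpage(p)
# LRU keeps the rereferenced page 1 and steals page 2;
# FIFO steals page 1 regardless of the rereference.
```

This is why FIFO suits resident objects: if there are no steals, skipping the queue maintenance is pure savings.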
I/O Requests and Externalization
Synchronous reads are physical reads done one page per I/O; synchronous writes are pages written one page per I/O. Synchronous reads and writes should be kept to only what is truly necessary, that is, small in occurrence and number. Otherwise, the result may be buffer pool stress. DB2 will begin to use synchronous writes if the IWTH (immediate write) threshold is reached or if two checkpoints pass without an updated, uncommitted page being written.
Asynchronous reads are several pages read per I/O for prefetch operations such as sequential prefetch, dynamic prefetch, or list prefetch. Asynchronous writes are several pages per I/O for such operations as deferred writes.
Pages are externalized to disk when any of the following occur:

- The DWQT threshold is reached.
- The VDWQT threshold is reached.
- A data set is physically closed or converted from read/write to read-only.
- DB2 takes a checkpoint.
- The QUIESCE (WRITE YES) utility is executed.
Controlling page externalization via the DWQT and VDWQT thresholds is best for performance and avoids surges in I/O. If page externalization were controlled by DB2 system checkpoints, too many pages would be written to disk at one time, causing I/O queuing delays, increased response time, and I/O spikes. During a checkpoint, all updated pages in the buffer pools, except those for the work files (DSNDB07), are externalized to disk, and the checkpoint is recorded in the log.
Checkpoints and Page Externalization
DB2 checkpoints are controlled through the DSNZPARM LOGLOAD or CHKFREQ. The LOGLOAD parameter is the number of log records written between DB2 checkpoints; the default was changed from the older 10,000 to 500,000 in a recent version, which can be too low for some production subsystems. CHKFREQ is the number of minutes between checkpoints. Different settings may be needed depending on workload; for example, you may want the value higher during batch processing. The ability to change these parameters dynamically is important precisely because workloads vary: the SET LOG LOGLOAD and SET LOG CHKFREQ commands allow you to set LOGLOAD or CHKFREQ on the fly.
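A back-of-envelope way to relate LOGLOAD to checkpoint frequency; the log-record rate below is an invented example you would replace with your own subsystem's statistics:

```python
# Hedged sketch: estimate the checkpoint interval a LOGLOAD setting yields
# for an observed log-record rate, and the LOGLOAD needed for a target
# interval. The rate is a made-up example, not a measured value.
def checkpoint_interval_minutes(logload: int, log_records_per_minute: float) -> float:
    return logload / log_records_per_minute

def logload_for_interval(minutes: float, log_records_per_minute: float) -> int:
    return int(minutes * log_records_per_minute)

rate = 100_000  # assumed: a busy subsystem writing 100K log records/minute
print(checkpoint_interval_minutes(500_000, rate))   # LOGLOAD 500,000 -> 5.0 min
print(logload_for_interval(15, rate))               # 1,500,000 for ~15 min
```

The same arithmetic run against your real log-record rate shows whether a given LOGLOAD checkpoints too often or too rarely for the 15-to-20-minute guideline.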
Other options have been added to the -SET LOG command to suspend and resume logging for a DB2 subsystem. SUSPEND causes a system checkpoint to be taken in a non-data sharing environment. DB2 obtains the log-write latch, preventing any further log records from being created, and writes any unwritten log buffers to disk. The BSDS is also updated with the high-written RBA. All further database updates are prevented until update activity is resumed by issuing a -SET LOG command to resume logging or until a -STOP DB2 command is issued. These commands act on a single subsystem only and must be entered for each member when running in a data sharing environment.
In very general terms, DB2 should checkpoint about every 15 to 20 minutes during online processing, or at some other interval based on investigative analysis of the impact on restart time after a failure. The frequency of checkpoints raises two concerns: the cost and disruption of the checkpoints themselves and the restart time for the subsystem after a crash.
Hundreds of checkpoints per hour is definitely too many, but the general guideline of three to six checkpoints per hour is likely to cause problems in restart time, especially if it is used for batch-update timing. After a normal quiesce stop, there is no work to do. The real concern for checkpointing is DB2 abend situations, or a MODIFY IRLM,ABEND command, either of which requires DB2 to be restarted.
Many times, the cost and disruption of DB2 checkpoints are overstated. A DB2 checkpoint is a tiny hiccup and does not prevent processing from proceeding. However, a LOGLOAD setting that is too high, along with large buffer pools and high thresholds such as the defaults, can cause enough I/O to make the checkpoint disruptive. In trying to control checkpoints, some clients increase the LOGLOAD value to make checkpoints less frequent, but that only makes each one more disruptive.
The situation is corrected by reducing the amount written and increasing the checkpoint frequency, which yields much better performance and availability. It is not only possible but does occur at some installations that a checkpoint every minute does not impact performance or availability. The write efficiency at DB2 checkpoints is the key factor to observe when deciding whether LOGLOAD can be reduced. If the write thresholds (DWQT/VDWQT) are doing their jobs, there is less work to perform at each checkpoint. Also, using the write thresholds to cause I/O to be performed in a level, nondisruptive fashion is helpful for the nonvolatile storage in storage controllers.
However, even if the write thresholds (DWQT/VDWQT) and the checkpoints are set properly, an unwanted write problem can still exist if the log data sets are not properly sized. If the active log data sets are too small, active log switches will occur often. When an active log switch takes place, a checkpoint is taken automatically. Therefore, the logs could be driving excessive checkpoint processing, resulting in continual writes and preventing a high ratio of pages written per I/O, because the deferred write queue is not allowed to fill as it should.
Buffer pool sizes are determined by the VPSIZE parameter, which sets the number of pages to be used for the virtual pool. In order to size buffer pools, it is helpful to know the residency rate of the pages for the object(s) in the buffer pool. DB2 can handle large buffer pools efficiently as long as enough real memory is available; if insufficient real storage exists to back the requested buffer pool storage, that is, if the buffer pool size exceeds the real memory on the z/OS image, paging can occur. DB2 limits the total amount of storage allocated for buffer pools to approximately twice the amount of real storage, but less is recommended. A maximum of 1TB is allowed for all buffer pools, provided that real storage is available; however, no current machine has that much memory (a z990, for example, has a maximum of 256GB), so proper sizing is very important. There is also a feature called PGFIX that allows you to fix pages in real storage, avoiding the processing time DB2 otherwise needs to fix and free pages during each I/O.
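A hedged sizing check along these lines; the pool definitions and real-storage figure are made-up examples, not recommendations:

```python
# Illustrative sketch: total virtual storage a set of buffer pools requests,
# compared against real storage on the image, to flag likely paging.
def pool_bytes(vpsize_pages: int, page_size: int) -> int:
    return vpsize_pages * page_size

pools = [
    (200_000, 4096),    # assumed: a 4K pool with VPSIZE 200000
    (50_000, 32768),    # assumed: a 32K pool with VPSIZE 50000
]
total = sum(pool_bytes(n, sz) for n, sz in pools)
real_storage = 4 * 1024**3          # assumed: 4GB of real storage on the image

print(total)                        # bytes requested by all pools
print(total <= real_storage)       # True here: backed by real storage
```

Running the same arithmetic over all defined pools, against actual real storage, is a quick first check before paging statistics need to be consulted.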
Sequential versus Random Processing
The VPSEQT (virtual pool sequential steal threshold) is the percentage of the virtual buffer pool that can be used for sequentially accessed pages. This is to prevent sequential data from using all the buffer pool and to keep some space available for random processing. The value is 0 to 100 percent, with a default of 80 percent, indicating that 80 percent of the buffer pool is to be set aside for sequential processing; 20 percent, for random processing. This percentage needs to be set according to how your objects in that buffer pool are processed.
One tuning option often used is altering VPSEQT to 0 to set the pool up for only random use. When VPSEQT is altered to 0, the SLRU is no longer valid, making the buffer pool totally random. Because only the LRU is used, all pages on the SLRU have to be freed, which also disables prefetch operations in this buffer pool. This can be beneficial for certain strategies, as it saves CPU on each GETPAGE, up to a 10 percent reduction in GETPAGE overhead. However, this strategy causes problems for certain buffer pools.
The DWQT, also known as the horizontal deferred write threshold, is the percentage threshold that determines when DB2 starts turning on write engines to begin deferred writes (32 pages/asynchronous I/O). The value can be from 0 to 90 percent, with a default of 50 percent. When the threshold is reached, up to 600 write engines begin writing pages out to disk. Running out of write engines can occur if the write thresholds are not set to keep a constant flow of updated pages being written to disk. This can occur and is okay if it is uncommon, but daily occurrence is a tuning opportunity.
DB2 turns on these write engines, one per vertical (page set) queue at a time, until a 10 percent reverse threshold is met. Running out of write engines can be detected via the WRITE ENGINES NOT AVAILABLE indicator on statistics reports.
When you set the DWQT threshold, a high value is useful to help improve the hit ratio for updated pages but will increase I/O time when deferred write engines begin. You use a low value to reduce I/O length for deferred write engines, but this will increase the number of deferred writes. This threshold should be set based on the referencing of the data by the applications. If the DWQT is set to zero so that all objects defined to the buffer pool are scheduled to be written immediately to disk, DB2 uses its own internal calculations for exactly how many changed pages can exist in the buffer pool before it is written to disk. Thirty-two pages are still written per I/O, but it will take 40 dirty, or updated, pages to trigger the threshold so that the highly rereferenced updated pages, such as space map pages, remain in the buffer pool.
When implementing LOBs, a separate buffer pool should be used, and it should not be shared (backed by a group buffer pool) in a data sharing environment. The DWQT should be set to 0 so that, for LOBs with LOG NO, force-at-commit processing occurs and the updates continually flow to disk instead of arriving in surges of writes. For LOBs defined with LOG YES, DB2 can use deferred writes and avoid massive surges at checkpoint.
The DWQT threshold works at a buffer pool level for controlling writes of pages to the buffer pools, but for a more efficient write process, you will want to control writes at the page set/partition level. This can be controlled via the VDWQT. The percentage threshold determines when DB2 starts turning on write engines and begins the deferred writes for a given data set. This helps to keep a particular page set/partition from monopolizing the entire buffer pool with its updated pages. The value is 0 to 90 percent, with a default of 10 percent. The VDWQT should always be less than the DWQT.
A good rule of thumb for setting the VDWQT is that if fewer than ten pages are written per I/O, set it to 0. You may also want to set it to 0 to trickle-write the data out to disk. It is normally best to keep this value low in order to prevent heavily updated page sets from dominating the deferred write area. As of version 6, the VDWQT can be specified as either a percentage or a number of pages from 0 to 9,999; you must set the percentage to 0 to use the number specified. Set it to 0,0, and the system uses MIN(32, 1%), which is good for trickle I/O.
If you set the VDWQT to 0, 32 pages are still written per I/O, but it will take 40 dirty pages to trigger the threshold so that the highly rereferenced updated pages, such as space map pages, remain in the buffer pool. It is a good idea to set the VDWQT using a number rather than a percentage. If someone increases the buffer pool, that means more pages for a particular page set can occupy the buffer pool, and this may not always be optimal or what you want.
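The MIN(32, 1%) rule for the 0,0 setting can be expressed directly, which also shows why a number is safer than a percentage when pools are resized:

```python
# Illustrative math for the VDWQT "0,0" setting: the per-data-set write
# trigger is the minimum of 32 pages or 1 percent of the buffer pool.
def vdwqt_zero_trigger(vpsize_pages: int) -> int:
    return min(32, vpsize_pages // 100)

print(vdwqt_zero_trigger(1_000))    # small pool: 1% = 10 pages
print(vdwqt_zero_trigger(50_000))   # large pool: capped at 32 pages

# By contrast, a percentage-based VDWQT grows with the pool: 10% of a pool
# doubled from 50,000 to 100,000 pages doubles the pages one data set may
# hold, which is the drift the text warns about.
print(50_000 * 10 // 100)    # 5,000 pages
print(100_000 * 10 // 100)   # 10,000 pages after the pool is enlarged
```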
When looking at any performance report showing the amount of activity for the VDWQT and the DWQT, you would want to see the VDWQT being triggered most of the time (VERTIC.DEFER.WRITE THRESHOLD) and the DWQT much less (HORIZ.DEFER.WRITE THRESHOLD). No general ratios are possible, as that would depend on both the activity and the number of objects in the buffer pools. The bottom line is that you want to control I/O by the VDWQT, with the DWQT watching for and controlling activity across the entire pool and in general writing out rapidly queuing-up pages. This will also assist in limiting the amount of I/O that checkpoint would have to perform.
The VPPSEQT (virtual pool parallel sequential threshold) is the percentage of VPSEQT setting that can be used for parallel operations. The value is 0 to 100 percent, with a default of 50 percent. If the value is set to 0, parallelism is disabled for objects in that particular buffer pool. Doing so can be useful in buffer pools that cannot support parallel operations. The VPXPSEQT (virtual pool sysplex parallel sequential threshold) is a percentage of the VPPSEQT to use for inbound queries. It too defaults to 50 percent. If it is set to 0, sysplex query parallelism is disabled. In affinity data sharing environments, this is normally set to 0 to prevent inbound resource consumption of DSNDB07 and work files.
The PGSTEAL option allows you to choose a queuing method for the buffer pools. The default is LRU, but FIFO is also an option. FIFO turns off the overhead of maintaining the queue and may be useful for objects that fit completely in the buffer pool or if the hit ratio is less than 1 percent.
The following thresholds are a percentage of unavailable pages to total pages, where unavailable means either updated or in use by a process:

- SPTH (sequential prefetch threshold), 90 percent: when reached, sequential prefetch is disabled until the percentage of unavailable pages falls below 90.
- DMTH (data manager threshold), 95 percent: when reached, GETPAGE and release-page operations occur row by row instead of page by page, at considerable CPU cost.
- IWTH (immediate write threshold), 97.5 percent: when reached, updated pages are written synchronously as they are updated.
Be aware that the IWTH counter can also be incremented when dirty pages on the write queue have been referenced, which causes a synchronous I/O before the page can be used by the new process. This threshold counter can also be incremented if more than two checkpoints occur before an updated page is written, as this will cause a synchronous I/O to write out the page.
Virtual Pool Design Strategies
Separate buffer pools should be used, based on their type of usage by the applications, such as buffer pools for objects that are randomly accessed versus those that are sequentially accessed. Each of these buffer pools will have its own unique settings, and the type of processing may even differ between the batch cycle and the online day. These are very generic breakouts just for this example. Actual definitions would be much finer-tuned and less generic. The following list provides a more detailed example of buffer pool object breakouts:
Tuning with the -DISPLAY BUFFERPOOL Command
In several cases, the buffer pools can be tuned effectively by using the DISPLAY BUFFERPOOL command. When a tool is not available for tuning, the following steps can be used to help tune buffer pools.
The output contains valuable information, such as prefetch information (sequential, list, dynamic requests), pages read, prefetch I/O, and disablement (no buffer, no engine). The incremental detail display shifts the time frame every time a new display is performed.
The row identifier (RID) pool is used for storing and sorting RIDs for such operations as

- List prefetch, including single-index list prefetch
- Multiple-index access
- Hybrid joins
- Enforcing unique keys while updating multiple rows
The optimizer considers the RID pool for list prefetch and RID use. Any single user can consume the entire RID pool at run time, and if not enough space is available in the RID pool, the result can be a table space scan. For example, if you want to retrieve 10,000 rows from a 100,000,000-row table and no RID pool is available, a scan of all 100,000,000 rows will occur, at any time and without external notification. The optimizer assumes that physical I/O will be less with a large pool.
The default size of the RID pool is 8MB, with a maximum size of 10,000MB. The RID pool is created at start-up time, but no space is allocated until RID storage is needed. It is then allocated in 32KB blocks as needed, until the maximum size you specified on the installation panel DSNTIPC is reached. When setting the RID pool size, you should have as large a RID pool as required, as it is a benefit for processing and can lead to performance degradation if it is too small. A good guideline for sizing the RID pool is as follows:
Number of concurrent RID processing activities x Average number of RIDs x 2 x 5 bytes per RID
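The guideline translates directly into a function; the workload numbers below are examples only, to be replaced with your own estimates of concurrency and RID counts:

```python
# The sizing guideline from the text: concurrent RID activities x average
# RIDs x 2 (safety factor) x 5 bytes per RID.
def rid_pool_bytes(concurrent_rid_activities: int, avg_rids: int) -> int:
    return concurrent_rid_activities * avg_rids * 2 * 5

size = rid_pool_bytes(10, 100_000)  # assumed: 10 activities, 100K RIDs each
print(size)                # 10,000,000 bytes
print(size / 1024**2)      # roughly 9.5MB; round up when setting the ZPARM
```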
Statistics to Monitor
You should monitor three statistics for RID pool problems: RIDs over the Relational Data System limit, RIDs over the Data Manager limit, and insufficient pool size.
RIDs over the RDS Limit
This statistic is the number of times list prefetch is turned off because the RID list built for a single set of index entries is greater than 25 percent of the number of rows in the table. In this case, DB2 determines that, instead of using list prefetch to satisfy a query, it will be more efficient to perform a table space scan, which may or may not be good, depending on the size of the table accessed. Increasing the size of the RID pool will not help in this case. This is an application issue for access paths and needs to be evaluated for queries using list prefetch.
There is one very critical issue with this type of failure. The 25 percent threshold is stored in the package/plan at bind time; therefore, it may no longer match the real 25 percent value and in fact could be far less. It is important to know which packages/plans are using list prefetch and on which tables. If the underlying tables are growing, the packages/plans that depend on them should be rebound after the RUNSTATS utility has updated the statistics. Key correlation statistics and better information about skewed distribution of data can also help to gather better statistics for access-path selection and may help avoid this problem (see Chapter 7 for more information about data correlation statistics).
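A small illustration of how the bind-time threshold goes stale as a table grows; the row counts are invented:

```python
# Illustrative only: the list-prefetch cutoff is computed from the table's
# row count at bind time and stored in the package/plan. If the table
# grows, the stored cutoff no longer equals 25 percent of the real table.
rows_at_bind = 1_000_000
threshold_rids = rows_at_bind * 0.25      # cutoff frozen at bind time

rows_now = 10_000_000                     # table has since grown tenfold
effective_pct = threshold_rids / rows_now * 100
print(effective_pct)   # 2.5 -> list prefetch now shuts off at only 2.5%
```

This is exactly the case where a rebind after RUNSTATS restores the intended 25 percent behavior.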
RIDS over the DM Limit
This occurs when more than 16 million RIDs are required to satisfy a query, as DB2 has a 16 million RID limit. Hitting this limit can cause fallback to a table space scan. In order to control this, you can do the following.
Insufficient Pool Size
This indicates that the RID pool needs to be enlarged.
The SORT Pool
Sorts are performed in two phases: initialization and merge. During the first phase, DB2 builds ordered sets of runs from the given input; in the second phase, DB2 merges the runs.
At start-up, DB2 allocates a sort pool in the private area of the DBM1 address space. DB2 uses a special sorting technique called a tournament sort. During the sorting processes, this algorithm commonly produces logical work files called runs, which are intermediate sets of ordered data. If the sort pool is large enough, the sort completes in that area. More often than not, the sort cannot complete in the sort pool, and the runs are moved into DSNDB07. These runs are later merged to complete the sort. When DSNDB07 is used for holding the pages that make up the sort runs, you can experience performance degradation if the pages get externalized to the physical work files, as they will have to be read back in later in order to complete the sort.
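A toy model of the two phases (run formation, then merge); it uses simple sorted chunks rather than DB2's actual tournament sort, which on random input typically produces runs about twice the work-area size, so even fewer runs than this model suggests:

```python
import heapq

# Phase 1 (simplified): fill the work area, sort it, emit it as one run.
# A larger work area means fewer, longer runs to merge in phase 2.
def form_runs(values, work_area):
    runs = []
    for i in range(0, len(values), work_area):
        runs.append(sorted(values[i:i + work_area]))
    return runs

data = [5, 3, 8, 1, 9, 2, 7, 4]
runs = form_runs(data, 4)           # small work area -> 2 runs
merged = list(heapq.merge(*runs))   # phase 2: merge the ordered runs

print(len(runs))   # 2
print(merged)      # [1, 2, 3, 4, 5, 7, 8, 9]
```

In DB2 terms, a work area (sort pool) large enough to hold the whole input yields a single run and no DSNDB07 activity, which is the performance goal the text describes.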
The sort pool size defaults to 2MB unless specified but can range from 240KB to 128MB; the size is set with an installation DSNZPARM. The larger the sort pool (sort work area) is, the fewer sort runs are produced. If the sort pool is large enough, the buffer pools and DSNDB07 will not be used, and better performance will result from less I/O. You should size the sort pool and DSNDB07 generously, because you do not want sort pages being written to disk.
The Environmental Descriptor Manager Pool
The EDM pool is made up of three components, each in its own separate storage and each containing many items, including the following:

- EDM pool: skeleton cursor tables (SKCTs), cursor tables (CTs), skeleton package tables (SKPTs), and package tables (PTs)
- EDM DBD cache: database descriptors (DBDs)
- EDM statement cache: skeletons of dynamic SQL, if the dynamic statement cache is enabled
The EDM storage pools should not be undersized, as that can cause performance degradation. If the EDM DBD cache is too small, you will see increased I/O activity in the DSNDB01.DBD01 table space in the DB2 directory. If the EDM pool storage is too small, you will see increased I/O activity in the DSNDB01.SPT01 and DSNDB01.SCT02 table spaces, which also support the DB2 directory. If the EDM statement cache is too small, the hit ratio on the dynamic statement cache will drop.
Your main goal for the EDM pool is to limit the I/O against the directory and the catalog. If the pool is too small, you will also see increased response times, owing to the loading of SKCTs, SKPTs, and DBDs and the repreparing of dynamic SQL statements that could not remain cached. By correctly sizing the EDM pool, you can keep unnecessary I/O from accumulating for a transaction. Reloading an SKCT, SKPT, or DBD into the EDM pool creates additional I/O, which can happen if pool pages are stolen because the EDM pool is too small. Pages in the pool are maintained on an LRU queue, and the least recently used pages are stolen if required.
If a new application is migrating to the environment, it may be helpful to look in SYSIBM.SYSPACKAGES to give you an idea of the number of packages that may have to exist in the EDM pool, and this can help determine the size.
You can measure the following ratios to help determine whether your EDM pool is efficient. Think of these as EDM pool hit ratios:

- CT requests versus CTs not found in the EDM pool
- PT requests versus PTs not found in the EDM pool
- DBD requests versus DBDs not found in the EDM DBD cache
Your goal is a value of 5 for each of the preceding ratios (1 out of 5), that is, an 80 percent hit ratio.
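One way to compute such a hit ratio from monitor counters; the counter names and values here are invented examples, since the exact field names vary by monitoring product:

```python
# Hedged sketch of an EDM hit-ratio check: requests versus "not found"
# events that forced a load from the directory. The numbers are invented.
def edm_hit_ratio(requests: int, not_found: int) -> float:
    return (requests - not_found) / requests * 100

print(edm_hit_ratio(1000, 200))  # 80.0 -> the 1-in-5 goal from the text
print(edm_hit_ratio(1000, 500))  # 50.0 -> pool likely too small
```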
Dynamic SQL Caching
If you are going to use dynamic SQL caching, you have to pay attention to your EDM statement cache pool size. If it is too small, it can affect the performance of your static SQL as well. Cached dynamic statements are not backed by disk: if a statement's pages are stolen and the statement is reused, it must be prepared again. Static mini-binds can be flushed from the EDM pool by LRU but are backed by disk and can be retrieved when used again. Statistics help monitor cache use, and trace fields show the effectiveness of the cache, visible on the statistics long report.