Creating and Using DB2 Table Spaces


Although DB2 data is accessed at the table level, the data itself is actually stored in a structure known as a table space. Each table space correlates to one or more individual physical VSAM linear data sets that are used to house the actual DB2 data.

Before any data can be physically stored in a DB2 database, a table space must first be created. If you attempt to create a table without specifying a table space, DB2 will create a default table space (unless you do not have authority to use the default database).

Types of Table Spaces

When designing DB2 databases, DBAs can choose from four types of table spaces, each one useful in different circumstances. The four types of table spaces are:

  • Simple table spaces

  • Segmented table spaces

  • Partitioned table spaces

  • LOB table spaces

Simple Table Spaces

Simple table spaces are found mostly in older DB2 applications. A simple table space can contain one or more tables. When multiple tables are defined to a simple table space, a single page can contain rows from all the tables defined to the table space.

Prior to DB2 V2.1, most DB2 table spaces were defined as simple table spaces because the only other option was a partitioned table space. However, most subsequent applications use segmented table spaces because of their enhanced performance and improved methods of handling multiple tables.

If an application must read rows from multiple tables in a predefined sequence, however, mixing the rows of these tables together in a single simple table space can prove to be beneficial. The rows should be mixed together on the page in a way that clusters the keys by which the rows will be accessed. This can be done by inserting the rows using a "round robin" approach, switching from table to table, as follows:

  1. Create a simple table space; this is accomplished by issuing the CREATE TABLESPACE statement without specifying either the SEGSIZE or NUMPARTS clause.

  2. Create the two tables (for example, Table1 and Table2), assigning them both to the simple table space you just created.

  3. Sort the input data set of values to be inserted into Table1 into key sequence order.

  4. Sort the input data set of values to be inserted into Table2 into sequence by the foreign key that refers to the primary key of Table1.

  5. Code a program that inserts a row into Table1 and then immediately inserts all corresponding foreign key rows into Table2.

  6. Continue this pattern until all of the primary keys have been inserted.
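As a sketch, steps 1 and 2 might look like the following DDL. The database, storage group, and table definitions here are illustrative assumptions, not taken from the text:

```sql
-- Step 1: omitting both SEGSIZE and NUMPARTS yields a simple table space
CREATE TABLESPACE TS_SIMPLE
    IN DB_SAMP
    USING STOGROUP SG_SAMP1
    PRIQTY 2000 SECQTY 50
    LOCKSIZE PAGE
    CLOSE NO;

-- Step 2: assign both tables to the same simple table space
CREATE TABLE TABLE1
    (DEPTNO    CHAR(3)     NOT NULL,
     DEPTNAME  VARCHAR(36) NOT NULL,
     PRIMARY KEY (DEPTNO))
    IN DB_SAMP.TS_SIMPLE;

CREATE TABLE TABLE2
    (EMPNO   CHAR(6) NOT NULL,
     DEPTNO  CHAR(3) NOT NULL,
     FOREIGN KEY (DEPTNO) REFERENCES TABLE1)
    IN DB_SAMP.TS_SIMPLE;
```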

When the application reads the data in this predefined sequence, the data from these two tables will be clustered on the same (or a neighboring) page. Great care must be taken to ensure that the data is inserted in the proper sequence. Keep in mind that any subsequent data modification (INSERTs, UPDATEs that increase row size, and DELETEs) will cause the data to get out of sequence, and then performance will suffer. For this reason, this approach is more useful for static data than it is for dynamic, changing data.

Also, remember that mixing data rows from multiple tables on the same table space page adversely affects the performance of all queries, utilities, and applications that do not access the data in this manner. Be sure that the primary type of access to the data is by the predefined mixing sequence before implementing a simple table space in this manner.

Unless data-row mixing is being implemented, define no more than one table to each simple table space. Also, consider defining all your non-partitioned table spaces as segmented instead of simple.

Segmented Table Spaces

A segmented table space is the most common type of table space for most DB2 development efforts. A segmented table space provides most of the benefits of a simple table space, except multiple tables can be defined to one segmented table space without the problems encountered when using simple table spaces. Tables are stored in separate segments. Because data rows never are mixed on the same page, concurrent access to tables in the same segmented table space is not a problem.

For the segmented table space depicted in Figure 5.1, each box represents a segment. Notice how each segment is represented by a single pattern, signifying that only one table's data can be contained in the segment. Now look at the simple table space in the same figure. Each box represents a page. Notice how multiple patterns intermix on the same page.

Figure 5.1. Simple versus segmented table spaces.

graphics/05fig01.gif


Segmented table spaces have other benefits as well. For example,

  • Segmented table spaces handle free space more efficiently, which results in less overhead for inserts and for variable-length row updates.

  • Mass delete processing is more efficient because only the space map, not the data itself, is updated. A mass delete of rows from a table in a simple table space causes every row to be physically read and deleted. The following is an example of a mass delete:

     

     DELETE FROM DSN8810.DEPT; 

    If DSN8810.DEPT is defined in a simple table space, all of its rows are read, deleted, and logged. If it is defined in a segmented table space, however, only the space map is updated to indicate that all rows have been deleted.

  • Space can be reclaimed from dropped tables immediately. This reduces the need for reorganization.

Most of your application table spaces should be segmented. All table spaces that contain multiple tables (and do not need to mix data from multiple tables on a page) should be segmented. Even when you're defining one table for each table space, the performance advantage of the more efficient space utilization should compel you to use segmented table spaces.

Choose the segment size carefully. Consider each of the following when selecting the segment size:

  • SEGSIZE is defined as an integer representing the number of pages to be assigned to a segment. The size of a segment can be any multiple of 4, from 4 to 64, inclusive.

  • DASD space is allocated based either on the PRIQTY and SECQTY specifications for STOGROUP-defined table spaces, or on the VSAM IDCAMS definition for user-defined VSAM table spaces. However, this space can never be smaller than a full segment. The primary extent and all secondary extents are rounded to the next full segment before being allocated.

  • Space cannot be allocated at less than a full track. Consult the "PRIQTY and SECQTY" section later in this chapter for additional information.

  • When defining multiple tables in a segmented table space, keep tables of like size in the same table space. Do not combine large tables with small tables in a single segmented table space. Defining small tables in a table space with a large segment size could result in wasted DASD space.

  • When a segmented table space contains multiple tables large enough to be processed using sequential prefetch, be sure to define the SEGSIZE according to the following chart. The segment size should be at least as large as the maximum number of pages that can be read by sequential prefetch. Otherwise, sequential prefetch could read pages that do not apply to the table being accessed, causing inefficient sequential prefetch processing:

    Bufferpool Range     Segment Size
    1 through 500        16
    501 through 999      32
    1,000 and over       64
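
For instance, a table space whose tables are scanned out of a bufferpool with more than 1,000 buffers would call for the maximum segment size. A hedged sketch (object names and space quantities are hypothetical):

```sql
CREATE TABLESPACE TS_SEG
    IN DB_SAMP
    USING STOGROUP SG_SAMP1
    PRIQTY 7200 SECQTY 720
    SEGSIZE 64        -- bufferpool has 1,000 or more buffers
    LOCKSIZE ANY
    BUFFERPOOL BP1
    CLOSE NO;
```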


Partitioned Table Spaces

A partitioned table space is divided into components called partitions. Each partition resides in a separate physical data set. Partitioned table spaces are designed to increase the availability of data in large tables by spreading the data across multiple physical disk devices.

Furthermore, data is assigned to a partition based on a partitioning limit key (see Figure 5.2). Each partition holds data only for the valid range of keys specified for the partition.

Figure 5.2. Partitioned table space.

graphics/05fig02.gif



Prior to V8, the limit key was defined by a partitioning index. With DB2 V8, though, an index is not required to specify the partitioning limit key; instead, the partitioning limit key is specified in the table creation DDL.


In general, your larger table spaces should be partitioned in order to better control and manage your DB2 objects and data sets. As of DB2 V8, up to 4,096 partitions can be defined for a partitioned table space. For DB2 V6 and V7, the limit is 254 partitions for a LARGE table space. For a non-LARGE table space the limit is 64 partitions.

NOTE

As of DB2 V8, it is possible to create very large databases and table spaces. With up to 4,096 partitions, the maximum size of a partitioned table space is 128 terabytes. For example, a table space with a 32K page size and 4,096 partitions having a DSSIZE of 32GB would max out at 128TB. That is a lot of data! In fact, if you use one partition per day's worth of data, then 4,096 partitions will provide for more than 11 years' worth of data.

For V6 and V7, a partitioned table space can hold up to 1TB of data (254 partitions each containing 4GB of data is approximately equal to 1TB). A LARGE partitioned table space can contain up to 16TB of data. (254 partitions, each containing 64GB of data, is approximately equal to 16TB.)

So, even before V8 a partitioned table space could be used to store quite a bit of data.


There are two types of partitioned table space: LARGE and non-LARGE. Whether or not a partitioned table space is a LARGE table space is controlled using the DSSIZE parameter. The DSSIZE parameter specifies the maximum size, in gigabytes, for each partition of the table space. DB2 V8 continues to support the LARGE parameter for compatibility, but using DSSIZE is recommended over using LARGE.

When using the DSSIZE parameter, you will explicitly specify the maximum size of each partition. The following are valid DSSIZE values:

1GB (1 gigabyte)

2GB (2 gigabytes)

4GB (4 gigabytes)

8GB (8 gigabytes)

16GB (16 gigabytes)

32GB (32 gigabytes)

64GB (64 gigabytes)
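
A sketch of a partitioned table space definition using DSSIZE (the object names and quantities are illustrative assumptions):

```sql
CREATE TABLESPACE TS_PART
    IN DB_SAMP
    USING STOGROUP SG_SAMP1
    NUMPARTS 100
    DSSIZE 8 G       -- greater than 4GB: requires EA-enabled data sets
    LOCKSIZE ANY
    BUFFERPOOL BP1
    CLOSE NO;
```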

 

NOTE

To specify a value greater than 4GB, you must be running DB2 with DFSMS V1.5 or greater, and the data sets for the table space must be associated with a DFSMS data class defined with extended format and extended addressability. DFSMS's extended addressability function is necessary to create data sets larger than 4GB in size. The term used by IBM to define data sets that are enabled for extended addressability is EA-enabled.



When the total table space size is 4TB or greater, DB2 will use a 5-byte RID instead of a 4-byte RID for record identifiers. Prior to DB2 V8, this restriction was more simply stated: whenever the DSSIZE is greater than 4GB, a 5-byte RID is used. But with the greater number of partitions supported as of DB2 V8, the restriction must be stated as a combination of NUMPARTS, page size, and DSSIZE. Table 5.1 outlines the table space parameter combinations and sizes requiring 5-byte RIDs.


Table 5.1. Large Table Space Combinations

NUMPARTS   Page Size   DSSIZE        Table Space Size
4096       4KB         1GB           4TB
4096       4KB         2GB           8TB
4096       4KB         4GB           16TB
2048       4KB         8GB           16TB
1024       4KB         16GB          16TB
512        4KB         32GB          16TB
256        4KB         64GB          16TB
4096       8KB         1GB           4TB
4096       8KB         2GB           8TB
4096       8KB         4GB           16TB
4096       8KB         8GB           32TB
2048       8KB         16GB          32TB
1024       8KB         32GB          32TB
512        8KB         64GB          32TB
4096       16KB        1GB           4TB
4096       16KB        2GB           8TB
4096       16KB        4GB           16TB
4096       16KB        8GB           32TB
4096       16KB        16GB          64TB
2048       16KB        32GB          64TB
1024       16KB        64GB          64TB
4096       32KB        1GB           4TB
4096       32KB        2GB           8TB
4096       32KB        4GB           16TB
4096       32KB        8GB           32TB
4096       32KB        16GB          64TB
4096       32KB        32GB          128TB
2048       32KB        64GB          128TB
4096       4KB         LARGE (4GB)   16TB



As mentioned earlier, DB2 V8 partitioning is no longer dependent on indexing. For all previous releases, a partitioned table space required a partitioning index. It was the index that set up the specific limit keys that determined which data rows were placed in which partition. With DB2 V8, it is possible to specify the limit keys in the table DDL.


Keep the following terminology in mind:

  • Partitioned Table Space: Any table space with multiple physical partitions.

  • Partitioned Index: Any index with multiple physical partitions.

  • Partitioning Index: Any index whose left-most columns match the partitioning limit key of the table space; the index might or might not itself be partitioned.

Furthermore, as of V8, the data in a partitioned table space need not be clustered by the partitioning limit key. So, it is now possible to create a partitioned table space without any indexes (but it is still not generally advisable to do so).
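
With table-controlled partitioning, the limit keys go directly into the table DDL. A hypothetical V8-style sketch (table, column, and table space names are assumptions):

```sql
CREATE TABLE TB_SALES
    (SALE_DATE  DATE          NOT NULL,
     CUST_ID    INTEGER       NOT NULL,
     AMOUNT     DECIMAL(9,2)  NOT NULL)
    IN DB_SAMP.TS_SALES
    PARTITION BY (SALE_DATE)
       (PARTITION 1 ENDING AT ('2004-12-31'),
        PARTITION 2 ENDING AT ('2005-12-31'),
        PARTITION 3 ENDING AT ('2006-12-31'));
```

No partitioning index is needed here; the ENDING AT clauses supply the limit keys that determine which rows land in which partition.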

Another V8 partitioned table space improvement is the ability to modify partitions. With the online schema change capabilities of DB2 V8, you can add, remove, and rotate partitions easily using the facilities of DB2. Prior to V8, such changes required dropping and re-creating the table space, a difficult proposition for many large partitioned table spaces. For more information on the schema change capabilities of DB2 Version 8, refer to Chapter 7, "Database Change Management and Schema Evolution."

Partitioning, Size, and Data Distribution

Deciding to use a partitioned table space is not as simple as merely determining the size of the table. In the early days of DB2 (that is, pre-V4), size typically was the primary consideration for choosing a partitioned table space. However, as DB2 has matured and the applications written using DB2 have become modernized, additional considerations impact your partitioning decisions. Application-level details, such as data contention, performance requirements, degree of parallelism, and data access patterns, must factor into the decision to use partitioned table spaces.

A commonly held belief among DB2 DBAs is that partitioned table spaces should be defined with evenly distributed data across partitions. However, maintaining evenly distributed partitions might not be desirable when partitions are used to isolate data "hot spots." Indeed, it is better to design table space partitions with the needs of the application in mind. Therefore, the best approach is to define table space partitions based on the access requirements of the applications accessing the data. Keep in mind that parallel processing can benefit from properly partitioned table spaces placed on separate volumes.

Partitioning Versus Multiple Tables

Sometimes designers try to avoid partitioned table spaces by dividing a table into multiple tables, each with its own table space. Unless you have specific reasons for doing so, this approach is not wise. When proceeding in this manner, the designer usually places separate tables into each of the smaller table spaces. This can be problematic because it introduces an uncontrolled and unneeded denormalization. (See the "Denormalization" section later in this chapter for more information.)

Furthermore, when data that logically belongs in one table is separated into multiple tables, SQL operations to access the data as a logical whole are made needlessly complex. One example of this complexity is the difficulty in enforcing unique keys across multiple tables. Although partitioned table spaces can introduce additional complexities into your environment, these complexities rarely outweigh those introduced by mimicking partitioning with several smaller, identical table spaces. To clarify why this idea is usually not a good approach, consider these two different ways of implementing a three-"partition" solution:

The first, generally recommended way is to create the table in a single partitioned table space with three partitions. For example:

 

 CREATE DATABASE DB_SAMP;

 CREATE TABLESPACE TS_SAMPP
     IN DB_SAMP
     ERASE NO
     NUMPARTS 3
        (PART 1
         USING STOGROUP SG_SAMP1
         PRIQTY 2000 SECQTY 50
         COMPRESS NO,
         PART 2
         USING STOGROUP SG_SAMP2
         PRIQTY 4000 SECQTY 150
         COMPRESS YES,
         PART 3
         USING STOGROUP SG_SAMP3
         PRIQTY 1000 SECQTY 50
         COMPRESS YES)
     LOCKSIZE PAGE
     BUFFERPOOL BP1
     CLOSE NO;

 CREATE TABLE TB_SAMP . . . IN DB_SAMP.TS_SAMPP;

The second, alternative approach is to create three table spaces, each with its own table, as follows:

 

 CREATE DATABASE DB_SAMP2;

 CREATE TABLESPACE TS_SAMP1 IN DB_SAMP2
     USING STOGROUP SG_SAMP1
     PRIQTY 2000 SECQTY 50
     ERASE NO COMPRESS NO
     LOCKSIZE PAGE BUFFERPOOL BP1 CLOSE NO;

 CREATE TABLESPACE TS_SAMP2 IN DB_SAMP2
     USING STOGROUP SG_SAMP2
     PRIQTY 4000 SECQTY 150
     ERASE NO COMPRESS YES
     LOCKSIZE PAGE BUFFERPOOL BP1 CLOSE NO;

 CREATE TABLESPACE TS_SAMP3 IN DB_SAMP2
     USING STOGROUP SG_SAMP3
     PRIQTY 1000 SECQTY 50
     ERASE NO COMPRESS YES
     LOCKSIZE PAGE BUFFERPOOL BP1 CLOSE NO;

 CREATE TABLE TB_SAMP1 . . . IN DB_SAMP2.TS_SAMP1;
 CREATE TABLE TB_SAMP2 . . . IN DB_SAMP2.TS_SAMP2;
 CREATE TABLE TB_SAMP3 . . . IN DB_SAMP2.TS_SAMP3;

Now consider how difficult it would be to retrieve data in the second implementation if you did not know which "partition" (table) the data resides in, or if the data could reside in multiple partitions. Using the first example, a simple SELECT will work:

 

 SELECT  *
 FROM    TB_SAMP
 WHERE   COL1 = :HOST-VARIABLE;

In the second example, a UNION is required:

 

 SELECT  *
 FROM    TB_SAMP1
 WHERE   COL1 = :HOST-VARIABLE
 UNION ALL
 SELECT  *
 FROM    TB_SAMP2
 WHERE   COL1 = :HOST-VARIABLE
 UNION ALL
 SELECT  *
 FROM    TB_SAMP3
 WHERE   COL1 = :HOST-VARIABLE;

If other tables need to be joined, the "solution" becomes even more complex. Likewise, if data must be updated, inserted, or deleted and you do not know which "partition" contains the affected data, it is difficult to code an efficient method to change the data.


As of DB2 V7, it is possible to implement UNION in views, so the multi-table approach becomes a little easier. A view could be created so that all users and programs could access the view as if it were a table. The view would look something like this:


 

 CREATE VIEW ALL_DATA AS
     SELECT * FROM TB_SAMP1
     UNION ALL
     SELECT * FROM TB_SAMP2
     UNION ALL
     SELECT * FROM TB_SAMP3;

NOTE

Of course, the SELECT statements in the view should not use SELECT * but instead should list out all of the columns in the table. The example used the shorthand notation for simplicity.


Data integrity and modification pose an additional problem. Every UPDATE will need to "know" which table contains which ranges of values. Without this knowledge, valid data may be entered into the wrong table.

The bottom line is that you should avoid bypassing DB2 partitioning with your own pseudo-partitions unless the table is inordinately large and you are running DB2 V7 or earlier. The multi-table approach can be viable for very, very large tables to get around non-partitioning index problems prior to V8.


Prior to V8, only the partitioning index is partitioned like the underlying data. [*] Every other index is a single, non-partitioned index (NPI), also known as a non-partitioned secondary index (NPSI). With DB2 V8 you can create data partitioned secondary indexes to alleviate NPI problems. NPSIs are unwieldy and difficult to manage, administer, back up, recover, and reorganize. These administration issues are somewhat alleviated by the multi-table approach, but keep in mind that additional administration issues are created, such as the following:

[*] As of DB2 V8, a partitioned index can have a superset of the partitioning key columns. For example, a table space partitioned on columns C1 and C2 can have a partitioned index on columns C1, C2, and C3.


  • Assuring data integrity (as discussed above), including avoiding duplicate keys, ensuring key ranges are put in the proper table, and so on, is extremely difficult.

  • Managing multiple indexes can be an administrative burden. Multiple indexes are required because each table will generally have to have the same indexes defined on it. That is, if you index on ACCT_TYPE, you will need to build that index on each of the individual tables; with partitioning, it is built once on the partitioned table.

  • Creating and maintaining additional backup and recovery jobs for each of the table spaces is required when using the multi-table approach. Of course, you may have a similar number of backup jobs under the partitioned approach if you COPY by partition.

Partitioning Pros and Cons

Before deciding to partition a table space, weigh the pros and cons. Consult the following list of advantages and disadvantages before implementation:

Advantages of a partitioned table space are the following:

  • Each partition can be placed on a different DASD volume to increase access efficiency.

  • Partitioned table spaces can be used to store large amounts of data. They are the only type of table space that can hold more than 64GB of data (the maximum size of simple and segmented table spaces).

  • START and STOP commands can be issued at the partition level. By stopping only specific partitions, the remaining partitions remain available to be accessed, thereby promoting higher availability.

  • Free space (PCTFREE and FREEPAGE) can be specified at the partition level, enabling the DBA to isolate data "hot spots" to a specific partition and tune accordingly.

  • Query I/O, CPU, and Sysplex parallelism enable multiple engines to access different partitions in parallel, usually resulting in reduced elapsed time. DB2 can access non-partitioned table spaces in parallel, too, but partitioning can optimize parallelism by removing disk contention.

  • Table space scans on partitioned table spaces can skip partitions that are excluded based on the query predicates. Skipping entire partitions can improve overall query performance for table space scans.

  • By mixing clustering and partitioning you can design to decrease data contention. For example, if the table space will be partitioned by DEPTNO , each department (or range of compatible departments) could be placed in separate partitions. Each department is in a discrete physical data set, thereby reducing inter-departmental contention due to multiple departments coexisting on the same data page.


    As of DB2 V8 you can further reduce contention by creating data partitioned secondary indexes (DPSIs). Prior to V8, some contention will remain for data in non-partitioned indexes. Defining an NPSI on a table in a partitioned table space causes you to lose some of the benefits of partition-level independence for utility operations because access to an NPSI is sequential.

  • DB2 creates a separate compression dictionary for each table space partition. Multiple dictionaries tend to cause better overall compression ratios. In addition, it is more likely that the partition-level compression dictionaries can be rebuilt more frequently than non-partitioned dictionaries. Frequent rebuilding of the compression dictionary can lead to a better overall compression ratio.

  • The REORG, COPY, and RECOVER utilities can execute on table spaces at the partition level. If these utilities are set to execute on partitions instead of on the entire table space, valuable time can be saved by processing only the partitions that need to be reorganized, copied, or recovered. Partition independence and resource serialization further increase the availability of partitions during utility processing.

Disadvantages of a partitioned table space are as follows:

  • Only one table can be defined in a partitioned table space. This is not really a disadvantage, merely a limitation.

  • Prior to DB2 V8, updating the partitioning columns can be problematic. Although it is possible to UPDATE the columns of the partitioning index, it is not very efficient. First, the PARTKEYU DSNZPARM parameter must be set to enable partitioning key modification. If this parameter is set to NO, updates are not permitted; instead, you must delete the row and then reinsert it with the new values in order to change a value in a column of the partitioning key. If updates are allowed, the actual UPDATE will in all likelihood run quite slowly. (As of V8, a partitioning index is no longer required.)

  • The range of key values for which data will be inserted into the table should be known and stable before you create the partitioning index. To define a partition, a range of values must be hard coded either into the partitioning index definition or the table definition. These ranges should distribute the data throughout the partitions according to the access needs of the applications using the data. If you provide a stop-gap partition to catch all the values lower (or higher) than the defined range, monitor that partition to ensure that it does not grow dramatically or cause performance problems if it is smaller or larger than most other partitions.

CAUTION

For table spaces created with a large DSSIZE (or with the LARGE parameter), the values specified after the VALUES clause are strictly enforced. The highest value specified is the highest value that can be placed in the table. Any values greater than the value specified for the last partition are out of range and cannot be inserted.


NOTE

As of DB2 V6, you can change partition key ranges using ALTER INDEX without having to drop and redefine the partitioned table space and index. This capability greatly increases data availability when partition key ranges need to be changed.

Of course, as of DB2 V8, partitioning becomes more flexible and modifying partitioning details becomes much easier with online schema changes.


Updating Partitioning Keys

Your organization must decide whether or not to allow updates to partitioning keys. This is controlled using the PARTKEYU DSNZPARM (which is set on the DSNTIP4 installation panel).

There are three valid settings for PARTKEYU :

  • YES: The partitioning key columns may be updated. This is the default.

  • NO: The partitioning key columns are not permitted to be updated.

  • SAME: The partitioning key columns can be updated, but only if the UPDATE results in the row staying in its current partition.
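
For example, with PARTKEYU set to SAME, an update such as the following hypothetical statement (the table and columns are placeholders) succeeds only if the new key value still maps to the row's current partition; with NO, it is rejected outright:

```sql
UPDATE TB_SALES
SET    SALE_DATE = '2005-06-30'   -- SALE_DATE is the partitioning key
WHERE  CUST_ID = 12345;
```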

Partitioning and Data Set Sizes

For partitioned table spaces not specified as LARGE (or without the DSSIZE parameter), the number of partitions affects the maximum size of the data set partition as follows:

Number of Partitions   Maximum Data Set Size
1 to 16                4GB
17 to 32               2GB
33 to 64               1GB


Prior to V7, table spaces that are defined with the LARGE parameter can have a maximum data set size of 4GB for 1 to 256 partitions.


For V7, table spaces that are defined with a DSSIZE of 4GB or greater can have a maximum data set size of 64GB; partitions can range from 1 to 256.


For V8, table spaces that are defined with a DSSIZE of 4GB or greater can have a maximum data set size of 64GB; partitions can range from 1 to 4,096.


The preceding discussion of table space size is somewhat unclear, so let's clarify it. The maximum number of partitions a table space can have depends on the DSSIZE, the page size, and the total table space size. Page size affects table space size because it affects the number of partitions allowed. Table 5.2 consolidates this information accurately as of DB2 V8.

Table 5.2. Table Space Size

Max Part   Page Size   DSSIZE   Max TS Size
4096       4KB         4GB      16TB
256        4KB         64GB     16TB


CAUTION

Use caution when creating very large table spaces with LOBs. You can specify only up to 5 LOBs per table if 4,096 partitions are to be supported. This is because each LOB column requires one LOB table space (plus an auxiliary table and an auxiliary index) per partition. So one LOB column on a table with 4,096 partitions would require 12,288 objects (3 x 4,096). The maximum number of objects per DB2 database is 65,535, hence the 5-LOB limit.


Table-Controlled Partitioning

Consider favoring table-controlled partitioning over index-controlled partitioning. This option is available as of DB2 V8. By specifying the partitioning limit keys in the table DDL, DB2 will not have to rely on a partitioning index to keep the correct data in the proper partition.

The ability to ALTER a table-controlled partitioned scheme is made much simpler with the advent of online schema evolution. Online schema evolution is addressed in detail in Chapter 7.

In order to determine whether a given partitioned table space is index-controlled or table-controlled you will have to query the DB2 Catalog. The PARTKEYCOLNUM column in SYSIBM.SYSTABLES will be set to zero if it is index-controlled, or to a non-zero value if it is table-controlled.
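
Such a check might be coded as follows (assuming the catalog column name PARTKEYCOLNUM; the table and creator values are placeholders):

```sql
SELECT NAME, CREATOR, PARTKEYCOLNUM
FROM   SYSIBM.SYSTABLES
WHERE  NAME    = 'TB_SAMP'
AND    CREATOR = 'DSN8810';
-- zero     : index-controlled partitioning
-- non-zero : table-controlled partitioning
```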

Index-controlled partitioning will be converted to table-controlled partitioning whenever a new DB2 V8 table-controlled partitioning feature is exploited. This includes

  • Creating a data partitioned secondary index (DPSI)

  • Creating a partitioning index without the CLUSTER keyword or altering the existing partitioning index to specify CLUSTER NO

  • Dropping the partitioning index on an index-controlled partitioning table space

  • Altering the table in a partitioned table space to add a partition, rotate a partition, or modify a partition parameter
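
The first trigger in the list, creating a DPSI, might look like this sketch (the index, table, and column names are hypothetical); the PARTITIONED keyword is what makes the secondary index data partitioned:

```sql
CREATE INDEX XSALES_CUST
    ON TB_SALES (CUST_ID)   -- CUST_ID is not the partitioning key
    PARTITIONED;
```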

LOB Table Spaces

LOB table spaces are to be used only in conjunction with LOB columns. One LOB table space is required per LOB column in a table. If the table space containing the LOB column is partitioned, one LOB table space per partition per column is required. The LOB table space is used to store the large object data.
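
A minimal sketch of the objects involved for one LOB column follows; all names are hypothetical assumptions:

```sql
-- LOB table space to hold the large object data
CREATE LOB TABLESPACE TS_LOB1
    IN DB_SAMP
    LOCKSIZE LOB
    BUFFERPOOL BP32K;

-- Auxiliary table ties the base table's LOB column to the LOB table space
CREATE AUX TABLE TB_PHOTO_AUX
    IN DB_SAMP.TS_LOB1
    STORES TB_EMP COLUMN EMP_PHOTO;

-- An index on the auxiliary table is required
CREATE UNIQUE INDEX XPHOTO_AUX
    ON TB_PHOTO_AUX;
```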

Comprehensive coverage of LOB table spaces is provided in Chapter 9, "Large Objects and Object/Relational Databases."

Table Space Parameters

Many parameters must be considered when creating a table space. Each of these parameters is discussed in this section.

LARGE

The LARGE parameter is available for partitioned table spaces only. When LARGE is specified, more than 64GB of data can be stored in the table space. A large table space can have up to 254 partitions, each containing up to 4GB (or, if EA-enabled, up to 64GB). Refer to Table 5.3 for definitions of storage abbreviations such as GB and TB.

Table 5.3. Storage Abbreviations

Abbreviation   Term        Amount
KB             Kilobyte    1,024 bytes
MB             Megabyte    1,024 KB
GB             Gigabyte    1,024 MB
TB             Terabyte    1,024 GB
PB             Petabyte    1,024 TB
EB             Exabyte     1,024 PB
ZB             Zettabyte   1,024 EB
YB             Yottabyte   1,024 ZB


When LARGE (or DSSIZE) is not specified, the maximum storage amount is limited to 64GB and the maximum number of partitions to 64.

CAUTION

If the NUMPARTS parameter is defined to be greater than 64, the table space will automatically be defined as a large table space even if the LARGE parameter is omitted.


Create LARGE Table Spaces Sparingly

Although it may be tempting to define every table space as LARGE , space considerations and resource requirements need to be taken into account. RIDs in a large table space are 5 bytes instead of 4 bytes. As such, index space usage will increase. Additionally, large table spaces can use more data sets and increase resource consumption of utility processing. Therefore, a large table space should be used only under the following conditions:

  • When more than 16 partitions are required and more than 1GB must be stored per partition; or

  • More than 64 partitions are required; or

  • More than 64GB of data must be stored in a single table space

CAUTION

Use the DSSIZE clause instead of LARGE to specify a maximum partition size of 4GB and larger. The LARGE clause is retained for compatibility with releases of DB2 prior to Version 6.


DSSIZE

The DSSIZE parameter is used to specify the maximum size for each partition or, for LOB table spaces, each data set. If you specify DSSIZE , you must also specify NUMPARTS or LOB . Remember that to specify a value greater than 4GB, the table space must be EA-enabled.

One way of determining whether a data set is EA-enabled is to view it using ISPF option 3.4. The DSORG column will show VS-E for a VSAM EA-enabled data set.

The same cautions regarding the use of LARGE should be adhered to regarding specifying a DSSIZE greater than 4GB.
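For example, a large partitioned table space might be defined with DSSIZE rather than LARGE along these lines (the database, table space, and storage group names here are hypothetical):

```sql
-- Hypothetical object names; DSSIZE greater than 4 G requires the
-- underlying data sets to be EA-enabled.
CREATE TABLESPACE XTS01
    IN XDB01
    USING STOGROUP XSG01
        PRIQTY 720
        SECQTY 720
    NUMPARTS 128
    DSSIZE 8 G
    LOCKSIZE ANY;
```

Because NUMPARTS exceeds 64 and DSSIZE exceeds 4GB, this table space must be EA-enabled.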

LOCKSIZE

The LOCKSIZE parameter indicates the type of locking DB2 performs for the given table space. The choices are

  • ROW : Row-level locking

  • PAGE : Page-level locking

  • TABLE : Table-level locking (for segmented table spaces only)

  • TABLESPACE : Table space-level locking

  • LOB : LOB locking; valid only for LOB table spaces

  • ANY : Lets DB2 decide, starting with PAGE


In general, it is fine to let DB2 handle the level of locking required. DB2 will usually use LOCKSIZE PAGE and LOCKMAX SYSTEM unless it is a LOB table space, in which case DB2 will usually choose LOCKSIZE LOB and LOCKMAX SYSTEM . When the number of locks acquired for the table space exceeds the maximum number of locks allowed for a table space, locking escalates to the next higher level. If the table space is segmented, the next higher level is the table. If the table space is nonsegmented, the next higher level is the table space. Any page or LOB locks held are not released when lock escalation occurs.

A good general locking strategy would be to implement LOCKSIZE ANY , except in the following circumstances:

  • A read-only table defined in a single table space should be specified as LOCKSIZE TABLESPACE . There rarely is a reason to update the table, so page locks should be avoided.

  • A table that does not require shared access should be placed in a single table space specified as LOCKSIZE TABLESPACE . Shared access refers to multiple users (or jobs) accessing the table simultaneously.

  • A grouping of tables in a segmented table space used by a single user (for example, a QMF user) should be specified as LOCKSIZE TABLE . If only one user can access the tables, there is no reason to take page-level locks.

  • Specify LOCKSIZE PAGE for production systems that cannot tolerate a lock escalation, but for which row locking would be overkill. When many accesses are made consistently to the same data, you must maximize concurrency. If lock escalation can occur (that is, a change from page locks to table space locks), concurrency is eliminated. If a particular production system always must support concurrent access, use LOCKSIZE PAGE and set the LOCKMAX parameter for the table space to 0.

  • For LOB table spaces, always specify LOCKSIZE LOB .

  • Consider specifying LOCKSIZE ROW only when concurrency is of paramount importance. When multiple updates must occur to the same page at absolutely the same time, LOCKSIZE ROW might prove to be beneficial. But row locking can cause performance problems, because a row lock requires about the same amount of resources as a page lock. And, because there are usually multiple rows on a page, row locking will typically consume more resources. Do not implement LOCKSIZE ROW , though, unless you are experiencing a locking problem with page locking. Often, at design time, developers believe multiple transactions will be updating the same page simultaneously, but it is not very commonplace in practice. An alternative to LOCKSIZE ROW is LOCKSIZE PAGE with MAXROWS 1 , which will achieve the same purpose by forcing one row per page.

Consider using LOCKSIZE ANY in situations other than those just outlined because it allows DB2 to determine the optimal locking strategy based on actual access patterns. Locking begins with PAGE locks and escalates to TABLE or TABLESPACE locks when too many page locks are being held. The LOCKMAX parameter controls the number of locks that can be taken before escalation occurs. LOCKSIZE ANY generally provides an efficient locking pattern because it allows the DBMS to actively monitor and manage the locking strategy.

Use LOCKSIZE ROW with Caution

The resources required to acquire, maintain, and release a lock at the row level are about the same as those required for locking at the page level. When row locking is used and a table or table space scan is required, DB2 will lock every row on every page accessed. The number of locks required to successfully accomplish a scan can have a detrimental impact on performance. If a table has 100 rows per page, a table space scan could possibly require nearly 100 times as many resources for row locks as it would for page locks.

Switch Locking Strategies Based on Processing

Some tables have different access patterns based upon the time of day. For example, many applications are predominantly OLTP during work hours and predominantly batch during off hours. OLTP is usually characterized by short, indexed access to tables. Batch processing typically requires more intensive data access and table scans.

To take advantage of these situations, use the ALTER TABLESPACE statement to change the LOCKSIZE parameter to ROW for daylight processing. Before the nightly batch jobs and after online processing diminishes, alter the LOCKSIZE parameter back to ANY or PAGE .

By changing the locking strategy to conform to the type of processing, contention can be reduced thereby enhancing application performance.

Of course, in order to change each program's locking strategy you will need to rebind your static plans and packages after altering LOCKSIZE . For this approach to be successful you should consider assigning different plans and packages to online and batch, if you do not already do so.
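A sketch of such a switch follows (the table space and collection names are illustrative; note that REBIND is issued through the DSN command processor, not as SQL):

```sql
-- Morning: favor concurrency for OLTP
ALTER TABLESPACE XDB01.XTS01 LOCKSIZE ROW;

-- Evening, before the batch window: back to page-level locking
ALTER TABLESPACE XDB01.XTS01 LOCKSIZE PAGE;

-- After each ALTER, rebind the affected static plans and packages,
-- for example (via the DSN command processor):
--   REBIND PACKAGE(XBATCHCOLL.*)
```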

LOCKMAX

The LOCKMAX parameter specifies the maximum number of page or row locks that any one process can hold at any one time for the table space. When the threshold is reached, the page or row locks are escalated to a table or table space lock.

Three options are available for setting the LOCKMAX parameter:

  • The literal SYSTEM can be specified, indicating that LOCKMAX should default to the systemwide value as specified in DSNZPARM s.

  • The value 0 can be specified, indicating that lock escalation should never occur for this table space.

  • An integer value ranging from 1 to 2,147,483,647 can be specified, indicating the actual number of row or page locks to tolerate before lock escalation.

Use Caution Before Disabling Lock Escalation

Specify LOCKMAX 0 only when you are absolutely sure of the impact it will have on your processing mix. A very high value for LOCKMAX can have a similar effect to LOCKMAX 0 , with the added benefit of an escape if the number of locks becomes intolerable. Large batch jobs running against a table space specified as LOCKMAX 0 can severely constrain concurrent access if a large number of locks are held without an intelligent commit strategy. When volumes fluctuate (for example, monthly processing cycles), lock patterns can deviate from the norm, potentially causing concurrency problems.
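The three LOCKMAX settings look like this in DDL (the table space name is illustrative):

```sql
ALTER TABLESPACE XDB01.XTS01 LOCKMAX SYSTEM;  -- defer to the DSNZPARM value
ALTER TABLESPACE XDB01.XTS01 LOCKMAX 10000;   -- escalate after 10,000 page/row locks
ALTER TABLESPACE XDB01.XTS01 LOCKMAX 0;       -- never escalate (use with caution)
```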

USING

The method of storage allocation for the table space is defined with the USING parameter. You can specify either a STOGROUP name combined with a primary and secondary quantity for space allocation or a VCAT indicating the high-level ICF catalog identifier for user-defined VSAM data sets.

In most cases, you should create the majority of your table spaces and indexes as STOGROUP -defined. This allows DB2 to do most of the work of creating and maintaining the underlying VSAM data sets, which contain the actual data. Another approach that reduces maintenance even more is to let SMS manage your page set allocations.

Table spaces and indexes defined using STOGROUP s provide the additional advantage of automatic data set creation as new data sets are needed. This is more beneficial than simply having DB2 create the initial data sets when the objects are defined. When a table space exceeds the maximum VSAM data set size, DB2 will automatically create additional data sets as needed to store the additional data. If you were using user-defined VSAM data sets instead, you would have to manually add new data sets when new VSAM data sets were needed. It is very difficult to predict when new data sets are needed, and even if you can predict this need, it is difficult to manage and create the data sets when they are needed.

Some DBAs believe that explicitly creating user-defined VSAM data sets for VCAT-defined table spaces gives them more control over the physical allocation, placement, and movement of the VSAM data sets. Similar allocation, placement, and movement techniques, however, can be achieved using STOGROUP s if the STOGROUP s are properly created and maintained and the table spaces are assigned to the STOGROUP s in a planned and orderly manner.
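The two USING styles might be coded as follows (all object names are hypothetical; a VCAT-defined table space requires the DBA to define the underlying VSAM cluster with IDCAMS first):

```sql
-- STOGROUP-defined: DB2 creates, extends, and deletes the VSAM data sets
CREATE STOGROUP XSG01
    VOLUMES ('*')
    VCAT DSNCAT;

CREATE TABLESPACE XTS02 IN XDB01
    USING STOGROUP XSG01
        PRIQTY 720
        SECQTY 720;

-- VCAT-defined: user-managed VSAM data sets under the DSNCAT
-- high-level qualifier
CREATE TABLESPACE XTS03 IN XDB01
    USING VCAT DSNCAT;
```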

Another perceived advantage of user-defined VSAM data sets is the capability of recovering them if they inadvertently are dropped. The underlying, user-defined VSAM data sets for VCAT-defined objects are not deleted automatically when the corresponding object is dropped. You can recover the data for the table space using the DSN1COPY utility with the translate option. When you intentionally drop table spaces, however, additional work is required to manually delete the data sets.

There is one large exception to this scenario: If a segmented table space is dropped erroneously, the data cannot be recovered regardless of whether it was VCAT- or STOGROUP -defined. When a table is dropped from a segmented table space, DB2 updates the space map for the table space to indicate that the data previously in the table has been deleted, and the corresponding space is available immediately for use by other tables. When a table space is dropped, DB2 implicitly drops all tables in that table space.

A DBA can attempt to recover from an inadvertent drop of a segmented table space, and will appear to be successful with one glaring problem: DB2 will indicate that there is no data in the table space after the recovery. As you can see, the so-called advantage of easy DSN1COPY recovery of dropped tables disappears for user-defined VSAM data sets when you use segmented table spaces. This is crucial because more users are using segmented table spaces instead of simple table spaces to take advantage of their enhanced features.

Another perceived advantage of user-defined VSAM data sets was avoiding deleting and redefining the underlying data sets during utility processing. With STOGROUP -defined data sets, certain utilities, such as REORG , will delete and define the underlying data sets as part of the REORG process. As of DB2 V6, the REUSE option can be specified indicating that STOGROUP -defined data sets should be reused instead of being deleted and redefined. The utilities impacted are LOAD , REBUILD , RECOVER , and REORG .

See Table 5.4 for a comparison of VCAT- and STOGROUP -defined data sets.

Table 5.4. User-Defined VSAM Data Sets Versus STOGROUP s

|  | VCAT | STOGROUP |
| --- | --- | --- |
| Need to know VSAM | Yes | No |
| User physically must create the underlying data sets | Yes | No |
| Can ALTER storage requirements using SQL | No | Yes |
| Can use AMS | Yes | No [*] |
| Confusing when data sets are defined on more than one DASD volume | No | Yes |
| After dropping the table or the table space, the underlying data set is not deleted | Yes | No [**] |

[*] A table space initially created as a user-defined VSAM data set later can be altered to use STOGROUP s. A STOGROUP -defined table space can be altered to user-defined VSAM as well.

[**] Data in a segmented table space is unavailable after dropping the table space because the space map pages are modified to indicate that the table space is empty after a DROP .

NOTE

If you are using RAID storage devices do not try to explicitly place data sets. RAID storage devices will "mix" up the data anyway, so your placement efforts will be for naught.


PRIQTY and SECQTY

If you are defining your table spaces using the STOGROUP method, you must specify primary and secondary space allocations. The primary allocation is the amount of physical storage allocated when the table space is created. As the amount of data in the table space grows, secondary allocations of storage are taken. To accurately calculate the DASD space requirements, you must know the following:

  • Number of columns in each row

  • Data type for each column

  • Nullability of each column

  • Average size of variable columns

  • Number of rows in the table

  • Row overhead, such as RID size

  • Growth statistics

  • Growth horizon

  • Row compression statistics (if compression is used)

The values specified for PRIQTY and SECQTY are in kilobytes. Most DB2 pages are 4K in size, so you usually should specify PRIQTY and SECQTY in multiples of four. DB2 also supports page sizes of 8KB, 16KB, and 32KB. For table spaces with these page sizes, always specify the PRIQTY and SECQTY amounts in multiples of the page size: 8, 16, or 32, respectively.

Additionally, you should specify PRIQTY and SECQTY amounts in terms of the type of DASD defined to the STOGROUP being used. For example, a table space with 4KB pages defined on an IBM 3390 DASD device uses 48KB for each physical track of storage. This corresponds to 12 pages. A data set cannot be allocated at less than a track, so it is wise to specify the primary and secondary allocations to at least a track boundary. For an IBM 3390 DASD device, specify the primary and secondary quantities in multiples of 48. Here are the physical characteristics of the two most popular IBM DASD devices:

 

| Device | Track | Cylinder | Cylinders/Device | Bytes/Track |
| --- | --- | --- | --- | --- |
| 3380 | 40KB | 600KB | 885 | 47,476 |
| 3390 | 48KB | 720KB | 1113 | 56,664 |

(The Track and Cylinder columns show the usable capacity for 4KB pages; the Bytes/Track column shows the raw device capacity.)


For segmented table spaces, be sure to specify these quantities such that neither the primary nor the secondary allocation is less than a full segment. If you indicate a SEGSIZE of 12, for instance, do not specify less than four times the SEGSIZE , or 48K, for PRIQTY or SECQTY . It is worth noting that a table space with a SEGSIZE of 12 will require 13 total pages, and thus two tracks to store: 1 page for the space map and 12 pages for the first segment.
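The track arithmetic above can be sketched in a few lines (IBM 3390 geometry; the helper name round_to_track is illustrative, not a DB2 term):

```python
# Sketch of the 3390 allocation arithmetic described above.
# 48KB (twelve 4KB pages) of data fit per track; 15 tracks (720KB) per cylinder.
PAGE_KB = 4
TRACK_KB = 48
CYLINDER_KB = 720

def round_to_track(qty_kb):
    """Round a PRIQTY/SECQTY value (in KB) up to a full 3390 track boundary."""
    return -(-qty_kb // TRACK_KB) * TRACK_KB  # ceiling division

# A SEGSIZE 12 table space needs 13 pages up front (1 space map page plus
# one 12-page segment): 52KB, which is more than one 48KB track can hold.
first_allocation_kb = (12 + 1) * PAGE_KB
print(round_to_track(first_allocation_kb))  # 96 (two tracks)
```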

If you are allocating multiple tables to a single table space, calculate the PRIQTY and SECQTY separately for each table using the formulas in Table 5.5. When the calculations have been completed, add the totals for PRIQTY to get one large PRIQTY for the table space. Do the same for the SECQTY numbers . You might want to add approximately 10% to both PRIQTY and SECQTY when defining multiple tables to a simple table space. This additional space offsets the space wasted when rows of different lengths from different tables are combined on the same table space page. (See the section in this chapter called "Avoid Wasted Space" for more information.) Remember, however, that the practice of defining multiple tables to a single, simple table space is not encouraged.

Table 5.5. Lengths for DB2 Data Types

| Data Type | Internal Length | COBOL WORKING STORAGE |
| --- | --- | --- |
| CHAR(n) | n | 01 identifier PIC X(n) |
| VARCHAR(n) | max = n+2 | 01 identifier; 49 identifier PIC S9(4) COMP; 49 identifier PIC X(n) |
| LONG VARCHAR | [*] | 01 identifier; 49 identifier PIC S9(4) COMP; 49 identifier PIC X(n) |
| GRAPHIC(n) | 2 * n | 01 identifier PIC G(n) DISPLAY-1 |
| VARGRAPHIC(n) | (2 * n) + 2 | 01 identifier; 49 identifier PIC S9(4) COMP; 49 identifier PIC G(n) DISPLAY-1 |
| LONG VARGRAPHIC | [*] | 01 identifier; 49 identifier PIC S9(4) COMP; 49 identifier PIC G(n) DISPLAY-1 |
| SMALLINT | 2 | 01 identifier PIC S9(4) COMP |
| INTEGER | 4 | 01 identifier PIC S9(9) COMP |
| DECIMAL(p,s) | INTEGER(p/2) + 1 | 01 identifier PIC S9(p)V9(s) COMP-3 |
| FLOAT(n) or REAL | 4 (single precision; 1 <= n <= 21) | 01 identifier COMP-1 |
| FLOAT(n) or FLOAT | 8 (double precision; n > 21) | 01 identifier COMP-2 |
| DATE | 4 | 01 identifier PIC X(10) |
| TIME | 3 | 01 identifier PIC X(8) |
| TIMESTAMP | 10 | 01 identifier PIC X(26) |

[*] See text following this table to calculate this length.

To calculate the internal length of a long character column, use these formulas:

Modified row size = (max row size) - (size of all other cols) - (number of nullable long char cols)

Internal length = 2 * INTEGER(INTEGER((modified row size) / (number of long cols in table)) / 2)

Next, calculate the number of rows per page and the total number of pages necessary. To do this, use the following formula:

Rows per page = ((page size - 22) * ((100 - PCTFREE) / 100)) / (row length)

Total pages = (number of rows) / (rows per page)

Finally, the PRIQTY is calculated as follows:

PRIQTY = total pages * 4

To accurately calculate the primary quantity for a table, you must make a series of calculations.

First, calculate the row length. To do this, add the length of each column, using Table 5.5 to determine each column's internal stored length. Remember to add one byte for each nullable column and two bytes for each variable column.

If the rows are compressed, determine the average compressed row size and use this for the row length in the previous formulas.
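Putting the preceding formulas together, a rough space calculation might look like the following sketch (it assumes a 4KB page and an uncompressed row length that already includes null and variable-column overhead):

```python
import math

def rows_per_page(row_length, pctfree, page_size=4096):
    """Rows per page = ((page size - 22) * ((100 - PCTFREE) / 100)) / row length."""
    rows = int(((page_size - 22) * ((100 - pctfree) / 100)) // row_length)
    return min(rows, 255)  # DB2 stores at most 255 rows on a page

def priqty(row_length, number_of_rows, pctfree):
    """PRIQTY in KB: total pages * 4 (for 4KB pages)."""
    total_pages = math.ceil(number_of_rows / rows_per_page(row_length, pctfree))
    return total_pages * 4

# 122-byte rows and no free space: 4,074 usable bytes hold 33 rows per page
print(rows_per_page(122, 0))   # 33
print(priqty(122, 10000, 0))   # 1216
```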

To calculate SECQTY , you must estimate the growth statistics for the table space and the horizon over which this growth will occur.

For example, assume that you need to define the SECQTY for a table space that grows by 100 rows per month (growth statistics) over a year (growth horizon), for a total of 1,200 rows. If free space has been defined in the table space for 1,000 rows and you will reorganize this table space yearly (changing PRIQTY and SECQTY ), you must provide for the remaining 200 rows in your SECQTY .

Divide the number of rows you want to provide for (in this case 200) by the number of rows per page. Round this number up to the next whole number divisible by 4 (to the track or cylinder boundary). Then specify this number as your SECQTY .

You might want to provide for secondary allocation in smaller chunks , not specifying the total number of rows in the initial SECQTY allocation. In the preceding example, you provided for 200 rows. By defining SECQTY large enough for 100 rows, you allocate three secondary extents before your yearly reorganization.

You may ask: why three? If each SECQTY can contain 100 rows and you must provide for 200 rows, shouldn't only two extents be allocated? No, there will be three. A secondary allocation is made when the amount of available space in the current extent reaches 50% of the next extent to be taken. So there are three allocations, but the third one is empty, or nearly empty.

As a general rule, avoid a large number of secondary extents. They decrease the efficiency of I/O, and I/O is the most critical bottleneck in most DB2 application systems.

Consider using DB2 Estimator to calculate space requirements for DB2 table space and index data sets. DB2 Estimator is a standalone tool provided by IBM at no cost with DB2 for OS/390. DB2 Estimator can be used to estimate the cost of running DB2 applications. DB2 Estimator also provides a space calculation feature. To calculate space for a table, highlight the table and choose the Space Requirements option in the Tables menu, as shown in Figure 5.3. This will take you to the screen shown in Figure 5.4, which can be used to determine the space requirements for the selected table. This allows the DBA to save time by avoiding the manual space calculations we just covered.

Figure 5.3. DB2 Estimator and space requirements.
graphics/05fig03.jpg

Figure 5.4. Using DB2 Estimator to calculate space.
graphics/05fig04.jpg

Allocate Space on Cylinder Boundaries

Performance can be significantly affected based upon the choice of allocation unit. As an application inserts data into a table, DB2 will preformat space within the index and/or table space page set as necessary. This process will be more efficient if DB2 can preformat cylinders instead of tracks, because more space will be preformatted at once using cylinder allocation.

DB2 determines whether to use allocation units of tracks or cylinders based upon the value of PRIQTY and SECQTY . If either of these quantities is less than one cylinder, space for both primary and secondary will be allocated in tracks. For this reason, it is wise to specify both PRIQTY and SECQTY values of at least one cylinder for most table spaces and indexes.

Allocating space in tracks is a valid option, however, under any of the following conditions:

  • For small table spaces and indexes that consume less than one cylinder of DASD

  • For stable objects that are never updated, for which SECQTY can be set to 0, causing DB2 to consider only PRIQTY when determining the allocation unit

Default Values for PRIQTY and SECQTY

If the PRIQTY parameter is not specified, and the data set is STOGROUP -defined, a default primary quantity value will be chosen by DB2. DB2 will base both the primary and secondary space allocation on the value of the TSQTY DSNZPARM parameter. If TSQTY is 0, then DB2 will default the size as follows, based on the page size of the data set:

  • For 4KB page sizes: 12KB

  • For 8KB page sizes: 24KB

  • For 16KB page sizes: 48KB

  • For 32KB page sizes: 96KB

If the SECQTY parameter is not specified, but the PRIQTY parameter is specified, the default value for SECQTY is 10% of the PRIQTY value or 3 times the page size of the table space, whichever value is greater.

If both the SECQTY and PRIQTY parameters are not specified, the default value for SECQTY follows the same pattern as for PRIQTY as indicated earlier.

Once again, avoid relying on default values. They are rarely, if ever, the correct choice. And even if the default happens to be the best choice, it is always better to explicitly specify the value to ensure that you are choosing the correct option.

SECQTY 0 Versus No SECQTY Specified

Specifying SECQTY 0 means that no secondary extents will be taken. This is not the same as failing to specify the SECQTY option (which causes DB2 to use the default value). Be sure you understand the difference and only specify SECQTY 0 if you want to avoid extents. If you try to INSERT data and no room is found in the table space and the table space is defined with SECQTY 0 , the INSERT will fail. This is rarely desirable.

Free Space ( PCTFREE and FREEPAGE )

The specification of free space in a table space or index can reduce the frequency of reorganization, reduce contention, and increase the efficiency of insertion. The PCTFREE parameter specifies what percentage of each page should remain available for future inserts. The FREEPAGE parameter indicates how often a completely empty page should be left: after every FREEPAGE pages, one page is left entirely empty.

CAUTION

Keep in mind that PCTFREE and FREEPAGE represent the accurate free space only immediately after the object is created or reorganized. As soon as data starts to be inserted or updated, the space set aside using these parameters will start to be used by DB2.


Increasing free space decreases the number of rows per page and therefore decreases the efficiency of the buffer pool because fewer rows are retrieved per I/O. Increasing free space can improve concurrent processing, however, by reducing the number of rows on the same page. For example, consider a table space that contains a table clustered on the DEPARTMENT column. Each department must access and modify its data independent of other departments. By increasing free space, you decrease the occurrences of departments coexisting on table space pages because fewer rows exist per page.

Free space can be used to keep areas of the table space available for the rows to be inserted. This results in a more efficient insert process, as well as more efficient access (with less unclustered data) after the rows have been inserted.

Understanding how insert activity affects DB2 data pages will aid in understanding how optimal free space specification can aid performance. When a row is inserted, DB2 will perform a space search algorithm to determine the optimal placement of the new row in the table space. This algorithm is different for segmented and non-segmented (simple and partitioned) table spaces. For segmented table spaces DB2 will

  • Identify the page in which the row should be inserted using the clustering index. (If no clustering index exists, DB2 will search all segments for available space to insert the row.)

  • If space is available on that page, the row will be inserted; if space is not available, DB2 will search within the segment containing the target page for available space.

  • If space is available in the segment, the row will be inserted; if space is not available, DB2 will search the last segment allocated in the table space for that specific table.

  • If space is available, insert the row; otherwise DB2 will allocate a new segment.

For non-segmented table space DB2 searches for space as follows:

  • Identify the page in which the row should be inserted using the clustering index.

  • If space is available on that page, the row will be inserted; if space is not available, DB2 will search 16 contiguous pages before and after the target page.

  • If space is available on any of those 32 pages, the row will be inserted; if space is not available, DB2 will scan from the beginning of the table space (or partition).

  • If space is available, insert the row; otherwise DB2 will request a secondary extent.

For both segmented and non-segmented table spaces, DB2 will bypass locked pages even if they contain sufficient free space to hold the row to be inserted.
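The segmented search order above can be modeled as a toy function (illustrative only; real DB2 consults space map pages and, as noted, bypasses locked pages):

```python
def find_insert_page(pages_free, target, segsize):
    """Toy model of the segmented-table-space insert search order.
    pages_free: free bytes per page; target: page chosen via the clustering
    index; segsize: pages per segment. Returns the chosen page number, or
    None, meaning DB2 must allocate a new segment."""
    need = 1
    # 1. The target page itself
    if pages_free[target] >= need:
        return target
    # 2. The rest of the segment containing the target page
    seg_start = (target // segsize) * segsize
    for p in range(seg_start, min(seg_start + segsize, len(pages_free))):
        if pages_free[p] >= need:
            return p
    # 3. The last segment allocated
    last_start = ((len(pages_free) - 1) // segsize) * segsize
    for p in range(last_start, len(pages_free)):
        if pages_free[p] >= need:
            return p
    # 4. No space found in any searched location: new segment required
    return None

print(find_insert_page([0, 5, 0, 0, 0, 0, 0, 0], 0, 4))  # 1 (same segment)
```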

If insert activity is skewed, with inserts clustered at certain locations in the table space, you might want to increase the free space to offset the space used for the heavily updated portions of the table spaces. This increases the overall DASD usage but can provide better performance by decreasing the amount of unclustered data. Additionally, you could partition the table space such that the data area having the highest insert activity is isolated in its own partition. Free space could then be assigned by partition such that the insert "hot spot" has a higher PCTFREE and/or FREEPAGE specified. The other partitions could be assigned a lower free space.

If more than one table is assigned to a table space, calculate the free space for the table with the highest insert activity. This provides for more free space for tables with lower insert activity, but results in the best performance. Also, if the rows are compressed, calculate free space based on the average compressed row size.

When calculating free space, you must take into account that a certain amount of each page is wasted. DB2 uses 4K page sizes (of which 4,074 bytes are useable for data), and a maximum of 255 rows can be placed on one page. Consider a table space containing a single table with 122-byte rows. A single page can contain 33 rows. This leaves 48 bytes wasted per page, as follows:

4074 / 122 = 33.39

4074 - (122 * 33) = 48

Suppose that you want 10% free space in this table space. To specify that 10% of each page will be free space, you must factor the wasted space into the calculation. By specifying PCTFREE 10, 407 bytes are set aside as free space. However, 48 of those bytes can never be used, leaving 359 bytes free. Only two rows can fit in this space, whereas three would fit into 407 bytes. Factor the wasted space into your free-space calculations.
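The arithmetic from this example can be checked in a few lines (4,074 usable data bytes per 4KB page, as above):

```python
USABLE = 4074  # usable data bytes on a 4KB page

def free_space_arithmetic(row_length, pctfree):
    """Return (rows per full page, wasted bytes, rows that fit in PCTFREE space)."""
    rows_full = USABLE // row_length            # rows on a page with no free space
    wasted = USABLE - rows_full * row_length    # bytes no row can ever occupy
    reserved = USABLE * pctfree // 100          # bytes set aside by PCTFREE
    rows_in_free = max(reserved - wasted, 0) // row_length
    return rows_full, wasted, rows_in_free

# 122-byte rows with PCTFREE 10: 407 bytes reserved, 48 of them wasted,
# so only 2 rows (not 3) fit in the remaining 359 free bytes.
print(free_space_arithmetic(122, 10))  # (33, 48, 2)
```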

As a general rule, free space allocation depends on knowing the growth rate for the table, the frequency and impact of reorganization, and the concurrency needs of the application. Remember, PCTFREE is not the same as growth rate. Consider a table space that is allocated with a primary quantity of 7200K. If PCTFREE was set to 10, 720K is left free, with 6480K remaining for data storage. However, this provides a growth rate of 720/6480, or just over 11%, which is clearly a larger number than the PCTFREE specified. The general formula for converting growth rate to PCTFREE is:

PCTFREE = (growth rate) / (1 + growth rate)

To accommodate a 15% growth rate, only 13% (.15/1.15) of free space is necessary.
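That conversion, and its inverse, are easy to verify (a sketch, with the growth rate expressed as a fraction):

```python
def pctfree_for_growth(growth_rate):
    """PCTFREE (as a percentage) needed to absorb a given growth rate."""
    return 100 * growth_rate / (1 + growth_rate)

def growth_absorbed_by(pctfree):
    """Inverse: the growth rate a given PCTFREE setting actually provides."""
    return pctfree / (100 - pctfree)

print(round(pctfree_for_growth(0.15)))   # 13, matching the 15% example
print(round(growth_absorbed_by(10), 3))  # 0.111 -- PCTFREE 10 covers ~11% growth
```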

The other free space parameter is FREEPAGE . Specifying PCTFREE is sufficient for the free space needs of most table spaces. If the table space is heavily updated, however, consider specifying FREEPAGE in conjunction with PCTFREE . See Table 5.6 for free space suggestions based on update frequency. Modify these numbers to include wasted space, as described previously. These numbers should be used as rough guidelines only. You should always consider the growth rate for data used in your applications when specifying DB2 free space.

Table 5.6. Free Space Allocation Chart

| Type of Table Processing | FREEPAGE | PCTFREE |
| --- | --- | --- |
| Read only | 0 | 0 |
| Less than 20% of table volume inserted between REORG s | 0 | 10 to 20 |
| 20 to 60% of table volume inserted between REORG s | 0 | 20 to 30 |
| Greater than 60% of table volume inserted between REORG s | 0 or ( SEGSIZE - 1) | 20 to 30 |
| Most inserts done in sequence by the clustering index | 0 | 0 to 10 |
| Table space with variable-length rows being updated | 0 | 10 to 20 |


BUFFERPOOL

DB2 provides eighty buffer pool options for table space and index objects:

  • 50 4KB buffer pools: BP0 through BP49

  • 10 8KB buffer pools: BP8K0 through BP8K9

  • 10 16KB buffer pools: BP16K0 through BP16K9

  • 10 32KB buffer pools: BP32K through BP32K9

Data accessed from a DB2 table is first read from DASD, and then moved into a buffer pool, and then returned to the requester. Data in the buffer pool can remain resident in memory, avoiding the expense of I/O for future queries that access the same data. There are many strategies for specifying buffer pools, and each is discussed fully in Part V, "DB2 Performance Tuning." For now, it's sufficient to mention the following rules:

  • Some small to medium DB2 shops use a single buffer pool, namely BP0 . For these types of shops, DB2 does an adequate job of managing I/O using a single, large BP0 containing most (or all) of a shop's table spaces and indexes.

  • As usage of DB2 grows, you should specify additional buffer pools tuned for specific applications, table spaces, indexes, or activities. The majority of mature DB2 shops fall into this category. Several buffer pool allocation and usage approaches are discussed in Part V.

  • Avoid using BP32K for application table spaces. DB2 arranges a table space assigned to a 32K buffer pool as eight single 4K pages per 32K page. Therefore, every logical I/O to a 32K table space requires eight physical I/Os. To avoid using BP32K , consider denormalizing your tables, if necessary. (See the "Denormalization" section later in this chapter for more information.) With the addition of 8KB and 16KB buffer pools in DB2 V6, it is easier to manage DB2 table spaces having a page size greater than 4KB.

The number of buffer pools in use at your shop depends on the DB2 workload and the amount of real and extended memory that can be assigned to the DB2 buffer pools. These topics are covered in greater detail in Part V.

Always Specify a Buffer Pool

If you do not specify the BUFFERPOOL clause, the default buffer pools for the database are used (one for table spaces and one for indexes). Do not allow the BUFFERPOOL to default to the buffer pool of the database. It is better to explicitly specify the BUFFERPOOL clause on all table spaces and index CREATE statements.
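A sketch of explicit buffer pool assignment follows (all object names are illustrative):

```sql
CREATE TABLESPACE XTS04 IN XDB01
    USING STOGROUP XSG01
        PRIQTY 720
        SECQTY 720
    BUFFERPOOL BP1;     -- explicit, rather than the database default

CREATE INDEX XIX04
    ON XTAB04 (COL1)
    USING STOGROUP XSG01
        PRIQTY 720
        SECQTY 720
    BUFFERPOOL BP2;
```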

BP32 and BP32K

Remember that BP32 and BP32K are two different buffer pools. BP32 is one of the fifty 4K buffer pools. BP32K is one of the ten 32K buffer pools. If you omit, or add, an erroneous "K," you may be using or allocating the wrong buffer pool.

TIP

Any buffer pool that contains a "K" in it is not a 4KB buffer pool; instead it is an 8KB, 16KB, or 32KB buffer pool. If the buffer pool does not contain a "K," it is a 4KB buffer pool.


CLOSE YES or NO

Prior to DB2 V2.3, the CLOSE option specified whether the underlying VSAM data sets for the table space (or index space) should be closed each time the table was used. CLOSE YES indicated that the underlying data set was to be closed after use; CLOSE NO indicated the opposite. A performance gain was usually realized when you specified CLOSE NO . For table spaces accessed infrequently (only once or twice daily), CLOSE YES might have been appropriate.

DB2 V2.3 introduced deferred close processing, sometimes referred to as slow close . Deferred close provided relief from the overhead associated with opening and closing data sets by closing the data sets only when the maximum number of open data sets was reached, regardless of whether CLOSE YES or CLOSE NO was specified. However, DB2 V2.3 will also update SYSLGRNX every time the data set is not in use. This speeds the recovery, because DB2 has a record of when updates could have occurred. However, the constant SYSLGRNX updating can be a performance detriment during normal processing. Also, deferred close is a mixed blessing, because DB2 V2.3 table spaces that need to be closed after each access will remain open regardless of the CLOSE parameter specified.

DB2 V3 introduced a new open/close scenario referred to as pseudo close . Pseudo close offers the following features:

  • A page set is not physically opened until it is first accessed, such as when an SQL statement or utility is executed against it.

  • The VSAM open-for-update timestamp is not modified until data in the page set is updated. Previously, it was modified when the page set was first opened. This timestamp can be used by some types of software to determine when an updated page set needs to be backed up. If an updated page set has not been modified for a specified number of DB2 checkpoints (DSNZPARM PCLOSEN) or a specified amount of time (DSNZPARM PCLOSET), it is switched to a read-only state.

  • Page sets specified as CLOSE NO are candidates for physical close when either the DDLIMIT or DSMAX limit has been reached.

  • SYSLGRNX records are updated for CLOSE YES data sets and are maintained by partition instead of at the data set level.

  • The performance problems associated with updating SYSLGRNX are eliminated; SYSLGRNX entries are written only when a data set (or partition) is converted to read-only state, not every time the data set goes unused.

TIP

Favor the use of CLOSE YES when operating with DB2 V3 and greater, because the SYSLGRNX modification performance problems have been eliminated.
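Following this tip, an existing table space could be switched to CLOSE YES with a simple ALTER. The qualified table space name below is hypothetical:

```sql
-- Switch an existing table space to CLOSE YES now that the
-- SYSLGRNX overhead is gone (hypothetical names)
ALTER TABLESPACE DBEXAMP.TSEXAMP
    CLOSE YES;
```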



The maximum number of data sets that could be open in MVS at one time was 10,000. For V8, the number of open data sets can be up to 32,000 for z/OS V1.4 or 100,000 for z/OS V1.5.


ERASE YES or NO

The ERASE option specifies whether the physical DASD where the table space data sets reside should be overwritten with binary zeroes when the table space is dropped. Table spaces containing sensitive data that should never be accessed without proper authority should be defined with ERASE YES. This ensures that the data is erased when the table space is dropped. Most table spaces, however, should be specified as ERASE NO.
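A table space holding sensitive data might therefore be defined as follows (a sketch; object names are hypothetical):

```sql
-- Sensitive data: overwrite the underlying data sets with
-- binary zeroes when the table space is dropped
CREATE TABLESPACE TSSECURE
    IN DBEXAMP
    SEGSIZE 32
    ERASE YES
    BUFFERPOOL BP1;
```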

NUMPARTS and SEGSIZE

See the "Use Proper Table Space Definitions" section earlier in this chapter for NUMPARTS and SEGSIZE recommendations. The NUMPARTS option is used only for partitioned table spaces; SEGSIZE is used only for segmented table spaces.

Compression

Data compression can be specified directly in a DB2 table space. Compression is indicated in the DDL by specifying COMPRESS YES for the table space; likewise, it can be turned off by specifying COMPRESS NO. When compression is specified, DB2 builds a static dictionary to control compression, storing from 2 to 17 dictionary pages in the table space. These pages are stored after the header and first space map page.
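Turning compression on is a single clause in the DDL. The names below are hypothetical:

```sql
-- Enable DB2 data compression for the table space; DB2 will
-- build the compression dictionary (hypothetical names)
CREATE TABLESPACE TSCOMPR
    IN DBEXAMP
    SEGSIZE 32
    COMPRESS YES
    BUFFERPOOL BP1;
```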

DB2 compression provides two very clear benefits:

  • Hardware-assisted compression.

  • It is provided free of charge with the base DB2 product.

Hardware-assisted compression is available only to those users owning IBM's high-end CPU models. This does not mean that DB2 compression features are available only to those with high-end CPUs, however. Hardware-assisted compression simply speeds up the compression and decompression of data; it is not a requirement for the inherent data compression features of DB2.

Of course, there are also potential disadvantages to using DB2 compression. Each compressed table space requires a compression dictionary that must be created, stored, and managed. It takes up DBM1 storage and can complicate recovery situations.

Overall, though, DB2 compression generally is efficient and effective. Users who previously dismissed compression because of the cost of third-party products should reevaluate their compression needs now that it is provided with DB2.

DDL Data Compression Versus Edit Procedures

DB2 data compression definitely should be used instead of the DSN8HUFF routine that is also supplied with DB2. But how does it compare to third-party tools? Most third-party vendors provide compression using EDITPROCs. However, these products are waning in popularity because of the excellent compression available in DB2 and the hardware assist. Most users will find that DB2 can handle most of their compression requirements without needing a third-party compression tool.

However, before completely refusing to evaluate third-party solutions, consider the following:

  • IBM compression supplies only a single compression routine (based on the Ziv-Lempel algorithm), whereas several third-party tools provide many different compression routines. This enables the user to better fit the algorithm to the composition of the data, using different compression algorithms for different types of data.

  • The cost in time and effort to convert from prior compression methods to internal DB2 compression may not be cost-justifiable when compared to other tasks facing your enterprise.

  • Third-party tool vendors are constantly enhancing their products to take better advantage of the operating system and the hardware environment. To ensure that you are getting the best "bang for your buck" in terms of data compression, it is wise to evaluate all of your options before settling on any given one. However, most of the third parties have fallen behind in updating their compression routines because of DB2's "out of the box" compression functionality.

CAUTION

For smaller table spaces, it is possible that the dictionary used by DB2 for compression could use more space than compression saves. For this reason, avoid compressing smaller table spaces.


General Data Compression Considerations

Why compress data? Consider an uncompressed table with a very large row size of 800 bytes. Five of this table's rows fit on a 4K page. If the compression routine achieves 30% compression on average, the 800-byte row uses only 560 bytes, because 800 - (800 x 0.3) = 560. Now seven rows fit on a 4K page. Because I/O occurs at the page level, the cost of I/O is reduced because fewer pages must be read for table space scans, and the data is more likely to be in the buffer pool because more rows fit on a physical page.

This can be a significant reduction. Consider the following scenarios. A 10,000-row table with 800-byte rows requires 2,000 pages. Using the compression routine outlined previously, the table would require only 1,429 pages. Another table, also with 800-byte rows but having 1 million rows, would require 200,000 pages without a compression routine. Using the compression routine, you would reduce that to 142,858 pages, a reduction of more than 50,000 pages.

Of course, there is always a trade-off: DASD savings for CPU cost of compressing and decompressing data. However, the cost should be minimal with hardware-assisted compression. Indeed, overall elapsed time for certain I/O heavy processes may decrease when data is compressed. Furthermore, DB2 may require fewer buffer pages to process compressed data versus fully expanded data. Additionally, the compression dictionary is loaded into memory when the page set is opened. Loading lots of compression dictionaries into memory could eventually cause problems.

Encoding Scheme

The CCSID parameter is used to specify the data encoding scheme to use for the table space: ASCII or EBCDIC. All data stored within a table space must use the same encoding scheme.

NOTE

Do not specify an encoding scheme using CCSID for LOB table spaces or table spaces in a temporary database. The encoding scheme for a LOB table space is inherited from its base table space. The LOB table space must have the same encoding scheme as its base table space; it cannot be different.

A table space in a TEMP database will not have an encoding scheme, because it can contain temporary tables having a variety of different encoding schemes.


The default encoding scheme for a table space is the encoding scheme of the database in which the table space is being created. So, obviously, the CCSID parameter can be coded for databases as well as table spaces. When CCSID is specified on a CREATE DATABASE statement, it specifies the default encoding scheme for data stored in the database. If no CCSID is specified for the database, the default will be the value specified to the DEF ENCODING SCHEME field on the DSNTIPF installation panel.
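The relationship between the database default and the table space setting can be sketched as follows (database, table space, and storage group names are hypothetical):

```sql
-- Default encoding scheme set at the database level; a table
-- space may state its own encoding scheme explicitly
CREATE DATABASE DBEXAMP
    STOGROUP SGEXAMP
    BUFFERPOOL BP1
    CCSID EBCDIC;

CREATE TABLESPACE TSASCII
    IN DBEXAMP
    SEGSIZE 32
    CCSID ASCII
    BUFFERPOOL BP1;
```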

LOCKPART

Specifying LOCKPART YES enables selective partition locking (SPL). With SPL, individual partitions of a partitioned table space are locked only when accessed. SPL provides the following benefits:

  • When SPL is enabled, applications accessing different partitions of a partitioned table space can enjoy greater concurrency.

  • In a data sharing environment, DB2 and the IRLM can detect and optimize locking for situations in which no inter-subsystem activity exists by partition.

The default is LOCKPART NO , which indicates that locks are taken on the entire partitioned table space, not partition by partition.
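A partitioned table space enabling selective partition locking might be defined as follows (a sketch with hypothetical names):

```sql
-- Selective partition locking: individual partitions are
-- locked only when accessed (hypothetical names)
CREATE TABLESPACE TSPART
    IN DBEXAMP
    NUMPARTS 4
    LOCKPART YES
    LOCKSIZE PAGE
    BUFFERPOOL BP1;
```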

MAXROWS

The MAXROWS parameter indicates the maximum number of rows that can be stored on a table space page. The default is 255. Specify MAXROWS 255 unless there is a compelling reason to limit the number of rows per page, such as to limit contention for page locking.

CAUTION

Do not use MAXROWS for a LOB table space or a table space in a work file database.


Use MAXROWS 1 Instead of Using Dummy Columns

A common design technique for older DB2 systems was to append dummy columns to DB2 tables to arbitrarily extend the row length. This was done to coerce DB2 into storing one row per page, effectively forcing a kludged type of row locking. However, this technique is invasive and undesirable because dummy columns show up in DCLGENs and might not always be recognized as "dummies." The same effect can be accomplished by specifying MAXROWS 1.

MAXROWS 1 also can be a viable alternative to LOCKSIZE ROW .
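For example, one row per page can be forced directly in the DDL, with no dummy columns (names are hypothetical):

```sql
-- Force one row per page without resorting to dummy columns;
-- a page-level lock now effectively locks a single row
CREATE TABLESPACE TSONEROW
    IN DBEXAMP
    SEGSIZE 32
    MAXROWS 1
    BUFFERPOOL BP1;
```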

MEMBER CLUSTER

The MEMBER CLUSTER parameter is used to indicate that inserted data is to ignore the clustering index (whether implicit or explicit). Instead, DB2 will choose where to put the data based on the space available in the table space.

Use this option with great care and only in certain specific situations. For example, if INSERT s are applied during batch processing and then the table space is always immediately reorganized, inserting the data by clustering index just slows down the INSERT processing. In this scenario, specifying MEMBER CLUSTER will speed up the batch jobstream and the subsequent REORG will recluster the data.
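For the batch INSERT scenario described above, the definition might look like this sketch (a partitioned table space with hypothetical names):

```sql
-- Let DB2 ignore the clustering index for inserts; the
-- subsequent REORG reclusters the data (hypothetical names)
CREATE TABLESPACE TSBATCH
    IN DBEXAMP
    NUMPARTS 4
    MEMBER CLUSTER
    BUFFERPOOL BP1;
```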

CAUTION

Do not use MEMBER CLUSTER for a LOB table space or a table space in a work file database.


TRACKMOD

The TRACKMOD parameter indicates whether DB2 should track modified pages in the space map pages of the table space or table space partition. If you specify TRACKMOD YES , DB2 tracks changed pages in the space map pages to improve the performance of incremental image copy. The default value is YES .

You can specify TRACKMOD NO to turn off the tracking of changed pages in the space map pages. Consider specifying TRACKMOD NO if you never take incremental image copies. Making an incremental copy can be significantly faster than making a full copy if the table space is defined with the TRACKMOD YES option.

Also, you cannot use the CHANGELIMIT option of the COPY utility for a table space or partition that is defined with TRACKMOD NO .
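A site that never takes incremental image copies might therefore turn tracking off (a sketch; names are hypothetical):

```sql
-- No incremental image copies planned, so skip tracking of
-- changed pages in the space map pages (hypothetical names)
CREATE TABLESPACE TSNOTRK
    IN DBEXAMP
    SEGSIZE 32
    TRACKMOD NO
    BUFFERPOOL BP1;
```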

CAUTION

Do not use the TRACKMOD clause for a LOB table space.


Page Size

Each DB2 table space requires an underlying VSAM linear data set in which to store its data. The majority of DB2 table spaces will have 4KB page sizes. However, if the row size is so large that it will not fit onto a 4KB page, the table space can be created with a page size of 8KB, 16KB, or 32KB.

A VSAM CI, or control interval, is basically equivalent to DB2 page size. Prior to DB2 V8, the CI size of every underlying VSAM file was 4KB. DB2 would use multiple 4KB CIs to build up to 8KB, 16KB, or 32KB. For example, DB2 chains together eight separate 4KB CIs to build a 32KB page.


However, as of DB2 V8 you can direct DB2 to use CI sizes of 8KB, 16KB, and 32KB to support table spaces with these large page sizes. To use this feature you need to set a new DSNZPARM. This parameter is set on the DSNTIP7 installation panel (VARY DS CONTROL INTERVAL).


NOTE

DB2 index spaces are still restricted to using 4KB page sizes.


After setting the DSNZPARM, all new 8KB, 16KB, and 32KB table spaces will use the same CI size instead of chaining 4KB CIs. Existing table spaces will be converted when they are reorganized or reloaded from scratch.

Consider using this new feature to synchronize CI size with table space page size to improve the performance of table space scans.

Multi-Table Table Spaces

Most DBAs follow a loose rule of placing only a single table in each table space. In general, this is a wise course of action for simple and segmented table spaces. Of course, it is mandatory for partitioned table spaces because only one table can be defined per partitioned table space. The one table per table space rule eases the administration process and helps to protect data integrity.

In a simple table space, data from more than one table can exist on the same page. Having multiple tables in a simple table space adversely affects concurrent data access, data availability, space management, and LOAD utility processing. For segmented table spaces, each page will contain data for only one table, so the concurrency issues are not relevant.

Another problem for multi-table table spaces is utility processing. DB2 utilities operate at the table space level, even when the code looks like it is for a single table. For example, consider the following LOAD statement:

 

 LOAD DATA REPLACE LOG NO INDDN INPUT INTO TABLE DSN8810.DEPT; 

Most folks would read this statement to say that DB2 will read the data in the data set referenced by the INPUT DDNAME and use it to replace the data in the DEPT table only. That is mostly but not 100% accurate. DB2 will actually replace all of the data in the table space where the DEPT table resides. So, if the EMP table was defined in the same table space, then this LOAD statement would replace all of the DEPT data and completely eliminate all of the EMP data. That is probably not the intent of this LOAD operation. Obliterating data in this manner usually is unacceptable for most applications. This caveat applies to both segmented and simple table spaces.

Additionally, compression can be problematic for multi-table table spaces. The compression ratio can be adversely affected by storing multiple tables in a single table space.

Define One Table per Table Space

For the reasons outlined in the previous section, it is a good idea to follow the rule of placing only a single table into each table space. This is so regardless of the type of table space (simple, segmented, or partitioned). If you are going to put more than one table into the same table space, do so only for small, static tables that will not be loaded using the REPLACE option.

Defining Multiple Tables per Segmented Table Space

However, at times it is advisable to assign multiple tables to a single segmented table space. Although doing so in the wrong situation can be disastrous, there are advantages to multi-table table spaces, too, if they are implemented properly and with discretion. Consider the following advantages and disadvantages before proceeding with more than one table assigned to a segmented table space.

Advantages to defining multiple tables to a segmented table space are as follows:

  • There are fewer open data sets, causing less system overhead.

  • There are fewer executions of the COPY , REORG , and RECOVER utilities per application system because these utilities are executed at the table space level.

  • It is easier to group like tables for administrative tasks because the tables reside in the same physical table space.

Disadvantages to defining multiple tables to a segmented table space are as follows:

  • When only one table needs to be reorganized, all of the tables must be REORGed because they coexist in a single data set or group of data sets.

  • If compression is used, the compression ratio will be affected by the mix of multiple tables instead of being optimized for the data patterns of a single table.

  • The LOAD REPLACE utility will replace all data for all tables defined to the table space.

  • There may be confusion about which tables are in which table spaces, making monitoring and administration difficult.

As a very rough general guideline, define small- to medium-size tables (less than 1 million pages) to a single, segmented table space. Create a partitioned table space for each large table (more than 1 million pages). If you decide to group tables in a segmented table space, group only small tables (less than 32 pages). Provide a series of segmented table spaces per application such that tables in the ranges defined in the following chart are grouped together. This will save space. Avoid grouping larger tables (more than 32 pages) with other tables.

Number of Pages     Table Space Segment Size
 1 to  4             4
 5 to  8             8
 9 to 12            12
13 to 16            16
17 to 20            20
21 to 24            24
25 to 28            28
29 to 32            32


When the table space contains tables with the number of pages in the range on the left, assign the SEGSIZE indicated on the right to the table space.
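Per the chart, a segmented table space grouping several small code tables of up to 8 pages each would get SEGSIZE 8 (a sketch; names are hypothetical):

```sql
-- Small code/reference tables (5 to 8 pages each) grouped into
-- one segmented table space; SEGSIZE 8 matches the chart above
CREATE TABLESPACE TSCODES
    IN DBEXAMP
    SEGSIZE 8
    LOCKSIZE PAGE
    BUFFERPOOL BP1;
```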

When considering whether to place more than one table in a segmented table space, keep in mind that such a strategy is more preferable for static data than for rapidly changing data. This is so because there will be fewer requirements for running utilities against static data ”and remember, DB2 utilities are run against table spaces.

Multi-Table Code and Reference Table Spaces

Consider placing your code and reference tables into multi-table segmented table spaces. Code and reference tables are likely to be static and frequently used by many programs. It is also plausible that a single application could have numerous code and reference tables. Placing multiple code and reference tables into a single table space will reduce the number of open data sets required.

Multi-Table Table Spaces and RI

Consider grouping tables related by referential integrity into a single, segmented table space. This is not always feasible, because the size and access criteria of the tables might not lend themselves to multi-table segmented table spaces. Grouping referentially related tables, however, simplifies your QUIESCE processing.

Actually, RI can be a good reason to avoid multi-table table spaces entirely. If unrelated tables are assigned to the same table space you can wind up having to recover data unnecessarily because of a referential constraint.

Multi-Table Table Spaces and DBD Growth

Use caution when dropping and creating large numbers of tables in a single segmented table space, because over time, the DBD for the database containing the segmented table space will grow. There might be a high volume of tables being created and dropped in test environments, ad hoc environments, and any environment where end users have control over the creation and removal of DB2 tables.

Remember that a large DBD can affect storage and processing by consuming a large amount of EDM pool space.

General Table Space Guidelines

As you create DB2 table spaces, refer to the following list of guidelines for proper table space creation and usage.

Use Proper Table Space Definitions

Explicitly define table spaces. If a table space is not specified in the table creation statement, DB2 creates an implicit table space for new tables and sets all table space parameters to the default values. These values are unacceptable for most applications.

Favor Segmented Table Spaces

In general, use segmented table spaces except as follows:

  • Use partitioned table spaces when you want to encourage parallelism. (Non-partitioned table spaces can be accessed in parallel, but partitioned table spaces are preferred for performance and data set placement reasons.)

  • Use partitioned table spaces when the amount of data to be stored is very large (more than several million pages).

  • Use partitioned table spaces to reduce utility processing time and decrease contention.

  • Use partitioned table spaces to isolate specific data areas in dedicated data sets.

  • Use partitioned table spaces to improve data availability. If the data is partitioned by region, the partitions for the eastern, southern, and northern regions can be made available while the western region partition is being reorganized.

  • Use partitioned table spaces to improve recoverability. If the data is partitioned by region and an error impacts data for the eastern region only, only the eastern partition needs to be recovered.

  • Use a simple table space only when you need to mix data from different tables on one page.

Consider More Frequent Partitioning

To optimize query parallelism, it is wise to reevaluate your basic notions regarding partitioning. The common "rule of thumb" regarding whether to create a partitioned table space instead of a segmented table space was to use partitioning only for larger table spaces. This strategy is outdated.

Consider partitioning table spaces that are accessed in a read-only manner by long-running batch programs. Of course, very small table spaces are rarely viable candidates for partitioning, even with DB2's advanced I/O, CPU, and Sysplex parallelism features. This is true because the smaller the amount of data to access, the more difficult it is to break it into pieces large enough such that concurrent, parallel processing will be helpful.

Place Partitions on Separate DASD Devices

Move each partition of the same partitioned table space to a separate DASD volume. Failure to do so will negatively affect the performance of query parallelism performed against those partitions. Disk drive head contention will occur when concurrent access is performed against separate partitions that coexist on the same device.

Consider Single Table Space Databases

For larger table spaces (100K pages and more) that are very active, consider defining a single table space per database. This can reduce contention. To increase efficiency, assign very active table spaces to volumes with low activity.



DB2 Developer's Guide (5th Edition)
ISBN: 0672326132
Year: 2004
Pages: 388
