13.2. DB2 Transaction Logs

DB2 uses transaction logs to record all changes to your database so that they can be either reapplied or rolled back in the event of problems.

13.2.1. Understanding the DB2 Transaction Logs

The ability to perform both crash recovery and roll forward recovery is provided by the database transaction logs. Transaction logs keep track of changes made to database objects and data. During the recovery process, DB2 examines these logs and decides which changes to redo or undo.

Logs can be stored in files or in raw devices. In this chapter, we use files in our examples for simplicity. To ensure data integrity, DB2 uses a "write-ahead logging" mechanism to write to the logs before writing (externalizing) the database changes to disk. To illustrate this process, assume a user issues the following statements:

 UPDATE t1 SET year = 2000 WHERE ID = '007'
 UPDATE t1 SET year = 2001 WHERE ID = '011'
 COMMIT
 UPDATE t1 SET year = 2004 WHERE ID = '003'

Table t1 and its index ix1 are shown in the logical view of Figure 13.1.

Figure 13.1. Concept of transaction logging


As each statement is executed, the following takes place (the figure uses the first UPDATE statement for illustration purposes).

  • The DB2 optimizer parses the query and determines that using index ix1 is the fastest way to retrieve the desired data. An index page access followed by a data page access is required.

  • The statement and access plan information is stored in the package cache (1).

  • The extent containing the desired index page (2) is brought from disk to the buffer pool (3). The index points to a record in the data page, and thus the extent containing the pertinent data page (4) is also brought from disk to the buffer pool (5).

  • The UPDATE operation takes place in the buffer pool.

  • The UPDATE operation is recorded in the log buffer. The old and new values are kept in case the operation needs to be reverted (6).

DB2 checks whether the log buffer is full or whether MINCOMMIT commits have been performed. If either has occurred, the information in the log buffer is written to the log files on the disk for the logs (7).

NOTE

The MINCOMMIT database configuration parameter indicates the minimum number of COMMIT commands required before writing from the log buffer to the disks for the logs.
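To see or change this value, you can use the DB2 Command Line Processor (CLP). The commands below are only a sketch; the database name SAMPLE is a placeholder, and the value you choose should reflect your workload.

 -- Display the current database configuration, including MINCOMMIT
 GET DB CFG FOR sample

 -- Group up to three commits before the log buffer is written to disk
 UPDATE DB CFG FOR sample USING MINCOMMIT 3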


The log files can contain committed or uncommitted data. When crash recovery occurs, DB2 will redo any statements that were committed and undo any statements that were not.

Note too that the information that was changed in the buffer pool and that was written to the log files has not yet been saved in the disks for the database. This will eventually happen when "page cleaner" processes are run to "clean" the modified pages in the buffer pool and write them to disk (8). This is not done immediately after the statement is executed for performance reasons. DB2 is already storing the information in the log files on disk, so there is no need to perform another I/O right away to store the changed pages to the database disk.

When data is committed and saved in the disk for the database, the data is considered to be externalized.

NOTE

We discuss the DB2 optimizer and page cleaners in detail in Chapter 16, Database Performance Considerations.


13.2.2. Primary and Secondary Log Files

The Disk for Logs in Figure 13.1 is known as the active log directory or log space. Its location is specified in the database configuration parameter Path to log files.

The size of the log space is controlled by three database configuration parameters.

  • The LOGPRIMARY parameter specifies the number of primary log files that are allocated in the active log directory. Primary log files are allocated during the first database connect or during database activation by the ACTIVATE DATABASE command.

  • The LOGSECOND parameter controls the maximum number of secondary log files that can be allocated in the active log directory. Secondary log files are allocated dynamically one at a time as needed, when there are no more primary logs available for transaction processing.

  • The LOGFILSIZ parameter controls the size of the log files. This value is specified as a number of 4KB pages.

For example, let's say you have the following values in your database configuration:

 Log file size (4KB)           (LOGFILSIZ)  = 250
 Number of primary log files   (LOGPRIMARY) = 3
 Number of secondary log files (LOGSECOND)  = 2
 Path to log files                          = C:\mylogs\

Since each file is 1MB (250 x 4KB), and there are a total of 5 log files (3 primary logs and 2 secondary logs), the log space is 5MB.
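If you want to set up this configuration yourself, you can do so from the CLP. The commands below are only a sketch; the database name SAMPLE is a placeholder, and changes to the log file size and the number of log files take effect only after the database is deactivated and reactivated.

 -- Set the log file size and the number of primary and secondary logs
 UPDATE DB CFG FOR sample USING LOGFILSIZ 250 LOGPRIMARY 3 LOGSECOND 2

 -- Move the active log directory (takes effect at the next activation)
 UPDATE DB CFG FOR sample USING NEWLOGPATH C:\mylogs

 -- Verify the settings
 GET DB CFG FOR sample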

NOTE

The maximum log space you can configure is 256GB.


In this example, as soon as the first connection to the database is established, 3 primary log files of 250 4KB-pages each are allocated. If you change directories to C:\mylogs, you will see the three files:

 2004-03-10  06:06p        1,032,192  S0000000.LOG
 2004-03-10  06:06p        1,032,192  S0000001.LOG
 2004-03-10  06:06p        1,032,192  S0000002.LOG
               3 File(s)   3,096,576  bytes

Now, let's say you decide to perform the following transaction, which inserts a million records into a table:

 INSERT INTO TABLE1 VALUES(1);
 INSERT INTO TABLE1 VALUES(2);
 ...
 INSERT INTO TABLE1 VALUES(1000000);
 COMMIT;

DB2 will fill up the first log file, continue with the second log file, and then the third log file. After it fills up the third log file, there are no more primary logs available (remember that LOGPRIMARY is set to 3, so a maximum of three primary logs can exist at any time). At this point DB2 will dynamically allocate a secondary log file. Once the first secondary log file is filled up, DB2 will allocate another secondary log file, and this process will continue until the maximum number of secondary log files, indicated by the database configuration parameter LOGSECOND, is reached. In this example the maximum is two.

At the point when the maximum number of secondary log files is reached, if DB2 still needs more space to complete the transaction, a log full condition occurs. This means there is no more room in the log space to complete the transaction. The first transaction causing this log full condition will be rolled back. (Section 13.2.4, Logging Methods, discusses how DB2 tries to reuse the old logs first, if possible, before creating a new one. DB2 uses an algorithm that reduces the chances of encountering a log full condition.)

A log full condition is very undesirable: not only is all the work performed up to that point lost, but the rollback itself may take a long time. For this reason, it is important to ensure that you allocate enough log space to accommodate your workload.
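One way to size the log space against your real workload is to watch the log-related elements in a database snapshot. The following is only a sketch; SAMPLE is a placeholder database name, and the exact element names in the output may vary slightly between DB2 versions.

 -- Look for elements such as "Log space used by the database (Bytes)",
 -- "Log space available to the database (Bytes)", and
 -- "Secondary logs allocated currently"
 GET SNAPSHOT FOR DATABASE ON sample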

Generally, you do not want to allocate a huge number of primary logs, because they are allocated when the database is activated. If you specify a large number, DB2 will spend a lot of time creating these files, so your first connection will take a long time. Moreover, if your transaction workload is generally small throughout the day, all that log space will be wasted. Instead, you can specify a LOGSECOND value large enough to handle a spike in your workload (e.g., a heavier workload with long transactions at the end of a month).

Another undesirable condition is the log disk full condition. Unlike a log full condition, where DB2 runs out of logging space because the maximum numbers of primary and secondary log files have been reached, a log disk full condition occurs when the file system that hosts the active log directory is physically full, meaning no more log files can be created, even though the maximum numbers of primary and secondary log files may not have been reached. This condition could be caused by the file system being too small or the active log directory becoming filled by too many inactive (aka archive) log files. (You will learn what inactive logs are in the next section.)

By default, a transaction that receives a disk full error will fail and will be rolled back, just as in the case of a log full condition. However, you can change this behavior by setting the database configuration parameter BLK_LOG_DSK_FUL to YES. Setting this parameter to YES causes applications to block or wait instead of rolling back when DB2 encounters a disk full error when writing log files. While the applications are waiting (or blocked), DB2 attempts to create the log file every five minutes until it succeeds. After each attempt, DB2 writes a message to the administration notification log. The way to confirm that your application is blocked because of a log disk full condition is to monitor the administration notification log.
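For example, you could enable this behavior from the CLP as follows (SAMPLE is a placeholder database name):

 -- Block applications instead of rolling them back on a log disk full error
 UPDATE DB CFG FOR sample USING BLK_LOG_DSK_FUL YES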

NOTE

The administration notification log is discussed in Chapter 17, Diagnosing Problems.


Until the log file is successfully created, any user application that attempts to update table data will not be able to commit transactions. Read-only queries may not be directly affected; however, if a query needs to access data that is locked by an update request or a data page that is fixed in the buffer pool by the updating application, read-only queries will also appear to hang.

Once you have determined that the hang is caused by a disk full error, you can resolve the situation by moving inactive log files to another file system or by increasing the size of the file system so that hanging applications can complete.

13.2.3. States of Logs

The state of a log is determined by whether the transactions it contains are committed and whether the transactions have been externalized to disk. There are three states of logs: active, online archive, and offline archive.

13.2.3.1 Active Logs

A log is considered active if one of the following applies:

  • It contains transactions that have not yet been committed or rolled back.

  • It contains transactions that have been committed but whose changes have not yet been written to the database disk (externalized).

In Figure 13.1, log file 2 is an active log because it contains a transaction that has not been committed (the last UPDATE statement). Log file 2 also contains a transaction that has been committed but has not been externalized to disk (the first two UPDATE statements).

Imagine that at this point a power failure strikes and everything in the log buffer and the buffer pool is wiped out. The only place where you can find a record of these transactions is in the log files. When the database is restarted, it will go through crash recovery. DB2 will first open log file 2 and read its contents. DB2 will redo the transactions that ended with a COMMIT or ROLLBACK, and undo the transactions that did not. Using the example in Figure 13.1, DB2 will redo the first two UPDATE statements, but undo the last UPDATE statement.
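Crash recovery is triggered automatically at the first connection after the failure when the AUTORESTART database configuration parameter is ON (the default), or you can start it manually with the RESTART DATABASE command. The following is only a sketch, assuming a database named SAMPLE.

 -- Manually trigger crash recovery (needed if AUTORESTART is OFF)
 RESTART DATABASE sample

 -- Or let crash recovery run automatically at the first connection after a failure
 UPDATE DB CFG FOR sample USING AUTORESTART ON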

Active log files are necessary for crash recovery. If you lose the active logs, crash recovery will fail and the database will be inaccessible.

Active logs typically reside in the active log path. If you have enabled infinite logging, archived log files may need to be retrieved from the archive site. (Infinite logging is covered in section 13.2.4, Logging Methods.)

13.2.3.2 Online Archive Logs

Online archive logs are files that contain only committed, externalized transactions. In other words, they are logs that are no longer active, and therefore no longer needed for crash recovery.

Online archive logs reside in the active log directory as well. This is why they are called "online." The term online archive logs may sound complicated, but all it means is that inactive logs reside in the active log directory.

Although these logs are no longer needed for crash recovery, they are retained for roll forward recovery. You will see why in section 13.2.4, Logging Methods.

13.2.3.3 Offline Archive Logs

A file system has limited space. If all the online archive logs stay in the active log directory, this directory will soon be filled up, causing a disk full condition. Therefore, the online archive logs should be moved out of the active log directory as soon as possible. You can do this manually, or DB2 can invoke a program to do this for you. Once this has been done, these logs become offline archive logs. They are called "offline" because they reside outside of the active log directory.

Like the online archive logs, offline archive logs are also retained for roll forward recovery.

13.2.4. Logging Methods

DB2 supports three logging methods: circular logging, archival logging, and infinite active logging.

13.2.4.1 Circular Logging

Circular logging is the default logging mode for DB2. As the name suggests, in this method the logs are reused in a circular mode. For example, if you have three primary logs, DB2 uses them in this order: Log #1, Log #2, Log #3, Log #1, Log #2….

Note that in the above sequence Log #1 and Log #2 are reused. When a log file is reused, its contents are completely overwritten. Therefore, a log can be reused if and only if the transactions it contains have already been committed and externalized to the database disk. In other words, the log must not be an active log. This ensures DB2 will have the necessary logs for crash recovery when needed.

Figure 13.2 shows how circular logging works.

Figure 13.2. Circular logging


During database activation, DB2 creates LOGPRIMARY number of logs. If LOGPRIMARY is 3, during transactions, DB2 fills up the first log file, continues with the second log file, and then the third log file. After it finishes with the third log file, DB2 cannot allocate a new primary log because the maximum number of primary logs is three. DB2 then checks if log #1 has become inactive. If it has, then DB2 reuses log #1. If it has not, then DB2 allocates a secondary log. When another log is needed, DB2 checks to see if any of the primary logs can be reused and repeats the same process until the maximum specified in LOGSECOND is reached, at which point the log full condition occurs.

With this logging method, DB2 reuses or overwrites logs that are no longer active. Although the ability to recover from a crash is assured, you cannot reapply the transactions that were in these logs, because they have been overwritten. Therefore, circular logging only supports crash recovery, not roll forward recovery.

13.2.4.2 Archival Logging

Archival logging keeps the logs even after they contain committed and externalized data. To enable this mode, you have to change the LOGARCHMETH1 database configuration parameter to a value other than OFF. We will discuss the possible values to which this parameter can be set later in this section.

With archival logging, roll forward recovery is supported. The contents of inactive logs are saved rather than overwritten; therefore, they can be reapplied during roll forward recovery. Depending on the value set in LOGARCHMETH1, the logs can be saved to various locations. When a log is needed during roll forward recovery, DB2 retrieves it from that location and restores it into the active log directory.

With archival logging, if you have three primary logs in a database, DB2 uses them in this order: Use Log #1, use Log #2, use Log #3, archive Log #1, create and use Log #4, archive Log #2, create and use Log #5.... Notice that the log number increases as new logs are required, unlike circular logging, where the log numbers stay the same and only the log contents are overwritten.

Figure 13.3 shows how archival logging works.

Figure 13.3. Archival logging


How DB2 archives and retrieves a log file depends on the value set in the LOGARCHMETH1 database configuration parameter. The possible values are OFF, LOGRETAIN, USEREXIT, DISK, TSM, and VENDOR.

  • LOGARCHMETH1 = OFF

    This is the default. When set to OFF, archival logging is disabled and circular logging is used.

  • LOGARCHMETH1 = LOGRETAIN

    Let's use the example in Figure 13.3 to demonstrate how LOGRETAIN works. DB2 starts with three primary logs: log #1, log #2, and log #3. When DB2 fills up all three of these logs, it checks if log #1 has become inactive. If it has, DB2 creates log #4 as a primary log. The set of primary logs then becomes log #2, log #3, and log #4. If log #1 is still active, DB2 creates log #4 as a secondary log. When a new log is needed, DB2 checks if log #2 has become inactive. If it has, DB2 creates log #5 as a primary log. If not, it creates log #5 as a secondary log. The process is repeated until LOGSECOND is reached, at which point the log full condition occurs.

    Note that with LOGRETAIN, inactive log files are never overwritten. In Figure 13.3, even though log #1 and log #2 have already become inactive, they still remain in the active log directory. (At this point, they are online archive logs.) You have to manually move them to a different location or the active log directory will soon be filled up by these logs. However, you should never delete these logs without making a copy of them somewhere, because they may be needed for roll forward recovery. After logs #1 and #2 have been moved to another location, they become offline archive logs.

  • LOGARCHMETH1 = USEREXIT

    With LOGARCHMETH1 set to USEREXIT, the archive and retrieval of the logs are performed automatically by a user-supplied user exit program called db2uext2.

    The user exit program archives a log file to a different location as soon as it becomes full, even if it is still active. Archiving a log file simply means making a copy of it somewhere; the log itself still remains in the active log directory. If the log is still active, DB2 will not reuse it. If the log is inactive, when a new log is required, DB2 will rename it and reuse it.

    Once again, let's use the example in Figure 13.3 to explain how USEREXIT works. DB2 starts with three primary logs: log #1, log #2, and log #3. As soon as these logs are full, DB2 calls the user exit program to archive them. When DB2 needs a new log, it checks to see if log #1 is active. If log #1 is still active, DB2 creates secondary log #4. If log #1 is inactive, DB2 renames log #1 to log #4 and reuses it (instead of creating a new log #4). This helps to eliminate the overhead of creating a new file. There is no loss of data in reusing a log that has been archived, because its copy can always be retrieved when needed.

    When logs are needed during recovery, DB2 calls the user exit program to retrieve the necessary logs. Because everything is handled by the user exit program, you should not manipulate the log files manually. Doing so may potentially interfere with the user exit program.

  • LOGARCHMETH1 = DISK:directory

    With this setting, archival logging uses an algorithm similar to that of USEREXIT. The only difference is that instead of calling the user exit program, DB2 automatically archives the logs from the active log directory to the specified directory. During recovery, DB2 automatically retrieves these logs back to the active log directory. (A configuration sketch follows this list.)

  • LOGARCHMETH1 = TSM:[management class name]

    With this setting, archival logging uses an algorithm similar to that of USEREXIT. The only difference is that the logs are archived on the local Tivoli Storage Manager (TSM) server. The management class name parameter is optional. If it is not specified, the default management class is used.

  • LOGARCHMETH1 = VENDOR:library

    With this setting, archival logging uses an algorithm similar to that of USEREXIT. The only difference is that the logs are archived using the specified vendor library.

  • LOGARCHMETH2

    This optional parameter specifies the secondary archive log method. It can be set using the same values as for LOGARCHMETH1. If set, logs will be archived to both this destination and the destination specified by the LOGARCHMETH1 parameter.
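As an illustration, the following CLP commands enable automatic archival to a disk directory, with TSM as an optional second target. This is only a sketch; the database name SAMPLE, the directory C:\archlogs, and the backup path C:\backups are placeholders. Keep in mind that switching a database from circular to archival logging puts it in backup pending state, so a backup is required before applications can connect again.

 -- Archive inactive logs automatically to a disk directory
 UPDATE DB CFG FOR sample USING LOGARCHMETH1 DISK:C:\archlogs

 -- Optionally keep a second copy of each archived log on TSM
 UPDATE DB CFG FOR sample USING LOGARCHMETH2 TSM

 -- A backup is required after enabling archival logging
 BACKUP DATABASE sample TO C:\backups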

NOTE

LOGARCHMETH1 and LOGARCHMETH2 are new database configuration parameters introduced in DB2 Version 8.2. They replace the LOGRETAIN and USEREXIT parameters, which are still supported for backward compatibility. If you update the USEREXIT or LOGRETAIN parameters, LOGARCHMETH1 will automatically be updated and vice versa. However, if you are using either USEREXIT or LOGRETAIN, LOGARCHMETH2 must be set to OFF.


In addition to LOGARCHMETH1 and LOGARCHMETH2, a number of other logging-related database parameters are also introduced in DB2 Version 8.2. Table 13.1 lists them.

Table 13.1. New Logging-Related Database Configuration Parameters in DB2 Version 8.2

  • FAILARCHPATH: Failover archive path. Specifies a third target to archive log files if the primary and secondary archival paths fail. The media must be disk. It is a temporary storage area for the log files until the primary path(s) becomes available again, at which time the log files will be moved from this directory to the primary archive path(s). By moving the log files to this temporary location, log directory full situations might be avoided.

  • NUMARCHRETRY: Specifies the number of retries attempted on the primary target(s) before archiving to FAILARCHPATH. The default is 5.

  • ARCHRETRYDELAY: Specifies the number of seconds between retry attempts. The default is 20 seconds.

  • LOGARCHOPT1 and LOGARCHOPT2: Specify a string that is passed on to the TSM server or vendor APIs. For TSM, this field is used to allow the database to retrieve logs that were generated on a different TSM node or by a different TSM user. The string must be provided in the following format:

    "-fromnode=nodename -fromowner=ownername"

    where nodename is the name of the TSM node that originally archived the log files, and ownername is the name of the TSM user who originally archived the log files. Each log archive options field corresponds to one of the log archive methods: LOGARCHOPT1 is used with LOGARCHMETH1, and LOGARCHOPT2 is used with LOGARCHMETH2.


13.2.4.3 Infinite Active Logging

Infinite active logging is actually a spin-off of archival logging. With circular logging and archival logging, log space can potentially be filled with active logs. If you have a long-running transaction and do not want to run out of log space in the log path, you can use infinite active logging.

To enable infinite active logging:

  • Set the LOGSECOND database configuration parameter to -1.

  • Archival logging must be enabled with one of the automatic archival methods; that is, LOGARCHMETH1 must be set to USEREXIT, DISK, TSM, or VENDOR.

When archival logging is enabled, a log is archived as soon as it becomes full. DB2 leaves the log in the log directory until it becomes inactive and then renames the file for reuse. With infinite logging, DB2 still archives the log as soon as it is full, but it does not wait for the log to become inactive before renaming the file for reuse. This guarantees that the active log directory will never fill up, because any log can be reused as soon as it has been filled and archived. Note that the use of infinite active logging can prolong crash recovery times, because active logs may need to be retrieved from the archive site.
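For example, you could enable infinite active logging with the following CLP commands. This is only a sketch; the database name SAMPLE and the directory C:\archlogs are placeholders.

 -- Infinite active logging requires an automatic archival method
 UPDATE DB CFG FOR sample USING LOGARCHMETH1 DISK:C:\archlogs

 -- Setting LOGSECOND to -1 enables infinite active logging
 UPDATE DB CFG FOR sample USING LOGSECOND -1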

13.2.5. Handling the DB2 Transaction Logs

DB2 logs are crucial for roll forward recovery. A missing or corrupted log will cause roll forward recovery to fail and can potentially render the database unusable. Therefore, we recommend that you do not handle any logs manually.

If it becomes necessary for some reason to handle the logs manually, exercise extra care. Never remove log files based solely on their timestamps. Understanding how logs are timestamped may save you from losing active logs and creating a potential disaster.

When primary logs are created at database activation, they are all given a timestamp based on the activation time. These timestamps do not change until DB2 writes transaction updates to the logs, one log file at a time. These logs are kept in the active log directory, even though they may be empty. For example, if LOGPRIMARY is set to 20, then 20 log files will be created at timestamp A. Suppose transactions begin and write to logs 1 through 10 at timestamps greater than A. At this point in time, you still have 20 logs in the active log directory. Logs 1 through 10 will have timestamps greater than A. Logs 11 through 20 will have timestamps at exactly A. Assume that these logs (logs 1 to 10) span multiple days of work. In this scenario, it is possible to think that logs 11 to 20 are older logs, because of their older timestamps, and can be removed. In fact, these logs are still active logs. If you remove them, and DB2 requires those logs (e.g., the next log required would be log 11), the database will crash and be marked as corrupted. The only way to recover is to restore from a recent backup. Therefore, we highly recommend that you let DB2 handle the logs automatically.

To determine which log files are active and which ones are not, look at the value of the First active log file parameter in the database configuration. All the logs prior to that value are inactive; all the log files starting at that value are active; therefore, you should not touch them.

For example, if the first active log file is S0000005.LOG, then logs 1, 2, 3, and 4 are inactive. All the logs starting at log 5 are active.
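You can check this value from the CLP (SAMPLE is a placeholder database name):

 -- The output includes a line such as: First active log file = S0000005.LOG
 GET DB CFG FOR sample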

13.2.6. Userexit

We have mentioned user exit several times in this chapter. But what exactly is a user exit?

A user exit is a program written in the C language that handles the archival and retrieval of DB2 logs automatically when called by DB2.

DB2 provides two sample user exit programs: db2uext2.cdisk and db2uext2.ctsm. You can customize them and use them in your environment. They are found in:

  • DB2_install_dir\SQLLIB\samples\c on Windows

  • DB2_instance_home/sqllib/samples/c on Linux/UNIX

The db2uext2.cdisk program archives logs to a local disk on the database server. The db2uext2.ctsm program archives logs to Tivoli Storage Manager (TSM).

Instructions on how to use them are included in these sample programs. You need a C compiler to compile the programs; the resulting executable is called db2uext2. It must be installed in:

  • DB2_install_dir\SQLLIB\bin directory on Windows

  • DB2_instance_home/sqllib/adm directory on Linux/UNIX

To enable it, set the database configuration parameter LOGARCHMETH1 to USEREXIT.

Prior to Version 8.2, a user exit was the only way to have DB2 archive and retrieve logs automatically; otherwise, users had to archive and retrieve the logs manually using LOGRETAIN, or not use archival logging at all. In Version 8.2, you can specify how you want the logs to be archived and retrieved by setting the LOGARCHMETH1 parameter. For example, LOGARCHMETH1 = DISK replaces the db2uext2.cdisk program, and LOGARCHMETH1 = TSM replaces the db2uext2.ctsm program.

There is no need to use a user exit anymore, but for compatibility reasons user exit support is still available in Version 8.2.


