log file parallel write

The log file parallel write wait event has three parameters: files, blocks, and requests. In Oracle Database 10g, this wait event falls under the System I/O wait class. Keep the following key thoughts in mind when dealing with the log file parallel write wait event.

  • The log file parallel write event belongs only to the LGWR process.

  • A slow LGWR process can impact foreground processes' commit time.

  • Significant log file parallel write wait time is most likely an I/O issue.

Common Causes, Diagnosis, and Actions

Just as the db file parallel write wait event belongs only to the DBWR process, the log file parallel write wait event belongs only to the LGWR process. When it is time to write, the LGWR process writes the redo buffer to the online redo logs by issuing a series of system write calls to the operating system. The LGWR process waits for the writes to complete on the log file parallel write event. The LGWR process looks for redo blocks to write once every three seconds, at commit, at rollback, when the _LOG_IO_SIZE threshold is met, when there is 1MB worth of redo entries in the log buffer, and when posted by the DBWR process.
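
If you want to observe the LGWR process directly, one way is a query along these lines (a minimal sketch; it assumes LGWR is identifiable by its PROGRAM string, which varies by platform):

-- Locate the LGWR session and see what it is currently waiting on.
 select s.sid, s.program, w.event, w.state, w.seconds_in_wait
from v$session s, v$session_wait w
where s.sid = w.sid
and upper(s.program) like '%LGWR%';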

Although user sessions never experience the log file parallel write wait event, they can be impacted by a slow LGWR process. A slow LGWR process can magnify the log file sync waits, which user sessions wait on during commits or rollbacks. User sessions will not get the commit or rollback complete acknowledgement until the LGWR has completed the writes. Chapter 7 has more details on the log file sync wait event.

The key database statistics to look at are the systemwide TIME_WAITED and AVERAGE_WAIT of the log file parallel write and log file sync wait events because they are interrelated:

 select event, time_waited, average_wait 
from v$system_event
where event in ('log file parallel write','log file sync');

EVENT                     TIME_WAITED AVERAGE_WAIT
------------------------- ----------- ------------
log file parallel write      11315158   .508570816
log file sync                 7518513   .497255756

If the log file parallel write average wait time is greater than 10ms (or 1cs), this normally indicates slow I/O throughput. The cure is the same as for the db file parallel write waits. Enable asynchronous writes if your redo logs are on raw devices and the operating system supports asynchronous I/O. For redo logs on file systems, use synchronous direct writes. Unfortunately, you cannot spawn more than one LGWR process, so it is critical that nothing else shares the mount point of the redo log files. Make sure the controller that services the mount point is not overloaded. Moving the redo logs to higher speed disks will also help. We strongly suggest that you avoid putting redo logs on RAID5 disks, but we also understand that many times you don't have a choice or a say in it. You can vent your frustration at www.baarf.com.
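
How you enable asynchronous or direct I/O depends on the platform and release. As a rough sketch, you can check the relevant initialization parameters first (the parameter names below are standard, but the appropriate values for your system are an assumption to verify against your OS and Oracle documentation):

-- Current I/O-related settings.
 select name, value
from v$parameter
where name in ('disk_asynch_io','filesystemio_options');

-- Example only: synchronous direct I/O for file system redo logs (requires a restart).
-- alter system set filesystemio_options = 'DIRECTIO' scope = spfile;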

Besides improving the I/O throughput, you can also work on lowering the amount of redo entries. This will provide some relief, but not the cure. Whenever possible, use the NOLOGGING option. Indexes should be created or rebuilt with the NOLOGGING option. CTAS operations should also use this option.
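
For example, the statements might look like this (the object names are hypothetical):

-- Rebuild an index with minimal redo generation (hypothetical index name).
 alter index orders_ix rebuild nologging;

-- CTAS with minimal redo generation (hypothetical table names).
 create table orders_hist nologging
as select * from orders where order_date < sysdate - 365;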

Note

The NOLOGGING option doesn't apply to normal DML operations such as inserts, updates, and deletes. Objects created with the NOLOGGING option are unrecoverable unless a backup is taken after the NOLOGGING operation and before any media failure. If you have to take an additional backup, then the I/Os that you save by not logging will be spent on the backup. A database in FORCE LOGGING mode will log all changes (except for changes in temporary tablespaces), regardless of the tablespace and object settings.
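
You can check whether FORCE LOGGING is in effect with a quick query against the standard dictionary views (a simple sketch):

-- Database-level and tablespace-level force logging settings.
 select force_logging from v$database;

 select tablespace_name, force_logging from dba_tablespaces;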

A lower commit frequency at the expense of higher rollback segment usage can also provide some relief. A high commit frequency causes the LGWR process to be overactive and, when coupled with slow I/O throughput, will only magnify the log file parallel write waits. The application may be processing a large set of data in a loop and committing each change, which causes the log buffer to be flushed too frequently. In this case, modify the application to commit at a lower frequency. There could also be many short sessions that log in to the database, perform a quick DML operation, and log out. In this case, the application design may need to be reviewed. You can find out who is committing frequently with the following query:

 select sid, value 
from v$sesstat
where statistic# = (select statistic#
from v$statname
where name = 'user commits')
order by value;

-- Another evidence of excessive commits is high redo wastage.
select b.name, a.value, round(sysdate - c.startup_time) days_old
from v$sysstat a, v$statname b, v$instance c
where a.statistic# = b.statistic#
and b.name in ('redo wastage','redo size');

NAME                      VALUE        DAYS_OLD
--------------- --------------- ---------------
redo size          249289419360               5
redo wastage         2332945528               5
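
As a rough back-of-the-envelope indicator, you can relate the redo wastage to the number of commits (a sketch; the result is only meaningful when compared against a baseline for your own system):

-- Average redo wastage (in bytes) per user commit -- rough indicator only.
 select round(a.value / b.value) as avg_wastage_per_commit
from v$sysstat a, v$sysstat b
where a.name = 'redo wastage'
and b.name = 'user commits';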

Check the job scheduler to see if hot backups run during peak hours. They can create large amounts of redo entries, which in turn increases the log file parallel write waits. Hot backups should run during off-peak hours, and tablespaces should be taken out of hot backup mode as soon as possible.
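
To see whether any datafiles are currently in hot backup mode, one simple check (a sketch against the standard V$BACKUP view) is:

-- Datafiles whose tablespaces are currently in hot backup mode.
 select b.file#, f.name, b.time
from v$backup b, v$datafile f
where b.file# = f.file#
and b.status = 'ACTIVE';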

Lastly, be careful not to jam the LGWR with too many redo entries at one time. This can happen with a large log buffer because the one-third threshold is also larger and holds more redo entries. When the one-third threshold is met, the LGWR process performs a background write if it is not already active, and the amount of redo entries may be too much for the LGWR to handle at one time, causing extended log file parallel write waits. So the idea is to stream the LGWR writes. This can be done by lowering the one-third threshold, which is controlled by the initialization parameter _LOG_IO_SIZE. By default, the _LOG_IO_SIZE is one-third of the LOG_BUFFER or 1MB, whichever is less, expressed in log blocks. Query X$KCCLE.LEBSZ for the log block size; typically, it is 512 bytes. For example, if the LOG_BUFFER is 2,097,152 bytes (2MB) and the log block size is 512 bytes, then the default value for _LOG_IO_SIZE is 1,365 used log blocks. At this size, the LGWR process becomes lazy and normally writes only on transaction terminations (sync writes) or when it wakes up from its three-second timeouts. You should set the _LOG_IO_SIZE at the equivalent of 64K. That way, you can still have a larger log buffer to accommodate the spikes for buffer space after checkpoints, but the writes will start when there is 64K worth of redo entries in the buffer, assuming there is no user commit or rollback and the LGWR sleep hasn't timed out during that time.
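
To see the numbers for your own instance, a sketch along these lines shows the log block size, the default _LOG_IO_SIZE (one-third of the LOG_BUFFER, capped at 1MB), and the number of log blocks equivalent to 64K (X$ views require SYS access; the arithmetic simply mirrors the description above):

-- Log block size, default one-third threshold, and 64K equivalent in log blocks.
 select c.lebsz as log_block_size,
trunc(least(p.value / 3, 1048576) / c.lebsz) as default_log_io_size,
65536 / c.lebsz as blocks_for_64k
from x$kccle c,
(select value from v$parameter where name = 'log_buffer') p
where c.lenum = 1;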

Note

This method is not without overhead. The LGWR write operation requires the redo copy and redo writing latches, so a more active LGWR process will increase the load on these latches. Do not reduce the _LOG_IO_SIZE if these latches currently have high SLEEPS. If conditions do allow you to change the _LOG_IO_SIZE, you must monitor its impact over time by querying the V$LATCH view. Make sure you obtain a baseline before implementing the change.
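
A baseline query might look like this (a sketch; compare the GETS, MISSES, and SLEEPS counts before and after the change):

-- Baseline the redo-related latches before and after changing _LOG_IO_SIZE.
 select name, gets, misses, sleeps
from v$latch
where name in ('redo copy','redo writing');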

You can use the following query to find the average number of redo log blocks per write and the average LGWR I/O size in bytes:

 select round((a.value / b.value) + 0.5,0) as avg_redo_blks_per_write, 
round((a.value / b.value) + 0.5,0) * c.lebsz as avg_io_size
from v$sysstat a, v$sysstat b, x$kccle c
where c.lenum = 1
and a.name = 'redo blocks written'
and b.name = 'redo writes';

AVG_REDO_BLKS_PER_WRITE AVG_IO_SIZE
----------------------- -----------
                      8        8192


