Evaluating Performance




NetBackup Server Performance

This section lists factors to consider when you evaluate the NetBackup server component of the NetBackup data transfer path and look for changes that may improve the overall performance of NetBackup.

  • Number and Size of Shared Data Buffers. By default, NetBackup uses eight shared data buffers for a multiplexed backup, 16 shared data buffers for a non-multiplexed backup, 12 shared data buffers for a multiplexed restore, and 16 shared data buffers for a non-multiplexed restore.

To change these settings, create the following file(s):

 <install_path>\NetBackup\db\config\NUMBER_DATA_BUFFERS
 <install_path>\NetBackup\db\config\NUMBER_DATA_BUFFERS_RESTORE

These files contain a single integer specifying the number of shared data buffers NetBackup will use.

If the NUMBER_DATA_BUFFERS file exists, its contents will be used to determine the number of shared data buffers to be used for multiplexed and non-multiplexed backups.

If the NUMBER_DATA_BUFFERS_RESTORE file exists, its contents will be used to determine the number of shared data buffers to be used for multiplexed restores.

By default, NetBackup uses 64 KB (65536 bytes) as the size of each shared data buffer. A single tape I/O operation is performed for each shared data buffer. Therefore, this size must not exceed the maximum block size for the tape device or operating system. For Windows systems, the maximum block size is generally 64 KB, although some customers use a larger value successfully.

Because each shared data buffer is written in a single tape I/O operation, the terms 'tape block size' and 'shared data buffer size' are synonymous in this context.

The NetBackup media server will query the tape device for its maximum block size, and cause the backup operation to fail if the shared data buffer size exceeds the value that is returned.

Note:

Some tape devices may not reliably return this information. Therefore, it is critical to perform both backup and restore testing if the shared data buffer size value is changed. If not all NetBackup media servers run in the same operating system environment, it is critical to test restores on each of the NetBackup media servers that may be involved in a restore operation. For example, if a UNIX NetBackup media server is used to write a backup to tape with a shared data buffer (block size) of 256 KB, then it is possible that a Windows NetBackup media server will not be able to read that tape. In general, we strongly recommend that you test restore as well as backup operations.

To change the size of the shared data buffers, create the following file:

 <install_path>\NetBackup\db\config\SIZE_DATA_BUFFERS 

This file contains a single integer specifying the size of each shared data buffer in bytes. For example, to use a shared data buffer size of 32 KB, the file would contain the integer 32768.

Note that the size of the shared data buffers used for a restore operation is determined by the size of the shared data buffers in use at the time the backup was written. This file is not used by restores.

In general, the number and size of the shared data buffers can be used to calculate the amount of shared memory required by NetBackup using this formula:

(number_data_buffers * size_data_buffers) * number_tape_drives * max_multiplexing_setting

For example, assume that the number of shared data buffers is 16, the size of the shared data buffers is 64 KB, there are two tape drives, and the maximum multiplexing setting is four. Following the formula above, the amount of shared memory required by NetBackup is:

 (65536 * 16) * 2 * 4 = 8 MB 

See below for information about how to determine if you should change these settings.
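The shared memory formula above can be sketched in a few lines of Python. The function name and parameter names are assumptions chosen for illustration; only the formula itself comes from the text.

```python
def shared_memory_bytes(number_data_buffers: int,
                        size_data_buffers: int,
                        number_tape_drives: int,
                        max_multiplexing_setting: int) -> int:
    """Return the shared memory estimate, in bytes, from the formula above."""
    per_stream = number_data_buffers * size_data_buffers
    return per_stream * number_tape_drives * max_multiplexing_setting


# Worked example from the text: 16 buffers of 64 KB, two drives, multiplexing of four.
required = shared_memory_bytes(16, 65536, 2, 4)
print(required // (1024 * 1024), "MB")
```

Doubling either the number or the size of the buffers doubles the estimate, so it is worth checking the result against the memory available on the media server before changing both at once.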

  • Parent/Child Delay Values. Although rarely changed, it is possible to modify the parent and child delay values for a process.

To change these values, create the following files:

 <install_path>\NetBackup\db\config\PARENT_DELAY
 <install_path>\NetBackup\db\config\CHILD_DELAY

These files contain a single integer specifying the value in milliseconds to be used for the delay corresponding to the name of the file. For example, to use a parent delay of 50 milliseconds, the PARENT_DELAY file would contain the integer 50.

See below for more information about how to determine if you should change these values.
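As a minimal sketch of how the tuning files described above might be created, the Python helper below writes a single integer to a named file in a configuration directory. The helper name write_tuning_file, its positive-value check, and the trailing newline are assumptions made for illustration; the directory you pass in should be your actual <install_path>\NetBackup\db\config directory.

```python
from pathlib import Path


def write_tuning_file(config_dir: Path, name: str, value: int) -> Path:
    """Write one integer to config_dir/name and return the resulting path.

    Hypothetical helper: the text above says only that each file contains
    a single integer. The positive-value check is this sketch's own sanity test.
    """
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    target = config_dir / name
    target.write_text(f"{value}\n")
    return target
```

For example, calling write_tuning_file(config_dir, "SIZE_DATA_BUFFERS", 32768) would produce the 32 KB setting described earlier, and write_tuning_file(config_dir, "PARENT_DELAY", 50) the 50-millisecond parent delay.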

The following section refers to the bptm process on the media server during backup and restore operations from a tape storage device. If you are backing up to or restoring from a disk storage device, substitute bpdm for bptm throughout the section. For example, to activate debug logging for a disk storage device, the following directory must be created:

 <install_path>\NetBackup\logs\bpdm 

Using NetBackup Wait and Delay Counters

During a backup or restore operation the NetBackup media server uses a set of shared data buffers to isolate the process of communicating with the tape from the process of interacting with the disk or network. Through the use of Wait and Delay counters, you can determine which process on the NetBackup media server, the data producer or the data consumer, has to wait more often.

Achieving a good balance between the data producer and the data consumer processes on the NetBackup media server is an important factor in achieving optimal performance from the NetBackup server component of the NetBackup data transfer path.

Understanding the Two-Part Communication Process

The two-part communication process differs depending on whether the operation is a backup or restore and whether the operation involves a local client or a remote client.

Local Clients

When the NetBackup media server and the NetBackup client are part of the same system, the NetBackup client is referred to as a local client.

  • Backup of Local Client. For a local client, the bpbkar32 process reads data from the disk during a backup and places it in the shared buffers. The bptm process reads the data from the shared buffer and writes it to tape.

  • Restore of Local Client. During a restore of a local client, the bptm process reads data from the tape and places it in the shared buffers. The tar32 process reads the data from the shared buffers and writes it to disk.

Remote Clients

When the NetBackup media server and the NetBackup client are part of two different systems, the NetBackup client is referred to as a remote client.

  • Backup of Remote Client. The bpbkar32 process on the remote client reads data from the disk and writes it to the network. Then a child bptm process on the media server receives data from the network and places it in the shared buffers. The parent bptm process on the media server reads the data from the shared buffers and writes it to tape.

  • Restore of Remote Client. During the restore of the remote client, the parent bptm process reads data from the tape and places it into the shared buffers. The child bptm process reads the data from the shared buffers and writes it to the network. The tar32 process on the remote client receives the data from the network and writes it to disk.

Roles of Processes during Backup and Restore Operations

When a process attempts to use a shared data buffer, it first verifies that the next buffer in order is in a correct state. A data producer needs an empty buffer, while a data consumer needs a full buffer. The following chart provides a mapping of processes and their roles during backup and restore operations:

 OPERATION        DATA PRODUCER    DATA CONSUMER
 Local Backup     bpbkar32         bptm
 Remote Backup    bptm (child)     bptm (parent)
 Local Restore    bptm             tar32
 Remote Restore   bptm (parent)    bptm (child)

If a full buffer is needed by the data consumer but is not available, the data consumer increments the Wait and Delay counters to indicate that it had to wait for a full buffer. After a delay, the data consumer will check again for a full buffer. If a full buffer is still not available, the data consumer increments the Delay counter to indicate that it had to delay again while waiting for a full buffer. The data consumer will repeat the delay and full buffer check steps until a full buffer is available.

This sequence is summarized in the following algorithm:

 while (Buffer_Is_Not_Full) {
     ++Wait_Counter;
     while (Buffer_Is_Not_Full) {
         ++Delay_Counter;
         delay (DELAY_DURATION);
     }
 }

If an empty buffer is needed by the data producer but is not available, the data producer increments the Wait and Delay counters to indicate that it had to wait for an empty buffer. After a delay, the data producer will check again for an empty buffer. If an empty buffer is still not available, the data producer increments the Delay counter to indicate that it had to delay again while waiting for an empty buffer. The data producer will repeat the delay and empty buffer check steps until an empty buffer is available.

The algorithm for a data producer has a similar structure:

 while (Buffer_Is_Not_Empty) {
     ++Wait_Counter;
     while (Buffer_Is_Not_Empty) {
         ++Delay_Counter;
         delay (DELAY_DURATION);
     }
 }

Analysis of the Wait and Delay counter values indicates which process, producer or consumer, has had to wait most often and for how long.
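The counting behavior of the two loops above can be illustrated with a small Python sketch. This is not NetBackup code: the function name wait_for_buffer, the buffer_ready callback, and the injectable sleep parameter are all assumptions introduced so the logic can be run and tested on its own.

```python
import time
from typing import Callable, Tuple


def wait_for_buffer(buffer_ready: Callable[[], bool],
                    delay_seconds: float,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, int]:
    """Block until buffer_ready() is true; return (wait_count, delay_count).

    Mirrors the pseudocode: one wait is counted per stall, and one delay is
    counted for every sleep taken while the buffer is still in the wrong state.
    """
    wait_count = 0
    delay_count = 0
    while not buffer_ready():
        wait_count += 1
        while not buffer_ready():
            delay_count += 1
            sleep(delay_seconds)
    return wait_count, delay_count
```

A producer would pass a callback that reports whether the next buffer is empty, and a consumer one that reports whether it is full. A single stall always adds one to the Wait counter and at least one to the Delay counter, which is why the Delay value is never smaller than the Wait value.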

There are four basic Wait and Delay Counter relationships:

  • Data Producer >> Data Consumer. The data producer has substantially larger Wait and Delay counter values than the data consumer.

    The data consumer is unable to receive data fast enough to keep the data producer busy. Investigate means to improve the performance of the data consumer. For a backup operation, check if the data buffer size is appropriate for the tape drive being used (see below).

    If the data consumer still has a substantially large value in this case, try increasing the number of shared data buffers to improve performance (see below).

  • Data Producer = Data Consumer. The data producer and the data consumer have very similar Wait and Delay counter values, but those values are relatively large.

    This may indicate that the data producer and data consumer are regularly attempting to use the same shared data buffer. Try increasing the number of shared data buffers to improve performance (see below).

  • Data Producer = Data Consumer. The data producer and the data consumer have very similar Wait and Delay counter values, but those values are relatively small.

    This indicates that there is a good balance between the data producer and data consumer, which should yield good performance from the NetBackup server component of the NetBackup data transfer path.

  • Data Producer << Data Consumer. The data producer has substantially smaller Wait and Delay counter values than the data consumer.

    The data producer is unable to deliver data fast enough to keep the data consumer busy. Investigate means to improve the performance of the data producer. For a restore operation, check if the data buffer size (see below) is appropriate for the tape drive being used.

    If the data producer still has a relatively large value in this case, try increasing the number of shared data buffers to improve performance (see below).

The points above describe the four basic relationships possible. Of primary concern is the relationship and the size of the values. Information on determining substantial versus trivial values appears later in this section. The relationship of these values only provides a starting point in the analysis. Additional investigative work may be needed to positively identify the cause of a bottleneck within the NetBackup data transfer path.
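To make the four relationships concrete, the following Python sketch sorts a pair of Delay counter values into one of them. The thresholds are purely hypothetical: the text gives no fixed cutoffs, so imbalance_factor (how many times larger one side must be) and large_fraction (what share of the transferred buffers counts as "relatively large") are assumptions you would tune for your own environment. The function compares Delay counters only, which is a simplification.

```python
def classify_balance(producer_delays: int,
                     consumer_delays: int,
                     buffers_transferred: int,
                     imbalance_factor: float = 10.0,   # hypothetical threshold
                     large_fraction: float = 0.05) -> str:  # hypothetical threshold
    """Return which of the four producer/consumer relationships applies."""
    if producer_delays > imbalance_factor * max(consumer_delays, 1):
        return "producer >> consumer"   # look at the data consumer
    if consumer_delays > imbalance_factor * max(producer_delays, 1):
        return "producer << consumer"   # look at the data producer
    largest = max(producer_delays, consumer_delays)
    if largest > large_fraction * buffers_transferred:
        return "producer = consumer (large)"  # consider more shared buffers
    return "producer = consumer (small)"      # well balanced
```

The result is only a starting point, in the same sense as the relationships themselves: it tells you which side to investigate first, not what the underlying bottleneck is.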

Determining Wait and Delay Counter Values

Wait and Delay counter values can be found by creating and reading debug log files on the NetBackup media server.

Note:

Writing the debug log files introduces some additional overhead and will have a small impact on the overall performance of NetBackup. This impact will be more noticeable for a high verbose level setting. Normally, you should not need to run with debug logging enabled on a production system.

To determine Wait and Delay counter values for a local client backup:

  1. Activate debug logging by creating these two directories on the media server:

     <install_path>\NetBackup\Logs\bpbkar
     <install_path>\NetBackup\Logs\bptm

  2. Execute your backup.

  3. Look at the log for the data producer (bpbkar32) process in:

     <install_path>\NetBackup\Logs\bpbkar 

    The line you are looking for should be similar to the following, and will have a timestamp corresponding to the completion time of the backup:

      waited 224 times for empty buffer, delayed 254 times 

    In this example the Wait counter value is 224 and the Delay counter value is 254.

  4. Look at the log for the data consumer (bptm) process in:

     <install_path>\NetBackup\Logs\bptm 

    The line you are looking for should be similar to the following, and will have a timestamp corresponding to the completion time of the backup:

      waited for full buffer 1 times, delayed 22 times 

    In this example, the Wait counter value is 1 and the Delay counter value is 22.

To determine Wait and Delay counter values for a remote client backup:

  1. Activate debug logging by creating this directory on the media server:

     <install_path>\NetBackup\Logs\bptm 

  2. Execute your backup.

  3. Look at the log for the bptm process in:

     <install_path>\NetBackup\Logs\bptm 

    Delays associated with the data producer (bptm child) process will appear as follows:

      waited for empty buffer 22 times, delayed 151 times,  

    In this example, the Wait counter value is 22 and the Delay counter value is 151.

    Delays associated with the data consumer (bptm parent) process will appear as:

      waited for full buffer 12 times, delayed 69 times 

    In this example the Wait counter value is 12, and the Delay counter value is 69.

To determine Wait and Delay counter values for a local client restore:

  1. Activate logging by creating the following two directories on the NetBackup media server:

     <install_path>\NetBackup\Logs\bptm 

    and

     <install_path>\NetBackup\Logs\tar 

  2. Execute your restore.

  3. Look at the log for the data consumer (tar32) process in:

     <install_path>\NetBackup\Logs\tar 

    The line you are looking for should be similar to the following, and will have a timestamp corresponding to the completion time of the restore:

      waited for full buffer 27 times, delayed 79 times 

    In this example, the Wait counter value is 27, and the Delay counter value is 79.

  4. Look at the log for the data producer (bptm) process in:

     <install_path>\NetBackup\Logs\bptm 

    The line you are looking for should be similar to the following, and will have a timestamp corresponding to the completion time of the restore:

      waited for empty buffer 1 times, delayed 68 times 

    In this example, the Wait counter value is 1 and the Delay counter value is 68.

To determine Wait and Delay counter values for a remote client restore:

  1. Activate debug logging by creating the following directory on the media server:

     <install_path>\NetBackup\Logs\bptm 

  2. Execute your restore.

  3. Look at the log for bptm in:

     <install_path>\NetBackup\Logs\bptm 

  4. Delays associated with the data consumer (bptm child) process will appear as follows:

      waited for full buffer 36 times, delayed 139 times 

    In this example, the Wait counter value is 36 and the Delay counter value is 139.

    Delays associated with the data producer (bptm parent) process will appear as follows:

      waited for empty buffer 95 times, delayed 513 times 

    In this example the Wait counter value is 95 and the Delay counter value is 513.

Note:

When you run multiple tests, you can rename the current log file. NetBackup will automatically create a new log file, which prevents you from erroneously reading the wrong set of values.

Deleting the debug log file will not stop NetBackup from generating the debug logs. You must delete the entire directory. For example, to stop bptm logging, you must delete the bptm subdirectory. NetBackup will automatically generate debug logs at the specified verbose setting whenever the directory is detected.
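When many log files need to be checked, the counter lines shown in the procedures above can be extracted with a short script. The Python sketch below is an assumption-laden illustration: the regular expression is built only from the sample lines in this section (both word orders), and real log lines may carry prefixes or variations it does not anticipate. The names Counters and parse_counters are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches both "waited 224 times for empty buffer, delayed 254 times"
# and "waited for full buffer 1 times, delayed 22 times".
_COUNTER_RE = re.compile(
    r"waited (?:(?P<w1>\d+) times for (?P<k1>empty|full) buffer"
    r"|for (?P<k2>empty|full) buffer (?P<w2>\d+) times)"
    r", delayed (?P<d>\d+) times"
)


class Counters(NamedTuple):
    buffer_kind: str  # "empty" means data producer, "full" means data consumer
    waits: int
    delays: int


def parse_counters(line: str) -> Optional[Counters]:
    """Return the counters found in one log line, or None if there are none."""
    match = _COUNTER_RE.search(line)
    if match is None:
        return None
    kind = match.group("k1") or match.group("k2")
    waits = int(match.group("w1") or match.group("w2"))
    return Counters(kind, waits, int(match.group("d")))
```

Because a producer waits for empty buffers and a consumer waits for full ones, the buffer_kind field is enough to tell the two roles apart when both appear in the same bptm log.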

Using Wait and Delay Counter Values to Analyze Problems

You can use the bptm debug log file to verify that the tunable parameters have been set to the desired values. You can then use these parameters together with the Wait and Delay counter values to analyze problems. The parameters reported in the log include:

  • Data buffer size. The size of each shared data buffer can be found on a line similar to:

      io_init: using 65536 data buffer size 

  • Number of data buffers. The number of shared data buffers may be found on a line similar to:

      io_init: using 16 data buffers 

  • Parent/child delay values. The values in use for the duration of the parent and child delays can be found on a line similar to:

      io_init: child delay = 20, parent delay = 30 (milliseconds) 

  • NetBackup Media Server Network Buffer Size. The values in use for the Network Buffer Size parameter on the media server can be found on lines similar to these (may only be part of 4.5 debug log files):

    The receive network buffer is used by the bptm child process to read from the network during a remote backup.

     setting receive network buffer to 263168 bytes 

    The send network buffer is used by the bptm child process to write to the network during a remote restore.

     setting send network buffer to 131072 bytes 

    See the section on NetBackup Network Performance for more information about the Network Buffer Size parameter on the media server.
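The io_init lines listed above can be collected into a single dictionary for quick verification. This Python sketch assumes the log lines look like the samples in this section; the function name parse_io_init and the dictionary keys are hypothetical, and the network buffer lines are left out for brevity.

```python
import re
from typing import Dict, Iterable

# Patterns derived only from the sample lines shown in this section.
_PATTERNS = {
    "size_data_buffers": re.compile(r"io_init: using (\d+) data buffer size"),
    "number_data_buffers": re.compile(r"io_init: using (\d+) data buffers"),
    "child_delay_ms": re.compile(r"io_init: child delay = (\d+)"),
    "parent_delay_ms": re.compile(r"parent delay = (\d+) \(milliseconds\)"),
}


def parse_io_init(lines: Iterable[str]) -> Dict[str, int]:
    """Return the tunable values found in the given log lines.

    If a setting appears more than once, the last occurrence wins.
    """
    found: Dict[str, int] = {}
    for line in lines:
        for key, pattern in _PATTERNS.items():
            match = pattern.search(line)
            if match:
                found[key] = int(match.group(1))
    return found
```

Comparing the returned values with the contents of the tuning files is a quick way to confirm that a change actually took effect before you rerun a test.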

Suppose you wanted to analyze a local backup in which there was a 30-minute data transfer duration baselined at 5 MB/sec with a total data transfer of 9,000 MB. Because a local backup is involved, if you refer to the table under 'Roles of Processes during Backup and Restore Operations,' you can determine that bpbkar32 is the data producer and bptm is the data consumer.

You would next want to determine the Wait and Delay values for bpbkar32 and bptm by following the procedures described in the section 'Determining Wait and Delay Counter Values.' For this example, suppose those values were:

 PROCESS     WAIT     DELAY
 bpbkar32    29364    58033
 bptm        95       105

Using these values, you can determine that the bpbkar32 process is being forced to wait by a bptm process which cannot move data out of the shared buffer fast enough.

Next, you can determine time lost due to delays by multiplying the Delay counter value by the parent or child delay value, whichever applies.

In this example, the bpbkar32 process uses the child delay value, while the bptm process uses the parent delay value. (The defaults for these values are 20 for child delay and 30 for parent delay.) The values are specified in milliseconds. See 'Parent/Child Delay Values' under the 'NetBackup Server Performance' section for more information on how to modify these values.

Use the following equations to determine the amount of time lost due to these delays:

 bpbkar32 = 58033 delays * 0.020 seconds
          = 1160 seconds
          = 19 minutes 20 seconds

 bptm     = 105 delays * 0.030 seconds
          = 3 seconds

This calculation shows that the delay duration for the bpbkar32 process is significant. If this delay were entirely removed, the resulting transfer time of 10 minutes 40 seconds (the total transfer time of 30 minutes minus the delay of 19 minutes 20 seconds) would indicate a throughput of about 14 MB/sec, nearly a threefold increase over the 5 MB/sec baseline. A performance increase of this size would warrant expending effort to investigate how the tape drive performance can be improved.
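The time-lost and projected-throughput arithmetic above can be reproduced with the following Python sketch. The function names are assumptions for illustration; the inputs are the example values from the text.

```python
def delay_time_seconds(delay_count: int, delay_ms: int) -> float:
    """Time lost to delays: Delay counter value times the delay in milliseconds."""
    return delay_count * delay_ms / 1000.0


def projected_throughput_mb_per_sec(total_mb: float,
                                    transfer_seconds: float,
                                    lost_seconds: float) -> float:
    """Throughput if the time lost to delays were removed entirely."""
    return total_mb / (transfer_seconds - lost_seconds)


bpbkar32_lost = delay_time_seconds(58033, 20)   # child delay of 20 ms
bptm_lost = delay_time_seconds(105, 30)         # parent delay of 30 ms
best_case = projected_throughput_mb_per_sec(9000, 30 * 60, bpbkar32_lost)
print(f"bpbkar32 lost {bpbkar32_lost:.0f} s, bptm lost {bptm_lost:.0f} s, "
      f"best case {best_case:.1f} MB/sec")
```

The projected figure is an upper bound, since it assumes every delay could be eliminated; in practice it indicates how much there is to gain rather than what tuning will deliver.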

The number of delays should be interpreted within the context of how much data was moved. As the amount of data moved increases, the significance threshold for counter values increases as well.

Again, using the example of a total of 9,000 MB of data being transferred, assume a 64 KB buffer size. You can determine the total number of buffers to be transferred using the following equation:

 Number_Kbytes = 9,000 * 1024
               = 9,216,000 KB

 Number_Slots  = 9,216,000 / 64
               = 144,000

The Wait counter value can now be expressed as a percentage of the total number of buffers transferred:

 bpbkar32 = 29364 / 144,000
          = 20.39%

 bptm     = 95 / 144,000
          = 0.07%

In this example, in about 20 percent of the cases where the bpbkar32 process needed an empty shared data buffer, that buffer had not yet been emptied by the bptm process. A value this large indicates a serious problem, and additional investigation would be warranted to determine why the data consumer (bptm) is having problems keeping up.

In contrast, the delays experienced by bptm are insignificant for the amount of data transferred.

You can also view the Delay and Wait counters as a ratio:

 bpbkar32 = 58033 / 29364
          = 1.98

In this example, on average the bpbkar32 process had to delay twice for each wait condition that was encountered. If this ratio is substantially large, you may wish to consider increasing the parent or child delay value, whichever one applies, to avoid the unnecessary overhead of checking for a shared data buffer in the correct state too often. Conversely, if this ratio is close to 1, you may wish to consider reducing the applicable delay value to check more often and see if that increases your data throughput performance. Keep in mind that the parent and child delay values are rarely changed in most NetBackup installations.
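The buffer count, wait percentage, and delay-to-wait ratio worked out above can be checked with a short Python sketch. The function names are assumptions for illustration, and the zero-wait guard is this sketch's own choice rather than anything stated in the text.

```python
def buffers_transferred(total_mb: float, buffer_size_kb: int) -> float:
    """Number of shared data buffers needed to move total_mb of data."""
    return total_mb * 1024 / buffer_size_kb


def wait_percentage(wait_count: int, buffer_count: float) -> float:
    """Wait counter value as a percentage of the buffers transferred."""
    return 100.0 * wait_count / buffer_count


def delays_per_wait(delay_count: int, wait_count: int) -> float:
    """Average number of delays per wait condition (0.0 if no waits occurred)."""
    if wait_count == 0:
        return 0.0
    return delay_count / wait_count


slots = buffers_transferred(9000, 64)
print(f"buffers transferred: {slots:,.0f}")
print(f"bpbkar32 waits: {wait_percentage(29364, slots):.2f}%")
print(f"bptm waits:     {wait_percentage(95, slots):.2f}%")
print(f"bpbkar32 delays per wait: {delays_per_wait(58033, 29364):.2f}")
```

Expressing the counters relative to the number of buffers moved is what makes results from backups of different sizes comparable.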

The preceding information explains how to determine if the values for Wait and Delay counters are substantial enough for concern. The Wait and Delay counters are related to the size of data transfer. A value of 1,000 may be extreme when only 1 MB of data is being moved. The same value may indicate a well-tuned system when gigabytes of data are being moved. The final analysis must determine how these counters affect performance by considering such factors as how much time is being lost and what percentage of time a process is being forced to delay.

Common Restore Performance Issues

This section details performance problems often seen with restore actions.

  • Improper Multiplex Settings. If multiplexing is too high, needless tape searching may occur. The ideal setting is the minimum needed to stream the drives.

  • NetBackup Catalog Performance. The disk subsystem where the NetBackup catalog resides has a large impact on the overall performance of NetBackup. To improve restore performance, configure this subsystem for fast reads.

  • Fragment Size. The fragment size affects where tape marks are placed and how many tape marks are used. Fewer tape marks can slow restores if fast locate block positioning is not available; SCSI fast locate block positioning can help. A typical fragment size setting is 2048 MB.

  • MPX_RESTORE_DELAY setting. NetBackup can perform multiple restores at the same time from a single multiplexed tape. The default delay setting is 30 seconds. If multiple restore requests are submitted within the time window indicated by this delay setting, they will be considered as candidates to be run at the same time, if possible. This may be a useful parameter to change if multiple stripes from a large database backup are multiplexed together on the same tape.






Implementing Backup and Recovery: The Readiness Guide for the Enterprise
ISBN: 0471227145
EAN: 2147483647
Year: 2005
Pages: 176
