Sampling for Performance Data Using PL/SQL Procedure

The database logoff trigger method for collecting historical performance data comes close to meeting all the requirements for root cause analysis established previously, except it lacks fine-grain data and a SQL statement repository. It looks like you need something between the trace event 10046 and the database logoff trigger. We can live with near fine-grain data, but we must absolutely have low overhead ongoing monitoring capability as well as wait event and SQL statement repositories. To achieve this, sampling may be the best approach. If you can sample every foreground process that connects to the instance at regular intervals and write the data to repository tables, you will have a history of what each process does in the database from about the time it starts to the time it completes. This ability is critical; otherwise you have no history and remain in the dark. The goal is to collect just enough information for root cause analysis.

Some of you may have heard that the Oracle Database 10g Active Session History (ASH) feature also samples for performance data. That is correct, and we are glad that Oracle is on the same page as we are. The sampling methodology that we are about to share with you is proven and has been implemented in many of our databases since Oracle 7.3.4. The DBAs of those databases are less dependent on the trace event 10046 because the data collected through sampling is sufficient for them to perform root cause analyses. This sampling methodology does not replace the event 10046 trace facility; rather, it complements it. Those of you who are not using Oracle Database 10g can certainly benefit from this method.

We will now show you how to develop this performance data collector in PL/SQL. We will guide you step by step, and provide examples. For the purpose of our discussion, we will refer to this data collector as DC. Following are four areas you must consider when developing DC:

  • Data source

  • Sampling frequency

  • Repositories

  • Events to monitor

Data Source

The best data source for this task is the V$SESSION_WAIT view. This view shows the resources or events for which active sessions are currently waiting. The quality of data that it offers is fine grain, which is most suitable for root cause analysis.

 Name                 Type            Notes
 -------------------- --------------- ----------------------
 SID                  NUMBER
 SEQ#                 NUMBER
 EVENT                VARCHAR2(64)
 P1TEXT               VARCHAR2(64)
 P1                   NUMBER
 P1RAW                RAW(8)
 P2TEXT               VARCHAR2(64)
 P2                   NUMBER
 P2RAW                RAW(8)
 P3TEXT               VARCHAR2(64)
 P3                   NUMBER
 P3RAW                RAW(8)
 WAIT_CLASS_ID        NUMBER          In Oracle Database 10g
 WAIT_CLASS#          NUMBER          In Oracle Database 10g
 WAIT_CLASS           VARCHAR2(64)    In Oracle Database 10g
 WAIT_TIME            NUMBER
 SECONDS_IN_WAIT      NUMBER
 STATE                VARCHAR2(19)

Sampling Frequency

The frequency of sampling affects the quantity of the historical data and determines the type of scheduler. You must choose a sampling frequency that is appropriate for your environment. A batch process can tolerate lower frequency sampling, while an OLTP process needs higher frequency sampling. Lower sampling frequencies yield less data; higher frequencies yield more.

If you choose a sampling frequency of less than a minute, you will probably use a simple Unix shell script that wakes up at the preset interval to execute the DC procedure and collect performance data. But if you choose a low sampling frequency, such as one-minute intervals, you can also use the Unix cron or an Oracle SNP process to kick off the DC procedure. One drawback of the Oracle SNP process, however, is that it is not reliable; it is notorious for sleeping on the job, causing loss of valuable data.

DBAs have a tendency to choose high-frequency sampling, and some may be shocked by the one-minute intervals. Again, don't forget that the goal is to collect just enough information for root cause analysis and not run into a space problem as you would with the trace event 10046. In batch environments where jobs usually run for over half an hour, it is sufficient to take a snapshot of the V$SESSION_WAIT view at the top of every minute. This provides a minute-by-minute picture or history of what happened to every process in the database. If you supplement this sampling with the database logoff trigger discussed earlier, you will get both the summarized wait events and the detailed minute-by-minute events of all sessions, and you should be able to perform root cause analysis. If your users are unhappy with the performance of a short-running job (say, less than 15 minutes), you can still use the trace event 10046 to trace the session.
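As an illustration of one-minute scheduling, the DC procedure can be submitted to the Oracle job queue (which is serviced by the SNP processes) using the DBMS_JOB package. This is only a sketch; the interval expression is one choice among many, and the SNP reliability caveat mentioned earlier still applies:

```sql
-- Sketch: schedule the DC procedure at the top of every minute via DBMS_JOB.
variable jobno number
begin
  dbms_job.submit(
    job       => :jobno,
    what      => 'DC;',                          -- the collector procedure
    next_date => trunc(sysdate,'MI') + 1/1440,   -- top of the next minute
    interval  => 'trunc(sysdate,''MI'') + 1/1440'
  );
  commit;
end;
/
```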

Before we get into the crux of what to monitor, we want to show you two examples to give you an idea of how this sampling method can help you identify the root causes of performance problems.

In the following real-life example, user ULN8688 wanted to know why his job was taking an unusually long time to complete. He assured the DBA that he didn't make any code changes. He said the same code had been running for over a year and suddenly it was very slow. The on-call DBA pulled out the minute-by-minute history for that job as shown below and discovered that it was performing full table scans on the ADJUSTMENTS table. The DBA also discovered that all indexes were invalid. It turned out that another DBA had reorganized the table the night before with the ALTER TABLE MOVE command and had not rebuilt the indexes.

            SmplTme
Usernam SID DD/HHMI EVENT             OBJECT_NAME         HASH_VALUE
------- --- ------- ----------------- ------------------- ----------
ULN8688   9 31/0808 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0809 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0810 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0811 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0812 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0813 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0814 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0815 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0816 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0817 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0818 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0819 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0820 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0821 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0822 db file scattered ADJUSTMENTS          159831610
ULN8688   9 31/0823 db file scattered ADJUSTMENTS          159831610
. . .

Following is another real-life example of a minute-by-minute history of a process that belonged to the GATEWAY user. The process began execution shortly before 1:59 A.M. The first event captured was the latch free event. A quick glance over the report showed the latch free was the primary wait event in terms of the number of occurrences. By the way, this event also ranked the highest in terms of time waited according to the summarized wait event data collected by the database logoff trigger.

            SmplTme
Usernam SID DD/HHMI EVENT              OBJECT_NAME         HASH_VALUE
------- --- ------- ------------------ ------------------- ----------
GATEWAY  13 04/0159 latch free         cache buffers chain  249098659
GATEWAY  13 04/0200 latch free         cache buffers chain  249098659
GATEWAY  13 04/0201 db file sequential PRODUCT_MASTER      4120235380
GATEWAY  13 04/0202 SQL*Net more data                       249098659
GATEWAY  13 04/0203 latch free         cache buffers chain  249098659
GATEWAY  13 04/0204 SQL*Net more data                       249098659
GATEWAY  13 04/0205 latch free         cache buffers chain 4120235380
GATEWAY  13 04/0206 latch free         cache buffers chain  249098659
GATEWAY  13 04/0207 db file sequential ORDERDATA_IX04       249098659
GATEWAY  13 04/0208 db file sequential ORDERDATA_IX04       249098659
GATEWAY  13 04/0209 latch free         cache buffers chain 4120235380
GATEWAY  13 04/0210 latch free         cache buffers chain  249098659
GATEWAY  13 04/0211 latch free         cache buffers chain  249098659
GATEWAY  13 04/0212 db file sequential PRODUCT_MASTER_IX02  249098659
GATEWAY  13 04/0213 enqueue            TX4                  249098659
GATEWAY  13 04/0214 latch free         cache buffers chain 4120235380
GATEWAY  13 04/0215 db file sequential ORDERDATA_IX04       249098659
GATEWAY  13 04/0216 enqueue            TX4                  249098659
GATEWAY  13 04/0217 latch free         cache buffers chain  249098659
GATEWAY  13 04/0218 db file sequential ORDERDATA_IX04       249098659
GATEWAY  13 04/0219 latch free         cache buffers chain  249098659
GATEWAY  13 04/0220 latch free         cache buffers chain  249098659
GATEWAY  13 04/0221 db file sequential RMORDDTL_01          249098659
. . .

Also notice that the latch free waits were contentions for the cache buffers chains child latches. This was discovered by translating the P2 parameter value in the V$SESSION_WAIT view. We will show you how to do the translation in the "Events to Monitor" section. Based on the latch addresses (P1 parameter) that were collected, we also learned that the contention was focused on one particular child cache buffers chains latch. (The latch address is not shown in the preceding example due to space constraints.) This information was important because it helped the DBA better understand and deal with the problem: it showed that the sessions were competing for one or more blocks that were hashed to the same latch.

Another critical piece of data was the SQL hash value. The SQL statement with hash value 249098659 was a query, and it was associated with most of the latch free wait events and a few db file sequential read events on the ORDERDATA_IX04 object, which was an index. The DBA realized that it normally takes at least two active processes to create latch free contention, so the DBA queried the repository tables to see if there were concurrent processes executing the same SQL statement. The DBA found 14 concurrent processes, all executing the statement with that hash value. Bingo! That was a classic case of an application spawning multiple concurrent sessions to execute the same code.
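A query along the following lines can reveal such concurrency from the wait event repository. The table and column names here are illustrative assumptions, not a fixed design:

```sql
-- Count distinct sessions waiting on the same SQL hash value in each
-- sampling interval (repository names are hypothetical).
select sample_time, sql_hash_value,
       count(distinct sid) concurrent_sessions
from   session_wait_history
where  sql_hash_value = 249098659
group  by sample_time, sql_hash_value
having count(distinct sid) > 1
order  by sample_time;
```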

Many application designers and developers fail to understand that only one process can acquire an Oracle latch at any one time and when a process holds a latch, it prevents all other processes from accessing any data blocks that are hashed to the same latch. In their logic, if a single-threaded process completes a task in an hour, then the same task can be completed in six minutes by ten parallel processes. This logic is too simplistic and does not always work when latches are involved.

Repositories

Repository tables are the core components of this sampling methodology. They allow you to look back in time to discover what transpired in the database. The DC data collector populates these tables at predetermined intervals. The following repositories are recommended. You may normalize or denormalize them as you see fit.

  • Wait events repository Keeps track of the wait events that appear in the V$SESSION_WAIT view as encountered by each session chronologically. Therefore, the wait event repository table will have the same attributes as the V$SESSION_WAIT view. Additionally, it should include attributes that define:

    • The sampling date and time.

    • The hash value of the SQL statement that is associated with the wait event.

    • The key that identifies the session (not just the SID, because SIDs are recycled when sessions log off and are not always associated with the same database users). The key that uniquely identifies a session comprises SID, SERIAL#, and LOGON_TIME, and can be extended to include USERNAME, OSUSER, PADDR (process address), and PROCESS.

    • The database object name that belongs to the wait event. This may be the table name, table partition name, index name, or index partition name for events such as db file scattered read , db file sequential read , buffer busy waits, and free buffer waits; or the latch name for the latch free wait event; or the enqueue name for the enqueue wait event; or the SQL text, procedure name, function name, or package name for events such as library cache pin, library cache lock, and library cache load lock; or the file name for events such as direct path read and direct path write; and so on. The reason this is so important is that it enhances the readability and user-friendliness of the history. Otherwise, the history will show only P1, P2, and P3 values much like the event 10046 trace files, and you will waste a lot of precious troubleshooting time in translating those values, assuming the objects still exist in the database. You can see what we mean by this in the OBJECT_NAME column of the two minute-by-minute examples we discussed earlier.

    • The primary key of the wait event repository table.

  • SQL statement repository Wait events by themselves are of little value. You must evaluate wait events with the SQL statements that generate them. This means each time you capture a wait event, you must also capture the SQL statement that is associated with (or causing) the wait event. Remember, the wait event wouldn't have occurred without the SQL statement. Together, they can help define the context and get you closer to the real problem. For example, a process waits on the buffer busy waits wait event for a data block that belongs to the EMP table, according to the P1 and P2 parameter values. Let's say the P3 parameter value is 220, which means multiple sessions are competing to perform DML operations within the same block. Now, that can be an insert, update, or delete statement. You can't be certain unless you have the SQL statement that is associated with that buffer busy waits wait event. Another important reason for capturing SQL statements is so that you can take them back to the application team and the developers can correctly identify the modules. The SQL statement repository should have the same attributes as the V$SQLTEXT view. In addition, for each SQL statement, the repository should include statistics such as DISK_READS and BUFFER_GETS from the V$SQLAREA view, and the application module name from the V$SESSION view. Of course, don't forget the primary key for the repository itself.
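As a starting point, the two repositories might be created along these lines. The table names, column widths, and the choice to denormalize are assumptions you should adapt to your environment:

```sql
-- Hypothetical wait event repository: one row per sampled wait.
create table session_wait_history (
  snap_id         number,         -- primary key, e.g. from a sequence
  sample_time     date,
  sid             number,         -- SID + SERIAL# + LOGON_TIME
  serial#         number,         -- uniquely identify a session
  logon_time      date,
  username        varchar2(30),
  osuser          varchar2(30),
  paddr           raw(8),
  process         varchar2(12),
  event           varchar2(64),
  p1text          varchar2(64),
  p1              number,
  p1raw           raw(8),
  p2text          varchar2(64),
  p2              number,
  p2raw           raw(8),
  p3text          varchar2(64),
  p3              number,
  p3raw           raw(8),
  wait_time       number,
  seconds_in_wait number,
  state           varchar2(19),
  object_name     varchar2(100),  -- translated from P1/P2 where applicable
  sql_hash_value  number
);

-- Hypothetical SQL statement repository: mirrors V$SQLTEXT plus statistics.
create table sql_text_history (
  snap_id        number,          -- primary key
  sample_time    date,
  hash_value     number,
  address        raw(8),
  piece          number,
  sql_text       varchar2(64),
  disk_reads     number,          -- from V$SQLAREA
  buffer_gets    number,          -- from V$SQLAREA
  module         varchar2(48)     -- from V$SESSION
);
```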

Using the information from the wait event and SQL statement repositories, you can investigate what users were doing in the database and the kind of bottlenecks they encountered. The bottlenecks can lead you to the root cause of the performance problem.

Events to Monitor

Lastly, you must decide what wait events you want to monitor. Don't be too aggressive in the beginning. Start with a few common wait events to get a feel for the historical data and then gradually add new wait events to the list. Fortunately, there are only a handful of common wait events, and they are sufficient for most diagnostics. There are also idle events that you may ignore; we discussed this in Chapter 2. You can begin by making your own list of events that you want to monitor, or simply use the following list:

  • db file sequential read

  • db file scattered read

  • latch free

  • direct path read

  • direct path write

  • enqueue

  • library cache pin

  • buffer busy waits

  • free buffer waits

Once you have decided on the list, it is time to start coding! After you have created your own tool, you can take it with you anywhere you go, and you don't have to rely on expensive monitoring tools. Begin by setting up a PL/SQL cursor that camps at the V$SESSION_WAIT view to collect and translate the relevant wait events. You may join the V$SESSION_WAIT view to the V$SESSION view to get the user's information. An example of the cursor follows:

 -- Remark: This is an incomplete procedure.
create or replace procedure DC
as
  cursor current_event is
    select a.sid, a.seq#, a.event, a.p1text,
           a.p1, a.p1raw, a.p2text, a.p2,
           a.p2raw, a.p3text, a.p3, a.p3raw,
           a.wait_time, a.seconds_in_wait, a.state, b.serial#,
           b.username, b.osuser, b.paddr, b.logon_time,
           b.process, b.sql_hash_value, b.saddr, b.module,
           b.row_wait_obj#, b.row_wait_file#, b.row_wait_block#,
           b.row_wait_row#
    from   v$session_wait a, v$session b
    where  a.sid = b.sid
    and    b.username is not null
    and    b.type <> 'BACKGROUND'
    and    a.event in (
           'db file sequential read',
           'db file scattered read',
           'latch free',
           'direct path read',
           'direct path write',
           'enqueue',
           'library cache pin',
           'library cache load lock',
           'buffer busy waits',
           'free buffer waits');
. . .

Using the SQL hash value from the CURRENT_EVENT cursor, you can capture the SQL text from the V$SQLTEXT view as shown next and store it in the appropriate history table. If you are also interested in the SQL statistics, you can get the information from the V$SQLAREA view using the same SQL hash value from the cursor.

 -- To extract the SQL text for a given hash value. 
select hash_value, address, piece, sql_text
from v$sqltext
where hash_value = <cursor hash value>
order by piece;
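To give an idea of how the pieces fit together, the body of DC might loop over the cursor and populate the repositories roughly as follows. The repository table names and column lists are illustrative assumptions, and translation of the event parameters (covered next) is omitted for brevity:

```sql
-- Sketch of the DC procedure body (not production code): record each
-- sampled wait and the SQL text associated with it.
begin
  for r in current_event loop
    insert into session_wait_history
      (sample_time, sid, serial#, logon_time, username, osuser,
       event, p1, p2, p3, wait_time, seconds_in_wait, state,
       sql_hash_value)
    values
      (sysdate, r.sid, r.serial#, r.logon_time, r.username, r.osuser,
       r.event, r.p1, r.p2, r.p3, r.wait_time, r.seconds_in_wait, r.state,
       r.sql_hash_value);

    insert into sql_text_history
      (sample_time, hash_value, address, piece, sql_text)
    select sysdate, hash_value, address, piece, sql_text
    from   v$sqltext
    where  hash_value = r.sql_hash_value;
  end loop;
  commit;
end DC;
/
```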

For each wait event supplied by the CURRENT_EVENT cursor, the DC procedure should translate the event parameters into meaningful names. For example, in the case of the db file scattered read wait event, the DC procedure should translate the P1 and P2 parameters into the object name and partition name if applicable and store the information in the history table. This value-added feature not only improves the readability of the history but, more importantly, it also allows you to focus on troubleshooting. Following is a list of common events and examples of the kind of information that the DC procedure should translate and store in the repository tables.

db file sequential read and db file scattered read

The db file sequential read and db file scattered read are I/O related wait events.

  1. Determine the object name and partition name (if applicable) using the P1 and P2 values supplied by the CURRENT_EVENT cursor:

     select segment_name, partition_name 
    from dba_extents
    where <cursor P2> between block_id and (block_id + blocks - 1)
    and file_id = <cursor P1>;

    Querying the DBA_EXTENTS view in this manner can be slow, especially when many sessions are waiting on db file sequential read and db file scattered read events. This is because DBA_EXTENTS is a complex view built from many views and base tables. The performance gets worse when the number of extents in the database is high. The following are two alternatives that improve performance.

    • Precreate a working table (regular heap or global temporary table) with the same structure as the DBA_EXTENTS view. Then take a snapshot of the DBA_EXTENTS view once at the beginning of each sampling interval (or once a day depending on the number of extents and their volatility) and insert the data into the working table. In this case, the preceding query can be rewritten to go against the working table instead of the DBA_EXTENTS view.

    • Obtain the object number (OBJ) from the X$BH view using the P1 and P2 from the CURRENT_EVENT cursor. With the object number, you can resolve the object name (NAME) and subobject name (SUBNAME) by querying the DBA_OBJECTS view. An example of the query follows. The caveat to this method is that you must wait for the block to be read into the SGA; otherwise, the X$BH view has no information on the block that is referenced by the P1 and P2. Also, you may not be quick enough to catch the blocks that are being read in by a full table scan operation. This is because full table scans against NOCACHE objects do not flood the buffer cache. A small number of blocks are quickly reused. By the time you get the P1 and P2 values from the V$SESSION_WAIT view, the block in X$BH could have been reused by another block from subsequent reads.

       select distinct a.object_name, a.subobject_name 
      from dba_objects a, sys.x$bh b
      where (a.object_id = b.obj or a.data_object_id = b.obj)
      and b.file# = <cursor P1>
      and b.dbablk = <cursor P2>;
  2. Obtain the SQL statement that is associated with the db file sequential read or db file scattered read event using the hash value supplied by the CURRENT_EVENT cursor:

     select hash_value, address, piece, sql_text 
    from v$sqltext
    where hash_value = <cursor hash value>
    order by piece;

latch free

The latch free wait event is related to concurrency and serialization.

  1. Get the name of the latch that is being competed for using the P2 value supplied by the CURRENT_EVENT cursor:

     select name 
    from v$latchname
    where latch# = <cursor P2>;
  2. Obtain the SQL statement that is associated with the latch free event using the hash value supplied by the CURRENT_EVENT cursor:

     select hash_value, address, piece, sql_text 
    from v$sqltext
    where hash_value = <cursor hash value>
    order by piece;

direct path read

The direct path read wait event is related to direct read operations that read data into the session's PGA.

  1. Find out the name of the object the session is reading from using the P1 and P2 values supplied by the CURRENT_EVENT cursor:

     select segment_name, partition_name 
    from dba_extents
    where <cursor P2> between block_id and (block_id + blocks - 1)
    and file_id = <cursor P1>;
  2. You can get a rough idea of what the session is doing from the type of database file from which it reads. If the file is a TEMPFILE , then you know the session is reading temporary segments that it previously created through direct path write operations. However, if it is a data file, it is probably the parallel query slave at work. The following query determines the database file name using the P1 value supplied by the CURRENT_EVENT cursor:

     select name 
    from v$datafile
    where file# = <cursor P1>
    union all
    select a.name
    from v$tempfile a, v$parameter b
    where b.name = 'db_files'
    and a.file# + b.value = <cursor P1>;
  3. If the session is reading from the temporary tablespace, find out what type of segment it is, as this can also give you an idea of what the session is doing. For example, if the segment is SORT, then you know the sort batch is larger than your SORT_AREA_SIZE (or work area size), and you may have a SQL statement that employs one or more aggregate functions or the merge-join operation. If the segment is HASH, then you know your HASH_AREA_SIZE is too small for the SQL statement that employs the hash-join operation. However, this doesn't mean you simply increase the sort and hash memory. You should first optimize the SQL statements. The following query reveals the type of segment that the session is reading using the SADDR (session address) and SERIAL# supplied by the CURRENT_EVENT cursor:

     select distinct decode(ktssosegt, 1,'SORT', 2,'HASH', 3,'DATA',
            4,'INDEX', 5,'LOB_DATA', 6,'LOB_INDEX', 'UNDEFINED')
    from sys.x$ktsso
    where inst_id = userenv('instance')
    and ktssoses = <cursor session address>
    and ktssosno = <cursor serial#>;
  4. Obtain the SQL statement that is associated with the direct path read event using the hash value supplied by the CURRENT_EVENT cursor:

     select hash_value, address, piece, sql_text 
    from v$sqltext
    where hash_value = <cursor hash value>
    order by piece;

direct path write

The direct path write wait event is related to direct write operations.

  1. Ascertain the name of the object the session is writing to using the P1 and P2 values supplied by the CURRENT_EVENT cursor:

     select segment_name, partition_name 
    from dba_extents
    where <cursor P2> between block_id and (block_id + blocks - 1)
    and file_id = <cursor P1>;
  2. You can get a rough idea of what the session is doing from the type of database file it writes to. If the file is a TEMPFILE, then you know the SQL statement that the session is executing is creating temporary segments. But if it is a data file, then the session is performing a direct path load operation. The following query determines the database file name using the P1 value supplied by the CURRENT_EVENT cursor:

     select name 
    from v$datafile
    where file# = <cursor P1>
    union all
    select a.name
    from v$tempfile a, v$parameter b
    where b.name = 'db_files'
    and a.file# + b.value = <cursor P1>;
  3. If the session is writing to the temporary tablespace, find out what type of segment it is. If the segment is SORT, then you know the sort batch is larger than the SORT_AREA_SIZE (or work area size) in memory and sort runs are being written to the temporary tablespace. You may have a SQL statement that employs one or more aggregate functions or the merge-join operation. If the segment is HASH, then you know the HASH_AREA_SIZE is too small for the SQL statement that performs the hash-join operation. It is also possible for a SQL statement to create both SORT and HASH segments in the temporary tablespace. This is common when the execution plan contains both the hash and merge join, or a hash join with an aggregate function. The following query reveals the type of segment that the session is writing to using the SADDR (session address) and SERIAL# supplied by the CURRENT_EVENT cursor:

     select distinct decode(ktssosegt, 1,'SORT', 2,'HASH', 3,'DATA',
            4,'INDEX', 5,'LOB_DATA', 6,'LOB_INDEX', 'UNDEFINED')
    from sys.x$ktsso
    where inst_id = userenv('instance')
    and ktssoses = <cursor session address>
    and ktssosno = <cursor serial#>;
  4. Obtain the SQL statement that is associated with the direct path write event using the hash value supplied by the CURRENT_EVENT cursor:

     select hash_value, address, piece, sql_text 
    from v$sqltext
    where hash_value = <cursor hash value>
    order by piece;

enqueue

The enqueue wait event is related to transaction locking.

  1. Discover the requested lock type and mode by deciphering the P1 value supplied by the CURRENT_EVENT cursor:

     select chr(bitand(<cursor P1>,-16777216)/16777215) ||
    chr(bitand(<cursor P1>,16711680)/65535) lock_type,
    mod(<cursor P1>,16) lock_mode
    from dual;
  2. Determine the name of the object the enqueue is for using the ROW_WAIT_OBJ# supplied by the CURRENT_EVENT cursor:

     select object_name, subobject_name 
    from dba_objects
    where object_id = <cursor row_wait_obj#>;
  3. Identify the blocking session and the information about the lock that is being held, such as the lock type and lock mode and time held using the P2 and P3 values supplied by the CURRENT_EVENT cursor:

     select a.sid,  a.serial#,  a.username,  a.paddr,  a.logon_time, 
    a.sql_hash_value, b.type, b.lmode, b.ctime
    from v$session a, v$lock b
    where a.sid = b.sid
    and b.id1 = <cursor P2>
    and b.id2 = <cursor P3>
    and b.block = 1;
  4. Obtain the SQL statement that is associated with the enqueue event using the hash value supplied by the CURRENT_EVENT cursor:

     select hash_value, address, piece, sql_text 
    from v$sqltext
    where hash_value = <cursor hash value>
    order by piece;
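As a worked example of the lock-type decoding in step 1, the P1 value 1415053318 is commonly seen for TX enqueues: the two high-order bytes decode to ASCII 84 ('T') and 88 ('X'), and the low-order nibble gives the requested lock mode:

```sql
-- Worked example: decode P1 = 1415053318.
select chr(bitand(1415053318,-16777216)/16777215) ||
       chr(bitand(1415053318,16711680)/65535) lock_type,  -- 'TX'
       mod(1415053318,16) lock_mode                       -- 6 (exclusive)
from dual;
```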

buffer busy waits

The buffer busy waits event is related to read/read, read/write, and write/write contention.

  1. Establish the name of the object that the buffer lock contention is for using the P1 and P2 values supplied by the CURRENT_EVENT cursor:

     select segment_name, partition_name 
    from dba_extents
    where <cursor P2> between block_id and (block_id + blocks - 1)
    and file_id = <cursor P1>;
  2. Find out the type of block using the P1 and P2 values provided by the CURRENT_EVENT cursor. The contention may be for a segment header block, freelist group block, or data block. In Oracle Database 10g, this information is provided by the P3 parameter. See the "buffer busy waits" section in Chapter 6 for more details.

     select segment_type || ' header block'
    from dba_segments
    where header_file = <cursor P1>
    and header_block = <cursor P2>
    union all
    select segment_type || ' freelist group block'
    from dba_segments
    where header_file = <cursor P1>
    and <cursor P2> between header_block + 1 and
    (header_block + freelist_groups)
    and freelist_groups > 1
    union all
    select segment_type || ' data block'
    from dba_extents
    where <cursor P2> between block_id and (block_id + blocks - 1)
    and file_id = <cursor P1>
    and not exists (select 1
                    from dba_segments
                    where header_file = <cursor P1>
                    and header_block = <cursor P2>);
  3. Obtain the SQL statement that is associated with the buffer busy waits event using the hash value supplied by the CURRENT_EVENT cursor:

     select hash_value, address, piece, sql_text 
    from v$sqltext
    where hash_value = <cursor hash value>
    order by piece;

free buffer waits

The free buffer waits event is latency related. Obtain the SQL statement that is associated with the free buffer waits event using the hash value supplied by the CURRENT_EVENT cursor:

 select hash_value, address, piece, sql_text 
from v$sqltext
where hash_value = <cursor hash value>
order by piece;

library cache pin

The library cache pin wait event is related to contention for library cache objects.

  1. Look up the name of the object that the pin is for using the P1RAW value supplied by the CURRENT_EVENT cursor:

     select kglnaobj 
    from x$kglob
    where inst_id = userenv('instance')
    and kglhdadr = <cursor P1RAW>;
  2. Find out who is blocking, the statement that is executing, and the mode in which the object is being held using the P1RAW value supplied by the CURRENT_EVENT cursor:

     select a.sid,  a.serial#,  a.username,  a.paddr,  a.logon_time, 
    a.sql_hash_value, b.kglpnmod
    from v$session a, sys.x$kglpn b
    where a.saddr = b.kglpnuse
    and b.inst_id = userenv('instance')
    and b.kglpnreq = 0
    and b.kglpnmod not in (0,1)
    and b.kglpnhdl = <cursor P1RAW>;
  3. Obtain the SQL statement that is associated with the library cache pin event using the hash value supplied by the CURRENT_EVENT cursor:

     select hash_value, address, piece, sql_text 
    from v$sqltext
    where hash_value = <cursor hash value>
    order by piece;

Pros and Cons

This sampling method is truly a great complement to the event 10046 trace facility. It lets you monitor all foreground connections to the database 24x7 and gives you a snapshot-by-snapshot history of the bottlenecks that each session encountered. When a user inquires why a particular job ran like molasses, you can look into the history to discover the bottlenecks and determine the root cause. The history also enables you to proactively monitor production-critical processes and inform your users when you see anomalies. For instance, you may notice a session clocked a lot of time on a particular SQL hash value and the main event was db file scattered read, so you fetch the SQL text from the history table and tune it. Then you call and offer the user a more efficient SQL statement. What an exceptional level of service! You will have also turned the tide. Instead of the user calling you and blaming the database, you will be calling the user to complain about their SQL code! Needless to say, soon you will be a respected (and perhaps feared) DBA.

Another nice feature is that you can determine the elapsed time of every session from the history by a simple calculation. Let's say you use the one-minute sampling interval. You can compute the session's elapsed time by comparing the first and the last historical record of the session. In this case, the margin of error is at most two minutes. The elapsed time can be used to validate users' claims and keep them honest. Many times, when asked how long their job ran, they will give you a highly inflated number.
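Assuming a wait event repository keyed by SID, SERIAL#, and LOGON_TIME (the table and column names below are hypothetical), the calculation might look like this:

```sql
-- Approximate elapsed time per session from first to last sample.
select sid, serial#, logon_time,
       min(sample_time) first_seen,
       max(sample_time) last_seen,
       round((max(sample_time) - min(sample_time)) * 1440) approx_minutes
from   session_wait_history
group  by sid, serial#, logon_time;
```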

The disk space requirement depends on the number of connections that are active, their processing time, the number of events you are monitoring, and the amount of history you plan on keeping. If you ignore idle events and limit your history to seven days, this method normally takes less than half a gigabyte of disk space as experienced in our installation sites.

This sampling method requires basic working PL/SQL knowledge. It is a mini-development project and involves some amount of coding. The overhead introduced by this method depends largely on the quality of your PL/SQL code. Written well, it has far less overhead than the trace event 10046. You should also implement maintenance and purge routines for the history tables, and consider partitioning them to ease maintenance.




Oracle Wait Interface
Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning (Osborne ORACLE Press Series)
ISBN: 007222729X
EAN: 2147483647
Year: 2004
Pages: 114
