Active Session History (ASH) is absolutely new in Oracle Database 10g. Very simply, it is both the recording and retentionshort and long termof crucial session-state information. The reasoning behind the provisioning of this very interesting data is to allow the performance analyst to perform on-the-spot analysis of a current session as well as to be able to look at the overall breakdown of session-related information in the short term. Session-State InformationYou saw an example of a session-related analysis in Chapter 10, which presented the details of current performance issue and drilled down into the details of a Top Session. This section illustrates how ASH helps collect this information. The Database Control in Oracle Database 10g is, of course, able to roll this up and present it graphically and easily. The session-state information for active sessions is collected by a very low-overhead direct memory sampling of the session-state information that is held in the SGA for all sessions. The display of session-state information is not exactly new, however. Way back, Oracle 7 introduced the V$SESION and V$SESSION_WAIT views, which could be combined to provide many details of the state of a particular session from a snapshot of it. The information, of course, was current at the time these views were queried, and this information was not stored anywhere for later analysis. To analyze this, let's use an example. The query in Listing 11.9 shows how you can obtain these session details. We have listed the various column and other SQL*Plus settings in the query because they control the heading, formatting, and placement, and add value to the query. Listing 11.9. Query Against Views Showing Session-State InformationSQL> set linesize 80 verify off SQL> column sid_serial heading "Sid,Ser#" format a10 SQL> column spid format a5 heading "OSPid" SQL> column process format a6 heading "ClntPr" SQL> column username format a17 heading "DB/OSUser" SQL> column status heading "Status" format a8 SQL> column program heading "Program" format a31 trunc SQL> column machine_prog heading "Machine/Program/Module" format a32 wrap SQL> column last_call_et format 9999.99 heading "LastCallMn" SQL> column logon_time format a18 heading "Logon Time" SQL> column waiting_event format a47 heading "Waiting on event + p1/p2/p3" trunc SQL> select s.sid || ',' || s.serial# sid_serial, p.spid, s.process, 2 s.username || '/' || s.osuser username, s.status, 3 to_char(s.logon_time, 'DD-MON-YY HH24:MI:SS') logon_time, 4 s.last_call_et/60 last_call_et, 5 w.event || ' / ' || w.p1 || ' / ' || w.p2 || ' / ' || w.p3 waiting_event, 6 s.machine || '/' || s.program || '/' || s.module machine_prog 7 from v$process p, v$session s, v$session_wait w 8 where s.paddr=p.addr and s.sid in (&List_of_Oracle_SIDs) 9 and w.sid = s.sid 10 order by s.logon_time; Enter value for list_of_oracle_sids: 180,842 Sid,Ser# OSPid ClntPr DB/OSUser Status Logon Time LastCallMn ---------- ----- ------ ----------------- -------- ------------------ ---------- Waiting on event + p1/p2/p3 Machine/Program/Module ----------------------------------------------- -------------------------------- 180,8310 3156 2368 SYSTEM/oracle ACTIVE 25-JAN-05 06:00:16 .38 db file scattered read / 12 / 75857 / 8 tst10gclnt/dbsnmp@tst10gclnt (TN S V1-V3)/VPXDBA 842,47516 17847 17846 SYS/oracle INACTIVE 25-JAN-05 18:55:06 10.02 SQL*Net message from client / 1650815232 / 1 / tst10gclnt/sqlplus@tst10gclnt (T NS V1-V3)/ Using Session-State InformationThe SQL in Listing 11.9 cleverly determines quite a lot of information from just the V$SESSION and V$SESSION_WAIT views; the V$PROCESS process is joined in order to provide operating-system process information (column OSPid) alone. Let's look in detail at what these columns are so that you can understand how ASH builds on this. The Sid, Ser# (serial number), OSPid, ClntPr (client process ID, if available), Logon Time, and DB/OSUser (database and OS user) columns are pretty self-explanatory, and are available from the V$SESSION and V$PROCESS views. As well, the Machine/Program/Module column is taken directly from the corresponding V$SESSION columns. The MACHINE column is automatically populated and is a good one to look at when determining the source of the connection. The PROGRAM and MODULE columns are populated only when the DBMS_APPLICATION_INFO package is used within the program to record this information. For more details on usage of the DBMS_APPLICATION_INFO, refer to MetaLink Note #67549.1. The V$SESSION.STATUS column shows whether the process was in the ACTIVE state within a database call or in the INACTIVE state after having completed it. A database call is an action being performed due to an SQL executed either directly by the session or indirectly due to a recursive SQL such as a trigger. Obviously, you would want to track all currently ACTIVE sessions. In the case of an ACTIVE session, LastCallMn (V$SESSION.LAST_CALL_ET) shows how many minutes the session has clocked since the last new call the session issued directly or indirectly and is a good measure of long-running SQL queries. In the case of an INACTIVE session, this is a good indication of how long it has been since the client session interacted with the back-end process to make a new call. Of course, the most interesting part is the detailed event information that is available in V$SESSION_WAIT. These columns are described in detail here so that you can understand them both in the context of older versions as well as within the context of Oracle Database 10g. Note that the meanings of these columns have not changed between versions. The columns of interest within V$SESSION_WAIT are listed in Table 11.2.
Hence, using this table, you can conclude that the session with SID 180 performed a db file scattered read event, which is another name for a multiblock read. For this event, the parameter P1 represents the file ID (12), P2 represents the starting block read (block #75857), and P3 represents the number of blocks read. In this case, you know that the DB_FILE_MULTIBLOCK_READ_COUNT parameter, which determines how many blocks should be read in a multiblock read, was set to 8, and hence eight blocks were read into the database buffer cache. Similarly, the session with SID 842 is now inactive, waiting on the SQL*Net message from client, which normally represents a session waiting for input from a user. The Connection to Active Session HistoryIf a simple, static snapshot of the V$SESSION_WAIT and V$SESSION was able to provide so much information, would it not be beneficial to provide a set of snapshots of the session-state information so that a performance analyst could see what occurred within a session in terms of events? The obvious answer is a resounding yes, and this is exactly what Active Session History does. ASH samples session-state information for active sessions once every second and exposes it via a number of views, the details of which you will see later in this section. ASH records these details in a circular memory buffer for quick access. CPU overheads associated with ASH have been claimed to be as low as 0.1% of 1 CPU. This sampling and recording is performed by the AWR-related processes, namely MMNL and MMON. This activity occurs without any latching and serialization, which results in the relatively low overhead for such collections at the expense of some accuracy. As well, every 10th sample is also persisted to disk, effectively recording a snapshot of the session state for every 10th second in which the session was active. The persisted session-state history data is both timestamped and associated with the SNAP_ID of the corresponding AWR snapshot for that period. In fact, as noted before, ASH data is also collected via the same processes and stored within the same repository as AWR snapshots. Thus, using ASH data, you can perform both on-the-spot analysis as well as look back into history and perform retroactive analysis because the ASH data is tied to a snapshot and a period. The advantage of immediate analysis is that you can obtain a larger number of samples directly from the memory buffer, while the data is coarser when viewed later using the persisted copy on disk.
ASH data is used in AWR snapshot and other collections as well. For example, we mentioned previously that AWR tracks and collects high-load SQL that occurs within the snapshot period rather than trolling through V$SQL for previously recorded SQL. It does this based on ASH data for SQL from sessions that either caused a majority of the waits during the period or consumed the most CPU. Other information collected directly from ASH includes "hot" files and segments as well as database metrics. ASH data can be viewed both directly using the views as well as indirectly via OEM Database Control. We look at some of these views in greater detail next. V$ACTIVE_SESSION_HISTORYThis view is the base ASH view that exposes the contents of the ASH buffers and in turn is based on the internal views, such as X$KEWASH and X$ASH. This view holds all the relevant details that both V$SESSION and V$SESSION_WAIT hold, along with some additional columns. Note that the column names for some of the common columns have changed. For example, the SID and SERIAL# columns are now known as SESSION_ID and SESSION_SERIAL# respectively. Table 11.3 describes only the new columns.
V$SESSION_WAIT_HISTORYThis view shows the last 10 waits for any session from V$ACTIVE_SESSION_HISTORY. This view can be accessed only via the SID and does not contain the SERIAL#, which uniquely identifies a session. Thus, when multiple sessions connect and disconnect within a short period of time, you might be getting incorrect datathat is, data from a previous session that completed but whose SID is being reused by a later, newer connection. We suggest ignoring this view and looking up such data from V$ACTIVE_SESSION_HISTORY. Unfortunately, it seems that OEM uses this view, and incorrectly. WRH$_ACTIVE_SESSION_HISTORYThis is the persisted version of the ASH data. As indicated before, the session-state information in V$ACTIVE_SESSION_HISTORY from every 10th sample is written out to disk into this table. Each of these samples is associated with the SNAP_ID of the current AWR snapshot. In fact, output from the query listed in Listing 11.10 will confirm that this is indeed the case. Listing 11.10. Query to Confirm AWR Snapshot Matches ASH Persisted Data[View full width] select s.snap_id, min(begin_interval_time), max(end_interval_time), min(sample_time), max (sample_time) from sys.wrm$_snapshot s, sys.wrh$_active_session_history h where s.snap_id = h.snap_id group by s.snap_id; Size of Active Session History BuffersAs mentioned before, ASH maintains samples of the session-state information in its own buffers within the System Global Area (SGA). The amount of information that ASH maintains could quickly become substantial if not managed properly. Hence, ASH maintains a fixed size circular buffer that is allocated in the SGA at system startup using the formula:
In other words, it is at least 1MB and goes up to a maximum of 30MB. In between these extremes, the size is determined by the number of CPUs (No. of CPUs * 2) or by the size of the shared pool (5% of the shared Pool size). Because ASH collects session-state information for active sessions only, it is obvious that the buffer will become full more quickly when there are more active sessions in the database. By default, this persistence, or writing out to disk, is set to occur when the buffers are 66% full. When this occurs, every 10th ASH record for each session is associated with the current snapshot and written out to disk as the rows in the WRH$_ACTIVE_SESSION_HISTORY table. This occurs in the same manner of persisting AWR data to disk. The ASH buffer space is then released for capturing newer session-state information.
An Example of Spot Analysis Using ASHTo showcase how you can use ASH from SQL, let's use the same example of high-load SQL that we used in Chapter 10. The SQL query runs against the HR schema and joins EMPLOYEES and its related tables. In this example, the query was run simultaneously from two SQL*Plus sessions. The result was that both sessions contended on the cache buffer chain latch because they were trying to access the same blocks in memory while performing occasional physical reads. Listing 11.11 shows the query used to display the ASH data. The session and serial numbers of the sessions as they occurred in the tests are shown. Listing 11.12 shows the output of the query, partially snipped showing a time of interest in which this contention could be seen. Lines in this listing are numbered and these line numbers are used in the explanation that follows. Listing 11.11. Query to Show Many Details from V$ACTIVE_SESSION_HISTORY for Two Known Contending Sessionscolumn time format a8 column sid format a9 column event format a25 trunc column wt_time format 999999 column tm_waitd format 99999 column seq# format 9999 select to_char(sample_time,'HH24:MI:SS') time, session_id ||','|| session_serial# sid, session_state s_state,event,seq#, wait_time wt_time,time_waited tm_waitd from v$active_session_history where session_id in (136,158) and session_serial# in (67,11953) order by session_id,sample_time; Listing 11.12. Output of Query in Listing 11.111 TIME SID S_STATE EVENT SEQ# WT_TIME TM_WAITD 2 -------- --------- ------- ------------------------- ----- ------- -------- 3 <snip - First session> 4 23:53:09 136,67 ON CPU SQL*Net message from clie 34 1134 0 5 23:53:10 136,67 ON CPU SQL*Net message from clie 34 1134 0 6 23:53:11 136,67 ON CPU latch: cache buffers chai 35 1949 0 7 23:53:12 136,67 ON CPU latch: cache buffers chai 36 4 0 8 23:53:13 136,67 ON CPU latch: cache buffers chai 36 4 0 9 23:53:14 136,67 ON CPU latch: cache buffers chai 38 19462 0 10 23:53:15 136,67 WAITING latch: cache buffers chai 40 0 19506 11 23:53:16 136,67 ON CPU latch: cache buffers chai 40 19506 0 12 <snip> 13 23:53:25 136,67 ON CPU latch: cache buffers chai 49 19502 0 14 23:53:26 136,67 ON CPU latch: cache buffers chai 49 19502 0 15 23:53:27 136,67 ON CPU db file sequential read 51 84 0 16 23:53:28 136,67 ON CPU latch: cache buffers chai 52 5 0 17 23:53:29 136,67 ON CPU latch: cache buffers chai 54 19445 0 18 23:53:30 136,67 ON CPU latch: cache buffers chai 54 19445 0 19 23:53:31 136,67 ON CPU latch: cache buffers chai 56 19468 0 20 23:53:32 136,67 ON CPU latch: cache buffers chai 58 2 0 21 23:53:33 136,67 ON CPU latch: cache buffers chai 58 2 0 22 <snip - Second session> 23 23:53:08 158,11953 ON CPU db file sequential read 206 26 0 24 23:53:09 158,11953 ON CPU db file sequential read 206 26 0 25 23:53:10 158,11953 ON CPU latch: cache buffers chai 216 19485 0 26 <snip> 27 23:53:25 158,11953 WAITING latch: cache buffers chai 253 0 13112 28 23:53:26 158,11953 ON CPU latch: cache buffers chai 257 19642 0 29 23:53:27 158,11953 ON CPU latch: cache buffers chai 262 19493 0 30 23:53:28 158,11953 ON CPU latch: cache buffers chai 264 4 0 31 23:53:29 158,11953 ON CPU latch: cache buffers chai 266 19583 0 32 23:53:30 158,11953 ON CPU latch: cache buffers chai 267 4 0 Let us now analyze the output shown in Listing 11.12. You can arrive at a number of conclusions from this data. The Wait event in lines 4 and 5 show that the session was waiting on SQL*Net message from client. Although this event is classified as an idle event, you know that the session was active because ASH data would not have otherwise been collected! This is clarified by the STATE column, which shows ON CPU for these two lines. You can thus understand that between 23:59:09 and 23:59:10, the session was actually consuming CPU waiting on the client (the SQL*Plus session in this case), which in turn was processingprobably formatting the data sent across, which consumes some CPU. In lines 710, you see that the session was waiting on the latch: cache buffer chain event. Lines 7 and 8 look the same, so you can conclude that the session was using CPU but never posted a Wait event during this period. In fact, from the nature and operation of a latch, you can conclude that the process was spinning on the latch. This is a basic operating system call where you can perform a set-and-test operation, where the process idly spins for a short period and tests for the latch before yielding the CPU. Although this spinning reduces operating-system process context switching (which is expensive in itself), it does consume some CPU. Monitoring of this process from the operating system supported the conclusion that a significant amount of user CPU was consumed by this process. You see a state change between lines 8 and 9; the event is the same, but the SEQ# column has moved from 36 to 38. What happened to 37? Actually, ASH missed recording the event; the sampling period wasn't small enough to catch all the events. This looks like a weakness in ASH, and we will deal with this later. Line 10 implies some waiting on the same latch, probably for a latch sleep. The TIME_WAITED is now posted and is taken into account in other counters, such as V$SESSION_EVENT and V$SYSTEM_EVENT, as well as in V$LATCH and V$LATCH_CHILDREN. We hope you are now able to see how this information is rolled up from ASH into higher levels. Note that in the definition of TIME_WAITED, we stated that "If a Wait event lasted for more than a second and was caught waiting in more than one session sample row, then the actual time spent waiting for that Wait event will be populated in the last of those session sample rows." You can see this occur in line 10. Let us now look at lines 15, 23, 24, and 29. Except line 15, all are from the second session. And although you do see file I/O in the form of db file sequential read events from both sessions, you also see the contention on the cache buffer chain latch. Further analysis of other data, such as P1, P2, and P3 (not shown due to size and formatting limitations) helped you understand that these two sessions may be contending for the same blocks, and the solution may be to schedule them to run one after another. You can also summarize information from V$ACTIVE_SESSION_HISTORY to obtain summary counts that will help you quickly get to the main bottleneck at hand, and support the conclusion arrived at here. This is made clear in the query and output shown in Listing 11.13. Listing 11.13. Query to Confirm Conclusions Using Summaries on V$ACTIVE_SESSION_HISTORYSQL> column event format a35 trunc SQL> select session_id sid, event, session_state state, 2 count(*) times, sum(time_waited) tm_waited 3 from v$active_session_history 4 where program like 'sqlplus%' 5 group by session_id, event, session_state; SID EVENT STATE TIMES TM_WAITED ---------- ----------------------------------- ------- ---------- ---------- 136 db file sequential read ON CPU 32 0 136 SQL*Net message from client ON CPU 9 0 136 latch: cache buffers chains ON CPU 426 0 136 latch: cache buffers chains WAITING 15 430403 158 db file sequential read ON CPU 2 0 158 SQL*Net message from client ON CPU 17 0 158 latch: cache buffers chains ON CPU 466 0 158 latch: cache buffers chains WAITING 16 718216 It is clear from this listing that the most waiting was on the latch: cache buffer chains event. In fact, you can conclude that both sessions spent almost 100% of their time on this latch, and were in fact contending for the same buffers in memory. This becomes apparent when the session is run alone. The output from the same query when run alone is shown in Listing 11.14. Listing 11.14. Query to Show Wait Events When Run AloneSQL> column event format a35 trunc SQL> select session_id sid, event, session_state state, 2 count(*) times, sum(time_waited) tm_waited 3 from v$active_session_history 4 where session_id = 151 and session_serial# = 365 5 group by session_id, event, session_state; Sid EVENT STATE TIMES TM_WAITED ---- ----------------------------------- ------- ------------ ------------ 151 db file sequential read ON CPU 105 0 151 SQL*Net message from client ON CPU 107 0 This query shows that minimal if no waiting was recorded in the wait history for that session, and most work was only on the CPUprobably due to formatting and displaying the data when run alone by itself. You should be able to cross-check these findings using the Time and Wait model views at the session level, namely the V$SESSION_WAIT_CLASS view using the SESSION and SERIAL# values. Note, however, that the data for that session in this view exists only as long as the said session is connected and alive. In contrast, the ASH data is available even after the session has disconnected. Using OEM to Obtain Active Session HistoryYou can use OEM Database Control to obtain session statistics from ASH. There are available from the Top Sessions link in the Performance screen or from any session-related drill-down. Figure 11.5 shows the Session Details screen for one session numbered 150. Figure 11.5. The Session Details screen.Note that the status, current SQL, and so on are obtained and displayed from the V$SESSION view. The menu includes the following:
Upon clicking the Wait Events link in the menu, the last 10 waits suffered by the session are displayed, as shown in Figure 11.6. Note, however, that these waits are shown regardless of whether the session was active. As well, this information is not from V$ACTIVE_SESSION_HISTORY, but from V$SESSION_WAIT_HISTORY; hence, you might be looking at incorrect data as explained when we discussed the latter view. Figure 11.6. The Wait Events screen.Dumping ASH Buffer ContentsAs per the previously mentioned MetaLink Note #243132.1, it is possible to dump the contents of the ASH buffers. The command does not work as stated, however. The difference is minor and has to do with the statement in the MetaLink Note that the number specified in the command is the number of minutes, when it seems to be the number of seconds instead. To dump the contents of the ASH buffers, obtain a login as SYS. Then run the commands as shown in Listing 11.15. Listing 11.15. Dumping Contents of the ASH Buffers to TraceSQL> oradebug setmypid Statement processed. SQL> oradebug dump ashdump 10000 Statement processed. The trace file can be found in the UDUMP directory. The file is named according to the convention of your platform, as with any other user trace file. Although the MetaLink Note suggests that the number specified is the number of minutes of historical data, it seems that only a large number produces all the output. ORADEBUG is a special command available in the SYS login to produce a variety of dumps and other traces. The file that is produced, after some header information is removed, is a comma-separated value (CSV) file that can be loaded into a Microsoft Excel spreadsheet or even another database table using SQL*Loader for analysis. The partial contents are shown in Listing 11.16. You will notice that there is a header line that specifies the full column list of V$ACTIVE_SESSION_HISTORY. Listing 11.16. Partial Contents of the ASH Dump*** 2005-01-30 23:22:37.981 *** SERVICE NAME:(SYS$USERS) 2005-01-30 23:22:37.979 *** SESSION ID:(140.360) 2005-01-30 23:22:37.979 <<<ACTIVE SESSION HISTORY - PROCESS TRACE DUMP HEADER BEGIN>>> DBID, INSTANCE_NUMBER, SAMPLE_ID, SAMPLE_TIME, SESSION_ID, SESSION_SERIAL#, USER _ID, SQL_ID, SQL_CHILD_NUMBER, SQL_PLAN_HASH_VALUE, SERVICE_HASH, SESSION_TYPE, SQL_OPCODE, QC_SESSION_ID, QC_INSTANCE_ID, CURRENT_OBJ#, CURRENT_FILE#, CURRENT_ BLOCK#, EVENT_ID, SEQ#, P1, P2, P3, WAIT_TIME, TIME_WAITED, PROGRAM, MODULE, ACT ION, CLIENT_ID <<<ACTIVE SESSION HISTORY - PROCESS TRACE DUMP HEADER END>>> <<<ACTIVE SESSION HISTORY - PROCESS TRACE DUMP BEGIN>>> 845721274,1,3064021,"01-30-2005 23:22:35.178099000",166,1,0,"",0,0,165959219,2,0 ,0,0,4294967295,0,0,866018717,536,300,0,0,3004674,0,"oracle@ussccpvcs01t (CKPT)" ,"","","" <snipped> 845721274,1,3063605,"01-30-2005 23:15:26.638099000",170,1,0,"",0,0,165959219,2,0 ,0,0,4294967295,0,0,3539483025,7,300,0,0,2936573,0,"oracle@ussccpvcs01t (PMON)", "","","" 845721274,1,3063597,"01-30-2005 23:15:18.398099000",152,1,22,"c1599tjmzu1fc",0,0 ,3427055676,1,0,0,0,8548,3,34825,1148964757,1500,8552,2429532560,5,5008574,0,"em agent@ussccpvcs01t (TNS V1-V3)","emagent@ussccpvcs01t (TNS V1-V3)","","" <snipped> 845721274,1,3063515,"01-30-2005 23:13:53.928099000",145,9,54,"2b064ybzkwf1y",0,0 ,1594506128,1,47,0,0,47741,3,19916,218649935,295,2432172472,2432129528,0,1005847 ,0,"OMS","OEM.SystemPool","","" 845721274,1,3063453,"01-30-2005 23:12:50.068099000",150,19,58,"b5v2yu6f47cw2",0, 4106914013,3427055676,1,3,0,0,49788,5,10659,2652584166,252,5,10659,1,63,0,"sqlpl us@ussccpvcs01t (TNS V1-V3)","SQL*Plus","","" 845721274,1,3063452,"01-30-2005 23:12:49.038099000",150,19,58,"b5v2yu6f47cw2",0, 4106914013,3427055676,1,3,0,0,49788,5,10659,2652584166,252,5,10659,1,63,0,"sqlpl us@ussccpvcs01t (TNS V1-V3)","SQL*Plus","","" The Downside of ASHAlthough the overhead for collection of ASH data is minimal, access to the ASH data in human-readable format, either via the OEM interface or the more traditional SQL access, is not without the normal overhead of SQL processing, including the need for a connection, some latching, as well as associated CPU usage. On a badly overloaded system, you may not be able to obtain a new SQL*Plus or OEM session in order to get to ASH data. As well, when the system is heavily contending on the shared pool and library cache latches, your query may take a long time to parse and execute. Indeed, frequent queries to ASH data may impose additional waits for these latches. This is not the case with an external program that reads SGA directly. ASH, however, needs to run within the context of Oracle and hence will have to follow all the rules of the Oracle kernel as far as a normal process is concerned. A workaround is to maintain a perpetual connection to the database using an account that has privileges to query ASH data. If you used precompiled code in a PL/SQL package that is pinned in the SGA and can produce canned ASH queries, then you can even get around the parsing issues. It is very possible, indeed probable, to lose some wait information because the sampling rate is only once a second by default. You saw an example of this issue in the previous example. So how can this be countered? The answer lies in using Extended SQL Trace, better known as 10046 trace. Using this method, all the Wait events as well as the SQL executed by the session are dumped directly to the trace file, which can later be viewed and processed. These trace files are cumbersome to track and process, however. As well, you can only perform such a trace as the problem is occurring, not in a retrospective manner as with ASH. |