Chapter 15: Hardware Resource Usage Tuning | Oracle High Performance Tuning for 9i and 10g

< Day Day Up >

This chapter will briefly introduce hardware resource usage tuning, specifically dealing with CPU usage, memory usage, and I/O usage. More of the details on how to tune these various areas of an Oracle installation will be covered in later chapters. Let's start with tuning CPU usage.

15.1 Tuning Oracle CPU Usage

CPU usage can be best assessed at peak times and thus peak CPU workload times. What are peak workloads? This depends entirely on your application. An OLTP application is generally active and available 24 h a day, 365 days a year. There may be peak times and quiet times for OLTP databases but any very large highly active OLTP database generally passes batch operation work onto a data warehouse database, wherever possible. DSS-type databases tend to be a mixture of OLTP, client-server, and data warehouse type functionality. Therefore, some databases may have peak workloads for different types of activity at different times of the day. It is quite possible that client-server or OLTP-type service is provided during the day and batch operations are processed at night. The point to be made is this. Do not waste your time looking for something wrong with CPU usage when the CPU is idle. Wait until the CPU is really busy and then try to figure out what is overloading it, if anything. The CPU could simply be busy because busy is what it is meant to be!

So what do we do about tuning CPU usage? Well we cannot actually tune CPU usage itself, only determine what is being processed at any particular time and try to figure out what is causing heavy load or spiking. Then we can perhaps tune what is being processed, and perhaps better distribute processing if there is conflict as a result of different types of processing executing concurrently. Obviously processors can be upgraded and multiple CPU hardware platforms can be purchased. The crux of tuning CPU usage is to find out what is causing the problem and then tune the cause rather than attempt to "do something about the CPU".

15.1.1 Busy I/O and Intense CPU Activity

Is I/O activity intense and are CPUs very busy? If this is the case then it means the I/O is the same speed as CPU processing time. I/O activity should be much slower than the CPU. The CPU is a much faster piece of hardware than disk storage, without question. Excessive CPU activity can be due to a number of causes. We will concentrate on things related to Oracle Database rather than digging into the operating system too much.

Swapping and Paging

Swapping and paging are not precisely the same thing. A swap or a page fault occurs when a chunk of data is required but that chunk is not in memory. The chunk of data is then retrieved from swap space on disk and exchanged with something else if there is not enough space in RAM. Thus the information is swapped between RAM and a swap file on disk.

Tip

A swap file in Windows is called a paging file. The Windows paging file is added to RAM and constitutes virtual memory. The Windows paging file can be found in the System icon on the Control Panel.

Swapping or paging between disk "memory" and RAM physical memory is not a problem unless it occurs frequently, generating large amounts of I/O. Sometimes a process can cause a lot of swapping over and over again.

Swapping and paging can cause unnecessary use of the CPU, with respect to Oracle Database, often as a result of something caused within an Oracle Database configuration or an application activity. The obvious solution to swapping and paging is purchasing more RAM, as an easy short-term solution. However, as a system scales, if other problems are continuously placated with hardware upgrades, those hardware upgrades may eventually become very expensive.

15.1.2 Possible Oracle Database Causes of High CPU Activity

Unnecessary CPU use can often be the result of Oracle Database activity and related applications behavior.

Poorly Tuned SQL Code

Poorly tuned SQL code can cause a lot of re-parsing, which can cause intense CPU activity. Following are parsing and execution statistics.

COL name FORMAT a20; SELECT name, value FROM v$sysstat WHERE name LIKE '%parse%' OR name LIKE '%execute count%';     NAME                           VALUE --------------------------  -------- parse time cpu                  1618 parse time elapsed             15096 parse count (total)            82683 parse count (hard)               293 parse count (failure)              8 execute count                 184523

In the next query currently executing SQL code is interrogated. Note that the Product column is a product of both parses and executions. The frequency of parses is more important than just the parses or executions.

COL sql_text FORMAT A38; SELECT * FROM( SELECT parse_calls*executions "Product", parse_calls "Parses"    ,executions "Execs", sql_text FROM v$sqlarea ORDER BY 1 DESC) WHERE ROWNUM <= 10;     Product  Parses  Execs  SQL_TEXT --------  -----  -----  ------------------------ 53740146  2454   21899  INSERT into cashbookline                            values(:b3,:b 53740146  2454   21899  INSERT into ordersline                            values(:b4,:b3, 53740146  2454   21899  INSERT into transactionsline                           values(:b 53740146  2454   21899  INSERT into stockmovement                            values(stock 49695750  3250   15291  SELECT to_number                           (to_char(SYSTIMESTAMP, 40742689  6383   6383   COMMIT 37466641  6121   6121   INSERT into generalledger                           values (gene 37466641  6121   6121   INSERT into generalledger                           values (gene 32890000  3250   10120  SELECT systimestamp FROM sys.dual 10575504  3252   3252   select privilege#,level from sysauth$

Increasing the SESSION_CACHED_CURSORS parameter allows caching of more SQL statements per session, potentially helping to reduce parsing. However, use of bind variables in SQL code will reduce parsing much more significantly, especially for highly concurrent OLTP databases where SQL code is highly shareable and if SQL code is poorly tuned.

Poor Index Usage

Missing indexes or too many large composite indexes can cause a lot of full table scans, leading to high I/O, swapping and probably high CPU usage times. Lack of use of indexes is most likely indicated by too many full table scans, particularly those on large tables, as indicated in the following query by full table scans on long tables (table scans (long tables)).

COL name FORMAT a32; SELECT name, value FROM v$sysstat WHERE name IN ('table scans (short tables)'   ,'table scans (long tables)','table fetch by rowid');     NAME                          VALUE ---------------------------  ------ table scans (short tables)    40420 table scans (long tables)       111 table fetch by rowed          85673

Rollback and Undo

Problems with overusage of rollback and undo can possibly be resolved by increasing COMMIT command frequency in SQL code and reducing the number of SQL statements in single transactions, reducing transaction size. However, higher commit rates will lead to higher CPU usage. The following query can be used to check rollback segment usage in relation to extension and shrinkage. Similar information can be obtained from the V$UNDOSTAT performance view for automated undo. My database is currently using manual rollback segments. Automated undo tends to be less controllable with disk space. I have very limited disk space. On the other hand, automated undo simply removes much of the tuning requirements for manual rollback. However, it does not hurt to understand how rollback works because rollback and automated undo are essentially performing the same function.

Note

Oracle Database 10 Grid Manual rollback is deprecated.

COL RBS FORMAT a4; SELECT n.name "RBS", s.extends "Extends", s.shrinks    "Shrinks"   ,s.wraps "Wraps", s.aveshrink "AveShrink"   ,s.aveactive "AveActive" FROM v$rollname n JOIN v$rollstat s USING(usn) WHERE n.name != 'SYSTEM';

This query shows wrapping on all manual rollback segments. Rollback segment wraps are not a serious problem. It is a commonly held misconception that wrapping is reuse of the first extent of the currently used rollback segment, causing a snapshot too old error. On the contrary, wrapping is expanding a transaction into a new extent inside the same rollback segment. Excessive wrapping could imply that storage parameters within a rollback segment might be adjusted upwards to prevent wrapping.

RBS    Extends  Shrinks  Wraps  AveShrink  AveActive -----  -------  -------  -----  ---------  --------- RB00         0        0     13          0     779254 RB01         0        0     12          0     749639 RB02         0        0     12          0     749639 RB03         0        0     12          0     749330 RB04         0        0     13          0     779254 RB05         0        0     12          0     749330 RB06         0        0     12          0     749330 RB07         0        0     12          0     749330 RB08         0        0     13          0     779254 RB09         0        0     11          0     716991 RB10         0        0     13          0     778713 RB11         0        0     13          0     779254 RB12         0        0     12          0     749330 RB13         0        0     12          0     749330 RB14         0        0     12          0     749639 RB15         0        0     12          0     749639

Now let's take a different perspective and examine the amount and spread of activity within rollback segments.

COL RBS FORMAT a4; SELECT n.name "RBS", s.status, s.waits, s.gets, s.writes       ,s.xacts "Active Trans" FROM v$rollname n JOIN v$rollstat s USING(usn) WHERE n.name != 'SYSTEM';

This query shows a few interesting things. Transactions are fairly well spread across all the rollback segments, which is good. However, it might be that there are too many transactions per rollback segment. The impact of transactions per rollback segment depends somewhat on transaction size, which is extremely small in my database. There appears to be a fair amount of waiting for space in rollback segments, which is not so good.

RBS    STATUS  WAITS   GETS    WRITES  Active Trans -----  ------  -----  -----  --------  ------------ RB00   ONLINE      9   6452  10669902             5 RB01   ONLINE     15   6375  10485074             4 RB02   ONLINE     14   6614  10670958             5 RB03   ONLINE     14   6800  10670618             5 RB04   ONLINE     19   6872  10759932             4 RB05   ONLINE      7   6431  10527624             5 RB06   ONLINE     13   6478  10707842             4 RB07   ONLINE     13   6345  10324652             4 RB08   ONLINE     10   6662  11237024             4 RB09   ONLINE     16   6270   9979316             4 RB10   ONLINE     14   6668  11021116             5 RB11   ONLINE     11   6412  10775736             3 RB12   ONLINE     13   6762  10702146             4 RB13   ONLINE      6   6476  10279358             5 RB14   ONLINE     12   6497  10475656             5 RB15   ONLINE      9   6341  10398742             4

Since transaction waits on rollback segments appear to occur let's dig further in that direction. This query will give an impression of rollback segment contention or competition for use of rollback segments. At just over 3% it might be warranted to create more rollback segments. It certainly would not do any harm.

SELECT ROUND(SUM(waits/gets)*100,2)||'%' "Contention" FROM   v$rollstat;     Contention ---------- 3.34%

I originally had 15 consistently sized rollback segments, which has now been increased to 50 rollback segments. Let's do some further querying to see if there is any improvement.

COL RBS FORMAT a4; SELECT n.name "RBS", s.status, s.waits, s.gets, s.writes       ,s.xacts "Active Trans" FROM v$rollname n JOIN v$rollstat s USING(usn) WHERE n.name != 'SYSTEM' ORDER BY 1;

This query looks at the first 10 of 50 rollback segments only. Transactions are now better spread across rollback segments with fewer active transactions in each rollback segment. Additionally there are much fewer waits.

RBS    STATUS  WAITS   GETS    WRITES  Active Trans -----  ------  -----  -----  --------  ------------ RB00   ONLINE      0    149    158860             1 RB01   ONLINE      0    134    167698             1 RB02   ONLINE      0    145    203588             1 RB03   ONLINE      0    155    186144             2 RB04   ONLINE      0    149    147222             1 RB05   ONLINE      2    276    166638             2 RB06   ONLINE      0    150    151052             1 RB07   ONLINE      0    116    171126             1 RB08   ONLINE      0    145    215020             1 RB09   ONLINE      3    133    202338             1

Looking at contention again, much to my surprise it has actually increased. This is a classic example of a ratio giving a false impression. I just restarted my database so all rollback statistics have been cleared. As time passes this ratio will decrease. When I first started up my database after adding new rollback segments the ratio was over 12%. It has now dropped to around 7%. Over time it will decrease further. Additionally I am currently running over 100 jobs whereas over the last few hours, prior to bouncing my database, I was only running 10 jobs per second. The contention percentage shown previously was misleading, as is the one below. The previous percentage for contention was based on a much smaller load on the database. This current contention value shown here is misleading because it requires a period of time to settle down and build up statistics.

COL contention FORMAT 9999999990; SELECT       AVG(xacts) "Trans per RBS"       ,ROUND(SUM(waits/gets)*100,2)||'%' "Contention" FROM v$rollstat;     Trans per RBS   Contention -------------   ----------    1.35294118        6.92%

This query shows a picture of automated undo segment statistics when executing highly active concurrency processing in my Accounts schema.

COL undob FORMAT 99990; COL trans FORMAT 99990; COL snapshot2old FORMAT 9999999990; SELECT undoblks "UndoB", txncount "Trans"  ,maxquerylen "LongestQuery", maxconcurrency   "MaxConcurrency"  ,ssolderrcnt "Snapshot2Old", nospaceerrcnt "FreeSpaceWait" FROM v$undostat;     UndoB   Trans   LongestQuery   MaxConcurrency -----   -----   ------------   --------------  186    24596             30               66 4152    24339             71               67 3703    18835             58               62 2991    13644            143               74 2734     9175             13                1 2696     5372             12                3  975     1383             12                2

Serious problems with rollback segments can sometimes cause snapshot too old errors. Snapshot too old errors occur when a rollback segment runs out-of-space and uncommitted transactions are overwritten. Do not confuse snapshot too old errors with wrapping in manual rollback segments. Adjust the UNDO_RETENTION parameter to remove snapshot too old errors when using automated undo segments for rollback.

Temporary Sort Space

Temporary sort space can be interrogated using a query such as the following query. Increasing the SORT_AREA_SIZE parameter decreases requirements for temporary sort space on disk but potentially increases memory requirements for database connections.

COL username FORMAT a10; COL segtype FORMAT a10; SELECT username, segtype, extents "Extents Allocated"  ,blocks "Blocks Allocated" FROM v$tempseg_usage;     USERNAME   SEGTYPE   Extents Allocated   Blocks Allocated --------   -------   -----------------   ---------------- ACCSTEST      SORT                  47               6016

Row Locks and Latch Waits

Table or row locking and latch waits can cause waiting, sleeping, and spinning of processes. Locks in the database are possibly indicative of poor data model design or perhaps use of trigger code for event detection. Latch waits can be caused by poor buffer configuration parameters and, once again, poor SQL coding. It is important to note that buffer cache areas can be inappropriately sized in one of four ways. They can be too small, they can be the wrong sizes in relation to other buffers, they can be sized incorrectly according to application requirements, and they can even be too big. Making buffers too small can cause latch contention issues. Very large buffers may require intense management by Oracle Database. There can come a point where the database buffer cache or the shared pool can become so large as to somewhat negate their usefulness. Additionally large buffers increase the temptation to create sloppy SQL code. Many database administrators simply increase buffer sizes when there is a problem, without ever solving the root cause of that problem. Quite often that problem is caused as a result of a low ratio such as the database buffer cache hit ratio. Ratios will be examined in Chapter 18. Ratios provide possible symptoms of problems but are not problems in themselves. Oracle Database tuning documentation often stresses the importance of ratios far too much. Ratios are symptomatic of problems and should not be used as a measurement of performance but rather an indicator of possible performance problems, which are not necessarily directly related to those ratios.

Various performance views can be used to track locks and latch wait events. Wait events, latches, and the Oracle Database Wait Event Interface will be covered in Chapters 19-22. The V$SYSTEM_EVENT performance view can be used to determine wait events for free latches.

COL event FORMAT a30; COL waits FORMAT 9999990; COL timeouts FORMAT 99999990; COL average FORMAT 99999990; SELECT event "Event", time_waited "Total Time", total_waits    "Waits", average_wait "Average", total_timeouts    "Timeouts" FROM V$SYSTEM_EVENT WHERE event = 'latch free' ORDER BY EVENT;     Event        Total Time   Waits   Average   Timeouts ----------   ----------   -----   -------   -------- latch free        77421   23645         3       3304

There are many more events than this.

High Network Activity

High network activity with many small queries can cause problems with high CPU activity. Be aware that breaking SQL code into small constituent parts is more natural to object designs rather than relational design. On the contrary, reduction of complexity into smaller parts leads to simplicity in tuning, especially when it comes to SQL code. There is a fine balance between too many small transactions and large complex SQL code join statements. Do not be concerned about network activity with SQL*Net too much. If you are sure that your network is the bottleneck take a look at your network, never overcomplicate SQL code to compensate for other problems.

The next query shows idle SQL*Net events. These values can help to describe Oracle network usage activity. However, for an OLTP database server high network activity is normal.

SELECT name FROM v$EVENT_NAME WHERE name LIKE '%SQL*Net%' OR name LIKE '%ispatcher%' OR name LIKE '%virtual circuit%';     NAME ------------------------------- alter system set dispatcher dispatcher shutdown virtual circuit status dispatcher timer dispatcher listen timer SQL*Net message to client SQL*Net message to dblink SQL*Net more data to client SQL*Net more data to dblink SQL*Net message from client SQL*Net more data from client SQL*Net message from dblink SQL*Net more data from dblink SQL*Net break/reset to client SQL*Net break/reset to dblink

< Day Day Up >