| < Day Day Up > |
|
This chapter will briefly introduce hardware resource usage tuning, specifically dealing with CPU usage, memory usage, and I/O usage. More of the details on how to tune these various areas of an Oracle installation will be covered in later chapters. Let's start with tuning CPU usage.
CPU usage can be best assessed at peak times and thus peak CPU workload times. What are peak workloads? This depends entirely on your application. An OLTP application is generally active and available 24 h a day, 365 days a year. There may be peak times and quiet times for OLTP databases but any very large highly active OLTP database generally passes batch operation work onto a data warehouse database, wherever possible. DSS-type databases tend to be a mixture of OLTP, client-server, and data warehouse type functionality. Therefore, some databases may have peak workloads for different types of activity at different times of the day. It is quite possible that client-server or OLTP-type service is provided during the day and batch operations are processed at night. The point to be made is this. Do not waste your time looking for something wrong with CPU usage when the CPU is idle. Wait until the CPU is really busy and then try to figure out what is overloading it, if anything. The CPU could simply be busy because busy is what it is meant to be!
So what do we do about tuning CPU usage? Well we cannot actually tune CPU usage itself, only determine what is being processed at any particular time and try to figure out what is causing heavy load or spiking. Then we can perhaps tune what is being processed, and perhaps better distribute processing if there is conflict as a result of different types of processing executing concurrently. Obviously processors can be upgraded and multiple CPU hardware platforms can be purchased. The crux of tuning CPU usage is to find out what is causing the problem and then tune the cause rather than attempt to "do something about the CPU".
Is I/O activity intense and are CPUs very busy? If this is the case then it means the I/O is the same speed as CPU processing time. I/O activity should be much slower than the CPU. The CPU is a much faster piece of hardware than disk storage, without question. Excessive CPU activity can be due to a number of causes. We will concentrate on things related to Oracle Database rather than digging into the operating system too much.
Swapping and paging are not precisely the same thing. A swap or a page fault occurs when a chunk of data is required but that chunk is not in memory. The chunk of data is then retrieved from swap space on disk and exchanged with something else if there is not enough space in RAM. Thus the information is swapped between RAM and a swap file on disk.
Tip | A swap file in Windows is called a paging file. The Windows paging file is added to RAM and constitutes virtual memory. The Windows paging file can be found in the System icon on the Control Panel. |
Swapping or paging between disk "memory" and RAM physical memory is not a problem unless it occurs frequently, generating large amounts of I/O. Sometimes a process can cause a lot of swapping over and over again.
Swapping and paging can cause unnecessary use of the CPU, with respect to Oracle Database, often as a result of something caused within an Oracle Database configuration or an application activity. The obvious solution to swapping and paging is purchasing more RAM, as an easy short-term solution. However, as a system scales, if other problems are continuously placated with hardware upgrades, those hardware upgrades may eventually become very expensive.
Unnecessary CPU use can often be the result of Oracle Database activity and related applications behavior.
Poorly tuned SQL code can cause a lot of re-parsing, which can cause intense CPU activity. Following are parsing and execution statistics.
COL name FORMAT a20; SELECT name, value FROM v$sysstat WHERE name LIKE '%parse%' OR name LIKE '%execute count%'; NAME VALUE -------------------------- -------- parse time cpu 1618 parse time elapsed 15096 parse count (total) 82683 parse count (hard) 293 parse count (failure) 8 execute count 184523
In the next query currently executing SQL code is interrogated. Note that the Product column is a product of both parses and executions. The frequency of parses is more important than just the parses or executions.
COL sql_text FORMAT A38; SELECT * FROM( SELECT parse_calls*executions "Product", parse_calls "Parses" ,executions "Execs", sql_text FROM v$sqlarea ORDER BY 1 DESC) WHERE ROWNUM <= 10; Product Parses Execs SQL_TEXT -------- ----- ----- ------------------------ 53740146 2454 21899 INSERT into cashbookline values(:b3,:b 53740146 2454 21899 INSERT into ordersline values(:b4,:b3, 53740146 2454 21899 INSERT into transactionsline values(:b 53740146 2454 21899 INSERT into stockmovement values(stock 49695750 3250 15291 SELECT to_number (to_char(SYSTIMESTAMP, 40742689 6383 6383 COMMIT 37466641 6121 6121 INSERT into generalledger values (gene 37466641 6121 6121 INSERT into generalledger values (gene 32890000 3250 10120 SELECT systimestamp FROM sys.dual 10575504 3252 3252 select privilege#,level from sysauth$
Increasing the SESSION_CACHED_CURSORS parameter allows caching of more SQL statements per session, potentially helping to reduce parsing. However, use of bind variables in SQL code will reduce parsing much more significantly, especially for highly concurrent OLTP databases where SQL code is highly shareable and if SQL code is poorly tuned.
Missing indexes or too many large composite indexes can cause a lot of full table scans, leading to high I/O, swapping and probably high CPU usage times. Lack of use of indexes is most likely indicated by too many full table scans, particularly those on large tables, as indicated in the following query by full table scans on long tables (table scans (long tables)).
COL name FORMAT a32; SELECT name, value FROM v$sysstat WHERE name IN ('table scans (short tables)' ,'table scans (long tables)','table fetch by rowid'); NAME VALUE --------------------------- ------ table scans (short tables) 40420 table scans (long tables) 111 table fetch by rowed 85673
Problems with overusage of rollback and undo can possibly be resolved by increasing COMMIT command frequency in SQL code and reducing the number of SQL statements in single transactions, reducing transaction size. However, higher commit rates will lead to higher CPU usage. The following query can be used to check rollback segment usage in relation to extension and shrinkage. Similar information can be obtained from the V$UNDOSTAT performance view for automated undo. My database is currently using manual rollback segments. Automated undo tends to be less controllable with disk space. I have very limited disk space. On the other hand, automated undo simply removes much of the tuning requirements for manual rollback. However, it does not hurt to understand how rollback works because rollback and automated undo are essentially performing the same function.
Note | Oracle Database 10 Grid Manual rollback is deprecated. |
COL RBS FORMAT a4; SELECT n.name "RBS", s.extends "Extends", s.shrinks "Shrinks" ,s.wraps "Wraps", s.aveshrink "AveShrink" ,s.aveactive "AveActive" FROM v$rollname n JOIN v$rollstat s USING(usn) WHERE n.name != 'SYSTEM';
This query shows wrapping on all manual rollback segments. Rollback segment wraps are not a serious problem. It is a commonly held misconception that wrapping is reuse of the first extent of the currently used rollback segment, causing a snapshot too old error. On the contrary, wrapping is expanding a transaction into a new extent inside the same rollback segment. Excessive wrapping could imply that storage parameters within a rollback segment might be adjusted upwards to prevent wrapping.
RBS Extends Shrinks Wraps AveShrink AveActive ----- ------- ------- ----- --------- --------- RB00 0 0 13 0 779254 RB01 0 0 12 0 749639 RB02 0 0 12 0 749639 RB03 0 0 12 0 749330 RB04 0 0 13 0 779254 RB05 0 0 12 0 749330 RB06 0 0 12 0 749330 RB07 0 0 12 0 749330 RB08 0 0 13 0 779254 RB09 0 0 11 0 716991 RB10 0 0 13 0 778713 RB11 0 0 13 0 779254 RB12 0 0 12 0 749330 RB13 0 0 12 0 749330 RB14 0 0 12 0 749639 RB15 0 0 12 0 749639
Now let's take a different perspective and examine the amount and spread of activity within rollback segments.
COL RBS FORMAT a4; SELECT n.name "RBS", s.status, s.waits, s.gets, s.writes ,s.xacts "Active Trans" FROM v$rollname n JOIN v$rollstat s USING(usn) WHERE n.name != 'SYSTEM';
This query shows a few interesting things. Transactions are fairly well spread across all the rollback segments, which is good. However, it might be that there are too many transactions per rollback segment. The impact of transactions per rollback segment depends somewhat on transaction size, which is extremely small in my database. There appears to be a fair amount of waiting for space in rollback segments, which is not so good.
RBS STATUS WAITS GETS WRITES Active Trans ----- ------ ----- ----- -------- ------------ RB00 ONLINE 9 6452 10669902 5 RB01 ONLINE 15 6375 10485074 4 RB02 ONLINE 14 6614 10670958 5 RB03 ONLINE 14 6800 10670618 5 RB04 ONLINE 19 6872 10759932 4 RB05 ONLINE 7 6431 10527624 5 RB06 ONLINE 13 6478 10707842 4 RB07 ONLINE 13 6345 10324652 4 RB08 ONLINE 10 6662 11237024 4 RB09 ONLINE 16 6270 9979316 4 RB10 ONLINE 14 6668 11021116 5 RB11 ONLINE 11 6412 10775736 3 RB12 ONLINE 13 6762 10702146 4 RB13 ONLINE 6 6476 10279358 5 RB14 ONLINE 12 6497 10475656 5 RB15 ONLINE 9 6341 10398742 4
Since transaction waits on rollback segments appear to occur let's dig further in that direction. This query will give an impression of rollback segment contention or competition for use of rollback segments. At just over 3% it might be warranted to create more rollback segments. It certainly would not do any harm.
SELECT ROUND(SUM(waits/gets)*100,2)||'%' "Contention" FROM v$rollstat; Contention ---------- 3.34%
I originally had 15 consistently sized rollback segments, which has now been increased to 50 rollback segments. Let's do some further querying to see if there is any improvement.
COL RBS FORMAT a4; SELECT n.name "RBS", s.status, s.waits, s.gets, s.writes ,s.xacts "Active Trans" FROM v$rollname n JOIN v$rollstat s USING(usn) WHERE n.name != 'SYSTEM' ORDER BY 1;
This query looks at the first 10 of 50 rollback segments only. Transactions are now better spread across rollback segments with fewer active transactions in each rollback segment. Additionally there are much fewer waits.
RBS STATUS WAITS GETS WRITES Active Trans ----- ------ ----- ----- -------- ------------ RB00 ONLINE 0 149 158860 1 RB01 ONLINE 0 134 167698 1 RB02 ONLINE 0 145 203588 1 RB03 ONLINE 0 155 186144 2 RB04 ONLINE 0 149 147222 1 RB05 ONLINE 2 276 166638 2 RB06 ONLINE 0 150 151052 1 RB07 ONLINE 0 116 171126 1 RB08 ONLINE 0 145 215020 1 RB09 ONLINE 3 133 202338 1
Looking at contention again, much to my surprise it has actually increased. This is a classic example of a ratio giving a false impression. I just restarted my database so all rollback statistics have been cleared. As time passes this ratio will decrease. When I first started up my database after adding new rollback segments the ratio was over 12%. It has now dropped to around 7%. Over time it will decrease further. Additionally I am currently running over 100 jobs whereas over the last few hours, prior to bouncing my database, I was only running 10 jobs per second. The contention percentage shown previously was misleading, as is the one below. The previous percentage for contention was based on a much smaller load on the database. This current contention value shown here is misleading because it requires a period of time to settle down and build up statistics.
COL contention FORMAT 9999999990; SELECT AVG(xacts) "Trans per RBS" ,ROUND(SUM(waits/gets)*100,2)||'%' "Contention" FROM v$rollstat; Trans per RBS Contention ------------- ---------- 1.35294118 6.92%
This query shows a picture of automated undo segment statistics when executing highly active concurrency processing in my Accounts schema.
COL undob FORMAT 99990; COL trans FORMAT 99990; COL snapshot2old FORMAT 9999999990; SELECT undoblks "UndoB", txncount "Trans" ,maxquerylen "LongestQuery", maxconcurrency "MaxConcurrency" ,ssolderrcnt "Snapshot2Old", nospaceerrcnt "FreeSpaceWait" FROM v$undostat; UndoB Trans LongestQuery MaxConcurrency ----- ----- ------------ -------------- 186 24596 30 66 4152 24339 71 67 3703 18835 58 62 2991 13644 143 74 2734 9175 13 1 2696 5372 12 3 975 1383 12 2
Serious problems with rollback segments can sometimes cause snapshot too old errors. Snapshot too old errors occur when a rollback segment runs out-of-space and uncommitted transactions are overwritten. Do not confuse snapshot too old errors with wrapping in manual rollback segments. Adjust the UNDO_RETENTION parameter to remove snapshot too old errors when using automated undo segments for rollback.
Temporary sort space can be interrogated using a query such as the following query. Increasing the SORT_AREA_SIZE parameter decreases requirements for temporary sort space on disk but potentially increases memory requirements for database connections.
COL username FORMAT a10; COL segtype FORMAT a10; SELECT username, segtype, extents "Extents Allocated" ,blocks "Blocks Allocated" FROM v$tempseg_usage; USERNAME SEGTYPE Extents Allocated Blocks Allocated -------- ------- ----------------- ---------------- ACCSTEST SORT 47 6016
Table or row locking and latch waits can cause waiting, sleeping, and spinning of processes. Locks in the database are possibly indicative of poor data model design or perhaps use of trigger code for event detection. Latch waits can be caused by poor buffer configuration parameters and, once again, poor SQL coding. It is important to note that buffer cache areas can be inappropriately sized in one of four ways. They can be too small, they can be the wrong sizes in relation to other buffers, they can be sized incorrectly according to application requirements, and they can even be too big. Making buffers too small can cause latch contention issues. Very large buffers may require intense management by Oracle Database. There can come a point where the database buffer cache or the shared pool can become so large as to somewhat negate their usefulness. Additionally large buffers increase the temptation to create sloppy SQL code. Many database administrators simply increase buffer sizes when there is a problem, without ever solving the root cause of that problem. Quite often that problem is caused as a result of a low ratio such as the database buffer cache hit ratio. Ratios will be examined in Chapter 18. Ratios provide possible symptoms of problems but are not problems in themselves. Oracle Database tuning documentation often stresses the importance of ratios far too much. Ratios are symptomatic of problems and should not be used as a measurement of performance but rather an indicator of possible performance problems, which are not necessarily directly related to those ratios.
Various performance views can be used to track locks and latch wait events. Wait events, latches, and the Oracle Database Wait Event Interface will be covered in Chapters 19-22. The V$SYSTEM_EVENT performance view can be used to determine wait events for free latches.
COL event FORMAT a30; COL waits FORMAT 9999990; COL timeouts FORMAT 99999990; COL average FORMAT 99999990; SELECT event "Event", time_waited "Total Time", total_waits "Waits", average_wait "Average", total_timeouts "Timeouts" FROM V$SYSTEM_EVENT WHERE event = 'latch free' ORDER BY EVENT; Event Total Time Waits Average Timeouts ---------- ---------- ----- ------- -------- latch free 77421 23645 3 3304
There are many more events than this.
High network activity with many small queries can cause problems with high CPU activity. Be aware that breaking SQL code into small constituent parts is more natural to object designs rather than relational design. On the contrary, reduction of complexity into smaller parts leads to simplicity in tuning, especially when it comes to SQL code. There is a fine balance between too many small transactions and large complex SQL code join statements. Do not be concerned about network activity with SQL*Net too much. If you are sure that your network is the bottleneck take a look at your network, never overcomplicate SQL code to compensate for other problems.
The next query shows idle SQL*Net events. These values can help to describe Oracle network usage activity. However, for an OLTP database server high network activity is normal.
SELECT name FROM v$EVENT_NAME WHERE name LIKE '%SQL*Net%' OR name LIKE '%ispatcher%' OR name LIKE '%virtual circuit%'; NAME ------------------------------- alter system set dispatcher dispatcher shutdown virtual circuit status dispatcher timer dispatcher listen timer SQL*Net message to client SQL*Net message to dblink SQL*Net more data to client SQL*Net more data to dblink SQL*Net message from client SQL*Net more data from client SQL*Net message from dblink SQL*Net more data from dblink SQL*Net break/reset to client SQL*Net break/reset to dblink
| < Day Day Up > |
|