We begin by looking at one of the key components of these enhancements in Oracle Database 10g: the Automatic Workload Repository, or AWR. For those of you who are familiar with StatsPack from previous releases, AWR will have a familiar look and feel. For those who are new to Oracle, or who did not take advantage of StatsPack in previous releases, we begin with an overview of how it works. Later in this section, we will discuss how AWR differs from the older versions of StatsPack.
AWR is essentially a job-based, scheduled collection of statistics, gathered and stored in the Oracle database, containing information about the database itself. By default, this metadata repository is kept in a new tablespace in Oracle Database 10g called SYSAUX. The script catawr.sql creates the Workload Repository, with objects owned by SYS. Workload Repository object names are prefixed with WRH$, WRI$, or WRM$. When you create a new database in Oracle Database 10g, the Workload Repository schema and the job to gather statistics are created automatically as part of database creation. By default, statistics collection runs every 60 minutes, gathering information from within the database on stats such as I/O waits and other wait events, CPU used per session, sorts, I/O rates on various datafiles, and so forth. For this to occur, the parameter STATISTICS_LEVEL must be set to ALL or TYPICAL (TYPICAL being the default). At the end of a run, the repository is updated with current information for that period. Reports can then be generated, using begin and end values corresponding to previous Workload Repository runs, to determine what was happening within the database during a given period of time.
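If you want to confirm these settings from SQL*Plus rather than Enterprise Manager, a quick sketch like the following will do; it assumes you are connected as a suitably privileged user, and uses the DBA_HIST_WR_CONTROL view from the 10g data dictionary:

```sql
-- Confirm that statistics collection is enabled (TYPICAL or ALL)
SHOW PARAMETER statistics_level

-- Check the current AWR snapshot interval and retention period
SELECT snap_interval, retention
  FROM dba_hist_wr_control;
```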
Since AWR runs are scheduled by default when the database is created, nothing special need be done to enable this functionality. As mentioned above, AWR runs at 60-minute intervals, collecting stats and storing that information. As you can imagine, however, this can lead to a rather large repository over time, so by default the data is purged after seven days. Should you wish to change these defaults (either the frequency of the runs or how soon the repository is purged), it is easy enough to do so using Enterprise Manager. Should you wish to disable AWR altogether, you may do that as well; however, we strongly recommend that you do not. The overhead for AWR is minimal, and the statistics it gathers can come in handy at the most unexpected times.
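The same defaults can be changed from the command line with the MODIFY_SNAPSHOT_SETTINGS procedure of the DBMS_WORKLOAD_REPOSITORY package. Note that both parameters are expressed in minutes; the values below are illustrative, setting a 30-minute interval and 14-day retention:

```sql
BEGIN
  DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
    interval  => 30,            -- take a snapshot every 30 minutes
    retention => 14 * 24 * 60   -- keep 14 days of snapshots (in minutes)
  );
END;
/
```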
As mentioned in the previous section, the functionality of the Workload Repository can best be utilized within Enterprise Manager (EM). Thus, the majority of examples in this chapter will be EM-centric. However, for those of you who are command-line junkies, the API for the Workload Repository is a new one, aptly named DBMS_WORKLOAD_REPOSITORY. This can be used to create snapshots, drop snapshots, build baselines, and so forth. We will not be delving into the details of the DBMS_WORKLOAD_REPOSITORY package, but for those of you who are so inclined (or are desperately in need of some heavy reading), please refer to the Oracle Database 10g Release 1 Guide titled 'PL/SQL Packages and Types Reference,' which is part of the Oracle documentation set.
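As a small taste of the package, the following sketch shows two of the most common calls; the snapshot IDs are illustrative and would need to match real snap IDs in your repository:

```sql
-- Take an ad hoc snapshot, outside the normal 60-minute cycle
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

-- Remove a range of snapshots that is no longer needed
BEGIN
  DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(
    low_snap_id  => 100,
    high_snap_id => 110);
END;
/
```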
OK, so the Workload Repository is populated every 60 minutes with statistics, and this happens automatically. That's nice. Now what? What do I do with this information? The answer is that you define a period of time, bounded by AWR runs, and turn that information into a report. When creating an AWR report, you define the beginning and end of the reporting interval using snapshot IDs from the Workload Repository: your report will begin at one snapshot and end at a later snapshot.
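To choose those begin and end snapshot IDs from the command line, you can list the snapshots currently in the repository; the column names below are from the 10g DBA_HIST_SNAPSHOT view:

```sql
-- List available snapshots and the interval each one closed out
SELECT snap_id,
       begin_interval_time,
       end_interval_time
  FROM dba_hist_snapshot
 ORDER BY snap_id;
```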
An AWR report can be viewed in Enterprise Manager from the Administration page. From there, select Automatic Workload Repository under the Workload section, and then click on the underlined number next to Snapshots. Note that here you can also change the interval for AWR; we will cover these steps in more detail in an HA Workshop later in this section. As mentioned previously, you can determine the interval of the reports, with the beginning of the interval going back as far as the repository keeps the data. Reports can be viewed as a full report in HTML format, or you can view the data with pertinent information summarized in the details view. If you prefer the old text look of the reports, as in the Oracle9i and Oracle8i StatsPack reports, a text report must be generated using the AWR_REPORT_TEXT function of the DBMS_WORKLOAD_REPOSITORY command-line API.
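For example, a text report can be pulled directly in SQL*Plus with the AWR_REPORT_TEXT table function. The DBID and instance number below are placeholders that must be replaced with your own values (readable from V$DATABASE and V$INSTANCE), and snap IDs 100 and 101 are illustrative:

```sql
-- Generate a StatsPack-style text report between two snapshots
SELECT output
  FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT(
         l_dbid     => 1234567890,  -- your DBID, from V$DATABASE
         l_inst_num => 1,           -- from V$INSTANCE.INSTANCE_NUMBER
         l_bid      => 100,         -- beginning snapshot ID
         l_eid      => 101));       -- ending snapshot ID
```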
The report, when viewed in its full form, begins with a heading that provides database information such as DB_Name, DBID, instance_name, and so on. The begin and end snap times are also shown, with the total elapsed time for the report displayed in minutes. This interval should not be too long, or the results will be diluted. The default interval of 60 minutes is usually a good starting point, but when troubleshooting a specific issue, you may want to lower it to 30 minutes to get more granularity in your data. After the heading information, the report is broken out into several sections covering the various types of statistics of value.
When reviewing the report, the Report Summary section at the top gives a quick overview of the database performance. The ending cache sizes are shown, as well as the load profile, instance efficiency percentages, shared pool statistics, and the top five wait events. This report summary gives a quick look at the system, allowing an immediate review of the most common sources of bottlenecks. The Instance Efficiency section gives a quick look at the buffer and library cache hit ratios, sorts, parsing information, and so on, and lets you see immediately whether these areas of memory need to be increased. The load profile gives you an overall picture of how much of certain types of activity is going on in the database, such as physical reads and writes, sorts, parses, logons, and so forth. The top five timed events tell us where we are spending most of our time waiting, and are also now categorized into wait classes, which we will discuss later in this chapter. The wait classes give the HA DBA another opportunity for a quick look at the situation. For example, if the top five wait events are all I/O related, that is made readily apparent by the fact that all five events belong to the same wait class.
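Outside of a report, a similar wait-class rollup can be approximated with a quick query against the V$SYSTEM_WAIT_CLASS view (new in 10g); keep in mind these figures are cumulative since instance startup, with times in centiseconds:

```sql
-- Rank wait classes by cumulative time waited since startup
SELECT wait_class, total_waits, time_waited
  FROM v$system_wait_class
 WHERE wait_class <> 'Idle'
 ORDER BY time_waited DESC;
```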
If you are running in a RAC environment, the next section will contain a quick look at RAC statistics to give you an idea of the overall performance of your cluster. This section provides information on the workload characteristics of your cluster, and gives insight into how much traffic is going across the interconnect, how frequently instances need to read from the remote cache vs. the local cache, and what the average times are for these operations.
After the top sections, the main report body will follow. Here you will have links to various other sections of the report that are of interest, including sections on SQL statements ordered by various stats such as buffer gets, reads, executions, file and tablespace I/O stats, SGA stats, and so forth. These sections will help pinpoint specific areas that are common bottlenecks, by highlighting poorly tuned or frequently executed SQL, and/or by bringing to light I/O hot spots where disk response time is not acceptable. Each report contains a virtual goldmine of information on what is happening within the database, and what may be causing slowdowns at various stages of operation. Our purpose here is not to discuss each individual section, but rather to provide an overview of how to gather and review the reports. Automating the interpretation of these reports via ADDM will be discussed later in this chapter.
Evaluating the reports is also made easier if there is a comparison point. We may suspect, as we evaluate sections of the AWR report, that something does not look right; the top SQL may show SQL statements with an excessive number of buffer gets, for example. It would be invaluable to be able to compare the report to a similar point in time in the past, when similar work was being done and performance was better. For example, perhaps the same SQL was run at points in the past, but the buffer gets for that statement were much lower. This might indicate that an index has been dropped, or that the data distribution of the queried tables has somehow changed.
A baseline (also called a Preserved Snapshot Set) provides a comparison point to refer back to if and when performance suffers. The basic idea is that on a Monday morning (or any other time), when out of the blue the HA DBA is barraged with users advising that the database performance is less than exemplary, the first thing that he or she can do is compare the current performance (or, say, the performance over the past hour) to a baseline that was previously set up for that time frame. This gives the HA DBA an immediate idea of what is different from last week (or whatever point in the past is being compared to) and a head start on where to look for the solution to the problem.
As you can probably tell already, it is necessary to anticipate periods when you might expect to have performance problems. For this, you must understand the nuances of your particular business. For example, you may experience different peaks at different times of the day: a busy period may occur between 8:30 A.M. and 1:00 P.M., and another peak from 2:30 P.M. to 4:30 P.M. Fortunately, it is possible to create multiple baselines, each covering a different point in time, so you may decide to create a separate baseline for each of those periods.
On the other hand, it is difficult to create a baseline for every possible period. If performance problems occur with no appropriate baseline having been created, don't fret. Depending on how long the retention time is for the Workload Repository, you can go back and create the baseline from a similar time frame in the past. For that reason, we recommend that you increase the workload retention time from the default of seven days. You may want to consider two weeks, or even a value of just over a month, depending on your storage capacity and the overall impact this has on your system. Keep in mind that this information will be stored in the SYSAUX tablespace, so size the SYSAUX tablespace accordingly.
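Before extending the retention, it is worth checking how much of SYSAUX the repository is already consuming. One way to do this is via the V$SYSAUX_OCCUPANTS view; in 10g, the AWR occupant appears under the 'SM/AWR' name, which is what the LIKE filter below assumes:

```sql
-- Show SYSAUX space used by AWR and related server-manageability occupants
SELECT occupant_name, schema_name, space_usage_kbytes
  FROM v$sysaux_occupants
 WHERE occupant_name LIKE 'SM%'
 ORDER BY space_usage_kbytes DESC;
```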
Consider our scenario at the beginning of this section. Let's say the DBA did not have a baseline for the early Monday morning period when performance was bad, and let's assume that the problem began at 5:30 A.M. In this shop's line of work, Monday mornings are rather unique, and it is unusual for the system to be loaded this early. The DBA would like to compare this morning's workload to the same time last week. However, with the default retention time of seven days, stats from last Monday at 5:30 A.M. would already have been purged from the database. If the retention time were set to eight days, or fourteen days, the DBA could go back in time in the Workload Repository, create a quick baseline covering last Monday morning, and then compare the current day's snapshots to last week at the same time.
Comparing Current Stats to Baseline

The advantage of creating baselines is twofold. First, the baseline data remains in the repository indefinitely, until it is explicitly deleted, so it is not subject to being aged out like the rest of the snapshots. This means that the data will still be there in a month to refer back to and compare. Hence the term 'Preserved Snapshot Set.' The second advantage is the ability to quickly run comparisons of a snapshot to a baseline.
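From the command line, baselines are created and dropped with the same DBMS_WORKLOAD_REPOSITORY package; the snapshot IDs and the name below are illustrative:

```sql
BEGIN
  -- Preserve snapshots 120 through 126 as a named baseline
  DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(
    start_snap_id => 120,
    end_snap_id   => 126,
    baseline_name => 'Weekday Morning');

  -- Later, when the baseline is no longer needed:
  -- DBMS_WORKLOAD_REPOSITORY.DROP_BASELINE(
  --   baseline_name => 'Weekday Morning');
END;
/
```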
It is easiest to do the comparison using EM. From the Workload Repository section, you can select the option to Compare Timelines. This will allow you to first select the time frame that you wish to compare (presumably a period of time with poor performance), and then you can select a preserved snapshot set (or any other period of time within your retention range) to compare that to. The output will give you such fodder as comparison of block change rates for each period, physical reads, parse time, user commits, and so on. The display will show the timelines side by side, in graphical format, allowing you to quickly see the differentiators between the two timelines for the given periods.
HA Workshop: Exploring the Workload Repository
The Workload Repository is best understood by using Enterprise Manager, giving a graphical view of the snapshots available and masking the complexity of the APIs used. This example walks through using EM to view reports, edit the retention policies, and create baselines for comparison.
Step 1. Log in to EM using the following URL, substituting in the hostname of the database server for <server_name>:
If using Enterprise Manager Grid Control (discussed in Chapter 5), use the machine name for the management repository database, with port 7777, as such:
Navigate to the Workload Repository by clicking on the Administration link for your particular database, and then, under Workload, click the Automatic Workload Repository link.
Step 2. Note the Snapshot Retention time (in days), the Snapshot Interval, and the Collection Level, under the General section. Click on Edit to change these. Set the Retention Period value to 30 days, and the System Snapshot Interval to 30 minutes, as shown in Figure 3-1, and then click OK.
Figure 3-1: Changing default snapshot retention time
Step 3. Back on the main Workload Repository page, in the Snapshots section, click on the underlined number next to Snapshots. This link will propel you into the Snapshots screen, where you can view the times of all AWR snapshots that currently reside in the repository.
Step 4. To view a report, go to the box titled Actions and change the selection to View Report. Click the radio button next to an existing snap ID (not the last one) and then click Go. The next page will display all snapshots created after the beginning snap ID you selected. Select the one created directly after the beginning snap ID (this gives us a 30-minute interval), and then click OK. After a few seconds, you will see the Snapshot Details screen, with the option of viewing the data in report format or details format. Save the report by clicking the Save to File button on the Report screen.
Step 5. Next, to create a baseline, return to the Snapshots screen by clicking on the Snapshots link across the top. For this exercise, we will create a baseline for the time period of 9:00 A.M. to 12:00 P.M., on a weekday, so select the radio button next to an existing snapshot with a capture time as near as possible to 9:00 A.M. on a weekday. Ensure that Create Preserved Snapshot Set is selected in the Actions box, and choose Go.
Step 6. On the Create Preserved Snapshot Set page, give the Snapshot Set a name of Weekday Morning and then select the Ending Snapshot to be the snapshot ID closest to 12 P.M. on the same day. Click OK to create the baseline (preserved snapshot set).
Step 7. Now, to compare a report interval to the baseline you have just created, return to the Snapshots screen by clicking on the Workload Repository link and then the Snapshots link again (see Step 3). In the Actions box, select Compare Timelines and, for the beginning snap ID, choose a snapshot from 9:00 A.M. on a weekday other than the baseline you just created.
Step 8. Choose the ending snapshot of 12:00 P.M. on that same day and choose Go.
Step 9. On the Second Timeline Start screen, choose the Select a Preserved Snapshot Set option and then select the weekday morning baseline that was created in Step 6. Click Next, and then click Finish to get the Compare Timelines: Results screen. This screen will give you a graphical comparison between the baseline (second timeline) and the date that you had selected for comparison.
You do not have to have a preserved snapshot set to do the comparison-you can compare intervals between any two periods for which you still have snapshots in the repository, based on your snapshot retention period.