Section 2.2. Making a Good Specification | Optimizing Oracle Performance

2.2 Making a Good Specification

Let's stop fooling around with faulty project specifications and start constructing some good ones. It shouldn't take you more than a couple of hours to create a good specification for most performance improvement projects. Here's how:

Identify the user actions that the business needs you to optimize, and identify the contexts in which those actions are important.
Prioritize these user actions into buckets of five.
For each of the actions in your top bucket, determine whom you can observe executing the action in its suboptimal context and when you can make the observation.

2.2.1 User Action

In this book, I try to make a careful distinction between user actions, programs, and Oracle sessions. A user action is exactly what it sounds like: an action executed by a user. Such an action might be the entry of a field in a form or the execution of one or more whole programs. A user action is defined as some unit of work whose output and whose performance have meaning to the business. The notion of user action is especially important during project specification because the user action is precisely the unit of work that has business meaning.

A program is of course a sequence of computer instructions that carries out some business function. A user action might be a program, a part of a program, or multiple programs. An Oracle session is a specific sequence of database calls that flow through a connection between a user process and an Oracle instance. A program can initiate zero or more Oracle sessions, and in some configurations, more than one program can share a single Oracle session. The notion of an Oracle session is important during data collection because the Oracle kernel keeps track of performance statistics at the Oracle session level.

Oracle does make a distinction between a connection (a communication pathway) and a session. You can be connected to Oracle and not have any sessions. On the other hand, you can be connected and have many simultaneous sessions on that single connection.

2.2.2 Identifying the Right User Actions and Contexts

The first step in your specification is to identify the user actions that the business needs you to optimize. If you mess up this step, it is likely that your performance improvement project will fail. It is vital for you to obtain a list of specific user actions. The ones you select should be the ones that are the most important in the business's pursuit of net profit, return on investment, and cash flow.

I emphasize "that the business needs you to optimize" because you are specifically not looking for a database administrator's opinion about performance at this point. One of the most common mistakes that Oracle performance analysts make is that they consult their V$ views to learn where their system needs "tuning." Your V$ views can't tell you. I'll describe in Chapter 3 some of the technical reasons why it's unreliable to consult your V$ views for this information.

Finding out what your business needs is usually easy. It is almost never the result of a long goal-definition project. It is almost always the result of asking a business leader who speaks in commonsense language, "If we could make one program faster by the end of work today, which program would you choose?" The following examples illustrate the type of response that you're looking for:

We manufacture disk drives. We have a warehouse full of disk drives that are ready to ship. We receive hundreds of telephone calls each morning from angry customers who placed orders with us over two weeks ago, demanding to know the status of their shipments. At any given time, there is an average of over two dozen empty FedEx trucks parked at our loading dock. If you go down to the loading dock, you can see that our packers and the truck drivers are sitting on boxes drinking coffee right now. They can't load the boxes on the trucks because the program that prints shipping labels is too slow. Our business's most important performance problem is the program that prints shipping labels.
We're spending too much on server license and maintenance fees. We have 57 enterprise-class servers in our shop, and we need to cut that number to ten or fewer. We already house 80% of our enterprise data on one large storage area network (SAN). However, our total CPU workload that is presently distributed across 57 servers is probably too large to fit onto ten machines. Our business's most important performance problem is eliminating enough unnecessary CPU workload so that we can perform the server consolidation effort and ditch about fifty of our servers.

The hardest part is usually gaining access to the right people in the business to get the information you need. You might have to dig a little bit for your list. The following techniques can help:

Ask your boss where the performance risks are: Steer him away from answers that refer to technical components of the database. Force the conversation into the domain of user language. Ask which user is giving him the most flak about system performance, and then book a lunch with the user. The loudest user is not necessarily the one with the business's most critical problem, but understanding that user's problems is probably a good start.
Take a user to lunch: Buy him a sandwich, and ask down-to-earth questions like, "If I could make something you use faster today, what would you want it to be?"
Find a sales forecast for your business: Consider which application processes are going to be the most important ones to facilitate your company's planned sales growth. Are those processes running as efficiently as they can?

If you get stuck in your conversations with people with whom you're trying to identify user actions that are important to the business, ask them which actions fit into these categories:

Actions that are business critical
Actions that run a long time
Actions that are run extremely often
Actions that consume a lot of capacity of a resource you're trying to conserve

In addition to identifying which user actions require optimization, you need to identify the context in which those actions are important. For example:

Is the action always slow?
Is it slow only at a particular time of day (week, month, or year)?
Is it slow only when it runs at the same time as some other program(s)?
Is it slow only when the number of connected users exceeds some threshold?
Is it slow only after some other program runs (upload, delete, etc.)?

Without context, you run the risk that you'll collect performance diagnostic data for the "problem" action and then find after all your effort that there's apparently nothing wrong with it. You have to identify how to find the user action when it is performing at its worst. Otherwise, you're not going to be able to see the problem. This concept is so important that I'll say it again:

You have to identify how to find the user action when it is performing at its worst.

In this step, it is usually important to select more than one user action, especially in situations where many users perceive many different performance problems. This is true even in situations where the number-one system performance problem has a priority that clearly exceeds everything else on the system. The reasons for this advice come from the experience of using the method many times:

Because cost is a factor in net benefit, the business net benefit of improving, for example, user action #3 may actually exceed the business net benefit of improving user action #1.
Producing significant improvement quickly in any of a system's top five most important performance problems can create a significant political advantage, including factors like project team morale and project sponsor confidence.
You might not know how to improve performance for user action #1. But fixing, for example, user action #3 may eliminate so much unnecessary workload that #1 becomes a non-issue.
You can't tell which performance improvement action will produce the greatest net benefit to the business until you can see a high-level cost-benefit analysis for the user actions in your top-five bucket.
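The first and last reasons above come down to simple arithmetic. The following is a minimal sketch, not part of the method itself: the action names and the benefit and cost figures are invented for illustration, and net benefit is assumed to be simply the estimated benefit minus the estimated cost.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One user action with rough, hypothetical business estimates."""
    name: str
    importance_rank: int      # 1 = most important to the business
    expected_benefit: float   # estimated value of the fix (any currency unit)
    expected_cost: float      # estimated cost of the fix (same unit)

    @property
    def net_benefit(self) -> float:
        # Assumption for this sketch: net benefit = benefit - cost.
        return self.expected_benefit - self.expected_cost


# Invented figures, for illustration only.
candidates = [
    Candidate("print shipping labels", 1, 100_000.0, 90_000.0),
    Candidate("book an order", 2, 60_000.0, 40_000.0),
    Candidate("nightly invoice batch", 3, 50_000.0, 5_000.0),
]

# Order the top bucket by net benefit rather than by raw importance.
by_net_benefit = sorted(candidates, key=lambda c: c.net_benefit, reverse=True)
for c in by_net_benefit:
    print(f"#{c.importance_rank} {c.name}: net benefit {c.net_benefit:,.0f}")
```

With these invented numbers, action #3 comes out ahead of action #1 because it is so much cheaper to fix, which is why the comparison is worth making across the whole top bucket before committing to one action.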

2.2.3 Prioritizing the User Actions

Once you have constructed the list of candidate user actions, you need to rank the importance of their improvement to the business. Everything you do later will require that you have chosen the most important actions to optimize first. Business prioritization is vital for several reasons, including:

The most important actions will get fixed the soonest: This is the most important reason. Quite simply, if you don't optimize the most important business processes first, then you're not optimizing.
Trade-off decisions will always favor more important user actions: On occasion, you may find that an optimization for one user action inflicts a performance penalty upon another. This happens frequently when the optimization strategy you choose is to increase the capacity of some component. However, because I hope to convince you to increase capacity only when necessary (that is, rarely), such trade-offs should be rare.
Less important user actions enjoy collateral benefits: The term collateral damage has been introduced into our language by discussions of accidents that occur during wartime. The opposite of collateral damage is collateral benefit: a benefit yielded serendipitously by attending to something else. Collateral performance benefits occur frequently on computer systems in which we eliminate huge amounts of unnecessary work.

It's easy to over-analyze at this stage, but there's actually no need to spend much time here. All you need are rough categories. I recommend grouping your user actions into prioritized buckets of no fewer than five. This way, you won't be tempted to obsess over the precise ranking of actions that are close in importance. For example, if you have ten important problem user actions, then create no more than two groups of five. If you have more than ten problem actions (I've visited sites whose lists numbered in excess of fifty), then I suggest partitioning your list into three parts:

The five most important user actions (your first bucket).
The five next most important user actions (your second bucket).
The remainder of the important user actions you've listed (the union of your third and subsequent buckets).
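The three-part split described above can be sketched in a few lines. This is an illustrative helper only: the function name `partition_into_buckets` and the placeholder action names are hypothetical, and the list is assumed to be already ordered roughly from most to least important.

```python
def partition_into_buckets(ranked_actions, bucket_size=5):
    """Split a roughly ranked list into three parts.

    Returns (first bucket, second bucket, remainder), where the remainder
    holds everything after the first two buckets.
    """
    first = ranked_actions[:bucket_size]
    second = ranked_actions[bucket_size:2 * bucket_size]
    remainder = ranked_actions[2 * bucket_size:]
    return first, second, remainder


# Thirteen placeholder actions, most important first.
ranked = [f"action {i}" for i in range(1, 14)]
first, second, remainder = partition_into_buckets(ranked)

print("First bucket: ", first)
print("Second bucket:", second)
print("Remainder:    ", remainder)
```

The order of actions inside a bucket is deliberately ignored; only bucket membership matters at this stage.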

Be especially wary of executing any prioritization task with the participation of large groups. Every user, of course, will try to convince you that his actions are the very most supremely important actions on the entire system. And of course, every action on the system cannot take top priority. Most of the time that you might spend negotiating whether a user action belongs in one group or another could be invested more wisely in other steps of the method. If you find that the whole prioritization task is consuming more than just a few minutes, then step back and just make some sensible decisions. Assure the users whose actions don't fall into the top priority class that they haven't lost anything; you'll attend to their problems too.

2.2.4 Determining Who Will Execute Each Action and When

The final step in the construction of a good spec for your performance improvement project is the specification of how you'll be able to find each targeted action when it next runs in its targeted context. This information will allow you to find the programs implementing those actions so that you can measure their performance.

Often, the success of a diagnostic data collection effort will be determined by your ability to establish simple human contact with a person who will execute the slow action and answer the following simple questions:

When is the next time that this person expects the action to exhibit the performance problem?
How can you watch?

The answers to these questions unambiguously define the parameters you'll use for your diagnostic data collection process, which I describe in Chapter 3.
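One way to keep these answers together with the action and its context is a small record per targeted action. The sketch below is only an illustration: the class name `ObservationPlan`, its field names, and the sample values (including the date) are all hypothetical and do not come from any tool described in this book.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ObservationPlan:
    """Hypothetical record of one targeted user action in the spec."""
    action: str                 # the user action the business wants optimized
    context: str                # the conditions under which it is slow
    executed_by: str            # the person who will run the action
    next_occurrence: datetime   # when that person expects the problem next
    how_to_watch: str           # how the analyst will observe the execution

    def is_complete(self) -> bool:
        """True when every text field has been filled in."""
        fields = (self.action, self.context, self.executed_by, self.how_to_watch)
        return all(s.strip() for s in fields)


# Invented sample values, for illustration only.
plan = ObservationPlan(
    action="print shipping labels",
    context="weekday mornings while order entry is busy",
    executed_by="loading dock supervisor",
    next_occurrence=datetime(2024, 1, 15, 8, 30),
    how_to_watch="",  # not yet agreed with the user
)
print(plan.is_complete())
```

A record that is not yet complete is a reminder that the "How can you watch?" question still needs an answer before data collection can be scheduled.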

If you have a tool that constantly monitors the appropriate performance statistics for every individual user action on your system, then predicting who will run a problem program and when it will happen becomes unnecessary. The luxury of having such data for every user action on your system will allow you to respond to complaints about actions in the recent past instead of having to predict their occurrences in the imminent future. Such tools are expensive, but they do exist.

If you do not own such a tool, then you'll have to be more selective in which diagnostic data you'll want to collect, and the step described in this section will be essential. For you, I hope that Chapter 6 and Chapter 8 will provide significant value.
