Managing Performance: A Proven Methodology


Although you can use many approaches to address performance and optimization management with WebSphere, I've found one that tends to synchronize well with the WebSphere platform. It also has the added benefit of being relatively straightforward.

In essence, it's a hybrid of several old-style methodologies combined with newer schools of thought and best practices. For most environments, it'll work well; however, be careful to validate and confirm the approach before diving in. Attempting to overlay a complex methodology onto something as large and complex as WebSphere is destined for failure. Instead, a straightforward and logical approach is essential. This methodology is also a living one; that is, you should continue to use it proactively during the lifetime of the system.

Before looking at the methodology itself, you should consider some key points for commencing a performance management program. The next sections highlight these.

Performance Management 101

As in any testing situation, you can break down the basic scope of effort into four parts: the aim, the method, the test, and the results. Although this is a discussion of fundamentals, I believe it's important enough to consider in this context.

Considering these four key points, let's outline a brief performance management approach that the performance methodology, discussed shortly, sits on top of.

Problems found in the results typically can be fed back into the aim of another cycle. Using this aspect of performance management, the process should cycle around continuously, being used proactively to identify and help rectify issues.

The Aim of Performance Management

The aim is sometimes not as obvious as you'd initially think. Consider an underperforming system that needs to be fixed. In this case, the aim is to increase performance by a determined factor. That factor is defined by business sponsors, end user demand, legal agreements, and so forth.

Take, for example, a situation where the aim is to resolve an application performance problem, specific to database-related transactions, where a certain query is taking five seconds to respond. This query is fundamental to the application's functionality, and its lengthy delay is causing end users to complain.

The aim is therefore to bring the database response times in under an acceptable threshold. Let's assume that the SLA for the full end-to-end user transaction is five seconds, of which, in this example, the database query itself is consuming the entire SLA, leaving no overhead for content composition and so on. It can also be said that the aim is the problem you're attempting to solve.

To get the SLA back under its threshold, the system manager has determined that the SQL query must not take any longer than one second. Therefore, the aim of the performance management approach is to perform analysis on the application environment to bring the database query response time down from five seconds to one second.

The Method of Performance Management

Now that you have the aim, the performance management approach needs to define a method for analysis and testing (quantification). This part of your management approach should address how the analysis will take place and the tools you'll be using.

Continuing with the example from the aim section, the method of analysis will be to run a SQL query monitor on the database server to determine what's wrong with the query itself, as well as to monitor the output from the WebSphere Resource Analyzer to investigate what Java components, if any, are causing the delay.

The method should also include contingency plans for backing out any monitoring components or changes to environments because of an issue in operating the tests.

Testing Performance Management

The test to conduct the analysis will be to run the monitors in the production environment to get the most accurate data set in the least amount of time. This test will operate for 12 hours, during both peak and low utilization periods. The system manager will monitor it continuously to ensure that the monitoring doesn't affect the actual operation of the production system.

The attributed requirements from the aim should be repeatable within the test. This ensures that an averaged result is obtained so that singular uncontrolled events (external issues and so on) don't impact or interfere with the result.
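To make the averaging idea concrete, here's a minimal sketch (the trimming strategy and sample values are my own illustration, not from the book) of how repeated timings can be averaged so that a single uncontrolled event doesn't skew the result:

```java
import java.util.List;

/**
 * Hypothetical sketch: average repeated measurements of the same test so
 * that a singular uncontrolled event (a backup job, a network blip) doesn't
 * skew the result. This trimmed mean drops the best and worst samples.
 */
public class TrimmedAverage {

    /** Returns the mean of the samples after discarding the min and max. */
    static double trimmedMeanMillis(List<Long> samplesMillis) {
        if (samplesMillis.size() < 3) {
            throw new IllegalArgumentException("Need at least 3 samples to trim");
        }
        long sum = 0, min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long s : samplesMillis) {
            sum += s;
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        return (double) (sum - min - max) / (samplesMillis.size() - 2);
    }

    public static void main(String[] args) {
        // Five repeated timings of the same query; the 5200 ms sample is an
        // outlier caused by an unrelated event and is trimmed away.
        List<Long> runs = List.of(980L, 1010L, 5200L, 1005L, 995L);
        System.out.println(trimmedMeanMillis(runs)); // prints roughly 1003.3
    }
}
```

Whether you trim outliers or report them separately is a judgment call; the point is that the averaging rule is decided before the test runs, not after.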

The output will be captured in raw and binary formats: raw from the SQL Query Analyzer and binary from the WebSphere Resource Analyzer tool.

Performance Management Results

The result of the test is the result of the analysis. The result, therefore, derived from the test's output, will be fed into the performance methodology discussed in the next section. Analysis from the test should clearly indicate where the problem lies in the environment and will help guide where the analysis phase should start.

Now that you've considered this semiformal approach to performance management fundamentals, let's look at the methodology and walk through how it works, what it covers, and how to implement it.

The Mirrored Waterfall Performance Methodology

Before looking at what the Mirrored Waterfall Performance Methodology (MWPM) is, first you'll see the key areas of a J2EE-based application server environment that affect an application's responsiveness (in other words, its perceived performance) the most:

  • Java/JVM memory and object management, queues

  • JDBC/Container Managed Persistence (CMP)/Bean Managed Persistence (BMP), pooling, and Java Message Service (JMS)

  • The platform components: database, networks, and their configurations

  • The physical server configurations (memory, CPU, disk, and so on)

  • The operating system: the kernel itself and all associated settings (for example, networks)

I'll go into these five key areas in more detail in future chapters; however, this list is the foundation of what the methodology addresses.

The methodology works using a directional flow rule, somewhat similar to a waterfall (see Figure 1-11). At the top of the waterfall diagram you have the physical server(s) or hardware, followed by the operating system and its associated configurations. The next level down contains the platform components such as databases and network settings, followed by the JDBC, pooling, and JMS type services. Next come the Java and JVM settings and the queues.

Figure 1-11: The waterfall model

Essentially, the model works like this: If you want to tune the JDBC Connection Pool Manager settings in WebSphere, you must, according to the model, also consider tuning and analyzing the configuration and settings of components in the bottom family grouping: the JVM/Java and queue settings. The rule of the model is that if you need to tune or alter something, you must also consider and analyze all component groups farther down the waterfall. Like water, you can't go back up the model; you must always work down the model on the left side.

The previous example, in which changing the JDBC settings requires analysis of the JVM/Java and queue component grouping, is driven by the fact that altering the JDBC connections will affect the flow of transactions through the overall WebSphere environment. This is typically associated with what's known as queues. I'll discuss these at length in Chapter 9; however, for the purpose of this chapter, understand that the concept of queues essentially relates to the number of open connections at each tier and how those open connections are managed.

It should be obvious that because the JDBC connections are changing (which will affect either how many SQL connections are made to the database or their characteristics), the overall balance of the platform tiers will change.

Secondary to this, because the balance or characteristics of the application platform will change as a result of the JDBC changes, the JVM requirements may change too. How many users (which equates to sessions) will now be required to operate on the platform? Users or sessions drive JVM heap size, so how many concurrent users are on the system, and what type of profile do those users have?
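The "sessions drive heap size" relationship can be sketched as back-of-the-envelope arithmetic. All figures below are hypothetical, not WebSphere defaults:

```java
/**
 * Back-of-the-envelope sketch (all numbers hypothetical): concurrent
 * sessions drive JVM heap requirements, so a configuration change that
 * admits more users also changes the heap the JVM needs.
 */
public class HeapEstimate {

    /** Rough heap estimate in MB: base footprint plus per-session state. */
    static long estimateHeapMb(int concurrentSessions, int perSessionKb, int baseMb) {
        return baseMb + ((long) concurrentSessions * perSessionKb) / 1024;
    }

    public static void main(String[] args) {
        // 500 concurrent sessions holding ~100 KB of session state each,
        // on top of an assumed 256 MB base application footprint.
        System.out.println(estimateHeapMb(500, 100, 256) + " MB"); // prints "304 MB"
    }
}
```

A calculation like this is only a starting point for sizing; actual session footprints should be measured, not assumed.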

As you can see, it's all connected.

The second point to make regarding the model depicted in Figure 1-11 concerns the opposite, or mirror, waterfall model. The right side of the model is the "driver" or mirror aspect. That is, each of the right-side component groups drives a change to the next component group up the waterfall, if required.

To put it another way, if after monitoring the HTTP transport queue level in the WebSphere Resource Analyzer it was found that the transport queue was running at 100-percent utilization, then, based on the model, the problem with the queues would "drive" the need for a change in the component group one level up the mirror side of the waterfall. In this case, it would drive the need to investigate why the HTTP transport queue was saturated.

Essentially, the right side should only be used as a "finger-pointing" exercise. Use it in the event that something has broken or something is performing badly. It'll help you locate the source of the problem.

Tip  

Remember that if a particular component isn't performing or is broken, it's not always going to be the fault of that component. In the previous example of the HTTP transport queues reaching maximum capacity, the root cause may not be that the HTTP transport queues are set incorrectly; it may be that the JDBC settings aren't aligned correctly with the rest of the platform. In these situations, identify the problem, and then identify the root cause using the right side of the waterfall model.

The pitfall in not conducting these "sanity checks" as part of the methodology is that you end up with a turnip-shaped environment. This introduces potential bottlenecks and will most likely cause more headaches than you had to start with!

The shape of your environment, in terms of incoming requests from customers or users, should be carrot shaped, or a long funnel (see Figure 1-12).

Figure 1-12: Correctly configured environment: the funnel, or carrot-shaped, model

You'll investigate why these environment shapes are important in Chapter 9; however, as a brief overview, you can look at the overall environment from a cost point of view. The pointy end (less volume) of the carrot is the most expensive end, and the larger, less pointy end (high volume) is less expensive.

Most J2EE-based applications will see their environment as having the database tier as the pointy yet more expensive end of the funnel, and the Web tier as the less pointy, less expensive end. Database transactions are heavy and therefore expensive. Web transactions are lightweight and somewhat inexpensive.
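The funnel rule can be expressed as a simple check over per-tier concurrency limits. The tier names and numbers below are illustrative assumptions, not WebSphere defaults:

```java
/**
 * Sketch of the funnel/carrot rule: each tier's maximum concurrency
 * should shrink as requests move toward the expensive database tier.
 * A tier that bulges relative to the one upstream of it gives the
 * environment a "turnip" shape and risks a bottleneck.
 */
public class FunnelCheck {

    /** True if every downstream tier accepts strictly fewer concurrent requests. */
    static boolean isFunnelShaped(int[] maxConcurrentPerTier) {
        for (int i = 1; i < maxConcurrentPerTier.length; i++) {
            if (maxConcurrentPerTier[i] >= maxConcurrentPerTier[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Web server -> web container -> EJB/ORB -> JDBC pool (illustrative)
        int[] carrot = {150, 75, 50, 25};  // narrows at every tier: OK
        int[] turnip = {150, 75, 100, 25}; // bulges mid-tier: bottleneck risk
        System.out.println(isFunnelShaped(carrot)); // prints "true"
        System.out.println(isFunnelShaped(turnip)); // prints "false"
    }
}
```

The actual ratios between tiers depend on transaction mix and are the subject of the queuing discussion in Chapter 9; the check above only captures the shape.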

This boils down to queuing methodologies, a key area of an overall system's performance and one I'll address in detail in future chapters.

Other Considerations

The MWPM highlighted in Figure 1-11 will be referenced throughout the rest of the book. Therefore, you need to consider a number of other issues with the methodology. Many of these are foundational statements, but many systems managers miss them or don't incorporate them formally into their operational models. I'll also cover these in more detail throughout the book; however, they can be summarized as follows:

  • Educate your developers on the workings of WebSphere: what's a queue, what really is a transaction, what's a small, lightweight SQL query versus a larger, heavyweight one? (See Chapters 9 through 12 for these topics.)

  • Use this book and best practice development guides to help your developers understand the implications of non-system-friendly code (for example, leaving hash tables open in session state and so on).

  • Build standards for development, and ensure they're included in the quality assurance and peer review checkpoints.

  • Implement historic monitoring and reporting. Always ensure you're charting the key components of the system. This should include CPU, JVM, memory, disk, network, and database utilization. I'll discuss some tools to do this later in the book.

  • Monitor and plan your tuning and optimization approach; in other words, focus on one (problem) aspect of the system at a time. Don't take on too much: conduct the analysis and tuning in small steps.

  • Implement one change at a time, and monitor. Don't fall into the trap of implementing a whole range of performance tuning changes. Implement one change, monitor for a period of time that gives suitable exposure to system load characteristics (in other words, don't implement on Friday, monitor on Saturday when no one is using the system, and take that as gospel!), and then analyze. If the implementation was successful, roll out the next change.

The final point you should consider isn't a particularly difficult issue, but it's one that's commonly overlooked: conducting performance management and analysis without the testing or monitoring itself skewing the results and affecting the outcome.

Measurement Without Impact

Measuring without impact is difficult. I'll first briefly discuss what this means.

How do you measure something without affecting what you're measuring? Take a simple example of a Unix server. If you have a system that's under fairly considerable load, and you run a script every five seconds that performs a detailed ps command (ps -elf or ps -auxw, depending on your Unix of choice) to determine what's taking up the load, you'll affect the very problem you're measuring.

For those who know a little about quantum physics, you can associate "measuring without impact" with the quantum effect attributable to the uncertainty principle. One of the obstacles in quantum mechanics is that it's difficult or impossible to directly measure the state of a quantum bit, or some form of quanta. This partially has to do with what's known as the uncertainty principle, but it extends to the fact that by measuring something directly, you affect it.

Therefore, measuring the state of some quanta would make the observation useless. Although this isn't a book on quantum mechanics, the same problem arises in the higher world of WebSphere optimization and tuning efforts!

You can overcome this problem, but it's important to keep it in mind as you go through the WebSphere performance and optimization process in this book. The way I propose to approach this issue throughout the book is to simply minimize the impact. It's not possible (or extremely difficult) to get around the issue; however, through the careful planning and design of your performance management approach, you can minimize the impact.

At a high level, the methodology I'll use in this book considers the following: Whatever you do in production to measure performance, do in all other environments, specifically development and systems integration. In other words, control your environments. So often I come across sites testing components in multiple environments, with each environment having slightly different software or patch levels.

Furthermore, keep your performance management monitoring system/application/probes constantly running. This provides a standard and common baseline for measuring performance degradation and improvement under all situations, and it ensures that the monitors' own load on the overall system is a known, constant factor rather than an unknown.

In other words, keep low-level monitoring continuous but nonintrusive. Understand your environment's workload and map your monitoring to it. That is, there's no use capturing debug or monitoring output on every end user transaction if the data is richer than you need. Understand what type of transactional workload is operating on your platform (for example, B2B, consumer-based online shopping, and so on), and tune your monitoring tools accordingly, based on transaction rate, depth, and requirements (such as SLAs).
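As a minimal sketch of "continuous but nonintrusive" monitoring, the following shows periodic sampling of cheap JVM counters from a single daemon thread rather than tracing every transaction. The sampling interval and output destination are assumptions for illustration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Minimal sketch of continuous, low-intrusion monitoring: one daemon
 * thread samples inexpensive JVM counters at a fixed interval, instead
 * of instrumenting every end user transaction.
 */
public class HeapSampler {

    /** Cheap read of current heap usage; no locks, no per-transaction cost. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService sampler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "heap-sampler");
                    t.setDaemon(true); // never blocks application shutdown
                    return t;
                });
        // Sample every 30 seconds (an assumed interval); in practice the
        // output would feed a historic charting/reporting tool.
        sampler.scheduleAtFixedRate(
                () -> System.out.println("used heap: " + usedHeapBytes() + " bytes"),
                0, 30, TimeUnit.SECONDS);

        Thread.sleep(100); // demo only: let one sample print, then exit
    }
}
```

The same pattern extends to CPU, connection pool, and queue depth counters; the design choice is that the probe's cost is fixed and known, so it becomes part of the baseline rather than a distortion of it.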




Maximizing Performance and Scalability with IBM WebSphere
ISBN: 1590591305
Year: 2003
Pages: 111
Authors: Adam G. Neat
