Tool Requirements for Reproducible Results


Performance testing demands repeatable and reliable measurements. Your test procedures and setup play a major role in the repeatability and reliability of the results, but the test tool also influences your ability to generate them. If your tool does not produce reliable and repeatable results, you cannot determine whether your tuning adjustments actually improve performance. Beyond a shadow of a doubt, this is the most critical factor in performance tool selection.

Some of the key tool requirements for supporting reproducible results include the following. The tool:

  • Provides a warm-up period to initialize the environment

  • Obtains measurements within a steady-state period

  • Verifies error-free test execution

  • Drives the same workload between runs

  • Captures measurements without impacting performance

The performance tools collect the measurements for your analysis. At a minimum, insist on average throughput and response time measurements from your tool. Many tools provide the minimums and maximums for each data point reported as well as standard deviations.

Also, look for measurement granularity. The better tools break down measurements by script, helping you determine the performance of various functions during the run. This level of analysis shows you how a function performs under load and with other key web site functions in use, giving you a more realistic, systemic view of individual web site functions.

Some tools collect key measurements from the machines under test as well as the test client machines themselves. Usually this requires installing small monitors from the tool vendor on the systems under test. These monitors report measurements back to the test controller. Of course, there's nothing preventing you from collecting these measurements yourself (as we'll discuss in Chapter 12). However, using the test tool to collect and collate these measurements usually makes for easier analysis.

Capturing any errors during the test is critical. This includes error pages returned or logged application errors. Tests requesting a few thousand pages over the course of the run normally experience a handful of page errors. However, significant numbers of page or application errors invalidate a performance run. Page errors often artificially inflate throughput results, as error pages require less processing and embed fewer static elements. Therefore, web sites return error pages significantly faster than other dynamic pages. Conversely, web application errors lower performance, as these tend to require more processing.
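If your tool can export raw per-request results, a small post-processing script can flag runs whose error rates invalidate the measurements and show how much the error pages skew the averages. The following is a minimal sketch, assuming a hypothetical CSV export with "status" and "response_ms" columns; adjust the column names, file name, and threshold for your tool and environment.

    import csv

    ERROR_RATE_LIMIT = 0.01  # for example, distrust the run above 1% page errors

    def check_run(results_file):
        # Split response times into successful pages and error pages.
        ok_times, error_times = [], []
        with open(results_file, newline="") as f:
            for row in csv.DictReader(f):
                times = ok_times if int(row["status"]) < 400 else error_times
                times.append(float(row["response_ms"]))

        total = len(ok_times) + len(error_times)
        error_rate = len(error_times) / total
        all_times = ok_times + error_times
        print(f"requests: {total}, error rate: {error_rate:.2%}")
        print(f"avg response, all pages: {sum(all_times) / total:.1f} ms")
        if ok_times:
            print(f"avg response, no errors: {sum(ok_times) / len(ok_times):.1f} ms")
        if error_rate > ERROR_RATE_LIMIT:
            print("WARNING: error rate too high; discard this run's measurements")

    check_run("run_results.csv")  # hypothetical export file

Comparing the two averages also makes the skew visible: if the "all pages" average is noticeably lower than the "no errors" average, fast error pages are inflating the run's apparent performance.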

Reporting

As we mentioned above, select a test tool with the reporting capabilities you need. The high-end performance test tool vendors differ significantly in their test reports. In many cases, this becomes one of the key differentiators in performance tool selection. Even if a tool provides terrific runtime support during the test, it must also capture and deliver relevant information after the test finishes. Often, test teams purchase a tool without realizing it cannot generate the reports they need.

Key reporting differentiators:

  • Summary reports containing the key results you want

  • Summary reports for the steady-state time period

  • Collated reports from multiple runs

  • Real-time reporting during test execution

Key Results

As part of your tool selection process, review each tool's generated reports, and identify the data of interest for your particular web site. Remember, you want the tool to work for you. Poor or insufficient reporting usually requires significant manual data manipulation by your test team. This makes for longer analysis of each test run and reduces the number of "tune and test" iterations the team accomplishes each day. Good reporting makes for faster test cycles.

Consider how the tool reports data and how well the tool isolates key measurements. More data is not always better. If you must glean two or three data points from seven reports, consider a simpler or more customizable tool. During testing, we always want to see key measurements such as response time and throughput. We also like to select a time range or set of values for the tool to use in its calculations. (If the test performs a ramp-up and ramp-down of users, this feature allows us to eliminate the data collected during those times from our steady-state analysis.)
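If your tool lacks a built-in time-range selection, you can approximate it yourself from exported data. The following is a rough sketch, assuming a hypothetical CSV export with "elapsed_s" (seconds since test start) and "response_ms" columns; the warm-up and ramp-down boundaries shown are examples.

    import csv

    def steady_state_stats(results_file, warmup_s, rampdown_start_s):
        # Keep only samples that fall inside the steady-state window.
        samples = []
        with open(results_file, newline="") as f:
            for row in csv.DictReader(f):
                elapsed = float(row["elapsed_s"])
                if warmup_s <= elapsed < rampdown_start_s:
                    samples.append(float(row["response_ms"]))

        window = rampdown_start_s - warmup_s
        print(f"steady-state requests: {len(samples)}")
        print(f"throughput:            {len(samples) / window:.2f} requests/second")
        print(f"avg response time:     {sum(samples) / len(samples):.1f} ms")

    # Discard the first 10 minutes (ramp-up) and everything after 40 minutes.
    steady_state_stats("run_results.csv", warmup_s=600, rampdown_start_s=2400)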

Figure 8.5 below shows a sample Summary Report from LoadRunner. Notice the summary includes client load levels, throughput in bytes per second, HTTP hits per second, and response time. However, this report does not contain one of our favorite measurements: transactions per second (often different than hits per second). To obtain this data, we use the New Graph option to include this particular detailed graph.

Figure 8.5. Sample LoadRunner Summary Report. © 2002 Mercury Interactive Corporation.


In analyzing reports, also look at the granularity of data returned. Low-end load drivers usually provide only averages. High-end tools often provide more granularity, such as minimum, maximum, and 90th percentile measurements. For example, Figure 8.5 shows a LoadRunner report of this data for the Snoop servlet. [4] This granular data is extremely useful in problem determination and capacity planning.

[4] Snoop Servlet is a sample servlet provided with IBM WebSphere Application Server and is used here with permission from IBM.

Some high-end tools provide automatic charting. The tool generates charts from the collected data and provides them as part of the runtime reports. Charts allow you to easily visualize and analyze the collected measurements.

Finally, if you need special measurements, consider a tool with data export capability. Many tools export data in various spreadsheet data formats. You then pull the data into your favorite spreadsheet and develop your own custom reports and graphs.
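If you prefer scripting to a spreadsheet, the same idea works with a small program. Here is a minimal sketch of a custom report built from exported data, showing minimum, maximum, average, and 90th-percentile response times per script. The CSV column names ("script", "response_ms") and file name are assumptions; substitute whatever your tool exports.

    import csv
    from collections import defaultdict

    def custom_report(results_file):
        # Group response times by test script name.
        by_script = defaultdict(list)
        with open(results_file, newline="") as f:
            for row in csv.DictReader(f):
                by_script[row["script"]].append(float(row["response_ms"]))

        for script, times in sorted(by_script.items()):
            times.sort()
            p90 = times[int(0.9 * (len(times) - 1))]
            print(f"{script:20s} min={times[0]:8.1f}  max={times[-1]:8.1f}  "
                  f"avg={sum(times) / len(times):8.1f}  90th={p90:8.1f} ms")

    custom_report("run_results.csv")  # hypothetical export file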

Steady-State Measurements

As Chapter 11 discusses in more detail, always try to capture performance measurements during the steady-state period of your test run. Different tools make obtaining steady-state measurements more or less difficult. If you use simple performance tools with no support for user ramp-up or warm-up, run multiple tests or very long tests to obtain valid steady-state measurements. If you run multiple tests, consider the first run a priming run and discard the results. Use the data from subsequent tests in your actual measurements.

High-end test tools make steady-state data capture easier. SilkPerformer, for example, provides steady state as one of its predefined test configurations. When you select this option, the tool automatically discards the warm-up and cool-down periods of the test in its reported measurements. Figure 8.3 shows a screen capture of the SilkPerformer Workload Configuration. Notice the tool contains steady state as a pre-defined workload option, along with selections for the number of virtual users, measurement times, and close down time.

Some high-end load drivers record all their measurements in a database, which you then manipulate to obtain different reports. This approach works well for client ramp-up measurements. For example, you might ramp up a test by adding users every 15 minutes. The post-analysis tool then allows you to examine data specific to these 15-minute periods. This technique takes a little more effort to get the data, but it may work well for your test environment.
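If your tool exports raw samples rather than a queryable database, you can approximate this analysis with a small script. The sketch below groups hypothetical exported samples into the 15-minute user-level periods described above; the column names and the period length are assumptions.

    import csv
    from collections import defaultdict

    PERIOD_S = 15 * 60  # users added every 15 minutes

    def per_period_report(results_file):
        # Bucket samples by the 15-minute period in which they occurred.
        buckets = defaultdict(list)
        with open(results_file, newline="") as f:
            for row in csv.DictReader(f):
                period = int(float(row["elapsed_s"]) // PERIOD_S)
                buckets[period].append(float(row["response_ms"]))

        for period in sorted(buckets):
            times = buckets[period]
            print(f"period {period + 1}: {len(times) / PERIOD_S:6.2f} requests/second, "
                  f"avg {sum(times) / len(times):6.1f} ms")

    per_period_report("run_results.csv")  # hypothetical export file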

Real-Time Reporting

Real-time reporting gives you performance measurements during the actual test run. This incredibly useful feature lets you observe the performance characteristics of the system as the test tool ramps up users and runs at steady state. This reporting gives you instant "cause-and-effect" feedback during the test runs. For example, five minutes into the run, the real-time report might indicate a significant performance degradation, followed in a few minutes by a period of normal operation. This report points out a potential problem with web site "burstiness" (discussed more in Chapter 13).

Of course, this detail also appears in the summary reports of many tools, but watching the system behave during a test often makes performance analysis and tuning faster. Also, you receive immediate feedback on serious environment problems, such as a misconfigured router, that are impacting your performance tests. Rather than finding out much later, with real-time reporting you know to halt the test and resolve the problem.

Most high-end monitoring tools now provide real-time charting. At a minimum, make sure any tool you select provides some visible indicator of slowdowns or problems during execution.

Multiple Runs

Controlling the amount of data returned and the number of reports generated becomes especially important if you plan to run a lot of unattended tests. For example, if you typically run a series of tests overnight and analyze the results later, you may want short summary reports from each run, instead of generating many, extremely detailed reports to wade through. In addition, multiple tests producing large reports often overflow the available disk space and disrupt other planned tests. Particularly if your tests run unattended, keep the reports generated to a useful minimum.

This also becomes a differentiating factor among the high-end tools. If your test plans call for unattended testing, explore this feature carefully in the tools under consideration.

Verification of Results

Avoid misleading and worthless test measurements from a web site or performance test generating errors. As we mentioned earlier, tests receiving error pages from the web site often generate better response time and throughput than tests producing correct results. Likewise, a web application producing errors may run more slowly than its healthy counterpart. For example, load-related defects in Java applications often cause exceptions, and the exception paths often require more processing for exception handling and logging, making application performance extremely poor. In either case, collecting performance measurements is a waste of time.

Some tools include verification support. At a minimum, many tools look at the incoming HTML pages and check them for errors or unexpected conditions. Most tools readily identify failed or timed out HTTP connections as well.

Key results verification differentiators:

  • Automatic data verification

  • Log verification

Data Verification

Some test tools verify the data on the incoming pages. Of course, doing this for dynamic pages presents special challenges: What defines a "correct" page when every page request generates a potentially infinite set of response pages? In these cases, consider adding custom code to your performance tool to verify the incoming pages. Perhaps the error pages always contain a special header or other element; use this in the verification script to distinguish the error pages.
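The details depend on your tool's scripting language, but the logic is simple. Here is a standalone sketch of such a check; the error markers and the URL are assumptions, so substitute whatever your site's error pages actually contain.

    import urllib.request
    from urllib.error import HTTPError, URLError

    # Hypothetical markers that appear only on this site's error pages.
    ERROR_MARKERS = ("<title>Error</title>", "Exception:", "HTTP Status 500")

    def verify_page(url):
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                body = response.read().decode("utf-8", errors="replace")
        except (HTTPError, URLError) as e:
            print(f"FAIL {url}: {e}")
            return False
        for marker in ERROR_MARKERS:
            if marker in body:
                print(f"FAIL {url}: error marker found: {marker!r}")
                return False
        return True

    verify_page("http://localhost:9080/snoop")  # example URL; adjust for your site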

If your performance tool does not support customization, consider manual verification of many pages during the test runs. In fact, this proves useful for other reasons apart from simple error verification. As we discussed in Chapter 4, web applications sometimes mismanage variables in the concurrent servlet environment. Checking key dynamic pages, such as customer account information, alerts you to concurrency issues in the web application. (For example, if you want to see customer A's account information, but suddenly find yourself looking at customer B's information instead, the web application has a problem.)

Be careful that the data verification does not interfere with client driver or network performance. Verification usually consumes a significant amount of client resources and may require additional client hardware to support at large scale.

Log Verification

In addition to verifying the pages returned, look at the logs from all the test systems, including the HTTP servers, database servers, and application servers, after every test run. We recommend building scripts to gather up the logs after each test run and to reset them for the next run. After examining the logs for errors, keep them at least until you complete the data analysis for the run.
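A sketch of such a script appears below. The log paths are assumptions; substitute the HTTP server, application server, and database logs in your own environment, and make sure your servers tolerate having their logs truncated between runs.

    import shutil
    import time
    from pathlib import Path

    # Hypothetical log locations; substitute your own.
    LOGS = [
        Path("/var/log/httpd/error_log"),           # HTTP server
        Path("/opt/appserver/logs/SystemOut.log"),  # application server
        Path("/var/db/logs/db.log"),                # database server
    ]

    def collect_and_reset(run_name):
        # Archive each log next to the run's results, then empty it for the next run.
        archive = Path("run_logs") / f"{run_name}_{time.strftime('%Y%m%d_%H%M%S')}"
        archive.mkdir(parents=True, exist_ok=True)
        for log in LOGS:
            if log.exists():
                shutil.copy2(log, archive / log.name)
                log.write_text("")
        print(f"logs archived under {archive}")

    collect_and_reset("baseline_100users")  # example run name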

To our knowledge, no commercial test tool collects and analyzes these logs for you, so you must perform this verification manually after each run to truly validate your results.

Real-Time Server Machine Test Monitoring

Earlier we described monitoring test results in real time. Those results included performance metrics such as throughput and response time. Other real-time monitors also provide useful information during the test run. In particular, some test tools allow you to monitor the remote systems under test during the test runs. These tools report key server machine metrics, such as CPU utilization, as the run progresses. The tool usually displays this information on one panel, as demonstrated using the LoadRunner run-time monitor in Figure 8.6. Seeing all of your performance data on one screen helps you understand the relationship between your performance numbers and the state of the systems monitored. Also, machine monitor data provides information about machine capacity, and helps you identify resource bottlenecks during testing.

Figure 8.6. Sample LoadRunner monitor. © 2002 Mercury Interactive Corporation.


Chapter 4 discussed the typical components in a web site. You need a way to monitor all the components in your test environment:

  • Network

  • System

  • HTTP server

  • Application server

  • Database

  • Additional back-end systems

Just as performance test tools vary in price and effectiveness, test system monitoring tools vary significantly depending on the level of functionality you want. However, you need at least some basic tools, and you may want to consider purchasing better tools for specialized analysis. Key differentiators in real-time server machine test monitoring products are the following:

  • Integration with performance test reporting

  • Price

  • Detailed analysis and problem isolation

  • Reusable in the production environment

Integration with Performance Test Reporting

The performance test tool monitors throughput and response time data at the test client. However, for a full understanding of the test environment, we also need measurements from the servers and test client machines themselves. Of course, we usually consider each machine's CPU utilization the most critical of these measurements, but other information, such as paging rates and disk I/O, proves useful as well.

We monitor CPU to detect systems with overutilized or underutilized CPUs, which may indicate a bottleneck (either at the machine itself or elsewhere in the environment) or available capacity. For example, if we drive the test client machines to 100% CPU utilization, the other machines in the system may present low CPU utilization because the test client cannot generate sufficient load. (Test client capacity is one of the most frequently overlooked test bottlenecks we encounter.)

In addition to system statistics, we often need middleware statistics such as the number of HTTP server processes, application server threads, and database locks to resolve performance problems. Monitoring these key tuning and analysis elements is important. Low-end drivers rarely contain any monitoring capabilities. High-end load driver tools typically support basic system monitoring. For example, these tools capture CPU metrics from remote servers and graph these statistics during the test run. Your test controller assembles the statistics from multiple systems and reports them in real time with other data. This capability is extremely useful. For example, when troubleshooting a "burstiness bottleneck," watch the CPU on the application and database servers in conjunction with the overall test throughput and response times to find a correlation between the burstiness patterns and the CPU utilization of these machines.
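Whether the test tool collates these measurements for you or you capture them separately, the correlation itself is straightforward. The sketch below lines up a server's CPU samples with the client's throughput samples for the same run, minute by minute; both file layouts and their column names ("elapsed_s", "cpu_pct", "req_per_s") are assumptions.

    import csv

    def correlate(cpu_file, throughput_file):
        # Index application server CPU samples by the minute in which they were taken.
        cpu_by_minute = {}
        with open(cpu_file, newline="") as f:
            for row in csv.DictReader(f):          # assumed columns: elapsed_s, cpu_pct
                minute = int(float(row["elapsed_s"]) // 60)
                cpu_by_minute[minute] = float(row["cpu_pct"])

        # Print throughput and CPU side by side, minute by minute.
        with open(throughput_file, newline="") as f:
            for row in csv.DictReader(f):          # assumed columns: elapsed_s, req_per_s
                minute = int(float(row["elapsed_s"]) // 60)
                cpu = cpu_by_minute.get(minute, float("nan"))
                print(f"minute {minute:3d}: {float(row['req_per_s']):6.1f} requests/second, "
                      f"app server CPU {cpu:5.1f}%")

    correlate("appserver_cpu.csv", "throughput.csv")  # hypothetical export files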

Additionally, some vendors provide middleware metrics. For example, Mercury Interactive provides performance data from middleware systems such as HTTP servers, system resources, networks, web application servers, and database servers. Figure 8.6 provides an example from a LoadRunner test. On the left-hand side, you see the selection of monitors available. In the center, you see real-time charts displaying the client-side results for transactions per second and response time. Along with this data, you see monitor data on the Apache Web Server's busy processes, as well as Windows resource data tracking CPU utilization on the application server. LoadRunner allows you to select different graphs and different servers to monitor. It also allows you to customize test reports with your desired metrics.

Price

If your load driver tool does not provide integrated monitors, or does not provide the depth of monitors required for your analysis, other monitors exist to help you collect data during your test. Most operating systems provide some degree of machine monitoring. Depending on the platform, these monitors range from detailed and elaborate to very basic. As with any tool, understand how these monitors impact your performance before you use them. Even some operating system monitors use significant resources, robbing your test of valuable capacity. Of course, operating systems provide some level of monitoring for free, and may offer other, more sophisticated monitors for a relatively small upgrade charge.

On the downside, you must manually control and monitor these tools apart from your performance test tool. Again, writing scripts to start these monitors and save their data often makes coordinating the data they produce much simpler. Regardless, after collecting this data, you must manually coordinate the performance test results with the data captured by these monitors. Appendix C provides a list of common, free monitoring tools. Chapter 12 also provides example data generated from some of these tools.
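As a starting point, here is a minimal sketch that launches an operating-system monitor for the duration of a run and saves its output next to the run's results. It uses vmstat, which is common on UNIX and Linux systems; substitute the monitor and options appropriate to your platform.

    import subprocess

    def run_with_monitor(run_name, duration_s, interval_s=10):
        # vmstat <interval> <count> prints one sample per interval, count times.
        samples = duration_s // interval_s
        with open(f"{run_name}_vmstat.txt", "w") as out:
            monitor = subprocess.Popen(
                ["vmstat", str(interval_s), str(samples)], stdout=out)
            # ... start the load test here, then wait for the monitor to finish ...
            monitor.wait()

    run_with_monitor("baseline_100users", duration_s=1800)  # example 30-minute run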

Detailed Analysis and Problem Isolation

For many customers, the tools described above provide sufficient monitoring for their performance testing and analysis requirements. However, some sites may require more data and analysis. In these cases, consider purchasing a high-end monitoring tool to complement your performance test tools.

The design of most high-end monitoring tools focuses on production monitoring and warning. These tools normally watch production systems and issue warnings to a system operator when the machines fail or reach some critical threshold. Not surprisingly, these tools usually come with a significant price tag, but if you need advanced monitoring data, you might consider using them to augment your test environment.

These tools provide specialized monitors and allow centralized control of data capture. They also support reporting data from multiple systems. Because they specialize in system monitoring, they frequently provide data not available through the traditional monitoring products and interfaces. Specialized monitoring tools simplify the troubleshooting portion of your test by quickly pinpointing system problems. They also provide useful information for network diagnosis, application server problems, and database issues.

Appendix C provides a list of specialized monitoring tools such as Wily Introscope and Precise Indepth. Both of these vendors provide specialized tools for application servers. These products leverage "byte-code" insertion technology to capture detailed information about the running web applications. They also provide response time data, which allows you to see if a specific EJB or JDBC call invoked by a servlet actually takes most of the overall servlet response time. Such data simplifies problem isolation.

Price and learning curve become limiting factors for monitoring tools. For many test environments, the runtime monitoring provided by a high-end performance test tool provides sufficient information to tune the web site. However, in many cases using the right tool (particularly for very large or complex tests) shortens your test and tuning process, and contributes to significant performance improvements. Before introducing specialized monitoring tools into your test environment, validate that they do not cause a significant performance degradation to your overall system.

Reusability in the Production Environment

When evaluating the monitoring tools for your performance testing, think ahead to the tools and processes planned for your production environment. Your operations team may already use tools to monitor servers. Consider using the same tools for your testing. Conversely, the operations team may rely on you to recommend monitoring products. Having the same capabilities in both environments helps you to establish initial baseline behavior and thresholds for production monitoring.


