Load Test Beds and Load Generators


As much as possible, the test bed must be a faithful copy of the operating environment. It includes servers, applications, Internet connectivity, databases, load balancers, caches, and whatever else is needed. This can be a costly undertaking; most organizations must settle for a test bed that is similar to, but not exactly the same as, the production environment, reduced in scope or in scale.

Operational environments with thousands of servers are usually too expensive to replicate completely. I have worked with several large organizations that have an exact copy of their operating environment: hundreds of switches, servers, routers, and other equipment. These few companies are the exception to the rule. A smaller, but still faithful, copy must be used, with adjustments to the load-testing results to give realistic operational guidance. In effect, the test bed is a manageable subset of the target production environment.

Costs of administering the test bed should not be underestimated. (In addition, as with most testing, investments must be compared with the cost of failure if investments are not made.) There must be close coordination between the operations and testing teams: new software upgrades, patches, and changes in operating systems or connectivity must be included in the test bed as well. Some organizations manage their test-bed environments as though they were production operations whenever new software is distributed, to ensure consistency between the two environments.

An environment with a test bed and load generators can be useful in other ways. For example, planners and administrators can carry out some real-world what-if analyses by changing the test bed or the profiles to test for extremes in behavior or sensitivities. Experimenting with different conditions may reveal more information. For instance, performance degradation may follow different scenarios when it results from a steadily increasing load than when a sudden jump in the load occurs. Carefully controlling the offered loads applied to the test bed can reveal inflection points in a variety of subsystems. What-if analysis also extends to other management strategies. Trying new content distribution techniques, or assessing the real impact of a cache in the test bed, helps drive better resource allocation decisions.
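
As a rough illustration, the two load scenarios just described can be expressed as simple offered-load profiles. The function names and numbers below are hypothetical; load is counted in virtual users per test interval:

```python
def ramp_profile(start, peak, steps):
    """Steadily increasing offered load: start -> peak over `steps` intervals."""
    increment = (peak - start) / (steps - 1)
    return [round(start + i * increment) for i in range(steps)]

def spike_profile(baseline, peak, steps, spike_at):
    """Constant baseline load with a sudden jump to `peak` at interval `spike_at`."""
    return [peak if i >= spike_at else baseline for i in range(steps)]

# Two routes to the same peak load, which may degrade performance differently.
ramp = ramp_profile(100, 1000, 10)           # 100, 200, ..., 1000
spike = spike_profile(100, 1000, 10, spike_at=5)
```

Feeding both profiles to the same test bed, and comparing where response times degrade, is one concrete form of the what-if analysis described above.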

Many valuable tests can be conducted in a local environment, especially those identifying the inflection points for LANs, switches, and servers. However, for many services, the Internet and the other networks used by the enterprise must be part of the testing procedure. Testing across the Internet and enterprise networks is necessary because of the variability introduced by distance and routing changes, the use of different providers, and the interactions with other traffic flows.

Testers can use desktop systems located in various locations to drive the transactions at the test bed. For large-scale tests over the Internet or within corporate networks, service providers such as Keynote Systems and Mercury Interactive can provide load testing on demand. In many cases, it's much less expensive to use a service for highly realistic, massive load tests than it is to acquire the software, hardware, network connectivity, and expertise to run the test on your own.

Testing staff should perform highly repetitive preparatory testing using in-house systems, but they should consider using an external service for final, large-scale acceptance tests before production. (Those external services can normally reuse your in-house scripts, although they often need to be supplemented by parameters for abandonment behavior and flash load characteristics.) External testing organizations offer advantages beyond just saving money for major test efforts. They have extensive testing experience, may be faster than your own organization, and do not take your staff away from their normal functions.

Load testing of web applications requires load generators, which are specially programmed computer systems that produce large numbers of synthetic (virtual) transactions. Load generators run scripts of synthetic transactions that follow a prescribed set of steps. Some of these steps can be parameterized to simulate a wider variety of users or transaction types found in normal operations. These steps may include the following:

  • Establishing a connection to a web server

  • Authenticating the customer identity

  • Carrying out a task, such as browsing various web pages, tracking an order, or buying products

  • Completing the transaction, such as providing credit or shipping information

  • Checking accuracy

  • Breaking the connection
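
The steps above can be sketched as a single virtual-user script. This is only an illustrative sketch: the paths (`/login`, `/browse`, `/checkout`) are hypothetical, and a tiny stand-in server is included so the example is self-contained and runnable:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in web application, so the script runs end to end."""
    def do_GET(self):
        body = f"ok:{self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # keep the demo quiet

def run_virtual_user(host, port, user_id, item_id):
    """One synthetic transaction following the steps listed above."""
    conn = http.client.HTTPConnection(host, port, timeout=10)  # establish a connection
    try:
        steps = [
            f"/login?user={user_id}",     # authenticate the customer identity
            f"/browse?item={item_id}",    # carry out a task
            f"/checkout?item={item_id}",  # complete the transaction
        ]
        for path in steps:
            conn.request("GET", path)
            body = conn.getresponse().read()
            if not body.startswith(b"ok:"):  # check accuracy, not just delivery
                return False
        return True
    finally:
        conn.close()                      # break the connection

server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = run_virtual_user("127.0.0.1", server.server_address[1], user_id=1, item_id=42)
server.shutdown()
```

The `user_id` and `item_id` parameters illustrate how steps are parameterized to simulate a wider variety of users and transaction types.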

It is important to emphasize checking the completed transactions to determine correct functioning: that the expected information was actually delivered, the order was completed, the purchase was tracked, or the form was filled in properly. Correct functioning under normal loads should have been checked by regression testing, which is the comprehensive, highly structured testing that's done to detect incorrect operation, long before the load tests. However, checking of proper function must be continued during load testing, as some failures appear only under severe load stresses.

The better the load generator, the more control it offers in creating controlled variation of different parameters of the test. At the same time, it's easy to get carried away with too much creativity. Like any test, tradeoffs must be made between faithful approximation of the real thing and keeping the tests sufficiently straightforward so that their management does not become more burdensome than operating the real environment.

Note that the efficiency of the load generators must be considered because each one supports a finite number of virtual users executing scripts. (Web load generators can alter the IP addresses in the packets they generate to simulate a large and diverse user population.) Large-scale testing may require a large number of load generators, and it is important to understand this limitation for any testing products. For example, if a load generator could handle only 500 virtual users, you would need 200 such load generators to simulate 100,000 concurrent users. This can be another reason for using load-testing services for large-scale tests.
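
The arithmetic in that example generalizes to a one-line capacity calculation; `generators_needed` is a hypothetical helper name:

```python
import math

def generators_needed(target_users, users_per_generator):
    """Load generators required to drive a target number of concurrent virtual users."""
    return math.ceil(target_users / users_per_generator)

# The example from the text: 500 virtual users per generator, 100,000 users total.
print(generators_needed(100_000, 500))  # → 200
```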

A variety of additional scaling issues must be considered when the test bed is smaller than the actual operating environment. The results must be interpreted for a real environment. Application modeling tools, discussed in Chapter 12, can be helpful in extrapolating from a small test bed.

Other tips that can help in this effort include the following:

  • Maintaining the same ratios of resources seems to help with gauging results. This is a practical rule of thumb: the same dependencies are more likely to be revealed if the ratio of aggregated uplinks to backbone speed is maintained. For example, if the production environment aggregates eight 100-Mb Fast Ethernet links into a 1-Gb Ethernet backbone, the test bed should use the same ratio. Using half that number of links would not create the same bottlenecks at high loads. It would therefore be misleading if used to determine a practical level of service over-subscription that both supports the target usage rates and is economical in terms of the supporting hardware and software needed.

  • Performing unit testing helps you see the maximum number of concurrent transactions or connections a single server can actually handle, along with the effect of adding multiple servers. (In some cases, you don't get full advantage from each additional server because of inter-server synchronization overhead and other factors.) Then the number of servers for a given load can be estimated. It is also important to stress test elements, such as load balancers, to determine their actual performance envelope for the anticipated number of connections and workload.
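
Both tips reduce to simple arithmetic. The helpers below are illustrative sketches: `servers_needed` assumes a measured single-server capacity plus a scaling-efficiency factor for inter-server synchronization overhead, and `uplink_ratio` captures the aggregation ratio from the first tip:

```python
import math

def servers_needed(target_load, single_server_capacity, scaling_efficiency=1.0):
    """Estimate server count from unit-test results.

    scaling_efficiency < 1.0 models inter-server synchronization overhead:
    each server in a cluster effectively contributes capacity * efficiency.
    """
    effective = single_server_capacity * scaling_efficiency
    return math.ceil(target_load / effective)

def uplink_ratio(uplink_count, uplink_mbps, backbone_mbps):
    """Aggregation ratio to preserve between production and the test bed.

    E.g., eight 100-Mb links into a 1-Gb backbone gives 0.8.
    """
    return (uplink_count * uplink_mbps) / backbone_mbps

# With perfect scaling, 10,000 connections at 1,000 per server needs 10 servers;
# at 80% scaling efficiency, the estimate rises to 13.
plan_a = servers_needed(10_000, 1_000)
plan_b = servers_needed(10_000, 1_000, scaling_efficiency=0.8)
```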

The collected information from load testing is used to characterize the performance envelope, including the system behaviors under flash load and when users are abandoning transactions. A good data management capability is needed to save and organize data from a series of tests. Planners, developers, and administrators can compare results from different tests and refine their understanding of behavior and their testing procedures. Statistical techniques can be applied profitably to tease out the contributions of different factors; good software for this has come down in price over the past several years. The most important statistical technique is graphing; visual representation of test results can quickly identify inflection points.
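
As a minimal sketch of how graphed test data reveals an inflection point, the following hypothetical heuristic finds the offered load at which the response-time slope jumps most sharply:

```python
def inflection_point(samples):
    """Locate the offered load where response time starts climbing fastest.

    `samples` is a list of (load, response_time) pairs from a test series;
    this crude heuristic returns the load at the largest jump in slope.
    """
    slopes = []
    for (l0, t0), (l1, t1) in zip(samples, samples[1:]):
        slopes.append(((t1 - t0) / (l1 - l0), l1))
    jumps = [(s1 - s0, load) for (s0, _), (s1, load) in zip(slopes, slopes[1:])]
    return max(jumps)[1]

# Hypothetical series: response time is flat up to 400 users, then climbs sharply.
data = [(100, 0.20), (200, 0.21), (300, 0.22),
        (400, 0.24), (500, 0.60), (600, 1.00)]
knee = inflection_point(data)
```

In practice the same judgment is usually made visually from a graph; the point of automating it is to compare many test runs consistently.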




Practical Service Level Management: Delivering High-Quality Web-Based Services
ISBN: 158705079X
Year: 2003
Pages: 128
