Our Commercial Workload Model: Stock Trading

The representative application domain discussed in this chapter is stock trading. Not only is the domain widely known, but it is also representative of the many online businesses that use the "buy and sell" model, which is fundamental to commerce. The application, called Trade3, is a benchmarking application developed by IBM for measuring the performance of its own application server, the IBM WebSphere Application Server. Trade3 is publicly available from IBM at http://www-306.ibm.com/software/webservers/appserv/benchmark3.html.

Trade3 models an electronic stock brokerage providing web-based online securities trading. Trade3 uses many features of J2EE 1.3, which is the J2EE version supported by IBM WebSphere Application Server Version 5.1. Some of the items included in J2EE 1.3 are local interfaces, message-driven beans, and Container-Managed Relationships (CMRs). Trade3 also incorporates web services as one of its major enhancements. For more information about J2EE 1.3, visit the Sun web site at http://java.sun.com/j2ee/1.3/index.jsp.

Before we jump into the performance analysis exercise, the remainder of this section discusses the following three areas:

The configuration of the system in which the application is deployed. In performance analysis, it is essential to fully understand the system-wide configuration, because all later analyses are based on this understanding. In a real customer environment, this configuration can be very complicated. For example, firewalls and a DMZ might be in place, a number of geographically dispersed subnetworks hosting the same application could exist, database replication might occur on different hosts, and gateways to legacy systems or web services might be available to external systems. You also need to know the hardware specifications used across the entire system to account for their performance contribution.
Background information on the Trade3 application from the user's perspective without discussing how the application was implemented or how many J2EE features it uses. Performance tuning requires that you have some background information on the enterprise application itself because some performance problems can be traced back to the enterprise application.
The performance analysis methodology. When fixing a performance problem, it is important to have a good methodology in place. Not only does it allow you to solve the problem in a very systematic way, it also makes the whole experience less painful.

The next section details the configuration of the system where the application used for our performance exercise was run.

System Configuration

The case study involves a four-tier, nonclustered configuration, as shown in Figure 22-1. There are four machines, one for each tier, that are connected via a 100Mbps Ethernet network. As indicated, all four machines have exactly the same specifications. The first tier is the client tier with a workload driver that sends HTTP requests to the web server. The driver, a homegrown tool called the Web Performance Tool (WPT), simulates concurrent users by creating a thread for each virtual user. WPT lets you specify the URL requests to send to the web server and the number of concurrent users who will send those requests. For the length of the test run, you can specify either the total number of requests that need to be sent or an absolute period of time (for example, 15 minutes). At the end of each run, the tool outputs statistics such as throughput, average response time, total requests sent, and number of failed responses.
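The thread-per-virtual-user idea can be illustrated with a minimal sketch. This is not WPT's code, which is not shown in this chapter. The names run_workload and send_request are hypothetical, and the sketch assumes a fixed number of requests per user rather than a total request count or a time limit.

```python
import threading
import time


def run_workload(send_request, users, requests_per_user):
    """Run one thread per virtual user and collect summary statistics.

    send_request is a caller-supplied function (an assumption of this
    sketch) that performs one request and returns True on success.
    """
    lock = threading.Lock()
    stats = {"sent": 0, "failed": 0, "total_time": 0.0}

    def virtual_user():
        # No think time: the next request is sent as soon as the
        # previous one returns.
        for _ in range(requests_per_user):
            start = time.perf_counter()
            try:
                ok = send_request()
            except Exception:
                ok = False
            elapsed = time.perf_counter() - start
            with lock:
                stats["sent"] += 1
                stats["total_time"] += elapsed
                if not ok:
                    stats["failed"] += 1

    threads = [threading.Thread(target=virtual_user) for _ in range(users)]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start

    sent = stats["sent"]
    return {
        "total_requests": sent,
        "failed": stats["failed"],
        "throughput": sent / wall if wall > 0 else 0.0,
        "avg_response_time": stats["total_time"] / sent if sent else 0.0,
    }
```

In a real driver, send_request would issue an HTTP request to the web server and check the response. Keeping it as a parameter lets the measurement logic be exercised without a server.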

Figure 22-1. Four-tier configuration for the Trade3 performance benchmark.

The second tier contains an Apache-based web server along with a plug-in that allows the web server to route requests to the application server. Putting the web server on a separate machine is a common configuration in big enterprises. For security purposes, the web server might be placed in a DMZ (demilitarized zone) where it is protected by two firewalls.

The third tier contains the application server with the Trade3 application. The application server used is the IBM WebSphere Application Server 5.1. The Trade3 application uses two resources: one for messaging through the JMS framework and the other for a data source. The messaging is used for notification that a transaction, such as buying stock, has been completed. The data source contains user account information as well as stocks and their prices. The JMS provider is IBM WebSphere MQ v5.1, which ships with the application server. The database provider, which is located in the fourth tier, is IBM DB2 v8.1 SP3. The JDBC driver is a type 2 driver, which requires a DB2 database client module running on the application server box to communicate with the database server. The arrows in the diagram indicate the flow of data. Arrows with dotted lines indicate data flow within the same box.

After Trade3 was installed on the application server, the database, Trade3DB, was populated with 500 users and 1,000 stocks using the application itself as a configuration option. The queues for JMS were also created, and IBM WebSphere MQ processes were started.

A Little More About Trade3

About 75% of the time, performance problems can be attributed to the application. Thus, it is important to profile the application to find potential performance problems. Ideally, the development team should practice software performance engineering, which advocates the early consideration of performance from the beginning of the development cycle. For some, this may be a revolutionary idea, quite different from what most of us were taught in school.

In this exercise, we assume that Trade3 has been carefully analyzed and optimized. Our focus is on the system itself: the application server, database server, and Linux. This section shows what the application looks like from a user's perspective and how it flows.

Use any browser to get to the application by entering the URL http://helene/trade, where helene is the hostname of the web server. The Trade3 welcome page is shown in Figure 22-2. The welcome page itself shows the high-level architecture of the Trade3 application. It shows that requests can be handled by either a servlet or a web service. This chapter does not use any of Trade3's web services features.

Figure 22-2. Trade3's welcome page.

Trade3 uses servlets to process requests and then passes the requests to WebSphere's Command Bean framework. From there, the requests are wrapped in a command bean and passed to a session bean in the EJB container. From the EJB container, the session bean decides which entity beans the request must be handled by as well as any message-driven beans (MDBs). The entity beans actually access the database, while the MDBs talk to the JMS provider. Response back to the user is implemented through JSPs.

To use the application, click Go Trade! Trade3 prompts you for a username and password. After validation, the home screen opens for the user who logged on, as shown in Figure 22-3.

Figure 22-3. The Trade3 home screen after a successful login.

After you are inside the home screen, you can view your account and edit your account information. To sell some of your stocks, click the Portfolio link. A screen similar to what is shown in Figure 22-4 opens. You can sell a stock by clicking the Sell link of the corresponding stock, or "holding," that you want to sell. Similarly, click the Quotes/Trade link if you want to buy a stock. The screen for buying stocks is shown in Figure 22-5.

Figure 22-4. The Portfolio screen, where you can sell a stock.

Figure 22-5. The screen for buying stocks.

To check the current price of a stock, enter its ticker symbol. To buy the stock, specify in the corresponding box the number of shares to buy, and then click Buy. The transaction is processed, and you are notified when the transaction has been completed.

When we run our performance tests, the workload driver sends HTTP requests corresponding to logging in to an account, viewing an account, buying and selling a stock, and logging out.

Performance Analysis and Methodology

Before attempting to analyze and tune the performance of any system, you need a well-defined methodology for carrying out the analysis. Table 22-1 lists several points to consider when developing a methodology for performance analysis and tuning.

Table 22-1. Initial Performance Data for the User Ramp Up Test of Trade3
Number of Concurrent Users	Throughput (Req/Sec)	Avg Response Time (Seconds)	Application Server CPU %	Database Server CPU %	Little's Law
1	15.92	0.062	10.7/1.16	1.83/0.47	1
2	30.72	0.064	20.84/2.25	3.87/0.82	2
5	54.82	0.09	41.16/4.89	9.05/1.74	5
10	67.69	0.147	51.96/5.35	12.48/2.31	10
20	67.7	0.294	53.22/6.38	14.29/2.32	20
40	67.03	0.597	52.98/5.73	16.84/2.56	40
80	66.7	1.2	49.99/5.93	16.41/2.19	80
160	62.1	2.562	51.81/6.34	18.19/2.57	159
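The Little's Law column can be reproduced from the other columns. The following is a minimal sketch, assuming the column is simply throughput multiplied by average response time (N = X × R), rounded to the nearest whole user. The function names are illustrative and are not part of Trade3 or WPT.

```python
# Rows copied from Table 22-1:
# (concurrent users, throughput in req/sec, avg response time in seconds)
ROWS = [
    (1, 15.92, 0.062),
    (2, 30.72, 0.064),
    (5, 54.82, 0.09),
    (10, 67.69, 0.147),
    (20, 67.7, 0.294),
    (40, 67.03, 0.597),
    (80, 66.7, 1.2),
    (160, 62.1, 2.562),
]


def littles_law_users(throughput, response_time):
    """Estimate the number of requests in the system as N = X * R."""
    return throughput * response_time


def check_rows(rows):
    """Return (configured users, rounded Little's Law estimate) pairs."""
    return [(users, round(littles_law_users(x, r))) for users, x, r in rows]


if __name__ == "__main__":
    for users, estimate in check_rows(ROWS):
        print(f"{users:>4} users -> Little's Law estimate {estimate}")
```

Because the tests use no think time, the estimate should track the configured number of concurrent users closely. A large gap between the two would suggest a measurement problem worth investigating before any tuning.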

For the performance exercise discussed in this chapter, the main goal is to determine the maximum capacity of a 2-way 2.8GHz system running Trade3 on IBM WebSphere Application Server 5.1 under Linux (kernel 2.4.21-17smp). Capacity here refers to the number of concurrent users the system can support. The requirements are as follows:

The throughput is maximized.
The average response time is no more than two seconds.

Some Key Points to Remember When Tuning for Performance

Know what the performance requirements/goals are, and determine which performance metrics really matter.
Perform the right test for the right purpose.
Define how the test will be performed: what data to use, how long a test run lasts, when to start collecting numbers during the run, what tools to use, and so on.
Stick to the performance goals. Do not attempt to change them unless there are good reasons to do so (for example, the goals are unrealistic given the environment).
It is important that the tests be done consistently and that the results be repeatable when exactly the same set of data and procedures is used.
Change only one variable at a time when running a series of tests for problem analysis or diagnosis.
Record all results in a methodical manner with very good documentation. Your records are good if someone can understand the historical results, visualize the tests without having seen them, and draw some conclusions.
Automate the procedures whenever possible to reduce human error. However, it is important that the correctness of the automation be verifiable and that 100% confidence in the test results can be claimed.
Minimize sharing of resources whenever possible to avoid contamination. If sharing is inevitable, back up your system before giving up resources for the next person. It is also a good practice to maintain an audit trail of what changes were made to the system.
Perform the same test run at least three times to make sure that the numbers are right.

The next step is to determine which performance tests need to be run. We need to do a User Ramp Up test to determine the performance of the system while the workload is increased. The workload is influenced by the number of concurrent users sending requests to the server. We will stop increasing the workload when the average response time goes above 2 seconds. To maximize throughput, the CPU utilization can be near 100% on the application server as long as there are no major bottlenecks. Thus, important metrics to gather for each test run are CPU utilizations on the web, application and database servers, the throughput, and the average response time.

A performance test run consists of a warmup run followed by the actual run. For each actual run, the number of concurrent users should be defined in the workload driver. Our test starts with one user, then two, then five, and continues to double the number of users from then on. The warmup run consists of sending 1,000 requests by one user, then by two users, five users, 10 users, 20 users, and so on up to 100 users. An actual test run is 5 minutes long. The actual test run (for a given number of concurrent users) should be repeated three times. No warmup is needed between these runs, but the database needs to be reset. The final results should be the average of the three runs. Make sure that the results produced by these three runs are not very different from each other. Use the sar or iostat tools to get the CPU utilization of each box. You should be able to differentiate between the CPU used by the user processes and the CPU used by the system (kernel).
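The ramp-up schedule and the three-run averaging described above can be sketched as follows. This is an illustration only. The function names and the 5% repeatability tolerance are assumptions of this sketch, since the text says only that the three runs should not be very different from each other.

```python
from statistics import mean


def ramp_up_levels(max_users):
    """Yield 1, 2, 5, then keep doubling (10, 20, 40, ...) up to max_users."""
    for users in (1, 2, 5):
        if users <= max_users:
            yield users
    users = 10
    while users <= max_users:
        yield users
        users *= 2


def summarize_runs(values, max_spread=0.05):
    """Average repeated runs and report whether they agree closely enough.

    max_spread is a hypothetical tolerance, measured as
    (max - min) / mean of the repeated measurements.
    """
    avg = mean(values)
    spread = (max(values) - min(values)) / avg if avg else 0.0
    return avg, spread <= max_spread


if __name__ == "__main__":
    print(list(ramp_up_levels(160)))
```

For each level produced by ramp_up_levels, the three measured throughputs (or response times) would be passed to summarize_runs. If it reports that the runs disagree, the runs should be repeated rather than averaged.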

In the case study, the tests were run with no think time. When a user submits an HTTP request, he waits until he gets the response in his browser. Typically, a user spends some time (for example, reading the page) before sending another HTTP request. That time interval is called think time (or pause time). So by not specifying any think time (think time=0) to the workload driver, a user immediately submits the next request upon receiving the response to the previous request. Eliminating think time works the application server really hard. In the real world, however, users do pause for some time. One hundred concurrent users with no think time is actually equivalent to a higher number of users in the real world. There is no exact formula for computing the actual "higher number of users," but the simplest approximation is to multiply the number of users by the sum of 1 and the ratio of the estimated think time to the average response time. While a user is thinking for, say, 8 seconds, other users are submitting their requests and waiting for the responses to come back. If the average response time is 2 seconds, then while that user was thinking, four other users were able to submit a request and get a response. Four other users plus the thinking user equals five. Multiplying five by 100 gives us 500 users, which is the estimated number of real-world users.
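The approximation in the preceding paragraph can be written as a small function. This is a minimal sketch of that rough rule of thumb, not an exact capacity model, and the function name is illustrative.

```python
def estimated_real_users(test_users, think_time, avg_response_time):
    """Approximate real-world users for a zero-think-time test.

    Implements: test_users * (think_time / avg_response_time + 1).
    Both times must use the same unit (seconds here).
    """
    if avg_response_time <= 0:
        raise ValueError("avg_response_time must be positive")
    return test_users * (think_time / avg_response_time + 1)


if __name__ == "__main__":
    # The example from the text: 100 test users, 8 s think time,
    # 2 s average response time.
    print(estimated_real_users(100, 8, 2))  # 500.0
```

The estimate grows with the think time and shrinks as the average response time grows, so it should be recomputed for each load level rather than reused across the whole ramp-up.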