Projecting Performance | Performance Analysis for Java™ Websites

To build a capacity plan, you must sometimes estimate performance beyond the actual results measured in your lab tests. In this section, we work through an example of capacity extrapolation. While this procedure is never ideal, many web sites use extrapolation to cover gaps in their performance testing.

Projecting Application Server Requirements

As in the previous steps, start with the client load and throughput at your response time objective. As shown earlier in Table 15.3, the site response time objective of two seconds occurs at 2,000 users and 100 transactions per second. A cluster of three application servers with 6,000 concurrent users achieved 295 transactions per second and a two-second response time. This also represents the highest user load achieved in the lab.

However, we really want to support 10,000 concurrent users, 500 requests per second, and a two-second response time. So we extrapolate our performance test results across additional servers to meet the plan requirements for concurrent users and throughput. Start by calculating the scaling ratio from the cluster tests (lines 11–13), and then estimate the scaling ratio across additional servers (lines 14–16), as shown in Table 15.6.

Table 15.6. Capacity Planning Worksheet: Scaling Ratio Calculations, Lines 11–16

	Number of Application Servers	Measured Throughput	Scaling Ratio Calculation	Scaling Ratio
11.	1	100 requests/sec	Throughput (1) / Throughput (1)	1x
12.	2	196 requests/sec	Throughput (2) / Throughput (1)	1.96x
13.	3	295 requests/sec	Throughput (3) / Throughput (1)	2.95x
	Number of Application Servers (projected)		Scaling Ratio Calculation	Scaling Ratio Estimate
14.	4		2.95 + (2.95 - 1.96)	3.94x
15.	5		3.94 + (3.94 - 2.95)	4.93x
16.	6		4.93 + (4.93 - 3.94)	5.92x
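
The arithmetic in lines 11 to 16 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `project_scaling_ratios` is hypothetical, and the projection simply repeats the last measured increment, as the worksheet does.

```python
def project_scaling_ratios(measured_throughput, total_servers):
    """Return scaling ratios for clusters of 1..total_servers servers.

    measured_throughput[i] is the lab throughput (requests/sec) with i + 1
    application servers. Ratios beyond the measured range are projected by
    adding the difference between the previous two ratios (lines 14-16).
    """
    if len(measured_throughput) < 2:
        raise ValueError("need at least two measured cluster sizes")
    base = measured_throughput[0]
    ratios = [t / base for t in measured_throughput]
    while len(ratios) < total_servers:
        ratios.append(ratios[-1] + (ratios[-1] - ratios[-2]))
    # Round only at the end so intermediate steps keep full precision.
    return [round(r, 2) for r in ratios[:total_servers]]


# Measured throughput from Table 15.6, lines 11-13.
ratios = project_scaling_ratios([100, 196, 295], total_servers=6)
print(ratios)
```

With the three measured values from the table, the projected entries for four, five, and six servers come out as 3.94, 4.93, and 5.92.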

Now use the scaling ratio calculations to estimate the throughput supported by additional servers. Table 15.7 shows the estimated performance through a five-server cluster. We estimate that five servers will support the 10,000 client requirement (2,000 clients per server) with two-second response time. Using the five-server scaling ratio of 4.93 calculated earlier, the estimated throughput is 493 transactions per second. This is just short of the 502 transactions per second requirement in our test plan, and considering the headroom factors built into the estimates, it is probably close enough. There is no need to plan on a sixth application server at this time.

Table 15.7. Capacity Planning Worksheet: Estimated Servers, Lines 17–19

	Number of Application Servers	Estimated Load (Number of Application Servers * Load, Line 5)	Meets Requirement? (Line 1)	Scaling Ratio	Estimated Throughput (Scaling Ratio * Throughput, Line 5)	Meets Requirement? (Line 3)
17.	4	8,000	no	3.94x	394 requests/sec	no
18.	5	10,000	yes	4.93x	493 requests/sec	yes
19.	6	-	-	5.92x	-	-
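
The estimates in lines 17 and 18 can be reproduced with a short sketch. It assumes the per-server figures of 2,000 users and 100 requests per second at the response time objective, and the requirements of 10,000 users and 502 requests per second cited in the text. The names used here are hypothetical.

```python
# Per-server results at the response time objective (worksheet Line 5).
USERS_PER_SERVER = 2_000
THROUGHPUT_PER_SERVER = 100  # requests/sec

# Plan requirements as cited in the text.
REQUIRED_USERS = 10_000
REQUIRED_THROUGHPUT = 502  # requests/sec

# Projected scaling ratios from Table 15.6, lines 14-15.
SCALING_RATIO = {4: 3.94, 5: 4.93}


def estimate(servers):
    """Return (estimated load, estimated throughput) for a cluster size."""
    load = servers * USERS_PER_SERVER
    throughput = round(SCALING_RATIO[servers] * THROUGHPUT_PER_SERVER)
    return load, throughput


for n in sorted(SCALING_RATIO):
    load, throughput = estimate(n)
    load_status = "meets" if load >= REQUIRED_USERS else "below"
    shortfall = max(0, REQUIRED_THROUGHPUT - throughput)
    print(f"{n} servers: {load} users ({load_status} load target), "
          f"{throughput} requests/sec, {shortfall} short of target")
```

The sketch reports the throughput shortfall rather than a strict yes or no, because the text treats the five-server estimate as close enough once headroom is considered.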

This simple example assumes five servers exhibit the same scaling proportions as your three-server test results. As a general capacity planning rule, do not extrapolate beyond twice the performance limits measured in your test environment. Extrapolating within double capacity generally presents manageable risk.
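
As a quick sanity check, the doubling rule can be expressed as a one-line comparison. This is a minimal sketch; the helper name `within_extrapolation_limit` is hypothetical, and the inputs are the lab maximums and projections from this example.

```python
def within_extrapolation_limit(target, tested_max, factor=2.0):
    """True if the target stays within `factor` times the tested maximum."""
    return target <= factor * tested_max


# 6,000 users was the highest load reached in the lab; the plan calls for 10,000.
users_ok = within_extrapolation_limit(target=10_000, tested_max=6_000)

# 295 requests/sec was measured with three servers; 493 is projected for five.
throughput_ok = within_extrapolation_limit(target=493, tested_max=295)

print(users_ok, throughput_ok)
```

Both projections in this example stay inside the limit, since 10,000 users is below 12,000 and 493 requests per second is below 590.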

Projecting Hardware Capacity

Completing a capacity plan requires an in-depth look not only at application server requirements, but also at the other servers and application factors that affect scaling. Limiting your estimates to your HTTP or application server does not give you sufficient capacity for your web site as a whole. Let's now look at the hardware requirements across all the servers.

We use CPU utilization in this step as a gauge of server capacity. By measuring the increase in CPU utilization on the servers during scalability testing, we obtain a key indicator of how increasing the application server capacity impacts other parts of the web site.

Projecting Processor Utilization

In this step, you look for the pattern of processor utilization demonstrated by each of the servers during the cluster tests. For example, the CPU utilization results from Table 15.5 show that as load increases from 2,000 to 4,000 users, the HTTP server, database server, and load balancer CPU utilization all increase. During the test runs, the HTTP server and database CPU utilization doubled as the client load doubled. The load balancer CPU, on the other hand, did not significantly increase. The doubling of CPU utilization on the HTTP server and database is not surprising, given that both these servers handled twice the transactions; however, we need to factor these increases into our planning estimates.

For hardware planning purposes, extrapolate the processor utilization for each server using the same relative increase shown in the performance tests. Table 15.8 shows the relative increase for the HTTP server, load balancer, and database CPU utilization as we added application servers and load to the cluster.

Table 15.8. Capacity Planning Worksheet: CPU Utilization Delta, Lines 20–23

	Number of Application Servers	Application Server CPU	HTTP Server CPU	Load Balancer CPU	Database 1 CPU
20.	1	75%	20%	10%	15%
21.	2	75%	40%	15%	30%
22.	3	75%	60%	20%	45%
23.	Delta	n/a	+20%	+5%	+15%
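
The delta in line 23 is the increase in CPU utilization for each application server added. A minimal sketch of that calculation follows. Averaging the step-to-step increases is an assumption made here for illustration; in this data set every step is identical, so the average equals the step.

```python
# CPU utilization (%) measured with 1, 2, and 3 application servers
# (Table 15.8, lines 20-22).
measured_cpu = {
    "HTTP server": [20, 40, 60],
    "Load balancer": [10, 15, 20],
    "Database": [15, 30, 45],
}


def cpu_delta(samples):
    """Average increase in CPU utilization per added application server."""
    steps = [b - a for a, b in zip(samples, samples[1:])]
    return sum(steps) / len(steps)


deltas = {name: cpu_delta(samples) for name, samples in measured_cpu.items()}
for name, delta in deltas.items():
    print(f"{name}: +{delta:g}% per application server")
```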

Using the delta, project the processor utilizations for all the servers when adding more application servers. Table 15.9 shows these projections for our example. These projections point out a significant issue. As you can see in Table 15.9, we project the processor utilization for the system running the HTTP server to be at 100% CPU utilization with five application servers. Remember, once a server is running near 100% utilization, wait time increases for resources (a bottleneck), which also results in longer response times. Therefore, at five servers, we project that the HTTP server becomes a bottleneck for our system. Also, note the estimates for database processor utilization. While they still fall within acceptable ranges, you may want to consider an upgrade for this system as well.

Table 15.9. Capacity Planning Worksheet: CPU Utilization Projections, Lines 24–25

	Number of Clients	Number of Application Servers	Application Server CPU	HTTP Server CPU	Load Balancer CPU	Database CPU
24.	8,000	4	75%	80%	25%	60%
25.	10,000	5	75%	100%	30%	75%
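
The projections in lines 24 and 25 apply the delta linearly to the three-server measurements. The sketch below illustrates this under that linear assumption. The `WARN_AT` threshold is a hypothetical value chosen for this example, borrowed from the 80% rule of thumb the text gives for the database server.

```python
# Utilization (%) measured at three application servers and the per-server
# delta, both taken from Table 15.8.
CPU_AT_THREE = {"HTTP server": 60, "Load balancer": 20, "Database": 45}
DELTA = {"HTTP server": 20, "Load balancer": 5, "Database": 15}

# Hypothetical warning level for this sketch.
WARN_AT = 80


def project_cpu(app_servers):
    """Linearly project CPU utilization for clusters larger than three."""
    extra = app_servers - 3
    return {name: CPU_AT_THREE[name] + DELTA[name] * extra
            for name in CPU_AT_THREE}


for n in (4, 5):
    for name, cpu in project_cpu(n).items():
        flag = "  <-- at or above warning level" if cpu >= WARN_AT else ""
        print(f"{n} application servers, {name}: {cpu}%{flag}")
```

A linear projection can exceed 100%, which is not a real utilization figure. Treat any such result as a sign that the server saturates before the cluster reaches that size.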

Based on the processor utilization projections, the application server performance projections made earlier in Table 15.7 are most likely invalid, because the HTTP server capacity needs adjusting. Let's look at how to do this next.

Identify Server Ratios

Good capacity planning determines the resources needed throughout the web site to handle projected user loads. This means carrying your scalability extrapolations to the rest of the equipment in your web site as well. The scaling alternatives for other systems are similar to those discussed in Chapter 11 for the application server: Employ additional hardware resources on a single server or use some type of cluster configuration.

Many production sites deploy multiple HTTP servers and use a centralized database. Many web sites also devote separate servers to each database used by the web applications. For example, they place the persistent HTTP session database on one server and the application databases on another server. Also, as we've discussed previously, if the HTTP servers and database receive lots of traffic, consider caching techniques to reduce the stress on these systems.

To continue our web site projections, we try to develop capacity ratios between the various components in the web site. For example, a single HTTP server requires only 20% CPU utilization to support one application server operating at almost maximum CPU utilization. Therefore, a ratio of approximately three or four application servers for each HTTP server is appropriate for this web site. So, for five application servers, we need two HTTP servers.

The Capacity Sizing worksheet assists with this calculation, as shown in Table 15.10. Configurations with significant static content and high security needs may actually invert these ratios (they require more HTTP servers than application servers).

Table 15.10. Capacity Planning Worksheet: HTTP Server Projections

	Number of Application Servers	Projected HTTP Server CPU: CPU(n - 1) + Delta	Additional Server (If CPU 80% or Higher)	New Projected Server CPU (If New Server Added)
27.	4	60% + 20% = 80%	+1	40% each
28.	5	40% + (20% / 2) = 50%	+1	n/a
29.	6	-	-	-
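
The worksheet logic can be sketched as a small loop: add an HTTP server whenever the projected CPU per HTTP server reaches 80%. The sketch assumes the load spreads evenly across HTTP servers, and the function name `http_servers_needed` is hypothetical.

```python
HTTP_CPU_PER_APP_SERVER = 20  # percent, the HTTP server delta from Table 15.8
ADD_SERVER_AT = 80            # percent, threshold from the worksheet header


def http_servers_needed(app_servers):
    """Return (HTTP server count, projected CPU % per HTTP server).

    Assumes HTTP load spreads evenly across HTTP servers, and adds servers
    until each one falls below the threshold.
    """
    demand = HTTP_CPU_PER_APP_SERVER * app_servers
    count = 1
    while demand / count >= ADD_SERVER_AT:
        count += 1
    return count, demand / count


for n in (3, 4, 5):
    count, cpu = http_servers_needed(n)
    print(f"{n} application servers: {count} HTTP server(s) at {cpu:g}% each")
```

For four and five application servers this reproduces the worksheet values of 40% and 50% per HTTP server once the second HTTP server is added.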

Similarly, the test results show that one application server only requires 15% of the database CPU capacity, so a single database probably supports five application servers. However, as a rule of thumb, assume no more than 80% CPU utilization on a database server before impacting overall application response time. On the basis of these results, we develop the following planning ratios:

 1 HTTP server : 4 application servers
 5 application servers : 1 database server

Now, let's reapply these ratios to our earlier projections. In our example, we estimated requiring five application servers to handle the web site's load. Using our newly developed ratios, we plan for two HTTP servers, five application servers, and one database server to support the web site's load. Table 15.11 contains the completed projections.

Table 15.11. Final Hardware Projections

	Number of Application Servers	Number of Times to Scale HTTP Servers	Number of Times to Scale Databases	HTTP Server	Database Server
36.	5 at 75%	1		2 at 50%	1 at 75%
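
Applying the planning ratios is a rounding-up division. The following sketch shows the calculation for this example; the function name `plan_hardware` is hypothetical, and the ratios are the ones developed in the text.

```python
import math

# Planning ratios developed in the text.
APP_SERVERS_PER_HTTP_SERVER = 4
APP_SERVERS_PER_DATABASE = 5


def plan_hardware(app_servers):
    """Return the HTTP and database server counts implied by the ratios."""
    return {
        "application servers": app_servers,
        "HTTP servers": math.ceil(app_servers / APP_SERVERS_PER_HTTP_SERVER),
        "database servers": math.ceil(app_servers / APP_SERVERS_PER_DATABASE),
    }


plan = plan_hardware(5)
print(plan)
```

Rounding up matters here: five application servers divided by four per HTTP server is 1.25, so the plan needs two HTTP servers rather than one.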

At this point, if you've never tested with multiple HTTP servers, it might be worthwhile to set up a small test in your lab with two HTTP servers and two application servers. Through this test, you establish that the HTTP servers do not impact vertical scalability. While we feel very confident that this is true for most commercial HTTP servers, any time you develop a plan with an untested configuration, consider testing this setup briefly to confirm your assumptions about its performance.

In this example, we've discussed horizontal scaling for this web site and how to develop estimates for a cluster of machines. Of course, if you prefer, you might apply your vertical scaling test results in the capacity planning phase and develop a plan for growing your existing hardware rather than acquiring more machines. In fact, depending on the headroom in your calculations, consider vertically scaling the database server. We currently project 75% utilization for this component, which nears its maximum useful capacity. Placing it on a larger machine gives you a wider margin of comfort.

Scaling Assumptions

As the previous exercise demonstrated, capacity planning involves a certain amount of guesswork. The closer your tests come to simulating production conditions and loads, the better your capacity projections. Once you deploy your application, monitor its performance and validate the assumptions you made about load and usage patterns. Often, things don't quite operate as planned, and you must readjust your web site accordingly.

Before you have production data, what are reasonable assumptions to make based on test hardware configuration and test results? If your tests were comprehensive enough to actually meet your user load and response time goals, start by assuming that your production application performs comparably to your test application. It bears repeating: This assumption is only as good as the test plan behind it, as well as your ability to simulate the production environment and user patterns. Having performance data that covers your expected load makes your capacity planning much easier. Use the test numbers directly, apply a headroom factor to account for any estimation errors or unexpected conditions, and complete the planning exercise.

Looking back to our example, if we really required only 1,000 users and a one-second response time, the test data shows us directly that we can support this with one server. If you want significant headroom in this case, use a two-server cluster (also included in our tests). Your test data gives you a higher level of confidence that the two-server cluster meets your capacity requirements. Sometimes it's impossible to test the full load of the web site in your test lab. In these cases, the extrapolations we just covered provide the best planning data available.

In general, making any scalability assumptions implies risk. A web site contains many components, and any one of these may become a bottleneck under the right conditions. Often, adding resources in one area of the web site produces a bottleneck in another area. If possible, prove scalability beyond two servers during your testing in order to confidently project near-linear scalability. Web applications typically show good near-linear horizontal scalability, assuming the absence of application bottlenecks or other infrastructure issues.

As a general rule, we recommend running your tests with at least three servers in a cluster configuration. Also, try to test at least half of the load your production web site must support. This keeps your projections within twice the tested limits and lowers the extrapolation risk in your production performance.