Case Study: Capacity Planning


We left the TriMont team in Chapter 14 in the middle of their test phase. Now let's build a capacity plan based on the results of their tests. Since the capacity planning task is relatively short, we'll complete it here rather than devoting an entire chapter to it. This part of the case study follows the three phases of capacity planning outlined earlier in this chapter: review plan requirements, review performance test results, and project capacity.

Review Plan Requirements

In Chapter 14, the case study's testing phase, TriMont estimated its load and throughput requirements and decided on 25% headroom. Based on that work, we complete the first part of the Capacity Planning worksheet as shown in Table 15.12.

Table 15.12. TriMont Capacity Planning Worksheet, Part 1
Line | Input Data | Source of Data | Your Requirement
1 | Concurrent users (with headroom) | Appendix A, Hardware Sizing worksheet, line 7 | 10,000 concurrent users
2 | Response time | Appendix A, Hardware Sizing worksheet, line 2 | 5 sec
3 | Throughput (with headroom) | Appendix A, Hardware Sizing worksheet, line 9 | 120 pages/sec

Review Performance Test Results

We left Chapter 14 without developing any final test results, so before going further, let's look at some of the performance results TriMont achieved during testing. After the DBA investigated and corrected the high CPU utilization on the HTTP session database by adding disk drives to reduce I/O contention, TriMont completed single-server ramp-ups, with the results shown in Figure 15.2. As the figure shows, the test reached the target response time of five seconds at 400 concurrent test clients. At this point, throughput also reaches a plateau of 41 pages per second.

Figure 15.2. User ramp-up performance test results


TriMont also set up a two-system cluster and, as expected, worked through some issues; eventually, however, the team achieved good cluster scalability, as shown in Figure 15.3.

Figure 15.3. TriMont scalability results


Let's fill in Part 2 of the Capacity Planning worksheet with the TriMont test results. Remember that, to reduce load driver virtual user costs, the team ran the tests with reduced think time. Based on the calculations in the testing phase, 1 test client with a 45-second average visit time is approximately equivalent to 10 users with the full 7-minute average visit. Therefore, the 400-client test load corresponds to 4,000 concurrent users with the full 7-minute visit time.
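The think-time adjustment is a simple multiplication. The sketch below assumes the approximate 10:1 equivalence from the testing phase; the helper name `equivalent_users` and the constant `EQUIVALENCE_FACTOR` are hypothetical, not part of the worksheet.

```python
# Hypothetical helper (not from the worksheet): converts test clients run with
# reduced think time into equivalent real users with the full 7-minute visit.
# Assumes the approximate 10:1 equivalence from TriMont's testing phase.
EQUIVALENCE_FACTOR = 10


def equivalent_users(test_clients: int, factor: int = EQUIVALENCE_FACTOR) -> int:
    """Return the concurrent real users represented by `test_clients`."""
    if test_clients < 0:
        raise ValueError("test_clients must be non-negative")
    return test_clients * factor


if __name__ == "__main__":
    for clients in (200, 400):
        print(f"{clients} test clients ~ {equivalent_users(clients):,} concurrent users")
```

Running it maps the 200- and 400-client runs to 2,000 and 4,000 concurrent users, the simulated loads recorded in the worksheet.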

To fill out the cluster scalability section of the worksheet, we need the cluster load and throughput at the 5-second response time. However, TriMont's cluster tests only reached a 2.5-second response time because they could not simulate enough client load to reach the 5-second mark: if you recall, TriMont purchased only a 500-user license for this phase of testing. For now, let's use the cluster results TriMont achieved at 400 clients, and decide later whether the capacity plan carries too much risk and requires more testing.

Similarly, the CPU utilization collected for the cluster run does not represent the 5-second response time objective. However, single-server CPU utilization results do exist for response times at both 2.5 seconds and 5 seconds. We document all this data for Part 2 of the Capacity Planning worksheet in Tables 15.13, 15.14, and 15.15.

Table 15.13. TriMont Capacity Planning Worksheet, Part 2: Single-Server Results
Line | Response Time (Line 2 Above) | Simulated Load (from Performance Results) | Measured Throughput (from Performance Results)
4 | 5 sec | 4,000 concurrent users | 41 pages/sec

Table 15.14. TriMont Capacity Planning Worksheet, Part 2: Cluster Scalability Results
Line | Number of Application Servers | Response Time | Simulated Load | Measured Throughput
5 | 1 | 2.5 sec | 2,000 | 22 pages/sec
6 | 2 | 2.5 sec | 4,000 | 42 pages/sec
7 | 3 | (not done)

Table 15.15. TriMont Capacity Planning Worksheet, Part 2: CPU Utilization Data
Measurement | 200 Clients | 400 Clients (1 Application Server) | 400 Clients (Cluster)
CPU utilization:
HTTP server | 41% | 80% | 41%/40%
Web application server | 43% | 86% | 43%/43%
Catalog database server | 8% | 15% | 15%
HTTP session database server | 20% (after DBA) | 38% | 38%
Order test database server | 15% | 30% | 30%
Boat Selector database server | 6% | 12% | 12%
Account database server | 8% | 16% | 16%
Response time | 2.5 sec | 5 sec | 2.5 sec
Throughput | 22 pages/sec | 41 pages/sec | 42 pages/sec

Project Capacity

Though TriMont tested only a two-server cluster, the scaling ratio data is still useful to the capacity plan. The cluster result of 42 pages per second at a 400-client load against two servers gives an excellent 1.9x scaling ratio (42 / 22), as shown in line 12 of Table 15.16. Using this scaling ratio, we estimate the three- and four-server cluster scaling ratios at 2.8x and 3.7x, respectively, as shown in Table 15.17. Remember, we do not recommend projecting beyond double the cluster size tested; therefore, we do not include estimates for a five- or six-server cluster.
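The worksheet's projection rule adds the last observed increment to the last ratio. The following is a minimal sketch of that arithmetic, assuming the measured ratios are supplied in order starting with one server; the function names are hypothetical.

```python
def scaling_ratio(throughput_n: float, throughput_1: float) -> float:
    """Measured scaling ratio, Throughput(n) / Throughput(1)."""
    return throughput_n / throughput_1


def project_ratios(measured: list[float]) -> dict[int, float]:
    """Extend measured ratios (index 0 = one server) by the last increment.

    Mirrors the worksheet rule Line 13 = Line 12 + (Line 12 - Line 11) and
    stops at twice the largest cluster actually tested.
    """
    if len(measured) < 2:
        raise ValueError("need ratios for at least one and two servers")
    limit = 2 * len(measured)  # never project past double the tested size
    ratios = list(measured)
    while len(ratios) < limit:
        ratios.append(ratios[-1] + (ratios[-1] - ratios[-2]))
    return {servers: r for servers, r in enumerate(ratios, start=1)}


if __name__ == "__main__":
    r2 = round(scaling_ratio(42, 22), 1)  # 42 / 22 = 1.909..., reported as 1.9x
    for servers, ratio in project_ratios([1.0, r2]).items():
        print(f"{servers} server(s): {ratio:.1f}x")
```

With only one- and two-server measurements, the sketch stops at four servers (1.0x, 1.9x, 2.8x, 3.7x), which enforces the "no more than double" guideline in code rather than by convention.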

Next, we must estimate the load and throughput at a five-second response time, even though the cluster tests never reached it. Based on the single-server ramp-up results and the cluster results, we assume the two-server cluster reaches a five-second response time at approximately 800 test clients (400 per server), with a throughput of approximately 42 pages per second per server. This closely matches the single-server results shown earlier for 400 clients: a five-second response time and 41 pages per second.

Using this data, TriMont estimates that a three-server cluster supports 12,000 concurrent users (3 × 4,000) and about 118 pages per second (2.8 × 42), just short of the 120 pages per second requirement. This agrees with the test plan estimate of three application servers to support the web site's peak load.
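The estimate multiplies the per-server load by the number of servers, but multiplies the per-server throughput by the scaling ratio. Here is a small sketch of that comparison, assuming 4,000 users and 42 pages per second per server at a five-second response time, the requirements from Table 15.12, and the projected ratios of 2.8x and 3.7x; `estimate_cluster` and `ClusterEstimate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ClusterEstimate:
    servers: int
    load: int              # estimated concurrent users
    throughput: float      # estimated pages/sec
    meets_load: bool
    meets_throughput: bool


def estimate_cluster(servers: int, ratio: float, load_per_server: int,
                     throughput_per_server: float, required_users: int,
                     required_pages_per_sec: float) -> ClusterEstimate:
    """Load scales with server count; throughput scales with the scaling ratio."""
    load = servers * load_per_server
    throughput = ratio * throughput_per_server
    return ClusterEstimate(servers, load, throughput,
                           load >= required_users,
                           throughput >= required_pages_per_sec)


if __name__ == "__main__":
    # Assumed inputs: 4,000 users and 42 pages/sec per server at 5 sec;
    # requirements of 10,000 users and 120 pages/sec.
    for n, ratio in ((3, 2.8), (4, 3.7)):
        e = estimate_cluster(n, ratio, 4_000, 42, 10_000, 120)
        print(f"{e.servers} servers: {e.load:,} users (meets: {e.meets_load}), "
              f"{e.throughput:.1f} pages/sec (meets: {e.meets_throughput})")
```

With the rounded ratios, the three-server throughput comes out at 117.6 pages per second, which a strict comparison reports as not meeting 120; the worksheet records this case as "close." The small differences from the worksheet's 118 and 156 pages per second likely come from rounding the scaling ratios.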

TriMont is relatively confident that a cluster of three application servers can support the load and dynamic page requests for the site. To complete the capacity planning exercise, we now need to consider processor utilization across all the servers in the web site. As the CPU utilization results in Table 15.15 show, at 400 clients no server in the cluster configuration exceeds 50% utilization. As mentioned, the DBA solved the original problem of high utilization on the HTTP session database, bringing it down to only 38% at 400 clients.

Table 15.16. TriMont Capacity Planning, Part 3: Cluster Scaling Ratio
Line | Number of Application Servers | Measured Throughput | Scaling Ratio Calculation | Scaling Ratio
11 | 1 | 22 pages/sec | Throughput(1) / Throughput(1) | 1x
12 | 2 | 42 pages/sec | Throughput(2) / Throughput(1) | 1.9x
13 | 3 | n/a | Throughput(3) / Throughput(1) | n/a

Table 15.17. TriMont Capacity Planning Worksheet, Part 3: Estimate Scaling Ratio
Line | Number of Application Servers (Projected) | Scaling Ratio Calculation | Scaling Ratio Estimate
13 | 3 | Line 12 + (Line 12 - Line 11) | 2.8x
14 | 4 | Line 13 + (Line 13 - Line 12) | 3.7x
15 | 5 | Line 14 + (Line 14 - Line 13) | n/a
16 | 6 | Line 15 + (Line 15 - Line 14) | n/a

Table 15.18. TriMont Capacity Planning Worksheet, Part 3: Estimate Throughput and Load
Line | Number of Application Servers | Estimated Load (Number of Application Servers * Load, Line 5) | Meets Requirement? (Line 1) [10,800] | Scaling Ratio | Estimated Throughput (Scaling Ratio * Throughput, Line 5) | Meets Requirement? (Line 3) [120]
17 | 3 | 12,000 | yes | 2.8 | 118 pages/sec | close
18 | 4 | 16,000 | yes | 3.7 | 156 pages/sec | yes
Total number of application servers required: 3

Now let's estimate the CPU utilization for the three-server cluster in Table 15.18, supporting 10,080 users and 120 pages per second. First, we double the 400-client cluster measurements to project utilization at 800 clients. Then, as shown in Table 15.19, we use these projections to estimate the CPU utilization on each server at 1,200 test clients. The load in the last column meets the plan objectives for load, throughput, and response time.

Table 15.19. TriMont Processor Utilization Projections
Measurement | 400 Clients (2-Server Cluster) | 800 Clients (2-Server Cluster) | 1,200 Clients (3-Server Cluster)
Projected user load | 4,000 | 8,000 | 12,000
CPU utilization:
HTTP server | 41%/40% | 82%/80% | 80%/80%/80%
Web application server | 43%/43% | 86%/87% | 87%/87%/87%
Catalog database server | 15% | 30% | 45%
HTTP session database server | 38% | 76% | 100%+ [!!]
Order test database server | 30% | 60% | 90%
Boat Selector database server | 12% | 24% | 36%
Account database server | 16% | 32% | 48%
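The shared database servers serve the whole cluster, so their projected utilization grows with total client load, while the HTTP and application servers each keep roughly 400 clients apiece. The sketch below applies the same linear scaling to the shared servers' 400-client cluster measurements and flags any server above a threshold. The 80% threshold and the function names are hypothetical choices for illustration, and linear scaling is itself a simplifying assumption.

```python
def project_cpu(measured: dict[str, float], load_factor: float) -> dict[str, float]:
    """Scale measured CPU % linearly with load (a simplifying assumption)."""
    return {server: pct * load_factor for server, pct in measured.items()}


def over_threshold(projected: dict[str, float], threshold: float = 80.0) -> list[str]:
    """Return servers whose projected CPU % exceeds `threshold`, busiest first."""
    hot = [s for s, pct in projected.items() if pct > threshold]
    return sorted(hot, key=lambda s: projected[s], reverse=True)


if __name__ == "__main__":
    # Shared database servers measured at 400 clients on the 2-server cluster
    # (Table 15.15). Going to 1,200 clients triples their load.
    measured = {
        "Catalog database": 15, "HTTP session database": 38,
        "Order test database": 30, "Boat Selector database": 12,
        "Account database": 16,
    }
    projected = project_cpu(measured, 1_200 / 400)
    for server, pct in projected.items():
        print(f"{server}: {pct:.0f}%")
    print("Over 80%:", over_threshold(projected))
```

Sorting the flagged servers by projected load puts the HTTP session database (114%) ahead of the Order test database (90%), which matches the order in which the discussion that follows treats them.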

Based on these projections, our original estimate of a 1:1 ratio between HTTP servers and application servers looks good. Adding the caching proxy servers also lowers the HTTP server processor requirements. However, these numbers show the HTTP session database server as a potential bottleneck at peak load: its projected utilization exceeds 100% at 1,200 clients. At a minimum, the HTTP session database requires a larger server.

The 90% processor utilization projected for the Order test database server doesn't really concern us, since it is not the production database. Of course, we recommend that TriMont monitor the additional load against the real production Order database to make sure that it is sufficiently sized.

Once TriMont resizes the HTTP session database, they begin the production deployment of their web site. Like any other web site, the TriMont site needs an ongoing capacity plan, as we'll discuss in the next section.



Performance Analysis for Java Web Sites
ISBN: 0201844540
Year: 2001
Pages: 126
