Detailed TriMont Web Site Planning Estimates


The TriMont web site team reviewed your list of questions, and provided additional information prior to your second visit.

  • A user averages five pages per visit.

  • The average page size is about 60KB and contains 15 static elements.

  • The catalog database contains about 1M items and is located on a mainframe.

    • The average catalog database request returns 2KB of data.

    • Ninety percent of page hits touch the catalog data.

  • The order system contains about 50,000 active orders during the peak shopping season (it takes about a week to ship the order and have it delivered to the customer). The order system database is on a mid-tier UNIX machine.

    • The team expects about 3% of a day's users to check order status.

    • The order system also connects to the shipping company's site via a URL link on TriMont's site.

  • The account database contains 17M accounts and is located on the mainframe.

    • The team expects 2% of a day's users to check their accounts.

  • The River Conditions applet updates its condition information every 15 minutes. The condition information is only a few hundred bytes, at most, so we consider its overall effect on the traffic negligible. Other applet information of interest:

    • The applet is 750KB.

    • The team guesses about 10% of the daily users load the applet.

    • The traffic from the applet to the server is included in the daily figures.

  • TriMont's Boat Selector function uses a special local application database kept on a mid-tier UNIX system. The database is about 20KB in size, and it references catalog items kept in the host catalog database.

    • Boat Selector receives about 15% of the day's user traffic.

    • Each query returns an average of 1KB.

  • The site only uses SSL if the user does one of the following:

    • The user enters the Purchase sequence.

    • The user checks private information, such as his account.

  • The average HTTP session size is 2KB per user.

    • User sessions time out after 30 minutes.

    • Users have a logout function that immediately destroys the HTTP session object.

  • Your question about failover struck a chord with the TriMont team.

    • They are very interested in setting up the site for failover.

    • Their application server shares HTTP session data between servers by using a common HTTP session database.

    • They want a minimum of two servers to provide failover capability, whether or not the expected load actually requires multiple servers.

Now that you have some more information about the site, you may run some additional estimates prior to your follow-up visit with the customer. Let's begin with the throughput calculations to complete the Capacity Sizing worksheet, and then we'll move on to the network analysis. Table 10.3 shows the remaining input data for the Capacity Sizing worksheet.

Table 10.3. Capacity Sizing Worksheet, Remaining Input Data

| Input Data | Source of Data | Your Data |
|---|---|---|
| 6. Average pages per visit | Marketing team | 5 |
| 7. Average static elements per page | Web site application architect | 15 |

Calculating Throughput (Page Rate and Request Rate)

Page Rate

The customer tells us the average user views five pages during her visit. Let's use this number to determine how many pages per second the site must serve at peak. The peak page rate calculation looks like this:

 Users arriving/hour * Pages each user requests / Minutes/hour / Seconds/minute = Page arrival rate/second
 63,000 users/hour * 5 pages/user visit / 60 minutes/hour / 60 seconds/minute = 87.5 pages/second

Network Traffic Mix

TriMont told you something very interesting about its web pages: Each page contains 15 static elements, on average. So, for every page the site returns, the browser asks for 15 static elements from the site to complete the page. At peak, the site receives the following request burden:

 87.5 pages/second * 15 static requests/page = 1,312.5 static requests/second 

Assuming the pages don't contain framesets (which might result in multiple dynamic calls), each page request results in a call to one dynamic element, such as a servlet or JSP. This means the site takes the following total request rate during peak:

 1,312.5 static requests/second + 87.5 dynamic requests/second = 1,400 total requests/second 
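The page rate and request rate arithmetic above can be sketched in a few lines of Python. This is only an illustration of the worksheet math, not part of the TriMont tooling; the function names are our own, and the one-dynamic-request-per-page default reflects the no-framesets assumption stated above.

```python
# Peak throughput sketch using the TriMont planning inputs from the text.
SECONDS_PER_HOUR = 60 * 60


def page_rate_per_second(users_per_hour: float, pages_per_visit: float) -> float:
    """Pages the site must serve each second during the peak hour."""
    return users_per_hour * pages_per_visit / SECONDS_PER_HOUR


def request_rates(page_rate: float, static_per_page: float,
                  dynamic_per_page: float = 1.0):
    """Return (static, dynamic, total) requests per second.

    dynamic_per_page defaults to 1: one servlet/JSP call per page,
    which assumes the pages contain no framesets.
    """
    static = page_rate * static_per_page
    dynamic = page_rate * dynamic_per_page
    return static, dynamic, static + dynamic


peak_page_rate = page_rate_per_second(63_000, 5)
static_rps, dynamic_rps, total_rps = request_rates(peak_page_rate, 15)
print(peak_page_rate, static_rps, dynamic_rps, total_rps)
```

If the customer later reports framesets on some pages, raising `dynamic_per_page` shows how quickly the dynamic request burden grows.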

You might encourage this customer to consider a caching proxy server for the static content, as we discussed in Chapter 3. Such a server might improve the overall performance of the web site by returning static elements more quickly to the users.

Table 10.4 shows the completed throughput section of the Capacity Sizing worksheet.

Table 10.4. Capacity Sizing Worksheet, Throughput

| Calculated Data | Equation | Total (per hr) | Total (per min) | Total (per sec) |
|---|---|---|---|---|
| 13. Throughput: page rate | User arrival rate * Average pages/visit (Line 11 * Line 6) | 63,000 visitors/hr * 5 pages/visit = 315,000 pages/hr | 5,250 pages/min | 87.5 pages/sec |
| 14. Throughput: requests | Dynamic page rate + Average static elements/dynamic page * Page rate (Line 13 + (Line 7 * Line 13)) | Not typically used | Not typically used | 87.5 dynamic pages/sec + 87.5 pages/sec * 15 static elements/page = 1,400 total hits/sec |

Network Analysis

Now let's complete a rough network analysis as described in Chapter 9. For this example, we'll go through the steps; later we'll show you how to use the Network Capacity worksheet in Appendix A to build the estimate. This example assumes that all of the components of network traffic (outbound, inbound, database transfers, and so on) share the same network at some point. Some web sites use different networks for inbound versus outbound traffic. Likewise, database traffic sometimes moves on a separate network as well. In practice, you need to consider the size of data elements sharing the same network in order to size the network in question.

Always use peak load for your estimates. You must build a site capable of handling the largest anticipated traffic volume.

Calculating Network Analysis Components

Most networks rate their transmission capabilities in terms of the bits or bytes per second they transmit. Therefore, we need to break our data into bytes and focus on reducing our data to the peak second.

User Arrival Rate

A key estimate required for subsequent network analysis calculations is the user arrival rate. This estimate tells us how many users come onto the site during any given second in the peak hour. On some sites, the user's first page request "weighs" more than subsequent requests. For example, the user might trigger a download of a frameset, JavaScript, or graphics on her first visit. The browser caches this information for subsequent requests, so the subsequent requests generate less network traffic. In addition, the server sets up HTTP session data and perhaps retrieves preference data from a database on the first viewed page.

For this site, we know some users make use of a heavyweight applet. We need the user arrival rate for later estimates involving the impact of this applet on the network traffic. If you didn't do this calculation before, the user arrival rate in seconds is

 Users arriving at the peak hour / Minutes/hour / Seconds/minute = User arrival rate/second
 63,000 users at peak/hour / 60 minutes/hour / 60 seconds/minute = 17.5 users/second

The input data for the Network Capacity Sizing worksheet is shown in Table 10.5.

Table 10.5. Network Capacity Worksheet, Input Data

| Input Data | Source of Data | Your Data |
|---|---|---|
| 1. Average page size (bytes) | Web application architect | 60KB |
| 2. Maximum page size (bytes) | Web application architect | (NA) |
| 3. Requests per second | Appendix A: Capacity Sizing worksheet | 87.5 |
| 4. User arrival rate per second | Appendix A: Capacity Sizing worksheet | 17.5 |
| 5. Average bytes of application data transfer per request (i.e., from database) | Web application architect | 2KB for catalog; 1KB for Boat Selector data |
| 6. Percent of requests requiring data transfer | Web application architect | 90% |
| 7. HTTP session size (only if shared session is used) | Web application architect | 2KB |
| 8. Size of applets | Web application architect | 750KB |
| 9. Percentage of user requests requiring applet download | Web application architect/marketing | 10% |

Outbound Sustained Traffic

First, let's look at the traffic generated by the page requests. As we discussed in Chapter 5, e-Commerce sites typically do not benefit significantly from browser caching. Thus, let's assume most of the data on each page requires loading on each request. The calculation in this case looks as follows:

 Page rate * Average page size = Bytes/second
 87.5 pages/second * 60KB/page = 5.25MB/second

Inbound Sustained Traffic

Outbound traffic is always larger than the inbound request traffic. As a generalization, we assume inbound traffic at 20% of the outbound traffic. For an e-Commerce web site, this might be a bit generous, given the overall size of the pages, but we always prefer to overestimate rather than underestimate. (Slightly too much capacity is less problematic than slightly insufficient capacity.) This is our estimate for inbound traffic in this case:

 Outbound sustained traffic * 20% = Inbound traffic/second
 5.25MB/second * 20% = 1.05MB/second

Database Transfer

We must consider other data moving in the network as well, such as data returned from application databases, in our calculations. The customer tells us the database returns about 2KB of data. About 90% of the site's requests generate a database call. The database transfer calculation looks like this:

 Page rate * Data retrieved/page * Weighting factors = Bytes from db/second
 87.5 pages/second * 2KB/page * 90% = 157.5KB/second

Notice that TriMont did not provide any information about the database transfers from the account and order databases. These data transfers may be very small relative to other traffic (such as the outbound HTML pages), but you need to verify this with the customer at your next meeting.

HTTP Session Transfer

The customer wants to use a database for HTTP session failover purposes. For this calculation, we like to assume the full contents of the HTTP session transfers between the application server and the persistent HTTP session database. Depending on the characteristics of the customer's application, this transfer may occur on every request.

We also assume some type of caching mechanism at the server, so we do not consider a round trip for this information in our calculations. This means that the session database is not read on every request, but is updated on every request, at a minimum to update the last access timestamp. If the caching scheme is not in place or does not work effectively, you must account for this in your calculation. The HTTP session transfer calculation looks like this:

 Page rate * Average HTTP session data transferred/page = Bytes to the HTTP session db/second
 87.5 page requests/second * 2KB/request = 175KB/second

Applet Transfer

This web site contains a large applet (750KB). Depending on the frequency of the applet's download, the applet might impact the network capacity significantly. The customer tells us that about 10% of the site's users access this applet, so let's use the user arrival rate to give us a feel for the impact of the applet at any given second. The applet transfer calculation looks like this:

 User arrival rate * Applet size * Weighting factors = Applet bytes transferred/second
 17.5 users/second * 750KB/user transfer * 10% = 1.313MB/second

Total Transfers

Now we add up all of the elements of the network traffic to get a rough total of sustained network traffic:

 Outbound traffic + Inbound traffic + Database transfer + HTTP session database transfer + Applet transfer = Total known data transferred/second
 5.25MB/second + 1.05MB/second + 157.5KB/second + 175KB/second + 1.313MB/second = 7.946MB/second total known data transfer
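To make the five components easier to rerun when the customer revises an input, here is a minimal Python sketch of the same sums. The function and key names are ours, and the decimal units (1KB = 1,000 bytes) are an assumption chosen because they reproduce the 5.25MB/second outbound figure used in the text.

```python
KB = 1_000      # decimal units, so 87.5 * 60KB comes out as 5.25MB as in the text
MB = 1_000_000


def network_traffic(page_rate, arrival_rate, page_size, db_bytes, db_fraction,
                    session_size, applet_size, applet_fraction,
                    inbound_ratio=0.20):
    """Return each traffic component and their total, in bytes/second."""
    outbound = page_rate * page_size
    components = {
        "outbound": outbound,
        "inbound": outbound * inbound_ratio,            # rule of thumb: 20% of outbound
        "database": page_rate * db_bytes * db_fraction,
        "session": page_rate * session_size,            # session written on every request
        "applet": arrival_rate * applet_size * applet_fraction,
    }
    components["total"] = sum(components.values())
    return components


arrival_rate = 63_000 / 3_600          # 17.5 users/second at peak
traffic = network_traffic(
    page_rate=87.5, arrival_rate=arrival_rate, page_size=60 * KB,
    db_bytes=2 * KB, db_fraction=0.90, session_size=2 * KB,
    applet_size=750 * KB, applet_fraction=0.10)

for name, value in traffic.items():
    print(f"{name:>9}: {value / MB:.3f} MB/second")
```

Summing the unrounded components gives about 7.945MB/second; the hand calculation arrives at 7.946 because it rounds the applet figure to 1.313MB before adding. The difference is far smaller than the uncertainty in the inputs.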

In Chapter 9, we estimate how much sustained traffic three popular network types support. These estimates include some allowance for protocol overhead (packet headers, connection "handshaking," and so on).

This web site probably needs a Gigabit network to give us a comfortable operating margin in production. If we use a switched Ethernet, we might experience a site slowdown if our estimates prove even just a little too low. Also, the switched network does not give us a lot of room for future web site growth.

Using the Network Planning Worksheet

Let's complete the calculations in the worksheet from Appendix A in Table 10.6.

Table 10.6. Determining a Minimum Network Requirement of 1Gb

| Network Data | Equation | Total |
|---|---|---|
| 10. Outbound HTML/static elements | Average page size * Requests/sec (Line 1 * Line 3) | 60KB * 87.5 pages/sec = 5.25MB/sec |
| 11. Inbound HTTP requests (estimate) | 20% * Outbound data (20% * Line 10) | 5.25MB/sec * 20% = 1.05MB/sec |
| 12. Application data transfer | Average bytes/request * Requests/sec * % requests requiring data (Line 5 * Line 3 * Line 6) | (Catalog) 2KB * 87.5 pages/sec * 90% = 157.5KB/sec |
| | Average bytes/request * Requests/sec * % requiring data | (Boat Selector) 1KB * 87.5 pages/sec * 15% = 13KB/sec |
| 13. HTTP session data transfer | HTTP session size * Requests/sec (Line 7 * Line 3) | 2KB * 87.5 pages/sec = 175KB/sec |
| 14. Applet transfer | Applet size * User arrival rate/sec * % requests using applet (Line 8 * Line 4 * Line 9) | 750KB * 17.5 users/sec * 10% utilization = 1.313MB/sec |
| Total | | 7.959MB/sec |

Network Sizing

| Network Speed (bits) | Estimated Bytes Supported (Planning) | Equation | Yes/No |
|---|---|---|---|
| 100Mbps Ethernet | 5MBps | 5MBps > 7.959? | No |
| 100Mbps Switched Ethernet | 8MBps | 8MBps > 7.959? | Yes (but really, really close) |
| 1Gbps Ethernet | 50MBps | 50MBps > 7.959? | Yes |
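
The yes/no comparison in the Network Sizing rows hides how little margin the switched network leaves. A short Python sketch, using the worksheet's planning capacities and the 7.959MB/second total, makes the utilization explicit. The function name and the percentage output are our own additions for illustration.

```python
# Planning capacities (MB/second) taken from the Network Sizing rows above.
PLANNING_CAPACITY_MBPS = {
    "100Mbps Ethernet": 5.0,
    "100Mbps Switched Ethernet": 8.0,
    "1Gbps Ethernet": 50.0,
}


def evaluate_networks(required_mb_per_sec, capacities=PLANNING_CAPACITY_MBPS):
    """Return (network, fits, utilization %) for each candidate network."""
    results = []
    for name, capacity in capacities.items():
        fits = capacity > required_mb_per_sec
        utilization_pct = round(required_mb_per_sec / capacity * 100, 1)
        results.append((name, fits, utilization_pct))
    return results


sizing = evaluate_networks(7.959)
for name, fits, utilization_pct in sizing:
    print(f"{name}: {'Yes' if fits else 'No'} ({utilization_pct}% of planning capacity)")
```

The switched Ethernet passes the test only by running at essentially its full planning capacity, which is why the Gigabit network is the safer recommendation.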
Dial-up User Considerations

If TriMont receives a lot of dial-up traffic, the page sizes might become a factor over slow phone lines. The average page size is 60KB. Over a 24.4Kbps phone line, this requires at least

 24,400 bits/second / 8 bits/byte = 3,050 bytes/second
 60,000 bytes / 3,050 bytes/second = 19.7 seconds!

This far exceeds the five-second response time threshold the TriMont team set for themselves.

You discuss this with the TriMont team. They believe most of their traffic comes from high-speed connections or users equipped with 56Kbps modems. This brings the transfer time on the large pages to the sub-10-second threshold. They believe this is a reasonable response time expectation for people using this level of equipment. You consider the TriMont team's attitude a bit optimistic. In case they change their minds in the future, you give them the following suggestions to consider:

  • They could give users a choice of "low-resolution" pages returning fewer or no graphics.

  • The pages might return fewer search items by default. For example, if the TriMont site normally returns a maximum of 15 items per user catalog search, they might instead return only the top 3 (with a link to view all the search results) to reduce transfer size. Users with higher bandwidth connections could choose to see more items per page on their search results.

  • They might consider using different compression techniques to reduce the graphical element sizes.

It is also worth mentioning to the customer that the applet requires very long download times over a 24.4Kbps modem (more than four minutes). Even doubling the modem speed does not move the applet into the desired response time range.
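
The dial-up figures are easy to recheck for other line speeds. The sketch below is a best-case estimate only: it divides payload size by raw line speed and ignores protocol overhead, latency, and modem compression, so real transfers take longer. The function name and the 48.8Kbps "doubled modem" row are our own illustration.

```python
def transfer_seconds(size_bytes: float, line_bits_per_second: float) -> float:
    """Best-case seconds to move size_bytes over a line of the given bit rate."""
    bytes_per_second = line_bits_per_second / 8
    return size_bytes / bytes_per_second


PAGE_BYTES = 60_000      # average page size from the text
APPLET_BYTES = 750_000   # River Conditions applet

for label, bps in (("24.4Kbps", 24_400), ("48.8Kbps", 48_800), ("56Kbps", 56_000)):
    page_s = transfer_seconds(PAGE_BYTES, bps)
    applet_min = transfer_seconds(APPLET_BYTES, bps) / 60
    print(f"{label}: page {page_s:.1f} s, applet {applet_min:.1f} min")
```

Even at 56Kbps the average page needs more than eight seconds, and the applet still takes well over a minute, so neither meets the five-second target for dial-up users.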

HTTP Session Pressure

After completing the network analysis, you turn your attention to memory issues. Since the TriMont site uses HTTP sessions, you need an estimate for how much memory the HTTP sessions require during the peak period. As we discussed in Chapter 4, you need to make sure the Java application server has enough memory to operate. The TriMont site removes the user's HTTP session when the user logs out. Do not count on this in your estimations of HTTP session pressure! Most users do not log out at the end of their visits; they just move on to another web site. So, when you calculate HTTP sessions, do not assume the web site gets significant benefit from the logout function. (Likewise, do not write your test scripts so that each virtual user dutifully logs out of the test case.) See Chapter 7 for further information.

TriMont's HTTP session timeout is 30 minutes. Given that the average user visit is 7 minutes, let's assume the user's session lasts 37 minutes (7 minutes for the actual visit and 30 minutes for HTTP session timeout). Given our previous user arrival rate estimate of 17.5 users per second, the web site might have the following number of simultaneous HTTP sessions in memory during the peak:

 User arrival rate (seconds) * 60 seconds/minute * Number of minutes user's session lasts = Maximum HTTP sessions
 17.5 users/second * 60 seconds/minute * 37 minutes = 38,850 HTTP sessions

TriMont HTTP sessions average 2KB of data in size. At peak load, the web site requires the following memory to hold the HTTP sessions:

 38,850 simultaneous HTTP sessions * 2KB/HTTP session = 77.7MB

Using the JVM Sizing Worksheet

Let's put all of the HTTP session sizing into the worksheet from Appendix A in Table 10.7.

Table 10.7. HTTP Session Impact Worksheet

| Input Data | Source of Data | Your Data |
|---|---|---|
| 1. User arrival rate (per minute) | Appendix A: Capacity Sizing worksheet | 1,050 visits/min |
| 2. Average user visit time (minutes) | Appendix A: Capacity Sizing worksheet | 7 |
| 3. Planned HTTP session timeout interval (in minutes) | System Administrator | 30 |
| 4. HTTP session size (per user) | Appendix A: Network Capacity Sizing input | 2KB |

Calculating HTTP Session Memory Requirement

| Calculated Data | Calculation | Total |
|---|---|---|
| 5. Average time HTTP session stays resident in memory (in minutes) | Average user visit time + HTTP session timeout (Line 2 + Line 3) | 7 min visit + 30 min timeout = 37 min |
| 6. Number of HTTP sessions in memory | User arrival rate * Average time session in memory (Line 1 * Line 5) | 1,050 users/min * 37 min = 38,850 max HTTP sessions in memory |
| 7. HTTP session memory required | Number of user sessions in memory * HTTP session size (Line 6 * Line 4) | 38,850 HTTP sessions in memory * 2KB/session = 77.7MB |

Given the number of users on the web site during the peak time, this is a very reasonable HTTP session footprint. In fact, the HTTP sessions for all these users could, depending on the web application footprint, fit inside a single JVM. Keep in mind that a small increase in the average HTTP session on this site means a significantly larger HTTP session footprint. If the session size grows to 10KB (still not very much data), we'd need about 388.5MB just for session data at peak usage. Your JVM might not have enough heap space to handle this. Make a note to recheck the average HTTP session size with the TriMont development team before the testing begins.
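
Because the footprint scales linearly with session size, it is worth keeping the session arithmetic in a form you can rerun after the development team confirms the real average. The following Python sketch mirrors the worksheet lines; the function names are ours, and it assumes decimal units (1MB = 1,000KB) to match the 77.7MB figure.

```python
def sessions_in_memory(arrivals_per_minute: float, visit_minutes: float,
                       timeout_minutes: float) -> float:
    """Peak session count: each session lives for the visit plus the timeout."""
    return arrivals_per_minute * (visit_minutes + timeout_minutes)


def session_memory_mb(session_count: float, session_size_kb: float) -> float:
    """Memory needed to hold all sessions, in MB (1MB = 1,000KB)."""
    return session_count * session_size_kb / 1_000


peak_sessions = sessions_in_memory(arrivals_per_minute=1_050,
                                   visit_minutes=7, timeout_minutes=30)

for size_kb in (2, 10):
    print(f"{size_kb}KB sessions: {session_memory_mb(peak_sessions, size_kb):.1f} MB")
```

Note that the calculation deliberately ignores the logout function, for the reasons given earlier: most users never log out, so every session is assumed to live until it times out.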

The TriMont team's usage of HTTP sessions puts them in a great position regarding their memory footprint. Of course, not every customer manages their HTTP session footprint as well as TriMont apparently has. See our discussion in Chapter 2 for more details on techniques to reduce HTTP session pressure if you find yourself working with a web site with large HTTP sessions.

Test Scenarios

The customer gave you a lot of information about visitor usage patterns. You can use this information to develop a rough breakout of the test scenarios you need to test the web site.

Potential Test Scripts

We know of the following scripts and their relative weighting:

  • Browse (90%)

  • Check Order (3%)

  • Check Account (2%)

We also know of some special functions of the web site:

  • Purchase (5% of Browse traffic)

  • Boat Selector (15% of all customers)

  • River Conditions applet (10% of all traffic)

Notice that our "bread and butter" functions of Browse, Check Order, and Check Account do not add up to 100% of the traffic. Maybe the TriMont team left some buffer for users of the Boat Selector function and the River Conditions applet who do not go further into the site to use other features such as Browse. You need to double-check this with the TriMont team (we don't want to miss a significant function).
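
A quick way to spot this kind of gap, and to turn the percentages into a script mix for the load driver, is sketched below in Python. The weights come from the list above; the function names and the 1,000 virtual users are hypothetical values chosen only for illustration.

```python
# Script weights (percent of traffic) as reported by the TriMont team.
SCRIPT_WEIGHTS = {"Browse": 90, "Check Order": 3, "Check Account": 2}


def unassigned_percent(weights: dict) -> int:
    """Percentage of traffic not covered by any listed script."""
    return 100 - sum(weights.values())


def virtual_users_per_script(weights: dict, total_virtual_users: int) -> dict:
    """Split a pool of virtual users across scripts by their weights."""
    return {name: total_virtual_users * pct / 100 for name, pct in weights.items()}


gap = unassigned_percent(SCRIPT_WEIGHTS)
mix = virtual_users_per_script(SCRIPT_WEIGHTS, 1_000)  # 1,000 is a hypothetical pool size
print(f"Unassigned traffic: {gap}%")
print(mix)
```

With these weights, 5% of the traffic has no script assigned, which is exactly the question to raise with the TriMont team before finalizing the test plan.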

Also, you might want to request a further breakdown of the Browse activity. Do most users take advantage of the Search functionality discussed earlier while browsing, or do most users browse by category rather than searching? Finally, will it make a significant difference in our performance test if the users search versus walking through a series of categories? Again, clarification on small details might make a significant difference in the success of the test.

Test Scenario Considerations

You need a very large shopping list. The catalog database contains one million items. Avoid reusing the same items repeatedly in your scripts, or the items you use repeatedly might be cached at the host database. This results in a lower response time for the test than you will see in the field. See Chapter 7 for details.

You also need to interact with a production account and order system. TriMont plans to exercise the site fully, including placing orders and checking on their status. However, you don't want to trigger delivery of merchandise to the test lab! You want to operate outside the order and delivery system, but keep the shopping experience as authentic as possible for the performance tests.

In these cases, you need to work closely with the TriMont team. As we discussed in Chapter 9, the customer has probably resolved this problem at some time in the past. TriMont may use a special set of account numbers to generate test orders. The order system recognizes these special accounts and does not reduce or ship inventory. Other options include building a special order database just for the test that bypasses the standard order and inventory system. (You also need a set of order numbers for the Check Order function. Again, the customer might generate a set of dummy orders just for this part of the test.)

Moving Ahead

The Java application server vendor probably provides some capacity planning information to customers. Now that you know more about the TriMont application, you should consult the vendor's web site for white papers to help you pick the server capacity the application needs. The test needs hardware. TriMont needs to pick a hardware vendor for their new site, if they haven't already. They also need to pick machines for testing that match the eventual production configuration.

Capacity planning at this stage is only an educated guess. The vendor's guidance probably narrows the range of possible servers. For example, the application might require something in an 8-12 CPU configuration, but it definitely needs more than a 4 CPU server provides. However, the estimates give you a starting point for machine sizing, which you will confirm as part of the performance test. (See Chapter 15 for more details on determining machine capacity.)



Performance Analysis for Java Web Sites
ISBN: 0201844540
Year: 2001
Pages: 126