TCA In Five Steps


There are five distinct steps to performing a TCA. The first critical step is to define your Web application's usage profile. Why is the usage profile so critical? Because the estimations that you assign to site operations in this step will ultimately weight your site's hardware resource costs, greatly influencing the final site capacity estimations. If you put garbage in, you'll only get garbage out, so invest the time to do as thorough a user profile analysis as possible. The second step is to execute discrete stress tests on the scenarios defined in the user profile to identify server resource costs at maximum load. The third step is to make a set of calculations to quantify these costs. We have provided a Microsoft Excel spreadsheet that will help streamline these calculations; the TCA spreadsheet is included on this book's companion CD. The fourth step is to make another set of calculations that will result in the actual number of estimated concurrent users. This calculation can also be made using the aforementioned spreadsheet. The fifth and final step is to run verification tests on your site to confirm the capacity numbers that you generated through the TCA model. Figure 9-1 illustrates the five-step TCA methodology.


Figure 9-1. The Five Step TCA Methodology

Step 1: Create a User Profile

The first step is to create a user profile from existing production traffic data, where available. This can be done by viewing historical data from your Web server log files (through various log parsers) or by analyzing database activity. If production traffic data does not exist, you can estimate user traffic loads by researching traffic patterns on similar sites. You will use the user profile to weight specific site transaction costs based on the ratio of specific operations to overall traffic. For example, the browse products transaction in our IBuySpy Web application accounts for roughly 61 percent of the traffic.

A good source for site traffic information is production IIS logs. However, analyzing gigabytes of IIS log data can be extremely time consuming, so automate this process wherever possible. There are many log file analyzers available, so find and use the one that suits your needs. In selecting the sampling of logs, it is better to use a set of logs covering as long a period of time as possible (at least a week's worth of IIS logs, if they are available) to obtain realistic averages. The goal is to include as large a population of production traffic data as possible to generate more reliable usage profile weightings.
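If you need a quick custom analysis rather than a full log analyzer, the hit ratio calculation can be sketched in a few lines. The sketch below assumes IIS logs in W3C extended format with `cs-uri-stem` and `sc-status` fields; the `classify` mapping from URLs to user operations is hypothetical and would need to match your site's actual pages.

```python
from collections import Counter

def hit_ratios(log_lines, operation_of):
    """Count successful (status 200) requests per operation and return
    each operation's share of the total classified traffic."""
    counts = Counter()
    fields = []
    for line in log_lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]      # e.g. ['date', 'time', 'cs-uri-stem', 'sc-status']
            continue
        if line.startswith("#") or not line.strip():
            continue                       # skip other directives and blanks
        row = dict(zip(fields, line.split()))
        if row.get("sc-status") != "200":  # exclude error pages from page views
            continue
        op = operation_of(row["cs-uri-stem"])
        if op:
            counts[op] += 1
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}

# Hypothetical URL-to-operation mapping for an IBuySpy-style site.
def classify(uri):
    if "productslist" in uri.lower():
        return "Browse for Product"
    if "search" in uri.lower():
        return "Basic Search"
    return None

sample = [
    "#Fields: date time cs-uri-stem sc-status",
    "2002-01-01 00:00:01 /store/productslist.aspx 200",
    "2002-01-01 00:00:02 /store/productslist.aspx 200",
    "2002-01-01 00:00:03 /store/search.aspx 200",
    "2002-01-01 00:00:04 /store/search.aspx 404",   # excluded: not a 200
]
ratios = hit_ratios(sample, classify)
```

Run over a week of logs, the resulting ratios become the mean hit ratios in your user profile table.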

TIP
When creating your profile, exclude traffic that extraordinarily inflates page view statistics. Ensure that counted page views are genuinely successful requests (200 status codes) and not error pages returned with a 200 status code.

In some cases, to ensure that your Web application can meet capacity needs regardless of seasonal peaks, you may have to use IIS logs for key peak traffic periods (for example, holiday shopping seasons). This will help your TCA estimates reflect your Web application's worst-case traffic scenario. If historical production data is not available because your site is new, you can make an educated guess by reviewing publicly available statistics for similar sites, including page views, unique users, and demographic data about their user base.

Identifying one profile that best represents your site traffic is just as critical as defining the probability of site usage distributions to be included in the what-if scenarios. For example, after analyzing the IIS logs, we find that the browse operation has a normal distribution with a mean of 60 percent and a standard deviation of 10 percent. This means that determining the operational costs for 95 percent of the possible browse distributions would require you to calculate browse costs across two standard deviations, from a 40 percent browse weighting on the low end to an 80 percent browse weighting on the high end. Looking at the specific operational weightings as probability distributions rather than discrete data points provides a statistical basis for performing the site traffic what-if scenarios mentioned in the beginning of this chapter. This allows you to easily model hardware capacity estimates across a range of probable traffic patterns.
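The two-standard-deviation range is simple arithmetic. Using the browse operation's mean and standard deviation from above:

```python
mean, std = 0.60, 0.10  # browse hit ratio: mean 60 percent, std dev 10 percent

# Roughly 95 percent of a normal distribution falls within 2 standard
# deviations of the mean, so cost out this weighting range:
low, high = mean - 2 * std, mean + 2 * std
print(f"browse weighting range: {low:.0%} to {high:.0%}")
# prints: browse weighting range: 40% to 80%
```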

Table 9-2 shows the user profile we created for our IBuySpy TCA:

Table 9-2. IBuySpy TCA User Profile

| User Operations         | Mean Hit Ratio (User Profile) | Standard Deviation |
|-------------------------|-------------------------------|--------------------|
| Basic Search            | 14.14 percent                 | 2 percent          |
| Browse for Product      | 61.49 percent                 | 10 percent         |
| Add to Basket           | 10.47 percent                 | 1.5 percent        |
| Login & Checkout        | 7 percent                     | .5 percent         |
| Registration & Checkout | 6.90 percent                  | .5 percent         |
| Totals                  | 100 percent                   |                    |

Because this is only a sample site, we did not have production data from which to estimate our user profile. As an alternative, we based the IBuySpy sample site user profile estimates on production data from the shop.microsoft.com e-commerce Web application. You will be in the same situation if you lack production data on which to base your user profile estimates.

Another important aspect of setting up a user profile is determining the rate at which transactions occur. To illustrate, we estimated an average session length for our IBuySpy user to be 10 minutes. You use these transaction rates for stress script sleep times in the verification stress tests (Step 5) that verify the estimated capacity produced by the TCA model.

Step 2: Stress Test for User Operation Costs

After you have created the user profile, the next step is to generate the transaction performance data needed to measure each user operation's resource cost. You can do this by creating a stress script to exercise each of the identified shopper operations. The goal is to identify server resource costs, such as CPU utilization, at maximum throughput load. Microsoft ACT, a load generation/simulation tool discussed in Chapter 3, can be used for this task and is the one we used for this TCA. A set of sample ACT scripts that we utilized is included on this book's companion CD.

TCA Stress Test Goals and Parameters

Running a script exclusively for an individual operation loads the IIS server with as many requests as possible in order to achieve maximum ASP requests per second for that operation. Maximum ASP throughput occurs just before ASP requests per second, as reported by Performance Monitor, begins to decline at higher load levels. You also need to ensure that operational latency does not suddenly increase. We recommend keeping your average ASP latency under 2 seconds. You can calculate operational latency using the following formula:

Average ASP latency = (Avg. ASP execution time + Avg. ASP wait time)

Before each stress test, it's good practice to clear server-level caches to restore the system to a consistent baseline for performance data collection. Each test should run long enough to reach steady state. This state is reached when resource consumption related to test startup has completed, network connections are established, server caches are appropriately populated, and periodic behavior such as batch jobs or production backup processes is occurring. For the IBuySpy TCA, we ran the stress tests for 10 minutes after allowing a minute or two of test startup time. The resource measurements, such as CPU utilization and ASP requests per second, should be averaged over the length of the steady-state test.

When you notice a decline in ASP requests per second as more load is applied, it could be due to context switching, excessive memory paging, disk I/O, network saturation, or a stress client bottleneck. For example, in our IBuySpy TCA we monitored context switches per second to ensure they did not exceed 15,000. A context switch occurs when a thread no longer runs because it is blocked waiting for a logical or physical resource, or the thread puts itself to sleep. Symptoms of high context switching can include lower throughput coupled with high CPU utilization, which begins to occur at switching levels of 15,000 or higher. Record the maximum ASP requests per second before your application demonstrates signs of performance degradation by closely observing these relevant performance counters and by ensuring that your client is not bottlenecking overall throughput. More details on how to use Performance Monitor and important counters to monitor are presented in Chapter 4.

Identifying Maximum Throughput

A TCA approach to capacity planning can be used to measure any hardware resource costs, such as memory, disk I/O, or CPU utilization. For our IBuySpy TCA we are measuring CPU utilization as the limiting hardware resource. Transaction costs should be recorded, using Performance Monitor, at the point where maximum ASP requests per second is reached. See Table 9-1 for Windows DNA versus .NET counters. Increasing the number of users executing the script should increase the ASP requests per second throughput. When a site has been properly developed and tuned for performance, ASP requests per second and CPU utilization grow as the load increases. If adding more load results in lower ASP requests per second, you know that the load level needs to be decreased.

TIP
Gradually increase the amount of load on your Web application while monitoring ASP requests per second. If you observe that an incremental increase in applied load does not result in higher ASP requests per second, ensure that your client isn't the throughput bottleneck. If your client is bottlenecking, reduce the load/client ratio and add more clients to ensure you're measuring the server resource cost at the true ASP requests per second maximum. In addition, monitor ASP queuing on the server; if the queue exceeds 1, requests are blocking and the load level should be reduced.

If the number of ASP requests per second continues to grow until CPU utilization reaches 100 percent, the number of ASP requests per second at that point is the maximum. We do not recommend pushing CPU utilization to 100 percent, because this can introduce additional latency for your system and inaccuracies in TCA estimates. For IBuySpy, our maximum allowable CPU utilization was 90 percent.

Minimizing the Number of Pages Needed per Operation

When executing the operational cost stress tests, it is important to reduce the number of pages required to successfully execute a particular operation. If you run a stress script that measures Add to Cart costs but also includes product browse pages, a portion of the cost you record will be attributable to the Browse operation as well as to Add to Cart. The goal is to focus as narrowly as possible on the server resource cost produced by the operation in question. Narrowing the number of pages required to execute an operation down to the absolute minimum helps achieve that goal. This may require that you include parameters for dynamic pages, such as shopper IDs or product IDs, in your stress scripts to bypass pages that a user would normally be required to hit first.

Step 3: Calculate the Cost per User Operation

The stress tests you performed in Step 2 resulted in a set of resource costs, measured in CPU cycles, for each user operation. Our IBuySpy IIS server was configured with two processors with clock speeds of 1000 MHz per processor. The CPU utilization transaction costs were recorded when maximum ASP requests per second was achieved for the operation in question, while at the same time keeping the average operational latency (ASP execution time + ASP wait time) below 2 seconds, ASP queuing below 1, context switches per second below 15,000, and CPU utilization below a 90 percent average. The performance metrics we observed to verify that our test was achieving maximum ASP requests per second are detailed in Table 9-3. These metrics are not used in the cost per operation calculations but are critical to monitor to ensure you're achieving the true maximum ASP requests per second. The results from our stress tests and the subsequent costs per operation are detailed in Table 9-4.

Table 9-3. IBuySpy Cost per Operation Test Performance Monitor Metrics

| User Ops            | Execution Time (ms) | Wait Time (ms) | Avg. ASP Latency (ms) | Context Switches/Sec |
|---------------------|---------------------|----------------|------------------------|----------------------|
| Basic Search        | 62                  | 56             | 118                    | 8188                 |
| Browse for Product  | 1012                | 3.8            | 1015.8                 | 4415                 |
| Add to Cart         | 350                 | 5              | 355                    | 3262                 |
| Login & Checkout    | 12.5                | .88            | 13.38                  | 5287                 |
| Register & Checkout | 600                 | 130            | 730                    | 8355                 |

Table 9-4. IBuySpy Cost/Operation

| User Ops            | CPU Util @ Max ASP Req/Sec | Max ASP Req/Sec | Cost (Mcycles) | Cost/Operation (Mcycles) |
|---------------------|----------------------------|-----------------|----------------|--------------------------|
| Browse for Product  | 58.20 percent              | 193.3           | 1164           | 6.02172                  |
| Basic Search        | 84 percent                 | 349.8           | 1680           | 4.80722                  |
| Add to Cart         | 89.50 percent              | 98.3            | 1790           | 18.20956                 |
| Register & Checkout | 76.20 percent              | 307.3           | 1524           | 4.95932                  |
| Login & Checkout    | 89.00 percent              | 279.9           | 1780           | 6.35941                  |

The initial result of your TCA stress tests is the set of CPU costs per operation (last column in Table 9-4), calculated in megacycles (Mcycles), where 1 Mcycle corresponds to 1 MHz of CPU capacity for one second. For our IBuySpy TCA, we used a dual 1000-MHz processor configuration, which has a total capacity of 2000 Mcycles. Using the maximum number of ASP requests per second, you can calculate the cost per operation as follows:

Cost per Operation = (CPU utilization * number of CPUs * CPU speed in MHz) / ASP requests per second

To illustrate, when we stressed our IBuySpy browse operation, we achieved a maximum of 349.8 ASP requests per second at a CPU utilization of 84 percent; the cost per ASP page is then 84 percent * 2 * 1000 / 349.8 = 4.80274 Mcycles. Before you calculate the cost of an operation, you need to know the number of ASP pages involved in that operation. Checkout operations typically involve several ASP pages (personal information page, credit card page, shipping page, confirmation page, and so on). You calculate the cost of a shopper operation by normalizing the number of ASP pages involved.
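As a quick sanity check, the formula can be expressed directly in code, using the figures from the example above:

```python
def cost_per_operation(cpu_util, n_cpus, cpu_mhz, asp_req_per_sec):
    """Mcycles consumed per ASP request at maximum throughput."""
    return cpu_util * n_cpus * cpu_mhz / asp_req_per_sec

# 84 percent CPU on a dual 1000-MHz server at 349.8 ASP requests per second.
cost = cost_per_operation(0.84, 2, 1000, 349.8)
print(f"{cost:.5f} Mcycles per ASP page")  # prints: 4.80274 Mcycles per ASP page
```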

TIP
Some ASP pages, such as redirects, are never displayed to the client because they contain only server-side logic and redirect to a continuing page. You must also account for these types of redirect pages in your cost per operation calculations.

Cost Per User Operation/Sec Calculation

The behavior of user activity against a Web application can be random, but over time Web usage statistically evens out to average behavior. The user profile you created in Step 1 should reflect your Web application's average user behavior. We calculated the hit ratio means as well as standard deviations to describe user activity for site traffic what-if scenarios. In this fashion, you can model the effect that different site traffic distributions have on site capacity estimates, one of TCA's principal benefits. We calculated the number of shopper operations during the course of 10 minutes and, following that, the number of user operations per second. We need to know the number of operations per second because the total operation cost is expressed in terms of clock speed per second (a 1000-MHz CPU speed is 1000 Mcycles per second). The cost per user operation per second for IBuySpy using the mean hit ratio distribution is detailed in Table 9-5:

Table 9-5. IBuySpy Cost Per User Operation/Sec (Mcycles)

| Ops                     | Mean Hit Ratio (User Profile) | Min ASP Pages | Norm User Profile | User Profile Ops | User Profile Ops/Sec | Cost/Op | Cost/Sec |
|-------------------------|-------------------------------|---------------|-------------------|------------------|----------------------|---------|----------|
| Browse for Products     | 61.49 percent                 | 1             | .61               | 2.18919          | .00365               | 4.80    | .01752   |
| Basic Search            | 14.14 percent                 | 2             | .28               | 1.00684          | .00168               | 6.02    | .01010   |
| Add to Basket           | 10.47 percent                 | 3             | .31               | 1.11827          | .00186               | 18.21   | .03394   |
| Registration & Checkout | 6.90 percent                  | 13            | .90               | 3.19353          | .00532               | 6.36    | .03385   |
| Login & Checkout        | 7 percent                     | 10            | .70               | 2.49217          | .00415               | 4.96    | .02060   |
| TOTAL                   | 100 percent                   | N/A           | 2.81              | 10.00            |                      | N/A     | .11601   |
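The Table 9-5 arithmetic can be reproduced in a few lines. This is an illustrative sketch, not the book's spreadsheet: it assumes the 10-minute (600-second) session from Step 1 and the 10 operations per session implied by the User Profile Ops column total, and it uses the rounded cost-per-operation values.

```python
SESSION_SECONDS = 600   # 10-minute average session length (Step 1)
OPS_PER_SESSION = 10    # implied by the User Profile Ops column total

# operation: (mean hit ratio, minimum ASP pages, cost/operation in Mcycles)
profile = {
    "Browse for Products":     (0.6149, 1,  4.80),
    "Basic Search":            (0.1414, 2,  6.02),
    "Add to Basket":           (0.1047, 3, 18.21),
    "Registration & Checkout": (0.0690, 13, 6.36),
    "Login & Checkout":        (0.0700, 10, 4.96),
}

def total_cost_per_sec(profile):
    # Normalize: weight each hit ratio by the pages the operation requires.
    norm = {op: ratio * pages for op, (ratio, pages, _) in profile.items()}
    total_norm = sum(norm.values())
    total = 0.0
    for op, (_, _, cost_per_op) in profile.items():
        ops_per_session = norm[op] / total_norm * OPS_PER_SESSION
        ops_per_sec = ops_per_session / SESSION_SECONDS
        total += ops_per_sec * cost_per_op
    return total

print(f"{total_cost_per_sec(profile):.5f} Mcycles per user per second")
# prints: 0.11601 Mcycles per user per second
```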

Interpreting the Cost per User Operation Numbers

The key number in Table 9-5 is the total operational cost per second (.11601 Mcycles per user), which represents the total cost of the mean user profile. This number reflects the cost of an average user executing operations in the manner described by our mean user profile, and it will be used to estimate site capacity in Step 4 of the TCA methodology. Table 9-5 not only indicates the total cost of an average user but also provides clues about where to begin optimizing operations for better performance, which will lead to improved site scalability. According to Table 9-5, the Add to Basket and Registration & Checkout operations have the highest cost per user, .03394 and .03385 respectively. In the case of the Add to Basket operation, the high cost per user operation per second partially results from a relatively high cost per operation of 18.21 Mcycles. This indicates that the development team should focus on reducing the server CPU resource cost needed to complete this operation in order to increase overall site scalability.

In the case of the Registration & Checkout operation, the high cost per user operation per second results from the relatively high number of ASP pages (13) needed to complete the operation. The high page count means the operation's cost (6.36 Mcycles) is weighted more heavily than the other operational costs. Development must focus on reducing the cost of this operation or on reducing the number of pages needed to complete Registration & Checkout. Either approach will bring the cost per user per second for Registration & Checkout down, thereby increasing scalability.

Performing TCA What-If Scenarios

At this stage in the methodology, we can perform the what-if scenarios we discussed earlier. If you want to determine the impact that increasing the number of users performing the Add to Basket operation has on your site's overall transaction cost per second, and ultimately on your site's capacity, simply adjust your user profile by 2 standard deviations or more as needed.

In Step 1, we calculated the standard deviation for the Add to Basket operation at 1.5 percent. Therefore, a 2 standard deviation increase in this operation's hit ratio means Add to Basket now represents 10.47 percent + (2 * 1.5 percent) = 13.47 percent of our Web application's traffic. When we increase the Add to Basket weighting, we must correspondingly decrease the weightings of other operations. For the purpose of this example, we assumed that users increasing their Add to Basket activity were finding what they were looking for and subsequently browsing for fewer IBuySpy products. Hence, we reduced the browse hit ratio by 3 percent and correspondingly increased the Add to Basket hit ratio. The total effect on cost of increasing the weight of our highest cost operation is detailed in Table 9-6:

Table 9-6. IBuySpy Cost Per User Operation/Sec (Mcycles)

| Ops                     | Mean Hit Ratio (User Profile) | Min ASP Pages | Norm User Profile | User Profile Ops | User Profile Ops/Sec | Cost/Op | Cost/User/Sec |
|-------------------------|-------------------------------|---------------|-------------------|------------------|----------------------|---------|---------------|
| Browse for Products     | 58.49 percent                 | 1             | .58               | 2.03883          | .00340               | 4.80    | .01632        |
| Basic Search            | 14.14 percent                 | 2             | .28               | 1.00684          | .00168               | 6.02    | .0101         |
| Add to Basket           | 13.47 percent                 | 3             | .31               | 1.40860          | .00235               | 18.21   | .04275        |
| Registration & Checkout | 6.90 percent                  | 13            | .90               | 3.19353          | .00532               | 6.36    | .03385        |
| Login & Checkout        | 7 percent                     | 10            | .70               | 2.49217          | .00415               | 4.96    | .02060        |
| TOTAL                   | 100 percent                   | N/A           | 2.81              | 10               |                      | N/A     | .12227        |

As detailed in Table 9-6, increasing the hit ratio for the Add to Basket scenario has resulted in a slight increase in our overall cost per user operation per second from .11601 Mcycles to .12227 Mcycles. At first glance this does not sound like much, but notice how much this seemingly small site traffic change affects our overall site capacity in Step 4.
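Rerunning the same cost calculation with the shifted weightings reproduces the new total. As before, this is an illustrative sketch assuming a 600-second session, 10 operations per session, and the rounded per-operation costs:

```python
SESSION_SECONDS, OPS_PER_SESSION = 600, 10

# operation: (adjusted hit ratio, minimum ASP pages, cost/operation in Mcycles)
what_if_profile = {
    "Browse for Products":     (0.5849, 1,  4.80),   # browse reduced by 3 percent
    "Basic Search":            (0.1414, 2,  6.02),
    "Add to Basket":           (0.1347, 3, 18.21),   # up 2 standard deviations
    "Registration & Checkout": (0.0690, 13, 6.36),
    "Login & Checkout":        (0.0700, 10, 4.96),
}

# Normalize by pages, spread the session's operations across the weights,
# and sum the per-second costs.
norm = {op: ratio * pages for op, (ratio, pages, _) in what_if_profile.items()}
total_norm = sum(norm.values())
cost_per_user_sec = sum(
    norm[op] / total_norm * OPS_PER_SESSION / SESSION_SECONDS * cost
    for op, (_, _, cost) in what_if_profile.items()
)
print(f"{cost_per_user_sec:.5f}")  # prints: 0.12227
```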

Step 4: Estimate Site Capacity

Step 4 of the TCA process calculates hardware resource costs (in our case, CPU utilization) relative to concurrent user levels. You can start by building a table similar to Table 9-7. In this table, we have taken the cost per user operation per second value that we calculated in Step 3 and multiplied it by the number of concurrent users we anticipate accessing our Web application.

Mean User Profile Capacity Estimates

When calculating maximum capacity, assign an allowable maximum level for the limiting server resource (CPU, disk, or memory). In this case, we are using CPU utilization as the finite server resource. As we mentioned in Step 2, it is not a good idea to run your server at 100 percent CPU utilization. To estimate site capacity, we will assume 85 percent CPU utilization as the maximum for a two-processor IIS server. This leaves us with a maximum available CPU resource of 2000 Mcycles * 85 percent = 1700 Mcycles. Table 9-7 illustrates the capacity of our two-processor, 1000-MHz IIS server in terms of concurrent users:

Table 9-7 shows that the maximum capacity of our IBuySpy IIS server is 14,654 users, at a cost of 1700.01 Mcycles. This is the maximum limit, since the two processors, capped at 85 percent utilization, provide the site with a budget of 1700 Mcycles of capacity. To attain more capacity with the existing hardware, we will need to focus on reducing server CPU costs for the higher cost operations, which were Add to Basket and Register & Checkout.
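The capacity estimate follows directly from the Mcycle budget and the per-user cost. A minimal sketch (because the per-user costs here are rounded to five decimal places, the integer result can land within a user of the chapter's table values):

```python
import math

def max_concurrent_users(n_cpus, cpu_mhz, max_util, cost_per_user_sec):
    """Largest concurrent user count whose total cost fits the CPU budget."""
    budget = n_cpus * cpu_mhz * max_util      # available Mcycles per second
    return math.floor(budget / cost_per_user_sec)

# Dual 1000-MHz server capped at 85 percent gives a 1700-Mcycle budget.
mean_capacity = max_concurrent_users(2, 1000, 0.85, 0.11601)      # ~14,654 users
what_if_capacity = max_concurrent_users(2, 1000, 0.85, 0.12227)   # ~13,903 users
```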

Table 9-7. IBuySpy Capacity Estimates

| Concurrent Users | Cost (Mcycles)                             |
|------------------|--------------------------------------------|
| 14300            | 1658.94                                    |
| 14400            | 1670.54                                    |
| 14500            | 1682.15                                    |
| 14600            | 1693.75                                    |
| 14654            | 1700.01 (Maximum Available CPU Resources)  |
| 14700            | 1705.35                                    |
| 14800            | 1716.95                                    |

Increased Add to Basket Traffic What-If Scenario Capacity Estimates

In Step 3, we illustrated how increasing the average user's activity for the Add to Basket operation by two standard deviations affected our overall cost per user operation per second. The result was an increase in cost from .11601 Mcycles to .12227 Mcycles. How much does this small 3 percent change in traffic affect the overall concurrent user capacity of IBuySpy? Table 9-8 illustrates the capacity estimates with the higher average user cost:

Table 9-8. IBuySpy Site Capacity Estimates

| Concurrent Users | Cost (Mcycles)                             |
|------------------|--------------------------------------------|
| 13500            | 1650.65                                    |
| 13600            | 1662.87                                    |
| 13700            | 1675.10                                    |
| 13800            | 1687.33                                    |
| 13903            | 1700.04 (Maximum Available CPU Resources)  |
| 14000            | 1711.78                                    |
| 14100            | 1724.01                                    |

As illustrated above, increasing the average user's Add to Basket activity by two standard deviations, or 3 percent, has dropped our IIS server's maximum concurrent user level from 14,654 to 13,903 users. This is a significant drop in our server's maximum concurrent user capacity from only a small shift in the allocation of average user activity, and it took only a minor adjustment to our model to obtain this data.

The main point of this what-if scenario is to illustrate the time-saving value of TCA modeling. Now that we have completed our TCA model, we can reallocate site user traffic in an infinite number of distributions and immediately determine the probable effect on our overall site capacity. When you compare the time it takes to complete one TCA to the time it takes to run discrete stress tests to produce the same capacity data, you can appreciate the value of a TCA. The TCA provides the ability to predict site capacity for an infinite number of site traffic scenarios.

Step 5: Verify Site Capacity

The last step in the TCA process is to verify site capacity by running stress tests that reflect the user profile properties, including traffic distribution and session length. You then compare the resource utilization costs you obtain in these tests against those predicted by the TCA model. The actual values for tested resource utilizations and the TCA model resource predictions should fall within an acceptable margin of error. The goal in this step is to confirm the TCA model site capacity predictions. It is not necessary to run a verification test at every load level. After all, this would defeat the purpose of the TCA in predicting resource usage at various concurrent user levels. Rather, a single test will allow you to gain a desired level of confidence in the TCA model predictions.

To confirm our IBuySpy TCA model, we ran five verification tests across a wide range of concurrent user levels. The results for our IBuySpy verification tests were obtained using the same stress test scripts we created in Step 2. However, all scripts were run simultaneously and distributed as defined in our user profile. Additionally, since our model used a 10-minute user session length, our verification scripts all contained 10-minute sleep times. This means that each script takes 10 minutes to complete the IBuySpy operations, just as the average user would. Refer to Chapter 3 for details on sleep times in stress scripts. Figure 9-2 illustrates our IBuySpy verification test result.

When analyzing the verification test results, we check that the percent difference between the megacycles predicted and the megacycles obtained does not exceed 10 percent. In addition, we verify that our ASP scripts are executing within our 2-second maximum latency parameter. According to the results in Table 9-9, each of the test runs falls well within our 10 percent margin of error. When running verification tests, remember to keep an eye out for an average ASP latency greater than 2 seconds. As indicated in Table 9-9, ASP latency was not a limiting throughput factor during our verification tests.
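The margin-of-error check is straightforward to script. A small sketch, using the predicted and obtained Mcycle values from Table 9-9:

```python
def within_margin(predicted, obtained, margin=0.10):
    """True when the measured cost is within +/- margin of the prediction."""
    return abs(predicted - obtained) / predicted <= margin

# (Mcycles predicted, Mcycles obtained) for each verification run in Table 9-9.
runs = [(11.60, 11.54), (23.2, 23.8), (116.01, 120.19), (1160, 1147), (1699, 1661)]
all_within = all(within_margin(p, o) for p, o in runs)
print(all_within)  # prints: True
```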


Figure 9-2. Verification test results

Our verification tests confirm the costs predicted by the TCA model and increase our confidence in the TCA estimate s accuracy. We can now use the TCA to predict hardware resource requirements for various traffic distribution scenarios.

Table 9-9. IBuySpy Verification

| Concurrent Users | Mcycles Predicted | Mcycles Obtained | ASP Wait Time (ms) | ASP Execution Time (ms) | ASP Latency (ms) |
|------------------|-------------------|------------------|--------------------|-------------------------|------------------|
| 100              | 11.60             | 11.54            | .238               | 4.063                   | 4.301            |
| 200              | 23.2              | 23.8             | .769               | 5.094                   | 5.863            |
| 1000             | 116.01            | 120.19           | 1.9                | 6.09                    | 7.99             |
| 10000            | 1160              | 1147             | 27.203             | 52.441                  | 79.644           |
| 14653            | 1699              | 1661             | 39.98              | 72.47                   | 111.45           |



Performance Testing Microsoft .NET Web Applications
ISBN: 596157134
Year: 2002
Pages: 67