As illustrated in Figure 10.4, defining the necessary level of performance is a critical first step in the validation process. After the requirements have been defined, a set of tests for measuring performance can be identified. These tests should be conducted at various points during the development process to ensure that performance is within striking distance of requirements. As the application approaches completion, performance can be validated in the context of a test environment that resembles the one in which the application will ultimately be deployed. If the tests indicate that performance requirements are not being met, a series of controlled experiments should be conducted to locate performance bottlenecks. The bottlenecks can then be removed and testing can be repeated until the requirements are met.
Figure 10.4 The performance validation process
Performance requirements should be defined up front, before development and debugging begin. To define a good performance requirement, project constraints must be identified, services to be performed by the application must be determined, and the load on the application must be specified. This information can then be used to select appropriate metrics and determine specific performance goals that must be achieved.
Some aspects of a project cannot be changed to improve performance. There may be constraints on the schedule or on the choice of development tools or technologies. For example, the application might need to be deployed by a certain date to meet contractual obligations. The development team might have Visual Basic expertise but no expertise in Visual C++, making it impractical to develop components using Visual C++. Hardware constraints might be a factor, particularly for user workstations. Whatever the constraints are, they must be documented. These are the factors to be held constant during performance tuning. If satisfactory performance cannot be achieved within these constraints, they may have to be revisited by management or the customer.
The aspects of the project that are not constrained can be modified during the tuning process to see whether performance can be improved. For example, can components be implemented in a different language? Can different data access technologies be used? Are transactions really needed? Can computers be added to the application topology? These questions can help identify ways to remove bottlenecks in the system.
Applications typically provide one or more services that correspond to use cases and usage scenarios. Usually each scenario can be described as a set of transactions. Even if transactions are not involved, a sequence of interactions with the user takes place for each scenario. The semantics of each performance-sensitive scenario (what the user does and what the application service does in response) should be defined precisely, including how the service accesses databases and other system services. These definitions drive the tests that measure performance.
In addition to defining which services should be measured, how often these services are used must be specified. Accurate estimates of the usage of various application services help create tests that closely mimic the expected usage of the system, improving the accuracy of performance test results.
A common way to measure the load on an application is to identify the number of clients that will use the application. A related measure is think time, which is the elapsed time between receiving a reply to one request and submitting the next request. For example, if it takes about 60 seconds for a user to enter all the information required for a Web-based time-entry form, 60 seconds is the think time for the Time Entry scenario.
Load variance over time must be considered. For some applications, the load remains fairly constant, while for other applications, the load may vary. For example, a payment processing application will have a heavier load during the week when payments are due. An insurance claims application will have a heavier load when a natural disaster, such as a hurricane or tornado, occurs. A help desk application will have a heavier load in the month following the release of a software upgrade. Information about how the load varies over time can be used to determine the peak and average system loads. Performance requirements can then be based on either or both of these measures.
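As a minimal sketch of how peak and average load might be derived from usage data, the following assumes the load has been recorded as request counts per hour. The function name and the hourly granularity are illustrative assumptions, not something the text prescribes.

```python
from statistics import mean


def load_summary(hourly_requests: list[int]) -> tuple[float, float]:
    """Return (average, peak) load in requests per second.

    `hourly_requests` holds one request count per hour of observation
    (an assumed input format).
    """
    if not hourly_requests:
        raise ValueError("at least one hourly sample is required")
    per_second = [count / 3600 for count in hourly_requests]
    return mean(per_second), max(per_second)


# Example: a quiet hour, a busy hour, and another quiet hour.
average, peak = load_summary([3600, 7200, 3600])
print(f"average={average:.2f} req/s, peak={peak:.2f} req/s")
```

Whether requirements are then written against the average, the peak, or both remains a project decision.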
When constraints, services, and load have been identified, the specific performance goals, or requirements, for the application need to be defined. The first step is to select the specific metrics to be measured. Common metrics include:
After the appropriate metrics have been selected, the required values for those metrics must be specified. These values should be realistic measures of the necessary performance of the application; "as fast as possible" is almost never a good goal. A simple way to determine the transactions per second (TPS) requirement is to divide the number of clients by the think time. For example, if on average an application needs to support 1200 simultaneous clients with a 60-second think time, a value of 20 TPS (1200/60) can be specified as the average load. Response-time measures should take user expectations into account. For example, suppose that users who have submitted a form will wait no longer than 5 seconds before assuming that the application is not working correctly. The response time requirement would then be specified as 95 percent response within 5 seconds over a 28.8 Kbps modem connection (the lowest common denominator).
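The arithmetic above can be sketched in a few lines. The function names are illustrative, and the nearest-rank percentile is one common convention assumed here; the text does not say how the 95 percent figure should be computed.

```python
import math


def required_tps(clients: int, think_time_s: float) -> float:
    """Average load in transactions per second: clients / think time."""
    if think_time_s <= 0:
        raise ValueError("think time must be positive")
    return clients / think_time_s


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of `samples`, with pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_response_goal(samples: list[float],
                        pct: float = 95.0,
                        limit_s: float = 5.0) -> bool:
    """True if `pct` percent of responses arrived within `limit_s` seconds."""
    return percentile(samples, pct) <= limit_s


print(required_tps(1200, 60))  # 20.0
```

Given a list of measured response times in seconds, `meets_response_goal(times)` checks the "95 percent within 5 seconds" requirement from the example.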
After specific performance requirements have been identified, testing can begin to determine whether the application meets those requirements. It is important to eliminate as many variables as possible from the tests. For example, bugs in the code can create the appearance of a performance problem. To accurately compare the results from different performance test passes, the application must be working correctly. It is especially important to retest application functionality if modifications have been made to the implementation of a component of the application as part of the tuning process. The application must pass its functional tests before its performance is tested. In addition to application changes, unexpected changes can occur in hardware, network traffic, software configuration, system services, and so on. Both types of change must be controlled.
To correctly tune performance, accurate and complete records of each test pass must be maintained. Records should include:
These records not only help determine whether performance goals have been met, but also help identify the potential causes of performance problems down the road.
Exactly the same set of performance tests should be run during each test pass; otherwise, it is not possible to discern whether a difference in results is due to changes in the tests or to changes in the application. Automating as much of the performance test set as possible helps eliminate operator differences.
During performance testing, values for the metrics specified in the performance goals are measured and recorded. The think time, transaction mix, and any other load parameters specified in the requirements must also be honored. Within these constraints, the testing should be as realistic as possible. For example, the application should be tested to determine how it performs when many clients are accessing it simultaneously. Multiple clients can be simulated in a reproducible manner using a multi-threaded test application, in which each thread represents one client. If the application accesses a database, the database should contain a realistic number of records, and the test should use random (but valid) values for data entry. If the test database is too small, the effects of caching in the database server will yield unrealistic test results. The results might also be unrealistic if data is entered or accessed in unrealistic ways. For example, it's unlikely that new data would be created in alphabetical order on the primary key.
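The idea of one thread per simulated client can be sketched as follows. This is a simplified illustration, not the harness from the MTS Performance Toolkit: `transaction` is a hypothetical callable standing in for one request to the application, and the key range of one million is an assumed size for the test database.

```python
import random
import threading
import time


def run_clients(transaction, num_clients, requests_per_client,
                think_time_s, seed=0):
    """Simulate `num_clients` concurrent clients, one thread per client.

    Returns (response_times, observed_tps).
    """
    response_times = []
    lock = threading.Lock()

    def client(client_id):
        # A per-client seeded generator keeps the random data reproducible.
        rng = random.Random(seed + client_id)
        for _ in range(requests_per_client):
            record_key = rng.randrange(1_000_000)  # random but valid key (assumed range)
            start = time.perf_counter()
            transaction(record_key)
            elapsed = time.perf_counter() - start
            with lock:
                response_times.append(elapsed)
            time.sleep(think_time_s)  # simulated user think time

    threads = [threading.Thread(target=client, args=(i,))
               for i in range(num_clients)]
    started = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    duration = time.perf_counter() - started
    return response_times, len(response_times) / duration
```

For example, `run_clients(submit_time_entry, 50, 20, 60.0)` would drive a hypothetical `submit_time_entry` function with 50 clients, each issuing 20 requests with a 60-second think time.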
The MTS Performance Toolkit provides sample test harnesses that can be used as models for building automated test harnesses for applications. These sample test harnesses demonstrate how to collect TPS and response-time metrics, as well as how to simulate multiple clients using multiple threads. Usually, test harnesses need to accept user-specified input parameters, such as the transaction mix, think time, number of clients, and so on. However, the rules for creating realistic random data will probably be encoded within the test harness itself.
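One way to model these user-specified parameters is shown below, as a sketch under stated assumptions. The `HarnessParams` class, its field names, and the representation of the transaction mix as relative weights are all hypothetical; the toolkit's sample harnesses may be organized quite differently.

```python
import random
from dataclasses import dataclass


@dataclass
class HarnessParams:
    """User-specified inputs for one test pass (hypothetical structure)."""
    num_clients: int
    think_time_s: float
    transaction_mix: dict[str, float]  # scenario name -> relative weight


def pick_scenario(params: HarnessParams, rng: random.Random) -> str:
    """Choose the next scenario to run according to the transaction mix."""
    names = list(params.transaction_mix)
    weights = [params.transaction_mix[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Example: three time-entry transactions for every report request.
params = HarnessParams(num_clients=1200, think_time_s=60.0,
                       transaction_mix={"time_entry": 3, "report": 1})
rng = random.Random(42)  # fixed seed so the sequence is repeatable
print(pick_scenario(params, rng))
```

Seeding the generator keeps the sequence of scenarios identical from one test pass to the next, which matters when results are compared across passes.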
After a test harness has been created to drive the application, all the invariant conditions for running the tests should be documented. At the very least, these conditions should include the input parameters required to run the test harness. How to set up a "clean" database for running the test—that is, a database that does not contain changes made by a previous test pass—should also be documented, as well as the computer configurations used for the test. Usually, the test harness should be run on a separate computer from the MTS application, because this setup more closely approximates a production environment.
After performance goals have been defined and performance tests have been developed, the tests should be run once to establish a baseline. The more closely the certification environment resembles the production environment, the greater the likelihood that the application will perform acceptably after deployment. Therefore, it's important to have a realistic certification environment right from the beginning.
With luck, the baseline performance will meet performance goals, and the application won't need any tuning. More likely, the baseline performance will not be satisfactory. However, documenting the initial test environment and the baseline results provides a solid foundation for tuning efforts.
If performance requirements are not met after the application is scaled out (as discussed in Chapter 2), or if scaling out is not an option, data from the test results should be analyzed to identify bottlenecks in the system and form a hypothesis about their cause. Sometimes the test data is not sufficient to form a hypothesis, and additional tests must be run using other performance-monitoring tools to isolate the cause of the bottleneck. Commonly used tools for monitoring the performance of MTS-based applications include the following:
Figure 10.5 The Performance tab of the Task Manager dialog box
Figure 10.6 MTS Explorer Transaction Statistics pane
Figure 10.7 Microsoft Windows Performance Monitor
Visual Studio 6.0 provides extensive documentation on using the Visual Studio Analyzer.
Although MTS does not currently provide any performance counters per se, counters for devices such as memory, disks, and the CPU can be used to identify many bottlenecks. System applications such as SQL Server also provide performance counters that can help identify bottlenecks.
The most common performance problems in MTS applications are due to insufficient RAM, insufficient processor capacity, disk access bottlenecks, and database hotspots. Table 10.1 describes a set of performance counters that can be used to identify these common bottlenecks.
Table 10.1 Performance counters for identifying common bottlenecks
| Performance counter | Description | Common bottleneck |
|---|---|---|
| Memory: Page Faults/Second | Number of page faults in the processor | Sustained page fault rates over 5/sec indicate that the system has insufficient RAM. |
| Physical Disk: % Disk Time | Percentage of elapsed time selected disk drive is busy servicing read or write requests | Percentages over 85%, in conjunction with Average Disk Queue Length over 2, might indicate disk bottlenecks, if insufficient RAM is not causing the disk activity. |
| Physical Disk: Average Disk Queue Length | Average number of read and write requests queued up during the sampling interval | Queue lengths over 2, in conjunction with % Disk Time over 85%, might indicate disk bottlenecks, if insufficient RAM is not causing the disk activity. |
| System: % Total Processor Time | Percentage of time processors are busy doing useful work | Percentages consistently over 80% indicate CPU bottlenecks. |
| System: Processor Queue Length | Instantaneous count of the number of threads queued up waiting for processor cycles | Queue lengths greater than 2 generally indicate processor congestion. |
| SQL Server: Cache Hit Ratio | Percentage of time that SQL Server finds data in its cache | Percentages less than 80% indicate that insufficient RAM has been allocated to SQL Server. |
| SQL Server-Locks: Total Blocking Locks | Number of locks blocking other processes | High counts can indicate database hot spots. |
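The threshold rules in Table 10.1 can be encoded as a simple check, sketched below. The function assumes the counter values have already been collected and averaged over a sustained interval, keyed by the counter names from the table; it does not read Performance Monitor itself. The blocking-locks counter is omitted because the table gives no numeric threshold for it.

```python
def diagnose(avg: dict[str, float]) -> list[str]:
    """Map averaged counter values to the likely bottlenecks in Table 10.1."""
    findings = []
    low_ram = avg.get("Memory: Page Faults/Second", 0) > 5
    if low_ram:
        findings.append("insufficient RAM")

    disk_busy = avg.get("Physical Disk: % Disk Time", 0) > 85
    disk_queue = avg.get("Physical Disk: Average Disk Queue Length", 0) > 2
    # Disk activity caused by paging is attributed to RAM, not to the disk.
    if disk_busy and disk_queue and not low_ram:
        findings.append("disk bottleneck")

    if avg.get("System: % Total Processor Time", 0) > 80:
        findings.append("CPU bottleneck")
    if avg.get("System: Processor Queue Length", 0) > 2:
        findings.append("processor congestion")
    if avg.get("SQL Server: Cache Hit Ratio", 100) < 80:
        findings.append("insufficient RAM allocated to SQL Server")
    return findings


print(diagnose({"Physical Disk: % Disk Time": 92,
                "Physical Disk: Average Disk Queue Length": 3.5}))
# ['disk bottleneck']
```

The output is only a starting hypothesis; the table itself hedges most of these rules with "might indicate" and "generally".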
Many other performance counters are available. For additional information, see the Windows NT Performance Monitor and the Microsoft Windows NT Workstation 4.0 Resource Kit.
After data has been collected using various performance monitoring tools, any bottlenecks should be pinpointed and possible causes identified. Solutions need to be devised and implemented based on hypotheses about the causes of the bottlenecks. Sometimes this process is easy, but often the performance data does not give a clear indication of how the problem might be fixed. In this case, experiments must be conducted that change one aspect of the application or test environment at a time so that the impact of the change on performance can be observed. If the change has no impact or makes performance worse, that change must be undone and another solution tried.
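The one-change-at-a-time procedure can be sketched as a loop. Everything here is illustrative: `measure` is a hypothetical callable that runs the full performance test set against a configuration and returns the measured TPS, and the configuration keys in the usage note are invented for the example.

```python
def tune(baseline_config, candidate_changes, measure):
    """Apply candidate changes one at a time, keeping only improvements.

    `candidate_changes` is a sequence of (setting, new_value) pairs.
    Returns (best_config, best_tps).
    """
    best_config = dict(baseline_config)
    best_tps = measure(best_config)  # baseline test pass
    for setting, new_value in candidate_changes:
        trial = {**best_config, setting: new_value}
        tps = measure(trial)
        if tps > best_tps:
            best_config, best_tps = trial, tps
        # Otherwise the trial is dropped, which undoes the change.
    return best_config, best_tps
```

For instance, `tune({"protocol": "named_pipes"}, [("protocol", "tcp")], measure)` would keep the TCP/IP setting only if the measured throughput improved. Because each trial differs from the current best configuration in exactly one setting, any change in the result can be attributed to that setting.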
Developers who have experience with performance tuning begin to see common problems and solutions. The performance group on the Microsoft COM team has identified several bottlenecks that are commonly seen in MTS-based applications. These bottlenecks and some of the experiments done to identify them are described in the MTS Performance Toolkit. In this section, we look at some of the more common bottlenecks and how to work around them.
Both the communication protocol and the login credentials used for SQL Server connections can be tuned to provide better performance.
The default client protocol for SQL Server is the named pipe communication protocol. However, better performance and higher scalability can be achieved using TCP/IP as the client protocol. To use this protocol, TCP/IP Sockets must first be enabled in SQL Server using the SQL Server Setup program. Then for each system that runs components that access SQL Server, the SQL Client Configuration Utility can be used to specify TCP/IP Sockets as the default network on the Net Library tab. SQL Server must be stopped and restarted before the changes will take effect.
System Administrator Login
When the system administrator login is used to access SQL Server, the master database is written to in every transaction. However, the application probably does not use the master database. To avoid the overhead of accessing the master database, a specific login can be created for the application, with the database accessed by the application as the default database. This login can then be used for all data source names (DSNs) or connection strings in the application.
Accessing data is an expensive process, and ways can almost always be found to improve data access performance. This section describes a few of the data access "gotchas" to look out for.
File DSNs provide an easy way for developers to define the database they need to access. However, File DSNs have very poor performance because the system must continually open and read the .dsn file to determine the database connection parameters. Substantially better performance can be achieved using the following alternatives:
ADO and OLE DB Performance
Early versions of ADO and OLE DB scale poorly in multi-threaded clients that use connection pooling, particularly on multiple-processor computers. This shortcoming can greatly affect the performance of MTS components and ASP pages. Microsoft Data Access Components (MDAC) 2.0 and later versions contain fixes for this problem, so these versions should be used if possible. Other possible workarounds are:
Late-Binding to Data Access Components
As we've seen in earlier chapters, late binding to COM components is inherently slower than vtable binding or early binding, because late binding must make multiple calls through the IDispatch interface for each method call in the client. In addition, packaging method call parameters into the data structures required by the IDispatch Invoke method has a cost in terms of performance. If supported by the development tool being used, early binding or vtable binding improves application performance.
MS DTC Log Device
The MS DTC writes log records for every transaction. If the log is stored on a slow hard drive, it can create a bottleneck. The performance of an application that uses transactions can easily be improved by configuring the MS DTC to store its log on a dedicated, fast drive with a high-speed controller.
Multiple MS DTCs
By default, each server running SQL Server and MTS uses a local MS DTC. The overhead of communicating between all the MS DTCs can have an impact on an application's performance. Configuring the system to use a single MS DTC reduces this bottleneck. A remote MS DTC can be set up by stopping the MS DTC service, removing the MS DTC service from the local computer, and then using the MS DTC Control Panel applet to point to the computer on which the MS DTC is running.
The MTS Performance Toolkit provides a more complete list of potential bottlenecks. We'll look at just two more examples: accessing the system registry and dynamic memory allocation.
Accessing the System Registry
Reading the system registry is an expensive process. If the registry is used to store configuration information for the application, components and applications should read the information only once. Components can then store the information in the Shared Property Manager (SPM) so that it is readily accessible to all objects in the process. Applications can store the relevant information in local variables.
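The read-once pattern can be illustrated in general terms. The sketch below is not the Shared Property Manager and does not touch the Windows registry: `read_settings_from_store` is a hypothetical stand-in for the expensive read, and a process-wide cache plays the role that the SPM plays for MTS components.

```python
import functools
from types import MappingProxyType

read_count = 0  # counts expensive reads, for demonstration only


def read_settings_from_store() -> dict[str, str]:
    """Hypothetical stand-in for an expensive configuration read."""
    global read_count
    read_count += 1
    return {"ConnectionString": "placeholder"}


@functools.lru_cache(maxsize=None)
def get_settings() -> MappingProxyType:
    """Read the configuration once and share a read-only view afterwards."""
    return MappingProxyType(read_settings_from_store())


first = get_settings()
second = get_settings()
print(read_count)  # 1
```

Every caller after the first receives the cached read-only view, so the expensive read happens only once per process.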
Dynamic Memory Allocation
Dynamic memory allocation is another expensive process that should be eliminated if possible. Allocation is especially expensive when heaps are created and destroyed. Microsoft Visual Basic 5.0 is particularly troublesome because it releases and re-creates project heap space even when the heap space can be reused.