Indy: A Performance Technology Infrastructure

In this chapter, we'll use examples from a Microsoft performance modeling project called Indy to compare the results available through TCA (from the previous chapter) with those available through more advanced forms of performance modeling. Indy is designed to tackle the problem of performance engineering by creating a performance technology infrastructure.

Performance modeling is a multidimensional space, with different modeling tools having different requirements for level of detail, modeling technique, and target audience. No one tool or static application can meet all these needs. However, a toolkit approach allows construction of specialized tools from a basic infrastructure combined with customized components. This approach also permits an infinite range of complexity in those tools and models. Thus, a simple question can be answered quickly, or a critical system component can be tested in great detail.

Indy Concepts

Indy uses a simulation-based approach to performance modeling with analytical shortcuts to improve its performance. That is, it uses internal models of each of the major devices of the system being modeled, and simulates how these devices behave and interact. After a simulated run, the performance of each device can be examined in detail.

Indy comes with a predefined library of device models. Each model can in turn have sub-models. Thus, the Indy model of a server farm might consist of various server models connected by a network model. The server models can in turn contain models of their CPUs, disks, and network interfaces. Instances of these hardware models can then be arranged in a system topology that matches the hardware configuration of the real server farm.

Given a model of the system configuration, we also need to model the input load that it will experience. The input load is referred to as its workload. For example, the Indy model of an e-commerce site might have a workload defined in terms of how often it receives requests for the various pages and actions on the site.

Finally, we must model how the various components in the system will react to the workload: that is, we must define the behavior of the system. Indy provides a range of ways to define this behavior, but here we will concentrate on an XML-based scripting language called Transaction Modeling Language (TML). A TML script defines the transactions that a Web site will support in terms of their component actions (such as computation, disk operations, or network traffic), and on what devices these actions will run.

Indy Architecture

Figure 10-1 illustrates the basic architecture of Indy and its components.

Figure 10-1. Indy architecture

Kernel

At the heart of Indy is the kernel, which interacts with and controls the other components via well-defined APIs. The kernel must be present in all tools produced with the Indy toolkit. It includes the central evaluation engine that is used to produce simulation results. As noted above, the current Indy kernel uses an event-based evaluation engine that combines direct simulation with some hybrid shortcut techniques to improve performance. However, for other purposes a different evaluation engine could be used: for example, one that uses analytical or statistical modeling techniques.
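
As a rough illustration of what an event-based evaluation engine does (this is a generic sketch, not Indy's actual kernel API), the core loop can be reduced to a priority queue of timestamped events processed in order:

```python
import heapq

class EventKernel:
    """Minimal sketch of an event-based evaluation engine: events are
    (time, sequence, handler) tuples processed in timestamp order."""

    def __init__(self):
        self._queue = []
        self._seq = 0        # tie-breaker for events at the same time
        self.now = 0.0

    def schedule(self, delay, handler):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, handler = heapq.heappop(self._queue)
            handler(self)

# Example: two independent jobs on a simulated device.
finish_times = []

def job(duration):
    def handler(kernel):
        finish_times.append(kernel.now + duration)
    return handler

kernel = EventKernel()
kernel.schedule(0.0, job(2.0))   # job arriving at t=0, taking 2.0 time units
kernel.schedule(1.0, job(3.0))   # job arriving at t=1, taking 3.0 time units
kernel.run()
print(finish_times)              # [2.0, 4.0]
```

Hybrid shortcuts of the kind described above would replace runs of fine-grained events with a single analytically computed result, reducing the number of events that pass through this loop.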

Hardware DLLs

The hardware DLLs implement the models of the individual system devices, such as CPUs and disks. A library of models is available to choose from, and additional models can be easily added. Multiple models might be available for a particular device, differing in the level of detail they go into to model performance. A more detailed model can give more accurate results and allow more performance effects to be considered, but may in turn require more information at run time and may take longer to simulate.

Workload DLLs

The workload DLLs are responsible for defining and injecting events into the kernel representing the workload for a particular simulated run. As shown, a variety of workload DLLs can be used. In this chapter we describe the use of a workload DLL that interprets TML scripts to create a workload. Alternate workload DLLs could be used to interpret UML diagrams or produce a customized workload for a specialized tool.

Interface Definitions

Information about the configuration of other components is stored by the kernel in a metadirectory. For example, different versions of the same basic disk can use the same hardware model, but with different performance characteristics. Similarly, performance characteristics of a workload, such as how often a transaction occurs or how much network traffic it causes, can be varied without having to recode the workload DLL or TML script.
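
The metadirectory can be pictured as a lookup table of named device templates. The sketch below is purely illustrative: the template names echo those used later in the topology script, but the performance characteristics are invented numbers, and Indy's real metadirectory format is not shown.

```python
# Hypothetical sketch of a metadirectory: hardware templates keyed by
# name, so a model can swap device characteristics without code changes.
# All characteristic values below are invented for illustration.
METADIRECTORY = {
    "CpuModel:Pentium 1000MHz": {"clock_mhz": 1000},
    "CpuModel:Pentium 450MHz": {"clock_mhz": 450},
    "DiskModel:HP NetRAID-4M": {"seek_ms": 5.0, "transfer_mb_s": 40.0},
}

def compute_time_ms(template_name, mcycles):
    """Milliseconds to execute `mcycles` Mcycles on the named CPU template."""
    clock = METADIRECTORY[template_name]["clock_mhz"]
    return mcycles * 1000.0 / clock   # MHz == Mcycles per second

# The same workload costs more time on a slower CPU template:
print(compute_time_ms("CpuModel:Pentium 1000MHz", 116.0))  # 116.0 ms
print(compute_time_ms("CpuModel:Pentium 450MHz", 116.0))
```

Swapping the template name is all it takes to re-evaluate the model on different hardware, which is exactly the flexibility the metadirectory provides.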

Front-end EXEs

The front-end executable combines the kernel, the hardware and workload DLLs, and the metadirectory with an appropriate user interface to form the final tool. In a production environment we might choose to export data about a simulated run to an Excel spreadsheet or a SQL database. However, in this chapter we will concentrate on a graphical interface called IndyView. It is intended to be used by performance engineers who require more detailed access to information about an Indy simulation. You will see samples of its output later in this chapter.

IndyView

In this section we will see some elements of the IndyView interface used to examine a performance model of the IBuySpy sample Web site. We will also see the underlying XML code used to represent various aspects of the model.

System Topology

Consider a simple configuration of the IBuySpy sample Web site, as shown in Figure 10-2:

Figure 10-2. Physical topology for a typical IBuySpy sample Web site

The system consists of an IIS server and a SQL server (which together make up the IBuySpy sample Web site), connected to one another over the LAN. The IIS server is also connected to the Internet, and processes requests from remote clients via the Internet.

Figure 10-3 is an example of how this simple topology for the IBuySpy Web site can be represented using the IndyView interface.

Figure 10-3. Indy topology for the IBuySpy sample Web site

Examining this in more detail, the IIS server (iissrv) includes Network Interface Card (NIC) devices and CPU devices. Devices that are functionally identical can be represented in multiples, as the CPUs are shown here. The SQL server (sqlsrv) includes NIC, CPU, and disk devices. Although it is obvious that the real IIS server also has a disk, its performance is not relevant to the model being tested, and so it has not been included.

On the sqlsrv device, below the CPU x 2 heading, there are two more devices: a CPU performance counter, and the CpuModel:Pentium 1000MHz device, which defines the behavior of this particular type of CPU, including how quickly it can process calculations. Similarly, below the Disk x 1 heading, you find both a disk performance counter and a DiskModel:HP NetRAID-4M device, which has its access speed and other hardware performance factors predefined.

For the purposes of the performance being modeled here, the client's hardware configuration does not need to be detailed. Therefore, the client device is just a black box, referenced in the transactions as the requester or recipient of data.

The net and lan devices, located at the same hierarchical level as the two servers, represent the properties of the network connections for the Internet and LAN segments respectively.

Finally, the links between each of the computer devices and the network devices are detailed, including which interface on each computer connects to which network device. This ensures that the output of the test results will distinguish between the different devices' activities at any point in the transaction.

The following XML code underlies the diagram in Figure 10-3:

<?xml version="1.0" encoding="utf-8"?>
<system name="IBuySpy">
  <active_device type="computer" name="iissrv" count="1">
    <active_device type="generic" name="lan_nic_send" count="1"/>
    <active_device type="generic" name="lan_nic_recv" count="1"/>
    <active_device type="generic" name="net_nic_send" count="1"/>
    <active_device type="generic" name="net_nic_recv" count="1"/>
    <active_device type="cpu" name="cpu" count="2">
      <rct name="cpu"/>
      <use_template name="CpuModel:Pentium 1000MHz"/>
    </active_device>
  </active_device>
  <active_device type="computer" name="sqlsrv" count="1">
    <active_device type="generic" name="lan_nic_send" count="1"/>
    <active_device type="generic" name="lan_nic_recv" count="1"/>
    <active_device type="cpu" name="cpu" count="2">
      <rct name="cpu"/>
      <use_template name="CpuModel:Pentium 1000MHz"/>
    </active_device>
    <active_device type="generic" name="disk" count="1">
      <rct name="disk"/>
      <use_template name="DiskModel:HP NetRAID-4M"/>
    </active_device>
  </active_device>
  <open_device name="client"/>
  <passive_device type="network" name="net" ports="100">
    <use_template name="NetModel2:OptimumCapacity"/>
  </passive_device>
  <passive_device type="network" name="lan" ports="100">
    <use_template name="LanModel:Ethernet"/>
  </passive_device>
  <link active="client" passive="net" fromport="0" toport="0"/>
  <link active="iissrv[?].2" passive="net" fromport="0" toport="99"/>
  <link active="iissrv[?].3" passive="net" fromport="0" toport="99"/>
  <link active="iissrv[?].0" passive="lan" fromport="0" toport="99"/>
  <link active="iissrv[?].1" passive="lan" fromport="0" toport="99"/>
  <link active="sqlsrv[?].0" passive="lan" fromport="0" toport="99"/>
  <link active="sqlsrv[?].1" passive="lan" fromport="0" toport="99"/>
</system>

The devices referenced in the topology script are defined in the metadirectory of hardware configurations. By changing the underlying properties of one of the referenced devices, this same script could be used to test different architectural options. Similarly, the number of devices can be changed to examine the performance impact of factors such as the number of CPUs in a server.

IBuySpy Search Transaction

Having constructed our topology, we can now define the transactions that it will support. Here we see a simple example of a transaction written in TML to simulate the request and processing of a search page on IBuySpy.

<tml>
  ...
  <!-- BasicSearch Request the .aspx and then the two gifs -->
  <transaction name="BasicSearch" frequency="BasicSearchFreq">
    <include name="ChooseClientSpeed" />
    <action name="net_msg_sync_async" connection="net" service="Client" saveschedule="clientstate">
      <param name="linkspeed" value="transaction.ClientSpeed" />
      <param name="msgsize" value="HttpRequestSize*3" />
      <peer name="target" service="IIS" saveschedule="iisstate" />
    </action>
    <action name="compute" service="IIS" useserver="iisstate">
      <param name="cpuops" value="BasicSearchCpu" />
    </action>
    <action name="net_msg_async_sync" connection="net" service="IIS" useserver="iisstate">
      <param name="linkspeed" value="transaction.ClientSpeed" />
      <param name="msgsize" value="BasicSearchSize" />  <!-- just the .aspx page size -->
      <peer name="target" service="Client" useserver="clientstate" />
    </action>
    <fork>
      <branch>
        <action name="net_msg_sync_async" connection="net" service="Client" saveschedule="clientstate">
          <param name="linkspeed" value="transaction.ClientSpeed" />
          <param name="msgsize" value="HttpRequestSize" />
          <peer name="target" service="IIS" saveschedule="iisstate" />
        </action>
        <action name="net_msg_async_sync" connection="net" service="IIS" useserver="iisstate">
          <param name="linkspeed" value="transaction.ClientSpeed" />
          <param name="msgsize" value="0.04" />  <!-- 1x1.gif -->
          <peer name="target" service="Client" useserver="clientstate" />
        </action>
      </branch>
      <branch>
        <action name="net_msg_sync_async" connection="net" service="Client" saveschedule="clientstate">
          <param name="linkspeed" value="transaction.ClientSpeed" />
          <param name="msgsize" value="HttpRequestSize" />
          <peer name="target" service="IIS" saveschedule="iisstate" />
        </action>
        <action name="net_msg_async_sync" connection="net" service="IIS" useserver="iisstate">
          <param name="linkspeed" value="transaction.ClientSpeed" />
          <param name="msgsize" value="1.52" />  <!-- thumbs/image.gif -->
          <peer name="target" service="Client" useserver="clientstate" />
        </action>
      </branch>
    </fork>
  </transaction>
  ...
</tml>

The transaction definition begins with a name and a relative frequency with which the transaction occurs. Then the actions within the transaction are listed in the order in which they occur. In this example script, the individual actions (each of which begins with an <action> tag and ends with </action>) are:

  1. net_msg_sync_async: Send an HTTP request message from the client service (representing all of the possible client machines on the Internet) to the IIS service over the Internet, using variable parameters for the link speed and message size. This message is sent synchronously (that is, the client waits for a response), but is received asynchronously (the server can handle many simultaneous requests).

  2. compute: Process the HTTP request on the IIS server with the variable parameter of how many CPU operations are required.

  3. net_msg_async_sync: Send a message back from the IIS service to the client service over the Internet, using variable parameters of link speed and message size. In the comments, we see that the value BasicSearchSize is just the ASPX page size, meaning that the variable BasicSearchSize has been previously defined as a workload parameter that contains the network size of the ASPX file. This variable can then be easily modified from within IndyView.

  4. At this point, the script dictates a fork into two branches, which will be executed simultaneously:

    The first branch, containing the actions net_msg_sync_async and net_msg_async_sync, makes up the request and response for a GIF file (referred to in the comment as 1x1.gif) with a size of 0.04 KB.

    The second branch, containing the actions net_msg_sync_async and net_msg_async_sync, makes up the request and response for a GIF file (referred to in the comment as thumbs/image.gif) with a size of 1.52 KB.

Hard-coding parameter values in this way results in a script that will require editing if any of the values change. For parameters whose values a user might want to change frequently, it makes more sense to use a workload variable, as with the ASPX page size and CPU cost.
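
The effect of workload variables can be illustrated with a small sketch. The parameter names follow the TML script above, but every value here is invented for illustration rather than taken from the real IBuySpy workload:

```python
# Sketch of workload parameters kept separate from the transaction script,
# so values like page size can change without editing the script itself.
# All numbers below are illustrative assumptions.
workload = {
    "HttpRequestSize": 0.5,    # KB
    "BasicSearchSize": 20.0,   # KB, the .aspx page size
    "BasicSearchCpu": 5.0,     # Mcycles
    "ClientSpeed": 56.0,       # kilobits/s, a modem client
}

def transfer_time_s(size_kb, linkspeed_kbits):
    """Seconds to move `size_kb` kilobytes over a `linkspeed_kbits` link."""
    return size_kb * 8.0 / linkspeed_kbits

# Transfer time for the BasicSearch page body alone:
t = transfer_time_s(workload["BasicSearchSize"], workload["ClientSpeed"])
print(round(t, 3))   # 2.857 seconds over the assumed 56 kbit/s link
```

Changing `BasicSearchSize` in the workload dictionary immediately changes the result, with no edit to the transfer-time logic, which is the same separation TML achieves with workload variables.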

We can drill down to another level of detail to see what information is embedded in one of the service definitions in the script:

<service name="IIS">
  <serverlist>
    <server name="iissrv" />
  </serverlist>
  <actionscheduling>
    <schedule action="compute" policy="roundrobin">
      <target device="cpu" />
    </schedule>
    <schedule action="net_msg_async_sync" connection="net" policy="random">
      <target device="nic_send" />
    </schedule>
  </actionscheduling>
</service>

This tells us that for the service IIS, the device iissrv (defined in the system topology) is to be used. Actions can be scheduled on iissrv's sub-devices. In this script, when the action compute is required by a transaction, the target sub-device is one of the two CPUs, chosen using a round-robin policy. When traffic must be sent out to the Internet (the net device in the system topology script), the target device is the NIC dedicated to sending.
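
The two scheduling policies named in this service definition, round-robin and random, can be sketched as simple selector functions (a generic illustration, not Indy's implementation):

```python
import itertools
import random

# Sketch of the two scheduling policies from the service definition:
# round-robin over the CPUs, random choice over NIC send devices.
def make_round_robin(devices):
    counter = itertools.cycle(range(len(devices)))
    return lambda: devices[next(counter)]

def make_random(devices, rng=random.Random(0)):
    return lambda: rng.choice(devices)

cpus = ["cpu[0]", "cpu[1]"]
pick_cpu = make_round_robin(cpus)
print([pick_cpu() for _ in range(4)])   # ['cpu[0]', 'cpu[1]', 'cpu[0]', 'cpu[1]']
```

Round-robin spreads successive compute actions evenly across identical CPUs; a random policy is a reasonable choice when, as with the send NIC here, there is no benefit to strict alternation.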

When the script is processed by the Indy kernel using the device definitions in the system topology, IndyView can produce a number of different visual representations of the transaction as a whole, or can focus on specific devices and how they are affected throughout the flow of the transaction. Figures 10-4, 10-5, 10-6, and 10-7 were all produced using a sample model of the IBuySpy sample Web site.

Figure 10-4. Control flow of Basic Search Transaction

After we have defined a transaction in TML, we can use IndyView to visualize it with a transaction flow diagram, as shown in Figure 10-4. This simple view can be used to inspect and debug the TML, and would typically be used by a performance engineer in the development stage of a model.

This search flow diagram shows the order and dependencies of each action in the transaction. The sequence of actions includes: a request from the client to the Web server, computation on the Web server, a response from the Web server to the client, and then two image fetches in parallel being received by the client. Clicking one of the actions in the flow diagram opens a window that provides more information. In this figure, two such windows are open: one showing a message from server to client (background), the other the computation on an iissrv CPU (foreground).

Figure 10-5. Time-space analysis

Figure 10-5 shows the time and resource requirements of events taking place on all system devices. This diagram can be used by a performance engineer to visualize the level of utilization of individual devices and determine possible performance problems by simple inspection of detailed events. Time goes left to right, and each line represents a device (identified in the device list on the left side of the screen), while boxes represent events. The color map, which corresponds to the type of event, is shown on the diagram below the toolbar. Lines in the window connect communication events where both partners in the communication are visible in the window. Black circles represent communication events in which one partner of the communication is not visible. Since clients are not shown in this view, communications with them are displayed this way. The device numbers on the left are derived from the topology script. The most utilized resources in the diagram are the IIS server CPUs (iissrv[0000].0004-0005) and the SQL disk (sqlsrv[0000].0004).

Rather than looking at all of the actions taking place on the entire system, we can use the Transaction Analysis view of IndyView to examine how long each of the individual actions in a particular instance of a transaction takes, and what resources they require, as shown in Figure 10-6.

Figure 10-6. Search transaction analysis

The top half of the screen shows the control flow of a search transaction, stretched to represent the actual time taken by each action. In addition, the panel to the left shows the start time and name of each transaction in the simulated run, allowing each of them to be examined individually.

In the lower half, the events that take place during the selected transaction are highlighted in green. The panel to the left of this section shows just the devices involved in this particular transaction: iissrv[0000].0002 is the sending activity on the NIC that connects the first IIS server to the Internet; iissrv[0000].0003 is the receiving activity on the same NIC; and iissrv[0000].0004 is the first of the CPUs.

A performance engineer can use the previous views to construct and debug a performance model. Then, additional IndyView screens can be used to evaluate and analyze the performance impact of various scenarios. Figure 10-7 shows a diagram similar to that produced by the Windows monitoring tool System Monitor.

This screen shows the predicted CPU utilization for the SQL server (black line), displayed simultaneously with the utilization of the backbone network (highlighted blue line, averaging around 5 percent). For more information on performance counters, refer to Chapter 4.

Figure 10-7. Performance counter prediction

Figure 10-8. Predicted queue lengths

IndyView includes a statistics engine that allows users to examine any performance metric of the system being modeled, using either a built-in graphing tool or by exporting data to an Excel spreadsheet or a database. The two graphs in Figure 10-8 show the predicted average queue size for different event types during a sample run of IBuySpy on a particular system topology. The top graph shows event queues on the IIS server while the bottom graph shows event queues on the SQL server. Looking at the scales on the graphs, it is clear that the bottleneck of the system is the IIS processor, since the average computation queue size is 21.4. By comparison, very little queuing is taking place on its NICs. The SQL server also has very small average queue sizes.
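
An average queue size such as the 21.4 reported for the IIS computation queue is typically a time-weighted average of queue depth over the simulated run. The following sketch (with an invented trace, not data from the IBuySpy run) shows the calculation:

```python
# Time-weighted average queue length from a simulated event trace:
# between successive events the queue depth is constant, so each depth
# is weighted by how long it persisted.
def average_queue_length(trace, end_time):
    """trace: list of (event_time, queue_depth_after_event), sorted by time."""
    total = 0.0
    prev_time, depth = 0.0, 0
    for time, new_depth in trace:
        total += depth * (time - prev_time)
        prev_time, depth = time, new_depth
    total += depth * (end_time - prev_time)   # final interval
    return total / end_time

# Illustrative trace: the queue grows to depth 3, then drains.
trace = [(1.0, 1), (2.0, 2), (3.0, 3), (5.0, 1), (7.0, 0)]
print(average_queue_length(trace, 10.0))   # 1.1
```

A sustained average of 21.4 computed this way means that, over the whole run, more than twenty compute events were typically waiting for an IIS CPU, which is why that device is identified as the bottleneck.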

A performance engineer would typically use this view to predict possible methods of improving overall system performance. By changing the system topology script to include more processors or a set of load-balanced IIS servers, improvements in performance would immediately become visible. In addition, since the overall model takes into account the actual behavior of the other devices in the system, improving the CPU capacity of the IIS server in the model would then show the next possible bottleneck in the system's performance.

TCA vs. Performance Modeling Conclusions

In Chapter 9, we used verification tests to confirm the costs predicted by the TCA model (see Figure 9-9). For purposes of comparison, we used Indy to define a performance model of IBuySpy's concurrent user capacity, using the numbers we obtained from TCA as event costs. The results are shown here side by side:

Table 10-1. Comparing TCA and Indy Predictions

Concurrent Users    TCA Predicted Mcycles    Indy Predicted Mcycles    Measured Mcycles
1000                116.0                    115.8                     120.2
10000               1160.0                   1166.0                    1147.0
14653               1699.0                   1694.0                    1661.0

For this simple model, Indy accurately tracks the results of both TCA and the measurements. This shows that two completely different performance-modeling techniques, namely the analytical model of TCA and the hybrid simulation approach of Indy, can accurately model the same system. We will now explore the areas in which Indy extends the capabilities of TCA.
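
A quick calculation from Table 10-1 makes the agreement concrete: every prediction falls within about 4 percent of the measured value.

```python
# Relative error of each model's prediction against the measured Mcycles
# from Table 10-1.
rows = [
    # (concurrent users, TCA predicted, Indy predicted, measured)
    (1000, 116.0, 115.8, 120.2),
    (10000, 1160.0, 1166.0, 1147.0),
    (14653, 1699.0, 1694.0, 1661.0),
]

for users, tca, indy, measured in rows:
    tca_err = abs(tca - measured) / measured * 100
    indy_err = abs(indy - measured) / measured * 100
    print(f"{users} users: TCA off by {tca_err:.1f}%, Indy off by {indy_err:.1f}%")
```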

Building What-if Scenarios Using Indy

As we discussed earlier, one of the major advantages of performance modeling is the ability to configure each minute element of the overall client/server interaction, in order to test different scenarios before a particular configuration or code architecture is chosen. In the following examples, two key performance issues, bottleneck analysis and architectural evaluation, are explored using Indy.

Figure 10-9. Bottleneck analysis

What-if Scenario 1: Bottleneck Analysis

Figure 10-9 shows an example of the type of bottleneck analysis possible with Indy. The graph shows the predicted performance of an e-commerce site as we change the number of Web servers. The site is being stress-tested to show the maximum achievable throughput for purchase transactions. As we would expect, increasing the number of Web servers increases the total throughput of the system in terms of purchase transactions per second. However, we reach a plateau at seven Web servers: beyond this point, adding extra Web servers does not increase the throughput of the system. When we use Indy to look at the simulated queuing delays in each of the active components, we see that the SQL server has reached saturation point. After this point the system throughput will remain the same until we increase the number of SQL servers or their performance.
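
The plateau behavior can be captured by a trivial capacity model: system throughput is bounded by whichever tier saturates first. The capacities below are invented to echo the shape of Figure 10-9, not taken from the real model:

```python
# Hypothetical capacities (purchase transactions/s) illustrating the
# plateau in Figure 10-9: throughput is bounded by whichever tier
# saturates first. Both numbers are illustrative assumptions.
WEB_CAP = 10.0   # tps one Web server can sustain
SQL_CAP = 70.0   # tps the SQL server can sustain

def max_throughput(n_web_servers):
    return min(n_web_servers * WEB_CAP, SQL_CAP)

print([max_throughput(n) for n in range(6, 10)])   # [60.0, 70.0, 70.0, 70.0]
```

With these assumed numbers the curve flattens at seven Web servers, just as in the figure: past that point the SQL server is the binding constraint, so only a faster SQL tier raises throughput.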

Given this conclusion, further tests using this existing set of transactions could be performed to determine how much of an improvement hardware changes on the SQL servers might provide, or how many more SQL servers could feasibly be added before other elements like network performance were affected.

What-if Scenario 2: Architectural Improvements

Another feature of Indy is the ability to model how architectural changes will affect the performance of a system. For example, imagine we are running IBuySpy on an e-commerce site with only two old 450-MHz CPU Web servers with a static load-balancer. For the standard user mix we have used in this chapter, we can use Indy to determine a maximum throughput of 46.8 transactions per second. Christmas is coming, so we decide to add a third server to the mix. This is a more modern Web server with a 1-GHz CPU. Despite more than doubling the total CPU horsepower of our Web servers, Indy predicts that they will only support a maximum throughput of 53.5 transactions per second. The problem is that we are still using round-robin load balancing, so that only one-third of our transactions are benefiting from the faster CPU. If we change to using a dynamic load balancing technique that takes account of relative server load, Indy predicts our throughput will increase to 73.4 transactions per second. This type of modeling of dissimilar server types, combined with the dynamic runtime behavior of a load-balancing system, would be impossible in TCA.
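
The direction of this effect can be sketched analytically. With static round-robin, every server receives the same request rate, so the system saturates when the slowest server does; a capacity-aware balancer lets each server run at full utilization. The simplified model below will not reproduce Indy's predictions, which account for queueing and the dynamic interaction of devices; it only shows why the static scheme wastes the faster server. The per-server capacities are assumptions (23.4 tps each for the old servers, half of the 46.8 tps two-server figure; 52.0 tps for the 1-GHz server is invented):

```python
# Sketch (with assumed capacities) of why static round-robin wastes a
# faster server in a heterogeneous farm.
capacities = [23.4, 23.4, 52.0]   # tps: two old servers plus one faster one

def round_robin_max(caps):
    # Equal shares: the system saturates when the slowest server does.
    return min(caps) * len(caps)

def weighted_max(caps):
    # Load proportional to capacity: every server runs at full utilization.
    return sum(caps)

print(round(round_robin_max(capacities), 1))   # 70.2
print(round(weighted_max(capacities), 1))      # 98.8
```

Even in this crude model, capacity-aware balancing beats round-robin whenever the servers differ, which is the effect Indy quantifies with its 53.5 versus 73.4 tps predictions.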



Performance Testing Microsoft .NET Web Applications
ISBN: 596157134
Year: 2002
Pages: 67