The Need for Speed | Beyond Software Architecture: Creating and Sustaining Winning Solutions

An important aspect of usability is performance, both actual and perceived. Unfortunately, performance is often misunderstood. It is beyond the scope of this book to provide detailed advice on how to create high-performance systems. Instead, my focus will be on precisely defining performance-related terms so that the marketect and tarchitect can communicate clearly, and on showing what performance information the development team needs to provide the marketect.

Let's Be Clear on What We're Talking About

The terminology I use to describe performance is presented in Table 10-1. All of these terms are related to each other. To illustrate, let's say that you want to calculate results for some of these terms for one of your systems. Your first step is to specify, or "fix," one or more of your system configurations. This could mean specifying underlying hardware, amount of disk space, memory size, operating system, database, Web server, and any other applications required. As for configurations, consider the five levels listed after Table 10-1:

Table 10-1. Performance Terminology

Term	Definition
Throughput	Number of bits/transactions per unit of time
Performance	Time per unit (inverse of throughput)
Latency	Wait time for response
Capacity	Number of users (entities) system can support in a given configuration at a fixed level of performance
Scalability	Ability to increase capacity by adding hardware
Reliability	The length of time a system can run without failing
Response time	Total perceived time it takes to process requests (an emotional, subjective qualitative rating, the "human" dimension of performance)

Below: below the minimum specified configuration (just for testing).
Minimum: the minimum specified configuration. This should be well below the average configuration in your target market (market configuration).
Average: slightly below the market configuration.
Ideal: the market configuration of early adopters and influential end users (demo).
Max: the best system you can (practically) assemble.

The next step is to identify the transaction or operation you want to test, complete with reference data. If you're going to test the performance of a spreadsheet recalculation, the same data must be available for every test run!

In this example, I'll assume that your system is server based, constructed in a manner similar to that described in Chapter 8. Once the server is up and running, you can write a test or driver program to simulate a request. Let's say that when you send a single request for a specific, common operation your system responds in 0.1 second. This is a base level of performance. If you configure your test driver to stimulate your system with up to 10 requests per second, and it runs without failing for 24 hours while maintaining the same performance, you also have some baseline data for reliability.

As you begin to increase transactions you'll probably notice more interesting things. You might expect the system to fail when the number of transactions exceeds 600 per minute (since 600 x 0.1 = 60 seconds). In reality, the system may not fail until the number of transactions exceeds, say, 700 per minute. This shows the difference between latency and throughput on the total system, because parts of various transactions are held in different components as they pass through the various layers.
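The arithmetic behind the 600-per-minute expectation, and what an observed 700 per minute implies, can be sketched as follows. This is only an illustration: the function names are made up, and the use of Little's law (average requests in flight = arrival rate x time in the system) is an assumption brought in to interpret the numbers. It also assumes latency stays at 0.1 second under load, which a real system may not honor.

```python
def serial_limit_per_minute(latency_s: float) -> float:
    """Requests per minute if each request had to finish before the next began."""
    return 60.0 / latency_s


def average_in_flight(rate_per_minute: float, latency_s: float) -> float:
    """Little's law: average number of requests inside the system at once."""
    return (rate_per_minute / 60.0) * latency_s


naive = serial_limit_per_minute(0.1)    # the 600-per-minute expectation
overlap = average_in_flight(700, 0.1)   # what 700 per minute would imply
print(f"serial limit: {naive:.0f}/min, in flight at 700/min: {overlap:.2f}")
```

A value above 1 for requests in flight is another way of saying that the layers overlap their work, which is why throughput can exceed the simple inverse of latency.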

Let's further assume that you can separate the transaction logic from the persistent store. This means that you should be able to increase capacity by adding more machines. You add more machines, but because you have a single, shared database all of these transaction servers are pointed at it. You might find that you can now handle 2,500 transactions per minute before some kind of failure, such as a completely dropped request. This is a measure of the scalability of your system, that is, its ability to use added hardware to improve various performance factors. If you find that you can't improve beyond this, you know you've reached the database server's maximum. Improving this may require you to optimize the server in some special way, such as installing special disk-processing hardware, changing database vendors, or tuning it by adding indices or profiling SQL queries.

Note that in this example the mechanism used to improve performance shifts as various limits are reached. This is common in complex systems. More memory might buy you better performance, to a point. Then you have to get a faster processor or perhaps faster disk hardware.

You might also find that your system does not exhibit linear performance. In fact, complex systems seldom exhibit linear performance curves. Most often, you will have linear responses for small areas of the various curves that are plotted from these data. Don't be surprised if your system "falls over" or "cliffs" when certain tolerances are exceeded. This is the reason to build a scalable system (one that you can "throw hardware at" to increase capacity at an acceptable level of performance).
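One simple way to spot where a system stops behaving linearly is to scan load-test results for the first point at which latency leaves a tolerated band around the baseline. The sketch below assumes exactly that rule; the function name, the tolerance factor, and every measurement in it are invented for illustration and do not come from a real system.

```python
def find_cliff(samples, tolerance=2.0):
    """Return the first load at which latency exceeds tolerance x the baseline.

    samples: list of (requests_per_minute, latency_seconds), sorted by load.
    Returns None if latency never leaves the tolerated band.
    """
    baseline = samples[0][1]
    for load, latency in samples:
        if latency > tolerance * baseline:
            return load
    return None


# Hypothetical measurements, invented for illustration only.
measurements = [
    (100, 0.10), (300, 0.10), (500, 0.11),
    (650, 0.14), (700, 0.45), (750, 2.00),
]
print(find_cliff(measurements))
```

Plotting the same data is usually more informative than a single number, but a check like this is easy to automate in a regression suite.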

My example thus far has assumed that the driver program sends requests uniformly. This does not match reality, in which most non-batch-processing, server-based systems have bursts of activity. Since servers queue requests, it's possible to handle bursts that are significantly higher than the steady-state throughput limit without failure. For example, your original configuration may be able to handle 1,400 requests during the first minute as long as no requests come in during the second minute. Since different servers vary in the number of requests they can successfully queue, it's also a good idea to know the queue limit. That will give you the maximum burst that can be handled by the server, which is another factor in reliability.
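The burst example can be checked with a very coarse, minute-by-minute queue model. This is a sketch under stated assumptions: the 700-per-minute service rate is taken from the earlier example, while the queue limits are hypothetical values chosen to show both outcomes. A real server queues at much finer granularity.

```python
def simulate(arrivals_per_minute, service_per_minute, queue_limit):
    """Minute-by-minute queue model; returns (dropped, backlog at end)."""
    backlog = 0
    dropped = 0
    for arriving in arrivals_per_minute:
        backlog += arriving
        backlog -= min(backlog, service_per_minute)  # work done this minute
        if backlog > queue_limit:                    # queue overflows
            dropped += backlog - queue_limit
            backlog = queue_limit
    return dropped, backlog


# 1,400 requests in the first minute, none in the second.
burst = [1400, 0]
print(simulate(burst, service_per_minute=700, queue_limit=700))
print(simulate(burst, service_per_minute=700, queue_limit=500))
```

With room to queue 700 requests the burst is absorbed. With room for only 500, the same burst loses 200 requests, which is why the queue limit is worth knowing.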

The system's perceived response time may not actually match the performance just described. Performance is typically measured at a layer below the user interface, because the user interface layers can add any number of additional processing steps or transformations that are considered a realistic part of performance. Moreover, response time may be more unpredictable. Still, response time matters because it shapes our perception of the system and our ability to use it for useful work. Subsecond response times are often required to create the perception that the system is responding to our needs instantaneously. Response times of less than 5 to 10 seconds are needed to enable the average user to maintain an uninterrupted flow of thought. Users may lose focus if the response time is greater than 10 seconds.

Reliability also deals with the "graceful" handling of failure. If a server crashes when its peak load is exceeded, it's less reliable than a server that sends back an error code in response to the messages it can't handle or that simply refuses to allow a user to log in.

Note that it is rare for a system to process only one kind of transaction or operation. Since most systems support a variety of operations, each taking different amounts of time and resources, the ideal way to measure performance is to define a user model that helps you understand the effects of an average user on your system. Some authors recommend a stochastic model of system usage, but this is only appropriate when requests are truly random. Most systems do not have random distribution of requests, so a random model can give misleading data. A better source for good user-model data is the log files associated with the operation of your system in a production environment.
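As a starting point for a log-based user model, the mix of operations can be counted straight from production logs. The sketch below assumes a made-up log format of "timestamp operation milliseconds" per line; the sample entries are invented, and a real log will need its own parsing.

```python
from collections import Counter


def operation_mix(log_lines):
    """Return {operation: share of all requests} from 'timestamp operation ms' lines."""
    counts = Counter(line.split()[1] for line in log_lines if line.strip())
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


# Hypothetical log entries, invented for illustration only.
sample_log = [
    "2003-01-15T09:00:01 search 120",
    "2003-01-15T09:00:02 search 95",
    "2003-01-15T09:00:02 view 40",
    "2003-01-15T09:00:05 checkout 310",
]
mix = operation_mix(sample_log)
print(mix)
```

A test driver can then issue operations in these proportions rather than uniformly at random.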

In addition to the complexities of testing the system, you should also learn to be conservative in your performance estimates. Real-world conditions can vary quite a bit from lab conditions, causing potentially misleading results.

What a Marketect Really Wants with Respect to Performance

It is tempting to think that what a marketect really wants is the fastest possible system. In one sense this is obviously true: everyone wants a fast system. A good engineer hates waste, and poor performance is wasteful. Poor performance also harms customers, as it forces them to purchase unnecessary and unwanted equipment and/or to limit growth. Still, the fastest possible system is not what a marketect really wants.

What a marketect really wants is a way to confidently, reliably, and above all accurately answer performance-related questions. That is more important than simply building the fastest possible system because this is what is needed to create a winning solution.

To illustrate what I mean, here are sample questions that I've received from customers regarding various performance attributes. Each of these questions was asked relative to a specific, customer-defined system configuration (hardware, storage, network, and so forth).

How many simultaneous users/connections/sessions will the system support? What is the load on various components?
How many transactions can be processed with my current hardware [where the hardware is precisely detailed]?

The biggest reason that marketects need good answers to such questions is that most of the time customers don't have raw performance queries. Instead, they come with some basic understanding of their needs and their environment and ask for help in creating the required infrastructure support. The marketect needs some way to respond. To get a sense for what I mean, consider the following questions, abstracted from real customer requests.

We anticipate 200 active and up to 500 casual users. What hardware configuration do you recommend to support these users?
We estimate that our system will need to support 800 downloads per hour during the new product release. How should we structure our Web infrastructure to support these downloads?
We estimate that our system will initially handle 2,500,000 requests annually, with a projected 25 percent growth rate. Make certain your hardware estimates account for a three-year period and that your design provides for at least 98.5 percent (or 99% or 99.9%) overall system availability.
If the database size were to grow faster than anticipated, what impact would there be on the system? Where would possible bottlenecks occur, and how would you scale the proposed system to handle them?

Questions such as these form the foundation for a long and complex sales process. Answering them well usually requires additional information from the customer, but ultimately, the marketect must have the necessary data to provide accurate answers.
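Part of the third request above (2,500,000 requests annually, 25 percent growth, a three-year horizon, and 98.5 to 99.9 percent availability) reduces to arithmetic that is worth doing before the conversation goes further. The sketch below assumes that the first year runs at the initial volume and that growth compounds once a year; the customer might mean something different, so treat both as assumptions to confirm.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def projected_requests(initial, growth_rate, years):
    """Annual request volume for each year, compounding growth_rate yearly."""
    return [initial * (1 + growth_rate) ** year for year in range(years)]


def allowed_downtime_hours(availability):
    """Hours per year the system may be unavailable at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


volumes = projected_requests(2_500_000, 0.25, 3)
print([round(v) for v in volumes])
for target in (0.985, 0.99, 0.999):
    print(f"{target:.1%} availability allows {allowed_downtime_hours(target):.2f} hours down per year")
```

Under these assumptions the third year carries about 3.9 million requests, and moving from 98.5 to 99.9 percent availability cuts the allowed downtime from roughly 131 hours to under 9 hours per year, which is a very different hardware conversation.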

One of the most effective ways to communicate this information is through case studies or whitepapers, with an appendix outlining additional performance scenarios. Another method, suitable for complex performance scenarios, is a program that estimates performance under various conditions.

Any performance data published by the marketect must be recalculated for each new release or when advances in hardware motivate changes in recommended system configurations. This last point is especially important. As consumers, we have been collectively conditioned by the hardware industry to expect that performance will continually improve. It doesn't matter if we're an end user installing old software on a new computer, or a Fortune 2000 enterprise installing the latest release of their CRM system on a new multiprocessor server. Whenever we invest in new hardware we expect that performance will improve.

Of course, we also expect that new releases of existing software will exhibit better performance on existing hardware, even if new features are added. This can be challenging, but reducing performance can seriously lower customer satisfaction. Performance always matters.

Responding to the User

One of the timeless pieces of advice from the usability literature is that, unless the system can respond truly instantaneously, you're going to need some feedback mechanism. In general, there is a strong correlation between good user feedback and problem/application/system complexity. What has worked well for me is to first broadly characterize the feedback you can provide. Table 10-2 lists the two most important categories.

Percent-done progress indicators are your best choice when you need to provide continuous feedback to the user. They reassure users that the system is processing their request and, ideally, allow them to cancel a task if it takes too long. The very best progress indicators also provide a reasonably accurate estimate of the total amount of time needed to complete the task. Be careful with your estimates: tarchitects are notoriously bad at estimating how long something will take, and if you estimate wrong you're going to choose the wrong kind of feedback.

Table 10-2. Feedback Mechanisms

Feedback	Examples
Immediate: task is expected to take less than 1 to 2 seconds	Visual changes to common objects, such as briefly changing the cursor to a "mail has arrived" icon when your mail client has downloaded a new e-mail message in the background
	Auditory responses, such as a beep (use sparingly and put under user control)
	Information messages or other changes displayed in status bars
Continuous: task is expected to take more than 2 seconds	General animation in an animation "loop"; appropriate when you don't know how long a task will take, such as the spinning globe that appears in Microsoft Internet Explorer when it is loading a Web page
	Percent-done progress indicators that estimate time or effort to complete a task. These can be shown in a status bar (if the task cannot be canceled) or in a separate dialog (if the task can be canceled)
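The choice among these feedback mechanisms can be expressed as a small decision rule. The sketch below uses the 2-second boundary for continuous feedback from Table 10-2; the function names and the linear time-remaining estimate are assumptions made for illustration, and a real progress indicator would need a better estimator.

```python
def choose_feedback(expected_seconds, duration_known):
    """Pick a feedback category using the 2-second boundary from Table 10-2."""
    if expected_seconds <= 2:
        return "immediate"
    # Continuous feedback: percent-done only makes sense if we can estimate.
    return "percent-done" if duration_known else "animation loop"


def estimate_remaining(elapsed_seconds, fraction_done):
    """Linear extrapolation of time remaining; None until some progress exists."""
    if fraction_done <= 0:
        return None
    return elapsed_seconds * (1 - fraction_done) / fraction_done


print(choose_feedback(0.5, duration_known=True))
print(choose_feedback(30, duration_known=False))
print(estimate_remaining(30, 0.25))
```

Because the rule depends on the expected duration, a bad estimate selects the wrong mechanism, which is the risk the text warns about.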

Feedback Eliminates Unnecessary Work

One client/server system I worked on didn't provide users with enough feedback about their requests. As a result, users would submit the same request over and over, at times overloading the server. The problem went away once we implemented appropriate feedback mechanisms, illustrating the main point of this discussion: Response time and feedback are qualitative, emotional perceptions of system performance. You are far better off creating the right emotional perception of acceptable response before making a large investment to provide cold facts and figures about system throughput.

Performance and Tarchitectural Impact

The authors of Software Architecture in Practice [Bass 98] note that the performance factors just described are affected by both architectural and nonarchitectural choices. Architectural choices include the allocation of functions among various components, the manner in which these components interoperate and are operationally deployed, the management of state, and the management of persistent data. Different architectural choices affect different factors, and the needs of the overall system must be considered as choices are made. For example, converting a stateful architecture to a stateless one may increase latency but dramatically improve scalability.

Nonarchitectural choices include the algorithms for key operations within a single component and any number of technology idioms related to the specific implementation. Examples are implementing a more efficient sort algorithm or slightly restructuring a database to improve query performance.

Managing performance factors is a complex topic that is beyond the scope of this book. That said, there are some basic tools and tricks that every tarchitect should have when considering them.

Throw Hardware at the Problem

In many of the systems I worked on, the easiest way to solve a performance issue was to throw some hardware at it: sometimes a bit more memory, sometimes an extra CPU rather than a faster one. The specific ways that you can use hardware are nearly endless, so architect your system to take advantage of the ones that matter most to your customers.

Of course, there is a fine balance. Too many engineers believe that throwing hardware at the problem justifies sloppy development practices. I vividly remember a conversation with an engineer whom I ultimately fired because of his poor coding. He worked on a system made up of CGIs written in C++. His coding style produced massive memory leaks, which he claimed were acceptable because all we had to do was purchase additional memory to "cover up" his mistakes. There is no justification for such sloppy development, no matter how much hardware you have at your disposal!

Finally, adding hardware will only work if you understand the "scalability" of your tarchitecture.

Use Large-Grained Transactions

Performance is typically enhanced in distributed systems when the transactions between components are fairly large.
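A rough cost model shows why larger transactions tend to help: every call between components pays a fixed overhead (network round trip, marshaling, and so on) in addition to the work itself. The sketch below assumes that simple model, and the overhead and per-item figures are hypothetical.

```python
def total_time(items, items_per_call, overhead_s, per_item_s):
    """Total time when every call pays a fixed overhead plus per-item work."""
    calls = -(-items // items_per_call)  # ceiling division
    return calls * overhead_s + items * per_item_s


# Hypothetical costs: 5 ms of overhead per call, 0.1 ms of work per item.
fine_grained = total_time(1000, items_per_call=1, overhead_s=0.005, per_item_s=0.0001)
large_grained = total_time(1000, items_per_call=100, overhead_s=0.005, per_item_s=0.0001)
print(f"one item per call: {fine_grained:.2f} s, 100 items per call: {large_grained:.2f} s")
```

The work is identical in both cases; only the number of times the fixed overhead is paid changes.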

Understand the Effects of Threading on Performance

Multi-CPU systems are pretty much the standard for servers. UNIX-based operating systems, such as Solaris, have proven that they can scale to dozens of processors. You should understand how multiple CPUs affect the performance of your architecture. It may not be what you think. In one application we relied on a rather old search engine, thinking that adding processors would help performance. It didn't, because the search engine wasn't multithreaded.
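Amdahl's law gives a quick way to reason about cases like the search engine. It is not named in the text above and is brought in here as an illustration: if only a fraction of the work can run in parallel, the rest caps the benefit of extra processors. The fractions used below are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Best-case speedup when only parallel_fraction of the work can be spread out."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


# A component with no parallelism gains nothing from eight processors.
print(amdahl_speedup(0.0, 8))
# A component that is 90 percent parallel still falls well short of 8x.
print(round(amdahl_speedup(0.9, 8), 2))
```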

Use a Profiler

Performance can only be reliably increased if you know what is too slow. One technique is to use a profiler. Be forewarned, however, that a profiler can only help you identify and improve nonarchitectural bottlenecks. When it reveals an architectural problem, it often means that you either live with it or rewrite.
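As one concrete illustration of the technique, the sketch below uses Python's standard cProfile and pstats modules. The functions being profiled (`slow_sum` and `handle_request`) are hypothetical stand-ins, and the tooling for your own platform will differ.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately wasteful: builds a whole list just to add it up."""
    return sum([i * i for i in range(n)])


def handle_request():
    """Hypothetical request handler with one expensive call and one cheap one."""
    return slow_sum(200_000) + slow_sum(1_000)


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report points at `slow_sum` as the place where time goes, which is a nonarchitectural fix. It says nothing about whether the request should have been made at all.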

Another technique is to run your program on a small number of data elements and then extrapolate the results to a larger number. This can tell you fairly quickly if you're headed in the wrong direction.
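The extrapolation can be made a little more systematic by estimating how running time grows with input size from two measurements. The sketch below assumes time grows roughly as a power of the input size, which will not hold for every program, and the timings in it are hypothetical.

```python
import math


def growth_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def extrapolate(n1, t1, n2, t2, target_n):
    """Project the running time at target_n using the estimated exponent."""
    k = growth_exponent(n1, t1, n2, t2)
    return t2 * (target_n / n2) ** k


# Hypothetical timings: 1,000 elements took 0.02 s and 2,000 took 0.08 s.
k = growth_exponent(1000, 0.02, 2000, 0.08)
projected = extrapolate(1000, 0.02, 2000, 0.08, 100_000)
print(f"exponent ~ {k:.1f}, projected time at 100,000 elements ~ {projected:.0f} s")
```

Doubling the data quadrupled the time in this made-up case, so the projection for 100,000 elements runs to minutes rather than seconds. That is the kind of early warning the technique is meant to give.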

Handle Normal Operations and Failure Conditions Separately

This is a variant of advice I first read in Butler Lampson's excellent paper Hints for Computer System Design [Lampson 1984]. In general, normal operations should be fast. Failure conditions, which presumably happen much less frequently, should be handled appropriately. There is usually very little motivation for quick recovery.

Cache Results

In its simplest form, a cache saves some previously computed result so that it can be reused. Caches can have an enormous impact on performance in an extraordinary number of circumstances, from operating systems to companies that improve performance factors on the Internet, such as latency, by caching Web pages. If you use a cache, make sure you understand when and how a cached result should be recomputed. Failure to do so inevitably means that your results will be incorrect. The ramifications of this vary considerably by application, but in any case you should know what can go wrong if you are using poor results. Lampson refers to a cached result that may be wrong as a hint. Hints, like caches, are surprisingly useful in improving performance.

Make sure that your architecture has a programmatic ability to turn caching on and off on the fly so that you can test its impact. Cache problems are among the most insidious to find and fix.
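A minimal sketch of a cache that can be switched off at run time and invalidated when a result must be recomputed might look like the following. The class name, its methods, and the `expensive` function are hypothetical; a production cache would also need eviction and thread safety.

```python
class ToggleCache:
    """Memoizes a function; can be switched off or invalidated at run time."""

    def __init__(self, compute):
        self._compute = compute
        self._store = {}
        self.enabled = True

    def get(self, key):
        if not self.enabled:
            return self._compute(key)  # bypass the cache entirely
        if key not in self._store:
            self._store[key] = self._compute(key)
        return self._store[key]

    def invalidate(self, key=None):
        """Forget one key, or everything when key is None."""
        if key is None:
            self._store.clear()
        else:
            self._store.pop(key, None)


calls = []


def expensive(x):
    calls.append(x)  # record each real computation
    return x * x


cache = ToggleCache(expensive)
cache.get(4)
cache.get(4)              # served from the cache
cache.enabled = False
cache.get(4)              # recomputed because caching is off
cache.enabled = True
cache.invalidate(4)
cache.get(4)              # recomputed because the entry was invalidated
print(len(calls))
```

Running the same workload with `enabled` set to `False` gives a direct measure of what the cache is buying, and it helps confirm whether a suspected bug is really a stale entry.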

Perform Work in the Background

In one system I worked on, one of the most frequent customer requests was to issue print commands as a background task. They were right. We should have designed this in right from the start. There are many processes that can be handled as background tasks. Find them, for doing so will improve usability.
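Handing a slow operation to a background worker can be sketched with Python's standard `concurrent.futures` module. The `print_document` function below is a hypothetical stand-in that sleeps instead of printing; the point is only that the caller gets control back immediately.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def print_document(name):
    """Stand-in for a slow print job (hypothetical; sleeps instead of printing)."""
    time.sleep(0.2)
    return f"{name} printed"


executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(print_document, "report.pdf")  # returns immediately

# The caller is free to keep responding to the user here.

outcome = future.result()  # collect the result once it is actually needed
executor.shutdown()
print(outcome)
```

A real design also has to decide how failures in the background task are reported back to the user, which ties in with the feedback mechanisms discussed earlier.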

Design Self-Service Operations

One of the most striking examples of improving efficiency in noncomputer systems is self-service. From ATMs to pay-at-the-pump gas, we consumers have many opportunities to serve our own needs. Although not necessarily their intent, self-service operations can improve any number of performance parameters. I've found that this concept also helps in tarchitectural design. For example, by letting client components choose how they process results, I've found ways to dramatically simplify client/server systems. Consider an application that enables the user to download prepackaged data into a spreadsheet and then utilize any number of built-in graphing tools to manipulate the results. Of course, you may have legitimate concerns regarding the distribution of data, but the performance benefits of self-service designs cannot be ignored.

Learn the Idioms of Your Implementation Platform

Your implementation platform (language, operating system, database, and so forth) has a wide variety of idioms for using it efficiently. General techniques, such as passing by reference in C++ or preventing the creation of unnecessary objects in Java, can have a surprisingly large impact on performance. I am not advocating that you simply design for performance. Instead, I'm merely saying that one of the ways to improve overall performance is to make certain you're using your implementation technologies sensibly. The only way to do this is to thoroughly understand your platform.

Reduce Work

Perhaps this advice is a bit trite, but it is surprising how often an application or system performs unnecessary work. In languages such as Java, C++, and Smalltalk, unnecessary work often takes the form of creating too many objects. In persistent storage, it often means having the application perform one or more operations sequentially when restructuring the data or the operations would enable the database to do the work or would enable the application to produce the same result through batching. A related technique is to use stored procedures or database triggers. The non-database world provides such examples as precomputed values or lazy initialization.
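Lazy initialization, the last technique mentioned, can be sketched with `functools.cached_property` from Python's standard library (available since Python 3.8). The `Report` class is hypothetical; the counter exists only to show that the work happens once.

```python
from functools import cached_property


class Report:
    """Hypothetical report whose total is costly and not always needed."""

    def __init__(self, rows):
        self.rows = rows
        self.computations = 0

    @cached_property
    def total(self):
        """Computed on first access only, then stored on the instance."""
        self.computations += 1
        return sum(self.rows)


report = Report([3, 4, 5])
first = report.total    # triggers the computation
second = report.total   # reuses the stored value
print(first, second, report.computations)
```

If the total is never read, it is never computed. The trade-off is the same as for any cache: if `rows` changes afterward, the stored value is stale until it is cleared.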