9.2 Throughput Requirement Pattern

Basic Details

Related patterns:	Scalability, response time, inter-system interface
Anticipated frequency:	One requirement in a straightforward case, up to three or more per system and per inter-system interface (for one or more interfaces)
Pattern classifications:	None

Applicability

Use the throughput requirement pattern to specify a rate at which the system, or a particular inter-system interface, must be able to perform some type of input or output processing.

Discussion

How fast can we throw things at our system? This is the type of question most commonly answered by throughput requirements. Less frequently: how fast must our system churn things out? This sounds easy enough to specify: just say how many whatsits per whenever. But unfortunately it's not as simple as all that, and it can be downright difficult to write satisfactory throughput requirements, for several reasons. First, how do we work out what rate we need? It involves predicting the future, which is never spot-on at the best of times but might be little more than guesswork if this is a new business venture or if we're building a product for other businesses to use. Second, if critical pieces (particularly the main hardware and, for an interface, the communication lines) are outside the scope of the system, how can we set meaningful throughput targets for the software alone?

Before agonizing about how to work out a throughput target, ask yourself: what's it for? Can we do without it? Don't specify throughput just for the sake of it. For most systems, being scalable is more important than achieving a fixed throughput figure. If we specify strict scalability requirements, we can often either avoid specifying throughput at all or we can specify a relatively modest throughput target, confident that if we need more, we can scale up (normally by adding more hardware). If you don't have a sound basis for determining target throughput, it's often better not to try, rather than putting forward impressive-looking figures that are meaningless and perhaps dangerously misleading.

These days, throughput is mainly an issue for a commercial system only if there are a large number of users, which means either it's open to the world (typically via the Internet) or it's used by a large organization. This requirement pattern assumes we're dealing with a high-throughput system, because with available technology, anything else can be handled comfortably simply by buying more hardware and using better underlying products (such as a database). There's nothing to be gained by writing a requirement that the system must cope with at least two orders per day.

You can't specify throughput just by asking people or by thinking about it, scratching your head a bit, and then writing it down. There are several things to figure out, and you're probably going to have to do some calculations. Here's a suggested approach, which chops the problem into several more manageable pieces:

Step 1: Decide what to measure. Pick something that's fundamental to the system. For a retail system, it could be new orders (how many we must be able to receive in a given time, that is). One system could have several throughput requirements for different measures, but don't worry about secondary activities whose volumes depend largely on something you've already chosen; they're taken care of in Step 2.
Step 2: Work out other relative volumes (if necessary). Devise formulae for working out the relative volume of secondary activities of interest based on the thing whose throughput we're measuring (for example, how many order inquiries per order). In effect, this is a little model of relative volumes, which can form part of an overall sizing model.
Step 3: Choose indicative hardware set-up (if necessary). If hardware is outside the scope of the system, define a rough hardware set-up for which to specify a throughput target.
Step 4: Determine average throughput. Organizations think of projected business in terms of relatively long timeframes: per month, per week, or perhaps per day (relatively long, that is, from the point of view of a computer rated in billions of cycles per second). Begin throughput calculations by thinking in the same way as the business, which gives us an average throughput over a relatively long period of time.
Step 5: Determine peak throughput. The load on the system won't stay constant: a conveniently average throughput won't be delivered every minute or every second. How much will it vary? What's the busiest it will get? It's the answer to the last question that gives us our target throughput, because the system must cope with the peak load.

Each of these steps is described in more detail in its own section that follows. Steps 4 and 5 need to be performed for each distinct measure identified in Step 1.

Step 1: Decide What To Measure

For the main throughput target, pick the thing most important to the organization. For a business this means the one that makes the money, which isn't necessarily the one with the highest volume. In most systems it's the business transactions. (That's why for a retail system, we'd pick orders rather than inquiries.) If you have several common things, pick the one that happens most frequently. It's best to pick only one thing for which to set an overall throughput target. Step 2 takes the system's secondary throughputs into account.

There could be several different types of the thing on which you've decided to base throughput (several different types of business transactions, for example). In this case, either pick one (the most important or the most numerous one) and treat the others as secondary (and deal with them in Step 2) or estimate what percentage of the total each type represents. The final results are the same.

In addition to the system's main throughput target you can set a separate throughput target for each inter-system interface for which this factor is important. This makes sense only if there's no direct relationship between the system throughput and that of the interface in question; if every transaction is sent down the interface, it doesn't need its own target.

Distinguish between incoming and outgoing throughput. Usually, it's the incoming throughput that constitutes the load on the system; the system can send things out with much less effort (invoices, emails, no matter what they are). The exception is systems whose main purpose is producing something. One incoming transaction could generate one or more outgoing transactions. The net effect, in terms of communications bandwidth, could be more than the consideration of just the incoming transactions would indicate. Communication pipes, however, aren't the same as physical pipes: a heavy flow one way doesn't necessarily mean there's no room for anything to go the other way, and the capacity one way might differ from the capacity the other way.

Step 2: Work Out Other Relative Volumes

In Step 1 we identified what requests to base our throughput measuring on. But handling them isn't the only work the system has to do. Step 2 aims to get an idea of the load imposed by everything else. However, the results of this step don't feed into the throughput requirement itself. It serves two purposes: first, to gain a better understanding of the overall load on the system, and second, to supply useful information to whoever will decide what size hardware is needed. (It's not possible to size the hardware at requirements time.)

Draw up a list of the other everyday activities of the system (or, for an inter-system interface, the other things the interface handles): important inquiries, registration of customers, and so on. Then estimate how many of each of these there will be on average for each one of the things the throughput measures. For a Web retail site, we might estimate that product inquiries outnumber orders fifty to one, the number of new customers registering is a third of the number of orders, and there are two order inquiries for each order.

A spreadsheet is the most convenient tool to use; it lets us easily change the primary volume and recalculate all the others. If you've already created an overall sizing model, add these factors to it.
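For readers who prefer code to a spreadsheet, the relative volumes model can be sketched in a few lines. This is only an illustration: the ratios are the ones from the Web retail example above, while the activity names and the figure of 30,000 orders are assumptions made up for the sketch.

```python
from fractions import Fraction

# Secondary activities per order, from the Web retail example in the text.
# The dictionary keys are illustrative names, not part of the pattern.
RELATIVE_VOLUMES = {
    "product_inquiry": Fraction(50),           # fifty product inquiries per order
    "customer_registration": Fraction(1, 3),   # one new customer per three orders
    "order_inquiry": Fraction(2),              # two order inquiries per order
}


def secondary_volumes(orders: int) -> dict[str, float]:
    """Scale every secondary activity from the primary (order) volume."""
    return {name: float(ratio * orders) for name, ratio in RELATIVE_VOLUMES.items()}


if __name__ == "__main__":
    # 30,000 orders is a hypothetical primary volume for one period.
    for name, volume in secondary_volumes(30_000).items():
        print(f"{name}: {volume:,.0f}")
```

Changing the primary volume recalculates everything else, which is the property the spreadsheet is valued for.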

One extra factor that's often useful to add is the origin of the things we're measuring. Where do they come from? What owns or produces them? For example, the origin of business transactions might be customers. Estimating the rate at which a single origin entity creates such transactions can then form the basis for our throughput calculations, in a way that people find more natural. Asking how many orders an average customer will place per month is easier to picture than an absolute total number of orders in isolation (though wherever you start, you ought to reach the same results).
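Working from the origin entity is simple arithmetic, sketched here under invented figures: 20,000 customers each placing 1.5 orders per month, and a 30-day month. None of these numbers come from the pattern; they only show the shape of the calculation.

```python
def monthly_volume(customers: int, orders_per_customer_per_month: float) -> float:
    """Total orders per month, derived from the origin entity (customers)."""
    return customers * orders_per_customer_per_month


def average_per_day(monthly: float, days_in_month: int = 30) -> float:
    """Long-term daily average; the 30-day month is a simplifying assumption."""
    return monthly / days_in_month


if __name__ == "__main__":
    monthly = monthly_volume(20_000, 1.5)   # hypothetical business projection
    print(f"{monthly:,.0f} orders per month, {average_per_day(monthly):,.0f} per day")
```

Starting from a total order count instead should lead to the same daily average, as the text notes.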

There is a slight danger that the developers will take trouble to make sure the primary transaction type is handled lightning fast, to satisfy the throughput requirement. This might leave everything else disproportionately (and perhaps unacceptably) slow. It's hard for the requirements to protect against this: you can hardly ban or complain about the efficient execution of anything.

Step 3: Choose Indicative Hardware Set-Up

If we're building a system for a particular organization, we have only its projected business volume to worry about, so we can specify target throughput independently of hardware. The hardware can be chosen later, when we've built the software and know how well it performs. In this case, bypass Step 3.

On the other hand, if we're specifying only the software for a system without knowing the power of the machines it will run on, we can't just throw up our hands and announce that it's impossible to specify throughput requirements. That would render even the most inefficient software acceptable (as far as the requirements are concerned). This dilemma is particularly important when building a product because different customers might have enormous variations in their business volumes. One answer is to devise an indicative hardware set-up (such-and-such machine with a so-and-so processor running this operating system and that database, and so on) and to state the throughput it must achieve.

A slightly different approach is to focus on one aspect of hardware performance (the machine's CPU cycle rate is the obvious one) and specify target throughput against it. For example, we could demand one business transaction for every 10 million CPU cycles (so a 2 GHz machine would handle 200 business transactions per second). This is a rather simplistic alternative. It doesn't take into account any of the other factors that affect throughput, and it forces you to deal in unfamiliar quantities. (Can you feel the CPU cycles go by?)
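The cycle-based figure is a single division, shown here as a sketch. The function name is made up for illustration, and the calculation deliberately ignores every factor other than clock rate, which is exactly the weakness described above.

```python
def transactions_per_second(clock_hz: float, cycles_per_transaction: float) -> float:
    """Throughput implied by a cycles-per-transaction budget on a given clock rate."""
    return clock_hz / cycles_per_transaction


if __name__ == "__main__":
    # The example from the text: 2 GHz, one transaction per 10 million cycles.
    print(transactions_per_second(2e9, 10e6))  # prints 200.0
```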

It's distasteful for the requirements process to address hardware at all, but we have no alternative if we must address performance in the absence of a concrete underlying environment. A car maker couldn't tell you the top speed of a planned new car if its engine size isn't known yet.

Step 4: Determine Average Throughput

Now it's time to approach the gurus who can foretell the future of the business. This is the domain of sales and marketing and senior management; no one else possesses such powers. Arrange a session with them to discuss and set down business volume projections. The goal of Step 4 is to determine the volume of business in terms of the time period the business feels most comfortable with (per year, quarter, month, week, or day), and thus average throughput.

Give your business gurus free rein to express their estimates of business volumes however they wish, but intervene if they start talking in terms that aren't measurable. Doing Steps 1 and 2 beforehand (or at least preparing a first version of your sizing model) lets you demonstrate and tinker with it during the session. It's usually most natural to start by discussing volumes in whatever terms come most easily, often numbers of customers rather than transactions, and then how many transactions each customer will make in a given time period. That is, take a step back from the thing you'll actually base the throughput target on.

For an established business (if we're replacing an existing system, say), target throughput can usually be set with a reasonable degree of reliability. For a new venture it's largely guesswork. Be alert to the eternal optimism of sales predictions. ("In five years' time, 50 percent of the world's population will be buying their whatever-they-are online, and we intend to have 90 percent of that market.") If that happens, bring the discussion down to earth by asking what volumes will be in the shorter term. It's far better to cater for smaller initial volumes and require the system to be scalable than gear up for starry-eyed exaggerations. This demonstrates that it's important to always associate a timeframe with every throughput target, and indeed with every performance target of any kind. If possible, do so relative to when the system goes live, rather than an absolute date. It's perfectly acceptable to specify two targets for the same thing, covering different timeframes, either putting both in the same requirement or writing two separate requirements. The latter allows the targets to be assigned different priorities.

Other factors you might want to take into account include budget (how much high-power hardware can the organization afford?) and the potential damage to the business if it cannot cope with demand. Also, if the business is subject to seasonal variation, base the target throughput on the busiest season (or time of year) or special busy dates. For example, a system for a florist can expect to be most busy on Valentine's Day.

Step 5: Determine Peak Throughput

Assuming we have an average throughput (from Step 4), how do we turn that into a real, immediate, here-and-now throughput? What's the greatest load we must be ready for? In a sense, our system must be a marathoner, a middle-distance runner, and a sprinter all in one, and the peak throughput says how fast it must be able to sprint. The aim of Step 5 is to determine a short-term peak throughput based on the long-term average.

The rest of this section applies to incoming throughput. Outgoing throughput is easier to determine because we typically have a lot more control over when it happens (for example, producing invoices or sending emails). Outgoing throughput also tends to be less important, because it usually imposes less of a processing load.

What's the ideal unit time period for which to set peak throughput? A day and an hour are too long because they provide plenty of time to satisfy the target while still having long periods with little (or even no) throughput. A second is too short because it implies that the target throughput must be achieved every second, which leaves little room for even fleeting hiccups. What's the point of such a tight requirement if no user could notice when it wasn't achieved? Indeed, probably no one would notice if the system did nothing at all for a second. Let's not split hairs and debate funny time periods like five minutes or thirty seconds. Keeping to nice round numbers, the most convenient time period is therefore a minute. The rest of this section assumes we're calculating throughput for the peak minute. If you have sound business reasons for a different time period, then use it.

The extent to which peak throughput varies from the average depends on numerous factors according to the nature of the system. Common factors are

Factor 1: The system's availability window. This means its normal hours of operation. For a company's internal system running from 9 to 5, a day's average throughput is crammed into eight hours. For an always-open Web system, it's spread over 24 hours.
Factor 2: Naturally popular times. At what times of day is a typical user most likely to use the system (according to their local time zone)? If you're offering a service to businesses, it's likely to be busiest during working hours. If it's recreational, it'll probably be in the evening and at weekends.
Factor 3: Geographic distribution. How widely spread are your users? Across different time zones? If your system is available worldwide, do you have a dominant region from which most of your business comes (such as North America)? This factor can lead to complex patterns of load through the day.
Factor 4: High activity triggers. Do you have any situations that are unusually busy? Is there anything that could cause peak throughput to be much higher than the average? For example, if you're selling concert tickets online, you can expect to be deluged the moment tickets for a popular artist become available.

Build a model as sophisticated as you like or as simple as you can get away with to calculate the peak throughput. In addition to these factors there will also always be natural variations from minute to minute. A statistician would be able to work this out properly, but in the absence of one, resort to guessing. If you have no meaningful data at all, you must assume the peak throughput will be appreciably higher than the average, but not massively so. A factor of double might be a reasonable assumption of last resort.
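The simplest possible model of this kind can be sketched as follows. It covers only the availability window (Factor 1) and the last-resort doubling mentioned above. The daily volume of 9,600 is an invented figure, and a real model would add the other factors.

```python
def average_per_minute(daily_volume: float, window_hours: float) -> float:
    """Average throughput per minute across the availability window."""
    return daily_volume / (window_hours * 60)


def peak_per_minute(daily_volume: float, window_hours: float,
                    peak_factor: float = 2.0) -> float:
    """Peak-minute estimate: average times a peak factor.

    The default of 2.0 is the text's assumption of last resort,
    to be replaced as soon as real data is available.
    """
    return average_per_minute(daily_volume, window_hours) * peak_factor


if __name__ == "__main__":
    daily = 9_600  # hypothetical orders per day
    print(peak_per_minute(daily, window_hours=8))    # 9-to-5 internal system
    print(peak_per_minute(daily, window_hours=24))   # always-open Web system
```

The same daily volume produces a much higher peak-minute target when it is crammed into an eight-hour window than when it is spread over 24 hours.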

Content

Once we've figured out a throughput target, a requirement for it needs to contain the following:

Throughput object type. State the sort of thing whose throughput is to be measured (such as new orders).
Target throughput quantity and unit time period (for example, 10 per second).
A statement about contingency (if you wish). In some circumstances, it's worth adding a contingency factor on top of the estimated throughput. (That factor is usually a semi-arbitrary percentage-say, 10 percent or 20 percent.) If you decide to do so, state the amount of contingency that's included in the target. Ordinarily you'd increase the contingency in line with your uncertainty, but that could prove expensive here (in extra hardware cost). If you include a contingency without saying so, the development team might add their own contingency as well, and no one will know what's going on: you could end up with an over-engineered system without realizing. If you don't include a contingency, say so, if there's a risk of anyone wondering.
Part of system (if relevant). A throughput requirement applies either to the system as a whole or just to a part (usually an inter-system interface). If this requirement is for a part, say which.
Justification. Where did the target figure come from? How was it calculated? What figures were used as the basis for the calculation? In only the simplest cases is a self-contained justification concise enough to fit within the requirement; otherwise, refer to a justification that resides elsewhere. Either include it as informal material in the specification or keep it externally. Referring to a sizing model is fine.

The justification might contain sensitive information that you don't want all readers of the requirements specification to see. If so, omit it from the specification. Consider omitting reference to it altogether if you don't want some readers feeling like second-class citizens.
Target achievement timeframe. How far into the system's life does the target need to be achieved? It might be immediately after it's installed, after a year, or at some distant time in the future ("eventually").
Indicative hardware description (if relevant), from Step 3 of the preceding approach.
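The content items above map naturally onto a small record, sketched here as one way to keep throughput requirements consistent. The class and field names are assumptions made for the sketch; the rendered wording loosely follows this pattern's template and example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThroughputRequirement:
    """Hypothetical record holding the content items of a throughput requirement."""
    object_type: str                       # what is measured, e.g. new orders
    quantity: int                          # target throughput quantity
    unit_period: str                       # unit time period, e.g. "second"
    part_of_system: str = "The system"     # whole system or a named part
    hardware: Optional[str] = None         # indicative hardware set-up (Step 3)
    timeframe: Optional[str] = None        # target achievement timeframe statement
    contingency: Optional[str] = None      # contingency statement
    justification: Optional[str] = None    # justification or a pointer to it

    def render(self) -> str:
        """Assemble the requirement text, skipping optional parts that are absent."""
        text = (f"{self.part_of_system} shall be able to handle {self.object_type} "
                f"at a rate of at least {self.quantity} per {self.unit_period}")
        if self.hardware:
            text += f" when using {self.hardware}"
        text += "."
        for statement in (self.timeframe, self.contingency, self.justification):
            if statement:
                text += f" {statement}"
        return text
```

Keeping the contingency as an explicit field makes it harder to omit the statement by accident, which is the risk the contingency item warns about.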

Template(s)

Summary	Definition
«Throughput type» rate	«Part of system» shall be able to handle «Throughput object type» transactions at a rate of at least «Throughput quantity» per «Unit time period» [when using «Indicative hardware set-up»]. [«Target achievement timeframe statement».] [«Contingency statement».] [«Justification statement».]

Example(s)

Summary	Definition
Order entry rate	The initial system shall be able to handle the entry of orders by customers at a rate of at least 10 per second. No contingency has been added; this rate represents the actual demand expected. See the system sizing model for details of how this figure has been arrived at. It is located at «Sizing model location».

Extra Requirements

Verifying whether the system achieves a throughput requirement can be difficult and tedious if the system itself doesn't help, so features for measuring throughput are the first candidates for extra requirements. Then we can think about steps to maximize throughput and how we want the system to react when it reaches its throughput limits. Here are some topics to consider writing extra requirements for:

Monitoring throughput. Monitoring can be divided into immediate and reflective: immediate tells us the throughput level right now; reflective provides statistics on throughput levels over an extended period, to highlight busy periods and throughput trends.
Limiting throughput. We can't stop incoming traffic directly (or, at least, it's usually too drastic to), but we can consider restricting the causes of traffic, such as limiting the number of active users, perhaps by preventing users from logging in if the number already logged in has reached the limit. This could be refined to let in some users but not others: registered customers but not casual visitors, for example. Another step could be to disable resource-intensive secondary functions at times of high load.
Maximizing throughput. What steps can we take to squeeze the most through the system? One way is to "clear the decks" during times of peak throughput: arrange for some other processing to be done at other times. That depends on how much load is imposed by other processing. If it's not much, it's not worth bothering. Also consider insisting upon separate machines for background processing.
High throughput characteristics. Computer systems, like all complex and temperamental creatures, can behave differently when pushed to their limits. The response time requirement pattern recommends putting caveats on that aspect of performance when the system is experiencing high load, but there might be others that you want to apply only when the throughput is within its stated limit.
Implementation sizing model. It's sometimes useful to have a good sizing model to help determine the hardware needed to achieve a given throughput level, particularly if you're building a product. You can make this a requirement. State who will use this model: your customers or only representatives of your organization. A requirement of this kind effectively asks the development or testing team to extend any sizing model produced during the requirements process to take into account the software's actual performance.
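As an illustration of the "limiting throughput" topic, the sketch below caps the number of active users and holds back some capacity for registered customers. The class name, its parameters, and the idea of a reserved allowance are assumptions chosen for the example, not a prescribed design.

```python
class LoginGate:
    """Hypothetical admission control: cap active users, favouring registered ones."""

    def __init__(self, max_active: int, reserved_for_registered: int = 0):
        self.max_active = max_active
        self.reserved = reserved_for_registered
        self.active = 0

    def try_login(self, registered: bool) -> bool:
        """Admit the user if capacity remains for their category."""
        # Casual visitors may not use the slots held back for registered customers.
        limit = self.max_active if registered else self.max_active - self.reserved
        if self.active >= limit:
            return False
        self.active += 1
        return True

    def logout(self) -> None:
        """Release one slot when a user leaves."""
        if self.active > 0:
            self.active -= 1
```

With `LoginGate(3, reserved_for_registered=1)`, casual visitors are turned away once two users are active, while a registered customer can still take the third slot.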

Considerations for Development

Design to maximize the efficiency of high-volume transactions. For example, don't send information more than once. And keep interactions as simple as possible: don't use two request-response pairs when one would suffice.

Even if there is no requirement for throughput monitoring, it's useful to incorporate at least a rudimentary way of showing current throughput. Find out whether an automated throughput tester is going to be purchased or built for the testing team. If so, make sure it's available to the development team for their use, too.
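A rudimentary monitor might look like the following sketch. It offers an immediate view (requests in the most recent window) and a reflective one (counts per minute). All names are illustrative, and it assumes timestamps in seconds that arrive in non-decreasing order.

```python
from collections import Counter, deque


class ThroughputMonitor:
    """Hypothetical monitor with an immediate and a reflective view of throughput."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.recent = deque()          # timestamps still inside the window
        self.per_minute = Counter()    # minute index -> number of requests

    def record(self, timestamp: float) -> None:
        """Note one handled request at the given time (in seconds)."""
        self.recent.append(timestamp)
        self.per_minute[int(timestamp // 60)] += 1

    def current_rate(self, now: float) -> int:
        """Immediate view: requests seen in the last `window_seconds`."""
        while self.recent and self.recent[0] <= now - self.window:
            self.recent.popleft()
        return len(self.recent)

    def peak_minute(self) -> tuple[int, int]:
        """Reflective view: (minute index, count) of the busiest minute so far.

        Raises ValueError if nothing has been recorded yet.
        """
        return max(self.per_minute.items(), key=lambda item: item[1])
```

In a real system `record` would be called with something like `time.time()` from the request-handling path; passing the timestamp in explicitly keeps the sketch easy to test.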

Considerations for Testing

Attempting to manually make suitably large numbers of requests to a system is, in most cases, logistically impossible. To test throughput, you need an automated way to generate a high volume of requests. You might find a product to do this job, or you might have to build your own software for the purpose (in which case treat it as a serious development effort). Whichever way you go, a good automated throughput testing tool should let you do these things:

Define the requests to submit to the system (and the expected response to each one). The two basic ways are either to pregenerate large quantities of test data or to define rules by which test data can be generated on the fly.
Start submitting requests (and stop, when you've done enough).
Dynamically change the rate at which requests are submitted. This allows you to simulate low, average, and heavy demand levels.
Monitor the response time of each request. This provides an external picture of how the system behaves.
Validate each response. This doesn't tell you about throughput per se, but being able to automatically check that large numbers of responses are as you expect is a valuable bonus.
Simulate the load on the system likely to be imposed by other activities, because it's not realistic to assume the system will be able to devote its full attention to one kind of request.
Generate reports on the system's performance. The accumulated response time data can be used to calculate throughput figures. It can also provide response time statistics: the shortest, average, and longest response times, and how response times vary with throughput.
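To make the list above concrete, here is a deliberately small sketch of such a tool. It submits requests at a chosen rate, times and validates each response, and derives throughput and response time statistics from the samples. The function names and the `send` and `expected` callables are assumptions for the example. It also submits requests one at a time, so it cannot drive a system faster than that system answers a single request; a serious tool would submit concurrently.

```python
import time
from statistics import mean
from typing import Callable, List, Tuple

Sample = Tuple[float, float, bool]  # (start time, response time, response valid?)


def run_load(send: Callable[[int], object],
             expected: Callable[[int, object], bool],
             rate_per_second: float, total: int) -> List[Sample]:
    """Submit `total` requests at roughly `rate_per_second`, one at a time."""
    interval = 1.0 / rate_per_second
    samples: List[Sample] = []
    next_due = time.perf_counter()
    for i in range(total):
        delay = next_due - time.perf_counter()
        if delay > 0:
            time.sleep(delay)              # pace submissions to the requested rate
        start = time.perf_counter()
        response = send(i)                 # submit request number i
        elapsed = time.perf_counter() - start
        samples.append((start, elapsed, expected(i, response)))
        next_due += interval
    return samples


def report(samples: List[Sample]) -> dict:
    """Derive throughput and response time statistics from recorded samples."""
    times = [elapsed for _, elapsed, _ in samples]
    first_start = min(start for start, _, _ in samples)
    last_end = max(start + elapsed for start, elapsed, _ in samples)
    duration = last_end - first_start
    return {
        "requests": len(samples),
        "failures": sum(1 for _, _, ok in samples if not ok),
        "throughput_per_second": len(samples) / duration if duration > 0 else float("inf"),
        "min_response": min(times),
        "avg_response": mean(times),
        "max_response": max(times),
    }
```

Calling `run_load` repeatedly with different `rate_per_second` values is a crude way to simulate low, average, and heavy demand; the reports from each run then show how response times vary with throughput.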

The throughput that a system can handle doesn't vary proportionately with the power of its hardware, so it's hard to figure out just what testing using a hardware set-up different from that of the production environment tells you: extrapolations are likely to be difficult and unreliable. Many hardware factors also determine a machine's overall power: the number and speed of CPUs, memory, disk drives, network bandwidth, and more. A sizing model helps, but it's still only a model and will have limited accuracy. Modify the sizing model based on observations from the real system.