Capacity planning is the process of estimating the computer resources necessary to meet IT needs, not just for now, but also in the future when the business expands.
With a Linux-on-the-mainframe environment, you will want to do capacity planning to ensure that new business projects do not get ahead of implementation capabilities. If you are looking to use Linux on the mainframe to introduce IT changes faster for your business than you have in the past, you may need to increase the frequency with which you do capacity planning. A benefit of the Linux-on-the-mainframe environment is that the raw data you need to do capacity planning can be captured by z/VM and used as input to your (or your consultant's) capacity planning model.
For example, StoreCompany has a new workload that it is adding to its system. It asks, "What will happen to the production system if OaK goes well and we get 500 hits a day and 75 purchases? Is it possible to get by with just adding more disks? Or is more memory or more CPU needed?" Acceptable response times are a key measurement that StoreCompany considers for its Web application. What if the regular catalog response time went from 0.5 seconds to 0.75 seconds? This is the kind of information that will help StoreCompany justify buying additional capacity sooner than planned.
Capacity planning is often done through modeling. The models help in identifying the key system interactions that impact throughput and capacity. As with any model, there are a number of variables that you can manipulate. With z/VM it is quite easy to obtain the input data, even for the most detailed models.
15.2.1 What makes mainframe performance different?
Real throughput in a system depends on many factors; the primary ones are processor power (processor design and clock speed), scheduling capability, and internal bandwidth for memory and I/O. Therefore, the capacity of two systems can be effectively compared only by taking all these factors into account.
If you ask a performance expert, "What is performance?" you might get the answer, "It depends." The mainframe approach to performance is, in a sense, based on efficient use of its resources. The mainframe aims to make sure there is the right balance of resources (CPU, memory, and I/O) needed for the type of work typically done by businesses.
In a balanced system, all computer resources (CPU, storage, I/O, operating system) work together without workloads causing a bottleneck or conflict for any one resource. Such a system accomplishes more than simply being busy. It provides service to all its users within the required response time and meets throughput objectives, even when the business grows or when unexpected load conditions arise. A balanced system optimizes system capacity and achieves the best performance and throughput in accordance with the goals that were set (for example, throughput, response times, number of users).
Figure 15-4 and Figure 15-5 show a graphical representation of the balance concept for a given workload. In the figures, the CPU axis represents CPU speed multiplied by the number of CPUs per system. Bandwidth is a measure of the data rate from all sources to memory, both CPU and I/O. Scheduler capability is the ability of the operating system to schedule work in a way that utilizes the system resources in the most efficient manner (with the objective of avoiding conflicts and bottlenecks).
Figure 15-4. Servers optimized for different workloads
Figure 15-5. The mainframe is balanced for commercial workloads
Figure 15-4 shows an example of a server optimized for many users who do not process much data. This might be a support center with many people who accept calls and must be able to work on the problems when they come in. Only a small number are active at a time. The system in the figure is balanced for this workload; however, this workload is not typical for a commercial system.
What happens to a small Web server run by StoreCompany if the number of users increases from tens to hundreds? For example, let us say you need to support static Web serving at 60,000 hits during an eight-hour working day. That is 125 hits per minute, each involving numerous I/O operations if all the users want to look up books or CDs on a catalog. An increase in CPU does not automatically mean an increase in bandwidth, as is shown in Figure 15-4. A mainframe traditionally has been extended in all dimensions, as is shown in Figure 15-5.
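The load figures in such a sizing estimate follow from simple arithmetic. A minimal sketch, using the workload numbers from the example above (the variable names are illustrative, not from any tool):

```python
# Rough load estimate for the static Web-serving example above.
hits_per_day = 60_000
working_hours = 8

hits_per_minute = hits_per_day / (working_hours * 60)
hits_per_second = hits_per_day / (working_hours * 3600)

print(f"{hits_per_minute:.0f} hits per minute")   # 125 hits per minute
print(f"{hits_per_second:.2f} hits per second")   # 2.08 hits per second
```

Note that an average over the working day can understate peak load; a capacity planning model would typically also account for peak-hour traffic.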
The mainframe CPU, memory hierarchy, and I/O are primarily designed for resource sharing. That does not mean that one design is right and the other is wrong. What is undeniable is the fact that they are different, and this is what creates difficulty in assessing relative capacity. If, for example, you used a benchmark of one single user running a C program with very little I/O to compare the mainframe with other machines, the mainframe would not fare well. This is because such a benchmark does not make use of the memory hierarchy, the I/O structure, the context switching capability, or the work scheduling capabilities of the mainframe. Relative system capacity is heavily dependent upon the workload. IBM measures performance on the mainframe with the use of special workloads designed to represent real, commercial workloads of different types.
Conceptually, the way performance is measured on the mainframe is similar to how it is done on other platforms: Data are collected, logged, and analyzed. However, the workload used for measuring performance is different from other platforms because of the long mainframe history. It may be useful to you to know something about how the IBM laboratories measure and publish the results of mainframe performance measurements, not least when you need to compare the capabilities of two processors. You need numbers that represent the aspects of the new processors that your work will use, not the typical MHz-type rating that is always available.
How you define performance depends on what kind of workload you run (for example, whether you have many interactive users or more batch-like workloads). The IBM mainframe performance group has defined a family of workloads that emulates varied, real-world workloads. Throughput measurements done with these workloads are known as Large Systems Performance Reference (LSPR) measurements.
One common mainframe processor performance measurement is internal throughput rate (ITR). Why the ITR and not the external throughput rate (ETR)? ETR depends on factors external to the processors, such as the network configuration. These factors vary depending on company setup. However, when deciding on the better machine to purchase, the focus is on the performance contribution of the machine, that is, its ITR.
Reliable mainframe performance for varied workloads is due to the balanced design of the internal bus bandwidth, I/O, processor speed, context switching, and scheduling capability. You can track how well your Linux-on-the-mainframe environment is working for you by gathering statistics for ITR and ETR calculations. By creating your own ITR and ETR numbers for your workload and comparing them with IBM's suite of workloads, you might find some combination of IBM workloads that have characteristics similar to yours. With that information, you can better predict how your workloads might behave on a new machine.
An LSPR ITR ratio for comparing two IBM mainframe processors is obtained by dividing the ITR of one processor running a certain workload by the ITR of another processor running the same workload. IBM labs started measuring these ratios with the IBM Model 158 in 1972. For more details, including a list of the workloads, see Large Systems Performance Reference for IBM zSeries and S/390 (http://www.ibm.com/servers/eserver/zseries/lspr/).
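The ratio itself is a single division, but it is only meaningful when both ITRs come from the same workload. A minimal sketch with hypothetical ITR figures (not actual LSPR data):

```python
def lspr_itr_ratio(itr_candidate: float, itr_base: float) -> float:
    """Relative capacity of two processors running the SAME workload.

    A ratio of 1.3 means the candidate delivers roughly 30% more
    throughput per CPU-busy second than the base machine for this
    workload. Ratios from different workloads must not be mixed.
    """
    return itr_candidate / itr_base

# Hypothetical ITRs for one workload on a base and a candidate machine:
print(lspr_itr_ratio(520.0, 400.0))  # 1.3
```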