11.6 Infrastructure issues


One reason to do capacity planning is to purchase only the IT infrastructure that is required to run your business. The cost of infrastructure is less important if the number of virtual servers is small. Performance data is collected by agents. These agents are unrelated to Domino, and many are available to support different aspects of systems management.

If running performance (or other systems management) agents costs 2% of a processor in each Linux server, and there are only one or two servers, the expense is trivial. However, with one hundred servers, a cost of 2% each consumes two processors just for the performance agents. Any method for performance analysis or capacity planning should be evaluated for this shared environment before the configuration is finalized.
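The arithmetic behind this can be sketched in a few lines. The function name and the list of server counts are illustrative only; the 2% figure is the one used in the text.

```python
def agent_overhead_processors(servers: int, cpu_fraction_per_agent: float) -> float:
    """Return the processor equivalents consumed by agents across all servers."""
    return servers * cpu_fraction_per_agent


# 2% of a processor per server, as in the text; server counts are examples.
for n in (1, 2, 100):
    cost = agent_overhead_processors(n, 0.02)
    print(f"{n:>3} servers -> {cost:.2f} processors for agents")
```

The same calculation applies to any per-server overhead, so it can be repeated for each agent or background process under consideration.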

In an environment where resources are shared by two or more virtual servers, any resources used by one process reduce the resources available to other servers. Running unnecessary or inefficient processes thus reduces the overall workload capability of the system. All additional processes should be evaluated for suitability. For example, cron jobs that start regularly but are not always necessary represent work that is easily eliminated.

When measuring inside the Linux server, an agent is required to provide the data. Though "top" is the most common performance monitor in a dedicated single-server environment, its resource requirements make it unsuitable for a shared-resource environment.

There are two types of agents, passive and active:

An active agent repeatedly wakes up, collects data, writes it to a log file, and goes back to sleep. This may occur as often as every two seconds. The data collection is done on the server, and some additional work is required to move that data from the server to where it will be analyzed.
A passive agent (netsnmp is the example used in this book) sits idle until there is an external request for data. As a result, control over data collection is an external function rather than an internal one. The data also resides in a central repository for all servers.
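The difference between the two agent types can be illustrated with a toy model. This is only a sketch: the class names, the `read_metric` placeholder, and the one-poll-per-minute schedule are hypothetical and do not represent netsnmp or any real agent.

```python
from dataclasses import dataclass, field


def read_metric() -> float:
    """Placeholder for a real measurement (hypothetical)."""
    return 0.0


@dataclass
class ActiveAgent:
    """Wakes on its own schedule and logs locally on the monitored server."""
    interval_s: int = 2
    log: list = field(default_factory=list)

    def run(self, duration_s: int) -> None:
        # One wake-up per interval, whether or not anyone needs the data.
        for t in range(0, duration_s, self.interval_s):
            self.log.append((t, read_metric()))


@dataclass
class PassiveAgent:
    """Does no work until an external collector asks for data."""
    wakeups: int = 0

    def handle_request(self) -> float:
        self.wakeups += 1
        return read_metric()


active = ActiveAgent()
active.run(duration_s=60)

passive = PassiveAgent()
# The external collector decides when to poll and keeps the data centrally.
central_repository = [passive.handle_request()]

print(len(active.log), passive.wakeups)
```

In this model the active agent wakes 30 times in one minute, while the passive agent wakes once, when the collector polls it. The polling interval is set by the collector, not by the server being measured.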

11.6.1 Linux measurement inaccuracy

When an agent inside Linux under z/VM performs CPU measurements, Linux assumes it has 100% of the machine and reports data on that basis. This sometimes leads to greatly exaggerated values.

In Figure 11-1, each Linux guest is shown twice. The values reported by Linux itself are labeled LinuxA2, LinuxB2, and LinuxC2. The true (corrected) values are labeled LinuxA, LinuxB, and LinuxC. Thus LinuxA2 and LinuxA describe the same Linux server. The external data collector (part of ESALPS) corrected the LinuxA value using the z/VM data. The data provided by z/VM for each virtual machine is correct to nearly a microsecond.

Figure 11-1: ESAUCD4 Linux server processor reporting - showing accuracy problem
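One way to picture such a correction is proportional scaling: the per-process values reported inside Linux are scaled so that their total matches the CPU that z/VM charged to the virtual machine. This is an assumption made for illustration, not a description of how ESALPS performs the correction. The function name and all numbers below are made up.

```python
def correct_linux_cpu(linux_reported: dict[str, float], zvm_total: float) -> dict[str, float]:
    """Scale Linux-reported CPU so its total matches the z/VM figure.

    Proportional scaling is an assumption for this sketch, not the
    documented ESALPS method.
    """
    linux_total = sum(linux_reported.values())
    if linux_total == 0:
        return {name: 0.0 for name in linux_reported}
    scale = zvm_total / linux_total
    return {name: value * scale for name, value in linux_reported.items()}


# Made-up numbers: percent CPU as seen inside Linux, and as measured by z/VM.
reported = {"server": 40.0, "agent": 10.0}
corrected = correct_linux_cpu(reported, zvm_total=5.0)
print(corrected)
```

With these sample inputs, Linux reports ten times more CPU than z/VM measured, so every value is scaled down by the same factor.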

Note

This same reporting inaccuracy also happens when Linux is running natively in an LPAR. However, as noted above, z/VM has the benefit of providing correct reporting.

Capacity planning with invalid numbers can be a very serious mistake. On heavily loaded systems, Linux has been shown to report CPU requirements an order of magnitude higher than what is actually used. On servers where processors are dedicated, or shared but with very low utilization, this is not an issue. On a shared system it is an issue whether Linux runs under z/VM or in an LPAR, but it is easier to correct under z/VM.
