11.3 Proactive Performance Management in Production


Here are some examples of typical performance management problems that crop up:

  • Users call with a response time problem.

  • The JVM reaches an out-of-memory state and crashes.

  • Logging space on the filesystem is exhausted.

  • A database table maxes out on the extents.

  • The nightly batch runs too long or freezes.

Performance management solutions in production tend to be reactive. For example, here is a standard pattern of activity:

  1. A problem occurs and is reported.

  2. The system is analyzed to identify what caused the problem.

  3. Some corrective action is taken to eliminate the problem.

  4. Everything goes back to normal.

Reactive performance management will always be required because it is impossible to anticipate all conditions. However, proactive performance management can minimize situations in which reactive activity is required. Proactive performance management requires monitoring normal activity to identify unusual performance spikes, dips, and trends.

For example, one site with a performance monitoring policy identified a trend of lengthening page response times. Analysis of the server indicated that a cache was configured incorrectly. This was fixed and the problem was eliminated. Users noticed nothing. Without the site's proactive policy, administrators would not have known about the problem until users complained about increasingly slow performance.

Similarly, another site with a proactive performance management policy monitored JVM memory size. After an upgrade, the JVM's memory use started to increase slowly. The administrators were able to make controlled restarts of the JVM at times when site traffic was lowest until the object retention problem was fixed. Without heap size monitoring, the JVM would have crashed at some point without warning, and with no obvious cause. The fix would have been delayed until after analysis and, most likely, recurrence of the problem.
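The heap monitoring described above can be sketched with the standard `java.lang.management` API. This is a minimal illustration, not the site's actual tooling; the 80% alert threshold is an assumption chosen for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/**
 * Minimal heap-usage sampler, a sketch of the kind of proactive
 * JVM memory monitoring described above. In production this would
 * run periodically and log the ratio for trend analysis.
 */
public class HeapMonitor {

    // Hypothetical alert threshold: 80% of the maximum heap.
    static final double ALERT_RATIO = 0.80;

    /** Returns the fraction of the available heap currently in use. */
    public static double heapUsedRatio() {
        MemoryUsage heap =
            ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        // getMax() may return -1 if no maximum is defined;
        // fall back to the committed size in that case.
        if (max < 0) {
            max = heap.getCommitted();
        }
        return (double) heap.getUsed() / max;
    }

    public static void main(String[] args) {
        double ratio = heapUsedRatio();
        System.out.printf("Heap in use: %.1f%%%n", ratio * 100);
        if (ratio > ALERT_RATIO) {
            System.out.println("WARNING: heap usage above threshold");
        }
    }
}
```

Sampling this ratio at regular intervals and logging it is what makes the slow upward trend (and therefore the object retention problem) visible before the JVM reaches an out-of-memory state.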

Systems and applications change over time. Monitoring the system and having logs available for analysis helps to minimize reactive problem management.

11.3.1 Plan for Performance Factors

There are five performance factors that you should always plan for:

Workload

The amount of work that will be performed by the system. This is defined by the number of users, their activity levels, and types of activity, together with any non-user-initiated automatic activity such as background and batch processes. The performance plan needs to take into account how these factors will scale and change over time, and should consider average and peak workloads.

Throughput

The total amount of work the system can handle. Technically, throughput depends on a composite of I/O speed, CPU speed, and the efficiency of the operating system. Practically, throughput can be considered in terms of factors such as the number of transactions per minute that can be handled by the application, the amount of data or number of objects that can flow through the various subsystems of the application, and the number and sizes of requests that can be handled per minute. The performance plan should highlight expected transaction rates and data flow capacities as targets that need to be met.
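One practical way to track the transaction rates mentioned above is a simple counter that reports work completed per unit of time. The sketch below is illustrative; the class and method names are assumptions, not part of any standard API.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a simple throughput meter: records completed
 * transactions and reports a per-minute rate, which can then be
 * compared against the targets set in the performance plan.
 */
public class ThroughputMeter {

    private final AtomicLong count = new AtomicLong();
    private final long startNanos = System.nanoTime();

    /** Call once for each completed transaction. */
    public void recordTransaction() {
        count.incrementAndGet();
    }

    /** Transactions per minute since this meter was created. */
    public double perMinute() {
        double elapsedMinutes = (System.nanoTime() - startNanos) / 60e9;
        // Guard against division by zero on the very first call.
        return count.get() / Math.max(elapsedMinutes, 1e-9);
    }

    public static void main(String[] args) {
        ThroughputMeter meter = new ThroughputMeter();
        for (int i = 0; i < 1000; i++) {
            meter.recordTransaction();
        }
        System.out.printf("Rate: %.0f tx/min%n", meter.perMinute());
    }
}
```

In practice you would attach such a counter at each subsystem boundary, so that data-flow capacities can be measured where the performance plan specifies them.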

Resources

The system's hardware and software. Within the performance plan you need to anticipate increasing the amount or speed of hardware resources to scale the system, and ensure that the software (e.g., the operating system and middleware) is capable of handling the expected performance.

Scaling

The ability of the system to handle increasing numbers of users or objects, and increasing amounts of data. A system that scales well is one that can handle almost twice the throughput, without performance degradation, when the resources available to it are doubled. The performance plan should include performance-testing capabilities for the various scales the system is targeted to reach at various milestones. This includes ensuring that the requisite licensing is available for simulated users and data.

Contention

When more than one component attempts to use a resource simultaneously in a conflicting way. Contention is inevitable; for example, there are always times when multiple threads try to use the CPU. Contention limits the scalability of the application, so minimizing it is usually a major goal of performance management. The performance plan should target the identification of contended resources at all stages of development and production. Trying to predict which resources will cause contention helps you alleviate it before it occurs.
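The effect of contention, and one common way to reduce it, can be shown with a small sketch: a single lock that every thread competes for, contrasted with `java.util.concurrent.atomic.LongAdder`, which spreads updates across internal cells to reduce contention for the same counting workload.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Sketch contrasting a single contended lock with a lower-contention
 * alternative (LongAdder) for the same counter workload. Both produce
 * the same total; the difference is how much the threads conflict.
 */
public class ContentionDemo {

    static long synchronizedCounter = 0;
    static final Object lock = new Object();
    static final LongAdder adder = new LongAdder();

    /** Runs the given task concurrently on the given number of threads. */
    static void runThreads(Runnable task, int threads)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        final int perThread = 100_000;

        // Every increment competes for one lock: a contended resource.
        runThreads(() -> {
            for (int i = 0; i < perThread; i++) {
                synchronized (lock) {
                    synchronizedCounter++;
                }
            }
        }, 4);

        // LongAdder spreads updates across cells, reducing contention.
        runThreads(() -> {
            for (int i = 0; i < perThread; i++) {
                adder.increment();
            }
        }, 4);

        System.out.println(synchronizedCounter + " " + adder.sum());
    }
}
```

The same principle applies beyond counters: narrowing the scope of shared resources, or partitioning them, is usually the most effective way to relieve a contended resource once it has been identified.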



The O'Reilly Java Authors - Java™ Enterprise Best Practices
Year: 2002