HP-UX Workload Manager


HP introduced the Workload Manager (WLM) product in early 2000. It focuses on simplifying the application of shared resources to the workloads that need them. It does this by monitoring the workloads that are running on a consolidated server and reallocating unused resources to workloads that are experiencing an increase in load and need more resources.

The original goal was to provide for automatic reconfiguration of Resource Partitions. At the time, resource partitioning through the Process Resource Manager product was the only partitioning technology available on HP-UX and HP had not yet introduced its Utility Pricing products.

A lot has changed in the five years since then. Now four partitioning products and three Utility Pricing products are integrated with the Workload Manager. (HP has also introduced the new Global Workload Manager; more on that later in this chapter.) WLM is described as the "intelligent policy engine" behind the Virtual Server Environment. Users specify service-level objectives for each workload, and WLM allocates resources using whichever of the flexing technologies are available on the system.

note

This section only provides an overview of the HP-UX Workload Manager. More details and some usage examples will be provided in Chapter 15, "Workload Manager."


WLM and the Virtual Server Environment

The key thing to understand about Workload Manager is that it is an automation tool: nothing more, nothing less. All of the flexibility that WLM takes advantage of is inherent in the underlying VSE technology. There is nothing you can do with WLM that you couldn't do manually by using the standard interfaces of the other VSE products. However, doing these tasks manually would be exceptionally complex and time consuming.

You would have to set up performance or utilization monitors for all of your partitions and applications. You would then need someone watching those monitors 24 hours a day, looking for applications experiencing increases in load. When they saw one, they would have to find another application or partition that was underutilized and then determine how to reallocate resources. They would have to know the commands required to reallocate the resources from the idle partition to the busy one, which would differ depending on the partitioning technology in use on each system. And all of this assumes the work could be done before the spike in the application's load subsides.

The remainder of this section will discuss how WLM integrates with each of the other VSE technologies and some more-advanced WLM capabilities.

Partitions

The WLM product integrates with all of the partitioning solutions available on HP's high-end and midrange servers.

note

Although nPars and Integrity VMs either already support or will support Windows, OpenVMS, and Linux, WLM is an HP-UX-only product. If you want to do workload management of partitions running other operating systems on Integrity servers, skip ahead to the section on Global Workload Manager later in this chapter.


nPars with Instant Capacity

As you may recall from Part 1 of this book, nPartitions are fully electrically isolated from each other. Because of that isolation, WLM cannot share or migrate physical CPUs between nPars; a CPU cannot cross an electrical boundary. Instead, WLM relies on Instant Capacity CPUs.

In Part 2 we described how it is possible to migrate an Instant Capacity right-to-use (RTU) license from one nPar to another. This is the capability that WLM relies on to apply the available active CPU capacity to the partition that has the heaviest load. Figure 13-11 shows a summary of the diagrams shown back in Chapter 8.

Figure 13-11. WLM Migrating Instant Capacity RTU Licenses between nPars


As you can see in Figure 13-11, a daemon (wlmd) is running in each partition and one global daemon (wlmpard, labeled pard in the figure) is running in each complex. The wlmd daemon is responsible for allocating resources to the workloads inside the partition when there is more than one workload. It is also responsible for passing data to the wlmpard daemon that describes the resources required to satisfy the workloads running in the partition. The wlmpard daemon uses this information to determine if there is a need to adjust the active capacity in any of the nPars. If so, it deactivates a processor in an idle partition and activates a processor in a busy one. The result is that the CPUs are always active in the partition that can use them. They will never run idle in one partition if they could be used by another.

In Figure 13-11, the workload in nPar1 gets busy, so WLM deactivates CPUs in nPar2 and activates CPUs in nPar1. This provides the maximum capacity of eight CPUs to the workloads in nPar1 while they are busy. Then the workloads in nPar1 become idle and the workloads in nPar2 get busy. WLM can then deactivate CPUs in nPar1 and activate them in nPar2 to ensure that the CPUs are available for the workloads that need them.

This provides the ability to flex each partition from four CPUs to eight CPUs, one CPU at a time, while both partitions are running and while maintaining full electrical isolation between the partitions. WLM allows you to do this automatically.
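To make this concrete, here is a minimal sketch of what the per-partition half of such a configuration might look like, using WLM's configuration-file syntax (covered properly in Chapter 15). The hostname, group name, and values are hypothetical, and the exact keywords should be checked against the WLM documentation; treat this as an illustration of the shape of the configuration, not a tested example.

    # Hypothetical per-partition WLM configuration (e.g., npar1.wlm),
    # activated inside each nPar with: wlmd -a npar1.wlm
    # wlmd reports this partition's needs to the global arbiter
    # (wlmpard), which shifts iCAP usage rights between the nPars.

    primary_host = gandalf;            # host where wlmpard runs (assumed)

    prm {
        groups = OTHERS : 1, sales : 2;
    }

    slo sales_slo {
        pri = 1;                       # high priority
        entity = PRM group sales;
        mincpu = 10;                   # percent of the partition
        maxcpu = 100;
        goal = usage _CPU;             # utilization-based goal
    }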

A key consideration in making this work is that you want to ensure that each nPar has sufficient physical CPUs in the partition to meet the peak needs of all the workloads running there. Chapters 8 and 9 discuss how to optimize the number of active and inactive CPUs available in each partition. The nice thing is that the inactive CPUs only cost a fraction of the cost of active CPUs. Effectively, HP is paying for most of the cost of your spare capacity.

Virtual Partitions

HP's virtual partitions (vPars) product has supported the migration of CPUs from one vPar to another since it was first released. HP released a version of WLM that supported the automatic movement of these CPUs about a month later.

Migrating CPUs between vPars provides similar benefits to those we saw with nPars but with one major difference. Because there is no physical isolation between the CPUs in the partitions, it is possible to deallocate a CPU from one partition and allocate the exact same physical CPU to another partition. It is not necessary to have idle Instant Capacity to do this. Figure 13-12 shows how this works for vPars.

Figure 13-12. WLM Migrating CPUs between vPars


The WLM configuration for vPars is similar to that for nPars. In Figure 13-12, you can see how the eight physical CPUs on the system are allocated to the different partitions as the loads on the workloads vary over time. In the first box, both vPars have four CPUs. When the load increases on the workload running in vPar1, WLM moves two CPUs from vPar2 and applies them to vPar1. Later, when the load on vPar1 decreases and vPar2 becomes busy, WLM migrates the CPUs back to vPar2; the box on the right shows six CPUs in that partition. This can all be done in real time while both partitions are under load.
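The other half of the picture is the global arbiter's own configuration file, which is typically tiny. The sketch below assumes a par structure with an interval keyword, which matches our recollection of the wlmpard file format; verify the details in Chapter 15 before relying on them.

    # Hypothetical wlmpard configuration (e.g., vse.wlmpar),
    # activated with: wlmpard -a vse.wlmpar
    # One arbiter coordinates CPU moves across all managed vPars.
    par {
        interval = 60;   # seconds between cross-partition reallocations
    }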

Secure Resource Partitions

Secure Resource Partitions (SRPs) was the first flexible partitioning technology that WLM supported. In fact, it was the only partitioning technology available on HP-UX platforms when WLM was first introduced.

SRPs allow you to run multiple workloads in a single copy of HP-UX and isolate them from each other from both a resource perspective and a security perspective. Figure 13-13 shows how SRPs work.

Figure 13-13. WLM Reallocation of CPU Shares among Secure Resource Partitions


WLM works with Secure Resource Partitions to move the boundaries of how much CPU each partition gets as the loads on the applications vary over time. If Application A gets a spike in demand, WLM will pull CPU resources from idle workloads in the other partitions and allocate them to Partition 0. If you are using FSS CPU controls for these partitions, WLM can allocate resources in sub-CPU increments. If you are using PSETs, it will allocate in whole-CPU increments. You can also combine FSS and PSET partitions in the same configuration. For example, suppose Partition 0 and Partition 1 are FSS partitions and Partition 2 is a PSET. If Application C requires additional resources, it will need a whole CPU. If there are enough idle shares in Partitions 0 and 1 to add up to a whole CPU, WLM will shrink those partitions and add a CPU to Partition 2.
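As an illustration of that mixed FSS/PSET configuration, the following hedged sketch declares two FSS groups and one PSET-based group in a single prm structure; the group names are hypothetical and the PSET declaration syntax is our best recollection, so verify it in Chapter 15.

    # Hypothetical prm structure mixing FSS and PSET partitions.
    prm {
        groups = OTHERS : 1,
                 appA   : 2,       # FSS group: sub-CPU granularity
                 appB   : 3,       # FSS group
                 appC   : PSET;    # PSET group: whole-CPU granularity
    }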

WLM Memory Controls for Secure Resource Partitions

Additionally, because Secure Resource Partitions is currently the only partitioning technology that supports online migration of memory between partitions, WLM supports this feature as well. However, because memory is reallocated between partitions by paging, this is not something you would want to do at every WLM interval. Therefore, WLM reallocates memory when workloads activate or deactivate. This is most useful for Serviceguard packages: when a package fails over and activates on a node where WLM is running, WLM reallocates memory to ensure that the new workload gets an appropriate share. It does this for CPU allocations as well, of course.
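To give a feel for the memory controls, here is a sketch of per-group memory records in the prm structure. The gminmem and gmaxmem keyword names are assumptions based on our recollection of the WLM manual; check Chapter 15 for the real syntax.

    # Hypothetical per-group memory controls (keyword names assumed).
    # WLM applies these when a workload activates or deactivates,
    # not at every interval, because reallocation happens via paging.
    prm {
        groups  = OTHERS : 1, sap_pkg : 2;
        gminmem = sap_pkg : 30;   # reserve at least 30% of memory
        gmaxmem = sap_pkg : 60;   # never give it more than 60%
    }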

Utility Pricing

There are three Utility Pricing technologies. Instant Capacity provides the ability to permanently activate spare capacity that is already installed in the system. You can also activate this spare capacity with a Temporary Instant Capacity license, which allows you to turn on Instant Capacity CPUs for short periods of time. Finally, there is Pay per use (PPU), which allows you to lease a system from HP and pay only for the capacity that you use each month.

WLM integrates with all of these. However, because WLM is primarily intended to reallocate resources to meet real-time workload demands, its integration with standard permanent-activation Instant Capacity is limited to reassigning Instant Capacity RTU licenses between nPars, as described in the previous section. In this section we describe how WLM integrates with Temporary Instant Capacity and Pay per use, both of which allow users to instantly activate and deactivate spare capacity when the workloads on a system require it.

Temporary Instant Capacity

As was described in Chapter 8, "HP's Utility Pricing Solutions," it is possible to take advantage of available Instant Capacity processors by activating them for short periods of time using Temporary Capacity. WLM's integration with this is fairly simple: if there are not enough active CPUs on the system to meet the needs of the workloads running there, Instant Capacity CPUs are available, and a Temporary Capacity license is present on the system, WLM will activate additional Temporary Instant Capacity CPUs to meet the demand. You can also specify that only high-priority workloads may activate these processors by using the icod_thresh_priority keyword in the configuration file.
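For example, the fragment below would restrict Temporary Instant Capacity activation to SLOs at priority 2 or better. The keyword comes straight from the text above, but its placement and the surrounding structure are assumptions; consult the WLM documentation for the authoritative form.

    # Hypothetical configuration fragment: only priority-1 and
    # priority-2 SLOs may trigger Temporary Instant Capacity
    # activation (surrounding structure assumed).
    par {
        interval = 60;
        icod_thresh_priority = 2;
    }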

One of the nicest features of this integration is that WLM activates processors only after it has exhausted every opportunity to reallocate CPUs that are already active. In other words, if there are idle resources in any of the other Secure Resource Partitions, vPars, or nPars on the system, it will reallocate those resources before it activates any additional Temporary Instant Capacity CPUs.

Pay Per Use

Since PPU systems are leased and the lease cost varies with the utilization of the server, customers are typically less concerned about driving up the utilization of the server and more concerned with controlling utility costs. As a result, there are often fewer workloads and partitions on PPU systems than on other servers that WLM manages. The key feature WLM provides on a PPU system is ensuring that capacity is used efficiently, which in turn keeps your monthly payments as low as they can be while still meeting the service-level objectives of the workloads running there.

There are two types of PPU: Active CPU and Percent CPU. Both now allow you to activate only the CPUs you need for your workloads and keep the rest inactive; the payment you make at the end of the month is based in part on how many CPUs were active over the course of the month. WLM is most useful here in ensuring that idle CPUs are deactivated, which is very similar to how it manages Temporary Instant Capacity. WLM's normal operation is to allocate the minimum amount of CPU required: it monitors the workloads running on the system, activates an additional CPU when more resources are needed, and deactivates CPUs when they become idle. It effectively minimizes the amount of time the CPUs are active, and thereby your cost, because it turns CPUs on as soon as they are needed and off as soon as they fall idle.

Other VSE Management Tools

WLM is also integrated with the other VSE management tools. The level of integration varies depending on the product and tends to focus on what would be the most useful to real customers.

Serviceguard

Since Serviceguard is a high-availability tool, the integration of WLM with Serviceguard is focused on ensuring that a workload will get the appropriate amount of resources when it fails over onto a WLM-managed system. This is depicted in Figure 13-14.

Figure 13-14. WLM Reallocation of CPU Resources after a Serviceguard Failover


As you can see, when the system is running its normal workloads, the Oracle CRM instance gets 80% of the resources of node 2. However, the SAP instance running on the other server is higher priority than the CRM database. Therefore, when the SAP instance fails over onto node 2, WLM reallocates the resources of node 2 to ensure that the SAP instance gets a larger share of the system.

One key feature of the WLM integration with Serviceguard is that WLM actively monitors the workload after the failover. This is important because if the workload is not busy when the failover occurs, WLM will migrate the resources back to the other workloads, even though they are lower priority. It will ensure that the SAP instance gets the resources it needs, but if SAP is idle when the failover occurs (e.g., in the middle of the night), WLM will let the other workloads use the resources rather than letting them go to waste.
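In configuration terms, this pairing is usually expressed by tying an SLO to a package-active metric. The sketch below uses the sg_pkg_active data collector that ships with WLM to enable the SLO only while the package runs on this node; the group, package, and metric names are hypothetical.

    # Hypothetical SLO that is only in force while the Serviceguard
    # package sap_pkg is active on this node.
    slo sap_slo {
        pri = 1;
        entity = PRM group sap;
        cpushares = 75 total;              # request 75 shares while active
        condition = metric sap_pkg_up;
    }

    tune sap_pkg_up {
        coll_argv = sg_pkg_active sap_pkg; # collector shipped with WLM
    }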

Integrity Essentials Virtualization Manager

Earlier in this chapter we discussed the new Virtualization Manager. This product enables customers to visualize where their workloads are running, what resources these workloads are consuming, and what other workloads they are sharing resources with.

The integration of the Virtualization Manager with Workload Manager is focused on configuration. In this initial release, the Virtualization Manager will launch the WLM graphical user interface when you select a workload or partition where WLM is being used to control resource allocation.

Advanced WLM Features

The WLM product has some very sophisticated features. However, you should be aware of what you are getting into before implementing them. A basic WLM configuration can be set up and deployed very quickly. When you start getting fancy, there may be unintended side effects that you hadn't considered. This is why we typically recommend that you start simply and expand your configuration into these more sophisticated features after you are more familiar with how the product works.

Hierarchical Arbitration

A new feature of the 3.0 release of Workload Manager is the ability to stack partitions in any combination and take advantage of the flexing characteristics of all of them. This includes the ability to activate and deactivate Temporary Instant Capacity or PPU processors if they are available. HP has pulled all resource-allocation functionality not related to SRPs into the global arbiter daemon, which is now the central coordination point for all nPar, vPar, Temporary Instant Capacity, and PPU reallocation functions. Figure 13-15 shows what is possible with this new global arbiter.

Figure 13-15. WLM Managing nPars, vPars, Secure Resource Partitions, and Temporary Instant Capacity Resources to meet the Demands of a Serviceguard Failover


Figure 13-15 illustrates a Serviceguard failover and WLM's reallocation of resources. The first thing to happen is a failure of the application in secure resource partition 2.1.2. Serviceguard sees this and restarts the package on secure resource partition 1.1.2. WLM sees this happen and immediately reallocates resources between the secure resource partitions inside the vPar. If there are not enough resources in the vPar to satisfy the requirements of the workload when the failover occurs, it will assess the resource requirements of all of the workloads across the entire system. It will reallocate Instant Capacity resources across the nPars, assign those resources to vPar 1.1, and apply them to secure resource partition 1.1.2. If there are not enough resources to satisfy all the workloads even after reallocating them across the entire system, it will activate the Instant Capacity CPUs on the nPar using a Temporary Instant Capacity license if one is available.

The important thing to consider here is that WLM will minimize the cost of Temporary Instant Capacity by first applying idle resources in any of the other partitions before resorting to activating the Instant Capacity processors. This is a very nice combination of features, but it requires a sophisticated system configuration. We recommend that you start with one or two of these features and build up the configuration of the others over time.

Metric-Based Goals

WLM was the first goal-based workload manager available in the Unix market. It allows you to allocate the resource entitlement of a workload based on an arbitrary application metric, such as the application's response time.

The concept is quite simple, actually. WLM monitors the response time of the application at every interval. When the response time exceeds the goal you have set, WLM adds some amount of additional CPU, in proportion to how far the actual value of the metric is from the goal. It then waits for another interval to see what impact this had. If the response time is still above the goal, WLM adds more CPU. This process repeats until the actual response time drops below the defined goal. When the load on the application drops and the response time falls significantly below the goal, some of the CPU is returned to the free pool so it can be allocated to other workloads that need it.
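Expressed in WLM's configuration syntax, a response-time goal looks roughly like the sketch below. The application (or a toolkit) feeds the metric in through the wlmsend/wlmrcvdc pair that ships with WLM; the group name, metric name, and values are hypothetical.

    # Hypothetical metric-based goal: keep reported response time
    # under 2.0 seconds, flexing CPU within the configured bounds.
    slo app_resp_slo {
        pri = 1;
        entity = PRM group app;
        mincpu = 20;
        maxcpu = 80;
        goal = metric app_resp_time < 2.0;
    }

    tune app_resp_time {
        coll_argv = wlmrcvdc;   # receives values the application
                                # reports with, for example:
                                #   wlmsend app_resp_time 1.7
    }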

Configuring WLM to use metrics is mechanically quite simple. Metric-based goals are nevertheless considered an advanced feature because they require quite a bit more knowledge about the application being controlled. Some of the issues include:

  • You need to have access to some useful metric about the application. This can be response time, queue length, or the number of users or processes.

  • You need to have a metric that fluctuates in relation to the application's need for resources. If adding CPU to the application doesn't improve the value of the metric, it isn't a good candidate for a goal.

  • Sometimes these metrics vary much more quickly than WLM can react. For example, when a request queue length is used and the requests are typically handled in a very short time, the instantaneous queue length might not be a good indication of the ongoing load on the application. It would do little good for WLM to add a CPU to a workload because of a high queue length if the application can flush the queue in less than one second. In this case, an exponentially decaying average of the queue length can be used, which WLM will compute for you if you use the cntl_smooth option (a configuration sketch appears after this list). Another option is to use the change in the queue length rather than its actual value: if the queue length is increasing, the application likely needs more resources.

Although there are ways to work around all of these issues, doing so requires a much more detailed understanding of the application and how it reacts to changes in the amount of CPU available.
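As an example of the smoothing workaround mentioned in the list above, the fragment below applies exponential smoothing to a raw queue-length metric via cntl_smooth; the metric name and smoothing value are arbitrary assumptions.

    # Hypothetical smoothing of a fast-moving queue-length metric.
    tune req_queue_len {
        coll_argv   = wlmrcvdc;
        cntl_smooth = 0.8;   # 0 = no smoothing; values nearer 1
                             # weight the history more heavily
    }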

ISV Toolkits

HP provides a number of toolkits that simplify the data-collection process for some major independent software vendor (ISV) applications. Table 13-1 lists the toolkits that come with WLM. Many of these toolkits will be supported in future versions of gWLM as well.

Table 13-1. WLM ISV Toolkits

Oracle
  Allows you to specify an application-specific query and use it as a measure of the response time of the database. Alternatively, you can use the query to pull data out of the database and use that as a metric. This is useful for other applications, such as SAP, that store their own response-time data in the Oracle database.

WebLogic
  Uses the standard JMX management interface to pull the queue length and the size of the free thread pool out of a WebLogic instance.

Apache
  Runs a URL fetch through the Apache instance and times how long the server takes to respond. This is useful for getting the response time of any CGI program or even a J2EE server that uses Apache as its front end.

Job Duration/SAS
  Managing batch jobs is very different from managing an application with a long-running server process. What you want to control is how long the job takes to finish. This toolkit monitors jobs and tracks how much CPU is required to complete them. You can then specify how long you want the job to take, and WLM will set its entitlement to ensure that it gets enough CPU to complete in the specified time. There is an extension specific to SAS that allows you to instrument your SAS job with a macro that reports how far along the job is at each step. The job-duration toolkit uses this to determine whether a specific run of the job is taking more or less time than normal and adjusts the CPU allocation to meet the job-duration goal.

SNMP
  Allows you to specify any SNMP variable by its object identifier. WLM queries the variable at every interval and feeds its value in as a metric.


These are shipped with the WLM product and are also available for download from software.hp.com. Although these toolkits simplify the collection of detailed performance data from these applications, you will still need to develop an understanding of how the application and these metrics react to changes in resources.


