HP introduced the Workload Manager (WLM) product in early 2000. It focuses on simplifying the application of shared resources to the workloads that need them. It does this by monitoring the workloads running on a consolidated server and reallocating unused resources to workloads that are experiencing an increase in load and need more resources. The original goal was to provide automatic reconfiguration of Resource Partitions. At the time, resource partitioning through the Process Resource Manager product was the only partitioning technology available on HP-UX, and HP had not yet introduced its Utility Pricing products. A lot has changed in the five years since then. Now four partitioning products and three Utility Pricing products are integrated with Workload Manager. (HP has also introduced the new Global Workload Manager; more on that later in this chapter.) WLM is described as the "intelligent policy engine" behind the Virtual Server Environment. Users specify service-level objectives for each workload, and WLM allocates resources using whichever flexing technologies are available on the system.

Note: This section provides only an overview of the HP-UX Workload Manager. More details and some usage examples are provided in Chapter 15, "Workload Manager."

WLM and the Virtual Server Environment

The key thing to understand about Workload Manager is that it is an automation tool: nothing more, nothing less. All of the flexibility that WLM takes advantage of is inherent in the underlying VSE technology. There is nothing you can do with WLM that you couldn't do manually using the standard interfaces of the other VSE products. However, doing these tasks manually would be exceptionally complex and time consuming. You would have to set up performance or utilization monitors for all of your partitions and applications.
You would then need someone watching those monitors 24 hours a day, looking for applications that are experiencing increases in load. When they saw one, they would have to find another application or partition that was underutilized and then determine how to reallocate resources. They would have to know the commands required to reallocate the resources from the idle partition to the busy one, and those commands differ depending on which partitioning technology is used on each system. This also assumes the whole exercise could be completed before the spike in load subsides. The remainder of this section discusses how WLM integrates with each of the other VSE technologies and describes some more advanced WLM capabilities.

Partitions

The WLM product integrates with all of the partitioning solutions available on HP's high-end and midrange servers.

Note: Although nPars and Integrity VMs support (or will support) Windows, OpenVMS, and Linux, WLM is an HP-UX-only product. If you want to do workload management of partitions running other operating systems on Integrity servers, skip ahead to the section on Global Workload Manager later in this chapter.

nPars with Instant Capacity

As you may recall from Part 1 of this book, nPartitions are fully electrically isolated from each other. Since WLM is a tool for automating the migration of resources between partitions, it must be able to move resources between electrically isolated partitions. This clearly cannot be done by sharing physical CPUs, so Instant Capacity CPUs are used instead. In Part 2 we described how it is possible to migrate an Instant Capacity right-to-use (RTU) license from one nPar to another. This is the capability WLM relies on to apply the available active CPU capacity to the partition that has the heaviest load. Figure 13-11 summarizes the diagrams shown back in Chapter 8.

Figure 13-11. WLM Migrating Instant Capacity RTU Licenses between nPars

As you can see in Figure 13-11, a daemon (wlmd) runs in each partition and one global daemon (wlmpard, labeled pard in the figure) runs in each complex. The wlmd daemon is responsible for allocating resources to the workloads inside the partition when there is more than one workload. It is also responsible for passing data to the wlmpard daemon describing the resources required to satisfy the workloads running in the partition. The wlmpard daemon uses this information to determine whether the active capacity in any of the nPars needs to be adjusted. If so, it deactivates a processor in an idle partition and activates a processor in a busy one. The result is that CPUs are always active in the partition that can use them; they never sit idle in one partition if they could be used by another. In Figure 13-11, the workload in nPar1 gets busy, so WLM deactivates CPUs in nPar2 and activates CPUs in nPar1. This provides the maximum capacity of eight CPUs to the workloads in nPar1 while they are busy. Then the workloads in nPar1 become idle and the workloads in nPar2 get busy. WLM deactivates CPUs in nPar1 and activates them in nPar2 to ensure that the CPUs are available to the workloads that need them. This provides the ability to flex each partition from four CPUs to eight CPUs, one CPU at a time, while both partitions are running and while maintaining full electrical isolation between them. WLM allows you to do this automatically. A key consideration in making this work is ensuring that each nPar has sufficient physical CPUs to meet the peak needs of all the workloads running there. Chapters 8 and 9 discuss how to optimize the number of active and inactive CPUs in each partition. The nice thing is that inactive CPUs cost only a fraction of what active CPUs cost.
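To make this concrete, the following sketch shows the two configuration files involved. The group name, hostname, and numeric values are hypothetical, and the exact keyword forms may vary by WLM release, so treat this as an illustration of the structure rather than a ready-to-use configuration.

```
# wlmd configuration in one nPar (hypothetical group and values).
prm {
    groups = finance : 2;
}
slo finance_slo {
    pri    = 1;
    entity = PRM group finance;
    mincpu = 100;            # with absolute_cpu_units, 100 = one CPU
    maxcpu = 800;            # allow growth to all eight CPUs
}
tune {
    wlm_interval      = 60;
    absolute_cpu_units = 1;
    primary_host = npar1-host;   # hypothetical host running wlmpard
}

# wlmpard configuration for the complex: poll the partitions'
# resource requests each interval and migrate RTU licenses.
par {
    interval = 60;
}
```

Each nPar runs its own wlmd with a file like the first one; a single wlmpard with the second file arbitrates between them.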
Effectively, HP is paying for most of the cost of your spare capacity.

Virtual Partitions

HP's virtual partitions (vPars) product has supported the migration of CPUs from one vPar to another since it was first released. WLM released a version supporting automatic movement of these CPUs about a month later. Migrating CPUs between vPars provides benefits similar to those we saw with nPars, but with one major difference. Because there is no physical isolation between the CPUs in the partitions, it is possible to deallocate a CPU from one partition and allocate that same physical CPU to another partition. It is not necessary to have idle Instant Capacity CPUs to do this. Figure 13-12 shows how this works for vPars.

Figure 13-12. WLM Migrating CPUs between vPars

The WLM configuration for vPars is similar to that for nPars. In Figure 13-12, you can see how the eight physical CPUs on the system are allocated to the different partitions as the loads on the workloads vary over time. In the first box, both vPars have four CPUs. When the load increases on the workload running in vPar1, WLM moves two CPUs from vPar2 and applies them to vPar1. Later, when the load on vPar1 decreases and vPar2 becomes busy, WLM migrates the CPUs back to vPar2; the box on the right shows six CPUs in that partition. This can all be done in real time while both partitions are under load.

Secure Resource Partitions

Secure Resource Partitions (SRPs) were the first flexible partitioning technology WLM supported. In fact, it was the only partitioning technology available on HP-UX when WLM was first introduced. SRPs allow you to run multiple workloads in a single copy of HP-UX and isolate them from each other from both a resource perspective and a security perspective. Figure 13-13 shows how SRPs work.

Figure 13-13. WLM Reallocation of CPU Shares among Secure Resource Partitions

WLM works with Secure Resource Partitions to move the boundaries of how much CPU each partition gets as the loads on the applications vary over time. If Application A gets a spike in demand, WLM pulls CPU resources from idle workloads in the other partitions and allocates them to Partition 0. If you are using FSS CPU controls for these partitions, WLM can allocate in sub-CPU increments. If you are using PSETs, it allocates with whole-CPU granularity. You can also combine FSS and PSET partitions in the same configuration. For example, suppose Partition 0 and Partition 1 are FSS partitions and Partition 2 is a PSET. If Application C requires additional resources, it will need a whole CPU. If there are sufficient idle shares in Partitions 0 and 1 to add up to a whole CPU, WLM will shrink those partitions and add a CPU to Partition 2.

WLM Memory Controls for Secure Resource Partitions

Because Secure Resource Partitions is currently the only partitioning technology that supports online migration of memory between partitions, WLM also supports this feature. However, because memory is reallocated between partitions by paging, this is not something you would want to do at every WLM interval. Instead, WLM reallocates memory when workloads activate or deactivate. This is most useful for Serviceguard packages: when a package fails over and activates on a node where WLM is running, WLM reallocates memory to ensure that the new workload gets an appropriate share. It does this for CPU allocations as well, of course.

Utility Pricing

There are three Utility Pricing technologies. Instant Capacity provides the ability to permanently activate spare capacity that is available on the system.
You can also activate this spare capacity with a Temporary Instant Capacity license, which allows you to activate Instant Capacity CPUs for short periods of time. Finally, there is Pay per use, which allows you to lease a system from HP and pay only for the capacity you use each month. WLM integrates with all of these. However, because WLM is primarily intended to reallocate resources to meet real-time workload demands, its integration with standard permanent-activation Instant Capacity is limited to reassigning Instant Capacity RTU licenses between nPars, as described in the previous section. In this section we describe how WLM integrates with Temporary Instant Capacity and Pay per use. Both of these technologies allow users to instantly activate and deactivate spare capacity when the workloads on a system require it.

Temporary Instant Capacity

As described in Chapter 8, "HP's Utility Pricing Solutions," it is possible to take advantage of available Instant Capacity processors by activating them for short periods of time using Temporary Capacity. WLM's integration with this is fairly simple: if there are not enough active CPUs on the system to meet the needs of the workloads running there, Instant Capacity CPUs are available, and Temporary Capacity is licensed on the system, WLM activates additional CPUs to meet the demand. You can also specify that only high-priority workloads may activate these processors, using the icod_thresh_priority keyword in the configuration file. One of the nicest features is that WLM activates processors only after it has exhausted every opportunity to reallocate CPUs that are already active. Only then will it consider activating additional capacity.
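A sketch of the relevant global-arbiter setting follows. The value shown is hypothetical, and you should confirm the keyword against your release's WLM documentation; the idea is that a priority threshold in the wlmpard configuration gates which SLOs are allowed to trigger paid capacity.

```
# wlmpard configuration sketch (hypothetical value).
# utilitypri sets the SLO priority at or above which WLM may
# activate Temporary Instant Capacity (or PPU) processors;
# lower-priority SLOs must make do with already-active CPUs.
par {
    interval   = 60;
    utilitypri = 1;
}
```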
In other words, if there are idle resources in any of the other Secure Resource Partitions, vPars, or nPars on the system, WLM reallocates those resources before it activates any additional Temporary Instant Capacity CPUs.

Pay Per Use

Since PPU systems are leased and the lease cost varies with the utilization of the server, customers are typically less concerned about increasing the utilization of the server and more concerned with controlling utility costs. As a result, there are often fewer workloads and partitions on PPU systems than on other servers that WLM manages. The key feature WLM provides on a PPU system is ensuring that capacity is used efficiently, which, in turn, keeps your monthly payments as low as they can be while still meeting the service-level objectives of the workloads running there. There are two types of PPU: Active CPU and Percent CPU. WLM is most useful in ensuring that idle CPUs are deactivated, which is very similar to how it manages Temporary Instant Capacity. Both Percent CPU and Active CPU PPU now allow you to activate only the CPUs you need for your workloads and keep the rest inactive. The payment you make at the end of the month is based in part on how many CPUs were active over the course of the month. WLM's normal operation is to allocate the minimum amount of CPU required: it monitors the workloads running on the system, activates an additional CPU when more resources are needed, and deactivates CPUs when they fall idle. It effectively minimizes the amount of time the CPUs are active, and thereby your cost, because it turns CPUs on as soon as they are needed and off as soon as they become idle.

Other VSE Management Tools

WLM is also integrated with the other VSE management tools. The level of integration varies by product and tends to focus on what is most useful to real customers.
Serviceguard

Since Serviceguard is a high-availability tool, the integration of WLM with Serviceguard focuses on ensuring that a workload gets the appropriate amount of resources when it fails over onto a WLM-managed system. This is depicted in Figure 13-14.

Figure 13-14. WLM Reallocation of CPU Resources after a Serviceguard Failover

As you can see, when the system is running its normal workloads, the Oracle CRM instance gets 80% of the resources of node 2. However, the SAP instance running on the other server is higher priority than the CRM database. Therefore, when the SAP instance fails over onto node 2, WLM reallocates the resources of node 2 to ensure that the SAP instance gets a larger share of the system. One key feature of the WLM integration with Serviceguard is that WLM actively monitors the workload after the failover. This is important because if the workload is not busy when the failover occurs, WLM migrates the resources back to the other workloads, even if they are lower priority. It ensures that the SAP instance gets the resources it needs, but if SAP is idle when the failover occurs (for example, in the middle of the night), WLM lets the other workloads use the resources rather than letting them go to waste.

Integrity Essentials Virtualization Manager

Earlier in this chapter we discussed the new Virtualization Manager. This product enables customers to visualize where their workloads are running, what resources those workloads are consuming, and what other workloads they are sharing resources with. The integration of the Virtualization Manager with Workload Manager is focused on configuration. In this initial release, the Virtualization Manager launches the WLM graphical user interface when you select a workload or partition where WLM is being used to control resource allocation.

Advanced WLM Features

The WLM product has some very sophisticated features. However, you should be aware of what you are getting into before implementing them. A basic WLM configuration can be set up and deployed very quickly, but when you start getting fancy, there may be unintended side effects you hadn't considered. This is why we typically recommend starting simply and expanding your configuration into these more sophisticated features after you are more familiar with how the product works.

Hierarchical Arbitration

A new feature of the 3.0 release of Workload Manager is the ability to stack partitions in any combination and take advantage of all of the flexing characteristics available to each of them. This includes the ability to activate and deactivate Temporary Instant Capacity or PPU processors if they are available. HP has pulled all resource-allocation functionality not related to SRPs into the global arbiter daemon, which is now the central coordination point for all nPar, vPar, Temporary Instant Capacity, and PPU reallocation. Figure 13-15 shows what is possible with this new global arbiter.

Figure 13-15. WLM Managing nPars, vPars, Secure Resource Partitions, and Temporary Instant Capacity Resources to Meet the Demands of a Serviceguard Failover

Figure 13-15 illustrates a Serviceguard failover and WLM's reallocation of resources. The first thing to happen is a failure of the application in secure resource partition 2.1.2. Serviceguard sees this and restarts the package in secure resource partition 1.1.2. WLM sees this happen and immediately reallocates resources between the secure resource partitions inside the vPar. If there are not enough resources in the vPar to satisfy the requirements of the workload when the failover occurs, WLM assesses the resource requirements of all of the workloads across the entire system. It reallocates Instant Capacity resources across the nPars, assigns those resources to vPar 1.1, and applies them to secure resource partition 1.1.2.
If there are not enough resources to satisfy all the workloads even after reallocating them across the entire system, WLM activates Instant Capacity CPUs on the nPar using a Temporary Instant Capacity license, if one is available. The important thing to note is that WLM minimizes the cost of Temporary Instant Capacity by first applying idle resources from any of the other partitions before resorting to activating the Instant Capacity processors. This is a very nice combination of features, but it requires a sophisticated system configuration. We recommend that you start with one or two of these features and build up the rest of the configuration over time.

Metric-Based Goals

WLM was the first goal-based workload manager available in the Unix market. It allows you to adjust the resource entitlement of a workload based on an arbitrary application metric, such as the application's response time. The concept is quite simple. WLM monitors the response time of the application at every interval. When the response time exceeds the goal you have set, WLM adds some amount of additional CPU in proportion to how far the actual value of the metric is from the goal. It then waits another interval to see what impact this had. If the response time is still above the goal, more CPU is added. This process repeats until the actual response time drops below the defined goal. When the load on the application drops and the response time falls significantly below the goal, some of the CPU is returned to the free pool so it can be allocated to other workloads that need it. Configuring WLM to use metrics is quite simple. The use of metric-based goals is considered an advanced feature because it requires quite a bit more knowledge about the application being controlled. Some of the issues include collecting the metric from the application and feeding it to WLM, and understanding how the metric responds as the workload's CPU allocation changes.
Although there are ways to work around all of these issues, they require a much more detailed understanding of the application and how it reacts to changes in the amount of CPU available.

ISV Toolkits

HP provides a number of toolkits that simplify the data-collection process for some major independent software vendor (ISV) applications. Table 13-1 lists the toolkits that come with WLM. Many of these toolkits will be supported in future versions of gWLM as well. The toolkits are shipped with the WLM product and are also available for download from software.hp.com. Although they simplify the collection of detailed performance data from these applications, you will still need to develop an understanding of how the application and its metrics react to changes in resources.
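As an illustration of how the metric-based goal pieces fit together, the sketch below ties an SLO's goal to a metric and names a data collector for it in a tune structure. The workload name, metric name, goal value, and CPU limits are all hypothetical; wlmrcvdc and wlmsend are the WLM commands for receiving and posting metric values, but check your release's documentation for exact usage.

```
# Hypothetical metric-based SLO: keep reported response time
# under 2 seconds for the order_entry workload.
prm {
    groups = order_entry : 2;
}
slo order_entry_slo {
    pri    = 1;
    entity = PRM group order_entry;
    mincpu = 100;            # with absolute_cpu_units, 100 = one CPU
    maxcpu = 600;
    goal   = metric resp_time < 2.0;
}
tune {
    absolute_cpu_units = 1;
}
# Data collector for the metric.  wlmrcvdc receives values that the
# application (or a wrapper script) posts with wlmsend, for example:
#     wlmsend resp_time 1.7
tune resp_time {
    coll_argv = wlmrcvdc;
}
```

The ISV toolkits listed in Table 13-1 essentially replace the wlmrcvdc/wlmsend plumbing with application-specific collectors, but the SLO and goal structures are the same.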