Global Workload Manager Overview | The HP Virtual Server Environment: Making the Adaptive Enterprise Vision a Reality in Your Datacenter

Global Workload Manager (gWLM) is the second-generation workload management application in HP's Virtual Server Environment. gWLM's architecture provides a centralized management model for defining workloads and their associated policies. This model enables resource sharing through several of the dynamic system capabilities in HP's Virtual Server Environment, such as virtual partitions (vPars), processor sets (PSETs), and fair-share scheduler (FSS) groups. Associating policies with workloads increases system utilization by increasing the level of resource sharing between workloads. Alternatively, policies can be used to isolate resources for specific workloads.

Figure 19-1 shows an example configuration for gWLM. This example illustrates gWLM's flexibility and power in its management of workloads. The Central Management Server (CMS) hosts the HP Systems Insight Manager application, the gWLM CMS daemon, and the gWLM web-based graphical user interface (GUI). The gWLM GUI is tightly integrated with HP Systems Insight Manager and communicates with the gWLM CMS daemon. The gWLM CMS daemon communicates with the gWLM agents running on each operating system under the control of gWLM.

Figure 19-1. Global Workload Manager Example Configurations

In the CMS depicted in the diagram, four Shared Resource Domains are being managed. The first Shared Resource Domain is SRD 1. This SRD is in managed mode, which means that gWLM is monitoring workload resource utilization and is actively making changes to resource allocation for each compartment. SRD 1 contains three vPars that are all running HP-UX. Notice that the gWLM CMS daemon communicates with a gWLM managed node agent running on each of the vPars.

Important

Shared Resource Domains containing vPars must reside within the same nPar or stand-alone server. This is because gWLM must have the ability to migrate resources between the compartments within the Shared Resource Domain.

The second SRD shown is SRD 2, which consists of four FSS groups running within a single operating system. Since all of the compartments are contained within a single operating system, only one gWLM agent is required on the system. The gWLM CMS daemon communicates with the gWLM agent to send configuration changes and collect historical utilization data. The gWLM agent is responsible for adjusting the size of each compartment based on the policies and resource utilization metrics of its workloads. This SRD is also in managed mode.

The third SRD, SRD 3, is in advisory mode, which means that gWLM will not make changes to the PSET compartments on this system. Instead, gWLM will provide graphs and reports that show the resource utilization and the associated resource adjustments it would make if it were in managed mode. SRD 3 contains two processor sets whose sizes could be adjusted based on resource utilization if the SRD were changed from advisory to managed mode.

Finally, SRD 4 shows gWLM's ability to manage workloads running on the Linux operating system. When using Linux, processor sets are the only type of compartment supported. SRD 4 is in managed mode. The same CMS can manage a heterogeneous environment containing HP-UX, Linux, and OpenVMS operating systems (OpenVMS is not shown in the diagram but functions similarly to Linux and HP-UX). Each SRD must contain homogeneous compartment types and operating systems, but at the CMS level, gWLM can manage a variety of compartment types and operating systems.
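The relationship between SRDs, modes, and compartments described above can be sketched as a small data model. This is an illustration only, not gWLM's actual data structures: the class names, field names, and the `is_homogeneous` check are hypothetical, and the operating system assigned to SRD 3 is an assumption because the text does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):
    MANAGED = "managed"    # gWLM actively adjusts resource allocation
    ADVISORY = "advisory"  # gWLM only reports the adjustments it would make


@dataclass(frozen=True)
class Compartment:
    name: str
    kind: str  # "vpar", "fss", or "pset"
    os: str    # "HP-UX", "Linux", or "OpenVMS"


@dataclass
class SharedResourceDomain:
    name: str
    mode: Mode
    compartments: List[Compartment]

    def is_homogeneous(self) -> bool:
        """True when all compartments share one type and one operating system."""
        kinds = {c.kind for c in self.compartments}
        oses = {c.os for c in self.compartments}
        return len(kinds) <= 1 and len(oses) <= 1

    def will_reallocate(self) -> bool:
        """Only managed SRDs have their compartments resized."""
        return self.mode is Mode.MANAGED


# SRD 1 from the text: three HP-UX vPars in managed mode.
srd1 = SharedResourceDomain(
    "SRD 1", Mode.MANAGED,
    [Compartment(f"vpar{i}", "vpar", "HP-UX") for i in (1, 2, 3)],
)

# SRD 3 from the text: two processor sets in advisory mode (OS assumed).
srd3 = SharedResourceDomain(
    "SRD 3", Mode.ADVISORY,
    [Compartment("pset1", "pset", "HP-UX"), Compartment("pset2", "pset", "HP-UX")],
)

# A hypothetical invalid SRD that mixes compartment types and operating systems.
mixed = SharedResourceDomain(
    "invalid", Mode.MANAGED,
    [Compartment("vpar1", "vpar", "HP-UX"), Compartment("pset1", "pset", "Linux")],
)
```

In this sketch, a single CMS could hold `srd1` and `srd3` side by side even though they use different compartment types, while `mixed` would be rejected by the homogeneity check.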

Architecture of the Global Workload Manager Central Management Server

The diagram shown in Figure 19-2 provides a more detailed view of the gWLM CMS architecture, which is quite simple. The two main functions are the graphical user interface and a daemon that handles background tasks that need to be running even when no users are actively using the user interface.

Figure 19-2. Architecture of the gWLM Central Management Server

The gWLM screens in the GUI rely on data in the gWLM database on the CMS. The gWLM GUI also interacts with the agents on the managed systems for status display, real-time graphing, and deploying configuration data.

The gWLM daemon running on the CMS ensures that the agents always have a mechanism to upload historical data into the CMS database. It is also responsible for forwarding events to the HP SIM event management system.

Architecture of Global Workload Manager's Managed Node

The diagram shown in Figure 19-3 illustrates the architecture of gWLM's managed node. Each node being managed by gWLM must have a gWLM agent running. The major functions performed by the gWLM agent include:

Discovery: The agent discovers the partition configuration of the system. This information is passed to the CMS and is displayed to the administrator when he or she is configuring workloads and policies for a system.
Application Manager: When the local system has multiple workloads that are being controlled by PSET or FSS compartments, it is necessary to specify which processes on the system should run in each of the compartments. The application manager is responsible for ensuring that these processes are assigned to the correct compartment.
Data Collection and Aggregation: The data that gWLM collects in order to manage workloads is stored locally. It is then aggregated and passed to the CMS for storage in the database.
Shared Resource Domain Manager: The SRD manager is responsible for communicating with other gWLM agents when the shared resource domain includes multiple partitions running separate operating systems. One agent is automatically elected the master and is responsible for resource arbitration for the entire SRD. Once the master is elected, all the other nodes pass their resource requirements to the master for arbitration.
Policy Arbiter: The policy arbiter is responsible for taking inputs from all the workloads in the shared resource domain and deciding how resources will be allocated. In an SRD with multiple OS images, this is performed by the master agent.
Workload Controllers: The workload controllers are responsible for collecting information specific to each of the local workloads and deciding whether the workload needs more or fewer resources to satisfy the policy associated with the workload. This information is then passed to the arbiter, which makes resource allocation adjustments as necessary.
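The hand-off from workload controllers to the policy arbiter can be pictured as follows. The text does not describe how the arbiter resolves competing requests, so the proportional scaling used here is an assumption chosen only to make the flow concrete; the function name `arbitrate` and its parameters are hypothetical.

```python
from typing import Dict


def arbitrate(requests: Dict[str, float], total_cpus: float) -> Dict[str, float]:
    """Turn per-workload CPU requests into allocations.

    `requests` maps a workload name to the number of CPUs its workload
    controller asks for. If the requests fit within `total_cpus`, each
    one is granted as-is. Otherwise every request is scaled down by the
    same factor (an assumed rule, not gWLM's documented behavior).
    """
    demand = sum(requests.values())
    if demand <= total_cpus:
        return dict(requests)
    scale = total_cpus / demand
    return {name: cpus * scale for name, cpus in requests.items()}


# Requests fit: both workloads receive what their controllers asked for.
fits = arbitrate({"web": 2.0, "batch": 2.0}, total_cpus=8.0)

# Requests exceed capacity: both are scaled by 4 / 8 = 0.5.
contended = arbitrate({"web": 6.0, "batch": 2.0}, total_cpus=4.0)
```

In an SRD that spans several OS images, only the master agent would run a step like `arbitrate`; the other agents would simply forward their controllers' requests to it.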

Figure 19-3. Architecture of the gWLM Agent

In an SRD with multiple partitions running separate OS images, one of the agents is elected the master, and this agent is the only one that performs arbitration. To prevent the master from becoming a single point of failure for the entire SRD, the agents are designed to reconnect to one another if they lose communication with the master. If the master node fails or its agent cannot be reached by the other nodes, the other agents in the SRD renegotiate and elect a new master. This occurs only if the master is the only agent that fails. If multiple partitions fail at the same time, gWLM assumes there has been a catastrophic failure and stops attempting to reallocate resources until at least n-1 of the SRD's n agents are running again.
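The failure-handling rules in the preceding paragraph can be expressed as a short decision function. This is a minimal sketch under stated assumptions: the text does not say how agents negotiate a new master, so picking the alphabetically first surviving agent is a placeholder, and treating the loss of a single non-master agent as "continue" is also an assumption. The function name `evaluate` and the action labels are hypothetical.

```python
from typing import Optional, Set, Tuple


def evaluate(all_agents: Set[str], running: Set[str],
             master: str) -> Tuple[str, Optional[str]]:
    """Return (action, master) for the current state of an SRD.

    Actions:
      "continue" - arbitration proceeds under the current master
      "elect"    - only the master failed, so a new one is chosen
      "suspend"  - two or more agents failed; reallocation stops until
                   at least n-1 of the n agents are running again
    """
    failed = all_agents - running
    if len(failed) >= 2 or not running:
        return ("suspend", None)
    if failed == {master}:
        # Placeholder election rule (assumption): first name in sort order.
        return ("elect", min(running))
    return ("continue", master)


agents = {"node-a", "node-b", "node-c", "node-d"}

healthy = evaluate(agents, agents, master="node-a")
master_down = evaluate(agents, agents - {"node-a"}, master="node-a")
two_down = evaluate(agents, agents - {"node-a", "node-b"}, master="node-a")
```

Because the suspend condition is expressed as "two or more agents failed", it is equivalent to the n-1 threshold in the text: once only one agent is missing, the function returns either "elect" or "continue" again.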

Global Workload Manager Policies

Global Workload Manager supports several different types of policies for controlling resource allocation to workload compartments. gWLM is shipped with a set of default policies that are commonly used. If the default policies are not appropriate for a given workload, new policies can be created or the default policies can be modified. The following four types of policies are available in gWLM:

Fixed policies guarantee that a workload compartment has a fixed resource allocation. These policies are satisfied before any other type of policy is considered. Using fixed policies allows workloads to receive a constant share of the system's resources; however, fixed policies may result in lower system utilization because resources that are not in use cannot be shared with other workloads.
Utilization policies are based on specified target utilization values. When the workload's CPU utilization goes above the upper target utilization value, such as 85%, gWLM will allocate more resources to bring the utilization below the 85% target. Similarly, when the workload's CPU utilization drops below the lower target utilization value, such as 60%, gWLM will remove CPU resources from the workload compartment. Using these two example target values, gWLM will attempt to keep the resource utilization of the compartment between 60% and 85%.
Own-Borrow-Lend policies allow a specific amount of CPU resources to be owned by each workload compartment. In addition, these policies specify a minimum number of CPU resources. The difference between the owned CPU resources and the minimum is the amount that can be lent to other workloads when they are not being utilized by the owning workload. Finally, the maximum value specified defines the maximum amount of CPU resources that should be allocated to the compartment. The difference between the maximum and the owned CPU resources is the amount of CPU resources the workload is allowed to borrow. Own-borrow-lend policies are also referred to as OwnBorrow policies in the gWLM software.
Custom policies allow workload-specific metrics to be configured. gWLM uses the defined metric to adjust resource allocation based on how the current metric compares to the target value.
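The arithmetic behind utilization and own-borrow-lend policies can be illustrated with a short sketch. The 60% and 85% thresholds are the example values from the text; the function name, the class name, its fields, and the sample CPU counts are hypothetical and do not correspond to gWLM's actual configuration syntax.

```python
from dataclasses import dataclass


def utilization_decision(utilization_pct: float,
                         low: float = 60.0, high: float = 85.0) -> str:
    """Decide how a utilization policy reacts to the measured CPU utilization."""
    if utilization_pct > high:
        return "add"     # above the upper target: allocate more CPU
    if utilization_pct < low:
        return "remove"  # below the lower target: take CPU away
    return "hold"        # inside the target band: leave the allocation alone


@dataclass(frozen=True)
class OwnBorrowLend:
    minimum: float  # CPU resources the workload always keeps
    owned: float    # CPU resources the workload owns
    maximum: float  # most CPU resources the workload may ever receive

    def __post_init__(self) -> None:
        if not (self.minimum <= self.owned <= self.maximum):
            raise ValueError("expected minimum <= owned <= maximum")

    @property
    def lendable(self) -> float:
        """Owned resources that may be lent out when idle: owned - minimum."""
        return self.owned - self.minimum

    @property
    def borrowable(self) -> float:
        """Extra resources the workload may borrow: maximum - owned."""
        return self.maximum - self.owned


# Hypothetical policy: keep at least 1 CPU, own 4, never exceed 8.
policy = OwnBorrowLend(minimum=1, owned=4, maximum=8)
```

With these sample values, the workload can lend up to 3 CPUs to other workloads when it is idle and borrow up to 4 additional CPUs when it is busy.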