The Ganglia Web Package | Linux Enterprise Cluster: Build a Highly Available Cluster with Commodity Hardware and Free Software

The Ganglia Web package allows you to view a current snapshot of the performance metrics stored in the RRDtool round-robin database discussed earlier. We'll divide the Ganglia Web package into three sections: title, overview, and node snapshot. Let's look at each in turn.

Note

Our example Ganglia web page comes from the cluster named IBM Cluster at the Berkeley Millennium project (http://monitor.millennium.berkeley.edu). You can call up this public web page from your web browser to see a live demonstration of the figures we'll use in this chapter.

The Title Section

The Ganglia Web package title section is shown in Figure 18-1. By default, the package uses the Full View to display the cluster (click the Physical View link to see the same data in a different format). The data displayed is a snapshot of the performance metrics collected by gmond. To refresh the data, click the Get Fresh Data button.

Figure 18-1: The Ganglia Web package title section

Note

The webfrontend pages refresh every 300 seconds (5 minutes) by default. Like all Ganglia Web parameters, this interval can be changed in ./config.php.

The pull-down lists in the title section control what is displayed in the overview and snapshot sections. The Last menu lets you select up to a year's worth of data that will be graphed in the overview section. When you click the Metric pull-down list, you'll see the list of collected performance metrics. The default metric is load_one, which is the one-minute load average on the cluster nodes. (We'll discuss this shortly.)

Also notice that you can specify a sort order for the output of the metric you select, which is used in the display of the nodes in the node snapshot section.

The last line of text in the title section allows you to select a different cluster to monitor if you have configured your gmetad daemon to monitor the gmond daemons on more than one cluster. In this example, the group of clusters monitored can be seen when you click the Grid link. As you can see in the figure, the IBM Cluster is selected and will be used to display the information contained in the next two sections of the web page.

The Node Snapshot Section

The node snapshot section of the Ganglia Web page (Figure 18-2) contains one graph for each cluster node, representing the metric and time period you selected in the title section. The graphs are sorted in the order you select in the title section: descending, ascending, or by host name.

Figure 18-2: The Ganglia Web package node snapshot section

The node snapshot section allows you to examine the history of one performance metric on all of the cluster nodes. The graphs in Figure 18-2 show the load_one performance metric for the previous hour for each cluster node. Let's examine the load_one (one-minute load average) performance metric in more detail.

Load Average

Inside the kernel,^[6] the 1-, 5-, and 15-minute load averages are recalculated automatically every five seconds based on the current system load. The current system load is the number of processes that are running on the system plus the number of processes that are waiting to run^[7] (in other words, processes in the "TASK_RUNNING" and "TASK_UNINTERRUPTIBLE" states). On a lightly loaded system (a system that is not processor bound), the current system load fluctuates between 0 and 1.

Note

You can also use the utilities uptime, w, and procinfo from the shell prompt to see the 1-, 5-, and 15-minute load averages.
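As a minimal sketch, the same three numbers can also be read directly from /proc/loadavg on a Linux node. That file is not mentioned in the text above, and the function name show_loadavg is made up for this illustration. The optional file argument exists only so the function can be tried against a saved copy.

```shell
#!/bin/sh
# Print the 1-, 5-, and 15-minute load averages from a loadavg-format file.
# Assumption: the file defaults to /proc/loadavg, whose first three
# whitespace-separated fields are the three load averages.
show_loadavg() {
    file="${1:-/proc/loadavg}"
    # Read the first three fields; the remaining fields are ignored.
    read -r one five fifteen _rest < "$file" || return 1
    printf '1-min: %s  5-min: %s  15-min: %s\n' "$one" "$five" "$fifteen"
}

# Only call it where the file actually exists (that is, on Linux).
if [ -r /proc/loadavg ]; then
    show_loadavg
fi
```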

The kernel uses three numbers to store the 1-, 5-, and 15-minute load averages in memory; it does not keep any other historical information about the system load. Thus, to make the three load averages represent an average of the system load for the three time intervals (1, 5, and 15 minutes), the kernel uses a formula called an exponentially smoothed moving average function.^[8] Every five seconds, this function applies a portion of the current system load toward increasing or decreasing each load average, depending on whether the current system load is higher or lower than the load average that was calculated 5 seconds earlier. The longer the time interval represented by the load average value, the lower the impact the current system load has on the new load average. Thus, a sudden increase or drop in the current load affects the 1-minute load average more than it affects the 5- and 15-minute load averages.
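The smoothing described above can be sketched in a few lines of awk. This is an illustration under stated assumptions, not the kernel's code: it uses floating-point arithmetic (the kernel uses fixed-point integers), it assumes the usual decay factor of e^(-5/60) for the 1-minute average, and the function name simulate_load1 is made up for this example.

```shell
#!/bin/sh
# Simulate the 1-minute load average under a constant system load.
# usage: simulate_load1 ACTIVE SECS
#   ACTIVE = constant number of runnable processes (the current system load)
#   SECS   = how long that load has been present, starting from an average of 0
simulate_load1() {
    awk -v active="$1" -v secs="$2" 'BEGIN {
        step  = 5                  # the average is recalculated every 5 seconds
        decay = exp(-step / 60)    # share of the old 1-minute average that is kept
        load  = 0
        for (t = step; t <= secs; t += step)
            load = load * decay + active * (1 - decay)
        printf "%.2f\n", load
    }'
}

# One CPU-bound process started on an idle node:
for s in 30 60 300; do
    printf '%3d s: %s\n' "$s" "$(simulate_load1 1 "$s")"
done
# prints:
#  30 s: 0.39
#  60 s: 0.63
# 300 s: 0.99
```

Replacing 60 with 300 or 900 in the decay factor gives the corresponding sketch for the 5- and 15-minute averages, which respond more slowly to the same change in load.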

The load average is not the same as CPU utilization, even though the CPU is usually 100 percent busy when the load average is 1. For example, if a cluster node is idle except for three concurrently running processes that each consume about 33 percent of the total CPU time, the CPU utilization is 100 percent and the load average is 3.^[9] Or, to take another example, if an idle system with a load average of 0 starts running one CPU-bound process, the CPU utilization is immediately 100 percent, but because of the exponential smoothing the 1-minute load average will be only about 0.4 after 30 seconds and about 0.63 after one minute; it then approaches 1 gradually over the next several minutes.

Red Cluster Nodes

When you see a node turn red in the Ganglia Web overview section, it means the per-CPU load average on a cluster node is greater than 1. When this happens, one or more processes on a cluster node are processor bound; that is, they would run faster on a cluster node that was less heavily loaded or on a cluster node with a faster CPU.

In the Linux Enterprise Cluster, where users running reports compete with users doing data transactions, a red cluster node in the overview section may simply indicate that a report is running on one of the cluster nodes. If all of the cluster nodes are red, however, your users are waiting for CPU resources (even the users who do data transactions), and they would therefore benefit from additional or faster cluster nodes.
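The rule stated above (a node is red when its load average per CPU is greater than 1) can be sketched as a small shell function. The function name per_cpu_load and the "red"/"ok" labels are made up for this illustration; the sketch only applies the threshold from the text and is not taken from the Ganglia Web source.

```shell
#!/bin/sh
# Report the load average per CPU and whether it exceeds the threshold of 1.
# usage: per_cpu_load LOAD NCPUS
per_cpu_load() {
    awk -v load="$1" -v ncpus="$2" 'BEGIN {
        if (ncpus <= 0) exit 1          # refuse to divide by zero
        per_cpu = load / ncpus
        status = (per_cpu > 1) ? "red" : "ok"
        printf "%.2f %s\n", per_cpu, status
    }'
}

per_cpu_load 3.00 2    # prints: 1.50 red
per_cpu_load 0.80 2    # prints: 0.40 ok
```

On a live node, the two arguments could come from the first field of /proc/loadavg and from a CPU count such as the output of getconf _NPROCESSORS_ONLN; both sources are assumptions of this example rather than something the text prescribes.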

Examining the Cluster Node From the Ganglia Web Package

If you click a node in the Ganglia Web package overview section, your browser will display the Ganglia Web Host Report web page for the node. The Host Report web page contains all of the information that has been collected by gmond for one cluster node. Like the main Ganglia Web web page, the Host Report web page contains a title section that allows you to select the period of time you would like to use for the graphs presented on the page (you can select hour, day, week, month, or year), as shown in Figure 18-3.

Figure 18-3: The Host Report Title Section

The next section of this web page is the Host Report overview section. As shown in Figure 18-3, this section summarizes the string and constant metrics, and it displays three graphs of the CPU load and CPU and memory utilization on the cluster node. When the string metric gexec is set to ON, the gexec batch jobs can be run on the cluster node. This screen also lists the version of the Linux kernel running on the cluster node and displays several basic pieces of information about the cluster node.

The constant metric values, such as the total amount of memory, the number of CPUs in the system, and processor speed, normally do not change without a system reboot.

The next section of the Host Report web page shown in Figure 18-4 (again, we are now looking at just one cluster node) contains one graph for each volatile system metric you collect using the gmond daemon. (I'll describe how to add your own metric to this list later, in the "Creating Custom Metrics with gmetric" section in this chapter.) When you select a time period in the Host Report title section that is greater than one hour—such as a day or week—these graphs can help you to spot trends in system hardware resource utilization during periods of peak system load so that you can determine whether additional hardware is required to meet the processing needs of your users.

Figure 18-4: The Host Report overview section

While we won't discuss in detail how to interpret all of the data you see on these graphs, we will describe how to create and interpret a custom metric in the "Creating Custom Metrics with gmetric" section in this chapter.

Examining the Cluster Node From the Shell Prompt

When a node turns red, meaning its load average per CPU is greater than 1, you can examine the collected performance metrics with the Ganglia Web page, or you can log on to the cluster node and run top to see a list of processes that are consuming the most CPU time. The top command ranks the running processes based on the amount of CPU time each is consuming and automatically refreshes the list every five seconds. To investigate a particular process, run the strace command against the PID of the process:

 #strace -p 3456

where 3456 is the PID of the process that is hogging the CPU. strace will display the system calls made by this process on your screen. To learn more about these system calls, look at the man page for each one. Reading these manual pages may help you to see if the process is stuck in a loop and not doing anything other than wasting CPU cycles.

Note

If the process is functioning normally, see if it can be moved to a dedicated cluster node or scheduled to run after hours, so that it will not affect other users contending for CPU time.
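When you only need the PID to pass to strace, a one-shot, non-interactive listing can be easier to work with than the top screen. The following is a sketch that uses ps instead of top. The function names rank_by_cpu and top_cpu_snapshot are made up for this example, and note that the pcpu column of ps reports CPU usage averaged over the life of each process rather than over the last few seconds, so the ranking can differ from what top shows.

```shell
#!/bin/sh
# Print the N lines with the highest CPU percentage from
# "PID %CPU COMMAND" input (N defaults to 5).
rank_by_cpu() {
    LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}

# Take a snapshot of the current node. The trailing "=" signs tell ps
# to omit the column headers so that only data lines are sorted.
top_cpu_snapshot() {
    ps -eo pid=,pcpu=,comm= | rank_by_cpu "${1:-5}"
}
```

Calling top_cpu_snapshot 3 on the red node lists the three processes with the highest CPU percentage; the first column is the PID to give to strace -p.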

gstat

Installing the gmond RPM also installed gstat on your cluster nodes. You can use gstat to list information about the cluster if you need to decide manually which node has the least load. To try it out, enter:

 #gstat

This command outputs a list of cluster nodes based on gmond's knowledge of the cluster. The node with the least load is listed first. (Enter gstat -h for a list of the command line switches gstat supports.)

Note

You won't be using gstat for maintenance or monitoring. I mention it for the sake of completeness.

Running a Command on the Least-Loaded Cluster Node Using gstat

You can use the secure shell (ssh) and the output of gstat to run a command on the least-loaded cluster node, as shown in the following example:

 #ssh `gstat -m | head -1 | awk -F: '{print $1}'` uptime

This command can be executed from any cluster node, and ssh will run the command you enter on the least-loaded node (the first node returned by the gstat command). In this example, the command uptime will run on the least-loaded cluster node, and the output of the command will be displayed on your screen. (See Chapters 4 and 19 for a more detailed discussion of ssh.)
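One weakness of the one-line form is that if gstat prints nothing (for example, because gmond is not running), the host name is empty and ssh treats the word uptime as the host to connect to. A slightly more defensive sketch follows. It assumes, as the one-liner above does, that gstat -m prints host:cpus lines with the least-loaded node first; the function names first_host and run_on_least_loaded are made up for this example.

```shell
#!/bin/sh
# Print the host name from the first non-empty "host:cpus" line on stdin.
first_host() {
    awk -F: 'NF && $1 != "" { print $1; exit }'
}

# Run a command on the least-loaded node, refusing to call ssh
# when gstat did not report any host.
run_on_least_loaded() {
    host=$(gstat -m | first_host)
    if [ -z "$host" ]; then
        echo "no host reported by gstat" >&2
        return 1
    fi
    ssh "$host" "$@"
}
```

With these functions loaded, run_on_least_loaded uptime behaves like the one-liner when gstat reports at least one node and exits with an error message otherwise.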

^[6]See sched.h in the kernel source tree.

^[7]Processes in the run queue or a short-term sleep state.

^[8]See http://www.teamquest.com/html/gunther/ldavg1.shtml for complete details.

^[9]That is, the one-minute load average approaches 3 once the processes have been running for several minutes.