Deploying a Network-Monitoring System


The open source community offers many network-monitoring tools. Some started off as simple ping scripts and evolved into stable and commercial-quality network-monitoring products. Some of the popular tools are as follows:

  • Big Brother

  • BigSister

  • NMIS (Network Management Information System)

  • OpenNMS

  • Nagios/Netsaint

  • Spong

Whereas the overall goal of each tool is to monitor a network, each tool is unique in terms of installation, configuration, and architecture. Many of these tools have a widely satisfied user base. In the following sections, you learn about deploying Nagios and Big Brother.

These tools were selected because of their community-wide popularity and support. Additionally, each of these tools offers plug-in features for customizing or enhancing monitoring capabilities.

Deploying a Linux-Based Big Brother Network-Monitoring System

Big Brother offers the following advantages:

  • Works on both Linux and Windows machines

  • Has been in development since 1997

  • Is relatively simple to install

  • Provides a web-based graphical user interface (GUI) for monitoring and reporting

  • Is robust and scalable; with the BBGen add-on (described later in this chapter), it has been reported to monitor up to 1000 nodes

  • Uses ICMP to monitor network nodes

  • Provides built-in plug-ins to monitor services such as Hypertext Transfer Protocol (HTTP), Domain Name System (DNS), File Transfer Protocol (FTP), Simple Mail Transfer Protocol (SMTP), Post Office Protocol 3 (POP3), disk space, and CPU utilization on servers

  • Boasts a wide user base and support community on the web

Deploying the Linux version of Big Brother involves three steps: installing the Big Brother Server on the Linux computer, configuring it, and running it on the network.

Installing Big Brother in Linux

A Big Brother Server on Linux requires the following software to work correctly:

  • A C compiler (for example, GCC)

  • A web server (for example, Apache)

Installation of these programs is beyond the scope of this discussion. For installation details, refer to the documentation for your Linux distribution.

Additionally, before installing Big Brother on the target Linux computer, you must create a new user and a group (for example, username bb and group name bb) using the adduser system command. This username and group name are used for running the Big Brother daemon, because for security reasons, Big Brother does not run as a root user. For the benefit of new Linux users, Example 5-1 shows the commands used in Debian-Linux to create a user bb and a group bb with the home directory /home/bb.

Example 5-1. Creating a User and Group in Debian-Linux
 linuxbox:~# adduser bb
 Adding user bb...
 Adding new group bb (1003).
 Adding new user bb (1003) with group bb.
 Creating home directory /home/bb.
 Copying files from /etc/skel
 Enter new UNIX password:
 Retype new UNIX password:
 passwd: password updated successfully
 Changing the user information for bb
 Enter the new value, or press return for the default
         Full Name []: Big Brother User
         Room Number []:
         Work Phone []:
         Home Phone []:
         Other []:
 Is the information correct? [y/n] y
 linuxbox:~#

After defining the new user and group, you can install Big Brother using the source code files. The source code is available as a zipped tar file at http://www.bb4.org. Download the latest source code and unpack the zipped tar file using the tar xzvf bigbrother-source-file-name command. At the time of this writing, the Big Brother source code (in zipped tar format) contained both the Big Brother Server and the Big Brother Client. The Big Brother Client monitors local resources, such as CPU and disk activity, on a remote Windows or Linux computer and is beyond the scope of this discussion. After uncompressing the original source file, you might need to further extract the server files using the tar xvf BBSVRxxx.tar command. The unpacked server source files contain the bbconfig script in the install directory. The purpose of the bbconfig script is to automate the installation process. Run the bbconfig script and follow the prompts to complete the installation of Big Brother Server.

A sample installation is shown in the command-line interface (CLI) session in Example 5-2. The comments (lines beginning with #) explain each step.

Example 5-2. Installing Big Brother in Debian-Linux
 # The Big Brother zipped tar file is in the home directory
 linuxbox:/home/bb# ls
 bb-1.9e.tar.gz
 # Uncompress the zipped tar file
 linuxbox:/home/bb# tar xzvf bb-1.9e.tar.gz
 BB.README.FIRST
 BBSVR-bb1.9e-btf.tar
 BBCLT-bbc1.9e-btf.tar
 # Extract the Big Brother Server files from the tar archive
 linuxbox:/home/bb# tar xvf BBSVR-bb1.9e-btf.tar
 # Verify the contents of the directory
 linuxbox:/home/bb# ls -l
 total 1984
 -rw-r--r--   1  200 daemon     305 Jan  2 2004 BB.README.FIRST
 -rw-r--r--   1  200 daemon  406528 Jan  2 2004 BBCLT-bbc1.9e-btf.tar
 -rw-r--r--   1  200 daemon 1147392 Jan  2 2004 BBSVR-bb1.9e-btf.tar
 -rw-r--r--   1 root staff   447216 Apr 30 2004 bb-1.9e.tar.gz
 drwxr-sr-x  10 root staff     4096 Apr 30 2004 bb1.9e-btf
 # Create a link /home/bb/bb to the new directory "bb1.9e-btf",
 # per the instructions in the accompanying README.INSTALL file
 linuxbox:/home/bb# ln -s /home/bb/bb1.9e-btf /home/bb/bb
 # Run the bbconfig script to begin the installation routine
 linuxbox:/home/bb# /home/bb/bb/install/bbconfig

Configuring Big Brother Using the Text Files

Big Brother is mainly configured through the bb-hosts file, a text file located in the $BBHOME/etc directory. ($BBHOME stands for the home directory of Big Brother; in this example, it is /home/bb/bb, so the file is /home/bb/bb/etc/bb-hosts.) The bb-hosts file contains the list of hosts to be monitored. The format for adding a host entry in the bb-hosts file is as follows:

  IP-ADDR      HOSTNAME       # DIRECTIVES 

For example, to test the availability of the host Dallas-router by pinging its IP address 192.168.0.10, the entry in the bb-hosts file is as follows:

 192.168.0.10      Dallas-router   # testip 

Table 5-1 provides a partial list of directives and their details.

Table 5-1. bb-hosts Service Directives

      Directive      Explanation
  1   testip         The most useful directive; instructs Big Brother to test the node by pinging its IP address rather than using its host name.
  2   BBDISPLAY[*]   This host displays the HTML results; more than one can be defined for redundancy.
  3   BBPAGER[*]     This host acts as the notification server that processes alerts and informs Netadmins; more than one can be defined for redundancy.
  4   BBNET[*]       This host monitors the network services; more than one can be defined for redundancy.
  5   ftp            Tests FTP service.
  6   smtp           Tests SMTP service.
  7   telnet         Tests Telnet service.
  8   ssh            Tests SSH service.
  9   noping         Specifies no ping test for this host and displays a clear dot.
 10   noconn         Specifies no ping test for this host and does not generate a colored dot.
 11   !              Tests whether a service is not running. For example, the !telnet directive tests whether Telnet is not running; very useful for monitoring unauthorized activities on a host.

[*] A single server is typically configured to perform the three roles of BBDISPLAY, BBPAGER, and BBNET.

To control the HTML tables in the Big Brother output pages, use the directives defined in Table 5-2.

Table 5-2. bb-hosts HTML Table Directives

      Directive        Explanation
  1   group            Defines a block of hosts to be grouped in the same HTML table.
  2   group-compress   Identical to the group directive, except that it displays only the services (columns) that contain data for that group.
  3   group-only       Creates a table with only the columns defined in the directive. The columns are delimited with the pipe symbol (|).


Based on the discussion of the various directives for the bb-hosts file, Example 5-3 shows a sample file. Note the first line, where multiple directives are defined for the Linux server with the host name linuxbox; the server acts as BBPAGER, BBNET, and BBDISPLAY. The remaining lines instruct Big Brother to monitor four network nodes by pinging their IP addresses, and the group-compress directive displays their availability status in a single table on the HTML page.

Example 5-3. Sample bb-hosts File
 192.168.0.30   linuxbox        # BBPAGER BBNET BBDISPLAY http://linuxbox/
 group-compress <H3><I>Network Devices</I></H3>
 192.168.0.10   Dallas-router   # testip
 192.168.0.20   Dallas-Firewall # testip
 192.168.0.50   Dallas-Switch   # testip
 192.168.0.100  FileServer      # testip
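To make the line format concrete, the following Python sketch splits bb-hosts entries into their IP address, host name, and directives, and records which group title each host falls under. It is only an illustration and is not part of Big Brother: BBHost and parse_bb_hosts() are hypothetical names, and only host lines, group lines, blank lines, and whole-line comments are handled. The bbchkhosts.sh script remains the authoritative check for a real file.

```python
from __future__ import annotations

from dataclasses import dataclass, field

GROUP_KEYWORDS = ("group", "group-compress", "group-only")


@dataclass
class BBHost:
    ip: str
    hostname: str
    directives: list[str] = field(default_factory=list)
    group: str | None = None  # title of the enclosing group line, if any


def parse_bb_hosts(text: str) -> list[BBHost]:
    """Split bb-hosts lines into IP address, host name, and directives.

    Simplified model: 'IP HOSTNAME # DIRECTIVES' host lines, group title
    lines, blank lines, and whole-line comments. Nothing else is modeled.
    """
    hosts: list[BBHost] = []
    current_group: str | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        parts = line.split(None, 1)
        if parts[0] in GROUP_KEYWORDS:
            # Everything after the keyword is treated as the group title.
            current_group = parts[1] if len(parts) > 1 else ""
            continue
        host_part, _, directive_part = line.partition("#")
        fields = host_part.split()
        if len(fields) != 2:
            raise ValueError(f"malformed host line: {raw!r}")
        ip, hostname = fields
        hosts.append(BBHost(ip, hostname, directive_part.split(), current_group))
    return hosts


SAMPLE = """\
192.168.0.30 linuxbox # BBPAGER BBNET BBDISPLAY http://linuxbox/
group-compress <H3><I>Network Devices</I></H3>
192.168.0.10   Dallas-router # testip
192.168.0.100  FileServer # testip
"""

for host in parse_bb_hosts(SAMPLE):
    print(host.ip, host.hostname, host.directives, host.group)
```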

To check the bb-hosts file for errors, run the bbchkhosts.sh script, which is located in the $BBHOME/etc directory, as follows:

  bb@linuxbox:~/bb/etc$ ./bbchkhosts.sh
  If any comments are displayed, please fix the entries in your configuration
  Note that some error messages may be for tags of external scripts

Running the Big Brother Server

After checking the bb-hosts file for errors, switch to user bb with the su bb command. Remember, you cannot run Big Brother as the root user. Next, start the Big Brother Server with the runbb.sh script located in the $BBHOME directory; the same script is also used to stop or restart the server. The following session shows the server being started:

  linuxbox:/home/bb/bb# su bb
  bb@linuxbox:~/bb$ ./runbb.sh start
  Starting Big Brother
          Starting Big Brother Daemon (bbd)...
          Starting Network tests (bb-network)...
          Starting Display process (bb-display)...
  Big Brother 1.9e started
  bb@linuxbox:~/bb$

After starting the Big Brother Server, you can view the network status by pointing your web browser to the URL http://bigbrother-server-ip-address/bb/bb.html, as shown in Figure 5-1.

Figure 5-1. Big Brother Output Web Page


Note

The Big Brother screen captures shown in this chapter are in grayscale and do not indicate the true color of the web page output.


The dot next to each host name indicates the status of that host: green indicates normal operation, red indicates trouble, and yellow indicates a warning (the host has a problem but is not yet down or unreachable). The background color also indicates the overall health of the network: a green background means that all the monitored services are working, and a red background indicates a network issue.

If Big Brother is monitoring a large number of nodes and services, the main page (bb.html) becomes large and difficult to navigate. In that case, Netadmins can use the bb2.html page, which provides a summarized view that shows only the hosts whose current status is not green. If all the hosts are up, the summary page shows the message All Monitored Systems OK. The summary page also lists all events in the last 240 minutes, as shown in Figure 5-2. This view allows Netadmins to quickly assess the current network status and recent history.

Figure 5-2. Big Brother Summary Page


To view a historic availability report for the network, use the reporting function through the following URL:

http://bigbrother-server-ip-address/bb/help/bb-rep.html

Choose the starting and ending dates and click the Generate Report button. The availability of each host is indicated in terms of percentage. A solid green dot next to the host name indicates 100 percent availability.
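The percentage itself is simple arithmetic: the fraction of the report window during which the host was up. The following Python sketch shows that calculation for a list of outage intervals. It is not Big Brother's report code; availability_percent() is a hypothetical helper, the dates are made up, and overlapping outages are assumed not to occur.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def availability_percent(start: datetime, end: datetime,
                         outages: list[tuple[datetime, datetime]]) -> float:
    """Percentage of the report window [start, end) during which a host was up.

    Outages are clipped to the window and assumed not to overlap each other.
    """
    window = (end - start).total_seconds()
    if window <= 0:
        raise ValueError("end must be after start")
    down = 0.0
    for o_start, o_end in outages:
        lo, hi = max(o_start, start), min(o_end, end)
        if hi > lo:
            down += (hi - lo).total_seconds()
    return 100.0 * (window - down) / window


# Hypothetical 30-day report with a single 6-hour outage (714 of 720 hours up).
start = datetime(2004, 4, 1)
end = start + timedelta(days=30)
outage = (datetime(2004, 4, 10, 2, 0), datetime(2004, 4, 10, 8, 0))
print(f"{availability_percent(start, end, [outage]):.2f}%")  # 99.17%
```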

Tips for Advanced Users

The following sections provide some tips that are helpful in fine-tuning a Big Brother Server.

Change Notification Interval

By default, Big Brother runs the tests and sends notifications every 300 seconds. To change this interval, edit every occurrence of BBSLEEP in the $BBHOME/runbb.sh script; note that BBSLEEP=300 appears in four places in the script. Changing BBSLEEP also changes the rate at which the web pages are generated.
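If you prefer to script this change rather than edit runbb.sh by hand, the following Python sketch rewrites every BBSLEEP assignment in one pass and keeps a backup copy. It assumes the assignments look like BBSLEEP=300 (optionally quoted); set_bbsleep() and update_runbb() are hypothetical helpers, so review the result before restarting Big Brother.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches BBSLEEP=300 or BBSLEEP="300", but not $BBSLEEP references.
BBSLEEP_RE = re.compile(r'(?<![$\w])(BBSLEEP=)"?\d+"?')


def set_bbsleep(script_text: str, seconds: int) -> tuple[str, int]:
    """Replace every BBSLEEP value; return the new text and the number of edits."""
    if seconds <= 0:
        raise ValueError("BBSLEEP must be a positive number of seconds")
    return BBSLEEP_RE.subn(lambda m: f"{m.group(1)}{seconds}", script_text)


def update_runbb(path: Path, seconds: int) -> int:
    """Rewrite runbb.sh in place after saving a runbb.sh.bak copy."""
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    updated, count = set_bbsleep(original, seconds)
    path.write_text(updated)
    return count


# Example (run as user bb; the path assumes $BBHOME=/home/bb/bb).
# The text above says the count should be 4:
# print(update_runbb(Path("/home/bb/bb/runbb.sh"), 120))
```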

Note

In some cases, the default BBSLEEP value of 300 seconds (5 minutes) is too long, and outages lasting less than 300 seconds can go unnoticed. On the other hand, reducing this parameter to a very low value (such as 5 seconds) causes high CPU utilization and traffic on the Big Brother Server, often resulting in incomplete tests and false results. The right value depends on various factors, such as the number of hosts and services being monitored, and is left to the discretion of the Netadmin.


Sending E-Mail Notifications

Big Brother has the built-in capability to send alerts to the Netadmin through e-mails. The e-mail address of the recipient is specified in the $BBHOME/etc/bbwarnrules.cfg file. The format is as follows:

   hosts;exhosts;services;exservices;day;time;recipients-email-address 

The comments in the bbwarnrules.cfg file explain the format in detail. In Example 5-4, spope@abcinvestment.com is the recipient of e-mail alerts: the rule *;;*;;*;*;spope@abcinvestment.com causes Big Brother to send e-mails about all hosts and all services on all 7 days of the week, at any time of the day. The unmatched- prefix ensures that alerts for hosts that do not match any of the previous rules are still sent to at least one recipient.

Example 5-4. Sample bbwarnrules.cfg File
 # Rules are written in the following format:
 # hosts;exhosts;services;exservices;day;time;recipients
 # hosts: match on these hosts (* is a wildcard for all hosts)
 # exhosts: exclude these hosts
 # services: match on these services (* is a wildcard for all services)
 # exservices: exclude these services
 # day: 0-6 (sunday-saturday)
 # time: 0000-2359
 # recipients: email address, numeric pager, sms number
 #
 *;;*;;*;*;spope@abcinvestment.com
 unmatched-*;;*;;*;*;spope@abcinvestment.com

The bbwarnrules.cfg file is a good location for customizing your notifications. For example, the following configuration instructs Big Brother to send notifications for all hosts to jkeith@abcinvestment.com from 8:30 a.m. to 5:30 p.m. on weekdays:

   *;;*;;1-5;0830-1730;jkeith@abcinvestment.com
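To reason about which rule applies to a given alert, it can help to model the seven fields in code. The following Python sketch is a simplified model, not Big Brother's actual rule engine: it assumes that host and service lists are comma-separated, that day fields are * or ranges such as 1-5, and that time fields are * or HHMM-HHMM ranges, and it treats unmatched- rules as a fallback that is used only when no other rule matches. All function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rule:
    hosts: str
    exhosts: str
    services: str
    exservices: str
    days: str
    times: str
    recipients: str


def parse_rule(line: str) -> Rule:
    fields = line.strip().split(";")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    return Rule(*fields)


def _in_list(value: str, spec: str) -> bool:
    # '*' matches everything; otherwise spec is assumed to be a comma-separated list.
    return spec == "*" or value in spec.split(",")


def _day_matches(day: int, spec: str) -> bool:
    # day: 0 = Sunday ... 6 = Saturday; spec such as '*', '1-5', or '0,6'
    if spec == "*":
        return True
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        if int(lo) <= day <= int(hi or lo):
            return True
    return False


def _time_matches(hhmm: int, spec: str) -> bool:
    # hhmm as an integer (830 = 08:30); spec such as '*' or '0830-1730'
    if spec == "*":
        return True
    start, end = (int(t) for t in spec.split("-"))
    return start <= hhmm <= end


def rule_applies(rule: Rule, host: str, service: str, day: int, hhmm: int) -> bool:
    return (
        _in_list(host, rule.hosts)
        and not (rule.exhosts and _in_list(host, rule.exhosts))
        and _in_list(service, rule.services)
        and not (rule.exservices and _in_list(service, rule.exservices))
        and _day_matches(day, rule.days)
        and _time_matches(hhmm, rule.times)
    )


def recipients_for(lines: list[str], host: str, service: str,
                   day: int, hhmm: int) -> list[str]:
    """Recipients of matching rules; 'unmatched-' rules apply only if nothing else matched."""
    normal: list[str] = []
    fallback: list[str] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        is_fallback = line.startswith("unmatched-")
        rule = parse_rule(line.removeprefix("unmatched-"))
        if rule_applies(rule, host, service, day, hhmm):
            (fallback if is_fallback else normal).append(rule.recipients)
    return normal or fallback


RULES = [
    "*;;*;;1-5;0830-1730;jkeith@abcinvestment.com",
    "unmatched-*;;*;;*;*;spope@abcinvestment.com",
]
print(recipients_for(RULES, "Dallas-router", "ftp", day=3, hhmm=1000))  # ['jkeith@abcinvestment.com']
print(recipients_for(RULES, "Dallas-router", "ftp", day=6, hhmm=1000))  # ['spope@abcinvestment.com']
```

With these two rules, an FTP alert at 10:00 on a Wednesday goes to jkeith, while the same alert on a Saturday matches only the unmatched- rule and goes to spope.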

Increasing Performance

By default, Big Brother runs all the tests in a single thread. If you have a large number of hosts and services, you can increase the number of concurrent threads used by the Big Brother Server; running tests concurrently reduces the time required to check all the hosts and services. To do so, modify the value of BBNETTHREADS in the $BBHOME/etc/bbdef-server.sh file. A value of 5 gives a reasonable boost in performance without an adverse effect on CPU utilization. Use the following lines:

  #
  BBNETTHREADS=5
  export BBNETTHREADS

However, if you set BBNETTHREADS higher than 5, the boost in Big Brother performance comes at the expense of CPU utilization. The underlying hardware (CPU and RAM) should be robust enough to support Big Brother for running multiple threads.
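Concurrent threads help because each network test spends most of its time waiting on the network rather than using the CPU. The following Python sketch illustrates the effect with a thread pool and simulated tests that each wait 0.05 seconds. It is not Big Brother code; fake_network_test() is a hypothetical stand-in for a real ping or service check.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def fake_network_test(host: str) -> tuple[str, str]:
    """Stand-in for a ping or service check: mostly waiting, little CPU work."""
    time.sleep(0.05)
    return host, "green"


def run_all(hosts: list[str], threads: int) -> tuple[dict[str, str], float]:
    """Run every test with `threads` worker threads; return results and elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = dict(pool.map(fake_network_test, hosts))
    return results, time.perf_counter() - start


hosts = [f"node{i}" for i in range(10)]
for n in (1, 5):  # 1 = default single thread, 5 = the suggested BBNETTHREADS value
    _, elapsed = run_all(hosts, n)
    print(f"{n} thread(s): {elapsed:.2f} s")
```

With 10 simulated tests, one thread needs roughly 0.5 seconds and five threads roughly 0.1 seconds; exact timings vary by machine. Real tests also consume some CPU per thread, which is why very high thread counts eventually stop paying off.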

Monitoring Additional Services

Netadmins often need to monitor additional services on network devices or servers. This monitoring functionality can be incorporated into Big Brother to provide a centralized monitoring system. Any text-based TCP/UDP service can be checked by taking the following actions:

  1. Using the service name as the directive in the bb-hosts file. The name should match the entry in the /etc/services file on the Linux machine that hosts the Big Brother Server.

  2. Adding the service name in the list of services in the BBNETSVCS variable of the $BBHOME/etc/bbdef-server.sh file.

A common example is a Terminal Access Controller Access Control System Plus (TACACS+) or Remote Authentication Dial-In User Service (RADIUS) server that the Netadmin runs to support the AAA feature on Cisco routers and switches. The sample lines in Examples 5-5 and 5-6 show the configuration needed to monitor the TACACS+ service running on the TACACS+ server.

Example 5-5. Sample Line in bb-hosts File
 192.168.0.55 AAASERVER # testip tacacs 

Example 5-6. Sample Line in bbdef-server.sh File
 #
 BBNETSVCS="smtp telnet ftp pop pop3 pop-3 ssh imap ssh1 ssh2 imap2 imap3 imap4 pop2 pop-2 nntp tacacs"
 export BBNETSVCS
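Before adding a service such as tacacs to Big Brother, it can be useful to confirm by hand that the service port accepts connections from the Big Brother Server. A basic way to check a TCP service is to open a connection to its port, as in the following Python sketch. check_tcp_service() and service_port() are hypothetical helpers, not Big Brother functions; service_port() relies on the local /etc/services entry, just as the bb-hosts directive does.

```python
import socket


def check_tcp_service(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return False


def service_port(name: str) -> int:
    """Look up a TCP service name (such as 'tacacs') in the local services database."""
    return socket.getservbyname(name, "tcp")


# Example, assuming the AAA server from Example 5-5 and a 'tacacs' entry
# in /etc/services on this machine:
# print(check_tcp_service("192.168.0.55", service_port("tacacs")))
```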

Improving Scalability

The original code for Big Brother (Linux version) does not scale well and faces performance issues when monitoring more than 50 nodes. The BBGen patch, created by Henrik Storner, provides high-performance replacements and enhancements for several Big Brother components. Big Brother, in conjunction with BBGen, has been reported to successfully monitor 1000 nodes simultaneously. To install BBGen, download and unpack the tar file from http://www.deadcat.net; the installation is straightforward and is documented in the INSTALL file that is included with the source code.

Creating Hyperlinks for Node Information

Using hyperlinks to view information about specific nodes is an excellent feature that helps the Netadmin centrally locate needed information at critical times. You can set up hyperlinks for a host in the bb.html or bb2.html pages that point to an information page. To do so, create files in the $BBHOME/www/notes directory. The filenames should match the system names that are specified in the $BBHOME/etc/bb-hosts file.

For each monitored node, create a text or HTML file with information such as the serial number, location, warranty information, circuit ID, and vendor or service-provider contact information for the specific device. When the node goes down, the Netadmin can click the hyperlink to get the necessary information to solve the problem.
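Writing a notes page for every node by hand is tedious, so a small script can generate the pages from your inventory records. The following Python sketch writes one HTML page per host, named exactly after the host as it appears in bb-hosts. write_node_note() is a hypothetical helper, the field values are placeholders, and a temporary directory stands in for the Big Brother notes directory so that the example can run anywhere.

```python
from __future__ import annotations

import html
import tempfile
from pathlib import Path


def write_node_note(notes_dir: Path, hostname: str, info: dict[str, str]) -> Path:
    """Write a small HTML notes page for one host.

    The file is named exactly after the host (as listed in bb-hosts);
    all text is HTML-escaped before it is written.
    """
    rows = "\n".join(
        f"<tr><th>{html.escape(key)}</th><td>{html.escape(value)}</td></tr>"
        for key, value in info.items()
    )
    title = html.escape(hostname)
    page = (
        f"<html><head><title>{title}</title></head><body>\n"
        f"<h2>{title}</h2>\n<table>\n{rows}\n</table>\n</body></html>\n"
    )
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / hostname
    path.write_text(page)
    return path


# On the server, notes_dir would be the Big Brother www/notes directory;
# a temporary directory is used here instead.
demo_dir = Path(tempfile.mkdtemp()) / "notes"
note = write_node_note(demo_dir, "Dallas-router", {
    "Location": "<placeholder>",
    "Serial number": "<placeholder>",
    "Circuit ID": "<placeholder>",
    "Vendor contact": "<placeholder>",
})
print(note.read_text())
```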

Deploying a Windows-Based Big Brother Network-Monitoring System

Netadmins searching for a Windows-based network-monitoring system have limited choices. The following three are good options:

  • Big Brother

  • BigSister

  • JFFNMS

Again, because it is easier to install and configure than the other tools, Big Brother is the preferred tool and is discussed in this section. The Windows version of Big Brother runs on Windows NT 4.0, 2000, and 2003. Although the overall process of deploying Big Brother in Windows is similar to that in Linux, a few differences exist. One of the most distinct differences is that, unlike its Linux counterpart, the Windows version does not require a separate user account to run the Big Brother Server. The following sections cover the installation and configuration of Big Brother in Windows. For details on using Big Brother, see the section "Deploying a Linux-Based Big Brother Network-Monitoring System," earlier in this chapter.

Installing Big Brother in Windows

The target Windows machine for hosting the Big Brother Server must have a preconfigured and functional Internet Information Services (IIS) web server. The IIS web server is part of the Windows NT/2000/2003/XP operating system. IIS is installed through the Add/Remove Programs icon in the Windows Control Panel. Refer to the Windows documentation for more details on installing and configuring IIS.

After installing the IIS web server, download and save the Big Brother executable file for Windows from http://www.bb4.org. Double-click the downloaded .exe file to begin the installation process. During the installation process, Big Brother prompts you for OS-specific information; the default values should work in most cases. Follow the prompts to finish the installation process.

Configuring Big Brother

Big Brother on Windows is configured through the .cfg files located in the default directory \Program Files\Quest Software\Big Brother BTF\xx\etc (where xx is the version number). Before proceeding with the configuration, Netadmins should understand the function of each of these .cfg files. The seven .cfg files are as follows:

  • bb-hosts.cfg

  • bbdef.cfg

  • bbskin-eng.cfg

  • bbskin-fra.cfg

  • bbwarnrules.cfg

  • bbwarnsetup.cfg

  • security.cfg

bb-hosts.cfg File

The bb-hosts.cfg file lists all the nodes that are to be monitored by Big Brother. The format is similar to that of the Linux version, as shown in Example 5-7. Note the similarity between the files in Examples 5-3 and 5-7.

Example 5-7. Sample bb-hosts.cfg File
 192.168.0.30   localhost       # BBPAGER BBNET BBDISPLAY http://localhost/
 group-compress <H3><I>Network Devices</I></H3>
 192.168.0.10   Dallas-router   # testip
 192.168.0.20   Dallas-Firewall # testip
 192.168.0.50   Dallas-Switch   # testip
 192.168.0.100  FileServer      # testip

bb-def.cfg File

The bb-def.cfg file controls the behavior of Big Brother. The parameters are similar to those in the Linux version. The default configuration should work for most situations. BBNETSLEEP controls the frequency at which Big Brother performs the monitoring tests. The interval between successive generations of the bb.html (or bb2.html) page is controlled by BBSLEEP. The default value is 300 seconds for both. Netadmins can increase the monitoring frequency by decreasing these parameters, as shown in the sample bb-def.cfg file in Example 5-8. However, excessively decreasing these values (for example, to 5 seconds) leads to higher traffic and CPU utilization. Moreover, Big Brother might not complete the testing of all hosts or services within the specified interval, thus causing false results.

Example 5-8. Sample bb-def.cfg File
 # -- output suppressed --
 # INTERVAL TO WAIT IN SECONDS TO REGENERATE THE bb.html/bb2.html files
 BBSLEEP="30"
 # -- output suppressed --
 # INTERVAL BETWEEN NETWORK TESTS (IN SECONDS)
 BBNETSLEEP="30"
 # -- output suppressed --
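Whether a given interval is realistic comes down to simple arithmetic: one pass over all the tests must finish before the next pass is due. The following Python sketch makes that back-of-the-envelope check explicit. For simplicity, it assumes that the tests run one after another and take roughly equal time; the host count and per-test time are hypothetical, so measure your own network before relying on the numbers.

```python
def estimated_cycle_seconds(num_tests: int, avg_test_seconds: float) -> float:
    """Rough time for one serial pass over all tests (equal-length tests assumed)."""
    return num_tests * avg_test_seconds


def fits_interval(num_tests: int, avg_test_seconds: float, interval_seconds: float) -> bool:
    """True if one pass is expected to finish before the next one is due."""
    return estimated_cycle_seconds(num_tests, avg_test_seconds) <= interval_seconds


# Hypothetical numbers: 40 hosts x 5 services, about 0.5 s per test.
tests = 40 * 5
print(estimated_cycle_seconds(tests, 0.5))  # 100.0
print(fits_interval(tests, 0.5, 300))       # True  (default value of 300 seconds)
print(fits_interval(tests, 0.5, 30))        # False (BBNETSLEEP="30")
```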

bbskin-eng.cfg File

The bbskin-eng.cfg file controls the display properties, such as the size or color of fonts, for the bb.html and bb2.html pages. This file is for the English version of the HTML pages. While the default content works well in most cases, you can edit the font properties or the text labels to seamlessly integrate Big Brother pages with your web portals.

bbskin-fra.cfg File

This file is the same as the bbskin-eng.cfg file, except that it controls the French version of the bb.html and bb2.html pages.

bbwarnrules.cfg File

The bbwarnrules.cfg file contains rules for sending alerts and notifications to suitable e-mail recipients. The format is as follows:

  hosts;exhosts;services;exservices;day;time;recipients 

In this format, hosts and services are the lists of host names and services, respectively, as specified in the bb-hosts.cfg file; exhosts is the list of hosts to be excluded; exservices is the list of services to be excluded; day specifies the day of the week (0 for Sunday, 1 for Monday, and so on through 6 for Saturday); and time is a time range in HHMM format (0000-2359). You can use the asterisk (*) as a wildcard to match all values, and you can specify multiple recipients using separate line entries.

The first entry in Example 5-9 causes Big Brother to send e-mail alerts to spope@abcinvestment.com and pager alerts to the phone number 333-4444 for all hosts and all services, on all 7 days of the week, during all 24 hours of the day. The second entry sends e-mail alerts to ksmith@abcinvestment.com only on weekdays (Monday through Friday) between 8:00 a.m. and 5:00 p.m. The last entry is a catchall: the unmatched- prefix ensures that alerts for hosts that do not match any of the previous rules are still sent to at least one recipient. The original bbwarnrules.cfg file contains templates for further customizing the Big Brother alerts and notifications.

Example 5-9. Sample bbwarnrules.cfg File
 *;;*;;*;*;spope@abcinvestment.com 333-4444
 *;;*;;1-5;0800-1700;ksmith@abcinvestment.com
 unmatched-*;;*;;*;*;spope@abcinvestment.com

Tip

Many cell-phone service providers now allow e-mail messages to be sent directly to the phones. The e-mail address is of the format xxx-xxx-xxxx@providerdomain.com, where xxx-xxx-xxxx is the 10-digit cell-phone number. Netadmins can use this feature to their advantage and avoid carrying pagers. On the other hand, if an outage interrupts the Internet or mail server connection, the e-mail alerts are not received. You should implement fallback options, such as connecting a modem with a public switched telephone network (PSTN) line to the network-monitoring system to send pager alerts. Or you can use a separate digital subscriber line (DSL) as an alternate path to send e-mail or cell-phone alerts.


bbwarnsetup.cfg File

The bbwarnsetup.cfg file is used to modify the overall settings for the notification feature of Big Brother. The default content of the file should work fine for most environments and should not be modified by beginner-level users. Advanced users can refer to the embedded comments within the original bbwarnsetup.cfg file that comes with the installation.

security.cfg File

In most deployments, you do not need to edit this file, because it does not affect the regular operation of the Big Brother Server. The file is relevant only when Big Brother Clients send messages to the Big Brother Server. By default, the Big Brother Server accepts messages from any Big Brother Client; you can restrict it to accept messages only from clients whose IP addresses are listed in the security.cfg file. Note that this file has no effect on access to the Big Brother web pages (bb.html and bb2.html).

Running the Big Brother Server

After configuring the .cfg files, you can start the Big Brother Server to monitor the network. Big Brother runs as a Windows service and can be controlled through the Windows Services Microsoft Management Console (MMC) snap-in.

Follow these steps to start the Big Brother service:

Step 1.

Choose Start > Run, enter services.msc, and click the OK button to launch the Services MMC snap-in.

Step 2.

In the Services window, right-click Big Brother SNM Server and choose Properties. In the Properties dialog box, click Start to start the Big Brother service.

You can use the Services MMC snap-in to stop or restart the Big Brother service. You can find the messages generated by the Big Brother service in the Application section of the Windows Event Viewer. To access the Windows Event Viewer, choose Start > Run, enter eventvwr.exe, and click the OK button.

Step 3.

To view the current status of the network, access the Big Brother web page at the following URL:

http://ipaddress_of_bigbrotherserver/bb/bb.html

Alternatively, to view the summary results of the network status, visit the following URL:

http://ipaddress_of_bigbrotherserver/bb/bb2.html

Big Brother displays the results of network monitoring through the bb.html and bb2.html web pages. These pages are identical to those generated by the Linux version of Big Brother. For more information on using Big Brother web pages in Linux, see the section "Running the Big Brother Server," earlier in this chapter.

Deploying Nagios for Linux-Based Network Monitoring

Nagios is another Linux-based system and network-monitoring application that is popular in the open source community. Nagios provides the following:

  • A web-based GUI for viewing current network status, notification and problem history, log files, and host configurations

  • The ability to monitor services (such as HTTP, SSH, FTP) that are running on a host

  • The ability to send e-mail or pager-alert notifications

  • The ability to display hierarchical network configuration for defining parent and child hosts

  • Active development and support from the open source community

  • A customizable network map

Of the listed features, the ability to create network hierarchy is useful in minimizing the number of alerts during network outages. For example, consider a WAN router connected to four remote routers. When the WAN router is down, the connectivity to the remote routers is also affected. Without network hierarchy, the monitoring system generates alerts for all five devices. However, with the network hierarchy defined, the monitoring system generates alerts only for the parent device (the WAN router, in this example).
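The following Python sketch models the parent/child logic described above, using the same WAN-router scenario. A host whose check fails is classified as DOWN if it has no parent or at least one parent is still up, and as UNREACHABLE if all its parents are down or unreachable; in this sketch, only DOWN hosts generate alerts. This is a simplified illustration of the idea, not Nagios code: in Nagios, whether UNREACHABLE hosts trigger notifications depends on each host's notification_options, and the host names below are hypothetical.

```python
from __future__ import annotations


def classify(parents: dict[str, list[str]], failed: set[str]) -> dict[str, str]:
    """Label each host UP, DOWN, or UNREACHABLE from its own check and its parents.

    A host whose check failed is DOWN if it has no parents or at least one
    parent is UP; it is UNREACHABLE if every parent is DOWN or UNREACHABLE.
    """
    state: dict[str, str] = {}

    def resolve(host: str) -> str:
        if host not in state:
            if host not in failed:
                state[host] = "UP"
            else:
                ps = parents.get(host, [])
                reachable = not ps or any(resolve(p) == "UP" for p in ps)
                state[host] = "DOWN" if reachable else "UNREACHABLE"
        return state[host]

    for host in parents:
        resolve(host)
    return state


def hosts_to_alert(parents: dict[str, list[str]], failed: set[str]) -> list[str]:
    """Alert only on DOWN hosts; UNREACHABLE children are suppressed."""
    return sorted(h for h, s in classify(parents, failed).items() if s == "DOWN")


# The scenario from the text: one WAN router with four remote routers behind it.
network: dict[str, list[str]] = {"WAN-router": []}
network.update({f"Remote-{i}": ["WAN-router"] for i in range(1, 5)})
print(hosts_to_alert(network, failed=set(network)))  # ['WAN-router']
```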

Nagios also allows you to use MySQL or PostgreSQL databases to store data instead of flat files. Despite its versatility, Nagios requires comparatively more time and effort to deploy. This discussion is aimed at helping Netadmins quickly understand and deploy a functional Nagios system that uses text files to store data. Such a system should suffice for monitoring a medium-size network with 50 to 500 nodes. The following sections provide details regarding Nagios installation, configuration, and usage.

Nagios Installation

Before installing Nagios, the TCP/IP stack on the target Linux machine should be preconfigured and the Linux machine should be connected to the network. The Nagios source code is available from the Nagios home page at http://www.nagios.org. To compile Nagios from the source code, the target Linux machine requires the following software:

  • C compiler (for example, GCC)

  • Web server (for example, Apache)

Additionally, for advanced use of Nagios for database support, you must also install and configure the MySQL or PostgreSQL database on the Linux machine. Installation of these programs is beyond the scope of this book. Refer to the documentation for your Linux distribution.

However, Debian-Linux users can take advantage of a simplified installation routine by using the apt-get install nagios-text command. The entire discussion about Nagios in this chapter is based on deploying Nagios on a Debian-Linux system, which was chosen for the convenience and stability that Debian-Linux provides.

Note

Your Debian system might give you the following error:

      E: Couldn't find package nagios-text

In this case, add the following line to the /etc/apt/sources.list file, and then run apt-get update before retrying the installation:

      deb http://ftp.debian.org/debian unstable main non-free contrib 


The apt-get install nagios-text command automatically installs Nagios and takes care of other system dependencies. By default, the Debian installation process defines nagiosadmin as the default user for administering Nagios. You must configure the password for the nagiosadmin user when prompted during the installation process.

You can verify the installation by pointing your web browser to the Nagios machine using the URL http://Nagios-server-IP-address/nagios/. The default home page consists of the navigation pane at the left, as shown in Figure 5-3.

Figure 5-3. Nagios Home Page


The navigation pane provides links to documentation, monitoring, and reporting pages. The default navigation pane is organized into four subsections for easy usage. The username for accessing the Nagios web GUI is nagiosadmin; the password is the one that you set during the Nagios installation process. The default installation also includes detailed product documentation. Beginners are strongly encouraged to read this online manual.

Nagios Configuration

Nagios is configured through the .cfg files located in the default directory /etc/nagios. The main configuration file (/etc/nagios/nagios.cfg) controls the Nagios daemon and contains the locations of the other .cfg files. Nagios monitors the services (listed in the services.cfg file) running on the hosts listed in the hosts.cfg file. For ease of administration and notification, hosts can be grouped in the hostgroups.cfg file, which also defines the contact group for each host group. The contactgroups.cfg file groups individual contacts, and the e-mail or pager information for each contact (individual recipient) is listed in the contacts.cfg file.

To deploy a basic Nagios monitoring system, you must configure each of the following five files:

  • /etc/nagios/hosts.cfg

  • /etc/nagios/hostgroups.cfg

  • /etc/nagios/services.cfg

  • /etc/nagios/contactgroups.cfg

  • /etc/nagios/contacts.cfg

These configuration files are created automatically during the installation process. Each file contains sample templates for ease of configuration. Lines beginning with a hash (#) are treated as comments. Similarly, any text following a semicolon (;) is ignored. A detailed explanation of each of these files, as well as all the parameters included within these files, is provided in the online product documentation.
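The two comment rules above can be illustrated with a short Python sketch that reduces one line of a Nagios object file to its configuration content. It only models the rules as stated here and is not Nagios's own parser; the function name strip_nagios_comment is hypothetical, and treating a line whose first non-blank character is # as a comment is an assumption.

```python
def strip_nagios_comment(line: str) -> str:
    """Return the configuration content of one line, with comments removed.

    Rules modeled (as described in the text; hypothetical helper):
      * a line whose first non-blank character is '#' is a comment;
      * anything after a ';' is ignored.
    """
    stripped = line.strip()
    if stripped.startswith("#"):
        return ""
    # Keep only the text before the first semicolon.
    return stripped.split(";", 1)[0].rstrip()


if __name__ == "__main__":
    for raw in ("# 'web-server' host definition",
                "use   generic-host   ; Name of host template to use",
                "host_name   web-server"):
        print(repr(strip_nagios_comment(raw)))
```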

For a better understanding of each of the .cfg files, consider the network scenario shown in Figure 5-4. The configuration of each of the subsequent .cfg files discussed in this chapter is based on this network scenario.

Figure 5-4. Sample Network Scenario


Editing the /etc/nagios/hosts.cfg File

The /etc/nagios/hosts.cfg file contains a list of every host to be monitored by Nagios. Example 5-10 provides the default contents of the /etc/nagios/hosts.cfg file.

Example 5-10. Default Contents of the /etc/nagios/hosts.cfg File
      define host{
          name                            generic-host
          notifications_enabled           1
          event_handler_enabled           0
          flap_detection_enabled          0
          process_perf_data               1
          retain_status_information       1
          retain_nonstatus_information    1
          register                        0
          }

      define host{
          use                     generic-host    ; Name of host template to use
          host_name               gw
          alias                   Default Gateway
          address                 192.168.0.1
          check_command           check-host-alive
          max_check_attempts      20
          notification_interval   60
          notification_period     24x7
          notification_options    d,u,r
          }

Each host definition is contained within braces ({}). The first entry defines a generic host template. The second entry defines the default gateway of the Nagios server itself. Note that the host name gw is internally assigned by the Nagios installation process and should not be changed. The notification_options parameter specifies when to send notifications for the host. Possible values are d for down, u for unreachable, r for recovery to up state, and n for disabling notifications. The u and d options, along with the parents parameter, are helpful in defining network hierarchy and controlling the alert notifications (as shown in the next example). A host is considered unreachable if its parent host is down or unreachable. For example, if Nagios is monitoring a remote router and a web server located behind the router, the router is defined as the parent host for the web server. When the router goes down, the web server is considered unreachable. For a detailed explanation of each parameter, refer to the online product documentation.
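The parent rule above (a host is unreachable if its parent is down or unreachable) can be sketched in Python as follows. This is a simplified model, not Nagios's implementation: it assumes each host has at most one parent, as in Example 5-11, and the names PARENT and host_state are hypothetical.

```python
from typing import Dict, Optional

# Parent relationships taken from Example 5-11 (None = no parents parameter).
PARENT: Dict[str, Optional[str]] = {
    "gw": None,
    "dallas-router": None,
    "newyork-router": "dallas-router",
    "dallas-pix": "gw",
    "dallas-vpn": "gw",
    "web-server": None,
}


def host_state(host: str,
               responds: Dict[str, bool],
               parent: Dict[str, Optional[str]]) -> str:
    """Classify a host as UP, DOWN, or UNREACHABLE (simplified model).

    A host that fails its check is UNREACHABLE when its parent is DOWN or
    UNREACHABLE; otherwise it is DOWN. Each host has at most one parent.
    """
    if responds[host]:
        return "UP"
    p = parent.get(host)
    if p is not None and host_state(p, responds, parent) != "UP":
        return "UNREACHABLE"
    return "DOWN"


if __name__ == "__main__":
    # Hypothetical outage: dallas-router fails, so newyork-router stops answering too.
    responds = {h: True for h in PARENT}
    responds["dallas-router"] = False
    responds["newyork-router"] = False
    for h in ("dallas-router", "newyork-router"):
        print(h, host_state(h, responds, PARENT))
    # dallas-router DOWN
    # newyork-router UNREACHABLE
```

If instead only the WAN link fails, dallas-router still answers and newyork-router is classified as DOWN.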

Note

The configuration for gw is automatically created only during the installation and is static. If you move the Nagios server to a different subnet, you must manually change the IP address for gw to match the new network.


To add new hosts, you can use the configuration of the host gw as a template: edit a copy of it for each host and append it to the existing /etc/nagios/hosts.cfg file. Example 5-11 shows the configuration for the hosts depicted in the network scenario of Figure 5-4. Host dallas-router is connected to the remote host newyork-router through a WAN link, so dallas-router is defined as the parent of newyork-router through the parents parameter in Example 5-11. This is how you build the network hierarchy. Additionally, only the down (d) and recovery (r) notification options are enabled for host newyork-router. Consequently, when newyork-router itself or the WAN link fails while dallas-router is still up, newyork-router is down and Nagios generates alerts for it. When the parent device dallas-router is down, however, newyork-router is only unreachable; because u is not among its notification options, Nagios generates alerts for dallas-router and not for newyork-router. The same logic applies to hosts dallas-pix and dallas-vpn and their parent host gw (the LAN router) in Figure 5-4.

Note

While network hierarchy is a useful feature, it requires the appropriate personnel to have a thorough understanding of the network topology. If you need to generate unreachable alerts for a device irrespective of the status of its parent device, simply add u to the notification_options parameter in the host definition.
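The effect of the notification_options letters described above can be sketched as a small Python function that decides whether a host state change triggers a notification. This is a deliberately simplified, hypothetical model: it ignores repeat notifications sent every notification_interval minutes, the notification_period, and escalations.

```python
# Letter used in notification_options for each problem state.
STATE_TO_OPTION = {"DOWN": "d", "UNREACHABLE": "u"}


def should_notify(old_state: str, new_state: str, options: str) -> bool:
    """Decide whether a host state change triggers a notification.

    Simplified model: ignores notification_interval repeats,
    notification_period, and escalations.
    """
    opts = {o.strip() for o in options.split(",")}
    if "n" in opts or old_state == new_state:
        return False
    if new_state == "UP":
        # Recovery: the host returned to UP from DOWN or UNREACHABLE.
        return "r" in opts
    return STATE_TO_OPTION.get(new_state) in opts


if __name__ == "__main__":
    # newyork-router uses d,r; dallas-router uses d,u,r (Example 5-11).
    print(should_notify("UP", "UNREACHABLE", "d,r"))    # False
    print(should_notify("UP", "DOWN", "d,r"))           # True
    print(should_notify("UP", "UNREACHABLE", "d,u,r"))  # True
```

Adding u to newyork-router's options, as the Note suggests, makes the first call return True.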


Example 5-11. Contents of the hosts.cfg File
      # -- default entries for generic-host and gw should be included here -- #

      # 'dallas-router' host definition
      define host{
          use                     generic-host    ; Name of host template to use
          host_name               dallas-router
          alias                   Router-Dallas Cisco 1600
          address                 192.168.0.10
          check_command           check-host-alive
          max_check_attempts      20
          notification_interval   60
          notification_period     24x7
          notification_options    d,u,r
          }

      # 'newyork-router' host definition
      define host{
          use                     generic-host
          host_name               newyork-router
          alias                   Cisco 2600 router
          address                 192.168.254.2
          parents                 dallas-router
          check_command           check-host-alive
          max_check_attempts      5
          notification_interval   60
          notification_period     24x7
          notification_options    d,r
          }

      # 'dallas-pix' host definition
      define host{
          use                     generic-host
          host_name               dallas-pix
          alias                   Firewall pix535
          address                 192.168.1.2
          parents                 gw
          check_command           check-host-alive
          max_check_attempts      5
          notification_interval   60
          notification_period     24x7
          notification_options    d,r
          }

      # 'dallas-vpn' host definition
      define host{
          use                     generic-host
          host_name               dallas-vpn
          alias                   VPN3030 Concentrator
          address                 192.168.1.3
          parents                 gw
          check_command           check-host-alive
          max_check_attempts      5
          notification_interval   60
          notification_period     24x7
          notification_options    d,r
          }

      # 'web-server' host definition
      define host{
          use                     generic-host
          host_name               web-server
          alias                   Intranet web server
          address                 192.168.0.100
          check_command           check-host-alive
          max_check_attempts      5
          notification_interval   60
          notification_period     24x7
          notification_options    d,u,r
          }

Editing the /etc/nagios/services.cfg File

After defining each host monitored by Nagios, you must specify the services to be monitored on each of these hosts. The services are defined in the /etc/nagios/services.cfg file. Example 5-12 provides the contents of a sample /etc/nagios/services.cfg file. The first two definitions are default entries created by the Nagios installation process. The first definition creates a generic service template, a placeholder for settings that apply to all services. The second entry monitors the availability of the default gateway through ICMP ping. Do not delete either of these entries. To monitor other hosts through ICMP ping, you can simply append their host names to the host_name line of the default entry for the host gw, as shown in Example 5-12.

The third definition monitors the web service running on the host web-server. The notification_options parameter determines when to send notifications for the service; possible values are w for warning, u for unknown, c for critical, r for recovery, and n for disabling notifications. The service_description parameter holds the name that identifies the service (for example, PING or HTTP) within Nagios. The contact_groups parameter defines the group of administrators responsible for maintaining the monitored host or service. The /etc/nagios/services.cfg file also contains built-in templates for monitoring other common services predefined by Nagios, such as SMTP, SSH, FTP, and POP3.

Example 5-12. Contents of the /etc/nagios/services.cfg File

      define service{      ; The 'name' of this service template, referenced in other service definitions
          name                          generic-service
          active_checks_enabled         1   ; Active service checks are enabled
          passive_checks_enabled        0   ; Passive service checks are enabled/disabled
          parallelize_check             1   ; Active service checks should be parallelized
                                            ; (disabling this can lead to major performance problems)
          obsess_over_service           1   ; We should obsess over this service (if necessary)
          check_freshness               0   ; Default is to NOT check service 'freshness'
          notifications_enabled         0   ; Service notifications are disabled
          event_handler_enabled         0   ; Service event handler is disabled
          flap_detection_enabled        0   ; Flap detection is disabled
          process_perf_data             1   ; Process performance data
          retain_status_information     1   ; Retain status information across program restarts
          retain_nonstatus_information  1   ; Retain non-status information across program restarts
          register                      0   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
          }
      #
      # the PING service for 'gw' is created by the Nagios installation process
      # additional hosts can be added in the same line as "gw"
      define service{
          use                     generic-service     ; Name of service template to use
          host_name               gw, dallas-router, newyork-router, dallas-pix, dallas-vpn, web-server
          service_description     PING
          is_volatile             0
          check_period            24x7
          max_check_attempts      3
          normal_check_interval   1
          retry_check_interval    1
          contact_groups          router-admins, web-admins
          notification_interval   240
          notification_period     24x7
          notification_options    c,r
          check_command           check_ping!100.0,20%!500.0,60%
          }
      #
      # 'web-server' service for monitoring http
      #
      define service{
          use                     generic-service
          host_name               web-server
          service_description     HTTP
          is_volatile             0
          check_period            24x7
          max_check_attempts      3
          normal_check_interval   5
          retry_check_interval    1
          contact_groups          web-admins
          notification_interval   120
          notification_period     24x7
          notification_options    w,u,c,r
          check_command           check_http
          }
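The check_command value of the PING service packs several fields into one string separated by exclamation marks. Nagios passes the fields after the command name to the matching command definition as the $ARG1$, $ARG2$, ... macros; for check_ping in the sample command definitions these typically become the warning and critical thresholds, each a pair of round-trip average (in milliseconds) and packet loss (in percent). The Python sketch below splits the string and classifies a sample measurement. The function names are hypothetical, and treating a threshold as crossed when either value reaches it is an assumption that approximates, rather than reproduces, the check_ping plugin.

```python
from typing import List, Tuple


def split_check_command(check_command: str) -> Tuple[str, List[str]]:
    """Split 'name!arg1!arg2' into the command name and its $ARGn$ values."""
    name, *args = check_command.split("!")
    return name, args


def parse_threshold(pair: str) -> Tuple[float, float]:
    """Parse '100.0,20%' into (round-trip average in ms, packet loss in %)."""
    rta, loss = pair.split(",")
    return float(rta), float(loss.rstrip("%"))


def classify_ping(rta_ms: float, loss_pct: float, warn: str, crit: str) -> str:
    """Return OK, WARNING, or CRITICAL for one ping measurement.

    Assumption: a state is triggered when either value reaches its threshold.
    """
    c_rta, c_loss = parse_threshold(crit)
    w_rta, w_loss = parse_threshold(warn)
    if rta_ms >= c_rta or loss_pct >= c_loss:
        return "CRITICAL"
    if rta_ms >= w_rta or loss_pct >= w_loss:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    name, (warn, crit) = split_check_command("check_ping!100.0,20%!500.0,60%")
    print(name, warn, crit)                      # check_ping 100.0,20% 500.0,60%
    print(classify_ping(35.0, 0.0, warn, crit))   # OK
    print(classify_ping(150.0, 0.0, warn, crit))  # WARNING
    print(classify_ping(40.0, 80.0, warn, crit))  # CRITICAL
```

A command with no arguments, such as check_http in the HTTP service, simply yields an empty argument list.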

Tip

Here is a timesaving tip for configuring the services.cfg file: Instead of adding each host to a service definition individually, you can specify one or more host groups with the hostgroup_name parameter. So instead of using the following:

      host_name       HOST1,HOST2,HOST3,...,HOSTN 

you can use the following:

      hostgroup_name       HOSTGROUP1,HOSTGROUP2,...,HOSTGROUPN 

Alternatively, you can include all the hosts by using the * character, as shown here:

      host_name       * 
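How the three forms in this tip select hosts can be sketched in Python. The function expand_hosts and the sample data (taken from the host groups in Example 5-13) are hypothetical illustrations, not Nagios code; removing duplicate hosts is a choice made for this sketch.

```python
from typing import Dict, List

# Host group membership from Example 5-13.
HOSTGROUP_MEMBERS: Dict[str, List[str]] = {
    "gateways": ["gw", "dallas-router", "newyork-router", "dallas-pix", "dallas-vpn"],
    "webserver": ["web-server"],
}
ALL_HOSTS: List[str] = [h for members in HOSTGROUP_MEMBERS.values() for h in members]


def expand_hosts(host_name: str, hostgroup_name: str,
                 all_hosts: List[str],
                 hostgroups: Dict[str, List[str]]) -> List[str]:
    """Return the hosts that a service definition applies to."""
    selected: List[str] = []
    for entry in (h.strip() for h in host_name.split(",") if h.strip()):
        # '*' stands for every defined host.
        selected.extend(all_hosts if entry == "*" else [entry])
    for group in (g.strip() for g in hostgroup_name.split(",") if g.strip()):
        selected.extend(hostgroups[group])
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(selected))


if __name__ == "__main__":
    print(expand_hosts("", "gateways, webserver", ALL_HOSTS, HOSTGROUP_MEMBERS))
```

With these two host groups, hostgroup_name gateways, webserver selects the same six hosts that the PING service in Example 5-12 lists one by one.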


Editing the /etc/nagios/hostgroups.cfg File

The /etc/nagios/hostgroups.cfg file groups similar hosts on the basis of features or functionality. The idea behind the hostgroups.cfg file is to identify a similar set of hosts and tie them to a group of administrators who should receive alerts related to those hosts. The host group definitions are also used in the status map pages that are created by Nagios. Each host group definition contains a list of hosts separated by a comma (,). A host can belong to multiple groups, but it must belong to at least one group. Each host group definition also contains the contact_groups parameter, which specifies the group of Netadmins responsible for maintaining the hosts. When the status of a host changes, Nagios sends notifications to the contact groups of each host group to which the host belongs. Example 5-13 shows two host group definitions, each with a different group of contacts.
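The lookup described above, from a host to the contact groups of every host group it belongs to, can be sketched in Python using the definitions from Example 5-13. The names HOSTGROUPS and contact_groups_for_host are hypothetical; raising an error for a host in no group mirrors the rule that every host must belong to at least one group.

```python
from typing import Dict, List

# Host group definitions from Example 5-13.
HOSTGROUPS: Dict[str, Dict[str, List[str]]] = {
    "gateways": {
        "contact_groups": ["router-admins"],
        "members": ["gw", "dallas-router", "newyork-router", "dallas-pix", "dallas-vpn"],
    },
    "webserver": {
        "contact_groups": ["web-admins"],
        "members": ["web-server"],
    },
}


def contact_groups_for_host(host: str) -> List[str]:
    """Collect the contact groups of every host group that contains host."""
    groups: List[str] = []
    for hg in HOSTGROUPS.values():
        if host in hg["members"]:
            groups.extend(hg["contact_groups"])
    if not groups:
        # Every host must belong to at least one host group.
        raise ValueError(f"{host} is not a member of any host group")
    return list(dict.fromkeys(groups))


if __name__ == "__main__":
    print(contact_groups_for_host("newyork-router"))  # ['router-admins']
    print(contact_groups_for_host("web-server"))      # ['web-admins']
```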

Example 5-13. Contents of the /etc/nagios/hostgroups.cfg File
      define hostgroup{
          hostgroup_name  gateways
          alias           Network devices
          contact_groups  router-admins
          members         gw, dallas-router, newyork-router, dallas-pix, dallas-vpn
          }
      #
      define hostgroup{
          hostgroup_name  webserver
          alias           Web servers
          contact_groups  web-admins
          members         web-server
          }

Editing the /etc/nagios/contactgroups.cfg File

The /etc/nagios/contactgroups.cfg file defines the contact groups that are used within the host group and service definitions. Each contact group definition must list the individual members assigned to the group, and multiple member names must be separated by a comma (,). Example 5-14 shows the contents of the /etc/nagios/contactgroups.cfg file for two contact groups, router-admins and web-admins.

Example 5-14. Contents of the contactgroups.cfg File
      # 'router-admins' contact group definition
      define contactgroup{
          contactgroup_name       router-admins
          alias                   Router and Network admins
          members                 spope
          }

      # 'web-admins' contact group definition
      define contactgroup{
          contactgroup_name       web-admins
          alias                   Web admins
          members                 jkeith, spope
          }

Editing the /etc/nagios/contacts.cfg File

After defining the contact groups, the next step is to define the information for each contact (or member). The contact information, such as e-mail address or pager number, is specified in the /etc/nagios/contacts.cfg file. The value of the contact_name parameter in the /etc/nagios/contacts.cfg file must match the name listed in the members parameter in the /etc/nagios/contactgroups.cfg file. Example 5-15 provides the configuration of the /etc/nagios/contacts.cfg file for two contacts, spope and jkeith.

Example 5-15. Contents of the /etc/nagios/contacts.cfg File
      # 'spope' contact definition
      define contact{
          contact_name                    spope
          alias                           Network Admin
          service_notification_period     24x7
          host_notification_period        24x7
          service_notification_options    w,u,c,r
          host_notification_options       d,u,r
          service_notification_commands   notify-by-email,notify-by-epager
          host_notification_commands      host-notify-by-email,host-notify-by-epager
          email                           spope@abcinvestment.com
          pager                           111-222-3333@pagingcompany.net
          }

      # 'jkeith' contact definition
      define contact{
          contact_name                    jkeith
          alias                           web Admin
          service_notification_period     24x7
          host_notification_period        24x7
          service_notification_options    w,u,c,r
          host_notification_options       d,u,r
          service_notification_commands   notify-by-email,notify-by-epager
          host_notification_commands      host-notify-by-email,host-notify-by-epager
          email                           jkeith@abcinvestment.com
          pager                           111-222-4444@pagingcompany.net
          }

For details regarding other parameters included in the /etc/nagios/contacts.cfg file, refer to the Nagios product documentation.
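The matching rule between members and contact_name can be illustrated with a Python sketch that checks the data from Examples 5-14 and 5-15 and resolves contact groups to e-mail addresses. The helper names are hypothetical, and sending each address only once is a choice made for this sketch; on the real files, the nagios -v pre-flight check reports references of this kind that do not resolve.

```python
from typing import Dict, List

# Members from Example 5-14 (contactgroups.cfg).
CONTACTGROUPS: Dict[str, List[str]] = {
    "router-admins": ["spope"],
    "web-admins": ["jkeith", "spope"],
}

# contact_name -> email, from Example 5-15 (contacts.cfg).
CONTACTS: Dict[str, str] = {
    "spope": "spope@abcinvestment.com",
    "jkeith": "jkeith@abcinvestment.com",
}


def undefined_members(groups: Dict[str, List[str]],
                      contacts: Dict[str, str]) -> List[str]:
    """List 'group: member' pairs whose member has no matching contact_name."""
    return [f"{g}: {m}" for g, members in groups.items()
            for m in members if m not in contacts]


def recipients(group_names: List[str],
               groups: Dict[str, List[str]],
               contacts: Dict[str, str]) -> List[str]:
    """Resolve contact groups to a de-duplicated list of e-mail addresses."""
    emails: List[str] = []
    for g in group_names:
        for m in groups[g]:
            emails.append(contacts[m])
    return list(dict.fromkeys(emails))


if __name__ == "__main__":
    print(undefined_members(CONTACTGROUPS, CONTACTS))  # []
    # The PING service in Example 5-12 uses both contact groups.
    print(recipients(["router-admins", "web-admins"], CONTACTGROUPS, CONTACTS))
```

A misspelled member, such as jkieth in place of jkeith, would show up in the undefined_members result.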

Running Nagios

After editing the .cfg files, Nagios is ready to be started. But before running Nagios, you should verify the configuration using the nagios -v main-config-file command. The -v option forces Nagios to parse each of the .cfg files and look for errors without loading the files for production use. Example 5-16 shows the output of the nagios -v command.

Example 5-16. Verifying the Nagios Configuration
      root@linuxbox:# nagios -v /etc/nagios/nagios.cfg

      Nagios 1.3
      Copyright 1999-2004 Ethan Galstad (nagios@nagios.org)
      Last Modified: 10-24-2004
      License: GPL

      Reading configuration data...

      Running pre-flight check on configuration data...

      Checking services...
              Checked 7 services.
      Checking hosts...
              Checked 6 hosts.
      Checking host groups...
              Checked 2 host groups.
      Checking contacts...
              Checked 2 contacts.
      Checking contact groups...
              Checked 2 contact groups.
      Checking service escalations...
              Checked 1 service escalations.
      Checking host group escalations...
              Checked 0 host group escalations.
      Checking service dependencies...
              Checked 0 service dependencies.
      Checking host escalations...
              Checked 0 host escalations.
      Checking host dependencies...
              Checked 0 host dependencies.
      Checking commands...
              Checked 90 commands.
      Checking time periods...
              Checked 4 time periods.
      Checking for circular paths between hosts...
      Checking for circular service execution dependencies...
      Checking global event handlers...
      Checking obsessive compulsive service processor command...
      Checking misc settings...

      Total Warnings: 0
      Total Errors:   0

      Things look okay - No serious problems were detected during the pre-flight check
      root@linuxbox:/etc/nagios#

Nagios reports any issues it encounters during the parsing process, and you must correct them before starting Nagios. After verifying the configuration, you can start Nagios with the nagios main-config-file command; adding the -d option runs it in the background as a daemon. Debian users can also use the init script to start, stop, or restart Nagios, as shown here:

      root@linuxbox:~# /etc/init.d/nagios restart
      Stopping nagios: nagios.
      Starting nagios: nagios.
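The verify-then-restart routine above can be scripted. The following Python sketch is a hypothetical helper, not part of Nagios: it assumes the "Total Warnings / Total Errors" summary format shown in Example 5-16 and the Debian init script path, and it must run as root on the Nagios machine. It restarts Nagios only when the pre-flight check reports no errors.

```python
import re
import subprocess
from typing import Tuple

# Matches the summary lines printed at the end of 'nagios -v' (Example 5-16).
TOTALS = re.compile(r"Total Warnings:\s*(\d+)\s*Total Errors:\s*(\d+)")


def preflight_totals(output: str) -> Tuple[int, int]:
    """Extract (warnings, errors) from the text printed by 'nagios -v'."""
    match = TOTALS.search(output)
    if match is None:
        raise ValueError("no 'Total Warnings/Total Errors' summary found")
    return int(match.group(1)), int(match.group(2))


def verify_then_restart(config: str = "/etc/nagios/nagios.cfg") -> bool:
    """Run the pre-flight check; restart via the init script only if it is clean."""
    result = subprocess.run(["nagios", "-v", config],
                            capture_output=True, text=True)
    _warnings, errors = preflight_totals(result.stdout)
    if errors:
        # Show the full report so the problems can be corrected.
        print(result.stdout)
        return False
    subprocess.run(["/etc/init.d/nagios", "restart"], check=True)
    return True
```

Keeping the parsing in its own function makes it easy to test against saved nagios -v output without touching a running system.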

After the Nagios daemon is started, you can access the Nagios page for network monitoring, using the following URL:

http://ip-address-of-Nagios-machine/nagios/

The Nagios web GUI provides an HTML version of the product documentation. To view the Nagios product documentation, click the Documentation link in the navigation pane. Figure 5-5 shows the table of contents of the Nagios documentation web page.

Figure 5-5. Nagios Documentation Page


To view the current network status, click the Status Map link in the navigation pane. By default, the Status Map layout is circular, as shown in Figure 5-6. You can change the layout by selecting a different option from the Layout Method drop-down menu in the upper-right corner of the Status Map page.

Figure 5-6. Nagios Status Map Page


You can click each of the nodes displayed on the Status Map page for more details on each of the hosts and associated services. Alternatively, you can click the Service Detail link in the navigation pane to view the service details for all the listed hosts, as shown in Figure 5-7.

Figure 5-7. Nagios Service Detail Page


The navigation pane in the Nagios web GUI also provides links for viewing host details, status summary, and status overview. When you monitor a larger network with Nagios, these status pages can become crowded. For such cases, the Nagios navigation pane provides two links: Service Problems and Host Problems. The Service Problems page displays only the services, and their hosts, that are experiencing issues. Figure 5-8 illustrates the Service Problems page, showing two service issues: both the PING service (on host dallas-vpn) and the HTTP service (on host web-server) are in a critical state.

Figure 5-8. Nagios Service Problems Page


Similarly, the Host Problems page, shown in Figure 5-9, provides a list of all hosts that are currently down.

Figure 5-9. Nagios Hosts Problems Page


The Downtime page is useful for scheduling downtime for hosts and services that you are monitoring. When a host or service is in the scheduled downtime period, the notification feature for that host or service is disabled. Figure 5-10 shows the Downtime page. You can schedule a downtime for a host or service by clicking the Schedule host downtime and Schedule service downtime links, respectively. You can cancel a scheduled downtime by clicking the Recycling Bin icon at the right side of the particular entry.

Figure 5-10. Nagios Downtime Page


The Comments link in the navigation pane enables you to add arbitrary text as a comment for a host or service. Note that the Comments page automatically includes the comments that were entered for a host or service on the Downtime page. Figure 5-11 shows the Comments page; as expected, the downtime comments are also included. You can delete a comment by clicking the Recycling Bin icon to the right of the particular comment.

Figure 5-11. Nagios Comments Page


The Process Info page provides a GUI for controlling the Nagios process, eliminating the need to learn the command-line options and commands. Some of the options provided by the Process Info page are as follows:

  • Start, stop, or restart the Nagios process

  • Disable or enable notifications

  • Start or stop service checks

Figure 5-12 shows the Process Info page. The Process Information section provides a summary of the Nagios processes, including the process identification (PID), start time, and total running time.

Figure 5-12. Nagios Process Info Page


The Reporting section of the Nagios navigation pane provides options for creating customizable reports. The Trends page creates a trending report for a host or a monitored service. This report consists of a graph that shows the state of the monitored host or service over an arbitrary period of time. Figure 5-13 illustrates the trending report graph for the state of host dallas-pix over a period of 1 day.

Figure 5-13. Nagios Trends Page


Instead of a graphic illustration, you can also view the same report in numeric format by choosing the Availability page. Figure 5-14 shows the availability report for the state of the host dallas-pix. This is the same report as the previous trending report graph, but in numeric format.
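The numbers in an availability report are, at their core, the share of a reporting period spent in each state. The following Python sketch computes those percentages from a list of state changes. The function name and the sample timeline are hypothetical (they are not the data behind Figure 5-14), and the real report distinguishes further categories, such as time for which Nagios has no data, that this sketch ignores.

```python
from typing import Dict, List, Tuple


def availability(events: List[Tuple[float, str]],
                 start: float, end: float) -> Dict[str, float]:
    """Percentage of the period [start, end) spent in each state.

    events: (timestamp, new_state) pairs sorted by time; the first event
    must be at or before start so the initial state is known.
    """
    totals: Dict[str, float] = {}
    for i, (t, state) in enumerate(events):
        # A state lasts until the next change, or until the end of the period.
        t_next = events[i + 1][0] if i + 1 < len(events) else end
        lo, hi = max(t, start), min(t_next, end)
        if hi > lo:
            totals[state] = totals.get(state, 0.0) + (hi - lo)
    span = end - start
    return {s: 100.0 * d / span for s, d in totals.items()}


if __name__ == "__main__":
    # Hypothetical day for a host, in hours: down from 06:00 to 12:00.
    day = [(0.0, "UP"), (6.0, "DOWN"), (12.0, "UP")]
    print(availability(day, 0.0, 24.0))  # {'UP': 75.0, 'DOWN': 25.0}
```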

Figure 5-14. Nagios Availability Page


The Event Log link in the Reporting section is another useful feature for Netadmins. This link shows the log messages generated by Nagios, which are useful for monitoring or troubleshooting the Nagios daemon. Figure 5-15 shows a sample event log screen.

Figure 5-15. Nagios Event Log Page


In the Configuration section of the navigation pane, the View Config link is handy for referring to the Nagios configuration. Instead of reading through the contents of each .cfg file, Netadmins can use this link to view the configuration in a tabular format. Figure 5-16 depicts the configuration details of the /etc/nagios/hosts.cfg file.

Figure 5-16. Nagios View Config Page: Hosts


Notes for Advanced Nagios Use

Nagios is highly customizable and provides a variety of features for advanced users. You should read the online product documentation that is included with the installation files. For deploying Nagios in larger (enterprise-grade) networks, the online documentation includes tips such as the following:

  • Improving Nagios security

  • Enhancing Nagios performance

  • Integrating Nagios with other programs

  • Customizing Nagios web pages

  • Deploying Nagios with database support

Additionally, two useful tips for Cisco Netadmins are as follows:

  • Changing default icons: The default icons that Nagios uses in its web pages lack network-specific symbols, such as routers and firewalls. You can download network-specific icons for Nagios from http://www.nagiosexchange.org. To use the new icons, follow the instructions for the icon_image and statusmap_image parameters in the "Extended information configuration" section of the product documentation. Figure 5-17 shows a screen shot of the Nagios status map using network-specific icons downloaded from this site.

    Figure 5-17. Nagios Status Map with New Icons


  • Inserting additional information: While troubleshooting network issues, you often need additional logistical information, such as circuit IDs, serial numbers, and support contract details. You can record such details for each host on a separate web page. To link each host to its page, use the notes_url parameter. Details regarding the notes_url parameter are also provided in the "Extended information configuration" section of the product documentation.



Network Administrators Survival Guide
ISBN: 1587052113
Year: 2006
Pages: 106
