The open source community offers many network-monitoring tools. Some started off as simple ping scripts and evolved into stable and commercial-quality network-monitoring products. Some of the popular tools are as follows:
Although the overall goal of each tool is to monitor a network, each tool is unique in terms of installation, configuration, and architecture. Many of these tools have a large, satisfied user base. In the following sections, you learn about deploying Nagios and Big Brother. These tools are selected because of their communitywide popularity and support. Additionally, each of these tools offers plug-in features for customizing or enhancing monitoring capabilities.

Deploying a Linux-Based Big Brother Network-Monitoring System

Big Brother offers the following advantages:
To deploy the Linux version of Big Brother, the first step is to install the Big Brother Server on the Linux computer. Configuration is next, followed by running the Big Brother Server in the network.

Installing Big Brother in Linux

A Big Brother Server on Linux requires the following features to work correctly:
Installation of these programs is beyond the scope of this discussion. For installation details, refer to the documentation for your Linux distribution.

Additionally, before installing Big Brother on the target Linux computer, you must create a new user and a group (for example, username bb and group name bb) using the adduser system command. This username and group name are used for running the Big Brother daemon because, for security reasons, Big Brother does not run as the root user. For the benefit of new Linux users, Example 5-1 shows the command used in Debian-Linux to create a user bb and a group bb with the home directory /home/bb.

Example 5-1. Creating a User and Group in Debian-Linux

    linuxbox:~# adduser bb
    Adding user bb...
    Adding new group bb (1003).
    Adding new user bb (1003) with group bb.
    Creating home directory /home/bb.
    Copying files from /etc/skel
    Enter new UNIX password:
    Retype new UNIX password:
    passwd: password updated successfully
    Changing the user information for bb
    Enter the new value, or press return for the default
            Full Name []: Big Brother User
            Room Number []:
            Work Phone []:
            Home Phone []:
            Other []:
    Is the information correct? [y/n] y
    linuxbox:~#

After defining the new user and group, you can install Big Brother using the source code files. The source code is available as a zipped tar file at http://www.bb4.org. Download the latest source code and unpack the zipped tar file using the tar xzvf bigbrother-source-file-name command. At the time of this writing, the Big Brother source code (in zipped tar format) contained both the Big Brother Server and the Big Brother Client. The Big Brother Client monitors local resources, such as CPU and disk activity, on a remote Windows or Linux computer and is beyond the scope of this discussion. After uncompressing the original source file, you might need to further extract the server files using the tar xvf BBSVRxxx.tar command.
The unpacked server source files contain the bbconfig script in the install directory. The purpose of the bbconfig script is to automate the installation process. Run the bbconfig script and follow the prompts to complete the installation of Big Brother Server. A sample installation is provided in the command-line interface (CLI) session shown in Example 5-2. Refer to the comments (the lines beginning with #) for explanations of each step.

Example 5-2. Installing Big Brother in Debian-Linux

    # The Big Brother zipped tar file is in the home directory
    linuxbox:/home/bb# ls
    bb-1.9e.tar.gz
    # uncompress the zipped tar file
    linuxbox:/home/bb# tar xzvf bb-1.9e.tar.gz
    BB.README.FIRST
    BBSVR-bb1.9e-btf.tar
    BBCLT-bbc1.9e-btf.tar
    # extract Big Brother Server files from the tar file archive
    linuxbox:/home/bb# tar xvf BBSVR-bb1.9e-btf.tar
    # verify the contents of the directory
    linuxbox:/home/bb# ls -l
    total 1984
    -rw-r--r--  1 200  daemon     305 Jan  2  2004 BB.README.FIRST
    -rw-r--r--  1 200  daemon  406528 Jan  2  2004 BBCLT-bbc1.9e-btf.tar
    -rw-r--r--  1 200  daemon 1147392 Jan  2  2004 BBSVR-bb1.9e-btf.tar
    -rw-r--r--  1 root staff   447216 Apr 30  2004 bb-1.9e.tar.gz
    drwxr-sr-x 10 root staff     4096 Apr 30  2004 bb1.9e-btf
    # Create a link /home/bb/bb to the new directory "bb1.9e-btf"
    # per instruction in the accompanying README.INSTALL file
    linuxbox:/home/bb# ln -s /home/bb/bb1.9e-btf /home/bb/bb
    # Run the bbconfig script to begin installation routine
    linuxbox:/home/bb/# /home/bb/bb/install/bbconfig

Configuring Big Brother Using the Text Files

Big Brother is mainly configured through the bb-hosts file, a text file that is located in the $BBHOME/etc directory. ($BBHOME stands for the home directory of Big Brother; in this example, it is /home/bb/bb, so the file is /home/bb/bb/etc/bb-hosts.) The bb-hosts file contains the list of hosts to be monitored.
The format for adding a host entry in the bb-hosts file is as follows:

    IP-ADDR HOSTNAME # DIRECTIVES

For example, to test the availability of the host Dallas-router by pinging its IP address 192.168.0.10, the entry in the bb-hosts file is as follows:

    192.168.0.10 Dallas-router # testip

Table 5-1 provides a partial list of directives and their details.
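The entry format described above can be checked mechanically before a file is handed to Big Brother. The following Python fragment is only a minimal sketch and is not part of Big Brother: the names HostEntry and parse_bb_hosts_line are hypothetical, and the sketch assumes IPv4 addresses and treats an entry without a # as having no directives.

```python
import ipaddress
from typing import NamedTuple, Optional


class HostEntry(NamedTuple):
    """One host line from a bb-hosts file (hypothetical helper type)."""
    ip: str
    hostname: str
    directives: list


def parse_bb_hosts_line(line: str) -> Optional[HostEntry]:
    """Parse 'IP-ADDR HOSTNAME # DIRECTIVES'; return None for other lines."""
    head, _, tail = line.partition("#")
    fields = head.split()
    # A host entry has exactly an address and a name before the '#'.
    if len(fields) != 2:
        return None
    ip, hostname = fields
    try:
        ipaddress.IPv4Address(ip)  # assumption: IPv4 only in this sketch
    except ValueError:
        return None
    return HostEntry(ip, hostname, tail.split())


entry = parse_bb_hosts_line("192.168.0.10 Dallas-router # testip")
print(entry)
```

Lines that are not host entries, such as page-layout directives or blank lines, return None here, so a caller can skip them rather than treat them as errors.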
To control the HTML tables in the Big Brother output pages, use the directives defined in Table 5-2.
Based on the discussion of the various directives for the bb-hosts file, Example 5-3 shows a sample file. Note the first line, where multiple directives are defined for the Linux server with the host name linuxbox. The server is acting as BBPAGER, BBNET, and BBDISPLAY. This file instructs Big Brother to monitor four nodes by pinging their IP addresses. The availability status is displayed in a single table on the HTML page.

Example 5-3. Sample bb-hosts File

    192.168.0.30 linuxbox # BBPAGER BBNET BBDISPLAY http://linuxbox/
    group-compress <H3><I>Network Devices</I></H3>
    192.168.0.10 Dallas-router # testip
    192.168.0.20 Dallas-Firewall # testip
    192.168.0.50 Dallas-Switch # testip
    192.168.0.100 FileServer # testip

To verify the configuration of the bb-hosts file, use the bbchkhosts.sh script, which is located in the $BBHOME/etc directory, as follows:

    bb@linuxbox:~/bb/etc$ ./bbchkhosts.sh
    If any comments are displayed, please fix the entries in your configuration
    Note that some error messages may be for tags of external scripts

Running the Big Brother Server

After verifying the bb-hosts file for errors, change your user ID to user bb through the su username command. Remember, you cannot run Big Brother as the root user. Next, start the Big Brother Server by using the runbb.sh script located in the $BBHOME directory. The runbb.sh script is also used to stop or restart the Big Brother Server. Run the script as follows:

    linuxbox:/home/bb/bb# su bb
    bb@linuxbox:~/bb$ ./runbb.sh start
    Starting Big Brother
    Starting Big Brother Daemon (bbd)...
    Starting Network tests (bb-network)...
    Starting Display process (bb-display)...
    Big Brother 1.9e started
    bb@linuxbox:~/bb$

After starting the Big Brother Server, you can view the network status by pointing your web browser to the URL http://bigbrother-server-ip-address/bb/bb.html, as shown in Figure 5-1.

Figure 5-1.
Big Brother Output Web Page

Note: The Big Brother screen captures shown in this chapter are in grayscale and do not indicate the true color of the web page output.

The dots next to each host name indicate the status of that host. Green indicates normal operation, red indicates trouble, and yellow indicates a warning condition in which the host needs attention but is not yet down or unreachable. The color of the background also indicates the overall health of the network. A green background indicates that all the monitored services are working. A red background indicates a network issue.

If Big Brother is monitoring a large number of nodes and services, the main page (bb.html) becomes large and difficult to navigate. In that case, Netadmins can use the bb2.html page, which provides a summarized view. The summarized view shows only the hosts whose status is currently other than green. If all the hosts are up, the summary page shows the message All Monitored Systems OK. The summary page also provides a list of all events in the last 240 minutes. Figure 5-2 shows the summary page and the list of events in the last 240 minutes. This view allows Netadmins to quickly assess the network status and the recent history.

Figure 5-2. Big Brother Summary Page

To view a historic availability report for the network, use the reporting function through the following URL:
Choose the starting and ending dates and click the Generate Report button. The availability of each host is indicated in terms of percentage. A solid green dot next to the host name indicates 100 percent availability.

Tips for Advanced Users

The following sections provide some tips that are helpful in fine-tuning a Big Brother Server.

Change Notification Interval

By default, Big Brother runs the tests and sends notifications every 300 seconds. To change this, edit all the occurrences of the BBSLEEP timers in the $BBHOME/runbb.sh script to the desired value. Note that BBSLEEP=300 is found in four places in the runbb.sh script. Changing the BBSLEEP timers also changes the rate at which the web pages are generated.

Note: In some cases, the default BBSLEEP value of 300 seconds (5 minutes) is too high, and outages lasting less than 300 seconds can go unnoticed. On the other hand, reducing this parameter to a low value (such as 5 seconds) causes high CPU utilization and traffic on the Big Brother Server, often resulting in incomplete tests and false results. The exact value depends on various factors and should be left to the discretion of the Netadmin.

Sending E-Mail Notifications

Big Brother has the built-in capability to send alerts to the Netadmin through e-mail. The e-mail address of the recipient is specified in the $BBHOME/etc/bbwarnrules.cfg file. The format is as follows:

    hosts;exhosts;services;exservices;day;time;recipients-email-address

The comments in the bbwarnrules.cfg file itself also describe the format in detail. As shown in Example 5-4, spope@abcinvestment.com is the recipient of e-mail alerts. The statement *;;*;;*;*;spope@abcinvestment.com causes Big Brother to send e-mails regarding all hosts and all services during all 7 days of the week and at any time of the day. The keyword unmatched- ensures that at least one recipient is sent the alerts for hosts that do not match any of the previous rules.

Example 5-4.
Sample bbwarnrules.cfg File

    # Rules are written in the following format:
    # hosts;exhosts;services;exservices;day;time;recipients
    # hosts: match on these hosts (* is a wildcard for all hosts)
    # exhosts: exclude these hosts
    # services: match on these services (* is a wildcard for all services)
    # exservices: exclude these services
    # day: 0-6 (sunday-saturday)
    # time: 0000-2359
    # recipients: email address, numeric pager, sms number
    #
    *;;*;;*;*;spope@abcinvestment.com
    unmatched-*;;*;;*;*;spope@abcinvestment.com

The bbwarnrules.cfg file is a good location for customizing your notifications. For example, the following configuration instructs Big Brother to send notifications for all hosts to jkeith@abcinvestment.com from 8:30 a.m. to 5:30 p.m. on weekdays:

    *;;*;;1-5;0830-1730;jkeith@abcinvestment.com

Increasing Performance

By default, Big Brother uses only a single thread to run all the tests. If you have a large number of hosts and services, you can increase the number of concurrent threads for the tests executed by the Big Brother Server. By running concurrent tests, Big Brother reduces the time required to check all the hosts and services. The number of concurrent threads is increased by modifying the value of BBNETTHREADS in the $BBHOME/etc/bbdef-server.sh file. You can set the value to 5 for a reasonable boost in performance without an adverse effect on CPU utilization. Use the following code:

    #
    BBNETTHREADS=5
    export BBNETTHREADS

However, if you set BBNETTHREADS higher than 5, the boost in Big Brother performance comes at the expense of CPU utilization. The underlying hardware (CPU and RAM) should be robust enough to support Big Brother running multiple threads.

Monitoring Additional Services

Netadmins often need to monitor additional services on network devices or servers. This monitoring functionality can be incorporated into Big Brother to provide a centralized monitoring system. Any text-based TCP/UDP service can be checked by taking the following actions:
A common example is a network administrator running a Terminal Access Controller Access Control System Plus (TACACS+) or Remote Authentication Dial-In User Service (RADIUS) server to support the AAA feature on Cisco routers and switches. The sample lines shown in Examples 5-5 and 5-6 depict the configuration needed to monitor the TACACS+ service running on the TACACS+ server.

Example 5-5. Sample Line in bb-hosts File

    192.168.0.55 AAASERVER # testip tacacs

Example 5-6. Sample Line in bbdef-server.sh File

    #
    BBNETSVCS="smtp telnet ftp pop pop3 pop-3 ssh imap ssh1 ssh2 imap2 imap3 imap4 pop2 pop-2 nntp tacacs"
    export BBNETSVCS

Improving Scalability

The original code for Big Brother (Linux version) faces performance issues because it does not scale well when monitoring more than 50 nodes. The BBGen patch, created by Henrik Storner, provides high-performance replacements and enhancements to several Big Brother components. Big Brother, in conjunction with BBGen, has been reported to successfully monitor 1000 nodes simultaneously. The installation of BBGen is straightforward, and the process is documented in the INSTALL file that is included with the source code. To install BBGen, download and unpack the tar file from http://www.deadcat.net.

Creating Hyperlinks for Node Information

Using hyperlinks to view information about specific nodes is an excellent feature that helps the Netadmin to centrally locate needed information at critical times. You can set up hyperlinks for a host in the bb.html or bb2.html pages to point to an information page. This is achieved by creating files in the $BBHOME/www/notes directory. The filenames should match the system names that are specified in the $BBHOME/etc/bb-hosts file. For each monitored node, create a text or HTML file with information such as the serial number, location, warranty information, circuit ID, and vendor or service-provider contact information for the specific device.
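Creating one notes file per host by hand becomes tedious as the host list grows, so the step lends itself to a small script. The following Python sketch is illustrative only and is not shipped with Big Brother: the function write_note_stubs is hypothetical, the <hostname>.html naming is an assumption (check the notes directory of your own installation for the expected file names), and the sample details are placeholders.

```python
import html
import tempfile
from pathlib import Path


def write_note_stubs(hostnames, notes_dir, info=None):
    """Create one small HTML notes file per host; skip files that exist."""
    notes_dir = Path(notes_dir)
    notes_dir.mkdir(parents=True, exist_ok=True)
    info = info or {}
    created = []
    for name in hostnames:
        path = notes_dir / f"{name}.html"  # assumed naming: <hostname>.html
        if path.exists():
            continue  # never overwrite notes that someone already wrote
        rows = "".join(
            f"<tr><th>{html.escape(key)}</th><td>{html.escape(value)}</td></tr>\n"
            for key, value in info.get(name, {}).items()
        )
        path.write_text(
            f"<html><body><h3>{html.escape(name)}</h3>\n"
            f"<table>\n{rows}</table></body></html>\n",
            encoding="utf-8",
        )
        created.append(path)
    return created


# Example run against a temporary directory rather than a live $BBHOME.
with tempfile.TemporaryDirectory() as tmp:
    made = write_note_stubs(
        ["Dallas-router", "FileServer"],
        tmp,
        info={"Dallas-router": {"Location": "(fill in)", "Circuit ID": "(fill in)"}},
    )
    print([p.name for p in made])
```

Skipping files that already exist is a deliberate choice in this sketch, so that rerunning it after adding hosts does not destroy notes that were edited by hand.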
When the node goes down, the Netadmin can click the hyperlink to get the information needed to solve the problem.

Deploying a Windows-Based Big Brother Network-Monitoring System

Netadmins searching for a Windows-based network-monitoring system have limited choices. The following three are good options:
Again, because of its ease of installation and configuration compared to other tools, Big Brother is the preferred tool and is discussed in this section. The Windows version of Big Brother runs on Windows NT 4.0, 2000, and 2003. Although the overall process of deploying Big Brother in Windows is similar to that in Linux, a few differences exist. The most distinct difference is that, unlike with its Linux counterpart, you do not need to create a separate user in Windows to run the Big Brother Server. The following sections cover the installation and configuration of Big Brother in Windows. Details regarding the usage of Big Brother are covered in the section "Deploying a Linux-Based Big Brother Network-Monitoring System," earlier in this chapter.

Installing Big Brother in Windows

The target Windows machine for hosting the Big Brother Server must have a preconfigured and functional Internet Information Services (IIS) web server. The IIS web server is part of the Windows NT/2000/2003/XP operating system. IIS is installed through the Add/Remove Programs icon in the Windows Control Panel. Refer to the Windows documentation for more details on installing and configuring IIS. After installing the IIS web server, download and save the Big Brother executable file for Windows from http://www.bb4.org. Double-click the downloaded .exe file to begin the installation process. During the installation process, Big Brother prompts you for OS-specific information; the default values should work in most cases. Follow the prompts to finish the installation process.

Configuring Big Brother

Big Brother on Windows is configured through the .cfg files located in the default directory \Program Files\Quest Software\Big Brother BTF\xx\etc (where xx is the version number). Before proceeding with the configuration, Netadmins should understand the function of each of these cfg files. The seven cfg files are as follows:
bb-hosts.cfg File

The bb-hosts.cfg file is used to list all the nodes that are to be monitored by Big Brother. The format is similar to the Linux version, as shown in Example 5-7. Note the similarity between the files in Examples 5-3 and 5-7.

Example 5-7. Sample bb-hosts.cfg File

    192.168.0.30 localhost # BBPAGER BBNET BBDISPLAY http://localhost/
    group-compress <H3><I>Network Devices</I></H3>
    192.168.0.10 Dallas-router # testip
    192.168.0.20 Dallas-Firewall # testip
    192.168.0.50 Dallas-Switch # testip
    192.168.0.100 FileServer # testip

bb-def.cfg File

The bb-def.cfg file controls the behavior of Big Brother. The parameters are similar to those in the Linux version. The default configuration should work for most situations. BBNETSLEEP controls the frequency at which Big Brother performs the monitoring tests. The interval between successive generations of the bb.html (or bb2.html) page is controlled by BBSLEEP. The default value is 300 seconds for both. Netadmins can increase the monitoring frequency by decreasing these parameters, as shown in the sample bb-def.cfg file in Example 5-8. However, excessively decreasing these values (for example, to 5 seconds) leads to higher traffic and CPU utilization. Moreover, Big Brother might not complete the testing of all hosts or services within the specified interval, thus causing false results.

Example 5-8. Sample bb-def.cfg File

    # -- output suppressed --
    # INTERVAL TO WAIT IN SECONDS TO REGENERATE THE bb.html/bb2.html files
    BBSLEEP="30"
    # -- output suppressed --
    # INTERVAL BETWEEN NETWORK TESTS (IN SECONDS)
    BBNETSLEEP="30"
    # -- output suppressed --

bbskin-eng.cfg File

The bbskin-eng.cfg file controls the display properties, such as the size or color of fonts, for the bb.html and bb2.html pages. This file is for the English version of the HTML pages. Although the default content works well in most cases, you can edit the font properties or the text labels to integrate Big Brother pages with your web portals.
bbskin-fra.cfg File

This file is the same as the bbskin-eng.cfg file, except that it controls the French version of the bb.html and bb2.html pages.

bbwarnrules.cfg File

The bbwarnrules.cfg file contains rules for sending alerts and notifications to suitable e-mail recipients. The format is as follows:

    hosts;exhosts;services;exservices;day;time;recipients

In this format, hosts and services are the lists of host names and services, respectively, as specified in the bb-hosts.cfg file. Additionally, exhosts is the list of hosts to be excluded, exservices is the list of services to be excluded, day specifies the day of the week (Sunday is expressed as 0, Monday as 1, and so on through Saturday as 6), and time is the range of time specified in HHMM (0000-2359) format. You can use the asterisk (*) as a wildcard to match all values.

The first entry in Example 5-9 causes Big Brother to send e-mail alerts to spope@abcinvestment.com and pager alerts to the phone number 333-4444 for all hosts and all services, on all 7 days of the week and during all 24 hours of the day. You can specify multiple recipients using separate line entries. The second entry in Example 5-9 sends e-mail alerts to ksmith@abcinvestment.com only on weekdays (Monday through Friday) between 8:00 a.m. and 5:00 p.m. The last entry in Example 5-9 is a catchall entry. The unmatched- keyword ensures that at least one recipient is sent the alerts for hosts that do not match any of the previous rules. The original bbwarnrules.cfg file contains templates for further customizing the Big Brother alerts and notifications.

Example 5-9. Sample bbwarnrules.cfg File

    *;;*;;*;*;spope@abcinvestment.com 333-4444
    *;;*;;1-5;0800-1700;ksmith@abcinvestment.com
    unmatched-*;;*;;*;*;spope@abcinvestment.com

Tip: Many cell-phone service providers now allow e-mail messages to be sent directly to the phones. The e-mail address is of the format xxx-xxx-xxxx@providerdomain.com, where xxx-xxx-xxxx is the 10-digit cell-phone number.
Netadmins can use this feature to their advantage and avoid carrying pagers. On the other hand, if an outage interrupts the Internet or mail server connection, the e-mail alerts are not received. You should implement fallback options, such as connecting a modem with a public switched telephone network (PSTN) line to the network-monitoring system to send pager alerts. Or you can use a separate digital subscriber line (DSL) as an alternate path to send e-mail or cell-phone alerts.

bbwarnsetup.cfg File

The bbwarnsetup.cfg file is used to modify the overall settings for the notification feature of Big Brother. The default content of the file should work fine for most environments and should not be modified by beginner-level users. Advanced users can refer to the embedded comments within the original bbwarnsetup.cfg file that comes with the installation.

security.cfg File

You should not edit this file, because it is irrelevant to the regular operation of the Big Brother Server. This file is only relevant in scenarios that feature Big Brother monitoring clients talking to the Big Brother Server. By default, the Big Brother Server listens for messages from any Big Brother Client. You can restrict the Big Brother Server to listen for messages only from Big Brother Clients with IP addresses listed in the security.cfg file. Note that this file has no impact on accessing Big Brother web pages (bb.html and bb2.html).

Running the Big Brother Server

After configuring the .cfg files, you can start the Big Brother Server to monitor the network. Big Brother runs as a Windows service and can be controlled through the Windows Services Microsoft Management Console (MMC) snap-in. Follow these steps to start the Big Brother service:
Deploying Nagios for Linux-Based Network Monitoring

Nagios is another Linux-based system and network-monitoring application that is popular in the open source community. Nagios provides the following:
Of the listed features, the ability to create network hierarchy is useful in minimizing the number of alerts during network outages. For example, consider a WAN router connected to four remote routers. When the WAN router is down, the connectivity to the remote routers is also affected. Without network hierarchy, the monitoring system generates alerts for all five devices. However, with the network hierarchy defined, the monitoring system generates alerts only for the parent device (the WAN router, in this example). Nagios also allows you to use MySQL or PostgreSQL databases to store data instead of using flat files.

Despite its versatility and features, Nagios is comparatively intensive in terms of deployment time and effort. This discussion is aimed at helping Netadmins to quickly understand and deploy a functional Nagios system using text files to store data. Such a system should suffice when monitoring a medium-size network with 50 to 500 nodes. The following sections provide details regarding Nagios installation, configuration, and usage.

Nagios Installation

Before installing Nagios, the TCP/IP stack on the target Linux machine should be preconfigured and the Linux machine should be connected to the network. The Nagios source code is available from the Nagios home page at http://www.nagios.org. To compile Nagios using the source code, the target Linux machine requires the following features:
Additionally, to use the database support in Nagios, you must also install and configure the MySQL or PostgreSQL database on the Linux machine. Installation of these programs is beyond the scope of this book. Refer to the documentation for your Linux distribution. However, Debian-Linux users can benefit from the simplified routine for installing Nagios by using the apt-get install nagios-text command. The entire discussion about Nagios in this chapter is based on deploying Nagios on a Debian-Linux system. Debian was chosen for the convenience and stability it provides.

Note: Your Debian system might give you the following error:

    E: Couldn't find package nagios-text

In this case, you should edit the /etc/apt/sources.list file to add the following statement:

    deb http://ftp.debian.org/debian unstable main non-free contrib

The apt-get install nagios-text command automatically installs Nagios and takes care of other system dependencies. By default, the Debian installation process defines nagiosadmin as the default user for administering Nagios. You must configure the password for the nagiosadmin user when prompted during the installation process. You can verify the installation by pointing your web browser to the Nagios machine using the URL http://Nagios-server-IP-address/nagios/. The default home page consists of the navigation pane at the left, as shown in Figure 5-3.

Figure 5-3. Nagios Home Page

The navigation pane provides links to documentation, monitoring, and reporting pages. The default navigation pane is organized into four subsections for easy usage. The username for accessing the Nagios web GUI is nagiosadmin; the password is the one that you set during the Nagios installation process. The default installation also includes detailed product documentation. Beginners are strongly encouraged to read this online manual.
Nagios Configuration

Nagios is configured through the .cfg files located in the default directory /etc/nagios. The main configuration file (/etc/nagios/nagios.cfg) controls the Nagios daemon and contains the location of other .cfg files. Nagios monitors the services (listed in the services.cfg file) running on the hosts listed in the hosts.cfg file. For ease of administration and notification, hosts can be grouped in the hostgroups.cfg file, which also defines the contact groups for each host group. The contactgroups.cfg file groups various contacts. The e-mail or pager information for each contact (individual recipient) is listed in the contacts.cfg file. To deploy a basic Nagios monitoring system, you must configure each of the following five files:
These configuration files are created automatically during the installation process. Each file contains sample templates for ease of configuration. Lines beginning with a hash (#) are treated as comments. Similarly, any text following a semicolon (;) is ignored. A detailed explanation of each of these files, as well as all the parameters included within these files, is provided in the online product documentation. For a better understanding of each of the .cfg files, consider the network scenario shown in Figure 5-4. The configuration of each of the subsequent .cfg files discussed in this chapter is based on this network scenario.

Figure 5-4. Sample Network Scenario

Editing the /etc/nagios/hosts.cfg File

The /etc/nagios/hosts.cfg file contains a list of every host to be monitored by Nagios. Example 5-10 provides the default contents of the /etc/nagios/hosts.cfg file.

Example 5-10. Default Contents of the /etc/nagios/hosts.cfg File

    define host{
            name                            generic-host
            notifications_enabled           1
            event_handler_enabled           0
            flap_detection_enabled          0
            process_perf_data               1
            retain_status_information       1
            retain_nonstatus_information    1
            register                        0
            }
    define host{
            use                     generic-host ; Name of host template to use
            host_name               gw
            alias                   Default Gateway
            address                 192.168.0.1
            check_command           check-host-alive
            max_check_attempts      20
            notification_interval   60
            notification_period     24x7
            notification_options    d,u,r
            }

Each host definition is contained within braces ({}). The first entry defines a generic host template. The second entry defines the default gateway of the Nagios server itself. Note that the host name gw is internally assigned by the Nagios installation process and should not be changed. The notification_options parameter specifies when to send notifications for the host. Possible values are d for down, u for unreachable, r for recovery to up state, and n for disabling notifications.
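To make the file layout concrete, the following Python sketch reads define blocks of the kind shown in Example 5-10 and spells out the notification_options letters. It is a simplified illustration under stated assumptions, not the parser that Nagios itself uses: the function name parse_nagios_objects is hypothetical, and the sketch handles only the features described in the text (braces, # comment lines, and ; trailing comments).

```python
import re

# Meaning of the host notification_options letters, as listed in the text.
HOST_NOTIFY = {"d": "down", "u": "unreachable", "r": "recovery", "n": "none"}


def parse_nagios_objects(text):
    """Return a list of (object_type, {directive: value}) tuples."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop text after a semicolon
        if not line or line.startswith("#"):
            continue  # blank line or comment line
        start = re.match(r"define\s+(\w+)\s*\{$", line)
        if start:
            current = (start.group(1), {})
        elif line == "}":
            if current is not None:
                objects.append(current)
            current = None
        elif current is not None:
            parts = line.split(None, 1)  # directive name, then its value
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


SAMPLE = """
define host{
        use                     generic-host ; Name of host template to use
        host_name               gw
        alias                   Default Gateway
        address                 192.168.0.1
        notification_options    d,u,r
        }
"""

for kind, fields in parse_nagios_objects(SAMPLE):
    options = [HOST_NOTIFY[o] for o in fields["notification_options"].split(",")]
    print(kind, fields["host_name"], fields["address"], options)
```

A script along these lines can be handy for listing which hosts would notify on which states before the file is loaded, although the Nagios documentation remains the authority on the full syntax.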
The u and d options, along with the parents parameter, are helpful in defining network hierarchy and controlling the alert notifications (as shown in Example 5-11). A host is considered unreachable if its parent host is down or unreachable. For example, if Nagios is monitoring a remote router and a web server located behind the router, the router is defined as the parent host for the web server. When the router goes down, the web server is considered unreachable. For a detailed explanation of each parameter, refer to the online product documentation.

Note: The configuration for gw is created automatically only during the installation and is static. If you move the Nagios server to a different subnet, you must manually change the IP address for gw to match the new network.

To add new hosts, you can use the configuration of the host gw as a template. Simply edit and append the configuration for each host to the existing /etc/nagios/hosts.cfg file. Example 5-11 shows the configuration snippet for the hosts depicted in the network scenario of Figure 5-4. Note that host dallas-router is connected to a remote host newyork-router through a WAN link. So the host dallas-router is defined as the parent for newyork-router, as shown in Example 5-11. This helps when creating network hierarchy. Additionally, only the down (d) and recovery (r) notification options are enabled for host newyork-router. Consequently, when newyork-router or the WAN link is not functional, Nagios generates alerts for newyork-router. On the other hand, when the parent device dallas-router is down, newyork-router is unreachable, and Nagios generates alerts only for dallas-router and not for newyork-router. Similar logic applies to hosts dallas-pix and dallas-vpn and their parent host gw (LAN-router) in Figure 5-4.

Note: While network hierarchy is a useful feature, it requires the appropriate personnel to have a thorough understanding of the network topology.
If you need to generate unreachable alerts for a device irrespective of the status of its parent device, simply add u to the notification_options parameter in the host definition.

Example 5-11. Contents of the hosts.cfg File

    # -- default entries for generic-host and gw should be included here --
    #
    # 'dallas-router' host definition
    define host{
            use                     generic-host ; Name of host template to use
            host_name               dallas-router
            alias                   Router-Dallas Cisco 1600
            address                 192.168.0.10
            check_command           check-host-alive
            max_check_attempts      20
            notification_interval   60
            notification_period     24x7
            notification_options    d,u,r
            }
    # 'newyork-router' host definition
    define host{
            use                     generic-host
            host_name               newyork-router
            alias                   Cisco 2600 router
            address                 192.168.254.2
            parents                 dallas-router
            check_command           check-host-alive
            max_check_attempts      5
            notification_interval   60
            notification_period     24x7
            notification_options    d,r
            }
    # 'dallas-pix' host definition
    define host{
            use                     generic-host
            host_name               dallas-pix
            alias                   Firewall pix535
            address                 192.168.1.2
            parents                 gw
            check_command           check-host-alive
            max_check_attempts      5
            notification_interval   60
            notification_period     24x7
            notification_options    d,r
            }
    # 'dallas-vpn' host definition
    define host{
            use                     generic-host
            host_name               dallas-vpn
            alias                   VPN3030 Concentrator
            address                 192.168.1.3
            parents                 gw
            check_command           check-host-alive
            max_check_attempts      5
            notification_interval   60
            notification_period     24x7
            notification_options    d,r
            }
    # 'web-server' host definition
    define host{
            use                     generic-host
            host_name               web-server
            alias                   Intranet web server
            address                 192.168.0.100
            check_command           check-host-alive
            max_check_attempts      5
            notification_interval   60
            notification_period     24x7
            notification_options    d,u,r
            }
    #

Editing the /etc/nagios/services.cfg File

After defining each host monitored by Nagios, you must specify the services to be monitored on each of these hosts. The services are defined in the /etc/nagios/services.cfg file. Example 5-12 provides the contents of a sample /etc/nagios/services.cfg file.
The first two definitions are default entries created by the Nagios installation process. The first definition creates a generic service template, a placeholder for system-specific settings that can be applied to all the services. The second entry is added by the Nagios installation process to monitor the availability of the default gateway through ICMP ping. Do not delete either of these entries. However, to monitor other hosts through ICMP ping, you can simply append their host names to the default entry for the host gw. The exact configuration is provided in Example 5-12. The third definition monitors the web service running on the host web-server. The notification_options parameter determines when to send notifications for the service; possible values are w for warning, u for unknown, c for critical, r for recovery, and n for disabling notifications. The service_description parameter refers to the predefined name of the service within Nagios. The contact_groups parameter defines the group of administrators responsible for the maintenance of the monitored host or service. The /etc/nagios/services.cfg file also contains built-in templates for monitoring other common services that are predefined by Nagios, such as SMTP, SSH, FTP, and POP3.

Example 5-12. Contents of the /etc/nagios/services.cfg File

    define service{
        ; The 'name' of this service template, referenced in other service definitions
        name                            generic-service
        active_checks_enabled           1   ; Active service checks are enabled
        passive_checks_enabled          0   ; Passive service checks are enabled/disabled
        parallelize_check               1   ; Active service checks should be parallelized
                                            ; (disabling this can lead to major performance problems)
        obsess_over_service             1   ; We should obsess over this service (if necessary)
        check_freshness                 0   ; Default is to NOT check service 'freshness'
        notifications_enabled           0   ; Service notifications are disabled
        event_handler_enabled           0   ; Service event handler is disabled
        flap_detection_enabled          0   ; Flap detection is disabled
        process_perf_data               1   ; Process performance data
        retain_status_information       1   ; Retain status information across program restarts
        retain_nonstatus_information    1   ; Retain non-status information across program restarts
        register                        0   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
    #
    # the PING service for 'gw' is created by the Nagios installation process
    # additional hosts can be added in the same line as "gw"
    define service{
        use                     generic-service     ; Name of service template to use
        host_name               gw, dallas-router, newyork-router, dallas-pix, dallas-vpn, web-server
        service_description     PING
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   1
        retry_check_interval    1
        contact_groups          router-admins, web-admins
        notification_interval   240
        notification_period     24x7
        notification_options    c,r
        check_command           check_ping!100.0,20%!500.0,60%
        }
    #
    # 'web-server' service for monitoring http
    #
    define service{
        use                     generic-service
        host_name               web-server
        service_description     HTTP
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          web-admins
        notification_interval   120
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_http
        }

Tip: Here is a timesaving tip for configuring the services.cfg file. Instead of individually adding each host to a service definition, you can specify the hostgroup definition. So instead of using the following:

    host_name HOST1,HOST2,HOST3,...,HOSTN

you can use the following:

    hostgroup_name HOSTGROUP1,HOSTGROUP2,...,HOSTGROUPN

Alternatively, you can include all the hosts by using the * character, as shown here:

    host_name *

Editing the /etc/nagios/hostgroups.cfg File

The /etc/nagios/hostgroups.cfg file groups similar hosts on the basis of features or functionality. The idea behind the hostgroups.cfg file is to identify a similar set of hosts and tie them to a group of administrators who should receive alerts related to those hosts. The host group definitions are also used in the status map pages that are created by Nagios. Each host group definition contains a list of hosts separated by a comma (,). A host can belong to multiple groups, but it must belong to at least one group.
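The rule that every host must belong to at least one host group can be checked with a few lines of script before Nagios is started. The following Python sketch is one possible way to do this, assuming the host names and group memberships have already been collected into plain Python structures; the function name hosts_without_group and the data layout are hypothetical and are not part of Nagios. The host and group names are modeled on this chapter's network scenario.

```python
from typing import Dict, Iterable, List


def hosts_without_group(hosts: Iterable[str],
                        hostgroups: Dict[str, List[str]]) -> List[str]:
    """Return, in sorted order, the hosts that appear in no host group."""
    # Collect every member name that appears in any group.
    grouped = {member for members in hostgroups.values() for member in members}
    return sorted(host for host in hosts if host not in grouped)


# Illustrative data: host names from the scenario, grouped by function.
hosts = ["gw", "dallas-router", "newyork-router",
         "dallas-pix", "dallas-vpn", "web-server"]
hostgroups = {
    "gateways": ["gw", "dallas-router", "newyork-router",
                 "dallas-pix", "dallas-vpn"],
    "webserver": ["web-server"],
}

if __name__ == "__main__":
    missing = hosts_without_group(hosts, hostgroups)
    if missing:
        print("Hosts not in any host group:", ", ".join(missing))
    else:
        print("Every host belongs to at least one host group.")
```

A script like this only complements the verification that Nagios performs on its own configuration files; it is a convenience for catching an ungrouped host early.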
Each host group definition also contains the contact_groups parameter, which specifies the group of Netadmins responsible for maintaining the hosts. When the status of a host changes, Nagios sends notifications to the contact groups of each host group to which the host belongs. Example 5-13 shows two host group definitions, each with a different group of contacts.

Example 5-13. Contents of the /etc/nagios/hostgroups.cfg File

    define hostgroup{
        hostgroup_name  gateways
        alias           Network devices
        contact_groups  router-admins
        members         gw, dallas-router, newyork-router, dallas-pix, dallas-vpn
        }
    #
    define hostgroup{
        hostgroup_name  webserver
        alias           Web servers
        contact_groups  web-admins
        members         web-server
        }

Editing the /etc/nagios/contactgroups.cfg File

The /etc/nagios/contactgroups.cfg file defines the contact groups that are used within the host group and service definitions. Each contact group definition must also list the individual members assigned to the group. Multiple member names must be separated by a comma (,). Example 5-14 shows the contents of the /etc/nagios/contactgroups.cfg file for two contact groups, router-admins and web-admins.

Example 5-14. Contents of the contactgroups.cfg File

    # 'router-admins' contact group definition
    define contactgroup{
        contactgroup_name   router-admins
        alias               Router and Network admins
        members             spope
        }
    # 'web-admins' contact group definition
    define contactgroup{
        contactgroup_name   web-admins
        alias               Web admins
        members             jkeith, spope
        }

Editing the /etc/nagios/contacts.cfg File

After defining the contact group, the next step is to define the information for each contact (or member). The contact information, such as e-mail address or pager number, is specified through the /etc/nagios/contacts.cfg file.
The value of the contact_name parameter in the /etc/nagios/contacts.cfg file must match the value specified for the members parameter in the /etc/nagios/contactgroups.cfg file. Example 5-15 provides the configuration of the /etc/nagios/contacts.cfg file for two contacts, spope and jkeith.

Example 5-15. Contents of the /etc/nagios/contacts.cfg File

    # 'spope' contact definition
    define contact{
        contact_name                    spope
        alias                           Network Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email,notify-by-epager
        host_notification_commands      host-notify-by-email,host-notify-by-epager
        email                           spope@abcinvestment.com
        pager                           111-222-3333@pagingcompany.net
        }
    # 'jkeith' contact definition
    define contact{
        contact_name                    jkeith
        alias                           web Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email,notify-by-epager
        host_notification_commands      host-notify-by-email,host-notify-by-epager
        email                           jkeith@abcinvestment.com
        pager                           111-222-4444@pagingcompany.net
        }

For details regarding other parameters included in the /etc/nagios/contacts.cfg file, refer to the Nagios product documentation.

Running Nagios

After you edit the .cfg files, Nagios is ready to be started. Before running Nagios, however, you should verify the configuration using the nagios -v main-config-file command. The -v option forces Nagios to parse each of the .cfg files and look for errors without loading the files for production use. Example 5-16 shows the output of the nagios -v command.

Example 5-16. Verifying the Nagios Configuration

    root@linuxbox:# nagios -v /etc/nagios/nagios.cfg
    Nagios 1.3
    Copyright 1999-2004 Ethan Galstad (nagios@nagios.org)
    Last Modified: 10-24-2004
    License: GPL
    Reading configuration data...
    Running pre-flight check on configuration data...
    Checking services...
    Checked 7 services.
    Checking hosts...
    Checked 6 hosts.
    Checking host groups...
    Checked 2 host groups.
    Checking contacts...
    Checked 2 contacts.
    Checking contact groups...
    Checked 2 contact groups.
    Checking service escalations...
    Checked 1 service escalations.
    Checking host group escalations...
    Checked 0 host group escalations.
    Checking service dependencies...
    Checked 0 service dependencies.
    Checking host escalations...
    Checked 0 host escalations.
    Checking host dependencies...
    Checked 0 host dependencies.
    Checking commands...
    Checked 90 commands.
    Checking time periods...
    Checked 4 time periods.
    Checking for circular paths between hosts...
    Checking for circular service execution dependencies...
    Checking global event handlers...
    Checking obsessive compulsive service processor command...
    Checking misc settings...
    Total Warnings: 0
    Total Errors: 0
    Things look okay - No serious problems were detected during the pre-flight check
    root@linuxbox:/etc/nagios#

Issues encountered during the parsing process are reported by Nagios and must be rectified. After verifying the configuration, you can start the Nagios daemon using the nagios main-config-file command. Debian users can also use the init script to start, stop, or restart Nagios, as shown here:

    root@linuxbox:~# /etc/init.d/nagios restart
    Stopping nagios: nagios.
    Starting nagios: nagios.

After the Nagios daemon is started, you can access the Nagios page for network monitoring, using the following URL:
The Nagios web GUI provides an HTML version of the product documentation. To view the Nagios product documentation, click the Documentation link in the navigation pane. Figure 5-5 shows the table of contents of the Nagios documentation web page.

Figure 5-5. Nagios Documentation Page

To view the current network status, click the Status Map link in the navigation pane. By default, the Status Map layout is circular, as shown in Figure 5-6. You can customize the layout by using the Layout Method drop-down menu in the upper-right corner of the Status Map page.

Figure 5-6. Nagios Status Map Page

You can click each of the nodes displayed on the Status Map page for more details on each of the hosts and associated services. Alternatively, you can click the Service Detail link in the navigation pane to view the service details for all the listed hosts, as shown in Figure 5-7.

Figure 5-7. Nagios Service Detail Page

The navigation pane in the Nagios web GUI also provides links for viewing host details, status summary, and status overview. When you monitor a larger network with Nagios, these status pages can get crowded. For such cases, the Nagios navigation pane provides two links: Service Problems and Host Problems. The Service Problems page displays only the hosts that are experiencing service issues. Figure 5-8 illustrates the Service Problems page, showing two service issues. The status of the PING service (on host dallas-vpn) and the HTTP service (on host web-server) is critical.

Figure 5-8. Nagios Service Problems Page

Similarly, the Host Problems page, shown in Figure 5-9, provides a list of all hosts that are currently down.

Figure 5-9. Nagios Host Problems Page

The Downtime page is useful for scheduling downtime for hosts and services that you are monitoring. When a host or service is in the scheduled downtime period, the notification feature for that host or service is disabled. Figure 5-10 shows the Downtime page. You can schedule a downtime for a host or service by clicking the Schedule host downtime and Schedule service downtime links, respectively. You can cancel a scheduled downtime by clicking the Recycling Bin icon at the right side of the particular entry.

Figure 5-10. Nagios Downtime Page

The Comments link in the navigation pane enables you to add arbitrary text as a comment for a host or service. Note that the Comments page automatically includes the comments that were added for a host or service in the Downtime page. Figure 5-11 shows the Comments page. As expected, the downtime comments are also included. You can delete a comment by clicking the Recycling Bin icon at the right of the particular comment.

Figure 5-11. Nagios Comments Page

The Process Info page provides the GUI for controlling the Nagios process, thus eliminating the need to learn the CLI options and commands. Some of the options provided by the Process Info page are as follows:
Figure 5-12 shows the Process Info page. The Process Information section provides a summary of the Nagios processes, including the process identification (PID), start time, and total running time.

Figure 5-12. Nagios Process Info Page

The Reporting section of the Nagios navigation pane provides options for creating customizable reports. The Trends page creates a trending report for a host or a monitored service. This report consists of a graph that shows the state of the monitored host or service over an arbitrary period of time. Figure 5-13 illustrates the trending report graph for the state of host dallas-pix over a period of 1 day.

Figure 5-13. Nagios Trends Page

Instead of a graphic illustration, you can also view the same report in numeric format by choosing the Availability page. Figure 5-14 shows the availability report for the state of the host dallas-pix. This is the same report as the previous trending report graph, but in numeric format.

Figure 5-14. Nagios Availability Page

The Event Log link in the Reporting section is another useful feature for Netadmins. This link shows the syslog messages generated by Nagios. These messages are useful for monitoring or troubleshooting the Nagios daemon. Figure 5-15 shows a sample event log screen.

Figure 5-15. Nagios Event Log Page

In the Configuration section of the navigation pane, the View Config link is handy for referring to the Nagios configuration. Instead of reading through the contents of each .cfg file, Netadmins can use this link to view the configuration in a tabular format. Figure 5-16 depicts the configuration details of the /etc/nagios/hosts.cfg file.

Figure 5-16. Nagios View Config Page: Hosts

Notes for Advanced Nagios Use

Nagios is highly customizable and provides a variety of features for advanced users. You should read the online product documentation that is included with the installation files.
To deploy Nagios in larger (enterprise-grade) networks, the online documentation includes tips such as the following:
Additionally, two useful tips for Cisco Netadmins are as follows: