5.1. HP's OpenView Network Node Manager

Network Node Manager (NNM) is a licensed software product. The package includes a feature called Instant-On that allows you to use the product for a limited time (60 days) while you are waiting for your real license to arrive. During this period, you are restricted to a 250-managed-node license, but the product's capabilities aren't limited in any other way. When you install the product, the Instant-On license is enabled by default.
5.1.1. Running NNM

To start the OpenView GUI on a Unix machine, define your DISPLAY environment variable and run the command $OV_BIN/ovw. This starts OpenView's NNM. If your NNM has performed any discovery, the nodes it has found should appear under your Internet (top-level) icon. If you have problems starting NNM, run the command $OV_BIN/ovstatus -c to check its status, and then use $OV_BIN/ovstart or $OV_BIN/ovstop to start or stop it. By default, NNM installs the necessary scripts to start its daemons when the machine boots. OpenView will perform all of its functions in the background, even when you aren't running any maps. This means that you do not have to keep a copy of NNM running on your console at all times, and you don't have to start it explicitly when your machine reboots.

When the GUI starts, it presents you with a clickable high-level map. This map, called the Root map, provides a top-level view of your network. The map gives you the ability to see your network without having to see every detail at once. If you want more information about any item in the display, whether it's a subnet or an individual node, click on it. You can drill down to see any level of detail you want; for example, you can look at an interface card on a particular node. The more detail you want, the more you click. Figure 5-1 shows a typical NNM map.

Figure 5-1. A typical NNM map

The menu bar (see Figure 5-2) allows you to traverse the map with a bit more ease. You have options such as closing NNM (the leftmost button, which resembles a closing door), going straight to the Home map (the second button from the left, which is, not surprisingly, a house),[*] the Root map (the third button from the left, a hierarchical diagram), the parent or previous map (the fourth button from the left, an up arrow), or the quick navigator (the fifth button from the left, a right arrow with two diverging arrows).[†] There is also a magnifying glass button that lets you pan through the map or zoom in on a portion of it.
Figure 5-2. OpenView NNM menu bar
5.1.2. The netmon Process

NNM's daemon process (netmon) starts automatically when the system boots and is responsible for discovering nodes on your network, in addition to a few other tasks. In NNM's menu, go to Options → Network Polling Configurations: IP. A window should appear that looks similar to Figure 5-3.

Figure 5-3 shows the General area of the configuration wizard. The other areas are IP Discovery, Status Polling, and Secondary Failures. The General area allows us to specify a filter (in this example, NOUSERS) that controls the discovery process; we might not want to see every device on the network. We discuss the creation of filters in "Using OpenView Filters," later in this chapter. We elected to discover beyond the license limit, which means that NNM will discover more objects on our network than our license allows us to manage. "Excess" objects (objects past the license's limit) are placed in an unmanaged state so that you can see them on your maps but can't control them through NNM. This option is useful when your license limits you to a specific number of managed nodes.

Figure 5-3. OpenView's General network polling configuration options

The IP Discovery area (Figure 5-4) lets us enable or disable the discovery of IP nodes. Using the "auto adjust" discovery feature allows NNM to figure out how often to probe the network for new devices. The more new devices it finds, the more often it polls; if it doesn't find any new devices, it slows down, eventually waiting one day (1d) before checking for any new devices. If you don't like the idea that the discovery interval varies (or, perhaps more realistically, if you think that probing the network to find new devices will consume more resources than you like, either on your network management station or on the network itself), you can specify a fixed discovery interval. Finally, the Discover Level-2 Objects button tells NNM to discover and report devices that are at the second layer of the OSI network model.
This category includes things such as unmanaged hubs and switches, many AppleTalk devices, and so on.

Figure 5-4. OpenView's IP Discovery network polling configuration options

Figure 5-5 shows the Status Polling configuration area. Here you can turn status polling on or off and delete nodes that have been down or unreachable for a specified length of time. The example in Figure 5-5 is configured to delete nodes after they've been down for one week (1w).

Figure 5-5. OpenView's Status Polling network polling configuration options

The DHCP polling options are especially useful in environments that use DHCP. They allow you to establish a relationship between polling behavior and IP addresses. You can specify a filter that selects addresses that are assigned by DHCP. Then you can specify a time after which netmon will delete nonresponding DHCP addresses from its map of your network. If a device is down for the given amount of time, netmon disassociates the node and IP address. The rationale for this behavior is simple: in a DHCP environment, the disappearance of an IP address often means that the node has received a new IP address from a DHCP server. In that case, continuing to poll the old address is a waste of effort and is possibly even misleading, since the address may be reassigned to a different host.

Finally, the Secondary Failures configuration area shown in Figure 5-6 allows you to tell the poller how to react when it sees a secondary failure. This occurs when a node beyond a failed device is unreachable; for example, when a router goes down, the file server that is connected via one of the router's interfaces becomes unreachable. In this configuration area, you can state whether to show alarms for secondary failures or suppress them. If you choose to suppress them, you can set up a filter that identifies important nodes in your network that should not be suppressed even if they are deemed secondary failures.

Figure 5-6. OpenView's Secondary Failures network polling configuration options

Once your map is up, you may notice that nothing is getting discovered. Initially, netmon won't discover anything beyond the network segment to which your NMS is attached. If your NMS has an IP address of 24.92.32.12, you will not discover your devices on 123.67.34.0. NNM finds adjacent routers and their segments, as long as they are SNMP compatible, and places them in an unmanaged (tan colored) state on the map.[*] This means that anything in and under that icon will not be polled or discovered. Selecting the icon and going to Edit → Manage Objects tells NNM to begin managing this network, so that the objects in and under the icon can be polled and discovered.

[*] In NNM, go to Help → Display Legend for a list of icons and their colors.

If your routers do not show any adjacent networks, you should try testing them with Fault → Test IP/TCP/SNMP. Add the name of your router, click Restart, and see what kind of results you get back. If you get a message that says "OK except for SNMP," read Chapter 6 as well as the next section in this chapter, which discusses setting up the default community names within OpenView.

netmon also allows you to specify a seed file that helps it to discover objects faster. The seed file contains individual IP addresses, IP address ranges, or domain names that narrow the scope of hosts that are discovered. You can create the seed file with any text editor; just put one address or hostname on each line. Placing the addresses of your gateways in the seed file sometimes makes the most sense, since gateways maintain ARP tables for your network. netmon will subsequently discover all the other nodes on your network, thus freeing you from having to add all your hosts to the seed file. For more information, see the documentation for the -s switch to netmon and the Local Registration Files (LRFs).

NNM has another utility, called loadhosts, that lets you add nodes to the map one at a time.
Here is an example of how you can add hosts, in a sort of freeform mode, to the OpenView map. Note the use of the -m option, which sets the subnet mask to 255.255.255.0:

    $ loadhosts -m 255.255.255.0
    10.1.1.12 gwrouter1

Once you have finished adding as many nodes as you'd like, press Ctrl-D to exit the command.

5.1.3. Configuring Polling Intervals

The SNMP Configuration page is located off the main screen under Options → SNMP Configuration. A window similar to the one in Figure 5-7 should appear. This window has four sections: Specific Nodes, IP Address Wildcards, Default, and the entry area (cropped in this example). Each section contains the same general areas: Node or IP Address, Get Community, Set Community, Proxy (if any), Timeout, Retry, Port, and Polling. The Default area, which is unfortunately at the bottom of the screen, sets up the default behavior for SNMP on your network; that is, the behavior (community strings, etc.) for all hosts that are neither listed as "specific nodes" nor matched by one of the wildcards. The Specific Nodes section allows you to specify exceptions on a per-node basis. The IP Address Wildcards section allows you to configure properties for a range of addresses. This is especially useful if you have networks that have different get and set community names.[*] All areas allow you to specify a Timeout in seconds and a Retry value. The Port field gives you the option of inserting a different port number (the default port is 161). Polling is the frequency at which you would like to poll your nodes.
Figure 5-7. OpenView's SNMP Configuration page

It's important to understand how timeouts and retries work. If we look at Specific Nodes, we see a Timeout of .9 seconds and a Retry of 2 for 208.166.230.1. If OpenView doesn't get a response within .9 seconds, it tries again (the first retry) and waits 1.8 seconds. If it still doesn't get anything back, it doubles the timeout period again to 3.6 seconds (the second retry); if it still doesn't get anything back, it declares the node unreachable and paints it red on NNM's map. With these Timeout and Retry values, it takes about 6 seconds (.9 + 1.8 + 3.6 = 6.3) to identify an unreachable node.

Imagine what would happen if we had a Timeout of 4 seconds and a Retry of 5. By the fifth retry, we would be waiting 128 seconds, and the total process would take 252 seconds (4 + 8 + 16 + 32 + 64 + 128). That's more than four minutes! For a mission-critical device, four minutes can be a long time for a failure to go unnoticed.

This example shows that you must be very careful about your Timeout and Retry settings, particularly in the Default area, because these settings apply to most of your network. Setting your Timeout and Retry too high and your Polling periods too low will make netmon fall behind; it will be time to start over before the poller has worked through all your devices.[*] This is a frequent problem when you have many nodes, slow networks, short polling intervals, and high numbers for Timeout and Retry.[†] Once a system falls behind, it will take a long time to discover problems with the devices it is currently monitoring, as well as to discover new devices. In some cases, NNM may not discover problems with downed devices at all! If your Timeout and Retry values are set inappropriately, you won't be able to find problems and you will be unable to respond to outages.
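The doubling arithmetic described above is easy to check for any pair of settings. The following Python sketch is ours, not part of NNM; the function names are made up for illustration, and it assumes only the behavior described in the text (the first attempt waits Timeout seconds, and each retry waits twice as long as the attempt before it).

```python
def probe_waits(timeout, retries):
    """Return the wait, in seconds, for the first attempt and each retry.

    Assumes the doubling behavior described in the text: the first
    attempt waits `timeout` seconds, and every retry waits twice as
    long as the attempt before it.
    """
    return [timeout * 2 ** attempt for attempt in range(retries + 1)]


def time_to_declare_unreachable(timeout, retries):
    """Total seconds that pass before the node is declared unreachable."""
    return sum(probe_waits(timeout, retries))


if __name__ == "__main__":
    # The two Timeout/Retry pairs discussed in the text.
    for timeout, retries in [(0.9, 2), (4, 5)]:
        waits = probe_waits(timeout, retries)
        total = time_to_declare_unreachable(timeout, retries)
        print(f"Timeout={timeout} Retry={retries}: waits={waits} total={total:.1f}s")
```

Running a few candidate values through a calculation like this before changing the Default area gives a quick feel for how long a failed node could go unnoticed.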
Falling behind can be very frustrating. We recommend starting your Polling period very high and working your way down until you feel comfortable. Ten to twenty minutes is a good starting point for the Polling period. During your initial testing phase, you can always set a wildcard range for your test servers, etc.

5.1.4. A Few Words About NNM Map Colors

By now, discovery should be taking place, and you should be starting to see some new objects appear on your map. You should see a correlation between the colors of these objects and the colors in NNM's Event Categories (see Chapter 9 for more about Event Categories). If a device is reachable via ping, its color will be green. If the device cannot be reached, it will turn red. If something "underneath" the device fails, the device will become off-green, indicating that the device itself is OK, but something underneath it has a non-normal status. For example, a router may be working, but a web server on the LAN behind it may have failed. The status source for an object like this is Compound or Propagated. (The other types of status source are Symbol and Object.) The Compound status source is a great way to see if there is a problem at a lower level while still keeping an eye on the big picture. It alerts you to the problem and allows you to start drilling down until you reach the object that is under duress.

It's always fun to shut off or unplug a machine and watch its icon turn red on the map. This can be a great way to demonstrate the value of the new management system to your boss. You can also learn how to cheat and make OpenView miss a device, even though it was unplugged. With a relatively long polling interval, it's easy to unplug a device and plug it back in before OpenView has a chance to notice that the device isn't there. By the time OpenView gets around to it, the node is back up and looks fine. Long polling intervals make it easy to miss such temporary failures.
Shorter polling intervals make it less likely that OpenView will miss something, but more likely that netmon will fall behind and, in turn, miss other failures. Take small steps so as not to crash or overload netmon or your network.

5.1.5. Using OpenView Filters

Your map may include some devices you don't need, want, or care about. For example, you may not want to poll or manage users' PCs, particularly if you have many users and a limited license. It may be worthwhile for you to ignore these user devices to open more slots for managing servers, routers, switches, and other more important devices. netmon has a filtering mechanism that allows you to control precisely which devices you manage. It lets you filter out unwanted devices, cleans up your maps, and can reduce the amount of management traffic on your network.

In this book, we warn you repeatedly that polling your network the wrong way can generate huge amounts of management traffic. This happens when people or programs use default polling intervals that are too short for the network or the devices on the network to handle. For example, a management system might poll every node in your 10.1.0.0 network (conceivably thousands of them) every two minutes. The poll may consist of SNMP get or set requests, simple pings, or both. OpenView's NNM uses a combination of these to determine if a node is up and running. Filtering saves you (and your management) the trouble of having to pick through a lot of useless nodes and reduces the load on your network. Using a filter allows you to keep the critical nodes on your network in view. It allows you to poll the devices you care about and ignore the devices you don't care about. The last thing you want is to receive notification each time a user turns off his PC when he leaves for the night.

Filters also streamline network management by letting you exclude DHCP users from network discovery and polling. DHCP and BOOTP are used in many environments to manage large IP address pools.
While these protocols are useful, they can make network management a nightmare, since it's often hard to figure out what's going on when addresses are being assigned, deallocated, and recycled. In our environment, we use DHCP only for our users. All servers and printers have hardcoded IP addresses. With our setup, we can specify all the DHCP clients and then state that we want everything but these clients in our discovery, maps, etc. The following example should get most users up and running with some pretty good filtering. Take some time to review OpenView's "A Guide to Scalability and Distribution for HP OpenView Network Node Manager" manual for more in-depth information on filtering.[*]
The default filter file, which is located in $OV_CONF/C, is broken up into three sections: Sets, Filters, and FilterExpressions.
In addition, lines that begin with // are comments. // comments can appear anywhere; some of the other statements have their own comment fields built in.

Sets allow you to place individual nodes into a group. This can be useful if you want to separate users based on their geographic locations, for example. You can then use these groups or any combination of IP addresses to specify your Filters, which are also grouped by name. You can then take all of these groupings and combine them into FilterExpressions. If this seems a bit confusing, it is! Filters can be very confusing, especially when you add complex syntax and not-so-logical logic (&&, ||, etc.). The basic syntax for defining Sets, Filters, and FilterExpressions looks like this:

    name "comments or description" { contents }

Every definition contains a name, followed by comments that appear in double quotes, and then the contents surrounded by braces. Our default filter,[*] named filters, is located in $OV_CONF/C and looks like this:
    // lines that begin with // are considered COMMENTS and are ignored!
    // Beginning of MyCompanyName Filters

    Sets {
        dialupusers "DialUp Users" { "dialup100", "dialup101", \
                                     "dialup102" }
    }

    Filters {
        ALLIPRouters "All IP Routers" { isRouter }

        SinatraUsers "All Users in the Sinatra Plant" { \
            ("IP Address" ~ 199.127.4.50-254) || \
            ("IP Address" ~ 199.127.5.50-254) || \
            ("IP Address" ~ 199.127.6.50-254) }

        MarkelUsers "All Users in the Markel Plant" { \
            ("IP Address" ~ 172.247.63.17-42) }

        DialAccess "All DialAccess Users" { "IP Hostname" in dialupusers }
    }

    FilterExpressions {
        ALLUSERS "All Users" { SinatraUsers || MarkelUsers || DialAccess }

        NOUSERS "No Users" { !ALLUSERS }
    }

Now let's break down this file into pieces to see what it does.

5.1.5.1. Sets

First, we defined a Set[†] called dialupusers containing the hostnames (from DNS) that our dial-up users will receive when they dial into our facility. These are perfect examples of things we don't want to manage or monitor in our OpenView environment.
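If the boolean logic in the sample file is hard to follow, it can help to restate it in an ordinary programming language. The Python sketch below is only our paraphrase of the sample file; it is not how netmon evaluates filters, and every function name in it is made up for illustration. It assumes that an expression such as "IP Address" ~ 199.127.4.50-254 matches any address whose first three octets are 199.127.4 and whose last octet falls between 50 and 254.

```python
import ipaddress

# Hostnames from the dialupusers Set in the sample filter file.
DIALUP_USERS = {"dialup100", "dialup101", "dialup102"}


def in_last_octet_range(ip, prefix, low, high):
    """True if ip is prefix.N with low <= N <= high (e.g., 199.127.4.50-254)."""
    # IPv4Address() raises ValueError on malformed input.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(octets[:3]) == prefix and low <= int(octets[3]) <= high


def sinatra_users(ip):
    # SinatraUsers: three address ranges joined with ||
    return any(in_last_octet_range(ip, f"199.127.{n}", 50, 254) for n in (4, 5, 6))


def markel_users(ip):
    # MarkelUsers: a single address range
    return in_last_octet_range(ip, "172.247.63", 17, 42)


def dial_access(hostname):
    # DialAccess: "IP Hostname" in dialupusers
    return hostname in DIALUP_USERS


def all_users(ip, hostname):
    # ALLUSERS { SinatraUsers || MarkelUsers || DialAccess }
    return sinatra_users(ip) or markel_users(ip) or dial_access(hostname)


def no_users(ip, hostname):
    # NOUSERS { !ALLUSERS }: a node passes only if it is not a user machine.
    return not all_users(ip, hostname)
```

With this paraphrase, a node at 199.127.5.200 is rejected by NOUSERS because it falls in a Sinatra user range, while a node at 199.127.4.10 passes because its last octet is below 50 and its hostname is not in the dial-up Set.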