Simple Network Management Protocol (SNMP) | Upgrading and Repairing Networks (5th Edition)

Building a network today involves integrating products from various vendors. This chapter has discussed tools that can be used to locate faults in the physical elements that make up the network and tools that can be used to monitor the functioning of network protocols.

Yet, so far the tools that have been mentioned are all limited to performing a few specific tasks, and each tool must be used as a separate entity. SNMP was developed to provide a "simple" method of centralizing the management of TCP/IP-based networks. The goals of the original SNMP protocols include the following:

Keep development costs low to ease the burden of implementing the protocol for developers.
Provide for managing devices remotely.
Make the protocol extensible so that it can adapt to new technologies.
Make the protocol independent of the underlying architecture of the devices that are managed.
Keep it simple.

The last goal is an important one. Because SNMP is meant to be incorporated into many types of network devices, it was designed so that it would not require a lot of overhead. This makes it easy to create simple devices, such as a bridge or a hub, that can be managed by SNMP, as well as more complex devices such as a router or a switch. Other key features of the protocol that support this goal include the use of the User Datagram Protocol (UDP) for messaging and a manager-agent architecture. UDP is easier to implement and use than a more complex protocol such as TCP, yet it provides enough functionality to allow a central manager to communicate with a remote agent that resides on a managed device.

The two main players in SNMP are the manager and the agent. The manager is usually a software program running on a workstation or larger computer that communicates with agent processes that run on each device being monitored. Agents can be found on bridges, routers, hubs, and even users' workstations. The manager polls the agents, making requests for information, and the agents respond when asked.

Applications designed to be the manager end of the SNMP software vary in both expense and functionality. Some are simple applications that perform queries and allow an administrator to view information from devices and produce reports. Some of the other functions that a management console application might perform include the following:

Mapping the topology of the network
Monitoring network traffic
Trapping selected events and producing alarms
Reporting variables

Some management consoles, also referred to as network management stations (NMS), can produce trend-analysis reports to help capacity planning set long-range goals. With more advanced reporting capabilities, the administrator can produce meaningful reports that can be used to tackle a specific problem.

SNMP Primitives

Management software and device agents communicate using a limited set of operations referred to as primitives. These primitives are used to make requests and send information between the two. The primitives initiated by the management software include the following:

get: The manager uses this primitive to get a single piece of information from an agent.
get-next: When the data the manager needs to get from the agent consists of more than one item (for example, a table of values), this primitive is used to retrieve the data sequentially.
set: The manager can use this primitive to request that the agent running on the remote device set a particular variable to a certain value.

The following primitives are used by the agent on a managed device:

get-response: This primitive is used to respond to a get or a get-next request from the manager.
trap: Although SNMP exchanges are usually initiated by the manager software, this primitive is used when the agent needs to inform the manager of some important event.
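The five primitives can be pictured with a small in-memory model. The following is only a sketch: ToyAgent, walk, and the sample values are hypothetical names invented for illustration, no packets are sent, and a real agent would encode these exchanges as SNMP messages over UDP.

```python
from bisect import bisect_right


class ToyAgent:
    """In-memory stand-in for an SNMP agent (hypothetical, no network I/O)."""

    def __init__(self, objects, writable=()):
        # OIDs are stored as tuples of ints so that they sort in MIB order.
        self.objects = dict(objects)
        self.writable = set(writable)
        self.trap_sink = None  # callable the agent uses to notify the manager

    def get(self, oid):
        """get: return a single value as a get-response pair."""
        return oid, self.objects[oid]

    def get_next(self, oid):
        """get-next: return the first object that sorts after `oid`."""
        ordered = sorted(self.objects)
        index = bisect_right(ordered, oid)
        if index == len(ordered):
            return None  # walked off the end of the MIB
        successor = ordered[index]
        return successor, self.objects[successor]

    def set(self, oid, value):
        """set: change a value only if the object is writable."""
        if oid not in self.writable:
            raise PermissionError(f"{oid} is not writable")
        self.objects[oid] = value
        return oid, value

    def raise_trap(self, event):
        """trap: agent-initiated message sent without a prior request."""
        if self.trap_sink is not None:
            self.trap_sink(event)


def walk(agent, prefix):
    """Manager-side loop: repeat get-next until the subtree is exhausted."""
    rows, oid = [], prefix
    while True:
        response = agent.get_next(oid)
        if response is None or response[0][:len(prefix)] != prefix:
            return rows
        rows.append(response)
        oid = response[0]


# Instances of three MIB-II system-group objects (values are made up).
SYS_DESCR = (1, 3, 6, 1, 2, 1, 1, 1, 0)
SYS_CONTACT = (1, 3, 6, 1, 2, 1, 1, 4, 0)
SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)

agent = ToyAgent(
    {SYS_DESCR: "toy hub", SYS_CONTACT: "nobody", SYS_NAME: "hub-1"},
    writable={SYS_CONTACT, SYS_NAME},
)
received_traps = []
agent.trap_sink = received_traps.append

agent.set(SYS_CONTACT, "ops@example.com")
system_rows = walk(agent, (1, 3, 6, 1, 2, 1, 1))
agent.raise_trap("linkDown on port 3")
```

The walk function shows why get-next suits tables: the manager does not need to know in advance how many rows exist, only where the subtree starts.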

Network Objects: The Management Information Base (MIB)

The primitives just described are the operations that can be performed by the manager or agent processes when they exchange data. The types of data they can exchange are defined by a database called the management information base (MIB). The first compilation of the objects stored in this database was defined by RFC 1066, "Management Information Base for Network Management of TCP/IP-based Internets." This was later superseded by RFC 1213, "Management Information Base for Network Management of TCP/IP-based Internets: MIB-II." MIB-II clarified some of the objects that were defined in the original document and added a few new ones. Two other RFCs, 2011 and 2012, added further information for the MIB-II database.

The MIB is a tree of information (a virtual information store). This hierarchical database resides on the agent, and information collected by the agent is stored in the MIB. The MIB is precisely defined; the current Internet standard MIB contains more than a thousand objects. Each object in the MIB represents some specific entity on the managed device. For example, on a hub, useful objects might collect information showing the number of packets entering the hub for a specific port while another object might track network addresses.

When deciding which types of objects to include in the standard, the following things were taken into consideration:

The object had to be useful for either fault or configuration management.
The object had to be "weak," which means that it had to be capable of performing only a small amount of damage should it be tampered with. Remember, in addition to reading values stored in the MIB, the management software can request that an object be set to a value.
No object was allowed if it could be easily derived from objects that already exist.

The first definition of the standard MIB hoped to keep the number of objects to 100 or fewer so that it would be easier to implement. This, of course, is not a factor now.

Because the SNMP management scheme is intended to be extensible, vendors often create their own objects that can be added to the management console software so that you can use them.

An object has a specific syntax, name, and method of encoding associated with it. The name consists of an object identifier, which specifies the type of object; a specific instance of that kind of object is identified by extending it. The object identifier is a numeric string of decimal digits separated by periods (for example, "1.3.6.1.2.1.1.1"). The "instance" of an object is the same string with an additional decimal number following the original object identifier. To make things easier for humans, an object descriptor in a text-readable format is also used.
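The relationship between an object identifier, an instance, and a descriptor can be sketched in a few lines. The helper names parse_oid and describe are hypothetical, and the lookup table holds only three MIB-II system-group objects; a real management console would load these names from MIB definition files.

```python
# Descriptors for a few MIB-II system-group objects.
DESCRIPTORS = {
    (1, 3, 6, 1, 2, 1, 1, 1): "sysDescr",
    (1, 3, 6, 1, 2, 1, 1, 3): "sysUpTime",
    (1, 3, 6, 1, 2, 1, 1, 5): "sysName",
}


def parse_oid(text):
    """Turn a dotted string into a tuple of integers (a leading dot is allowed)."""
    return tuple(int(part) for part in text.strip(".").split("."))


def describe(text):
    """Translate a dotted identifier into 'descriptor.instance' form."""
    oid = parse_oid(text)
    # Try the longest known prefix first; what remains is the instance suffix.
    for cut in range(len(oid), 0, -1):
        name = DESCRIPTORS.get(oid[:cut])
        if name is not None:
            suffix = ".".join(str(number) for number in oid[cut:])
            return f"{name}.{suffix}" if suffix else name
    return text  # unknown object: leave the numeric form unchanged


object_type = describe("1.3.6.1.2.1.1.1")
instance = describe("1.3.6.1.2.1.1.1.0")
```

Here the object type resolves to the descriptor sysDescr, and the instance, which carries one extra number, resolves to sysDescr.0.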

An object can be read-only, read-write, or write-only. In addition, an object can be not-accessible. Syntax types for objects include the following:

Integer
Octet String or Display String
Object Identifier
Null
Network Address
Counter
Gauge
TimeTicks
Opaque
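Two of these types need a little arithmetic when a manager interprets them. In SNMPv1, a Counter is a 32-bit non-negative integer that increases until it reaches its maximum and then wraps around to zero, and TimeTicks counts hundredths of a second. The sketch below uses hypothetical helper names and assumes that the counter wrapped at most once between two polls and that the agent did not restart in between.

```python
COUNTER_MODULUS = 2 ** 32  # a 32-bit Counter wraps back to zero after 2**32 - 1


def counter_delta(previous, current):
    """Increase between two polls of a Counter, allowing for one wraparound."""
    return (current - previous) % COUNTER_MODULUS


def timeticks_to_seconds(ticks):
    """TimeTicks are hundredths of a second."""
    return ticks / 100


normal = counter_delta(100, 250)
wrapped = counter_delta(4294967290, 10)
uptime_seconds = timeticks_to_seconds(6000)
```

A Gauge, by contrast, can move in both directions, so its current value is read directly rather than as a difference between polls.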

In the first MIB RFC, objects are divided into only a few high-level groups:

System: This group includes objects that identify a type of system (hardware or software).
Interfaces: An object in this group might represent an interface number or an interface type. Other information about network interfaces, such as the largest IP datagram that can be sent or received, is included as objects in this group.
Address Translation: Objects in this group are used for address translation information, such as the ARP (Address Resolution Protocol) cache.
IP: Objects in this group supply information about the IP protocol, including time-to-live values, number of datagrams received from interfaces, errors, and so on.
ICMP: This group includes Internet Control Message Protocol (ICMP) input and output statistics.
TCP: Objects in this group are used to hold information about TCP connections. Instances of these objects exist only while the connection exists. Data contained in these objects includes the number of segments sent or received, for example, or the state of a particular TCP connection (closed, listen, and so on).
UDP: Objects in this group represent statistics about UDP, such as the number of UDP datagrams delivered, or the number of UDP datagrams received for which there is no corresponding application at the destination port.
EGP: These objects are used for the Exterior Gateway Protocol (EGP), and they contain information such as components of each EGP neighbor, and the state of the local system with respect to a neighbor.

In MIB-II, the address translation group was declared to be "deprecated." That is, it should still be supported but might not be in the next version, which is a means for gradually preparing for changes in the protocol. MIB-II, however, adds new objects and functionality that can be used to perform the same functions as those performed by this group, just in a different way.

MIB-II also added new objects to the existing groups. For example, what seems obvious now as necessary information for the system group (a contact person, a system location, and system services) can now be stored in objects in this group.

New groups added by MIB-II include these:

Transmission: Related to the Interface group, this group is used for objects that relate to specific transmission media.
SNMP: A group added for objects needed by the application-oriented working group to collect useful statistical information.

Proxy Agents

Not all devices are equipped with SNMP capabilities. For these devices, another device might be able to handle those functions and act as a proxy agent so that the device can still be managed from the SNMP management console. For example, a network card might not be SNMP-enabled, but the host computer can run a process that can monitor the network card and act as a proxy agent, relaying information to the management station. Proxy agents also can be developed to translate between proprietary management software and SNMP. In this case the proxy agent understands the proprietary management capabilities of the device, and communicates with the SNMP management station when necessary.
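The translation role of a proxy agent can be sketched as a thin adapter. Everything here is hypothetical: LegacyCard stands in for a device with a proprietary interface, the register names and values are invented, and the two identifiers are instances of the MIB-II ifInOctets and ifOutOctets objects for interface 1.

```python
class LegacyCard:
    """Hypothetical non-SNMP device with its own proprietary interface."""

    def __init__(self):
        self._registers = {"rx_octets": 120000, "tx_octets": 95000}

    def read_register(self, name):
        return self._registers[name]


class ProxyAgent:
    """Answers get requests on behalf of a device that cannot speak SNMP."""

    def __init__(self, device, oid_map):
        self.device = device
        self.oid_map = oid_map  # object identifier -> proprietary register name

    def get(self, oid):
        if oid not in self.oid_map:
            raise KeyError(f"no such object: {oid}")
        # Translate the standard identifier into the proprietary call.
        return oid, self.device.read_register(self.oid_map[oid])


proxy = ProxyAgent(
    LegacyCard(),
    {
        "1.3.6.1.2.1.2.2.1.10.1": "rx_octets",  # ifInOctets, interface 1
        "1.3.6.1.2.1.2.2.1.16.1": "tx_octets",  # ifOutOctets, interface 1
    },
)
reply = proxy.get("1.3.6.1.2.1.2.2.1.10.1")
```

From the management station's point of view the reply looks like any other get-response; only the proxy knows that the value came from a proprietary register.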

The Complex Road to SNMPv2 and SNMPv3

The original implementation of SNMP was kept simple and has been widely used throughout the industry. However, it suffers from several limitations. The get/response messaging mechanism allows for the transfer of only one piece of information at a time. The UDP packet is large enough to accommodate more data, but the protocol was not built to allow for this. Security is also an issue with SNMP (version 1) because it has no provisions for encryption or authentication.

A committee of the IETF began work on what was to become SNMPv2 in 1994. Work on this second version of the SNMP standard was delayed for years because many could not agree on some of the security and other issues involved. Because of this, several versions of SNMPv2 were created, specifically SNMPv2u and SNMPv2c, each taking a different approach to security issues. In spite of the haziness of the actual SNMPv2 specifications, however, you'll now find that many vendors support some of the functionality that has been described in the many RFCs that relate to SNMPv2. RFC 1901, "Introduction to Community-Based SNMPv2," is the current Experimental Standard that is implemented by some vendors as if it were an approved standard. See also RFCs 1905 and 1906.

Among the good things to come out of the SNMPv2 debate were two new operations:

Get-bulk: This operation allows for the retrieval of a larger amount of information from a single request. This new operation can be used in place of repetitive calls to get-next when transferring large amounts of related information.
Inform: This operation allows for one network management station (NMS) to send traps to another NMS.
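The saving that get-bulk offers over repeated get-next calls is easy to see by counting requests. The sketch below is a simplified in-memory model, not the real protocol: CountingAgent and fetch_column are hypothetical names, the table values are invented, and the max_repetitions argument only loosely mirrors the get-bulk field of the same name.

```python
from bisect import bisect_right


class CountingAgent:
    """Toy agent that counts how many requests the manager sends."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.ordered = sorted(self.objects)  # OID tuples in MIB order
        self.requests = 0

    def _successors(self, oid, limit):
        start = bisect_right(self.ordered, oid)
        return [(o, self.objects[o]) for o in self.ordered[start:start + limit]]

    def get_next(self, oid):
        self.requests += 1
        return self._successors(oid, 1)

    def get_bulk(self, oid, max_repetitions):
        self.requests += 1
        return self._successors(oid, max_repetitions)


def fetch_column(agent, prefix, step):
    """Collect every object under `prefix`, asking for `step` rows per request."""
    rows, oid = [], prefix
    while True:
        batch = agent.get_bulk(oid, step) if step > 1 else agent.get_next(oid)
        inside = [row for row in batch if row[0][:len(prefix)] == prefix]
        rows.extend(inside)
        if len(inside) < step:  # ran out of rows or left the subtree
            return rows
        oid = inside[-1][0]


# A 24-port device with two columns of interface counters (made-up values).
IF_IN_OCTETS = (1, 3, 6, 1, 2, 1, 2, 2, 1, 10)
IF_IN_UCAST_PKTS = (1, 3, 6, 1, 2, 1, 2, 2, 1, 11)
table = {}
for port in range(1, 25):
    table[IF_IN_OCTETS + (port,)] = port * 1000
    table[IF_IN_UCAST_PKTS + (port,)] = port * 10

one_at_a_time = CountingAgent(table)
rows_next = fetch_column(one_at_a_time, IF_IN_OCTETS, step=1)

bulk = CountingAgent(table)
rows_bulk = fetch_column(bulk, IF_IN_OCTETS, step=10)
```

Both calls return the same 24 rows, but the get-next loop needs 25 requests (the last one discovers the end of the column) while the get-bulk loop with ten repetitions needs 3.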

For the most part, however, a newer version called SNMPv3 is a more likely candidate for adoption by a wider range of vendors. Although SNMPv1 and SNMPv2 implementations are not compatible with each other, SNMPv3 incorporates the best from both, adding security and other features to the protocol. Actually, SNMPv3 is still being developed, and only some RFCs are considered standards.

RFC 2571, "An Architecture for Describing SNMP Management Frameworks," uses the previous SNMP RFCs heavily, with the following items being the main goals of the RFC:

Provide an architecture that allows for the standards process for SNMP developments to proceed even when consensus has not been reached for all the specifics of proposed additions.
Provide for additional security measures.
Modularize each SNMP entity so that each "SNMP engine" can implement the necessary functions to send and receive messages, perform authentication, and perform encryption of messages. Thus, any number of entities can be combined to create an agent, or a management station.

By allowing for a modular approach to SNMP construction, this RFC makes it possible to create new SNMP functionality without having to redefine the entire SNMP standard each time a new feature is added. After all, the "S" in SNMP stands for "simple."
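The modular idea can be illustrated with an engine whose security function is a replaceable part. This is a loose sketch only: Engine, NoAuth, and HmacAuth are hypothetical names, the message is a plain Python tuple rather than the SNMPv3 wire format, and HMAC with SHA-256 is chosen purely for illustration rather than taken from the User-Based Security Model.

```python
import hashlib
import hmac


class NoAuth:
    """Security module that performs no authentication at all."""

    def sign(self, payload):
        return b""

    def verify(self, payload, tag):
        return True


class HmacAuth:
    """Security module that authenticates messages with a shared key."""

    def __init__(self, key):
        self.key = key

    def sign(self, payload):
        return hmac.new(self.key, payload, hashlib.sha256).digest()

    def verify(self, payload, tag):
        return hmac.compare_digest(self.sign(payload), tag)


class Engine:
    """Sends and receives messages; the security module is pluggable."""

    def __init__(self, security):
        self.security = security

    def send(self, payload):
        return payload, self.security.sign(payload)

    def receive(self, message):
        payload, tag = message
        if not self.security.verify(payload, tag):
            raise ValueError("authentication failed")
        return payload


manager = Engine(HmacAuth(b"shared secret"))
agent = Engine(HmacAuth(b"shared secret"))
intruder = Engine(HmacAuth(b"wrong key"))

accepted = agent.receive(manager.send(b"get sysDescr.0"))
```

Swapping HmacAuth for NoAuth, or for a module that also encrypts, changes the engine's behavior without touching the Engine class, which is the point of the modular design.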

You might want to read these other relevant RFCs when you are evaluating a product and determining how it measures up to the latest in SNMPv3 standards. The following are now IETF standard RFCs, and they apply to different aspects of the SNMP protocol, from version 2 to 3. These are recommended reading for anyone who is involved with purchasing, using, or operating a management console or devices that incorporate SNMP functionality:

RFC 1157, "Simple Network Management Protocol (SNMP)," which is considered a "historic" standard, in that it was involved in the process of defining the SNMP protocol.
RFC 1643, "Definitions of Managed Objects for the Ethernet-like Interface Types."
RFC 3418, "Management Information Base (MIB) for the Simple Network Management Protocol (SNMP)."
RFC 3417, "Transport Mappings for the Simple Network Management Protocol (SNMP)."
RFC 3416, "Version 2 of the Protocol Operations for the Simple Network Management Protocol (SNMP)."
RFC 3415, "View-Based Access Control Model (VACM) for the Simple Network Management Protocol (SNMP)."
RFC 3414, "User-Based Security Model (USM) for Version 3 of the Simple Network Management Protocol (SNMPv3)."
RFC 3413, "Simple Network Management Protocol (SNMP) Applications."
RFC 3411, "An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks."

Note that although these RFCs are rather recent, those just listed are now considered standards. However, as complex as the "Simple" Network Management Protocol has become, it is hard at this time to give a general definition of exactly what SNMPv3 is or will be. The preceding list of RFCs can point you in the right direction when it comes to evaluating manufacturers' hardware/software definitions.

RMON

RMON (which stands for Remote Monitoring) is a data-gathering and analysis tool that was developed to help alleviate some of the shortcomings of SNMP. RMON works in a similar manner, and its objects are defined in an MIB. RMON can also be thought of as a specialized SNMP MIB for use with remote monitoring devices. It was designed to work much like the LAN analyzer discussed earlier in this chapter. RFCs 1757, "Remote Network Monitoring Management Information Base," and 1513, "Token-Ring Extensions to the Remote Network Monitoring MIB," provide the standard MIB definitions for RMON for Ethernet and Token-Ring networks, respectively.

In SNMP, the roles of the manager and agent are those of a client and server, with the agents being the client of the management console software. In RMON, the agents (often called probes) are the active parties and become the server while one or more management consoles can be their clients.

Instead of the management console performing a periodic polling process to gather data and perform analysis from agents out in the field, the agents in RMON perform intelligent analysis and send SNMP traps to management consoles when significant events occur.

Using RMON, the administrator can get an end-to-end view of the network. The types of data collected and the alerts and actions that are associated with RMON are different from those of the standard SNMP type. The objects for RMON fall into the following MIB groups:

Statistics: This group records data collected about network interfaces. A table called etherStatsTable contains one entry for each interface to hold this data and also contains control parameters for this group. Statistics include traffic volume, packet sizes, and errors.
History: The control function of this group manages the statistical sampling of data. This function controls the frequency at which data is sampled on the network. The historyControlTable is associated with this group. The history function of this group of objects records the statistical data and places the data in a table called the etherHistoryTable.
Hosts: This group tracks hosts on the network by MAC addresses. Information in the hostControlTable specifies parameters for the monitoring operations, and a table called the hostTimeTable records the time a host was discovered on the network.
HostTopN: This group is used to rank hosts by a statistical value, such as the number of errors generated or "top talkers." The hostTopNControlTable contains the control parameters for this group, and the hostTopNTable keeps track of the data.
Matrix: Data recorded by this group involves the exchange of frames between hosts on the network. Statistics are kept here for data traveling in both directions between hosts.
Filter: This group specifies the types of packets that the RMON probe will capture, such as frame size.
Capture: Although the Filter group specifies the parameters that are evaluated for capturing packets, this group is responsible for capturing packets based on those parameters.
Alarm: This group is used to set up alarms for events that are described in the next group, the Event group. Here you can set the sampling intervals and thresholds that will trigger an alarm. This group reads statistics that have been gathered, and when they exceed the threshold, an event is generated.
Event: When a variable exceeds a threshold defined by an alarm, an event is generated. This group can generate an SNMP trap to notify a network management station or record the information in a log. The Event Table is used to define the notification action that will be taken for an event, and the Log Table is used to record information.

As this list shows, RMON provides a great deal more functionality than SNMP alone. With RMON2, it allows for the collection of statistical data from all levels of the OSI reference model, including applications at the top.
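The kind of bookkeeping performed by the Hosts, Matrix, and HostTopN groups can be sketched from a handful of observed frames. The frame records, MAC addresses, and function names below are made up for illustration, and the real groups track many more statistics than octets sent.

```python
from collections import Counter

# (source MAC, destination MAC, frame size in octets): invented sample data
frames = [
    ("00:00:00:00:00:01", "00:00:00:00:00:02", 1500),
    ("00:00:00:00:00:01", "00:00:00:00:00:03", 500),
    ("00:00:00:00:00:02", "00:00:00:00:00:01", 64),
    ("00:00:00:00:00:03", "00:00:00:00:00:01", 900),
    ("00:00:00:00:00:02", "00:00:00:00:00:03", 100),
]


def summarize(observed):
    """Build per-host and per-conversation octet totals, as a probe might."""
    octets_sent = Counter()   # Hosts-style statistic, keyed by MAC address
    conversations = Counter() # Matrix-style statistic, keyed by (source, destination)
    for source, destination, size in observed:
        octets_sent[source] += size
        conversations[(source, destination)] += size
    return octets_sent, conversations


def top_talkers(octets_sent, n):
    """HostTopN-style ranking: the n hosts that sent the most octets."""
    return octets_sent.most_common(n)


octets_sent, conversations = summarize(frames)
ranking = top_talkers(octets_sent, 2)
```

Because the probe keeps these totals locally, the management console can ask for the finished ranking instead of polling every host counter itself.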

Because Ethernet and Token-Ring networks operate in a fundamentally different way, additional groups are defined in RFC 1513 that are specific to Token-Ring networks:

Token-Ring Statistics: A group to store information about the behavior of the ring, from traffic volume to the number of beacons occurring, ring purges, and other information specific to Token-Ring.
Token-Ring History: Similar to the History group used for Ethernet, this group keeps track of events on a historical basis.
Token-Ring Station: Detailed information about each station on the ring can be found here.
Station Order: The physical order of stations in the ring can be determined by information stored in this group.
Station Config: Configuration information for stations is stored here.
Source Routing: Monitors information about Token-Ring source routing for inter-ring traffic.

Alarms and Events

RMON agents can be programmed to take actions when specific things happen on the network. The Alarms and Events groups provide an important intelligence function.

Configuring an alarm consists of specifying a variable to be watched, the sampling interval, and the event that will be triggered when a threshold is crossed. The threshold can be a rising or a falling threshold, or both. For example, an alarm can be set to notify you when something begins to go awry, and to tell you when the situation gets better.

An event that is generated by an alarm can be configured to send an SNMP trap message to one or more management consoles, and store the event in the Log Table. The management station can then take the actions it deems necessary, including retrieving information from the Log Table.
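The interplay of rising and falling thresholds can be sketched as a small state machine. This is a simplified model under stated assumptions: run_alarm and the sample values are hypothetical, the returned list stands in for the Log Table, and after a rising event no further rising event is logged until the falling threshold has been crossed (and vice versa), which keeps a value hovering near one threshold from generating a flood of events.

```python
def run_alarm(samples, rising, falling):
    """Return (sample index, kind) events for a series of sampled values."""
    log = []           # stands in for the Log Table
    last_event = None  # remembers which threshold fired most recently
    for index, value in enumerate(samples):
        if value >= rising and last_event != "rising":
            last_event = "rising"
            log.append((index, "rising"))
        elif value <= falling and last_event != "falling":
            last_event = "falling"
            log.append((index, "falling"))
    return log


# Percent utilization sampled at a fixed interval (invented values).
utilization = [30, 50, 95, 97, 80, 96, 40, 15, 12, 99]
events = run_alarm(utilization, rising=90, falling=20)
```

With these samples the alarm logs a rising event at sample 2, a falling event at sample 7, and another rising event at sample 9; the excursions at samples 3 and 5 are suppressed because the value never dropped to the falling threshold in between.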

Establishing a Baseline

When making decisions on how to set up alarms and the events they generate, you should consider how the network functions normally. First monitor the network using RMON agents over a long period, noting when variations in traffic or errors occur. Make note of any fluctuations that regularly occur for specific dates or for a particular time of day.

Different network segments might require different sampling intervals and thresholds. For example, a local LAN segment might be subject to wide variations depending on only a small number of users, whereas a major backbone might fluctuate much less as traffic from many segments is blended together. When deciding on a sampling period, it's best to use a shorter interval for a segment that experiences frequent fluctuations and a longer interval for a segment that behaves in a more stable manner.
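One way to turn baseline observations into thresholds is to group past samples by time of day and place the rising threshold a few standard deviations above the usual level. The sketch below is an assumption, not a rule from the RMON standard: rising_thresholds, the multiplier k, and the sample history are all hypothetical, and a real baseline would use far more than two samples per hour.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rising_thresholds(history, k=2.0):
    """Map each hour of the day to mean + k standard deviations of its samples."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {
        hour: mean(values) + k * pstdev(values)
        for hour, values in by_hour.items()
    }


# (hour of day, percent utilization) gathered while baselining (invented values).
history = [(9, 40), (9, 60), (3, 4), (3, 6)]
thresholds = rising_thresholds(history)
```

For this data the 9:00 threshold comes out near 70 percent and the 3:00 threshold near 7 percent, so traffic that is normal in mid-morning would still raise an alarm in the middle of the night.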

Response to alarms can be in the form of immediate corrective action, as in the case of a defective device, or a long-term solution such as additional capacity or equipment. Regularly review the baseline values you set, and change them as network usage or topology changes. If alarms and events are not configured to reflect activity that is of a genuine concern, network operators might begin to ignore them, much like what happened to the boy who "cried wolf."