9.1 Network management technologies

9.1.1 The Network Management System (NMS)

In a well-run network, network management allows managers to take control of the network rather than the other way around, and the network management system (sometimes called the network operations center) is likely to be the focal point for network status monitoring and reporting. Many large commercial networks have dedicated rooms for network management facilities, often with multiple in-band management stations along with out-of-band control facilities. At the heart of most network management systems is a live network map. This is typically a set of hierarchical object maps presented on a Graphical User Interface (GUI). Various icons are used to represent the class and status of the different devices being managed. Figure 9.1 illustrates a typical map in which devices are color coded to indicate normal operation, unreachability, or a fault condition.

Figure 9.1: A typical Graphical User Interface (GUI) showing a network map.

Management interfaces can be either passive or active. A passive interface allows the network administrator to access important information such as system status, protocol events, and traffic statistics. An active interface enables the administrator to modify device status and control activities on a device (such as enabling a port, disabling a routing protocol, etc.). Clearly, one of the key issues with active interfaces (and to some extent passive interfaces) is security, in that the remote party must be a trusted user. Security has limited the scope of SNMP deployment for many years.

There is more to network management than the network map, however. The ISO Management Framework breaks down network management into the following categories:

Configuration management—During the initial implementation of any large internetwork, configuration management is the most important facet of network management. It covers areas such as installation, booting, inventory management, and reconfiguration. In a multivendor environment it may be difficult to maintain consistent procedures here, and this is just one area of network management that needs significant future development.
Performance management—Performance management is key to capacity planning, trend analysis, and baselining [1]. It is also important for performance tuning and optimization. The results of performance monitoring should ideally be fed back into any tools used to design the network so that the model can be validated. Some of the more sophisticated design tools are now able to do this, albeit in a limited fashion.
Fault management—Covers areas such as troubleshooting, diagnostics, and locating and fixing failed devices. This is probably one of the better served areas of network management, although there is still a long way to go towards the goal of automating most of these procedures.
Security management—As networks have increased in size and businesses move more of their operations onto the network, security has become a major concern. Security management includes topics such as raising alerts on unauthorized access via alarms and traps and coordinating security policy over large internetworks.
Accounting management—Accounting management is concerned with how the running costs of the network are captured and ultimately billed to users. This is possibly the least well-developed area of network management today.

To assist this functionality, a network management station may offer features such as the following:

Graphical User Interface (GUI)
Autotopology discovery (difficult in mixed-media multivendor networks)
Database support (either proprietary or standard, increasingly accessible via the Lightweight Directory Access Protocol—LDAP)
Both in-band and out-of-band management interfaces
Embedded ping and Telnet support (either for diagnostics, inventory management, configuration management, reachability testing where no SNMP stack exists, or as part of autodiscovery)
Reporting capabilities and detailed network statistics (standard objects plus many vendor-specific additions)
Trouble ticketing
Conversion of traps/events to alerts for centralized management

There needs to be a data collection and transport architecture underpinning all of these features to glue all of the relevant pieces together. The main choices to date are Simple Network Management Protocol (SNMP), Common Management Information Protocol/Common Management Information Service (CMIP/CMIS), and IBM's NetView. Some of the main vendors offering network management platforms, complete architectures, and management tools are discussed in section 9.1.6. First we discuss some of the underlying technology used to build network management frameworks.

9.1.2 Simple Network Management Protocol (SNMP)

Background

Nowadays it seems almost impossible to mention the words network management without using SNMP in the same sentence. SNMP has become synonymous with network management due to its rapid deployment and the fact that, for the most part, it works well and is compact enough to be implemented on most low-level devices. SNMP was designed to operate over large internetworks and is primarily concerned with monitoring and isolating faults. SNMP is not really designed for end system management or high-level configuration management.

SNMP is based on the Simple Gateway Management Protocol (SGMP) for routers; the original architects are Jeff Case, James Davin, Mark Fedor, and Martin Schoffstall. Cisco, ACC, and Proteon introduced the first SNMP products in 1988. At the time there was still much debate about OSI and IEEE management schemes, but these were soon drowned in a tidal wave of SNMP-enabled products. SNMP is now supported by practically every manufacturer of network devices and is the de facto standard for multivendor network management. It was primarily developed for use with TCP/IP networks and was endorsed by the U.S. Internet Activities Board (IAB) in April 1988. SNMP is now extremely well supported as agent software running in devices such as repeaters, bridges, routers, and switches. From the management station perspective, implementations range from basic management tools built with UNIX command-line utilities or simple Windows dialog boxes (which are often freely available) up to sophisticated network management systems with built-in expert systems costing thousands, or even millions, of dollars.

The key advantages of implementing SNMP are that it is vendor independent, simple to implement, and relatively small (in terms of protocol stack requirements). Early studies suggested that SNMP could be implemented in as little as 64 KB of RAM, whereas OSI's CMIS/CMIP required as much as half a megabyte (plus additional connection-oriented processing overheads). This is important for keeping costs down in devices where there is precious little memory and possibly no requirement for a transport protocol (such as in a modem or multiplexer). Critics of the protocol, however, argue that it is insecure (especially version 1) and makes device configuration very cumbersome (since SNMP has no concept of data grouping). Today, SNMP is supported by numerous commercial platforms and is widely available as public domain source code. SNMPv1 is documented in [2]; SNMPv2 is documented in [3–7].

Architecture

SNMP is based on a manager-agent interaction, running a client/server model over connectionless UDP (although transport over TCP and other protocols is possible). Typically the agent collects real-time data (called managed objects) from the device on which it is resident. The manager polls the agents for information on a regular basis to keep track of the status of resources (see Figure 9.2). The NMS then processes the data locally. Agents cannot request information from an NMS (hence they cannot perform tasks such as user validation on SetRequests). An SNMP manager (the client) has the ability to issue the SNMP commands and be the end point for traps being sent by the agent (the server).

Figure 9.2: Basic SNMP architecture.

The manager uses commands to either retrieve data (using a Get command) or change data (using a Set command) from the Management Information Base (MIB) located in the agent. An agent can notify a manager of an event using the Trap command.

SNMP also supports the concept of a proxy agent. This is a type of gateway that provides the ability to manage remote non-SNMP devices. The proxy agent interfaces with the non-SNMP devices using other protocols (possibly proprietary) and then publishes any relevant data into the SNMP domain by placing these data in its local MIB.

Aside from the client/server architecture, the key components of the SNMP model comprise the Structure of Management Information (SMI), the Management Information Base (MIB), and the Simple Network Management Protocol (SNMP).

SMI

The Structure of Management Information (SMI) is defined in [5, 8]. The SMI organizes, names, and describes object information so that it can be accessed consistently. Objects are defined using SMI encoding rules. The SMI states that each managed object must have a name, a syntax, and an encoding, as follows:

The name (referred to as an Object Identifier, or OID) uniquely identifies the object.
The syntax defines the data type, such as an integer or a string of octets.
The encoding describes how the information associated with an object is serialized for transmission between machines.

SMI specifies the Basic Encoding Rules (BER) for representing managed objects and uses a subset of Abstract Syntax Notation One (ASN.1) for message formats (see section 9.1.4 and [2]). SNMP uses four simple types: INTEGER (in practice a 32-bit signed value), OCTET STRING, OBJECT IDENTIFIER, and NULL. Managed objects are referenced using a unique OID, and values associated with these objects are stored as strings or integers. Note that the NULL type acts as a placeholder, for example in the value field of a request where the agent is expected to supply the value. SNMP also permits two constructor types (sometimes referred to as aggregate types): SEQUENCE and SEQUENCE OF. These are used to generate lists and tables. Finally, SNMP supports a small number of application-wide types, as follows:

NetworkAddress—This represents an address from one of possibly several protocol families. Currently, only the Internet protocol family is present. The type is defined in ASN.1 as CHOICE.
IpAddress—A 32-bit IP address. It is defined as an OCTET STRING of length 4, in network byte order.
Counter—A nonnegative integer, which increments until it reaches a maximum value, at which point it wraps around and starts increasing again from zero. The maximum value is 2^32-1 (4294967295 decimal).
Gauge—A nonnegative integer, which may increase or decrease but which latches at a maximum value. The maximum value is 2^32-1 (4294967295 decimal).
TimeTicks—A nonnegative integer, which counts the time in hundredths of a second since some epoch. The description of the object type identifies the reference epoch.
Opaque—Offers the capability to pass any form of encoding transparently as an OCTET STRING. A conforming implementation need only be able to accept and recognize opaquely encoded data (it does not need to be able to unwrap these data).

Note that the restricted 32-bit integer size of counters has caused some concern for long-term statistical analysis on large or extremely busy networks. If these counters wrap, there is no obvious way of knowing how many times they have cycled, and so data are effectively lost or are meaningless unless collated regularly.
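The wrap problem can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the function names are invented for this example, and it assumes the counter has wrapped at most once between two polls, which is exactly the condition that regular collection is meant to guarantee.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter runs from 0 to 2^32 - 1, then wraps to 0


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two samples, assuming the counter wrapped at most once."""
    for sample in (previous, current):
        if not 0 <= sample < modulus:
            raise ValueError(f"sample {sample} is outside the counter range")
    # Modular subtraction gives the right answer with or without a single wrap.
    return (current - previous) % modulus


def seconds_until_wrap(units_per_second: float, modulus: int = COUNTER32_MODULUS) -> float:
    """Time for a counter to cycle once at a constant rate; poll more often than this."""
    return modulus / units_per_second


print(counter_delta(4294967290, 10))          # the counter wrapped once between polls
print(round(seconds_until_wrap(12_500_000)))  # octet counter on a saturated 100 Mbit/s link
```

Under these assumptions an octet counter on a fully loaded 100 Mbit/s link (12.5 million octets per second) cycles in roughly 344 seconds, so a polling interval of ten minutes could miss a complete wrap without any indication.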

MIB

An MIB defines groups of managed objects, some of which are mandatory according to the protocol modules implemented in the managed device. The objects are defined independently from SNMP and arranged into a hierarchical tree, divided into four broad object classifications (directory, mgmt, experimental, private), as illustrated in Figure 9.3. The major object branches are as follows:

    1                    iso
    1.3                  org
    1.3.6                dod
    1.3.6.1              internet
    1.3.6.1.1            directory
    1.3.6.1.2            mgmt
    1.3.6.1.2.1          mib-2
    1.3.6.1.2.1.2.2.1.3  ifType
    1.3.6.1.2.1.10       transmission
    1.3.6.1.2.1.10.23    transmission.ppp
    1.3.6.1.2.1.27       application
    1.3.6.1.2.1.28       mta
    1.3.6.1.3            experimental
    1.3.6.1.4            private
    1.3.6.1.4.1          enterprise
    1.3.6.1.5            security
    1.3.6.1.6            SNMPv2
    1.3.6.1.7            mail

Figure 9.3: MIB tree hierarchy.

The mgmt (management) subtree is mandatory for objects under SNMP and is the subtree used to identify objects that are defined in IAB-approved documents. The experimental subtree is used to identify objects used in Internet experiments, and objects here are potentially migrated into the mgmt branch [9]. The directory branch is reserved for future use. For a complete list of assigned numbers relating to the subtrees, consult [10]. The private subtree is used to identify objects defined unilaterally (i.e., for vendor-specific extensions). Note that vendors are also strongly recommended to register themselves under the private.enterprise subtree 1.3.6.1.4.1.

Examples of networking-related enterprise OIDs are as follows (there are over 1,000 registered entries referenced in [10]).

      1  Proteon                        164  Rad Data Communications Ltd.
      2  IBM                            166  Shiva Corporation
      5  ACC                            193  Ericsson Business Communications
      9  Cisco                          211  Fujitsu Limited
     16  Timeplex                       232  Compaq
     23  Novell                         233  NetManage, Inc.
     33  Xyplex                         238  Netrix Systems Corporation
     34  Cray                           303  Hughes
     35  Bell Northern Research         311  Microsoft
     36  DEC                            351  Stratacom
     42  Sun Microsystems               353  ATM Forum
     43  3Com                           434  EICON
     52  Cabletron Systems              437  Grand Junction Networks
     56  Castle Rock Computing          475  Wandel and Goltermann Technologies
     72  Retix                          484  kn-X Ltd.
     75  Ungermann-Bass                 559  Sonix Communications, Ltd.
     81  Lannet Company                 562  Northern Telecom, Ltd.
     94  Nokia Data Communications      594  British Telecom
     99  SNMP Research                  838  Xedia Corporation
    111  Oracle                         897  Sybase, Inc.
    119  NEC Corporation                946  Bay Technologies Pty Ltd.

One of the most important generic MIBs is called MIB-II [11], which supersedes MIB-I [12]. Objects within an MIB are referenced by traversing the tree using a dot-delimited string of short names or integer OIDs. For example, an object in MIB-II could be prefixed by 1.3.6.1.2.1 or iso.org.dod.internet.mgmt.mib-2 (note that the root of the tree itself has no name or number; iso is assigned the value 1). Figure 9.4 illustrates an example GetRequest for the object 1.3.6.1.2.1.1.1.1.1.1.1.1.0 in its decoded form.

 File:SNMP01.ENC      Type:SNIFFER_ENC  Mode:ooooooo  Records:16
 ===============================================================================
 Frame    : 5                Len      : 87               Error    : None
 T Elapsed: 12:09:09:636     T Delta  : 00:00:00:002
 -------------------------------[mac]-------------------------------------------
 Dest Mac : Xyplex027c33     Sourc Mac: Sun   02be59     Type     : IP
 -------------------------------[ip]--------------------------------------------
 IP Ver   : 4                IP HLen  : 20 Bytes
 TOS      : 0x00             Pkt Len  : 73               Seg ID   : 0x450d
 Flags    : FRAG:v.LAST      Frag Ptr : 0     (8 Octet)  TTL      : 60
 PID      : UDP  ( 17)       Checksum : 0x5ae  (Good)
 Dest IP  : 193.128.88.59    Source IP: 193.128.88.173
 -------------------------------[udp]-------------------------------------------
 Dest Port: SNMP [  161]                Src Port : 4401
 Length   : 53                          Checksum : 0x0000
 -------------------------------[snmp]------------------------------------------
 Version  : 1
 Community: "public"
 Command  : GetRequest
 RequestID: 17
 ErrStatus: 00
 ErrIndex : 00
 Object ID: 1.3.6.1.2.1.1.1.1.1.1.1.1.0.
 String   : ""
 ===============================[data:   0]=====================================

Figure 9.4: Example GetRequest for 1.3.6.1.2.1.1.1.1.1.1.1.1.0.
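The translation between dotted names and numeric OIDs amounts to a lookup against the tree. The Python sketch below illustrates the idea using only the branches listed earlier in this section; the table and function names are invented for the example, and a real management application would obtain these mappings from compiled MIB files rather than from a hand-built dictionary.

```python
# Names and numbers copied from the branch listing earlier in this section.
NAMED_OIDS = {
    "iso": (1,),
    "org": (1, 3),
    "dod": (1, 3, 6),
    "internet": (1, 3, 6, 1),
    "directory": (1, 3, 6, 1, 1),
    "mgmt": (1, 3, 6, 1, 2),
    "mib-2": (1, 3, 6, 1, 2, 1),
    "experimental": (1, 3, 6, 1, 3),
    "private": (1, 3, 6, 1, 4),
    "enterprise": (1, 3, 6, 1, 4, 1),
}


def resolve(path: str) -> tuple:
    """Translate a dotted path of names and/or integers into a numeric OID."""
    oid = ()
    for label in path.strip(".").split("."):
        if label.isdigit():
            oid += (int(label),)
            continue
        full = NAMED_OIDS.get(label)
        # A name is only valid directly beneath its own parent in the tree.
        if full is None or full[:-1] != oid:
            raise ValueError(f"{label!r} is not a known child of {dotted(oid) or 'the root'}")
        oid = full
    return oid


def dotted(oid: tuple) -> str:
    """Render a numeric OID in dot-delimited form."""
    return ".".join(str(number) for number in oid)


print(dotted(resolve("iso.org.dod.internet.mgmt.mib-2")))
print(dotted(resolve("iso.org.dod.internet.private.enterprise.9")))  # Cisco's enterprise branch
```

Names and integers may be mixed freely in a path, so iso.3.6.1.mgmt resolves to the same OID as 1.3.6.1.2.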

MIB-I defines 126 generic objects and is fairly limited in scope. The latest MIB standard (MIB-II, see [11]) defines several extensions for routing and media support. These include DS3, PPP, and Frame Relay support; better EGP monitoring; improved access to application information; and IEEE 802.1d STA, Token Ring, and DECnet Phase IV support. MIB-II defines ten function groups, as follows:

System—textual description of the entity being managed.
Interface (IF Table)—tabular description of the network interfaces.
Address Translation (AT Table)—translation tables for physical addresses.
IP—addresses, indicators, and counters for IP decisions on datagrams and routing information.
ICMP—the error input and output statistics.
TCP—information on TCP connections and transmission.
UDP—information on datagrams.
EGP—information on the exterior gateway neighbors.
Transmission—information on different types of transmission media.
SNMP—information on SNMP for use by applications using SNMP statistical information.

An SNMP agent does not have to implement all groups; however, if an agent implements a group then it must implement all the managed objects within that group. Note that MIB-I and MIB-II have the same OID. MIB-II is a superset of MIB-I.

Vendors can create their own private MIBs using the private.enterprise tree; these are normally described in an ASCII text file using ASN.1 notation. In the past, some vendors did not publish their MIB extensions and produced their own proprietary management applications, mainly for competitive reasons. Thankfully, those days are long gone, and nowadays these private MIBs are often freely published on vendor Web sites or may be obtained by contacting the vendor directly. For a management application to access these private objects it must be aware of them and understand their context. Once downloaded, these private objects can be incorporated onto an NMS platform via an ASN.1 compiler. The compiler is often supplied with the management application, but you could use a public domain MIB compiler (see [13, 14]) or one of the many commercial compilers.

Private extensions typically comprise performance, routing, hardware status, and configuration management objects. Over time, some of these extensions make it into the standard MIBs, and as the standard MIBs expand, the management of devices in a multivendor network will become simpler and less proprietary.

SNMP

SNMP [2] is a simple request-response protocol and is predominantly asymmetric in operation; most communications are initiated from an intelligent management application and directed toward one or more relatively dumb agents (SNMPv2 redresses the balance a little by introducing peer communication via the inform command). The SNMP agent resides on the device being managed (such as a router) and acts upon Get or Set requests from the NMS by interacting with local MIB objects. The only autonomous operation an SNMPv1 agent can perform is the transmission of Traps to the NMS. The management application is normally accessible through a local or remote graphical user interface; the agent usually has no user interface.

SNMP is transport independent, the preferred transport platform being UDP (using ports 161 and 162) over IP (see Figure 9.5). UDP port 161 is used to send and receive all data-related SNMP messages. UDP port 162 is used exclusively for Traps. There may also be circumstances in which SNMP is required to run directly over some other protocol stack (generally this is not the case), as follows:

The SNMP UDP/IP stack cannot be tunneled.
The managed node is accessible only via a proprietary protocol.
The UDP/IP encapsulation is seen as too great an overhead.
The host network is pure OSI or some other non-IP architecture.
The application requires low-bandwidth out-of-band management and, therefore, minimal overhead.
There is no current or future requirement for internetworking on a large flat network.

Figure 9.5: Basic SNMP protocol stack and object interaction.

SNMP has been ported over a number of different protocol stacks, including TCP, OSI Connectionless-mode Transport Service (CLTS), OSI Connection-Oriented Transport Service (COTS) [15], AppleTalk DDP [16], Novell IPX [17], SLIP/PPP, ATM, X.25 tunneling, LLC/LLC2, and Ethernet MAC layer framing (EtherType 0x814C). As a general rule there is not much benefit in running SNMP over anything other than UDP, since UDP is so widely implemented, although a number of users do perceive the need for a guaranteed transport service (such as TCP or OSI Transport Class 4). Direct mappings over Data Link or Physical Layer protocols have questionable value, since they cannot be routed, and this severely limits the scope of network management. Another issue to be resolved with non-UDP implementations is interoperability; there would clearly need to be some form of transport gateway if two different implementations were to communicate. The interested reader is directed to [4, 18, 19] for further information on possible transport mappings and their implications.

SNMP message structure

SNMP entities normally communicate via UDP messages within IP datagrams. A message must be entirely contained in a single datagram (no fragmentation is allowed). SNMP packet formats are described using ASN.1. ASN.1 basically fills the role of external data representation (XDR). The message format uses the basic encoding rules of ASN.1, structured as follows:

    INT   Version ID
    BYTE  *Community Name
    BYTE  *Data

The data field comprises one or more SNMP Protocol Data Units (PDUs). The maximum SNMP message size is limited by the maximum UDP message size (65,507 bytes). All SNMP implementations must be capable of receiving messages of at least 484 bytes. Support for larger messages is recommended but cannot be relied upon: some implementations reject messages longer than 484 bytes, and others may misbehave on receipt of them.

Each SNMP PDU contains the following fields:

PDU type—the command type
Request ID—request sequence number
Error status—zero if no error—otherwise one of a small set
Error index—if nonzero, indicates which of the OIDs in the PDU caused the error
List of OIDs and values—values are null for Get and GetNext requests (values are supplied by agents in response PDUs)
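To show how the message wrapper and the PDU fields listed above fit together on the wire, the following Python sketch builds an SNMPv1 GetRequest by hand using BER tag-length-value encoding. It is a minimal illustration rather than a complete encoder: the helper names are invented for this example, only non-negative integers are handled, and the version field carries the value 0 that the SNMPv1 specification assigns to version 1.

```python
def ber_length(size: int) -> bytes:
    """BER length octets: short form below 128, long form otherwise."""
    if size < 0x80:
        return bytes([size])
    body = size.to_bytes((size.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, content: bytes) -> bytes:
    """Wrap content in a tag-length-value triplet."""
    return bytes([tag]) + ber_length(len(content)) + content


def ber_integer(value: int) -> bytes:
    """Encode a non-negative INTEGER (negative values are outside this sketch)."""
    if value < 0:
        raise ValueError("negative integers are not handled here")
    size = value.bit_length() // 8 + 1  # the extra bit keeps the sign bit clear
    return tlv(0x02, value.to_bytes(size, "big"))


def ber_oid(oid: str) -> bytes:
    """Encode an OBJECT IDENTIFIER; the first two numbers share one octet."""
    parts = [int(part) for part in oid.strip(".").split(".")]
    body = bytearray([40 * parts[0] + parts[1]])
    for number in parts[2:]:
        chunk = [number & 0x7F]
        number >>= 7
        while number:  # base-128 digits, high bit set on all but the last
            chunk.append(0x80 | (number & 0x7F))
            number >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def get_request(community: str, request_id: int, oid: str) -> bytes:
    varbind = tlv(0x30, ber_oid(oid) + tlv(0x05, b""))  # value is NULL in a request
    pdu = tlv(0xA0,                                     # PDU type: GetRequest
              ber_integer(request_id)
              + ber_integer(0)                          # error status
              + ber_integer(0)                          # error index
              + tlv(0x30, varbind))                     # list of OIDs and values
    return tlv(0x30,
               ber_integer(0)                           # version field: 0 means SNMPv1
               + tlv(0x04, community.encode("ascii"))   # community name
               + pdu)


message = get_request("public", 17, "1.3.6.1.2.1.1.1.1.1.1.1.1.0")
print(len(message), message.hex())
```

With the community, request ID, and OID taken from Figure 9.4, the sketch produces a 45-byte message. That is consistent with the UDP length of 53 shown in the figure (45 bytes of SNMP plus the 8-byte UDP header) and is comfortably inside the 484-byte minimum that every implementation must accept.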

In SNMPv1, Trap PDUs differ from the standard PDU and contain the following fields:

Enterprise—identifies the type of object causing the Trap
Agent address—IP address of agent that sent the Trap
Generic Trap ID—the common standard Traps
Specific Trap ID—indicates a proprietary or enterprise Trap
Timestamp—indicates when the Trap occurred in time ticks
List of OIDs and values—OIDs that may be relevant to send to the NMS

SNMPv2 Trap PDUs were modified to use the standard PDU format.

Service primitives

SNMP operations are invoked by a small number of primitives, as defined in [2]. SNMP supports the following functions:

Get Request/Response
Set Request/Response
GetNext Request/Response
GetBulk Request/Response (introduced in SNMPv2)
Trap
Inform Request/Response (introduced in SNMPv2)

Get

GetRequests retrieve object information from an agent's MIB. Each OID in the request must name exactly one object instance, so GetRequests are unsuitable for bulk retrieval. A request is not explicitly acknowledged; instead, the agent replies with a GetResponse that carries the requested data if the request succeeded, or an error indication if it did not.

Set

SetRequests are used to modify an MIB object or variable. The agent answers with a SetResponse, which confirms the new values if the operation succeeded or carries an error indication if it did not.
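The way an agent might service Get and Set requests, and fill in the error status and error index fields, can be sketched in Python as follows. The ToyAgent class is hypothetical and holds its MIB in a dictionary. The error status values are those defined for SNMPv1. Validating a whole Set request before applying any of it is an assumption of this sketch, intended to reflect the rule that the assignments in one request take effect as if simultaneously.

```python
from enum import IntEnum


class ErrorStatus(IntEnum):
    NO_ERROR = 0
    TOO_BIG = 1
    NO_SUCH_NAME = 2
    BAD_VALUE = 3
    READ_ONLY = 4
    GEN_ERR = 5


class ToyAgent:
    """Hypothetical in-memory agent: the MIB is a dict of OID string -> value."""

    def __init__(self, mib: dict):
        self.mib = dict(mib)

    def get(self, oids: list) -> tuple:
        """Return (error_status, error_index, varbinds) as a response would."""
        for position, oid in enumerate(oids, start=1):
            if oid not in self.mib:
                # The error index is the 1-based position of the offending OID.
                return ErrorStatus.NO_SUCH_NAME, position, [(o, None) for o in oids]
        return ErrorStatus.NO_ERROR, 0, [(oid, self.mib[oid]) for oid in oids]

    def set(self, varbinds: list) -> tuple:
        """Apply every assignment, or none of them if any one is invalid."""
        for position, (oid, value) in enumerate(varbinds, start=1):
            if oid not in self.mib:
                return ErrorStatus.NO_SUCH_NAME, position, varbinds
            if type(value) is not type(self.mib[oid]):
                return ErrorStatus.BAD_VALUE, position, varbinds
        for oid, value in varbinds:
            self.mib[oid] = value
        return ErrorStatus.NO_ERROR, 0, varbinds


SYS_DESCR = "1.3.6.1.2.1.1.1.0"
SYS_NAME = "1.3.6.1.2.1.1.5.0"
agent = ToyAgent({SYS_DESCR: "Example router", SYS_NAME: "core-1"})
print(agent.get([SYS_NAME]))
print(agent.set([(SYS_NAME, "core-2")]))
print(agent.get([SYS_NAME, "1.3.6.1.2.1.1.99.0"]))  # the second OID does not exist
```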

GetNext

GetNextRequests are used to simplify table traversal and MIB browsing. GetNextRequest references a pointer to the last retrieved object and so does not require an exact name to be specified; when invoked, it simply returns the next object in the MIB (hence, MIBs must be strictly ordered sets, and, depending upon the implementation, GetNextRequests can exhibit some anomalies).

GetBulk

GetBulkRequests are used to alleviate the burden of programming multiple GetNextRequests; in effect that is precisely what they do automatically to retrieve multiple rows of table data. The implementation of GetBulk is, unfortunately, still problematic. Refer to [20] for full details of GetBulk.
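The relationship between GetBulk and GetNext can be illustrated with a small simulation. In the Python sketch below the agent's MIB is a plain dictionary and the helper names are invented. The two parameters, non_repeaters and max_repetitions, are named after the corresponding fields in the SNMPv2 protocol specification rather than anything defined in this section. The sketch ignores message size limits, which a real agent must respect when deciding how many repetitions to return.

```python
from bisect import bisect_right


def oid_key(oid: str) -> tuple:
    return tuple(int(part) for part in oid.split("."))


def make_get_next(objects: dict):
    """Build a GetNext function over a toy MIB held as {dotted OID: value}."""
    table = {oid_key(oid): value for oid, value in objects.items()}
    keys = sorted(table)  # numeric tuples sort in lexicographic OID order

    def get_next(oid: str):
        position = bisect_right(keys, oid_key(oid))
        if position == len(keys):
            return None  # walked off the end of the MIB
        key = keys[position]
        return ".".join(map(str, key)), table[key]

    return get_next


def get_bulk(get_next, oids, non_repeaters, max_repetitions):
    """One GetNext for each leading OID, then repeated GetNexts for the rest."""
    results = [get_next(oid) for oid in oids[:non_repeaters]]
    cursors = list(oids[non_repeaters:])
    for _ in range(max_repetitions):
        step = [get_next(cursor) for cursor in cursors]
        if not any(step):  # nothing left to retrieve for any repeating OID
            break
        results.extend(step)
        cursors = [item[0] if item else cursor for item, cursor in zip(step, cursors)]
    return results


get_next = make_get_next({
    "1.3.6.1.2.1.1.3.0": 12345,       # sysUpTime.0
    "1.3.6.1.2.1.2.2.1.2.1": "eth0",  # ifDescr.1
    "1.3.6.1.2.1.2.2.1.2.2": "eth1",  # ifDescr.2
    "1.3.6.1.2.1.2.2.1.8.1": 1,       # ifOperStatus.1
    "1.3.6.1.2.1.2.2.1.8.2": 2,       # ifOperStatus.2
})
request = ["1.3.6.1.2.1.1.3", "1.3.6.1.2.1.2.2.1.2", "1.3.6.1.2.1.2.2.1.8"]
for item in get_bulk(get_next, request, non_repeaters=1, max_repetitions=2):
    print(item)
```

A single request here returns the scalar plus two complete rows of the interface table, work that would otherwise take one GetRequest and two GetNextRequests.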

Inform

Inform is used for unsolicited messages for manager-to-manager (M2M) communications. This could also be used for distributed management or by a networking device wishing to retrieve information from the management application. Several vendors have retrofitted this command to their SNMPv1 stacks.

Trap

Traps are used for unsolicited messages, triggers, and alarms. A Trap might be generated on a significant alarm condition, such as link failure or authentication failure. In SNMPv1 the Trap primitive used a form of PDU different from the other commands. In SNMPv2 the Trap PDU was made consistent, and traps are named in MIB space (enabling manager-to-manager communications). The original mandatory Trap list specifies the following conditions:

0: coldStart
1: warmStart
2: linkDown
3: linkUp
4: authenticationFailure
5: egpNeighborLoss

Many products do not implement all of these Traps, since they are not all relevant to every device type supporting SNMP (egpNeighborLoss being a good example). Vendors can assign additional Traps (starting at 6) for custom events via a special Trap called the enterprise Trap (see [21, 22]). Trap is the only PDU sent by an agent to one or more nominated Trap servers (usually one or more NMSs) on its own initiative.
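The SNMPv1 Trap fields and the generic Trap numbers above map naturally onto a small data structure. The following Python sketch is illustrative only: the TrapV1 class and describe function are invented for this example, and the value 6 is given the name enterpriseSpecific, which is the name the SNMPv1 specification uses for the enterprise Trap.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class GenericTrap(IntEnum):
    coldStart = 0
    warmStart = 1
    linkDown = 2
    linkUp = 3
    authenticationFailure = 4
    egpNeighborLoss = 5
    enterpriseSpecific = 6  # vendor-defined; the specific Trap ID says which event


@dataclass
class TrapV1:
    """The fields of an SNMPv1 Trap PDU, in the order listed in the text."""
    enterprise: str      # OID identifying the type of object causing the Trap
    agent_address: str   # IP address of the agent that sent the Trap
    generic_trap: int
    specific_trap: int
    timestamp: int       # TimeTicks, i.e. hundredths of a second
    varbinds: list = field(default_factory=list)


def describe(trap: TrapV1) -> str:
    """Produce a one-line summary suitable for an NMS event log."""
    generic = GenericTrap(trap.generic_trap)  # raises ValueError if unknown
    if generic is GenericTrap.enterpriseSpecific:
        event = f"enterprise {trap.enterprise} specific trap {trap.specific_trap}"
    else:
        event = generic.name
    return f"{event} from {trap.agent_address} at {trap.timestamp / 100:.2f} s"


print(describe(TrapV1("1.3.6.1.4.1.9", "193.128.88.59", 2, 0, 1234)))
print(describe(TrapV1("1.3.6.1.4.1.9", "193.128.88.59", 6, 42, 500)))
```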

One of the problems with Traps is that when a number of failures occur, many Traps could be generated, blocking up event logs and potentially causing congestion. To work around this, a managed node may be configured to send Traps only when specific thresholds have been reached (e.g., a transparent bridge could send a Trap when 90 percent of its filter table is in use). Newer MIBs specify management objects that control how Traps are sent. An even better solution is the concept of Trap-directed polling. With this model the managed node sends a single trap to the NMS when an extraordinary event occurs. It is then up to the NMS to poll in the vicinity of the problem by initiating further communications with the managed node or neighboring managed devices (the NMS is clearly in a better position to determine what the cause of the problem might be, since it has the big picture of the network).
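Both ideas, threshold-triggered Traps on the managed node and Trap-directed polling on the NMS, can be sketched briefly in Python. Everything in this sketch is hypothetical: the class and function names, the neighbor map, and the poll callback are placeholders for whatever a real agent and NMS would provide.

```python
class ThresholdTrapSource:
    """Managed-node side: raise one Trap per threshold crossing, not one per sample."""

    def __init__(self, capacity: int, threshold_percent: int = 90):
        self.capacity = capacity
        self.threshold_percent = threshold_percent
        self.armed = True  # re-armed once usage falls back below the threshold

    def sample(self, entries_in_use: int) -> bool:
        """Return True when a Trap should be sent for this sample."""
        over = entries_in_use * 100 >= self.capacity * self.threshold_percent
        if over and self.armed:
            self.armed = False
            return True
        if not over:
            self.armed = True
        return False


def trap_directed_poll(trap_source: str, neighbors: dict, poll) -> dict:
    """NMS side: on a single Trap, poll the reporting node and its neighbors."""
    targets = [trap_source] + list(neighbors.get(trap_source, []))
    return {node: poll(node) for node in targets}


# A transparent bridge with a 1,000-entry filter table and a 90 percent threshold.
bridge = ThresholdTrapSource(capacity=1000)
print([bridge.sample(used) for used in (850, 900, 950, 880, 910)])

# Hypothetical topology and poll function standing in for real SNMP requests.
topology = {"bridge-1": ["router-a", "bridge-2"]}
status = trap_directed_poll(
    "bridge-1", topology, poll=lambda node: "unreachable" if node == "bridge-2" else "up")
print(status)
```

In the first part only the samples that cross the threshold from below generate a Trap, so a table hovering above 90 percent does not flood the event log.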

Table traversal with GetNextRequest

In SNMPv1 the most important use of GetNextRequest is the traversal and retrieval of data from tables within the MIB. GetNext supports multiple arguments, enabling efficient data retrieval; it is interesting to note that other management protocols such as OSI employ extremely complex mechanisms for traversing management information. SNMP data retrieval is, however, not especially efficient from a traffic perspective. Use of the GetNextRequest for bulk data retrieval can have a noticeable impact on network performance; hence, it is worth understanding. An example SNMP exchange is outlined in the following chart. Here an SNMP management application is attempting to retrieve the destination address and next-hop gateway address for all entries in a routing table. The table is accessed via an SNMP agent residing on a remote routing node. Assume in this example that the routing table has only three entries, as follows:

    Destination        NextHop          Metric
    195.160.12.23      195.160.12.1     100
    20.0.0.50          89.1.1.42        40
    10.0.0.99          89.1.1.42        20

The management station sends a GetNextRequest PDU to the SNMP agent, containing several operands representing three columns of data in the first row of the routing table, as follows:

 GetNextRequest ( ipRouteDest, ipRouteNextHop, ipRouteMetric1 )

The interaction between the NMS and agent proceeds as follows (the agent returns rows in lexicographic order of the table index, here the destination address, and not in the order in which the routes are listed above):

 MANAGEMENT APPLICATION                              SNMP AGENT

                           <- GetResponse (( ipRouteDest.10.0.0.99 = "10.0.0.99" ),
                                           ( ipRouteNextHop.10.0.0.99 = "89.1.1.42" ),
                                           ( ipRouteMetric1.10.0.0.99 = 20 ))
 GetNextRequest ( ipRouteDest.10.0.0.99, ->
                  ipRouteNextHop.10.0.0.99,
                  ipRouteMetric1.10.0.0.99 )
                           <- GetResponse (( ipRouteDest.20.0.0.50 = "20.0.0.50" ),
                                           ( ipRouteNextHop.20.0.0.50 = "89.1.1.42" ),
                                           ( ipRouteMetric1.20.0.0.50 = 40 ))
 GetNextRequest ( ipRouteDest.20.0.0.50, ->
                  ipRouteNextHop.20.0.0.50,
                  ipRouteMetric1.20.0.0.50 )
                           <- GetResponse (( ipRouteDest.195.160.12.23 = "195.160.12.23" ),
                                           ( ipRouteNextHop.195.160.12.23 = "195.160.12.1" ),
                                           ( ipRouteMetric1.195.160.12.23 = 100 ))
 GetNextRequest ( ipRouteDest.195.160.12.23, ->
                  ipRouteNextHop.195.160.12.23,
                  ipRouteMetric1.195.160.12.23 )

When there are no further entries in the table, the SNMP agent returns those objects that are lexicographically next in the order within the MIB (i.e., when the returned prefix differs from the requested prefix, this indicates the end of the table). In this example we would expect to see a final response from the agent with the prefix ipRouteIfIndex instead of ipRouteDest (since this is the next object in MIB-II). From a traffic perspective, each row retrieved produces a pair of SNMP poll-response packets. If large data tables are periodically retrieved from multiple agents, this can lead to a significant amount of background management traffic (imagine a backbone node routing table). For this reason SNMPv2 introduced a new primitive called GetBulk, but unfortunately its use is still problematic.
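The traversal can be simulated end to end. The Python sketch below builds a toy agent MIB from the three-entry routing table used in this example and walks it with multi-operand GetNextRequests, stopping as soon as a returned OID leaves its column. The function names and the placeholder ipRouteIfIndex value are assumptions made for this illustration; the column numbers are those assigned to the IP routing table in MIB-II. Because GetNext follows lexicographic order of the index, the rows come back ordered by destination address, with 10.0.0.99 first.

```python
from bisect import bisect_right

IP_ROUTE_ENTRY = (1, 3, 6, 1, 2, 1, 4, 21, 1)
IP_ROUTE_DEST = IP_ROUTE_ENTRY + (1,)
IP_ROUTE_IF_INDEX = IP_ROUTE_ENTRY + (2,)
IP_ROUTE_METRIC1 = IP_ROUTE_ENTRY + (3,)
IP_ROUTE_NEXT_HOP = IP_ROUTE_ENTRY + (7,)

ROUTES = [  # destination, next hop, metric: the three entries from the text
    ("195.160.12.23", "195.160.12.1", 100),
    ("20.0.0.50", "89.1.1.42", 40),
    ("10.0.0.99", "89.1.1.42", 20),
]


def build_mib(routes):
    """Toy agent MIB: each row is indexed by its destination address."""
    mib = {}
    for dest, next_hop, metric in routes:
        index = tuple(int(octet) for octet in dest.split("."))
        mib[IP_ROUTE_DEST + index] = dest
        mib[IP_ROUTE_IF_INDEX + index] = 1  # placeholder interface number
        mib[IP_ROUTE_METRIC1 + index] = metric
        mib[IP_ROUTE_NEXT_HOP + index] = next_hop
    return mib


def make_agent(mib):
    keys = sorted(mib)  # numeric tuples sort in lexicographic OID order

    def get_next(oids):
        """Answer one GetNextRequest that carries several operands."""
        response = []
        for oid in oids:
            position = bisect_right(keys, oid)
            found = position < len(keys)
            response.append((keys[position], mib[keys[position]]) if found else None)
        return response

    return get_next


def walk_table(get_next, columns):
    """Fetch rows until any returned OID no longer starts with its column prefix."""
    rows, requests, cursor = [], 0, list(columns)
    while True:
        response = get_next(cursor)
        requests += 1
        for item, column in zip(response, columns):
            if item is None or item[0][:len(column)] != column:
                return rows, requests
        rows.append([value for _, value in response])
        cursor = [oid for oid, _ in response]


agent = make_agent(build_mib(ROUTES))
rows, requests = walk_table(agent, [IP_ROUTE_DEST, IP_ROUTE_NEXT_HOP, IP_ROUTE_METRIC1])
for row in rows:
    print(row)
print(requests, "request/response pairs")
```

Retrieving three rows costs four request/response pairs, the last of which only discovers the end of the table. The cost therefore grows linearly with table size, which is the traffic concern raised above.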

SNMP security

SNMP by default allows anybody to configure or access data on a remote device. Although this is in the interests of simple management, it clearly represents a considerable security compromise. In SNMPv1 there are several basic mechanisms to control access, as follows:

Community names
Limit management requests to specific devices
Disable the Set capability

The community name (or community string) is a case-sensitive ASCII string (an OctetString, of 0–255 octets). Every managed device belongs to a community; the default community string is "public". Community names are a simple way of restricting configuration and monitoring access; the community name, in effect, operates like a simple password for a group of devices. A group of routers, for example, could belong to the community TERABIT; this would mean that only management applications configured with the same community string would be able to manage these routers. Note that the main NMS on a large internetwork would generally be configured to manage multiple communities. Community names are useful but not very secure, since they are statically configured, sent over the network in cleartext, and hence vulnerable to attack. Even if these strings could be encrypted, some countries might not permit this option if local encryption legislation were particularly restrictive (so remote management over international boundaries could be compromised).

SNMP-enabled internetwork devices usually allow you to configure one or more Trap clients by IP address. If this facility is configured on the managed device (i.e., the agent), any attempts to configure or retrieve data from that device by an untrusted management application could raise an alarm via a Trap message.

Because of the relative weakness of community strings and the potential for damaging mission-critical equipment, many vendors disable the Set capability on their agents by default; several do not implement it at all. In secure environments these devices would typically be configured via the local console, via remote password-protected Telnet sessions into the CLI or, even better, via a secure remote HTTP access in combination with SSH or SSL (i.e., a secured Web-based browser interface). Once configured these devices would then be purely monitored via SNMP; no configuration changes would be allowed. SNMPv2 and SNMPv3 have enhanced SNMP's security architecture as described in the following text.
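The three SNMPv1 controls discussed above (community names, restricting requests to specific managers, and disabling Set) can be pictured as a simple decision made by the agent for each request. The following Python sketch is an assumption-laden illustration, not a description of any vendor's agent: the AgentPolicy class, the authorize function, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Hypothetical per-agent access policy."""
    community: str = "public"
    trusted_managers: set = field(default_factory=set)  # empty: any source
    allow_set: bool = False  # many agents ship with Set disabled


def authorize(policy, source_ip, community, operation):
    """Return (allowed, raise_trap) for one incoming request."""
    if community != policy.community:  # comparison is case sensitive
        return False, True
    if policy.trusted_managers and source_ip not in policy.trusted_managers:
        return False, True  # untrusted manager: reject and alarm
    if operation == "set" and not policy.allow_set:
        return False, False  # monitoring only, no configuration changes
    return True, False


policy = AgentPolicy(community="TERABIT", trusted_managers={"192.0.2.10"})
print(authorize(policy, "192.0.2.10", "TERABIT", "get"))
print(authorize(policy, "198.51.100.7", "TERABIT", "get"))
```

Because the community string travels in cleartext, a check of this kind only deters casual access; it does not authenticate the sender.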

SNMPv2

SNMPv2 is an interim standard documented in [6] (with the MIB specified in [7]). SNMPv2 does, however, provide several useful enhancements, including the following:

Performance improvements, as indicated in Table 9.1.

Table 9.1: Comparison of Secure SNMP and SNMPv2 Performance, Measured in SNMP Primitives Per Second (Tests Conducted at Carnegie Mellon University, Pittsburgh, PA)
	Secure SNMP	SNMPv2
No Security	210	3300
Authentication Only	195	2910
DES Encryption	110	1600

A new primitive called GetBulk, for optimized bulk data retrieval.
A richer set of messages for use between the NMS and the agent. This includes an acknowledgment for Set primitives as well as the ability for an agent to indicate it has a problem servicing a request (so the NMS doesn't persistently retry, as in SNMPv1).
Multiprotocol support. SNMPv2 offers better abstraction at the Transport Layer interface to enable support for non-IP-based transports.
A new set of manager-to-manager (M2M) communication features that standardize the role of the Mid-Level Manager (MLM) for hierarchical management models.

SNMPv2 also integrates several security enhancements [23]: Secure SNMP, the Simple Management Protocol (SMP), and the Party-based SNMPv2. Each of these efforts incorporates industrial-strength security, and these efforts were integrated into the SNMPv2 Management Framework [5–7]. However, this framework had no standards-based security and administrative framework of its own and relied on multiple frameworks, including the Community-based SNMPv2 (SNMPv2c), SNMPv2u, and SNMPv2*. Unfortunately, SNMPv2c was endorsed by the IETF but had no security and administration framework, and both SNMPv2u and SNMPv2* had security but lacked IETF endorsement. Aside from this standards muddle, SNMPv2 offered DES for authentication and encryption, which was problematic for international users since U.S. manufacturers were restricted by legislation from exporting strong encryption outside the United States. Consequently, SNMPv2 received a mixed reception and the whole security issue has been revisited and addressed in SNMPv3.

SNMPv3

SNMPv3 is a new version of SNMP that is still under development; it builds upon much of the early work done by the authors of SNMPv1 and SNMPv2. One of the key focal points for SNMPv3 is a definition of security and administration that enables secure management transactions, including manager-to-agent, agent-to-manager, and manager-to-manager transactions. This work comprises authentication and privacy, authorization and view-based access control, and standards-based remote configuration (of particular importance for managed VPNs). Refer to [23] for a full discussion of these features.

Implementations of SNMPv3 are already being developed by several vendors and research organizations, including ACE*COMM, AdventNet, AGENT++, BMC Software, Cisco (IOS version 12.0[3]T), IBM Research, and SNMP Research. For the interested reader, an active Web page for SNMPv3, including many useful links, is maintained by the Simple Times [24]. See also [RFC2570], [23, 25–28].

SNMPv1, SNMPv2, and SNMPv3 coexistence

There are two generally accepted approaches for migrating between SNMPv1, SNMPv2, and SNMPv3 environments: bilingual managers and proxy agents. For further details on these strategies refer to [29]. IBM Research [30], for example, already has a multilingual SNMPv1, v2c, and v3 stack.

Resilience

SNMP's simplicity and connectionless operation provide a degree of robustness. The connectionless nature of SNMP leaves the recovery and error detection up to the NMS and even up to the agent. Neither the NMS nor the agents rely on one another for continuous operation (a manager can continue to function even if a remote agent fails and vice versa). Even if an agent fails, it can send a Trap to the NMS if it subsequently restarts, notifying the NMS of its change in operational status. The NMS can periodically poll agents to test their availability.

Although SNMP is typically used over UDP, it is transport independent. UDP, being connectionless, is thought by many to be better suited for network management than a connection-oriented transport, particularly when the network is failing and routing can oscillate. UDP minimizes the stress placed on the network (i.e., no resources are tied up as with maintaining connections), leaving the agent and NMS implementations with the responsibility for error recovery. This contrasts directly with OSI's preference for a connection-oriented approach to network management (using OSI transport class 4). Some users may, however, require a connection-oriented transport, if the network is especially unreliable or if specific guarantees are required to ensure reliable message delivery (in the IP world this could be provisioned using TCP). The argument against this is that it is precisely when the network is most unreliable that connection-oriented protocols place the most stress on the network, potentially making the situation worse.
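Since UDP gives no delivery guarantee, the timeout and retransmission logic lives in the management application. A minimal Python sketch of that responsibility follows; the transport is simulated by a callable that returns None on a timeout, and the names poll_agent and lossy_transport are hypothetical.

```python
def poll_agent(send_request, max_attempts=3):
    """Send a request up to max_attempts times; None means a timeout."""
    for attempt in range(1, max_attempts + 1):
        response = send_request()
        if response is not None:
            return {"reachable": True, "attempts": attempt,
                    "response": response}
    # Every attempt timed out: the NMS marks the agent unreachable.
    return {"reachable": False, "attempts": max_attempts, "response": None}


def lossy_transport(outcomes):
    """Simulated transport that replays a fixed list of outcomes."""
    remaining = iter(outcomes)
    return lambda: next(remaining, None)


# Two lost datagrams, then a reply (the reply text is invented).
print(poll_agent(lossy_transport([None, None, "sysUpTime=12345"])))
print(poll_agent(lossy_transport([None, None, None])))
```

A real manager would also space out retries (for example, with an increasing timeout) so that retransmissions do not add load to a network that is already failing.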

Performance issues

The connectionless mode of SNMP means that the management application is not required to maintain long-term state information for management sessions (just short-term state information for pending requests or replies). This greatly reduces the CPU and memory overheads on the host NMS platform, and, in theory, the management application needs only to run one client process to manage a large internetwork. However, for performance reasons it is usual for the management application to spawn off multiple client processes to increase throughput (avoiding blocking and queuing problems during management traffic bursts or multiple pending events).

SNMPv1 is by definition simple, but the request-response polling model (together with the very limited primitive support) means that it is by no means an efficient way to move bulk management data around the network. We saw earlier that to get the contents of a routing table from a large backbone router via SNMP requires that the table is retrieved a row at a time with successive SNMP GetNextRequests. This means that there will be a UDP request and response for every routing entry (literally thousands in a backbone node), and each request includes a variable OID string and community name. This potentially induces significant traffic overheads on the network and significant latency in data retrieval.

All currently active SNMP frameworks (SNMPv1, SNMPv2c, and SNMPv3) are inefficient in terms of the number of bytes needed to transfer MIB data over the network. There are three main reasons for this inefficiency: the Basic Encoding Rules (BER), the OID naming scheme (lots of repetition), and what is referred to as the GetBulk overshoot problem. Nevertheless, SNMPv1 is considerably slower than its successors. Relative testing performed at Carnegie Mellon University demonstrates how Secure SNMP and SNMPv2 compare (in units of management transactions per second).

Reference [20] describes an algorithm that improves the retrieval of an entire table by using multiple threads in parallel, where each thread retrieves only a portion of the table. This requires a manager that supports multiple threads and that has knowledge about the distribution of instance identifiers in the table. This algorithm does not reduce the total number of request-response PDU exchanges but does improve latency, because several threads gather data simultaneously. The downside for achieving reduced latency is bursty SNMP traffic, which can overload the agent or impact user traffic (since SNMP traffic is most likely in-band). Things get worse if the network starts to drop packets, since retransmission timers must expire and subsequent retransmissions must get through for the retrieval process to continue. SNMPv2 offers a new primitive called GetBulk, which is designed to resolve this issue. Unfortunately, the manager may not know the size of the table to be retrieved and therefore must guess a value for the max-repetitions parameter. Using small values for max-repetitions may result in too many PDU exchanges. Using large values can result in the agent returning data that do not belong to the table being probed. These data will be sent back to the manager to be discarded. In the article "Bulk Transfers of MIB Data" [31] a new operator called getsubtree is proposed to counter this problem.
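The trade-off in choosing max-repetitions can be made concrete with a small simulation. The Python sketch below is illustrative only: the agent is a sorted list of OIDs, the table index is simplified to a single integer, the table and trailing-object sizes are invented, and get_bulk here is a stand-in for the real primitive (it ignores non-repeaters and PDU size limits).

```python
from bisect import bisect_right


def make_mib(table_rows, trailing_objects):
    """Build a sorted OID list: one table column, then unrelated objects."""
    table = (1, 3, 6, 1, 2, 1, 4, 21, 1, 1)
    other = (1, 3, 6, 1, 2, 1, 4, 21, 1, 2)
    oids = [table + (i,) for i in range(1, table_rows + 1)]
    oids += [other + (i,) for i in range(1, trailing_objects + 1)]
    return table, oids


def get_bulk(oids, start, max_repetitions):
    """Simplified GetBulk: the next max_repetitions OIDs after start."""
    i = bisect_right(oids, start)
    return oids[i:i + max_repetitions]


def retrieve(oids, prefix, max_repetitions):
    """Return (rows, PDU exchanges, overshoot objects) for one column."""
    rows = exchanges = overshoot = 0
    cursor = prefix
    while True:
        exchanges += 1
        batch = get_bulk(oids, cursor, max_repetitions)
        in_table = [o for o in batch if o[:len(prefix)] == prefix]
        rows += len(in_table)
        overshoot += len(batch) - len(in_table)  # discarded by the manager
        if not batch or len(in_table) < len(batch):
            return rows, exchanges, overshoot
        cursor = batch[-1]


table, oids = make_mib(table_rows=100, trailing_objects=200)
for reps in (1, 10, 64):
    print(reps, retrieve(oids, table, reps))
```

With these invented sizes, max-repetitions of 1 behaves like GetNext (101 exchanges), while a value of 64 needs only two exchanges but returns 28 objects that lie outside the table.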

Another issue for performance is the management hierarchy. In a large internetwork you must seriously consider systems that can be implemented from the top down. If every management system receives all of the management data, then they will quickly become overloaded. It is relatively easy to define which objects an NMS will manage, but some systems may not support an NMS hierarchy, whereby a central backbone NMS manages multiple local Mid-Level Managers (MLMs). There is nothing in the SNMP specifications that describes such a model. With this type of model you should also investigate systems with resilient databases and possibly hierarchical access control schemes to support different classes of users.

Summary of SNMP advantages and disadvantages

The key advantages of SNMP are as follows:

Its design is simple, easy to understand, and easy to implement on a large internetwork.
It puts relatively low stress on network bandwidth and resources (compared with connection-oriented management models).
Public domain or freeware code is widely available. The SNMP API is relatively simple to use to design applications. A particularly nice API for C++ programmers called SNMP++ is documented in [32].
SNMP has a huge installed base. All major vendors of internetwork hardware (bridges, switches, routers, etc.) design their products to support SNMP.

The disadvantages of SNMP are as follows:

SNMP is inefficient. The traffic overhead created by using SNMP to monitor multiple devices does not scale well (solutions such as RMON, described shortly, are preferable).
SNMP lacks a standard model for hierarchical management implementations and interaction. This limits scalability and requires proprietary models to be deployed.
SNMP security is poor. There are large security holes that allow network intruders to access SNMP data carried over the network. The key weaknesses include data privacy, authentication, and access control. SNMPv2 and Secure SNMP have added security mechanisms that combat these failings. SNMPv3 further develops this work into a consolidated set of standards.
SNMP is considered by many to be too simple. The information it manipulates is neither sufficiently detailed nor sufficiently well organized to cope with large, demanding internetworks. Some of these problems are being addressed via SNMPv2 and SNMPv3. For example, SNMPv2 allows for more detailed specification of variables and optimizes table retrieval. There are also two new PDUs for manipulating objects in tables.

9.1.3 Remote Network Monitoring (RMON)

Remote Network Monitoring (RMON) is an MIB implementation designed to optimize remote performance monitoring and to offload much of the bandwidth and resource overheads incurred by the SNMP polling model. It relies on local devices called smart agents (implemented in devices such as routers and switches) to gather much of the data for subsequent collection. These agents are often referred to as Data Collection Modules (DCMs) or probes. Probes can be remotely configured by a network manager to set custom thresholds on specific events and report SNMP traps.

RMON is defined by a set of MIBs (see [33, 34]). Reference [34] defines nine major groups of information, most of which are generic object groups relevant to all LANs. It also includes MIB objects that were specifically developed for monitoring remote Ethernet segments. It is worth pointing out that an RMON-compliant agent or manager only has to support a subgroup of information within one major group. There have been two major releases of RMON, called RMONv1 and RMONv2.

RMONv1 provides network managers with a host of new functions they could not get from SNMP-based tools and also allows for storing performance histories on designated segments and triggering alarms for specific network conditions.

RMONv2 provides data about traffic at the Network Layer in addition to the Physical Layer. This enables administrators to analyze traffic by protocol type. RMONv2 also allows managers to check performance all the way to the individual port level on RMON-enabled routers or switches. It also allows a single RMON probe to monitor multiple protocol types on a single segment and provides much more flexibility in the way probes are configured for later reporting and for measuring network response times.

RMONv1 was dogged by high implementation costs and a number of incompatibility issues between vendor implementations. Another problem is that system resource requirements can be significant, especially if RMON features are enabled on all interfaces. Features such as per interface packet capture are superb for remote diagnostics, but unfortunately most existing enterprise and access router architectures would simply stop forwarding (or at least struggle) without an additional hardware assist. These issues have not disappeared with RMONv2. Another issue affecting deployment is that RMONv1 vendors have added proprietary extensions to their products in an effort to make their products more attractive to network managers. Again, there is the potential here for incompatibility. Currently RMON is being deployed with some caution, partly due to the issues described here and partly due to cost, and many customers are sticking with RMONv1 until implementations stabilize. Over time improvements in CPU performance, memory, prices, and system architecture will reduce costs and improve performance, enabling users to make the most out of what RMON has to offer.

Operation

An RMON agent works offline; it is a noninteractive monitor sitting on a LAN segment. An RMON agent is remotely configured and activated via a management application, enabling it to collect traffic and performance data on local interfaces. The agent reads and copies each frame on its local segment, updating counters based on the contents of the frame. Typically this information is cached at the agent until a remote management application requests it to be uploaded. The management application is not actively involved in the data collection. RMON uses preemptive monitoring. By continuously capturing and caching performance data in real time, an RMON application can look for anomalies in traffic patterns over time or perform trend analysis. Traffic surges or rising error rates can be detected and alarmed before the problem becomes critical. The agent can be configured to recognize when specific thresholds have been exceeded and to generate SNMP traps when these events occur. As part of the diagnostic process additional information within the RMON MIBs can be extremely useful in further qualifying the nature of the fault.

The RMON MIBs can hold detailed information about LAN operations, such as Nearest Active Upstream Neighbor (NAUN) order of the Token Ring adapters on the segment [1] or the source and destination MAC addresses of the stations with the most network traffic. This level of detail can be used to help isolate performance problems or perform trend analysis for capacity planning. A single RMON agent can support several remote RMON managers. A table is maintained within the agent of what information (and at what intervals) is to be sent to specific manager IP addresses. This is done through the alarm and event groups within RMON MIBs. Other information is routinely collected by the agent, captured in memory, and reported within GetResponse replies to a remote SNMP manager.

RMON MIB groups

Reference [34] defines the function and organization of the RMON MIB groups, as follows:

1—Statistics. Records packets, octets, broadcasts, collisions, discards, fragments, and errors.
2—History. Caches multiple samples from the statistics group to be used for operations such as trend analysis.
3—Alarm. Enables thresholds and sampling intervals to be set in order to specify alarm conditions.
4—Host. Provides a table of active nodes and basic per node statistics.
5—HostTopN. Extends the host table by offering user-defined sorting capabilities (processed at the agent).
6—Matrix. Summarizes the traffic and error counts between pairs of nodes.
7—Filter. Offers user-defined packet filters for use as trigger or termination events for capturing activities.
8—Capture. Packets that pass the defined packet filters are copied and stored locally.
9—Event. Enables the user to create event logs or send SNMP Traps from the agent.

Several of the RMON groups in the MIB contain control and data tables. Control tables contain control parameters that specify which statistics you want to access and collect. You can view and change many entries in a control table. Data tables contain statistics the agent collects; usually you can only view entries in these tables. The following sections describe the function of each group and the tables that each group defines. Refer to [34] for more detail.

Statistics group

The statistics group records data that the agent measures on network interfaces. On an Ethernet interface the agent creates one entry for each Ethernet interface it monitors and places the entry in the EtherStatsTable. The EtherStatsTable also contains control parameters for this group. As an example of the kind of statistics maintained, reference [34] defines the following objects:

 EtherStatsEntry ::= SEQUENCE {
    etherStatsIndex                    INTEGER (1..65535),
    etherStatsDataSource               OBJECT IDENTIFIER,
    etherStatsDropEvents               Counter,
    etherStatsOctets                   Counter,
    etherStatsPkts                     Counter,
    etherStatsBroadcastPkts            Counter,
    etherStatsMulticastPkts            Counter,
    etherStatsCRCAlignErrors           Counter,
    etherStatsUndersizePkts            Counter,
    etherStatsOversizePkts             Counter,
    etherStatsFragments                Counter,
    etherStatsJabbers                  Counter,
    etherStatsCollisions               Counter,
    etherStatsPkts64Octets             Counter,
    etherStatsPkts65to127Octets        Counter,
    etherStatsPkts128to255Octets       Counter,
    etherStatsPkts256to511Octets       Counter,
    etherStatsPkts512to1023Octets      Counter,
    etherStatsPkts1024to1518Octets     Counter,
    etherStatsOwner                    OwnerString,
    etherStatsStatus                   EntryStatus
 }
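To show how an agent might maintain such a row, here is a simplified Python sketch that updates the counters for each frame seen on the segment. The counter names come from the EtherStatsEntry definition; the classification rules are a simplification of the MIB definitions (for example, alignment errors, collisions, and drop events are not modeled), and update_stats is a hypothetical helper name.

```python
from collections import Counter

# Size buckets from the EtherStatsEntry definition: (low, high, counter).
SIZE_BUCKETS = [
    (64, 64, "etherStatsPkts64Octets"),
    (65, 127, "etherStatsPkts65to127Octets"),
    (128, 255, "etherStatsPkts128to255Octets"),
    (256, 511, "etherStatsPkts256to511Octets"),
    (512, 1023, "etherStatsPkts512to1023Octets"),
    (1024, 1518, "etherStatsPkts1024to1518Octets"),
]
BROADCAST = "ff:ff:ff:ff:ff:ff"


def update_stats(stats, frame_len, dest_mac, fcs_ok=True):
    """Update one statistics row for a single observed frame."""
    stats["etherStatsPkts"] += 1
    stats["etherStatsOctets"] += frame_len
    if frame_len < 64:
        stats["etherStatsUndersizePkts" if fcs_ok
              else "etherStatsFragments"] += 1
    elif frame_len > 1518:
        stats["etherStatsOversizePkts" if fcs_ok
              else "etherStatsJabbers"] += 1
    elif not fcs_ok:
        stats["etherStatsCRCAlignErrors"] += 1
    if fcs_ok and 64 <= frame_len <= 1518:  # good packets only
        if dest_mac.lower() == BROADCAST:
            stats["etherStatsBroadcastPkts"] += 1
        elif int(dest_mac.split(":")[0], 16) & 1:  # group address bit
            stats["etherStatsMulticastPkts"] += 1
    for low, high, name in SIZE_BUCKETS:
        if low <= frame_len <= high:
            stats[name] += 1
            break
    return stats


stats = Counter()
update_stats(stats, 64, "ff:ff:ff:ff:ff:ff")
update_stats(stats, 1500, "00:11:22:33:44:55")
update_stats(stats, 40, "00:11:22:33:44:55", fcs_ok=False)
update_stats(stats, 200, "01:00:5e:00:00:01")
print(dict(stats))
```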

History group

The history group contains a control and data collection function. The control function manages the periodic statistical sampling of data from networks and specifies control parameters, such as the frequency of data sampling, in the historyControlTable. The history function records periodic statistical samples from Ethernet networks—for example, interval, start time, and number of packets. This function places the statistical samples in the etherHistoryTable.
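The sampling behavior can be sketched as a fixed number of buckets that each hold the change in the running counters over one interval. The Python below is a rough model under stated assumptions: the class name, field names, and counter readings are invented, and the oldest bucket is simply discarded when the buckets are full.

```python
from collections import deque


class HistoryControl:
    """Hypothetical model of one history control entry and its samples."""

    def __init__(self, buckets_granted, interval_s):
        self.interval_s = interval_s
        self.samples = deque(maxlen=buckets_granted)  # oldest drops off
        self._prev = None
        self._next_index = 1

    def take_sample(self, now_s, counters):
        """Call once per interval with the current running counters."""
        if self._prev is not None:
            start_s, old = self._prev
            self.samples.append({
                "sampleIndex": self._next_index,
                "intervalStart": start_s,
                "pkts": counters["pkts"] - old["pkts"],
                "octets": counters["octets"] - old["octets"],
            })
            self._next_index += 1
        self._prev = (now_s, dict(counters))


history = HistoryControl(buckets_granted=2, interval_s=30)
readings = [(0, 0, 0), (30, 100, 6400), (60, 250, 20000), (90, 260, 21000)]
for now_s, pkts, octets in readings:
    history.take_sample(now_s, {"pkts": pkts, "octets": octets})
print(list(history.samples))
```

Only the two most recent intervals survive in this run, which is the point of bounding the number of buckets: the agent's memory use stays fixed however long it samples.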

Host group

The host group identifies hosts on the network by recording the source and destination MAC addresses in good packets and places the information in the hostTable. This group also records the time it discovered a host on the network in the hostTimeTable. The hostControlTable specifies control parameters, such as which monitoring operations the agent performs, and contains some information about the monitoring process.

HostTopN group

The HostTopN group ranks hosts according to a statistic type. For example, you might want to rank the hosts by the number of errors they generate. Control parameters for this group appear in the hostTopNControlTable, and data this group generates appear in the hostTopNTable.
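Because the ranking is done at the agent, the manager retrieves only the top few rows rather than the whole host table. A minimal Python sketch of that ranking step follows; the host table contents and the host_top_n helper are invented for illustration.

```python
import heapq


def host_top_n(host_table, statistic, n):
    """Return the n hosts with the largest value of the chosen statistic."""
    ranked = heapq.nlargest(
        n, host_table.items(), key=lambda item: item[1][statistic])
    return [(mac, stats[statistic]) for mac, stats in ranked]


# Hypothetical per-host counters keyed by MAC address.
host_table = {
    "00:00:0c:00:00:01": {"outPkts": 9200, "outErrors": 3},
    "00:00:0c:00:00:02": {"outPkts": 150, "outErrors": 41},
    "00:00:0c:00:00:03": {"outPkts": 48000, "outErrors": 0},
    "00:00:0c:00:00:04": {"outPkts": 700, "outErrors": 12},
}
print(host_top_n(host_table, "outErrors", 2))
print(host_top_n(host_table, "outPkts", 1))
```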

Matrix group

The matrix group stores statistics for an interchange between hosts at different addresses. This group's control parameters, such as number of hosts, appear in the matrixControlTable. When the matrix group receives information from a good packet, it places data in both the matrixSDTable and the matrixDSTable.
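One way to picture the two tables is as the same conversation data indexed in two orders: source first in matrixSDTable, destination first in matrixDSTable. The Python sketch below illustrates this idea with hypothetical class and method names and shortened, invented addresses.

```python
from collections import defaultdict


def _row():
    return {"pkts": 0, "octets": 0}


class Matrix:
    """Hypothetical model of the matrix group's two data tables."""

    def __init__(self):
        self.sd = defaultdict(_row)  # keyed by (source, destination)
        self.ds = defaultdict(_row)  # keyed by (destination, source)

    def record(self, src, dst, octets):
        """Account one good packet in both tables."""
        for table, key in ((self.sd, (src, dst)), (self.ds, (dst, src))):
            table[key]["pkts"] += 1
            table[key]["octets"] += octets

    def sent_by(self, src):
        """All conversations originated by src (read from the SD table)."""
        return {k[1]: v for k, v in self.sd.items() if k[0] == src}

    def received_by(self, dst):
        """All conversations terminating at dst (read from the DS table)."""
        return {k[1]: v for k, v in self.ds.items() if k[0] == dst}


m = Matrix()
m.record("aa", "bb", 100)
m.record("aa", "bb", 60)
m.record("cc", "bb", 500)
print(m.sent_by("aa"))
print(m.received_by("bb"))
```

Keeping both orderings lets a manager ask either "who is this station talking to?" or "who is talking to this station?" with a short, contiguous table walk.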

Filter group

The filter group specifies what type of packets the agent should capture. Filter control parameters, such as the minimum length of the packets to capture, appear in the filterTable. Associated with each filter is a channel (a specific path along which data flow). Control parameters in the channelTable define how and where the filtered packets flow.

Capture group

The capture group enables the capture of packets that satisfy the filter group control parameters for a particular interface. Control parameters in the bufferControlTable specify how to transfer data from the channelTable to the captureBufferTable. For example, you can specify the maximum number of octets from each packet that the group should store in the captureBufferTable.
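The interplay between a filter and a capture buffer can be sketched as follows. This Python model is an illustration under assumptions: the filter checks only a minimum length and a byte match at a fixed offset, the buffer overwrites its oldest entry when full, and the names make_filter and CaptureBuffer are hypothetical.

```python
from collections import deque


def make_filter(min_length, match_offset=0, match_bytes=b""):
    """Build a predicate that accepts packets matching the filter."""
    def accept(packet):
        if len(packet) < min_length:
            return False
        end = match_offset + len(match_bytes)
        return packet[match_offset:end] == match_bytes
    return accept


class CaptureBuffer:
    """Stores at most max_packets slices of max_octets each."""

    def __init__(self, max_octets, max_packets):
        self.max_octets = max_octets
        self.packets = deque(maxlen=max_packets)  # wraps when full

    def offer(self, packet, accept):
        """Store a truncated copy of the packet if the filter accepts it."""
        if not accept(packet):
            return False
        self.packets.append(packet[:self.max_octets])
        return True


accept = make_filter(min_length=6, match_offset=2, match_bytes=b"\x08\x00")
buffer = CaptureBuffer(max_octets=4, max_packets=8)
results = [
    buffer.offer(b"\x00\x01\x08\x00abcdef", accept),  # matches
    buffer.offer(b"\x00\x01\x08\x06abcdef", accept),  # wrong bytes
    buffer.offer(b"\x00\x01\x08\x00", accept),        # too short
]
print(results, list(buffer.packets))
```

Storing only the first few octets of each packet keeps the headers needed for diagnosis while limiting the memory consumed on the agent.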

Alarm group

The alarm group allows you to set an alarm threshold and a sampling interval to enable the RMON agent to generate alarms on any network segment it monitors. Alarm thresholds can be based on absolute or delta values so that you can be notified of rapid spikes or drops in a monitored value. Each alarm is linked to an event in the event group. An event defines an action that will be triggered when the alarm threshold is exceeded. The alarm group periodically takes statistical samples from variables in the agent and compares them with previously configured thresholds. The alarm table stores configuration entries that define a variable, a polling period, and threshold parameters. If the RMON agent determines that a sample crosses the threshold values, it generates an event.

The RMON agent monitors any variables that resolve to an ASN.1 primitive type of integer (integer, counter, gauge, or TimeTicks) in this way. You can specify rising or falling thresholds, indicating network faults such as slow throughput or other network-related performance problems. You specify rising thresholds when you want to be notified that a monitored value has risen above the threshold you specified. You specify falling thresholds when you want to be notified that the network is behaving normally again. For example, you might specify a falling threshold of 30 collisions per second to indicate a return to acceptable behavior.
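A paired rising and falling threshold gives the alarm hysteresis, so that a value hovering around a single threshold does not generate a stream of events. The Python sketch below is a simplified model: the Alarm class is hypothetical, it assumes the alarm starts armed for a rising event only, and the sample values are invented. The falling threshold of 30 echoes the collisions example above.

```python
class Alarm:
    """Simplified rising/falling threshold alarm with hysteresis."""

    def __init__(self, rising, falling, sample_type="absolute"):
        self.rising = rising
        self.falling = falling
        self.sample_type = sample_type  # "absolute" or "delta"
        self._rising_armed = True       # assumption: start armed for rising
        self._prev_raw = None

    def sample(self, raw):
        """Process one sample; return 'rising', 'falling', or None."""
        if self.sample_type == "delta":
            if self._prev_raw is None:
                self._prev_raw = raw
                return None  # need two readings to form a delta
            value, self._prev_raw = raw - self._prev_raw, raw
        else:
            value = raw
        if self._rising_armed and value >= self.rising:
            self._rising_armed = False  # re-armed only by a falling event
            return "rising"
        if not self._rising_armed and value <= self.falling:
            self._rising_armed = True
            return "falling"
        return None


alarm = Alarm(rising=50, falling=30)
events = [alarm.sample(v) for v in (10, 60, 70, 40, 55, 30, 20, 60)]
print(events)
```

In this run the sample of 55 produces no second rising event, because the value has not yet dropped to the falling threshold since the first one.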

Event group

The event group allows for the generation of an SNMP trap, the generation of a log entry, or both, for any event you choose. An event can occur when a sampled variable crosses the alarm threshold or when a channel match event is generated on the agent. The RMON agent can deliver traps to multiple NMSs. You can typically set up events to either record the monitoring information or to notify the NMS.

The event group includes an event table and a log table. The event table defines the notification that takes place when an event is triggered. One form of notification is to write an entry in the log table. Each entry in the event table identifies an event that can be triggered and indicates an action, such as writing a log entry to the log table or sending an SNMP trap to the NMS. The event can trigger any of the following actions:

The system sends an SNMP Trap to the network management station. The management station is notified immediately and determines how to react to the SNMP Trap.
The system logs the event in the log table in the agent system. The management station can retrieve the information stored in the log table for further analysis. For example, the information collected can be used to select proper threshold values.
The system sends an SNMP Trap and logs the event in the log table.

The log table is a read-only data table for the network management station. It records each event that needs to be logged. It provides the event number, an index that distinguishes occurrences of the same event, the time at which the event occurred, and the event description. You are not required to configure the log table.
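The relationship between the event table, the log table, and Trap delivery can be sketched as a small dispatch routine. In the Python below, the class names, the action labels, and the NMS addresses are assumptions chosen for illustration; sending a Trap is modeled by appending to a list rather than by any network call.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    index: int
    description: str
    action: str  # "none", "log", "trap", or "log-and-trap" (labels assumed)


@dataclass
class EventGroup:
    trap_receivers: list
    log_table: list = field(default_factory=list)
    sent_traps: list = field(default_factory=list)

    def fire(self, event, now_s):
        """Carry out the action configured for a triggered event."""
        if event.action in ("log", "log-and-trap"):
            # The log index distinguishes occurrences of the same event.
            seen = sum(1 for row in self.log_table
                       if row["eventIndex"] == event.index)
            self.log_table.append({
                "eventIndex": event.index,
                "logIndex": seen + 1,
                "logTime": now_s,
                "description": event.description,
            })
        if event.action in ("trap", "log-and-trap"):
            for nms in self.trap_receivers:  # traps can go to several NMSs
                self.sent_traps.append((nms, event.index))


group = EventGroup(trap_receivers=["192.0.2.10", "192.0.2.11"])
high_collisions = Event(1, "collision rate above threshold", "log-and-trap")
back_to_normal = Event(2, "collision rate back to normal", "log")
group.fire(high_collisions, now_s=100)
group.fire(back_to_normal, now_s=160)
group.fire(high_collisions, now_s=400)
print(group.log_table)
print(group.sent_traps)
```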

RMON summary

RMON is widely recognized as a very useful multivendor tool for efficient remote monitoring, diagnostics, and data collection. RMON management applications are passive, relying on other device management applications to provide change and control operations. It is up to the manager to decide what data should be monitored and what thresholds and alarm conditions to set. RMON can be invaluable in monitoring LAN health and performance and as a guide to longer-term capacity planning. In medium to large internetworks RMON is preferable for traffic monitoring to SNMP polling, since RMON records data passively and does not skew the overall traffic data with additional background management traffic.

9.1.4 Common Management Information Service/Protocol (CMIS/CMIP)

Background

The OSI defined a complete set of specifications for network management, with two major components: Common Management Information Protocol (CMIP) and Common Management Information Service (CMIS). CMIS defines the services used, and CMIP defines the protocol that carries those services. The OSI management model uses an object-oriented approach. Managed objects all exhibit the same exterior appearance and accept the same set of commands. Part of the OSI standards is the Guidelines for the Definition of Managed Objects (GDMO). This standard provides a common way to define the objects being managed by a manager.

CMIP's basic design is similar to SNMP, whereby PDUs are employed as variables to monitor a network. CMIP, however, contains 11 types of PDUs and it employs much richer and more complex data structures with many attributes. Specifically these are as follows:

Variable attributes represent the variable's characteristics.
Variable behaviors are actions of that variable that can be triggered.
Notifications generate an event report whenever a specified event occurs.

CMIS/CMIP was designed to make up for the shortcomings of SNMP and offers a more powerful and scalable network management platform. Research and development were heavily funded by governments and several large corporations. CMIS/CMIP has had several false starts; initially it was considered unattractive by the user community, because it imposed major overheads on memory and processing resources when compared with SNMP (particularly back in 1988). Its reliance on the OSI Connection-Oriented Transport Service (COTS) protocol also caused great resistance. Unfortunately, further problems with its implementation, the usual standards overkill and delays and high deployment costs, have effectively stalled any widespread deployment. Implementations of CMIS/CMIP are currently used to manage carrier networks (i.e., PTTs, RBOCs, Inter-LATA carriers, etc.); however, it is very expensive to deploy and consequently is rarely seen in enterprise networks. The enterprise market is likely to remain dominated by SNMP for the foreseeable future.

Architecture

CMIP is the basic framework for OSI network management and is largely based on Digital's proprietary protocol architecture. CMIP managers manage CMIP agents and can communicate with many different types of objects. The CMIP applications fall into five categories, as follows:

Fault
Configuration
Accounting
Performance
Security

The Management Information Service (MIS) within OSI is provided by an Application Layer entity called the Systems Management Application Entity (SMAE). Since part of SMAE's function is to monitor the status of the OSI protocol layers, it has a direct interface to all layers (otherwise it would have to rely on lower layers operating correctly to ensure management reachability—see Figure 9.6). SMAE gets its information about each level of protocol from an MIB. The MIB interfaces directly to each OSI protocol layer.

Figure 9.6: OSI management structure.

Clearly, it would be overkill for devices such as routers to implement the full seven-layer OSI protocol stack. For such devices a minimum communication capability is provided by the use of an OSI subprofile (a thin slice of the OSI seven-layer stack, with just enough features in layers 3 through 7 to manage the device). Even so, many vendors in the IP world consider this to be an unacceptable overhead, since the protocol stack may be required solely for self-management (running a seven-layer OSI stack on a modem does seem like overkill).

Protocol

CMIP uses the full OSI Application Layer protocol over a Connection-Oriented Transport Service (COTS), although CMIP was initially defined for use over X.25. A management station must establish a virtual circuit between itself and a managed node before any useful management functions can be performed. This has some implications on network performance, as follows:

When the network is already congested, the connection-oriented CMIP protocol will only add more burden. Unfortunately, this is when management is needed most.
In COTS mode CMIP requires a minimum of three packets (handshaking) for any meaningful management interaction.
Active management sessions mean that more resource (CPU, RAM) is required on CMIP-enabled devices to maintain these stateful connections.

Subsequently, there have been efforts to standardize the use of CMIS/CMIP using other transports. Common Management Services and Protocol over TCP/IP (CMOT) and Common Management Services and Protocol over LAN (CMOL) are two examples. CMOT is a protocol configuration proposed by the IETF, where CMIP is available over a connection-oriented TCP stack or a connectionless UDP stack, defined in [35]. To support this functionality in the IP environment a streamlined OSI session, presentation, and application service is offered over TCP/IP, including a Lightweight Presentation Protocol (LPP), as defined in [36]. LPP uses two well-known UDP/TCP port numbers; the CMOT manager uses port 163 and the CMOT agent uses port 164. It is fair to say that none of these transport variants has gained any significant interest from the wider user community; SNMP continues to dominate the installed base.

Primitives

CMIS provides the following primitives:

Get
Set
Event
Create
Delete
Action

SNMP puts restrictions on the use of complex items (such as lists); CMIP does not. Unlike SNMP, CMIP distinguishes between objects and their attributes (e.g., an object may be a port on a hub, and an associated attribute might be its state).

ASN.1

Abstract Syntax Notation One (ASN.1) is a formal language used to define an abstract data format, enabling information to be exchanged between systems at the binary level regardless of machine architecture. In other words, ASN.1 is the Esperanto of data representation. ASN.1 provides two basic representations for information: a human-readable format and an encoded format for use by communications protocols. ASN.1 is an ISO OSI protocol standard [37], broadly equivalent to ITU/T X.409. ASN.1 normally encodes data using the Basic Encoding Rules (BER) [38], although this is not mandatory.
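As a concrete illustration of the encoded format, the following Python sketch applies the Basic Encoding Rules to an OBJECT IDENTIFIER. It is a minimal encoder written for this example (the function names are hypothetical, and only short-form lengths are handled): the first two arcs are packed into one octet as 40 times the first plus the second, and each remaining arc is written in base 128 with the high bit set on every octet except the last.

```python
def encode_base128(n):
    """Encode a non-negative integer in base 128, most significant first."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)  # continuation bit on leading octets
        n >>= 7
    return bytes(reversed(out))


def encode_oid(dotted):
    """BER-encode an OID given as dotted text (short-form length only)."""
    arcs = [int(a) for a in dotted.split(".")]
    if len(arcs) < 2:
        raise ValueError("an OID needs at least two arcs")
    body = encode_base128(arcs[0] * 40 + arcs[1])
    for arc in arcs[2:]:
        body += encode_base128(arc)
    if len(body) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(body)]) + body  # tag 0x06 = OBJECT IDENTIFIER


# sysDescr.0 from MIB-II, and an OID with an arc larger than 127.
print(encode_oid("1.3.6.1.2.1.1.1.0").hex())
print(encode_oid("1.3.6.1.4.1.2021").hex())
```

Every variable in an SNMP PDU carries its full OID in this form, so the common 1.3.6.1.2.1 prefix is repeated for each object retrieved; this repetition is one source of the encoding overhead in SNMP.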

ASN.1 is used by both CMIS/CMIP and SNMP to encode management information; it is used to define both managed objects and PDU formats. Managed objects are described as ASN.1 OBJECT-TYPEs, which comprise five fields (OBJECT DESCRIPTOR, SYNTAX, ACCESS, STATUS, and DESCRIPTION). Typical object definitions are illustrated in the standard MIB-II interfaces group [11].

The five fields are described as follows:

OBJECT DESCRIPTOR (Name) is a textual name for the object. In our example, ifAdminStatus.
SYNTAX defines the data type associated with the object. ASN.1 constructs are used to define this structure (note that SNMP uses only a subset of the full ASN.1 syntax). The ASN.1 type ObjectSyntax defines three categories of object syntax: Simple, Application-Wide, and Simply Constructed. Examples include INTEGER, OCTET STRING, NetworkAddress, Counter, Gauge, TimeTicks, SEQUENCE, and SEQUENCE OF.
ACCESS defines the level of access permitted for the object. It can be either read-only, read-write, write-only, or not accessible.
STATUS defines the managed object's implementation requirement (Mandatory, Optional, or Obsolete).
DESCRIPTION is a textual description of the object.
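
The five fields can be modeled as a simple record. The sketch below is an illustration in Python rather than ASN.1 itself; the class name, the validation sets, and the helper function are assumptions introduced here, while the ifAdminStatus values follow the MIB-II interfaces group.

```python
from dataclasses import dataclass

# Values permitted for ACCESS and STATUS, as listed in the text above.
ALLOWED_ACCESS = {"read-only", "read-write", "write-only", "not-accessible"}
ALLOWED_STATUS = {"mandatory", "optional", "obsolete"}


@dataclass(frozen=True)
class ObjectType:
    """Illustrative model of the five OBJECT-TYPE fields (not a MIB compiler)."""
    descriptor: str
    syntax: str
    access: str
    status: str
    description: str

    def __post_init__(self):
        if self.access not in ALLOWED_ACCESS:
            raise ValueError(f"unknown ACCESS value: {self.access}")
        if self.status not in ALLOWED_STATUS:
            raise ValueError(f"unknown STATUS value: {self.status}")


# ifAdminStatus as defined in the MIB-II interfaces group.
IF_ADMIN_STATUS = ObjectType(
    descriptor="ifAdminStatus",
    syntax="INTEGER { up(1), down(2), testing(3) }",
    access="read-write",
    status="mandatory",
    description="The desired state of the interface.",
)


def is_settable(obj: ObjectType) -> bool:
    """True if a manager could, in principle, issue a Set against this object."""
    return obj.access in {"read-write", "write-only"}
```

Because ifAdminStatus is read-write, a management station can use it to enable or disable an interface, which is an example of the active interface described earlier in this chapter.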

For further information, the interested reader is referred to [37–41].

Performance

CMIP is more powerful than SNMP and generally enables more efficient operations. As we have already discussed, CMIS/CMIP in COTS mode can represent a significant burden on internetworking equipment such as bridges, repeaters, and routers and is unlikely to gain much support from users (since this adds both cost and complexity). Only the carriers can currently afford the luxury of a full OSI management implementation.

CMIP advantages and disadvantages

CMIP has several advantages when compared with SNMP, as follows:

CMIP variables not only relay information to and from the terminal (as in SNMP), but they can also be used to perform tasks that would be impossible under SNMP. For instance, if a terminal on a network cannot reach its file server a predetermined number of times, CMIP can notify the appropriate personnel of the event. With SNMP, however, the management application would have to explicitly keep track of how many unsuccessful attempts to reach a file server a terminal has incurred. CMIP results in a more efficient network management system, since less work is required by a user to keep updated on the status of the network.
CMIP addresses many of the shortcomings of SNMP. For instance, it has built-in security management facilities that support authorization, access control, and security logs. The result is a more secure system than a standard SNMP installation, with no security upgrades required.
CMIP was funded by governments and several large corporations. One can deduce that CMIP not only has a very large development budget, but also that when (and if) it becomes a widely available protocol it will have numerous immediate users—namely, the governments and corporations that funded it.

The disadvantages of CMIP are summarized as follows:

CMIP has a relatively small installed base and little user acceptance.
CMIP takes more system resources than SNMP (some say by a factor of ten). In other words, very few systems would be able to handle a full implementation of CMIP without massive network modifications (such as the installation of extra memory and new protocol agents).
CMIP is difficult to program, which greatly slows down deployment and requires more skilled staff to implement interfaces and applications.

9.1.5 Configuration management

In many texts about network management, configuration management receives little attention. In real networks, however, configuration management is arguably the single most critical system management discipline. Configuration management determines how your network and systems will operate and provides input to many different important processes (including problem, change, and asset management). Configuration management can also impact scalability; in multivendor networks the general lack of integration of configuration management systems can be a major factor in delaying the rollout of new services. We can broadly divide configuration management into two functions, as follows:

Planning and administration—Managing a large enterprise requires careful planning and control to meet business objectives and avoid network outages. Planning includes functions such as inventory control, configuration design, change control, connectivity information, financial modeling, recovery procedures, and optimization. The main objectives of this phase are to provide cost-effective use of resources and to meet business objectives in a timely manner.
Operations—In this phase the live configuration of systems is implemented and monitored. This function is responsible for the correct day-to-day operation of the network, in particular the performance and availability of systems on that network.

In an ideal world these functions will be tightly integrated, with well-defined information flows between them. Networks are live entities and so the two processes are often in a state of flux.

In practice, configuration management is one of the most difficult and time-consuming management disciplines and is one of the hardest to quantify in terms of the value to the business. For example, deploying hundreds of routers in remote locations may require skilled engineers at each site if remote configuration management features are poor. For any large service provider this is simply unacceptable, since skill shortages are likely to limit the rate of service rollout according to the number of free engineers. A better solution is to configure all nodes centrally, either prior to shipping or over a dial-up line (meaning that a less skilled engineer can be used to connect the right cables and power up the device). For maintenance purposes all configuration data need to be collated and held centrally, so that any remote failures can be recovered quickly.

The need for automation and integration

In anything other than very small networks, configuration management demands some level of automation, both to reduce the number of skilled engineers and to reduce the possibility of errors. Many organizations invest huge amounts of money developing their own configuration management functions, either through script files or custom software. SNMP, the de facto management standard, is particularly weak in this area; SNMP Set and Get simply do not scale, and due to security weaknesses many devices disable Set altogether.

Traditionally, the configuration of networking equipment has for the most part been carried out using a combination of Telnet and a vendor-specific Command-Line Interface (CLI). One big advantage of these techniques is that they are fast and relatively low bandwidth, meaning that they are also suitable for out-of-band management in emergencies (Telnet can be combined with dial-back modem or ISDN support to facilitate remote access if the network is down). The network administrator would typically configure a number of scripts at the central site, and then download them to each device over a Telnet or console connection. For a large, managed service these script files may be identical for all nodes, or for defined groups of nodes. Often the only difference is that a unique IP address is required for each device (for a remote Telnet connection this requires either that the device is preconfigured with an IP address or that it can be reached via a modem connection initially). This approach has served the industry reasonably well for several years but has several drawbacks, including the following:

The user is required to learn a number of different CLIs.
Text-based CLIs provide variable degrees of assistance; some are extremely terse.
Security is often poor (usually just a plain ASCII password, transferred unencrypted over the wire). Secure protocols such as SSL and SSH may be available, but this really depends on the vendor.

Many experienced engineers still prefer to use CLIs, since, once learned, they can be very efficient interfaces for rapid configuration and diagnostics. As indicated, CLIs often support scripting for bulk network configuration (such as injecting hundreds of large static routes into a routing table). In a large network this could be the only way of ensuring the timely rollout of equipment.
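
The scripting approach described above, where a common script differs per device only in its IP address, might be sketched as follows. The CLI commands in the template are hypothetical and do not correspond to any particular vendor's syntax; the function name and the example subnet are likewise assumptions for illustration.

```python
import ipaddress
from string import Template

# Hypothetical CLI syntax, for illustration only; real command sets are vendor specific.
SCRIPT_TEMPLATE = Template(
    "set hostname $hostname\n"
    "set interface eth0 address $address/$prefix\n"
    "enable interface eth0\n"
)


def build_scripts(hostnames, subnet):
    """Return {hostname: script}, giving each device the next free host address."""
    network = ipaddress.ip_network(subnet)
    hosts = iter(network.hosts())
    scripts = {}
    for hostname in hostnames:
        try:
            address = next(hosts)
        except StopIteration:
            raise ValueError("subnet is too small for the device list") from None
        scripts[hostname] = SCRIPT_TEMPLATE.substitute(
            hostname=hostname, address=address, prefix=network.prefixlen
        )
    return scripts
```

For example, `build_scripts(["branch-1", "branch-2"], "192.0.2.0/29")` yields two scripts that are identical apart from the hostname and the assigned address; each could then be downloaded to its device over a Telnet or console session.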

Web-based configuration management

In recent years HTTP (Web)-based browser management interfaces have become increasingly common as the standard way to manage distributed systems (such as routers and firewalls—see Figure 9.7). This interface is especially easy to implement, requiring a simple HTTP server running on the remote device, with hooks into the system configuration interface and system databases. The major benefit for users is that they simply run a standard Web browser on the client (such as Netscape Navigator or Microsoft Internet Explorer) and connect to the remote device's IP address or domain name. The management interface, therefore, has the appearance of a set of customized Web pages (often with links for additional help to the vendor's home Web site).

Figure 9.7: Nokia's Voyager HTTP interface for firewall configuration management.

This type of interface has been used primarily for configuration management (configuring interfaces, enabling protocols and basic statistics) and so complements the NMS rather than replacing it. The advantages of this approach are obvious: Web browsers are installed on almost all PCs today as part of the operating system, and most users are now comfortable using this point-and-click interface (thus alleviating the need to learn a new command set for different vendor equipment). Another major advantage is that standard security mechanisms are readily available for this type of interface to ensure that the remote party is trusted—for example, the SSH and SSL protocols. The main disadvantage of this approach is that it tends to be slow; it does not scale when deploying hundreds of devices. Some Web-based systems do, however, include the facility to download configuration scripts from within the interface by simply pressing a button.

Large-scale configuration management

Sophisticated configuration management systems are already emerging to automate and control large-scale parameter and OS image deployment. These systems typically maintain a complete network database of location data, version numbers, addressing information, time last updated, and the methods used (e.g., SSL/TLS, SSH, Telnet, etc.). Implementations can enable the network planner to customize standard templates for different equipment types (routers, switches, by vendor, etc.), together with the ability to organize templates into logical groups (either by device class or some other useful hierarchy, such as network topology or site geography).
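
As a rough sketch of the kind of network database described above, the following models one record per device and groups devices by an arbitrary attribute (device class, site, and so on). The field names, function names, and sample values are assumptions for illustration and do not reflect any particular product's schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DeviceRecord:
    """One row of a hypothetical configuration management database."""
    name: str
    device_class: str      # e.g. "router", "switch"
    site: str              # location data
    os_version: str        # version number of the deployed OS image
    address: str           # addressing information
    last_updated: datetime
    method: str            # e.g. "SSH", "SSL/TLS", "Telnet"


def group_by(records, attribute):
    """Group device names by any record attribute, e.g. 'device_class' or 'site'."""
    groups = {}
    for record in records:
        groups.setdefault(getattr(record, attribute), []).append(record.name)
    return groups


def needs_image(records, device_class, target_version):
    """Names of devices in a class that are not yet running the target OS image."""
    return [
        r.name for r in records
        if r.device_class == device_class and r.os_version != target_version
    ]
```

Grouping by `device_class` selects the template to apply, while `needs_image` identifies which members of the group still require a new OS image.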

9.1.6 Network management platforms and implementations

In this section, we briefly discuss the different types of network management products available commercially or via public domain or shareware repositories. The list of products is representative but not exhaustive.

MIB browsers and development kits

At the noncommercial end of the scale there are numerous MIB browsers, mostly public domain or offered as part of a development kit. These tools offer basic MIB interrogation and data retrieval features (by exercising primitives such as Get, Set, etc.) and the ability to send Traps manually. Their primary use is for testing and demonstrating SNMP implementations, but they can offer limited diagnostic capabilities. Figure 9.8 shows a simple SNMP browser available as part of the HP SNMP++ development kit. SNMP++ offers an excellent object-oriented C++ class library API, which is available on both Windows and a number of UNIX environments. The source code is available free, subject to the inclusion of a copyright notice. The interested reader is referred to [32].

Figure 9.8: HP SNMP++ browser tool. An example showing the Get primitive being configured for MIB-II.

We mentioned earlier the standard ISODE SNMP implementation, which is freely available and widely implemented. Public domain source code is also freely available from both the Massachusetts Institute of Technology (MIT) and Carnegie Mellon University (CMU). A respected commercial network management development environment is SNMPc from Castle Rock [42]. SNMPc runs under Windows 98 and NT and provides an integrated MIB compiler plus excellent programming APIs. The APIs include a simple DLL-based API, the WinSNMP de facto standard API, and a Windows DDE-based API (programmable from Visual BASIC and from many Windows applications, such as Microsoft Excel).

Vendor management products

Many of the larger networking vendors have in the past developed their own management systems. Often this was used as a way to lock in customers to a single-vendor solution. With the proliferation of networking systems there are now very few vendors that can offer complete end-to-end solutions with best-of-breed products in all categories; therefore, customers tend to buy routers, firewalls, and servers from different vendors. Having spent years of man-effort developing these management systems and not recovering their costs, hardware vendors have either given up on the mission to create a complete framework or sold off their network management businesses (e.g., Retix split off Network Managers, and Cabletron Systems recently relinquished control of its Spectrum products). Examples of vendor- or domain-specific management products include the following:

Cisco: CiscoWorks and StrataSphere
NorTel/Bay Networks: Optivity
Novell: NetWare Manager
Apple Macintosh: MacSNMP

While some of these products may claim to be generic management solutions, it is rare to find a management system in this class as the sole NMS in a multivendor network (with the possible exception of Spectrum). Framework vendors tend to be independent specialists.

Network management frameworks

Historically, some of the key vendors offering complete network management architectures included Hewlett-Packard, Digital Equipment, IBM, Tivoli, CA, and Sun Microsystems. Today there are several high-end commercial network management implementations, including Computer Associates (Unicenter TNG [43]); Tivoli (Tivoli Management Environment (TME), NetView [44]); and Hewlett-Packard (HP OpenView ITSM, Network Node Manager [45]).

Most of these platforms are SNMP based, but several also offer an OSI/CMISE stack. Some of the more sophisticated management platforms (so-called management frameworks) are expensive; prices start around $50,000 and typically reach millions of dollars. Frameworks typically offer a whole range of facilities, including the following:

Hierarchical Manager-of-Manager (MoM) model, with a centralized event management console and distributed Mid-Level Managers (MLMs)
Distributed GUIs (e.g., X Windows, Java GUIs, HTTP browser access)
Security management tools (AAA services)
SLA management
Database management tools, resilient or mirrored object databases, often LDAP enabled
Software distribution tools (version control, remote boot and configuration)
Topology management (automated discovery via SNMP, Telnet, ping, routing tables, ARP caches, etc.)
Inventory management (semiautomated database population of software and hardware revisions, device types, location data, addressing information, etc.)
Element management (via Product-Specific Modules, or PSMs)
Alarm and event consolidation (via overall topology knowledge, Trap-directed polling, or a rulebase)
Help-desk management

There is an increasing trend for hooking these frameworks into business processes, so that purchasing, deployment, inventory, and maintenance processes can be integrated into a (supposedly) seamless architecture. From a data management perspective this makes good sense, since it can significantly speed up processes and greatly reduce the potential for error, especially if all these systems interface via common APIs. However, this market is still relatively immature; many customers still do not appear to be getting value for money, since often these systems are cumbersome to use and require significant after-sales consultancy and customization [46]. In large multivendor networks the scope of these products means that they cannot always fulfill what is promised (or expected) out of the box. Some frameworks appear to be a thinly disguised mishmash of acquired products, hammered together with varying degrees of success. Nevertheless, these frameworks ultimately represent the best hope for large-scale integrated management. The alternative leaves network managers struggling with piecemeal solutions and hard-to-maintain in-house tools.

We briefly review two of the most important network management frameworks.

Hewlett-Packard (HP) has been very proactive in the area of network management, and HP OpenView is probably the best-known generic NMS platform on the market today. HP OpenView is well known for its SNMP support, but it also offers an OSI/CMISE stack. Its protocol stack is based on the ISO Development Environment (ISODE) releases. This can be run over the Lightweight Presentation Protocol (LPP) directly over UDP (i.e., the Common Management over TCP/IP, or CMOT, implementation). HP offers a base OpenView product as well as a distributed product. It also offers a very useful MIB browser. HP OpenView is based on object-oriented principles, with objects registering via an Object Registration Service (which provides services such as mapping names to network addresses). The OpenView Application Programmers Interface (API) offers a high degree of functionality and has been used successfully by many developers to produce their own enhanced variations of OpenView, either rebadged or relabeled products (e.g., DEC, IBM, and NCR/AT&T). The interested reader is referred to [45].

IBM's network management architecture is called NetView. NetView is designed to manage IBM's mainframe and associated devices and is one of the few proprietary management architectures to survive in the wake of SNMP. NetView runs on an IBM mainframe with terminal access via an application called NetView/PC. After the initial release of NetView, IBM began integrating other network management modules under the control of NetView, and the NetView protocols were opened up so that other network management systems could be hooked in via the Application Programming Interface/Communication Services (API/CS), which provides an interface to NetView, access to DDM file transfer, and protocols for exchanging alarms and data. IBM has developed problem-solving strategies in this software and combined them with a comprehensive database of network status information; this creates a powerful tool, which may be extended and customized to meet particular customer needs. IBM LAN equipment (hubs, bridges, etc.) is managed via its LAN Manager Agent (not to be confused with the Microsoft LAN Manager NOS). IBM's 6611 router is managed via SNMP. IBM's NetView 6000 application runs on an IBM RS/6000 host (running AIX) and was developed from HP's OpenView network management platform. NetView 6000 offers full SNMP management, together with a gateway into proprietary NetView (SNA) devices and the LAN Manager platform. It offers a highly integrated platform for mixed IBM/multivendor environments. For example, SNMP Traps can be converted into SNA alerts and vice versa. The system also has a strong device discovery capability and provides network monitoring for fault and performance measurements.

Element management

As we have already seen, standard MIBs contain only a subset of the management objects needed to run a real-life network, although more objects are being included over time. The major router, switch, and bridge vendors have spent years enhancing and differentiating the functionality of their devices, to the point where there are now many private objects (MIB enterprise extensions) used for advanced status or statistical data, together with objects associated with proprietary protocols. Without access to these objects the scope of a management console would be limited to generic MIB functionality.

Most vendors now publish their private MIBs on the Internet, and you can easily compile these private MIBs onto your management platform; however, depending on the capabilities of the SNMP browser you are using, you may have to do a considerable amount of work to make the best use of these raw objects. Most, if not all, high-end enterprise management frameworks include element management modules (sometimes called Product-Specific Modules, or PSMs). These modules not only include private.enterprise MIB data but may also include associated graphical device representations (detailed chassis views, card views, etc.), interactive status data, and remote configuration capabilities. One of the main headaches for a network management vendor is keeping PSMs current with device hardware and software changes, especially where a large number of PSMs are supported.

9.1.7 Basic implementation models

So far we have talked about the tools and techniques available. There are several ways to implement a management model in an internetwork, and we will discuss the various benefits and trade-offs of the approaches available.

Implementing hierarchical management

On a very small network there may be no need to consider management hierarchy—a single NMS will work fine. As the network grows, the amount of traffic and system overhead (CPU, RAM) imposed by management starts to become significant. As a general rule management traffic should not be running at more than 5 percent of the maximum bandwidth on conventional LANs (Ethernet, Token Ring, and FDDI). The problem is that most network management architectures rely on polling, in-band, to gain real-time data from key devices.

There is a direct trade-off between the amount of traffic created and the usefulness of the data retrieved, governed by factors such as the following:

The number of devices polled
The number of objects required for each poll
The polling frequency
The available link bandwidths and bottlenecks

On a large network you will need to consider distributing some of the load and constraining polling locally. For example, assume we have a 100-node network spread over three LAN subnets and linked to a central site via 64-Kbps links, as illustrated in Figure 9.9. Assume we are interested in monitoring ten objects from each agent, each of which returns an integer (32-bit) variable, and we want to monitor these every 30 seconds. With a single NMS located in the central site, using SNMP, we would need to issue a single Get request for each object, resulting in ten Gets and ten Responses. Assuming an average packet size of 90 bytes (including the community string and object ID, etc.), we are looking at the following amount of data:

Total Traffic	=	90 (bytes) × 8 (bits) × 100 (nodes)
	=	72,000 bps

Figure 9.9: Simple one-tier centralized management, with three remote subnets connected via 64-Kbps links.

This represents a worst case of 24,000 bps, or 37.5 percent of each 64-Kbps link, in each direction (assuming full-duplex operation), every 30 seconds. This does not take into account any latency in issuing the requests or receiving the responses. It would be tempting to divide this traffic by 30 (i.e., time), but unless you have configured the management application to schedule these requests evenly, the NMS is likely to process these operations as fast as it can, leading to a significant spike of management traffic. These figures also do not take into account queuing delays on intermediate routers [1], so there will be additional latency there. If the load incurred is seriously affecting user traffic, we could try to reduce the number of objects we are monitoring, but this might be a serious compromise in what we are trying to achieve. Even so, be ruthless about the number of objects polled; make sure that they are all really necessary and that no duplication is taking place.
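
The arithmetic above can be parameterized so that the effect of changing the number of nodes, the packet size, or the polling interval is easy to explore. The sketch below is an illustration only: the function and key names are invented here, a 64-Kbps link is taken to be 64,000 bps, and traffic is assumed to divide evenly across the links. The `packets_per_node` parameter is an assumption added for this sketch; with its default of 1 the function reproduces the figures quoted in the text, and setting it to 10 models ten separate 90-byte Gets per node.

```python
def polling_load(nodes, packet_bytes, links, link_bps, interval_s, packets_per_node=1):
    """Estimate one-way management traffic per polling cycle.

    Assumes traffic is spread evenly over the links (a simplification).
    """
    bits_per_cycle = packet_bytes * 8 * packets_per_node * nodes
    bits_per_link = bits_per_cycle / links
    return {
        "bits_per_cycle": bits_per_cycle,
        "bits_per_link": bits_per_link,
        # Share of link capacity if the whole cycle is sent within one second.
        "burst_fraction": bits_per_link / link_bps,
        # Share of link capacity if polls are spread evenly over the interval.
        "average_fraction": bits_per_link / interval_s / link_bps,
    }


# Figures from the text: 100 nodes, 90-byte packets, three 64-Kbps links, 30-second cycle.
load = polling_load(nodes=100, packet_bytes=90, links=3, link_bps=64_000, interval_s=30)
```

With these inputs `load["bits_per_link"]` is 24,000 and `load["burst_fraction"]` is 0.375, matching the 37.5 percent worst case, while `load["average_fraction"]` is 0.0125, which is the figure an evenly scheduled poller would approach. Under the `packets_per_node=10` assumption the burst fraction rises to 3.75 (that is, each cycle would occupy the link for several seconds) and the average to 0.125, well above the 5 percent guideline.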

One solution to this would be to place at least one polling agent in each subnet so that all device polling is constrained within the relatively fast LANs; this concept is sometimes referred to as the Mid-Level Manager (MLM) model (see Figure 9.10). The central NMS (a Manager-of-Managers, or MoM) could poll these MLMs periodically, using a more efficient means of uploading these bulk data at relatively quiet periods of the day. The MLM could either be another NMS or some management-enabled networking device (such as a concentrator or hub). Another solution would be to use more intelligent agents (such as RMON agents) to do most of the real-time collection locally in each of the end systems and network devices. Note that there are no SNMP standards for MoM or MLM communications, and this has essentially been left for the vendors to define.

Figure 9.10: Manager-of-Manager (MoM) concepts offering a hierarchical and distributed management design.
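
The MLM idea of polling locally and uploading bulk data later can be sketched as a small collector that buffers samples and hands the MoM a compact summary on request. As the text notes, there is no SNMP standard for MoM or MLM communications, so the class, method names, and summary format below are purely hypothetical.

```python
from statistics import mean


class MidLevelManager:
    """Hypothetical MLM: stores local poll results and summarizes them for a MoM."""

    def __init__(self, subnet):
        self.subnet = subnet
        self._samples = {}  # (node, object_name) -> list of polled values

    def record(self, node, object_name, value):
        """Store one value obtained by polling a device on the local LAN."""
        self._samples.setdefault((node, object_name), []).append(value)

    def summarize(self):
        """Return per-object statistics and clear the buffer (one bulk upload)."""
        summary = {
            key: {
                "count": len(values),
                "min": min(values),
                "max": max(values),
                "mean": mean(values),
            }
            for key, values in self._samples.items()
        }
        self._samples.clear()
        return summary
```

In this arrangement only the output of `summarize()` crosses the slow WAN link, at whatever interval the central NMS chooses, while the frequent per-object polls stay on the local LAN.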

In-band or out-of-band management

Most traditional management systems use in-band data collection schemes. As we have discussed, if you can keep the overhead to less than 5 percent, then, under normal conditions, this is a perfectly reasonable approach. The problem is, however, that when the network starts to fail, the NMS is using the network for data transport; this is likely to lead to loss of valuable data and events, and the management system can end up imposing more load on the network (e.g., if it retransmits) just when the network needs all the bandwidth it can get (especially one that is COTS based).

There is no simple solution to this. An out-of-band management system costs more money and adds complexity to the network. We can try to minimize this by using relatively low-cost media for management traffic (such as dial-up ISDN or modem links, or spare switch ports and LAN segments), but in practice this is rarely done, unless the network is mission critical or especially large. On some networks this may not be optional, and cost may be less of an issue. If a service provider guarantees availability as part of the SLA, the provider may have no choice but to install a highly rugged management architecture, with out-of-band access where appropriate. A network with many remote offices in inaccessible locations may also deem cost to be secondary.

Out-of-band management is, however, an attractive technique for remote configuration management, especially for internetworks where remote devices are very inaccessible or are located in areas with less skilled staff or none at all. For example, a router housed in a remote military observation post could be equipped with a spare dial-up modem port for remote access should the primary link fail. One of the issues commonly associated with out-of-band management is security.