The network management process consists of choosing which characteristics of each type of network device to monitor/manage; instrumenting the network devices (or adding collection devices) to collect all necessary data; processing these data for viewing, storage, and/or reporting; displaying a subset of the results; and storing or archiving some subset of the data.
Network management touches all other aspects of the network. This is captured in the FCAPS model:
Fault management
Configuration management
Accounting management
Performance management
Security management
In this model, fault management consists of processing events and alarms (where an alarm is an event that triggers a real-time notification to network personnel); identifying, isolating, troubleshooting, and resolving problems; and returning the network to an operational state.
Configuration management consists of setting system parameters for turn-up; provisioning the network; configuring the system, including generating backups and performing restores; and developing and operating system databases.
Accounting management consists of monitoring and managing subscriber service usage and service billing.
Performance management consists of implementing performance controls, based on the IP services architecture; collecting network and system performance data; analyzing these data; generating short- and long-term reports from them; and controlling network and system performance parameters.
Finally, security management consists of implementing security controls, based on the security architecture; collecting and analyzing security data; and generating security reports and logs from these data.
The network management process and management model both provide input to the network management architecture. With the knowledge of what network management means for our network, we can consider the following in the architecture:
In-band and out-of-band management
Centralized, distributed, and hierarchical management
Scaling of network management traffic
Checks and balances
Management of network management data
MIB selection
Integration into OSS
In-band management occurs when the traffic flows for network management follow the same network paths as the traffic flows for users and their applications. This simplifies the network management architecture, for the same network paths can be used for both types of data and a separate path (and possibly network) is not required (Figure 7.8).
Figure 7.8: Traffic flows for in-band management.
A trade-off with in-band management is that management data flows can be affected by the same problems that have an impact on user traffic flows. Since part of network management is troubleshooting problems in the network, it will be negatively affected if the management data flows are delayed or blocked; at the times when network management is most needed, it may not be available. This matters because a primary objective of the network management architecture is to be able to monitor events when the network is under stress, for example, when it is congested with traffic, suffering from hardware or software configuration problems, or under a security attack.
Out-of-band management occurs when different paths are provided for network management data flows and user traffic flows. This type of management has the distinct advantage of allowing the management system to continue to monitor the network during most network events, even when such events disable the network. This allows you to effectively see into portions of the network that are unreachable through normal paths (e.g., user data flow paths).
Out-of-band management is usually provided via a separate network, such as frame relay, ATM, or plain old telephone service (POTS) connections. Figure 7.9 illustrates this point. An advantage of having a separate network is that additional security features can be integrated into this (network management) network. Given that this network provides access to most or all network devices, having additional security here is important. Another advantage is that the out-of-band connection can be used to troubleshoot and configure network devices that are in remote locations. This saves time and resources when the user data network is down and remote network devices need to be accessed.
Figure 7.9: Traffic flows for out-of-band management.
Whenever out-of-band management is planned, a method to check and verify its availability is needed. This can be as simple as planning to use out-of-band management on a regular basis, regardless of need. This will help ensure that problems with out-of-band management are detected and solved while the network is still healthy.
A trade-off with out-of-band management is the added expense and complexity of a separate network for network management. One way to reduce the expense is to provide out-of-band monitoring at a low level of performance, relative to the user data network. For example, out-of-band monitoring may be achieved by using phone lines. Although this may be cheaper than providing dedicated network connections, it will require time to set up (e.g., call) the out-of-band connections and the capacity of each connection may be limited.
For some networks a combination of in-band and out-of-band management is optimal (Figure 7.10). Usually this is done when the performance of the user data network is needed to support network management data flows (for monitoring the operational network), but the separate, out-of-band network is needed when the user data network is down.
Figure 7.10: Combination of in-band and out-of-band management traffic flows.
Some trade-offs of having a combination of in-band and out-of-band management are that the expense of a separate network is still incurred and security issues on the user data network still need to be addressed.
Centralized management occurs when all management data (e.g., ping packets, SNMP polls/responses, traceroute) radiate from a single (typically large) management system. The flows of management data then behave like the client-server flows discussed in Chapter 4 and shown in Figure 7.8.
The obvious advantage to centralized management is that only one management system is needed, simplifying the architecture and reducing costs (depending on the choice of management system). A centralized management system often has a variety of management tools associated with it.
Some trade-offs to centralized management are that the management system is a single point of failure and that all management flows converge at the network interface of the management system, potentially causing congestion or failure.
Distributed management occurs when there are multiple separate components to the management system, and these components are strategically placed across the network, localizing network management traffic and distributing management domains. In Figure 7.11, multiple local element management systems (EMSs) are used to distribute management functions across several domains.
Figure 7.11: Distributed management where each local EMS has its own management domain.
In distributed management, either the components provide all management functions (monitoring, display, storage, and processing) or the distributed components are the monitoring devices. For example, distributed management may take the form of having multiple management systems on the network (e.g., one management system per campus or per management domain, as shown in Figure 7.11) or a single management system with several monitoring nodes, as in Figure 7.12.
Figure 7.12: Distributed management where monitoring is distributed.
Advantages to distributed management are that the monitoring devices act to localize the data collection, reducing the amounts of management data that transit the network. They may also provide redundant monitoring so that other monitoring devices on that network can cover the loss of any single monitoring device.
A trade-off with distributed management is that costs will increase with the number of monitoring devices or management systems needed.
Hierarchical management occurs when the management functions (monitoring, display, storage, and processing) are separated and placed on separate devices. Management is hierarchical because when the functions are separated, they can be considered layers that communicate in a hierarchical client-server fashion (see Chapter 4 for a discussion about hierarchical client-server flows). Figure 7.13 shows the structure of a hierarchical management system.
Figure 7.13: Hierarchical management separates management into distinct functions that are distributed across multiple platforms.
In hierarchical management, localized monitoring devices collect management data and either pass these data directly to display and storage devices or process the data locally before passing them on. When the management data are passed on to display and storage devices without processing, the monitoring devices act as they did in distributed management, localizing the data collection and reducing the amounts of management data that transit the network.
When the management data are processed before being sent to display and storage devices, the monitoring devices act as local filters, sending only the relevant data (e.g., deltas on counter values or updates on events). This may substantially reduce the amount of management data in the network, which is especially important if the monitoring is in-band.
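As an illustration of this local filtering, the following sketch (in Python) keeps the last value seen for each counter and forwards only the delta to the upper management tiers. The forward() callback and the poll values are hypothetical placeholders standing in for whatever transport and collection mechanisms are actually used.

```python
# Minimal sketch of local filtering at a monitoring device: keep the last
# value seen for each (device, counter) pair and forward only the change.
from typing import Dict, Tuple

last_seen: Dict[Tuple[str, str], int] = {}  # (device, counter) -> last value

def filter_and_forward(device: str, counter: str, value: int, forward) -> None:
    """Forward only the change in a counter since the previous poll."""
    key = (device, counter)
    previous = last_seen.get(key)
    last_seen[key] = value
    if previous is None:
        return  # first sample; nothing to compare against yet
    delta = value - previous
    if delta != 0:
        forward({"device": device, "counter": counter, "delta": delta})

# Example: only non-zero changes reach the display/storage tier.
filter_and_forward("rtr1", "ifInErrors", 1042, forward=print)  # first poll, no output
filter_and_forward("rtr1", "ifInErrors", 1050, forward=print)  # forwards a delta of 8
```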
Thus, we can have monitoring devices at strategic locations throughout the network, polling local devices and network devices, collecting and processing the management data, and forwarding some or all of these data to display and storage devices. The numbers and locations of each type of device will depend on the size of the network, the amount of management data expected to be collected (discussed later in this chapter), and where the displays and storage devices are to be located in the network management architecture.
An advantage to hierarchical management is that every component can be made redundant, independent of the other components. Thus, it can be tailored to the specific needs of your network. In some networks, it may be preferable to have several display devices, whereas in other networks, having several processing devices or storage devices may be preferable. These components are separate, so the amounts of each can be individually determined.
A trade-off in hierarchical management is the cost, complexity, and overhead of having several management components on the network.
Some recommendations are presented here to help determine and optimize the capacity requirements of network management traffic.
For a LAN environment, start with one monitoring device per IP subnet. For each subnet, estimate values for the following traffic variables:
Number of devices and network devices to be polled
Average number of interfaces per device
Number of parameters to be collected
Frequency of polling (polling interval)
Combining these variables will give you an estimate of the average data rate for management traffic per subnet. If this estimated rate is greater than approximately 10% of the capacity (line rate) of the LAN, consider reducing the amount of management traffic generated by reducing one or more of these variables. If the estimated average rate is less than 1% of LAN capacity, it may be possible to increase one or more of the variables.
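The following sketch combines these variables into a per-subnet estimate and compares it against the thresholds above. The device counts, polling interval, and assumed bytes per sampled parameter are illustrative values, not recommendations.

```python
# Rough per-subnet estimate of average management traffic, as described above.
def management_rate_bps(devices: int, interfaces_per_device: float,
                        parameters: int, polling_interval_s: float,
                        bytes_per_sample: int = 100) -> float:
    """Average management traffic rate (bits/s) generated on one subnet."""
    samples_per_interval = devices * interfaces_per_device * parameters
    return samples_per_interval * bytes_per_sample * 8 / polling_interval_s

rate = management_rate_bps(devices=50, interfaces_per_device=2.0,
                           parameters=6, polling_interval_s=60)
lan_capacity_bps = 100e6  # e.g., Fast Ethernet
fraction = rate / lan_capacity_bps
if fraction > 0.10:
    print("Reduce polling variables: management traffic exceeds 10% of LAN capacity")
elif fraction < 0.01:
    print("Room to increase polling variables: below 1% of LAN capacity")
else:
    print(f"Management traffic is {fraction:.1%} of LAN capacity")
```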
For most of the standard LAN technologies (Ethernet, Fast Ethernet, Gigabit Ethernet, Token Ring, FDDI), the management traffic rate should be targeted at 2% to 5% of the LAN capacity. As LAN capacity increases, you will have more available capacity for network management traffic and may choose to increase one or more traffic variables (Figure 7.14).
Figure 7.14: Scaling network management traffic.
For a WAN environment, start with one monitoring device at each WAN-LAN interface. This is in addition to any monitoring devices indicated in the LAN recommendation above. However, if a monitoring device is on the LAN subnet that forms the WAN-LAN interface, that device may be used to collect data for both the LAN and the WAN. Placing a monitoring device at each WAN-LAN interface allows us to monitor the network at each location, as well as measure, verify, and possibly guarantee services and performance requirements across the WAN.
Checks and balances are methods of duplicating measurements in order to verify and validate network management data. Although checks and balances add effort to the network management process, it is advisable to have more than one method of collecting network management data, particularly for data that are considered vital to the proper operation of the network. SNMP agent and MIB implementations are vendor specific and are not guaranteed to provide data that are consistent across all vendors.
Objectives of performing checks and balances are to locate and identify the following:
Errors in recording or presenting network management data
Rollovers of counters (i.e., a counter wrapping back to zero without proper notification)
Changes in MIB variables from one software version to another
In addition, checks and balances help normalize network management data across multiple vendors by verifying data through measurements from multiple sources.
Collected management data should be verified for accuracy. For example, when polling for SNMP variables for an interface, consider RMON polling as well to verify these data. Consider using a traffic analyzer to verify data for various random periods. You may also run independent tests with traffic generators, the vendors' network devices, and data-collection devices to verify the accuracy of collected data.
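As one example of such a cross-check, the sketch below compares interface octet deltas gathered from two independent sources (e.g., SNMP ifInOctets and an RMON probe) and allows for a 32-bit counter rollover between polls. The input values and the 5% tolerance are assumptions for illustration, not part of any real collection API.

```python
# Hedged sketch of one check-and-balance: compare deltas from two sources
# and flag discrepancies; the modulo arithmetic absorbs a single 32-bit
# counter rollover between polls.
COUNTER32_MODULUS = 2 ** 32

def counter_delta(previous: int, current: int) -> int:
    """Delta between two polls of a 32-bit counter, allowing one rollover."""
    return (current - previous) % COUNTER32_MODULUS

def cross_check(snmp_delta: int, rmon_delta: int, tolerance: float = 0.05) -> bool:
    """Return True if the two measurements agree within the tolerance."""
    baseline = max(snmp_delta, rmon_delta, 1)
    return abs(snmp_delta - rmon_delta) / baseline <= tolerance

snmp = counter_delta(previous=4_294_960_000, current=5_000)  # counter rolled over
rmon = counter_delta(previous=120_000, current=132_300)
if not cross_check(snmp, rmon):
    print("Discrepancy between SNMP and RMON octet counts; investigate")
```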
Flows of network management data typically consist of SNMP parameter names and values, and the results of queries from utilities such as ping or traceroute. These data are generated by network devices and other devices on the network, transported via SNMP to monitoring devices, and possibly forwarded to display and storage devices. It is important to the network management architecture that we understand where and how the data are generated, transported, and processed; this will help us to determine where network management components may be placed in the network.
Management data may be generated either in a query/response (stateless) method, as with SNMP or ping queries, or in response to a prearranged set of conditions (stateful), as with SNMP traps. Large numbers of SNMP queries should be spread out over a time interval (e.g., polling interval), not only to avoid network congestion but also to avoid overburdening network devices and monitoring devices with the processing required to generate management data.
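A minimal sketch of spreading polls across the polling interval follows; the poll_device() call is a placeholder for whatever collection mechanism is actually used.

```python
# Spread SNMP polls evenly across one polling interval rather than issuing
# them all at once, to avoid bursts of management traffic and processing load.
import time

def staggered_poll(devices, polling_interval_s: float, poll_device) -> None:
    """Poll each device in turn, evenly spaced across one polling interval."""
    if not devices:
        return
    spacing = polling_interval_s / len(devices)
    for device in devices:
        poll_device(device)  # e.g., issue the SNMP queries for this device here
        time.sleep(spacing)  # spread load on the network and on the devices

# Example: 300 devices on a 60-second interval -> one poll every 0.2 seconds.
# staggered_poll(device_list, 60, poll_device)
```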
Management data will consist of frequently generated parameters for real-time event notification and less frequently generated (or needed) parameters for trend analysis and planning. The same parameters may be used for both purposes. Since frequent polling can generate large amounts of data, storage of these data can become a problem. Some recommendations for managing these data are presented in the following sections.
Determine which management data are necessary to keep stored locally and which data may be archived. Management data are usually kept local, cached where they can be easily and quickly retrieved, for event analysis and short-term (on the order of hours or days) trend analysis. Management data that are not being used for these purposes should be archived to secondary or tertiary storage, such as tape archives or offsite storage (Figure 7.15).
Figure 7.15: Local and archival storage for management data.
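The following sketch illustrates this placement decision: recent samples stay in the local cache for event and short-term trend analysis, while older samples move to archival storage. The 48-hour retention threshold and the store_locally()/archive_sample() calls are assumptions for illustration.

```python
# Route each management data sample to local cache or archival storage
# based on its age, as described above.
import time

LOCAL_RETENTION_S = 48 * 3600  # keep roughly two days of data locally (assumed)

def place_sample(sample: dict, store_locally, archive_sample) -> None:
    """Send a sample to local cache or to secondary/tertiary (archival) storage."""
    age_s = time.time() - sample["timestamp"]
    if age_s <= LOCAL_RETENTION_S:
        store_locally(sample)   # fast retrieval for event/short-term trend analysis
    else:
        archive_sample(sample)  # e.g., tape archive or offsite storage
```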
When a management parameter is being used for both event notification and trend analysis, consider copying every Nth iteration of that parameter to a separate database location, where the iteration size N is large enough to keep these data relatively small yet small enough that the data remain useful for trend analysis. In Figure 7.16, SLA variables are polled regularly (each variable is polled once per second), and every Nth poll is saved in long-term (archival) storage. Depending on the bandwidth and storage available for network management traffic, N can range from 10² to 10⁵.
Figure 7.16: Selective copying to separate database.
A trade-off in selective copying of data is that, whenever data are copied, some data may be lost in the copy. To help protect against this risk, you can use TCP for data transmission or send copies of data to multiple archival systems (e.g., one primary and one redundant).
If there are indications that more immediate analysis needs to be done, then either a short-term trend analysis can be performed on the locally stored data (per the local storage recommendation above) or the iteration size N can be temporarily shortened.
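The sketch below illustrates this selective copying: every sample goes to local storage, and every Nth sample is also copied to the archive. The value of N and the storage calls are placeholders; as noted above, N might be on the order of 10² to 10⁵ depending on available bandwidth and storage.

```python
# Copy every Nth sample to long-term (archival) storage while keeping all
# samples in local storage, as described in the recommendation above.
class SelectiveArchiver:
    def __init__(self, n: int, store_local, store_archive):
        self.n = n
        self.count = 0
        self.store_local = store_local
        self.store_archive = store_archive

    def record(self, sample: dict) -> None:
        self.count += 1
        self.store_local(sample)        # all samples remain available locally
        if self.count % self.n == 0:
            self.store_archive(sample)  # every Nth sample kept long term

# Example: with per-second polling and N = 1000, roughly one sample every
# ~17 minutes reaches the archive.
# archiver = SelectiveArchiver(1000, local_db.insert, archive_db.insert)
```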
When management data are collected for trend analysis, the data can be stored locally at the monitoring device and then downloaded to archival storage when traffic is expected to be low (e.g., at night). In Figure 7.17, polls of network management data are made at 5-minute intervals and stored locally. These data are then downloaded to archival storage once or twice daily, usually when there is little user traffic on the network (e.g., at 2 AM).
Figure 7.17: Data migration.
Metadata include additional information about the collected data, such as references to the data types, time stamps of when the data were generated, and references to any other data that these data rely on. A management data archival system should provide such additional information regarding the data that have been collected.
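A minimal sketch of such a metadata record follows; the field names are assumptions chosen for illustration, not a prescribed schema.

```python
# One possible shape for the metadata kept alongside a block of collected
# management data, following the description above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ManagementDataRecord:
    parameter: str            # e.g., "ifInOctets"
    device: str               # device the data were collected from
    data_type: str            # e.g., "Counter32"
    collected_at: float       # time stamp of when the data were generated
    values: List[int] = field(default_factory=list)
    related_records: List[str] = field(default_factory=list)  # references to other data

record = ManagementDataRecord(parameter="ifInOctets", device="rtr1",
                              data_type="Counter32", collected_at=1_700_000_000.0,
                              values=[12296], related_records=["rtr1/ifOutOctets"])
```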
MIB selection is the process of determining which SNMP MIBs to use for your network, as well as which variables in each MIB are appropriate. This may be, for example, a full MIB (e.g., MIB-II is commonly used in its entirety), the subset of each MIB that is required for conformance to that MIB's specification (also known as a conformance subset of the MIB),[2] enterprise-specific MIBs (the parameters that are available from each vendor or network-element type), or a subset of MIB parameters that you define for your network.
For example, a subset of performance monitoring parameters can be used from the interfaces MIB (RFC 2863): ifInOctets, ifInErrors, ifInUcastPkts, ifOutOctets, ifOutErrors, and ifOutUcastPkts. This set of six parameters is a common starting point for MIB parameters. These parameters can usually be measured on all interfaces for most network devices.
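As a sketch of collecting these six parameters, the following example uses the standard ifTable object identifiers with the net-snmp snmpget command-line tool. The host, community string, and interface index are placeholders, and a production collector would more likely use an SNMP library rather than shelling out per variable.

```python
# Poll the six common interface performance parameters via net-snmp's snmpget.
import subprocess

IF_MIB_OIDS = {
    "ifInOctets":     "1.3.6.1.2.1.2.2.1.10",
    "ifInUcastPkts":  "1.3.6.1.2.1.2.2.1.11",
    "ifInErrors":     "1.3.6.1.2.1.2.2.1.14",
    "ifOutOctets":    "1.3.6.1.2.1.2.2.1.16",
    "ifOutUcastPkts": "1.3.6.1.2.1.2.2.1.17",
    "ifOutErrors":    "1.3.6.1.2.1.2.2.1.20",
}

def poll_interface(host: str, community: str, if_index: int) -> dict:
    """Poll the six parameters for one interface and return name -> value."""
    results = {}
    for name, oid in IF_MIB_OIDS.items():
        output = subprocess.run(
            ["snmpget", "-v2c", "-c", community, "-Oqv", host, f"{oid}.{if_index}"],
            capture_output=True, text=True, check=True)
        results[name] = output.stdout.strip()
    return results

# Example (placeholder host and community): poll_interface("192.0.2.1", "public", 1)
```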
MIB variables can be considered as falling into two sets: a common set that pertains to overall network health and a set that is necessary to monitor and manage the things the network needs to support, including:
Server, user device, and network parameters
Network parameters that are part of SLAs, policies, and network reconfiguration
When the network includes an interface to an OSS, the network management architecture should consider how management would be integrated with the OSS. The interface from network management to OSS is often termed the northbound interface because it is in the direction of service and business management (see Section 7.4). This northbound interface is typically CORBA or SNMP (Figure 7.18).
Figure 7.18: Integration of network management with OSS.
Internal relationships for the network management architecture are the interactions, dependencies, trade-offs, and constraints between network management mechanisms. It is important to understand these relationships: they are part of a complex, nonlinear system, and they define and describe the behavior of this architecture.
Interactions within network management may include interactions between components of the management system; between the network management system and network devices; and between the network management system and the OSS.
If there are multiple network management systems or if the network management system is distributed or hierarchical, then there will be multiple components to the management system. The network architecture should include the potential locations for each component and/or management system, as well as the management data flows between components and/or management systems. The interactions here may be in the form of SNMP or CMIP/CMOT queries/responses, CORBA, HTTP, file transfers, or a proprietary protocol.
Part of network management resides in each managed network device, in the form of management data (e.g., MIB variables) and software that allows access and transport of management data to and from the management system (e.g., SNMP agent software). Therefore, interactions between network management components (particularly monitoring devices) and managed network devices can also be considered here. Whether to consider all of the managed network devices depends on how many are expected in the network; usually we do not, because there can be quite a large number of them. As discussed in Chapter 4, the devices most likely to be considered are those that interact with several users, such as servers and specialized equipment. Interactions here are likely to be in the form of SNMP or CMIP/CMOT queries/responses.
If your environment includes an OSS, there will likely be some degree of interactions between network management and the OSS for flow-through provisioning, service management, and inventory control. The network management architecture should note where the OSS would be located, which components of the network management system will interact with the OSS, and their locations in the network. Interactions here are likely to use CORBA but may use SNMP or HTTP (see the following subsection).
Dependencies within network management may include dependencies on capacity and reliability of the network for management data flows, dependencies on the amount of data storage available for management data, and dependencies on the OSS for the northbound interface requirement.
Network management may be dependent on the performance of the underlying network for support of management data flows. In its most basic sense, the network will need to provide sufficient capacity for the estimated amount of management data. This estimate can be derived using the information in Section 7.5. This is particularly important when network management is centralized, and all management data will be aggregated at the network management system interface. This may also be a dependency on IP services, discussed later in this section.
The amount of management data that can be stored is in part a function of how much storage will be available, so network management can be dependent on data storage availability.
Although network management may interface with an OSS, it may also be dependent on that OSS to determine the northbound interface. For example, some OSSs require CORBA for their interface, which will have to be supported by network management. This may also be considered a constraint on network management.
Trade-offs within network management may include trade-offs between in-band and out-of-band management and trade-offs between centralized, distributed, and hierarchical management. Trade-offs between in-band and out-of-band management include the following:
In-band management is cheaper and simpler to implement than out-of-band management, although when management data flows are in-band, they can be affected by the same problems that affect user traffic flows.
Out-of-band management is more expensive and complex to implement than in-band management (because it requires separate network connections); however, it can allow the management system to continue to monitor the network during most network events, even when such events disable the network. In addition, out-of-band management allows access to remote network devices for troubleshooting and configuration, saving the time and effort of having to be physically present at the remote location.
Out-of-band management, by definition, requires separate network connections. This may be a benefit in that security for the separate network can be focused on the requirements of management data flows, or it may be a liability in that additional security (with its associated expense and overhead) is required for this network.
When in-band and out-of-band management are combined, the expense and complexity of out-of-band management, as well as the additional security requirements, are still incurred; however, the combination allows the (typically) higher-performance in-band path to be used for monitoring (which is the high-capacity portion of network management) while out-of-band management remains available at critical times (e.g., when the user data network, and with it the in-band paths, is down).
Trade-offs between centralized, distributed, and hierarchical management include the following:
In centralized management, only one management system is needed (all management components, as well as other tools, are on one hardware platform), simplifying the architecture and reducing costs (depending on the choice of management system) over distributed or hierarchical management systems, which may have several separate components. However, centralized management can be a single point of failure, and all management data flows are aggregated at the management system's network interface, potentially causing congestion or failure. Distributed or hierarchical management can avoid central points of failure and reduce congestion points.
The degrees of distribution or hierarchy in management are a function of how complex and costly you are willing to allow the management system to become and how important it is to isolate management domains and provide redundant monitoring. Costs for distributed or hierarchical management will increase with the number of monitoring devices or management systems needed. However, if you are willing to accept high management costs, you can provide a fully redundant, highly flexible hierarchical management system for your network.
Constraints include the northbound interface from network management to the OSS. This interface may be constrained by the interface requirement of the OSS. Since the OSS potentially ties together several service and business components, its interface requirements may be forced onto network management. CORBA is often required for this northbound interface.
External relationships are trade-offs, dependencies, and constraints between the network management architecture and each of the other component architectures (addressing/routing, performance, security, and any other component architectures you may develop).
Network management should be a part of all the other architectural components, for they will need some or all of the monitoring, control, and configuration capabilities that network management provides. As such, each of the other components will interact at some level with network management.
There are common external relationships between network management and each of the other component architectures, some of which are presented here.
Network management depends on the addressing/routing architecture for the proper routing of management data flows through the network. If the management is in-band, the routing of management data flows should be handled in the same fashion as the user traffic flows and does not require any special treatment in the architecture. However, if the management is out-of-band, the routing of management data flows may need to be considered in the architecture.
Network management is often bounded by the network or networks that are under common management. A management domain is used to describe a set of networks under common management; an autonomous system is another term often used. Thus, the routing and addressing architecture may define the management domain for the network, setting the boundaries for network management.
If the management is out-of-band, the separate management network may require routing and addressing as part of the architecture.
Network management will interact with the performance architecture through the collection of network performance data used to verify the proper operation of performance mechanisms. This may occur through a northbound interface (described earlier) to the OSS or to a policy database for performance. The performance architecture will also depend on network management to provide data on the performance and function of the network.
A trade-off between network management and performance is in how much network resources (e.g., capacity) network management requires, as it may affect the ability to support various performance levels. This is particularly true when management is centralized, as management data flows in centralized management are aggregated at the management system's network interface.
Network management can depend on performance in two ways. First, if performance mechanisms support best-effort traffic (as determined in the flow specification), part of this best-effort traffic can be allocated to network management data flows. Second, if a higher-priority service is desired for network management data flows, then network management will depend on performance mechanisms to provide the necessary support for such services.
Network management is dependent on some level of security in order to be used in most operational environments. This may be security at the protocol level (e.g., SNMP security) and/or securing access to network devices. If the management is out-of-band, the separate network that supports this management will need to be secured.
Network management may be constrained by security, if the security mechanisms used do not permit network management data or access across the security perimeter. This may also be considered a trade-off, when it is possible to reduce the level of security to support access or management data transport across the security perimeter.
For example, consider the use of POTS for out-of-band access. Such dial-in access is unacceptable to many organizations, unless extra security measures are taken on each access line (e.g., dial back, security keys).
[2]Conformance subsets of MIBs are usually listed at the end of each MIB's specification (RFC).