Why Develop a Knowledge Base? | Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)

Network administrators typically start managing their network by purchasing a management package that supports auto-discovery. After they turn the package on, it discovers every device in the network. It then starts generating events as PCs come and go on the network, as equipment is tested in labs, and for a myriad of other reasons. Some events may point to legitimate faults in your network. However, the signal-to-noise ratio is too low for the events to be useful. A knowledge base helps your network management system raise that signal-to-noise ratio to an acceptable level through a variety of methods, including indicating which devices and ports should be managed.

For example, suppose an SNMP trap comes in that indicates that port 2/3 on device Switch1 has gone down. With no knowledge of this network, it is impossible to determine whether this event is reporting a fault in the network, much less the severity of the possible fault. For instance, this port may be the connection from your company to the Internet. If it is the only connection, loss of this link may affect access to critical external resources for your entire company, and fixing it quickly is critical. On the other hand, if there is a redundant link, fixing the connection may be a lower priority. Alternatively, this port may be connected to a single end user's PC, which the user may have turned off for any number of reasons. In this case, you probably don't even want to know it happened, or you want it logged to a file that will be looked at only if some other issue in the network requires re-evaluating your events. You can't tell which of these scenarios applies without knowledge of the topology of the network.
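The port 2/3 scenario can be sketched as a lookup against the knowledge base. This is only an illustration under stated assumptions: the `PortInfo` record, the role names `uplink` and `end_user`, the severity labels, and the entries for ports 2/4 and 3/1 are all hypothetical and do not come from any particular management package.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PortInfo:
    role: str                # hypothetical roles: "uplink" or "end_user"
    redundant: bool = False  # True if another link provides the same connectivity

# Hypothetical knowledge-base contents, keyed by (device, port).
KNOWLEDGE_BASE = {
    ("Switch1", "2/3"): PortInfo(role="uplink", redundant=False),
    ("Switch1", "2/4"): PortInfo(role="uplink", redundant=True),
    ("Switch1", "3/1"): PortInfo(role="end_user"),
}

def classify_link_down(device: str, port: str, kb=KNOWLEDGE_BASE) -> str:
    """Return an illustrative severity label for a link-down event."""
    info = kb.get((device, port))
    if info is None:
        return "unknown"      # no topology knowledge, so severity cannot be judged
    if info.role == "end_user":
        return "log_only"     # a PC being switched off is not worth an alert
    if info.role == "uplink":
        return "major" if info.redundant else "critical"
    return "unknown"
```

With these entries, the same trap for Switch1 port 2/3 is treated as critical only because the knowledge base records it as a non-redundant uplink; an unlisted port falls through to `unknown`, which mirrors the "no knowledge of this network" case.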

Without knowledge about the performance of this network, it is unlikely that you can set thresholds for performance analyses that don't result in too many or too few events. Thus, the information gained from baselining the network needs to be placed in your knowledge base so it can be used for generating useful thresholds for your network.
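One simple way to turn baseline data into a threshold is to place it a few standard deviations above the baseline mean. The sketch below assumes that approach; it is one common heuristic rather than the method this book prescribes, and the function name, the multiplier `k`, and the sample utilization figures are all illustrative.

```python
from statistics import mean, stdev

def threshold_from_baseline(samples, k=3.0):
    """Return mean + k * sample standard deviation of the baseline samples."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    return mean(samples) + k * stdev(samples)

# Hypothetical baseline: link utilization (percent) sampled during normal operation.
baseline_utilization = [30.0, 32.0, 28.0, 31.0, 29.0]
threshold = threshold_from_baseline(baseline_utilization)

def exceeds_threshold(value, limit=threshold):
    """True if a new measurement should generate a threshold event."""
    return value > limit
```

A smaller `k` produces more events and a larger `k` produces fewer, which is the trade-off between too many and too few events described above.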

You also need knowledge about how important this network and specific pieces of this network are to the operation of the company. This is often stored as a set of rules that are derived from your established network and communications policies. These policies collectively implement the service level agreements (SLAs) that you have with your users. Chapter 2, "Policy-Based Network Management," covers policies, SLAs, and rules. You can choose to store only the rules in your network knowledge base and/or store the information the rules are derived from. As network management becomes more advanced, you will be able to specify this type of information at an increasingly higher level.

An example of using this information is a trading house that has several in-house networks and places the highest priority on the trading network during trading hours, because an outage there affects the core business of the company. A network tying remote offices together may be assigned a medium priority because an outage may affect only a few customers. In addition, a network primarily used to connect PCs to printers may be assigned the lowest priority; the company may be willing to accept longer outages on such a network in exchange for lower equipment or support costs.
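The trading-house priorities could be stored as a small rule set like the following sketch. The trading hours, the network names, and the decision to drop the trading network to medium priority outside trading hours are assumptions made for illustration; the text specifies only the priorities during trading hours.

```python
from datetime import time

# Assumed trading hours; a real deployment would take these from policy.
TRADING_OPEN = time(9, 30)
TRADING_CLOSE = time(16, 0)

# Priorities that do not depend on the time of day (hypothetical network names).
STATIC_PRIORITY = {
    "remote_office": "medium",
    "printer": "low",
}

def network_priority(network: str, now: time) -> str:
    """Return the priority of a fault on the given network at the given time."""
    if network == "trading":
        in_trading_hours = TRADING_OPEN <= now < TRADING_CLOSE
        # Assumption: outside trading hours the trading network is medium priority.
        return "high" if in_trading_hours else "medium"
    # Networks the knowledge base does not list default to the lowest priority.
    return STATIC_PRIORITY.get(network, "low")
```

For instance, `network_priority("trading", time(10, 0))` evaluates to `"high"`, while the same fault at 20:00 is ranked `"medium"` under the stated assumption.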

Another level of knowledge, which may be reflected in SLAs or may be more of an oral tradition in your company, is the financial, political, or religious level. Knowing which network connections deserve more attention than others is an important piece of information. Obviously, the CEO's connection is important, but what may be less obvious is the connection of a particularly bright and rising star who has the ear of the CEO. You need this information to process faults properly.

There may be tactical ways of managing this information that fit in the existing structure of your knowledge base, such as treating these devices as members of the same class as critical servers.

Another example would be verifying connectivity to an important group of top executives, all of whom use laptops that are often connected to and disconnected from the network. Your knowledge base may have information about other devices on the same portion of the network, such as a printer, that could be monitored to verify connectivity to this portion of the network, thus acting as a proxy for the availability of these laptops.
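The proxy idea can be sketched as a mapping from a network segment to always-on devices that stand in for the laptops. Everything named here is hypothetical: the segment and printer names, the `PROXY_DEVICES` table, and the `is_reachable` callable, which stands for whatever probe (a ping or an SNMP poll, for example) your management system actually uses.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical knowledge-base entries: segment -> always-on devices on that segment.
PROXY_DEVICES: Dict[str, List[str]] = {
    "executive-floor": ["exec-printer-1"],
}

def segment_reachable(
    segment: str,
    is_reachable: Callable[[str], bool],
    kb: Dict[str, List[str]] = PROXY_DEVICES,
) -> Optional[bool]:
    """Infer segment connectivity from its proxy devices.

    Returns None when the knowledge base lists no proxy for the segment,
    because nothing can then be inferred either way.
    """
    proxies = kb.get(segment)
    if not proxies:
        return None
    # The segment counts as reachable if any proxy device answers the probe.
    return any(is_reachable(host) for host in proxies)

# Usage with a stand-in probe instead of a real network check.
responding = {"exec-printer-1"}
status = segment_reachable("executive-floor", lambda host: host in responding)
```

Because the printer stays connected while the laptops come and go, a failed probe of the printer is a meaningful signal, whereas a failed probe of a laptop is not.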