Defining the Knowledge Base Structure
In the process of baselining your network, you have started to accumulate information for your knowledge base. As you expand the types and extent of network management you implement, you can develop a more
After you select the tool(s) you plan on using to implement your knowledge base (see Chapter 9, "Selecting the Tools"), you will want to
Understanding Data Sources
This section breaks out each network management task you'll want to implement and discusses the knowledge you'll want to have in your knowledge base to support these tasks. The order in which these
Network Inventory
The first task you need to do to manage your network is to perform a network audit. Out of this audit, you will obtain the information required to create an inventory of your network. Performing this audit is covered in Chapter 1, "Conducting a Network Audit." The information collected should include the following:
If the device is connected to a switch, you may want to include information about this connection, as
You may want to include the following information about these devices in your knowledge base. This information may be useful in automating your performance and fault management functions:
To keep track of your network's layer 2 connectivity, you'll need to add the following information to your knowledge base:
You may also want to track your layer 3 connectivity. You'll want to add the following information to your knowledge base:
Policy-based Network Management
The knowledge required to support policy-based network management consists of the following:
Policy-based network management is covered in Chapter 2.
You may want to implement a two-
You would then add to your knowledge base the commands required to change the ToS on each device that this rule might be applied to.
A key part of policy-based network management is verifying that the policies are being met. The information collected during the baselining of your network will provide the base values to compare to the current state of your network and allow you to determine your network's rate of compliance with those policies. This information will be stored in the performance monitoring section of your knowledge base, which is covered
Performance Measurement and Reporting
Performance measurement and reporting entails several types of tasks and, therefore, several types of information:
Performance measurement and reporting is covered in detail in Chapter 4, "Performance Measurement and Reporting." We suggest that you implement availability monitoring first. Your knowledge base should contain information to support this, and if kept flexible, it will support additional performance measurements as well. Keep this in mind when designing your knowledge base.
You'll want to add which devices are
Let's look at what knowledge is required in your knowledge base to support each component of performance measurement and reporting.
Availability Monitoring
The information your knowledge base will require to support availability monitoring includes the following:
Portions of your network may be "interesting" only during certain times of the day. For example, an office environment that is staffed from 8 a.m. to 5 p.m. may be interesting during these hours and not
Response Time Monitoring
To implement response time measurements, you need knowledge of your network at the application level. You need to know where the users of the applications you want to instrument are located in your network and where the servers for those same applications are located. Just as you did for availability monitoring, you need to define what your expectations are for response time. You should have information about the response time performance of your network from the baseline. This information should be in your knowledge base and can be used to determine significant variances from this baseline. If you have SLAs or policies, these can be translated into rules that specify the response times expected in your network and the times at which those expectations apply. To support response time monitoring, your knowledge base will need to include the following information:
You may also want to include the following:
Accuracy Monitoring
Evaluating the accuracy with which data is transmitted in your network requires much the same information you've
You probably want to monitor the same interfaces for accuracy that you monitor for availability. Only a little more information will be required to support accuracy monitoring, including the following:
Utilization Monitoring
For utilization, you may be able to use the same list of interfaces you use for availability and accuracy monitoring because most network managers don't monitor utilization on links going to individual PCs. The information you'll need to add to your knowledge base for utilization monitoring includes the following:
Performance Reporting
Many network managers and network management
You'll want to add the following to your knowledge base:
Configuring Events
Your next step in implementing network management is configuring events (this is covered in detail in Chapter 5). You should have collected the data you need to add during baselining. This information will build and expand upon the data you put in your knowledge base to support performance measurements. The specific information required includes which objects are interesting to monitor for specific interfaces or groups of interfaces and what thresholds seem appropriate for these interfaces. Note that you will want to include both rising and falling thresholds for each object so that you can implement hysteresis and reduce the volume of events received. You may consider implementing thresholds based on network technology. For example, acceptable error thresholds for LANs may be significantly lower than for WAN links. The information to add to your knowledge base includes the following:
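As an illustration of why both a rising and a falling threshold are stored, the following minimal Python sketch shows one way hysteresis can suppress repeated events when a value hovers near its limit. The class name, the field names, and the sample error rates are assumptions made for this example; they do not come from any particular management tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HysteresisThreshold:
    """Rising/falling threshold pair for one monitored object (hypothetical names)."""
    rising: float       # an event fires when the value reaches or exceeds this level
    falling: float      # the alarm re-arms only once the value drops to this level or below
    armed: bool = True  # True while a rising event is still allowed to fire

    def update(self, value: float) -> Optional[str]:
        """Return "rising" or "falling" when an event should be generated, else None."""
        if self.armed and value >= self.rising:
            self.armed = False
            return "rising"
        if not self.armed and value <= self.falling:
            self.armed = True
            return "falling"
        return None


# Hypothetical error-rate samples (percent) that hover around the rising threshold.
samples = [0.5, 1.2, 0.9, 1.1, 0.95, 1.3, 0.2, 0.4]
threshold = HysteresisThreshold(rising=1.0, falling=0.3)

events = []
for value in samples:
    event = threshold.update(value)
    if event is not None:
        events.append((value, event))

print(events)
```

With a single threshold at 1.0, these samples would cross the limit three times. With the pair of thresholds, only one rising event and one falling event are generated.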
Prioritizing Faults
The information needed to process events and faults is an extension of the information you have already collected, as outlined thus far. The difference is the focus. Although you need to know what ports are interesting in order to start availability management, you need to supplement this information with prioritization information. That is, for each port that you might receive an event about, you need some information about whether the port is interesting (already in the knowledge base) and what priority of fault an
There are several ways to determine the priority of a fault. You could keep it simple and assign a priority to each port or
Another use for topology information is event correlation. One issue with availability monitoring is that it is usually done from one place in the network. This can give a
If topology information about the network is available, the fault management system can use event correlation techniques to determine that the directly connected link is down and, therefore, is the place to start determining the actual source of the fault. Placing a high priority on a fault of this type is probably desirable because there may be other faults that you cannot detect during the period in which your network management application cannot contact the network.
The priority of a fault can vary, depending on influences outside of the network itself. These outside influences include the following:
If you defined an SLA, you may have specified criteria that you need to take into consideration, such as an agreement that a certain category of fault will be fixed within a certain period of time. As that time approaches, the priority of the fault may need to be increased to ensure that proper attention is given to it, considering your SLA.
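The SLA-driven escalation described above could be sketched roughly as follows. This is only an illustration: the function name, the convention that priority 1 is the most urgent, and the 50 percent and 80 percent escalation points are all assumptions chosen for the example, not values taken from any SLA.

```python
from datetime import datetime, timedelta


def escalated_priority(base_priority: int, opened_at: datetime,
                       sla_window: timedelta, now: datetime) -> int:
    """Raise a fault's priority as its SLA deadline approaches.

    Priority 1 is the most urgent. The escalation points below are
    illustrative assumptions and would normally come from the knowledge base.
    """
    # Dividing one timedelta by another yields a plain float.
    elapsed_fraction = (now - opened_at) / sla_window
    if elapsed_fraction >= 0.8:
        boost = 2
    elif elapsed_fraction >= 0.5:
        boost = 1
    else:
        boost = 0
    # Never escalate past the most urgent level.
    return max(1, base_priority - boost)


opened = datetime(2024, 1, 8, 9, 0)   # hypothetical fault opened at 9:00
window = timedelta(hours=4)           # hypothetical SLA: fix within four hours

early = escalated_priority(3, opened, window, opened + timedelta(hours=1))
halfway = escalated_priority(3, opened, window, opened + timedelta(hours=2))
late = escalated_priority(3, opened, window, opened + timedelta(hours=3, minutes=30))
```

Storing the SLA window and the escalation points in the knowledge base, rather than in code, would let you adjust them per fault category without changing the fault management logic.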
Other considerations that may influence the priority of a given fault include who
Another consideration that could modify the priority of a fault is the time of day it occurs. The example from the availability monitoring section, in which an office network is very important during business hours but much less important during other hours, is one example of time modifying the priority of a fault. Another example is a trading floor, in which it is desirable to have the network up at all times, but it is critical to have the network up during trading hours.
You could also determine the priority of a fault by periodically examining the traffic statistics on the link affected by a fault, determining how much traffic and what type of traffic will be affected, and storing this information in your knowledge base. This requires knowledge about traffic flow on your network.
Another way of evaluating fault priority is by looking at the financial consequences of a fault. In this case, you need information about the financial contribution that each portion of your network makes to your company's bottom line.
We recommend that you start out with a simple priority scheme and enhance it where and when required. If you start with a simple yet flexible scheme, you'll be able to add capabilities as you expand your network management.
The items you'll want to add to your knowledge base to support prioritizing faults include the following:
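As a rough sketch of a simple yet flexible starting point, the Python example below assigns each port a default priority plus an optional higher priority during a critical time window, as in the office and trading floor examples. The record layout, the field names, and the device and port identifiers are hypothetical, and the window check assumes the critical period does not cross midnight.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class PortPriority:
    """Knowledge base entry for one port (all field names are hypothetical)."""
    device: str
    port: str
    default_priority: int   # priority outside the critical window; 1 = most urgent
    critical_priority: int  # priority inside the critical window
    critical_start: time    # start of the critical window (inclusive)
    critical_end: time      # end of the critical window (exclusive)

    def priority_at(self, when: time) -> int:
        """Return the fault priority for this port at the given time of day."""
        # Assumes critical_start < critical_end, i.e. the window stays within one day.
        in_window = self.critical_start <= when < self.critical_end
        return self.critical_priority if in_window else self.default_priority


# Hypothetical office uplink that matters most from 8 a.m. to 5 p.m.
office_uplink = PortPriority("office-sw1", "Gi0/1", default_priority=4,
                             critical_priority=2,
                             critical_start=time(8, 0), critical_end=time(17, 0))

daytime = office_uplink.priority_at(time(10, 30))
overnight = office_uplink.priority_at(time(22, 0))
```

Further inputs, such as traffic volume or financial impact, could later be added as extra fields on the same record without changing how the rest of the fault management system asks for a priority.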