Network Baselining


To determine whether a network can deliver a particular policy, you should measure the network's current performance. This process is called baselining, whereas the process of interpreting the data is called baseline analysis. Baselining allows you to discover the true performance and operation of the network in terms of the policies that you've defined. It is an attempt to determine what is "normal" so that you know when an event is abnormal.

After identifying data of interest for a policy, baselining allows you to take a snapshot of the current state of those variables throughout the network. Baseline analysis leads to an understanding of what service levels can actually be achieved.

How does the network perform day to day? Where are the under- and over-utilized areas? Where are you seeing the most errors? What thresholds should you set for the devices you plan to monitor? Can the network deliver the identified policies? Conducting a baseline analysis of your network also allows you to measure and answer these types of questions.

The purpose of conducting a network baseline is to measure the performance and availability of critical network devices and links, and compare them over time. The baseline allows a network manager to determine the difference between abnormal behavior and "business as usual." It also provides insight into whether the current network design can deliver required policies and SLAs.

NOTE

There is a plethora of tools to choose from for your baseline analysis. Chapter 9, "Selecting the Tools," identifies the criteria to use for selecting tools and discusses whether to build a tool yourself or buy it.

Generally, the tool that you plan to use for your day-to-day performance monitoring will suffice as a baseline tool. Be sure that you can segregate the baseline data from the day-to-day monitored data and that you have the ability to store the baseline data over long periods of time.


The collected data will reveal the true nature of congestion or potential congestion in the network. It may also reveal areas in the network that are underutilized. Analysis after an initial baseline tends to reveal hidden problems and quite often leads to network redesign efforts based on quality and capacity observations.

Another reason for conducting a baseline is that no two networks operate or behave the same way, so books such as this one cannot provide concrete threshold or "worry" values. Part II of this book provides suggested thresholds based on real-life experience, but the values are meant to be used as starting points rather than definitive settings. Your baseline will reveal appropriate thresholds for your network.

Analysis of the collected data will also serve to populate the knowledge base, as described in Chapter 3, "Developing the Network Knowledge Base."

Without a baseline, you can only guess the nature of network traffic and congestion. Network engineers glean this information, either when they put a network analyzer on the network for troubleshooting or when they happen to be logged into a router or switch and run a show command. This technique can be problematic because a single view at a single time does not capture the true performance of the network over the hours and days of its operation.

The following sections describe the general methodology for defining, collecting, and reporting a network performance baseline. The study entails the collection of key performance data from the ports and devices considered to be mission-critical. The baseline is a vital preliminary step in determining the network's personality. It also simplifies the creation of effective service level agreements and thresholds.

The following are steps to building a baseline:

  • Planning for the first baseline

  • Identifying devices and ports of interest

  • Determining the duration of the baseline

Planning the First Baseline

When conducting your first baseline, start out simply: select a few variables that represent your defined policies well. If you begin by collecting many data points, the amount of data can be overwhelming, and determining how to make sense of the collection can be difficult. Hence, start out simply and fine-tune along the way.

Generally, some good starting measures are interface and CPU utilization. Please see Part II for recommended variables and usage.
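As a minimal sketch of the interface utilization measure, the function below turns two readings of an interface octet counter into a percentage of link speed. It assumes SNMP-style counters (for example, ifInOctets) that wrap at 32 or 64 bits and at most one wrap between polls; the function name and parameters are illustrative and do not come from any particular tool.

```python
def interface_utilization(octets_start, octets_end, interval_s, speed_bps,
                          counter_bits=32):
    """Percent utilization of one direction of a link over a polling interval.

    octets_start / octets_end: counter readings at the start and end of the
    interval. speed_bps: link speed in bits per second.
    """
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval_s and speed_bps must be positive")
    modulus = 2 ** counter_bits
    # If the counter wrapped once, end < start; modular arithmetic
    # still recovers the number of octets transferred.
    delta_octets = (octets_end - octets_start) % modulus
    return delta_octets * 8 * 100.0 / (interval_s * speed_bps)


# 37,500,000 octets in 5 minutes on a 10-Mbps link
print(interface_utilization(0, 37_500_000, 300, 10_000_000))
```

On fast links a 32-bit counter can wrap more than once between polls, in which case a shorter polling interval or 64-bit counters would be needed for the result to be meaningful.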

Collect the data for a day or two before starting the actual baseline study to determine whether you are getting the right data from the right devices. After you collect a couple of days' worth of data, play with it. Graph the findings in different ways until you find something that makes sense to you. Slicing through the data in different ways can reveal interesting and sometimes surprising observations.

Pick the top couple of reports that have meaning and study them to determine whether there is more information you need in order to understand a particular pattern or trend. Then, fine-tune the data to be collected and begin the actual baseline study.

TIP

You should plan on conducting a baseline analysis of your network on a regular basis. Whether you conduct an annual analysis of your entire worldwide network or baseline different sections of the network on a rotating quarterly basis, you must conduct a baseline regularly in order to understand how the network grows and changes.

Gathering the information in a consistent manner and analyzing the data will keep you on top of your network and allow you to make informed design decisions as well as hasten your fault isolation. You'll also get better at it with practice.


Identifying Devices and Ports of Interest

As part of planning the baseline, you must identify the ports of interest. Ports of interest include those network device ports that connect to other network devices, servers, key users, and anything else considered critical to the operation. By narrowing the ports you plan to poll, your reports will be clearer and you will minimize network and device management load. The sections "Where Are Your Network Devices?", "Where Are Your Servers?", and "Where Are Your Key Users?" in Chapter 1 provide more detail on ways to select critical ports.

After the ports have been identified, you must ensure that processes are in place to either keep that connection from being changed or to inform you if the connection gets changed. Without this assurance, your reports will become inaccurate.

For example, a report may indicate that a backbone port on a particular device is performing fine, when in fact the device connected to that port is no longer a router but a user's PC. You've been monitoring the wrong port!

TIP

One method to track the ports of interest is to use the port description fields on devices to indicate what is connected. For instance, if backbone router A is connected to switch port 1/1 on the main campus, you should configure the port descriptions for each of the devices to describe the device connected at the other end.

The port description can then be used for adding clarity to the reports you will be creating from the baseline and performance monitoring.
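One way to act on port descriptions is to compare them against your own record of what should be connected. The sketch below assumes you already have two dictionaries: the neighbors you expect on each port of interest, and the descriptions currently read from the devices. How you collect the descriptions, and the port and device names shown, are hypothetical.

```python
def find_port_mismatches(expected, observed):
    """Return ports whose current description no longer names the expected neighbor.

    expected: {port_id: neighbor_name} from your list of ports of interest.
    observed: {port_id: description} as currently configured on the devices.
    """
    mismatches = []
    for port, neighbor in sorted(expected.items()):
        description = observed.get(port, "")
        # Case-insensitive substring match; a missing description also counts.
        if neighbor.lower() not in description.lower():
            mismatches.append((port, neighbor, description))
    return mismatches


expected = {"main-sw:1/1": "backbone-router-A", "main-sw:1/2": "file-server-3"}
observed = {"main-sw:1/1": "Link to Backbone-Router-A", "main-sw:1/2": "user PC"}
for port, neighbor, description in find_port_mismatches(expected, observed):
    print(f"{port}: expected {neighbor}, description is {description!r}")
```

This only catches changes that someone recorded in the description field, so it complements rather than replaces a change-control process.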


Determining the Duration of the Baseline

How long should the baseline collection last? It should run for as long as it takes to gather a "typical" picture of the network.

The collection needs to last at least seven days in order to capture any daily or weekly trends. Unless you are looking for specific long-term trends, the baseline needs to last no more than six weeks. Generally, a two-to-four-week baseline is adequate.

Do not perform the collection during times of extraordinary traffic patterns. For example, do not conduct the baseline over a holiday or during December if most of the company is on vacation.

When conducting the baseline, you should set the duration in seven-day increments. Weekly trends are just as important as daily or hourly trends.

For example, suppose that on Sundays at 2 a.m., the engineers in Building 7 run a massive backup and software refresh (which you didn't know about) on all 200 of their workstations. Because the backup server happens to be in Building 9, the push and backup saturate the corporate backbone. If you chose to run your baseline from Monday through Friday, you would have missed the saturation.
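A weekly pattern like this one becomes visible when samples are grouped by day of week and hour of day. The following sketch assumes each sample is a (timestamp, utilization percent) pair for a single link; the function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hour_of_week_profile(samples):
    """Average utilization per (weekday, hour) slot; weekday 0 is Monday."""
    buckets = defaultdict(list)
    for timestamp, utilization in samples:
        buckets[(timestamp.weekday(), timestamp.hour)].append(utilization)
    return {slot: mean(values) for slot, values in buckets.items()}


def busiest_slots(profile, count=3):
    """Return the `count` slots with the highest average utilization."""
    return sorted(profile.items(), key=lambda item: item[1], reverse=True)[:count]


samples = [
    (datetime(2024, 1, 7, 2, 0), 95),    # Sunday, 2 a.m.
    (datetime(2024, 1, 14, 2, 5), 97),   # the following Sunday, 2 a.m.
    (datetime(2024, 1, 8, 10, 0), 40),   # Monday, 10 a.m.
    (datetime(2024, 1, 9, 10, 0), 35),   # Tuesday, 10 a.m.
]
print(busiest_slots(hour_of_week_profile(samples), count=1))
```

A collection that ran only Monday through Friday would have no samples in the Sunday slots at all, which is why the profile needs whole weeks of data.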

However long you decide to conduct your initial baseline, plan to try different durations on subsequent baselines. This will allow you to discover the optimal collection period for your network.
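The duration guidance in this section (at least seven days, normally no more than six weeks, in seven-day increments) can be sketched as a small helper that rounds a requested duration to whole weeks. The function name is illustrative, and the six-week cap would not apply if you are deliberately studying a long-term trend.

```python
from datetime import date, timedelta

MIN_DAYS = 7    # at least one full week
MAX_DAYS = 42   # normally no more than six weeks


def baseline_window(start, requested_days):
    """Round the requested duration up to whole weeks, clamped to 1-6 weeks.

    Returns (start, end) where end is the first day after the collection.
    """
    weeks = -(-requested_days // 7)  # ceiling division
    days = min(max(weeks * 7, MIN_DAYS), MAX_DAYS)
    return start, start + timedelta(days=days)


print(baseline_window(date(2024, 1, 8), 10))
```

The calendar itself still has to be checked by hand so that the window avoids holidays and other periods of unusual traffic.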

Using the Baseline Data

This is where your efforts pay off. You have defined the policies that drive your efforts, you have identified those variables that measure the policies, and you have collected the data from critical devices and connections for a period of time. By baselining your network, you should be able to gain a clearer understanding of the true nature of the capacity and quality of service your network delivers. You must now analyze the snapshot.

You can use the data in the following ways to learn more about your network:

  • Identify undesired network behavior

  • Identify thresholds for fault and performance monitoring

  • Analyze long-term performance and capacity trends

  • Verify policies

Identifying Undesired Network Behavior

An immediate benefit of baselining your network is the objective identification of undesired network behavior. By generating reports that identify the most utilized lines, for example, you can objectively pinpoint those areas of the network that are either experiencing problems or are primed for failure.
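A "most utilized" report of this kind can be sketched in a few lines. The example assumes the baseline has been reduced to a dictionary of utilization samples (percent) per link; the link names are made up.

```python
from statistics import mean


def most_utilized(samples_by_link, count=5):
    """Rank links by average utilization, highest first.

    Each row is (link, average, peak); the peak is included for context.
    """
    summary = [
        (link, mean(values), max(values))
        for link, values in samples_by_link.items()
        if values  # skip links with no samples
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)[:count]


samples_by_link = {
    "core-1": [80, 90, 100],
    "edge-7": [10, 20, 30],
    "wan-2": [60, 70, 65],
}
for link, average, peak in most_utilized(samples_by_link, count=2):
    print(f"{link}: average {average}%, peak {peak}%")
```

Reversing the sort order gives the corresponding list of least utilized links.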

At the same time, you can identify under-utilized areas of the network. Where redundancy is involved, you may discover that traffic is routing almost entirely over one link of a redundant pair and not the other. This may be undesirable because load sharing can almost double the bandwidth available during normal operation.
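To check redundant links for this kind of imbalance, you can compare each link's share of the total traffic carried by the group. The sketch below assumes per-link octet totals over the baseline period; the 80 percent cutoff is an arbitrary illustration, not a recommended value.

```python
def traffic_shares(octets_by_link):
    """Fraction of the group's total traffic carried by each link."""
    total = sum(octets_by_link.values())
    if total == 0:
        return {link: 0.0 for link in octets_by_link}
    return {link: octets / total for link, octets in octets_by_link.items()}


def is_unbalanced(octets_by_link, max_share=0.8):
    """True if any single link carries more than max_share of the traffic."""
    shares = traffic_shares(octets_by_link)
    return any(share > max_share for share in shares.values())


pair = {"link-a": 900_000, "link-b": 100_000}
print(traffic_shares(pair), is_unbalanced(pair))
```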

The identification of undesired network behavior may lead to a network redesign or a change in network policy. Or you may simply find a device that is misconfigured.

Identifying Thresholds

Efficient fault management requires the setting of thresholds that reflect different levels of warning. Arriving at the appropriate thresholds for each of your network policies requires a baseline analysis.

Initially, most network managers consult their network device documentation or vendor technical support for a set of recommended or default thresholds. Generally, the answer they receive is "It depends: the thresholds vary according to your network."

Unfortunately, the nature of distributed networking forces the process of identifying or predicting a fault to be an art. Vendors who develop performance and fault tools can provide you with a set of defaults. More often than not, however, you must fine-tune the settings to meet your particular network activity.

The baseline analysis provides you with the data to study performance and fault patterns over a period of time. From the data, you can derive the appropriate thresholding as it applies to your network policies.
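One common way to derive starting thresholds from baseline data is to use percentiles of the observed values. The sketch below uses a nearest-rank percentile; the choice of the 95th and 99th percentiles for warning and critical levels is an assumption for illustration, and the results still need the fine-tuning described above.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_thresholds(values, warning_pct=95, critical_pct=99):
    """Starting warning and critical thresholds taken from baseline samples."""
    return {
        "warning": percentile(values, warning_pct),
        "critical": percentile(values, critical_pct),
    }


baseline_cpu = list(range(1, 101))  # stand-in for real baseline samples
print(suggest_thresholds(baseline_cpu))
```

A threshold set this way will, by construction, be exceeded by a small fraction of the baseline samples, so it suits warnings about unusual behavior more than hard limits.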

Predicting Long-Term Performance and Capacity Trends

As part of the planning cycle for your network, you can benefit from studying network growth over time. You can begin to understand how the network may continue to grow in the future, and more reliably provide concrete data when trying to obtain funding for network expansion.

As stated earlier, you should plan to conduct baseline studies on a regular basis, usually anywhere from annually to quarterly. By comparing the data from each baseline, you can isolate long-term growth trends.

These types of reports tend to be expressed in terms of total bandwidth or total capacity for each link, device, and perhaps for the network as a whole.

Network growth tends to occur in large bursts. Hopefully, you can anticipate new application growth by working with application owners and then grow the network accordingly.
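As a rough first look at growth across successive baselines, a straight line can be fitted to one summary figure per baseline, such as average utilization of a link. Because growth often arrives in bursts, this is only a sketch under the assumption of steady growth; the function names and the sample figures are illustrative.

```python
def linear_trend(points):
    """Least-squares slope and intercept for a list of (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two baselines")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all baselines have the same x value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(points, capacity):
    """Baseline periods after the last point until the fitted line reaches
    capacity, or None if the trend is flat or falling."""
    slope, intercept = linear_trend(points)
    if slope <= 0:
        return None
    last_x = points[-1][0]
    return max(0.0, (capacity - intercept) / slope - last_x)


# (baseline number, average utilization percent) from four quarterly baselines
quarterly = [(0, 40), (1, 45), (2, 50), (3, 55)]
print(linear_trend(quarterly), periods_until(quarterly, capacity=80))
```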

Verifying Policies

Determining whether a policy can be achieved in your network is a good use of a network baseline. By reporting the data in terms of the defined policies, you can determine whether the network is adhering to or violating a policy, and to what degree adherence or violation is occurring.

If the data reflects a policy violation, you must consider how to resolve the issue. First, how serious is the violation? How often does it occur and how long does it last? Is the source of the violation a misconfiguration? Either you must redefine the policy based on what the network currently delivers or you will need to redesign the network.
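The questions of how often a violation occurs and how long it lasts can be answered directly from evenly spaced samples. The sketch below assumes a policy expressed as an upper limit on a single variable and a fixed polling interval; the names are illustrative.

```python
def violation_episodes(samples, limit):
    """Group consecutive samples above the limit into (start_index, length)."""
    episodes = []
    start = None
    for index, value in enumerate(samples):
        if value > limit:
            if start is None:
                start = index  # a new episode begins
        elif start is not None:
            episodes.append((start, index - start))
            start = None
    if start is not None:  # episode still open at the end of the data
        episodes.append((start, len(samples) - start))
    return episodes


def violation_summary(samples, limit, interval_minutes):
    """How often the limit was exceeded and for how long."""
    episodes = violation_episodes(samples, limit)
    violating = sum(length for _, length in episodes)
    longest = max((length for _, length in episodes), default=0)
    return {
        "episodes": len(episodes),
        "percent_of_time": 100.0 * violating / len(samples) if samples else 0.0,
        "longest_minutes": longest * interval_minutes,
    }


utilization = [50, 85, 90, 60, 70, 95, 40, 30, 20, 10]
print(violation_summary(utilization, limit=80, interval_minutes=5))
```

A few short episodes and one long sustained episode can produce the same percentage of time in violation, which is why the count and the longest duration are reported separately.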

If the study reveals the need for a network redesign, data from a good baseline analysis can be used objectively to justify the need for new equipment or service purchases.



Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)
ISBN: 1578701805
Year: 2005
Pages: 200
