11.6 Route Flap Damping


Route damping may be a relatively new term in routing, but the problems caused by route flapping have been around for many years. Damping is a method of suppressing BGP updates for routes that are deemed unstable. The need for damping arose from the increase in the number and volatility of routes in the Internet. Unstable routes can have a profound effect on the interdomain routing table because changes must propagate throughout the entire Internet.

Route flapping results from instability in an upstream system that causes routes to be advertised and withdrawn excessively. Essentially, an UPDATE message is sent to provide NLRI for a prefix; then, a subsequent UPDATE message is sent withdrawing the same prefix. The repetition of this process over and over for the same prefix is called route flapping.

In many cases, if the oscillation of a flapping route is repetitive enough, it is considered good practice to withdraw the advertisement until the route has stabilized. Such a large volume of route volatility could cause an otherwise stable BGP session to go down owing to overloading, but this is a very extreme case.

When route flapping occurs, two significant problems can arise. The first is delayed protocol convergence; the second is the potential to perpetuate the flapping by sending similar UPDATE messages to other external peers, advertising and then withdrawing the same prefix. This can go on and on, causing significant instability in multiple routing domains.

Consider the following analogy. You are sitting at home and, all of a sudden, your electricity goes out. What should you do? You should call the electricity supplier of course. Two minutes after your call the electricity returns, but there is a technician on the way to your house. So you call the supply company again to tell them everything is fine. Five minutes after you get off the phone, the electricity goes out again, and again you call the supply company.

It is likely that you are not the only caller for this issue. The supply center has been overrun with calls because all of your neighbors have been reporting problems as well. Apparently, an industrial company had been overloading the circuit that feeds your home, causing the power outages. The unfortunate result was that a few hundred simultaneous calls were made to the supply center's help line. However, not all calls were able to make it through. Some were dropped, owing to the lack of answering capability. Route flapping is a lot like this, where a route fluctuation causes a flood of updates to be sent out if no damping measures are in effect. Updates can sometimes increase route instability if they are received more quickly than a router can handle.

Juniper Networks routers are designed to withstand great amounts of route instability because the Routing Engine is separated from the Packet Forwarding Engine (PFE). In addition, they have built-in route-damping capabilities to minimize routing instability. In general, we can classify Internet routes as falling into two categories:

  1. Stable: routes that do not flap very often, if at all

  2. Unstable: routes that are volatile and flap regularly

Damping is a process to subdue the problem of route flaps by providing a calculated method for the suppression of UPDATE messages. Damping is typically implemented in exterior border routers to keep the flapping routes from being injected into and withdrawn from the internal routing domain (IGP routing process). This type of implementation causes minimal impact to the underlying AS routing domain.

Route flap damping devises a mechanism whereby the unstable routes are penalized for their volatility. The goals of BGP route flap damping are defined and outlined in RFC 2439 [4] as follows:

  • To provide a mechanism capable of reducing router processing load caused by instability

  • To prevent sustained routing oscillations

  • To avoid sacrificing route convergence time for generally well-behaved routes

RFC 2439 outlines a number of parameters that should be used to implement route flap damping.

JUNOS implements damping in accordance with RFC 2439. The configurable parameters (default values listed in parentheses) are as follows:

  • half-life decay (15 minutes): the time used to compute the rate of decay of the figure of merit

  • max-suppress time (60 minutes): the maximum number of minutes that a route can remain suppressed, after which the prefix is reinstalled in the route table and the process starts over again

  • reuse threshold (750): the figure-of-merit value that must be reached through the decay process to reuse the prefix information (install in route table and propagate)

  • cutoff or suppress threshold (3,000): the figure-of-merit value that must be met or exceeded before suppression of the prefix will occur

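The example below shows a sample configuration. The damping statement under the neighbor enables damping with the default values, so if the defaults are to be used, there is no reason to add the policy-options configuration. The damping profile named test simply restates the default values; a named profile such as this takes effect only when a routing policy references it.

    protocols {
        bgp {
            group ebgp {
                type external;
                neighbor 10.0.23.1 {
                    damping;
                    peer-as 100;
                }
            }
        }
    }
    policy-options {
        damping test {
            half-life 15;
            reuse 750;
            suppress 3000;
            max-suppress 60;
        }
    }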

Each of these parameters will be discussed, along with the penalties that a route can incur for flapping, but to understand them, you must first be familiar with the concepts of figure of merit and exponential decay. A figure of merit is a numeric measure of a route's recent instability, and it is usually pictured as a graph of that value over time. The figure of merit that is used for route flap damping follows an exponential half-life decay. Fortunately, this is easier to understand than it would seem!

11.6.1 Half-life Decay

The half-life is the amount of time that it takes for half of the particles in a sample to decay. A well-known half-life used today is that of the Carbon-14 isotope. Scientists are able to date an object based on measurements of the amount of Carbon-14 present in it. This carbon dating method was developed by a group of scientists led by the late Willard F. Libby of the University of Chicago, who received the Nobel Prize in Chemistry in 1960 for this discovery. The half-life of Carbon-14 is 5,568 years. This means that if you take a sample today of an object that is 5,568 years old, half the original amount of Carbon-14 will remain. After 11,136 years (2 x 5,568), one quarter of the original amount will remain because half of the remainder after 5,568 years will have decayed. After 16,704 years there will be one-eighth of the original amount of Carbon-14 left, and so forth.

So, how does all this apply to route flap damping? Exponential decay is used as a method of reducing the figure of merit for a route because the rate of decay slows over the course of time. This means that a penalized route keeps a nonzero figure of merit for a period of observation, and a suppressed route stays suppressed until its figure of merit has decayed to a predefined reuse threshold.
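The decay described here can be written down directly. The following is a minimal sketch rather than the JUNOS implementation: the function name decayed_merit is our own, and it assumes continuous decay, whereas a router may well apply the decay in discrete time steps.

```python
def decayed_merit(merit: float, elapsed_minutes: float,
                  half_life_minutes: float = 15.0) -> float:
    """Return the figure of merit after exponential half-life decay.

    Hypothetical helper for illustration; assumes continuous decay.
    """
    if half_life_minutes <= 0:
        raise ValueError("half-life must be positive")
    # Every full half-life that elapses halves the remaining value.
    return merit * 0.5 ** (elapsed_minutes / half_life_minutes)


if __name__ == "__main__":
    for minutes in (0, 15, 30, 45):
        print(minutes, decayed_merit(1000, minutes))
```

With the default half-life of 15 minutes, a figure of merit of 1,000 falls to 500 after 15 minutes and to 250 after 30 minutes, which is the same pattern as the Carbon-14 example.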

All routes begin with a figure-of-merit value of 0. On a two-dimensional graph, the figure of merit is plotted on the vertical axis and time is plotted on the horizontal axis. If a route fluctuates, then it is penalized as follows:

  • Route withdrawn: penalty of 1,000

  • Route readvertised: penalty of 1,000

  • BGP attribute change: penalty of 500

The default values for route flap damping are as follows:

  • A decay half-life of 15 minutes: the amount of time that it takes the figure-of-merit value to decrease by half; configurable for a range of 1 to 45 minutes

  • A max-suppress time for a route of 60 minutes: the maximum amount of time for which a route can be suppressed; configurable for a range of 1 to 720 minutes

  • A cut-off threshold for a route of 3,000: the figure-of-merit value at which a route is withdrawn from service and deemed unusable until its value decays to the reuse threshold; configurable for a range of 1 to 20,000

  • A reuse threshold for a route of 750: the figure-of-merit value at which a route is put back into service; configurable for a range of 1 to 20,000 and should be lower than the cut-off threshold
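The defaults and ranges in this list can be collected into a small data structure. This is only a sketch: the class name DampingParams and its field names are hypothetical, and it treats the advice that the reuse threshold should be lower than the cut-off threshold as a hard check, which is an assumption on our part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DampingParams:
    """Damping parameters with the defaults and ranges listed above."""
    half_life: int = 15      # minutes, 1 to 45
    max_suppress: int = 60   # minutes, 1 to 720
    suppress: int = 3000     # cut-off threshold, 1 to 20,000
    reuse: int = 750         # reuse threshold, 1 to 20,000

    def __post_init__(self) -> None:
        if not 1 <= self.half_life <= 45:
            raise ValueError("half-life must be 1 to 45 minutes")
        if not 1 <= self.max_suppress <= 720:
            raise ValueError("max-suppress must be 1 to 720 minutes")
        if not 1 <= self.suppress <= 20000:
            raise ValueError("cut-off threshold must be 1 to 20,000")
        if not 1 <= self.reuse <= 20000:
            raise ValueError("reuse threshold must be 1 to 20,000")
        # Assumption: enforce "should be lower" as a strict requirement.
        if self.reuse >= self.suppress:
            raise ValueError("reuse must be lower than the cut-off threshold")


if __name__ == "__main__":
    print(DampingParams())
```

Calling DampingParams() with no arguments yields the default profile; a combination such as a reuse threshold equal to the cut-off threshold is rejected.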

Damping works on a figure of merit. Table 11-8 lists the figures of merit JUNOS applies based upon the associated event.

Table 11-8. Figures of Merit
Prefix Action             Figure-of-Merit Points
withdrawn                 +1,000
readvertised              +1,000
path attributes change    +500
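The penalties in Table 11-8 amount to a simple lookup. The sketch below adds up the penalty for a sequence of events that arrive back to back, so that decay between events can be ignored; the event names and the function total_penalty are our own labels, not JUNOS identifiers.

```python
# Penalty points per event, taken from Table 11-8.
PENALTIES = {
    "withdrawn": 1000,
    "readvertised": 1000,
    "attribute_change": 500,
}


def total_penalty(events):
    """Sum the penalties for events assumed to occur with no decay between them."""
    return sum(PENALTIES[event] for event in events)


if __name__ == "__main__":
    print(total_penalty(["withdrawn", "readvertised", "withdrawn"]))
```

A withdrawal followed immediately by a readvertisement and a second withdrawal accumulates 3,000 points, which is enough to meet the default cut-off threshold.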

Figure 11-9 illustrates an example of a figure of merit using default values. The example has a given route that begins with a figure-of-merit value of 0. When the route flaps for the first time, it is given a penalty of 1,000. The figure of merit immediately begins to decay based on a half-life of 15 minutes. After a few minutes, the route flaps once more, so it is penalized again by another 1,000 points. Since the figure of merit has decayed, this leaves the value at around the 1,800 mark. The route flaps a third time, but again after a period of decay. This leaves the value at around the 1,900 mark. After another period of decay, the route flaps a fourth time, taking us up to 2,500 on the figure of merit. The route still remains in the table because it has not met the cut-off threshold. The figure of merit then decays slightly, but the fifth flap results in the route being withdrawn from service after the figure of merit reaches 3,100. The route is kept out of service until the figure of merit decays enough to fall under the reuse threshold value of 750; however, the hold time will not exceed a maximum of 60 minutes. In this example, the decay from 3,100 to 750 takes just over two half-lives, or roughly 30 minutes.

Figure 11-9. Figure of Merit

Note

This example was fairly lenient because the route flapped five times before it was suppressed. A lower cut-off threshold would have caused the route to be suppressed much sooner.
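The walk-through of Figure 11-9 can be approximated with a small simulation. This is a sketch under stated assumptions: the class DampedRoute is hypothetical, the flap times (0, 5, 20, 25, and 29 minutes) are our own guesses chosen to land near the values quoted for the figure, each flap is charged a flat 1,000 points, and the max-suppress limit is not modeled.

```python
from dataclasses import dataclass

# Assumed flap times in minutes; the text does not give exact times.
FLAP_TIMES = (0, 5, 20, 25, 29)


@dataclass
class DampedRoute:
    half_life: float = 15.0    # minutes
    suppress: float = 3000.0   # cut-off threshold
    reuse: float = 750.0       # reuse threshold
    merit: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def _decay_to(self, now: float) -> None:
        # Apply half-life decay for the time elapsed since the last update.
        self.merit *= 0.5 ** ((now - self.last_update) / self.half_life)
        self.last_update = now
        if self.suppressed and self.merit <= self.reuse:
            self.suppressed = False

    def flap(self, now: float, penalty: float = 1000.0) -> float:
        """Decay to the current time, add the penalty, and check the cut-off."""
        self._decay_to(now)
        self.merit += penalty
        if self.merit >= self.suppress:
            self.suppressed = True
        return self.merit

    def poll(self, now: float) -> bool:
        """Decay to the current time and report whether the route is suppressed."""
        self._decay_to(now)
        return self.suppressed


def run_example():
    route = DampedRoute()
    history = []
    for t in FLAP_TIMES:
        merit = route.flap(t)
        history.append((t, merit, route.suppressed))
    return history


if __name__ == "__main__":
    for t, merit, suppressed in run_example():
        print(f"t={t:>2} min  merit={merit:7.1f}  suppressed={suppressed}")
```

With these assumed times the route survives the first four flaps and is suppressed on the fifth. Passing a lower suppress value to DampedRoute makes the suppression happen earlier, which is the point made in the note above.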


Next, we have a simple example of a route being advertised, withdrawn, and readvertised. We will follow the JUNOS method of damping. Damping is enabled, as was shown in the previous configuration example. Prefix 172.16.0.0/20 is advertised to our local system. Our local system receives the prefix and installs it into the routing and forwarding tables.

The remote system withdraws the route, causing our figure of merit to increase by 1,000, per the event values listed in Table 11-8. Because we are using default values, the half-life is set to 15 minutes. Because the figure of merit has not met or exceeded our default cut-off threshold of 3,000, the prefix will not be suppressed. However, the decay process will start because there is a nonzero figure of merit. Based upon the half-life calculation, the figure of merit will decrease to 500 after 15 minutes, then to 250 after another 15 minutes, and so on.

For this example, let's assume that immediately after the prefix is withdrawn, it is readvertised and withdrawn again. This adds 1,000 to the figure of merit for the readvertisement and another 1,000 for the second withdrawal, bringing our total figure of merit to 3,000. Because this meets the cut-off threshold, the prefix is suppressed. Looking at Table 11-9, we can see how the decay process decreases the figure of merit every 15 minutes. Given our reuse threshold of 750, we see that after 30 minutes of suppression, we can reuse the prefix information.

Table 11-9. Figure-of-Merit Decay
Figure of Merit    Decay Value    Minutes
3,000              1,500          15
1,500              750            30
750                375            45
375                187.5          60
187.5              93.75          75
93.75              46.875         90
46.875             23.4375        105
23.4375            11.71875       120
11.71875           5.859375       135
5.859375           2.9296875      150
2.9296875          1.46484375     165
1.46484375         0.732421875    180
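The rows of Table 11-9 follow mechanically from the half-life, as the sketch below shows. The function names are our own, and minutes_until_reuse assumes continuous decay and ignores the max-suppress limit.

```python
import math


def decay_table(start=3000.0, half_life=15, rows=12):
    """Return (figure of merit, decayed value, elapsed minutes) per half-life."""
    table = []
    merit = start
    for i in range(1, rows + 1):
        decayed = merit / 2          # one half-life halves the value
        table.append((merit, decayed, i * half_life))
        merit = decayed
    return table


def minutes_until_reuse(merit, reuse=750.0, half_life=15.0):
    """Minutes of decay needed for merit to reach the reuse threshold."""
    if merit <= reuse:
        return 0.0
    return half_life * math.log2(merit / reuse)


if __name__ == "__main__":
    for merit, decayed, minutes in decay_table():
        print(merit, decayed, minutes)
    print(minutes_until_reuse(3000.0))
```

Starting from 3,000, two half-lives bring the figure of merit to the reuse threshold of 750, matching the 30 minutes of suppression read from the table.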

If damping is configured, we can view the parameters by issuing the show policy damping command. We can also use the show route detail command to view damping information on a given prefix.

11.6.2 Damping Policies

Damping policies are normally implemented using route filters. Different damping parameters can be applied based on prefix length. A well-known publication entitled "RIPE Routing-WG Recommendations for Coordinated Route-Flap Damping Parameters," better known as RIPE-229 [17], provides excellent watermarks on which to base BGP route-damping parameters. The following example has been subdivided into two sections to improve readability and reduce complexity:

    protocols {
        /* Don't enable damping globally - we want it to just apply to our eBGP peers. */
        bgp {
            group your-flappy-peername {
                /* turn on damping for this external peergroup */
                damping;
                type external;
                /* process the damping policy "graded-flap-damping" on the input side */
                import graded-flap-damping;
                peer-as 65500;
                neighbor 192.168.1.1;
            }
        }
    }

In general, damping should only be applied to external BGP peers. Routes that are advertised by internal peers should have already been damped when they entered the network from an external peer. The graded-flap-damping import policy is applied to the external BGP peer group, which sets the appropriate damping parameters. Among those parameters is a list of special networks, such as the DNS netblocks, which should never be damped, owing to their high-level importance. These networks are defined in the golden-networks prefix list, used to group all special networks together.

    policy-options {
        /* Define Golden Networks here. Keep these current */
        prefix-list golden-networks {
            198.41.0.0/24;
            128.9.0.0/24;
            192.33.4.0/24;
            128.8.0.0/16;
            192.203.230.0/24;
            192.5.4.0/23;
            192.112.36.0/24;
            128.63.0.0/16;
            192.36.148.0/24;
            193.0.14.0/24;
            198.32.64.0/24;
            202.12.27.0/24;
        }
        /* Define a damping policy. */
        policy-statement graded-flap-damping {
            term exclude {
                from {
                    prefix-list golden-networks;
                }
                then {
                    damping set-none;
                    next policy;
                }
            }
            term damping {
                from {
                    /* Lower penalty for prefixes of length /21 and shorter */
                    route-filter 0.0.0.0/0 upto /21 damping set-normal;
                    /* Medium penalty for prefixes of length /22 to /23 */
                    route-filter 0.0.0.0/0 upto /23 damping set-medium;
                    /* Higher penalty for prefixes of length /24 and longer */
                    route-filter 0.0.0.0/0 orlonger damping set-high;
                }
                then {
                    next policy;
                }
            }
        }
        damping set-high {
            half-life 30;
            reuse 1640;
            suppress 6000;
            max-suppress 60;
        }
        damping set-medium {
            half-life 15;
            reuse 1500;
            suppress 6000;
            max-suppress 45;
        }
        damping set-normal {
            half-life 10;
            reuse 3000;
            suppress 6000;
            max-suppress 30;
        }
        damping set-none {
            disable;
        }
    }

Four levels of damping exist based on the recommendations laid out by RIPE: set-high, set-medium, set-normal, and set-none. Each of these specifies a different group of damping parameters applied within the policy. Those networks that fall in the golden-networks category should never be damped and are excluded using the set-none parameters. Aside from the golden-networks, each set of parameters is chosen by the length of the prefix that it is applied to. Prefixes of length /21 and shorter are given the lightest restrictions, according to the set-normal parameters; prefixes of length /22 and /23 receive the set-medium parameters. The most restrictive damping parameters, set-high, are applied to those prefixes that are /24 and longer because fewer hosts are presumably affected. Larger networks are given higher priority and credibility over smaller route announcements.
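The selection logic of the graded-flap-damping policy can be mimicked in a few lines. This is an illustrative sketch, not a model of the JUNOS policy engine: the function select_damping_profile is hypothetical, the boundaries are read from the policy's route filters (/21 and shorter, /22 to /23, /24 and longer), and it assumes that the golden-networks prefix list matches only on an exact prefix.

```python
import ipaddress

# Golden networks copied from the prefix list in the configuration above.
GOLDEN_NETWORKS = {
    ipaddress.ip_network(p)
    for p in (
        "198.41.0.0/24", "128.9.0.0/24", "192.33.4.0/24", "128.8.0.0/16",
        "192.203.230.0/24", "192.5.4.0/23", "192.112.36.0/24",
        "128.63.0.0/16", "192.36.148.0/24", "193.0.14.0/24",
        "198.32.64.0/24", "202.12.27.0/24",
    )
}


def select_damping_profile(prefix: str) -> str:
    """Return the name of the damping parameter set for a received prefix."""
    net = ipaddress.ip_network(prefix)
    # Assumption: the prefix list is treated as an exact-match test.
    if net in GOLDEN_NETWORKS:
        return "set-none"
    if net.prefixlen <= 21:
        return "set-normal"
    if net.prefixlen <= 23:
        return "set-medium"
    return "set-high"


if __name__ == "__main__":
    for p in ("198.41.0.0/24", "10.0.0.0/8", "10.1.0.0/22", "10.1.1.0/24"):
        print(p, select_damping_profile(p))
```

The prefixes in the usage loop are arbitrary test values; only the golden networks come from the configuration.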

The half-life for the set-normal parameters is set to 10 minutes, so if the route is suppressed, the figure of merit will decay quickly, with suppression not to exceed 30 minutes. With any of the three sets of damping parameters, it would take three flaps in quick succession to reach the suppress value of 6,000. Recall that one route withdrawal and one route readvertisement add up to 2,000; if the flaps are spread out, decay between them means that more flaps are needed. The longest period of time that a route will be damped is one hour, and only if it is of length /24 or longer.
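The arithmetic in this paragraph can be checked with a short calculation. In the sketch below the dictionary PROFILES repeats the values from the configuration, while the helper names, the 2,000-point cost per flap (one withdrawal plus one readvertisement), and the assumption of continuous decay starting exactly at the suppress value are ours.

```python
import math

# Parameter sets copied from the damping configuration above.
PROFILES = {
    "set-normal": {"half_life": 10, "reuse": 3000, "suppress": 6000, "max_suppress": 30},
    "set-medium": {"half_life": 15, "reuse": 1500, "suppress": 6000, "max_suppress": 45},
    "set-high":   {"half_life": 30, "reuse": 1640, "suppress": 6000, "max_suppress": 60},
}


def flaps_to_suppress(suppress, penalty_per_flap=2000):
    """Flaps needed to reach the suppress value, ignoring decay between flaps."""
    return math.ceil(suppress / penalty_per_flap)


def suppression_minutes(profile, merit=None):
    """Minutes until reuse, starting from merit and capped at max-suppress."""
    start = profile["suppress"] if merit is None else merit
    natural = profile["half_life"] * math.log2(start / profile["reuse"])
    return min(max(natural, 0.0), profile["max_suppress"])


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(name, flaps_to_suppress(profile["suppress"]),
              round(suppression_minutes(profile), 1))
```

Under these assumptions a route that has only just reached 6,000 is released after about 10 minutes with set-normal, 30 minutes with set-medium, and roughly 56 minutes with set-high; a higher starting figure of merit lengthens the wait up to the max-suppress limit.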

All of the configured damping parameters, along with computed threshold values, can be viewed by issuing the show policy damping command at the CLI prompt while in operational mode:

    user@Chicago> show policy damping
    Default damping information:
      Halflife: 15 minutes
      Reuse merit: 750 Suppress/cutoff merit: 3000
      Maximum suppress time: 60 minutes
      Computed values:
        Merit ceiling: 12110
        Maximum decay: 6193
    Damping information for "set-none":
      Damping disabled
    Damping information for "set-normal":
      Halflife: 10 minutes
      Reuse merit: 3000 Suppress/cutoff merit: 6000
      Maximum suppress time: 30 minutes
      Computed values:
        Merit ceiling: 24090
        Maximum decay: 12453
    Damping information for "set-medium":
      Halflife: 15 minutes
      Reuse merit: 1500 Suppress/cutoff merit: 6000
      Maximum suppress time: 45 minutes
      Computed values:
        Merit ceiling: 12049
        Maximum decay: 12449
    Damping information for "set-high":
      Halflife: 30 minutes
      Reuse merit: 1640 Suppress/cutoff merit: 6000
      Maximum suppress time: 60 minutes
      Computed values:
        Merit ceiling: 6577
        Maximum decay: 24933
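The merit ceiling in this output is the cap that keeps a route from being suppressed for longer than the maximum suppress time: a figure of merit at the ceiling decays to the reuse threshold in exactly that time. The sketch below computes the ceiling from that relationship. It is an approximation only; the function name is hypothetical, and the values JUNOS displays (12110, 24090, 12049, and 6577) are slightly higher than this idealized formula gives, presumably because of how the router discretizes the calculation.

```python
def approx_merit_ceiling(reuse: float, max_suppress: float, half_life: float) -> float:
    """Idealized ceiling: the value that decays to reuse in max_suppress minutes."""
    return reuse * 2 ** (max_suppress / half_life)


if __name__ == "__main__":
    # (reuse, max-suppress, half-life) for each parameter set shown above.
    for name, args in {
        "default": (750, 60, 15),
        "set-normal": (3000, 30, 10),
        "set-medium": (1500, 45, 15),
        "set-high": (1640, 60, 30),
    }.items():
        print(name, approx_merit_ceiling(*args))
```

For the default parameters the idealized value is 750 multiplied by two to the power of four half-lives, which sits just under the displayed ceiling of 12110.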

The next section shows how different levels of damping can be applied and configured through the use of AS path regular expressions and BGP communities.



Juniper Networks Reference Guide. JUNOS Routing, Configuration, and Architecture
ISBN: 0201775921
EAN: 2147483647
Year: 2002
Pages: 176
