Layer 3 VPNs


Chapter 5 discussed the subtleties of RFC 2547bis. We now delve into recommendations for deploying this architecture in the campus network and MAN. The main motivation for an enterprise to adopt this architecture is scalability; all the techniques we have presented so far have scalability restrictions at different levels.

RFC 2547bis the MPLS Way

Although RFC 2547bis VPNs are scalable, they require sophisticated software and hardware that may not be readily available throughout the enterprise. Therefore, MPLS VPNs are likely to be adopted in large campus network deployments that actually resemble a MAN.

Campus Network / MAN Deployment

When you are deploying RFC 2547bis, the recommendation is to position the PEs at the edge of the routed domain. Thus, the first Layer 3 hop that traffic from a host encounters (usually the distribution switches) would ideally be a PE router. When using MPLS to deploy RFC 2547bis, pushing the PE role to the distribution switches implies that all devices in the routed domain must support label switching (a.k.a. MPLS), and all distribution switches must support PE functionality, which is basically the capability of supporting multiple VRFs, multiprotocol iBGP (MP-iBGP), and MPLS.

In this scenario, there are no CE routers because the PE routers (distribution switches in this case) are directly connected to the user subnets or VLANs. User VLANs are terminated at the distribution switches (PE), where the VLANs belonging to a group are mapped onto the VRF for the corresponding group, as described previously.
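
As a minimal sketch of this mapping (the VLAN number and addressing are hypothetical; the VRF name and RD/RT values match those used elsewhere in this chapter), terminating a user VLAN in a VRF at the distribution switch amounts to assigning the SVI to the VRF:

 ip vrf red-data
  rd 10:1031
  route-target export 10:103
  route-target import 10:103
 !
 ! Hypothetical user-facing SVI placed in the red-data VRF
 interface Vlan101
  ip vrf forwarding red-data
  ip address 10.1.101.1 255.255.255.0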

The deployment of RFC 2547bis requires a core IGP for the interconnection of the PEs. This IGP is the same as the IGP used in a nonvirtualized campus network or MAN and should be designed according to standard hierarchical best practices for campus network and MAN design. The core IGP serves a dual purpose: It allows the establishment of MP-iBGP sessions between the PEs, and it provides the global IP information for the establishment of label switched paths (LSPs) between PEs. When all traffic is assigned to VPNs, the core IGP should not carry information on user prefixes. However, the fact that the core IGP can carry user prefixes if necessary is a powerful migration tool that we exploit in a later section.

Label switching and a Label Distribution Protocol (LDP) must be enabled on all platforms. On the P routers, this must be enabled on all interfaces. On the PE routers, label switching is only to be enabled on the core-facing interfaces (those facing the P routers). Example 6-22 shows the commands to enable MPLS at the global and interface level.

Example 6-22. Enabling MPLS

 mpls label protocol ldp
 tag-switching tdp router-id Loopback0 force
 ...
 interface GigabitEthernet1/1
  description To P1 - intf G3/0/0
  ip address 125.1.100.50 255.255.255.252
  tag-switching ip
 !

RFC 2547bis also requires a full mesh of MP-iBGP neighbor relationships between all PEs. This full mesh is used to populate the VRFs with VPN routes. The reason for a full mesh is that routes learned from one iBGP peer are not readvertised to other iBGP peers; therefore, any prefix announced by a PE must be received over a direct iBGP session with that PE. The scalable alternative to a full mesh of BGP sessions is the use of route reflectors (RRs), which we discuss in more detail in the section "BGP Best Practices: Route Reflectors."

The general configuration for MP-iBGP on a PE follows. In this particular example, the PE peers with a pair of RRs (125.1.125.15 and 125.1.125.16), and there are two VRFs, each of which injects its directly connected subnets into its BGP advertisements through the redistribute connected command. Let's look at it step by step. As illustrated in Example 6-23, you must create the BGP process and the iBGP neighbor sessions with the RRs. This is no different from any regular BGP deployment, and there is no multiprotocol element to this step.

Example 6-23. Enabling BGP Neighbor Sessions

 router bgp 1
  no synchronization
  bgp log-neighbor-changes
  neighbor 125.1.125.15 remote-as 1
  neighbor 125.1.125.15 update-source Loopback0
  neighbor 125.1.125.16 remote-as 1
  neighbor 125.1.125.16 update-source Loopback0
  no auto-summary
 !

After the neighbor sessions have been created, these sessions must be activated within the VPNv4 address family and the extended community attribute must be enabled for each neighbor, as illustrated in Example 6-24. At this point, we are enabling the multiprotocol portion of BGP. As discussed in previous chapters, the extended community attribute will carry important VPN information such as the route targets (RTs). Simultaneously, the VPNv4 Network Layer Reachability Information (NLRI) updates will carry other important VPN information such as the route distinguishers (RDs).

Example 6-24. Enabling VPNv4 Updates and the Extended Community

 address-family vpnv4
  neighbor 125.1.125.15 activate
  neighbor 125.1.125.15 send-community extended
  neighbor 125.1.125.16 activate
  neighbor 125.1.125.16 send-community extended
 exit-address-family
 !

Finally, the BGP characteristics for each VRF must be configured. This is done under the VRF address family and for each VRF. Any BGP configuration and tailoring is to be done under the VRF address families. This can include route redistribution directives, addition and alteration of attributes, route-map policy enforcement, and so on. This is illustrated in Example 6-25, where we use the redistribute connected command to include the directly connected VLANs in the BGP updates. This, of course, assumes that VLAN interfaces have been previously assigned to the VRFs.

Example 6-25. Configuring BGP per Address Family

 address-family ipv4 vrf red-voice
  redistribute connected
  no auto-summary
  no synchronization
 exit-address-family
 !
 address-family ipv4 vrf red-data
  redistribute connected
  no auto-summary
  no synchronization
 exit-address-family
 !

Multi-VRF CE Deployments

Many enterprise platforms do not support MPLS PE functionality. However, it is necessary to extend the VPN separation achieved in the MPLS cloud to non-MPLS routed platforms located in the periphery of the MPLS domain.

Most Cisco enterprise platforms support VRF-lite functionality, which is also known as multi-VRF CE functionality. Multi-VRF CE refers to the capability of CE platforms to host multiple VRFs. By connecting the VRFs at a CE with those at the PEs of a network, it is possible to extend the separation achieved by an MPLS VPN to non-MPLS hops in the periphery of the MPLS cloud. An MPLS VPN network surrounded by multi-VRF CE devices is illustrated in Figure 6-12.

Figure 6-12. MPLS VPNs with Multi-VRF CE


This model can be used to extend VPNs onto non-MPLS platforms in the campus network and MAN, and it is especially useful in the WAN. We explore WANs in more detail in the next few chapters. To deploy the multi-VRF CEs, it is necessary to complete the following steps; a configuration sketch for the CE side follows the steps. We outline only the necessary steps here because the configuration detail has been discussed previously in the section "Layer 3 Hop to Hop."

Step 1.

Create VRFs on the non-MPLS platforms.

Step 2.

Create 802.1q trunks on the PE-CE links.

Step 3.

Associate each pair of VRFs to be interconnected with a VLAN on the 802.1q trunk by assigning the corresponding SVIs to the VRFs.

Step 4.

Configure dynamic routing between the PE and CE for each pair of VRFs.

Step 5.

Redistribute the PE-CE IGP routes in each VRF into the corresponding MP-iBGP address family at the PE.

Step 6.

Redistribute the MP-iBGP routes in each VRF into the corresponding PE-CE IGP instance.
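
The following sketch summarizes Steps 1 through 4 on the CE side. All names, VLAN numbers, and addresses are hypothetical, and OSPF is assumed as the PE-CE routing protocol:

 ! Step 1: create the VRFs (the RD is only locally significant on a CE)
 ip vrf red-data
  rd 10:1
 !
 ip vrf red-voice
  rd 10:2
 !
 ! Steps 2 and 3: one 802.1q subinterface (or SVI) per VRF on the trunk to the PE
 interface GigabitEthernet0/1.101
  encapsulation dot1Q 101
  ip vrf forwarding red-data
  ip address 125.1.102.2 255.255.255.252
 !
 interface GigabitEthernet0/1.102
  encapsulation dot1Q 102
  ip vrf forwarding red-voice
  ip address 125.1.102.6 255.255.255.252
 !
 ! Step 4: one routing process per VRF for PE-CE routing
 router ospf 101 vrf red-data
  network 125.1.102.0 0.0.0.3 area 0
 !
 router ospf 102 vrf red-voice
  network 125.1.102.4 0.0.0.3 area 0

Steps 5 and 6 take place at the PE, where the PE-CE routes and the MP-iBGP routes are mutually redistributed within each VRF.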

RFC 2547bis over L2TPv3

We highlighted the fact that for a network to support RFC 2547bis VPNs over MPLS, every Layer 3 hop in the network must be MPLS capable. More precisely, every PE and P router must be a label switched router (LSR; that is, MPLS enabled).

As discussed in Chapters 4 and 5, the reason for requiring MPLS is that the VPN traffic is tunneled between PEs to preserve its separation. These tunnels are dynamic and are known as label switched paths (LSPs).

It is not uncommon for an enterprise to have a core that contains non-MPLS devices. In this case, running RFC 2547bis over Layer 2 Tunnel Protocol Version 3 (L2TPv3) provides an alternative in which only IP forwarding capabilities are required in the core. L2TPv3 tunnels can be established dynamically between PEs and substitute for the role of the LSPs in an MPLS VPN network.

Another scenario in which a non-MPLS solution is of value is that in which an enterprise wants to deploy RFC 2547bis VPNs over a core composed of multiple autonomous systems. An MPLS-based VPN solution requires the deployment of a separate VPN domain for each autonomous system. This fragments the VPN network and requires sophisticated inter-autonomous system routing mechanisms to be able to join the VPNs in the different autonomous systems. Because L2TPv3 tunnels require only IP connectivity to be established, these can traverse multiple autonomous systems without any special consideration. Therefore, an L2TPv3-based solution is ideal for the deployment of RFC 2547bis VPNs over a multi-autonomous-system core.

Note

When you are using L2TPv3 to traverse several autonomous systems, the MP-BGP sessions used to carry VPNv4 updates are still internal BGP sessions. The multiple autonomous systems simply provide IP connectivity and do not influence the autonomous system characteristics of the MP-iBGP overlay. Thus, a single BGP autonomous system is overlaid onto a multi-autonomous-system IGP cloud.


It is important to highlight that this solution continues to use all the components of RFC 2547bis except for the LDP and the LSPs. Therefore, all the considerations and best practice recommendations for core IGP, MP-iBGP, resiliency, and scalability are valid for any implementation of RFC 2547bis and are therefore discussed in a separate section.

RFC 2547bis over GRE

The scenario in which GRE replaces the LSPs brings benefits similar to those of L2TPv3. There are differences inherent to the way the two tunneling technologies work, but these are not directly relevant to virtualization and are more of an implementation debate.

Similar to L2TPv3-based RFC 2547bis, the GRE option continues to leverage MP-iBGP and a core IGP, and therefore all the best practices that are discussed in the following sections are applicable to all implementations of RFC 2547bis, including MPLS, GRE, and L2TPv3-based solutions.
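
As a rough illustration of the forwarding-plane change (the addressing and interface numbers here are hypothetical, not taken from the chapter's lab), the path between two PEs can be carried over a point-to-point GRE tunnel simply by enabling label switching on the tunnel interface; LDP then runs over the tunnel, and the labeled VPN packets travel inside GRE across the non-MPLS IP core:

 interface Tunnel0
  description GRE tunnel to the remote PE across the IP-only core
  ip address 125.1.200.1 255.255.255.252
  tunnel source Loopback0
  tunnel destination 125.1.125.6
  ! Enable label switching over the tunnel so LDP and the labeled path run inside GRE
  tag-switching ip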

IGP Best Practices

As explained in Chapter 2, the core IGP must be deployed in a hierarchical manner; and, where possible, redundant paths must be exploited to provide equal-cost multipath (ECMP) routing. A design with equal-cost paths will largely simplify the overlaying BGP and MPLS design and allow the campus network and MAN to continue benefiting from the fast convergence and load-balancing characteristics it has traditionally enjoyed.

The guidelines for IGP deployment can be summarized as follows:

  • Provide a hierarchical topology onto which the routing protocol can be laid.

  • Provide a routing protocol hierarchy by creating areas and summarizing routes.

  • Create symmetric topologies leading to equal-cost path formation.

  • Enable ECMP routing to exploit the use of equal-cost paths for load balancing and improved failover.

  • Optimize routing protocol convergence.

If all this is done optimally, the benefits of load sharing, rapid reconvergence, and containment of failures are inherited by the overlaid VPN architecture. Therefore, providing rapid failover or load sharing in an enterprise VPN network might not require complex MPLS-based traffic engineering; the functionality may come directly from the design of the core IGP. The next few sections explore guidelines that help the VPN overlay benefit from the resilient characteristics of the core IGP.
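
As a small example of the ECMP guideline (assuming OSPF as the core IGP; the process number and path count are illustrative), allowing the IGP to install several equal-cost routes is a one-line setting:

 router ospf 1
  ! Install up to eight equal-cost routes per destination
  ! (classic IOS installs four by default)
  maximum-paths 8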

BGP Best Practices: Route Reflectors

As discussed, for each VPN to have a complete routing table, there must be a full mesh of iBGP sessions between all PEs. A full mesh quickly becomes hard to scale and requires the reconfiguration of all PEs every time a PE is added to or removed from the network. It is therefore recommended that route reflectors (RRs) be used and that all PEs peer with these RRs.

In general, it is recommended that the RRs be kept out of the data path. This limits the load on the RRs, protects them from attackers by limiting their reachability, and allows a choice of platforms optimized for memory-intensive tasks and control-plane functionality vs. platforms engineered to forward large amounts of traffic at high speed. It is also advisable that the RRs not be PE routers (for similar reasons).

RRs should be deployed in a resilient array to avoid creating a single point of failure in the network. Because this resiliency is based on TCP connections (or BGP sessions), it is possible to distribute the resilient RRs anywhere in the network to increase their availability.
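
As a minimal sketch of the RR side (assuming autonomous system 1 and a PE loopback of 125.1.125.5, consistent with the addressing used elsewhere in this chapter), each PE is simply configured as a route-reflector client within the VPNv4 address family:

 router bgp 1
  neighbor 125.1.125.5 remote-as 1
  neighbor 125.1.125.5 update-source Loopback0
  !
  address-family vpnv4
   neighbor 125.1.125.5 activate
   neighbor 125.1.125.5 send-community extended
   ! Reflect VPNv4 routes received from this PE to the other clients
   neighbor 125.1.125.5 route-reflector-client
  exit-address-family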

BGP Best Practices: Route Distinguishers and ECMP Routing

When we are discussing VRFs and BGP, one concept that is often confusing is the route distinguisher (RD): what its function is and how it differs from the route target (RT).

RDs are 64-bit values that locally identify a prefix as part of a VRF. An RD is prepended to each IPv4 prefix in the routing table to form a VPNv4 prefix. MP-iBGP updates carry these RDs as part of the VPNv4 NLRIs. However, the RD does not determine whether a received NLRI is included in a particular VRF; some other coloring is required to decide which VRF the received routes are to be placed into.

RTs have more of a global role and are used to "color" the MP-iBGP routing updates between the PEs, determining which VRFs will or will not accept the routing updates. Thus, the RTs determine the formation of VPNs and also allow the inter-VN route updates necessary for the creation of extranets.
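
For example (a hypothetical extranet; the shared-services RT value 10:500 is invented for illustration), a VRF can import an additional RT to receive routes exported by another VPN:

 ip vrf red-data
  rd 10:1031
  route-target export 10:103
  route-target import 10:103
  ! Hypothetical extranet: also accept routes colored with the
  ! shared-services RT
  route-target import 10:500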

Because each update carries both an RT and an RD value, it is possible to receive multiple updates for a single prefix and differentiate them based on the RD value.

In a network that uses dual PEs to connect each site into the core, it is important to preserve the routing updates from both PEs for resiliency reasons. Campus networks and MANs are examples of networks that use dual PEs to provide resilient connectivity paths for the networks at each site.

We have recommended the use of RRs for the establishment of the required iBGP sessions. When an RR receives multiple updates for the same prefix, its default behavior is to choose one of the routes (the best route from its perspective) and discard the rest. Therefore, when dual PEs each advertise the same prefix, the RR "reflects" only its preferred route and discards the redundant one (and with it the information on the resilient paths we put so much effort into creating). If the updates are not identical, however, the RR "reflects" both routes. One way of making the updates unique is by assigning different RDs to each PE. In the example illustrated in Figure 6-13, the RR receives two updates for subnet 111.111.111.0: one from PE-1 and another from PE-2. Subnet 111.111.111.0 is in VRF red-data.

Figure 6-13. Multipath Routing and RRs


If VRF red-data is configured with the same RD on both PEs, the RR will advertise subnet 111.111.111.0 as reachable only via PE-1 or PE-2 but not both. Example 6-26 shows the BGP tables for the RR and the remote PEs. As seen in the example, the redundant route information is lost at the RR. We focus on the red-data VRF for this illustration.

Example 6-26. Equal-Cost Routes Lost During Route Reflection

 ! PE-1
 ip vrf red-data
  rd 10:1031
  route-target export 10:103
  route-target import 10:103
 !
 ------------------------------------------------------------
 ! PE-2
 ip vrf red-data
  rd 10:1031
  route-target export 10:103
  route-target import 10:103
 !
 ------------------------------------------------------------
 ! Route reflector
 7200-DC2-RR1#show ip bgp vpnv4 rd 10:1031
 BGP table version is 7344946, local router ID is 125.1.125.15
 Status codes: s suppressed, d damped, h history, * valid, > best, i internal,
               r RIB-failure, S Stale
 Origin codes: i IGP, e EGP, ? incomplete

    Network          Next Hop            Metric LocPrf Weight Path
 Route Distinguisher: 10:1031
 *>i1.1.1.1/32       125.1.125.5              2    100      0 ?
 *>i3.3.3.11/32      125.1.125.5              2    100      0 ?
 *>i3.3.3.13/32      125.1.125.5              2    100      0 ?
 *>i111.111.111.0/24 125.1.125.5              0    100      0 ?
 * i                 125.1.125.6              0    100      0 ?
 *>i125.1.101.0/30   125.1.125.5              0    100      0 ?
 ------------------------------------------------------------
 ! PE-3
 7600-DC2-PE3#show ip route vrf red-data
 Routing Table: red-data
 Gateway of last resort is 125.1.125.17 to network 0.0.0.0
 ...
      111.0.0.0/24 is subnetted, 1 subnets
 B       111.111.111.0 [200/0] via 125.1.125.5, 00:00:09

When VRF red-data is configured with a different RD value on each PE, the RR will advertise subnet 111.111.111.0 as reachable via PE-1 and PE-2. Example 6-27 displays the corresponding configuration and BGP tables.

Example 6-27. Equal-Cost Routes Preserved by Using Distinct RDs

 ! PE-1
 ip vrf red-data
  rd 10:1031
  route-target export 10:103
  route-target import 10:103
 !
 ------------------------------------------------------------
 ! PE-2
 ip vrf red-data
  rd 10:1032
  route-target export 10:103
  route-target import 10:103
 !
 ! Route reflector
 7200-DC2-RR1#show ip bgp vpnv4 rd 10:1031
 BGP table version is 7344894, local router ID is 125.1.125.15

    Network          Next Hop            Metric LocPrf Weight Path
 Route Distinguisher: 10:1031
 *>i1.1.1.1/32       125.1.125.5              2    100      0 ?
 *>i3.3.3.11/32      125.1.125.5              2    100      0 ?
 *>i3.3.3.13/32      125.1.125.5              2    100      0 ?
 *>i111.111.111.0/24 125.1.125.5              0    100      0 ?
 *>i125.1.101.0/30   125.1.125.5              0    100      0 ?

 7200-DC2-RR1#show ip bgp vpnv4 rd 10:1032
 BGP table version is 7344894, local router ID is 125.1.125.15

    Network          Next Hop            Metric LocPrf Weight Path
 Route Distinguisher: 10:1032
 *>i111.111.111.0/24 125.1.125.6              0    100      0 ?

 ! PE-3
 7600-DC2-PE3#show ip route vrf red-data
 Routing Table: red-data
 Gateway of last resort is 125.1.125.17 to network 0.0.0.0
      1.0.0.0/32 is subnetted, 2 subnets
 ...
 B       111.111.111.0 [200/0] via 125.1.125.6, 00:04:20
                       [200/0] via 125.1.125.5, 00:01:50

We have managed to populate the control plane with the necessary redundant routes. For the PEs to use the redundant routes for load-sharing purposes, it is necessary to enable the "iBGP multipath" capability on the PE routers. Doing so ensures that all available paths are used and traffic is balanced over these. The iBGP multipath command is used under the VRF address family configuration, as shown in Example 6-28.

Example 6-28. iBGP Multipath

 address-family ipv4 vrf red-data
  redistribute connected metric 1
  redistribute ospf 1 vrf red-data match internal external 1 external 2
  maximum-paths ibgp unequal-cost 8
  no auto-summary
  no synchronization
 exit-address-family

Migration Recommendations

The introduction of MPLS and multiprotocol BGP (MBGP) in the enterprise represents a large technological leap from traditional campus network and MAN architectures. One of the main concerns for enterprises adopting these advanced technologies is having a nondisruptive migration strategy.

At a high level, the strategy should be based on creating all the necessary VN components before migrating any users from the original global routing space. In this manner, users continue to operate in their original environment while the new virtual environments are created. In the context of the MAN and LAN, we assume the enterprise is in control of all routed hops and that any leased circuits are Layer 2 circuits. The scenario in which the enterprise deploys MPLS over its own Layer 3 infrastructure is also referred to as a self-deployed MPLS scenario.

You can complete the following control-plane changes without causing any disruption of the production traffic (notice that the core IGP remains untouched):

  • Creation of VRFs at the PEs

  • MP-iBGP configuration of the PEs

When ready to migrate the forwarding plane to MPLS, the following guidelines will minimize downtime:

  • Enable label switching one link at a time.

  • Alter one link at a time to ensure that a redundant path always exists to forward production traffic. Try to migrate backup links first.

  • Enable label switching at both ends of each link before moving to the next link.

  • Migrate the least-critical portion of the network first and verify that traffic in the global table continues to be forwarded before migrating any other parts of the network.

At this point, all core links have been migrated to label switching, several VRFs are configured, and active MP-iBGP sessions exist between the PEs. However, none of the user subnets has been included in any of the VRFs. Therefore, all user traffic continues to be routed as usual, although an MPLS label is used to forward the traffic. At this stage, it is advisable to verify that the VPNs are fully functional by injecting dummy prefixes and verifying that these are populated in the correct VRFs.
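
One way to run such a verification (a hypothetical test; the loopback number and prefix are invented for illustration) is to source a dummy prefix from a loopback placed in a VRF on one PE, let redistribute connected advertise it, and then confirm on a remote PE that it appears in the matching VRF and in no other:

 ! On the advertising PE: a dummy prefix inside VRF red-data
 interface Loopback101
  ip vrf forwarding red-data
  ip address 192.168.101.1 255.255.255.255
 !
 ! On a remote PE: the prefix should show up in red-data only
 show ip route vrf red-data 192.168.101.1
 show ip route vrf red-voice 192.168.101.1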

When users are ready to be migrated, you have two options:

  • Use a maintenance window to migrate the users to the VRFs.

  • Temporarily replicate the global table onto the VPN being populated. Doing so allows devices in the VPN and devices in the global table to continue to communicate with each other during the migration. After all required devices have been migrated to the VPN, the global routes can be eliminated from the VPN because they are no longer necessary. Although this approach reduces the perceived downtime, it does not fully eliminate it because ports go down momentarily when moved from one VLAN to another.



