Peer-Based Model for Layer 3 VPNs


Almost all the VPN architectures that we have looked at so far have a trait in common: They all use an overlay model. CPE devices peer with each other, oblivious to the fact that all their control- and data-plane traffic is tunneled across a shared network.

The architecture used by the VPNs that we examine in this section differs. It was first proposed in RFC 2547 and is often referred to by this moniker. The major difference is that the CE and PE routers form a peer relationship, and PE routers use Multiprotocol BGP (MP-BGP) to exchange customer route information.

In a Layer 3 scenario, this greatly reduces the routing complexity on the CE routers (which is the value of such a service to a customer). Now, the PE is the next hop for every CE. Figure 5-8 shows the peer relationship at the edge of the network. Compared to pure overlay solutions, the PE has more work to do. The details will become clearer as you proceed through this section.

Figure 5-8. RFC 2547 Control-Plane Interaction


Figure 5-8 shows a simple RFC 2547 network with two VPNs connected over a core network. The RED network has two sites, which use 192.168.3.0/24 and 192.168.4.0/24. The VPN must connect these together.

PE A and PE B peer with CE1 and CE2, respectively, and use OSPF to exchange routes. CE1 advertises 192.168.3.0/24, and CE2 advertises 192.168.4.0/24, each over the dedicated link that connects it to its PE router. When the routing protocols over the VPN connection converge, a show ip route command on CE2 will show 192.168.3.0/24 with a next hop of 192.168.2.1, which is an address on PE B.

Note that the CE-PE links are part of the customer address space.
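
For concreteness, here is a minimal sketch of how PE B might be configured on Cisco IOS to place the CE-facing link in the RED VRF and run OSPF with CE2. The RD value 1:1 matches the VPNv4 address shown later in this section; the interface type, route-target value, and OSPF process number are illustrative assumptions, not details from the figure.

  ip vrf RED
   rd 1:1
   route-target export 1:1
   route-target import 1:1
  !
  ! Dedicated CE-facing link, addressed from the customer space
  interface Serial0/0
   ip vrf forwarding RED
   ip address 192.168.2.1 255.255.255.0
  !
  ! VRF-aware OSPF instance that peers with CE2
  router ospf 10 vrf RED
   network 192.168.2.0 0.0.0.255 area 0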

When PE A receives an OSPF update from CE1, it first stores the route in the RED VRF table. PE A has MP-BGP sessions open with PE B and PE C (only the first of these is shown in Figure 5-8). To provide Layer 3 reachability across the network core, the PEs must advertise customer routes to each other. Thus, for the RED VPN, PE A announces reachability information for prefix 192.168.3.0/24 to PE B, but in a modified form called a VPNv4 address. PE A also receives route updates from its peers; in this case, it learns that 192.168.4.0/24 is reachable through 10.10.10.12 (PE B's loopback address).

The BGP Network Layer Reachability Information (NLRI) exchange has important additional attributes, too, most notably the following:

  • VPNv4 address (8 + 4 bytes): 1:1:192.168.3.0/24 with next hop of 10.10.10.11

  • Route target (RT, an 8-byte extended BGP community)

  • VPN label (3 bytes)

Now we will review each of the items announced with MP-BGP.

RFC 2547 provides address separation between VPNs by using VRFs on PE routers, dedicated interfaces between CE and PE, and an extended VPNv4 address space across the core network. VPNv4 addresses are created by concatenating a route distinguisher (RD) with the customer route prefix. Every VPN must use RDs that are distinct from those of all other VPNs (and note that a single VPN can have more than one RD). PEs store customer IPv4 routes in VRF tables and exchange the corresponding VPNv4 routes with other PEs; a traffic analyzer would see only VPNv4 data in the internal BGP (iBGP) packets. Each PE is configured to map an RD to every VRF, and the iBGP session uses this value when it exchanges the prefixes in the VRF with other PEs. VPNv4 prefixes are used only in the control plane: actual packets (at least in all current implementations) keep their original IP source and destination addresses. In addition, no change occurs to routing exchanges between CE and PE, which use standard IPv4.
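
As a sketch of how this maps to Cisco IOS, the RD is defined per VRF (as shown earlier), and VPNv4 routes are exchanged over an iBGP session activated under the VPNv4 address family. The AS number 100 is an assumption; the loopback address is PE B's, from Figure 5-8.

  router bgp 100
   neighbor 10.10.10.12 remote-as 100
   neighbor 10.10.10.12 update-source Loopback0
   !
   ! VPNv4 prefixes (RD + IPv4 route) exist only in this control-plane exchange
   address-family vpnv4
    neighbor 10.10.10.12 activate
    ! Extended communities carry the route target
    neighbor 10.10.10.12 send-community extended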

PEs rewrite the next-hop address information (which would otherwise be a CE address) and replace it with one of their own addresses that is part of the core network address space. In this way, a PE can send traffic destined for another part of a customer VPN over the core network using standard routing lookups.

When the time comes to forward traffic to CE2, PE A does a lookup to find the route to 192.168.4.0/24, which resolves to 10.10.10.12, and PE A must then do another forwarding information base (FIB) lookup to find the route to this address. This time the next hop is 10.0.0.2, which is an LSR. The core routing protocol of Figure 5-8 announces the 10.0.0.0 prefixes, which include the addresses used by the PE routers as their BGP identifiers.

The next two sections discuss forwarding-plane alternatives for carrying the intra-VPN traffic: first MPLS, and then IP-based encapsulations (mGRE and L2TPv3).

Note the presence of the VPN label in the preceding list. Each PE generates a 20-bit label value for the VPN the address is associated with. The data plane uses this label to identify which VRF should be used to forward a packet received on a core-facing interface.
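
On Cisco IOS, the labels advertised and received for VPNv4 prefixes can be inspected with show ip bgp vpnv4 all labels. The entry below is a hand-drawn illustration of the kind of line you would expect for the RED VPN of Figure 5-8, not captured output; the label value is arbitrary.

  PE-A# show ip bgp vpnv4 all labels
     Network          Next Hop        In label/Out label
  Route Distinguisher: 1:1 (RED)
     192.168.4.0/24   10.10.10.12     nolabel/23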

The RFC 2547 model naturally creates a full mesh between PEs, so every CE can reach any other CE in two hops. It is possible to constrain inter-CE reachability, even as far as creating a hub-and-spoke topology, using route targets (RTs). RTs are extended BGP communities that are announced between PEs.

A PE can be configured to export prefixes with a certain RT and import only prefixes that match a specified RT value. RTs allow arbitrary meshes to be built between sites.

Both RD and RT are encoded in a numeric format, usually based on an autonomous system number (for example, 100:1). Despite the common format, no link exists between the two. You can use multiple RD values within the same VPN, with the restriction of one RD per VRF, but a one-to-one mapping is better for operational simplicity.
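
In Cisco IOS, both values are set per VRF, and a VRF can import more than one RT to build the arbitrary meshes just described. The following sketch is hypothetical: a BLUE VRF that, in addition to its own RT, imports routes tagged with a shared-services RT (100:999 is an invented value).

  ip vrf BLUE
   rd 100:1
   route-target export 100:1
   route-target import 100:1
   ! Also import a hypothetical shared-services extranet RT
   route-target import 100:999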

Figure 5-9 shows PE1 and PE2 routers that export prefixes with a "Spoke" RT value of 100:1 and use 200:1 for import. PE3 imports the 100:1 routes into VRF_IN and distributes them to the Hub_CE1 router. Hub_CE2 announces these routes back to PE3, where they are placed in VRF_OUT and exported with a 200:1 route-target value. As a result, traffic between CE1 and CE2 is routed through the Hub site.

Figure 5-9. Hub-and-Spoke Using RTs


The details follow:

  • PE1, PE2, and PE3 exchange prefixes using iBGP. PE1 imports only routes that have an RT equal to 200:1, so only routes from PE3 are loaded into its VRF.

  • PE3 imports routes from both PE1 and PE2 because they have an RT equal to 100:1. These routes are placed in VRF_IN. PE3 exports routes received from CE_Hub2 to the two other BGP peers using RT 200:1.

  • When CE1 sends a packet to CE2, it first goes to PE1.

  • PE1 looks up the packet's destination (a CE2 prefix) in its VRF and finds the next hop is PE3.

  • The packet is encapsulated in whatever protocol is required to traverse the VPN core and forwarded to PE3.

  • PE3 uses the VPN label carried with the packet to select VRF_OUT, where the next hop for CE2's prefix is Hub_CE2. The packet is forwarded across the Hub site network to Hub_CE1, whose next hop to reach CE2 is PE3. The packet therefore returns to PE3, which does a lookup in VRF_IN and finds the next hop to be PE2.

  • PE2 does a VRF route lookup and sends an IP packet to its destination across the dedicated interface that connects it to CE2.

In the default, full-mesh scenario, PE1 and PE2 of Figure 5-9 would install each other's routes so that packets from CE1 to CE2 would follow the most direct path across the core.
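
A configuration sketch for the asymmetric import/export policy of Figure 5-9 follows. The RT values come from the figure, but the RD values are invented; note that only the VRF definitions differ from the full-mesh case.

  ! Spoke PE1 (PE2 is identical apart from its RD)
  ip vrf SPOKE
   rd 100:11
   route-target export 100:1
   route-target import 200:1
  !
  ! Hub PE3: two VRFs, one per dedicated link to the Hub site
  ip vrf VRF_IN
   rd 100:21
   route-target import 100:1
  !
  ip vrf VRF_OUT
   rd 100:22
   route-target export 200:1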

The RFC 2547 model supports auto-discovery but not auto-provisioning. If you add a new network at a site, reachability information is automatically propagated to all the other sites. With Layer 3 VPNs, you can also use route aggregation to reduce the number of prefixes that must be advertised between sites.

Provisioning is not complicated, but neither is it automatic: as with any BGP session, you must configure the routes to announce on a PE (and, potentially, the route prefixes to import on other PEs, although this is not obligatory). If you add a new site connected to a new PE, every other PE must be configured to bring up an iBGP session with that PE. RFC 2547 allows the use of BGP route reflectors (RRs), which remove the N-squared iBGP mesh between PEs, thus helping the architecture scale to large numbers of sites, and which make provisioning a one-time operation on any PE (each PE peers only with the RR).
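
On Cisco IOS, making a router a route reflector for VPNv4 routes is a per-neighbor statement under the VPNv4 address family. The sketch below assumes AS 100 and shows a single PE client (PE A's loopback from Figure 5-8).

  router bgp 100
   neighbor 10.10.10.11 remote-as 100
   neighbor 10.10.10.11 update-source Loopback0
   address-family vpnv4
    neighbor 10.10.10.11 activate
    neighbor 10.10.10.11 send-community extended
    ! Reflect VPNv4 routes between PE clients
    neighbor 10.10.10.11 route-reflector-client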

The choice of BGP gives the RFC 2547 architecture well-understood properties of scale and robustness. As the protocol used to manage Internet backbone routes, BGP is known to be suitable for large networks. It also supports flexible policy statements. BGP was extended to work in RFC 2547 architectures; the result is known as MP-BGP and can announce VPNv4, VPNv6, IPv4, and IPv6 routes. Customers are free to use any routing protocol on CE-PE links. The only caveat is that the PE implementation must be able to store the routes in a VRF (and hence must be VRF-aware). Standard route redistribution allows appropriate routes to be announced to and imported from MP-BGP.
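
Continuing the OSPF example from Figure 5-8, the redistribution on PE B might look as follows on Cisco IOS (the OSPF process number and AS number are the same assumptions made earlier):

  ! Announce MP-BGP-learned VPN routes to CE2 via OSPF
  router ospf 10 vrf RED
   redistribute bgp 100 subnets
  !
  ! Announce CE2's OSPF routes to the other PEs via MP-BGP
  router bgp 100
   address-family ipv4 vrf RED
    redistribute ospf 10 vrf RED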

MPLS-VPN is the most common implementation of RFC 2547, and we examine it first. The RFC describes both the MP-BGP control plane and the MPLS-based forwarding plane. However, a role exists for a more generalized model that runs over IP networks, and we look at emerging proposals to run RFC 2547 over L2TPv3 and mGRE tunnels.

Note

The statement at the beginning of this section posited that almost all the previous VPNs were overlays. Which one wasn't? VPLS. The CPE peers with the network at Layer 2.


RFC 2547bis: The MPLS Way

MPLS VPNs are built with a double layer of labels. The inner VPN route label identifies the customer VRF, and the outer tunnel label identifies the next hop on the LSP to the egress PE. The easiest way to understand the operation is through an example. Figure 5-10 shows an end-to-end MPLS VPN example with routing information across the network.

Figure 5-10. MPLS VPN Forwarding


Before traffic is forwarded, PE routers must prepend labels so that the packet can reach the right VRF on the right PE. Figure 5-10 shows a packet going from CE green1 to CE green2.

1. PE A looks up the packet's destination in its GREEN VRF and identifies the next hop (PE D), which is a BGP neighbor.

2. PE A first imposes a label, 22, that will identify the GREEN VPN routing table to PE D. This label was advertised by the neighbor, PE D, during the exchange of BGP prefixes.

3. The packet must now travel across the MPLS network, so PE A imposes another label, 96, that identifies the next-hop LSR on the IGP path to PE D. This label was advertised by the downstream LSR (LSR B) using LDP.

4. Each LSR in the core swaps labels and forwards the packet as normal toward PE D. The penultimate hop pops the outer label. In Figure 5-10, there is only one hop to the egress LSR, so LSR B removes the outer label.

5. PE D uses the remaining label, 22, to identify the GREEN VPN routing table to use for the packet and then pops the label from the packet.

6. PE D next does an IP lookup in the VPN routing table to find the outgoing interface and forwards the IP packet to CE green2, which routes it to its destination.

It is important to understand that the LSRs have no view of the VPN traffic. They forward labeled traffic along LSPs established by whatever routing protocol is running in the core network. Of course, the choice of core IGP can be completely different from the IGPs running on the CE-PE links; the two do not talk to each other.

Labeled packets are found only in the core network; the CE-PE links use IP. Step 4 of the list describes a penultimate hop popping (PHP) operation (PHP was introduced in Chapter 4). When the last LSR pops the outer label, it reveals the packet's inner label, so PE D can use a single LFIB lookup to find the VRF to use. It then does a second lookup to find the outgoing CE interface.

An MPLS VPN uses two protocols for signaling. The underlying MPLS network uses LDP between LSRs to announce labels for the prefixes in their routing tables. The PEs use MP-BGP to announce VPN route labels. No correlation exists between the two different label spaces.
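
Enabling the first of these, LDP, is a per-interface matter on the core LSRs. A minimal Cisco IOS sketch follows; the interface name, address, and mask are assumptions.

  mpls label protocol ldp
  !
  interface Ethernet0/0
   ip address 10.0.0.2 255.255.255.252
   ! Enable label switching and LDP label advertisement on this link
   mpls ip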

Here is a succinct summary of MPLS-VPN operation from draft-ietf-l3vpn-greip-2547-03:

In "conventional" BGP/MPLS IP VPNs ([BGP-MPLS-VPN]), when an ingress PE router receives a packet from a CE router, it looks up the packet's destination IP address. . . .As a result of this lookup, the (ingress) PE router determines an MPLS label stack, a data link header, and an output interface. The label stack is prepended to the packet, the data link header is prepended to that, and the resulting frame is queued for the output interface. The bottom label in the MPLS label stack prepended to the packet is called the VPN route label ([BGP-MPLS-VPN]). The VPN route label will not be seen until the packet reaches the egress PE router. This label controls forwarding of the packet by the egress PE router. The upper label in the MPLS label stack is called the tunnel label ([BGP-MPLS-VPN]). The purpose of the tunnel label is to cause the packet to be delivered to the egress PE router which understands the VPN route label.

Recall that Ethernet over MPLS (EoMPLS) and VPLS also use double label stacks, but they use a different signaling protocol (directed LDP).

MPLS VPNs have been a successful service, with many hundreds of operational networks worldwide. Some carriers, however, might be attracted to the merits of the RFC 2547 model yet resistant to the need to deploy MPLS to support it. For them, there are other proposals that allow RFC 2547 to run over an IP core, using alternatives to labels for forwarding while still using labels for VPN identification.

RFC 2547bis Forwarding-Plane Alternatives

The two proposals that we examine in this chapter decouple the RFC 2547 control and data planes. Both retain MP-BGP for customer route distribution and continue to use labels for VPN route table identification. However, the core network is no longer based on MPLS; it uses standard IP. The PE-CE reference architecture is maintained.

MPLS over mGRE

This architecture uses dynamic GRE tunnels between PEs to carry customer packets. Route distribution between CE and PE and between PEs works just as in the MPLS VPN solution. The important difference is how a PE forwards traffic across the core.

Figure 5-11 shows a sample topology that uses the same network addresses as the MPLS VPN example in Figure 5-10. When PE A needs to route a packet from CE1 to CE2, it consults its GREEN VRF table to find the BGP next-hop address (PE D) and the VPN route label announced by PE D. It prepends the label to the packet and looks up the next-hop address, outgoing interface, and encapsulation information in the FIB. PE A then prepends a GRE header to the labeled packet, with source and destination IP addresses corresponding to the public addresses of PE A and PE D, respectively. The GRE protocol type field indicates an MPLS payload.

Figure 5-11. MPLS over mGRE Example


The inter-PE routes are learned using a standard IGP. No extra control-plane information is in the core network.

The dynamic moniker refers to the fact that the tunnels need no per-peer configuration: they are never seen as a possible path by routing protocols running on the PE, nor do routing adjacencies form across them. In fact, given the lack of protocol state, this solution really uses GRE as a stateless encapsulation method to traverse an IP core.

One point mentioned in the draft that merits discussion here is the core network's greater susceptibility to spoofing. With MPLS, the provider can simply discard any labeled packets received at the edge of its network, so it is hard for a malicious user to introduce spoofed packets into MPLS networks. With MPLS over GRE, however, no equivalent boundary exists. The PE receives and forwards IP, so it would have to use some other filtering mechanism to enforce an antispoofing policy.

MPLS over L2TPv3

MPLS over L2TPv3 uses exactly the same principle as the GRE solution just discussed, except with an L2TPv3 data plane. Once again, BGP distributes customer routes and VPN route labels. Recall from the discussion in Chapter 4 that L2TPv3 has its own control plane, and information negotiated during session establishment is found in the data-plane header, notably the session ID and cookie ID.

Session and cookie IDs are also used in this architecture. However, their roles differ from what we saw previously. The session ID was a session multiplexer: Different sessions have different identifiers within the same tunnel. Here, a label plays this role, so the session ID value is used only to indicate to the receiving L2TPv3 engine that the incoming packet belongs to a Layer 3 VPN service and that additional processing is required by another subsystem. The cookie ID is still used for antispoofing protection. The value can be generated statically or randomly and can either be global per PE or local per session.

The session and cookie identifiers are announced using MP-BGP but, just as for MPLS over GRE, different implementations could use another protocol.

The forwarding plane is similar to, though slightly more complex than, the GRE solution. Consider the network in Figure 5-12. When CE1 sends data to CE2, PE A first performs a route table lookup to find the BGP next hop and VPN label identifier. It also finds the L2TPv3 session and cookie identifiers for the remote PE. PE A then does a second lookup to find the IP address of PE D, encapsulates the packet in an L2TPv3 frame, and sends it. The core is a completely standard IP network.

Figure 5-12. MPLS over L2TPv3 Example


On receipt, PE D removes the outer L2TPv3 header and verifies the session and cookie values. If they are valid, the packet's label is used to identify the correct VRF, and output processing continues as usual. Compared to GRE, antispoofing is somewhat enhanced because PE D can drop incoming packets that do not have a correct cookie ID.

Is there any great difference between using one of these IP-based encapsulations to create a Layer 3 VPN service compared to using MPLS? The major difference is clearly that the core network is still IP, so no migration is necessary to start offering a VPN service. The disadvantage is that you lose some useful MPLS-based tools, such as fast reconvergence (which is being developed for IP) and traffic engineering.



