
BGP From Dinosaur to racecar

Webinar - 28 February 2012

Agenda
Deployment Profiles
Summary of current service provider and enterprise customer BGP deployment profiles

New Developments
A review of BGP recent enhancements and features

Scale & Performance Results
BGP RR and PE scaling data

Future Work
Upcoming BGP features and enhancements
Google December 2011:Advances in BGP 2010 Cisco and/or its affiliates. All rights reserved. Cisco Public

Deployment Profiles

BRKRST-3371: Advances in BGP


Service Provider Profile
Most deployments use route reflector model
BGP deployed for L3VPN (VPNv4/6), L2VPN, Internet (IPv4/6), and MVPN routing
Current BGP table sizes
  Internet: ~415K
  VPN: ~1.5M
  Approximately 10% YOY growth expected for both


Enterprise Profile
BGP deployed for large enterprise core networks running DMVPN, L3VPN over MPLS, and L3VPN over IP
L3VPN over IP exploding in enterprise environment
L2VPN BGP is gaining momentum
Typical deployment scale in the range of 50K+ routes reflected


New Developments

New Developments
Scale & Performance
  Increase scalability for existing hardware, newer RP cards, and new platforms
  Faster convergence
Resiliency & High Availability
  Increase robustness of BGP peering
  Provide redundancy for routes and sessions
Features
  Support for new functionality in the network


Scale & Performance Enhancements
BGP Scaling
  Update Generation Enhancements
  Parallel Route Refresh
  Keepalive Enhancements
  Adaptive Update Cache Size
PE Scaling
  PE-CE Optimization
  VRF-Based Advertise Bits
Route Reflector Scaling
  Selective RIB Download


General Update Generation Enhancements
BGP Scale/Performance Enhancement
Update generation is the most important, time-critical task
Optimize to improve convergence
  New update generation process
  Parcel work into discrete units
  Peer-based update message queues
  Inline freeing of transmitted update messages
  Optimizing prefix based checkpointing
  Predictable CPU quantum
  Efficient suspension/resumption of work
  Simplified, efficient peer update message handling

Parallel Route Refresh
BGP Scale/Performance Enhancement
Significant delay (up to 15-30 minutes) seen in advertising incremental updates while RR is servicing route refresh requests or converging newly established peers
  VRF provisioning triggers route refresh request from PE every 10 to 30 minutes at typical tier-1 service providers
  Persistent BGP VPN issue on existing production networks
Parallelize refresh and incremental updates
  Real update group spawns a refresh update group to reannounce BGP table


Parallel Route Refresh
BGP Scale/Performance Enhancement
[Figure: table versions of prefixes from version 0 to version X, showing refresh group re-announcements alongside transient updates]
Original update group handles new transient updates while refresh update group handles reannouncements
Refresh groups used to service newly established peers
End-to-end convergence reduced from 15-30 minutes to 5-20 seconds for typical tier-1 VPN service providers


Keepalive Enhancements
BGP Scale/Performance Enhancement
Issue: Delayed processing of BGP keepalives often results in session flaps for peers configured with aggressive keepalive timers
  Cascading outages and CPU/transient memory usage
Insulate keepalive processing
  Separate keepalive process to only handle keepalives
  Priority queues for reading/writing keepalive/update messages
  Optimizing keepalive timeout cases
Aggressive keepalive timers supported reliably under scaled/stressed conditions
Fixes unwanted session flaps and outages
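The priority-queue idea above can be sketched in a few lines of Python. This is only an illustration of draining keepalives ahead of updates; the class name, the two priority levels, and the message strings are assumptions and do not describe the IOS implementation.

```python
import heapq
from itertools import count

# Lower number means higher priority; these two levels are illustrative only.
KEEPALIVE, UPDATE = 0, 1


class PeerWriteQueue:
    """Toy outbound queue that always drains keepalives before updates."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within one priority

    def enqueue(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._seq), message))

    def drain(self):
        """Return all queued messages in transmit order."""
        out = []
        while self._heap:
            _, _, message = heapq.heappop(self._heap)
            out.append(message)
        return out


queue = PeerWriteQueue()
queue.enqueue(UPDATE, "update-1")
queue.enqueue(UPDATE, "update-2")
queue.enqueue(KEEPALIVE, "keepalive")
order = queue.drain()  # the keepalive jumps ahead of both queued updates
```

Because the keepalive never waits behind a backlog of updates, a busy peer is less likely to hit its hold timer.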

Adaptive Update Message Cache Size
BGP Scale/Performance Enhancement
Update message cache size throttles update groups during update generation and controls transient memory usage
Fast convergence aided by large cache sizes
Old cache sizing scheme can't take advantage of expanded memory available on new platforms
Scale up cache size appropriately considering
  Amount of installed system memory
  Number of peers in an update group
  Type of peers in an update group
  Address family of update group


Adaptive Update Cache Size
BGP Scale/Performance Enhancement
Routers with more system memory get bigger cache sizes and thereby queue more update messages
VPNv4 iBGP update groups have larger cache size
Update groups with large number of peers get larger update cache
Faster convergence is the result


PE-CE Optimization
BGP Scale/Performance Enhancement
Issue: Slow convergence when the number of CE sessions was scaled on a PE router
Intelligently evaluate VPN prefixes for processing
  Data structures/algorithms optimized for VRF-centric update generation
  For each CE update group only consider prefixes in CE's VRF, not whole VPN table
Enables VRF and peer session scaling
Convergence typically improved by 300-400%
Initial convergence for ASR with 4K VRFs, 8K peers, and 2.3M routes is under 10 mins
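A minimal sketch of the VRF-scoped walk, assuming the VPN table is a list of (VRF, prefix) pairs. The function names and sample prefixes are hypothetical; the point is only that a CE update group touches its own VRF's prefixes rather than the whole table.

```python
from collections import defaultdict


def build_vrf_index(vpn_table):
    """Group (vrf, prefix) entries by VRF once, so later walks can be scoped."""
    index = defaultdict(list)
    for vrf, prefix in vpn_table:
        index[vrf].append(prefix)
    return index


def prefixes_for_ce(vrf_index, ce_vrf):
    """Scoped walk: visit only the prefixes in the CE's own VRF."""
    return list(vrf_index.get(ce_vrf, []))


vpn_table = [
    ("red", "10.1.0.0/16"),
    ("blue", "10.2.0.0/16"),
    ("red", "10.3.0.0/16"),
]
vrf_index = build_vrf_index(vpn_table)
red_prefixes = prefixes_for_ce(vrf_index, "red")
```

With thousands of VRFs, the work per CE update group then depends on the size of one VRF instead of the size of the full VPN table.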

VRF-Based Advertise Bits
BGP Scale/Performance Enhancement
Issue: Increased memory consumption when the number of VRFs was scaled on a PE router
Smart reuse of advertise bit space for VRFs
  Prefixes in a VRF used to have advertise bit for every CE update group on the router
  Bits only needed for CEs in the same VRF
  For PE with 1000 VRFs, savings of about 120+B per prefix
Considerable memory savings allows greater prefix scaling
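The "120+B per prefix" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes one CE update group per VRF and one advertise bit per update group, rounded up to whole bytes; both assumptions are illustrative and not stated in the slides.

```python
import math


def advertise_bytes(update_groups):
    """Bytes needed to hold one advertise bit per update group."""
    return math.ceil(update_groups / 8)


vrfs = 1000
groups_per_vrf = 1  # assumption for illustration

before = advertise_bytes(vrfs * groups_per_vrf)  # a bit for every CE update group on the router
after = advertise_bytes(groups_per_vrf)          # bits only for CEs in the same VRF
savings = before - after
```

Under these assumptions the per-prefix cost drops from 125 bytes to 1 byte, which is in line with the savings quoted above.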


Selective RIB Download
BGP Scale/Performance Enhancement
Issue: BGP installing routes in RIB/FIB that are not in the forwarding path wastes CPU and memory
Selectively filter which BGP routes are installed in the RIB
  Implemented as filter extension to table-map command
Significant CPU and memory savings by avoiding unnecessary installation
Testing on ASR platform indicated 300% increase in route reflector client scaling (on order of 1000s)
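Conceptually, selective RIB download is a predicate applied between the BGP table and the RIB. The sketch below is a toy model with a hypothetical policy (install only /32 host routes); it is not the table-map syntax or the actual filter logic.

```python
def select_for_rib(bgp_routes, install_filter):
    """Return only the BGP routes that the filter allows into the RIB."""
    return [route for route in bgp_routes if install_filter(route)]


def is_host_route(route):
    """Hypothetical policy: a dedicated route reflector installs only /32 routes."""
    return route["prefix"].endswith("/32")


bgp_table = [
    {"prefix": "192.0.2.1/32"},
    {"prefix": "198.51.100.0/24"},
    {"prefix": "203.0.113.0/24"},
]
rib = select_for_rib(bgp_table, is_host_route)
```

All three routes stay in the BGP table and can still be reflected to clients; only the RIB and FIB are spared the routes that the router never forwards on.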


Scaling & Performance Release Matrix

Feature                     12.2(31)SB  12.2(33)SB  12.2SR (SRC, SRD)  12.2XN (XNC, XND)  12.2SRE  12.2XNE  15.0x
Selective RIB Download      No          No          No                 12.2XNC            Yes      Yes      Yes
PE-CE Optimization          31SB16      33SB6       No                 12.2XND            Yes      Yes      Yes
Update Generation Task      31SB14      No          No                 No                 Yes      Yes      Yes
Parallel Route Refresh      31SB14      No          No                 No                 Yes      Yes      Yes
Keepalive Enhancements      31SB16      33SB6       No                 No                 Yes      Yes      Yes
Variable Update Cache Size  31SB16      33SB6       No                 12.2XND            Yes      Yes      Yes

Component Code: 12.2SRE, 12.2XNE, 15.0x
RR: 31SB, 12.2XN, component code
PE: 33SB, component code


Resiliency & High Availability
PIC Edge
Slow Peer Management
VRF-Based Dampening
GR/NSR Enhancements


PIC Edge
BGP Resiliency/HA Enhancement
Issue: Sub-second convergence is desirable. Presently, routing around failures is not immediate, resulting in forwarding traffic loss at the site of failure
PIC: Prefix Independent Convergence
  All prefixes using failed nexthop for forwarding shift to backup in constant time
  PIC Edge can update nexthop for 250K prefixes in < 500 ms using 12.2(33)SRE
Current solution targets VPNs and IP edge routers
PIC Edge supports 2 cases: link and node failures
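One common way to get a constant-time shift is to have every prefix point at a shared object that already holds the backup, so a failure is handled by a single write. The sketch below models that idea only; the `PathList` class and prefix names are hypothetical, and real CEF data structures are more involved.

```python
class PathList:
    """Shared forwarding object holding a primary and a backup next hop."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.primary_up = True

    def active(self):
        return self.primary if self.primary_up else self.backup

    def fail_primary(self):
        # One write, regardless of how many prefixes share this path list.
        self.primary_up = False


shared = PathList(primary="PE3", backup="PE4")

# 250K matches the prefix count quoted above; the prefix names are made up.
fib = {f"prefix-{i}": shared for i in range(250_000)}

before = fib["prefix-0"].active()
shared.fail_primary()
after = fib["prefix-249999"].active()
```

No per-prefix rewrite happens at failure time, which is why the switchover cost does not grow with the number of prefixes.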


PIC Edge: Link Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); traffic flow shown]
PE3 configured as primary, PE4 as backup
  PE3 preferred over PE4 by local preference
  CE2 has different RDs in VRFs on PE3 and PE4
  PE4: advertise-best-external, to advertise route via PE4-CE2 link
  PE3: additional-paths install, to install primary and backup path

PIC Edge: Link Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); traffic flow shown]
PE3 has primary and backup path
  Primary via directly connected PE3-CE2 link
  Backup via PE4 best external route
What happens when PE3-CE2 link fails?



PIC Edge: Link Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); traffic flow shown]
CEF (via BFD or link layer mechanism) detects PE3-CE2 link failure
  CEF immediately swaps to repair path label
  Traffic shunted to PE4 and across PE4-CE2 link


PIC Edge: Link Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); traffic flow shown, with the route via PE3 being withdrawn]
PE3 withdraws route via PE3-CE2 link
  Update propagated to remote PE routers


PIC Edge: Link Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); route via PE3 withdrawn, traffic flow shown through PE2 and PE4]
BGP on remote PEs selects new bestpath
  New bestpath is via PE4
  Traffic flows directly to PE4 instead of via PE3


PIC Edge: Edge Node Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); traffic flow shown]
PE3 configured as primary, PE4 as backup
  PE3 preferred over PE4 by local preference
  CE2 has different RDs in VRFs on PE3 and PE4
  PE4: advertise-best-external, to advertise route via PE4-CE2 link
  PE1: additional-paths install, to install primary and backup path

PIC Edge: Edge Node Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); traffic flow shown]
PE1 has primary and backup path
  Primary via PE3
  Backup via PE4 best external route
What happens when node PE3 fails?



PIC Edge: Edge Node Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); traffic flow shown, with PE3's /32 host route removed from IGP]
IGP propagates loss of PE3's /32 host route across the core to remote PEs


PIC Edge: Edge Node Protection
BGP Resiliency/HA Enhancement
[Figure: VPN1 Site #1 (10.1.1.0/24, CE1) and VPN1 Site #2 (10.2.2.0/24, CE2) joined across the MPLS cloud by PE1, PE2, RR, PE3 (primary), and PE4 (backup); traffic flow shown, with PE3's /32 host route removed from IGP]
PE1 detects loss of PE3's /32 host route in IGP
  CEF immediately swaps forwarding destination label from PE3 to PE4 using backup path
BGP on PE1 computes a new bestpath later, choosing PE4



PIC Edge: Test Results
BGP Resiliency/HA Enhancement

Test Setup            Node Failure  Link Failure
No PIC Edge, No BFD   12-14 sec     8-17 sec
BFD Only              10-12 sec     6-12 sec
PIC Edge Only         8 sec         4 sec
PIC Edge, BFD         0 sec         0 sec

Duration of forwarding outage for all streams at tier-1 service provider on C10K


Slow Peer Management
BGP Resiliency/HA Enhancement
Issue: Slow peers in update groups block convergence of other update group members by filling message queues/transmitting slowly
  Persistent network issue affecting all BGP routers
Two components to solution
  Detection
  Protection
Detection
  BGP update timestamps
  Peer's TCP connection characteristics


Slow Peer Management
BGP Resiliency/HA Enhancement
Protection
  Move slower peers out of update group
  Separate slow update group with matching policies created
  Any slow members are moved to slow update group
  Detection can be automatic or manual with CLI command
Automatic recovery
  Slow peers are periodically checked for recovery
  Recovered peers rejoin the main update group
Isolation of slow peers unblocks faster peers and lets them converge as fast as possible
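The detect, isolate, and recover cycle can be modeled as moving peers between two sets. In this sketch the detection signal is the age of a peer's oldest unsent update, and the 300-second threshold is an arbitrary value chosen for illustration; neither is taken from the slides.

```python
class UpdateGroups:
    """Toy model: one main update group plus a slow group with the same policy."""

    def __init__(self, peers, threshold_seconds):
        self.main = set(peers)
        self.slow = set()
        self.threshold = threshold_seconds

    def check(self, oldest_queued_age):
        """Move peers between groups.

        oldest_queued_age maps peer name -> seconds its oldest unsent update has waited.
        """
        for peer, age in oldest_queued_age.items():
            if age > self.threshold and peer in self.main:
                self.main.discard(peer)
                self.slow.add(peer)   # isolate the slow peer
            elif age <= self.threshold and peer in self.slow:
                self.slow.discard(peer)
                self.main.add(peer)   # recovered peer rejoins the main group


groups = UpdateGroups(["pe1", "pe2", "pe3"], threshold_seconds=300)
groups.check({"pe1": 2, "pe2": 900, "pe3": 5})
slow_after_detection = set(groups.slow)
groups.check({"pe1": 1, "pe2": 3, "pe3": 4})
```

After the first check only pe2 sits in the slow group, so pe1 and pe3 are no longer throttled by it; after the second check pe2 has recovered and rejoined.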


VRF-Based Dampening
BGP Resiliency/HA Enhancement
BGP route dampening is now configurable per-VRF instead of for whole VPN table
Allows service provider to configure dampening parameters on an individual customer basis
Gives operators more flexible control of unstable customer routes in service provider network


BGP NSR
BGP Resiliency/HA Enhancement
BGP NSR support for VPN CE BGP peers introduced in 2004
  Code hardened through DDTS resolution
Improved scaling to 4000 eBGP PE-CE peers and 2 million VPN routes
Enable iBGP support for NSR as well
Route updates during switchover are announced to NSR-capable CE peers without any delays
  Prevents the data black-holes during switchover that occur in the case of GR peers


Graceful Restart Changes
BGP Resiliency/HA Enhancement
Configurable RIB failsafe timer
  New CLI parameter
  Allows users to tune value according to scale requirements
GR configurable per neighbor
New address family support
  MDT
  L2VPN


Resiliency & HA Release Matrix

Feature               12.2(31)SB  12.2(33)SB  12.2SR (SRC, SRD)  12.2XN (XNC, XND)  12.2SRE       12.2XNE       15.0x
PIC Edge              No          No          No                 No                 Yes Hardware  Yes Software  Yes Software
Slow Peer Management  31SB16      No          No                 No                 Yes           Yes           Yes
VRF-Based Dampening   No          No          No                 No                 Yes           Yes           Yes
GR/NSR Changes        31SB16      33SB6       33SRD3 (No NSR)    No                 Yes           Yes           Yes

Component Code: 12.2SRE, 12.2XNE, 15.0x


BGP Features
4-Byte AS Support
Automated Route Target Filtering
BGP L3VPN Over MGRE
Dynamic Neighbor Discovery
BGP L2VPN Autodiscovery


BGP Features (Cont'd)
Enhanced Route Refresh
Route Consistency Checker
BGP MVPNs
BGP Origin Validation
BGP Graceful Shutdown


BGP Features (Cont'd)
BGP Route Servers


4-Byte AS Support
BGP Feature
2B ASN pool being exhausted
RIRs allocating 4B ASNs by default
IOS BGP extended to support RFC 4893
  4B ASN capability negotiated when opening session
  Support for mixed 2B/4B AS deployments
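Two details of mixed 2B/4B operation are easy to show in code: the asdot rendering of a 4-byte ASN, and the AS_TRANS placeholder (23456) that RFC 4893 reserves for peers that did not negotiate the 4-byte capability. The helper names below are illustrative and are not IOS functions.

```python
AS_TRANS = 23456  # reserved 2-byte ASN that stands in for a 4-byte ASN (RFC 4893)


def asplain_to_asdot(asn):
    """Render a 32-bit ASN in asdot notation (high.low) when it exceeds 16 bits."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("ASN out of 32-bit range")
    if asn <= 0xFFFF:
        return str(asn)
    return f"{asn >> 16}.{asn & 0xFFFF}"


def asn_for_2byte_peer(asn):
    """ASN presented to a peer that did not negotiate the 4-byte capability."""
    return asn if asn <= 0xFFFF else AS_TRANS


dotted = asplain_to_asdot(65546)         # 65546 = 1 * 65536 + 10
legacy_view = asn_for_2byte_peer(65546)  # an old speaker sees AS_TRANS instead
```

A 2-byte ASN passes through both helpers unchanged, which is what lets old and new speakers coexist during migration.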


Automated Route Target Filtering
BGP Feature
Increased VPN service deployment increases load on VPN routers
  10% YOY VPN table growth
  Highly desirable to filter unwanted VPN routes
Multiple filtering approaches
  New RT filter address family
  Extended community ORF


Automated Route Target Filtering
BGP Feature
Derive RT filtering information from VPN RT import lists automatically
Exchange filtering info via RT filter AF or extended community ORF
Translate filter info received from neighbors into outbound filtering policies
Generate incremental updates for received RT update queries
Incremental deployment possible/desirable


Automated Route Target Filtering
BGP Feature
[Figure: PE-3 (VRF-Blue, VRF-Red), PE-4 (VRF-Red, VRF-Green), PE-1 (VRF-Green, VRF-Purple), and PE-2 (VRF-Purple, VRF-Blue) exchange RT-Constraint NLRI with route reflectors RR-1 and RR-2. NLRI shown: {VRF-Blue, VRF-Red}; {VRF-Red, VRF-Green}; {VRF-Green, VRF-Purple}; {VRF-Purple, VRF-Blue}; {VRF-Blue, VRF-Red, VRF-Green}; {VRF-Green, VRF-Purple, VRF-Blue}]
Improves PE and RR scaling and performance by sending only relevant VPN routes
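The outbound filtering step can be sketched as a set intersection between a route's route targets and the membership a neighbor advertised. As in the figure, VRF names stand in for route targets here; the route list and function name are made up for illustration.

```python
def routes_to_send(vpn_routes, rt_membership):
    """Keep routes that carry at least one route target the neighbor asked for."""
    return [route for route in vpn_routes if route["rts"] & rt_membership]


vpn_routes = [
    {"prefix": "10.1.0.0/16", "rts": {"VRF-Blue"}},
    {"prefix": "10.2.0.0/16", "rts": {"VRF-Green"}},
    {"prefix": "10.3.0.0/16", "rts": {"VRF-Red", "VRF-Purple"}},
]

# Membership as advertised by PE-3 in the figure.
pe3_membership = {"VRF-Blue", "VRF-Red"}
sent_to_pe3 = routes_to_send(vpn_routes, pe3_membership)
```

The VRF-Green route is never sent to PE-3, so PE-3 does not spend CPU receiving and discarding it.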


BGP L3VPN Over MGRE
BGP Feature
Providers want to offer VPN service without using MPLS
  MPLS is powerful, but complex
  Replace MPLS with MGRE tunnel for forwarding
Earlier tunnel solution is complex to configure on PE
  Manual tunnel creation (source interface, mode)
  RIV (Resolve-in VRF)
  Static default route to tunnel in RIV
  Route map sets nexthop in RIV for recursive lookup


BGP L3VPN Over MGRE
BGP Feature
New feature streamlines PE config
  User creates encapsulation profile
  Automatic BGP discovery of source and remote endpoints
  BGP inbound route map associates routes with profile
  Profile used to set up forwarding
Tunnel endpoints created/destroyed dynamically
No RIV, no static default route, no recursive lookup, simple config

Dynamic Neighbor Discovery
BGP Feature
BGP passively listens to configured address range for incoming sessions
BGP neighbor dynamically created
  Remote address is source of TCP connection
  Config template associated with listen range is applied
Provisioning
  No manual config necessary on hub for new clients
  Significant reduction in config overhead
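The listen-range match can be sketched with Python's standard ipaddress module. The ranges and template names below are hypothetical; the sketch shows only how the source address of an incoming TCP connection selects a template.

```python
import ipaddress

# Hypothetical listen ranges mapped to hypothetical template names.
LISTEN_RANGES = {
    ipaddress.ip_network("192.0.2.0/24"): "SPOKE-TEMPLATE",
    ipaddress.ip_network("2001:db8::/32"): "SPOKE-V6-TEMPLATE",
}


def accept_connection(source_address):
    """Create a dynamic neighbor if the TCP source falls inside a listen range."""
    source = ipaddress.ip_address(source_address)
    for network, template in LISTEN_RANGES.items():
        if source.version == network.version and source in network:
            return {"neighbor": str(source), "template": template}
    return None  # outside every listen range: no neighbor is created


spoke = accept_connection("192.0.2.77")
stranger = accept_connection("198.51.100.1")
```

Adding a new spoke inside an existing range then requires no change on the hub.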


BGP L2VPN Auto-discovery
BGP Feature
Allows auto-discovery of LDP L2VPNs
Existing support for inter-AS option A/C
New support for inter-AS option B


BGP Enhanced Route Refresh
BGP Feature
Route Refresh modified to send Refresh Start-of-RIB and Refresh End-of-RIB
Force cleanup of stale routes in ADJ-RIB-IN after receiving Refresh End-of-RIB
  Provided timer support in case Refresh End-of-RIB is not received
  Provided timer support to generate Refresh EOR
Allows cleanup of stale routes after route refresh is done
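The receiver side of this exchange amounts to mark-and-sweep over the ADJ-RIB-IN. The sketch below is a simplified model with hypothetical class and method names; timer handling is reduced to the comment on `refresh_end`.

```python
class AdjRibIn:
    """Toy ADJ-RIB-IN that tracks which routes are stale during a refresh."""

    def __init__(self, routes):
        self.routes = dict(routes)  # prefix -> attributes
        self.stale = set()

    def refresh_start(self):
        """Refresh Start-of-RIB: every existing route is stale until re-announced."""
        self.stale = set(self.routes)

    def update(self, prefix, attrs):
        self.routes[prefix] = attrs
        self.stale.discard(prefix)

    def refresh_end(self):
        """Refresh End-of-RIB (or timer expiry): purge routes not re-announced."""
        removed = sorted(self.stale)
        for prefix in removed:
            del self.routes[prefix]
        self.stale.clear()
        return removed


rib = AdjRibIn({"10.1.0.0/16": "attrs-a", "10.2.0.0/16": "attrs-b"})
rib.refresh_start()
rib.update("10.1.0.0/16", "attrs-a")  # only this prefix is re-announced
removed = rib.refresh_end()
```

Without the start and end markers the receiver cannot tell that 10.2.0.0/16 was left out of the refresh, so the stale route would linger.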


BGP Route Consistency Checker
BGP Feature
Provides consistency checking of BGP nexthops and labels
  Same nexthops across different paths should have same labels for a given prefix
Check outbound policies against ADJ-RIB-OUT
CLI to configure and run consistency checker
Force Route Refresh to fix issues or notify operator
Ability to detect stale nexthops or labels
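The nexthop/label rule can be expressed as a grouping check: for a given prefix and nexthop, more than one distinct label signals an inconsistency. The path records and function name below are illustrative, not the checker's real data model.

```python
from collections import defaultdict


def find_label_mismatches(paths):
    """Return (prefix, nexthop) pairs that appear with more than one label."""
    labels = defaultdict(set)
    for path in paths:
        labels[(path["prefix"], path["nexthop"])].add(path["label"])
    return sorted(key for key, seen in labels.items() if len(seen) > 1)


paths = [
    {"prefix": "10.1.0.0/16", "nexthop": "192.0.2.3", "label": 100},
    {"prefix": "10.1.0.0/16", "nexthop": "192.0.2.3", "label": 101},  # inconsistent
    {"prefix": "10.1.0.0/16", "nexthop": "192.0.2.4", "label": 200},
    {"prefix": "10.2.0.0/16", "nexthop": "192.0.2.3", "label": 300},
]
mismatches = find_label_mismatches(paths)
```

A real checker would then trigger a route refresh or notify the operator for each flagged pair.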


BGP MVPNs
BGP Feature
Support for BGP-based MVPNs
  Support for BGP AD and C-multicast routing within an AS
  Next release to provide Inter-AS support
Support for SAFI 129 (VPN equivalent of SAFI 2)
Helps avoid PIM soft state refresh in the provider network
Allows MVPN to scale by using standard BGP-based VPN filtering mechanism


BGP Origin Validation
BGP Feature
Origin validation for eBGP routes
  Next release to cover origin validation for locally sourced routes
Support client functionality of RPKI RTR protocol
  Separate database to store record entries from the cache
Support to announce path validation state to iBGP neighbors using a well-known path validation state extended community
Modified route policies to incorporate path validation states
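The validation states can be illustrated with a simplified version of the RFC 6811 matching rules. The sketch assumes the records received from the cache are held as (prefix, max length, origin AS) tuples; the function name and sample record are hypothetical.

```python
import ipaddress


def validate_origin(prefix, origin_as, vrps):
    """Classify a route as 'valid', 'invalid', or 'not-found'.

    vrps is a list of (prefix, max_length, asn) tuples, standing in for the
    record entries received from the RPKI cache.
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp_prefix, max_length, asn in vrps:
        vrp_net = ipaddress.ip_network(vrp_prefix)
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue  # this record does not cover the route
        covered = True
        if route.prefixlen <= max_length and asn == origin_as:
            return "valid"
    return "invalid" if covered else "not-found"


vrps = [("192.0.2.0/24", 24, 64500)]
state_ok = validate_origin("192.0.2.0/24", 64500, vrps)
state_wrong_as = validate_origin("192.0.2.0/24", 64501, vrps)
state_too_long = validate_origin("192.0.2.0/25", 64500, vrps)
state_unknown = validate_origin("198.51.100.0/24", 64500, vrps)
```

A route policy can then act on the resulting state, for example by lowering the preference of invalid routes.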


BGP Graceful Shutdown
BGP Feature
Able to gracefully shut down BGP neighbors
Provides a CLI knob to configure local preference and attach a provider-specific community
  Idea is to de-preference routes with a lower local preference or a well-known, provider-specific community
  CLI knob is an extension to the existing neighbor shutdown command
Mechanism to gracefully shut down the peer while minimizing impact on traffic


BGP Route Servers
BGP Feature
Designed to be used at Internet exchange points
  Alternative to eBGP full mesh
Does eBGP route reflection without adding its own AS to the AS path
Support for IPv4 and IPv6 AFI
Allow customized bestpaths for RS clients
  Policy dictates which path gets to be announced to RS clients
Allows Internet exchange points to scale their eBGP peering by avoiding full mesh
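Two properties of a route server are simple to demonstrate: it passes the AS path on without prepending its own AS, and it replaces a full mesh of sessions with one session per member. The function names and AS numbers below are illustrative, and the session count assumes a single route server.

```python
def ebgp_advertise(as_path, local_as):
    """Ordinary eBGP speaker: prepend its own AS before advertising."""
    return [local_as] + list(as_path)


def route_server_advertise(as_path):
    """Route server: pass the path on without adding its own AS."""
    return list(as_path)


def session_count_full_mesh(members):
    """Every member peers with every other member."""
    return members * (members - 1) // 2


def session_count_route_server(members):
    """Every member peers only with the (single) route server."""
    return members


via_router = ebgp_advertise([64501], local_as=64500)
via_route_server = route_server_advertise([64501])
mesh_sessions = session_count_full_mesh(100)
rs_sessions = session_count_route_server(100)
```

For 100 exchange members the session count falls from 4950 to 100, and the AS path seen by clients is no longer than it would be with direct peering.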


Feature Release Matrix

Feature                           12.2XN (XNC, XND)  S Train         T Train
4-Byte AS Support                 Yes                12.2(33)SRE     15.0(1)M
Dynamic Neighbors                 No                 XE3.1/15.0(1)S  15.1(2)T
Automated Route Target Filtering  No                 XE3.2/15.1(1)S  15.2T
BGP L3VPN over MGRE               No                 XE3.1/15.0(1)S  Yes
BGP L2VPN AD IAS Option B         No                 XE3.4/15.1(3)S  Yes
BGP Enhance Route Refresh         No                 XE3.4/15.1(3)S  15.2(3)T
BGP Route Consistency Checker     No                 XE3.3/15.1(2)S  15.2(3)T
BGP MVPNs                         No                 XE3.6/15.2(2)S  15.2(3)T
BGP Origin Validation             No                 XE3.5/15.2(1)S  15.2(4)M
BGP Graceful Shutdown             No                 XE3.6/15.2(2)S  15.3(1)T
BGP Route Server                  No                 XE3.3/15.1(2)S  15.2(3)T

Component Code: S Train, T Train


Scaling & Performance Results



7200 RR Scale Results
Convergence tests conducted for 350 VPN RR clients, 1M - 1.5M VPN routes on 7200-NPE G2
[Figure: convergence time in seconds (0 to 1200) versus number of routes (0 to 2M). Series: 12.2(31)SB14/G2, 12.2(31)SB16/G2, 12.2SRE/G2, 12.0(33)S2/PRP2]


ASR1K RR Scale Results

              ASR1000 RP1 (2GB)  ASR1000 RP1 (4GB)  ASR1000 RP2 (8GB)  ASR1000 RP2 (16GB)
IPv4 Routes   2M*                7M*                12M*               29M*
VPNv4 Routes  2M                 6M                 10M                24M
IPv6 Routes   500K               1.5M               3M                 7M
VPNv6 Routes  2M                 5M                 9M                 21M
BGP Sessions  4000               4000               8000               8000

*Tested with BGP Selective RIB Download feature for IPv4 for dedicated RR. This feature will be implemented for IPv6 address family in future releases.


RR Software Recommendations
7200 NPE G1/G2
  12.2(31)SB18
  12.2(33)SRE
ASR1K
  12.2(33)XNC
  12.2(33)XND
  12.2(33)XNE


C10K PRE2/PRE4 PE Scalability
Testing with PRE2
  550K total VPNv4, VPNv6 prefixes with convergence under 10 minutes
  1200 eBGP sessions, 4 iBGP sessions, no NSR/GR
  Should scale higher depending on prefix/attribute mix
Testing with PRE4
  800K-1M total VPNv4, VPNv6 prefixes
  Same profile as listed above


ASR1K PE Scalability
Uni-dimensional Scale

RP1/ESP10              RP2/ESP20                       Metric
1K                     4K                              VRF
1M (RP1 4GB)           1M (RP2 8GB), 4M (RP2 16GB)     VPNv4 routes (use per VRF label allocation, assume 20% local routes and 80% routes learned from remote PEs)
1M                     1M                              MPLS label space
4K/8K/32K              4K/8K/64K                       VLAN (per port/per SPA/per system)
1K/3K/4K/1K            1K/3K/4K/1K                     ATM PVC (per port/per SPA/per system/with OAM enabled)
4K                     8K                              eBGP PE-CE sessions
1K                     1K                              OSPF PE-CE sessions
1K                     1K                              EIGRP PE-CE sessions
1K                     4K                              RIP PE-CE sessions
1K                     2K                              Link/Targeted LDP sessions
1K                     1K                              Number of Traffic Engineering Tunnel Head
15K                    15K                             Number of Traffic Engineering Tunnel Midpoint
1K (max 200 VP mode)   1K (max 200 VP mode)            ATM CRoMPLS AC/PW (VC/VP mode)
8K                     16K                             EoMPLS AC/PW
4K/256                 4K/256                          Unique QOS service policy/class maps per service policy
4K/50K                 4K/100K                         ACL/ACE
8Mpps/10Gbps           10Mpps/20Gbps                   Non-drop rate (with uRPF, security ACL and ingress policing on VLAN subinterfaces)
1500                   5500                            FIB download/Convergence speed (prefixes/second)

Dual Stack Scalability
Forward-able IPv6 /64 Prefixes with 500K IPv4 /24 Prefixes Present
[Figure: bar chart of additional IPv6 /64 prefixes per platform with 500K IPv4 /24 prefixes present; values are V6 prefixes, with the change from V6-only in parentheses. Labels as extracted: ASR1000; CRS 960k (-40k); XR12000 and GSR IOS 220k (-250k), 210k (-240k); 7600 269k (-246k); 7750 32k; T640 440k (-310k); MX960 410k (-310k); 4.6M (-600k)]


Backup Slides


Agenda
XR BGP Feature Set - Current releases
XR BGP new Features deep-dive - Multi-instance/Multi-AS, RT-Constrain, Add-path, PIC, 3107 Labeled architecture, Attribute Error handling
XR BGP Roadmap and Q&A

Presentation_ID

2006 Cisco Systems, Inc. All rights reserved.

Cisco Confidential


XR BGP Feature Set


IOS-XR 3.8.X Release Features

Major Features
1. BGP NSR
2. BGP Session Scale 1700 (PRP-2/CRS) 2000 (C12k/PRP-3)
3. BGP 3107 Architecture

Deployment Knobs
1. IPv6 over IPv4 session
2. NH change
3. Reset Weight on import
4. Disable connected check
5. Per neighbor enforce-first as
6. Ability to change any attribute on Route-reflector
7. Support for multiple cluster-id in BGP
8. Allow-as-in changes to avoid hard reset
9. Route-reflection Functionality under VRF
10. Min-acceptable hold-timer knob
11. Local-as replace-as knob

Internal
1. show bgp prefix detail
2. Net timestamp within show bgp prefix
3. show bgp sessions command
4. Show bgp nsr
5. Show bgp table <afi> <safi>
6. Additional VPN stats into o/p of show bgp process performance detail command

IOS-XR 3.9.X Release Features

Major Features
1. BGP PIC Unipath
2. BGP Best-External
3. BGP Session Scale 1700 (PRP-2/CRS) 2400 (C12k/PRP-3)
4. Ability to support aggressive timers with large Session Scale
5. L2VPN BGP Autodiscovery with BGP/LDP Signaling
6. Implementation of v0 of draft Error handling for Optional Transitive Attributes

Deployment Knobs
1. BFD for directly connected iBGP peers
2. BGP BFD for IPv6 Sessions
3. IPv6 eBGP Multipath Support
4. Per VRF MDT Source Selection Capability
5. Ability to configure sub-second MRAI timer
6. BGP Local-as dual-as knob
7. MVPN w/ CsC
8. BGP NBR Adj change msg enhancement to show more info
9. 6PE per VRF/per-CE label allocation (3.9.2)

Internal
1. Async Socket APIs to improve BGP-TCP interaction
2. Import/Label thread optimizations
3. Control plane batching
4. Ltrace optimization
5. BGP MIB Perf improvements (Caching / Batching)
6. BGP MIB traps batching
7. Moved BGP MIB implementation to RFC 4273 from draft
8. Added support for additional afi/safi
9. RPL optimization in case policy name is different but content is the same

IOS-XR 4.0.X Release Features

Major Features
1. BGP Add-path
2. Support for AIGP
3. AIGP to Cost-community conversion
4. AIGP to MED conversion
5. MVPN Hub & Spoke support
6. BGP changes for PIC Edge for labeled unicast (default VRF)
7. X86 Support for CRS-3
8. Parallel update-gen during route-refresh
9. Native as-path matches in as-set
10. Deterministic regex engine porting & usage
11. IPv6 Peer table MIB and IPv6 trap support
12. Netflow support for L3VPN and IPv6

Deployment Knobs
1. IOS message when OPEN with unsupported hold-timer value received
2. ORF optimization for update-group allocation
3. Next-hop self knob on RR
4. eBGP NH unchanged knob
5. BGP remove-private-as enhancement
6. Support for prefix-set or route-policy names with colons in it
7. XML support for show rpl
8. IGP metric change propagation timer knob
9. 6PE iBGP PE-CE Support
10. 6PE per VRF/per CE label
11. Allow-as-in and as-override knobs for default VRF sessions (4.0.2)

Internal
1. Show command enhancement for RIB install stats/flags
2. Commit replace optimization
3. BGP attribute ID allocation change
4. Support for 4-byte-AS in the Cisco

70

4.1 and 4.2 Release Features


4.1 Release
1. BGP RT-Constrain
2. Legacy PE support with RT-constrain
3. ISSU
4. mLDP BGP Autodiscovery
5. Selective VRF Download
6. BGP NSR 5k Session Scale (ASR9k)
7. BGP Accept-own (4.1.1)

4.2 Release (Not FCSed yet)
1. Support for Multiple AS
2. Multi-instance BGP
3. RPKI
4. BGP Session Scale (CRS-3)
5. draft-chen-ebgp-errorhandling-00.txt implementation


Update-generation Optimizations
Incremental Update-generation with RT Constrain
Only send relevant updates in response to a route-refresh request instead of the entire BGP table

Parallel update-generation
Ensures that BGP convergence is not affected by servicing route-refresh requests. Prefix updates are prioritized over the refresh to avoid head-of-line blocking.

Optimized CE update-generation
Scoped walk of the CE VRF table, instead of an entire VPN table walk, to generate updates. Distinct PE/CE advertise bits in use

Multi instance BGP and Multi-AS Support (IOS-XR 4.2.0)


What is Multi-Instance BGP?


A new IOS-XR BGP architecture to support multiple instances along the lines of OSPF instances
Each BGP instance is a separate process running on the same or a different RP/DRP node
The BGP instances do not share any prefix table between them
No need for a common adj-rib-in (bRIB) as is the case with distributed BGP
The BGP instances do not communicate with each other and do not set up peering with each other
Each individual instance can set up peering with another router independently

What is Multi-AS BGP?


It will be possible to configure each instance of a multi-instance BGP with a different AS number
Global address families can't be configured under more than one AS, except vpnv4 and vpnv6
VPN address-families may be configured under multiple AS instances that do not share any VRFs
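The two Multi-AS constraints can be expressed as a small validity check. This is an illustrative Python sketch, not IOS-XR behaviour: the `Instance` record and the `check_multi_as` function are hypothetical names, and the way the two rules are combined is an assumption drawn only from the wording of this slide.

```python
from dataclasses import dataclass, field

# The only address families the slide allows under more than one AS.
VPN_AFS = {"vpnv4", "vpnv6"}


@dataclass
class Instance:
    """Hypothetical model of one BGP instance (not an IOS-XR data structure)."""
    name: str
    asn: int
    address_families: set = field(default_factory=set)
    vrfs: set = field(default_factory=set)


def check_multi_as(instances):
    """Return a list of rule violations; an empty list means the layout is valid."""
    errors = []
    for i, a in enumerate(instances):
        for b in instances[i + 1:]:
            if a.asn == b.asn:
                continue  # the rules below concern instances in different ASes
            shared_afs = a.address_families & b.address_families
            # Rule 1: a global AF may appear under one AS only.
            for af in sorted(shared_afs - VPN_AFS):
                errors.append(f"{af} configured under AS {a.asn} and AS {b.asn}")
            # Rule 2: VPN AFs may repeat, but the instances must not share VRFs.
            if shared_afs & VPN_AFS:
                for vrf in sorted(a.vrfs & b.vrfs):
                    errors.append(f"VRF {vrf} shared by AS {a.asn} and AS {b.asn}")
    return errors
```

A layout with `ipv4` under AS 1 and `vpnv4` under AS 2 passes; repeating `ipv4` under both ASes, or reusing a VRF across two VPN instances, is reported.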


Why Multi-Instance/Multi-AS?
It provides a mechanism to consolidate the services provided by multiple routers using a common routing infrastructure into a single IOS-XR router
It provides a mechanism to achieve AF isolation by configuring the different AFs in different BGP instances
It provides a means to achieve higher session scale by distributing the overall peering sessions between multiple instances
It provides a mechanism to achieve higher prefix scale (especially on a RR) by having different instances carrying different BGP tables
IOS-XR CRS Multi-chassis systems can be used optimally by placing the different BGP instances on different RP/DRPs

Deployment Route-reflector
[Diagram: multi-chassis system with Rack1 to Rack4, each holding RP or DRP nodes and line cards (LC); separate BGP instances labeled BGP (VPNv6), BGP (VPN), BGP (IPv4), and BGP (IPv6)]


Deployment AF Isolation
[Diagram: Rack1 to Rack4, each with RPs and line cards (LC); one BGP instance per address family: BGP (VPNv4), BGP (VPNv6), BGP (IPv4), BGP (IPv6)]


Deployment Service Integration


[Diagram: Rack1 to Rack4 with DRPs and line cards (LC); one BGP instance per service and AS: BGP AS1 (L3VPN), BGP AS2 (L2VPN), BGP AS3 (Internet)]


Deployment Session Scale increase


[Diagram: Rack1 with RP, DRP, and line cards (LC); three BGP AS1 (L3VPN) instances, with the RR sessions and the PE-CE sessions divided among them]


Configuration Example
Instance Internet:
router bgp 1 instance internet
 bgp router-id 10.0.0.1
 address-family ipv4 unicast
 !
 neighbor 10.0.101.1
  remote-as 100
  address-family ipv4 unicast
   route-policy inbound in
   route-policy outbound out
  !
 !
!

Instance VPN:
router bgp 2 instance vpn
 bgp router-id 20.0.0.1
 address-family vpnv4 unicast
 !
 neighbor 20.0.101.1
  remote-as 200
  address-family vpnv4 unicast
   route-policy inbound in
   route-policy outbound out
  !
 !
!


Peering Example
[Diagram: RR1 (active) and PE1 each run a BGP VPNv4 instance and a BGP IPv4 instance; RR2 (backup) runs a single BGP. Address labels in the figure: 20.0.0.1, 20.0.0.2, 10.0.0.1, 10.0.0.2, 30.0.0.1, 30.0.0.2]

Multi-instance PE1 peering with a multi-instance RR1 and a regular BGP on RR2
Each BGP instance on PE1 has a peering with the corresponding instance of BGP on RR1
Separate loopbacks needed on RR2 due to use of multi-instance BGP

RT Constrain and Legacy PE Support IOS-XR4.1.0


RT-Constrain Feature Overview


In L3VPN, PE routers use Route Target extended communities to control the distribution of routes into the destination VRFs. This enables the separation of the VPNs. It is common for PEs to receive more than the routes they are interested in and then filter out the unwanted routes for VPNs that they are not connected to. This results in waste of router resources in cases where VPN membership is sparse (not many PEs are connected to the same VPN). The sender generates and transmits a routing update and the receiver has to filter out the unwanted routes. It would be beneficial to avoid the generation of such route updates in the first place.


RFC 4684 (Constrained Route distribution or RT constrain)


PEs send RT membership information to the RR (carried in a new SAFI in BGP)
RR creates multiple filter groups (one per PE) corresponding to the RT membership of the PEs
RR sends to PEs only the routes for RTs configured on the PEs
  PEs receive and filter fewer routes (less processing overhead): improved scale & stability

RR collects the RT membership information from its clients and advertises that set to the neighbouring RRs
RR receives and stores only the routes for the RTs that PEs in its region are interested in
  RRs store and process fewer routes: improved scale & stability
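The filtering idea behind RT constrain can be sketched in a few lines of Python. This is a toy model under stated assumptions: routes are plain dictionaries, RT values are strings, and the function names `routes_for_pe` and `rr_membership` are hypothetical, not part of any router implementation.

```python
def routes_for_pe(vpn_routes, pe_rt_membership):
    """Return only the routes carrying at least one RT the PE has signalled."""
    wanted = set(pe_rt_membership)
    return [r for r in vpn_routes if wanted & set(r["rts"])]


def rr_membership(client_memberships):
    """Union of the clients' RT memberships: the set an RR would advertise
    to neighbouring RRs so that it only receives routes its region needs."""
    union = set()
    for rts in client_memberships.values():
        union |= set(rts)
    return union
```

With PE1 interested in RT `1:1` only, a route tagged `2:2` is never generated towards PE1, which is the saving the slide describes.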

Advantages
Reduce load on PE (not having to receive all network routes and filter)
Reduce load on RR (not having to receive and store all network routes)
Improved stability due to reduced load on RR and PE

[Diagram: PEs in Region1 and Region2 send RT membership NLRI to their RRs; each region has an RR in plane1 and an RR in plane2]


RT constrain Implementation features


Single update-generation walk for the neighbors with common outbound characteristics. Will not increase the number of update-groups on the RR.
Policy / filtering optimizations for efficient filtering
Incremental update generation sends only relevant delta VPN routes to a peer after a new RT update is received
Support for default RT announcement for PEs to avoid having to store membership RT information
Automatic default RT to iBGP peer if one of the RRCs is not RT-constrain capable


Migration path
RT constrain requires PE to send RT membership information to the RR using NLRIs
New code required on PE to do this

RR creates an RT filter list based upon the RT membership information received from the PEs. It propagates this list to other RRs in the iBGP mesh
New code required on RR to do this

Thus RT constrain requires both RR and PEs be upgraded


Limitations
Vanilla RT constrain doesn't support PEs that are not upgraded, a.k.a. legacy PEs
Legacy PEs cannot signal RT membership information to the RR automatically
Thus a legacy PE will have to receive and filter routes from ALL other RTs even though it is not interested in them

Even if one PE doesn't get upgraded, the corresponding RR has to store ALL routes for the entire network (or plane)
Thus the benefit is seen on the RR only if ALL PEs in the cluster are upgraded
4.1 XR implements legacy PE support in addition to RFC 4684, so that not all PEs need to be upgraded

Legacy PE support Solution description


Use existing VPN advertisement mechanism to convey RT membership from the legacy PEs
Requires new configuration step on those PEs

Upgraded PEs advertise RT constrain NLRIs
RR processes both advertisement mechanisms of RT membership information (from legacy and upgraded PEs)
Requires new code on the RRs to build RT filter list from both advertisement mechanisms

RRs translate the legacy PE RT membership information to equivalent RT constrain NLRIs to propagate to other RRs


Legacy PE support
Upgraded PEs propagate RT membership information using the rt-filter SAFI and receive a reduced set of routes from RRs after RT filtering.
The RR doesn't propagate legacy PE VPN routes to iBGP peers; it sends the equivalent converted RT SAFI NLRI.
Legacy PEs propagate RT membership using VPN routes with a special community and receive a reduced set of routes from RRs after filtering.

[Diagram: Region1 with an RR in plane1 and an RR in plane2 serving six PEs]

Legacy PE support PE & RR behavior


Legacy PE
Identify all VRFs provisioned on the PE and collect their import RTs
Create a special VRF, called the route-filter VRF
Originate one or more prefixes in that VRF and attach the collected import RTs as equivalent export RTs, so as to distribute the communities evenly across the routes
Attach a standards action Community value (CV), NO_ADVERTISE, and NO_EXPORT to each route & advertise to RRs

RR
Identify route A/B from the legacy PE by the CV, for retrieving RT membership information (& filter VPN routes to legacy PEs)
Translate the corresponding information to equivalent RT constrain NLRIs to propagate further
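The legacy PE steps above can be sketched as follows. This is a minimal Python illustration, assuming a round-robin split is an acceptable reading of "distribute evenly"; the function name, the dictionary layout, the `"CV"` marker string, and the example prefixes are all hypothetical.

```python
def build_route_filter_routes(vrf_import_rts, prefixes, marker_community="CV"):
    """Build the special routes a legacy PE would originate in its route-filter VRF.

    vrf_import_rts: mapping of VRF name -> list of import RTs
    prefixes: the prefixes originated in the route-filter VRF
    """
    # Step 1: collect the import RTs of every provisioned VRF (deduplicated).
    collected = sorted({rt for rts in vrf_import_rts.values() for rt in rts})

    # Steps 2 and 4: one route per prefix, each tagged with the marker
    # community plus NO_ADVERTISE and NO_EXPORT.
    routes = [
        {
            "prefix": p,
            "export_rts": [],
            "communities": [marker_community, "NO_ADVERTISE", "NO_EXPORT"],
        }
        for p in prefixes
    ]

    # Step 3: spread the collected RTs evenly (round-robin) as export RTs.
    for i, rt in enumerate(collected):
        routes[i % len(routes)]["export_rts"].append(rt)
    return routes
```

An RR-side implementation would then recognize these routes by the marker community and read the export RTs back as the PE's RT membership.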

Legacy PE support Illustration


Each PE generates special routes attaching Import RTs for each VPN configured.
The RD is configured to be the same value across all legacy PEs.
The RR identifies A/B by the reserved CV that has been attached.
Based upon the commonality of the A/Bs, the RR creates a set of filters to be applied to each session that an A/B was received on.

[Diagram: three legacy PEs advertise route A/B to the RR]
PE1: VPNA RT 1,1; VPNB RT 2,2; VPNC RT 3,3; advertises A/B:RD1 1,2,3 CV-C
PE2: VPNA RT 1,1; VPND RT 4,4; VPNE RT 5,5; advertises A/B:RD1 1,4,5 CV-C
PE3: VPNA RT 1,1; VPNB RT 2,2; VPNC RT 3,3; advertises A/B:RD 1,2,3 CV-C


BGP Add-path IOS-XR4.0.0


Add-path in XR
Add-path: IETF add-path draft, draft-ietf-idr-addpaths-02
Goal: to improve path diversity in BGP topologies
Assumption: multiple paths to the same prefix are generally available at the edge of the network. Multiple analyses show they are.

Application
Fast Connectivity Restoration / PIC
Load balancing
Eliminate route oscillation
Churn reduction

[Diagram: prefix Z/p is reachable via PE1 and PE2; RR1 and a backup-path-RR reflect the paths to PE3]


Problem: Data hiding


Path reduction at two places:
Less preferred border (AS or confed) routers don't announce their paths to iBGP
RRs (or confed-ebgp peers) hide all but the best path

Thus ingress routers most often know about one exit point only
When that exit point fails, traffic loss is proportional to control plane convergence
  Local repair techniques can't get triggered

Not knowing about more exit points also means the ingress routers can't do load balancing
Not having path diversity has other issues as well:
  Route oscillation: a protocol bug


Add-path application best-external


Less preferred border routers should announce their own path instead of withdrawing it
[Diagram: RR1, PE1, PE2, PE3 and prefix Z/p. Figure labels: steps 1 to 3; "Z/p, Locpref 200" (twice); "Z/p, Locpref 100" at PE1; "Withdraw"; "Best-edge"]

Best-external draft-ietf-idr-best-external-00.txt

Add-path draft overview


Extend NLRI format to include path-ID (so that multiple paths for the same prefix can be advertised).
  Existing NLRI: Length, Prefix
  Add-path NLRI: Path ID, Length, Prefix

Path-ID is application specific, but mostly an opaque ID that is pair-wise
  Example: id1:z/p, id2:z/p

Capability negotiation for add-path support per [AFI, SAFI] along with a send/receive flag for each
Ingress routers most often need the support for only receiving multiple paths
Implementing the receive part is quite straightforward
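The extended NLRI layout can be illustrated with a short Python encoder. It is a sketch under one stated assumption: the Path ID is encoded as a 4-octet field in network byte order, as in the add-path specification later published as RFC 7911. The function name `encode_nlri` is hypothetical.

```python
import ipaddress
import struct


def encode_nlri(prefix, path_id=None):
    """Encode one IPv4/IPv6 prefix as a BGP NLRI.

    Without path_id: Length (1 octet, in bits) + Prefix (minimum whole octets).
    With path_id:    Path ID (assumed 4 octets) + Length + Prefix.
    """
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # only the significant octets are sent
    body = bytes([net.prefixlen]) + net.network_address.packed[:nbytes]
    if path_id is None:
        return body
    return struct.pack("!I", path_id) + body
```

Two paths for the same prefix then differ only in the leading Path ID, for example `encode_nlri("10.1.0.0/16", 1)` and `encode_nlri("10.1.0.0/16", 2)`.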

Applications
Fast convergence / connectivity restoration: as the ingress routers have visibility to more paths, they can switch to the backup paths faster once the primary path goes away. Requires backup paths to be sent.
Load balancing: as the ingress routers have visibility to more paths, they can do ECMP on multiple paths. Requires either backup paths or all paths to be sent.
Churn reduction: since alternate paths are available, withdraws can be suppressed (implicit update).
Route oscillation: see RFC 3345 for scenarios. Requires group best paths (in some cases all paths) to be sent.

Implementation: what does it change?


What paths to advertise? (when we don't want to advertise all)
  Selecting backup paths / second-best
  Selecting group bests

Update generation
  Adj-RIB-Out is per-prefix today since only the best path is sent
  Needs change to advertise multiple paths

Update reception
  Control plane: process multiple instances of a prefix, select second-best


Add-path: selecting second-best


Simple rule
1. Select best
2. Remove all paths whose next-hop == best's (including best)
3. Run bestpath selection again on the remaining paths to select backup
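The simple rule translates directly into code. In this Python sketch the real BGP decision process is replaced by a stand-in (`best_path`, which prefers higher local preference and then the lower path ID); that stand-in, the dictionary fields, and the function names are assumptions made for illustration only.

```python
def best_path(paths):
    """Stand-in for BGP best-path selection (assumption: highest
    local_pref wins, ties broken by lowest id)."""
    if not paths:
        return None
    return max(paths, key=lambda p: (p["local_pref"], -p["id"]))


def select_best_and_backup(paths):
    """Apply the simple rule: pick best, drop every path that shares the
    best path's next-hop, then rerun selection to get the backup."""
    best = best_path(paths)
    if best is None:
        return None, None
    remaining = [p for p in paths if p["next_hop"] != best["next_hop"]]
    return best, best_path(remaining)
```

Removing all paths with the same next-hop as the best path ensures the backup survives the failure of that next-hop, which is the point of installing it.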


CLI
Global command, per address family, to turn on add-path in BGP
It can optionally accept a route policy where the policy matches on prefixes and sets one of the following:
  Select and send backup paths (& how many)
  Select and send group-best paths
  Send all paths

router bgp 7018
 address-family vpnv4 unicast
  additional-paths install backup
  additional-paths advertise
  additional-paths receive
  additional-paths selection route-policy xx


iBGP and Add-path


BGP speakers within an AS must have a consistent routing view, otherwise forwarding loops can occur
With add-path, it is thus important to maintain that property by the senders disseminating the same set of paths to each iBGP receiver
Each BGP speaker (receiver) can independently run the decision process with the consistent view and loop freedom will be guaranteed


Cost
Memory overhead
  Additional memory overhead on the receiving PE due to additional paths
  Additional memory overhead of maintaining per-path Adj-RIB-Out information

CPU cycle increase for update processing
  Update reception at edge routers increases proportional to #additional paths
  Update generation at aggregators also increases proportional to #additional paths

CPU cycle increase for other internal processing as well
  E.g. Next-hop trigger

BGP PIC-Edge


Feature Overview
Internet Service Providers provide strict SLAs to their Financial and Business VPN customers, where they need to offer sub-second convergence in the case of core/edge link or node failures in their network
Prefix Independent Convergence (PIC) has been supported in IOS-XR for a while for core link failures as well as edge node failures
The BGP Best-External project provides support for advertisement of the best-external path to the iBGP/RR peers when a locally selected bestpath is from an internal peer
The BGP PIC Unipath project provides a capability to install a backup path into the forwarding table to provide prefix independent convergence in case of a PE-CE link failure
NAG 09 Cisco Confidential 2009 Cisco Systems, Inc. All rights reserved.

106

End to End Service Availability Customer Uptime


[Diagram: CE1, PE1, CE2, PE2, PE3, and CE3 connected end to end, with BGP sessions to RR1 and RR2]
Edge segments: IP/OSPF/MPLS/BGP PIC, Improved Failure Detection L1/2 OAM & BFD
Core segment: IP/OSPF/MPLS/TE-FRR, Core Domain & GETS TE FRR
Edge Domain BGP PIC Sub-second convergence


EDCS-720331 2006 Cisco Systems, Inc. All rights reserved. Cisco Confidential

107

PIC unipath and Best-External


Create a primary-backup topology (primary = PE1-CE link, backup = PE2-CE link).
Make the PE1 exit point more preferable and the PE2 exit point less preferable (e.g. LOCAL_PREF configuration)
This makes PE2 select the iBGP path as best
But PE2's eBGP path should be advertised to increase path diversity and achieve much faster failover to the backup path.

Note: Add-path may still be a requirement to pass best-external paths through the route reflectors to ingress PEs (e.g. non-unique RD VPN design, non-VPN prefixes).

[Diagram: CE with prefix 1/8 attached to PE1 and PE2; RR and PE3 in the core. Best path: RD1:1/8 via PE1, LOCPREF=200. Best external path: RD2:1/8 via PE2, LOCPREF=100]

We are going to discuss


Ingress PE and its behavior on best egress PE failure
Primary PE and its behavior upon CE link failure
Backup PE and its behavior wrt. best external advertisement

[Diagram: CE with prefix 1/8 attached to PE1 and PE2; RR and PE3 in the core. RD1:1/8 via PE1, LOCPREF=200; RD2:1/8 via PE2, LOCPREF=100]


Steady State Traffic flow

All PEs including PE2 use PE1 as the exit point
No traffic sent via PE2
All PEs also have PE2's route as a potential backup

[Diagram: CE with prefix 1/8 attached to PE1 and PE2; RR and PE3 in the core. RD1:1/8 via PE1, LOCPREF=200; RD2:1/8 via PE2, LOCPREF=100]


Forwarding Table Setup


PE1
  IP 1/8: CE
  Label L1 (allocated for 1/8): CE

PE2
  IP 1/8: PE1, push [L1], [PE1 IGP label]
  Label L2 (allocated for 1/8): CE
  New with best-external

PE3
  IP 1/8: PE1, push [L1], [PE1 IGP label]

[Diagram: CE with prefix 1/8 attached to PE1 and PE2; RR and PE3 in the core. Best path: RD1:1/8 via PE1, LOCPREF=200. Best external path: RD2:1/8 via PE2, LOCPREF=100]


Traffic flow Primary link failure (with Backup path in forwarding)


Behavior at PE1
FIB detects CE failure
FIB will modify the BGP loadinfo to now point to the backup path (PE2)
Traffic is restored once the loadinfo touch-up is done
Since PE2 has pre-programmed the label pointing to CE, traffic will be forwarded to the CE.
BGP prefix independent convergence

PE1 forwarding entries
  IP 1/8: CE (active); PE2, push [L2], [PE2 IGP label] (backup)
  Label L1 (allocated for 1/8): CE (active); PE2, push [L2], [PE2 IGP label] (backup)

[Diagram: CE with prefix 1/8 attached to PE1 and PE2; RR and PE3 in the core. RD1:1/8 via PE1, LOCPREF=200; RD2:1/8 via PE2, LOCPREF=100]


Traffic flow Primary PE failure (with Backup path in forwarding)


Behavior at PE3
FIB detects PE1 failure upon IGP convergence
FIB will modify the BGP loadinfo to now point to the backup path (PE2)
Traffic is restored once the loadinfo touch-up is done
Since PE2 has pre-programmed the label pointing to CE, traffic will be forwarded to the CE.
BGP prefix independent convergence

[Diagram: CE with prefix 1/8 attached to PE1 and PE2; RR and PE3 in the core. RD1:1/8 via PE1, LOCPREF=200; RD2:1/8 via PE2, LOCPREF=100]

PE3 forwarding entries
  IP 1/8: PE1, push [L1], [PE1 IGP label] (active); PE2, push [L2], [PE2 IGP label] (backup)
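Why the failover is prefix independent can be shown with a toy model: every prefix that shares the same primary and backup points at one shared object, so a single update redirects all of them. The Python below is a sketch only; `LoadInfo`, `build_fib`, and `on_primary_failure` are hypothetical names that borrow the slide's "loadinfo" term and do not describe the actual IOS-XR FIB structures.

```python
class LoadInfo:
    """Hypothetical shared forwarding object: one primary and one backup path."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.primary_up = True

    def active_path(self):
        return self.primary if self.primary_up else self.backup


def build_fib(prefixes, loadinfo):
    """Every prefix references the same LoadInfo object (no per-prefix copy)."""
    return {prefix: loadinfo for prefix in prefixes}


def on_primary_failure(loadinfo):
    """One touch-up; its cost does not depend on how many prefixes share it."""
    loadinfo.primary_up = False
```

After `on_primary_failure` runs once, a lookup for any prefix in the table returns the backup path, without walking the prefixes.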


Configuration
Global (per-AF) and per-VRF knob to turn on best-external advertisement
router bgp 7018
 address-family vpnv4 unicast
  advertise best-external
 vrf cust_1
  address-family ipv4 unicast
   advertise-best-external [disable]

router bgp 7018
 address-family vpnv4 unicast
  additional-paths install backup
 vrf cust_1
  address-family ipv4 unicast
   additional-paths install backup [disable]


3107 (BGP Labeled Unicast) Architecture & AIGP Attribute IOS-XR (3.8.0 / 4.0.0)


ISP CORE with multiple IGP Areas


[Diagram: CE9, PE7, PE8, P3, P4, P5, P6, PE1, PE2, and CE0. Figure labels: iBGP; IGP+LDP (three areas)]


ISP Core
IGP runs in the core
May be segmented into different areas
IGP+LDP provides reachability to PEs in the network
May span one or more AS under the same administration
Problem: When PE scale increases, IGP database size increases
Problem: Convergence is affected


BGP 3107
BGP 3107 to carry PE reachability
BGP IPv4-label address-family sessions between PE and P routers
IGP+LDP still runs within areas but does not carry PE reachability across areas
Remote PE loopback is a BGP ipv4 labeled route in RIB
Nexthop for BGP service prefix (L3VPN, L2VPN) is a BGP 3107 route


BGP 3107 Architecture


[Diagram: CE9, PE7, PE8, P3, P4, P5, P6, ABR-RR, PE1, PE2, and CE0. Figure labels: VPNv4 (bgp); IGP+LDP]


BGP 3107 Pros


Higher PE scale
Add-path capability can be enabled for 3107 address-families to provide path diversity
PIC functionality to handle core link/router failures (future release)
AIGP attribute to enable use of more accurate (end-to-end) metrics


AIGP
IGPs run within a single administrative domain and select the best path between two nodes based on total distance/metric.
When a single administration runs multiple BGP networks, it can be desirable for BGP to select the best path based on the end-to-end metric
AIGP: new BGP attribute that carries the accumulated metric for an end-to-end path

Usage:
Originate the AIGP attribute for routes local to the AS
Accumulation: for a received route with an AIGP metric, if the router sets itself as nexthop, add the metric of the route to the nexthop to the existing value before advertising
Decision process: compare the AIGP metric of paths after the local-preference comparison step
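The three usage steps can be sketched in Python. This is a simplified model, assuming every candidate path carries an AIGP value and ignoring all later tie-breakers; the function names and the dictionary fields are hypothetical.

```python
def originate(prefix, local_pref=100, aigp=0):
    """Originate a route local to the AS with an initial AIGP metric."""
    return {"prefix": prefix, "local_pref": local_pref, "aigp": aigp}


def readvertise(route, igp_metric_to_next_hop, next_hop_self=True):
    """Accumulate: add the metric to the current next-hop, but only when
    this router sets itself as next-hop before advertising."""
    out = dict(route)
    if next_hop_self and "aigp" in out:
        out["aigp"] += igp_metric_to_next_hop
    return out


def select_best(paths):
    """Simplified decision: higher local_pref first, then lower AIGP."""
    return min(paths, key=lambda p: (-p["local_pref"], p["aigp"]))
```

For example, a route readvertised across two hops with metrics 10 and 5 carries AIGP 15 and is preferred over a copy that crossed a single hop with metric 30, provided local preference is equal.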

BGP Knobs to enable 3107/AIGP Solution


[Diagram: CE9, PE7, PE8, P3, P4, P5, P6, RR, PE1, PE2, and CE0, with VPNv4 (bgp) and IGP+LDP. Knob labels: IPv4 Label PIC (twice); IPv4 Label Add-path Send; AIGP originate; AIGP comparison; AIGP accumulate (twice)]


PIC for Labeled prefixes in default VRF


[Diagram: CE0, CE4, PE1, PE2, PE3, and RR, with IPv4-uni (ebgp) sessions to the CEs and IGP+LDP in the core. Labels: IPv4 Label AddPath Send; Allocate label (twice); Primary; Receive and install Additional path in FIB; Best External advertisement]



Cisco BGP Attribute filtering and error-handling


Overview
Attribute filtering
Unwanted optional transitive attributes, such as ATTR_SET or a CONFED segment in AS4_PATH, have caused outages in some equipment.
Prevent unwanted/unknown BGP attributes from hitting legacy equipment.
Block specific attributes
Block a range of non-mandatory attributes

Error-handling
draft-ietf-idr-optional-transitive-04.txt
Punishment should not exceed the crime
Gracefully fix or ignore non-severe errors
Avoid session resets for most cases


Architecture
Malformed BGP Updates
Invalid Attribute Contents Wrong Attribute Length

Transitive Attributes
Unknown Attributes Unwanted Attributes

Attribute Filtering

Error-handling

NLRI processing


Attribute filtering
First level of inbound filtering
Filtering is configured as a range of attribute codes and a corresponding action to take
Actions
  Discard the attribute
  Treat-as-withdraw

Applied when parsing each attribute in the received Update message
When an attribute matches the filter, further processing of the attribute is stopped and the corresponding action is taken
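The filter described above, a range of attribute codes mapped to an action, can be sketched in Python. The tuple layouts, the first-match rule, and the function name `apply_filters` are assumptions for illustration; attribute code 200 in the usage note is an arbitrary example value, not a reference to a specific attribute.

```python
DISCARD = "discard-attribute"
WITHDRAW = "treat-as-withdraw"


def apply_filters(attributes, filters):
    """Apply attribute filters while parsing an Update.

    attributes: list of (type_code, value) in the order received
    filters:    list of (low_code, high_code, action); first match wins
    Returns (kept_attributes, treat_as_withdraw).
    """
    kept = []
    for code, value in attributes:
        action = next(
            (act for low, high, act in filters if low <= code <= high), None
        )
        if action == WITHDRAW:
            # Stop processing: the NLRIs in this Update are treated as withdrawn.
            return [], True
        if action == DISCARD:
            continue  # drop just this attribute, keep parsing the rest
        kept.append((code, value))
    return kept, False
```

With `filters = [(200, 200, DISCARD)]`, an Update carrying attributes 1, 2, and 200 is processed with attribute 200 silently removed; switching the action to `WITHDRAW` makes the whole Update a withdraw instead.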


Error-handling
Comes into play after attribute-filtering is applied
When we detect one or more malformed attributes or NLRIs or other fields in the Update message
Steps
  Classification of errors
  Actions to be taken
  Logging


Error-handling details
Classification of errors
Minor: invalid flags, zero length, duplicates, optional-transitive attributes
Medium: Non-optional-transitive attributes, inconsistent attribute length
Major: Invalid or 0 length nexthop
Critical: NLRI parsing, inconsistent message / total attributes length

Actions taken
Local repair
Discard attribute
Treat-as-withdraw
Reset session
Discard Update message


IOS-XR implementation
Error-handling
  Router level configuration knob
    Separately for EBGP and IBGP
    Separately for basic and extended degrees of error-handling
  Neighbor level configuration knob
  Last resort hidden knob to avoid session reset at all costs (by simply discarding the malformed Update message)
  Logging
    Last few malformed messages are stored
Attribute-filtering
  Neighbor level configuration knob
  Specify a range of attribute codes (except ORIGIN, AS_PATH, NEXT_HOP, MP_REACH, MP_UNREACH)
  Two possible actions: discard-attribute; treat-as-withdraw
  Logging
    Optionally store the last few messages that matched any filter


Roadmap


Future Release Features


No specific Priority
1. Add-path for eBGP peers
2. BGP Flow-Spec
3. Import from default VRF to non-default VRF
4. Import from non-default VRF to default VRF
5. Conditional RPL policies
6. Support for traffic blackhole via RPL
7. mGRE AF
8. IPv4 over IPv6 (RFC 5747)
9. BGP mibv2
10. Import/export policy filtering
11. Per neighbor NSR knob
12. mLDP / MVPN enhancements
13. BGP diverse path
14. Half Duplex Hub & Spoke


Q and A


Future Work
BGP E-VPN
BGP Error handling
Accumulated IGP
Connect Apps and Instrumentation for Route Servers
Vrf to Global import
Enhanced GR
BGP RT Filtering for Legacy Routers
BGP Based Auto-discovery for SAF and other services (iBGP)
BGP Advisory Message/Soft-notify
BGP Flow-Spec (RFC5575)
BGP Monitoring Protocol
BGP Virtual Aggregation
Note: Expected availability dates are tentative


Summary
Scale and performance have been enhanced
  New RPs, platforms
  Existing platforms

Software releases are consolidating to a single codebase
  Reduction in quality issues
  Increased feature velocity

Full feature roadmap



ASR1K RP1/2 RR Performance Comparison


Tested with 1M Total Unique Routes

Test profile                     RP1 convergence (s)   RP2 convergence (s)   Total routes reflected
ipv4 (1K RR clients)             220                   75                    1 Billion
vpnv4 (1K RR clients, 8K RT)     680                   221                   1 Billion
ipv6 (1K RR clients)             720                   194                   1 Billion
vpnv6 (1K RR clients, 8K RT)     877                   293                   1 Billion
ipv4 (2K RR clients)             375                   138                   2 Billion
vpnv4 (2K RR clients, 8K RT)     1285                  394                   2 Billion
ipv6 (2K RR clients)             1126                  284                   2 Billion
vpnv6 (2K RR clients, 8K RT)     1766                  551                   2 Billion

Total routes reflected = routes reflected by the RR to all clients (Number of Routes x Number of Clients)

Tested with peer groups (1K RR clients per peer group)
ASR1K RP2 converges about twice as fast as 7200 NPE-G2 based on RR customer profile testing
CPU utilization below 5% after convergence
Link to Isocore report: http://www.cisco.com/en/US/prod/collateral/routers/ps9343/ITD13029-ASR1000-RP2Validationv1_1.pdf
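The arithmetic behind the comparison can be checked with a few lines of Python. The figures are copied from the results on this slide; the helper names `total_reflected` and `speedup` are hypothetical, and the speedup ratio is a derived number that the slide itself does not state.

```python
ROUTES = 1_000_000  # total unique routes in every test

# (profile, RR clients, RP1 convergence in seconds, RP2 convergence in seconds)
RESULTS = [
    ("ipv4", 1000, 220, 75),
    ("vpnv4", 1000, 680, 221),
    ("ipv6", 1000, 720, 194),
    ("vpnv6", 1000, 877, 293),
    ("ipv4", 2000, 375, 138),
    ("vpnv4", 2000, 1285, 394),
    ("ipv6", 2000, 1126, 284),
    ("vpnv6", 2000, 1766, 551),
]


def total_reflected(routes, clients):
    """Routes reflected to all clients = number of routes x number of clients."""
    return routes * clients


def speedup(rp1_seconds, rp2_seconds):
    """How many times faster RP2 converged than RP1 for the same profile."""
    return rp1_seconds / rp2_seconds
```

With these inputs, 1M routes to 1K clients gives the 1 billion reflected routes quoted on the slide, and RP2 converges between roughly 2.7 and 4 times faster than RP1 across the eight profiles.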

Slow Peer Management


BGP Resiliency/HA Enhancement

Static protection
[no] neighbor slow-peer split-update-group static

Dynamic detection
[no] bgp slow-peer detection [threshold <seconds>]
[no] neighbor slow-peer detection [threshold <seconds>]

Dynamic protection
[no] bgp slow-peer split-update-group dynamic [permanent]
[no] neighbor slow-peer split-update-group dynamic [permanent]
