Cell Phones
Who am I?
Who are you?
  Service Provider
  Enterprise
  Studying for CCIE
Advanced Class
  Assume BGP Operational Experience
    Basic configuration
    Show commands
BRKRST-3320
Cisco Public
Introduction
Agenda
  Generic Troubleshooting Advice
  Troubleshooting Peers
  Bestpath Algorithm
  Table Version
  Initial Convergence
  Periodic Convergence
  High Utilization
  Layer 3 VPNs
  Looking Glasses
bgp log-neighbor-changes
  Generates a syslog message when a peer goes up or down
  Always configure this
  OSPF, ISIS, and EIGRP all have log-neighbor-changes too
IOS-XR
  Only supported on ASR-9000
  Use ACLs to control what packets to SPAN
RSPAN
RSPAN has all the features of SPAN, plus support for source ports and destination ports that are distributed across multiple switches, allowing one to monitor any destination port located on the RSPAN VLAN. Hence, one can monitor the traffic on one switch using a device on another switch.
Very handy if a dedicated sniffer is not available
Available on IOS and NX-OS
If the filter is red, your syntax is busted
If the filter is green, your syntax is correct
service timestamps debug datetime msec localtime
service timestamps log datetime msec localtime

brain1(config)#access-list 100 permit ip host 1.1.1.1 host 2.2.2.2
brain1#debug ip packet 100
IP packet debugging is on for access list 100
brain1#
A finite number of the most recent events are stored
Use show commands later to
  Display an event in a debug-like format
  Merge events from various protocols
Troubleshooting Peers
R2
interface Loop0
 ip address 2.2.2.2/32
!
router bgp 100
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 update-source Loop0

R1
R1#sh tcp brief all
TCB       Local Address   (state)
64328548  *.179           LISTEN
R1#
R2
interface Loop0
 ip address 2.2.2.2/32
!
router bgp 100
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 update-source Loop0

R1
R1#ping 2.2.2.2 source Loop0
Sending 5, 100-byte ICMP Echos to 2.2.2.2
Packet sent with a source address of 1.1.1.1
.....
Success rate is 0 percent (0/5)
R1#
OPEN Message Error subcodes
In a 2/2 NOTIFICATION, the first 2 is the Error Code (OPEN Message Error) and the second 2 is the Error Subcode, so the error is Bad Peer AS
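To make the code/subcode reading concrete, here is a minimal Python sketch that decodes a "code/subcode" pair as it appears in a %BGP-3-NOTIFICATION log line. The tables follow the error codes and OPEN Message Error subcodes in RFC 4271; the function name decode_notification is hypothetical, and subcodes for other error codes are deliberately left undecoded.

```python
# Error codes and OPEN Message Error subcodes as listed in RFC 4271.
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

OPEN_SUBCODES = {
    1: "Unsupported Version Number",
    2: "Bad Peer AS",
    3: "Bad BGP Identifier",
    4: "Unsupported Optional Parameter",
    6: "Unacceptable Hold Time",
}


def decode_notification(pair: str) -> str:
    """Decode a 'code/subcode' string such as '2/2' (hypothetical helper)."""
    code_text, subcode_text = pair.split("/")
    code, subcode = int(code_text), int(subcode_text)
    name = ERROR_CODES.get(code, f"Unknown error code {code}")
    # Only OPEN Message Error subcodes are decoded in this sketch.
    if code == 2 and subcode in OPEN_SUBCODES:
        return f"{name}: {OPEN_SUBCODES[subcode]}"
    return f"{name} (subcode {subcode})"


if __name__ == "__main__":
    print(decode_notification("2/2"))
    print(decode_notification("4/0"))
```

The same helper reads the 4/0 pair seen in hold-time-expired resets, since error code 4 carries no meaningful subcode.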
R1 (10.1.2.1)
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 neighbor 10.1.2.2 remote-as 200
 no auto-summary

R2 (10.1.2.2)
router bgp 200
 no synchronization
 bgp log-neighbor-changes
 neighbor 10.1.2.1 remote-as 10
 no auto-summary

R2 is configured to expect AS 10 from 10.1.2.1, but R1 is in AS 100. The mismatch is what produces Bad Peer AS.
For eBGP peers that are more than 1 hop away, a larger TTL must be used
No longer verifies if NEXTHOP is directly connected
[Figure: eBGP multihop session with a configured TTL; AS65000]
Either R2 is not generating keepalives, or R2 is generating keepalives but R1 is not receiving them
  Is R2 out of memory or CPU?
  Are there output drops on the outbound interface towards R1?
  When did R2 last build a keepalive?

R2#show ip bgp neighbors 1.1.1.1
  Last read 00:00:15, last write 00:00:44, hold time is 180, keepalive interval is 60 seconds

show ip bgp summary
  Watch R2's MsgSent counter for R1. Does it increment?
Ping using peering addresses (loopback to loopback)
Ping with mss (max-segment-size) with df-bit set
  536 bytes by default
  Path MTU Discovery finds smallest MTU between R1 and R2
  Subtract 40 bytes for TCP/IP overhead

R1#sh ip bgp neighbors
BGP neighbor is 2.2.2.2, remote AS 2, external link
Datagrams (max data segment is 1460 bytes):

R1# ping 2.2.2.2 source loop0 size 1500 df-bit
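The MSS arithmetic above can be sketched in a few lines of Python. This assumes 20-byte IP and 20-byte TCP headers with no options, which is where the 40 bytes come from; the function names are hypothetical.

```python
IP_HEADER_BYTES = 20   # assumes an IPv4 header without options
TCP_HEADER_BYTES = 20  # assumes a TCP header without options


def mss_for_mtu(path_mtu: int) -> int:
    """Largest TCP payload that fits in one packet of size path_mtu."""
    return path_mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def ping_size_for_mss(mss: int) -> int:
    """IP packet size to ping with (df-bit set) to validate a given MSS."""
    return mss + IP_HEADER_BYTES + TCP_HEADER_BYTES


if __name__ == "__main__":
    print(mss_for_mtu(1500))        # MSS for a 1500-byte path MTU
    print(mss_for_mtu(576))         # MSS for a 576-byte path MTU
    print(ping_size_for_mss(1460))  # ping size matching "max data segment is 1460"
```

A negotiated MSS of 1460 therefore corresponds to the 1500-byte ping used in the example, and the 536-byte default corresponds to a 576-byte MTU.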
NOTIFICATION
%BGP-5-ADJCHANGE: neighbor 1.1.1.1 Down BGP Notification sent
%BGP-3-NOTIFICATION: sent to neighbor 1.1.1.1 4/0 (hold time expired)

R2#show ip bgp neighbor 1.1.1.1 | include last reset
  Last reset 00:01:02, due to BGP Notification sent, hold time expired

Are the keepalives being lost in the cloud?
Is R2 having a problem receiving the keepalive?
R2 hasn't received a keepalive in more than "keepalive interval" seconds
Time to check R1
  How is R1 on memory?
  What is R1's CPU load?
  Is R2's TCP window open?
R1#show ip bgp sum | begin Neighbor
Neighbor    MsgRcvd  MsgSent  TblVer
2.2.2.2     53       284      10167

OutQ is incrementing due to keepalives
MsgSent is not incrementing
Something is stuck in the OutQ
The keepalives aren't leaving R1!
BRKRST-3320 2013 Cisco and/or its affiliates. All rights reserved. Cisco Public 44
Bestpath Algorithm
5   ORIGIN
6   MED
7   eBGP over iBGP
8   Metric to Next Hop
9   Multiple Paths in RIB
10  Oldest External Wins
11  BGP Router ID
12  CLUSTER_LIST
13  Neighbor Address
Lowest wins
Same concept but will show you all of the multipaths for x.x.x.x
If peer 1.1.1.1 has a table version of #60, this tells us we have informed 1.1.1.1 of all bestpath changes for prefixes with a table version of <= #60
If any prefix has a table version > #60, then we need to inform 1.1.1.1 of that prefix's bestpath
Once 1.1.1.1 has been updated, its table version will be updated accordingly
Same concept for the RIB and its table version
Highest table version of any prefix = main routing table version
RIB is converged
1.1.1.1 is converged
5 prefixes experience a bestpath change
Highest table version is now #15
Inform the RIB of these 5 changes
  Do RIB adds, deletes, and/or modifies
  When complete, set the RIB table version to #15
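The table version bookkeeping described above can be modeled with a small Python sketch. It is an illustration of the concept only, not how IOS stores its tables; the function names and the sample prefixes are hypothetical.

```python
def main_table_version(prefix_versions: dict) -> int:
    """The main table version is the highest version of any prefix."""
    return max(prefix_versions.values(), default=0)


def pending_prefixes(prefix_versions: dict, consumer_version: int) -> list:
    """Prefixes a consumer (a peer or the RIB) has not yet been told about.

    A consumer at version N has seen every bestpath change with a
    version <= N, so only prefixes with a version > N are pending.
    """
    return sorted(p for p, v in prefix_versions.items() if v > consumer_version)


if __name__ == "__main__":
    # Hypothetical per-prefix table versions.
    versions = {
        "10.1.1.0/24": 58,
        "10.1.2.0/24": 60,
        "10.1.3.0/24": 61,
        "10.1.4.0/24": 65,
    }
    peer_version = 60
    print(pending_prefixes(versions, peer_version))
    # After the pending prefixes are advertised, the peer catches up.
    peer_version = main_table_version(versions)
    print(pending_prefixes(versions, peer_version))
```

A consumer is converged when its version equals the main table version, which is the condition the empty second result represents.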
You should monitor the table version in your network to determine what is normal for you.
If the table version is increasing rapidly, that could explain why BGP Router and BGP IO are busy.
Initial Convergence
BGP Convergence
Hey, who are you calling slow?
Two general convergence situations
  Initial startup
  Periodic route changes
How long initial convergence takes depends on the amount of work to be done and on the router's and network's ability to do that work quickly and efficiently
2) Calculate bestpaths
3) Install bestpaths in the RIB
4) Advertise bestpaths to all peers
Router Variables
  CPU horsepower
  Code version
  Outbound Interface Bandwidth
UPDATE Packing refers to how efficiently an implementation packs NLRIs into UPDATEs
  Least efficient: BGP only puts one NLRI per UPDATE
  Most efficient: BGP puts all NLRI with a certain Attribute set in one UPDATE

Least Efficient
  UPDATE 1: MED 50, Origin IGP | 10.1.1.0/24
  UPDATE 2: MED 50, Origin IGP | 10.1.2.0/24
  UPDATE 3: MED 50, Origin IGP | 10.1.3.0/24
Most Efficient
  UPDATE 1: MED 50, Origin IGP | 10.1.1.0/24 10.1.2.0/24 10.1.3.0/24
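The most efficient packing amounts to grouping NLRIs by attribute set. A minimal Python sketch of that grouping follows; it ignores message size limits, and the function name and tuple representation of attributes are assumptions made for illustration.

```python
from collections import defaultdict


def pack_updates(routes):
    """Group prefixes that share an identical attribute set into one UPDATE.

    routes is a list of (prefix, attributes) pairs, where attributes is a
    hashable tuple. Returns a list of (attributes, [prefixes]) pairs, one
    per UPDATE. Maximum UPDATE size is ignored in this sketch.
    """
    groups = defaultdict(list)
    for prefix, attributes in routes:
        groups[attributes].append(prefix)
    return list(groups.items())


if __name__ == "__main__":
    attrs = (("MED", 50), ("ORIGIN", "IGP"))
    routes = [
        ("10.1.1.0/24", attrs),
        ("10.1.2.0/24", attrs),
        ("10.1.3.0/24", attrs),
    ]
    for attributes, prefixes in pack_updates(routes):
        print(dict(attributes), prefixes)
```

With the three prefixes from the slide, the sketch yields a single UPDATE instead of three.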
[Figure: packet layouts (IP Header, TCP Header, Attribute, NLRIs). With an increased MSS, a single packet carries one attribute set followed by more NLRIs.]
UPDATEs are generated for one member of an update-group and then replicated to the other members
More Efficient
[Figure: two peers in the same update-group receive the same UPDATE (Attribute, NLRI, NLRI)]
RR sends out tons of UPDATEs to RRCs
RRCs send TCP ACKs
RR core-facing interface(s) receive a huge wave of TCP ACKs
Convergence
How do you know if BGP has converged?
Watch the global table version
  Increases by 1 for every bestpath change
  In the lab: Table version stabilizes
  In the real world: Reaches your normal rate of change
MSS/PMTU
Efficient packaging of BGP messages in TCP
Periodic Convergence
Convergence
How long does it take to process and propagate information about the failure? (t1 to t2)
[Figure: timeline with points t0, t1, t2 and labels Failure, Process, Propagate, Recovery]
A client tells ATF which prefixes it is interested in
ATF tracks each prefix
  Notifies the client when the route to a registered prefix changes
  Client is responsible for taking action based on the ATF notification
Provides a scalable, event-driven model for dealing with RIB changes
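The register-and-notify model above can be illustrated with a small Python sketch. This is a conceptual model of an event-driven tracker, not the IOS implementation; the class name, method names, and route strings are all hypothetical.

```python
from collections import defaultdict


class AddressTracker:
    """Toy model of a tracking service: clients register prefixes and
    are called back only when the route to a registered prefix changes."""

    def __init__(self):
        self._clients = defaultdict(list)  # prefix -> list of callbacks
        self._routes = {}                  # prefix -> current route (absent if none)

    def register(self, prefix, callback):
        self._clients[prefix].append(callback)

    def route_changed(self, prefix, new_route):
        """Called by the RIB side; new_route is None when the route is gone."""
        old_route = self._routes.get(prefix)
        if old_route == new_route:
            return  # no change, nobody is notified
        if new_route is None:
            self._routes.pop(prefix, None)
        else:
            self._routes[prefix] = new_route
        # The tracker only notifies; each client decides what to do.
        for callback in self._clients.get(prefix, []):
            callback(prefix, old_route, new_route)


if __name__ == "__main__":
    tracker = AddressTracker()
    tracker.register("1.1.1.1/32", lambda p, old, new: print(p, old, "->", new))
    tracker.route_changed("1.1.1.1/32", "via 10.1.2.1 metric 2")
    tracker.route_changed("1.1.1.1/32", None)             # route lost
    tracker.route_changed("9.9.9.9/32", "via 10.1.2.1")   # unregistered, ignored
```

The design point is that clients never poll: work happens only when a registered prefix actually changes.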
Debugs
  debug ip bgp events nexthop
  debug ip bgp rib-filter
eBGP multihop
Relies on holdtime or BFD
Multihop eBGP
  #1 Link 1 fails
  #2 Link 2 fails
  #3 FSD takes down peer
Off by default
neighbor x.x.x.x fall-over
FSD
  Relies on the control plane (absence of a route in the RIB) to tear down the peer
  We could have a route but not have connectivity
BFD
  Relies on the forwarding plane to detect a down peer
  If we lose connectivity, the peer comes down
User may see a wave of updates and withdraws to peer X every MRAI seconds
User will NOT see a delay of MRAI between each individual update and/or withdraw
  BGP would never converge if this were the case
Convergence MRAI
MRAI timeline for a BGP peer with an MRAI of 5 seconds
  T0   The big bang
  T7   Bestpath Change #1
       UPDATE sent immediately
       MRAI timer starts, will expire at T12
  T10  Bestpath Change #2
       Must wait until T12 for MRAI to expire
  T12  MRAI expires
       Bestpath Change #2 is transmitted
       MRAI timer starts, will expire at T17
  T17  MRAI expires
       No pending UPDATEs
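The timeline above can be reproduced with a short Python simulation. It is a simplified model of a single per-peer MRAI timer under the behavior the timeline describes (send immediately when the timer is idle, otherwise batch until expiry); the function name is hypothetical and real implementations differ in detail.

```python
def mrai_send_times(change_times, mrai):
    """Return the time at which each bestpath change is advertised to a peer.

    If the MRAI timer is idle, a change is sent immediately and the timer
    starts. If the timer is running, the change waits for expiry; all
    changes queued by then go out together and the timer restarts.
    """
    send_times = []
    expiry = None    # expiry time of the running timer, None when idle
    pending = False  # True when changes are queued for the next expiry
    for t in sorted(change_times):
        # Roll the timer forward past any expirations before this change.
        while expiry is not None and t >= expiry:
            if pending:
                expiry += mrai  # queued changes were sent at expiry, timer restarts
                pending = False
            else:
                expiry = None   # nothing was queued, timer goes idle
        if expiry is None:
            send_times.append(t)
            expiry = t + mrai
        else:
            send_times.append(expiry)
            pending = True
    return send_times


if __name__ == "__main__":
    # Bestpath changes at T7 and T10 with an MRAI of 5 seconds.
    print(mrai_send_times([7, 10], 5))
```

Changes that arrive while the timer is running leave together at expiry, which is why a peer sees waves of updates rather than an MRAI-sized gap between every individual update.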
Convergence MRAI
BGP is not a link state protocol, it is path vector
May take several rounds/cycles of exchanging updates and withdraws for the network to converge
MRAI must expire between each round!
The more fully meshed the network and the more tiers of ASes, the more rounds required for convergence
Think about
  How many tiers of ASes there are in the Internet
  How meshy peering can be in the Internet
Convergence MRAI
Internet churn means we are constantly setting and waiting on MRAI timers
  One flapping prefix slows convergence for all prefixes
  Internet table sees roughly 6 bestpath changes per second
High Utilization
Router#show process cpu
CPU utilization for five seconds: 100%/0%; one minute: 99%; five minutes: 81%
....
 139     6795740   1020252       6660 88.34% 91.63% 74.01%   0 BGP Router

Define "High"
  Know what normal CPU utilization is for the router in question
  Is the CPU spiking due to "BGP Scanner" or is it constant?
High Utilization
How to identify route churn?
  Do "sh ip bgp summary", note the table version
  Wait 60 seconds
  Do "sh ip bgp summary", compare the table version from 60 seconds ago
You have 150k routes and see the table version increase by 300
  This is probably normal route churn
  Know how many bestpath changes you normally see per minute
You have 150k routes and see the table version increase by 150k
  This is bad and is the cause of your high CPU
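The two-sample comparison above is simple arithmetic, sketched here in Python. The function name and the sample table versions are hypothetical; what counts as "normal" is deliberately left to the operator, as the slide advises.

```python
def churn_report(version_before, version_after, interval_seconds, route_count):
    """Summarize table version growth between two 'sh ip bgp summary' samples."""
    changes = version_after - version_before
    return {
        "changes": changes,
        "per_minute": changes * 60 / interval_seconds,
        # Share of the table that changed; near 1.0 means the whole table churned.
        "fraction_of_table": changes / route_count,
    }


if __name__ == "__main__":
    # 150k routes, table version up by 300 in 60 seconds.
    print(churn_report(1_000_000, 1_000_300, 60, 150_000))
    # 150k routes, table version up by 150k in 60 seconds.
    print(churn_report(1_000_000, 1_150_000, 60, 150_000))
```

Comparing per_minute against your own baseline is the useful part; the fraction simply makes the "entire table churned" case obvious.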
High Utilization
What causes massive table version changes?
Flapping peers
  Hold-timer expiring?
  Corrupt UPDATE?
Route churn
  Don't try to troubleshoot the entire BGP table at once
  Identify one prefix that is churning and troubleshoot that one prefix
  Will likely fix the problem with the rest of the BGP table churn
High Utilization
Table Version Changing Rapidly: A Little Lab Fun
RP/0/RP0/CPU0:XR#sh route | include 00:00:
Wed Apr 27 13:53:40.201 EDT
O    1.0.0.0/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/1
O    1.0.0.4/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/1
O    1.0.0.8/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/1
O    1.0.0.12/30 [110/3] via 10.1.2.1, 00:00:00, GigabitEthernet0/0/0/1
...
RP/0/RP0/CPU0:XR#sh route | include 00:00:
Wed Apr 27 13:53:44.162 EDT
B    1.0.0.0/30 [20/2] via 1.1.1.1, 00:00:01    <-- 4 seconds later
B    1.0.0.4/30 [20/2] via 1.1.1.1, 00:00:01
B    1.0.0.8/30 [20/2] via 1.1.1.1, 00:00:01
B    1.0.0.12/30 [20/2] via 1.1.1.1, 00:00:01
...
High Utilization
Table Version Changing Rapidly: A Little Lab Fun
RP/0/RP0/CPU0:aggies#sh ip bgp 1.0.0.4
Wed Apr 27 14:00:36.066 EDT
...
Last Modified: Apr 27 14:00:35.387 for 00:00:00
Paths: (1 available, no best path)
...
  100
    1.1.1.1 (inaccessible) from 1.1.1.1 (1.1.1.1)
...
High Utilization
Something is wrong with NEXTHOP 1.1.1.1
Flip-flops between inaccessible and accessible with an IGP cost of 2
Troubleshoot 1.1.1.1 and the churning will stop
Layer 3 VPNs
Layer 3 VPNs
[Figure: PE1 and PE2 (step #1), each with an attached CE, CE1 and CE2 (step #2)]
Layer 3 VPNs
#3 PE to PE vrf connectivity
  Can PEs ping the vrf interface of the other PE?
  If not, double check your import/export Route Targets
#4 PE to CE connectivity
  Verify each PE can ping the CE connected to the other PE
#5 CE to CE connectivity
  At this point you should be able to ping CE to CE
Looking Glasses
BGP Looking Glass servers are computers on the Internet running one of a variety of publicly available Looking Glass software implementations. A Looking Glass server (or LG server) is accessed remotely for the purpose of viewing routing info. Essentially, the server acts as a limited, read-only portal to routers of whatever organization is running the Looking Glass server. Typically, publicly accessible looking glass servers are run by ISPs or NOCs.
http://www.bgp4.as/looking-glasses
http://whois.arin.net/ui
Or look up a specific AS
http://whois.arin.net/rest/asn/AS1239/pft
The University's Route Views project was originally conceived as a tool for Internet operators to obtain real-time information about the global routing system from the perspectives of several different backbones and locations around the Internet. Although other tools handle related tasks, such as the various Looking Glass Collections (see e.g. NANOG, or the DTI NSPIXP-2 Looking Glass), they typically either provide only a constrained view of the routing system (e.g., either a single provider, or the route server) or they do not provide real-time access to routing data. While the Route Views project was originally motivated by interest on the part of operators in determining how the global routing system viewed their prefixes and/or AS space, there have been many other interesting uses of this Route Views data. For example, NLANR has used Route Views data for AS path visualization (see also NLANR), and to study IPv4 address space utilization (archive). Others have used Route Views data to map IP addresses to origin AS for various topological studies. CAIDA has used it in conjunction with the NetGeo database in generating geographic locations for hosts, functionality that both CoralReef and the Skitter project support.