BRKDCT-3144
Dipl.-Ing. Andreas la Quiante, alaquian@cisco.com
Nexus Product Management, Cisco Data Center Group
Level 3

Chapter 0: Housekeeping
Our ASICs start counting at zero, and so do we today ;-)
2014 Cisco and/or its affiliates. All rights reserved. BRKDCT-3144 Cisco Public

Housekeeping: Contributions & Great Help
Contributions and great help from: Matt Martin Ron Roland Dmitry Ronald Adam. Need help like me? Terri.
They provided feedback and answers to my questions, and I also borrowed some of their content. Danke (thank you)!
Q&A moderator: Anver

Housekeeping: Icons
Icons used in this deck: N7K, switch, router, PC, Layer 3, Layer 2, focus, notes, CLI (e.g. N7004-Berlin# sh int e 3/12), my notes, geek content, error/failure/challenge, Cisco TAC, partner request (include real cases/examples), reference slide, interface.
Hidden slides: 148 in total [28-JAN-14]

Housekeeping: The Rides
Each chapter is laid out like a train/subway line with its own color code. The rides: Strategy, Tools & System; Data-Plane Layer 2; Data-Plane Layer 3; Control-Plane Inband; Control-Plane ARP; Backup.
Cisco Live 2014: 90 minutes. WebEx for YOU: 120 minutes on 12-FEB-14, 10:00 CET; opt in via alaquian@cisco.com with the subject BRKDCT-3144.
Housekeeping: Adjacent Breakout Sessions
To avoid content overlap, since we have only 90 minutes today, please consider the following sessions to complement your skills:
BRKDCT-2049 Overlay Transport Virtualization
BRKDCT-2048 Deploying Virtual Port Channel in NX-OS (vPC)
BRKDCT-3144 Troubleshooting Cisco Nexus 7000 Series Switches
BRKDCT-2121 VDC Design and Implementation Considerations with Nexus 7000
BRKNMS-2695 Administration and Monitoring of the Cisco Data Center with Cisco Prime DCNM
BRKDCT-3346 End-to-End QoS Implementation and Operation with Cisco Nexus
BRKDCT-3445 Building scalable data center networks with NX-OS and Nexus 7000
BRKDCT-3145 Adv - Troubleshooting Cisco Nexus 5000 / 2000 Series Switches
BRKARC-3470 Advanced - Cisco Nexus 7000/7700 Switch Architecture
BRKDCT-2081 Cisco FabricPath Technology and Design
BRKDCT-3103 Advanced OTV - Config/Troubleshooting OTV in the network

Chapter 1: Strategy, Tools and System
Stops: Strategy, CLI, Ethanalyzer, ELAM, ELAME, System

Strategy: Three Areas
System troubleshooting
- Cores, CPU, memory, interfaces/VLANs behaving oddly, hardware challenges
Data-plane troubleshooting
- Packets are lost: your primary question is where
- 100% loss or partial loss?
- Consistent or periodic?
Control-plane troubleshooting
- Something is flapping
- Convergence challenges
- Start at the process (and its log)
Anything better than checking everything is an improvement. (ALQ 2014)
Diagram: L2, L3, inband, ARP and TCAM, with ELAM(E) and Ethanalyzer as the tools. ELAM := Embedded Logic Analyzer Module.
Memory leaks are not common on NX-OS; each process has a maximum memory limit:
N7K# sh sys internal memory-alert-log
Strategy: System, Data-Plane, or Control-Plane?
Diagram: two I/O modules (forwarding engines) and the supervisor (control plane), split into system, control-plane and data-plane areas, with reference points 1 and 2.

Tools: NX-OS Logging and a Powerful CLI
NX-OS value: NX-OS is built with extensive, fine-grained logging capabilities.
Interfaces: NX-OS CLI, SNMP, XML, Python, GUI, OF, onePK, Chef, Puppet.
High-performance, feature-rich switching, logging and configuration.
NX-OS has a built-in flight recorder.
Standard CLI vs. engineering CLI: the output of commands with the internal keyword is not documented.

Tools: Show Tech
Besides show tech for a single feature (A, B, C, ...), always try to use the detailed version, show tech detail, which includes:
- Feature event history
- States (PSS, ...)
- HW states
Always redirect the output to a file, and always use a separate file per show tech.
Suggestion for VDCs (global services, VDC-1 as the default VDC): BRKDCT-2121 VDC Design and Implementation Considerations.
Feature project: binary logger (show tech all-binary), a significant time saver. It avoids the situation where we collect show tech A and, after a while doing RCA, find that we also need show tech B. Its output is for use by TAC/BU/ENG.
Timeline t0 to t3: between t0 and t2 the trigger leads to the failure. Immediately collect data! Then start troubleshooting.

Tools: ASICs
Custom ASICs provide detailed counters via the CLI. Some error counters are part of normal operation, e.g. packets dropped at an ingress trunk when the tagged VLAN is not known (CBL drops), diagnostic packets, or extra flooded packets.
This is one of TAC's favourite commands; use all to look at all modules/ASICs.
N7004-Berlin# show hardware internal errors module 3
|------------------------------------------------------------------------|
| Device:Clipper MAC          Role:MAC          Mod: 3                    |
| Last cleared @ Mon Nov 25 21:41:37 2013                                 |
| Device Statistics Category :: ERROR                                     |
|------------------------------------------------------------------------|
Instance:2
Cntr  Name                                        Value             Ports
----  ----                                        -----             -----
0     GD GMAC bad character interrupt             0000000000000002  12 -
1     GD GMAC sequence error interrupt            0000000000000002  12 -
2     GD GMAC transition from nosync to sync int  0000000000000002  12 -
3     GD GMAC transition from sync to nosync int  0000000000000001  12 -
4     PL ingress_cbl_drop                         0000000000003426  12 -
GD GMAC: built-in MAC controller. Only non-zero counters are displayed.
Our ASICs (e.g. F1/F2/F3) provide many counters and are a great source of information, especially for the data path.

Tools: CLI Tools, or How Can I Select What I Need? (Tips & Tricks)
N7004-Berlin# show system internal pktmgr interface
<SNIP>
Vlan1, ordinal: 38 Hash_type: 1
SUP-traffic statistics: (sent/received)
Packets: 2769 / 1896
Bytes: 1619370 / 241310
Instant packet rate: 1 pps / 0 pps
Packet rate limiter (Out/In): 0 pps / 0 pps
Average packet rates(1min/5min/15min/EWMA):
Packet statistics:
Tx: Unicast 1123, Multicast 1641 Broadcast 5
Rx: Unicast 163, Multicast 1730 Broadcast 3
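The filter `|i or|I` used next keeps the lines that contain "or" (ordinal) or a capital "I" (Instant, Out/In): the pattern behaves like a case-sensitive regular expression with alternation. The minimal Python sketch below reproduces that selection on the full pktmgr output above, which can help when post-processing a saved show tech file off the box. `include` is a hypothetical helper written for this example, not an NX-OS API.

```python
import re


def include(text, pattern):
    """Keep only the lines in which the regular expression matches anywhere."""
    rx = re.compile(pattern)
    return [line for line in text.splitlines() if rx.search(line)]


PKTMGR = """\
Vlan1, ordinal: 38 Hash_type: 1
SUP-traffic statistics: (sent/received)
Packets: 2769 / 1896
Bytes: 1619370 / 241310
Instant packet rate: 1 pps / 0 pps
Packet rate limiter (Out/In): 0 pps / 0 pps
Average packet rates(1min/5min/15min/EWMA):
Packet statistics:
Tx: Unicast 1123, Multicast 1641 Broadcast 5
Rx: Unicast 163, Multicast 1730 Broadcast 3"""

# Same selection as "| i or|I": ordinal, Instant and Out/In lines survive.
for line in include(PKTMGR, "or|I"):
    print(line)
```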
N7004-Berlin# show system internal pktmgr interface |i or|I
<SNIP>
Vlan1, ordinal: 38 Hash_type: 1
Instant packet rate: 0 pps / 0 pps
Packet rate limiter (Out/In): 0 pps / 0 pps
port-channel100, ordinal: 72 Hash_type: 1
Instant packet rate: 1 pps / 1 pps
Packet rate limiter (Out/In): 0 pps / 0 pps
If I am only interested in parts of the output, I can ask for just those items. You save time by having less to read.
Nexus# sh ver | ?
  egrep     Egrep
  grep      Grep
  head      Display first lines
  last      Display last lines
  less      Filter
  no-more
  sed
  wc        Count
  begin     Begin with
  count     Count
  exclude   Exclude lines
  include   Include lines

Tools: Log Window (upcoming)
N7K# source logw.py 15/01/2014 12:24:55 100
starting with empty stats
stats init done
Logw system check port version 0.060813
Time range 2014-01-15 12:24:55 ... 2014-01-15 12:26:35
Got 343 show ... event-history clis
244 clis left after pre-filtering
collecting outputs...done, collected 2602 events in 96.197735 seconds
sorted
<snip>
Syntax: logw.py [-h] [-v] [-f FILTERS] [-t TRUNCATE] [-n MAX_EVENTS] [-s] start_date start_time duration
logw.py collects the event-history outputs for the time window around the trigger and sorts the events into one timeline.
Log sources: logfile (10 MB), NVRAM, on-board log, and event history.
Tip: show log log immediately displays the logfile output and is faster than show log, which first has to read the logging severity settings.
On-board log: major state changes and MTS transactions; useful for module troubleshooting.
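The logw.py sample above was started with `15/01/2014 12:24:55 100` and reported the window 12:24:55 to 12:26:35, which suggests a DD/MM/YYYY date and a duration in seconds. The Python sketch below computes the same window and keeps only events inside it, sorted by time, the way the tool's "sorted" step appears to. The date format, the unit, and the `(timestamp, text)` event shape are assumptions inferred from that one sample, not documented logw.py behaviour.

```python
from datetime import datetime, timedelta


def logw_window(start_date, start_time, duration_s):
    """Start/end of the analysis window (assumed DD/MM/YYYY date, duration in seconds)."""
    start = datetime.strptime(f"{start_date} {start_time}", "%d/%m/%Y %H:%M:%S")
    return start, start + timedelta(seconds=duration_s)


def in_window(events, start, end):
    """Keep (timestamp, text) events inside [start, end], sorted by time."""
    return sorted(e for e in events if start <= e[0] <= end)


start, end = logw_window("15/01/2014", "12:24:55", 100)
print(f"Time range {start} ... {end}")
```

Sorting tuples orders events by timestamp first, so events collected from many event-history commands merge into one timeline.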
It is a good idea to synchronize all devices in your network to one time source.

Tools: Accounting Log, or Who Did It? (Audit Recording)
N7004-Berlin# show accounting log | last 3
Mon Dec 2 03:33:05 2013:type=update:id=console0:user=admin:cmd=switchto ; configure terminal ; interface port-channel110 ; shutdown (SUCCESS)
Mon Dec 2 03:33:08 2013:type=update:id=console0:user=admin:cmd=switchto ; configure terminal ; interface port-channel110 ; no shutdown (REDIRECT)
Mon Dec 2 03:33:08 2013:type=update:id=console0:user=admin:cmd=switchto ; configure terminal ; interface port-channel110 ; no shutdown (SUCCESS)
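Answering "who did it?" across a long accounting log is easier once each line is split into its fields. The sketch below assumes the exact layout of the three sample lines (timestamp, then `type=`, `id=`, `user=`, `cmd=` separated by colons, and a result in parentheses); other releases may format the log differently, so treat the regular expression as an assumption. `parse_accounting` and `who_changed` are hypothetical helper names.

```python
import re
from datetime import datetime

# Layout of the sample lines:
# "<Day Mon D HH:MM:SS YYYY>:type=...:id=...:user=...:cmd=<a ; b ; c> (RESULT)"
ACCT = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
    r":type=(?P<type>[^:]*):id=(?P<id>[^:]*):user=(?P<user>[^:]*)"
    r":cmd=(?P<cmd>.*) \((?P<result>[A-Z]+)\)$"
)


def parse_accounting(lines):
    """Parse accounting-log lines into dicts; lines that do not match are skipped."""
    entries = []
    for line in lines:
        m = ACCT.match(line.strip())
        if m is None:
            continue
        entry = m.groupdict()
        entry["ts"] = datetime.strptime(entry["ts"], "%a %b %d %H:%M:%S %Y")
        # The cmd field chains the CLI context with " ; "
        entry["cmd"] = [c.strip() for c in entry["cmd"].split(";")]
        entries.append(entry)
    return entries


def who_changed(entries, needle):
    """(time, user, last command, result) for entries whose command chain mentions needle."""
    return [(e["ts"], e["user"], e["cmd"][-1], e["result"])
            for e in entries if any(needle in c for c in e["cmd"])]


LOG = """\
Mon Dec 2 03:33:05 2013:type=update:id=console0:user=admin:cmd=switchto ; configure terminal ; interface port-channel110 ; shutdown (SUCCESS)
Mon Dec 2 03:33:08 2013:type=update:id=console0:user=admin:cmd=switchto ; configure terminal ; interface port-channel110 ; no shutdown (REDIRECT)
Mon Dec 2 03:33:08 2013:type=update:id=console0:user=admin:cmd=switchto ; configure terminal ; interface port-channel110 ; no shutdown (SUCCESS)"""

for when, user, cmd, result in who_changed(parse_accounting(LOG.splitlines()), "port-channel110"):
    print(when, user, cmd, result)
```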
N7004-Berlin(config)# terminal log-all
N7004-Berlin(config)# show accounting log all | last 2
Mon Dec 2 03:53:28 2013:type=update:id=console0:user=admin:cmd=switchto ; show accounting log all | last 2 (SUCCESS)
Mon Dec 2 03:52:11 2013:type=update:id=console0:user=admin:cmd=switchto ; show hardware internal errors all (SUCCESS)
Only configuration commands are captured by default. To capture all commands, enable terminal log-all (requires NX-OS 5.x or higher).
With the log information you see what happened; with the accounting log you see who triggered it and, more importantly, which action triggered it (see also logw.py).

Tools: Ethanalyzer Guidance
Ethanalyzer is usable in production networks (low risk).
Use display mode to simply verify that a packet is present; write to a pcap file for detailed analysis outside the device at a later time.
N7004# ethanalyzer local interface inband decode-internal limit-frame-size 150 display
Capturing on inband
2013-12-07 16:04:07.855965 Cisco_b5:26:49 -> PVST+ STP 96 RST. Root = 32768/1/00:0c:30:8b:a0:40 Cost = 2 Port = 0x9063
<SNIP>
Ethanalyzer lets you narrow the failure domain quickly. It captures traffic to and from the control plane (inband, reference point 1), not data-plane traffic, so it serves control-plane troubleshooting. The second reference point, at the ingress forwarding engine (ELAM(E)), serves both control-plane and data-plane troubleshooting.

Tools: ELAM & ELAME (Embedded Logic Analyzer Module)
ELAM is widely used by engineering, QA, TAC and escalation teams, but it is an unsupported, internal tool.
ELAM requires a great deal of platform architecture and ASIC knowledge, which limits the audience of the raw tool: identifying the appropriate FE, creating triggers, and interpreting ELAM data for complex flows require full architectural and forwarding knowledge.
Good news: ELAME makes ELAM easy to use
Tools: ELAM/ELAME Motivation
ELAM workflow: determine the FE, configure the trigger, start ELAM, analyze.
ELAM lets you verify whether a packet is present and/or analyze it. ELAME lets you quickly verify whether a packet is present; especially in a complicated setup, it saves you TIME!
Use cases:
1) Determining the failure domain
2) Analyzing the system behavior
Example hosts: IP 42.42.42.1 / MAC aaaa.bbbb.cccc and IP 42.42.42.2 / MAC aaaa.bbbb.dddd.
You MUST know the source and destination MAC/IP pairs involved for troubleshooting. Is the source and/or destination dual-homed? Is the source and/or destination real or virtual?
Forwarding ASICs: Eureka, Lamira, Orion, Clipper, Flanker. Attach to the module first.

Tools: ELAME, Part 1
Since NX-OS 6.2(2), elame.tcl is included in the distribution.
Setup: Berlin 10.0.2.2/24, Paris 10.0.2.4/24. Do we receive OSPF packets from our neighbor on E4/1 (M-Series line card)?
N7004-Paris# source sys/elame 10.0.2.2 224.0.0.5
elam helper, version 1.015
... source 10.0.2.2, destination 224.0.0.5
... getting current vdc ... 4
... ingress interface derived from source address
... ingress interface list is Ethernet4/1
... expanded ingress interface list is Ethernet4/1
... FE instance list is 4/1/1
... setting trigger...
... elam trigger set
... starting capture...
... elam capture started
... no packet captured so far
press [enter] when packets in question are known to have been sent
... packet captured at FE: 4/1/1
... capture instance 4/1/1 (slot/type/instance)
Because ELAM is complicated, especially on M-Series, this example shows how easy ELAME is to use. ELAME works on F2 and M-Series line cards with IPv4. You just specify the source and destination address, and the tool determines the correct FE to program, even on M-Series modules.

Tools: ELAME, Part 3
N7004-Paris# source sys/elame 10.0.2.2 224.0.0.5
<SNIP>
... packet captured at FE: 4/1/1
... capture instance 4/1/1 (slot/type/instance)
+++ IPv4 packet: 86 bytes from MAC 4055.390f.5642 / IP 10.0.2.2 to MAC 0100.5e00.0005 / IP 224.0.0.5 TTL 1
+++ protocol OSPF
+++ packet received on interface Eth4/1 vlan 0 (source index 0x00030)
... rbus: ccc 0x0 cap1 0x1 cap2 0x1 flood 0x1 dest_vlan 0 dest_index 0x00032 l2_fwd 0x0
+++ packet is flooded to BD 50 / vlan 0
... destination index is NOT from L2 table lookup
+++ copy of the packet is sent to CPU
... lamira OFE: rdt 0x0 dest_index 0x010c7 flood 0x0 l2fwd 0x0 ofe_drop 0x0
+++ lamira OFE exception(s): CPP_LIF (0x200000000)
... FE instance 4/1/1 context after analysis: pb2 retried
... done
DBUS and RBUS are captured: an easy tool even on M-Series line cards (here N7K-M224).
Diagram: Berlin to Paris via E4/1 (LTL 0x30) on the Eureka/Lamira forwarding engine, then on to the SUP (LTL 0x10C7).
The lines beginning with +++ are the important ones.

Tools: ELAM on F2 (Embedded Logic Analyzer Module)
F2 has no PB for ELAM (:= simpler, but the recommendation is still to use ELAME, like the pros).
Clipper supports Layer 2 ELAM and/or Layer 3 ELAM:
module-3# elam asic clipper instance 2
module-3(clipper-elam)# layer 3
module-3(clipper-l3-elam)# trigger dbus ipv4 if source-ipv4-address 42.42.42.142
module-3(clipper-l3-elam)# trigger rbus ofe if trig
module-3(clipper-l3-elam)# start
module-3(clipper-l3-elam)# status
<SNIP>
Diagram: Clipper FE2 on E3/12 with L2 and L3 stages; IFE := incoming pipeline, OFE := outgoing pipeline.
Status: Armed := waiting for the packet. Status: Triggered := we have captured the packet.
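Back to the ELAME Part 3 output: because only the lines beginning with +++ carry the verdict, a saved ELAME run can be reduced to those findings. The sketch below assumes the output was saved to a file or pasted as a string; `elame_findings` is a hypothetical helper and ELAME itself (elame.tcl) is not involved.

```python
def elame_findings(output):
    """Return the ELAME result lines that start with '+++', marker removed."""
    findings = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("+++"):
            findings.append(stripped[3:].strip())
    return findings


PART3 = """\
... capture instance 4/1/1 (slot/type/instance)
+++ IPv4 packet: 86 bytes from MAC 4055.390f.5642 / IP 10.0.2.2 to MAC 0100.5e00.0005 / IP 224.0.0.5 TTL 1
+++ protocol OSPF
+++ packet received on interface Eth4/1 vlan 0 (source index 0x00030)
... rbus: ccc 0x0 cap1 0x1 cap2 0x1 flood 0x1 dest_vlan 0 dest_index 0x00032 l2_fwd 0x0
+++ packet is flooded to BD 50 / vlan 0
... destination index is NOT from L2 table lookup
+++ copy of the packet is sent to CPU
... lamira OFE: rdt 0x0 dest_index 0x010c7 flood 0x0 l2fwd 0x0 ofe_drop 0x0
+++ lamira OFE exception(s): CPP_LIF (0x200000000)
... done"""

for finding in elame_findings(PART3):
    print(finding)
```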
Tools: ELAM, DBUS (F2)
Host 42.42.42.142 (Berlin) on E3/12, F-Series line card.
module-3(clipper-l3-elam)# show dbus
--------------------------------------------------------------------
Clipper Instance 02 - Capture Buffer On L3 DBUS:
<SNIP>
--------------------------------------------------------------------
L3 DBUS CONTENT - IPV4 PACKET
--------------------------------------------------------------------
<SNIP>
l2-packet-length : 0x52
ingress-lif : 0xfca
vlan-id : 0x2a
ilm-addr : 0x32
source-index : 0x402
destination-index : 0x0
frame-type : 0x5
sequence-number : 0x94
l2-frame-type : 0x0
l4-protocol : 0x59
recirc-preserve-acos: 0x0
recirc-multicast-bridge-disable: 0x0
ipv4_l4_info_elsewhere_1: 0x0
ipv4_l4_info_elsewhere_2: 0x0
destination-mac-address: 0100.5e00.0005
source-mac-address: 0010.7be8.53b0
source-ipv4-address: 42.42.42.142
destination-ipv4-address: 224.0.0.5

Tools: ELAM, RBUS (F2)
Host 42.42.42.142 (Berlin) on E3/12, F-Series line card.
module-3(clipper-l3-elam)# show rbus
--------------------------------------------------------------------
Clipper Instance 02 - Capture Buffer On L3 RBUS:
<SNIP>
--------------------------------------------------------------------
L3 RBUS OFE CONTENT
--------------------------------------------------------------------
OFE valid: 0x1
trig : 0x1
l2-l3-acos : 0x0
<SNIP>
dvif : 0x0
vlan : 0x2a
md-di-valid : 0x0
redirect : 0x0
ccc : 0x4
l2-forward : 0x1
routed : 0x0
eid-select : 0x0
lif-status-enable : 0x1
bcn-compatible : 0x0
VID 42 := 0x2a
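Most DBUS/RBUS fields are printed in hex, so reading them means converting by hand (VID 42 := 0x2a). The sketch below turns saved `name : value` lines into a dictionary, converting `0x...` values to integers and leaving MACs and IPs as strings. It confirms, for instance, that `l4-protocol : 0x59` is IP protocol 89 (OSPF), matching the 224.0.0.5 destination. The parsing rule (split at the first colon, skip header lines with no value) is an assumption based on the sample layout; `parse_elam_fields` is a hypothetical helper.

```python
def parse_elam_fields(text):
    """Turn 'name : value' lines of a saved ELAM DBUS/RBUS dump into a dict.

    Values written as 0x... become ints; MACs and IPs stay strings.
    Lines without a value after the first colon (section headers) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            continue
        fields[key] = int(value, 16) if value.lower().startswith("0x") else value
    return fields


DBUS = """\
Clipper Instance 02 - Capture Buffer On L3 DBUS:
l2-packet-length : 0x52
vlan-id : 0x2a
source-index : 0x402
l4-protocol : 0x59
destination-mac-address: 0100.5e00.0005
source-ipv4-address: 42.42.42.142
destination-ipv4-address: 224.0.0.5"""

dbus = parse_elam_fields(DBUS)
print(dbus["vlan-id"], dbus["l4-protocol"])
```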
Tools: ELAM on F2, Layer 2
Since the former example indicated no Layer 3 rewrite (routed : 0x0), we now look into Layer 2 ELAM (still triggering on Layer 3 information).
module-3# elam asic clipper instance 2
module-3(clipper-elam)# layer 2
module-3(clipper-l2-elam)# trigger dbus ipv4 if destination-ipv4-address 42.42.42.142
module-3(clipper-l2-elam)# trigger rbus ingress if trig
Diagram: Clipper FE2 on E3/12 with L2 and L3 stages, ingress and egress.
module-3(clipper-l2-elam)# show rbus
<SNIP>
inner-cos : 0x0
acos : 0x0
di-ltl-index : 0x8015
l3-multicast-di : 0x0
source-index : 0x402
vlan-id : 0x2a
index-direct : 0x0
eid-sel : 0x0
vqi : 0xfa
v5-fpoe-idx : 0xf9
l3-fpoe-idx : 0x0
l3-multicast-v5 : 0x0
dft : 0x0
dfst : 0x0

System Troubleshooting

System Troubleshooting: Is My Interface Operational, and Who Owns My Interface?
N7004-Berlin# show int eth 3/13
Ethernet3/13 is down (SFP not inserted)
N7004-Berlin# show int eth 3/12
Ethernet3/12 is up
The interface can be described as the port ASIC, including the MAC controller.
Another view is the software process in the control plane: Ethpm (:= Ethernet Port Manager), which interacts with services such as STP and the VLAN manager (diagram: VID 1, VID 42). An up-to-date network drawing helps.
show system internal vpcm event interaction

System Troubleshooting: Interface Shutdown Timeout
OK, how about shutting down a port (e.g. e1/27)?
N7K(config)# interface e1/27
N7K(config-if)# shut
N7K# show inter e1/27
Ethernet1/27 is down (Internal-Fail errDisable, libeventseq: sequence timeout)
Processes and services depend on each other. Ethpm interacts with each service sequentially (request and response), e.g. Phy_off, 802.1X, PIXM, ACL, QoS, L2FM and STP. As you likely don't know all dependent processes, collect information about the whole environment (e.g. show tech).

System: Reducing MTTR with Core Files
Collect cores from all locations on the active SUP (don't forget your standby SUP) and attach them to the TAC case right away.
N7004# show cores vdc-all
VDC  Module  Instance  Process-name  PID   Date(Year-Month-Day Time)
---  ------  --------  ------------  ----  -------------------------
1    17      1         pixmc         2134  2013-10-28 16:52:48
1    8       1         pixmc         2134  2013-10-28 16:52:50
1    16      1         pixmc         2134  2013-10-28 16:52:50
If you find a file, be prepared to send it (or a selection), or attach the files to the case (e.g. SR 123) right away to save time.
2010 Jul 17 00:30:18 vrt001 %$ VDC-1 %$ %SYSMGR-SLOT8-2-SERVICE_CRASHED: Service "mtm" (PID 1600) hasn't caught signal 6 (core will be saved).
Here you see slot 8, so you know the line card, and MTM is a line-card process.
Diagram: SUP-A (active) and SUP-B (standby), each with VDC-1 and VDC-2. Look in dir bootflash:core on both SUPs to make sure, and clean up.
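The SERVICE_CRASHED message already answers the first triage questions: which VDC, which slot, which service, which PID and which signal. The sketch below extracts them from a syslog line shaped like the example above; the regular expression follows that one sample and is an assumption for other message variants. Signal 6 is SIGABRT and 11 is SIGSEGV on Linux; only those two are mapped. `parse_crash` is a hypothetical helper.

```python
import re

# Linux signal numbers for two signals commonly seen in crash messages.
SIGNALS = {6: "SIGABRT", 11: "SIGSEGV"}

CRASH = re.compile(
    r"VDC-(?P<vdc>\d+) %\$ %SYSMGR-(?:SLOT(?P<slot>\d+)-)?\d-SERVICE_CRASHED: "
    r'Service "(?P<service>[^"]+)" \(PID (?P<pid>\d+)\) '
    r"hasn't caught signal (?P<signal>\d+)"
)


def parse_crash(line):
    """Extract VDC, slot, service, PID and signal from a SERVICE_CRASHED syslog line."""
    m = CRASH.search(line)
    if m is None:
        return None
    signal = int(m["signal"])
    return {
        "vdc": int(m["vdc"]),
        "slot": int(m["slot"]) if m["slot"] else None,  # message without a SLOTn tag
        "service": m["service"],
        "pid": int(m["pid"]),
        "signal": SIGNALS.get(signal, str(signal)),
    }


LINE = ('2010 Jul 17 00:30:18 vrt001 %$ VDC-1 %$ %SYSMGR-SLOT8-2-SERVICE_CRASHED: '
        'Service "mtm" (PID 1600) hasn\'t caught signal 6 (core will be saved).')
print(parse_crash(LINE))
```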
Strategy, Tools and System Troubleshooting: Summary & Take-Away
- Baseline your network: know your counters, have a show tech from a good state, and keep an up-to-date drawing.
- Right after the challenge, collect a show tech detail if feasible (flight recorder).
- Traffic to/from the control plane: Ethanalyzer (Wireshark-based).
- Data-plane capture on the Nexus 7000: ELAME.
- Look at normal counters and logs, and if you need more, remember show hardware internal errors all.

Chapter 2: Data-Plane Layer 2
Stops: MAC table, PIXM, L2FM, STP

Data-Plane: Failure Domain
I am losing packets between A and B. How can I determine where? Determine the failure domain quickly (ELAME):
- 100% traffic loss: table not programmed, wrongly programmed, or inconsistent.
- X% traffic loss: congestion? Periodic? A timer/aging event (e.g. MAC table)?

Architecture: Three-Stage Fabric, Troubleshooting
- Unicast: troubleshoot at the ingress forwarding engine.
- Multicast: replication occurs at the egress line card.
- Congestion: F-Series at ingress, M-Series at egress.
Diagram: ingress module (first stage, EARL 8 or SoC), fabric Xbars, egress module (third/last stage, EARL 8 or SoC).
Suggestion for QoS & queueing: BRKDCT-3346 End-to-End QoS Implementation and Operation.

Data-Plane: Congestion (X% Packet Drops), Troubleshooting
N7009-Lagos# show hardware internal errors all
|------------------------------------------------------------------------|
| Device:Sacramento Xbar ASIC   Role:FABRIC       Mod: 9                  |
| Last cleared @ Fri Nov 15 02:19:12 2013                                 |
| Device Statistics Category :: ERROR                                     |
|------------------------------------------------------------------------|
Instance:0
ID    Name                                      Value             Ports
--    ----                                      -----             -----
2129  FB09-P21 LOW_BP_CNT_IN                    0000000000000099  1-48 I1-2
|------------------------------------------------------------------------|
| Device:Clipper XBAR           Role:QUE          Mod: 9                  |
| Last cleared @ Fri Nov 15 05:18:38 2013                                 |
| Device Statistics Category :: CONGESTION                                |
|------------------------------------------------------------------------|
Instance:0
ID    Name                                      Value             Ports
--    ----                                      -----             -----
132   VQ credited pkt replica VOQ tail drops    0000000000000189  1-4 -
137   VQ credited pkt replica drop count        0000000000000189  1-4 -
9602  VQ VQI 204 CCOS 3 drop count              0000000000000189  1-4 -
Devices: Clipper and Sacramento. BP := backpressure.
This is not the first command to use. It displays non-zero error counters: run the command, test with x packets, then run the command again.
The System FPGA version on FAB2 needs to be PM 0.007 for SUP-2/2E.
Verify the system status before troubleshooting.

Data-Plane: Line Card Architecture
LC families:
- EARL-based line cards: M-Series (:= M1, M2)
- SoC-based line cards: F-Series (:= F1, F2, F3)
M2 (EARL 8): up to 60 Mpps; 2 x per LC.
SoC (e.g. F2E Clipper): up to 60 Mpps per SoC.
SoCs per line card: F1 16 x; F2/F2E 12 x; F3 N7K (and all 1G/10G) 6 x; F3 N77 12 x.
Legend: P := port ASIC, R := replication engine, Q := queuing engine, FE := forwarding engine (L2, L3). Diagram: M-Series and F-Series line cards, both attached to the fabric ASICs.
Suggestion: BRKARC-3470 Advanced - Cisco Nexus 7000/7700 Switch Architecture

Data-Plane: Forwarding
Similar to show platform hardware capacity forwarding on the C6K:
N7004-Berlin# show hardware internal forwarding engine usage slot 4
Forwarding Engine Usage
-----------------------
Module  inst  pps  peak pps
4       1     0    4 @Tue Nov 26 20:17:33 2013
The forwarding engine usage command above works for M-Series line cards (slot 4 here). For F-Series line cards (module 3 here is an F2), use:
N7004-Berlin# show hardware internal statistics module 3 rates
Hardware statistics on module 03:
+ =============================
+ Clipper MAC Instance 0
+ =============================
|-- Ingress IN
|   |--- Packets/sec
|   |   |--- 2: 0
|   |   |--- 1: 0
|   |   |--- 3: 0
|   |   |--- 4: 0
|   |   |--- sum: 0
|   |--- Bytes/sec
|   |   |--- 2: 3
|   |   |--- 1: 75
|   |   |--- 3: 91
|   |   |--- 4: 0
|   |   |--- sum: 169
|-- Egress OUT
|   |--- Packets/sec
|   |   |--- 2: 0
|   |   |--- 1: 0
|   |   |--- 3: 0
|   |   |--- 4: 0
|   |   |--- sum: 0
|   |--- Bytes/sec
|   |   |--- 2: 3
|   |   |--- 1: 76
|   |   |--- 3: 87
|   |   |--- 4: 73
|   |   |--- sum: 239
Diagram: FE 0 with E3/1 (vPC PKA := peer-keepalive) and E3/2 & E3/3 (vPC PL := peer-link).

Data-Plane: Line Card Components (LC Internals)
module-1# show hardware internal dev-port-map
--------------------------------------------------------------
CARD_TYPE: 12 port 100G
>Front Panel ports:12
--------------------------------------------------------------
Device name               Dev role             Abbr       num_inst:
--------------------------------------------------------------
> Flanker Eth Mac Driver  DEV_ETHERNET_MAC     MAC_0      12
> Flanker Fwd Driver      DEV_LAYER_2_LOOKUP   L2LKP      12
> Flanker Xbar Driver     DEV_XBAR_INTF        XBAR_INTF  12
> Flanker Queue Driver    DEV_QUEUEING         QUEUE      12
> Sacramento Xbar ASIC    DEV_SWITCH_FABRIC    SWICHF     2
> Flanker L3 Driver       DEV_LAYER_3_LOOKUP   L3LKP      12
> EDC                     DEV_UNDEFINED        PHYS       12
+-----------------------------------------------------------------------+
+----------------+++FRONT PANEL PORT TO ASIC INSTANCE MAP+++------------+
+-----------------------------------------------------------------------+
FP port | PHYS | MAC_0 | L2LKP | L3LKP | QUEUE |SWICHF
1 0 0 0 0 0,1
2 1 1 1 1 0,1
3 2 2 2 2 0,1
4 3 3 3 3 0,1
5 4 4 4 4 0,1
<SNIP>
Diagram: EDC0/EDC1, Flanker 0/Flanker 1, SAC0/SAC1.
Layer 2: Hardware Learning
Berlin, Po110. MAC address table: 16K, 64K, or 128K entries.
N7004-Berlin# show mac address-table vlan 1
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False
   VLAN     MAC Address      Type      age     Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
G    1     0000.0c9f.f001    static       -       F    F  sup-eth1(R)
G    1     4055.390f.5642    static       -       F    F  sup-eth1(R)
*    1     4055.390f.5643    static       -       F    F  vPC Peer-Link
*    1     000c.308b.a040    dynamic      0       F    F  Po110
N7004-Berlin# show hardware internal forwarding f2 l2 table utilization
L2 entries: Module inst total used mcast ucast lines lines_full
            3      0    16384 15   0     15    512   0
N7004-Berlin# show hardware internal forwarding l2 table utilization
L2 entries: Module inst total used mcast ucast lines lines_full
            4      1    131072 22  8     14    8192  0
Synchronization via CFS.

Layer 2: Hardware Learning, Moves and Aging
Diagram: line cards 1 to 3 (E1/1, E2/2, E3/3, PO1), each with its own hardware MAC table (MAC, index, flag) holding MAC A on PO1 and MAC C on 3/3 (flag PI_E).
show mac address-table vs. show hardware mac address-table
Learning and aging are optimized for physical and logical ports (:= port channels), with additional signaling via L2FM.
N7004-Berlin(config)# logging level l2fm 6
2013 Dec 17 02:52:46 N7004-London %$ VDC-3 %$ %L2FM-4-L2FM_MAC_MOVE: Mac f0de.f1f2.c804 in vlan 42 has moved from Eth3/37 to Eth3/41
2013 Dec 17 02:53:00 N7004-London %$ VDC-3 %$ %L2FM-4-L2FM_MAC_MOVE: Mac f0de.f1f2.c804 in vlan 42 has moved from Eth3/41 to Eth3/37
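A MAC that keeps moving back and forth between the same ports, like f0de.f1f2.c804 between Eth3/37 and Eth3/41 above, is a classic loop or dual-homing symptom. The sketch below groups L2FM_MAC_MOVE messages by MAC and VLAN and reports every MAC that moved at least `min_moves` times, together with the ports involved. The threshold of two moves is an arbitrary illustration, and the regular expression follows the sample message format only.

```python
import re
from collections import defaultdict

MOVE = re.compile(
    r"L2FM_MAC_MOVE: Mac (?P<mac>[0-9a-f.]+) in vlan (?P<vlan>\d+) "
    r"has moved from (?P<src>\S+) to (?P<dst>\S+)"
)


def mac_moves(lines):
    """Group (from_port, to_port) moves by (MAC, VLAN)."""
    moves = defaultdict(list)
    for line in lines:
        m = MOVE.search(line)
        if m:
            moves[(m["mac"], int(m["vlan"]))].append((m["src"], m["dst"]))
    return moves


def flapping(moves, min_moves=2):
    """MACs that moved at least min_moves times, with every port involved."""
    return {key: sorted({port for hop in hops for port in hop})
            for key, hops in moves.items() if len(hops) >= min_moves}


LOGS = [
    "2013 Dec 17 02:52:46 N7004-London %$ VDC-3 %$ %L2FM-4-L2FM_MAC_MOVE: "
    "Mac f0de.f1f2.c804 in vlan 42 has moved from Eth3/37 to Eth3/41",
    "2013 Dec 17 02:53:00 N7004-London %$ VDC-3 %$ %L2FM-4-L2FM_MAC_MOVE: "
    "Mac f0de.f1f2.c804 in vlan 42 has moved from Eth3/41 to Eth3/37",
]
print(flapping(mac_moves(LOGS)))
```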
Layer 2: Looking for the Internal History of a MAC Address (L2FM)
Looking back in time for a specific MAC address:
N7004-London(config)# show system int l2fm l2dbg macdb address f0de.f1f2.c804
Legend
Db: 0-MACDB, 1-GWMACDB, 2-SMACDB, 3-RMDB, 4-SECMACDB
Src: 0-UNKNOWN, 1-L2FM, 2-PEER, 3-LC, 4-HSRP
     5-GLBP, 6-VRRP, 7-STP, 8-DOTX, 9-PSEC 10-CLI 11-PVLAN
     12-ETHPM, 13-ALW_LRN, 14-Non_PI_MOD, 15-MCT_DOWN, 16 - SDB
     17-OTV, 18-Deounce Timer, 19-AM, 20-PCM_DOWN, 21-MCT_UP, 22-L2VPN
Slot:0 based for LCS
     19-MCEC 20-OTV/ORIB
VLAN: 42 MAC: f0de.f1f2.c804
Time                      If/swid     Db  Op                 Src  Slot  FE
Sat Dec 14 22:18:20 2013  0x1a124000  0   INSERT             3    2     9
Sat Dec 14 22:18:20 2013  0x1a124000  0   RESET_LL_UNDERWAY  2    0     15
Sat Dec 14 22:18:51 2013  0x1a124000  0   NON_PI_MOD         3    2     15
Sat Dec 14 22:18:51 2013  0x1a124000  0   NON_PI_MOD         3    2     15
Sat Dec 14 22:18:51 2013  0x1a124000  0   NON_PI_MOD         3    2     15
Sat Dec 14 22:19:31 2013  0x1a124000  0   FLUSH              12   0     15
Sat Dec 14 22:19:31 2013  0x1a124000  0   DELETE             0    0     15
Sat Dec 14 22:19:36 2013  0x1a128000  0   INSERT             3    2     10
N7004-London# show interface snmp-ifindex |i 1a124000
Eth3/37    437403648    (0x1a124000)
(columns: port, IF-MIB ifIndex in decimal, ifIndex in hex)

Layer 2: Internal Header with Metadata to Make the Forwarding Decision and Steer the Frame
LTL := Local Target Logic, e.g. the source index (SI) and destination index (DI), such as 0x00402.
BD := bridge domain.
The internal header is added by the port ASIC or the SoC (FE) (diagram: E3/1).
The ingress L2 logic learns MAC addresses in hardware (M- and F-Series). We add an internal header to carry the needed information: SI, DI, VLAN, ... (diagram: SI = BAh, DI = 402h). The header is removed again before the original packet leaves the switch.
N7004-Berlin# show hardware mac address-table 3 address 000c.308b.a040   !reformatted!
FE | Valid| PI| BD | MAC            | Index  | Stat| SW   | Modi| Age| Tmr| GM
---+------+---+----+----------------+--------+-----+------+-----+----+----+---
0  | 1    | 0 | 17 | 000c.308b.a040 | 0x00402| 0   | 0x009| 0   | 121| 1  | 0
2  | 1    | 1 | 17 | 000c.308b.a040 | 0x00402| 0   | 0x009| 0   | 121| 1  | 0

Layer 2: Internal Indices
An interface is assigned one or more index values (e.g. Po110: 0402h, plus flood index 8011h). Internally we use the concept of bridge domains, which map to VLAN IDs (here VDC 2, BD 17 := VLAN 1).
N7004-Berlin# show system internal pixm info ltl 0x00402
PC_TYPE  PORT   LTL     RES_ID      LTL_FLAG    CB_FLAG     MEMB_CNT
------------------------------------------------------------------------------
Normal   Po110  0x0402  0x1600006d  0x00000000  0x00000002  1
Member   rbh         rbh_cnt
Eth3/12  0x000000ff  0x08
CBL Check States: Ingress: Enabled; Egress: Enabled
VLAN| BD   | BD-St            | CBL St & Direction:
--------------------------------------------------
1   | 0x11 | INCLUDE_IF_IN_BD | FORWARDING (Both)
Member info
------------------
Type          LTL
----------------------
PORT_CHANNEL  Po110
FLOOD_W_FPOE  0x8011
N7004-Berlin# show vlan internal bd-info bd-to-vlan 17
VDC Id  BD Id  Vlan Id
------  -----  -------
2       17     1
To convert a BD to a VLAN ID, pass the BD in decimal: 0x11 (11h) = 17. The CBL states reflect STP on ingress and egress.
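This part of the chapter translates several hex indices by hand: the SNMP ifIndex 0x1a124000 to Eth3/37, BD 0x11 to 17, and LTLs such as 0x0402 into their PIXM region. The sketch below bundles those conversions. The ifIndex layout (0x1a000000 prefix, slot-1 in bits 19 to 24, port-1 in bits 12 to 18) is inferred from the values in this section, not taken from Cisco documentation: it maps 0x1a124000 to Eth3/37 as show interface snmp-ifindex does, and 0x1a128000 from the macdb history to Eth3/41, consistent with the L2FM_MAC_MOVE log. Confirm with show interface snmp-ifindex before relying on it. The LTL ranges are copied from the show system internal pixm info ltl-region output for SUP-2 and NX-OS 6.2(5.41); that output is snipped, so indices outside the listed ranges, such as the flood index 0x8011, are reported as unknown. All function names are hypothetical.

```python
ETH_IFINDEX_BASE = 0x1A000000  # prefix of the Ethernet ifIndex values seen here


def decode_eth_ifindex(ifindex):
    """0x1a124000 -> 'Eth3/37' (inferred layout: slot-1 in bits 19-24, port-1 in bits 12-18)."""
    if ifindex >> 25 != ETH_IFINDEX_BASE >> 25:
        raise ValueError(f"not an Ethernet ifIndex: {ifindex:#x}")
    slot = ((ifindex >> 19) & 0x3F) + 1
    port = ((ifindex >> 12) & 0x7F) + 1
    return f"Eth{slot}/{port}"


def encode_eth_ifindex(slot, port):
    """Inverse of decode_eth_ifindex under the same inferred layout."""
    return ETH_IFINDEX_BASE | (slot - 1) << 19 | (port - 1) << 12


def bd_to_decimal(bd_hex):
    """BD '0x11' from the pixm output -> 17, as 'show vlan internal bd-info bd-to-vlan' expects."""
    return int(bd_hex, 16)


# (region, start, end) from 'show system internal pixm info ltl-region'
# on SUP-2 / NX-OS 6.2(5.41); other supervisors or releases may differ.
LTL_REGIONS = [
    ("PHY_PORT", 0x0, 0x3FF),
    ("PC", 0x400, 0x1083),
    ("SUP_FUTURE", 0x1084, 0x10C6),
    ("SUP_ETH_INBAND", 0x10C7, 0x1106),
]


def ltl_region(ltl):
    """Region name for an LTL index, or None if it lies outside the listed ranges."""
    for name, start, end in LTL_REGIONS:
        if start <= ltl <= end:
            return name
    return None


print(decode_eth_ifindex(0x1A128000))          # Eth3/41
print(ltl_region(0x0402), ltl_region(0x000B))  # PC PHY_PORT
print(bd_to_decimal("0x11"))                   # 17
```

As a sanity check, each range's size (end - start + 1) reproduces the SIZE column of the ltl-region output: 1024, 3204, 67 and 64.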
Layer 2: Internal Indices Table (PIXM)
E3/12: 000bh; Po110: 0402h (flood 8011h); SUP: 10C7h, 10C8h.
The SUP LTL setup shown here is for SUP-2 and NX-OS 6.2(5.41).
N7004-London# show system internal pixm info ltl-region
===========================================================
PIXM VDC 1 LTL MAP
Version: 2
Description: LTL Map for N7K SUP2 Silverstone (all flavors)
===========================================================
LTL_TYPE                           SIZE  START   END
========================================================================
LIBLTLMAP_LTL_TYPE_PHY_PORT        1024  0x0     0x3ff
LIBLTLMAP_LTL_TYPE_PC              3204  0x400   0x1083
LIBLTLMAP_LTL_TYPE_SUP_FUTURE      67    0x1084  0x10c6
LIBLTLMAP_LTL_TYPE_SUP_ETH_INBAND  64    0x10c7  0x1106
-------------------------------------------------------------------
SUB-TYPE                           LTL
-------------------------------------------------------------------
LIBLTLMAP_LTL_TYPE_SUP_INBAND_HQ   0x10c7
LIBLTLMAP_LTL_TYPE_SUP_INBAND_LQ   0x10c8
<SNIP>

Layer 2: STP
Diagram: the root sends configuration BPDUs out of its designated ports (DP) toward the root ports (RP); TCN BPDUs travel toward the root.
DP := designated port, RP := root port, BPDU := Bridge Protocol Data Unit.
Two BPDU types: configuration BPDUs and TCN BPDUs.
Know your port states in a stable condition (:= before troubleshooting, prepare yourself).
For vPC with the peer-switch configuration, both devices send BPDUs as root.
Tracking port-role changes and root changes via syslog (NX-OS 4.2(6), 5.0(2a)):
logging level spanning-tree 6
%STP-6-PORT_ROLE: Port Ethernet2/1 instance VLAN0001 role changed to designated

Layer 2: Spanning Tree, Data Loop
Symptoms of a data loop:
- High link utilization (100%)
- High CPU and fabric traffic utilization
- Constant MAC address re-learning and flapping
- Excessive output drops on an interface
N7004-Berlin# show interface e 3/7 | i rate
30 seconds input rate 24 bits/sec, 0 packets/sec
30 seconds output rate 304 bits/sec, 0 packets/sec
300 seconds input rate 104 bits/sec, 0 packets/sec
300 seconds output rate 424 bits/sec, 0 packets/sec
No loop in my lab today.
Verify each switch on the redundant path: someone who is supposed to block is forwarding...
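The first loop symptom, link utilization near 100%, can be checked by turning the rate lines into a share of the link speed. The sketch below parses lines shaped like the show interface | i rate output above. The port speed is not in that output, so `LINK_BPS` is a hypothetical 10 Gb/s value that you would replace with the interface's real bandwidth; helper names are likewise hypothetical.

```python
import re

RATE = re.compile(
    r"(?P<window>\d+) seconds (?P<direction>input|output) rate "
    r"(?P<bps>\d+) bits/sec, (?P<pps>\d+) packets/sec"
)


def parse_rates(text):
    """{(window_seconds, 'input'|'output'): (bits_per_sec, packets_per_sec)}."""
    return {(int(m["window"]), m["direction"]): (int(m["bps"]), int(m["pps"]))
            for m in RATE.finditer(text)}


def utilization_pct(bps, link_bps):
    """Share of the link speed in percent; link_bps is supplied by the caller."""
    return 100.0 * bps / link_bps


OUTPUT = """\
30 seconds input rate 24 bits/sec, 0 packets/sec
30 seconds output rate 304 bits/sec, 0 packets/sec
300 seconds input rate 104 bits/sec, 0 packets/sec
300 seconds output rate 424 bits/sec, 0 packets/sec"""

LINK_BPS = 10_000_000_000  # hypothetical 10 Gb/s port; use the real interface speed
for (window, direction), (bps, _) in sorted(parse_rates(OUTPUT).items()):
    print(f"{window:>3}s {direction:<6} {utilization_pct(bps, LINK_BPS):.6f} %")
```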
In the real world we see loops created by blade servers, NIC teaming and hypervisors (i.e., virtual switches).

Layer 2: Spanning Tree, Data Loop
Verifying the path systematically:

N7004-Berlin# show spanning-tree interface ethernet 3/7 detail
 Port 391 (Ethernet3/7) of VLAN0042 is designated forwarding
 <SNIP>
 BPDU: sent 1972, received 5

N7004-Paris# show spanning-tree interface ethernet 4/2 detail
 Port 514 (Ethernet4/2) of VLAN0042 is root forwarding
 <SNIP>
 BPDU: sent 5, received 2007

N7004-Berlin# show system internal pktmgr interface ethernet 3/7
Ethernet3/7, ordinal: 80
  Hash_type: 2
  SUP-traffic statistics: (sent/received)
    Packets: 2217 / 82
    Bytes: 139163 / 17376
    Instant packet rate: 0 pps / 0 pps
    Packet rate limiter (Out/In): 0 pps / 0 pps
    Average packet rates(1min/5min/15min/EWMA):
    Packet statistics:
      Tx: Unicast 0, Multicast 2217
<SNIP>
[Topology: Paris, Berlin, Moscow and London in VLAN 42 (London E4/17); on each hop verify with STP, pktmgr, Ethanalyzer and ELAME]

Layer 2: Spanning Tree
N7004-London(config-if)# spanning-tree port type edge
Warning: Edge port type (portfast) should only be enabled on ports connected to a single host. Connecting hubs, concentrators, switches, bridges, etc... to this interface when edge port type (portfast) is enabled, can cause temporary bridging loops.
Use with CAUTION

N7004-London(config-if)# show spanning-tree vlan 1 detail
 VLAN0001 is executing the rstp compatible Spanning Tree protocol
  Bridge Identifier has priority 32768, sysid 1, address 4055.390f.5643
  Configured hello time 2, max age 20, forward delay 15
  Current root has priority 32769, address 000c.308b.a040
  Root port is 4195 (port-channel100), cost of root path is 2
  Topology change flag not set, detected flag not set
  Number of topology changes 2 last change occurred 0:15:50 ago
          from port-channel100
<SNIP>
What is our STP role?
Are we stable? Was a TCN sent or received? If yes, through which interface did we receive the last TCN? In case of an access port, enable portfast.

Layer 2: STP Event History
N7004-London(config-if)# sh spanning-tree internal event-history tree 1 interface port-channel 110
VDC03 VLAN0001 <port-channel110>
0) Transition at 795697 usecs after Sat Dec 14 21:20:53 2013
   State: DIS Role: Unkw Age: 0 Inc: no
   [STP_PORT_EV_UP]
<SNIP>
5) Transition at 800886 usecs after Sat Dec 14 21:20:53 2013
   State: FWD Role: Root Age: 0 Inc: no
   [STP_PORT_ROLE_CHANGE]
Looking back in time for STP with the event history: 800886 us - 795697 us = 5189 us (~5.2 ms) from port up to forwarding as root port.

Layer 2 Summary: Flight Recorder
show tech vlan
show tech stp
show tech lacp
show tech l2fm
show tech forwarding l2 unicast
[Diagram: MAC table, PIXM, L2FM, STP]
Always look for the detail option and use it.

Chapter 3: Data-Plane Layer 3
(stations: uRIB, Ex SPAN, LC, Control-Plane)

Layer 3 Unicast Routing Architecture
[Diagram: routing protocols (OSPF, IS-IS, RIP, BGP) and IP install routes and adjacencies into uRIB (mRIB for multicast); uRIB feeds uFDM, which feeds the FIB Manager]
The RIB is fully resolved and is used for packets originated by the control plane.
Three areas to verify:
- Is the control plane state as expected (the route exists and points to the expected next hop)?
- Is the control plane stable?
- Is the control plane consistent with the data plane (route programmed in the forwarding plane, consistent with the control plane)?
[Diagram: control plane = neighbor management, protocol database, add/delete prefixes, push routes to the platform (route download); data plane = translate routes to hardware format, program the hardware forwarding engine]

Layer 3 Unicast
[Topology: Paris (42.42.42.4) runs OSPF process ospf-42 in VLAN 42 and learns 11.0.0.1/32 via 42.42.42.142]
N7004-Paris# show ip ospf 42 internal txlist urib
 ospf 42
 ospf process tag 42
 ospf process instance number 1
 ospf process uuid 1090519321
 ospf process linux pid 7746
<SNIP>
OSPFv2->URIB transmit list: version 0x10
 13: 42.42.42.0/24
 14: 11.0.0.1/32
 15: 10.0.2.0/24
 16: 10.0.4.0/24
 16: RIB marker

N7004-Paris# show processes cpu sort | i PID|7746
PID    Runtime(ms)  Invoked   uSecs  5Sec   1Min   5Min   TTY  Process
7746   10450        502752    0      0.00%  0.01%  0.01%  -    ospf
[Diagram: OSPF-42 (SAP 320) -> uRIB (route, adj) <- ARP]
Assumption: the control plane is stable and OSPF receives LSAs. We follow the flow of information from OSPF to hardware. If it is not as expected, check a) the configuration (hidden slides before this one) and b) the control plane.

Layer 3 Unicast: uRIB
N7004-Paris# show ip route ospf-42 detail
<SNIP>
255.255.255.255/32, ubest/mbest: 1/0
    *via sup-eth1, [0/0], 01:59:22, broadcast
11.0.0.1/32, ubest/mbest: 1/0
    *via 42.42.42.142, Vlan42, [110/41], 01:57:18, ospf-42, inter
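Each `*via` line in `show ip route` contains several fields:
- the next hop
- an optional outgoing interface
- `[administrative distance/metric]`
- the route age
- the route source

The sketch below is an illustrative parser, not a Cisco tool. It pulls these fields out so you can check the AD and the age quickly. The regular expression is an assumption based only on the route lines shown in this session.

```python
import re

# Fields of a uRIB next-hop line, e.g.
# "*via 42.42.42.142, Vlan42, [110/41], 01:57:18, ospf-42, inter"
# The interface is optional (the BGP route in this session has none).
VIA_RE = re.compile(
    r"\*via (?P<nh>[^,]+), (?:(?P<intf>[^,\[]+), )?"
    r"\[(?P<ad>\d+)/(?P<metric>\d+)\], (?P<age>[^,]+), (?P<source>[^,]+)"
)


def parse_via(line):
    """Return the next-hop fields as a dict, or None if the line does not match."""
    m = VIA_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["ad"] = int(fields["ad"])
    fields["metric"] = int(fields["metric"])
    return fields


print(parse_via("*via 42.42.42.142, Vlan42, [110/41], 01:57:18, ospf-42, inter"))
```

An AD of 110 confirms the route came from OSPF. Comparing the age against the neighbor uptime is part of the "are we stable?" question.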
N7004-Paris# sh ip arp 42.42.42.142
<SNIP>
IP ARP Table
Total number of entries: 1
Address         Age       MAC Address     Interface
42.42.42.142    00:03:39  0010.7be8.53b0  Vlan42
OSPF routes in uRIB have their administrative distance assigned.
- Is there a route to the destination?
- Do we have a resolved Layer 2 address?

N7004-Paris# sh ip ospf 42 route
<SNIP>
 11.0.0.1/32 (inter)(R) area 0.0.0.0
   via 42.42.42.142/Vlan42 , cost 41 distance 110
 (D) route is directly attached (R) route is in RIB

Layer 3 Unicast: Forwarding Hardware (FIB Manager)
N7004-Paris# show forwarding ipv4 route 11.0.0.1 module 4
IPv4 routes for table default/base
------------------+------------------+----------------------+-----------------
Prefix            | Next-hop         | Interface            | Labels
------------------+------------------+----------------------+-----------------
11.0.0.1/32         42.42.42.142       Vlan42
N7004-Paris# show forwarding adjacency 42.42.42.142 module 4
IPv4 adjacency information
next-hop        rewrite info    interface
--------------  --------------- -------------
42.42.42.142    0010.7be8.53b0  Vlan42
N7004-Paris# show ip arp 42.42.42.142
Address         Age       MAC Address     Interface
42.42.42.142    00:08:56  0010.7be8.53b0  Vlan42
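Comparing the two outputs above by eye works for one next hop. For many next hops, a small script helps. This is a minimal sketch that assumes the column layouts shown here: ARP has address, age, MAC and interface; the adjacency has next-hop, rewrite MAC and interface. All helper names are hypothetical. The script reports next hops whose MAC or interface differ between ARP and the hardware adjacency.

```python
import ipaddress


def rows(text):
    """Yield whitespace-split rows whose first field is an IPv4 address."""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            ipaddress.IPv4Address(fields[0])
        except ValueError:
            continue  # header, separator or <SNIP> line
        yield fields


def arp_entries(text):
    """show ip arp: Address, Age, MAC Address, Interface -> {ip: (mac, interface)}."""
    return {f[0]: (f[2], f[3]) for f in rows(text) if len(f) >= 4}


def adjacency_entries(text):
    """show forwarding adjacency: next-hop, rewrite info, interface -> {ip: (mac, interface)}."""
    return {f[0]: (f[1], f[2]) for f in rows(text) if len(f) >= 3}


def mismatches(arp_text, adj_text):
    """Next hops whose (MAC, interface) differ between ARP and the hardware adjacency."""
    arp, adj = arp_entries(arp_text), adjacency_entries(adj_text)
    return {
        ip: (arp.get(ip), adj.get(ip))
        for ip in sorted(arp.keys() | adj.keys())
        if arp.get(ip) != adj.get(ip)
    }


arp_out = """Address Age MAC Address Interface
42.42.42.142 00:08:56 0010.7be8.53b0 Vlan42"""
adj_out = """next-hop rewrite info interface
-------------- --------------- -------------
42.42.42.142 0010.7be8.53b0 Vlan42"""
print(mismatches(arp_out, adj_out) or "ARP and adjacency agree")
```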
Is the adjacency consistent with ARP in the control plane? Verify on the ingress line card:
- show forwarding ipv4 route displays hardware forwarding (FIB) information on a per-module basis.
- show forwarding adjacency displays hardware adjacency table information.

Layer 3 Unicast: Forwarding Hardware, verifying on the ingress line card
N7004-Paris# show system internal forwarding route 11.0.0.1 module 4 detail
RPF Flags legend:
  S - Directly attached route (S_Star)
  V - RPF valid
  M - SMAC IP check enabled
  G - SGT valid
  E - RPF External table valid
N7004-Paris# show system internal forwarding adjacency entry 0xa038 module 4
Device: 1 Index: 0xa038 dmac: 0010.7be8.53b0 smac: 0055.390f.5644
  e-vpn: 7 e-lif: 0x35 packets: 0 bytes: 0

Layer 3 Unicast: Verification
Verify L2/L3 reachability with a multicast ping and with a maximum-size unicast ping.
N7004-Paris# ping multicast 224.0.0.5 interface vlan 42
PING 224.0.0.5 (224.0.0.5): 56 data bytes
64 bytes from 42.42.42.5: icmp_seq=0 ttl=254 time=0.836 ms
64 bytes from 42.42.42.5: icmp_seq=1 ttl=254 time=0.685 ms
64 bytes from 42.42.42.5: icmp_seq=2 ttl=254 time=0.613 ms
<SNIP>
64 bytes from 42.42.42.142: icmp_seq=0 ttl=254 time=4.461 ms
64 bytes from 42.42.42.142: icmp_seq=1 ttl=254 time=5.007 ms
64 bytes from 42.42.42.142: icmp_seq=2 ttl=254 time=5.771 ms
<SNIP>
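The 1472-byte size of the unicast ping that follows comes from header arithmetic:
- The IP MTU is 1500 bytes.
- The IPv4 header (no options) takes 20 bytes.
- The ICMP header takes 8 bytes.

That leaves 1472 bytes of data. The reply line then reports 1480 bytes because it counts the ICMP header too, just as the 56 data bytes above are reported as 64 bytes. A minimal sketch, assuming no IP options:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(ip_mtu):
    """Largest ping data size that fits into one IP packet of size ip_mtu."""
    return ip_mtu - IPV4_HEADER - ICMP_HEADER


def reply_size(payload):
    """Size printed in '... bytes from' lines: ICMP header plus data."""
    return payload + ICMP_HEADER


print(max_icmp_payload(1500), reply_size(max_icmp_payload(1500)))  # 1472 1480
```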
N7004-Paris# ping 42.42.42.142 packet-size 1472
PING 42.42.42.142 (42.42.42.142): 1472 data bytes
1480 bytes from 42.42.42.142: icmp_seq=0 ttl=254 time=5.493 ms
1480 bytes from 42.42.42.142: icmp_seq=1 ttl=254 time=5.37 ms
1480 bytes from 42.42.42.142: icmp_seq=2 ttl=254 time=5.337 ms
<SNIP>
Why not 1500? 1500 - 20 (IP) - 8 (ICMP) = 1472
[Diagram: ICMP path through CoPP and RL]
Better alternatives to OSPF debugs: Ethanalyzer and ELAME.

Layer 3 Inconsistency Example
Test for inconsistency:
N7K# test forwarding inconsistency
N7K# show forwarding inconsistency
IPV4 Consistency check : table_id(0x13)
Execution time : 14327 ms ()
No inconsistent adjacencies.
Inconsistent routes:
1. slot(1), vrf(default), prefix (172.31.38.6/32), Route extra in FIB Software
2. slot(1), vrf(default), prefix (172.31.38.2/32), Route extra in FIB Software

N7K# show ip route 172.18.144.2
IP Route Table for VRF "default"
<SNIP>
172.18.144.0/24, ubest/mbest: 1/0
    *via 172.31.38.2, [200/0], 1d22h, bgp-65000, internal, tag 64949
N7K# show ip fib route 172.18.144.2
<SNIP>
------------------+------------------+----------------------+--------
Prefix            | Next-hop         | Interface            | Labels
------------------+------------------+----------------------+--------
*172.18.144.0/24    0.0.0.0            Null0
The RIB points to 172.31.38.2, but the FIB points to Null0. How can we recover? (show forwarding ipv4 route 172.18.144.2 module 1)
[Diagram: uRIB route -> FIB Manager]

Layer 3 Internal Security Check (IDS)
This check drops various illegal packets. These drops can also be seen in show hardware internal errors, but there they may look a bit more cryptic. The checks can be disabled via hardware ip verify in the default VDC (for all VDCs).
N7004-Paris# show hardware forwarding ip verify module 4
IPv4 IDS Checks                Status     Packets Failed
-----------------------------+---------+------------------
address source broadcast       Enabled    0
address source multicast       Enabled    0
address destination zero       Enabled    0
address identical              Disabled   --
address reserved               Disabled   --
address class-e                Disabled   --
checksum                       Enabled    0
protocol                       Enabled    0
fragment                       Disabled   --
length minimum                 Enabled    0
length consistent              Enabled    0
length maximum max-frag        Enabled    0
length maximum udp             Disabled   --
length maximum max-tcp         Enabled    0
tcp flags                      Disabled   --
tcp tiny-frag                  Enabled    0
version                        Enabled    0
<SNIP>
IDS: how do we identify the source or sender?

Layer 3 & Tools: Exception SPAN Examples
[Diagram: on the line card, the forwarding engine and exception redirect table assign a destination index (DI) of SUP or drop; the SPAN engine sends copies out via SPAN (E3/37) or ERSPAN]
Use inband SPAN for:
- MTU failures
- TTL errors
- ICMP redirects
Use exception SPAN for:
- IP option failures
- IP checks
- RPF failures
- Unsupported rewrites

N7004-Berlin(config)# monitor session 1
N7004-Berlin(config-monitor)# source interface sup-eth 0 both
or
N7004-Berlin(config-monitor)# source exception [layer 3 | fabricp | other | all]

With exception SPAN, the destination index DI := drop can be changed to the SPAN engine.

Data-Plane Layer 3: Flight Recorder
show tech-support routing ip unicast
show tech-support ip
show tech-support ospf
show tech forwarding l3 unicast mod 4
show tech eltm
show tech ethpm
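The IDS table from show hardware forwarding ip verify is long, and a non-zero counter is easy to overlook. The following is a minimal illustrative parser (hypothetical helper names) that assumes the row layout "check, status, packets failed". It lists the enabled checks that are actually dropping packets. It only tells you which check fired; identifying the sender still needs exception SPAN or ELAM.

```python
import re

# One row of "show hardware forwarding ip verify", e.g.
# "address source broadcast   Enabled   0" or "fragment   Disabled   --"
ROW_RE = re.compile(
    r"^\s*(?P<check>[a-z][a-z -]*?)\s+(?P<status>Enabled|Disabled)\s+(?P<failed>\d+|--)\s*$"
)


def parse_ids(text):
    """Return {check: (status, failed_packets or None)}."""
    checks = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            failed = None if m["failed"] == "--" else int(m["failed"])
            checks[m["check"]] = (m["status"], failed)
    return checks


def failing_checks(checks):
    """Enabled checks with a non-zero drop counter, i.e. where to start looking."""
    return sorted(
        name for name, (status, failed) in checks.items()
        if status == "Enabled" and failed
    )


sample = """IPv4 IDS Checks Status Packets Failed
address source broadcast Enabled 0
address identical Disabled --
checksum Enabled 0
length maximum max-tcp Enabled 0"""
print(failing_checks(parse_ids(sample)) or "no IDS drops")
```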
Data path: L2 | L3 | OTV | MPLS, unicast | multicast

Chapter 4: Control Plane
(stations: Inband, Concept, Trigger, CoPP, Netstack, RL, Inband)

Architecture: Inband Path
[Diagram: line card forwarding engine with RL and CoPP, system controller, inband (1G/10G) and management port into the SUP with multiple CPU cores running processes (PID) such as OSPF; tools: ELAME, Ethanalyzer, kernel; reference points 1 and 2]
Two tasks:
- Looking for dropped packets that are destined for the control plane
- Identifying from where/what is being sent to/from the CPU
High CPU due to:
- Punted traffic
- ACL processing
- Control plane tasks

Control-Plane: High Inband Traffic (CoPP customizing slides hidden for reference)
The trigger, or why you start looking: syslog messages report OSPF neighbor failures (neighbor 192.251.19.22, network 40.9.0.0).
2011 Mar 26 15:38:56.395 N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process 6467, Nbr 192.251.19.22 on Vlan19 from INIT to DOWN, DEADTIME
2011 Mar 26 15:38:56.584 N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process 6467, Nbr 192.251.19.22 on Vlan19 from DOWN to INIT, HELLORCVD
2011 Mar 26 15:39:33.865 N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process 6467, Nbr 192.251.19.22 on Vlan19 from INIT to DOWN, DEADTIME
2011 Mar 26 15:39:35.754 N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process 6467, Nbr 192.251.19.22 on Vlan19 from DOWN to INIT, HELLORCVD

Control-Plane: Stable Environment? L3 Resources
- How long since the route was added?
- How long since ARP has been updated?
- How long has the adjacency stayed up?
- Can we find previous incarnations of the adjacency here?
- Is there a log of recent routing changes (can we filter on the prefix in question)?

N7004-Paris# show ip route 11.0.0.1
<SNIP>
11.0.0.1/32, ubest/mbest: 1/0
    *via 42.42.42.142, Vlan42, [110/41], 02:50:20, ospf-42, inter
N7004-Paris# show ip arp 42.42.42.142
Address         Age       MAC Address     Interface
42.42.42.142    00:01:29  0010.7be8.53b0  Vlan42
N7004-Paris# show ip ospf neighbors
 OSPF Process ID 42 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 42.0.0.5          1 FULL/BDR         02:52:48 42.42.42.5      Vlan42
 200.0.0.10        1 FULL/DR          02:51:44 42.42.42.142    Vlan42
Are we stable?

Control-Plane: Protocol Flapping, Failure Domain
Determine the failure domain with Ethanalyzer.
From the process point of view: Do I get enough? Do I get too much?
Checkpoints from the line card to the process:
- Ingress MAC drops? (ELAME on the line card)
- HWRL drops?
- CoPP drops?
- Inband drops or FC? (Ethanalyzer)
- Packet Manager?
- IPv4/IPv6, ARP/AM, uRIB, OSPF?
Do we receive the packet on the line card? Do we receive the packet (e.g., a BPDU or LSA) at the CPU? CPU? Memory?
We verified on the other side that we are sending (LDP, BGP, OSPF, ...).
One real-world example follows in chapter 5 (ARP).

Control-Plane: Too Much Inband Traffic
Syslog messages report OSPF neighbor failures (neighbor 192.251.19.22, network 40.9.0.0), and the CPU states show high utilization caused by the OSPF and Netstack processes.
Here two processes, OSPF and Netstack, are using most of the resources. How much do they usually use? What does my baseline look like?
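Flaps like the NBRSTATE messages on the trigger slide are easier to correlate with CPU, CoPP and inband counters once they are timed. The sketch below is an illustrative helper, not an NX-OS feature. It parses the syslog lines exactly as shown earlier and reports the interval between DEADTIME-driven DOWN transitions per neighbor. For the four lines above, the interval is about 37.5 s.

```python
import re
from datetime import datetime

NBR_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d+ [\d:.]+) \S+ %OSPF-5-NBRSTATE: .*?"
    r"Nbr (?P<nbr>[\d.]+) on (?P<intf>\S+) from (?P<old>\w+) to (?P<new>\w+), (?P<reason>\w+)"
)


def parse_nbr_events(lines):
    """Return (timestamp, neighbor, old_state, new_state, reason) tuples."""
    events = []
    for line in lines:
        m = NBR_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y %b %d %H:%M:%S.%f")
            events.append((ts, m["nbr"], m["old"], m["new"], m["reason"]))
    return events


def deadtime_gaps(events):
    """Seconds between consecutive DEADTIME-driven DOWN transitions per neighbor."""
    last_down = {}
    gaps = []
    for ts, nbr, _old, new, reason in events:
        if new == "DOWN" and reason == "DEADTIME":
            if nbr in last_down:
                gaps.append((nbr, (ts - last_down[nbr]).total_seconds()))
            last_down[nbr] = ts
    return gaps


PREFIX = "N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process 6467, Nbr 192.251.19.22 on Vlan19"
log = [
    f"2011 Mar 26 15:38:56.395 {PREFIX} from INIT to DOWN, DEADTIME",
    f"2011 Mar 26 15:38:56.584 {PREFIX} from DOWN to INIT, HELLORCVD",
    f"2011 Mar 26 15:39:33.865 {PREFIX} from INIT to DOWN, DEADTIME",
    f"2011 Mar 26 15:39:35.754 {PREFIX} from DOWN to INIT, HELLORCVD",
]
print(deadtime_gaps(parse_nbr_events(log)))
```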
N7K-1-VDC2# show system resources
Load average:   1 minute: 2.92   5 minutes: 2.38   15 minutes: 2.27
Processes   :   1267 total, 4 running
CPU states  :   34.0% user,   42.5% kernel,   23.5% idle
Memory usage:   4115232K total,   3638780K used,   476452K free
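To answer "what does my baseline look like", it helps to reduce show system resources to a few numbers that can be recorded over time. This is a minimal sketch that assumes the output format shown above; the helper names are hypothetical.
- CPU: 34.0% user plus 42.5% kernel gives 76.5% busy, the same as 100% minus 23.5% idle.
- Memory: 3638780K of 4115232K is about 88.4% used.

```python
import re


def cpu_busy_pct(text):
    """Return (user + kernel, 100 - idle) from the 'CPU states' line."""
    m = re.search(
        r"CPU states\s*:\s*([\d.]+)% user,\s*([\d.]+)% kernel,\s*([\d.]+)% idle", text
    )
    if m is None:
        raise ValueError("no 'CPU states' line found")
    user, kernel, idle = (float(v) for v in m.groups())
    return user + kernel, 100.0 - idle


def memory_used_pct(text):
    """Used memory as a percentage of total, from the 'Memory usage' line."""
    m = re.search(r"Memory usage:\s*(\d+)K total,\s*(\d+)K used", text)
    if m is None:
        raise ValueError("no 'Memory usage' line found")
    total, used = (int(v) for v in m.groups())
    return 100.0 * used / total


output = """CPU states  :   34.0% user,   42.5% kernel,   23.5% idle
Memory usage:   4115232K total,   3638780K used,   476452K free"""
print(cpu_busy_pct(output), round(memory_used_pct(output), 1))  # (76.5, 76.5) 88.4
```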
N7K-1-VDC2# show processes cpu sort
PID    Runtime(ms)  Invoked   uSecs  1Sec    Process
-----  -----------  --------  -----  ------  -----------
3981   127          276       462    43.2%   ospf
3841   267          78        3427   16.4%   netstack
2941   34146488     7377876   4628   0.9%    platform
3982   118          245       485    0.9%    ospfv3
Per-core statistics are available for SUP-2/SUP-2E and, with newer NX-OS, for SUP-1.

Control-Plane: Environment
You can customize CoPP, but do not turn it off.
[Diagram: OSPFv2 (224.0.0.5, 224.0.0.6) in 40.9.0.0/16 arriving through CoPP on module 1 and module 2]
N7K-1# show policy-map interface control-plane module 1 class copp-system-class-ospf-test
  control Plane
  service-policy input: copp-system-policy
    class-map copp-system-class-ospf-test (match-any)
      match access-grp name copp-system-acl-malicious
      police cir 100 bps , bc 200 ms
      module 1 :
        conformed 0 bytes; action: drop
        violated 0 bytes; action: drop
N7K-1# show policy-map interface control-plane module 2 class copp-system-class-ospf-test
  control Plane
  service-policy input: copp-system-policy
    class-map copp-system-class-ospf-test (match-any)
      match access-grp name copp-system-acl-ospf-test
      police cir 100 bps , bc 200 ms
      module 2 :
        conformed 0 bytes; action: drop
        violated 1799505072 bytes; action: drop
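With several modules and classes, checking the violate counter per module by eye gets tedious. The following is a minimal illustrative parser (hypothetical helper name) that assumes the counter layout shown above. It reports only the modules where the class has violated its policer. For this output, only module 2 is dropping traffic in this class.

```python
import re

# "module 2 : conformed 0 bytes; action: drop violated 1799505072 bytes; action: drop"
MODULE_RE = re.compile(
    r"module (\d+)\s*:\s*conformed (\d+) bytes;\s*action: (\w+)\s+"
    r"violated (\d+) bytes;\s*action: (\w+)"
)


def copp_violations(text):
    """Return {module: violated_bytes} for modules with a non-zero violate counter."""
    result = {}
    for module, _conformed, _conf_action, violated, _viol_action in MODULE_RE.findall(text):
        if int(violated) > 0:
            result[int(module)] = int(violated)
    return result


sample = """module 1 :
  conformed 0 bytes; action: drop
  violated 0 bytes; action: drop
module 2 :
  conformed 0 bytes; action: drop
  violated 1799505072 bytes; action: drop"""
print(copp_violations(sample))  # {2: 1799505072}
```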
Generic approach: use show policy-map interface control-plane to determine the affected class, and N7K# show class-map type control-plane to determine what is classified into those classes.

Control-Plane: RL
Next to CoPP, some traffic is restricted via rate limiters (RL, aka HWRL). Rate limiters can prevent the control plane from being overwhelmed. As with CoPP policers, plan any change to the default rates carefully before configuring it.
[Diagram: RL and CoPP in front of the multiple CPU cores]
N7004-Berlin# show hardware rate-limiter
Units for Config: packets per second
Allowed, Dropped & Total: aggregated since last clear counters
Module: 3
R-L Class          Config    Allowed      Dropped      Total
+----------------+--------+-------------+-------------+----------------+
 L3 mtu              500       0            0            0
 L3 ttl              500       0            0            0
 L3 control          10000     0            0            0
 L3 glean            100       0            0            0
<SNIP>
 L2 storm-ctrl       Disable
 access-list-log     100       0            0            0
 copy                30000     1423         0            1423
 receive             30000     8540         0            8540
 L2 port-sec         500       0            0            0
 L2 mcast-snoop      10000     2            0            2
<SNIP>

Control-Plane: Inband (SUP-2 / NX-OS 6.2(5.41))
[Diagram: BPDU, Q0 and Q1 queues through the Clipper and R2D2 ASICs to the CPU]
BDR-529-Berlin# show system inband queuing status
  Weighted Round Robin Algorithm
  Weights BPDU - 64, Q0 - 16, Q1 - 4
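In an idealized weighted round robin, each backlogged queue is served in proportion to its weight. This sketch assumes that idealized model; the real scheduler granularity on the SUP is not described in this session. It turns the weights 64/16/4 into service shares. When a queue is empty, its share is effectively redistributed to the others.

```python
def wrr_shares(weights):
    """Share of dequeue opportunities per queue when every queue is backlogged."""
    total = sum(weights.values())
    return {queue: weight / total for queue, weight in weights.items()}


for queue, share in wrr_shares({"BPDU": 64, "Q0": 16, "Q1": 4}).items():
    print(f"{queue}: {share:.1%}")
# BPDU: 76.2%
# Q0: 19.0%
# Q1: 4.8%
```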
BDR-529-Berlin# show system inband queuing statistics
  Inband packets unmapped to a queue: 0
  Inband packets mapped to bpdu queue: 2078
  Inband packets mapped to q0: 1339
  Inband packets mapped to q1: 4
  In KLM packets mapped to bpdu: 0
  In KLM packets mapped to arp : 0
  In KLM packets mapped to q0 : 0
  In KLM packets mapped to q1 : 0
  In KLM packets mapped to veobc : 0
  Inband Queues:
    bpdu: recv 2078, drop 0, congested 0 rcvbuf 2097152, sndbuf 4194304 no drop 1
    (q0): recv 1339, drop 0, congested 0 rcvbuf 2097152, sndbuf 4194304 no drop 0
    (q1): recv 4, drop 0, congested 0 rcvbuf 2097152, sndbuf 4194304 no drop 0

Control-Plane: Inband CPU
N7004# show hardware internal cpu-mac inband events
1) Event:TX_PPS_MAX, length:4, at 382147 usecs after Fri Jan 10 20:04:37 2014
    new maximum = 191
2) Event:RX_PPS_MAX, length:4, at 382147 usecs after Fri Jan 10 20:04:37 2014
    new maximum = 195
- How do we determine the max pps rate to/from the CPU, whether we ran out of buffers, and when that occurred?
- How do we determine the time of the max pps rate, to correlate it against your logs?
N7004-Berlin# show hardware internal cpu-mac inband stats | in rate|buffer
  Rx no buffers .................. 0
  Packet rate limit ........... 64000 pps
  Rx packet rate (current/max) 85 / 195 pps
  Tx packet rate (current/max) 85 / 191 pps
Goal: compare against logs. Possible next step: logw.py
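To correlate the RX_PPS_MAX event with syslog, the "N usecs after <date>" stamp has to become a real timestamp. This is a minimal sketch with a hypothetical helper name; it does not reproduce logw.py. It also reproduces the 5189 us delta from the STP event history, which uses the same stamp format.

```python
import re
from datetime import datetime, timedelta

# "... at 382147 usecs after Fri Jan 10 20:04:37 2014"
USECS_RE = re.compile(r"(\d+) usecs after (\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})")


def parse_usecs_after(text):
    """Convert an NX-OS 'N usecs after <date>' stamp into a datetime."""
    m = USECS_RE.search(text)
    if m is None:
        raise ValueError("no 'usecs after' timestamp found")
    base = datetime.strptime(m.group(2), "%a %b %d %H:%M:%S %Y")
    return base + timedelta(microseconds=int(m.group(1)))


rx_max = parse_usecs_after(
    "2) Event:RX_PPS_MAX, length:4, at 382147 usecs after Fri Jan 10 20:04:37 2014"
)
print(rx_max.isoformat())  # 2014-01-10T20:04:37.382147

# Same stamp format as the STP event history (transitions 0 and 5):
t0 = parse_usecs_after("0) Transition at 795697 usecs after Sat Dec 14 21:20:53 2013")
t5 = parse_usecs_after("5) Transition at 800886 usecs after Sat Dec 14 21:20:53 2013")
print((t5 - t0) // timedelta(microseconds=1))  # 5189
```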
Control-Plane: NX-OS Software Architecture
[Diagram: per VDC (VDC-1, VDC-2), the Packet Manager delivers frames to L2 clients such as STP and, via NetStack IP input (L3), to IP clients such as ARP, OSPF and BGP; tools: Ethanalyzer, ELAME, debug]
The System Manager starts, controls and monitors processes such as OSPF and ARP. If the heartbeat fails: core, sig6 -> system troubleshooting.
N7004-Berlin# debug pktmgr frame
2014 Jan 10 20:14:40.061027 pktmgr: In 0x0800 82 7 4055.390f.5645 -> 0100.5e00.0005 Eth3/6

Control-Plane: Flight Recorder Data Collection
show tech-support sysmgr (deprecated)
show tech-support ha
show tech-support netstack detail
show tech-support pktmgr
show tech-support <service>

Chapter 5: Control Plane, ARP
(stations: ARP, glean)

Control-Plane: Address Resolution Protocol and Adjacency Manager
Layer 2/3: ARP Incomplete...
N7004-Berlin# show ip arp
Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       D - Static Adjacencies attached to down interface
IP ARP Table for context default
Total number of entries: 5
Address          Age       MAC Address     Interface
192.168.0.3      00:04:41  4055.390f.5643  Vlan1
10.0.3.5         00:06:35  4055.390f.5645  Ethernet3/6
10.0.2.4         00:07:14  4055.390f.5644  Ethernet3/8
20.0.0.13        00:00:14  INCOMPLETE      Ethernet3/14
192.168.0.254    -         0000.0c9f.f001  Vlan1
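On a large table, INCOMPLETE rows are easy to miss. This is a trivial filter that assumes the column layout shown above (address, age, MAC, interface); the helper name is hypothetical. It lists the incomplete entries so you can follow each one up with show ip arp statistics and a debug.

```python
def incomplete_entries(text):
    """Return (address, interface) for ARP rows whose MAC column reads INCOMPLETE."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "INCOMPLETE":
            hits.append((fields[0], fields[3]))
    return hits


arp_table = """Address Age MAC Address Interface
192.168.0.3 00:04:41 4055.390f.5643 Vlan1
20.0.0.13 00:00:14 INCOMPLETE Ethernet3/14
192.168.0.254 - 0000.0c9f.f001 Vlan1"""
print(incomplete_entries(arp_table))  # [('20.0.0.13', 'Ethernet3/14')]
```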
[Topology: E3/13 (.13) and E3/14 (.14) in 20.0.0.0/24, VRF; ARP -> AM -> uRIB (253): route, adj]

Control-Plane: Address Resolution Protocol and Adjacency Manager
N7004-Berlin# show ip arp statistics ethernet 3/14
ARP packet statistics for interface: Ethernet3/14
  Sent:
  Total 10, Requests 9, Replies 0, Requests on L2 0, Replies on L2 0,
  Gratuitous 1, Tunneled 0, Dropped 0
  Send packet drops details:
    MBUF operation failed               : 0
    Context not yet created             : 0
    Invalid context                     : 0
    Invalid ifindex                     : 0
    Invalid SRC IP                      : 0
    Invalid DEST IP                     : 0
    Destination is our own IP           : 0
    Unattached IP                       : 0
<SNIP>

N7004-Berlin# debug ip arp packet
2014 Jan  5 21:51:40.477507 arp: (context 1) Sending packet on interface Ethernet3/14, (prty 0) Hrd type 1 Prot type 800 Hrd len 6 Prot len 4 OP 1, Pkt size 28
2014 Jan  5 21:51:40.477629 arp: Src 4055.390f.5642/20.0.0.14 Dst ffff.ffff.ffff/20.0.0.13
2014 Jan  5 21:51:40.481061 arp: (context 4) Receiving packet from interface Ethernet3/13, (prty 6) Hrd type 1 Prot type 800 Hrd len 6 Prot len 4 OP 1, Pkt size 46
2014 Jan  5 21:51:40.481131 arp: Src 4055.390f.5642/20.0.0.14 Dst ffff.ffff.ffff/20.0.0.13
Consider using a debug filter and sending the output to a file.
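In the debug above, the request sent in context 1 on Ethernet3/14 comes back in context 4 on Ethernet3/13 with the same source MAC. The "Source MAC address is our own" receive-drop counter counts exactly this kind of packet. The following is a minimal sketch with a hypothetical helper name. It pairs each "Receiving" header line with the following "Src" line and flags received ARP packets whose source MAC is one of our own. The set of local MACs is an assumption that you supply, for example taken from show interface.

```python
import re

DIR_RE = re.compile(
    r"arp: \(context (\d+)\) (Sending|Receiving) packet (?:on|from) interface ([^,\s]+),"
)
SRC_RE = re.compile(r"arp: Src ([0-9a-f.]+)/([\d.]+) Dst ([0-9a-f.]+)/([\d.]+)")


def received_own_mac(debug_lines, local_macs):
    """Received ARP packets whose source MAC is one of our own MACs."""
    local = {mac.lower() for mac in local_macs}
    direction = None  # (Sending|Receiving, interface, context) of the last header line
    hits = []
    for line in debug_lines:
        header = DIR_RE.search(line)
        if header:
            direction = (header.group(2), header.group(3), int(header.group(1)))
            continue
        src = SRC_RE.search(line)
        if src and direction and direction[0] == "Receiving" and src.group(1).lower() in local:
            hits.append({"interface": direction[1], "context": direction[2],
                         "src_mac": src.group(1), "src_ip": src.group(2)})
    return hits


debug = [
    "arp: (context 1) Sending packet on interface Ethernet3/14, (prty 0) Hrd type 1",
    "arp: Src 4055.390f.5642/20.0.0.14 Dst ffff.ffff.ffff/20.0.0.13",
    "arp: (context 4) Receiving packet from interface Ethernet3/13, (prty 6) Hrd type 1",
    "arp: Src 4055.390f.5642/20.0.0.14 Dst ffff.ffff.ffff/20.0.0.13",
]
print(received_own_mac(debug, {"4055.390f.5642"}))
```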
Control-Plane: Address Resolution Protocol and Adjacency Manager
N7004-Berlin# show ip arp statistics ethernet 3/14
  Received:
  Total 1, Requests 0, Replies 0, Requests on L2 0, Replies on L2 0
  Proxy arp 0, Local-Proxy arp 0, Tunneled 0, Fastpath 0, Snooped 0, Dropped 1
  Received packet drops details:
    Appeared on a wrong interface       : 0
    Incorrect length                    : 0
    Invalid protocol packet             : 0
    Invalid context                     : 0
    Context not yet created             : 0
    Invalid layer 2 address length      : 0
    Invalid layer 3 address length      : 0
    Invalid source IP address           : 0
    Source IP address is our own        : 0
    No mem to create per intf structure : 0
    Source address mismatch with subnet : 0
    Directed broadcast source           : 0
<SNIP>
[Topology: E3/13 (.13) and E3/14 (.14) in 20.0.0.0/24, VRF, ARP]

N7004-Berlin# show ip arp statistics vrf ALQ
<SNIP>
  Received:
  Total 13, Requests 0, Replies 0, Requests on L2 0, Replies on L2 0
  Proxy arp 0, Local-Proxy arp 0, Tunneled 0, Fastpath 0, Snooped 0, Dropped 13
<SNIP>
    Invalid source MAC address          : 0
    Source MAC address is our own       : 13
<SNIP>

Control-Plane: Address Resolution Protocol and Adjacency Manager
Check CoPP and/or HWRL:
Customer# show class-map type control-plane copp-system-p-class-normal
  class-map type control-plane match-any copp-system-p-class-normal
    match access-group name copp-system-p-acl-mac-dot1x
    match exception ip multicast directly-connected-sources
    match exception ipv6 multicast directly-connected-sources
    match protocol arp
  class-map copp-system-p-class-normal (match-any)
    violate action: drop
    module 5:
      violated 20557632224 bytes,
      5-min violate rate 4154397 bytes/sec
    module 9:
      violated 0 bytes,
      5-min violate rate 0 bytes/sec

Customer# show hardware rate-limiter | i Module|R-L|glean
Module: 5
R-L Class          Config    Allowed      Dropped      Total
+------------------+--------+---------------+-------------+-----------------+
 L3 glean            100       4904         2935         7839
 L3 glean-fast       100       863401       1539316      2402717

Case: SWT-1 and SWT-2, ARP INCOMPLETE
- It worked before; there was no new deployment.
- Ethanalyzer verifies that ARP packets are being sent by SWT-1 but not received.
- On SWT-2, ARP is being received and sent.

customer# show vpc brief
vPC keep-alive status : peer is not reachable through peer-keepalive

Control-Plane: Flight Recorder Data Collection
show tech-support arp
show tech-support adjmgr
show tech-support hsrp
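Raw counters hide how hard a limiter is biting. A quick ratio over the glean counters above shows that glean-fast on module 5 is dropping well over half of its traffic (about 64%), which fits the "Check CoPP and/or HWRL" step. A minimal sketch, with the counters copied from the output above:

```python
def drop_ratio(allowed, dropped):
    """Fraction of packets dropped by a hardware rate limiter."""
    total = allowed + dropped
    return dropped / total if total else 0.0


# (allowed, dropped) per class, copied from "show hardware rate-limiter", module 5
GLEAN = {"glean": (4904, 2935), "glean-fast": (863401, 1539316)}
for name, (allowed, dropped) in GLEAN.items():
    print(f"L3 {name}: {drop_ratio(allowed, dropped):.1%} of {allowed + dropped} dropped")
```

The totals printed (7839 and 2402717) match the Total column, which is a quick sanity check that the counters were copied correctly.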
Summary: Nexus 7000 Troubleshooting
N7K offers complete visibility and extensive logging, and it integrates great testing tools.
[Diagram: Supervisor (control plane, management plane) running the Data Center OS NX-OS, the fabric, and I/O modules (forwarding engine, data plane); tools shown: ELAME, Ethanalyzer, RL, CoPP, Netstack, ASIC counters, show, debug/filter, logging; PI := Platform Independent, PD := Platform Dependent]

Call to Action
Visit the World of Solutions:
- Cisco Campus
- Walk-in Labs
- Technical Solutions Clinics
- Meet the Engineer
- Lunch Time Table Topics, held in the main Catering Hall
Recommended Reading: for reading material and further resources for this session, please visit www.pearson-books.com/CLMilan2014
Offering: The Rides via WebEx
Chapters: 1 Strategy, Tools & System; 2 Data-Plane Layer 2; 3 Data-Plane Layer 3; 4 Control-Plane Inband; 5 Control-Plane ARP; 6 TCAM (optional)
[Timeline durations shown on the slide: 25, 20, 12, 12, 8 min]
1st WebEx, 120 min for YOU: 12-FEB-14, 10:00 CET. Opt-in: alaquian@cisco.com, subject BRKDCT-3144

Chapter 6: ACLs (Optional)
(station: TCAM)

TCAM: Ternary Content Addressable Memory, Result Types
N7004-London# show system internal access-list feature bank map vlan ingress
<SNIP>
slot 3
=======
_________________________________________________________________________
Feature                  Rslt Type   T0B0  T0B1  T1B0  T1B1
_________________________________________________________________________
QoS                      Qos         X
RACL                     Acl         X
PBR                      Acl         X
VACL                     Acl         X
DHCP                     Acl         X
ARP                      Acl         X
Netflow                  Acl         X X
Netflow (SVI)            Acl         X X
Netflow Sampler          Acc         X
Netflow Sampler (SVI)    Acc         X
SPM WCCP                 Acl         X X
BFD                      Acl         X
SPM OTV                  Acl         X
ACLMGR ERSPAN (source)   Acl         X
Per bank, only one result type can be used: for VLAN & ingress, either QoS or NF Sampler.
I can't configure QoS; the system rejects my configuration.
[Diagram: TCAM banks T0B0, T0B1, T1B0, T1B1]
TCAM: Statistics per Entry & Logging
N7K# sh ip access example
IP access list example
  statistics per-entry
  10 permit ip any 10.1.2.100/32 [match=3452]
  20 deny ip any 10.1.68.101/32 [match=49920]
  30 deny ip any 10.33.2.25/32 [match=232324]
  40 permit tcp any any eq 22 [match=9881]
  50 deny tcp any any eq telnet [match=442]
  60 deny udp any any eq syslog [match=87112]
  70 permit tcp any any eq www [match=4345667]
  80 permit udp any any eq snmp [match=234222]
ACL logging is enabled by including the log keyword in an ACL rule (show log log). The SUP receives a copy of the packet; the original packet is forwarded or dropped in hardware with no performance penalty.
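The per-entry counters are cumulative, so a single snapshot mostly shows history. This is a small sketch with hypothetical helper names that assumes the `[match=N]` format shown above. It sorts the ACEs by hits and diffs two snapshots, so you can see which host ACE is matching right now.

```python
import re

# "70 permit tcp any any eq www [match=4345667]"
ACE_RE = re.compile(r"^\s*(\d+)\s+(.*?)\s*\[match=(\d+)\]\s*$")


def ace_hits(text):
    """Return (sequence, rule, matches) sorted by match count, highest first."""
    hits = []
    for line in text.splitlines():
        m = ACE_RE.match(line)
        if m:
            hits.append((int(m.group(1)), m.group(2), int(m.group(3))))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def ace_deltas(before, after):
    """Per-sequence match increase between two snapshots taken with ace_hits()."""
    previous = {seq: count for seq, _rule, count in before}
    return {seq: count - previous.get(seq, 0) for seq, _rule, count in after}


acl = """IP access list example
  statistics per-entry
  10 permit ip any 10.1.2.100/32 [match=3452]
  20 deny ip any 10.1.68.101/32 [match=49920]
  30 deny ip any 10.33.2.25/32 [match=232324]
  40 permit tcp any any eq 22 [match=9881]
  50 deny tcp any any eq telnet [match=442]
  60 deny udp any any eq syslog [match=87112]
  70 permit tcp any any eq www [match=4345667]
  80 permit udp any any eq snmp [match=234222]"""
print(ace_hits(acl)[:3])
```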
The CPU is protected by one of the available rate limiters: the forwarding engine hardware enforces the rate to avoid saturating the inband interface and the CPU. The hardware rate-limit access-list-log command adjusts the rate (default 100 pps).
ACL logging can be a useful tool during troubleshooting. Use ACL logging to sample specific packets from the data plane, and use the onboard Ethanalyzer (Wireshark) to analyze the sampled packets.

TCAM Space
Statistics per entry result in no optimization and no merge activity. Instead, you will see a 1:1 mapping of configured ACEs to CL TCAM entries.
"...when using ACL stats per entry on the 7K, the TCAM utilization goes up to 47%; when removed, it dropped to 7%..."
Object groups do NOT offer ANY optimization in terms of CL (:= Classification) TCAM utilization.
Statistics per entry are often used for troubleshooting with host ACEs.
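Object groups do not save CL TCAM because a group is only configuration shorthand: an ACE that references groups is expanded into every source/destination/port combination. The sketch below assumes that simple cross-product expansion; the actual programmed count can also depend on merging and on how ports are encoded. The groups are hypothetical. It shows how quickly the count grows: one configured line becomes 12 entries.

```python
from itertools import product


def expand_object_group_ace(action, protocol, src_group, dst_group, ports):
    """Expand one object-group ACE into one entry per (source, destination, port)."""
    return [
        f"{action} {protocol} {src} {dst} eq {port}"
        for src, dst, port in product(src_group, dst_group, ports)
    ]


# Hypothetical groups: 3 sources x 2 destinations x 2 ports
entries = expand_object_group_ace(
    "permit", "tcp",
    ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"],
    ["10.2.0.1/32", "10.2.0.2/32"],
    [22, 443],
)
print(len(entries))  # 12
print(entries[0])    # permit tcp 10.1.1.0/24 10.2.0.1/32 eq 22
```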
TCAM Utilization: ACLs
Statistics are NOT enabled by default (a fundamental difference vs. IOS) because they require the ACEs NOT to be merged, and this affects the TCAM utilization.

TCAM Utilization: ACLs
N7004(config)# show hardware access-list vdc 3 input statistics module 3
VDC-3 Ethernet3/30 :
====================
INSTANCE 0x7
------------
  Tcam 1 resource usage:
  ----------------------
  Label_b = 0x2
   Bank 0
   ------
     IPv4 Class
       Policies: BFD() [Merged]
       Netflow profile: 0
       Netflow deny profile: 0
       Entries:
         [Index] Entry [Stats]
         ---------------------
  [0058:000e:000e] prec 1 redirect(0x64)-routed udp 0.0.0.0/0 0.0.0.0/0 eq 3784 ttl eq 255 flow-label 3784 [1891]
  [0059:000f:000f] prec 1 redirect(0x64)-routed udp 0.0.0.0/0 0.0.0.0/0 eq 3785 ttl eq 254 flow-label 3785 [63895]
  [005a:0010:0010] prec 1 permit-routed ip 0.0.0.0/0 0.0.0.0/0 [128276]
<SNIP>
Specific applications (DHCP, BFD) may install their own ACLs, which must merge with the user-configured RACL, VACL and PACL.
[Diagram: London, E3/30 (L3, BFD) on forwarding engine FE7]

TCAM Bank Management
N7004(config)# hardware access-list resource feature bank-mapping
N7004(config)# show system internal access-list feature bank-class map ingress
slot 3
=======
Feature Class Definition:
 0. CLASS_QOS     : QoS,
 1. CLASS_INBAND  : Tunnel Decap, SPM LISP,
 2. CLASS_PACL    : PACL, Netflow,
 3. CLASS_DHCP    : DHCP, Netflow, Netflow (vlan), ARP,
 4. CLASS_RACL    : RACL, RACL_STAT, Netflow (SVI), ARP,
<SNIP>
Feature Class Combination (Ingress)
 0. CLASS_PACL, CLASS_QOS_INTF, CLASS_EMPTY, CLASS_EMPTY
 1. CLASS_PACL, CLASS_NF_SMPL_INTF, CLASS_EMPTY, CLASS_EMPTY
<SNIP>
 33. CLASS_EMPTY, CLASS_EMPTY, CLASS_NF_SMPL, CLASS_QOS
Now I can configure QoS and NF Sampler.

TCAM Utilization: Flight Recorder
show tech-support aclmgr
show tech-support aclqos