Overview
Spine-leaf (Clos) is the de facto standard for modern data center fabrics. Every leaf connects to every spine, with no leaf-to-leaf or spine-to-spine links, so every leaf-to-leaf path is exactly two hops (leaf → spine → leaf) and traffic load-balances via ECMP across all spines simultaneously. There is no Spanning Tree anywhere in the fabric. This guide covers the full stack: design rules, eBGP underlay, addressing, VXLAN integration, leaf roles, and deep CLI troubleshooting for NX-OS environments.
Part 1 – The Five Rules of Spine-Leaf
- Leaves never connect to leaves – all east-west traffic goes leaf → spine → leaf. No direct leaf links ever.
- Spines never connect to spines – no inter-spine links. Spines are pure transit.
- Every leaf connects to every spine – full mesh between tiers. Creates N equal-cost paths where N = spine count.
- No Spanning Tree in the fabric – fully routed IP underlay. STP only exists on server-facing ports where required.
- Scale out, not up – add leaf switches for more ports; add spine switches for more bandwidth between tiers.
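The scale-out rule is easy to quantify. A back-of-envelope sketch of the Clos arithmetic (the 48-port leaf and the 8×4 fabric are illustrative example values, not tied to any platform):

```python
# Rough 2-tier Clos scaling math; all sizes are assumed example inputs.

def fabric_links(leaves: int, spines: int) -> int:
    """Full mesh between tiers: one link from every leaf to every spine."""
    return leaves * spines

def ecmp_paths(spines: int) -> int:
    """Each leaf-to-leaf flow has one equal-cost path per spine."""
    return spines

def server_ports(leaves: int, ports_per_leaf: int, spines: int) -> int:
    """Ports left for servers after one uplink per spine is reserved."""
    return leaves * (ports_per_leaf - spines)

# Example: 8 leaves x 4 spines, 48-port leaves
print(fabric_links(8, 4))      # 32 fabric links
print(ecmp_paths(4))           # 4 equal-cost paths per flow
print(server_ports(8, 48, 4))  # 352 server-facing ports
```

Adding a ninth leaf adds server ports without touching any existing link; adding a fifth spine raises every leaf's ECMP fan-out by one. That is the scale-out rule in numbers.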
Part 2 – IP Addressing Scheme
Use /31 subnets on all fabric links. Format: 10.{spine-id}.{leaf-id}.{0|1}/31
# Spine-1 (ID=1) to Leaf-1 (ID=1): 10.1.1.0/31
# Spine-1 port: 10.1.1.0 Leaf-1 port: 10.1.1.1
# Spine-1 (ID=1) to Leaf-2 (ID=2): 10.1.2.0/31
# Spine-1 port: 10.1.2.0 Leaf-2 port: 10.1.2.1
# Spine-2 (ID=2) to Leaf-1 (ID=1): 10.2.1.0/31
# Spine-2 port: 10.2.1.0 Leaf-1 port: 10.2.1.1
#
# Loopback addressing:
# Spine loopbacks : 10.0.0.{id}/32 → Spine-1: 10.0.0.1, Spine-2: 10.0.0.2
# Leaf underlay : 10.1.{id}.1/32 → Leaf-1: 10.1.1.1, Leaf-2: 10.1.2.1
# Leaf NVE/VTEP : 10.2.{id}.1/32 → Leaf-1: 10.2.1.1 (anycast if vPC pair)
# Caution: these loopback ranges overlap the 10.{spine-id}.{leaf-id}.{0|1}/31
# link range (e.g. 10.1.1.1 is both Leaf-1's underlay loopback and the leaf
# side of the Spine-1 link). In production, carve loopbacks from a dedicated
# block that cannot collide with fabric links.
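The link plan above can be generated mechanically rather than typed. A minimal sketch using Python's stdlib `ipaddress` module (the function names are ours; the convention is the one in the table):

```python
import ipaddress

def link_subnet(spine_id: int, leaf_id: int) -> ipaddress.IPv4Network:
    """The 10.{spine-id}.{leaf-id}.0/31 fabric-link convention."""
    return ipaddress.ip_network(f"10.{spine_id}.{leaf_id}.0/31")

def link_ips(spine_id: int, leaf_id: int) -> tuple:
    """Spine gets the even (.0) address, leaf the odd (.1).
    On a /31 both addresses are usable host addresses (RFC 3021)."""
    net = link_subnet(spine_id, leaf_id)
    return str(net[0]), str(net[1])

print(link_ips(1, 2))  # ('10.1.2.0', '10.1.2.1') - Spine-1 to Leaf-2
print(link_ips(2, 1))  # ('10.2.1.0', '10.2.1.1') - Spine-2 to Leaf-1
```

Generating addresses from one function instead of a spreadsheet is what makes the BGP neighbor statements later in this guide predictable enough to template.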
Part 3 – eBGP Underlay Configuration (NX-OS)
eBGP is the recommended underlay protocol. Each leaf gets a unique ASN; spines share an ASN. With a shared spine ASN the AS path via either spine is identical, so ECMP works once maximum-paths is raised; configure bestpath as-path multipath-relax anyway so ECMP survives a move to unique per-spine ASNs.
! ===== SPINE-1 BGP CONFIG =====
Spine-1(config)# feature bgp
Spine-1(config)# router bgp 65000
Spine-1(config-router)# router-id 10.0.0.1
Spine-1(config-router)# bestpath as-path multipath-relax
Spine-1(config-router)# address-family ipv4 unicast
Spine-1(config-router-af)# maximum-paths 64
Spine-1(config-router-af)# redistribute direct route-map RM-LOOPBACK
# Neighbor template (apply to all leaf peers)
Spine-1(config-router)# template peer LEAF-TEMPLATE
Spine-1(config-router-ptmp)# bfd
Spine-1(config-router-ptmp)# address-family ipv4 unicast
Spine-1(config-router-ptmp-af)# send-community
# Peer each leaf
Spine-1(config-router)# neighbor 10.1.1.1 remote-as 65101
Spine-1(config-router-neighbor)# inherit peer LEAF-TEMPLATE
Spine-1(config-router-neighbor)# description Leaf-1
Spine-1(config-router)# neighbor 10.1.2.1 remote-as 65102
Spine-1(config-router-neighbor)# inherit peer LEAF-TEMPLATE
Spine-1(config-router-neighbor)# description Leaf-2
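The spine config above references route-map RM-LOOPBACK without defining it. A minimal sketch that permits only host routes out of this guide's 10.0.0.0/8 plan (the prefix-list name PL-LOOPBACKS is our invention; adjust the range to your own plan):
! ===== RM-LOOPBACK DEFINITION (referenced by "redistribute direct") =====
Spine-1(config)# ip prefix-list PL-LOOPBACKS seq 5 permit 10.0.0.0/8 eq 32
Spine-1(config)# route-map RM-LOOPBACK permit 10
Spine-1(config-route-map)# match ip address prefix-list PL-LOOPBACKS
# Without a route-map, "redistribute direct" would also leak every
# connected /31 fabric link into BGP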
! ===== LEAF-1 BGP CONFIG =====
Leaf-1(config)# feature bgp
Leaf-1(config)# router bgp 65101
Leaf-1(config-router)# router-id 10.1.1.1
Leaf-1(config-router)# bestpath as-path multipath-relax
Leaf-1(config-router)# address-family ipv4 unicast
Leaf-1(config-router-af)# maximum-paths 64
# Advertise both loopbacks into BGP
Leaf-1(config-router-af)# network 10.1.1.1/32
Leaf-1(config-router-af)# network 10.2.1.1/32
# Peer Spine-1
Leaf-1(config-router)# neighbor 10.1.1.0 remote-as 65000
Leaf-1(config-router-neighbor)# bfd
Leaf-1(config-router-neighbor)# description Spine-1
Leaf-1(config-router-neighbor)# address-family ipv4 unicast
Leaf-1(config-router-neighbor-af)# send-community
# Peer Spine-2
Leaf-1(config-router)# neighbor 10.2.1.0 remote-as 65000
Leaf-1(config-router-neighbor)# bfd
Leaf-1(config-router-neighbor)# description Spine-2
Part 4 – ECMP and BFD
# ECMP hashing – use full 5-tuple (src IP, dst IP, proto, src port, dst port)
Spine-1(config)# ip load-sharing address source-destination port source-destination
# BFD – fast failure detection (300ms × 3 = 900ms detection)
Leaf-1(config)# feature bfd
Leaf-1(config)# bfd interval 300 min_rx 300 multiplier 3
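Two numbers from this part are worth internalizing: the 5-tuple hash pins each flow to one uplink, and BFD detection time is simply interval times multiplier. A behavioral sketch (real switches hash in the ASIC with a vendor-specific function; CRC32 here only models the determinism, not the actual algorithm):

```python
import zlib

def ecmp_pick(src_ip: str, dst_ip: str, proto: int,
              sport: int, dport: int, n_paths: int) -> int:
    """Pick one of n equal-cost uplinks from the 5-tuple.
    Deterministic: one flow always takes one path, so no reordering."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % n_paths

def bfd_detect_ms(interval_ms: int, multiplier: int) -> int:
    """Worst-case failure detection: negotiated interval x multiplier."""
    return interval_ms * multiplier

# Same flow, same uplink - the hash is stable across packets:
flow = ("10.1.1.1", "10.1.3.1", 6, 40001, 443, 2)
assert ecmp_pick(*flow) == ecmp_pick(*flow)

print(bfd_detect_ms(300, 3))  # 900 (ms), matching the comment above
```

This is also why hashing on the full 5-tuple matters: two hosts exchanging many TCP sessions present many distinct tuples and spread across spines, whereas IP-only hashing would pin all of them to a single uplink.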
Part 5 – Leaf Roles
Not all leaves are identical; separating roles keeps the fabric predictable:
- Server leaf – host-facing switch; terminates VLANs, runs the anycast gateway, and acts as the VTEP for its attached workloads. Dual-homed servers attach through a vPC/MLAG leaf pair.
- Border leaf – the fabric's only external connection point; peers eBGP with WAN routers, firewalls, or DCI links and injects external routes into the fabric. Server leaves never peer with external BGP.
- Service leaf – optional role for attaching shared appliances (load balancers, inspection firewalls) so service-chained traffic does not hairpin through the border.
Part 6 – CLI Troubleshooting: Full Scenario Walkthrough
This section walks through the complete troubleshooting methodology for spine-leaf fabrics – from verifying physical connectivity up through hardware forwarding tables. Each step includes what healthy output looks like and what to look for when something is wrong.
Scenario A – BGP Session Not Establishing
Symptom: show bgp ipv4 unicast summary shows a peer in Idle or Active state with 0 prefixes received.
## Step 1: Check BGP summary – identify which peer is down
Leaf-1# show bgp ipv4 unicast summary
# Healthy output โ both spines Established:
# Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
# 10.1.1.0 4 65000 1452 1448 88 0 0 01:02:33 5
# 10.2.1.0 4 65000 1449 1445 88 0 0 01:02:31 5
#
# Problem output – Spine-1 peer stuck in Active:
# 10.1.1.0 4 65000 0 0 0 0 0 00:00:12 Active
## Step 2: Check physical interface – is the port up?
Leaf-1# show interface ethernet 1/1 | egrep "line|admin|CRC|error"
# Must show: Ethernet1/1 is up, line protocol is up (connected)
# Any CRC or input errors = check SFP/cable
## Step 3: Check IP reachability to BGP peer
Leaf-1# ping 10.1.1.0 source 10.1.1.1 count 5
# If ping fails: IP not configured on remote port, or link not up
Leaf-1# show ip interface brief | include 10.1.1
# Must show interface with 10.1.1.1/31 in "up/up" state
## Step 4: Check BGP neighbor state detail
Leaf-1# show bgp ipv4 unicast neighbors 10.1.1.0
# Look for: "BGP state = Active" means cannot reach peer (TCP 179 failing)
# Look for: "BGP state = OpenSent" means reaching peer but capabilities mismatch
# Look for: "Notification Error" at bottom – mismatched ASN is a common cause
# Verify remote-as on both sides match (Leaf-1 expects Spine-1 as AS 65000)
## Step 5: Check BGP notifications / error history
Leaf-1# show bgp ipv4 unicast neighbors 10.1.1.0 | egrep "notification|error|reset"
# "OPEN Message Error, Bad Peer AS" = ASN mismatch
# "Hold Timer Expired" = keepalives not reaching peer (MTU/QoS issue)
# "Administratively shut down" = neighbor shutdown command applied
## Step 6: Check if BGP is listening on the right interface
Leaf-1# show sockets connection tcp | include 179
# Should show ESTABLISHED connections to spine IPs on port 179
# If no entry: TCP session never formed – check ACL/route/interface
Scenario B – BGP Up But Missing Prefixes
Symptom: BGP sessions are established but PfxRcd is lower than expected, or a specific leaf loopback is not reachable from another leaf.
## Step 1: Check what prefixes are being received from each spine
Leaf-1# show bgp ipv4 unicast neighbors 10.1.1.0 received-routes
Leaf-1# show bgp ipv4 unicast neighbors 10.2.1.0 received-routes
# (received-routes requires soft-reconfiguration inbound on the peer;
#  without it, use "routes" to see the post-policy routes instead)
# Compare both – should have the same set of prefixes from each spine
# If one spine has fewer: check what that spine is advertising
## Step 2: Check what Spine-1 is advertising to this leaf
Spine-1# show bgp ipv4 unicast neighbors 10.1.1.1 advertised-routes
# Should include loopbacks of ALL other leaves
# If Leaf-3's loopback is missing: Leaf-3 may not be advertising it, or route-map filtering
## Step 3: Check the missing prefix on the originating leaf
Leaf-3# show bgp ipv4 unicast 10.1.3.1/32
# Is it in the BGP table? If not – check "network" statement or redistribution
Leaf-3# show running-config | include network
# Must show: network 10.1.3.1/32 under router bgp 65103
Leaf-3# show ip route 10.1.3.1/32
# The prefix must exist in RIB before BGP will advertise it
# If it's a loopback, verify: show interface loopback0
## Step 4: Check route-map filtering on spines
Spine-1# show route-map
# Look for any permit/deny entries that might be dropping leaf prefixes
Spine-1# show bgp ipv4 unicast policy statistics neighbor 10.1.3.1 in
# Shows how many routes were matched/filtered by inbound policy
## Step 5: Force BGP to re-advertise / soft-reset
Spine-1# clear bgp ipv4 unicast 10.1.1.1 soft out
# Soft outbound reset – re-sends all advertised routes without dropping the session
Leaf-1# clear bgp ipv4 unicast 10.1.1.0 soft in
# Soft inbound reset – re-processes all received routes through inbound policy
Scenario C – No ECMP: Only One Path in Routing Table
Symptom: show ip route shows only a single next-hop for a remote leaf loopback instead of two (one per spine).
## Step 1: Check routing table for ECMP paths
Leaf-1# show ip route 10.1.3.1/32
# Healthy ECMP output:
# 10.1.3.1/32, ubest/mbest: 2/0
# *via 10.1.1.0, [20/0], Ethernet1/1, BGP, tag 65000
# *via 10.2.1.0, [20/0], Ethernet1/2, BGP, tag 65000
#
# Problem output – only one path:
# 10.1.3.1/32, ubest/mbest: 1/0
# *via 10.1.1.0, [20/0], Ethernet1/1, BGP, tag 65000
## Step 2: Check BGP table for the prefix โ are both paths present?
Leaf-1# show bgp ipv4 unicast 10.1.3.1/32
# Healthy: two paths, both marked with "multipath"
# Problem: only one path, or second path marked "not bestpath"
## Step 3: Check multipath-relax is configured
Leaf-1# show running-config | include multipath
# Must show: bestpath as-path multipath-relax
# With unique per-spine ASNs the two paths carry different AS paths and
# will NOT be treated as equal-cost without this knob; with a shared spine
# ASN (this guide's design) the AS paths are identical, but keep it
# configured so ECMP survives an ASN redesign
## Step 4: Check maximum-paths setting
Leaf-1# show running-config | section "router bgp" | include maximum-paths
# Must show: maximum-paths 64 (or at least 2)
# Default is 1 – this is the most common cause of missing ECMP
## Step 5: Check ECMP hashing config
Spine-1# show ip load-sharing
# Should show: source-destination IP + port (5-tuple)
# If only source/dest IP – flows with same src/dst will always hash the same way
Scenario D – Traffic Blackhole: Route in RIB but Not in Hardware FIB
Symptom: show ip route shows the prefix with two ECMP paths, but ping/traceroute fails or traffic is dropped. This is a hardware programming failure.
## Step 1: Confirm route is in software RIB
Leaf-1# show ip route 10.1.3.1/32
# If route is here but traffic is dropped – suspect hardware FIB issue
## Step 2: Check hardware FIB (forwarding table)
Leaf-1# show forwarding ipv4 route 10.1.3.1/32
# Healthy: shows same next-hops as RIB, with outgoing interface and adjacency
# Problem: route missing from FIB, or shows different/stale next-hop
# If missing: hardware FIB is full (TCAM exhaustion) or programming error
## Step 3: Check TCAM utilization
Leaf-1# show hardware capacity forwarding
# Check: LPM (longest prefix match) utilization
# If above 85%: TCAM near full – routes will start failing to program
# Fix: summarize routes, or upgrade to platform with larger FIB
## Step 4: Check adjacency table (ARP/ND resolved?)
Leaf-1# show ip arp 10.1.1.0
# Spine-1's fabric IP must be in ARP table
# If missing: ARP failing – check interface config, link state
Leaf-1# show forwarding adjacency 10.1.1.0
# Must show valid adjacency with outgoing interface
## Step 5: Packet capture to isolate drop point
# Ethanalyzer sees only traffic punted to the supervisor CPU (BGP, ARP,
# switch-sourced pings) – transit traffic needs SPAN or ELAM instead
Leaf-1# ethanalyzer local interface inband capture-filter "host 10.1.3.1" limit-captured-frames 50
# For transit flows: SPAN the fabric uplink to a monitor port and capture
# there, or use ELAM on platforms that support it to see the ASIC's
# forwarding decision for a single packet
Scenario E – Intermittent Traffic Loss / BFD Flapping
Symptom: BGP sessions go up and down repeatedly. Users report intermittent connectivity. BFD events in syslog.
## Step 1: Check BFD session status
Leaf-1# show bfd neighbors
# Healthy: all sessions show "Up" with stable uptime
# Problem: sessions showing "Down" or flapping (short uptime, repeated restarts)
Leaf-1# show bfd neighbors detail
# Shows Tx/Rx interval, multiplier, and miss count
# "Missed Hellos" counter incrementing = packets being dropped/delayed
## Step 2: Check BFD timers (may be too aggressive)
Leaf-1# show running-config | include bfd interval
# If set to 100ms × 3 = 300ms detection: very sensitive to CPU spikes
# Recommendation for most fabrics: 300ms × 3 = 900ms
# Fix if flapping due to CPU:
Leaf-1(config)# bfd interval 300 min_rx 300 multiplier 3
## Step 3: Check interface errors (physical instability)
Leaf-1# show interface ethernet 1/1 counters errors
# Check: CRC errors, input errors, output drops
# Any incrementing CRC errors = bad cable, SFP, or DOM threshold exceeded
Leaf-1# show interface ethernet 1/1 transceiver
# Check Tx/Rx power levels – must be within vendor spec
# Low Rx power = dirty/bent fiber, bad SFP, excessive insertion loss
## Step 4: Check system logs for BGP/BFD events
Leaf-1# show logging log | egrep "BGP|BFD|ADJCHANGE" | last 30
# Look for pattern: does flap correlate with specific time of day?
# Correlation with backup jobs / spanning-tree TCNs / LACP events?
## Step 5: Check CPU utilization (BFD is CPU-bound)
Leaf-1# show processes cpu sort
# Output is sorted by CPU descending – check the top entries
# If the BFD process is consuming >30% CPU: reduce session count or increase timers
Leaf-1# show system resources
# Check overall CPU and memory – a spike during a maintenance window is a common cause
Scenario F – VXLAN/EVPN: VMs Can't Talk Across Leaves
Symptom: Servers on Leaf-1 cannot reach servers on Leaf-3 even though underlay BGP is up and ECMP is working.
## Step 1: Verify underlay reachability leaf-to-leaf loopbacks
Leaf-1# ping 10.1.3.1 source loopback0 count 5
# Must succeed – if this fails, fix underlay BGP first before debugging VXLAN
## Step 2: Check NVE (VTEP) peer status
Leaf-1# show nve peers
# Healthy output:
# Interface Peer-IP State LearnType Uptime Router-Mac
# nve1 10.1.3.1 Up CP 01:22:14 5254.0012.3456
# If state is "Down": NVE can't reach peer VTEP – underlay issue
# If peer is missing entirely: EVPN BGP not distributing VTEP info
## Step 3: Check EVPN BGP session (overlay BGP)
Leaf-1# show bgp l2vpn evpn summary
# Must show Established sessions to spine route-reflectors
# PfxRcd should be non-zero – if zero, EVPN routes are not being exchanged
## Step 4: Check MAC/IP table โ is the remote VM's MAC learned?
Leaf-1# show mac address-table vlan 10
# Local MACs: learned via hardware
# Remote MACs: should show VTEP IP as next-hop (learned via EVPN type-2)
Leaf-1# show bgp l2vpn evpn
# Check for type-2 routes (MAC/IP advertisements) from Leaf-3
## Step 5: Check VNI to VLAN mapping
Leaf-1# show nve vni
# All VNIs must show "Up" state with correct VLAN mapping
# If VNI is Down: check VLAN exists, NVE interface config
Leaf-1# show vxlan
# Confirms VXLAN feature is enabled and NVE source interface is up
## Step 6: ARP / EVPN type-3 (BUM traffic)
Leaf-1# show ip arp suppression-cache vlan 10
# Shows ARP entries suppressed (proxy ARP by anycast GW)
# If target VM IP missing: EVPN type-2 route not received yet or ARP not resolved
Scenario G – Asymmetric Traffic / One-Way Connectivity
Symptom: Server on Leaf-1 can ping server on Leaf-3, but not the reverse. Or traceroute shows traffic takes different paths in each direction.
## Step 1: Check route symmetry โ does Leaf-3 have same ECMP paths back?
Leaf-3# show ip route 10.1.1.1/32
# Should show same number of ECMP paths as Leaf-1 has toward Leaf-3
# If only 1 path: Leaf-3 missing a spine peer – asymmetric routing
## Step 2: Check if a spine is advertising routes asymmetrically
Spine-1# show bgp ipv4 unicast neighbors 10.1.3.1 advertised-routes | include 10.1.1
Spine-2# show bgp ipv4 unicast neighbors 10.1.3.1 advertised-routes | include 10.1.1
# Both spines should advertise the same set of leaf loopbacks
## Step 3: Check for routing policy differences between spines
Spine-1# show bgp ipv4 unicast policy statistics neighbor 10.1.3.1 out
Spine-2# show bgp ipv4 unicast policy statistics neighbor 10.1.3.1 out
# Compare – if one spine filters more routes, traffic will be asymmetric
## Step 4: Check security policy / ACLs on each leaf
Leaf-3# show ip access-lists | include deny
# An ACL blocking return traffic causes one-way connectivity
Leaf-3# show running-config interface ethernet 1/1 | include access-group
# Check whether any ACL is applied to fabric-facing interfaces
Scenario H – Full Fabric Health Check Sequence
Run this sequence to baseline the entire fabric health in order:
## === PHYSICAL LAYER ===
Leaf-1# show interface brief | include Eth | exclude notconn
# All fabric-facing ports must be connected, no err-disabled
Leaf-1# show interface status err-disabled
# Any err-disabled ports = investigate root cause (BPDU guard, port security, etc.)
## === BGP UNDERLAY ===
Leaf-1# show bgp ipv4 unicast summary | include Estab
# Count of Established neighbors must equal spine count (typically 2)
Leaf-1# show ip route summary
# BGP path count ≈ (remote leaf loopbacks × spine count) + spine loopbacks
# (each leaf loopback is learned once per spine; each spine loopback only
# from that spine itself)
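The expected-count arithmetic can be written down explicitly. A sketch counting only the underlay loopbacks in this guide's design (note the multiplier differs: leaf loopbacks arrive once per spine, while each spine loopback comes only from that spine):

```python
def expected_bgp_paths(n_leaves: int, n_spines: int) -> int:
    """BGP path count seen on one leaf, counting underlay loopbacks only:
    each remote leaf loopback arrives once per spine (ECMP), and each
    spine loopback arrives only from that spine itself."""
    remote_leaves = n_leaves - 1
    return remote_leaves * n_spines + n_spines

print(expected_bgp_paths(4, 2))  # 8 paths on a 4-leaf / 2-spine fabric
```

Comparing this number against `show ip route summary` after every change is a cheap way to catch a silently missing peer or filtered prefix.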
## === ECMP VERIFICATION ===
Leaf-1# show ip route 10.1.2.1/32
# ubest/mbest must be 2/0 (two equal-cost next-hops)
Leaf-1# show ip route 10.1.3.1/32
# Same check for each remote leaf loopback
## === HARDWARE FIB ===
Leaf-1# show forwarding ipv4 route 10.1.2.1/32
# Must show both next-hops, matching the RIB
Leaf-1# show hardware capacity forwarding
# LPM utilization must be below 80%
## === BFD ===
Leaf-1# show bfd neighbors
# All spines must show "Up" with stable uptime (no recent flaps)
## === VXLAN OVERLAY (if deployed) ===
Leaf-1# show nve peers
# All remote VTEPs must show "Up"
Leaf-1# show bgp l2vpn evpn summary
# EVPN sessions Established with non-zero PfxRcd
Leaf-1# show nve vni
# All VNIs in "Up" state
## === END-TO-END REACHABILITY ===
Leaf-1# ping 10.1.2.1 source loopback0 count 10
Leaf-1# ping 10.1.3.1 source loopback0 count 10
Leaf-1# ping 10.1.4.1 source loopback0 count 10
# All must be 100% success – any packet loss = investigate that leaf's BGP/link
Leaf-1# traceroute 10.1.3.1 source loopback0
# Must be exactly 2 hops (leaf → spine → leaf)
# If 3+ hops: traffic is leaving the fabric (routing error)
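The ping sweep above can be generated rather than typed by hand. A small sketch that emits every ordered loopback pair for a full-mesh reachability check (it only builds the target list; feeding it to Ansible, a scheduler, or an expect script is left to your tooling):

```python
def reachability_pairs(leaf_ids):
    """All ordered (source, destination) loopback pairs to ping,
    using this guide's 10.1.{id}.1 underlay loopback convention."""
    def loopback(i):
        return f"10.1.{i}.1"
    return [(loopback(a), loopback(b))
            for a in leaf_ids for b in leaf_ids if a != b]

pairs = reachability_pairs([1, 2, 3, 4])
print(len(pairs))  # 12 ordered pairs for 4 leaves
```

Testing ordered pairs in both directions matters: Scenario G shows that one-way reachability failures are a real fabric failure mode, so A→B succeeding does not prove B→A.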
Spine-Leaf Design Checklist
- No leaf-to-leaf links anywhere – all east-west traffic transits spines
- No spine-to-spine links – spines are pure transit
- Every leaf connects to every spine – full mesh
- bestpath as-path multipath-relax configured on all fabric nodes
- maximum-paths 64 configured under address-family ipv4 unicast
- BFD enabled on all BGP sessions – 300ms intervals, multiplier 3
- /31 subnets on all fabric links – no /30 or larger
- Loopbacks are BGP router-ID and VTEP source – never fabric-facing IPs
- vPC/MLAG for any dual-homed server – single-homed servers are a risk
- Border leaves are the only external connection point – server leaves never peer external BGP
- Hardware FIB verified after changes – RIB alone does not confirm packet forwarding
- TCAM utilization monitored – alert at 75% full