
Spine-Leaf Architecture Best Practices: Design, BGP Underlay, and Troubleshooting

March 10, 2026 · 20 min read

Overview

Spine-leaf (Clos) is the de facto standard for modern data center fabrics. Every leaf connects to every spine, with no leaf-to-leaf or spine-to-spine links, so every server-to-server path is exactly two hops, ECMP spreads traffic across all spines simultaneously, and Spanning Tree is eliminated from the fabric. This guide covers the full stack: design rules, eBGP underlay, addressing, VXLAN integration, leaf roles, and deep CLI troubleshooting for NX-OS environments.


[Figure: Spine-Leaf Clos reference topology. Two spines (AS 65000; Lo0 10.0.0.1 and 10.0.0.2) fully meshed to four leaves (AS 65101-65104; Lo0 10.1.1.1 through 10.1.4.1) over /31 links (spine side .0, leaf side .1); server subnets 10.10.1.0/24-10.10.4.0/24 hang off the leaves. Every leaf-to-leaf path is two hops, with ECMP active across both spines simultaneously.]

Part 1 — The Five Rules of Spine-Leaf

  1. Leaves never connect to leaves — all east-west traffic goes leaf → spine → leaf. No direct leaf links, ever.
  2. Spines never connect to spines — no inter-spine links. Spines are pure transit.
  3. Every leaf connects to every spine — full mesh between tiers. This creates N equal-cost paths, where N = spine count.
  4. No Spanning Tree in the fabric — fully routed IP underlay. STP exists only on server-facing ports where required.
  5. Scale out, not up — add leaf switches for more ports; add spine switches for more bandwidth between tiers.
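Rule 5 has simple arithmetic behind it. A quick sketch of 2-tier Clos sizing (the port counts and speeds below are illustrative assumptions, not vendor specs):

```python
def clos_scale(leaf_down_ports, down_gbps, spines, up_gbps, spine_ports):
    """2-tier Clos math: per-leaf oversubscription and maximum fabric size.

    Each leaf uses one uplink per spine (rule 3), so uplinks == spine count,
    and each spine port hosts exactly one leaf.
    """
    oversub = (leaf_down_ports * down_gbps) / (spines * up_gbps)
    max_leaves = spine_ports
    max_server_ports = max_leaves * leaf_down_ports
    return oversub, max_leaves, max_server_ports

# 48 x 25G server ports per leaf, 4 spines, 100G uplinks, 32-port spines:
clos_scale(48, 25, 4, 100, 32)   # -> (3.0, 32, 1536): 3:1 oversubscription
```

Adding a fifth spine drops the oversubscription without touching a single server port, which is exactly what "scale out" means.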

Part 2 — IP Addressing Scheme

Use /31 subnets on all fabric links. Format: 10.{spine-id}.{leaf-id}.{0|1}/31

# Spine-1 (ID=1) to Leaf-1 (ID=1): 10.1.1.0/31
#   Spine-1 port: 10.1.1.0   Leaf-1 port: 10.1.1.1
# Spine-1 (ID=1) to Leaf-2 (ID=2): 10.1.2.0/31
#   Spine-1 port: 10.1.2.0   Leaf-2 port: 10.1.2.1
# Spine-2 (ID=2) to Leaf-1 (ID=1): 10.2.1.0/31
#   Spine-2 port: 10.2.1.0   Leaf-1 port: 10.2.1.1
#
# Loopback addressing:
#   Spine loopbacks : 10.0.0.{id}/32   → Spine-1: 10.0.0.1, Spine-2: 10.0.0.2
#   Leaf underlay   : 10.1.{id}.1/32   → Leaf-1: 10.1.1.1, Leaf-2: 10.1.2.1
#   Leaf NVE/VTEP   : 10.2.{id}.1/32   → Leaf-1: 10.2.1.1  (anycast if vPC pair)
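The scheme is regular enough to generate rather than hand-allocate. A small sketch that emits the same plan as the comments above (illustration only, not a provisioning tool):

```python
def fabric_addresses(spines, leaves):
    """Render the 10.{spine}.{leaf}.{0|1}/31 link plan plus loopbacks."""
    links = {}
    for s in range(1, spines + 1):
        for lf in range(1, leaves + 1):
            links[(s, lf)] = {
                "subnet":   f"10.{s}.{lf}.0/31",
                "spine_ip": f"10.{s}.{lf}.0",   # spine side is always .0
                "leaf_ip":  f"10.{s}.{lf}.1",   # leaf side is always .1
            }
    spine_lo = {s: f"10.0.0.{s}/32" for s in range(1, spines + 1)}
    leaf_lo = {lf: f"10.1.{lf}.1/32" for lf in range(1, leaves + 1)}
    return links, spine_lo, leaf_lo

links, spine_lo, leaf_lo = fabric_addresses(2, 4)
# links[(2, 1)]["subnet"] -> "10.2.1.0/31", the Spine-2 to Leaf-1 link above
```

Deterministic addressing like this is what makes the troubleshooting sections below scriptable: given a spine ID and a leaf ID, every IP in the fabric is known.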

Part 3 — eBGP Underlay Configuration (NX-OS)

eBGP is the recommended underlay protocol for large fabrics (RFC 7938). Each leaf gets a unique ASN; the spines share one. Configure bestpath as-path multipath-relax on every node — it is required for ECMP whenever candidate paths carry differing AS paths.

! ===== SPINE-1 BGP CONFIG =====
Spine-1(config)# feature bgp
! RM-LOOPBACK limits redistribution to loopback /32s — define it before use
Spine-1(config)# ip prefix-list PL-LOOPBACKS seq 5 permit 10.0.0.0/8 ge 32
Spine-1(config)# route-map RM-LOOPBACK permit 10
Spine-1(config-route-map)#  match ip address prefix-list PL-LOOPBACKS
Spine-1(config)# router bgp 65000
Spine-1(config-router)#  router-id 10.0.0.1
Spine-1(config-router)#  bestpath as-path multipath-relax
Spine-1(config-router)#  address-family ipv4 unicast
Spine-1(config-router-af)#   maximum-paths 64
Spine-1(config-router-af)#   redistribute direct route-map RM-LOOPBACK
! Neighbor template (apply to all leaf peers)
Spine-1(config-router)#  template peer LEAF-TEMPLATE
Spine-1(config-router-neighbor)#   bfd
Spine-1(config-router-neighbor)#   address-family ipv4 unicast
Spine-1(config-router-neighbor-af)#    send-community
! Peer each leaf
Spine-1(config-router)#  neighbor 10.1.1.1 remote-as 65101
Spine-1(config-router-neighbor)#   inherit peer LEAF-TEMPLATE
Spine-1(config-router-neighbor)#   description Leaf-1
Spine-1(config-router)#  neighbor 10.1.2.1 remote-as 65102
Spine-1(config-router-neighbor)#   inherit peer LEAF-TEMPLATE
Spine-1(config-router-neighbor)#   description Leaf-2

! ===== LEAF-1 BGP CONFIG =====
Leaf-1(config)# feature bgp
Leaf-1(config)# router bgp 65101
Leaf-1(config-router)#  router-id 10.1.1.1
Leaf-1(config-router)#  bestpath as-path multipath-relax
Leaf-1(config-router)#  address-family ipv4 unicast
Leaf-1(config-router-af)#   maximum-paths 64
! Advertise both loopbacks into BGP
Leaf-1(config-router-af)#   network 10.1.1.1/32
Leaf-1(config-router-af)#   network 10.2.1.1/32
! Peer Spine-1
Leaf-1(config-router)#  neighbor 10.1.1.0 remote-as 65000
Leaf-1(config-router-neighbor)#   bfd
Leaf-1(config-router-neighbor)#   description Spine-1
Leaf-1(config-router-neighbor)#   address-family ipv4 unicast
Leaf-1(config-router-neighbor-af)#    send-community
! Peer Spine-2 — same settings; activate the address family here too
Leaf-1(config-router)#  neighbor 10.2.1.0 remote-as 65000
Leaf-1(config-router-neighbor)#   bfd
Leaf-1(config-router-neighbor)#   description Spine-2
Leaf-1(config-router-neighbor)#   address-family ipv4 unicast
Leaf-1(config-router-neighbor-af)#    send-community
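Because every leaf follows the same conventions, the per-leaf stanza can be rendered instead of hand-typed. A minimal f-string sketch (the function name and defaults are my own, not an NX-OS or automation-framework feature):

```python
def leaf_bgp_config(leaf_id, spines=2, base_asn=65100, spine_asn=65000):
    """Render the repetitive leaf BGP stanza from the fabric conventions."""
    lines = [
        f"router bgp {base_asn + leaf_id}",
        f"  router-id 10.1.{leaf_id}.1",
        "  bestpath as-path multipath-relax",
        "  address-family ipv4 unicast",
        "    maximum-paths 64",
        f"    network 10.1.{leaf_id}.1/32",
    ]
    for s in range(1, spines + 1):
        lines += [
            # Spine side of each /31 is always .0 in this scheme
            f"  neighbor 10.{s}.{leaf_id}.0 remote-as {spine_asn}",
            f"    description Spine-{s}",
            "    bfd",
        ]
    return "\n".join(lines)

cfg = leaf_bgp_config(3)
# "router bgp 65103", peering 10.1.3.0 (Spine-1) and 10.2.3.0 (Spine-2)
```

Generating the config from the addressing convention removes the most common fat-finger failures in Scenario A below (wrong remote-as, wrong /31 neighbor).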

Part 4 — ECMP and BFD

# ECMP hashing — use the full 5-tuple (src IP, dst IP, proto, src port, dst port)
Spine-1(config)# ip load-sharing address source-destination port source-destination

# BFD — fast failure detection (300 ms × 3 = 900 ms detection)
Leaf-1(config)# feature bfd
Leaf-1(config)# bfd interval 300 min_rx 300 multiplier 3
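Why the 5-tuple matters: ECMP hashes per flow, not per packet, so a flow never reorders; including ports in the hash keeps many flows between the same host pair from piling onto one spine. A toy model of that behavior (real ASICs use proprietary hardware hash functions, not MD5):

```python
import hashlib

def ecmp_pick(src, dst, proto, sport, dport, n_paths):
    """Deterministic per-flow path selection, as ECMP hashing behaves."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % n_paths

# The same flow always hashes to the same spine (no packet reordering):
assert ecmp_pick("10.10.1.5", "10.10.3.9", 6, 40001, 443, 2) == \
       ecmp_pick("10.10.1.5", "10.10.3.9", 6, 40001, 443, 2)

# With ports in the hash, 50 flows between one host pair use both spines:
paths = {ecmp_pick("10.10.1.5", "10.10.3.9", 6, p, 443, 2)
         for p in range(40000, 40050)}
assert paths == {0, 1}
# With an IP-only hash, all 50 flows would share a single path.
```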

Part 5 — Leaf Roles

Role          | Connects To                                | Special Config
--------------|--------------------------------------------|------------------------------------------------------
Server Leaf   | Compute servers, VMs, containers           | VXLAN VTEP, anycast gateway, access VLANs
Border Leaf   | WAN, internet, firewall, external BGP      | External BGP peering, route redistribution, VRF-aware
Service Leaf  | Load balancers, shared firewalls, storage  | PBR or policy-based service chaining
vPC/MLAG Pair | Dual-homed servers (active-active)         | vPC peer-link, same anycast VTEP IP on both leaves

Part 6 — CLI Troubleshooting: Full Scenario Walkthrough

This section walks through the complete troubleshooting methodology for spine-leaf fabrics — from verifying physical connectivity up through the hardware forwarding tables. Each step shows what healthy output looks like and what to look for when something is wrong.


Scenario A — BGP Session Not Establishing

Symptom: show bgp ipv4 unicast summary shows a peer in Idle or Active state with 0 prefixes received.

## Step 1: Check BGP summary — identify which peer is down
Leaf-1# show bgp ipv4 unicast summary
# Healthy output — both spines Established:
# Neighbor    V    AS    MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
# 10.1.1.0    4  65000     1452     1448      88    0     0   01:02:33    5
# 10.2.1.0    4  65000     1449     1445      88    0     0   01:02:31    5
#
# Problem output — Spine-1 peer stuck in Active:
# 10.1.1.0    4  65000        0        0       0    0     0   00:00:12   Active

## Step 2: Check physical interface — is the port up?
Leaf-1# show interface ethernet 1/1 | egrep "line|admin|CRC|error"
# Must show: Ethernet1/1 is up, line protocol is up (connected)
# Any CRC or input errors = check SFP/cable

## Step 3: Check IP reachability to BGP peer
Leaf-1# ping 10.1.1.0 source 10.1.1.1 count 5
# If ping fails: IP not configured on remote port, or link not up
Leaf-1# show ip interface brief | include 10.1.1
# Must show interface with 10.1.1.1/31 in "up/up" state

## Step 4: Check BGP neighbor state detail
Leaf-1# show bgp ipv4 unicast neighbors 10.1.1.0
# Look for: "BGP state = Active" means cannot reach peer (TCP 179 failing)
# Look for: "BGP state = OpenSent" means reaching peer but capabilities mismatch
# Look for: "Notification Error" at the bottom — a mismatched ASN is a common cause
# Verify remote-as on both sides match (Leaf-1 expects Spine-1 as AS 65000)

## Step 5: Check BGP notifications / error history
Leaf-1# show bgp ipv4 unicast neighbors 10.1.1.0 | egrep "notification|error|reset"
# "OPEN Message Error, Bad Peer AS" = ASN mismatch
# "Hold Timer Expired"              = keepalives not reaching peer (MTU/QoS issue)
# "Administratively shut down"      = neighbor shutdown command applied

## Step 6: Check the BGP TCP session itself
Leaf-1# show sockets connection tcp | include 179
# Should show an ESTABLISHED connection to each spine IP on port 179
# If no entry: the TCP session never formed — check ACL/route/interface
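The same TCP test can be run from any management host with routed reachability to the fabric IPs. A hypothetical helper (names are mine; reachability from your host is an assumption):

```python
import socket

def bgp_tcp_probe(peer_ip, port=179, timeout=2.0):
    """Probe TCP/179. 'open' proves the session can form; 'refused' still
    proves IP reachability (BGP speakers RST unconfigured sources);
    'unreachable' points at routing, ACL, or link problems."""
    try:
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:            # timeouts, no route, admin-prohibited, etc.
        return "unreachable"
```

`bgp_tcp_probe("10.1.1.0")` returning "unreachable" while Step 3's ping succeeds suggests a control-plane ACL (CoPP or an interface ACL) eating TCP/179 specifically.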

Scenario B — BGP Up But Missing Prefixes

Symptom: BGP sessions are established but PfxRcd is lower than expected, or a specific leaf loopback is not reachable from another leaf.

## Step 1: Check what prefixes are being received from each spine
Leaf-1# show bgp ipv4 unicast neighbors 10.1.1.0 received-routes
Leaf-1# show bgp ipv4 unicast neighbors 10.2.1.0 received-routes
# (received-routes requires "soft-reconfiguration inbound" under the peer's AF)
# Compare both — the prefix set from each spine should be identical
# If one spine has fewer: check what that spine is advertising

## Step 2: Check what Spine-1 is advertising to this leaf
Spine-1# show bgp ipv4 unicast neighbors 10.1.1.1 advertised-routes
# Should include loopbacks of ALL other leaves
# If Leaf-3's loopback is missing: Leaf-3 may not be advertising it, or a route-map is filtering it

## Step 3: Check the missing prefix on the originating leaf
Leaf-3# show bgp ipv4 unicast 10.1.3.1/32
# Is it in the BGP table? If not — check the "network" statement or redistribution
Leaf-3# show running-config | include network
# Must show: network 10.1.3.1/32 under router bgp 65103
Leaf-3# show ip route 10.1.3.1/32
# The prefix must exist in RIB before BGP will advertise it
# If it's a loopback, verify: show interface loopback0

## Step 4: Check route-map filtering on spines
Spine-1# show route-map
# Look for any permit/deny entries that might be dropping leaf prefixes
Spine-1# show bgp ipv4 unicast policy statistics neighbor 10.1.3.1 in
# Shows how many routes were matched/filtered by inbound policy

## Step 5: Force BGP to re-advertise / soft-reset
Spine-1# clear bgp ipv4 unicast 10.1.1.1 soft out
# Soft outbound reset — re-sends all advertised routes without dropping the session
Leaf-1# clear bgp ipv4 unicast 10.1.1.0 soft in
# Soft inbound reset — re-processes all received routes through inbound policy
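Step 1's "compare both" is easy to script once the received prefixes have been collected (via NX-API or screen-scraping; the input lists here are stand-ins):

```python
def received_route_diff(spine1_routes, spine2_routes):
    """Prefixes learned from only one spine - the Scenario B suspects."""
    s1, s2 = set(spine1_routes), set(spine2_routes)
    return {"only_spine1": sorted(s1 - s2), "only_spine2": sorted(s2 - s1)}

diff = received_route_diff(
    ["10.1.2.1/32", "10.1.3.1/32", "10.1.4.1/32"],
    ["10.1.2.1/32", "10.1.4.1/32"],        # Leaf-3's loopback missing here
)
assert diff["only_spine1"] == ["10.1.3.1/32"]   # chase Spine-2 / Leaf-3 next
```

A non-empty diff immediately tells you which spine-leaf pair to investigate, instead of eyeballing two long route listings.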

Scenario C — No ECMP: Only One Path in Routing Table

Symptom: show ip route shows only a single next-hop for a remote leaf loopback instead of two (one per spine).

## Step 1: Check routing table for ECMP paths
Leaf-1# show ip route 10.1.3.1/32
# Healthy ECMP output:
# 10.1.3.1/32, ubest/mbest: 2/0
#   *via 10.1.1.0, [20/0], Ethernet1/1, BGP, tag 65000
#   *via 10.2.1.0, [20/0], Ethernet1/2, BGP, tag 65000
#
# Problem output — only one path:
# 10.1.3.1/32, ubest/mbest: 1/0
#   *via 10.1.1.0, [20/0], Ethernet1/1, BGP, tag 65000

## Step 2: Check BGP table for the prefix — are both paths present?
Leaf-1# show bgp ipv4 unicast 10.1.3.1/32
# Healthy: two paths, both marked with "multipath"
# Problem: only one path, or second path marked "not bestpath"

## Step 3: Check multipath-relax is configured
Leaf-1# show running-config | include multipath
# Must show: bestpath as-path multipath-relax
# Without it, eBGP paths qualify for ECMP only when their AS paths match exactly.
# With a shared spine ASN they do match here, but keep it configured anyway —
# designs that give each spine its own ASN lose ECMP without it

## Step 4: Check maximum-paths setting
Leaf-1# show running-config | section "router bgp" | include maximum-paths
# Must show: maximum-paths 64 (or at least 2)
# Default is 1 — this is the most common cause of missing ECMP

## Step 5: Check ECMP hashing config
Spine-1# show ip load-sharing
# Should show: source-destination IP + port (5-tuple)
# If only source/dest IP: every flow between the same host pair hashes to one path

Scenario D — Traffic Blackhole: Route in RIB but Not in Hardware FIB

Symptom: show ip route shows the prefix with two ECMP paths, but ping/traceroute fails or traffic is dropped. This is a hardware programming failure.

## Step 1: Confirm route is in software RIB
Leaf-1# show ip route 10.1.3.1/32
# If the route is here but traffic is dropped — suspect a hardware FIB issue

## Step 2: Check hardware FIB (forwarding table)
Leaf-1# show forwarding ipv4 route 10.1.3.1/32
# Healthy: shows same next-hops as RIB, with outgoing interface and adjacency
# Problem: route missing from FIB, or shows different/stale next-hop
# If missing: hardware FIB is full (TCAM exhaustion) or programming error

## Step 3: Check TCAM utilization
Leaf-1# show hardware capacity forwarding
# Check: LPM (longest prefix match) utilization
# If above 85%: TCAM near full — routes will start failing to program
# Fix: summarize routes, or upgrade to platform with larger FIB

## Step 4: Check adjacency table (ARP/ND resolved?)
Leaf-1# show ip arp 10.1.1.0
# Spine-1's fabric IP must be in ARP table
# If missing: ARP is failing — check interface config and link state
Leaf-1# show forwarding adjacency 10.1.1.0
# Must show valid adjacency with outgoing interface

## Step 5: Packet capture to isolate the drop point
Leaf-1# ethanalyzer local interface inband display-filter "ip.addr==10.1.3.1" limit-captured-frames 50
# Ethanalyzer sees CPU-punted traffic only (ARP, BGP, pings to the switch)
# For transit traffic that never reaches the CPU, use SPAN/ERSPAN toward a
# capture host, or a platform ELAM capture to see which pipeline stage dropped it
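The RIB-versus-FIB check in Steps 1-2 generalizes to every prefix in the table. A sketch over parsed route data (the dict shape is an assumption, not NX-OS output):

```python
def rib_fib_audit(rib, fib):
    """rib/fib: {prefix: set of next-hop IPs}. Flag FIB programming gaps."""
    problems = {}
    for prefix, nhops in rib.items():
        if prefix not in fib:
            problems[prefix] = "missing from FIB (TCAM full or programming error)"
        elif fib[prefix] != nhops:
            problems[prefix] = (f"stale next-hops: RIB={sorted(nhops)} "
                                f"FIB={sorted(fib[prefix])}")
    return problems

rib = {"10.1.3.1/32": {"10.1.1.0", "10.2.1.0"}}
fib = {"10.1.3.1/32": {"10.1.1.0"}}        # second ECMP leg never programmed
assert "stale" in rib_fib_audit(rib, fib)["10.1.3.1/32"]
```

Run across the full table, this catches the partial-ECMP case (one leg programmed, one missing) that a spot check of a single prefix can miss.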

Scenario E — Intermittent Traffic Loss / BFD Flapping

Symptom: BGP sessions go up and down repeatedly. Users report intermittent connectivity. BFD events in syslog.

## Step 1: Check BFD session status
Leaf-1# show bfd neighbors
# Healthy: all sessions show "Up" with stable uptime
# Problem: sessions showing "Down" or flapping (short uptime, repeated restarts)
Leaf-1# show bfd neighbors detail
# Shows Tx/Rx interval, multiplier, and miss count
# "Missed Hellos" counter incrementing = packets being dropped/delayed

## Step 2: Check BFD timers (may be too aggressive)
Leaf-1# show running-config | include bfd interval
# If set to 100 ms × 3 = 300 ms detection: very sensitive to CPU spikes
# Recommendation for most fabrics: 300 ms × 3 = 900 ms
# Fix if flapping due to CPU:
Leaf-1(config)# bfd interval 300 min_rx 300 multiplier 3

## Step 3: Check interface errors (physical instability)
Leaf-1# show interface ethernet 1/1 counters errors
# Check: CRC errors, input errors, output drops
# Any incrementing CRC errors = bad cable, SFP, or DOM threshold exceeded
Leaf-1# show interface ethernet 1/1 transceiver
# Check Tx/Rx power levels — must be within vendor spec
# Low Rx power = dirty/bent fiber, bad SFP, excessive insertion loss

## Step 4: Check system logs for BGP/BFD events
Leaf-1# show logging log | egrep "BGP|BFD|ADJCHANGE" | last 30
# Look for a pattern: does the flap correlate with a specific time of day?
# Correlation with backup jobs / spanning-tree TCNs / LACP events?

## Step 5: Check CPU utilization (BFD hellos hit the CPU on platforms without hardware offload)
Leaf-1# show processes cpu sort
# Sorted by usage — if the BFD process is consuming >30% CPU, reduce the
# session count or relax the timers
Leaf-1# show system resources
# Check overall CPU and memory — a spike during a maintenance window is a common cause
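Step 4's time-of-day question is easy to answer by bucketing syslog timestamps. A sketch (the timestamp format, '%Y %b %d %H:%M:%S' in the first 20 characters, is an assumption about your logging config):

```python
from collections import Counter
from datetime import datetime

def flap_histogram(syslog_lines):
    """Count BGP/BFD adjacency events per hour of day."""
    hours = Counter()
    for line in syslog_lines:
        if "ADJCHANGE" in line or "BFD" in line:
            ts = datetime.strptime(line[:20], "%Y %b %d %H:%M:%S")
            hours[ts.hour] += 1
    return hours

logs = [
    "2026 Mar 10 02:00:11 %BGP-5-ADJCHANGE: neighbor 10.1.1.0 Down",
    "2026 Mar 10 02:00:12 %BFD-5-SESSION_STATE_DOWN: 10.1.1.0",
    "2026 Mar 10 14:31:02 %ETHPORT-5-IF_UP: Ethernet1/1",   # not a flap event
]
# flap_histogram(logs) -> Counter({2: 2}): flaps cluster at 02:00
```

Flaps clustering in one hour point at a scheduled job (backups, snapshots) saturating a link or the CPU, rather than a hardware fault.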

Scenario F — VXLAN/EVPN: VMs Can't Talk Across Leaves

Symptom: Servers on Leaf-1 cannot reach servers on Leaf-3 even though underlay BGP is up and ECMP is working.

## Step 1: Verify underlay reachability between leaf loopbacks
Leaf-1# ping 10.1.3.1 source-interface loopback0 count 5
# Must succeed — if this fails, fix underlay BGP before debugging VXLAN

## Step 2: Check NVE (VTEP) peer status
Leaf-1# show nve peers
# Healthy output (peer IP = the remote leaf's VTEP loopback, 10.2.{id}.1 here):
# Interface  Peer-IP         State LearnType Uptime   Router-Mac
# nve1       10.2.3.1        Up    CP        01:22:14 5254.0012.3456
# If state is "Down": NVE can't reach the peer VTEP — underlay issue
# If the peer is missing entirely: EVPN BGP is not distributing VTEP info

## Step 3: Check EVPN BGP session (overlay BGP)
Leaf-1# show bgp l2vpn evpn summary
# Must show Established sessions to the spines (iBGP route-reflectors or
# eBGP EVPN peers, depending on the overlay design)
# PfxRcd should be non-zero — if zero, EVPN routes are not being exchanged

## Step 4: Check MAC/IP table โ€” is the remote VM's MAC learned?
Leaf-1# show mac address-table vlan 10
# Local MACs: learned via hardware
# Remote MACs: should show VTEP IP as next-hop (learned via EVPN type-2)
Leaf-1# show bgp l2vpn evpn
# Check for type-2 routes (MAC/IP advertisements) from Leaf-3

## Step 5: Check VNI to VLAN mapping
Leaf-1# show nve vni
# All VNIs must show "Up" state with the correct VLAN mapping
# If a VNI is Down: check that the VLAN exists and the NVE interface config
Leaf-1# show nve interface nve 1 detail
# Confirms the NVE interface is up and shows its source interface (VTEP loopback)

## Step 6: Check ARP suppression (anycast gateway answering ARP locally)
Leaf-1# show ip arp suppression-cache vlan 10
# Shows host entries the leaf can answer ARP for without flooding (BUM reduction)
# If the target VM's IP is missing: its EVPN type-2 route has not been received
# yet, or the host has not been learned
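A frequent root cause in this scenario is a VNI that is Up on one leaf but absent or Down on the other; that segment then cannot be stitched across the fabric. Given per-leaf VNI sets scraped from `show nve vni` (collection method assumed), the check is a set difference:

```python
def missing_vnis(local_vnis, remote_vnis):
    """Each arg: set of VNIs in 'Up' state on that leaf.
    A VNI present on only one end leaves its hosts isolated."""
    return {"only_local": sorted(set(local_vnis) - set(remote_vnis)),
            "only_remote": sorted(set(remote_vnis) - set(local_vnis))}

# Leaf-1 carries VNI 10030, Leaf-3 does not -> VMs in 10030 can't cross over
missing_vnis({10010, 10020, 10030}, {10010, 10020})
# -> {"only_local": [10030], "only_remote": []}
```

Running this pairwise across all leaves after a change window catches the "VLAN added on three of four leaves" class of mistake before users do.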

Scenario G — Asymmetric Traffic / One-Way Connectivity

Symptom: Server on Leaf-1 can ping server on Leaf-3, but not the reverse. Or traceroute shows traffic takes different paths in each direction.

## Step 1: Check route symmetry — does Leaf-3 have the same ECMP paths back?
Leaf-3# show ip route 10.1.1.1/32
# Should show the same number of ECMP paths as Leaf-1 has toward Leaf-3
# If only 1 path: Leaf-3 is missing a spine peer — asymmetric routing

## Step 2: Check if a spine is advertising routes asymmetrically
Spine-1# show bgp ipv4 unicast neighbors 10.1.3.1 advertised-routes | include 10.1.1
Spine-2# show bgp ipv4 unicast neighbors 10.1.3.1 advertised-routes | include 10.1.1
# Both spines should advertise the same set of leaf loopbacks

## Step 3: Check for routing policy differences between spines
Spine-1# show bgp ipv4 unicast policy statistics neighbor 10.1.3.1 out
Spine-2# show bgp ipv4 unicast policy statistics neighbor 10.1.3.1 out
# Compare — if one spine filters more routes, traffic will be asymmetric

## Step 4: Check security policy / ACLs on each leaf
Leaf-3# show ip access-lists | include deny
# An ACL blocking return traffic causes one-way connectivity
Leaf-3# show running-config interface ethernet 1/1 | include access-group
# Check whether any ACL is applied to fabric-facing interfaces

Scenario H — Full Fabric Health Check Sequence

Run this sequence to baseline the entire fabric health in order:

## === PHYSICAL LAYER ===
Leaf-1# show interface brief | include Eth | exclude notconn
# All fabric-facing ports must be connected, no err-disabled
Leaf-1# show interface status err-disabled
# Any err-disabled ports = investigate root cause (BPDU guard, port security, etc.)

## === BGP UNDERLAY ===
Leaf-1# show bgp ipv4 unicast summary | include Estab
# Count of Established neighbors must equal spine count (typically 2)
Leaf-1# show ip route summary
# BGP paths ≈ remote loopbacks × 2 (one per spine); the RIB holds one entry
# per prefix carrying both next-hops

## === ECMP VERIFICATION ===
Leaf-1# show ip route 10.1.2.1/32
# ubest/mbest must be 2/0 (two equal-cost next-hops)
Leaf-1# show ip route 10.1.3.1/32
# Same check for each remote leaf loopback

## === HARDWARE FIB ===
Leaf-1# show forwarding ipv4 route 10.1.2.1/32
# Must show both next-hops, matching the RIB
Leaf-1# show hardware capacity forwarding
# LPM utilization must be below 80%

## === BFD ===
Leaf-1# show bfd neighbors
# All spines must show "Up" with stable uptime (no recent flaps)

## === VXLAN OVERLAY (if deployed) ===
Leaf-1# show nve peers
# All remote VTEPs must show "Up"
Leaf-1# show bgp l2vpn evpn summary
# EVPN sessions Established with non-zero PfxRcd
Leaf-1# show nve vni
# All VNIs in "Up" state

## === END-TO-END REACHABILITY ===
Leaf-1# ping 10.1.2.1 source-interface loopback0 count 10
Leaf-1# ping 10.1.3.1 source-interface loopback0 count 10
Leaf-1# ping 10.1.4.1 source-interface loopback0 count 10
# All must be 100% success — any packet loss = investigate that leaf's BGP/link
Leaf-1# traceroute 10.1.3.1 source loopback0
# Must be exactly 2 hops (leaf → spine → leaf)
# If 3+ hops: traffic is leaving the fabric (routing error)
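This whole sequence lends itself to automation. A sketch that scores a scraped BGP summary (the column layout is assumed from the sample output in Scenario A; a real tool would use NX-API JSON instead of text parsing):

```python
def bgp_peers_established(summary_text, expected_spines=2):
    """Parse 'show bgp ipv4 unicast summary'-style text. A peer is healthy
    when its last column is a prefix count (digits), not a state name."""
    established = 0
    for line in summary_text.splitlines():
        fields = line.split()
        # Data rows start with an IPv4 address and have the full column set
        if len(fields) >= 9 and fields[0].count(".") == 3:
            if fields[-1].isdigit():
                established += 1
    return established == expected_spines

healthy = """Neighbor    V    AS  MsgRcvd MsgSent TblVer InQ OutQ Up/Down  State/PfxRcd
10.1.1.0    4 65000     1452    1448     88   0    0 01:02:33   5
10.2.1.0    4 65000     1449    1445     88   0    0 01:02:31   5"""
# bgp_peers_established(healthy) -> True
```

The same pattern extends to each stage of the checklist (nve peers Up, VNIs Up, ubest/mbest 2/0), turning Scenario H into a pass/fail report you can run after every change window.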

Spine-Leaf Design Checklist

  • No leaf-to-leaf links anywhere — all east-west traffic transits spines
  • No spine-to-spine links — spines are pure transit
  • Every leaf connects to every spine — full mesh
  • bestpath as-path multipath-relax configured on all fabric nodes
  • maximum-paths 64 configured under address-family ipv4 unicast
  • BFD enabled on all BGP sessions — 300 ms intervals, multiplier 3
  • /31 subnets on all fabric links — no /30 or larger
  • Loopbacks as the BGP router-ID and VTEP source — never fabric-facing IPs
  • vPC/MLAG for any dual-homed server — single-homed servers are a risk
  • Border leaves are the only external connection point — server leaves never peer external BGP
  • Hardware FIB verified after changes — RIB alone does not confirm packet forwarding
  • TCAM utilization monitored — alert at 75% full