Overview
Cisco Catalyst SD-WAN (formerly Viptela) separates the WAN into distinct planes: an orchestration plane handled by vBond, a management plane provided by vManage, a centralized control plane run by vSmart controllers, and a data plane formed by the edge routers. Edge routers (WAN Edges, also called vEdges or cEdges) build encrypted IPsec tunnels, monitored by BFD, across any transport (MPLS, broadband, LTE, or satellite) and apply application-aware routing policies that are configured in vManage and distributed by vSmart. When it works, SD-WAN dramatically simplifies WAN operations. When it breaks, the layered architecture means failures can be hard to isolate without a structured approach.
Part 1 - Architecture and Design Principles
1.1 - Understand the Four Components
Every Cisco SD-WAN deployment has four roles. Understanding what each does is essential for troubleshooting:
- vManage: the single pane of glass. Pushes configuration templates, policies, and software. All GUI and REST API access goes through it. If vManage is down, the overlay keeps running: existing tunnels and policies are unaffected.
- vBond: the orchestrator. Every WAN Edge contacts vBond first during onboarding to authenticate and to discover the vSmart and vManage addresses. vBond must be publicly reachable. After initial onboarding, WAN Edges do not rely on vBond for data plane operation.
- vSmart: the controller. Runs OMP (Overlay Management Protocol) and distributes routes and policies to all WAN Edges. A vSmart failure stops route and policy updates but does not drop existing tunnels.
- WAN Edge (vEdge/cEdge): the data plane. Builds IPsec tunnels to other WAN Edges (full mesh unless a topology policy restricts it), runs BFD to monitor tunnel health, and enforces centralized policies received from vSmart and localized policies pushed from vManage.
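The failure behavior in section 1.1 can be summarized as a small lookup: which operations stop when a given controller is down, and which keep running. This is a deliberately simplified model; the operation labels and the still_working helper are illustrative, not Cisco terminology, and it ignores timers such as OMP graceful restart that bound how long an edge keeps forwarding without vSmart.

```python
# Simplified model of section 1.1: what each controller failure takes away.
# Operation labels are illustrative, not Cisco terminology.
LOST_WHEN_DOWN = {
    "vManage": {"template and policy pushes", "GUI and REST API access"},
    "vBond": {"onboarding of new WAN Edges"},
    "vSmart": {"route updates", "policy updates"},
}

# Everything the model tracks: data plane state plus controller-driven operations
ALL_OPERATIONS = {
    "existing tunnels",
    "active policies",
    *set().union(*LOST_WHEN_DOWN.values()),
}


def still_working(failed: set[str]) -> set[str]:
    """Operations that keep running when the given controllers are down."""
    lost = set().union(*(LOST_WHEN_DOWN[name] for name in failed))
    return ALL_OPERATIONS - lost


# Losing both vManage and vSmart still leaves existing tunnels forwarding
print(sorted(still_working({"vManage", "vSmart"})))
```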
1.2 - Transport Independence
Design your underlay transports as completely independent failure domains:
- Use at least two transport types (e.g., MPLS + broadband, or MPLS + LTE), never two circuits from the same provider
- Assign each transport to a separate color in SD-WAN (e.g., mpls, biz-internet, lte)
- Use the restrict keyword on a color only when tunnels from that transport must terminate on the same color at remote sites (e.g., an MPLS circuit with no path to internet TLOCs); to keep a traffic type such as voice off LTE, use AAR or data policy instead
1.3 - Template-Driven Configuration
Never push ad-hoc CLI to WAN Edges in vManage-managed deployments. Always use feature templates:
- One Device Template per device type (ISR 1100, ISR 4K, C8000v, etc.)
- Feature templates for: System, VPN 0 (transport), VPN 512 (management), Service VPNs
- Use variables ({{hostname}}, {{system-ip}}) for per-device values; this keeps templates reusable
- Attach the same template to all devices of the same role; this enforces consistency and makes auditing straightforward
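The idea behind template variables can be sketched in a few lines: one template body, per-device values filled in, and a hard failure when a value is missing so a half-rendered configuration never reaches a device. This illustrates the concept only, assuming a {{name}} placeholder syntax; render is a hypothetical helper, not vManage's template engine, and the template text is illustrative rather than real device syntax.

```python
import re

# Matches {{name}} placeholders; names may contain letters, digits, _ and -
PLACEHOLDER = re.compile(r"\{\{\s*([\w-]+)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Fill every placeholder, refusing to render if any value is missing."""
    needed = {m.group(1) for m in PLACEHOLDER.finditer(template)}
    missing = sorted(needed - set(values))
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# One shared template body; only the per-device values differ
SYSTEM_TEMPLATE = "hostname {{hostname}}\nsystem-ip {{system-ip}}"

print(render(SYSTEM_TEMPLATE, {"hostname": "br-101", "system-ip": "10.255.1.101"}))
```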
Part 2 - OMP and Routing Best Practices
OMP (Overlay Management Protocol) is the SD-WAN control plane. It runs between WAN Edges and vSmart over a DTLS/TLS connection and carries three route types: OMP routes (learned prefixes), TLOC routes (transport location endpoints), and service routes.
# Verify OMP sessions on WAN Edge
WAN-Edge# show sdwan omp summary
WAN-Edge# show sdwan omp peers
# Check OMP routes received from vSmart
WAN-Edge# show sdwan omp routes
WAN-Edge# show sdwan omp routes vpn 1
# Check TLOC routes (transport endpoints)
WAN-Edge# show sdwan omp tlocs
WAN-Edge# show sdwan omp tlocs detail
# Verify service-side routes being advertised into OMP
WAN-Edge# show sdwan omp advertised-routes
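A useful mental model when reading these outputs: every OMP route carries a TLOC as its next hop, and a WAN Edge can only use the route if that TLOC is reachable over a BFD session that is up. The sketch below is a simplified model under that assumption; Tloc, OmpRoute, and usable_routes are hypothetical names, and real route selection has more steps (OMP best-path selection, policy, preference).

```python
from typing import NamedTuple


class Tloc(NamedTuple):
    """Transport location: identifies one transport on one WAN Edge."""
    system_ip: str
    color: str
    encap: str


class OmpRoute(NamedTuple):
    vpn: int
    prefix: str
    next_hop: Tloc


def usable_routes(routes: list[OmpRoute], bfd_up: set[Tloc]) -> list[OmpRoute]:
    """Keep only routes whose next-hop TLOC has an up BFD session."""
    return [r for r in routes if r.next_hop in bfd_up]


mpls = Tloc("10.255.0.2", "mpls", "ipsec")
inet = Tloc("10.255.0.2", "biz-internet", "ipsec")
routes = [OmpRoute(1, "10.2.0.0/24", mpls), OmpRoute(1, "10.2.0.0/24", inet)]

# The MPLS tunnel to the remote site is down, so only the internet path remains
print(usable_routes(routes, bfd_up={inet}))
```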
2.1 - Route Policy Best Practices
- Always use centralized policies (application-aware routing and data policies) for traffic steering; use localized policies only for edge cases
- Apply policies to site lists rather than individual devices, so a policy change applies consistently across all sites in the list
- Use SLA classes to define acceptable loss/latency/jitter thresholds per application class, then reference them in AAR policies
- Keep route policies simple: prefer hub-and-spoke topologies for branch sites, full-mesh only for DC-to-DC
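Site lists are what make policy application predictable: a policy reaches exactly the sites whose IDs fall in the list. A minimal sketch, assuming site-list entries are either single site IDs or inclusive ranges written as "200-299"; expand_site_list and sites_receiving_policy are hypothetical helpers.

```python
def expand_site_list(entries: list[str]) -> set[int]:
    """Expand entries like "100" or "200-299" (inclusive) into site IDs."""
    sites: set[int] = set()
    for entry in entries:
        if "-" in entry:
            low, high = (int(part) for part in entry.split("-", 1))
            sites.update(range(low, high + 1))
        else:
            sites.add(int(entry))
    return sites


def sites_receiving_policy(site_list: list[str], fleet: list[int]) -> list[int]:
    """Sites from the fleet that a policy applied to site_list would reach."""
    members = expand_site_list(site_list)
    return sorted(site for site in fleet if site in members)


branches = ["200-299"]
fleet = [100, 101, 205, 250, 300]
print(sites_receiving_policy(branches, fleet))
```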
Part 3 - Application-Aware Routing (AAR)
AAR is SD-WAN's primary value proposition: it automatically shifts traffic to the best transport based on real-time BFD measurements.
# Check BFD tunnel status and metrics per color
WAN-Edge# show sdwan bfd sessions
WAN-Edge# show sdwan bfd sessions detail
# Session output shows per-tunnel state, colors, and BFD timers; per-tunnel loss/latency/jitter appear in app-route stats
# State: up = healthy | down = path failed | NA = not applicable
# Check which path a specific application is using
WAN-Edge# show sdwan policy service-path vpn 1 interface ge0/0 source-ip 10.1.0.10 dest-ip 10.2.0.10 protocol 6 dest-port 443
# View per-tunnel AAR measurements and configured SLA classes
WAN-Edge# show sdwan app-route stats
WAN-Edge# show sdwan app-route sla-class
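AAR does not react to a single bad probe; it compares SLA thresholds against loss, latency, and jitter averaged over recent measurement intervals. The sketch below models that smoothing with a fixed-size sliding window. The class name TunnelStats and the window size are assumptions; on a real device the averaging intervals come from its app-route configuration.

```python
from collections import deque
from statistics import mean


class TunnelStats:
    """Sliding window of per-interval (loss %, latency ms, jitter ms) samples.

    Hypothetical model: AAR compares SLA thresholds against averages like these.
    """

    def __init__(self, window: int) -> None:
        # A deque with maxlen drops the oldest sample automatically
        self.samples: deque[tuple[float, float, float]] = deque(maxlen=window)

    def add(self, loss_pct: float, latency_ms: float, jitter_ms: float) -> None:
        self.samples.append((loss_pct, latency_ms, jitter_ms))

    def averages(self) -> tuple[float, float, float]:
        if not self.samples:
            raise ValueError("no samples yet")
        loss, latency, jitter = zip(*self.samples)
        return mean(loss), mean(latency), mean(jitter)


# One bad interval raises the average but is diluted by the healthy ones
stats = TunnelStats(window=3)
for sample in [(0, 20, 2), (0, 20, 2), (3, 80, 20)]:
    stats.add(*sample)
print(stats.averages())
```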
3.1 - SLA Class Design
# Good SLA class design, tiered by application sensitivity:
# Voice/Video: loss < 1%, latency < 150ms, jitter < 30ms
# Critical apps: loss < 2%, latency < 300ms
# Best effort: no SLA (any available path)
# Verify SLA class hits
WAN-Edge# show sdwan app-route sla-class name VOICE-SLA
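The tiers above can be sketched as a path-selection function: keep the colors whose averaged measurements meet the SLA class, honor a preferred color if it qualifies, and fall back to all colors when none qualify. The thresholds come from the tiers above; the class name CRITICAL-SLA, the fallback rule, and choose_colors are assumptions for illustration, since real AAR fallback behavior depends on how the policy is configured.

```python
# Thresholds from the tiers above; jitter None means "not part of this SLA"
SLA_CLASSES = {
    "VOICE-SLA": {"loss": 1.0, "latency": 150.0, "jitter": 30.0},
    "CRITICAL-SLA": {"loss": 2.0, "latency": 300.0, "jitter": None},
}


def meets_sla(stats: tuple[float, float, float], sla: dict) -> bool:
    loss, latency, jitter = stats
    return (
        loss < sla["loss"]
        and latency < sla["latency"]
        and (sla["jitter"] is None or jitter < sla["jitter"])
    )


def choose_colors(tunnels, sla_name=None, preferred=None):
    """Return the colors eligible for traffic in this SLA class.

    tunnels maps color -> averaged (loss %, latency ms, jitter ms).
    sla_name=None models the best-effort tier (any available path).
    """
    if sla_name is None:
        return sorted(tunnels)
    sla = SLA_CLASSES[sla_name]
    compliant = sorted(c for c, s in tunnels.items() if meets_sla(s, sla))
    if preferred in compliant:
        return [preferred]
    # Assumed fallback: if nothing meets the SLA, keep using every color
    return compliant or sorted(tunnels)


tunnels = {
    "mpls": (0.2, 40.0, 5.0),
    "biz-internet": (1.5, 60.0, 10.0),
    "lte": (0.5, 180.0, 40.0),
}
print(choose_colors(tunnels, "VOICE-SLA"))
print(choose_colors(tunnels, "CRITICAL-SLA", preferred="biz-internet"))
```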
Part 4 - Troubleshooting SD-WAN
Step 1 - Control Plane: vBond Reachability
# Check if WAN Edge can reach vBond (first step for any onboarding issue)
WAN-Edge# show sdwan control connections
WAN-Edge# show sdwan control connection-history
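# Steady state: vsmart and vmanage peers are up; on WAN Edges the vbond
# connection is transient and normally closes after onboarding
# If vbond is stuck connecting, check DNS, NAT, and the firewall (UDP 12346 must be open)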
# Verify certificate status and organization name
WAN-Edge# show sdwan certificate serial
WAN-Edge# show sdwan certificate validity
WAN-Edge# show sdwan control local-properties
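The checks in this step can be scripted against parsed output. A minimal sketch, assuming the rows of show sdwan control connections have already been reduced to (peer type, state) pairs; the parsing is not shown, control_plane_findings is a hypothetical helper, and the state string "connect" in the example is illustrative, so confirm the exact strings against real device output.

```python
def control_plane_findings(connections: list[tuple[str, str]]) -> list[str]:
    """Flag problems in (peer_type, state) pairs from a WAN Edge.

    Assumes vsmart and vmanage should be up in steady state, while a vbond
    entry only matters if it is stuck in a non-up state.
    """
    up = {peer for peer, state in connections if state == "up"}
    findings = [
        f"no {peer} connection in state up"
        for peer in ("vsmart", "vmanage")
        if peer not in up
    ]
    if any(peer == "vbond" and state != "up" for peer, state in connections):
        findings.append("vbond not up: check DNS, NAT, and firewall (UDP 12346)")
    return findings


healthy = [("vsmart", "up"), ("vsmart", "up"), ("vmanage", "up")]
onboarding_failure = [("vbond", "connect")]
print(control_plane_findings(healthy))
print(control_plane_findings(onboarding_failure))
```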
Step 2 - Data Plane: BFD and Tunnel Health
# Check all BFD sessions: down sessions prevent data plane traffic
WAN-Edge# show sdwan bfd sessions
WAN-Edge# show sdwan bfd summary
# Check tunnel interface status
WAN-Edge# show sdwan interface
WAN-Edge# show sdwan tunnel statistics
# Ping over a specific color/transport
WAN-Edge# ping sdwan 10.2.0.1 vpn 0 source ge0/0
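When reading BFD output, the pattern of failures points at the culprit: every session on one color down suggests that local transport, while one remote system IP down on every color suggests a problem at the remote site. A sketch of that triage, assuming sessions are already parsed into (remote system IP, local color, remote color, state) tuples; both functions are hypothetical helpers.

```python
from collections import defaultdict


def bfd_by_color(sessions):
    """Group sessions by (local color, remote color): count up, list remotes down."""
    summary = defaultdict(lambda: {"up": 0, "down": []})
    for remote_ip, local_color, remote_color, state in sessions:
        entry = summary[(local_color, remote_color)]
        if state == "up":
            entry["up"] += 1
        else:
            entry["down"].append(remote_ip)
    return dict(summary)


def remotes_down_on_every_color(sessions):
    """Remote system IPs with no BFD session up on any color."""
    states = defaultdict(set)
    for remote_ip, _local, _remote, state in sessions:
        states[remote_ip].add(state)
    return sorted(ip for ip, seen in states.items() if "up" not in seen)


sessions = [
    ("10.0.0.2", "mpls", "mpls", "up"),
    ("10.0.0.2", "biz-internet", "biz-internet", "up"),
    ("10.0.0.3", "mpls", "mpls", "down"),
    ("10.0.0.3", "biz-internet", "biz-internet", "down"),
]
print(bfd_by_color(sessions))
print(remotes_down_on_every_color(sessions))
```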
Step 3 - Policy Not Applied
# Check active policies on WAN Edge
WAN-Edge# show sdwan policy from-vsmart
WAN-Edge# show sdwan policy data-policy-filter
# Verify policy counters (shows if traffic is matching policy)
WAN-Edge# show sdwan policy data-policy-filter detail
# On vManage, check policy push status:
# Monitor > Devices > [device] > Real-Time > Policy
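Policy counters are only meaningful as a delta: capture the data-policy-filter counters, send test traffic, and capture them again. A minimal sketch, assuming each snapshot has been parsed into a mapping from policy sequence name to packet count; sequences_hit is a hypothetical helper and the sequence names in the example are made up.

```python
def sequences_hit(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return policy sequences whose packet counter grew between snapshots."""
    return {
        seq: count - before.get(seq, 0)
        for seq, count in after.items()
        if count > before.get(seq, 0)
    }


before = {"seq-10-voice": 1200, "seq-20-web": 800, "default": 50}
after = {"seq-10-voice": 1200, "seq-20-web": 950, "default": 50}
# Test traffic matched seq-20-web, not the voice sequence it was meant for
print(sequences_hit(before, after))
```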
Step 4 - Service VPN Routing Issues
# Check service VPN routing table
WAN-Edge# show ip route vrf 1
WAN-Edge# show sdwan omp routes vpn 1 detail
# Ping from service VPN
WAN-Edge# ping vrf 1 10.2.0.10 source ge0/2
# Check NAT for DIA (direct internet access) traffic
WAN-Edge# show ip nat translations vrf 1
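Service VPN problems often come down to which entry wins the lookup: a more specific static or connected prefix overrides a broader OMP route, and the default route (for example, one used for DIA) only catches what nothing else matches. This sketch shows longest-prefix match with Python's ipaddress module over a hypothetical VRF table with made-up source labels; it ignores administrative distance, which only breaks ties between identical prefixes.

```python
import ipaddress


def longest_match(table, destination):
    """Return (prefix, source) of the most specific route covering destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), source)
        for prefix, source in table
        if dest in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    network, source = max(candidates, key=lambda c: c[0].prefixlen)
    return str(network), source


vrf1 = [
    ("0.0.0.0/0", "nat-dia"),
    ("10.2.0.0/16", "omp"),
    ("10.2.0.0/24", "static"),
]
print(longest_match(vrf1, "10.2.0.10"))  # a stray static beats the OMP route
print(longest_match(vrf1, "10.2.5.10"))
print(longest_match(vrf1, "8.8.8.8"))
```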
Quick Reference - Common SD-WAN Issues
SD-WAN Hardening Checklist
- All WAN Edges use signed certificates from vManage; never use self-signed certificates in production
- vBond is deployed in a DMZ and reachable from all transport IPs (UDP 12346)
- Two vSmart controllers deployed for HA; never rely on a single controller
- vManage is backed up daily (request nms configuration-db backup)
- All device templates use variables; no hardcoded values except role-specific config
- BFD timers are tuned per transport (MPLS: 1 s hello × 6, LTE: 3 s hello × 5)
- AAR SLA classes defined for voice, critical apps, and best-effort tiers
- restrict keyword used only on colors whose tunnels must stay within the same color
- Zero Trust: WAN Edges only accept control connections from known controller IPs
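The BFD timer item is easier to reason about as arithmetic: the time to declare a tunnel down is the hello interval multiplied by the multiplier. With the checklist values, MPLS (1 s × 6) detects a failure in about 6 s and LTE (3 s × 5) in about 15 s, trading slower failover for fewer probes on a metered link. A minimal sketch; detection_time_s and the TIMERS table are hypothetical names.

```python
def detection_time_s(hello_interval_s: float, multiplier: int) -> float:
    """Seconds of missed hellos before BFD declares a tunnel down."""
    return hello_interval_s * multiplier


# Per-transport timers from the checklist: (hello interval in s, multiplier)
TIMERS = {"mpls": (1, 6), "lte": (3, 5)}

for color, (hello, mult) in TIMERS.items():
    print(f"{color}: down after ~{detection_time_s(hello, mult)} s")
```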