Overview
VMware VeloCloud SD-WAN (now VMware SD-WAN by Broadcom) transforms enterprise WANs by abstracting complex MPLS/internet hybrid networks into a centrally managed overlay. It delivers application-aware routing, dynamic path selection, and zero-touch provisioning, all from a single dashboard.
This guide is built from hands-on experience managing VeloCloud deployments across 42 countries, including a major QoS optimization project at Cebu HQ that reduced VoIP complaints by over 80%. It covers architecture, design best practices, QoS tuning, and deep troubleshooting with real CLI references.
Architecture
VeloCloud operates on a three-tier model. Every branch Edge builds encrypted VCMP tunnels up to Cloud Gateways, which connect back to the Orchestrator for central management. Branch-to-branch traffic can flow Edge-to-Edge directly without hair-pinning through the data center.
Three-Tier Components
VCMP (VeloCloud Multipath Protocol) runs over IPsec UDP/2426 and is the data-plane backbone. DMPO (Dynamic Multi-Path Optimization) continuously measures each path's latency, jitter, and packet loss, and steers traffic to the best-performing link per application class in real time.
Best Practices
1. Edge and Circuit Sizing
Always deploy dual WAN links per Edge. VeloCloud's core value is dynamic multi-path selection; a single link removes the ability to steer between paths and makes the deployment no better than a traditional router.
Sizing rules:
- Edge throughput should be 2× actual peak traffic to absorb encryption overhead
- LTE backup should be scoped for critical traffic only; apply QoS to suppress bulk during failover
- Always confirm ISP MTU, because VCMP adds ~80 bytes of header overhead per packet
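The sizing rules above reduce to simple arithmetic. The following is a minimal Python sketch, assuming the 2× headroom factor and the ~80-byte VCMP overhead quoted in this guide. The function names are hypothetical and are not part of any VeloCloud tool.

```python
VCMP_OVERHEAD_BYTES = 80   # approximate per-packet overhead quoted in this guide
HEADROOM_FACTOR = 2.0      # guide's rule: size the Edge at 2x actual peak traffic


def required_edge_throughput_mbps(peak_mbps: float) -> float:
    """Minimum Edge throughput rating for a measured traffic peak."""
    return peak_mbps * HEADROOM_FACTOR


def max_inner_packet_bytes(isp_mtu: int) -> int:
    """Largest inner packet that fits in one underlay frame after VCMP overhead."""
    return isp_mtu - VCMP_OVERHEAD_BYTES


if __name__ == "__main__":
    print(required_edge_throughput_mbps(350))  # 700.0
    print(max_inner_packet_bytes(1500))        # 1420
    print(max_inner_packet_bytes(1492))        # 1412 (typical PPPoE underlay MTU)
```

The 80-byte figure is approximate, so treat the result as an upper bound and leave some margin when setting the link MTU.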
2. Business Policy and QoS
The Business Policy engine is what separates a well-tuned SD-WAN from a poorly performing one. Every application class needs explicit classification, a path preference, and appropriate bandwidth treatment.
Key per-rule settings to configure:
- Network Service: Direct (local breakout), Multi-Path (overlay), or Gateway Backhaul (data center)
- Link Steering: Auto, Preferred Link, or Transport Group (e.g. MPLS-only group)
- Service Class: Real-time, Transactional, or Bulk (maps to internal QoS queuing)
- Rate Limit: Set ceiling percentages on bulk/best-effort to protect real-time queues
Microsoft 365 tip: Use the built-in Office 365 application category and set it to Direct Breakout at the Edge. Microsoft publishes optimized endpoint lists that VeloCloud auto-imports. This avoids backhauling O365 traffic through the data center and improves SharePoint, Teams, and Exchange performance.
3. WAN Link Configuration
Accurate link configuration is the foundation of QoS. VeloCloud uses the configured bandwidth values for scheduling: overestimate and the queues overflow silently, underestimate and you waste purchased capacity.
# Example WAN link configuration (VCO CLI / API reference)
# Configure > Edge > Device > Interface > WAN Overlay
Interface: GE3 (Public WAN)
Link Type: Wired
ISP Name: PLDT Fibr
Upstream: 100 Mbps # set to actual contracted value
Downstream: 200 Mbps # set to actual contracted value
MTU: 1500
Link Mode: Active
Dynamic Bandwidth Adjustment: Enabled
Overhead Calculation: Ethernet (0%)
Critical link settings explained:
| Setting | What it does | Recommendation |
|---|---|---|
| Bandwidth (up/down) | Used for QoS scheduling and capacity math | Set to actual ISP values; never guess |
| Dynamic Bandwidth Adjustment | Detects ISP throttling in real time | Always enable for variable links (LTE, cable) |
| Link Mode | Active/Active = load balance, Active/Standby = failover only | Active/Active for dual DIA, Standby for LTE backup |
| MTU | Must account for VCMP overhead (~80 bytes) | Set 1400 for DSL/PPPoE, 1420 for LTE |
| Overhead Calculation | Accounts for ISP encapsulation overhead | Ethernet 0%, DSL ~10%, LTE ~15% |
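The recommendations in the table can be turned into a quick pre-deployment check. This is an illustrative Python sketch only: the `WanLink` record and `lint_link` function are hypothetical names, the link-type labels are assumptions, and the values are copied from the table rather than read from the Orchestrator.

```python
from dataclasses import dataclass
from typing import List

# Values taken from the table above; the dictionary keys are hypothetical labels.
RECOMMENDED_MTU = {"ethernet": 1500, "dsl": 1400, "lte": 1420}
VARIABLE_LINKS = {"lte", "cable"}


@dataclass
class WanLink:
    name: str
    link_type: str            # "ethernet", "dsl", "lte", or "cable"
    mtu: int
    upstream_mbps: float
    downstream_mbps: float
    dynamic_bw_adjust: bool


def lint_link(link: WanLink) -> List[str]:
    """Return a list of warnings for settings that contradict the table."""
    warnings = []
    max_mtu = RECOMMENDED_MTU.get(link.link_type)
    if max_mtu is not None and link.mtu > max_mtu:
        warnings.append(f"{link.name}: MTU {link.mtu} exceeds recommended {max_mtu}")
    if link.upstream_mbps <= 0 or link.downstream_mbps <= 0:
        warnings.append(f"{link.name}: set bandwidth to the contracted ISP value")
    if link.link_type in VARIABLE_LINKS and not link.dynamic_bw_adjust:
        warnings.append(f"{link.name}: enable Dynamic Bandwidth Adjustment")
    return warnings


if __name__ == "__main__":
    backup = WanLink("GE4", "lte", 1500, 20, 50, dynamic_bw_adjust=False)
    for warning in lint_link(backup):
        print(warning)
```

Running it against an LTE link left at MTU 1500 with Dynamic Bandwidth Adjustment disabled reports both problems, while a correctly configured Ethernet link returns an empty list.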
4. Network Segmentation
VeloCloud supports multi-tenant segmentation at the Edge. Each segment is an isolated routing and policy domain, so guest traffic, IoT devices, and corporate users never share the same forwarding table or firewall context.
5. High Availability
For any site where outages are business-critical, deploy an Edge HA pair. Active/Standby is the recommended mode: it is simpler to troubleshoot than Active/Active and sufficient for most sites.
# HA configuration best practices (VCO: Configure > Edge > HA)
HA Mode: Active/Standby
HA Interface: GE1 (dedicated crossover or VLAN)
Heartbeat Interval: 300ms # default 500ms; 300ms for faster detection
Detection Time: ~1.5s # 5 × heartbeat interval
Split-Brain Detection: Enabled
# Verify HA status on each Edge
VCE# show ha status
VCE# show ha heartbeat
HA rules to follow:
- Use a dedicated HA link; never share it with LAN or WAN traffic
- Both Edges must run identical firmware: upgrade standby first, failover, then upgrade the former active
- Test failover quarterly: physically unplug the active Edge and confirm standby takes over within 5 seconds
- Enable Split-Brain Detection to prevent both Edges claiming the Active role simultaneously
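The relationship between heartbeat interval and detection time in the configuration above is a simple multiplication. A short Python sketch, assuming the 5× multiplier from the config comment and the 5-second failover target from the rules list; the function names are hypothetical.

```python
MISSED_HEARTBEATS = 5        # multiplier from the HA config comment above
FAILOVER_TARGET_MS = 5000    # quarterly test target from the rules above


def detection_time_ms(heartbeat_ms: int) -> int:
    """Time before the standby declares the active Edge dead."""
    return heartbeat_ms * MISSED_HEARTBEATS


def within_failover_target(heartbeat_ms: int, target_ms: int = FAILOVER_TARGET_MS) -> bool:
    """True if detection alone fits inside the failover target."""
    return detection_time_ms(heartbeat_ms) <= target_ms


if __name__ == "__main__":
    for interval in (300, 500):
        print(interval, detection_time_ms(interval), within_failover_target(interval))
```

Detection is only one part of total failover time; the standby still has to take over forwarding, so the 5-second target should be confirmed by the physical test rather than by arithmetic alone.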
6. Gateway Assignment
- Assign primary and secondary gateways from geographically close PoPs. APAC sites should use Singapore or Tokyo, not US West
- Pin critical sites to specific gateways (VCO: Configure > Edge > Gateway Pools) to prevent random reassignment after gateway maintenance
- Enable Cloud VPN (Edge-to-Edge direct tunnels) for branch-to-branch traffic; this eliminates the data center hair-pin and reduces latency
- For backhauled internet traffic, monitor gateway utilization, because a saturated gateway PoP degrades all sites assigned to it
Troubleshooting Guide
Issue 1: Edge Offline / Not Activating
Symptom: Edge shows "Offline" in VCO immediately after physical installation.
# Access Edge local console: serial cable or browser at https://192.168.2.1
# Check WAN interface status and IP assignment
VCE# show interface
VCE# show ip
# Verify DNS can resolve VCO and VCG hostnames
VCE# nslookup vco.velocloud.net
# Check VCMP tunnel establishment
VCE# show tunnel
VCE# show status
# Verify outbound ports are not blocked by upstream firewall
# Required: UDP/2426 (VCMP data), TCP/443 (management to VCO)
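When console access to the Edge is not available, the TCP/443 requirement can be checked from any host on the same upstream network. The following Python sketch uses only the standard library; `tcp_port_open` is a hypothetical helper name. It cannot verify UDP/2426, because UDP has no handshake to confirm that a port is open.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call, using the hostname from the nslookup step above:
#   tcp_port_open("vco.velocloud.net", 443)
```

A `False` result does not say which hop is blocking; pair it with the `nslookup` step to separate DNS problems from firewall problems.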
Issue 2: Tunnel Flapping
Symptom: Tunnels to gateways or peer Edges repeatedly go up and down. QoE score fluctuates.
# Check tunnel state and uptime
VCE# show tunnel detail
# Check per-link quality measurements
VCE# show link quality
# Look for packet loss spikes in the flow stats
VCE# show flow stats
Diagnosis decision tree:
Tunnel Flapping
├── ISP packet loss > 1% or jitter > 50ms
│   ├── Check link quality graphs in VCO over 24h window
│   └── Fix: Contact ISP, or switch traffic to secondary link
├── MTU issue (VCMP overhead)
│   ├── Tunnels flap intermittently, larger packets affected
│   └── Fix: Set WAN MTU to 1400 (DSL/PPPoE) or 1420 (LTE)
│       Enable PMTUD on the WAN overlay
├── NAT session timeout
│   ├── Tunnels drop after period of idle (common on LTE and ISP CGNATs)
│   └── Fix: Enable keep-alive on WAN link config
└── Duplex/speed mismatch
    ├── CRC errors on WAN interface
    └── Fix: Force interface speed/duplex to match ISP handoff
Fix MTU on a flapping tunnel:
# VCO: Configure > Edge > Device > Interface > WAN Overlay
# Set these values for each WAN link type:
DSL/PPPoE: Link MTU = 1400
LTE: Link MTU = 1420
Ethernet: Link MTU = 1500 (or match ISP if lower)
Enable PMTUD: Yes # Allows automatic MTU discovery
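The tunnel-flapping decision tree can also be written as a small function, which is useful when triaging many sites from exported metrics. This is a sketch under stated assumptions: the input fields and the `diagnose_flapping` name are hypothetical, and the thresholds are the ones given in the tree.

```python
from typing import List


def diagnose_flapping(loss_pct: float, jitter_ms: float,
                      large_packets_affected: bool,
                      drops_after_idle: bool,
                      crc_errors: int) -> List[str]:
    """Return likely causes in the same order as the decision tree."""
    causes = []
    if loss_pct > 1.0 or jitter_ms > 50:
        causes.append("ISP quality: contact ISP or move traffic to secondary link")
    if large_packets_affected:
        causes.append("MTU: set 1400 (DSL/PPPoE) or 1420 (LTE) and enable PMTUD")
    if drops_after_idle:
        causes.append("NAT timeout: enable keep-alive on the WAN link")
    if crc_errors > 0:
        causes.append("Duplex/speed mismatch: force settings to match ISP handoff")
    return causes


if __name__ == "__main__":
    for cause in diagnose_flapping(0.2, 12.0, True, False, 0):
        print(cause)
```

More than one branch can match at once, so the function returns every matching cause rather than stopping at the first.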
Issue 3: Poor Voice and Video Quality
Symptom: Users report choppy audio, video freezing, or dropped calls on Teams or Zoom.
# Step 1: Check QoE for the affected Edge in VCO
# Monitor > Edge > QoE tab: filter by Real-time application class
# Look for: latency > 100ms, jitter > 30ms, packet loss > 0.5%
# Step 2: Verify Business Policy classification
# Configure > Business Policy: confirm RTP/SIP maps to Real-time class
# Step 3: Check link utilization
VCE# show link quality # Look for utilization > 85% on any link
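The thresholds in the steps above are easy to apply programmatically to exported measurements. A minimal Python sketch follows; the metric key names and the `qoe_violations` function are hypothetical, and only the threshold values come from this guide.

```python
from typing import Dict

# Thresholds from the steps above; key names are illustrative assumptions.
THRESHOLDS = {
    "latency_ms": 100,
    "jitter_ms": 30,
    "loss_pct": 0.5,
    "utilization_pct": 85,
}


def qoe_violations(sample: Dict[str, float]) -> Dict[str, float]:
    """Return only the metrics in the sample that exceed their threshold."""
    return {
        metric: value
        for metric, value in sample.items()
        if metric in THRESHOLDS and value > THRESHOLDS[metric]
    }


if __name__ == "__main__":
    sample = {"latency_ms": 120, "jitter_ms": 12, "loss_pct": 0.7, "utilization_pct": 60}
    print(qoe_violations(sample))  # {'latency_ms': 120, 'loss_pct': 0.7}
```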
Root cause tree:
Poor Voice / Video Quality
├── Business Policy wrong: traffic not classified as Real-time
│   └── Fix: Set RTP/SIP → Real-time service class, Direct network service
├── Link saturated (utilization > 85%)
│   └── Fix: Rate-limit bulk/best-effort, add bandwidth, or enable bonding
├── Jitter > 30ms on primary path
│   └── Fix: Enable FEC (Forward Error Correction) for real-time traffic
├── ISP packet loss > 0.5%
│   └── Fix: Force traffic to secondary link, open ISP fault ticket
└── Traffic not matching any policy rule
    └── Fix: Add custom application definition for non-standard ports
Enabling Forward Error Correction (FEC):
FEC reconstructs lost packets at the receiver by sending redundant data. On links with 0.5 to 5% loss, it eliminates audible voice degradation at the cost of ~15% extra bandwidth.
# VCO: Configure > Business Policy > [Real-time rule] > Link Steering
FEC Mode: Adaptive # Recommended: activates only when loss is detected
# Alternative: "Always On" gives constant overhead but guaranteed reconstruction
# Apply FEC only to Real-time class; applying it to bulk wastes bandwidth
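To budget for the FEC overhead described above, the extra bandwidth can be estimated before enabling it. This Python sketch assumes the ~15% overhead and the 0.5 to 5% loss band quoted in this guide; the function names and the per-call bitrate in the usage note are illustrative assumptions.

```python
FEC_OVERHEAD = 0.15            # ~15% extra bandwidth, per the paragraph above
FEC_LOSS_RANGE = (0.5, 5.0)    # loss band (%) where this guide says FEC helps


def fec_is_useful(loss_pct: float) -> bool:
    """True when measured loss falls inside the band where FEC is expected to help."""
    low, high = FEC_LOSS_RANGE
    return low <= loss_pct <= high


def bandwidth_with_fec_kbps(base_kbps: float, overhead: float = FEC_OVERHEAD) -> float:
    """Real-time bandwidth required once FEC redundancy is added."""
    return base_kbps * (1.0 + overhead)
```

For example, 20 concurrent calls at an assumed 100 kbps each (2,000 kbps) would need roughly 2,300 kbps of real-time capacity while FEC is active.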
Issue 4: Traffic Bypassing the SD-WAN Overlay
Symptom: Traffic takes the underlay path directly, bypassing VeloCloud policy and encryption.
# Check Edge routing table: look for static routes overriding overlay
VCE# show ip route
# Check if affected flow matches a Business Policy rule
VCE# show flow table
# Confirm default route points through overlay, not legacy router
VCE# show ip route 0.0.0.0
Common causes: Legacy static routes on LAN-side switches pointing to a pre-SD-WAN router, OSPF/BGP redistribution injecting underlay prefixes into the overlay, incorrect VLAN assignments placing traffic in a non-SD-WAN segment, or split DNS causing internal IP resolution that bypasses the tunnel.
Fix: Ensure the Edge has a default route pointing through the overlay as the lowest-metric path. Remove any conflicting static routes on downstream switches that were configured before the SD-WAN deployment.
Issue 5: Slow SaaS Application Performance
Symptom: O365, Salesforce, or other SaaS apps are slow despite adequate bandwidth on all links.
# Step 1: Check if traffic is being backhauled to data center
# VCO: Monitor > Edge > Applications: look for O365 flows routing via Gateway
# Step 2: Verify DNS is resolving to a geographically close CDN node
VCE# nslookup outlook.office365.com
# Resolved IP should geolocate to your region, not a distant PoP
# Step 3: Enable O365 Direct Breakout
# VCO: Configure > Business Policy > Office 365 category
# Set: Network Service = Direct Internet
# Enable: DNS Proxy on the Edge for local resolution
When O365 traffic is backhauled through the data center gateway, it travels hundreds of extra miles before reaching Microsoft's servers, adding 30 to 100 ms of unnecessary latency. Direct breakout removes this and lets the Edge connect directly to the nearest Microsoft PoP.
Issue 6: HA Failover Not Working
Symptom: The Standby Edge doesn't take over when the Active Edge loses connectivity or power.
# Check HA state on both Edges
VCE# show ha status
VCE# show ha heartbeat
# Check physical HA link interface status
VCE# show interface detail
# Expected output on healthy HA pair:
Active: HA state: ACTIVE | Link: UP | Peer: STANDBY
Standby: HA state: STANDBY | Link: UP | Peer: ACTIVE
Issue 7: Zero-Touch Provisioning Failure
Symptom: New Edge powers on and connects to the internet but never appears as Active in VCO.
# Verify pre-requisites before connecting the Edge
# 1. Serial number registered in VCO: Inventory > Edges
# 2. Activation URL reachable: TCP/443 outbound must be open
# 3. VCMP port open: UDP/2426 outbound on any upstream firewall
# If ZTP fails, use manual activation via local console
# Browser: https://192.168.2.1 > System > Activation
# Enter activation key from VCO > Edges > [Edge] > Overview
VCE# activate
# Common ZTP failure: captive portal on upstream network
# ZTP cannot authenticate through hotel / guest WiFi portals
# Fix: Use a mobile hotspot or pre-activate manually before deployment
Topology Example: Multi-Site Deployment
This diagram shows a realistic three-site deployment: HQ with HA Edge pair, a medium branch, and a small remote site. All sites build VCMP tunnels to the nearest Gateway PoP, and direct Edge-to-Edge tunnels handle branch-to-branch traffic without routing through HQ.
Performance Optimization Checklist
Use this before go-live and during quarterly reviews:
Conclusion
VeloCloud SD-WAN dramatically simplifies WAN operations, but the difference between a mediocre deployment and a great one is in the details: accurate link sizing, precise Business Policy classification, and proactive monitoring. A well-tuned deployment with proper QoS and direct SaaS breakout can reduce VoIP complaints by over 80% and cut SaaS latency in half compared to backhauled architectures.
Invest time in the VCO dashboard, build alerting integrations early, and test HA failover before you need it in a real incident. The faster you detect a degraded link or misconfigured policy, the faster you resolve it, ideally before users notice.