The Alert Fatigue Problem
Every NOC eventually drifts into one of two failure modes:
- Too few alerts — incidents go undetected until a client calls you
- Too many alerts — engineers stop reading them because most are noise
Both kill your SLA. SolarWinds NPM can generate thousands of alerts per day out of the box if you're not careful. The goal of a good alerting strategy is surgical: every alert that fires must be actionable, meaningful, and routed to the right person.
Here is what held up after tuning alerts across a 42-country network.
Alert Design Principles
Before configuring a single alert, establish these rules with your team:
1. Alert on symptoms, not causes. "Interface down" is a symptom, and "BGP peer lost" is often just another symptom of that same interface going down. Don't alert on both, because they are the same incident. Alert on the highest-level symptom that an engineer can act on.
2. Every alert needs a runbook. If there's no documented procedure for responding to an alert, it should not be generating PagerDuty pages. Create the runbook before enabling the alert.
3. Alerts must be stateful. SolarWinds supports trigger conditions (alert fires) and reset conditions (alert clears). Both must be configured. An alert that fires but never clears creates ghost notifications — engineers get paged for incidents that resolved themselves 20 minutes ago.
4. Maintenance window suppression is mandatory. Any planned change that will alter device state (reloads, interface flaps) must run inside a suppression window. Use SolarWinds Scheduled Maintenance to suppress alerts automatically for the duration of the window, rather than disabling alerts by hand and forgetting to re-enable them.
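Principle 3 is easier to reason about as a small state machine. The following is a minimal Python sketch, not SolarWinds code: the `StatefulAlert` class and its field names are hypothetical, and it only illustrates why a trigger condition without a matching reset condition leaves an alert stuck in the active state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StatefulAlert:
    """Tracks whether an alert is active so each incident yields one fire and one clear."""
    name: str
    trigger: Callable[[dict], bool]  # returns True when the problem is present
    reset: Callable[[dict], bool]    # returns True when the problem is gone
    active: bool = False

    def evaluate(self, sample: dict) -> Optional[str]:
        """Return "fired", "cleared", or None when the state did not change."""
        if not self.active and self.trigger(sample):
            self.active = True
            return "fired"
        if self.active and self.reset(sample):
            self.active = False
            return "cleared"
        return None


node_down = StatefulAlert(
    name="node-down",
    trigger=lambda s: s["status"] == "Down",
    reset=lambda s: s["status"] == "Up",
)

polls = [{"status": "Up"}, {"status": "Down"}, {"status": "Down"}, {"status": "Up"}]
events = [node_down.evaluate(p) for p in polls]
print(events)  # [None, 'fired', None, 'cleared']
```

The second "Down" poll produces no new event, and the final "Up" poll produces the clear. If `reset` never returned True, the alert would stay active and nothing downstream would learn that the incident was over.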
Alert Types and Thresholds
Interface State Alerts
Node Availability Alerts
# SolarWinds Alert Condition — Node Down (best practice example)
#
# Trigger condition:
#   Node.Status = Down
#   AND Node.CustomProperties.Environment = "Production"
#   AND Node.CustomProperties.Tier != "Non-Critical"
#   Sustained for: 5 minutes
#   Evaluation interval: every 2 minutes
#
# Reset condition:
#   Node.Status = Up
#
# Key: the 5-minute sustained condition filters out ICMP timeouts
# from temporary congestion bursts. Without it, every congestion event
# generates a node-down false positive.
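To see what the sustained condition buys you, here is a rough Python sketch of the same logic applied to a list of poll results. It is an illustration only: `sustained_down` and the sample data are hypothetical, and SolarWinds evaluates this internally rather than through any function like this.

```python
from datetime import datetime, timedelta

SUSTAIN = timedelta(minutes=5)  # matches "Sustained for: 5 minutes"


def sustained_down(polls, now, sustain=SUSTAIN):
    """polls: list of (timestamp, status) tuples, oldest first.

    Returns True only if the node has been continuously Down
    for at least `sustain` as of `now`.
    """
    down_since = None
    for ts, status in polls:
        if status == "Down":
            if down_since is None:
                down_since = ts  # start of the current Down streak
        else:
            down_since = None    # any non-Down poll breaks the streak
    return down_since is not None and now - down_since >= sustain


t0 = datetime(2026, 4, 15, 2, 0)
m = lambda n: t0 + timedelta(minutes=n)

# One missed poll during a congestion burst, then recovery.
blip = [(m(0), "Up"), (m(2), "Down"), (m(4), "Up")]

# A real outage, polled every 2 minutes.
outage = [(m(0), "Up"), (m(2), "Down"), (m(4), "Down"), (m(6), "Down"), (m(8), "Down")]

print(sustained_down(blip, now=m(6)))
print(sustained_down(outage, now=m(8)))
```

The blip never triggers because the streak is broken before five minutes elapse. The outage triggers once the node has been down since minute 2 and the clock reaches minute 8.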
CPU and Memory Alerts
Don't use single-poll thresholds for CPU/memory. CPU spikes happen during BGP reconvergence, interface state changes, and routing table updates. A single 95% CPU reading is normal during convergence. A 90% CPU reading sustained for 10 minutes is not.
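The difference between a single-poll threshold and a sustained one can be sketched in a few lines of Python. This is an assumption-laden illustration, not a SolarWinds feature: the function name, the 90% threshold, and the 10-minute window are taken from the example figures above, and the sample format is made up.

```python
def cpu_sustained_high(samples, threshold=90.0, window_min=10):
    """samples: list of (minute, cpu_percent) tuples, oldest first.

    Returns True only if the readings span the full trailing window
    and every reading inside it is at or above the threshold.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    window = [(minute, cpu) for minute, cpu in samples if minute >= latest - window_min]
    spans_window = latest - window[0][0] >= window_min
    return spans_window and all(cpu >= threshold for _, cpu in window)


# A convergence spike: one very high reading between normal ones.
spike = [(0, 40.0), (5, 97.0), (10, 45.0)]

# Ten minutes at or above 90%.
sustained = [(0, 92.0), (5, 95.0), (10, 91.0)]

# High readings, but only two minutes of history so far.
too_short = [(8, 95.0), (10, 96.0)]

print(cpu_sustained_high(spike))
print(cpu_sustained_high(sustained))
print(cpu_sustained_high(too_short))
```

Requiring the readings to cover the whole window is a deliberate choice here: two high polls right after monitoring starts should not count as ten minutes of sustained load.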
Alert Suppression Strategies
Parent-Child Dependencies
Configure parent-child node relationships in SolarWinds. When a parent node (e.g., a core router) goes down, SolarWinds marks the child nodes reachable only through that router (access switches, servers) as Unreachable rather than Down, so their node-down alerts do not fire. This is the single most effective way to reduce alert floods during a P1.
Setup: Manage → Dependencies → Add Dependency
- Parent: core-router-01
- Children: all nodes in that site
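The effect of a dependency tree on paging can be sketched as follows. This is a simplified Python model, not how SolarWinds implements dependencies: `alerts_to_page` and the node names other than `core-router-01` are hypothetical, and it assumes each child is reachable only through its single parent.

```python
def alerts_to_page(down_nodes, parent_of):
    """Return the down nodes that have no down ancestor.

    parent_of maps child -> parent. Nodes without an entry are roots.
    A down node with a down ancestor is treated as a consequence of
    that ancestor's failure and is suppressed.
    """
    down = set(down_nodes)
    pageable = []
    for node in sorted(down):
        ancestor = parent_of.get(node)
        seen = {node}  # guards against accidental cycles in the map
        suppressed = False
        while ancestor is not None and ancestor not in seen:
            if ancestor in down:
                suppressed = True
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not suppressed:
            pageable.append(node)
    return pageable


parent_of = {
    "access-sw-01": "core-router-01",
    "access-sw-02": "core-router-01",
    "srv-01": "access-sw-01",
}

site_outage = ["core-router-01", "access-sw-01", "access-sw-02", "srv-01"]
print(alerts_to_page(site_outage, parent_of))
```

With the whole site unreachable, only the core router is left to page on. If just `access-sw-01` and `srv-01` are down, the switch is the one that pages.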
Maintenance Windows
# SolarWinds API — programmatically create a maintenance window
# Useful for scripting suppression into your change management workflow
# Adjust the host and path to match your SolarWinds API endpoint.
# Authentication is not shown; add -Credential or an auth header as required.
$headers = @{ "Content-Type" = "application/json" }
$body = @{
    EntityType = "Orion.Nodes"
    EntityID   = "12345"                  # SolarWinds Node ID
    StartTime  = "2026-04-15T02:00:00"
    EndTime    = "2026-04-15T04:00:00"
    Message    = "CHG0012345 - Router reload for IOS upgrade"
} | ConvertTo-Json
Invoke-RestMethod -Uri "https://solarwinds/api/v1/maintenance" -Method POST -Headers $headers -Body $body
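If your change management tooling also needs to decide locally whether an alert falls inside a window (for example, to annotate a ticket), the check itself is simple. The Python sketch below is an assumption, not part of SolarWinds: the `MaintenanceWindow` class and `is_suppressed` helper are hypothetical, and the values mirror the example window above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    node_id: str
    start: datetime
    end: datetime
    change_ref: str


def is_suppressed(node_id, when, windows):
    """True if `when` falls inside any window for this node.

    The start is inclusive and the end is exclusive, so an alert at
    exactly the end time is no longer suppressed.
    """
    return any(w.node_id == node_id and w.start <= when < w.end for w in windows)


windows = [
    MaintenanceWindow(
        node_id="12345",
        start=datetime(2026, 4, 15, 2, 0),
        end=datetime(2026, 4, 15, 4, 0),
        change_ref="CHG0012345",
    )
]

print(is_suppressed("12345", datetime(2026, 4, 15, 3, 0), windows))
print(is_suppressed("12345", datetime(2026, 4, 15, 4, 0), windows))
```

Treating the end time as exclusive means the window closes on schedule: a node that is still down at 04:00 starts alerting again, which is usually what you want after a change that overran.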
PagerDuty Integration
SolarWinds connects to PagerDuty via webhooks. The key is routing — not every alert should page the same team.
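One way to picture the routing step is a small function that maps alert properties to a PagerDuty service and builds the event body. The sketch below is illustrative Python: the team names, routing rules, integration keys, and the "Core" tier value are all hypothetical placeholders. Only the event fields (`routing_key`, `event_action`, and `payload` with `summary`, `source`, `severity`) follow the PagerDuty Events API v2 format, and nothing is actually sent.

```python
import json

# Hypothetical integration keys, one per PagerDuty service.
ROUTING_KEYS = {
    "network-core": "<core-team-integration-key>",
    "network-access": "<access-team-integration-key>",
}


def route(alert):
    """Pick a team from alert properties. These rules are examples only."""
    if alert["tier"] == "Core" and alert["alert_type"] in ("node-down", "bgp-peer-lost"):
        return "network-core"
    return "network-access"


def build_event(alert):
    """Build an Events API v2 style trigger event for the routed team."""
    return {
        "routing_key": ROUTING_KEYS[route(alert)],
        "event_action": "trigger",
        "payload": {
            "summary": f'{alert["alert_type"]} on {alert["hostname"]}',
            "source": alert["hostname"],
            "severity": alert["severity"],  # critical, error, warning, or info
        },
    }


core_alert = {
    "hostname": "core-router-01",
    "alert_type": "node-down",
    "tier": "Core",
    "severity": "critical",
}
access_alert = {
    "hostname": "access-sw-01",
    "alert_type": "interface-down",
    "tier": "Access",
    "severity": "warning",
}

print(json.dumps(build_event(core_alert), indent=2))
```

Keeping the routing rules in one function makes them reviewable alongside the runbooks, instead of being scattered across individual alert actions.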
Alert Deduplication
Configure PagerDuty's Alert Grouping to prevent multiple SolarWinds alerts from the same incident creating dozens of separate PagerDuty incidents. Group by: node hostname + alert type + time window (5 minutes).
Without deduplication, a single core router failure generates a node-down alert, interface-down alerts for every connected interface, BGP neighbor alerts for every peer, and child node alerts for downstream devices. That's 20+ PagerDuty pages for a single event.
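The grouping rule above (node hostname + alert type + 5-minute window) can be approximated with a deduplication key. This Python sketch is an assumption about how you might compute such a key on the sending side, for example to populate the `dedup_key` field of a PagerDuty event. It does not reproduce PagerDuty's own grouping logic, and the `dedup_key` function here is hypothetical.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 300  # 5-minute grouping window


def dedup_key(hostname, alert_type, fired_at):
    """Build a key shared by all alerts for the same host and alert type
    that fire within the same fixed 5-minute bucket."""
    bucket = int(fired_at.timestamp()) // WINDOW_SECONDS
    return f"{hostname}:{alert_type}:{bucket}"


first = dedup_key("core-router-01", "interface-down",
                  datetime(2026, 4, 15, 2, 0, 10, tzinfo=timezone.utc))
second = dedup_key("core-router-01", "interface-down",
                   datetime(2026, 4, 15, 2, 3, 0, tzinfo=timezone.utc))
later = dedup_key("core-router-01", "interface-down",
                  datetime(2026, 4, 15, 2, 6, 0, tzinfo=timezone.utc))

print(first == second)  # same bucket, so these collapse into one incident
print(first == later)   # next bucket, so this would open a new incident
```

Fixed buckets are the simplest option, with one known trade-off: two alerts a few seconds apart can land in different buckets if they straddle a boundary. A sliding window avoids that at the cost of keeping state.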