Many organizations have made it a strategic priority to replace MPLS with SD-WAN and direct internet access, particularly for branch office connectivity. The benefits are compelling: direct cloud connectivity, cost savings, and improved agility.
However, when you make this move, you fundamentally alter your network's foundation, trading the predictable, engineered transport of MPLS for the best-effort nature of the public internet. This has created a critical visibility gap—a gray zone between your branch and your applications that most network teams are ill-equipped to peer into.
This isn't just a minor operational hurdle; it's a paradigm shift. The days of a single, accountable provider with a contractually backed service level agreement (SLA) are over. Your control over the end-to-end packet path has vanished, replaced by a reality that’s governed by factors you don't manage.
It's crucial to internalize that the internet is not a cohesive network; it's a federation of tens of thousands of independent autonomous systems (ASes) that exchange routes via the Border Gateway Protocol (BGP). The path your branch office traffic takes to your data center or to a SaaS application is determined by each provider's BGP routing policy, which in turn reflects a complex and dynamic web of peering and transit agreements between these service providers. This path can change at any moment due to routing policy adjustments, network congestion, or even politically motivated traffic engineering, none of which are under your control.
Unlike MPLS, where traffic paths are engineered for performance, internet paths are often engineered for the lowest cost to the provider. This frequently leads to “hot-potato” routing, in which an ISP hands your traffic off to the next network at its nearest exit rather than carrying it closer to the destination. The result can be packets routed through congested peering points thousands of miles out of the way, introducing significant latency and packet loss. Even if your local broadband connection is performing perfectly, a problem in the middle mile (for example, at an interconnection point between two major backbones) can cripple application performance, and you would have no way of knowing where it's happening.
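The hot-potato hand-off described above can be illustrated with a small sketch. The exit names and latency figures are invented for illustration, and real ISPs pick exits through BGP policy and internal routing cost rather than a function like this; the point is only that minimizing the ISP's own carriage and minimizing your end-to-end latency can select different exits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exit:
    """A hypothetical peering point where an ISP can hand traffic to the next network."""
    name: str
    ms_inside_isp: float     # latency from the branch to this exit, inside the ISP
    ms_after_handoff: float  # latency from this exit to the destination, on other networks


def hot_potato(exits):
    # The ISP minimizes only its own carriage: hand off at the nearest exit.
    return min(exits, key=lambda e: e.ms_inside_isp)


def best_end_to_end(exits):
    # What the application cares about: total latency along the whole path.
    return min(exits, key=lambda e: e.ms_inside_isp + e.ms_after_handoff)


# Illustrative numbers only.
exits = [
    Exit("nearby-exchange", ms_inside_isp=4.0, ms_after_handoff=95.0),
    Exit("regional-exchange", ms_inside_isp=18.0, ms_after_handoff=22.0),
]

print(hot_potato(exits).name)       # nearby-exchange (99 ms end to end)
print(best_end_to_end(exits).name)  # regional-exchange (40 ms end to end)
```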
SD-WAN is a great technology for navigating this new reality, but it's not a silver bullet for visibility. SD-WAN operates at the overlay level, creating secure tunnels (like IPsec) that run on top of physical internet connections, which constitute the underlay. Your SD-WAN appliance intelligently monitors the performance of these tunnels, measuring metrics like latency, jitter, and packet loss. If it detects that the path over your primary broadband link is degrading, it can dynamically reroute application traffic to a secondary link, such as another internet connection or LTE.
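As a rough sketch of the rerouting decision described above, the following models each tunnel by the three metrics the appliance measures and picks the first tunnel, in preference order, that stays within thresholds. The `TunnelStats` type, the threshold values, and the fallback rule are all assumptions made for illustration; actual SD-WAN products implement their own, vendor-specific policies.

```python
from dataclasses import dataclass


@dataclass
class TunnelStats:
    """Measured performance of one overlay tunnel (hypothetical structure)."""
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float


# Assumed thresholds for a latency-sensitive application class.
MAX_LATENCY_MS = 150.0
MAX_JITTER_MS = 30.0
MAX_LOSS_PCT = 1.0


def meets_thresholds(t):
    return (t.latency_ms <= MAX_LATENCY_MS
            and t.jitter_ms <= MAX_JITTER_MS
            and t.loss_pct <= MAX_LOSS_PCT)


def select_tunnel(tunnels):
    """Return the first tunnel, in preference order, that meets the thresholds.

    If none qualifies, fall back to the tunnel with the least loss,
    breaking ties on latency.
    """
    for t in tunnels:
        if meets_thresholds(t):
            return t
    return min(tunnels, key=lambda t: (t.loss_pct, t.latency_ms))


tunnels = [
    TunnelStats("broadband", latency_ms=48.0, jitter_ms=12.0, loss_pct=3.5),  # degrading
    TunnelStats("lte", latency_ms=70.0, jitter_ms=18.0, loss_pct=0.2),
]
print(select_tunnel(tunnels).name)  # lte
```

Note that every input here is a tunnel-level measurement; nothing in the decision says where along the underlay the broadband loss occurred.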
The limitation, however, is that the SD-WAN overlay only sees the cumulative result of the underlay path; it treats the entire internet journey as a single, opaque link. It can tell you that a tunnel's performance is poor, but it cannot tell you why. The packet loss could be occurring on the local loop, within the ISP's metro-area network, at a BGP peering point, or on the SaaS provider's network doorstep. Without visibility into the underlay, you can't perform root cause analysis. You're left making blind decisions, like upgrading bandwidth at a branch when the actual bottleneck is an underperforming transit provider several networks away.
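A short calculation shows why the overlay's view is ambiguous. Assuming, for simplicity, that each segment drops packets independently, the end-to-end delivery rate is the product of the per-segment delivery rates. The segment names and loss figures below are illustrative only.

```python
import math


def end_to_end_loss(per_segment_loss):
    """Loss seen by the tunnel, assuming independent loss on each segment."""
    delivered = math.prod(1.0 - p for p in per_segment_loss.values())
    return 1.0 - delivered


# Two very different faults...
local_loop_fault = {"local_loop": 0.05, "isp_metro": 0.0, "peering_point": 0.0, "saas_edge": 0.0}
peering_fault = {"local_loop": 0.0, "isp_metro": 0.0, "peering_point": 0.05, "saas_edge": 0.0}

# ...produce the same tunnel-level measurement of roughly 5% loss.
print(f"{end_to_end_loss(local_loop_fault):.1%}")  # 5.0%
print(f"{end_to_end_loss(peering_fault):.1%}")     # 5.0%
```

Only the first fault would be helped by changing the branch circuit, yet the tunnel reports the same number in both cases.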
The tools you've relied on for decades are insufficient for this new landscape. SNMP-based monitoring, for example, is great for telling you the interface status of the router you own, but it's completely blind to the provider networks beyond it. Your router can be perfectly healthy, while the user experience is unbearable because of an issue deep within the internet.
Simple diagnostics like ping and traceroute offer a glimpse but are ultimately inadequate. Routers often deprioritize, rate-limit, or block ICMP traffic, so ICMP-based measurements are unreliable as performance metrics. Traceroute shows a list of IP hops, but it keeps no historical performance data and can mislead because of asymmetric paths and unresponsive routers. It gives you an instantaneous, often incomplete picture and fails to measure critical metrics like per-hop jitter or loss.
To effectively manage branch office connectivity today, you must shift your focus from device health to a continuous, hop-by-hop understanding of the entire data path. This requires a new class of visibility solution that goes beyond the overlay. The goal is to actively measure performance from the branch edge, across every BGP-defined AS hop in the internet underlay, all the way to the application's hosting environment.
This is achieved with active monitoring that simulates application traffic (for example, using TCP or UDP packets) to continuously probe the network path. By analyzing the response from each hop, you can build a detailed, historical map of per-hop latency, loss, and jitter. This transforms troubleshooting. Instead of following up on a vague complaint about a slow application, you can pinpoint that a specific peering exchange between two ISPs began exhibiting 10% packet loss at a specific time, directly correlating it to user-reported issues.
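The analysis step can be sketched as follows, assuming the probe results have already been collected as one round-trip time per probe for each hop (or `None` when no reply came back). The hop names, sample values, 5% threshold, and jitter definition (mean absolute difference between consecutive replies) are assumptions for illustration, not a description of any particular product. The sketch also applies a common heuristic: loss at a hop is treated as real only if it persists through every later hop, since a router that merely rate-limits its own replies shows loss that disappears downstream.

```python
from statistics import mean


def hop_stats(rtts):
    """Summarize one hop. `rtts` is a non-empty list: RTT in ms per probe, or None."""
    replies = [r for r in rtts if r is not None]
    loss_pct = 100.0 * (len(rtts) - len(replies)) / len(rtts)
    latency = mean(replies) if replies else None
    # Jitter as the mean absolute difference between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter = mean(deltas) if deltas else 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency, "jitter_ms": jitter}


def first_lossy_hop(path, threshold_pct=5.0):
    """Return the first hop whose loss persists through all later hops, else None."""
    stats = [(hop, hop_stats(rtts)) for hop, rtts in path]
    for i, (hop, _) in enumerate(stats):
        if all(s["loss_pct"] >= threshold_pct for _, s in stats[i:]):
            return hop
    return None


# Ten probes per hop; hop names are hypothetical.
path = [
    ("branch-gw",           [1.0] * 10),
    ("isp-a-edge",          [8.0, None, 8.0, None] + [8.0] * 6),  # rate-limited replies
    ("isp-a-core",          [12.0] * 10),
    ("isp-a/isp-b peering", [30.0, 34.0, None] + [31.0] * 7),
    ("saas-edge",           [36.0, None, 40.0] + [37.0] * 7),
]

print(first_lossy_hop(path))  # isp-a/isp-b peering
```

Running the same summary on every measurement interval and storing the results is what produces the historical record: the first interval in which a hop crosses the threshold gives the onset time to correlate with user reports.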
This level of empirical data ends the finger-pointing between your network team, your ISPs, and your application vendors. It provides objective evidence to escalate issues effectively and validate that provider fixes have actually resolved the underlay problem. Moving beyond MPLS was a strategic necessity, but succeeding in this new environment requires you to stop guessing about internet performance and start measuring it from end to end.
To explore these concepts in greater detail and see how you can move from traditional to active monitoring, view our on-demand technical webcast, Take the Hassle Out of SD-WAN Management.