Considerations for Active Monitoring from an SD-WAN Site

Written by Helen Burke | Mar 27, 2024 3:09:47 PM

Key Takeaways

Implement active monitoring to promptly detect and address issues, ensuring network reliability.
Deploy proactive alerting systems to quickly identify and mitigate problems that can affect end-user traffic.
Route monitoring traffic through SD-WAN tunnels, providing visibility into end-to-end performance.

As companies adopt SD-WAN technologies, they increasingly rely on network services outside their control. The new reality is that network operations teams need end-to-end visibility into network performance, whether or not they own the infrastructure.

In a 2023 EMA survey, 63% of companies reported using the Internet as their primary WAN connectivity. Traditional monitoring relies on passive traffic analysis, which does not provide enough insight into performance problems when applications run outside the data center. This is why active monitoring has become critical for helping operations teams understand how end users experience the network.

What is active monitoring?

Active monitoring involves probing the end-to-end network path and applications using test traffic or synthetic transactions. Unlike passive monitoring, which captures network activity as it naturally occurs, active monitoring generates its own traffic to assess parameters such as latency, packet loss, bandwidth utilization, and application responsiveness.

The primary benefits of active monitoring include:

24x7 visibility, so operations teams are aware of performance problems affecting the network before end users report them
Faster triage of issues and identification of the “fault domain,” reducing mean time to innocence (MTTI) and mean time to repair (MTTR)
Continuous validation that failover links remain available, even when those links are passive

Using active monitoring, network operations teams can avoid lengthy troubleshooting efforts, sidestep multi-vendor blame-game scenarios, and increase visibility across portions of the network they do not own.

Active monitoring of the network

Continuous network path analysis involves periodically sending small bursts of packets to user-defined network targets and collecting timing data about those packets after they traverse the network. This lightweight technique helps determine whether there are network problems and can automatically trigger deeper diagnostic tests to pinpoint the cause. Key performance metrics, such as round-trip time (RTT), latency, jitter, and packet loss, are derived from these measurements. The data can also be used to infer other metrics, such as the total capacity and the utilized capacity of a network link.
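As a rough illustration of how a burst of probe results might be turned into the metrics above, here is a minimal Python sketch. It assumes the probes have already been sent and that each result is either a round-trip time in milliseconds or None for a probe that received no reply. The function name summarize_burst and the jitter definition (mean absolute difference between consecutive received RTTs) are illustrative assumptions, not the method of any particular monitoring product.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_burst(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one burst of probes; None marks a probe with no reply."""
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0

    # Assumption: jitter is approximated as the mean absolute difference
    # between consecutive received RTTs (lost probes are skipped over).
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]

    return {
        "sent": sent,
        "received": len(received),
        "loss_pct": loss_pct,
        "avg_rtt_ms": mean(received) if received else None,
        "jitter_ms": mean(deltas) if deltas else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical burst: four probes, the third one was lost.
    print(summarize_burst([10.0, 12.0, None, 11.0]))
```

A summary like this could be compared against thresholds on each interval to decide whether a deeper diagnostic test is worth triggering.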

Active monitoring of the applications

Synthetic monitoring is a modern way to track trends in the performance of SaaS and web applications. This approach uses scripting to emulate the paths and actions that end users take as they use an application. Tests run at regular intervals from monitoring points strategically located in the user subnets. Each time a test runs, its total time is broken down into DNS resolution, TCP connection, SSL connection, request wait, and download timings. This breakdown helps identify whether the network, security, server, or application is responsible for performance degradation. Measurements are collected and stored for analysis, presentation, and alerting whenever the user experience falls outside acceptable limits.
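The timing breakdown described above can be sketched as a small data structure. This is an illustrative sketch only: the class name TransactionTiming, its field names, and the LIKELY_AREA mapping from the slowest phase to a first area to investigate are assumptions made for this example, and real tools may attribute phases differently.

```python
from dataclasses import dataclass, asdict


@dataclass
class TransactionTiming:
    """Per-phase timings (in milliseconds) for one synthetic test run."""
    dns_ms: float
    tcp_connect_ms: float
    ssl_ms: float
    request_wait_ms: float
    download_ms: float

    def total_ms(self) -> float:
        # Total transaction time is the sum of all phases.
        return sum(asdict(self).values())

    def slowest_phase(self) -> str:
        # Name of the phase that contributed the most time.
        phases = asdict(self)
        return max(phases, key=phases.get)


# Assumed, simplified mapping from the slowest phase to a starting point
# for investigation; this is a heuristic, not a definitive diagnosis.
LIKELY_AREA = {
    "dns_ms": "DNS resolution",
    "tcp_connect_ms": "network path",
    "ssl_ms": "security (SSL negotiation)",
    "request_wait_ms": "server or application",
    "download_ms": "network throughput or payload size",
}

if __name__ == "__main__":
    # Hypothetical measurements from a single test run.
    run = TransactionTiming(20.0, 35.0, 60.0, 400.0, 85.0)
    phase = run.slowest_phase()
    print(run.total_ms(), phase, LIKELY_AREA[phase])
```

Storing each run in this shape makes it straightforward to alert on the total time while still keeping the per-phase detail needed for triage.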

Isolating network problems using active monitoring

Several fault domains can affect connectivity and performance across end-to-end network communication. These include the local network, the SD-WAN platform, the underlay WAN links, and the application environment or data center.

A common strategy for efficient issue isolation is to configure active monitoring from the end-user subnets to the applications, whether they are located in a VPC, at a SaaS provider, or in a remote data center. Monitoring traffic crosses the WAN links using one of two strategies:

Monitoring SD-WAN overlay and underlay. This strategy involves connecting one interface of the monitoring point directly to the edge router, bypassing the SD-WAN tunnels, and connecting the other interface of the monitoring point to a user subnet. Monitoring paths can then be set up over both the underlay routes and the overlay. Because the underlay traffic is not encapsulated within the SD-WAN overlay, this method provides clear hop-by-hop visibility into the underlay circuits delivered by third-party providers. It also provides proactive alerting when issues affect end-user traffic passing through the SD-WAN overlay.
Monitoring specific SD-WAN tunnels. Alternatively, monitoring traffic can be configured to pass through the SD-WAN tunnels by setting up specific routing rules within SD-WAN policies. While this approach may mask some underlay details, it still allows the health of the WAN links to be monitored and assessed. Monitoring traffic routed through the SD-WAN tunnels is subject to the same policies and optimizations as regular application traffic. This method enables observability of end-to-end performance, including how the SD-WAN overlay affects application traffic. However, it does not provide insights into underlay-specific issues that are as granular as those obtained by bypassing the SD-WAN edge.

Comparing the monitoring results from the two strategies gives a better understanding of network behavior and supports informed decisions when troubleshooting issues.

When all applications are affected, the investigation typically shifts toward the LAN or the SD-WAN edge. This is because SD-WAN technology is inherently designed to adjust to underlay issues dynamically, minimizing their impact on end users. Running diagnostics on the LAN side can help narrow the scope of the investigation. Additionally, if all the tunnels are affected simultaneously, the primary suspect becomes the SD-WAN edge. It is highly improbable, though not impossible, for multiple communication providers to experience issues at the same time. In such cases, thoroughly examining the SD-WAN edge configuration, hardware, and software components is the way to identify and fix any underlying issues affecting WAN connectivity and application performance.

When only some applications are affected, troubleshooting typically involves analyzing the underlay networks and collaborating with external stakeholders such as ISPs or CSPs to resolve the underlying causes.
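The triage reasoning in this section can be summarized as a small decision function. The sketch below is a simplification for illustration: the function name suspect_fault_domain, its inputs (per-application and per-tunnel "degraded" flags taken from monitoring results), and the returned labels are all assumptions, and a real investigation would weigh more evidence than two sets of booleans.

```python
from typing import Mapping


def suspect_fault_domain(
    app_degraded: Mapping[str, bool],
    tunnel_degraded: Mapping[str, bool],
) -> str:
    """Return a first suspect based on which apps and tunnels are degraded."""
    if not any(app_degraded.values()):
        return "no fault detected"

    if all(app_degraded.values()):
        # Every application is affected: look at the LAN or the SD-WAN edge.
        # If every tunnel is also degraded at once, the edge is the primary
        # suspect, since simultaneous provider failures are unlikely.
        if tunnel_degraded and all(tunnel_degraded.values()):
            return "SD-WAN edge"
        return "LAN or SD-WAN edge"

    # Only some applications are affected: analyze the underlay and
    # involve the relevant provider.
    return "underlay or provider (ISP/CSP)"


if __name__ == "__main__":
    # Hypothetical monitoring snapshot: names are placeholders.
    apps = {"crm": True, "mail": True}
    tunnels = {"tunnel-a": True, "tunnel-b": True}
    print(suspect_fault_domain(apps, tunnels))
```

The output is only a starting point for investigation; it narrows where to look first rather than confirming a root cause.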

Drawing it all together

As network and cloud environments evolve, the strain on SD-WAN systems intensifies. End-to-end visibility therefore becomes essential, whether or not organizations own the network infrastructure. Active monitoring is a critical tool here, providing insights that passive monitoring can no longer deliver. By adopting active monitoring as a fundamental component of network management, organizations can adapt to the challenges posed by increasing cloud dependence and unmanaged network architectures.

Explore strategies to reduce the complexity of managing SD-WAN environments by reading our complimentary white paper.
