    January 31, 2023

    Outages Happen. Now What?

    Network outages happen more often than you think. We may not experience them directly or even know they're occurring at all. When outages affect household names like Facebook, Amazon, Microsoft, and others, however, we're sure to find out after the fact that there was an issue.

    Depending on the user's activities and the duration of the issue, stress and frustration levels can vary. When a marketer can’t get that ground-breaking advertisement up on Facebook, they can get antsy. When a hybrid worker can’t place an order for that amazing home office equipment deal on Amazon, they can feel cheated. And when we’re unable to finish up that Microsoft PowerPoint presentation that our boss is waiting for, we can get very stressed out.

    Imagine how much more stressful it can be if you’re the one responsible for the service levels being delivered. Further, think about what it would be like if you’re responsible for operations at a large-scale enterprise that delivers business- or mission-critical digital services. First, the direct cost of outages can be massive. The average cost of IT downtime is $5,600 per minute or $336,000 per hour according to Gartner. Second, outages can cause more than just immediate revenue loss: staff productivity can suffer, data and other precious assets can get lost, customers can get angry and frustrated, the brand can be damaged, and the organization’s compliance status can be put at risk.
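To put the quoted figure in perspective, the direct cost of an outage can be estimated from its duration. This is a minimal sketch that assumes a flat per-minute rate equal to the Gartner average cited above; the function name `downtime_cost` is hypothetical, and real costs vary widely by organization and by the service affected.

```python
# Assumption: a flat per-minute rate, taken from the Gartner average quoted in the text.
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated direct cost (USD) of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


print(downtime_cost(60))  # one hour of downtime -> 336000
```

The estimate covers only the direct cost. The indirect effects listed above, such as lost productivity and brand damage, are not captured by a per-minute rate.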

    Why Do Outages Happen and What Can You Do About It?

    Outages happen for a number of reasons. Cyberattacks, including ransomware, distributed denial of service (DDoS), malware, and other attacks, continue to rank among the top causes of outages. Human errors such as typos, misconfigurations, ignoring documented procedures, and applying unauthorized shortcuts are also common culprits.

    What’s more, outages are becoming more common. Outages related to software, network, and system problems are increasing as a result of the complexity that comes with adopting cloud technologies and software-defined architectures. Networks are especially problematic. According to Uptime’s 2022 Data Center Resiliency Survey, networking-related issues have been the top cause of downtime over the past three years.

    For years, many network operations (NetOps) teams have relied on network monitoring tools to manage availability and performance within the four walls of their organization’s data centers. But the connectivity demands of today’s digital business are driving the need for a new approach. Now, NetOps teams need a way to gain better visibility and control of both internally managed networks and the networks run by external organizations, including cloud providers and ISPs. The question then becomes, “In today’s hyper-connected and multi-cloud environments, what can you do to prevent outages or respond faster when they happen?”

    The adoption of experience-driven network observability and management can help. This approach is a superset of network monitoring: it lets NetOps teams understand, manage, and optimize the performance of digital services. Teams gain visibility into the end-to-end user experience delivery chain, including every communication path and potential degradation point, so they can get ahead of issues before they affect end users.

    Three Ways to Protect User Experience in an Outage

    Experience-driven network observability and management tools and practices can help your NetOps teams gain actionable insights about the current and future state of a network. They deliver these insights by ingesting telemetry on network device performance; network and internet paths; alarms, faults, logs, and configurations; cloud and SaaS application performance; network traffic flows; and user experience metrics. Armed with this intelligence, NetOps teams can take the following actions to protect the user experience from the damaging effects of outages:

    1. Identify user impact first. Many times, outages will impact certain applications or regions, but not others. At any time, you need to know the state of the network and how the user experience is affected by changing network conditions. With experience-driven network observability and management tools, teams can identify any application in use at any location, continuously measure its performance for each user, and understand the impact on the network that delivers it. In the case of the recent Microsoft outage, the culprit was a network connectivity issue that arose between users and Microsoft applications.

      Figure 1: Graphs reveal connection outages and high amounts of loss and latency.
    2. Isolate where issues are located, and which are crucial. With these tools, teams can quickly and accurately identify the root cause of a problem. Knowing the source of the issue either confirms that your network is not at fault or accelerates problem resolution. Robust event correlation techniques can help you understand how outages and performance issues are affecting actual end-user experience and application delivery. As a result, you can prioritize remediation efforts based on business impact rather than simply on alarm duration or severity.

    3. Employ active monitoring of the network. To reach apps like Office 365, which run in Microsoft’s networks, users’ connections may traverse a large number of network hops. In the example below, over the course of 30 minutes and then one hour, the number of dynamic paths varies for a single device targeting Office 365. This illustrates how dynamic cloud environments can be: small changes can have big consequences for connectivity. That heightens the value of actively testing network delivery to SaaS and web applications, so you can find and fix issues before they affect users. With experience-driven network observability and management tools, your teams can actively and continuously measure the end-to-end health, performance, and availability of the network.

      Figure 2: Multi-path route visualization shows how routes terminate at the edge of the Microsoft network.
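The three actions above can be illustrated with a small sketch: summarize active probe results into loss and latency, flag degraded paths, and rank them by the number of affected users. This is an illustration under stated assumptions, not how any particular product works. The site names, user counts, sample values, and thresholds are all hypothetical, and the probe samples are supplied as plain lists rather than collected from a live network.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class PathHealth:
    app: str
    site: str
    users: int                          # users at this site who rely on the app (hypothetical input)
    loss_pct: float                     # share of probes that got no answer
    median_latency_ms: Optional[float]  # None when every probe was lost


def summarize(app, site, users, samples):
    """Summarize probe samples; each sample is a latency in ms, or None if the probe was lost."""
    if not samples:
        raise ValueError("at least one probe sample is required")
    answered = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(answered)) / len(samples)
    latency = median(answered) if answered else None
    return PathHealth(app, site, users, loss_pct, latency)


def is_degraded(h, max_loss_pct=2.0, max_latency_ms=150.0):
    """Flag a path as degraded. Thresholds are illustrative assumptions, not vendor defaults."""
    if h.median_latency_ms is None:  # every probe was lost
        return True
    return h.loss_pct > max_loss_pct or h.median_latency_ms > max_latency_ms


def rank_by_user_impact(paths):
    """Return only the degraded paths, ordered so the most affected users come first."""
    return sorted((p for p in paths if is_degraded(p)), key=lambda p: p.users, reverse=True)


paths = [
    summarize("Office 365", "Toronto", 120, [40, 42, None, 45, None]),
    summarize("Office 365", "Austin", 800, [35, 36, 34, 37, 35]),
    summarize("Teams", "London", 450, [210, 190, 230, 205, 220]),
]
for p in rank_by_user_impact(paths):
    print(p.app, p.site, p.users, p.loss_pct, p.median_latency_ms)
```

Ranking by affected users rather than by alarm severity reflects the point made in step 2: remediation is prioritized by business impact. In this sample data the healthy Austin path is filtered out even though it serves the most users.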

    How Broadcom Can Help

    With Broadcom, your team can establish optimized NetOps capabilities. With our solutions, you can minimize the risk and impact of network outages, streamline operations, and maximize network performance and availability. In the process, they help you more fully capitalize on revenue opportunities. Register now and join our 30-minute Small Bytes webinar on February 1st at 12 PM EST to learn more about how to troubleshoot Microsoft Teams issues in today’s hybrid work environments.

    Gedeon Hombrebueno

    Gedeon focuses on bringing the Network Observability by Broadcom solution to market. The solution enhances network visibility to boost network operations efficiency and user experience—key to today’s business success. Gedeon has extensive product marketing, product management, and integrated marketing experience in...
