January 13, 2022

Best Practices for Maximizing the Value of Situation Alarms

Today, IT operations teams have to process large volumes of events or alarms in near real time to protect service levels, stay competitive, and deliver a great experience to customers.

If it takes too long for teams to spot and repair issues, an organization runs the risk of significant business service downtime, SLA penalties, and damage to its brand reputation. As IT landscapes continue to grow in scale and complexity, guarding against these risks becomes increasingly difficult. To adapt, IT operations teams need to modernize incident management and introduce artificial intelligence (AI) and machine learning to analyze thousands of events in real time.

DX Operational Intelligence enables teams to have a unified view of their monitoring environment and effectively scale to manage all their domain alarms. In this blog, we introduce the concept of alarm clustering and share a few best practices to help you efficiently manage your event/alarm traffic.

What Is a Situation Alarm?

A situation alarm is a collection of alarms that are correlated and grouped into a cluster. Alarms are clustered by combining them based on distinct dimensions. Each cluster represents a problem that affects applications, infrastructure, or data center health. This clustering supports simplified triage and more efficient analysis. It also enables you to consolidate a massive number of contextually relevant alarms and address them as a whole. For example, a collection of cascading alarms caused by a shortage of a specific resource is automatically grouped into a single situation cluster.

From a technology perspective, situations are created using machine learning-based clustering algorithms that employ time correlation, topological relationship, and natural language processing. Situation alarms can map to a number of dimensions, such as text, active time series, host, and historical time series. Alarms can also be mapped based on the relationship of a configuration item to a business service. All these dimensions can be used to create situation clusters.
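To make the idea of time correlation concrete, here is a minimal sketch in Python. It is not the machine learning algorithm the product uses; it swaps in a much simpler rule, grouping alarms that share a host and arrive within a fixed gap of each other. The alarm tuple layout, the function name, and the 300-second gap are all assumptions made for illustration.

```python
from collections import defaultdict

def group_by_host_and_time(alarms, max_gap_seconds=300):
    """Group alarms that share a host and arrive close together in time.

    alarms: list of (timestamp_seconds, host, message) tuples (hypothetical layout).
    Returns a list of clusters, each a list of alarm tuples.
    """
    # Bucket alarms per host, keeping them in arrival order.
    by_host = defaultdict(list)
    for alarm in sorted(alarms, key=lambda a: a[0]):
        by_host[alarm[1]].append(alarm)

    clusters = []
    for host_alarms in by_host.values():
        current = [host_alarms[0]]
        for alarm in host_alarms[1:]:
            # Extend the cluster while the gap to the previous alarm is small.
            if alarm[0] - current[-1][0] <= max_gap_seconds:
                current.append(alarm)
            else:
                clusters.append(current)
                current = [alarm]
        clusters.append(current)
    return clusters

# Made-up sample data for illustration only.
sample_alarms = [
    (0, "db01", "disk full"),
    (60, "db01", "write failed"),
    (1000, "db01", "cpu high"),
    (30, "web01", "timeout"),
]
clusters = group_by_host_and_time(sample_alarms)
for cluster in clusters:
    print(cluster)
```

A real clustering engine would also weigh text similarity, topology, and historical behavior, which this sketch leaves out on purpose.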

Situations help reduce alarm noise and provide a holistic view of a problem across distinct IT silos, helping IT operations triage problems faster.


Figure 1: Example of a situation composed of three sub-clusters.

Best Practices to Get Value from Situation Alarms

To get the most out of situation alarms, it is important to follow these best practices:

Harness Timelines

Leverage the timeline to understand the order in which events arrived. This section contains useful historical information for understanding how a situation has developed over time.


Employ Filtering

Build situation filters to focus on what is most relevant for your team. You can zoom in on situations affecting a specific business service or situations in which the root cause has been identified. This is the first step toward creating a policy for ITSM integration or triggering automations from clustered alarms.
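As a rough illustration of what such a filter expresses, the sketch below filters an in-memory list of situations by business service and by whether a root cause has been identified. The Situation fields and the function name are hypothetical; they are not the product's data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Situation:
    # Hypothetical fields, chosen only for this example.
    name: str
    service: str
    root_cause: Optional[str] = None  # None means no root cause identified yet

def filter_situations(
    situations: List[Situation],
    service: Optional[str] = None,
    root_cause_identified: Optional[bool] = None,
) -> List[Situation]:
    """Return the situations that match every filter that was supplied."""
    result = []
    for s in situations:
        if service is not None and s.service != service:
            continue
        has_root_cause = s.root_cause is not None
        if root_cause_identified is not None and has_root_cause != root_cause_identified:
            continue
        result.append(s)
    return result

# Made-up sample data for illustration only.
all_situations = [
    Situation("S1", "checkout", root_cause="db01 disk full"),
    Situation("S2", "checkout"),
    Situation("S3", "reporting", root_cause="link down"),
]
checkout_with_cause = filter_situations(
    all_situations, service="checkout", root_cause_identified=True
)
print([s.name for s in checkout_with_cause])
```

The output of a filter like this is the kind of list you would hand to a ticketing integration or an automation policy.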

Customize Cluster Dimensions

Leverage APIs for clustering situation dimensions to fine-tune how situations are built. You can make adjustments to the weight of each dimension (service, host, time, text, and historical). You can also act on situations via APIs to facilitate automated calls or scripts.
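To see why the weights matter, consider the sketch below. It assumes, purely for illustration, that two alarms have a similarity score between 0 and 1 on each dimension and that the scores are combined as a weighted average. The actual scoring used by the product is not described here, and the function name and sample values are hypothetical.

```python
DIMENSIONS = ("service", "host", "time", "text", "historical")

def weighted_similarity(scores, weights):
    """Combine per-dimension similarity scores (0..1) into a weighted average.

    scores and weights are dicts keyed by the names in DIMENSIONS.
    """
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("at least one dimension needs a positive weight")
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight

# Made-up scores for one pair of alarms: same service, different hosts.
scores = {"service": 1.0, "host": 0.0, "time": 0.8, "text": 0.5, "historical": 0.2}

equal_weights = {d: 1.0 for d in DIMENSIONS}
host_heavy = dict(equal_weights, host=3.0)  # emphasize the host dimension

baseline = weighted_similarity(scores, equal_weights)
tuned = weighted_similarity(scores, host_heavy)
print(round(baseline, 3), round(tuned, 3))
```

Raising the host weight lowers the combined score for this pair, because the two alarms come from different hosts. Under this assumed model, that makes them less likely to end up in the same situation.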

Track Situation Flow

Consider the situation flow. This will help your team better understand which problems should be addressed, and in which sequence. For example, while low-entropy problems (that is, those with a low level of change) may be candidates for immediate investigation, problems that have just started may only need to be monitored.

Situations evolve from active to stable, and then to closed.

Active: New/updated alarms are still forming the cluster.
Stable (No associated icon): No new or updated alarms have been received.

A situation will transition from active to stable in two possible scenarios:

Natural stabilization: If no new/updated alarms are received for 30 minutes.
Forced stabilization: Once a configurable “situation window” has passed, stabilization can be forced, even if some facets of the cluster are still changing.

In addition, these two states (active or stable) can be tagged as “noisy” if new/updated alarms are still arriving in the cluster.
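The two stabilization rules can be sketched as a small function. The 30-minute quiet period comes from the description above. Measuring the situation window from the moment the situation was created is an assumption, as are the function and parameter names.

```python
from datetime import datetime, timedelta

QUIET_PERIOD = timedelta(minutes=30)  # natural stabilization threshold from the text

def situation_state(created_at, last_alarm_at, now, situation_window=None):
    """Return (state, reason) for a situation that has not been closed.

    situation_window: optional timedelta; assumed here to start at created_at.
    """
    # Natural stabilization: no new or updated alarms for 30 minutes.
    if now - last_alarm_at >= QUIET_PERIOD:
        return ("stable", "natural")
    # Forced stabilization: the configured window has elapsed.
    if situation_window is not None and now - created_at >= situation_window:
        return ("stable", "forced")
    return ("active", None)

# Made-up timestamps for illustration only.
created = datetime(2022, 1, 13, 10, 0)
now = datetime(2022, 1, 13, 11, 0)

still_active = situation_state(created, datetime(2022, 1, 13, 10, 50), now)
forced = situation_state(
    created, datetime(2022, 1, 13, 10, 50), now, situation_window=timedelta(minutes=45)
)
natural = situation_state(created, datetime(2022, 1, 13, 10, 20), now)
print(still_active, forced, natural)
```

The "noisy" tag and the transition to closed are left out of this sketch.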

Determine What to Address First

Start with the high-severity situations that are affecting your key services or entities. Then decide whether to tackle situations that are active (still evolving, but showing issues in near real time) or stable (cluster already formed and no longer receiving new or updated alarms).
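One way to express this ordering is as a sort key, shown in the sketch below. The severity labels, the dictionary fields, and the choice to list active situations ahead of stable ones as a tie-breaker are all assumptions; your team may prefer the opposite tie-break.

```python
# Hypothetical severity labels; lower rank means higher priority.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}
# Assumed tie-breaker: active situations are listed before stable ones.
STATE_RANK = {"active": 0, "stable": 1}

def triage_order(situations, key_services):
    """Sort by severity, then impact on key services, then state."""
    def sort_key(s):
        affects_key_service = s["service"] in key_services
        # False sorts before True, so negate to put key services first.
        return (SEVERITY_RANK[s["severity"]], not affects_key_service, STATE_RANK[s["state"]])
    return sorted(situations, key=sort_key)

# Made-up sample data for illustration only.
open_situations = [
    {"name": "S1", "severity": "minor", "service": "checkout", "state": "active"},
    {"name": "S2", "severity": "critical", "service": "reporting", "state": "stable"},
    {"name": "S3", "severity": "critical", "service": "checkout", "state": "stable"},
    {"name": "S4", "severity": "critical", "service": "checkout", "state": "active"},
]
ordered = triage_order(open_situations, key_services={"checkout"})
print([s["name"] for s in ordered])
```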

Leverage Noise Reduction

Leverage noise reduction indicators. These indicators show how situation alarms are helping your team and improving staff productivity by reducing the number of alarms that need to be analyzed.

For example, instead of browsing through 621 raw alarms, you may only need to handle 12 situations.
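To put the 621-to-12 example in perspective, the short sketch below expresses it as a percentage reduction. Treating noise reduction as one minus the ratio of situations to raw alarms is an assumption about how to summarize the numbers, not necessarily how the product computes its indicator.

```python
def noise_reduction_percent(raw_alarms, situations):
    """Percentage of items an operator no longer has to review individually."""
    if raw_alarms <= 0:
        raise ValueError("raw_alarms must be positive")
    return (raw_alarms - situations) / raw_alarms * 100

reduction = noise_reduction_percent(621, 12)
print(f"{reduction:.1f}% fewer items to review")
```

With the numbers above, the reduction works out to roughly 98%.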

Next Steps

Now that you have learned best practices for dealing with alarm floods through alarm clustering, get started by exploring our Broadcom Enterprise Software Academy for more DX Operational Intelligence resources, or check out this blog: Reduce Noise and Speed Root Cause Analysis with Alarm Analytics.

Tag(s): AIOps, DX OI

Nestor Falcon Gonzalez

Nestor holds a Master's Degree in Telecommunication Engineering and has over 20 years of experience in Telco, Network and Infrastructure Operations in different roles: SWAT, pre-sales and Solution Architect. He focuses on helping customers with their network transformation, driving innovation and providing value for...
