November 17, 2021
5 Criteria You Need to Drive Efficient Alarm Management

Written by: Jason Normandin
As a commercial pilot landing at night on an unfamiliar runway, the last thing you want is a cockpit alarm telling you the passenger in 14A wants more ice in their soda. You need to concentrate on the job at hand. At that critical moment in flight, you only want visibility into the alarms that matter.
It’s the same with your monitoring environment. Too often, you can be overwhelmed by a tsunami of alarms—thousands of monitoring alerts that all point to the same problem. You then need to sift through these redundant alarms to filter out the noise and focus on key issues.
You can blame the evolution of technology for this dilemma. Spiralling complexity makes it harder than ever to manage systems and the network. In response, most organizations have turned to separate, siloed monitoring tools, adding one to fix each problem as it arose. Indeed, recent research reveals that more than half (52%) of the companies surveyed are using six or more monitoring tools, with over one in ten companies relying on 20 or more tools.
On-call engineers are losing this war on observability. They end up moving from alert to alert, attempting to identify which ones are superfluous and which need to be resolved. Alert fatigue quickly sets in, putting system performance, availability—and ultimately the business—at risk.
5 Criteria You Need to Manage Alarms More Efficiently
It doesn’t need to be this way. These are five criteria an operations engineer needs to think about for optimized alarm management:
- Right alarms: You need to separate signal from noise, eliminate false positives, and determine which alarms need your attention.
- Right problems: You need to group and relate alarms that represent the context of a larger or cross-domain problem.
- Right priorities: You need to tie problems to their impact on services and applications, so you can focus your resources on what matters most for the business.
- Right resources: You need to notify and assign the relevant teams depending on the domain and the context of the problems.
- Right remediation: You need to leverage proven solutions or workarounds that can remediate problems and mitigate the business impact.
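To make the first four criteria concrete, here is a minimal Python sketch of a triage pass. It is an illustration only, not how any particular product works: the `Alarm` fields, the `SERVICE_PRIORITY` and `SERVICE_TEAM` lookup tables, and the `triage` function are all hypothetical names, and remediation is left out because it depends on each team's runbooks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source: str    # monitoring tool that raised the alarm
    resource: str  # affected device or host
    service: str   # business service the resource supports
    message: str


# Hypothetical lookup tables; real values would come from a service model or CMDB.
SERVICE_PRIORITY = {"checkout": 1, "reporting": 3}  # lower number = higher impact
SERVICE_TEAM = {"checkout": "payments-oncall", "reporting": "bi-team"}


def triage(alarms):
    # Right alarms: collapse duplicates that different tools report
    # for the same resource and message.
    unique = {(a.resource, a.message): a for a in alarms}.values()

    # Right problems: group the remaining alarms by the service they affect.
    problems = defaultdict(list)
    for alarm in unique:
        problems[alarm.service].append(alarm)

    # Right priorities: order problems by business impact.
    ranked = sorted(problems.items(), key=lambda kv: SERVICE_PRIORITY.get(kv[0], 99))

    # Right resources: attach the team that owns each service.
    return [
        {"service": service, "team": SERVICE_TEAM.get(service, "noc"), "alarm_count": len(group)}
        for service, group in ranked
    ]
```

Given two tools reporting the same "link down" alarm on one switch, the sketch counts it once, so the on-call team sees one problem per service rather than one notification per tool.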
Gain Visibility into and Control over End-to-End Business Services
Among many new and exciting features, the latest release of DX Operational Intelligence introduces innovative alarm triage and noise reduction improvements. This forward-thinking AIOps platform uses machine learning-driven algorithms to reduce alarm noise, identify root cause, decrease ticket volume, and automate ticket management. Related alarms are clustered into situations to help identify patterns of issues that may have an impact on the health and performance of the business.
In this new release, situations have been enhanced to cluster on a set of DX NetOps root cause and symptom alarms, to determine whether an issue is isolated or part of a larger cross-domain problem involving applications, infrastructure, and network. Situation annotations can now be automatically synchronized with ServiceNow tickets, enabling network operators to standardize on DX Operational Intelligence as their primary triage tool, without the burden of manually updating the ITSM platform. In addition, customers can use message templates to give the operations team tailored information about the impact, cluster drivers, and probable root cause identified through situations. These insights can be included in ServiceNow tickets, Slack messages, email messages, or any other third-party, REST-compatible communication tool, such as Google Chat.
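As a rough illustration of the message-template idea, the Python sketch below fills a template from a situation summary and wraps it in a JSON body of the kind many chat webhooks accept. The template wording, the field names (`severity`, `situation`, `alarm_count`, `root_cause`), and the `build_notification` helper are assumptions made for this example; they are not the product's template syntax or API.

```python
import json
from string import Template

# Hypothetical template; the placeholders are illustrative field names.
TEMPLATE = Template(
    "[$severity] $situation: $alarm_count related alarms, "
    "probable root cause: $root_cause"
)


def build_notification(situation: dict) -> str:
    """Render the template and return a JSON payload with a single 'text' field."""
    text = TEMPLATE.substitute(situation)
    return json.dumps({"text": text})
```

The returned string could then be posted to whichever REST endpoint the team's communication tool exposes. Sending is omitted here because endpoints and authentication differ by tool.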
Let’s return to our pilot scenario. Aircraft manufacturers devote significant design time to ensuring cockpit warnings are prioritized to eliminate alarm fatigue and avoid false positives. By harnessing major advances in AI and machine learning, AIOps platforms like DX Operational Intelligence can handle the speed and the volume of modern digital environments to maximize alarm management efficiency.
Now is the right time for your organization to regain control of its own IT operations cockpit and eradicate alarm fatigue.
Visit our AIOps page and the new release presentation at Broadcom’s Enterprise Software Academy to discover how modern AIOps solutions can help your organization automate root cause analysis and reduce alert fatigue.

Jason Normandin
Jason Normandin has over 17 years of experience in the network performance and fault monitoring industry. Focusing on user experience, APIs, and new technologies, Jason strives to bring simplicity to complex technologies and insight into today’s massive data repositories.