DX Operational Observability: Five New, Powerful Capabilities

Written by Pramit Saxena | Jun 6, 2025 1:00:02 PM

Key Takeaways

Learn about five powerful new capabilities delivered in DX Operational Observability.
Discover the benefits of these new capabilities, including deeper insights, improved efficiencies, and a more unified observability experience.
See the steps needed to start employing these capabilities in your environment.

DX Operational Observability (DX O2), our next-gen AIOps and Observability product, continues to provide new features and enhancements for practitioners across IT. DX O2 delivers a host of enhancements designed to empower IT operations, DevOps, and SRE teams.

In this post, I introduce five powerful enhancements, outline steps to get started, and describe some of the benefits, which include deeper insights, improved efficiencies, and a more unified observability experience. Here are the five enhancements:

Alarm Enrichment, adding topology and custom attributes to alarms post creation
Triage Inspector, managing service health
Automating Alarm actions via APIs
Infrastructure Discovery, detecting and mapping all infrastructure components
Role-specific Alarm Queues, streamlining alarm management

1. Alarm Enrichment

What’s new?

Alarm Enrichment is a powerful new capability that lets users define rules that enrich alarms after they are created. With additional context attached to alarms, IT teams tasked with remediating issues can prioritize better and respond faster and more effectively.

With configurable rules, teams can use Alarm Enrichment to:

Add metadata, such as environment, severity, tags, or related knowledge base articles to alarms.
Enrich alarms with custom fields to help teams improve prioritization and tracking of incidents.

Alarm Enrichment helps you customize your alarm payload to improve both the clarity and relevance of alarms for your teams.

Key benefits

Improved alarm context

By adding relevant metadata and custom fields to alarms, teams receiving the alarms can quickly understand the nature of the issue, the environment in which it occurred, and the potential impact. This removes guesswork that can occur when interpreting raw alarm data.

Better prioritization

Enrichment allows you to fine-tune which alarms should take precedence. You can use custom fields like Geolocation, Environment, ESX Host, and Cluster Name to classify alarms and focus on the most critical ones first.

Enhanced automation

With enriched alarms, automated workflows and response mechanisms are more precise. The additional context lets you trigger the right notifications and remediation actions, whether that’s notifying the right team or triggering an automated recovery process.

Getting started

Getting started with Alarm Enrichment rules in DX O2 is easy. Follow these steps to create enrichment rules:

Navigate to Settings >> Alarm Enrichment Rules.
Select the option to create a new rule.
Define an alarm filter. This determines which alarms the enrichment rule (or rules) will work on.
Select the attributes you want alarms enriched with from the list of configured values, or configure and add more attributes as required. Note: There are two types of attributes:
- Topology attributes may include Cluster Name, Geo, IP, Pod name, etc.
- Custom attributes (which are user-defined) may include Environment, Group, etc.
Preview the list of alarms that match the alarm filter criteria defined above.
Save the new rule.

New alarms that match the filtering criteria defined in the enrichment rule will now be enriched with the specified attributes and values.
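The rule flow above (filter, attributes, preview, apply) can be sketched conceptually in Python. This is only an illustration of the idea: the `EnrichmentRule` class, the `preview` and `enrich` functions, and the alarm field names are all hypothetical and are not part of the DX O2 product or its APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An alarm is modeled as a flat dictionary; the field names are assumptions.
Alarm = Dict[str, str]


@dataclass
class EnrichmentRule:
    """Hypothetical model of a rule: an alarm filter plus attributes to attach."""
    name: str
    alarm_filter: Callable[[Alarm], bool]
    attributes: Dict[str, str] = field(default_factory=dict)


def preview(rule: EnrichmentRule, alarms: List[Alarm]) -> List[Alarm]:
    """Return the alarms that satisfy the rule's filter (the Preview step)."""
    return [a for a in alarms if rule.alarm_filter(a)]


def enrich(alarm: Alarm, rules: List[EnrichmentRule]) -> Alarm:
    """Return a copy of the alarm with attributes from every matching rule.

    In this sketch, a later rule overwrites values set by an earlier one.
    """
    enriched = dict(alarm)
    for rule in rules:
        if rule.alarm_filter(alarm):
            enriched.update(rule.attributes)
    return enriched


# Example rule: tag alarms from production database hosts.
prod_rule = EnrichmentRule(
    name="tag-prod-db",
    alarm_filter=lambda a: a.get("host", "").startswith("db-prod"),
    attributes={"Environment": "Production", "Group": "Database"},
)

alarms = [
    {"id": "a1", "host": "db-prod-01", "message": "High CPU"},
    {"id": "a2", "host": "web-dev-07", "message": "Disk full"},
]

matching = preview(prod_rule, alarms)
enriched_alarms = [enrich(a, [prod_rule]) for a in alarms]
```

Here only the first alarm matches the filter, so only it gains the `Environment` and `Group` attributes; the second alarm passes through unchanged.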

These attributes are available when creating a policy, for example in Message Templates, so they can be shared via email or in Slack messages.
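To illustrate how enriched attributes might flow into a notification, here is a minimal sketch that uses Python's standard `string.Template`. The placeholder syntax and attribute names are assumptions chosen for illustration; the actual Message Template syntax in DX O2 is described in the product documentation and may differ.

```python
from string import Template

# Hypothetical message template; "$Name" placeholders are Python's syntax,
# not necessarily the syntax used by DX O2 Message Templates.
template = Template("[$Environment] $message on $host (cluster: $Cluster)")

# An enriched alarm: Environment and Cluster were added by enrichment rules.
alarm = {
    "host": "db-prod-01",
    "message": "High CPU",
    "Environment": "Production",
    "Cluster": "east-1",
}

# safe_substitute leaves unknown placeholders in place instead of raising.
text = template.safe_substitute(alarm)
```

With these inputs, `text` is `[Production] High CPU on db-prod-01 (cluster: east-1)`.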

For additional information, refer to the documentation.

2. Triage Inspector: Managing service health

What’s new?

A deterioration in the health of a critical service sets alarm bells ringing and puts pressure on IT teams to quickly resolve the issue and restore normalcy. To understand dependencies, suspected causes, and other impacted services, IT teams can use Triage Inspector, which provides these insights at-a-glance.

Triage Inspector improves how teams manage service health. The ability to launch Triage Inspector directly from the Service Health bar chart makes the troubleshooting process more streamlined than ever before.

Key benefits

Greater use of all observability data: Triage Inspector aggregates all of the observability data available in DX O2 and presents it in intuitive summaries for analysis. This enables users of all levels to make fuller use of data across IT domains and of the powerful analytics of DX O2.
Lower MTTR: The capability significantly reduces the time needed to understand issues (eliminating the need to toggle between different views) by providing immediate access to triage actions directly from the Service Health widget.
Service context: Triage Inspector enhances the ability of teams to monitor, diagnose, and resolve service health issues, making the entire process faster, more comprehensive, and more precise.
Intuitive user experience: All teams benefit from a smoother user experience and streamlined troubleshooting process, which makes them more confident in what they are prioritizing and the steps they are taking to address issues.

Getting started

For single time periods: If you notice a degradation in service health (depicted as a dip represented by a single bar), simply hover over the bar and select Triage. This will launch Triage Inspector, where you can dive into the details for the timeframe the bar represents.

For multiple time periods: If you need to investigate multiple time periods, select Triage and then drag the crosshair to select the desired range. Triage Inspector will then open and display the context for the entire chosen duration.

Triage Inspector displays a comprehensive view of all relevant alarms. This gives all users a clearer, more focused view of system health so they can address issues more effectively.

To learn more, refer to the documentation.

3. Alarm Actions APIs

What’s new?

The Alarms APIs provide a comprehensive set of endpoints for managing and interacting with alarms programmatically. The latest release of the Alarm Actions APIs enables automation of alarm workflows, supporting efficient incident management and response.

Getting started

The following new APIs are available to use:

Acknowledge/unacknowledge alarms: Mark alarms as acknowledged to indicate they are being addressed, or unacknowledge them to revert their status.
Assign/unassign alarms: Allocate alarms to specific users or teams for resolution, or remove such assignments as needed.
Create ticket: Integrate with external ticketing systems by creating tickets directly from alarms, facilitating seamless incident tracking.
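As a rough sketch of how a script might prepare calls to actions like these, the Python below builds HTTP requests without sending them. Everything specific here is an assumption for illustration: the base URL, the endpoint paths, the payload field names (`alarmIds`, `assignee`), and the bearer-token authentication are hypothetical. Consult the API documentation for the real endpoints and request formats.

```python
import json
from typing import Dict, Iterable, Optional
from urllib.request import Request

# Hypothetical values: real base URL and paths come from the product docs.
BASE_URL = "https://dxo2.example.com/api"
ACTION_PATHS = {
    "acknowledge": "/alarms/acknowledge",
    "unacknowledge": "/alarms/unacknowledge",
    "assign": "/alarms/assign",
    "unassign": "/alarms/unassign",
    "create_ticket": "/alarms/ticket",
}


def build_alarm_action(
    action: str,
    alarm_ids: Iterable[str],
    token: str,
    extra: Optional[Dict[str, str]] = None,
) -> Request:
    """Build (but do not send) a POST request for one alarm action."""
    if action not in ACTION_PATHS:
        raise ValueError(f"unknown alarm action: {action}")
    payload: Dict[str, object] = {"alarmIds": list(alarm_ids)}  # assumed field name
    if extra:
        payload.update(extra)
    return Request(
        url=BASE_URL + ACTION_PATHS[action],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: prepare a request that assigns two alarms to an on-call group.
req = build_alarm_action(
    "assign", ["a1", "a2"], token="TOKEN", extra={"assignee": "oncall-db"}
)
```

Separating request construction from sending keeps the sketch easy to test; in a real script the prepared request could be passed to `urllib.request.urlopen` once the actual endpoint details are known.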

To learn more, refer to the documentation.

4. Infrastructure Discovery

What’s new?

Enterprise IT teams often struggle with observability gaps. In dynamic and complex environments, teams may be unaware, or lack visibility, when the inventory of IT resources that should be monitored changes. This creates work for IT and prevents the organization from shifting from reactive to proactive monitoring. Infrastructure Discovery solves this problem by automatically detecting and mapping all infrastructure components, such as servers and VMs. This removes the need for manual tracking and ensures that monitoring tools have accurate, up-to-date information, which is especially important for managing modern, cloud-native environments that are inherently dynamic.

This new capability streamlines the process of identifying and cataloging assets across your infrastructure. It supports two primary discovery types:

Host discovery: Automatically discovers hosts running on Windows and Linux systems. This enables teams to maintain an up-to-date inventory of on-premises servers and devices.
Cloud discovery: Discovers resources within cloud environments, specifically Amazon Web Services (AWS) and Microsoft Azure. This includes identifying various cloud services and instances, ensuring visibility into your cloud infrastructure.

Key benefits

Comprehensive visibility: Gain a unified view of both on-premises and cloud assets, facilitating better management and monitoring.
Automated inventory management: Reduce manual effort and shift to proactive monitoring by automatically discovering and updating your asset inventory.
Enhanced monitoring: With accurate and up-to-date asset information, improve the effectiveness of your monitoring and alerting systems.

Getting started

Installation: Download and deploy the Discovery Agent on your target systems. For host discovery, install the agent on your Windows or Linux machines. For cloud discovery, configure it to access your AWS or Azure environments.

Configuration: Set up the necessary credentials and permissions to allow the agent to access and discover resources within your environments.

Integration: Once configured, the Discovery Agent feeds the discovered asset information into DX O2. This closes observability gaps and makes analysis of alarms, and of the health and performance of IT systems, more accurate.

Monitor: Track the status of active, in-progress, and historical discovery jobs.
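The idea of an observability gap can be made concrete with a small sketch: compare what discovery found against what is currently monitored. The function and the host names below are hypothetical and only illustrate the concept; they do not reflect how the Discovery Agent or DX O2 works internally.

```python
from typing import Dict, Iterable, List


def find_gaps(discovered: Iterable[str], monitored: Iterable[str]) -> Dict[str, List[str]]:
    """Compare a discovered inventory with the monitored one.

    "unmonitored": assets that were discovered but are not monitored.
    "stale": assets that are monitored but were not found by discovery.
    """
    discovered_set, monitored_set = set(discovered), set(monitored)
    return {
        "unmonitored": sorted(discovered_set - monitored_set),
        "stale": sorted(monitored_set - discovered_set),
    }


# Example inventories (made-up host names).
gaps = find_gaps(
    discovered=["web-01", "web-02", "db-01"],
    monitored=["web-01", "db-01", "old-vm-09"],
)
```

In this example, `web-02` is an unmonitored asset and `old-vm-09` is a stale entry that discovery no longer sees.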

You can learn more about this feature by referring to the documentation.

5. Role-specific Alarm Queues

What’s new?

Administrators can now set a default alarm queue for specific roles, reducing noise and giving each team a view that helps them do their jobs faster and more easily. Once this is configured, a user who opens the All-Alarms page sees the alarm queue that was set as the default for their role.

Key benefits

Easier alarm management: With role-specific default alarm queues, teams can focus on the most relevant alarms, without having to sift through large amounts of data that include both noise and signal.
Improved control: Tenant Administrators can ensure consistency across roles.
Customizable user experience: While administrators set the default, users still have the flexibility to personalize their individual alarm queue view.

Getting started

Tenant Administrators can configure the default queue for any role.

Navigate to Settings >> Manage Alarm Queues page.
Select and edit an existing queue. Configure the roles for which this queue will be default.

User personalization: If a user prefers a different queue, the user can override the default queue. User settings take precedence over the default configuration defined by the Tenant Administrator.
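The precedence described above (a user's own choice first, then the role default) can be sketched as a simple lookup. The function name, the data structures, and the `"All Alarms"` fallback used when neither is set are assumptions for illustration, not the product's actual behavior or data model.

```python
from typing import Dict


def resolve_queue(
    user: str,
    role: str,
    user_overrides: Dict[str, str],
    role_defaults: Dict[str, str],
    fallback: str = "All Alarms",  # assumed fallback when nothing is configured
) -> str:
    """Pick the alarm queue to show: user override, then role default, then fallback."""
    if user in user_overrides:
        return user_overrides[user]
    return role_defaults.get(role, fallback)


# Example configuration (made-up names).
role_defaults = {"dba": "Database Alarms", "netops": "Network Alarms"}
user_overrides = {"alice": "My Critical Alarms"}

alice_queue = resolve_queue("alice", "dba", user_overrides, role_defaults)
bob_queue = resolve_queue("bob", "dba", user_overrides, role_defaults)
carol_queue = resolve_queue("carol", "guest", user_overrides, role_defaults)
```

Alice sees her personal queue, Bob sees the default for the `dba` role, and Carol, whose role has no default, falls back to the assumed `"All Alarms"` queue.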

You can learn more about this feature by referring to the documentation.
