Broadcom Software Academy Blog

How to Harness GenAI in DX NetOps to Speed Troubleshooting

Written by Nestor Falcon Gonzalez | May 10, 2024 4:56:00 PM

Key Takeaways
  • Employ GenAI to enrich alerts with contextual insights, boosting operational efficiency.
  • Query AI chatbots to gain intelligence on reported issues, enhancing NOC efficiency and speed.
  • Supplement network monitoring data to help engineers speed up troubleshooting and reduce the impact and cost of an outage.

Have you ever considered leveraging generative AI, also known as GenAI, to support your network operations? If so, you are not alone. According to IDC, teams in 43% of IT organizations are investigating various potential applications of GenAI. Additionally, Gartner predicts that within the next two years, GenAI technology will be responsible for 20% of initial network configurations.

GenAI is a type of artificial intelligence technology designed to create new content based on patterns learned from training data. GenAI models use deep learning techniques to learn those patterns and generate new output that resembles the data they were trained on. The technology has gained significant attention due to its ability to generate creative content, but its potential spans various domains, including coding and design.

There are numerous potential applications for GenAI in network operations. In this blog, we will provide two practical use cases for augmenting network monitoring by enriching network operations center (NOC) alerts with contextual data produced by AI chatbots. By supplementing network monitoring data, GenAI can help level-one engineers speed up troubleshooting and reduce the impact and business cost of an outage.

Enrich alerts with contextual information

For NOC teams to be effective, they must establish a comprehensive understanding of all monitored infrastructure and technologies and document all relevant processes. However, the NOC is often overwhelmed by the vast amounts of data originating from the network domain. This influx of data spans various networking technologies, including legacy and modern systems. This diversity can present significant challenges, making it difficult to bridge the skill gaps that may exist among the NOC's level-one engineers.

In the first use case, we showcase how to enrich an incoming alert by prompting an AI chatbot. Through this enrichment, teams can gain better insights into an alert’s meaning and the technologies involved. This integration requires minimal effort but can greatly benefit NOC engineers by enhancing their comprehension of the alert domain and supplementing information about the issue. As illustrated below, users can query the AI chatbot about the meaning of a particular alert directly from DX NetOps.

Contextual menu to query ChatGPT

In this scenario, ChatGPT from OpenAI offers contextual information regarding the significance of the alarm and may suggest potential causes to assist NOC operators in troubleshooting more effectively.

ChatGPT response enriching a DX NetOps alert
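
To make the enrichment flow more concrete, here is a minimal Python sketch of how an alert could be turned into a chatbot prompt and how the answer could be attached back to the alert. Everything named here is an assumption for illustration: the `Alert` fields, `build_enrichment_prompt`, `enrich_alert`, and the sample alert values are hypothetical and do not reflect the actual DX NetOps alert schema or its integration mechanism. The chatbot client is deliberately left out; `ask_chatbot` stands in for whatever client a team would wire in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    title: str
    device: str
    technology: str
    severity: str


def build_enrichment_prompt(alert: Alert) -> str:
    """Turn an alert into a natural-language question for an AI chatbot."""
    return (
        "You are assisting a level-one NOC engineer.\n"
        f"Alert: {alert.title}\n"
        f"Device: {alert.device}\n"
        f"Technology: {alert.technology}\n"
        f"Severity: {alert.severity}\n"
        "Explain what this alert means, list likely causes, "
        "and suggest first troubleshooting steps."
    )


def enrich_alert(alert: Alert, ask_chatbot: Callable[[str], str]) -> dict:
    """Attach the chatbot's answer to the alert as contextual notes.

    ask_chatbot is any callable that takes a prompt string and returns text.
    """
    prompt = build_enrichment_prompt(alert)
    return {"alert": alert, "prompt": prompt, "context": ask_chatbot(prompt)}


# Example with a stand-in chatbot so the sketch runs without network access.
def fake_chatbot(prompt: str) -> str:
    return "(the chatbot's explanation would appear here)"


sample = Alert(
    title="BGP peer session down",  # illustrative values, not real data
    device="edge-router-01",
    technology="BGP",
    severity="Major",
)
enriched = enrich_alert(sample, fake_chatbot)
print(enriched["prompt"])
```

Passing the chatbot in as a plain callable keeps the prompt logic testable on its own and makes it easy to swap one chatbot for another.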

Leveraging GenAI for simple troubleshooting tasks makes perfect sense. Unlike static repositories of information, such as traditional knowledge databases, AI chatbots harness advanced techniques to understand natural language queries, adapt over time, and provide contextually relevant responses. With their ability to learn from user interactions, these technologies are invaluable in helping NOC teams navigate the complexity of modern networks and optimize their mean time to repair (MTTR).

General ISP/SaaS outage validation

In January 2024, Microsoft encountered an outage with its SaaS-based Teams offering. Within two hours of this outage, more than 14,000 incidents were reported across the internet. Imagine the potential power of harnessing such collective feedback, based on global user experience events. Akin to a "whisper from the crowd," this intelligence equips NOC teams with invaluable information.

The second use case illustrates how GenAI can empower NOC teams contending with quality issues that affect a particular business service. In this scenario, the NOC receives an alert from AppNeta that indicates the user experience has been degraded. The Google Gemini AI chatbot is then queried about any "known" outage that could match the alert.

Contextual menu to query Gemini

Gemini’s ability to leverage Google’s search capabilities makes it a perfect tool for determining third-party providers' accountability for a service outage. When searching the internet, Gemini doesn’t produce results like those displayed on a Google search page. Instead, it summarizes them, as shown in the screenshot below.

A query to Gemini issues a comprehensive response about known service outages
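
As a rough sketch of the outage validation step, the Python snippet below composes the kind of question that could be sent to a chatbot when a degradation alert arrives. The function name `build_outage_query`, its parameters, and the sample service, provider, region, and timestamp are all hypothetical and chosen for illustration; they are not taken from AppNeta, DX NetOps, or Gemini. Sending the query to a chatbot is intentionally omitted.

```python
from datetime import datetime, timezone


def build_outage_query(
    service: str,
    provider: str,
    detected_at: datetime,
    region: str = "",
) -> str:
    """Compose a question asking whether a public outage matches the alert.

    detected_at must be timezone-aware so it can be normalized to UTC.
    """
    if detected_at.tzinfo is None:
        raise ValueError("detected_at must be timezone-aware")
    when = detected_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    where = f" affecting users in {region}" if region else ""
    return (
        f"Are there any known or widely reported outages of {service} "
        f"from {provider}{where} around {when}? "
        "Summarize what is reported and where it was reported. "
        "If nothing is reported, say so explicitly."
    )


# Illustrative values only; not a real incident record.
query = build_outage_query(
    service="Example Video Service",
    provider="Example Provider",
    detected_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    region="Western Europe",
)
print(query)
```

Including the detection time and region in the question narrows the answer to outages that could plausibly match the alert, and asking for an explicit "nothing reported" reply helps avoid reading silence as confirmation.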

Such an integration is highly valuable. It can condense a NOC investigation that could take several minutes into findings delivered in just a few seconds. As a result, it can drastically reduce mean time to innocence (MTTI) and enhance overall NOC response and efficiency. Understanding that the root cause lies beyond the NOC's sphere of responsibility helps teams shift away from time-consuming “war room” meetings focused on resolution and instead concentrate efforts on mitigation.

Drawing it all together

A significant portion of IT organizations are exploring the potential application of GenAI, signaling its growing importance. GenAI can provide contextual insights and become a transformative tool for network operations. By using AI chatbots to enrich alerts with contextual information, the NOC can streamline troubleshooting processes, reduce mean time to repair, and enhance overall operational efficiency. As businesses continue to navigate the complexities of modern networks, integrating GenAI into network operations is a key enabler for moving beyond traditional approaches and fostering improved agility and resilience.

To learn more about increasing network visibility and enhancing NOC efficiency, visit our dedicated Network Observability page.