In the early 1900s, Sakichi Toyoda invented a loom that automatically stopped when a thread broke, eliminating the need for someone to watch the machine constantly. This approach was later named “Jidoka” and became one of the two pillars of the TPS (Toyota Production System), with just-in-time production representing the second pillar.
In modern manufacturing, Jidoka involves self-monitoring devices that automatically detect anomalies and stop safely, allowing technicians to inspect and make adjustments as early as possible. Ultimately, Jidoka limits waste and enhances both quality and efficiency.
In many ways, managing today’s digital processes is not so different from what we are used to with traditional manufacturing. As in manufacturing, unnoticed anomalies can lead to defects, performance glitches, or downtime, which can have significant consequences on business operations. When it comes to detecting anomalies within digital environments, what makes things even more complicated is the growing volume, speed, and complexity of IT infrastructures.
What is an anomaly?
If you look at a common definition of an anomaly, it is about something different, abnormal, peculiar, or not easily classified—a kind of deviation from the norm. In the context of IT operations, where it is relatively easy to measure most aspects of operational performance, an anomaly can be seen as an undesirable change within data patterns that represents a departure from business as usual.
Within IT operations, there are all sorts of anomalies, and it can be helpful to classify them into two main categories. The first category is known anomalies, such as a CPU spike due to end-of-month computations or a peak in website traffic generated by a marketing campaign. Unknown anomalies are a different kind of animal; because they are not well understood, they are the more exciting category. Unknown anomalies include phenomena such as a sudden drop in application activity or an unexpected system condition that results from a complex confluence of events.
Traditional rules-based systems are effective at detecting recurring patterns of data that signify a known anomaly. However, they require considerable effort to configure thresholds, and they can generate large numbers of false positives. That’s where analytics and machine learning come in. Detecting unknown anomalies requires dynamic baselining: determining what normal activity looks like under given circumstances, and then detecting behaviors that do not align with that dynamic baseline (see the sketch below).
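To make the idea concrete, here is a minimal sketch of dynamic baselining, assuming a simple rolling-window approach in which the “normal” band is recomputed from recent history. The window size and three-sigma band are illustrative choices, not a description of any particular product’s algorithm.

```python
# Minimal sketch of dynamic baselining: the "normal" range is recomputed
# from a rolling window of recent history instead of a fixed threshold.
# Window size and the 3-sigma band are illustrative assumptions.
import numpy as np

def detect_anomalies(values, window=60, sigmas=3.0):
    """Return indices where a value falls outside the rolling baseline."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]            # recent "business as usual"
        mean, std = np.mean(history), np.std(history)
        if abs(values[i] - mean) > sigmas * std:  # outside the dynamic band
            anomalies.append(i)
    return anomalies

# Example: a steady CPU series with one injected spike
cpu = np.random.normal(40, 2, 500)
cpu[300] = 95
print(detect_anomalies(cpu))  # includes 300 (the spike), plus the occasional statistical outlier
```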
What is anomaly detection?
Anomaly detection is the process of identifying events that deviate from a dataset’s normal behavior, based on historical trends. Events identified as anomalous can point to a critical incident, such as a hardware glitch, an intrusion attack on a system, or unprecedented system usage. Today, with the help of machine learning algorithms, it is possible to achieve continuous dynamic baselining, which enables the identification of anomalous events without human intervention.
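As a generic illustration of this idea (not a description of DX Operational Intelligence’s internal algorithms), the sketch below uses scikit-learn’s IsolationForest, an unsupervised model, to learn normal behavior from historical response times and then score new observations without hand-tuned thresholds.

```python
# Generic illustration of ML-based anomaly detection: the model learns
# "normal" from historical data and flags deviations. IsolationForest is
# used here purely as an example of an unsupervised detector.
import numpy as np
from sklearn.ensemble import IsolationForest

# Historical response-time samples (ms): mostly normal, a few incidents
rng = np.random.default_rng(7)
history = rng.normal(120, 10, size=(1000, 1))
history[::250] = 400                      # inject a few known incidents

model = IsolationForest(contamination=0.01, random_state=7).fit(history)

# Score fresh observations: -1 means anomalous, 1 means normal
new_points = np.array([[118.0], [125.0], [390.0]])
print(model.predict(new_points))          # -> [ 1  1 -1]
```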
Operations teams can leverage anomaly detection for a range of use cases, including the following:
- Determining the anomalous consumption of resources, such as CPU, memory, network, or storage across the monitored landscape.
- Identifying the workloads that see a sudden spike in usage and that need team attention to ensure continued availability and performance.
- Pinpointing an anomalous increase in the response times of critical microservice workloads.
Anomaly Detection for IT Operations: The Challenges
Monitoring applications, infrastructure elements, and networks is a baseline requirement for any enterprise-grade operations team. It is how teams gain visibility into the condition of their digital infrastructure and the workloads deployed on it. Through the monitoring of the digital landscape, enterprises end up collecting massive volumes of data in the form of metrics and logs.
While vast amounts of granular data can provide a wealth of information, the volume, variety, and velocity of the data generated make it impractical for meaningful human consumption. Manually sifting through this huge volume of data to determine exactly which events are anomalous can be like finding a needle in a haystack.
How DX Operational Intelligence Helps
DX Operational Intelligence delivers anomaly detection based on machine learning algorithms that consume metrics collected across applications, infrastructure, and networks. These metrics can be sourced from DX Application Performance Management, DX Unified Infrastructure Manager, and DX NetOps, as well as any third-party vendor metrics ingested by RESTMon. DX Operational Intelligence offers operations teams an end-to-end solution, from identifying anomalies automatically to acting on anomaly alarms. The following capabilities are provided:
Detecting Anomalies Based on a Group of Metrics
DX Operational Intelligence features a metric group configuration that enables teams to filter and configure metric groups based on metric source and to activate specific groups for anomaly detection.
Fine-Tuning Anomaly Alarms
As part of the metric group configuration process, teams can adjust how alarms are raised for an anomaly. Alarms can be raised while a metric is above or below the threshold that is derived dynamically by the algorithm. Teams can also specify how many occurrences of an anomaly must happen over a specific time frame before an alarm is raised (a sketch of this logic follows).
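The occurrence-count rule can be illustrated with a short, hypothetical sketch. The class and parameter names below are invented for illustration and do not reflect how DX Operational Intelligence implements this internally.

```python
# Hypothetical sketch of the "N occurrences within a time window" rule:
# an alarm fires only when enough anomalies accumulate inside a sliding window.
from collections import deque
import time

class AnomalyAlarm:
    def __init__(self, occurrences=3, window_seconds=300):
        self.occurrences = occurrences    # how many anomalies are needed...
        self.window = window_seconds      # ...within this time frame
        self.events = deque()             # timestamps of recent anomalies

    def record_anomaly(self, timestamp=None):
        """Register one anomalous observation; return True if an alarm should fire."""
        now = timestamp if timestamp is not None else time.time()
        self.events.append(now)
        # Drop anomalies that have fallen out of the sliding window
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.occurrences

# Example: three anomalies within five minutes raise the alarm
alarm = AnomalyAlarm(occurrences=3, window_seconds=300)
print(alarm.record_anomaly(0))    # False
print(alarm.record_anomaly(100))  # False
print(alarm.record_anomaly(200))  # True -> raise the alarm
```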
Viewing Anomaly Data
Through the performance analytics capabilities in DX Operational Intelligence, teams can do deeper analysis of the metric exhibiting abnormal behavior. The solution provides intuitive, graphical views to explore discrete time-series values.
Acting on Anomaly Alarms
DX Operational Intelligence provides an integrated, seamless interface to view and act on anomaly alarms, such as acknowledging or assigning an alarm to a colleague.
Conclusion
Now is the time to apply the lessons of prior generations along with machine learning. In the same way Jidoka revolutionized the manufacturing industry more than a century ago, DX Operational Intelligence is revolutionizing IT operations. With the solution, operations teams can employ end-to-end, self-monitoring approaches. Is now the right time to modernize your monitoring approach?
To dive deeper into the anomaly detection capabilities in DX Operational Intelligence, check out this detailed tutorial.
Abhinav Shroff
Abhinav Shroff is a Product Manager for the AIOps platform from Broadcom. He has a deep understanding and expertise in cloud technologies along with more than fourteen years of experience in building and marketing software products and services. He likes to describe himself as a product enthusiast, technologist,...