The use of statistics, advanced algorithms, and AI/ML is becoming omnipresent. The benefits are visible in every walk of life, from web searches, to movie and retail recommendations, to auto-completing our emails. Of course, few anticipated the dramatic entrance of generative AI in the form of ChatGPT, writing college essays and poetry on arcane topics.
The benefits of these technologies are also all around us in less obvious applications, such as cost optimization in manufacturing, real-time safety and fuel-economy adjustments in automobiles, and life-saving solutions in healthcare. While it may take years for Smart-Quant to mature enough to meet our science-fiction-level expectations, the growing investment and progress in these areas are undeniable.
To make progress, these models need data…lots of data.
Few parts of society generate, and can capture, more data than IT Operations. So it is not surprising that these techniques are being applied to IT Operations challenges. With strong monitoring tools and an ever-growing number of things to monitor, the volume of data keeps growing unabated.
Feeding the AIOps machine are varied data sources. To generate meaningful insights, AIOps benefits from a variety of data from direct or synthetic monitoring of applications, infrastructures, networks, and user experiences.
AIOps also needs to make sense of the data.
Powerful normalization and correlation (using statistical models and knowledge of IT assets and service architectures) are needed to structure the data for clean analysis.
Most IT Operators love service maps. To build these service maps, AIOps must derive or ingest inventory and relationships from multiple sources. Application topologies can be extracted from cross-transaction traces, while network topologies can be established from device logs and from connectivity tests between devices.
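As a rough illustration of normalization followed by correlation, the sketch below maps alerts from two hypothetical monitoring tools (`tool_a` and `tool_b`, with made-up field names) onto one event schema, then groups events on the same host that arrive within a few minutes of each other. The five-minute window and the host-plus-time grouping rule are assumptions chosen for brevity; a real AIOps pipeline would typically also draw on topology and statistical models rather than time proximity alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical mapping: tool_a reports numeric severity levels.
LEVEL_TO_SEVERITY = {1: "critical", 2: "major", 3: "minor"}


@dataclass(frozen=True)
class Event:
    """Uniform event schema shared by all sources after normalization."""
    host: str
    severity: str
    message: str
    timestamp: datetime


def normalize(raw: dict) -> Event:
    """Map a raw record from either hypothetical tool onto the Event schema."""
    if raw["source"] == "tool_a":
        return Event(
            host=raw["hostname"].lower(),
            severity=LEVEL_TO_SEVERITY[raw["level"]],
            message=raw["msg"],
            timestamp=datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
        )
    if raw["source"] == "tool_b":
        return Event(
            host=raw["host"].lower(),
            severity=raw["severity"].lower(),
            message=raw["text"],
            timestamp=datetime.fromisoformat(raw["time"]),  # expects an explicit UTC offset
        )
    raise ValueError(f"unknown source: {raw['source']!r}")


def correlate(events, window=timedelta(minutes=5)):
    """Group events per host; a gap longer than `window` starts a new group."""
    by_host = {}
    for event in sorted(events, key=lambda e: (e.host, e.timestamp)):
        by_host.setdefault(event.host, []).append(event)
    groups = []
    for host_events in by_host.values():
        current = [host_events[0]]
        for prev, cur in zip(host_events, host_events[1:]):
            if cur.timestamp - prev.timestamp <= window:
                current.append(cur)
            else:
                groups.append(current)
                current = [cur]
        groups.append(current)
    return groups


raw_records = [
    {"source": "tool_a", "hostname": "WEB-01", "level": 1,
     "msg": "CPU above 95%", "epoch": 1673352000},  # 2023-01-10 12:00 UTC
    {"source": "tool_b", "host": "web-01", "severity": "Major",
     "text": "HTTP 5xx spike", "time": "2023-01-10T12:02:00+00:00"},
    {"source": "tool_b", "host": "db-01", "severity": "Minor",
     "text": "Slow query", "time": "2023-01-10T12:30:00+00:00"},
]
for group in correlate([normalize(r) for r in raw_records]):
    print(group[0].host, [e.message for e in group])
```

With these sample records, the CPU alert and the HTTP error spike on web-01 land in one group despite coming from different tools, while the unrelated database alert stays on its own.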
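To make the trace-based part concrete, here is a minimal sketch that derives service-to-service edges from spans. It assumes each span carries a trace ID, its own span ID, an optional parent span ID, and the name of the service that emitted it (roughly the shape of distributed-tracing data such as OpenTelemetry spans); the `Span` class, `derive_service_edges`, and the sample services are hypothetical. A call is recorded only when a child span belongs to a different service than its parent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    service: str


def derive_service_edges(spans):
    """Return a Counter mapping (caller, callee) service pairs to call counts."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    edges = Counter()
    for span in spans:
        if span.parent_id is None:
            continue
        parent = by_id.get((span.trace_id, span.parent_id))
        # Only cross-service parent/child links become topology edges.
        if parent is not None and parent.service != span.service:
            edges[(parent.service, span.service)] += 1
    return edges


spans = [
    Span("t1", "a", None, "frontend"),
    Span("t1", "b", "a", "checkout"),
    Span("t1", "c", "b", "payments"),
    Span("t1", "d", "b", "checkout"),  # internal span, same service as its parent
    Span("t2", "a", None, "frontend"),
    Span("t2", "b", "a", "checkout"),
]
print(dict(derive_service_edges(spans)))
```

Keying spans by `(trace_id, span_id)` keeps IDs from different traces from colliding, and the call counts give a rough sense of how heavily each dependency is used.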
AIOps needs to normalize and enrich this data with additional attributes from third-party sources and persist it in a uniform graph data model. Two elements typically comprise a graph model:
- Vertices/Nodes. Vertices are the entities of the model and can hold any number of attributes or key-value pairs that help describe the entities.
- Edges. Also known as dependencies or relationships, edges provide the relevant connections between two vertices. A relationship always has a direction, a start node, and an end node. Although relationships are directed, they can be navigated in either direction.
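A minimal in-memory sketch of such a graph model follows. The `PropertyGraph` class, the `RUNS_ON` relationship name, and the sample entities are hypothetical; production AIOps platforms typically persist topology in a dedicated graph store rather than Python dictionaries. What it illustrates is the model described above: vertices carry arbitrary key-value attributes, every edge has a start and an end, and indexing edges in both directions lets the same edge be traversed either way.

```python
class PropertyGraph:
    """Minimal in-memory graph: vertices hold key-value attributes, edges are directed."""

    def __init__(self):
        self.vertices = {}   # vertex id -> attribute dict
        self.out_edges = {}  # start vertex -> list of (relationship, end vertex)
        self.in_edges = {}   # end vertex -> list of (relationship, start vertex)

    def add_vertex(self, vertex_id, **attributes):
        self.vertices.setdefault(vertex_id, {}).update(attributes)

    def enrich(self, vertex_id, extra):
        """Merge extra attributes, e.g. ones looked up in a third-party source."""
        self.vertices[vertex_id].update(extra)

    def add_edge(self, start, relationship, end):
        """Every edge has a direction: a start vertex and an end vertex."""
        for vertex_id in (start, end):
            if vertex_id not in self.vertices:
                raise KeyError(f"unknown vertex: {vertex_id}")
        self.out_edges.setdefault(start, []).append((relationship, end))
        self.in_edges.setdefault(end, []).append((relationship, start))

    def successors(self, vertex_id):
        """Follow edges forward, from start to end."""
        return [end for _, end in self.out_edges.get(vertex_id, [])]

    def predecessors(self, vertex_id):
        """Navigate the same directed edges backward, from end to start."""
        return [start for _, start in self.in_edges.get(vertex_id, [])]


g = PropertyGraph()
g.add_vertex("checkout-svc", type="application", owner="payments-team")
g.add_vertex("host-17", type="host", os="linux")
g.add_edge("checkout-svc", "RUNS_ON", "host-17")
g.enrich("host-17", {"datacenter": "us-east-1"})  # hypothetical third-party attribute
print(g.successors("checkout-svc"), g.predecessors("host-17"), g.vertices["host-17"])
```

Storing each edge once per direction doubles the index size but makes "what does this service run on?" and "what runs on this host?" equally cheap to answer.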
ML algorithms can easily query such a graph model and use it as a primary dimension for all analyses. For example, in a typical use case like performance problem isolation, the ML algorithm can identify the hosts and application components involved in a business transaction and use them to set the data scope for analyzing relevant performance metrics and alarms.
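The performance-isolation example can be sketched as a graph traversal: starting from the vertex for a business transaction, collect every entity it depends on, then keep only the alarms raised on those entities. The adjacency list, the `txn:`/`app:`/`host:`/`db:` naming convention, and the alarm records are all hypothetical, and plain breadth-first search stands in for whatever query a real AIOps engine would run.

```python
from collections import deque

# Hypothetical topology: each edge points from an entity to something it depends on.
EDGES = {
    "txn:checkout": ["app:web-frontend"],
    "app:web-frontend": ["app:cart-api", "host:web-01"],
    "app:cart-api": ["host:app-02", "db:orders"],
    "db:orders": ["host:db-03"],
    "txn:search": ["app:search-api"],
    "app:search-api": ["host:app-09"],
}

ALARMS = [
    {"entity": "host:db-03", "metric": "disk_latency_ms", "value": 250},
    {"entity": "host:app-09", "metric": "cpu_pct", "value": 97},
    {"entity": "app:cart-api", "metric": "error_rate", "value": 0.12},
]


def reachable(start, edges):
    """Breadth-first traversal returning the start entity and everything it depends on."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def scope_alarms(transaction, edges, alarms):
    """Keep only alarms raised on entities involved in the given transaction."""
    scope = reachable(transaction, edges)
    return [alarm for alarm in alarms if alarm["entity"] in scope]


hosts = sorted(e for e in reachable("txn:checkout", EDGES) if e.startswith("host:"))
print("hosts in scope:", hosts)
for alarm in scope_alarms("txn:checkout", EDGES, ALARMS):
    print("relevant alarm:", alarm)
```

Here the CPU alarm on `host:app-09` is filtered out because that host serves only the search transaction, which is exactly the kind of noise reduction that scoping by topology is meant to provide.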
This abundance of wonderful data will drive recursive progress for AI/ML and advanced algorithms:
- More data will result in greater need for AI/ML.
- More AI/ML will motivate us to capture, clean and analyze more data.
Rinse and repeat.
Given human limitations, we turn to automation.
For AIOps, here are a few areas that beg to be automated:
- Auto-discovery of entities added to or removed from the IT environment. What new monitoring data is available to the AIOps machine? What information is no longer available or relevant?
- Auto-correlation of data. Which entities are associated with the business service I’m responsible for? (See my blog, “IT Operations in 2023: Business Services Become a Viable Organizing Principle”)
- Auto-ticketing. Which teams or individuals should be notified when certain performance thresholds are breached, or better, when certain thresholds may be breached in the future?
- Auto-remediation. Automatically remediate frequently occurring issues in the production environment. This significantly reduces unplanned downtime and MTTR.
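Two of these areas lend themselves to a short sketch. Auto-discovery can start as a set difference between two inventory snapshots, and the "may be breached in the future" case of auto-ticketing can be approximated by extrapolating a metric's recent trend. The function names, the sample data, and the two-hour ticketing horizon are hypothetical, and the ordinary least-squares line is a deliberately simple stand-in for the forecasting models a real AIOps platform would use.

```python
from typing import List, Optional, Set, Tuple


def diff_inventory(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Auto-discovery: return (added, removed) entity IDs between two snapshots."""
    return current - previous, previous - current


def predict_breach(samples: List[Tuple[float, float]], threshold: float) -> Optional[float]:
    """Fit a least-squares line to (time, value) samples and return the time at
    which the trend reaches `threshold`, or None if the trend is flat or falling."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    return (threshold - intercept) / slope


added, removed = diff_inventory({"web-01", "db-01"}, {"web-01", "web-02"})
print("added:", added, "removed:", removed)

# Disk usage (%) sampled every 10 minutes; open a ticket before it reaches 90%.
disk = [(0, 70.0), (10, 72.0), (20, 74.0), (30, 76.0)]
eta = predict_breach(disk, threshold=90.0)
if eta is not None and eta - disk[-1][0] <= 120:  # hypothetical 2-hour horizon
    print(f"open ticket: disk expected to reach 90% at t={eta:.0f} min")
```

For the sample disk series the fitted trend rises 0.2 points per minute from 70%, so it reaches 90% at t=100 minutes, well inside the horizon, and a ticket would be opened before the threshold is actually breached. If a metric is already above the threshold, the function returns a time in the past, which a caller could treat as an immediate breach.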
AIOps without automation is a non-starter.
Automation within AIOps will march ahead in 2023, perhaps in smaller, incremental steps as IT practitioners test, validate, and gradually come to trust it, relinquishing some control in exchange for greater productivity in other aspects of their jobs.
Practitioners will welcome and adopt automation when they have substantial oversight and control of it.
Despite the emergence of AIOps, the IT Operations community as a whole remains cautious. Early adopters who sought first-mover advantage or who had greater tolerance for risk have achieved measurable success, tuning expectations, solution requirements, and adoption plans as they learn on the fly.
Other buyers, of the early- or mid-majority type, approached AIOps with more limited expectations and narrowly scoped adoption plans. By constraining AIOps adoption to a single business application team or geography, they could limit risk and isolate the rest of their organization from the chaotic learning associated with emergent, transformative technologies.
Predictability, consistency, and explainability are vital for IT Operations. Equally vital are mining the treasure trove of monitoring data available to IT teams and automating repetitive, error-prone tasks.
This is why AIOps as a technology segment and transformative approach to IT Operations will “cross the chasm” in 2023. It promises to be a watershed year:
- AIOps and observability solutions have improved dramatically on the technical front.
- There is greater appreciation of the transformative power of AIOps, both technically and organizationally.
- Expectations for and understanding of AI/ML and advanced algorithms for enterprise-scale AIOps have transitioned from hype to sanity. Consider the advent of ChatGPT and how it has opened multiple doors in natural language and conversational analytics.
- Pressure to work more efficiently in IT Operations has reached a tipping point (again!)
The paradigm in IT Operations is shifting. All indicators point to AIOps with powerful AI/ML and automation.
For IT Operations, the most recent phase of applying AI/ML to large datasets and combining automated analytics and actions began about five years ago. This prompted Gartner to coin the term “AIOps” to encapsulate artificial intelligence/machine learning (and advanced algorithms), data analysis and automation for IT Operations.
AIOps and Observability from Broadcom with Service Observability will help you streamline IT Operations, achieve business goals, and provide cross-functional visibility like never before.
New blogs and additional resources on this and related topics can be found on the AIOps blog at Broadcom Software Academy.