March 29, 2021
In-House Innovation and Open Source for a Flexible AIOps Platform
Written by: Sudip Datta
A critical question for technology companies today is: how can we build applications and solutions that won’t become obsolete in the next few years?
Innovation is no longer just about bringing something new to market; it is about building it in a way that is scalable and flexible and that can adapt as new technology emerges. At Broadcom, we build AIOps products that are adaptable and resilient enough not only to survive change but also to continue delivering high value through it.
How do we do this? How can companies today ensure their products can evolve?
Proprietary vs. Open Source Divide
It’s a common perception that commercial software is proprietary and intended to serve only a narrow, specific need, while open source leverages the collective power of the community and serves only that community in return. Many commercial software companies are predisposed to believe that proprietary solutions must be built from scratch, whereas the open-source community has built collaboratively since its inception. This has led to an apparent divide between the two worlds, with some parties holding dogmatically to their positions.
Valuing Open Source and In-House Equally
At Broadcom, we’ve been marrying in-house innovation with the power of the open-source community, and this blended approach has enabled us to bring state-of-the-art AIOps solutions to the market quickly. One reason this approach has been successful for us is that we value both methods of innovation equally.
As a commercial software company, we continually look at how we can derive the most creativity and innovation from the best talent we can find. By including the open-source community in our definition of the talent pool, we’re able to access a much broader group of developers. Of course, our in-house developers also spend time contributing back to the open-source community, and we have close partnerships with many open-source projects.
We built a robust, enterprise-grade platform that can support internet-scale workloads using both structured and unstructured data. We framed our design goals upfront with the objective of achieving 24x7 observability of our enterprise customers’ mission-critical applications.
From the very outset, we emphasized the volume, velocity, variety, and veracity of data. Broadcom, which acquired CA Technologies, is one of the few software providers, if not the only one, that caters to the APM, infrastructure, and NetOps markets. Our APM customers typically poll a large number of metrics at high frequency, while our NetOps customers need to analyze terabytes of flow data coming from thousands of devices. We have observed that these disparate worlds are converging as customers perform triage and root cause analysis spanning applications, infrastructure, and networks, and that there is a need for a converged, unified platform that can serve all of these needs.
Best-of-Breed Open Source Components
We built the AIOps platform with best-of-breed open source components. A major criterion for choosing these components was ubiquity: the more eyes on a project, the better.
For the unstructured datastore, we therefore chose Elastic and Logstash, which are widely adopted in both AIOps and non-AIOps applications. For structured metric key-value data and topology data, we relied on RocksDB, which is actively supported by Facebook.
We also adopted supplementary components, including Kafka for the data pipeline, Apache Spark for jobs, and Grafana for dashboarding, not to mention a host of other libraries. Last but not least, as part of our SaaS-first strategy, we have standardized on open source Kubernetes as our deployment platform. This has greatly simplified lifecycle management (installation, upgrades, and patching) of our solution, enabling customers to realize value faster.
How Broadcom Builds Upon Open Source
What differentiates us then? It’s what we do with the open source we use and how we build on top of it. This is where broader, deeper innovation happens and how we can build products that have the flexibility to adapt in the future.
One example is discovery and dependency mapping, which is very challenging given the dynamic, ephemeral nature of containers and the cloud. Similarly, software-defined networking has brought unprecedented challenges to the networking world. Components constantly move, become associated, and become disassociated, which makes real-time discovery difficult.
Many tools offer only a fragmented view because they cover just one piece of the environment, yet understanding the full context requires the full picture:
- Where exactly is the problem: in the app, the network, or both?
- When did the problem start: last week or yesterday?
Our ontological modeling capability tackles these questions by providing a three-dimensional perspective of the environment, layered over time.
Our AIOps solution makes extensive use of topology to facilitate alarm noise reduction and root cause analysis in multi-tiered applications. We implemented a journaled graph database not only to capture the ontological aspects of the topology but also to preserve the history of its state changes. We’ve developed the industry’s most comprehensive root cause analysis capability, which leverages natural language processing (NLP) of text, temporality-driven clustering, and learning methods such as Random Forest.
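The value of journaling a topology is easiest to see in a small example. The sketch below is purely illustrative and is not Broadcom’s implementation: it assumes the topology is a set of directed dependency edges, records every edge change as a timestamped event in an append-only journal, and replays the journal to reconstruct the graph as it stood at any moment. The names `JournaledTopology`, `EdgeEvent`, and `root_cause_candidates` are hypothetical, and the root-cause heuristic (suppress an alarm when a node it directly depends on is also alarming) is a deliberately simple stand-in for the NLP and machine-learning methods described above.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class EdgeEvent:
    """One journal entry: a dependency edge was added or removed at time `ts`."""
    ts: float
    op: str   # "add" or "remove"; at equal timestamps "add" sorts first
    src: str  # dependent node, e.g. an application or container
    dst: str  # node it depends on, e.g. a host or network device


@dataclass
class JournaledTopology:
    """Hypothetical append-only topology journal (illustrative only)."""
    journal: list = field(default_factory=list)

    def record(self, ts: float, op: str, src: str, dst: str) -> None:
        if op not in ("add", "remove"):
            raise ValueError(f"unknown op: {op!r}")
        # Events are inserted in time order and never overwritten,
        # so the full history of state changes is preserved.
        insort(self.journal, EdgeEvent(ts, op, src, dst))

    def as_of(self, ts: float) -> dict:
        """Replay the journal up to and including `ts`; return {node: deps}."""
        edges = set()
        for ev in self.journal:
            if ev.ts > ts:
                break
            if ev.op == "add":
                edges.add((ev.src, ev.dst))
            else:
                edges.discard((ev.src, ev.dst))
        graph: dict = {}
        for src, dst in edges:
            graph.setdefault(src, set()).add(dst)
        return graph


def root_cause_candidates(graph: dict, alarming: set) -> set:
    """Return alarming nodes none of whose direct dependencies are alarming.

    Alarms on nodes that depend on another alarming node are treated as
    downstream symptoms and suppressed (a simple noise-reduction heuristic).
    """
    return {node for node in alarming if not (graph.get(node, set()) & alarming)}


# Example: a container is rescheduled from host-1 to host-2 at t=10.
topo = JournaledTopology()
topo.record(0, "add", "checkout-app", "host-1")
topo.record(0, "add", "host-1", "switch-a")
topo.record(10, "remove", "checkout-app", "host-1")
topo.record(10, "add", "checkout-app", "host-2")

alarms = {"checkout-app", "host-1", "switch-a"}
print(sorted(root_cause_candidates(topo.as_of(5), alarms)))   # ['switch-a']
print(sorted(root_cause_candidates(topo.as_of(15), alarms)))  # ['checkout-app', 'switch-a']
```

The same alarm set yields different candidates depending on the moment queried: before the move, the checkout-app and host-1 alarms are suppressed as symptoms of switch-a; afterward, checkout-app no longer depends on an alarming node, so it surfaces as an independent candidate. A store that kept only the current topology could not answer the earlier question.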
Broadcom has been granted numerous patents for our AI and Machine Learning work in metrics, topology, and logs, some of which are listed below:
- Feedback and Customization in Expert System for Anomaly Prediction: Patent 10,474,954
- Multivariate Path-Based Anomaly Detection: Patent 10,628,289
- Cross-Organizational Data Sharing with Anonymization Filters: Patent 10,614,248
- Event-Based Service Discovery and Root Cause Analysis: Patent 10,616,044
- Domain Transversal Based Transaction Contextualization of Event Information: Patent 10,372,482
- Ordered Correction of Application Based on Dependency Topology: Patent 10,225,272
- Page Journey Determination from Web Event Journals: Patent 10,831,809
Innovating Open Source
An age-old apprehension about open source has been security. When we first embraced Elastic a few years ago, it lacked the multi-tenancy support needed for SaaS delivery (Elastic has since added multi-tenancy support). We extended the model to incorporate multi-tenancy and then added role-based access control (RBAC) for multiple personas.
Another example of how we have enhanced open source is our use of Grafana. Grafana is pervasive among our customers as a reporting and dashboarding solution and provides powerful capabilities. However, it lacked plugins for some of the best-of-breed data sources I mentioned above, so we built those plugins, along with powerful aggregation capabilities for those data sources.
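One common way to layer tenant isolation onto a shared search index is to rewrite every incoming query so it can only match documents belonging to the caller’s tenant, after checking the caller’s role. The sketch below is an assumption-laden illustration, not Broadcom’s or Elastic’s implementation: it assumes each document is stamped at ingest with a hypothetical `tenant_id` field, uses a hypothetical persona-to-permission table (`ROLE_PERMISSIONS`) for RBAC, and only builds the request body in Elasticsearch query DSL (a `bool` query with a `filter` clause) without calling any client library.

```python
import copy

# Hypothetical persona-to-permission table; in a real platform these roles
# would come from the identity and access management layer.
ROLE_PERMISSIONS = {
    "tenant_admin": {"read", "write", "manage_users"},
    "sre": {"read", "write"},
    "viewer": {"read"},
}

# Hypothetical field stamped on every document at ingest time.
TENANT_FIELD = "tenant_id"


def authorize(role: str, action: str) -> None:
    """Raise PermissionError unless `role` is allowed to perform `action`."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not {action}")


def scope_query(request: dict, tenant_id: str, role: str) -> dict:
    """Return a copy of a search request body restricted to one tenant.

    The caller's original query becomes a `must` clause; the tenant
    restriction is added as a `filter` clause of a `bool` query, so the
    query can only match documents carrying the caller's tenant ID.
    """
    authorize(role, "read")
    body = copy.deepcopy(request)  # never mutate the caller's request
    original = body.pop("query", {"match_all": {}})
    body["query"] = {
        "bool": {
            "must": [original],
            "filter": [{"term": {TENANT_FIELD: tenant_id}}],
        }
    }
    return body


request = {"query": {"match": {"message": "timeout"}}, "size": 20}
scoped = scope_query(request, tenant_id="acme", role="viewer")
print(scoped)
```

Placing the tenant restriction in a `filter` clause rather than `must` keeps it out of relevance scoring and makes it eligible for Elasticsearch’s filter caching. A production design would also need to enforce this scoping on the server side, so that clients cannot bypass it by sending raw queries directly.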
Figure 1: In AIOps from Broadcom, a Grafana-based dashboard aggregates data sources across applications, networks, and mainframe.
We believe that by leveraging the best of both our patented in-house innovation and open source, we are able to innovate faster. More hands and eyes are on the product, so it is continuously enhanced. That helps us deliver on time, work in an agile manner, and save resources.
We use domain expertise to solve the harder problems that challenge our customers, but we don’t reinvent the wheel. We let specialists, such as database experts, specialize, and we rely on their expertise. This approach has helped us accelerate time to market by 3x and makes our platform robust, scalable, and secure. Our products are stronger and more flexible for it, and our customers get more value.
Learn more about Broadcom's AIOps solution at the Enterprise Software Academy.
Sudip Datta
Sudip is an accomplished, growth-minded technology executive with 25 years of experience managing large business portfolios and delivering market-leading products and services. Sudip currently heads the AIOps, Observability, and Automation business at Broadcom Inc.