Broadcom Software Academy Blog

Monitoring vCenter with AIOps and Observability from Broadcom

Written by Jörg Mertin | May 20, 2024 5:50:55 PM
Key Takeaways
  • Implement AIOps solutions to enhance monitoring of vCenter, improving system reliability and performance.
  • Use observability tools to gain deeper insights into infrastructure, enabling proactive issue resolution and optimization.
  • Analyze performance data regularly to identify trends, allowing for informed decisions and efficient resource allocation.

DX Application Performance Monitoring (DX APM) provides powerful capabilities for monitoring the health and performance of your vCenter infrastructure. In addition to capturing and analyzing important monitoring data, the solution correlates vCenter performance metrics with metrics from other applications monitored by DX APM. This correlation helps teams quickly determine whether the source of a performance issue is specific to vCenter or whether they should investigate other areas to find the root cause. When used with DX Operational Intelligence, another AIOps and Observability solution from Broadcom, correlations can be traced across the infrastructure, network, and other areas of the IT stack that vCenter depends on. This speeds triage, simplifies cross-silo collaboration, and reduces mean time to repair.

Using the vCenter Monitoring extension with DX APM, teams can monitor metrics for the following components of the vCenter environment:

  • vCenter
  • Datacenter
  • Cluster
  • Datastore
  • Resource Pools
  • Virtual and Physical NICs
  • Virtual Switches
  • Disks
  • Sensors
  • ESX
  • Virtual Machines

You can view metrics in the Application Team Center (ATC) of DX APM.

Deploying APM Infrastructure Agent with the vCenter extension

vCenter Monitoring is an extension to the APM Infrastructure Agent. It connects to the vCenter API and collects all related metrics so users can view the health of the configured vCenter cluster.
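Because the extension talks to the vCenter API over the network, it can help to confirm that the server that will host the agent can reach vCenter on the configured port (443 by default) before deploying anything. The following is a minimal bash sketch, assuming bash's `/dev/tcp` redirection and the coreutils `timeout` command are available; the function name `check_vcenter_port` and the hostname `vc.example.com` are placeholders. It only tests TCP reachability, not credentials or vCenter permissions.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that a TCP connection to vCenter can be opened.
# ASSUMPTIONS: bash with /dev/tcp support and coreutils "timeout" are available.
# This tests network reachability only; it does not validate credentials.
check_vcenter_port() {
  local host="$1" port="${2:-443}"   # 443 is the documented default port
  if [[ -z "$host" || ! "$port" =~ ^[0-9]+$ ]]; then
    echo "usage: check_vcenter_port HOST [PORT]" >&2
    return 2
  fi
  # Open (and immediately close) a TCP socket in a child bash, giving up after 5 s.
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "NOT reachable: ${host}:${port}" >&2
    return 1
  fi
}

# Example (placeholder hostname):
# check_vcenter_port vc.example.com 443
```

A failed check usually points to a firewall rule or DNS issue between the agent host and vCenter rather than to the agent configuration.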

APM IA—what is it?

The APM Infrastructure Agent (APMIA) serves as a logical connector between the monitoring data collected through the vCenter Extension and the DX SaaS backend, where the data is stored, normalized, correlated, and analyzed before it is presented in the dashboard. APMIA can be deployed on many types of operating systems that support Java. The supported Java versions are listed in the DX APM documentation.

Note: The backend can also be installed on-premises.

Data from other extensions is also stored in the DX SaaS backend. For information on other pre-built extensions that feed APMIA, see the DX APM technical documentation. Custom extensions can also be written to support additional monitoring needs. The extensions that are loaded determine the type and amount of data collected. The detailed steps below pertain to vCenter monitoring, but they transfer to setting up other extensions as well. Once the data reaches the backend, teams can define alerts and use dashboards to share metrics with teams across IT.

Getting started: APMIA vCenter deployment

The Agent downloader pre-configures access to the vCenter API and automatically configures APMIA to connect to the DX SaaS backend.

Before configuring the vCenter cluster monitor, collect the following details:

  • vcenter.collector.hostname: Hostname of the vCenter server
  • vcenter.collector.port: Port of vCenter, default value is 443
  • vcenter.collector.user: Username to log in to the vCenter environment
  • vcenter.collector.password: Password to log in to the vCenter environment

Note: Make sure the user has at least read-only access to vCenter.
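After the agent is downloaded, these values live in `apmia/extensions/VCenterExtension/bundle.properties`, and a quick check can catch missing or empty entries before the agent is started. The following bash sketch is illustrative only: the function name `check_vcenter_props` is hypothetical, it assumes the hostname key is `vcenter.collector.hostname` (the other key names come from the list above), and it assumes entries use the plain `key=value` form.

```shell
#!/usr/bin/env bash
# Minimal sketch: report vCenter extension properties that are missing or empty.
# ASSUMPTIONS: the hostname key is "vcenter.collector.hostname", and entries use
# the plain "key=value" form (the ":" separator of Java properties is not handled).
check_vcenter_props() {
  local file="$1" key rc=0
  for key in vcenter.collector.hostname vcenter.collector.port \
             vcenter.collector.user vcenter.collector.password; do
    # Require the key at line start, an "=", and a non-blank value.
    # Commented-out lines ("#key=...") therefore count as missing.
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$file"; then
      echo "missing or empty: ${key}"
      rc=1
    fi
  done
  return "$rc"
}

# Example:
# check_vcenter_props apmia/extensions/VCenterExtension/bundle.properties
```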

Go to your DX APM SaaS tenant.  
Select: "Agents" -> "Download Agent" -> "Infrastructure Agent".
Add the "vCenter" extension, and configure it with the previously collected details.

Download the agent and deploy it on the designated server.

architect@calypso:/opt/prod$ tar xf Infrastructure_Agent_apmia_20240315_v1.tar
architect@calypso:/opt/prod$ cd apmia/
architect@calypso:/opt/prod/apmia$ ls -l
total 696
-rwxr-xr-x 1 architect architect  63281 Mar 15 15:03 APMIACtrl.sh
drwxrwxr-x 2 architect architect   4096 Mar 15 15:59 bin
drwxrwxr-x 2 architect architect   4096 Mar 15 15:59 conf
drwxrwxr-x 4 architect architect   4096 Mar 15 15:59 core
drwxr-xr-x 3 architect architect   4096 Mar 15 15:03 extensions
-rw-r--r-- 1 architect architect   1213 Mar 15 15:03 installInstructions.md
drwxrwxr-x 6 architect architect   4096 Mar 15 15:59 jre
drwxr-xr-x 2 architect architect   4096 Mar 15 15:03 lib
drwxr-xr-x 2 architect architect   4096 Mar 15 15:03 logs
-rw-r--r-- 1 architect architect   1700 Mar 15 15:03 manifest.txt
-rw-rw-r-- 1 architect architect 609352 Mar 15 15:03 wrapper

The installInstructions.md file contains instructions to quickly install APMIA. Here is an example:

Follow these steps to install the Infrastructure Agent:

1. Untar the downloaded tarball: `tar -xvf Infrastructure_Agent_apmia_20240315_v1.tar`

2. Navigate to the `apmia` directory of Infrastructure Agent.

3. Run `./APMIACtrl.sh install` to install the Infrastructure Agent. 

4. If you make any changes in configuration later, run `./APMIACtrl.sh restart` to restart the agent for the configuration change to take effect.

The install instructions and configuration instructions for extensions can also be found in the `installInstructions.md` file in the downloaded archive.
---
# Installing the VCenter Monitoring Extension 

1. Navigate to the `apmia/extensions/VCenterExtension` directory, open the `bundle.properties` file in a text editor, and configure the vCenter details if they are not already configured.
2. Restart the CA APM Infrastructure Agent so that the changes made in the bundle.properties file take effect.

Note: Make sure the user has at least read-only access to vCenter.
---
http-collector configuration:

Configure the http-collector details and other properties
in `apmia/extensions/http-collector/bundle.properties`. Please refer to '**http-collector**' under DX APM documentation for more details.
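Editing bundle.properties in a text editor works well for a single agent; when preparing several agents, the same change can be scripted. The following bash sketch is an illustration, not a Broadcom-provided tool: the function name `set_prop` is hypothetical, and it assumes one `key=value` entry per line. It replaces an existing entry or appends a new one; afterwards, restart the agent with `./APMIACtrl.sh restart` so the change takes effect.

```shell
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in a properties file, replacing an existing entry or
# appending one. ASSUMPTION: one "key=value" entry per line; continuation lines
# and the ":" separator are not handled.
set_prop() {
  local file="$1" key="$2" value="$3" tmp rc
  tmp="$(mktemp)" || return 1
  # awk compares the key as a plain string (no regex), so dots are safe, and the
  # value is read via ENVIRON so backslashes and "&" are not reinterpreted.
  KEY="$key" VALUE="$value" awk '
    BEGIN { key = ENVIRON["KEY"]; value = ENVIRON["VALUE"]; done = 0 }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      eq = index(line, "=")
      k = (eq > 0) ? substr(line, 1, eq - 1) : ""
      sub(/[ \t]+$/, "", k)
      if (k == key) { print key "=" value; done = 1; next }
      print
    }
    END { if (!done) print key "=" value }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example:
# set_prop apmia/extensions/VCenterExtension/bundle.properties vcenter.collector.port 443
```

Writing back with `cat` rather than `mv` keeps the original file's ownership and permissions, which matters if the file holds the vCenter password.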

Usually, the files remain in place and the installer simply "registers" the agent as a system service, so it is restarted after a reboot.

For testing, however, consider starting the agent as a regular user in console mode, as in this example:
architect@calypso:/opt/prod/apmia$ ./APMIACtrl.sh console_start
APM Infrastructure Agent Console Start in Progress...
Running APM Infrastructure Agent in console_start mode... (To stop console_start use ctrl+c)

Deployment errors are shown on the console, and application errors are written to the log files.
From a second terminal, review the log files in the /opt/prod/apmia/logs directory.
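A quick way to review the logs is to search them for error and exception entries. The following bash sketch assumes the agent writes plain-text files ending in `.log` under the logs directory and that problems contain `ERROR` or `Exception`; both the file pattern and the search terms are assumptions to adjust to what you see in your logs. The function name `scan_apmia_logs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print ERROR/Exception lines from the APMIA log files.
# Returns 0 when nothing matched, 1 when matching lines were printed,
# and 2 when the directory is missing or grep fails.
# ASSUMPTIONS: logs are plain-text *.log files; search terms are illustrative.
scan_apmia_logs() {
  local logdir="${1:-/opt/prod/apmia/logs}" rc=0
  if [[ ! -d "$logdir" ]]; then
    echo "log directory not found: ${logdir}" >&2
    return 2
  fi
  # -r: recurse; -H: show file names; -n: show line numbers; -E: alternation.
  grep -rHnE --include='*.log' 'ERROR|Exception' "$logdir" || rc=$?
  case "$rc" in
    0) return 1 ;;                                   # entries found: review them
    1) echo "no error entries found in ${logdir}" ;; # grep found no match
    *) return 2 ;;                                   # grep itself failed
  esac
}

# Example:
# scan_apmia_logs /opt/prod/apmia/logs
```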
Once you have confirmed that no errors exist, stop the console session with Ctrl+C and register APMIA as a system service as follows:

architect@calypso:/opt/prod/apmia$ sudo ./APMIACtrl.sh install
APM Infrastructure Agent Installation In Progress...
Created symlink /etc/systemd/system/multi-user.target.wants/apmia.service → /etc/systemd/system/apmia.service.
APM Infrastructure Agent Installation Completed.

Note that the installer requires superuser rights to register APMIA as a system service.
To confirm that the agent is running, check its status as in this example:

architect@calypso:/opt/prod/apmia$ sudo ./APMIACtrl.sh status
APM Infrastructure Agent (installed with systemd) is running: PID:635451, Wrapper:STARTED, Java:STARTED
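When checking the agent from a script or a monitoring probe, the status line shown above can be parsed. The following bash sketch assumes the output format matches the example above exactly (it may differ between agent versions); the function name `apmia_status_ok` is hypothetical. It reads the status text from standard input, so it can be tested without a running agent.

```shell
#!/usr/bin/env bash
# Sketch: succeed and print the PID only if the status text reports a running
# agent with both the wrapper and the Java process started.
# ASSUMPTION: the status format matches the APMIACtrl.sh example shown above.
apmia_status_ok() {
  local status
  status="$(cat)"
  [[ "$status" == *"is running"* ]] || { echo "agent not running" >&2; return 1; }
  [[ "$status" == *"Wrapper:STARTED"* && "$status" == *"Java:STARTED"* ]] \
    || { echo "wrapper or Java process not started" >&2; return 1; }
  [[ "$status" =~ PID:([0-9]+) ]] || { echo "no PID in status output" >&2; return 1; }
  echo "${BASH_REMATCH[1]}"
}

# Usage:
# sudo ./APMIACtrl.sh status | apmia_status_ok
```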

You can also check the system services to confirm that APMIA has been registered successfully:

architect@k8s-jm-repo:~/prod/apmia$ systemctl status apmia
● apmia.service - APM Infrastructure Agent
     Loaded: loaded (/etc/systemd/system/apmia.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-03-15 15:29:47 UTC; 2min 30s ago
    Process: 635380 ExecStart=/opt/prod/apmia/bin/APMIAgent.sh start sysd (code=exited, status=0/SUCCESS)
   Main PID: 635451 (wrapper)
      Tasks: 145 (limit: 4575)
     Memory: 689.3M
        CPU: 32.445s
     CGroup: /system.slice/apmia.service
             ├─635451 /opt/prod/apmia/bin/../wrapper /opt/prod/apmia/bin/../conf/wrapper.conf wrapper.syslog.ident=apmia wrapper.pidf...
             ├─635471 /opt/prod/apmia/jre/bin/java -Xms256m -Xmx512m -Djava.library.path=lib -classpath lib/Agent.jar:lib/CollectorAg...
             └─635533 /opt/prod/apmia/jre/bin/java -Xmx64m -Djdk.http.auth.tunneling.disabledSchemes= -Dlogging.config=classpath:conf...

Mar 15 15:29:43 k8s-jm-repo systemd[1]: Starting APM Infrastructure Agent...
Mar 15 15:29:43 k8s-jm-repo APMIAgent.sh[635380]: Starting APM Infrastructure Agent...
Mar 15 15:29:47 k8s-jm-repo APMIAgent.sh[635380]: Waiting for APM Infrastructure Agent......
Mar 15 15:29:47 k8s-jm-repo APMIAgent.sh[635380]: running: PID:635451
Mar 15 15:29:47 k8s-jm-repo systemd[1]: Started APM Infrastructure Agent.
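To run the same check non-interactively, for example from a provisioning script, the standard `systemctl is-enabled` and `systemctl is-active` commands can be combined. A minimal sketch: the unit name `apmia` comes from the output above, and the function name `check_apmia_service` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the apmia unit is enabled at boot and currently active.
check_apmia_service() {
  local unit="${1:-apmia}" enabled active
  # Both commands print the state and exit non-zero when it is not the desired one,
  # so "|| true" keeps the captured text without aborting under "set -e".
  enabled="$(systemctl is-enabled "$unit" 2>/dev/null)" || true
  active="$(systemctl is-active "$unit" 2>/dev/null)" || true
  if [[ "$enabled" == "enabled" && "$active" == "active" ]]; then
    echo "${unit}: enabled and active"
  else
    echo "${unit}: enabled=${enabled:-unknown} active=${active:-unknown}" >&2
    echo "recent log entries: journalctl -u ${unit} -n 50" >&2
    return 1
  fi
}

# Example:
# check_apmia_service apmia
```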

Using metrics provided by APMIA and vCenter Extension

Check metric tree

Within DX APM, access the Metric Tree to browse existing metrics.

Note: Because the agent recursively queries all existing metrics, the volume of information displayed in the Metric Tree can grow large.

Host details

A partial map view

If no filtering is applied, the map can grow large quickly. Be sure to set some filters when viewing.

DX Dashboards

All existing metrics are available and can be used to populate DX Dashboards, the Grafana-based dashboarding component of the solution. For example, you may want to create dashboards for the vCenter administrator that give a general overview of the Datacenters, Clusters, and ESX servers, or that show CPU, memory, and storage details along with their trends.

Conclusion

Data collected by DX APM, other Broadcom technologies, and supported third-party products is sent to the repository for AIOps and Observability solutions from Broadcom, where it is normalized, correlated, and processed by our analytic engines. As data accumulates over time (30 days of data, for example), anomaly detection becomes more refined and increasingly accurate. Insights from the data are then available in DX Dashboards, where teams can view usage predictions (memory, CPU, and storage), comparisons to budgets, and other metrics. Additionally, DX Operational Intelligence can notify teams when defined thresholds are breached and can display SLI/SLO information for digital services.

By combining regular APM monitoring and reporting with this wider set of insights and analytics from AIOps and Observability from Broadcom, IT teams across domains can see the impact of oversubscription or an incorrectly sized deployment on an application running in the ESX cluster.