July 7, 2022

Expert Series: Broadcom IT Shares Their View on the Difference Between Monitoring and Monitoring Correctly

This post is part of a series featuring customers, partners, and experienced DX Unified Infrastructure Management (DX UIM) practitioners. We’ve asked these expert users to share their knowledge with the broader DX UIM community.

Today, we’re featuring Kathy Solomon, the Unix Systems Administrator for the R&D Support organization within Broadcom IT. Kathy is responsible for monitoring all R&D devices. She met with us to discuss the challenges of the job, explain how she uses DX UIM, and share lessons learned.

What are the challenges you face in your job and as an organization?

We have different sets of servers. For some sets of servers, we need to be alerted right away when there is a problem, and for other sets, we just need to be able to check status periodically. We only want tickets to be created for the servers that require prompt action. For the less critical servers, we want to be able to look at status and see issues when we have time.

One of the challenges is the sheer number of servers: my team manages over 10,000, and about 10% of them need prompt action.

Another challenge is sharing the monitoring environment with other tracks within IT. Each team has its own SLA and therefore needs its devices monitored in a particular way; at first, those requirements may appear incompatible.

Which DX UIM capabilities are your favorites? Which capabilities provide you with the biggest benefits?

Groups and user tags are our favorites because they allow us to scale. We use groups to consistently apply monitoring profiles and alarm policies to similar servers. The tags drive group membership.

We also love being able to make API calls to the database to onboard devices. We use API calls to produce dashboards showing which devices are being monitored and which have valid profile deployments. Like groups, this feature allows us to manage at a larger scale.
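As a rough illustration of the kind of coverage dashboard described above, the sketch below counts devices by monitoring status. It is not DX UIM code: the device records, the field names `monitored` and `profile_valid`, and the function names are all hypothetical, and in practice the rows would come from whatever API or database query your environment exposes.

```python
from collections import Counter

# Hypothetical inventory rows (assumed shape, not a DX UIM API response).
devices = [
    {"name": "rd-build-01", "monitored": True, "profile_valid": True},
    {"name": "rd-build-02", "monitored": True, "profile_valid": False},
    {"name": "rd-lab-07", "monitored": False, "profile_valid": False},
]


def coverage_status(device):
    """Classify one device for a coverage dashboard."""
    if not device["monitored"]:
        return "not monitored"
    if not device["profile_valid"]:
        return "monitored, profile invalid"
    return "monitored, profile valid"


def coverage_summary(rows):
    """Count devices per coverage status."""
    return Counter(coverage_status(row) for row in rows)


summary = coverage_summary(devices)
for status, count in sorted(summary.items()):
    print(f"{status}: {count}")
```

A summary like this makes gaps visible at a glance, for example devices that were onboarded but never received a valid profile.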

How do you provide product feedback to the DX UIM product team?

We talk to the DX UIM support team and my assigned SE.

What tips would you like to share with customers?

One thing that is key in planning a DX UIM implementation is organizing servers into groups that need the same monitoring profiles. Then, you should define profiles for each group so that your devices are consistently monitored and you get predictable results. You can then make changes at the group level for efficiency.

We do all that through Monitoring Configuration Service (MCS). Managing each server individually would not be possible at our scale. If you define the profiles and alarm policies up front, you save yourself a lot of time down the road.

To automate group membership, leverage user tags. If needed, you can set one user tag to specify the track to which the server belongs. We pull information from the CMDB and populate another user tag. Dynamic grouping is driven by these user tags, and each server is automatically placed into groups based upon its function, domain, and physical location; this process reduces human error and drives consistency in how devices are monitored because monitoring profiles are deployed automatically based upon group membership. There is a big difference between monitoring and monitoring correctly.
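The tag-driven grouping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how DX UIM implements dynamic groups: the key names `track_tag` and `cmdb_tag`, the `function/domain/location` encoding of the CMDB tag, and the group naming scheme are all hypothetical.

```python
from collections import defaultdict

# Hypothetical servers. One tag holds the track; the other is assumed to be
# populated from the CMDB as "function/domain/location".
servers = [
    {"name": "rd-build-01", "track_tag": "rnd", "cmdb_tag": "build/eng/site-a"},
    {"name": "rd-build-02", "track_tag": "rnd", "cmdb_tag": "build/eng/site-b"},
    {"name": "rd-lic-01", "track_tag": "rnd", "cmdb_tag": "license/eng/site-a"},
]


def groups_for(server):
    """Derive group names for one server purely from its tags."""
    track = server["track_tag"]
    function, domain, location = server["cmdb_tag"].split("/")
    return [
        f"{track}:function={function}",
        f"{track}:domain={domain}",
        f"{track}:location={location}",
    ]


def build_groups(server_list):
    """Map each group name to the servers that belong to it."""
    groups = defaultdict(list)
    for server in server_list:
        for group in groups_for(server):
            groups[group].append(server["name"])
    return dict(groups)


groups = build_groups(servers)
print(groups["rnd:function=build"])
```

Because membership is computed from the tags alone, nobody assigns servers to groups by hand, which is where the reduction in human error comes from.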

You can also define additional groups that don’t have associated monitoring profiles for status and maintenance. For example, we can prevent ticketing for a group of 3,000 devices during a planned maintenance window, and then, as the end of the window nears, we can assess which servers need a little more prodding before we hand them back to our customers.
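The maintenance-window idea can be illustrated with a small check that decides whether an alarm should open a ticket. This is a sketch only, not DX UIM behavior or configuration: the group name, the window times, and the function `should_create_ticket` are hypothetical.

```python
from datetime import datetime

# Hypothetical maintenance group with no monitoring profile of its own.
group_members = {"patching-wave-1": {"rd-build-01", "rd-build-02"}}

# Hypothetical window: (start, end) per maintenance group.
maintenance_windows = {
    "patching-wave-1": (datetime(2022, 7, 9, 22, 0), datetime(2022, 7, 10, 4, 0)),
}


def should_create_ticket(server, alarm_time):
    """Return False while the server sits in a group with an active window."""
    for group, (start, end) in maintenance_windows.items():
        in_group = server in group_members.get(group, set())
        if in_group and start <= alarm_time < end:
            return False
    return True


print(should_create_ticket("rd-build-01", datetime(2022, 7, 9, 23, 30)))
print(should_create_ticket("rd-build-01", datetime(2022, 7, 10, 5, 0)))
```

Alarms are still recorded in this model; only ticket creation is held back, so the status of the group can be reviewed before the window closes.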

While DX UIM supports agentless monitoring, I highly recommend agent-based monitoring. It gives you deeper insight into, and tighter control of, the systems and configurations you are monitoring. You should include the DX UIM agent in the standard build and configure the robot configuration file as part of that build. Then you are two steps ahead of where you would have been, and monitoring profiles are deployed for you. All that is left is validating that everything is working. You can monitor remotely without an agent in special cases, such as legacy OS versions.

Are there other best practices you would like to share with other DX UIM users?

As your monitoring environment matures, be prepared to tweak your monitoring configuration, always at the group level. For example, you may find that you need to adjust thresholds or add file system types to your filter for disk monitoring. You might even find a key difference between devices within one of your groups that requires different thresholds; when this happens, create a new group so you can continue to apply profiles consistently.
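To make the "always at the group level" advice concrete, here is a minimal sketch in which disk thresholds and file system filters live on the group, and a server that needs different thresholds is moved to a new group instead of receiving a one-off override. The group names, threshold values, and the `disk_alarm` function are hypothetical and are not DX UIM settings.

```python
# Hypothetical group-level policies: a threshold and a file system filter.
group_policies = {
    "rd-build": {"disk_used_pct": 90, "fs_types": {"ext4", "xfs"}},
    # New group split out for servers that legitimately run fuller disks.
    "rd-build-scratch": {"disk_used_pct": 97, "fs_types": {"ext4", "xfs"}},
}

# Each server belongs to exactly one policy group in this sketch.
membership = {
    "rd-build-01": "rd-build",
    "rd-build-09": "rd-build-scratch",
}


def disk_alarm(server, fs_type, used_pct):
    """Evaluate a disk reading against the policy of the server's group."""
    policy = group_policies[membership[server]]
    return fs_type in policy["fs_types"] and used_pct >= policy["disk_used_pct"]


print(disk_alarm("rd-build-01", "ext4", 92))
print(disk_alarm("rd-build-09", "ext4", 92))
```

Adding a file system type or changing a threshold is then a single edit to one group policy, and every member picks it up.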

Make the most of groups for identifying error conditions and problem areas.

Tag(s): AIOps, DX UIM

Jennifer Liharik

Jennifer is a senior product marketing manager for Automation solutions from Broadcom Software and enjoys helping customers gain business value from today's complex technology. Jennifer has worked as a product marketer, process improvement consultant, and strategic advisor in the B2B software, life sciences, retail,...
