    July 7, 2022

    Expert Series: Broadcom IT Shares Their View on the Difference Between Monitoring and Monitoring Correctly

    This post is part of a series featuring customers, partners, and experienced DX Unified Infrastructure Management (DX UIM) practitioners. We’ve asked these expert users to share their knowledge with the broader DX UIM community.

    Today, we’re featuring Kathy Solomon, the Unix Systems Administrator for the R&D Support organization within Broadcom IT. Kathy is responsible for monitoring all R&D devices. She met with us to discuss the challenges of the job, explain how she uses DX UIM, and share lessons learned.

    What are the challenges you face in your job and as an organization?

    We have different sets of servers. For some sets of servers, we need to be alerted right away when there is a problem, and for other sets, we just need to be able to check status periodically. We only want tickets to be created for the servers that require prompt action. For the less critical servers, we want to be able to look at status and see issues when we have time.

    One of the challenges is the sheer number of servers: my team manages over 10,000 servers, with about 10% needing prompt action.

    Another challenge is sharing the monitoring environment with other tracks within IT. Each team has its own SLA and therefore needs its devices monitored in a particular way; at first, those requirements may appear incompatible.

    Which DX UIM capabilities are your favorites? Which capabilities provide you with the biggest benefits?

    Groups and user tags are our favorites because they allow us to scale. We use groups to consistently apply monitoring profiles and alarm policies to similar servers. The tags drive group membership.

    We also love the ability to have API calls to the database to onboard devices. We use API calls to produce dashboards showing which devices are being monitored and which have valid profile deployments. Like groups, this feature allows us to manage at a larger scale.
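
    As a rough illustration of the dashboard idea, the sketch below summarizes which devices are monitored and which have valid profile deployments. It is a minimal sketch under stated assumptions: `DeviceRecord`, `coverage_summary`, and the sample device names are hypothetical, and the records are built in memory rather than fetched, because the interview does not describe the actual API calls.

```python
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    """Hypothetical shape of one device row returned by an inventory query."""
    name: str
    monitored: bool       # the device currently reports monitoring data
    profile_valid: bool   # its last profile deployment succeeded


def coverage_summary(devices):
    """Count monitored devices and list the ones that need follow-up."""
    healthy = [d for d in devices if d.monitored and d.profile_valid]
    return {
        "total": len(devices),
        "monitored": sum(1 for d in devices if d.monitored),
        "valid_profiles": len(healthy),
        # Anything unmonitored or with a failed deployment goes on the list.
        "needs_attention": sorted(
            d.name for d in devices if not (d.monitored and d.profile_valid)
        ),
    }


# Sample data only; real records would come from the monitoring database.
devices = [
    DeviceRecord("build-01", monitored=True, profile_valid=True),
    DeviceRecord("build-02", monitored=True, profile_valid=False),
    DeviceRecord("lab-07", monitored=False, profile_valid=False),
]
summary = coverage_summary(devices)
print(summary)
```

    A summary like this could feed a dashboard tile per group, so gaps in onboarding show up without anyone checking servers one by one.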

    How do you provide product feedback to the DX UIM product team?

    We talk to the DX UIM support team and our assigned SE.

    What tips would you like to share with customers?

    One thing that is key in planning a DX UIM implementation is organizing servers into groups that need the same monitoring profiles. Then, you should define profiles for each group so that your devices are consistently monitored and you get predictable results. You can then make changes at the group level for efficiency.

    We do all that through Monitoring Configuration Service (MCS). If each server needed to be managed at an individual level, it would not be possible. Once you define the profile and alarm policies upfront, you save yourself a lot of time down the road.

    To automate group membership, leverage user tags. If needed, you can set one user tag to specify the track to which the server belongs. We pull information from the CMDB and populate another user tag. Dynamic grouping is driven by these user tags, and each server is automatically placed into groups based upon its function, domain, and physical location. This process reduces human error and drives consistency in how devices are monitored, because monitoring profiles are deployed automatically based upon group membership. There is a big difference between monitoring and monitoring correctly.
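
    The tag-driven grouping described above can be sketched in a few lines. This is an illustrative sketch, not how DX UIM evaluates dynamic groups: the function `groups_for`, the `function:domain:location` encoding of the CMDB tag, and the group naming scheme are all assumptions made for the example.

```python
def groups_for(track_tag, cmdb_tag):
    """Derive group names from two user tags.

    track_tag: the track that owns the server (first user tag).
    cmdb_tag:  CMDB data, assumed here to be "function:domain:location".
    """
    parts = cmdb_tag.split(":")
    if len(parts) != 3 or not all(parts):
        # A malformed tag lands in a catch-all group so it gets noticed.
        return [f"{track_tag}/unclassified"]
    function, domain, location = parts
    return sorted([
        f"{track_tag}/function/{function}",
        f"{track_tag}/domain/{domain}",
        f"{track_tag}/location/{location}",
    ])


# Hypothetical tag values for one server.
membership = groups_for("rnd", "build:eng.example.com:sjc")
print(membership)
```

    Because membership is computed from the tags alone, correcting a server's CMDB record is enough to move it to the right groups, and the profiles attached to those groups follow.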

    You can also define additional groups that don’t have associated monitoring profiles for status and maintenance. For example, we can prevent ticketing for a group of 3,000 devices during a planned maintenance window. As the end of the window nears, we can assess which servers need a little more prodding before we hand them back to our customers.
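
    The ticketing logic behind a maintenance group might look like the sketch below. It assumes a simple model that is not taken from the product: each server has a set of group names, some groups are marked as requiring tickets, and a maintenance group maps to a start and end time. The names `should_ticket`, `rnd-critical`, and `patch-wave-1` are hypothetical.

```python
from datetime import datetime


def should_ticket(server_groups, ticketed_groups, maintenance, now):
    """Return True if an alarm on this server should open a ticket."""
    # Only servers in a group that requires prompt action are ticketed.
    if not (server_groups & ticketed_groups):
        return False
    # Suppress tickets while any of the server's groups is in a window.
    for group in server_groups:
        window = maintenance.get(group)
        if window and window[0] <= now < window[1]:
            return False
    return True


ticketed = {"rnd-critical"}
# Hypothetical window: Saturday 22:00 to Sunday 04:00.
maintenance = {
    "patch-wave-1": (datetime(2022, 7, 9, 22, 0), datetime(2022, 7, 10, 4, 0)),
}
server = {"rnd-critical", "patch-wave-1"}

during = should_ticket(server, ticketed, maintenance, datetime(2022, 7, 10, 1, 0))
after = should_ticket(server, ticketed, maintenance, datetime(2022, 7, 10, 5, 0))
print(during, after)
```

    Alarms are still raised during the window in this model; only ticket creation is held back, which is what lets the team review status before the window closes.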

    While DX UIM supports agentless monitoring, I highly recommend agent-based monitoring. It gives you deeper insight into, and tighter control of, the systems and configurations you are monitoring. You should include the DX UIM agent as part of the standard build and configure the robot configuration file as part of that standard build. Then you are two steps ahead of where you would have been, and monitoring profiles are deployed for you. All that is left is validating that everything is working. You can monitor remotely without an agent in special cases, such as legacy OS versions.

    Are there other best practices you would like to share with other DX UIM users?

    As your monitoring environment matures, be prepared to tweak your monitoring configuration, always at the group level. For example, you may find that you need to adjust thresholds or add file system types to your filter for disk monitoring. You might even find that there’s a key difference between devices within one of your groups that requires different thresholds; when this happens, create a new group so you can consistently apply profiles.
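
    To make the group-level tuning concrete, here is a small sketch of a disk profile that carries thresholds and a file system type filter. The `DiskProfile` class, the group names, and the threshold values are invented for illustration and do not reflect the MCS profile format.

```python
from dataclasses import dataclass, field


@dataclass
class DiskProfile:
    """Hypothetical group-level disk monitoring settings."""
    warn_pct: int
    crit_pct: int
    fs_types: set = field(default_factory=lambda: {"ext4", "xfs"})


def evaluate(profile, fs_type, used_pct):
    """Classify one file system reading against a group's profile."""
    if fs_type not in profile.fs_types:
        return "ignored"  # file system type is outside the filter
    if used_pct >= profile.crit_pct:
        return "critical"
    if used_pct >= profile.warn_pct:
        return "warning"
    return "ok"


profiles = {"rnd-build": DiskProfile(warn_pct=80, crit_pct=90)}

# Before tuning, NFS mounts are not covered by the filter.
before = evaluate(profiles["rnd-build"], "nfs", 95)

# Tweak at the group level: add a file system type to the filter.
profiles["rnd-build"].fs_types.add("nfs")
tuned = evaluate(profiles["rnd-build"], "nfs", 95)

# Devices that need different thresholds get their own group and profile.
profiles["rnd-scratch"] = DiskProfile(90, 97, set(profiles["rnd-build"].fs_types))
scratch = evaluate(profiles["rnd-scratch"], "xfs", 92)
print(before, tuned, scratch)
```

    Every change is made once per group, so all members pick up the same thresholds and filter.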

    Make the most of groups for identifying error conditions and problem areas.

    Tag(s): AIOps, DX UIM

    Jennifer Liharik

    Jennifer is a senior product marketing manager for Automation solutions from Broadcom Software and enjoys helping customers gain business value from today's complex technology. Jennifer has worked as a product marketer, process improvement consultant, and strategic advisor in the B2B software, life sciences, retail,...
