For today’s network operations teams, gaining unified visibility is an increasingly urgent imperative. As network delivery paths continue to encompass more disparate technologies and environments, establishing this unified view grows both more critical and more difficult to achieve.
Fundamentally, the more teams and tools that have to be involved when issues arise, the more costly and time-consuming troubleshooting tends to be. SNMP and syslog monitoring are common, important examples of this dynamic.
Historically, these two types of intelligence have been captured and managed in different tools, which are typically run by distinct teams. As a result, network engineers have often lacked direct access to all the intelligence they need, and they have had to spend a lot of time manually aggregating and synchronizing data to get a complete understanding of a given incident. (For more information on the challenges posed by having disparate systems for SNMP and syslog, see this prior post.)
Ultimately, this reliance on disparate data sets has meant that teams contend with more alarms, yet a shrinking share of those alarms is actionable. An EMA Research report found that, between 2020 and 2024, the percentage of alerts that were indicative of a real problem fell from over 42% to under 30%. To counter these trends, it is increasingly vital to gain unified visibility, including of both SNMP traps and syslog events.
DX NetOps by Broadcom is a scalable network observability solution that delivers advanced capabilities in such areas as network analytics, root cause analysis, noise reduction, and network experience insights.
One key advantage of the solution is that it can aggregate and correlate comprehensive sets of metrics and give teams the timely, targeted intelligence they need to make sense of their dynamic, complex, and distributed environments.
Now, the solution features integration with Elastic and Splunk. Through this integration, teams can retrieve syslog entries directly from their DX NetOps portal. The solution supports the retrieval of syslog alarm and device data, providing rich context that fuels more rapid and efficient triage.
Further, this integration features an approach that maximizes resource efficiency and compliance. While this integration enables seamless access to syslog data, the data itself remains in the source Elastic or Splunk platform.
As a result, teams don’t have to worry about the compliance implications of storing syslog data in an additional repository, and they don’t incur the extra storage costs and overhead of duplicating data sets. Only the entries that are requested are retrieved on demand and displayed in real-time dashboards within DX NetOps.
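To make the on-demand idea concrete, here is a minimal sketch of the kind of narrowly scoped request that keeps data in the source platform and pulls back only what is needed. It builds a search body in Elasticsearch query DSL for one device and a time window around one alarm. This is an illustration only, not how DX NetOps issues its queries; the function name and the field names `host.ip` and `@timestamp` are assumptions that depend on your index mapping.

```python
from datetime import datetime, timedelta, timezone


def build_syslog_query(device_ip, alarm_time, window_minutes=15, max_entries=100):
    """Build an Elasticsearch search body for one device around one alarm.

    Hypothetical helper: the field names "host.ip" and "@timestamp" are
    assumptions; substitute whatever your index mapping defines.
    """
    window = timedelta(minutes=window_minutes)
    return {
        "size": max_entries,                          # cap the number of entries returned
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest entries first
        "query": {
            "bool": {
                "filter": [
                    {"term": {"host.ip": device_ip}},
                    {"range": {"@timestamp": {
                        "gte": (alarm_time - window).isoformat(),
                        "lte": (alarm_time + window).isoformat(),
                    }}},
                ]
            }
        },
    }


alarm = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
body = build_syslog_query("192.0.2.10", alarm)
```

Because the request is bounded by device, time window, and entry count, nothing needs to be copied into a second repository ahead of time.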
The integration with Elastic and Splunk is efficient and straightforward. To start, from the administration menu, users select monitored items management and then syslog configuration, which will feature tabs for Elastic and Splunk connectors.
Operators enable those connectors, then enter the relevant details, such as the protocol, the host name of the Elastic or Splunk server, the access token, and so on.
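The details above can be thought of as a small settings record that is checked before it is saved. The sketch below is purely illustrative: `ConnectorSettings` is a hypothetical class, not part of DX NetOps, and the `port` field is an assumption that the source does not list explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorSettings:
    """Hypothetical container for the details an operator enters."""
    protocol: str       # "http" or "https"
    host: str           # host name of the Elastic or Splunk server
    port: int           # assumed field; not named in the description above
    access_token: str

    def validate(self):
        """Return a list of problems; an empty list means the settings look usable."""
        problems = []
        if self.protocol not in ("http", "https"):
            problems.append("protocol must be http or https")
        if not self.host.strip():
            problems.append("host name is required")
        if not 1 <= self.port <= 65535:
            problems.append("port must be between 1 and 65535")
        if not self.access_token:
            problems.append("access token is required")
        return problems

    def base_url(self):
        """Base URL that requests to the log platform would be sent to."""
        return f"{self.protocol}://{self.host}:{self.port}"


settings = ConnectorSettings("https", "logs.example.com", 9200, "example-token")
print(settings.validate(), settings.base_url())
```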
Users can also choose whether mapping is done by name or by query. If the default field names match (timestamp, severity, facility, and so on), mapping by name is straightforward. If teams have a complex set of indices or multiple hosts or source types, it can be advantageous to save and employ queries.
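As a rough illustration of what mapping by name involves, the sketch below translates one raw log record into a common set of fields using a name-to-name table. The function and the source field names are hypothetical; they stand in for whatever names the Elastic index or Splunk source type actually uses.

```python
# Common field name -> field name in the source platform (assumed values).
DEFAULT_FIELD_MAP = {
    "timestamp": "@timestamp",
    "severity": "severity",
    "facility": "facility",
    "message": "message",
}


def map_by_name(raw_entry, field_map=DEFAULT_FIELD_MAP):
    """Return (mapped_entry, missing_source_fields) for one raw log record."""
    mapped = {}
    missing = []
    for common_name, source_name in field_map.items():
        if source_name in raw_entry:
            mapped[common_name] = raw_entry[source_name]
        else:
            missing.append(source_name)
    return mapped, missing


raw = {"@timestamp": "2024-05-01T12:03:00Z", "severity": "error", "message": "link down"}
entry, missing = map_by_name(raw)
print(entry, missing)
```

When the `missing` list is non-empty for many records, that is a hint that default names do not match and that a saved query may be the better fit.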
The following sections offer an overview of different troubleshooting workflows teams running Elastic or Splunk can now employ in DX NetOps.
Operators start by logging into the DX NetOps portal and then going to the alarm console.
Teams can select an alarm to see a range of details, including severity, item name, IP address, probable cause, and more. DX NetOps offers a great deal of flexibility in tailoring these views. For example, administrators can:
This data can feed further analysis and be used to establish baselines. Teams can use this intelligence to do further investigation in Elastic or Splunk. If administrators have access to a device experiencing an issue, they can retrieve more granular intelligence. These capabilities provide useful intelligence for determining the root cause of issues.
Here’s an overview of how a troubleshooting scenario could unfold:
An administrator receives a message indicating that a BGP peer session went from Established to Idle. To find the root cause, they can quickly view the neighboring topology and click on log events to see all the logs that have been configured for DX NetOps integration.
The administrator then sees that the BGP session was terminated, with a message that a device belongs to a different autonomous system than expected. In other words, a device is attempting to form a BGP session with a peer, but the autonomous system number the peer advertises doesn’t match the expected value. In this organization, this represents a policy issue. In this way, the administrator can quickly learn where the issue occurred and what caused it.
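The check behind that log message is simple to state. The sketch below compares the autonomous system number a device expects from a peer with the one the peer advertises; in RFC 4271 terms, a mismatch corresponds to the "Bad Peer AS" OPEN message error. The class, function, addresses, and AS numbers are all hypothetical examples, not output from any real device.

```python
from dataclasses import dataclass


@dataclass
class PeerConfig:
    """What a device is configured to expect from one BGP neighbor."""
    neighbor_ip: str
    expected_as: int


def explain_as_mismatch(peer, advertised_as):
    """Return a description of the mismatch, or None if the AS numbers agree."""
    if advertised_as == peer.expected_as:
        return None
    return (
        f"peer {peer.neighbor_ip} advertised AS {advertised_as}, "
        f"expected AS {peer.expected_as}"
    )


peer = PeerConfig(neighbor_ip="192.0.2.1", expected_as=65001)
print(explain_as_mismatch(peer, 65002))
```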
Administrators can start by opening the inventory of monitored devices for a particular domain and then selecting devices. Users can enter text to search for the specific devices they’re interested in assessing. In this case, we’ll have an administrator search for a specific switch type. Once the desired item appears, the user can click to select the device and see relevant details.
Administrators can also select log events. They may decide to go back to a specific point in time; in this case, they could go back to the time the alarm was first generated. Administrators can view logs from that specific time and filter to view specific elements. This gives level-two and level-three engineers rich device and timeframe context as they triage issues, enabling them to work faster and more efficiently. At this point, they may opt to export this information as a .csv file or as a syslog query so they can conduct further analysis.
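For teams that continue the analysis outside the portal, the same two steps (narrow to the time of the alarm, then export) can be sketched in a few lines. This is an assumption-laden illustration rather than the DX NetOps export format: the entry fields and function names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta


def entries_near(entries, alarm_time, window=timedelta(minutes=10)):
    """Keep only entries whose timestamp falls within `window` of the alarm."""
    return [e for e in entries if abs(e["timestamp"] - alarm_time) <= window]


def to_csv(entries, columns=("timestamp", "severity", "message")):
    """Render entries as CSV text, ignoring any fields not listed in `columns`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for entry in entries:
        # Copy the entry so the caller's data is untouched, then format the time.
        writer.writerow({**entry, "timestamp": entry["timestamp"].isoformat()})
    return buffer.getvalue()


logs = [
    {"timestamp": datetime(2024, 5, 1, 12, 3), "severity": "error",
     "message": "BGP session terminated"},
    {"timestamp": datetime(2024, 5, 1, 9, 0), "severity": "info",
     "message": "interface up"},
]
alarm_time = datetime(2024, 5, 1, 12, 0)
print(to_csv(entries_near(logs, alarm_time)))
```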
For the vast majority of network operations teams, the reality is that time and budget will always be in short supply. By establishing unified visibility of both SNMP and syslog intelligence, teams can significantly streamline troubleshooting and remediation. In the process, they can make the most of their precious resources. To learn more, see our Small Bytes webcast, “How to Accelerate Triage with Contextual Access to Syslog.” This session features a detailed look at the DX NetOps integration with Elastic and Splunk, offering a demonstration of how the solution speeds root cause analysis.