April 17, 2024
Preventing Costly Network Outages: Why Network Configuration Management is Essential
Written by: Robert Kettles
As organizations continue to operate in an increasingly digitized fashion, ensuring network uptime and performance keeps getting more critical. Network performance issues and downtime aren’t just a concern for network teams; they affect the entire business.
Network downtime is costly. While actual costs will vary by industry and business, 54% of the respondents to Uptime Institute’s 2023 data center survey report that their most recent significant outage cost more than $100,000, with 16% reporting that costs exceeded $1 million. What’s more, devastating outages keep happening. Here are just a few high-profile examples:
- Cloudflare. A faulty network configuration change led to outages at 19 of the company’s data centers. These data centers are responsible for a significant portion of the company’s global traffic. It took more than an hour for all data centers to return to normal operation.
- Google Cloud. After what was deemed a “routine maintenance event” in their software-defined networking (SDN) environment, checkpoint data with missing configuration information was automatically propagated to switches. This led to an outage that lasted more than three hours.
- Microsoft Azure. A network configuration change caused downtime of the Azure cloud platform, as well as Microsoft 365 services running on the platform, such as Teams, Outlook, and SharePoint, which are relied upon by millions of users around the world.
- Slack. A configuration change led to increased activity hitting the database infrastructure. Ultimately, this extra load left databases unable to serve user requests, which meant users couldn’t send or receive messages, upload files, or join channels. It took three and a half hours for services to be fully restored.
See a theme here? Network configuration is the common thread that ties these high-profile, costly outages together.
According to research by the Uptime Institute, issues with network configuration management (NCM) or network change management are the most common cause of network outages.
Network configuration management challenges
As networks grow more dynamic and complex, ensuring proper configuration of network devices becomes harder. Particularly in large, complex organizations, keeping device configurations continuously compliant with policies is increasingly difficult.
The traditional approach of having someone manually log into consoles to make configuration updates is labor intensive, time consuming, and error prone. For example, an administrator may forget to save the running configuration, so the change is lost when the device reboots, which can result in an outage.
The siloed nature of how changes are managed can also exacerbate these challenges. Different teams and individuals bear responsibility for various aspects, such as engineering, deployment, and monitoring. One individual may make an erroneous change, but another team will be responsible for detecting the problem, and another will be responsible for fixing the issue. These silos can introduce further delays and errors.
Requirements
To keep configuration issues from taking down their networks, teams need to manage network device configurations intelligently, which requires several key capabilities. First and foremost, they need to know when a device configuration has changed. Where warranted, alerts should be generated so teams can review changes immediately and, if necessary, take corrective action, such as rolling back to a known good state.
Next, teams must be able to view the specific changes that have occurred. They also have to be able to track prior events to identify who made the changes, as well as when and where those changes were made. Compliance reports should be available, which can be invaluable in gaining cross-team, cross-silo visibility. Ultimately, teams need to automate all these efforts, so policies can be established and enforced consistently and efficiently.
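To make the first two requirements concrete, here is a minimal Python sketch of change detection with a basic audit record. It is a generic illustration and not how any particular product implements this. The `ChangeRecord` type, the function names, and the device and user names are all hypothetical, and the sketch assumes configurations are available as plain text.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical audit entry: who changed which device, and when it was detected."""
    device: str
    changed_by: str
    detected_at: datetime
    old_digest: str
    new_digest: str


def config_digest(config_text: str) -> str:
    """Return a SHA-256 fingerprint, ignoring blank lines and trailing whitespace."""
    normalized = "\n".join(
        line.rstrip() for line in config_text.splitlines() if line.strip()
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(device, known_good, current, changed_by):
    """Return a ChangeRecord if the configuration differs from the known good state."""
    old, new = config_digest(known_good), config_digest(current)
    if old == new:
        return None  # nothing changed, so there is nothing to alert on
    return ChangeRecord(device, changed_by, datetime.now(timezone.utc), old, new)


record = detect_change(
    "edge-router-1",                      # hypothetical device name
    "hostname edge-router-1\n",           # known good configuration
    "hostname edge-router-1\nno ip ssh\n",  # configuration captured now
    changed_by="jdoe",                    # hypothetical user
)
if record is not None:
    print(f"ALERT: {record.device} changed by {record.changed_by}")
```

Comparing fingerprints only answers whether something changed. Seeing what changed requires a line-by-line comparison of the two configurations.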
The solution: NetOps by Broadcom
NetOps by Broadcom delivers an advanced, comprehensive solution for managing and optimizing today’s modern, dynamic networks. The solution features DX NetOps, a unified, scalable network monitoring solution for traditional and modern, software-defined infrastructures.
DX NetOps features advanced, proven network configuration management capabilities that have been around for years and continue to be refined and enhanced. With the solution, your teams can track and govern configuration changes. The solution enables operators to establish policies, track adherence, and identify and address violations when they occur.
Key Capabilities
Gain actionable visibility and control
DX NetOps offers these key network configuration management capabilities:
- Capture and manage configuration information. DX NetOps enables you to maintain a history of device configurations for comparison and troubleshooting.
- Correlate outages to configuration changes. The solution integrates fault and configuration management, helping speed issue resolution.
- Manage device groups. You can load and merge configurations of one or more devices of the same family, so you can more efficiently manage groups of devices.
Generate reports for management and auditing
With the solution, you can establish detailed audit trails of configurations for every device. The solution offers a range of advanced reporting capabilities. You can export configuration data, so teams can visualize information in their preferred analytics and reporting applications. You can also view devices in specific groups and categories, see devices in violation, and raise an alarm to notify relevant teams. For a specific device, you can verify that the correct configuration is currently running.
Leverage flexible change detection
The solution provides multiple ways to identify changes:
- Assisted inspection. DX NetOps can compare running and startup configurations as well as highlight differences between the current and previous configurations. Teams can define reference configurations, helping establish a baseline for configuration compliance.
- Alarms. Alarms can be generated when configuration changes have been identified, policies are violated, or there is drift from the reference.
- Routine synchronization. The solution can proactively capture configurations in response to notifications received from a device. You can also set up a schedule of automatic captures to ensure reliability and policy compliance.
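The comparison of running and startup configurations described above can be illustrated with a short, generic Python sketch built on the standard library's `difflib`. This is not the product's implementation. The function names and the sample configuration text are assumptions made for the example.

```python
import difflib


def config_diff(startup: str, running: str) -> list[str]:
    """Return unified-diff lines between startup and running configurations."""
    return list(
        difflib.unified_diff(
            startup.splitlines(),
            running.splitlines(),
            fromfile="startup-config",
            tofile="running-config",
            lineterm="",
        )
    )


def has_unsaved_changes(startup: str, running: str) -> bool:
    """True when the running configuration would be lost on reboot."""
    return bool(config_diff(startup, running))


# Hypothetical sample configurations for illustration only.
startup = "hostname edge-1\ninterface Gi0/1\n shutdown"
running = "hostname edge-1\ninterface Gi0/1\n no shutdown"

for line in config_diff(startup, running):
    print(line)
```

The same function works for comparing a current configuration against a previous capture or a reference configuration. Only the two inputs change.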
Establish script-based validation
With DX NetOps, you can upload configuration scripts to execute more complex workflows. These scripts can validate whether the current configuration is in compliance, provide recommended actions if it is not in compliance, and even take an action automatically.
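As a rough illustration of what such a validation script might check, the Python sketch below tests a configuration against a few rules and returns a recommended action for each violation. The `Rule` type, the rule names, and the sample rules are hypothetical. This is a generic sketch, not the scripting interface of DX NetOps.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str         # regular expression tested against each config line
    must_match: bool     # True: some line must match; False: no line may match
    recommendation: str  # action to suggest when the rule is violated


def validate(config_text: str, rules: list[Rule]) -> list[str]:
    """Return one finding per violated rule; an empty list means compliant."""
    lines = [line.strip() for line in config_text.splitlines()]
    findings = []
    for rule in rules:
        present = any(re.search(rule.pattern, line) for line in lines)
        if present != rule.must_match:
            findings.append(f"{rule.name}: {rule.recommendation}")
    return findings


# Hypothetical policy for illustration only.
RULES = [
    Rule("ssh-only", r"^transport input ssh$", True,
         "restrict remote access lines to SSH"),
    Rule("no-telnet", r"^transport input .*telnet", False,
         "remove telnet from the allowed transports"),
]

for finding in validate("line vty 0 4\n transport input telnet ssh", RULES):
    print(finding)
```

A script that also takes action automatically would go one step further and push a corrected configuration, which is deliberately left out of this sketch.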
Employ policy-based workflows
You can create policies for monitoring configurations and verifying that they are compliant. There are more than 70 unique events to report on, such as when startup and running configurations differ, when there’s a deviation from policy, when issues are encountered in capturing configurations, and when unscheduled changes are made to a device.
Create flexible, role-based workflows
The solution offers capabilities for employing role-based access controls. Administrators can therefore establish permissions to enforce policies and guard against unauthorized or erroneous changes. With the solution, an administrator can implement a policy so that any time a particular user or team makes a configuration change, that change needs to be reviewed and approved before it is rolled out.
Workflows can be triggered by events, for example when a configuration change is saved. Such an event can start an automated process, such as sending an email to an approver. The approver can then approve or deny the change, or inspect details and even fix an issue, all via links within the email. If a violation is detected, messages can offer context and guidance, such as details on the violation and corrective measures to take.
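The review-and-approve policy described above can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the role names, the `ChangeRequest` type, and the rule that authors cannot review their own changes are all hypothetical, and a real workflow would also handle notification and rollout.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class ChangeRequest:
    device: str
    author: str
    status: Status = Status.PENDING


# Hypothetical policy: changes made by these roles need a second person's approval.
ROLES_REQUIRING_REVIEW = {"junior-engineer", "contractor"}


def submit(device: str, author: str, author_role: str) -> ChangeRequest:
    """Create a change request; roles outside the policy are approved at once."""
    request = ChangeRequest(device, author)
    if author_role not in ROLES_REQUIRING_REVIEW:
        request.status = Status.APPROVED
    return request


def review(request: ChangeRequest, reviewer: str, approve: bool) -> ChangeRequest:
    """Record an approver's decision on a pending request."""
    if request.status is not Status.PENDING:
        raise ValueError("request has already been decided")
    if reviewer == request.author:
        raise PermissionError("authors cannot review their own changes")
    request.status = Status.APPROVED if approve else Status.DENIED
    return request


request = submit("core-switch-2", "jdoe", "contractor")
print(request.status)  # still pending until someone else reviews it
review(request, "asmith", approve=True)
print(request.status)
```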
Benefits
By leveraging the solution’s advanced network configuration management capabilities, teams can realize the following benefits:
- Enhanced visibility and control. System administrators can establish proactive change management approaches that deliver improved visibility into configuration changes and more granular, robust control over configurations across the organization.
- Accelerated resolution. The solution offers the timely, actionable visibility teams need, so they can more quickly detect and address issues, reducing mean time to resolution (MTTR).
- Improved availability and service levels. With the solution’s visibility and control, teams can minimize the erroneous changes that lead to performance issues and outages, reducing the severity, duration, and incidence of downtime caused by configuration changes.
Conclusion
With NetOps by Broadcom, your teams can harness comprehensive network configuration management capabilities. The solution enables users to capture and track configurations, detect configuration changes, deploy configuration policies that map to compliance requirements, see who made changes, control who can make changes, leverage automation to streamline workflows, and more.
To learn more and see a demo of the solution in action, be sure to watch our Small Bytes session, How NCM Can Help Us Learn from Recent Major Internet Outages. In addition, visit our Small Bytes page to see a complete list of upcoming and on-demand presentations in the series.
Robert Kettles
Robert Kettles started off as a field engineer at Cabletron Systems supporting LAN/WAN switching and routing solutions along with their relatively new network management platform: Spectrum. Over two decades later, he continues to help customers solve network fault and performance management challenges.