Broadcom Software Academy Blog

Automate Configuration Policy Adherence to Boost Service Levels and Compliance

Written by Robert Kettles | Dec 9, 2024 4:34:33 PM

Key Takeaways
  • See why network configuration errors continue to cause costly outages.
  • Discover four key steps to establishing and adhering to configuration policies.
  • Employ DX NetOps to guard against configuration errors, maintain performance, and protect the bottom line.

Ensuring continuous network connectivity keeps getting more critical—but costly outages keep happening. This post looks at a key culprit behind many network outages: network configuration errors. We outline the key requirements for streamlining and automating configuration management, and detail how DX NetOps can help.

The problem: Costly outages keep happening

Over the years, consumers and employees alike have grown ever more dependent on network connectivity, which makes downtime all the more problematic. Nevertheless, outages keep happening, and they keep costing organizations dearly. One report from the Uptime Institute revealed that two-thirds of outages cost businesses more than $100,000.

Despite the cost of these outages, teams struggle to prevent them. Here are a few of the higher-profile outages that made headlines:

  • For Optus, the Australian telecommunications provider, a Border Gateway Protocol (BGP) routing problem ended up affecting more than 10 million users.
  • In January 2024, a network issue caused Microsoft Teams to be down for approximately seven hours.
  • In February 2024, tens of thousands of AT&T users experienced service outages.
  • In September and October of 2024, Verizon experienced major outages, with more than 100,000 users reporting issues. These outages occurred a couple of months after the company was fined more than $1 million following a Federal Communications Commission (FCC) investigation into 911 downtime.

While the nature and cause of these outages aren’t always revealed publicly, what is clear is that ensuring compliance with configuration policies is a challenge plaguing many network operations teams, particularly those managing large, distributed systems.

According to one Uptime Institute report, network- and connectivity-related issues were the most common cause of IT service downtime. In addition, software and configuration errors were the most common cause of major outages that affected organizations’ third-party IT providers. The institute also reported that four out of five survey respondents indicated that their most recent serious outage could have been prevented with better management, processes, and configuration.

The question is, how do you improve these processes? Fundamentally, how do you spot and rectify erroneous configuration changes before they create outages and other problems?

Four steps for ensuring policy compliance

Establishing and adhering to configuration policies is vital for ensuring compliance with best practices, organizational standards, and external regulatory mandates, such as the Payment Card Industry Data Security Standard (PCI DSS), Sarbanes-Oxley, and more.

In a prior post, Preventing Costly Network Outages: Why Network Configuration Management is Essential, I examined the criticality of network configuration management. In this post, I’ll outline the steps you can take to fix configuration errors and ensure compliance with configuration policies. Here is an overview of the key steps:

  • Specify policies. First, teams need to define configuration policies for all applicable systems, including edge and core devices. This includes efforts like ensuring logs are being sent to the right servers and SNMP traps are being forwarded correctly. It is also essential to ensure devices and interfaces are intelligently defined, categorized, and so on.
  • Define actions. Next, teams need to define actions to be taken when a policy is violated. One option is to generate an alarm with a specific severity. Alarms can carry contextual information, such as recommended actions to take to correct the issue. Actions can also include automated fixes, for example, reverting a configuration to its prior state.
  • Repair violations. Operators need to follow up, investigate, and repair problematic configurations.
  • Do reporting and auditing. Teams need to view historical events to report on compliance, including specific violations, which policy was violated, how policies were violated, corrective actions taken, and so on. This reporting is essential in guiding ongoing improvements, and it is key for effectively preparing for and completing internal and external audits. Often, the teams and individuals responsible for handling remediations are different from those who take care of reporting and auditing. Consequently, it is important that these mechanisms support separation of duties.
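
To make the four steps more concrete, here is a minimal Python sketch of how policies, violation actions, and report lines might fit together. It is illustrative only and is not the DX NetOps API: the Policy and Violation classes, the check_device and compliance_report functions, the IOS-style configuration lines, and the addresses (taken from a documentation range) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Policy:
    name: str
    required_line: str   # statement that must appear in the device config
    severity: str        # severity of the alarm raised on violation
    recommendation: str  # contextual hint attached to the alarm


@dataclass
class Violation:
    device: str
    policy: str
    severity: str
    recommendation: str


# Hypothetical policies: logs go to the right server, traps to the right host.
POLICIES = [
    Policy("syslog-server", "logging host 192.0.2.10", "major",
           "Add 'logging host 192.0.2.10'"),
    Policy("snmp-trap-target", "snmp-server host 192.0.2.20", "minor",
           "Add 'snmp-server host 192.0.2.20'"),
]


def check_device(device: str, config: str, policies: List[Policy]) -> List[Violation]:
    """Return one Violation per policy whose required line is missing."""
    present = {line.strip() for line in config.splitlines()}
    return [
        Violation(device, p.name, p.severity, p.recommendation)
        for p in policies
        if p.required_line not in present
    ]


def compliance_report(violations: List[Violation]) -> List[str]:
    """Format violations as plain lines suitable for an audit trail."""
    return [
        f"{v.device}: {v.policy} violated ({v.severity}); {v.recommendation}"
        for v in violations
    ]
```

For a device whose configuration contains the logging line but not the SNMP line, check_device returns a single "snmp-trap-target" violation. Keeping check_device (remediation side) separate from compliance_report (audit side) is one simple way to reflect the separation of duties described above.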

The solution: DX NetOps

With DX NetOps, teams can establish the processes and controls needed to guard against configuration errors, maintain performance, and protect the bottom line.

For decades, customers have been relying on the network configuration management capabilities in DX NetOps. The platform has proven to provide unparalleled scalability. Today, the platform supports some of the largest enterprises and service providers, including environments with more than 150,000 devices.

The platform offers comprehensive network configuration management capabilities, helping teams with detecting configuration changes, pushing out changes, enforcing policies, reporting and auditing, and more. With the platform, teams can acquire timely, accurate intelligence on configuration changes. This intelligence is essential not just for the teams responsible for maintaining configurations, but also for those responsible for responding to faults.

Key capabilities

DX NetOps supports the four steps outlined above, enabling teams to define policies and violation actions, repair violations, and manage reporting and auditing. Here’s an overview of how the solution can help.

Defining, monitoring, and managing configurations

The platform can push configurations, capture configurations, and export them for long-term compliance. The solution supports the repair of policy violations.

Alerting

Teams can see when violations occur and get all the required details. Operators can receive text and email alerts when configurations are changed. The solution can identify and report on more than 70 configuration-related events.
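
As a rough illustration of change-based alerting in general, the Python sketch below compares two captured configurations and builds an alert message when they differ. It uses only the standard library difflib module and does not represent how DX NetOps detects changes; the function names config_changes and build_alert and the sample device name are hypothetical.

```python
import difflib
from typing import List, Optional


def config_changes(previous: str, current: str) -> List[str]:
    """Return added ('+') and removed ('-') lines between two captures."""
    diff = list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm="",
    ))
    # The first two lines are the file headers; hunk markers start with '@@'.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def build_alert(device: str, previous: str, current: str) -> Optional[str]:
    """Return an alert message if the configuration changed, else None."""
    changes = config_changes(previous, current)
    if not changes:
        return None
    return f"Configuration changed on {device}:\n" + "\n".join(changes)
```

The resulting message could then be handed to whatever text or email notification channel a team already uses.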

Configuration auditing

The platform offers support for a range of auditing approaches in order to validate compliance, including simple single-line statements, multi-line blocks, and regular expression matches. Users can also pass configurations through a script and audit new configurations before they are applied to systems.
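
The three matching styles mentioned above can be sketched generically in Python. This is an illustration of the concepts under stated assumptions, not the platform's own audit syntax: the function names and the sample IOS-style configuration are hypothetical.

```python
import re
from typing import List

# Hypothetical candidate configuration, audited before it is applied.
CANDIDATE = """hostname core-1
interface GigabitEthernet0/1
 description uplink
 no shutdown
ntp server 192.0.2.30
"""


def audit_single_line(config: str, statement: str) -> bool:
    """True if the exact statement appears as a line in the config."""
    return any(line.strip() == statement for line in config.splitlines())


def audit_block(config: str, block: List[str]) -> bool:
    """True if the block's lines appear consecutively, in order."""
    lines = [line.strip() for line in config.splitlines()]
    wanted = [line.strip() for line in block]
    n = len(wanted)
    return any(lines[i:i + n] == wanted for i in range(len(lines) - n + 1))


def audit_regex(config: str, pattern: str) -> bool:
    """True if the pattern matches anywhere, with ^ and $ anchored per line."""
    return re.search(pattern, config, flags=re.MULTILINE) is not None


def failed_checks(config: str) -> List[str]:
    """Run a few example checks and return the names of those that fail."""
    checks = {
        "ntp-configured": audit_regex(config, r"^ntp server \d+\.\d+\.\d+\.\d+$"),
        "uplink-described": audit_block(
            config, ["interface GigabitEthernet0/1", "description uplink"]),
        "syslog-configured": audit_single_line(config, "logging host 192.0.2.10"),
    }
    return [name for name, passed in checks.items() if not passed]
```

Running failed_checks against a candidate configuration before it is pushed gives a simple pre-apply gate: an empty list means the candidate passes every check in this sketch.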

Reporting

DX NetOps streamlines the process of collecting and sharing intelligence across teams.

In many enterprises, engineering, deployment, and monitoring are handled by different teams, often in different organizations. By providing unified visibility and control, the platform helps teams from across these organizations to identify cross-silo issues and collaborate more effectively.

Using the platform’s global collection capabilities, teams can intelligently group monitoring policies and notifications, enabling efficient operations and timely intelligence sharing. (For more information on the power of these capabilities, be sure to review my post, Optimize Network Asset Organization with Global Collections in DX NetOps.)

DX NetOps also facilitates effective fault remediation. The solution’s unified intelligence helps teams responsible for fault management to correlate outages with configuration changes, enabling faster, more effective troubleshooting and remediation.
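
The idea of correlating outages with configuration changes can be illustrated with a short Python sketch. It assumes faults and changes are available as simple (device, timestamp) pairs and that a change is "related" if it happened on the same device within a fixed window before the fault. Both the data shape and the 30-minute default are assumptions for this example, not a description of the platform's correlation logic.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[str, datetime]  # (device name, time of the event)


def correlate(faults: List[Event], changes: List[Event],
              window: timedelta = timedelta(minutes=30)) -> Dict[Event, List[datetime]]:
    """Map each fault to the config changes on the same device that
    occurred no more than `window` before the fault."""
    result: Dict[Event, List[datetime]] = {}
    for device, fault_time in faults:
        result[(device, fault_time)] = [
            change_time
            for change_device, change_time in changes
            if change_device == device
            and timedelta(0) <= fault_time - change_time <= window
        ]
    return result
```

A fault with a non-empty list of recent changes is a natural starting point for troubleshooting, since the most recent configuration change is often the first thing an operator will want to review.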

Conclusion

To minimize the incidence and duration of network outages, effective network configuration management is more critical than ever. With DX NetOps, teams can harness the advanced network configuration management capabilities they need. With the platform, teams can establish policies, track and report on compliance, and rapidly detect and address issues when they arise. To learn more, watch our Small Bytes webcast, Automatically Fix Non-Compliant Network Devices Before an Outage.