December 5, 2024
Are Our Networks Ready for AI?
Why Network Outages Will Derail Your AI Initiatives
Written by: Jeremy Rossbach
With all the hype surrounding AI, it’s critical to focus on building resilient networks capable of handling the performance demands that AI will introduce. As I often say, if your network observability solution doesn’t detect packet loss, your AI engine won’t either. When you ask, “What’s the status of our global network health this morning?” a flawed or incomplete response could jeopardize critical decisions.
What does the impact look like?
Consider Meta’s documented challenges in contending with AI’s impact on their network operations.
In 2022, the team at Meta foresaw that the growing demands of AI on their backbone network would be unpredictable and turbulent. The company faced a tsunami of AI data traffic, driven by a huge increase in their GPU builds.
Figure 1: Meta discovered that 57% of its AI workload data was spent on network operations, highlighting the critical role of network I/O in enabling AI success. (Source: Alexis Bjorlin, 2022 OCP Global Summit)
“The impact of large clusters, generative AI (GenAI), and artificial general intelligence (AGI) is still an unknown,” said Jyotsna Sundaresan, a Meta network strategist. “We haven’t fully explored what that means for the backend.”
Meta’s networking team saw a higher-than-anticipated increase in backbone traffic, leaving them unprepared. As Sundaresan acknowledged, “We were not ready for this.”
It’s a warning echoed by Gartner below: Are any of our networks truly prepared for AI workloads?
Figure 2: Gartner’s analysis on how gen AI will reshape network infrastructure.
Consumer tolerance vs. AI expectations
Today’s consumer may tolerate some buffering or latency in certain scenarios. For instance, during the Mike Tyson/Jake Paul fight, the stream occasionally buffered at critical moments. While frustrating, it didn’t ruin the overall experience—it was free, and viewers could still catch the highlights and find out who won.
AI workloads, however, are far less forgiving. Massive data volumes, complex traffic patterns, and unprecedented bandwidth requirements will demand a level of network performance we’ve never seen before.
For example, imagine walking into a network operations center and asking your AI assistant, “What should I prioritize this morning?” If your Seattle branch office is dropping packets and your observability solution isn’t catching it, your AI engine won’t either. It will reply with a misleading, “Everything’s okay.”
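Here is a minimal Python sketch of that failure mode. The site names, the packet-loss dictionary, and the 1% threshold are illustrative assumptions, not taken from any particular observability product. The idea is that a health summary should separate “healthy” from “no data,” so an assistant built on top of it cannot report a blind spot as good news.

```python
from typing import Dict, List, Optional

LOSS_THRESHOLD_PCT = 1.0  # assumed alert threshold, for illustration only


def summarize_health(expected_sites: List[str],
                     loss_pct: Dict[str, Optional[float]]) -> Dict[str, List[str]]:
    """Classify each expected site as healthy, degraded, or unmonitored."""
    summary: Dict[str, List[str]] = {"healthy": [], "degraded": [], "unmonitored": []}
    for site in expected_sites:
        loss = loss_pct.get(site)  # None when no measurement was collected
        if loss is None:
            summary["unmonitored"].append(site)
        elif loss > LOSS_THRESHOLD_PCT:
            summary["degraded"].append(site)
        else:
            summary["healthy"].append(site)
    return summary


def answer(summary: Dict[str, List[str]]) -> str:
    """Build the reply an assistant could give from the summary."""
    if summary["degraded"]:
        return "Prioritize: " + ", ".join(summary["degraded"])
    if summary["unmonitored"]:
        return "Unknown: no packet-loss data for " + ", ".join(summary["unmonitored"])
    return "Everything's okay."


# Hypothetical inputs: Seattle is expected but nothing is measuring it.
summary = summarize_health(["Seattle", "Austin"], {"Austin": 0.0})
print(answer(summary))
```

With these hypothetical inputs, the reply flags Seattle as unknown rather than folding it into an all-clear.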
Cloud and SD-WAN complications
Many AI initiatives will incorporate cloud and SD-WAN components, introducing additional challenges. With data transfers, front-end user interfaces, backups, and high availability systems potentially spread across multiple locations, visibility across the entire network path is essential.
If your observability solution doesn’t collect end-to-end delivery metrics from both internally and externally managed networks, you’ll struggle to troubleshoot when an end user reports an issue.
A real-world use case
During a recent interview on The NetOps Expert podcast, I spoke with Julian Guthrie, CEO of Alphy, a startup using AI to improve human communication. When I asked what the typical network path looks like for a user of their solution, the answer was something like this:
👩💻 Hybrid worker → 🛜 Home/Hotel WiFi → 📡 Residential/Local ISP → 🚦 Transit ISPs → 🌦️ Cloud/SaaS Provider (hosting the AI solution).
Seems straightforward, right? But here’s the catch: Does your network team have administrative access to any of these networks? Likely not. If a user reports poor performance, will you tell them to call their ISP?
Figure 3: Visibility into the entire network path the user experience relies upon reveals poor performance of unmanaged cloud and ISP network devices.
Of course not. Regardless of who owns the infrastructure, your organization is responsible for delivering a seamless user experience. It’s your brand, revenue, and reputation on the line. A mature network observability solution provides visibility into ISP and cloud infrastructure, enabling you to triage such issues effectively.
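To make the triage step concrete, here is a small Python sketch that walks the path described above and picks out the segment contributing the most packet loss. The `Segment` type, the `managed` flag, and every number are hypothetical sample values; a real observability tool would supply per-segment measurements in its own format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    managed: bool       # True if your team administers this segment
    latency_ms: float   # latency attributed to this segment (sample value)
    loss_pct: float     # packet loss attributed to this segment (sample value)


def worst_segment(path: List[Segment]) -> Segment:
    """Return the segment with the highest loss, breaking ties by latency."""
    return max(path, key=lambda s: (s.loss_pct, s.latency_ms))


# Hypothetical measurements along the user's path to the AI service.
path = [
    Segment("Home/Hotel WiFi", False, 8.0, 0.1),
    Segment("Residential/Local ISP", False, 12.0, 0.0),
    Segment("Transit ISPs", False, 45.0, 2.5),
    Segment("Cloud/SaaS Provider", False, 6.0, 0.0),
]

culprit = worst_segment(path)
print(f"{culprit.name}: {culprit.loss_pct}% loss, managed={culprit.managed}")
```

In this sample, none of the segments are managed by the network team, which is exactly why per-segment data matters: it tells you where the problem sits even when you cannot log in to the device.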
The stakes are high
Earlier this year, tens of thousands of AT&T users experienced widespread service disruptions. AT&T restored service quickly, acknowledged the disruption, and pledged to prevent similar outages in the future.
With AI’s growing role in our networks, disruptions like this will have far greater consequences. AI-driven systems rely on flawless performance, and even minor issues can derail critical initiatives.
Preparing for an AI-driven future
AI is not just on the horizon—it’s already here. The success of your AI initiatives will depend on the resilience of your network. Start building that resilience today to meet the demands AI will place on your infrastructure tomorrow.
Because in the world of AI, there’s no room for “buffering.”
Jeremy Rossbach
As the Chief Technical Evangelist for NetOps by Broadcom, Jeremy is passionate about meeting with customers to identify their IT operational challenges and produce solutions that fit their business and network transformation goals. Prior to joining Broadcom, he spent more than 15 years working in IT, across both public...