Is AI the Spark That Ignites a New Networking Revolution?
That’s the question that Mansour Karam, cofounder of Apstra and now Juniper’s GVP of Products, AI Clusters & Data Center, set out to answer on day one of Cloud Field Day 20.
Mansour outlined two AI pillars for Juniper:
- AI for networking: how AI improves the functioning and management of the network
- Networking for AI: delivering the correct network infrastructure for AI operations
AI for Networking
Apstra delivers AI for networking through intent-based networking. The operator describes the desired state and configuration of the network, and the software pre-validates, tests, and delivers the desired outcome.
It’s a closed-loop system that snapshots the state before each change, so the configuration can be rolled back to any previous version. (Apstra is not just for Juniper: it is effectively vendor-agnostic, supporting a long list of other networking vendors’ solutions.)
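To make the closed loop more concrete, here is a minimal Python sketch of the general pattern described above: pre-validate a desired state, snapshot the running state before each change, verify the result against the intent, and roll back to any saved version. It is a toy under stated assumptions; the `IntentController` class, its methods, the `observe` hook, and the VLAN validation rule are hypothetical illustrations, not Apstra’s API or implementation.

```python
from copy import deepcopy
from typing import Callable, Dict, List

Config = Dict[str, dict]  # hypothetical shape, e.g. {"eth1": {"vlan": 10}}


class IntentController:
    """Toy intent-based controller. Hypothetical: not Apstra's API."""

    def __init__(self, observe: Callable[[Config], Config] = deepcopy):
        self.running: Config = {}          # configuration currently deployed
        self.snapshots: List[Config] = []  # state saved before each change
        self.observe = observe             # stand-in for telemetry collection

    def validate(self, intent: Config) -> List[str]:
        """Pre-validate the desired state before anything is deployed."""
        errors = []
        for port, cfg in intent.items():
            vlan = cfg.get("vlan")
            # Assumed rule for illustration: every port needs a VLAN in 1..4094.
            if not isinstance(vlan, int) or not 1 <= vlan <= 4094:
                errors.append(f"{port}: invalid VLAN {vlan!r}")
        return errors

    def apply(self, intent: Config) -> bool:
        """Deploy intent; return False if it was rejected or rolled back."""
        if self.validate(intent):
            return False                                # rejected, network untouched
        self.snapshots.append(deepcopy(self.running))  # snapshot before the change
        self.running = deepcopy(intent)
        # Closed loop: compare the observed state with the intent; undo on mismatch.
        if self.observe(self.running) != intent:
            self.rollback(len(self.snapshots) - 1)
            return False
        return True

    def rollback(self, version: int) -> None:
        """Restore the configuration saved in snapshot number `version`."""
        self.running = deepcopy(self.snapshots[version])


ctl = IntentController()
ctl.apply({"eth1": {"vlan": 10}})    # True
ctl.apply({"eth1": {"vlan": 20}})    # True
ctl.apply({"eth1": {"vlan": 9999}})  # False: rejected by pre-validation
ctl.rollback(1)                      # restores {"eth1": {"vlan": 10}}
```

Taking a snapshot before every change is what makes rolling back to an arbitrary earlier version cheap: `rollback(i)` simply restores snapshot `i`. A real system would gather the observed state from device telemetry rather than from the stand-in `observe` function assumed here.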
Delivering intent-based networking involves collecting a tremendous amount of data, and that data becomes the training data for the AI engines.
Intent-based networking is deterministic while AI is probabilistic, and the combination can be very powerful.
Apstra is now application-aware, using both application-level and network-level flow data. You can now ask the network whether it is working well for every single app.
The goal is to deliver the highest reliability for the network, so Apstra runs on-premises while the AI engine runs in the cloud. The proof point of the value of AI for networking: 68% of customers in Q1 upgraded from base to premium services.
Networking for AI
Job completion time, meaning how long it takes to train a model, is the critical measure for AI. The largest LLMs take months to train, and during that time you may have GPUs that are underutilized.
The network is a critical component of the AI infrastructure. And, according to Mansour, despite common wisdom and (mis)conception, Ethernet, not InfiniBand, is the network technology for AI.
As an example of how far Ethernet switching silicon has scaled: in 2010, a single Broadcom chip supported 640Gbps. Today, a single Broadcom chip can support in excess of 54Tbps.
Operating the network for AI is critical to both performance and reliability, and that’s where Apstra shines, using its closed-loop system to fine-tune the network toward both goals.
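Taking the quoted figures at face value, that is at least an 84-fold increase in per-chip capacity:

```latex
\frac{54\ \text{Tbps}}{640\ \text{Gbps}} = \frac{54\,000\ \text{Gbps}}{640\ \text{Gbps}} \approx 84.4
```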
Validated Designs
Juniper has built an in-house AI lab so that its team can become expert in the various types of AI operations and AI clusters. The team now understands the challenges most organizations face when building and deploying AI engines.
The result is a set of validated designs (guidelines, instructions, and training for Apstra’s AI) for the different types of AI clusters, such as inference and training.
The Network Revolution
I believe the network revolution really started with the cloud/DevOps revolution, where we came to understand the value of moving from manual configuration to defining the desired state of the entire system and using automation to achieve it.
Now, the dynamic performance, reliability, and scale requirements of AI are clearly the impetus for the massive investments needed to upgrade legacy networking to intent-based networking.
[Originally published on LinkedIn]