Arista Introduces Intelligent Innovations for AI Networking

EOS Smart AI Suite fuels peak AI workload performance.

Thursday, 13th March 2025, by Phil Alsop

Arista Networks has introduced advanced capabilities to maximize AI cluster performance and efficiency. Cluster Load Balancing (CLB) in Arista EOS® maximizes AI workload performance with consistent, low-latency network flows, while Arista CloudVision® Universal Network Observability™ (CV UNO™) now offers AI job-centric observability for enhanced troubleshooting and rapid issue inference, helping ensure job completion reliability at scale.

Powering Smart AI Networking

The Arista EOS Smart AI Suite is designed for AI-grade robustness and protection. It equips AI clusters with Cluster Load Balancing, a new Ethernet-based AI load balancing solution based on RDMA queue pairs that enables high bandwidth utilization between spines and leaves. AI clusters typically carry a small number of high-bandwidth flows, so basic load balancing methods are often inefficient for AI workloads, resulting in uneven traffic distribution and increased tail latency. CLB addresses this with RDMA-aware flow placement, which aims to deliver uniformly high performance for all flows while keeping tail latency low. CLB takes a global approach, optimizing traffic flow in both directions, leaf-to-spine and spine-to-leaf, for balanced utilization and consistently low latency.
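The effect described above, where a few large flows are spread unevenly by basic load balancing, can be illustrated with a minimal Python sketch. This is not Arista's CLB algorithm, which the announcement does not detail. It only contrasts static hash-based placement with a simple load-aware placement. The flow names (`qp-0` and so on), the 400 Gb/s flow size, and the four-uplink topology are assumptions made up for the illustration.

```python
import zlib

def hash_placement(flows, n_links):
    """Static hashing: each flow's uplink is picked by hashing its identifier,
    with no knowledge of how loaded each uplink already is."""
    load = [0] * n_links
    for flow_id, gbps in flows:
        link = zlib.crc32(flow_id.encode()) % n_links
        load[link] += gbps
    return load

def least_loaded_placement(flows, n_links):
    """Load-aware placement: put each flow (largest first) on the uplink
    that currently carries the least traffic."""
    load = [0] * n_links
    for _flow_id, gbps in sorted(flows, key=lambda f: -f[1]):
        link = min(range(n_links), key=lambda i: load[i])
        load[link] += gbps
    return load

# Hypothetical workload: 8 large flows of 400 Gb/s each across 4 uplinks.
flows = [(f"qp-{i}", 400) for i in range(8)]

hashed = hash_placement(flows, 4)
balanced = least_loaded_placement(flows, 4)

print("hash-based load per uplink: ", hashed)
print("load-aware load per uplink: ", balanced)
print("busiest uplink (hash / load-aware):", max(hashed), "/", max(balanced))
```

With so few flows, a hash has little chance to average out, so one uplink can end up carrying several flows while another sits idle. The load-aware placement in this sketch puts exactly two flows on each uplink, and the busiest uplink is the one that determines tail latency.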

“As Oracle continues to grow its AI infrastructure leveraging Arista switches, we see a need for advanced load balancing techniques to help avoid flow contentions and increase throughput in ML networks,” said Jag Brar, vice president and Distinguished Engineer, Oracle Cloud Infrastructure. “Arista’s Cluster Load Balancing feature helps do that.”

Holistic AI Observability

CV UNO, the AI-driven 360° Network Observability platform powered by Arista AVA™, delivers seamless, end-to-end AI job visibility by unifying network, system, and AI job data within the Arista Network Data Lake (NetDL™). The EOS NetDL Streamer is a real-time telemetry framework that continuously streams granular network data from Arista switches into NetDL. Unlike traditional SNMP polling, which relies on periodic queries and can miss critical updates, the EOS NetDL Streamer provides low-latency, high-frequency, event-driven insights into network performance, key to supercharging large-scale AI training and inferencing infrastructure. Designed for AI accelerator clusters, CV UNO accelerates impact analysis, pinpoints issues with precision, and enables rapid resolution, helping to minimize job completion times. Some of the key benefits include:

AI Job Monitoring – Unlocks a comprehensive view of AI job health metrics, including job completion times, congestion indicators (ECN-marked packets, PFC pause frames, packet drops), and buffer/link utilization for real-time insights.

Deep-Dive Analytics – Uncovers critical job-specific insights by analyzing network devices, server NICs (e.g., PFC out-of-sync events, RDMA errors, PCIe fatal errors), and associated flows — pinpointing performance bottlenecks with precision.

Flow Visualization – Harnesses the power of CV topology mapping to gain real-time, intuitive visibility into AI job flows at microsecond granularity — accelerating issue inference and resolution.

Proactive Resolution – Detects anomalies early and correlates network and compute performance within NetDL — ensuring uninterrupted, high-efficiency AI workload execution.
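The contrast drawn earlier between periodic SNMP polling and event-driven streaming can be sketched in a few lines of Python. This does not use the EOS NetDL Streamer or any Arista API. The queue-depth trace, the polling interval, and the threshold are all hypothetical values chosen to show how a short burst can fall between two polls.

```python
def poll(samples, interval):
    """Periodic polling: the collector only sees the value at every
    `interval`-th tick and nothing in between."""
    return [(t, samples[t]) for t in range(0, len(samples), interval)]

def stream_events(samples, threshold):
    """Event-driven reporting: emit (tick, value) whenever the value
    crosses the threshold upward, regardless of any timer."""
    events = []
    above = False
    for t, value in enumerate(samples):
        if value >= threshold and not above:
            events.append((t, value))
        above = value >= threshold
    return events

# Hypothetical queue-depth trace: idle at 5, with a burst lasting three ticks.
queue_depth = [5] * 100
queue_depth[42:45] = [90, 95, 88]

polled = poll(queue_depth, interval=10)
events = stream_events(queue_depth, threshold=80)

print("highest depth seen by polling:", max(v for _, v in polled))
print("events reported by streaming: ", events)
```

In this trace the poller samples ticks 0, 10, 20 and so on, so the burst at ticks 42 to 44 never appears in its data, while the event-driven path reports it at the tick where it begins.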

Arista AI Centers Driven by AVA

Arista’s Etherlink™ AI Platforms deliver ultra-high-performance, standards-based Ethernet systems for next-gen AI networks. Offered as 800G/400G fixed, modular, and distributed platforms that are forward-compatible with Ultra Ethernet Consortium (UEC), Etherlink scales from small AI clusters to massive deployments with 100,000+ accelerators. Arista also features the AI Analyzer, powered by Arista AVA, which delivers high-resolution traffic data at 100-microsecond intervals so that network administrators can optimize performance, quickly troubleshoot issues, and make informed decisions for AI-driven networks. Arista AVA also powers a remote EOS AI Agent that streams telemetry from SuperNICs or servers to NetDL, ensuring seamless network monitoring, debugging, and QoS consistency across the entire stack.
