Arista introduces intelligent innovations for AI networking

EOS Smart AI Suite fuels peak AI workload performance.

Thursday, 13th March 2025. Posted by Phil Alsop

Arista Networks has introduced advanced capabilities to maximize AI cluster performance and efficiency. Cluster Load Balancing (CLB) in Arista EOS® maximizes AI workload performance with consistent, low-latency network flows, while Arista CloudVision® Universal Network Observability™ (CV UNO™) now offers AI job-centric observability for enhanced troubleshooting and rapid issue inference, ensuring job completion reliability at scale.

Powering Smart AI Networking

The Arista EOS Smart AI Suite is designed for AI-grade robustness and protection. It empowers AI clusters with an innovation called Cluster Load Balancing (CLB), a new Ethernet-based AI load balancing solution based on RDMA queue pairs that enables high bandwidth utilization between spines and leaves. AI clusters typically carry a small number of high-bandwidth flows. Basic load balancing methods are often inefficient for such workloads, resulting in uneven traffic distribution and increased tail latency. CLB addresses this with RDMA-aware flow placement, which ensures uniformly high performance for all flows while keeping tail latency low. CLB takes a global approach, optimizing traffic flow in both directions, leaf-to-spine and spine-to-leaf, for balanced utilization and consistently low latency.
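The imbalance described above can be illustrated with a small, generic sketch. It is not Arista's CLB algorithm, whose internals are not described here. It only contrasts static hash-based placement with a simple load-aware placement for a handful of large flows. The flow names, the 400 Gbps flow size, and the eight uplinks are hypothetical values chosen for illustration.

```python
import zlib

def hash_placement(flows, n_links):
    """Static hashing: a flow's link depends only on its identifier,
    so several large flows can collide on the same link."""
    load = [0] * n_links
    for flow_id, gbps in flows:
        load[zlib.crc32(flow_id.encode()) % n_links] += gbps
    return load

def least_loaded_placement(flows, n_links):
    """Load-aware placement (illustrative only): place the largest flows
    first, each onto the link that currently carries the least traffic."""
    load = [0] * n_links
    for _flow_id, gbps in sorted(flows, key=lambda f: -f[1]):
        idx = load.index(min(load))
        load[idx] += gbps
    return load

# Hypothetical workload: 8 large flows (one per RDMA queue pair) over 8 uplinks.
flows = [(f"qp-{i}", 400) for i in range(8)]
n_links = 8

hashed = hash_placement(flows, n_links)
balanced = least_loaded_placement(flows, n_links)

print("hash-based per-link load:", hashed)
print("load-aware per-link load:", balanced)
print("busiest link (hash-based):", max(hashed))
print("busiest link (load-aware):", max(balanced))
```

With few large flows, any hash collision leaves some links idle while others carry a multiple of the intended load, and the busiest link sets the tail latency. The load-aware variant spreads the same traffic evenly in this toy case.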

“As Oracle continues to grow its AI infrastructure leveraging Arista switches, we see a need for advanced load balancing techniques to help avoid flow contentions and increase throughput in ML networks,” said Jag Brar, vice president and Distinguished Engineer, Oracle Cloud Infrastructure. “Arista’s Cluster Load Balancing feature helps do that.”

Holistic AI Observability

CV UNO, the AI-driven 360° Network Observability platform powered by Arista AVA™, delivers seamless, end-to-end AI job visibility by unifying network, system, and AI job data within the Arista Network Data Lake (NetDL™). EOS NetDL Streamer is a real-time telemetry framework that continuously streams granular network data from Arista switches into NetDL. Unlike traditional SNMP polling, which relies on periodic queries and can miss critical updates, the EOS NetDL Streamer provides low-latency, high-frequency, event-driven insights into network performance, key to supercharging large-scale AI training and inferencing infrastructure. Designed for AI accelerator clusters, it accelerates impact analysis, pinpoints issues with precision, and enables rapid resolution, keeping job completion times to a minimum. Some of the key benefits include:

AI Job Monitoring – Unlocks a comprehensive view of AI job health metrics, including job completion times, congestion indicators (ECN-marked packets, PFC pause frames, packet drops), and buffer/link utilization for real-time insights.

Deep-Dive Analytics – Uncovers critical job-specific insights by analyzing network devices, server NICs (e.g., PFC out-of-sync events, RDMA errors, PCIe fatal errors), and associated flows — pinpointing performance bottlenecks with precision.

Flow Visualization – Harnesses the power of CV topology mapping to gain real-time, intuitive visibility into AI job flows at microsecond granularity — accelerating issue inference and resolution.

Proactive Resolution – Detects anomalies early and correlates network and compute performance within NetDL — ensuring uninterrupted, high-efficiency AI workload execution.
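The contrast drawn earlier between periodic SNMP polling and event-driven streaming can be shown with a toy sketch. It does not use the EOS NetDL Streamer or any Arista API. The function names, the queue-depth series, and the 10-tick polling interval are hypothetical and exist only to show why a short congestion spike can fall between two polls.

```python
def poll(samples, interval):
    """Periodic polling: read the value only every `interval` ticks."""
    return [(t, v) for t, v in enumerate(samples) if t % interval == 0]

def stream_on_change(samples):
    """Event-driven streaming: emit an update whenever the value changes."""
    events = []
    last = None
    for t, v in enumerate(samples):
        if v != last:
            events.append((t, v))
            last = v
    return events

# Hypothetical queue depth per tick, with a brief spike at ticks 13 and 14.
queue_depth = [0] * 30
queue_depth[13] = 90
queue_depth[14] = 95

polled = poll(queue_depth, interval=10)    # samples ticks 0, 10 and 20
streamed = stream_on_change(queue_depth)   # one event per state change

print("peak seen by polling:  ", max(v for _, v in polled))    # 0
print("peak seen by streaming:", max(v for _, v in streamed))  # 95
```

The polled view never sees the spike because it begins and ends between two samples, while the change-driven view records its onset, peak, and end.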

Arista AI Centers Driven by AVA

Arista’s Etherlink™ AI Platforms deliver ultra-high-performance, standards-based Ethernet systems for next-gen AI networks. Offering 800G/400G fixed, modular, and distributed platforms that are forward-compatible with Ultra Ethernet Consortium (UEC), Etherlink scales from small AI clusters to massive deployments with 100,000+ accelerators. Arista features the AI Analyzer, powered by Arista AVA, which delivers high-resolution traffic data at 100-microsecond intervals. This allows network administrators to optimize performance, quickly troubleshoot issues, and make informed decisions for AI-driven networks. Arista AVA also powers a remote EOS AI Agent that streams telemetry from SuperNICs or servers to NetDL, ensuring seamless network monitoring, debugging, and QoS consistency across the entire stack.
