Arista Introduces Intelligent Innovations for AI Networking

EOS Smart AI Suite fuels peak AI workload performance.

Thursday, 13th March 2025. Posted by Phil Alsop

Arista Networks has introduced advanced capabilities to maximize AI cluster performance and efficiency. Cluster Load Balancing (CLB) in Arista EOS® maximizes AI workload performance with consistent, low-latency network flows, while Arista CloudVision® Universal Network Observability™ (CV UNO™) now offers AI job-centric observability for enhanced troubleshooting and rapid issue inference, ensuring job completion reliability at scale.

Powering Smart AI Networking

The Arista EOS Smart AI Suite is designed for AI-grade robustness and protection. It equips AI clusters with Cluster Load Balancing, a new Ethernet-based AI load balancing solution based on RDMA queue pairs that enables high bandwidth utilization between spines and leaves. AI clusters typically carry a small number of high-bandwidth flows. Basic load balancing methods, such as static per-flow hashing, are often inefficient for these workloads: with so few flows, several large flows can land on the same link while others sit idle, resulting in uneven traffic distribution and increased tail latency. CLB addresses this with RDMA-aware flow placement, ensuring uniform high performance for all flows while keeping tail latency low. CLB also takes a global approach, optimizing traffic in both directions, leaf-to-spine and spine-to-leaf, to ensure balanced utilization and consistent low latency.
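To see why a handful of large flows defeats basic hashing, the Python sketch below compares static per-flow hashing (the conventional ECMP-style approach) with a simple flow-aware greedy placement that puts each flow on the least-loaded uplink. It is an illustration under stated assumptions only, not Arista's CLB algorithm: the `Flow` fields, the CRC32-based hash, the 400 Gb/s flow size, and the four-uplink leaf are all hypothetical, and the sketch balances only one direction (leaf-to-spine), whereas CLB is described as optimizing both.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Flow:
    # Hypothetical flow identity: endpoints plus an RDMA queue pair (QP) number.
    src: str
    dst: str
    qp: int
    gbps: float


def hash_placement(flows: List[Flow], n_uplinks: int) -> List[float]:
    """Static per-flow hashing: each flow's uplink depends only on its identifiers,
    never on current load (a stand-in for basic ECMP-style hashing)."""
    load = [0.0] * n_uplinks
    for flow in flows:
        key = f"{flow.src}|{flow.dst}|{flow.qp}".encode()
        load[zlib.crc32(key) % n_uplinks] += flow.gbps
    return load


def greedy_placement(flows: List[Flow], n_uplinks: int) -> List[float]:
    """Flow-aware placement: assign flows, largest first, to the least-loaded uplink."""
    load = [0.0] * n_uplinks
    for flow in sorted(flows, key=lambda fl: fl.gbps, reverse=True):
        target = min(range(n_uplinks), key=lambda i: load[i])
        load[target] += flow.gbps
    return load


if __name__ == "__main__":
    # Eight equal 400 Gb/s flows leaving one leaf over four spine uplinks.
    flows = [Flow("gpu0", f"gpu{i + 8}", qp=i, gbps=400.0) for i in range(8)]
    print("hashed per-uplink load (Gb/s):", hash_placement(flows, 4))
    print("greedy per-uplink load (Gb/s):", greedy_placement(flows, 4))
    # greedy prints [800.0, 800.0, 800.0, 800.0]
```

Whatever values the hash happens to produce, the greedy result is the best possible here: with eight equal flows on four uplinks, any deviation from two flows per uplink leaves at least one uplink carrying 1,200 Gb/s or more, which is exactly the uneven distribution and tail-latency risk described above.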

“As Oracle continues to grow its AI infrastructure leveraging Arista switches, we see a need for advanced load balancing techniques to help avoid flow contentions and increase throughput in ML networks,” said Jag Brar, vice president and Distinguished Engineer, Oracle Cloud Infrastructure. “Arista’s Cluster Load Balancing feature helps do that.”

Holistic AI Observability

CV UNO, the AI-driven 360° Network Observability platform powered by Arista AVA™, delivers seamless, end-to-end AI job visibility by unifying network, system, and AI job data within the Arista Network Data Lake (NetDL™). The EOS NetDL Streamer is a real-time telemetry framework that continuously streams granular network data from Arista switches into NetDL. Unlike traditional SNMP polling, which relies on periodic queries and can miss critical updates that occur between them, the EOS NetDL Streamer provides low-latency, high-frequency, event-driven insights into network performance, key to supercharging large-scale AI training and inferencing infrastructure. Designed for AI accelerator clusters, it accelerates impact analysis, pinpoints issues with precision, and enables rapid resolution, ensuring job completion times are minimized. Some of the key benefits include:

AI Job Monitoring – Unlocks a comprehensive view of AI job health metrics, including job completion times, congestion indicators (ECN-marked packets, PFC pause frames, packet drops), and buffer/link utilization for real-time insights.

Deep-Dive Analytics – Uncovers critical job-specific insights by analyzing network devices, server NICs (e.g., PFC out-of-sync events, RDMA errors, PCIe fatal errors), and associated flows — pinpointing performance bottlenecks with precision.

Flow Visualization – Harnesses the power of CV topology mapping to gain real-time, intuitive visibility into AI job flows at microsecond granularity — accelerating issue inference and resolution.

Proactive Resolution – Detects anomalies early and correlates network and compute performance within NetDL — ensuring uninterrupted, high-efficiency AI workload execution.
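The contrast drawn above between periodic SNMP polling and event-driven streaming can be illustrated with a toy Python simulation. A queue-depth series containing a three-tick microburst is read two ways: by a poller that samples every 30 ticks, and by an exporter that emits a record each time the value crosses a threshold. The tick length, threshold, and queue-depth values are hypothetical, and the code does not use or model the actual EOS NetDL Streamer interface.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (tick, queue depth)


def polled_samples(series: List[int], interval: int) -> List[Sample]:
    """Periodic polling: read the value only once every `interval` ticks."""
    return [(t, series[t]) for t in range(0, len(series), interval)]


def threshold_events(series: List[int], threshold: int) -> List[Sample]:
    """Event-driven export: emit a record whenever the value crosses `threshold`
    in either direction, so short-lived excursions are reported as they happen."""
    events: List[Sample] = []
    above = False
    for t, value in enumerate(series):
        if (value >= threshold) != above:
            above = not above
            events.append((t, value))
    return events


def microburst_series() -> List[int]:
    """Hypothetical queue depth: steady at 10 cells, bursting to 900 during ticks 42-44."""
    depth = [10] * 100
    for t in range(42, 45):
        depth[t] = 900
    return depth


if __name__ == "__main__":
    depth = microburst_series()
    print("polled every 30 ticks:", polled_samples(depth, 30))
    # -> [(0, 10), (30, 10), (60, 10), (90, 10)]: the burst is never seen
    print("threshold events:", threshold_events(depth, 500))
    # -> [(42, 900), (45, 10)]: burst start and end are both reported
```

The poller's view is indistinguishable from an idle queue, while the event-driven exporter reports the burst with only two records, which is why push-based telemetry suits short-lived congestion in AI fabrics.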

Arista AI Centers Driven by AVA

Arista’s Etherlink™ AI Platforms deliver ultra-high-performance, standards-based Ethernet systems for next-generation AI networks. Offering 800G/400G fixed, modular, and distributed platforms that are forward-compatible with Ultra Ethernet Consortium (UEC) specifications, Etherlink scales from small AI clusters to massive deployments with 100,000+ accelerators. Arista also offers the AI Analyzer, powered by Arista AVA, which delivers high-resolution traffic data at 100-microsecond intervals, allowing network administrators to optimize performance precisely, troubleshoot issues quickly, and make informed decisions for AI-driven networks. Arista AVA also powers a remote EOS AI Agent, which streams telemetry from SuperNICs or servers to NetDL, ensuring seamless network monitoring, debugging, and QoS consistency across the entire stack.
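To put the AI Analyzer's 100-microsecond resolution in perspective, the Python sketch below works out how many bytes an 800G port can carry in one interval and shows how averaging over a longer window hides a saturated interval. The per-interval byte counts are made-up inputs, and the functions are illustrative helpers, not part of any Arista API or data format.

```python
from typing import List


def max_bytes_per_interval(link_gbps: int, interval_us: int) -> int:
    """Line-rate capacity in bytes: 1 Gb/s carries 1,000 bits per microsecond."""
    return link_gbps * 1_000 * interval_us // 8


def per_interval_utilization(byte_deltas: List[int], link_gbps: int, interval_us: int) -> List[float]:
    """Utilization of each sampling interval, computed from byte-counter deltas."""
    cap = max_bytes_per_interval(link_gbps, interval_us)
    return [d / cap for d in byte_deltas]


def window_utilization(byte_deltas: List[int], link_gbps: int, interval_us: int) -> float:
    """Utilization averaged over the whole window, as a coarser sampler would report it."""
    cap = max_bytes_per_interval(link_gbps, interval_us)
    return sum(byte_deltas) / (cap * len(byte_deltas))


if __name__ == "__main__":
    print(max_bytes_per_interval(800, 100))  # 10000000 bytes (10 MB) per 100 microseconds at 800 Gb/s
    # Ten 100-microsecond intervals (1 ms): light traffic, then one interval at full line rate.
    deltas = [1_000_000] * 9 + [10_000_000]
    print(max(per_interval_utilization(deltas, 800, 100)))  # 1.0: one interval is saturated
    print(window_utilization(deltas, 800, 100))  # 0.19: the 1 ms average looks lightly loaded
```

At 100-microsecond granularity the saturated interval is plain to see, whereas a single 1 ms sample would report 19% utilization and mask it.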
