Arista introduces intelligent innovations for AI networking

EOS Smart AI Suite fuels peak AI workload performance.

  • Thursday, 13th March 2025, posted by Phil Alsop

Arista Networks has introduced advanced capabilities to maximize AI cluster performance and efficiency. Cluster Load Balancing (CLB) in Arista EOS® maximizes AI workload performance with consistent, low-latency network flows, while Arista CloudVision® Universal Network Observability™ (CV UNO™) now offers AI job-centric observability for enhanced troubleshooting and rapid issue inference, ensuring job completion reliability at scale.

Powering Smart AI Networking

The Arista EOS Smart AI Suite is designed for AI-grade robustness and protection. It empowers AI clusters with an innovation called Cluster Load Balancing, a new Ethernet-based AI load balancing solution based on RDMA queue pairs that enables high bandwidth utilization between spines and leaves. AI clusters usually carry a small number of high-bandwidth flows. Basic load balancing methods are often inefficient for such workloads, resulting in uneven traffic distribution and increased tail latency. CLB addresses this by using RDMA-aware flow placement to ensure uniform high performance for all flows while keeping tail latency low. CLB takes a global approach, optimizing traffic flow in both directions, leaf-to-spine and spine-to-leaf, to ensure balanced utilization and consistent low latency.
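To illustrate why a handful of large flows is hard on basic load balancing, the following is a minimal Python sketch, not Arista's CLB algorithm. It compares a static hash of a flow identifier with a simple greedy "least-loaded link" placement. The flow sizes, link count, and the function names `hash_placement` and `least_loaded_placement` are hypothetical, chosen only for illustration.

```python
import zlib

def hash_placement(flows, n_links):
    """Place each flow on an uplink chosen by a static hash of its ID.

    This stands in for basic hash-based load balancing; collisions
    between large flows are possible because load is never consulted.
    """
    load = [0] * n_links
    for flow_id, size in flows:
        link = zlib.crc32(str(flow_id).encode()) % n_links
        load[link] += size
    return load

def least_loaded_placement(flows, n_links):
    """Place each flow on whichever uplink currently carries the least traffic.

    A simple greedy stand-in for load-aware placement (an assumption,
    not the method described in the article).
    """
    load = [0] * n_links
    for _flow_id, size in sorted(flows, key=lambda f: f[1], reverse=True):
        link = load.index(min(load))
        load[link] += size
    return load

# Eight equal flows of 100 units each over eight uplinks (illustrative numbers).
flows = [(flow_id, 100) for flow_id in range(8)]
n_links = 8

hashed = hash_placement(flows, n_links)
balanced = least_loaded_placement(flows, n_links)

print("hash-based per-link load:  ", hashed)
print("least-loaded per-link load:", balanced)
print("busiest link (hash-based):  ", max(hashed))
print("busiest link (least-loaded):", max(balanced))
```

With so few flows, any two that hash to the same link double its load while another link sits idle. The load-aware placement spreads the same traffic evenly, which is the general effect the article attributes to flow-aware placement.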

“As Oracle continues to grow its AI infrastructure leveraging Arista switches, we see a need for advanced load balancing techniques to help avoid flow contentions and increase throughput in ML networks,” said Jag Brar, vice president and Distinguished Engineer, Oracle Cloud Infrastructure. “Arista’s Cluster Load Balancing feature helps do that.”

Holistic AI Observability

CV UNO, the AI-driven 360° Network Observability platform powered by Arista AVA™, delivers seamless, end-to-end AI job visibility by unifying network, system, and AI job data within the Arista Network Data Lake (NetDL™). The EOS NetDL Streamer is a real-time telemetry framework that continuously streams granular network data from Arista switches into NetDL. Unlike traditional SNMP polling, which relies on periodic queries and can miss critical updates, the EOS NetDL Streamer provides low-latency, high-frequency, event-driven insights into network performance, which is key to supercharging large-scale AI training and inferencing infrastructure. Designed for AI accelerator clusters, it accelerates impact analysis, pinpoints issues with precision, and enables rapid resolution, keeping job completion times to a minimum. Some of the key benefits include:

AI Job Monitoring – Unlocks a comprehensive view of AI job health metrics, including job completion times, congestion indicators (ECN-marked packets, PFC pause frames, packet drops), and buffer/link utilization for real-time insights.

Deep-Dive Analytics – Uncovers critical job-specific insights by analyzing network devices, server NICs (e.g., PFC out-of-sync events, RDMA errors, PCIe fatal errors), and associated flows — pinpointing performance bottlenecks with precision.

Flow Visualization – Harnesses the power of CV topology mapping to gain real-time, intuitive visibility into AI job flows at microsecond granularity — accelerating issue inference and resolution.

Proactive Resolution – Detects anomalies early and correlates network and compute performance within NetDL — ensuring uninterrupted, high-efficiency AI workload execution.
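The contrast drawn above between periodic polling and event-driven streaming can be shown with a small Python sketch. It is a toy model under stated assumptions, not the EOS NetDL Streamer or any SNMP implementation. The queue-depth trace, the polling interval, and the function names `polled_view` and `streamed_view` are all hypothetical.

```python
def polled_view(samples, interval):
    """Return the (tick, value) pairs seen by a poller that reads every `interval` ticks."""
    return [(t, v) for t, v in enumerate(samples) if t % interval == 0]

def streamed_view(samples):
    """Return a (tick, value) pair every time the value changes (event-driven updates)."""
    events = []
    previous = None
    for t, v in enumerate(samples):
        if v != previous:
            events.append((t, v))
            previous = v
    return events

# Hypothetical queue-depth trace with a short congestion burst at ticks 3 and 4.
queue_depth = [0, 0, 0, 90, 95, 0, 0, 0, 0, 0]

polled = polled_view(queue_depth, interval=5)
streamed = streamed_view(queue_depth)

print("polled peak:  ", max(v for _, v in polled))    # prints 0
print("streamed peak:", max(v for _, v in streamed))  # prints 95
```

The poller samples at ticks 0 and 5 and never sees the burst, while the change-driven stream records it at ticks 3 and 4. That is the sense in which periodic queries "can miss critical updates".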

Arista AI Centers Driven by AVA

Arista’s Etherlink™ AI Platforms deliver ultra-high-performance, standards-based Ethernet systems for next-gen AI networks. Offering 800G/400G fixed, modular, and distributed platforms that are forward-compatible with Ultra Ethernet Consortium (UEC), Etherlink scales from small AI clusters to massive deployments with 100,000+ accelerators. Arista features the AI Analyzer, powered by Arista AVA, which delivers high-resolution traffic data at 100-microsecond intervals. This allows network administrators to optimize performance, quickly troubleshoot issues, and make informed decisions for AI-driven networks. Arista AVA also powers a remote EOS AI Agent that streams telemetry from SuperNICs or servers to NetDL, ensuring seamless network monitoring, debugging, and QoS consistency across the entire stack.
