Arista introduces intelligent innovations for AI networking

EOS Smart AI Suite fuels peak AI workload performance.

Thursday, 13th March 2025, by Phil Alsop

Arista Networks has introduced advanced capabilities to maximize AI cluster performance and efficiency. Cluster Load Balancing (CLB) in Arista EOS® maximizes AI workload performance with consistent, low-latency network flows, while Arista CloudVision® Universal Network Observability™ (CV UNO™) now offers AI job-centric observability for enhanced troubleshooting and rapid issue inference, ensuring job completion reliability at scale.

Powering Smart AI Networking

The Arista EOS Smart AI Suite is designed for AI-grade robustness and protection. It equips AI clusters with Cluster Load Balancing, a new Ethernet-based AI load balancing solution based on RDMA queue pairs that enables high bandwidth utilization between spines and leaves. AI clusters typically carry a small number of high-bandwidth flows. Basic load balancing methods are often inefficient for such workloads, resulting in uneven traffic distribution and increased tail latency. CLB addresses this with RDMA-aware flow placement, which ensures uniform high performance for all flows while keeping tail latency low. CLB takes a global approach, optimizing traffic flow in both directions, leaf-to-spine and spine-to-leaf, for balanced utilization and consistently low latency.
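The imbalance described above can be illustrated with a toy simulation. This is a minimal sketch, not Arista's CLB algorithm: the function names, the flow identifiers, and the 100 Gbps flow size are all assumptions made for illustration. It compares static hash-based placement with a simple load-aware placement when a handful of large flows share a few uplinks.

```python
import zlib

def hash_placement(flows, num_links):
    """Static hashing: a flow's uplink depends only on a hash of its identifier."""
    load = [0] * num_links
    for flow_id, gbps in flows:
        link = zlib.crc32(flow_id.encode()) % num_links
        load[link] += gbps
    return load

def least_loaded_placement(flows, num_links):
    """Load-aware placement: each flow goes to the currently least-loaded uplink."""
    load = [0] * num_links
    # Place the largest flows first so the greedy choice balances well.
    for flow_id, gbps in sorted(flows, key=lambda f: -f[1]):
        link = min(range(num_links), key=lambda i: load[i])
        load[link] += gbps
    return load

# Hypothetical workload: 8 large flows of 100 Gbps each across 4 uplinks.
flows = [(f"qp-{i}", 100) for i in range(8)]
hashed = hash_placement(flows, 4)
balanced = least_loaded_placement(flows, 4)

print("hash-based load per uplink:", hashed)
print("load-aware load per uplink:", balanced)
```

With so few flows, hashing cannot rely on statistical averaging, so some uplinks may carry several flows while others sit idle. The load-aware variant spreads the same flows evenly, which is the general idea behind placing flows with knowledge of the traffic rather than by hash alone.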

“As Oracle continues to grow its AI infrastructure leveraging Arista switches, we see a need for advanced load balancing techniques to help avoid flow contentions and increase throughput in ML networks,” said Jag Brar, vice president and Distinguished Engineer, Oracle Cloud Infrastructure. “Arista’s Cluster Load Balancing feature helps do that.”

Holistic AI Observability

CV UNO, the AI-driven 360° Network Observability platform powered by Arista AVA™, delivers seamless, end-to-end AI job visibility by unifying network, system, and AI job data within the Arista Network Data Lake (NetDL™). EOS NetDL Streamer is a real-time telemetry framework that continuously streams granular network data from Arista switches into NetDL. Unlike traditional SNMP polling, which relies on periodic queries and can miss critical updates, the EOS NetDL Streamer provides low-latency, high-frequency, event-driven insights into network performance, key to supercharging large-scale AI training and inferencing infrastructure. Designed for AI accelerator clusters, it accelerates impact analysis, pinpoints issues with precision, and enables rapid resolution, so that job completion times are minimized. Some of the key benefits include:

AI Job Monitoring – Unlocks a comprehensive view of AI job health metrics, including job completion times, congestion indicators (ECN-marked packets, PFC pause frames, packet drops), and buffer/link utilization for real-time insights.

Deep-Dive Analytics – Uncovers critical job-specific insights by analyzing network devices, server NICs (e.g., PFC out-of-sync events, RDMA errors, PCIe fatal errors), and associated flows — pinpointing performance bottlenecks with precision.

Flow Visualization – Harnesses the power of CV topology mapping to gain real-time, intuitive visibility into AI job flows at microsecond granularity — accelerating issue inference and resolution.

Proactive Resolution – Detects anomalies early and correlates network and compute performance within NetDL — ensuring uninterrupted, high-efficiency AI workload execution.
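The contrast drawn earlier between periodic SNMP polling and event-driven streaming can be shown with a small simulation. This is a sketch under stated assumptions, not the EOS NetDL Streamer API: the function names and the queue-depth values are hypothetical. It shows how a short congestion burst can fall between two polls yet still be captured by change-driven updates.

```python
def poll(samples, interval):
    """Periodic polling: only every `interval`-th sample is observed."""
    return [(t, v) for t, v in enumerate(samples) if t % interval == 0]

def stream_on_change(samples):
    """Event-driven streaming: emit an update whenever the value changes."""
    events = []
    last = None
    for t, v in enumerate(samples):
        if v != last:
            events.append((t, v))
            last = v
    return events

# Hypothetical queue-depth trace (percent full) with a two-tick burst.
queue_depth = [0] * 20
queue_depth[7] = 95
queue_depth[8] = 90

polled = poll(queue_depth, 5)          # observes ticks 0, 5, 10, 15
streamed = stream_on_change(queue_depth)

print("peak seen by polling:  ", max(v for _, v in polled))
print("peak seen by streaming:", max(v for _, v in streamed))
```

In this trace the poller samples only the quiet ticks and reports a peak of 0, while the change-driven stream records the burst at ticks 7 and 8. Shortening the poll interval narrows the gap, but at the cost of more queries per device.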

Arista AI Centers Driven by AVA

Arista’s Etherlink™ AI Platforms deliver ultra-high-performance, standards-based Ethernet systems for next-gen AI networks. Offering 800G/400G fixed, modular, and distributed platforms that are forward-compatible with Ultra Ethernet Consortium (UEC), Etherlink scales from small AI clusters to massive deployments with 100,000+ accelerators. Arista features the AI Analyzer, powered by Arista AVA, which delivers high-resolution traffic data at 100-microsecond intervals. This allows network administrators to optimize performance, quickly troubleshoot issues, and make informed decisions for AI-driven networks. Arista AVA also powers a remote EOS AI Agent that streams telemetry from SuperNICs or servers to NetDL, ensuring seamless network monitoring, debugging, and QoS consistency across the entire stack.
