The connected thermal-power ecosystem for AI

By Michael Poto - Product Manager - Global Chilled Water Systems at Vertiv.

Monday, 20th April 2026

The explosion of artificial intelligence (AI) workloads is fundamentally reshaping critical digital infrastructure. Rack densities have escalated dramatically, rising from a few kW to tens of kW per rack and, in some cases, exceeding 100 kW or even several hundred kW. These shifts are accompanied by dynamic thermal profiles and load patterns dictated by the alternating demands of AI training and inference cycles.

In this scenario, treating power and thermal management as isolated domains is no longer viable. AI loads are dynamic, and their rapid fluctuations create electrical ripples that can cause thermal hotspots and instability. Without synchronised responses, data centre operators face risks such as temperature excursions, unnecessary overprovisioning, degraded performance, and potential downtime.

AI necessitates a connected ecosystem

These risks can be reduced through a comprehensive end-to-end connected ecosystem that spans from chip-level heat capture to facility-scale heat rejection. By tightly integrating advanced liquid and air cooling technologies with a unified controls architecture, thermal responses can align with real-time power consumption and compute demands. This can help to deliver consistent performance and energy efficiency, even under the most dynamic AI workloads, creating a cohesive system that anticipates and adapts to AI's demands.

AI clusters generate high-amplitude, variable load changes that propagate through the electrical chain and manifest as sudden thermal events. Disjointed power and cooling responses can lead to system inefficiencies or failures. Data centre operators must therefore unify the entire thermal chain - from on-rack heat capture to plant-side rejection and potentially heat reuse - under a single, data-driven control strategy. This coherent reaction to workload dynamics is essential for maintaining stability, avoiding overprovisioning, and preserving uptime in critical deployments.

Heat capture 

Direct-to-chip (DTC) liquid cooling is a cornerstone of high-density infrastructure. Modern coolant distribution units (CDUs) enable precise, scalable heat removal aligned with instantaneous AI compute needs. Recent expansions in EMEA have seen the introduction of new CDU models, including 70 kW, 100 kW, and even 2300 kW capacities, in both in-rack and in-row configurations. They support both liquid-to-air and liquid-to-liquid loops for retrofits or greenfield deployments. 
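As a rough illustration of how heat load, coolant temperature rise, and flow rate relate in a direct-to-chip loop, the sketch below applies the standard Q = ṁ·cp·ΔT relationship. The figures are hypothetical and are not taken from any particular CDU specification.

```python
# Illustrative back-of-envelope sizing for direct-to-chip liquid cooling
# (hypothetical figures; not from any specific CDU datasheet).
# The governing relationship is Q = m_dot * cp * dT, so the required coolant
# flow follows from the heat load and the allowable temperature rise.

CP_WATER = 4186.0        # specific heat of water, J/(kg.K)
RHO_WATER = 997.0        # density of water, kg/m^3

def required_flow_lpm(heat_load_kw: float, delta_t_k: float) -> float:
    """Coolant flow (litres per minute) needed to absorb heat_load_kw
    with a coolant temperature rise of delta_t_k kelvin."""
    mass_flow_kg_s = (heat_load_kw * 1000.0) / (CP_WATER * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / RHO_WATER
    return volume_flow_m3_s * 1000.0 * 60.0   # m^3/s -> L/min

# Example: a 100 kW rack cooled with a 10 K coolant temperature rise
print(f"{required_flow_lpm(100, 10):.0f} L/min")   # roughly 144 L/min
```

The takeaway is simply that higher rack powers, or tighter allowable temperature rises, translate directly into higher coolant flow, which is what drives CDU and pump sizing.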

Rear-door heat exchangers (RDHx) offer a practical bridge for sites transitioning to higher densities. Mounted at the rack rear, they capture heat before room entry, which eases perimeter or overhead cooling burdens, minimising recirculation and stabilising inlet temperatures in mixed loads. 

In-row cooling delivers targeted, granular heat removal for residual loads. Features such as variable-speed compressors, EC fans, electronic expansion valves, and integrated intelligence enable real-time monitoring, group coordination, and seamless supervisory interaction, making these units ideal for mixed-density aisles and incremental scaling.

Slab-floor thermal wall technology supports gallery-side high-volume, low-speed air delivery, integrating with chilled-water architectures for traditional and hybrid setups.

Heat rejection

The closing stage of the thermal management chain is heat rejection. Rather than simply being discarded, the rejected heat can be captured and repurposed for applications such as district heating networks, industrial heating processes, and agricultural operations, improving the overall energy efficiency of the facility.

The challenge is that thermal profiles and operating limits for AI-intensive deployments are still evolving: rapid advances in chip architectures, rack densities, and workload characteristics are expected to produce a broad spectrum of heat fluxes and temperature requirements.

This variability complicates the decision to commit to a specific chilled-water supply temperature. Fixing a narrow setpoint risks inefficiency, capacity shortfalls under high loads, or excessive energy consumption. This underscores the need for adaptable, future-ready cooling designs that can flexibly accommodate evolving demands without major retrofits.

Trimming the cooling load is a good approach for maximising free cooling in high-density AI operations. It is ideal when the data centre can operate at elevated water temperatures, and it can enable free cooling for most of the year. With this approach, the latitude of the installation becomes largely irrelevant, unlocking consistent performance across diverse climates. This helps to reduce energy consumption and supports future-ready architectures designed around higher temperature setpoints. It also offers the flexibility to handle unforeseen circumstances, operating across a wide range of water temperatures, from traditional setpoints up to 40°C, keeping options open when a clear choice is not yet available.
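One way to see why elevated water temperatures matter is to estimate, for a given climate, the share of hours in a year during which heat rejection can run without mechanical cooling. The sketch below is a minimal illustration under assumed values: a dry cooler with a 5 K approach temperature and a synthetic ambient profile standing in for real site weather data. It is not a sizing tool.

```python
# Minimal sketch of how raising the facility water setpoint expands the
# free-cooling window. Assumes a dry cooler with a fixed approach
# temperature; the hourly ambient profile would come from site weather data.

from typing import Iterable
import random

def free_cooling_fraction(hourly_ambient_c: Iterable[float],
                          supply_setpoint_c: float,
                          approach_k: float = 5.0) -> float:
    """Fraction of hours where a dry cooler alone can meet the setpoint,
    i.e. ambient + approach <= desired supply temperature."""
    temps = list(hourly_ambient_c)
    free_hours = sum(1 for t in temps if t + approach_k <= supply_setpoint_c)
    return free_hours / len(temps)

# Hypothetical comparison on the same synthetic climate profile
random.seed(0)
ambient = [random.gauss(14, 8) for _ in range(8760)]   # stand-in for weather data
for setpoint in (20, 30, 40):
    print(setpoint, f"{free_cooling_fraction(ambient, setpoint):.0%}")
```

Run against real hourly weather data, the same comparison shows how moving the supply setpoint from around 20°C towards 40°C can turn free cooling from a seasonal benefit into the dominant operating mode in many climates.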

Centrifugal chillers provide reliable cooling capacity under variable or extreme conditions, at partial or full loads, or where lower supply water temperatures are still required. They are a stable, scalable backbone for heat rejection when free cooling is limited, helping to maximise energy savings and maintain stability, which preserves power for the AI load.

Free cooling inverter screw technology is a reliable and robust alternative that works effectively across a wide range of operating conditions. The inverter drive continuously adjusts the compressor motor speed in real time based on the cooling load, enabling precise capacity modulation, reduced energy consumption, smoother operation with minimal pressure fluctuations, and a lower starting current. Inverter-driven screw technology has become a standard in modern industrial and commercial applications, from compressed-air systems to HVAC, as a way to achieve higher efficiency.
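As a simple illustration of what inverter-driven capacity modulation means in practice, the sketch below maps the instantaneous cooling load to a compressor speed within an assumed operating envelope, rather than cycling a fixed-speed machine on and off. The speed limits are hypothetical, not taken from any specific chiller.

```python
# Minimal sketch of inverter-driven capacity modulation: compressor speed
# tracks the cooling load between minimum and maximum limits instead of
# cycling on/off. Values are illustrative, not from any specific chiller.

MIN_SPEED_HZ = 25.0    # assumed minimum stable compressor speed
MAX_SPEED_HZ = 60.0    # assumed maximum compressor speed

def compressor_speed(load_fraction: float) -> float:
    """Map the current cooling load (0.0-1.0 of rated capacity) to a
    compressor speed, clamped to the drive's operating envelope."""
    load_fraction = max(0.0, min(1.0, load_fraction))
    return MIN_SPEED_HZ + load_fraction * (MAX_SPEED_HZ - MIN_SPEED_HZ)

for load in (0.2, 0.5, 0.9):
    print(f"load {load:.0%} -> {compressor_speed(load):.0f} Hz")
```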

The controls layer: Unified orchestration

Technology alone doesn’t make a system AI-ready; orchestration does. Multi-tier controls that provide unit-level intelligence, supervisory orchestration, and plant management can unify assets such as chillers, CDUs, pumps, and CRAHs.

A control stack should span unit-level intelligence, supervisory coordination, and central chilled-water plant management:

Unit-level cooling controls provide hundreds of data points per unit, self-healing routines, and protective logic that keeps individual devices from crossing unsafe thresholds.

Supervisory thermal orchestration centralises sensor and unit data across the room or site, leveraging machine-to-machine coordination to harmonise setpoints, airflow, and water temperatures across zones.

Chilled-water plant management transforms the chilled-water system into a predictive, self-optimising engine.

This three-tier control is the connective tissue that links power draw to thermal response in real time, aligning chiller setpoints, pump curves, fan speeds, and zone strategies with actual compute demand. It’s also the foundation for fault tolerance, coordinating failover behaviour to avoid cascading trips across hybrid cooling architectures.
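To make that connective tissue concrete, below is a minimal sketch of the kind of supervisory loop that could link measured power draw to coordinated thermal commands for one zone. Every interface, threshold, and rule in it is a hypothetical stand-in; a real deployment would act through the site's BMS and the unit-level controls described above.

```python
# Minimal sketch of the supervisory tier described above: poll power draw,
# derive the expected heat load, and push coordinated setpoints to the
# cooling assets. All names and interfaces here are hypothetical, standing
# in for whatever BMS / unit-level APIs a real deployment exposes.

from dataclasses import dataclass

@dataclass
class ZoneState:
    rack_power_kw: float      # measured IT power draw in the zone
    supply_water_c: float     # current chilled-water supply temperature
    return_water_c: float     # current return water temperature

def orchestrate(zone: ZoneState,
                target_return_c: float = 45.0,
                max_supply_c: float = 40.0) -> dict:
    """Derive coordinated commands for one zone from its power draw.

    Assumes essentially all IT power becomes heat, and that raising the
    supply setpoint (up to max_supply_c) is preferred whenever the return
    temperature leaves headroom, to maximise free cooling."""
    heat_load_kw = zone.rack_power_kw                 # power in ~= heat out
    headroom = target_return_c - zone.return_water_c  # +ve: we can run warmer
    new_supply = min(max_supply_c, zone.supply_water_c + 0.5 * headroom)
    return {
        "expected_heat_load_kw": heat_load_kw,
        "chilled_water_setpoint_c": round(new_supply, 1),
        "cdu_pump_demand": "increase" if headroom < 0 else "hold",
    }

print(orchestrate(ZoneState(rack_power_kw=800, supply_water_c=32, return_water_c=42)))
```

The point is not the specific rule but the pattern: the supervisory tier treats measured IT power as the leading indicator of heat load and adjusts water temperatures and pump demand ahead of, rather than in reaction to, the thermal excursion.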

For operators, this visibility also supports better decision-making. Understanding how heat moves through the facility makes it easier to evaluate new technologies, validate design assumptions, and manage risk as AI deployments scale. When paired with a comprehensive services contract that includes end-to-end support across the entire thermal chain, from initial design and commissioning through to ongoing optimisation, continuous reliability can be achieved through expert deployment and predictive maintenance.

The thermal chain: architected, instrumented, optimised

AI data centres don’t just need more cooling. They need smarter cooling, tightly coupled with power. A connected ecosystem that spans chip-level heat capture, hybrid heat rejection, slab-floor air delivery, and multi-tier controls is built to anticipate, coordinate, and adapt. The result is a system that stays efficient at high temperatures, stable at high densities, and fault-tolerant under dynamic loads, ready for the realities of AI factories.