Transforming data centres to meet AI’s evolving demands

By Kamlesh Patel, VP Data Center Market Development at CommScope.

Posted by Phil Alsop on Sunday, 22nd September 2024

While 2023 marked a pivotal moment in recognising the vast potential of artificial intelligence, 2024 kicked off what is becoming a truly transformative period. AI’s broad applications, ranging from machine learning and deep learning to natural language processing, have seamlessly integrated into our everyday lives, revolutionising how we live, work and connect.

 

As AI’s popularity soars, data centre managers and their teams are grappling not only with the surge of petabytes of data flooding their networks, but also with the need for ultra-low latency. They must also meet the increased power demands and higher fibre counts that AI workloads require.

Similarly, the rise of artificial intelligence has caused a fundamental shift in data centre design, significantly impacting network infrastructure in areas such as cabling, connectivity, architecture, resilience and adaptability. 

 

Here are the key challenges and opportunities that I believe come with cabling AI data centres, along with some best practices and tips for success.

 

The unstoppable surge in power demand

Regions that house data centres are experiencing a surge in power demand. In the Republic of Ireland, for example, data centres now consume over 20% of the country's electricity, a significant increase from just 5% in 2015. Consequently, for the first time, there is no guarantee that the power needed to support data centre operations can be reliably supplied.

 

Recently, the ‘net zero’ goals of major tech companies have been challenged by this increasing power demand, a direct consequence of AI and energy-hungry data centres. Google reported a 48% increase in its greenhouse gas emissions over the past five years, largely due to the growth of its data centres, while Microsoft’s Scope 3 emissions have risen by over 30% since 2020. 

To strike a balance between enhancing sustainability and expanding capacity and performance, data centres will require support from their infrastructure technology partners.

  

Ultra-low latency meets ultra-high connectivity solutions

The models used to train and run AI consume significant processing capacity and are typically too large for a single machine to handle, so processing them requires numerous interconnected GPUs distributed across multiple servers and racks. This presents a unique challenge for the cabling infrastructure that links everything together to keep data flowing.

For instance, GPU servers demand significantly higher connectivity between servers, but due to power and heat limitations, fewer servers can be housed per rack. As a result, AI data centres require more inter-rack cabling compared to traditional data centres. Each GPU server is linked to a switch within the same row or room, with these connections needing 400G and 800G speeds over distances that traditional copper cables like DACs, AECs or ACCs can’t handle. Moreover, every server must also be connected to the switch fabric, storage, and out-of-band management. 

 

In an ideal setup, GPU servers in an AI cluster would be close together, because AI and machine learning algorithms, like high-performance computing (HPC), are highly sensitive to latency. It’s estimated that 30% of the time spent running a large training model is due to network latency, while 70% is spent on compute. To reduce latency, AI clusters strive to keep GPU servers in close proximity, with most links limited to 100 metres. However, not all data centres can place GPU server racks in the same row. A rack of GPU servers can require over 40 kW, far more than a typical server rack, forcing traditional data centres to space these racks out accordingly.
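
To put the 30%/70% split in perspective, here is a minimal Python sketch of how a reduction in network latency might translate into total run time. It assumes the two shares simply add up and do not overlap, which is a simplification. The function name, the 100-hour baseline and the latency cuts are hypothetical values chosen for illustration, not figures from any real cluster.

```python
def estimated_run_hours(baseline_hours, network_share=0.30, latency_cut=0.0):
    """Estimate run time after trimming the network-bound share of a job.

    network_share: fraction of baseline time attributed to network latency
                   (0.30 is the estimate quoted in the article).
    latency_cut:   hypothetical fractional reduction in that network time.
    """
    if not 0.0 <= network_share <= 1.0:
        raise ValueError("network_share must be between 0 and 1")
    if not 0.0 <= latency_cut <= 1.0:
        raise ValueError("latency_cut must be between 0 and 1")
    compute_hours = baseline_hours * (1.0 - network_share)
    network_hours = baseline_hours * network_share
    # Only the network-bound portion shrinks; compute time is unchanged.
    return compute_hours + network_hours * (1.0 - latency_cut)


if __name__ == "__main__":
    baseline = 100.0  # hypothetical 100-hour training run
    for cut in (0.0, 0.25, 0.5):
        hours = estimated_run_hours(baseline, latency_cut=cut)
        print(f"latency cut {cut:.0%}: about {hours:.1f} hours")
```

Under these assumptions, halving the network time takes the hypothetical 100-hour run to roughly 85 hours, and even removing network latency entirely could not take it below the 70-hour compute floor.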

 

Although extra space isn’t feasible in the densely packed server rack layouts of modern data centres, managing the narrow, congested pathways and the added cabling complexities brought by AI is made possible through innovations like rollable ribbon fibre. 

This design allows for the installation of up to six 3,456-fibre cables within a single four-inch duct, providing more than double the density compared to traditionally packed fibres.
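
The arithmetic behind that figure can be sketched in a few lines of Python. The cable and fibre counts come from the text above. Treating "four-inch" as the duct's inner diameter is an assumption made here purely to get a rough density, and the function names are illustrative.

```python
import math

CABLES_PER_DUCT = 6       # from the article
FIBRES_PER_CABLE = 3456   # from the article
DUCT_DIAMETER_IN = 4.0    # "four-inch duct"; assumed here to be the inner diameter


def fibres_per_duct(cables=CABLES_PER_DUCT, fibres_per_cable=FIBRES_PER_CABLE):
    """Total fibre count carried by one fully loaded duct."""
    return cables * fibres_per_cable


def fibres_per_square_inch(total_fibres, duct_diameter_in=DUCT_DIAMETER_IN):
    """Rough fibre density over the duct's circular cross-section."""
    area = math.pi * (duct_diameter_in / 2.0) ** 2
    return total_fibres / area


if __name__ == "__main__":
    total = fibres_per_duct()
    print(f"fibres per duct: {total}")  # 6 x 3,456 = 20,736
    print(f"approx. fibres per square inch: {fibres_per_square_inch(total):.0f}")
```

The density figure is only as good as the inner-diameter assumption, so it is best read as an order-of-magnitude guide when planning pathways.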

In the rollable ribbon fibre cable, the fibres are attached intermittently to form a loose web. This design makes the ribbon more flexible, allowing the fibres to flex with a degree of independence from one another. The fibres can now be “rolled” into a cylinder, making much better use of space when compared with flat ribbons.

The cables are lighter, which simplifies handling and installation, and their intermittent bonding enables installers to position the fibres naturally into a smaller cross-section, making them well suited to splicing.

 

Data centre architecture of the future

Looking to the future, the value proposition for data centres will hinge on their extensive processing and storage capabilities, and operators will need to thoughtfully select the optical transceivers and fibre cables for their AI clusters.

Because links in an AI cluster are short, the optics cost is primarily driven by the transceiver rather than the fibre. Transceivers that utilise parallel fibres are particularly beneficial because they eliminate the need for optical multiplexers and demultiplexers, which are typically required for wavelength division multiplexing (WDM). This results in reduced costs and lower power consumption for transceivers with parallel fibre.
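
The trade-off between the two approaches shows up in the fibre count. The Python sketch below compares a parallel-fibre link with a WDM link, assuming each lane in a parallel link uses one fibre per direction while a WDM link carries all lanes on a single fibre per direction. The four-lane links and the 1,024-link cluster are hypothetical values chosen for illustration, and the class and function names are not from any real library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkOption:
    name: str
    lanes: int
    uses_wdm: bool

    @property
    def fibres_per_link(self):
        # WDM carries every lane as a separate wavelength on one fibre per
        # direction; parallel optics give each lane its own fibre per direction.
        return 2 if self.uses_wdm else 2 * self.lanes

    @property
    def needs_mux_demux(self):
        # Only WDM needs optical multiplexers and demultiplexers.
        return self.uses_wdm


def fibres_for_cluster(option, links):
    """Total fibres needed for a given number of identical links."""
    return option.fibres_per_link * links


if __name__ == "__main__":
    links = 1024  # hypothetical number of links in a cluster
    options = [
        LinkOption("parallel fibre", lanes=4, uses_wdm=False),
        LinkOption("WDM", lanes=4, uses_wdm=True),
    ]
    for opt in options:
        print(
            f"{opt.name}: {opt.fibres_per_link} fibres per link, "
            f"{fibres_for_cluster(opt, links)} fibres in total, "
            f"mux/demux needed: {opt.needs_mux_demux}"
        )
```

In this sketch the parallel option needs four times as many fibres as the WDM option, which is one reason simpler transceivers go hand in hand with higher fibre counts in the cabling plant.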

Links up to 100 metres are supported by both singlemode and multimode fibre applications, and advances such as silicon photonics have lowered the cost of singlemode transceivers.

 

In many AI clusters, active optical cables (AOCs) are used to interconnect GPUs spread over many servers and racks. These cables are usually designed for short distances and are commonly used with multimode fibre and VCSELs. The transmitters and receivers in an AOC may be the same types used in analogous transceivers, but they are typically the castoffs. These components don’t need to meet stringent interoperability requirements since they are only required to work with the specific unit attached to the other end of the cable. Additionally, since the optical connectors are not accessible to the installer, there is no need for specialised skills to clean and inspect fibre connectors.

 

Strategic planning for AI cluster cabling

In summary, data centres must evolve and adapt to meet the growing demands of artificial intelligence in business applications and customer service delivery. Infrastructure designers and planners must focus on improving efficiency, scalability, and sustainability. Key to these advancements is the upgrade of cabling systems, which will help reduce costs, energy usage, and installation times. By embracing these innovations, data centre facilities will be well-equipped to manage both current and future AI-driven workloads.
