Traditional data center workloads vs AI workloads

By Isaac Douglas, CRO at servers.com.

There’s no doubt that AI is growing ever more impressive, but those capabilities bring new challenges in terms of compute power. And, as more industries find novel ways to pack AI into their systems, the pressure on AI’s underpinning infrastructure is reaching an inflection point.

This technological momentum is reshaping the data center landscape, because AI-centric workloads demand far more from their underlying infrastructure than conventional applications do. From specialized hardware to energy-efficient designs and high-performance networking, these new requirements are influencing how existing data centers are retrofitted, and how entirely new facilities are designed, constructed and operated.

Supporting AI’s GPU needs

Underneath the bells and whistles of these AI tools are GPU-centric architectures. Unlike traditional data center workloads that rely on CPU-based processing, AI technologies need the massively parallel processing capabilities of GPUs. Think about all of the simultaneous calculations that AI models perform - GPUs can carry out thousands of them at the same time, whereas CPUs are better suited to sequential tasks. It’s why AI companies rely on GPUs to train and run models effectively.
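To make that difference concrete, here is a minimal sketch (not from the original article) that times the same matrix multiplication on a CPU and, if one is available, on a GPU. It assumes PyTorch is installed; the exact speed-up will vary with the hardware involved.

    import time
    import torch

    def time_matmul(device, size=4096):
        # Build two random matrices directly on the target device.
        a = torch.randn(size, size, device=device)
        b = torch.randn(size, size, device=device)
        if device == "cuda":
            torch.cuda.synchronize()   # finish setup before starting the clock
        start = time.perf_counter()
        _ = a @ b
        if device == "cuda":
            torch.cuda.synchronize()   # GPU kernels run asynchronously; wait for completion
        return time.perf_counter() - start

    print(f"CPU: {time_matmul('cpu'):.3f} s")
    if torch.cuda.is_available():
        print(f"GPU: {time_matmul('cuda'):.3f} s")
    else:
        print("No CUDA GPU detected; skipping GPU timing")

On typical hardware the GPU finishes many times faster, and it is exactly this kind of parallel arithmetic that dominates model training and inference.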

But with greater processing capability comes the need for more electricity. GPUs are power hungry, consuming far more than standard CPU-based workloads. Google has predicted that machine learning deployments will start to demand more than 500 kW per IT rack within the next five years; a traditional data center rack, by contrast, draws between 5 kW and 30 kW on average. This is no small change, and data center operators are grappling with support issues that go far beyond the financial cost of providing that power.
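As a back-of-the-envelope illustration using the figures above (and assuming, purely for simplicity, that racks run flat out around the clock), the gap per rack looks like this:

    HOURS_PER_YEAR = 24 * 365

    traditional_rack_kw = 30   # top of the 5-30 kW range quoted above
    ai_rack_kw = 500           # projected high-density AI rack

    for label, kw in (("traditional", traditional_rack_kw), ("AI", ai_rack_kw)):
        mwh_per_year = kw * HOURS_PER_YEAR / 1000
        print(f"{label:>11} rack: {kw:>3} kW  ->  {mwh_per_year:,.0f} MWh per year")

    print(f"Power ratio: roughly {ai_rack_kw / traditional_rack_kw:.0f}x")

Even against the top of the traditional range, a 500 kW rack draws well over fifteen times the power, before any cooling overhead is counted.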

Power and cooling

It’s reported that up to 40% of the energy data centers use goes towards cooling, and this share is likely to grow. It’s no surprise, then, that power-hungry AI tools, which generate vast quantities of heat, are pushing data center operators to reconfigure their cooling setups to make them as efficient as possible.
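For context, that 40% figure maps roughly onto Power Usage Effectiveness (PUE), the standard ratio of total facility energy to IT energy. The short sketch below makes the simplifying assumption that cooling is the only non-IT overhead:

    # Power Usage Effectiveness = total facility energy / IT equipment energy.
    # Simplifying assumption: cooling is the only non-IT overhead.
    cooling_fraction = 0.40          # share of total facility energy spent on cooling

    it_fraction = 1.0 - cooling_fraction
    pue = 1.0 / it_fraction
    print(f"Implied PUE: {pue:.2f}")  # ~1.67; leading hyperscale sites often report nearer 1.1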

The same goes for developers, who are increasingly on the lookout for locations with natural advantages such as cooler climates and access to plentiful renewable energy. Canada and Iceland, for example, offer an abundance of hydropower and geothermal energy, which provide sustainable, cost-effective options for data centers managing high-density AI workloads. But the compromise has to come from somewhere - often in the form of increased distance from end users, which can add latency to AI systems.

Meanwhile, in warmer areas, teams are getting creative with cutting-edge cooling technology like liquid cooling, and even systems that pump coolant right to the chips. The reality is that many new data centers are popping up in middle-ground locations: not too hot, not too remote, and with decent access to renewables. With some advanced cooling systems brought into the mix, many data center operators have found a solid solution for keeping AI humming along efficiently.

The underpinning network infrastructure

Powering and cooling AI are two crucial issues, but feeding AI models with the right data - at speed - is another challenge. Network links in traditional data centers, built with regular CPUs in mind, usually top out at around 10-20 gigabits per second. Data-hungry AI models operate far beyond this, demanding much greater bandwidth to shuttle massive amounts of data between GPUs and other AI hardware.
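To put that gap in perspective, the sketch below estimates how long a hypothetical 10 GB payload (say, gradients synchronized between GPUs during training) would take to move at different link speeds. Both the payload size and the link speeds are illustrative assumptions, not figures from this article:

    payload_gigabytes = 10                 # hypothetical data moved between GPUs
    payload_gigabits = payload_gigabytes * 8

    for link_gbps in (10, 25, 100, 400):
        seconds = payload_gigabits / link_gbps
        print(f"{link_gbps:>3} Gbps link: {seconds:5.1f} s per transfer")

At 10 Gbps the transfer takes eight seconds; at 400 Gbps it takes a fifth of a second, which is the difference between GPUs sitting idle and GPUs staying busy.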

This explains why there is a push to upgrade the networking infrastructure that underpins AI tools. Data center developers are funneling investment into high-performance networking solutions like high-speed interconnects, which facilitate rapid data transfer between GPU clusters and specialized AI processors such as Tensor Processing Units (TPUs).

If we want AI to work smoothly and actually live up to the hype - achieving fast, reliable, low-latency performance - then the networking backbone has to be just as advanced as the processing side. Without it, even the flashiest AI hardware wouldn’t be nearly as impressive or capable.

The great AI race

What’s clear is that the AI race is driving a similar scramble in the data center world. Some operators foresaw the impact of AI and the shift it would bring to infrastructure years ago, but for others it’s a race against time to catch up by developing data centers capable of supporting advanced AI. This is no mean feat, given that it can take years not only to secure funding but also to construct an appropriate facility.

AI workloads can run in any data center, but it’s clear that not all environments can support them in a way that is both cost- and energy-efficient. The coming years will be incredibly telling for data center operators as they strike the balance between supporting AI adoption, putting the right infrastructure in place, and offering pricing models that work for their customers.
