
Supermicro Liquid-Cooling Solutions are Ready for AI

by Jordan Ranous

Supermicro has demonstrated its leadership in liquid-cooling solutions by showcasing its latest innovations at Computex 2024 in Taiwan. These new solutions are designed to deliver superior performance and efficiency for large-scale AI and cloud-scale compute infrastructure.

Liquid-Cooled Rack Solutions for Enhanced Performance

Supermicro’s liquid-cooled rack solutions are engineered to support the highest-density configurations and the highest-TDP CPUs and GPUs, offering up to 100kW of power and cooling per rack. These solutions are fully validated and tested at the system (L10), rack (L11), and cluster (L12) levels to ensure reliability and performance. With accelerated lead times based on in-stock inventory, deployment can be achieved in weeks. The enterprise-grade components include redundant cooling pumps and power supplies, leak-proof connectors, and leak detection systems.

Modern servers equipped with multiple CPUs and GPUs generate significant heat, up to 10kW per server, posing a challenge for traditional air cooling systems. Supermicro’s liquid-cooling solutions are optimized for AI, HPC, and analytics applications, which require advanced CPU and GPU technologies that run hotter than previous generations. By integrating efficient liquid cooling, Supermicro reduces electricity demands at both the server and rack levels, enhancing performance and reducing costs.
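To put the two headline figures side by side, here is a minimal sketch of the rack-level arithmetic. It uses only the numbers quoted above (up to 100kW per rack, up to 10kW per server); the function name is hypothetical, and a real deployment would also budget for networking gear and headroom.

```python
def servers_per_rack(rack_budget_kw: float, server_load_kw: float) -> int:
    """Return how many whole servers fit within the rack's power/cooling budget."""
    if server_load_kw <= 0:
        raise ValueError("server_load_kw must be positive")
    # Floor division: a partially powered server does not count.
    return int(rack_budget_kw // server_load_kw)


RACK_BUDGET_KW = 100.0  # from the article: up to 100kW per rack
SERVER_LOAD_KW = 10.0   # from the article: up to 10kW per server

print(servers_per_rack(RACK_BUDGET_KW, SERVER_LOAD_KW))
```

At the quoted worst-case load, the rack budget covers ten such servers.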

Components of an Efficient Liquid-Cooling Solution

Supermicro’s liquid-cooled rack solutions comprise several key components designed to ensure high performance and reliability, including:

  • Coolant Distribution Unit (CDU): This unit circulates coolant to the cold plates, cooling the CPUs and GPUs. The CDU features two hot-swappable and redundant pumping and power supply modules, guaranteeing nearly 100% uptime. It supports up to 100kW cooling capacity and includes an easy-to-use touchscreen and WebUI access for monitoring and control.
  • Coolant Distribution Manifold (CDM): CDMs are the pipes that distribute coolant to each server and return the hotter coolant to the CDU. They are available vertically and horizontally to accommodate different rack designs and server types.
  • Hoses and Connectors: Flexible hoses transport coolant to the CPUs and GPUs, while single-handed, zero-drip quick disconnects allow for safe and efficient servicing of liquid-cooled systems.
  • Cold Plates: These are placed on top of the CPUs and GPUs, cooling them efficiently by flowing coolant through micro-sized channels. Supermicro cold plates are designed to reduce hot spots and achieve ultra-low thermal resistance.
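As a rough illustration of what the CDU's 100kW rating implies for the coolant loop, the standard heat-transfer relation Q = ṁ · c_p · ΔT can be turned into a flow estimate. This is a sketch under stated assumptions, not Supermicro's sizing method: the coolant is assumed to be plain water (real loops often use a water-glycol mix with different properties), and the 10 K temperature rise across the rack is an illustrative value that does not come from the article.

```python
WATER_CP_J_PER_KG_K = 4186.0   # assumption: specific heat of plain water
WATER_DENSITY_KG_PER_L = 1.0   # assumption: approximate density of water


def required_flow_l_per_min(
    heat_load_kw: float,
    delta_t_k: float,
    cp: float = WATER_CP_J_PER_KG_K,
    density: float = WATER_DENSITY_KG_PER_L,
) -> float:
    """Estimate coolant flow (L/min) needed to carry heat_load_kw at a given temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    # Q = m_dot * cp * dT  ->  m_dot = Q / (cp * dT), with Q converted to watts
    mass_flow_kg_per_s = heat_load_kw * 1000.0 / (cp * delta_t_k)
    # Convert kg/s to L/s via density, then to L/min
    return mass_flow_kg_per_s / density * 60.0


# 100kW is the CDU capacity quoted in the article; 10 K is an assumed rise.
print(round(required_flow_l_per_min(100.0, 10.0), 1))
```

Under these assumptions the loop needs on the order of 140 L/min. Doubling the allowed temperature rise halves the required flow, which is why supply and return temperatures matter as much as pump capacity.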

Advanced Engineering for Rack-Scale Solutions

Supermicro’s liquid-cooled racks are engineered to handle a wide range of servers, ensuring flexible and scalable solutions for high-performance computing environments. Integrating advanced liquid-cooling technology is essential to maintain optimal performance and reliability.

Supermicro servers benefiting from rack-scale liquid cooling include:

  • GPU Systems: Supermicro’s GPU systems combine the fastest processors, memory, and GPUs for AI/ML, inferencing, and HPC applications. Available in 2U, 4U, or 8U configurations, these systems support 4 or 8 NVIDIA® H100 GPUs powered by the latest Intel Xeon or AMD EPYC™ processors. With up to 32 DIMMs of DDR5 memory, these systems offer compact and robust solutions for demanding workloads. Direct-to-chip (D2C) coolers are used to maintain optimal temperatures.
  • BigTwin: BigTwin represents Supermicro’s flagship performance solution for demanding applications and HCI environments. This 2U enclosure supports up to four nodes, each with dual Intel Xeon processors, up to 16 DIMMs of DDR5 memory, and multiple high-speed NVMe drives. Networking options include 10GbE, 25GbE, 100GbE, and 200Gb HDR InfiniBand.
  • FatTwin: FatTwin offers a high-density, multi-node architecture in a 4U chassis, supporting 4 or 8 nodes with single processors. These systems provide cold-aisle serviceability and are optimized for data center infrastructure with flexible compute and storage options.
  • SuperBlade: The SuperBlade features shared cooling, power, and networking infrastructure and supports up to 20 blade servers in an 8U chassis. With options for Intel Xeon or AMD EPYC processors, it is designed for high performance, energy efficiency, and reduced TCO. Advanced networking options, including 200G HDR InfiniBand, are available.
  • Hyper: The X14 Hyper series offers next-generation performance for demanding workloads. Available in 1U or 2U configurations, these servers support up to 32 DIMM slots and are optimized for maximum compute performance with the highest-performing CPUs.

| Product Family | Server | Description |
|---|---|---|
| GPU | SYS-421GE-TNHR2-LCC | Dual 4th/5th Gen Intel Xeon Processors; 4U, 32 DIMMs; NVIDIA HGX H100 8-GPU Board |
| GPU | AS -4125GS-TNHR2-LCC | Dual 4th Gen AMD EPYC 9004 Series Processors; 4U, 24 DIMMs; NVIDIA HGX H100 8-GPU Board |
| GPU | SYS-821GE-TNHR | Dual 4th Gen Intel® Xeon® Scalable Processors; 8U, 32 DIMMs; HGX H100 8-GPU SXM5 Multi-GPU Board |
| GPU | AS -8125GS-TNHR | Dual 4th Gen AMD EPYC 9004 Series Processors; 8U, 24 DIMMs; NVIDIA HGX H100 8-GPU SXM5 Multi-GPU Board |
| GPU | SYS-421GU-TNXR | Dual 4th Gen Intel® Xeon® Scalable Processors; 4U, 32 DIMMs; NVIDIA HGX H100 4-GPU Multi-GPU Board |
| GPU | SYS-421GE-TNR (PCIe) | Dual 4th Gen Intel® Xeon® Scalable Processors; 4U, 32 DIMMs; GPU-NVH100-80, GPU-NVA100-80-NC |
| GPU | AS -4125GS-TNRT (PCIe) | Dual 4th Gen AMD EPYC 9004 Series Processors; 4U, 32 DIMMs; Up to 8 Double-Width/Single-Width Cards (Full Height Full Length); NVIDIA H100 and AMD MI200 series |
| BigTwin | SYS-221BT-HNTR | Dual 4th Gen Intel® Xeon® Scalable Processors; 2U, 4-Nodes, 16 DIMMs |
| BigTwin | SYS-221BT-DNTR | Dual 4th Gen Intel® Xeon® Scalable Processors; 2U, 2-Nodes, 16 DIMMs |
| FatTwin | SYS-F511E2-RT | Single 4th/5th Gen Intel® Xeon® Processor; 4U, 8-Nodes, 16 DIMMs |
| FatTwin | SYS-F521E3-RTB | Single 4th/5th Gen Intel® Xeon® Processor; 4U, 4-Nodes, 16 DIMMs |
| SuperBlade | SBE-820C/J/J2/L/H-820 | 8U Enclosure |
| SuperBlade | SBI-421E-1T3N | Dual 4th/5th Gen Intel® Xeon® Processors; 16 DIMMs |

Total Solution

Supermicro can supply not only a liquid-cooled server, the CDU, and the manifold, but also the entire cooling tower that sits outside your data center. This is a really interesting offering because it gives organizations a complete solution. Further, with Super Cloud Composer, customers get complete management, from an individual CPU or GPU temperature out to valve positions and pump speeds in the rack and even the cooling tower, in a single-pane-of-glass management experience.

We received a hands-on demo of Super Cloud Composer that showcased a liquid-cooled AMD MI300X GPU rack included as part of a total solution. The Supermicro Super Cloud Composer platform offers a familiar and easy-to-use interface for monitoring and managing your data center. Additionally, you can pull detailed logging metrics out of the Super Cloud Composer database to assess performance and health, and view trends to help with preventative maintenance.
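To make the idea of trend-based maintenance concrete, here is a minimal sketch of the kind of analysis exported metrics could feed. It does not use any Super Cloud Composer API: the sample format (hours elapsed, temperature in °C), the readings themselves, and the function name are all hypothetical, chosen only to show a least-squares trend line over logged temperatures.

```python
def slope_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (hours, temp_c) samples, in degrees C per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Covariance of time and temperature over variance of time
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


# Hypothetical GPU temperature readings: (hours since start, degrees C)
readings = [(0.0, 60.0), (1.0, 60.5), (2.0, 61.0), (3.0, 61.5)]
print(slope_per_hour(readings))
```

A steadily positive slope at constant load is the sort of signal that might prompt a check of flow rates or cold plates before a component reaches its thermal limit.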

We have been covering liquid-cooling solutions for a while, and interest continues to grow. Our Instagram and YouTube videos have garnered millions of views. Liquid-cooling solutions are proven to keep components cooler, especially given the heavy processing required to stay ahead of the AI tsunami.

Supermicro liquid-cooled rack rolling off the production line in San Jose.
