CoolIT has cold plates, manifolds, and cooling distribution units designed to help enterprises adopt liquid cooling for power-hungry servers.
We’ve partnered with CoolIT Systems to bring liquid cooling into our lab. As part of that effort, we put together a mini liquid cooling rig and retrofitted a Dell PowerEdge R760, converting it from air-cooled to liquid-cooled. We’re just starting our liquid-cooling journey but have already made important discoveries about the benefits direct liquid cooling (DLC) offers.
Liquid cooling in some form or another will be required to support modern workloads. The math no longer works for air-cooling servers with massive CPU and GPU thermal design power (TDP). DLC delivered via cold plates is the most common solution, and every server vendor has at least one option. Dell, for its part, has partnered with CoolIT Systems to deliver liquid cooling across the PowerEdge portfolio.
Our lab, like most data centers, wasn’t designed at the outset to take advantage of liquid cooling. But, like many data centers, we’re seeing the highest power servers require liquid cooling in some form, and if we want to take advantage of these systems, we need to adapt. It’s a story we’re hearing a lot in the enterprise these days, as data centers are investing in AI, and finding most of these systems will soon require a liquid loop of some kind for operation.
In our case, we decided to start by retrofitting one of the Dell PowerEdge R760 servers in the lab. To be clear, when customers want liquid-cooled servers, they’re ordered from Dell that way. Dell handles the integration with CoolIT, and customers receive a server with cold plates installed and hoses plumbed for liquid cooling. DLC PowerEdge systems have a few nuances that make them different from air-cooled servers, so we drifted into relatively uncharted territory with this work. The iDRAC card is different, for instance; the DLC version has a lead for leak detection. We were successful in the conversion process, but installing your own cold plates isn’t a supported activity.
CoolIT DLC Kit
CoolIT outfitted us with a mini system that’s typically used for a small proof of concept as their customers work through the process of adding liquid cooling to their data centers. That said, this system can scale to 10kW, so it’s a great way for those new to liquid cooling to gain some experience with half a rack or so. There are three key components of this setup: the cold plates, the rack manifold, and the coolant distribution unit (CDU).
The cold plates are designed for specific TDP use cases and are made to fit the specific CPU or GPU being cooled. They look deceptively simple, and even though there are no pumps or moving parts on the plates themselves, the engineering isn’t trivial thanks to rising TDPs. For perspective, CoolIT recently launched a new line of cold plates that can support up to 1500W. The CPUs in our R760 are a bit pedestrian by comparison; the Intel Xeon Platinum 8580 CPUs have a TDP of “just” 350W each.
Installing the cold plates is very simple. The blocks even have thermal paste pre-applied, making this a true drop-in kit. As noted earlier, there is a different iDRAC card for DLC systems that has the connection point for the leak detection cable that runs from the cold plates. The hoses get routed out the back of the R760, through a different bracket that comes with the DLC iDRAC kit.
The cold plates are connected to the manifold via labeled warm/cold connections. The manifold itself is made from stainless steel and the fittings are dripless quick disconnects. It takes a few seconds to connect the server to the manifold, which comes pre-filled. Incidentally, our manifold went in the back of the rack, but it can be configured in the front if needed. We have a mini manifold for this use case; a more traditional DLC rack would have a manifold covering the entire rack. The manifold connects directly to the CDU.
The CDU does the heavy lifting in this loop; we’re using the CoolIT AHx10. This is a 5U liquid-to-air CDU that can handle 7kW of load at 25°C ambient. CoolIT offers an expansion kit that will scale this unit to 10kW. Inside the chassis is a liquid-to-air heat exchanger and redundant pumps. The CDU, like the manifold, comes pre-filled. We’ve placed ours relatively low in the rack, but the CDU can go anywhere, depending on how the rack is set up.
The AHx10 has a max power consumption of 750W, which helps in the overall economics conversation around power savings. The system has an intuitive touchscreen display and supports remote access. Other than setting the pump pressure initially, there’s very little that has to be done with the CDU. It’s pretty much set it and forget it; ours has been running for a few weeks with no additional intervention.
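To put the CDU ratings in context, here is a rough sizing sketch. It assumes, purely for illustration, that each server is a dual-socket box like our R760, that the full 350W TDP of each CPU lands in the liquid loop, and that nothing else in the server adds heat to it. The function and variable names are ours, not anything from CoolIT.

```python
# Rough CDU sizing sketch. Assumption: only CPU heat enters the loop,
# and every CPU dissipates its full rated TDP.
CPU_TDP_W = 350        # per-CPU TDP of the CPUs in our R760
CPUS_PER_SERVER = 2    # the R760 is a dual-socket server


def servers_supported(cdu_capacity_w, heat_per_server_w):
    """Whole servers whose loop heat fits within the CDU's rated capacity."""
    return int(cdu_capacity_w // heat_per_server_w)


heat_per_server = CPU_TDP_W * CPUS_PER_SERVER         # watts into the loop
base = servers_supported(7_000, heat_per_server)      # AHx10 at 7kW
expanded = servers_supported(10_000, heat_per_server) # with expansion kit

print(f"Loop heat per server: {heat_per_server} W")
print(f"Servers at 7kW: {base}, at 10kW: {expanded}")
```

Under those assumptions, the base unit covers 10 servers like ours and the expanded unit covers 14. Servers with liquid-cooled GPUs or higher-TDP CPUs would bring those counts down.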
It is worth noting that with this CoolIT gear, we’re not solving for the heat itself. We’re in effect moving the heat from the R760’s CPUs to the heat exchanger within the CDU. We still need to cool the lab the same as before since we don’t have facility water to transfer the heat outside of the lab. That said, a small system like this is perfect for a few liquid-cooled servers and may be an ideal fit for an enterprise with a small AI deployment; something like the Dell PowerEdge XE9640 would pair well.
Even though we still have to contend with the heat from the DLC R760 in our lab, there are several benefits of moving to liquid cooling.
Benefits of DLC
When moving to liquid cooling from air cooling, the largest and most obvious benefit is the reduction in fan usage. The R760 still needs airflow for system components like DRAM and storage, but the fans do not need to spin as fast. While this makes the server quieter, the best part of the DLC loop is the reduction in electricity consumption. The other thing we found was a little surprising: the DLC R760 performed a little bit better than when it was air-cooled.
To look at the R760’s electricity consumption more closely, we set up our Quarch QTL2843 Mains Power Analysis Module. We ran the server both with its factory air-cooled heatsinks and again with the CoolIT cold plates. To pressure the CPUs, we ran a Pi calculation to 50 billion digits, which places a very heavy load on the CPU and DRAM. Our intent was to push the CPUs as hard as possible to ensure the fans had to be called into their maximum required duty.
The impact of the DLC implementation was immediately obvious. When running the R760 in the air-cooled configuration, the fans spun to 100% during the workload, as expected. With the DLC config, the R760 chose to spin the fans at 32%, a dramatic drop. That equates to a savings of 200 watts in just a single server. It’s not just the fan speed that sticks out. The CPUs themselves reported roughly half the temperature with DLC, 41/42°C compared to 88/89°C when air-cooled.
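For a sense of what 200 watts means over time, here is a back-of-the-envelope sketch. It assumes the server sits at this full-load savings around the clock, which is an upper bound rather than something we measured, and it compares the savings against the AHx10’s 750W maximum draw, which is also a worst case. The function names are ours, for illustration only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_kwh(watts, hours=HOURS_PER_YEAR):
    """Energy in kWh for a constant draw (or savings) in watts."""
    return watts * hours / 1000


def servers_to_offset(cdu_watts, savings_per_server_w):
    """Smallest whole number of servers whose savings cover the CDU draw."""
    return math.ceil(cdu_watts / savings_per_server_w)


SAVINGS_PER_SERVER_W = 200  # measured under our Pi workload
CDU_MAX_DRAW_W = 750        # AHx10 maximum power consumption

savings_kwh = annual_kwh(SAVINGS_PER_SERVER_W)
breakeven = servers_to_offset(CDU_MAX_DRAW_W, SAVINGS_PER_SERVER_W)

print(f"Upper-bound savings per server: {savings_kwh:.0f} kWh/year")
print(f"Servers needed to offset CDU max draw: {breakeven}")
```

Under those assumptions, one server saves up to about 1,752 kWh a year, and it takes four such servers before the savings outweigh the CDU’s worst-case draw. Real-world numbers will depend on how often the servers run flat out and how hard the CDU actually works.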
But it’s not just power savings that come out the other side when going to liquid cooling. We saw a little performance boost, which we didn’t expect. With the cold plates delivering better cooling, the CPUs can operate to their fullest. In the air-cooled configuration, the R760 completed the 50-billion-digit Pi calculation in 369 seconds. In the DLC configuration, the R760 went a little faster, delivering the calculation in 347 seconds. That’s a gain in performance of around 6%, letting us milk a little more out of the Intel CPUs.
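The 6% figure can be read two ways, as less time per run or as more work per unit of time, and the two come out slightly different. A quick sketch of both, using only the two run times above (the variable names are ours):

```python
# Run times for the 50-billion-digit Pi calculation, in seconds.
air_cooled_s = 369
dlc_s = 347

# Reduction in completion time, relative to the air-cooled run.
time_saved_pct = (air_cooled_s - dlc_s) / air_cooled_s * 100

# Increase in throughput (work per second), relative to air-cooled.
throughput_gain_pct = (air_cooled_s / dlc_s - 1) * 100

print(f"Time saved: {time_saved_pct:.1f}%")
print(f"Throughput gain: {throughput_gain_pct:.1f}%")
```

Either way you slice it, the result rounds to roughly 6%: about 6.0% less time per run, or about 6.3% more throughput.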
Final Thoughts
We’re just getting started with liquid cooling in the lab and are thrilled to have worked with CoolIT on this initial effort. The cold plates are working perfectly on the PowerEdge R760, and the manifold and CDU come together and “just work” without any concerns or ongoing tinkering. For those who are apprehensive about bringing liquid into the data center, ongoing simplicity is critical. We’ve also had no leaks or other more catastrophic events, which is expected, since this is enterprise gear with an exceedingly low failure rate.
For enterprises looking to bring high-power AI systems into the data center, liquid cooling is a foregone conclusion. The 8-way GPU servers are going to abandon air cooling, opting for DLC loops like this or, at a minimum, a closed loop and radiator. Either way, some amount of liquid will find its way into the data center. With the substantial electricity savings and modest performance boost, there are a lot of reasons enterprises should be embracing DLC servers.
CoolIT is a clear leader in this space and its relationship with Dell brings a wide variety of liquid cooling solutions to market in an easily consumable way, with very little to worry about. We’re looking forward to exploring our small loop further and can’t wait to see more liquid-cooled servers in the lab.