
CoreWeave Unveils First Dell XE9712 GB200 NVL-72 System

by Divyansh Jain

CoreWeave has unveiled its first Dell XE9712 GB200 NVL-72 System – complete with performance insights!

CoreWeave has deployed the latest GB200 NVL-72 system built on the new Dell XE9712 servers. The system was showcased in a live demonstration at a state-of-the-art Switch data center, which highlighted its groundbreaking performance and advanced cooling infrastructure.

Dell XE9712 GB200 NVL-72

CoreWeave’s GB200 NVL-72 system, housed in Rob Roy’s Evo Chamber, is designed to handle the most demanding computational workloads. The live demo began with the NCCL All-Reduce Test, a benchmark that demonstrates the ultra-high bandwidth and low latency of the NVIDIA NVLink interconnect across the rack’s 72 GPUs. The test verifies seamless communication between the GPUs.
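For readers unfamiliar with the collective being benchmarked: an all-reduce leaves every GPU holding the element-wise sum of all GPUs' data. The sketch below simulates the ring schedule (reduce-scatter followed by all-gather) that collective libraries commonly use; it is a plain-Python illustration of the concept, not NCCL's actual implementation.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce; `buffers` is one list per simulated GPU.

    After the call, every rank holds the element-wise sum of all ranks'
    data. Vector length must be divisible by the number of ranks.
    """
    n = len(buffers)
    chunk = len(buffers[0]) // n
    span = lambda c: slice(c * chunk, (c + 1) * chunk)

    # Phase 1 (reduce-scatter): each chunk circulates the ring, picking up
    # one rank's contribution per step; after n-1 steps, rank r holds the
    # fully reduced chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            src, c = (r - 1) % n, (r - step - 1) % n
            s = span(c)
            buffers[r][s] = [a + b for a, b in zip(buffers[r][s], buffers[src][s])]

    # Phase 2 (all-gather): circulate each completed chunk around the ring
    # so every rank ends up with every reduced chunk.
    for step in range(n - 1):
        for r in range(n):
            src, c = (r - 1) % n, (r - step) % n
            buffers[r][span(c)] = buffers[src][span(c)]
    return buffers
```

Each rank sends roughly 2(n-1)/n times its buffer size over the ring, which is why the interconnect's bandwidth and latency dominate the benchmark result.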

Building on this, the GPU Blaze Test illustrated the system’s raw computational power. The GPUs tackled complex matrix multiplication workloads, simulating operations used in AI training, scientific simulations, and advanced data processing.
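The article does not publish the Blaze Test's internals, but a GEMM-style burn-in reduces to the same kernel at toy scale. This sketch shows a naive dense matrix multiply and the standard FLOP count used to turn its runtime into a throughput figure; the function names are illustrative, not CoreWeave's tooling.

```python
def matmul(a, b):
    """Naive dense matrix multiply (row-major lists of lists)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def gemm_flops(n, k, m):
    """Floating-point operations in an (n x k) @ (k x m) multiply:
    each of the n*m outputs costs ~k multiplies and ~k adds."""
    return 2 * n * k * m
```

Dividing `gemm_flops(n, k, m)` by the measured kernel time is the usual way stress tests report sustained TFLOPS on each GPU.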

Live Training with CoreWeave’s SUNK

The GB200 NVL-72 was also put through a live training run using Slurm on Kubernetes (SUNK), training a Megatron model. The training session validated the rack with a real workload and demonstrated the resulting load on the cooling and power infrastructure.

As GPU activity ramped up, the in-rack Cooling Distribution Unit (CDU) dynamically adjusted cooling output to maintain optimal hardware temperatures. Real-time data from the CDU showed fluid return temperatures rising with GPU load while thermal management remained efficient and performance uncompromised.
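The relationship the CDU dashboard visualizes follows directly from the heat-transfer equation Q = ṁ·c_p·ΔT: at a fixed coolant flow, more GPU heat means a larger supply-to-return temperature rise. The sketch below inverts that equation to estimate the flow a direct-to-chip loop needs; the numbers are illustrative physics with water-like defaults, not CoreWeave's actual telemetry.

```python
def coolant_flow_lpm(power_kw, delta_t_c, cp_kj_per_kg_k=4.186, density_kg_per_l=1.0):
    """Volumetric coolant flow (liters/minute) needed to carry away
    `power_kw` of heat with a `delta_t_c` degree C supply-to-return rise.

    Q = m_dot * cp * dT  =>  m_dot = Q / (cp * dT), then convert
    mass flow (kg/s) to volume flow (L/min).
    """
    mass_flow_kg_s = power_kw / (cp_kj_per_kg_k * delta_t_c)
    return mass_flow_kg_s / density_kg_per_l * 60.0
```

For example, removing a hypothetical 120 kW of rack heat at a 10 °C coolant rise requires roughly 172 L/min; halving the allowed temperature rise doubles the required flow, which is why return-temperature telemetry is the key signal the CDU reacts to.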

The GB200 NVL-72’s power dashboard provided a continuous overview of the system’s energy requirements, demonstrating its efficiency and transparency in energy management.

Rob Roy’s Evo Chamber

The NVL72 is housed in Rob Roy’s Evo Chamber, which provides an impressive 1MW of power and cooling capability per rack. This advancement in infrastructure combines 250kW of air cooling with 750kW of direct-to-chip liquid cooling capacity, ensuring optimal performance for the most demanding AI and HPC workloads. The chamber’s sophisticated design maintains efficient power usage and thermal management while supporting next-generation computing requirements.

Conclusion

CoreWeave is a clear industry leader when it comes to providing AI infrastructure as a service. Much of its success comes from its ability to onboard the latest AI infrastructure faster than other clouds. The new Dell GB200 NVL-72 systems represent a new era in high-performance computing, combining cutting-edge GPU performance, advanced cooling, and energy efficiency to meet the demands of AI, scientific research, and data-intensive applications — a massive win for customers running AI workloads at scale.

