4th Gen AMD EPYC Review (AMD Genoa)

by Jordan Ranous

AMD has announced the general availability of its new 4th Gen EPYC 9004 Series CPUs. Code-named Genoa, the new line supports 12 channels of DDR5-4800 (up to 6TB of memory per socket), 128 lanes of PCIe Gen5, AMD Infinity Fabric and Infinity Guard technologies, and up to 96 cores. AMD is positioning these chips for critical workloads across cloud, enterprise, and high-performance computing.

Benefits of EPYC 9004 4th-Gen CPUs for Businesses

With up to 96 cores in a single processor, AMD Genoa lets organizations shrink their physical footprint by deploying fewer, more powerful servers. This brings greater flexibility to data center ecosystems and helps organizations reach their sustainability and future-proofing goals.
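To put rough numbers on the consolidation argument, the Python sketch below counts how many dual-socket servers it would take to reach a target core count, using the per-server totals of the two platforms we test later in this review (2 x 96 cores versus 2 x 40 cores). The 1,920-core target is a hypothetical figure chosen for illustration, and the math deliberately ignores per-core performance, memory, and licensing differences.

```python
import math

# Cores per dual-socket server for the two platforms tested in this review.
CORES_PER_SERVER = {
    "2 x AMD EPYC 9654": 2 * 96,
    "2 x Intel Xeon Platinum 8380": 2 * 40,
}


def servers_needed(target_cores: int, cores_per_server: int) -> int:
    """Smallest number of servers whose combined cores reach target_cores."""
    return math.ceil(target_cores / cores_per_server)


if __name__ == "__main__":
    target = 1920  # hypothetical core requirement, for illustration only
    for platform, cores in CORES_PER_SERVER.items():
        print(f"{platform}: {servers_needed(target, cores)} servers")
```

With that hypothetical target, the Genoa configuration needs 10 servers versus 24 for the Xeon configuration.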

AMD built its new EPYC processors with a strong focus on security, particularly by expanding AMD Infinity Guard, the company's set of features that provides both physical and virtual layers of protection for its CPUs. For example, Genoa supports twice as many encryption keys as the previous generation, which helps customers keep their data secure whether it is stored locally or in the cloud.

Organizations can take advantage of this "all-in" feature set while choosing the model whose core count and frequency (see table below) best suit their needs. The 4th Gen AMD EPYC processors also add support for DDR5 memory and PCIe Gen 5, both of which benefit bandwidth-hungry AI and ML applications. Enterprise SSD vendors, of course, are chomping at the bit to get their Gen 5 drives into the mainstream, since PCIe Gen 5 doubles the per-lane bandwidth of Gen 4.
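The doubling is easy to check with a little arithmetic. The sketch below assumes the standard raw rates of 16 GT/s per lane for PCIe Gen 4 and 32 GT/s for Gen 5, both with 128b/130b line encoding, and ignores packet and protocol overhead, so real drives will land somewhat below these ceilings. The x4 link width is an assumption matching a typical NVMe SSD.

```python
# Raw per-lane signaling rates in gigatransfers per second (GT/s).
PCIE_GT_PER_S = {"Gen4": 16.0, "Gen5": 32.0}

# PCIe Gen 3 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(gen: str, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s after line encoding.

    Packet and protocol overhead are ignored, so real throughput is lower.
    """
    bits_per_second = PCIE_GT_PER_S[gen] * 1e9 * lanes * ENCODING_EFFICIENCY
    return bits_per_second / 8 / 1e9


if __name__ == "__main__":
    for gen, lanes in [("Gen4", 4), ("Gen5", 4), ("Gen5", 128)]:
        print(f"PCIe {gen} x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

Under these assumptions, a Gen 5 x4 SSD tops out around 15.75GB/s per direction versus roughly 7.88GB/s for Gen 4, and the 128 Gen 5 lanes of a single Genoa socket add up to about 504GB/s per direction.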

AMD Genoa 9004 Series SKUs

Model   Cores  Default TDP  cTDP Range  Base (GHz)  Boost (GHz)
9654    96     360W         320-400W    2.4         3.7
9634    84     290W         240-300W    2.25        3.7
9554    64     360W         320-400W    3.1         3.75
9534    64     280W         240-300W    2.45        3.7
9454    48     290W         240-300W    2.75        3.8
9354    32     280W         240-300W    3.25        3.8
9334    32     210W         200-240W    2.7         3.9
9254    24     200W         200-240W    2.9         4.15
9224    24     200W         200-240W    2.5         3.7
9124    16     200W         200-240W    3.0         3.7
9474F   48     360W         320-400W    3.6         4.1
9374F   32     320W         320-400W    3.85        4.3
9274F   24     320W         320-400W    4.05        4.3
9174F   16     320W         320-400W    4.1         4.4
9654P   96     360W         320-400W    2.4         3.7
9554P   64     360W         320-400W    3.1         3.75
9454P   48     290W         240-300W    2.75        3.8
9354P   32     280W         240-300W    3.25        3.8
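For shortlisting, the table can be queried directly. The sketch below transcribes the rows above into a Python list (dropping the cTDP ranges) and filters by a minimum core count and a maximum default TDP. The selection criteria and the sort order (most cores first, then highest base clock) are our own illustrative choices, not an AMD sizing method.

```python
from __future__ import annotations

from typing import NamedTuple


class Sku(NamedTuple):
    model: str
    cores: int
    default_tdp_w: int
    base_ghz: float
    boost_ghz: float


# Rows transcribed from the SKU table above; cTDP ranges omitted for brevity.
SKUS = [
    Sku("9654", 96, 360, 2.4, 3.7),
    Sku("9634", 84, 290, 2.25, 3.7),
    Sku("9554", 64, 360, 3.1, 3.75),
    Sku("9534", 64, 280, 2.45, 3.7),
    Sku("9454", 48, 290, 2.75, 3.8),
    Sku("9354", 32, 280, 3.25, 3.8),
    Sku("9334", 32, 210, 2.7, 3.9),
    Sku("9254", 24, 200, 2.9, 4.15),
    Sku("9224", 24, 200, 2.5, 3.7),
    Sku("9124", 16, 200, 3.0, 3.7),
    Sku("9474F", 48, 360, 3.6, 4.1),
    Sku("9374F", 32, 320, 3.85, 4.3),
    Sku("9274F", 24, 320, 4.05, 4.3),
    Sku("9174F", 16, 320, 4.1, 4.4),
    Sku("9654P", 96, 360, 2.4, 3.7),
    Sku("9554P", 64, 360, 3.1, 3.75),
    Sku("9454P", 48, 290, 2.75, 3.8),
    Sku("9354P", 32, 280, 3.25, 3.8),
]


def pick_skus(min_cores: int, max_tdp_w: int) -> list[Sku]:
    """SKUs with at least min_cores and a default TDP of at most max_tdp_w.

    Sorted by core count, then base clock, both descending; Python's sort
    is stable, so ties keep their table order.
    """
    matches = [s for s in SKUS if s.cores >= min_cores and s.default_tdp_w <= max_tdp_w]
    return sorted(matches, key=lambda s: (-s.cores, -s.base_ghz))


if __name__ == "__main__":
    for sku in pick_skus(min_cores=32, max_tdp_w=280):
        print(sku.model, sku.cores, f"{sku.default_tdp_w}W", sku.base_ghz)
```

For example, asking for at least 32 cores within a 280W default TDP returns the 9534, 9354, 9354P, and 9334, in that order.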

AMD Genoa – Zen 4 Architecture

Released this past September, Zen 4 is the microarchitecture behind the AMD EPYC 9004 CPUs and, according to AMD, its highest-performance core to date. AMD says it helps the EPYC 9004 deliver leadership performance and energy efficiency, allowing customers to modernize their data centers for greater application throughput. Zen 4 also powers AMD's new consumer Ryzen 7000 desktop processors.

One of the bigger changes Zen 4 brings is that it drops DDR4 support entirely and moves to DDR5 only. On the consumer side, Zen 4 platforms also support AMD's new EXPO SPD profiles, which let RAM manufacturers offer more comprehensive memory tuning and overclocking settings.

Some of the other new features include:

  • Fast private 1MB L2 cache per core
  • More outstanding misses supported from L2 to L3 per core
  • More outstanding misses supported from L3 to memory
  • Improved L3 and L2 miss bandwidth
  • Higher bandwidth, which enables prefetch improvements

Zen 4 microarchitecture overview

Zen 4 vs. Zen 3

AMD EPYC 9004 Series Improvements Over Previous Generations

AMD Genoa brings a range of notable improvements, including a jump in maximum core count to a whopping 96 per CPU. That is a significant step up from the previous top-end parts:

  • Maximum 64 cores per CPU with the 7773X and 7763 (3rd Gen EPYC) models
  • Maximum 40 cores per CPU with the 8380 (3rd Gen Xeon Platinum) model

AMD claims this translates to roughly 2.3x the performance of the competition in time-to-solution (or 1.6x in performance per watt). If those figures hold, this would be the biggest generational boost in overall performance we've seen from AMD. For enterprise business operations per second, AMD expects its 4th Gen EPYC CPUs to offer ~2.6x the performance, and it cites a 2.4x boost in rendering speed with Autodesk Arnold.

Zen 4 also brings a significant per-core performance upgrade, with AMD citing an IPC uplift of roughly 14 percent for server CPUs.

The EPYC 9004 Series also brings platform capabilities designed for scaling:

  • 12-channel DDR5-4800 with enhanced single-rank performance
  • 128 lanes of 32Gbps and 8 lanes of 8Gbps multi-function SERDES (serializer/deserializer)

In addition, it supports CXL 1.1+ memory expansion (CXL Type 3 devices), which includes advanced memory attach capability for DDR and emerging memory types, as well as SEV-SNP, QoS, and tiered memory management extensions. For enhanced security, it supports SEV-SNP key extensions and AES-256-XTS encryption.

AMD EPYC 9004 Series Memory

The AMD Genoa CPUs support 12 memory channels per CPU, up to 6TB of capacity per socket, and memory speeds up to DDR5-4800, for a theoretical peak bandwidth of up to 460GB/s per socket.
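The 460GB/s figure follows from simple arithmetic: each memory channel moves 64 data bits (8 bytes) per transfer, so peak bandwidth is channels x transfer rate x 8 bytes. The sketch below applies that formula to Genoa and, for contrast, to an 8-channel DDR4-3200 configuration like that of 3rd Gen EPYC and the Xeon Platinum 8380. These are theoretical ceilings; sustained bandwidth in practice is lower.

```python
BYTES_PER_TRANSFER = 8  # 64 data bits per channel; ECC bits are not counted


def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak memory bandwidth per socket, in decimal GB/s."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    genoa = peak_bandwidth_gb_s(12, 4800)  # 12 x DDR5-4800
    ddr4 = peak_bandwidth_gb_s(8, 3200)    # 8 x DDR4-3200
    print(f"Genoa: {genoa:.1f} GB/s, 8-channel DDR4-3200: {ddr4:.1f} GB/s, "
          f"ratio {genoa / ddr4:.2f}x")
```

That works out to 460.8GB/s for Genoa versus 204.8GB/s for the 8-channel DDR4-3200 layout, a 2.25x increase per socket.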

Here’s an at-a-glance look at the comparisons between 3rd-gen and 4th-gen memory bandwidth performance:

AMD EPYC 9004 Series CPU Positioning

As with the previous generation, AMD has split its new CPUs into three groups:

  • Core performance: high-frequency CPUs with a large cache-to-core ratio. Models in this group include the 9474F (48 cores @ 360W), 9374F (32 cores @ 320W), 9274F (24 cores @ 320W), and 9174F (16 cores @ 320W).
  • Core density: the CPUs with the highest core and thread counts. Models in this group include the 9654/P (96 cores @ 360W), 9634 (84 cores @ 290W), 9554/P (64 cores @ 360W), 9534 (64 cores @ 280W), and 9454/P (48 cores @ 290W).
  • Balanced and optimized performance: CPUs that balance performance and TCO. Models in this group include the 9354/P (32 cores @ 280W), 9334 (32 cores @ 210W), 9254 (24 cores @ 200W), 9224 (24 cores @ 200W), and 9124 (16 cores @ 200W).
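One way to see the trade-off between these groups is default TDP divided by core count. The sketch below does that for the models listed above (P variants omitted, since their TDPs match the non-P parts). TDP is a platform power and thermal rating, not measured per-core draw, so this is only a coarse lens.

```python
from __future__ import annotations

# Positioning groups with (cores, default TDP in watts) per model, from the list above.
GROUPS = {
    "Core performance": {
        "9474F": (48, 360), "9374F": (32, 320), "9274F": (24, 320), "9174F": (16, 320),
    },
    "Core density": {
        "9654": (96, 360), "9634": (84, 290), "9554": (64, 360),
        "9534": (64, 280), "9454": (48, 290),
    },
    "Balanced and optimized": {
        "9354": (32, 280), "9334": (32, 210), "9254": (24, 200),
        "9224": (24, 200), "9124": (16, 200),
    },
}


def watts_per_core(cores: int, tdp_w: int) -> float:
    """Default TDP divided by core count: a coarse power-budget-per-core figure."""
    return tdp_w / cores


def group_range(group: str) -> tuple[float, float]:
    """Lowest and highest watts-per-core figure within a positioning group."""
    values = [watts_per_core(c, w) for c, w in GROUPS[group].values()]
    return min(values), max(values)


if __name__ == "__main__":
    for name in GROUPS:
        low, high = group_range(name)
        print(f"{name}: {low:.2f} to {high:.2f} W per core")
```

The core performance parts budget 7.5 to 20 W per core, the core density parts roughly 3.5 to 6 W, and the balanced parts about 6.6 to 12.5 W.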

AMD Genoa 9004 CPU Performance

Benchmarking Configuration

For our initial testing, we pitted the current top-end Intel and AMD platforms against each other in a first batch of CPU-intensive workloads. On the Intel side, we used our dual-CPU Intel Xeon Platinum 8380 platform built around an Intel OEM server; on the AMD side, our dual-CPU AMD EPYC 9654 platform inside a Quanta chassis.

Intel Platform Specifications:
  • 2 x Intel Xeon Platinum 8380 40-core CPUs
  • 16 x 32GB 3200MHz DDR4
  • Windows Server 2022 OS

AMD Platform Specifications:
  • 2 x AMD EPYC 9654 96-core CPUs
  • 24 x 64GB 4800MHz DDR5
  • Windows Server 2022 OS
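As a quick sanity check on the two memory configurations: the Xeon Platinum 8380 has 8 memory channels per socket and Genoa has 12, so the DIMM counts above work out to one DIMM per channel on both systems, assuming the DIMMs are spread evenly across all channels. The sketch below does the arithmetic.

```python
from __future__ import annotations


def memory_layout(sockets: int, channels_per_socket: int,
                  dimms: int, dimm_gb: int) -> tuple[int, float]:
    """Return (total capacity in GB, DIMMs per channel), assuming even population."""
    total_gb = dimms * dimm_gb
    dimms_per_channel = dimms / (sockets * channels_per_socket)
    return total_gb, dimms_per_channel


if __name__ == "__main__":
    print("Intel:", memory_layout(2, 8, 16, 32))   # 2 x Xeon Platinum 8380, 8 channels each
    print("AMD:", memory_layout(2, 12, 24, 64))    # 2 x EPYC 9654, 12 channels each
```

That gives 512GB on the Intel system and 1,536GB on the AMD system, each at one DIMM per channel.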

V-Ray

V-Ray Benchmark is an application from Chaos Group for scoring and comparing CPUs and GPUs. Chaos Group is known for its visualization and rendering work, specializing in ray tracing technology. The V-Ray Benchmark contains a custom-built test scene that can test any combination of CPU and GPU and compare one system's performance against another.

In our lab, we ran the V-Ray benchmark in CPU-only mode. To minimize potential bottlenecks, we used a Solidigm P5520 7.68TB NVMe SSD and a clean installation of Windows Server 2022. The top of the V-Ray leaderboard was previously held by a 2x AMD EPYC 7K83 64-core system, which scored an impressive 100,844 average across 6 tests. Our sample system with 2x AMD EPYC Genoa 96-core CPUs scored an average of 126,940 across 9 tests. Compared to the Intel System
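The sketch below turns the leaderboard numbers into a percentage gain and a per-core score. Per-core normalization assumes the score scales linearly with core count, which is a simplification, and the two averages come from different numbers of runs (6 versus 9), so treat the output as a rough comparison.

```python
def pct_gain(new_score: float, old_score: float) -> float:
    """Percentage by which new_score exceeds old_score."""
    return (new_score / old_score - 1) * 100


def per_core(score: float, cores: int) -> float:
    """Score per physical core, assuming (simplistically) linear core scaling."""
    return score / cores


PREVIOUS_LEADER = (100_844, 2 * 64)  # 2x EPYC 7K83, average of 6 runs
GENOA = (126_940, 2 * 96)            # 2x EPYC Genoa 96-core, average of 9 runs

if __name__ == "__main__":
    print(f"Overall gain: {pct_gain(GENOA[0], PREVIOUS_LEADER[0]):.1f}%")
    print(f"Per core: {per_core(*GENOA):.0f} vs {per_core(*PREVIOUS_LEADER):.0f}")
```

On these figures, the Genoa system scores about 25.9 percent higher overall, while its per-core score (about 661) sits below that of the 7K83 system (about 788).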

Firefox Build from Source

Firefox, Mozilla's browser, is a huge open-source project. Mozilla lets you compile it from source yourself, and doing so has become a common way to compare CPU performance. The source download runs to several gigabytes, with thousands of files to compile.

In our tests, we were more than impressed with the Intel Xeon 8380 rig's time of 6 minutes 57 seconds, until we ran the build on the Genoa rig, which came in at an insanely fast 6 minutes 33 seconds. For comparison, a top-tier workstation will only just finish this task in under 10 minutes if you feed it a steady diet of liquid nitrogen and excess voltage, so these chips deliver serious raw horsepower right out of the gate.

Firefox Build from Source    Compile Time (m:ss)
2 x AMD 9654 96-Core         6:33.85
2 x Intel 8380 40-Core       6:57.85
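Converting the times to seconds puts the gap in perspective. The sketch below parses the m:ss.ss times from the table and compares the resulting speedup with the 192-to-80 core-count ratio.

```python
def to_seconds(mmss: str) -> float:
    """Convert an 'm:ss.ss' time string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


AMD_TIME = to_seconds("6:33.85")    # 2 x AMD 9654, 192 cores
INTEL_TIME = to_seconds("6:57.85")  # 2 x Intel 8380, 80 cores

if __name__ == "__main__":
    print(f"Time saved: {INTEL_TIME - AMD_TIME:.2f} s")
    print(f"Speedup: {INTEL_TIME / AMD_TIME:.3f}x vs. core-count ratio {192 / 80:.1f}x")
```

The result is a speedup of roughly 1.06x (24 seconds) against a 2.4x core-count advantage, which suggests this build does not scale anywhere near linearly with core count on these systems.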

Blender – CLI benchmark

Blender Benchmark is an established standard in the CPU and GPU benchmarking scene. Blender is a highly advanced open-source 3D modeling and animation tool and is considered a leader in the space. In keeping with the theme of the Genoa EPYC processors, we use it to show how a high-core-count CPU could potentially stand in for a GPU in high-density rack deployments.

The Blender benchmark consists of three scenes, Monster, Junkshop, and Classroom, which are rendered sequentially. Each scene receives its own score, and the three are added up for a total score.

Blender Test   2 x AMD 9654 96-Core   2 x Intel 8380 40-Core
Monster        1788.189128            671.145395
Junkshop       1062.533142            407.141514
Classroom      850.646333             320.507039
Total          3701.368603            1398.793948

The Genoa rig scored a crushing 3701 total, with 1788.2 in Monster, 1062.5 in Junkshop, and 850.6 in Classroom. Comparing Genoa to the Intel Xeon Platinum may seem unfair in some ways, since Intel comes to bat with just under 42 percent of the core count. However, if we normalize the data for the difference in core count, the results get interesting: per core, the AMD Genoa chip is still about 10 percent faster than the Intel rig, likely thanks to its newer architecture, instruction sets, and use of DDR5.

           2 x AMD 9654 96-Core   Intel as Percent of AMD   2 x Intel 8380 40-Core
Cores      192                    41.67 percent             80
Threads    384                    41.67 percent             160

                                      2 x AMD 9654 96-Core   2 x Intel 8380 40-Core
Blender Total Score                   3701                   1399
Cores / Threads                       192/384                80/160
Intel/AMD Core Count                  41.67 percent
Direct Score Comparison, Intel/AMD    37.79 percent
Core-Normalized AMD Score             1542
Relative Intel/AMD, Core-Normalized   90.70 percent
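The normalization in the table can be reproduced as follows. The method assumes the Blender score scales linearly with physical core count, which is a simplifying assumption rather than something we measured; under it, the Intel rig delivers about 90.7 percent of Genoa's per-core throughput, meaning Genoa is roughly 10 percent faster per core.

```python
AMD_CORES, INTEL_CORES = 192, 80

# Per-scene Blender scores from the table above.
AMD_SCENES = {"Monster": 1788.189128, "Junkshop": 1062.533142, "Classroom": 850.646333}
INTEL_SCENES = {"Monster": 671.145395, "Junkshop": 407.141514, "Classroom": 320.507039}

amd_total = sum(AMD_SCENES.values())
intel_total = sum(INTEL_SCENES.values())

core_ratio = INTEL_CORES / AMD_CORES      # Intel core count as a fraction of AMD's
direct_ratio = intel_total / amd_total    # raw Intel/AMD score ratio
normalized_amd = amd_total * core_ratio   # AMD score scaled to 80 cores (assumes linear scaling)
relative = intel_total / normalized_amd   # Intel vs. core-normalized AMD

if __name__ == "__main__":
    print(f"Core ratio: {core_ratio:.2%}")
    print(f"Direct score ratio: {direct_ratio:.2%}")
    print(f"Core-normalized AMD score: {normalized_amd:.0f}")
    print(f"Relative, core-normalized: {relative:.2%} "
          f"(AMD ~{1 / relative - 1:.1%} faster per core)")
```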

Cinebench R23

Cinebench by Maxon has been a benchmarking mainstay for some time now, thanks to its standardized testing methodology and its use of real-world tests to measure multicore and single-core performance. R23, the latest iteration of Cinebench, does have a limitation we have not had to work around before: it can only benchmark up to 256 threads, and our test rig has 384. Interestingly, we ran into a lot of "standard" benchmarks and applications that were capped at 256 threads, so Cinebench is not alone in needing an update for the ultra-high-core-count future we are heading toward.

To work around this limitation, we ran two instances simultaneously and capped each at 192 threads to split the load evenly. Normally you can set CPU affinity in Task Manager, but something blocked this operation with Cinebench; we suspect a flag set in the way the application calls the API for CPU priority. We tried running it as a less-privileged user and launching it with the "start /affinity NODE 0" command-line flag to force it, but we could not lock the application to a single NUMA node.
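An even split of 384 threads works out to 192 per instance. Part of what makes pinning tricky at this scale is that Windows organizes logical processors into processor groups of at most 64, and an affinity mask is a 64-bit bitmask in which bit n selects logical processor n within a group, so a 192-thread slice spans several groups and cannot be described by one mask. The sketch below splits the threads evenly and lists the per-group masks each instance would need. It assumes, hypothetically, that logical processors are numbered contiguously and packed into full 64-processor groups; the real layout depends on the NUMA topology Windows detects, and we did not verify it on this system.

```python
from __future__ import annotations

GROUP_SIZE = 64  # Windows processor groups contain at most 64 logical processors


def split_evenly(total_threads: int, instances: int) -> list[range]:
    """Split logical processors 0..total_threads-1 into equal, contiguous slices."""
    if total_threads % instances:
        raise ValueError("thread count does not divide evenly across instances")
    size = total_threads // instances
    return [range(i * size, (i + 1) * size) for i in range(instances)]


def group_masks(cpus: range, group_size: int = GROUP_SIZE) -> dict[int, str]:
    """Hex affinity mask for each processor group that a slice touches.

    Hypothetical layout: logical processor n is assumed to sit in group
    n // group_size at bit n % group_size.
    """
    masks: dict[int, int] = {}
    for cpu in cpus:
        group, bit = divmod(cpu, group_size)
        masks[group] = masks.get(group, 0) | (1 << bit)
    return {group: format(mask, "X") for group, mask in sorted(masks.items())}


if __name__ == "__main__":
    for index, cpus in enumerate(split_evenly(384, 2), start=1):
        print(f"Instance {index}: {len(cpus)} threads, masks {group_masks(cpus)}")
```

Under that assumed layout, each instance spans three full groups, with a mask of FFFFFFFFFFFFFFFF in each.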

Unable to assign a specific affinity, we simply launched two instances of the application together. The two instances produced drastically different scores; however, CPU monitoring showed utilization bouncing between 80 and 100 percent during the test.

Cinebench R23, Single Instance   2 x AMD 9654 96-Core   2 x Intel 8380 40-Core
Multi-Thread (256-thread cap)    85,160                 70,540
Single Core                      972                    985
MP Ratio                         87.65x                 71.63x

2 Cinebench Instances    AMD Test Run 1   AMD Test Run 2
Score, 1st instance      82,063           68,231
Score, 2nd instance      57,557           57,221
Total                    139,620          125,452
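Summing each run's two instances and comparing the total against the single capped instance gives a sense of how much performance the 256-thread cap leaves on the table. The sketch below uses only the scores above; because the two concurrent instances were not pinned and scored very unevenly, these totals are indicative rather than an official Cinebench result.

```python
from __future__ import annotations

SINGLE_INSTANCE_SCORE = 85_160  # AMD, one instance, 256-thread cap

# Scores of the two concurrent instances in each AMD test run.
RUNS = {
    "AMD Test Run 1": (82_063, 57_557),
    "AMD Test Run 2": (68_231, 57_221),
}


def combined(scores: tuple[int, ...]) -> int:
    """Total score across concurrently running instances."""
    return sum(scores)


if __name__ == "__main__":
    for run, scores in RUNS.items():
        total = combined(scores)
        print(f"{run}: {total:,} ({total / SINGLE_INSTANCE_SCORE:.2f}x single instance)")
```

Run 1 comes to about 1.64x the single-instance score and Run 2 to about 1.47x.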

Final Thoughts

Benchmarking the AMD EPYC 9004 CPUs has been an interesting exercise, to say the least. The challenges we faced early in the review process reflect the broader challenges software developers will have to address as the CPU landscape shifts toward ultra-high-density designs. While some off-the-shelf applications can use all of these threads, we increasingly ran into software that could not scale past certain thread-count thresholds.

In the lab, we are working on some homegrown TensorFlow machine learning benchmarks to test these new CPUs in real-world scenarios. We will follow up with results once we are confident in the new application and have validated it across multiple platforms and CPU generations.

For now, though, the launch of AMD Genoa is very exciting, as we've seen so far with the Quanta server. HPE and Dell have also announced their servers, each offering four systems: two single-socket and two dual-socket chassis. This will bring AMD Genoa to the enterprise immediately, quickly expanding Genoa's footprint beyond the hyperscalers.

The obvious question, then, is whether AMD Genoa is worth the investment. Justifying the spend will come down to workload, but just like DPUs for VMware, these new CPU technologies have a lot to offer in compute power, security, and efficiency. Replacing 3rd Gen EPYC with these is probably a bit premature, but anyone who's been waiting for a reason to jump should be very happy with what Genoa brings to the table.

We have much more testing to do, and with Intel Sapphire Rapids coming soon, we'll want to compare the best each has to offer. For now, though, AMD Genoa is extremely compelling and should be part of any infrastructure refresh PoC so organizations can better understand the impact of all the cores and efficiencies AMD has to offer.

Engage with StorageReview
