The AMD EPYC 9754S is designed for HPC workloads with SMT disabled, delivering 128 cores and 128 threads with a default TDP of 360W.
Last year AMD expanded its server CPU line with 4th Gen EPYC. While the 128-core, 256-thread EPYC 9754 took top billing, just under it on the SKU matrix is the AMD EPYC 9754S. The difference between the two chips is simple, yet dramatic: the 9754S has Simultaneous Multithreading (SMT) disabled. It delivers the same 128 cores as the 9754, but only 128 threads instead of 256. In exchange, the 9754S carries a modest discount, which is a straightforward saving for customers who already run with SMT disabled.
Model | Cores | Max Threads | Default TDP | Base Freq. (GHz) | Boost Freq. (GHz) | L3 Cache (MB) |
---|---|---|---|---|---|---|
9754 | 128 | 256 | 360W | 2.25 | 3.10 | 256 |
9754S | 128 | 128 | 360W | 2.25 | 3.10 | 256 |
9734 | 112 | 224 | 320W | 2.2 | 3.0 | 256 |
What Is AMD SMT and Why Does the 9754S Exist?
With SMT, a single EPYC CPU core can process two threads simultaneously, which can lead to more efficient use of the processor’s resources. When one thread is waiting for data to be loaded from memory or is otherwise stalled, the other thread can execute instructions. The core therefore spends less time idle, which can improve throughput, especially in use cases like virtualization and rendering.
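The latency-hiding effect can be illustrated with a toy cycle-by-cycle model. This is a simplification assumed purely for illustration, not how a Zen 4 core actually schedules work: each hypothetical thread alternates `compute` cycles of execution with `stall` cycles of waiting on memory, and the core runs at most one ready thread per cycle.

```python
def core_utilization(threads: int, compute: int, stall: int, cycles: int) -> float:
    """Fraction of cycles in which the core did useful work (toy model).

    Assumption for illustration only: each thread needs `compute` cycles of
    execution, then waits `stall` cycles on memory, and repeats. The core runs
    at most one ready thread per cycle; real SMT cores can issue from both
    threads in the same cycle across multiple execution units.
    """
    compute_left = [compute] * threads  # cycles left in each thread's burst
    stall_left = [0] * threads          # cycles left waiting on memory
    busy = 0
    for _ in range(cycles):
        ready = [t for t in range(threads) if stall_left[t] == 0]
        running = ready[0] if ready else None
        if running is not None:
            busy += 1
            compute_left[running] -= 1
        # Stalled threads make progress on their memory wait every cycle.
        for t in range(threads):
            if stall_left[t] > 0:
                stall_left[t] -= 1
        # A thread that just finished its burst starts waiting on memory.
        if running is not None and compute_left[running] == 0:
            compute_left[running] = compute
            stall_left[running] = stall
    return busy / cycles


if __name__ == "__main__":
    for threads in (1, 2):
        u = core_utilization(threads, compute=4, stall=12, cycles=1600)
        print(f"{threads} thread(s), memory-heavy: {u:.0%} busy")
    for threads in (1, 2):
        u = core_utilization(threads, compute=12, stall=4, cycles=1600)
        print(f"{threads} thread(s), compute-heavy: {u:.0%} busy")
```

In this model a second thread doubles utilization for the memory-heavy thread (25% to 50%) but only raises it from 75% to 100% for the compute-heavy one: SMT helps most when a single thread leaves the core idle.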
Shipping a chip with SMT permanently disabled lets a manufacturer sell it as a distinct, lower-priced SKU that still meets specific performance and stability criteria. Whether such a SKU exists can depend on binning, on market segmentation, and on demand for particular performance or efficiency profiles, all of which factor into how manufacturers plan and position their product lines.
That said, not every workload benefits from SMT, and AMD servers are often run with SMT disabled in the BIOS. That tweak can be effective, but it raises an important point: the 9754S, with SMT disabled from the factory, costs a little less than the 9754. Either way, single-threaded applications, compute-heavy workloads, and any use case where CPU latency is critical can benefit from having SMT disabled.
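On Linux, one way to confirm whether a BIOS SMT setting took effect is the kernel's `/sys/devices/system/cpu/smt/active` file, which reads `1` when SMT is active and `0` otherwise. A minimal sketch, assuming a kernel that exposes that file (it does not exist on Windows, which our test platform ran):

```python
from pathlib import Path
from typing import Optional

# Assumption: a Linux kernel that exposes SMT state in sysfs
# (present on modern kernels; absent on Windows and older kernels).
SMT_ACTIVE_PATH = Path("/sys/devices/system/cpu/smt/active")


def parse_smt_active(text: str) -> bool:
    """The sysfs file contains "1" when SMT is active and "0" when it is not."""
    return text.strip() == "1"


def smt_active(path: Path = SMT_ACTIVE_PATH) -> Optional[bool]:
    """Return the kernel-reported SMT state, or None if it cannot be read."""
    try:
        return parse_smt_active(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    state = smt_active()
    if state is None:
        print("SMT state not exposed by this OS/kernel")
    else:
        print("SMT active" if state else "SMT inactive")
```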
AMD EPYC 9754S vs EPYC 9754 Performance
We pulled two of our regular tests, y-cruncher and Cinebench 2024, to see what performance differences we get with and without SMT. We ran the 9754S against the 9754, testing the 9754 with SMT both on and off, to see whether the 9754S offers any advantage over simply disabling SMT.
Test Platform and Specs:
- TYAN Transport HX TN85-B8261
- 512GB DDR5
- Windows Server 2022
Cinebench 2024
First up is Cinebench 2024, with SMT enabled on our non-S 9754. The two systems land within run-to-run variation of each other.
Cinebench 2024 CPU | 2x EPYC 9754S | 2x EPYC 9754 (SMT On) |
---|---|---|
CPU Multi-Core | 2,682 | 2,587 |
CPU Single-Core | 68 | 69 |
MP Ratio | 39.19x | 37.64x |
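Cinebench's MP Ratio is the multi-core score divided by the single-core score. A short sketch that recomputes it from the table, along with the multi-core gap between the two systems. Because the table's single-core scores are rounded to whole numbers, the recomputed ratios (about 39.44x and 37.49x) differ slightly from the reported 39.19x and 37.64x; we assume Cinebench computes the ratio from unrounded scores.

```python
# Scores as published in the Cinebench 2024 table.
RESULTS = {
    "2x EPYC 9754S": {"multi": 2682, "single": 68},
    "2x EPYC 9754 (SMT On)": {"multi": 2587, "single": 69},
}


def mp_ratio(multi: float, single: float) -> float:
    """Multi-core score divided by single-core score."""
    return multi / single


def pct_diff(a: float, b: float) -> float:
    """Percent by which score a exceeds score b."""
    return (a - b) / b * 100


for name, r in RESULTS.items():
    print(f"{name}: MP ratio ~ {mp_ratio(r['multi'], r['single']):.2f}x")

print(f"Multi-core gap: {pct_diff(2682, 2587):.2f}%")
```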
We selected y-cruncher because of how the program is architected: it behaves as a total-system test. By scaling up to the largest Pi calculation that fits into system memory, we set out to confirm our long-standing intuition that SMT can hurt CPU- and memory-bound workloads. Let’s look at the results first before diving into what they mean.
y-cruncher 0.8.3
y-cruncher 0.8.3 Total Computation Time in seconds (lower is better)
Pi Digits | 2x EPYC 9754S | 2x EPYC 9754 (SMT Off) | 2x EPYC 9754 (SMT On) | 9754S Time Saved vs. 9754 (SMT On) |
---|---|---|---|---|
1 Billion | 13.481 | 13.546 | 14.139 | 4.65% |
2.5 Billion | 23.818 | 24.144 | 28.111 | 15.27% |
5 Billion | 40.760 | 40.797 | 49.271 | 17.27% |
10 Billion | 77.409 | 77.959 | 95.420 | 18.88% |
25 Billion | 203.303 | 202.124 | 233.629 | 12.98% |
50 Billion | 475.557 | 476.949 | 520.349 | 8.61% |
100 Billion | 1,248.458 | 1,251.36 | 1,242.419 | -0.49% |
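The last column can be reproduced as the share of the 9754 SMT-on computation time that the 9754S saves, (T_on - T_9754S) / T_on. Note that it compares the 9754S against the 9754 with SMT on, not against the 9754 with SMT off. A minimal sketch that recomputes it from the 0.8.3 times:

```python
# y-cruncher 0.8.3 total computation times in seconds, from the table:
# Pi digits -> (2x EPYC 9754S, 2x EPYC 9754 SMT Off, 2x EPYC 9754 SMT On)
TIMES_083 = {
    "1 Billion": (13.481, 13.546, 14.139),
    "2.5 Billion": (23.818, 24.144, 28.111),
    "5 Billion": (40.760, 40.797, 49.271),
    "10 Billion": (77.409, 77.959, 95.420),
    "25 Billion": (203.303, 202.124, 233.629),
    "50 Billion": (475.557, 476.949, 520.349),
    "100 Billion": (1248.458, 1251.36, 1242.419),
}


def time_saved_pct(candidate: float, baseline: float) -> float:
    """Percent of the baseline's computation time saved by the candidate."""
    return (baseline - candidate) / baseline * 100


for digits, (t_9754s, _t_off, t_on) in TIMES_083.items():
    print(f"{digits:>12}: {time_saved_pct(t_9754s, t_on):6.2f}%")
```

Expressed this way the metric is a time reduction, not a speedup; the equivalent speedup is T_on / T_9754S, for example 95.420 / 77.409, or about 1.23x, at 10 billion digits.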
y-cruncher 0.8.4
y-cruncher 0.8.4 Total Computation Time in seconds (lower is better)
Pi Digits | 2x EPYC 9754S | 2x EPYC 9754 (SMT Off) | 2x EPYC 9754 (SMT On) | 9754S Time Saved vs. 9754 (SMT On) |
---|---|---|---|---|
1 Billion | 13.480 | 13.56 | 14.573 | 7.50% |
2.5 Billion | 23.680 | 23.501 | 28.649 | 17.34% |
5 Billion | 40.819 | 40.547 | 50.082 | 18.50% |
10 Billion | 78.523 | 77.466 | 93.842 | 16.32% |
25 Billion | 206.399 | 206.078 | 236.070 | 12.57% |
50 Billion | 483.797 | 482.79 | 521.867 | 7.29% |
100 Billion | 1,269.484 | 1,266.83 | 1,253.446 | -1.28% |
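To check how closely the 9754S tracks a 9754 with SMT disabled in the BIOS, this sketch computes the relative difference between those two columns in both y-cruncher tables and reports the largest one per version:

```python
# (2x EPYC 9754S, 2x EPYC 9754 SMT Off) times in seconds from the two tables,
# ordered 1, 2.5, 5, 10, 25, 50, and 100 billion digits.
PAIRS = {
    "0.8.3": [(13.481, 13.546), (23.818, 24.144), (40.760, 40.797),
              (77.409, 77.959), (203.303, 202.124), (475.557, 476.949),
              (1248.458, 1251.36)],
    "0.8.4": [(13.480, 13.56), (23.680, 23.501), (40.819, 40.547),
              (78.523, 77.466), (206.399, 206.078), (483.797, 482.79),
              (1269.484, 1266.83)],
}


def rel_diff_pct(a: float, b: float) -> float:
    """Percent by which time a differs from time b (positive: a is slower)."""
    return (a - b) / b * 100


def largest_gap(pairs: list[tuple[float, float]]) -> float:
    """Relative difference with the largest magnitude across all sizes."""
    return max((rel_diff_pct(s, off) for s, off in pairs), key=abs)


for version, pairs in PAIRS.items():
    print(f"y-cruncher {version}: largest 9754S vs 9754 SMT Off gap "
          f"{largest_gap(pairs):+.2f}%")
```

Both versions stay within roughly 1.4% in either direction, with no consistent winner across versions.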
Results Analysis
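In our y-cruncher runs, the 9754S and the 9754 with SMT off finished within about 1.4% of each other at every size. Enabling SMT on the 9754, by contrast, added time from 1 billion through 50 billion digits: the 9754S cut total computation time by roughly 4.7% to 18.9% versus the SMT-on configuration. At 100 billion digits the gap disappeared, and SMT on finished marginally faster. These results feed into a broader debate in the tech community about what SMT does to system performance. At first glance, SMT looks like a straightforward choice for anyone after more performance. The theory goes: if enabling SMT can deliver ideal scaling, why not embrace it?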
The relationship between SMT efficiency and core architecture isn’t black and white. Lackluster SMT scaling doesn’t necessarily point to a flaw in the SMT implementation. It can instead indicate a strong core design that already keeps its execution resources busy with a single thread, leaving little idle capacity for a second thread to use. This paradox underscores a crucial point: processor manufacturers can’t claim a one-size-fits-all benefit from SMT or similar technologies. SMT can squeeze out additional performance in certain use cases, but it has shortcomings in others.
In high-performance computing and supercomputing tasks, the limitations of SMT become more apparent. Doubling the thread count per core is not the same as doubling the cores: the two threads share one core’s execution units and caches. In extreme cases, this leads to performance dips as the threads compete for cache. Nonetheless, for the majority of multi-threaded applications, especially those without heavy cache contention, SMT lifts performance, and it shines most in tasks that can keep both threads productively busy.
Closing Thoughts
AMD SMT is incredibly useful for a wide variety of workloads common in the enterprise, but not every workload needs or benefits from it. Our testing shows that the 9754S performs in line with a 9754 running with SMT disabled; by taking advantage of variations in manufacturing, AMD delivers a solid product with a unique value proposition. Organizations designing platforms for workloads that run best on physical cores without SMT can save a little money by buying the AMD EPYC 9754S, which ships with SMT permanently disabled from the factory.
Engage with StorageReview