November 14, 2022, 12:14 |
AMD Epyc 9004 "Genoa" buyers guide for CFD
|
#1 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
1. Introduction
AMD recently unveiled/released the latest series of server CPUs based on the Zen4 architecture. I won't go into all details here, plenty of news sites covered the launch in detail. E.g. https://www.servethehome.com/amd-epy...nning-fashion/ From a CFD perspective, these are the important bits:
Variety of relevant benchmarks:
Ansys (Fluent, CFX, LS-Dyna, Mechanical): https://www.amd.com/system/files/doc...nerational.pdf
Siemens STAR-CCM+: https://blogs.sw.siemens.com/simcent...cfd-benchmark/
2. SKU list
Here is a list of all Genoa CPUs announced so far; source: https://www.servethehome.com/amd-epy...fficial-table/
Genoa-X CPUs with increased last-level cache were not part of this launch, but are expected to launch in 2023. So keep an eye out for them if you need even more performance.
3. Pitfalls and CPUs to avoid
Just like in the previous generations, the connection between the compute dies (CCDs) and the I/O die (IOD), where the memory controllers reside, can be a bottleneck. For 2nd gen Epyc Rome, we had CPUs with effectively half the total memory bandwidth thanks to this bottleneck.
The GMI3 links between the CCDs and the IOD did not undergo major changes. If my napkin math is correct, the bandwidth per link is 57.6GB/s for reads and 22.8GB/s for writes, which means it takes 8 of these links to match the memory bandwidth. Remember: reads are more important than writes in most cases, so it doesn't matter that the write bandwidth lags behind.
AMD has a trick up their sleeve: the CPUs with only 4 CCDs can be connected to the IOD with 2 GMI3 links each, which is enough to utilize the full potential of the memory subsystem. So in theory, the full stack of CPUs launched so far should not hide any nasty surprises.
CAVEAT: "should". The wording in AMD's official slides is not definitive enough for my personal taste. It is stated that CPUs with 4 CCDs can utilize 2 GMI3 links per CCD. Not that all of them necessarily do. Maybe I'm just too paranoid, decide for yourself...
And there are the 2 low-end CPUs with only 64MB of L3 cache. How this is achieved remains to be seen. Up until now, 64MB L3 cache on an Epyc CPU meant only 2 CCDs are active.
That wasn't enough to get the full bandwidth in previous generations, and still isn't enough in this generation, even with two GMI3 links per CCD. The table above lists them as 4 CCD parts. AMD's official website lists all CPUs with a memory bandwidth of 460.8GB/s. But until detailed benchmarks for some of the lower-end parts are out, I would treat that information with some skepticism.
Edit: after sifting through AMD's own technical documentation, I can confirm that ALL CPUs launched so far consist of at least 4 active CCDs. Even those with 64MB of L3 cache.
Regardless, the value proposition for the lowest-end SKUs 9224 and 9124 just isn't there. The platform cost is fairly high thanks to PCIe5 and DDR5. And the halved L3 cache will have a negative performance impact. In this price and core count range, you are likely better off with some discounted parts from previous generations.
And until further benchmarks or confirmations are available, I reserve final judgement for all SKUs listed above with a "4+1" configuration. I will post an update if/when that happens.
4. Preliminary recommendations
Per-core licensed solvers (e.g. Ansys Fluent):
The F-SKUs (9474F, 9374F, 9274F) with higher clock speeds are the obvious choice here if you can justify the price tag with the license costs. Just like in the last generation, the 16-core 9174F is more expensive than the 24-core part. Better avoid it. If you need better value, the 32-core 9354 is the way to go. Or wait until Genoa-X is available.
Solvers without license constraints (e.g. OpenFOAM):
Factoring in the significant platform costs, the 32-core 9354 stands out again for best value. Though if your budget allows it, you can definitely go up to 48 (9454) or even 64 cores (9554) per CPU with this generation.
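For anyone who wants to redo the napkin math from section 3, here is a minimal Python sketch. The 57.6GB/s per-link read figure and the 460.8GB/s memory bandwidth are the numbers quoted in this post. The breakdown into 12 channels of DDR5-4800 with 8 bytes per transfer is my assumption to reproduce that total, and the function names are made up for illustration. This is a ceiling estimate only, not a benchmark.

```python
import math

GMI3_READ_GBS = 57.6   # per-link read bandwidth quoted in the post
MEM_BW_GBS = 460.8     # memory bandwidth per CPU quoted in the post


def ddr_bandwidth_gbs(channels=12, mega_transfers=4800, bytes_per_transfer=8):
    """Theoretical DRAM bandwidth in GB/s.

    Assumption: 12 channels of DDR5-4800 moving 8 bytes per transfer.
    """
    return channels * mega_transfers * bytes_per_transfer / 1000


def links_needed(mem_bw=MEM_BW_GBS, link_bw=GMI3_READ_GBS):
    """Number of GMI3 links whose combined read bandwidth covers mem_bw."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(mem_bw / link_bw, 6))


def read_bandwidth_ceiling(ccds, links_per_ccd,
                           mem_bw=MEM_BW_GBS, link_bw=GMI3_READ_GBS):
    """Upper bound on read bandwidth: the slower of DRAM and the CCD links."""
    return min(mem_bw, ccds * links_per_ccd * link_bw)


if __name__ == "__main__":
    print("DRAM bandwidth:", ddr_bandwidth_gbs(), "GB/s")
    print("GMI3 links needed:", links_needed())
    for ccds, links in [(2, 2), (4, 2), (8, 1)]:
        ceiling = read_bandwidth_ceiling(ccds, links)
        print(f"{ccds} CCDs x {links} link(s): {ceiling:.1f} GB/s read ceiling")
```

With these inputs, 4 CCDs with 2 links each reach the full memory bandwidth, while a hypothetical 2 CCD part with 2 links each would top out at half of it.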
Whether the lower-end parts with 4 CCDs and 128MB of L3 cache are worth considering depends on 2 factors: platform costs (DDR5, motherboards) need to come down, and we need confirmation that these parts have full bandwidth. Until then, there might be better value in Milan and Rome.
The single-socket CPUs 9454P (48 cores) and 9354P (32 cores) are worth considering. At least on paper, they should provide similar performance to a dual-socket solution from the previous generation. And this time, all CPUs with a P suffix have 256MB of L3 cache.
5. Cooling and power
With up to 400W TDP for a single CPU, power consumption and cooling requirements are significantly higher than for the previous generation. For somewhat quiet cooling in a dual-socket workstation, water cooling will be almost mandatory. And we will definitely see more water cooling solutions for servers too. Though it will be interesting to see how much power efficiency can be gained by dialing back the cTDP on these CPUs for CFD workloads.
6. Availability
7. Memory considerations
As always, you need at least one DIMM for each memory channel. So 12 for a single CPU, 24 for two CPUs. Don't mix and match different modules if you want maximum performance. And don't use weird configurations like 20 DIMMs per CPU. It's either one or two DIMMs per channel.
An upside of DDR5, if AMD's official slides are to be believed: the performance hit from single-rank DIMMs is significantly reduced compared to DDR4. It's not gone, but within margin of error for application benchmarks.
genoa_bandwidth.jpg source: https://www.servethehome.com/wp-cont...-Bandwidth.jpg
Regarding the number of DIMMs per channel: if the officially supported memory speeds on Zen4 desktop CPUs are any indication, better keep it at one DIMM per channel. Ryzen 7000 CPUs drop supported memory speeds from DDR5-5200 down to DDR5-3600 for everything beyond one dual-rank DIMM per channel. Even for two single-rank DIMMs per channel.
ryzen700_memory.png source: https://www.amd.com/en/products/cpu/amd-ryzen-7-7700x
That's not too much of an issue on the desktop platform because memory overclocking is a thing. But bumping memory speeds beyond the officially supported specs can be tricky to impossible on server platforms. Things might be different on the server platform, but better safe than sorry at this point.
Useful links
All AMD guides: https://www.amd.com/en/processors/tu...pe%3Aepyc_9004
Workload tuning guide: https://www.amd.com/system/files/doc...d-workload.pdf
Compiler and HPC optimization: https://www.amd.com/system/files/doc...-toolchain.pdf
Comments and discussions are welcome!
Last edited by flotus1; November 16, 2022 at 08:23. |
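The DIMM population rule from section 7 (12 channels per CPU, either one or two DIMMs per channel, nothing in between) can be written down as a small Python check. This is only a sketch of the rule as stated in this post; the function name is hypothetical and it knows nothing about ranks, speeds or vendor-specific population guides.

```python
CHANNELS_PER_CPU = 12  # from the post: 12 memory channels per Genoa CPU


def dimms_per_channel(dimms_per_cpu, channels=CHANNELS_PER_CPU):
    """Return DIMMs per channel for a balanced population, else raise ValueError."""
    if dimms_per_cpu % channels != 0:
        raise ValueError(
            f"{dimms_per_cpu} DIMMs cannot be spread evenly over {channels} channels"
        )
    per_channel = dimms_per_cpu // channels
    if per_channel not in (1, 2):
        raise ValueError("use either one or two DIMMs per channel")
    return per_channel


if __name__ == "__main__":
    for count in (12, 20, 24):
        try:
            print(count, "DIMMs per CPU ->", dimms_per_channel(count), "per channel")
        except ValueError as err:
            print(count, "DIMMs per CPU -> rejected:", err)
```

12 and 24 DIMMs per CPU pass, 20 gets rejected because it leaves some channels with two DIMMs and others with one.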
|
November 14, 2022, 13:43 |
|
#2 |
New Member
Yannick
Join Date: May 2018
Posts: 16
Rep Power: 8 |
Thank you for sharing your thoughts and knowledge, Alex!
I was considering getting one of these beauties. Fortunately, we have some good partners to get them early. Anyway, I was wondering if we can expect a large difference between a 2x9474F and a 2x9554 setup. If we consider the base frequency and the number of cores, the 9554 should deliver "only" around 15% more, while being more than 30% more expensive (CPU price only). The memory bandwidth will probably not be the limiting factor for either, right? Would you say a 9554 is really worth it compared to a 9474F, or are there other factors to take into consideration? I guess the rest of the setup can be exactly the same. By the way, I use Star-CCM+, although licensing is not an issue. Also, Siemens published some benchmarks on their website: https://blogs.sw.siemens.com/simcent...cfd-benchmark/ Sure, always take them with a grain of salt, but still very impressive results. |
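The "around 15%" estimate above is just cores times base frequency. A quick Python sketch of that arithmetic follows. The core counts come from this thread; the base clocks (3.6 GHz for the 9474F, 3.1 GHz for the 9554) are not stated here and are my assumption based on AMD's published specs, so double-check them. The model ignores boost clocks, cache and memory bandwidth limits entirely.

```python
def naive_throughput(cores, base_ghz):
    """Crude compute proxy: core count times base clock (core-GHz)."""
    return cores * base_ghz


# Core counts from the thread; base clocks are assumed, see the note above.
epyc_9474f = naive_throughput(cores=48, base_ghz=3.6)
epyc_9554 = naive_throughput(cores=64, base_ghz=3.1)

speedup = epyc_9554 / epyc_9474f

if __name__ == "__main__":
    print(f"9474F: {epyc_9474f:.1f} core-GHz")
    print(f"9554:  {epyc_9554:.1f} core-GHz")
    print(f"Naive advantage of the 9554: {speedup - 1:.1%}")
```

This comes out at roughly 15% in favor of the 9554, before any bandwidth limits kick in.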
|
November 14, 2022, 14:21 |
|
#3 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Thanks for the CCM+ benchmarks, I had not seen them yet.
That's a tough choice if you have the licenses for either of these CPUs. 9554 will be faster, just by how much... It will start to be limited by bandwidth, I am familiar with the scaling of CCM+. So definitely not 30% faster. 15% is probably a good estimate. I guess it depends on the quotes you get. If it's 15% more performance for 15% more total cost for the system, I would consider that worth it at the high-end. There are usually diminishing returns. Up to you (or the controllers at your company) if the budget is there. The entire rest of the system can indeed remain the same. |
|
November 14, 2022, 14:26 |
|
#4 |
New Member
Yannick
Join Date: May 2018
Posts: 16
Rep Power: 8 |
Thanks, Alex. Yeah, I asked for a quote for both setups, so let's see how much they charge or if they really want the prices listed above. Maybe the 9554 doesn't fit my budget anyway, then the choice is obvious.
Anyway, whenever I get the server, I am happy to post benchmarks in the other thread. |
|
November 15, 2022, 10:31 |
|
#5 | |
Member
Matt
Join Date: May 2011
Posts: 44
Rep Power: 15 |
Quote:
We're probably going to wait for Genoa-X to upgrade our Rome cluster; a 50-60% speedup per core with Genoa would be nice, but I expect that to be closer to 100% speedup (at least for smaller grids) with Genoa-X. Sapphire Rapids HBM may also be available around the same time as Genoa-X, which could be another option if it outperforms Genoa-X on a per-core basis (which is possible). |
||
January 11, 2023, 18:07 |
|
#6 |
New Member
Prince Edward Island
Join Date: May 2021
Posts: 26
Rep Power: 5 |
So on a per-core basis, Genoa is marginally faster (~1.33x) than previous generations (e.g. Rome)? So for example, if I get a Rome setup which isn't bandwidth saturated (e.g. 2x Epyc 7302 (32c)), that should be something like a 2 x 12 core Genoa setup?
|
|
January 12, 2023, 05:27 |
|
#7 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Hard to make these blanket statements about relative performance.
But I agree, this was one of the points I tried to get across: At the lower end, you are better off with discounted parts from the previous generations. It's just not worth getting Genoa with the added platform costs, when the CPUs can't benefit from 12xDDR5-4800. This definitely applies to 16-core CPUs. |
|
January 12, 2023, 12:25 |
|
#8 |
New Member
Prince Edward Island
Join Date: May 2021
Posts: 26
Rep Power: 5 |
I see. I wasn't quoting an exact figure, I was just using that number as an example. But with low core counts, performance for the Genoa processors should only be marginally higher than Rome?
|
|
January 16, 2023, 06:23 |
|
#9 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Depends on what we call marginal. Single-core performance is up compared to Milan, thanks to the new architecture and higher frequencies. And Milan was also a small improvement compared to Rome in that regard.
Maybe something on the order of 15-20% for each gen, if we have to put a number on it. But please don't quote me on that. Nobody really tests true single-core performance with these CPUs. |
|
|
|