CFD Online (www.cfd-online.com), Hardware forum

AMD Epyc 9004 "Genoa" buyers guide for CFD

#1 | flotus1 (Alex), Super Moderator, Germany | November 14, 2022, 12:14
1. Introduction

AMD recently released the latest series of server CPUs based on the Zen4 architecture. I won't go into every detail here; plenty of news sites have covered the launch in depth, e.g.:
https://www.servethehome.com/amd-epy...nning-fashion/

From a CFD perspective, these are the important bits:
  • New Zen4 architecture (higher IPC, higher clock speed, doubled L2 cache, AVX-512...)
  • Up to 96 cores
  • 12 channels of DDR5-4800, theoretical max. bandwidth 460.8 GB/s (~2.25x the 204.8 GB/s of the previous generation)
  • Up to 400 W TDP
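The bandwidth figure is simple arithmetic: channels × transfer rate (MT/s) × 8 bytes per 64-bit channel. A minimal Python sketch, assuming the previous-generation baseline is Rome/Milan with 8 channels of DDR4-3200; the function name is made up for illustration:

```python
def peak_memory_bandwidth_gbs(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes).

    Assumes a 64-bit (8-byte) data path per channel; DDR5 splits each channel
    into two 32-bit subchannels, which does not change the total.
    """
    mb_per_s = channels * mt_per_s * bytes_per_transfer  # MT/s x bytes = MB/s
    return mb_per_s / 1000


genoa = peak_memory_bandwidth_gbs(channels=12, mt_per_s=4800)  # 12x DDR5-4800
milan = peak_memory_bandwidth_gbs(channels=8, mt_per_s=3200)   # 8x DDR4-3200 (assumed baseline)

print(f"Genoa: {genoa:.1f} GB/s")       # Genoa: 460.8 GB/s
print(f"Rome/Milan: {milan:.1f} GB/s")  # Rome/Milan: 204.8 GB/s
print(f"Ratio: {genoa / milan:.2f}x")   # Ratio: 2.25x
```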
We don't really need benchmarks to see that this is the largest generational performance increase in a long time. These are the CPUs of choice for the highest performance CFD workstations and clusters.

A variety of relevant benchmarks:
Ansys (Fluent, CFX, LS-Dyna, Mechanical): https://www.amd.com/system/files/doc...nerational.pdf
Siemens STAR-CCM+: https://blogs.sw.siemens.com/simcent...cfd-benchmark/

2. SKU list

Here is a list of all Genoa CPUs announced so far:

source: https://www.servethehome.com/amd-epy...fficial-table/

Genoa-X CPUs with increased last-level cache were not part of this launch, but are expected in 2023. So keep an eye out for them if you need even more performance.


3. Pitfalls and CPUs to avoid

Just like in previous generations, the connection between the compute dies (CCDs) and the I/O die (IOD), where the memory controllers reside, can be a bottleneck. With 2nd gen Epyc Rome, some CPUs effectively had only half the total memory bandwidth because of it.
The GMI3 links between the CCDs and the IOD did not undergo major changes. If my napkin math is correct, each link provides 57.6 GB/s read and 22.8 GB/s write bandwidth, so 8 of these links are needed to match the memory bandwidth (8 x 57.6 GB/s = 460.8 GB/s). Remember: reads matter more than writes in most cases, so it doesn't matter much that write bandwidth lags behind.
AMD has a trick up its sleeve: CPUs with only 4 CCDs can connect each CCD to the IOD with 2 GMI3 links, which gives the 8 links needed to use the full potential of the memory subsystem. So in theory, none of the CPUs launched so far should hide any nasty surprises.
CAVEAT: note the word "should".
The wording in AMD's official slides is not definitive enough for my taste. They state that CPUs with 4 CCDs can use 2 GMI3 links per CCD, not that all of them necessarily do. Maybe I'm just too paranoid; decide for yourself...
Then there are the 2 low-end CPUs with only 64MB of L3 cache. How this is achieved remains to be seen. Until now, 64MB of L3 cache on an Epyc CPU meant that only 2 CCDs were active. That wasn't enough for full bandwidth in previous generations, and still isn't in this generation, even with two GMI3 links per CCD. The table above lists them as 4-CCD parts, and AMD's official website lists all CPUs with a memory bandwidth of 460.8 GB/s. But until detailed benchmarks for some of the lower-end parts are out, I would treat that information with some skepticism.
Edit: after sifting through AMD's own technical documentation, I can confirm that ALL CPUs launched so far have at least 4 active CCDs, even those with 64MB of L3 cache.
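The link arithmetic from this section can be sketched in a few lines of Python. This only uses the read figures quoted above (about 57.6 GB/s per GMI3 link, 460.8 GB/s of DRAM bandwidth) and ignores writes, latency and fabric clock variation; the CCD/link combinations are illustrative, not a list of actual SKUs:

```python
import math

# Figures quoted in the text above (read direction only), in MB/s to keep the math exact.
GMI3_READ_MBS = 57_600    # ~57.6 GB/s per GMI3 link
DRAM_BW_MBS = 460_800     # 12x DDR5-4800


def links_needed(dram_mbs: int = DRAM_BW_MBS, link_mbs: int = GMI3_READ_MBS) -> int:
    """Minimum number of GMI3 links whose combined read bandwidth covers DRAM bandwidth."""
    return math.ceil(dram_mbs / link_mbs)


def read_bw_cap_gbs(ccds: int, links_per_ccd: int) -> float:
    """Upper bound on read bandwidth in GB/s: the smaller of total link and DRAM bandwidth."""
    link_total = ccds * links_per_ccd * GMI3_READ_MBS
    return min(link_total, DRAM_BW_MBS) / 1000


print("links needed:", links_needed())  # links needed: 8
for ccds, links in [(2, 2), (4, 1), (4, 2), (8, 1), (12, 1)]:
    print(f"{ccds:>2} CCDs x {links} link(s): {read_bw_cap_gbs(ccds, links):.1f} GB/s")
```

With 2 CCDs, even two links per CCD cap reads at about 230 GB/s, half of what the memory can deliver, which is why the 64MB L3 parts looked suspicious until the 4-CCD layout was confirmed.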

Regardless, the value proposition for the lowest-end SKUs, 9224 and 9124, just isn't there. Platform costs are fairly high because of PCIe 5.0 and DDR5, and the halved L3 cache will hurt performance. In this price and core-count range, you are likely better off with discounted parts from previous generations.
And until further benchmarks or confirmations are available, I reserve final judgement on all SKUs listed above with a "4+1" configuration. I will post an update if/when that happens.


4. Preliminary recommendations

Per-core licensed solvers (e.g. Ansys Fluent):
The F-SKUs (9474F, 9374F, 9274F) with higher clock speeds are the obvious choice here if you can justify the price tag with the license costs. Just like in the last generation, the 16-core 9174F is more expensive than the 24-core 9274F. Better to avoid it.
If you need better value, the 32-core 9354 is the way to go.
Or wait until Genoa-X is available.
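To see when an F-SKU pays off under per-core licensing, compare total cost (hardware plus licenses) per unit of throughput. A minimal sketch; every price and the 15% throughput advantage below are HYPOTHETICAL placeholders, not quotes or benchmark results. Both options use the same core count (e.g. 2x 9374F vs 2x 9354, both 32-core parts), so the license bill is identical:

```python
def cost_per_throughput(hardware: float, cores: int, license_per_core: float,
                        throughput: float) -> float:
    """Total cost (hardware + per-core licenses) divided by relative throughput; lower is better."""
    return (hardware + cores * license_per_core) / throughput


CORES = 64  # 2 sockets x 32 cores for both options

# HYPOTHETICAL numbers for illustration only; substitute your own quotes.
f_sku = dict(hardware=28_000, throughput=1.15)     # higher-clocked F-SKU system
standard = dict(hardware=24_000, throughput=1.00)  # standard-SKU system

for license_per_core in (0, 1_500):
    a = cost_per_throughput(f_sku["hardware"], CORES, license_per_core, f_sku["throughput"])
    b = cost_per_throughput(standard["hardware"], CORES, license_per_core, standard["throughput"])
    winner = "F-SKU" if a < b else "standard SKU"
    print(f"license/core {license_per_core:>5}: F-SKU {a:,.0f} vs standard {b:,.0f} -> {winner}")
```

Without licenses the cheaper system wins; once per-core license costs dominate the budget, the faster cores win.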

Solvers without license constraints, e.g. OpenFOAM
Factoring in the significant platform costs, the 32-core 9354 stands out again for best value. Though if your budget allows it, you can definitely go up to 48 (9454) or even 64 cores (9554) per CPU with this generation.
Whether the lower-end parts with 4 CCDs and 128MB of L3 cache are worth considering depends on 2 factors: platform costs (DDR5, motherboards) need to come down, and we need confirmation that these parts get full bandwidth. Until then, there might be better value in Milan and Rome.

Single-socket CPUs
The 9454P (48 cores) and 9354P (32 cores) are worth considering. At least on paper, they should provide performance similar to a dual-socket system from the previous generation. And this time, all CPUs with a P suffix have 256MB of L3 cache.
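On paper, this comparison is mostly about memory bandwidth. A sketch comparing one Genoa socket with a previous-generation dual-socket system, assuming 8x DDR4-3200 per Milan socket and, hypothetically, 32 cores per Milan socket:

```python
def system_bandwidth_gbs(sockets: int, channels_per_socket: int, mt_per_s: int) -> float:
    """Theoretical peak memory bandwidth of the whole system in GB/s (8 bytes per transfer)."""
    return sockets * channels_per_socket * mt_per_s * 8 / 1000


configs = {
    "1x 9454P (Genoa)": (system_bandwidth_gbs(1, 12, 4800), 48),
    "1x 9354P (Genoa)": (system_bandwidth_gbs(1, 12, 4800), 32),
    "2x 32-core Milan (hypothetical)": (system_bandwidth_gbs(2, 8, 3200), 64),
}

for name, (bw, cores) in configs.items():
    print(f"{name:<32} {bw:6.1f} GB/s total, {bw / cores:5.1f} GB/s per core")
```

A single Genoa socket ends up with slightly more theoretical bandwidth than two Milan sockets, which is consistent with the "similar performance" estimate; caches and clocks still matter in practice.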


5. Cooling and power

With up to 400W TDP for a single CPU, power consumption and cooling requirements are significantly higher than for the previous generation.
For somewhat quiet cooling in a dual-socket workstation, water cooling will be almost mandatory. And we will definitely see more water cooling solutions for servers too.
Though it will be interesting to see how much more power efficiency there is to be had by dialing back the cTDP on these CPUs for CFD workloads.


6. Availability




7. Memory considerations

As always, you need at least one DIMM for each memory channel. So 12 for a single CPU, 24 for two CPUs.
Don't mix and match different modules if you want maximum performance. And don't use weird configurations like 20 DIMMs per CPU. It's either one or two DIMMs per channel.
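These population rules can be encoded as a quick sanity check. A minimal sketch; the function names are hypothetical, and it assumes 12 channels per socket, identical modules, and at most two DIMMs per channel:

```python
CHANNELS_PER_SOCKET = 12  # Genoa / SP5


def recommended_dimms(sockets: int, dimms_per_channel: int = 1) -> int:
    """Total DIMM count for a balanced configuration (identical DIMMs on every channel)."""
    if dimms_per_channel not in (1, 2):
        raise ValueError("use one or two DIMMs per channel")
    return sockets * CHANNELS_PER_SOCKET * dimms_per_channel


def is_balanced(dimms_per_socket: int) -> bool:
    """True if the count allows every channel to hold the same number (1 or 2) of DIMMs."""
    return dimms_per_socket in (CHANNELS_PER_SOCKET, 2 * CHANNELS_PER_SOCKET)


print(recommended_dimms(1), recommended_dimms(2))          # 12 24
print([n for n in (8, 12, 16, 20, 24) if is_balanced(n)])  # [12, 24]
```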
An upside of DDR5, if AMD's official slides are to be believed: the performance hit from single-rank DIMMs is significantly reduced compared to DDR4. It's not gone, but it is within the margin of error for application benchmarks.
genoa_bandwidth.jpg
source: https://www.servethehome.com/wp-cont...-Bandwidth.jpg

Regarding the number of DIMMs per channel: if the officially supported memory speeds of Zen4 desktop CPUs are any indication, better keep it at one DIMM per channel.
Ryzen 7000 CPUs drop the supported memory speed from DDR5-5200 down to DDR5-3600 for anything beyond one dual-rank DIMM per channel, even for two single-rank DIMMs per channel.
ryzen700_memory.png
source: https://www.amd.com/en/products/cpu/amd-ryzen-7-7700x
That's not much of an issue on the desktop platform, because memory overclocking is an option there. But running memory beyond the officially supported speeds can be tricky or even impossible on server platforms.
Things might be different on the server platform, but better safe than sorry at this point.
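To put the one-DIMM-per-channel advice into numbers: if a Genoa board had to drop from DDR5-4800 to DDR5-3600 with two DIMMs per channel, peak bandwidth would fall by a quarter. The 3600 figure is borrowed from the Ryzen 7000 table purely as a HYPOTHETICAL; Genoa's actual 2 DPC speed may differ:

```python
def peak_bw_gbs(channels: int, mt_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s for 64-bit (8-byte) channels."""
    return channels * mt_per_s * 8 / 1000


one_dpc = peak_bw_gbs(12, 4800)  # 1 DIMM per channel at DDR5-4800
two_dpc = peak_bw_gbs(12, 3600)  # HYPOTHETICAL derated speed at 2 DIMMs per channel
loss = 1 - two_dpc / one_dpc

print(f"1 DPC: {one_dpc:.1f} GB/s, 2 DPC: {two_dpc:.1f} GB/s, loss: {loss:.0%}")
# 1 DPC: 460.8 GB/s, 2 DPC: 345.6 GB/s, loss: 25%
```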


Useful links
All AMD guides: https://www.amd.com/en/processors/tu...pe%3Aepyc_9004
Workload tuning guide: https://www.amd.com/system/files/doc...d-workload.pdf
Compiler and HPC optimization: https://www.amd.com/system/files/doc...-toolchain.pdf

Comments and discussions are welcome!

Last edited by flotus1; November 16, 2022 at 08:23.

#2 | ym92 (Yannick), New Member | November 14, 2022, 13:43
Thank you for sharing your thoughts and knowledge, Alex!

I was considering getting one of these beauties. Fortunately, we have some good partners to get them early.

Anyway, I was wondering whether we can expect a large difference between a 2x 9474F and a 2x 9554 setup. Based on base frequency and core count, the 9554 should deliver "only" around 15% more, while being more than 30% more expensive (CPU price only). Memory bandwidth will probably not be the limiting factor for either, right? Would you say the 9554 is really worth it compared to the 9474F, or are there other factors to take into consideration? I guess the rest of the setup can be exactly the same.
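The "around 15%" estimate multiplies core count by base clock. A sketch of that calculation; the base clocks (3.6 GHz for the 9474F, 3.1 GHz for the 9554) are assumed from AMD's spec sheets and should be double-checked, and the metric ignores boost clocks and memory bandwidth limits, so it is a rough, optimistic estimate for bandwidth-bound solvers:

```python
# (cores, base clock in GHz); clocks assumed from AMD's published specs.
SKUS = {
    "9474F": (48, 3.6),
    "9554": (64, 3.1),
}


def aggregate_ghz(sku: str) -> float:
    """Cores x base clock: a crude throughput proxy for purely compute-bound code."""
    cores, base_ghz = SKUS[sku]
    return cores * base_ghz


ratio = aggregate_ghz("9554") / aggregate_ghz("9474F")
print(f"9554 vs 9474F (cores x base clock): {ratio:.2f}x")  # 9554 vs 9474F (cores x base clock): 1.15x
```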

By the way, I use Star-CCM+ although licensing is not an issue. Also, Siemens published some benchmarks on their website: https://blogs.sw.siemens.com/simcent...cfd-benchmark/
Sure, always take them with a grain of salt, but they are still very impressive results.

#3 | flotus1 (Alex), Super Moderator | November 14, 2022, 14:21
Thanks for the CCM+ benchmarks, I had not seen them yet.

That's a tough choice if you have the licenses for either of these CPUs. 9554 will be faster, just by how much...
It will start to be limited by memory bandwidth; I am familiar with how CCM+ scales. So definitely not 30% faster; 15% is probably a good estimate. I guess it depends on the quotes you get. If it's 15% more performance for 15% more total system cost, I would consider that worth it at the high end, where there are usually diminishing returns.
Up to you (or the controllers at your company) if the budget is there. The entire rest of the system can indeed remain the same.
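Whether ~15% more performance is worth it depends on how much the CPU price difference moves the total system cost. A sketch that takes the ~30% CPU price difference and the ~15% performance estimate from this exchange and treats the CPUs' share of total system cost as a HYPOTHETICAL parameter:

```python
def total_cost_increase(cpu_share: float, cpu_price_increase: float) -> float:
    """Relative increase in system cost when only the CPUs get more expensive."""
    return cpu_share * cpu_price_increase


def perf_per_cost_change(perf_gain: float, cost_increase: float) -> float:
    """Ratio of performance-per-cost (new / old); above 1.0 means better value."""
    return (1 + perf_gain) / (1 + cost_increase)


PERF_GAIN = 0.15     # estimated 9554 advantage over 9474F
CPU_PRICE_UP = 0.30  # CPU price difference quoted above

for cpu_share in (0.3, 0.5, 0.7):  # HYPOTHETICAL CPU share of total system cost
    dc = total_cost_increase(cpu_share, CPU_PRICE_UP)
    print(f"CPUs = {cpu_share:.0%} of system: total cost +{dc:.0%}, "
          f"perf/cost x{perf_per_cost_change(PERF_GAIN, dc):.3f}")
```

When the CPUs make up about half of the system cost, +30% on the CPUs is +15% on the system, which is exactly the break-even case described above.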

#4 | ym92 (Yannick), New Member | November 14, 2022, 14:26
Thanks, Alex. Yeah, I asked for quotes for both setups, so let's see how much they charge or whether they really want the prices listed above. Maybe the 9554 doesn't fit my budget anyway; then the choice is obvious.
Anyway, whenever I get the server, I am happy to post benchmarks in the other thread.

#5 | the_phew (Matt), Member | November 15, 2022, 10:31
Quote (originally posted by ym92):
By the way, I use Star-CCM+ although licensing is not an issue. Also, Siemens published some benchmarks on their website: https://blogs.sw.siemens.com/simcent...cfd-benchmark/
It seems like 32-core CPUs are the way to go for CFD with Genoa if you are licensed per core, and 64 or even 96 cores otherwise. There may even be linear speedup up to 48-core CPUs, so I'm curious to see those benchmarked.

We're probably going to wait for Genoa-X to upgrade our Rome cluster; a 50-60% speedup per core with Genoa would be nice, but I expect that to be closer to 100% speedup (at least for smaller grids) with Genoa-X. Sapphire Rapids HBM may also be available around the same time as Genoa-X, which could be another option if it outperforms Genoa-X on a per-core basis (which is possible).

#6 | hami11 (Prince Edward Island), New Member | January 11, 2023, 18:07
So on a per-core basis, Genoa is only marginally faster (~1.33x) than previous generations (e.g. Rome)? For example, if I get a Rome setup that isn't bandwidth-saturated (e.g. 2x Epyc 7302, 32 cores total), that should be roughly equivalent to a 2x 12-core Genoa setup?
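The arithmetic behind this question, as a sketch: divide the old core count by an assumed per-core speedup. It assumes perfect linear scaling and that neither system is bandwidth-bound; the 1.33x factor is the illustrative number from the question, not a measurement:

```python
def equivalent_cores(old_cores: int, per_core_speedup: float) -> float:
    """Newer-generation cores needed to match `old_cores` older ones (linear scaling assumed)."""
    return old_cores / per_core_speedup


rome_cores = 2 * 16  # 2x Epyc 7302
needed = equivalent_cores(rome_cores, 1.33)
print(f"{rome_cores} Rome cores ~ {needed:.1f} Genoa cores")  # 32 Rome cores ~ 24.1 Genoa cores
```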

#7 | flotus1 (Alex), Super Moderator | January 12, 2023, 05:27
Hard to make these blanket statements about relative performance.
But I agree, this was one of the points I tried to get across: At the lower end, you are better off with discounted parts from the previous generations. It's just not worth getting Genoa with the added platform costs, when the CPUs can't benefit from 12xDDR5-4800. This definitely applies to 16-core CPUs.

#8 | hami11 (Prince Edward Island), New Member | January 12, 2023, 12:25
I see. I wasn't quoting an exact figure, just using that number as an example. But at low core counts, Genoa's performance should only be marginally higher than Rome's?

#9 | flotus1 (Alex), Super Moderator | January 16, 2023, 06:23
Depends on what we call marginal. Single-core performance is up compared to Milan, thanks to the new architecture and higher frequencies. And Milan was also a small improvement compared to Rome in that regard.
Maybe something on the order of 15-20% per generation, if we have to put a number on it. But please don't quote me on that. Nobody really tests true single-core performance on these CPUs.
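Compounding a rough 15-20% per generation over two generations (Rome to Milan to Genoa) gives the implied single-core gap. A sketch of that arithmetic only; the per-generation figures are the explicitly unofficial guesses from this post:

```python
def compounded_speedup(gain_per_gen: float, generations: int) -> float:
    """Total speedup after several generations with the same relative gain each time."""
    return (1 + gain_per_gen) ** generations


for gain in (0.15, 0.20):
    total = compounded_speedup(gain, generations=2)  # Rome -> Milan -> Genoa
    print(f"{gain:.0%} per generation -> {total:.2f}x from Rome to Genoa")
# 15% per generation -> 1.32x from Rome to Genoa
# 20% per generation -> 1.44x from Rome to Genoa
```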
All times are GMT -4.