|
AMD Genoa best configuration for 128 cores Fluent/Mechanical licenses |
|
February 13, 2023, 06:10 |
AMD Genoa best configuration for 128 cores Fluent/Mechanical licenses
|
#1 |
New Member
Chefbouza
Join Date: Oct 2021
Posts: 10
Rep Power: 5 |
Dear members,
I have Fluent and Mechanical licenses with 3 HPC packs, which allow parallel sessions of up to 132 cores. The new AMD EPYC Genoa processors are now available, and my company wants to invest in new hardware that best fits our license pack. Our main use is Fluent; Mechanical is used less frequently.

The benchmarks show that the Genoa 9374F, with 32 cores per CPU, is the best per-core candidate for CFD/FEM workloads (high memory bandwidth and high frequency per core). One can therefore imagine that two nodes (linked via InfiniBand), each with dual 9374F, would be the best configuration. CFD is our main use, and the peak memory bandwidth of the Genoa series is about 460 GB/s per socket. With this configuration, each memory channel gets about 38.4 GB/s (460/12) and each core about 14.4 GB/s (460/32). The base clock of the 9374F is 3.85 GHz (AMD data).

The drawback of this configuration is the need for two nodes connected by InfiniBand. So I wonder whether a single-node alternative would be suitable: dual 9554 with 64 cores per socket. That gives a memory bandwidth of about 7.2 GB/s per core. The base clock of the 9554 is 3.10 GHz (AMD data).

To sum up:

Configuration              | Memory bandwidth per core | Base clock
Two nodes, dual 9374F each | ~14.4 GB/s                | 3.85 GHz
Single node, dual 9554     | ~7.2 GB/s                 | 3.10 GHz

I don't have the prices yet, but one can expect the two-node configuration to be considerably more expensive. So my question is: is the two-node configuration worth it compared to the single-node one?

Thank you in advance for your advice and help! |
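For reference, here is the back-of-the-envelope calculation behind the table above. It assumes the theoretical peak of 12 channels of DDR5-4800 per socket; sustained bandwidth in practice will be lower.

Code:
# Back-of-the-envelope per-core memory bandwidth for the two Genoa options.
# Assumes the theoretical peak of 12 channels of DDR5-4800 per socket;
# sustained bandwidth in practice will be lower.

CHANNELS_PER_SOCKET = 12
GB_S_PER_CHANNEL = 4.8 * 8                           # 4800 MT/s * 8 bytes = 38.4 GB/s
SOCKET_BW = CHANNELS_PER_SOCKET * GB_S_PER_CHANNEL   # ~460.8 GB/s per socket

configs = {
    "Two nodes, dual 9374F (32 cores/socket)": 32,
    "Single node, dual 9554 (64 cores/socket)": 64,
}

for name, cores_per_socket in configs.items():
    print(f"{name}: {SOCKET_BW / cores_per_socket:.1f} GB/s per core")
# -> ~14.4 GB/s per core (9374F) vs ~7.2 GB/s per core (9554)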
|
February 13, 2023, 07:00 |
|
#2 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Coincidentally, AMD+Ansys have published benchmarks for exactly the CPUs you are looking for:
https://www.amd.com/system/files/doc...nerational.pdf

They show a 20-30% performance lead for 2x9554 over 2x9374F. It is also a reasonable assumption that two nodes with 2x9374F each will roughly double the single-node 2x9374F result. Combining those two numbers tells you how much faster your simulations will run on two 64-core nodes vs. a single 128-core node. It is up to you whether that performance uplift justifies the increased hardware cost. But generally speaking, once you factor in license costs and what the engineers working with the system are paid, faster hardware is usually worth it. |
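To make that concrete, a rough sketch of how the two numbers combine; the 2x scaling across the two nodes is an assumption, not a measurement:

Code:
# Rough estimate: two nodes of 2x9374F vs. a single node of 2x9554.
# Uses the 20-30% lead of 2x9554 over 2x9374F from the AMD/Ansys document
# and assumes near-perfect 2x scaling when adding the second 9374F node.

node_scaling = 2.0                 # assumed speedup from the second 9374F node

for lead in (1.20, 1.30):          # 2x9554 relative to 2x9374F
    ratio = node_scaling / lead
    print(f"two-node 9374F vs single 9554 node: ~{ratio:.2f}x faster")
# -> roughly 1.5x to 1.7x in favour of the two-node configuration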
|
February 13, 2023, 08:26 |
|
#3 |
New Member
Chefbouza
Join Date: Oct 2021
Posts: 10
Rep Power: 5 |
Thank you very much Alex !
|
|
February 19, 2023, 02:45 |
|
#4 |
Senior Member
Dongyue Li
Join Date: Jun 2012
Location: Beijing, China
Posts: 849
Rep Power: 18 |
Yes. Go for two nodes.
I would suggest using two nodes and connecting them simply with Ethernet. Two workstations (64+64 cores) are much better than one workstation (128 cores): the former can achieve nearly a 2x speedup, while a single 128-core machine never reaches twice the performance of 64 cores of the same generation.

How close you get to 2x depends on your communication setup. You can simply use the 10G Ethernet provided by the motherboard; it typically reaches a 1.8-1.95x speedup, in line with the document ANSYS provides. For our own products, anything below 1.8x would be considered a failure, so expect comfortably above 1.8x. You can also add two InfiniBand cards, but at most that gets you to about 2x, no more (in ANSYS's document, 2.04 is effectively 2, certainly not more than 2.1). I would prefer the onboard Ethernet since it is much cheaper.
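To put the 1.8x vs 2.0x figures into wall-clock terms, a small sketch; the 10-hour single-node runtime is purely an illustrative assumption:

Code:
# Wall-clock impact of 1.8x (onboard 10G Ethernet) vs 2.0x (InfiniBand) scaling,
# assuming a job that takes 10 hours on a single dual-socket node (illustrative).

single_node_hours = 10.0

for interconnect, speedup in [("10G Ethernet (~1.8x)", 1.8),
                              ("InfiniBand   (~2.0x)", 2.0)]:
    hours = single_node_hours / speedup
    efficiency = speedup / 2.0 * 100
    print(f"{interconnect}: {hours:.2f} h, parallel efficiency {efficiency:.0f}%")
# -> ~5.6 h vs 5.0 h: InfiniBand saves about half an hour on this job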
__________________
My OpenFOAM algorithm website: http://dyfluid.com
By far the largest Chinese CFD forum: http://www.cfd-china.com/category/6/openfoam
We provide lots of clusters to Chinese customers, and we are considering doing business overseas: http://dyfluid.com/DMCmodel.html |
|
February 20, 2023, 04:58 |
|
#5 | |
New Member
Chefbouza
Join Date: Oct 2021
Posts: 10
Rep Power: 5 |
Quote:
|
||
February 20, 2023, 05:07 |
|
#6 |
New Member
Chefbouza
Join Date: Oct 2021
Posts: 10
Rep Power: 5 |
I have a follow-up question concerning this configuration.
Since I have 3 ANSYS HPC pack licenses, the natural target is a 128-core hardware configuration. But I wonder whether it would make sense to choose two nodes of 2x 9474F (48 cores per socket) rather than the 9374F. That gives a configuration of 192 cores rather than 128. The advantage is that this hardware could serve other tasks at the same time as the 128 ANSYS cores, such as parallel optimization of Matlab/Simulink models across a large number of cores (many parallel designs).

My question is: assuming the budget is not limited, what difference would there be on ANSYS models (mainly CFD) between 128 cores of 9374F and 128 cores of 9474F?

Thank you in advance for your help |
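For reference, the core arithmetic behind the 192 figure (48 cores per 9474F socket is AMD's published spec; the rest is simple counting):

Code:
# Core budget with two dual-9474F nodes: how many cores remain once 128
# are reserved for the Ansys jobs limited by the 3 HPC packs.

sockets = 2 * 2              # 2 nodes x 2 sockets
cores_per_socket = 48        # EPYC 9474F
ansys_cores = 128            # limit set by the HPC pack licenses

total_cores = sockets * cores_per_socket
print(f"total: {total_cores} cores, leftover for Matlab/Simulink: {total_cores - ansys_cores}")
# -> 192 cores in total, 64 left over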
|
February 21, 2023, 04:15 |
|
#7 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Pretty much zero difference between these CPUs when running the same core count, provided the threads are distributed similarly across all 8 CCDs, which Fluent should be able to handle.
Just to avoid nasty surprises: leftover "free" cores are nice, but don't expect them to be actually free. When you use them for some other heavy lifting with Matlab, it will slow down both the Matlab runs and the Fluent run. That's because shared CPU resources (like last-level caches and memory bandwidth) are almost fully utilized by a Fluent simulation on 128 cores. Additionally, I am not sure these CPUs are ideal for Matlab/Simulink. It's probably fine if your parallel optimization spawns several tasks that run independently, on a single core each. |
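A rough illustration of the bandwidth side of that contention; the split of 32 Fluent ranks plus 16 Matlab workers per socket is an assumed scenario, and the even bandwidth split is a simplification:

Code:
# Why "free" cores aren't free: once Matlab workers occupy the spare cores,
# every active core on the socket competes for the same memory bandwidth.
# Theoretical peak numbers and an assumed workload split, not measurements.

socket_bw = 460.8                 # GB/s, 12 channels of DDR5-4800 per socket
fluent_ranks_per_socket = 32      # 128 Fluent ranks spread over 4 sockets (9474F)
matlab_workers_per_socket = 16    # assumed: the 16 leftover cores per socket

fluent_only = socket_bw / fluent_ranks_per_socket
all_busy = socket_bw / (fluent_ranks_per_socket + matlab_workers_per_socket)

print(f"Fluent only  : {fluent_only:.1f} GB/s per active core")
print(f"Fluent+Matlab: {all_busy:.1f} GB/s per active core")
# -> ~14.4 GB/s drops to ~9.6 GB/s per core; both workloads slow down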
|
February 21, 2023, 04:25 |
|
#8 |
New Member
Chefbouza
Join Date: Oct 2021
Posts: 10
Rep Power: 5 |
Thank you Alex!
In my understanding, when you use the Mathworks Parallel Computing Toolbox it creates a pool of a chosen number of workers, and each worker handles one design in parallel with the others, inside a "parfor" loop for example. The same goes for Simulink models with the "parsim" feature. Therefore, the more cores we have available (not "free", as you point out), the larger the pool of workers can be. |
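The same worker-pool pattern, sketched in Python purely for illustration; evaluate_design and the list of designs are hypothetical stand-ins for one Matlab/Simulink model evaluation each:

Code:
# Worker-pool pattern analogous to Matlab's parpool + parfor/parsim:
# each worker evaluates one design independently of the others.
# evaluate_design and the designs list are hypothetical placeholders.
from concurrent.futures import ProcessPoolExecutor

def evaluate_design(design):
    # stand-in for one model evaluation
    return sum(x * x for x in design)

if __name__ == "__main__":
    designs = [[i, i + 1, i + 2] for i in range(64)]    # 64 candidate designs
    with ProcessPoolExecutor(max_workers=64) as pool:   # one worker per spare core
        results = list(pool.map(evaluate_design, designs))
    print(f"{len(results)} designs evaluated")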
|
February 21, 2023, 05:01 |
|
#9 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
That sounds like the ideal case.
I am by no means an expert with Matlab/Simulink, just something you might want to be aware of: https://de.mathworks.com/matlabcentr...n-amd-epyc-cpu No idea if this is fixed by now, or what even caused the issues. |
|
February 21, 2023, 09:53 |
|
#10 | |
New Member
Chefbouza
Join Date: Oct 2021
Posts: 10
Rep Power: 5 |
Quote:
I will check this. |
||
February 22, 2023, 12:45 |
|
#11 |
Member
Matt
Join Date: May 2011
Posts: 44
Rep Power: 15 |
Different solver, but I operate a 128-core CFD cluster composed of two 2P EPYC Rome nodes connected via 100 Gb/s InfiniBand. I went with InfiniBand on the recommendation of the software vendor, and a gut feeling that the vastly reduced latency would be helpful for the explicit solver I use, which can run many time steps per second.
But when I monitored the actual network throughput over the InfiniBand adapters, it was shockingly low (less than 1 Gb/s) even for a simulation with over a billion cells. So I would advise against buying top-of-the-line InfiniBand adapters; you can pick up surplus 40 Gb/s InfiniBand adapters for a fraction of the price of new 100 Gb/s+ parts. I also have a sneaking suspicion that 10 Gb/s Ethernet would be more than sufficient. You could always set that up first, and if your CPUs are obviously waiting on network traffic (evidenced by sub-98% utilization), then consider surplus InfiniBand interconnects. But don't spend thousands on the latest InfiniBand hardware for a 2-node cluster like I did. |
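If you want to check this on your own cluster before buying anything, a minimal monitoring sketch; it requires the psutil package, and "ib0" is a placeholder interface name you would adjust to your NIC:

Code:
# Sample NIC throughput and CPU utilization during a running job to see
# whether the interconnect is anywhere near being the bottleneck.
# Requires psutil; "ib0" is a placeholder interface name.
import psutil

NIC = "ib0"
INTERVAL = 5.0    # seconds between samples

before = psutil.net_io_counters(pernic=True)[NIC]
cpu = psutil.cpu_percent(interval=INTERVAL)        # blocks for INTERVAL seconds
after = psutil.net_io_counters(pernic=True)[NIC]

bytes_moved = (after.bytes_sent + after.bytes_recv) - (before.bytes_sent + before.bytes_recv)
gbps = bytes_moved * 8 / INTERVAL / 1e9
print(f"CPU utilization: {cpu:.1f}%  |  {NIC} throughput: {gbps:.2f} Gb/s")
# Low CPU utilization alongside heavy traffic points at the network;
# high utilization with little traffic suggests the interconnect is not the issue.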
|
|
|