
AMD Genoa best configuration for 128 cores Fluent/Mechanical licenses

February 13, 2023, 06:10   #1
New Member
 
Chefbouza
Join Date: Oct 2021
Posts: 10
Dear members,

I have Fluent and Mechanical licenses with 3 HPC Packs, which allow parallel sessions of up to 132 cores. The new AMD Epyc Genoa processors are now available, and my company wants to invest in new hardware that best fits our license pack. Our main use is Fluent; Mechanical is used less frequently.
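If I counted correctly, this matches the usual HPC Pack scaling. The rule below is my assumption (n packs enable 2*4^n parallel cores, plus the 4 cores bundled with the solver license), but it reproduces the 132 quoted to us:

Code:
# Hedged sanity check of the licensed core count; the scaling rule
# (n HPC Packs -> 2 * 4**n cores, plus 4 cores included with the
# solver license) is my assumption, not official ANSYS documentation.
def max_cores(hpc_packs, base_cores=4):
    return base_cores + 2 * 4**hpc_packs

print(max_cores(3))  # -> 132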

The benchmarks show that the Genoa 9374F, with 32 cores per CPU, is the best per-core candidate for CFD/FEM workloads (high memory bandwidth and high frequency per core). One can therefore imagine that a configuration of 2 nodes (linked via InfiniBand), each with dual 9374F CPUs, is the best one.
CFD is our main use, and the peak memory bandwidth of the Genoa series is about 460 GB/s per socket. With this configuration, each memory channel therefore gets about 38.3 GB/s (460/12), and each core gets about 14.4 GB/s (460/32).
The base clock of the 9374F is 3.85 GHz (AMD data).
The downside of this configuration is the need for two nodes linked with InfiniBand, so I wonder whether an alternative single-node configuration would be suitable.

The alternative is a dual 9554 configuration with 64 cores per socket. That leads to a memory bandwidth of about 7.2 GB/s per core (460/64). The base clock of the 9554 is 3.1 GHz (AMD data).

So, to sum up:

Configuration                    Memory bandwidth per core    Base clock
2 nodes, dual 9374F each         14.4 GB/s                    3.85 GHz
1 node, dual 9554                 7.2 GB/s                    3.1 GHz
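For transparency, the per-core numbers above come from this trivial calculation (assuming the ~460 GB/s theoretical DDR5-4800 peak per socket; measured bandwidth will be lower):

Code:
# Back-of-the-envelope memory bandwidth per core for one Genoa socket.
PEAK_GBPS = 460.0     # theoretical peak per socket, 12 x DDR5-4800 channels
CHANNELS = 12

print(f"per channel: {PEAK_GBPS / CHANNELS:.1f} GB/s")    # ~38.3 GB/s
for name, cores in [("9374F", 32), ("9554", 64)]:
    print(f"{name}: {PEAK_GBPS / cores:.1f} GB/s per core")
# 9374F: 14.4 GB/s per core
# 9554:   7.2 GB/s per core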

I don’t have the prices yet, but one can expect the 2-node configuration to be considerably more expensive. So my question is: is the 2-node configuration worth choosing over the single-node one?

Thank you in advance for your advice and your help!

February 13, 2023, 07:00   #2
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Coincidentally, AMD+Ansys have published benchmarks for exactly the CPUs you are looking for:
https://www.amd.com/system/files/doc...nerational.pdf

They show a 20-30% performance lead for 2x9554 vs 2x9374F.
It is a safe assumption that two nodes with 2x9374F each will roughly double the performance of a single such node. That tells you how much faster your simulations will run with two 64-core nodes vs. a single 128-core node.
It is up to you whether that much of a performance uplift is worth the increased hardware costs. But generally speaking, factoring in license costs and how much the engineers working with that system are paid, faster hardware is usually worth it.

February 13, 2023, 08:26   #3
New Member
 
Chefbouza
Join Date: Oct 2021
Posts: 10
Thank you very much, Alex!

February 19, 2023, 02:45   #4
Senior Member
 
Dongyue Li
Join Date: Jun 2012
Location: Beijing, China
Posts: 849
Yes. Go for two nodes.

I would suggest using two nodes and connecting them simply via Ethernet. Two workstations (64+64 cores) are much better than one workstation (128 cores): the former can achieve nearly a 2x speed-up, while a single 128-core machine never reaches 2x the speed of a 64-core machine of the same generation.

When I say nearly 2x, it depends on your communication setup. You can simply use the 10G Ethernet provided by the motherboard; it achieves a 1.8-1.95x speed-up, much like in the document ANSYS provides. For our own products, anything below 1.8x would be considered a failure, so expect well above 1.8x.

You can also choose two InfiniBand cards; at most they achieve a 2x speed-up, no more (in ANSYS's document, 2.04 is effectively 2, never more than 2.1). I would prefer the Ethernet ports provided by the motherboard, since they are much cheaper.
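If you want to check the scaling yourself, time the same case on one node and on both, and compute the speed-up (the wall-clock times below are made-up placeholders, not benchmark data):

Code:
# Estimate inter-node scaling efficiency from two timed runs of the same case.
t_one_node  = 1000.0   # seconds on one node (2x 9374F, 64 cores) -- placeholder
t_two_nodes =  525.0   # seconds on two nodes (128 cores)         -- placeholder

speedup = t_one_node / t_two_nodes
print(f"speed-up: {speedup:.2f}x, efficiency: {speedup / 2:.0%}")
# speed-up: 1.90x, efficiency: 95% -> within the 1.8-1.95x Ethernet range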
__________________
My OpenFOAM algorithm website: http://dyfluid.com
By far the largest Chinese CFD-based forum: http://www.cfd-china.com/category/6/openfoam
We provide lots of clusters to Chinese customers, and we are considering doing business overseas: http://dyfluid.com/DMCmodel.html

February 20, 2023, 04:58   #5
New Member
 
Chefbouza
Join Date: Oct 2021
Posts: 10
Quote:
Originally Posted by sharonyue View Post
Yes. Go for two nodes. [...]
Thank you, sharonyue, for your advice. In fact, I will go with an InfiniBand card, since the budget allows it.

February 20, 2023, 05:07   #6
New Member
 
Chefbouza
Join Date: Oct 2021
Posts: 10
I have a follow-up question concerning this configuration.

As I have 3 ANSYS HPC Pack licenses, the natural target is a 128-core hardware configuration. But I wonder whether it would be better to choose 2 nodes of 2x 9474F rather than 9374F. This would give a configuration of 192 cores rather than 128.
The advantage is that this hardware could serve other tasks at the same time as a 128-core ANSYS run, like parallel optimization of Matlab Simulink models with a high number of parallel designs.
My question is: assuming the budget is not limited, what would be the difference on ANSYS models (mainly CFD) between 128 cores of 9374F and 128 cores of 9474F?

Thank you in advance for your help

February 21, 2023, 04:15   #7
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Pretty much zero difference between these CPUs when running the same core count, provided the threads are distributed similarly across all 8 CCDs, which Fluent should be able to handle.

Just to avoid nasty surprises: leftover "free" cores are nice, but don't expect them to be actually free. When you use them for other heavy lifting with Matlab, it will slow down both the Matlab runs and the Fluent run. That's because shared CPU resources -like last-level caches and memory bandwidth- are almost fully utilized by a Fluent simulation on 128 cores.
Additionally, I am not sure these CPUs are ideal for Matlab/Simulink. It's probably fine if your parallel optimization spawns several tasks that run independently, on a single core each.

February 21, 2023, 04:25   #8
New Member
 
Chefbouza
Join Date: Oct 2021
Posts: 10
Thank you Alex!

Quote:
Originally Posted by flotus1 View Post
Additionally, I am not sure these CPUs are ideal for Matlab/Simulink. It's probably fine if your parallel optimization spawns several tasks that run independently, on a single core each.
In my understanding, when you use the MathWorks Parallel Computing Toolbox, it creates a pool with a chosen number of workers, and each worker handles one design in parallel with the others, inside a "parfor" loop for example. The same applies to Simulink models with the "parsim" feature. Therefore, the more cores we have available (not necessarily free), the larger the pool of workers can be.

February 21, 2023, 05:01   #9
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
That sounds like the ideal case.
I am by no means an expert with Matlab/Simulink; just something you might want to be aware of: https://de.mathworks.com/matlabcentr...n-amd-epyc-cpu
No idea if this has been fixed by now, or what caused the issues in the first place.

February 21, 2023, 09:53   #10
New Member
 
Chefbouza
Join Date: Oct 2021
Posts: 10
Quote:
Originally Posted by flotus1 View Post
[...] just something you might want to be aware of: https://de.mathworks.com/matlabcentr...n-amd-epyc-cpu
Thank you Alex!
I will check this.

February 22, 2023, 12:45   #11
Member
 
Matt
Join Date: May 2011
Posts: 44
Different solver, but I operate a 128-core CFD cluster composed of two 2P EPYC Rome nodes connected via 100 Gbps InfiniBand. I went with InfiniBand on the recommendation of the software vendor, and a gut feeling that the vastly reduced latency would help the explicit solver I use, which can run many time steps per second.

But when I monitored the actual network throughput over the InfiniBand adapters, it was shockingly low (less than 1 Gbps), even for a simulation with over a billion cells. So I would advise against buying top-of-the-line InfiniBand adapters; you can pick up surplus 40 Gbps InfiniBand adapters for a fraction of the price of new 100 Gbps+ parts.

I also have a sneaking suspicion that 10 Gbps Ethernet would be more than sufficient. You could always set that up first, and if your CPUs are obviously waiting on network traffic (evidenced by sub-98% utilization), then consider surplus InfiniBand interconnects. But don't spend thousands on the latest InfiniBand hardware for a 2-node cluster like I did.
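For what it's worth, this is roughly how the throughput can be sampled on Linux (a minimal sketch; the adapter name mlx5_0 and port 1 are assumptions, check /sys/class/infiniband/ on your nodes):

Code:
# Sample the InfiniBand receive counter over a 10 s window mid-solve.
# The sysfs data counters tick in 32-bit words, so multiply by 4 for bytes.
import time

COUNTER = "/sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_data"

def read_words(path):
    with open(path) as f:
        return int(f.read())

before = read_words(COUNTER)
time.sleep(10)
after = read_words(COUNTER)

gbps = (after - before) * 4 * 8 / 10 / 1e9   # words -> bytes -> bits per second
print(f"average receive throughput: {gbps:.2f} Gbps")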
wkernkamp likes this.