128 core cluster E5-26xx V4 processor choice for Ansys FLUENT

Old   July 11, 2017, 16:43
Default
  #21
New Member
 
Join Date: May 2013
Posts: 26
Rep Power: 13
hpvd is on a distinguished road
comparison benchmarking Epyc vs Skylake SP starts:
http://www.anandtech.com/show/11544/...-the-decade/12
hpvd is offline   Reply With Quote

Old   July 11, 2017, 17:28
Default
  #22
Senior Member
 
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
kyle is on a distinguished road
Found a Euler3d benchmark for Skylake SP:

https://hothardware.com/reviews/inte...-review?page=6

Still nothing for EPYC that I see
kyle is offline   Reply With Quote

Old   July 11, 2017, 17:40
Default
  #23
Senior Member
 
Lucky
Join Date: Apr 2011
Location: Orlando, FL USA
Posts: 5,763
Rep Power: 66
LuckyTran has a spectacular aura about
Quote:
Originally Posted by hpvd View Post
comparison benchmarking Epyc vs Skylake SP starts:
http://www.anandtech.com/show/11544/...-the-decade/12
It should be quite obvious that a quad-channel setup will outperform a triple-channel one in throughput. Unfortunately, many benchmarks ignore this basic rule. That's exactly why I always recommend quad channel over triple channel. There are only a few select configurations that support quad-channel memory, so just go with the best quad-channel setup that you can afford.

Quote:
Originally Posted by kyle View Post
Found a Euler3d benchmark for Skylake SP:

https://hothardware.com/reviews/inte...-review?page=6

Still nothing for EPYC that I see
These processor benchmarks are less applicable to CFD because CFD stresses memory bandwidth more than raw CPU throughput. CPU performance is useless if the CPU sits idle because it can't fetch data from memory fast enough.
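To put a rough number on that argument, here is a minimal roofline-style sketch. The peak compute figure and the FLOP-per-byte value are assumptions picked for illustration, not measurements of any CPU or solver discussed in this thread; only the DDR4-2400 quad-channel bandwidth is simple arithmetic.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline estimate: the lower of the compute limit and the memory limit."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# Hypothetical inputs, for illustration only.
peak = 500.0       # GFLOP/s per socket (assumed value)
bandwidth = 76.8   # GB/s: 4 channels * 2400 MT/s * 8 bytes, theoretical peak
intensity = 0.25   # FLOP per byte moved (assumed, low as in sparse solvers)

limit = attainable_gflops(peak, bandwidth, intensity)
print(f"attainable: {limit:.1f} GFLOP/s ({limit / peak:.0%} of peak)")
```

With inputs like these the memory term is the smaller one, so a faster core would not change the estimate, while extra memory channels would.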
LuckyTran is offline   Reply With Quote

Old   July 11, 2017, 17:41
Default
  #24
Senior Member
 
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
kyle is on a distinguished road
The page I linked to has a Euler3d benchmark, which is a CFD benchmark.
kyle is offline   Reply With Quote

Old   July 11, 2017, 17:51
Default
  #25
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49
flotus1 has a spectacular aura about
Quote:
Originally Posted by LuckyTran View Post
It should be quite obvious that a quad-channel setup will outperform a triple-channel one in throughput. Unfortunately, many benchmarks ignore this basic rule. That's exactly why I always recommend quad channel over triple channel. There are only a few select configurations that support quad-channel memory, so just go with the best quad-channel setup that you can afford.
Could you please elaborate on why you keep dwelling on the triple-channel subject? All of the higher-end processors have had quad-channel memory controllers for several years now. And I did not find a triple-channel setup in the link you quoted.
flotus1 is offline   Reply With Quote

Old   July 11, 2017, 18:01
Default
  #26
New Member
 
Join Date: May 2013
Posts: 26
Rep Power: 13
hpvd is on a distinguished road
just started a new thread for results of benchmarks of Epyc and Xeon Skylake SP:
Epyc vs Xeon Skylake SP
hpvd is offline   Reply With Quote

Old   July 11, 2017, 18:05
Default
  #27
Senior Member
 
Lucky
Join Date: Apr 2011
Location: Orlando, FL USA
Posts: 5,763
Rep Power: 66
LuckyTran has a spectacular aura about
Quote:
Originally Posted by kyle View Post
The page I linked to has a Euler3d benchmark, which is a CFD benchmark.
I missed the Euler benchmark. But again, as is even acknowledged there, bandwidth is an important factor. Unfortunately, the way the benchmark results are reported does not reflect this difference in setup. If you read the graphs blindly, it appears that one CPU is simply better than another.

Quote:
Originally Posted by flotus1 View Post
Could you please elaborate on why you keep dwelling on the triple-channel subject? All of the higher-end processors have had quad-channel memory controllers for several years now. And I did not find a triple-channel setup in the link you quoted.
Because bandwidth has a clear influence. And as hpvd said earlier: no matter how many cores, always 8-channel DDR4 (quad-channel DDR4). Once you choose this, there are fewer than a dozen options, and the choice of what to build quickly boils down to what you can afford. It's quite easy to build a high-end system for a CFD application. In other applications (traditional HPC) that are not limited by memory bandwidth, you can run into all sorts of headaches. If you are not convinced, feel free to ignore me.
LuckyTran is offline   Reply With Quote

Old   July 17, 2017, 03:01
Default
  #28
New Member
 
Ramón
Join Date: Mar 2016
Location: The Netherlands
Posts: 11
Rep Power: 10
F1aerofan is on a distinguished road
Thank you for the direct links to all the benchmarks! However, I still eagerly await a direct benchmark comparison with actual CFD software to quantify the differences.

There is another complication: the benchmarks came out on exactly the day my director signed off on the purchase order for our new system. So a 128-core system with the E5-2667 V4 is coming our way this summer.

When it is ready I will need to run some of the official Ansys benchmarks to quantify the speed-up for our hardware vendor. I can try to post some of the results here as well, if you guys are interested.

Thanks for the support!
flotus1 likes this.
F1aerofan is offline   Reply With Quote

Old   October 27, 2017, 12:18
Default
  #29
Member
 
Join Date: May 2009
Posts: 54
Rep Power: 17
gfilip is on a distinguished road
Thank you for an insightful discussion on the cluster selection for CFD.

Does anyone have any feedback on the use of ARM architecture? The idea of 48-core nodes seems to go against many of the points brought up here in terms of performance. There are some OpenFOAM benchmarks briefly summarized here:

https://developer.arm.com/-/media/de..._CFDvFinal.pdf
elvis likes this.
gfilip is offline   Reply With Quote

Old   January 14, 2018, 03:37
Default
  #30
Member
 
Ivan
Join Date: Oct 2017
Location: 3rd planet
Posts: 34
Rep Power: 9
Noco is on a distinguished road
Quote:
Originally Posted by F1aerofan View Post
Dear fellow CFD engineers and enthusiasts,

As an R&D department we are trying to significantly scale up our CFD solving capabilities. Currently we are using a single machine with dual Xeon E5-2637 V3 CPUs (8 cores) and 64 GB memory. This machine is used for CFD simulations with Ansys FLUENT with the SIMPLE solver, with either steady k-epsilon Realizable or transient SAS/DES/LES turbulence modelling. All simulations use FGM partially premixed combustion modelling. Mesh sizes are very case/project dependent but range between 3 and 17 million cells.

We are considering a scale up towards 128 cores (thus 3 Ansys HPC license packs with a single Ansys FLUENT solver license). However, I am getting a bit lost in the world of CPU specifications, memory speeds, interconnections and where the bottleneck lies with solving time versus communication time.

Ansys is being a professional independent software supplier by not giving specific hardware advice but providing feedback on configuration proposals. Our hardware supplier appears to have not enough specific knowledge with flow simulations to help us with our decision. Our budget is not determined yet, first we would like to know what it will cost us if we get the best solution possible.

The cluster will consist of a master node and multiple slave nodes. The only difference between the master and slave nodes will be that the master has extra internal storage and a better GPU. The following specifications are considered at the moment:
- All nodes will be interconnected with Mellanox InfiniBand
- Dual SSD in RAID-0 for each machine (I know that a normal HDD should be sufficient)
- 8 GB/core RDIMM 2400 MT/s memory
- No GPU yet, as we are not using COUPLED solver at the moment, but mounting possibility will be present.
- Dual socket E5-2683 V4 processors in initial specification.

The E5-2683 V4 'only' runs at 2.1 GHz and I have the feeling that I can get much more simulation performance with one of the other choices of E5-26xx V4 processors available. For example:
- E5-2680 v4; more bus and memory speed per core, slightly more GHz, one extra server needed (5 instead of 4).
- E5-2667 v4; much more bus and memory speed per core, much more GHz, but also 2 times more servers needed (8 instead of 4). Will this negatively influence the possible communication bottleneck? Given the other thread (Socket 2011-3 processors - an overwiew), should I pick this one?

I would very much appreciate advice on how to make a choice, or simply which one of the above to choose or other E5-26xx V4 available processors to consider.
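The trade-off in the list above can be sketched with a few lines of arithmetic. This is only a rough estimate: the core counts (16, 14 and 8 per socket) come from Intel's public specifications rather than from this post, the bandwidth is the theoretical peak of four DDR4-2400 channels per socket, and `node_plan` is a hypothetical helper name.

```python
import math

CHANNELS_PER_SOCKET = 4    # quad-channel memory controller on E5-26xx v4
TRANSFER_RATE_MTS = 2400   # DDR4-2400, as in the proposed specification
BYTES_PER_TRANSFER = 8     # each channel is 64 bits wide

# Cores per socket, taken from Intel's public specifications (assumption here).
CPUS = {"E5-2683 v4": 16, "E5-2680 v4": 14, "E5-2667 v4": 8}


def node_plan(cores_per_socket, target_cores=128, sockets_per_node=2):
    """Return (nodes needed, theoretical GB/s of memory bandwidth per core)."""
    cores_per_node = cores_per_socket * sockets_per_node
    nodes = math.ceil(target_cores / cores_per_node)
    socket_bw = CHANNELS_PER_SOCKET * TRANSFER_RATE_MTS * BYTES_PER_TRANSFER / 1000
    return nodes, socket_bw / cores_per_socket


for name, cores in CPUS.items():
    nodes, bw_per_core = node_plan(cores)
    print(f"{name}: {nodes} nodes, {bw_per_core:.1f} GB/s per core (theoretical)")
```

The node counts reproduce the 4, 5 and 8 servers mentioned in the list, and the per-core bandwidth doubles when going from the 16-core to the 8-core part. Real sustained bandwidth will be lower than these peak figures.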

Kind regards

F1aerofan
Please tell us what you bought in the end and how the performance is.
Noco is offline   Reply With Quote

Old   January 19, 2018, 04:53
Default
  #31
New Member
 
Ramón
Join Date: Mar 2016
Location: The Netherlands
Posts: 11
Rep Power: 10
F1aerofan is on a distinguished road
Dear Noco, the following specs apply to our cluster:

8 x slave node:
- Dell Poweredge R630
- 120 GB SSD
- 2x Intel Xeon E5-2667 V4
- 8x 16 GB DIMM 2400 MHz
- Mellanox ConnectX-3 dual port VPI FDR

Head node:
- Some old EDX server we had lying around
- 800 GB SSD
- 2x Intel Xeon E5645
- 12x 8 GB DIMM 1333 MHz

Both head and slave nodes run Windows Server, with Windows HPC as cluster software.

These are the performance figures, expressed as the ANSYS "solver rating", for which they define "1 solve = 25 iterations" and "solver rating = number of single solves in 24 hours". All runs were performed in Ansys FLUENT 18.2.
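For anyone who prefers wall-clock time over ratings, the definition quoted above can be inverted. This is a small sketch that relies only on that definition; the function name is made up for illustration, and the sample value is the 128-core result of the 33-million-cell case.

```python
SECONDS_PER_DAY = 24 * 3600
ITERATIONS_PER_SOLVE = 25  # "1 solve = 25 iterations", per the definition above


def seconds_per_iteration(solver_rating):
    """Convert a solver rating (solves per 24 hours) to seconds per iteration."""
    if solver_rating <= 0:
        raise ValueError("solver rating must be positive")
    return SECONDS_PER_DAY / (solver_rating * ITERATIONS_PER_SOLVE)


# Example: the 128-core result for the 33-million-cell exhaust system case.
print(f"{seconds_per_iteration(593):.2f} s per iteration")
```

A higher rating therefore means proportionally less time per iteration, which makes ratings from different core counts directly comparable.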

Aircraft_2million cells benchmark:
- 1 node (16 cores): 2168
- 2 nodes (32 cores): 3793
- 4 nodes (64 cores): 6545
- 6 nodes (96 cores): 9521
- 8 nodes (128 cores): 11811

Sedan_4million cells benchmark:
- 1 node (16 cores): 1557
- 2 nodes (32 cores): 2727
- 4 nodes (64 cores): 5339
- 6 nodes (96 cores): 8028
- 8 nodes (128 cores): 9201

Aircraft_14million cells benchmark:
- 1 node (16 cores): 252
- 2 nodes (32 cores): 477
- 4 nodes (64 cores): 851
- 6 nodes (96 cores): 1228
- 8 nodes (128 cores): 1621

Exhaust system_33million cells benchmark:
- 1 node (16 cores): 83
- 2 nodes (32 cores): 165
- 4 nodes (64 cores): 311
- 6 nodes (96 cores): 459
- 8 nodes (128 cores): 593

*Note that these benchmarks were run a few months ago. They do not include the recent updates that patch the Intel security vulnerabilities. I will run the benchmarks again after those updates as well.
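For readers who want speed-up and parallel efficiency rather than raw ratings, here is a minimal sketch that derives both from the figures above, using the single-node (16-core) run of each case as the baseline. The dictionary layout and the `scaling` helper are just one way to organise it.

```python
# Solver ratings from the benchmark lists above, keyed by core count.
RESULTS = {
    "Aircraft_2million": {16: 2168, 32: 3793, 64: 6545, 96: 9521, 128: 11811},
    "Sedan_4million": {16: 1557, 32: 2727, 64: 5339, 96: 8028, 128: 9201},
    "Aircraft_14million": {16: 252, 32: 477, 64: 851, 96: 1228, 128: 1621},
    "Exhaust system_33million": {16: 83, 32: 165, 64: 311, 96: 459, 128: 593},
}


def scaling(ratings):
    """Map core count to (speed-up, efficiency) relative to the smallest run."""
    base_cores = min(ratings)
    base_rating = ratings[base_cores]
    table = {}
    for cores in sorted(ratings):
        speedup = ratings[cores] / base_rating
        efficiency = speedup / (cores / base_cores)
        table[cores] = (speedup, efficiency)
    return table


for case, ratings in RESULTS.items():
    print(case)
    for cores, (speedup, efficiency) in scaling(ratings).items():
        print(f"  {cores:>3} cores: speed-up {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

At 128 cores this gives roughly 68% efficiency for the 2-million-cell case and roughly 89% for the 33-million-cell case, so the larger meshes scale better across the eight nodes.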

However, as I think many people will advise you: do not buy this generation of processors anymore!
Noco likes this.

Last edited by F1aerofan; January 19, 2018 at 08:40.
F1aerofan is offline   Reply With Quote
