OpenFOAM benchmarks on various hardware

#621 | wkernkamp (Will Kernkamp), Senior Member | January 12, 2023, 01:19
Quote:
Originally posted by ym92:
I fully agree with flotus1. Actually, if you put "number of cores used / total number of cores available" on the horizontal axis, our results would probably look very similar. The curve is almost flat when using more than ~50% of the cores.

I am not sure why the single-core results are so different, but I might not have used adequate settings. At least I am sure I did not use core binding (it might be a good idea to bind the cores to one CPU for runs with around 2-10(?) cores).

ym92,

You should be getting the same single-core performance as the single 7742 run. Your all-core runs should tend toward 16 seconds. This underperformance might indicate a hardware problem.

#622 | ERodriguez (Eduardo), New Member | January 12, 2023, 03:10
Quote:
Originally posted by flotus1:
Part of the reason you see worse scaling with your system is the faster single-core time you got. For a more intuitive comparison, I would recommend scaling both results by the same single-core value.
There are reasons for this large difference in single-core performance, but we don't need to get into that. Your result is good and indicates decent FP optimizations at work. These don't apply at high thread counts, where the workload becomes bound by memory bandwidth.

Speaking of memory bandwidth: that's what ultimately limits scaling on your single 64-core CPU. You are comparing against two CPUs, which have twice the amount of shared CPU resources, memory bandwidth and last-level cache being two of them.
Your peak performance of 33s doesn't seem too far off.

For best performance, these are the settings I would recommend:
SMT off
NPS=4
cleared caches before each run using "echo 3 > /proc/sys/vm/drop_caches" as root
and then run the simulation with
mpirun -np 64 --bind-to core --rank-by core --map-by numa

It won't change results drastically though. It's still one CPU against two.
Thanks for your answers.

I have tried the settings you suggested, and the 64-core time decreased from 33.22s to 32.23s, so, as you say, it is nothing drastic. In fact, the fastest run I obtained was with SMT off and NPS=4 but without any "--bind-to" option in mpirun (I actually do not know what the default is). In that configuration I got 31.65s, which is also a negligible difference.
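
For anyone who wants to script the two variants compared above, here is a minimal Python sketch. It only assembles the command lines and does not run anything. The mpirun flags and the drop_caches path are the ones quoted above; the solver name `simpleFoam` and the helper names are assumptions for illustration only, and SMT and NPS still have to be set in the BIOS.

```python
import shlex

DROP_CACHES = "/proc/sys/vm/drop_caches"  # path quoted above; writing to it needs root


def drop_caches_command():
    """Shell command that clears the caches before a run (must be run as root)."""
    return f"echo 3 > {DROP_CACHES}"


def mpirun_command(n_ranks, solver="simpleFoam", bind=True):
    """Build the mpirun argument list for one benchmark run.

    `solver` is a hypothetical placeholder; use whatever the benchmark script calls.
    With bind=False no binding options are passed, so the MPI default applies.
    """
    cmd = ["mpirun", "-np", str(n_ranks)]
    if bind:
        cmd += ["--bind-to", "core", "--rank-by", "core", "--map-by", "numa"]
    return cmd + [solver, "-parallel"]


if __name__ == "__main__":
    print(drop_caches_command())
    print(shlex.join(mpirun_command(64)))              # with explicit binding
    print(shlex.join(mpirun_command(64, bind=False)))  # MPI default binding
```

Keeping the command as a list makes it easy to hand to `subprocess.run` later without worrying about shell quoting.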

I also attach the scaling curve compared with Yannick's results, with all curves normalized by the same single-core result (936s, reported by ym92). As you can see, my absolute results are better at low core counts but, beyond roughly 25 cores, the memory bandwidth limit kicks in and the performance is worse.

At the end of the day, the fastest simulation Yannick got with two identical processors took about 15s, whilst mine takes about double that (32s). That makes sense, given that we have half the CPUs and thus half the memory bandwidth.
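
The normalization can be sketched in a few lines of Python. This is only an illustration using the numbers from this post (936s reference, 32.23s for the single CPU, roughly 15s for the dual-CPU system); the function name and labels are made up for the example.

```python
REFERENCE_SINGLE_CORE_S = 936.0  # single-core time reported by ym92, used for every curve


def normalized_speedup(wall_time_s, reference_s=REFERENCE_SINGLE_CORE_S):
    """Speedup relative to one common single-core reference time."""
    return reference_s / wall_time_s


# Best all-core times mentioned in this post (the dual-CPU value is "about 15s").
best_times_s = {
    "single CPU, 64 cores": 32.23,
    "dual CPU, all cores": 15.0,
}

if __name__ == "__main__":
    for label, t in best_times_s.items():
        print(f"{label}: {t:6.2f} s -> normalized speedup {normalized_speedup(t):5.1f}")
    ratio = best_times_s["single CPU, 64 cores"] / best_times_s["dual CPU, all cores"]
    print(f"single/dual wall-time ratio: {ratio:.2f}")
```

Because both curves share one reference time, the speedup ratio between them is simply the inverse of the wall-time ratio, which is why the "half the CPUs, about double the time" comparison reads directly off the plot.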

So I am starting to convince myself that we are at the peak performance of our hardware. I don't know whether you will find it useful, but if you want to include these results in the global compilation of benchmarks, feel free to do so.

Best regards and thanks for the help

Eduardo
Attached image: OpenFOAM_scalabilit_Epyc7742_rescaled.png (15.1 KB)

#623 | ym92 (Yannick), New Member | January 12, 2023, 03:46
Quote:
Originally posted by wkernkamp:
ym92,

You should be getting the same single-core performance as the single 7742 run. Your all-core runs should tend toward 16 seconds. This underperformance might indicate a hardware problem.

Thanks for pointing this out. Yes, my all-core run was 15.46s, which seems OK. But you are right. I think I ran these benchmarks before we replaced the SSD, which was acting weirdly (sometimes fast, sometimes very slow). I don't know whether any read/write operations are included in the benchmark, but if so, that could explain it. Anyway, I will probably rerun the benchmark in the next few days just to confirm.

EDIT: I checked again on another server with the same specs (only less RAM). The single-core results are only around 10% better. Maybe it is an issue with my Docker setup or the old version of OpenFOAM I am using. Anyway, as the all-core performance is OK (and I don't usually work with OpenFOAM), I will probably not look further into it.

Last edited by ym92; January 12, 2023 at 09:47.

#624 | wkernkamp (Will Kernkamp), Senior Member | January 12, 2023, 22:10
System:
Gigabyte MD80-F34, 2x E5-2683 (16 core, 45 MB Cache), 16x *GB DDR4-2400 2R two RDIMMs per channel, Debian Linux 5.15


Software:
OpenFOAM v2212 from openfoam.com


Code:
Meshing Times:
1 1426.06
2 931.58
4 524.9
8 326.17
12 247.9
16 221.57
20 187.4
24 171.64
28 176.2
32 159.53
Flow Calculation:
1 1041.68
2 523.79
4 238.93
8 126.03
12 91.68
16 76.93
20 67.85
24 64.21
28 61.2
32 60.19
Attached file: benchOpenFOAMv2212.zip (21.2 KB)

#625 | wkernkamp (Will Kernkamp), Senior Member | January 13, 2023, 17:30
There is currently a super cheap QUANTA server on eBay that could match the performance of my MD90-F34 system above if configured right. https://www.ebay.com/itm/13404888714...oAAOSwmPhjaqph I have no relationship with the seller.

These 1U systems get noisy when the fans spin up. With the E5-2683 v4 chips, the temperature when running OpenFOAM on all cores reaches about 62 °C and the fans barely spin up. Power consumption is about 360 W during the calculation and 95 W at idle.


A student could put together a system with 64 GB of RAM for under $500. This system would have twice the performance of a DDR5 Desktop system. Graphics you can do on your laptop.

#626 | Jeggi (Jógvan), Member | January 24, 2023, 03:33
Hardware:
2x EPYC 7302, 16x 16 GB DDR4-3200 dual-rank ECC

Software:
OpenFOAM v2212 from openfoam.com

Code:
# cores   Wall time (s):
------------------------
Meshing Times:
1 1195.69
2 777.66
4 436.66
8 247.98
12 182.25
16 160.27
20 136.17
24 125.75
28 126.47
30 134.66
32 121.23
Flow Calculation:
1 822.59
2 415.07
4 178.69
8 85.59
12 57.74
16 43.96
20 39.97
24 36.1
28 31.14
30 30.12
32 31.38

#627 | wkernkamp (Will Kernkamp), Senior Member | January 24, 2023, 15:02
Others have achieved somewhat better results with this hardware; both their single-core and 32-core times are better. That should not be the case, as you have what appears to be the best possible hardware.

Found this one:
Quote:
Originally posted by linnemann:
Tested on 2x EPYC Rome 7302, 256 GB RAM (32x 8 GB @ 2933)
No core binding or trickery.

With AOCC 2.1/GCC 9.2.0 and OpenFOAM 19.12

Code:
# cores   Wall time (s)                      Diff %
          AOCC 2.1.0       GCC 9.2.0
          -march=znver2    -march=znver2
1         693.3            692.5              0%
2         470.3            470.88             0%
4         167.2            164.52            -2%
8         78.5             77.16             -2%
12        59.5             60.26              1%
16        42.3             41.79             -1%
20        41.2             41.07              0%
24        33.3             33.59              1%
28        34.0             32.36             -5%
32        28.2             27.95             -1%
So no real benefit going for AOCC/Clang

and this one:

Quote:
Originally posted by Tobermory:
Dear speedsters,

here are my results from the motorbike testcase.....

#628 | wkernkamp (Will Kernkamp), Senior Member | January 24, 2023, 15:13
There is a spreadsheet started by blackcatxiii that I have updated somewhat for myself (attached). It had the following result:



# 2x EPYC 7302
1 723.64
2 328.11
4 164.21
8 81.4
12 55.2
16 41.1
20 37.53
24 34.27
28 29.99
32 26.89

This is the history of the Benchmarking spreadsheet:
Quote:
Originally posted by Malinator:
Hello, fellow CFD users,

The benchmark data compiled previously by the thread starter at https://openfoamwiki.net/index.php/Benchmarks was last updated in late 2018, so it is currently missing all the modern configurations. It is quite a lot of work to read through all the posts, so a short summary can save a lot of time for the occasional visitor searching for guidance on an optimal configuration.

So, I used the concept from blackcatxiii and just trimmed his list to single- and dual-CPU systems in the workstation and server range that are commercially available brand-new. If you're interested in a broader range of systems, you can still check them there.

I also added a diagram that compares performance on the maximum number of available cores relative to the current bang-for-the-buck champion, the 2x EPYC 7542 system.

Please keep in mind that performance may be influenced by a lot of things; the major factors are at least memory speed, OS, NUMA configuration, OpenFOAM version, and the compiler used. Obviously, they differ between the machines that forum members used to run the benchmark. You can check the actual settings using the link to the respective original message in the first column of the table.
Attached file: BenchmarkResults.xlsx (10.5 KB)

#629 | bravebear (Guangyu Zhu), New Member | January 25, 2023, 04:28
Title: Very curious about the performance of Apple M1 Ultra...
As the M1 Ultra provides very high memory bandwidth (up to 800 GB/s), would it beat the latest generation of EPYC or Xeon with a similar core count (20 cores) in the OpenFOAM test? The price of a Mac Studio equipped with a 20-core M1 Ultra and 128 GB of RAM is USD 4799.

#630 | Jeggi (Jógvan), Member | January 25, 2023, 05:53
It is strange that a system with slower RAM gets 26.89s@32cores.
Do you have any suggestions on where I should look for optimizations?
I think the first step for me is to double-check the RAM modules. I bought the system second-hand, so perhaps the RAM modules are different than I thought.
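
As a rough sanity check on the "slower RAM" point, the theoretical peak memory bandwidth of the two configurations can be compared with the short Python sketch below. It assumes 8 DDR4 channels per socket for the EPYC 7302 (taken from the public spec, not from this thread) and a 64-bit data bus per channel; real sustained bandwidth is lower and also depends on ranks, DIMMs per channel and NUMA settings.

```python
BYTES_PER_TRANSFER = 8  # 64-bit data bus per DDR4 channel


def peak_bandwidth_gb_s(sockets, channels_per_socket, mega_transfers_per_s):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_s = sockets * channels_per_socket * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER
    return bytes_per_s / 1e9


if __name__ == "__main__":
    # Assumption: 2 sockets x 8 channels for a dual EPYC 7302 system.
    ddr4_3200 = peak_bandwidth_gb_s(2, 8, 3200)
    ddr4_2933 = peak_bandwidth_gb_s(2, 8, 2933)
    print(f"DDR4-3200: {ddr4_3200:.1f} GB/s")
    print(f"DDR4-2933: {ddr4_2933:.1f} GB/s")
    print(f"ratio: {ddr4_3200 / ddr4_2933:.3f}")
```

On paper the DDR4-3200 system has roughly 9% more peak bandwidth, so memory speed alone would not explain a slower 32-core time; configuration differences are the more likely suspects.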

#631 | linuxguy123 (Guy), Member | January 25, 2023, 18:47
One can buy a Mac mini with the M2 (8 cores) for $600. Supposedly it has a lot of memory bandwidth. Would a cluster of M2 minis be cost-competitive with EPYC machines? I believe one can get a 10 GbE port for an extra $100.

https://www.reddit.com/r/homelab/com..._m2m2_pro_mac/

Has anyone run the OpenFOAM benchmark on the M2 processors (M2, M2 Pro, M2 Max, M2 Ultra)?

#632 | xuegy, Senior Member | January 25, 2023, 23:54
Ookami HPC (48-core A64FX, 32 GB HBM RAM, 1 TB/s memory bandwidth), the same architecture as Fugaku.
I only tested the performance on one node. I have to say the performance is terrible. Apple gave us the illusion that ARM64 is very powerful, but the fact is that ARM64 HPC is still far behind x86.
# cores Wall time (s):
------------------------
1 2416.19
2 1185.66
4 582.35
8 299.75
12 207.08
24 108.55
36 78.28
48 65.31
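
To separate scaling from absolute speed in the numbers above, parallel efficiency (single-core time divided by cores times wall time) can be computed with a short Python sketch like this one. The function name is just for illustration; the data is copied from the table above.

```python
# Wall times (s) from the A64FX table above, keyed by core count.
a64fx_times_s = {
    1: 2416.19, 2: 1185.66, 4: 582.35, 8: 299.75,
    12: 207.08, 24: 108.55, 36: 78.28, 48: 65.31,
}


def parallel_efficiency(times_s):
    """Efficiency t(1) / (n * t(n)) for every core count n in the table."""
    t1 = times_s[1]
    return {n: t1 / (n * t) for n, t in times_s.items()}


if __name__ == "__main__":
    for n, eff in sorted(parallel_efficiency(a64fx_times_s).items()):
        print(f"{n:3d} cores: efficiency {eff:.2f}")
```

The efficiency stays fairly high up to 48 cores, so the scaling itself looks reasonable; it is the single-core time (about 2416s, against roughly 800 to 1050s for the x86 systems posted in this thread) that holds the result back.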

#633 | Jeggi (Jógvan), Member | January 26, 2023, 03:30
I reran the tests. The previous test was started from a NoMachine remote-desktop session, while this one was started over SSH. I suspect that not having to show the desktop through NoMachine has a minor impact on the results.

Hardware:
2x EPYC 7302, 16x 16 GB DDR4-3200 dual-rank ECC

Software:
OpenFOAM v2212 from openfoam.com
Code:
# cores   Wall time (s):
------------------------
Meshing Times:
1 1205.84
2 779.58
4 439.87
8 248.06
12 181.88
16 160.38
20 134.94
24 125.01
28 126.6
30 135.61
32 116.24
Flow Calculation:
1 794.95
2 414.41
4 177.73
8 86.68
12 58.03
16 43.69
20 41.85
24 36.36
28 31.35
30 29.62
32 29.44
Comparing the best flow-calculation times (30.12 s on 30 cores before, 29.44 s on 32 cores now), this is a speedup of about 2% over the previous test.
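
For a per-core-count view of the two runs, a small Python sketch like the following can be used. The times are the flow-calculation values from the two posts; the function name is made up for the example.

```python
# Flow-calculation wall times (s), keyed by core count.
first_run_s = {1: 822.59, 2: 415.07, 4: 178.69, 8: 85.59, 12: 57.74, 16: 43.96,
               20: 39.97, 24: 36.1, 28: 31.14, 30: 30.12, 32: 31.38}
rerun_s = {1: 794.95, 2: 414.41, 4: 177.73, 8: 86.68, 12: 58.03, 16: 43.69,
           20: 41.85, 24: 36.36, 28: 31.35, 30: 29.62, 32: 29.44}


def percent_faster(old_s, new_s):
    """Positive when the new time is shorter than the old one."""
    return 100.0 * (old_s - new_s) / old_s


if __name__ == "__main__":
    for n in sorted(first_run_s):
        print(f"{n:3d} cores: {percent_faster(first_run_s[n], rerun_s[n]):+5.1f} %")
    best_old, best_new = min(first_run_s.values()), min(rerun_s.values())
    print(f"best time: {percent_faster(best_old, best_new):+5.1f} %")
```

The differences are not uniform: some core counts got slightly slower in the rerun, which suggests that part of the change is ordinary run-to-run variation.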

#634 | wkernkamp (Will Kernkamp), Senior Member | January 28, 2023, 04:16
Quote:
Originally posted by Jeggi:
It is strange that a system with slower RAM gets 26.89s@32cores.
Do you have any suggestions on where I should look for optimizations?
I think the first step for me is to double-check the RAM modules. I bought the system second-hand, so perhaps the RAM modules are different than I thought.

You should check the RAM, even though your latest result is pretty close. An easy way on Linux, without opening up the computer:
sudo dmidecode -t 17

sudo is a utility to temporarily become superuser. If you are already superuser, you can just run:
dmidecode -t 17

This command gives you information on all your memory slots. The result for each DIMM slot looks like this:

Handle 0x005F, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x0049
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 8 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMMH2
Bank Locator: P1_Node1_Channel3_Dimm1
Type: DDR3
Type Detail: Registered (Buffered)
Speed: 1866 MT/s
Manufacturer: Kingston
Serial Number: 3630B135
Asset Tag: DimmH2_AssetTag
Part Number: KP9RN2-HYC
Rank: 2
Configured Memory Speed: 1866 MT/s

#635 | Jeggi (Jógvan), Member | January 29, 2023, 17:12
Hi Wkernkamp,

Thanks for taking the time to look into this.

This is what I get when running dmidecode -t 17:
Code:
Handle 0x005D, DMI type 17, 84 bytes
Memory Device
        Array Handle: 0x0024
        Error Information Handle: 0x005C
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16 GB
        Form Factor: DIMM
        Set: None
        Locator: P2-DIMMH1
        Bank Locator: P1_Node0_Channel7_Dimm0
        Type: DDR4
        Type Detail: Synchronous Registered (Buffered)
        Speed: 3200 MT/s
        Manufacturer: Samsung
        Serial Number: 03B9A5D1
        Asset Tag: P2-DIMMH1_AssetTag (date:51/00)
        Part Number: M393A2K43DB3-CWE
        Rank: 2
        Configured Memory Speed: 3200 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V
        Memory Technology: DRAM
        Memory Operating Mode Capability: Volatile memory
        Firmware Version: M393A2K43DB3-CWE
        Module Manufacturer ID: Bank 1, Hex 0xCE
        Module Product ID: Unknown
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Non-Volatile Size: None
        Volatile Size: 16 kB
        Cache Size: None
        Logical Size: None

#636 | wkernkamp (Will Kernkamp), Senior Member | January 29, 2023, 19:01
Quote:
Originally posted by Jeggi:
Hi Wkernkamp,

Thanks for taking the time to look into this.

This is what I get when running dmidecode -t 17:

This RDIMM is functioning properly, as you can see from:
Configured Memory Speed: 3200 MT/s

Your RDIMMs should all have the same rank; this one's rank is 2:
Rank: 2

If one of them is not right, you can find its slot using this record:
Locator: P2-DIMMH1

If all your RDIMMs show the same values, your memory is functioning OK. No need to open the box!
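
If you would rather not compare the records by eye, the check can be automated. The Python sketch below parses the text printed by dmidecode -t 17 and lists the slots whose speed, rank or size differ from the first populated slot. The function names are made up for this example, and the parsing assumes the field layout shown in the outputs posted in this thread.

```python
import re
import subprocess


def parse_dimms(dmidecode_text):
    """Split `dmidecode -t 17` output into one dict of fields per memory device."""
    dimms = []
    for block in re.split(r"\n\s*\n", dmidecode_text.strip()):
        if "Memory Device" not in block:
            continue  # skip the SMBIOS header block
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:
                fields[key] = value
        dimms.append(fields)
    return dimms


def inconsistent_dimms(dimms, keys=("Configured Memory Speed", "Rank", "Size")):
    """Locators of populated slots whose fields differ from the first populated slot."""
    populated = [d for d in dimms if d.get("Size", "").lower() != "no module installed"]
    if not populated:
        return []
    ref = populated[0]
    return [d.get("Locator", "?") for d in populated
            if any(d.get(k) != ref.get(k) for k in keys)]


if __name__ == "__main__":
    # Needs root, just like running dmidecode by hand.
    out = subprocess.run(["dmidecode", "-t", "17"], capture_output=True,
                         text=True, check=True).stdout
    dimms = parse_dimms(out)
    print(f"{len(dimms)} slots found, mismatching slots: {inconsistent_dimms(dimms)}")
```

An empty list means every populated slot reports the same configured speed, rank and size.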

#637 | Jeggi (Jógvan), Member | January 30, 2023, 19:57
Hi Wkernkamp,

I ran sudo dmidecode -t 17 | grep "Memory Speed" and sudo dmidecode -t 17 | grep "Rank". Both commands gave 16 identical lines of output, so the memory should be working correctly.

#638 | Tibo99 (René Thibault, Canada), Senior Member | February 28, 2023, 13:58
Here are the results I got (see the attached picture) from another type of case, using OpenFOAM v2212.
Attached image: ThinkStationP710_PA.png (29.1 KB)

Last edited by Tibo99; February 28, 2023 at 15:20.

#639 | wkernkamp (Will Kernkamp), Senior Member | February 28, 2023, 14:19
Quote:
Originally posted by Tibo99:
Here are the results I got (see picture in attachment). OF v2212

Your table shows a residual of 1.0e-3 for all runs. Does this mean that you changed the termination criteria for the benchmark? The benchmark is normally set up to run for just 100 iterations. The times you are getting are very much longer than they should be on your machine. I would expect about 65 seconds on all cores and around 1200 seconds on one.

The shape of the curve of number of cores versus runtime looks normal.

Your processor, the E5-2699C v4, has a low base frequency of 2.2 GHz that boosts only to 2.4 GHz regardless of the number of running cores. For the high-core-count runs, this should not make much of a difference, as memory bandwidth is the limiting factor. However, for general use of the workstation, the higher-clocked processors (up to 3.6 GHz with two cores active) might be attractive. These processors are now very cheap if you look at core counts around 16. As you discovered yourself, at some point extra cores no longer speed up the run.

I have a very similar machine, a Gigabyte MD90-F34 with dual E5-2683 v4. I got that one to complete the benchmark in 60 seconds. Yours has more cache, so it might be slightly faster if you configure your memory perfectly. That is really the purpose of this benchmark: tuning your machine optimally for CFD. You can see that my memory configuration allows a small performance gain all the way to 32 cores, whereas your curve flattens at 24 cores.
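
One way to put a number on "flattens" is to look at the percentage gained by each step up in core count. The Python sketch below does that for my flow-calculation times; the 1% threshold and the function names are arbitrary choices for illustration, not part of the benchmark.

```python
# Flow-calculation wall times (s) of the dual E5-2683 v4 system, keyed by core count.
flow_times_s = {1: 1041.68, 2: 523.79, 4: 238.93, 8: 126.03, 12: 91.68,
                16: 76.93, 20: 67.85, 24: 64.21, 28: 61.2, 32: 60.19}


def marginal_gains(times_s):
    """Percent reduction in wall time for each step up in core count."""
    cores = sorted(times_s)
    return {hi: 100.0 * (times_s[lo] - times_s[hi]) / times_s[lo]
            for lo, hi in zip(cores, cores[1:])}


def flattening_point(times_s, threshold_percent=1.0):
    """Core count after which every further step gains less than the threshold.

    Returns None when the last step still gains at least the threshold.
    """
    gains = marginal_gains(times_s)
    cores = sorted(times_s)
    flat_from = None
    for lo, hi in zip(cores, cores[1:]):
        if gains[hi] >= threshold_percent:
            flat_from = None      # still improving, reset
        elif flat_from is None:
            flat_from = lo        # first step of a flat stretch
    return flat_from


if __name__ == "__main__":
    for n, g in marginal_gains(flow_times_s).items():
        print(f"-> {n:2d} cores: {g:5.1f} % faster than the previous step")
    print("flattens at:", flattening_point(flow_times_s))
```

Feeding in the times from your own table should show directly whether the steps beyond 24 cores fall below whatever threshold you consider worthwhile.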

Quote:
Originally posted by wkernkamp:
System:
Gigabyte MD80-F34, 2x E5-2683 (16 core, 45 MB Cache), 16x *GB DDR4-2400 2R two RDIMMs per channel, Debian Linux 5.15


Software:
OpenFOAM v2212 from openfoam.com


Code:
Meshing Times:
1 1426.06
2 931.58
4 524.9
8 326.17
12 247.9
16 221.57
20 187.4
24 171.64
28 176.2
32 159.53
Flow Calculation:
1 1041.68
2 523.79
4 238.93
8 126.03
12 91.68
16 76.93
20 67.85
24 64.21
28 61.2
32 60.19

#640 | Tibo99 (René Thibault, Canada), Senior Member | February 28, 2023, 14:46
Thank you very much for your quick reply!

Unfortunately, I saw this thread only after I had performed this analysis, which is why I didn't use your benchmark.

I chose to share the results anyway.

I'll certainly download the benchmark and post the results.

Regarding the results I got, I did notice what you said about the curve flattening around 24 cores.

From these results, I was wondering whether there is a number of cores beyond which performance gets worse instead of slowly approaching an asymptote, but you answered that question well.

It's really nice to have feedback from someone who has a similar workstation. I would certainly like to optimize this machine. Since you have a similar machine, do you know where I could find good documentation online on configuring it?

Thank you again!

Last edited by Tibo99; February 28, 2023 at 15:59.