OpenFOAM benchmarks on various hardware

#701 | Will Kernkamp (wkernkamp), Senior Member | April 19, 2023, 13:21
Quote:
Originally Posted by Tibo99
Ok, thank you for the clarification!

So, changing this setting is probably the last thing I can do to push the envelope without affecting the hardware too much, right?

Regards,
Yes, I think so.

#702 | Joost (Lavos), New Member | April 28, 2023, 13:49
Little update on the hobo-cluster (8 dual-socket E5-2670 v1 nodes with DDR3-1333), now with InfiniBand (40 Gbit QDR) as it was meant to be. I'm basically seeing linear scaling with the GAMG solver and super-linear scaling with the PCG solver, as advised by the AWS team. In theory, the super-linear scaling comes from the small size of the benchmark relative to the CPU caches. The biggest lesson has been that the ~1 µs MPI latency offered by InfiniBand RDMA is really a must for scaling OpenFOAM across multiple nodes. I first had the Mellanox cards running in 10 Gbit Ethernet mode over the classic TCP stack, and scaling was just awful.

GAMG solver:
Code:
cores   Wall time (s)
16      130.1
32      65.2
48      43.1
64      31.5
80      26.5
96      21.5
112     18.3
128     16.2

PCG solver (flow calculation):
Code:
cores   Wall time (s)
16      130.0
32      65.0
48      41.9
64      28.9
80      22.4
96      17.8
112     15.1
128     13.4

I'm pretty happy with the results as-is. I might be able to get another 10-25% by overclocking the memory to 1666 and with some BIOS/InfiniBand tuning. Not sure it's worth the stability trade-off, though; I'd rather hook up an extra 4 nodes if the need arises.
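
For anyone who wants to check the scaling claim, the posted timings reduce to parallel efficiencies with a few lines of Python (a minimal sketch; the numbers are copied from the tables above, and efficiency is speedup over the 16-core baseline divided by the ideal speedup):

Code:
# Parallel efficiency relative to the 16-core baseline; values above 1.0
# indicate super-linear scaling (the per-core working set fits in cache).
gamg = {16: 130.1, 32: 65.2, 48: 43.1, 64: 31.5, 80: 26.5, 96: 21.5, 112: 18.3, 128: 16.2}
pcg  = {16: 130.0, 32: 65.0, 48: 41.9, 64: 28.9, 80: 22.4, 96: 17.8, 112: 15.1, 128: 13.4}

for name, runs in (("GAMG", gamg), ("PCG", pcg)):
    t16 = runs[16]
    for n, t in sorted(runs.items()):
        eff = (t16 / t) / (n / 16)  # measured speedup / ideal speedup
        print(f"{name} {n:3d} cores: efficiency {eff:.2f}")
PCG comes out around 1.2 at 128 cores while GAMG stays near 1.0, matching the linear vs. super-linear observation above.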

#703 | Will Kernkamp (wkernkamp), Senior Member | April 28, 2023, 21:17
Quote:
Originally Posted by Lavos
Little update on the hobo-cluster (8 dual-socket E5-2670 v1 nodes with DDR3-1333), now with InfiniBand (40 Gbit QDR) as it was meant to be. [...] I might be able to get another 10-25% by overclocking the memory to 1666 and with some BIOS/InfiniBand tuning. Not sure it's worth the stability trade-off, though; I'd rather hook up an extra 4 nodes if the need arises.
You could also look into upgrading to Ivy Bridge (Xeon E5 v2) instead of adding nodes. Processor prices are low: an E5-2697 v2 is $37.50 domestically with four-day delivery, or $32 from China. They can potentially run DDR3-1866. I have been able to push 1333 memory to 1866 in the past, but no guarantees! CPU frequencies are higher across more cores, and power consumption will be lower.
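
As a back-of-the-envelope check on what the memory upgrade buys (a sketch assuming the 4 DDR3 channels per socket these Xeons have):

Code:
# Theoretical peak DRAM bandwidth per socket: channels x 8 bytes x MT/s.
def peak_gbs(channels, mts):
    return channels * 8 * mts / 1000  # GB/s

print(peak_gbs(4, 1333))  # ~42.7 GB/s per socket at DDR3-1333
print(peak_gbs(4, 1866))  # ~59.7 GB/s per socket at DDR3-1866, +40%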

#704 | Joost (Lavos), New Member | May 2, 2023, 06:15
Quote:
Originally Posted by wkernkamp
You could also look into upgrading to Ivy Bridge (Xeon E5 v2) instead of adding nodes. [...]
I will experiment on a single node to see if the memory takes it. The $18 E5-2696 v2 is likely already sufficient, since OpenFOAM scaling is so memory-bandwidth bound. I just learned that Ivy Bridge introduced QPI home snoop mode, which should also provide significantly higher inter-socket bandwidth than the older generation. Definitely worth a try! It would be mad if we could push the benchmark below 0.1 s/iteration with less than $1k of janky old hardware.

#705 | Will Kernkamp (wkernkamp), Senior Member | May 2, 2023, 11:52
Quote:
Originally Posted by Lavos
I will experiment on a single node to see if the memory takes it. [...] It would be mad if we could push the benchmark below 0.1 s/iteration with less than $1k of janky old hardware.

Go for it!

#706 | Andrew (Malinator), New Member | May 24, 2023, 14:42 | Ryzen 7700X
Bench results for a modern workstation/desktop on a budget.

HW: AMD Ryzen 7700X (8-core Zen 4), MSI MAG B650, 2x16 GB DDR5 (XMP 6200 MT/s CL40, Hynix M-die based)
HW tuning: SMT off, PBO on, Curve Optimizer to reduce core voltage by 30 mV, memory timings and subtimings carefully optimized at 6200 MT/s (30-37-..., etc.), FCLK 2133 MHz

SW:
Win: Windows 10 Pro 22H2, WSL2, OF10 on Ubuntu 22.04.2
Lin: Fedora 38, kernel 6.2.15, OF10 compiled with the additional -march=znver4 flag

Results (average of 3 runs of benchv02 from the thread's first post):

Win10 + WSL2
Code:
cores   Flow calculation (s)   Meshing (s)
1       312.3                  636.5
2       189.2                  430.6
4       130.2                  243.3
6       112.5                  202.3
8       109.9                  184.2

Linux native
Code:
cores   Flow calculation (s)   Meshing (s)
1       331.5                  567.0
2       192.9                  399.4
4       126.2                  241.0
6       110.3                  209.4
8       105.9                  162.9

Conclusions: a decent machine for pre- and post-processing (note the respectable meshing times of ~160 s), and even capable of light calculations, but...
IMHO this particular model (and maybe the latest Ryzen consumer line as a whole, except for the *X3D models) cannot use the full capability of fast DDR5 modules. 6200-6400 MT/s is typically the highest sustainable speed, and memory bandwidth is still constrained by the Infinity Fabric. This particular CPU has a single CCD, which (reportedly, and consistent with my observations) likely makes FCLK another bottleneck in memory-read-heavy tasks. All in all, performance-wise in CFD workloads it is more like a 5800X3D, and it lags considerably behind rivals from the latest Intel 13*00K line, which can achieve higher memory bandwidth.
Still, a decent upgrade over 2-3 year old consumer hardware for a relatively quiet desktop workstation.
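
A rough check of the FCLK-bottleneck claim (the 32 bytes read per fabric clock per CCD is my assumption based on published Zen architecture overviews, not a measured figure):

Code:
# Dual-channel DDR5-6200 peak vs. what one CCD can read over the
# Infinity Fabric at FCLK 2133 MHz (assumed 32 B/clk read width).
dram_gbs = 2 * 8 * 6200 / 1000   # ~99.2 GB/s DRAM peak
ccd_read_gbs = 32 * 2133 / 1000  # ~68.3 GB/s into a single CCD

print(f"DRAM peak {dram_gbs:.1f} GB/s, CCD read link {ccd_read_gbs:.1f} GB/s")
# One CCD cannot drain the full DRAM bandwidth, consistent with FCLK
# becoming the bottleneck in memory-read-heavy tasks.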

#707 | Dongyue Li (sharonyue), Senior Member, Beijing, China | May 31, 2023, 01:54
Quote:
Originally Posted by ym92
Unfortunately, the only version I have installed is v2206. Is there a good test case which is available in both OF10 and v2206? Or maybe I can find a way to download that tutorial somewhere.
Please see the attachment (20M cells). It should work with v2206.

Alright, the attachment does not work; please download it here instead: https://www.cfd-china.com/assets/upl...9-2000w.tar.xz
__________________
My OpenFOAM algorithm website: http://dyfluid.com
By far the largest Chinese CFD forum: http://www.cfd-china.com/category/6/openfoam
We provide lots of clusters to Chinese customers, and we are considering doing business overseas: http://dyfluid.com/DMCmodel.html


#708 | Yan (aparangement), Member, Milano | July 11, 2023, 23:19
Just wondering if DDR5-6000 would be faster than this.

Quote:
Originally Posted by Malinator
Bench results for a modern workstation/desktop on a budget. [...] All in all, performance-wise in CFD workloads it is more like a 5800X3D, and it lags considerably behind rivals from the latest Intel 13*00K line, which can achieve higher memory bandwidth.

#709 | Yan (aparangement), Member, Milano | July 11, 2023, 23:41
Those numbers are just too good!

However, would you mind checking whether the runs with 48+ threads actually finished normally?

I am curious because the improvement is huge compared with the standard-L3 Epyc 7003 parts.

Quote:
Originally Posted by oswald
Hardware: 2x EPYC 7573X, 16x 32 GB DDR4
Software: Ubuntu 20.04.3, OF7

Code:
cores   Wall time (s)
1       492.5
4       113.53
8       57.91
12      39.68
16      31.88
20      28.08
24      25.14
28      24.14
32      22.34
40      21.49
48      17.17
56      12.53
64      11.55
I did not use core binding, which might explain the bad scaling behaviour when using 20 to 40 cores. Compared to my 2x EPYC 7543 workstation, this machine is ~33% faster on 64 cores.

#710 | oswald, Member, Leipzig, Germany | July 24, 2023, 03:42
Hi Yan,

I checked it and all runs finished as intended.

#711 | Alex (flotus1), Super Moderator, Germany | July 24, 2023, 05:56
The results look pretty tame compared to "normal" Epyc Milan without 3D V-Cache. Compared to my results with two 7543 from earlier in this thread:

Code:
#threads | 7543   | 7573X
=========|========|=======
01       | 471.92 | 492.5
02       | 227.14 | ---
04       | 108.51 | 113.53
08       |  52.11 | 57.91
16       |  28.81 | 31.88
32       |  18.11 | 22.34
48       |  15.46 | 17.17
64       |  13.81 | 11.55
I went through some effort to get the intermediate thread-count results as fast as possible, so the only reasonable comparison to draw here is at 64 threads. And that difference is well within expectations.

#712 | dab bence (danbence), Member | July 25, 2023, 11:06 | Genoa-X OpenFOAM performance information released
https://www.amd.com/system/files/doc...b-openfoam.pdf

#713 | Will Kernkamp (wkernkamp), Senior Member | July 25, 2023, 12:34
Quote:
Originally Posted by danbence
https://www.amd.com/system/files/doc...b-openfoam.pdf
That is based on a 100x40x40 grid, which is a really small problem. The benefit of the L3 cache shrinks as the problem gets larger.

#714 | L C, New Member | July 26, 2023, 14:45
Just take note that the link shows results for the new Genoa-X, which has a revised microarchitecture and increased L2 cache per core, so it likely scales differently than Milan-X.

#715 | Will Kernkamp (wkernkamp), Senior Member | July 26, 2023, 21:21
Quote:
Originally Posted by L C
Just take note that the link shows results for the new Genoa-X, which has a revised microarchitecture and increased L2 cache per core, so it likely scales differently than Milan-X.
OpenFOAM solution times at higher core counts are determined by memory bandwidth. The bandwidth to the various caches is much higher than the bandwidth to main memory; in fact, memory will not even come into play on these systems when the problem is small. So AMD, having the larger caches, is giving itself the maximum advantage.

I ran a dual Xeon v2 system on the Phoronix 30M- and 60M-cell OpenFOAM tests, comparing 2x E5-4627 v2 (16 cores) to 2x E5-2697 v2 (24 cores; the cores beyond 16 don't add much). The difference was quite large on the 30M problem in favor of the E5-2697 v2; however, it shrank on the 60M problem. I attribute the difference to the 50% larger cache of that processor. The larger the problem gets, the more the equal memory bandwidth equalizes the run times. On the 2M OpenFOAM benchmark, the 2x E5-4627 v2 completes in 100 seconds and the 2x E5-2697 v2 in 86 seconds. I tried to look up the openbenchmarking.org results but gave up; that website is badly in need of a usable interface.
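
To make the cache-versus-problem-size point concrete, a rough sketch (the ~1 kB of solver memory per cell is a common rule of thumb, and the 1152 MB of L3 assumes a Genoa-X 9684X socket; neither number is from the posts above):

Code:
# Does the working set fit in L3? Assume roughly 1 kB of solver memory
# per cell (rule of thumb, not a measurement).
def working_set_mb(cells, kb_per_cell=1.0):
    return cells * kb_per_cell / 1024

l3_mb = 1152  # assumed: Genoa-X 9684X, 12 CCDs x 96 MB stacked L3

for cells in (160e3, 2e6, 30e6, 60e6):
    ws = working_set_mb(cells)
    print(f"{cells / 1e6:5.2f}M cells: ~{ws:7.0f} MB, fits in L3: {ws <= l3_mb}")
The 100x40x40 case (0.16M cells) fits with room to spare, while the 30M- and 60M-cell Phoronix cases spill far into DRAM, which is where equal memory bandwidth starts to equalize run times.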

#716 | Yan (aparangement), Member, Milano | July 27, 2023, 05:22
Your 7543 is fast, for sure.

But I think a fair comparison would be running the case without MPI tuning (or with the same level of tuning, though that is sometimes difficult).

Quote:
Originally Posted by flotus1
The results look pretty tame compared to "normal" Epyc Milan without 3D V-Cache. [...] So the only reasonable comparison to draw here is at 64 threads. And that difference is well within expectations.

#717 | L C, New Member | July 27, 2023, 14:00
Quote:
Originally Posted by wkernkamp
OpenFOAM solution times at higher core counts are determined by memory bandwidth. [...] So AMD, having the larger caches, is giving itself the maximum advantage.
That's what I was trying to convey, just from the other side. When the problem is not constrained by main memory (i.e., it fits within the L3 cache), I'd expect Genoa to scale better than Rome because the cache hierarchy works more efficiently.

#718 | Johannes (mrlau), New Member | July 28, 2023, 14:26 | Slow 5800X3D
Hi everyone,

I'm having problems replicating the good results reported here for the 5800X3D.

I'm using two sticks of (I think) single-rank 3600 MT/s memory. DOCP is on in the BIOS, and I have populated the memory slots in accordance with the manual.

I have tried both benchmarks linked in this thread, on OpenFOAM v2306 and OpenFOAM 11, both packaged and compiled myself.

I have also tried turning SMT off in the BIOS, but that does not seem to make a major difference.

My OS is Ubuntu 22.04.2 LTS.

The best results I have gotten are:

Code:
cores   Wall time (s)
1       409.13
2       235.1
4       174.13
6       167.26
8       171.62
Quite far from what I have seen others report.

Under Windows the system performs fine in Cinebench, so I don't think it is a temperature problem. The PC uses custom-loop water cooling.

Any help getting the performance up would be greatly appreciated.
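
In case it helps with the diagnosis, here is a crude memory-bandwidth check (a minimal numpy sketch, not a calibrated STREAM run; ~57.6 GB/s is the theoretical dual-channel DDR4-3600 peak, 2 channels x 8 bytes x 3600 MT/s):

Code:
# Stream three large arrays through RAM and report effective bandwidth.
# A result far below what other 5800X3D owners see would point at the
# memory setup rather than at OpenFOAM itself.
import time
import numpy as np

n = 200_000_000         # three ~1.6 GB arrays, far beyond any cache
a = np.ones(n)
b = np.ones(n)

t0 = time.perf_counter()
c = a + b               # reads a and b, writes c: 24 bytes per element
dt = time.perf_counter() - t0
print(f"effective bandwidth: {24 * n / dt / 1e9:.1f} GB/s "
      f"(theoretical DDR4-3600 dual-channel peak: ~57.6 GB/s)")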

#719 | Will Kernkamp (wkernkamp), Senior Member | July 28, 2023, 15:15
I have not run this CPU myself, but I seem to remember that the RAM was run at 4800 MT/s. That would be 33% faster; if you can run your memory at that speed, you might reduce your run time by 20-25%.
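
The arithmetic behind that estimate, as a sketch (the bandwidth-bound fraction f is an assumption, not a measured value):

Code:
# Project wall time after a RAM speed bump, assuming a fraction f of the
# runtime scales inversely with the memory transfer rate.
def projected_time(t_old, mts_old, mts_new, f=0.8):
    return t_old * ((1 - f) + f * mts_old / mts_new)

# 8-core result above, 3600 -> 4800 MT/s (+33% bandwidth):
print(projected_time(171.62, 3600, 4800))  # ~137 s, roughly 20% faster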

#720 | Johannes (mrlau), New Member | July 28, 2023, 15:57
Quote:
Originally Posted by Simbelmynė
5800X3D, 2x8 GB DDR4 single-rank @ 3200 MT/s (14-14-14-14-28, 1T)
OFv9, OpenSUSE Tumbleweed, GCC 11.2, kernel 5.17.4

The 1-core result is amazing and the 6-core result is pretty decent as well. I assume this is the fastest dual-channel CPU for CFD right now. Well, at least until someone with a large wallet posts results for Alder Lake with DDR5 @ 6400+ MT/s. EDIT: missed the post a couple of pages back; the i5-12600 with DDR5 @ 6000 MT/s is indeed faster, and not terribly expensive with a B660 motherboard, so definitely better value if buying an entire new computer.
The single-core result is 33% faster than the 5900X (from this thread). The 5900X has a single-core boost up to 4.8 GHz while the 5800X3D only boosts to 4.5 GHz. Apparently the extra V-Cache matters more than the extra single-core speed.

Code:
cores   Simulation (s)   Meshing
1       314.21           12m23s
2       201.98           8m21s
4       149.98           5m05s
6       138.55           4m02s
Will update if I manage to push the memory and IF to 1800 MHz.

EDIT:
2x8 GB DDR4 single-rank @ 3800 MT/s (16-16-16-16-32, 1T)

Code:
cores   Simulation (s)   Meshing
1       304              12m14s
2       188              8m12s
4       135              4m58s
6       124              3m55s
8       122              3m28s
I have some results where the IF manages 2000 MHz, which admits 4000 MT/s in 1:1 mode. It is not fully stable though, so I need a few more days to learn this particular CPU. The interesting part is that higher IF speed also decreases L3 cache latency, so it does not only admit higher bandwidth.
These results were obtained with memory at 3200 MT/s and 3800 MT/s; even if the timings are a little better on the 3200 kit, my results should still be in the same ballpark, I would think.
