www.cfd-online.com

OpenFOAM benchmarks on various hardware

Old   May 29, 2021, 10:03
Default
  #401
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
I read that in some places in the US, electricity has become pretty expensive. But I was not aware that they already surpassed Germany at around 32ct/kWh.

Old   May 29, 2021, 10:19
Default
  #402
Senior Member
 
Join Date: Jun 2016
Posts: 102
Rep Power: 10
xuegy is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
I read that in some places in the US, electricity has become pretty expensive. But I was not aware that they already surpassed Germany at around 32ct/kWh.
Sorry, I misplaced a decimal point when using my phone calculator. I will edit that post. Since the electricity bill is paid by the university anyway, I'm not very sensitive to it.

Old   June 5, 2021, 23:54
Default Ryzen 5000 - may be wrong advice
  #403
New Member
 
Alexander Kazantcev
Join Date: Sep 2019
Posts: 24
Rep Power: 7
AlexKaz is on a distinguished road
Hi all!
Thanks for the topic and public results of benchmark.

I have to say a few words about the Ryzen 5600X. This CPU has a write transfer rate to RAM of only 16 bytes per clock cycle, whereas the Ryzen 1800X, 2700X, 3900X, 3950X, 5900X and 5950X have a write rate of 32 bytes per clock cycle. So a Ryzen 3000-5000 CPU with 5-6 cores is a really bad choice for CFD (and for LS-Dyna, Code_Aster and similar FEM software using MPI).
But the speedup of the Ryzen 5000 series lies in its good IPC. Because of their 2x IPC, the 3900X and 5900X achieve the same benchmark results as the Intel 7980XE and 7960XE. You can find the results at https://openbenchmarking.org/test/pts/openfoam (see the 30M model; this model does fit into cache).
Second, cheap Ryzen systems are sensitive to the number of DIMMs in the system. 4 DIMMs may be better than 2 DIMMs; I don't know exactly.
There are also other OpenFOAM results for the 3900X at https://firepear.net/grid/ryzen3900/ , where the 3900X is two times faster than the 2700X, although the frequencies and thread counts are the same.


All measured AIDA results, including the RAM speed of most Ryzens, can be seen here:

https://www.hardwareluxx.ru/index.ph...n-9-3900x.html
https://www.hardwareluxx.ru/index.ph...3.html?start=3
https://www.hardwareluxx.ru/index.ph...3.html?start=6
https://www.hardwareluxx.ru/index.ph...n-9-3900x.html
https://3dnews.ru/1024662/obzor-prot...itekture-zen-3

Last edited by AlexKaz; June 7, 2021 at 10:45.

Old   June 9, 2021, 05:58
Default
  #404
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
That much is true: since Zen 2, the memory write bandwidth per chiplet is only half of the read bandwidth. More precisely, what is halved is the write bandwidth of the link between the compute die and the I/O die, which then handles memory access. AMD did that for two reasons:
1) it saves power
2) it doesn't affect the vast majority of workloads. And that includes CFD in general

You could theoretically write a CFD code that has similar memory read and write bandwidth requirements. And there might be some carefully optimised research codes that work like this. But for most codes out in the wild, reads are much more important than writes.
Thus the number of compute dies per CPU should not affect which CPU you buy. Ryzen CPUs with only one compute die are fine. So are Epyc CPUs with only 4 compute dies.
You get more L3 cache with more compute dies, but that's about it.

Quote:
Second, cheap Ryzen systems are sensitive to the number of DIMMs in the system. 4 DIMMs may be better than 2 DIMMs
It's not the number of DIMMs that matters, but the number of ranks per memory channel. You want two ranks per channel. That can be achieved either with one dual-rank DIMM or with two single-rank DIMMs.
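The rank rule can be written down as a small check. This is only a sketch: the helper names and the example DIMM layouts are hypothetical illustrations and are not taken from any tool or from the benchmark itself.

```python
def ranks_per_channel(dimm_ranks_by_channel):
    """Sum the ranks of all DIMMs on each memory channel.

    dimm_ranks_by_channel: one list per channel, holding the rank count
    (1 = single-rank, 2 = dual-rank) of every DIMM on that channel.
    """
    return [sum(dimms) for dimms in dimm_ranks_by_channel]


def has_two_ranks_everywhere(dimm_ranks_by_channel):
    """True if every channel ends up with exactly two ranks."""
    return all(r == 2 for r in ranks_per_channel(dimm_ranks_by_channel))


# Hypothetical dual-channel layouts (assumed for illustration only).
layouts = {
    "2 dual-rank DIMMs": [[2], [2]],
    "4 single-rank DIMMs": [[1, 1], [1, 1]],
    "2 single-rank DIMMs": [[1], [1]],
}

for name, layout in layouts.items():
    print(name, ranks_per_channel(layout), has_two_ranks_everywhere(layout))
```

The first two layouts reach two ranks per channel with different DIMM counts, which is why counting DIMMs alone is misleading.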

Old   June 13, 2021, 18:00
Default Some more legacy hardware...
  #405
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Hi all, HP DL560 Gen8 with 2x E5-2690v2, 16x16Gb 1866 DDR3. Ubuntu 20.04, OF 7 from .org Ubuntu repository.

Code:
threads	mesh	speedup	sim	speedup	it/s
1.00	1708.00	1.00	934.50	1.00	0.11
2.00	1141.00	1.50	496.60	1.88	0.20
4.00	658.00	2.60	225.90	4.14	0.44
6.00	460.00	3.71	160.40	5.83	0.62
8.00	383.00	4.46	131.80	7.09	0.76
12.00	301.00	5.67	105.10	8.89	0.95
16.00	261.00	6.54	94.20	9.92	1.06
20.00	235.00	7.27	90.20	10.36	1.11
Great value for money - cost me < EUR700 complete. Can't wait to test with 4x E5-4627v2.
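For anyone who wants to reproduce the derived columns, here is a minimal Python sketch. It assumes that speedup is the single-thread time divided by the n-thread time, and that it/s corresponds to 100 iterations per run; the 100 is an assumption inferred from the numbers in the table, not something stated in the post.

```python
# Solver ("sim") wall times in seconds from the table, keyed by thread count.
sim_time = {1: 934.50, 2: 496.60, 4: 225.90, 6: 160.40, 8: 131.80,
            12: 105.10, 16: 94.20, 20: 90.20}

ITERATIONS = 100  # assumption: it/s in the table equals 100 / sim time


def speedup(times, n):
    """Single-thread time divided by the time on n threads."""
    return times[1] / times[n]


def efficiency(times, n):
    """Parallel efficiency: speedup divided by the thread count."""
    return speedup(times, n) / n


for n, t in sim_time.items():
    print(f"{n:2d} threads: speedup {speedup(sim_time, n):5.2f}, "
          f"efficiency {efficiency(sim_time, n):4.0%}, "
          f"{ITERATIONS / t:4.2f} it/s")
```

The efficiency column is not in the table; it is added here only because it makes the flattening beyond 12 threads easier to see.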
wkernkamp likes this.

Old   June 22, 2021, 09:02
Default HP Dl560 Gen 8, 2x 2690 V2, 256Gb 1866 - VM results
  #406
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Hi all,

in addition to the above results on bare metal, here are some results from running in VMs of different flavours on TrueNAS Scale 21.04. KVM and Docker are self-explanatory, and since TrueNAS Scale is Debian-based, the openfoam.org OpenFOAM also installs natively on it without trouble.

Code:
Env	Mesh	Sim	It/s
KVM	362,00	222,15	0,45
Docker	246,00	94,48	1,06
Native	230,10	94,75	1,06
From previous experience with Proxmox on a DL380 G8, I'm a little surprised that KVM performance on this platform is so much worse than native: on the 380 there was a 5-10% penalty when using a KVM VM, whereas on the 560 it seems to halve performance. To rule out any losses due to TrueNAS, I tried the same with Proxmox on the DL560 and got comparable results (within a couple of percent of TrueNAS Scale): KVM was abysmal, while LXC (container) was almost identical to bare metal.
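To put a number on the penalty, the solver times from the table can be compared directly. This is a small sketch; the helper names are made up for illustration, and the only real inputs are the three Sim values, which use a decimal comma.

```python
def parse_decimal_comma(text):
    """Convert a number written with a decimal comma, e.g. '222,15'."""
    return float(text.replace(",", "."))


# Solver ("Sim") times in seconds, copied from the table.
sim = {env: parse_decimal_comma(t)
       for env, t in [("KVM", "222,15"), ("Docker", "94,48"), ("Native", "94,75")]}


def slowdown(env, baseline="Native"):
    """Ratio of wall times; 1.0 means no penalty relative to the baseline."""
    return sim[env] / sim[baseline]


for env in sim:
    print(f"{env:7s} {slowdown(env):.2f}x native solver time")
```

By this measure the KVM run takes somewhat more than twice as long as native, while Docker is within measurement noise.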

Does anybody have an idea why the 560 might suffer so much from the KVM/Qemu virtualization? Does it have to do with having 4 sockets (currently I only have 2 populated)?

Cheers,

Kai.

Old   July 8, 2021, 03:25
Default
  #407
Member
 
Yan
Join Date: Dec 2013
Location: Milano
Posts: 43
Rep Power: 12
aparangement is on a distinguished road
10920x, ddr4 3733 (16-18-18-38) 16g*4, ofv8-commit-9603c.

Code:
# cores   Wall time (s):
------------------------
1 760.04
2 401.79
3 255.95
4 202.53
6 142.74
8 122.8
12 105.17
16 116.53
The single-thread wall time varied between 750 and 760 s, but the 12-thread results were quite stable.

surfaceFeatureExtractDict has to be changed into surfacesFeatureDict.

The streamline and meshQualityDict settings were modified according to the following posts:

Quote:
Originally Posted by eric View Post
Code:
#include streamlines
#include wallBoundedStreamlines
You should also delete all the run_* folders before rerunning the run.sh script.
Quote:
Originally Posted by EMurphy View Post

Line 21 in meshQualityDict must be changed from

#includeEtc "caseDicts/meshQualityDict"

to

#includeEtc "caseDicts/mesh/generation/meshQualityDict"
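The include-path change can also be applied as a plain text substitution. The sketch below works on a string only; the example excerpt is hypothetical, and reading and writing the actual system/meshQualityDict file is left out on purpose.

```python
OLD_INCLUDE = '#includeEtc "caseDicts/meshQualityDict"'
NEW_INCLUDE = '#includeEtc "caseDicts/mesh/generation/meshQualityDict"'


def patch_mesh_quality_dict(text):
    """Return the dictionary text with the old include path replaced."""
    return text.replace(OLD_INCLUDE, NEW_INCLUDE)


# Hypothetical excerpt; a real file would be read from the case directory.
example = '// excerpt of meshQualityDict\n#includeEtc "caseDicts/meshQualityDict"\n'
print(patch_mesh_quality_dict(example))
```

Running the function a second time changes nothing, because the new path does not contain the old one as a substring.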
Crowdion likes this.

Old   July 10, 2021, 09:23
Default
  #408
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Hi all,

I tried to run this with openfoam.org OF8 as well, but I am running into issues:

Quote:
surfaceFeatureExtractDict has to be changed into surfacesFeatureDict.
Should this not be "surfaceFeaturesDict"?

Even when doing that, when I run surfaceFeatures it barfs;
Code:
Reading surfaceFeaturesDict



--> FOAM FATAL IO ERROR:
keyword surfaces is undefined in dictionary "/truenas/data/preserve/OF_bench/OF8/run_20/system/surfaceFeaturesDict/motorBike.obj"

file: /truenas/data/preserve/OF_bench/OF8/run_20/system/surfaceFeaturesDict/motorBike.obj from line 20 to line 44.

    From function const Foam::entry& Foam::dictionary::lookupEntry(const Foam::word&, bool, bool) const
    in file db/dictionary/dictionary.C at line 797.

FOAM exiting
What am I missing?

TIA

Kai.

Old   July 10, 2021, 09:39
Default
  #409
Member
 
Yan
Join Date: Dec 2013
Location: Milano
Posts: 43
Rep Power: 12
aparangement is on a distinguished road
It seems that OpenFOAM 8 changed the handling of the surface*Dict files.

I modified the original surfaceFeatureExtractDict according to the example in $FOAM_ETC. I didn't compare the extraction results between versions, but I think they should be equivalent.


Quote:
Originally Posted by Kailee71 View Post
Hi all,

I tried to run this with openfoam.org OF8 as well, but I am running into issues:



Should this not be "surfaceFeaturesDict"?

Even when doing that, when I run surfaceFeatures it barfs;
Code:
Reading surfaceFeaturesDict



--> FOAM FATAL IO ERROR:
keyword surfaces is undefined in dictionary "/truenas/data/preserve/OF_bench/OF8/run_20/system/surfaceFeaturesDict/motorBike.obj"

file: /truenas/data/preserve/OF_bench/OF8/run_20/system/surfaceFeaturesDict/motorBike.obj from line 20 to line 44.

    From function const Foam::entry& Foam::dictionary::lookupEntry(const Foam::word&, bool, bool) const
    in file db/dictionary/dictionary.C at line 797.

FOAM exiting
What am I missing?

TIA

Kai.
Attached Files
File Type: zip surfaceFeaturesDict-modified4ofv8_2.zip (3.7 KB, 13 views)
Kailee71 likes this.

Old   July 12, 2021, 04:28
Default
  #410
Member
 
Yan
Join Date: Dec 2013
Location: Milano
Posts: 43
Rep Power: 12
aparangement is on a distinguished road
2x EPYC 7532, NUMA NPS4, DDR4-3200 16x16 GB 2R, Ubuntu 18.04, ofv8 commit-30b264 (compiled normally without any special gcc tuning):

Code:
# cores   Wall time (s):
------------------------
1 730.97
2 342.93
4 171.78
8 81.72
16 41.99
24 29.84
32 23
48 20.04
64 18.4
Quite unexpectedly, even the single-thread (ST) time is lower than on the Intel 10920X.
However, neither the ST nor the multi-thread (MT) results match those of the other two 7532 systems posted here, e.g. Novel's. I hope this is due to the lack of gcc tuning and the room temperature (around 28 °C during my testing).



Quote:
Originally Posted by Novel View Post
cores time (s) speedup
1 677,34 1,00
2 363,04 1,87
4 161,42 4,20
6 101,82 6,65
8 77,16 8,78
12 52,28 12,96
16 39,4 17,19
20 32,01 21,16
24 27,31 24,80
28 24,15 28,05
32 21,53 31,46
36 21,32 31,77
40 20,46 33,11
44 18,99 35,67
48 18,12 37,38
52 17,45 38,82
56 17,06 39,70
60 16,5 41,05
64 15,91 42,57
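A quick way to quantify the gap to Novel's system is to compare the wall times at the core counts both posts report. This is only a sketch: the variable and function names are illustrative, and Novel's values are kept with their decimal commas and converted on the fly.

```python
# Wall times (s) at the core counts that appear in both tables.
mine = {1: 730.97, 2: 342.93, 4: 171.78, 8: 81.72, 16: 41.99,
        24: 29.84, 32: 23.0, 48: 20.04, 64: 18.4}
novel_raw = {1: "677,34", 2: "363,04", 4: "161,42", 8: "77,16", 16: "39,4",
             24: "27,31", 32: "21,53", 48: "18,12", 64: "15,91"}
novel = {n: float(t.replace(",", ".")) for n, t in novel_raw.items()}


def relative_gap(n):
    """Extra wall time of this system relative to Novel's, as a fraction."""
    return mine[n] / novel[n] - 1.0


for n in mine:
    print(f"{n:2d} cores: {relative_gap(n):+.1%}")
```

A positive value means this system is slower at that core count; a negative value means it is faster.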

Old   August 12, 2021, 06:41
Default Intel 6338*2
  #411
Member
 
Yan
Join Date: Dec 2013
Location: Milano
Posts: 43
Rep Power: 12
aparangement is on a distinguished road
2x Intel Xeon Gold 6338, NUMA SNC2 (very similar results with SNC1), DDR4-3200 16x16 GB 2R, Ubuntu 18.04, ofv8 commit-30b264 (compiled normally without any special gcc tuning):

Code:
# cores   Wall time (s):
------------------------
1 732.86
2 358.24
4 171.29
8 91.2
16 53.6
24 42.98
32 36.11
48 30.44
64 27.37
I am a little disappointed, as this is much worse than EPYC 7002, although according to AnandTech, Ice Lake-SP has a huge advantage in memory bandwidth.

The clock speed of the 6338 is lower than that of the 7532: under full load it is only ~2.6 GHz (vs. 3.2 GHz for the 7532). However, for parallel CFD I guess this should not be the bottleneck, especially when using all 64 cores.

Another reason could be that Ubuntu 18.04 with kernel 5.4 is too old for Ice Lake-SP, but I ran into difficulties installing ParaView and the ASPEED graphics drivers on 20.04, so I gave up and went back to 18.04.

Quote:
Originally Posted by aparangement View Post
7532*2

Code:
# cores   Wall time (s):
------------------------
1 730.97
2 342.93
4 171.78
8 81.72
16 41.99
24 29.84
32 23
48 20.04
64 18.4
HTML Code:
https://images.anandtech.com/doci/16594/STREAM-8380.png
flotus1 and Crowdion like this.

Old   August 14, 2021, 14:30
Default
  #412
New Member
 
Alexander Kazantcev
Join Date: Sep 2019
Posts: 24
Rep Power: 7
AlexKaz is on a distinguished road
Xeon Silver 4314 Ice Lake-SP Scalable 3rd gen 10nm 2900MHz all cores, RAM 2666 8 dimms 8 channels, NUMA on, HT on (with off will be the same), bios power profile "Power (save)"

OpenFOAM v1806, openmpi 2.1.3, Puppy Linux Fossa mitigations = off (with on result ~ the same)

# cores   flow (s)   mesh
------------------------
1 649.63 16m40.79s
2 369.84 12m3.332s
4 204 7m5.208s
6 148.87 5m17.678s
8 123.86 4m23.601s
12 99.57 3m41.038s
16 85.1 3m18.140s
20 94.18 4m20.113s
24 89.79 3m37.573s
28 86.99 3m49.439s
32 84.81 3m50.380s
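The mesh times are given in minutes and seconds, so they need converting before they can be compared with meshing times from other posts. The following is a minimal sketch; the `to_seconds` helper is hypothetical, and the trailing `s` is treated as optional.

```python
import re

# Optional "<minutes>m" prefix, then seconds, then an optional trailing "s".
_MIN_SEC = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s?$")


def to_seconds(text):
    """Convert strings such as '16m40.79s' to seconds as a float."""
    match = _MIN_SEC.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return 60 * minutes + float(match.group(2))


# A few mesh times from the table, keyed by core count.
mesh = {1: "16m40.79s", 16: "3m18.140s", 32: "3m50.380s"}
for cores, raw in mesh.items():
    print(f"{cores:2d} cores: {to_seconds(raw):8.2f} s")
```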
flotus1 likes this.

Old   August 16, 2021, 23:57
Default Dual AMD EPYC 7313 in 4 channel mode
  #413
New Member
 
Dmitry
Join Date: Feb 2013
Posts: 29
Rep Power: 13
techtuner is on a distinguished road
Dual AMD EPYC 7313 (2x16 cores), 8 x 32 GB DDR4-3200 ECC Micron MTA18ASF4G72PDZ-3G2E1 (in 4-channel mode, NUMA NPS4). Gigabyte MZ72-HB0 (rev. 3.0) motherboard, dual Noctua NH-U14S TR3-SP3 coolers. OpenFOAM 2106, openSUSE Leap 15.3.


# cores - Solver Wall time (s):
------------------------------------------
1 - 731.2
2 - 444.28
4 - 199.06
6 - 126.95
8 - 96.3
12 - 69.26
16 - 56.43
20 - 49.85
24 - 47.59
32 - 51.07


On the same system, openfoam 2106, mingw version, Windows Server 2019.

# cores - Solver Wall time (s):
------------------------------------------
16 - 127.2
32 - 66.8

Old   October 7, 2021, 03:03
Default
  #414
New Member
 
Alexander Kazantcev
Join Date: Sep 2019
Posts: 24
Rep Power: 7
AlexKaz is on a distinguished road
Quote:
Originally Posted by AlexKaz View Post
Hi all!
Thanks for the topic and public results of benchmark.

a Ryzen 3000-5000 CPU with 5-6 cores is a really bad choice for CFD (and for LS-Dyna, Code_Aster and similar FEM software using MPI).
But the speedup of the Ryzen 5000 series lies in its good IPC.
All is fine with LS-DYNA. I recently bought a 5600X for LS-DYNA to replace my Ryzen 1800X. In the MPI version, this CPU with 5 threads is twice as fast as the 1800X with 8 threads, and about 30% faster than the 3800X. There is also no speed difference in LS-DYNA between the dual-channel 5600X and a new single Xeon 4314 with 8 channels at a problem size of about 0.1M DOFs. After some testing, I found little to no advantage for DYNA between one dual-rank DIMM, two dual-rank DIMMs in dual channel, four DIMMs in all slots, or two single-rank DIMMs.
But in OpenFOAM the 3800X is faster than the 5600X.
I think the 5900X will be up to 2 times faster because of the full speed of the CPU-memory data bus. But for now I see no reason to buy a 5900X. The 5600X stays cool even after overclocking to 4850 MHz, which is why the workstation is quiet. Terrific!

Last edited by AlexKaz; October 7, 2021 at 08:31.

Old   October 7, 2021, 03:35
Default
  #415
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
I doubt the "2x faster" hypothesis for a Ryzen 5900X.
What you get compared to a 5600X is twice the amount of L3 cache (which is good) and twice the WRITE memory bandwidth, which is not as awesome as it sounds. Reads are what matter, and they are basically the same with one or two CCDs on Zen 3 Ryzen.

Old   October 11, 2021, 22:02
Default
  #416
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Ok - I finally got my 4x 4627v2's... Box is the same as in June with the 2690v2's (DL560G8, 16x16Gb DDR1866, data remote via NFS over 10GbE).
Code:
Threads	Tmesh	Tsim	It/s	Wmesh	Wsim	kWh
1	1705	1038.75	0.10	204	212	0.061
2	1164	539.05	0.19	211	221	0.033
4	639	215.99	0.46	314	339	0.020
8	359	114.26	0.88	341	388	0.012
16	239	67.06	1.49	421	507	0.009
24	208	54.18	1.85	494	607	0.009
32	219	49.56	2.02	570	710	0.010
Just for comparison, here were results with the 2690's:
Code:
Threads	Tmesh	Tsim	It/s	Wmesh	Wsim	kWh
20	231	89.36	1.12	326	399	0.010
All for just under EUR900; not bad bang/buck.
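For reference, the It/s and kWh columns appear to follow from Tsim and Wsim alone. The sketch below assumes 100 iterations per run and that the kWh value covers only the solver phase at the average power Wsim; both assumptions are inferred from the numbers in the tables rather than stated explicitly.

```python
ITERATIONS = 100  # assumption: matches the It/s column (100 / Tsim)


def energy_kwh(power_w, seconds):
    """Energy in kWh for a run at constant average power."""
    return power_w * seconds / 3.6e6  # 1 kWh = 3.6e6 J


def iterations_per_second(seconds, iterations=ITERATIONS):
    """Solver throughput for a run of the given wall time."""
    return iterations / seconds


# (threads, Tsim in s, Wsim in W) from the 4x E5-4627v2 table.
runs = [(1, 1038.75, 212), (8, 114.26, 388), (16, 67.06, 507), (32, 49.56, 710)]
for threads, t_sim, w_sim in runs:
    print(f"{threads:2d} threads: {iterations_per_second(t_sim):.2f} it/s, "
          f"{energy_kwh(w_sim, t_sim):.3f} kWh")
```

Power draw rises with thread count, but run time falls faster up to about 16 to 24 threads, which is where energy per run bottoms out.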

[EDIT] Added it/s and kWh for efficiency comparison[/EDIT]
wkernkamp likes this.

Last edited by Kailee71; October 12, 2021 at 07:24.

Old   October 28, 2021, 07:37
Unhappy
  #417
Member
 
Yan
Join Date: Dec 2013
Location: Milano
Posts: 43
Rep Power: 12
aparangement is on a distinguished road
That's solid performance already, considering the bandwidth limits.

Would you mind providing more details on the OF installation? By native ARM64, do you mean compiling the source code directly on macOS, or in an ARM64 Linux virtual machine?

I am very curious, since the new M1 Max has a much higher memory bandwidth, and the new MacBook, although quite expensive, is still much cheaper than a 2-way workstation. But I have no idea how much work is involved in porting the code.


Quote:
Originally Posted by xuegy View Post
Apple M1 Mac mini 16GB 4 big cores @ 3.2GHz
OF-v2012 compiled in native ARM64(still buggy, but I managed to run this benchmark). No SIMD optimization yet. No GPU acceleration yet.
# cores Wall time (s):
------------------------
1 469.16
2 291.02
3 228.07
4 190.39*
Crashed at t=100s for some reason, so I used 99s time = 188.49*100/99

So seems like M1 single-core outperformed all x86 PCs. But the MPI scaling is really bad. Not sure if it's M1 issue or openmpi issue.

Old   October 28, 2021, 15:47
Default
  #418
Senior Member
 
Join Date: Jun 2016
Posts: 102
Rep Power: 10
xuegy is on a distinguished road
Quote:
Originally Posted by aparangement View Post
That's solid performance already, considering the bandwidth limits.

Would you mind providing more details on the OF installation? By native ARM64, do you mean compiling the source code directly on macOS, or in an ARM64 Linux virtual machine?

I am very curious, since the new M1 Max has a much higher memory bandwidth, and the new MacBook, although quite expensive, is still much cheaper than a 2-way workstation. But I have no idea how much work is involved in porting the code.
Here's my patch to compile OF-v2106 on M1:
https://github.com/BrushXue/OpenFOAM-AppleM1

Old   October 28, 2021, 23:49
Default
  #419
Member
 
Yan
Join Date: Dec 2013
Location: Milano
Posts: 43
Rep Power: 12
aparangement is on a distinguished road
Thanks a lot, xuegy.
Although I have literally zero experience with macOS...

If I manage to get my hands on an M1 Max and find a solution, I will post the benchmark results, which I am also looking for myself.

Quote:
Originally Posted by xuegy View Post
Here's my patch to compile OF-v2106 on M1:
https://github.com/BrushXue/OpenFOAM-AppleM1

Old   October 28, 2021, 23:59
Default
  #420
Senior Member
 
Join Date: Jun 2016
Posts: 102
Rep Power: 10
xuegy is on a distinguished road
I can’t wait to see the rumored dual M1 Max Mac Pro next year. Given the memory bandwidth per USD, Apple is not expensive at all.

All times are GMT -4.