February 5, 2018, 07:10
OpenFOAM benchmarks on various hardware

#1
Member
Knut Erik T. Giljarhus
Join Date: Mar 2009
Location: Norway
Posts: 35
Rep Power: 22
** Update 2: I have created a page on the OpenFOAM wiki: https://openfoamwiki.net/index.php/Benchmarks . The updated plot can now be found there, since I will eventually no longer be able to edit this post. But please continue to contribute further benchmarks in this thread! **
** Update: I have now added a plot with the minimum time to solution for all hardware posted in this thread! I will try to keep it updated as more results are posted. Thank you for all the contributions! **

Hi, I promised in another thread here to run some OpenFOAM benchmarks on the different Intel hardware I have available, so here they are. They are based on the motorBike benchmark, but I modified it to have more grid cells, to run fewer iterations, and to use scotch decomposition. You can find the full setup in the attached tar.gz file. If you want to test on your hardware, you only need to run the run.sh script (to run on a different number of cores, change the core counts in the three for loops). It would be interesting if more people could contribute, so that we can build a modest database of benchmarks here. The table below shows the runtime in seconds; there is also a graph that shows the speedup. Some observations, most of them relatively obvious:
Code:
# cores   Gold 6148   8x E7-8870   2x E5-2695 v2   2x E5-2643 v3   2x E5-2695 v4
1         874         2132         1451            883             1084
2         435         1124         597             468             578
4         225         476          281             215             273
6         164         297          205             153             189
8         136         203          178             126             146
12        111         148          150             101             104
16        101         104          140             -               85
20        98          92           137             -               76
24        -           77           137             -               71
36        -           64           -               -               65

---MODERATOR NOTE---
The original bench template requires some tweaks in order to work with more recent versions of OpenFOAM. For the openfoam.org versions, try using bench_template_v02 instead, courtesy of Simbelmynė. For the openfoam.com versions (e.g. v2112), this script should work out of the box: OpenFOAM benchmarks on various hardware
bench_template_v02.zip
Newer performance charts with many more entries, provided by naffrancois:
Maximum performance: https://ibb.co/MsQh94V
Single-core performance: https://ibb.co/GVnbYP5
MS Excel file with the numbers: bdd_cpu_cfdonline.xlsx
Last edited by flotus1; November 13, 2022 at 16:52. Reason: bench template outdated
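As a rough sketch of the "minimum time to solution" metric mentioned in the update above, the Python snippet below takes the wall times from the table (core count mapped to seconds, with untested core counts simply left out) and picks the fastest run for each machine. The dictionary layout and the function name are illustrative assumptions, not part of the benchmark's run.sh.

```python
# Wall times in seconds from the first table: {hardware: {cores: time}}.
# Core counts that were not run are simply left out.
RESULTS = {
    "Gold 6148": {1: 874, 2: 435, 4: 225, 6: 164, 8: 136, 12: 111, 16: 101, 20: 98},
    "8x E7-8870": {1: 2132, 2: 1124, 4: 476, 6: 297, 8: 203, 12: 148,
                   16: 104, 20: 92, 24: 77, 36: 64},
    "2x E5-2695 v2": {1: 1451, 2: 597, 4: 281, 6: 205, 8: 178, 12: 150,
                      16: 140, 20: 137, 24: 137},
    "2x E5-2643 v3": {1: 883, 2: 468, 4: 215, 6: 153, 8: 126, 12: 101},
    "2x E5-2695 v4": {1: 1084, 2: 578, 4: 273, 6: 189, 8: 146, 12: 104,
                      16: 85, 20: 76, 24: 71, 36: 65},
}


def min_time_to_solution(times):
    """Return (cores, seconds) for the fastest run; ties go to the fewest cores."""
    cores = min(times, key=lambda n: (times[n], n))
    return cores, times[cores]


if __name__ == "__main__":
    # Rank the machines by their best wall time, fastest first.
    ranking = sorted(RESULTS, key=lambda hw: min_time_to_solution(RESULTS[hw])[1])
    for hw in ranking:
        cores, seconds = min_time_to_solution(RESULTS[hw])
        print(f"{hw:15s} {seconds:5d} s on {cores} cores")
```

With these numbers the 8x E7-8870 (64 s on 36 cores) and the 2x E5-2695 v4 (65 s on 36 cores) come out on top.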

February 7, 2018, 03:29

#2
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Edit: now with a modified controlDict to get proper results.
mpirun thoroughly disliked my attempts to pin processes to certain cores, resulting in abysmal performance for most cases, so these results are just with plain mpirun -np N.
2x AMD Epyc 7301, 16x16GB 2Rx4 DDR4-2133 reg ECC, of_v1712, Opensuse Tumbleweed, kernel 4.14.14-1
Code:
# cores   Wall time (s)   Speedup
---------------------------------
01        1016.6          1.0
02        480.5           2.1
04        231.9           4.4
08        125.4           8.1
12        79.9            12.7
16        66.4            15.3
20        60.5            16.8
24        52.0            19.6
28        49.1            20.7
32        42.6            23.9
Last edited by flotus1; February 8, 2018 at 05:52.
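The speedup column above is simply the single-core wall time divided by the wall time on N cores. A minimal Python sketch that reproduces it from the Epyc 7301 numbers (the variable and function names are illustrative):

```python
# 2x Epyc 7301 wall times in seconds, keyed by core count.
EPYC_7301 = {1: 1016.6, 2: 480.5, 4: 231.9, 8: 125.4, 12: 79.9,
             16: 66.4, 20: 60.5, 24: 52.0, 28: 49.1, 32: 42.6}


def speedup(times):
    """Speedup S(N) = T(1) / T(N) for every core count N in the table."""
    t1 = times[1]
    return {n: t1 / t for n, t in sorted(times.items())}


if __name__ == "__main__":
    for n, s in speedup(EPYC_7301).items():
        print(f"{n:2d} cores: {s:5.1f}")
```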

February 7, 2018, 06:25

#3
Member
Knut Erik T. Giljarhus
Join Date: Mar 2009
Location: Norway
Posts: 35
Rep Power: 22
Thanks for running this, flotus. The error happens at the very end of the simulation, so it shouldn't affect the timings by much. If you still want to fix it, see below. Impressive performance; it scales a lot better than the Intel machines. It would be nice to also have a dual-socket Gold machine to compare against.
The error happens when trying to calculate streamlines at the end of the simulation. It looks like this is due to version differences: I see you are using v1712, while I use 5.x. The easiest fix is to disable the streamline calculation. Just open the file basecase/system/controlDict and remove the lines
Code:
#include streamlines
#include wallBoundedStreamlines
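If you would rather script this edit (for example when preparing several copies of the case), a small Python sketch like the one below drops every `#include` line that mentions streamlines, regardless of capitalization or quoting. It assumes the file lives at basecase/system/controlDict as described above; `strip_streamline_includes` is a hypothetical helper, not part of OpenFOAM.

```python
from pathlib import Path


def strip_streamline_includes(text):
    """Remove '#include' lines that pull in the streamline function objects.

    Matching is case-insensitive, so both 'streamlines' and
    'wallBoundedStreamlines' (quoted or not) are dropped; all other lines are kept.
    """
    kept = []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("#include") and "streamlines" in stripped.lower():
            continue  # skip this include
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    # Hypothetical usage: rewrite the benchmark's controlDict in place, if present.
    control_dict = Path("basecase/system/controlDict")
    if control_dict.exists():
        control_dict.write_text(strip_streamline_includes(control_dict.read_text()))
```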

February 7, 2018, 06:34

#4
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 539
Rep Power: 20
Quote:
Code:
# cores   Wall time (s)
-----------------------
1         1041.62
2         595
4         257
8         130
12        85
16        62
24        55
36        44
Only the single-core performance is a bit low :-(

February 7, 2018, 06:47

#5
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 539
Rep Power: 20
Code:
# cores   i7-2600   i7-3960X   E5 1650 V3
1         1085      794        824
2         727       433        440
4         -         253        258
6         -         212        214
It is very interesting to see that the 3960X is the fastest processor for 1 or 2 core calculations.

February 7, 2018, 12:20

#6
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Thanks for your input. I will run the case again tonight and edit the results, and maybe throw in some better core-binding options. mpirun and Linux are not fully aware of which cores form a NUMA node.
Until then, here are results from the machine I used to test your suggestion: single Xeon W3670 (6 cores) with triple-channel DDR3-1333, of_v1712, Opensuse Leap 42.2, kernel 4.4.104-39
Code:
# cores   Wall time (s)
-----------------------
1         1262.5
2         849.8
4         649.6
6         622.7
The single-core result for Epyc was to be expected; it only reaches a single-core turbo of 2.7 GHz. As I already stated in my initial review, AMD missed the spot for medium-core-count CPUs with higher clock speeds. A 16-core variant with >=3.5 GHz, or at least a higher single-core turbo, would have been no problem from a TDP perspective. Forcing you to buy the most expensive SKU with lots of useless cores to get at least 3.2 GHz single-core is what Intel would do.
Edit: AMD Epyc results are now edited into the second post. Since there were no results for Xeon E5 "v1" yet: dual Xeon E5-2687W, 16x8GB DDR3-1600, of_v1712, Opensuse Leap 42.3, kernel 4.4.103-36
Code:
# cores   Wall time (s)
-----------------------
01        898.8
02        502.1
04        235.1
06        169.7
08        141.6
10        128.4
12        119.3
14        116.3
16        112.6
Last edited by flotus1; February 8, 2018 at 05:22.

February 9, 2018, 17:04

#7
Senior Member
Join Date: May 2012
Posts: 552
Rep Power: 16
@eric
While the speedup from added cores is interesting, I think the speedup relative to other hardware is also of great interest. Since that data is present in this thread, perhaps you could also compile and maintain a plot in the first post (if the thread continues to grow, that is)? I guess the metric would be the lowest possible solution time on a given hardware, possibly normalized against some system of choice. I'll join in with a 1950X, 7940X and 8700K soon, so you get some comparison for lower-budget systems.
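One way to express the metric suggested here: take each machine's lowest solution time and divide a chosen reference machine's time by it, so values above 1 mean faster than the reference. The sketch below uses minimum times reported earlier in this thread (Gold 6148 98 s, 2x E5-2695 v4 65 s, 2x Epyc 7301 42.6 s); picking the Gold 6148 as the reference is an arbitrary choice, and the names are illustrative.

```python
# Lowest wall time (s) reported for each system earlier in the thread.
MIN_TIMES = {
    "Gold 6148": 98.0,
    "2x E5-2695 v4": 65.0,
    "2x Epyc 7301": 42.6,
}


def relative_performance(min_times, reference):
    """Performance relative to `reference`: T_ref / T_system (higher is faster)."""
    t_ref = min_times[reference]
    return {name: t_ref / t for name, t in min_times.items()}


if __name__ == "__main__":
    rel = relative_performance(MIN_TIMES, reference="Gold 6148")
    for name, value in sorted(rel.items(), key=lambda item: -item[1]):
        print(f"{name:15s} {value:4.2f}x")
```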

February 9, 2018, 19:16

#8
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
The problem with that is that you can no longer edit older posts after a few weeks. Maintaining a thread like this becomes impossible. This restriction kept me from starting one or two related threads in the past.
|
|
February 10, 2018, 03:22

#9
Senior Member
Join Date: May 2012
Posts: 552
Rep Power: 16
That's strange. A thread like this definitely has the possibility to be made "sticky".
Oh, well, browsing to the last post only requires one extra mouse click.
7940X, 32 (4x8) GB 3200 MHz RAM, CentOS 7.x, kernel 3.10.0
Code:
# cores   Wall time (s)
-----------------------
1         764.36
2         419.98
4         233.26
6         188.29
8         169
12        160.28
14        168.73
Code:
# cores   Wall time (s)
-----------------------
1         827.21
2         465.01
4         235.17
6         198.81
8         170.73
12        154.26
16        154.9
Code:
# cores   Wall time (s)
-----------------------
1         531.44
2         312.15
4         249.55
6         247.83
For the 8700K system we have:
Code:
# cores   Real time
-------------------
1         16m35s
2         10m56s
4         07m01s
6         05m30s
Code:
# cores   Real time
-------------------
1         23m32s
2         16m01s
4         08m44s
6         06m50s
8         05m48s
12        04m38s
16        04m12s
Last edited by Simbelmynė; February 10, 2018 at 06:26.
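The real times above are given in minutes and seconds. To compare them with the wall-time tables, they can be converted to seconds; the Python sketch below assumes the stamps always follow the 'XmYs' pattern (as in 16m35s or 04m12s, optionally with fractional seconds) and is only an illustration.

```python
import re

# Optional "<minutes>m" prefix followed by "<seconds>s" (seconds may be fractional).
_STAMP = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def to_seconds(stamp):
    """Convert a 'XmYs' time stamp such as '16m35s' to seconds."""
    match = _STAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


if __name__ == "__main__":
    for stamp in ["16m35s", "10m56s", "07m01s", "05m30s"]:
        print(stamp, "->", to_seconds(stamp), "s")
```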

February 10, 2018, 06:22

#10
New Member
Join Date: Jan 2018
Posts: 7
Rep Power: 8
7820X @ 4.6 GHz, 4x8 GB 3400 MHz RAM, Ubuntu 17.10, kernel 4.13.0-32
Code:
# cores   Wall time (s)   Speedup
---------------------------------
1         756.42          1
2         376.09          2.0
4         205.46          3.7
6         168.24          4.5
8         160.05          4.7
Code:
# cores   Mesh time   Mesh time (s)
-----------------------------------
1         19m37s      1177
2         13m3s       783
4         7m23s       443
6         5m30s       330
8         5m8s        308
Last edited by The_Sle; February 10, 2018 at 09:18. Reason: Added meshing data

February 10, 2018, 06:26

#11
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
Greetings to all!
Quote:
Quote:
Let me know if you want this thread stickied and/or want me to start a wiki page for this!
Best regards,
Bruno
Last edited by wyldckat; February 13, 2018 at 17:53. Reason: see "edit:"

February 10, 2018, 06:28

#12
Senior Member
Join Date: May 2012
Posts: 552
Rep Power: 16
Quote:
That is really interesting. It seems that the 7940X is a terrible price/performance option compared to the 7820X (this was perhaps known, but not that the 7820X is actually as fast as the 7940X regardless of the number of cores being used). Is your system overclocked on all cores? Perhaps you have some other processes running that interfere with the simulation to some extent? Finally, I do not understand why your system is so slow on 1 core compared to the 8700K, which runs at 4.7 GHz on one core (and with slower memory). They should be quite similar.

February 10, 2018, 09:20

#13
New Member
Join Date: Jan 2018
Posts: 7
Rep Power: 8
Quote:
Yes, it's running at 4.6 GHz on all cores. I checked it with turbostat during the runs, and thermals are OK as well. I suppose the newer-generation 8700K is just that much faster in single-threaded workloads. That 8700K is really impressive, actually, and the difference between X299 and TR is surprisingly small!

February 13, 2018, 14:10

#14
Member
Knut Erik T. Giljarhus
Join Date: Mar 2009
Location: Norway
Posts: 35
Rep Power: 22
Thank you for all the contributions! I have made a new plot summarizing all the results, and asked Bruno to sticky the post so that I can keep updating it.
It is interesting to see the performance of the "enthusiast" i7 and Threadripper processors; they look like good choices for workstations used for testing/development and pre/post-processing.

February 13, 2018, 14:28

#15
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Now I feel kind of sorry for adding the Xeon W3670 and messing up the scaling in the diagram.
But seriously, I think the inverse (iterations per second) would be a better metric to compare in a diagram. Otherwise, the huge differences in performance at the top end become indistinguishable. On a side note: it would be helpful if new contributions gave more information about the actual setup, such as software versions and memory configuration, but more importantly clock speeds for overclockable processors.
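Plotting the inverse of the wall time spreads out the fast systems that otherwise get squeezed together at the bottom of a time axis. A minimal sketch follows; the iteration count is a placeholder (`N_ITERATIONS = 100` is an assumption, not necessarily the benchmark's setting), and since it scales every value by the same factor it does not change the comparison.

```python
# Hypothetical iteration count; substitute the number of iterations the case actually runs.
N_ITERATIONS = 100

# Best wall times (s) quoted earlier in the thread.
BEST_TIMES = {
    "Xeon W3670": 622.7,
    "2x E5-2695 v4": 65.0,
    "2x Epyc 7301": 42.6,
}


def iterations_per_second(wall_time_s, n_iterations=N_ITERATIONS):
    """Throughput metric: solver iterations completed per second of wall time."""
    return n_iterations / wall_time_s


if __name__ == "__main__":
    for name, t in BEST_TIMES.items():
        print(f"{name:15s} {t:7.1f} s  {iterations_per_second(t):6.3f} it/s")
```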

February 13, 2018, 17:36

#16
Member
Knut Erik T. Giljarhus
Join Date: Mar 2009
Location: Norway
Posts: 35
Rep Power: 22
Quote:
|

February 14, 2018, 03:34

#17
New Member
Håvard B. Refvik
Join Date: Jun 2015
Location: Norway
Posts: 17
Rep Power: 11
Thank you for starting this thread. I got my hands on a couple of Epyc 7601 processors this week, so I figured I'd run the same tests on them for comparison. I will post results for a dual Epyc 7351 when our server arrives in a couple of weeks, and for two dual Epyc 7351 machines once I've had the time to set them up with InfiniBand.
2x Epyc 7601, 16x 8GB DDR4 2666MHz, 1TB SSD, running OpenFOAM 5.0 on Ubuntu 16.04.
Code:
# cores   Wall time (s)   Speedup
---------------------------------
1         971.64          1
2         577.18          1.7
4         234.01          4.2
6         169.8           5.7
8         132.41          7.3
12        81.52           11.9
16        59.65           16.3
20        62.56           15.5
24        54.39           17.9
28        45.92           21.2
32        43.42           22.4
36        42.83           22.7
48        40.5            24.0
64        35              27.8
Last edited by havref; February 14, 2018 at 11:26. Reason: Added speedup
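Parallel efficiency, E(N) = T(1) / (N · T(N)), i.e. the speedup divided by the core count, shows more directly how well the extra cores are used. A short Python sketch using the Epyc 7601 numbers above (names are illustrative); values above 1.0, as at 16 cores here, indicate super-linear speedup.

```python
# 2x Epyc 7601 wall times in seconds, keyed by core count.
EPYC_7601 = {1: 971.64, 2: 577.18, 4: 234.01, 6: 169.8, 8: 132.41,
             12: 81.52, 16: 59.65, 20: 62.56, 24: 54.39, 28: 45.92,
             32: 43.42, 36: 42.83, 48: 40.5, 64: 35.0}


def parallel_efficiency(times):
    """E(N) = T(1) / (N * T(N)); 1.0 means perfect linear scaling."""
    t1 = times[1]
    return {n: t1 / (n * t) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for n, e in parallel_efficiency(EPYC_7601).items():
        print(f"{n:2d} cores: {e:5.2f}")
```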

February 14, 2018, 15:21

#18
Member
Knut Erik T. Giljarhus
Join Date: Mar 2009
Location: Norway
Posts: 35
Rep Power: 22
Nice, havref. Looking forward to seeing the 7351 results as well.
It's worth noting that at 64 cores there are only ~30 000 cells per core, so communication may start to become a bottleneck.
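The cells-per-core figure is just the total cell count divided by the number of cores. The sketch below assumes a mesh of roughly 2 million cells, which is what ~30 000 cells per core at 64 cores implies (the exact count depends on the meshing run); the 50 000-cell threshold in the usage example is an arbitrary illustration, not a recommended value.

```python
def cells_per_core(total_cells, n_cores):
    """Average number of cells each MPI rank owns after decomposition."""
    return total_cells / n_cores


def max_cores(total_cells, min_cells_per_core):
    """Largest core count that still leaves at least `min_cells_per_core` per rank."""
    return total_cells // min_cells_per_core


if __name__ == "__main__":
    TOTAL_CELLS = 2_000_000  # assumption, see above
    for n in (16, 32, 64):
        print(f"{n:2d} cores: {cells_per_core(TOTAL_CELLS, n):8.0f} cells/core")
    print("max cores at >= 50k cells/core:", max_cores(TOTAL_CELLS, 50_000))
```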

February 16, 2018, 16:26

#19
New Member
Chad
Join Date: Jan 2017
Posts: 8
Rep Power: 9
2x Intel Gold 5118, 12x 8GB DDR4 2400 MHz, M.2 SSD, OpenFOAM 4.1, Ubuntu 17.10, kernel 4.13.0-32
Code:
# cores   Wall time (s)
-----------------------
1         1083.38
2         558.41
4         254.74
8         131.22
16        80.48
20        73.1
24        79.35
While I'm still a novice when it comes to CFD, these results surprised me as being a bit slow. If anyone thinks I may have missed something, let me know and I'll gladly re-run them.

February 18, 2018, 05:26

#20
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
You could try running a newer version of OpenFOAM. And since it is mostly the parallel performance above 16 cores that seems a bit low, you could check whether the RAM is configured properly. Some of the Skylake-SP dual-socket motherboards have more than 12 DIMM slots, so populating the memory correctly is crucial here.
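To check a memory configuration against the platform, it helps to count channels. Skylake-SP Xeons have six DDR4 channels per socket, so a dual-socket board needs 12 identical DIMMs (or a multiple of 12) to populate every channel evenly, and the theoretical peak bandwidth is the transfer rate × 8 bytes × the number of channels. The helpers below are a simplified sketch of that arithmetic (they ignore ranks and DIMMs-per-channel speed rules); the function names are illustrative.

```python
CHANNELS_PER_SOCKET = 6  # Skylake-SP (Xeon Scalable): six DDR4 channels per socket
BYTES_PER_TRANSFER = 8   # 64-bit DDR4 channel


def evenly_populated(n_dimms, sockets, channels_per_socket=CHANNELS_PER_SOCKET):
    """True if every memory channel can get the same (non-zero) number of DIMMs."""
    total_channels = sockets * channels_per_socket
    return n_dimms >= total_channels and n_dimms % total_channels == 0


def peak_bandwidth_gbs(transfer_rate_mts, sockets, channels_per_socket=CHANNELS_PER_SOCKET):
    """Theoretical peak bandwidth in GB/s: MT/s x 8 bytes x number of channels."""
    channels = sockets * channels_per_socket
    return transfer_rate_mts * BYTES_PER_TRANSFER * channels / 1000


if __name__ == "__main__":
    # 2x Gold 5118 with 12x 8GB DDR4-2400, as reported in the previous post.
    print(evenly_populated(12, sockets=2))
    print(peak_bandwidth_gbs(2400, sockets=2), "GB/s")
```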
|
|
|
|