March 22, 2021, 13:44
#381
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
I think you went off on an unnecessary tangent here. The motherboard in your system should have only 16 DIMM slots, and you stated earlier that you have 16x8GB of RAM installed. There is just no way to get an unbalanced memory population this way. Just open the side panel and check that all slots are populated. Then check in the operating system that 128GB of RAM is present. CPU-Z won't help you with dual-socket systems.
I am not sure what kind of virtualization you are running here, but if I had to guess, that's probably what is causing the performance hit.
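The last check (does the OS report the full 128GB?) can be scripted on Linux. This is a minimal sketch that assumes a Linux system where `/proc/meminfo` is available. The function names and the 5% tolerance are illustrative choices and do not come from the thread.

```python
# Sketch for Linux: parse /proc/meminfo text and check that the OS sees
# (approximately) all installed RAM. MemTotal is reported in kB (KiB) and is
# a little below the installed amount because firmware and kernel reserve some.


def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # e.g. "MemTotal:  131912345 kB"
            return kib / (1024 ** 2)
    raise ValueError("MemTotal line not found")


def looks_fully_populated(meminfo_text: str, installed_gib: float,
                          tolerance: float = 0.05) -> bool:
    """True if MemTotal is within `tolerance` (a fraction) of the installed RAM.

    The 5% default is an assumption: loose enough for reserved memory, tight
    enough that one missing 8 GB DIMM out of 16 (6.25%) is flagged.
    """
    return mem_total_gib(meminfo_text) >= installed_gib * (1 - tolerance)
```

On a Linux machine you would call `looks_fully_populated(open("/proc/meminfo").read(), 16 * 8)`. Windows reports installed memory through its own tools, which this sketch does not cover.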

March 22, 2021, 15:12
#382
New Member
Roland Siemons
Join Date: Mar 2021
Posts: 13
Rep Power: 5
Yes, Will, the runs complete normally. I ran CPU-Z but see no abnormalities. I've attached the report file (zipped *.txt). It is big, but it can be searched for the term "memory". Only if you have time, you might skim it. Thanks for your suggestions! Roland

March 22, 2021, 16:19
#383
New Member
George
Join Date: Jul 2020
Location: TU Delft, The Netherlands
Posts: 18
Rep Power: 6
I went through your hardware log and everything seems fine. I agree with flotus: a reasonable explanation is the Windows subsystem you are using. For optimal results you should create a Linux partition and work from there. It takes some time to set up, but not a lot. I would also like to draw your attention to the results you posted. You are oversubscribing, which means you assign more than one process to a single physical core. That is very suboptimal and you should avoid it. It provides no benefit, quite the opposite, as your own results already show. Keep in mind the difference between a physical core and a thread.
Last edited by gpouliasis; March 23, 2021 at 05:54.
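The distinction between physical cores and threads can be checked on Linux by reading `/proc/cpuinfo`. In that file, hyper-threads of the same core share a `physical id`/`core id` pair. This is a minimal sketch that assumes x86 Linux, since these field names may be missing on other architectures. `count_cores` and `is_oversubscribed` are hypothetical helper names.

```python
from typing import Tuple


def count_cores(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text.

    Each logical CPU is a blank-line-separated block; hyper-threads of one
    core share the same ("physical id", "core id") pair.
    """
    logical = 0
    cores = set()
    phys = core = None
    # A trailing "" guarantees the last block is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys = value.strip()
        elif key == "core id":
            core = value.strip()
    return logical, len(cores)


def is_oversubscribed(n_ranks: int, physical_cores: int) -> bool:
    """More MPI ranks than physical cores means some ranks share a core."""
    return n_ranks > physical_cores
```

Usage: `logical, physical = count_cores(open("/proc/cpuinfo").read())`. To avoid the oversubscription described in the post, keep the MPI rank count at or below `physical`.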

March 23, 2021, 05:45
#384
New Member
Roland Siemons
Join Date: Mar 2021
Posts: 13
Rep Power: 5
Hi Flotus,
All the hardware is there. This is confirmed not only by visual inspection but also by the system performance information (all graphs are there and working). The only remaining question is finding the best configuration. The OpenFOAM-v2006 Windows version I used was cross-compiled in an openSUSE environment using the MinGW cross-compiler (as prepared for the FreeCAD Windows software). Well, I found some interesting settings; see my next message.
Best regards, Roland
Last edited by RolandS; March 23, 2021 at 05:58. Reason: improving message

March 23, 2021, 05:57
#385
New Member
Roland Siemons
Join Date: Mar 2021
Posts: 13
Rep Power: 5
Dear Will, Flotus, George,
Thanks for all your effort and advice. I turned my computer into a dual-boot machine (Win10 + LinuxMint), and the Linux results are drastically better. As I said, under Win10 I run OpenFOAM-v2006 built with the MinGW cross-compiler (as prepared for the FreeCAD Windows software). The Linux results are:
# cores   Wall time (s):
------------------------
6         163.36
10        109.74
14        89.72
18        81.18
22        77.7
24        76.53
In the attached graph you can see the impressive speed-up from running OpenFOAM directly under Linux. Unless you see other issues to address, I am happy with this result.
Best regards, Roland
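Roland's Linux numbers can be put in perspective by computing speedup and parallel efficiency. Because no 1-core run was posted, the baseline here is the smallest measured core count (6 cores). This short sketch uses only the wall times from the post. The function name is illustrative.

```python
# Roland's LinuxMint wall times from the post above: cores -> seconds.
linux_times = {6: 163.36, 10: 109.74, 14: 89.72, 18: 81.18, 22: 77.7, 24: 76.53}


def scaling(times: dict) -> dict:
    """Return {cores: (speedup, efficiency)} relative to the smallest core count."""
    base_cores = min(times)
    base_time = times[base_cores]
    result = {}
    for cores, t in sorted(times.items()):
        speedup = base_time / t
        # Efficiency: achieved speedup divided by the ideal speedup cores/base_cores.
        efficiency = speedup * base_cores / cores
        result[cores] = (speedup, efficiency)
    return result


for cores, (s, e) in scaling(linux_times).items():
    print(f"{cores:2d} cores: {s:.2f}x vs 6 cores, efficiency {e:.0%}")
```

With these numbers, 24 cores are about 2.13x faster than 6 cores. That is a parallel efficiency of roughly 53% relative to the 6-core run.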

March 28, 2021, 20:50
Congratulations!
#386
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
That looks as expected. Congratulations on your fast and cheap machine. Will

March 29, 2021, 23:45
Dual E5-2630 v3
#387
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
Meshing times:
# cores   Wall time (s)
1         1651.15
2         1099.88
4         612.52
8         404.99
12        326.62
16        329.21
Flow calculation:
# cores   Wall time (s)
1         1113.39
2         589.07
4         281.81
8         163.56
12        132
16        115.62
I am going to try a turbo boost unlock next.

April 19, 2021, 21:29
Dual E5-4627 v3
#388
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
Gigabyte R180-F34, dual E5-4627 v3 with 8x8GB 2Rx8 2400T (running at 2133 because the v3 processor supports at most DDR4-2133)
Flow calculation:
# cores   Wall time (s)
2         578.12
4         277
8         147.97
12        111
16        92.67
20        85.04

April 30, 2021, 02:32
#389
New Member
Harris Snyder
Join Date: Aug 2018
Posts: 24
Rep Power: 8
Full benchmark results to follow, but here is a heads-up for anyone running Epyc Rome: try changing the NUMA nodes per socket (NPS) setting in the BIOS. Changing from NPS1 to NPS4 brought the 32-core benchmark time down from around 26.5 s to around 24.0 s. This is on a dual-7302 system.
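After changing the NPS setting, it is worth confirming that Linux actually sees the new topology. On a dual-socket system, NPS4 should expose 2 x 4 = 8 NUMA nodes. `numactl --hardware` or `lscpu` show this directly. The sketch below does the same by counting `nodeN` entries in sysfs. It assumes a Linux system with `/sys/devices/system/node`, and the helper names are hypothetical.

```python
import os
import re

NODE_DIR = re.compile(r"^node\d+$")  # matches node0, node1, ... but not "online"


def numa_node_count(sysfs_dir: str = "/sys/devices/system/node") -> int:
    """Count NUMA node directories (nodeN) under the sysfs node directory."""
    return sum(1 for name in os.listdir(sysfs_dir) if NODE_DIR.match(name))


def expected_nodes(sockets: int, nps: int) -> int:
    """Expected NUMA node count for a given BIOS NPS (nodes per socket) value."""
    return sockets * nps
```

On the dual-7302 machine, `numa_node_count() == expected_nodes(2, 4)` should hold once NPS4 is active.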
|
|
May 10, 2021, 03:40
Ryzen 3800X
#390
New Member
Florian
Join Date: May 2021
Posts: 8
Rep Power: 5
I benchmarked my Ryzen 3800X with 2x16GB 3000MHz CL16-18-18-38 RAM, running the memory at both 2133MHz and 3000MHz, on Ubuntu 20.04.2 with OpenFOAM v8.
At 2133MHz:
# cores   Flow   Mesh
1         713s   1096s
2         457s   756s
4         330s   461s
6         313s   375s
8         315s   341s
At 3000MHz:
# cores   Flow   Mesh
1         658s   1030s
2         379s   702s
4         261s   419s
6         244s   335s
8         245s   304s
It's comparable to the 5600X already posted, but slower than the 3700X. However, both of those CPUs run on 3600MHz RAM, and as far as I know Ryzen can use 3200MHz. Do you think it would be worth upgrading my RAM?
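One way to read these numbers is to look at the speedup from faster memory at each core count. This short sketch uses only the flow-solver times from the post and takes the first column to be the flow time, as the header says. The function name is illustrative.

```python
# Florian's flow-solver wall times (s) keyed by core count, at two memory speeds.
flow_2133 = {1: 713, 2: 457, 4: 330, 6: 313, 8: 315}
flow_3000 = {1: 658, 2: 379, 4: 261, 6: 244, 8: 245}


def memory_gain(slow: dict, fast: dict) -> dict:
    """Ratio slow/fast per core count; values above 1 mean the faster RAM helps."""
    return {cores: slow[cores] / fast[cores] for cores in sorted(slow)}


for cores, gain in memory_gain(flow_2133, flow_3000).items():
    print(f"{cores} cores: {gain:.2f}x faster at 3000MHz")
```

The gain grows from about 8% on one core to roughly 28% to 29% on 6 and 8 cores, while the transfer rate rose by about 41% (3000/2133). This pattern is consistent with the solver becoming memory-bandwidth bound as more cores share the two channels.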

May 10, 2021, 04:31
#391
Senior Member
Join Date: May 2012
Posts: 551
Rep Power: 16
Is 3000 MHz the ceiling of your RAM? Have you tried the Ryzen DRAM Calculator? You may be able to tighten those timings substantially if you are lucky.
Is it worth upgrading your RAM? That should be easy enough to answer for yourself from the results posted so far: you know the price, and you know what results you may achieve. Is this computer used to run CFD 24/7? Then I would say an upgrade may be worth it, although in that case you should probably look at other setups as well, imho.
The RAM I used for the 3700X results (around 170 s for the benchmark) is single-rank Samsung B-die, binned at 4133 MHz @ CL18. It easily does 3600 MHz @ CL15 (I can even push it to a stable 3600 MHz @ CL14 with higher voltage). If you go for 3600 MHz memory, I would suggest CL16, either two dual-rank sticks or four single-rank sticks. If you can get that to work, I think it would be the sweet spot for your CPU.
Your motherboard vendor will likely have a list of qualified memory. That can give you an indication of the capabilities of your motherboard and how good its PCB layout is. The rest is up to the silicon lottery of your CPU.
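The timing comparison can be made concrete with the usual first-word latency estimate. DDR transfers data twice per I/O clock, so CAS latency in nanoseconds is CL x 2000 / (data rate in MT/s). This small sketch compares the configurations mentioned in this exchange. It ignores secondary timings and bandwidth, so it covers only one part of the picture.

```python
def cas_latency_ns(cl: int, mts: int) -> float:
    """First-word CAS latency in ns: CL cycles at an I/O clock of mts/2 MHz."""
    return cl * 2000 / mts


# (description, CL, data rate in MT/s) for configurations mentioned in this exchange
configs = [
    ("DDR4-3000 CL16 (Florian's current kit)", 16, 3000),
    ("DDR4-3600 CL16 (suggested)", 16, 3600),
    ("DDR4-3600 CL15 (B-die, easy)", 15, 3600),
    ("DDR4-3600 CL14 (B-die, extra voltage)", 14, 3600),
    ("DDR4-4133 CL18 (B-die bin)", 18, 4133),
]
for label, cl, mts in configs:
    print(f"{label}: {cas_latency_ns(cl, mts):.2f} ns")
```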

May 10, 2021, 05:16
#392
New Member
Florian
Join Date: May 2021
Posts: 8
Rep Power: 5
Thanks for your answer. I use this RAM: https://geizhals.de/g-skill-aegis-di...-a1798024.html. So I guess 3000MHz is the ceiling of my RAM, but I am new to this kind of hardware stuff. I haven't used the DRAM Calculator yet, but I will try it and see if I can improve performance. If better RAM gave me a substantial performance increase over this setup, say 15%, it would probably be worth it, but this machine is only for smaller simulations.

May 10, 2021, 06:46
#393
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Nothing prevents you from overclocking the memory you currently have. That's an easy way to find out whether faster memory is worth it to you, and you will probably get within 10% of what could be achieved with higher-binned memory modules. The Ryzen DRAM Calculator is a handy tool for this, especially if you are overwhelmed by the plethora of timing settings.
You already saw the performance improvement going from 2133MT/s to 3000MT/s. If you change only the memory frequency, extrapolating this trend linearly gives a good enough estimate of the performance at even higher transfer rates.
These are your options:
1) Leave everything as-is.
2) Manually tune your current memory for higher transfer rates and optimised timings.
3) Buy expensive memory like 3600 CL16 and apply XMP without any manual tuning. This gives about the same performance as option 2.
4) Same as 3, but optimise further by hand.
The best option for you depends on how much time you want to spend manually tuning memory frequency, latency and related voltages.
For Zen 2 Ryzen CPUs, the sweet spot in terms of transfer rate is DDR4-3600 with a 1:1 ratio between DRAM and Infinity Fabric clocks. Most of these CPUs don't reach higher IF clock speeds without a lot of hassle, and switching to a 2:1 ratio for a higher memory frequency is not worth it.
Last edited by flotus1; May 10, 2021 at 09:25.
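The linear extrapolation suggested in this post can be sketched as follows. The thread does not say which quantity to extrapolate. This sketch assumes that throughput (1 / wall time) grows linearly with transfer rate at a fixed core count, with timings otherwise comparable. `extrapolate_time` is a hypothetical helper, and the 3600 MT/s target is only an example.

```python
def extrapolate_time(mts_a: float, t_a: float, mts_b: float, t_b: float,
                     mts_target: float) -> float:
    """Predict wall time at mts_target from two (MT/s, seconds) measurements.

    Assumption: throughput (1 / wall time) is linear in transfer rate,
    with core count and memory timings otherwise unchanged.
    """
    perf_a, perf_b = 1.0 / t_a, 1.0 / t_b
    slope = (perf_b - perf_a) / (mts_b - mts_a)  # throughput gained per MT/s
    perf_target = perf_b + slope * (mts_target - mts_b)
    return 1.0 / perf_target


# Florian's 8-core flow times: 315 s at 2133 MT/s and 245 s at 3000 MT/s.
estimate = extrapolate_time(2133, 315, 3000, 245, 3600)
print(f"Estimated 8-core flow time at 3600 MT/s: {estimate:.0f} s")
```

With Florian's 8-core flow times, this gives roughly 212 s at 3600 MT/s. Treat it as a rough estimate: it assumes the Infinity Fabric stays at a 1:1 ratio and that the trend continues unchanged.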

May 28, 2021, 12:13
AMD Epyc 7532
#394
New Member
Josh Dyson
Join Date: Mar 2011
Posts: 21
Rep Power: 15
I've been wanting to add to this benchmark for some time and have finally been able to do so.
OpenFOAM-v2012 running on CentOS 7.9. 2x AMD Epyc 7532 with 1TB of 3200MHz RAM. SMT (AMD's equivalent of Hyper-Threading) switched off.
# cores   Wall time (s):
------------------------
1         643.75
4         158.48
8         77.35
16        43.92
32        23.68
48        19.69
64        15.97
128       8.94
The 128-core result uses a 100Gb/s InfiniBand connection to an identical node. The superlinear speed-up up to 8 cores and 6.25 iterations/s on 64 cores show that 2nd-gen Zen is a big step up from the 1st.
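Both observations can be checked directly from the table. A run is superlinear when its speedup over the 1-core run exceeds its core count. Iterations per second is the iteration count divided by the wall time. The sketch assumes the benchmark case runs 100 iterations. That is consistent with 6.25 iterations/s at 15.97 s, but the post does not state it.

```python
# Josh's dual Epyc 7532 wall times (s) by core count; 128 cores = two nodes.
epyc_times = {1: 643.75, 4: 158.48, 8: 77.35, 16: 43.92, 32: 23.68,
              48: 19.69, 64: 15.97, 128: 8.94}
ITERATIONS = 100  # assumption: number of solver iterations in the benchmark case


def superlinear_counts(times: dict) -> list:
    """Core counts whose speedup over the 1-core run exceeds the core count."""
    t1 = times[1]
    return [n for n, t in sorted(times.items()) if n > 1 and t1 / t > n]


print("Superlinear at:", superlinear_counts(epyc_times))
print(f"Iterations/s on 64 cores: {ITERATIONS / epyc_times[64]:.2f}")
```

With this data, the speedup is superlinear at 4 and 8 cores and falls below linear from 16 cores on, which matches the description in the post.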

May 29, 2021, 01:06
Apple M1
#395
Senior Member
Join Date: Jun 2016
Posts: 102
Rep Power: 10
Apple M1 Mac mini, 16GB, 4 big cores @ 3.2GHz
OF-v2012 compiled natively for ARM64 (still buggy, but I managed to run this benchmark). No SIMD optimization yet. No GPU acceleration yet.
# cores   Wall time (s):
------------------------
1         469.16
2         291.02
3         228.07
4         190.39*
* The run crashed at t=100s for some reason, so I scaled the time at t=99s: 188.49*100/99 = 190.39.
So it seems the M1 single-core result beats all the x86 PCs in this thread. But the MPI scaling is really bad. I'm not sure whether it's an M1 issue or an Open MPI issue.
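The footnote's correction scales the time at t=99s up to the full 100 steps. The small sketch below assumes every time step costs about the same. Any start-up work inside the solver run (such as reading the mesh) gets scaled too, so the estimate comes out slightly high. The function name is hypothetical.

```python
def rescale_to_full_run(measured_s: float, steps_done: int, steps_total: int) -> float:
    """Estimate full-run wall time from a run that stopped early.

    Assumes a constant cost per time step. Start-up work inside the run
    (e.g. reading the mesh) is scaled too, so the estimate is slightly high.
    """
    return measured_s * steps_total / steps_done


# M1, 4 cores: 188.49 s at t = 99 of 100 steps.
print(f"{rescale_to_full_run(188.49, 99, 100):.2f} s")
```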

May 29, 2021, 02:55
Probably Memory Bandwidth
#396
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
OpenFOAM always chokes when there aren't enough memory channels to keep all the cores productive, while the (Open MPI) parallel interface is not a big problem. So you might look at your installed memory (type, frequency, slots filled).
|
|
May 29, 2021, 03:16
#397
Senior Member
Join Date: Jun 2016
Posts: 102
Rep Power: 10
The M1 has 68GB/s of bandwidth from LPDDR4X-4266, so it's already at workstation level. However, the memory is soldered directly onto the CPU package, so I don't have a choice. I would say it's either an Apple design problem or Open MPI not being optimized for the M1 (because the M1 is very different from regular ARM64).
|
|
May 29, 2021, 06:08
#398
Senior Member
Join Date: Apr 2020
Location: UK
Posts: 736
Rep Power: 14
Impressive single-core wall-clock time! It's 50% faster than my 3GHz Epyc 7302. I wonder how it manages this? It is clearly getting more done each clock cycle...
|
|
May 29, 2021, 08:39
#399
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Edit: dual-channel DDR4-3600 also yields memory bandwidth north of 50GB/s, and that is the limiting factor for scaling on the 6-core Ryzen CPU.
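Both bandwidth figures follow from the theoretical peak formula: data rate times total bus width. This quick sketch assumes a 128-bit total width for the M1's LPDDR4X interface, which is the figure that reproduces the 68GB/s quoted in this thread. Real sustained bandwidth is lower than these peaks.

```python
def peak_bandwidth_gbs(mts: float, bus_bits: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes/s).

    mts: data rate in mega-transfers per second; bus_bits: total bus width.
    """
    return mts * 1e6 * (bus_bits / 8) / 1e9


# Assumption: the M1's LPDDR4X-4266 interface is 128 bits wide in total.
m1 = peak_bandwidth_gbs(4266, 128)
# Dual-channel DDR4: two 64-bit channels.
ddr4_3600_dual = peak_bandwidth_gbs(3600, 2 * 64)

print(f"M1 LPDDR4X-4266:        {m1:.1f} GB/s")
print(f"Dual-channel DDR4-3600: {ddr4_3600_dual:.1f} GB/s")
```

Per core, that works out to roughly 17GB/s for the M1's four big cores, versus under 10GB/s for a 6-core Ryzen on dual-channel DDR4-3600.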

May 29, 2021, 09:42
#400
Senior Member
Join Date: Jun 2016
Posts: 102
Rep Power: 10
In the near future we will definitely see more ARM64 servers running not only CFD but also other scientific computing tasks.
Last edited by xuegy; May 29, 2021 at 10:20. Reason: wrong decimal point