Best PC recommendation for special CFD simulation with a short time step

November 29, 2019, 02:04   #1
Best PC recommendation for special CFD simulation with a short time step

Member
Join Date: Oct 2019
Posts: 65
Rep Power: 7

Hi guys, I use the Flow 3D software for a special CFD simulation involving both heat and mass transfer. Because of the nature of the problem, I have to use a very short time step to achieve convergence. The total cell count is about 100k. Currently I run an AMD 2970WX with 4x8 GB of 3200 MHz RAM under CentOS 7.7. This PC shows very high performance in multi-core runs of the default examples. However, while in cases with a larger time step the 20-core performance is about twice that of 8 cores, my customized model reaches its optimum at just 8 cores. I have checked many different tweaks, for example disabling SMT, setting memory interleaving to channel mode, and configuring the NUMA nodes, with no real benefit. I also checked another PC (Intel 6580K) and found the same multi-processing problem.

I would be grateful if you could suggest a hardware replacement for this special case (up to $2000) or a way to improve the performance of my current PC. The following links describe floating-point performance as the most important parameter in CFD: https://www.flow3d.com/hardware-sele...w-3d-products/ and https://en.wikichip.org/wiki/flops. Based on them, the 2970WX uses AVX2 & FMA (128-bit) while the Intel products use AVX2 & FMA (256-bit) or AVX-512 & FMA (512-bit), i.e. more FLOPs per cycle. I have tried to find out what AVX-512 actually buys, although there are claims that it can even reduce efficiency: https://software.intel.com/en-us/for...g/topic/815069 and https://lemire.me/blog/2018/04/19/by...st-experiment/

As I understand it, poor multi-core efficiency means either that the work is not correctly balanced between the cores or that the time spent on communication is large compared to the computing time. I believe my model is somehow limited by the inter-core communication time. BTW, I am not really sure about the new Intel architectures. My license covers up to 32 cores, and the time step must be limited to 5e-06 s because of the 0.25 mm cubic mesh size.
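To get a rough feel for what those FLOPs-per-cycle figures mean, here is a back-of-the-envelope comparison of theoretical peak double-precision throughput (peak = cores x clock x FLOPs per cycle per core). The all-core clock speeds are assumed values for illustration only, not measurements:

Code:
# Rough theoretical peak DP throughput, following the FLOPs-per-cycle figures
# on wikichip. The all-core clock speeds below are assumptions, not measurements.

def peak_gflops(cores, ghz, dp_flops_per_cycle):
    # theoretical peak = cores * clock * DP FLOPs per cycle per core
    return cores * ghz * dp_flops_per_cycle

cpus = {
    "TR 2970WX (Zen+, 2x128-bit FMA, 8 DP FLOPs/cycle)":   (24, 3.0, 8),
    "Skylake   (AVX2, 2x256-bit FMA, 16 DP FLOPs/cycle)":  (8,  3.5, 16),
    "Skylake-X (AVX-512, 2 FMA units, 32 DP FLOPs/cycle)": (8,  3.0, 32),
}

for name, (cores, ghz, fpc) in cpus.items():
    print(f"{name}: ~{peak_gflops(cores, ghz, fpc):.0f} GFLOPS peak")

Real clocks drop under heavy AVX load, so treat these as order-of-magnitude numbers at best.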

November 29, 2019, 05:50   #2

Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49

There is A LOT to unravel here. I might get back to it over the course of the weekend.

For now, I cannot help but notice that the first link you quoted is riddled with questionable claims about CFD performance, and it contradicts itself. To quote from that source: Quote:

Weirdly enough, they follow up by contradicting themselves: Quote:

So in short: you are not looking for the highest theoretical floating-point performance in a CPU, but for a balance between FP performance and memory subsystem performance.
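To make that balance concrete, here is a minimal roofline-style sketch (attainable throughput = the smaller of the FP peak and memory bandwidth times arithmetic intensity). The bandwidth and intensity values are rough assumptions for illustration, not measurements of Flow 3D:

Code:
# Roofline-style estimate: a solver is limited either by the FP units or by
# how fast memory can feed them. All three numbers below are illustrative.

def attainable_gflops(fp_peak_gflops, bandwidth_gbs, flops_per_byte):
    # attainable = min(compute roof, memory roof)
    return min(fp_peak_gflops, bandwidth_gbs * flops_per_byte)

fp_peak   = 576.0   # assumed theoretical peak of a 24-core CPU, GFLOPS
bandwidth = 60.0    # assumed quad-channel DDR4 bandwidth, GB/s
intensity = 0.2     # assumed FLOPs per byte for a stencil-heavy CFD kernel

print(f"attainable: ~{attainable_gflops(fp_peak, bandwidth, intensity):.0f} GFLOPS "
      f"out of a {fp_peak:.0f} GFLOPS peak")
# -> ~12 GFLOPS: the memory roof, not the FP units, sets the limit here.

With an arithmetic intensity that low, doubling the FP peak changes almost nothing, while faster memory helps nearly linearly.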

December 1, 2019, 05:59   #3

Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49

All right, here is my take on this issue.

First things first: with a low cell count of 200k, your case just might not scale very well on many cores, no matter what you do. Time step size should have very little impact on the whole situation. But here is what you can try with your 2970WX: before anything else, run the solver only on the cores that have direct memory access (on the 2970WX, only two of the four dies have their own memory controllers), and only then judge the effect of SMT, memory interleaving and RAM speed. A sketch of how to do this is shown below.

As for buying a different PC just for this application: while the TR 2970WX might not be ideal for this task, it will be very difficult to get a significantly better configuration in the $2000 price range, at least once all of the issues above have been addressed.
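Coming back to the memory-access point, here is a minimal sketch of what I mean, assuming a Linux box with numactl installed. The node numbers and the solver command are placeholders, so check the output of numactl --hardware on your machine first:

Code:
# Sketch: run the solver only on NUMA nodes that have local memory.
# Node numbers and the solver command are placeholders.
import subprocess

# 1) Print the topology; dies without direct memory access show "size: 0 MB".
subprocess.run(["numactl", "--hardware"], check=True)

# 2) Bind both the threads and their memory allocations to the memory-attached
#    nodes (commonly 0 and 2 on a 2970WX, but confirm against step 1).
solver_cmd = ["/path/to/flow3d_solver", "case_input"]   # hypothetical command
subprocess.run(
    ["numactl", "--cpunodebind=0,2", "--membind=0,2", *solver_cmd],
    check=True,
)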

December 1, 2019, 08:20   #4

Member
Join Date: Oct 2019
Posts: 65
Rep Power: 7

Quote:
First of all, thank you for your helpful reply. I should mention that all my CFD experience is limited to Flow 3D and Threadripper-based systems, so maybe some of my claims do not hold for other cases. I agree with your comment on multi-core performance with 200k cells, but the effect of the time step is completely obvious: below a 1e-6 s step, the solving time on 4 cores is similar to 20 cores. I am a little confused by this result and very interested in finding the bottleneck. I have already checked these customizations. For example, in my case disabling SMT did not bring a significant improvement (less than 5 percent). Overclocking the RAM from 2166 MHz (motherboard default) to 3200 MHz (the RAM's rated speed with its best timings) gave about a 10 percent improvement. All four slots are populated correctly in quad-channel mode. In addition, I use a fast M.2 SSD (970 Evo). Setting the memory interleaving to Die mode brought a little more benefit than Channel mode (about 5 percent in total). It seems the auto-configuration of the ASRock X399 Taichi works very well. Maybe it is better to focus on the simulation setup itself and look for a way to increase the time step or define the problem better. Thanks again.
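One quick way to see whether those RAM settings actually change bandwidth is a simple copy test along these lines. It is a single-process sketch, not a proper STREAM benchmark, so it will not saturate quad-channel DDR4; it is only useful for comparing one BIOS/RAM setting against another:

Code:
# Quick memory-bandwidth sanity check (STREAM-style "copy" kernel).
import time
import numpy as np

n = 50_000_000                       # ~400 MB per array, far beyond any cache
src = np.ones(n)
dst = np.zeros(n)

t0 = time.perf_counter()
np.copyto(dst, src)                  # one read + one write per 8-byte element
dt = time.perf_counter() - t0

print(f"~{2 * n * 8 / dt / 1e9:.1f} GB/s effective copy bandwidth")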

December 1, 2019, 09:48   #5

Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49

If time step size really has such a high impact on your test case, it will be necessary to find out what is going on here. This is not normal.

Apart from that, you still seem to be missing the most important optimization I mentioned: run the code only on cores that have direct memory access. Until this potentially huge bottleneck is out of the way, judging the impact of other factors is rather pointless.
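If you would rather not take cores offline, the same restriction can be applied from a small wrapper script. Here is a sketch that reads the standard Linux sysfs topology and pins the process to the CPUs of the memory-attached nodes (verify the resulting CPU set with taskset -p afterwards):

Code:
# Alternative to numactl: find the NUMA nodes that have local memory via sysfs
# and pin this process (and any solver it launches) to their CPUs.
import os
import re
from pathlib import Path

def cpus_of_memory_nodes():
    # Return the logical CPUs belonging to NUMA nodes with local memory.
    cpus = set()
    for meminfo in Path("/sys/devices/system/node").glob("node*/meminfo"):
        mem_kb = int(re.search(r"MemTotal:\s+(\d+)", meminfo.read_text()).group(1))
        if mem_kb == 0:
            continue                                  # die without its own memory
        for part in (meminfo.parent / "cpulist").read_text().strip().split(","):
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

os.sched_setaffinity(0, cpus_of_memory_nodes())       # 0 = this process
# A solver started from here (e.g. via subprocess) inherits this CPU set.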

December 1, 2019, 11:01   #6

Member
Join Date: Oct 2019
Posts: 65
Rep Power: 7

Oh, I forgot to explain how I set up the cores with direct access to RAM. Linux provides a way to disable cores directly, so I checked different sets, e.g. disabling dies 1 and 3 with and without SMT. As I mentioned, because of the large number of threads on the 2970WX, the effect more or less vanished. I have heard that this bottleneck has a significant effect on some benchmarks, such as 7-Zip compression with more than 8 cores, but in Flow 3D the situation looks different.
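(For reference, the usual way to disable cores directly on Linux is the sysfs online switch, roughly as below; the CPU numbers for dies 1 and 3 are placeholders, so check lscpu -e or the NUMA topology first.)

Code:
# Taking CPUs offline through sysfs (requires root). The CPU range below is
# only a placeholder for dies 1 and 3; check `lscpu -e` for the real numbering.
cpus_to_offline = range(12, 24)              # hypothetical logical CPU numbers

for cpu in cpus_to_offline:
    with open(f"/sys/devices/system/cpu/cpu{cpu}/online", "w") as f:
        f.write("0")                         # "0" = offline, "1" = back online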
Thank you.

Tags: amd, time step reduced