March 22, 2023, 11:33
No OpenMP Performance Gain
#1
New Member
Paul
Join Date: Jul 2018
Posts: 5
Rep Power: 8
Hello all,
I have been playing around a bit with different compilation and execution options for SU2 in an attempt to wring out as much performance as possible on my workstation, and noticed something funny: I am not seeing any speedup when using OpenMP. I configured with meson using:

Code:
CXXFLAGS="-O3 -march=znver3" ./meson.py build --prefix=<installdir> -Dwith-mpi=enabled -Dwith-omp=true

built and installed with:

Code:
./ninja -C build install

and then run the case (turbulent channel flow with ~650,000 points) using:

Code:
export OMP_NUM_THREADS=<number_of_threads>
mpirun -np 1 SU2_CFD -t <number_of_threads> channel.cfg

Any thoughts? What am I missing? Thanks for any help!

-Paul

PS. I should add that I have run this on a Ryzen 9 5950X using Manjaro Linux and a Ryzen 9 7950X using both Arch Linux under WSL and Manjaro Linux (dual boot).

Last edited by GomerOfDoom; March 22, 2023 at 11:35. Reason: Added machine info
March 22, 2023, 11:47
Adding MPI results
#2
New Member
Paul
Join Date: Jul 2018
Posts: 5
Rep Power: 8
Here is a chart showing the same OpenMP scaling, with the MPI scaling added for comparison.
March 22, 2023, 17:38
#3
Senior Member
Pedro Gomes
Join Date: Dec 2017
Posts: 466
Rep Power: 14
What does htop look like while SU2 is running? You may need to look into thread-pinning settings (mpirun --bind-to none, as a first try).
But at those point counts per thread on a single machine, plain MPI is likely to be faster, unless you are using multigrid, for which fewer partitions tend to improve convergence. -Denable-mixedprec should give you a reasonable boost if you are running implicit.
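To make those two suggestions concrete, here is a rough sketch of what they look like on the command line, reusing Paul's original configure options (--bind-to none is OpenMPI syntax; the mixed-precision switch is assumed to be a boolean meson option, so check the exact spelling against your SU2 version):

Code:
# First try: stop OpenMPI from pinning the single rank (and all of its threads) to one core
export OMP_NUM_THREADS=16
mpirun -np 1 --bind-to none SU2_CFD -t 16 channel.cfg

# Reconfigure with the mixed-precision option Pedro mentions, then rebuild
CXXFLAGS="-O3 -march=znver3" ./meson.py build --prefix=<installdir> \
    -Dwith-mpi=enabled -Dwith-omp=true -Denable-mixedprec=true
./ninja -C build install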
March 23, 2023, 11:04
HTOP Shows No Additional Threads
#4
New Member
Paul
Join Date: Jul 2018
Posts: 5
Rep Power: 8
Pedro,
Thanks for responding! As it turns out, htop shows only one or two cores working, regardless of whether I specify 1, 2, 4, 8, or 16 threads. Interesting! So, I re-ran using:

Code:
mpirun -np 1 --bind-to none SU2_CFD -t <num_threads> channel.cfg

Thanks for all of your help!

-Paul

Last edited by GomerOfDoom; March 23, 2023 at 11:06. Reason: grammatical
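If it helps anyone else debugging thread placement, the OpenMP runtime can also report what it is doing via the standard OMP_* environment variables (a sketch; these are OpenMP 4.0+ variables and should be honoured by libgomp and other runtimes):

Code:
export OMP_DISPLAY_ENV=true    # print the effective OpenMP settings at startup
export OMP_PLACES=cores        # one place per physical core
export OMP_PROC_BIND=close     # keep threads on neighbouring cores
mpirun -np 1 --bind-to none SU2_CFD -t <num_threads> channel.cfg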
March 24, 2023, 16:13
#5
Senior Member
Pedro Gomes
Join Date: Dec 2017
Posts: 466
Rep Power: 14
I don't know what sets the defaults, TBH. --bind-to numa is preferred, and using our OpenMP strategy across multiple NUMA nodes usually results in reduced performance.
Can you post the scaling now compared to MPI, for my curiosity?
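For illustration, the "fewer ranks spanning fewer NUMA/cache domains" idea could look roughly like the sketch below. Consumer Ryzen parts such as the 5950X usually report a single NUMA node, so the two CCDs (L3 cache domains) are the closer analogue on that hardware; the mapping options shown are OpenMPI's:

Code:
# Hypothetical hybrid layout: one MPI rank per L3/CCD domain, 8 OpenMP threads each
export OMP_NUM_THREADS=8
mpirun -np 2 --map-by l3cache --bind-to l3cache SU2_CFD -t 8 channel.cfg

# On a machine that exposes multiple NUMA nodes, the equivalent would be
# mpirun -np <num_numa_nodes> --map-by numa --bind-to numa SU2_CFD -t <threads_per_node> channel.cfg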
March 27, 2023, 10:53
#6
New Member
Paul
Join Date: Jul 2018
Posts: 5
Rep Power: 8
Hi Pedro,
Thanks again for your help. Here is the plot of the MPI vs. OpenMP scaling with "--bind-to none" (note that the colors are switched from my previous plot). The OpenMP scaling now behaves as expected. I'm sure there are some other knobs I could turn (for example, SMT is enabled on this machine, I used the GNU compilers, etc.), so I might do a little more fiddling. I'll post an update if I find any other significant improvements.
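On the SMT point, one low-effort experiment (a sketch; the count of 16 assumes the 16-core 5950X/7950X, and OMP_PLACES assumes the runtime honours the standard placement variables) is to restrict the run to physical cores rather than hardware threads:

Code:
# Check how many physical cores vs. hardware threads the machine exposes
lscpu | grep -iE "core|thread"

# Keep one OpenMP thread per physical core instead of per SMT sibling
export OMP_NUM_THREADS=16
export OMP_PLACES=cores
mpirun -np 1 --bind-to none SU2_CFD -t 16 channel.cfg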