No OpenMP Performance Gain

March 22, 2023, 11:33   #1
No OpenMP Performance Gain
GomerOfDoom (Paul), New Member, Join Date: Jul 2018, Posts: 5

Hello all,

I have been experimenting with different ways of compiling and running SU2, trying to wring as much performance as possible out of my workstation, and noticed something odd: I am not seeing any speedup when using OpenMP.

I configured with meson using:

Code:
CXXFLAGS="-O3 -march=znver3" ./meson.py build --prefix=<installdir> -Dwith-mpi=enabled -Dwith-omp=true
And then used
Code:
./ninja -C build install
The compilation/installation went fine.
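One quick way to sanity-check that the installed binary actually picked up OpenMP (just a sketch, assuming a GCC build that links the runtime dynamically; with clang the library would be libomp rather than libgomp) is to look for the OpenMP runtime in its dependencies:
Code:
ldd <installdir>/bin/SU2_CFD | grep -i omp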

I then ran the case (turbulent channel flow with ~650,000 points) using:

Code:
export OMP_NUM_THREADS=<number_of_threads>
mpirun -np 1 SU2_CFD -t <number_of_threads> channel.cfg
However, I see basically no improvement in seconds per iteration from 1 thread to 2, 4, 8, and 16 threads (see attached plot).

Any thoughts? What am I missing? Thanks for any help!

-Paul

P.S. I should add that I have run this on a Ryzen 9 5950X under Manjaro Linux, and on a Ryzen 9 7950X under both Arch Linux (in WSL) and Manjaro Linux (dual boot).
Attached: openmp_scaling.png (OpenMP scaling plot)

Last edited by GomerOfDoom; March 22, 2023 at 11:35. Reason: Added machine info

March 22, 2023, 11:47   #2
Adding MPI results
GomerOfDoom (Paul), New Member, Join Date: Jul 2018, Posts: 5

Here is a chart showing the same OpenMP scaling, with the MPI scaling added for comparison.
Attached: openmp_scaling.png (OpenMP vs. MPI scaling)

March 22, 2023, 17:38   #3
pcg (Pedro Gomes), Senior Member, Join Date: Dec 2017, Posts: 466

What does htop look like while SU2 is running? You may need to look into thread-pinning settings (mpirun --bind-to none, as a first try).
But at that point count per thread on a single machine, plain MPI is likely to be faster, unless you are using multigrid, for which fewer partitions tend to improve convergence.
-Denable-mixedprec should give you a reasonable boost if you are running an implicit scheme.
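If you want to try the mixed-precision build, the reconfigure should look something like this (a sketch, not tested on your setup; meson's --reconfigure re-applies options to an existing build directory):
Code:
./meson.py build --reconfigure -Denable-mixedprec=true
./ninja -C build install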

March 23, 2023, 11:04   #4
HTOP Shows No Additional Threads
GomerOfDoom (Paul), New Member, Join Date: Jul 2018, Posts: 5

Pedro,

Thanks for responding! As it turns out, htop shows only one or two cores working, regardless of whether I specify 1, 2, 4, 8, or 16 threads. Interesting!

So, I re-ran using:

Code:
mpirun -np 1 --bind-to none SU2_CFD -t <num_threads> channel.cfg
And now I get performance changes with different thread numbers! So, what does this mean? Is there a default binding behavior that is set on my machine somewhere? Should I be using "--bind-to none" all the time? Would "--bind-to numa" or another binding option be better?
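In the meantime, one way to see what the launcher actually decides (assuming Open MPI, which the --bind-to syntax suggests) is to ask it to report each rank's binding at startup:
Code:
mpirun -np 1 --report-bindings SU2_CFD -t <num_threads> channel.cfg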

Thanks for all of your help!

-Paul

Last edited by GomerOfDoom; March 23, 2023 at 11:06. Reason: grammatical

March 24, 2023, 16:13   #5
pcg (Pedro Gomes), Senior Member, Join Date: Dec 2017, Posts: 466

I don't know what sets the defaults, to be honest. --bind-to numa is preferred, and using our OpenMP strategy across multiple NUMA nodes usually results in reduced performance.
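For what it's worth, a sketch of what that hybrid launch might look like (my guess at the syntax, assuming Open MPI; note that these desktop Ryzens usually expose a single NUMA node unless NPS is changed in the BIOS, in which case --map-by l3cache may line up better with the two CCDs):
Code:
# two ranks, one per NUMA domain (or CCD), 8 threads each
export OMP_NUM_THREADS=8
mpirun -np 2 --map-by numa --bind-to numa SU2_CFD -t 8 channel.cfg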
Can you post the scaling now compared to MPI, out of curiosity?

March 27, 2023, 10:53   #6
GomerOfDoom (Paul), New Member, Join Date: Jul 2018, Posts: 5

Hi Pedro,

Thanks again for your help.

Here is the plot of the MPI vs. OpenMP scaling with "--bind-to none".

(Note that the colors are switched from my previous plot).

The OpenMP scaling now behaves as expected.

I'm sure there are some other knobs I could turn (for example, SMT is enabled on this machine, I used the GNU compilers, etc.), so I might do a little more fiddling; one SMT-related idea is sketched below. I'll post an update if I find any other significant improvements.
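One cheap SMT-related experiment (OMP_PLACES and OMP_PROC_BIND are standard OpenMP environment variables; whether they help here is just a guess) is to give each thread its own physical core so no two threads share SMT siblings:
Code:
# one place per physical core; 16 threads on a 16-core part -> no core sharing
export OMP_NUM_THREADS=16
export OMP_PLACES=cores
export OMP_PROC_BIND=spread
mpirun -np 1 --bind-to none SU2_CFD -t 16 channel.cfg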
Attached: openmp_scaling.png (MPI vs. OpenMP scaling with --bind-to none)
