
MPI & shared memory

December 15, 2018, 03:57   #1
usv001 (Senior Member)
Dear Foamers,

I would like to know if the following is possible:

Say that I am running a case in parallel. Assuming that all the cores are within the same node, is it possible to allocate shared memory on the heap that is visible to all the cores? Specifically, if each processor creates a field as shown below,

Code:
scalarField* fieldPtr(new scalarField(n));
Can one core access the field created by another core using the pointer address?

Has anyone implemented something like this before? If so, how did you go about doing it?

USV

December 30, 2018, 06:30   #2
Mark Olesen (olesen, Senior Member)
There is currently no DMA or RDMA wrapping in OpenFOAM. You will have to create your own MPI communicators, access windows, etc.
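Roughly, the pattern looks like this (a minimal sketch in plain MPI, outside any OpenFOAM wrapping; the field size and variable names are only illustrative):

Code:
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    // Communicator containing only the ranks that share this node's memory
    MPI_Comm nodeComm;
    MPI_Comm_split_type
    (
        MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &nodeComm
    );

    int nodeRank = 0;
    MPI_Comm_rank(nodeComm, &nodeRank);

    // Rank 0 on the node allocates storage for n doubles;
    // the other ranks attach with a zero-sized contribution
    const int n = 1000;              // illustrative field size
    double* field = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared
    (
        (nodeRank == 0 ? n*sizeof(double) : 0),  // local window size (bytes)
        sizeof(double),                          // displacement unit
        MPI_INFO_NULL,
        nodeComm,
        &field,
        &win
    );

    // Query rank 0's segment so every node-local rank ends up with
    // a pointer to the same underlying memory
    MPI_Aint size;
    int dispUnit;
    MPI_Win_shared_query(win, 0, &size, &dispUnit, &field);

    MPI_Win_fence(0, win);
    if (nodeRank == 0)
    {
        field[0] = 3.14;             // write on one rank ...
    }
    MPI_Win_fence(0, win);           // ... visible to the others after the fence

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}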

December 31, 2018, 00:47   #3
MPI/OpenMP Hybrid Programming in OpenFOAM
usv001 (Senior Member)
Thank you Mark.

After a little scouring of the Internet, I came to the same conclusion. However, there is a simple but limited solution which is to use OpenMP. Since I created my own schemes and solver, I was able to incorporate quite a bit of OpenMP parallelism into the code. For those trying to use existing solvers/schemes, unfortunately, this won't help you too much unless you re-write the schemes with OpenMP pragmas.

To compile with OpenMP, include the '-fopenmp' flag in the file '$WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt', so that it reads:
Code:
$ cat $WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt
c++DBUG     = 
c++OPT      = -O2 -fopenmp
Remember to source the etc/bashrc file before compiling.

In your solver/scheme, you may need to include "omp.h" for the pragmas to work. After this, you're pretty much set. You can parallelize loops as follows:

Code:
#pragma omp parallel for
forAll(neighbour, celli)
{
    ...
}
When running in MPI/OpenMP mode, I decomposed the domain into the number of NUMA nodes that I am using rather than the total number of cores. After that, I set the number of OMP threads to the number of cores present in each NUMA node using the environment variable 'OMP_NUM_THREADS'. For instance, let's say that there are 4 NUMA nodes in each socket and each NUMA node consists of 6 cores (i.e. 24 cores per socket). If I wish to use 2 sockets in total, I can decompose the domain into 8 sub-domains and run the solver as follows:

Code:
export OMP_NUM_THREADS=6
mpirun -np 8 --map-by ppr:1:numa:pe=6 solver -parallel
This says that I would like to start 8 MPI processes (the same as the number of sub-domains), with each process mapped to 1 NUMA node and allocated 6 cores/threads. Inside each NUMA node, OpenMP can then parallelize over the 6 available cores/threads.

A word of caution though: this may not run any faster (in fact, it ran much slower in many cases) unless a significant portion of the code (i.e. the heavy-duty loops) is parallelized and the OpenMP overhead is kept small. Usually, the benefits start showing at higher core counts, when MPI traffic starts to dominate. In other cases, I think the built-in MPI parallelism alone is more efficient.

Lastly, I am no expert in these areas, just an amateur, so there could be things I am missing and better ways to go about doing it. Feel free to correct my mistakes and suggest improvements...

Cheers,
USV

December 31, 2018, 05:06   #4
Mark Olesen (olesen, Senior Member)
Quote (originally posted by usv001):
Thank you Mark.

After a little scouring of the Internet, I came to the same conclusion. However, there is a simple but limited solution which is to use OpenMP.
...
To compile with OpenMP, include the '-fopenmp' flag in the file '$WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt', so that it reads:
Code:
$:cat $WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt
c++DBUG     = 
c++OPT      = -O2 -fopenmp
The preferred method is to use the COMP_OPENMP and LINK_OPENMP definitions instead (in your Make/options file) and do NOT touch the wmake rules. Apart from less editing and easier upgrading, these are also defined for clang and Intel as well as gcc.
Take a look at the cfmesh integration for examples of using these defines, as well as various openmp directives.

Note that it is also good practice (I think) to guard your openmp pragmas with ifdef/endif so that you can rapidly enable/disable these. Sometimes debugging mpi + openmp can be rather "challenging".

December 31, 2018, 05:15   #5
Mark Olesen (olesen, Senior Member)
Quote (originally posted by usv001):
A word of caution though: this may not run any faster (in fact, it ran much slower in many cases) unless a significant portion of the code (i.e. the heavy-duty loops) is parallelized and the OpenMP overhead is kept small.
Memory bandwidth affects many codes (not just OpenFOAM). You should give this a read:
https://www.ixpug.org/images/docs/IX...g-OpenFOAM.pdf

December 31, 2018, 20:02   #6
Yan Zhang (zhangyan, Senior Member)
Hi,

I'm also interested in this issue. I want to ask whether it is possible to create a shared class whose member variables take up a lot of memory.

PS: For OpenMP in OpenFOAM, I've found a github repository.

January 2, 2019, 09:00   #7
usv001 (Senior Member)
Hello Mark,

Quote (originally posted by olesen):
The preferred method is to use the COMP_OPENMP and LINK_OPENMP definitions instead (in your Make/options file) and do NOT touch the wmake rules. Apart from less editing, easier upgrading etc, these are also defined for clang and Intel as well as gcc.
Take a look at the cfmesh integration for examples of using these defines, as well as various openmp directives.
That looks interesting. I tried to look for them but couldn't find anything relevant. Could you please post an example of how the Make/options file should look?

By the way, when OpenMP is not linked, the relevant pragmas are ignored by the compiler. This happens in both GCC and ICC. I don't use Clang though. So, I guess there is no need for guards.

Quote (originally posted by olesen):
Memory bandwidth affects many codes (not just OpenFOAM). You should give this a read :
https://www.ixpug.org/images/docs/IX...g-OpenFOAM.pdf
I agree with you completely. I have been doing some preliminary profiling of my code and memory accesses are taking up nearly 80% of the computation time! Clearly, OpenFOAM would do better with more vectorization.

USV

January 2, 2019, 10:05   #8
Mark Olesen (olesen, Senior Member)
Quote (originally posted by usv001):
Hello Mark,

That looks interesting. I tried to look for them but couldn't find anything relevant. Could you please post an example of how the Make/options file should look?

By the way, when OpenMP is not linked, the relevant pragmas are ignored by the compiler. This happens in both GCC and ICC. I don't use Clang though. So, I guess there is no need for guards.

The simplest example is applications/test/openmp/Make/options (in 1712 and later).
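Roughly, such an options file combines the openmp defines with whatever your code already includes and links against, e.g. (a sketch only; the finiteVolume entries are placeholders, not taken from the actual Test-openmp file):

Code:
/* Minimal sketch: openmp compile/link flags plus placeholder entries
   for whatever your own code already includes and links against */
EXE_INC = \
    $(COMP_OPENMP) \
    -I$(LIB_SRC)/finiteVolume/lnInclude

EXE_LIBS = \
    $(LINK_OPENMP) \
    -lfiniteVolume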


If you check the corresponding source file (Test-openmp.C) you'll perhaps see what I mean about the guards. As a minimum, you need a guard around the include <omp.h> statement.
After that you can decide to use any of the following approaches:
  1. Just use the pragmas and let the compiler decide to use/ignore.
  2. Guard with the standard #ifdef _OPENMP
  3. Guard with the cfmesh/OpenFOAM #ifdef USE_OMP

The only reason I suggest the USE_OMP guard is to let you explicitly disable openmp for benchmarking and debugging as required by changing the Make/options entry. If you don't need this for benchmarking, debugging etc, no worries.
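For illustration, a guarded loop might look like this (a sketch only, not the actual Test-openmp.C contents; here USE_OMP is assumed to be defined via the Make/options entry when openmp is enabled):

Code:
// Include <omp.h> only when openmp is compiled in
#ifdef USE_OMP
#include <omp.h>
#endif

#include "scalarField.H"

void scaleField(Foam::scalarField& fld, const Foam::scalar factor)
{
    // The pragma is compiled in only when USE_OMP is defined, so
    // removing the define disables openmp without touching the loop
    #ifdef USE_OMP
    #pragma omp parallel for
    #endif
    forAll(fld, i)
    {
        fld[i] *= factor;
    }
}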



Quote (originally posted by usv001):
I agree with you completely. I have been doing some preliminary profiling of my code and memory accesses are taking up nearly 80% of the computation time! Clearly, OpenFOAM would do better with more vectorization.
I wouldn't draw the same conclusion at all; rather, vectorization makes the most sense when the arithmetic intensity is much higher (see the roofline model in the CINECA presentation).


Tags
mpi, shared memory

