|
How to do communication across processors |
|
July 19, 2014, 10:10 |
How to do communication across processors
|
#1 |
New Member
Salman Arshad
Join Date: May 2014
Posts: 4
Rep Power: 12 |
Hello everybody!:)
I hope you all are doing well with OpenFOAM. I am working on parallelizing a code in OpenFOAM. This code does Large Eddy Simulation (LES) in OpenFOAM and uses Linear Eddy Model (LEM) as a sub-grid combustion model. In simple words Linear eddy model does direct numerical simulation (DNS) on a 1-D line. If you are more interested, you can read details about it in this PhD thesis: https://www.dropbox.com/s/ftdfmo00ob...200312_phd.pdf In the code, for every LES cell in OpenFOAM, a new LEM line is initialized using ‘boost::ptr_vector’. See the figure below just to get an idea. In this figure we have 16 LES cells (2-D case having one cell in Z-direction) and in each of these LES cell, a new LEM line is initialized. https://www.dropbox.com/s/go5zl3l4febcv04/LES-LEM.png If these LEM lines do not communicate then parallelizing this code in OpenFOAM using domain decomposition is really easy because each LEM line belong to one LES cell only. See the following figure just to get an idea of domain decomposition in this case. https://www.dropbox.com/s/vm96l3w5kh...omposition.png The real problem arises when these LEM lines communicate with each other. The communication between LEM lines is required for large scale advection process. In this process large scale (resolved) flow moves across LES cells. In the code this process is modelled by exchanging parts (LEM cells) of a LEM line to the neighboring LEM lines (to ensure conservation of mass). For details about large scale advection (also known as splicing) interested readers can read details in the above mentioned PhD thesis. The figure below is taken from the PhD thesis in order to get a general idea of splicing. https://www.dropbox.com/s/ijgke1a6wy8ea78/splicing.png In the code we just loop over all internal (LES) faces of the domain (using mesh::owner()) and do the splicing (exchanging of LEM cells). In serial processing mode it is not a problem because splicing is done in whole domain (on all internal faces). When domain decomposition is done using ‘decomposePar’ and same code is run in parallel then the faces between processor boundaries are left out (splicing between internal faces of a single processor is still happening but not across processor boundaries) and hence no splicing is done across processor boundaries. One idea on which I am working on now is (at each time step) to make local copies of neighboring LEM lines across each processor boundary to do splicing. To understand the idea, see the figure below. https://www.dropbox.com/s/fw8ihdkr2w...Processor1.png In above figure, only processor 1 is considered (the concept is same for all processors). So in order to do splicing on faces between processor boundaries for processor 1, LEM lines of (LES) cells 5, 7, 9 and 10 will be copied and splicing (for faces between processor boundaries) will be done using these local copies. This is just one idea and for this idea I am having difficulties in finding out how to find all the LES cells neighboring my current processor and more importantly how to copy LEM lines across processor boundaries? How to use Pstream to handle this copying operation because may be Pstream can handle certain types of data and the data that I need to copy is may be not supported by Pstream. Also the LEM lines are on ‘boost::ptr_vector’ and I do not want to copy the pointer to the LEM lines but instead the whole LEM line (the class to which ‘boost::ptr_vector’ is pointing to). 
The above idea is just an idea and because I am new to parallelization in OpenFOAM so I have less knowledge of implementation details in OpenFOAM. Any other better idea of implementing parallel ‘splicing’ in OpenFOAM and also if possible any help related to implementation of my current idea in OpenFOAM will be highly appreciated.:) Many thanks in advance :) |
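One possible approach to both sub-problems is sketched below: the processor patches of the decomposed mesh give, via faceCells(), the local LES cells that touch a processor boundary, and neighbProcNo() gives the rank of the neighbouring processor to exchange data with. This is only a minimal sketch: it assumes each LEM line can be flattened into a scalarList before sending (the actual LEM class would need Istream/Ostream operators, or an explicit pack/unpack step, to be streamed directly), and lemDataForCell() is a hypothetical helper, not an OpenFOAM function. Code:
// requires "processorPolyPatch.H"
const polyBoundaryMesh& patches = mesh.boundaryMesh();

forAll(patches, patchI)
{
    if (isA<processorPolyPatch>(patches[patchI]))
    {
        const processorPolyPatch& procPatch =
            refCast<const processorPolyPatch>(patches[patchI]);

        // Local LES cells adjacent to this processor boundary
        const labelUList& ownCells = procPatch.faceCells();

        // Flatten the LEM line of each boundary cell into a scalarList
        // (lemDataForCell() is a hypothetical helper, not OpenFOAM)
        List<scalarList> sendLines(ownCells.size());
        forAll(ownCells, i)
        {
            sendLines[i] = lemDataForCell(ownCells[i]);
        }

        // Send the boundary LEM lines to the neighbouring processor ...
        {
            OPstream toNbr(Pstream::blocking, procPatch.neighbProcNo());
            toNbr << sendLines;
        }

        // ... and receive the neighbouring processor's boundary LEM lines
        List<scalarList> nbrLines;
        {
            IPstream fromNbr(Pstream::blocking, procPatch.neighbProcNo());
            fromNbr >> nbrLines;
        }

        // nbrLines[i] corresponds to the cell on the other side of face i
        // of this processor patch; it can serve as the local copy needed
        // for splicing across the processor boundary.
    }
}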
|
August 11, 2014, 02:46 |
|
#2 |
Super Moderator
Niklas Nordin
Join Date: Mar 2009
Location: Stockholm, Sweden
Posts: 693
Rep Power: 29 |
Maybe this piece of code will help you understand.
It just sends a random valued vector from each processor to the main processor. Code:
Random rnd(0);

// get the number of processors
label n = Pstream::nProcs();

// generate a random vector; since the seed is the same for all procs
// they will be the same if we only do it once
vector localV = rnd.vector01();
for (label i = 0; i < Pstream::myProcNo(); i++)
{
    localV = rnd.vector01();
}

// print the vector on this processor (so we can compare it later when
// we print it on the main proc)
Sout << "[" << Pstream::myProcNo() << "] V = " << localV << endl;

if (Pstream::myProcNo() == 0)
{
    List<vector> allV(n, vector::zero);
    allV[0] = localV;

    for (label i = 1; i < n; i++)
    {
        // create the input stream from processor i
        IPstream vStream(Pstream::blocking, i);
        vStream >> allV[i];
    }

    // print the list of all vectors on the main proc
    Info << allV << endl;
}
else
{
    // create the stream to send to the main proc
    OPstream vectorStream(Pstream::blocking, 0);
    vectorStream << localV;
}
|
August 11, 2014, 13:49 |
|
#3 |
Senior Member
David Gaden
Join Date: Apr 2009
Location: Winnipeg, Canada
Posts: 437
Rep Power: 22 |
This looks like a good candidate for Pstream::exchangeList.
Put the data you need to send to the other processors into a list of size Pstream::nProcs(), initialize another list of the same datatype (also of size nProcs), then call Pstream::exchangeList, and the data intended for each processor will be distributed efficiently. Like this (pseudo-code): Code:
scalarListList sendData(Pstream::nProcs());
scalarListList recvData(Pstream::nProcs());

// fill in sendData with what you want to send...

Pstream::exchangeList(sendData, recvData);

// recvData now has all the data you need, including the sendData
// intended for itself
__________________
~~~ Follow me on twitter @DavidGaden |
|
December 1, 2015, 11:01 |
|
#4 |
Senior Member
Andrea Ferrari
Join Date: Dec 2010
Posts: 319
Rep Power: 17 |
Hello Niklas,
I have just one question about your example: is the list you "reconstructed" on the main processor (allV in your example) automatically available on all the other processors? If not, how can I send the global list to the processors?

I am trying to do something similar in my code (a modified version of interFoam). I have a list of vectors on each processor. What I want to do is send each list of vectors to the main processor, reconstruct the "global" list, and then make it available on all processors. The vector list contains points on a surface, and I want to calculate the minimum distance between each point of the mesh and the surface (that is why I need the global list on each processor).

Best,
andrea |
|
December 4, 2015, 10:47 |
|
#5 |
Senior Member
Andrea Ferrari
Join Date: Dec 2010
Posts: 319
Rep Power: 17 |
After a quick search on the forum it seems Pstream::gatherList and ListListOps::combine might do what I need. Let me explain my problem in more detail. I have a pointField named "ppSurf" which contains points on a surface. In a serial calculation ppSurf contains all the points of the surface, while in a parallel calculation it contains only the points of the surface that lie on processor i.
This is what I did to gather the points on the master: Code:
//List with size equal to number of processors
List<pointField> gatheredData(Pstream::nProcs());

// Populate and gather the list onto the master processor.
gatheredData[Pstream::myProcNo()] = ppSurf;
Pstream::gatherList(gatheredData);

Then I tried to use ListListOps::combine (src/OpenFOAM/containers/Lists/ListListOps/ListListOps.H) to combine the elements of the n sublists into one big list. Code:
pointField ppSurfGlobal
(
    ListListOps::combine<pointField>
    (
        gatheredData,
        ppSurf() //<--- not sure what I have to put here
    )
);

However this does not compile. The error is at the line "ppSurf()": OpenFOAM complains that a pointField does not have an "operator()". The example in the Doxygen documentation says the second argument should be the access operator used to access the individual elements of the sublists, but I am not sure what that means in my case.

Best,
Andrea |
|
July 11, 2017, 08:33 |
|
#6 | |
Member
Hua
Join Date: May 2012
Posts: 31
Rep Power: 14 |
Quote:
ListListOps::combine<Field<Type> >
(
    allValues,
    accessOp<Field<Type> >()
)
||
November 6, 2017, 08:16 |
MWE of the solved problem
|
#7 |
Member
Bas Nieuwboer
Join Date: Mar 2013
Posts: 34
Rep Power: 13 |
Thank you all for describing this! It has helped me a lot. For me, this nearly worked; however, I had to add "Pstream::scatterList(gatheredData);" to distribute the data back to all processors, because I needed the information on all of them. If you want to use the field on all processors, this is the code to use:
Code:
//List with size equal to number of processors
List<pointField> gatheredData(Pstream::nProcs());

// Populate and gather the list onto the master processor.
gatheredData[Pstream::myProcNo()] = ppSurf;
Pstream::gatherList(gatheredData);

// Distribute the data across the different processors
Pstream::scatterList(gatheredData);

// Combine the list of pointFields into a single pointField
pointField ppSurfGlobal
(
    ListListOps::combine<Field<point> >
    (
        gatheredData,
        accessOp<Field<point> >()
    )
);

If you only need the combined field on the master processor, the scatter can be omitted: Code:
//List with size equal to number of processors
List<pointField> gatheredData(Pstream::nProcs());

// Populate and gather the list onto the master processor.
gatheredData[Pstream::myProcNo()] = ppSurf;
Pstream::gatherList(gatheredData);

// Combine the list of pointFields into a single pointField on the master processor
if (Pstream::master())
{
    pointField ppSurfGlobal
    (
        ListListOps::combine<Field<point> >
        (
            gatheredData,
            accessOp<Field<point> >()
        )
    );
}
|
September 4, 2018, 09:26 |
|
#8 |
Senior Member
Thomas Oliveira
Join Date: Apr 2015
Posts: 114
Rep Power: 12 |
Thanks, David Gaden, for your answer. It is the only one I found that suggests Pstream::exchangeList, which I considered the easiest option to use for my needs of point-to-point communication.
In OpenFOAM v5, it is called Pstream::exchange. I sketch below one way to use it, in case it is useful to someone. Pstream::exchange is not limited to List<scalar>; you can use other Containers and datatypes. Code:
scalarListList sendData(Pstream::nProcs());

// sendData[procI] is the scalarList this processor wants to send to processor procI.
// Populate it as you need. Leave it empty if no communication is required with processor procI.
// Note that sendData[Pstream::myProcNo()] can be present.

scalarListList receivedData;

Pstream::exchange<scalarList, scalar>(sendData, receivedData);

// receivedData[procI] is the scalarList received from processor procI.
// receivedData[Pstream::myProcNo()] can be present; it is equal to sendData[Pstream::myProcNo()].

forAll(receivedData, procI)
{
    if (receivedData[procI].size() > 0)
    {
        // Use receivedData[procI] as you need.
    }
}

Thomas |
|
September 4, 2018, 16:13 |
|
#9 |
Senior Member
Thomas Oliveira
Join Date: Apr 2015
Posts: 114
Rep Power: 12 |
David Gaden, thank you for pointing out Pstream::exchangeList. It is the most convenient way to perform the point-to-point communication I needed. In OpenFOAM v5, it is called Pstream::exchange. This is how I am using it. I hope it is correct and can be useful to other readers. Code:
// The scalarList listsToBeSent[procI] will be sent from this processor to processor procI.
// Populate listsToBeSent as needed. If no communication is required between this processor
// and processor procI, leave listsToBeSent[procI] empty.
// Note that Pstream::exchange is not limited to scalarList: it accepts other Containers and datatypes.
scalarListList listsToBeSent(Pstream::nProcs());

// receivedLists[procI] is the scalarList that processor procI sent to this processor
scalarListList receivedLists;

Pstream::exchange<scalarList, scalar>(listsToBeSent, receivedLists);

forAll(receivedLists, procI)
{
    if (receivedLists[procI].size() > 0)
    {
        // Use receivedLists[procI] as needed.
        // Note that receivedLists[Pstream::myProcNo()] == listsToBeSent[Pstream::myProcNo()]
    }
}

Kind regards,
Thomas |
|
October 4, 2018, 20:30 |
|
#10 |
Senior Member
Klaus
Join Date: Mar 2009
Posts: 281
Rep Power: 22 |
I want to combine the distributed parts of matrix().lduAddr().lowerAddr() on the master processor for some global matrix structure analysis.
Code:
if (Pstream::master())
{
    std::vector<int> global_lowerAddr;

    // gather and combine the local lowerAddr() lists into global_lowerAddr
}

How can I combine matrix().lduAddr().lowerAddr() on the master processor? |
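Following the gatherList / ListListOps::combine pattern shown earlier in this thread, a minimal sketch could look like the code below. It assumes matrix() is an fvMatrix or lduMatrix; note that lowerAddr() contains processor-local cell indices, so a genuinely global structure additionally needs the local-to-global cell numbering (e.g. via globalIndex). Code:
// Gather the local lower addressing of each processor onto the master
List<labelList> gatheredLower(Pstream::nProcs());
gatheredLower[Pstream::myProcNo()] = matrix().lduAddr().lowerAddr();
Pstream::gatherList(gatheredLower);

if (Pstream::master())
{
    // Concatenate the per-processor lists into one flat list
    labelList globalLowerAddr
    (
        ListListOps::combine<labelList>
        (
            gatheredLower,
            accessOp<labelList>()
        )
    );

    // globalLowerAddr now holds the lowerAddr() entries of all processors,
    // still expressed in local cell indices; add a per-processor offset
    // (e.g. from globalIndex) to obtain global matrix structure indices.
}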
|
March 30, 2019, 16:17 |
|
#11 | |
New Member
-- Country other than USA or Canada --
Join Date: Mar 2019
Posts: 3
Rep Power: 7 |
Quote:
|
||
March 22, 2023, 17:23 |
|
#12 |
Member
Uttam
Join Date: May 2020
Location: Southampton, United Kingdom
Posts: 35
Rep Power: 6 |
You are a lifesaver. Thanks a ton for this.
__________________
Best Regards Uttam ----------------------------------------------------------------- “When everything seem to be going against you, remember that the airplane takes off against the wind, not with it.” – Henry Ford. |
|
Tags |
parallel computation, parallel decomposepar, parallel processing |
|
|