How to compute the integral in the parallel case

SaddamH · December 18, 2020, 15:25

Hi everybody, I have a question regarding the computation of integrals using fvc::domainIntegrate.

How can I compute the value of integrals over the different domains defined by the parallel decomposition ?

For example, I run a case in parallel using 6 processors and I want to compute the following quantity, where Ufield is volVectorField for velocity.

double A =fvc::domainIntegrate(Ufield & fvc::div(linearInterpolate(Ufield) & Ufield.mesh().Sf(), Ufield)).value();

when I then print the value of A using the following command

std::cout << "A is " << A << " from " << "processor" + name(Pstream::myProcNo()) << endl;

I get always - 0.156 from processor 0 - printed six times. This means that the integral is computed only over the first decomposed part of the domain. Ideally, I should be able to sum the values of A given by the computation of the integral across the six subdomains but I am not able to figure out how to do this.

Could you help me in solving this issue ?

Thank you very much,
Saddam

olesen · December 18, 2020, 16:22

Quote:

Originally Posted by SaddamH

Hi everybody, I have a question regarding the computation of integrals using fvc::domainIntegrate.

How can I compute the value of integrals over the different domains defined by the parallel decomposition ?

For example, I run a case in parallel using 6 processors and I want to compute the following quantity, where Ufield is volVectorField for velocity.

double A =fvc::domainIntegrate(Ufield & fvc::div(linearInterpolate(Ufield) & Ufield.mesh().Sf(), Ufield)).value();

when I then print the value of A using the following command

std::cout << "A is " << A << " from " << "processor" + name(Pstream::myProcNo()) << endl;

I get always - 0.156 from processor 0 - printed six times. This means that the integral is computed only over the first decomposed part of the domain. Ideally, I should be able to sum the values of A given by the computation of the integral across the six subdomains but I am not able to figure out how to do this.

Could you help me in solving this issue ?

Thank you very much,
Saddam

Use Pout to see information on a per-process basis. Eg,

Code:

Pout<< "Hello from " << Pstream::myProcNo() << nl;

There are different ways to "combine" data from parallel processes. For your particular case, you are looking for a reduction. Eg,

Code:

scalar value = ...;
reduce(value, sumOp<scalar>());

Take a look through the code and you will find plenty of examples.

As a gentle hint, if you are working with OpenFOAM you will generally not use std:: cout for IO, but use OpenFOAM-specific ones. For stdout, it is "Info" which is more or less cout on the master process and /dev/null on all of the others. As soon as you start trying to print out a List or Field of values, you will notice that there are no operators for std ostream, but there are for the OpenFOAM Ostream.
Also do not use "double" and "int", in OpenFOAM they are called "scalar" and "label". It may not seem important at the moment, but if you want code that will also compile with single or mixed-precision and/or 64-bit labelling (for large meshes) it is better to use these OpenFOAM typedefs.

SaddamH · December 18, 2020, 18:37

Quote:

Originally Posted by olesen

Use Pout to see information on a per-process basis. Eg,

Code:

Pout<< "Hello from " << Pstream::myProcNo() << nl;

There are different ways to "combine" data from parallel processes. For your particular case, you are looking for a reduction. Eg,

Code:

scalar value = ...;
reduce(value, sumOp<scalar>());

Take a look through the code and you will find plenty of examples.

As a gentle hint, if you are working with OpenFOAM you will generally not use std:: cout for IO, but use OpenFOAM-specific ones. For stdout, it is "Info" which is more or less cout on the master process and /dev/null on all of the others. As soon as you start trying to print out a List or Field of values, you will notice that there are no operators for std ostream, but there are for the OpenFOAM Ostream.
Also do not use "double" and "int", in OpenFOAM they are called "scalar" and "label". It may not seem important at the moment, but if you want code that will also compile with single or mixed-precision and/or 64-bit labelling (for large meshes) it is better to use these OpenFOAM typedefs.

Thank you very much Mark for your answer and helpful advice. I would like to write the line of codes that I modified according to what you have recommended:

scalar value = fvc::domainIntegrate(L_U_SUPmodes[i] & fvc::div(
linearInterpolate(L_U_SUPmodes[j]) & L_U_SUPmodes[j].mesh().Sf(),
L_U_SUPmodes[k])).value();

reduce(value, sumOp<scalar>());

the last two lines are inside three nested loops which loop over i, j and k, and L_U_SUPmodes is a list of velocity vector fields.

I printed the value of the scalar "value" by doing this:

Pout<< "value from " << Pstream::myProcNo() << " is " << value << nl;

I found it strange that the six values printed (I used six processors for the parallel run) are all the same and equal to 0.157144 so basically the reduce command just gives that value multiplied by the number of processors.

It is worth mention that the correct value of the integral when the code is run in serial is totally different.

Could you help me in identifying the source of this discrepancy.

Thank you very much for your help,
Saddam

olesen · December 19, 2020, 11:35

Quote:

Originally Posted by SaddamH

Pout<< "value from " << Pstream::myProcNo() << " is " << value << nl;

I found it strange that the six values printed (I used six processors for the parallel run) are all the same and equal to 0.157144 so basically the reduce command just gives that value multiplied by the number of processors.

You have reached the wrong conclusion. The reduce method uses the operation that you have specified. In this case a sumOp. Thus you have summed the values from all processors, which is what you originally asked.
After the global reduction, the value MUST be the same on all processors, anything else would be a bug.

SaddamH · December 19, 2020, 12:20

Quote:

Originally Posted by olesen

You have reached the wrong conclusion. The reduce method uses the operation that you have specified. In this case a sumOp. Thus you have summed the values from all processors, which is what you originally asked.
After the global reduction, the value MUST be the same on all processors, anything else would be a bug.

Ok I think I misunderstood the purpose of the reduce method. I thought that the integration will be carried out by each processor separately, which would have required a summation step of the values of the integrals. This is not the case according to what I understand from you. However, I still don't understand why the value of the integral computed in serial (which is the right one) differs from the one computed in parallel.

Thank you very much for your help.

olesen · December 19, 2020, 13:10

I think that at this stage you need to write out what exactly you are doing. That is to say, if you are doing the reduce inside or outside of the loop etc. This might reveal some mismatch of concepts, and explain why you have a different value from the serial case. You say that the value is different than the serial case, but are not specific. Are they different by 1-2 %, different by 10 times, different by a factor equal to the number of processors? This all makes it incredibly difficult to provide anything remotely helpful.

SaddamH · December 19, 2020, 13:33

Quote:

Originally Posted by olesen

I think that at this stage you need to write out what exactly you are doing. That is to say, if you are doing the reduce inside or outside of the loop etc. This might reveal some mismatch of concepts, and explain why you have a different value from the serial case. You say that the value is different than the serial case, but are not specific. Are they different by 1-2 %, different by 10 times, different by a factor equal to the number of processors? This all makes it incredibly difficult to provide anything remotely helpful.

I am sorry for not being clear. I will write the full method which basically computes a third order tensor (with the usage of c++ the library eigen) by looping over a list of volVectorFields (velocity fields) which is defined and read in the main of my application.

Eigen::Tensor<double, 3> convectiveTensorParallel(label NUmodes,
label NPmodes,
label NSUPmodes)
{
label Csize = NUmodes + NSUPmodes + liftfield.size();
Eigen::Tensor<double, 3> C_tensor;
C_tensor.resize(Csize, Csize, Csize);

for (label i = 0; i < Csize; i++)
{
for (label j = 0; j < Csize; j++)
{
for (label k = 0; k < Csize; k++)
{
scalar value = fvc::domainIntegrate(L_U_SUPmodes[i] & fvc::div(
linearInterpolate(L_U_SUPmodes[j]) & L_U_SUPmodes[j].mesh().Sf(),
L_U_SUPmodes[k])).value();

reduce(value, sumOp<scalar>());
C_tensor(i,j,k) = value;

}
}
}

//if (Pstream:

arRun())
//{
// reduce(C_tensor, sumOp<Eigen::Tensor<double, 3>>());
//}

return C_tensor;
}

The important part is the one inside the nested loop, where I compute the value of the integral. The value of the integral for the first loop is 3.548312443e-05 if computed in serial.

In parallel I get the value 0.157144 for the integral and after the reduce command it becomes 0.9428640461 (I use six processors, this latter value is six times the earlier one)

Note that Csize is equal also to the size of the list L_U_SUPmodes.

Thank you very much.

guin · December 20, 2020, 18:06

What about splitting your code into simpler blocks that shall make your life easier when having to debug them? For instance, following lines should just work in a less cryptic manner:

Code:

// ... (after whatever update done for Ufield)

//NOTE: useful when planning to interpolate it later...
Ufield.correctBoundaryConditions(); 

//NOTE: nested just to ensure we don't mess up with your other variable names
{
    const surfaceVectorField Ufield_f = linearInterpolate(Ufield);

    const surfaceScalarField phifield = Ufield_f & Ufield.mesh().Sf();

    const volScalarField UdivU
    (
        Ufield 
      & fvc::div(phifield,"div(phifield)")
    );

    //NOTE: "fvc::domainIntegrate()" is already parallel aware
    const dimensionedScalar A = fvc::domainIntegrate(UdivU); 

    Info<<"your integrated value = "<<A.value()<< endl;
}

December 18, 2020, 15:25	How to compute the integral in the parallel case	#1
SaddamH New Member Saddam Hijazi Join Date: May 2016 Posts: 17 Rep Power: 10	Hi everybody, I have a question regarding the computation of integrals using fvc::domainIntegrate. How can I compute the value of integrals over the different domains defined by the parallel decomposition ? For example, I run a case in parallel using 6 processors and I want to compute the following quantity, where Ufield is volVectorField for velocity. double A =fvc::domainIntegrate(Ufield & fvc::div(linearInterpolate(Ufield) & Ufield.mesh().Sf(), Ufield)).value(); when I then print the value of A using the following command std::cout << "A is " << A << " from " << "processor" + name(Pstream::myProcNo()) << endl; I get always - 0.156 from processor 0 - printed six times. This means that the integral is computed only over the first decomposed part of the domain. Ideally, I should be able to sum the values of A given by the computation of the integral across the six subdomains but I am not able to figure out how to do this. Could you help me in solving this issue ? Thank you very much, Saddam

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Sampling for graphs in parallel for AMI case	C-L	OpenFOAM Bugs	5	January 15, 2018 12:55
implementation of mapFields into parallel transient case	simpomann	OpenFOAM Pre-Processing	4	August 2, 2016 05:41
Superlinear speedup in OpenFOAM 13	msrinath80	OpenFOAM Running, Solving & CFD	18	March 3, 2015 06:36
[OpenFOAM] parallel case with paraview	Ingenieur	ParaView	2	August 16, 2013 09:24
Run in parallel a 2mesh case	cosimobianchini	OpenFOAM Running, Solving & CFD	2	January 11, 2007 07:33

December 19, 2020, 13:10		#6
olesen Senior Member Mark Olesen Join Date: Mar 2009 Location: https://olesenm.github.io/ Posts: 1,715 Rep Power: 40	I think that at this stage you need to write out what exactly you are doing. That is to say, if you are doing the reduce inside or outside of the loop etc. This might reveal some mismatch of concepts, and explain why you have a different value from the serial case. You say that the value is different than the serial case, but are not specific. Are they different by 1-2 %, different by 10 times, different by a factor equal to the number of processors? This all makes it incredibly difficult to provide anything remotely helpful.