decomposed case to 2-cores (Not working)

pkr · January 28, 2011, 13:18

Hi Bruno,

Thanks for all your help.
I figured out that the problem is caused by the way Open MPI behaves:
"Unless otherwise specified, Open MPI will greedily use all TCP networks that it can find and try to connect to all peers upon demand (i.e., Open MPI does not open sockets to all of its MPI peers during MPI_INIT -- see this FAQ entry for more details). Hence, if you want MPI jobs to not use specific TCP networks -- or not use any TCP networks at all -- then you need to tell Open MPI."

When calling MPI_Reduce, Open MPI was trying to establish a TCP connection through a different network interface. The problem is solved by using the following command:
mpirun --mca btl_tcp_if_exclude lo,virbr0 -hostfile machines -np 2 /home/rphull/OpenFOAM/OpenFOAM-1.6/bin/foamExec interFoam -parallel

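The --mca btl_tcp_if_exclude option tells Open MPI's TCP transport not to use the listed network interfaces (lo and virbr0 in this case), so it only connects through the remaining ones.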

wyldckat · January 28, 2011, 15:23

Hi pkr,

Congratulations! It would have taken me a while longer to suspect that this was the cause.

I still only had the notion that something was fishy about the network or how things were connecting to each other.
And thank you for sharing the solution!

I guess this is one of those details that either comes from experience or from a dedicated MPI course!

Best regards,
Bruno

pkr · February 4, 2011, 13:40

Hi Bruno,

I would like to use Scotch partitioning instead of METIS. Please suggest a way to enable Scotch partitioning in OpenFOAM-1.6.

Thanks.

wyldckat · February 4, 2011, 18:42

Hi Pkr,

I just tested it to confirm... it's as simple as editing the file "system/decomposeParDict" and setting the method like this:

Code:

method          scotch;

Best regards,
Bruno

pkr · February 23, 2011, 23:54

Bruno,

I have a couple of questions on decomposition. Running the decomposePar utility decomposes the mesh into processor* directories. Each processor directory contains the points, cells and neighbors that the corresponding process works on.

With an OpenFOAM solver we are solving an equation of the form Ax=b, so I have the following questions:

1. Where in the code do the processor* directories get converted into the matrix A and the vectors x and b?
2. Each processor computes the matrix-vector product Ax on its own set of data. After each multiplication, the processor exchanges boundary elements with its neighbors and somehow updates the matrix-vector product to correct its own result? How does this update work?
3. How is each processor working on a small matrix equivalent to working on the complete matrix? None of the processors ever holds the complete matrix-vector product.

Let me know if you have any ideas on this.

Thanks

wyldckat · February 24, 2011, 18:59

Hi Pkr,

Sadly I don't know how OpenFOAM does the Ax=b in parallel.
But if I needed to find out, I would start looking for which solvers/functions/methods call upon the methods made available in "libPstream.so"! The source code for Pstream is in the folder "$WM_PROJECT_DIR/src/Pstream/mpi". It has the methods used for communicating between processors. If you trace back who calls these methods, you should be able to figure out how OpenFOAM does the matrix operations!

The other keywords to keep an eye out for are the classes/methods related to the pre-conditioners and respective matrix solvers that we usually define in fvSolution.
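As a rough illustration of what those fvSolution solvers and pre-conditioners do with Ax=b, here is a minimal serial sketch of a preconditioned conjugate gradient (PCG) method with a simple diagonal (Jacobi) pre-conditioner. The matrix uses a simplified LDU-style layout (a diagonal array plus one upper and one lower coefficient per internal face, addressed through owner/neighbour lists); this is my assumption of the general idea behind lduMatrix, not its real interface, and the struct and function names are hypothetical. OpenFOAM offers other pre-conditioners (e.g. DIC) selected in fvSolution. Comments mark where a parallel run would need communication between processors.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified LDU-style storage (not OpenFOAM's lduMatrix API):
// one diagonal entry per cell, one upper/lower coefficient per internal face.
struct LduMatrix {
    std::vector<double> diag;
    std::vector<double> upper;           // coefficient of neighbour cell in owner row
    std::vector<double> lower;           // coefficient of owner cell in neighbour row
    std::vector<std::size_t> owner;      // owner cell of each internal face
    std::vector<std::size_t> neighbour;  // neighbour cell of each internal face

    // y = A x
    void amul(const std::vector<double>& x, std::vector<double>& y) const {
        for (std::size_t c = 0; c < diag.size(); ++c) y[c] = diag[c] * x[c];
        for (std::size_t f = 0; f < owner.size(); ++f) {
            y[owner[f]] += upper[f] * x[neighbour[f]];
            y[neighbour[f]] += lower[f] * x[owner[f]];
        }
        // Parallel run: processor-interface contributions would be added here.
    }
};

// Local dot product; a parallel run would follow it with a global sum.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Jacobi (diagonal) preconditioned conjugate gradient for a symmetric
// positive definite A. Returns the number of iterations performed.
int pcg(const LduMatrix& A, const std::vector<double>& b,
        std::vector<double>& x, double tol, int maxIter) {
    const std::size_t n = b.size();
    std::vector<double> r(n), z(n), Ap(n);
    A.amul(x, Ap);
    for (std::size_t i = 0; i < n; ++i) {
        r[i] = b[i] - Ap[i];      // initial residual
        z[i] = r[i] / A.diag[i];  // apply pre-conditioner M^-1 = 1/diag
    }
    std::vector<double> p = z;    // first search direction
    double rz = dot(r, z);
    for (int it = 0; it < maxIter; ++it) {
        if (std::sqrt(dot(r, r)) < tol) return it;
        A.amul(p, Ap);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            z[i] = r[i] / A.diag[i];
        }
        const double rzNew = dot(r, z);
        const double beta = rzNew / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rzNew;
    }
    return maxIter;
}
```

For a 4-cell chain with diagonal 2, off-diagonals -1 and b = (1, 0, 0, 1), the exact solution is x = (1, 1, 1, 1), which the sketch reaches within 4 iterations. Each iteration needs one matrix-vector product and a few dot products, which is exactly where a parallel run needs processor-boundary exchanges and global reductions.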

Good luck!
Bruno

pkr · February 24, 2011, 22:24

Bruno, Thanks for your response. Another query:

If the mesh is already decomposed among 4 processors, is there a way for one processor to transfer some of its cells/points/faces to another processor at runtime? I am looking for some kind of join and split operation.

Thanks

deepsterblue · February 24, 2011, 22:52

Quote:

Originally Posted by pkr

1. Where in the code do the processor* directories get converted into the matrix A and the vectors x and b?
2. Each processor computes the matrix-vector product Ax on its own set of data. After each multiplication, the processor exchanges boundary elements with its neighbors and somehow updates the matrix-vector product to correct its own result? How does this update work?
3. How is each processor working on a small matrix equivalent to working on the complete matrix? None of the processors ever holds the complete matrix-vector product.

I guess I could provide a few clues:

1. The processor directories only contain sub-domains of the mesh after partitioning. Parallelisation is actually done at the lduMatrix level.

2. If you really want to see the exchange process in action, take a look at the lduMatrix class:

($FOAM_SRC)/OpenFOAM/matrices/lduMatrix/lduMatrix/lduMatrixUpdateMatrixInterfaces.C

There are (init/update)MatrixInterfaces member functions that do what you're looking for.

3. You're right - each processor does only its bit of the matrix multiply. The rest is handled through information exchange across processor boundaries. Most Krylov solvers require some form of global dot-product reduction, but besides that, this is essentially the meat-and-potatoes of it.
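Point 3 can be seen in a toy example with two "processors" simulated inside one program: a 6-cell 1D chain split into two 3-cell sub-domains. Each side multiplies with the coefficients it owns, then corrects the cell next to the processor boundary using the value received from the other side; a dot product is the sum of local partial sums followed by a global reduction (with MPI, something like MPI_Allreduce). All names are hypothetical. In this sketch the interface coefficient is the plain off-diagonal entry and is added, which is not necessarily the sign convention OpenFOAM's updateInterfaceMatrix uses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sub-domain: a local LDU-style matrix plus one processor
// interface face. Names are illustrative, not OpenFOAM's.
struct SubDomain {
    std::vector<double> diag, upper, lower;
    std::vector<std::size_t> owner, neighbour;
    std::size_t faceCell;   // local cell next to the processor boundary
    double interfaceCoeff;  // off-diagonal coefficient to the remote cell

    // Local part of y = A x, using only cells this processor owns.
    std::vector<double> localAmul(const std::vector<double>& x) const {
        std::vector<double> y(diag.size());
        for (std::size_t c = 0; c < diag.size(); ++c) y[c] = diag[c] * x[c];
        for (std::size_t f = 0; f < owner.size(); ++f) {
            y[owner[f]] += upper[f] * x[neighbour[f]];
            y[neighbour[f]] += lower[f] * x[owner[f]];
        }
        return y;
    }
};

// Each processor: local product, then a correction using the value
// received from the other side of the processor boundary.
void distributedAmul(const SubDomain& d0, const std::vector<double>& x0,
                     const SubDomain& d1, const std::vector<double>& x1,
                     std::vector<double>& y0, std::vector<double>& y1) {
    y0 = d0.localAmul(x0);
    y1 = d1.localAmul(x1);
    // Simulated exchange: each side "receives" the other's face-cell value.
    const double received0 = x1[d1.faceCell];
    const double received1 = x0[d0.faceCell];
    y0[d0.faceCell] += d0.interfaceCoeff * received0;
    y1[d1.faceCell] += d1.interfaceCoeff * received1;
}

// Global dot product: local partial sums, then the reduction (a sum).
double globalDot(const std::vector<double>& a0, const std::vector<double>& b0,
                 const std::vector<double>& a1, const std::vector<double>& b1) {
    double partial0 = 0.0, partial1 = 0.0;
    for (std::size_t i = 0; i < a0.size(); ++i) partial0 += a0[i] * b0[i];
    for (std::size_t i = 0; i < a1.size(); ++i) partial1 += a1[i] * b1[i];
    return partial0 + partial1;
}
```

With diagonal 2, off-diagonals -1 and x = (1, 4, 9 | 16, 25, 36), the two sub-domain results (-2, -2, -2 | -2, -2, 47) are exactly what the undivided 6x6 product gives, even though neither side ever holds the whole vector.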

Hope this helps.

pkr · February 24, 2011, 23:15

Thanks for your response Sandeep.

It seems that each process solves its own part of Ax=b, where A, x and b are built from the mesh sub-domain in its processor* directory. Can you explain further what you mean by parallelization at the lduMatrix level?

Regarding the exchange of messages across the interface, I don't understand why the following operation is performed:

Code:

void processorFvPatchField<scalar>::updateInterfaceMatrix(...) const
{
    scalarField pnf
    (
        procPatch_.compressedReceive<scalar>(commsType, this->size())()
    );

    const unallocLabelList& faceCells = patch().faceCells();

    forAll(faceCells, facei)
    {
        result[faceCells[facei]] -= coeffs[facei]*pnf[facei];
    }
}

Why is coeffs[]*pnf[] subtracted from the matrix-vector product result[]? How does this correct the result after partitioning?

Again, how does splitting a big mesh into smaller meshes, each solving its own Ax=b, give the global solution for the complete large mesh?

Thanks.

wyldckat · February 25, 2011, 05:19

Hi pkr,

Quote:

Originally Posted by pkr

If the mesh is already decomposed among 4 processors, is there a way for one processor to transfer some of its cells/points/faces to another processor at runtime? I am looking for some kind of join and split operation.

The redistributeMeshPar application lets you redistribute an already decomposed mesh among processors, but only offline, not while the solver is running. I think that transferring cells between processors at runtime would only be possible by following the same methodology OpenFOAM uses for dynamic meshes in parallel execution, since the processor interfaces will also change.

Best regards,
Bruno

arjun · February 25, 2011, 07:58

Quote:

Originally Posted by pkr

Thanks for your response Sandeep.

It seems that each process solves its own part of Ax=b, where A, x and b are built from the mesh sub-domain in its processor* directory. Can you explain further what you mean by parallelization at the lduMatrix level?

Regarding the exchange of messages across the interface, I don't understand why the following operation is performed:

Code:

void processorFvPatchField<scalar>::updateInterfaceMatrix(...) const
{
    scalarField pnf
    (
        procPatch_.compressedReceive<scalar>(commsType, this->size())()
    );

    const unallocLabelList& faceCells = patch().faceCells();

    forAll(faceCells, facei)
    {
        result[faceCells[facei]] -= coeffs[facei]*pnf[facei];
    }
}

Why is coeffs[]*pnf[] subtracted from the matrix-vector product result[]? How does this correct the result after partitioning?

Again, how does splitting a big mesh into smaller meshes, each solving its own Ax=b, give the global solution for the complete large mesh?

Thanks.

This is my guess. Let's say you split the main matrix into two parts:
A = A_local + A_parallel.

A_parallel holds the matrix coefficients that couple each cell to its neighbours on other processors.

To solve Ax = b, you would have to write it like this:

A_local x = b - A_parallel . x_old

This is why the code subtracts the product of the coefficients with the neighbouring (boundary) values.

This is just a guess though.

---------

Another guess is that the matrix is stored as

Ap phi_p = Sum( Al phi_l ) + b

In this case, for the matrix-vector product you would have to subtract rather than add.
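The second guess can be checked on a toy example. Suppose (an assumption in the spirit of that guess, not something confirmed in this thread) that the coefficient passed to updateInterfaceMatrix as coeffs is stored with the sign it would have on the right-hand side, i.e. as the negative of the off-diagonal matrix entry. Then result -= coeffs*pnf adds exactly the missing off-diagonal term, and the partitioned product matches the global one. The function below mirrors the loop quoted by pkr, with std::vector in place of OpenFOAM's field and list types; its name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the loop in processorFvPatchField::updateInterfaceMatrix quoted
// above, using std::vector instead of OpenFOAM field/list types.
//   result    : local matrix-vector product, modified in place
//   faceCells : local cell index next to each processor-boundary face
//   coeffs    : interface coefficient per face (ASSUMED stored as -a_offdiag)
//   pnf       : neighbour-cell values received from the other processor
void updateInterface(std::vector<double>& result,
                     const std::vector<std::size_t>& faceCells,
                     const std::vector<double>& coeffs,
                     const std::vector<double>& pnf) {
    for (std::size_t facei = 0; facei < faceCells.size(); ++facei) {
        result[faceCells[facei]] -= coeffs[facei] * pnf[facei];
    }
}
```

For example, take the global 2x2 system A = [[4, -1], [-1, 4]] with x = (3, 5), so Ax = (7, 17), and put one cell on each processor. Processor 0 computes its local product 4*3 = 12; with coeffs = -(-1) = 1 and received pnf = 5, the update gives 12 - 5 = 7. Processor 1 likewise gets 20 - 3 = 17.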

pkr · February 27, 2011, 18:39

Dear Bruno,

I have some basic questions about how the interFoam/damBreak case works:

1. What difference does it make to the case if the following change is made in controlDict:
Start time: 0
End time: 1
changed to
Start time: 0
End time: 0.25

I understand that it reduces the simulation time, but what happens to the quality/correctness of the results?

2. What equation is interFoam/damBreak actually solving? From the code in pEqn.H:

Code:

{
    volScalarField rUA = 1.0/UEqn.A();
    surfaceScalarField rUAf = fvc::interpolate(rUA);

    U = rUA*UEqn.H();

    surfaceScalarField phiU
    (
        "phiU",
        (fvc::interpolate(U) & mesh.Sf())
      + fvc::ddtPhiCorr(rUA, rho, U, phi)
    );

    adjustPhi(phiU, U, p);

    phi = phiU +
        (
            fvc::interpolate(interface.sigmaK())*fvc::snGrad(alpha1)*mesh.magSf()
          + fvc::interpolate(rho)*(g & mesh.Sf())
        )*rUAf;

    for(int nonOrth=0; nonOrth<=nNonOrthCorr; nonOrth++)
    {
        fvScalarMatrix pEqn
        (
            fvm::laplacian(rUAf, p) == fvc::div(phi)
        );

        pEqn.setReference(pRefCell, pRefValue);

        if (corr == nCorr-1 && nonOrth == nNonOrthCorr)
        {
            pEqn.solve(mesh.solver(p.name() + "Final"));
        }
        else
        {
            pEqn.solve(mesh.solver(p.name()));
        }

        if (nonOrth == nNonOrthCorr)
        {
            phi -= pEqn.flux();
        }
    }

    U += rUA*fvc::reconstruct((phi - phiU)/rUAf);
    U.correctBoundaryConditions();
}

How should the equation being solved be interpreted mathematically?
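One term-by-term reading of the pEqn.H code above, offered as an interpretation rather than an authoritative derivation. Notation (mine): A_P is UEqn.A(), H is UEqn.H(), r_A is rUA and (r_A)_f is rUAf, the symbol for the face-normal gradient stands for fvc::snGrad, R stands for fvc::reconstruct, and sums run over the faces of a cell with outward face area vectors S_f. The corr/nCorr test in the code suggests this block sits inside a PISO-style corrector loop; the non-orthogonal correction loop and setReference are left out.

```latex
\begin{align}
r_A &= 1/A_P, \qquad \mathbf{U}^{*} = r_A\,\mathbf{H} \\
\phi_U &= (\mathbf{U}^{*})_f\cdot\mathbf{S}_f + \phi_{\mathrm{ddtCorr}} \\
\phi &= \phi_U + (r_A)_f\left[(\sigma\kappa)_f\,\nabla^{\perp}_f\alpha_1\,|\mathbf{S}_f|
        + \rho_f\,\mathbf{g}\cdot\mathbf{S}_f\right] \\
\sum_f (r_A)_f\,\nabla^{\perp}_f p\,|\mathbf{S}_f| &= \sum_f \phi
        \qquad \text{(discrete form of } \nabla\cdot(r_A\nabla p) = \nabla\cdot\phi\text{)} \\
\phi^{\mathrm{new}} &= \phi - (r_A)_f\,\nabla^{\perp}_f p\,|\mathbf{S}_f|
        \quad\Longrightarrow\quad \sum_f \phi^{\mathrm{new}} = 0 \\
\phi^{\mathrm{new}} &= \phi_U + (r_A)_f\left[(\sigma\kappa)_f\,\nabla^{\perp}_f\alpha_1\,|\mathbf{S}_f|
        + \rho_f\,\mathbf{g}\cdot\mathbf{S}_f - \nabla^{\perp}_f p\,|\mathbf{S}_f|\right] \\
\mathbf{U} &= \mathbf{U}^{*} + r_A\,\mathcal{R}\!\left(\frac{\phi^{\mathrm{new}} - \phi_U}{(r_A)_f}\right)
\end{align}
```

Read this way, the pressure equation is what makes the corrected face fluxes satisfy discrete continuity, and the last two lines carry the pressure gradient, surface tension and gravity back into the fluxes and the velocity. Written for every cell, the fourth line is a linear system in the cell pressures, which is the Ax=b handed to the p solver chosen in fvSolution (e.g. PCG).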

3. How does solving this equation reduce to a PCG solver, which solves an equation of the form Ax=b?

Looking forward to hearing from you.

Thanks.

wyldckat · February 27, 2011, 19:04

Hi Pkr,

My experience in CFD is really limited, so I ask arjun, deepsterblue or any other experienced OpenFOAM user to fill in the gaps I can't answer.

Quote:

Originally Posted by pkr

I have some basic questions about how the interFoam/damBreak case works:

1. What difference does it make to the case if the following change is made in controlDict:
Start time: 0
End time: 1
changed to
Start time: 0
End time: 0.25

I understand that it reduces the simulation time, but what happens to the quality/correctness of the results?

If it's only the end time you are changing, the quality isn't affected. The solver interFoam is a two-phase transient solver, which means that the solution you get is a time-accurate simulation of one fluid flowing through another... well, snapshots of it at the write times. So, by reducing the end time from 1 to 0.25, you only get the first quarter of the originally simulated time.
This is different from, for example, simpleFoam, which is a steady-state solver; there, the time steps only count iterations and have no physical time meaning.

In interFoam, what affects the quality are other parameters, such as: maxCo, maxAlphaCo, maxDeltaT, "writeControl adjustableRunTime", deltaT.
For more information about these, and the physics/equations behind them, read the damBreak tutorial section of the OpenFOAM User Guide: http://www.openfoam.com/docs/user/damBreak.php, more specifically its "Time step control" part.

As for the other questions, I don't know enough. Again, I suggest you read the damBreak section of the User Guide; also check the fvSchemes file, which lists the discretisation schemes used for each term of the equations, and fvSolution, which sets the linear solvers used for them.
Also, check the rest of the code that interFoam uses, because it might be using pieces of code from other header files that you haven't looked into yet.

Best regards,
Bruno
