May 27, 2016, 15:43 |
cfmesh in parallel (MPI)
|
#1 |
New Member
Join Date: Dec 2015
Posts: 16
Rep Power: 11 |
Hello everyone!
I'm currently trying to run cfMesh (v1.1.1, cartesianMesh) in parallel with MPI, but I get an error that I have not managed to resolve so far. Maybe someone can help me? The error is the following:

[node033:18375] *** An error occurred in MPI_Bsend
[node033:18375] *** reported by process [46912131891201,11]
[node033:18375] *** on communicator MPI_COMM_WORLD
[node033:18375] *** MPI_ERR_BUFFER: invalid buffer pointer
[node033:18375] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node033:18375] *** and potentially your MPI job)

A similar error has already been reported (see: https://sourceforge.net/p/cfmesh/tickets/2/), but I cannot find a solution there.

Thank you in advance for your support.
Arthur
|
June 26, 2016, 12:39 |
|
#2 | |
Senior Member
|
Quote:
cfMesh runs in parallel all the time. I guess that you are referring to MPI parallelisation. It is not possible to understand much from what you posted here. Please provide a log file and an example that reproduces the problem (http://sscce.org/). Regards, Franjo
__________________
Principal Developer of cfMesh and CF-MESH+ www.cfmesh.com Social media: LinkedIn, Twitter, YouTube, Facebook, Pinterest, Instagram |
January 27, 2020, 07:26 |
|
#3 |
Member
Tom
Join Date: Apr 2017
Posts: 50
Rep Power: 9 |
Hi,
I'm having a similar issue. I am using OpenFOAM v1912 and trying to generate an aircraft mesh. I'm working on a cluster, so I need to run it in parallel using MPI. When I add a small number of refinement levels it works, but when I increase the refinement levels MPI crashes.

Here is a working meshDict:

surfaceFile "ac.stl";

maxCellSize 0.2;

objectRefinements
{
    ac3
    {
        type box;
        cellSize 0.1;
        centre (3.93106 0.998578 -0.613427);
        lengthX 14;
        lengthY 6;
        lengthZ 6;
    }
}

surfaceMeshRefinement
{
    TE
    {
        additionalRefinementLevels 4;
        surfaceFile "TE.stl";
    }
    nose
    {
        additionalRefinementLevels 3;
        surfaceFile "nose.stl";
    }
    tails
    {
        additionalRefinementLevels 3;
        surfaceFile "tails.stl";
    }
    wing
    {
        additionalRefinementLevels 2;
        surfaceFile "wing.stl";
    }
}

Here is a crashing meshDict:

surfaceFile "ac.stl";

maxCellSize 1;

objectRefinements
{
    ac1
    {
        type box;
        cellSize 0.5;
        centre (20 0.998578 -0.613427);
        lengthX 80;
        lengthY 30;
        lengthZ 30;
    }
    ac2
    {
        type box;
        cellSize 0.2;
        centre (10 0.998578 -0.613427);
        lengthX 50;
        lengthY 15;
        lengthZ 15;
    }
    ac3
    {
        type box;
        cellSize 0.1;
        centre (3.93106 0.998578 -0.613427);
        lengthX 14;
        lengthY 6;
        lengthZ 6;
    }
    ac4
    {
        type box;
        cellSize 0.05;
        centre (4.2 0.998578 -0.613427);
        lengthX 8.9;
        lengthY 2.5;
        lengthZ 3;
    }
}

Has there been a fix?
|
January 27, 2020, 08:33 |
|
#4 | |
Senior Member
|
Quote:
If the problem were due to your meshDict, the mesher would not work even without MPI. I assume the problem comes from a limited MPI buffer size that is not large enough to handle all messages. You can increase the buffer size by setting the environment variable MPI_BUFFER_SIZE, and keep increasing it until the run starts working. Alternatively, you may adjust the buffer size by setting a variable in your $WM_PROJECT_DIR/etc/controlDict. Have a look here: https://www.openfoam.com/releases/op.../usability.php Franjo
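For later readers, a minimal sketch of the two options described above. The 200 000 000 value is purely illustrative, and the mpiBufferSize optimisation switch is an assumption based on recent OpenFOAM.com releases, so check which entries your own etc/controlDict actually exposes:

# Option 1: environment variable, value in bytes; export it before launching the run
# and keep raising it until the MPI_Bsend failure disappears
export MPI_BUFFER_SIZE=200000000

// Option 2: optimisation switch in $WM_PROJECT_DIR/etc/controlDict
// (assumed entry name: mpiBufferSize; see the usability page linked above)
OptimisationSwitches
{
    mpiBufferSize   200000000;
}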
__________________
Principal Developer of cfMesh and CF-MESH+ www.cfmesh.com Social media: LinkedIn, Twitter, YouTube, Facebook, Pinterest, Instagram |
January 27, 2020, 10:26 |
|
#5 | |
Member
Tom
Join Date: Apr 2017
Posts: 50
Rep Power: 9 |
Quote:
Thanks. I updated the buffer value in /OpenFOAM/OpenFOAM-v1912/etc/controlDict and went up to 900 000 000, then reran the program, but it still seems to crash. Is there a recommended number of cores per million cells of mesh? I'm planning to generate something on the order of 100 million cells and I'm using 64 cores.
January 27, 2020, 10:41 |
|
#6 | |
Senior Member
|
Quote:
cfMesh uses shared-memory parallelization (SMP) by default, and MPI is used optionally. MPI is available for cartesianMesh only. For example, if there are 64 cores available on a single node, there is no need to use MPI: the code will not run any faster, because it already uses all cores by default. Using MPI makes sense in two cases:
1. The desired number of cores is distributed over several nodes.
2. There is not enough memory on a single node.
When using MPI, the number of MPI processes should equal the number of nodes, not the number of cores; all cores on each node are used by default.
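As a concrete illustration, a hedged sketch for a hypothetical cluster with 4 nodes of 16 cores each. The decomposeParDict entry matches what is confirmed later in the thread; the mpirun mapping flag is Open MPI syntax, and the overall invocation assumes the usual OpenFOAM-style -parallel workflow (check the cfMesh user guide for the exact procedure on your version):

// system/decomposeParDict: one MPI process per node, not per core
numberOfSubdomains 4;
method scotch;   // assumed decomposition method; keep whatever your case already uses

# launch one rank per node; cfMesh's shared-memory threads then use the cores within each node
mpirun -np 4 -npernode 1 cartesianMesh -parallel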
__________________
Principal Developer of cfMesh and CF-MESH+ www.cfmesh.com Social media: LinkedIn, Twitter, YouTube, Facebook, Pinterest, Instagram |
January 28, 2020, 05:21 |
|
#7 | |
Member
Tom
Join Date: Apr 2017
Posts: 50
Rep Power: 9 |
Quote:
My HPC architecture has 16 cores per node, so once I adjusted for that (i.e. set the number of domains to 4 in decomposeParDict, one per node) and also increased the buffer size as previously advised, I managed to get it to run smoothly.
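For completeness, a hypothetical batch-script sketch that combines the steps from this thread for a 4-node, 16-cores-per-node layout. The SLURM directives, the 200 000 000 byte buffer value, and the OMP_NUM_THREADS hint are illustrative assumptions, not taken from the thread:

#!/bin/bash
#SBATCH --nodes=4               # four nodes in total
#SBATCH --ntasks-per-node=1     # one MPI rank per node, as advised above
#SBATCH --cpus-per-task=16      # leave all 16 cores of each node to cfMesh's shared-memory threads

export MPI_BUFFER_SIZE=200000000   # enlarged MPI buffer in bytes; raise further if MPI_Bsend still fails
export OMP_NUM_THREADS=16          # assumption: cfMesh's OpenMP layer picks up this variable

# numberOfSubdomains in system/decomposeParDict must match the 4 ranks
mpirun -np 4 cartesianMesh -parallel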