Open MPI-fork() error

#1 | June 29, 2012, 07:38 | Jian Zhong (zxj160), Senior Member, Birmingham
Hi,

I ran a parallel case from 0 s to 20.8 s, but at 20.8 s it fails with the error below. Does anyone know how to solve it?

Time = 20.8
Courant Number mean: 0.208474 max: 0.316474
DILUPBiCG: Solving for Ux, Initial residual = 0.000518377, Final residual = 1.4007e-06, No Iterations 1
DILUPBiCG: Solving for Uy, Initial residual = 0.0138311, Final residual = 4.40116e-06, No Iterations 2
DILUPBiCG: Solving for Uz, Initial residual = 0.0111841, Final residual = 3.21597e-06, No Iterations 2
DILUPBiCG: Solving for C, Initial residual = 0.000130826, Final residual = 5.54248e-08, No Iterations 1
[1] #0 Foam::error::printStack(Foam::Ostream&)
[5] #0 Foam::error::printStack(Foam::Ostream&)
[3] #0 Foam::error::printStack(Foam::Ostream&)
[7] #0 Foam::error::printStack(Foam::Ostream&)
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: u2n126 (PID 19527)
MPI_COMM_WORLD rank: 1
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.

#2 | June 30, 2012, 08:37 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Greetings zxj160,

Not much information to go on with to diagnose the issue.

According to the error message, it looks like you're trying to launch another application from within the solver during parallel execution.

I've searched online for the last message line and picked up on this:
Quote:
Originally Posted by http://webstokes.ist.ucf.edu/forum/viewtopic.php?f=10&t=101#p247

Change the mpirun line from:
Code:
mpirun -machinefile $PBS_NODEFILE -np $NP $EXECUTABLE
to:
Code:
mpirun --mca mpi_warn_on_fork 0 -machinefile $PBS_NODEFILE -np $NP $EXECUTABLE
Please let me know if this works for you.
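
As an alternative to the command-line flag, Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>. A minimal sketch, assuming the job script is a POSIX shell script and reusing the variable names from the quoted mpirun line; note that this only hides the warning and does not address whatever made the process call fork():

```shell
#!/bin/sh
# Silence the Open MPI fork() warning for every mpirun started from this shell.
# Equivalent to passing "--mca mpi_warn_on_fork 0" on the mpirun command line.
export OMPI_MCA_mpi_warn_on_fork=0

# Then launch as before (PBS_NODEFILE, NP and EXECUTABLE come from the job script):
# mpirun -machinefile "$PBS_NODEFILE" -np "$NP" "$EXECUTABLE"
```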
Best regards,
Bruno

#3 | June 30, 2012, 13:42 | Jian Zhong (zxj160), Senior Member, Birmingham
Hi, many thanks for your reply. The crash may depend on the values I set for some constant scalars. After changing those values, the case now runs for longer.

By the way, I find that the cyclic inlet and outlet patches do not accept any other type of BC (e.g. zeroGradient), only cyclic, for all variables. My velocity at the inlet and outlet is cyclic, but I want to set the passive scalar to fixedValue (0) at the inlet and zeroGradient at the outlet. Do you have any idea how to do this? Many thanks.

#4 | June 30, 2012, 18:49 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Mmm... I vaguely remember that OpenFOAM calls an external application when it wants to do a printStack (i.e., when it tries to do a controlled crash and show how it got where it is when it crashes)... so that's why it wants to fork()...

Have you tried one of the "directMapped*" BCs instead of "cyclic"?

#5 | July 1, 2012, 07:13 | Jian Zhong (zxj160), Senior Member, Birmingham
I have not tried the 'directMapped' BCs, but I have heard of them. However, I define the cyclic BC for the inlet and outlet in the blockMeshDict, and I do not know whether cyclic patches accept 'directMapped' BCs. If I used 'directMapped' BCs, would the inlet and outlet still be cyclic, or would they take the values I mapped?

I know that fan, a BC derived from cyclic, is used like this:
ad
{
    type            fan;
    patchType       cyclic;
    f               List<scalar> 2(10.0 -1.0);
    value           uniform 0;
}

I want to set
inlet
{
    type            fixedValue;
    patchType       cyclic;
    value           uniform 0;
}
But I remember someone saying that a cyclic patch can only accept cyclic and BCs derived from it (e.g. the fan BC), so I do not know whether my inlet idea is correct.

#6 | July 1, 2012, 07:14 | Jian Zhong (zxj160), Senior Member, Birmingham
I also want to set
outlet
{
    type            zeroGradient;
    patchType       cyclic;
}

#7 | July 1, 2012, 16:27 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Hi zxj160,

Unfortunately I don't know.
All I know is that the cyclic boundary condition is conceptually similar to the symmetry boundary condition, in the sense that it does everything on its own. directMapped samples the result at one end and applies it at the other; the particular sampling location has to be defined in the "polyMesh/boundary" file.

And judging by the cyclic fan example you gave, it looks like you'll have to code your own BC derived from the cyclic one, in case directMapped or the BCs derived from it don't do what you want!

Best regards,
Bruno

#8 | July 4, 2012, 11:39 | Jian Zhong (zxj160), Senior Member, Birmingham
Dear Bruno,

I am trying to use directMapped, but I am new to it and do not know how to use it. Could you explain the meaning of each keyword? The following comes from the pisoFoam/pitzDailyDirectMapped tutorial.

blockMeshDict
inlet
{
    type            directMappedPatch;
    offset          ( 0.0495 0 0 );
    sampleRegion    region0;
    sampleMode      nearestCell;
    samplePatch     none;
}

0/U
boundaryField
{
    inlet
    {
        type            directMapped;
        value           uniform (10 0 0);
        interpolationScheme cell;
        setAverage      true;
        average         (10 0 0);
    }
    // (remaining patches omitted)
}

The distance between my inlet and outlet is 30 m.

Many thanks,
Jian

#9 | July 4, 2012, 19:23 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Hi Jian,

Most of the parameters here are self-explanatory. When in doubt about the other options for most of those parameters, use "bananas": set the entry to an invalid value such as banana, and OpenFOAM will stop and list the valid options.

As for "offset", it's sort-of simple: it indicates the position, relative to this patch, of the location where the cell data is sampled.
  1. Imagine the layer of cells next to the outlet patch.
  2. Visualize the plane that passes through the centers of the cells in that layer, or at least a plane that goes through all of the relevant cells...
  3. The offset is the location of that plane near the outlet relative to the inlet patch (the origin of this frame of reference).
Conceptually, it would make a lot more sense to simply state the name of the other patch and get information directly from it. Unfortunately, AFAIK OpenFOAM's infrastructure doesn't give that much freedom, so it requires this trick of referencing the cells near the other patch.
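
To make the three steps concrete for the 30 m inlet-to-outlet distance mentioned earlier in the thread, here is a minimal sketch of the arithmetic. It assumes the flow direction is the x axis and that the cells next to the outlet are 0.5 m long in that direction; both are placeholders, not values from the actual mesh:

```shell
#!/bin/sh
# Hypothetical geometry: adjust both values to the real mesh.
L=30      # inlet-to-outlet distance in m (from the post above)
dx=0.5    # ASSUMED streamwise size of the cells next to the outlet, in m

# The plane through the centres of the last cell layer sits half a cell
# upstream of the outlet, so the offset measured from the inlet is L - dx/2.
offset_x=$(awk -v L="$L" -v dx="$dx" 'BEGIN { printf "%g", L - dx / 2 }')

# Entry to paste into the inlet patch definition.
echo "offset ( $offset_x 0 0 );"   # prints: offset ( 29.75 0 0 );
```

For a mesh that is not aligned with x, the same idea applies, but all three components of the offset vector would need to be worked out.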

Best regards,
Bruno

#10 | January 17, 2013, 03:47 | Bernhard, Senior Member, Delft
I am experiencing the same kind of error with OpenFOAM 2.1.1 while using the mapped boundary condition. Do any of you know if there are updates on this MPI-fork() issue?

Is it safe to run with mpi_warn_on_fork switched off?

#11 | January 17, 2013, 10:08 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Greetings Bernhard,

I suppose it's somewhat safe to turn off that warning, even if it's just for one test run.

But the problem might be larger, since if it's triggering that warning, that's very likely because your case is crashing. As I wrote above, the printStack method launches an application that helps diagnose where the crash occurred.
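
One way to look for the underlying crash is to search the solver log for the lines that usually accompany it. A rough sketch, assuming the solver output was redirected to a file; the search patterns are my guess at typical OpenFOAM crash messages and may need extending:

```shell
#!/bin/sh
# Print up to five log lines (with line numbers) that hint at the real crash,
# i.e. what went wrong before Open MPI complained about fork().
# The patterns below are assumptions about common OpenFOAM crash output.
first_crash_lines() {
    grep -n -E 'FOAM FATAL|sigFpe|sigSegv|Floating point exception|Segmentation fault' "$1" | head -n 5
}
```

Called as `first_crash_lines log.solver` (the log file name is hypothetical), it prints nothing if none of the patterns occur in the log.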

As for the direct mapped problem, it would be good to know if you're able to reproduce the same error with a simple case or a modified tutorial case!

By the way, knowing which MPI version you're using could also help! And if you're using GridEngine, do not use Open-MPI 1.5.3 that comes with OpenFOAM. Either downgrade to Open-MPI 1.4.x or upgrade to 1.6.x.

Best regards,
Bruno

#12 | January 17, 2013, 17:27 | Bernhard, Senior Member, Delft
I am quite convinced that it is not my case that is causing the error, and I will try to construct a minimal reproducible case. By the way, if I replace mapped with fixedValue, there are no issues.

I am using PBS and OpenMPI 1.4.4. Are there any known issues with this set-up?

#13 | January 18, 2013, 18:11 | Bernhard, Senior Member, Delft
OK, so I set up my case again. I did not do anything special or different from the earlier situation, but for some reason I no longer encounter this issue. wyldckat, I think I have to agree with you on the printStack, but I am still a bit puzzled...

#14 | January 18, 2013, 18:50 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Hi Bernhard,

I can't recall any problems with PBS and Open-MPI 1.4.4, so my guess is that the problem is related to a crash.

As for things now working as intended, there are a few possible scenarios:
  1. The previous "constant/polyMesh/boundary" file might have been somewhat damaged. Re-doing the case set-up might have cleared things up.
  2. Minor mesh changes or domain decomposition may affect how things are working. In particular, you might be triggering a bug related to "directMapped" that only occurs when the domain is decomposed in a certain way.
  3. Cluster occupancy or hardware stability might affect how OpenFOAM is operating.
Either way, without a test case, it's very hard to diagnose the problem.

Best regards,
Bruno

#15 | January 19, 2013, 15:39 | Bernhard, Senior Member, Delft
Hi Bruno.

1. I did not change the boundary file for the new setup.
2. I tried quite a few decompositions, so I can rule that out.
3. On the cluster I've used here, I have the nodes available for myself and the system is maintained by a bunch of professionals, and I did not have any hardware stability issues ever.

Now that I am rechecking things, I see that in the failed case I have both a .gz and an uncompressed version of some files. I don't know which ones were read, but for some variables it was probably not the intended one. I have been looking for the error in the wrong spot, as this has to be it. OpenFOAM should read the files according to the settings in the controlDict, but I think it does not do so.

#16 | January 19, 2013, 15:50 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Hi Bernhard,

Quote:
Originally Posted by Bernhard
OpenFOAM should read the files according to the settings in the controlDict, but I think it does not do so.
Honestly, I don't know what OpenFOAM's reading priority is, but I do know that it has to ignore "controlDict" when reading (as far as compressed vs uncompressed is concerned), because you might want to switch between compressed and uncompressed output between time steps, so the previous state still has to be readable!

But that's one interesting detail that I'll try to keep in mind: always check if there are duplicate files inside the "constant" and time folders!!
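
A quick way to run that check is a small shell function that lists every file present both as name and name.gz. This is only a sketch, assuming a POSIX shell and file names without newlines; for a decomposed case it would need to be pointed at the case root so that the processor directories are searched as well:

```shell
#!/bin/sh
# List every file under directory $1 that exists both compressed (name.gz)
# and uncompressed (name), since it is unclear which copy gets read.
find_duplicates() {
    find "$1" -type f -name '*.gz' | while IFS= read -r gz; do
        plain="${gz%.gz}"          # strip the .gz suffix
        if [ -f "$plain" ]; then
            printf '%s\n' "$plain"
        fi
    done
}
```

Running `find_duplicates .` from inside the case directory prints one path per duplicated file and nothing if the case is clean.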

Best regards,
Bruno

#17 | December 18, 2014, 16:58 | smraniaki, New Member
I've had the same problem. This is one of those problems that drive me crazy! Once, I solved it, for no apparent reason, by modifying the fvSolution dictionary (changing a smoother). Another time I got rid of it by changing my decomposition scheme. I still don't know how or why it happens, but apparently it comes from ghost cells that are not identifiable by MPI.

Good luck
Smran

#18 | February 16, 2016, 09:13 | Illya Shevchuk (linch), Senior Member, Darmstadt, Germany
Is there any update to this issue?

#19 | February 21, 2016, 14:03 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Quote:
Originally Posted by linch
Is there any update to this issue?
Quick answer: No one ever managed to give me more details on how to reproduce this error, so I haven't managed to find a solution for it.

If you can provide me with more details, then it'll be easier to diagnose and solve the problem.
If not, then try the information provided here: Notes about running OpenFOAM in parallel - especially this information:
Quote:
Is the output from mpirun (Open-MPI) only coming out at the end of the run? Check this post: mpirun openfoam output is buffered, only output at the end post #9

#20 | February 22, 2016, 05:04 | Illya Shevchuk (linch), Senior Member, Darmstadt, Germany
Thanks a lot, Bruno.

I regularly received the error using OF2.1.x on our local computational cluster. But since it has moved to a new OS this weekend, I'll first have to test if the problem still persists.