|
October 17, 2008, 05:44 |
|
#1 |
Member
Niklas Wikstrom
Join Date: Mar 2009
Posts: 86
Rep Power: 17 |
Description:
Lately I have several times been running into the following problem. It is repeatable with the same case on two different hardware architectures, and with both the icc and gcc compilers: during shell refinement iteration (>1) an MPI error occurs:

[dagobah:01576] *** An error occurred in MPI_Bsend
[dagobah:01576] *** on communicator MPI_COMM_WORLD
[dagobah:01576] *** MPI_ERR_BUFFER: invalid buffer pointer
[dagobah:01576] *** MPI_ERRORS_ARE_FATAL (goodbye)

The problem does not occur using e.g. 2 processes.
Solver/Application: snappyHexMesh
Testcase: snappyHexMesh-coarse.tgz
Platform: linux i686 and x86-64; the latter tested with gcc and icc
Version: 1.5.x (2008-10-10)
Notes: To run:
blockMesh
decomposePar
foamJob -p -s snappyHexMesh
Cheers Niklas |
|
October 17, 2008, 12:40 |
|
#2 |
Senior Member
Mattijs Janssens
Join Date: Mar 2009
Posts: 1,419
Rep Power: 26 |
Have you checked for 'if (user() == "Niklas")' ;-)
Your case should work with the 17-10 version onwards; I fixed some unsynchronised communications. |
|
October 19, 2008, 15:51 |
|
#3 |
Member
Niklas Wikstrom
Join Date: Mar 2009
Posts: 86
Rep Power: 17 |
;-) you're great Mattijs. Thanks, and I'm pulling already. Testing tomorrow morning.
Really promising tool by the way, snappy! Awed. And it got me into learning Blender more, as well, which is fun. /Niklas |
|
October 20, 2008, 08:35 |
|
#4 |
Member
Niklas Wikstrom
Join Date: Mar 2009
Posts: 86
Rep Power: 17 |
Sorry Mattijs,
did not do it. Same situation as before, it seems. Does it work with user()== "Mattijs" ? ;-) |
|
October 20, 2008, 15:20 |
|
#5 |
Senior Member
Mattijs Janssens
Join Date: Mar 2009
Posts: 1,419
Rep Power: 26 |
It worked for me on 17/10 when I pushed those changes in. Didn't use foamJob but that is about the only difference.
|
|
October 20, 2008, 17:46 |
|
#6 |
Super Moderator
Niklas Nordin
Join Date: Mar 2009
Location: Stockholm, Sweden
Posts: 693
Rep Power: 29 |
didn't work for me either.
I see there's a new version of OpenMPI out, 1.2.8; I'm still using 1.2.6. Think that could be it? Will try 1.2.8 tomorrow and see if it matters. N |
|
October 21, 2008, 04:46 |
|
#7 |
Super Moderator
Niklas Nordin
Join Date: Mar 2009
Location: Stockholm, Sweden
Posts: 693
Rep Power: 29 |
nope, same problem...
|
|
October 21, 2008, 05:12 |
|
#8 |
Senior Member
Mattijs Janssens
Join Date: Mar 2009
Posts: 1,419
Rep Power: 26 |
Hadn't expected the MPI version to matter. Where does it go wrong? Can you post a log, or run it in separate windows (e.g. using mpirunDebug) and get a traceback? What are your Pstream settings (non-blocking)?
|
|
October 21, 2008, 07:17 |
|
#9 |
Super Moderator
Niklas Nordin
Join Date: Mar 2009
Location: Stockholm, Sweden
Posts: 693
Rep Power: 29 |
Don't know what my Pstream settings are.
Where do I check/change that? N |
|
October 21, 2008, 07:23 |
|
#10 |
Member
Niklas Wikstrom
Join Date: Mar 2009
Posts: 86
Rep Power: 17 |
etc/controlDict
Mine is nonBlocking, but I tried with blocking earlier. No difference. |
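For reference, the setting being discussed lives in OpenFOAM's global controlDict ($WM_PROJECT_DIR/etc/controlDict), not the per-case system/controlDict. A sketch of the relevant entry (surrounding entries omitted; names as in OpenFOAM-1.5):

```
// $WM_PROJECT_DIR/etc/controlDict (global, not the case's system/controlDict)
OptimisationSwitches
{
    // commsType selects how Pstream exchanges data between processors;
    // recognised values are blocking, nonBlocking and scheduled
    commsType       nonBlocking;
}
```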
|
October 21, 2008, 07:36 |
|
#11 |
Super Moderator
Niklas Nordin
Join Date: Mar 2009
Location: Stockholm, Sweden
Posts: 693
Rep Power: 29 |
nvm,
it's nonBlocking. Here's a log from mpirunDebug; I'm trying different settings now.

Shell refinement iteration 2
----------------------------
Marked for refinement due to refinement shells : 338416 cells.
Determined cells to refine in = 0.74 s
Selected for internal refinement : 339745 cells (out of 736740)
Edge intersection testing:
Number of edges : 9527741
Number of edges to retest : 8366794
Number of intersected edges : 63397
Refined mesh in = 20.68 s
After refinement shell refinement iteration 2 : cells:3114955 faces:9495395 points:3267682
Cells per refinement level:
0 19318
1 8320
2 19344
3 2848317
4 155933
5 63723

Program received signal SIGHUP, Hangup.
[Switching to Thread 182941472640 (LWP 15999)]
0x0000002a98140c92 in opal_progress () from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libopen-pal.so.0
#0 0x0000002a98140c92 in opal_progress () from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libopen-pal.so.0
#1 0x0000002a9a57f0f5 in mca_pml_ob1_probe () from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/openmpi/mca_pml_ob1.so
#2 0x0000002a97e9cd86 in MPI_Probe () from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libmpi.so.0
#3 0x0000002a979babd1 in Foam::IPstream::IPstream () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/openmpi-1.2.8/libPstream.so
#4 0x0000002a963a512f in Foam::fvMeshDistribute::receiveMesh () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libdynamicMesh.so
#5 0x0000002a963a7a9b in Foam::fvMeshDistribute::distribute () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libdynamicMesh.so
#6 0x0000002a95762e65 in Foam::meshRefinement::refineAndBalance () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#7 0x0000002a95702e7b in Foam::autoRefineDriver::shellRefine () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#8 0x0000002a9570394f in Foam::autoRefineDriver::doRefine () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#9 0x0000000000406357 in main ()
(gdb) Hangup detected on fd 0
Error detected on fd 0 |
|
October 21, 2008, 07:41 |
|
#12 |
Member
Niklas Wikstrom
Join Date: Mar 2009
Posts: 86
Rep Power: 17 |
scheduled, blocking or nonBlocking all give exactly the same result.
/N
FYI: I have problems running mpirunDebug, the same issues I've had earlier with OF shell scripts run by /bin/sh: in Ubuntu, /bin/sh -> /bin/dash. (If I change the shebang to /bin/bash it usually works, though.) Does SuSE link /bin/sh to bash? I can fix this locally; just wanted you to know. /Niklas |
|
October 21, 2008, 08:43 |
|
#13 |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,714
Rep Power: 40 |
Hi Niklas,
yes on SuSE /bin/sh -> bash, but I thought most of the OpenFOAM scripts were POSIX-compliant (except for wmakeScheduler, which uses bash). Could you be hitting this? https://bugs.launchpad.net/debian/+s...89/+viewstatus |
|
October 21, 2008, 11:06 |
|
#14 |
Member
Niklas Wikstrom
Join Date: Mar 2009
Posts: 86
Rep Power: 17 |
No, that one seems to be solved. One example problem is in mpirunDebug, line 130. Specifically, this does not work:

#!/bin/dash
nProcs=4
for ((proc=0; proc<$nProcs; proc++))
do
echo $proc
done

but results in "Syntax error: Bad for loop variable". With /bin/bash it works, though. |
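The C-style for (( ... )) loop above is a bash extension that dash does not implement. A POSIX-compliant rewrite of the same loop, which runs under dash, bash, or any POSIX shell, could look like:

```shell
#!/bin/sh
# POSIX counting loop: works in dash and bash alike, unlike the
# bash-only "for ((proc=0; proc<$nProcs; proc++))" form
nProcs=4
proc=0
while [ "$proc" -lt "$nProcs" ]
do
    echo "$proc"
    proc=$((proc + 1))   # POSIX arithmetic expansion
done
```

This prints 0 through 3, one per line, under either shell, so scripts written this way don't depend on what /bin/sh points to.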
|
October 21, 2008, 13:39 |
|
#15 |
Senior Member
Mattijs Janssens
Join Date: Mar 2009
Posts: 1,419
Rep Power: 26 |
What is MPI_BUFFER_SIZE set to? I am running with 200000000 or even bigger. It transfers whole sections of the mesh across, so it might run into problems with a small buffer. Had hoped MPI would give a nicer error message though :-(
You cannot run it with fulldebug - there is an ordering problem in constructing the patches with the new mesh. |
|
October 22, 2008, 04:17 |
|
#16 |
Super Moderator
Niklas Nordin
Join Date: Mar 2009
Location: Stockholm, Sweden
Posts: 693
Rep Power: 29 |
Let me buy you a beer
Setting MPI_BUFFER_SIZE to 2000000000 (that's 9 zeros) solved it for me (plus changing my username to Bob the superior builder ;)). For those who need to look like I did: it's in etc/settings.(c)sh |
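For readers hunting for the same fix: a minimal sketch of applying the override from the shell, assuming (as the thread indicates) that OpenFOAM-1.5's Pstream sizes its MPI buffer from the MPI_BUFFER_SIZE environment variable set in etc/settings.sh / etc/settings.csh:

```shell
# Sketch, assuming Pstream reads MPI_BUFFER_SIZE from the environment
# (normally set in OpenFOAM-1.5's etc/settings.sh or etc/settings.csh);
# exporting a larger value before launching overrides it for this session.
export MPI_BUFFER_SIZE=2000000000   # 2 followed by nine zeros, as in the post
echo "MPI_BUFFER_SIZE=$MPI_BUFFER_SIZE"
# then launch as usual, e.g.: foamJob -p -s snappyHexMesh
```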
|
November 26, 2008, 05:06 |
|
#17 |
Member
Leonardo Honfi Camilo
Join Date: Mar 2009
Location: Delft, Zuid Holland, The Netherlands
Posts: 60
Rep Power: 17 |
Hi there,
I have been trying to run snappyHexMesh in parallel with no success at all. I am trying to run a case called "simplecase" using the following command from one directory above the case directory (I am using a quad-core 32-bit machine):

mpirun -np 4 snappyHexMesh -case simplecase -parallel

Then I get:

Points so far:8(137802 272906 280719 303252 303253 310382 325627 325731)
#0 Foam::error::printStack(Foam::Ostream&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[3]
[3]
[3] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[3] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[3] FOAM parallel run aborting
[3]
[openfoam01:05499] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 5496 on node openfoam01 exited on signal 15 (Terminated).
2 additional processes aborted (not shown)

Alternatively, if I cd to the case directory and use the command "foamJob -p -s snappyHexMesh", I get this slightly bigger error message:

#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[1]
[1]
[1] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[1] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[1] FOAM parallel run aborting
[1]
[openfoam01:04820] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
[3]
[3]
[3] cell 217431 of level 0 uses more than 8 points of equal or lower level
Points so far:8(137802 272906 280719 303252 303253 310382 325627 325731)
#0 Foam::error::printStack(Foam::Ostream&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[3]
[3]
[3] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[3] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[3] FOAM parallel run aborting
[3]
[openfoam01:04825] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1

I have tried following the advice above and increasing the MPI_BUFFER_SIZE, but that did not help.
Please help. Thanks in advance,
leo |
|
November 26, 2008, 05:52 |
|
#18 |
Senior Member
Mattijs Janssens
Join Date: Mar 2009
Posts: 1,419
Rep Power: 26 |
Did you try running the 1.5.x git repository? There are various fixes in there relating to snappyHexMesh.
|
|
November 26, 2008, 06:55 |
|
#19 |
Member
Leonardo Honfi Camilo
Join Date: Mar 2009
Location: Delft, Zuid Holland, The Netherlands
Posts: 60
Rep Power: 17 |
I did not, I will try that right away, although I downloaded OF-1.5 onto this machine only 3 weeks ago.
In any case, I tried running the case with 2 processors and, surprisingly enough, it went a lot further without breaking until I got this message:

Setting up information for layer truncation ...
mpirun noticed that job rank 1 with PID 7153 on node openfoam01 exited on signal 15 (Terminated).
1 process killed (possibly by Open MPI)
[leo@openfoam01 simplecase]$

I am still a little bit in the dark about why this keeps breaking, but I will try the git repo anyway. Thanks for the suggestion though.
regards
leo |
|
|
|