|
December 4, 2018, 10:23 |
Segmentation fault with mpi
|
#1 |
New Member
Johannes Voß
Join Date: May 2018
Posts: 13
Rep Power: 8 |
Hi there,
I got an annoying problem with mpi. Everytime I run OpenFOAM parallel with mpi I get a Segmentation fault after a randome time. This can be 7 minutes, 1 hour or even 1 day. If I run the same case seriell everything is fine. I'm using an own version of "sonicLiquidFoam" to be able to use waveTransmissive as a BC. But the same error occurred with other solvers like rhoPimpleFoam. My geometry is a long rectangle with a sphere in the middle. The error is: Code:
[19] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
[19] #1  Foam::sigSegv::sigHandler(int) at ??:?
[19] #2  ? in "/lib64/libc.so.6"
[19] #3  ? at btl_vader_component.c:?
[19] #4  opal_progress in "/home/j/j_voss/anaconda3/envs/foam/lib/./libopen-pal.so.40"
[19] #5  ompi_request_default_wait_all in "/home/j/j_voss/anaconda3/envs/foam/lib/libmpi.so.40"
[19] #6  PMPI_Waitall in "/home/j/j_voss/anaconda3/envs/foam/lib/libmpi.so.40"
[19] #7  Foam::UPstream::waitRequests(int) at ??:?
[19] #8  Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh>::Boundary::evaluate() at ??:?
[19] #9  Foam::tmp<Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh> > Foam::fvc::surfaceSum<double>(Foam::GeometricField<double, Foam::fvsPatchField, Foam::surfaceMesh> const&) at ??:?
[19] #10 ? at ??:?
[19] #11 ? at ??:?
[19] #12 __libc_start_main in "/lib64/libc.so.6"
[19] #13 ? at ??:?
[r08n11:323145] *** Process received signal ***
[r08n11:323145] Signal: Segmentation fault (11)
[r08n11:323145] Signal code: (-6)
[r08n11:323145] Failing at address: 0x9979800004ee49
[r08n11:323145] [ 0] /lib64/libc.so.6(+0x362f0)[0x2b4c959052f0]
[r08n11:323145] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b4c95905277]
[r08n11:323145] [ 2] /lib64/libc.so.6(+0x362f0)[0x2b4c959052f0]
[r08n11:323145] [ 3] /home/j/j_voss/anaconda3/envs/foam/lib/openmpi/mca_btl_vader.so(+0x41ec)[0x2b4ca902f1ec]
[r08n11:323145] [ 4] /home/j/j_voss/anaconda3/envs/foam/lib/./libopen-pal.so.40(opal_progress+0x2c)[0x2b4c9b0af32c]
[r08n11:323145] [ 5] /home/j/j_voss/anaconda3/envs/foam/lib/libmpi.so.40(ompi_request_default_wait_all+0xed)[0x2b4c98cf8e5d]
[r08n11:323145] [ 6] /home/j/j_voss/anaconda3/envs/foam/lib/libmpi.so.40(PMPI_Waitall+0x9f)[0x2b4c98d3497f]
[r08n11:323145] [ 7] /home/j/j_voss/foam/OpenFOAM-v1712/platforms/linux64GccDPInt32Opt/lib/openmpi-system/libPstream.so(_ZN4Foam8UPstream12waitRequestsEi+0x85)[0x2b4c95ca2a75]
[r08n11:323145] [ 8] TestSonicLiquidFoam(_ZN4Foam14GeometricFieldIdNS_12fvPatchFieldENS_7volMeshEE8Boundary8evaluateEv+0x1ba)[0x439a5a]
[r08n11:323145] [ 9] TestSonicLiquidFoam(_ZN4Foam3fvc10surfaceSumIdEENS_3tmpINS_14GeometricFieldIT_NS_12fvPatchFieldENS_7volMeshEEEEERKNS3_IS4_NS_13fvsPatchFieldENS_11surfaceMeshEEE+0x2cf)[0x475eaf]
[r08n11:323145] [10] TestSonicLiquidFoam[0x475fb7]
[r08n11:323145] [11] TestSonicLiquidFoam[0x427a7d]
[r08n11:323145] [12] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b4c958f1445]
[r08n11:323145] [13] TestSonicLiquidFoam[0x42a83a]
[r08n11:323145] *** End of error message ***
-------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 19 with PID 0 on node r08n11 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I already found the thread "segmentation fault--parallel problem?" but was not able to find a solution there. I tried different decomposition methods (like scotch or hierarchical), but both end in this kind of error. Also, the place where the error occurs is sometimes different; for example, the same error appears just with Code:
[132] #9  Foam::fv::gaussGrad<double>::gradf(Foam::GeometricField<double, Foam::fvsPatchField, Foam::surfaceMesh> const&, Foam::word const&) at ??:?
I also tried the new OpenFOAM version 1806, but the error stays the same. I installed my version on a cluster via Anaconda and from source as described in https://www.openfoam.com/download/install-source.php. I'm thankful for every kind of help. Best regards, Johannes |
|
December 12, 2018, 07:47 |
|
#2 |
New Member
Johannes Voß
Join Date: May 2018
Posts: 13
Rep Power: 8 |
It seems that something went wrong with the installation.
The chosen MPI version was SYSTEMOPENMPI ("WM_MPLIB=SYSTEMOPENMPI" in the etc/bashrc file), which uses the system-installed Open MPI. When I change this to "WM_MPLIB=OPENMPI", set the line under OPENMPI in etc/config.sh/mpi to "export FOAM_MPI=openmpi-1.10.4", and build this Open MPI version as a ThirdParty package, everything works fine. |
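In other words, the change boils down to roughly the following (a rough sketch only — the exact ThirdParty build steps can differ between OpenFOAM versions, and it assumes the openmpi-1.10.4 source archive is available in the ThirdParty directory): Code:
# etc/bashrc: use the ThirdParty Open MPI instead of the system one
#   export WM_MPLIB=OPENMPI           # was: SYSTEMOPENMPI

# etc/config.sh/mpi: under the OPENMPI case, pin the version
#   export FOAM_MPI=openmpi-1.10.4

# from the OpenFOAM installation directory: re-source the environment,
# build the ThirdParty MPI, then rebuild OpenFOAM against it
source etc/bashrc
( cd $WM_THIRD_PARTY_DIR && ./Allwmake )   # builds the bundled Open MPI
( cd $WM_PROJECT_DIR && ./Allwmake )       # recompiles OpenFOAM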
|
December 14, 2018, 16:47 |
|
#3 |
Member
Cyrille Bonamy
Join Date: Mar 2015
Location: Grenoble, France
Posts: 86
Rep Power: 11 |
You may have a conflict with your conda installation ...
In the error log there is a reference to your conda installation (/home/j/j_voss/anaconda3/.....libmpi.so.40). SYSTEMOPENMPI works well for me, but I do not combine conda and OpenFOAM. ;-) |
|
December 17, 2018, 13:09 |
|
#4 |
New Member
Johannes Voß
Join Date: May 2018
Posts: 13
Rep Power: 8 |
You are right, that seems to be the reason. I think SYSTEMOPENMPI picked up an MPI version installed with conda that does not work properly (or at least not with OpenFOAM).
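A quick way to check which Open MPI the SYSTEMOPENMPI setting actually resolves to is something like this (a sketch only; the solver name is the one from the log above): Code:
# which mpirun/mpicc are first on PATH (a conda environment can shadow the system MPI)
which mpirun mpicc
mpirun --version

# which libmpi an OpenFOAM solver will actually load at run time
ldd $(which TestSonicLiquidFoam) | grep -i mpi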
|
|
February 21, 2019, 04:31 |
|
#5 |
Senior Member
Lukas Fischer
Join Date: May 2018
Location: Germany, Munich
Posts: 117
Rep Power: 8 |
Hi Johannes,
I am having a similar issue, but I am not using a conda installation. Which Open MPI version selected via SYSTEMOPENMPI caused your problem? Last edited by lukasf; February 21, 2019 at 04:32. Reason: wrong thread |
|
February 21, 2019, 06:07 |
|
#6 |
New Member
Johannes Voß
Join Date: May 2018
Posts: 13
Rep Power: 8 |
Hi Lukas,
Since the error message refers to the conda Open MPI, I think it was Open MPI 3.1.0. But even if you don't use conda to install OpenFOAM, you should be able to choose the Open MPI version during installation, so you could perhaps also choose openmpi-1.10.4. |
|
February 21, 2019, 07:07 |
|
#7 |
Senior Member
Lukas Fischer
Join Date: May 2018
Location: Germany, Munich
Posts: 117
Rep Power: 8 |
I am not sure whether it is OK to run OpenFOAM 4.1 (OF4.1) with a different Open MPI version than the one it was compiled with.
This is my issue (similar to yours): I am using OF4.1 on a cluster. I wanted to compile it with the newest Open MPI version, 4.0.0. This is not possible ("ptscotch.H" cannot be found when I try to compile the ThirdParty folder). I switched to version 3.1.3, which allows me to compile OF4.1, so now I have an OpenFOAM version compiled with 3.1.3.

When I run simulations in parallel, they crash (floating point exception). Those simulations have run in the past on a different cluster with Open MPI 1.6.5. Unfortunately, it is not possible to use this version on the new cluster. To bypass this problem, I sourced Open MPI version 4.0.0, so I am now using OF4.1 (compiled with Open MPI 3.1.3) with a sourced Open MPI 4.0.0. The old simulations now run in parallel without a problem.

I then tried to run a different case, and the segmentation fault arises. I improved the mesh to reduce the possibility that it is the reason for the issue; checkMesh does not fail (I know the mesh can still be an issue, though). I used scotch to decompose my case. I will now try axial decomposition and see if it crashes at the same time. |
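A rough way to compare the compile-time and run-time MPI in such a setup (a sketch, assuming the usual OpenFOAM environment variables and the library layout seen in the log above): Code:
# what OpenFOAM was configured with
echo $WM_MPLIB $FOAM_MPI

# what is actually sourced at run time
which mpirun && mpirun --version

# which libmpi the MPI-specific Pstream layer links against
ldd $FOAM_LIBBIN/$FOAM_MPI/libPstream.so | grep -i libmpi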
|
February 26, 2019, 09:54 |
|
#8 |
Senior Member
Lukas Fischer
Join Date: May 2018
Location: Germany, Munich
Posts: 117
Rep Power: 8 |
I was able to compile OF4.1 with Open MPI 1.10.7, and I can also run it with that version.
The segmentation fault still arises. I tried the same case in parallel with different numbers of processors. The case crashes with a higher number of processors (e.g. 240 or 280) after some time (the time differs). With a lower number of processors (e.g. 168) it runs without crashing and reaches a higher end time, but this is too slow for me. Right now I think it has nothing to do with the MPI versions with which I compiled and sourced OpenFOAM. I would still be interested in any opinion. Last edited by lukasf; March 1, 2019 at 05:02. Reason: updated my post |
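For reference, re-running the same case with a different processor count or decomposition method is roughly the following (a sketch; the solver name and core count are placeholders): Code:
# edit system/decomposeParDict first, e.g.
#   numberOfSubdomains 168;
#   method             scotch;    // or hierarchical/simple with matching coeffs

decomposePar -force                               # replace old processor* directories
mpirun -np 168 pimpleFoam -parallel > log.run 2>&1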
|
October 26, 2019, 07:01 |
|
#9 | |
Member
Anonymous
Join Date: Aug 2016
Posts: 75
Rep Power: 10 |
Quote:
Did you find a solution to this issue? I am facing a similar problem in my modified pimpleFoam for scalar transport. My simulations also crash abruptly, and when I restart them they run fine again until the next abrupt MPI exit.
|
October 27, 2019, 12:50 |
|
#10 |
Senior Member
Lukas Fischer
Join Date: May 2018
Location: Germany, Munich
Posts: 117
Rep Power: 8 |
Hi,
I am able to run pimpleFoam without problems with Open MPI 1.6.5. I compiled OpenFOAM 4.1 while I had Open MPI 1.6.5 sourced. While running OF 4.1, make sure to use Open MPI 1.6.5 as well.

Content of my sourced bashrc: Code:
export PATH="/opt/openmpi/1.6.5/gcc/bin/:${PATH}"
#source openfoam
source /home/lukas/openFoam41_openMPI1.6.5/OpenFOAM-4.1/etc/bashrc

Content of my cluster runscript: Code:
# load MPI
export PATH="/opt/openmpi/1.6.5/gcc/bin/:${PATH}"
export LD_LIBRARY_PATH="/opt/openmpi/1.6.5/gcc/lib/:${LD_LIBRARY_PATH}"
# source openfoam 4.1 for centos7
. /home/lukas/openFoam41_openMPI1.6.5 |
|
Tags |
mpi error, parallel error, segmentation fault
|
|
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
mpi run problem signal 11 (segmentation fault) FGM simulation | Fedindras | CONVERGE | 1 | October 26, 2017 17:05 |
Segmentation fault in SU2 V5.0 | ygd | SU2 | 2 | March 1, 2017 05:38 |
Segmentation fault when running in parallel | Pj. | OpenFOAM Running, Solving & CFD | 3 | April 8, 2015 09:12 |
segmentation fault when installing OF-2.1.1 on a cluster | Rebecca513 | OpenFOAM Installation | 9 | July 31, 2012 16:06 |
Error using LaunderGibsonRSTM on SGI ALTIX 4700 | jaswi | OpenFOAM | 2 | April 29, 2008 11:54 |