OpenMPI error at the beginning of parallel OpenFOAM Simulation |
June 16, 2019, 08:36 |
|
#1 |
New Member
Elias Trautner
Join Date: Jun 2019
Posts: 4
Rep Power: 7 |
Hello everyone,
I am currently using a deep learning tool (TensorFlow) to evaluate an artificial neural network during my OpenFOAM simulation. To do so, I used the TensorFlow C API and wrote my own code. I had to include some headers and link against some shared libraries, but everything worked, including parallel runs with OpenMPI.

However, I then wanted to speed up the TensorFlow calls, so I compiled TensorFlow from source with AVX support enabled (which my CPU supports). This produced new headers and .so files. Now the following situation has occurred:

- Before the upgrade to AVX: both single-core runs and parallel simulations using mpirun worked without problems.
- After the upgrade to AVX: single-core runs work perfectly and are 60 % faster during the ANN usage. However, if I use mpirun on several cores, I get this error (it is repeated once for each core I want to use in parallel):

Code:
[node134:10568] *** Process received signal ***
[node134:10568] Signal: Segmentation fault (11)
[node134:10568] Signal code: Address not mapped (1)
[node134:10568] Failing at address: (nil)
[node134:10568] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7fac03c53f20]
[node134:10568] [ 1] /home/elias/OpenFOAM/elias-4.1/platforms/linux64GccDPInt32Opt/lib/libtensorflow_framework.so.1(hwloc_bitmap_and+0x14)[0x7fabe8f05534]
[node134:10568] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_hwloc_base_filter_cpus+0x380)[0x7fabcccbab80]
[node134:10568] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_ess_pmi.so(+0x2b4e)[0x7fabcbbe6b4e]
[node134:10568] [ 4] /usr/lib/x86_64-linux-gnu/libopen-rte.so.20(orte_init+0x22e)[0x7fabccf0e1de]
[node134:10568] [ 5] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_mpi_init+0x30e)[0x7fabe70a027e]
[node134:10568] [ 6] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Init+0x6b)[0x7fabe70c12ab]
[node134:10568] [ 7] /opt/OpenFOAM/OpenFOAM-4.1/platforms/linux64GccDPInt32Opt/lib/openmpi-system/libPstream.so(_ZN4Foam8UPstream4initERiRPPc+0x1f)[0x7fac03a0c43f]
[node134:10568] [ 8] /opt/OpenFOAM/OpenFOAM-4.1/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam7argListC1ERiRPPcbbb+0x719)[0x7fac04e1aed9]
[node134:10568] [ 9] tabulatedCombustionFoam(+0x279b8)[0x559e1bd079b8]
[node134:10568] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fac03c36b97]
[node134:10568] [11] tabulatedCombustionFoam(+0x30a0a)[0x559e1bd10a0a]
[node134:10568] *** End of error message ***

- Strangely, if I decompose my domain into 1 subdomain and run mpirun without the -parallel flag, it works again.

Obviously this is an issue concerning mpirun. During the compilation of TensorFlow with AVX from source (using Google's Bazel tool) I had to choose whether I wanted MPI support. Of course I said yes, and I entered the default MPI toolkit folder: /usr

Now I read in this post (Problems running OpenFOAM 2.3 in parallel) that there might be a conflict between OpenFOAM and TensorFlow trying to use different OpenMPI versions. Can you help me fix it? I am asking here because people who do not use OpenFOAM seem unable to help me with this issue.

Edit: I just noticed that if I run ./Allwmake in opt/OpenFOAM/ThirdParty, I get:

Build MPI libraries if required
+ cd openmpi
./Allwmake: 78: cd: can't cd to openmpi
+ exit 1

Last edited by tre95; June 16, 2019 at 12:40. |
|
June 16, 2019, 15:27 |
|
#2 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quick answer: In principle, you're using the same Open-MPI in the system... I'm assuming that MPICH2 is not installed at "/usr", given that mpirun gives you Open-MPI by default.
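The assumption that mpirun resolves to Open-MPI rather than MPICH can be checked rather than taken on trust. The following is a minimal sketch, not something posted in the thread: `classify_mpi` and `mpirun_flavour` are hypothetical helper names, and the keyword matching assumes that Open-MPI's version banner mentions "Open MPI" while MPICH's Hydra launcher mentions "HYDRA" or "MPICH".

```python
import shutil
import subprocess


def classify_mpi(version_text):
    """Guess the MPI implementation from the text printed by `mpirun --version`.

    Assumption: Open MPI banners contain "Open MPI" and MPICH (Hydra)
    banners contain "HYDRA" or "MPICH". Anything else is "unknown".
    """
    text = version_text.lower()
    if "open mpi" in text or "open-mpi" in text:
        return "Open MPI"
    if "mpich" in text or "hydra" in text:
        return "MPICH"
    return "unknown"


def mpirun_flavour(executable="mpirun"):
    """Run `<executable> --version` and classify it; None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        check=False,
        timeout=30,
    )
    # Some launchers print the banner on stderr, so look at both streams.
    return classify_mpi(result.stdout + result.stderr)
```

Calling `mpirun_flavour()` on the machine in question returns "Open MPI", "MPICH", "unknown", or None when no mpirun is found on the PATH.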
Building with another/custom Open-MPI version will not solve the issue. Since you are using OpenFOAM 4.1, it looks like you tripped over this bug: https://bugs.openfoam.org/view.php?id=2815
The fix for this bug is available in OpenFOAM 5, but not in 4.x. You have two choices:
|
|
June 17, 2019, 07:36 |
|
#3 |
New Member
Elias Trautner
Join Date: Jun 2019
Posts: 4
Rep Power: 7 |
Hello, thank you very much for your support!
Fortunately I did not have to make any changes (an upgrade would not have been possible, as 4.1 is the version used at the institute I work at), because the error was in TensorFlow. The issue is resolved here: https://github.com/tensorflow/tensorflow/issues/29838
It should not occur any more, since the fix has already been merged into TensorFlow's master branch. |
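For readers who hit a similar crash, the backtrace from the first post already points in this direction: frame [1] resolves `hwloc_bitmap_and` inside libtensorflow_framework.so.1, although it is called from Open-MPI's libopen-pal. The sketch below shows one way to spot such a frame mechanically. It is an illustration only: `parse_frames` and `foreign_hwloc_frames` are hypothetical helper names, and the rule "an `hwloc_` symbol resolved outside a library named libhwloc is suspicious" is an assumption, not a statement from the thread or from the linked issue.

```python
import os
import re

# Matches one backtrace frame such as:
#   [ 1] /path/lib.so.1(symbol+0x14)[0x7fabe8f05534]
FRAME = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\(([^)]*)\)\[0x[0-9a-f]+\]")

# Excerpt of the backtrace from the first post (frames 1, 2 and 6).
TRACE = """
[node134:10568] [ 1] /home/elias/OpenFOAM/elias-4.1/platforms/linux64GccDPInt32Opt/lib/libtensorflow_framework.so.1(hwloc_bitmap_and+0x14)[0x7fabe8f05534]
[node134:10568] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_hwloc_base_filter_cpus+0x380)[0x7fabcccbab80]
[node134:10568] [ 6] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Init+0x6b)[0x7fabe70c12ab]
"""


def parse_frames(trace):
    """Return (frame number, shared object path, symbol name) for each frame."""
    frames = []
    for match in FRAME.finditer(trace):
        # "symbol+0xoffset" -> "symbol"; "+0xoffset" alone -> "" (no symbol).
        symbol, _, _offset = match.group(3).rpartition("+")
        frames.append((int(match.group(1)), match.group(2), symbol))
    return frames


def foreign_hwloc_frames(frames):
    """Frames whose hwloc_* symbol comes from a library not named libhwloc*.

    Assumption: such a frame hints that another library exports its own
    copy of hwloc and the dynamic linker picked that copy.
    """
    return [
        frame
        for frame in frames
        if frame[2].startswith("hwloc_")
        and not os.path.basename(frame[1]).startswith("libhwloc")
    ]


for number, library, symbol in foreign_hwloc_frames(parse_frames(TRACE)):
    print(f"frame {number}: {symbol} resolved in {os.path.basename(library)}")
```

On the excerpt above this flags only frame 1, the `hwloc_bitmap_and` call that lands in libtensorflow_framework.so.1. The frames in libopen-pal.so.20 and libmpi.so.20 come from /usr/lib, which matches the observation that OpenFOAM and TensorFlow were both using the system Open-MPI.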
|
Tags |
mpi error |