unable to run in parallel with OpenFOAM 2.2 on CentOS |
June 9, 2014, 23:42
unable to run in parallel with OpenFOAM 2.2 on CentOS
#1
Member
einat
Join Date: Jul 2012
Posts: 31
Hello!
I have been running interFoam in serial for a while. Now that the models have grown, I want to run them in parallel. The machine is a 16-core CentOS box, running OpenFOAM 2.2 with the ThirdParty MPI. I created a decomposition following the instructions on the OpenFOAM website. When I try mpirun, I get:
Code:
lava:damBreakNoObstacle>>mpirun --hostfile machines -np 8 interFoam -parallel
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
Any ideas what I'm missing? The file "machines" holds just one line:
Code:
lava cpu=8
Thanks!!
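A side note on the hostfile: Open MPI expects the slots= keyword; the cpu= form above comes from the older LAM/MPI boot-schema syntax and may simply be ignored. A minimal single-machine hostfile (a sketch using the hostname from this thread) would be:
Code:
# machines -- one line per host; slots = number of processes to launch there
lava slots=8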
June 10, 2014, 15:02
#2
Member
Jace
Join Date: Oct 2012
Posts: 77
Try:
Code:
mpirun -n 8 interFoam -parallel
I think the --hostfile option is only needed when you have the processes distributed over multiple machines.
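As a side note, the -np value has to match numberOfSubdomains in system/decomposeParDict. A minimal sketch for 8 subdomains (the simple method and its coefficients here are illustrative, not taken from the thread):
Code:
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains  8;

method              simple;

simpleCoeffs
{
    n               (2 2 2);    // 2 x 2 x 2 = 8 subdomains
    delta           0.001;
}
Run decomposePar once before mpirun so that the processor0..processor7 directories exist.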
June 15, 2014, 00:18
Thank you, but still problems...
#3
Member
einat
Join Date: Jul 2012
Posts: 31
So strange --
Here is what I get:
Code:
lava:damBreakNoObstacle>>mpirun -np 8 interFoam -parallel
[lava:11845] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 121
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[lava:11845] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orterun.c at line 616
Any advice??
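This orte_ess_base_select failure is typically what happens when the mpirun found on the PATH belongs to a different Open MPI build than the libraries the solver was linked against. A quick way to check (a sketch, not commands from the thread):
Code:
# Which mpirun is picked up, and which version is it?
which mpirun
mpirun --version

# Which MPI library does the solver actually link against?
ldd $(which interFoam) | grep -i mpi

# What does the OpenFOAM environment think it is using?
echo $WM_MPLIB $FOAM_MPI
If ldd points into the ThirdParty tree while mpirun lives in /usr/lib64/openmpi/bin, the two are mismatched.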
June 15, 2014, 06:16
#4
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Greetings to all!
@einatlev: The following thread also addresses this issue: http://www.cfd-online.com/Forums/ope...entos-5-a.html - but the conclusion in that thread was that the problem was related to the Open MPI version being used. Since you've tried with other user accounts and it worked fine for them, here are a few questions:
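One way to compare the two accounts along these lines (a sketch; the output file paths are arbitrary):
Code:
# Run this once as each user, then diff the two files:
{ which mpirun; mpirun --version; gcc --version; env | grep -E 'FOAM|MPI'; } > /tmp/check.$USER 2>&1

diff /tmp/check.einat /tmp/check.taylorredmond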
Bruno
June 17, 2014, 22:11
Maybe problem due to groups?
#5
Member
einat
Join Date: Jul 2012
Posts: 31
Thank you Bruno.
The users all belong to their own group (name of group = name of user). The ThirdParty MPI files have rwxrwxrwx permissions and belong to user and group 503, which I suppose is some kind of default? The OpenFOAM installation is located under /usr/local and everyone uses the same installation (the only one on the server). The MPI modules have been loaded properly. Is the above information helpful?
Thanks!
Einat
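A numeric owner like 503 usually means the files were unpacked by a UID that has no named account on this machine; that is harmless as long as the files stay world-readable. To confirm (a sketch; the ThirdParty path assumes a default 2.2.x layout):
Code:
# Show numeric owner/group of the ThirdParty tree (path is a guess)
ls -ln /usr/local/OpenFOAM/ThirdParty-2.2.x

# Does UID/GID 503 map to any local account?
getent passwd 503
getent group 503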
June 18, 2014, 04:17
#6
Senior Member
Bernhard
Join Date: Sep 2009
Location: Delft
Posts: 790
Can you show the output of
Code:
which interFoam
which mpirun
June 19, 2014, 13:50
paths for both users
#7
Member
einat
Join Date: Jul 2012
Posts: 31
For myself:
Code:
lava:~>>which mpirun
/usr/lib64/openmpi/bin/mpirun
lava:~>>which interFoam
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/bin/interFoam
For the other user:
Code:
[taylorredmond@lava ~]$ which mpirun
/usr/lib64/openmpi/bin/mpirun
[taylorredmond@lava ~]$ which interFoam
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64Gcc45DPOpt/bin/interFoam
thanks!!!
June 20, 2014, 07:29
#8
Member
Franco Marra
Join Date: Mar 2009
Location: Napoli - Italy
Posts: 70
Dear Einat,
I have experienced the frustration of getting everything working in a parallel environment, and I know it is not easy. I am not an ICT specialist, but after hunting for errors many times I have at least learned to spot differences: the two outputs are not equal! Look closer (cut & paste from your post):
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/bin/interFoam
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64Gcc45DPOpt/bin/interFoam
There are two extra characters, 45, in the taylorredmond version. I suspect the two binaries were built under different environment setups, but I am not sure. Surely Bruno is much more skilled than me on this. I would suggest comparing your OpenFOAM bashrc with that of your colleague, as well as the Open MPI and gcc compiler versions loaded by the two users' shells, as Bruno already suggested.
Hoping it helps.
Regards,
Franco
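For context on why those two extra characters matter: the platform directory name is composed from the OpenFOAM environment variables, so a different WM_COMPILER produces a different binary tree. Schematically:
Code:
# $WM_OPTIONS = $WM_ARCH + $WM_COMPILER + $WM_PRECISION_OPTION + $WM_COMPILE_OPTION
#   linux64 + Gcc   + DP + Opt -> linux64GccDPOpt
#   linux64 + Gcc45 + DP + Opt -> linux64Gcc45DPOpt
echo $WM_OPTIONS     # the composed name
echo $FOAM_APPBIN    # .../platforms/$WM_OPTIONS/bin
Two users with different WM_COMPILER settings will therefore resolve interFoam to two different builds.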
June 20, 2014, 08:33
#9
Senior Member
Hi,
What are the contents of the machines file? Do you run the solver on a single node or on multiple nodes?
June 26, 2014, 01:24
You found the problem!
#10
Member
einat
Join Date: Jul 2012
Posts: 31
Thanks for pointing out the "45" characters. This really led me to finding the problem. It turns out I needed to define the following:
Code:
module load openmpi-x86_64 || export PATH=$PATH:/usr/lib64/openmpi/bin
source /usr/local/OpenFOAM/OpenFOAM-2.2.x/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=system WM_COMPILER=Gcc45 WM_MPLIB=SYSTEMOPENMPI
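With that environment in place, a typical parallel run of the case from post #1 looks like this (standard OpenFOAM utilities):
Code:
cd damBreakNoObstacle
decomposePar                        # split the case per decomposeParDict
mpirun -np 8 interFoam -parallel    # 8 local MPI processes
reconstructPar                      # reassemble the results afterwards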