unable to run in parallel with OpenFOAM 2.2 on CentOS

Old   June 9, 2014, 23:42
Default unable to run in parallel with OpenFOAM 2.2 on CentOS
  #1
Member
 
einat
Join Date: Jul 2012
Posts: 31
Hello!
I have been running interFoam in serial for a little while. Now that the models have got bigger, I want to run them in parallel. The machine is a 16-core CentOS server, OpenFOAM version 2.2, using the ThirdParty MPI.
I created a decomposition following the instructions on the OpenFOAM website. When I try to run mpirun I get:

Code:
lava:damBreakNoObstacle>>mpirun --hostfile machines -np 8 interFoam -parallel
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

Any ideas what I'm missing?
file "machines" holds just one line, saying:
lava cpu=8
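For reference, the decomposition itself was set up with a system/decomposeParDict roughly along these lines (a minimal sketch for an 8-way split; the simple method and the (2 2 2) split are assumptions, not necessarily the exact settings used):
Code:
# minimal system/decomposeParDict for 8 subdomains (sketch):
#   numberOfSubdomains 8;
#   method          simple;
#   simpleCoeffs
#   {
#       n           (2 2 2);
#       delta       0.001;
#   }
# decompose the case and check that processor0..processor7 were created:
decomposePar
ls -d processor*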

Thanks!!

Old   June 10, 2014, 15:02
Default
  #2
Member
 
Jace
Join Date: Oct 2012
Posts: 77
Quote:
Originally Posted by einatlev View Post
(the full first post, quoted above, with the opal_init / orte_init failure)
hey, try:

mpirun -n 8 interFoam -parallel

I think the --hostfile option is only needed when you have the processes distributed over multiple machines.
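In other words (a sketch, assuming the case has already been decomposed into 8 processor directories; nodeA and nodeB are placeholder host names, and Open MPI hostfiles normally give the core count per host with slots=):
Code:
# single node: no hostfile needed
mpirun -np 8 interFoam -parallel > log.interFoam 2>&1

# multiple nodes: list each machine in a hostfile, e.g.
#   nodeA slots=8
#   nodeB slots=8
# and then pass it to mpirun:
mpirun --hostfile machines -np 16 interFoam -parallel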

Old   June 15, 2014, 00:18
Default Thank you, but still problems...
  #3
Member
 
einat
Join Date: Jul 2012
Posts: 31
So strange --
Here is what I get:
Code:
lava:damBreakNoObstacle>>mpirun -np 8 interFoam -parallel
[lava:11845] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 121
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[lava:11845] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orterun.c at line 616
BUT -- when I log in as another user (not su, just another user, not me), I can run in parallel with no problem. The strangest thing is that our MPI-related environment variables appear to be the same (I even copied the other user's environment variables into mine and tried to run).

Any advice??

Old   June 15, 2014, 06:16
Default
  #4
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Greetings to all!

@einatlev: The following thread also addresses this issue: http://www.cfd-online.com/Forums/ope...entos-5-a.html - but the conclusion in that thread was that the problem was related to the Open MPI version being used.

But since you've tried other user accounts and it worked fine for them, here are a few questions:
  1. Is the installation common to all users? For example, is OpenFOAM installed at "/opt" or similar?
  2. In any of the accounts, is this command used when starting up the shell environment?
    Code:
    module load openmpi-x86_64
    You can check if it's already loaded, by running:
    Code:
    module list
  3. The other possibility is the groups to which each user belongs (a combined check is sketched just below this list). You can use:
    Code:
    groups
    in your own account, or as root:
    Code:
    groups the_user_name
    to see which groups each user belongs to.
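For example, a quick side-by-side check could look like this (just a sketch; other_user is a placeholder for the account where it works, and the su line needs root or that user's password):
Code:
# modules and MPI picked up by the current shell
module list
which mpirun && mpirun --version | head -1
# dump the MPI/OpenFOAM related environment of both accounts and diff them
env | grep -iE 'mpi|foam|wm_' | sort > my.env
su - other_user -c "env | grep -iE 'mpi|foam|wm_' | sort" > other.env
diff my.env other.env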
Best regards,
Bruno

Old   June 17, 2014, 22:11
Default Maybe problem due to groups?
  #5
Member
 
einat
Join Date: Jul 2012
Posts: 31
Thank you Bruno.
The users each belong to their own group (name of group = name of user). The ThirdParty MPI files have permissions rwxrwxrwx and belong to user and group 503, which I suppose is some kind of default?

The OpenFOAM installation is located under /usr/local and everyone uses the same installation (the only one on the server).

MPI modules have been loaded properly.

Is the above information helpful?

Thanks!
Einat

Old   June 18, 2014, 04:17
Default
  #6
Senior Member
 
Bernhard
Join Date: Sep 2009
Location: Delft
Posts: 790
Can you show the output of
Code:
which interFoam
which mpirun
for both users? There might be some unintended difference there.

Old   June 19, 2014, 13:50
Default paths for both users
  #7
Member
 
einat
Join Date: Jul 2012
Posts: 31
For myself:
Code:
lava:~>>which mpirun
/usr/lib64/openmpi/bin/mpirun
lava:~>>which interFoam
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/bin/interFoam
lava:~>>
For second user:
Code:
[taylorredmond@lava ~]$ which mpirun
/usr/lib64/openmpi/bin/mpirun
[taylorredmond@lava ~]$ which interFoam
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64Gcc45DPOpt/bin/interFoam
So it's the same as far as I can tell... Other ideas?
thanks!!!

Old   June 20, 2014, 07:29
Default
  #8
Member
 
Franco Marra
Join Date: Mar 2009
Location: Napoli - Italy
Posts: 69
Dear Einat,

I have experienced the frustration of getting everything to work in a parallel environment, and I know it is not easy. I am not an ICT specialist, but after hunting for errors many times over long hours, I have at least become experienced at spotting differences: the two outputs are not equal!

Looks better (cut&paste from your mail):
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/bin/interFoam
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64Gcc45DPOpt/bin/interFoam
So the taylorredmond version has two extra characters, 45.

I suspect the two binaries have been built under different environment setups, but I am not sure. Surely Bruno is much more skilled than me on this.

I would suggest comparing your OpenFOAM bashrc with that of your colleague, as well as the Open MPI and gcc compiler versions loaded by the two users' shells, as Bruno already suggested.
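For example, each user could run something like this and compare the output (just a sketch of the idea):
Code:
# which OpenFOAM environment and toolchain the shell is actually set up for
echo $WM_PROJECT_DIR $WM_COMPILER $WM_MPLIB
# which compiler and MPI are found first in the PATH
gcc --version | head -1
mpirun --version | head -1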

Hoping it helps.

Regards,
Franco

Old   June 20, 2014, 08:33
Default
  #9
Senior Member
 
Alexey Matveichev
Join Date: Aug 2011
Location: Nancy, France
Posts: 1,938
Hi,

Quote:
Originally Posted by einatlev View Post
Code:
lava:~>>which mpirun
/usr/lib64/openmpi/bin/mpirun
lava:~>>which interFoam
/usr/local/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/bin/interFoam
lava:~>>
In the first post you said OpenFOAM uses the ThirdParty OpenMPI, while the location of mpirun suggests you're using the system-wide installed OpenMPI. Which one was actually used to build OpenFOAM?

What are the contents of the machines file? Do you run the solver on a single node or on multiple nodes?
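One way to check which MPI OpenFOAM is configured for, and which libmpi the solver actually links against, is sketched below (a sketch; paths follow the default install layout):
Code:
# OPENMPI means the ThirdParty build, SYSTEMOPENMPI the system one
echo $WM_MPLIB $FOAM_MPI
# the libmpi.so that interFoam resolves at run time
ldd $(which interFoam) | grep -i mpi
which mpirun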

Old   June 26, 2014, 01:24
Default You found the problem!
  #10
Member
 
einat
Join Date: Jul 2012
Posts: 31
Thanks for pointing out the "45" characters. This really led me to finding the problem. It turns out I needed to define the following:

Code:
module load openmpi-x86_64 || export PATH=$PATH:/usr/lib64/openmpi/bin

source /usr/local/OpenFOAM/OpenFOAM-2.2.x/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=system WM_COMPILER=Gcc45 WM_MPLIB=SYSTEMOPENMPI
and now it works for all users, including myself.
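A quick sanity check after sourcing that environment (a sketch; the expected values follow from the lines above):
Code:
which mpirun                   # should now point at /usr/lib64/openmpi/bin/mpirun
echo $WM_MPLIB $WM_COMPILER    # should print SYSTEMOPENMPI Gcc45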
