
Unable to run OpenFOAM 1.6-ext in parallel with more than one machine



January 24, 2014, 18:35   #1
Unable to run OpenFOAM 1.6-ext in parallel with more than one machine

mahdi abdollahzadeh (mm.abdollahzadeh) - Senior Member
Join Date: Mar 2011
Location: Covilha, Portugal
Posts: 153

Dear all

I'm facing a problem when running in parallel across multiple nodes. I can run on a single node without any problem.

I am receiving this error all the time:

Quote:
error: executing task of job 1964 failed: execution daemon on host "compute-0-9.local" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 21646) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
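
The LD_LIBRARY_PATH hint in this message refers to Open MPI's ability to export environment variables to the remote processes. A minimal sketch of how that is usually done with Open MPI's standard -x option, reusing the variable names from the job script shown later in this thread only as an illustration:
Code:
# Sketch (Open MPI): explicitly export the library path to the remote ranks
mpirun -np $NSLOTS -x LD_LIBRARY_PATH $ARGS $SOLVER -parallel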
I have searched the forum, but the problem is not solved yet.
I should mention that I did not get any errors when installing OpenFOAM 1.6-ext from:
http://sourceforge.net/p/openfoam-ex...ci/1.6.1/tree/

Other users here are using OpenFOAM 2.x and they don't have this problem.

I would be very thankful for your help.


mehdi

Last edited by mm.abdollahzadeh; January 24, 2014 at 19:44.

January 24, 2014, 20:05   #2

Bruno Santos (wyldckat) - Retired Super Moderator
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45

Greetings mm.abdollahzadeh,

Not much information to work with... therefore I'll have to ask a few questions:
  1. If you are running on more than one machine, did you install 1.6-ext on all of the machines you want to run on?
    1. And did you install it on the exact same path on all of them?
  2. What is the exact and complete command line you are using for running in parallel?
  3. Which Linux Distribution are you using?
  4. What installation instructions have you followed?
  5. How are you keeping OpenFOAM versions apart from each other?
Best regards,
Bruno

January 24, 2014, 20:25   #3

mahdi abdollahzadeh (mm.abdollahzadeh)

Many thanks, Bruno.

I have installed OpenFOAM in my home folder, which is shared across all nodes.
We are using Rocks.
Below is the job script that I use:

Quote:

#!/bin/bash
#$ -S /bin/bash
#
#
# Set the Parallel Environment and number of procs.
#$ -pe mpi1 2
#
# The job will run in the actual directory
#$ -cwd
#
# Define the name of the job (name that will be displayed)
#$ -N putWhatEverYouWant
#
# Set your job output file
#$ -o log
#
# Set your job error file
#$ -e error.err
#
# Set the priority, default value is 0 (no priority)
#$ -p 0
#
# Set the email to receive news from the job
#$ -M myemail@...
#$ -m bea
#
#
# Put your Job commands here.
#------------------------------------------------

. $HOME/OpenFOAM/OpenFOAM-1.6-ext/etc/bashrc

# Defining openmpi parameters (don't change, this should be OK)
ARGS="--mca btl ^openib --mca btl_tcp_if_include eth0"


# Solver (the solver that you will run)
SOLVER=mysolver

mpirun -np $NSLOTS $ARGS $SOLVER -parallel
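
For reference, a minimal sketch of how a script like this is usually submitted and monitored, assuming the #$ directives mean the scheduler is SGE/Grid Engine and using a hypothetical file name for the script:
Code:
# Sketch (SGE/Grid Engine): submit the job script and check where the slots land
qsub runCase.sh          # "runCase.sh" is a hypothetical name for the script above
qstat -u $USER           # job state and number of slots granted
qstat -g t -u $USER      # extended view: lists the MASTER/SLAVE tasks per host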

I installed OpenFOAM 1.6-ext from http://sourceforge.net/p/openfoam-ex...ci/1.6.1/tree/

and followed the installation instructions from
http://sourceforge.net/p/openfoam-ex..._5.5_64bit.txt


best
Mehdi

January 24, 2014, 20:42   #4

Bruno Santos (wyldckat)

Hi Mehdi,

If my diagnosis is correct, the problem is that the cluster does not accept the customized Open-MPI installation that the 1.6-ext installation scripts/instructions build by default. You must use the MPI installation that the cluster already has.

Go into the folder where 1.6-ext is installed, edit the file "etc/prefs.sh" and search for these lines:
Code:
#export WM_MPLIB=SYSTEMOPENMPI
#export OPENMPI_DIR=path_to_system_installed_openmpi
#export OPENMPI_BIN_DIR=$OPENMPI_DIR/bin
Then remove the # from those 3 lines.
But now comes the tricky part in this case: figuring out which folder holds the Open-MPI installation that the cluster is using, so that you can set it in the variable "OPENMPI_DIR" on the second line.

I'm too tired right now to estimate what path it might be in, so I suggest that you ask someone you know that works with that cluster.
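
One common way to locate the system Open-MPI prefix is to look at where its mpirun binary lives; a small sketch, assuming the cluster's mpirun is already on the PATH (the /opt path in the comments is only an illustration, not necessarily this cluster's location):
Code:
# Sketch: find the prefix of the Open-MPI installation the cluster provides
which mpirun                          # e.g. /opt/openmpi/bin/mpirun
dirname $(dirname $(which mpirun))    # strips /bin/mpirun  ->  /opt/openmpi
The directory printed by the second command is the value to put into OPENMPI_DIR.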

Once you know the path to the MPI toolbox, replace "path_to_system_installed_openmpi" with that path, e.g. "/opt/openmpi-1.6.2", then save and close the file. Then start a new terminal and run Allwmake again in the main 1.6-ext folder.
Once it finishes, hopefully with success, try running in parallel once again.

Best regards,
Bruno

January 25, 2014, 20:07   #5

mahdi abdollahzadeh (mm.abdollahzadeh)

Dear Bruno,

It is somewhat solved, but not completely.
My cases now start to run; however, there are still problems: even if I choose, for example, 30 processors (no matter whether I work with orte or mpi), the case starts running on a single machine (which only has 12 processors)!!

I should mention that the other users are still running without problems.

best
mahdi

January 26, 2014, 07:09   #6

Bruno Santos (wyldckat)

Hi Mahdi,

I can't see what you're seeing, unless you share it somehow (text or pictures).

And are you certain that everything is properly compiled? What do these commands give you:
Code:
which mpirun
which mpicc
echo $FOAM_MPI_LIBBIN
ls -l $FOAM_MPI_LIBBIN
Best regards,
Bruno

January 26, 2014, 09:10   #7

mahdi abdollahzadeh (mm.abdollahzadeh)

Dear Bruno

Many thanks for your reply.

Here is the output of the commands:

which mpirun

Quote:
/opt/openmpi/bin/mpirun
which mpicc

Quote:
/opt/openmpi/bin/mpicc
echo $FOAM_MPI_LIBBIN

Quote:
/home/mahdi/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64Gcc44DPOpt/openmpi-system
ls -l $FOAM_MPI_LIBBIN
Quote:
total 144
-rwxrwxr-x 1 mahdi mahdi 93590 Jan 25 23:14 libparMetisDecomp.so
-rwxrwxr-x 1 mahdi mahdi 46052 Jan 25 23:14 libPstream.so
Here is the monitor of the cluster; there were three nodes free, so I tried to run a task with 30 processors (10 processors on each node):

Untitled.jpg

It starts to run. On the master node I can see that:

Quote:
user=mahdi, P=30, state=Running, started=1390741536.0, name=putWhatEverYouWant, nodes=10*compute-0-%d:3-3,7-7,9-9
However, it is only running on node 9:

Untitledw.jpg

best Regards
mahdi

January 26, 2014, 09:25   #8

Bruno Santos (wyldckat)

Hi Mahdi,

Do you have another version of OpenFOAM, such as 2.2.2 or any other? And are you able to use it yourself on more than one machine?

In addition, are you certain you are using the job script properly? Some job scripts need to specify the machines on which to run.

Best regards,
Bruno

January 26, 2014, 09:35   #9

mahdi abdollahzadeh (mm.abdollahzadeh)

Dear Bruno

Unfortunately, I don't have another version, but other users run OpenFOAM 2.1.0 without problems.
I think the job script is correct, but I suspect that maybe the ARGS in my script are not consistent with OpenFOAM-ext?!

best
mahdi

January 26, 2014, 10:18   #10

Bruno Santos (wyldckat)

Hi Mahdi,

You will need to ask your fellow users, since I'm not familiar with the job scheduler you are using.
The ARGS entry currently only disables the openib transport and restricts TCP traffic to the eth0 interface; it does not specify the machines to be used.

Try adding the following to the ARGS variable:
Code:
-host compute-0-3-3,compute-0-7-7,compute-0-9-9
So it should look like this:
Code:
ARGS="--mca btl ^openib --mca btl_tcp_if_include eth0 -host compute-0-3-3,compute-0-7-7,compute-0-9-9"
Best regards,
Bruno

January 26, 2014, 10:51   #11

mahdi abdollahzadeh (mm.abdollahzadeh)

Dear Bruno

Unfortunately it didn't work; it still gives me:

Quote:
error: executing task of job 2016 failed: execution daemon on host "compute-0-7" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 19670) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
error: executing task of job 2016 failed: execution daemon on host "compute-0-3" didn't accept task
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
By the way, I ran the echo $FOAM_MPI_LIBBIN command that you mentioned before under another user's account, and it doesn't print anything?!

Best Regards
Mahdi

January 26, 2014, 12:07   #12

Bruno Santos (wyldckat)

Hi Mahdi,

Quote:
Originally Posted by mm.abdollahzadeh
Unfortunately it didn't work; it still gives me:
Did you also try the one I had written?
Code:
-host compute-0-3-3,compute-0-7-7,compute-0-9-9
Quote:
Originally Posted by mm.abdollahzadeh
By the way, I ran the echo $FOAM_MPI_LIBBIN command that you mentioned before under another user's account, and it doesn't print anything?!
That depends on the version/variant of OpenFOAM being used; on 2.1.0 it is the following variable combination:
Code:
echo $FOAM_LIBBIN/$FOAM_MPI
And did it work for the other user without indicating the hosts?

Best regards,
Bruno

January 26, 2014, 12:43   #13

mahdi abdollahzadeh (mm.abdollahzadeh)

Dear Bruno

I certainly tested the command that you suggested, but it didn't work, and I got:

Quote:

error: executing task of job 2051 failed: failed sending task to execd@compute-0-7-7: can't resolve host name
error: executing task of job 2051 failed: failed sending task to execd@compute-0-3-3: can't resolve host name
--------------------------------------------------------------------------
A daemon (pid 21182) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
I guessed that I should use -host compute-0-3,compute-0-7 instead, but then I got:

Quote:
error: executing task of job 2052 failed: execution daemon on host "compute-0-7" didn't accept task
error: executing task of job 2052 failed: execution daemon on host "compute-0-3" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 4170) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
compute-0-3 - daemon did not report back when launched
compute-0-7 - daemon did not report back when launched
Best Regards
Mahdi

January 26, 2014, 14:44   #14

Bruno Santos (wyldckat)

Hi Mahdi,

Sorry, then I have absolutely no idea. You need to talk to the system administrator to assess what the problem is.
In theory you should now be using the correct MPI toolbox, so the problem should be somewhere in the job script.

Best regards,
Bruno

January 27, 2014, 10:40   #15

mahdi abdollahzadeh (mm.abdollahzadeh)

Thank you, Bruno.

I added these lines to my /etc/bashrc:

Quote:
GMP_VERSION=gmp-5.0.1
MPFR_VERSION=mpfr-3.0.1
MPC_VERSION=mpc-0.8.2

LD_LIBRARY_PATH=$WM_THIRD_PARTY_DIR/packages/$MPC_VERSION/platforms/$WM_OPTIONS/lib:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=$WM_THIRD_PARTY_DIR/packages/$MPFR_VERSION/platforms/$WM_OPTIONS/lib:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=$WM_THIRD_PARTY_DIR/packages/$GMP_VERSION/platforms/$WM_OPTIONS/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
together with the default MPI in the ThirdParty folder.
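
A quick way to check whether the missing-shared-library complaint is really gone is to confirm that the solver's libraries now resolve on a compute node. A minimal sketch, assuming the remote shell picks up the same environment and using icoFoam purely as an example binary:
Code:
# Sketch: report any shared libraries that still fail to resolve on a remote node
ssh compute-0-9 'ldd $(which icoFoam) | grep "not found"'
No output means every shared library was found.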

many thanks Bruno


best regards
mahdi
