July 23, 2014, 02:35 |
Problem running in parallel on a cluster
#1
New Member
Zhiwei Zheng
Join Date: May 2014
Posts: 23
Rep Power: 12
Hi all,
Recently I have been trying to run a case on a cluster and I have run into some problems. Note: the case is launched only on the master node; the other nodes are slaves.

When I run

Code:
mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log

I get:

Code:
[zhengzw@manager pitzDailyMapped]$ mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log
Warning: Permanently added 'n01,172.16.1.1' (RSA) to the list of known hosts.
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 16793) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

When I run

Code:
foamJob -s -p pisoFoam

I get:

Code:
[zhengzw@manager pitzDailyMapped]$ foamJob -s -p pisoFoam
Parallel processing using SYSTEMOPENMPI with 8 processors
Executing: /usr/lib64/openmpi/bin/mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log
Warning: Permanently added 'n01,172.16.1.1' (RSA) to the list of known hosts.
bash: /usr/lib64/openmpi/bin/orted: No such file or directory

followed by the same "A daemon (pid 15934) died unexpectedly with status 127" message as above.

Calling mpirun by its full path gives the same result:

Code:
[zhengzw@manager pitzDailyMapped]$ /usr/lib64/openmpi/bin/mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log
Warning: Permanently added 'n01,172.16.1.1' (RSA) to the list of known hosts.
bash: /usr/lib64/openmpi/bin/orted: No such file or directory

followed by the same "A daemon (pid 16822) died unexpectedly with status 127" message.

However, orted does exist in /usr/lib64/openmpi/bin/:

Code:
[zhengzw@manager cylinder2D1]$ ls /usr/lib64/openmpi/bin/
mpic++              mpitests-IMB-MPI1         ompi-profiler        otfconfig
mpicc               mpitests-osu_acc_latency  ompi-ps              otfdecompress
mpiCC               mpitests-osu_alltoall     ompi-server          otfdump
mpicc-vt            mpitests-osu_bcast        ompi-top             otfinfo
mpiCC-vt            mpitests-osu_bibw         opal_wrapper         otfmerge
mpic++-vt           mpitests-osu_bw           opari                otfprofile
mpicxx              mpitests-osu_get_bw       orte-bootproxy.sh    otfshrink
mpicxx-vt           mpitests-osu_get_latency  ortec++              vtc++
mpiexec             mpitests-osu_latency      ortecc               vtcc
mpif77              mpitests-osu_latency_mt   orteCC               vtCC
mpif77-vt           mpitests-osu_mbw_mr       orte-clean           vtcxx
mpif90              mpitests-osu_multi_lat    orted                vtf77
mpif90-vt           mpitests-osu_put_bibw     orte-iof             vtf90
mpirun              mpitests-osu_put_bw       orte-ps              vtfilter
mpitests-com        mpitests-osu_put_latency  orterun              vtunify
mpitests-glob       ompi-clean                orte-top             vtunify-mpi
mpitests-globalop   ompi_info                 orte_wrapper_script  vtwrapper
mpitests-IMB-EXT    ompi-iof                  otfaux
mpitests-IMB-IO     ompi-probe                otfcompress

I have no idea what is wrong. Any help will be appreciated!
July 23, 2014, 15:50 |
#2
Member
Join Date: May 2013
Location: Canada
Posts: 32
Rep Power: 13
It's difficult to help with parallel troubleshooting because there are many potential sources of error, but I can point you toward some things to investigate.

Make sure that OpenFOAM is installed in the exact same location on all computers. It looks like you have it installed in /home/zhengzw/OpenFOAM/ on your workstation. foamExec passes the environment variables from your workstation to the slave machines, so when the slaves attempt to launch OpenFOAM, they look in their own filesystem under /home/zhengzw/OpenFOAM/ for the OpenFOAM libraries and executables. This is why it's preferable to install OpenFOAM in a standard location (like /opt/ or similar) rather than a local user directory when you are running across multiple machines.

Also, if you haven't already, make sure you have password-less login enabled by copying your SSH keys appropriately across all your machines. Otherwise, OpenFOAM won't be able to read/write across nodes.

Finally, if you're trying to use a custom library, you'll have to make it accessible on all nodes as well.
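In case it's useful, here is a minimal sketch of the password-less SSH setup plus a quick check that the remote shell can actually find orted. The node name n01 and the username are taken from your log; the exact key options may differ on your system:

Code:
# On the master: generate a key pair (accept the defaults, empty passphrase)
ssh-keygen -t rsa

# Copy the public key to each slave node listed in your 'machines' hostfile
ssh-copy-id zhengzw@n01

# Verify password-less login works, and that a non-interactive shell on the
# slave can find orted -- this is the shell mpirun uses to start its daemons
ssh n01 'command -v orted || echo "orted not found in PATH"'

Also note that Open MPI's mpirun has its own --prefix option (separate from foamExec's -prefix), which prepends the given installation's bin/ and lib/ directories to PATH and LD_LIBRARY_PATH on the remote nodes. If Open MPI really is installed under /usr/lib64/openmpi on the slaves too, something like "mpirun --prefix /usr/lib64/openmpi -np 8 -hostfile machines ..." may get you past the "orted: command not found" error.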
|
July 23, 2014, 21:36 |
#3
New Member
Zhiwei Zheng
Join Date: May 2014
Posts: 23
Rep Power: 12
Hi cdm,
Thanks for your reply!
July 23, 2014, 22:13 |
#4
Member
Join Date: May 2013
Location: Canada
Posts: 32
Rep Power: 13
It really depends on your cluster setup; I'd discuss with the admin how the cluster is organized. OpenFOAM has to be installed on each computer that you expect to use for running in parallel. The admin should be able to get you set up properly, as it is likely not something you can do yourself as a local user.
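While you wait to hear back from the admin, a loop like the sketch below can confirm what each node in your hostfile actually sees. It assumes the 'machines' file lists one hostname per line (only the first token is read, so any "slots=" entries are ignored) and uses the paths from your original post:

Code:
#!/bin/bash
# For every node in the hostfile, check that foamExec exists at the
# expected path and that orted is resolvable in a non-interactive shell.
while read -r host _; do
    echo "== $host =="
    # -n stops ssh from swallowing the rest of the hostfile on stdin
    ssh -n "$host" 'ls /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec 2>/dev/null || echo "foamExec missing"'
    ssh -n "$host" 'command -v orted || echo "orted not in PATH"'
done < machines

If foamExec or orted is missing on any node, that node can't take part in the run until the installation is mirrored there.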
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Problem running with 3 CPU's in parallel | saurabh3737 | OpenFOAM Running, Solving & CFD | 5 | August 16, 2012 18:05 |
Script to Run Parallel Jobs in Rocks Cluster | asaha | OpenFOAM Running, Solving & CFD | 12 | July 4, 2012 23:51 |
RSH problem for parallel running in CFX | Nicola | CFX | 5 | June 18, 2012 19:31 |
Problem editing files when running in parallel | Ladnam | OpenFOAM | 2 | September 19, 2011 04:35 |
Problem in running Parallel | mamaly60 | OpenFOAM Running, Solving & CFD | 1 | April 19, 2010 12:11 |