July 2, 2010, 08:05 |
|
#41 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Hi Stephane,
OK, a few more possibilities:
Bruno
__________________
|
|
July 2, 2010, 10:09 |
|
#42 |
Senior Member
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18 |
Bruno,
with the below command nothing happens. No file is created.
    mpirun -n 2 `which foamExec` ./test.sh
But with the below command 2 files (6630.log and 6631.log) are created.
    mpirun -n 2 ./test.sh
I have written another application (a hello test) to test mpirun. Maybe you know it. With the below command I obtain an error message:
    mpirun --hostfile myhostfile hello
Error message:
    orted: Command not found.
    --------------------------------------------------------------------------
    A daemon (pid 6648) died unexpectedly with status 1 while attempting to launch so we are aborting.
    There may be more information reported by the environment (see above).
    This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun was unable to cleanly terminate the daemons on the nodes shown below. Additional manual cleanup may be required - please refer to the "orte-clean" tool for assistance.
    --------------------------------------------------------------------------
    cfs6 - daemon did not report back when launched
    cfs7 - daemon did not report back when launched
    cfs8 - daemon did not report back when launched
    cfs9 - daemon did not report back when launched
    cfs11 - daemon did not report back when launched
    [117]cfs10-sanchi /home/sanchi/test_openmpi % orted: Command not found.
    orted: Command not found.
    orted: Command not found.
    orted: Command not found.
With the below command (full path to mpirun) I do not get the error and the test runs:
    /shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun --hostfile myhostfile hello
    Hello World! from process 10 out of 12 on cfs11
    Hello World! from process 11 out of 12 on cfs11
    Hello World! from process 9 out of 12 on cfs10
    Hello World! from process 8 out of 12 on cfs10
    Hello World! from process 0 out of 12 on cfs6
    Hello World! from process 2 out of 12 on cfs7
    Hello World! from process 1 out of 12 on cfs6
    Hello World! from process 3 out of 12 on cfs7
    Hello World! from process 6 out of 12 on cfs9
    Hello World! from process 7 out of 12 on cfs9
    Hello World! from process 4 out of 12 on cfs8
    Hello World! from process 5 out of 12 on cfs8
Something is going wrong, because which mpirun points to that same file:
    [106]cfs10-sanchi /home/sanchi/test_openmpi % which mpirun
    /shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun
Regards,
Stephane.
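A possible explanation for the difference between the two launches above: Open MPI treats an absolute path to mpirun as if --prefix had been given, so the remote orted daemons are looked up under that installation instead of through the remote PATH. The sketch below derives the prefix from the mpirun path and prints the equivalent explicit command. The helper names prefix_from_mpirun and mpirun_with_prefix are made up for illustration, and the command is only printed, not run.

```shell
#!/bin/sh
# Derive the Open MPI installation prefix from the absolute path of mpirun,
# i.e. drop the trailing "/bin/mpirun".
prefix_from_mpirun() {
    dirname "$(dirname "$1")"
}

# Print (do not run) an mpirun command line that passes --prefix explicitly.
# "$1" is the absolute mpirun path; the remaining arguments are passed through.
mpirun_with_prefix() {
    mpirun_path=$1
    shift
    echo "$mpirun_path --prefix $(prefix_from_mpirun "$mpirun_path") $*"
}
```

For example, mpirun_with_prefix "$(which mpirun)" --hostfile myhostfile hello prints a command line that could be tried when the plain mpirun call reports "orted: Command not found".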
|
July 2, 2010, 10:54 |
|
#43 | ||
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Hi Stephane,
"Hello World!" is a great testing application.
Also, try using the -x option for launching mpirun. For example:
Code:
mpirun -n 2 -x PATH -x LD_LIBRARY_PATH ./test.sh
By the way, what method are you using for sharing the folder /shared between machines? NFS, sshfs, samba or something else? My guess is that for some reason, the way that the folder is mounted is only activated on demand. For example:
Ah, there is also another possibility: does the folder /shared exist before mounting or is it only created when it's mounted? I've had this particular problem with MSys and Cygwin, but never with Linux... but it's a possibility!
Best regards,
Bruno
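One way to test whether the remote nodes can find orted at all is to ask a non-interactive remote shell on each host from the hostfile. This is only a sketch under assumptions: password-less ssh to every node, a hostfile with the host name as the first field of each line, and a `which` command on the remote side. RSH is a hypothetical override variable (not an Open MPI setting) so the loop can be dry-run with another command.

```shell
#!/bin/sh
# Print the unique host names from a hostfile: first field of each line,
# with '#' comments and blank lines skipped.
hosts_from_hostfile() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }' | sort -u
}

# For each host, ask a non-interactive remote shell whether it can find orted.
# RSH is a hypothetical override for the remote shell command; default is ssh.
check_orted() {
    for host in $(hosts_from_hostfile "$1"); do
        if ${RSH:-ssh} "$host" 'which orted' </dev/null >/dev/null 2>&1; then
            echo "$host: orted found"
        else
            echo "$host: orted NOT found"
        fi
    done
}
```

Running check_orted myhostfile lists the nodes where the remote PATH does not include the Open MPI bin folder, which is the situation where "orted: Command not found" would be expected.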
__________________
|
|||
July 5, 2010, 05:47 |
|
#44 |
Senior Member
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18 |
Bruno,
the folder /shared exists before mounting. I can't understand it, because OF-1.6 and 1.6.x were running fine in parallel in the past.
Regards,
Stephane.
|
July 5, 2010, 08:23 |
|
#45 |
Senior Member
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18 |
Bruno,
I have noticed that mpirun of version 1.7.0 is not a link!
    [102]cfs10-sanchi /home/sanchi % ls -l `which mpirun`
    -rwx------ 1 sanchi cfs 106795 2010-07-01 14:47 /shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun
For the previous version 1.6.x the link was:
    sanchi@cfs10:~> ls -l `which mpirun`
    lrwxrwxrwx 1 sanchi cfs 7 2010-05-31 12:15 /shared/OpenFOAM/ThirdParty-1.6.x/openmpi-1.3.3/platforms/linux64GccDPOpt/bin/mpirun -> orterun
Any comments on that?
Regards,
Stephane.
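The same check can be scripted so it is easy to repeat on several nodes. A minimal sketch, assuming readlink is available (it is on typical Linux systems); describe_link is a made-up helper name:

```shell
#!/bin/sh
# Report whether a path is a symbolic link and, if so, print its target.
describe_link() {
    if [ -L "$1" ]; then
        echo "$1 -> $(readlink "$1")"
    elif [ -e "$1" ]; then
        echo "$1 exists but is not a link"
    else
        echo "$1 does not exist"
    fi
}
```

For example: describe_link "$(which mpirun)".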
|
July 5, 2010, 19:13 |
|
#46 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Hi Stephane,
Sorry for the late reply, but I couldn't answer earlier.
OK, as for the link: I believe you no longer have the link because you copied the orterun file to mpirun, which was one of my instructions to try to isolate/fix the issue.
As for the /shared folder: you didn't say which method you use for mounting.
As for OpenFOAM 1.6.x working in parallel before: as far as I can tell, you are still getting nearly the same problem you were getting, but this time it's even worse! I remember you posted some time ago that you weren't able to use mpirun successfully, and that it did work with foamJob, albeit rather slowly. This time, even foamJob doesn't work. The only thing that works with both OpenFOAM versions is stating the full path to mpirun when launching the parallel run. And that's why I suspect the mounting mechanism is to blame! Otherwise, there is a bug in OpenMPI... which got worse from OpenMPI 1.3.3 to 1.4.1!!
So, three possibilities remain:
Best regards, Bruno
__________________
|
|
July 6, 2010, 04:32 |
|
#47 |
Senior Member
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18 |
Bruno,
the /shared folder is mounted using NFS. The story is a bit curious.
- At the beginning OF-1.6 and OF-1.6.x were running fine in parallel.
- Then OF-1.6.x was no longer running in parallel, but foamJob was running.
- Then foamJob was no longer running in parallel.
- Now I have installed OF-1.7.x. It is impossible to launch a case in parallel, even if I use the full path to mpirun.
But our own flow solver NSMB runs in parallel using /opt/mpich/bin/mpirun.
Stephane.
|
July 6, 2010, 05:27 |
|
#48 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Hi Stephane,
Have you tried using full paths for mpirun and foamExec? Something like this:
Code:
/shared/OpenFOAM/ThirdParty-1.7.x/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun -np 4 -hostfile machines /shared/OpenFOAM/OpenFOAM-1.7.x/bin/foamExec interDyMFoam -parallel | tee log
Code:
sync,dirsync,atime,exec,rw
The idea is to force the NFS system to refresh more actively, because the default options are usually meant for a small access footprint, while these options (the bold ones) should enforce a stricter policy, and if my theory is correct, it will hopefully fix the issue you are having.
As for "one day it was working, the next it wasn't", it seems that the master node may have been updated/upgraded while the other nodes weren't... or maybe all did get updated, which could have tampered with your previous settings...
Good luck!
Bruno
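Before changing anything, it may help to see which options the /shared mount currently uses on each node. A minimal sketch, assuming Linux nodes where /proc/mounts is readable; mount_options is a made-up helper name and the second argument only exists so that another mounts table can be passed in:

```shell
#!/bin/sh
# Print the filesystem type and the mount options of one mount point.
# "$1" is the mount point; "$2" is the mounts table to read
# (defaults to /proc/mounts, which is an assumption about a Linux system).
mount_options() {
    awk -v mp="$1" '$2 == mp { print $3, $4 }' "${2:-/proc/mounts}"
}
```

For example, mount_options /shared prints something like the filesystem type followed by the comma-separated options, which can then be compared between the master node and the other nodes.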
__________________
|
|
July 6, 2010, 05:51 |
|
#49 |
Senior Member
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18 |
Hi Bruno,
Now it works again. I don't know why, but it works again with the 2 following commands:
1. /shared/OpenFOAM/ThirdParty-1.7.x/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun -np 8 -hostfile machines /shared/OpenFOAM/OpenFOAM-1.7.x/bin/foamExec simpleFoam -parallel | tee log
2. foamJob -s -p simpleFoam
Yesterday I installed OF-1.7.x and this morning I did git pull and ./Allwmake. This is the only change between yesterday and today.
Thanks again for all your messages!!!
Best regards,
Stephane.
|
January 27, 2015, 09:54 |
|
#50 |
Member
CFDUser
Join Date: Mar 2014
Posts: 59
Rep Power: 13 |
||
March 6, 2020, 10:47 |
|
#51 |
Senior Member
chandra shekhar pant
Join Date: Oct 2010
Posts: 220
Rep Power: 17 |
Hello All,
I am also facing the same issue, which says:
FIPS integrity verification test failed.
orted: Command not found.
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements).
--------------------------------------------------------------------------
when running on a cluster of 2 nodes using
Code:
mpirun/orterun --host n217:16,n219:16 -np 32 --use-hwthread-cpus snappyHexMesh -parallel -overwrite > log.snappyHexMesh