OpenMPI bash: orted: command not found error

Old   July 2, 2010, 08:05
Default
  #41
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Stephane,

OK, a few more possibilities:
  • Do you have permissions on the machine to install OpenMPI into the system? That would make it part of the system and reduce the chances of it not being detected! Preferably install OpenMPI using the system's software/package management... and don't forget to install the -dev part of the OpenMPI package too!
    Then go to OpenFOAM's bashrc file and change from OpenMPI to SYSTEMOPENMPI, if I'm not mistaken... and you might need to rebuild libPstream.so.
  • Try not to use the machines file for defining what machines to use. Let's try running locally only for now!
  • Let's try debugging the environment accessible by the mpirun:
    1. create a file with this in it, e.g. test.sh:
      Code:
      #!/bin/bash
      var=$$
      env > $var.log
    2. save file and run:
      Code:
      chmod +x test.sh
    3. try launching mpirun with our new file, but using foamExec for launching it:
      Code:
      mpirun -n 2 `which foamExec` ./test.sh
      Or 3 or 4 processes, it's up to you, since this is only a test!
    4. this should have created files named pid_number.log, one for each successfully launched test.sh.
    5. now, even if it's only one that has been launched successfully, does its content have references to OpenFOAM's environment? In other words, does it look like your local OpenFOAM environment?
  • Another possibility is to try to use mpiexec or orterun instead of mpirun... although I doubt it will make any difference.
  • Is the folder /shared on a physical mount, a user mount or a system-wide mount? I ask because if the folder is a user mount, then for some strange reason it might not be visible to the remote mpirun executable.
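
To make step 5 of the debugging list less of an eyeball test, a small helper can scan each captured log for the OpenFOAM variables and for an Open MPI directory in PATH. This is only a sketch: the function name check_env_log is made up for illustration, and it assumes that WM_PROJECT_DIR is exported by OpenFOAM's bashrc and that the Open MPI directory has "openmpi" somewhere in its path, as in the ThirdParty layout.

```shell
#!/bin/bash
# check_env_log FILE: report whether an environment dump written by test.sh
# looks like an OpenFOAM environment. (Hypothetical helper, not part of OpenFOAM.)
check_env_log() {
    local log=$1 status=0
    # WM_PROJECT_DIR is exported by OpenFOAM's etc/bashrc.
    if grep -q '^WM_PROJECT_DIR=' "$log"; then
        echo "$log: WM_PROJECT_DIR is set"
    else
        echo "$log: WM_PROJECT_DIR is missing"
        status=1
    fi
    # Assumption: the Open MPI bin directory has "openmpi" in its path.
    if grep '^PATH=' "$log" | grep -q 'openmpi'; then
        echo "$log: PATH mentions an openmpi directory"
    else
        echo "$log: PATH has no openmpi directory"
        status=1
    fi
    return $status
}

# Check every log produced by the mpirun test, if any exist.
for f in [0-9]*.log; do
    [ -e "$f" ] || continue
    check_env_log "$f"
done
```

If a log from a remote process fails either check while a log from a local process passes, the environment is being lost somewhere between mpirun and the launched process.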
Best regards,
Bruno

Old   July 2, 2010, 10:09
Default
  #42
Senior Member
 
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18
Bruno,

With the command below nothing happens. No file is created.

mpirun -n 2 `which foamExec` ./test.sh

But with the below command 2 files (6630.log and 6631.log) are created.
mpirun -n 2 ./test.sh

I have written another application (a hello test) to test mpirun. Maybe you know it.

With the command below I get an error message:
mpirun --hostfile myhostfile hello

error message:
orted: Command not found.
--------------------------------------------------------------------------
A daemon (pid 6648) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
cfs6 - daemon did not report back when launched
cfs7 - daemon did not report back when launched
cfs8 - daemon did not report back when launched
cfs9 - daemon did not report back when launched
cfs11 - daemon did not report back when launched
[117]cfs10-sanchi /home/sanchi/test_openmpi % orted: Command not found.
orted: Command not found.
orted: Command not found.
orted: Command not found.

With the full path to mpirun in the command below, it works:
/shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun --hostfile myhostfile hello

Hello World! from process 10 out of 12 on cfs11
Hello World! from process 11 out of 12 on cfs11
Hello World! from process 9 out of 12 on cfs10
Hello World! from process 8 out of 12 on cfs10
Hello World! from process 0 out of 12 on cfs6
Hello World! from process 2 out of 12 on cfs7
Hello World! from process 1 out of 12 on cfs6
Hello World! from process 3 out of 12 on cfs7
Hello World! from process 6 out of 12 on cfs9
Hello World! from process 7 out of 12 on cfs9
Hello World! from process 4 out of 12 on cfs8
Hello World! from process 5 out of 12 on cfs8

Something is going wrong because:

[106]cfs10-sanchi /home/sanchi/test_openmpi % which mpirun
/shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun

Regards,

Stephane.

Old   July 2, 2010, 10:54
Default
  #43
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Stephane,

"Hello World!" is a great testing application
Quote:
Originally Posted by openfoam_user View Post
Something is going wrong because:

[106]cfs10-sanchi /home/sanchi/test_openmpi % which mpirun
/shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun
Uhm... remember the post I made a while back:
Quote:
Originally Posted by wyldckat View Post
You better edit the foamJob script:
Code:
which foamJob
and go to the last lines, where it says:
Code:
echo "Executing: mpirun
change to
Code:
echo  "Executing: $mpirun
Run foamJob like you have done before and now you will know which mpirun foamJob is really using!
Which mpirun is OpenFOAM's 1.7 foamJob trying to use?



Also, try using the -x option for launching mpirun. For example:
Code:
mpirun -n 2 -x PATH -x LD_LIBRARY_PATH ./test.sh
If this doesn't work, then the only possible solution should be the next possibility!



By the way, what method are you using for sharing the folder /shared between machines? NFS, sshfs, samba or something else?
My guess is that for some reason, the way that the folder is mounted is only activated on demand. For example:
  • if we just try to simply launch mpirun remotely, even if the path to it is in PATH, the mounting system assumes that the mpirun file should be already visible;
  • but if we say that mpirun is located at /shared/right_here/mpirun, the mounting mechanism responsible for the folder /shared wakes up and really checks if the file really exists!
This is the only valid explanation that I can come up with based on the available clues! That's why I'm asking how you are mounting the folder /shared!

Ah, there is also another possibility: does the folder /shared exist before mounting or is it only created when it's mounted? I've had this particular problem with MSys and Cygwin, but never with Linux... but it's a possibility!
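
One way to test the on-demand mounting theory is to ask each node, through a non-interactive ssh session (which is typically also how mpirun starts orted), whether it can see orted both by its full path and through PATH. A rough sketch, assuming password-less ssh to the nodes and a hostfile with one host name at the start of each line. The function names and the REMOTE_SH override are made up for illustration, and orted is assumed to sit in the same bin directory as mpirun.

```shell
#!/bin/bash
# Hypothetical helper: list the host names in an Open MPI style hostfile,
# skipping blank lines and comments and dropping any "slots=N" suffix.
hosts_in() {
    awk '!/^[[:space:]]*(#|$)/ { print $1 }' "$1" | sort -u
}

# check_orted HOSTFILE ORTED_PATH: ask every node whether it can see orted.
# REMOTE_SH defaults to ssh; it is only an override hook for dry runs.
check_orted() {
    local hostfile=$1 orted=$2 host
    for host in $(hosts_in "$hostfile"); do
        # Full path: does the shared folder really expose the file?
        if ${REMOTE_SH:-ssh} "$host" "test -x '$orted'" </dev/null; then
            echo "$host: $orted is visible"
        else
            echo "$host: $orted is NOT visible"
        fi
        # PATH lookup: what does a non-interactive remote shell resolve?
        if ${REMOTE_SH:-ssh} "$host" "sh -c 'command -v orted'" </dev/null >/dev/null 2>&1; then
            echo "$host: orted found in PATH"
        else
            echo "$host: orted not in PATH"
        fi
    done
}
```

It could be called as check_orted myhostfile /shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/orted. A node that reports the file as visible but not in PATH points at the remote shell's start-up files rather than at the mount.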

Best regards,
Bruno

Old   July 5, 2010, 05:47
Default
  #44
Senior Member
 
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18
Bruno,

the folder /shared exists before mounting.

I can't understand it, because OF-1.6 and 1.6.x were running fine in parallel in the past.

Regards,

Stephane.

Old   July 5, 2010, 08:23
Default
  #45
Senior Member
 
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18
Bruno,

I have noticed that mpirun in version 1.7.0 is not a symbolic link!

[102]cfs10-sanchi /home/sanchi % ls -l `which mpirun`
-rwx------ 1 sanchi cfs 106795 2010-07-01 14:47 /shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun

For the previous version 1.6.x the link was:
sanchi@cfs10:~> ls -l `which mpirun`
lrwxrwxrwx 1 sanchi cfs 7 2010-05-31 12:15 /shared/OpenFOAM/ThirdParty-1.6.x/openmpi-1.3.3/platforms/linux64GccDPOpt/bin/mpirun -> orterun

Any comments on that?

Regards,

Stephane.

Old   July 5, 2010, 19:13
Default
  #46
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Stephane,

Sorry for the late reply, but I couldn't answer earlier.

OK, as for the link: I believe you no longer have the link because you copied the orterun file to mpirun, which was one of my instructions to try to isolate/fix the issue.

As for the /shared folder: you didn't say what method you use for mounting it.

As for OpenFOAM 1.6.x working in parallel before: as far as I can tell, you are still getting nearly the same problem you were getting, but this time it's even worse! I remember you posted some time ago that you weren't able to use mpirun successfully, and that with foamJob it did work, albeit rather slowly. This time, even foamJob doesn't work.

The only common working point with both OpenFOAM versions is if you state the full path to mpirun when launching the parallel run. And that's why I suspect the mounting mechanism is to blame! Otherwise, there is a bug in OpenMPI... which got worse from OpenMPI 1.3.3 to 1.4.1!!
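
There is a documented Open MPI behaviour that would fit this observation: as far as the Open MPI FAQ describes it, starting mpirun through its absolute path is treated like passing --prefix with the installation directory, and --prefix makes mpirun set PATH and LD_LIBRARY_PATH on the remote nodes before orted is started. If that is what is happening here, the same effect can be requested explicitly. A minimal sketch, where the helper name mpi_prefix is made up for illustration and the installation is assumed to follow the usual <prefix>/bin/mpirun layout:

```shell
#!/bin/bash
# mpi_prefix MPIRUN_PATH: strip the trailing /bin/mpirun to get the install prefix.
# (Hypothetical helper; assumes the usual <prefix>/bin/mpirun layout.)
mpi_prefix() {
    dirname "$(dirname "$1")"
}

mpirun_path=/shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun
prefix=$(mpi_prefix "$mpirun_path")

# Print the explicit form of the command instead of running it.
echo "mpirun --prefix $prefix --hostfile myhostfile hello"
```

If the printed command works where a bare mpirun does not, the remote shells are simply not getting the Open MPI directories in their PATH, whatever the state of the mount.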

So, three possibilities remain:
  1. the mounting system used (NFS, sshfs, samba, etc.) is to blame or isn't properly configured;
  2. you can try building OpenFOAM 1.7.0 or 1.7.x with OpenMPI 1.3.3 from the 1.6.x version;
  3. read and follow the instructions in OpenMPI's FAQ: Where should I install Open MPI? The more efficient option would be to install OpenMPI locally on each machine, but I suppose that's not possible on the machines you use.

Best regards,
Bruno

Old   July 6, 2010, 04:32
Default
  #47
Senior Member
 
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18
Bruno,

the /shared folder is mounted using NFS.

The story is a bit curious.
- At the beginning OF-1.6 and OF-1.6.x were running fine in parallel.
- Then OF-1.6.x no longer ran in parallel, but foamJob still worked.
- Then foamJob no longer ran in parallel either.
- Now I have installed OF-1.7.x. It is impossible to launch a case in parallel, even if I use the full path to mpirun.

But, our own flow solver NSMB runs in parallel using /opt/mpich/bin/mpirun.

Stephane.

Old   July 6, 2010, 05:27
Default
  #48
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Stephane,

Have you tried using full paths for mpirun and foamExec? Something like this:
Code:
/shared/OpenFOAM/ThirdParty-1.7.x/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun -np 4 -hostfile machines /shared/OpenFOAM/OpenFOAM-1.7.x/bin/foamExec  interDyMFoam -parallel | tee log
OK, as for NFS - if you can, try to mount with these options:
Code:
sync,dirsync,atime,exec,rw
Source: http://www.toucheatout.net/informati...tuning-options
The idea is to force the NFS system to refresh more actively, because the default options are usually meant for a small access footprint, while these options (the bold ones) should enforce a more strict policy, and if my theory is correct, it will hopefully fix the issue you are having.
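
Before changing anything, it may help to see which options /shared is currently mounted with on each node. A small sketch that reads the mount table; the function name mount_opts is made up for illustration, and it assumes a Linux-style /proc/mounts with device, mount point, type and options on each line:

```shell
#!/bin/bash
# mount_opts MOUNTPOINT [TABLE]: print the file system type and options for a
# mount point. TABLE defaults to /proc/mounts (Linux); hypothetical helper.
mount_opts() {
    local point=$1 table=${2:-/proc/mounts}
    # Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
    awk -v p="$point" '$2 == p { print $3, $4 }' "$table"
}

# Show the current settings for /shared on this node, if the table is readable.
if [ -r /proc/mounts ]; then
    mount_opts /shared
fi
```

Running it on the master node and on a compute node and comparing the two lines shows quickly whether the nodes mount /shared differently.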


As for "one day it was working, the next it wasn't", it seems that the master node may have been updated/upgraded while the other nodes weren't... or maybe all of them were updated, which could have tampered with your previous settings...

Good luck!
Bruno

Old   July 6, 2010, 05:51
Default
  #49
Senior Member
 
stephane sanchi
Join Date: Mar 2009
Posts: 314
Rep Power: 18
Hi Bruno,

Now it works again. I don't know why, but it works again with the 2 following commands:

1.
/shared/OpenFOAM/ThirdParty-1.7.x/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun -np 8 -hostfile machines /shared/OpenFOAM/OpenFOAM-1.7.x/bin/foamExec simpleFoam -parallel | tee log

2.
foamJob -s -p simpleFoam

Yesterday I installed OF-1.7.x, and this morning I ran git pull and ./Allwmake.

This is the only change between yesterday and today.

Thanks again for all your messages !!!

Best regards,

Stephane.

Old   January 27, 2015, 09:54
Default
  #50
Member
 
CFDUser
Join Date: Mar 2014
Posts: 59
Rep Power: 13
Quote:
Originally Posted by fijinx View Post
OK, I definitely got it now! I just added the ..../etc/bashrc as the FIRST line in the .bashrc file (before it does the "if non-interactive, do nothing" check) and it works!
Dear James Baker,

Thank you. Everything is working fine.
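
For readers landing here: the likely reason fijinx's fix works is that many default ~/.bashrc files return early for non-interactive shells, and the shell that ssh starts to launch orted is non-interactive, so anything placed after that guard never runs. Below is a small self-contained demonstration of the ordering effect, using throwaway files and a stand-in variable. DEMO_FOAM_ENV and the file names are made up for illustration and are not part of OpenFOAM.

```shell
#!/bin/bash
# Build two throwaway rc files and source each from a non-interactive shell
# to see whether the "environment" line is reached.
demo_dir=$(mktemp -d)

# Stand-in for OpenFOAM's etc/bashrc (hypothetical content, illustration only).
echo 'export DEMO_FOAM_ENV=loaded' > "$demo_dir/foam_bashrc"

# Layout from the quoted fix: environment line first, guard second.
cat > "$demo_dir/rc_good" <<EOF
. "$demo_dir/foam_bashrc"
case \$- in *i*) ;; *) return ;; esac
EOF

# Common default layout: guard first, so non-interactive shells stop early.
cat > "$demo_dir/rc_bad" <<EOF
case \$- in *i*) ;; *) return ;; esac
. "$demo_dir/foam_bashrc"
EOF

good=$(bash -c ". '$demo_dir/rc_good'; echo \"\${DEMO_FOAM_ENV:-missing}\"")
bad=$(bash -c ". '$demo_dir/rc_bad'; echo \"\${DEMO_FOAM_ENV:-missing}\"")
echo "env line first: $good"
echo "guard first:    $bad"
rm -rf "$demo_dir"
```

In a real ~/.bashrc the first line would source the actual OpenFOAM etc/bashrc instead of the stand-in file.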

Thanks & Regards,
CFDUser_

Old   March 6, 2020, 10:47
Default
  #51
Senior Member
 
chandra shekhar pant
Join Date: Oct 2010
Posts: 220
Rep Power: 17
Hello All,


I am also facing the same issue, which says: FIPS integrity verification test failed.
orted: Command not found.
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
when running on a cluster of 2 nodes using mpirun (or orterun):

Code:
mpirun --host n217:16,n219:16 -np 32 --use-hwthread-cpus snappyHexMesh -parallel -overwrite > log.snappyHexMesh
Could anyone suggest anything in this regard? It would be a great help. Thanks a lot!
