March 15, 2005, 23:47 |
|
#1 |
Senior Member
Pei-Ying Hsieh
Join Date: Mar 2009
Posts: 317
Rep Power: 18 |
Hi,
I am trying to get parallel computing going and am running into a problem. I would appreciate it if anyone here could help me.
1. I got NFS running. Process1 mounted to process0.
2. I got passwordless ssh working. I can type:
ssh -v phsieh@192.168.254.43
and log in to the remote computer without entering a password.
But I cannot get lamboot -v ... to start (the file machines contains 2 nodes). Here is the error message:
------------------------
[phsieh@brian3 interFoam]$ lamboot -v /home/phsieh/OpenFOAM/phsieh-1.1/run/tutorials/interFoam/damBreakFine/system/machines
LAM 7.1.1 - Indiana University
n-1<4730> ssi:boot:base:linear: booting n0 (brian3.hsieh.com)
n-1<4730> ssi:boot:base:linear: booting n1 (kevin3.hsieh.com)
ERROR: LAM/MPI unexpectedly received the following on stderr:
connect to address 192.168.254.32: Connection refused
connect to address 192.168.254.32: Connection refused
trying normal rsh (/usr/bin/rsh)
kevin3.hsieh.com: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "kevin3.hsieh.com".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote host.
LAM tried to use the remote agent command "rsh" to invoke "echo $SHELL" on the remote node.
*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.
This usually indicates an authentication problem with the remote agent, some other configuration type of error in your .cshrc or .profile file, or you were unable to executable a command on the remote node for some other reason.
The following is a list of items that you should check on the remote node:
- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are using rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you are using rsh) for the machine and username that you are running from
- Your .cshrc/.profile must not print anything out to the standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment variable to your default shell
Try invoking the following command at the unix command line:
rsh kevin3.hsieh.com -n 'echo $SHELL'
You will need to configure your local setup such that you will *not* be prompted for a password to invoke this command on the remote node. No output should be printed from the remote node before the output of the command is displayed.
When you can get this command to execute successfully by hand, LAM will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<4730> ssi:boot:base:linear: Failed to boot n1 (kevin3.hsieh.com)
n-1<4730> ssi:boot:base:linear: aborted!
n-1<4735> ssi:boot:base:linear: booting n0 (brian3.hsieh.com)
n-1<4735> ssi:boot:base:linear: booting n1 (kevin3.hsieh.com)
ERROR: LAM/MPI unexpectedly received the following on stderr:
connect to address 192.168.254.32: Connection refused
connect to address 192.168.254.32: Connection refused
trying normal rsh (/usr/bin/rsh)
kevin3.hsieh.com: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "kevin3.hsieh.com".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote host.
LAM tried to use the remote agent command "rsh" to invoke "echo $SHELL" on the remote node.
*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.
This usually indicates an authentication problem with the remote agent, some other configuration type of error in your .cshrc or .profile file, or you were unable to executable a command on the remote node for some other reason.
The following is a list of items that you should check on the remote node:
- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are using rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you are using rsh) for the machine and username that you are running from
- Your .cshrc/.profile must not print anything out to the standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment variable to your default shell
Try invoking the following command at the unix command line:
rsh kevin3.hsieh.com -n 'echo $SHELL'
You will need to configure your local setup such that you will *not* be prompted for a password to invoke this command on the remote node. No output should be printed from the remote node before the output of the command is displayed.
When you can get this command to execute successfully by hand, LAM will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<4735> ssi:boot:base:linear: Failed to boot n1 (kevin3.hsieh.com)
n-1<4735> ssi:boot:base:linear: aborted!
lamboot did NOT complete successfully
[phsieh@brian3 interFoam]$
pei |
|
March 16, 2005, 04:14 |
|
#2 |
Super Moderator
Niklas Nordin
Join Date: Mar 2009
Location: Stockholm, Sweden
Posts: 693
Rep Power: 29 |
Hi,
Have you logged in to kevin3 with plain ssh before starting lamboot? The first time you log on to a machine with ssh you have to answer a yes/no question, and lamboot cannot handle this. You must therefore log on by hand to each machine you want to use for the parallel run before starting. N |
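A minimal sketch of that manual step, assuming the machines file lists one host per line (optionally followed by settings such as cpu=2) with # comments. The function name hosts_from_machines and the variable MACHINES_FILE are made up for illustration. The script only prints the ssh commands to run by hand; it does not connect anywhere itself.

```shell
#!/bin/sh
# Print the host name (first field) of every non-blank, non-comment line.
hosts_from_machines() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Hypothetical default location; point MACHINES_FILE at your own file.
MACHINES_FILE="${MACHINES_FILE:-system/machines}"

if [ -r "$MACHINES_FILE" ]; then
    for host in $(hosts_from_machines "$MACHINES_FILE"); do
        # Run each printed command once by hand and answer the yes/no
        # host-key question; afterwards ssh should no longer ask it.
        echo "ssh $host true"
    done
fi
```

Once every printed command runs without asking anything, lamboot should no longer trip over the host-key prompt.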
|
March 16, 2005, 04:22 |
|
#3 |
Member
Kuan Tek Seang
Join Date: Mar 2009
Posts: 31
Rep Power: 17 |
Another alternative is to copy all the relevant host information into the .ssh directory.
|
|
March 16, 2005, 04:24 |
|
#4 |
Member
Kuan Tek Seang
Join Date: Mar 2009
Posts: 31
Rep Power: 17 |
Ah ... sorry, that was a rather incomplete sentence. I mean: make sure all the different host IDs are present in the relevant file (normally known_hosts) in your .ssh directory.
|
|
March 16, 2005, 04:24 |
|
#5 |
New Member
Rasmus Gjesing
Join Date: Mar 2009
Posts: 7
Rep Power: 17 |
Hi,
By default LAM uses rsh, which is probably not installed or not running on your system. Instead, switch to ssh (which is also more secure) by setting the environment variable LAMRSH to "ssh -x". As I read your post, you already have ssh working, so you just need to tell LAM to use it. /Rasmus |
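As a sketch, in a bash or sh session this could look like the lines below. The lamboot call is left as a comment so the snippet is harmless to paste, and the file name machines is taken from the first post; csh users would use setenv instead of export.

```shell
#!/bin/sh
# Tell LAM to use ssh (without X11 forwarding) as its remote agent.
export LAMRSH="ssh -x"

# csh/tcsh equivalent:
#   setenv LAMRSH "ssh -x"

# Then boot LAM as before, for example:
#   lamboot -v machines

echo "LAM remote agent: $LAMRSH"
```

Putting the export line into the shell startup file makes the setting survive new logins.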
|
May 22, 2007, 22:41 |
|
#6 |
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21 |
Sorry, but my remote computer (30 CPUs) doesn't support ssh. What can I do? I have to rsh into it (let me call it B) from my computer (I call it A).
I have spent a whole day trying to set up passwordless rsh, for example by editing the .rhosts file or /etc/hosts.****, but failed. My question is: do I need to install OpenFOAM on the remote computer? Thanks in advance. Daniel
__________________
~ Daniel WEI ------------- Boeing Research & Technology - China Beijing, China |
|
May 23, 2007, 15:58 |
|
#7 |
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51 |
Yep. For rsh the binaries (in your case OpenFOAM) have to be available on B: rsh just starts a shell on B and tries to execute your command there; no data or programs get sent during the process. If B uses the same /home as A, this should be relatively easy (provided both have the same architecture), because you already have a /home/daniel/OpenFOAM where these programs reside. The only problem is that OpenFOAM/OpenFOAM-1.4/.OpenFOAM-1.4/bashrc may not get sourced on B, in which case the OpenFOAM applications (and the .so libraries) will not be found (but it has been a long time since I worked with rsh, so I'm only 93% sure).
Just try rsh B interFoam and report what error message you get.
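If the applications are not found, one way to test the bashrc theory is to source the file explicitly in the same remote command. The sketch below only builds and prints such a command. The helper remote_check_cmd and both variables are made up for illustration, the bashrc path assumes OpenFOAM sits directly under $HOME on B, and it assumes B's login shell is sh-compatible (with csh the dot command will not work).

```shell
#!/bin/sh
# Hypothetical values: adjust to your own setup.
REMOTE_HOST="B"
# Single quotes keep $HOME unexpanded so the remote shell resolves it.
FOAM_BASHRC='$HOME/OpenFOAM/OpenFOAM-1.4/.OpenFOAM-1.4/bashrc'

# Build a command that sources the OpenFOAM environment first and then
# reports where (or whether) the given application is found.
remote_check_cmd() {
    printf '. %s && command -v %s' "$FOAM_BASHRC" "$1"
}

echo "Would run: rsh $REMOTE_HOST \"$(remote_check_cmd interFoam)\""

# To actually run it:
#   rsh "$REMOTE_HOST" "$(remote_check_cmd interFoam)"
```

If the explicit version finds interFoam while plain rsh B interFoam does not, the environment is simply not being set up for non-interactive logins.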
__________________
Note: I don't use "Friend"-feature on this forum out of principle. Ah. And by the way: I'm not on Facebook either. So don't be offended if I don't accept your invitation/friend request |
|
May 23, 2007, 23:41 |
|
#8 |
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21 |
Thank you very much, Bernhard
Now, when I try rsh B interFoam, I get:
connect to address 202.***.***.*** port ***: Connection refused
Trying krb4 rsh...
connect to address 202.***.***.*** port ***: Connection refused
Trying normal rsh (/usr/bin/rsh)
Login incorrect.
I'm a newbie in the Linux world, and I guess I made a mistake while setting up passwordless rsh: I should have modified hosts, hosts.allow, hosts.deny, hosts.equiv, and .rhosts on computer B, not on A. Am I right? I use Fedora Core 6, and my remote computer runs IRIX, where InstallationTest told me that OpenFOAM can only be installed on Linux. That is why I asked whether I have to install OpenFOAM on both computers. Also, it is not easy for me to get root access on B. Thanks again. Daniel |
|
May 24, 2007, 15:44 |
|
#9 |
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51 |
As above: if the machines share a /home, then setting up ~/.rhosts should be sufficient.
One guess is this: some passwordless logins ONLY work if the .bashrc (or the equivalent for whatever shell you use) doesn't print any output to stdout. If your B does such a thing, you'll have to remove these commands or distinguish between interactive logins (which can have output) and non-interactive ones (== rsh). But that is only a guess.
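A minimal sketch of such a distinction for a bash-style startup file is shown below. The function name print_banner_if_interactive is made up for illustration; the check relies on the shell flags in $-, which contain "i" only in interactive shells.

```shell
#!/bin/sh
# Sketch of a fragment for the startup file on B (bash or sh assumed).
# Environment setup needed by remote commands should come before this
# and must stay silent.

print_banner_if_interactive() {
    case $- in
        *i*)
            # Interactive login: output is harmless here.
            echo "Logged in to $(uname -n)"
            ;;
        *)
            # Non-interactive (for example a command started via rsh):
            # print nothing.
            :
            ;;
    esac
}

print_banner_if_interactive
```

Anything that prints (banners, fortune, echo statements) would move inside the interactive branch.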
|
|
|
|