Parallel Computing on more than one machine

#1  December 8, 2008, 19:31
Daniel Harlacher (harly), Member
Hi,

I want to use multiple computers to speed up my calculations, but I ran into some trouble. Here is my story.

I have the following setup:

- 1 server with the home directories
- n clients with 2 cores each

The problem is that when I log on to client no. 1 and start a calculation with:

mpirun --hostfile machines -np nx2 turbFoam -parallel >log

I get the following error:

bash: orted: command not found
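
For reference, a minimal sketch of what the hostfile and the process count could look like with Open MPI. The file name machines is taken from the command above; the host names lab15 and lab13 and the slots=2 entries are assumptions for a two-client case with 2 cores each, not the poster's actual file.

```shell
# Write a hypothetical Open MPI hostfile: one line per client, 2 slots each.
cat > machines <<'EOF'
lab15 slots=2
lab13 slots=2
EOF

# Sum the slots to get the value for -np (n clients x 2 cores).
np=$(awk -F'slots=' 'NF > 1 { sum += $2 } END { print sum }' machines)

# Print the resulting command instead of running it.
echo "mpirun --hostfile machines -np $np turbFoam -parallel"
```

The case must also have been decomposed into the same number of subdomains before the run is started.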

I figured the problem could be that the library for orted (MPI) lies in the ThirdParty folder and would therefore be accessed by both computers at the same time on the server.

To solve that problem I created a new account and copied the OpenFOAM installation.

Now what I wanted to do was:

- being logged on to client No.1
- start calculation on client 1 with current user and on client 2 as the new user

But I could not find a way to tell mpirun to log on to client 2 with a separate username.

Is computing on multiple machines possible at all with the setup I have? I only have full write access to my home folder, and the home directories are mounted via NFS.

Maybe some of you have tried that already.

Thank you
-harly

#2  December 9, 2008, 09:46
Bernhard Gschaider (gschaider), Assistant Moderator
Hi Harly!

I think that the problem (in your original setup; I don't understand the stuff with the two users) might have been that the other machine did not know where to find orted. If both of your machines are set up identically, a generous application of the -x option of mpirun might help. The options that I use, for instance, are (possibly not all of them are necessary):

-x PATH -x LD_LIBRARY_PATH -x WM_PROJECT_DIR -x FOAM_MPI_LIBBIN
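
Put together with the command from the first post, the options above could be assembled roughly as follows. This is only a sketch: the helper name build_mpirun_cmd and the process count 4 are made up for illustration, and the function prints the command line rather than running it.

```shell
# Build (but do not run) an mpirun command line that forwards the listed
# environment variables to the remote processes via Open MPI's -x option.
# build_mpirun_cmd is a hypothetical helper for this sketch.
build_mpirun_cmd() {
    hostfile=$1
    np=$2
    shift 2
    cmd="mpirun --hostfile $hostfile -np $np"
    for var in PATH LD_LIBRARY_PATH WM_PROJECT_DIR FOAM_MPI_LIBBIN; do
        cmd="$cmd -x $var"
    done
    # "$*" is the solver and its arguments, e.g. "turbFoam -parallel".
    printf '%s %s\n' "$cmd" "$*"
}

build_mpirun_cmd machines 4 turbFoam -parallel
# prints: mpirun --hostfile machines -np 4 -x PATH -x LD_LIBRARY_PATH -x WM_PROJECT_DIR -x FOAM_MPI_LIBBIN turbFoam -parallel
```

As far as I understand, -x exports variables to the launched application processes, so it may not change how the remote shell looks up the orted daemon itself.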

Bernhard
__________________
Note: I don't use "Friend"-feature on this forum out of principle. Ah. And by the way: I'm not on Facebook either. So don't be offended if I don't accept your invitation/friend request

#3  December 9, 2008, 12:45
florian (floooo), Member
Hi,

In my company the home folder is shared across all computers, and the MPICH method works.
I tried a job on 3x8 cores, and it works.

But I don't know exactly how the home folder is mounted on the machines.

Florian

#4  December 9, 2008, 16:17
Daniel Harlacher (harly), Member
Hi,

The trick with -x didn't help. Here is the complete error; maybe you can find something in there:

bash: orted: command not found
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[lab15:18747] ERROR: A daemon on node lab13 failed to start as expected.
[lab15:18747] ERROR: There may be more information available from
[lab15:18747] ERROR: the remote shell (see above).
[lab15:18747] ERROR: The daemon exited unexpectedly with status 127.
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------

thanks a lot

BTW: lab15 is my client 1 (the one I am logged on to and start the calculation from) and lab13 is client 2.
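
The "status 127" in the log above matches the exit status a POSIX shell returns when it cannot find a command, which fits the "bash: orted: command not found" line. A small local sketch of that behaviour, using a deliberately nonexistent command name (no_such_command_for_demo is made up):

```shell
# Ask a shell to run a command that is assumed not to exist, and keep
# the exit status instead of letting the failure abort the script.
status=0
sh -c 'no_such_command_for_demo' 2>/dev/null || status=$?

echo "exit status: $status"
# prints: exit status: 127
```

So the daemon on lab13 most likely never started at all, and the timeouts that follow would then be a consequence of that.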

#5  December 9, 2008, 16:41
Bernhard Gschaider (gschaider), Assistant Moderator
Hi!

Never had that problem, so I'm not sure: is it possible that mpirun can't start the remote processes (either via rsh or ssh)? Try from lab15:

rsh lab13 which orted

or

ssh lab13 which orted

One of them should find the correct file. Otherwise you'd have to configure rsh or (better) ssh to allow remote execution.

However, the "bash: orted: command not found" hints at "I can connect, but on the remote machine I can't find orted". Make sure it is in the same place on both machines (symbolic links are your friends).
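
The check above can be repeated for every node in one go. The following is a sketch under assumptions: check_orted and the REMOTE_SH override are hypothetical names introduced here (REMOTE_SH defaults to ssh and can be set to rsh), and password-less login to the nodes is assumed.

```shell
# Run "which orted" on each host through a non-interactive remote shell,
# which is roughly how mpirun itself starts the daemon.
# REMOTE_SH is a hypothetical override for this sketch (default: ssh).
check_orted() {
    remote=${REMOTE_SH:-ssh}
    status=0
    for host in "$@"; do
        if path=$($remote "$host" which orted 2>/dev/null); then
            echo "$host: $path"
        else
            echo "$host: orted not found"
            status=1
        fi
    done
    return $status
}
```

Calling `check_orted lab15 lab13` should then print one line per node. If orted only shows up in an interactive login, the PATH is probably set in a startup file that non-interactive shells skip; Open MPI's mpirun also has a --prefix option for pointing the remote side at the installation directory, though whether that fits this setup is an assumption.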

Bernhard
