Parallel run with distributed data: How?

Old   October 10, 2012, 11:22
Default Parallel run with distributed data: How?
  #1
New Member
 
Eskil Aursand
Join Date: Sep 2012
Posts: 7
Rep Power: 14
eskila is on a distinguished road
I have two multi-core computers available on the network, and want to parallelize a case using cores on both machines.
I tried it with "distributed no" in decomposeParDict, and that worked fine.
Now I want to try the "distributed yes" feature, but find the documentation a bit lacking.

I am starting with something simple: Using one core on each machine.
Let's call the machines:
#1: The one I'm launching the job from.
#2: The other one.

I have a file "machines" with the following content (IP addresses hidden by x's):
Code:
xxx.xxx.xxx.xx1 cpu=1
xxx.xxx.xxx.xx2 cpu=1
I tried a decomposeParDict like this, though I'm not sure if this is the way it's supposed to be done:

Code:
numberOfSubdomains 2;

method          scotch;

distributed     yes;

roots
1
( 
"<absolute path to empty directory on machine 2 which my user owns>"
);
I then do (on machine 1, in the case directory):
Code:
blockMesh
decomposePar
mpirun --hostfile machines -np 2 buoyantBoussinesqPimpleFoam -parallel
It asks for my password on machine 2. After that I get the error:
Code:
[1] --> FOAM FATAL ERROR: 
[1] Cannot find file "points" in directory "polyMesh" in times 0 down to constant
[1] 
[1]     From function Time::findInstance(const fileName&, const word&, const IOobject::readOption, const word&)
[1]     in file db/Time/findInstance.C at line 188.
[1] 
FOAM parallel run exiting
What am I doing wrong?
I'm guessing it has something to do with the "roots" entry in decomposeParDict. Should it be an absolute or a relative path? If the latter, relative to what? Should the roots be empty directories? What should the entry look like if I want to, e.g., run on 10 cores on each machine?

Any help will be greatly appreciated.

Old   October 10, 2012, 15:11
Default
  #2
Senior Member
 
Kevin Smith
Join Date: Mar 2009
Posts: 104
Rep Power: 17
kev4573 is on a distinguished road
Do you need to specify both roots when using the distributed option, even when one of the nodes is the host?

My guess is that they are absolute paths to the case directory for each of the nodes.

Old   October 10, 2012, 15:26
Default
  #3
New Member
 
Eskil Aursand
Join Date: Sep 2012
Posts: 7
Rep Power: 14
eskila is on a distinguished road
Quote:
Originally Posted by kev4573 View Post
Do you need to specify both roots when using the distributed option, even when one of the nodes is the host?
No, because if I try to give two roots with this setup, I get the message that the number of roots MUST be one less than the number of processes.

But if, for example, I want to use 10 cores on each machine, do I really have to list 19 roots, where 9 and 10 of them are identical? And how is a root in decomposeParDict coupled with a machine listed in the "machines" file? A path can be valid on both machines.

Old   October 10, 2012, 16:13
Default
  #4
Senior Member
 
Kevin Smith
Join Date: Mar 2009
Posts: 104
Rep Power: 17
kev4573 is on a distinguished road
These instructions worked for me - http://www.cfd-online.com/Forums/blo...h-process.html .

In any case I'd agree the official documentation could be clearer.

Last edited by kev4573; October 10, 2012 at 16:14. Reason: typo

Old   October 10, 2012, 17:02
Default
  #5
New Member
 
Eskil Aursand
Join Date: Sep 2012
Posts: 7
Rep Power: 14
eskila is on a distinguished road
Quote:
Originally Posted by kev4573 View Post
These instructions worked for me - http://www.cfd-online.com/Forums/blo...h-process.html .

In any case I'd agree the official documentation could be clearer.
Thanks for the tip.
I am not able to test it myself right now, but I think I see one problem: in that case all the processes seem to run on the same machine, only in different directories. Where does the information about the IP addresses and the distribution of processes between machines go?

Old   October 10, 2012, 17:58
Default
  #6
Senior Member
 
Kevin Smith
Join Date: Mar 2009
Posts: 104
Rep Power: 17
kev4573 is on a distinguished road
I'm not sure there is an explicit map of processors to node/CPU locations; it may just be implied by the ordering of the nodes in your hosts file: the first X processors are computed on the first node, the next X processors on the next node, and so on. I think the key is to copy the entire decomposed case to each node, and point the root directories in decomposeParDict to the parent of the case directory.
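
A rough sketch of that copy step, assuming passwordless ssh and rsync are available on both machines and that the case keeps the same directory name under every root. The function name `print_copy_cmds`, the host name and the paths are placeholders for illustration, not part of OpenFOAM. It only prints the commands, so they can be checked before anything is run:

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands that would copy a decomposed
# case into a root directory on another machine.
# print_copy_cmds is a hypothetical helper; host and paths are placeholders.
# The body runs in a subshell "( ... )" so its variables stay local.
print_copy_cmds() (
    case_dir=${1%/}   # strip a trailing slash so rsync copies the directory itself
    host=$2
    root=$3
    # make sure the root directory exists on the remote machine
    echo "ssh $host mkdir -p $root"
    # copy the whole case (system, constant, 0, processor*) into that root
    echo "rsync -a $case_dir $host:$root/"
)
```

For example, `print_copy_cmds /home/user/myCase node2 /scratch/roots` prints one ssh line and one rsync line; the root to list in decomposeParDict would then be `/scratch/roots`, the parent of the copied case directory. Paths containing spaces are not handled by this sketch.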

Old   October 11, 2012, 04:44
Default
  #7
New Member
 
Eskil Aursand
Join Date: Sep 2012
Posts: 7
Rep Power: 14
eskila is on a distinguished road
It worked, I think. I don't know how I can be sure that both computers used only their local disk, instead of "crossing over", but at least it ran without error.
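
One possible check, sketched under the assumption that each process writes its time directories only into its own processorN directory under its own root: after the run, list which processor directories in a given copy of the case contain anything besides 0 and constant. `written_procs` is a hypothetical helper, not an OpenFOAM utility.

```shell
#!/bin/sh
# Sketch only: list the processor directories of one case copy that contain
# a directory other than "0" and "constant", taken here to be a written
# time step. written_procs is a made-up name for this example.
written_procs() (
    case_dir=$1
    for proc in "$case_dir"/processor*; do
        [ -d "$proc" ] || continue      # also skips an unmatched glob
        for entry in "$proc"/*; do
            name=${entry##*/}
            if [ -d "$entry" ] && [ "$name" != 0 ] && [ "$name" != constant ]; then
                echo "${proc##*/}"      # report this processor directory once
                break
            fi
        done
    done
)
```

Run on each machine against each root's copy of the case, this should name only the processor directories of the processes assigned to that root if nothing "crossed over". It says nothing about which disk was read from, only where results were written.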

The "machines" file contained:
Code:
XXX.XXX.XXX.XX1 cpu=10
XXX.XXX.XXX.XX2 cpu=10
decomposeParDict contained a list of 19 roots: 9 of them (nodeX, X=1->9) were paths on machine #1, and 10 of them (nodeX, X=10->19) were paths on machine #2.
After running decomposePar on the master node, each nodeX directory got a complete copy of the case as a subdirectory.
Then
Code:
mpirun --hostfile machines -np 20 buoyantBoussinesqPimpleFoam -parallel
was run on the master node, and 10 processes popped up on "top" on each machine.
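
For anyone repeating this, a sketch of how such a roots list could be generated instead of typed by hand. It assumes that MPI fills the hosts in hostfile order (so ranks 0 to 9 land on the first machine) and that rank 0, the master, gets no entry, which is why N processes need N-1 roots. The `make_roots` function, its "slots:base" argument format and the nodeX naming under each base path are illustrative choices that mirror the layout described above, not an OpenFOAM convention.

```shell
#!/bin/sh
# Sketch only: print a decomposeParDict "roots" entry for a distributed run.
# Assumptions: ranks are assigned to hosts in hostfile order, and rank 0
# (the master) needs no root. make_roots and the "slots:base" argument
# format are made up for this example.
# The body runs in a subshell "( ... )" so its variables stay local.
make_roots() (
    rank=0
    count=0
    entries=""
    for spec in "$@"; do
        slots=${spec%%:*}    # number of processes on this machine
        base=${spec#*:}      # directory on this machine that holds the nodeX roots
        i=0
        while [ "$i" -lt "$slots" ]; do
            if [ "$rank" -gt 0 ]; then   # skip the master process
                entries="${entries}    \"${base}/node${rank}\"
"
                count=$((count + 1))
            fi
            rank=$((rank + 1))
            i=$((i + 1))
        done
    done
    printf 'roots\n%s\n(\n%s);\n' "$count" "$entries"
)
```

Calling `make_roots 10:/path/on/machine1 10:/path/on/machine2` prints a list of 19 entries, node1 to node9 under the first path and node10 to node19 under the second, which can be pasted into decomposeParDict.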

Old   October 11, 2012, 10:22
Default
  #8
Senior Member
 
Kevin Smith
Join Date: Mar 2009
Posts: 104
Rep Power: 17
kev4573 is on a distinguished road
Great, I was wondering myself whether you would need to define roots for each core or for each machine. I agree it seems redundant to do the former, but it could be beneficial if someone wanted to distribute the processors in different places within a single machine.

Old   September 27, 2021, 07:44
Default
  #9
Senior Member
 
qutadah
Join Date: Jun 2021
Location: USA
Posts: 101
Rep Power: 5
qutadah.r is on a distinguished road
Hey guys, could you please explain what the "distributed" keyword means in decomposeParDict?



Thanks!


All times are GMT -4.