October 10, 2012, 11:22 |
Parallel run with distributed data: How?
|
#1 |
New Member
Eskil Aursand
Join Date: Sep 2012
Posts: 7
Rep Power: 14 |
I have two multi-core computers available on the network, and want to parallelize a case using cores on both machines.
I tried it with "distributed no" in decomposeParDict, and that worked fine. Now I want to try the "distributed yes" feature, but find the documentation a bit lacking. I am starting with something simple: using one core on each machine. Let's call the machines:
#1: The one I'm launching the job from.
#2: The other one.
I have a file "machines" with the following content (IP addresses hidden by x's):
Code:
xxx.xxx.xxx.xx1 cpu=1
xxx.xxx.xxx.xx2 cpu=1
decomposeParDict:
Code:
numberOfSubdomains 2;
method scotch;
distributed yes;
roots 1 ( "<absolute path to empty directory on machine 2 which my user owns>" );
Commands run:
Code:
blockMesh
decomposePar
mpirun --hostfile machines -np 2 buoyantBoussinesqPimpleFoam -parallel
Resulting error:
Code:
[1] --> FOAM FATAL ERROR:
[1] Cannot find file "points" in directory "polyMesh" in times 0 down to constant
[1]
[1] From function Time::findInstance(const fileName&, const word&, const IOobject::readOption, const word&)
[1] in file db/Time/findInstance.C at line 188.
[1] FOAM parallel run exiting
I'm guessing it has something to do with the "roots" entry in decomposeParDict. Should it be an absolute or a relative path? If the latter, relative to what? Should the directories be empty? What should it be if I want to run on, e.g., 10 cores on each machine? Any help will be greatly appreciated.
|
October 10, 2012, 15:11 |
|
#2 |
Senior Member
Kevin Smith
Join Date: Mar 2009
Posts: 104
Rep Power: 17 |
Do you need to specify both roots when using the distributed option, even when one of the nodes is the host?
My guess is that they are absolute paths to the case directory for each of the nodes. |
|
October 10, 2012, 15:26 |
|
#3 | |
New Member
Eskil Aursand
Join Date: Sep 2012
Posts: 7
Rep Power: 14 |
But if I, for example, want to use 10 cores on each machine, do I really have to list 19 roots, of which 9 are identical to each other and the other 10 are identical to each other? And how is a root in decomposeParDict tied to a machine listed in the "machines" file? A path can be valid on both machines.
||
October 10, 2012, 16:13 |
|
#4 |
Senior Member
Kevin Smith
Join Date: Mar 2009
Posts: 104
Rep Power: 17 |
These instructions worked for me - http://www.cfd-online.com/Forums/blo...h-process.html .
In any case, I'd agree the official documentation could be clearer.
Last edited by kev4573; October 10, 2012 at 16:14. Reason: typo
|
October 10, 2012, 17:02 |
|
#5 | |
New Member
Eskil Aursand
Join Date: Sep 2012
Posts: 7
Rep Power: 14 |
I am not able to test it for myself right now, but I think I see one problem: in that case all the processes seem to run on the same machine, only in different directories. Where does the information about the IP addresses and the distribution of processes between machines go?
||
October 10, 2012, 17:58 |
|
#6 |
Senior Member
Kevin Smith
Join Date: Mar 2009
Posts: 104
Rep Power: 17 |
I'm not sure there is an explicit map of processors to node/CPU locations; it may just be implied by the ordering of the nodes in your hosts file: the first X processors are computed on the first node, the next X processors on the next node, and so on. I think the key is to copy the entire decomposed case to each node and point the root directories in decomposeParDict to the parent of the case directory.
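The ordering guess above can be sketched in a few lines of Python. This is only an illustration of the "fill the first node, then the next" assumption made in this post; the real placement is decided by the MPI launcher and its mapping options, not by OpenFOAM. The host names `machine1` and `machine2` and both function names are hypothetical, and the parser assumes the `host cpu=N` line format used in the "machines" file in this thread.

```python
def parse_hostfile(text):
    """Parse lines of the form '<host> cpu=<n>' into (host, n) pairs."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        slots = 1  # assumed default when no count is given
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key in ("cpu", "slots"):
                slots = int(value)
        hosts.append((fields[0], slots))
    return hosts


def ranks_by_slot(hosts, n_procs):
    """Assign ranks 0..n_procs-1 to hosts, filling each host in file order."""
    placement = []
    for host, slots in hosts:
        for _ in range(slots):
            if len(placement) == n_procs:
                return placement
            placement.append(host)
    if len(placement) < n_procs:
        raise ValueError("hostfile provides fewer slots than requested processes")
    return placement


if __name__ == "__main__":
    # Hypothetical host names standing in for the hidden IP addresses.
    hosts = parse_hostfile("machine1 cpu=10\nmachine2 cpu=10\n")
    for rank, host in enumerate(ranks_by_slot(hosts, 20)):
        print(rank, host)
```

A practical way to check the actual placement is to launch a trivial program such as `hostname` with the same hostfile and process count and see which machine each process reports.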
|
|
October 11, 2012, 04:44 |
|
#7 |
New Member
Eskil Aursand
Join Date: Sep 2012
Posts: 7
Rep Power: 14 |
It worked, I think. I don't know how I can be sure that both computers used only their local disk, instead of "crossing over", but at least it ran without error.
The "machines" file contained:
Code:
XXX.XXX.XXX.XX1 cpu=10
XXX.XXX.XXX.XX2 cpu=10
Each nodeX directory got a complete copy of the case as a subdirectory, after running decomposePar on the master node. Then:
Code:
mpirun --hostfile machines -np 20 buoyantBoussinesqPimpleFoam -parallel
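For a 20-process run, writing the "roots" list by hand is tedious, so here is a small Python sketch that generates it. It assumes what this thread implies: one root per process except the launching one (1 root for 2 subdomains, 19 for 20), listed in rank order, each pointing at the parent directory of the case copy on the node that hosts that rank. The paths `/path/to/node1` and `/path/to/node2` and both function names are hypothetical placeholders.

```python
def build_roots(ranks_per_node, node_roots):
    """Return one root path per rank, node by node, in hostfile order."""
    per_rank = []
    for count, root in zip(ranks_per_node, node_roots):
        per_rank.extend([root] * count)
    return per_rank


def format_roots(per_rank):
    """Format the roots entry, skipping rank 0 (the launching process)."""
    others = per_rank[1:]
    body = "\n".join(f'    "{path}"' for path in others)
    return f"roots {len(others)}\n(\n{body}\n);"


if __name__ == "__main__":
    # Hypothetical parent directories of the case copy on each machine.
    per_rank = build_roots([10, 10], ["/path/to/node1", "/path/to/node2"])
    print("distributed yes;")
    print(format_roots(per_rank))
```

The printed text can be pasted into decomposeParDict in place of a hand-written list. Whether identical entries can be shortened depends on the OpenFOAM version, so treat the one-entry-per-process layout as the conservative choice.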
|
October 11, 2012, 10:22 |
|
#8 |
Senior Member
Kevin Smith
Join Date: Mar 2009
Posts: 104
Rep Power: 17 |
Great, I was wondering myself whether you would need to define roots for each core or for each machine. I agree it seems redundant to do the former, but it could be beneficial if someone wanted to distribute the processors across different locations within a single machine.
|
|
September 27, 2021, 07:44 |
|
#9 |
Senior Member
qutadah
Join Date: Jun 2021
Location: USA
Posts: 101
Rep Power: 5 |
Hey guys, could you please explain what the distributed keyword means in the decomposePar dictionary?
Thanks! |
|