December 12, 2013, 15:02 |
Running MPI on 2 laptops
|
#1 |
New Member
Ahsan Javed
Join Date: Oct 2013
Location: Pakistan
Posts: 8
Rep Power: 13 |
I am trying to run the airFoil2D case on two laptops. I made my own blockMesh with 1.1 million cells. Both laptops have 4 processors and the same OpenFOAM version (2.2.1) on Ubuntu 12.04 LTS. I ran decomposePar on both laptops for the same airFoil2D case.
The problem appears when I run this command in a terminal: Code:
mpirun --hostfile hosts -np 8 simpleFoam -parallel
The hosts file contains the names and number of processors of the slave. It is located inside the airFoil2D folder. Hosts file: Code:
ahsan@ahsan-Inspiron-5521
Dell-4050 cpu=4
I get this error: Code:
ahsan@ahsan-Inspiron-5521:~/airFoil2D$ mpirun --hostfile hosts -np 8 simpleFoam -parallel
ssh: Could not resolve hostname Dell-4050: Name or service not known
--------------------------------------------------------------------------
A daemon (pid 12100) died unexpectedly with status 255 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
Dell 4050 is my slave. I read posts about running MPI on clusters but couldn't understand how to set the library path, as I'm not much of a programmer and I'm new to Ubuntu. This is my final-year project; if anyone can help me out I'll be more than thankful.
|
December 15, 2013, 15:18 |
|
#2 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings Ahsan,
Two reference posts that can come in handy:
To solve the problem, you must first find the IP address of each laptop. You can use this command: Code:
ifconfig
Make a note of those IP addresses and try pinging the other machine, using its IP address, to check if it can find it. For example:
Now, once the IPs are properly determined and tested, edit the file "/etc/hosts" as super-user on both machines and add these two lines at the end of the file: Code:
11.12.13.14 machine1
11.12.13.132 machine2
The hosts file used by mpirun can then refer to the machines by those names: Code:
machine1 slots=4 max-slots=4
machine2 slots=4 max-slots=4
Bruno
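Once "/etc/hosts" has been edited, it can help to confirm that every name in the MPI hosts file actually resolves before calling mpirun. The following is only a minimal sketch: the function names `hostfile_names` and `check_hosts` are made up for illustration, and it assumes the hosts file lists one machine per line with the machine name in the first column, as in the example above.

```shell
#!/bin/sh
# Print the first column (the machine name) of each non-empty,
# non-comment line of an Open MPI style hosts file.
hostfile_names() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# For each name, report whether it resolves (through /etc/hosts or DNS)
# and whether it answers a single ping. Returns non-zero if any name fails.
check_hosts() {
    status=0
    for name in $(hostfile_names "$1"); do
        if ! getent hosts "$name" >/dev/null; then
            echo "$name: cannot resolve, check /etc/hosts"
            status=1
        elif ! ping -c 1 -W 2 "$name" >/dev/null 2>&1; then
            echo "$name: resolves but does not answer ping"
            status=1
        else
            echo "$name: OK"
        fi
    done
    return $status
}
```

Running `check_hosts hosts` from inside the case folder should print one line per machine. A "cannot resolve" line corresponds to the "Could not resolve hostname" error that ssh reports.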
|
|
December 16, 2013, 09:05 |
|
#3 |
New Member
Ahsan Javed
Join Date: Oct 2013
Location: Pakistan
Posts: 8
Rep Power: 13 |
Thank you cat, I have successfully pinged both laptops and made all the changes you mentioned.
Code:
192.168.1.2 ahsan@ahsan-Inspiron-5521
192.168.1.1 umer-HP-ProBook-4540s
Code:
ahsan@ahsan-Inspiron-5521:~/airFoil2D$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_req=41 ttl=64 time=0.520 ms
64 bytes from 192.168.1.1: icmp_req=42 ttl=64 time=0.362 ms
Code:
ahsan@ahsan-Inspiron-5521 slots=4 max-slots=4
umer-HP-ProBook-4540s slots=4 max-slots=4
Code:
ahsan@ahsan-Inspiron-5521:~/airFoil2D$ mpirun --hostfile hosts -np 8 simpleFoam -parallel
ssh: connect to host umer-HP-ProBook-4540s port 22: Connection refused
--------------------------------------------------------------------------
A daemon (pid 6863) died unexpectedly with status 255 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
Connection refused. Also, when I ssh from ahsan I get "connection timed out", while when I ssh from umer I get "Permission denied".
Last edited by wyldckat; December 30, 2013 at 09:50. Reason: Added [CODE][/CODE]
|
December 30, 2013, 09:49 |
|
#4 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Ahsan,
Sorry, I was not able to answer any sooner. In summary, the problems in question are probably:
Bruno
|
|
January 1, 2014, 02:50 |
Help needed!
|
#5 |
New Member
Ahsan Javed
Join Date: Oct 2013
Location: Pakistan
Posts: 8
Rep Power: 13 |
I have successfully set up SSH between both laptops, but now it gives this error:
Cannot find executable s
Is it necessary for the 2 laptops to have the same version of Ubuntu, the same OpenFOAM version and the same architecture (32 or 64 bit) to run a case across a network? In our case one laptop is 32 bit and the other is 64 bit. Also, one laptop has OpenFOAM version 2.2.1 and the other has 2.1.1. I think the problem lies in the architecture and the Ubuntu version. Please help me out on this.
|
January 1, 2014, 08:20 |
|
#6 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Ahsan,
Don't expect OpenFOAM to do universal parallel processing.
It doesn't matter which Ubuntu versions you have on each machine. But it does matter that you use exactly the same architecture and version of OpenFOAM on both, installed in the same path.
If one laptop is using a 32-bit version of Ubuntu, then you have to install the same version of OpenFOAM, in the same 32-bit build, on both machines! The 32-bit build is the common denominator between both machines, because 64-bit machines can also run 32-bit applications.
First choose which version of OpenFOAM you want to use and let me know, so that I can give you directions on what to do.
Best regards, Bruno
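A quick way to compare the machines is to print the architecture and the loaded OpenFOAM version on each one and check that the lines agree. This is only a sketch: `local_foam_info` and `all_match` are names invented here, and it assumes the OpenFOAM environment has already been sourced in the terminal, so that the `WM_PROJECT_VERSION` variable is set.

```shell
#!/bin/sh
# Print this machine's CPU architecture and the OpenFOAM version of the
# currently sourced environment ("unset" if no OpenFOAM environment is loaded).
local_foam_info() {
    echo "$(uname -m) ${WM_PROJECT_VERSION:-unset}"
}

# Read lines of the form "<machine> <arch> <version>" from standard input and
# succeed only if every machine reports the same arch and version pair.
all_match() {
    awk '{ seen[$2 " " $3] = 1 }
         END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }'
}
```

For example, run `local_foam_info` in a terminal on each laptop, put the results in a small text file with the machine name in front of each line, and pipe that file into `all_match`. With one 32-bit laptop on 2.1.1 and one 64-bit laptop on 2.2.1, as described above, the check would fail.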
|
|
February 25, 2014, 04:20 |
cluster problem
|
#7 |
New Member
Ahsan Javed
Join Date: Oct 2013
Location: Pakistan
Posts: 8
Rep Power: 13 |
Hey wyldckat, I really appreciate your help. I have successfully run an airfoil3d case on two Core i3 machines, and now I have moved to 3 machines. I installed the same Ubuntu (12.04 LTS) and OpenFOAM (2.2.1) versions on all machines, and created an account with the same username and password on each. All machines have the same architecture. My hosts file looks like this:
ahsan cpu=4
alvi cpu=4
aftab cpu=4
In /etc/hosts I added these lines as you instructed:
192.168.1.2 ahsan
192.168.1.3 alvi
192.168.1.10 aftab
All machines ping and ssh successfully. I start mpirun by typing:
mpirun -np 12 --hostfile ./hosts /opt/openfoam221/bin/foamExec simpleFoam -parallel
Now the problem is that when I run my case on a single machine using its 4 cores, it reaches 200 time-steps in 5 min 44 s, while when I run it across 3 machines using 12 cores, it takes 10 min 33 s. I used a hub to connect all PCs together. The decomposition method is "scotch". I decomposed my domain into 12 subdomains by running "decomposePar" on all 3 machines. I don't know where the problem lies. Have I given enough info? Please tell me and I'll provide more.
Last edited by ahsan_smme.nust@yahoo.com; February 25, 2014 at 17:03. Reason: Problem definition updated
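To put a number on those timings: speedup is the single-machine time divided by the cluster time, and anything below 1 means the cluster run is slower. A small sketch using the two times quoted above (the function name `speedup` is just for illustration):

```shell
#!/bin/sh
# speedup <reference seconds> <parallel seconds>
# Prints reference/parallel with two decimals; values below 1 mean a slowdown.
speedup() {
    awk -v ref="$1" -v par="$2" 'BEGIN { printf "%.2f\n", ref / par }'
}

t_single=$((5 * 60 + 44))     # 4 cores on one machine: 5 min 44 s = 344 s
t_cluster=$((10 * 60 + 33))   # 12 cores on three machines: 10 min 33 s = 633 s

speedup "$t_single" "$t_cluster"   # prints 0.54
```

So with three times as many cores the run gets through the same 200 time-steps at roughly half the pace, which suggests the extra time is going into something other than the computation itself.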
|
March 2, 2014, 11:12 |
|
#8 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Ahsan,
Well, throwing more CPUs at a problem doesn't mean that it will solve the problem faster. You need to take into account several details:
If you can describe:
Best regards, Bruno
|
|
March 4, 2014, 04:09 |
Specs
|
#9 |
New Member
Ahsan Javed
Join Date: Oct 2013
Location: Pakistan
Posts: 8
Rep Power: 13 |
Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz 3.30GHz
Installed memory: 8 GB
64-bit operating system, x64-based processor
Ethernet connection: 100 Mbps
RAM: DDR3 1333 MHz
No. of cells in mesh: 1.1 million
Mesh density: (150 150 10)
Check the attachment for ping times.
|
March 4, 2014, 06:05 |
|
#10 | ||
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Ahsan,
Quote:
OK, no problem here.
Good.
Not good. Really not good. Any chance you can change to a Gigabit network, namely 1000 Mbps = 1 Gbps? Because this is the main reason why you're getting such bad timing results. For a sense of perspective: the CPUs would have to be running at 400 to 800 MHz per core for you to notice improvements from using multiple machines. Which, at this point, doesn't make much sense, because each CPU runs at 3300 MHz.
Good enough!
Quote:
This only equates to 225 thousand cells.
0.700 ms is an indicator of pretty bad latency issues. You need latencies of roughly 0.300 to 0.100 ms to achieve relatively good simulation times.
If you cannot change the Ethernet connection to Gigabit, then you will have to resort to another kind of parallel performance: you can run 3 independent simulations at the same time, one on each machine, to test different settings.
If you can change the Ethernet connection to Gigabit and use 2 cores per machine, then I expect that you'll see something like this:
Best regards, Bruno
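The two figures discussed in this reply can be checked from the command line. The sketch below first multiplies out the mesh density (150 150 10), assuming it describes a single block, and then extracts the average round-trip time from the summary line that `ping` prints on Linux. The helper name `ping_avg_ms` is invented here, and the sample summary line in the comment only shows the format; its numbers are not measurements from this thread.

```shell
#!/bin/sh
# Cells in a single block with (150 150 10) divisions.
cells=$((150 * 150 * 10))
echo "$cells cells in total"                        # 225000 cells in total
echo "$((cells / 12)) cells per core on 12 cores"   # 18750 cells per core on 12 cores

# Extract the average round-trip time in ms from ping's summary line,
# which has the form:
#   rtt min/avg/max/mdev = 0.100/0.200/0.300/0.050 ms
ping_avg_ms() {
    awk -F'/' '/^rtt|^round-trip/ { print $5 }'
}

# Usage (assumes a machine named alvi is reachable):
#   ping -c 20 alvi | ping_avg_ms
```

Comparing the printed average against the 0.300 to 0.100 ms range mentioned above gives a quick idea of whether the network is likely to be the bottleneck.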
|
|||
April 2, 2014, 23:23 |
|
#11 |
New Member
liyu
Join Date: Jan 2014
Location: Beijing
Posts: 10
Rep Power: 12 |
Hi Ahsan,
Should the case folder on the different laptops be located in the same path? And where will the results be stored, on the host or on the slave? Thanks.
|
April 3, 2014, 07:30 |
Path
|
#12 |
New Member
Ahsan Javed
Join Date: Oct 2013
Location: Pakistan
Posts: 8
Rep Power: 13 |
The case folder should have the same path on all PCs. If PC1 is the host and each PC runs 4 processes, then the results for processor0 to processor3 will be stored on PC1, those for processor4 to processor7 on PC2, and so on.
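As a rough illustration of that layout: if every machine in the hosts file offers the same number of slots and mpirun fills them in file order (its default slot-by-slot placement, as far as I know), then the machine holding a given processorN directory follows from an integer division. The function name `machine_for_rank` is invented for this sketch.

```shell
#!/bin/sh
# machine_for_rank <rank> <slots per machine>
# Prints the zero-based index of the hosts-file entry that runs the rank,
# assuming ranks are handed out slot by slot, in file order.
machine_for_rank() {
    echo $(( $1 / $2 ))
}

for rank in 0 3 4 7; do
    echo "processor$rank -> machine $(machine_for_rank "$rank" 4)"
done
```

With 4 slots per machine this places processor0 to processor3 on the first machine and processor4 to processor7 on the second. If the case folders are not on a shared drive, the processor directories written on the other machines would need to be copied back into one case folder before running reconstructPar.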
|
|
April 3, 2014, 07:46 |
|
#13 |
New Member
liyu
Join Date: Jan 2014
Location: Beijing
Posts: 10
Rep Power: 12 |
Thank you for your very fast reply!
Another question: is password-less ssh necessary? I have not understood the detailed procedure even after reading the instructions at http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html. I am new to OpenFOAM as well as Linux. Besides, could you show me the "roots" part of your decomposeParDict file? I am not sure what "node" means. Does one node mean one PC? Last edited by ahliyu; April 3, 2014 at 10:42.
|
April 3, 2014, 13:43 |
Ssh
|
#14 |
New Member
Ahsan Javed
Join Date: Oct 2013
Location: Pakistan
Posts: 8
Rep Power: 13 |
I tried password-less ssh but it didn't work for me. As far as I know, OpenFOAM may not start MPI without a password ssh. I'll give you a shortcut that removes all the fuss in making a cluster: create an account with the same username and password on all PCs, give a different IP to each PC and run SSH. That's it, the cluster is ready. About decomposePar, I'll answer that in a while.
Last edited by ahsan_smme.nust@yahoo.com; April 3, 2014 at 13:52. Reason: To Elaborate |
|
April 4, 2014, 00:27 |
help~
|
#15 | |
New Member
liyu
Join Date: Jan 2014
Location: Beijing
Posts: 10
Rep Power: 12 |
Quote:
I have learnt a lot from your post, so thank you very much for your work! I have several questions about running in parallel on two PCs (actually, two workstations).
1. You mentioned the username. If the two machines have the same username, is it still necessary for me to edit the "roots" entry in decomposeParDict? Besides, as far as I understand, one node means one PC? I have 2 workstations and each of them has 8 cores, so should the "n" in "roots n" be 1 or 15?
2. I still fail to understand how to get password-less SSH, even after reading the instructions at http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html. I am new to Linux. Is it possible to avoid the password-less SSH setup?
Thank you again.
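For reference, the usual OpenSSH route to password-less login is short: create a key pair on the machine where mpirun is started, then copy the public key to the other machine. The sketch below wraps those two steps in a function; the name `setup_passwordless_ssh` and the user and machine names in the usage comment are placeholders, and it assumes `ssh-keygen` and `ssh-copy-id` from the standard OpenSSH client package are installed.

```shell
#!/bin/sh
# setup_passwordless_ssh <user@machine>
# Creates an RSA key pair without a passphrase if none exists yet, then
# appends the public key to the remote account's authorized_keys file.
# ssh-copy-id asks for the remote password one last time.
setup_passwordless_ssh() {
    key="$HOME/.ssh/id_rsa"
    if [ ! -f "$key" ]; then
        mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
        ssh-keygen -t rsa -N "" -f "$key"
    fi
    ssh-copy-id -i "$key.pub" "$1"
}

# Usage (user and machine names are placeholders):
#   setup_passwordless_ssh liyu@machine2
#   ssh liyu@machine2 hostname   # should now answer without a password prompt
```

The function would be run once per remote machine, from the machine that launches mpirun.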
||
April 5, 2014, 21:26 |
|
#16 | |||||
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings to all!
@ahliyu: Quote:
Quote:
Yes. Usually the nodes are referred to:
edit: I moved the post above from the other quoted thread, to keep the discussion in the same thread here. Quote:
Quote:
Quote:
Best regards, Bruno
Last edited by wyldckat; April 5, 2014 at 21:32. Reason: see "edit:" |