Something weird encountered when running OpenFOAM in parallel on multiple nodes |
May 2, 2013, 05:03 |
|
#1 |
New Member
Qiu Xiaoping
Join Date: Apr 2013
Location: IPE CAS China
Posts: 14
Rep Power: 14 |
Hello everyone,
Recently I ran into something odd when running OpenFOAM in parallel on multiple nodes. I have two nodes (A123 and A122), each with 8 cores. After a lot of trial and error, my case ran successfully in parallel with 16 processes. However, when I tried to monitor the network traffic between A123 and A122 during the run, I saw something I think is weird. Let me describe my situation in detail; I hope you can give me some advice.

These are the steps I followed to set up the parallel run on multiple nodes:

1. Both A123 and A122 run 64-bit CentOS 5.4 with OpenFOAM 2.1.1 installed. (The OpenFOAM package was downloaded from centFOAM: http://sourceforge.net/projects/centfoam/files/5.x/. I just decompressed the tarball and sourced the bashrc in ../etc/. The ThirdParty package was used, with Open MPI version 1.5.3.) With this setup OpenFOAM works fine: both serial and parallel cases ran successfully on a single node.

2. I changed the ssh settings so that A123 and A122 can log in to each other without a password (i.e. on A123 I can log in to A122 by typing "ssh A122", and vice versa, with no password prompt).

3. I made a directory named "shares" on both A123 and A122 (mkdir -p ~/shares), and on A122 I used "mount" to mount the shares directory of A123 onto "~/shares" (mount -t nfs A123:$HOME/shares ~/shares, as root). So the directory "~/shares" on A123 is shared by A123 and A122.

4. I made a hostfile (named "hosts_2-8") with the following contents:

    A123 cpu=8
    A122 cpu=8

5. I copied my case to "~/shares" on A123, ran blockMesh, set up decomposeParDict, ran decomposePar, and finally ran "mpirun --hostfile hosts_2-8 -np 16 pisoFoam -parallel".

So far everything looked fine. From the log I can see that the CFD domain was split into 16 subdomains and the case was computed with 16 processes (8 on A123 and 8 on A122; one of the processes on A123 was the master, the rest were slaves).
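The hostfile and launch line described above can be sanity-checked before starting the run. What follows is only a minimal sketch, assuming the "host cpu=N" hostfile form quoted in the post. The helper name count_slots is hypothetical (it is not part of Open MPI or OpenFOAM), and the hostfile is written to a scratch directory rather than to the case directory. The mpirun line is printed, not executed.

```shell
#!/bin/sh
# Sum the per-host core counts in a hostfile that uses the "host cpu=N" form
# quoted in the post. count_slots is a hypothetical helper, not an Open MPI tool.
count_slots() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^cpu=[0-9]+$/) {
                split($i, kv, "=")
                total += kv[2]
            }
        }
    } END { print total + 0 }' "$1"
}

# Recreate the hostfile from the post in a scratch directory (assumption:
# one host per line, as Open MPI hostfiles are normally written).
workdir=$(mktemp -d)
hostfile="$workdir/hosts_2-8"
printf 'A123 cpu=8\nA122 cpu=8\n' > "$hostfile"

np=$(count_slots "$hostfile")
echo "hostfile provides $np slots"

# The launch line from the post, printed here instead of being run.
echo "mpirun --hostfile hosts_2-8 -np $np pisoFoam -parallel"
```

The point of the check is simply that the number passed to -np and the number of subdomains in decomposeParDict should both agree with the total the hostfile provides.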
Then I tried to monitor the network traffic between A123 and A122. I installed and ran "iftop" (http://www.ex-parrot.com/~pdw/iftop/) on A122, and it showed that packets were exchanged between A123 and A122 only at the moments when the run began and ended. In between, not even a byte of data exchange could be observed!

As far as I know, when OpenFOAM runs in parallel there should be data exchange between the processes, so there should be network traffic between A123 and A122 the whole time. Did I make a mistake, either in setting up the parallel run on multiple nodes or in my understanding of MPI? Or should I use different software to monitor the network traffic? By the way, I also tried Wireshark, and the result was the same. I even tried pulling out the Ethernet cable during the run, and the run aborted, as expected.

I hope you can give me some advice. Thank you!

xpqiu
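One way to cross-check a traffic monitor is to read the kernel's own per-interface byte counters before and after a short interval of the run. The sketch below assumes a Linux node where /proc/net/dev is available. The helper names iface_bytes and bytes_delta are hypothetical, and eth0 in the usage comment is an assumed interface name that may differ on A123 and A122.

```shell
#!/bin/sh
# Print "rx_bytes tx_bytes" for one interface from a /proc/net/dev style file.
# iface_bytes is a hypothetical helper; the second argument defaults to the
# live kernel counters on Linux.
iface_bytes() {
    iface=$1
    file=${2:-/proc/net/dev}
    # Some kernels print "eth0:12345" with no space after the colon,
    # so turn the first colon into a space before splitting into fields.
    sed 's/:/ /' "$file" | awk -v ifc="$iface" '$1 == ifc { print $2, $10 }'
}

# Given two "rx tx" samples, print "rx_delta tx_delta".
bytes_delta() {
    echo "$1 $2" | awk '{ print $3 - $1, $4 - $2 }'
}

# Example usage while the solver is running (eth0 is an assumed name):
#   before=$(iface_bytes eth0); sleep 10; after=$(iface_bytes eth0)
#   bytes_delta "$before" "$after"
```

If a node has more than one network interface, repeating the measurement for each interface listed in /proc/net/dev shows which one, if any, carries traffic during the run.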
|
May 2, 2013, 05:30 |
|
#2 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
I guess one possible explanation is that Open MPI might not use TCP/IP (actually SSH over TCP/IP) for anything other than setting up and tearing down network links at a lower layer of the OSI model. Establishing low-level data links can give much better performance than using IP-based protocols. Such communication might not be "spotted" by iftop or similar software, which probably sits on top of the IP protocol.
Do the lights on your network interfaces (on the back of the machine) blink? If so, I guess everything is working. If not, you should check that the computations are running on both machines, and not just one. Try running ps -A | grep Foam on both machines to list all running OpenFOAM applications. Both machines should show the same number of processes (8).
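The process check suggested above can be run from one node for both machines. This is a rough sketch, assuming the passwordless ssh setup and the host names A123 and A122 from the first post. The helpers count_solver and report_hosts are hypothetical names, and report_hosts is only defined here, not called.

```shell
#!/bin/sh
# Count the lines on standard input that contain the given solver name.
# count_solver is a hypothetical helper wrapping the grep from the reply.
count_solver() {
    grep -c "$1"
}

# Print the number of solver processes found on each host.
# Assumes passwordless ssh to every host named in the arguments.
report_hosts() {
    solver=$1
    shift
    for host in "$@"; do
        n=$(ssh "$host" ps -A | count_solver "$solver")
        echo "$host: $n $solver processes"
    done
}

# Example call, to be run while the case is running:
#   report_hosts pisoFoam A123 A122
```

Because the grep runs on the local side of the ssh pipe, it does not show up in the remote process list and so does not inflate the count.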
|
May 2, 2013, 05:59 |
|
#3 |
New Member
Qiu Xiaoping
Join Date: Apr 2013
Location: IPE CAS China
Posts: 14
Rep Power: 14 |
Thanks haakon,
The network works fine, and after typing "ps -A | grep Foam" I can see 8 processes named "pisoFoam" on both A123 and A122, so I think the run is normal. As for the protocol, I am not familiar with that; I will search the net. Thank you for the information.

xpqiu
|
Tags |
multiple nodes, network traffic, parallel computation |