MPI issue on multiple nodes

sail · August 23, 2012, 09:42

Dear All,

I'm setting up a mini cluster (4 nodes) to decrease the solution time with Openfoam.

the OS is ubuntu 12.4LTS

there is no scheduler (i know that this is not the best course, but...)

i've NFS exported my home on the slaves and have passwordless ssh access. the same version of OF, paraview and thirdparties is installed on every machine via script.

i've created the machinefile using the ip of the slaves:

Code:

192.168.0.17
192.168.0.19
192.168.0.21
192.168.0.23

the first is the master and the rest are the slaves. althoug every machine have 4 cores, i've set it like this to keep things simple.

to test the config i meshed, and decomposed the motorbike tutorial, but when i lauch the command

Code:

mpirun -np 4 --hostfile [my machinefile] simpleFoam -parallel > log

i got the following error:

Code:

--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not find an executable:

Executable: simpleFoam
Node: 192.168.0.19

while attempting to start process rank 1.
--------------------------------------------------------------------------

i've searched the forum and google for help, especially i've seen those two threads: http://www.cfd-online.com/Forums/ope...am-solved.html and http://www.cfd-online.com/Forums/ope...tml#post297100 and the relevatn links, but to no avail.

to my uninformed judgement it looks that it cannot find the simplefoam file, but, exporting the home to all the nodes, i shuld already have all the necessary informations in the .bashrc file. and i do: if i ssh to a node and check it, the last line shows

Code:

source /opt/openfoam211/etc/bashrc

any idea about what i'm missing?

thanks in advance

sail · August 23, 2012, 13:34

quick update:

moving the "source /opt/openfoam211/etc/bashrc" as a first line solved the issue.

now lauching the case works-kind of.

tailing the log shuows tat it goes no furhter than the "create time" step, but top on the master node and the slaves shows the processes alive and running @ 100%. ideas?

wyldckat · August 23, 2012, 15:49

Greetings Vieri,

Unfortunately I don't much time to help diagnose the issue, so I'll have to refer you to this quote from my signature link:

Quote:

On how to test if MPI is working: post #4 of "openfoam 1.6 on debian etch", and/or post #19 of "OpenFOAM updates" - Note: As of OpenFOAM 2.0.0, the application "parallelTest" is now called "Test-parallel".
Notes about running OpenFOAM in parallel

Best regards,
Bruno

sail · August 24, 2012, 22:41

still no success.

any hello world or other mpi program does work, so it is not a mpi or network issue.

but the foam job get does not goes past the create time:

Code:

foamJob -p -s simpleFoam
Parallel processing using SYSTEMOPENMPI with 2 processors
Executing: /usr/bin/mpirun -np 2 -hostfile machines /opt/openfoam211/bin/foamExec -prefix /opt simpleFoam -parallel | tee log
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.1.1                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 2.1.1-221db2718bbb
Exec   : simpleFoam -parallel
Date   : Aug 25 2012
Time   : 03:32:58
Host   : "Milano1"
PID    : 25714
Case   : /home/cfduser/OpenFOAM/cfduser-2.1.1/run/vieri/tutorials/tutorials/incompressible/simpleFoam/motorBike
nProcs : 2
Slaves : 
1
(
"Milano4.21614"
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Disallowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

the processes are spawn on the master and slaves node at 100% cpu.

the case works well both in single and on multicore on a single machine, on both master and any of the slaves.

any ideas?

sail · August 24, 2012, 22:52

same thing using the application Test-parallel gives the same result.

wyldckat · August 25, 2012, 05:31

Hi Vieri,

Weekend is here, so it'll be easier to start helping you.

OK, so you've got 4 machines that are unable to communicate with each other on your cluster. All of them have OpenFOAM installed in the same folder, or at least the folder is shared where the installation is.

One of the reasons for this lock-in to occur is if there is more than one way for each machine to access any network. My guess is that the master node has at least 2 ethernet cards, one for outside world and another for cluster network.
Therefore, to try and check if this is indeed the problem, you can log in into one of the cluster nodes and launch the Test-parallel application using only two of those nodes.

If that works, try using 3 nodes, without the master. If that works, then the problem is indeed because the master has two cards and Open-MPI gets lost in the master while trying to search for other ways to access the nodes.

To override this behaviour, check what ethernet interfaces you've got on the master node:

Code:

ifconfig -a

Names to look for are: eth0, eth1, wlan0 and so on...

You should also confirm the interface names used in the slave nodes.

Now, edit the foamJob script - or create your own copy of it - find this block of code and add the line in bold:

Code:

#
# locate mpirun
#
mpirun=`findExec mpirun` || usage "'mpirun' not found"
mpiopts="-np $NPROCS"

mpiopts="$mpiopts --mca btl_tcp_if_exclude lo,eth1"

The lo is the loopback connection, where the machine can contact itself. eth1 is assuming this is the interface for external contact with the world.

Hopefully this will do the trick.

Best regards,
Bruno

sail · August 27, 2012, 20:33

Quote:

Originally Posted by wyldckat

Hi Vieri,

Weekend is here, so it'll be easier to start helping you.

Hi Bruno,

First of all, thanks for taking your well deserved free-time to help me.

Quote:

Originally Posted by wyldckat

OK, so you've got 4 machines that are unable to communicate with each other on your cluster. All of them have OpenFOAM installed in the same folder, or at least the folder is shared where the installation is.

correct, same installation dir.

Quote:

Originally Posted by wyldckat

One of the reasons for this lock-in to occur is if there is more than one way for each machine to access any network. My guess is that the master node has at least 2 ethernet cards, one for outside world and another for cluster network.
Therefore, to try and check if this is indeed the problem, you can log in into one of the cluster nodes and launch the Test-parallel application using only two of those nodes.

I tried to do so, but the result is the same as before: it hangs at the "create time" step. I must notice that running that from a slave to another it ask for the password, but once i've typed it, the same behaviour appears.

Quote:

Originally Posted by wyldckat

If that works, try using 3 nodes, without the master. If that works, then the problem is indeed because the master has two cards and Open-MPI gets lost in the master while trying to search for other ways to access the nodes.

To override this behaviour, check what ethernet interfaces you've got on the master node:

Code:

ifconfig -a

Names to look for are: eth0, eth1, wlan0 and so on...

You should also confirm the interface names used in the slave nodes.

each of my machines, slave and master altogherther, have two NICs: an eth0 which is actually connected to the switch and have the ip assigned, plus an eth1 wich is recognized by the system but not plugged and used.
I've positively checked that all the used network are called eth0.

Quote:

Originally Posted by wyldckat

Now, edit the foamJob script - or create your own copy of it - find this block of code and add the line in bold:

Code:

#
# locate mpirun
#
mpirun=`findExec mpirun` || usage "'mpirun' not found"
mpiopts="-np $NPROCS"

mpiopts="$mpiopts --mca btl_tcp_if_exclude lo,eth1"

The lo is the loopback connection, where the machine can contact itself. eth1 is assuming this is the interface for external contact with the world.

I've tried doing as you suggested, plus various combinations to cover every possibility:

mpiopts="$mpiopts --mca btl_tcp_if_exclude lo,eth1" and
mpiopts="$mpiopts --mca btl_tcp_if_exclude eth1"

give the same behaviour: stuck at "create Time" phase.

mpiopts="$mpiopts --mca btl_tcp_if_exclude lo,eth0" and
mpiopts="$mpiopts --mca btl_tcp_if_exclude eth0"

don't even get till there: as it is expected, we excluded the connection, so it doesn't even start; this is the output to where it is stuck:

Code:

./foam2Job -p -s Test-parallel
Parallel processing using SYSTEMOPENMPI with 2 processors
Executing: /usr/bin/mpirun -np 2 --mca btl_tcp_if_exclude lo,eth0 -hostfile machines /opt/openfoam211/bin/foamExec -prefix /opt Test-parallel -parallel | tee log
cfduser@192.168.0.23's password: 
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.1.1                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 2.1.1-221db2718bbb
Exec   : Test-parallel -parallel
Date   : Aug 28 2012
Time   : 01:11:00
Host   : "Milano2"
PID    : 22821

Quote:

Originally Posted by wyldckat

Hopefully this will do the trick.

Best regards,
Bruno

YES!!! YOU NAILED IT!

i spent some hours to do some testing, and now, just while i was repling to you, i found it!

ifconfig higlighted the presence of another network infetface called usb0 (i have not idea what it is). adding it to the parameters to be exclueded in the new foamJob script made everything work. I guess openMpi is really greedy for tcp connections! i guess it wasn't advancing because it was waiting for an answer on this misterious usb0 device.

thanks again, you are relly one of the columns of this forum and the time you spend helping users and (wannabe) sysadmin is really appreciated!

words fail me to express my gratitude.

best regards.

wyldckat · August 28, 2012, 10:50

Hi Vieri,

Quote:

Originally Posted by sail

ifconfig higlighted the presence of another network infetface called usb0 (i have not idea what it is). adding it to the parameters to be exclueded in the new foamJob script made everything work. I guess openMpi is really greedy for tcp connections! i guess it wasn't advancing because it was waiting for an answer on this misterious usb0 device.

Open-MPI does have a greedy algorithm, for using the maximum amount of available resources available for the optimum connections. I've read before that Open-MPI 1.2.8 and an old Debian installation, it was able to use 2 ethernet cards per machine, where each card had an IP in the same subnet. But nowadays it requires some additional configuration that I'm not aware of.
Basically, if you have 2 cards, both turned on and in the same subnet (same machine could have 10.0.0.1 and 10.0.0.2, one for each card); but for example, when you want to ping another machine, you need to specify an interface:

Code:

ping -I eth1 10.0.0.3

Otherwise, it uses by default the first card that is listed. Open-MPI was meant to take advantage of this dual network interface per machine, but like I said, I don't know how it works nowadays.

Quote:

Originally Posted by sail

thanks again, you are relly one of the columns of this forum and the time you spend helping users and (wannabe) sysadmin is really appreciated!

words fail me to express my gratitude.

best regards.

You're welcome

This way I think I now have on record at least 3 cases where this solution works

Best regards,
Bruno

sail · September 7, 2012, 06:20

Up!

Unfortunately another issue has arisen.

if i launch a job using all the 16 cores of my 4- machines cluster it gets stuck.

i see on the master node a cpu utilization of 25% in userspace and 75% in systemspace, while the slaves are 50% waiting for IO.

this behaviour happens even if the master node has just one processor occupied by the job.

it looks to me, but i might be wrong, that the master have issues serving the nfs shared directory to all the nodes because if i first copy the case data locally on the slaves, using the informations provided here: http://www.cfd-online.com/Forums/blo...h-process.html it works flawlessi and is really fast.

i then tried to stresstest the nfs executing the following script on a slave but it works flawlessy.

Code:

time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile1 bs=16k count=10000 &
 time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile2 bs=16k count=10000 &
 time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile3 bs=16k count=10000 &
 time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile4 bs=16k count=10000 &
 time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile5 bs=16k count=10000 &
 time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile6 bs=16k count=10000 &
 time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile7 bs=16k count=10000 &
 time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile8 bs=16k count=10000 &
 time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile9 bs=16k count=10000

even increasing the number of nfs daemons on the server to an higer value (32) produced no effect. i see lots of nfsd processes, using high values of cpu utilization, and the procedure goes well.

on the other hand when i lauch the job i see just 2 nfsd at 1% cpu on the master.

running a 4-cores job using one core each machine works well,

running a 4-cores job using the master, a slave with 2 cores and a slave with 1 core is slower.

running a 5-cores job with a master, and 2 2-cores slaves is even slower/gets stuck.:

master - top:

Code:

Tasks: 213 total,   2 running, 206 sleeping,   0 stopped,   5 zombie
Cpu(s):  5.0%us, 20.8%sy,  0.0%ni, 73.9%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  16423888k total, 15999036k used,   424852k free,       52k buffers
Swap: 15624996k total,  1469876k used, 14155120k free,   213596k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
10123 cfduser   20   0  322m 156m  15m R  100  1.0   4:32.87 pimpleDyMFoam      
 2200 root      20   0     0    0    0 D    1  0.0   0:10.33 nfsd               
 2202 root      20   0     0    0    0 D    1  0.0   0:26.47 nfsd               
 9515 root      20   0     0    0    0 S    1  0.0   0:01.24 kworker/0:2        
10302 root      20   0     0    0    0 S    1  0.0   0:00.63 kworker/0:3        
10301 root      20   0     0    0    0 S    0  0.0   0:00.34 kworker/u:1

slave - top:

Code:

Tasks: 145 total,   1 running, 141 sleeping,   0 stopped,   3 zombie
Cpu(s):  0.0%us,  0.3%sy,  0.0%ni, 49.7%id, 50.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16423888k total, 12310280k used,  4113608k free,      580k buffers
Swap: 15624996k total,     3100k used, 15621896k free, 11622444k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 1695 root      20   0     0    0    0 S    1  0.0   0:01.28 kworker/1:0        
 4124 root      20   0     0    0    0 D    1  0.0   0:00.56 192.168.0.17-ma    
  911 cfduser   20   0 17340 1332  968 R    0  0.0   0:00.51 top                
 4658 lightdm   20   0  744m  18m  12m S    0  0.1  86:52.63 unity-greeter      
    1 root      20   0 24592 2364 1300 S    0  0.0   0:01.56 init               
    2 root      20   0     0    0    0 S    0  0.0   0:00.56 kthreadd

the case is big enough to let the communication part to be neglegible in comparison to the computing part (5 M cells)

any ideas on how to fix it?

up to now all the comm on the network happens on gb ethernet. tests shows that the trasfer rate is reasonable (60MB/s). i have another ethernet card in every machine, up to now unused. shuld i activate it, assign a different set of IPs and use one for nfs traffic and another for MPI comm? do you belive this might be the problem?

any other options or flags I'm missing?

thanks in advance

best regards

wyldckat · September 8, 2012, 16:50

Hi Vieri,

NFS locking up... sounds familiar, but I can't remember if I ever found out the fix for it...

But before I forget, here's a link with some hints on improving NFS performance: http://scotgrid.blogspot.pt/2009/11/...guide-for.html

Now, let's try diagnosing this another way (hint: jump to #7, then start from the top if it didn't work

):

Check with another solver. Dynamic mesh solvers tend to be more unstable and you might be triggering a bug.
I suggest two tutorials:
- "multiphase/interFoam/laminar/damBreakFine/" - this one only appears if you run Allrun on the parent folder.
  This case is simple enough for a quick test, which doesn't weight much on the NFS system.
  You can gradually increase the mesh resolution, in an attempt to isolate the load level at which it breaks.
- "incompressible/pisoFoam/les/motorBike/motorBike/" - this one is good and heavy and is good for testing parallel meshing as well
In the previous two cases, try setting a larger "writeInterval" in "controlDict", for determining if it locks up when it tries to write down the files.
If it locks up at the reading stage, try this: log into each node and copy the folder onto the local storage, doesn't matter where to. Now, don't close the connection and try running the case without any changes.
This way the files are pre-loaded into memory cache and don't need to access the NFS as much.
Which MPI are you using? Is it Open-MPI 1.5.3? It's the default that comes with OpenFOAM 2.0 and 2.1? Because I remember there was a problem with that version... here we go: http://www.cfd-online.com/Forums/ope...s-openmpi.html
The problem shouldn't be a bottleneck related issue. You can try running the case using the separate paths method that worked before and then during that, copy files to each machine at the same time, using one terminal connection per transfer.

In OpenFOAM's main "etc" folder, you'll find a "controlDict" file. Edit it and check if you are using these default settings:

Code:

OptimisationSwitches
{
    fileModificationSkew 10;

    //- Modification checking:
    //  - timeStamp         : use modification time on file
    //  - inotify           : use inotify framework
    //  - timeStampMaster   : do time stamp (and file reading) only on master.
    //  - inotifyMaster     : do inotify (and file reading) only on master.
    fileModificationChecking timeStampMaster;//inotify;timeStamp;inotifyMaster;

    commsType       nonBlocking; //scheduled; //blocking;
    floatTransfer   0;
    nProcsSimpleSum 0;

    // Force dumping (at next timestep) upon signal (-1 to disable)
    writeNowSignal              -1; //10;
    // Force dumping (at next timestep) upon signal (-1 to disable) and exit
    stopAtWriteNowSignal        -1;
}

As you can see, "fileModificationChecking" is set to allow only the master to check for file changes, which provides for a very good way to keep NFS accesses at a low level.
The "commsType" gives control on how parallel processes wait for each other... which should allow them to not block.

It's possible that the MPI is messing around the same ports as the NFS. Related to this, I've stumbled on this post: http://samritmaity.wordpress.com/201...abit-ethernet/ - quoting the solution:

Quote:

After a lot of trial-and-error, I zeroed in at conclusion that as each of the compute node of my cluster is having multiple Ethernet connectivity port [ eth0,eth1,eth2 ] , the OpenMPI is confused at the time of HPL-MPI communication[ send/receive ] about – which port to use for packet transfer. This understanding let me to Mailing List Archives which talks about MCA flag “btl_tcp_if_include eth0”. So I decided to give it a try – and surprisngly this solved my problem. The final command is as follows

Code:

mpiexec  -mca btl_tcp_if_include eth0   -np 16  -hostfile hostfile.txt   ./xhpl

As for assigning traffic via different cards: AFAIK, it all depends on the kernel (configurations) you're using. From my experience with openSUSE and it's default Linux kernel, I'm never able to properly use two networks for connecting a group of machines, because the network package manager always sends packages through the shortest path, namely use the first "eth*" it can use for sending packages between two machines, even if it means automagically pairing IP addresses on the same NIC.

Nonetheless, you might be luckier than I've been so far, if you follow these instructions: http://www.open-mpi.org/community/li...9/12/11544.php

Best regards,
Bruno

Giuliano69 · September 13, 2012, 03:58

Hi Wildkat !
I'm the sysamdin of the cluster where the program is running.
First of all thanks a lot for your huge help.

The cluster is formed by 4 ibm systemx xeon ws. The os is Ubuntu 12.04 64 bit.With nfs3+Openfoam2.1.1+OpenMPI1.4.3
Each ws has 2 eth cards, but card eth1 is disconnected, and the MPI foam2Job disable the eth1 and usb0 port

I've had to downgrade the nfs4 to nfs3, because with nfs4 the motorbike testcase locked.

----------------------------
nfs3 machine configuration:
SERVER - nfs
/etc/exports :
192.168.0.19(rw,sync,root_squash,no_subtree_check) 192.168.0.21(rw,sync,root_squash,no_subtree_check) 192.168.0.23(rw,sync,root_squash,no_subtree_check)

CLIENT - nfs 3 version
/etc/fstab
192.168.0.17:/home/cfduser/OpenFOAM /home/cfduser/OpenFOAM/ nfs _netdev,nfsvers=3,proto=tcp,noac,auto 0 0
-----------------------------------------

Unfortunately, _our_ test case is still not running (the tutorials on MPI do run); I give a reply to the topic you proposed, to see what we could try.

1) incompressible/pisoFoam/les/motorBike/motorBike/ works like a charme. I attach a screen shot of the nmon of the four ws running.
After success in motorbike mpi run, we thought that the nfs problem was over.... nope :-(
http://ubuntuone.com/6VdREYUP3lU48ii1P6ndvO
2) didn't change writeInterval in controlDict (of the motorbike tutorial), because it worked out of the box
4) Open mpi come in version 1.4.3 together with OpenFoam with the SGI package. In Ubuntu 12.04 MPI ver 1.5 is available as separate package, but cause disinstallation of OpenFoam :-)). Anyway from what I read in your link, ver 1.4.3 should be safer
5) our testcase is some GB in size, so I fear it should overlap the cache
6) /opt/openfoam211/etc/controlDict is the same you posted
7) as motorbike is working flawlessly, I fear that is not (mainly) an nfs configuration problem.
Anyway it happens that, if we run our testcase letting it fail, the master keeps two nfsd process alive (with some nwtk activities going on... why?? ) and running then motorbike, lead to failure (motorbike stuked as our case !) I attach two screenshots more (our case failing/locked, and the 2 nfsd zombie after killing the foam2Job with ctrl-c)
http://ubuntuone.com/11t2ZT9VIENlM1etMHbpU1
http://ubuntuone.com/6Y1Kt32czcNMYtRSrKveEW

As can be seen by the screen shot our test case keeps node 3 and 4 in waiting. Note 2 is working (?)
Maybe related in some way to the code, i.e in the order from which the write and read funtion are called, causing a deadlock ?
Maybe the node2 keeping a file still open, causing nodes 3 and 4 locked out ? The problem arise from the fist seconds....

Seeing the screenshots, which other test you would consider ?
Which debug code we could insert in the code to narrow the point where the problem is generated ?

Thanks again for your huge help.

Giuliano69 · September 14, 2012, 11:18

Update:

Result n° 1
at a fist debug, the problem seems to arise form the AMI mesh that has been used.
The code get stucked at #include "createDynamicFvMesh.H"

that is:

Code:

debugPimpleDyMFoam.C

int main(int argc, char *argv[])
{
Info << "-- STARTING ALL"<< endl;
    #include "setRootCase.H"
Info << "-- setRoot done, before createTime"<< endl;
    #include "createTime.H"
Info << "-- createTime done, before createDynmicMesh" << endl;
    #include "createDynamicFvMesh.H"
Info << "-- creteDynMesh done, before initCont" << endl;
    #include "initContinuityErrs.H"
Info << "-- Init cont done before createFields" << endl;
    #include "createFields.H"
Info << "-- createFields done, before readTimeControls" << endl;
    #include "readTimeControls.H"
Info << "-- redTime done before pimpleControl" << endl;
    pimpleControl pimple(mesh);

    // *

Digging into the "createDynamicFvMesh.H", would require to modify the OF library, but seems related to line 04

Code:

createDynamicFvMesh.H
00001     Info<< "Create mesh for time = "
00002         << runTime.timeName() << nl << endl;
00003 
00004     autoPtr<dynamicFvMesh> meshPtr
00005     (
00006         dynamicFvMesh::New
00007         (
00008             IOobject
00009             (
00010                 dynamicFvMesh::defaultRegion,
00011                 runTime.timeName(),
00012                 runTime,
00013                 IOobject::MUST_READ
00014             )
00015         )
00016     );
00017 
00018     dynamicFvMesh& mesh = meshPtr()

Result n° 2
After this debug, the ideas comes to try a vanilla example with AMI mesh like "mixerVesselAMI2D", on a parallel cluster.
motorbike works on parallel, but mixerVesselAMI2D -that use AMI mesh- DOESN'T, and gets stacked in the same way of our case.

Could it be a problem of the AMI mesh when in a parallel cluster ?

wyldckat · September 15, 2012, 09:11

Greetings Giuliano,

OK, I've read through your posts here and will follow up at your other thread: http://www.cfd-online.com/Forums/ope...ed-anyone.html

Best regards,
Bruno

Giuliano69 · October 12, 2012, 12:24

Thanks wildkat for your kind help.

After some test&debug, it was clear that ONLY when node n°4 was working in MPI, we had problems with the MPI lock.

Although the installation was done from with a bash script, the only solution was to try a complete format&reinstall of node n° 4.

after that, everything worked.
Strange but...

Giuliano69 · October 12, 2012, 12:29

I would like also to share the following experience:

under ubuntu 12.04 64bit, BOTH the following nfs configuration (CLIENT side) was found to be fully working with MPI (/etc/fstab configuration) (I mean that it woks EITHER as nfs 3 OR as nfs 4)

192.168.0.17:/home/cfduser/OpenFOAM /home/cfduser/OpenFOAM/ nfs4 _netdev,auto 0 0
#192.168.0.17:/home/cfduser/OpenFOAM /home/cfduser/OpenFOAM/ nfs _netdev,nfsvers=3,proto=tcp,noac,auto 0 0

on the SERVER side, this is the etc/exports configuration
/home/cfduser/OpenFOAM 192.168.0.19(rw,sync,root_squash,no_subtree_check) 192.168.0.21(rw,sync,root_squash,no_subtree_check) 192.168.0.23(rw,sync,root_squash,no_subtree_check)

At the PRESENT TIME, the used configuration is nfs ver 4

In case could help someone....

wyldckat · October 12, 2012, 17:32

Hi Giuliano,

Quote:

Originally Posted by Giuliano69

Although the installation was done from with a bash script, the only solution was to try a complete format&reinstall of node n° 4.

The installers are usually never full proof. There is always some weird thing that happens that the installer isn't expecting and then it breaks something... it could be a byte misplaced (RAM or hard-drive minor flaw), a race condition (something happened before it should have) or some other weird thing, like a mild power failure that doesn't lead to a power down... the damage occurs, but goes on as if nothing happened.

And many thanks for sharing the info on how to configure NFS on Ubuntu. Personally I rarely am able to configure Ubuntu to use NFS... but thank goodness that openSUSE exists

otherwise I would be missing more hair on my head...

Best regards,
Bruno

ehsan · August 27, 2013, 13:15

Hello

We are running interPhaseChangeFoam in parallel using 24 nodes. The run starts fine and go for some times but afterwards, one of our systems does not contribute in the communications process and the run encounters the deadlock. Could you please help us in this regards?

Sincerely,
Ehsan

wyldckat · August 27, 2013, 18:00

Greetings Ehsan,

Did you try following the instructions present on this thread?

Best regards,
Bruno

ehsan · August 27, 2013, 22:59

Hello Burno

the problem is that whether the stop of the run is related to the system set-ups that discussed in this page or it is something with the interPhaseChangeFoam solver being run in parallel?

In fact, one of our system stop contributing after some runs, say after 1000 or 2000 time steps, if we stop the run and continue again, the run goes ahead until again encountering the same problem. However, we used this system for parallel run of other solvers without problem and without this stops.

So, is it possible that problem arise from interPhaseChangeFoam setups in fvSolution or elsewhere?

Regards

ehsan · August 28, 2013, 07:24

Hello

We detected that the problem is that one system goes out of connection from the network, i.e., once we ping it, it won't reply. It is odd that at the start, it goes fine but after some iterations it stop working in the network.
Would you please help me in this regards?

Thanks

August 23, 2012, 09:42	MPI issue on multiple nodes	#1
sail Senior Member Vieri Abolaffio Join Date: Jul 2010 Location: Always on the move. Posts: 308 Rep Power: 17	Dear All, I'm setting up a mini cluster (4 nodes) to decrease the solution time with Openfoam. the OS is ubuntu 12.4LTS there is no scheduler (i know that this is not the best course, but...) i've NFS exported my home on the slaves and have passwordless ssh access. the same version of OF, paraview and thirdparties is installed on every machine via script. i've created the machinefile using the ip of the slaves: Code: 192.168.0.17 192.168.0.19 192.168.0.21 192.168.0.23 the first is the master and the rest are the slaves. althoug every machine have 4 cores, i've set it like this to keep things simple. to test the config i meshed, and decomposed the motorbike tutorial, but when i lauch the command Code: mpirun -np 4 --hostfile [my machinefile] simpleFoam -parallel > log i got the following error: Code: -------------------------------------------------------------------------- mpirun was unable to launch the specified application as it could not find an executable: Executable: simpleFoam Node: 192.168.0.19 while attempting to start process rank 1. -------------------------------------------------------------------------- i've searched the forum and google for help, especially i've seen those two threads: http://www.cfd-online.com/Forums/ope...am-solved.html and http://www.cfd-online.com/Forums/ope...tml#post297100 and the relevatn links, but to no avail. to my uninformed judgement it looks that it cannot find the simplefoam file, but, exporting the home to all the nodes, i shuld already have all the necessary informations in the .bashrc file. and i do: if i ssh to a node and check it, the last line shows Code: source /opt/openfoam211/etc/bashrc any idea about what i'm missing? thanks in advance __________________ http://www.leadingedge.it/ Naval architecture and CFD consultancy

August 23, 2012, 13:34		#2
sail Senior Member Vieri Abolaffio Join Date: Jul 2010 Location: Always on the move. Posts: 308 Rep Power: 17	quick update: moving the "source /opt/openfoam211/etc/bashrc" as a first line solved the issue. now lauching the case works-kind of. tailing the log shuows tat it goes no furhter than the "create time" step, but top on the master node and the slaves shows the processes alive and running @ 100%. ideas? krikre likes this. __________________ http://www.leadingedge.it/ Naval architecture and CFD consultancy Last edited by sail; August 23, 2012 at 14:07.

August 24, 2012, 22:52		#5
sail Senior Member Vieri Abolaffio Join Date: Jul 2010 Location: Always on the move. Posts: 308 Rep Power: 17	same thing using the application Test-parallel gives the same result. __________________ http://www.leadingedge.it/ Naval architecture and CFD consultancy

August 25, 2012, 05:31		#6
wyldckat Retired Super Moderator Bruno Santos Join Date: Mar 2009 Location: Lisbon, Portugal Posts: 10,981 Blog Entries: 45 Rep Power: 128	Hi Vieri, Weekend is here, so it'll be easier to start helping you. OK, so you've got 4 machines that are unable to communicate with each other on your cluster. All of them have OpenFOAM installed in the same folder, or at least the folder is shared where the installation is. One of the reasons for this lock-in to occur is if there is more than one way for each machine to access any network. My guess is that the master node has at least 2 ethernet cards, one for outside world and another for cluster network. Therefore, to try and check if this is indeed the problem, you can log in into one of the cluster nodes and launch the Test-parallel application using only two of those nodes. If that works, try using 3 nodes, without the master. If that works, then the problem is indeed because the master has two cards and Open-MPI gets lost in the master while trying to search for other ways to access the nodes. To override this behaviour, check what ethernet interfaces you've got on the master node: Code: ifconfig -a Names to look for are: eth0, eth1, wlan0 and so on... You should also confirm the interface names used in the slave nodes. Now, edit the foamJob script - or create your own copy of it - find this block of code and add the line in bold: Code: # # locate mpirun # mpirun=`findExec mpirun` \|\| usage "'mpirun' not found" mpiopts="-np $NPROCS" mpiopts="$mpiopts --mca btl_tcp_if_exclude lo,eth1" The lo is the loopback connection, where the machine can contact itself. eth1 is assuming this is the interface for external contact with the world. Hopefully this will do the trick. Best regards, Bruno vlad, sail, FerdiFuchs and 2 others like this. __________________ OpenFOAM: FAQ \| Getting started Forum: How to get help, to post code/output and forum guide Read this before sending me PM

September 7, 2012, 06:20		#9
sail Senior Member Vieri Abolaffio Join Date: Jul 2010 Location: Always on the move. Posts: 308 Rep Power: 17	Up! Unfortunately another issue has arisen. if i launch a job using all the 16 cores of my 4- machines cluster it gets stuck. i see on the master node a cpu utilization of 25% in userspace and 75% in systemspace, while the slaves are 50% waiting for IO. this behaviour happens even if the master node has just one processor occupied by the job. it looks to me, but i might be wrong, that the master have issues serving the nfs shared directory to all the nodes because if i first copy the case data locally on the slaves, using the informations provided here: http://www.cfd-online.com/Forums/blo...h-process.html it works flawlessi and is really fast. i then tried to stresstest the nfs executing the following script on a slave but it works flawlessy. Code: time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile1 bs=16k count=10000 & time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile2 bs=16k count=10000 & time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile3 bs=16k count=10000 & time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile4 bs=16k count=10000 & time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile5 bs=16k count=10000 & time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile6 bs=16k count=10000 & time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile7 bs=16k count=10000 & time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile8 bs=16k count=10000 & time dd if=/dev/zero of=~/OpenFOAM/cfduser-2.1.1/testfile9 bs=16k count=10000 even increasing the number of nfs daemons on the server to an higer value (32) produced no effect. i see lots of nfsd processes, using high values of cpu utilization, and the procedure goes well. on the other hand when i lauch the job i see just 2 nfsd at 1% cpu on the master. running a 4-cores job using one core each machine works well, running a 4-cores job using the master, a slave with 2 cores and a slave with 1 core is slower. running a 5-cores job with a master, and 2 2-cores slaves is even slower/gets stuck.: master - top: Code: Tasks: 213 total, 2 running, 206 sleeping, 0 stopped, 5 zombie Cpu(s): 5.0%us, 20.8%sy, 0.0%ni, 73.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Mem: 16423888k total, 15999036k used, 424852k free, 52k buffers Swap: 15624996k total, 1469876k used, 14155120k free, 213596k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10123 cfduser 20 0 322m 156m 15m R 100 1.0 4:32.87 pimpleDyMFoam 2200 root 20 0 0 0 0 D 1 0.0 0:10.33 nfsd 2202 root 20 0 0 0 0 D 1 0.0 0:26.47 nfsd 9515 root 20 0 0 0 0 S 1 0.0 0:01.24 kworker/0:2 10302 root 20 0 0 0 0 S 1 0.0 0:00.63 kworker/0:3 10301 root 20 0 0 0 0 S 0 0.0 0:00.34 kworker/u:1 slave - top: Code: Tasks: 145 total, 1 running, 141 sleeping, 0 stopped, 3 zombie Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 49.7%id, 50.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16423888k total, 12310280k used, 4113608k free, 580k buffers Swap: 15624996k total, 3100k used, 15621896k free, 11622444k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1695 root 20 0 0 0 0 S 1 0.0 0:01.28 kworker/1:0 4124 root 20 0 0 0 0 D 1 0.0 0:00.56 192.168.0.17-ma 911 cfduser 20 0 17340 1332 968 R 0 0.0 0:00.51 top 4658 lightdm 20 0 744m 18m 12m S 0 0.1 86:52.63 unity-greeter 1 root 20 0 24592 2364 1300 S 0 0.0 0:01.56 init 2 root 20 0 0 0 0 S 0 0.0 0:00.56 kthreadd the case is big enough to let the communication part to be neglegible in comparison to the computing part (5 M cells) any ideas on how to fix it? up to now all the comm on the network happens on gb ethernet. tests shows that the trasfer rate is reasonable (60MB/s). i have another ethernet card in every machine, up to now unused. shuld i activate it, assign a different set of IPs and use one for nfs traffic and another for MPI comm? do you belive this might be the problem? any other options or flags I'm missing? thanks in advance best regards __________________ http://www.leadingedge.it/ Naval architecture and CFD consultancy

September 13, 2012, 03:58		#11
Giuliano69 New Member Giuliano Lotta Join Date: May 2012 Posts: 12 Rep Power: 14	Hi Wildkat ! I'm the sysamdin of the cluster where the program is running. First of all thanks a lot for your huge help. The cluster is formed by 4 ibm systemx xeon ws. The os is Ubuntu 12.04 64 bit.With nfs3+Openfoam2.1.1+OpenMPI1.4.3 Each ws has 2 eth cards, but card eth1 is disconnected, and the MPI foam2Job disable the eth1 and usb0 port I've had to downgrade the nfs4 to nfs3, because with nfs4 the motorbike testcase locked. ---------------------------- nfs3 machine configuration: SERVER - nfs /etc/exports : 192.168.0.19(rw,sync,root_squash,no_subtree_check) 192.168.0.21(rw,sync,root_squash,no_subtree_check) 192.168.0.23(rw,sync,root_squash,no_subtree_check) CLIENT - nfs 3 version /etc/fstab 192.168.0.17:/home/cfduser/OpenFOAM /home/cfduser/OpenFOAM/ nfs _netdev,nfsvers=3,proto=tcp,noac,auto 0 0 ----------------------------------------- Unfortunately, _our_ test case is still not running (the tutorials on MPI do run); I give a reply to the topic you proposed, to see what we could try. 1) incompressible/pisoFoam/les/motorBike/motorBike/ works like a charme. I attach a screen shot of the nmon of the four ws running. After success in motorbike mpi run, we thought that the nfs problem was over.... nope :-( http://ubuntuone.com/6VdREYUP3lU48ii1P6ndvO 2) didn't change writeInterval in controlDict (of the motorbike tutorial), because it worked out of the box 4) Open mpi come in version 1.4.3 together with OpenFoam with the SGI package. In Ubuntu 12.04 MPI ver 1.5 is available as separate package, but cause disinstallation of OpenFoam :-)). Anyway from what I read in your link, ver 1.4.3 should be safer 5) our testcase is some GB in size, so I fear it should overlap the cache 6) /opt/openfoam211/etc/controlDict is the same you posted 7) as motorbike is working flawlessly, I fear that is not (mainly) an nfs configuration problem. Anyway it happens that, if we run our testcase letting it fail, the master keeps two nfsd process alive (with some nwtk activities going on... why?? ) and running then motorbike, lead to failure (motorbike stuked as our case !) I attach two screenshots more (our case failing/locked, and the 2 nfsd zombie after killing the foam2Job with ctrl-c) http://ubuntuone.com/11t2ZT9VIENlM1etMHbpU1 http://ubuntuone.com/6Y1Kt32czcNMYtRSrKveEW As can be seen by the screen shot our test case keeps node 3 and 4 in waiting. Note 2 is working (?) Maybe related in some way to the code, i.e in the order from which the write and read funtion are called, causing a deadlock ? Maybe the node2 keeping a file still open, causing nodes 3 and 4 locked out ? The problem arise from the fist seconds.... Seeing the screenshots, which other test you would consider ? Which debug code we could insert in the code to narrow the point where the problem is generated ? Thanks again for your huge help. Last edited by Giuliano69; September 14, 2012 at 10:46.

September 15, 2012, 09:11		#13
wyldckat Retired Super Moderator Bruno Santos Join Date: Mar 2009 Location: Lisbon, Portugal Posts: 10,981 Blog Entries: 45 Rep Power: 128	Greetings Giuliano, OK, I've read through your posts here and will follow up at your other thread: http://www.cfd-online.com/Forums/ope...ed-anyone.html Best regards, Bruno __________________ OpenFOAM: FAQ \| Getting started Forum: How to get help, to post code/output and forum guide Read this before sending me PM

October 12, 2012, 12:24		#14
Giuliano69 New Member Giuliano Lotta Join Date: May 2012 Posts: 12 Rep Power: 14	Thanks wildkat for your kind help. After some test&debug, it was clear that ONLY when node n°4 was working in MPI, we had problems with the MPI lock. Although the installation was done from with a bash script, the only solution was to try a complete format&reinstall of node n° 4. after that, everything worked. Strange but...

October 12, 2012, 12:29		#15
Giuliano69 New Member Giuliano Lotta Join Date: May 2012 Posts: 12 Rep Power: 14	I would like also to share the following experience: under ubuntu 12.04 64bit, BOTH the following nfs configuration (CLIENT side) was found to be fully working with MPI (/etc/fstab configuration) (I mean that it woks EITHER as nfs 3 OR as nfs 4) 192.168.0.17:/home/cfduser/OpenFOAM /home/cfduser/OpenFOAM/ nfs4 _netdev,auto 0 0 #192.168.0.17:/home/cfduser/OpenFOAM /home/cfduser/OpenFOAM/ nfs _netdev,nfsvers=3,proto=tcp,noac,auto 0 0 on the SERVER side, this is the etc/exports configuration /home/cfduser/OpenFOAM 192.168.0.19(rw,sync,root_squash,no_subtree_check) 192.168.0.21(rw,sync,root_squash,no_subtree_check) 192.168.0.23(rw,sync,root_squash,no_subtree_check) At the PRESENT TIME, the used configuration is nfs ver 4 In case could help someone....

August 27, 2013, 13:15	Deadlcok	#17
ehsan Senior Member Ehsan Join Date: Mar 2009 Posts: 112 Rep Power: 17	Hello We are running interPhaseChangeFoam in parallel using 24 nodes. The run starts fine and go for some times but afterwards, one of our systems does not contribute in the communications process and the run encounters the deadlock. Could you please help us in this regards? Sincerely, Ehsan

August 27, 2013, 18:00		#18
wyldckat Retired Super Moderator Bruno Santos Join Date: Mar 2009 Location: Lisbon, Portugal Posts: 10,981 Blog Entries: 45 Rep Power: 128	Greetings Ehsan, Did you try following the instructions present on this thread? Best regards, Bruno __________________ OpenFOAM: FAQ \| Getting started Forum: How to get help, to post code/output and forum guide Read this before sending me PM

August 27, 2013, 22:59	Parallel run problem	#19
ehsan Senior Member Ehsan Join Date: Mar 2009 Posts: 112 Rep Power: 17	Hello Burno the problem is that whether the stop of the run is related to the system set-ups that discussed in this page or it is something with the interPhaseChangeFoam solver being run in parallel? In fact, one of our system stop contributing after some runs, say after 1000 or 2000 time steps, if we stop the run and continue again, the run goes ahead until again encountering the same problem. However, we used this system for parallel run of other solvers without problem and without this stops. So, is it possible that problem arise from interPhaseChangeFoam setups in fvSolution or elsewhere? Regards

August 28, 2013, 07:24	Parallel runs	#20
ehsan Senior Member Ehsan Join Date: Mar 2009 Posts: 112 Rep Power: 17	Hello We detected that the problem is that one system goes out of connection from the network, i.e., once we ping it, it won't reply. It is odd that at the start, it goes fine but after some iterations it stop working in the network. Would you please help me in this regards? Thanks

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
how to set periodic boundary conditions	Ganesh	FLUENT	15	November 18, 2020 07:09
Issue with OpenMPI-1.5.3 while running parallel jobs on multiple nodes	LargeEddy	OpenFOAM	1	March 7, 2012 18:05
Issue with running in parallel on multiple nodes	daveatstyacht	OpenFOAM	7	August 31, 2010 18:16
Error using LaunderGibsonRSTM on SGI ALTIX 4700	jaswi	OpenFOAM	2	April 29, 2008 11:54
CFX4.3 -build analysis form	Chie Min	CFX	5	July 13, 2001 00:19