OpenFOAM in parallel on 2 computers: Cannot find file "points"

June 3, 2015, 13:59
Hi everyone,

It's my first post on this forum, but I'll try to describe my problem in sufficient detail.

I've been experimenting with OpenFOAM (2.4.0) and the Linux terminal for a few days, and so far so good. I've done some tutorials from the User Guide. I can run OpenFOAM on several processors on my own computer using mpirun, and ParaView shows the results just fine. When I try to run OpenFOAM in parallel on two computers over wifi, however, things go wrong. I've been stuck for many hours, even though I have read posts about similar problems.

I am following the steps from the "Dam Break" tutorial and from section 3.4.2. Everything works except the run command.

Steps I did:
Copy the damBreak directory to a damBreakFine directory
Modify blockMeshDict (the tutorial is about refining the mesh, which gives a longer simulation)
Reset alpha.water to a uniform field from the backup copy
Run setFields (OK)
Run decomposePar (OK)
Write a file named machines in the case directory:

blue@blue-HP-Pavilion-g6-Notebook-PC.local cpu=2
green1@green1-Satellite-A200.local cpu=2
I know that the data can be distributed over several disks, but it should also be possible to use only the CPU power of the second computer through mpirun.

Run "mpirun --hostfile machines -np 4 icoFoam -parallel"
The terminal asks for the green1 password (slave machine): OK
Then I get the following error:

| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.4.0                                 |
|   \\  /    A nd           | Web:                      |
|    \\/     M anipulation  |                                                 |
Build  : 2.4.0-f0842aea0e77
Exec   : icoFoam -parallel
Date   : Jun 03 2015
Time   : 10:11:35
Host   : "blue-HP-Pavilion-g6-Notebook-PC"
PID    : 9695
Case   : /home/blue/OpenFOAM/blue-2.4.0/run/damBreakFine
nProcs : 4
Slaves : 

Pstream initialized with:
    floatTransfer      : 0
    nProcsSimpleSum    : 0
    commsType          : nonBlocking
    polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

[2] Cannot find file "points" in directory "polyMesh" in times 0 down to constant
[2] --------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
    From function Time::findInstance(const fileName&, const word&, const IOobject::readOption, const word&)
[2]     in file db/Time/findInstance.C at line 203.
FOAM parallel run exiting
[3] Cannot find file "points" in directory "polyMesh" in times 0 down to constant
[3]     From function Time::findInstance(const fileName&, const word&, const IOobject::readOption, const word&)
[3]     in file db/Time/findInstance.C at line 203.
FOAM parallel run exiting
mpirun has exited due to process rank 2 with PID 5186 on
node green1-Satellite-A200.local exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
[blue-HP-Pavilion-g6-Notebook-PC:09692] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[blue-HP-Pavilion-g6-Notebook-PC:09692]  Set MCA parameter "orte_base_help_aggregate" to 0 to see all help /  error messages

It seems that process [2] (on the slave computer) cannot access the required files, which are on the main computer. The file "points" does exist in case/constant/polyMesh and in processor0/constant/polyMesh, processor1/constant/polyMesh, etc. Is it that ssh doesn't give the second computer access to the main computer?

Also, I must admit I don't understand this part of the tutorial:
"The < machines > file contains the names of the machines listed one machine per line. The names must correspond to a fully resolved hostname in the /etc/hosts file of the machine on which the openMPI is run."

Your help is very much appreciated.

June 3, 2015, 22:59
Qiu Xiaoping
Hi, Fred

Quote: "It seems that process [2] (on the slave computer) cannot access the required files, which are on the main computer. The file "points" does exist in case/constant/polyMesh and in processor0/constant/polyMesh, processor1/constant/polyMesh, etc. Is it that ssh doesn't give the second computer access to the main computer?"
Yes, the slave cannot access the mesh data on the master, which is why the run fails with: Cannot find file "points" in directory "polyMesh". You should put your case data in a location that both the master and the slaves can access. One way is to share a directory (say, "$HOME/shares") on the master with all of the slave machines, for example by running a command like this as root on each slave machine: mount -t nfs master:$HOME/shares ~/shares. Note that in a root shell $HOME and ~ may expand to root's home directory, so spelling out the full paths is safer.
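
A quick way to confirm this diagnosis is to check, on each machine listed in the machines file, that every processorN directory of the decomposed case contains its mesh at the same case path. The following is only a minimal sketch, not an OpenFOAM tool: check_case_mesh is a hypothetical helper name, and it assumes the usual processorN/constant/polyMesh layout written by decomposePar.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenFOAM): report which processor
# directories of a decomposed case lack the mesh "points" file.
# Usage: check_case_mesh CASE_DIR NPROCS
check_case_mesh() {
    case_dir=$1
    nprocs=$2
    missing=0
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        mesh_dir="$case_dir/processor$i/constant/polyMesh"
        # The file may have been written compressed, so accept points.gz too.
        if [ -f "$mesh_dir/points" ] || [ -f "$mesh_dir/points.gz" ]; then
            echo "processor$i: ok"
        else
            echo "processor$i: MISSING $mesh_dir/points"
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    # Succeed only if every processor directory has its mesh.
    [ "$missing" -eq 0 ]
}
```

Run on the slave (for instance after logging in over ssh), a call such as check_case_mesh /home/blue/OpenFOAM/blue-2.4.0/run/damBreakFine 4 should list all four directories as ok. If it reports them missing on the slave but not on the master, the case is simply not visible at that path on the slave.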

Quote: "Also, I must admit I don't understand this part of the tutorial:
"The < machines > file contains the names of the machines listed one machine per line. The names must correspond to a fully resolved hostname in the /etc/hosts file of the machine on which the openMPI is run.""
It means that you should map the name of each machine to its IP address in the /etc/hosts file, for example:
Code:     green1      blue
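
To check that mapping, one option is to compare the names in the machines file against the entries of /etc/hosts. This is a rough sketch under stated assumptions: missing_hosts is a hypothetical helper name, the machines file is assumed to hold one [user@]hostname per line as in the first post, and only the hosts file is consulted (names resolved by DNS or mDNS will still be listed as missing).

```shell
#!/bin/sh
# Hypothetical helper: print every host named in a machines file that has
# no entry in a hosts file.
# Usage: missing_hosts MACHINES_FILE [HOSTS_FILE]  (default: /etc/hosts)
missing_hosts() {
    machines_file=$1
    hosts_file=${2:-/etc/hosts}
    awk '
        # Hosts file: remember every name and alias after the address.
        FILENAME == ARGV[1] {
            sub(/#.*/, "")                 # drop comments
            for (i = 2; i <= NF; i++) known[$i] = 1
            next
        }
        # Machines file: the first field is [user@]hostname.
        NF > 0 && $1 !~ /^#/ {
            host = $1
            sub(/^.*@/, "", host)          # strip an optional user@ prefix
            if (!(host in known)) print host
        }
    ' "$hosts_file" "$machines_file"
}
```

Calling missing_hosts machines from the case directory prints nothing when every machine has an entry. Each entry in /etc/hosts has the form "address hostname [aliases]", for example "192.0.2.10 blue", where 192.0.2.10 is only a placeholder and must be replaced by the machine's real address on your network.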

