|
April 22, 2016, 12:22 |
Run OpenFoam in 2 nodes of a cluster
|
#1 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi, I'm trying to set up and run an OpenFOAM simulation on a cluster (2 nodes, 16 cores per node).
I'm using qsub with a PBS script. However, if I don't specify a host file it runs all 32 threads on the first node, and when I do specify a host file it doesn't run at all. These are the commands in my startjob.pbs:
Code:
#!/bin/bash -l
#PBS -N AOC_OF
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR
#mpirun --hostfile hosts.txt -np 32 simpleFoam -parallel > log.simpleFoam_1st
mpirun --host node6,node7 -np 32 simpleFoam -parallel > log.simpleFoam_1st
This is the hosts.txt file in the main folder (the same folder as startjob.pbs):
Code:
node6
node7
Thanks in advance,
WhiteW

Last edited by WhiteW; May 2, 2016 at 07:38. |
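(Editorial sketch, not part of the original post: on Torque/PBS clusters the scheduler writes the granted node list to the file named by $PBS_NODEFILE, so a common way to make mpirun follow the nodes=2:ppn=16 request — assuming a Torque-style PBS and the system Open MPI — is:)
Code:
#!/bin/bash -l
#PBS -N AOC_OF
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
cd $PBS_O_WORKDIR
# PBS writes one line per granted core to $PBS_NODEFILE, so the same file
# provides both the host list and the process count.
NP=$(wc -l < $PBS_NODEFILE)
mpirun -np $NP -machinefile $PBS_NODEFILE simpleFoam -parallel > log.simpleFoam_1st 2>&1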
|
May 1, 2016, 06:35 |
|
#2 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi,
Has nobody tried to set OF to run on 2 or more nodes? Thanks, WhiteW |
|
May 1, 2016, 18:26 |
|
#3 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quick answer: see "Notes about running OpenFOAM in parallel" - using foamExec might solve the issue. I know I have this written somewhere in a thread... here's an example: http://www.cfd-online.com/Forums/ope...tml#post504516 - post #7
I found it with the following Google search:
Code:
site:cfd-online.com "wyldckat" "foamExec" parallel |
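(For context — an editorial sketch, not Bruno's wording: foamExec is a wrapper shipped in OpenFOAM's bin/ directory that loads the OpenFOAM environment on whichever node a process lands on before handing control to the solver, which is why it helps when mpirun spawns processes on nodes whose shells never sourced OpenFOAM's bashrc. Conceptually it reduces to something like:)
Code:
#!/bin/sh
# Conceptual sketch only; the real OpenFOAM-2.3.0/bin/foamExec has
# version/option handling around this core idea.
. /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc   # set up the OpenFOAM env on this node
exec "$@"                                           # replace this shell with the solver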
|
May 2, 2016, 07:37 |
|
#4 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Thanks for the help Bruno. I have added the path to foamExec in the command; however, it still doesn't work.
When I run the job (qsub startjob.pbs) with this command inside:
Code:
mpirun --hostfile hosts.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/FoamExec simpleFoam -parallel > log.simpleFoam_1st
nothing useful appears in the log. When I try to run the mpi command directly from the frontend of the cluster, without using qsub, I get the following errors:
[node7:12516] Error: unknown option "--tree-spawn"
input in flex scanner failed
[node6:55424] Error: unknown option "--tree-spawn"
input in flex scanner failed
Do I have to install foam-extend? I have only OpenFOAM 2.3.0 installed on the cluster.
Thanks, WhiteW |
|
May 3, 2016, 18:06 |
|
#5 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quick answer: I suspect that the problem will reveal itself if you use this command:
Code:
mpirun --hostfile hosts.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
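(Editorial note: the key difference from the earlier attempts is the trailing 2>&1, which sends stderr into the same log as stdout — without it, error messages from mpirun and the remote shells vanish when the job runs under a scheduler. The order of the redirections matters in POSIX shells:)
Code:
# "solver" stands for any command here (e.g. the mpirun line above).
solver > log.txt 2>&1   # stderr follows stdout into log.txt -- what you want
solver 2>&1 > log.txt   # stderr keeps the old stdout (the terminal), not the file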
|
|
May 4, 2016, 04:24 |
|
#6 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Thanks for the explanation!
I have run the corrected command and now the error is written in the log file; however, it reports the same error:
Code:
[node7:21261] Error: unknown option "--tree-spawn"
input in flex scanner failed
[node6:67200] Error: unknown option "--tree-spawn"
input in flex scanner failed
WhiteW |
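(An editorial note on this symptom, not part of the original reply: "unknown option --tree-spawn" typically means the orted daemon started on the remote node belongs to a different Open MPI release than the mpirun on the frontend, since that launch option varies between versions. A hedged way to compare, assuming passwordless ssh between the machines:)
Code:
# Compare the frontend's Open MPI with what each compute node picks up
# in a non-interactive shell (hypothetical diagnostic; adjust node names).
mpirun --version | head -n 1
which mpirun orted
for n in node6 node7; do
    echo "=== $n ==="
    ssh "$n" 'which mpirun orted; mpirun --version 2>&1 | head -n 1'
done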
|
May 6, 2016, 05:23 |
|
#7 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi banji. To write the output to a file you have to run:
Code:
mpirun -np 4 pisoFoamIPM -parallel > logfile.txt

I'm still trying to run OF on the two nodes of the cluster, still with no success. Using:
Code:
mpirun -np 32 --hostfile hosts2.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
I get:
Code:
/home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec: line 145: exec: simpleFoam: not found
Adding the --prefix option:
Code:
mpirun --hostfile hosts2.txt --prefix /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/ /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
gives:
Code:
bash: /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 33250) died unexpectedly with status 127 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to the
"orte-clean" tool for assistance.
--------------------------------------------------------------------------
node6 - daemon did not report back when launched
node7 - daemon did not report back when launched
bash: /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/bin/orted: No such file or directory
Someone apparently solved this by adding, as the first line of their .bashrc:
Code:
source /whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
Are there other ways to solve the problem?
Thanks, WhiteW

Last edited by wyldckat; May 8, 2016 at 16:23. Reason: see "Moderator note:" |
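(An editorial observation on the error above, not from the thread itself: the doubled "bin/bin/orted" in the path suggests --prefix was given Open MPI's bin/ directory, whereas mpirun expects the installation root and appends bin/ and lib/ itself on the remote nodes. A hedged correction, assuming /opt/ofed154/mpi/gcc/openmpi-1.4.3 really is the install root on the nodes:)
Code:
# --prefix must point at the Open MPI install root, not its bin/ subdirectory;
# mpirun appends bin/ (for orted) and lib/ on the remote nodes itself.
mpirun --hostfile hosts2.txt \
    --prefix /opt/ofed154/mpi/gcc/openmpi-1.4.3 \
    /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel \
    > log.simpleFoam_1st 2>&1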
|
May 6, 2016, 05:34 |
|
#8 | |
Member
Olabanji
Join Date: Jan 2013
Location: U.S.A
Posts: 31
Rep Power: 13 |
Quote:
Last edited by wyldckat; May 8, 2016 at 16:19. Reason: removed excess part of the quote and left only the essential part |
||
May 8, 2016, 16:13 |
|
#9 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quote:
Please run the following commands, which will populate the "prefs.sh" file in OpenFOAM's "etc" folder:
Code:
cd /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/
echo export WM_NCOMPPROCS=4 > prefs.sh
echo export foamCompiler=ThirdParty >> prefs.sh
echo export WM_COMPILER=Gcc48 >> prefs.sh
echo export WM_MPLIB=SYSTEMOPENMPI >> prefs.sh
As for foamExec, there is a small fix that might be useful to do as well:
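(The small foamExec fix referred to above was not captured in this copy of the thread. For reference — an editorial aside — if the echo commands ran as written, the resulting etc/prefs.sh, which OpenFOAM's etc/bashrc sources automatically when present, reads:)
Code:
# /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/prefs.sh, as produced by the
# echo commands above; OpenFOAM's etc/bashrc picks this file up if it exists.
export WM_NCOMPPROCS=4
export foamCompiler=ThirdParty
export WM_COMPILER=Gcc48
export WM_MPLIB=SYSTEMOPENMPI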
|
||
May 10, 2016, 09:46 |
|
#10 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi, I have made the modifications you suggested.
Now, using the source line in .bashrc (source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc) together with the modified foamExec and the new prefs.sh, the error is:
Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
Line 384 of settings.sh is:
Code:
libDir=`mpicc --showme:link | sed -e 's/.*-L\([^ ]*\).*/\1/'`
Then I made prefs.sh executable with chmod +x and ran again. The error now is:
Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Data unpack had inadequate space (-25) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Data unpack had inadequate space (-25) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack had inadequate space" (-25) instead of "Success" (0)
--------------------------------------------------------------------------
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35896] *** An error occurred in MPI_Init
[node3:35896] *** on a NULL communicator
[node3:35896] *** Unknown error
[node3:35896] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly. You should
double check that everything has shut down cleanly.

Reason: Before MPI_INIT completed
Local host: node3
PID: 35896
--------------------------------------------------------------------------
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
--------------------------------------------------------------------------
mpirun has exited due to process rank 19 with PID 35899 on
node node3 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[frontend:20585] 9 more processes have sent help message help-orte-runtime.txt / orte_init:startup:internal-failure
[frontend:20585] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[frontend:20585] 9 more processes have sent help message help-orte-runtime / orte_init:startup:internal-failure
[frontend:20585] 9 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
[frontend:20585] 9 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
[frontend:20585] 9 more processes have sent help message help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
WhiteW |
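(An editorial reading of this log, not Bruno's reply: the "mpicc: command not found" lines come from the remote nodes, where the non-interactive shells spawned by mpirun have never run "module load openmpi-x86_64", so OpenFOAM's settings.sh cannot query mpicc there; the "Data unpack had inadequate space" errors are the usual follow-on symptom of ranks starting against a mismatched or missing Open MPI. A hedged check of what a compute node actually sees:)
Code:
# Hypothetical diagnostic; node3 is taken from the log above.
ssh node3 'which mpicc mpirun orted; echo "PATH=$PATH"'
# If mpicc is missing there, load the module from a file that non-interactive
# shells on the nodes actually read (e.g. ~/.bashrc), not only in the PBS script:
ssh node3 'grep -n "module load" ~/.bashrc'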
|
May 10, 2016, 11:00 |
|
#11 | |
Senior Member
Mahdi Hosseinali
Join Date: Apr 2009
Location: NB, Canada
Posts: 273
Rep Power: 18 |
To run my simulations on a SunGridEngine cluster I use the following script:
Quote:
|
||
May 10, 2016, 11:11 |
|
#12 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi anishtain4, yes: using PBS, it works only if I send the job to one node (not to more than one).
Here are the settings of the PBS files. If I send the job from the frontend to 1 node it works:
Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR
mpirun -np 32 simpleFoam -parallel > log.simpleFoam_1st
Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=16:NRM
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR
mpirun -np 16 --hostfile hosts2.txt /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
|
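(Editorial aside at this point in the thread: one way to see which nodes PBS actually granted — and to avoid maintaining hosts2.txt by hand — is to inspect $PBS_NODEFILE inside the job. A sketch, assuming a Torque/PBS setup that populates this variable:)
Code:
# Print what PBS granted, just before the mpirun line; with nodes=2:ppn=16
# this should show two hostnames with 16 entries each.
sort $PBS_NODEFILE | uniq -c
# The same file can then serve as the machine file, keeping mpirun and PBS in sync:
mpirun -np $(wc -l < $PBS_NODEFILE) -machinefile $PBS_NODEFILE \
    /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel \
    > log.simpleFoam_1st 2>&1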
|
May 10, 2016, 19:35 |
|
#13 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quick answer @WhiteW: Hopefully the following will do the trick:
|
|
May 11, 2016, 12:55 |
|
#14 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Thanks!
I have now added the string to foamExec. However, when I run it with qsub using the file startjob.pbs:
Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
cd $PBS_O_WORKDIR
mpirun -np 32 --hostfile hosts2.txt $WM_PROJECT_DIR/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
I still get:
Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
My hosts2.txt file contains:
Code:
node2 cpu=16
node3 cpu=16
|
|
May 13, 2016, 01:37 |
|
#15 | |
Senior Member
Mahdi Hosseinali
Join Date: Apr 2009
Location: NB, Canada
Posts: 273
Rep Power: 18 |
I'm not an expert in clusters, so these are things I'm guessing:
1. What is the :NRM after your nodes when running on one node? I noticed it is missing when you are running with two nodes. Could it be related?
2. I'm not sure whether your host file is defined correctly. I think it needs to be a list of the node names on your cluster; for example, mine looks something like this:
Quote:
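(The quoted example host file did not survive in this copy of the thread. As an editorial placeholder — and an assumption about the intended format — a typical Open MPI host file simply lists one node name per line, optionally with a slot count. Note that Open MPI's documented keyword is slots, so the cpu=16 spelling used earlier in the thread may not be recognized by every version:)
Code:
# hosts2.txt in standard Open MPI syntax (hypothetical, matching this cluster's names)
node2 slots=16
node3 slots=16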
|
||
May 13, 2016, 04:01 |
|
#16 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi anishtain4, thanks for the reply.
NRM means that the nodes are automatically assigned to the job, so you don't have to specify particular nodes. The job starts correctly if I don't specify the --hostfile hosts2.txt option, but it assigns all 32 threads to only one node; it does not divide the processes between the two nodes (16 on node1 and 16 on node2). I see this with both:
Code:
#PBS -l nodes=node1:ppn=16+node2:ppn=16
and:
Code:
#PBS -l nodes=2:ppn=16
combined with:
Code:
mpirun -np 32 $WM_PROJECT_DIR/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
The error about "mpicc: command not found" seems to appear when I specify the --hostfile option. In the hosts.txt file I have written the names of the nodes as reported in /etc/hosts:
Code:
...
192.168.0.1 node1
192.168.0.2 node2
192.168.0.3 node3
192.168.0.4 node4
192.168.0.5 node5
192.168.0.6 node6
192.168.0.7 node7
WhiteW |
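(Editorial diagnostic sketch, not from the thread: since the host file relies on the names in /etc/hosts, it is worth confirming from the frontend that each name resolves and that the nodes accept non-interactive ssh, which mpirun needs for launching:)
Code:
# Hypothetical check; node names taken from the /etc/hosts listing above.
for n in node1 node2; do
    getent hosts "$n"     # should print the matching 192.168.0.x entry
    ssh "$n" hostname     # should return the node's name without a password prompt
done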
|
December 20, 2016, 01:51 |
|
#17 |
Member
Avdeev Evgeniy
Join Date: Jan 2011
Location: Togliatty, Russia
Posts: 69
Blog Entries: 1
Rep Power: 21 |
Maybe this will help someone. I fought with a "laplacianFoam not found" environment-variable problem and solved it by adding "source ~/.bashrc" to my PBS file. My PBS file is now:
Code:
#!/bin/bash
#
#PBS -N flange
#PBS -A studtlt
#PBS -l nodes=1:ppn=8
#PBS -l walltime=30:00:00
#PBS -m ae
cd $PBS_O_WORKDIR
module load openmpi-1.4.5
source ~/.bashrc
cd /home/studtlt/avdeev/flange
mpirun -machinefile $PBS_NODEFILE -np $PBS_NP /home/studtlt/OpenFOAM/OpenFOAM-4.0/bin/foamExec laplacianFoam -parallel | tee -a log
And my .bashrc:
Code:
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
module load openmpi-1.4.5; source $HOME/OpenFOAM/OpenFOAM-4.0/etc/bashrc WM_LABEL_SIZE=64 WM_COMPILER_TYPE=ThirdParty FOAMY_HEX_MESH=yes

# User specific aliases and functions
#alias of40='module load openmpi-x86_64; source $HOME/OpenFOAM/OpenFOAM-4.0/etc/bashrc WM_LABEL_SIZE=64 WM_COMPILER_TYPE=ThirdParty FOAMY_HEX_MESH=yes'
alias of40='module load openmpi-1.4.5; source $HOME/OpenFOAM/OpenFOAM-4.0/etc/bashrc WM_LABEL_SIZE=64 WM_COMPILER_TYPE=ThirdParty FOAMY_HEX_MESH=yes'
|
|
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
How to run SU2 on cluster of computers (How to specify nodes ?) | aero_amit | SU2 | 29 | February 24, 2020 14:44 |
Can not run OpenFOAM in parallel in clusters, help! | ripperjack | OpenFOAM Running, Solving & CFD | 5 | May 6, 2014 16:25 |
OpenFOAM solvers not able to run in parallel | raagh77 | OpenFOAM Installation | 5 | November 27, 2013 18:05 |
Something weird encountered when running OpenFOAM in parallel on multiple nodes | xpqiu | OpenFOAM Running, Solving & CFD | 2 | May 2, 2013 05:59 |
Unable to run OF in parallel on a multiple-node cluster | quartzian | OpenFOAM | 3 | November 24, 2009 14:37 |