|
April 22, 2016, 12:22 |
Run OpenFoam in 2 nodes of a cluster
|
#1 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi, I'm trying to set up and run an OpenFOAM simulation on a cluster (2 nodes, 16 cores per node).
I'm using qsub with a PBS script. However, if I don't specify a host file it runs all 32 threads on the first node, and when I do specify a host file it doesn't run at all. These are the commands in my startjob.pbs:
Code:
#!/bin/bash -l
#PBS -N AOC_OF
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR
#mpirun --hostfile hosts.txt -np 32 simpleFoam -parallel > log.simpleFoam_1st
mpirun --host node6,node7 -np 32 simpleFoam -parallel > log.simpleFoam_1st
This is the hosts.txt file in the main folder (the same folder as startjob.pbs):
Code:
node6
node7
Thanks in advance,
WhiteW

Last edited by WhiteW; May 2, 2016 at 07:38. |
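(Editorial sketch, not part of the original post: on Torque/PBS clusters the scheduler writes the granted node list to the file named by $PBS_NODEFILE, so a common way to make mpirun follow the nodes=2:ppn=16 request — assuming a Torque-style PBS and the system Open MPI — is:)
Code:
#!/bin/bash -l
#PBS -N AOC_OF
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
cd $PBS_O_WORKDIR
# PBS writes one line per granted core to $PBS_NODEFILE, so the same file
# provides both the host list and the process count.
NP=$(wc -l < $PBS_NODEFILE)
mpirun -np $NP -machinefile $PBS_NODEFILE simpleFoam -parallel > log.simpleFoam_1st 2>&1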
|
May 1, 2016, 06:35 |
|
#2 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi,
Has nobody tried to set OF to run on 2 or more nodes? Thanks, WhiteW |
|
May 1, 2016, 18:26 |
|
#3 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quick answer: see "Notes about running OpenFOAM in parallel" - using foamExec might solve the issue. I know I have this written somewhere in a thread... here's an example: http://www.cfd-online.com/Forums/ope...tml#post504516 - post #7
I found it with the following Google search:
Code:
site:cfd-online.com "wyldckat" "foamExec" parallel |
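(For context — an editorial sketch, not Bruno's wording: foamExec is a wrapper shipped in OpenFOAM's bin/ directory that loads the OpenFOAM environment on whichever node a process lands on before handing control to the solver, which is why it helps when mpirun spawns processes on nodes whose shells never sourced OpenFOAM's bashrc. Conceptually it reduces to something like:)
Code:
#!/bin/sh
# Conceptual sketch only; the real OpenFOAM-2.3.0/bin/foamExec has
# version/option handling around this core idea.
. /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc   # set up the OpenFOAM env on this node
exec "$@"                                           # replace this shell with the solver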
|
May 2, 2016, 07:37 |
|
#4 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Thanks for the help Bruno. I have added the path to foamExec in the command; however, it still doesn't work.
When I run the job (qsub startjob.pbs) with this command inside:
Code:
mpirun --hostfile hosts.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/FoamExec simpleFoam -parallel > log.simpleFoam_1st
nothing useful appears in the log. When I try to run the mpi command directly from the frontend of the cluster, without using qsub, I get the following errors:
[node7:12516] Error: unknown option "--tree-spawn"
input in flex scanner failed
[node6:55424] Error: unknown option "--tree-spawn"
input in flex scanner failed
Do I have to install foam-extend? I have only OpenFOAM 2.3.0 installed on the cluster.
Thanks, WhiteW |
|
May 3, 2016, 18:06 |
|
#5 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quick answer: I suspect that the problem will reveal itself if you use this command:
Code:
mpirun --hostfile hosts.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
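(Editorial note: the key difference from the earlier attempts is the trailing 2>&1, which sends stderr into the same log as stdout — without it, error messages from mpirun and the remote shells vanish when the job runs under a scheduler. The order of the redirections matters in POSIX shells:)
Code:
# "solver" stands for any command here (e.g. the mpirun line above).
solver > log.txt 2>&1   # stderr follows stdout into log.txt -- what you want
solver 2>&1 > log.txt   # stderr keeps the old stdout (the terminal), not the file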
|
|
May 4, 2016, 04:24 |
|
#6 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Thanks for the explanation!
I have run the corrected command and now the error is written in the log file; however, it reports the same error:
Code:
[node7:21261] Error: unknown option "--tree-spawn"
input in flex scanner failed
[node6:67200] Error: unknown option "--tree-spawn"
input in flex scanner failed
WhiteW |
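(An editorial note on this symptom, not part of the original reply: "unknown option --tree-spawn" typically means the orted daemon started on the remote node belongs to a different Open MPI release than the mpirun on the frontend, since that launch option varies between versions. A hedged way to compare, assuming passwordless ssh between the machines:)
Code:
# Compare the frontend's Open MPI with what each compute node picks up
# in a non-interactive shell (hypothetical diagnostic; adjust node names).
mpirun --version | head -n 1
which mpirun orted
for n in node6 node7; do
    echo "=== $n ==="
    ssh "$n" 'which mpirun orted; mpirun --version 2>&1 | head -n 1'
done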
|
May 6, 2016, 05:23 |
|
#7 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi banji. To write the output to a file you have to run:
Code:
mpirun -np 4 pisoFoamIPM -parallel > logfile.txt

I'm still trying to run OF on the two nodes of the cluster, still with no success. Using:
Code:
mpirun -np 32 --hostfile hosts2.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
I get:
Code:
/home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec: line 145: exec: simpleFoam: not found
Adding the --prefix option:
Code:
mpirun --hostfile hosts2.txt --prefix /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/ /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
gives:
Code:
bash: /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 33250) died unexpectedly with status 127 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to the
"orte-clean" tool for assistance.
--------------------------------------------------------------------------
node6 - daemon did not report back when launched
node7 - daemon did not report back when launched
bash: /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/bin/orted: No such file or directory
Someone apparently solved this by adding, as the first line of their .bashrc:
Code:
source /whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
Are there other ways to solve the problem?
Thanks, WhiteW

Last edited by wyldckat; May 8, 2016 at 16:23. Reason: see "Moderator note:" |
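(An editorial observation on the error above, not from the thread itself: the doubled "bin/bin/orted" in the path suggests --prefix was given Open MPI's bin/ directory, whereas mpirun expects the installation root and appends bin/ and lib/ itself on the remote nodes. A hedged correction, assuming /opt/ofed154/mpi/gcc/openmpi-1.4.3 really is the install root on the nodes:)
Code:
# --prefix must point at the Open MPI install root, not its bin/ subdirectory;
# mpirun appends bin/ (for orted) and lib/ on the remote nodes itself.
mpirun --hostfile hosts2.txt \
    --prefix /opt/ofed154/mpi/gcc/openmpi-1.4.3 \
    /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel \
    > log.simpleFoam_1st 2>&1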
|
May 6, 2016, 05:34 |
|
#8 | |
Member
Olabanji
Join Date: Jan 2013
Location: U.S.A
Posts: 31
Rep Power: 13 |
Quote:
Last edited by wyldckat; May 8, 2016 at 16:19. Reason: removed excess part of the quote and left only the essential part |
||
May 8, 2016, 16:13 |
|
#9 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quote:
Please run the following commands, which will populate the "prefs.sh" file in OpenFOAM's "etc" folder:
Code:
cd /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/
echo export WM_NCOMPPROCS=4 > prefs.sh
echo export foamCompiler=ThirdParty >> prefs.sh
echo export WM_COMPILER=Gcc48 >> prefs.sh
echo export WM_MPLIB=SYSTEMOPENMPI >> prefs.sh
As for foamExec, there is a small fix that might be useful to do as well:
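(The small foamExec fix referred to above was not captured in this copy of the thread. For reference — an editorial aside — if the echo commands ran as written, the resulting etc/prefs.sh, which OpenFOAM's etc/bashrc sources automatically when present, reads:)
Code:
# /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/prefs.sh, as produced by the
# echo commands above; OpenFOAM's etc/bashrc picks this file up if it exists.
export WM_NCOMPPROCS=4
export foamCompiler=ThirdParty
export WM_COMPILER=Gcc48
export WM_MPLIB=SYSTEMOPENMPI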
|
||
May 10, 2016, 09:46 |
|
#10 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi, I have made the modifications you suggested.
Now, using the source line in .bashrc (source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc) together with the modified foamExec and the new prefs.sh, the error is:
Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
Line 384 of settings.sh is:
Code:
libDir=`mpicc --showme:link | sed -e 's/.*-L\([^ ]*\).*/\1/'`
Then I made prefs.sh executable with chmod +x and ran again. The error now is:
Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Data unpack had inadequate space (-25) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Data unpack had inadequate space (-25) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack had inadequate space" (-25) instead of "Success" (0)
--------------------------------------------------------------------------
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35896] *** An error occurred in MPI_Init
[node3:35896] *** on a NULL communicator
[node3:35896] *** Unknown error
[node3:35896] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly. You should
double check that everything has shut down cleanly.

Reason: Before MPI_INIT completed
Local host: node3
PID: 35896
--------------------------------------------------------------------------
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
--------------------------------------------------------------------------
mpirun has exited due to process rank 19 with PID 35899 on
node node3 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[frontend:20585] 9 more processes have sent help message help-orte-runtime.txt / orte_init:startup:internal-failure
[frontend:20585] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[frontend:20585] 9 more processes have sent help message help-orte-runtime / orte_init:startup:internal-failure
[frontend:20585] 9 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
[frontend:20585] 9 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
[frontend:20585] 9 more processes have sent help message help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
WhiteW |
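(An editorial reading of this log, not Bruno's reply: the "mpicc: command not found" lines come from the remote nodes, where the non-interactive shells spawned by mpirun have never run "module load openmpi-x86_64", so OpenFOAM's settings.sh cannot query mpicc there; the "Data unpack had inadequate space" errors are the usual follow-on symptom of ranks starting against a mismatched or missing Open MPI. A hedged check of what a compute node actually sees:)
Code:
# Hypothetical diagnostic; node3 is taken from the log above.
ssh node3 'which mpicc mpirun orted; echo "PATH=$PATH"'
# If mpicc is missing there, load the module from a file that non-interactive
# shells on the nodes actually read (e.g. ~/.bashrc), not only in the PBS script:
ssh node3 'grep -n "module load" ~/.bashrc'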
|
May 10, 2016, 11:00 |
|
#11 | |
Senior Member
Mahdi Hosseinali
Join Date: Apr 2009
Location: NB, Canada
Posts: 273
Rep Power: 18 |
To run my simulations on a SunGridEngine cluster I use the following script:
Quote:
|
||
May 10, 2016, 11:11 |
|
#12 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi anishtain4, yes: using PBS, it works only if I send the job to one node (not to more than one).
Here are the settings of the PBS files. If I send the job from the frontend to 1 node it works:
Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR
mpirun -np 32 simpleFoam -parallel > log.simpleFoam_1st
Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=16:NRM
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR
mpirun -np 16 --hostfile hosts2.txt /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
|
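(Editorial aside at this point in the thread: one way to see which nodes PBS actually granted — and to avoid maintaining hosts2.txt by hand — is to inspect $PBS_NODEFILE inside the job. A sketch, assuming a Torque/PBS setup that populates this variable:)
Code:
# Print what PBS granted, just before the mpirun line; with nodes=2:ppn=16
# this should show two hostnames with 16 entries each.
sort $PBS_NODEFILE | uniq -c
# The same file can then serve as the machine file, keeping mpirun and PBS in sync:
mpirun -np $(wc -l < $PBS_NODEFILE) -machinefile $PBS_NODEFILE \
    /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel \
    > log.simpleFoam_1st 2>&1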
|
May 10, 2016, 19:35 |
|
#13 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quick answer @WhiteW: Hopefully the following will do the trick:
|
|
May 11, 2016, 12:55 |
|
#14 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Thanks!
I have now added the string to foamExec. However, when I run it with qsub using the file startjob.pbs:
Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00
module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
cd $PBS_O_WORKDIR
mpirun -np 32 --hostfile hosts2.txt $WM_PROJECT_DIR/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
I still get:
Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
My hosts2.txt file contains:
Code:
node2 cpu=16
node3 cpu=16
|
|
May 13, 2016, 01:37 |
|
#15 | |
Senior Member
Mahdi Hosseinali
Join Date: Apr 2009
Location: NB, Canada
Posts: 273
Rep Power: 18 |
I'm not an expert in clusters, so these are things I'm guessing:
1. What is the :NRM after your nodes when running on one node? I noticed it is missing when you are running with two nodes. Could it be related?
2. I'm not sure whether your host file is defined correctly. I think it needs to be a list of the node names on your cluster; for example, mine looks something like this:
Quote:
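(The quoted example host file did not survive in this copy of the thread. As an editorial placeholder — and an assumption about the intended format — a typical Open MPI host file simply lists one node name per line, optionally with a slot count. Note that Open MPI's documented keyword is slots, so the cpu=16 spelling used earlier in the thread may not be recognized by every version:)
Code:
# hosts2.txt in standard Open MPI syntax (hypothetical, matching this cluster's names)
node2 slots=16
node3 slots=16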
|
||
May 13, 2016, 04:01 |
|
#16 |
Member
Join Date: Dec 2015
Posts: 74
Rep Power: 10 |
Hi anishtain4, thanks for the reply.
NRM means that the nodes are automatically assigned to the job, so you don't have to specify particular nodes. The job starts correctly if I don't specify the --hostfile hosts2.txt option, but it assigns all 32 threads to only one node; it does not divide the processes between the two nodes (16 on node1 and 16 on node2). I see this with both:
Code:
#PBS -l nodes=node1:ppn=16+node2:ppn=16
and:
Code:
#PBS -l nodes=2:ppn=16
combined with:
Code:
mpirun -np 32 $WM_PROJECT_DIR/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
The error about "mpicc: command not found" seems to appear when I specify the --hostfile option. In the hosts.txt file I have written the names of the nodes as reported in /etc/hosts:
Code:
...
192.168.0.1 node1
192.168.0.2 node2
192.168.0.3 node3
192.168.0.4 node4
192.168.0.5 node5
192.168.0.6 node6
192.168.0.7 node7
WhiteW |
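(Editorial diagnostic sketch, not from the thread: since the host file relies on the names in /etc/hosts, it is worth confirming from the frontend that each name resolves and that the nodes accept non-interactive ssh, which mpirun needs for launching:)
Code:
# Hypothetical check; node names taken from the /etc/hosts listing above.
for n in node1 node2; do
    getent hosts "$n"     # should print the matching 192.168.0.x entry
    ssh "$n" hostname     # should return the node's name without a password prompt
done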
|
December 20, 2016, 01:51 |
|
#17 |
Member
Avdeev Evgeniy
Join Date: Jan 2011
Location: Togliatty, Russia
Posts: 69
Blog Entries: 1
Rep Power: 21 |
Maybe this will help someone. I fought with a "laplacianFoam not found" environment-variable problem and solved it by adding "source ~/.bashrc" to my PBS file. My PBS file is now:
Code:
#!/bin/bash
#
#PBS -N flange
#PBS -A studtlt
#PBS -l nodes=1:ppn=8
#PBS -l walltime=30:00:00
#PBS -m ae
cd $PBS_O_WORKDIR
module load openmpi-1.4.5
source ~/.bashrc
cd /home/studtlt/avdeev/flange
mpirun -machinefile $PBS_NODEFILE -np $PBS_NP /home/studtlt/OpenFOAM/OpenFOAM-4.0/bin/foamExec laplacianFoam -parallel | tee -a log
And my .bashrc:
Code:
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
module load openmpi-1.4.5; source $HOME/OpenFOAM/OpenFOAM-4.0/etc/bashrc WM_LABEL_SIZE=64 WM_COMPILER_TYPE=ThirdParty FOAMY_HEX_MESH=yes

# User specific aliases and functions
#alias of40='module load openmpi-x86_64; source $HOME/OpenFOAM/OpenFOAM-4.0/etc/bashrc WM_LABEL_SIZE=64 WM_COMPILER_TYPE=ThirdParty FOAMY_HEX_MESH=yes'
alias of40='module load openmpi-1.4.5; source $HOME/OpenFOAM/OpenFOAM-4.0/etc/bashrc WM_LABEL_SIZE=64 WM_COMPILER_TYPE=ThirdParty FOAMY_HEX_MESH=yes'
|
|
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
How to run SU2 on cluster of computers (How to specify nodes ?) | aero_amit | SU2 | 29 | February 24, 2020 14:44 |
Can not run OpenFOAM in parallel in clusters, help! | ripperjack | OpenFOAM Running, Solving & CFD | 5 | May 6, 2014 16:25 |
OpenFOAM solvers not able to run in parallel | raagh77 | OpenFOAM Installation | 5 | November 27, 2013 18:05 |
Something weird encountered when running OpenFOAM in parallel on multiple nodes | xpqiu | OpenFOAM Running, Solving & CFD | 2 | May 2, 2013 05:59 |
Unable to run OF in parallel on a multiple-node cluster | quartzian | OpenFOAM | 3 | November 24, 2009 14:37 |