|
January 28, 2008, 10:19 |
|
#1 |
Senior Member
Gavin Tabor
Join Date: Mar 2009
Posts: 181
Rep Power: 17 |
Dear All,
I'm starting to use OpenFOAM on a new machine. Does anyone have any experience with using OpenFOAM with Sun Grid Engine? Comments on this would be useful; submission scripts would be _really_ useful. Gavin |
|
January 28, 2008, 10:40 |
|
#2 |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,715
Rep Power: 40 |
Getting OpenFOAM working with OpenMPI and GridEngine is okay (much, much better than trying to get LAM working).
1. Check that OPAL_PREFIX is properly set by your Foam installation.
2. Assuming that you don't have the OpenFOAM settings sourced within your bashrc/cshrc, or you are using sh/ksh, the job script should include this sourcing information.

I've attached a script snippet, qFoam-snippet, that should help get you going. The snippet CANNOT be used as-is. I'd rather not send the entire script, since there are a number of interdependencies with our site-specific scripting and it would likely be too confusing anyhow. Since I have it set up to run in the cwd, there is no need to pass the root/case information to the script, but you do need to tell it which application should run. Beyond the site-specific changes, you will also notice funny-looking "%{STUFF}" constructs throughout. These placeholders are replaced with the appropriate environment variables to create the final job script. There are also some odd bits with an "etc/" directory. This is simply a link to the appropriate OpenFOAM-VERSION/.OpenFOAM-VERSION directory. |
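Mark's point 1 can be made concrete with a small guard near the top of the job script. This is only a sketch (the function name and message are mine, not from the thread): it fails fast when the OpenFOAM settings were not sourced and OPAL_PREFIX is therefore missing.

```shell
# Sketch: verify the sourced OpenFOAM settings exported OPAL_PREFIX,
# which Open MPI needs when it is installed outside its configured prefix.
check_opal_prefix() {
    if [ -z "${OPAL_PREFIX:-}" ]; then
        echo "error: OPAL_PREFIX not set - source the OpenFOAM bashrc first" >&2
        return 1
    fi
    echo "OPAL_PREFIX=$OPAL_PREFIX"
}
```

Called right before mpirun, a check like this turns the cryptic daemon-startup failures seen later in this thread into an immediate, readable error.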
|
January 28, 2008, 11:00 |
|
#3 |
Member
Luca M.
Join Date: Mar 2009
Location: Luzern, Switzerland
Posts: 59
Rep Power: 17 |
Hi Gavin,
I use the SGE job scheduler with OpenFOAM on our cluster. I wrote these PE rules:

   PeHostfile2MachineFile()
   {
      cat $1 | while read line; do
         # echo $line
         host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
         nslots=`echo $line|cut -f2 -d" "`
         i=1
         # while [ $i -le $nslots ]; do
         #    # add here code to map regular hostnames into ATM hostnames
         echo $host
         cpu=$nslots
         #    i=`expr $i + 1`
         # done
      done
   }

   touch OFmachines
   PeHostfile2MachineFile $1 | cat >> OFmachines
   mhost=`echo $2|cut -f1 -d"."`
   echo $mhost >> mhost

together with this batch script, which creates and submits the SGE script:

   #!/bin/bash
   echo Enter a casename:
   read casename
   echo "Enter definition WDir:"
   read Wdir
   #echo Enter Solver :
   #read Solver
   echo "Number of processors:"
   read cpunumb
   #
   if [ $cpunumb = "1" ]; then
      touch Foam-$casename.sh
      chmod +x Foam-$casename.sh
      echo '#!/bin/bash' >> Foam-$casename.sh
      echo '### SGE ###' >> Foam-$casename.sh
      echo '#$ -S /bin/sh -j y -cwd' >> Foam-$casename.sh
      echo 'read masthost <mhost' >> Foam-$casename.sh
      echo 'ssh $masthost "cd $PWD;'SteadyCompFoam' '$Wdir' '$casename' "' >> Foam-$casename.sh
      echo 'rm -f OFmachines' >> Foam-$casename.sh
      echo 'rm -f mhost' >> Foam-$casename.sh
      echo 'rm -f 'Foam-$casename.sh' ' >> Foam-$casename.sh
      qsub -pe OFnet $cpunumb -masterq tom02.q,tom03.q,tom04.q,tom05.q,tom06.q,tom22.q,tom23.q,tom24.q,tom25.q Foam-$casename.sh
   else
      touch Foam-$casename.sh
      chmod +x Foam-$casename.sh
      echo '#!/bin/bash' >> Foam-$casename.sh
      echo '### SGE ###' >> Foam-$casename.sh
      echo '#$ -S /bin/sh -j y -cwd' >> Foam-$casename.sh
      echo 'read masthost <mhost' >> Foam-$casename.sh
      echo 'ssh $masthost "export LAMRSH=ssh;cd $PWD;lamboot -v -s OFmachines"' >> Foam-$casename.sh
      echo 'ssh $masthost "cd $PWD;mpirun -np '$cpunumb' 'SteadyCompFoam' '$Wdir' '$casename' -parallel" ' >> Foam-$casename.sh
      echo 'ssh $masthost "cd $PWD;lamhalt -d"' >> Foam-$casename.sh
      echo 'rm -f OFmachines' >> Foam-$casename.sh
      echo 'rm -f mhost' >> Foam-$casename.sh
      echo 'rm -f 'Foam-$casename.sh' ' >> Foam-$casename.sh
      qsub -pe OFnet $cpunumb -masterq tom02.q,tom03.q,tom04.q,tom05.q,tom06.q,tom22.q,tom23.q,tom24.q,tom25.q Foam-$casename.sh
   fi

This works with the LAM MPI libraries. You can submit the job, but at the moment you have to stop your calculation via the controlDict and not via the qmon interface. We can start from this to develop a better one.
Luca |
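To see what the host-extraction step of Luca's PE rule actually produces, here is a self-contained rerun of the same parsing logic against a made-up pe_hostfile (the hostnames are invented; in a real PE start script, GridEngine hands the file in as the first argument):

```shell
# Feed a fake GridEngine pe_hostfile through the same host-extraction
# logic as Luca's PeHostfile2MachineFile: take the first field, strip
# the domain part, and emit one bare hostname per line.
pe_hostfile=$(mktemp)
cat > "$pe_hostfile" <<'EOF'
tom02.example.com 2 tom02.q UNDEFINED
tom03.example.com 4 tom03.q UNDEFINED
EOF

machines=$(while read line; do
    echo "$line" | cut -f1 -d" " | cut -f1 -d"."
done < "$pe_hostfile")

echo "$machines"
rm -f "$pe_hostfile"
```

Note that the slot count (second field) is read but unused in Luca's version; the commented-out loop there would instead repeat each hostname once per slot.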
|
January 30, 2008, 06:30 |
|
#4 |
Senior Member
Gavin Tabor
Join Date: Mar 2009
Posts: 181
Rep Power: 17 |
Dear Luca, Mark,
Thanks for your scripts - I can kind of make sense of them! I've managed to get things running now for single-processor jobs using a simplified version of what you suggest. For the parallel case: am I right that $nslots is a variable giving the number of processors allocated for the parallel run? How is it being set in SGE? Gavin |
|
January 30, 2008, 08:39 |
|
#5 |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,715
Rep Power: 40 |
The $NSLOTS variable (all uppercase) is set by GridEngine. The qsub manpage is the best starting point for finding out more about which environment variables are used.
Based on personal experience, I would really try to avoid LAM with GridEngine and use OpenMPI instead.

BTW: killing the job via qdel (or qmon) works fine (it doesn't leave around any half-dead processes), but obviously won't have OpenFOAM write results before exiting. Using the '-notify' option for qsub would give you a chance to trap the signals. But apart from some OpenMPI issues in the past, it is not certain that a particular OpenFOAM solver could finish its iteration *and* write the results before the true kill signal gets sent. Increasing the notify period before pulling the plug may not be the correct answer either.

For the moment, I've modified a few solvers to recognize the presence of an 'ABORT' file and to write and quit if it exists. This is usually quite a bit easier than modifying the controlDict. I think there is another solution, but I still need to think about it a bit. /mark |
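The '-notify' idea can be sketched in the job script itself: with 'qsub -notify', GridEngine sends SIGUSR1 (before qdel's SIGSTOP) and SIGUSR2 (before SIGKILL), and the script can translate either into the ABORT file that Mark's modified solvers poll for. The file name and case-directory variable here are assumptions, not details from the thread:

```shell
# Translate GridEngine's notify signals into an ABORT file that a
# suitably modified solver can check once per iteration.
CASE_DIR=${CASE_DIR:-.}

request_abort() {
    # Hypothetical convention: the solver writes results and exits when
    # it sees $CASE_DIR/ABORT at the start of an iteration.
    touch "$CASE_DIR/ABORT"
}

trap 'request_abort' USR1 USR2
# ... here the job script would normally run: mpirun <solver> ... -parallel
```

Whether the solver can finish its iteration and write before the real kill signal arrives still depends on the notify period, as Mark says.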
|
February 7, 2008, 13:32 |
|
#6 |
Member
nicolas
Join Date: Mar 2009
Location: Glasgow
Posts: 42
Rep Power: 17 |
Hello,
I am also trying to use qsub to run in parallel on 4 CPUs. My command is:

   qsub -q queue_name.q sge_script.sh

In sge_script.sh:

   source /net/c3m/opt/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/bashrc
   source /net/c3m/opt/OpenFOAM/OpenFOAM-1.4.1/.bashrc
   mpirun -np 4 simpleFoam .. case -parallel

This, for some reason, does not work. I get this output:

   error: executing task of job 25585 failed:
   [c4n26:16009] ERROR: A daemon on node c4n26 failed to start as expected.
   [c4n26:16009] ERROR: There may be more information available from
   [c4n26:16009] ERROR: the 'qstat -t' command on the Grid Engine tasks.
   [c4n26:16009] ERROR: If the problem persists, please restart the
   [c4n26:16009] ERROR: Grid Engine PE job
   [c4n26:16009] ERROR: The daemon exited unexpectedly with status 1.

This occurs only if I use mpirun; if I use the same command in serial (simpleFoam .. case) it works fine. Also, if I ssh into the node and start the script there, it runs fine. Nicolas |
|
February 8, 2008, 04:10 |
|
#7 |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,715
Rep Power: 40 |
Nishant,
I am re-directing your thread ( http://www.cfd-online.com/cgi-bin/Op...how.cgi?1/6598 ) to here, since this is where the relevant information is. If you read the thread, you'll notice that my response (with the qFoam snippet) addressed running with OpenMPI, whereas the information from Luca was for LAM. If you are not using LAM, then you don't need any of that stuff and don't need to worry about it.

The qFoam snippet is a template run script. The '%{STUFF}' placeholders must be replaced with the relevant information before it can be submitted with the usual 'qsub -pe NAME slots'. How exactly you wish to use the template to create your job script is left to you. Some people might want an interactive solution (like Luca showed); others might want to wrap it with Perl, Python or Ruby. We generally use Perl to create the final shell script and feed it to qsub via stdin.

From your original question about using something like "mpirun -machinefile machine -np 4 case root etc": why do you want to generate a machine file and specify the number of processes? This is the purpose of the OpenMPI and GridEngine integration, and you are ignoring it. As you can also see from the qFoam snippet, there is no need to use -machinefile or -np when using OpenMPI and GridEngine. All the bits are already done for you. Have you already consulted your site support people? |
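Mark's point about the tight OpenMPI/GridEngine integration can be shown as a minimal job-script skeleton: no -np and no -machinefile, because Open MPI picks up the slot count and host list from the environment GridEngine prepares for the PE job. The solver and case names below are placeholders, and the script is only printed here as a dry run:

```shell
# Sketch of a tightly-integrated job script, built as a string so its
# shape can be inspected without submitting anything. Submitting would
# be: qsub -pe <pe-name> <slots> job.sh
job_script='#!/bin/sh
#$ -cwd -j y
# (source the OpenFOAM settings here)
mpirun interFoam . myCase -parallel'

printf '%s\n' "$job_script"
```

Compare this with the explicit '-np $NSLOTS -machinefile ...' invocations later in the thread, which bypass the integration.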
|
February 9, 2008, 08:40 |
|
#8 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
Thanks Mark.
I hope this helps. I will update you soon in this regard. Nishant
__________________
Thanks and regards, Nishant |
|
February 9, 2008, 12:42 |
|
#9 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
Hi Mark,
I edited my qfoam-snippet.sh file as below and ran it. The output suggests an error at the line containing __DATA__ and at lines 25-26. Please see my file and suggest the required edits.

   17 rootName=interFoam
   18 caseName=$PWD/dam-dumy
   19 jobName=$caseName
   20 # avoid generic names
   21 case "$jobName" in
   22 foam | OpenFOAM )
   23    jobName=$(dirname $PWD)
   24    jobName=$(basename $jobName)
   25    ;;
   26 ecas
   27
   28 # ----------------------------------------
   29 # OpenFOAM (re)initialization
   30 #
   31 unset FOAM_SILENT
   32 FOAM_INST_DIR=$HOME/$WM_PROJECT
   33 FOAM_ETC=$WM_PROJECT_DIR/.OpenFOAM-1.4.1
   34
   35 # source based on parallel environment
   36 for i in $FOAM_ETC/bashrc-$PE $FOAM_ETC/bashrc

qfoam-snippet.txt
__________________
Thanks and regards, Nishant |
|
February 9, 2008, 15:25 |
|
#10 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
Sorry, but the error posted in my last message comes when I try running on a single processor, using qsub qfoam-snippet.sh.
When I try running it on 4 processors, using qsub -pe qfoam-snippet.sh 4 with the modified attached script, the job didn't get submitted on the cluster. Please see the script: qfoam-snippet.txt
__________________
Thanks and regards, Nishant |
|
February 11, 2008, 04:07 |
|
#11 |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,715
Rep Power: 40 |
The __DATA__ is a remnant from the original Perl wrapper and should be deleted.
The "ecas" is a typo from a last minute edit and should obviously be "esac". The idea of the snippet was to give an idea of what to do, not to provide a finished solution. |
|
February 11, 2008, 16:24 |
|
#12 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
Now the error is changed to:
(EE) /.OpenFOAM-1.4.1/bashrc cannot be found

Actually I sourced my code with .bashrc. How can I make this code run on the cluster now? Nishant
__________________
Thanks and regards, Nishant |
|
February 14, 2008, 05:15 |
|
#13 |
Member
nicolas
Join Date: Mar 2009
Location: Glasgow
Posts: 42
Rep Power: 17 |
any ideas on my problem described above?
Thanks, Nicolas |
|
February 15, 2008, 14:27 |
|
#14 |
Member
nicolas
Join Date: Mar 2009
Location: Glasgow
Posts: 42
Rep Power: 17 |
Hello,
I finally fixed my problem. 'mpirun' should be replaced by 'mpirun -prefix $OPENMPI_ARCH_PATH' Nicolas |
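Spelled out, nicolas's fix passes Open MPI's install prefix explicitly so the remote daemons can locate their libraries. $OPENMPI_ARCH_PATH normally comes from the sourced OpenFOAM settings; the default below is only illustrative, and the command is built as a string (dry run) rather than launched:

```shell
# Build the corrected mpirun invocation as a string so the effect of
# --prefix is visible without actually launching anything.
OPENMPI_ARCH_PATH=${OPENMPI_ARCH_PATH:-$HOME/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt}
cmd="mpirun --prefix $OPENMPI_ARCH_PATH -np 4 simpleFoam .. case -parallel"
echo "$cmd"
```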
|
February 20, 2008, 13:15 |
|
#15 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
Hi,
I am using the mpich PE for running the parallel damBreak problem on 4 processors. However I am getting this error:

   Got 4 processors.
   Machines:
   mpirun --prefix ~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/ -np 4 -machinefile /tmp/802.1.parallel.q/machines interFoam . dam-dumy -parallel
   [comp20:32553] mca: base: component_find: unable to open paffinity linux: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open ns proxy: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open ns replica: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open errmgr hnp: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open errmgr orted: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open errmgr proxy: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open rml oob: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open gpr null: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open gpr proxy: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open gpr replica: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open sds env: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open sds pipe: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open sds seed: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open sds singleton: file not found (ignored)
   [comp20:32553] mca: base: component_find: unable to open sds slurm: file not found (ignored)
   [comp20:32553] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 214
   --------------------------------------------------------------------------
   It looks like orte_init failed for some reason; your parallel process is
   likely to abort. There are many reasons that a parallel process can fail
   during orte_init; some of which are due to configuration or environment
   problems. This failure appears to be an internal failure; here's some
   additional information (which may only be relevant to an Open MPI
   developer):

      orte_sds_base_select failed
      --> Returned value -13 instead of ORTE_SUCCESS
   --------------------------------------------------------------------------
   [comp20:32553] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
   [comp20:32553] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
   --------------------------------------------------------------------------
   Open RTE was unable to initialize properly. The error occured while
   attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
   --------------------------------------------------------------------------

Can anybody help?
__________________
Thanks and regards, Nishant |
|
February 21, 2008, 08:52 |
|
#16 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
After providing the path to the relevant OF library file, I am now getting this error:

   Got 4 processors.
   Machines:
   [comp30:06445] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 214
   --------------------------------------------------------------------------
   Sorry! You were supposed to get help about:
      orte_init:startup:internal-failure
   from the file:
      help-orte-runtime
   But I couldn't find any file matching that name. Sorry!
   --------------------------------------------------------------------------
   [comp30:06445] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
   [comp30:06445] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
   --------------------------------------------------------------------------
   Sorry! You were supposed to get help about:
      orterun:init-failure
   from the file:
      help-orterun.txt
   But I couldn't find any file matching that name. Sorry!
   --------------------------------------------------------------------------

Can anybody comment on it? nishant
__________________
Thanks and regards, Nishant |
|
February 21, 2008, 09:02 |
|
#17 |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,715
Rep Power: 40 |
If OPAL_PREFIX is set, the file should be found.
|
|
February 21, 2008, 10:08 |
|
#18 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
My script file contains these lines:
   #!/bin/sh
   #
   # Your job name
   #$ -N OMPI_Dumy
   #
   # Use current working directory
   #$ -cwd
   #
   # Join stdout and stderr
   #$ -j y
   #
   # pe request for MPICH. Set your number of processors here.
   # Make sure you use the "mpich" parallel environment.
   #$ -pe mpich 4
   #
   # Run job through bash shell
   #$ -S /bin/bash
   #
   # The following is for reporting only. It is not really needed
   # to run the job. It will show up in your output file.
   echo "Got $NSLOTS processors."
   echo "Machines:"
   #
   # These exports needed for OpenMPI, when using the command line
   # these are set with the modules command, but with SGE scripts
   # we assume nothing!
   export PATH=~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/bin:$PATH
   export LD_LIBRARY_PATH=~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib:$PATH
   # Use full pathname to make sure we are using the right mpirun
   # Need to use prefix for nodes
   ~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/bin/mpirun --prefix ~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/ -np $NSLOTS -machinefile ~/.mpich/mpich_hosts.$JOB_ID interFoam . dam-dumy -parallel

Did I set the OPAL_PREFIX right? Nishant |
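Two things stand out in the script above, independent of the machinefile question: LD_LIBRARY_PATH is terminated with ':$PATH' instead of ':$LD_LIBRARY_PATH', and OPAL_PREFIX is never exported despite Mark's earlier advice. A hedged sketch of what those export lines more usually look like (same install path as in the post; whether this alone fixes the job is not confirmed in the thread):

```shell
# Corrected environment setup for the Open MPI build under the OpenFOAM
# tree. OMPI_DIR mirrors the path used in the post above.
OMPI_DIR=$HOME/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt
export PATH=$OMPI_DIR/bin:$PATH
export LD_LIBRARY_PATH=$OMPI_DIR/lib:${LD_LIBRARY_PATH:-}
export OPAL_PREFIX=$OMPI_DIR
echo "$OPAL_PREFIX"
```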
__________________
Thanks and regards, Nishant |
|
February 21, 2008, 10:12 |
|
#19 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
I think I am doing something wrong with the path of the machinefile/hostfile, don't I? The current path is the MPICH hostfile path. Should it be OpenFOAM's Open MPI path? Can you tell me something about that?
Nishant
__________________
Thanks and regards, Nishant |
|
February 25, 2008, 09:59 |
|
#20 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
While I was trying to debug the problem on my SGE cluster, I used the ompi_info command to investigate.
The error reported by ompi_info is:

   ompi_info: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

Can anybody tell me what's going wrong now? Nishant
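A missing libstdc++.so.6 usually means the compute node's LD_LIBRARY_PATH doesn't reach a compatible gcc runtime (interactive logins often pick it up from a module or ~/.bashrc that batch jobs skip). A small diagnostic sketch, with a function name of my own:

```shell
# List any shared libraries a binary cannot resolve in the current
# environment; an empty result means everything is found.
unresolved_libs() {
    ldd "$1" 2>/dev/null | grep 'not found'
}
# Example (in the thread's situation): unresolved_libs "$(command -v ompi_info)"
```

Running this inside the batch job, rather than in an interactive shell, shows the environment the job actually sees.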
__________________
Thanks and regards, Nishant |
|