Running parallel job using qsub on Sun Grid Engine |
February 6, 2008, 16:47 |
|
#1 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
Hi all
I need some help with an Open MPI problem. I am trying to run a program on the AMD64 cluster at my university's computing facility. The job runs fine with the command:

    mpirun -machinefile machine -np 4 case root etc

where the file passed to -machinefile is written by hand. Now I am trying to run it on the cluster through qsub, with automatically allocated machines (not necessarily the master node). For this I wrote the qsub script below. I used mpich here and write out the hostfile/machinefile.

    #!/bin/sh
    #$ -N MPICH_JOB
    #$ -cwd
    # Join stdout and stderr
    #$ -j y
    # pe request for MPICH. Set your number of processors here.
    # Make sure you use the "mpich" parallel environment.
    #$ -pe mpich 4
    #
    # Run job through bash shell
    #$ -S /bin/bash
    #
    # The following is for reporting only. It is not really needed
    # to run the job. It will show up in your output file.
    echo "Got $NSLOTS processors."
    echo "Machines:"
    # add here code to map regular hostnames into ATM hostnames
    #echo $TMPDIR/machines
    cat $PE_HOSTFILE
    mpirun -machinefile machine -np 4 case root etc

From this script I get the hostfile in this format:

    comp03.dcs.hull.ac.uk 1 parallel.q@comp03.dcs.hull.ac.uk <null>
    comp29.dcs.hull.ac.uk 1 parallel.q@comp29.dcs.hull.ac.uk <null>
    comp11.dcs.hull.ac.uk 1 parallel.q@comp11.dcs.hull.ac.uk <null>
    comp09.dcs.hull.ac.uk 1 parallel.q@comp09.dcs.hull.ac.uk <null>

But my Open MPI installation needs it in this form:

    comp00.dcs.hull.ac.uk slots=2 max-slots=2
    comp03.dcs.hull.ac.uk slots=2 max-slots=2
    comp04.dcs.hull.ac.uk slots=2 max-slots=2
    comp05.dcs.hull.ac.uk slots=2 max-slots=2

Can you please suggest how to handle this? If there is any material to read, please let me know. Any kind of help is welcome. I would also like to ask the experts: is this possible with the current code? Looking forward to your help in this regard.

With warm regards, Nishant Singh
__________________
Thanks and regards, Nishant |
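One way to bridge the two formats quoted in the post is to rewrite each line of the Grid Engine hostfile into the `host slots=N max-slots=N` form. The following is only a sketch under assumptions: the function name `pe_to_openmpi_hostfile` is made up for illustration, and it assumes the first two columns of `$PE_HOSTFILE` are the hostname and the slot count, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or Open MPI): turn lines of the form
#   host nslots queue@host <null>
# into
#   host slots=nslots max-slots=nslots
pe_to_openmpi_hostfile() {
    # $1: path to the PE hostfile; lines with fewer than two fields are skipped
    awk 'NF >= 2 { printf "%s slots=%s max-slots=%s\n", $1, $2, $2 }' "$1"
}

# Demo on a sample file in the format shown in the post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
comp03.dcs.hull.ac.uk 1 parallel.q@comp03.dcs.hull.ac.uk <null>
comp29.dcs.hull.ac.uk 1 parallel.q@comp29.dcs.hull.ac.uk <null>
EOF
pe_to_openmpi_hostfile "$sample"
rm -f "$sample"
```

Inside the job script, the same function could be called as `pe_to_openmpi_hostfile "$PE_HOSTFILE" > "$TMPDIR/machines"`, with `mpirun -machinefile "$TMPDIR/machines" -np "$NSLOTS"` replacing the hand-written machine file and the hard-coded process count. Whether `$TMPDIR` is writable on a given cluster is an assumption to check locally.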
|
February 7, 2008, 04:33 |
|
#2 |
Member
Michele Vascellari
Join Date: Mar 2009
Posts: 70
Rep Power: 17 |
Nishant,
To run parallel OpenFOAM jobs under qsub (Torque version) I use the following script:

    #!/bin/bash
    #PBS -N damBreakFine
    #PBS -l nodes=4
    CASE=damBreakFine
    SOLVER=interFoam
    CURDIR=$HOME/OpenFOAM/michele-1.4.1/run/tutorials/interFoam
    cd $CURDIR
    mpirun --machinefile $PBS_NODEFILE $SOLVER $CURDIR $CASE -parallel

The variable $PBS_NODEFILE holds the path of the file listing the nodes allocated to the run. With qsub you generally don't know in advance which nodes will be used, so you cannot write the machine file a priori. Michele |
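Since the node file is only known at run time, the process count can be derived from it as well instead of being typed in. This is a small sketch, assuming the node file lists one hostname per allocated slot, which is the usual Torque/PBS layout; the function name `count_slots` and the node names are invented for the demo.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of a node file.
# Assumption: one line per allocated slot, as Torque/PBS normally writes it.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Demo with a stand-in for $PBS_NODEFILE (two nodes, two slots each).
nodefile=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$nodefile"
NP=$(count_slots "$nodefile")
echo "Would start $NP processes"
rm -f "$nodefile"
```

In a real job the call would be `NP=$(count_slots "$PBS_NODEFILE")`, and `$NP` could then be passed to `mpirun -np`.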
|
February 7, 2008, 08:02 |
|
#3 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
Thanks Michele,
Unfortunately my cluster does not support PBS, as you can see from my script. Can you suggest something that could replace $PBS_NODEFILE in my case? Or else, is there any way to make the cluster accept PBS scripts? Nishant
__________________
Thanks and regards, Nishant |
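The closest Grid Engine counterpart to `$PBS_NODEFILE` is the `$PE_HOSTFILE` already printed by the script in post #1, but its layout differs: one line per host with a slot count, rather than one line per slot. The sketch below expands the former into the latter. It assumes the first two columns are hostname and slot count; the function name `pe_to_nodefile` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: expand "host nslots ..." lines into one
# hostname per slot, mimicking the layout of a PBS-style node file.
pe_to_nodefile() {
    awk 'NF >= 2 { for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Demo on a sample PE hostfile: two slots on comp03, one on comp29.
sample=$(mktemp)
cat > "$sample" <<'EOF'
comp03.dcs.hull.ac.uk 2 parallel.q@comp03.dcs.hull.ac.uk <null>
comp29.dcs.hull.ac.uk 1 parallel.q@comp29.dcs.hull.ac.uk <null>
EOF
pe_to_nodefile "$sample"
rm -f "$sample"
```

The output of `pe_to_nodefile "$PE_HOSTFILE"` could be redirected to a file and used wherever a script written for Torque expects `$PBS_NODEFILE`.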
|
February 7, 2008, 09:59 |
|
#4 |
Member
Michele Vascellari
Join Date: Mar 2009
Posts: 70
Rep Power: 17 |
Sorry Nishant,
I hadn't noticed that you're using Grid Engine as the resource manager and not Torque. I'm sorry, but I don't have any experience with qsub on Grid Engine. Michele |
|
February 7, 2008, 10:14 |
|
#5 |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,715
Rep Power: 40 |
Nishant,
What was so wrong with the old thread ( http://www.cfd-online.com/cgi-bin/OpenFOAM_Discus/show.cgi?1/6504 ) that warranted starting a completely new thread for this discussion? IMO it gave fairly reasonable information and was not exactly out of date. |
|
February 7, 2008, 15:52 |
|
#6 |
Senior Member
Nishant
Join Date: Mar 2009
Location: Glasgow, UK
Posts: 166
Rep Power: 17 |
Hi Mark
Thanks for the reply. In fact, I went through that thread as well, but I could not understand the code at first reading. I would appreciate it if you could briefly explain how to run parallel OpenFOAM cases on an SGE cluster using the qsub command. I can see some pieces of code there, but I cannot figure out exactly how to apply them to my case. Here is what I understand of it.

I do not really get what this piece of code is doing:

    PeHostfile2MachineFile()
    {
       cat $1 | while read line; do
          # echo $line
          host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
          nslots=`echo $line|cut -f2 -d" "`
          i=1
          # while [ $i -le $nslots ]; do
          #    # add here code to map regular hostnames into ATM hostnames
          echo $host cpu=$nslots
          #    i=`expr $i + 1`
          # done
       done
    }
    touch OFmachines
    PeHostfile2MachineFile $1 | cat >> OFmachines
    mhost=`echo $2|cut -f1 -d"."`
    echo $mhost >> mhost

Again, I do not understand why the qFoam-Snippet is required or where to use it, because I am really just looking for a qsub run script. Sorry if I sound very naive. I understand a bit of the code below, which says:

    #!/bin/bash
    echo Enter a casename:
    read casename
    echo "Enter definition WDir:"
    read Wdir
    #echo Enter Solver :
    #read Solver
    echo "Number of processors:"
    read cpunumb
    #
    if [ $cpunumb = "1" ]; then
       touch Foam-$casename.sh
       chmod +x Foam-$casename.sh
       echo '#!/bin/bash' >> Foam-$casename.sh
       echo '### SGE ###' >> Foam-$casename.sh
       echo '#$ -S /bin/sh -j y -cwd' >> Foam-$casename.sh
       echo 'read masthost <mhost'>> Foam-$casename.sh
       echo 'ssh $masthost "cd $PWD;'SteadyCompFoam' '$Wdir' '$casename' "' >> OFoam-$casename.sh
       echo 'rm -f OFmachines' >> Foam-$casename.sh
       echo 'rm -f mhost' >> Foam-$casename.sh
       echo 'rm -f 'Foam-$casename.sh' ' >> Foam-$casename.sh
       qsub -pe OFnet $cpunumb -masterq tom02.q,tom03.q,tom04.q,tom05.q,tom06.q,tom22.q,tom23.q,tom24.q,tom25.q Foam-$casename.sh
    else
       touch Foam-$casename.sh
       chmod +x Foam-$casename.sh
       echo '#!/bin/bash' >> Foam-$casename.sh
       echo '### SGE ###' >> Foam-$casename.sh
       echo '#$ -S /bin/sh -j y -cwd' >> Foam-$casename.sh
       echo 'read masthost <mhost'>> Foam-$casename.sh
       echo 'ssh $masthost "export LAMRSH=ssh;cd $PWD;lamboot -v -s OFmachines"' >> Foam-$casename.sh
       echo 'ssh $masthost "cd $PWD;mpirun -np '$cpunumb' 'SteadyCompFoam' '$Wdir' '$casename' -parallel" ' >> Foam-$casename.sh
       echo 'ssh $masthost "cd $PWD;lamhalt -d"' >> Foam-$casename.sh
       echo 'rm -f OFmachines' >> Foam-$casename.sh
       echo 'rm -f mhost' >> Foam-$casename.sh
       echo 'rm -f 'Foam-$casename.sh' ' >> Foam-$casename.sh
       qsub -pe OFnet $cpunumb -masterq tom02.q,tom03.q,tom04.q,tom05.q,tom06.q,tom22.q,tom23.q,tom24.q,tom25.q Foam-$casename.sh
    fi

But I don't see how it can help in my case. What does OFnet mean? Also, it is written for the LAM implementation and I am using Open MPI. Please suggest how I can proceed here. Nishant
__________________
Thanks and regards, Nishant |
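As far as can be told from the quoted code, `PeHostfile2MachineFile` reads the Grid Engine hostfile line by line, takes the first field (the hostname) and strips everything after the first dot, takes the second field (the slot count), and prints `host cpu=N`, which is the form a LAM boot schema uses. The sketch below restates that behaviour with shell built-ins so it can be tried on sample data; the name `pe_hostfile_to_lam` is hypothetical and the code is a reading of the snippet, not the original author's version.

```shell
#!/bin/sh
# Hypothetical restatement of the quoted PeHostfile2MachineFile function.
# For each "fqdn nslots queue range" line, print "shorthost cpu=nslots".
pe_hostfile_to_lam() {
    while read -r fqdn nslots _rest; do
        [ -n "$fqdn" ] || continue          # skip empty lines
        echo "${fqdn%%.*} cpu=$nslots"      # drop the domain part
    done < "$1"
}

# Demo on one line of the hostfile shown in post #1.
sample=$(mktemp)
cat > "$sample" <<'EOF'
comp03.dcs.hull.ac.uk 1 parallel.q@comp03.dcs.hull.ac.uk <null>
EOF
pe_hostfile_to_lam "$sample"   # prints: comp03 cpu=1
rm -f "$sample"
```

In the generator script, `OFnet` is the argument of `qsub -pe`, so it appears to be the name of a parallel environment configured on that particular cluster, in the same position where the script in post #1 uses `mpich`.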
|