all processes end up in the same node when submitting parallel job by SGE |
November 1, 2012, 09:11 |
#1 |
New Member
Marko Niinimaki
Join Date: Nov 2012
Posts: 3
Rep Power: 14 |
Dear all,
has anyone seen this kind of problem?

Background: OpenFOAM version OpenFOAM-2.1.x, compiled by "Allwmake"; Grid Engine GE 6.2u3; Scientific Linux SL release 5.5; cluster of 224 cores in 20-something nodes.

The following distributes a task nicely in many nodes:
Code:
mpirun -np 64 --machinefile machines simpleFoam -parallel

Slaves : 63 ( "node015.374" "node016.2178" ..

But submitting the same task by SGE leads to a situation where _all_ the processes are in a single node:
Code:
mpirun -np $NSLOTS --machinefile $TMPDIR/machines simpleFoam -parallel

The nodes in "machines" generated by SGE are diverse (node015, node016, ..), but simpleFoam always starts the processes in a single node. Is there something I should check? mpirun is from the ThirdParty package.
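One way to narrow this down is to compare the hosts listed in the machinefile with the hosts the processes actually land on. The following is a minimal sketch, not something from the thread: `hosts_summary` is a hypothetical helper name, and it assumes the machinefile holds one host name per line.

```shell
#!/bin/sh
# hosts_summary (hypothetical helper): read one host name per line on stdin
# and print "<host> <count>" for each distinct host, sorted by host name.
hosts_summary() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Intended use inside the SGE job script (assumes mpirun and the machinefile
# from the post above; shown as comments, not executed here):
#   hosts_summary < "$TMPDIR/machines"
#   mpirun -np "$NSLOTS" --machinefile "$TMPDIR/machines" hostname | hosts_summary
```

If the first summary lists many nodes and the second only one, the machinefile is probably fine and the launcher is the place to look.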
|
November 12, 2012, 03:12 |
|
#2 |
New Member
|
Add this line to your script:
Code:
unset SGE_ROOT

Alex |
|
November 13, 2012, 04:43 |
|
#3 |
New Member
Marko Niinimaki
Join Date: Nov 2012
Posts: 3
Rep Power: 14 |
Hi,
thanks for the reply. I am not sure what else I should change in the script. Here we have it:
Code:
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/OpenFOAM/OpenFOAM-2.1.x/platforms/linux64GccDPOpt/bin/simpleFoam -parallel

This runs everything in just one node.
Code:
unset SGE_ROOT
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/OpenFOAM/OpenFOAM-2.1.x/platforms/linux64GccDPOpt/bin/simpleFoam -parallel

This fails with the following error message:
Code:
ssh: Unsupported option - -x
|
November 15, 2012, 13:44 |
|
#4 |
New Member
|
The error may depend on Open MPI: which version are you using? Can you post your launch script?
Alex |
|
November 16, 2012, 01:07 |
|
#5 |
Senior Member
Niels Nielsen
Join Date: Mar 2009
Location: NJ - Denmark
Posts: 556
Rep Power: 27 |
Hi
if Open MPI was built using --with-sge then you don't need "-machinefile $TMPDIR/machines". On our cluster, "unset SGE_ROOT" puts the job on one node even though it is reserving the nodes.

Here is how I start an OF job on an SGE cluster: "qsub runScript", with runScript containing the lines below.
Code:
#!/bin/bash
#
#$ -cwd
#$ -o ./log.out
#$ -e ./log.err
#$ -pe orte 24
#
#$ -q all.q
#$ -S /bin/bash
#
unset SGE_ROOT
echo Got $NSLOTS processors.
source /share/apps/OpenFOAM/OpenFOAM-2.1.x/etc/bashrc

# Resolve the full paths of mpirun and the solver
mpi=`command -v mpirun`
solver=`command -v pimpleDyMFoam`
echo $mpi
echo $solver

# Quit if either one is missing (-o, not -a: one missing path is enough)
if [ -z "$mpi" -o -z "$solver" ]
then
    echo ">> mpirun or solver was not found, quitting!"
    exit 1
else
    echo ">> mpi was found will continue"
    # -x exports the named environment variables to the remote processes
    $mpi -np $NSLOTS -x LD_LIBRARY_PATH -x PATH -x WM_PROJECT_DIR -x WM_PROJECT_INST_DIR -x WM_OPTIONS -x FOAM_LIBBIN -x FOAM_APPBIN -x FOAM_USER_APPBIN -x MPI_BUFFER_SIZE $solver -parallel > log
fi
__________________
Linnemann PS. I do not do personal support, so please post in the forums. |
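When Open MPI has no SGE support, the host list has to be passed explicitly. Below is a minimal sketch of building an Open MPI hostfile from the file SGE names in `$PE_HOSTFILE`. It assumes the usual `<host> <slots> <queue> <processor range>` layout of that file; `pe_to_hostfile` and the `hosts.$JOB_ID` file name are hypothetical, not from this thread.

```shell
#!/bin/sh
# pe_to_hostfile (hypothetical helper): convert SGE PE_HOSTFILE lines
# ("<host> <slots> <queue> <processor range>") read from stdin into
# Open MPI hostfile lines ("<host> slots=<n>") written to stdout.
pe_to_hostfile() {
    awk 'NF >= 2 { print $1 " slots=" $2 }'
}

# Intended use inside the job script (assumed file name, not executed here):
#   pe_to_hostfile < "$PE_HOSTFILE" > "hosts.$JOB_ID"
#   mpirun -np "$NSLOTS" --hostfile "hosts.$JOB_ID" simpleFoam -parallel
```

Whether this changes where the processes start still depends on how mpirun reaches the other nodes on the cluster in question.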
|
November 16, 2012, 09:42 |
|
#6 |
New Member
Marko Niinimaki
Join Date: Nov 2012
Posts: 3
Rep Power: 14 |
Many thanks!
Unfortunately the script that you copied behaves the same way as before: all processes in 1 node. I need to set "machinefile" in the script, otherwise I get "ssh unsupported option -x" in stderr. I compiled OpenFOAM 2.1.1 with just "./Allwmake". Is there a trick to force "--with-sge"? |
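Before rebuilding anything, it may help to check whether the mpirun in use already has Grid Engine support: `ompi_info` lists a `gridengine` component when Open MPI was configured with it. This is a minimal sketch, assuming the OpenFOAM bashrc has been sourced so that the ThirdParty `ompi_info` is on the PATH; `has_gridengine` is a hypothetical helper name.

```shell
#!/bin/sh
# has_gridengine (hypothetical helper): succeed if the ompi_info output
# read from stdin mentions a gridengine component.
has_gridengine() {
    grep -q 'gridengine'
}

# Only query ompi_info if it is actually available on this machine.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_gridengine; then
        echo "Open MPI reports a gridengine component"
    else
        echo "no gridengine component listed"
    fi
else
    echo "ompi_info not found in PATH"
fi
```

If no gridengine component is listed, mpirun does not know about the SGE allocation on its own and relies entirely on the machinefile and its remote launcher.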
|