all processes end up in the same node when submitting parallel job by SGE

#1 | Marko Niinimaki (man@hepia) | November 1, 2012, 09:11
Dear all,
has anyone seen this kind of problem?
Background: OpenFOAM version OpenFOAM-2.1.x, compiled by "Allwmake",
Grid Engine GE 6.2u3, Scientific Linux SL release 5.5
Cluster of 224 cores in 20-something nodes.

Running mpirun directly distributes the job nicely across many nodes:
mpirun -np 64 --machinefile machines simpleFoam -parallel
Slaves :
63
(
"node015.374"
"node016.2178"
..

But submitting the same task through SGE puts _all_ the processes on a single node.

mpirun -np $NSLOTS --machinefile $TMPDIR/machines simpleFoam -parallel

The "machines" file generated by SGE lists several different nodes (node015, node016, ...), but simpleFoam always starts all the processes on a single node.
Is there something I should check? mpirun is from the ThirdParty package.
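
One way to narrow this down, sketched here on the assumption that `hostname` is available on every node, is to launch `hostname` instead of the solver with the same mpirun arguments and count how many processes report each host. The helper name `tally_hosts` is hypothetical and not part of OpenFOAM, Open MPI, or SGE.

```shell
#!/bin/bash
# tally_hosts: hypothetical helper. Reads one hostname per line on stdin
# and prints "<hostname> <count>" for each distinct host, sorted by name.
tally_hosts() {
    sort | uniq -c | awk '{print $2, $1}'
}

# Usage inside the SGE job script (same mpirun arguments as the solver run):
#   mpirun -np $NSLOTS --machinefile $TMPDIR/machines hostname | tally_hosts
# The machines file itself can be summarized the same way:
#   tally_hosts < $TMPDIR/machines
```

If the machines file shows many hosts but the `hostname` run shows only one, the machine file is being ignored at launch time rather than being generated incorrectly.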

#2 | Alexandro Palmieri (axpl) | November 12, 2012, 03:12
Add this line to your script:
Code:
unset SGE_ROOT
Sincerely,
Alex

#3 | Marko Niinimaki (man@hepia) | November 13, 2012, 04:43
Hi,
thanks for the reply. I am not sure what else I should change in the script. Here we have it:

mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/OpenFOAM/OpenFOAM-2.1.x/platforms/linux64GccDPOpt/bin/simpleFoam -parallel

This runs everything on just one node, whereas

unset SGE_ROOT
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/OpenFOAM/OpenFOAM-2.1.x/platforms/linux64GccDPOpt/bin/simpleFoam -parallel

fails with the following error message:
ssh: Unsupported option - -x
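
The "Unsupported option" message may not come from the real ssh client. This is only an assumption, not something confirmed in this thread: if the job environment places a wrapper script named `ssh` earlier in PATH, mpirun would call that wrapper instead. A minimal sketch for checking it lists every `ssh` the job's shell can see, in lookup order. The function name `list_candidates` is hypothetical.

```shell
#!/bin/bash
# list_candidates: hypothetical helper. Prints every executable file named
# $1 found in the directories of PATH, in the order the shell searches them.
list_candidates() {
    local name=$1 dir
    local IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    return 0
}

# Usage inside the SGE job script, just before the mpirun line:
#   list_candidates ssh
```

The first path printed is the one mpirun would pick up. If it points somewhere other than the system ssh client, that would be the place to look next.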

#4 | Alexandro Palmieri (axpl) | November 15, 2012, 13:44
The error may depend on Open MPI: which version are you using? Can you post your launch script?

Alex

#5 | Niels Nielsen (linnemann) | November 16, 2012, 01:07
Hi

If Open MPI was built using --with-sge, then you don't need "-machinefile $TMPDIR/machines".

On our cluster, "unset SGE_ROOT" puts the job on one node even though SGE is still reserving all the requested nodes.

Here is how I start an OF job on an SGE cluster: "qsub runScript", with runScript containing the lines below.

Code:
#!/bin/bash
#
#$ -cwd
#$ -o ./log.out
#$ -e ./log.err

#$ -pe orte 24
#
#$ -q all.q
#$ -S /bin/bash

# unset SGE_ROOT

echo Got $NSLOTS processors.

source /share/apps/OpenFOAM/OpenFOAM-2.1.x/etc/bashrc

# Resolve full paths; each is empty if the command is not on PATH
mpi=$(command -v mpirun)
solver=$(command -v pimpleDyMFoam)
echo "$mpi"
echo "$solver"

# Quit if either mpirun or the solver is missing
if [ -z "$mpi" ] || [ -z "$solver" ]
  then
      echo ">> mpirun or solver was not found, quitting!"
      exit 1
  else
      echo ">> mpirun and solver were found, will continue"
      "$mpi" -np "$NSLOTS" -x LD_LIBRARY_PATH -x PATH -x WM_PROJECT_DIR -x WM_PROJECT_INST_DIR -x WM_OPTIONS -x FOAM_LIBBIN -x FOAM_APPBIN -x FOAM_USER_APPBIN -x MPI_BUFFER_SIZE "$solver" -parallel > log
fi
__________________
Linnemann

PS. I do not do personal support, so please post in the forums.

#6 | Marko Niinimaki (man@hepia) | November 16, 2012, 09:42
Many thanks!
Unfortunately the script that you posted behaves the same way as before: all processes end up on one node.
I need to set "-machinefile" in the script, otherwise I get "ssh unsupported option -x" in stderr.
I compiled OpenFOAM 2.1.1 with just "./Allwmake". Is there a trick to force "--with-sge"?
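
Whether the mpirun in use was built with Grid Engine support can be checked with `ompi_info`, which is installed alongside Open MPI: builds configured with `--with-sge` list `gridengine` components in its output. A minimal sketch follows. The function name `has_gridengine` is hypothetical, and the bashrc path must be adjusted to the local installation.

```shell
#!/bin/bash
# has_gridengine: hypothetical helper. Reads `ompi_info` output on stdin and
# succeeds (exit status 0) if a gridengine component is listed.
has_gridengine() {
    grep -q 'gridengine'
}

# Usage, after sourcing the OpenFOAM bashrc so that the same mpirun and
# ompi_info as in the job script are on PATH:
#   if ompi_info | has_gridengine; then
#       echo "Open MPI reports SGE support"
#   else
#       echo "Open MPI reports no SGE support"
#   fi
```

If no gridengine component is listed, mpirun cannot pick up the SGE allocation on its own, which would be consistent with the machinefile being required here.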