August 3, 2009, 05:47
Running via SGE and mpich-mx
#1
Senior Member
BastiL
Join Date: Mar 2009
Posts: 530
Rep Power: 20
Dear all,
I have built OF-1.5-dev with support for mpich-mx (Myrinet cluster). Submitting a job via SGE fails without any error message: the job simply does not start. I use this script for submission: Code:
#!/bin/sh
#$ -N OF_0013_mpich_mx
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -pe mpi 32

export WM_MPLIB=MPICH-MX
source /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
echo $MPI_HOME
which simpleFoam

mpirun -machinefile $TMPDIR/machines -np $NSLOTS simpleFoam -parallel </dev/null
The MPICH benchmarks run fine with this script, but simpleFoam does not. Any ideas?
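The same script with annotations may help readers less familiar with SGE; the comments below are mine, the directive meanings are standard Grid Engine, and the parallel environment name "mpi" is whatever the cluster administrator configured: Code:
#!/bin/sh
#$ -N OF_0013_mpich_mx    # job name shown in qstat and used for output files
#$ -S /bin/sh             # shell that interprets the job script
#$ -cwd                   # execute the job in the submission directory
#$ -j y                   # merge stderr into the .o<jobid> output file
#$ -pe mpi 32             # request 32 slots from the parallel environment "mpi"

# Select the MPI flavour before sourcing the OpenFOAM environment, so that
# PATH and LD_LIBRARY_PATH point at the mpich-mx installation:
export WM_MPLIB=MPICH-MX
source /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
echo $MPI_HOME            # sanity check: should print the mpich-mx path
which simpleFoam          # sanity check: the solver must be on the PATH

# $TMPDIR/machines and $NSLOTS are provided by SGE; the machines file is
# typically written by the PE's start script from the pe_hostfile:
mpirun -machinefile $TMPDIR/machines -np $NSLOTS simpleFoam -parallel </dev/null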
August 3, 2009, 11:01
#2
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51
Quote: Originally Posted by BastiL
Add -x PATH -x LD_LIBRARY_PATH to the mpirun options. In my setup I also transfer WM_PROJECT_DIR, FOAM_MPI_LIBBIN and MPI_ARCH_PATH, but I'm not 100% sure whether these are required (and as long as it works I see no harm in adding them).

If that doesn't work for you, have a look into the *.oXXX and *.poXXX files that SGE gives you and share the error messages there with us.

Bernhard
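As a sketch, here is the mpirun line from the script above with that suggestion applied (this assumes the mpich-mx mpirun accepts the -x environment-forwarding option the post refers to; check your mpirun man page), plus how to look at the files SGE writes: Code:
mpirun -machinefile $TMPDIR/machines -np $NSLOTS \
    -x PATH -x LD_LIBRARY_PATH \
    -x WM_PROJECT_DIR -x FOAM_MPI_LIBBIN -x MPI_ARCH_PATH \
    simpleFoam -parallel </dev/null

# With "#$ -j y" stderr is merged into the .o file; the parallel
# environment's own start/stop output goes to the .po file.
# Replace <jobid> with the numeric job id reported by qsub/qstat:
cat OF_0013_mpich_mx.o<jobid>
cat OF_0013_mpich_mx.po<jobid>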
August 4, 2009, 05:27
#3
Senior Member
BastiL
Join Date: Mar 2009
Posts: 530
Rep Power: 20
Thanks Bernhard for your help. I am still struggling.
Quote: Originally Posted by Bernhard Gschaider
My script now reads: Code:
#!/bin/sh
#$ -N OF_0013_mpich_mx
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -pe mpi 32

export WM_MPLIB=MPICH-MX
source /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
echo $MPI_HOME
which simpleFoam

mpirun -machinefile $TMPDIR/machines -np $NSLOTS `which simpleFoam` /run/brblo OF_0013_mpich_mx -parallel -x PATH -x LD_LIBRARY_PATH </dev/null
The output of echo $MPI_HOME and which simpleFoam is: Code:
/opt/OpenFOAM/ThirdParty/mpich-mx-1.2.7..5
/opt/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linux64GccDPOpt/simpleFoam
The remaining job output is: Code:
/opt/sge/default/spool/node006/active_jobs/5519.1/pe_hostfile
node006 node006 node006 node006
node030 node030 node030 node030
node003 node003 node003 node003
node026 node026 node026 node026
node027 node027 node027 node027
node004 node004 node004 node004
node025 node025 node025 node025
node020 node020 node020 node020
rm: cannot remove `/tmp/5519.1.all.q/rsh': No such file or directory
The job does not seem to start at all.
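Two observations on this output (my reading, not from the thread): mpich-style mpirun implementations generally treat everything after the executable name as arguments to the program, so the trailing -x PATH -x LD_LIBRARY_PATH here would be handed to simpleFoam rather than to mpirun; and the failed rm of $TMPDIR/rsh typically comes from the parallel environment's stop script, hinting that the start script never created its rsh wrapper. A hypothetical check and reordering (again assuming this mpirun supports -x at all): Code:
# Inspect how the parallel environment is configured; start_proc_args /
# stop_proc_args are the scripts that create and remove $TMPDIR/rsh:
qconf -sp mpi

# Reordered command: all mpirun options before the executable, and the
# program arguments (root, case, -parallel) after it:
mpirun -x PATH -x LD_LIBRARY_PATH \
    -machinefile $TMPDIR/machines -np $NSLOTS \
    `which simpleFoam` /run/brblo OF_0013_mpich_mx -parallel </dev/null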
August 4, 2009, 13:56
#4
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51
Quote: Originally Posted by BastiL
No idea what the cause could be. Just some tips for further testing:

- Use qlogin to test it interactively. Be aware that there are subtle differences between the environment you get with qlogin and with qsub.
- Use fewer nodes and find out whether the problem also occurs when all processes are on one node; if it does not, you might have an rsh/ssh problem (see the sketch below).
- Do the MPI demo programs run with the same script?

Bernhard
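A sketch of the interactive single-node test suggested above (the PE name "mpi" is taken from the submit script; qlogin options can differ depending on the cluster's interactive queue setup, and the case path is the one from post #3): Code:
qlogin -pe mpi 4     # interactive session holding 4 slots

# Recreate the job environment by hand (qlogin does not run the job script):
export WM_MPLIB=MPICH-MX
source /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc

# Machines file listing only the local host, so no rsh/ssh to other nodes:
for i in 1 2 3 4; do hostname; done > machines

mpirun -machinefile machines -np 4 `which simpleFoam` /run/brblo OF_0013_mpich_mx -parallel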
August 4, 2009, 15:33
#5
Senior Member
BastiL
Join Date: Mar 2009
Posts: 530
Rep Power: 20
August 5, 2009, 08:01
#6
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51