Error while running cfx in parallel configuration

#1, posted by stater (H. Omar) on October 17, 2015, 13:45
Hi everyone,

I am trying to run CFX (v16) in parallel under the SLURM workload manager. I used the following runCFX.sh script:

Quote:
#!/bin/bash
srun hostname -s > /tmp//hosts.$SLURM_JOB_ID
if [ "x$SLURM_NPROCS" = "x" ]; then
if [ "x$SLURM_NTASKS_PER_NODE" = "x" ];then
SLURM_NTASKS_PER_NODE=1
fi
SLURM_NPROCS=`expr $SLURM_JOB_NUM_NODES \* $SLURM_NTASKS_PER_NODE`
fi
# use ssh instead of rsh
export CFX5RSH=ssh
# format the host list for cfx
cfxHosts=`tr '\n' ',' < /tmp//hosts.$SLURM_JOB_ID`
# run the partitioner and solver
/usr/ansys_inc/v160/CFX/bin/cfx5solve -par -par-dist "$cfxHosts" -def ./AADL2.def -part $SLURM_NPROCS -start-method "Platform MPI Distributed Parallel"
# cleanup
rm /tmp/hosts.$SLURM_JOB_ID
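
The host-list and task-count steps of the script above can be tried in isolation. The following is only a sketch under stated assumptions: the host names are made up and stand in for the output of `srun hostname -s`, the node and task counts are illustrative, and `paste` is swapped in for `tr` because it leaves no trailing comma (the thread does not say whether cfx5solve minds one).

```shell
#!/bin/bash
# Stand-in for `srun hostname -s`: one short host name per task.
# The names node01/node02 are made up for illustration.
hosts_file=$(mktemp)
printf '%s\n' node01 node01 node02 > "$hosts_file"

# Join the lines with commas. Unlike tr '\n' ',', paste -s adds no trailing comma.
cfx_hosts=$(paste -sd, "$hosts_file")
echo "$cfx_hosts"

# Fallback task count as in the original script (nodes x tasks per node),
# written with shell arithmetic instead of expr. The values are illustrative.
num_nodes=2
tasks_per_node=5
nprocs=$(( num_nodes * tasks_per_node ))
echo "$nprocs"

# Clean up the scratch file.
rm -f "$hosts_file"
```

With these stand-in values the two variables hold the comma-separated host list and the total task count that the original script passes to -par-dist and -part.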
I submitted the job with the following command line:

Quote:
sbatch -n 10 -N 2 -p mypartition -t 10 ./runCFX.sh
I obtained the following error in the SLURM output file:

Quote:
<IBM Platform MPI>: : warning, dlopen of libhwloc.so failed (null)/lib/linux_amd64/libhwloc.so: cannot open shared object file: No such file or directory
An error has occurred in cfx5solve:

The ANSYS CFX partitioner was interrupted by signal SEGV (11)
Can anyone help me to solve this issue? Thank you

#2, posted by Lance on October 19, 2015, 02:59
We have had problems getting Platform MPI Distributed Parallel to run with SLURM, and got exactly the same error as you. Ansys support won't help since they don't support it...
If I remember correctly, it was solved by using Intel MPI Distributed Parallel and/or unsetting the SLURM_GTIDS environment variable.
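
A minimal sketch of the second option, assuming the line is placed in the job script before the cfx5solve call. The export at the top only simulates what SLURM would set inside a job and its value is made up; in a real job script only the unset line is needed.

```shell
#!/bin/bash
# Simulate the variable as it would appear inside a SLURM job.
# The value is made up; a real job gets it from SLURM.
export SLURM_GTIDS="0,1,2,3,4"

# Workaround from this thread: remove the variable before the solver starts.
unset SLURM_GTIDS

# Check that child processes (such as the MPI launcher) no longer inherit it.
if env | grep -q '^SLURM_GTIDS='; then
    echo "SLURM_GTIDS is still exported" >&2
else
    echo "SLURM_GTIDS is unset"
fi
```

The cfx5solve line from the first post would follow the unset. For the other option, the -start-method argument of that line would be changed to "Intel MPI Distributed Parallel".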

#3, posted by stater (H. Omar) on October 19, 2015, 13:59
Hello sir,

I confirm that unsetting the SLURM_GTIDS variable solved the problem.
Thank you very much.

#4, posted by Svetlana (Svetlana Tkachenko) on March 18, 2016, 01:02
Thank you, Lance,

We added 'unset SLURM_GTIDS' to the job script and the job runs now.

Now instead of 2 we have 5 people on the planet who know of this workaround.

#5, posted by EvanOscarSmith (Evan Oscar Smith) on March 30, 2016, 23:17
Thanks so much!

I was having this same issue with CFX 17.0 and the unset SLURM_GTIDS command has fixed it.

#6, posted by hmp on May 3, 2016, 19:51
Thank you for this info.

If you are trying to run Abaqus in parallel on one of the XSEDE resources (e.g. SDSC Comet), adding 'unset SLURM_GTIDS' before the ABQ command gets rid of the MPI errors.

Make sure you have parallel_mode=MPI in the ABQ command.

Best,

#7, posted by k.vafiadis (Kyriakos Vafiadis) on June 20, 2016, 12:53
unset SLURM_GTIDS worked for me too



#8, posted by Neys Schreiner on February 27, 2017, 08:08
Also worked here, added:
unset SLURM_GTIDS
to my job script below the usual module load commands and voila.

Thanks!

#9, posted by alaspina (Alex) on May 11, 2017, 20:01
Does anyone know how to do this for an interactive session?

#10, posted by Svetlana (Svetlana Tkachenko) on May 12, 2017, 00:56
Alex, do you mean a CFX session on a GNU/Linux computer? Add the "unset SLURM_GTIDS" line to your ~/.bashrc or ~/.bash_profile.
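
One way to add that line without duplicating it on repeated runs is sketched below. The function name add_unset_line is hypothetical, and the demonstration writes to a scratch file rather than a real startup file.

```shell
#!/bin/bash
# add_unset_line FILE
# Append 'unset SLURM_GTIDS' to FILE unless an identical line is already there.
# The function name is hypothetical and only used for this illustration.
add_unset_line() {
    rc_file=$1
    line='unset SLURM_GTIDS'
    # grep -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Demonstration on a scratch file instead of a real startup file.
demo_rc=$(mktemp)
add_unset_line "$demo_rc"
add_unset_line "$demo_rc"   # the second call adds nothing
cat "$demo_rc"
rm -f "$demo_rc"
```

For a real session the call would be `add_unset_line ~/.bashrc` (bash reads that file for interactive non-login shells, and ~/.bash_profile for login shells). To strip the variable for a single command only, `env -u SLURM_GTIDS <command>` may be an alternative where the installed env supports the -u option.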
All times are GMT -4.