shape_optimization.py - Inconsistent MPI Errors on HPC Nodes |
July 4, 2023, 12:44 |
#1 |
New Member
mardar
Join Date: Dec 2019
Posts: 17
Rep Power: 6 |
Hi everyone,
I'm trying to run shape_optimization.py on the 3D inviscid ONERA M6 tutorial with the discrete adjoint. On my local machine the optimization runs without any issues. On the nodes of our HPC cluster, however, it fails intermittently. The failure point appears random: sometimes it is the DEFORM step, other times ADJOINT or DIRECT. I'm using SU2 version 7.4.0 and Open MPI 4.1.4.

The errors look MPI-related, and I'd appreciate any insight into what could cause this behavior on the HPC nodes. Has anyone else experienced a similar issue, or does anyone have ideas about the cause? Thanks in advance for your help!

My job file:

#! /bin/bash
#$ -S /bin/bash
#$ -V
#$ -cwd
#$ -j y
export SU2_MPI_COMMAND="/mypath/apps/ompi414/bin/mpirun --mca mtl ^ofi --mca btl_openib_allow_ib 1 --mca btl vader,self,openib -n %d %s"
/mypath/apps/anaconda3/bin/python /mypath/apps/su740/bin/shape_optimization.py -n 30 -g DISCRETE_ADJOINT -f inv_ONERAM6_adv.cfg

ERROR:

File "/mypath/apps/su740/bin/SU2/run/interface.py", line 208, in SOL
  run_command( the_Command )
File "/mypath/apps/su740/bin/SU2/run/interface.py", line 271, in run_command
  raise exception(message)
RuntimeError: Path = /mypath/opt_try/try0/DESIGNS/DSN_001/ADJOINT_DRAG/,
Command = /mypath/apps/ompi414/bin/mpirun --mca mtl ^ofi --mca btl_openib_allow_ib 1 --mca btl vader,self,openib -n 30 /mypath/apps/su740/bin/SU2_SOL config_SOL.cfg
SU2 process returned error '139'
[compute-5-2:24097] *** Process received signal ***
[compute-5-2:24097] Signal: Segmentation fault (11)
[compute-5-2:24097] Signal code: Address not mapped (1)
[compute-5-2:24097] Failing at address: 0x2b6466853770
[compute-5-2:24118] *** Process received signal ***
[compute-5-2:24118] Signal: Segmentation fault (11)
[compute-5-2:24118] Signal code: Address not mapped (1)
[compute-5-2:24118] Failing at address: 0x2ad7a4e50770
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 13 with PID 24118 on node compute-5-2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
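For anyone reading the log: here is a small Python sketch of how the two pieces fit together. It is not SU2's own code. The helper names `expand_mpi_command` and `decode_exit_status` are made up for illustration. The sketch assumes the `%d` and `%s` in SU2_MPI_COMMAND are filled with the rank count and the SU2 command, which matches the command printed in the RuntimeError. It also uses the usual shell convention that an exit status above 128 means "killed by signal (status - 128)".

```python
import signal

# Template copied from the job file; %d is the rank count, %s the SU2 command.
MPI_TEMPLATE = (
    "/mypath/apps/ompi414/bin/mpirun --mca mtl ^ofi "
    "--mca btl_openib_allow_ib 1 --mca btl vader,self,openib -n %d %s"
)


def expand_mpi_command(template, n_ranks, su2_command):
    """Fill the two placeholders (hypothetical helper, not part of SU2)."""
    return template % (n_ranks, su2_command)


def decode_exit_status(status):
    """Return the signal name for a shell-style status above 128, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # status - 128 is not a known signal number on this platform
        return None


command = expand_mpi_command(
    MPI_TEMPLATE, 30, "/mypath/apps/su740/bin/SU2_SOL config_SOL.cfg"
)
print(command)                   # the same command string shown in the RuntimeError
print(decode_exit_status(139))   # 139 - 128 = 11, the segmentation fault signal
```

So error '139' and "Signal: Segmentation fault (11)" describe the same event: one of the 30 ranks launched through mpirun crashed, and the Python wrapper only reports the resulting exit status.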
|
Tags |
su2 |