February 23, 2020, 10:12 |
NACA0012 optimization fails on parallel run
|
#1 |
New Member
Join Date: Jun 2019
Posts: 10
Rep Power: 7 |
Hi everybody,
I am using SU2 7.0.1 on Linux (Fedora 31) on a single server with a multicore AMD CPU. All tutorials run perfectly except the shape design ones. I encounter two problems with Inviscid_2D_Unconstrained_NACA0012:
1 - The CONTINUOUS_ADJOINT run fails when reading the surface sensitivity file (described in another thread, "Can not run any test cases"), whereas DISCRETE_ADJOINT is OK in a single process, but...
2 - SU2_CFD returns a segmentation fault when the same case is run in parallel. It occurs in DSN_002/DIRECT, with error 139 (DSN_001 is OK). log_Direct.out ends with the call to ParMETIS. The error sent to the terminal is:
Code:
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node server01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Lazlo |
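For reference, error 139 and signal 11 are two views of the same event: by shell convention an exit status above 128 usually means the process was killed by signal (status - 128). A minimal sketch of that decoding, where `describe_exit_code` is a hypothetical helper and not part of SU2:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit status.

    Assumption: the usual shell convention, where a status above 128
    means 'terminated by signal (status - 128)'.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with status {code}"


if __name__ == "__main__":
    # 139 is the error reported for DSN_002/DIRECT in this thread.
    print(describe_exit_code(139))
```

So the "error 139" reported by the Python wrapper and the "signal 11 (Segmentation fault)" reported by mpirun most likely describe the same crash.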
|
February 24, 2020, 11:04 |
|
#2 |
New Member
Jason Trinidad
Join Date: Jul 2018
Posts: 8
Rep Power: 8 |
Hi Lazlo,
In my experience shape design can be a memory gobbler. If the first design iteration is running fine and you're experiencing a segfault, my guess is that you may be running out of memory. Have you tried running "top" or something similar on your compute nodes? |
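If watching "top" by hand is inconvenient, a rough alternative is to ask the OS for the peak memory of the finished child processes. This is only a sketch under assumptions: `peak_child_rss` is a hypothetical helper, it relies on the standard `resource` module (Unix only), and `ru_maxrss` is reported in kilobytes on Linux. It gives the peak of the largest single child process, not the total over all MPI ranks.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd):
    """Run cmd to completion and return (return code, peak RSS).

    The peak is the largest resident set size among all child processes
    that this Python process has waited for so far (kilobytes on Linux).
    It is a per-process maximum, not a sum over MPI ranks.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Harmless placeholder command; replace it with the real solver call.
    code, peak = peak_child_rss([sys.executable, "-c", "pass"])
    print(f"return code {code}, peak RSS {peak} (kilobytes on Linux)")
```

If the reported peak stays far below the available RAM, running out of memory becomes a less likely explanation for the segfault.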
|
February 24, 2020, 14:11 |
|
#3 |
New Member
Join Date: Jun 2019
Posts: 10
Rep Power: 7 |
Thanks jtrin,
My memory usage stays very low with the NACA0012 test case (2 GB). The crash only occurs in parallel mode; serial mode is OK. |
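One way to narrow down a parallel-only failure is to rerun the failing direct solve by hand with different rank counts. The sketch below only builds and prints the commands by default. The binary path, config file name, and working directory are assumptions about the local setup, and `build_command` and `sweep` are hypothetical helpers, not SU2 functions.

```python
import shlex
import subprocess

# Assumed locations; adjust to the local installation and design folder.
SU2_CFD = "/home/lazlo/bin/SU2_CFD"
CONFIG = "config_CFD.cfg"


def build_command(n_ranks):
    """Return the solver command for a serial run or an n-rank MPI run."""
    if n_ranks <= 1:
        return [SU2_CFD, CONFIG]
    return ["mpirun", "-n", str(n_ranks), SU2_CFD, CONFIG]


def sweep(rank_counts, workdir, dry_run=True):
    """Print (dry run) or execute the solver once per rank count.

    Returns a dict mapping rank count to the return code, or to None
    in a dry run. A nonzero code from a real run marks a failure.
    """
    results = {}
    for n in rank_counts:
        cmd = build_command(n)
        if dry_run:
            print(shlex.join(cmd))
            results[n] = None
        else:
            results[n] = subprocess.run(cmd, cwd=workdir).returncode
    return results


if __name__ == "__main__":
    sweep([1, 2, 4, 8], workdir="DESIGNS/DSN_002/DIRECT")
```

Knowing the smallest rank count that still crashes would make the problem easier to reproduce for anyone debugging it.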
|
February 26, 2020, 08:01 |
|
#4 |
Super Moderator
Tim Albring
Join Date: Sep 2015
Posts: 195
Rep Power: 11 |
Can you post the complete Python stack trace?
__________________
Developer Director @ SU2 Foundation
|
|
February 26, 2020, 15:01 |
|
#5 |
New Member
Join Date: Jun 2019
Posts: 10
Rep Power: 7 |
Thank you for your interest,
Here is the output:
Code:
Traceback (most recent call last):
  File "/home/lazlo/bin/shape_optimization.py", line 176, in <module>
    main()
  File "/home/lazlo/bin/shape_optimization.py", line 108, in main
    options.nzones )
  File "/home/lazlo/bin/shape_optimization.py", line 152, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its,accu)
  File "/home/lazlo/bin/SU2/opt/scipy_tools.py", line 133, in scipy_slsqp
    epsilon = eps )
  File "/usr/lib64/python3.7/site-packages/scipy/optimize/slsqp.py", line 208, in fmin_slsqp
    constraints=cons, **opts)
  File "/usr/lib64/python3.7/site-packages/scipy/optimize/slsqp.py", line 399, in _minimize_slsqp
    fx = func(x)
  File "/usr/lib64/python3.7/site-packages/scipy/optimize/optimize.py", line 300, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/lazlo/bin/SU2/opt/scipy_tools.py", line 383, in obj_f
    obj_list = project.obj_f(x)
  File "/home/lazlo/bin/SU2/opt/project.py", line 233, in obj_f
    return self._eval(konfig, func,dvs)
  File "/home/lazlo/bin/SU2/opt/project.py", line 202, in _eval
    vals = design._eval(func,*args)
  File "/home/lazlo/bin/SU2/eval/design.py", line 147, in _eval
    vals = eval_func(*inputs)
  File "/home/lazlo/bin/SU2/eval/design.py", line 244, in obj_f
    func += su2func(this_obj,config,state) * sign * scale * global_factor
  File "/home/lazlo/bin/SU2/eval/functions.py", line 92, in function
    aerodynamics( config, state )
  File "/home/lazlo/bin/SU2/eval/functions.py", line 255, in aerodynamics
    info = su2run.direct(config)
  File "/home/lazlo/bin/SU2/run/direct.py", line 77, in direct
    SU2_CFD(konfig)
  File "/home/lazlo/bin/SU2/run/interface.py", line 112, in CFD
    run_command( the_Command )
  File "/home/lazlo/bin/SU2/run/interface.py", line 292, in run_command
    raise exception(message)
RuntimeError: Path = /media/data/lazlo/Logiciels/git/SU2/Tutorials/Inviscid_2D_Unconstrained_NACA0012 DA/DESIGNS/DSN_002/DIRECT/,
Command = mpirun -n 8 /home/lazlo/bin/SU2_CFD config_CFD.cfg
SU2 process returned error '139'
[server01:6556 :0:6556] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6557 :0:6557] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6558 :0:6558] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6559 :0:6559] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6561 :0:6561] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6563 :0:6563] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x514)
==== backtrace ====
[server01:6555 :0:6555] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
 0 /lib64/libucs.so.0(+0x1b25f) [0x7fb6ced4025f]
 1 /lib64/libucs.so.0(+0x1b42a) [0x7fb6ced4042a]
 2 /home/lazlo/bin/SU2_CFD() [0xb3d2e0]
 3 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 4 /home/lazlo/bin/SU2_CFD() [0x8050f4]
 5 /home/lazlo/bin/SU2_CFD() [0x806447]
 6 /home/lazlo/bin/SU2_CFD() [0x806bf8]
 7 /home/lazlo/bin/SU2_CFD() [0x80cb1f]
 8 /home/lazlo/bin/SU2_CFD() [0x45a8e0]
 9 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fb6d4e8e1a3]
10 /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
 0 /lib64/libucs.so.0(+0x1b25f) [0x7f79a413225f]
 1 /lib64/libucs.so.0(+0x1b42a) [0x7f79a413242a]
 2 /home/lazlo/bin/SU2_CFD() [0xb3d96b]
 3 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 4 /home/lazlo/bin/SU2_CFD() [0x8050f4]
 5 /home/lazlo/bin/SU2_CFD() [0x806447]
 6 /home/lazlo/bin/SU2_CFD() [0x806bf8]
 7 /home/lazlo/bin/SU2_CFD() [0x80cb1f]
 8 /home/lazlo/bin/SU2_CFD() [0x45a8e0]
 9 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f79a52801a3]
10 /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
 0 /lib64/libucs.so.0(+0x1b25f) [0x7f0cdc54a25f]
 1 /lib64/libucs.so.0(+0x1b42a) [0x7f0cdc54a42a]
 2 /home/lazlo/bin/SU2_CFD() [0xb3d96b]
 3 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 4 /home/lazlo/bin/SU2_CFD() [0x8050f4]
 5 /home/lazlo/bin/SU2_CFD() [0x806447]
 6 /home/lazlo/bin/SU2_CFD() [0x806bf8]
 7 /home/lazlo/bin/SU2_CFD() [0x80cb1f]
 8 /home/lazlo/bin/SU2_CFD() [0x45a8e0]
 9 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f0cde6991a3]
10 /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
 0 /lib64/libucs.so.0(+0x1b25f) [0x7f2bb8f4125f]
 1 /lib64/libucs.so.0(+0x1b42a) [0x7f2bb8f4142a]
 2 /home/lazlo/bin/SU2_CFD() [0xb3d96b]
 3 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 4 /home/lazlo/bin/SU2_CFD() [0x8050f4]
 5 /home/lazlo/bin/SU2_CFD() [0x806447]
 6 /home/lazlo/bin/SU2_CFD() [0x806bf8]
 7 /home/lazlo/bin/SU2_CFD() [0x80cb1f]
 8 /home/lazlo/bin/SU2_CFD() [0x45a8e0]
 0 /lib64/libucs.so.0(+0x1b25f) [0x7fe60d20b25f]
 1 /lib64/libucs.so.0(+0x1b42a) [0x7fe60d20b42a]
 2 /home/lazlo/bin/SU2_CFD() [0xb3d96b]
 3 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 4 /home/lazlo/bin/SU2_CFD() [0x8050f4]
 5 /home/lazlo/bin/SU2_CFD() [0x806447]
 6 /home/lazlo/bin/SU2_CFD() [0x806bf8]
 7 /home/lazlo/bin/SU2_CFD() [0x80cb1f]
 8 /home/lazlo/bin/SU2_CFD() [0x45a8e0]
 9 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fe60f35a1a3]
10 /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
 0 /lib64/libucs.so.0(+0x1b25f) [0x7f32782c925f]
 1 /lib64/libucs.so.0(+0x1b42a) [0x7f32782c942a]
 2 /home/lazlo/bin/SU2_CFD() [0xb3d96b]
 3 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 4 /home/lazlo/bin/SU2_CFD() [0x8050f4]
 5 /home/lazlo/bin/SU2_CFD() [0x806447]
 6 /home/lazlo/bin/SU2_CFD() [0x806bf8]
 7 /home/lazlo/bin/SU2_CFD() [0x80cb1f]
 8 /home/lazlo/bin/SU2_CFD() [0x45a8e0]
 9 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f327a4181a3]
10 /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
 0 /lib64/libucs.so.0(+0x1b25f) [0x7f6ff004825f]
 1 /lib64/libucs.so.0(+0x1b42a) [0x7f6ff004842a]
 2 /lib64/libc.so.6(cfree+0x20) [0x7f6ff11fb7b0]
 3 /home/lazlo/bin/SU2_CFD() [0x5f5882]
 4 /home/lazlo/bin/SU2_CFD() [0xb3d366]
 5 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 6 /home/lazlo/bin/SU2_CFD() [0x8050f4]
 7 /home/lazlo/bin/SU2_CFD() [0x806447]
 8 /home/lazlo/bin/SU2_CFD() [0x806bf8]
 9 /home/lazlo/bin/SU2_CFD() [0x80cb1f]
10 /home/lazlo/bin/SU2_CFD() [0x45a8e0]
11 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f6ff11961a3]
12 /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
 9 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f2bbb0901a3]
10 /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 6 with PID 0 on node server01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Last edited by Lazlo; February 26, 2020 at 15:01. Reason: typo |
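The SU2_CFD frames in this backtrace are raw addresses without function names. If the binary still carries symbols or debug information (an assumption; a stripped build will not help), the binutils tool `addr2line` may be able to translate them. A minimal sketch that collects the unique SU2_CFD addresses from such a backtrace and builds the command, where `executable_frames` and `addr2line_command` are hypothetical helpers and the sample is a short excerpt of the output above:

```python
import re
import shlex

# Matches frames printed as '<path>() [0xADDRESS]', i.e. frames with no symbol name.
FRAME_RE = re.compile(r"(\S+)\(\)\s+\[(0x[0-9a-fA-F]+)\]")


def executable_frames(backtrace, binary):
    """Return the unique frame addresses inside `binary`, in order of first appearance."""
    seen = []
    for path, addr in FRAME_RE.findall(backtrace):
        if path == binary and addr not in seen:
            seen.append(addr)
    return seen


def addr2line_command(binary, addresses):
    """Build an addr2line call: -f prints function names, -C demangles C++ names."""
    return ["addr2line", "-f", "-C", "-e", binary, *addresses]


SAMPLE = """
 2 /home/lazlo/bin/SU2_CFD() [0xb3d2e0]
 3 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 2 /home/lazlo/bin/SU2_CFD() [0xb3d96b]
 3 /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
 9 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fb6d4e8e1a3]
"""

if __name__ == "__main__":
    binary = "/home/lazlo/bin/SU2_CFD"
    addresses = executable_frames(SAMPLE, binary)
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(addr2line_command(binary, addresses)))
```

The printed command has to be run on the machine that holds the exact binary that crashed. Frame 2 differs between ranks (0xb3d2e0, 0xb3d96b, and 0xb3d366 below a call to cfree), so resolving those addresses first is probably the most informative step.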