September 16, 2022, 11:59 |
Odd MPI Issues When Running Large Simulation
|
#1 |
New Member
Anonymous
Join Date: Mar 2022
Posts: 6
Rep Power: 4 |
I've been running scramjet simulations for my research work for several months on a coarse mesh. The simulations usually run without any issues. Recently, I created a finer mesh with improvements to key areas. This increased the total cell count from 2 million to approximately 9.8 million. On the finer mesh, the simulation runs for 100-200 iterations before crashing and producing errors such as:
Node 28: Process 14072: Received signal SIGSEGV.
999999: mpt_accept: error: accept failed: No error
(repeated multiple times)
.
.
.
999999: mpt_accept: error: accept failed:
Abort(138030991) on node 31 (rank 31 in comm 0): Fatal error in PMPI_Waitall: Other MPI error, error stack:
PMPI_Waitall(346)..............: MPI_Waitall(count=6, req_array=000000387A7FDE00, status_array=000000387A7FDF00) failed
MPIR_Waitall(174)..............:
MPIR_Waitall_impl(55)..........:
MPIDI_Progress_test(185).......:
MPIDI_OFI_handle_cq_error(1042): OFI poll failed (netmod\\ofi\\ofi_events.c:1042:MPIDI_OFI_handle_cq_error:Unknown error)
Abort(70922127) on node 21 (rank 21 in comm 0): Fatal error in PMPI_Waitall: Other MPI error, error stack:
PMPI_Waitall(346)..............: MPI_Waitall(count=16, req_array=000000E02DBFE170, status_array=000000E02DBFE270) failed
MPIR_Waitall(174)..............:
MPIR_Waitall_impl(55)..........:
MPIDI_Progress_test(185).......:
MPIDI_OFI_handle_cq_error(1042): OFI poll failed (netmod\\ofi\\ofi_events.c:1042:MPIDI_OFI_handle_cq_error:Unknown error)
The fl process could not be started.

I've talked to Ansys personnel, who asked me to confirm that Intel MPI was being used; I confirmed this using the .trn file.

I'm running the simulations in parallel on a machine with the following hardware:
AMD Threadripper Pro 3975WX, 32 cores
128 GB DDR4 memory

Does anyone on this forum have experience with this type of error, and possibly how to go about fixing the issue? Thank you in advance for your help.

Last edited by JaySmall1; September 16, 2022 at 12:01. Reason: Forgot to include details on my machine |
|
September 16, 2022, 14:55 |
|
#2 |
Senior Member
Lucky
Join Date: Apr 2011
Location: Orlando, FL USA
Posts: 5,752
Rep Power: 66 |
What is crashing is your calculation and setup. Ignore that it is an MPI error. The MPI is the parallel handler. When a program crashes on your computer, you don't blame your ISP, so don't blame the MPI. The MPI is simply the messenger reporting that you have a segmentation fault. It is likely your MPI is working fine since, well, it is working fine for other cases.
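To illustrate the point, here is a minimal Python sketch that separates the signal report from the MPI aborts that appear after it in a crash log. The function name summarize_crash_log and the two regular expressions are hypothetical; they are based only on the log excerpt in post #1 and are not part of Fluent or Intel MPI. It assumes the log text has been saved or pasted as a plain string.

```python
import re

# Hypothetical patterns, derived only from the log excerpt quoted in post #1.
SIGNAL_RE = re.compile(r"Node (\d+): Process (\d+): Received signal (SIG[A-Z]+)")
ABORT_RE = re.compile(r"Abort\((\d+)\) on node (\d+) \(rank (\d+) in comm \d+\)")


def summarize_crash_log(text):
    """Split a crash log into signal reports and MPI abort reports."""
    signals = [
        {"node": int(node), "pid": int(pid), "signal": sig}
        for node, pid, sig in SIGNAL_RE.findall(text)
    ]
    aborts = [
        {"code": int(code), "node": int(node), "rank": int(rank)}
        for code, node, rank in ABORT_RE.findall(text)
    ]
    return {"signals": signals, "aborts": aborts}


# Shortened sample assembled from the lines quoted in post #1.
sample = (
    "Node 28: Process 14072: Received signal SIGSEGV. "
    "Abort(138030991) on node 31 (rank 31 in comm 0): Fatal error in PMPI_Waitall: "
    "Abort(70922127) on node 21 (rank 21 in comm 0): Fatal error in PMPI_Waitall: "
)

summary = summarize_crash_log(sample)
print("Signals:", summary["signals"])
print("Ranks that aborted in MPI:", [a["rank"] for a in summary["aborts"]])
```

If the signal list is non-empty, that process is the place to start looking. The PMPI_Waitall aborts on the other ranks are plausibly just those ranks losing contact with the process that crashed, though the log alone does not prove this.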