Odd MPI Issues When Running Large Simulation

September 16, 2022, 11:59
  #1
New Member
 
Anonymous
Join Date: Mar 2022
Posts: 6
I've been running scramjet simulations for my research work for several months on a coarse mesh, and they usually run without any issues. Recently, I created a finer mesh with improvements to key areas, which increased the total cell count from 2 million to approximately 9.8 million. On the finer mesh, the simulation runs for 100-200 iterations before crashing and producing errors such as:


Node 28: Process 14072: Received signal SIGSEGV.

999999: mpt_accept: error: accept failed: No error (repeated multiple times)
. . .
999999: mpt_accept: error: accept failed: Abort(138030991) on node 31 (rank 31 in comm 0): Fatal error in PMPI_Waitall: Other MPI error, error stack:
PMPI_Waitall(346)..............: MPI_Waitall(count=6, req_array=000000387A7FDE00, status_array=000000387A7FDF00) failed
MPIR_Waitall(174)..............:
MPIR_Waitall_impl(55)..........:
MPIDI_Progress_test(185).......:
MPIDI_OFI_handle_cq_error(1042): OFI poll failed (netmod\\ofi\\ofi_events.c:1042:MPIDI_OFI_handle_cq_error:Unknown error)
Abort(70922127) on node 21 (rank 21 in comm 0): Fatal error in PMPI_Waitall: Other MPI error, error stack:
PMPI_Waitall(346)..............: MPI_Waitall(count=16, req_array=000000E02DBFE170, status_array=000000E02DBFE270) failed
MPIR_Waitall(174)..............:
MPIR_Waitall_impl(55)..........:
MPIDI_Progress_test(185).......:
MPIDI_OFI_handle_cq_error(1042): OFI poll failed (netmod\\ofi\\ofi_events.c:1042:MPIDI_OFI_handle_cq_error:Unknown error)
The fl process could not be started.



I've talked to Ansys personnel, who asked me to confirm that Intel MPI was being used. I have confirmed this from the .trn file.

I'm running the simulations in parallel on a machine with the following hardware:


AMD Threadripper Pro 3975WX 32 core
128 GB DDR4 memory
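
As a rough sanity check, the figures quoted in this post give the per-process load and the memory ceiling per cell. This is only a sketch of the arithmetic. It assumes all 32 cores run as solver processes (the log mentions rank 31, which is consistent with that) and that the full 128 GB is available to the solver, which will not hold in practice. It says nothing about how much memory Fluent actually needs per cell.

```python
# Figures taken from the post above.
cells_coarse = 2_000_000
cells_fine = 9_800_000        # "approximately 9.8 million"
ram_bytes = 128 * 10**9       # 128 GB, treated as decimal gigabytes

# Assumption: one solver process per physical core.
processes = 32

# How much larger the fine mesh is than the coarse one.
growth = cells_fine / cells_coarse

# Cells each process would own under an even partition.
cells_per_process = cells_fine / processes

# Upper bound on bytes available per cell if all RAM went to the solver.
bytes_per_cell_coarse = ram_bytes / cells_coarse
bytes_per_cell_fine = ram_bytes / cells_fine

print(f"mesh growth factor:      {growth:.1f}x")
print(f"cells per process:       {cells_per_process:,.0f}")
print(f"RAM ceiling, coarse:     {bytes_per_cell_coarse / 1000:.1f} kB per cell")
print(f"RAM ceiling, fine mesh:  {bytes_per_cell_fine / 1000:.1f} kB per cell")
```

Comparing the ceiling with the memory use actually observed on the coarse case (for example, in the operating system's process monitor) gives a first indication of whether the finer mesh still fits.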


Does anyone on this forum have experience with this type of error, and possibly how to go about fixing the issue? Thank you in advance for your help.

Last edited by JaySmall1; September 16, 2022 at 12:01. Reason: Forgot to include details on my machine

September 16, 2022, 14:55
  #2
Senior Member
 
Lucky
Join Date: Apr 2011
Location: Orlando, FL USA
Posts: 5,752
What is crashing is your calculation and setup. Ignore that it is an MPI error. The MPI is the parallel handler. When a program crashes on your computer, you don't blame your ISP, so don't blame the MPI. The MPI is simply the messenger reporting that you have a segmentation fault. It is likely your MPI is working fine since, well, it is working fine for other cases.
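
To make that distinction concrete, here is a minimal sketch that separates the signal line from the MPI aborts that follow it in output like the above. The regular expressions are written against the lines quoted in this thread only, and `triage` is a hypothetical helper name, not part of Fluent or Intel MPI. It shows which process died first, not why.

```python
import re

# Patterns based on the log lines quoted in post #1 (assumed format).
SIGNAL_RE = re.compile(
    r"Node\s+(\d+):\s+Process\s+(\d+):\s+Received signal (\w+)"
)
ABORT_RE = re.compile(
    r"Abort\(\d+\) on node \d+ \(rank (\d+) in comm \d+\): "
    r"Fatal error in (\w+)"
)


def triage(log_text):
    """Return (signals, aborts) found in the log, in order of appearance."""
    signals, aborts = [], []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        m = SIGNAL_RE.search(line)
        if m:
            signals.append({
                "line": lineno,
                "node": int(m.group(1)),
                "pid": int(m.group(2)),
                "signal": m.group(3),
            })
        for m in ABORT_RE.finditer(line):
            aborts.append({
                "line": lineno,
                "rank": int(m.group(1)),
                "call": m.group(2),
            })
    return signals, aborts


# Excerpt of the output from post #1.
SAMPLE = """Node 28: Process 14072: Received signal SIGSEGV.
999999: mpt_accept: error: accept failed: No error
999999: mpt_accept: error: accept failed: Abort(138030991) on node 31 (rank 31 in comm 0): Fatal error in PMPI_Waitall: Other MPI error, error stack:
Abort(70922127) on node 21 (rank 21 in comm 0): Fatal error in PMPI_Waitall: Other MPI error, error stack:
"""

signals, aborts = triage(SAMPLE)
for s in signals:
    print(f"line {s['line']}: node {s['node']} received {s['signal']}")
for a in aborts:
    print(f"line {a['line']}: rank {a['rank']} aborted in {a['call']}")
```

In the excerpt, the SIGSEGV on node 28 appears before the `PMPI_Waitall` failures on ranks 31 and 21. That ordering is consistent with the other ranks failing while waiting on a process that had already crashed.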
