August 2, 2023, 06:39
|
Strange MPI error
|
#1
|
New Member
Join Date: Aug 2022
Posts: 5
Rep Power: 4
|
Hi,
I'm building an OpenFOAM 11 Beowulf cluster. The motorBike tutorial runs fine in parallel, and so does the cavity tutorial. I've also run some MPI benchmarks, and they all executed correctly.
Every program is launched with this command:
Quote:
mpirun --host master:12,slave-1:12 <program>
|
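For reference, a typical OpenFOAM parallel launch on a setup like this follows the standard decomposePar workflow; the sketch below assumes that is what <program> expands to (the solver name and -parallel flag are stand-ins here):
Code:
# system/decomposeParDict should match the total slot count (12 + 12 = 24):
#   numberOfSubdomains 24;
#   method scotch;
decomposePar                                # split the case into processor* directories
mpirun --host master:12,slave-1:12 potentialFoam -parallel
|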
But when I run my own case, I get an MPI error.
potentialFoam
Quote:
Pstream initialised with:
floatTransfer : false
nProcsSimpleSum : 0
commsType : nonBlocking
polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 10)
allowSystemOperations : Allowing user-supplied system call operations
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time
Create mesh for time = 0
Reading velocity field U
Constructing pressure field p
Constructing velocity potential field Phi
No MRF models present
Calculating potential flow
GAMG: Solving for Phi, Initial residual = 1, Final residual = 0.01761, No Iterations 2
GAMG: Solving for Phi, Initial residual = 0.439493, Final residual = 0.00614545, No Iterations 2
Continuity error = 9.54293e-06
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node master exited on signal 9 (Killed).
--------------------------------------------------------------------------
|
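Signal 9 is SIGKILL, which a process never raises on itself; on a cluster it is most often the kernel OOM killer terminating a rank that ran out of memory. A quick way to check (a sketch, assuming kernel log access on each node):
Code:
# Look for OOM-killer activity around the time of the crash, on every node:
dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
# On systemd machines the kernel journal works too:
journalctl -k --since "1 hour ago" | grep -i oom
# And watch memory headroom while the solver is running:
free -h
|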
simpleFoam
Quote:
Starting time loop
forces force_coefs:
Not including porosity effects
forceCoeffs force_coefs:
Not including porosity effects
forceCoeffs force_coefs write:
Cm = 0
Cd = 0
Cl = 0
Cl(f) = 0
Cl(r) = 0
Time = 0.1s
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An MPI communication peer process has unexpectedly disconnected. This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).
Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate. For
example, there may be a core file that you can examine. More
generally: such peer hangups are frequently caused by application bugs
or other external events.
Local host: slave-1
Local PID: 17347
Peer host: master
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 4 with PID 0 on node stantion-1 exited on signal 9 (Killed).
--------------------------------------------------------------------------
[master:30431] 11 more processes have sent help message help-mpi-btl-tcp.txt / peer hung up
[master:30431] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
|
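As the last two log lines suggest, Open MPI aggregates repeated help messages; the per-rank errors can be expanded by re-running with one extra MCA parameter (same launch command otherwise):
Code:
mpirun --mca orte_base_help_aggregate 0 --host master:12,slave-1:12 <program>
|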
What could be the problem?