|
November 21, 2011, 10:51 |
999999 (../../src/mpsystem.c@1123):mpt_read: failed:errno = 11
|
#1
New Member
Giuse
Join Date: Jul 2010
Location: Italy
Posts: 21
Rep Power: 16
Hi everybody!

I'm facing a serious problem trying to simulate a complex multiphase species-transport model in an axisymmetric domain. To model it I'm using three different UDFs: two imposed as boundary conditions (consumption terms) and one executed at the end of each time step, which computes the variables used by the other two. The UDFs compile correctly, and when I start the simulation in serial they work fine. The error arises when I run in parallel: at the end of the first time step I get

==============================================================================
Stack backtrace generated for node id 4 on signal 11 :
==============================================================================
Stack backtrace generated for node id 5 on signal 11 :
MPI Application rank 4 killed before MPI_Finalize() with signal 11
node 999999 retrying on zero socket read.....
[... repeated many times ...]
999999 (../../src/mpsystem.c@1123): mpt_read: failed: errno = 11
999999: mpt_read: error: read failed trying to read 4 bytes: Resource temporarily unavailable

I'm running Fluent on a 64-bit Linux cluster on 8 processors (lnamd64 architecture); running the same simulation on a 32-bit Linux cluster on 4 processors, the error does not occur. The "read failed trying to read 4 bytes" message makes me think of a 32- vs 64-bit library problem (since 4 bytes are 32 bits). Could you help me? The simulation is really heavy and I need to run it on as many processors as I can.

I thank you in advance,
UDS_rambler
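[Editor's note] Signal 11 is a segmentation fault on the compute nodes, and a frequent cause when a UDF runs cleanly in serial but dies in parallel is code that assumes a single process: looping over cells without guarding against the host process, double-counting cells at partition boundaries, or accumulating a global quantity without a reduction. The sketch below is not the poster's code; it is a minimal illustration, using Fluent's documented parallel UDF macros (RP_HOST, begin_c_loop_int, PRF_GRSUM1), of how an execute-at-end UDF can be made parallel-aware. The variable names (avg_conc) and the use of UDS index 0 are assumptions for illustration only.

    #include "udf.h"

    real avg_conc = 0.0;  /* illustrative: value later read by the BC UDFs */

    DEFINE_EXECUTE_AT_END(update_vars)
    {
    #if !RP_HOST    /* compute nodes (and serial) only: the host owns no cells */
        Domain *d = Get_Domain(1);   /* mixture domain */
        Thread *t;
        cell_t c;
        real sum = 0.0, vol = 0.0;

        thread_loop_c(t, d)
        {
            /* the _int loop visits interior cells only, so cells on
               partition boundaries are not counted twice across nodes */
            begin_c_loop_int(c, t)
            {
                sum += C_UDSI(c, t, 0) * C_VOLUME(c, t);
                vol += C_VOLUME(c, t);
            }
            end_c_loop_int(c, t)
        }

        /* global sums across all compute nodes; no-ops in serial */
        sum = PRF_GRSUM1(sum);
        vol = PRF_GRSUM1(vol);
        avg_conc = sum / vol;
    #endif
    }

In serial the guards and reductions compile away, so behavior is unchanged; in parallel each node touches only its own partition and the reduction yields the same global value on every node, which the boundary-condition UDFs (which also execute on the nodes) can then read.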
|
November 22, 2011, 10:23 |
|
#2
Member
We've seen this error.

Our cluster is a 64-bit Windows HPC system on a GigE network. We've been told by cluster experts that the MPI system is sensitive to network latency, not bandwidth: a GigE network has a latency of roughly 5 ms, while an InfiniBand network has latencies of microseconds. We've also been told to use 'Message Passing' for the DPM parallel scheme. And if you haven't done so already, compile your UDFs for 64-bit when running on the 64-bit cluster.

Hope this helps,
R.
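[Editor's note] The 'Message Passing' setting above applies to Fluent's DPM parallel scheme, but the same idea matters inside UDFs, because in parallel Fluent the host process and the compute nodes are separate programs with separate memory. Anything read or computed on the host (file I/O is the classic case, and attempting it on the nodes is a classic source of signal-11 crashes) must be explicitly sent to the nodes. A minimal sketch using Fluent's standard host-to-node macros; the file name flux.txt and the variable inlet_flux are illustrative, not from this thread:

    #include "udf.h"

    real inlet_flux = 0.0;  /* illustrative global, read elsewhere by BC UDFs */

    DEFINE_ON_DEMAND(read_and_broadcast)
    {
    #if !RP_NODE    /* the host (or serial) process does the file I/O */
        double tmp = 0.0;
        FILE *fp = fopen("flux.txt", "r");
        if (fp != NULL)
        {
            if (fscanf(fp, "%lg", &tmp) == 1)
                inlet_flux = (real) tmp;
            fclose(fp);
        }
    #endif
        /* send the host's value to every compute node; a no-op in serial */
        host_to_node_real_1(inlet_flux);
    }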
|
November 22, 2011, 10:46 |
|
#3
New Member
Giuse
Join Date: Jul 2010
Location: Italy
Posts: 21
Rep Power: 16
Thank you so much, Ronald.

I was going mental because of this problem. Now I know I should ask the personnel in charge of maintaining the network for the information you cited above. I'll report back when I learn more about it.

Thank you again,
G.
|
|
|