July 24, 2007, 15:16 |
Parallel computation
|
#1 |
Guest
Posts: n/a
|
Hi,
I am running a flow solver using MPI (i.e. parallel computing). While it is running, the calculation stops and the output file says that a node (xxx) is "waiting too long for completion". Has anyone encountered this problem before, and what is the remedy? Thanks, Shrini
|
July 25, 2007, 12:39 |
Re: Parallel computation
|
#2 |
Guest
Posts: n/a
|
Could it be that node xxx is waiting to receive data (blocking) from some other node, say yyy, and there is no send posted by yyy to xxx?
Use a debugger, or insert print statements just before and after the receive statements, to see exactly where it is getting stuck.
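The print-before-and-after idea can be sketched without an MPI installation. The following is only an illustration in plain Python: threads stand in for ranks and `queue.Queue` objects stand in for message channels, and the names `traced_recv` and `run_demo` are hypothetical. In a real solver the same pattern would wrap the actual blocking receive (for example `MPI_Recv`), with the output flushed after each print so the last message is not lost in a buffer.

```python
import queue
import threading


def traced_recv(rank, source, channel, log, timeout=0.5):
    """Receive one message, logging before and after the blocking call.

    A timeout is used here only so the demo terminates; a real blocking
    receive would simply hang after the "before" message.
    """
    log.append(f"rank {rank}: before recv from {source}")
    try:
        data = channel.get(timeout=timeout)
    except queue.Empty:
        log.append(f"rank {rank}: STUCK waiting on {source}")
        return None
    log.append(f"rank {rank}: after recv from {source}")
    return data


def run_demo():
    """Rank 0 sends to rank 1 only; rank 2 waits for a send that never comes."""
    log = []                    # list.append is thread-safe in CPython
    to_rank1 = queue.Queue()    # channel 0 -> 1, a send is posted
    to_rank2 = queue.Queue()    # channel 0 -> 2, no send is ever posted

    def rank0():
        to_rank1.put("halo data")

    workers = [
        threading.Thread(target=rank0),
        threading.Thread(target=traced_recv, args=(1, 0, to_rank1, log)),
        threading.Thread(target=traced_recv, args=(2, 0, to_rank2, log)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return log


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

A rank whose "before" line appears without a matching "after" line is the one blocked in a receive, and the source named in that line is where to look for the missing send.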
|
July 25, 2007, 12:47 |
Re: Parallel computation
|
#3 |
Guest
Posts: n/a
|
Thanks Agg,
OK. Actually, the solver runs for about 10000 time steps and then the code stalls; when I kill the job, the output file shows that node (xxx) has been waiting too long for completion. I checked the send/receive calls too, and they look fine. Could this have anything to do with the load balancing algorithm? Also, when I check the status of a particular node, it reports that the node is running and that the job is running, but the output file from the code is no longer being appended to. This forces me to delete the job, and the diagnostic report says that 3 of the 4 nodes are waiting to complete. Thanks for the help, br, _shrini
|
July 26, 2007, 15:37 |
Re: Parallel computation
|
#4 |
Guest
Posts: n/a
|
Does the problem run to completion on one processor?
Load balancing may be a problem. However, why would the problem occur only after 10000 time steps? There must be some collective communication (e.g. the time step calculation using allreduce) where all processors must wait at the end of each time step, so a load balance problem should show up at every time step. You said you are using a flow solver. What variables are you computing? u, v, w, p, rho?
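To illustrate why a collective acts as a synchronization point every time step, here is a minimal sketch in plain Python, not the solver's actual code. Threads stand in for ranks and a `threading.Barrier` stands in for the collective; the function name `allreduce_min` and the sample time step values are assumptions made for the example. In MPI the corresponding call would be `MPI_Allreduce` with the `MPI_MIN` operation.

```python
import threading


def allreduce_min(local_values, timeout=5.0):
    """Simulate allreduce(min): every rank contributes one value and
    every rank receives the global minimum.

    No rank can pass the barrier until all ranks have arrived, which is
    why a slow (or stuck) rank holds up all the others at every step.
    """
    n = len(local_values)
    shared = [None] * n     # one slot per rank for its contribution
    results = [None] * n    # what each rank sees after the collective
    barrier = threading.Barrier(n)

    def rank_body(rank):
        shared[rank] = local_values[rank]
        barrier.wait(timeout=timeout)   # all ranks wait here
        results[rank] = min(shared)

    threads = [threading.Thread(target=rank_body, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical local stable time steps on four ranks.
    print(allreduce_min([0.02, 0.005, 0.01, 0.03]))
```

Because every rank blocks at the barrier until the last one arrives, any imbalance in per-step work would appear as waiting time at every step, not only after thousands of steps.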
|
July 26, 2007, 16:00 |
Re: Parallel computation
|
#5 |
Guest
Posts: n/a
|
Thanks agg,
I mean it is not specific to 10000 time steps; it occurs abruptly. The problem also does not always occur: I have encountered it in 3 of the 15 times I have run the case. Yes, I am computing u, v, w, rho and T. It is an incompressible flow solver with structured mesh blocks and an unstructured decomposition of the blocks, i.e. adjacent blocks may be oriented differently, so the i axis of one block can coincide with the j axis of another. Currently I am running 8 blocks on 8 processors with load balancing turned off. Thanks for everything, Shirnivas