|
April 13, 2004, 05:35 |
MPI and parallel computation
|
#1 |
Guest
Posts: n/a
|
Hi All,
When I run my code on an Origin 3700/2000 machine, everything is fine as long as the grid is below 200x200x200 cells. However, when the size is 256x256x256, the code crashes. Analysis with TotalView shows that the crash point is at MPI_Wait for non-blocking message passing; if I use blocking message passing, the crash point is at MPI_Recv instead. The error information is as follows:

MPI: Program ./lbm, Rank 15, Process 11849507 received signal SIGSEGV(11)
MPI: --------stack traceback-------
11849507(5): 0xaf82b50[MPI_SGI_stacktraceback] 0xaf82f98[first_arriver_handler] 0xaf83228[slave_sig_handler] 0xd9f7ff4[memmove] 0x67[memmove]
FATAL: Protocol version of Server /merged/2.9.1 does not match version of Client /merged/2.1
Your command referenced dbx but env var TOOLROOT is set to /sanopt/dbx/7.3.3
Perhaps try $TOOLROOT/usr/bin/dbx
MPI: dbx version 7.3.2 73509_May21 MR May 21 2001 17:15:31
MPI: -----stack traceback ends-----
MPI: Program ./lbm, Rank 15, Process 11849507: Dumping core on signal SIGSEGV(11) into directory /sanhp/scrijw/lbm/lbm16mar
MPI: MPI_COMM_WORLD rank 15 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

I cannot make sense of this output. I have attached the output file for the case. Could you explain it if you have had a similar experience? Thank you very much in advance.
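For context, a crash at MPI_Wait or MPI_Recv inside memmove is often caused by the receive buffer being smaller than the message that actually arrives. Below is a minimal sketch of a non-blocking face exchange, assuming a C code with a hypothetical 256x256 face of doubles; it is not the poster's lbm code, only an illustration of where such a mismatch would surface.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs, ny = 256, nz = 256;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int count = ny * nz;                          /* doubles per face */
    double *send_face = malloc(count * sizeof(double));
    double *recv_face = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) send_face[i] = (double)rank;

    int left  = (rank - 1 + nprocs) % nprocs;     /* periodic neighbours */
    int right = (rank + 1) % nprocs;

    MPI_Request req[2];
    /* If "count" were larger than what recv_face was allocated with, the
       segfault would surface inside MPI's internal memmove at the wait
       (or at MPI_Recv in the blocking variant), much like the traceback
       above. */
    MPI_Irecv(recv_face, count, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(send_face, count, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    free(send_face);
    free(recv_face);
    MPI_Finalize();
    return 0;
}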
|
April 13, 2004, 05:44 |
Re: MPI and parallel computation
|
#2 |
Guest
Posts: n/a
|
Hi,
A very simple question: are you trying to use more memory than is available? The error messages can be quite cryptic in that case. /Tom
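To put rough numbers on that (my own estimate, not from the original posts): a single double-precision array of 256x256x256 cells takes 256*256*256*8 bytes, about 134 MB, against roughly 64 MB at 200x200x200. A lattice-Boltzmann code such as ./lbm typically carries on the order of 20 such arrays per copy of the domain (e.g. 19 distribution functions plus macroscopic fields), so the step from 200^3 to 256^3 can push the total from roughly 1.3 GB to 2.7 GB, and if the executable was built with a 32-bit address space the ~2 GB per-process limit is easy to exceed.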
|
April 13, 2004, 06:08 |
Re: MPI and parallel computation
|
#3 |
Guest
Posts: n/a
|
Hi Tom,
Thank you very much for your reply. I am confused about that. If it were a memory problem, the code should crash when the matrices are initialised. However, the crash occurs after all of the matrices have been initialised. In addition, I cannot understand what the error information means. Could you help me interpret it?
|
April 13, 2004, 07:55 |
Re: MPI and parallel computation
|
#4 |
Guest
Posts: n/a
|
Hi,
The error does not have to be related to MPI. The first error message says there is a segmentation violation (bad memory access). It can be a bug in the program. Try some array bounds checking, or use stop statements to find out where the error occurs. You can also use the size command to see whether the executable is too large for the machine. /Tom
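On the last point (my wording, not Tom's): running the size command on the executable, e.g. size ./lbm, prints the sizes of the text, data and bss segments. Data plus bss is roughly the statically allocated memory the program needs before it even starts, and can be compared with the memory available per process on the machine. Dynamically allocated (ALLOCATE/malloc) arrays do not show up there, so those still have to be counted by hand.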
|
April 14, 2004, 07:34 |
Re: MPI and parallel computation
|
#5 |
Guest
Posts: n/a
|
Hi Tom,
Thank you very much for your help. I tried to debug with TotalView. Before MPI_Send or MPI_Recv, the array variables look fine. The code cannot get past the MPI_Send or MPI_Recv; that is to say, it stops during message passing. Do you have any idea how to check the variables in a case like this? Furthermore, how can I see whether the executable is too large for the machine?
|
April 14, 2004, 11:35 |
Re: MPI and parallel computation
|
#6 |
Guest
Posts: n/a
|
Perhaps it would be worth seeing if the looping structures are trying to send more information than is available. Check what arguments go into MPI_SEND and MPI_RECV, and whether they are what you expect. It possibly stops when information is being sent or received that does not 'exist'.
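As an illustration of that argument check (a sketch with hypothetical names such as buf and nalloc, not the poster's code), each receive can be guarded by comparing the count passed to MPI against the number of elements the buffer was allocated with, so a mismatch aborts with a readable message instead of a segfault:

#include <mpi.h>
#include <stdio.h>

/* Abort loudly if the requested count does not fit in the buffer;
   "nalloc" is the number of doubles the buffer was allocated with,
   bookkeeping the calling code would have to provide. */
static void checked_recv(double *buf, int nalloc, int count,
                         int src, int tag, MPI_Comm comm)
{
    MPI_Status status;

    if (count > nalloc) {
        fprintf(stderr, "MPI_Recv: count %d exceeds buffer size %d\n",
                count, nalloc);
        MPI_Abort(comm, 1);
    }
    MPI_Recv(buf, count, MPI_DOUBLE, src, tag, comm, &status);
}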
|
|
April 14, 2004, 12:17 |
Re: MPI and parallel computation
|
#7 |
Guest
Posts: n/a
|
I would suggest the problem is either a memory issue or, as 'guest' says, a problem with the array size that is being sent/received. The question is how to distinguish between these two.
I would make one of your 225x225x225 dimensions smaller, i.e. 225x225x20. This makes the problem much smaller and eliminates memory as an issue. If you are sending/receiving information separately, one dimension at a time, make the problem 225x225x20; you can then see whether the send and receive work in each direction. If you are sending the information as one 225x225x225 block, you can try sending many 225x225x20 blocks at once, i.e. 12 of them, to check whether the buffer size is above your 225x225x225 limit. A sketch of that slab-by-slab idea is below.
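This is my illustration with hypothetical names, assuming the field is stored as one contiguous C array indexed as field[(i*ny + j)*nz + k]: split the block along the slowest-varying index and send the slabs one after another, with the receiving rank posting a matching MPI_Recv per slab. If thin slabs go through but the full block does not, message size or memory is the likely limit.

#include <mpi.h>
#include <stddef.h>

/* Send an nx*ny*nz double field as thin slabs of thickness "slab" along
   the slowest-varying index, instead of one large message. Each slab is
   a contiguous chunk of the array, so a plain MPI_Send works. */
void send_in_slabs(const double *field, int nx, int ny, int nz,
                   int slab, int dest, int tag, MPI_Comm comm)
{
    for (int i0 = 0; i0 < nx; i0 += slab) {
        int thick = (i0 + slab <= nx) ? slab : nx - i0;  /* last slab may be thinner */
        MPI_Send((void *)(field + (size_t)i0 * ny * nz),
                 thick * ny * nz, MPI_DOUBLE, dest, tag, comm);
    }
}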
|
April 15, 2004, 12:25 |
Re: MPI and parallel computation
|
#8 |
Guest
Posts: n/a
|
Hi All,
Thank you all very much for your help. The problem is now solved: it was indeed a machine memory problem. With smaller array sizes the code has no problems, and after I removed some of the arrays and optimised the memory use, the code runs.
|
|
|