Message truncated, error stack: MPIDI_CH3U_Receive_data_found
October 27, 2013, 17:05
#1
Senior Member
Vishal Nandigana
Join Date: Mar 2009
Location: Champaign, Illinois, U.S.A
Posts: 208
Rep Power: 18
Hi Bruno,
I was trying to install OF 2.0.0 on CentOS 4.x. It did not work, but I think I will update the OS and install it in the future. However, I have another question about running codes in parallel in OF. My solver runs fine in serial mode, but when I run it in parallel, I get the following error message:
Code:
Fatal error in MPI_Recv: Message truncated, error stack:
MPIDI_CH3U_Receive_data_found(257): Message from rank 2 and tag 1 truncated; 8 bytes received but buffer size is 4
I searched for similar errors encountered by other OF users before, but could not find any good suggestion. Please let me know what the source of this error is and how to fix it.
Thanks
Regards,
Vishal
October 28, 2013, 03:47
#2
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
Hi Vishal,
I suspect that you need to find the correct environment variable for setting the buffer size for the MPI toolbox to use. For example, for Open-MPI, OpenFOAM sets the variable "MPI_BUFFER_SIZE": https://github.com/OpenFOAM/OpenFOAM...ttings.sh#L580
Best regards, Bruno
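For illustration, a minimal sketch of what that amounts to from a shell, assuming Open-MPI and the 20000000-byte value used in OpenFOAM's settings.sh (the value and the solver name here are assumptions):
Code:
# sketch: set a larger buffer size for OpenFOAM's MPI transfers, then launch the solver
export MPI_BUFFER_SIZE=20000000
mpirun -np 4 mySolver -parallel
Whether mvapich2 honours this same variable is exactly the open question here.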
__________________
October 28, 2013, 11:28
#3
Senior Member
Vishal Nandigana
Join Date: Mar 2009
Location: Champaign, Illinois, U.S.A
Posts: 208
Rep Power: 18
Hi Bruno,
Could you provide more details on this? I am not that familiar with how to identify the environment variables that set the buffer size. I did look at the settings.sh file, and MPI_BUFFER_SIZE is set as in the link you mentioned. It would be great if you could help me out with this one.
Thanks
Regards,
Vishal
October 28, 2013, 17:12
#4
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
Hi Vishal,
I've moved this line of conversation from http://www.cfd-online.com/Forums/ope...eleased-4.html to this new thread, because the other one is about installing OpenFOAM, not about running it in parallel. As for your problem, I need to know a few things:
1. Which Linux distribution are you using?
2. Which OpenFOAM version are you using?
3. Which MPI toolbox is being used with OpenFOAM? Check the output of: echo $FOAM_MPI
3.2. Check which mpirun is being found: which mpirun
3.3. Check which version of MPI is being used: mpirun --version
3.4. What exact command are you using for launching the application in parallel?
Bruno
__________________
October 29, 2013, 12:38
#5
Senior Member
Vishal Nandigana
Join Date: Mar 2009
Location: Champaign, Illinois, U.S.A
Posts: 208
Rep Power: 18
Hi Bruno,
Below are the replies to all your questions.
1. Linux distribution:
Code:
Linux taubh1 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 11:13:47 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Scientific Linux release 6.1 (Carbon)
2. OpenFOAM version: OpenFOAM-1.7.1
3. Which MPI toolbox is being used with OpenFOAM?
Code:
echo $FOAM_MPI          # did not work
echo $FOAM_MPI_LIBBIN
/home/nandiga1/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/mvapich2-1.6-gcc+ifort
3.2. Check which mpirun is being found:
Code:
which mpirun
/usr/local/mvapich2-1.6-gcc+ifort/bin/mpirun
ls -l $(which mpirun)
lrwxrwxrwx 1 394298 394298 13 2011-11-18 16:53 /usr/local/mvapich2-1.6-gcc+ifort/bin/mpirun -> mpiexec.hydra
3.3. Check which version of MPI is being used:
Code:
mpirun --version
HYDRA build details:
Version: 1.6rc3
Release Date: unreleased development copy
CC: gcc -fpic
CXX: g++ -fpic
F77: ifort -fpic
F90: ifort -fpic
Configure options: '--prefix=/usr/local/mvapich2-1.6-gcc+ifort' 'CC=gcc -fpic' 'CXX=g++ -fpic' 'F77=ifort -fpic' 'F90=ifort -fpic' 'FC=ifort -fpic' '--with-mpe' '--enable-sharedlibs=gcc' '--disable-checkerrors' '--with-atomic-primitives=auto_allow_emulation' 'CFLAGS= -DNDEBUG -O2' 'LDFLAGS= ' 'LIBS= -lpthread -libverbs -libumad -ldl -lrt ' 'CPPFLAGS= -I/usr/local/src/mvapich/mvapich2-1.6/src/openpa/src -I/usr/local/src/mvapich/mvapich2-1.6/src/openpa/src'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge none persist
Binding libraries available: hwloc plpa
Resource management kernels available: none slurm ll lsf sge pbs
Checkpointing libraries available:
Demux engines available: poll select
3.4. What exact command are you using for launching the application in parallel?
Code:
#PBS -q cse
#PBS -l nodes=1:ppn=12
#PBS -l walltime=60:30:00
#PBS -j oe
#PBS -o simout
#PBS -N 2D_circular
cd ${PBS_O_WORKDIR}
module load mvapich2/1.6-gcc+ifort
mpiexec -np 12 circularFoam_full -parallel
Hope this gives you some idea of the problem.
Thanks
Regards,
Vishal
November 2, 2013, 17:24
#6
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
Hi Vishal,
OK, I now have a better idea of the system you've got, but still no clear notion as to why this error occurs. Some searching online indicated that it could be a memory limitation on the machines themselves; in other words, perhaps the mesh is too big for the machines you want to use. Another indication was that there is no way to control the buffer size with mvapich2.
I suggest that you do a basic communication test on the cluster, following the instructions given here on how to test if MPI is working: post #4 of "openfoam 1.6 on debian etch", and/or post #19 of "OpenFOAM updates". Then try to run one of OpenFOAM's tutorials in parallel, such as "multiphase/interFoam/laminar/damBreak"; a sketch of the steps follows below.
Best regards, Bruno
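A rough sketch of running that tutorial in parallel, assuming OpenFOAM 1.7.x paths and the tutorial's default decomposition into 4 subdomains (the paths and processor count are assumptions):
Code:
# sketch: copy the damBreak tutorial into the user run directory and run it in parallel
cp -r $FOAM_TUTORIALS/multiphase/interFoam/laminar/damBreak $FOAM_RUN/
cd $FOAM_RUN/damBreak
blockMesh       # generate the mesh
setFields       # initialise the water column
decomposePar    # decompose per system/decomposeParDict
mpirun -np 4 interFoam -parallel
If this runs cleanly, the MPI layer itself is probably fine and the problem is specific to the solver or case.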
__________________
August 19, 2015, 02:21
#7
Member
Manjura Maula Md. Nayamatullah
Join Date: May 2013
Location: San Antonio, Texas, USA
Posts: 42
Rep Power: 13
Hello,
I am also getting the same error in my parallel simulations:
Code:
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(184).......................: MPI_Recv(buf=0x12e3180, count=21, MPI_PACKED, src=1, tag=1, MPI_COMM_WORLD, status=0x7fff4975d160) failed
MPIDI_CH3U_Request_unpack_uebuf(691): Message truncated; 7776 bytes received but buffer size is 21
I am also using mvapich2 on my cluster. I ran the tutorial "multiphase/interFoam/laminar/damBreak" and that case runs fine in parallel without any error. My domain is 2D and very small (1.5 m x 0.4 m) with a 500x150 mesh. I am not sure why I am getting that error for some specific cases. Does anyone have a solution?
Thanks
August 19, 2015, 08:58
#8
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
Greetings mmmn036,
After some more research online, that seems to be a problem with mvapich2 1.9. Which version are you using?
Beyond that, my guess is that the problem is related to a wrongly configured shell environment for using mvapich2; check its manual for more details.
There is a way to test running in parallel in OpenFOAM, namely by compiling and using the Test-parallel application. More details available here:
Quote:
Bruno
__________________
August 19, 2015, 12:44
#9
Member
Manjura Maula Md. Nayamatullah
Join Date: May 2013
Location: San Antonio, Texas, USA
Posts: 42
Rep Power: 13
I ran the following command:
Code:
which mpirun
/opt/apps/intel14/mvapich2/2.0b/bin/mpirun
August 19, 2015, 19:11
#11
Member
Manjura Maula Md. Nayamatullah
Join Date: May 2013
Location: San Antonio, Texas, USA
Posts: 42
Rep Power: 13
August 20, 2015, 14:16
#12
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
Hi mmmn036,
Sigh... you could have stated that sooner. And I had forgotten that foam-extend doesn't have the test folder, for some reason...
OK, run the following commands to get and build the application:
Code:
mkdir -p $FOAM_RUN
cd $FOAM_RUN
mkdir parallelTest
cd parallelTest
wget https://raw.githubusercontent.com/OpenCFD/OpenFOAM-1.7.x/master/applications/test/parallel/parallelTest.C
mkdir Make
cd Make/
wget https://raw.githubusercontent.com/OpenCFD/OpenFOAM-1.7.x/master/applications/test/parallel/Make/options
wget https://raw.githubusercontent.com/OpenCFD/OpenFOAM-1.7.x/master/applications/test/parallel/Make/files
cd ..
wmake
If it works, it should output something like this:
Code:
Create time
[1] Starting transfers
[1]
[1] slave sending to master 0
[1] slave receiving from master 0
[0] Starting transfers
[0]
[0] master receiving from slave 1
[0] (0 1 2)
[0] master sending to slave 1
End
[1] (0 1 2)
Finalising parallel run
Bruno
__________________
August 20, 2015, 20:34
#13
Member
Manjura Maula Md. Nayamatullah
Join Date: May 2013
Location: San Antonio, Texas, USA
Posts: 42
Rep Power: 13
Quote:
I ran the following command:
Code:
foamJob -p -s parallelTest
which reported:
Code:
Parallel processing using MV2MPI with 16 processors
Executing: mpirun -np 16 /work/02813/jzb292/foam/foam-extend-3.1/bin/foamExec parallelTest -parallel | tee log
It is still showing the same error when I run in parallel.
August 20, 2015, 23:25
#15
Member
Manjura Maula Md. Nayamatullah
Join Date: May 2013
Location: San Antonio, Texas, USA
Posts: 42
Rep Power: 13
Quote:
Now I got something in my log file that looks similar to the thread you mentioned. Please see the attached log file. But I am still seeing the same error in the parallel simulation.
August 21, 2015, 12:28
#16
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
Quote:
And since you're using mvapich2 2.0b, it's not a problem related to the version itself. You wrote in your first post on this topic:
Quote:
By the way, have you tried using one of the latest versions of OpenFOAM, such as 2.4.0 or 2.4.x, to see if it works with mvapich2? I ask this for the same reason as above: this could be a corner case that is not handled in foam-extend 3.1 but might already be handled in OpenFOAM.
August 22, 2015, 04:03
#17
Member
Manjura Maula Md. Nayamatullah
Join Date: May 2013
Location: San Antonio, Texas, USA
Posts: 42
Rep Power: 13
Quote:
Here is the information you asked for:
Quote:
Which solver/application are you using? I ask this because there are some settings in "system/fvSchemes" that might help with the problem, and usually that depends on the simulation being done.
August 22, 2015, 09:25
#18
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
OK, with any luck I found the answer that might help with your problem, as reported in these two locations:
Code:
export MV2_ON_DEMAND_THRESHOLD=16
I had seen this solution before, but the answer in the second location referred to another error message, which is why I hadn't suggested it earlier.
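In a job script like the one shown in post #5, this would presumably go right before the parallel launch (the placement and the mpirun line here are assumptions based on the commands quoted earlier in the thread):
Code:
# sketch: set the MVAPICH2 threshold before launching the parallel run
export MV2_ON_DEMAND_THRESHOLD=16
mpirun -np 16 foamExec parallelTest -parallel | tee log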
August 22, 2015, 13:50
#19
Member
Manjura Maula Md. Nayamatullah
Join Date: May 2013
Location: San Antonio, Texas, USA
Posts: 42
Rep Power: 13
Quote:
I got an answer in another thread: http://www.cfd-online.com/Forums/ope...tml#post519793 The change described there fixed my issue. I can now run my cases in parallel.
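Judging from the reply below, the change in question was the blocking/non-blocking communications switch. A hedged sketch of what that edit typically looks like in the installation's etc/controlDict (the exact entry for foam-extend 3.1 is an assumption):
Code:
// sketch: OptimisationSwitches in <installation>/etc/controlDict
OptimisationSwitches
{
    commsType       nonBlocking;   // reportedly "blocking" by default in foam-extend
}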
August 22, 2015, 14:24
#20
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128
I'm glad you've found the solution!
Changing the blocking/non-blocking option had already crossed my mind, but it always felt as if the issue was on the side of MVAPICH2. The other reason is that I thought foam-extend was set to non-blocking by default, because OpenFOAM has been like that since at least 1.5.x!? But apparently it was changed in foam-extend, without an explicit explanation in the respective commit, on 2010-09-21 15:32:04...