MPI problem turbulentTemperatureCoupledBaffleMixedFvPatchScalarField.C
February 26, 2019, 11:42
MPI problem turbulentTemperatureCoupledBaffleMixedFvPatchScalarField.C
#1
Member
Robin Kamenicky
Join Date: Mar 2016
Posts: 74
Rep Power: 11
Hi guys,
I have been playing around with a BC based on turbulentTemperatureCoupledBaffleMixedFvPatchScalarField.C and temperatureCoupledBase.C. I have adjusted both files (including the header files) so that temperatureCoupledBase.C now contains two different methods for calculating kappa, and turbulentTemperatureCoupledBaffleMixedFvPatchScalarField.C contains an if condition that decides which kappa function to use. The decision is made based on a user entry.

My code runs well on 1 CPU, but I am not able to run it on more CPUs. When I use 2 CPUs the solver manages quite a few PIMPLE iterations, but when I run it on 4 CPUs it freezes almost instantly. By freezing I mean that no backtrace or other error is given: the solver is still running, or rather hanging in a deadlock. I can see it when I run the top command, but no output is produced anymore. When I run with mpirunDebug, a SIGHUP error occurs:
Code:
-> Thread 1 "XYZ" received signal SIGHUP, Hangup.
0x00007fffdd069ba8 in ?? () from /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so
#0  0x00007fffdd069ba8 in ?? () from /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so
#1  0x00007fffe901f1ea in opal_progress () from /usr/lib/libopen-pal.so.13
#2  0x00007fffeaceff65 in ompi_request_default_wait_all () from /usr/lib/libmpi.so.12
#3  0x00007fffdbdcd426 in ompi_coll_tuned_allreduce_intra_recursivedoubling () from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
#4  0x00007fffeacfff23 in PMPI_Allreduce () from /usr/lib/libmpi.so.12
#5  0x00007ffff041f343 in Foam::allReduce<double, Foam::sumOp<double> > (Value=@0x7fffffff4cb0: -2.4676237571465036e-09, MPICount=1, MPIType=0x7fffeaf6b5e0 <ompi_mpi_double>, MPIOp=0x7fffeaf7c920 <ompi_mpi_op_sum>, bop=..., tag=1, communicator=0) at allReduceTemplates.C:157
#6  0x00007ffff041c7f9 in Foam::reduce (Value=@0x7fffffff4cb0: -2.4676237571465036e-09, bop=..., tag=1, communicator=0) at UPstream.C:223
#7  0x000000000047e59a in Foam::gSum<double> (f=..., comm=0) at /home/robin/OpenFOAM/OpenFOAM-dev-debug/src/OpenFOAM/lnInclude/FieldFunctions.C:543
#8  0x00007ffff7a26730 in Foam::gSum<double> (tf1=...) at /home/robin/OpenFOAM/OpenFOAM-dev-debug/src/OpenFOAM/lnInclude/FieldFunctions.C:543
#9  0x00007ffff755ddd5 in Foam::fvc::domainIntegrate<double> (vf=...) at /home/robin/OpenFOAM/OpenFOAM-dev-debug/src/finiteVolume/lnInclude/fvcVolumeIntegrate.C:95
Code:
-> Thread 1 "XYZ" received signal SIGHUP, Hangup.
0x00007fffdd069a40 in ?? () from /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so
#0  0x00007fffdd069a40 in ?? () from /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so
#1  0x00007fffe901f1ea in opal_progress () from /usr/lib/libopen-pal.so.13
#2  0x00007fffeaceff65 in ompi_request_default_wait_all () from /usr/lib/libmpi.so.12
#3  0x00007fffead20d27 in PMPI_Waitall () from /usr/lib/libmpi.so.12
#4  0x00007ffff041e03a in Foam::UPstream::waitRequests (start=0) at UPstream.C:730
#5  0x000000000052391c in Foam::mapDistributeBase::distribute<double, Foam::flipOp> (commsType=Foam::UPstream::commsTypes::nonBlocking, schedule=..., constructSize=20, subMap=..., subHasFlip=false, constructMap=..., constructHasFlip=false, field=..., negOp=..., tag=2) at /home/robin/OpenFOAM/OpenFOAM-dev-debug/src/OpenFOAM/lnInclude/mapDistributeBaseTemplates.C:587
#6  0x000000000051fb3d in Foam::mapDistributeBase::distribute<double, Foam::flipOp> (this=0x10ab1f8, fld=..., negOp=..., tag=2) at /home/robin/OpenFOAM/OpenFOAM-dev-debug/src/OpenFOAM/lnInclude/mapDistributeBaseTemplates.C:1208
#7  0x00007ffff4cd5880 in Foam::mapDistribute::distribute<double, Foam::flipOp> (this=0x10ab1f0, fld=..., negOp=..., dummyTransform=true, tag=2) at /home/robin/OpenFOAM/OpenFOAM-dev-debug/src/OpenFOAM/lnInclude/mapDistributeTemplates.C:137
#8  0x00007ffff4cd56eb in Foam::mapDistribute::distribute<double> (this=0x10ab1f0, fld=..., dummyTransform=true, tag=2) at /home/robin/OpenFOAM/OpenFOAM-dev-debug/src/OpenFOAM/lnInclude/mapDistributeTemplates.C:155
#9  0x00007ffff4eba4d2 in Foam::mappedPatchBase::distribute<double> (this=0xda9c28, lst=...) at /home/robin/OpenFOAM/OpenFOAM-dev-debug/src/meshTools/lnInclude/mappedPatchBaseTemplates.C:38
#10 0x00007fffd1e67928 in Foam::turbulentTemperatureCoupledBaffleMixedXYZFvPatchScalarField::updateCoeffs (this=0xf14690) at turbulentTemperatureCoupledBaffleMixedXYZ/turbulentTemperatureCoupledBaffleMixedXYZFvPatchScalarField.C:213

I define it only once, as well as mpp, in the boundary condition; only then do I split the code with the if condition to choose the kappa() function. I call updateCoeffs() inside updateCoeffs(), inside evaluate(), if this information is of any use.

Some general information:
Code:
mpirun -V
mpirun (Open MPI) 1.10.2

Ubuntu 16.04 LTS

My mesh is very small, with two regions. I appreciate any help or comments.

Thank you,
Robin
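A minimal sketch of the kind of user-selectable kappa logic described in this post, assuming a structure similar to the stock temperatureCoupledBase: the keyword kappaMode, the member kappaMode_, and the helper kappaAlternative() are hypothetical names used purely for illustration, not the actual modified code.
Code:
// Hypothetical selector in a temperatureCoupledBase-style class.
// The boundary-condition dictionary would carry the user entry, e.g.
//     kappaMode   alternative;
// read in the constructor initialiser list as (sketch):
//     kappaMode_(dict.lookupOrDefault<word>("kappaMode", "standard")),

Foam::tmp<Foam::scalarField>
Foam::temperatureCoupledBase::kappaSelected(const scalarField& Tp) const
{
    if (kappaMode_ == "standard")
    {
        // original kappa calculation from temperatureCoupledBase
        return kappa(Tp);
    }

    // second, user-selected formulation (hypothetical helper)
    return kappaAlternative(Tp);
}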
February 26, 2019, 17:42
#2
Senior Member
Andrew Somorjai
Join Date: May 2013
Posts: 175
Rep Power: 13
Quote:
March 6, 2019, 14:17
#3
Member
Robin Kamenicky
Join Date: Mar 2016
Posts: 74
Rep Power: 11
Hi Andrew,
Sorry for my late answer, and thank you for your response. Yes, kappa is updated every time step, but the function is defined in exactly the same manner as the original kappa function; just look for thermophysicalProperties or for turbulenceProperties.

I debugged it further, and it seems the problem is that I call this new BC (let's call it the mixed BC) from another BC to evaluate the wall temperature. The first run through the mixed BC is fine, but then it returns to the BC from which it is called. That BC decides that it needs to make one more iteration because of the BC residuals which it controls, and here the problem starts: only some CPUs enter the mixed BC again, and the others do not. This means that if I use 4 CPUs, 2 CPUs wait inside the mixed BC for the other 2, and the other 2 wait outside at some other point in the code. The problem seems to be at the following lines (turbulentTemperatureCoupledBaffleMixedFvPatchScalarField.C:233):
Code:
this->refValue() = nbrIntFld();
this->refGrad() = 0.0;
this->valueFraction() = nbrKDelta()/(nbrKDelta() + myKDelta());

Thank you again,
Robin
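For readers following the backtrace: in the stock OpenFOAM-dev version of this boundary condition, the neighbour-side quantities used in those three lines are gathered through mappedPatchBase::distribute(), which is the MPI exchange visible in the second backtrace of post #1. A simplified, paraphrased sketch of that pattern (condensed from the standard code, not from the modified file) shows why every processor sharing the coupled patches must reach updateCoeffs() together:
Code:
// Condensed from turbulentTemperatureCoupledBaffleMixedFvPatchScalarField::
// updateCoeffs() in OpenFOAM-dev (paraphrased; set-up of nbrField etc. omitted).
const mappedPatchBase& mpp =
    refCast<const mappedPatchBase>(patch().patch());

// Neighbour-side internal field and kappa*deltaCoeffs
scalarField nbrIntFld(nbrField.patchInternalField());
scalarField nbrKDelta(nbrField.kappa(nbrField)*nbrPatch.deltaCoeffs());

// These calls exchange data between the processors owning the two coupled
// patches. They are collective: if one rank skips updateCoeffs() (for
// example because its local "iterate again" test came out false), the
// ranks that did enter block here forever - the observed deadlock.
mpp.distribute(nbrIntFld);
mpp.distribute(nbrKDelta);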
February 12, 2020, 18:08
#4
New Member
Benjamin Khoo
Join Date: Aug 2018
Posts: 3
Rep Power: 8
Hi Robin,
Did you manage to solve the issue? I am facing a similar problem too.
February 13, 2020, 10:26
#5
Member
Robin Kamenicky
Join Date: Mar 2016
Posts: 74
Rep Power: 11
Hi Benjamin,
Yes, I was able to solve it. Actually, I saw that the same behaviour is also present in OF-v1906 and OF-v1912; I still need to contact the developers and discuss it. There is no bug in the original code!

The main thing is that I call the updateCoeffs() function of this class from another boundary condition. In that other boundary condition there is an if condition inside a for loop which evaluates an error; based on that evaluation, the for loop is exited or run again. The problem occurs when the if condition is evaluated from the separate results of each processor: some processors evaluate the condition as true and some as false, so some stay in the for loop while others exit it. The hang occurs when the processors go into different parts of the code and both reach an MPI function that requires information from all processors; they then wait for each other forever.

To circumvent the problem, the if condition must be evaluated using information from all processors, not from each of them separately. This can be done with the global functions such as gMax():
Code:
scalar error = gMax(mag(oldValue - newValue));

if (error < 1e-4)
{
    // ...
}

Have a great day,
Robin
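To make the fix concrete, a minimal sketch of the calling loop with the parallel-aware exit test might look as follows. The names Tw and maxIter_, and the 1e-4 tolerance, are illustrative assumptions rather than the exact code of the calling boundary condition; the essential point is that gMax() reduces over all processors, so every rank evaluates the same error and takes the same branch.
Code:
// Sketch of the iteration loop in the *calling* boundary condition.
// All names are illustrative; only the global reduction is the actual fix.
for (label iter = 0; iter < maxIter_; ++iter)
{
    const scalarField oldValue(Tw);

    // ... update Tw; this internally calls the coupled (mixed) BC and
    //     therefore mappedPatchBase::distribute() on every rank ...

    // Global maximum over all processors: identical on every rank.
    const scalarField diff(mag(Tw - oldValue));
    const scalar error = gMax(diff);

    if (error < 1e-4)
    {
        break;   // every processor leaves the loop in the same iteration
    }
}
An equivalent alternative is returnReduce(max(diff), maxOp<scalar>()); either way, the exit decision is synchronised across processors before any rank can move on.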
February 13, 2020, 17:46
#6
Senior Member
Herpes Free Engineer
Join Date: Sep 2019
Location: The Home Under The Ground with the Lost Boys
Posts: 931
Rep Power: 13
Is there a bug, or is the problem caused by your modifications? If you think there is a bug, I would open a bug ticket in GitLab (considering you are using the ESI-OF version).
February 13, 2020, 18:23
#7
Member
Robin Kamenicky
Join Date: Mar 2016
Posts: 74
Rep Power: 11
Hi HPE,
As I have written in the previous post, there is no bug. The code modifications I made were initially made in OF-6 from openfoam.org. Some time later, I found that the same code had been implemented in OF-v1906, but part of it was commented out because of hanging when multiple CPUs were used (exactly the same reason I encountered during my development in OF-6). This commented-out part of the code was then deleted in OF-v1912. I find that part of the code to be quite important, so I issued a ticket a couple of hours ago (https://develop.openfoam.com/Develop...am/issues/1592). I would call it room for improvement rather than a bug.

Thanks,
Robin
February 13, 2020, 18:25
#8
Senior Member
Herpes Free Engineer
Join Date: Sep 2019
Location: The Home Under The Ground with the Lost Boys
Posts: 931
Rep Power: 13
Very interesting. Thank you.
February 14, 2020, 18:55
#9
New Member
Benjamin Khoo
Join Date: Aug 2018
Posts: 3
Rep Power: 8
Quote:
Tags
boundary condition error, mpi, multiregion, parallel |