reactingParcelFoam 2D crash in parallel, works fine in serial

Old   May 28, 2015, 13:15
Default reactingParcelFoam 2D crash in parallel, works fine in serial
  #1
Member
 
Ferdinand Pfender
Join Date: May 2013
Location: Berlin, Germany
Posts: 40
Hi everyone,

I'm solving a "simple" 2D channel flow of air with a water spray (similar to $FOAM_TUT/lagrangian/reactingParcelFoam/verticalChannel, just in 2D).

When I try to run this case in parallel, the solver crashes at the first injection time step with the following error message:
Code:
Solving 2-D cloud reactingCloud1

--> Cloud: reactingCloud1 injector: model1
Added 91 new parcels

[$HOSTNAME:31049] *** An error occurred in MPI_Recv
[$HOSTNAME:31049] *** reported by process [139954540642305,1]
[$HOSTNAME:31049] *** on communicator MPI_COMM_WORLD
[$HOSTNAME:31049] *** MPI_ERR_TRUNCATE: message truncated
[$HOSTNAME:31049] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[$HOSTNAME:31049] ***    and potentially your MPI job)
[$HOSTNAME:31035] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[$HOSTNAME:31035] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
In serial it runs fine without any error. If I change the number of processors ($WM_NCOMPPROCS), the solver sometimes hangs instead of crashing; htop then shows a lot of red CPU usage (kernel threads).

I found something on the net: someone had the same error and solved it by disabling functionObjects and cloudFunctions. That did not help in my case...
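To be explicit about what "disabling" means here: I just emptied the corresponding sub-dictionaries, roughly like this (a sketch, not my exact files; file names as in the verticalChannel tutorial):

Code:
// in system/controlDict
functions
{}

// in constant/reactingCloud1Properties
cloudFunctions
{}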
The decomposition method is also irrelevant; I checked both simple and scotch.
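The decomposition setup itself is nothing special, roughly this (a sketch with assumed subdomain count and coefficients, not my exact decomposeParDict):

Code:
numberOfSubdomains 4;

method          scotch;     // also tested: simple

simpleCoeffs
{
    n           ( 2 2 1 );
    delta       0.001;
}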

Maybe this thread is better placed in OpenFOAM Bugs? If someone can confirm this, I will also open an issue on the OpenFOAM-2.3.x bug tracker.
Tomorrow I'll check it in OF-2.4.x and in FE-3.1.

If somebody knows what to do, any help is appreciated. This case is somewhat urgent for me.

Thank you very much!

Old   June 11, 2015, 07:31
Default
  #2
Member
 
Join Date: Sep 2010
Location: Leipzig, Germany
Posts: 96
oswald
I'm having a similar problem with a Lagrangian tracking solver in parallel, based on icoUncoupledKinematicParcelFoam. It works at first, but after some time it crashes with the same error message as in your case.

Code:
[ran:7367] *** An error occurred in MPI_Waitall
[ran:7367] *** on communicator MPI_COMM_WORLD
[ran:7367] *** MPI_ERR_TRUNCATE: message truncated
[ran:7367] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 7367 on
node ran exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
And, as in your case, sometimes it just gets stuck somewhere without crashing. I tried to narrow down where it hangs, and it seems to be in KinematicCloud::evolve() in KinematicCloud.C, while getting the tracking data (the Info statements below are debug output I added):

Code:
template<class CloudType>
void Foam::KinematicCloud<CloudType>::evolve()
{
    Info << "start kinematicCloud.evolve" << endl;
    if (solution_.canEvolve())
    {
        Info << "solution can evolve, getting track data" << endl;
        typename parcelType::template
            TrackingData<KinematicCloud<CloudType> > td(*this);

        Info << "start solving" << endl;
        solve(td);
    }
}
When the solver hangs, the last output is "solution can evolve, getting track data", so the problem seems to occur right there.

When changing the commsType from nonBlocking to blocking in $WM_PROJECT_DIR/etc/controlDict (see the sketch after the error output), the error is:
Code:
[0]
[0]
[0] --> FOAM FATAL IO ERROR:
[0] error in IOstream "IOstream" for operation operator>>(Istream&, List<T>&) : reading first token
[0]
[0] file: IOstream at line 0.
[0]
[0]     From function IOstream::fatalCheck(const char*) const
[0]     in file db/IOstreams/IOstreams/IOstream.C at line 114.
[0]
FOAM parallel run exiting
[0]
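For reference, the switch I changed is the commsType entry in the OptimisationSwitches section of $WM_PROJECT_DIR/etc/controlDict, roughly (a sketch, other switches omitted):

Code:
OptimisationSwitches
{
    // valid values: blocking, scheduled, nonBlocking
    commsType       blocking;
}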

Last edited by oswald; June 11, 2015 at 08:26. Reason: new information

Old   August 4, 2015, 11:32
Default reproduced
  #3
New Member
 
Join Date: Dec 2013
Posts: 4
clockworker
Hi there!

I ran into the same error message in a case similar to
$FOAM_TUT/lagrangian/reactingParcelFoam/verticalChannel/

On the tutorial case I was able to reproduce the described behavior
with the following commands:

Code:
#!/bin/sh
cd ${0%/*} || exit 1    # run from this directory

# Source tutorial run functions
. $WM_PROJECT_DIR/bin/tools/RunFunctions

# create mesh
runApplication blockMesh

cp -r 0.org 0

# initialise with potentialFoam solution
runApplication potentialFoam

rm -f 0/phi

# run the solver
runApplication pyFoamDecompose.py . 4
runApplication pyFoamPlotRunner.py mpirun -np 4 reactingParcelFoam -parallel

# ----------------------------------------------------------------- end-of-file
The calculation hangs at

Code:
...
Courant Number mean: 1.705107874 max: 4.895575368
deltaT = 0.0004761904762
Time = 0.0109524

Solving 3-D cloud reactingCloud1
with htop showing CPU usage of ~ 100 % on all cores.

If I deactivate
Code:
dispersionModel none;//stochasticDispersionRAS;
I can reproduce the error message in the OP:

Code:
--> Cloud: reactingCloud1 injector: model1
[$Hostname:15844] *** An error occurred in MPI_Recv
[$Hostname:15844] *** on communicator MPI_COMM_WORLD
[$Hostname:15844] *** MPI_ERR_TRUNCATE: message truncated
[$Hostname:15844] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
I use Ubuntu 14.04.3 LTS with openfoam240. Can anyone else confirm this behaviour or even provide a solution?
Thank you very much for your time.

Last edited by clockworker; August 5, 2015 at 03:19. Reason: Politeness, anonymity. Sorry, first time posting.

Old   August 7, 2015, 04:03
Default
  #4
New Member
 
Join Date: Dec 2013
Posts: 4
clockworker
Hi there,
I think I stumbled upon a solution.
I changed the reactingCloud1Properties from

Code:
massTotal       8;
duration        10000;
to

Code:
massTotal       0.0008;
duration        1;
and the calculation continued without the error messages.
Hope this helps someone.

Old   August 10, 2015, 10:18
Default
  #5
Member
 
Ferdinand Pfender
Join Date: May 2013
Location: Berlin, Germany
Posts: 40
Hmm, this does not really help.

What you changed is the time frame of the injection and the mass that is injected during this time.
The injection starts at SOI and runs for the defined duration.

If you change these values, you will definitely get results you don't want.

Greets,
Ferdi

Old   August 10, 2015, 19:19
Default 3rd try
  #6
New Member
 
Join Date: Dec 2013
Posts: 4
clockworker
Hi Ferdi,

I was under the impression that you can maintain a constant mass flow rate if you change massTotal proportionally to the duration, according to this
HTML Code:
http://www.dhcae-tools.com/images/dhcaeLTSThermoParcelSolver.pdf
as long as duration is longer than endTime. I stand corrected if this is not the case.
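Quick check with the numbers from post #4, assuming SI units (kg and s): 8 kg over 10000 s is 8e-4 kg/s, and 0.0008 kg over 1 s is also 8e-4 kg/s, so the nominal average injection rate stays the same.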
Nonetheless, at home on 2 cores I was no longer able to reproduce that workaround; the error messages appear no matter what I do with massTotal or duration.

What I tried now was changing the injectionModel from patchInjection to coneNozzleInjection like this:

Code:
injectionModels
{
    model1
    {
        type            coneNozzleInjection;
        SOI             0.01;
        massTotal       8;
        parcelBasisType mass;
        injectionMethod disc;
        flowType        constantVelocity;
        UMag            40;
        outerDiameter   6.5e-3;
        innerDiameter   0;
        duration        10000;
        position        ( 12.5e-3 -230e-3 0 );
        direction       ( 1 0 0 );
        parcelsPerSecond 1e5;
        flowRateProfile constant 1;
        Cd              constant 0.9;
        thetaInner      constant 0.0;
        thetaOuter      constant 1.0;

        sizeDistribution
        {
            type        general;
            generalDistribution
            {
                distribution
                (
                    (10e-06      0.0025)
                    (15e-06      0.0528)
                    (20e-06      0.2795)
                    (25e-06      1.0918)
                    (30e-06      2.3988)
                    (35e-06      4.4227)
                    (40e-06      6.3888)
                    (45e-06      8.6721)
                    (50e-06      10.3153)
                    (55e-06      11.6259)
                    (60e-06      12.0030)
                    (65e-06      10.4175)
                    (70e-06      10.8427)
                    (75e-06      8.0016)
                    (80e-06      6.1333)
                    (85e-06      3.8827)
                    (90e-06      3.4688)
                );
            }
        }
    }
}
And now the error messages disappear. I don't know if coneNozzleInjection is applicable in 2D, but perhaps it provides a workaround. Or perhaps ManualInjection could be an alternative. I still have to try this on my case at work, which is also 2D.
Thanks Ferdi for taking the time.
Greetings
clockworker
