|
May 13, 2009, 10:17 |
Performance of GGI case in parallel
|
#1 |
Senior Member
Hannes Kröger
Join Date: Mar 2009
Location: Rostock, Germany
Posts: 124
Rep Power: 18 |
Hello everyone,
I just tried to run a case using interDyMFoam in parallel. The case consists of a non-moving outer mesh and an inner cylindrical mesh that rotates (with a surface-piercing propeller in it); I use GGI to connect the two meshes. The inner mesh is polyhedral, the outer one hexahedral. The entire case has approx. 1 million cells, most of them in the inner mesh.

I have run this case on different numbers of processors on an SMP machine with 8 quad-core Opteron processors (decomposition method: metis):

#Proc; time per time step; speedup
1;  360 s; 1.0
4;  155 s; 2.3
8;  146 s; 2.4
16; 130 s; 2.7

So the speedup does not even reach 3. A similar case, in which the whole domain rotates and the mesh consists only of polyhedra, shows linear speedup up to 8 processors and decreasing parallel efficiency beyond that. I wonder whether this has to do with the GGI interface? I tried to stitch it and repeat the test, but unfortunately stitchMesh failed.

Does anyone have an idea how to improve the parallel efficiency?

Best regards, Hannes

PS: Despite the missing parallel efficiency, the case seems to run fine. Typical output:

Courant Number mean: 0.0005296648 max: 29.60281 velocity magnitude: 56.38879
GGI pair (slider, inside_slider) : 1.694001 1.692101 Diff = 1.71989e-05 or 0.001015283 %
Time = 0.004368

Execution time for mesh.update() = 5.04 s
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.05648662 average: 0.001119383
Largest master weighting factor correction: 0.02077904 average: 0.0006102291
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
time step continuity errors : sum local = 2.139033e-14, global = -1.99808e-16, cumulative = -1.372663e-05
PCG: Solving for pcorr, Initial residual = 1, Final residual = 0.000847008, No Iterations 17
PCG: Solving for pcorr, Initial residual = 0.08500426, Final residual = 0.0003398914, No Iterations 4
time step continuity errors : sum local = 2.889153e-17, global = -1.458812e-19, cumulative = -1.372663e-05
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454  Min(gamma) = 0  Max(gamma) = 1
(the MULES/gamma block above is printed eight times with identical values)
smoothSolver: Solving for Ux, Initial residual = 7.74393e-06, Final residual = 1.011977e-07, No Iterations 1
smoothSolver: Solving for Uy, Initial residual = 3.776374e-06, Final residual = 4.505313e-08, No Iterations 1
smoothSolver: Solving for Uz, Initial residual = 1.477087e-05, Final residual = 1.429332e-07, No Iterations 1
GAMG: Solving for pd, Initial residual = 4.955159e-05, Final residual = 9.252356e-07, No Iterations 2
GAMG: Solving for pd, Initial residual = 6.533451e-06, Final residual = 2.441995e-07, No Iterations 2
time step continuity errors : sum local = 1.379978e-12, global = 6.242956e-15, cumulative = -1.372663e-05
GAMG: Solving for pd, Initial residual = 3.37058e-06, Final residual = 1.664205e-07, No Iterations 2
PCG: Solving for pd, Initial residual = 1.65434e-06, Final residual = 4.308411e-09, No Iterations 5
time step continuity errors : sum local = 2.435524e-14, global = 7.917505e-17, cumulative = -1.372663e-05
ExecutionTime = 81834.14 s  ClockTime = 81851 s |
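For readers who want to reproduce such a scaling test, a minimal sketch of the workflow described above; this is an assumption about how the benchmark could be driven, not taken from the post. The solver and the metis decomposition come from the post, while the sed edit, the log file names and the mpirun invocation are illustrative and depend on the local installation.

# Hedged sketch of the scaling test described above (not from the original post).
# Assumes a standard OpenFOAM case directory and an MPI launcher called mpirun.

# serial reference run
interDyMFoam > log.serial 2>&1

# parallel runs with increasing processor counts
for NP in 4 8 16
do
    rm -rf processor*                                  # discard any previous decomposition
    sed -i "s/numberOfSubdomains.*/numberOfSubdomains $NP;/" system/decomposeParDict
    decomposePar > log.decompose.$NP 2>&1              # metis decomposition, as in the post
    mpirun -np $NP interDyMFoam -parallel > log.run.$NP 2>&1
    grep ExecutionTime log.run.$NP | tail -1           # compare execution times per run
done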
|
May 27, 2009, 09:23 |
Solved
|
#2 |
Senior Member
Hannes Kröger
Join Date: Mar 2009
Location: Rostock, Germany
Posts: 124
Rep Power: 18 |
I just updated to SVN revision 1266 because of the performance updates for GGI.
It helped. The above case now scales linearly up to 8 processors. Thanks for this, Hrv. Best regards, Hannes |
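For reference, a hedged sketch of how such an update of a 1.5-dev SVN checkout is typically done. The revision number is the one mentioned above; the checkout location ($WM_PROJECT_DIR) and the top-level Allwmake rebuild are assumptions that depend on the local installation.

# Hedged sketch: update an existing OpenFOAM-1.5-dev SVN checkout and rebuild.
# The checkout location and build script are assumptions.
cd $WM_PROJECT_DIR          # root of the 1.5-dev checkout
svn update -r 1266          # pull the revision mentioned above (or plain 'svn update' for head)
./Allwmake                  # recompile the changed libraries and solvers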
|
May 31, 2009, 07:37 |
|
#3 |
Member
Hai Yu
Join Date: Mar 2009
Location: Harbin
Posts: 67
Rep Power: 17 |
My experience also shows that 8 cores give the best speed,
while 16 cores are slower. I suspect this is because each of my computers has 8 cores, so a 16-core run has to communicate between two machines, and that communication is much less efficient. |
|
September 9, 2009, 11:28 |
|
#4 |
Senior Member
BastiL
Join Date: Mar 2009
Posts: 530
Rep Power: 20 |
I have encountered similar problems with GGI performance in parallel and still get no speedup beyond 8 cores. This is a serious limitation, since my large cases typically run on 32 cores and contain lots of interfaces; running them on 8 cores is far too slow for me. Hrv: are there plans to further improve the parallel performance of GGI?
Regards BastiL |
|
October 28, 2009, 16:05 |
GGI in parallel
|
#5 |
New Member
Dnyanesh Digraskar
Join Date: Mar 2009
Location: Amherst, MA, United States
Posts: 10
Rep Power: 17 |
Hello All,
I am running turbDyMFoam with GGI on a full wind turbine, so the mesh is huge (~4 million cells). I am having problems running it in parallel on 32 processors: it runs very slowly and eventually one of the processes dies. The same job runs perfectly fine in serial, but in serial this case would take a very long time to finish. I would be very thankful if you could shed some light on improving the GGI parallel performance. Thank you -- Dnyanesh |
|
October 31, 2009, 14:27 |
|
#6 | |
Senior Member
Martin Beaudoin
Join Date: Mar 2009
Posts: 332
Rep Power: 22 |
Hello Dnyanesh,
I need a bit more information in order to try to help you out.
Regards, Martin
|
October 31, 2009, 15:56 |
|
#7 |
New Member
Dnyanesh Digraskar
Join Date: Mar 2009
Location: Amherst, MA, United States
Posts: 10
Rep Power: 17 |
Hello Mr. Beaudoin,
Thank you for your reply. I realize I should have given all the details up front; I am sorry for that, and it won't happen again.

1. I was using the latest SVN version at first, but when I ran into the parallel problems I read some more and decided to follow the ERCOFTAC page, so I reverted back to revision 1238. I will upgrade to the latest revision now.

2. I am quite sure that it is not a hardware-related issue. I am running my cases on our college supercomputer cluster and had previously run MRFSimpleFoam on 32 cores for quite a long time, with linear speedup up to 32 cores. Some numbers from that case:
150K cells / processor - 12 procs - 8 s / time step
110K cells / processor - 24 procs - 6 s / time step
about 75K cells / processor - 32 procs - 2.5 s / time step
Beyond that the time per step would level off and later start to increase.

3. The cluster has 72 nodes with 8 processors per node. Each node has 4 GB RAM. The interconnect between the nodes is Gigabit Ethernet.

4. I am not using a VMWare machine.

5. The required files are copied below.

Answers to some more questions:
a. MPI version: mpich2-1.0.8
b. Command used: mpiexec-pbs turbDyMFoam -parallel > outfile

c. boundary file:

FoamFile
{
    version     2.0;
    format      ascii;
    class       polyBoundaryMesh;
    location    "constant/polyMesh";
    object      boundary;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

11
(
    outerSliderOutlet
    { type ggi; nFaces 25814; startFace 5565326; shadowPatch innerSliderOutlet; zone outerSliderOutlet_zone; bridgeOverlap false; }

    outerSliderWall
    { type ggi; nFaces 43870; startFace 5591140; shadowPatch innerSliderWall; zone outerSliderWall_zone; bridgeOverlap false; }

    outerSliderInlet
    { type ggi; nFaces 25814; startFace 5635010; shadowPatch innerSliderInlet; zone outerSliderInlet_zone; bridgeOverlap false; }

    innerSliderOutlet
    { type ggi; nFaces 18596; startFace 5660824; shadowPatch outerSliderOutlet; zone innerSliderOutlet_zone; bridgeOverlap false; }

    innerSliderInlet
    { type ggi; nFaces 1148; startFace 5679420; shadowPatch outerSliderInlet; zone innerSliderInlet_zone; bridgeOverlap false; }

    innerSliderWall
    { type ggi; nFaces 5424; startFace 5680568; shadowPatch outerSliderWall; zone innerSliderWall_zone; bridgeOverlap false; }

    tower_plate
    { type wall; nFaces 13024; startFace 5685992; }

    rotor
    { type wall; nFaces 19180; startFace 5699016; }

    outlet
    { type patch; nFaces 2052; startFace 5718196; }

    outer_wall
    { type wall; nFaces 10390; startFace 5720248; }

    inlet
    { type patch; nFaces 2052; startFace 5730638; }
)

d. controlDict:

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      controlDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

applicationClass    icoTopoFoam;
startFrom           startTime;
startTime           0.108562;
stopAt              endTime;
endTime             5;
deltaT              0.05;
writeControl        timeStep;
writeInterval       20;
cycleWrite          0;
writeFormat         ascii;
writePrecision      6;
writeCompression    uncompressed;
timeFormat          general;
timePrecision       6;
runTimeModifiable   yes;
adjustTimeStep      yes;
maxCo               1;
maxDeltaT           1.0;

functions
(
    ggiCheck
    {
        // Type of functionObject
        type ggiCheck;
        phi phi;

        // Where to load it from (if not already in solver)
        functionObjectLibs ("libsampling.so");
    }
);

e. decomposeParDict:

FoamFile
{
    version     2.0;
    format      ascii;
    root        "";
    case        "";
    instance    "";
    local       "";
    class       dictionary;
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains  8;

// Patches or face zones whose faces need to keep both cells on the same CPU
//preservePatches (innerSliderInlet outerSliderInlet innerSliderWall outerSliderWall outerSliderOutlet innerSliderOutlet);
//preserveFaceZones (innerSliderInlet_zone outerSliderInlet_zone innerSliderWall_zone outerSliderWall_zone outerSliderOutlet_zone innerSliderOutlet_zone);

// Face zones which need to be present on all CPUs in their entirety
globalFaceZones (innerSliderInlet_zone outerSliderInlet_zone innerSliderWall_zone outerSliderWall_zone outerSliderOutlet_zone innerSliderOutlet_zone);

method  metis;

simpleCoeffs        { n (4 2 1); delta 0.001; }
hierarchicalCoeffs  { n (1 1 1); delta 0.001; order xyz; }
metisCoeffs         { processorWeights ( 1 1 1 1 1 1 1 1 ); }
manualCoeffs        { dataFile "cellDecomposition"; }

distributed no;
roots       ( );

f. dynamicMeshDict:

dynamicFvMeshLib    "libtopoChangerFvMesh.so";
dynamicFvMesh       mixerGgiFvMesh;

mixerGgiFvMeshCoeffs
{
    coordinateSystem
    {
        type        cylindrical;
        origin      (0 0 0);
        axis        (0 0 1);
        direction   (0 1 0);
    }

    rpm     -72;

    slider
    {
        moving ( innerSliderInlet innerSliderWall innerSliderOutlet );
        static ( outerSliderInlet outerSliderWall outerSliderOutlet );
    }
}

NOTE: One doubt. My rotating zone has the finest mesh, so the GGI patches carry the largest number of faces. When I use globalFaceZones in decomposeParDict, does that copy all the GGI faces onto every processor? If so, the run would be really slow, because interpolating over ~100K faces and communicating the data would take time. Please forgive me if my reasoning is wrong.

Thank you very much for your help. I am grateful.

Sincerely,
-- Dnyanesh Digraskar

Last edited by ddigrask; October 31, 2009 at 19:30. |
|
October 31, 2009, 23:13 |
|
#8 | |
Senior Member
Martin Beaudoin
Join Date: Mar 2009
Posts: 332
Rep Power: 22 |
Hello Dnyanesh,
Thank you for the information, this is much more helpful.

From your boundary file I can see that your GGI interfaces are indeed composed of large sets of facets. With the current implementation of the GGI this has an impact, because the GGI face zones are shared on all the processors and the communication takes its toll. Also, one internal algorithm of the GGI is a bit slow when the GGI patches have very large numbers of facets (my bad here, but I am working on it...). But not to the point of making a simulation crash and burn like you are describing.

So another important piece of information I need is your simulation log file; not the PBS log file, but the log messages generated by turbDyMFoam during your 32-processor parallel run. That file is probably too large to post on the Forum, so I would like to see at least the very first log messages, from line #1 (the turbDyMFoam splash header) down to, say, the 10th simulation time step. I also need to see the log for the last 10 time steps, just before your application crashed.

As a side note: as I mentioned, I am currently working on improvements to the GGI in order to speed up the code when GGI patches have a large number of facets (100K and more). My research group needs to run large GGI cases like that, so getting this nailed down asap is a priority for me. We will contribute our modifications to Hrv's dev version, so you will have access to the improvements as well.

Regards, Martin
|
November 1, 2009, 15:42 |
|
#9 |
New Member
Dnyanesh Digraskar
Join Date: Mar 2009
Location: Amherst, MA, United States
Posts: 10
Rep Power: 17 |
Dear Mr. Beaudoin,
Sorry for the slightly late reply. The turbDyMFoam output is attached below. The code does not crash because of the solver settings; it just hangs at some step during the calculation and finally dies with an MPI error. After looking carefully at each time step, I have observed that the most time-consuming part of the solution is the GGI interpolation step; that is where the solver spends about 2-3 minutes before posting the output.

Following is the turbDyMFoam output:

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.5-dev                               |
|   \\  /    A nd           | Revision: 1388                                  |
|    \\/     M anipulation  | Web:      http://www.OpenFOAM.org               |
\*---------------------------------------------------------------------------*/

Exec   : turbDyMFoam -parallel
Date   : Nov 01 2009
Time   : 14:13:26
Host   : node76
PID    : 26246
Case   : /home/ddigrask/OpenFOAM/ddigrask-1.5-dev/run/fall2009/ggi/turbineGgi_bigMesh
nProcs : 32
Slaves : 31
(
node76.26247 node76.26248 node76.26249 node76.26250 node76.26251 node76.26252 node76.26253
node23.22676 node23.22677 node23.22678 node23.22679 node23.22680 node23.22681 node23.22682 node23.22683
node42.22800 node42.22801 node42.22802 node42.22803 node42.22804 node42.22805 node42.22806 node42.22807
node31.31933 node31.31934 node31.31935 node31.31936 node31.31937 node31.31938 node31.31939 node31.31940
)

Pstream initialized with:
    floatTransfer    : 0
    nProcsSimpleSum  : 0
    commsType        : nonBlocking
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

Create time
Create dynamic mesh for time = 0
Selecting dynamicFvMesh mixerGgiFvMesh
void mixerGgiFvMesh::addZonesAndModifiers() : Zones and modifiers already present. Skipping.
Mixer mesh:
    origin: (0 0 0)
    axis  : (0 0 1)
    rpm   : -72
Reading field p
Reading field U
Reading/calculating face flux field phi

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00102394 average: 1.66438e-06
Largest master weighting factor correction: 0.0922472 average: 0.000376558
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104841 average: 0.000148099
Largest master weighting factor correction: 2.79095e-06 average: 4.34772e-09
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.51821e-05 average: 3.4379e-07
Largest master weighting factor correction: 0.176367 average: 0.00114207
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Selecting incompressible transport model Newtonian
Selecting RAS turbulence model SpalartAllmaras
Reading field rAU if present

Starting time loop

Courant Number mean: 0.00864601 max: 1.0153 velocity magnitude: 10
deltaT = 0.000492466
--> FOAM Warning : From function dlLibraryTable::open(const dictionary& dict, const word& libsEntry, const TablePtr tablePtr) in file lnInclude/dlLibraryTableTemplates.C at line 68 library "libsampling.so" did not introduce any new entries
Creating ggi check

Time = 0.000492466

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.000966753 average: 1.68023e-06
Largest master weighting factor correction: 0.0926134 average: 0.000376611
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104722 average: 0.000148213
Largest master weighting factor correction: 2.90604e-06 average: 4.45196e-09
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.56139e-05 average: 3.5192e-07
Largest master weighting factor correction: 0.179572 average: 0.0011419
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

PBiCG: Solving for Ux, Initial residual = 1, Final residual = 1.15144e-06, No Iterations 9
PBiCG: Solving for Uy, Initial residual = 1, Final residual = 1.69491e-06, No Iterations 9
PBiCG: Solving for Uz, Initial residual = 1, Final residual = 5.0914e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 1, Final residual = 0.0214318, No Iterations 6
time step continuity errors : sum local = 1.50927e-07, global = -1.79063e-08, cumulative = -1.79063e-08
GAMG: Solving for p, Initial residual = 0.284648, Final residual = 0.00377362, No Iterations 2
time step continuity errors : sum local = 5.39845e-07, global = 3.49001e-09, cumulative = -1.44163e-08

That is not even one complete time step; this is all the code has run. After this the code quits with an MPI error:

mpiexec-pbs: Warning: tasks 0-29,31 died with signal 15 (Terminated).
mpiexec-pbs: Warning: task 30 died with signal 9 (Killed).

Thank you again for your help. I am also trying to run the same case with just 2 GGI patches (instead of 6), i.e. ggiInside and ggiOutside, but even that does not make it run faster.

Sincerely,
-- Dnyanesh Digraskar

Last edited by ddigrask; November 1, 2009 at 16:42. |
|
November 1, 2009, 17:57 |
|
#10 | |
Senior Member
Martin Beaudoin
Join Date: Mar 2009
Posts: 332
Rep Power: 22 |
Hello,
Some comments:

1. It would be useful to see a stack trace in your log file when your run aborts. Could you set the environment variable FOAM_ABORT=1 and make sure every parallel task gets this variable as well? That way we could see, from the stack trace in the log file, where the parallel tasks are crashing.

2. You said your cluster has 72 nodes, 8 processors per node, and 4 GB RAM per node.

3. From your log file, we can see that you have 8 parallel tasks running on each node. Overall, your parallel run is using only 4 nodes of your cluster (node76, node23, node42 and node31).

4. So basically, for a ~4 million cell mesh, you are using only 4 computers, each with only 4 GB of RAM, and 8 tasks per node competing simultaneously for that amount of memory. Am I right? If so, because of your large mesh, your 4 nodes probably do not have enough memory available and could be swapping to virtual memory on the hard drive, which is quite slow. And depending on your memory bus architecture, your 8 tasks will also compete for access to the memory bus, which will slow you down as well.

Did you mean 4 GB RAM per processor instead, which would give you 32 GB RAM per node or computer? Could you double-check that your cluster information is accurate?

Martin
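A hedged sketch of one way to get the variable to every parallel task; whether the launcher (mpiexec-pbs here) forwards the caller's environment depends on its configuration, so the shell-startup route is shown as the more portable assumption.

# Hedged sketch: make sure every parallel task sees FOAM_ABORT=1.
# Option 1: put it in the shell startup file sourced on the compute nodes.
echo 'export FOAM_ABORT=1' >> ~/.bashrc

# Option 2: export it in the job script and rely on the launcher forwarding
# the environment (whether it does depends on the MPI/launcher configuration).
export FOAM_ABORT=1
mpiexec-pbs turbDyMFoam -parallel > outfile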
|
November 1, 2009, 20:59 |
|
#11 |
New Member
Dnyanesh Digraskar
Join Date: Mar 2009
Location: Amherst, MA, United States
Posts: 10
Rep Power: 17 |
Hello Mr. Beaudoin,
Thank you for your reply. I was a little confused between cores and processors: the cluster has 72 nodes, 8 cores per node and 4 GB RAM per node, so my information about memory per node (computer) is correct. I had also tried running the same job on 32 nodes with one process per node; that takes even more time than this setup. I will post the stack trace log soon. I will also try running the case on more cores (maybe 48 or 56) in order to avoid the memory bottleneck. Thank you again for your help. Sincerely, -- Dnyanesh Digraskar |
|
November 2, 2009, 17:22 |
|
#12 |
Senior Member
BastiL
Join Date: Mar 2009
Posts: 530
Rep Power: 20 |
Martin,
This sounds great to me, since I have similar problems with large models containing many GGI pairs. I am really looking forward to this. Regards BastiL |
|
November 2, 2009, 18:23 |
|
#13 |
New Member
Dnyanesh Digraskar
Join Date: Mar 2009
Location: Amherst, MA, United States
Posts: 10
Rep Power: 17 |
Hello Mr. Beaudoin,
Some updates to my previous post.

1. After enabling FOAM_ABORT=1 I get a more detailed MPI error message than before:

Create time
Create dynamic mesh for time = 0
Selecting dynamicFvMesh mixerGgiFvMesh
void mixerGgiFvMesh::addZonesAndModifiers() : Zones and modifiers already present. Skipping.
Mixer mesh:
    origin: (0 0 0)
    axis  : (0 0 1)
    rpm   : -72
Reading field p
Reading field U
Reading/calculating face flux field phi

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00102394 average: 1.66438e-06
Largest master weighting factor correction: 0.0922472 average: 0.000376558
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104841 average: 0.000148099
Largest master weighting factor correction: 2.79095e-06 average: 4.34772e-09
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.51821e-05 average: 3.4379e-07
Largest master weighting factor correction: 0.176367 average: 0.00114207
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Selecting incompressible transport model Newtonian
Selecting RAS turbulence model SpalartAllmaras
Reading field rAU if present
Working directory is /home/ddigrask/OpenFOAM/ddigrask-1.5-dev/run/fall2009/ggi/turbineGgi_bigMesh
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
Selecting incompressible transport model Newtonian
Selecting RAS turbulence model SpalartAllmaras
Reading field rAU if present

Starting time loop

Courant Number mean: 0.00865431 max: 1.0153 velocity magnitude: 10
deltaT = 0.000492466
--> FOAM Warning : From function dlLibraryTable::open(const dictionary& dict, const word& libsEntry, const TablePtr tablePtr) in file lnInclude/dlLibraryTableTemplates.C at line 68 library "libsampling.so" did not introduce any new entries
Creating ggi check

Time = 0.000492466

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.000966753 average: 1.68023e-06
Largest master weighting factor correction: 0.0926134 average: 0.000376611
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104722 average: 0.000148213
Largest master weighting factor correction: 2.90604e-06 average: 4.45196e-09
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero
--> FOAM Warning : From function min(const UList<Type>&) in file lnInclude/FieldFunctions.C at line 342 empty field, returning zero

Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.56139e-05 average: 3.5192e-07
Largest master weighting factor correction: 0.179572 average: 0.0011419

Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173).............................: MPI_Send(buf=0x7f4487e25010, count=1052888, MPI_PACKED, dest=0, tag=1, MPI_COMM_WORLD) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
[cli_8]: aborting job:
(the same MPI_Send error stack is then printed a second time)

Again, this is not even one complete time step, and the code quits after almost 20 minutes.

2. I have also tried running the same case on 56 and 64 cores (i.e. 7 and 8 nodes respectively). The outcome was the same.

3. Mr. Oliver Petit suggested that I manually create a movingCells cell zone. I will try that to see if it helps.

Thank you.
Sincerely,
-- Dnyanesh Digraskar |
|
November 2, 2009, 20:31 |
|
#14 | |
Senior Member
Martin Beaudoin
Join Date: Mar 2009
Posts: 332
Rep Power: 22 |
Hello,
That error message only says that the run crashed in an MPI operation; we don't know where. It could be in the GGI code, in the solver, or anywhere else MPI is used. So unfortunately this trace alone is of little use, and I don't have enough information to help you much more.

Try logging on to your compute nodes to see whether you have enough memory while the parallel job runs; 20 minutes gives you plenty of time to catch this. Also check whether the nodes are swapping to disk for virtual memory.

I hope to be able to contribute some improvements to the GGI soon. I do not know if this will help you. Let's hope for the best.

Regards, Martin |
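A hedged sketch of the kind of checks meant here, using standard Linux tools; the node name is taken from the solver header earlier in the thread, and none of this is specific to OpenFOAM.

# Hedged sketch: watch memory and swap on a compute node while the job runs.
ssh node76 free -m                             # RAM and swap usage in MB
ssh node76 vmstat 5 5                          # non-zero 'si'/'so' columns mean the node is swapping
ssh node76 'dmesg | grep -i "out of memory"'   # did the kernel OOM killer terminate a task?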
|
November 2, 2009, 20:34 |
|
#15 |
Senior Member
Martin Beaudoin
Join Date: Mar 2009
Posts: 332
Rep Power: 22 |
Hello BastiL,
Out of curiosity, how large is large? I mean, how many GGIs, and how many faces per GGI pair? Martin |
|
November 3, 2009, 04:15 |
|
#16 |
Senior Member
BastiL
Join Date: Mar 2009
Posts: 530
Rep Power: 20 |
|
November 26, 2009, 05:45 |
|
#17 |
Senior Member
BastiL
Join Date: Mar 2009
Posts: 530
Rep Power: 20 |
Martin,
I am wondering how your work is coming along. Is there some way I can support it, e.g. by testing improvements with our models? If so, please let me know. Thanks. Regards BastiL |
|
November 27, 2009, 10:54 |
|
#18 |
Senior Member
Martin Beaudoin
Join Date: Mar 2009
Posts: 332
Rep Power: 22 |
Hey BastiL,
I am actively working on that one. Thanks for the offer. I will keep you posted. Regards, Martin |
|
March 16, 2010, 04:17 |
|
#19 |
Senior Member
BastiL
Join Date: Mar 2009
Posts: 530
Rep Power: 20 |
|
March 17, 2010, 09:43 |
|
#20 |
Senior Member
Hrvoje Jasak
Join Date: Mar 2009
Location: London, England
Posts: 1,907
Rep Power: 33 |
Actually, I've got an update for you. There is a new layer of optimisation code built into the GGI interpolation, aimed at sorting out the loss of performance in parallel for a large number of CPUs. In short, each GGI will recognise whether it is located on a single CPU or not and, based on this, it will adjust the communications pattern in parallel.
This has shown good improvement on a bunch of cases I have tried, but you need to be careful about the parallel decomposition you choose. There are two further optimisation steps we can do, but they are much more intrusive. I am delaying this until we start doing projects with real multi-stage compressors (lots of GGIs) and until we get the mixing plane code rocking (friends involved here). Further updates are likely to follow, isn't that right Martin? Hrv
__________________
Hrvoje Jasak Providing commercial FOAM/OpenFOAM and CFD Consulting: http://wikki.co.uk |
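As an illustration of "being careful with the decomposition", a hedged sketch of a decomposeParDict fragment that reuses the preserveFaceZones / globalFaceZones entries already shown (partly commented out) in post #7. Whether constraining a GGI pair to a single processor actually helps depends on the case and on the resulting load balance, so treat this as an assumption to test rather than a recommendation.

// Hedged sketch, not from the thread: constrain the decomposition so that a
// GGI pair stays on a single CPU, using entries shown (commented out) in post #7.
numberOfSubdomains 32;

method metis;

// keep the cells on both sides of these GGI face zones on the same processor
preserveFaceZones ( innerSliderWall_zone outerSliderWall_zone );

// GGI face zones that must be visible in their entirety on all CPUs
globalFaceZones   ( innerSliderWall_zone outerSliderWall_zone );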
|