June 7, 2016, 15:44 |
Scaling Problems on Cluster with MVAPICH2
|
#1 |
New Member
Join Date: Jun 2016
Location: Amherst, MA
Posts: 5
Rep Power: 10 |
Hi all,
We are running a custom solver implemented in foam-extend-3.1 on the Stampede supercomputer. For quite a while now, we've been trying to narrow down a really, really bad parallel scaling problem. Our installation is compiled using MVAPICH2, the only MPI library supported on Stampede as far as I can tell. We have a test case which takes about 8 minutes to run on 16 cores. When we run the same case on 128 cores, the runtime is around 1.25 hours. I've done some profiling (I'll try to upload the results), and it looks like on the 128 core run, we are getting really stuck setting a couple of memory addresses over and over again (function call is __intel_memset). I've tried tuning the MVAPICH2 settings, and managed to get the runtime down to 45 minutes. But.... That's still pretty messed up. A different case scales very well on 96 cores, with a slightly larger mesh and no other real differences. The case we're having issues with runs about 20,000 mesh cells per core. Any insight would be appreciated, I'm completely out of ideas at this point.... Cheers, Gabe |
|
June 9, 2016, 13:56 |
|
#2 |
New Member
Join Date: Jun 2016
Location: Amherst, MA
Posts: 5
Rep Power: 10 |
Update... Immediately after posting this, I realized that the issue only occurs when we are using mesh motion!
Anyone else experience a similar issue when running a dynamic mesh in parallel? Here's the dynamicMeshDict from the case, if it helps... Code:
/*--------------------------------*- C++ -*----------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.5                                   |
|   \\  /    A nd           | Web:      http://www.OpenFOAM.org               |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      motionProperties;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

dynamicFvMesh   dynamicMotionSolverFvMesh;

solver          laplace;

diffusivity     quadratic;

distancePatches 0 ();

frozenDiffusion yes;

// ************************************************************************* //

Thanks, and sorry if I sound like a "noob"... I am one |
|
October 4, 2016, 22:21 |
|
#3 |
New Member
Join Date: Jun 2016
Location: Amherst, MA
Posts: 5
Rep Power: 10 |
Thought I'd try reviving this thread one more time...
I'm still unable to figure out why the dynamicMotionSolverFvMesh library scales so poorly. It doesn't appear to be sensitive to system / interconnect / compiler types, which is what I initially thought it might be. Has anyone else experienced a similar issue with this library? Cheers |
|
October 5, 2016, 11:34 |
|
#4 |
Member
Bruno Blais
Join Date: Sep 2013
Location: Canada
Posts: 64
Rep Power: 13 |
Did you test whether the poor scaling is observed for all matrix solvers?
It seems that when you move the mesh, or at least execute one of the routines of dynamicMotionSolverFvMesh, something related to the mesh topology gets rebuilt, and that is what is so detrimental to parallel efficiency. I know, for example, that using dynamic mesh refinement destroys the parallelism... This is very interesting to me, so please keep us informed! Sorry I cannot give a good practical answer...
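(One way to test that hypothesis directly is to run the mesh motion on its own, with no flow solution at all, and time mesh.update() every step; the stock moveDynamicMesh utility already does essentially this. Below is a stripped-down, hypothetical sketch with a per-step timer added, assuming a standard foam-extend build compiled with wmake; the solver name is made up.) Code:
// timeMeshMotionFoam.C (hypothetical name): move the mesh and report the cost
// of mesh.update() each step, taking the max over MPI ranks so the slowest
// processor, the one everyone else waits for, is what gets printed.
#include "fvCFD.H"
#include "dynamicFvMesh.H"
#include "PstreamReduceOps.H"

int main(int argc, char* argv[])
{
    #include "setRootCase.H"
    #include "createTime.H"
    #include "createDynamicFvMesh.H"

    while (runTime.loop())
    {
        scalar tStart = runTime.elapsedClockTime();

        mesh.update();   // runs the motion solver defined in dynamicMeshDict

        scalar tMotion = runTime.elapsedClockTime() - tStart;
        reduce(tMotion, maxOp<scalar>());

        Info<< "Time = " << runTime.timeName()
            << "  mesh.update(): " << tMotion << " s (max over ranks)"
            << nl << endl;
    }

    Info<< "End" << endl;

    return 0;
}
If that loop alone reproduces the 16-core vs 128-core blow-up, the problem sits entirely in the motion solver rather than in the rest of the custom solver.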
|
October 8, 2016, 17:06 |
|
#5 |
New Member
Join Date: Jun 2016
Location: Amherst, MA
Posts: 5
Rep Power: 10 |
If there wasn't a good speedup, that'd be one thing, but seeing a tremendous slowdown like this seems suspicious... The non "extend" version of OpenFOAM has a fix for parallel communication in the latest release that seems applicable to this issue. Thoughts? |
|
October 17, 2016, 10:23 |
|
#6 |
Member
Bruno Blais
Join Date: Sep 2013
Location: Canada
Posts: 64
Rep Power: 13 |
A slowdown compared to serial execution can occur.
I remember running a case with dynamic mesh refinement where I used the GAMG matrix solver. Running that case on 8 processors took about 1.5x more time than on a single processor. Using dynamic mesh refinement across multiple processors every iteration caused a dramatic decrease in performance, partly because of the amount of communication, but also because some of the pre-caching in the GAMG solver could no longer be reused. In any case, this is surprising, but I am just saying it can occur! Good luck!
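(For reference, checking whether the problem follows the matrix solver only needs an edit in system/fvSolution and a re-run of the scaling test. A minimal sketch, using p as the example field; the settings are illustrative, not tuned values.) Code:
// system/fvSolution excerpt (sketch): re-run the 16-core and 128-core tests
// with the expensive field switched between its current GAMG settings and a
// plain Krylov solver, and compare how the timings scale in each case.
solvers
{
    p
    {
        solver          PCG;    // swap back to the existing GAMG block to compare
        preconditioner  DIC;
        tolerance       1e-07;
        relTol          0.01;
    }
}
The same swap can be tried on the mesh-motion field itself (motionU or cellMotionU, depending on which motion solver library the case ends up using; the solver log shows which field it actually solves), which would separate "the motion matrix is expensive to solve in parallel" from "rebuilding mesh data every step is expensive".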
|
Tags |
cluster, mvapich2, parallel, scaling, slow |