June 7, 2016, 15:44 |
Scaling Problems on Cluster with MVAPICH2
|
#1 |
New Member
Join Date: Jun 2016
Location: Amherst, MA
Posts: 5
Rep Power: 10 |
Hi all,
We are running a custom solver implemented in foam-extend-3.1 on the Stampede supercomputer. For quite a while now, we've been trying to narrow down a really, really bad parallel scaling problem. Our installation is compiled using MVAPICH2, the only MPI library supported on Stampede as far as I can tell. We have a test case which takes about 8 minutes to run on 16 cores. When we run the same case on 128 cores, the runtime is around 1.25 hours. I've done some profiling (I'll try to upload the results), and it looks like on the 128 core run, we are getting really stuck setting a couple of memory addresses over and over again (function call is __intel_memset). I've tried tuning the MVAPICH2 settings, and managed to get the runtime down to 45 minutes. But.... That's still pretty messed up. A different case scales very well on 96 cores, with a slightly larger mesh and no other real differences. The case we're having issues with runs about 20,000 mesh cells per core. Any insight would be appreciated, I'm completely out of ideas at this point.... Cheers, Gabe |
|
June 9, 2016, 13:56 |
|
#2 |
New Member
Join Date: Jun 2016
Location: Amherst, MA
Posts: 5
Rep Power: 10 |
Update... Immediately after posting this, I realized that the issue only occurs when we are using mesh motion!
Anyone else experience a similar issue when running a dynamic mesh in parallel? Here's the dynamicMeshDict from the case, if it helps... Code:
/*--------------------------------*- C++ -*----------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.5                                   |
|   \\  /    A nd           | Web:      http://www.OpenFOAM.org               |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      motionProperties;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

dynamicFvMesh   dynamicMotionSolverFvMesh;

solver          laplace;

diffusivity     quadratic;

distancePatches 0 ();

frozenDiffusion yes;

// ************************************************************************* //

Thanks, and sorry if I sound like a "noob"... I am one |
|
October 4, 2016, 22:21 |
|
#3 |
New Member
Join Date: Jun 2016
Location: Amherst, MA
Posts: 5
Rep Power: 10 |
Thought I'd try reviving this thread one more time...
I'm still unable to figure out why the dynamicMotionSolverFvMesh library scales so poorly. It doesn't appear to be sensitive to system / interconnect / compiler types, which is what I initially thought it might be. Has anyone else experienced a similar issue with this library? Cheers |
|
October 5, 2016, 11:34 |
|
#4 |
Member
Bruno Blais
Join Date: Sep 2013
Location: Canada
Posts: 64
Rep Power: 13 |
Did you test whether the poor scaling is observed for all matrix solvers?
It seems that when you move the mesh, or at least execute one of the routines of dynamicMotionSolverFvMesh, something related to the mesh topology gets rebuilt, and that is what is so detrimental to parallel efficiency. I know, for example, that using dynamic mesh refinement destroys the parallelism... This is very interesting to me, so please keep us informed! Sorry I cannot give a good practical answer...
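(One way to test that hypothesis directly is to run the mesh motion on its own, with no flow solution at all, and time mesh.update() every step; the stock moveDynamicMesh utility already does essentially this. Below is a stripped-down, hypothetical sketch with a per-step timer added, assuming a standard foam-extend build compiled with wmake; the solver name is made up.) Code:
// timeMeshMotionFoam.C (hypothetical name): move the mesh and report the cost
// of mesh.update() each step, taking the max over MPI ranks so the slowest
// processor, the one everyone else waits for, is what gets printed.
#include "fvCFD.H"
#include "dynamicFvMesh.H"
#include "PstreamReduceOps.H"

int main(int argc, char* argv[])
{
    #include "setRootCase.H"
    #include "createTime.H"
    #include "createDynamicFvMesh.H"

    while (runTime.loop())
    {
        scalar tStart = runTime.elapsedClockTime();

        mesh.update();   // runs the motion solver defined in dynamicMeshDict

        scalar tMotion = runTime.elapsedClockTime() - tStart;
        reduce(tMotion, maxOp<scalar>());

        Info<< "Time = " << runTime.timeName()
            << "  mesh.update(): " << tMotion << " s (max over ranks)"
            << nl << endl;
    }

    Info<< "End" << endl;

    return 0;
}
If that loop alone reproduces the 16-core vs 128-core blow-up, the problem sits entirely in the motion solver rather than in the rest of the custom solver.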
|
October 8, 2016, 17:06 |
|
#5 |
New Member
Join Date: Jun 2016
Location: Amherst, MA
Posts: 5
Rep Power: 10 |
If there wasn't a good speedup, that'd be one thing, but seeing a tremendous slowdown like this seems suspicious... The non "extend" version of OpenFOAM has a fix for parallel communication in the latest release that seems applicable to this issue. Thoughts? |
|
October 17, 2016, 10:23 |
|
#6 |
Member
Bruno Blais
Join Date: Sep 2013
Location: Canada
Posts: 64
Rep Power: 13 |
A slowdown compared to serial execution can occur.
I remember running a case with dynamic mesh refinement where I used the GAMG matrix solver. Running that case on 8 processors took about 1.5x more time than on a single processor. Using dynamic mesh refinement across multiple processors every iteration caused a dramatic decrease in performance, partly because of the amount of communication, but also because some of the pre-caching in the GAMG solver could no longer be reused. In any case, this is surprising, but I am just saying it can occur! Good luck!
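(For reference, checking whether the problem follows the matrix solver only needs an edit in system/fvSolution and a re-run of the scaling test. A minimal sketch, using p as the example field; the settings are illustrative, not tuned values.) Code:
// system/fvSolution excerpt (sketch): re-run the 16-core and 128-core tests
// with the expensive field switched between its current GAMG settings and a
// plain Krylov solver, and compare how the timings scale in each case.
solvers
{
    p
    {
        solver          PCG;    // swap back to the existing GAMG block to compare
        preconditioner  DIC;
        tolerance       1e-07;
        relTol          0.01;
    }
}
The same swap can be tried on the mesh-motion field itself (motionU or cellMotionU, depending on which motion solver library the case ends up using; the solver log shows which field it actually solves), which would separate "the motion matrix is expensive to solve in parallel" from "rebuilding mesh data every step is expensive".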
|
Tags |
cluster, mvapich2, parallel, scaling, slow |