November 20, 2013, 08:50
Cluster Parallelization Performance

#1
Member
Join Date: Apr 2009
Posts: 36
Rep Power: 17
I have been fortunate enough to be given some hardware to try to set up an OpenFOAM cluster. I have it up and running with two nodes at the moment, but I am getting unexpectedly poor performance. I am hoping someone can provide some input as to where to look. Here is the info:
Hardware: two identical HP Z400s, each with a Xeon W3550 @ 3.07 GHz (4 cores) and 11.7 GB of memory. They are connected via a Linksys RVS4000 gigabit switch. I have used iperf and can vouch that the machines are transferring at gigabit speed. The OpenFOAM version is 2.2. Both machines run Ubuntu 13.10 and have identical setups.

My test case is the pimpleDyMFoam tutorial mixerVesselAMI2D. I have thrown away the default mesh and created two levels of refinement; the first level of refinement is 307200 cells. I have three results for the first case: one with a single core (no parallel option), one with a single node and 4 processors, and one with both nodes and 8 processors. I am using the scotch decomposition method (per the tutorial) for both single-host and multi-host runs. The 2-node case is decomposed as:
Code:
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

//- Force AMI to be on single processor. Can cause imbalance with some
//  decomposers.
//singleProcessorFaceSets ((AMI -1));

numberOfSubdomains 8;

method          scotch;

distributed     yes;

roots           ( );

// ************************************************************************* //
To launch the cases:
Parallel: time mpirun -hostfile hostfile pimpleDyMFoam -parallel > log
Single core: time pimpleDyMFoam > log
The hostfile looks like
Code:
192.168.0.3 slots=4
192.168.0.4 slots=4
Results of the first level of mesh refinement look like:
Code:
Single Core Run
real    79m05.605s
user    78m17.394s
sys     0m44.688s

Single Host 4 core machine
real    42m49.394s
user    168m27.428s
sys     0m13.658s

Full Parallel Run
real    60m58.221s
user    104m3.251s
sys     137m15.823s
Code:
Single Node 4 Core
real    65m20.622s
user    256m19.924s
sys     0m56.965s

Full Parallel Run
real    58m50.084s
user    143m23.455s
sys     90m40.328s
So, as I'm trying this, I'm seeing a couple of things. Firstly, I suppose the AMI may be causing issues? I wouldn't expect the AMI to degrade the parallel performance so much. Also, I am seeing something about a
Code:
nCellsInCoarsestLevel 10;
Anyways, I will keep churning through these -- but any insight or help is appreciated; thanks!

edit: The single node job finished 10 min faster than projected. Barely a 10% speedup on the 2 node, 8 processor run.

edit 2: Job finished with modified nCellsInCoarsestLevel. I went back to the "fine" case and set that value:
Code:
nCellsInCoarsestLevel 550;  // ~ sqrt(300000)
Code:
Single Core Run
real    79m05.605s
user    78m17.394s
sys     0m44.688s

Single Host 4 core machine
real    42m49.394s
user    168m27.428s
sys     0m13.658s

Full Parallel Run
real    60m58.221s
user    104m3.251s
sys     137m15.823s

Full Parallel Run, Modified nCellsInCoarsestLevel
real    57m39.171s
user    90m7.315s
sys     138m22.254s
Code:
singleProcessorFaceSets ((AMI -1));

Last edited by minger; November 20, 2013 at 12:00. Reason: added run with nCellsInCoarsestLevel
November 21, 2013, 18:45

#2
Member
Join Date: Apr 2009
Posts: 36
Rep Power: 17
It seems that the AMI and/or dynamic mesh motion was SEVERELY slowing the parallelization down. I went to a more basic test case, and chose the pitzDaily simpleFoam test. Results are:
Code:
================================
pitzDaily

Single Host 4 core machine
real    0m23.332s
user    1m24.179s
sys     0m0.688s

Full Parallel
real    0m48.747s
user    1m5.969s
sys     1m56.091s

================================
pitzDaily Fine - 49k cells

Single Host 4 core machine
real    2m33.846s
user    10m8.397s
sys     0m0.982s

Full Parallel
real    2m36.021s
user    5m30.923s
sys     4m32.839s

================================
pitzDaily xFine - 195k cells

Single Host 4 core machine
real    45m59.531s
user    182m16.379s
sys     0m6.847s

Full Parallel
real    19m44.253s
user    61m9.221s
sys     16m59.335s
It does raise the question as to whether it's the AMI or the DyM that is causing the slowdown.
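The pattern in these results (only the xFine case benefits from the second node) is consistent with the per-rank problem size being too small to hide gigabit-ethernet latency. A rough cells-per-rank estimate for 8 ranks (a sketch; the ~12k figure for the default pitzDaily mesh is my assumption, while the 49k and 195k counts are from the runs above):

```shell
# Cells per MPI rank for each pitzDaily variant when decomposed into 8.
# 12225 is the usual default pitzDaily cell count (assumption); 49000 and
# 195000 are the fine/xFine counts reported above.
for cells in 12225 49000 195000; do
  echo "$cells cells -> $((cells / 8)) cells per rank"
done
```

An often-quoted rule of thumb is that each rank needs on the order of tens of thousands of cells before computation outweighs communication on commodity gigabit interconnects; only the 195k-cell case (about 24k cells per rank) clears that bar here, which would explain why it is the only one that scales across the two nodes.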