January 27, 2015, 07:08
Inconsistent parallel job running times

#1
New Member
Join Date: May 2013
Posts: 23
Rep Power: 13
Hello all!
I keep posting on this forum because I find it really useful. I have recently run into some issues with parallel jobs. I am running potentialFoam and simpleFoam on several cluster nodes, and the running times differ dramatically depending on which nodes are selected: a run can take up to five times longer, or even get stuck on the cluster, depending on the node selection! I am running OpenFOAM 2.3.1 with mpirun 1.6.5 (Open MPI) over InfiniBand. Before I give you more information, has anyone seen this kind of problem? Is there a program or an OpenFOAM utility that reports the amount of data transferred between the processors? I know Fluent has a feature for reporting parallel data transfer. I have tried setting the Pstream debug switches to 1 in OpenFOAM, but the output is so low-level that it is impossible to draw any conclusions from it...

January 27, 2015, 11:05
#2
Senior Member
Armin
Join Date: Feb 2011
Location: Helsinki, Finland
Posts: 156
Rep Power: 19
I'm not aware of any utility to measure parallel data transfer.
A couple of hints/questions:
Armin

January 30, 2015, 15:44
#3
New Member
Join Date: May 2013
Posts: 23
Rep Power: 13
Thanks for your reply, Armin.
To answer your questions:
1) No, I am using the standard OpenFOAM solvers, utilities, etc. that come with openfoam-2.3.1.
2) Between 300k and 1M, which I think should be OK.
3) I don't write any data, nor do I read any (I start from steady boundary conditions)!
4) I am running this test at the moment; I will let you know!
5) Execution time and clock time are very similar. Should I see a major difference?

January 30, 2015, 17:32
#4
Senior Member
Armin
Join Date: Feb 2011
Location: Helsinki, Finland
Posts: 156
Rep Power: 19
Yep, that should be OK. If you have more than 100k cells per CPU, your application should scale well. I wouldn't run with fewer than 50k cells per CPU, but that also depends somewhat on the application.
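As a rough illustration of this rule of thumb, the average load per rank is simply the total cell count divided by the number of MPI ranks. The sketch below assumes the decomposition splits cells evenly (real decompositions are only approximately balanced) and hard-codes the 100k/50k figures from this post; the function names and the example meshes are hypothetical.

```python
# Rule-of-thumb thresholds quoted in this thread (cells per CPU); not hard limits.
GOOD_CELLS_PER_RANK = 100_000
MIN_CELLS_PER_RANK = 50_000


def cells_per_rank(total_cells: int, n_ranks: int) -> float:
    """Average cells per MPI rank, assuming an evenly balanced decomposition."""
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    return total_cells / n_ranks


def scaling_hint(total_cells: int, n_ranks: int) -> str:
    """Classify the per-rank load against the rule-of-thumb thresholds."""
    per_rank = cells_per_rank(total_cells, n_ranks)
    if per_rank >= GOOD_CELLS_PER_RANK:
        return "should scale well"
    if per_rank >= MIN_CELLS_PER_RANK:
        return "acceptable, application-dependent"
    return "below the suggested minimum"


if __name__ == "__main__":
    # Hypothetical meshes and rank counts.
    for cells, ranks in [(4_000_000, 16), (2_000_000, 32), (1_000_000, 32)]:
        print(cells, ranks, cells_per_rank(cells, ranks), scaling_hint(cells, ranks))
```

For example, a 4M-cell mesh on 16 ranks gives 250k cells per rank, comfortably above the 100k mark, while the same 1M-cell mesh spread over 32 ranks drops to about 31k per rank.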
Quote:
Meaning: the closer they are, the more of the time you are actually computing something and the less time is spent on other things like I/O. At least that's how it typically goes; there are exceptions, though.
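To compare the two numbers for a whole run rather than by eye, the last ExecutionTime/ClockTime pair an OpenFOAM solver prints to its log can be extracted with a regular expression. This is a minimal sketch, assuming the usual `ExecutionTime = 95.5 s  ClockTime = 100 s` line format; the helper names and sample lines are hypothetical. One caveat: MPI libraries that busy-poll while waiting for messages can count that waiting as CPU time, so a ratio close to 1 does not by itself rule out communication overhead.

```python
import re

# One line per time step, e.g. "ExecutionTime = 95.5 s  ClockTime = 100 s".
_TIME_RE = re.compile(
    r"ExecutionTime\s*=\s*([0-9.eE+-]+)\s*s\s+ClockTime\s*=\s*([0-9.eE+-]+)\s*s"
)


def time_pairs(log_text: str) -> list[tuple[float, float]]:
    """All (ExecutionTime, ClockTime) pairs found in a solver log."""
    return [(float(e), float(c)) for e, c in _TIME_RE.findall(log_text)]


def cpu_to_wall_ratio(log_text: str) -> float:
    """ExecutionTime / ClockTime at the end of the log (both are cumulative)."""
    pairs = time_pairs(log_text)
    if not pairs:
        raise ValueError("no ExecutionTime/ClockTime lines found")
    exec_t, clock_t = pairs[-1]
    if clock_t <= 0:
        raise ValueError("ClockTime is zero; run too short to judge")
    return exec_t / clock_t


if __name__ == "__main__":
    # Hypothetical log excerpt.
    sample = (
        "ExecutionTime = 47.1 s  ClockTime = 50 s\n"
        "ExecutionTime = 95.5 s  ClockTime = 100 s\n"
    )
    print(round(cpu_to_wall_ratio(sample), 3))  # 0.955
```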

February 10, 2015, 13:42
#5
New Member
Join Date: May 2013
Posts: 23
Rep Power: 13
Hello, I am coming back to you with more information.
I have run OpenFOAM's Test-Parallel utility and the output looks fine to me. Here is an example of the log file:
Just as a quick reminder, this is the behaviour we observe: running on a single switch, the case runs as expected at, say, 80 seconds per iteration. Running the same job across multiple switches, each iteration takes 250 seconds, roughly three times longer. I want to emphasize that the IB fabric seems to work correctly, as we don't observe any issues running commercial-grade CFD applications. We have built MPICH 3.1.3 from source and observe exactly the same behaviour as with Open MPI (slow across switches, fast within a single switch), which suggests the problem is not MPI-related. Has anyone experienced this behaviour running parallel OpenFOAM jobs? Any pointers would be greatly appreciated!
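Per-iteration figures like the 80 s vs. 250 s quoted here (a factor of about 3.1) can be extracted from two solver logs by differencing the cumulative ClockTime values. A sketch, assuming one ClockTime line per iteration (as simpleFoam prints after each time step) and that both logs cover comparable iterations; the function names and the synthetic log lines are hypothetical.

```python
import re

# Matches the cumulative wall-clock time OpenFOAM prints after each time step.
_CLOCK_RE = re.compile(r"ClockTime\s*=\s*([0-9.eE+-]+)\s*s")


def mean_seconds_per_iteration(log_text: str) -> float:
    """Average wall-clock seconds between consecutive ClockTime entries.

    Differencing from the first entry excludes the start-up time it contains.
    """
    clocks = [float(x) for x in _CLOCK_RE.findall(log_text)]
    if len(clocks) < 2:
        raise ValueError("need at least two ClockTime entries")
    return (clocks[-1] - clocks[0]) / (len(clocks) - 1)


def slowdown(slow_log: str, fast_log: str) -> float:
    """Ratio of per-iteration wall time between two runs of the same case."""
    return mean_seconds_per_iteration(slow_log) / mean_seconds_per_iteration(fast_log)


if __name__ == "__main__":
    # Synthetic (hypothetical) logs mimicking the timings reported above.
    fast = "\n".join(f"ExecutionTime = {t} s  ClockTime = {t} s" for t in (80, 160, 240))
    slow = "\n".join(f"ExecutionTime = {t} s  ClockTime = {t} s" for t in (250, 500, 750))
    print(slowdown(slow, fast))  # 3.125
```

Comparing a single-switch log against a multi-switch log this way gives one number per pair of runs, which makes it easier to see whether the slowdown tracks the switch layout consistently.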