August 28, 2013, 16:43 |
|
#21 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Ehsan,
From the little information you've provided, and taking into account that you've checked most of the information on this thread, there are several possibilities:
Best regards, Bruno
|
|
August 30, 2013, 08:04 |
|
#22 |
Senior Member
Ehsan
Join Date: Mar 2009
Posts: 112
Rep Power: 17 |
Hi Bruno
We run 4 or 5 machines in parallel. They work fine with pimpleFoam, but the connection fails with interPhaseChangeFoam. We use OpenFOAM 2.1, the decomposed parts all have the same size because we used Scotch, and there is no sign of hardware failure.
We changed the solver settings, i.e. switched from GAMG to PCG, or increased nCellsInCoarsestLevel from the default value of 10 to 5000. These changes let the run go further, but it still crashed at a later time:
* nCellsInCoarsestLevel 10: connection stopped after 1000 s
* nCellsInCoarsestLevel 5000: connection stopped after 6000 s
* p solver changed to PCG: connection stopped after 3000 s
Would you please help with this? Thanks
|
August 31, 2013, 12:42 |
|
#23 | |
Retired Super Moderator
Bruno Santos
Hi Ehsan,
Here are several questions I asked the other day on a related thread: Quote:
Bruno
|
||
August 31, 2013, 14:25 |
|
#24 |
Senior Member
Ehsan
Hi Bruno
Thanks for your time and effort:
How many cells does your mesh have?
R: 900,000 cells.
What kinds of cells does your mesh have?
R: Structured; we create the mesh in Gambit and then read it into OpenFOAM.
What does checkMesh output? More specifically:
Code:
checkMesh -allGeometry -allTopology
R: We have not checked this yet, but will try it. Meanwhile, I would like to point out that the case runs correctly on 2 machines without any problems.
Are you using any kind of moving mesh, or AMI, MRF, cyclic patches, mapped boundary conditions, symmetry planes or wedges?
R: Yes, we use symmetry planes. We solve 3D cavitating flow behind a disk and we solve 1/4 of the geometry.
Are you using dynamic mesh refinement during the simulation?
R: No.
Which decomposition method did you use?
R: Scotch.
What did the last time instance of the output of the solver look like?
R: Like the other time steps, but it stops before writing the iterations of the p_rgh equation, i.e. the run hangs.
Are you using any function objects?
R: Yes, but the code stops while solving the p_rgh equation.
Code:
functions
(
    forces
    {
        type                forces;
        functionObjectLibs  ("libforces.so");  // Lib to load
        patches             (disk);            // change to your patch name
        rhoInf              998;               // Reference density for fluid
        CofR                (0 0 0);           // Origin for moment calculations
        outputControl       timeStep;
        outputInterval      100;
    }
    forceCoeffs
    {
        type                forceCoeffs;
        functionObjectLibs  ("libforces.so");
        patches             (disk);            // change to your patch name
        rhoInf              998;
        CofR                (0 0 0);
        liftDir             (0 1 0);
        dragDir             (1 0 0);
        pitchAxis           (0 0 0);
        magUInf             10;
        lRef                0.07;
        Aref                0.0049;
        outputControl       timeStep;
        outputInterval      100;
    }
);
Have you tried the more recent versions of OpenFOAM?
R: Not yet, we have only tried this version.
I would be glad if you could help me with this problem.
Regards
|
August 31, 2013, 14:41 |
|
#25 | |
Retired Super Moderator
Bruno Santos
Hi Ehsan,
OK, then there is still this question left unanswered:
And this still has me worried: Quote:
Bruno
|
||
August 31, 2013, 14:57 |
|
#26 |
Senior Member
Ehsan
Hi Bruno
I did not understand the first question. Do you mean whether the run also stops if we run multiphase/interPhaseChangeFoam/cavitatingBullet? We use 24 processors, each of which deals with a sub-domain of the same size (around 8Mg), as we use Scotch for the decomposition.
Questions:
1- Where can I check the version of NFS?
2- We use Ubuntu v. 11; where exactly should I check for the network drivers?
Regards, and many thanks for your time.
|
August 31, 2013, 15:44 |
|
#27 | |||
Retired Super Moderator
Bruno Santos
Hi Ehsan,
Quote:
Quote:
The rule of thumb is usually a minimum of 50,000 cells per processor. If you use the same number of machines, but fewer processors per machine, does it still freeze?
Run on each machine:
Code:
nfsstat
Quote:
Code:
cat /etc/lsb-release
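As a quick check of the rule of thumb above, the numbers reported earlier in this thread (900,000 cells on 24 processors) can be plugged in directly. This is only an arithmetic sketch; the variable names are made up for illustration, and falling below the rule of thumb does not by itself explain a hang.

```shell
# Values taken from the thread; the variable names are illustrative only.
total_cells=900000          # mesh size reported in post #24
n_procs=24                  # processor count reported in post #26
min_cells_per_proc=50000    # rule-of-thumb minimum per processor

# Integer division is sufficient for this estimate.
cells_per_proc=$((total_cells / n_procs))
echo "cells per processor: $cells_per_proc"

# Largest processor count that keeps every sub-domain at or above the minimum.
max_procs=$((total_cells / min_cells_per_proc))
echo "processors allowed by the rule of thumb: $max_procs"
```

With these numbers each sub-domain holds 37,500 cells, which is below the suggested minimum, and the rule of thumb would allow at most 18 processors for this mesh.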
Best regards, Bruno
|
||||
July 1, 2019, 20:57 |
|
#28 |
Member
Hüseyin Can Önel
Join Date: Sep 2018
Location: Ankara, Turkey
Posts: 47
Rep Power: 8 |
Hi,
I am having a problem which I guess is related to this topic.
* If I do not use #calcEntry (#calc "..." function) in my dictionaries, I am able to run simulations on single or multiple nodes.
* If I use #calcEntry (#calc "..." function) in my dictionaries, I am able to run simulations on a single node, but when I try multiple nodes, I get the following error in log.pimpleFoam:
Code:
wmake libso /cfd/honel/OpenFOAM/honel-5.x/c.dual-isol.g-64/dynamicCode/_a17e4453985d3e4233a02ad03051231029c9bb42
ln: ./lnInclude
wmkdep: codeStreamTemplate.C
Ctoo: codeStreamTemplate.C
ld: /cfd/honel/OpenFOAM/honel-5.x/c.dual-isol.g-64/dynamicCode/_a17e4453985d3e4233a02ad03051231029c9bb42/../platforms/linux64Gcc48DPInt64Opt/lib/libcodeStream_a17e4453985d3e4233a02ad03051231029c9bb42.so
[86]
[86]
[86] --> FOAM FATAL IO ERROR:
[86] Cannot read (NFS mounted) library "/cfd/honel/OpenFOAM/honel-5.x/c.dual-isol.g-64/dynamicCode/platforms/linux64Gcc48DPInt64Opt/lib/libcodeStream_a17e4453985d3e4233a02ad03051231029c9bb42.so"
on processor 86 detected size -1 whereas master size is 129190 bytes.
If your case is not NFS mounted (so distributed) set fileModificationSkew to 0
[86]
[86] file: /cfd/honel/OpenFOAM/honel-5.x/c.dual-isol.g-64/processor86/0/p from line 25 to line 13.
[86]
[86] From function static void (* Foam::functionEntries::codeStream::getFunction(const Foam::dictionary&, const Foam::dictionary&))(Foam::Ostream&, const Foam::dictionary&)
[86] in file db/dictionary/functionEntries/codeStream/codeStream.C at line 270.
[86]
FOAM parallel run exiting
[86]
How can I overcome this? Thanks in advance.
|
July 9, 2019, 19:47 |
|
#29 |
Retired Super Moderator
Bruno Santos
Quick answer: Without more details on how you are launching the solver, it's hard to give direct instructions to test what's going on.
In principle, the problem is that the file "libcodeStream_a17e4453985d3e4233a02ad03051231029c9bb42.so" is not properly accessible to all parallel processes. This is either because NFS was not able to deliver the file in time for the load, or because it simply was not shared via NFS. One possible trick and test is to launch a test command with mpirun with -np X (X being the number of cores), so that each process checks the content of the file. For example, these commands will let you know what the applications are seeing before the simulation is launched:
Code:
mpirun -np 84 find dynamicCode -name "*.so"
mpirun -np 84 ls -l dynamicCode/*/*/*/*
mpirun -np 84 md5sum dynamicCode/*/*/*/*
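With many ranks the md5sum output gets long, so a small filter can help spot files whose checksum differs between ranks. The following is a minimal sketch, not part of the original advice: `report_mismatches` is a hypothetical helper name, it assumes paths without spaces, and it only sees files that md5sum could actually read (a file missing on one node shows up as an error on stderr instead).

```shell
# Hypothetical helper: reads md5sum output ("<hash>  <path>", one line per
# rank and file) on stdin and prints each path seen with more than one hash.
report_mismatches() {
    # sort -u collapses identical "<hash>  <path>" lines, so a path that
    # still appears more than once must have at least two different hashes.
    sort -u |
        awk '{ count[$2]++ } END { for (p in count) if (count[p] > 1) print p }' |
        sort
}

# Assumed usage (left commented out so the block runs without MPI).
# Wrapping the command in sh -c makes each rank expand the glob itself,
# since an unquoted glob is expanded by the launching shell:
#   mpirun -np 84 sh -c 'md5sum dynamicCode/*/*/*/*' | report_mismatches
```

An empty result means every rank that could read a file computed the same checksum for it.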
|
|
July 10, 2019, 05:42 |
|
#30 |
Member
Hüseyin Can Önel
Hi wyldckat,
Thanks for your reply, as always. I am not sure what you mean by how I am launching the solver, but I'm running pimpleFoam via the runParallel command. Please ask for any specific information if necessary.
I have run the find, ls and md5sum commands in parallel as you suggested. The pimpleFoam log of my problematic case is here (it runs on 2 nodes x 16 cores each = 32 cores in total): http://s000.tinyupload.com/download....84241688376836
Here is the output of the find, ls and md5sum commands: http://s000.tinyupload.com/download....06754599415741 (I had to upload it to an external website because of its 1.5Mb size.)
My understanding is that the file which is reported as not found is seen by all 32 processors.
|
July 23, 2019, 20:08 |
|
#31 | ||
Retired Super Moderator
Bruno Santos
Hi hconel,
Sorry for the late reply... Quote:
Quote:
I ask this because if the pieces of code were built during decomposePar, then you can try running the md5sum command line before running the solver, to try to force the files to be loaded into cache on all cores. Best regards, Bruno
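A related variation, sketched here as an assumption and not as the md5sum warm-up itself, is to poll until the compiled library is visible with a non-zero size before the solver starts. `wait_for_file` is a hypothetical helper; the one-second interval and the default of 10 attempts are arbitrary choices.

```shell
# Hypothetical helper: poll until a file exists with non-zero size.
# $1 = path to check, $2 = maximum number of one-second attempts (default 10).
# Returns 0 as soon as the file is readable as non-empty, 1 on timeout.
wait_for_file() {
    path=$1
    attempts=${2:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s is true only if the file exists and is larger than zero bytes
        if [ -s "$path" ]; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}
```

Run on every node before launching the solver, a non-zero exit status would indicate that the library never became visible there within the chosen time, which matches the "detected size -1" symptom reported earlier in the thread.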
|||
October 4, 2019, 08:12 |
Distributed parallel with interIsoFoam
|
#32 |
Member
Ndong-Mefane Stephane Boris
Join Date: Nov 2013
Location: Kawasaki (JAPAN)
Posts: 53
Rep Power: 13 |
Hello,
Does anyone have experience with interIsoFoam in distributed parallel? In my case the command just hangs, and the MPI process does not start (no output, no error message). I've already checked that I can access both nodes via ssh (yes, I'm trying with two nodes), so now I am really lost as to why it does not work. Kazu
|
November 2, 2022, 06:48 |
|
#33 |
Senior Member
Josh Williams
Join Date: Feb 2021
Location: Scotland
Posts: 113
Rep Power: 5 |
I had a similar issue very recently running multi-node jobs on Oracle cloud. The simulation would go fine for a few timesteps, but then it would eventually just hang indefinitely.
This was resolved by adding the additional mpi tags detailed in this blog. Maybe it is specific to Oracle, but hopefully it will help someone in the future. FYI, our setup was Bare metal Optimized3.36 nodes running OpenFOAM 6 with OpenMPI. The OS was Oracle-Linux 7.9.