Parallel Run on dynamically mounted partition
September 27, 2007, 04:54 |
#1 |
Senior Member
Fabian Braennstroem
Join Date: Mar 2009
Posts: 407
Rep Power: 19 |
Hi,

I would like to run a case in parallel which has its root on a dynamically mounted partition:

/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest

I decomposed the case in that directory and tried to run it, but somehow it looks for the information in a non-existent 'home' directory:

ceplx049/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest> mpirun --hostfile Klimakruemmer/machines -np 4 interFoam . damBreak -parallel > log &
[1] 26003
ceplx049/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest>
[2]
[2]
[2] --> FOAM FATAL IO ERROR : cannot open file
[2]
[2] file: /v/caenfs05/egb_user5/home/gcae504/damBreak/processor2/system/controlDict at line 0.
[2]
[2] From function regIOobject::readStream(const word&)
[2] in file db/regIOobject/regIOobjectRead.C at line 66.
[2] FOAM parallel run exiting
[2]
[ceplx050:20277] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 1
[ceplx049][0,1,0][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
[3]
[3]
[3] --> FOAM FATAL IO ERROR : cannot open file
[3]
[3] file: /v/caenfs05/egb_user5/home/gcae504/damBreak/processor3/system/controlDict at line 0.
[3]
[3] From function regIOobject::readStream(const word&)
[3] in file db/regIOobject/regIOobjectRead.C at line 66.
[3] FOAM parallel run exiting
[3]
[ceplx050:20278] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 26007 on node ceplx049 exited on signal 15 (Terminated).

A parallel run with its root in my 'home' directory works fine, but I have limited space there :-(

It would be nice if anybody has an idea!

Regards!
Fabian
|
September 27, 2007, 05:47 |
#2 | |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,715
Rep Power: 40 |
Hmm,

You started from the directory '/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest', and MPI is reporting that it can't find the file:

/v/caenfs05/egb_user5/home/gcae504/damBreak/processor2/system/controlDict

Check what 'mount -v' is showing and what the host is exporting (Linux: /usr/sbin/showmount -e HOST). Depending on the configuration, you might need some form of directory mapping. For some directories we use the GridEngine sge_aliases, which lets you specify stuff like this:

#subm_dir   subm_host   exec_host   path_replacement
/tmp_mnt/   *           *           /
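One way to act on this advice is to confirm, before launching, that every host in the machines file sees the case directory under the same path. The following is only a sketch under stated assumptions: `hosts_from_file` and `check_case_visible` are made-up helper names (not part of OpenFOAM, Open MPI, or GridEngine), the hostfile is assumed to list one host per line (optionally followed by `slots=N`), and password-less ssh to each host is assumed.

```shell
# Print the unique host names from an Open MPI-style hostfile:
# first field of each line, skipping blank lines and '#' comments.
hosts_from_file() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort -u
}

# Report, per host, whether the case directory exists under the same path.
# Hypothetical helper: assumes password-less ssh and a path without
# single quotes in it.
check_case_visible() {
    hostfile=$1
    case_dir=$2
    for host in $(hosts_from_file "$hostfile"); do
        if ssh -n "$host" "test -d '$case_dir'"; then
            echo "$host: ok"
        else
            echo "$host: cannot see $case_dir"
        fi
    done
}
```

A call such as `check_case_visible Klimakruemmer/machines /scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest/damBreak` would then flag any host where the automounted path is missing or mapped differently, which is where a path mapping like the sge_aliases entry above might come in.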
||
September 27, 2007, 06:08 |
#3 |
Senior Member
Fabian Braennstroem
Join Date: Mar 2009
Posts: 407
Rep Power: 19 |
Hi Mark,

yes, you are right, I started from '/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest' and the case is located in that directory too, but I was puzzled by the 'home' path it asked for, which obviously does not exist... Sorry, it works now that I run it with the complete path for the root and not just with '.'.

Thanks!
Fabian
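The fix described here, passing the complete root path instead of '.', can be sketched as a small wrapper. This is an illustration only: `case_root` and `build_cmd` are hypothetical names, and the explanation that the remote ranks resolved '.' relative to their own start directory is an assumption based on the 'home' path in the error message.

```shell
# Resolve a directory to an absolute path. 'pwd -L' reports the logical
# path as the shell followed it, without resolving symlinks.
case_root() {
    (cd "${1:-.}" >/dev/null && pwd -L)
}

# Print (not run) an OpenFOAM 1.4-style command line:
#   mpirun --hostfile <file> -np <n> <solver> <root> <case> -parallel
# Arguments: root-dir hostfile nprocs solver case-name
build_cmd() {
    root=$(case_root "$1") || return 1
    printf 'mpirun --hostfile %s -np %s %s %s %s -parallel\n' \
        "$2" "$3" "$4" "$root" "$5"
}
```

Run from the case root, `build_cmd . Klimakruemmer/machines 4 interFoam damBreak` prints the same command as in the first post, but with '.' replaced by the absolute directory, so every rank receives the same explicit path.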
|
October 2, 2007, 04:49 |
#4 |
Senior Member
Fabian Braennstroem
Join Date: Mar 2009
Posts: 407
Rep Power: 19 |
Hi,

as I mentioned before, it actually works now, but somehow I get the error message below after the first write to disk:

Time = 50
DILUPBiCG: Solving for Ux, Initial residual = 0.0224561, Final residual = 0.000388077, No Iterations 1
DILUPBiCG: Solving for Uy, Initial residual = 0.0595427, Final residual = 0.00106835, No Iterations 1
DILUPBiCG: Solving for Uz, Initial residual = 0.0407827, Final residual = 0.000722178, No Iterations 1
DICPCG: Solving for p, Initial residual = 0.758773, Final residual = 0.00727437, No Iterations 269
time step continuity errors : sum local = 0.00111858, global = 9.43786e-05, cumulative = -0.00822085
DILUPBiCG: Solving for epsilon, Initial residual = 0.0116643, Final residual = 0.000240804, No Iterations 1
DILUPBiCG: Solving for k, Initial residual = 0.0600995, Final residual = 0.000742773, No Iterations 1
ExecutionTime = 4908.29 s ClockTime = 5530 s
Time = 51
[2] --> FOAM Warning :
[5] --> FOAM Warning :
[5] From function Time::readModifiedObjects()
[5] in file db/Time/TimeIO.C at line 222
[5] Delaying reading objects due to inconsistent file time-stamps between processors
[6] --> FOAM Warning :
[8] --> FOAM Warning :
[9] --> FOAM Warning :
[9] From function Time::readModifiedObjects()
[9] in file db/Time/TimeIO.C at line 222
[9] Delaying reading objects due to inconsistent file time-stamps between processors
[2] From function Time::readModifiedObjects()
[2] in file db/Time/TimeIO.C at line 222
[2] Delaying reading objects due to inconsistent file time-stamps between processors
[3] --> FOAM Warning :
[3] From function Time::readModifiedObjects()
[3] in file db/Time/TimeIO.C at line 222
[3] Delaying reading objects due to inconsistent file time-stamps between processors
[4] --> FOAM Warning :
[4] From function Time::readModifiedObjects()
[4] in file db/Time/TimeIO.C at line 222
[4] Delaying reading objects due to inconsistent file time-stamps between processors
[6] From function Time::readModifiedObjects()
[6] in file db/Time/TimeIO.C at line 222
[6] Delaying reading objects due to inconsistent file time-stamps between processors
[7] --> FOAM Warning :
[7] From function Time::readModifiedObjects()
[7] in file db/Time/TimeIO.C at line 222
[7] Delaying reading objects due to inconsistent file time-stamps between processors
[8] From function Time::readModifiedObjects()
[8] in file db/Time/TimeIO.C at line 222
[8] Delaying reading objects due to inconsistent file time-stamps between processors

This message then appears at every time step, but the reconstruction and VTK export work well at the end. Does anyone know what kind of problem I am facing? I run these calculations over Ethernet...

Regards!
Fabian
|
October 2, 2007, 05:03 |
#5 |
Senior Member
Hrvoje Jasak
Join Date: Mar 2009
Location: London, England
Posts: 1,907
Rep Power: 33 |
Yup, the time daemon is out of sync on your machines. Either set up a timeslave to work properly or play around with:

~/.OpenFOAM-1.4.1-dev/controlDict

OptimisationSwitches
{
    fileModificationSkew 10;

Enjoy,
Hrv
__________________
Hrvoje Jasak Providing commercial FOAM/OpenFOAM and CFD Consulting: http://wikki.co.uk |
|
October 2, 2007, 11:34 |
#6 |
Senior Member
Fabian Braennstroem
Join Date: Mar 2009
Posts: 407
Rep Power: 19 |
Hi Hrvoje,

thanks! I assume the given switch tolerates a sync problem of 10 msec!?

Fabian
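To judge whether the clocks really are out of sync, and by how much, the spread between the nodes can be measured directly. The sketch below makes several assumptions: `max_skew` and `host_clocks` are hypothetical helper names, password-less ssh is available, and `date +%s` (epoch seconds, common on Linux but not required by POSIX) works on every host. On the unit asked about above: as far as the OpenFOAM user guide describes `fileModificationSkew`, the value is in seconds rather than milliseconds.

```shell
# Read one epoch timestamp (seconds) per line on stdin and print the
# difference between the latest and the earliest, in seconds.
max_skew() {
    awk 'NR == 1 { min = $1; max = $1 }
         $1 < min { min = $1 }
         $1 > max { max = $1 }
         END { if (NR) print max - min }'
}

# Print the current clock of each host given as an argument.
# Hypothetical helper: the ssh calls run one after another and take
# time themselves, so a spread of a second or two may just be noise.
host_clocks() {
    for host in "$@"; do
        ssh -n "$host" date +%s
    done
}
```

For example, `host_clocks ceplx049 ceplx050 | max_skew` prints the spread for the two nodes that appear in the logs. A result well above the configured skew would point at the time daemon; a result near zero suggests looking at the file system instead.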
|
February 28, 2008, 12:43 |
#7 |
Member
Michael Rangitsch
Join Date: Mar 2009
Location: Midland, Michigan, USA
Posts: 31
Rep Power: 17 |
Hi Hrvoje,

I've encountered the time-stamp problem as well, but it's a bit more mysterious. I'm running Xoodles on 8 cores of a single processor, so it really can't be a time daemon problem. I get the time-stamp error when reading or writing files; not all the time, but often enough to make things unpleasant. Sometimes it shows up as an inability to read a file (and OpenFOAM crashes); other times one of the processors simply doesn't write one of the files (and I get a zero-length file for whatever variable was being written). reconstructPar then fails. It's very inconsistent and does not reproduce at the same point in the execution.

Where exactly is the controlDict entry for fileModificationSkew: just in the controlDict in the system directory of my case, or elsewhere?

Thanks in advance!
Mike
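The zero-length files described here can at least be detected before reconstructPar is attempted. A minimal sketch, assuming the usual decomposed layout with `processor*` directories directly below the case directory; `empty_fields` is a hypothetical helper name, and the check only finds the symptom, it does not address the cause.

```shell
# List zero-length regular files below the processor* directories of a
# decomposed case. 'find -size 0' matches files that occupy zero blocks,
# i.e. empty files. Paths are printed relative to the case directory.
empty_fields() {
    (cd "$1" >/dev/null && find processor* -type f -size 0 | sort)
}
```

Calling `empty_fields /path/to/case` after a write step prints nothing when all field files have content; any path it does print names a field that one of the processors failed to write.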
|
February 29, 2008, 03:53 |
#8 |
Senior Member
Hrvoje Jasak
Join Date: Mar 2009
Location: London, England
Posts: 1,907
Rep Power: 33 |
Look at:

~/.OpenFOAM-1.4.1-dev/controlDict

(the path may need adjusting for your version) and search for:

OptimisationSwitches
{
    fileModificationSkew 10;

If you haven't got this, the equivalent bit in your OpenFOAM installation should be read instead (haven't checked):

/home/hjasak/OpenFOAM/OpenFOAM-1.4.1-dev/.OpenFOAM-1.4.1-dev/controlDict

Enjoy,
Hrv
__________________
Hrvoje Jasak Providing commercial FOAM/OpenFOAM and CFD Consulting: http://wikki.co.uk |
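To see which of these files actually carries the setting, the candidates can be checked in order. This is a sketch, not an OpenFOAM tool: `skew_setting` is a hypothetical helper name, the lookup order simply follows the (explicitly unchecked) description above, and the parsing assumes the entry sits on its own line as `fileModificationSkew <value>;`.

```shell
# Print "<file>: <value>" for the first readable candidate file that
# defines fileModificationSkew; return non-zero if none does.
skew_setting() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        value=$(awk '$1 == "fileModificationSkew" { sub(/;.*/, "", $2); print $2; exit }' "$f")
        if [ -n "$value" ]; then
            echo "$f: $value"
            return 0
        fi
    done
    return 1
}
```

For the version discussed here, a call might look like `skew_setting "$HOME/.OpenFOAM-1.4.1-dev/controlDict" "$HOME/OpenFOAM/OpenFOAM-1.4.1-dev/.OpenFOAM-1.4.1-dev/controlDict"`, with both paths adjusted to the local installation.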
|
March 7, 2008, 05:25 |
#9 |
Member
merrouche djemai
Join Date: Mar 2009
Location: ain-oussera, djelfa, algeria
Posts: 46
Rep Power: 17 |
Hi all!

I started to run a parallel OF 1.4.1 case on a small network (4 PCs). With past versions I used LAM/MPI without problems. Now, when I decompose the case, I can't find the corresponding files on the other nodes, and when I run mpirun (Open MPI) it fails. What should I put in the last lines of decomposeParDict? What is the problem, what is missing?

N.B. SSH works well between the different nodes.

Djemai
|
March 14, 2008, 07:33 |
#10 |
Guest
Posts: n/a
|
Hi to all,

I get the same error message as Fabian:

[27] --> FOAM Warning :
[27] From function Time::readModifiedObjects()
[27] in file db/Time/TimeIO.C at line 222
[27] Delaying reading objects due to inconsistent file time-stamps between processors
[36] --> FOAM Warning :
[48] --> FOAM Warning :
[48] From function Time::readModifiedObjects()
[48] in file db/Time/TimeIO.C at line 222
[48] Delaying reading objects due to inconsistent file time-stamps between processors
[29] --> FOAM Warning :
[29] From function Time::readModifiedObjects()
[29] in file db/Time/TimeIO.C at line 222
[29] Delaying reading objects due to inconsistent file time-stamps between processors
[37] --> FOAM Warning :
[37] From function Time::readModifiedObjects()
[37] in file db/Time/TimeIO.C at line 222
[37] Delaying reading objects due to inconsistent file time-stamps between processors
[49] --> FOAM Warning :
[30] --> FOAM Warning :
[38] --> FOAM Warning :
[38] From function Time::readModifiedObjects()
[38] in file db/Time/TimeIO.C at line 222
[38] Delaying reading objects due to inconsistent file time-stamps between processors

I checked the file ~/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/controlDict and found the section "OptimisationSwitches { fileModificationSkew 10;", which was already set to 10. So I tried changing this value to 20, but it didn't help; I get the same error message. The machine I use is a cluster of 8 nodes, each with 2 Intel Xeon quad-core processors.

Any suggestions?

Thanks,
Matteo.
|
March 14, 2008, 11:37 |
#11 |
Guest
Posts: n/a
|
Hi to all,

sorry, I forgot to tell you that I'm working on a mesh with 15,000,000 cells. The same case with a coarser mesh (4,000,000 cells) doesn't give me any problems.

Thanks,
Matteo
|
October 4, 2010, 06:58 |
error while running in parallel on multi-cpus across nodes
|
#12 |
New Member
Srikara Mahishi
Join Date: Mar 2009
Location: Bangalore
Posts: 22
Rep Power: 17 |
Hi All,

While running a case in parallel I get the following error:

[error listing missing from the archived post]

Thank you in advance,
Srikara
|
October 4, 2010, 14:56 |
|
#13 |
Senior Member
Fabian Braennstroem
Join Date: Mar 2009
Posts: 407
Rep Power: 19 |
Hi,

I have seen this quite frequently over the last month as well and assume that it is an NFS error, as Mark mentioned a long time ago. Unfortunately, I have no idea how to get rid of it... :-( It would be great if you have an idea!

Fabian
|
October 5, 2010, 08:41 |
|
#14 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings to all!

It's not the first time I've seen reports about this issue with NFS, but I've never been able to reproduce the error myself to try and figure out the proper solution. I did post an idea on how to fix this NFS issue a while back, but didn't get a reply about it specifically:

[quoted text missing from the archived post]

So, if you guys can test these theories, perhaps we can get to the bottom of this problem!

Best regards,
Bruno
October 5, 2010, 15:43 |
|
#15 | |
Senior Member
Fabian Braennstroem
Join Date: Mar 2009
Posts: 407
Rep Power: 19 |
Hello Bruno,

thanks for the advice! I will check our settings again... it would be great if this works. As the error occurs only occasionally, it might take some days before I can give you feedback.

Thanks!
Fabian