|
May 4, 2010, 14:06 |
stop when I run in parallel
|
#1 |
New Member
Nolwenn
Join Date: Apr 2010
Posts: 26
Rep Power: 16 |
Hello everyone,
When I run a case in parallel it stops (or sometimes it succeeds) without any error message. It seems to be busy (all CPUs at 100%) but there is no progress. It happens at the beginning or later on, like a random error. I'm using OpenFOAM 1.6.x on Ubuntu 9.10 with gcc 4.4.1 as the compiler. I have no problem when I run a case on a single processor. Does anyone have an idea of what is happening? Here is a case which runs and then stops; I only modified the number of processors from the tutorial case. Thank you for your help. Nolwenn
Code:
OpenFOAM sourced
mecaflu@monarch01:~$ cd OpenFOAM/mecaflu-1.6.x/run/damBreak/
mecaflu@monarch01:~/OpenFOAM/mecaflu-1.6.x/run/damBreak$ mpirun -np 6 interFoam -parallel
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build   : 1.6.x-605bfc578b21
Exec    : interFoam -parallel
Date    : May 04 2010
Time    : 18:46:25
Host    : monarch01
PID     : 23017
Case    : /media/teradrive01/mecaflu-1.6.x/run/damBreak
nProcs  : 6
Slaves  : 5 ( monarch01.23018 monarch01.23019 monarch01.23020 monarch01.23021 monarch01.23022 )
Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe  : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

Reading g
Reading field p
Reading field alpha1
Reading field U
Reading/calculating face flux field phi
Reading transportProperties
Selecting incompressible transport model Newtonian
Selecting incompressible transport model Newtonian
Selecting turbulence model type laminar
time step continuity errors : sum local = 0, global = 0, cumulative = 0
DICPCG:  Solving for pcorr, Initial residual = 0, Final residual = 0, No Iterations 0
time step continuity errors : sum local = 0, global = 0, cumulative = 0
Courant Number mean: 0 max: 0

Starting time loop
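For reference, the usual sequence for a parallel damBreak run looks roughly like the sketch below. This is a generic outline of the standard tutorial workflow, not taken verbatim from this case, and the paths are only examples:
Code:
# minimal sketch of the standard damBreak parallel workflow (paths are examples)
cd ~/OpenFOAM/mecaflu-1.6.x/run/damBreak

blockMesh        # build the mesh
setFields        # initialise the alpha1 field for the dam break

# system/decomposeParDict must request the same number of subdomains
# as the -np value passed to mpirun (6 here)
decomposePar     # split the case into processor0..processor5

mpirun -np 6 interFoam -parallel    # run the solver on 6 processes

reconstructPar   # merge the processor directories when the run is done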
|
May 6, 2010, 01:19 |
|
#2 |
New Member
Jay
Join Date: Feb 2010
Posts: 15
Rep Power: 16 |
Hi Nolwenn
I want to ask: are you using distributed parallelization, or are you running mpirun -np 6 ..... on a single machine? Regards, Jay
|
May 6, 2010, 04:36 |
|
#3 |
New Member
Nolwenn
Join Date: Apr 2010
Posts: 26
Rep Power: 16 |
Hi Jay,
I am using a single machine with mpirun -np 6 interFoam -parallel. When I run with 2 processors it seems to get through more iterations than with 4 or more... Regards, Nolwenn
|
May 6, 2010, 21:40 |
|
#4 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings Nolwenn,
It could be a memory issue. OpenFOAM is known to crash and/or freeze Linux boxes when there isn't enough memory. Check this post (or the whole thread it's in) for more on it: mpirun problems post #3. Also, try the parallelTest utility - information is available in this post: OpenFOAM updates post #19. The parallelTest utility (part of OpenFOAM's test utilities) can help you sort out the more basic MPI problems - communication failures, missing environment settings, libraries not found - without running any particular solver functionality. For example, for some weird reason there might be something missing from the mpirun command that the 6 cores need to work properly together! Best regards, Bruno
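If parallelTest is not already compiled, it can be built from the test sources shipped with OpenFOAM. The sketch below is from memory, so the exact directory name may differ between versions:
Code:
# sketch only: build and run the parallelTest utility
# (the test sources are assumed to live under applications/test/parallel in 1.6.x)
cd $WM_PROJECT_DIR/applications/test/parallel
wmake                        # compiles parallelTest into the OpenFOAM binary directory

# then run it from the decomposed case, with the same -np as the solver run
cd ~/OpenFOAM/mecaflu-1.6.x/run/damBreak
mpirun -np 6 parallelTest -parallel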
|
|
May 7, 2010, 05:33 |
|
#5 |
New Member
Nolwenn
Join Date: Apr 2010
Posts: 26
Rep Power: 16 |
Hello Bruno,
Thank you for your answer. I ran parallelTest and obtained this: Code:
Executing: mpirun -np 6 /home/mecaflu/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log
[0] Starting transfers
[0]
[0] master receiving from slave 1
[0] (0 1 2)
[0] master receiving from slave 2
[0] (0 1 2)
[0] master receiving from slave 3
[0] (0 1 2)
[0] master receiving from slave 4
[0] (0 1 2)
[0] master receiving from slave 5
[0] (0 1 2)
[0] master sending to slave 1
[0] master sending to slave 2
[0] master sending to slave 3
[0] master sending to slave 4
[0] master sending to slave 5
[1] Starting transfers
[1]
[1] slave sending to master 0
[1] slave receiving from master 0
[1] (0 1 2)
[2] Starting transfers
[2]
[2] slave sending to master 0
[2] slave receiving from master 0
[2] (0 1 2)
[3] Starting transfers
[3]
[3] slave sending to master 0
[3] slave receiving from master 0
[3] (0 1 2)
[4] Starting transfers
[4]
[4] slave sending to master 0
[4] slave receiving from master 0
[4] (0 1 2)
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build   : 1.6.x-605bfc578b21
Exec    : parallelTest -parallel
Date    : May 07 2010
Time    : 10:09:41
Host    : monarch01
PID     : 4344
Case    : /media/teradrive01/mecaflu-1.6.x/run/mine/7
nProcs  : 6
Slaves  : 5 ( monarch01.4345 monarch01.4346 monarch01.4347 monarch01.4348 monarch01.4349 )
Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe  : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

End

Finalising parallel run
[5] Starting transfers
[5]
[5] slave sending to master 0
[5] slave receiving from master 0
[5] (0 1 2)

I have 8 GiB of memory and 3 GiB of swap so memory seems to be ok! Best regards Nolwenn
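To double-check the memory hypothesis while a run is hanging, something like the following can be left running in a second terminal (ordinary Linux tools, nothing OpenFOAM-specific; a suggestion rather than something actually done in this thread):
Code:
# watch overall memory and swap usage, refreshed every second
watch -n 1 free -m

# or periodically log the solver processes' memory consumption to a file
top -b -d 5 | grep interFoam >> mem.log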
|
May 7, 2010, 15:34 |
|
#6 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings Nolwenn,
That is a strange output... it seems a bit out of sync. It has happened to me once some time ago, but the OpenFOAM header always came first! Doesn't the foamJob script work for you? Or does it output the exact same thing? Another possibility is that it could actually reveal a bug in OpenFOAM! So, how did you decompose the domain for each processor? Best regards, Bruno
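For reference, foamJob is the wrapper script in OpenFOAM's bin directory; its usage is roughly as follows (options quoted from memory - check foamJob -help):
Code:
# run the solver in parallel via the wrapper script: -p reads numberOfSubdomains
# from system/decomposeParDict and assembles the mpirun command itself,
# -s additionally echoes the output to the screen
foamJob -s -p interFoam

# the output also goes to a file called "log" in the case directory
tail -f log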
|
|
May 10, 2010, 04:57 |
|
#7 |
New Member
Nolwenn
Join Date: Apr 2010
Posts: 26
Rep Power: 16 |
Hello Bruno,
Here is the result of foamJob; I can't find much information in it! Code:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build   : 1.6.x-605bfc578b21
Exec    : parallelTest -parallel
Date    : May 07 2010
Time    : 10:09:41
Host    : monarch01
PID     : 4344
Case    : /media/teradrive01/mecaflu-1.6.x/run/mine/7
nProcs  : 6
Slaves  : 5 ( monarch01.4345 monarch01.4346 monarch01.4347 monarch01.4348 monarch01.4349 )
Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe  : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

End

Finalising parallel run
Code:
// The FOAM Project // File: decomposeParDict
/*
-------------------------------------------------------------------------------
 =========         | dictionary
 \\      /         |
  \\    /          | Name:           decomposeParDict
   \\  /           | Family:         FoamX configuration file
    \\/            |
    F ield         | FOAM version:   2.1
    O peration     | Product of Nabla Ltd.
    A and          |
    M anipulation  | Email: Enquiries@Nabla.co.uk
-------------------------------------------------------------------------------
*/
// FoamX Case Dictionary.

FoamFile
{
    version         2.0;
    format          ascii;
    class           dictionary;
    object          decomposeParDict;
}

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 6;

method          hierarchical;
//method          metis;
//method          parMetis;

simpleCoeffs
{
    n               (2 1 2);
    delta           0.001;
}

hierarchicalCoeffs
{
    n               (3 1 2);
    delta           0.001;
    order           xyz;
}

manualCoeffs
{
    dataFile        "cellDecomposition";
}

metisCoeffs
{
    //n               (5 1 1);
    //cellWeightsFile "constant/cellWeightsFile";
}

// ************************************************************************* //

I use the gcc compiler; is it possible that another compiler would solve this? Thank you for your help, Bruno! Best regards, Nolwenn
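One thing worth keeping in mind about that dictionary (an observation, not a confirmed cause of the hang): with the hierarchical method, numberOfSubdomains has to equal the product of the three entries of hierarchicalCoeffs.n, otherwise decomposePar will complain. A hypothetical 8-way decomposition, for example, would look like this:
Code:
// hypothetical edit for an 8-way decomposition: 2*2*2 = 8 must match
// numberOfSubdomains
numberOfSubdomains 8;

method          hierarchical;

hierarchicalCoeffs
{
    n               (2 2 2);
    delta           0.001;
    order           xyz;
}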
|
May 10, 2010, 23:58 |
|
#8 |
Member
Scott
Join Date: Sep 2009
Posts: 44
Rep Power: 17 |
How many processors or cores does your machine have?
I would presume that if you have 8 GB you probably only have a quad-core machine, hence I would only partition the domain into 4 volumes. If you have a dual-core machine then that would explain why it is OK with 2 processors, because that is all you have. Please post your machine specs so that we can try to be more helpful. Cheers, Scott
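A quick way to check is with standard Linux commands, for example:
Code:
# count the logical processors the kernel sees
grep -c ^processor /proc/cpuinfo

# show how they group into physical packages and cores
grep -E 'physical id|cpu cores' /proc/cpuinfo | sort | uniq -c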
|
May 11, 2010, 05:28 |
|
#9 |
New Member
Nolwenn
Join Date: Apr 2010
Posts: 26
Rep Power: 16 |
Hello Scott!
I have 8 processors on my machine; I tried to find the specs: Code:
r3@monarch01:~$ cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 865 stepping : 0 cpu MHz : 1799.670 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy bogomips : 3599.34 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp processor : 1 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 865 stepping : 0 cpu MHz : 1799.670 cache size : 1024 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy bogomips : 3600.11 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp processor : 2 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 865 stepping : 0 cpu MHz : 1799.670 cache size : 1024 KB physical id : 1 siblings : 2 core id : 0 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy bogomips : 3600.10 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp processor : 3 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 865 stepping : 0 cpu MHz : 1799.670 cache size : 1024 KB physical id : 1 siblings : 2 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy bogomips : 3600.11 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp processor : 4 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 865 stepping : 0 cpu MHz : 1799.670 cache size : 1024 KB physical id : 2 siblings : 2 core id : 0 cpu cores : 2 apicid : 4 initial apicid : 4 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy bogomips : 3600.10 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp processor : 5 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 865 stepping : 0 cpu MHz 
: 1799.670 cache size : 1024 KB physical id : 2 siblings : 2 core id : 1 cpu cores : 2 apicid : 5 initial apicid : 5 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy bogomips : 3600.10 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp processor : 6 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 865 stepping : 0 cpu MHz : 1799.670 cache size : 1024 KB physical id : 3 siblings : 2 core id : 0 cpu cores : 2 apicid : 6 initial apicid : 6 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy bogomips : 3600.10 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp processor : 7 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 865 stepping : 0 cpu MHz : 1799.670 cache size : 1024 KB physical id : 3 siblings : 2 core id : 1 cpu cores : 2 apicid : 7 initial apicid : 7 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy bogomips : 3600.11 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp Best regards Nolwenn |
|
May 11, 2010, 18:34 |
|
#10 |
Member
Scott
Join Date: Sep 2009
Posts: 44
Rep Power: 17 |
Have you tried with all 8 processors?
I don't have this problem on my machine when I use all of the processors. Make sure you use decomposePar to get 8 partitions before you try. Scott
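In practice that means redoing the decomposition before launching, along these lines (a sketch that assumes the decomposeParDict shown earlier has been updated for 8 subdomains):
Code:
# system/decomposeParDict: set numberOfSubdomains to 8 (and, for hierarchical,
# an n vector whose product is 8) before running this
rm -rf processor*                  # clear the old 6-way split
decomposePar
mpirun -np 8 interFoam -parallel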
|
May 11, 2010, 18:42 |
|
#11 |
Member
Scott
Join Date: Sep 2009
Posts: 44
Rep Power: 17 |
Also, are these 8 processes all on the same machine or are they spread over different machines? I.e., is it a small cluster?
I haven't done this on a cluster setup before, so I can't be of any help with that. I was assuming that you had two quad-core processors on a single motherboard, but I just went through it again and it's either 8 dual-core processors, or 4 dual-core processors reporting a process for each core. Can you confirm exactly what it is, and maybe someone else can help you? If it's a cluster then you may have load issues, interconnect problems, or questionable installations on the other machines. Cheers, Scott
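For completeness, if it did turn out to be a small cluster rather than one box, the processes would normally be spread over the nodes with a hostfile - roughly like this, using Open MPI syntax with made-up host names:
Code:
# hypothetical hostfile "machines" describing two 8-core nodes:
#   node1 slots=8
#   node2 slots=8
# then launch across both nodes:
mpirun -np 16 --hostfile machines interFoam -parallel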
|
May 12, 2010, 05:34 |
|
#12 |
New Member
Nolwenn
Join Date: Apr 2010
Posts: 26
Rep Power: 16 |
Sorry, I am not very familiar with machine specs!
It is a single machine with 4 dual-core processors. When I run with all processors the problem is the same: Code:
Parallel processing using OPENMPI with 8 processors Executing: mpirun -np 8 /home/mecaflu/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log [0] Starting transfers [0] [0] master receiving from slave 1 [0] (0 1 2) [0] master receiving from slave 2 [0] (0 1 2) [0] master receiving from slave 3 [0] (0 1 2) [0] master receiving from slave 4 [0] (0 1 2) [0] master receiving from slave 5 [0] (0 1 2) [0] master receiving from slave 6 [0] (0 1 2) [0] master receiving from slave 7 [0] (0 1 2) [0] master sending to slave 1 [0] master sending to slave 2 [0] master sending to slave 3 [0] master sending to slave 4 [0] master sending to slave 5 [0] master sending to slave 6 [0] master sending to slave 7 [1] Starting transfers [1] [1] slave sending to master 0 [1] slave receiving from master 0 [1] (0 1 2) [2] Starting transfers [2] [2] slave sending to master 0 [2] slave receiving from master 0 [2] (0 1 2) [3] Starting transfers [3] [3] slave sending to master 0 [3] slave receiving from master 0 [3] (0 1 2) [4] Starting transfers [4] [4] slave sending to master 0 [4] slave receiving from master 0 [4] (0 1 2) [5] Starting transfers [5] [5] slave sending to master 0 [5] slave receiving from master 0 [5] (0 1 2) [6] Starting transfers [6] [6] slave sending to master 0 [6] slave receiving from master 0 [6] (0 1 2) [7] Starting transfers [7] [7] slave sending to master 0 [7] slave receiving from master 0 [7] (0 1 2) /*---------------------------------------------------------------------------*\ | ========= | | | \\ / F ield | OpenFOAM: The Open Source CFD Toolbox | | \\ / O peration | Version: 1.6.x | | \\ / A nd | Web: www.OpenFOAM.org | | \\/ M anipulation | | \*---------------------------------------------------------------------------*/ Build : 1.6.x-605bfc578b21 Exec : parallelTest -parallel Date : May 12 2010 Time : 10:22:27 Host : monarch01 PID : 4894 Case : /media/teradrive01/mecaflu-1.6.x/run/mine/9 nProcs : 8 Slaves : 7 ( monarch01.4895 monarch01.4896 monarch01.4899 monarch01.4919 monarch01.4922 monarch01.4966 monarch01.4980 ) Pstream initialized with: floatTransfer : 0 nProcsSimpleSum : 0 commsType : nonBlocking SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE). // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * // Create time End Finalising parallel run Best regards Nolwenn |
|
May 12, 2010, 06:03 |
|
#13 |
Member
Björn Fabritius
Join Date: Mar 2009
Location: Freiberg, Germany
Posts: 31
Rep Power: 17 |
I encounter the same problem as Nolwenn. I use 12 cores on a single machine. parallelTest works fine and prints its results in a reasonable order, but when I run foamJob the computation hangs while solving the first UEqn. All cores are at 100% but nothing is happening.
Solver: simpleFoam, case: pitzDaily, decomposition: simple, OpenFOAM 1.6.x. Here is the output from mpirun -H localhost -np 12 simpleFoam -parallel Code:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build   : 1.6.x-1d1db32a12b0
Exec    : simpleFoam -parallel
Date    : May 12 2010
Time    : 09:31:45
Host    : brahms
PID     : 11694
Case    : /home/fabritius/OpenFOAM/OpenFOAM-1.6.x/tutorials/incompressible/simpleFoam/pitzDaily
nProcs  : 12
Slaves  : 11 ( brahms.11695 brahms.11696 brahms.11697 brahms.11698 brahms.11699 brahms.11700 brahms.11701 brahms.11702 brahms.11703 brahms.11704 brahms.11705 )
Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe  : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

Reading field p
Reading field U
Reading/calculating face flux field phi
Selecting incompressible transport model Newtonian
Selecting RAS turbulence model kEpsilon
kEpsilonCoeffs
{
    Cmu             0.09;
    C1              1.44;
    C2              1.92;
    sigmaEps        1.3;
}

Starting time loop

Time = 1

I tried a verbose mode of mpirun but that delivered no useful information either. Unfortunately I have no profiling tools for parallel code at hand. If anyone of you has Vampir or something similar and could try this out, that would be great.
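One thing that sometimes helps narrow down a hang like this (a suggestion on my part, not something verified in this thread) is to force Open MPI to use only its loopback, shared-memory and TCP transports, which rules out InfiniBand/OpenIB driver trouble on a single box:
Code:
# restrict Open MPI to the self, shared-memory and TCP transports
# (the --mca flag is Open MPI specific; adjust if another MPI is in use)
mpirun --mca btl self,sm,tcp -H localhost -np 12 simpleFoam -parallel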
|
May 15, 2010, 13:12 |
|
#14 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings to all,
Well, this is quite odd. The only solution that comes to mind is to test the same working conditions under other build scenarios, because the only reason I can think of for the solvers to just jam up without doing anything productive is that something didn't get built the way it is supposed to be. As for the output of parallelTest coming out with the outputs swapped, it could be a stdout buffering issue, where mpirun prints the text from the slaves before the master's output because the master's output didn't fill up fast enough to trigger the character limit that forces a flush. Best regards, Bruno
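If it is just buffering, mpirun itself can usually be asked to label or separate the per-rank output, which makes the ordering harmless. The options below are Open MPI's, quoted from memory:
Code:
# prefix every output line with the rank that produced it
mpirun --tag-output -np 6 parallelTest -parallel

# or write each rank's output to its own file instead of interleaving them
mpirun --output-filename ptest.log -np 6 parallelTest -parallel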
|
|
May 17, 2010, 04:50 |
|
#15 |
New Member
Nolwenn
Join Date: Apr 2010
Posts: 26
Rep Power: 16 |
Hello everyone,
Now everything seems to be OK for me! I went back to OF 1.6 (prebuilt) with Ubuntu 8.04 and I no longer have the problem. Thank you again for your help, Bruno! Cheers, Nolwenn
|
May 27, 2010, 16:29 |
|
#16 |
New Member
Gonzalo Tampier
Join Date: Apr 2009
Location: Berlin, Germany
Posts: 9
Rep Power: 17 |
Hello everyone,
I'm experiencing the very same problem with openSUSE. I tried the pre-compiled 1.6 version and it worked! My problem arises again when I recompile openmpi. I do this in order to add the Torque (batch system) and OFED options. Since we have a small cluster, these options are necessary for running cases on more than one node. Even if I recompile openmpi without these options (and just recompile it, nothing else), I get the same problem! (Calculations stop, sometimes earlier, sometimes later and sometimes right at the beginning, without any error message and keeping all CPUs at 100%.) This is quite strange - I would be glad if someone has further ideas... I'll keep you informed if I make some progress. Regards, Gonzalo
|
May 27, 2010, 21:41 |
|
#17 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings Gonzalo,
Let's see... here are my questions for you:
The easiest way to avoid these issues would be to use the same distro versions that the pre-built binaries came from - namely, if I'm not mistaken, Ubuntu 9.04 and openSUSE 11.0 or 11.1, because they have gcc 4.3.3 as their system compiler. Best regards, Bruno
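Before rebuilding, it can also be worth confirming which compiler and which MPI the OpenFOAM environment actually picks up - for instance (foamInstallationTest ships with OpenFOAM 1.6; the rest are plain shell commands):
Code:
# confirm the toolchain the current shell resolves to
gcc --version
which mpirun && mpirun --version
echo $WM_COMPILER $WM_MPLIB

# OpenFOAM's own sanity check of paths, compiler and MPI settings
foamInstallationTest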
|
|
May 28, 2010, 04:09 |
|
#18 |
New Member
Gonzalo Tampier
Join Date: Apr 2009
Location: Berlin, Germany
Posts: 9
Rep Power: 17 |
Hello Bruno, hello all,
Thanks for your comments. I have now compiled openmpi again and it worked! I was first trying to compile it with the system gcc (4.4.1) of openSUSE 11.2, which apparently caused the problems. Now I've tried it again with the ThirdParty gcc (4.3.3) and it works! In both cases I compiled it with Allwmake from the ThirdParty-1.6 directory, after uncommenting the openib and openib-libdir options and adding the --with-tm option for Torque. Then I deleted the openmpi-1.3.3/platform dir and executed Allwmake in ThirdParty-1.6. After this it wasn't necessary to recompile OpenFOAM. I have now run first tests with 2 nodes and a total of 16 processes (the finer damBreak tutorial) and it seems to work fine! It still seems strange to me, since I did the same for 1.6.x and it didn't work! I'll try the system compiler for both OpenFOAM-1.6.x and ThirdParty when I have more time. Thanks again! Gonzalo
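For anyone trying to reproduce this, the sequence described above boils down to roughly the following. The paths and option names are my reading of the post (the lines to uncomment live in the ThirdParty Allwmake script), so treat it as a sketch rather than an exact recipe:
Code:
cd $WM_THIRD_PARTY_DIR            # the ThirdParty-1.6 directory

# edit Allwmake here: uncomment the openib / openib-libdir configure options
# and add --with-tm=<torque prefix> for Torque support, as described above

rm -rf openmpi-1.3.3/platform*    # discard the previous Open MPI build output
./Allwmake                        # rebuild Open MPI with the ThirdParty gcc 4.3.3

# OpenFOAM itself did not need to be recompiled afterwards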
|
May 28, 2010, 16:39 |
parallel problem
|
#19 |
Member
Join Date: Mar 2010
Posts: 31
Rep Power: 16 |
Hi,
I've got a problem running a case in parallel (one machine, quad core). I'm using the OpenFOAM 1.6 prebuilt binaries on Fedora 12. The error I get is:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build   : 1.6-f802ff2d6c5a
Exec    : interFoam -parallel
Date    : May 28 2010
Time    : 12:27:10
Host    : blue
PID     : 23136
Case    : /home/bunni/OpenFOAM/OpenFOAM-1.6/tutorials/quartcyl
nProcs  : 2
Slaves  : 1 ( blue.23137 )
Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe  : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

[blue:23137] *** An error occurred in MPI_Bsend
[blue:23137] *** on communicator MPI_COMM_WORLD
[blue:23137] *** MPI_ERR_BUFFER: invalid buffer pointer
[blue:23137] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 23137 on node blue exiting
without calling "finalize". This may have caused other processes in the
application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[blue:23135] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[blue:23135] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
So I take it the program is crashing in the mesh part? It runs fine on a single processor (and another geometry I had ran fine in parallel). I've meshed a quarter of a cylinder, with the cylinder aligned on the z-axis, and done a simple decomposition along the z-axis, thinking that the circular geometry might be causing the problem. Above, Bruno mentioned the scripts runParallel and parallelTest. Where are those scripts? Cheers
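Not a diagnosis, just a hedged suggestion: MPI_Bsend failing with MPI_ERR_BUFFER is often a symptom of the buffered-send buffer being too small, and OpenFOAM takes its size from the MPI_BUFFER_SIZE environment variable (set, as far as I recall, in etc/settings.sh). So one cheap thing to try before blaming the mesh is to enlarge it:
Code:
# enlarge the buffered-send buffer OpenFOAM passes to MPI before rerunning
# (the value is in bytes and is only an example)
export MPI_BUFFER_SIZE=200000000
mpirun -np 2 interFoam -parallel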
|
May 28, 2010, 22:41 |
|
#20 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings bunni,
Quote:
Quote:
Best regards, Bruno
|
|
|