|
May 11, 2012, 08:56 |
parallel simpleFoam freezes the whole system
|
#1 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
Dear all,
I am trying to run a parallel simpleFoam case, but I am encountering serious problems. Nine times out of ten, starting the simulation leads to a complete freeze of my system (Fedora Core 12). It seems that one (random) rank gets a segmentation violation. I have placed the messages I get below. I would appreciate any help!
Thanks!
Vangelis
Code:
[vangelis@midas OF]$ mpirun -np 12 simpleFoam -case tetra_layers -parallel > log.txt &
[1] 2758
[vangelis@midas OF]$ gnuplot residuals -

Message from syslogd@localhost at May 11 13:57:11 ...
kernel: ------------[ cut here ]------------
Message from syslogd@localhost at May 11 13:57:11 ...
kernel: invalid opcode: 0000 [#1] SMP
Message from syslogd@localhost at May 11 13:57:11 ...
kernel: last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
Message from syslogd@localhost at May 11 13:57:11 ...
kernel: Stack:
Message from syslogd@localhost at May 11 13:57:11 ...
kernel: Call Trace:
Message from syslogd@localhost at May 11 13:57:11 ...
kernel: Code: 48 89 45 a0 4c 89 ff e8 75 c1 2a 00 41 8b b6 58 03 00 00 4c 89 e7 ff c6 e8 cf bb ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01

--------------------------------------------------------------------------
mpirun noticed that process rank 7 with PID 2766 on node midas exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
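When a rank dies with signal 11 like this, two generic Linux diagnostics usually give the fastest first clues, and neither needs root. This is a sketch of the usual first response, not OpenFOAM-specific advice:

```shell
# Enable core dumps in the launching shell so a crashing MPI rank leaves
# a core file that gdb can inspect afterwards.
ulimit -c unlimited
ulimit -c    # reports "unlimited" if the change took effect

# After a crash, the kernel log usually names the faulting process:
#   dmesg | tail -n 20
#   gdb simpleFoam core.<pid>
```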
|
May 11, 2012, 09:37 |
|
#2 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings Vangelis,
That's very little information about your system. So here are a few questions:
Bruno
__________________
|
|
May 11, 2012, 09:47 |
|
#3 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
Dear Bruno,
Thank you for your reply! Please give me some time to collect the answers to the questions you have (rightly) addressed to me.
Best regards,
Vangelis |
|
May 14, 2012, 05:09 |
|
#4 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
Dear Bruno,
Here are some details on the problem. (By the way, I have just managed to make it run without changing anything; the problem is random but happens very often.)
1) OpenFOAM version 2.0
2) gcc 4.4.4
3) OpenMPI 1.4.1
4) Single machine with two Xeon X5680 3.33 GHz processors, hyperthreading enabled: 12 physical cores, 24 virtual
5) 96 GB of RAM, and I am only solving a problem with around 8 million cells
6) I have only tried 12 processors
Hope this helps.
Best regards,
Vangelis |
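For scale, a back-of-the-envelope memory check; the figure of ~1 kB per cell is a rough assumption for an incompressible solver like simpleFoam, not a measured value:

```shell
# rough RAM estimate: cells x kB-per-cell, spread over the MPI ranks
CELLS=8000000
KB_PER_CELL=1                     # assumption, order of magnitude only
RANKS=12
TOTAL_GB=$(( CELLS * KB_PER_CELL / 1024 / 1024 ))
echo "approx ${TOTAL_GB} GB in total, $(( CELLS / RANKS )) cells per rank"
```

At roughly 7 GB against 96 GB installed, raw memory capacity is clearly not the bottleneck here; a faulty DIMM, however, still could be.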
|
May 14, 2012, 05:46 |
|
#5 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Hi Vangelis,
OK, which exact version of OpenFOAM 2.0: 2.0.0, 2.0.1 or 2.0.x? It would be ideal to upgrade at least to 2.0.x, or even to 2.1.0 or 2.1.x, because a lot has been fixed since 2.0.0 and 2.0.1 were released.
Best regards,
Bruno
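With a sourced OpenFOAM environment, the exact version and MPI selection can be read from the standard WM_* variables (this assumes a stock OpenFOAM bashrc setup):

```shell
# print the exact OpenFOAM version and MPI choice of the active shell;
# the fallback text appears if the OpenFOAM environment is not sourced
echo "OpenFOAM version: ${WM_PROJECT_VERSION:-<environment not sourced>}"
echo "MPI selection:    ${WM_MPLIB:-<environment not sourced>}"
```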
__________________
|
|
May 14, 2012, 11:58 |
|
#6 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
Hi Bruno
I was using 2.0.0. I have just switched to 2.1.0, but unfortunately, again in a random fashion, parallel runs tend to freeze my system. When this happens, OF immediately reports something like:

mpirun noticed that process rank 7 with PID 2766 on node midas exited on signal 11 (Segmentation fault).

I have had to hardware-reset my system too many times lately. Any suggestions?
Best regards,
Vangelis |
|
May 14, 2012, 12:13 |
|
#7 |
Senior Member
Martin
Join Date: Oct 2009
Location: Aachen, Germany
Posts: 255
Rep Power: 22 |
Hi Vangelis,
I had similar problems with a dual-Xeon system a while ago: the memory chips got too hot. I opened the case and used simple fans to cool down the memory banks, and the simulations were stable. In the end I installed dedicated memory coolers and everything was fine.
Martin |
|
May 14, 2012, 14:39 |
|
#8 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
I am not sure if this is also related:
http://www.cfd-online.com/Forums/ope...taneously.html |
|
May 14, 2012, 15:18 |
|
#9 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
May 14, 2012, 18:16 |
|
#10 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Hi Vangelis,
Well, with the current information:
Bruno
__________________
|
|
May 15, 2012, 06:48 |
|
#11 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
Thank you all for your replies.
I do not think it is a memory temperature issue, as I have run parallel Fluent simulations on this machine for days without a problem. My swap size is 8 GB, so that should not be the issue either. I saw in another post that I may have to make this change
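For reference, the "Min Buffer Size" change discussed in that thread maps to the MPI_BUFFER_SIZE environment variable, which OpenFOAM's Pstream layer reads at startup to size the attached MPI buffer. A sketch, assuming a standard install; the 200000000 value is illustrative, not a recommendation:

```shell
# enlarge the MPI attach buffer before launching the solver
# (assumption: OpenFOAM's Pstream reads MPI_BUFFER_SIZE at startup)
export MPI_BUFFER_SIZE=200000000
echo "MPI_BUFFER_SIZE=${MPI_BUFFER_SIZE}"
```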
|
|
May 15, 2012, 10:28 |
|
#12 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
Changing the Min Buffer Size did not help.
It cannot be a temperature issue, as OpenFOAM freezes the system immediately upon startup. This is very frustrating indeed. |
|
May 15, 2012, 11:11 |
|
#13 |
Senior Member
Martin
Join Date: Oct 2009
Location: Aachen, Germany
Posts: 255
Rep Power: 22 |
Hi Vangelis,
Another idea: disable the CPU power-management options in the BIOS, such as Intel SpeedStep, Turbo Mode and C1E. From your first posting it looks as if the frequency scaling is what fails (the last sysfs file touched was scaling_cur_freq). Or bind the processes to specific CPU cores, so that they do not hop from core to core, with: Code:
numactl --physcpubind=0,2,4,6,8,10,12,14,16,18,20,22 mpirun -np 12 simpleFoam -case tetra_layers -parallel > log.txt &
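Whether frequency scaling is active at all can be checked without root via the standard Linux cpufreq sysfs interface (the path below is the usual location, but it may be absent on some kernels):

```shell
# read the scaling governor of CPU 0: "ondemand"/"powersave" means the
# clock is changed at runtime, "performance" means it is pinned at maximum
GOV_FILE=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
if [ -r "$GOV_FILE" ]; then
    GOV=$(cat "$GOV_FILE")
else
    GOV="cpufreq interface not available"
fi
echo "$GOV"
```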
|
May 15, 2012, 17:00 |
|
#14 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Hi Martin and Vangelis,
@Martin: mpirun is meant to come first, right? Code:
mpirun -np 12 numactl --physcpubind=0,2,4,6,8,10,12,14,16,18,20,22 simpleFoam -case tetra_layers -parallel > log.txt &
Best regards, Bruno
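As an aside, Open MPI of that era (1.4/1.5) can also do the binding itself via its --bind-to-core option, which sidesteps the numactl ordering question entirely. The snippet only assembles the command string for illustration rather than launching the solver:

```shell
# let mpirun bind each rank to its own core (Open MPI 1.4/1.5 flag);
# assembled as a string here, since the solver is not actually run
CMD="mpirun -np 12 --bind-to-core simpleFoam -case tetra_layers -parallel"
echo "$CMD"
```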
__________________
|
|
May 16, 2012, 06:12 |
|
#15 |
Senior Member
Vangelis Skaperdas
Join Date: Mar 2009
Location: Thessaloniki, Greece
Posts: 287
Rep Power: 21 |
Dear Bruno and Martin,
Thank you very much for your help. I do not think I have a system MPI: the output of echo $WM_MPLIB is OPENMPI. When I run OF 2.0, the MPI is 1.4.1; when I run OF 2.1, the MPI is 1.5.3, as you have stated.
It seems the problem is the CPU frequency setting, which is "ondemand" and which I cannot change, as I do not have admin privileges on my PC. As a workaround, I run something that pushes the CPU frequency up to 3.33 GHz just before executing the mpirun of OpenFOAM, and, up to now at least, it runs without freezing the system.
Vangelis |
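The workaround can be sketched as a short busy-spin warm-up: keep all cores busy for a couple of seconds so the ondemand governor ramps the clocks to maximum just before mpirun starts. The core count and duration here are arbitrary, illustrative choices:

```shell
# spin 12 busy loops for ~2 s so "ondemand" raises the CPU frequency,
# then launch the real solver immediately afterwards
START=$(date +%s)
for i in $(seq 1 12); do
    ( end=$(( START + 2 ))
      while [ "$(date +%s)" -lt "$end" ]; do :; done ) &
done
wait
ELAPSED=$(( $(date +%s) - START ))
echo "warm-up complete after ${ELAPSED}s; launch mpirun now"
```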
|
|
|