|
mpirun problems: exited on signal 11 (segmentation fault) |
|
May 4, 2010, 09:01 |
mpirun problems: exited on signal 11 (segmentation fault)
|
#1 |
Senior Member
Join Date: Feb 2010
Posts: 213
Rep Power: 17 |
I installed OpenFOAM-1.6.x and something strange happened. If I launch a parallel run:
Code:
foamJob -p -s simpleFoam
it crashes with:
Code:
mpirun noticed that process rank 1 with PID [4 digits] on node xxx-laptop exited on signal 11 (segmentation fault)
Then I followed a test procedure (see here, posts 19-20) and the output seemed correct. I ran the case in parallel mode again and all was OK. A heisenbug, it was suggested. Now the problem has come back; the parallel test output is:
Code:
Parallel processing using OPENMPI with 2 processors
Executing: mpirun -np 2 /home/giulia/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log
Building on 2 cores
Building on 2 cores
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-069803848c44
Exec   : parallelTest -parallel
Date   : May 04 2010
Time   : 13:44:38
Host   : giulia-laptop
PID    : 2150
Case   : /home/giulia/OpenFOAM/giulia-1.6.x/run/hydrofoil_0
nProcs : 2
Slaves : 1 ( giulia-laptop.2151 )
Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

Create time

[0] Starting transfers
[0] master receiving from slave 1
[1] Starting transfers
[1] slave sending to master 0
[1] slave receiving from master 0
[0] (0 1 2)
[0] master sending to slave 1
[1] (0 1 2)
End

Finalising parallel run
followed again by:
Code:
mpirun noticed that process rank 1 with PID [4 digits] on node xxx-laptop exited on signal 11 (segmentation fault) |
|
May 4, 2010, 09:19 |
|
#2 |
Senior Member
Join Date: Feb 2010
Posts: 213
Rep Power: 17 |
Hmm. Maybe it's a memory issue, but I can't understand why I had no problems before. I'm not an expert with Ubuntu; can anyone help me?
|
|
May 4, 2010, 10:41 |
|
#3 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hello Maurizio, it's me again
Uhm, you didn't elaborate on what happened last time. Possibly it's a swap problem: read the Swap FAQ at help.ubuntu and increase your Ubuntu's swap size. Then try again to crash your Ubuntu
Best regards, Bruno
PS: later in the day I'll review the post you made on how to have a side-by-side OpenFOAM 1.6 + 1.6.x installation
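For readers hitting the same wall: a quick way to check the currently configured swap and add a temporary swap file on Ubuntu. This is a sketch only; the 512 MiB size and the /swapfile path are example choices, and the creation commands need root, so they are shown commented out.

```shell
# Check how much swap is currently configured
grep SwapTotal /proc/meminfo

# Create and enable a 512 MiB swap file (requires root; path and size
# are examples -- adjust to your system):
# sudo dd if=/dev/zero of=/swapfile bs=1M count=512
# sudo chmod 600 /swapfile
# sudo mkswap /swapfile
# sudo swapon /swapfile
```

To make such a swap file permanent, it would also need an entry in /etc/fstab; the Ubuntu Swap FAQ Bruno mentions covers that.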
__________________
Last edited by wyldckat; May 4, 2010 at 10:42. Reason: serious typo... typed Ubuntu instead of OpenFOAM :P |
|
May 4, 2010, 11:48 |
|
#4 |
Senior Member
Join Date: Feb 2010
Posts: 213
Rep Power: 17 |
You are my angel, do you know that?
I expanded the notebook's memory by adding a 512 MB swap file, and now mpirun works! Well, I was afraid I'd have to install my (few and not so smart) neurons on my notebook Thank you very much, Bruno. |
|
May 4, 2010, 18:19 |
|
#5 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
You're welcome, and I'm glad it actually wasn't a heisenbug
By the way, I don't remember seeing this written on OpenFOAM's forum, nor on the unofficial openfoamwiki.net, but in my experience there is a minimum amount of RAM specifically required for doing a full build of OpenFOAM. The magic number is somewhere between 1.3 GiB and 1.5 GiB of RAM, and swap won't cover that necessity!! Best regards, Bruno
__________________
Last edited by wyldckat; May 4, 2010 at 18:19. Reason: typo... |
|
May 31, 2011, 07:20 |
|
#6 |
New Member
Join Date: May 2011
Posts: 8
Rep Power: 15 |
Hey guys!
I want to run a simulation through a bash script. The aim is to simulate wind coming from 16 different directions. When I run the simulation (16 cases one after another) with a first-order scheme for divergence, there is no problem. Nevertheless, when I run the same simulation with a second-order scheme, my computer stops responding and I have to reboot it. The error I obtain is:
Code:
mpirun noticed that process rank 5 with PID 1890 on node cener-desktop exited on signal 11 (segmentation fault)
As I understand from your messages, it could be a problem of RAM or swap memory, although I have 15 GB of RAM and 12 GB of swap space, so I don't think memory should be the problem! Do you have any idea??? Thanks a lot!
PS: I don't know if it matters, but I use "mpirun -np 8 simpleFoam -parallel" to run the simulations |
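The 16-direction driver script described above might look roughly like the sketch below. The case layout, the idea of a helper that rewrites 0/U for each angle, and the log-file naming are all assumptions for illustration, not the poster's actual script.

```shell
# Sketch of a wind-rose driver: loop over 16 wind directions, update the
# inlet velocity for each, then run the solver in parallel.
for angle in 0 22.5 45 67.5 90 112.5 135 157.5 \
             180 202.5 225 247.5 270 292.5 315 337.5; do
    echo "Running direction ${angle} deg"
    # Hypothetical helper that writes the velocity components into 0/U:
    # ./setWindDirection "$angle"
    # mpirun -np 8 simpleFoam -parallel > "log.simpleFoam.${angle}" 2>&1
done
```

One practical refinement: checking the solver's exit status inside the loop (`|| break`) would stop the sweep at the first crashed direction instead of ploughing on.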
|
May 31, 2011, 07:33 |
|
#7 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings rgarcia and welcome to the forum!
Mmm, have you tried running in serial to see if that scheme order works at all? Try monitoring how much RAM the simpleFoam processes are using and see if the crash happens while they are increasing their RAM occupation. Another possible problem is insufficient contiguous memory, i.e., allocating 3GB in a single matrix in RAM when the RAM is already loaded with various processes occupying various locations... although I haven't seen many problems like that lately... Yet another possibility is that there isn't enough MPI buffer length for communication. That's definable... in "OpenFOAM-*/etc/settings.sh" if I'm not mistaken. I would have to verify the variable name, but right now I can't. Good luck! Bruno
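A minimal way to do the RAM monitoring Bruno suggests: `ps -C` selects processes by command name (simpleFoam here, as in this thread), and `rss` is the resident memory in KiB.

```shell
# One-shot snapshot of PID, resident memory (KiB) and command name for
# any running simpleFoam processes; falls back to a message if none exist.
ps -C simpleFoam -o pid=,rss=,comm= || echo "no simpleFoam process running"

# For continuous monitoring, e.g. one sample per second:
#   watch -n 1 'ps -C simpleFoam -o pid,rss,comm'
```

If the rss numbers climb steadily right up to the segfault, that points at memory exhaustion rather than an MPI communication problem.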
__________________
|
|
May 31, 2011, 09:25 |
|
#8 |
New Member
Join Date: May 2011
Posts: 8
Rep Power: 15 |
Hey Bruno!
Thanks for your quick reply! In serial it works well... but it takes very long! The thing I don't understand is that for some directions it works (in parallel) but for others it doesn't... There has to be a reason, but it seems random! I'm going crazy! Apparently the problem is the combination of second order and parallel running... (any second-order scheme works well for the 16 directions otherwise) If you have any more suggestions I'll be glad to receive them! In any case, thank you very much! |
|
June 1, 2011, 05:03 |
|
#9 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi rgarcia,
And have you checked the sanity of the mesh by running checkMesh? Other than that, it could have to do with boundary conditions or some configuration you're overlooking, something like maxCo or another setting like that Best regards, Bruno
__________________
|
|
June 2, 2011, 05:59 |
|
#10 | |
New Member
Join Date: May 2011
Posts: 8
Rep Power: 15 |
Quote:
As you recommended, I'm trying to change the settings.sh file, but I can't:
Code:
rgarcia@cener-desktop:/opt/openfoam171/etc$ chmod +x settings.sh
chmod: changing permissions of 'settings.sh': Operation not permitted
rgarcia@cener-desktop:/opt/openfoam171/etc$ chmod +w settings.sh
chmod: changing permissions of 'settings.sh': Operation not permitted
I can copy the file to another folder and change it there, but then I'm not able to paste it back... I had already tried the other suggestions you made, and it doesn't seem to have anything to do with that! Thanks again Bruno! |
||
June 3, 2011, 10:31 |
|
#11 |
Member
Nicolas Lussier Clément
Join Date: Apr 2009
Location: Montréal, Qc, Canada
Posts: 61
Rep Power: 17 |
Hi rgarcia
How many cells do you have? It works with first order in parallel but not with second order in parallel, is that what you are saying? Are you using some custom code? Did you try without it? Regards
Bruno, I'd like to ask you a question: what is the meaning of minBufferSize := 20000000? What does it limit? Regards Nicolas Lussier |
|
June 4, 2011, 09:21 |
|
#12 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings to all!
@rgarcia: you should run it like this:
Code:
sudo chmod o+w settings.sh
As for "o+w", this gives you the proper permission to edit the file directly without sudo. After changing it, you can use the option "o-w" to revert the change.
@Nicolas: MPI_BUFFER_SIZE sets the size in bytes of the buffer that OpenFOAM attaches for communications between MPI processes. I'm suggesting this solution in an attempt to check whether it's an MPI-related problem or an OpenFOAM problem.
@rgarcia: On a related note, you might also want to create a small case in the meantime that reproduces this same problem, because it might be necessary to report this as a bug after we've tried to isolate it. But still, after increasing the buffer size, trying with fewer cores is also a good idea, in an attempt to isolate the problem. Best regards, Bruno
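A small demonstration of what the o+w / o-w toggling does, run on a scratch file we own instead of the real settings.sh, so no sudo is needed (uses GNU `stat -c %a` to print the octal mode):

```shell
# Demonstrate o+w / o-w on a throwaway file rather than settings.sh.
f=$(mktemp)
chmod 644 "$f"                 # rw-r--r--
chmod o+w "$f"                 # "others" may now write: rw-r--rw-
after_add=$(stat -c %a "$f")   # GNU stat; octal mode
chmod o-w "$f"                 # revert the change
after_rm=$(stat -c %a "$f")
echo "$after_add $after_rm"    # prints: 646 644
rm -f "$f"
```

The same two commands applied to /opt/openfoam171/etc/settings.sh (with sudo, since root owns it) are what Bruno is describing.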
__________________
|
|
June 6, 2011, 04:40 |
|
#13 |
New Member
Join Date: May 2011
Posts: 8
Rep Power: 15 |
Greetings Nicolas and Bruno!
@Bruno: I finally managed to change MPI_BUFFER_SIZE, but apparently it's not a problem of message size. Where should I write a report for my bug?
@Nicolas: I tried two cases, one very simple (50,000 cells) and the other with 500,000 cells. I wrote a bash script that allows me to do a wind-rose study. The study begins at direction 0 (adding the velocity components in /0/U) and, after 1000 iterations, changes to direction 22.5, etc. (16 directions in total). I'm using a custom turbulence model. The case works with second order for both the coarser and the finer grid... but only up to 4 processors! If I run the case with 5, 6, 7 or 8 processors it doesn't work! And the message that always appears is "mpirun... signal 11 (Segmentation fault)". |
|
June 6, 2011, 17:09 |
|
#14 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi rgarcia,
OK, you can report the possible bug here: http://www.openfoam.com/bugs/ Giving a small test case and making a full description of the problem is the best thing to do. On a side note, OpenFOAM has some issues with patches that are divided between sub-domains. I suspect that this may be the problem that is occurring here. I vaguely remember that there is an option for enforcing patches to not be split apart... you can start reading here: http://www.cfd-online.com/Forums/ope...tml#post305687 Best regards, Bruno
__________________
|
|
April 18, 2017, 11:05 |
mpirun noticed
|
#15 |
New Member
Kay
Join Date: Jan 2016
Posts: 5
Rep Power: 10 |
Hello All, Hello Bruno,
Sorry to resurrect this old thread again. Has the problem been solved? I tried to mesh CT data using snappyHexMesh and I get this error:
Code:
mpirun noticed that process rank 0 with PID 22908 on node bophy102 exited on signal 9 (Killed)
I followed the suggested procedures, i.e. changed minBufferSize := 200000000 and used only 4 cores, but that does not solve the problem. Please let me know if there is a solution already. Regards, K. |
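Worth noting for this particular error: signal 9 (Killed) is different from the segfaults earlier in the thread; for large snappyHexMesh runs it often means the kernel's OOM killer terminated the process when memory ran out. A quick check (a sketch; on some systems reading the kernel log requires root):

```shell
# Look for OOM-killer entries in the kernel log
dmesg 2>/dev/null | grep -i -E 'out of memory|oom-kill|killed process' || \
    echo "no OOM-killer entries found (or no permission to read dmesg)"

# And see how much memory is available right now
grep -E 'MemTotal|MemAvailable' /proc/meminfo
```

If the OOM killer is the culprit, the buffer-size and core-count tweaks above won't help; reducing the snappyHexMesh refinement levels or moving to a machine with more RAM would.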
|
April 24, 2017, 14:24 |
|
#16 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings KayGhana,
As long as it is on topic, it's preferred here on CFD-Online to re-use threads for the same specific topic, instead of starting a new one. Quote:
Therefore, please provide more details, specifically the log file for the snappyHexMesh run that resulted in that error message. mpirun only told you what it noticed had happened to the application; it was not able to specify why exactly it happened. That's when we should look at the application's output (the contents of the log file) to see what happened before the crash. Best regards, Bruno
__________________
|
||
February 5, 2018, 11:45 |
|
#17 |
Member
Join Date: Dec 2017
Location: Germany
Posts: 48
Rep Power: 9 |
I also have a problem when using mpirun. I was running the chtMultiRegionFoam solver and got the following message:
Code:
mpirun noticed that process rank 22 with PID 36705 on node cserver exited on signal 8 (Floating point exception)
What is the problem and how do I solve it? |
|
March 30, 2023, 13:53 |
|
#18 |
New Member
Join Date: Dec 2022
Posts: 2
Rep Power: 0 |
I had a similar problem with the same error.
Parallel processing for one case wasn't working, although for other cases there were no problems and I could easily run a parallel simulation. Running the case on a single processor (serial simulation) was also OK and worked with no problem. So I thought my problem was caused by the decomposition. I changed my decomposition method from "scotch" to "simple" (in my decomposeParDict), and that solved my issue.
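For reference, the change described above looks roughly like this in system/decomposeParDict. The 2x2x2 split for 8 subdomains is an example; with the "simple" method the product of the n entries must equal numberOfSubdomains.

```
numberOfSubdomains 8;

method          simple;   // was: scotch;

simpleCoeffs
{
    n               (2 2 2);   // subdomains per x/y/z direction
    delta           0.001;
}
```

After editing, rerun decomposePar (with -force to overwrite the old processor* directories) before launching mpirun again.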
|
|
|