mpirun problems: exited on signal 11 (segmentation fault)

#1 | vaina74 (Senior Member) | May 4, 2010, 09:01
I installed OpenFOAM-1.6.x and something strange happened. If I launch a parallel run:
Code:
foamJob -p -s simpleFoam
I obtain
Code:
mpirun noticed that process rank 1 with PID [4 digits] on node xxx-laptop
exited on signal 11 (segmentation fault)
and Ubuntu freezes!
Then I followed a test procedure (see here, post 19-20) and the output seemed correct. I ran the case in parallel again and everything was fine. It was suggested that this was a heisenbug.
Now the problem has come back. The parallel test output is:
Code:
Parallel processing using OPENMPI with 2 processors
Executing: mpirun -np 2 /home/giulia/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log
Building on  2  cores
Building on  2  cores
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-069803848c44
Exec   : parallelTest -parallel
Date   : May 04 2010
Time   : 13:44:38
Host   : giulia-laptop
PID    : 2150
Case   : /home/giulia/OpenFOAM/giulia-1.6.x/run/hydrofoil_0
nProcs : 2
Slaves : 
1
(
giulia-laptop.2151
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

[1] [0] 
Starting transfers
[1] 
[1] slave sending to master 0
[1] slave receiving from master 0

Starting transfers
[0] 
[0] master receiving from slave 1
[0] (0 1 2)
[0] master sending to slave 1
[1] (0 1 2)
End

Finalising parallel run
but when I run my case I always obtain
Code:
mpirun noticed that process rank 1 with PID [4 digits] on node  xxx-laptop
exited on signal 11 (segmentation fault)
Please, help me!

#2 | vaina74 (Senior Member) | May 4, 2010, 09:19
Hmm. Maybe it's a question of the amount of memory, but I can't understand why I had no problems before. I'm not an Ubuntu expert, can anyone help me?

#3 | wyldckat (Bruno Santos, Retired Super Moderator) | May 4, 2010, 10:41
Hello Maurizio, it's me again

Uhm, you didn't elaborate on what happened last time. Possibly it's a swap problem; read the Swap FAQ at help.ubuntu and increase your Ubuntu's swap size.
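
Before resizing swap, it helps to see how much RAM and swap the machine actually has. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the function name mem_summary is made up for illustration, and the optional file argument exists only so the function can be tried on a saved copy.

```shell
#!/bin/sh
# Print total RAM and swap in MiB, read from /proc/meminfo (Linux only).
# mem_summary is a hypothetical helper name, not an OpenFOAM or Ubuntu tool.
mem_summary() {
    meminfo=${1:-/proc/meminfo}
    awk '
        /^MemTotal:/  { printf "RAM:  %d MiB\n", $2 / 1024 }
        /^SwapTotal:/ { printf "Swap: %d MiB\n", $2 / 1024 }
    ' "$meminfo"
}

# Only report when the file is actually there.
if [ -r /proc/meminfo ]; then
    mem_summary
fi
```

If the swap figure is small compared with what the parallel run needs, that would be consistent with the swap suggestion above.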

Then try again to crash your Ubuntu

Best regards,
Bruno

PS: later in the day I'll review the post you made on how to have a side-by-side OpenFOAM 1.6 + 1.6.x installation

#4 | vaina74 (Senior Member) | May 4, 2010, 11:48
You are my angel, do you know it?
I expanded the notebook memory by adding a 512 MB swap file. And now mpirun works! Well, I was afraid of having to install my (few and not so smart) neurones on my notebook
Thank you very much, Bruno.
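
For readers who want to reproduce this fix, the usual way to add a swap file on Linux is a dd / mkswap / swapon sequence. The sketch below only prints the commands so they can be reviewed first; the path /swapfile and the helper name swapfile_commands are assumptions, and the 512 MiB size is taken from the post above.

```shell
#!/bin/sh
# Print (do not run) the commands that create and enable a swap file.
# swapfile_commands is a hypothetical helper; /swapfile is an assumed path.
swapfile_commands() {
    path=${1:-/swapfile}
    size_mib=${2:-512}
    echo "dd if=/dev/zero of=$path bs=1M count=$size_mib"
    echo "chmod 600 $path"      # swap files should not be world-readable
    echo "mkswap $path"         # write the swap signature
    echo "swapon $path"         # enable it until the next reboot
}

swapfile_commands /swapfile 512
```

After checking the output, each printed line can be run with sudo. A swap file enabled this way does not survive a reboot unless it is also listed in /etc/fstab.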

#5 | wyldckat (Bruno Santos, Retired Super Moderator) | May 4, 2010, 18:19
You're welcome and I'm glad it actually wasn't a heisenbug

By the way, I don't remember seeing this written in OpenFOAM's forum, nor on the unofficial openfoamwiki.net, but in my experience there is a minimum amount of RAM specifically required for doing a full build of OpenFOAM. The magic number is somewhere between 1.3 GiB and 1.5 GiB of RAM, and swap won't cover that requirement!

Best regards,
Bruno

#6 | rgarcia (New Member) | May 31, 2011, 07:20
Hey guys!

I want to run a simulation through a bash script. The aim is to simulate wind coming from 16 different directions. When I run the simulation (16 cases one after another) with a first-order scheme for divergence, there is no problem. However, when I run the same simulation with a second-order scheme, my computer stops responding and I have to reboot it. The error I get is:

Code:
mpirun noticed that process rank 5 with PID 1890 on node cener-desktop exited on signal 11 (segmentation fault)

As I understand from your messages, it could be a RAM or swap problem, but I have 15 GB of RAM and 12 GB of swap space, so I don't think memory should be the problem!

Do you have any ideas?

Thanks a lot!

PS: I don't know if it matters, but I use "mpirun -np 8 simpleFoam -parallel" to run the simulations

#7 | wyldckat (Bruno Santos, Retired Super Moderator) | May 31, 2011, 07:33
Greetings rgarcia and welcome to the forum!

Mmm, have you tried running in serial to see if second order works at all?
Try monitoring how much RAM the simpleFoam processes are using and see whether the crash happens while their RAM usage is increasing. Another problem could be insufficient contiguous memory, e.g. allocating 3 GB for a single matrix when the RAM is already fragmented by various processes occupying different locations... although I haven't seen many problems like that lately...
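
One way to do the monitoring suggested above is to sum the resident memory of every process with a given name. This is a minimal sketch, assuming a Linux /proc filesystem; rss_mib_of is a made-up helper name, and the kernel truncates process names to 15 characters, so long solver names would need the truncated form.

```shell
#!/bin/sh
# Sum the resident memory (VmRSS) of all processes with a given name, in MiB.
# rss_mib_of is a hypothetical helper; it relies on Linux /proc/<pid>/status.
rss_mib_of() {
    name=$1
    total_kib=0
    for status in /proc/[0-9]*/status; do
        [ -r "$status" ] || continue
        proc_name=$(awk '/^Name:/ {print $2; exit}' "$status" 2>/dev/null)
        [ "$proc_name" = "$name" ] || continue
        kib=$(awk '/^VmRSS:/ {print $2; exit}' "$status" 2>/dev/null)
        total_kib=$((total_kib + ${kib:-0}))
    done
    echo $((total_kib / 1024))
}

# Prints 0 when no simpleFoam process is running.
rss_mib_of simpleFoam
```

Calling the function every few seconds from a second terminal while the case runs (for example inside a `while sleep 5` loop) shows whether memory use is still climbing when the crash happens.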

Yet another possibility is that there isn't enough MPI buffer length for communication. That's definable... in "OpenFOAM-*/etc/settings.sh" if I'm not mistaken. I would have to verify the variable name, but right now I can't.

Good luck!
Bruno

#8 | rgarcia (New Member) | May 31, 2011, 09:25
Hey Bruno!

Thanks for your quick reply!

In serial it works fine... but it takes very long! The thing I don't understand is that for some directions it works (in parallel) but for others it doesn't...

There has to be a reason, but it seems random! I'm going crazy!
Apparently, the problem is the combination of second order and parallel running... (any second order schemes work well for the 16 directions)

If you have any more suggestions I'll be glad to hear them! In any case, thank you very much!

#9 | wyldckat (Bruno Santos, Retired Super Moderator) | June 1, 2011, 05:03
Hi rgarcia,

  • Edit the file OpenFOAM*/etc/settings.sh;
  • Find the lines that have this:
    Code:
    # Set the minimum MPI buffer size (used by all platforms except SGI MPI)
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    : ${minBufferSize:=20000000}
    
    
    if [ "${MPI_BUFFER_SIZE:=$minBufferSize}" -lt $minBufferSize ]
    then
        MPI_BUFFER_SIZE=$minBufferSize
    fi
    export MPI_BUFFER_SIZE
  • Change 20000000 to 200000000.
  • Save the file.
  • Start a new terminal and try running it in parallel again.
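
Reading the quoted fragment closely, `${MPI_BUFFER_SIZE:=$minBufferSize}` only assigns the default when the variable is unset or empty, and the `if` raises anything smaller up to the minimum. That suggests a larger MPI_BUFFER_SIZE exported before the OpenFOAM environment is sourced would be kept, without editing the file. The sketch below replays the same logic in a function (effective_buffer_size is a made-up name) so the behaviour can be checked in isolation; whether a given OpenFOAM version still uses this exact fragment is an assumption.

```shell
#!/bin/sh
# Replay the settings.sh logic quoted above for a given starting value.
# effective_buffer_size is a hypothetical name used only for this sketch.
effective_buffer_size() {
    (
        # Subshell: keep the caller's environment untouched.
        unset minBufferSize
        MPI_BUFFER_SIZE=$1
        : ${minBufferSize:=20000000}
        if [ "${MPI_BUFFER_SIZE:=$minBufferSize}" -lt $minBufferSize ]
        then
            MPI_BUFFER_SIZE=$minBufferSize
        fi
        echo "$MPI_BUFFER_SIZE"
    )
}

effective_buffer_size ""          # unset or empty: falls back to the minimum
effective_buffer_size 1000        # too small: raised to the minimum
effective_buffer_size 200000000   # larger value: kept as is
```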
Another possibility is to try dividing the mesh into fewer or more sub-domains.
And have you checked the sanity of the mesh, by running checkMesh?

Other than these, it could have to do with boundary conditions or some configuration you're overlooking, something like maxCo or some other setting like that.

Best regards,
Bruno

#10 | rgarcia (New Member) | June 2, 2011, 05:59
Quote:
Originally Posted by wyldckat View Post
Hi rgarcia,

  • Edit the file OpenFOAM*/etc/settings.sh;
  • Find the lines that have this:
    Code:
    # Set the minimum MPI buffer size (used by all platforms except SGI MPI)
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    : ${minBufferSize:=20000000}
    
    
    if [ "${MPI_BUFFER_SIZE:=$minBufferSize}" -lt $minBufferSize ]
    then
        MPI_BUFFER_SIZE=$minBufferSize
    fi
    export MPI_BUFFER_SIZE
  • Change 20000000 to 200000000.
  • Save the file.
  • Start a new terminal and try running it in parallel again.
Hi Bruno,

As you recommended, I'm trying to change the settings.sh file, but I can't:

Code:
rgarcia@cener-desktop:/opt/openfoam171/etc$ chmod +x settings.sh
chmod: changing permissions of 'settings.sh': Operation not permitted
rgarcia@cener-desktop:/opt/openfoam171/etc$ chmod +w settings.sh
chmod: changing permissions of 'settings.sh': Operation not permitted

I can copy the file to another folder and change it, but then I'm not able to copy it back...

I had already tried the other suggestions you made, and the problem doesn't seem to have anything to do with them!

Thanks again Bruno!

#11 | nlc (Nicolas Lussier Clément, Member) | June 3, 2011, 10:31
Hi rgarcia

How many cells do you have?
It works with first order in parallel but not with second order in parallel, is that what you are saying?
Are you using some custom code? Did you try without it?

Regards

Bruno, I'd like to ask you a question:

What is the meaning of minBufferSize:=20000000?
What does it limit?

Regards

Nicolas Lussier

#12 | wyldckat (Bruno Santos, Retired Super Moderator) | June 4, 2011, 09:21
Greetings to all!

@rgarcia: you should run like this:
Code:
sudo chmod o+w settings.sh
The sudo command will request your password to run the application as superuser, namely as root. This is necessary because the /opt folder is a system folder, from where everyone can read and execute, but only the root user can make changes to the files.
As for "o+w", this will give the proper permission for you to edit the file directly without sudo. After changing it, you can use the option "o-w" to revert the change.
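
With write access sorted out, the edit itself can be scripted. This is a minimal sketch, assuming the file contains the `: ${minBufferSize:=20000000}` line exactly as quoted earlier in the thread; bump_min_buffer is a made-up helper name, and for a file under /opt it would have to run with root privileges unless the permissions were changed as described above.

```shell
#!/bin/sh
# Raise the default minBufferSize in a settings.sh-style file, keeping a backup.
# bump_min_buffer is a hypothetical helper, not part of OpenFOAM.
bump_min_buffer() {
    file=$1
    new_size=${2:-200000000}
    cp -p "$file" "$file.orig" || return 1
    # Rewrite only the default assignment; all other lines pass through.
    sed "s/\(minBufferSize:=\)[0-9][0-9]*/\1$new_size/" "$file.orig" > "$file"
}

# Demo on a scratch file rather than the real settings.sh.
demo=$(mktemp)
echo ': ${minBufferSize:=20000000}' > "$demo"
bump_min_buffer "$demo"
cat "$demo"
rm -f "$demo" "$demo.orig"
```

The untouched copy stays next to the file as settings.sh.orig, which makes it easy to undo the change once the test is over.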


@Nicolas: MPI_BUFFER_SIZE sets the size, in bytes, of the buffer reserved for communication between MPI processes; minBufferSize is the lower bound that settings.sh enforces on it.
I'm suggesting this solution in an attempt to check if it's an MPI related problem or an OpenFOAM problem.


@rgarcia: On a related note, you might also want to create a small case in the meantime that reproduces this same problem, because it might be necessary to report this as a bug after we've tried to isolate the problem.
But still, after increasing the message size, trying with fewer cores is also a good idea, in an attempt to isolate the problem.

Best regards,
Bruno

#13 | rgarcia (New Member) | June 6, 2011, 04:40
Greetings Nicolas and Bruno!

@Bruno: I finally managed to change MPI_BUFFER_SIZE, but apparently it's not a problem of message size. Where should I write a report for my bug?

@Nicolas: I tried two cases, one very simple (50000 cells) and the other with 500000 cells. I wrote a bash script that lets me do a wind rose study. The study begins at direction 0 (setting the velocity components in /0/U) and, after 1000 iterations, it changes to direction 22.5, and so on (16 directions in total). I'm using a custom turbulence model.
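
The direction sweep described here can be sketched as a loop that turns each angle into velocity components. The actual script is not shown in the thread, so everything below is an assumption: the helper name wind_components, the 10 m/s speed, and the angle convention (counter-clockwise from the +x axis) are all made up for illustration.

```shell
#!/bin/sh
# Print "Ux Uy" for a wind speed and a direction in degrees.
# Assumption: the angle is measured counter-clockwise from the +x axis;
# a meteorological convention would need a different formula.
wind_components() {
    awk -v speed="$1" -v deg="$2" 'BEGIN {
        pi = atan2(0, -1)
        rad = deg * pi / 180
        printf "%.3f %.3f\n", speed * cos(rad), speed * sin(rad)
    }'
}

# 16 directions in steps of 22.5 degrees, with a made-up speed of 10 m/s.
i=0
while [ "$i" -lt 16 ]; do
    deg=$(awk -v i="$i" 'BEGIN { print i * 22.5 }')
    echo "$deg: $(wind_components 10 "$deg")"
    i=$((i + 1))
done
```

In a real sweep, each pair of components would be written into the inlet value of the U file before the solver is started for that direction.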

The case works with second order on both the coarse and the fine grid, but only up to 4 processors! If I run it with 5, 6, 7 or 8 processors it doesn't work, and the message that always appears is "mpirun... signal 11 (Segmentation fault)".

#14 | wyldckat (Bruno Santos, Retired Super Moderator) | June 6, 2011, 17:09
Hi rgarcia,

OK, you can report the possible bug here: http://www.openfoam.com/bugs/
Giving a small test case and making a full description of the problem is the best thing to do.

On a side note, OpenFOAM has some issues with patches that are divided between sub-domains. I suspect that this may be the problem that is occurring here.

I vaguely remember that there is an option for enforcing patches to not be split apart... you can start reading here: http://www.cfd-online.com/Forums/ope...tml#post305687

Best regards,
Bruno

#15 | KayGhana (Kay, New Member) | April 18, 2017, 11:05 | mpirun noticed
Hello All, Hello Bruno,

Sorry to resurrect this old thread again.

Has the problem been solved?

I tried to mesh CT data using snappyHexMesh and I get this error:

mpirun noticed that process rank 0 with PID 22908 on node bophy102 exited on signal 9 (Killed).

I followed the suggested procedures, that is, I changed minBufferSize to 200000000 and used only 4 cores, but that does not solve the problem. Please let me know if a solution was already found.

Regards,

K.

#16 | wyldckat (Bruno Santos, Retired Super Moderator) | April 24, 2017, 14:24
Greetings KayGhana,

Quote:
Originally Posted by KayGhana View Post
Sorry to resurrect this old thread again.
As long as it is on topic, it's preferred here on CFD-Online to re-use threads for the same specific topic, instead of starting a new one.

Quote:
Originally Posted by KayGhana View Post
Has the problem been solved?

I tried to mesh CT data using snappyHexMesh and I get this error:

mpirun noticed that process rank 0 with PID 22908 on node bophy102 exited on signal 9 (Killed).
Unfortunately back then I didn't know as much as I do today and I forgot to ask for more details about the case, specifically what were the error messages and output that the solver gave in that situation.

Therefore, please provide more details, specifically the "log" file from the snappyHexMesh run that resulted in that error message.
mpirun only reported what it noticed had happened to the application; it could not say why it happened. That's why we should look at the application output (the contents of the log file) to see what happened before the crash.
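
A quick way to pull the relevant part out of a long log is to search for OpenFOAM's fatal-error banner and print the lines that follow it. This is a minimal sketch; foam_log_errors is a made-up helper name, and the default file name "log" is assumed from the foamJob run in the first post.

```shell
#!/bin/sh
# Show OpenFOAM fatal-error banners in a log file with some context.
# foam_log_errors is a hypothetical helper name; "log" is the file written
# by foamJob, as in the first post.
foam_log_errors() {
    log=${1:-log}
    grep -n -A 10 'FOAM FATAL' "$log" || echo "no FOAM FATAL banner in $log"
}

# Only run the check when a log file is present in the current directory.
if [ -f log ]; then
    foam_log_errors log
fi
```

When the log simply stops with no banner at all, the process was probably terminated from outside. For signal 9 (Killed) that is often the kernel's out-of-memory killer, which the kernel log (dmesg) may confirm.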

Best regards,
Bruno

#17 | Eko (Member) | February 5, 2018, 11:45
I also have a problem while using mpirun. I was running the chtMultiRegionFoam solver and got the following message:

Code:
mpirun noticed that process rank 22 with PID 36705 on node cserver exited on signal 8 (Floating point exception).
It's my first time using mpirun so I have no clue what to do.
What is the problem and how do I solve it?
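
The three mpirun messages in this thread report different signals (11, 9 and 8), and the number is the first clue about what went wrong. A minimal sketch for decoding it with the shell's own `kill -l`; signal_from_mpirun is a made-up helper name.

```shell
#!/bin/sh
# Translate the signal number reported by mpirun into its name.
# signal_from_mpirun is a hypothetical helper name for this sketch.
signal_from_mpirun() {
    num=$(printf '%s\n' "$1" | sed -n 's/.*exited on signal \([0-9][0-9]*\).*/\1/p')
    [ -n "$num" ] || return 1
    echo "signal $num = SIG$(kill -l "$num")"
}

signal_from_mpirun "mpirun noticed that process rank 22 with PID 36705 on node cserver exited on signal 8 (Floating point exception)."
```

A floating point exception typically means an invalid arithmetic operation such as a division by zero, which OpenFOAM traps when FOAM_SIGFPE is enabled (see the SigFpe line in the first post's output). The solver log just before the crash is the place to look for the field or equation that blew up.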

#18 | Hojjat (New Member) | March 30, 2023, 13:53
I had a similar problem with the same error.

Parallel processing wasn't working for one case, although for other cases I could run a parallel simulation without problems. Running that case on a single processor (serial simulation) also worked fine.

So I suspected the decomposition was the problem. I changed my decomposition method from "scotch" to "simple" (in my decomposeParDict), and that solved my issue.
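
The method switch described above is a one-line change in the case's decomposeParDict. A minimal sketch of scripting it, assuming the dictionary has a line of the form `method scotch;`; set_decompose_method is a made-up helper name.

```shell
#!/bin/sh
# Switch the decomposition method in a decomposeParDict, keeping a backup.
# set_decompose_method is a hypothetical helper name for this sketch.
set_decompose_method() {
    dict=$1
    method=$2
    cp -p "$dict" "$dict.orig" || return 1
    sed "s/^\([[:space:]]*method[[:space:]][[:space:]]*\)[A-Za-z]*;/\1$method;/" \
        "$dict.orig" > "$dict"
}

# Demo on a scratch file rather than a real case.
demo=$(mktemp)
printf '%s\n' 'numberOfSubdomains 4;' 'method          scotch;' > "$demo"
set_decompose_method "$demo" simple
cat "$demo"
rm -f "$demo" "$demo.orig"
```

In a real case the file would normally be system/decomposeParDict, and the case has to be decomposed again after the change. The simple method also expects a coefficients block giving the number of splits per direction (typically simpleCoeffs with an n entry, though the keyword may differ between versions).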