OpenFOAM parallel running error in cluster

Old   February 24, 2014, 06:14
Default OpenFOAM parallel running error in cluster
  #1
Member
 
vishal
Join Date: Mar 2013
Posts: 73
Rep Power: 13
Hi all,
I am trying to run my case with OpenFOAM on a cluster, but it is not working and shows the following error:
Code:
[vishal@iceng1 case_parallel]$ mpirun --hostfile iceng1.hpc.com -np 4 turbulentFlameletRhoSimpleFoam -parallel >data&
[1] 129047
[vishal@iceng1 case_parallel]$ --------------------------------------------------------------------------
Open RTE was unable to open the hostfile:
    iceng1.hpc.com
Check to make sure the path and filename are correct.
--------------------------------------------------------------------------
[iceng1.hpc.com:129047] [[35788,0],0] ORTE_ERROR_LOG: Not found in file base/ras_base_allocate.c at line 236
[iceng1.hpc.com:129047] [[35788,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 72
[iceng1.hpc.com:129047] [[35788,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 990
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

^C
[1]+  Exit 1                  mpirun --hostfile iceng1.hpc.com -np 4 turbulentFlameletRhoSimpleFoam -parallel > data
Can anyone suggest how to sort out this problem?

Thanks in advance

Regards
vishal

Last edited by wyldckat; March 1, 2014 at 08:46. Reason: Added [CODE][/CODE]

Old   March 1, 2014, 08:53
Default
  #2
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Greetings Vishal,

The error message clearly states:
Code:
Open RTE was unable to open the hostfile:
    iceng1.hpc.com
This is because you told mpirun to use that name as a hostfile, and --hostfile expects the path to a file listing the machines, not a host name:
Code:
mpirun --hostfile iceng1.hpc.com
Let me see if I can find a recent post I made on this topic... here you go, post #2: http://www.cfd-online.com/Forums/ope...tml#post466532
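For reference, a hostfile is a plain-text file that lists the machines to use, one per line. A minimal sketch, assuming the file is called `machines` (an arbitrary name) and that the single node `iceng1.hpc.com` should run 4 processes; `check_hostfile` is a hypothetical helper, not part of Open MPI:

```shell
# Write a hostfile; "machines" is an arbitrary, assumed file name.
# Each line names a host; "slots" is the number of processes Open MPI may place there.
cat > machines <<'EOF'
iceng1.hpc.com slots=4
EOF

# Hypothetical guard: only launch if the hostfile path points to a readable file.
check_hostfile() {
    if [ -r "$1" ]; then
        echo "hostfile ok: $1"
    else
        echo "hostfile missing: $1" >&2
        return 1
    fi
}

check_hostfile machines

# The launch itself would then look like this (not executed in this sketch).
# "2>&1" also sends the error messages to the file "data" instead of the terminal:
# mpirun --hostfile machines -np 4 turbulentFlameletRhoSimpleFoam -parallel > data 2>&1 &
```

For a run on a single machine, the `--hostfile` option can simply be left out.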

Best regards,
Bruno

Old   March 10, 2014, 07:19
Default
  #3
Member
 
vishal
Join Date: Mar 2013
Posts: 73
Rep Power: 13
Code:
[vishal@iceng1 SM1.kOmegaSST_parallel]$ mpirun  -np 4 turbulentFlameletRhoSimpleFoam -parallel >data&
[5] 24696
[vishal@iceng1 SM1.kOmegaSST_parallel]$ [iceng1.hpc.com:24697] [[5026,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/nidmap.c at line 117
[iceng1.hpc.com:24697] [[5026,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ess_env_module.c at line 174
[iceng1.hpc.com:24698] [[5026,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/nidmap.c at line 117
[iceng1.hpc.com:24698] [[5026,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ess_env_module.c at line 174
[iceng1.hpc.com:24699] [[5026,1],2] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/nidmap.c at line 117
[iceng1.hpc.com:24699] [[5026,1],2] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ess_env_module.c at line 174
[iceng1.hpc.com:24700] [[5026,1],3] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/nidmap.c at line 117
[iceng1.hpc.com:24700] [[5026,1],3] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ess_env_module.c at line 174
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[iceng1.hpc.com:24697] [[5026,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file runtime/orte_init.c at line 128
[iceng1.hpc.com:24698] [[5026,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file runtime/orte_init.c at line 128
[iceng1.hpc.com:24699] [[5026,1],2] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file runtime/orte_init.c at line 128
[iceng1.hpc.com:24700] [[5026,1],3] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file runtime/orte_init.c at line 128
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
--------------------------------------------------------------------------
[iceng1.hpc.com:24700] *** An error occurred in MPI_Init
[iceng1.hpc.com:24700] *** on a NULL communicator
[iceng1.hpc.com:24700] *** Unknown error
[iceng1.hpc.com:24700] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason:     Before MPI_INIT completed
  Local host: iceng1.hpc.com
  PID:        24700
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 24700 on
node iceng1.hpc.com exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[iceng1.hpc.com:24696] 3 more processes have sent help message help-orte-runtime.txt / orte_init:startup:internal-failure
[iceng1.hpc.com:24696] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[iceng1.hpc.com:24696] 3 more processes have sent help message help-orte-runtime / orte_init:startup:internal-failure
[iceng1.hpc.com:24696] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
[iceng1.hpc.com:24696] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
[iceng1.hpc.com:24696] 3 more processes have sent help message help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
Thanks for your link. I am trying to sort it out. With the above command it is still not running.
I run my parallel case on one cluster only.

Vishal

Last edited by wyldckat; March 10, 2014 at 16:19. Reason: Added [CODE][/CODE]

Old   March 10, 2014, 16:22
Default
  #4
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Vishal,

A quick search online indicates that it's possible that you are using OpenFOAM with an incompatible Open-MPI version.
Was OpenFOAM compiled with the cluster's own Open-MPI version or was it compiled with the version supplied with OpenFOAM?
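One way to check which MPI is in play is to compare what the OpenFOAM environment expects with the `mpirun` that the shell actually finds. A minimal sketch, assuming the OpenFOAM `bashrc` has been sourced in the current shell; `report_mpi` is a hypothetical helper, while `WM_MPLIB` and `FOAM_MPI` are variables that OpenFOAM's environment scripts normally set:

```shell
# Hypothetical helper: print the MPI-related settings of the current shell.
report_mpi() {
    # MPI flavour OpenFOAM was configured for, e.g. OPENMPI or SYSTEMOPENMPI.
    echo "WM_MPLIB=${WM_MPLIB:-<unset>}"
    # Name of the MPI build directory OpenFOAM expects.
    echo "FOAM_MPI=${FOAM_MPI:-<unset>}"
    # The mpirun that this shell would actually execute.
    echo "mpirun=$(command -v mpirun || echo '<not on PATH>')"
}

report_mpi

# If mpirun is found, "mpirun --version" shows which Open MPI release it belongs to.
```

If `WM_MPLIB` is `SYSTEMOPENMPI`, the build expects the cluster's own Open MPI; `OPENMPI` refers to the copy supplied with OpenFOAM. A mismatch between that setting and the `mpirun` found on `PATH` would be consistent with the errors above, although this sketch alone cannot confirm the cause.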

Best regards,
Bruno

PS: When you need to post code or screen output, such as the ones from your previous two posts, please follow the instructions from this link: Posting code and output with [CODE]

Old   March 11, 2014, 02:02
Default
  #5
Member
 
vishal
Join Date: Mar 2013
Posts: 73
Rep Power: 13
Hi Bruno,
Actually, I compiled with the cluster's Open MPI. There might be some issue there that I can't find. So I changed the path in OpenFOAM's bashrc file and compiled with the Open MPI version supplied with OpenFOAM.
Now this problem shows up:
Code:
        
[vishal@iceng1 SM1.kOmegaSST_parallel]$ mpirun  -np 4 turbulentFlameletRhoSimpleFoam -parallel >data&
[1] 77174
[vishal@iceng1 SM1.kOmegaSST_parallel]$ --------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not find an executable:

Executable: turbulentFlameletRhoSimpleFoam
Node: iceng1.hpc.com

while attempting to start process rank 0.
--------------------------------------------------------------------------
^C
[1]+  Exit 133                mpirun -np 4 turbulentFlameletRhoSimpleFoam -parallel > data
[vishal@iceng1 SM1.kOmegaSST_parallel]$
This is a flamelet solver compiled in OpenFOAM. How do I take care of this issue?

This is the path of the flamelet solver:
Code:
     
 /export/home/vishal/OpenFOAM/OpenFOAM-2.1.1/flamelet-2.1/tutorials/turbulentFlameletRhoSimpleFoam/
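When mpirun reports that it cannot find the executable, a first check is whether the solver name resolves on `PATH` in the same shell that launches the job. A minimal sketch; `locate_solver` is a hypothetical helper, and whether the flamelet solver installs into `FOAM_USER_APPBIN` or `FOAM_APPBIN` depends on its `Make/files`, which is not shown in this thread:

```shell
solver=turbulentFlameletRhoSimpleFoam

# Hypothetical helper: print the full path of a command, or a hint if it is missing.
locate_solver() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "found: $path"
    else
        echo "not on PATH: $1"
        return 1
    fi
}

# If the solver is missing, show the directories where OpenFOAM binaries usually live.
locate_solver "$solver" || {
    echo "FOAM_APPBIN=${FOAM_APPBIN:-<unset>}"
    echo "FOAM_USER_APPBIN=${FOAM_USER_APPBIN:-<unset>}"
}
```

If both variables are unset, the OpenFOAM environment is probably not sourced in this shell. If they are set but the solver is still missing, it may need to be rebuilt (for example with `wmake` in its source directory) after the change to the bashrc file.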
Regards,
Vishal

Old   March 11, 2014, 16:11
Default
  #6
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Vishal,

Try running with this command:
Code:
foamJob -p turbulentFlameletRhoSimpleFoam
It automatically sets the number of cores based on your decomposed case.
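Roughly the same thing can be done by hand. A sketch, assuming the case was already split into `processorN` directories by `decomposePar`; `count_procs` is a hypothetical helper and this is not the actual `foamJob` source:

```shell
# Hypothetical helper: count the processorN directories in a case directory
# (defaults to the current directory).
count_procs() {
    n=0
    for d in "${1:-.}"/processor[0-9]*; do
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

np=$(count_procs .)
echo "case is decomposed into $np subdomains"

# With np > 0, the launch would look like this (not executed in this sketch).
# Passing the full solver path avoids relying on PATH inside mpirun:
# mpirun -np "$np" "$(command -v turbulentFlameletRhoSimpleFoam)" -parallel > log 2>&1 &
```

A count of 0 means the case has not been decomposed yet, so a parallel run cannot start.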

For more information, run:
Code:
foamJob -help
Best regards,
Bruno
