
problems after decomposing for running

April 18, 2011, 06:47
problems after decomposing for running
#1
alessio.nz
Member
Alex
Join Date: Apr 2010
Posts: 48
Hello, I had a mesh with a decomposeParDict included, and I could use this file to run in parallel without problems. The mesh was split correctly and the run was perfect. (The file is set up to split my domain across more than one node of the cluster; each node has 8 cores, so for example I can run on 4 nodes = 32 cores.)

I wanted to use the same file for another mesh. Decomposing the domain into 32 processors finished, apparently without errors:

Number of processor faces = 50892
Max number of processor patches = 8
Max number of faces between processors = 9008

Processor 0: field transfer
Processor 1: field transfer
Processor 2: field transfer
Processor 3: field transfer
Processor 4: field transfer
Processor 5: field transfer
Processor 6: field transfer
Processor 7: field transfer
Processor 8: field transfer
Processor 9: field transfer
Processor 10: field transfer
Processor 11: field transfer
Processor 12: field transfer
Processor 13: field transfer
Processor 14: field transfer
Processor 15: field transfer
Processor 16: field transfer
Processor 17: field transfer
Processor 18: field transfer
Processor 19: field transfer
Processor 20: field transfer
Processor 21: field transfer
Processor 22: field transfer
Processor 23: field transfer
Processor 24: field transfer
Processor 25: field transfer
Processor 26: field transfer
Processor 27: field transfer
Processor 28: field transfer
Processor 29: field transfer
Processor 30: field transfer
Processor 31: field transfer

End.

I then tried to run with foamJob -p simpleFoam, which gives the following error:


Executing: mpirun -np 32 -hostfile system/machines /cvos/shared/apps/OpenFOAM/OpenFOAM-1.7.1/bin/foamExec simpleFoam -parallel > log 2>&1
[user@cluster]$ tail -f log
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Do you know what it could be? I have attached the file.
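The setup described in this post (32 subdomains spread over 4 nodes with 8 cores each, launched with `-np 32` and `system/machines`) can be sanity-checked before calling mpirun. The following is only a minimal sketch, not a confirmed diagnosis of the error above: it counts the `processorN` directories in a case and the slots listed in a hostfile. The helper names, the `node01`-style host names, and the assumption that hosts are listed as `name slots=N` (or `name cpu=N`, as in the OpenFOAM user guide example) are illustrative, not taken from the original case.

```python
import re
import tempfile
from pathlib import Path


def count_processor_dirs(case_dir):
    """Count the processorN directories found directly inside the case directory."""
    return sum(
        1
        for entry in Path(case_dir).iterdir()
        if entry.is_dir() and re.fullmatch(r"processor\d+", entry.name)
    )


def count_hostfile_slots(hostfile_text):
    """Sum the slot counts of all hosts; a host without a count is taken as 1."""
    total = 0
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        match = re.search(r"\s(?:slots|cpu)=(\d+)", line)
        total += int(match.group(1)) if match else 1
    return total


if __name__ == "__main__":
    # Hypothetical stand-in for a decomposed case: 32 empty processor directories.
    with tempfile.TemporaryDirectory() as case:
        for i in range(32):
            (Path(case) / f"processor{i}").mkdir()
        # Hypothetical hostfile: 4 nodes with 8 slots each.
        hostfile = "".join(f"node{n:02d} slots=8\n" for n in range(1, 5))
        print(count_processor_dirs(case), count_hostfile_slots(hostfile))
```

If the two numbers differ from each other or from the value passed to `-np`, that mismatch is worth resolving first; if they agree, the problem lies elsewhere.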

April 18, 2011, 07:33
#2
stevenvanharen
Senior Member
Steven van Haren
Join Date: Aug 2010
Location: The Netherlands
Posts: 149
It seems that the call to mpirun generated by the foamJob script is not correct. (I don't see the file specifying the machines.)

Read section 3.4 in the user guide and try to run mpirun without using the foamJob script.

April 18, 2011, 10:46
#3
alessio.nz
Member
Alex
Join Date: Apr 2010
Posts: 48
This is the command I ran:
mpirun --hostfile system/machines -np 32 SimpleFoam -parallel

and this is what I got:
--------------------------------------------------------------------------
Open RTE detected a parse error in the hostfile:
system/machines
It occured on line number 1 on token 1.
--------------------------------------------------------------------------
[elmo:11368] [[22308,0],0] ORTE_ERROR_LOG: Error in file base/ras_base_allocate.c at line 236
[elmo:11368] [[22308,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 72
[elmo:11368] [[22308,0],0] ORTE_ERROR_LOG: Error in file plm_rsh_module.c at line 990
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

April 18, 2011, 10:58
#4
stevenvanharen
Senior Member
Steven van Haren
Join Date: Aug 2010
Location: The Netherlands
Posts: 149
Quote (originally posted by alessio.nz):
--------------------------------------------------------------------------
Open RTE detected a parse error in the hostfile:
system/machines
It occured on line number 1 on token 1.
--------------------------------------------------------------------------
Somehow it is not happy with your machines file. Are you sure you set the right names for the remote nodes in the "machines" file?
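A parse error on line 1, token 1 means the parser stumbled on the very first thing in the file. One possible cause, which is an assumption and not something confirmed in this thread, is a character that is invisible in an editor, such as a UTF-8 byte order mark, a Windows carriage return, or another non-ASCII byte. The sketch below scans the raw bytes of a hostfile for such characters; the function name and the choice of what counts as "suspect" are illustrative.

```python
import sys

BOM = b"\xef\xbb\xbf"


def find_suspect_bytes(data):
    """Return (line, column, description) for bytes a plain-text hostfile rarely needs."""
    findings = []
    if data.startswith(BOM):
        findings.append((1, 1, "UTF-8 byte order mark"))
        data = data[len(BOM):]  # columns on line 1 are then counted after the mark
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte == 0x0D:
                findings.append((lineno, col, "carriage return (Windows line ending)"))
            elif byte > 0x7E or (byte < 0x20 and byte != 0x09):
                findings.append((lineno, col, f"non-printable or non-ASCII byte 0x{byte:02x}"))
    return findings


if __name__ == "__main__":
    # Defaults to the path used in this thread; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "system/machines"
    with open(path, "rb") as handle:
        for lineno, col, what in find_suspect_bytes(handle.read()):
            print(f"{path}:{lineno}:{col}: {what}")
```

An empty result does not prove the file is valid; it only rules out this one class of problem, so the host names and the file's syntax still need to be checked by eye.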

April 18, 2011, 11:11
#5
alessio.nz
Member
Alex
Join Date: Apr 2010
Posts: 48
Yes, I am sure. I was working with another mesh and it worked perfectly. The problem is that with this different mesh the decomposition seems OK, but once I run it, it crashes with the errors I mentioned.

April 20, 2011, 09:44
Re:
#6
alessio.nz
Member
Alex
Join Date: Apr 2010
Posts: 48
Hello, it finally worked; maybe there was a problem in the cluster itself. Anyway, thanks for the help. Regards

December 23, 2015, 15:27
#7
alireza2475
New Member
alireza
Join Date: Jul 2010
Posts: 12
Quote (originally posted by stevenvanharen):
Somehow it is not happy with your machines file. Are you sure you set the right names for the remote nodes in the "machines" file?
In case anyone else faces this problem:

There is something wrong in the hostname file, as Steven mentioned.
Sometimes, even if you copy a working file for a new run, it will not work. I suggest creating another hostname file from scratch. I just had the same problem when running a setup that had worked perfectly before; I wrote the machine names again and it works now.
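Recreating the file from scratch, as suggested here, can also be scripted so that the result is guaranteed to be plain ASCII with Unix line endings. This is a minimal sketch under stated assumptions: the `name slots=N` line format is the one commonly used with Open MPI hostfiles, and the node names, the slot count of 8, and the output name `machines.new` are hypothetical placeholders rather than values from anyone's cluster in this thread.

```python
import re


def render_hostfile(hosts, slots_per_host):
    """Return hostfile text: one 'name slots=N' line per host, LF line endings."""
    lines = []
    for host in hosts:
        # Reject anything that is not a plain host name (stray spaces, CR, etc.).
        if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9.-]*", host):
            raise ValueError(f"unexpected characters in host name: {host!r}")
        lines.append(f"{host} slots={slots_per_host}")
    return "\n".join(lines) + "\n"


def write_hostfile(path, hosts, slots_per_host):
    """Write the hostfile as plain ASCII without a byte order mark or CR characters."""
    text = render_hostfile(hosts, slots_per_host)
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Hypothetical node names; replace them with the real names of the cluster nodes.
    write_hostfile("machines.new", ["node01", "node02", "node03", "node04"], 8)
```

Writing to a new file name first makes it possible to compare the fresh file with the old one before replacing it.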

March 5, 2021, 05:49
#8
kashaf
New Member
Islamabad
Join Date: Mar 2021
Posts: 1
Quote (originally posted by alessio.nz):
Hello, it finally worked; maybe there was a problem in the cluster itself. Anyway, thanks for the help. Regards
Hi, how did you resolve this issue? I am facing the same error.
