|
April 18, 2011, 06:47 |
problems after decomposing for running in parallel
|
#1 |
Member
Alex
Join Date: Apr 2010
Posts: 48
Rep Power: 16 |
Hello, I had a mesh with the decomposeParDict included and I could use this file for running in parallel without problems. The mesh split well and the run was perfect (this file is set up to split my domain across more than one node of the cluster - each node has 8 cores, so for example I can run on 4 nodes = 32 cores).
I wanted to use the same file for another mesh, and the splitting of the domain into the 32 processors apparently finished without errors:

Code:
Number of processor faces = 50892
Max number of processor patches = 8
Max number of faces between processors = 9008

Processor 0: field transfer
Processor 1: field transfer
Processor 2: field transfer
Processor 3: field transfer
Processor 4: field transfer
Processor 5: field transfer
Processor 6: field transfer
Processor 7: field transfer
Processor 8: field transfer
Processor 9: field transfer
Processor 10: field transfer
Processor 11: field transfer
Processor 12: field transfer
Processor 13: field transfer
Processor 14: field transfer
Processor 15: field transfer
Processor 16: field transfer
Processor 17: field transfer
Processor 18: field transfer
Processor 19: field transfer
Processor 20: field transfer
Processor 21: field transfer
Processor 22: field transfer
Processor 23: field transfer
Processor 24: field transfer
Processor 25: field transfer
Processor 26: field transfer
Processor 27: field transfer
Processor 28: field transfer
Processor 29: field transfer
Processor 30: field transfer
Processor 31: field transfer

End.

Then I tried to run with foamJob -p simpleFoam and it gives the following error:

Code:
Executing: mpirun -np 32 -hostfile system/machines /cvos/shared/apps/OpenFOAM/OpenFOAM-1.7.1/bin/foamExec simpleFoam -parallel > log 2>&1

[user@cluster]$ tail -f log
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Do you know what it could be? I attach the file to this message. |
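For context, a decomposeParDict that splits a case into 32 subdomains generally has this shape (the hierarchical method and the (4 4 2) split below are only an illustration, not necessarily the settings in the attached file):

Code:
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains  32;

method              hierarchical;

hierarchicalCoeffs
{
    n               (4 4 2);    // 4 x 4 x 2 = 32 subdomains
    delta           0.001;
    order           xyz;
}

distributed         no;
roots               ( );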
|
April 18, 2011, 07:33 |
|
#2 |
Senior Member
Steven van Haren
Join Date: Aug 2010
Location: The Netherlands
Posts: 149
Rep Power: 16 |
It seems like the MPI call generated by the foamJob script is not correct (I am missing the file that specifies the machines).
Read section 3.4 in the user guide and try to run MPI directly, without using the foamJob script. |
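The manual call typically has this shape (the solver name, the process count and the name of the machines file are placeholders here, take the exact syntax from the guide):

Code:
# decompose the case first, then launch the solver directly through mpirun
decomposePar
mpirun --hostfile machines -np 32 simpleFoam -parallel > log 2>&1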
|
April 18, 2011, 10:46 |
|
#3 |
Member
Alex
Join Date: Apr 2010
Posts: 48
Rep Power: 16 |
This is the command I used:
Code:
mpirun --hostfile system/machines -np 32 SimpleFoam -parallel

and this is what I got:

Code:
--------------------------------------------------------------------------
Open RTE detected a parse error in the hostfile:
    system/machines
It occured on line number 1 on token 1.
--------------------------------------------------------------------------
[elmo:11368] [[22308,0],0] ORTE_ERROR_LOG: Error in file base/ras_base_allocate.c at line 236
[elmo:11368] [[22308,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 72
[elmo:11368] [[22308,0],0] ORTE_ERROR_LOG: Error in file plm_rsh_module.c at line 990
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished |
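For reference, Open MPI expects the hostfile to contain nothing more than one node name per line, optionally followed by a slot count, for example (the node names here are just placeholders):

Code:
node01 slots=8
node02 slots=8
node03 slots=8
node04 slots=8

Anything else on the first line (a leftover header, stray characters, Windows line endings) could produce exactly this "parse error ... on line number 1" message.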
|
April 18, 2011, 10:58 |
|
#4 |
Senior Member
Steven van Haren
Join Date: Aug 2010
Location: The Netherlands
Posts: 149
Rep Power: 16 |
Somehow it is not happy with your machines file. Are you sure you set the right names for the remote nodes in the "machines" file?
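A quick way to check is to try reaching each node under the exact name written in the file, for example (replace node01 with one of your entries):

Code:
ssh node01 hostname

If that fails or hangs for any of the names, mpirun will not be able to start its remote daemons either.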
|
|
April 18, 2011, 11:11 |
|
#5 |
Member
Alex
Join Date: Apr 2010
Posts: 48
Rep Power: 16 |
Yes, I am sure. I was working with another mesh and it worked perfectly. The problem is that with this different one the splitting seems OK, but once I run it, it crashes with the errors I mentioned.
|
|
April 20, 2011, 09:44 |
Re:
|
#6 |
Member
Alex
Join Date: Apr 2010
Posts: 48
Rep Power: 16 |
Hello, finally it worked; maybe there was a problem with the cluster itself. Anyway, thanks for the help. Regards
|
|
December 23, 2015, 15:27 |
|
#7 |
New Member
alireza
Join Date: Jul 2010
Posts: 12
Rep Power: 16 |
There is something wrong in the hostname file, as Steven mentioned. Sometimes, even if you copy a working file for a new run, it will not work. I suggest that you create another hostname file from scratch. I just had the same problem when running a system that had worked perfectly before. I simply wrote the machine names again and it works now. |
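When rewriting it, it can also help to check the old file for invisible characters, which is one common reason a copied file suddenly stops parsing (this is a generic check, not OpenFOAM specific):

Code:
cat -A system/machines    # shows hidden characters; a ^M at line ends means Windows line endings
dos2unix system/machines  # converts the line endings if that is the case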