Remote cluster parallel solve without master, Ansys CFX 14.5 |
|
May 6, 2013, 17:19 |
|
#1 |
New Member
Andres M. Aguirre Mesa
Join Date: May 2013
Posts: 1
Rep Power: 0 |
Hi.
I'm currently trying to configure Ansys CFX 14.5 to run on a Linux cluster (Rocks 6.1). I've completed the installation process, including setting the environment variables. Communication via SSH is working, and I'm using Platform MPI. I have configured the hostinfo.ccl file, and I'm even able to run in distributed parallel mode using this syntax: Code:
cfx5solve -def input.def -start-method "Platform MPI Distributed Parallel" -par-dist "master,node01*2,node02*2"
The problem is that I'm not allowed to run on the master node, because the cluster belongs to the university I work for. I've tried to trick CFX by using, for example, "master*0" or by removing master from the host list, but the program fails with the following message:
Unable to find the master host cluster.domain.edu in the host list: at least one partition must be assigned to the master host.
I've also tried launching the run from node01, but I got something like this:
MPI Application rank 0 exited before MPI_Finalize() with status 2
An error has occurred in cfx5remote on compute-2-0.local: /share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was interrupted by signal TERM (15)
An error has occurred in cfx5remote on compute-2-0.local: /share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was interrupted by signal TERM (15)
An error has occurred in cfx5remote on compute-2-0.local: /share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was interrupted by signal TERM (15)
An error has occurred in cfx5remote on compute-2-1.local: /share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was interrupted by signal TERM (15)
An error has occurred in cfx5remote on compute-2-1.local: /share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was interrupted by signal TERM (15)
An error has occurred in cfx5solve: The ANSYS CFX solver could not be started, or exited with return code 255. No results file has been created.
Running on at least one processor of the master is our last option. Users are allowed to log in to the master node and launch programs from it, but are not allowed to use the master's processors.
I also configured APDL, and there I'm able to do something similar to the above using this syntax: Code:
ansys145 -dis -b -machines compute-2-0:2:compute-2-1:2 < input.dat > output.out
Is it possible to do something similar with CFX?
Regards, A. Aguirre. |
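The two host-list formats quoted in this post differ only in their separators: APDL's -machines takes host:count pairs joined by colons, while CFX's -par-dist takes host*count entries joined by commas. As a rough sketch (the function name machines_to_pardist is made up for illustration, and this only rewrites the string, it does not change which hosts CFX accepts), the conversion could look like this:

```shell
#!/bin/sh
# Hypothetical helper: convert an APDL-style -machines list
# (host:n:host:n) into a CFX-style -par-dist list (host*n,host*n).
machines_to_pardist() {
    # Split the argument on ':' and pair each host with the count after it.
    printf '%s\n' "$1" | awk -F: '{
        out = ""
        for (i = 1; i + 1 <= NF; i += 2) {
            out = out (out == "" ? "" : ",") $i "*" $(i + 1)
        }
        print out
    }'
}

# Host names taken from the APDL command in the post above.
machines_to_pardist "compute-2-0:2:compute-2-1:2"
```

For the list above this prints compute-2-0*2,compute-2-1*2, which is the form -par-dist expects.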
|
August 27, 2013, 07:34 |
|
#2 |
New Member
Anonymous
Join Date: Aug 2013
Posts: 2
Rep Power: 0 |
Hello!
I just installed Rocks 6.1 on a small cluster to run Ansys CFX and I am having the same problem: how to set up parallel runs without the head node. Did you ever find a solution to this?
Best regards, John |
|
August 28, 2013, 13:55 |
|
#3 |
Senior Member
Bruno
Join Date: Mar 2009
Location: Brazil
Posts: 277
Rep Power: 21 |
CFX requires that the computer you're logged in to be part of the simulation. There is a way to do what you want, called indirect start, but it involves editing some of the files from the CFX setup ('CFX/etc/start-methods.ccl') plus writing some scripts. It can be done, but it's a hassle, so skip it.
Instead, just send the solver command through SSH to one of the nodes that belong to the simulation. You'll need an additional option (-chdir) directing CFX to run the solver in a specified path, though, or else it'll run in your home directory. Your command line will be something like this: Code:
ssh node01 cfx5solve -def input.def -chdir /path/to/deffile -start-method \"Platform MPI Distributed Parallel\" -par-dist \"node01*2,node02*2\" -batch <other_options>
That works fine (I use it here), as long as you've got SSH configured not to ask for passwords (which you probably already have). Cheers |
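For repeated runs, the command above can be assembled by a small script instead of being typed by hand. The following is only a sketch under a few assumptions: the function names par_dist and remote_solve_cmd are invented here, every node gets the same number of partitions, and the first host in the list is used as the SSH target. It prints the command rather than running it, so it can be inspected first.

```shell
#!/bin/sh
# Sketch: build (but do not run) the SSH command that starts cfx5solve
# on a compute node. Function names are hypothetical.

# par_dist N host1 host2 ...  ->  "host1*N,host2*N"
par_dist() {
    n=$1; shift
    list=""
    for h in "$@"; do
        list="${list:+$list,}$h*$n"
    done
    printf '%s\n' "$list"
}

# remote_solve_cmd DEF_FILE RUN_DIR PAR_DIST
# Assumption: the first host in PAR_DIST is also the host we SSH to,
# so the launching machine is part of the simulation.
remote_solve_cmd() {
    def=$1; dir=$2; dist=$3
    first=${dist%%\**}   # drop everything from the first '*' onward
    # The \" sequences survive the local shell so that the remote shell
    # sees the quoted arguments, as in the command posted above.
    printf 'ssh %s cfx5solve -def %s -chdir %s -start-method \\"Platform MPI Distributed Parallel\\" -par-dist \\"%s\\" -batch\n' \
        "$first" "$def" "$dir" "$dist"
}

# Example values copied from the post; adjust to the actual cluster.
dist=$(par_dist 2 node01 node02)
remote_solve_cmd input.def /path/to/deffile "$dist"
```

Once the printed line looks right, it can be run with eval "$(remote_solve_cmd input.def /path/to/deffile "$dist")". Running ssh -o BatchMode=yes node01 true beforehand is one way to confirm that the login works without a password prompt.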
|
August 29, 2013, 02:34 |
Thanks, that worked!
|
#4 | |
New Member
Anonymous
Join Date: Aug 2013
Posts: 2
Rep Power: 0 |
Hello, Bruno!
The method you suggested works very well. Thank you! On our setup the head node is the only node that can see the license server, so with the above method the run gets its licenses through the head node and then performs the parallel computation on the compute nodes only. This is exactly what I was looking for. Great!
We want to avoid using the head node in the calculations because it is presumably slower than the compute nodes (the head node has one E5-2620 @ 2.0 GHz versus dual E5-2643 @ 3.3 GHz on the compute nodes). We have not been able to do much testing yet, but we suspect that using cores from the head node in the calculations might slow things down. We have set the Relative Speed settings in hostinfo.ccl, but still think it is wise to avoid using the slower cores. Do you know if this actually makes a difference? Anyway, not using any cores on the head node leaves it more resources to keep Ganglia running smoothly and to keep the disk system happy.
Speaking of Ganglia, do you know a way to keep the head node out of its reporting and resource-use displays? The head node is primarily there to help us interface with and utilize the cluster, so I think it is a bit odd that Ganglia by default includes the head node's CPU cores in the statistics and reporting. Not a huge problem, but...
Now I just need to get the InfiniBand network up to speed...
Best regards, John |
||
January 27, 2015, 10:15 |
|
#5 |
New Member
Felipe Mendes
Join Date: Mar 2012
Location: Lausanne
Posts: 11
Rep Power: 14 |
Dear friends,
I have a similar problem. We've got a cluster with 3 nodes: 1 master node with 12 procs and 2 computing nodes with 16 procs. I've tried to launch CFX only on the computing nodes using the syntax posted earlier in this thread, but it doesn't find the license. If I put at least one proc on the master node, it runs fine. The license server is the master node. According to the ".out" file, the computing nodes seem to find the license server, but they search for an acfd_cfx license for the solver (which they do not find) instead of the correct name, "acfx_cfx_solver". Does anyone know what the problem may be? Thanks! |
|
Tags |
cfxsolver, clusters, distributed parallel, without master |