March 10, 2010, 15:39 |
Fluent jobs through pbs
|
#1 |
New Member
Ibad Kureshi
Join Date: Mar 2010
Posts: 5
Rep Power: 16 |
Hi all,
I am a system administrator at a UK university, and we received a request to provide an HPC resource for Fluent. Our cluster runs CentOS 5.4 and has 16 compute nodes with ppn=4, plus a head node. After installing Fluent on it, we were able to start Fluent via the terminal and submit parallel jobs through the shell with a journal file and the -g switch.

The University has 45 licenses for Fluent 6.3.26 and 30 licenses for an older version, 6.0/2?? (not sure which). These licenses reside on a Windows server running FLEXlm, which also serves floating licenses for many other software packages.

When I submit a job through the job scheduler (PBS), the simulation does not run because of a license problem, even though Fluent seems to be looking in the right place. I have posted this on a PBS/TORQUE forum as well, but since it deals with Fluent I thought users here might be better placed to help. I would appreciate any help with this.

Below are the PBS submission script, the journal file, the PBS output file and the PBS error file, respectively.

______________________________
PBS Submission Script
______________________________
#!/bin/bash
#
# Example PBS script to run a job on the myrinet-3 cluster.
# The lines beginning #PBS set various queuing parameters.
#PBS -m e
# o -N Job Name
#PBS -N fluent
#PBS -M sengik@hud.ac.uk
#
# o -l resource lists that control where job goes
# here we ask for 3 nodes, each with the attribute "p4".
#PBS -l nodes=3
#
# o Where to write output
# asd PBS -e stderr
# asd PBS -o stdout
#
# o Export all my environment variables to the job
#PBS -V
#
fluent 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in

______________________________
Journal File
______________________________
file/read-case /home/sengik/Desktop/test/2dcar_10.cas
solve/initialize/initialize-flow
file/write-data /home/sengik/Desktop/test/2dcar_10.dat
exit
yes

As you can see, this is just a simple case of load, initialise, save and exit.

______________________________
PBS Output File
______________________________
/usr/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in
Loading "/usr/Fluent.Inc/fluent6.3.26/lib/fluent.dmp.114-32"
Done.
/usr/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2d -pethernet -host -alnx86 -t3 -mpi=hp -path/usr/Fluent.Inc -ssh -cx node16.testbed-CLS:56711:56126
Server node is down or not responding
See the system adminstrator about starting the server, or make sure the you're referring to the right host (see LM_LICENSE_FILE)
Feature: fluent
Hostname: mech1
License path: 7241@mech1:/usr/Fluent.Inc/license/lnx86/../license.dat
FLEXlm error: -96,7. System Error: 11 "Resource temporarily unavailable"
For further information, refer to the FLEXlm End User Manual, available at "www.macrovision.com".

______________________________
PBS Error File
______________________________
/usr/Fluent.Inc/fluent6.3.26/bin/fluent: line 2397: glxinfo: command not found
/usr/Fluent.Inc/fluent6.3.26/cortex/lnx86/cortex.3.7.3 -f fluent -g -i /home/sengik/Desktop/test/input.in (fluent "2d -pethernet -host -alnx86 -r6.3.26 -t3 -mpi=hp -path/usr/Fluent.Inc -ssh")
Starting /usr/Fluent.Inc/fluent6.3.26/lnx86/2d_host/fluent.6.3.26 host -cx node16.testbed-CLS:56711:56126 "(list
(rpsetvar (QUOTE parallel/function) "fluent 2d -node -alnx86 -r6.3.26 -t3 -pethernet -mpi=hp -ssh")
(rpsetvar (QUOTE parallel/rhost) "")
(rpsetvar (QUOTE parallel/ruser) "")
(rpsetvar (QUOTE parallel/nprocs_string) "3")
(rpsetvar (QUOTE parallel/auto-spawn?) #t)
(rpsetvar (QUOTE parallel/trace-level) 0)
(rpsetvar (QUOTE parallel/remote-shell) 1)
(rpsetvar (QUOTE parallel/path) "/usr/Fluent.Inc")
(rpsetvar (QUOTE parallel/hostsfile) "")
)"
Welcome to Fluent 6.3.26
Copyright 2006 Fluent Inc.
All Rights Reserved
Loading "/usr/Fluent.Inc/fluent6.3.26/lib/flprim.dmp.1119-32"
Done.
Unexpected license problem; exiting.
______________________________________

The simple line:

fluent 2d -g -ssh -t3 -cnf=<hostfile> -i /home/sengik/Desktop/test/input.in

works perfectly fine when run directly from the shell.

EDIT: PBS allocates the nodes correctly and is working fine. The simulation just ends when the license error occurs.

Thanks in advance for the help.
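Since the same command works from a shell when it is given `-cnf=<hostfile>`, one thing worth trying inside the job script is to hand Fluent the node list that PBS/TORQUE writes to `$PBS_NODEFILE`, and to set `LM_LICENSE_FILE` explicitly instead of relying only on `#PBS -V`. The script below is a sketch under those assumptions, not a confirmed fix: the `7241@mech1` value is copied from the license path in the output file, and the helper name `build_fluent_cmd` is made up for illustration. It will not help if the compute nodes have no network route to the license server.

```shell
#!/bin/bash
#PBS -N fluent
#PBS -l nodes=3
#PBS -V

# Assumption: port@host of the FLEXlm server, copied from the PBS output file.
export LM_LICENSE_FILE="${LM_LICENSE_FILE:-7241@mech1}"

# Hypothetical helper: print the Fluent command line for a given host file
# and journal file, so it can be inspected before it is run.
build_fluent_cmd() {
    local hostfile="$1" journal="$2" nprocs
    # One Fluent process per non-empty line of the host file.
    nprocs=$(grep -c . "$hostfile")
    echo "fluent 2d -g -ssh -t${nprocs} -cnf=${hostfile} -i ${journal}"
}

# Only launch when running under PBS, where $PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cmd=$(build_fluent_cmd "$PBS_NODEFILE" /home/sengik/Desktop/test/input.in)
    echo "Running: $cmd"
    $cmd   # relies on word splitting; paths must not contain spaces
fi
```

Printing the command before running it makes it easy to compare the scheduler run with the interactive run that is known to work.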
|
March 10, 2010, 16:03 |
Cleaned Up the Script
|
#2 |
New Member
Ibad Kureshi
Join Date: Mar 2010
Posts: 5
Rep Power: 16 |
I just realised that the script might be misleading, as it is a sample one I usually use to experiment. I have cleaned it up:
#!/bin/bash
#PBS -S /bin/bash
#PBS -m e
#PBS -M sengik@hud.ac.uk
#PBS -N fluent
#PBS -l nodes=3
#
#PBS -e stderr
#PBS -o stdout
#
#PBS -V
#
fluent 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in
|
March 11, 2010, 06:37 |
|
#3 |
New Member
Ibad Kureshi
Join Date: Mar 2010
Posts: 5
Rep Power: 16 |
I was just wondering: would we need fluent-par licenses to submit a job like this through PBS?
When I submit from the command line and specify multiple processors, Fluent simply takes as many licenses as it needs from the pool. But I was reading on another forum that when run through PBS, Fluent looks for fluent-par licenses. Is this really the case? Can't I just specify which pool it should take licenses from?
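One way to see which license feature a run actually checks out is to query the FLEXlm server with `lmutil lmstat` while a job is active, once for `fluent` and once for `fluent-par`. The sketch below only prints the two queries so they can be reviewed first. The server spec `7241@mech1` is taken from the error output earlier in the thread, the helper name `lmstat_query` is hypothetical, and it assumes `lmutil` is on the `PATH` of the machine where the queries are run.

```shell
#!/bin/bash

# Hypothetical helper: print an lmstat query for one feature on one server.
# -c selects the license server (port@host), -f restricts output to a feature.
lmstat_query() {
    local server="$1" feature="$2"
    echo "lmutil lmstat -c ${server} -f ${feature}"
}

# Print both queries; run them by hand while a Fluent job is active.
for feature in fluent fluent-par; do
    lmstat_query 7241@mech1 "$feature"
done
```

Comparing the usage counts for a command-line run and a PBS run should show whether the scheduler path really asks for a different feature.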
|
June 8, 2011, 12:17 |
|
#4 | |
Senior Member
|
Quote:
Dear Mr. Kureshi,

If you are accessing a license server on a remote Windows machine, you must specify its IP address in the /etc/hosts file of the Linux node. I haven't tried this myself, but one of my friends did.

Before that, I recommend that you ping the remote PC (ping <IP remote PC>, for example ping 192.168.1.1) to check whether your Linux PC can see it on the network.

Then, in /etc/hosts, add a line of the form

192.168.1.1 <license server name> <domain name>

for example

192.168.1.1 node00 abc.ac.uk abc.ac.uk

Then try to run a serial solver first, without a job scheduler, to see whether it picks up the license. If that is OK, go for parallel without the job scheduler, and finally parallel with the job scheduler. I hope it will work.
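The checks suggested here (name resolution first, then reachability) can be scripted so that the same test is run on the head node and on a compute node. This is a rough sketch: the function names are hypothetical, `7241@mech1` is the server spec from the error output in the first post, and `timeout` may not be available on older distributions, in which case it can be dropped. A successful connection to this port does not guarantee a checkout, because the FLEXlm vendor daemon listens on a separate port.

```shell
#!/bin/bash

# Hypothetical helper: split a FLEXlm "port@host" spec into "host port".
parse_license_spec() {
    local spec="$1"
    echo "${spec#*@} ${spec%%@*}"
}

# Hypothetical helper: check name resolution and TCP reachability.
check_license_server() {
    local host port
    read -r host port <<< "$(parse_license_spec "$1")"
    # Does the name resolve (via /etc/hosts or DNS)?
    getent hosts "$host" > /dev/null \
        || { echo "cannot resolve $host"; return 1; }
    # Can a TCP connection be opened to the license port? (bash /dev/tcp)
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null \
        || { echo "cannot connect to $host:$port"; return 2; }
    echo "$host:$port reachable"
}

# Example: run on the head node, then on a compute node, and compare.
# check_license_server 7241@mech1
```

If the head node reports the server as reachable and a compute node does not, the problem is in routing between the cluster network and the license server rather than in Fluent or PBS.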
||
June 9, 2011, 13:02 |
|
#5 |
New Member
Ibad Kureshi
Join Date: Mar 2010
Posts: 5
Rep Power: 16 |
Hi Shamoon,
This was sorted a while ago. We have a DNS server that would translate the license server name. What I had not done was enable IP forwarding on the head node of the cluster. The job scheduler makes a node the simulation controller which books out the license. So when I would run directly it would work but through the script it would not work. It was just a matter of enabling IPTABLES and setting the appropriate rules.

Thanks for your input though. If you would like any help do let me know.

Regards
Ibad Kureshi
Lecturer: Department of Engineering and Technology
Administrator: High Performance Computing - Resource Centre
Postgraduate Researcher: School of Computing and Engineering
Canal Side East 2/13
University of Huddersfield
Queensgate
Huddersfield
HD13DH
t: 01484 422288 ext 1855
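The post does not list the actual rules, so the following is only a guess at what "enabling IP forwarding and setting the appropriate rules" might look like on a head node that acts as a NAT gateway for the compute nodes. The interface names (`eth1` for the cluster-private network, `eth0` for the campus network) and the helper name `print_forwarding_rules` are assumptions. The function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/bash

# Hypothetical helper: print the commands that would let compute nodes reach
# hosts outside the cluster through the head node. Interface names are
# assumptions and must be adapted to the actual cluster.
print_forwarding_rules() {
    local internal_if="${1:-eth1}"   # assumed: cluster-private network
    local external_if="${2:-eth0}"   # assumed: campus network
    cat <<EOF
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -o ${external_if} -j MASQUERADE
iptables -A FORWARD -i ${internal_if} -o ${external_if} -j ACCEPT
iptables -A FORWARD -i ${external_if} -o ${internal_if} -m state --state ESTABLISHED,RELATED -j ACCEPT
EOF
}

# Review the output first; to apply it, run as root:
#   print_forwarding_rules eth1 eth0 | sh
print_forwarding_rules "$@"
```

For this to have any effect, the compute nodes would also need a route to the license server that goes through the head node, and the rules would need to be saved to survive a reboot.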
|
June 9, 2011, 14:43 |
|
#6 | |
Senior Member
|
Quote:
Thanks, Mr. Kureshi. I assume that you are from Pakistan, same as I am.

In my case I am facing a problem with Fluent 6.3. I have installed it on multiple nodes (workstations), and when I try to run in parallel, after a few thousand iterations the head node reports "connection to license server lost". I don't know why this happens. I have a license installed on each node because I sometimes have to use them individually. I have ssh properly configured on each node and a password-less environment set up. Note that I do not use a job scheduler.
||