|
June 29, 2017, 09:00 |
MPI Distributed parallel
|
#1 |
Senior Member
Jiri
Join Date: Mar 2014
Posts: 221
Rep Power: 13 |
I use ANSYS 18.1.
- When using IBM MPI distributed parallel on 1 computer (32 cores), an iteration takes 30 minutes.
- When using IBM MPI distributed parallel on 2 computers (40 cores), an iteration again takes 30 minutes.
- When using IBM MPI distributed parallel on 3 computers (46 cores), an iteration again takes 30 minutes.
- When using Intel MPI local parallel on 1 computer (32 cores), an iteration takes 11 minutes. Note that the 32-core computer has AMD processors, not Intel!
- Using Intel MPI distributed parallel does not work at all.
Does anyone know why an IBM MPI distributed parallel iteration takes the same time even when 3 computers are included? Where is the problem? Thank you for any reply. |
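For reference, a quick way to quantify the scaling problem is to compute speedup and parallel efficiency from the per-iteration times reported above. This is only a rough sketch (the choice of the 32-core IBM MPI run as the baseline is an assumption; the core counts and times are taken from the post):
Code:
# Rough speedup/efficiency check from the per-iteration times reported above.
# Baseline: 32 cores on one machine with IBM MPI (30 min per iteration).
baseline_cores, baseline_time = 32, 30.0  # minutes per iteration

runs = {
    "IBM MPI, 1 node, 32 cores": (32, 30.0),
    "IBM MPI, 2 nodes, 40 cores": (40, 30.0),
    "IBM MPI, 3 nodes, 46 cores": (46, 30.0),
    "Intel MPI, 1 node, 32 cores": (32, 11.0),
}

for name, (cores, minutes) in runs.items():
    speedup = baseline_time / minutes
    # Efficiency relative to the number of cores used, normalised to the baseline.
    efficiency = speedup / (cores / baseline_cores)
    print(f"{name}: speedup x{speedup:.2f}, parallel efficiency {efficiency:.0%}")

An efficiency well below 100% for the multi-node runs points at a communication or configuration bottleneck rather than a lack of cores.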
|
June 29, 2017, 19:20 |
|
#2 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,872
Rep Power: 144 |
You need to consider where the bottleneck is:
* The network system (a rough point-to-point check is sketched below)
* The computers themselves
* Whether the MPI distribution is running properly on your hardware and network
* Whether the hardware has the correct motherboard, BIOS and driver versions
* Whether the simulation you are running is limited by CPU, hard drive, memory or something else
* Other software and/or users on the machines, including virus checkers
* Relative speed differences between the MPI implementations
* Relative speed differences between the different machines
Optimising distributed parallel simulations is a complex task. |
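To rule the raw network in or out, the link between two nodes can be tested outside ANSYS. The sketch below is a hypothetical helper (plain Python sockets, arbitrary port 5201): run it in server mode on one node, then in client mode on the other, and compare the reported throughput against what the hardware should deliver.
Code:
import socket, sys, time

# Minimal point-to-point network check: run "python netcheck.py server" on one
# node, then "python netcheck.py client <server_ip>" on the other.
PORT, CHUNK, TOTAL_MB = 5201, 64 * 1024, 256

if sys.argv[1] == "server":
    srv = socket.socket()
    srv.bind(("", PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    print("client connected from", addr)
    received = 0
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        received += len(data)
    conn.close()
    print(f"received {received / 1e6:.0f} MB")
else:
    host = sys.argv[2]
    cli = socket.socket()
    t0 = time.time()
    cli.connect((host, PORT))
    print(f"connect latency ~{(time.time() - t0) * 1000:.1f} ms")
    payload = b"x" * CHUNK
    t0 = time.time()
    for _ in range(TOTAL_MB * 1024 * 1024 // CHUNK):
        cli.sendall(payload)
    cli.close()
    elapsed = time.time() - t0
    print(f"throughput ~{TOTAL_MB * 8 / elapsed:.0f} Mbit/s")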
|
June 30, 2017, 06:43 |
|
#3 |
Senior Member
Jiri
Join Date: Mar 2014
Posts: 221
Rep Power: 13 |
Thank you for the reply. Yes, it is a very complex task.
I have another question: is it possible to meaningfully reduce the iteration time by choosing another partitioning type? For example, using Circumferential partitioning instead of the MeTiS type? |
|
July 1, 2017, 08:08 |
|
#4 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,872
Rep Power: 144 |
The ideal partition has small interface areas with the other partitions, because data has to be transferred across these boundaries; making them smaller therefore increases parallel efficiency. So if another partitioning method generates better partition shapes than METIS, it will improve the run time.
But in my experience, for most cases the simulation speed differences between different partitionings are minimal. The partitions need to be really bad before they make a difference. If you have multiple domains you might consider the coupled partitioner, which takes the multiple domains into account better. Also, some simulations don't like nasty features aligning with partition boundaries (e.g. shock waves, free surfaces). |
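To make the interface-area argument concrete, here is a toy calculation (plain arithmetic on an idealised unit-cube mesh, not anything from the ANSYS partitioners) comparing the total internal interface area produced by slicing the cube into n slabs versus splitting it into n roughly cubic blocks:
Code:
# Toy comparison of interface area for two ways of splitting a unit cube
# into n partitions: thin slabs vs. roughly cubic blocks.
# Less interface area means less data exchanged between partitions each iteration.

def slab_interface_area(n):
    # n slabs stacked along one axis -> (n - 1) internal faces of area 1 x 1
    return n - 1

def block_interface_area(n):
    # Idealised k x k x k split with k = n^(1/3): (k - 1) internal planes
    # per axis, each of area 1 x 1.
    k = round(n ** (1.0 / 3.0))
    return 3 * (k - 1)

for n in (8, 27, 64):
    print(f"{n} partitions: slabs -> {slab_interface_area(n)} units of interface area, "
          f"cubic blocks -> {block_interface_area(n)} units")

The gap grows quickly with partition count, which is why graph-based partitioners such as METIS aim for compact partitions; in practice, as noted above, most reasonable partitionings are already close enough that the runtime difference is small.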
|
August 2, 2017, 01:26 |
|
#5 |
Member
Join Date: Jan 2015
Posts: 62
Rep Power: 11 |
I've done similar benchmark studies, distributing CFX across multiple high-end workstations linked with dual 10 Gb Ethernet. My simulation time stopped improving after adding a third computer. A true Linux cluster with no IT security software and a 56 Gb InfiniBand interconnect is the way to go. Also, Platform MPI is worse than Intel MPI, so try to get Intel distributed parallel fixed: turn on verbose mode by adding -v in startmethods.ccl and see if you can find the error that way. |
|
August 3, 2017, 11:49 |
|
#6 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
I've run up to 5 machines with 20 Gbps DDR InfiniBand and IBM Platform MPI and seen linear speedup. Granted, I was only running ~4 cores per machine, but with fast cores and fast memory (4.2 GHz CPUs and quad-channel 2133 MHz RAM).
I agree that Intel MPI is the default and is supposed to be better, but I could never get it to work. |
|
August 3, 2017, 13:10 |
|
#7 |
Member
Join Date: Jan 2015
Posts: 62
Rep Power: 11 |
I've had issues with Intel MPI too. Did you cache the username and password for all the machines? Also, I think startmethods.ccl had a line in it that was freezing things for distributed Intel MPI, which I had to delete. Unfortunately, I've not found ANSYS tech support to be very helpful on any of these issues. |
|
August 8, 2017, 09:04 |
|
#8 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
Thanks for the advice, Christophe. I did cache a username and password on the head node. I can't remember if I did it on all nodes or not; probably not, as with IBM MPI you only have to do it on the head node. I'll make sure to cache it on at least 2 nodes and see if those 2 work in distributed mode.
Actually, I believe I did get Intel MPI working, but only over TCP/IP over InfiniBand (IPoIB); it wouldn't work over native InfiniBand. I might give it another shot if I ever get less swamped at work (I wish). |
|
September 24, 2017, 08:07 |
|
#9 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
I think it's better to post my question here rather than creating a new thread.
First of all, I use ANSYS Mechanical (FEA), not CFX. I am trying to set up a distributed network, with no results so far. Here is my configuration:
- Node 1: Supermicro, 2x Xeon X5670, 16 GB of DDR3 ECC RAM, Windows 10
- Node 2: HP ProLiant DL180 G6, 2x Xeon X5670, 96 GB of DDR3 ECC RAM, Windows 10
Both systems should be compatible: same CPUs, same chipset, same type of RAM (DDR3 ECC), same OS, and the same installation path for ANSYS and Intel MPI.
Right now I have the two systems connected together through Gigabit Ethernet to test connectivity. I have already ordered a pair of InfiniBand 4x QDR cards and a suitable cable.
I have followed all the instructions found here: https://www.sharcnet.ca/Software/Ans..._setupdan.html
Regarding MPI, I installed the software on both systems and finally cached my Windows password on both systems. The Windows account name and password are the same on the two systems; only the computer name (host) is different: Hostname1=SUPERMICRO, Hostname2=HP.
I populated the hostnames in ANS_ADMIN 18.1 successfully (see attached pic). I did the same on both machines!
Then I go to the Mechanical APDL Product Launcher (from the SUPERMICRO machine) to run an official benchmark. See the attached pictures for the File Management and HPC tabs.
Then I click "Run", but nothing happens. I think I have done all the necessary steps, but nothing happens. Can anyone help? Thanks in advance! |
|
September 25, 2017, 10:50 |
|
#10 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
- You have to have the same working directory on both systems. Make sure the full folder path is present on both systems and solve from that directory. On all my machines I use G:\D_ANSYS\ANSYS_WD, which are my RAID SSD drives. These working directories must be shared folders so you can access them remotely.
- You must have remote desktop privileges on all nodes.
- You have to share the installation directory with all nodes, usually C:\Program Files\ANSYS Inc.
- Did you set the machine type to winx64 when you configured the hosts file?
- You have to allow firewall exceptions for all the correct programs; see the ANSYS installation documentation for a complete list. Something like all of the exe files in:
C:\Program Files\ANSYS Inc\v181\commonfiles\MPI\IBM\9.1.4.2\winx64\bin
C:\Program Files\ANSYS Inc\v181\commonfiles\MPI\IBM\9.1.4.2\winx64\sbin
For CFX, the solver MPI executables in:
C:\Program Files\ANSYS Inc\v181\CFX\bin\winnt-amd64
C:\Program Files\ANSYS Inc\v181\CFX\bin\winnt-amd64\double
Allow them on all networks, including public networks if your InfiniBand network is public (see the sketch below for one way to script this).
Check your working directory to see if you got any error clues. |
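This is not an official ANSYS tool, just a hedged sketch of one way to script the firewall exceptions mentioned above: it walks the directories listed in the post (default v18.1 install paths; adjust for your setup) and prints a netsh advfirewall command for each executable it finds. Review the output, then run the commands from an elevated prompt.
Code:
import os

# Directories listed in the post above (default ANSYS 18.1 install paths).
dirs = [
    r"C:\Program Files\ANSYS Inc\v181\commonfiles\MPI\IBM\9.1.4.2\winx64\bin",
    r"C:\Program Files\ANSYS Inc\v181\commonfiles\MPI\IBM\9.1.4.2\winx64\sbin",
    r"C:\Program Files\ANSYS Inc\v181\CFX\bin\winnt-amd64",
    r"C:\Program Files\ANSYS Inc\v181\CFX\bin\winnt-amd64\double",
]

for d in dirs:
    if not os.path.isdir(d):
        print(f"rem skipping missing directory: {d}")
        continue
    for name in os.listdir(d):
        if name.lower().endswith(".exe"):
            exe = os.path.join(d, name)
            # Inbound allow rule for this executable on all firewall profiles.
            print(f'netsh advfirewall firewall add rule name="ANSYS {name}" '
                  f'dir=in action=allow program="{exe}" profile=any')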
|
September 25, 2017, 16:30 |
|
#11 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
Hello Erik and many thanks for your help.
I've done most of what you recommended above, but no result yet.
I went through ANSYS License Management Center -> View Licensing Interconnect Log on the SUPERMICRO machine and found this interesting message: "Unable to retrieve IP for host HP" (see the attached pic for the full message).
I also attach a picture of the manual IP configuration I did on the machines: I set the SUPERMICRO's IP to 192.168.1.1 and the HP's to 192.168.1.2. I have left the DNS fields blank, though... The local network works as it should, but it seems that the SUPERMICRO node cannot connect to the HP node in ANSYS.
UPDATE:
nslookup on SUPERMICRO:
Default Server: 192.168.2.1
Address: 192.168.2.1
nslookup on HP:
DNS request timed out; timeout was 2 seconds.
Default Server: UNKNOWN
Address: fec0:0:0:ffff::1
Weird... but I feel I am pretty close... |
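A quick way to check whether this is purely a name-resolution problem, independent of ANSYS, is to test resolution of each machine's name from the other machine. The sketch below is a hypothetical helper (the hostnames and static IPs are the ones mentioned in this thread); if a hostname fails to resolve while its IP works, adding both machines to C:\Windows\System32\drivers\etc\hosts on each node is a common fix.
Code:
import socket

# Names and static IPs as described in this thread; adjust to your setup.
targets = ["SUPERMICRO", "HP", "192.168.1.1", "192.168.1.2"]

for t in targets:
    try:
        ip = socket.gethostbyname(t)   # resolves via the hosts file and DNS
        print(f"{t}: resolves to {ip}")
    except socket.gaierror as err:
        print(f"{t}: NOT resolvable ({err})")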
|
September 25, 2017, 17:17 |
|
#12 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
I also attach screenshots of ipconfig /all for both machines.
|
|
September 26, 2017, 11:52 |
|
#13 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
You have to do ALL of what I said, not most of it; those steps all have to be done, among other things.
Did you set the environment variable for the license path? If both machines get their license from the same license server, you don't have to do this. Otherwise you have to make the slave node's path point to the head node's license; specifics are in the documentation. ANSYSLIC_DIR on the slave node must point to the head node's licensing directory. Also, when you specify hosts in the ANSYS launcher, make sure the head node is at the top, or it won't work, but it looks like you did that. Try reinstalling IBM MPI, or you could try Intel MPI. |
|
September 26, 2017, 11:57 |
|
#14 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
Hi Erik, how can I set the environment variables?
|
|
September 26, 2017, 12:02 |
|
#15 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
ANSYS181_DIR=\\head_node_machine_name\ANSYS Inc\v181\ansys
ANSYSLIC_DIR=\\head_node_machine_name\ANSYS Inc\Shared Files\Licensing
Start >> right-click on Computer >> Properties >> Advanced system settings >> Advanced tab >> Environment Variables, then set these as system variables. |
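As a quick sanity check after setting them (a minimal sketch, not an ANSYS utility; the variable names follow the post above, for 18.1 the version-specific install variable being ANSYS181_DIR, and the values are whatever UNC paths you configured), you can confirm from the slave node that the variables are visible and that the shared paths actually resolve:
Code:
import os

# Version-specific install variable for 18.1 plus the licensing variable.
for var in ("ANSYS181_DIR", "ANSYSLIC_DIR"):
    path = os.environ.get(var)
    if path is None:
        print(f"{var} is not set (note: new system variables need a fresh session)")
    elif os.path.isdir(path):
        print(f"{var} = {path} (reachable)")
    else:
        print(f"{var} = {path} (NOT reachable - check the share name and permissions)")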
|
September 26, 2017, 12:07 |
|
#16 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
Maybe this helps.
I attach a screenshot of the license status page from the HP (slave) node. It says that the head node's license server is down or not reporting. The inverse appears in the head node's license status. |
|
September 26, 2017, 12:32 |
|
#17 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
Can you do a local run at all? Is your license down completely?
Did you share out privileges to the ANSYS install path? Make it a shared folder. |
|
September 26, 2017, 12:43 |
|
#18 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
I set the environment variables as you suggested. The installation folder is shared. Something strange is happening too: I go again to the ANSYS License Manager and get this error: "ERROR: Unable to retrieve IP for host THIS. Host not found: THIS". The host is named HP, not THIS... It's driving me crazy... |
|
September 26, 2017, 15:17 |
|
#19 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
Try using the static IPs instead of the computer names.
Turn off the firewall completely during testing, just to see whether it is the firewall that is blocking you. |
|
July 22, 2019, 09:58 |
parallel processing
|
#20 |
New Member
Alireza
Join Date: Jul 2019
Posts: 5
Rep Power: 7 |
Question:
Hello. Dear friends, we have several idle systems that we want to use, together with our network of other active systems, to run four heavy software applications with parallel processing and maximum processing power, in order to save time and money. The question is: is there any software that can handle all of the above?
Best regards. |
|