Slow performance on Azure cloud

#1 | feacluster (Faraz) | August 14, 2018, 17:34
I am testing the 1.7-million-cell motorbike tutorial on Azure. Long story short, it solves in 190 seconds on one 36-CPU machine (F72s_v2). When I run it on two machines (72 CPUs), it takes 600 seconds!

When I run top on both machines, I see 32 simpleFoam threads running at 100%, so it seems to be working correctly.

Is the network too slow for MPI? I know I have run OpenFOAM on a cluster with a 10 GigE network and it seemed to scale well.

I am also aware there are HPC specific nodes on Azure. But I want to get my feet wet with the basics before trying those.
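
For reference, the two timings above can be turned into a relative speedup and a parallel efficiency. This is a minimal sketch that assumes the single-machine run is the baseline; the function names are illustrative, not part of any OpenFOAM tooling.

```python
def speedup(t_ref, t_new):
    """Speedup of the new run relative to the reference run."""
    return t_ref / t_new


def parallel_efficiency(t_ref, cores_ref, t_new, cores_new):
    """Speedup divided by the factor by which the core count grew."""
    return speedup(t_ref, t_new) / (cores_new / cores_ref)


# Timings quoted in the post: 190 s on 36 cores, 600 s on 72 cores.
s = speedup(190.0, 600.0)
e = parallel_efficiency(190.0, 36, 600.0, 72)
print(f"speedup: {s:.2f}, efficiency: {e:.0%}")
```

A speedup below 1 means the two-machine run is slower than the single-machine run, which is what the timings show.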

#2 | simrego | August 15, 2018, 04:46
Hi!


Isn't 72 cores too many for 1.7 million cells? You will have more processor faces than internal ones. What if you try with 17 million cells? (But I think 72 cores would still be a lot for 17 million cells.)

#3 | feacluster (Faraz) | August 15, 2018, 10:59
Quote (originally posted by simrego):
Hi!
Isn't 72 cores too many for 1.7 million cells? You will have more processor faces than internal ones. What if you try with 17 million cells? (But I think 72 cores would still be a lot for 17 million cells.)

The same model scales well up to 160 cores on my bare-metal cluster. Beyond 160 cores I see minimal speedup.

Last edited by feacluster; August 15, 2018 at 14:04.

#4 | feacluster (Faraz) | August 20, 2018, 13:19
I ran the Ohio State benchmarks to measure latency on Azure:

http://mvapich.cse.ohio-state.edu/benchmarks/

I got around 60 microseconds, which is quite high. On our InfiniBand cluster I get 1-2 microseconds. I suppose that is the reason for the slowdown when using multiple nodes?

I know people run OpenFOAM on Amazon AWS. I am not sure what their latency numbers are; I would imagine they are in the same ballpark?

#5 | wyldckat (Bruno Santos) | August 20, 2018, 15:57
Quick answer: 10 GigE is known to have considerably higher latency than InfiniBand. 60 µs vs 1-2 µs sounds about right.


Searching online for "azure infiniband" does return several answers, namely how to use Infiniband on Azure.

#6 | Christophe | August 20, 2018, 22:54
I have a bunch of high-end workstations with a 10 GigE interconnect and did a lot of benchmarking. Once I added a third machine to go above 50 cores, the returns diminished sharply. I also have a cluster with 56 Gbit/s InfiniBand, and it scaled perfectly with the same model I tested on the desktops.

I would definitely stay away from 10 GigE if possible.

I think a lot of instances on Azure have at least 25 Gbps.

#7 | akidess (Anton Kidess) | August 21, 2018, 03:41
Quote (originally posted by Christophe):
I would definitely stay away from 10 GigE if possible.

I think a lot of instances on Azure have at least 25 Gbps.

They could have 10000 Gbps and still show bad scaling. It's the latency that counts.
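
A simple latency-plus-bandwidth cost model puts rough numbers on this point. The sketch below is an idealization, not a measurement: the 8-byte message size (one double, as in a global reduction) and the 1.5 µs InfiniBand latency (the midpoint of the 1-2 µs reported earlier in the thread) are assumptions, and the function name is illustrative.

```python
def message_time_us(size_bytes, latency_us, bandwidth_gbps):
    """Estimated one-way message time in microseconds: latency + size / bandwidth."""
    transfer_us = (size_bytes * 8) / (bandwidth_gbps * 1e9) * 1e6
    return latency_us + transfer_us


SMALL_MESSAGE_BYTES = 8  # assumed: a single double-precision value

# (latency in microseconds, bandwidth in Gbit/s) for each hypothetical link
cases = {
    "Ethernet, 60 us, 10 Gbps": (60.0, 10.0),
    "Ethernet, 60 us, 10000 Gbps": (60.0, 10000.0),
    "InfiniBand, 1.5 us, 56 Gbps": (1.5, 56.0),
}

for label, (latency_us, bandwidth_gbps) in cases.items():
    t = message_time_us(SMALL_MESSAGE_BYTES, latency_us, bandwidth_gbps)
    print(f"{label}: {t:.4f} us")
```

In this model, a thousandfold increase in bandwidth changes the small-message time by less than 0.01 µs, while the lower latency cuts it by a factor of roughly 40.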

#8 | feacluster (Faraz) | August 23, 2018, 14:41
I re-ran my OpenFOAM benchmark on the Azure HPC compute instances (H16r). The performance scaled well, the same as on my bare-metal cluster.

These are 16-core machines with an InfiniBand network. The catch is that they are only compatible with Intel MPI, so you need your own cluster edition of the Intel compiler. I was able to compile on my bare-metal cluster and copy the executable to Azure, and it ran fine.

#9 | hypersonic | November 20, 2018, 10:21
Hi

I'm running ANSYS on Azure (8 × H16r), but can only get RDMA InfiniBand at 1X DDR (5 Gbit/s).
Has anyone had any luck getting 4X FDR (56 Gbit/s) to work with Intel MPI, as they say is possible on the Azure HPC page?

Thanks

#10 | feacluster (Faraz) | November 20, 2018, 21:35
Quote (originally posted by hypersonic):
Hi

I'm running ANSYS on Azure (8 × H16r), but can only get RDMA InfiniBand at 1X DDR (5 Gbit/s).
Has anyone had any luck getting 4X FDR (56 Gbit/s) to work with Intel MPI, as they say is possible on the Azure HPC page?

Thanks

I am not an expert in InfiniBand, but I always try to check the basics first. Try running these and report the results:


http://mvapich.cse.ohio-state.edu/benchmarks/


  • osu_bw - Bandwidth Test
  • osu_latency - Latency Test
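
When comparing runs across machines, it can help to pull the numbers out of the benchmark output programmatically. The sketch below assumes the usual two-column text layout of `osu_latency` (comment lines starting with `#`, then message size and latency in microseconds); the function name is illustrative and the sample values are placeholders, not measurements.

```python
def parse_osu_latency(text):
    """Return a list of (message_size_bytes, latency_us) tuples from benchmark output."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            results.append((int(fields[0]), float(fields[1])))
        except ValueError:
            # Ignore anything that is not a size/latency pair.
            continue
    return results


# Placeholder output in the assumed format (values are made up for illustration).
sample = """# OSU MPI Latency Test
# Size          Latency (us)
0                      60.10
1                      60.35
8                      61.02
"""

rows = parse_osu_latency(sample)
smallest_size, smallest_latency = rows[0]
print(f"latency at {smallest_size} bytes: {smallest_latency} us")
```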

#11 | fertinaz (Fatih Ertinaz) | November 22, 2018, 14:24
Hello

As Faraz has already pointed out, the performance bottleneck you encounter is most likely due to network latency.

Quote:
Anyone had any luck getting 4XFDR (56Gbit) to work with Intel MPI - as they say is possible on the Azure HPC page?
InfiniBand switches are not directly available on Azure, even if you use H-type instances, afaik. Is there a report where they achieve that performance? Instead, RDMA is used, which should be good enough. There are a few things you can check:
  • Set fabrics to dapl: Assuming that you're using Intel MPI, set the DAPL provider env-var (I_MPI_DAPL_PROVIDER in Intel MPI) to "ofa-v2-ib0"
  • Increase debug level to see which fabrics are being used
  • Run ifconfig: Make sure eth is utilized
  • Check rdma conf file: Grep file "/etc/rdma/dat.conf" and see if ib0 is referenced to eth
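
The first, second and last checks in the list above could be scripted roughly as follows. This is a sketch under assumptions: it uses the Intel MPI environment variables `I_MPI_FABRICS`, `I_MPI_DAPL_PROVIDER` and `I_MPI_DEBUG` as found in Intel MPI releases that still support the DAPL fabric, the debug level of 5 is an arbitrary choice, and both function names are hypothetical helpers.

```python
import os


def intel_mpi_dapl_env(provider="ofa-v2-ib0", debug_level=5, base_env=None):
    """Return a copy of the environment with DAPL-related Intel MPI settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["I_MPI_FABRICS"] = "shm:dapl"        # shared memory within a node, DAPL between nodes
    env["I_MPI_DAPL_PROVIDER"] = provider    # provider name must match an entry in dat.conf
    env["I_MPI_DEBUG"] = str(debug_level)    # higher levels report the fabric selected at startup
    return env


def find_dat_conf_entries(dat_conf_text, provider="ofa-v2-ib0"):
    """Return the non-comment dat.conf lines whose first field is the provider name."""
    entries = []
    for line in dat_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.split()[0] == provider:
            entries.append(stripped)
    return entries


if __name__ == "__main__":
    dat_conf_path = "/etc/rdma/dat.conf"  # path mentioned in the post
    if os.path.exists(dat_conf_path):
        with open(dat_conf_path) as handle:
            for entry in find_dat_conf_entries(handle.read()):
                print(entry)
    else:
        print(f"{dat_conf_path} not found on this machine")
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the MPI job, so the settings apply only to that run.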

Hope this helps

#12 | jairoandres (Jairo A. Gutiérrez S) | November 25, 2019, 12:14
I think you should limit to 50k cells per processor.
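
Applied to the case in this thread, that rule of thumb gives the following numbers. The sketch reads the rule as "keep at least about 50k cells per core", which is an interpretation of the post; the function names are illustrative.

```python
def cells_per_core(total_cells, cores):
    """Average number of cells each core handles after decomposition."""
    return total_cells / cores


def max_cores(total_cells, min_cells_per_core=50_000):
    """Largest core count that keeps every core at or above the cell threshold."""
    return max(1, total_cells // min_cells_per_core)


TOTAL_CELLS = 1_700_000  # motorbike case size quoted in the first post

for cores in (36, 72):
    print(f"{cores} cores: {cells_per_core(TOTAL_CELLS, cores):,.0f} cells per core")

print(f"cores at 50k cells per core: {max_cores(TOTAL_CELLS)}")
```

By this estimate the 72-core run falls well below the threshold, while the 36-core run sits close to it.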