Slow performance on Azure cloud

feacluster · August 14, 2018, 17:34

I am testing out the 1.7 million cell motorbike tutorial on Azure. Long story short, it solves in 190 seconds on one 36 cpu machine ( F72s_v2 ). When I run it on two machines ( 72 cpus ), it runs in 600 seconds!

When I do top on both machines, I see 32 threads of simplefoam running at 100% . So it seems to be working correctly.

Is the network too slow for mpi ? I know I have run openfoam on a cluster with 10gigE network and it seemed to scale well..

I am also aware there are HPC specific nodes on Azure. But I want to get my feet wet with the basics before trying those.

simrego · August 15, 2018, 04:46

Hi!

That 72 cores is not too much for 1.7 million cells??? You will have more processor faces than internal... What if you try with 17 million cells? (But i think 72 cores will be still a lot for 17 million cells)

feacluster · August 15, 2018, 10:59

Quote:

Originally Posted by simrego

Hi!
That 72 cores is not too much for 1.7 million cells??? You will have more processor faces than internal... What if you try with 17 million cells? (But i think 72 cores will be still a lot for 17 million cells)

The same model scales well upto 160 cores on my bare-metal cluster . After 160 cores I see minimal speedup..

feacluster · August 20, 2018, 13:19

I ran the Ohio State benchmarks to measure latency on Azure:

http://mvapich.cse.ohio-state.edu/benchmarks/

I got around 60 micro seconds which is quite high . For our infiniband cluster I get 1-2 micro seconds. I suppose that is the reason for my slowdown when using multiple nodes?

I know people run Openfoam on Amazon aws. I am not sure how their latency numbers are. I would imagine they are in the same ballpark?

wyldckat · August 20, 2018, 15:57

Quick answer: 10 GigE is known to have a considerably larger latency than Infiniband. 60 ms vs 1-2 ms sound about right.

Searching online for "azure infiniband" does return several answers, namely how to use Infiniband on Azure.

Christophe · August 20, 2018, 22:54

I have a bunch of high end workstations with a 10 GigE interconnect. Did a lot of benchmarking. Once I added a third machine to go above 50 cores there was a huge diminishing return. I also have a cluster w 56 GB infiniband and it scaled perfectly with the same model I tested on the desktops.

Would def stay away from 10 GigE if possible.

I think a lot of instances on Azure have at least 25 Gbps.

akidess · August 21, 2018, 03:41

Quote:

Originally Posted by Christophe

Would def stay away from 10 GigE if possible.

I think a lot of instances on Azure have at least 25 Gbps.

They could have 10000 Gbps and still show bad scaling. It's the latency that counts.

feacluster · August 23, 2018, 14:41

I re-ran my Openfoam benchmark on the Azure HPC compute instances ( H16r ). The perfomance scaled well, same as my bare-metal cluster.

These are 16 core machines with infiniband network. The catch is they are only compatible with Intel MPI . So you need your own cluster edition Intel compiler. I was able to compile on my bare-metal cluster and copy the executable to Azure and it ran fine..

hypersonic · November 20, 2018, 10:21

Hi

I'm running ANSYS on Azure (8*H16r), but can only get RDMA inifiband at 1XDRR (5Gbit)
Anyone had any luck getting 4XFDR (56Gbit) to work with Intel MPI - as they say is possible on the Azure HPC page?

Thanks

feacluster · November 20, 2018, 21:35

Quote:

Originally Posted by hypersonic

Hi

I'm running ANSYS on Azure (8*H16r), but can only get RDMA inifiband at 1XDRR (5Gbit)
Anyone had any luck getting 4XFDR (56Gbit) to work with Intel MPI - as they say is possible on the Azure HPC page?

Thanks

I am not any expert in Infiniband, but I always try and check the basics first. Try running this and report the results:

http://mvapich.cse.ohio-state.edu/benchmarks/

osu_bw - Bandwidth Test
osu_latency - Latency Test

fertinaz · November 22, 2018, 14:24

Hello

As Faraz has already pointed out, performance bottleneck you encounter should be due to the network latency.

Quote:

Anyone had any luck getting 4XFDR (56Gbit) to work with Intel MPI - as they say is possible on the Azure HPC page?

Infiniband switches are not directly available on Azure even if you use H-type instances, afaik. Is there a report where they achieve that performance? Instead, RDMA is used which should be fine enough. There are a few things you can check:

Set fabrics to dapl: Assuming that you're using IntelMPI, set env-var to "ofa-v2-ib0"
Increase debug level to see which fabrics are being used
Run ifconfig: Make sure eth is utilized
Check rdma conf file: Grep file "/etc/rdma/dat.conf" and see if ib0 is referenced to eth

Hope this helps

jairoandres · November 25, 2019, 12:14

I think you should limit to 50k cells per processor.

August 14, 2018, 17:34	Slow performance on Azure cloud	#1
feacluster New Member Faraz Join Date: Mar 2018 Posts: 25 Rep Power: 8	I am testing out the 1.7 million cell motorbike tutorial on Azure. Long story short, it solves in 190 seconds on one 36 cpu machine ( F72s_v2 ). When I run it on two machines ( 72 cpus ), it runs in 600 seconds! When I do top on both machines, I see 32 threads of simplefoam running at 100% . So it seems to be working correctly. Is the network too slow for mpi ? I know I have run openfoam on a cluster with 10gigE network and it seemed to scale well.. I am also aware there are HPC specific nodes on Azure. But I want to get my feet wet with the basics before trying those.

August 20, 2018, 15:57		#5
wyldckat Retired Super Moderator Bruno Santos Join Date: Mar 2009 Location: Lisbon, Portugal Posts: 10,981 Blog Entries: 45 Rep Power: 128	Quick answer: 10 GigE is known to have a considerably larger latency than Infiniband. 60 ms vs 1-2 ms sound about right. Searching online for "azure infiniband" does return several answers, namely how to use Infiniband on Azure. __________________ OpenFOAM: FAQ \| Getting started Forum: How to get help, to post code/output and forum guide Read this before sending me PM

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Inconsistencies in reading .dat file during run time in new injection model	Scram_1	OpenFOAM	0	March 23, 2018 23:29
GAMBIT + win7 = Very slow performance	rieuk	ANSYS Meshing & Geometry	3	September 6, 2017 03:41
Error during reconstructing lagarangian fields	ybapat	OpenFOAM	9	November 17, 2014 08:52
problem with solving lagrange reaction cloud	Polli	OpenFOAM Running, Solving & CFD	0	April 30, 2014 08:53
Running OpenFoam on a Computer Cluster in the Cloud - cloudnumbers.com	Markus Schmidberger	OpenFOAM Announcements from Other Sources	0	July 26, 2011 09:18

August 15, 2018, 04:46		#2
simrego Senior Member anonymous Join Date: Jan 2016 Posts: 416 Rep Power: 14	Hi! That 72 cores is not too much for 1.7 million cells??? You will have more processor faces than internal... What if you try with 17 million cells? (But i think 72 cores will be still a lot for 17 million cells)

August 20, 2018, 13:19		#4
feacluster New Member Faraz Join Date: Mar 2018 Posts: 25 Rep Power: 8	I ran the Ohio State benchmarks to measure latency on Azure: http://mvapich.cse.ohio-state.edu/benchmarks/ I got around 60 micro seconds which is quite high . For our infiniband cluster I get 1-2 micro seconds. I suppose that is the reason for my slowdown when using multiple nodes? I know people run Openfoam on Amazon aws. I am not sure how their latency numbers are. I would imagine they are in the same ballpark?

August 20, 2018, 22:54		#6
Christophe Member Join Date: Jan 2015 Posts: 62 Rep Power: 11	I have a bunch of high end workstations with a 10 GigE interconnect. Did a lot of benchmarking. Once I added a third machine to go above 50 cores there was a huge diminishing return. I also have a cluster w 56 GB infiniband and it scaled perfectly with the same model I tested on the desktops. Would def stay away from 10 GigE if possible. I think a lot of instances on Azure have at least 25 Gbps.

August 23, 2018, 14:41		#8
feacluster New Member Faraz Join Date: Mar 2018 Posts: 25 Rep Power: 8	I re-ran my Openfoam benchmark on the Azure HPC compute instances ( H16r ). The perfomance scaled well, same as my bare-metal cluster. These are 16 core machines with infiniband network. The catch is they are only compatible with Intel MPI . So you need your own cluster edition Intel compiler. I was able to compile on my bare-metal cluster and copy the executable to Azure and it ran fine..

November 20, 2018, 10:21		#9
hypersonic New Member hypersonic Join Date: Mar 2009 Posts: 5 Rep Power: 17	Hi I'm running ANSYS on Azure (8*H16r), but can only get RDMA inifiband at 1XDRR (5Gbit) Anyone had any luck getting 4XFDR (56Gbit) to work with Intel MPI - as they say is possible on the Azure HPC page? Thanks

November 25, 2019, 12:14		#12
jairoandres Member Jairo A. Gutiérrez S Join Date: Nov 2014 Posts: 60 Rep Power: 12	I think you should limit to 50k cells per processor.