September 8, 2011, 12:22 |
48 Core Cluster - GigE Network
#1
Member
I have a 48-core cluster made up of 4 servers, each with dual 6-core Intel CPUs, on a GigE network. The OS is Windows HPC Server 2008 R2 and the CFD software is Fluent v13.
When I run a parallel job on 24 cores, everything is great: CPU and network usage are both very high, and 100 iterations take 20 minutes. When I use 36 cores, both CPU and network usage drop to near nothing, and it takes 6 hours for 100 iterations. We have fixed all configuration issues, and each server is now identical in drivers and config. Every benchmark I can find published on the web for GigE stops at 24 cores. Is GigE just not capable of handling MPI between more than 24 cores?
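One way to separate the network from Fluent here is a bare MPI timing test run across 2, 3, and then 4 of the servers. Below is a minimal sketch in C (not Fluent's own communication pattern; the buffer size and iteration count are arbitrary assumptions), built against the MS-MPI SDK and launched with mpiexec:

Code:
/* allreduce_scaling.c - rough check of collective MPI performance over GigE.
 * Run across 2, 3, then 4 servers and compare the reported time per call.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 131072;      /* ~1 MB of doubles per rank (assumption) */
    const int iters = 200;     /* number of timed calls (assumption)     */
    int rank, size, i;
    double *in, *out, t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    in  = (double *)malloc(n * sizeof(double));
    out = (double *)malloc(n * sizeof(double));
    for (i = 0; i < n; i++) in[i] = (double)rank;

    /* warm up once so connection setup is not included in the timing */
    MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++)
        MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d ranks: %.3f ms per MPI_Allreduce\n",
               size, 1000.0 * (t1 - t0) / iters);

    free(in); free(out);
    MPI_Finalize();
    return 0;
}

If the time per call grows roughly in proportion to the rank count when the third and fourth servers join, the GigE fabric itself is fine and the problem sits in the Fluent/MS-MPI layer; if it falls off a cliff the same way the Fluent job does, the network is the first suspect.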
September 9, 2011, 14:24
#2
Senior Member
Robert
Join Date: Jun 2010
Posts: 117
Rep Power: 17
We used to have a 64-core (16*4) system with Gig-E, and it worked OK when running all cores, if not exactly linearly (we were using STAR-CD at the time). I forget the exact numbers, but say 75% parallel efficiency.
We now have a bigger cluster with InfiniBand, and that does scale better. Does it matter how you distribute the 36 cores among the 48 available? It seems strange that the CPUs and network go to near zero; could you have some hardware or cabling issues? Does it matter which 24 cores you pick, or which machines they are on? Do you run bonded Gig-E, which would double your nominal throughput? Do you get a choice of which MPI you run? On the STAR series of codes, HP-MPI seems to work best and is the most controllable. Are you running hyperthreading? Just some thoughts; I too hate these types of problem.
September 9, 2011, 14:50
#3
Member
I previously ran a series of tests along the lines you describe: using different servers, checking server config, etc. We have 4 identical servers, each configured identically, from network mappings to hardware drivers. There was one server on which hyperthreading had to be disabled, but that was corrected before I ran the tests.
Each server is dual socket, with a 6-core Intel Xeon in each socket. Any combination of 24 cores is OK, but any combination of 32 cores is really bad. The MPI on Windows HPC Server 2008 R2 is MS-MPI ("msmpi"), so I don't doubt that it could be the issue. The network is capable of passing enough data: with 24 cores, each server passes ~5 Mb/sec, but with 32 cores it drops to <500 kb/sec. From what I have read, it has to do with latency, not bandwidth: GigE latency is on the order of 50 microseconds, while InfiniBand is on the order of 1 microsecond. I guess I'll just have to wait for Ansys tech support to let me know what performance they get.
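For reference, the raw point-to-point latency between two of the servers can be measured directly with a standard MPI ping-pong loop rather than inferred from published figures. A minimal sketch (the 1-byte message size and repeat count are arbitrary choices), run with one rank on each of two servers:

Code:
/* pingpong.c - measure one-way MPI latency between two ranks.
 * Run with exactly 2 ranks, one per server, to see the GigE latency.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;   /* repeat count (assumption) */
    char byte = 0;
    int rank, i;
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency: %.1f microseconds\n",
               1e6 * (t1 - t0) / (2.0 * iters));

    MPI_Finalize();
    return 0;
}

Numbers around the 50 microsecond mark quoted above would be normal for GigE; anything far above that would point at the NIC, driver, or switch rather than at GigE as a class.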
Tags
cluster, fluent, hpc