CFX scalability with MPI

December 8, 2016, 16:02   #1
evan247 | New Member | Join Date: Jan 2015 | Posts: 29
For those who have been running CFX with the 'local parallel MPI' option: how well does CFX scale for you?

I'm currently using a Xeon E5 v3 workstation with 2 CPUs, each with 8 cores (or 16 with hyper-threading). My experience running steady single-stage turbine simulations with mesh sizes from 2M up to 8M elements is that 16 cores is barely faster than 8 cores, while 8 cores is ~60% faster than 4 cores. The speed-up on 8 cores is roughly 6x compared to serial mode.
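
As a rough sanity check on those numbers, here is an Amdahl's law fit (a back-of-the-envelope sketch that assumes ideal cores and no memory-bandwidth limits): the 6x speed-up on 8 cores implies a parallel fraction of about 95%, which would cap even 16 ideal cores at roughly 9x.

Code:
# Back-of-the-envelope Amdahl's law fit to the observed speed-up.
# S(n) = 1 / ((1 - p) + p / n), where p is the parallel fraction.
observed_speedup, cores = 6.0, 8

# Solving S(cores) = observed_speedup for p:
p = (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / cores)
print(f"implied parallel fraction: {p:.3f}")  # ~0.952

for n in (4, 8, 16, 32):
    s = 1.0 / ((1.0 - p) + p / n)
    print(f"{n:3d} cores -> predicted speed-up {s:.1f}x")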

I'm just wondering whether people see a similar level of scalability, and whether there is any general advice for improving the speed-up when running with more partitions? Later on I need to run URANS with a ~40M mesh, and it would be good to keep the run time down.

So far the only idea I've come up with is to keep the job on the same socket, i.e. using one CPU, to avoid the communication cost between sockets.
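
On Linux, one way to sketch that is below (hedged: the core numbering assumes cores 0-7 sit on the first socket, which you should verify with lscpu first, and run.def is a placeholder file name):

Code:
# Hypothetical launcher sketch: pin an 8-partition local run to the
# first socket's cores. Child processes inherit the affinity mask,
# so the solver stays on socket 0. Linux-only (sched_setaffinity).
import os
import subprocess

SOCKET0_CORES = set(range(8))           # assumed: physical cores of CPU 0
os.sched_setaffinity(0, SOCKET0_CORES)  # applies to us and our children

# "run.def" is a placeholder; -part sets the number of partitions.
subprocess.run(["cfx5solve", "-def", "run.def", "-part", "8"], check=True)

numactl --cpunodebind=0 --membind=0 in front of the solver command would do a similar job without a wrapper, and also keeps the memory allocations on the same socket.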

December 8, 2016, 17:58   #2
Glenn Horrocks (ghorrocks) | Super Moderator | Sydney, Australia | Join Date: Mar 2009 | Posts: 17,872
* Turn hyper-threading off. It does not help CFX simulations; you need to use the physical cores (a quick way to check logical vs. physical core counts is sketched at the end of this post).
* Xeon workstations will lose considerable parallel performance to bottlenecks such as the memory bus. If you see a 6x speed-up on 8 cores you are doing pretty well; that is about as good as you are going to get.
* Distributed parallel speed-up is better, because you get not just multiple cores but also multiple memory buses and the rest of the supporting hardware.
* But at around 8 to 16 cores in distributed parallel you will start to see scaling problems if you are using Ethernet. You will need to consider a high-speed interconnect such as InfiniBand.
* If you are looking at large systems (a few hundred cores or more), the design of these systems is very complex. To get good performance you need to get many factors right; you can't just buy lots of workstations and hook them up, or your speed-up will be terrible. An investment that big requires careful design and testing to ensure it works well.

Note that none of my comments above are specific to CFX. These factors are common to any software running on multiple cores, so the issue is not unique to CFX.
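
Here is the quick core-count check mentioned above (a sketch; psutil is a third-party package and an assumption on my part, installed with pip install psutil):

Code:
# Compare logical and physical core counts. os.cpu_count() reports
# logical cores, so it alone cannot tell you if hyper-threading is on.
import os
import psutil  # third-party: pip install psutil

logical = os.cpu_count()
physical = psutil.cpu_count(logical=False)
print(f"logical cores:  {logical}")
print(f"physical cores: {physical}")
if logical and physical and logical > physical:
    print("hyper-threading looks enabled; size runs to physical cores")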

December 8, 2016, 18:33   #3
evan247 | New Member | Join Date: Jan 2015 | Posts: 29
Quote:
Originally posted by ghorrocks:
* But at around 8 to 16 cores in distributed parallel you will start to see scaling problems if you are using Ethernet. You will need to consider a high-speed interconnect such as InfiniBand.
Thanks Glenn. Did you mean I would have scaling problems once the total number of cores across all distributed systems exceeds 16, or 16 per computer?

December 9, 2016, 01:14   #4
Antanas | Senior Member | Join Date: Feb 2011 | Posts: 496
Quote:
Originally posted by evan247:
Thanks Glenn. Did you mean I would have scaling problems once the total number of cores across all distributed systems exceeds 16, or 16 per computer?
Section 16.4 of the CFX Modelling Guide has advice on using CFX in parallel.

December 9, 2016, 01:57   #5
Glenn Horrocks (ghorrocks) | Super Moderator | Sydney, Australia | Join Date: Mar 2009 | Posts: 17,872
Let me clarify: I would expect a distributed parallel run with 16 partitions (either 16 nodes x 1 partition per node or 8 nodes x 2 partitions per node) to start slowing down unless you have a high-speed interconnect.

Put another way: I would expect a distributed parallel run with 8 partitions, as 2 nodes with 4 partitions per node, to start slowing down on Ethernet, as the network speed will be the bottleneck there.
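
To see why the network becomes the bottleneck, here is a rough surface-to-volume estimate (an idealised sketch assuming cubic partitions of a uniform mesh, not a CFX benchmark): each partition computes on its N/P interior cells but exchanges halo data that scales like (N/P)^(2/3), so the communication-to-compute ratio grows as you add partitions, and a slow interconnect gets hit hardest.

Code:
# Idealised surface-to-volume estimate of halo traffic per partition.
# Assumes cubic partitions of a uniform 3D mesh; real partitioners
# and CFX's actual overlap cells will differ.
N = 40e6  # e.g. the ~40M-cell URANS mesh mentioned above

for P in (4, 8, 16, 32, 64):
    interior = N / P                      # cells computed per partition
    halo = 6.0 * interior ** (2.0 / 3.0)  # 6 faces of a cube, side (N/P)^(1/3)
    print(f"P={P:3d}: {interior / 1e6:5.2f}M cells/partition, "
          f"halo/interior ~ {halo / interior:.3f}")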

Disclaimer: It has been a few years since I did parallel benchmarks so my rules of thumb might be a bit out of date.
