
Parallel speedup scales better than number of CPUs


August 20, 2020, 10:00   #1
Member
 
Mike Worth
Join Date: Jun 2019
Posts: 45
I've got a model that has a fairly low mesh cell count (~80k), with a big AMI boundary running up through it that will significantly limit how it can be decomposed. I don't have the knowledge to decide how many CPUs would be best.

As such I decided to do a quick scaling test, where I run the first 1 ms of simulated time over and over with different numbers of CPUs, recording how long each run takes. I also calculated the speed-up, i.e. the 1-CPU time divided by the time for each CPU count.

I ran all of this on an AWS c5a.8xlarge machine (32 virtual CPUs, so 16 physical cores). The results are tabulated below:
Code:
CPUs    Time (s)    Speedup
  1     128.86       1.000
  2      43.68       2.950
  3      32.09       4.015
  4      45.94       2.804
  5      40.26       3.200
  6      23.85       5.402
  7      34.44       3.741
  8      17.60       7.321
  9      21.82       5.905
 10      19.16       6.725
 11      22.15       5.817
 12      20.75       6.210
 13      19.62       6.567
 14      28.57       4.510
 15      36.98       3.484
 16      20.14       6.398
What strikes me as odd is that the 2 and 3 core results suggest a speed-up greater than the added computational power - twice the cores solves the case in roughly a third of the time. Have I missed something, or is there something funny going on with my approach?
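
Just to make the arithmetic explicit: the speed-up column is simply the serial time divided by the parallel time, computed with the same bc call as in the script below.
Code:
# Speed-up = serial execution time / parallel execution time
echo "scale=3; 128.86 / 43.68" | bc    # prints 2.950, i.e. more than the factor of 2 from doubling the cores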

I'm using scotch decomposition, and my (not very polished) script is this:
Code:
maxCpus=16 #Try all CPU counts up to this value
runLength=0.001 #How much simulated time to run for with each CPU count?

. ${WM_PROJECT_DIR:?}/bin/tools/RunFunctions        # Tutorial run functions
solver=$(getApplication)

# Remember the case's original endTime so it can be restored after the scaling test
simFinish=$(foamDictionary system/controlDict -entry endTime -value)
sed -i "/^endTime/c\endTime         $runLength;" system/controlDict

./Allrun.pre

echo "cpuCount	executionTime	SpeedUp" > log.scalingTest

runApplication  $solver
# The last ExecutionTime line in the solver log gives the total run time in seconds
executionTimeSerial=$(grep ExecutionTime log.${solver} | tail -n1 | cut -d' ' -f3)

echo "1	${executionTimeSerial}	1" >> log.scalingTest
echo "Execution Time: $executionTimeSerial s"

mv log.${solver} log.${solver}.1CPUs


for cpuCount in $(seq 2 $maxCpus)
do

  foamDictionary system/decomposeParDict -entry numberOfSubdomains -set $cpuCount
  
  runApplication decomposePar 
  
  # Copy the parallel version of the mesh modifiers file into each processor directory
  find -maxdepth 1 -name "processor*" -type d | while read procDir
  do
      cp include/meshModifiers.parallel $procDir/constant/polyMesh/meshModifiers
  done

  runParallel  $solver

  executionTime=$(grep ExecutionTime log.${solver} | tail -n1 | cut -d' ' -f3)
  # Speed-up relative to the serial run
  speedUp=$(echo "scale=3; $executionTimeSerial / $executionTime" | bc)
  
  echo "${cpuCount}	${executionTime}	${speedUp}" >> log.scalingTest
  echo "Execution Time: $executionTime s"
  echo "Speed Up (over serial): $speedUp"
  
  rm -r processor* log.decomposePar
  mv log.${solver} log.${solver}.${cpuCount}CPUs
  
  
done

# Restore the original endTime (double quotes so $simFinish expands)
sed -i "/^endTime/c\endTime         $simFinish;" system/controlDict

echo "Results:"
cat log.scalingTest
Thanks,
Mike

August 20, 2020, 11:08   #2
Senior Member
 
Gerhard Holzinger
Join Date: Feb 2012
Location: Austria
Posts: 342
If you plot your speed-up vs. the number of CPUs, you will see an initial rise followed by a levelling-off, with quite some noise superimposed.

Why the "noise"? Some numbers of CPUs distribute the load favourably among the CPUs, while other numbers (one more or one less) distribute it less favourably.
[Attached image: cpu_vs_speedUp.png - speed-up vs. number of CPUs, with the x=y line for comparison]
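
If you want to reproduce that plot straight from log.scalingTest, here is a minimal gnuplot sketch (it assumes gnuplot is installed and a display is available; 'every ::1' skips the header line of the file, and plotting x gives the ideal one-to-one line).
Code:
gnuplot -persist -e "set xlabel 'CPUs'; set ylabel 'Speed-up'; plot 'log.scalingTest' every ::1 using 1:3 with linespoints title 'measured', x title 'ideal'"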

August 20, 2020, 11:29   #3
Member
 
Mike Worth
Join Date: Jun 2019
Posts: 45
The initial rise followed by a levelling-off (and, after a while, dropping back down) is exactly what I was expecting. What threw me was the points above the x=y line in your plot.

Is it genuinely the case that for my setup I can expect 2xCPU to run 3 times faster than 1xCPU, or is this output a sign that I've done something silly?

August 20, 2020, 12:35   #4
New Member
 
Wenyuan Fan
Join Date: Mar 2017
Posts: 27
Hi Mike,

Could you please run your simulations for a longer time, say, 10 ms, then calculate the time it takes for the last 1 ms for each simulation?
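
The idea being to exclude the start-up cost (mesh and AMI initialisation, plus the first few expensive time steps) from the comparison. A rough sketch of how the last-1-ms time could be pulled out of the solver log - the "Time = 0.009" pattern and the log name are assumptions based on the script above, so check them against an actual log:
Code:
# Cumulative ExecutionTime at the end of the full 10 ms run
tEnd=$(grep ExecutionTime log.${solver} | tail -n1 | cut -d' ' -f3)

# Cumulative ExecutionTime when the run first reaches 9 ms
# (the "Time = 0.009" pattern depends on timeFormat/timePrecision in controlDict)
t9=$(awk '/^Time = 0.009$/{found=1} found && /^ExecutionTime/{print $3; exit}' log.${solver})

# Wall-clock seconds spent on the last 1 ms of simulated time
echo "scale=3; $tEnd - $t9" | bc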

August 21, 2020, 18:03   #5
Member
 
Patti Michelle Sheaffer
Join Date: Sep 2018
Posts: 55
Is there an auto-partitioner in OpenFOAM? It is sometimes said that a decomposition is best when the number of cells on the partition interfaces is minimal, but that may also depend on the specifics of the flow...


Thanks!


Quote:
Originally Posted by GerhardHolzinger
If you plot your speed-up vs. the number of CPUs, you will see an initial rise followed by a levelling-off, with quite some noise superimposed.

Why the "noise"? Some numbers of CPUs distribute the load favourably among the CPUs, while other numbers (one more or one less) distribute it less favourably.
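
(For reference, the scotch decomposition Mike is already using is exactly this kind of automatic partitioner: it only needs the number of subdomains and tries to minimise the number of faces on processor boundaries. The script above switches the subdomain count with foamDictionary, and selecting the method works the same way - a sketch, not taken from Mike's case:)
Code:
# Select the automatic scotch partitioner and an (arbitrary) subdomain count
# in an existing system/decomposeParDict
foamDictionary system/decomposeParDict -entry method -set scotch
foamDictionary system/decomposeParDict -entry numberOfSubdomains -set 8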

August 21, 2020, 18:30   #6
Senior Member
 
joegi
Join Date: Nov 2009
Location: genoa
Posts: 104
Super-linear speed-up!

Look that up on the web - it is usually explained by cache effects: once the mesh is decomposed, each core's partition fits into the CPU cache, so each core gets through its share faster than one core can get through the whole mesh.
