Parallel speedup scales better than number of CPUs
August 20, 2020, 10:00
#1
Member
Mike Worth
Join Date: Jun 2019
Posts: 45
Rep Power: 7
I've got a model that has a fairly low mesh cell count (~80k), with a big AMI boundary running up through it that will significantly limit how it can be decomposed. I don't have the knowledge to decide how many CPUs would be best.
As such I decided to do a quick scaling test, where I run the first 1 ms of simulated time over and over with different numbers of CPUs, recording how long each run took. I also calculated the speedup, i.e. the ratio of the time for 1 CPU to each time. I ran all of this on an AWS c5a.8xlarge machine (32 virtual CPUs, so 16 proper cores). The results are tabulated below: Code:
CPUs  Time/s  Speedup
1     128.86  1
2     43.68   2.95
3     32.09   4.015
4     45.94   2.804
5     40.26   3.2
6     23.85   5.402
7     34.44   3.741
8     17.6    7.321
9     21.82   5.905
10    19.16   6.725
11    22.15   5.817
12    20.75   6.21
13    19.62   6.567
14    28.57   4.51
15    36.98   3.484
16    20.14   6.398

I'm using scotch decomposition, and my (not very polished) script is this: Code:
maxCpus=16 #Try all CPU counts up to this value
runLength=0.001 #How much simulated time to run for with each CPU count?

. ${WM_PROJECT_DIR:?}/bin/tools/RunFunctions # Tutorial run functions
solver=$(getApplication)

sed -i "/^endTime/c\endTime $runLength;" system/controlDict

./Allrun.pre

echo "cpuCount executionTime SpeedUp" > log.scalingTest

runApplication $solver
executionTimeSerial=$(grep ExecutionTime log.${solver} | tail -n1 | cut -d' ' -f3)
echo "1 ${executionTimeSerial} 1" >> log.scalingTest
echo "Execution Time: $executionTimeSerial s"
mv log.${solver} log.${solver}.1CPUs

for cpuCount in $(seq 2 $maxCpus)
do
    foamDictionary system/decomposeParDict -entry numberOfSubdomains -set $cpuCount
    runApplication decomposePar
    find -maxdepth 1 -name "processor*" -type d | while read procDir
    do
        (cp include/meshModifiers.parallel $procDir/constant/polyMesh/meshModifiers)
    done
    runParallel $solver
    executionTime=$(grep ExecutionTime log.${solver} | tail -n1 | cut -d' ' -f3)
    speedUp=$(echo "scale=3; $executionTimeSerial / $executionTime" | bc)
    echo "${cpuCount} ${executionTime} ${speedUp}" >> log.scalingTest
    echo "Execution Time: $executionTime s"
    echo "Speed Up (over serial): $speedUp"
    rm -r processor* log.decomposePar
    mv log.${solver} log.${solver}.${cpuCount}CPUs
done

sed -i '/^endTime/c\endTime $simFinish;' system/controlDict

echo "Results:"
cat log.scalingTest

Mike
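The speedup column is easier to judge once it is divided by the CPU count, which gives the parallel efficiency; any value above 1 is a super-linear point. A minimal sketch, assuming the three-column layout of log.scalingTest written by the script above (the function name `efficiency_table` is made up for illustration):

```shell
#!/bin/sh
# Append parallel efficiency (speedup / CPU count) to each row of a
# scaling log and mark rows whose efficiency exceeds 1.
# Assumed input: one header line, then "cpuCount executionTime speedUp" rows.
efficiency_table() {
    awk 'NR == 1 { print $0, "Efficiency"; next }
         {
             eff = $3 / $1
             flag = (eff > 1) ? " super-linear" : ""
             printf "%s %s %s %.3f%s\n", $1, $2, $3, eff, flag
         }' "$1"
}
```

Calling `efficiency_table log.scalingTest` after the scaling run prints the original rows with the extra column, so the rows that beat ideal scaling stand out without plotting.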

August 20, 2020, 11:08
#2
Senior Member
Gerhard Holzinger
Join Date: Feb 2012
Location: Austria
Posts: 342
Rep Power: 28
If you plot your speed-up vs. the number of CPUs, you will see an initial rise followed by a levelling-off, with quite some noise superimposed.
Why the "noise"? Some numbers of CPUs distribute the load more favourably among the CPUs, while other numbers (one more or one less) distribute the load more unfavourably.
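One way to check how evenly a given decomposition spreads the load is to compare the largest subdomain with the average one. The sketch below assumes the decomposePar log contains one line per processor of the form `Number of cells = N`; that format and the function name `load_imbalance` are assumptions to verify against your own log.decomposePar (note that the scaling script above deletes that log after each run, so it would need to be kept).

```shell
#!/bin/sh
# Summarise how unevenly cells are spread across processors.
# Assumed input lines (one per processor): "    Number of cells = 1234"
load_imbalance() {
    awk '/^[[:space:]]*Number of cells = / {
             n++; sum += $NF
             if ($NF > max) max = $NF
         }
         END {
             if (n == 0) { print "no processor entries found"; exit 1 }
             mean = sum / n
             # imbalance = largest subdomain relative to the mean (1.0 is ideal)
             printf "procs=%d mean=%.1f max=%d imbalance=%.3f\n", n, mean, max, max / mean
         }' "$1"
}
```

Cell count is only a proxy for load; it ignores, for example, how many processor-boundary faces each subdomain has.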

August 20, 2020, 11:29
#3
Member
Mike Worth
Join Date: Jun 2019
Posts: 45
Rep Power: 7
The initial rise, followed by a levelling off (and after a while dropping back down), is exactly what I was expecting. What threw me were the points above the x=y line that you've plotted.
Is it genuinely the case that for my setup I can expect 2 CPUs to run 3 times faster than 1 CPU, or is this output a sign that I've done something silly?

August 20, 2020, 12:35
#4
New Member
Wenyuan Fan
Join Date: Mar 2017
Posts: 27
Rep Power: 9
Hi Mike,
Could you please run your simulations for a longer time, say, 10 ms, then calculate the time it takes for the last 1 ms for each simulation?
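The time for the last part of a run can be read from the solver log by subtracting the ExecutionTime reported at the chosen simulated time from the final one. A rough sketch, assuming the log contains `Time = <t>` lines each followed later by an `ExecutionTime = <s> s` line (the same field layout the script above relies on); the function name `time_after` is hypothetical:

```shell
#!/bin/sh
# Execution time spent after simulated time tStart, from a solver log.
# Usage: time_after <logfile> <tStart>
# Assumed log lines: "Time = 0.009" and "ExecutionTime = 50 s  ClockTime = 51 s"
time_after() {
    awk -v tStart="$2" '
        /^Time = /          { t = $3 + 0 }
        /^ExecutionTime = / {
            # remember the last reading taken at or before tStart
            if (t <= tStart + 0) base = $3
            last = $3
        }
        END { printf "%.2f\n", last - base }' "$1"
}
```

For a 10 ms run, `time_after log.<solver> 0.009` would give the cost of the last 1 ms. If no step falls at or before tStart, the result is simply the total execution time.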

August 21, 2020, 18:03
#5
Member
Patti Michelle Sheaffer
Join Date: Sep 2018
Posts: 55
Rep Power: 8
Is there an automatic partitioner in OpenFOAM? It is sometimes said that a decomposition is best when the number of cells on the boundaries between partitions is minimal, but that may also depend on the specifics of the flow...
Thanks!

August 21, 2020, 18:30
#6
Senior Member
joegi
Join Date: Nov 2009
Location: genoa
Posts: 104
Rep Power: 17
Super linear speed-up!!!
Look for that on the web.