December 20, 2018, 22:32 | #1
Scalability of OpenFOAM
Senior Member
Peter Shi
Join Date: Feb 2017
Location: Davis
Posts: 102
Rep Power: 9
Hi all,
What is your experience with the scalability of OpenFOAM? Does it scale to more than 1,000 CPU cores for a large-scale simulation? Thank you in advance. Best regards, Peter
December 21, 2018, 04:56 | #2
Senior Member
Cyprien
Join Date: Feb 2010
Location: Stanford University
Posts: 299
Rep Power: 18
Hi Peter,
I would say it depends on the kind of simulations you are interested in. You can have a look at these papers for solver-specific scalability analyses:
https://www.sciencedirect.com/science/article/pii/S0010465514003403
https://www.sciencedirect.com/science/article/pii/S0010465514002719
https://link.springer.com/article/10.1007/s11242-015-0458-0
Cheers, Cyprien
__________________
www.cypriensoulaine.com/openfoam
December 21, 2018, 12:17 | #3
Senior Member
Peter Shi
Join Date: Feb 2017
Location: Davis
Posts: 102
Rep Power: 9
Your papers are very useful. Thank you. Best regards, Peter
August 26, 2019, 05:30 | #4
New Member
Lionel GAMET
Join Date: Nov 2013
Location: Lyon
Posts: 20
Rep Power: 13
Hi everybody,
Has anyone already measured the intranode scalability of OpenFOAM? We have tried to do this on two different machines and obtained the same kind of trend. I've attached the results on Skylake processors as a PDF. Runs are done inside a single node that contains 48 cores, so this shows only intranode scalability. Note that these tests were performed by varying the number of processors in decomposeParDict AND by reserving a full 48-core node each time, so that no other process can perturb the computations. Intranode scalability is a very important measure of performance: it shows the efficiency of OpenFOAM on a given hardware architecture. We have tested different solvers (interFoam, interIsoFoam, pimpleFoam and simpleFoam) and different grid sizes; the PDF attached to this post shows the resulting curves.
Any comments or shared experiences are very welcome. Best
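For anyone wanting to reproduce this kind of intranode test: the procedure Lionel describes amounts to editing numberOfSubdomains in system/decomposeParDict before each run. A minimal sketch, in which the scotch method and the value 24 are illustrative assumptions rather than settings taken from the attached tests:

```
// system/decomposeParDict -- minimal sketch for an intranode scaling study
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains  24;     // varied between 1 and 48 across the runs

method              scotch; // graph partitioning; needs no extra coefficients
```

Each run then repeats decomposePar followed by an mpirun of the chosen solver on a fully reserved node.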
August 26, 2019, 08:01 | #5
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
There are quite a few results in the thread "OpenFOAM benchmarks on various hardware", albeit only for a single case.
Less-than-ideal strong intra-node scaling is to be expected and says nothing about the quality of the parallel implementation. CFD solvers with unstructured meshes all share the same problem: they run into a memory bandwidth bottleneck beyond 2-3 cores per memory channel.
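Alex's rule of thumb can be turned into a quick back-of-the-envelope estimate. A small sketch, where the dual-socket Skylake-SP layout (6 memory channels per socket) is an assumption for illustration, not a figure from this thread:

```python
# Back-of-the-envelope: how many cores a bandwidth-bound CFD solver can
# keep busy, using the "2-3 cores per memory channel" rule of thumb.

def bandwidth_limited_cores(sockets, channels_per_socket, cores_per_channel=2.5):
    """Rough count of usefully busy cores before memory bandwidth saturates."""
    return sockets * channels_per_socket * cores_per_channel

# Assumed node: dual-socket Skylake-SP, 6 memory channels per socket.
effective = bandwidth_limited_cores(sockets=2, channels_per_socket=6)
print(effective)  # 30.0 -- on a 48-core node, scaling flattens well before 48
```

This is consistent with the flattening intranode curves reported above: beyond roughly this core count, adding cores adds little throughput for a memory-bound solver.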
August 26, 2019, 13:17 | #6
Senior Member
Peter Shi
Join Date: Feb 2017
Location: Davis
Posts: 102
Rep Power: 9
Thank you for sharing. While you conducted small-scale tests, I tested OpenFOAM scalability with the number of CPUs varying from 512 to 4096 on KNL nodes. The ideal speedup from 512 to 4096 would be 8, but in reality it is a little below 4, so the corresponding efficiency is below 50%. My mesh has 12 million cells and the solver I used is simpleFoam. Hope it helps. Best, Peter
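Peter's efficiency figure follows directly from the definition of strong-scaling speedup relative to a non-unit baseline. A minimal sketch; the runtimes below are made-up illustrative numbers chosen to reproduce the trend, and only the 512/4096 core counts come from the post:

```python
# Strong-scaling speedup and parallel efficiency relative to a baseline run.

def speedup(t_base, t_n):
    """Speedup of an n-core run relative to the baseline run."""
    return t_base / t_n

def efficiency(t_base, n_base, t_n, n):
    """Parallel efficiency: measured speedup over ideal speedup n / n_base."""
    return speedup(t_base, t_n) / (n / n_base)

# Illustrative (made-up) runtimes reproducing the reported trend:
t_512, t_4096 = 1000.0, 260.0          # seconds for a fixed amount of work
s = speedup(t_512, t_4096)             # ~3.85, "a little below 4"
e = efficiency(t_512, 512, t_4096, 4096)
print(round(s, 2), round(e, 2))        # 3.85 0.48 -- below 50% efficiency
```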
August 26, 2019, 17:28 | #7
New Member
Lionel GAMET
Join Date: Nov 2013
Location: Lyon
Posts: 20
Rep Power: 13
Hi Peter,
What you have observed is absolutely normal when you use too many cores; we have measured it as well. There is in fact an optimum in terms of number of cells per core. If there are too many cells per core, the run is not parallelized enough and performance is poor. If you use too many cores, the run is over-parallelized and you spend all your time in communications, so performance is also poor. I attach a curve plotting the global CPU time per cell per iteration against the number of cells per core. You will see that there is an optimal range: too much parallelization (on the left of the curve) gives bad performance, and not enough parallelization (on the right) also gives bad performance, although the penalty is smaller. However, be careful: this (like speedup across nodes) says nothing about performance INSIDE a node, which was the origin of my question. To go a bit further: with 12 million cells on 4096 cores, you get only around 3000 cells per core, so your case is over-parallelized and performance drops. This explains your roughly 50% loss of efficiency. With 2 million cells on 48 cores, I get a minimum of about 41667 cells per core, so I am still in the good performance range!
Best
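Lionel's diagnostic reduces to a one-line calculation; a small sketch checking the two worked examples from this exchange:

```python
# Cells per core for a given mesh size and core count -- the quantity
# Lionel's CPU-time-per-cell-per-iteration curve is plotted against.

def cells_per_core(n_cells, n_cores):
    return n_cells / n_cores

# The two worked examples from the thread:
print(round(cells_per_core(12_000_000, 4096)))  # 2930 -- heavily over-parallelized
print(round(cells_per_core(2_000_000, 48)))     # 41667 -- in the good range
```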
August 26, 2019, 17:29 | #8
New Member
Lionel GAMET
Join Date: Nov 2013
Location: Lyon
Posts: 20
Rep Power: 13
With the file ....
August 26, 2019, 17:41 | #9
Senior Member
Peter Shi
Join Date: Feb 2017
Location: Davis
Posts: 102
Rep Power: 9
Hello Lionel,
You are absolutely right. I do realize there should be a lower bound on cells per CPU for the best performance. In my case, I do not think I will go above 2048 CPUs. Not sure if you know the open-source CFD solver Nek5000; it is highly scalable up to millions of CPUs, as long as the number of elements per CPU exceeds a certain value (~60). Best regards,
August 16, 2022, 03:52 | #10
Senior Member
Josh Williams
Join Date: Feb 2021
Location: Scotland
Posts: 113
Rep Power: 5
Does anyone have any idea how well OpenFOAM scales for particle-laden flows? For example, I am aware there is a lower bound on the number of cells per CPU for parallelisation efficiency. When there is a low number of particles, one can assume the parallelisation efficiency would be fairly similar.
I am interested in cases with relatively few cells (maybe between 500k and 2 million) but a large number of particles (let's say upwards of 50 million, perhaps reaching a maximum of one billion). Is the parallelisation then mainly dominated by the number of particles per CPU? Or is it a combination of the number of particles and the number of cells per CPU? Thanks, Josh
August 17, 2022, 04:52 | #11
Senior Member
Hi Josh,
Just a few thoughts from my side; I am also keen to learn more. My experience with Lagrangian particles is mainly in running (ico)UncoupledKinematicParcelFoam, where there is no update on the fluid side, so there the scalability is related only to the number of particles. The main issue I found was that particles typically cluster in just a few cells, and that limits scalability. I am not sure whether load balancing can be done based on the locations of the particles. As the solver ran pretty quickly without the fluid side updating, it was not really worth my effort to optimize.
For coupled approaches there will be a balance between the fluid field (how complicated the model is: reactions, heat exchange and turbulence may all influence the time spent in the fluid part) and the number of particles per processor and their distribution. There may be an optimization in using the collated file handling, but I never tested that. Furthermore, one could try a manual decomposition that puts more processors in the area with many particles and fewer elsewhere. Cheers, Tom
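On Tom's decomposition idea: a true manual method in decomposeParDict needs a per-cell allocation file, but a simpler way to align the split with a particle-rich region is a hierarchical decomposition with more divisions along that direction. Note that simple/hierarchical splits still balance cell counts, so this only spreads the particle-rich region across several ranks rather than truly weighting it. A sketch with made-up split counts, assuming particles cluster near an inlet at low x:

```
// system/decomposeParDict -- sketch of a direction-biased split (illustrative)
numberOfSubdomains  8;

method              hierarchical;

hierarchicalCoeffs
{
    n       (4 2 1);    // finer splitting in x, where the particles cluster
    order   xyz;        // split x first, then y, then z
}
```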
August 17, 2022, 12:57 | #12
Senior Member
Josh Williams
Join Date: Feb 2021
Location: Scotland
Posts: 113
Rep Power: 5
Hi Tom,
Yes, the issue you describe is one I also experience. In my simulations, all of the particles are clustered at the inlet for around 20% of the total run time; then they begin to disperse evenly throughout the domain and eventually exit through the outlets. For me, dynamic load balancing would be excellent, but it is not available except in a few GitHub repositories focused on specific problems. We have recently gained some funding to perform large-scale simulations on cloud-computing HPC, where we aim to simulate on the order of 100 million particles and maybe upwards. Hopefully I will get some interesting results to share with the community. Best, Josh
August 18, 2022, 14:58 | #13
Senior Member
August 19, 2022, 13:49 | #14
Senior Member
Josh Williams
Join Date: Feb 2021
Location: Scotland
Posts: 113
Rep Power: 5