
OpenFOAM - parallel raspberry pi 4 grid/cluster

Old   November 5, 2019, 09:20
Default OpenFOAM - parallel raspberry pi 4 grid/cluster
  #1
New Member
 
Ivan
Join Date: Nov 2012
Location: Czech Republic
Posts: 22
Rep Power: 14
cfdhelp is on a distinguished road
Hello,


I would like to ask about the possibility of using OpenFOAM for parallel calculations on a cluster of Raspberry Pi 4 computers.
How many Raspberry Pi 4 boards are needed to match the computing power of, for example, a six-core Intel i7?

Or is it possible to run OpenFOAM on Google Coral equipment?


Thank you and regards,
Ivan

Old   November 5, 2019, 09:46
Default
  #2
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
The first question that comes to mind is: why?

It is definitely possible to build a DIY cluster from single-board computers like the Raspberry Pi. There are plenty of examples on the internet, e.g. https://www.researchgate.net/publica...rry_Pi_Cluster and https://blog.mvbakker.nl/posts/openf...raspi-cluster/
But these are not practical in any way. These projects are either for fun or for learning purposes. If that is what you want, then go ahead.
If you want performance per dollar (and much less hassle), get a normal PC instead.
Ultimately, the performance of such a cluster would be limited by the node interconnect. Without low-latency interconnects available, speedup in CFD applications like OpenFOAM will drop off rather quickly. Maximum scaling was observed on only 4 nodes here: https://krex.k-state.edu/dspace/handle/2097/17612
aero_head likes this.

Old   December 12, 2019, 06:44
Default
  #3
New Member
 
Ahmed Eissa FRSA
Join Date: Dec 2019
Location: London, UK
Posts: 4
Rep Power: 7
ahmedeissa is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
The first question that comes to mind is: why?

It is definitely possible to build a DIY cluster from single-board computers like the Raspberry Pi. There are plenty of examples on the internet, e.g. https://www.researchgate.net/publica...rry_Pi_Cluster and https://blog.mvbakker.nl/posts/openf...raspi-cluster/
But these are not practical in any way. These projects are either for fun or for learning purposes. If that is what you want, then go ahead.
If you want performance per dollar (and much less hassle), get a normal PC instead.
Ultimately, the performance of such a cluster would be limited by the node interconnect. Without low-latency interconnects available, speedup in CFD applications like OpenFOAM will drop off rather quickly. Maximum scaling was observed on only 4 nodes here: https://krex.k-state.edu/dspace/handle/2097/17612
The research article you're referring to is very old and was based on a very old Raspberry Pi. The Raspberry Pi now comes with a quad-core CPU and 4 GB of RAM, and it is a 64-bit single-board computer. So I think the answer is yes, you can do it.

Old   December 12, 2019, 08:57
Default
  #4
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
I never said that it could not be done.
Instead, I gave reasons for why this is not a viable solution for a productive system. And these reasons hold true, regardless of CPU type. More theoretical FP throughput would probably not help anyway, due to memory bandwidth limitations.
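The memory bandwidth point can be illustrated with a roofline-style estimate. This is only a minimal sketch, not a measurement: the peak FLOP rates, the bandwidth and the arithmetic intensity below are hypothetical placeholders, chosen just to show the shape of the argument.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Hypothetical numbers, for illustration only (not measured on any hardware).
bandwidth_gb_s = 4.0   # assumed sustained memory bandwidth
flops_per_byte = 0.2   # assumed low arithmetic intensity of a sparse CFD kernel

slow_cpu = attainable_gflops(10.0, bandwidth_gb_s, flops_per_byte)
fast_cpu = attainable_gflops(40.0, bandwidth_gb_s, flops_per_byte)
print(slow_cpu, fast_cpu)
```

Under these assumptions both CPUs land on the same bandwidth ceiling, so quadrupling the theoretical FP throughput changes nothing.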

Old   December 12, 2019, 09:50
Default
  #5
New Member
 
Ahmed Eissa FRSA
Join Date: Dec 2019
Location: London, UK
Posts: 4
Rep Power: 7
ahmedeissa is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
I never said that it could not be done.
Instead, I gave reasons for why this is not a viable solution for a productive system. And these reasons hold true, regardless of CPU type. More theoretical FP throughput would probably not help anyway, due to memory bandwidth limitations.
I understand your reply, but why then has ORACLE used it for their database centres?

Check this:
Oracle: This 1,060 Raspberry Pi supercomputer is 'world's largest Pi cluster'
URL:
https://www.zdnet.com/article/oracle...st-pi-cluster/

Please advise.

Old   December 12, 2019, 09:52
Default
  #6
New Member
 
Ahmed Eissa FRSA
Join Date: Dec 2019
Location: London, UK
Posts: 4
Rep Power: 7
ahmedeissa is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
I never said that it could not be done.
Instead, I gave reasons for why this is not a viable solution for a productive system. And these reasons hold true, regardless of CPU type. More theoretical FP throughput would probably not help anyway, due to memory bandwidth limitations.
And this one:

https://www.zdnet.com/article/raspbe...test-software/

Old   December 12, 2019, 12:40
Default
  #7
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
Do you know which kind of codes they run on these clusters? I don't, but I can make an educated guess: codes that don't rely on low-latency node-interconnects to achieve inter-node scaling. OpenFOAM does not fall into this category.
Since you already need quite a few of these boards to match a basic PC in OpenFOAM, e.g. one with a Ryzen 5 3600, I stand by my assessment: a solution that works, but makes no sense from a performance or financial point of view. These solutions are showpieces or testbeds, not alternatives to conventional PC or server hardware.

Old   December 12, 2019, 13:48
Default
  #8
New Member
 
Ahmed Eissa FRSA
Join Date: Dec 2019
Location: London, UK
Posts: 4
Rep Power: 7
ahmedeissa is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Do you know which kind of codes they run on these clusters? I don't
I do respect your answer, but:

Have you worked with databases before? Especially ORACLE Database? Have you experienced the load balancing for this type of database?

When someone like ORACLE adopts hardware, they do so only after testing it thoroughly, before announcing to the world that they are using it.

The ORACLE database management algorithm is very complicated and takes a very long time on an average computer with limited CPU power; it includes search algorithms that must return results in a fraction of a second.

To me, this is far more complicated than the OpenFOAM algorithms.

Old   December 12, 2019, 14:12
Default
  #9
Senior Member
 
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 1,289
Rep Power: 34
arjun will become famous soon enough
Quote:
Originally Posted by ahmedeissa View Post
I do respect your answer, but:

Have you worked with databases before? Especially ORACLE Database? Have you experienced the load balancing for this type of database?

When someone like ORACLE adopts hardware, they do so only after testing it thoroughly, before announcing to the world that they are using it.

The ORACLE database management algorithm is very complicated and takes a very long time on an average computer with limited CPU power; it includes search algorithms that must return results in a fraction of a second.

To me, this is far more complicated than the OpenFOAM algorithms.

This is the first time I have heard someone say that a search algorithm is more complicated than a multiphysics solver that solves the Navier-Stokes equations.





Quote:
Originally Posted by ahmedeissa View Post

When someone like ORACLE adopts hardware, they do so only after testing it thoroughly, before announcing to the world that they are using it.

I agree with this statement, because you are choosing OpenFOAM on a Pi without having done a thorough investigation. So yes: they have done one, but OpenFOAM users do not.
sbaffini and aero_head like this.

Old   December 12, 2019, 14:24
Default
  #10
Member
 
Mianzhi Wang
Join Date: Jan 2015
Location: Columbus, IN
Posts: 34
Rep Power: 11
wangmianzhi is on a distinguished road
let's have a $500 x86 vs Pi4 challenge! The losing side pays for both systems.

Old   December 13, 2019, 01:46
Default
  #11
Senior Member
 
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 1,289
Rep Power: 34
arjun will become famous soon enough
Quote:
Originally Posted by wangmianzhi View Post
let's have a $500 x86 vs Pi4 challenge! The losing side pays for both systems.



I can buy an Intel E5-2670 processor (SR0KX, 2.60 GHz, 20 MB cache, 8 cores) for 45 euros in Germany from Amazon. I did.

A dual-processor mainboard costs 199.

I am not sure your Pi can catch a 16-core, 32-thread machine that could be built for around $500.

Old   December 16, 2019, 11:09
Default
  #12
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 552
Rep Power: 16
Simbelmynė is on a distinguished road
A Raspberry Pi 4 is around 65-70 Euros where I live. Power supply is another 10 Euros. Then I guess some cooling would be needed if we are to run a long CFD simulation. That would add at least 10 more Euros if we can go with a passive solution. An active solution will add even more.


Some type of interconnect is needed, most likely Gigabit Ethernet + Switch perhaps?


I think you will be really hard pressed to land below 100 Euros per unit.


A Raspberry Pi 4 has a memory bandwidth of around 4 GiB/s. Assuming the workload is bandwidth limited, your guesstimate of performance per Euro would be 0.04 GiB/(s*Euro).


A Ryzen 3600 will give you around 47 GiB/s. Assuming you build your system for 500 Euro (I can assemble a computer without a case for 450 Euro), you will have a guesstimate of performance per Euro of 0.09 GiB/(s*Euro).


Conclusion: even without the performance degradation of the interconnect, you can see that the Raspberry Pi 4 is more than twice as expensive as the AMD for the most important metric. Considering that the Ryzen 3600 is quite expensive, you can find even better bargains in the x86 camp (not to mention the huge stock of used components available on eBay).
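For anyone who wants to redo the guesstimate with local prices, the arithmetic can be sketched as below. It uses only the bandwidth and cost figures quoted in this post; the function name is just an illustrative choice.

```python
def bandwidth_per_euro(bandwidth_gib_s, cost_euro):
    """Memory bandwidth bought per Euro, in GiB/(s*Euro)."""
    return bandwidth_gib_s / cost_euro

# Figures quoted in the post: ~4 GiB/s for ~100 Euro, ~47 GiB/s for ~500 Euro.
pi4 = bandwidth_per_euro(4.0, 100.0)
ryzen = bandwidth_per_euro(47.0, 500.0)
ratio = ryzen / pi4

print(f"Pi 4: {pi4:.3f}  Ryzen 3600: {ryzen:.3f}  ratio: {ratio:.2f}")
```

With these inputs the Ryzen system delivers a bit more than twice the bandwidth per Euro, and that is before any interconnect penalty on the Pi side.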


Would be fun if you benchmark a Pi 4 though and put it in the benchmark thread.
gpouliasis and aero_head like this.

Old   December 16, 2019, 13:03
Default
  #13
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
But the Pi has 4 cores. And Oracle...and complisticated algorithms
arjun, gpouliasis and aero_head like this.

Old   December 16, 2019, 14:11
Default
  #14
Senior Member
 
sbaffini's Avatar
 
Paolo Lampitella
Join Date: Mar 2009
Location: Italy
Posts: 2,192
Blog Entries: 29
Rep Power: 39
sbaffini will become famous soon enough
Quote:
Originally Posted by ahmedeissa View Post
The ORACLE database management algorithm is very complicated and takes a very long time on an average computer with limited CPU power; it includes search algorithms that must return results in a fraction of a second.

To me, this is far more complicated than the OpenFOAM algorithms.
We also respect your answer, but let's try to see what happens within a single iteration of a general-purpose CFD solver (I don't know OpenFOAM specifically, but that's not the point).

For each set of equations intended to be solved in a coupled manner (V equations all together):
  1. Do the necessary memory allocations/deallocations, O(VN) cost, for O(N) cells per process.
  2. Exchange V independent variables in ghost cells for parallel computations. For O(N) cells per process and P processes, this is O(VPN^(2/3)) exchanges globally in the system and O(VN^(2/3)) exchanges for each process.
  3. Compute 3D gradients of V independent variables on O(N) cells. This involves, at least, an O(N) loop and an O(3VN) access to memory.
  4. Exchange 3D gradients of V independent variables in ghost cells for parallel computations. For O(N) cells per process and P processes, this is O(3VPN^(2/3)) exchanges globally in the system and O(3VN^(2/3)) exchanges for each process.
  5. Compute fluxes of V independent variables on O(3N) faces. This has a memory access footprint and cost of O(6VN). Actually, the constant here might be quite high, in some cases proportional to V^2, so you may end up with an O(NV^3) cost.
  6. Add the flux Jacobian to the system matrix. For a matrix in CSR format you might want to use a binary search of the column within the matrix (just to mention a place where some search might be done during iterations), but it is typically not worth the effort and the column is more commonly found by brute-force searching in the CSR structure. Adding a Jacobian to the matrix for each flux then sums up to the brute-force search cost (which is, to a good approximation, a constant) times an O(6V^2N) cost.
  7. The previous two points should actually be repeated on the O(N^(2/3)) boundary faces for the BCs.
  8. Add source terms to the rhs (O(VN)) and their Jacobians to the lhs (O(V^2N)).
  9. Actually solve the system of equations. Not going into the details here, but a typical AMG does multiple sweeps of an SGS, each one with its own parallel exchange of solution variables, plus actually building the parallel structure for exchanging variables at the different levels it employs in that iteration. Multiply this by the number of times it is performed at each iteration.
  10. Update the solution from the system solve, an O(VN) operation.
  11. Do the remaining office stuff, like monitoring quantities. Cost may vary according to the specific operation, but at least O(VN) operations are quite common here, followed by O(Log(P)) operations globally for the reduction on P processes.
EndFor
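To make step 6 concrete, here is a minimal sketch of accumulating one Jacobian contribution into an existing CSR entry, with both the brute-force scan and the binary search. It is not taken from OpenFOAM or any other solver; the function names and the tiny 3x3 pattern are hypothetical, and the binary search assumes the column indices are sorted within each row.

```python
from bisect import bisect_left

def find_linear(col_idx, start, end, col):
    """Brute-force scan of one CSR row for a column index."""
    for k in range(start, end):
        if col_idx[k] == col:
            return k
    raise KeyError(col)

def find_binary(col_idx, start, end, col):
    """Binary search; assumes column indices are sorted within the row."""
    k = bisect_left(col_idx, col, start, end)
    if k < end and col_idx[k] == col:
        return k
    raise KeyError(col)

def add_to_csr(row_ptr, col_idx, values, row, col, contribution, find=find_linear):
    """Accumulate one contribution into an entry that already exists in the pattern."""
    k = find(col_idx, row_ptr[row], row_ptr[row + 1], col)
    values[k] += contribution

# Hypothetical 3x3 tridiagonal sparsity pattern.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [0.0] * len(col_idx)

add_to_csr(row_ptr, col_idx, values, 1, 2, 0.5)
add_to_csr(row_ptr, col_idx, values, 1, 2, 0.25, find=find_binary)
print(values)
```

Since a row of a finite-volume matrix holds only the cell itself plus its few neighbours, the linear scan touches a handful of entries at most, which is why its cost is effectively a constant.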

This is performed multiple times per time step for unsteady cases. Also, at some point, you want to write your solution to disk (P processes writing O(VPN) entries to file).

Typical industrial applications may have PN=O(10^6-10^9)
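The exchange costs in steps 2 and 4 can be turned into a rough feel for why the interconnect matters. The sketch below only divides the O(N^(2/3)) ghost-cell term by the O(N) local work, with all constants dropped, so the absolute values mean nothing and only the trend is informative. The function name is hypothetical.

```python
def exchange_to_work_ratio(total_cells, processes):
    """Ghost-cell exchange O(N^(2/3)) over local work O(N), constants dropped."""
    n = total_cells / processes        # cells per process, N
    return n ** (2.0 / 3.0) / n        # scales like N^(-1/3)

total_cells = 1_000_000                # lower end of the PN range quoted above
for p in (1, 8, 64):
    print(p, exchange_to_work_ratio(total_cells, p))
```

For a fixed mesh, every eightfold increase in P doubles the share of communication per process, and on a slow, high-latency network each of those exchanges is also expensive in absolute terms.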

Now, I know that databases are an important part of the modern world, but I don't get how retrieving an entry from whatever data structure it is stored in can be more delicate, complicated or compute/memory intensive than the process described above. In the worst case it is a single O(PN) operation, but probably it is an O(Log(P)Log(N)) one.
arjun, gpouliasis and aero_head like this.

Last edited by sbaffini; December 17, 2019 at 05:36.

Old   March 19, 2021, 05:26
Default Have you seen this?
  #15
New Member
 
Diego
Join Date: Dec 2019
Location: Cardiff
Posts: 9
Rep Power: 7
Hip2BL7 is on a distinguished road
Hi guys,

I'm wondering what you think of this:

8-stack picluster outperforming various laptops

I have some experience in using HPCs for OpenFOAM calculations at my university, but purely from an academic point of view, so I'd like to know what your thoughts are on hardware.


Old   March 19, 2021, 14:32
Default
  #16
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
As is stated in the first 2 minutes of the video: this cluster of SBCs is first and foremost a learning tool.
And the code that was run could not be further from a FV CFD code. Low optimization, computationally intensive and virtually zero communication between threads/nodes (i.e. embarrassingly parallel).
arjun and aero_head like this.

Old   May 3, 2021, 00:35
Default not worth it
  #17
New Member
 
MASc Student
Join Date: Sep 2016
Posts: 25
Rep Power: 10
New-to-CFD is on a distinguished road
lol, I can confirm that I have done exactly this: I built a 6-node + 1-master RPi3 cluster and ran OpenFOAM on it. It was fun, but it's a toy, and I doubt the 4's would be much better.