|
October 21, 2015, 17:11 |
Foam-extend 3.1 cuda solver solve time
|
#1 |
New Member
Paul Handy
Join Date: Sep 2014
Location: Idaho, USA
Posts: 21
Rep Power: 12 |
I've just compiled the cuda solver for foam-extend 3.1. I'm using a Quadro K4000. After much effort, I was able to successfully run a few cases with the cuda solver, but it has been an order of magnitude slower than a single-core CPU run. Perhaps it is only because this setup requires more relaxation iterations, but I've not been able to get anything faster out of it. I've attached the simple icoFoam cavity case that I modified to run on the GPU, in hopes that someone can help me figure out how to get some use out of the cuda solver.
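For anyone trying the same thing, switching the cavity case over to the GPU comes down to the fvSolution dictionary. A minimal sketch of what mine looks like — the solver and preconditioner names here assume the foam-extend cudaSolvers library and may differ in your install, so check the solver names your build actually registers:

```
solvers
{
    p
    {
        solver          cudaCG;        // GPU conjugate gradient (assumed name, symmetric matrix)
        preconditioner  diagonal;      // simplest option; others may be available
        tolerance       1e-06;
        relTol          0;
    }

    U
    {
        solver          cudaBiCGStab;  // GPU BiCGStab (assumed name, asymmetric matrix)
        preconditioner  diagonal;
        tolerance       1e-05;
        relTol          0;
    }
}
```

Everything else in the case (blockMeshDict, controlDict, fvSchemes) is the stock icoFoam cavity tutorial.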
Cheers. |
|
October 21, 2015, 21:14 |
|
#2 |
Senior Member
Daniel P. Combest
Join Date: Mar 2009
Location: St. Louis, USA
Posts: 621
Rep Power: 0 |
Paul,
If you are really running this case as attached, the mesh is very small....too small to see any benefit. If you bump the cell count way up, then you will start to see some action. I would also play with the preconditioners once you have a sizable mesh. Lastly, at high cell counts, transient cases tend not to be a good choice for GPUs with this method, because the run will spend a lot of time moving data around. |
|
October 22, 2015, 10:10 |
|
#3 |
New Member
Paul Handy
Join Date: Sep 2014
Location: Idaho, USA
Posts: 21
Rep Power: 12 |
Thanks for the reply, Dan. I have some cases that I want to run in the future which have much higher cell-counts than the example provided. So GPU solving doesn't work as well on transient cases as it does on steady-state? That's news to me.
|
|
October 22, 2015, 10:31 |
|
#4 |
Senior Member
Daniel P. Combest
Join Date: Mar 2009
Location: St. Louis, USA
Posts: 621
Rep Power: 0 |
It's not that they don't work well; it's the nature of the acceleration. The GPU only solves the inner iterations, i.e. the Ax=b system. FOAM builds the coefficient matrix, along with x and b, and then throws that over to the GPU. If this is a rather large amount of data, then the transfer becomes the bottleneck of the operation. This process of building Ax=b, moving it to the GPU, solving, passing x back....rinse and repeat....can be done efficiently, but you need to minimize the bottlenecks. So, if you are doing many inner iterations, it is a great tool. If you are doing a few inner iterations and then taking many outer iterations (time steps or PIMPLE/PISO/SIMPLE solver iterations), then it will be slow. Ideally, this is great for problems where very deep convergence of the inner iterations is a must.

Now, there is an effort to offload the entire PIMPLE/PISO/SIMPLE algorithm onto the GPU to reduce this bottleneck, and it has shown some promise. But at present, the hybrid computing approach currently in use with FOAM has its issues. If you are a developer....I'm happy to revive it and do some more development work. I haven't touched this since grad school (the cufflink project), and another user moved it into foam-extend.....but we can definitely revive it. I just moved it to GitHub about 5 minutes ago. |
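To make the trade-off concrete, here is a back-of-the-envelope cost model of the build/transfer/solve loop described above. The numbers are purely illustrative assumptions, not measurements from any card or case — the point is only the shape of the comparison: a fixed per-outer-iteration transfer cost versus cheaper per-inner-iteration work on the GPU.

```python
# Rough cost model for hybrid CPU/GPU linear solves.
# Assumption: every outer iteration (time step / SIMPLE loop) pays a fixed
# host<->device transfer cost, while each inner (Krylov) iteration is much
# cheaper on the GPU than on the CPU. All numbers are made up for illustration.

def gpu_time(outer_iters, inner_iters, transfer_cost, gpu_iter_cost):
    """Total time when A, x, b are shipped to the GPU every outer iteration."""
    return outer_iters * (transfer_cost + inner_iters * gpu_iter_cost)

def cpu_time(outer_iters, inner_iters, cpu_iter_cost):
    """Total time when everything stays on the CPU (no transfer cost)."""
    return outer_iters * inner_iters * cpu_iter_cost

# Deep inner convergence (few outer, many inner): transfer cost is amortized.
deep_gpu = gpu_time(outer_iters=10, inner_iters=1000,
                    transfer_cost=50.0, gpu_iter_cost=0.1)   # 10 * 150 = 1500
deep_cpu = cpu_time(outer_iters=10, inner_iters=1000,
                    cpu_iter_cost=1.0)                       # 10000

# Transient-style run (many outer, few inner): transfer dominates.
shallow_gpu = gpu_time(outer_iters=1000, inner_iters=10,
                       transfer_cost=50.0, gpu_iter_cost=0.1)  # 1000 * 51 = 51000
shallow_cpu = cpu_time(outer_iters=1000, inner_iters=10,
                       cpu_iter_cost=1.0)                      # 10000

print(deep_gpu < deep_cpu)        # GPU wins with deep inner convergence
print(shallow_gpu > shallow_cpu)  # GPU loses when outer iterations dominate
```

Same total number of inner iterations in both scenarios; only how they are split between inner and outer work changes the winner, which is exactly why small transient cases like the cavity tutorial look so slow on the GPU.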
Tags |
cuda, solve time |
|
|