OpenCL linear solver for OpenFoam 1.7 (alpha) will come out very soon

qinmaple · May 20, 2011, 08:10

Dear OpenFoamers,

========================update=================================
Dear Openfoamers

The OpenCL solver plugin clFoam v0.1 is now out for testing.

So far, clFoam in single precision has been tested on an ATI 5650M GPU and an NVIDIA Tesla C2050. On the Tesla C2050 it is slightly slower than the CPU for the cavity case with 160000 cells over 4 time steps (clPCG). (See profilingDatasheet.xls in profiling data/ for details.)

The OpenCL solver is still promising, as it is a new technology with plenty of room for improvement.

download link:
http://www.iesensor.com/download/clFoam_v0.1.zip

There is still quite a lot of work to do, and any advice on improving the efficiency is appreciated. There must also be some errors in the manual; do send me an email so I can correct them.

Thanks very much

Yours,

Qingfeng Xia
services@iesensor.com

---------------------introduction------------------------------
1. Project Layout

# file system structure of the project generated by command:
There are 3 projects (subfolders) in clFoam:
clUtils/ basic vector and csrMatrix operations written by the author
(BSD licensed)
Tested and profiled on AMD_STREAM_SDK, SP on GPU and DP on CPU

clFoam/ clPCG and clPBiCG solvers based on clUtils/
(GPLv3 licensed)
Tested and profiled on AMD_STREAM_SDK, single precision on GPU

vclFoam/ a wrapper to call the ViennaCL BLAS solvers
(GPLv3 licensed)
Not finished, there is a bug
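
For readers who have not met the CSR layout that clUtils is built around, here is a minimal sketch of a CSR matrix-vector product. It is plain Python for illustration only, not the clUtils code: the function name `csr_matvec` and the array names are assumptions. The outer loop over rows is the part an OpenCL kernel would typically hand out to work-items.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the non-zeros of row i,
    col_idx[k] is the column of the k-th non-zero, values[k] its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):  # rows are independent of each other
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# 3x3 tridiagonal test matrix: [[2,-1,0], [-1,2,-1], [0,-1,2]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

y = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)  # [0.0, 0.0, 4.0]
```

Because each row only reads `x` and writes its own `y[i]`, this operation parallelises well; the irregular reads of `x[col_idx[k]]` are what usually limit memory throughput on a GPU.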

# other resources included
doc/ some useful documents, tutorials, install manuals
bin/ some bash scripts
profiling data/
SpeedITOFPlugin1.1/ downloaded from the SpeedIT toolkit website and edited for SP support

**** USABILITY*******
(1) clUtils: single precision works on both AMD and NVIDIA GPUs

double precision passed the test on OpenCL via GPU
double precision on CUDA 3.1 fails with "OUT_OF_RESOURCE"
double precision does NOT work properly on Tesla C2050 with CUDA 3.1

(2) clFoam is usable only in single precision on GPU, with clPCG and clPBiCG

(see profilingDatasheet.xls in profiling data/ for details)
For double precision, it should work but is still buggy.
I did not have hardware handy for debugging, only ssh access to the remote cluster, which has not been upgraded to CUDA 4.0

(3) vclFoam is not usable at all.
As vclFoam will probably not be faster than clFoam, I have not spent much time on that plugin

*********************
---------------
2. Requirements
-----------------
clFoam requires the following:
* A recent C++ compiler: GCC > 4.4 is needed!
* OpenFoam 1.7.X
* OpenCL: for accessing GPUs (shared library and include files)
For AMD GPUs, install the AMD_STREAM_SDK
SEE installation guide:

For NVIDIA GPUs, CUDA_SDK and CUDA_TOOLKIT
SEE installation guide:

optional, for vclFoam:
* uBLAS: (shipped with the Boost libraries)
#sudo apt-get install libboost-dev
* ViennaCL 1.1 headers are already included in vclFoam

-----------------
3. Installation
-----------------

The install tutorials are in separate files:

install_vclFoam_guide.txt
install_clFoam_guide.txt
install_clUtils_guide.txt
install_speedIT_class_guide.txt

-----------------------
4. Authors and Contact
------------------------

Qingfeng Xia
services@iesensor.com

June 01 2011

Qingfeng Xia

======================== old post ==============================
An OpenCL solver is planned for Xmas 2011, inspired by the SpeedIT plugin, which is free for single precision.

At first, I wanted to wrap the BLAS solvers from ViennaCL 1.0.5, but there was always some error, so I just wrote my own PCG and PBiCG solvers. I have not fully profiled the solver; it is slower on my laptop ATI card, but I am trying it on the Tesla C2050. The first version of the technote (first and only test, on my ATI 5650) is on my blog.
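
As a reminder of what a PCG solver does per iteration, here is a minimal sketch of the preconditioned conjugate gradient method. It is not the clFoam implementation: it uses a dense matrix and a simple Jacobi (diagonal) preconditioner as stand-in assumptions, and the names `pcg`, `matvec` and `dot` are illustrative only. Each iteration costs one matrix-vector product, two dot products and a few vector updates, which are the BLAS-style operations a GPU back end has to provide.

```python
def matvec(A, x):
    """Dense y = A x (stand-in for a sparse product)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A.

    Assumption for this sketch: Jacobi preconditioner, M = diag(A).
    Returns (x, number_of_iterations_performed).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # r = b - A*0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged on residual norm
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = pcg(A, b)
print(x, iters)
```

Note that the two dot products are global reductions, so on a GPU every iteration needs at least two synchronisation points in addition to the kernel launches for the vector updates.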

http://qinmaple.wordpress.com/

The code will be released under GPL for the solver wrapper and BSD for clUtils (BLAS functions).

If someone is interested in the ViennaCL solver, I will upload my wrapper so he/she can debug it. I cannot include the *.hpp files of ViennaCL. I am trying it on NVIDIA cards; hopefully it will work.

In my opinion, the GPU solver will not be much faster than the CPU, because the preconditioners of OF cannot be parallelized. Yet it should be promising for the DSMC method; I will try it after my PhD thesis submission.
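
To illustrate why some preconditioners resist parallelisation, the sketch below contrasts two ways of applying a preconditioner to a residual. It is an assumption-laden illustration, not OpenFOAM code: a dense lower-triangular solve stands in for the sweep used by incomplete-factorisation preconditioners (the DIC/DILU family in OpenFOAM, as far as I understand it), and a diagonal scaling stands in for a Jacobi-type preconditioner. The function names are hypothetical.

```python
def jacobi_apply(diag, r):
    """z[i] = r[i] / diag[i]; every entry is independent of the others."""
    return [ri / di for di, ri in zip(diag, r)]


def forward_substitute(L, r):
    """Solve L z = r for lower-triangular L.

    Row i needs z[0..i-1] to be finished first, so the rows cannot
    simply be handed to independent GPU work-items.
    """
    z = [0.0] * len(r)
    for i in range(len(r)):
        s = sum(L[i][j] * z[j] for j in range(i))  # uses earlier results
        z[i] = (r[i] - s) / L[i][i]
    return z


L = [[2.0, 0.0, 0.0],
     [-1.0, 2.0, 0.0],
     [0.0, -1.0, 2.0]]
r = [2.0, 1.0, 4.0]

z_tri = forward_substitute(L, r)
z_jac = jacobi_apply([2.0, 2.0, 2.0], r)
print(z_tri)  # [1.0, 1.0, 2.5]
print(z_jac)  # [1.0, 0.5, 2.0]
```

The diagonal version maps directly onto one work-item per entry but is a weaker preconditioner, so it usually needs more iterations; the triangular sweep is stronger but inherently sequential along its dependency chain. That trade-off is the core of the concern above.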

Recently, my colleague sent me a link to 'ofgpu' from symscape.com.
I attempted to compile this solver with mine, but it seems to work only for the Windows version. Am I right? If not, please share some tips for compiling on Linux.
At least give me some idea of how fast it is on GPU.

I am extremely busy these days finishing my PhD thesis. I do not have time to debug and profile so many GPU solvers. I have spent one week on the Tesla GPU on the remote cluster and will give further profiling results at the OpenFOAM conference this year. I found a bug preventing me from compiling with double precision support on the GPU of the remote cluster.

Any advice and suggestions are appreciated on ViennaCL and ofgpu.
Email: jasonyale (at) gmail.com

Qingfeng Xia
The University of Manchester

May 20, 2011.

===============================================================

gocarts · May 23, 2011, 11:13

Quote:

Originally Posted by qinmaple

Recently, my colleague sent me a link to 'ofgpu' from symscape.com.
I attempted to compile this solver with mine, but it seems to work only for the Windows version. Am I right? If not, please share some tips for compiling on Linux.
At least give me some idea of how fast it is on GPU.

ofgpu is cross-platform, supporting Windows, Linux, and likely Mac OS X too.

I don't currently have any benchmarks.

You can find the original CFD-Online announcement at:
http://www.cfd-online.com/Forums/ope...-openfoam.html

qinmaple · May 26, 2011, 09:31

There is a benchmark using the PCG solver from the SpeedIT class plugin
by a Japanese user. It shows it is 3 times SLOWER than the CPU!
I got a similar result on my laptop, but I am trying it on our university HPC Tesla C2050. I got an error when changing from SP to DP, so I have not yet finished the benchmark. The bottleneck seems to be kernel scheduling: judging from the visual profiler, only about 1% of the time is spent computing in the kernel (ViennaCL vector bench). But I am still new to GPUs and I am not sure how to improve the performance.
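
A back-of-the-envelope model shows how a 1% compute share can arise and why fusing small kernels helps. The numbers below are purely hypothetical, chosen only so that the compute share comes out at the 1% quoted above; they are not measurements from the profiler, and the function names are made up for this sketch.

```python
def busy_fraction(launch_overhead_us, compute_us):
    """Share of wall time spent computing when each launch pays a fixed overhead."""
    return compute_us / (launch_overhead_us + compute_us)


def total_time_us(n_launches, launch_overhead_us, compute_per_launch_us):
    """Total time for n launches, each paying overhead plus its compute time."""
    return n_launches * (launch_overhead_us + compute_per_launch_us)


overhead_us = 99.0  # hypothetical scheduling/launch cost per kernel
compute_us = 1.0    # hypothetical useful work per small vector kernel

print(busy_fraction(overhead_us, compute_us))  # 0.01

# Four separate small kernels versus one fused kernel doing the same work.
separate = total_time_us(4, overhead_us, compute_us)
fused = total_time_us(1, overhead_us, 4 * compute_us)
print(separate, fused)  # 400.0 103.0
```

Under this simple model the solver is bound by the number of launches rather than by arithmetic, so the usual remedies are larger problems per launch, fused vector operations (for example combining an axpy with the following dot product), and avoiding host round-trips between kernels.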

I know ofgpu can be built on Linux, but the install tutorial is a little messy. My understanding is that even Linux users need to patch the source developed for Windows and rebuild it. I think that should not be necessary, am I right?

Dr Jasak said the interface update in matrix multiplication (Amul(), Tmul()) should not be overlooked. I am afraid this will make the GPU solver even slower. I have not dug into the SpeedIT plugin, and I am not sure how they make the GPU work with MPI.
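
To show what the interface update contributes, here is a minimal sketch of a matrix-vector product on a domain split into two subdomains. It is a toy model, not OpenFOAM's Amul(): a 4-cell 1D Laplacian is split into two 2-cell blocks, and the coupling across the cut is added by hand as the "interface update". All names are assumptions made for this illustration.

```python
def local_matvec(A, x):
    """Dense y = A x on one subdomain (stand-in for the local sparse product)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Global 4-cell 1D Laplacian and a test vector.
A_global = [[2.0, -1.0, 0.0, 0.0],
            [-1.0, 2.0, -1.0, 0.0],
            [0.0, -1.0, 2.0, -1.0],
            [0.0, 0.0, -1.0, 2.0]]
x_global = [1.0, 2.0, 3.0, 4.0]

# Each subdomain owns the same 2x2 diagonal block in this toy case.
A_local = [[2.0, -1.0],
           [-1.0, 2.0]]
x0, x1 = x_global[:2], x_global[2:]

# Step 1: purely local products (this is what a single-GPU kernel computes).
y0 = local_matvec(A_local, x0)
y1 = local_matvec(A_local, x1)

# Step 2: interface update. Each side adds the coupling coefficient (-1 here)
# times the neighbour's cell value next to the cut. In a parallel run this
# value has to be exchanged between processes on every multiplication.
coupling = -1.0
y0[1] += coupling * x1[0]
y1[0] += coupling * x0[1]

y_global = local_matvec(A_global, x_global)
print(y0 + y1)    # [0.0, 0.0, 0.0, 5.0]
print(y_global)   # [0.0, 0.0, 0.0, 5.0]
```

Without step 2 the product is wrong at the cells next to the cut. With a GPU solver under MPI, the neighbour values would presumably have to travel from device to host, across MPI, and back to the device in every iteration, which is the extra cost feared above.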

Thanks.

gocarts · May 26, 2011, 09:53

Quote:

Originally Posted by qinmaple

I know ofgpu can be built on Linux, but the install tutorial is a little messy. My understanding is that even Linux users need to patch the source developed for Windows and rebuild it. I think that should not be necessary, am I right?

The patch adds the Windows and Mac OS X platforms to the standard OpenFOAM distribution, making for a cross-platform source base. In addition it now also includes the hooks for the GPU-based linear solvers. Of course, at some point you have to (re)build.

I would classify the patch and build procedure as advanced, not messy.

mborgraeve · August 10, 2012, 12:00

Hi,
I am trying to use ofgpu too, and I am facing some difficulties with the patching of OpenFOAM...
Is there anybody who can help me?
Thanks !
Matthieu

Post #1 last edited by qinmaple; June 1, 2011 at 20:57. Reason: GPU solver clFoam v0.1 come out on 2011-06-01
