
OpenFOAM on AMD GPUs. Container from Infinity Hub: user experiences and performance

February 23, 2023, 04:28   #1
OpenFOAM on AMD GPUs. Container from Infinity Hub: user experiences and performance
New Member
 
Alexis Espinosa
Join Date: Aug 2009
Location: Australia
Posts: 20
AMD recently provided an OpenFOAM container capable of running on AMD GPUs.


It is in their Infinity Hub:
https://www.amd.com/en/technologies/...y-hub/openfoam

My questions are:

- What have the community's experiences been with this OpenFOAM container on AMD GPUs?
- Are you seeing significant performance improvements compared with CPU-only solvers?

Thanks a lot,
Alexis

(PS. I will start using it and post my experiences too)
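For anyone else wanting to try it: AMD's ROCm containers are typically run by passing the KFD and DRI devices through to Docker. A minimal sketch follows; the image name and tag here are placeholders, so check the Infinity Hub page for the exact ones:

Code:
# Pull the OpenFOAM image (image name/tag are placeholders; see the Infinity Hub page)
docker pull amdih/openfoam:<tag>

# Run it with the ROCm GPU devices passed through to the container
docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --security-opt seccomp=unconfined \
    amdih/openfoam:<tag> bash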


February 9, 2024, 04:51   #2
Senior Member
 
M. Montero
Join Date: Mar 2009
Location: Madrid
Posts: 155
Hi,

Were you able to launch any simulation using the GPU version? Does it run 100% on the GPU, or is only the pressure solver offloaded to the GPU?

Do you know whether it might be compatible with NVIDIA GPUs, so it could be tested there?

Best Regards
Marcelino

April 28, 2024, 10:15   #3
OpenFOAM on AMD GPUs. Container from Infinity Hub: Experiences with Radeon VII
New Member
 
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Thought I'd share my experiences with this!

Unfortunately, my finding with this setup has been that it remains much faster to solve on the CPU than on the GPU.

I used the HPC_Motorbike example and the code provided by AMD in the Docker container (no longer available at that link, by the way) as-is on my Radeon VII. For the CPU runs, I modified the case to suit a typical CPU-based set of solvers, using the standard tutorial fvSolution files.


Results are as follows. Times shown are simpleFoam total ClockTime to 20 iterations, and time per iteration excluding the first time step:
  • GPU: 473 seconds; 20.8 s per iteration
  • CPU, with 'GPU-aligned' solvers: 343 seconds; 16.7 s per iteration
  • CPU, with 'normal' solvers: 205 seconds; 9.9 s per iteration
Velocity and pressure solvers for each run were as follows (a sketch of the CPU fvSolution entries follows this list):
  • GPU: PETSc-bcgs & PETSc-cg
  • CPU, with 'GPU-aligned' solvers: DILUPBiCGStab & DICPCG
  • CPU, with 'normal' solvers, per tutorial: smoothSolver & GAMG
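For reference, roughly what the CPU fvSolution entries looked like; this is a sketch, and the tolerances shown are illustrative rather than the exact values from my runs:

Code:
solvers
{
    // 'GPU-aligned' CPU run: same Krylov methods as the PETSc GPU setup
    p
    {
        solver          PCG;        // i.e. DICPCG
        preconditioner  DIC;
        tolerance       1e-07;
        relTol          0.1;
    }
    U
    {
        solver          PBiCGStab;  // i.e. DILUPBiCGStab
        preconditioner  DILU;
        tolerance       1e-08;
        relTol          0.1;
    }
}
// The 'normal' CPU run used the tutorial defaults instead:
// p with GAMG (GaussSeidel smoother), U with smoothSolver.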
The GPU appears seldom used, with sporadic spikes in utilisation that barely exceed 40% of the GPU pipe. Most of the time within each iteration seems to be spent doing not much (I/O, maybe?). Unsurprisingly, the first iteration is much longer as the model is read into VRAM, which is easy to see, but subsequent iterations are also slower than with similar solvers on the CPU. To account for this, the per-iteration times above are taken from iterations 2-20.
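If you want to watch the utilisation yourself, polling rocm-smi on the host while the solver runs is enough:

Code:
# poll GPU utilisation and VRAM once per second while simpleFoam runs
watch -n 1 rocm-smi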


I get that GPUs are made for large models, but I am already close to the 16 GB of VRAM even with this model (5,223,573 cells); that works out to roughly 3 kB of VRAM per cell, so the Medium model (~9M cells, I think) would need on the order of 28 GB, and indeed I can't run it because I run out of VRAM. I'm running this on my desktop PC for fun; I don't even want to know how much faster this would be on my usual solving machine (48-core Xeon).



So, in summary, based on my experiences with a Radeon VII and the Small HPC_Motorbike case:

  • the GPU is half as fast as the CPU when the CPU uses its native solvers
  • the GPU is roughly 20-25% slower per iteration than the CPU when the CPU uses the less-efficient 'GPU-aligned' solvers
The next step, I think, is to find more GPUs and test the scaling of larger models (love an excuse to keep scouring eBay for deals, hehe).

Cheers,
Tom

April 28, 2024, 10:23   #4
New Member
 
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Quote:
Originally Posted by be_inspired
Hi,

Were you able to launch any simulation using the GPU version? Does it run 100% on the GPU, or is only the pressure solver offloaded to the GPU?

Do you know whether it might be compatible with NVIDIA GPUs, so it could be tested there?

Best Regards
Marcelino
100% GPU, as far as I'm aware. All solvers are PETSc.

The initial run script appears flexible enough to support CUDA devices too. I've not dug any deeper and don't have a suitable GPU to test with, sorry.

Code:
Available Options: HIP or CUDA
Only HIP is mentioned in the fvSolution file, though, so I'd guess that the PETSc solver has been tuned for AMD.
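If someone with an NVIDIA card wants to try it, my guess (untested) is that the PETSc options in fvSolution would swap the HIP types for their CUDA counterparts, something like:

Code:
mat_type    mpiaijcusparse;   // instead of mpiaijhipsparse
vec_type    cuda;             // instead of hip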

April 28, 2024, 14:51   #5
Senior Member
 
Domenico Lahaye
Join Date: Dec 2013
Posts: 802
Blog Entries: 1
Thanks for your input. Much appreciated.

1/ Can you confirm that the bulk of the solve time goes into the pressure solve (independent of CPU vs. GPU)?

2/ How do you precondition PETSc-CG for the pressure solve?

3/ Are you willing to walk the extra mile and compare two flavours of PETSc-CG?

Flavour-1: using AMG to precondition PETSc-CG, allowing AMG to do a set-up at each linear-system solve.

Flavour-2: using AMG to precondition PETSc-CG (so far identical to Flavour-1), this time freezing the hierarchy that AMG constructs.
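In terms of the caching dictionary of the PETSc interface, I would expect the two flavours to look roughly like this; a sketch only, using just the 'always' and 'periodic' update modes that already appear in this thread:

Code:
// Flavour-1: rebuild the AMG set-up at every linear-system solve
preconditioner
{
    update always;
}

// Flavour-2: approximate a frozen AMG hierarchy by rebuilding only
// every N solves, with N chosen larger than the total number of
// solves in the run
preconditioner
{
    update periodic;
    periodicCoeffs
    {
        frequency 100000;
    }
}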

April 29, 2024, 05:42   #6
New Member
 
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Quote:
Originally Posted by dlahaye
Thanks for your input. Much appreciated.

1/ Can you confirm that the bulk of the solve time goes into the pressure solve (independent of CPU vs. GPU)?

2/ How do you precondition PETSc-CG for the pressure solve?

3/ Are you willing to walk the extra mile and compare two flavours of PETSc-CG?

Flavour-1: using AMG to precondition PETSc-CG, allowing AMG to do a set-up at each linear-system solve.

Flavour-2: using AMG to precondition PETSc-CG (so far identical to Flavour-1), this time freezing the hierarchy that AMG constructs.

1) I don't have a specific ClockTime breakdown, but it would appear so, yes.
2) PETSc-CG is preconditioned using PETSc's GAMG (smoothed aggregation):

Code:
p
    {
        solver          petsc;
        petsc
        {               
            options
            {
                ksp_type  cg;
                ksp_cg_single_reduction  true;
                ksp_norm_type none;
                mat_type    mpiaijhipsparse; //HIPSPARSE
                vec_type    hip;

                //preconditioner 
                pc_type gamg;
                pc_gamg_type "agg"; // smoothed aggregation                                                                            
                pc_gamg_agg_nsmooths "1"; // number of smooths for smoothed aggregation (not smoother iterations)                      
                pc_gamg_coarse_eq_limit "100";
                pc_gamg_reuse_interpolation true;
                pc_gamg_aggressive_coarsening "2"; //square the graph on the finest N levels
                pc_gamg_threshold "-1"; // increase to 0.05 if coarse grids get larger                                                 
                pc_gamg_threshold_scale "0.5"; // thresholding on coarse grids
                pc_gamg_use_sa_esteig true;

                // mg_level config
                mg_levels_ksp_max_it "1"; // use 2 or 4 if problem is hard (i.e stretched grids)
                mg_levels_esteig_ksp_type cg; //max_it "1"; // use 2 or 4 if problem is hard (i.e stretched grids)                     

                // coarse solve (indefinite PC in parallel with 2 cores)                                                               
                mg_coarse_ksp_type "gmres";
                mg_coarse_ksp_max_it "2";
        
                // smoother (cheby)                                                                                                    
                mg_levels_ksp_type chebyshev;
                mg_levels_ksp_chebyshev_esteig "0,0.05,0,1.1";
                mg_levels_pc_type "jacobi";
                
            }

            caching
            {
                matrix
                {
                    update always;
                }

                preconditioner
                {
                    //update always;     
                    update periodic;

                    periodicCoeffs
                    {
                        frequency  40;
                    }
                }
            }
        }
        tolerance       1e-07;
        relTol          0.1;
    }
3/ Sure, happy to. I'll need some guidance on how to set those flavours up.

April 29, 2024, 06:16   #7
Senior Member
 
Domenico Lahaye
Join Date: Dec 2013
Posts: 802
Blog Entries: 1
Thanks again.

It appears that by setting

Code:
periodicCoeffs
    {
       frequency  40;
    }
you already have a blend between Flavour-1 (frequency 1) and Flavour-2 (frequency infinity). My question has thus been answered.

I have two follow-up questions, if you allow.

1/ How does the runtime of PETSc-GAMG compare with that of OpenFOAM-native GAMG (the latter used as a preconditioner, to be fair)?

2/ Do you see statistics of the PETSc-GAMG coarsening printed anywhere? It would be interesting to compare these statistics (in particular the geometric and algebraic complexities) with those of OpenFOAM-native GAMG. The latter can easily be obtained by inserting debug switches in system/controlDict:

Code:
// see /opt/OpenFOAM/OpenFOAM-v1906/etc/controlDict for a complete list of DebugSwitches
DebugSwitches
{
    GAMG                2;
    GAMGAgglomeration   0;
    GAMGInterface       0;
    GAMGInterfaceField  0;
    GaussSeidel         0;
    fvScalarMatrix      0;
    lduMatrix           0;
    lduMesh             0;
}
