|
OpenFOAM on AMD GPUs. Container from Infinity Hub: user experiences and performance |
|
February 23, 2023, 04:28 |
OpenFOAM on AMD GPUs. Container from Infinity Hub: user experiences and performance
|
#1 |
New Member
Alexis Espinosa
Join Date: Aug 2009
Location: Australia
Posts: 20
Rep Power: 17 |
AMD recently provided an OpenFOAM container capable of running on AMD GPUs.
It is in their Infinity Hub: https://www.amd.com/en/technologies/...y-hub/openfoam
My questions are:
- What have been the community's experiences using this OpenFOAM container on AMD GPUs?
- Are you seeing worthwhile performance improvements compared with CPU-only solvers?
Thanks a lot,
Alexis
(PS: I will start using it and post my experiences too.)
Last edited by alexisespinosa; March 6, 2023 at 22:00. |
|
February 9, 2024, 04:51 |
|
#2 |
Senior Member
M. Montero
Join Date: Mar 2009
Location: Madrid
Posts: 155
Rep Power: 17 |
Hi,
Were you able to launch any simulation using the GPU version? Does it run 100% on the GPU, or is only the pressure equation solved on the GPU? Do you know whether it is also compatible with Nvidia GPUs, so it could be tested there?
Best regards,
Marcelino |
|
April 28, 2024, 10:15 |
OpenFOAM on AMD GPUs. Container from Infinity Hub: Experiences with Radeon VII
|
#3 |
New Member
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Rep Power: 11 |
Thought I'd share my experiences with this!
Unfortunately, my finding with my setup has been that it remains much faster to solve on the CPU than on the GPU. I used the HPC_Motorbike example and code provided by AMD in the Docker container (no longer available at the link above, by the way) as-is on my Radeon VII. For the CPU runs, I modified the case to use a typical CPU-based set of solvers from the standard tutorial fvSolution files. Results are as follows; times shown are simpleFoam total ClockTime to 20 iterations, and time per iteration excluding the first time step:
I get that GPUs are made for large models, but I am already close to the 16 GB of VRAM even with this model (5,223,573 cells). I can't run the Medium-sized model (~9M cells, I think) because I run out of VRAM. I'm running this on my desktop PC for fun; I don't even want to know how much faster this would be on my usual solving machine (48-core Xeon). So, in summary, based on my experiences with a Radeon VII and the Small HPC_Motorbike case: the CPU remains much faster for me, and GPU memory is the limiting factor for model size.
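For reference, the CPU runs used OpenFOAM's native solvers along the lines of the standard motorBike tutorial. A rough sketch of the pressure entry is below (the exact tolerances I ran with may differ slightly):
Code:
// Rough sketch of a typical CPU-based pressure solver in the style of the
// standard motorBike tutorial fvSolution; exact values are indicative only.
p
{
    solver          GAMG;        // OpenFOAM native multigrid solver
    smoother        GaussSeidel;
    tolerance       1e-07;
    relTol          0.01;
}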
Cheers, Tom |
|
April 28, 2024, 10:23 |
|
#4 | |
New Member
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Rep Power: 11 |
Quote:
The initial run script appears to be flexible enough to support CUDA devices too. I've not dug any deeper and don't have a suitable GPU to test with, sorry. Code:
Available Options: HIP or CUDA |
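For what it's worth, on an Nvidia card I would expect the PETSc options to switch to the CUDA back-end types. A minimal, untested sketch of the entries that would change (assuming a CUDA-enabled PETSc build):
Code:
// Untested sketch: CUDA counterparts of the HIP-specific PETSc options
// shipped in the container's fvSolution; everything else should be unchanged.
options
{
    mat_type    mpiaijcusparse; // cuSPARSE matrix storage (HIP build uses mpiaijhipsparse)
    vec_type    cuda;           // CUDA vectors (HIP build uses hip)
    // ... remaining KSP/PC settings as in the HIP configuration
}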
|
April 28, 2024, 14:51 |
|
#5 |
Senior Member
|
Thanks for your input. Much appreciated.
1/ Can you confirm that the bulk of the solution time goes into the pressure solve (independently of CPU vs. GPU)?
2/ How do you precondition PETSc-CG for the pressure solve?
3/ Are you willing to go the extra mile and compare two flavours of PETSc-CG? Flavour 1: use AMG to precondition PETSc-CG, letting AMG perform a setup at each linear system solve. Flavour 2: use AMG to precondition PETSc-CG (so far identical to Flavour 1), but this time freezing the hierarchy that AMG constructs. A sketch of how the two flavours map onto the solver settings is given below. |
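In the petsc4Foam interface, the difference between the two flavours should be expressible through the preconditioner caching controls. A minimal sketch (only the caching sub-dictionary is shown, and frequency 40 is just an example value):
Code:
// Flavour 1: rebuild the AMG preconditioner (and its hierarchy) at every solve
caching
{
    matrix          { update always; }
    preconditioner  { update always; }
}

// Flavour 2: freeze the AMG hierarchy, rebuilding it only every 40 solves
caching
{
    matrix          { update always; }
    preconditioner
    {
        update periodic;
        periodicCoeffs { frequency 40; }
    }
}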
|
April 29, 2024, 05:42 |
|
#6 | |
New Member
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Rep Power: 11 |
Quote:
1) I don't have a specific ClockTime breakdown, but it would appear so, yes.
2) PETSc-CG is preconditioned using PETSc's GAMG (pc_type gamg): Code:
p
{
    solver          petsc;

    petsc
    {
        options
        {
            ksp_type                cg;
            ksp_cg_single_reduction true;
            ksp_norm_type           none;

            mat_type                mpiaijhipsparse; // HIPSPARSE
            vec_type                hip;

            // preconditioner
            pc_type                       gamg;
            pc_gamg_type                  "agg";  // smoothed aggregation
            pc_gamg_agg_nsmooths          "1";    // number of smooths for smoothed aggregation (not smoother iterations)
            pc_gamg_coarse_eq_limit       "100";
            pc_gamg_reuse_interpolation   true;
            pc_gamg_aggressive_coarsening "2";    // square the graph on the finest N levels
            pc_gamg_threshold             "-1";   // increase to 0.05 if coarse grids get larger
            pc_gamg_threshold_scale       "0.5";  // thresholding on coarse grids
            pc_gamg_use_sa_esteig         true;

            // mg_level config
            mg_levels_ksp_max_it          "1";    // use 2 or 4 if problem is hard (i.e. stretched grids)
            mg_levels_esteig_ksp_type     cg;     // max_it "1"; use 2 or 4 if problem is hard (i.e. stretched grids)

            // coarse solve (indefinite PC in parallel with 2 cores)
            mg_coarse_ksp_type            "gmres";
            mg_coarse_ksp_max_it          "2";

            // smoother (cheby)
            mg_levels_ksp_type             chebyshev;
            mg_levels_ksp_chebyshev_esteig "0,0.05,0,1.1";
            mg_levels_pc_type              "jacobi";
        }

        caching
        {
            matrix
            {
                update always;
            }

            preconditioner
            {
                //update always;
                update periodic;
                periodicCoeffs
                {
                    frequency 40;
                }
            }
        }
    }

    tolerance       1e-07;
    relTol          0.1;
}
|
April 29, 2024, 06:16 |
|
#7 | |
Senior Member
|
Thanks again.
It appears that by setting
Code:
periodicCoeffs { frequency 40; }
you are already freezing the preconditioner, reusing the AMG hierarchy for 40 consecutive pressure solves before rebuilding it, which is essentially Flavour 2 above.

I have two follow-up questions, if you allow.
1/ How does the runtime of PETSc-GAMG compare with OpenFOAM-native GAMG (the latter used as a preconditioner, to be fair)?
2/ Do you see statistics of the PETSc-GAMG coarsening printed somewhere? It would be interesting to compare these statistics (in particular the geometric and algebraic complexities) with those of OpenFOAM-native GAMG. The latter can easily be obtained by inserting debug switches in system/controlDict, as sketched below.
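For the OpenFOAM-native GAMG statistics, something along these lines in system/controlDict should make the agglomeration report its levels (a sketch from memory; verify the switch names against the DebugSwitches list in your installation's etc/controlDict):
Code:
// Sketch: debug switches to print native GAMG coarsening information
// (number of levels, cells per level). Switch names to be verified.
DebugSwitches
{
    GAMG              1;  // solver / cycling details
    GAMGAgglomeration 1;  // agglomeration (coarsening) statistics
}
On the PETSc side, PETSc's own -ksp_view output is the usual place to see the GAMG level structure, though I have not checked how best to enable it through the petsc4Foam options dictionary.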
|