GPU acceleration in Ansys Fluent

Duke711 · August 27, 2018, 12:05

Not only that, the solution process will probably abort with an error.

http://www.cadfem.de/fileadmin/CADFE...CADFEM_GPU.pdf

Echidna · August 27, 2018, 12:14

Can I use a Quadro K6000 plus a Tesla K80? Will they work together?

Micael · August 27, 2018, 12:32

Flow Setup:
as OP

Software/Hardware:
Operating system: CentOS Linux 7
Fluent version: Ansys Fluent 19.1
CPU: Dual Xeon Gold 6150, HT disabled
Memory: 192 GB DDR4-2666 ECC (12 dimm x 16 GB)
GPU: 4 x V100-32GB NVLINK

Ran double precision only.

32-core
Simple None-GPU: 4.1 s
Simple 1-GPU: 4.1 s
Simple 4-GPU: 4.1 s
Coupled None-GPU: 13.2 s
Coupled 1-GPU: 28.1 s
Coupled 2-GPU: 24.2 s
Coupled 4-GPU: 22.2 s

4-core
Simple None-GPU: 22.6 s
Coupled None-GPU: 67.5 s
Coupled 1-GPU: 48.4 s
Coupled 2-GPU: 44.7 s
Coupled 4-GPU: 42.1 s

1-core
Simple None-GPU: 89.0 s
Coupled None-GPU: 273.5 s
Coupled 1-GPU: 132.1 s
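
To put the coupled-solver timings above in relative terms, here is a minimal Python sketch that divides each CPU-only time by the matching GPU time. The numbers are copied from the list above; the dictionary layout and the helper name `speedups` are only illustrative choices.

```python
# Wall-clock times in seconds for the coupled solver, copied from the results above.
timings = {
    32: {"None-GPU": 13.2, "1-GPU": 28.1, "2-GPU": 24.2, "4-GPU": 22.2},
    4: {"None-GPU": 67.5, "1-GPU": 48.4, "2-GPU": 44.7, "4-GPU": 42.1},
    1: {"None-GPU": 273.5, "1-GPU": 132.1},
}


def speedups(results, baseline="None-GPU"):
    """Return baseline_time / run_time for every run except the baseline."""
    base = results[baseline]
    return {label: base / t for label, t in results.items() if label != baseline}


for cores, results in timings.items():
    for label, factor in speedups(results).items():
        print(f"{cores:>2} cores, coupled {label}: {factor:.2f}x vs. CPU only")
```

A ratio below 1 means the GPU run was slower than the CPU-only run, which is what happens at 32 cores. At 4 cores the ratios come out around 1.4 to 1.6, and at 1 core around 2.1.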

flotus1 · August 27, 2018, 13:16

Great, finally some decent hardware. Would you mind running a larger case (coupled+DP would be enough)? I only chose such a small one due to the lack of VRAM on the GPUs I had available at the time. Would be interesting to see if you can get some GPU scaling going while running 32 CPU cores.

Duke711 · August 27, 2018, 14:13

Quote:

Originally Posted by Echidna

Can I use a Quadro K6000 plus a Tesla K80? Will they work together?

12 vs. 24 GB. I don't know, try it.

Micael · August 29, 2018, 12:58

Flow Setup:
as OP
except the mesh is 215 x 215 x 215 (10M cells)

Software/Hardware:
Operating system: CentOS Linux 7
Fluent version: Ansys Fluent 19.1
CPU: Dual Xeon Gold 6150, HT disabled
Memory: 192 GB DDR4-2666 ECC (12 dimm x 16 GB)
GPU: 4 x V100-32GB NVLINK

Ran double precision only.

32-core
Coupled None-GPU: 577 s
Coupled 1-GPU: Failed, apparently out of memory
Coupled 2-GPU: 541 s
Coupled 4-GPU: 394 s
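
For this 10M-cell case, the same kind of arithmetic shows how much the GPUs help and how well the step from 2 to 4 GPUs scales. A rough sketch using only the times listed above; the variable names are illustrative, and the failed 1-GPU run is left out.

```python
# Wall-clock times in seconds for the 10M-cell coupled case on 32 cores.
cpu_only = 577.0
gpu_runs = {2: 541.0, 4: 394.0}  # number of GPUs -> time; 1-GPU run failed

# Speedup of each GPU run relative to the CPU-only run.
speedup_vs_cpu = {n: cpu_only / t for n, t in gpu_runs.items()}

# Scaling from 2 to 4 GPUs: ideal would be 2.0x, i.e. 100% efficiency.
scaling_2_to_4 = gpu_runs[2] / gpu_runs[4]
efficiency_2_to_4 = scaling_2_to_4 / (4 / 2)

for n, factor in speedup_vs_cpu.items():
    print(f"{n} GPUs: {factor:.2f}x vs. CPU only")
print(f"2 -> 4 GPUs: {scaling_2_to_4:.2f}x, efficiency {efficiency_2_to_4:.0%}")
```

That works out to roughly 1.07x with 2 GPUs and 1.46x with 4 GPUs over CPU only, and about 69% efficiency going from 2 to 4 GPUs.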

mrodriguez · April 12, 2019, 19:02

Quote:

Originally Posted by KEDELLE

Did you use a Titan V or a Titan X?

I bought a Titan V. I did not know that this card does not work with Ansys Fluent. Has anyone asked you how to resolve this problem? Is it possible to load this GPU with a simulation? I followed every step to activate the GPU, but it does not work. Please help me.

flotus1 · April 12, 2019, 20:15

Seems like this topic comes around again every now and then...
If the solver you are using is not utilizing the GPU despite it being activated in the Fluent launcher, then you won't see much benefit from the GPU anyway.
You could force Fluent to use the GPU with some TUI commands, but again, expect to see no improvement or even worse performance with a GPU enabled in these cases. https://www.sharcnet.ca/Software/Ans...-EC933A7E.html
Or maybe Ansys decided to use a whitelist for GPUs in Fluent just like they did with some of their other software. It's been a while since I last used it.

sida · June 1, 2020, 17:33

Quote:

Originally Posted by flotus1

If you feel that any specific aspect is missing or conclusions drawn are flawed I would recommend addressing it directly.
I will gladly re-run or add a few benchmarks with Tesla V100. Contact me via PM if you want to send over a few samples.

Hi

I'm very eager to see the result of your tests with Tesla V100. Before reading your posts, I was going to combine Threadripper 3970x with Quadro RTX 4000, but now that GPU acceleration is not as effective/justifiable as advertised, what alternative do you suggest for Quadro RTX 4000?

flotus1 · June 1, 2020, 18:59

It should come as no surprise that I never got any samples. I was not really expecting that.

I don't have any alternative in the price range of a Quadro RTX 4000 card. Well, Nvidia doesn't, but that is splitting hairs. GPU acceleration with Ansys products is for people with a virtually unlimited hardware budget, due to the fact that software, engineers and development time are so much more expensive than a workstation. My advice to everyone else is to focus on CPU performance first.
If you really want to do GPU acceleration on a budget, try used Quadro K6000 cards. They can be found for around $300-400. That is, of course, if you want to do double precision. With single precision, any semi-recent CUDA-capable card should do. The consumer cards offer much better value than the Quadro and Tesla lineup here.

sida · June 2, 2020, 01:38

Thanks for the quick response, it was a relief after days of research

Best

sida · June 2, 2020, 02:34

Also, this article, which uses OpenFOAM, can help us understand that an investment in CPU is much more reliable than an investment in GPGPU, at least when it comes to CFD.

Multi GPU Implementation to Accelerate the CFD Simulation of a 3D Turbo-Machinery Benchmark Using the RapidCFD Library

https://link.springer.com/chapter/10...030-38043-4_15

bhanuday.sharma · February 27, 2021, 12:28

It would have been helpful if you could have uploaded your .cas / mesh file, so that other users could quickly test their system configurations.

Stabum · April 28, 2023, 10:08

Quote:

Originally Posted by flotus1

The topic of GPU acceleration for Ansys Fluent sometimes seems to be shrouded in mystery. So I ran a few benchmarks to answer some frequently asked questions and get a snapshot of the capability of this feature in 2017.

...

Edit: here is a nearly exhaustive list of Nvidia GPUs with high DP capabilities:

Please correct me if I'm summarizing it too roughly:

In case of medium parallelization (max 64/128 cores), GPGPU can be convenient only if:

1) you're using coupled algorithms;
AND
2) you're using powerful graphics cards (Quadro 5000 or above).

Given all this, GPU RAM must be big enough to contain the mesh of the problem you're going to study. This means that, if your average mesh requires around 64 GB of RAM (I understand that this can sound quite small to some of you guys, but for those who don't work at NASA it's quite a lot!), and you have planned to adopt Quadro RTX 5000s (16 GB each), you would need a quad SLI setup or more...
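
The card count in that example is just a ceiling division. A minimal sketch, assuming (optimistically) that the GPU-side footprint equals the 64 GB host figure and splits evenly across cards with no per-card overhead; `cards_needed` is a hypothetical helper, not anything from Fluent:

```python
import math


def cards_needed(case_memory_gb, vram_per_card_gb):
    """Smallest number of cards whose combined VRAM holds the case.

    Assumes the case splits evenly and ignores any per-card overhead,
    so treat the result as a lower bound.
    """
    return math.ceil(case_memory_gb / vram_per_card_gb)


print(cards_needed(64, 16))  # 16 GB cards, as in the example above
print(cards_needed(64, 80))  # an 80 GB data-center card
```

In practice some memory on each card is taken by buffers and duplicated interface data, so the real count may be higher than this estimate.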

Given all this, it's absolutely impossible for a "normal" user to make use of GPGPU technology in CFD.

Many thanks,
C.

flotus1 · April 28, 2023, 11:27

Some things changed since I originally posted my little experiment.
For example, Ansys now has a native GPU solver, which allegedly runs much faster. I cannot comment on that claim.

Some things didn't change though, at least not for the better. Commercial GPU solvers -native or otherwise- don't have feature parity with the established CPU counterparts. If you have everything you need, or are willing to change your workflow to accommodate the missing features, maybe it is for you.
GPU memory is still a scarce resource. Nvidia now sells cards with 80GB of VRAM (e.g. H100), but of course these are at the high end for data centers.
And noteworthy FP64 performance is reserved for very few products at the high end. Everything else is cut down to a 1:32 divider for FP64.
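
To make the 1:32 divider concrete, here is a small worked example. The FP32 figure is a made-up round number for illustration, not the spec of any particular card, and `fp64_tflops` is a hypothetical helper.

```python
def fp64_tflops(fp32_tflops, divider):
    """Theoretical FP64 throughput given FP32 throughput and an FP32:FP64 divider."""
    return fp32_tflops / divider


hypothetical_fp32 = 32.0  # TFLOPS, illustrative round number only

print(fp64_tflops(hypothetical_fp32, 32))  # card cut down to 1:32
print(fp64_tflops(hypothetical_fp32, 2))   # card with a 1:2 FP64 rate
```

With the same FP32 rating, the 1:32 card delivers one sixteenth of the double-precision throughput of the 1:2 card, which is why the FP64 ratio matters more than the headline FP32 number for double-precision runs.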

My main motivation for writing this article in the first place was this: people here regularly asked, "Which graphics card should I buy to get good acceleration in my new Fluent workstation? My total budget is -insert figure below 10000€-."
For the vast majority of cases, the answer is just stick to maximizing CPU performance.

GPU acceleration or computation with commercial CFD solvers is for data centers. The hardware is just too expensive to make it work in any other setting. This is a trend that has only accelerated over the last few years.
Or to put it very bluntly: if you have to ask me -a random stranger on the internet- for advice, you probably should not bother with GPUs.

Of course, if you like to tinker with used hardware, don't let me stop you. P100s go for less than 300€ on eBay these days.

arjun · May 11, 2023, 09:31

Quote:

Originally Posted by flotus1

GPU acceleration or computation with commercial CFD solvers is for data centers. The hardware is just too expensive to make it work in any other setting.

I agree with pretty much everything here except this one bit (here I do not completely agree).

I understand this perception comes from presentations by Ansys and Siemens, where they show results from top-of-the-line GPUs that a normal person would not have on the desktop. So yes, if users confine themselves to these few names, then what you said is very true.

But here with Wildkatze I try to focus on what a layman could have on the desktop and what we can gain from it. Even with my old 2080 Ti I am able to gain almost 25x to 30x speedup. If I were to pick a current GPU like the 40XX series, this scaling would go a long way, and that is something a normal user can afford.

The only problem I can see is that people still won't use the solver because they do not know the name (people use what they know and don't want to try anything new). But if they want it, they can get a good speedup from the GPU here.

flotus1 · May 11, 2023, 12:21

I find it very commendable that you put in the effort to make it work with hardware we can actually get our hands on.
But a 25x speedup from a 2080 Ti compared to a CPU begs the question: what CPU are you comparing to? And are we talking multi-threaded or single-threaded?
Please don't take this the wrong way, but such outrageously high speedups from GPU acceleration, when comparing to a reasonably modern CPU, usually get you a few raised eyebrows in the HPC community. Because it usually means that the CPU implementation simply does not have the same level of optimization as the GPU implementation.
When looking at the raw specs like theoretical FP32 operations per second, or memory bandwidth, there is not a 25x gap between CPUs and GPUs. At least leaving aside hardware accelerated operations.
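
The bandwidth side of that argument can be sketched with back-of-the-envelope numbers. Assumptions not stated in the post: a 2080 Ti with a 352-bit memory bus at 14 Gbps per pin, and a workstation CPU with four channels of DDR4-2933. Both results are theoretical peaks, and the function names are illustrative.

```python
def gddr_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak GPU memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def ddr_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Peak CPU memory bandwidth in GB/s for 64-bit wide DDR channels."""
    return channels * bytes_per_transfer * mega_transfers_per_s / 1000


gpu_bw = gddr_bandwidth_gbs(352, 14)  # assumed 2080 Ti memory configuration
cpu_bw = ddr_bandwidth_gbs(4, 2933)   # assumed quad-channel DDR4-2933
ratio = gpu_bw / cpu_bw

print(f"GPU: {gpu_bw:.0f} GB/s, CPU: {cpu_bw:.1f} GB/s, ratio: {ratio:.1f}x")
```

Under these assumptions the ratio is between 6x and 7x, well short of 25x, so for a memory-bound solver the raw hardware alone would not explain such a speedup.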

KEDELLE · May 11, 2023, 12:28

I have an RTX A6000 that I barely used, for sale at 4500.

What you need is VRAM size, so the CAD mesh doesn't have to be continuously broken up and sent back and forth from the SSD to the CPU to the GPU.

arjun · May 11, 2023, 13:29

Quote:

Originally Posted by flotus1

I find it very commendable that you put in the effort to make it work with hardware we can actually get our hands on.
But a 25x speedup from a 2080 Ti compared to a CPU begs the question: what CPU are you comparing to? And are we talking multi-threaded or single-threaded?
Please don't take this the wrong way, but such outrageously high speedups from GPU acceleration, when comparing to a reasonably modern CPU, usually get you a few raised eyebrows in the HPC community. Because it usually means that the CPU implementation simply does not have the same level of optimization as the GPU implementation.
When looking at the raw specs like theoretical FP32 operations per second, or memory bandwidth, there is not a 25x gap between CPUs and GPUs. At least leaving aside hardware accelerated operations.

I was very casual, so I did not think of writing down the CPU etc. The CPU here is an AMD 2990WX with 32 cores, which I am sure you know of. The machine has 128 GB of RAM, but the whole thing was run on the GPU in double precision.

The idea I am working on is to make a GPU engine where people can run cases from different solvers. At the moment it runs cases from the two software packages I have. If someone helps me with an OpenFOAM loader (a translator), it should be able to run those cases too. That's the idea so far.

Edited to add: So far in our testing we are within 5% of STAR-CCM+ in cost per iteration. This should give some idea about the implementation (we usually take fewer iterations to converge compared to STAR-CCM+; with Fluent we never could compare).

JBeilke · May 11, 2023, 15:20

I'm not sure about the right terminology in GPU computing, but arjun's implementation comes without domain decomposition, so there is just one single processor domain. This seems to be a good way to accelerate cases that have limited potential for a parallel speedup.

I have seen the MotorBike benchmark running on his machine with a preliminary version of the code and it was more than impressive.

