Double precision cpu

Cinek_Poland · October 3, 2021, 06:51

When I started a simulation in Abaqus, I got a warning in which the software recommended running the simulation in double precision (FP64) because it needs more than 20 million iterations.
So I would like to ask: which 4-5 CPUs have the best double precision (FP64) performance?
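
For readers wondering why a very high iteration count triggers a precision warning: below is a minimal sketch of round-off in a running sum. It is a general illustration, not Abaqus's actual time integration; the increment `DT`, the step count `STEPS`, and the helper names are assumptions made up for this example.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(dt: float, steps: int, single: bool) -> float:
    """Add dt to a running total `steps` times, starting from t = 1.0.

    With single=True the total is rounded to FP32 after every addition,
    which mimics keeping the accumulator in single precision.
    """
    t = 1.0
    for _ in range(steps):
        t = t + dt
        if single:
            t = to_f32(t)
    return t

DT = 1e-8        # hypothetical increment, smaller than FP32 resolution near 1.0
STEPS = 100_000  # kept small so the sketch runs quickly

t32 = accumulate(DT, STEPS, single=True)
t64 = accumulate(DT, STEPS, single=False)
print("FP32 total:", t32)
print("FP64 total:", t64)
```

In single precision the total never moves away from 1.0, because each increment is below half the spacing between neighbouring FP32 values at that magnitude. In double precision the increments accumulate as expected.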

flotus1 · October 3, 2021, 07:37

CPUs aren't GPUs

That is to say, the CPU market is not segmented into product lines with particularly high or low FP64 performance.
Assuming the code is not perfectly vectorized (which most software isn't) the main difference between running FP64 vs FP32 is an increase in memory bandwidth requirement. So you can get a slight increase in performance by making sure you get a CPU with a lot of memory channels. But the performance difference will be rather small compared to the factors of 16, 32 or even 64 we see in GPU floating point performance.
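
A rough sketch of the bandwidth point above, using only the Python standard library. The array length `N` and the 100 GB/s bandwidth figure are hypothetical values chosen for illustration, not measurements of any particular CPU.

```python
from array import array

N = 1_000_000  # hypothetical number of solution values

# The same number of values stored as FP32 ("f") and FP64 ("d").
fp32 = array("f", [0.0]) * N
fp64 = array("d", [0.0]) * N

bytes32 = fp32.itemsize * len(fp32)
bytes64 = fp64.itemsize * len(fp64)

def sweep_time_ms(num_bytes: int, bandwidth_gb_s: float) -> float:
    """Time to stream num_bytes through memory once at the given bandwidth."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3

BANDWIDTH_GB_S = 100.0  # assumed memory bandwidth, illustration only

print("FP32 bytes:", bytes32, "sweep ms:", sweep_time_ms(bytes32, BANDWIDTH_GB_S))
print("FP64 bytes:", bytes64, "sweep ms:", sweep_time_ms(bytes64, BANDWIDTH_GB_S))
```

The FP64 array is twice the size, so a solver that is limited by memory traffic has twice as many bytes to move per sweep. More memory channels raise the bandwidth figure, which is why they help.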

Cinek_Poland · October 3, 2021, 08:46

"That is to say, the CPU market is not segmented into product lines with particularly high or low FP64 performance"
Yes, but for example low-end CPUs will have less FP64 performance than
high-end ones. I need to know which CPUs have the highest FP64 performance,
for example EPYC Milan like the 7313P, or Xeon Silver?
I need to know the 4-5 CPUs with the highest FP64 performance; the shop where I will buy the CPU and motherboard will test them.
If, for example, a simulation in Abaqus will start only if I use FP64, I will have to use FP64.
I don't want to use FP64, but in some cases I will have to, so I should buy the CPU with the best FP64 performance. Do you think the EPYC 7313P will have better FP64 performance than the Xeon Silver 4316?

flotus1 · October 3, 2021, 12:47

Let me reiterate that: "FP64" isn't anything special for a CPU. Every time I or someone else here talks about performance of CPUs in general terms, that also includes floating point calculations with 64-Bit variables. Matter of fact, when running a CFD solver in double precision, memory bandwidth limitations may become more pronounced.
So you don't need any special advice for a system that can handle FP64 particularly well. All the information is already there.

Abaqus itself is a different story entirely. Their "standard" and "explicit" solvers behave quite differently. Explicit is more akin to a well-behaved CFD solver. Scales nicely even on distributed parallel systems.
Standard is...different. I pieced together that they have something along the lines of a hybrid OpenMP-MPI parallelization going on. Support wasn't able to tell me what exactly it is, or how to control stuff like core binding and mapping. Without any additional arguments, you only get the OpenMP type of parallelization, and it doesn't scale as nicely as explicit. The only upside of this solver -from the perspective of someone who doesn't use it- is that GPU acceleration works exceptionally well.

I urge you to take a step back before you start throwing money at the problem. For example, standard and explicit solvers both have their uses for specific types of problems. E.g. explicit can be better for dynamic problems like crash simulation.
If the solver decides that 20 million iterations are necessary, maybe you are using the wrong solver. Or there might be another solver setting that needs to be addressed:
https://info.simuleon.com/blog/7-tip...qus-run-faster
I can't tell you which one, you would need an expert in Abaqus for that. All I know is that I would carefully re-evaluate my modelling approach if it turned out that my CFD simulation requires 20 million iterations. Because I know that no amount of money spent on hardware will get me through that.

Cinek_Poland · November 14, 2021, 09:50

So you mean processors have the same performance in single and double precision?
For example, if a processor has 76 TFLOPS in single precision, will it also have 76 TFLOPS in double precision?

flotus1 · November 15, 2021, 03:33

Not necessarily the same as in identical FLOPS for FP32 and FP64. That's a rather complicated topic, and I can't say that I understand every minute detail. Execution units (EUs) can be multi-purpose, doing either one FP64 calculation, or two FP32 calculations in the same amount of time.
That means worst-case is 2:1 FLOPS for FP32 vs. FP64. But most CPUs these days are very similar in that regard, with none being particularly well-suited for either type of calculation.
That's all really theoretical anyway, since real software won't run anywhere near the limit for floating point calculations of a CPU. Especially for parallel CFD and FEM, other bottlenecks are hit before that, due to low computational intensity, and data access which is never fully predictable by the prefetcher. The cache and memory subsystem are limiting factors, preventing the FPUs from operating at their maximum theoretical throughput.
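
To make the bottleneck argument concrete, here is a minimal roofline-style estimate. All numbers (peak GFLOPS, bandwidth, the choice of kernel) are hypothetical and only meant to show the shape of the argument; `attainable_gflops` is a helper made up for this sketch.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: the lower of the compute peak and the memory-traffic limit."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Hypothetical machine and kernel numbers, for illustration only.
BANDWIDTH_GB_S = 200.0
# y[i] = a * x[i] + y[i] in FP64: 2 FLOPs per element,
# 3 memory accesses (read x, read y, write y) of 8 bytes each.
INTENSITY_FP64 = 2 / (3 * 8)

slow_cpu = attainable_gflops(1000.0, BANDWIDTH_GB_S, INTENSITY_FP64)
fast_cpu = attainable_gflops(2000.0, BANDWIDTH_GB_S, INTENSITY_FP64)
more_bandwidth = attainable_gflops(1000.0, 2 * BANDWIDTH_GB_S, INTENSITY_FP64)

print("peak 1000 GFLOPS:     ", slow_cpu)
print("peak 2000 GFLOPS:     ", fast_cpu)
print("peak 1000, 2x memory: ", more_bandwidth)
```

With such a low computational intensity, doubling the theoretical peak changes nothing in this model, while doubling the memory bandwidth doubles the estimate. Real solvers are messier than this one-line kernel, but the direction is the same.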

See e.g. here what it takes to get a CPU to operate near its theoretical FLOPS limit: https://stackoverflow.com/questions/...lops-per-cycle

Cinek_Poland · November 15, 2021, 06:14

There are some kinds of simulations that Abaqus will not run; it displays a warning saying the simulation must be done in double precision. So I am considering which would be more effective: only a good CPU with 128 GB of memory, for example an EPYC 7313P, or a weaker CPU like a Xeon or Core i9-12900K plus a good GPU like a Quadro GP100 or Tesla P100. That GPU has a theoretical double precision performance of 4.7 teraflops, so I need to know what the theoretical double precision performance of the processor is.

flotus1 · November 15, 2021, 07:33

Quote:

That GPU has a theoretical double precision performance of 4.7 teraflops, so I need to know what the theoretical double precision performance of the processor is.

No, you really don't need to know that. Peak FLOPS for GPU acceleration is even more irrelevant than for CPUs.
The recommendation for Abaqus -not just from me, but also from their support- is using more lower end GPUs instead of a single high-end one. Whether GPU acceleration works in your case, I don't know.
I'm out of things to add to this conversation. Everything I know is already here.

gnwt4a · November 16, 2021, 04:17

Quote:

Originally Posted by Cinek_Poland

When I started a simulation in Abaqus, I got a warning in which the software recommended running the simulation in double precision (FP64) because it needs more than 20 million iterations.
So I would like to ask: which 4-5 CPUs have the best double precision (FP64) performance?

The absolute top in CPU FP64 is attained by Xeons with two AVX-512 units and MKL.
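
For reference, a back-of-the-envelope sketch of where the theoretical FP64 peak of such a core comes from. The core count and clock below are hypothetical placeholders, not the specification of any real model, and the sustained clock under AVX-512 load is an assumption you would have to look up per CPU.

```python
def fp64_flops_per_cycle(simd_bits: int, fma_units: int) -> int:
    """Peak FP64 FLOPs per core per cycle for a given SIMD width and FMA unit count."""
    lanes = simd_bits // 64    # 64-bit values per SIMD register
    flops_per_fma = 2          # one fused multiply-add counts as 2 FLOPs
    return fma_units * lanes * flops_per_fma

def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS; real solvers stay far below this."""
    return cores * clock_ghz * flops_per_cycle

per_cycle = fp64_flops_per_cycle(simd_bits=512, fma_units=2)

CORES = 16        # hypothetical core count
CLOCK_GHZ = 2.0   # assumed sustained clock under AVX-512 load

print("FP64 FLOPs per core per cycle:", per_cycle)
print("Theoretical peak GFLOPS:", peak_gflops(CORES, CLOCK_GHZ, per_cycle))
```

Getting anywhere close to this number takes heavily tuned, fully vectorized kernels such as the dense linear algebra in MKL; typical CFD and FEM solvers will not reach it.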