July 11, 2017, 16:59 |
Epyc vs Xeon Skylake SP
|
#1 |
New Member
Join Date: May 2013
Posts: 26
Rep Power: 13 |
Epyc vs Xeon Skylake SP
Which one to choose? The first (non-CFD) benchmarks are available.
Full article: http://www.anandtech.com/show/11544/...-of-the-decade
Deep link to the start of the benchmarks: http://www.anandtech.com/show/11544/...-the-decade/11
What do you think? |
|
July 11, 2017, 17:58 |
|
#2 |
New Member
Join Date: May 2013
Posts: 26
Rep Power: 13 |
Just copied this:
Quote:
|
|
July 13, 2017, 09:14 |
|
#3 |
New Member
Join Date: May 2013
Posts: 26
Rep Power: 13 |
Skylake-SP's AVX-512 units seem to offer some advantage in Fluent (v18.1) as well.
Or is it just the 6-channel memory?
Quote:
Xeon Gold 6148: 20 cores @ 2.4 GHz, 2 AVX-512 units vs. E5-2697 v4: 18 cores @ 2.3 GHz
http://www.ansys-blog.com/boost-ansy...-technologies/ |
|
July 13, 2017, 11:35 |
|
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
I doubt these performance gains are due to AVX. Last time I checked AVX2 instruction performance for one of our LBM codes, the gains were in the low single digits, which is to be expected for a memory-bound workload.
6x DDR4-2666 vs 4x DDR4-2400 is 67% more theoretical memory bandwidth. That's all there is to these performance gains. But it would not be clever to highlight this in a marketing publication, because AMD Epyc is even better in exactly this metric. So they may be trying to pin the performance gains on AVX instead. |
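A quick sanity check of that 67% figure (a sketch using peak numbers only, assuming 8 bytes per transfer per channel and ignoring real-world memory efficiency):
Code:
# Theoretical peak memory bandwidth: channels * MT/s * 8 bytes
def peak_bandwidth_gbs(channels, mt_per_s):
    return channels * mt_per_s * 8 / 1000  # GB/s

skylake = peak_bandwidth_gbs(6, 2666)    # Xeon Gold 6148: 6x DDR4-2666
broadwell = peak_bandwidth_gbs(4, 2400)  # E5-2697 v4: 4x DDR4-2400
print(f"{skylake:.0f} vs {broadwell:.0f} GB/s -> +{(skylake / broadwell - 1) * 100:.0f}%")
# prints: 128 vs 77 GB/s -> +67%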
|
July 13, 2017, 14:19 |
|
#5 |
New Member
Join Date: May 2013
Posts: 26
Rep Power: 13 |
Just what I was thinking when reading this...
|
|
July 14, 2017, 04:24 |
|
#6 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Sorry if I'm hijacking the topic, but I've been looking into getting a new server/cluster and I have two quick questions you seem to be able to answer.
1) If I have a multi-socket server, do the CPUs share the memory bandwidth?
2) Is there some rule to determine whether the interconnect or the memory bandwidth will be the bottleneck? Can two servers connected by InfiniBand be faster than a single server, given the same number of cores? |
|
July 14, 2017, 08:14 |
|
#7 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Some more research and I answered my own question: each socket has its own memory access.
Which means I have to change the second question: would you expect a two-socket board with two 16-core CPUs to be faster than a single 32-core CPU? Communication between the cores of the two CPUs would be slowed down by the interconnect, but the memory bandwidth is doubled. |
|
July 14, 2017, 11:13 |
|
#8 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Yes, a dual-socket setup with two 16-core CPUs is much faster for CFD workloads than a single 32-core CPU. Latencies are still fairly low on a dual-socket board, so the communication overhead is usually negligible; at the very least, the effect is much less important than having twice the memory bandwidth.
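As a rough illustration of the bandwidth argument (a sketch; the 6-channel DDR4-2666 sockets and peak numbers are my assumptions, and sustained bandwidth is lower in practice):
Code:
# Per-core share of peak memory bandwidth, single vs dual socket
per_socket = 6 * 2666 * 8 / 1000                                 # ~128 GB/s peak per socket
print(f"1 x 32 cores: {per_socket / 32:.1f} GB/s per core")      # ~4.0
print(f"2 x 16 cores: {2 * per_socket / 32:.1f} GB/s per core")  # ~8.0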
|
|
July 14, 2017, 16:04 |
|
#9 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Thank you for the quick response.
That brings me back to my original question: is there a rule to determine the bottleneck? For example, four 8-core CPUs installed in a 4-socket system will be faster than two 16-core CPUs, but four 8-core CPUs installed in two two-socket systems connected by InfiniBand won't be? |
|
July 14, 2017, 16:37 |
|
#10 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
If the code you use supports distributed-memory parallelization, you don't need to pay for the expensive quad-socket hardware.
Let me share a recent experience with memory "bottlenecks". I refurbished an old workstation with 2x Xeon E5-2643. These are 4-core CPUs with 4 memory channels. Sounds like enough memory bandwidth per core, right? Yet replacing the DDR3-1333 DIMMs with otherwise identical DDR3-1600 still increased performance in Ansys Fluent by 12% when solving on 8 cores. The point is: you really cannot have too much memory performance for CFD. |
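Under a naive two-part model (a memory-bound fraction of the runtime that scales with bandwidth plus a fixed compute part), a 12% speedup from roughly 20% more bandwidth suggests the run was about two-thirds memory-bound. A sketch; the model itself is my assumption, not something measured:
Code:
# Estimate the memory-bound fraction f from an observed speedup s
# after raising memory bandwidth by a factor r:  1/s = (1 - f) + f/r
r = 1600 / 1333   # DDR3-1600 vs DDR3-1333 -> ~1.20x peak bandwidth
s = 1.12          # observed Fluent speedup on 8 cores
f = (1 - 1 / s) / (1 - 1 / r)
print(f"~{f:.0%} memory-bound under this model")  # prints: ~64% memory-bound ...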
|
July 14, 2017, 16:57 |
|
#11 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Unfortunately, the extra cost of buying 2 servers, 4 CPUs and InfiniBand cards instead of a single server with one 32-core CPU is easily quantifiable, while the actual performance benefit depends on the case.
At least I now know that I was wrong in assuming that the interconnect would be the main issue. For anyone who hasn't seen it yet: computerbase.de (German) compiled a nice, comprehensive list of the current-generation server CPUs. |
|
July 15, 2017, 07:14 |
|
#12 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
You should at least consider the solution in between: a dual-socket board with 2x 16 cores. A single 32-core CPU is just pointless for any CFD application.
|
|
July 17, 2017, 01:44 |
|
#13 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Last Friday I asked our hardware vendor to quote us 2x16 systems based on AMD and on Intel, to get an idea of the price difference. I assume you would argue that the greater memory bandwidth of AMD will outweigh the higher clock rates and (potential) benefits of AVX-512, correct?
|
|
July 17, 2017, 03:32 |
|
#14 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
That might be a possible outcome. But with no real-world benchmarks or available hardware in sight, I am hesitant to draw this conclusion.
There are some details in the architecture of Epyc that have their drawbacks, like high latency for far cache accesses and lower bandwidth for memory accesses outside of the CCX. But then again, the latest iteration of Intel processors does not seem to be without flaws either. I just can't say anything with certainty without CFD benchmarks. |
|
July 17, 2017, 03:59 |
|
#15 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Our most computationally intensive cases would be multi-phase simulations using sliding meshes. Since most, if not all, benchmark cases come down to single-phase flow with an inlet and an outlet condition, I would take benchmark results with a grain of salt anyway.
Currently we are running our simulations on two virtual machines, so any bare-metal system would likely be a vast improvement. |
|
July 20, 2017, 02:19 |
|
#16 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
The first vendor got back to me. For now they only sell Intel, so the lack of benchmarks might turn out not to be the main issue.
They are quoting me various configurations based on the Xeon Gold 6134, since it has the best per-core performance. One benefit of Skylake-SP is that even the "Gold" CPUs can be used in four-socket systems, meaning I need only one machine (although the price compared to 2 systems with InfiniBand might end up the same). One drawback of 6 memory channels in combination with an 8-core CPU is that I end up with either less than the 8 GB per core that I wanted or significantly more. I am also looking at the Xeon Gold 6136, which is 10% more expensive but offers 50% more cores. I hope that the difference in base frequency is offset by the turbo boost when only 8 out of 12 cores are in use.
Edit: If the information on WikiChip is correct, the difference between the 6134 and the 6136 when using 8 cores is only 100 MHz.
Last edited by Kaskade; July 20, 2017 at 06:35. |
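The 8-GB-per-core mismatch is easy to see with balanced memory configurations (a sketch assuming one DIMM per channel and common DIMM sizes; the exact DIMMs are hypothetical):
Code:
# Memory per core with all 6 channels populated, one DIMM each,
# on a hypothetical 8-core Xeon Gold 6134 node
cores = 8
for dimm_gb in (8, 16):
    total_gb = 6 * dimm_gb
    print(f"6 x {dimm_gb} GB = {total_gb} GB -> {total_gb / cores:.1f} GB/core")
# prints: 6 x 8 GB = 48 GB -> 6.0 GB/core
#         6 x 16 GB = 96 GB -> 12.0 GB/core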
|
August 9, 2017, 06:06 |
|
#17 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Say I were comparing two one-socket systems: one 8-core, one 16-core. Both run at the same clock rate, and both have the same memory configuration. Even under the assumption that memory bandwidth is the bottleneck, the 16-core system would still be almost twice as fast, wouldn't it?
|
|
August 9, 2017, 06:28 |
|
#18 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Nope, that's the whole point of the term "bottleneck".
The actual speedup you get depends on how "tight" the bottleneck is. Somewhere between a factor of 1 and 2. |
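One way to make that concrete is a crude two-component model: runtime is the larger of the compute time (which scales with core count) and the memory-transfer time (which doesn't, since both systems have the same memory). A sketch with made-up workload numbers:
Code:
# Crude bottleneck model: runtime = max(compute time, memory time).
# The workload numbers are invented purely for illustration.
def runtime(cores, compute_work=16.0, memory_time=1.5):
    return max(compute_work / cores, memory_time)

speedup = runtime(8) / runtime(16)
print(f"{speedup:.2f}x")  # prints 1.33x: between 1x and 2x, set by the bottleneck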
|
August 10, 2017, 13:46 |
|
#19 |
New Member
Chad
Join Date: Jan 2017
Posts: 8
Rep Power: 9 |
Just out of curiosity, when will both of these CPUs be available on the "consumer-friendly" market, such as Newegg/Amazon?
I see AMD links to Supermicro and a few other distributors for the Epycs (the new Xeons have similar availability), but I was wondering if someone has a bit more insight into purchasing them. Thanks. |
|
August 10, 2017, 13:53 |
|
#20 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
I think that question would be better directed at a retailer. (As a consumer, I would rather get a very nice used car than a Xeon Platinum.)
Right now even a lot of server vendors still list the old models on their websites. |
|