July 11, 2017, 16:59 |
Epyc vs Xeon Skylake SP
|
#1 |
New Member
Join Date: May 2013
Posts: 26
Rep Power: 13 |
Epyc vs Xeon Skylake SP
Which one to choose? The first (non-CFD) benchmarks are available.
Full article: http://www.anandtech.com/show/11544/...-of-the-decade
Deep link to the start of the benchmarks: http://www.anandtech.com/show/11544/...-the-decade/11
What do you think? |
|
July 11, 2017, 17:58 |
|
#2 |
New Member
Join Date: May 2013
Posts: 26
Rep Power: 13 |
Just copied this:
Quote:
|
|
July 13, 2017, 09:14 |
|
#3 |
New Member
Join Date: May 2013
Posts: 26
Rep Power: 13 |
Skylake-SP's AVX-512 units seem to offer some advantage in Fluent (v18.1) as well.
Or is it just the 6-channel memory?
Quote:
Xeon Gold 6148: 20 cores @ 2.4 GHz, 2 AVX-512 units vs. E5-2697 v4: 18 cores @ 2.3 GHz
http://www.ansys-blog.com/boost-ansy...-technologies/ |
|
July 13, 2017, 11:35 |
|
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
I doubt these performance gains are due to AVX. Last time I checked AVX2 instruction performance for one of our LBM codes, the gains were in the low single digits, which is to be expected for a memory-bound workload.
6x DDR4-2666 vs 4x DDR4-2400 is 67% more theoretical memory bandwidth. That's all there is to these performance gains. But it would not be clever to highlight this in a marketing publication, because AMD Epyc is even better in exactly this metric. So they may be trying to pin the performance gains on AVX instead. |
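A quick sanity check of that 67% figure (a sketch using peak numbers only, assuming 8 bytes per transfer per channel and ignoring real-world memory efficiency):
Code:
# Theoretical peak memory bandwidth: channels * MT/s * 8 bytes
def peak_bandwidth_gbs(channels, mt_per_s):
    return channels * mt_per_s * 8 / 1000  # GB/s

skylake = peak_bandwidth_gbs(6, 2666)    # Xeon Gold 6148: 6x DDR4-2666
broadwell = peak_bandwidth_gbs(4, 2400)  # E5-2697 v4: 4x DDR4-2400
print(f"{skylake:.0f} vs {broadwell:.0f} GB/s -> +{(skylake / broadwell - 1) * 100:.0f}%")
# prints: 128 vs 77 GB/s -> +67%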
|
July 13, 2017, 14:19 |
|
#5 |
New Member
Join Date: May 2013
Posts: 26
Rep Power: 13 |
Just what I was thinking when reading this...
|
|
July 14, 2017, 04:24 |
|
#6 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Sorry if I'm hijacking the topic, but I've been looking into getting a new server/cluster and I have two quick questions you seem to be able to answer.
1) If I have a multi-socket server, do the CPUs share the memory bandwidth?
2) Is there some rule to determine whether the interconnect or the memory bandwidth will be the bottleneck? Can two servers connected by InfiniBand be faster than a single server, given the same number of cores? |
|
July 14, 2017, 08:14 |
|
#7 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Some more research and I answered my own question: each socket has its own memory access.
Which means I have to change the second question: would you expect a two-socket board with two 16-core CPUs to be faster than a single 32-core CPU? Communication between the cores of the two CPUs would be slowed down by the interconnect, but the memory bandwidth is doubled. |
|
July 14, 2017, 11:13 |
|
#8 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Yes, a dual-socket setup with two 16-core CPUs is much faster for CFD workloads than a single 32-core CPU. Latencies are still fairly low on a dual-socket board, so the communication overhead is usually negligible; at the very least, the effect is much less important than having twice the memory bandwidth.
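As a rough illustration of the bandwidth argument (a sketch; the 6-channel DDR4-2666 sockets and peak numbers are my assumptions, and sustained bandwidth is lower in practice):
Code:
# Per-core share of peak memory bandwidth, single vs dual socket
per_socket = 6 * 2666 * 8 / 1000                                 # ~128 GB/s peak per socket
print(f"1 x 32 cores: {per_socket / 32:.1f} GB/s per core")      # ~4.0
print(f"2 x 16 cores: {2 * per_socket / 32:.1f} GB/s per core")  # ~8.0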
|
|
July 14, 2017, 16:04 |
|
#9 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Thank you for the quick response.
That brings me back to my original question: is there a rule to determine the bottleneck? For example, four 8-core CPUs installed in a 4-socket system will be faster than two 16-core CPUs, but four 8-core CPUs installed in two two-socket systems connected by InfiniBand won't be? |
|
July 14, 2017, 16:37 |
|
#10 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
If the code you use supports distributed-memory parallelization, you don't need to pay for the expensive quad-socket hardware.
Let me share a recent experience with memory "bottlenecks". I refurbished an old workstation with 2x Xeon E5-2643. These are 4-core CPUs with 4 memory channels. Sounds like enough memory bandwidth per core, right? Yet replacing the DDR3-1333 DIMMs with otherwise identical DDR3-1600 still increased performance in Ansys Fluent by 12% when solving on 8 cores. The point is: you really cannot have too much memory performance for CFD. |
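Under a naive two-part model (a memory-bound fraction of the runtime that scales with bandwidth plus a fixed compute part), a 12% speedup from roughly 20% more bandwidth suggests the run was about two-thirds memory-bound. A sketch; the model itself is my assumption, not something measured:
Code:
# Estimate the memory-bound fraction f from an observed speedup s
# after raising memory bandwidth by a factor r:  1/s = (1 - f) + f/r
r = 1600 / 1333   # DDR3-1600 vs DDR3-1333 -> ~1.20x peak bandwidth
s = 1.12          # observed Fluent speedup on 8 cores
f = (1 - 1 / s) / (1 - 1 / r)
print(f"~{f:.0%} memory-bound under this model")  # prints: ~64% memory-bound ...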
|
July 14, 2017, 16:57 |
|
#11 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Unfortunately, the extra cost of buying 2 servers, 4 CPUs and InfiniBand cards instead of a single server with one 32-core CPU is easily quantifiable, while the actual performance benefit depends on the case.
At least I now know that I was wrong in assuming that the interconnect would be the main issue. For anyone who hasn't seen it yet: computerbase.de (German) compiled a nice, comprehensive list of the current-generation server CPUs. |
|
July 15, 2017, 07:14 |
|
#12 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
You should at least consider the solution in between: a dual-socket board with 2x 16 cores. A single 32-core CPU is just pointless for any CFD application.
|
|
July 17, 2017, 01:44 |
|
#13 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Last Friday I asked our hardware vendor to quote us 2x16 systems based on AMD and on Intel, to get an idea of the price difference. I assume you would argue that the greater memory bandwidth of AMD will outweigh the higher clock rates and (potential) benefits of AVX-512, correct?
|
|
July 17, 2017, 03:32 |
|
#14 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
That might be a possible outcome. But with no real-world benchmarks or available hardware in sight, I am hesitant to draw this conclusion.
There are some details in the architecture of Epyc that have their drawbacks, like high latency for far cache accesses and lower bandwidth for memory accesses outside of the CCX. But then again, the latest iteration of Intel processors does not seem to be without flaws either. I just can't say anything with certainty without CFD benchmarks. |
|
July 17, 2017, 03:59 |
|
#15 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Our most computationally intensive cases would be multi-phase simulations using sliding meshes. Since most, if not all, benchmark cases come down to single-phase flow with an inlet and an outlet condition, I would take benchmark results with a grain of salt anyway.
Currently we are running our simulations on two virtual machines, so any bare-metal system would likely be a vast improvement. |
|
July 20, 2017, 02:19 |
|
#16 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
The first vendor got back to me. For now they only sell Intel, so the lack of benchmarks might turn out not to be the main issue.
They are quoting me various configurations based on the Xeon Gold 6134, since it has the best per-core performance. One benefit of Skylake-SP is that even the "Gold" CPUs can be used in four-socket systems, meaning I need only one machine (although the price compared to 2 systems with InfiniBand might end up the same). One drawback of 6 memory channels in combination with an 8-core CPU is that I end up with either less than the 8 GB per core that I wanted or significantly more. I am also looking at the Xeon Gold 6136, which is 10% more expensive but offers 50% more cores. I hope that the difference in base frequency is offset by the turbo boost when only 8 out of 12 cores are in use.
Edit: If the information on WikiChip is correct, the difference between the 6134 and the 6136 when using 8 cores is only 100 MHz.
Last edited by Kaskade; July 20, 2017 at 06:35. |
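The 8-GB-per-core mismatch is easy to see with balanced memory configurations (a sketch assuming one DIMM per channel and common DIMM sizes; the exact DIMMs are hypothetical):
Code:
# Memory per core with all 6 channels populated, one DIMM each,
# on a hypothetical 8-core Xeon Gold 6134 node
cores = 8
for dimm_gb in (8, 16):
    total_gb = 6 * dimm_gb
    print(f"6 x {dimm_gb} GB = {total_gb} GB -> {total_gb / cores:.1f} GB/core")
# prints: 6 x 8 GB = 48 GB -> 6.0 GB/core
#         6 x 16 GB = 96 GB -> 12.0 GB/core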
|
August 9, 2017, 06:06 |
|
#17 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
Say I were comparing two one-socket systems: one 8-core, one 16-core. Both run at the same clock rate, and both have the same memory configuration. Even under the assumption that memory bandwidth is the bottleneck, the 16-core system would still be almost twice as fast, wouldn't it?
|
|
August 9, 2017, 06:28 |
|
#18 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Nope, that's the whole point of the term "bottleneck".
The actual speedup you get depends on how "tight" the bottleneck is. Somewhere between a factor of 1 and 2. |
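One way to make that concrete is a crude two-component model: runtime is the larger of the compute time (which scales with core count) and the memory-transfer time (which doesn't, since both systems have the same memory). A sketch with made-up workload numbers:
Code:
# Crude bottleneck model: runtime = max(compute time, memory time).
# The workload numbers are invented purely for illustration.
def runtime(cores, compute_work=16.0, memory_time=1.5):
    return max(compute_work / cores, memory_time)

speedup = runtime(8) / runtime(16)
print(f"{speedup:.2f}x")  # prints 1.33x: between 1x and 2x, set by the bottleneck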
|
August 10, 2017, 13:46 |
|
#19 |
New Member
Chad
Join Date: Jan 2017
Posts: 8
Rep Power: 9 |
Just out of curiosity, when will both of these CPUs be available on the "consumer-friendly" market, such as Newegg/Amazon?
I see AMD links to Supermicro and a few other distributors for the Epycs (the new Xeons have similar availability), but I was wondering if someone has a bit more insight into purchasing them. Thanks. |
|
August 10, 2017, 13:53 |
|
#20 |
Senior Member
Onno
Join Date: Jan 2012
Location: Germany
Posts: 120
Rep Power: 15 |
I think that question would be better directed at a retailer. (As a consumer, I would rather get a very nice used car than a Xeon Platinum.)
Right now even a lot of server vendors still list the old models on their websites. |
|