CFD Online, www.cfd-online.com, Hardware forum

Epyc 7302 Scalability

#1 srsel6 (Member), February 23, 2020, 22:59
Hello everyone,

I'm currently sourcing parts to build a workstation/server for running OpenFOAM for research, and I was wondering about the scalability of the Epyc 7302.

OpenFOAM benchmarks on various hardware

I found this thread, which was pretty much what I was looking for. However, if I only had a single Epyc 7302 processor with 8 × 4GB DDR4-3200 ECC, would it be reasonable to expect more or less the same results at 16 cores as in the link above?

I also found in this study that it is better to build a cluster in which the ratio of cores to memory channels stays constant. Pages 3-4 show test results comparing two different processors and configurations.

https://www.simutechgroup.com/images...tions-2018.pdf

If I were to add a second 7302 later on, would the limited memory channels and bandwidth become a bottleneck? Or does the additional processor bring its own 8 memory channels and the extra bandwidth?

#2 flotus1 (Alex, Super Moderator), February 24, 2020, 00:17
Quote:
I'm currently sourcing parts to build a workstation/server for running OpenFOAM for research, and I was wondering about the scalability of the Epyc 7302.

OpenFOAM benchmarks on various hardware

I found this thread, which was pretty much what I was looking for. However, if I only had a single Epyc 7302 processor with 8 × 4GB DDR4-3200 ECC, would it be reasonable to expect more or less the same results at 16 cores as in the link above?
That's a common misconception. At 16 cores, the benchmark you quoted is already utilizing shared resources from both CPUs, so you cannot expect to get the same result with only one CPU.
A better estimate is half the performance of that system when it is utilizing all 32 cores.

Quote:
If I were to add a second 7302 later on, would the limited memory channels and bandwidth become a bottleneck? Or does the additional processor bring its own 8 memory channels and the extra bandwidth?
You can start with one of these CPUs on a dual-socket board. Install all 8 DIMMs in the slots belonging to the CPU you install first, and make sure to plug all drives and PCIe devices into slots connected to that CPU.
Adding a second CPU later (with another 8 DIMMs) will get you to the same level of performance as in the benchmark you quoted, minus a penalty of a few percent from using 4GB DIMMs, which are single-rank. The other system used dual-rank DIMMs.
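To put rough numbers on the memory-channel question from post #1, here is a minimal sketch of theoretical peak DRAM bandwidth. It assumes the usual formula of channels × transfer rate (MT/s) × 8 bytes per 64-bit transfer, 8 channels per Epyc Rome socket, and every channel populated. It does not model the single-rank vs dual-rank penalty mentioned above, and sustained bandwidth in practice is lower. The helper name `peak_bandwidth_gbs` is hypothetical.

```python
# Theoretical peak DRAM bandwidth for Epyc Rome sockets with DDR4-3200.
# Assumes every channel is populated and counts 8 bytes per transfer
# (one 64-bit channel); sustained bandwidth in practice is lower.

BYTES_PER_TRANSFER = 8   # 64-bit DDR4 channel (ECC bits not counted)
CHANNELS_PER_SOCKET = 8  # Epyc Rome: 8 memory channels per CPU
DDR4_3200_MTS = 3200     # mega-transfers per second


def peak_bandwidth_gbs(sockets: int, mts: int = DDR4_3200_MTS) -> float:
    """Hypothetical helper: peak bandwidth in GB/s for `sockets` CPUs."""
    channels = sockets * CHANNELS_PER_SOCKET
    return channels * mts * BYTES_PER_TRANSFER / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    for sockets in (1, 2):
        print(f"{sockets} socket(s), {sockets * CHANNELS_PER_SOCKET} channels: "
              f"{peak_bandwidth_gbs(sockets):.1f} GB/s peak")
```

Under these assumptions one socket gives 204.8 GB/s and two give 409.6 GB/s: the second CPU brings its own 8 channels, so peak bandwidth doubles along with the core count.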

#3 srsel6 (Member), February 24, 2020, 00:23
Quote:
Originally Posted by flotus1
That's a common misconception. At 16 cores, the benchmark you quoted is already utilizing shared resources from both CPUs, so you cannot expect to get the same result with only one CPU.
A better estimate is half the performance of that system when it is utilizing all 32 cores.
I see, that makes sense. So, based on the first link, and assuming "performance" is measured purely as run time, the time for a single 7302 running on 16 cores is better estimated as:

Time (1 × 7302, 16 cores) ≈ 2 × 26.89 s (2 × 7302, 32 cores) = 53.78 s

Is this correct?
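The estimate above can be written as a small helper. This is a sketch assuming the rule of thumb from post #2 (one socket delivers roughly half the throughput of the dual-socket system running all 32 cores) and that performance is measured as wall-clock time for the same benchmark case. The function name and its parameters are hypothetical.

```python
def estimate_single_socket_time(measured_time_s: float,
                                sockets_used: int = 1,
                                sockets_benchmarked: int = 2) -> float:
    """Scale a run time measured with all cores of a multi-socket system.

    Assumption (rule of thumb from post #2): throughput scales roughly with
    the number of fully populated sockets, so run time scales inversely.
    """
    if sockets_used <= 0 or sockets_benchmarked <= 0:
        raise ValueError("socket counts must be positive")
    return measured_time_s * sockets_benchmarked / sockets_used


if __name__ == "__main__":
    # 26.89 s is the 32-core dual-socket result quoted in this thread.
    print(f"{estimate_single_socket_time(26.89):.2f} s")  # prints "53.78 s"
```

Keeping the socket counts as parameters makes the assumption explicit: the estimate only holds if each socket has all of its memory channels populated, as in the benchmarked system.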

#4 srsel6 (Member), February 24, 2020, 00:27
Quote:
Originally Posted by flotus1
You can start with one of these CPUs on a dual-socket board. Install all 8 DIMMs in the slots belonging to the CPU you install first, and make sure to plug all drives and PCIe devices into slots connected to that CPU.
Adding a second CPU later (with another 8 DIMMs) will get you to the same level of performance as in the benchmark you quoted.
Okay, I understand. So it would be much more cost-effective to add a second CPU than to build another workstation/server from scratch and cluster the two machines, since I wouldn't need an additional power supply, case, etc., and adding a CPU would give the same result as the clustering option?

#5 flotus1 (Alex, Super Moderator), February 24, 2020, 00:31
Yes, having both CPUs in one shared-memory system only has advantages; I can't think of any downsides.

#6 srsel6 (Member), February 24, 2020, 02:20
Quote:
Originally Posted by flotus1
Yes, having both CPUs in one shared-memory system only has advantages; I can't think of any downsides.

Okay, thank you!

#7 Amiga500 (Member), February 24, 2020, 08:07
So AMD have launched the EPYC 7532: 32 cores, but with the full 256MB of L3 cache.

They are apparently touting its performance in CFX.

I'm suspicious that they have sized the benchmark problem to fit in the cache.

Would the extra L3 really give a significant speedup for a real problem that sits mostly in system memory?

#8 flotus1 (Alex, Super Moderator), February 24, 2020, 08:16
Depending on where your threshold lies, the advantage might not count as significant. But it is there. In addition, this CPU has one of the highest TDPs among the regular 32-core Epyc Rome CPUs. Slightly higher clock speeds would have been nice, though.
It will most likely be the most expensive 32-core CPU in the lineup, but when you pay regular prices for CFX licenses, the CPU pays for itself within a few months at most.

#9 Amiga500 (Member), February 24, 2020, 09:01
Would a 32-core CPU be worthwhile with only 8 memory channels? A 24-core CPU already works out to 3 cores per channel (24 = 3 × 8).

That is certainly a trade-off to consider: an extra 8 threads of license plus the added CPU cost.

I suppose that, with licenses fixed at 32 threads, splitting across a second socket with 16 cores on each would be the better solution. It might even be cheaper, too.
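One quick way to compare the options in this post is the number of cores competing for each memory channel. This sketch assumes 8 channels per Epyc socket with every channel populated, and ignores clock speed and cache; fewer cores per channel generally means less contention for memory bandwidth. The configuration list and helper name are illustrative only.

```python
# Cores sharing each memory channel for the layouts discussed in this post.
# Assumes 8 channels per Epyc socket, all populated.

CHANNELS_PER_SOCKET = 8

# (description, sockets, cores per socket)
CONFIGS = [
    ("1 x 24-core", 1, 24),
    ("1 x 32-core", 1, 32),
    ("2 x 16-core", 2, 16),
]


def cores_per_channel(sockets: int, cores_per_socket: int) -> float:
    """Total active cores divided by total populated memory channels."""
    return (sockets * cores_per_socket) / (sockets * CHANNELS_PER_SOCKET)


if __name__ == "__main__":
    for name, sockets, cores in CONFIGS:
        print(f"{name}: {cores_per_channel(sockets, cores):.0f} cores per channel")
```

With the license held at 32 threads, the 2 × 16-core layout halves the cores per channel (2 instead of 4) compared with a single 32-core CPU.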

#10 flotus1 (Alex, Super Moderator), February 24, 2020, 12:34
You definitely have a point. Two 16-core CPUs would be much faster than a single 32-core CPU.
Last time I checked, the Ansys licensing scheme increases core counts by a factor of 4 with each HPC pack. So you can run a single simulation with 8 threads on a basic license, adding 1 HPC pack gets you to 32 threads, and adding another yields 128 threads. So there might not even be a market for dual-socket Ansys workstations with 2 × 32 cores.
But then again, we are dealing with marketing here...
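The licensing arithmetic described in this post can be sketched as below. This models only the scheme as stated here (8 threads on a basic license, ×4 per HPC pack); it has not been checked against Ansys documentation, and the constant and function names are hypothetical.

```python
# Thread limits under the licensing scheme *as described in this post*:
# 8 threads on a basic license, multiplied by 4 for each HPC pack.
# Not verified against official Ansys licensing documentation.

BASE_THREADS = 8      # per the post: basic license
PACK_MULTIPLIER = 4   # per the post: each HPC pack multiplies the limit by 4


def max_threads(hpc_packs: int) -> int:
    """Thread limit for a single simulation with the given number of packs."""
    if hpc_packs < 0:
        raise ValueError("hpc_packs must be non-negative")
    return BASE_THREADS * PACK_MULTIPLIER ** hpc_packs


if __name__ == "__main__":
    for packs in range(3):
        print(f"{packs} HPC pack(s): up to {max_threads(packs)} threads")
```

Under this scheme there is no step between 32 and 128 threads, so a 64-core (2 × 32) workstation would need the second pack and leave half of its 128-thread capacity unused, which is the point made above about dual-socket 2 × 32-core machines.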

#11 srsel6 (Member), February 25, 2020, 00:32
Why would two 16-core CPUs be much faster than a single 32-core CPU though?

#12 flotus1 (Alex, Super Moderator), February 25, 2020, 04:06
Shared resource contention.
CPU cores have their own resources which they can use exclusively, like their FP units, L2 cache, etc.
But they also have to compete for shared resources that all CPU cores have access to, for example L3 cache, TDP and thermal budget, and the memory controllers.
With 2 CPUs, you double the amount of shared resources for the most part. L3 cache is an outlier here, because we are comparing two 16-core CPUs with 128MB of L3 each against one 32-core CPU with 256MB of L3, so the total amount of L3 is the same. The main advantage is having twice as many memory controllers/channels. So when all cores are hard at work, each one gets a larger share of these shared resources, leading to higher overall performance.
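The per-core shares for the two configurations can be sketched as follows. The L3 sizes are the ones quoted in this thread; the sketch additionally assumes 8 populated DDR4-3200 channels per socket, and it leaves out TDP and thermal budget, which do not divide as neatly per core. The `System` class is purely illustrative.

```python
# Per-core share of shared resources: one 32-core Epyc 7532 versus two
# 16-core Epyc 7302, using the L3 figures quoted in this thread and
# assuming 8 populated DDR4-3200 channels per socket.

from dataclasses import dataclass

CHANNEL_GBS = 3200 * 8 / 1000  # one DDR4-3200 channel: 25.6 GB/s peak


@dataclass
class System:
    name: str
    sockets: int
    cores_per_socket: int
    l3_mb_per_socket: int
    channels_per_socket: int = 8

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    def l3_per_core_mb(self) -> float:
        """Total L3 divided evenly over all cores."""
        return self.sockets * self.l3_mb_per_socket / self.cores

    def bandwidth_per_core_gbs(self) -> float:
        """Total theoretical peak DRAM bandwidth divided over all cores."""
        return self.sockets * self.channels_per_socket * CHANNEL_GBS / self.cores


SYSTEMS = [
    System("1 x Epyc 7532", sockets=1, cores_per_socket=32, l3_mb_per_socket=256),
    System("2 x Epyc 7302", sockets=2, cores_per_socket=16, l3_mb_per_socket=128),
]

if __name__ == "__main__":
    for s in SYSTEMS:
        print(f"{s.name}: {s.l3_per_core_mb():.0f} MB L3/core, "
              f"{s.bandwidth_per_core_gbs():.1f} GB/s peak bandwidth/core")
```

Both systems give each core 8MB of L3, which is why L3 is the outlier, while the dual-socket system gives each core twice the peak memory bandwidth (12.8 vs 6.4 GB/s under these assumptions).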

All times are GMT -4.