Socket 2011-3 processors - an overview

March 12, 2017, 12:11   #1
Socket 2011-3 processors - an overview

flotus1 (Super Moderator)
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427

Lineup
It has come to my attention that there is no single source for a quick overview of the current lineup of Intel processors for socket 2011-3, specifically the Broadwell-E and Broadwell-EP CPUs. And to be honest, I was not aware of every option that might be interesting from a CFD point of view. So here is my attempt to put as much relevant information as possible into a single table.
Some CPUs are missing because they are not freely available (e.g. exclusive to some OEMs) or rather irrelevant for CFD. The "all core turbo" frequency in the last column is, to the best of my knowledge, the frequency for execution of AVX code.

[CPU lineup table: see attached lineup.png]

An attempt to rate the performance
Since all the processors above share the same architecture, it is possible to rate their relative performance based on their specifications. A very simple model for this is Amdahl's Law. It captures the fact that when N cores are used instead of one, the job does not finish N times faster; instead, you see diminishing returns as more cores are added. This can be caused, for example, by portions of the code that were not parallelized, or by running out of memory bandwidth to keep the additional cores fed with data. The model is not perfect, but adding more complexity would not make things any clearer.
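As a minimal sketch (with the "scaling efficiency" used below playing the role of the parallel fraction), Amdahl's Law can be written as:
Code:
def amdahl_speedup(n_cores, efficiency):
    """Amdahl's Law: speedup on n_cores when a fraction 'efficiency' of the
    work is perfectly parallel and the remainder stays serial."""
    return 1.0 / ((1.0 - efficiency) + efficiency / n_cores)

# Example: with a 97% parallel fraction, 16 cores give a speedup of only ~11
print(amdahl_speedup(16, 0.97))   # ~11.0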
There are two more factors we have to take into account: memory speed and cache per core, since these are not equal across the lineup. We assume that lower memory speed translates into less CFD computing power with an efficiency of 75%. For example, 11% less memory speed results in a penalty of 8.3%. More cache per core is weighted with 20% efficiency, so 50% more cache per core gives a bonus of 10%.
The result is a number that can be used to estimate the relative performance of these CPUs.
Note that for the dual-socket CPUs, twice the number of cores was used for Amdahl's law.
While we are at it, we can try to rate the price/performance ratio as well. We simply take the performance number and divide it by the cost of one or two CPUs plus a "typical" cost for the rest of the workstation: $2000 for a dual-socket workstation and $1300 for a single-socket system.
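Putting the pieces together, here is a minimal sketch of the rating in Python (the CPU numbers in the example and the reference values used for normalization are hypothetical placeholders, not values from the table):
Code:
def performance_score(cores, avx_turbo_ghz, mem_speed, cache_per_core_mb,
                      sockets=1, efficiency=0.97,
                      ref_mem_speed=2400, ref_cache_per_core_mb=2.5):
    """Relative performance following the model above: Amdahl's Law over all
    cores, a 75%-weighted memory-speed factor and a 20%-weighted
    cache-per-core factor, scaled by the AVX all-core turbo frequency."""
    n = cores * sockets
    speedup = 1.0 / ((1.0 - efficiency) + efficiency / n)
    mem_factor = 1.0 + 0.75 * (mem_speed / ref_mem_speed - 1.0)
    cache_factor = 1.0 + 0.20 * (cache_per_core_mb / ref_cache_per_core_mb - 1.0)
    return speedup * avx_turbo_ghz * mem_factor * cache_factor

def price_performance(score, cpu_price, sockets=1):
    """Performance per dollar: CPU cost plus a 'typical' system cost
    ($2000 for dual-socket, $1300 for single-socket)."""
    system_cost = 2000 if sockets == 2 else 1300
    return score / (sockets * cpu_price + system_cost)

# Hypothetical example: 8 cores, 3.0 GHz AVX turbo, DDR4-2400, 2.5 MB L3 per core
score = performance_score(cores=8, avx_turbo_ghz=3.0, mem_speed=2400, cache_per_core_mb=2.5)
print(score, price_performance(score, cpu_price=1000))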
Without further ado, here are the results for five different scaling efficiencies. All numbers are normalized to the lowest value:

[Performance and price/performance charts: see attached performance.jpg]

Now all you need to know is how your software scales. An efficiency of 99% is highly unlikely for a CFD code, at least when using a large number of cores; this is what is usually referred to as the "memory bandwidth bottleneck". It is also why CPUs with very high core counts are usually not beneficial for CFD, or at least not the best use of your money. You should probably focus on the range of 95%-98%.
To back up this claim we can take a look at this whitepaper from HP: http://www.peraglobal.com/upload/con...2555_13361.pdf
As a mean value over several Fluent benchmarks, they report a speedup of ~16 for a single node with 32 cores. This translates to a scaling efficiency of ~96.8%.
The CFX benchmarks show a speedup of ~18 on a single node with 24 cores, which corresponds to a scaling efficiency of ~97.8% in the same Amdahl's Law model.
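For reference, a measured speedup can be converted back into a scaling efficiency in this sense by inverting Amdahl's Law (small sketch):
Code:
def efficiency_from_speedup(speedup, n_cores):
    """Invert Amdahl's Law: solve 1/S = (1 - p) + p/N for the parallel fraction p."""
    return (1.0 / speedup - 1.0) / (1.0 / n_cores - 1.0)

# Fluent numbers from the whitepaper: speedup ~16 on a 32-core node
print(efficiency_from_speedup(16, 32))   # ~0.968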



Questions and answers:
Q: So which one is the best processor for CFD?
A: It depends. The answer will usually be one of the processors with a medium core count and a high frequency. But you also have to factor in licensing: if all you have are parallel licenses for 8 cores, buying a 24-core workstation is usually not the best use of your budget.
Q: Why are some processors missing?
A: I focused on processors you can buy off the shelf. The quad-socket processors are also missing because you will have a hard time finding suitable motherboards, cases and power supplies as a normal customer. And at least from my limited experience with quad-socket, scaling is not always as good as you would expect.
Q: Why are the i7 processors even on this list? I heard that they are less reliable than Xeon processors.
A: This is not true. They are the same processors with a few features deactivated; for example, they do not officially support ECC memory. On the other hand, they are unlocked and can use faster memory, which makes them an interesting alternative, especially for CFD.
Q: Isn't this about a year too late? These processors have been around for a while now.
A: Maybe, sorry about that. But they will remain your only option for at least a few more months.

Disclaimer: I take no responsibility for errors in the tables above. Please let me know if you find any.
Additional sources:
http://ark.intel.com/products/family...Family#@Server
http://ark.intel.com/products/family...ssors#@Desktop
https://www.microway.com/knowledge-c...ep-processors/
http://hexus.net/tech/news/cpu/91676...masked/?page=3
Attached images: lineup.png (79.3 KB), performance.jpg (198.6 KB)

Last edited by flotus1; March 23, 2017 at 12:52.

April 14, 2017, 12:47   #2
Blanco (Senior Member)
Join Date: Mar 2009
Location: Torino, Italy
Posts: 193

Hi,

thanks for the detailed analysis, it is really useful indeed!

I just have a question: I used to ponder the fact that Xeon processors are mounted on motherboards with 4 RAM channels per socket, and I think this plays an important role in determining the performance gain for particular CPU models.

Let's consider a 4-core CPU and a 3D CFD simulation as a reference; this is the "best" setup I can think of, since each core has its own memory channel. If I run the same simulation on an 8-core CPU, then ideally I would expect to obtain the results in half the time (100% efficiency). As you mentioned, the real efficiency will be lower, partly because each core now has to share its memory channel with another core (2 cores per memory channel). What happens if I use a 6-core CPU? I suppose things get worse, because now I have an unbalanced system: 2 of the 6 cores have their own memory channel, while the remaining 4 cores share the other two channels in pairs. What do you think about scaling efficiency in this second case? I expect the 2 "lucky" cores to end up waiting for the other 4 if memory bandwidth is the limit, so the scaling efficiency will be much lower than expected.

I think this would significantly affect the scaling efficiency of any CPU whose core count is not evenly divisible by 4. What do you think? If so, we should somehow account for the scaling efficiency of these CPUs being lower than the theoretical estimate.

Regards

April 14, 2017, 16:41   #3
flotus1 (Super Moderator)

I think your concern is based on a misconception about how memory access works.
Quote:
2 of the 6 cores have their own memory channel, while the remaining 4 cores share the other two channels in pairs
The CPU cores do not have their "own" memory channel, i.e. they do not handle memory access themselves. Instead, the CPU cores communicate with the integrated memory controller that in turn handles memory access via a queuing system. This additional layer ensures that memory bandwidth is distributed evenly among the cores. So having a number of cores that is not an even multiple of the number of memory channels is not really an issue.
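To put some purely illustrative numbers on this (theoretical peak bandwidth, not measured values): the aggregate bandwidth of the four channels is simply divided among however many cores are requesting data, so a 6-core CPU is not split into "lucky" and "unlucky" cores.
Code:
# Illustrative only: theoretical peak of 4 channels of DDR4-2400, 8 bytes per transfer
channels, transfers_per_s, bytes_per_transfer = 4, 2400e6, 8
peak_gb_s = channels * transfers_per_s * bytes_per_transfer / 1e9   # 76.8 GB/s

for active_cores in (4, 6, 8):
    # every active core gets roughly an equal share of the aggregate bandwidth
    print(f"{active_cores} cores -> {peak_gb_s / active_cores:.1f} GB/s per core")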

If you want to learn more about the whole topic, this is a good place to start: http://frankdenneman.nl/2015/02/18/m...y-blog-series/

April 14, 2017, 17:03   #4
Blanco (Senior Member)

Thanks a lot for the explanation and the reference, that solves my doubts! Regards


April 19, 2017, 17:52   #5
MaryBau (Member)
Join Date: Jul 2010
Posts: 52

I am building a workstation to run CFD simulations with OpenFOAM and ~10 million cells. Because of my budget, I am limited to 8 or 10 cores. I was hesitating between the i7-6900K, E5-1660 v4, E5-2630 v4 and E5-2640 v4.

If I understand flotus1's analysis correctly, it seems that having 10 slower cores (E5-2x) is better in terms of performance and performance/$ than having 8 faster cores (i7 or the E5-1x). Or is the comparison 20 slower cores vs. 8 faster cores?

Does it make sense to have one E5-2x on a single processor computer?

And a bit off topic, but will 64 GB (4x16GB) of RAM be enough for this type of simulations? Will it also be enough for pre/post-processing and visualization of simulations with ~50 million cells that I will run in another server?

Thanks,

Mary

April 20, 2017, 03:35   #6
flotus1 (Super Moderator)

Quote:
If I understand flotus1's analysis correctly [...] Or is the comparison 20 slower cores vs. 8 faster cores?
The latter. Consequently, a single-socket system is usually the better choice for a small budget.
Quote:
Does it make sense to have one E5-2x on a single processor computer?
Unless for some reason you need a large number of cores for a workflow that is not memory-bound: no. You usually get more performance for your money with the single-socket CPUs. This is why I did not attempt to compare the dual-socket CPUs in single-CPU setups.
Quote:
And a bit off topic, but will 64 GB (4x16GB) of RAM be enough for this type of simulations?
That should be more than sufficient.
Quote:
Will it also be enough for pre/post-processing and visualization of simulations with ~50 million cells that I will run in another server?
It might be enough. Post-processing with ParaView should be no problem. I don't really know about the pre-processors for OpenFOAM. However, RAM is expensive these days. See if it works with 64GB. Otherwise you can still upgrade with 4 additional DIMMs.

May 9, 2017, 06:01   #7
greenday (New Member)
Join Date: May 2016
Posts: 3

Hi

I'm struggling to choose between the E5-2620 v3 (6 cores, 2.4 GHz) and the E5-2620 v4 (8 cores, 2.1 GHz).
They are almost the same price.
Could you give me some advice from a CFD point of view?

Thanks

May 9, 2017, 06:17   #8
flotus1 (Super Moderator)

Pros and cons of the E5-2620 v4 compared to its predecessor (a rough numbers sketch follows after the list):
+more cores
+supports faster memory
+newer architecture (->higher IPC compensates for slightly lower clock speed)
+more L3 cache
+does not cost more
-can't think of any...
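A rough numbers sketch behind this (the ~5% IPC gain is an assumption, base clocks as quoted in your question, memory speed differences ignored here):
Code:
# Aggregate throughput proxy: cores x base clock x assumed IPC factor
v3 = 6 * 2.4 * 1.00   # E5-2620 v3: 14.4 "core-GHz"
v4 = 8 * 2.1 * 1.05   # E5-2620 v4: ~17.6 "core-GHz" with an assumed ~5% IPC gain
print(v4 / v3)        # ~1.23 in favour of the v4, before counting the faster memory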

June 17, 2017, 08:48   #9
F1aerofan (New Member)
Ramón
Join Date: Mar 2016
Location: The Netherlands
Posts: 11

First of all, great overview flotus1!

I do have some questions though:
1. In your explanation and the webpages about Amdahl's Law, there is no mention of the clock speed of the cores. Why is this?
2. As you say, these processors are easier to compare because of their similar architecture. But how would you go about quantifying the performance difference between an E5-2637 v3 and an E5-2667 v4? And how would this performance relate to a FLUENT solver rating?

June 17, 2017, 09:35   #10
flotus1 (Super Moderator)

Quote:
Originally Posted by F1aerofan
1. In your explanation and the webpages about Amdahl's Law, there is no mention of the clock speed of the cores. Why is this?
Because it is implied: cores with a higher clock speed are faster. The scaling with clock speed is usually not 100%, especially when bandwidth limits kick in, but the performance estimates I gave do account for differences in clock speed.

Quote:
Originally Posted by F1aerofan
2. As you say, these processors are easier to compare because of their similar architecture. But how would you go about quantifying the performance difference between an E5-2637 v3 and an E5-2667 v4? And how would this performance relate to a FLUENT solver rating?
Did you really have to pick two CPUs with different core counts?
I would be hesitant to quantify the difference, but here is what I would consider: IPC improved by somewhere between 5% and 10% between the two generations. The v3 CPUs only supported DDR4-2133, so add roughly another 10% penalty for the v3, at least for CFD workloads. Then look up the turbo clock speeds of both CPUs (E5-2637 v3: 3.6 GHz, E5-2667 v4: 3.5 GHz), which is a slight bonus for Haswell-EP. In the end, you can feed these numbers along with the core counts into a more or less complicated model, for example Amdahl's Law.
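A rough sketch of how those ingredients could be combined (the 7.5% IPC factor and the 97% scaling efficiency are just assumptions within the ranges mentioned above):
Code:
def estimate(cores, turbo_ghz, ipc_factor, mem_factor, efficiency=0.97):
    """Amdahl's Law speedup scaled by clock speed, IPC and memory-speed factors."""
    speedup = 1.0 / ((1.0 - efficiency) + efficiency / cores)
    return speedup * turbo_ghz * ipc_factor * mem_factor

# E5-2637 v3: 4 cores, 3.6 GHz turbo, Haswell IPC, DDR4-2133 (~10% memory penalty)
v3 = estimate(cores=4, turbo_ghz=3.6, ipc_factor=1.000, mem_factor=0.90)
# E5-2667 v4: 8 cores, 3.5 GHz turbo, ~7.5% higher IPC assumed, DDR4-2400
v4 = estimate(cores=8, turbo_ghz=3.5, ipc_factor=1.075, mem_factor=1.00)
print(f"E5-2667 v4 vs E5-2637 v3: ~{v4 / v3:.1f}x")   # roughly 2x in this sketch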
If you really need quantitative numbers that are accurate, better look at benchmarks.

Why do you ask? Are you considering buying used hardware for your new cluster?

June 17, 2017, 10:47   #11
F1aerofan (New Member)

Okay, thanks, I will give that a go then.

Quote:
Originally Posted by flotus1
If you really need quantitative numbers that are accurate, better look at benchmarks.
The trouble with the ANSYS Fluent benchmarks is that the most recent ones were run by different OEMs, at different memory speeds, and only on processors between 2.1 and 2.6 GHz with 32 or 36 cores per node.

Quote:
Originally Posted by flotus1
Why do you ask? Are you considering buying used hardware for your new cluster?
No, the v3 is in our current workstation; the v4 is being considered (see the other thread) for the new cluster. However, I have been asked to make more detailed estimates of the performance gain for the proposed new setups. But more on that maybe later.
