32 Cores Mini-Cluster, 2 Nodes vs. 4 Nodes performance |
July 7, 2016, 05:21 |
|
#1 |
New Member
Aurélien
Join Date: Jun 2014
Posts: 3
Rep Power: 12 |
Hello everyone.
· I’ve been working for some time on defining a 32-core mini-cluster configuration, mainly to run CFX.
· Most of the components have been chosen, but one very important question remains about the number of compute nodes: what is the performance difference between a 2-node configuration with 2×8 cores per node and a 4-node configuration with 2×4 cores per node?
· The CPUs would be Xeon E5-2600 v4: the E5-2667 v4 for the 2-node configuration and the E5-2637 v4 for the 4-node configuration.
· I remember reading a thread where Glenn Horrocks said that the cfp2006 benchmark is a good indicator of final CFX/Fluent performance, so I made a comparison based on cfp2006 benchmark results (see attachment).
· The conclusion is that the 4-node configuration would be 70% faster than the 2-node configuration, which seems unbelievable.
· Base frequency is only 10% higher for the 4-core CPU, but memory bandwidth per core is also 50% higher. Is that enough to explain such a difference?
· I assumed perfect scalability because we’ll be using InfiniBand, but even if inter-node scalability is 90%, the gap would still be huge.
· The only drawback of the 4-node configuration is the total CPU power dissipation, which is double that of the 2-node configuration.
Does anyone have advice or comments on this? Did I miss something in my logic?
Thanks for reading.
Aurelien
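As a rough sanity check on the scaling argument in this post, here is a minimal Python sketch. It takes the 70% figure from the cfp2006 comparison as the ideal ratio and scales it by an assumed inter-node parallel efficiency. The function name `relative_speedup` and the efficiency values tried are illustrative assumptions, not measurements, and the model ignores everything except a single efficiency factor per configuration.

```python
def relative_speedup(ideal_ratio, eff_4node=1.0, eff_2node=1.0):
    """Throughput of the 4-node setup relative to the 2-node setup.

    ideal_ratio: ratio under perfect scaling (1.70 from the cfp2006 comparison).
    eff_4node, eff_2node: assumed inter-node parallel efficiencies (hypothetical).
    """
    if eff_2node <= 0:
        raise ValueError("eff_2node must be positive")
    return ideal_ratio * eff_4node / eff_2node


if __name__ == "__main__":
    ideal = 1.70  # 4 nodes estimated 70% faster with perfect scaling
    # Efficiencies below are assumptions chosen only to show the sensitivity.
    for eff in (1.0, 0.9, 0.8):
        gain = relative_speedup(ideal, eff_4node=eff) - 1.0
        print(f"4-node efficiency {eff:.0%}: {gain:+.0%} vs. 2 nodes")
```

This treats the 2-node setup as scaling perfectly, which is the pessimistic case for the 4-node setup. If both configurations lose some efficiency, the gap narrows less.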
|
July 7, 2016, 08:47 |
|
#2 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
If you can afford it, my advice would be a 4-node setup.
The reason why it is faster is, as always, memory bandwidth. Four dual-socket nodes deliver twice the memory bandwidth of two nodes. This will translate into a huge performance increase in CFX. In addition, the Xeon E5-2637 v4 is the processor with the highest amount of L3 cache per core, which also helps.
If money is not an issue at all, you might even consider installing the E5-2667 v4 in all 4 nodes. If you use only 8 cores per node, it has even more L3 cache per (used) core and about the same clock speed.
Keep in mind that the base frequency is not a good indicator of the actual clock speed under load. These processors usually run at higher frequencies even when all cores are in use. In fact, the all-core turbo frequency is 3.6 GHz for the E5-2637 v4 and 3.5 GHz for the E5-2667 v4.
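To put rough numbers on the bandwidth and cache argument, here is a small Python sketch. It assumes four DDR4-2400 memory channels per socket and L3 sizes of 15 MB (E5-2637 v4) and 25 MB (E5-2667 v4). These values are not stated in this thread and should be checked against the vendor spec sheets. The figures are theoretical peaks, not measured bandwidth, and the helper names are made up for illustration.

```python
# Assumptions (not from the thread): 4 memory channels per socket,
# DDR4-2400, 64-bit (8-byte) channels. Peak values only.
CHANNELS_PER_SOCKET = 4
TRANSFERS_PER_SEC_M = 2400   # mega-transfers per second
BYTES_PER_TRANSFER = 8


def socket_bandwidth_gbs():
    """Theoretical peak memory bandwidth of one socket in GB/s."""
    return CHANNELS_PER_SOCKET * TRANSFERS_PER_SEC_M * BYTES_PER_TRANSFER / 1000


def cluster_summary(nodes, cores_per_socket, l3_mb, sockets_per_node=2):
    """Aggregate core count, peak bandwidth, and L3 per core for a cluster."""
    sockets = nodes * sockets_per_node
    cores = sockets * cores_per_socket
    total_bw = sockets * socket_bandwidth_gbs()
    return {
        "cores": cores,
        "total_bw_gbs": total_bw,
        "bw_per_core_gbs": total_bw / cores,
        "l3_per_core_mb": l3_mb / cores_per_socket,
    }


if __name__ == "__main__":
    configs = {
        "2 nodes, E5-2667 v4": cluster_summary(2, cores_per_socket=8, l3_mb=25),
        "4 nodes, E5-2637 v4": cluster_summary(4, cores_per_socket=4, l3_mb=15),
    }
    for name, info in configs.items():
        print(name, info)
```

Under these assumptions both setups have 32 cores, but the 4-node setup has twice as many sockets and therefore twice the aggregate peak bandwidth.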
|
July 7, 2016, 11:39 |
|
#3 |
New Member
Aurélien
Join Date: Jun 2014
Posts: 3
Rep Power: 12 |
Thanks Alex.
Sadly, money is an issue, so the 4-node setup with the E5-2637 v4 will be fine. The E5-2637 v4 4-node setup is only 10% more expensive than the E5-2667 v4 2-node setup, so it is a worthwhile investment.
Do you really believe it will be 70% faster, or should I expect something between 20% and 50% for some reason? (Even 20% would still be great.)
Best Regards
|
July 7, 2016, 12:37 |
|
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
It is hard to say exactly how much faster it will be; this also depends on the type of cases you run. But on average, the difference in performance will definitely be more than just 20%. Somewhere around 50% should be a conservative estimate.