July 12, 2018, 07:53 |
|
#81 |
Member
Join Date: Jan 2014
Posts: 32
Rep Power: 12 |
It's pretty amazing that 4x EPYC 7351s are roughly equivalent to (maybe slightly faster than) my whole cluster, which has 10x 10-core processors and almost perfect node scaling thanks to InfiniBand.
Memory bandwidth FTW
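For anyone who wants to put a number on the bandwidth argument, a rough per-node check with the STREAM benchmark would look something like this (not from the thread; stream.c comes from the STREAM homepage, and the array size and thread count are placeholders to adapt to the machine):
Code:
# stream.c from https://www.cs.virginia.edu/stream/FTP/Code/stream.c
# Array size is a placeholder; make it much larger than the combined L3 caches.
gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=80000000 stream.c -o stream
OMP_NUM_THREADS=20 ./stream    # the Triad rate is the figure usually quoted for CFD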
|
July 12, 2018, 11:48 |
|
#82 | |
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17 |
Quote:
I just switched off SMT, nothing more on the optimization side. I only ran this series of tests; the numbers are not an average.
July 12, 2018, 12:02 |
|
#83 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
A few things come to mind, since the dual Epyc 7601 running on 32 cores was also slower than my dual Epyc 7301 setup.
I suspect that internal memory organization plays an important role here. Mine used dual-rank modules, which I highly recommend; I can only speculate that the 7601 setup used single-rank. And of course, different Linux kernels were used. From my experience, older versions can severely hurt performance for Epyc CPUs, and in general for CPUs released after the kernel. When benchmarking (or running heavy jobs), I also make sure the system is as idle as possible, caches are cleared, and turbo modes are used to the full extent.
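For the idle/turbo part, a quick sanity check from the shell looks roughly like this (standard sysfs paths and the cpupower utility, which may need the linux-tools package; adjust to your distribution, this is not taken from the thread itself):
Code:
# Active frequency governor on every core ("performance" is what you want for benchmarks)
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
# Set it and confirm that turbo/boost is available (cpupower is in linux-tools)
sudo cpupower frequency-set -g performance
cpupower frequency-info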
|
July 13, 2018, 05:28 |
|
#84 | |
Member
Join Date: Jan 2014
Posts: 32
Rep Power: 12 |
Agreed. I saw significant performance gains from dual rank vs single rank DDR3.
|
July 13, 2018, 05:35 |
|
#85 | ||
Member
Join Date: Jan 2014
Posts: 32
Rep Power: 12 |
I'm curious now, haha |
July 13, 2018, 06:10 |
|
#86 |
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17 |
It should be, as I specifically asked for it, but I did not check. Is there a terminal command to check that? I'd rather not open the case now.
|
|
July 13, 2018, 06:14 |
|
#87 | ||
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Before running a time-critical simulation, I do
Code:
# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
Many people will try to tell you that clearing caches beforehand is not necessary because the system will do a sufficient job of organizing memory. I found that this is not always the case.
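As a usage sketch, the same sequence can simply be put in front of the solver launch (the mpirun line below is only a placeholder for whatever case is being run):
Code:
# Show memory, flush dirty pages, drop the page/dentry/inode caches, show memory again
free && sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null && free
# ...then launch the job; the command line below is only a placeholder
# mpirun -np 16 simpleFoam -parallel > log.simpleFoam 2>&1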
July 13, 2018, 06:37 |
|
#88 |
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17 |
dmidecode -t memory says dual rank
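For reference, the relevant fields can be filtered directly (needs root; the field names are the standard dmidecode ones):
Code:
sudo dmidecode -t memory | grep -E 'Size|Speed|Rank|Part Number'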
|
|
July 13, 2018, 06:38 |
|
#89 |
Member
Join Date: Jan 2014
Posts: 32
Rep Power: 12 |
July 13, 2018, 06:39 |
|
#90 |
Member
Join Date: Jan 2014
Posts: 32
Rep Power: 12 |
July 13, 2018, 06:40 |
|
#91 |
Member
Join Date: Jan 2014
Posts: 32
Rep Power: 12 |
July 25, 2018, 04:47 |
|
#92 |
New Member
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 8 |
2x IBM POWER9 Sforza 22 core CPUs [1], 8x16GB 2Rx4 DDR4-2400 registered ECC, OpenFOAM 5.x (GitHub version), Ubuntu 18.04, kernel 4.18-rc1
Code:
# Cores   Wall time [s]
------------------------------
  1       677.38
  2       366.04
  4       180.1
  6       124.17
  8        96.64
 12        70.16
 16        56.39
 20        47.47
 24        41.76
 44        36.71
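As an aside (not part of the posted results), the scaling is easier to judge as speedup and parallel efficiency relative to the single-core time; a small shell/awk sketch using a subset of the rows above:
Code:
awk 'BEGIN { t1 = 677.38 }
     { printf "%2d cores: speedup %5.2f, efficiency %5.1f %%\n", $1, t1/$2, 100 * t1 / ($2 * $1) }' <<'EOF'
1 677.38
2 366.04
4 180.1
8 96.64
16 56.39
44 36.71
EOF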
|
July 25, 2018, 04:55 |
|
#93 |
Member
Join Date: Jan 2014
Posts: 32
Rep Power: 12 |
July 25, 2018, 05:06 |
|
#94 |
New Member
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 8 |
Fully understood; that's a high-end professional workstation we benchmarked.
In general, POWER9 pricing isn't that bad compared to Intel / EPYC. While the 22-core CPUs are the top-end, rather expensive parts, take a look at the 18-core devices (basically one step down from the premium 22-core CPUs) for best value. Performance will be pretty close to the full 22-core results in practice, since the 18-core can boost to higher clocks before it hits thermal limits.
|
July 25, 2018, 05:35 |
|
#95 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
That is pretty impressive for only 8 memory channels. Does it have some kind of L4 cache?
|
|
July 25, 2018, 06:04 |
|
#96 |
New Member
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 8 |
No, but each module has over 100MB of L3 cache; this system had 220MB L3 in total (5MB/core, 44 cores). POWER is also traditionally very strong on I/O of all sorts including to and from DRAM.
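For anyone checking their own machine, the cache hierarchy can be read without extra tools; output formats differ a bit between lscpu versions, so this is only a rough sketch:
Code:
# Summary view
lscpu | grep -i cache
# Per-level detail for one core (filename:value pairs)
grep . /sys/devices/system/cpu/cpu0/cache/index*/size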
|
|
July 25, 2018, 17:46 |
|
#97 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
Doesn't POWER9 have 8 memory channels per socket? If so you're only using half of the memory channels.
|
|
July 25, 2018, 18:01 |
|
#98 | |
New Member
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 8 |
For an example of just how large LaGrange/Monza packages are, check out the picture here: https://www.tomshardware.com/news/ib...ers,36054.html That's what's needed for 8 memory channels to be exposed alongside all the PCIe lanes, etc. Sforza's roughly 1/2 the size on each side.
Last edited by tpearson-raptor; July 25, 2018 at 21:39.
July 26, 2018, 03:49 |
|
#99 |
New Member
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 8 |
After a bit of tuning....
2x IBM POWER9 Sforza 22 core CPUs, 8x16GB 2Rx4 DDR4-2400 registered ECC, OpenFOAM 5.x (GitHub version), Ubuntu 18.04, kernel 4.18-rc1, OpenFOAM modified to build with -mcpu=power9 instead of -mcpu=power8:
Code:
# Cores   Wall time [s]
------------------------------
  1       659.81
  2       355.5
  4       176.6
  6       121.2
  8        94.65
 12        68.4
 16        55.63
 20        46.81
 24        41.51
 44        36.3
Last edited by tpearson-raptor; July 27, 2018 at 06:43.
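A hedged sketch of where such a flag change is typically made in an OpenFOAM 5.x tree: the compiler options live in the wmake rules, so something along these lines should locate them (the exact rules directory name, e.g. linuxPPC64Gcc, is an assumption and depends on the platform wmake selects):
Code:
# Find where an -mcpu flag is currently set in the build rules
grep -rn "mcpu" $WM_PROJECT_DIR/wmake/rules/
# Then edit the cOpt / c++Opt files of the active rules directory so the
# optimisation line reads something like (assumption, adjust to your tree):
#   c++OPT = -O3 -mcpu=power9 -mtune=power9
# and rebuild with ./Allwmake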
|
July 26, 2018, 05:09 |
|
#100 | |
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 540
Rep Power: 20 |
Just some questions:
Jörn |
||