October 5, 2020, 04:54
Result: 2x AMD EPYC 7542 32-core / Ubuntu 18 / ESI 2006
#321
New Member
Andi
Join Date: Jun 2018
Posts: 13
Rep Power: 8
Thanks to all the contributors here!
Here are my results. Compared to the other Epyc 7542 results in this thread, mine are very similar, as expected. I am pretty happy with this setup.
System: 2x AMD EPYC 7542 32-core, 16x 32GB 3200MHz / Ubuntu 18 / ESI 2006
Result:
PHP Code:
Last edited by meshingpumpkins; October 7, 2020 at 08:48. Reason: correction of data
October 7, 2020, 07:46
#322
Member
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Hi Meshingpumpkins,
I think your last column might need correction; it needs to be the inverse. You divided the runtime by 100, but for it/s you should divide 100 by the runtime. No biggie, though.
Also: since on many platforms throughput is limited by memory bandwidth, and max performance is often reached well before all cores are utilized, wouldn't it be interesting to somehow get power consumption into the results? Not trivial, I know, but if you get 90% of the performance with 60% of the cores, that would be an interesting investigation, no?
Cheers, Kai.
October 7, 2020, 08:55
Result: 2x AMD EPYC 7542 32-core / Ubuntu 18 / ESI 2006 addon
#323
New Member
Andi
Join Date: Jun 2018
Posts: 13
Rep Power: 8
Quote:
About the performance vs. power consumption question: this is interesting. But if you have multiple users on a server, you would share the cores. In my opinion it also depends on the use case. One could say that for parameter studies of a case it would be a better idea to use half the number of cores to increase efficiency.
speedup_1_s6.jpg
October 7, 2020, 10:09
#324
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 533
Rep Power: 20
Quote:
Speedup in the last line means runtime on 1 core divided by runtime on 64 cores. And the original table is right.
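With made-up numbers:
Code:
# speedup = runtime(1 core) / runtime(N cores); values purely illustrative
echo "scale=2; 1000/125" | bc   # 1000 s serial, 125 s on 8 cores -> 8.00x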
October 7, 2020, 10:21
#325
New Member
Andi
Join Date: Jun 2018
Posts: 13
Rep Power: 8
October 7, 2020, 12:02
#326
Member
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
... however, the it/s was an interesting metric! Could you add it back in?
;-)
October 14, 2020, 14:08
Power requirements...
#327
Member
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Hello all,
I recently got an HP DL380p with (only) 2x E5-2630v1 and 16x 4GB 2Rx4 1333 DDR3 (64GB total), and through the excellent iLO I can also monitor power draw (see attached image). I'll do some comparisons of bare-metal Ubuntu 20.04, ESXi 6.7, Win10 WSL, and lastly FreeNAS with an Ubuntu VM (just for kicks). All will use OF7 from .org, installed natively through their Ubuntu repository, so no software optimization at all. To start off, here's Ubuntu 20.04:
Code:
SnappyHexMesh
Cores  Pwr(W)  Time(s)  kWh
 1     147     2447     0.100
 2     158     1557     0.068
 4     207      906     0.052
 6     223      636     0.039
 8     240      522     0.035
12     275      422     0.032

Sim
Cores  Pwr(W)  Time(s)  kWh
 1     162     1252     0.056
 2     184      645     0.033
 4     256      290     0.021
 6     288      211     0.017
 8     320      176     0.016
12     358      149     0.015
My takeaway: running on all 12 cores reduces the cost of the benchmark by 2/3 for SHM, and by nearly 3/4 for the sim, compared to running single-core. Results using VMs on the same hardware will follow over the next few days.
Cheers, Kai.
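PS: in case anyone wonders, the kWh column is simply power times runtime converted from joules:
Code:
# energy per run in kWh = W * s / 3.6e6
echo "scale=3; 147*2447/3600000" | bc   # single-core SHM row -> .099 (~0.100)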
October 15, 2020, 22:48
#328
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
Kai,
What is your idle power?
Will
October 16, 2020, 04:17
#329
Member
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Hi Will,
Idle power hovers between 90 and 100 watts.
Cheers, Kai.
October 16, 2020, 10:34
#330
Member
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Now with ESXi 6.5, same hardware as above (DL380p, 2x E5-2630v1, 16x 2Rx4, 1333MHz); the VM is identical to the bare-metal setup above.
Code:
SnappyHexMesh
Cores  Pwr(W)  Time(s)  kWh
 1     158     2522     0.110
 2     166     1635     0.075
 4     209      936     0.054
 6     230      646     0.041
 8     239      535     0.036
12     273      430     0.033

Sim
Cores  Pwr(W)  Time(s)  kWh
 1     169     1285     0.060
 2     189      670     0.035
 4     257      302     0.022
 6     288      217     0.017
 8     317      182     0.016
12     357      154     0.015
Kai.
October 16, 2020, 17:46
#331
Member
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Now with bhyve, under FreeNAS 11.3:
Code:
SnappyHexMesh
Cores  Pwr(W)  Time(s)  kWh
 1     160     2728     0.120
 2     180     1719     0.086
 4     207     1028     0.059
 6     232      717     0.046
 8     249      604     0.042
12     262      924     0.067

Sim
Cores  Pwr(W)  Time(s)  kWh
 1     179     1617     0.080
 2     210      756     0.044
 4     245      427     0.029
 6     266      339     0.025
 8     285      317     0.025
12     280      556     0.043
If anyone has any information on running OpenFOAM directly under FreeNAS, please let me know - it would be very interesting for me, rather than having to revert to running ESXi with FreeNAS in one VM and OpenFOAM in another.
Any help much appreciated.
Kai.
November 16, 2020, 16:46
#332
New Member
M Shaaban
Join Date: Jun 2019
Posts: 11
Rep Power: 7
For OpenFOAM 8 (Foundation), users will have to:
1. Comment out the function objects (streamlines and "wallBoundedStreamLines") in the controlDict.
2. Change the include directive in the meshQualityDict to #includeEtc "caseDicts/mesh/generation/meshQualityDict".
3. Copy the surfaceFeaturesDict from the tutorial case, and change the surfaceFeatureExtract application in line 9 of the Allmesh script in the base case to "runApplication surfaceFeatures" (see the sketch below).
Then it works. Let's see how my server stands out.
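A rough shell sketch of step 3, run from the case directory (the $FOAM_TUTORIALS path is an assumption; adjust to your install):
Code:
# copy the new-style dict from the motorBike tutorial (path assumed)
cp "$FOAM_TUTORIALS/incompressible/simpleFoam/motorBike/system/surfaceFeaturesDict" system/
# swap the renamed utility into the Allmesh script (line 9 in the base case)
sed -i 's/surfaceFeatureExtract/surfaceFeatures/g' Allmesh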
November 17, 2020, 13:16
#333
New Member
M Shaaban
Join Date: Jun 2019
Posts: 11
Rep Power: 7
4x Intel(R) Xeon(R) CPU E5-4657L v2 @ 2.40GHz
128 GB DDR3 1600 MHz, OpenFOAM 8, Ubuntu 20.
Code:
# cores   Wall time (s)
------------------------
48         77.45
44         77.66
40         77.43
36         77.34
32         77.59
28         78.45
24         79.93
16         89.90
 8        133.07
 4        245.40
 2        652.24
 1         27.39

Meshing:
48   real  4m19.655s
44   real  3m43.624s
40   real  3m54.778s
36   real  3m51.182s
32   real  3m48.851s
28   real  3m54.084s
24   real  4m19.289s
16   real  5m46.104s
 8   real  7m19.078s
 4   real  12m8.124s
 2   real  23m45.691s
 1   real  0m3.501s
Hitting some ceiling there. I verified that I have 32 GB per NUMA node. Any ideas for checking the reason for the bottleneck beyond 24 cores?
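(For reference, the per-NUMA-node memory figure can be checked along these lines, assuming the numactl package is installed:)
Code:
# list NUMA nodes with their CPUs and total/free memory per node
numactl --hardware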
November 17, 2020, 14:43
#334
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
How is the memory populated? 16x 16GB?
Code:
# dmidecode -t 17
in case you need to find out. htop provides a quick and easy way to check which cores are utilized.
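To cut the output down to the interesting fields (field names as in standard dmidecode output):
Code:
# one block per DIMM slot; empty slots report "No Module Installed"
dmidecode -t 17 | grep -E "Size|Speed|Locator"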
November 18, 2020, 09:09
#335
New Member
M Shaaban
Join Date: Jun 2019
Posts: 11
Rep Power: 7
Quote:
Thanks for your reply, Flotus1. There are 8x 16 GB 1600 MHz DIMMs, attached at banks 0, 1, 12, 13, 24, 25, 36, 37. I guess I will need to get more RAM.
November 18, 2020, 09:12
#336
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Yeah, my math didn't check out. I meant 16x 8GB.
Anyway, you would need 16 identical DIMMs to get peak performance with this system. The scaling behavior you got is pretty typical for not having all memory channels populated.
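For a rough sense of the numbers, assuming each of the four sockets has four DDR3 channels:
Code:
# theoretical peak bandwidth = channels * MT/s * 8 bytes per transfer
echo "16 * 1600 * 8 / 1000" | bc   # all 16 channels: ~204 GB/s
echo " 8 * 1600 * 8 / 1000" | bc   # 8 DIMMs on 8 channels: ~102 GB/s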
November 19, 2020, 16:04
#337
New Member
Roman G.
Join Date: Apr 2017
Posts: 16
Rep Power: 9
We just bought a new workstation for our department. Thanks to this thread we were able to find a good configuration.
The following setup was done: OpenFOAM was compiled with the flag "-march=znver1". SMT was switched off, and all processors were set to performance mode using "cpupower frequency-set -g performance" from the HPC Tuning Guide provided by AMD (http://developer.amd.com/wp-content/resources/56420.pdf).
CPU: 2x AMD EPYC 7532 (Zen2-Rome) 32-Core CPU, 200W, 2.4GHz, 256MB L3 Cache, DDR4-3200
RAM: 256GB (16x 16GB) DDR4-3200 DIMM, REG, ECC, 2R
OpenFOAM v7
Code:
cores  time (s)  speedup
 1     677.34     1.00
 2     363.04     1.87
 4     161.42     4.20
 6     101.82     6.65
 8      77.16     8.78
12      52.28    12.96
16      39.40    17.19
20      32.01    21.16
24      27.31    24.80
28      24.15    28.05
32      21.53    31.46
36      21.32    31.77
40      20.46    33.11
44      18.99    35.67
48      18.12    37.38
52      17.45    38.82
56      17.06    39.70
60      16.50    41.05
64      15.91    42.57
Up to 32 cores the scaling is nearly perfect; afterwards it starts to drop... Is it just caused by the bandwidth, or can there be other things causing this drop?
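PS: for anyone repeating this, the runtime tuning amounts to roughly the following (the sysfs SMT switch is an alternative to disabling SMT in the BIOS and needs a recent kernel):
Code:
# set all cores to the performance governor (per the AMD HPC tuning guide)
cpupower frequency-set -g performance
# disable SMT without a reboot (root required)
echo off > /sys/devices/system/cpu/smt/control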
November 19, 2020, 17:12
#338
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Any particular reason for the use of znver1 instead of znver2?
Bandwidth will be part of the reason why scaling tapers off. Lower CPU frequency with more busy cores might be another contribution. But overall, performance looks pretty impressive.
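One way to check the frequency contribution would be to watch effective clocks during runs with different core counts, e.g.:
Code:
# average effective clock per core while the solver is running (root required)
cpupower monitor -m Mperf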
November 19, 2020, 21:29
#339
New Member
M Shaaban
Join Date: Jun 2019
Posts: 11
Rep Power: 7
Quote:
Code:
# cores   Wall time (s)
------------------------
48         45.04
44         45.62
40         46.08
36         47.52
32         49.00
28         52.01
24         56.36
16         73.13
 8        127.29
 4        239.67
 2        602.69
So the added RAM made it faster and more scalable. Results are similar to other Xeon processors. Any recommendations or hints for best practices if I run several simulations on the same machine?
November 20, 2020, 03:14
#340
New Member
Roman G.
Join Date: Apr 2017
Posts: 16
Rep Power: 9
Quote:
Oops, sorry, actually we did compile it using znver2.