November 20, 2020, 05:58 |
|
#341 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Quote:
Limit and pin each simulation to a separate NUMA node to avoid the higher latency. And watch out for total memory usage. Each subset of simulations run on a NUMA node should not exceed memory available on this node. |
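One way to follow this advice, as a rough Python sketch: place each case on the NUMA node with the most free memory, then build a `numactl` command that binds both CPU and memory to that node. The memory figures, the two-node layout, the case directory names and the helper names are illustrative assumptions; `--cpunodebind` and `--membind` are standard `numactl` options.

```python
import shlex

def numactl_command(node, solver_cmd):
    """Return an argv list that binds CPU and memory of solver_cmd to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *solver_cmd]

def assign_to_nodes(case_mem_gb, node_mem_gb):
    """Greedy placement: each case goes to the node with the most free memory.

    Raises ValueError if a case does not fit on any node, so that no node
    is ever asked for more memory than it has.
    """
    free = list(node_mem_gb)
    placement = []
    for mem in case_mem_gb:
        node = max(range(len(free)), key=lambda n: free[n])
        if mem > free[node]:
            raise ValueError(f"case needing {mem} GB does not fit on any node")
        free[node] -= mem
        placement.append(node)
    return placement

# Hypothetical example: four cases on a machine with two 64 GB NUMA nodes.
cases_gb = [20, 30, 25, 10]
for i, node in enumerate(assign_to_nodes(cases_gb, [64, 64])):
    cmd = numactl_command(node, ["simpleFoam", "-case", f"case{i}"])
    print(shlex.join(cmd))
```

The actual node count and per-node memory of a machine can be read from `numactl --hardware` before filling in the numbers.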
||
November 22, 2020, 14:14 |
|
#342 |
New Member
Jon
Join Date: Mar 2013
Posts: 15
Rep Power: 13 |
Dear all,
CPU: AMD Epyc 7F52 (ES)
RAM: 8x16 GB DDR4-2666 2R
Storage: Samsung 970 Evo Plus, 1 TB
OpenFOAM v8, compiled with znver2, SMT off, CPUs set to performance.
Results:
# cores   Wall time (s):
------------------------
1    593.62
2    315.34
4
6    98.33
8    78.05
12
16   43.66
For some reason I did not get results for 4 and 12 cores. Cheers!
Last edited by CHUIKOV; November 24, 2020 at 07:54. |
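For readers who want to compare scaling between posts, the wall times above can be turned into speedup and parallel efficiency. This is a minimal sketch; the function name is made up, and the 4- and 12-core runs are simply left out because no times were reported for them.

```python
# Wall times from the 7F52 post above (cores -> seconds).
# The 4- and 12-core runs were not reported, so they are omitted.
wall_time = {1: 593.62, 2: 315.34, 6: 98.33, 8: 78.05, 16: 43.66}

def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core run."""
    t1 = times[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in sorted(times.items())}

for n, (speedup, efficiency) in scaling(wall_time).items():
    print(f"{n:2d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

With these numbers the 16-core run comes out at a speedup of roughly 13.6, i.e. about 85% parallel efficiency.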
|
November 28, 2020, 09:21 |
Up-to-date Benchmark Results Compilation
|
#343 |
New Member
Andrew
Join Date: Apr 2012
Posts: 15
Rep Power: 14 |
Hello, fellow CFD users)
The benchmark data compiled previously by the topic starter at https://openfoamwiki.net/index.php/Benchmarks was last updated in late 2018, so it is currently missing all the modern configurations. Reading through all the posts is quite a lot of work, so a short summary can save a lot of time for the occasional visitor searching for guidance on an optimal config. I therefore took the concept from blackcatxiii and trimmed his list down to systems that are commercially available brand-new as single- and dual-CPU machines in the workstation and server range. If you're interested in a broader range of systems, you can still check them there. I also added a diagram that compares performance at the maximum number of available cores relative to the current bang-for-$ champion, the 2x EPYC 7542 system. Please keep in mind that performance can be influenced by a lot of things; major factors are at least memory speed, OS, NUMA configuration, OF version and compiler used. These obviously differ between the machines that forum members used to run the benchmark. You can check the actual settings by following the link to the respective original message in the first column of the table. |
|
December 23, 2020, 18:47 |
|
#344 | ||
Senior Member
Join Date: Apr 2020
Location: UK
Posts: 737
Rep Power: 14 |
Dear speedsters,
here are my results from the motorbike testcase. System: Quote:
Quote:
I ran the cases out of the box, and I am really happy with the results, but - are there any tweaks I can do to improve the inter-CPU comms? Many thanks. |
|||
December 23, 2020, 19:51 |
|
#345 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
You could verify what is happening by running the same benchmark with 1, 2 and 3 threads, pinned to a single CCD. For example on cores 0, 1 and 2.
My guess is that it's just memory bandwidth. With 16 threads, each CCD gets 2 threads assigned to its cores. With 17 threads, one of the CCDs has to get 3 threads, which limits overall execution speed due to a lack of memory bandwidth. In that test with 1-3 threads, you should see exactly that: linear scaling from 1 to 2 threads, and a deviation from that line with 3 threads. That would rule out communication between CCDs or sockets as the cause. |
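The counting behind this argument can be sketched in a few lines. The sketch assumes 8 CCDs (inferred from "16 cores, each CCD gets 2 threads") and an idealized scheduler that spreads threads as evenly as possible; the function name is made up for illustration.

```python
def threads_per_ccd(n_threads, n_ccds=8):
    """Spread n_threads as evenly as possible over n_ccds (assumed layout)."""
    base, extra = divmod(n_threads, n_ccds)
    return [base + 1 if i < extra else base for i in range(n_ccds)]

for n in (16, 17):
    load = threads_per_ccd(n)
    print(f"{n} threads -> per-CCD load {load}, busiest CCD runs {max(load)}")
```

Since all ranks have to wait for the slowest one at each synchronization point, the single CCD with 3 threads would set the pace for the whole run, if the bandwidth explanation is right.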
|
December 29, 2020, 22:01 |
Strange...
|
#346 | ||
Member
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6 |
Hi all,
I'm comparing the performance of all quad-capable v2 Xeons that were posted here (filling my shopping list for a low-budget 4-socket workstation), and came across @wkernkamp's listing in #199 of 2x E5-4627 v2, 8 cores @ 3.3 GHz, whose results seem low (well, not low, but lower than similarly configured quad machines): Quote:
Quote:
Core question (excuse the pun): all other things being equal, would the E5-4627 v2 in quad config be expected to beat the others (4650/4657L v2 etc.) due to its higher clock speed, irrespective of its lower core count? Many thanks in advance, Kai. |
|||
December 30, 2020, 06:53 |
|
#347 | |
Senior Member
Join Date: Apr 2020
Location: UK
Posts: 737
Rep Power: 14 |
Quote:
Run time on 1 core is 20% faster, as expected from the higher clock speed, but interestingly the speed gain quickly falls off (10% faster on 6 cores, negligible difference on 16 cores). The good news, Chuikov, is that if you decide to throw in a second CPU, then the evidence from my time trial is that the run times continue to scale pretty well at higher core counts. |
||
January 6, 2021, 03:20 |
|
#348 | |
New Member
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 19
Rep Power: 6 |
Quote:
Anxious to see how it runs with more memory. Similar to mine, but with v2 procs. |
||
January 6, 2021, 10:52 |
|
#349 |
Senior Member
Join Date: May 2012
Posts: 551
Rep Power: 16 |
2 x Intel Xeon 2673v3, DDR4 2R 2133 MT/s. Debian 10. OpenFOAM 7. HT enabled.
Code:
# cores   Wall time (s):
------------------------
1    889.57
2    488.47
4    228.37
8    128.43
16   85.43
24   76.29 |
|
January 7, 2021, 10:29 |
9800x
|
#350 |
New Member
Francisco
Join Date: Sep 2018
Location: Portugal
Posts: 27
Rep Power: 8 |
An i7-9800x @stock running OF 7 on Ubuntu 20 with hyperthreading enabled and 4x8 GB 3600-cl16 single rank memory (XMP).
Code:
# cores   Wall time (s):
------------------------
1    776.35
2    388.64
4    208.41
6    162.96
8    149.45
An interesting alternative to 1st gen Threadrippers, I think. |
|
January 27, 2021, 23:06 |
|
#352 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
Quote:
|
||
February 1, 2021, 22:32 |
|
#353 |
Member
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6 |
OK, that's what I thought. I have ordered my 4x 4627 v2s and am eagerly awaiting delivery to do more benchmarking. Will report as soon as they arrive; the rest of the machine is already here and waiting (DL560 G8, 16 sticks of 1866 2R 16 GB).
|
|
February 7, 2021, 17:19 |
|
#354 |
Senior Member
julien
Join Date: Dec 2018
Posts: 107
Rep Power: 7 |
Hello
2 x Intel E5-2678 v3 @ 2.50 GHz
SZMZ X99 Z8 motherboard from Aliexpress, default settings
128 GB (8x16) DDR4 2133 MHz
OpenFOAM 8, Ubuntu 20.04
I used the motorbike tutorial. The benchmark script does not work for me, even though I followed the recommendations in this topic for newer OF versions. So I am reporting the clock time of simpleFoam, run for this tutorial with the following numbers of cores:
# cores   Clock time (s):
------------------------
1    622
2    339
4    161
6    118
10   82
12   70
18   56
24   57
Hyper-threading enabled. |
|
February 10, 2021, 14:32 |
7f52
|
#355 | ||
Member
Erik Andresen
Join Date: Feb 2016
Location: Denmark
Posts: 35
Rep Power: 10 |
A 7F52 is a nice CPU for CFD. Epycs with 256 MB of cache have a higher memory bandwidth than Epycs with 128 MB (or, much worse, 64 MB of cache), but they should be used with 8 x DDR4 3200 2R.
Quote:
Quote:
|
|||
February 10, 2021, 15:08 |
|
#356 |
New Member
Rob
Join Date: Apr 2018
Posts: 18
Rep Power: 8 |
edit - this post was intended to respond to a post much further back in the thread. Apologies:
I agree with the previous statement. Intel V2 era hardware with DDR3 RAM is massive value for CFD use. I am currently running a stack of these. I periodically check what newer/better hardware might cost and it so far has not shown enough performance to warrant the cost of upgrading. By that I mean it's generally ~double the cost for ~10-20% gain. These are extrapolated numbers based on some benchmarks I've done on newer borrowed systems and it's very likely that a game-changing solution will arrive in time but V4/DDR4 hardware doesn't appear to be that. |
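The value argument can be written out as a quick calculation. This is only a sketch of the arithmetic behind the rough figures quoted above (about double the cost for about 10-20% gain); the function name is made up, and the current system is taken as the 1.0 reference.

```python
def relative_value(perf_gain, cost_factor):
    """Performance per unit cost of an upgrade, relative to the current system (= 1.0)."""
    return (1.0 + perf_gain) / cost_factor

# Rough figures from the post: ~2x cost for ~10-20% more performance.
for gain in (0.10, 0.20):
    value = relative_value(gain, 2.0)
    print(f"+{gain:.0%} performance at 2x cost -> {value:.2f}x performance per cost")
```

On those figures an upgrade would deliver only about 0.55 to 0.60 of the performance per unit cost of the existing hardware.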
|
February 11, 2021, 17:47 |
Ryzen 5600x benchmark
|
#357 |
New Member
George
Join Date: Jul 2020
Location: TU Delft, The Netherlands
Posts: 18
Rep Power: 6 |
Fellow foamers,
I hereby submit the results from my system, which comprises the following hardware:
CPU: AMD Ryzen 5600x
RAM: GSkill 2x8 GB (f4-3600c16d-16gtzn)
Mobo: Asus TUF X570 gaming plus (running the latest updates)
PSU: Corsair 850 W
I am currently running OpenFOAM 1912, compiled from source, on Ubuntu 20.04 with an updated 5.10 kernel. I used DRAM Calculator on the A0/B0 Fast settings, and that was as far as I could get with a stable system given my knowledge and available time. These are my best results. I also attach a spreadsheet with all my benchmarks and the Ryzen benchmarks I could find in this thread. These show clearly how much the overall performance depends on RAM performance. Nothing new, I know, but still I think it is nice.
cores   Time     Scale
1       431.36   1.00
2       263.03   1.64
4       198.1    2.18
6       184.35   2.34
To be honest, I think I can do better with the timings and improve the overall results and the scaling. I would expect to be able to get close to 3x. The lower latency (c14) model of the same RAM also looks like it would improve the situation a lot. Still, even the above are great results for an all-rounder system that costs around 1000 euros (depending on where you live and on availability). That said, compared to the other Ryzen systems the scaling is worse; other systems reach more than 3x. My guess is that because the CPU is faster, the RAM creates an even stronger bottleneck in the computations. Maybe a more knowledgeable person can comment on that.
Additionally, I ran a bunch of benchmarks which you might find interesting.
Phoronix Test Suite v10.2.1:
Memory Copy - Array Size: 4096 MiB: Average: 23256.531 MiB/s
Memory Copy, Fixed Block Size - Array Size: 4096 MiB: Average: 11724.460 MiB/s
Memory Bandwidth (mbw):
Method: MEMCPY Elapsed: 0.37036 MiB: 5000.00000 Copy: 13500.378 MiB/s
Method: DUMB Elapsed: 0.21765 MiB: 5000.00000 Copy: 22973.085 MiB/s
Method: MCBLOCK Elapsed: 0.29842 MiB: 5000.00000 Copy: 16754.909 MiB/s
y-cruncher with 2,500,000,000 decimal digits on 6 cores (this one calculates pi to a specified number of digits and is a personal favourite):
Total Computation Time: 190.771 seconds ( 3.180 minutes )
Start-to-End Wall Time: 202.055 seconds ( 3.368 minutes )
CPU Utilization: 698.81 % + 0.43 % kernel overhead
Multi-core Efficiency: 58.23 % + 0.04 % kernel overhead
Blender Benchmark:
bmw27: 3m43s
classroom: 10m6s
fishy_cat: 4m45s
koro: 5m29s
pavillon_barcelona: 10m30s
victor: 16m25s
Throughout the above benchmarks the processor did not exceed 81 degrees with the stock cooler and adequate case cooling with 6 fans. No thermal throttling was observed. I hope the above will be helpful to someone. Cheers
Last edited by gpouliasis; February 13, 2021 at 14:36. |
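The scaling figures in this post can be reproduced from the reported times, and looking at the gain from each step in core count makes the saturation easier to see. A small sketch, with helper names chosen for illustration:

```python
# Times from the Ryzen 5600x post above (cores -> seconds).
times = {1: 431.36, 2: 263.03, 4: 198.1, 6: 184.35}

def scale(times):
    """Speedup relative to the 1-core run, rounded like the 'Scale' column."""
    t1 = times[1]
    return {n: round(t1 / t, 2) for n, t in times.items()}

def marginal_gain(times):
    """Fractional run-time reduction for each step up in core count."""
    cores = sorted(times)
    return {(a, b): 1.0 - times[b] / times[a] for a, b in zip(cores, cores[1:])}

print(scale(times))
for (a, b), gain in marginal_gain(times).items():
    print(f"{a} -> {b} cores: run time drops by {gain:.1%}")
```

Going from 4 to 6 cores shortens the run by only about 7%, which is consistent with the memory bottleneck suspected in the post.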
|
February 12, 2021, 04:01 |
|
#358 |
Senior Member
Join Date: May 2012
Posts: 551
Rep Power: 16 |
I think you can gain some performance with better memory settings. If you are lucky you might also be able to increase the Infinity Fabric clock beyond 1800 MHz.
Anyways, I would not purchase more expensive RAM in your shoes. It is one of the fastest results we have seen for single threaded performance which will benefit a number of cases. Nice value! |
|
February 12, 2021, 06:34 |
|
#359 |
Senior Member
Join Date: Apr 2020
Location: UK
Posts: 737
Rep Power: 14 |
Agreed - keep in mind that the Ryzen 5600x only has 2 memory channels, and so despite the 6cores/12threads I assume that it will still bottleneck on access to the RAM, whatever the RAM performance.
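To put rough numbers on the channel argument: theoretical peak DRAM bandwidth is channels x 8 bytes per transfer x transfer rate. The sketch below compares the 5600x with DDR4-3600 (from the post above) against a 16-core 7F52 with 8 channels of DDR4-3200 as recommended earlier in the thread. These are theoretical peaks under the assumption of a 64-bit data bus per channel; sustained bandwidth in a real run is lower.

```python
def peak_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (64-bit bus per channel assumed)."""
    return channels * bus_bytes * mt_per_s / 1000.0

# (name, memory channels, transfer rate in MT/s, physical cores)
systems = [
    ("Ryzen 5600x, 2 x DDR4-3600", 2, 3600, 6),
    ("Epyc 7F52, 8 x DDR4-3200", 8, 3200, 16),
]

for name, channels, rate, cores in systems:
    bw = peak_bandwidth_gbs(channels, rate)
    print(f"{name}: {bw:.1f} GB/s peak, {bw / cores:.1f} GB/s per core")
```

Faster DIMMs raise the per-channel figure only modestly, whereas the channel count multiplies it, which is why RAM tuning alone cannot lift a dual-channel platform to the level of an 8-channel one.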
|
|
February 12, 2021, 08:45 |
|
#360 | |
New Member
George
Join Date: Jul 2020
Location: TU Delft, The Netherlands
Posts: 18
Rep Power: 6 |
Quote:
P.S. Great choice of whiskey |
||
|
|