OpenFOAM benchmarks on various hardware

Old   November 20, 2020, 05:58
Default
  #341
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
Quote:
Any recommendations or hints for best practices if I run several simulations on the same machine?
Avoid oversubscription, i.e. no physical core should have to run more than one thread.
Limit and pin each simulation to a separate NUMA node to avoid the higher latency.
And watch out for total memory usage. Each subset of simulations run on a NUMA node should not exceed memory available on this node.
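The three rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the job list, core counts and memory figures are hypothetical, and `simpleFoam` under `mpirun` is just an example command. The `numactl` options `--cpunodebind` and `--membind` are standard, but whether your MPI launcher keeps or overrides the inherited binding is something to verify on your own system.

```python
import shlex

def pinned_command(solver_cmd, numa_node):
    """Prefix a command so its CPUs and memory both stay on one NUMA node."""
    # --cpunodebind and --membind are standard numactl options.
    prefix = ["numactl", f"--cpunodebind={numa_node}", f"--membind={numa_node}"]
    return " ".join(prefix + [shlex.quote(part) for part in solver_cmd])

def overloaded_nodes(jobs, cores_per_node, mem_per_node_gb):
    """Return NUMA nodes that are oversubscribed in cores or memory.

    jobs is a list of (numa_node, n_cores, mem_gb) tuples (hypothetical plan).
    """
    used = {}
    for node, cores, mem in jobs:
        c, m = used.get(node, (0, 0.0))
        used[node] = (c + cores, m + mem)
    return sorted(node for node, (c, m) in used.items()
                  if c > cores_per_node or m > mem_per_node_gb)

if __name__ == "__main__":
    # Hypothetical plan: two 8-core jobs on a machine with 8 cores and 32 GB per node.
    plan = [(0, 8, 20.0), (1, 8, 40.0)]
    print(overloaded_nodes(plan, cores_per_node=8, mem_per_node_gb=32.0))
    print(pinned_command(["mpirun", "-np", "8", "simpleFoam", "-parallel"], 0))
```

Checking the plan before launching is cheap, and it catches the memory case in particular, which is otherwise easy to miss until a job starts allocating on a remote node.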

Old   November 22, 2020, 14:14
Default
  #342
New Member
 
Jon
Join Date: Mar 2013
Posts: 15
Rep Power: 13
CHUIKOV is on a distinguished road
Dear all,


cpu: AMD Epyc 7f52 (ES), ram: 8x16gb 2666 DDR4 2R, storage:Samsung 970 evo plus, 1TB


openfoam v8, compiled with znver2, SMT OFF, cpus set to performance.


Results:


# cores Wall time (s):
------------------------
1 593.62
2 315.34
4
6 98.33
8 78.05
12
16 43.66


For some reason I did not get results for 4 and 12 cores.


cheers!
Crowdion likes this.

Last edited by CHUIKOV; November 24, 2020 at 07:54.

Old   November 28, 2020, 09:21
Post Up-to-date Benchmark Results Compilation
  #343
New Member
 
Andrew
Join Date: Apr 2012
Posts: 15
Rep Power: 14
Malinator is on a distinguished road
Hello, fellow CFD users!

The benchmark data previously compiled by the topic starter at https://openfoamwiki.net/index.php/Benchmarks was last updated in late 2018, so it is currently missing all the modern configurations. Reading through all the posts is quite a lot of work, so a short summary can save a lot of time for the occasional visitor searching for guidance on an optimal config.


So, I used the concept from blackcatxiii and just trimmed his list to systems that are commercially available brand-new, single- and dual-CPU, in the workstation and server range. If you're interested in a broader range of systems, you can still check them there.

I also added a diagram that compares performance at the maximum number of available cores relative to the (at the moment) bang-for-$ champion, the 2*EPYC 7542 system.

Please keep in mind that performance may be influenced by a lot of things; the major factors are at least memory speed, OS, NUMA configuration, OF version, and the compiler used. Obviously, these differ between the machines that forum members used to run the benchmark. You can check the actual settings using the link to the respective original message in the first column of the table.
Attached Images
File Type: png CFDbench2020.png (22.5 KB, 263 views)
Attached Files
File Type: xlsx BenchShortList2020.xlsx (20.1 KB, 91 views)

Old   December 23, 2020, 18:47
Default
  #344
Senior Member
 
Join Date: Apr 2020
Location: UK
Posts: 737
Rep Power: 14
Tobermory will become famous soon enough
Dear speedsters,

here are my results from the motorbike testcase.

System:
Quote:
2x AMD EPYC 7302 (16-core, 3.0 GHz, 128 MB cache), 16*8 GB (DDR4, 2666 MHz), Ubuntu 20.04, OpenFOAM 8
Results:
Quote:
# cores Wall time (s):
------------------------
1 711.46
2 378.65
4 164.69
6 109.98
8 83.08
12 57.2
16 44.32
20 40.2
24 37.05
28 33.17
32 30.52
That is, scaling is linear with core count up to 16 cores; beyond that it is still roughly linear but with a reduced gradient (lower efficiency). Do you think this is due to poor communication between the CPUs, or to saturation of the RAM bandwidth?

I ran the cases out of the box, and I am really happy with the results, but - are there any tweaks I can do to improve the inter-CPU comms? Many thanks.
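To put numbers on the scaling described here, the wall times from the table can be turned into speedup, parallel efficiency and a marginal gain per added core. This is a small sketch; the function names are made up for illustration, and only the wall times come from the post.

```python
# Wall times (s) from the table above, keyed by core count.
wall = {1: 711.46, 2: 378.65, 4: 164.69, 6: 109.98, 8: 83.08, 12: 57.2,
        16: 44.32, 20: 40.2, 24: 37.05, 28: 33.17, 32: 30.52}

def speedup(times, n):
    """Speedup on n cores relative to the single-core run."""
    return times[1] / times[n]

def efficiency(times, n):
    """Parallel efficiency: speedup divided by core count."""
    return speedup(times, n) / n

def marginal_gain(times, n_lo, n_hi):
    """Extra speedup obtained per additional core between two core counts."""
    return (speedup(times, n_hi) - speedup(times, n_lo)) / (n_hi - n_lo)

if __name__ == "__main__":
    for n in sorted(wall):
        print(f"{n:3d} cores: speedup {speedup(wall, n):6.2f}, "
              f"efficiency {efficiency(wall, n):5.2f}")
    print(f"marginal gain  1->16: {marginal_gain(wall, 1, 16):.2f} per core")
    print(f"marginal gain 16->32: {marginal_gain(wall, 16, 32):.2f} per core")
```

With these figures the gain is about one unit of speedup per core up to 16 cores and a little under half that from 16 to 32, which is the change in gradient visible in the attached plot.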
Attached Images
File Type: png Epyc7302.png (23.9 KB, 143 views)
hokhay and Crowdion like this.

Old   December 23, 2020, 19:51
Default
  #345
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
You could verify what is happening by running the same benchmark with 1, 2 and 3 threads, pinned to a single CCD. For example on cores 0, 1 and 2.
My guess is that it's just memory bandwidth. With 16 cores, each CCD gets 2 threads assigned to its cores. With 17 threads, one of the CCDs has to get 3 threads, which limits overall execution speed due to a lack of memory bandwidth.
In that test with 1-3 threads, you should see exactly that: linear scaling from 1 to 2 threads, and a deviation from that line with 3 threads. That would rule out communication between CCDs or sockets as the cause.
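The thread-count argument is simple arithmetic, sketched below. Two assumptions are baked in and are not stated in the post: that the machine exposes 8 CCDs in total (inferred from "16 threads, 2 per CCD"), and that threads are spread over the CCDs as evenly as possible.

```python
import math

def max_threads_per_ccd(n_threads, n_ccds):
    """Thread count on the busiest CCD when threads are spread as evenly as possible."""
    return math.ceil(n_threads / n_ccds)

N_CCDS = 8  # assumption: inferred from "16 threads -> 2 threads per CCD" above

if __name__ == "__main__":
    for n in (8, 16, 17, 24, 32):
        print(f"{n:2d} threads -> busiest CCD runs {max_threads_per_ccd(n, N_CCDS)}")
```

Under these assumptions the busiest CCD goes from 2 to 3 threads as soon as the run exceeds 16 threads, which is where the measured gradient changes.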

Old   December 29, 2020, 22:01
Default Strange...
  #346
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Hi all,

I'm just comparing all the quad-capable v2 Xeons posted here in terms of performance (filling my shopping list for a low-budget 4-socket workstation), and came across @wkernkamp's listing in #199 of 2x E5-4627 v2, 8 cores @ 3.3 GHz, which seems low (well, not low, but lower than similar quad-configured machines):

Quote:
1 938.32
2 506.25
4 236.32
8 131.57
12 108.1
14 102.96
16 101.14
This contrasts with v2's used in quad configuration that perform better (lower wall times) even at 16 threads, even with much lower clock speeds (just quoting the 4657L v2, 12 cores @ 2.4 GHz, of @wildemam in #339):

Quote:
48 45.04
44 45.62
40 46.08
36 47.52
32 49
28 52.01
24 56.36
16 73.13
8 127.29
4 239.67
2 602.69
Is this because of the extra memory channels that are available in a quad-CPU config vs a dual-CPU one (note the 16-thread results)? I ask because the 4657L seems to get its memory saturated at around 8 cores/CPU, as expected... I note also that the 4627 was tested with faster RAM (14900 vs 12800). Is the 4657L so good because of its much bigger cache (30 MB vs 18 MB)?

Core question (excuse the pun): all other things being equal, would the E5-4627 v2 in quad config be expected to beat the others (4650/4657L v2 etc.) due to its higher clock speed, irrespective of its lower core count?

Many thanks in advance,

Kai.

Old   December 30, 2020, 06:53
Default
  #347
Senior Member
 
Join Date: Apr 2020
Location: UK
Posts: 737
Rep Power: 14
Tobermory will become famous soon enough
Quote:
Originally Posted by CHUIKOV View Post
cpu: AMD Epyc 7f52 (ES), ram: 8x16gb 2666 DDR4 2R, storage:Samsung 970 evo plus, 1TB
openfoam v8, compiled with znver2, SMT OFF, cpus set to performance.


Results:

# cores Wall time (s):
------------------------
1 593.62
2 315.34
4
6 98.33
8 78.05
12
16 43.66
This is interesting - Chuikov's 7F52 Epyc CPU is essentially the same as my 7302, but with a 17% faster base clock speed and twice the L3 cache (256MB, as opposed to 128MB). It still has 8 memory channels.

Run time on 1 core is 20% faster, as expected from the higher clock speed, but interestingly the speed gain quickly falls off (10% faster on 6 cores, negligible difference on 16 cores).

The good news, Chuikov, is that if you decide to throw in a second CPU, then the evidence from my time trial is that the run times continue to scale pretty well at higher core counts.
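The percentages quoted above can be reproduced from the two result tables. This is a minimal sketch; the helper name is invented for illustration, and only core counts present in both tables are compared. Keep in mind that the 7302 figures come from a dual-socket machine, so the two systems may not be spreading the same thread count over the same number of memory channels.

```python
# Wall times (s) at the core counts reported for both systems.
t_7302 = {1: 711.46, 2: 378.65, 6: 109.98, 8: 83.08, 16: 44.32}
t_7f52 = {1: 593.62, 2: 315.34, 6: 98.33, 8: 78.05, 16: 43.66}

def advantage_percent(slow, fast):
    """Percent by which `fast` outruns `slow` at each common core count."""
    common = sorted(slow.keys() & fast.keys())
    return {n: (slow[n] / fast[n] - 1.0) * 100.0 for n in common}

if __name__ == "__main__":
    for n, pct in advantage_percent(t_7302, t_7f52).items():
        print(f"{n:2d} cores: 7F52 is {pct:4.1f}% faster")
```

The advantage comes out at roughly 20% on 1 and 2 cores, about 12% on 6 cores, about 6% on 8 cores and under 2% on 16 cores.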

Old   January 6, 2021, 03:20
Default
  #348
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 19
Rep Power: 6
kstuart is on a distinguished road
Quote:
Originally Posted by wildemam View Post
4 x Intel(R) Xeon(R) CPU E5-4657L v2 @ 2.40GHz

128 GB DDR3 1600 MHz
openFoam 8
Ubuntu 20.


# cores Wall time (s):
------------------------
48 77.45
44 77.66
40 77.43
36 77.34
32 77.59
28 78.45
24 79.93
16 89.9
8 133.07
4 245.4
2 652.24
1 27.39

Meshing:
48 real 4m19.655s
44 real 3m43.624s
40 real 3m54.778s
36 real 3m51.182s
32 real 3m48.851s
28 real 3m54.084s
24 real 4m19.289s
16 real 5m46.104s
8 real 7m19.078s
4 real 12m8.124s
2 real 23m45.691s
1 real 0m3.501s


Hitting some ceiling there. I verified that I have 32GB per NUMA nodes. Any ideas for checking the reason for the bottleneck beyond 24 cores?

Anxious to see how it runs with more memory. Similar to mine, but with v2 procs.

Old   January 6, 2021, 10:52
Default
  #349
Senior Member
 
 
Join Date: May 2012
Posts: 551
Rep Power: 16
Simbelmynė is on a distinguished road
2 x Intel Xeon 2673v3, DDR4 2R 2133 MT/s. Debian 10. OpenFOAM 7. HT enabled.




Code:
# cores   Wall time (s):
------------------------
1 889.57
2 488.47
4 228.37
8 128.43
16 85.43
24 76.29

Old   January 7, 2021, 10:29
Default 9800x
  #350
New Member
 
Francisco
Join Date: Sep 2018
Location: Portugal
Posts: 27
Rep Power: 8
ships26 is on a distinguished road
An i7-9800x @stock running OF 7 on Ubuntu 20 with hyperthreading enabled and 4x8 GB 3600-cl16 single rank memory (XMP).


Code:
# cores   Wall time (s):
------------------------
1         776.35
2         388.64
4         208.41
6         162.96
8         149.45

An interesting alternative to 1st gen threadrippers, I think.

Old   January 12, 2021, 12:00
Default
  #351
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Quote:
Originally Posted by kstuart View Post
Anxious to see how it runs with more memory. Similar to mine, but with v2 procs.
This has been done and reported in #339, unless I missed something...

Old   January 27, 2021, 23:06
Default
  #352
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
wkernkamp is on a distinguished road
Quote:
Originally Posted by Kailee71 View Post
Hi all,

I'm just comparing all the quad-capable v2 Xeons posted here in terms of performance (filling my shopping list for a low-budget 4-socket workstation), and came across @wkernkamp's listing in #199 of 2x E5-4627 v2, 8 cores @ 3.3 GHz, which seems low (well, not low, but lower than similar quad-configured machines):



This contrasts with v2's used in quad configuration that perform better (lower wall times) even at 16 threads, even with much lower clock speeds (just quoting the 4657L v2, 12 cores @ 2.4 GHz, of @wildemam in #339):



Is this because of the extra memory channels that are available in a quad-CPU config vs a dual-CPU one (note the 16-thread results)? I ask because the 4657L seems to get its memory saturated at around 8 cores/CPU, as expected... I note also that the 4627 was tested with faster RAM (14900 vs 12800). Is the 4657L so good because of its much bigger cache (30 MB vs 18 MB)?

Core question (excuse the pun): all other things being equal, would the E5-4627 v2 in quad config be expected to beat the others (4650/4657L v2 etc.) due to its higher clock speed, irrespective of its lower core count?

Many thanks in advance,

Kai.
Yes, the memory channels! A quad-processor config running 16 cores has four active cores per processor, and each processor has its own four memory channels. A dual-processor config will have 8 active cores per processor sharing the same four memory channels per processor, so it is already bottlenecked by the memory.
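The comparison boils down to active cores per memory channel. A quick sketch, assuming the 16 cores are spread evenly over the sockets and using four channels per socket as stated above (the function name is made up for illustration):

```python
def cores_per_channel(total_cores, sockets, channels_per_socket=4):
    """Active cores competing for each memory channel, assuming an even spread."""
    active_per_socket = total_cores / sockets
    return active_per_socket / channels_per_socket

if __name__ == "__main__":
    print("quad socket, 16 cores:", cores_per_channel(16, sockets=4))
    print("dual socket, 16 cores:", cores_per_channel(16, sockets=2))
```

At 16 cores the quad-socket machine has one core per channel and the dual-socket machine has two, so each core on the dual-socket box gets at most half the memory bandwidth.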

Old   February 1, 2021, 22:32
Default
  #353
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Ok, that's what I thought. I have ordered my 4x 4627 v2's and am eagerly awaiting delivery to do more benchmarking. Will report as soon as they arrive; the rest of the machine is already here and waiting (DL560 G8, 16 sticks of 1866 2R 16 GB).

Old   February 7, 2021, 17:19
Default
  #354
Senior Member
 
julien
Join Date: Dec 2018
Posts: 107
Rep Power: 7
julieng is on a distinguished road
Hello

2 x Intel E5-2678 v3 @ 2.50GHz with SZMZ X99 Z8 motherboard from Aliexpress default settings

128 GB (8x16) DDR4 2133 MHz
openFoam 8
Ubuntu 20.04

I use the motorbike tutorial. The benchmark is not working, even though I have followed the recommendations in this topic for newer OF versions.

So I report the clock time of simpleFoam, which I ran for this tutorial with the following core counts:

# cores Clock time (s):
------------------------
1 622
2 339
4 161
6 118
10 82
12 70
18 56
24 57

Hyper threading enabled.

Old   February 10, 2021, 14:32
Default 7f52
  #355
Member
 
Erik Andresen
Join Date: Feb 2016
Location: Denmark
Posts: 35
Rep Power: 10
ErikAdr is on a distinguished road
A 7F52 is a nice CPU for CFD. Epycs with 256 MB of cache have a higher memory bandwidth than Epycs with 128 MB (or, much worse, 64 MB) of cache, but they should be used with 8 x DDR4 3200 2R.


Quote:
Originally Posted by Tobermory View Post
This is interesting - Chuikov's 7F52 Epyc CPU is essentially the same as my 7302, but with a 17% faster base clock speed and twice the L3 cache (256MB, as opposed to 128MB). It still has 8 memory channels.

Run time on 1 core is 20% faster, as expected from the higher clock speed, but interestingly the speed gain quickly falls off (10% faster on 6 cores, negligible difference on 16 cores).

The good news, Chuikov, is that if you decide to throw in a second CPU, then the evidence from my time trial is that the run times continue to scale pretty well at higher core counts.
Quote:
Originally Posted by CHUIKOV View Post
Dear all,


cpu: AMD Epyc 7f52 (ES), ram: 8x16gb 2666 DDR4 2R, storage:Samsung 970 evo plus, 1TB


openfoam v8, compiled with znver2, SMT OFF, cpus set to performance.


Results:


# cores Wall time (s):
------------------------
1 593.62
2 315.34
4
6 98.33
8 78.05
12
16 43.66


for some reason did not get results for 4 and 12 cores,


cheers!

Old   February 10, 2021, 15:08
Default
  #356
New Member
 
Rob
Join Date: Apr 2018
Posts: 18
Rep Power: 8
Morlind is on a distinguished road
edit - this post was intended to respond to a post much further back in the thread. Apologies:


I agree with the previous statement. Intel V2 era hardware with DDR3 RAM is massive value for CFD use. I am currently running a stack of these. I periodically check what newer/better hardware might cost and it so far has not shown enough performance to warrant the cost of upgrading. By that I mean it's generally ~double the cost for ~10-20% gain. These are extrapolated numbers based on some benchmarks I've done on newer borrowed systems and it's very likely that a game-changing solution will arrive in time but V4/DDR4 hardware doesn't appear to be that.
Froz3nTree likes this.

Old   February 11, 2021, 17:47
Default Ryzen 5600x benchmark
  #357
New Member
 
George
Join Date: Jul 2020
Location: TU Delft, The Netherlands
Posts: 18
Rep Power: 6
gpouliasis is on a distinguished road
Fellow foamers,

I hereby submit my results from my system, comprising the following hardware:
CPU: AMD Ryzen 5600x
RAM: GSkill 2x8 GB (f4-3600c16d-16gtzn)
Mobo: Asus TUF X570 gaming plus (Running the latest updates)
PSU: Corsair 850w

Currently running OpenFOAM 1912, compiled from source, on Ubuntu 20.04 with updated kernel 5.10. I used DRAM Calculator with the A0/B0 Fast settings, and that was as far as I could get with a stable system given my knowledge and available time. These are my best results. I also attach a spreadsheet with all my benchmarks and the Ryzen benchmarks I could find in this thread. These show clearly how much the overall performance depends upon RAM performance. Nothing new, I know, but still I think it is nice.

# cores   Wall time (s)   Scale (speedup vs 1 core)
1 431.36 1.00
2 263.03 1.64
4 198.1 2.18
6 184.35 2.34

To be honest, I think I can do better with the timings and improve the overall results and the scaling. I would expect to be able to get close to 3x. Also, the lower-latency (C14) model of the same RAM looks like it would improve the situation a lot. Still, even the above are great results for a system that costs around 1000 euros (depending on where you live and on availability) and is an all-rounder. However, when compared to the other Ryzen systems the scaling is worse, with other systems reaching more than 3x. My guess is that because the CPU is faster, the RAM creates an even stronger bottleneck in the computations. Maybe a more knowledgeable person can comment on that.

Additionally, I ran a bunch of benchmarks which you might find interesting.
Phoronix Test Suite v10.2.1:
Memory Copy - Array Size: 4096 MiB: Average: 23256.531 MiB/s
Memory Copy, Fixed Block Size - Array Size: 4096 MiB: Average: 11724.460 MiB/s

Memory Bandwidth (mbw):
Method: MEMCPY Elapsed: 0.37036 MiB: 5000.00000 Copy: 13500.378 MiB/s
Method: DUMB Elapsed: 0.21765 MiB: 5000.00000 Copy: 22973.085 MiB/s
Method: MCBLOCK Elapsed: 0.29842 MiB: 5000.00000 Copy: 16754.909 MiB/s

y-cruncher with 2,500,000,000 decimal digits on 6 cores (this one calculates pi up to a specific accuracy and it is a personal favourite) :
Total Computation Time: 190.771 seconds ( 3.180 minutes )
Start-to-End Wall Time: 202.055 seconds ( 3.368 minutes )

CPU Utilization: 698.81 % + 0.43 % kernel overhead
Multi-core Efficiency: 58.23 % + 0.04 % kernel overhead

Blender Benchmark:
bmw27: 3m43s
classroom: 10m6s
fishy_cat: 4m45s
koro: 5m29s
pavillon_barcelona: 10m30s
victor: 16m25s

Throughout the above benchmarks the processor did not exceed 81 degrees with the stock cooler and adequate case cooling with 6 fans. No thermal throttling was observed.

I hope the above will be helpful to someone.

Cheers
Attached Files
File Type: xlsx Scale_Tests.xlsx (9.4 KB, 14 views)
ships26 and linuxguy123 like this.

Last edited by gpouliasis; February 13, 2021 at 14:36.

Old   February 12, 2021, 04:01
Default
  #358
Senior Member
 
 
Join Date: May 2012
Posts: 551
Rep Power: 16
Simbelmynė is on a distinguished road
I think you can gain some performance with better memory settings. If you are lucky you might also be able to increase the Infinity Fabric beyond 1800 MHz.


Anyway, I would not purchase more expensive RAM if I were in your shoes. It is one of the fastest results we have seen for single-threaded performance, which will benefit a number of cases. Nice value!

Old   February 12, 2021, 06:34
Default
  #359
Senior Member
 
Join Date: Apr 2020
Location: UK
Posts: 737
Rep Power: 14
Tobermory will become famous soon enough
Quote:
Originally Posted by gpouliasis View Post
Fellow foamers,
These show clearly how much the overall performance depends upon RAM performance. Nothing new I know, but still I think it is nice.

cores Time Scale
1 431.36 1.00
2 263.03 1.64
4 198.1 2.18
6 184.35 2.34
Agreed - keep in mind that the Ryzen 5600x only has 2 memory channels, and so despite the 6cores/12threads I assume that it will still bottleneck on access to the RAM, whatever the RAM performance.
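For a rough feel of the ceiling that two channels impose, the theoretical peak bandwidth can be estimated from the channel count and transfer rate. This is a back-of-the-envelope sketch: it assumes the kit actually runs at its rated 3600 MT/s and uses the 64-bit (8-byte) width of a DDR4 channel. Sustained bandwidth in practice is lower, as the mbw and Phoronix copy figures earlier in the thread suggest.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (a DDR4 channel is 64 bits wide)."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0

if __name__ == "__main__":
    # Ryzen 5600X: 2 channels, kit rated DDR4-3600 (actual running speed is an assumption).
    total = peak_bandwidth_gb_s(2, 3600)
    print(f"peak: {total:.1f} GB/s, per core on 6 cores: {total / 6:.1f} GB/s")
```

Two channels of DDR4-3600 top out at 57.6 GB/s in theory, or under 10 GB/s per core when all six cores are busy, so faster timings can only move the result within that limit.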

Old   February 12, 2021, 08:45
Default
  #360
New Member
 
George
Join Date: Jul 2020
Location: TU Delft, The Netherlands
Posts: 18
Rep Power: 6
gpouliasis is on a distinguished road
Quote:
Originally Posted by Tobermory View Post
Agreed - keep in mind that the Ryzen 5600x only has 2 memory channels, and so despite the 6cores/12threads I assume that it will still bottleneck on access to the RAM, whatever the RAM performance.
Indeed, this is true. That is why I chose 6 cores and not more. More cores would be useless for my applications, they would probably need a different cooler than the stock one, and many-core CPUs usually require better motherboards. So an overall increase in cost that is not justified.

P.S. Great choice of whiskey
Tobermory likes this.
