December 10, 2019, 20:25 |
|
#241 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
Geekbench 4 is better. It includes memory bandwidth testing. (The Epyc has almost twice the bandwidth score.) I did not find results for the newest processors, so here is a comparison between the previous-generation Epyc and Threadripper:

Processor                     7401P               2970WX
Single-Core Score             3619                4509
Multi-Core Score              58498               55139
Single-Core Bandwidth Score   4378 (23.4 GB/sec)  2701 (14.4 GB/sec)
Multi-Core Bandwidth Score    8745 (46.7 GB/sec)  4577 (24.4 GB/sec) |
|
December 18, 2019, 22:03 |
|
#242 | |
New Member
anonymous
Join Date: Oct 2019
Posts: 4
Rep Power: 7 |
Quote:
The following results with the 2x EPYC 7302 were obtained with NPS4 set in the BIOS:

# cores   Wall time (s):
------------------------
8         79.41
16        41.55
32        26.53

SLC, unfortunately I don't have access to Fluent. |
||
December 20, 2019, 14:30 |
PCIe 4.0 and new performance
|
#243 |
New Member
Clément
Join Date: Dec 2019
Location: France
Posts: 6
Rep Power: 6 |
Hello,
I have a question that is quite hard to answer. (If diagrams or schematics would help illustrate my question and my assumption, I can add some, because I have a few on my computer.)

What I notice throughout this thread, as a personal conclusion, is that to get the best CFD performance on a cluster it is better, one way or another, to have the number of CPU cores equal to the total number of memory channels. With a hardware ratio of "1 CPU core / 1 memory channel" I am sure there will be no bottleneck at all. Indeed, almost all of the dual-socket EPYC results in this thread show:

Speedup = number of CPU cores used (when the number of cores is less than or equal to the number of memory channels)

and

Speedup < number of CPU cores used (when the number of cores is greater than the number of memory channels)

Right? :-) Almost all of the dual-socket EPYC tests reach a speedup of at least 16 with 16 cores, and even more than 32 with 32 cores in havref's test of 2x2 EPYC nodes connected with InfiniBand. And this is because every EPYC processor has 8 memory channels. (I must admit I do not understand why your results often exceed 16 and 32 for the dual-socket tests and for havref's 2x2 EPYC tests respectively, but that is another story, and I don't want to add difficulties to the current reasoning.)

We have also seen that when the number of cores becomes greater than the number of memory channels, the speedup suddenly stalls, and the gap keeps growing as the number of cores increases.

Now that PCIe 3.0 is being replaced by PCIe 4.0, my question is: does this change the general rule of performance described above in any way? Since the bandwidth of the new bus is doubled, does the rule with PCIe 4.0 become:

Speedup = number of CPU cores used (when the number of cores is less than or equal to 2 x the number of memory channels)

because the new PCIe 4.0 bandwidth is twice as large as the previous one? To some extent my question is: will the RAM bandwidth also be doubled along the whole path between the CPU and the RAM? And can this bandwidth increase substitute, to some extent, for half of the well-known memory-channel limitation?

I know this is a really hard question, so I don't want to force anyone to answer it.

Thank you all

Maphteach

Last edited by Maphteach; December 21, 2019 at 13:22. |
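To make the observation above concrete, here is a minimal, purely illustrative model of bandwidth-limited speedup. All numbers in it (node bandwidth, per-core demand, and therefore the saturation point) are assumptions chosen for illustration, not measurements from this thread:

```python
# Illustrative sketch: ideal speedup follows the core count until the cores'
# combined memory-bandwidth demand reaches what the memory channels can
# deliver; beyond that point the speedup is capped by the bandwidth ratio.
# Both constants below are assumed example values, not benchmark results.

NODE_BANDWIDTH_GB_S = 320.0   # assumed sustained bandwidth of a 16-channel dual-socket node
DEMAND_PER_CORE_GB_S = 20.0   # assumed bandwidth demand of one core running the solver

def modeled_speedup(cores: int) -> float:
    """Speedup limited either by core count or by available memory bandwidth."""
    bandwidth_cap = NODE_BANDWIDTH_GB_S / DEMAND_PER_CORE_GB_S
    return min(cores, bandwidth_cap)

for n in (1, 2, 4, 8, 16, 24, 32):
    print(f"{n:2d} cores -> modeled speedup {modeled_speedup(n):5.1f}")
```

With these assumed numbers the modeled speedup flattens at 16, which is why the observed stall roughly coincides with the channel count; note that PCIe does not appear anywhere in this model.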
|
December 20, 2019, 18:13 |
|
#244 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Quote:
The integrated memory controller acts as an additional layer and handles memory access for all cores associated with it. Or more precisely: for all memory attached to it.

The reason for the slowdown is much simpler: with more cores, the calculations are processed faster, so more data per unit of time has to be fed to the CPU cores. Memory bandwidth becomes the bottleneck. On top of that, memory latency increases with higher load on the memory subsystem.

As for the PCIe topic: this has nothing to do with system memory. PCIe is used to connect things like graphics cards or NVMe SSDs. Integrated memory controllers, as the name suggests, are integrated into the CPU itself. They do not communicate with the CPU cores via PCIe. New versions of the PCIe standard will have no influence on system memory bandwidth or latency. |
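To put a number on that bandwidth ceiling: theoretical peak DRAM bandwidth is simply channels x transfer rate x bus width. A minimal sketch of that arithmetic (the DDR4-3200 configurations below are assumed examples, not taken from any post in this thread):

```python
def peak_dram_bandwidth_gb_s(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: channels * MT/s * bytes per transfer."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0  # MB/s -> GB/s

# Assumed examples:
print(peak_dram_bandwidth_gb_s(8, 3200))  # 8-channel EPYC socket with DDR4-3200 -> 204.8 GB/s
print(peak_dram_bandwidth_gb_s(4, 3200))  # 4-channel workstation CPU with DDR4-3200 -> 102.4 GB/s
```

Doubling the PCIe link rate changes neither factor in that product, which is consistent with the explanation above.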
||
December 23, 2019, 12:29 |
to be more precise ...
|
#245 | ||
New Member
Clément
Join Date: Dec 2019
Location: France
Posts: 6
Rep Power: 6 |
Thank you for your fast response flotus1, I will read it carefully to make sure I understand it well ...
Maphteach

1st update:

Quote:
Anyway, this is what I conclude when I look at the trends in the results of your tests and benchmarks, and I find them encouraging, because they let us understand what increases the speedup. I also note that the EPYC CPUs communicate with each other much faster than the Intel CPUs, and I think this is really noteworthy, because communication between CPUs and nodes is a really important parameter for parallel CFD performance.
Quote:
A few days ago I was reading about this subject, and I learned that the front-side bus (FSB), which in the old days connected the CPU with the memory controller, does not exist anymore, because the memory controller is now integrated into the CPU. So I really agree with you on this point.

However, I wouldn't say that PCIe is completely foreign to CFD performance, because I have read that half of the PCIe lanes belonging to each CPU determine the speed of communication between several CPUs on a motherboard: 64 PCIe lanes for each EPYC CPU, against something that seems to be lower for communication between several Intel CPUs on a motherboard, which probably explains part of the performance difference between the two brands. (For the 64 lanes I found this information on the Wikipedia EPYC page: https://en.wikipedia.org/wiki/Epyc)

Furthermore, even knowing that memory controllers are now integrated into the CPU and that the FSB disappeared a long time ago: yesterday I was looking for information on this subject and found architecture diagrams of the EPYC 7002 generation which suggest that PCIe plays an important role inside the CPU package, even if right now I don't know exactly what kind of role. (Diagram: https://www.microway.com/hpc-tech-ti...me-chipletdie/) And if PCIe plays an important role inside the CPU, why not in the RAM operations too, since the memory controllers are now integrated into the CPU?

2nd update: I have found more information about the EPYC CPU architecture. The diagrams describe the memory channels and the PCIe lanes as two different sets of connections that run very close to each other but in different, perpendicular directions; they cross but do not overlap. So I think they do not share bandwidth, and that answers my question. Consequently we can say that PCIe 4.0 does not solve the memory-channel problem. It was a difficult question to work out the real difference between PCIe and memory channels; for example, the Wikipedia AMD EPYC page makes it a little ambiguous by describing the CPU as having "eight DDR4 SDRAM and PCIe 4.0 channels", as if PCIe were also a memory channel, but that is not the case.

Thank you

Maphteach

Last edited by Maphteach; January 7, 2020 at 13:40. |
|||
January 3, 2020, 23:20 |
|
#246 |
New Member
Clément
Join Date: Dec 2019
Location: France
Posts: 6
Rep Power: 6 |
Hello,
Havref, is it possible to know which version of InfiniBand you used for your dual EPYC server test to get such high results? I mean which one of these, so I know the bandwidth of your connection:

10 Gbit/s (SDR, Single Data Rate)
20 Gbit/s (DDR, Double Data Rate)
40 Gbit/s (QDR, Quad Data Rate)
56 Gbit/s (FDR, Fourteen Data Rate)
100 Gbit/s (EDR, Enhanced Data Rate)
200 Gbit/s (HDR, High Data Rate)

Thank you

Maphteach
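For a quick comparison with the memory-bandwidth numbers discussed elsewhere in this thread, the 4x-lane link rates listed above convert to bytes per second as in this small sketch (raw link rates only; encoding and protocol overhead are ignored):

```python
# Convert the 4x-lane InfiniBand link rates listed above from Gbit/s to GB/s.
# These are raw link rates; encoding and protocol overhead are not accounted for.
link_rates_gbit_s = {"SDR": 10, "DDR": 20, "QDR": 40, "FDR": 56, "EDR": 100, "HDR": 200}

for name, gbit in link_rates_gbit_s.items():
    print(f"{name}: {gbit:3d} Gbit/s ~= {gbit / 8:5.1f} GB/s")
```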
__________________
If you want to learn more about myself, and about what I am working on : https://www.maphteach-cfd-simulations.fr/ |
|
January 4, 2020, 06:34 |
|
#247 | |
New Member
Erik
Join Date: Jul 2019
Posts: 7
Rep Power: 7 |
Quote:
At minimum, InfiniBand FDR would be fine to get good results, as I've seen in the comparison between 1 Gbit, 10 Gbit and FDR InfiniBand at https://www.hpcadvisorycouncil.com/p...l_2680_FCA.pdf

So it's all about lowering the latency between the nodes, and a good InfiniBand connection can achieve remarkable results. But InfiniBand matters most for scaling beyond just 2 nodes. For the purpose of building a dual-node cluster, 1 Gbit is OK. |
||
January 4, 2020, 07:08 |
|
#248 |
New Member
Clément
Join Date: Dec 2019
Location: France
Posts: 6
Rep Power: 6 |
Thank you erik87.
That's an interesting post and response. Maphteach
__________________
If you want to learn more about myself, and about what I am working on : https://www.maphteach-cfd-simulations.fr/ |
|
January 7, 2020, 13:50 |
Which bandwidth for InfiniBand?
|
#249 | |
New Member
Clément
Join Date: Dec 2019
Location: France
Posts: 6
Rep Power: 6 |
Has anyone tried more than 40 Gbit/s of InfiniBand bandwidth?
Quote:
I think the best solution would be to have an InfiniBand bandwidth equal to that of the bus connecting two EPYC CPUs on a motherboard. That may be possible, because HDR supports up to 200 Gbit/s. But it is a very expensive solution, I know: an HDR card costs something like 1000 USD, which means 1000 USD for each node. A very high price!

Thanks

Maphteach
__________________
If you want to learn more about myself, and about what I am working on : https://www.maphteach-cfd-simulations.fr/ Last edited by Maphteach; January 13, 2020 at 21:57. Reason: quoting havref |
||
January 13, 2020, 21:51 |
Do you offer IBM CPUs?
|
#250 | |
New Member
Clément
Join Date: Dec 2019
Location: France
Posts: 6
Rep Power: 6 |
Quote:
Is raptorcs.com your website? What I am looking for are the LaGrange/Monza packages with more than 4 memory channels! Please tell me where I can find this kind of quality product for high-level computation.

Thank you

Maphteach
__________________
If you want to learn more about myself, and about what I am working on : https://www.maphteach-cfd-simulations.fr/ |
||
January 15, 2020, 18:36 |
Threadripper 1900x
|
#251 |
New Member
Matthew Sparkman
Join Date: Jan 2020
Posts: 1
Rep Power: 0 |
OS: Ubuntu 16.04
OpenFOAM 4

CPU: Auto, RAM: AUTO MODE (UMA) 3200CL16
# cores   Wall time (s):
------------------------
1         800.82
2         440.71
4         240.89
6         198.61
8         185.58

CPU: Auto, RAM: CHANNEL MODE (NUMA) 3200CL16
# cores   Wall time (s):
------------------------
1         773.59
2         446.37
4         230.31
6         181.91
8         158.45

CPU: Auto, RAM: CHANNEL MODE (NUMA) 3400CL16
# cores   Wall time (s):
------------------------
1         746.15
2         427.8
4         206.95
6         182.88
8         150.72

CPU: 4.2 GHz all-core, RAM: CHANNEL MODE (NUMA) 3400CL16
# cores   Wall time (s):
------------------------
1         738.21
2         421.37
4         204.84
6         176.33
8         150.08

Analysis: A 9.6% all-core overclock yielded 0% gains. The UMA to NUMA topology change yielded a 14.5% advantage. A 5.9% RAM overclock yielded a 4.9% advantage.

Conclusion: Gen 1 Threadripper is bandwidth starved in CFD applications. I have a 1920X on the way and I will confirm that Threadripper hits a wall at 12 cores.

Edit: SMT was OFF for all runs. |
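As a quick cross-check of the percentages quoted in the analysis above, the relative wall-time reductions can be recomputed from the 8-core numbers. A small sketch (the run labels are mine; the times are copied from the tables above):

```python
# Recompute the quoted gains from the 8-core wall times above.
# Gain = relative reduction in wall time between two configurations.
runs = {
    "UMA 3200CL16":            185.58,
    "NUMA 3200CL16":           158.45,
    "NUMA 3400CL16":           150.72,
    "NUMA 3400CL16 + 4.2 GHz": 150.08,
}

def gain_percent(slow: float, fast: float) -> float:
    """Relative wall-time reduction going from `slow` to `fast`, in percent."""
    return 100.0 * (slow - fast) / slow

print(f"UMA -> NUMA:        {gain_percent(runs['UMA 3200CL16'], runs['NUMA 3200CL16']):.1f} %")            # ~14.6 %
print(f"RAM 3200 -> 3400:   {gain_percent(runs['NUMA 3200CL16'], runs['NUMA 3400CL16']):.1f} %")           # ~4.9 %
print(f"All-core overclock: {gain_percent(runs['NUMA 3400CL16'], runs['NUMA 3400CL16 + 4.2 GHz']):.1f} %") # ~0.4 %
```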
|
February 3, 2020, 10:38 |
Intel Xeon 6240
|
#252 |
New Member
Solal Amouyal
Join Date: Sep 2019
Posts: 3
Rep Power: 7 |
Hi everyone,
Here are the performance results for 2x Intel Xeon 6240 (18 cores @ 2.6 GHz), RAM 12x 32 GB DIMM DDR4 2933 MHz, running on CentOS Linux release 7.7.1908 (Core). The following results are for 1 node with 2 CPUs. I'll soon try with multiple nodes to test scaling - InfiniBand is used, so I expect good results for that as well.

# cores   Wall time (s)   Speedup
----------------------------------
1         819.96          -
2         457.78          1.79
4         200.44          4.09
6         135.14          6.07
8         102.61          7.99
12        72.34           11.33
16        58.81           13.94
18        54.46           15.06
20        51.13           16.04
24        46.85           17.05
28        43.78           18.72
32        41.62           19.70
36        40.74           20.13

Near-perfect results up to 12-16 cores, then scaling drops. From reading the earlier posts, I assume it's related to the number of memory channels available.

There is however one issue: I am getting the following error after the iterations are done, and after the writing of the solution files:

[9] --> FOAM FATAL ERROR:
[9] Attempt to return primitive entry ITstream : IOstream.functions.streamLines.seedSampleSet, line 0, IOstream: Version 2.0, format ascii, line 0, OPENED, GOOD
    primitiveEntry 'seedSampleSet' comprises on line 0: word 'uniform'
    as a sub-dictionary
[9]
[9] From function virtual const Foam::dictionary& Foam::primitiveEntry::dict() const
[9] in file db/dictionary/primitiveEntry/primitiveEntry.C at line 184.

Any idea how to resolve it? These errors are messing with my profiling |
|
February 3, 2020, 10:41 |
|
#253 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Probably related to streamlines. Here are the workarounds; unfortunately, these were never included in the original post:
OpenFOAM benchmarks on various hardware |
|
February 3, 2020, 12:34 |
|
#254 | |
New Member
Solal Amouyal
Join Date: Sep 2019
Posts: 3
Rep Power: 7 |
Quote:
Thanks for the fast answer. That actually solved it, and I was able to run IPM to get an estimate of the MPI time. As expected, communication overhead is relatively low, with MPI taking about 15% of the total wall time when using 36 cores. Therefore the lack of scaling beyond 12 cores is most likely memory bound. |
||
February 22, 2020, 09:31 |
2x Intel Xeon E5-2687W v2
|
#255 |
New Member
Jan
Join Date: Apr 2018
Location: Czechia
Posts: 3
Rep Power: 8 |
Hi,
I ran the benchmark on 2x Xeon E5-2687W v2 (3.4 GHz) (edited)
- 8x 8GB 2Rx4 DDR3-1600 PC3-12800R ECC VLP
- OpenFOAM 7
- elementary OS 5.1.2
- kernel 5.3.0-40-generic
HyperThreading: off
NUMA setting: on

The run itself:
# cores   Wall time (s):
------------------------
1         901.43
2         485.12
4         219.16
6         157.99
8         132.82
12        110.53
16        102.02

Meshing:
1         25m43,839s
2         17m15,793s
4         9m54,322s
6         6m55,330s
8         5m46,486s
12        5m46,486s
16        4m37,098s

The scaling at 16 cores is not the best, ~8.8. I used the two scripts from flotus1 and they shaved around a second off the run on all cores, so thank you for the tip.

Last edited by Han_sel; February 23, 2020 at 05:47. Reason: Can't remember the specs - base frequency is 3.4 GHz, thank you @Simbelmynė |
|
February 22, 2020, 12:14 |
|
#256 |
Senior Member
Join Date: May 2012
Posts: 551
Rep Power: 16 |
@Han_sel
Those numbers seem really good, especially considering you are only using 1600 MHz memory modules. The highest Ivy Bridge results in this thread are around 1 iteration/s, and those were with 1866 MHz memory, so your setup is definitely doing something right!

You wrote 3.5 GHz; since the official base frequency is 3.4 GHz, does that mean you have done a light overclock (FSB overclock perhaps)? Since the all-core turbo boost is 3.6 GHz, this is a bit puzzling though.

Anyhow, it seems to work great. |
|
February 23, 2020, 06:05 |
|
#257 | |
New Member
Jan
Join Date: Apr 2018
Location: Czechia
Posts: 3
Rep Power: 8 |
Quote:
I have looked at others' results, and now I wonder as well why it performs so well.
||
February 23, 2020, 07:23 |
|
#258 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
I think the results are well within expectations. Here are my results with the predecessor, the Xeon E5-2687W "v1": OpenFOAM benchmarks on various hardware
Performance-wise, many of the benchmark numbers posted here are on the lower end of the spectrum. Running benchmarks is hard, especially if you want consistently good results. |
|
February 24, 2020, 15:36 |
|
#259 | |
New Member
Jan
Join Date: Apr 2018
Location: Czechia
Posts: 3
Rep Power: 8 |
Quote:
|
||
February 29, 2020, 02:57 |
|
#260 |
New Member
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 19
Rep Power: 6 |
Dell R820, 4x E5-4640 2.6 GHz, 16x 4GB PC3-12800. This was my first run; I'm hoping that with some work it can go better. This cost me $520 shipped.

# cores   Wall time (s):
------------------------
1         1137.55
2         619.25
4         264.8
6         187.7
8         142.35
10        125.07
12        105.16
14        96.76
16        85.26
18        85.96
20        77.75
22        78.91
24        71.71
26        75.19
28        69.97
32        72.9 |
|