OpenFOAM benchmarks on various hardware

Old   March 14, 2019, 08:07
Default
  #181
Member
 
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8
Astan is on a distinguished road
Hi guys, could you please tell me how to edit the "run" script in the motorbike case so that it runs the simulation on a 2-node cluster automatically? I don't know how to tell the script to read "--hostfile machinefile", and the machinefile should change as follows:

A)
master cpu = 2
node cpu = 2

B)
master cpu = 3
node cpu = 3

C)
master cpu = 4
node cpu = 4

D)
master cpu = 6
node cpu = 4

Could anyone kindly give me a suggestion?

Astan

Old   March 14, 2019, 08:14
Default
  #182
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 551
Rep Power: 16
Simbelmynė is on a distinguished road
This is one way (how I do it):

It also repeats the calculation 3 times. I also have an NFS-mounted folder called "cloud", as you can see in the example.

Note that you might want to edit the Allmesh file as well in order to mesh using the cluster.

I would probably just update the "machines" file manually.

Code:
#!/bin/bash
# Benchmark driver: the outer loop repeats the whole benchmark 3 times.
# Each case lives in a directory run_<N>, where <N> is the number of processes.

for (( t=0; t<3; t++ )); do

    # Clear old data
    for i in 14; do
        cd run_${i} || exit 1
        # Remove the final time directory (100) from the case
        # and from every processor directory, if present.
        rm -rf 100
        for (( c=0; c<i; c++ )); do
            rm -rf processor${c}/100
        done
        # Remove the old solver log, if present.
        rm -f log.simpleFoam
        cd ..
    done

    # Run cases
    for i in 14; do
        echo "Run for ${i}..."
        cd run_${i} || exit 1
        if [ "${i}" -eq 1 ]; then
            simpleFoam > log.simpleFoam 2>&1
        else
            # Remove "--bind-to core" to run without core binding.
            mpirun --hostfile ~/dev/cloud/bench_template/machines -np ${i} --bind-to core simpleFoam -parallel > log.simpleFoam 2>&1
        fi
        cd ..
    done

    # Extract times
    echo "# cores   Wall time (s):"
    echo "------------------------"
    for i in 14; do
        echo ${i} $(grep Execution run_${i}/log.simpleFoam | tail -n 1 | cut -d " " -f 3)
    done

done
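
The time extraction at the end of the script can also be sketched with awk, which splits on any run of whitespace instead of relying on single spaces as `cut` does. This is only a sketch: the function name `extract_time` and the sample log are made up for illustration, and it assumes the solver log contains the usual "ExecutionTime = ... s" lines.

```shell
#!/bin/bash
# Print the last ExecutionTime value (in seconds) found in an OpenFOAM solver log.
extract_time() {
    awk '/^ExecutionTime/ { t = $3 } END { if (t != "") print t }' "$1"
}

# Small made-up log, only to demonstrate the parsing.
sample_log=$(mktemp)
printf 'Time = 99\nExecutionTime = 120.5 s  ClockTime = 121 s\n' > "${sample_log}"
printf 'Time = 100\nExecutionTime = 122.07 s  ClockTime = 123 s\n' >> "${sample_log}"

echo "last ExecutionTime: $(extract_time "${sample_log}") s"
```

In the benchmark script this would be called as `extract_time run_${i}/log.simpleFoam`.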

Old   March 14, 2019, 08:27
Default
  #183
Member
 
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8
Astan is on a distinguished road
Hi Simbelmynė, thank you very much for the script. I'll try to run the simulation and post the results.

I've noticed the option "--bind-to-core". It is not the first time I have seen it; from what I've read on the internet it is used when dealing with clusters, but I don't really understand what it is used for.

Could you please explain what it does? Does the run time decrease compared with the usual "mpirun --hostfile etc." without the "bind-to-core" option?

Astan

Old   March 14, 2019, 08:48
Default
  #184
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 551
Rep Power: 16
Simbelmynė is on a distinguished road
bind-to-core was something that I tested when the performance was poor. In some cases it can dramatically improve performance, but in my case it did not. Test with it and/or without it.
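
A rough sketch of such a with/without comparison, assuming Open MPI: `--bind-to core` pins each MPI rank to one core so the scheduler cannot move it around, and `--report-bindings` prints where each rank ended up. The helper name `build_cmd` is made up for illustration, and the loop only prints the commands rather than running them.

```shell
#!/bin/bash
# Build the mpirun command line for a given process count,
# with or without core binding. The hostfile path is the one from the script above.
build_cmd() {
    local np=$1 bind=$2
    local cmd="mpirun --hostfile ~/dev/cloud/bench_template/machines -np ${np}"
    if [ "${bind}" = "yes" ]; then
        cmd="${cmd} --bind-to core --report-bindings"
    fi
    echo "${cmd} simpleFoam -parallel"
}

# Dry run: print both variants so they can be timed one after the other.
for bind in yes no; do
    echo "bind=${bind}: $(build_cmd 14 "${bind}")"
    # To actually run it inside the case directory:
    # eval "$(build_cmd 14 "${bind}")" > "log.simpleFoam.bind_${bind}" 2>&1
done
```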


Also, here is some code that you can add if you wish to have it automated:


Code:
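for (( t=0; t<4; t++ )); do

    # Cases A-C: 2, 3 and 4 processes on each machine.
    master=$((t+2))
    node=$((t+2))

    # Case D: 6 processes on the master, 4 on the node.
    if [ ${t} -eq 3 ]; then
        master=$((t+3))
        node=$((t+1))
    fi

    # Placeholder lines in the format used in the question; a real
    # Open MPI hostfile needs "<hostname> slots=<N>" entries instead.
    echo "Master cpu = ${master}" > machinesFile
    echo "node cpu = ${node}" >> machinesFile

done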
(change the top part of the original for loop and add the echo statements at the top)
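
For reference, a minimal sketch of the same idea that writes files in the "hostname slots=N" format Open MPI reads. The host names `master` and `node1`, the function name `write_hostfile` and the `machines_<np>` file names are assumptions made for illustration; the mpirun call is left as a comment because the case first has to be decomposed into the matching number of subdomains.

```shell
#!/bin/bash
# Hypothetical host names; replace them with the names of your own machines.
MASTER_HOST="master"
NODE_HOST="node1"

# Write an Open MPI hostfile with the given slot counts for the two machines.
write_hostfile() {
    local master_slots=$1 node_slots=$2 file=$3
    {
        echo "${MASTER_HOST} slots=${master_slots}"
        echo "${NODE_HOST} slots=${node_slots}"
    } > "${file}"
}

# Cases A-D from the question, written as master:node process counts.
outdir=$(mktemp -d)
for pair in 2:2 3:3 4:4 6:4; do
    m=${pair%%:*}
    n=${pair##*:}
    np=$((m + n))
    write_hostfile "${m}" "${n}" "${outdir}/machines_${np}"
    echo "np=${np} -> ${outdir}/machines_${np}"
    # The solver call would then look like this:
    # mpirun --hostfile "${outdir}/machines_${np}" -np "${np}" simpleFoam -parallel > log.simpleFoam 2>&1
done
```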

Old   March 15, 2019, 19:38
Default Also trying Cluster
  #185
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
wkernkamp is on a distinguished road
Hi Simbelmynė,


I am trying to build a cluster of quad-processor Dell R810s. I just got the second one (and the two-node cluster) working. However, it turns out this second one is much slower than the first: 50% slower on a single core and 200% slower at 32 cores. The important difference is that the first (good) one has all DIMM slots filled with 8 GB modules, while the second one has half of the slots filled with 16 GB RDIMMs (R4x4). The behaviour indicated a memory bottleneck, so I ran memory diagnostics. There were no issues. I also went ahead and did an overall system check. Everything shows a pass.


BIOS settings are the same on both machines, except that I also disabled prefetch this morning, with no effect. It looks like I am just getting half the bandwidth with half the slots filled. How can I get around that?


Will


P.S. The new machine has 4x E7-8870 2.4 GHz and the good machine has 4x E7-4870 2.4 GHz.

Old   March 16, 2019, 04:01
Default
  #186
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 551
Rep Power: 16
Simbelmynė is on a distinguished road
Perhaps the memory is incorrectly populated?


Thermal throttling on one or more CPUs?


Did you buy from a reputable source? If not, then perhaps you have engineering samples in the new setup?


What operating system do you run? Check the hardware info.



Btw, this question is probably better off in a new thread.
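
One way to check the memory population on Linux is to look at the output of `dmidecode -t memory`, which lists every DIMM slot with its size or "No Module Installed". The sketch below only counts populated and empty slots; the function name `count_dimms` and the sample excerpt are made up for illustration, and it assumes each slot is reported with a "Size:" line.

```shell
#!/bin/bash
# Count populated and empty DIMM slots from `dmidecode -t memory` output on stdin.
count_dimms() {
    awk '/^[[:space:]]*Size:/ {
             if ($0 ~ /No Module Installed/) empty++; else populated++
         }
         END { printf "populated=%d empty=%d\n", populated, empty }'
}

# Made-up excerpt in dmidecode's format, only to demonstrate the parsing.
sample='Memory Device
    Size: 16384 MB
    Locator: DIMM_A1
Memory Device
    Size: No Module Installed
    Locator: DIMM_A2'

printf '%s\n' "${sample}" | count_dimms
# On a real machine (needs root): sudo dmidecode -t memory | count_dimms
```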

Old   March 16, 2019, 09:46
Default
  #187
Member
 
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8
Astan is on a distinguished road
Thank you for the information, Simbelmynė. I'll try it and post the results!

Astan

Old   March 16, 2019, 14:35
Default
  #188
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
wkernkamp is on a distinguished road
A single Dell PowerEdge R810, 4x (E7-4870, 2.4 GHz, 10 cores/20 threads)
# cores   Wall time (s)
1 2138.42
2 1213.28
4 454.14
6 329.9
8 243.29
10 166.17
12 182.94
16 160.47
20 149.23
24 144.08
30 148.82
32 139.42
36 138.79
40 151.01


Two Dell PowerEdge R810, 4x (E7-4870, 2.4 GHz, 10 cores/20 threads)
First half of the processes on one node, the rest on the other.

# cores   Wall time (s)
8 263.8
16 155.51
24 122.77
32 105.56
40 102.47
48 86.83
64 78.25


The network is just 1 Gb Ethernet right now (switch limited), so I might be able to improve on this a bit. The difference at 8 cores is about 20 seconds (263.8 s on two nodes versus 243.29 s on one).

Last edited by wkernkamp; March 17, 2019 at 20:53.

Old   March 28, 2019, 21:28
Default Dell R710 benchmark
  #189
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
wkernkamp is on a distinguished road
Dell PowerEdge R710, 2x Xeon E5649 2.53 GHz (12 cores in total), 48 GB
Memory runs at 1066 MHz.
(Max memory speed is 1333 MHz; faster processors are available, such as the Xeon X5690.)

Performance with 2x E5649 is:
1 1486.54
2 880.04
4 422.03
6 342.61
8 317.83
12 307.18

Old   March 30, 2019, 11:57
Default
  #190
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 551
Rep Power: 16
Simbelmynė is on a distinguished road
i7-940 @ 4.2 GHz. DDR3 1600 MT/s, rank 2. Ubuntu 16.04. OpenFOAM v6



4 429.09


Interesting side note: I did not install anything on the computer, in fact I just moved the SSD from a different computer, OS and all (because I am lazy). Correct installation and building from source with a later gcc might improve the results a bit.


Quite impressed with the performance of such an old mainstream chip.

Old   March 30, 2019, 13:13
Default
  #191
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
Triple-channel memory?

Old   March 30, 2019, 14:41
Default
  #192
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 551
Rep Power: 16
Simbelmynė is on a distinguished road
Yes, triple channel.

Old   March 30, 2019, 15:34
Default
  #193
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
wkernkamp is on a distinguished road
What result does this refer to?

Old   April 2, 2019, 08:52
Default
  #195
New Member
 
Join Date: Aug 2018
Posts: 4
Rep Power: 8
hennings is on a distinguished road
Let me share results of two quad-socket machines.

4x Xeon E7-8857v2, 48x16GB 2Rx4 DDR3-1866, CentOS 7.2 (on VMware ESXi), OpenFOAM 2.3.x
Code:
# cores   Wall time (s)   speedup:
------------------------------------------------
1         981.09          1
2         468.17          2.09
4         233.54          4.20
6         161.95          6.05
8         121.9           8.04
12        87.34           11.23
16        67.97           14.43
20        59.46           16.5
24        54              18.16
28        50.64           19.37
32        47.39           20.70
4x Xeon Gold 6130, 24x32GB 2Rx4 DDR4-2666, CentOS 7.2 (on VMware ESXi), OpenFOAM 2.3.x
Code:
# cores   Wall time (s)   speedup:
------------------------------------------------
1         726.86          1
2         372.47          1.95
4         190.08          3.82
6         133.52          5.44
8         90.02           8.07
12        74.69           9.73
16        52.41           13.86
20        48.08           15.11
24        43.93           16.54
28        35.43           20.51
32        33.17           21.91
40        25.15           28.9
48        23.27           31.23
56        22.18           32.77
64        22.3            32.59
Overall, quite convincing results for a virtualized environment I'd say.
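
For anyone who wants to add a speedup column to their own numbers, here is a small sketch. It assumes the input is "cores time" pairs with the single-core run on the first line, and it computes speedup as T(1)/T(n) and parallel efficiency as speedup/n. The function name `speedup_table` is made up for illustration; the two input rows are taken from the E7-8857v2 table above.

```shell
#!/bin/bash
# Read "cores time" pairs on stdin (single-core run first) and print
# cores, time, speedup = T(1)/T(n) and efficiency = speedup/n.
speedup_table() {
    awk 'NR == 1 { t1 = $2 }
         { s = t1 / $2; printf "%s %s %.2f %.2f\n", $1, $2, s, s / $1 }'
}

# Two rows from the E7-8857v2 results.
printf '1 981.09\n32 47.39\n' | speedup_table
```

The efficiency column makes it easier to see where adding cores stops paying off.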

Old   April 17, 2019, 00:24
Default
  #196
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
wkernkamp is on a distinguished road
Quote:
Originally Posted by hennings View Post
Let me share results of two quad-socket machines.
Overall, quite convincing results for a virtualized environment I'd say.

Nice and interesting results. I have bought a few old R810 servers and found that you need to fill all DIMM slots, and that the rank of the DIMMs has to be right so that the total rank per channel is 8. Otherwise you lose 60% in speed on this benchmark. It looks like your setup is not optimal, but I did not dive into the manuals to see how many DIMM slots your machines have. I don't think the virtualization makes a big difference.


Let me know if you manage to speed it up even more!


Will

Old   April 17, 2019, 03:46
Default
  #197
New Member
 
Join Date: Aug 2018
Posts: 4
Rep Power: 8
hennings is on a distinguished road
Quote:
Originally Posted by wkernkamp View Post
Nice and interesting results. I have bought a few old R810 servers and found that you need to fill all DIMM slots, and that the rank of the DIMMs has to be right so that the total rank per channel is 8.
The memory cartridges are at least fully populated according to HP's best practices (HP DL580 Gen8 and HP DL560 Gen10). Still, there may be room for improvement...

Old   April 27, 2019, 22:11
Default R710 now with faster X5675 processor, 6% faster than E5649
  #198
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
wkernkamp is on a distinguished road
Dell PowerEdge R710
12x 4 GB RDIMM, 1067 MHz

Old result:

2x E5649, 2.53 GHz, 6 cores per CPU:
Meshing Times:
1 2319.46
2 1526.14
4 840.08
6 653.38
8 547.74
10 540.91
12 533.59
Flow Calculation:
1 1486.54
2 880.04
4 422.03
6 342.61
8 317.83
10 333.38
12 307.18



New result:

2x X5675, 3.07 GHz, 6 cores per CPU:

Meshing Times:
1 1998.08
2 1313.22
4 719.71
6 558.17
8 466.22
12 449.43
Flow Calculation:
1 1322.84
2 787.4
4 375.77
6 305.44
8 286.3
12 278.02

Last edited by wkernkamp; May 1, 2019 at 01:49.

Old   May 19, 2019, 02:57
Default 2x E5-4627 v2, RDIMM 16x 8 GB, 1866 MHz
  #199
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
wkernkamp is on a distinguished road
2x E5-4627 v2, RDIMM 16x 8 GB, 1866 MHz


Meshing Times:
1 1480.36
2 992.82
4 542.2
8 329.79
12 294.62
14 245
16 246.43
Flow Calculation:
1 938.32
2 506.25
4 236.32
8 131.57
12 108.1
14 102.96
16 101.14

Old   July 8, 2019, 22:24
Default
  #200
Member
 
Hector
Join Date: Jul 2010
Location: Barcelona
Posts: 30
Rep Power: 16
hectorgabriel85 is on a distinguished road
I am wondering about adding renumberMesh to the process, and how it would change the speed-up results and/or the absolute wall times.
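
A possible way to try this, sketched as a dry run: renumber the decomposed mesh in place with `renumberMesh -overwrite` before starting the solver, then compare the solver time against a run without that step. The helper names `run` and `renumber_then_solve` and the `RUN` switch are made up for illustration; nothing here has been timed.

```shell
#!/bin/bash
# Dry-run helper: prints the command unless RUN=1 is set, in which case it executes it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Renumber the decomposed mesh in place, then start the solver.
renumber_then_solve() {
    local np=$1
    run mpirun -np "${np}" renumberMesh -overwrite -parallel
    run mpirun -np "${np}" simpleFoam -parallel
}

renumber_then_solve 4
```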