
OpenFOAM benchmarks on various hardware

Old   August 23, 2018, 10:21
Default
  #121
New Member
nmc1988
Join Date: Nov 2016
Posts: 15
Quote:
Originally Posted by Simbelmynë View Post
Here's the modified script file. There is no fail-safe to check if the meshes exist, so you have to run the cases at least once before using this. It is reasonably easy to add that check, though.


Code:
#!/bin/bash

# Run the 2- and 4-core cases three times each and report the final wall times.
for (( t=0; t<3; t++ )); do

    # Clear old data
    for i in 2 4; do
        cd run_${i}
        # Remove the last time directory from a previous run, if present.
        if [ -d ./100 ]; then
            rm -r 100
        fi
        # Do the same inside each processor directory.
        for (( c=0; c<i; c++ )); do
            if [ -d ./processor${c} ]; then
                cd processor${c}
                if [ -d ./100 ]; then
                    rm -r 100
                fi
                cd ..
            fi
        done
        # Remove the old solver log, if present.
        if [ -f ./log.simpleFoam ]; then
            rm log.simpleFoam
        fi
        cd ..
    done

    # Run cases (the serial branch only triggers if 1 is added to the core list).
    for i in 2 4; do
        echo "Run for ${i}..."
        cd run_${i}
        if [ $i -eq 1 ]; then
            simpleFoam > log.simpleFoam 2>&1
        else
            mpirun -np ${i} simpleFoam -parallel > log.simpleFoam 2>&1
        fi
        cd ..
    done

    # Extract times (the value grabbed is the solver's ExecutionTime).
    echo "# cores   Wall time (s):"
    echo "------------------------"
    for i in 2 4; do
        echo $i `grep Execution run_${i}/log.simpleFoam | tail -n 1 | cut -d " " -f 3`
    done

done
It works well. Thank you.
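
As an aside, a minimal sketch of the missing fail-safe from the quoted post (checking that the meshes actually exist before clearing and running) might look like the following; it assumes each case keeps its mesh in the usual constant/polyMesh location and uses the same run_${i} directory names as the script:

Code:
# Abort early if any case is missing its mesh (sketch only).
for i in 2 4; do
    if [ ! -d run_${i}/constant/polyMesh ]; then
        echo "No mesh found in run_${i} -- generate the mesh there first." >&2
        exit 1
    fi
done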

Old   August 25, 2018, 22:16
Default
  #122
New Member
cody (codygo)
Join Date: Mar 2009
Posts: 9
First run on 1x Epyc 7351, 8x 16 GB dual-rank memory @ 2666 MHz, H11DSI-NT motherboard. The benchmark ran on OpenFOAM v1806, DP, without the streamline functions.

Code:
# cores   Exec. time (s)  Clocktime (s)
--------------------------------------------------
1            1023.46         1024
2            545.19          545
4            222.54          223
6            157.33          158
8            115.97          116
12           98.95           99
16           76.71           77 
20           85.53           108
24           64.33           99
I decided to relabel the columns as Exec. time and Clocktime because the results over 16 cores show deceptively "better" execution times. I was watching the runs complete in top, and the runs with more than 16 processes were indeed slower.
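
For anyone reproducing this, both values can be pulled from the end of the solver log in one go, assuming the usual "ExecutionTime = ... s  ClockTime = ... s" line that simpleFoam prints:

Code:
# Print the final execution time and clock time from the solver log.
grep "ExecutionTime" log.simpleFoam | tail -n 1 | awk '{print "Exec:", $3, "s  Clock:", $7, "s"}'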

Same hardware on Windows 10 Pro, OpenFOAM v1806 running natively in bash on Windows. Half the speed!
Code:
# cores   Wall time (s):
------------------------
1           1605.89
2           598.03
4           320.08
6           228.38
8           188.62
12          162.96
16          154.91

Last edited by codygo; August 30, 2018 at 23:47. Reason: Added Windows Results

Old   August 29, 2018, 17:22
Default 1920x results
  #123
Member
Geir Karlsen (gkarlsen)
Join Date: Nov 2013
Location: Norway
Posts: 59
Some really good deals on first-generation Threadrippers out there now. Here are my results for the 1920X (stock): 3200 MT/s RAM, Ubuntu 18.04 LTS, OpenFOAM 6, SMT off.

# cores Wall time (s):
------------------------
1 779.33
2 391.56
4 218.26
6 180.71
8 155.25
10 149.34
12 142.2

Old   August 29, 2018, 20:17
Default
  #124
New Member
nmc1988
Join Date: Nov 2016
Posts: 15
Quote:
Originally Posted by gkarlsen View Post
Some really good deals on first-generation Threadrippers out there now. Here are my results for the 1920X (stock): 3200 MT/s RAM, Ubuntu 18.04 LTS, OpenFOAM 6, SMT off.

# cores Wall time (s):
------------------------
1 779.33
2 391.56
4 218.26
6 180.71
8 155.25
10 149.34
12 142.2
It looks better than the 1950X (see the result on the first page).

Old   August 30, 2018, 05:39
Default
  #125
Senior Member
Simbelmynë
Join Date: May 2012
Posts: 551
Quote:
Originally Posted by gkarlsen View Post
Some really good deals on first-generation Threadrippers out there now. Here are my results for the 1920X (stock): 3200 MT/s RAM, Ubuntu 18.04 LTS, OpenFOAM 6, SMT off.

# cores Wall time (s):
------------------------
1 779.33
2 391.56
4 218.26
6 180.71
8 155.25
10 149.34
12 142.2



Yeah, the results are better than the 1950X we have. Exactly what memory did you use? Is it single or dual rank?

Old   August 30, 2018, 06:46
Default
  #126
Member
Geir Karlsen (gkarlsen)
Join Date: Nov 2013
Location: Norway
Posts: 59
Quote:
Originally Posted by Simbelmynë View Post
Yeah, the results are better than the 1950X we have. Exactly what memory did you use? Is it single or dual rank?
I use 4x 8 GB of G.Skill FlareX 3200 MT/s, CL 14-14-14-34. Model no.: F4-3200C14Q-32GFX.

SMT off improved my results by a tiny bit, as did flashing the BIOS for the most recent AGESA. I also had significantly better results with OpenFOAM 6 packaged for Ubuntu than with v1806 through Docker (which makes sense, I guess). Which OpenFOAM version did you use?

Some day I will try pushing the RAM and/or CPU to higher clocks to see if there is any gain to be had. I have decent thermal headroom with the Enermax TR4 AIO.
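
A quick way to confirm that SMT really is off from inside Linux (assuming lscpu is available, which it is on a stock Ubuntu install):

Code:
# With SMT disabled this should report 1 thread per core.
lscpu | grep "Thread(s) per core"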

Old   October 7, 2018, 14:37
Default
  #127
Member
Ed O'Malley (edomalley1)
Join Date: Nov 2017
Posts: 30
Here are some results from the slowest end of the spectrum. I ran this on a brand-new HP Spectre x360 13-inch touchscreen convertible laptop: i7-8550U 4-core 1.8 GHz (up to 4.0 GHz turbo) processor with 8 MB cache, 16 GB LPDDR3-2133 memory.

OpenFOAM 6 run on Windows 10 with the Ubuntu app.

Code:
# cores   Wall time (s):
------------------------
1        1434.42
2        575.16
4        469.43

Old   October 9, 2018, 16:32
Default
  #128
Senior Member
naffrancois
Join Date: Oct 2011
Posts: 242
Hello, OpenFOAM v1806 Linux binaries:


2x Intel Xeon Gold 6148 + 12x 8 GB 2666 MHz single-rank
# cores Wall time (s):
------------------------
10 105.82
20 64.77
40 49.28

Old   October 11, 2018, 19:56
Default
  #129
New Member
DÆdalus
Join Date: Oct 2018
Posts: 3
Hi. Are there any benchmarks of the new Ryzen Threadripper 2990WX with 32 cores and 64 threads? I would like to see how it scales on some time-dependent 3D fluid dynamics problem with an MPI implementation.

Old   October 11, 2018, 20:12
Default
  #130
Super Moderator
Alex (flotus1)
Join Date: Jun 2012
Location: Germany
Posts: 3,427
You can take a look at the Threadripper results that are already there. It won't get any better with a 2990WX.

Old   October 11, 2018, 21:12
Default
  #131
New Member
DÆdalus
Join Date: Oct 2018
Posts: 3
Quote:
Originally Posted by flotus1 View Post
You can take a look at the Threadripper results that are already there. It won't get any better with a 2990WX.
OK, thank you. Should I expect it to scale linearly over the 64 threads, as if it were 64 independent processors? BTW, I would like to know what kind of algorithms are being tested and how the algorithm is parallelized. Are there any benchmarks with time-dependent domain decomposition methods?

Old   October 12, 2018, 01:27
Default
  #132
Senior Member
Simbelmynë
Join Date: May 2012
Posts: 551
Quote:
Originally Posted by DÆdalus View Post
OK, thank you. Should I expect it to scale linearly over the 64 threads, as if it were 64 independent processors? BTW, I would like to know what kind of algorithms are being tested and how the algorithm is parallelized. Are there any benchmarks with time-dependent domain decomposition methods?

No. You can expect it to scale according to memory bandwidth. In the case of Threadripper and 3200 MHz memory we see that up to 8 cores give nice scaling and at 12 cores we hit the wall.



Regarding time-dependent benchmarks I suggest the damBreak 3D. We could open a separate thread for that.

Old   October 12, 2018, 05:41
Default
  #133
Super Moderator
Alex (flotus1)
Join Date: Jun 2012
Location: Germany
Posts: 3,427
tl;dr: don't buy Threadripper CPUs with more than 16 cores for CFD. They are designed for the opposite end of the computing spectrum: compute-bound algorithms.

Quote:
Should I expect it to scale linearly over the 64 threads as if it were 64 independent processors?
Absolutely not. The best-case scenario would be that you get the same performance and scaling as with a TR 1950X. And that will only happen if you manage to keep threads away from the "rendering cores" that have no direct memory access.

Quote:
BTW, I would like to know what kind of algorithms are being tested, how is the algorithm parallelized
Parallelization is done via MPI with a static domain decomposition. IIRC, METIS is used by default.
I don't know the exact solver algorithm used in this benchmark; we would both have to look it up.
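
For reference, the decomposition is configured per case in system/decomposeParDict and applied with decomposePar before the parallel run. A minimal sketch is shown below; the method and subdomain count are illustrative and not necessarily what the benchmark case actually ships with:

Code:
# Write a minimal decomposition dictionary and decompose the case (sketch only).
cat > system/decomposeParDict <<'EOF'
FoamFile { version 2.0; format ascii; class dictionary; object decomposeParDict; }

numberOfSubdomains 16;      // match the -np value passed to mpirun
method             scotch;  // other methods include hierarchical, simple, metis
EOF
decomposePar > log.decomposePar 2>&1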

Old   October 12, 2018, 12:22
Default
  #134
New Member
DÆdalus
Join Date: Oct 2018
Posts: 3
Great, thank you both. What would be the technical reason why it wouldn't scale? Is there latency related to the message passing? I would like to understand what you said about the memory bandwidth.

Old   October 12, 2018, 13:02
Default
  #135
Super Moderator
Alex (flotus1)
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Building a "home workstation" for ANSYS Fluent
Epyc 7301 ws

Parallel CFD codes are usually bandwidth limited. This means that the code balance (memory traffic per floating point operation) is rather high. In other words: the amount of actual calculations is low.
Most current CPUs have a much lower machine balance (peak memory bandwidth divided by peak floating point operations per second). So while doing CFD calculations, most of the time is spent waiting for data from memory. This is what usually causes poor scaling beyond 2-3 cores per memory channel.
This imbalance gets worse the more cores a CPU has, rendering CPUs with very high core counts basically useless for CFD. You could have bought a cheaper CPU with fewer cores and gotten more or less the same performance.
Threadripper CPUs with 24 and 32 cores have another issue on top of that: half of their cores have no direct connection to memory. See my second link.
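
As a rough illustration with assumed numbers (a hypothetical desktop CPU with two channels of DDR4-2666 and 8 cores at 3.0 GHz, each doing 16 double-precision FLOP per cycle with AVX2 FMA):

Code:
peak memory bandwidth:   2 channels x 21.3 GB/s             ~  42.6 GB/s
peak FP throughput:      8 cores x 3.0 GHz x 16 FLOP/cycle  ~ 384 GFLOP/s
machine balance:         42.6 / 384                         ~ 0.11 byte/FLOP
A memory-bound CFD kernel typically needs on the order of a byte or more of memory traffic per FLOP, so the memory channels saturate long before all cores are busy.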

Old   October 18, 2018, 06:29
Default
  #136
Senior Member
Simbelmynë
Join Date: May 2012
Posts: 551
Finally after many weeks of waiting (the motherboard was DOA) I got our new system up and running.

2x EPYC 7301, 16x 8 GB DDR4-2666, Ubuntu 18.04.1, OpenFOAM 6

Code:
# cores   Wall time (s):
------------------------
1   704.64
2   358.31
4   169.83
6   118.95
8   91.48
16  51.13
32  32.76
*** Update *** When comparing against other results here, I became a bit skeptical of these numbers. They seemed too good. The reason may be that I used the motorBike base case from the OpenFOAM 6 tutorial directories, since the original base case did not work. Everything ran fine, but it seems that 500 rather than 100 iterations were taken (and obviously something else must have differed as well, since that alone would make the times worse, not better).

It turns out that the snappyHexMesh run needs some etc files that are located not in /caseDicts but in /caseDicts/mesh/. I am re-running the cases now with the standard base case.
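
For anyone hitting the same problem, a rough way to check whether every #includeEtc reference in the case resolves is sketched below; it assumes the OpenFOAM environment is sourced so that foamEtcFile is on the PATH, and the parsing is deliberately loose:

Code:
# Flag any #includeEtc entries in system/ that foamEtcFile cannot locate.
grep -rh "#includeEtc" system/ | tr -d '";' | awk '{print $2}' | while read f; do
    foamEtcFile "$f" > /dev/null 2>&1 || echo "missing etc file: $f"
done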


Here are the correct (and more reasonable) results
Code:
# cores   Wall time (s):
------------------------
1   1023.87
2   507.03
4   237.98
6   151.23
8   123.04
16  57.86
32  36.81

Last edited by Simbelmynë; October 18, 2018 at 11:38. Reason: Possible errors in the calculation

Old   October 18, 2018, 06:33
Default
  #137
New Member
nmc1988
Join Date: Nov 2016
Posts: 15
Quote:
Originally Posted by Simbelmynë View Post
Finally after many weeks of waiting (the motherboard was DOA) I got our new system up and running.

2x EPYC 7301, 16x 8 GB DDR4-2666, Ubuntu 18.04.1, OpenFOAM 6

Code:
# cores   Wall time (s):
------------------------
1   704.64
2   358.31
4   169.83
6   118.95
8   91.48
16  51.13
32  32.76
The results look great. How much is your new system?

Old   October 18, 2018, 08:35
Default
  #138
Senior Member
Simbelmynë
Join Date: May 2012
Posts: 551
Quote:
Originally Posted by nmc1988 View Post
The results look great. How much is your new system?

Do you mean price? Around 5000 Euro, excluding tax. The memory was approximately 50% of the total cost (only 128 GB).

Old   October 18, 2018, 09:11
Default
  #139
New Member
nmc1988
Join Date: Nov 2016
Posts: 15
Quote:
Originally Posted by Simbelmynë View Post
Do you mean price? Around 5000 Euro, excluding tax. The memory was approximately 50% of the total cost (only 128 GB).


Yes, I mean the price! DDR4 memory is still expensive. I am considering buying a secondhand 2x 2690 v2 system priced at around $1500. What do you think?

Old   October 18, 2018, 11:40
Default
  #140
Senior Member
Simbelmynë
Join Date: May 2012
Posts: 551
The 2690v2 is a great price/performance choice if you can accept buying a refurbished system!
