October 25, 2024, 15:54 |
Performance of Epyc Turin
|
#801 |
Member
dab bence
Join Date: Mar 2013
Posts: 48
Rep Power: 13 |
An AMD performance brief compares the EPYC Turin 9755 (2 x 128 cores) with the Genoa 9654 (2 x 96 cores) and reports a 43% uplift on the composite of OpenFOAM benchmarks AMD chose.
http://www.amd.com/content/dam/amd/e...b-openfoam.pdf |
|
October 29, 2024, 02:41 |
|
#802 |
New Member
Marc
Join Date: Mar 2022
Posts: 6
Rep Power: 4 |
Hi all,
I'm trying to run the benchmark attached to the original post ("bench_template.tar.gz") on my PC:
2 x Intel Xeon E5-2690 v4 (14 cores, 2.6 GHz, 35 MB L3) | 8 x 16 GB DDR4 ECC | 1 TB HDD | Ubuntu 24.04 LTS | OpenFOAM v2312
It looks like snappyHexMesh is failing to create the mesh. Is there an updated version of the benchmark, maybe? |
|
October 29, 2024, 04:18 |
|
#803 |
Senior Member
andy
Join Date: May 2009
Posts: 327
Rep Power: 18 |
It didn't work for me either (Ubuntu 24.04) but the test case seemed to be one of the tutorials with, if memory serves, an increased grid density so I ran that. I'm not in the office at the moment but will look for the script I used when I am back.
|
|
October 29, 2024, 06:53 |
|
#804 |
Senior Member
andy
Join Date: May 2009
Posts: 327
Rep Power: 18 |
OK some of what I did is coming back but I am not an openfoam user and was hacking to get something working rather than carefully setting up a benchmark.
Versions 11 and 12 of OpenFOAM are organised significantly differently and require different scripts. I don't know about earlier versions. I ran version 12 with something like this (change the lists of processor counts for meshing, solving and timing extraction to taste): Code:
#!/bin/bash

# PREPROCS="1 2 4 8 16"
# RUNPROCS="1 2 4 8 16"
PREPROCS=""
RUNPROCS="1"
TIMPROCS="1 2 4 8 16"

# Prepare cases
# This example runs on 1, 2 and 4 cores
for i in $PREPROCS; do
    d=run_$i
    echo "Prepare case ${d}..."
    cp -r basecase $d
    cd $d
    cp $FOAM_TUTORIALS/resources/geometry/motorBike.obj.gz constant/geometry/
    surfaceFeatures > log.surfaceFeatures 2>&1
    blockMesh > log.blockMesh 2>&1
    if [ $i -eq 1 ]
    then
        snappyHexMesh -overwrite > log.snappyHexMesh 2>&1
    else
        sed -i "s/numberOfSubdomains.*/numberOfSubdomains ${i};/" system/decomposeParDict
        decomposePar -copyZero > log.decomposePar 2>&1
        mpirun -np ${i} snappyHexMesh -overwrite -parallel > log.snappyHexMesh 2>&1
    fi
    cd ..
done

# Run cases
for i in $RUNPROCS; do
    echo "Run for ${i}..."
    cd run_$i
    if [ $i -eq 1 ]
    then
        potentialFoam > log.potentialFoam 2>&1
        foamRun -solver incompressibleFluid > log.incompressibleFluid 2>&1
    else
        # mpirun -np ${i} patchSummary -parallel > log.patchSummary 2>&1
        mpirun -np ${i} potentialFoam -parallel > log.potentialFoam 2>&1
        mpirun -np ${i} foamRun -solver incompressibleFluid -parallel > log.incompressibleFluid 2>&1
        reconstructPar -latestTime > log.reconstructPar 2>&1
        # foamRun -solver incompressibleFluid -parallel
        #mpirun -np ${i} foamRun -solver incompressibleFluid -parallel > log.simpleFoam 2>&1
    fi
    cd ..
done

# Extract times
echo "# cores   Wall time (s):"
echo "------------------------"
for i in $TIMPROCS; do
    echo $i `grep Execution run_${i}/log.incompressibleFluid | tail -n 1 | cut -d " " -f 3`
done
Code:
#!/bin/bash

# Prepare cases
# This example runs on 1, 2 and 4 cores
for i in 1 2 4; do
    d=run_$i
    echo "Prepare case ${d}..."
    cp -r basecase $d
    cd $d
    if [ $i -eq 1 ]
    then
        mv Allmesh_serial Allmesh
    fi
    sed -i "s/method.*/method scotch;/" system/decomposeParDict
    sed -i "s/numberOfSubdomains.*/numberOfSubdomains ${i};/" system/decomposeParDict
    time ./Allmesh
    cd ..
done

# Run cases
for i in 1 2 4; do
    echo "Run for ${i}..."
    cd run_$i
    if [ $i -eq 1 ]
    then
        simpleFoam > log.simpleFoam 2>&1
    else
        mpirun -np ${i} simpleFoam -parallel > log.simpleFoam 2>&1
    fi
    cd ..
done

# Extract times
echo "# cores   Wall time (s):"
echo "------------------------"
for i in 1 2 4; do
    echo $i `grep Execution run_${i}/log.simpleFoam | tail -n 1 | cut -d " " -f 3`
done

system/controlDict:
< endTime 500;
> endTime 100;
system/blockMeshDict:
< hex (0 1 2 3 4 5 6 7) (20 8 8) simpleGrading (1 1 1)
> hex (0 1 2 3 4 5 6 7) (40 16 16) simpleGrading (1 1 1)
system/decomposeParDict:
< numberOfSubdomains 16;
> numberOfSubdomains 2;

The first two change the number of iterations and the mesh size to match the original benchmark. I'm not sure about the third, but I may have been fiddling. I am not an OpenFOAM user and was making guesses at the likely meaning of parameters. What is really needed is for an OpenFOAM user to diff the earlier benchmark against the current tutorial and keep the parameter changes that are relevant. In any case, my benchmark results are broadly in line with expectations, so if there are differences they are small.
Last edited by andy_; October 29, 2024 at 12:41. |
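The `grep Execution ... | tail -n 1 | cut -d " " -f 3` step in both scripts depends on the exact spacing of the solver's `ExecutionTime = <seconds> s` log line. Below is a minimal sketch of a slightly more defensive extraction, assuming that log format. `last_exec_time` is a hypothetical helper name and not part of OpenFOAM, and the log fragment is made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical helper: print the last ExecutionTime value (in seconds) found
# in a solver log, or "n/a" if the log contains no such line.
# Assumes lines of the form "ExecutionTime = 12.34 s  ClockTime = 13 s".
last_exec_time() {
    awk '$1 == "ExecutionTime" && $2 == "=" { t = $3 }
         END { if (t != "") print t; else print "n/a" }' "$1"
}

# Demonstration on a made-up log fragment (values are illustrative only).
demo_log=$(mktemp)
printf 'Time = 1\nExecutionTime = 1.5 s  ClockTime = 2 s\n' > "$demo_log"
printf 'Time = 2\nExecutionTime = 3.25 s  ClockTime = 4 s\n' >> "$demo_log"
last_exec_time "$demo_log"
```

In the timing loop this could replace the backtick pipeline, e.g. `echo $i $(last_exec_time run_${i}/log.simpleFoam)`, so that a run which never reached the solver shows up as "n/a" rather than an empty field.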
|
October 29, 2024, 19:56 |
|
#805 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
I find it extremely annoying that the basic call for the simpleFoam solution has been changed by openfoam.org. It has remained the same for OpenFOAM.com (OpenFOAM v2312). I remember losing all interest in the Ruby language because they kept changing it so that you had to rewrite your code for every version. Developers who don't know that better is the enemy of good should be shot to save us all a lot of time. |
||
October 30, 2024, 06:24 |
|
#806 |
Senior Member
andy
Join Date: May 2009
Posts: 327
Rep Power: 18 |
||
October 30, 2024, 22:24 |
|
#807 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
I looked at your previous posts. I was confused because some are quotes of other people's results. Is your current system the one doing just over 100 seconds? If it is, you can do ~64 seconds with two CPUs of 16 or more cores each. My fastest one does 60 seconds. They have the same total of 8 memory channels at 2400 MT/s as you. However, they have much more L3 cache, and that is a factor too. Upgrade your BIOS to the latest version before installing the high-core-count CPUs!
|
|
October 31, 2024, 05:35 |
|
#808 |
New Member
Marc
Join Date: Mar 2022
Posts: 6
Rep Power: 4 |
Hi all,
I finally modified the motorbike tutorial to match the configuration of the benchmark from the original post. I've attached the modified code, which worked for v2312. These are the results I got for this PC config: HP Z840 | 2 x Intel Xeon E5-2690 v4 (14 cores, 2.6 GHz, 35 MB L3) | 8 x 16 GB DDR4 ECC | 1 TB HDD | Ubuntu 24.04 LTS Code:
cores  MeshTime(s)  RunTime(s)
-----------------------------------
1      1403.79      1098.68
2      949.89       551.16
4      495.73       246.11
6      361.35       163.72
8      293.58       128.46
12     244.06       99.28
16     229.99       84.12
20     186.59       78.14
24     183.3        74.44
28     177.25       72.7
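To read a table like the one above in terms of scaling, speedup and parallel efficiency relative to the single-core run can be computed directly from the RunTime column. This is a small sketch, not part of the benchmark scripts; `scaling_table` is a hypothetical helper that expects "cores time" pairs on stdin, with the first row taken as the baseline.

```shell
#!/bin/bash
# Hypothetical helper: for each "cores time" row on stdin, print
#   cores  speedup  efficiency
# where speedup = t_first / t and efficiency = speedup * cores_first / cores.
scaling_table() {
    awk 'NR == 1 { c1 = $1; t1 = $2 }
         { s = t1 / $2; printf "%d %.2f %.0f%%\n", $1, s, 100 * s * c1 / $1 }'
}

# RunTime(s) values for 1, 8 and 28 cores, copied from the table above.
printf '1 1098.68\n8 128.46\n28 72.7\n' | scaling_table
```

For these three rows the output is `1 1.00 100%`, `8 8.55 107%` and `28 15.11 54%`. Efficiency slightly above 100% at low core counts is commonly attributed to cache effects, and the drop at 28 cores is consistent with the memory-channel limit discussed earlier in the thread, though the numbers alone do not prove either cause.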
|
November 10, 2024, 15:56 |
Mac M4 Clusters ?
|
#809 |
Member
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7 |
The new Mac Mini M4 is very fast and really cheap, and can be purchased with a 10 Gb Ethernet port.
How would a cluster (8 or 16) of Minis perform using a 10 Gb Ethernet backbone? The plain Mini also has Thunderbolt 4 ports that can transfer data at 40 Gb/s, while the Pro Mini has Thunderbolt 5 ports that can transfer data at up to 120 Gb/s. I bet that a special router could be designed to give these machines incredible backbone bandwidth. Thunderbolt encapsulates PCIe. https://en.wikipedia.org/wiki/Thunderbolt_(interface) |
|
November 10, 2024, 17:40 |
|
#810 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
|
||
November 11, 2024, 07:01 |
|
#811 | |
Senior Member
andy
Join Date: May 2009
Posts: 327
Rep Power: 18 |
Quote:
So I contacted Apple to get some values for relevant benchmarks, rather than the irrelevant ones PC publications tended to use and Apple was using in its advertising to demonstrate how much "faster" its chip was compared with current Intel chips. They didn't have any. So I asked to be put through to their internal technical support. Extraordinarily (to naive me) they didn't have that either. Technical support was provided by third parties, so they put me through to a chain of shops which did indeed have a small technical support department. Unfortunately it was technical support for what Apple customers tend to want to do with Apple computers (e.g. generating media using point and click) rather than crunching numbers. They were happy to give me access to the hardware, but they had little idea what I was on about, and when I sat down to compile and run some benchmarks the Apple development environment had not even been installed.
As expected, the benchmarks ran fast on tiny problems but slowly on normal-sized problems. The department's cluster ended up using fairly expensive motherboards with fast memory support and the cheapest Intel chips (i.e. lowest clock speed) that supported it.
Given how Apple operates, its target market and how it prices things, the possibility of any Apple hardware offering generally high technical performance for the money is pretty low. It is not zero, though, and given the effectiveness of their marketing, people looking to purchase clusters to crunch numbers will benefit from relevant hard evidence (unless they are fanboys, of course). Clusters of ARM chips may well be about to become a good choice for CFD, but I rather doubt Apple will be the supplier because of their pricing.
Perhaps I should add that Apple may be a reasonable choice for a desktop if CFD is only part of what is done with the machine. Indeed, for 6 years I used an Apple laptop for office, lab and presenting, but less so for software development or running simulations like CFD. |
||
November 11, 2024, 12:43 |
|
#812 | ||||
Member
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7 |
A lot has changed since then.
Quote:
Quote:
Quote:
10 cores, 16 GB RAM, 256 GB SSD, 3 Thunderbolt 4 ports, US$600. 10 Gb Ethernet can be added for $125. Please show me a faster unit of computing for less money. Quote:
I am not an Apple fan. I don't own a single Apple product. I'm just looking for the cheapest way to run CFD cases fast. If you can show why an M4 Mac Mini won't do that then I am all ears. Otherwise you are adding nothing to this conversation. |
|||||
November 11, 2024, 12:54 |
|
#813 | |
Member
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7 |
Quote:
How big is a "small" cluster? 10 nodes? 20? 32?
M4 Mac Minis supposedly have a memory bandwidth of "120 GB/s". M4 Pro Mac Minis are listed at 273 GB/s; the "over half a terabyte/sec" figure is for the M4 Max, which the Mini does not offer. https://en.wikipedia.org/wiki/Apple_M4
AMD EPYC Rome (Zen 2, 7002 series) has a memory bandwidth of ~200 GB/s per socket: 8 channels of DDR4-3200. |
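The ~200 GB/s figure for EPYC Rome follows from simple arithmetic: channels times transfer rate times 8 bytes per transfer for a 64-bit DDR channel. A minimal sketch of that calculation is below; `peak_bw` is a hypothetical helper, and the result is a theoretical peak, not a measured bandwidth. Apple's figures are vendor-quoted for unified LPDDR memory with a different bus layout, so this formula is not applied to them here.

```shell
#!/bin/bash
# Hypothetical helper: theoretical peak DRAM bandwidth in GB/s, assuming
# standard 64-bit (8-byte) DDR channels:
#   channels * transfer rate (MT/s) * 8 bytes / 1000
peak_bw() {
    awk -v ch="$1" -v mts="$2" 'BEGIN { printf "%.1f\n", ch * mts * 8 / 1000 }'
}

peak_bw 8 3200   # single-socket EPYC Rome: 8 channels of DDR4-3200
peak_bw 8 2400   # 8 channels in total at 2400 MT/s, as on the dual Xeon v4 systems above
```

The two calls print 204.8 and 153.6, which gives a rough yardstick for comparing the x86 machines in this thread; sustained bandwidth in a CFD run will be lower.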
||
November 12, 2024, 20:01 |
|
#814 |
Senior Member
Join Date: Jun 2016
Posts: 102
Rep Power: 10 |
M4 Mac mini base model, 4P+6E
# cores   Wall time (s):
------------------------
1 315.54
2 191.29
4 118.64
8 111.61
The efficiency cores are kind of useless. Can't wait to see M4 Pro/M4 Max results. |
|
November 12, 2024, 20:10 |
|
#815 | |
Member
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7 |
Quote:
10 M4 Minis in a cluster would be 15 seconds ? |
||
November 12, 2024, 21:05 |
|
#816 |
Senior Member
Join Date: Jun 2016
Posts: 102
Rep Power: 10 |
||
November 12, 2024, 23:04 |
|
#817 |
Member
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7 |
||
November 12, 2024, 23:07 |
|
#818 |
Senior Member
Join Date: Jun 2016
Posts: 102
Rep Power: 10 |
||
November 15, 2024, 05:31 |
|
#819 |
Member
|
||
November 17, 2024, 18:32 |
|
#820 |
New Member
Kevin Nolan
Join Date: Nov 2012
Posts: 13
Rep Power: 14 |
So I've gotten my hands on a dual Xeon E5-2630 v3 workstation with 128 GB (8 x 16 GB) of ECC RAM at 2133 MT/s. It's an ASUS Z10PA-D8 motherboard.
I've installed Ubuntu 24.04 and OpenFOAM v2406 and ran gumersindu's updated benchmark. Code:
cores  MeshTime(s)  RunTime(s)
-----------------------------------
1      1692.75      1095.52
2      1161.71      561.19
4      575.49       252.13
6      449.91       172.21
8      371.42       140.73
12     296.43       111.46
16     272.53       98.67
I've also got an M4 Pro (12 Core 48GB) Mac Mini on the way. For reference here is my M3 Max again. Code:
cores  MeshTime(s)  RunTime(s)
-----------------------------------
1      510.43       377.13
2      311.33       209.7
4      195.35       110.33
6      145.09       77.5
8      124.87       63.6
12     125.53       81.98
Last edited by Kolan; November 17, 2024 at 19:11. Reason: updated 6 and 12 core runs for the M3 Max |