|
May 27, 2022, 15:30 |
|
#501 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Well, I had a new workstation to play around with. Unfortunately, I can't get the benchmark to run properly.
I tried both compiling v2112 from source and using the OpenFOAM v2112 installation from the OpenSUSE science repository. The solver runs, but the mesh is not created properly, leading to solver run times of ~16 s on a single core. I also used the bench_template_v02.zip provided by Simbelmynė; the problems are the same. Here are the mesh logs from the single-core directory: blockMesh.txt, decomposePar.txt, snappyHexMesh.txt, surfaceFeatures.txt. Maybe one of you can point me in the right direction. |
|
May 28, 2022, 01:26 |
|
#502 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
I have run OF v2112. My meshQualityDict in system has this includeEtc:
Code:
#includeEtc "caseDicts/meshQualityDict"
You seem to have the one that calls out caseDicts/mesh/generation/meshQualityDict, as shown in your snappyHexMesh.txt log. That may be the path for OpenFOAM v9 (not sure). If this doesn't solve it, I will upload my entire basecase directory. Just let me know.
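A quick way to check which template path a given installation actually ships (just a sketch; foamEtcFile and $WM_PROJECT_DIR are available once the OpenFOAM environment is sourced):
Code:
# Resolve the include the way OpenFOAM does; prints the full path if the file exists
foamEtcFile caseDicts/meshQualityDict
foamEtcFile caseDicts/mesh/generation/meshQualityDict

# Or inspect the install tree directly
ls $WM_PROJECT_DIR/etc/caseDicts/
Whichever of the two resolves is the include your meshQualityDict should use. |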
|
May 28, 2022, 05:39 |
|
#503 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Thanks, I changed that line in meshQualityDict.
Unfortunately, that didn't do the trick. If you could provide me with a basecase and run script known to work with 2112, that would be great. |
|
May 28, 2022, 13:46 |
|
#504 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Here it is. Run it with run.tst. The file has a list of numbers of nodes at the beginning. A little further down you can set prep=0 to avoid recalculating the mesh if you already have a valid mesh. In the loop for running OpenFOAM itself, I remove the simpleFoam log files etc. to allow a rerun to proceed. On the first try, these files are not there yet, so you will see an error message that you can ignore.
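In rough outline, the loop looks something like this (a simplified sketch rather than the literal script; directory and log names are placeholders):
Code:
#!/bin/bash
# Simplified sketch of the run loop - not the literal run.tst.
cores="1 2 4 8 16 32"   # list of core counts at the top of the file
prep=1                  # set prep=0 to skip meshing if a valid mesh already exists

for n in $cores; do
    dir="run_$n"        # placeholder: one case directory per core count
    if [ "$prep" -eq 1 ]; then
        # meshing steps (blockMesh, snappyHexMesh, decomposePar, ...)
        ( cd "$dir" && blockMesh > log.blockMesh 2>&1 )
    fi
    # Remove old solver logs so a rerun can proceed.
    # On the first pass this prints a harmless "No such file" error.
    rm "$dir"/log.simpleFoam
    ( cd "$dir" && mpirun -np "$n" simpleFoam -parallel > log.simpleFoam 2>&1 )
done
The actual script does more (e.g. collecting the wall times), but this is the part relevant to the prep flag and to reruns.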
|
|
May 28, 2022, 18:39 |
|
#505 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Phew, that finally worked. If you don't mind, I would like to add your script to the first post of this thread, or link to your post. Please let me know if you are ok with that.
Anyway, here is my new toy. Well, not actually mine, but I still got to play with it for a while.

Hardware: 2x AMD Epyc 7543, Gigabyte MZ72-HB0, 16x 64GB DDR4-3200 (RDIMM, 2Rx4)
BIOS settings: SMT disabled; workload tuning: HPC optimized; power settings: default; ACPI SRAT L3 cache as NUMA domain: enabled (results in 16 NUMA nodes)
Software: OpenSUSE Leap 15.3 with backport kernel, OpenFOAM v2112 compiled with gcc 11.2.1 using march=znver3, OpenMPI 4.1.4; scaling governor: performance; caches cleared before each run using "echo 3 > /proc/sys/vm/drop_caches"

Code:
simpleFoam run times for 100 iterations:
#threads | runtime/s
====================
      01 |    471.92
      02 |    227.14
      04 |    108.51
      08 |     52.11
      16 |     28.81
      32 |     18.11
      48 |     15.46
      64 |     13.81
Also, using one NUMA node per CCX is still a little faster than the usual recommendation of NPS=4, but that would of course have huge drawbacks for software that isn't NUMA-aware.

Tweaking BIOS settings can be tricky. I got consistently worse performance when tweaking the power settings more towards performance. There is probably still a little more to gain, but I'd rather not overdo it with BIOS settings on someone else's hardware.

I should also note that some of the runs with intermediate thread counts needed some hand-holding. E.g. the threads for the 02 run got mapped to cores on the same memory controller with default settings. Running with "mpirun -np 2 --bind-to core --rank-by core --map-by socket" fixes that.
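The per-run preparation mentioned above boils down to something like this (a sketch; assumes root and the standard sysfs cpufreq interface):
Code:
# Set the scaling governor to "performance" on all cores
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done

# Clear caches before each benchmark run
sync
echo 3 > /proc/sys/vm/drop_caches
Both steps only affect the OS side and leave the BIOS-level settings listed above untouched. |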
|
May 28, 2022, 20:29 |
|
#506 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Go right ahead posting my modification of the original script. Before you do, you might include the mpirun commands you used for certain cases. I have been doing similar things, as you can see from the number of mpirun versions that were commented out. I have a version somewhere that splits it out based on the number of cores. The strategy for setting run parameters will be different for each CPU. It is still nice to have it in the script so that people can develop their plan without having to reinvent the wheel.
Nice job evaluating the borrowed machine. I also found that BIOS tweaking does not do much, except that the memory has to be set for performance (obviously). I also don't bother setting the fans to maximum. Some servers are very noisy that way. Plus, the fans will spin up as needed. |
|
May 29, 2022, 08:16 |
|
#507 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Well, the precise mpirun commands for consistent results vary with the number of threads. Someone else might be able to find a single command that works for all thread counts, but then there are still the variables of hardware, NUMA topology and MPI libraries. I don't think there is a "one size fits all" solution here.
I could try to go into more detail about what to look for, but it would end up being a rather lengthy post titled "how to benchmark correctly". Which, as pedants in the field may argue, we are all doing wrong anyway by leaving turbo boost enabled for such a short benchmark. Maybe another day. |
|
May 29, 2022, 18:53 |
|
#508 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Agreed, but what I meant was to leave the default as is and add your special case commented out, with a short description explaining its specific use.
|
|
May 30, 2022, 03:24 |
|
#509 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
There are many ways to achieve the same result, most of them more elegant than what I did: https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php
What I ended up using: Code:
mpirun -np 2  --bind-to core --rank-by core --map-by socket
mpirun -np 4  --bind-to core --rank-by core --cpu-list 0,16,32,48
mpirun -np 8  --bind-to core --rank-by core --cpu-list 0,8,16,24,32,40,48,56
mpirun -np 16 --bind-to core --rank-by core --map-by numa
(same from here on)
Also lscpu and lstopo to find out about NUMA topology and shared resources like L3 cache. Which cores reside on a shared IMC needs to be figured out the hard way as far as I know: reading docs and such...
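For the topology part, the commands are standard (lstopo-no-graphics ships with the hwloc package, numactl with the numactl package on most distributions):
Code:
# NUMA nodes and their CPU lists
lscpu | grep -i numa
numactl --hardware

# Text dump of the full topology: packages, NUMA nodes, L3 domains, cores
lstopo-no-graphics
With "L3 cache as NUMA domain" enabled, each CCX shows up as its own NUMA node in this output. |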
|
June 1, 2022, 09:05 |
|
#510 | |
Member
Marco Bernardes
Join Date: May 2009
Posts: 59
Rep Power: 17 |
Quote:
|
||
June 2, 2022, 16:24 |
|
#511 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
Probably similar. The 6300 processors are an improvement over the 6200. With the CPUs so cheap, I think you could try with yours and upgrade the CPU if necessary. Note that messing with the BIOS is risky: you might cause your machine to no longer boot! Performance without overclocking is pretty decent due to the 16 available memory channels. |
||
June 3, 2022, 09:09 |
AMD Ryzen 4800H under WSL Ubuntu 20.04
|
#512 |
Member
Marco Bernardes
Join Date: May 2009
Posts: 59
Rep Power: 17 |
AMD Ryzen 4800H:
# cores   Wall time (s):
------------------------
Meshing Times:
 1   1003.94
 2    707.64
 4    500.12
 6    396.02
 8    364.08

Flow Calculation:
 1    753.92
 2    486.19
 4    351.89
 6    329.93
 8    323.98 |
|
June 3, 2022, 09:11 |
AMD Threadripper 1950X under WSL Ubuntu 20.04
|
#513 |
Member
Marco Bernardes
Join Date: May 2009
Posts: 59
Rep Power: 17 |
AMD Threadripper 1950X under WSL Ubuntu 20.04
# cores   Wall time (s):
------------------------
Meshing Times:
 1   1056.81
 2    701.65
 4    496.73
 6    393.98
 8    381.59
10    360.49
12    339.13
14    323.9
16    343.45

Flow Calculation:
 1    822.07
 2    498.66
 4    350.45
 6    326.8
 8    324.14
10    319.38
12    314.45
14    315.73
16    324.57 |
|
June 4, 2022, 08:07 |
Benchmark run on laptop with i7-11800H and 2x8GB (3200 MHz) on WSL2 Ubuntu 20.04
|
#514 |
New Member
Erdi
Join Date: Jun 2022
Posts: 2
Rep Power: 0 |
OpenFOAM benchmark run on a laptop (Dell XPS 15) with i7-11800H and 2x8GB (3200 MHz), on WSL2 / Ubuntu 20.04, with OpenFOAM v9
Out of curiosity I wanted to try the benchmark on my laptop. First I tried the default configuration, but it took a long time to run the one-core version, so I changed the run.sh file and ran with 8 cores directly. That started to thermal throttle a lot (what a surprise), so I then tried 6 cores and got:
Code:
real    7m38.091s
user    45m16.834s
sys     0m11.617s

Run for 6...
# cores   Wall time (s):
------------------------
6   367.37 |
|
June 5, 2022, 00:48 |
|
#515 | ||
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
Your performance is equal to my Dell r710 with dual E5649. That makes sense, because that server has six memory channels running at 1066 MT/s, which is comparable to two at 3200 MT/s. Quote:
Your CPU must be thermal throttling, otherwise you would get 305.44 s like the 2x X5675, or better.
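To confirm throttling rather than guess, a quick check on native Linux looks like this (WSL2 may not expose the real clocks, so take readings there with a grain of salt):
Code:
# Watch per-core clocks while the benchmark is running
watch -n 1 'grep "cpu MHz" /proc/cpuinfo'

# Or, if the cpupower tool is installed:
cpupower frequency-info
If the reported clocks sag well below the base clock while simpleFoam is running, throttling is the likely culprit. |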
|||
June 5, 2022, 00:50 |
WSL2
|
#516 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
I don't know how the benchmark performs on WSL2. I have only run Linux, so that might be another issue, Erdi.
|
|
June 6, 2022, 05:31 |
Two is better than one
|
#517 |
Member
Marco Bernardes
Join Date: May 2009
Posts: 59
Rep Power: 17 |
Hi!
I was wondering whether running 2 benchmarks simultaneously would be better than running them one after another. The results of the 2 simultaneous runs were, surprisingly:

Run 1:
# cores   Wall time (s):
------------------------
Meshing Times:
 1   1151.36
 2    857.94
 4    623.2
 6    563.06
 8    537.3
10    526.86
12    518.92
14    523.49
16    569

Flow Calculation:
 1   1034.82
 2    763.45
 4    550.1
 6    523.57
 8    542.37
10    600.15
12    625.04
14    668.28
16    710.25

Run 2:
# cores   Wall time (s):
------------------------
Meshing Times:
 1   1126.39
 2    861.46
 4    622.28
 6    558.93
 8    539.72
10    527.49
12    518.96
14    521.65
16    564.03

Flow Calculation:
 1   1032.88
 2    762.35
 4    548.58
 6    526.27
 8    559.72
10    606.09
12    633.89
14    682.39
16    683.36

2 x 1 runs, one after the other, took in the best case (12 cores) approximately 630 seconds in total (see previous posts).
1 x 2 runs simultaneously took in the best case (6 cores each) approximately 526 seconds.

Conclusion: 2 runs done simultaneously were about 20% faster. Any comments?
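For anyone who wants to reproduce the test, this is the kind of launch I mean (a sketch; directory and script names are placeholders, and core pinning is left to the MPI defaults):
Code:
# Run two independent copies of the benchmark case concurrently and time the pair
cp -r basecase runA
cp -r basecase runB

time {
    ( cd runA && ./run.sh > log.runA 2>&1 ) &
    ( cd runB && ./run.sh > log.runB 2>&1 ) &
    wait
}
Explicitly pinning the two jobs to separate NUMA nodes would make the comparison even cleaner. |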
|
June 6, 2022, 05:53 |
|
#518 |
Senior Member
Join Date: May 2012
Posts: 552
Rep Power: 16 |
@masb, I do not understand your post. You have two recent posts, one of which is a 1950X with 16 cores that finishes the benchmark in about 314 seconds. I do not see how this is slower than your latest test.
It should also be noted that 314 seconds is about twice as long as my 1950X takes to finish the benchmark (specs available on the first page of this thread). WSL is not ideal, but if you do not access the file system through frequent saves then it should be fast enough. My guess is slow memory and/or timings. |
|
June 6, 2022, 05:56 |
|
#519 | |
Senior Member
Join Date: May 2012
Posts: 552
Rep Power: 16 |
Quote:
You cannot make comparisons like that. There is a huge difference between some systems with identical theoretical bandwidth. |
||
June 6, 2022, 07:42 |
Sorry for the confusing posts.
|
#520 |
Member
Marco Bernardes
Join Date: May 2009
Posts: 59
Rep Power: 17 |
Firstly, I posted the benchmarks for the 1950X and the Ryzen 4800H just as information. In the latest post I ran two benchmarks simultaneously, under WSL on the 1950X. As I have to run lots of cases, I was trying to compare the performance of both approaches, sequential and simultaneous. The run of two cases simultaneously, using 6 cores each, was faster than running the same two cases sequentially on 12 cores:

Sequentially:
run1: 314.45 seconds
run2: 314.45 seconds
total: run1 + run2 = 629 seconds

Simultaneously:
run1 || run2: 526.27 seconds

Is it clear now? |
|