January 12, 2023, 01:19 |
|
#621 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
You should be getting the same single-core performance as the single 7742 run. Your all-core runs should tend toward 16 seconds. This underperformance might indicate a hardware problem.
||
January 12, 2023, 03:10 |
|
#622 | |
New Member
Eduardo
Join Date: Feb 2019
Posts: 9
Rep Power: 7 |
Quote:
I have tried the settings you suggest and the 64-core time decreases from 33.22 s to 32.23 s, so, as you say, nothing drastic. In fact, the fastest run I have obtained was with SMT off and NPS=4, but without setting any "--bind-to" option in mpirun (I actually do not know what the default is). In that configuration I got 31.65 s, again a negligible difference.

I also attach the scaling curve compared with Yannick's results, but normalizing all the curves by the same single-core result (936 s reported by ym92). As you can see, the absolute results are better for fewer cores but, beyond roughly 25 cores, the memory bandwidth limit starts to bite and the performance is worse. At the end of the day, the fastest simulation Yannick got with two identical processors was about 15 s, whilst mine is about double that (32 s). That makes sense given that we have half the CPUs and thus half the memory bandwidth. So I am starting to convince myself that we are at the peak performance of the hardware we have.

I don't know if you will find it useful, but if you want to include these results in the global compilation of benchmarks, feel free to do so.

Best regards and thanks for the help,
Eduardo
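For reference, a rough sketch of how the binding options discussed above are typically passed to Open MPI's mpirun. The solver invocation and core count are placeholders, and the default binding depends on the MPI build, so this is not necessarily the exact command behind the numbers above:

Code:
# Bind each rank to a core and spread ranks across NUMA nodes
# (relevant with NPS=4); --report-bindings prints the resulting map.
mpirun -np 64 --bind-to core --map-by numa --report-bindings \
    simpleFoam -parallel > log.simpleFoam

# For comparison, no explicit binding (whatever the build's default is):
mpirun -np 64 simpleFoam -parallel > log.simpleFoam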
||
January 12, 2023, 03:46 |
|
#623 | |
New Member
Yannick
Join Date: May 2018
Posts: 16
Rep Power: 8 |
Quote:
Thanks for pointing this out. Yes, my all-core run was 15.46 s, which seems OK. But you are right about the single-core result. I think I ran these benchmarks before we replaced the SSD, which was acting strangely (sometimes fast, sometimes very slow). I don't know if any read/write operations are included in the benchmark, but if so, that could explain it. Anyway, I will probably rerun the benchmark in the next few days just to confirm.

EDIT: checked again on another server with the same specs (only less RAM). Results are only around 10% better for single core. Maybe it is an issue with my Docker setup or the old OpenFOAM version I am using. Anyway, as the all-core performance is OK (and I usually do not work with OF), I will probably not look further into it.

Last edited by ym92; January 12, 2023 at 09:47.
||
January 12, 2023, 22:10 |
|
#624 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
System:
Gigabyte MD80-F34, 2x E5-2683 (16 core, 45 MB cache), 16x *GB DDR4-2400 2R, two RDIMMs per channel, Debian Linux 5.15
Software: OpenFOAM v2212 from openfoam.com
Code:
# cores   Meshing (s)   Flow calculation (s)
 1        1426.06       1041.68
 2         931.58        523.79
 4         524.9         238.93
 8         326.17        126.03
12         247.9          91.68
16         221.57         76.93
20         187.4          67.85
24         171.64         64.21
28         176.2          61.2
32         159.53         60.19
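For anyone repeating this kind of sweep, below is a minimal sketch of a shell loop over core counts. It assumes the usual decomposePar/simpleFoam workflow with a scotch decomposition in system/decomposeParDict; it is not the exact script used for the numbers above.

Code:
#!/bin/bash
# Sweep the flow solver over several core counts and record wall times.
# Assumes method "scotch" in decomposeParDict, so only the subdomain
# count needs changing between runs.
for n in 1 2 4 8 12 16 20 24 28 32; do
    if [ "$n" -eq 1 ]; then
        # Serial reference run
        start=$(date +%s)
        simpleFoam > log.simpleFoam.1
    else
        foamDictionary -entry numberOfSubdomains -set "$n" system/decomposeParDict
        decomposePar -force > log.decomposePar.$n
        start=$(date +%s)
        mpirun -np "$n" simpleFoam -parallel > log.simpleFoam.$n
    fi
    end=$(date +%s)
    echo "$n $((end - start))" >> walltimes.txt
done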
|
January 13, 2023, 17:30 |
|
#625 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
There is currently a very cheap QUANTA server on eBay that could match the performance of my MD90-F34 system above if configured right: https://www.ebay.com/itm/13404888714...oAAOSwmPhjaqph (I have no relationship with the seller).

These 1U systems get noisy when the fans spin up. With the E5-2683 v4 chips, the temperature when running OpenFOAM on all cores reaches about 62 C and the fans barely spin up. Power consumption is about 360 W during the calculation and 95 W at idle. A student could put together a system with 64 GB of RAM for under $500. Such a system would have twice the performance of a DDR5 desktop system. Graphics you can do on your laptop.
|
January 24, 2023, 03:33 |
|
#626 |
Member
Jógvan
Join Date: Feb 2014
Posts: 32
Rep Power: 12 |
Hardware:
2x EPYC 7302, 16x 16 GB 3200 MHz DDR4 dual-rank ECC
Software: OpenFOAM v2212 from openfoam.com
Code:
# cores   Meshing (s)   Flow calculation (s)
 1        1195.69        822.59
 2         777.66        415.07
 4         436.66        178.69
 8         247.98         85.59
12         182.25         57.74
16         160.27         43.96
20         136.17         39.97
24         125.75         36.1
28         126.47         31.14
30         134.66         30.12
32         121.23         31.38
|
January 24, 2023, 15:02 |
|
#627 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Others have achieved somewhat better results; both the single-core and the 32-core times are better. This should not be the case, as you appear to have the best possible hardware.
Found this one: Quote:
and this one: |
||
January 24, 2023, 15:13 |
|
#628 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
There is a spreadsheet started by blackcatxiii that I have updated somewhat for myself (attached). It had the following result:
# 2x EPYC 7302
# cores   Wall time (s)
 1        723.64
 2        328.11
 4        164.21
 8         81.4
12         55.2
16         41.1
20         37.53
24         34.27
28         29.99
32         26.89

This is the history of the Benchmarking spreadsheet:
Quote:
|
||
January 25, 2023, 04:28 |
Very curious about the performance of the Apple M1 Ultra...
|
#629 |
New Member
Guangyu Zhu
Join Date: May 2013
Posts: 12
Rep Power: 13 |
As the M1 Ultra provides very high memory bandwidth (up to 800 GB/s), would it beat the latest generation of EPYC or Xeon with a similar core count (20 cores) in the OpenFOAM test? The price of a Mac Studio equipped with a 20-core M1 Ultra and 128 GB of RAM is USD 4799.
|
|
January 25, 2023, 05:53 |
|
#630 |
Member
Jógvan
Join Date: Feb 2014
Posts: 32
Rep Power: 12 |
It is strange that a system with slower RAM gets 26.89 s at 32 cores.
Do you have any suggestions on where I should look for optimizations? I think the first step for me is to double-check the RAM modules. I bought the system second-hand, so perhaps the modules are different from what I thought.
|
January 25, 2023, 18:47 |
|
#631 |
Member
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7 |
One can buy a Mac M2 mini (8 cores) for $600. Supposedly they have a lot of memory bandwidth. Would a cluster of M2 minis be cost-competitive with EPYC machines? I believe one can get a 10 GbE port for an extra $100.
https://www.reddit.com/r/homelab/com..._m2m2_pro_mac/
Has anyone run the OpenFOAM benchmark on the M2 processors? M2, M2 Pro, M2 Max, M2 Ultra?
|
January 25, 2023, 23:54 |
|
#632 |
Senior Member
Join Date: Jun 2016
Posts: 102
Rep Power: 10 |
Ookami HPC (48-core A64FX, 32 GB HBM RAM, 1 TB/s memory bandwidth), the same architecture as Fugaku.
I only tested the performance on one node. I have to say the performance is terrible. Apple gave us the illusion that ARM64 is very powerful, but the fact is that ARM64 HPC is still far behind x86.

# cores   Wall time (s)
 1        2416.19
 2        1185.66
 4         582.35
 8         299.75
12         207.08
24         108.55
36          78.28
48          65.31
|
January 26, 2023, 03:30 |
|
#633 |
Member
Jógvan
Join Date: Feb 2014
Posts: 32
Rep Power: 12 |
I reran the tests. The previous test was started through a NoMachine session, while this test was started through SSH. I suspect that not rendering the desktop through NoMachine can have a minor impact on the results.
Hardware: 2x EPYC 7302, 16x 16 GB 3200 MHz DDR4 dual-rank ECC
Software: OpenFOAM v2212 from openfoam.com
Code:
# cores   Meshing (s)   Flow calculation (s)
 1        1205.84        794.95
 2         779.58        414.41
 4         439.87        177.73
 8         248.06         86.68
12         181.88         58.03
16         160.38         43.69
20         134.94         41.85
24         125.01         36.36
28         126.6          31.35
30         135.61         29.62
32         116.24         29.44
|
January 28, 2023, 04:16 |
|
#634 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
sudo dmidecode -t 17

sudo is a utility for temporarily becoming the superuser. If you are already superuser you can just run:

dmidecode -t 17

This command gives you information on all your memory slots. The result for each DIMM slot looks like this:

Code:
Handle 0x005F, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x0049
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 8 GB
        Form Factor: DIMM
        Set: None
        Locator: P2-DIMMH2
        Bank Locator: P1_Node1_Channel3_Dimm1
        Type: DDR3
        Type Detail: Registered (Buffered)
        Speed: 1866 MT/s
        Manufacturer: Kingston
        Serial Number: 3630B135
        Asset Tag: DimmH2_AssetTag
        Part Number: KP9RN2-HYC
        Rank: 2
        Configured Memory Speed: 1866 MT/s
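To scan all slots at once instead of reading each record, a short filter along these lines works (a sketch, using the field names from the output above):

Code:
# Rank and configured speed for every populated DIMM slot
sudo dmidecode -t 17 | grep -E "Rank:|Configured Memory Speed:"

Every populated slot should report the same rank and configured speed; a line that differs points to the odd module, whose physical slot you can then identify from its Locator field.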
||
January 29, 2023, 17:12 |
|
#635 |
Member
Jógvan
Join Date: Feb 2014
Posts: 32
Rep Power: 12 |
Hi Wkernkamp,
Thanks for taking the time to look into this. This is what I get when running dmidecode -t 17:
Code:
Handle 0x005D, DMI type 17, 84 bytes
Memory Device
        Array Handle: 0x0024
        Error Information Handle: 0x005C
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16 GB
        Form Factor: DIMM
        Set: None
        Locator: P2-DIMMH1
        Bank Locator: P1_Node0_Channel7_Dimm0
        Type: DDR4
        Type Detail: Synchronous Registered (Buffered)
        Speed: 3200 MT/s
        Manufacturer: Samsung
        Serial Number: 03B9A5D1
        Asset Tag: P2-DIMMH1_AssetTag (date:51/00)
        Part Number: M393A2K43DB3-CWE
        Rank: 2
        Configured Memory Speed: 3200 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V
        Memory Technology: DRAM
        Memory Operating Mode Capability: Volatile memory
        Firmware Version: M393A2K43DB3-CWE
        Module Manufacturer ID: Bank 1, Hex 0xCE
        Module Product ID: Unknown
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Non-Volatile Size: None
        Volatile Size: 16 kB
        Cache Size: None
        Logical Size: None
|
January 29, 2023, 19:01 |
|
#636 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
This RDIMM is functioning properly, as you can see from:

Configured Memory Speed: 3200 MT/s

Your RDIMMs should all have the same rank; this one's rank is 2:

Rank: 2

If one is not right, you can find its slot using this field:

Locator: P2-DIMMH1

If all your RDIMMs show the same values, your memory is functioning OK. No need to open the box!
||
January 30, 2023, 19:57 |
|
#637 |
Member
Jógvan
Join Date: Feb 2014
Posts: 32
Rep Power: 12 |
Hi Wkernkamp,
I ran sudo dmidecode -t 17 | grep "Memory Speed" and sudo dmidecode -t 17 | grep "Rank". Both commands gave 16 identical lines of output, so the memory should be working correctly. |
|
February 28, 2023, 13:58 |
|
#638 |
Senior Member
René Thibault
Join Date: Dec 2019
Location: Canada
Posts: 114
Rep Power: 7 |
Here are the results I got (see the picture in the attachment) from another type of case, with OpenFOAM v2212.
Last edited by Tibo99; February 28, 2023 at 15:20. |
|
February 28, 2023, 14:19 |
|
#639 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Your table shows a residual of 1.0e-3 for all runs. Does this mean that you changed the termination criteria for the benchmark? The benchmark is normally made to run for exactly 100 iterations. The times you are getting are very much longer than they should be on your machine: I would expect about 65 seconds on all cores and around 1200 seconds on one.

The shape of the #cores-versus-runtime curve looks normal. Your processor, the E5-2699C v4, has a low base frequency of 2.2 GHz that boosts only to 2.4 GHz regardless of the number of active cores. For the high core-count runs this should not make much of a difference, as memory bandwidth is the limiting factor. However, for your general use of the workstation, the higher-clocked processors (up to 3.6 GHz with two cores active) might be attractive; these processors are now very cheap if you look at core counts around 16. As you discovered yourself, at some point the extra cores no longer speed up the run.

I have a very similar machine, a Gigabyte MD90-F34 with dual E5-2683 v4, and I got it to complete the benchmark in 60 seconds. Yours has more cache, so it might be slightly faster if you configure your memory perfectly. That is really the purpose of this benchmark: tune your machine optimally for CFD. You can see that my memory configuration allows a small performance gain all the way to 32 cores, whereas your curve flattens at 24 cores.

Quote:
|
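For reference, the stock benchmark fixes the run at 100 SIMPLE iterations via endTime in the case's controlDict rather than stopping on residuals. A quick way to check a case is sketched below (foamDictionary ships with recent OpenFOAM versions; adjust paths to your case):

Code:
# How many iterations the case is set to run (100 for the benchmark)
foamDictionary -entry endTime system/controlDict

# Check whether a residual-based stop is configured for SIMPLE
foamDictionary -entry SIMPLE/residualControl system/fvSolution

If residualControl is set (e.g. to 1e-3), the run can stop early on residuals instead of after a fixed iteration count, so wall times are no longer directly comparable between machines.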
||
February 28, 2023, 14:46 |
|
#640 |
Senior Member
René Thibault
Join Date: Dec 2019
Location: Canada
Posts: 114
Rep Power: 7 |
Thank you very much for your quick reply!

Unfortunately, I only saw this post after I had performed that analysis, which is why I did not use your benchmark. I just chose to share the results anyway. I will certainly download the benchmark and post the results.

Regarding the results I got, I did in fact notice what you point out about the curve flattening around 24 cores. From these results, I was wondering whether there is a number of cores beyond which the performance gets worse instead of slowly improving (an asymptote), but you have answered that question well.

It is really nice to have feedback from someone who has a similar workstation. I would certainly like to optimise this machine. Since you have a similar one, do you know where I could find good documentation online on configuring it?

Thank you again!

Last edited by Tibo99; February 28, 2023 at 15:59.
|