March 1, 2023, 01:17 |
|
#641 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
The most important factor for CFD is the memory. On my machine I have two RDIMMs per channel, forced to 2400 MT/s. The RDIMMs are dual-rank, so there are four ranks per channel. Another machine that has a single RDIMM per channel is not as fast (64 seconds instead of 60). However, that is not the exact same machine, so I need to investigate a bit.
In addition there are the BIOS settings, but that can wait. I would suggest you run the benchmark with the machine as is to see where you are at. Also list your memory configuration with "sudo dmidecode -t 17". In addition, run it with -t 19 and -t 20. With that info, I can compare to my machine. Good luck! |
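For anyone who wants to summarize the "dmidecode -t 17" output rather than read it by eye, here is a minimal Python sketch. It assumes the usual layout of blank-line-separated "Memory Device" records made of "Key: value" lines. The SAMPLE text is made up for illustration and is not taken from any machine in this thread, and the function names are my own.

```python
import re

# Made-up sample in the general layout of `dmidecode -t 17` output.
# The values are illustrative only; real output has more fields per record.
SAMPLE = """\
Handle 0x0041, DMI type 17, 40 bytes
Memory Device
    Size: 16 GB
    Locator: DIMM1
    Bank Locator: CPU1_A0
    Rank: 2
    Configured Memory Speed: 2133 MT/s

Handle 0x0042, DMI type 17, 40 bytes
Memory Device
    Size: No Module Installed
    Locator: DIMM2
    Bank Locator: CPU1_A1
    Rank: Unknown
    Configured Memory Speed: Unknown
"""


def parse_memory_devices(text):
    """Return one dict of 'Key: value' fields per Memory Device record."""
    devices = []
    for block in re.split(r"\n\s*\n", text.strip()):
        if "Memory Device" not in block:
            continue
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:  # skip lines that are not 'Key: value'
                fields[key] = value
        devices.append(fields)
    return devices


def populated(devices):
    """Keep only the slots that report an installed module."""
    empty = "No Module Installed"
    return [d for d in devices if d.get("Size", empty) != empty]


if __name__ == "__main__":
    for dev in populated(parse_memory_devices(SAMPLE)):
        print(dev["Bank Locator"], dev["Size"], "rank", dev["Rank"],
              dev["Configured Memory Speed"])
```

To use it on a real machine, save the output first (for example "sudo dmidecode -t 17 > 17.txt") and pass the file contents to parse_memory_devices instead of SAMPLE.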
|
March 1, 2023, 08:46 |
|
#642 |
Senior Member
René Thibault
Join Date: Dec 2019
Location: Canada
Posts: 114
Rep Power: 6 |
Thank you very much for your help with this. Attached you'll find a text file for each command. If you would rather I post the results as text instead, let me know.
I don't know what the performance of your machine was before you tweaked it, but comparing your current results to the ones I got, I'm disappointed with the performance, to be honest. I noticed that the memory is rank 4. It seems that too many ranks in a channel can cause excessive loading and decrease the speed of the channel. Do you think this could be one of the reasons for the lack of performance? Also, I took a screenshot of the 'system monitor' app during the run and noticed that even though the machine has 192GB of RAM, it shows only about 8.7GB in use. It was like that during the whole process. Is that normal? Am I missing something here?
Lenovo ThinkStation P710, 2x E5-2699C v4 (44 cores, 12 * GB DDR4[384GB max.]) with 192GB, Ubuntu 22.04 LTS
Software: OpenFOAM v2212 Code:
# cores   Wall time (s):
------------------------
1 2 4 8 12 16 20 24 28 32
Meshing Times:
1 1715.24
2 1107.93
4 632.17
8 367.64
12 278.97
16 256.85
20 221.81
24 209.41
28 219.07
32 203.43
Flow Calculation:
1 1286
2 627.89
4 277.85
8 161.43
12 139.14
16 135.37
20 131.43
24 131.68
28 130.73
32 131.21
Last edited by Tibo99; March 1, 2023 at 11:19. |
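To make the scaling in these numbers easier to judge, here is a small Python sketch that turns the flow-calculation times above into speedup and parallel efficiency relative to the single-core run. The timings are copied from the post. The function name and the choice of the 1-core run as the reference are my own assumptions.

```python
# Flow-calculation wall times (s) by core count, copied from the post above.
flow_times = {
    1: 1286.0, 2: 627.89, 4: 277.85, 8: 161.43, 12: 139.14,
    16: 135.37, 20: 131.43, 24: 131.68, 28: 130.73, 32: 131.21,
}


def scaling(times):
    """Map core count -> (speedup, efficiency) relative to the 1-core time."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


if __name__ == "__main__":
    print("cores  speedup  efficiency")
    for n, (speedup, eff) in scaling(flow_times).items():
        print(f"{n:5d}  {speedup:7.2f}  {eff:10.2f}")
```

On this data the speedup stays below 10 at every core count and barely changes beyond 12 cores, which is the kind of plateau you would expect when memory bandwidth rather than core count is the limit.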
|
March 1, 2023, 10:49 |
Apple Mac Studio with M1 Ultra CPU (20 cores)
|
#643 |
Super Moderator
Philip Cardiff
Join Date: Mar 2009
Location: Dublin, Ireland
Posts: 1,093
Rep Power: 34 |
Hardware
Times up to 16 cores: Code:
1 432.76
2 250.05
4 137.34
6 88.55
8 69.04
12 53.38
16 46.44 |
|
March 1, 2023, 14:45 |
|
#644 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
Quote:
When I looked at 17.txt, I found that your DIMMs are not in the right slots. In sequence, it shows the speeds as:
Configured Memory Speed: 2133 MT/s
Configured Memory Speed: 2133 MT/s
Configured Memory Speed: 2133 MT/s
Configured Memory Speed: Unknown
Configured Memory Speed: Unknown
Configured Memory Speed: Unknown
Configured Memory Speed: 2133 MT/s
Configured Memory Speed: 2133 MT/s
Configured Memory Speed: 2133 MT/s
Configured Memory Speed: Unknown
Configured Memory Speed: Unknown
Configured Memory Speed: Unknown
The corresponding bank locations are:
Bank Locator: CPU1_A0
Bank Locator: CPU1_B0
Bank Locator: CPU1_B1
Bank Locator: CPU1_C0
Bank Locator: CPU1_D0
Bank Locator: CPU1_D1
Bank Locator: CPU2_A0
Bank Locator: CPU2_B0
Bank Locator: CPU2_B1
Bank Locator: CPU2_C0
Bank Locator: CPU2_D0
Bank Locator: CPU2_D1
You have DIMMs in A0, B0 and B1 for each CPU. This will give you just dual channel memory instead of quad channel. In addition, the memory is not balanced, because channel B has more memory than channel A.
What you need to do is put a DIMM of the same type and size in A0, B0, C0 and D0 for each CPU. You will need two extra DIMMs of the same rank, type and size. A difference can slow your memory down and will never make it faster! Leave B1 and D1 empty. Typically, the B0 slot would be farther from the CPU than the adjacent B1 slot. The B1 and D1 slots have a different color than the "0" slots. This upgrade will get you most of the way to decent performance, somewhere between 60 and 70 seconds.
Another, less significant issue is that the speed is 2133 MT/s, which is below the maximum of the CPU. If the DIMMs are rated at 2400 MT/s, then removing the DIMMs from the B1 slots will probably fix it. If your DIMMs are rated at 2133 MT/s, you can go into the BIOS and enforce the higher speed. Overclocking your DIMMs like that is not guaranteed to succeed! You will find a setting for the memory speed that reads "auto", but can be set to "2400".
Going from 2133 to 2400 will reduce time at high core counts by 11%. If the machine won't boot, go back to auto in the BIOS. |
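The 11% figure can be reproduced with a line of arithmetic if you assume the run time at high core counts is purely memory-bandwidth bound, so that it scales as 1 / transfer rate. That assumption is mine for the purpose of this sketch. The second function gives the theoretical peak bandwidth for 64-bit DDR4 channels (8 bytes per transfer), which shows why dual versus quad channel matters even more than the speed setting.

```python
def time_reduction(old_mts, new_mts):
    """Fractional run-time reduction, assuming run time scales as 1 / transfer rate."""
    return 1.0 - old_mts / new_mts


def peak_bandwidth_gbs(mts, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s for 64-bit (8-byte) DDR4 channels."""
    return mts * bytes_per_transfer * channels / 1000.0


if __name__ == "__main__":
    print(f"2133 -> 2400 MT/s: {time_reduction(2133, 2400):.1%} less time")
    for channels in (2, 4):
        for mts in (2133, 2400):
            print(f"{channels} channels at {mts} MT/s: "
                  f"{peak_bandwidth_gbs(mts, channels):.1f} GB/s per CPU")
```

These are upper bounds per socket. Measured bandwidth is lower, and any part of the run that is not bandwidth bound will not speed up by the same fraction.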
||
March 1, 2023, 14:55 |
|
#645 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
Quote:
Very nice result! The Mac will also be very power efficient. Some people measure Watts at the socket and publish iterations per Wh. I am sure you would be the winner of that competition! |
||
March 1, 2023, 14:59 |
|
#646 |
Senior Member
René Thibault
Join Date: Dec 2019
Location: Canada
Posts: 114
Rep Power: 6 |
Again, I can't say thank you enough for your help!
I will follow your procedure and re-run the benchmark to see where it stands after this. It may take some time before I post the new results, since I'll need to wait for 2 extra DIMMs. Lastly, I would like to hear your insight on how the processor I actually got, the E5-2699C v4, compares to the E5-2698 v4. The reason is that I ordered the E5-2698 v4 but received the other one. Then, after further review, I noticed that even though the E5-2698 v4 has 4 fewer cores and less cache than the E5-2699C v4, it can reach 3.6 GHz in turbo mode compared to 2.4 GHz for the one I actually have. So, if the E5-2698 v4 is better, I'll certainly ask for a swap. Thank you again and best regards, |
|
March 1, 2023, 21:38 |
|
#647 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
Quote:
A good place to look at for comparing processors is this one: https://en.wikipedia.org/wiki/List_o...eon_E5-2698_v4 Comparing the E5-2698 v4 and the 2699C v4 below it, you can see that the 2699C has 2 more cores, a lower all-core turbo of 2.4 versus 2.7, a lower two- and one-core turbo of 2.4 versus 3.6, but a larger cache of 55 MB versus 50 MB and a higher Thermal Design Power (TDP) of 145 W versus 135 W. The cache is quite important for the benchmark, so with all cores in use the larger cache will probably compensate for the lower all-core clock. The TDP is complicated, because the processor is allowed to exceed it for a limited time (setting in BIOS). When a core is waiting for memory, it uses less power, so the TDP probably has no effect on the benchmark. The TDP is the power use when all cores are running at the base frequency. That is 2.2 GHz for both, and 145/135 =~ 22/20. You can check the core frequencies for n+1 cores with: Code:
sudo cpupower -c 0-n frequency-info
I would prefer the E5-2698 v4 because single core programs will be a lot faster. That is important for a workstation. In addition, the higher clocks will benefit any gaming you might want to do. On eBay (from China): Code:
Processor     Price    Cores  Turbo
E5-2699C v4   $216.00  22     2.4
E5-2698 v4    $199.99  20     3.6
E5-2696 v4    $157.00  22     3.7
E5-2697 v4    $94.00   18     3.6
E5-2697A v4   $99.00   16     3.6
E5-2684 v4    $39.95   16     3.0
Looking at these prices, you might be better off selling these E5-2699C processors yourself after you have bought the cheaper processors on eBay. That would leave some money to take your wife to dinner to make up for the time spent on your workstation.
Last edited by wkernkamp; March 1, 2023 at 22:08. Reason: Further comments |
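If it helps with the buying decision, the listing above can be ranked by price per core with a few lines of Python. The prices, core counts and turbo clocks are copied from the table. Price per core is just one rough metric chosen for this sketch and says nothing about cache size or all-core clocks.

```python
# (name, price in USD, cores, max turbo in GHz), copied from the listing above.
listing = [
    ("E5-2699C v4", 216.00, 22, 2.4),
    ("E5-2698 v4", 199.99, 20, 3.6),
    ("E5-2696 v4", 157.00, 22, 3.7),
    ("E5-2697 v4", 94.00, 18, 3.6),
    ("E5-2697A v4", 99.00, 16, 3.6),
    ("E5-2684 v4", 39.95, 16, 3.0),
]


def price_per_core(rows):
    """Return (USD per core, name) pairs, cheapest per core first."""
    return sorted((price / cores, name) for name, price, cores, _turbo in rows)


if __name__ == "__main__":
    for usd, name in price_per_core(listing):
        print(f"{name:12s} ${usd:5.2f} per core")
```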
||
March 3, 2023, 12:27 |
|
#648 |
Senior Member
René Thibault
Join Date: Dec 2019
Location: Canada
Posts: 114
Rep Power: 6 |
Thank you very much for your suggestions and the IT support!
I especially liked this one - "That would leave some money to take your wife to dinner to make up for the time spent on your workstation." I'll try to get some RAM asap, configure the machine the way you suggested and re-run the benchmark. So, we'll see from there where the machine will stand. New benchmark's results soon to come......... Best Regards, |
|
March 7, 2023, 11:59 |
EPYC Genoa
|
#649 |
New Member
Josh Dyson
Join Date: Mar 2011
Posts: 21
Rep Power: 15 |
Hi Everyone,
Lucky enough that new job equals new HPC benchmarking. 2x 9374F, OpenFOAM v2212, CentOS 7.9. 24x 16GB 4800MT/s RAM. SMT is on. Code:
1 437.74
2 246.09
4 119.22
8 74.38
16 34.49
24 22.01
32 16.03
48 12.68
64 11.07
Last edited by jd210; March 7, 2023 at 12:51. Reason: Added results for 24 & 48 cores |
|
March 7, 2023, 12:13 |
|
#650 | |
New Member
George
Join Date: Jul 2020
Location: TU Delft, The Netherlands
Posts: 18
Rep Power: 6 |
Quote:
Comparing to older benchmarks, it seems to me that you can leverage much more performance by using faster ram. |
||
March 7, 2023, 12:24 |
|
#651 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
There is no faster RAM available for these CPUs. And once there is, it will still have to be run at 4800MT/s. It's a server platform, memory overclocking won't work. The boards don't offer the necessary settings, and the CPUs are locked.
|
|
March 7, 2023, 13:19 |
|
#652 |
New Member
Josh Dyson
Join Date: Mar 2011
Posts: 21
Rep Power: 15 |
Interestingly AMD said to me the other day that the 9454 is proving to be a good option for CFD. So may get some results for that and other SKUs at some point.
|
|
March 7, 2023, 13:39 |
|
#653 |
New Member
George
Join Date: Jul 2020
Location: TU Delft, The Netherlands
Posts: 18
Rep Power: 6 |
I clearly need to do some further reading of the spec sheets then
|
|
March 7, 2023, 14:16 |
|
#654 | |
New Member
Josh Dyson
Join Date: Mar 2011
Posts: 21
Rep Power: 15 |
Quote:
The caveat is the CPU cache, which is way faster than RAM, but quite small. Milan-X chips have 768MB while the Genoa chips have up to 256MB. If your case is small enough (like this one might be), most if not all of it fits in cache, making it really fast. As flotus says, you can't overclock, but you wouldn't want to anyway as it's meant to run 24/365. And in such situations you're less concerned about raw speed and more about speed per watt consumed or speed per $ spent to buy. |
||
March 7, 2023, 18:43 |
|
#655 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
I have my doubts that these numbers reflect the actual capabilities of your CPUs.
I contributed some benchmarks for regular (non-X) 32-core Milan CPUs here, that are only slightly slower. Your single-core result seems fine, but scaling could probably be improved. Here is what I would usually recommend for low variance, and maximum performance:
|
|
March 8, 2023, 05:02 |
|
#656 | |
New Member
Josh Dyson
Join Date: Mar 2011
Posts: 21
Rep Power: 15 |
Quote:
Tried the core binding, it knocked a few tenths off the 64 core run but others were generally similar. My thought is that this case is now too small for the latest EPYC chips and they run it so fast that disk read etc is having quite an effect. That said a 25% bump over a 7543 is still not insignificant! Might try running for more iterations or messing about with what it writes and see what I can come up with. |
||
March 14, 2023, 13:28 |
Error getting 'numberOfSubdomains' from 'system/decomposeParDict' benchmark v2212
|
#657 | |
New Member
Chermac Rolle
Join Date: Mar 2023
Posts: 5
Rep Power: 3 |
Hi all,
I am attempting to use the benchmark scripts provided in the post below [#624]; however, I get the above error related to 'numberOfSubdomains'. I note a previous post where this issue cropped up, and it was suggested that it could be an issue with sed and paths (as the decomposeParDict in that particular scenario didn't seem to get updated). I can confirm that numberOfSubdomains in each of my created run cases has actually been edited accordingly. For example: Code:
/*--------------------------------*- C++ -*----------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  4.x                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 2;

method          scotch;
// method       scotch;

simpleCoeffs
{
    n               (4 1 1);
    delta           0.001;
}

hierarchicalCoeffs
{
    n               (3 2 1);
    delta           0.001;
    order           xyz;
}

manualCoeffs
{
    dataFile        "cellDecomposition";
}

// ************************************************************************* //
Quote:
|
||
March 15, 2023, 01:21 |
|
#658 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
I have not used Windows myself. Sometimes Windows programs require a carriage return and then a newline, instead of just a newline as in Unix.
The resulting file you produced shows "numberOfSubdomains 2;" while in the basecase/system directory it is 6. So it appears sed is working as intended. If you change directory to the run_2 directory and then execute the commands by hand, do you still get the error? Note that before running OpenFOAM in Linux you need to do: Code:
source OpenFOAM/OpenFOAM-v2212/etc/bashrc
Do the log files for blockMesh, etc. show successful completion? |
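Both things discussed here, whether sed actually changed numberOfSubdomains and whether the file picked up Windows line endings, can be checked in one go. Below is a minimal Python sketch, assuming the entry sits on its own line as in the dictionary posted above. The function name is hypothetical, and the parsing is deliberately simple: it strips // line comments but does not handle /* */ block comments or OpenFOAM macros.

```python
import re


def inspect_decompose_dict(raw: bytes):
    """Return (numberOfSubdomains or None, True if CRLF line endings are present)."""
    has_crlf = b"\r\n" in raw
    text = raw.decode("ascii", errors="replace")
    # Drop // line comments so a commented-out entry is not picked up.
    text = re.sub(r"//[^\n]*", "", text)
    match = re.search(r"^\s*numberOfSubdomains\s+(\d+)\s*;", text,
                      flags=re.MULTILINE)
    return (int(match.group(1)) if match else None, has_crlf)


if __name__ == "__main__":
    # Illustrative input only; read the real file in binary mode instead.
    sample = b"numberOfSubdomains 2;\r\nmethod scotch;\r\n"
    print(inspect_decompose_dict(sample))
```

For a real case, read the file in binary mode so the line endings are not translated, e.g. open("run_2/system/decomposeParDict", "rb").read(), and pass the bytes to the function.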
|
March 15, 2023, 08:29 |
|
#659 | |||
New Member
Chermac Rolle
Join Date: Mar 2023
Posts: 5
Rep Power: 3 |
Hi Will, thanks for the quick reply.
Quote:
The Windows binary uses an implementation of MSYS2, and environment variables are set (or at least appear to be) on launch of the shell via the launch script openfoam.com provides. Their implementation does not provide sudo, and I suspect it may be an issue with file permissions, but I am unsure. In the meantime I have set up the machine for a Linux dual-boot. I have installed OpenFOAM and successfully run the benchmark in Ubuntu 22.04. I am using a Ryzen 5950X with 3600MHz dual-rank memory, and the tl;dr is to stick with the 5600X or 5800X3D if opting for this architecture for single simulations (as discussed in previous posts). TODO: Check out the efficacy with parallel parametric and multi-simulations. I am still getting an odd failure when using 5 cores (for tinkering and interest purposes I benchmarked 1-16 in increments of 1). I will investigate this oddity a bit more later. Quote:
Quote:
I initially also thought the same about line endings, but the MSYS2 implementation is *nix based so this changed my mind on that one. I will investigate further. I will do some more tinkering when I finally boot back into Windows. I'll review log files as suggested, and take the benchmark through a manual workflow to see if that makes a difference. I will probably (should have done really) run some of the tutorial cases I usually test when installing a new version of FOAM. It will be useful for my purposes if I can get similar performance with this Windows binary as compared to bare metal under Linux as opposed to resorting to WSL. So I will keep at it and I'll update my findings here if I am successful. |
||||
March 15, 2023, 11:01 |
|
#660 |
New Member
Chermac Rolle
Join Date: Mar 2023
Posts: 5
Rep Power: 3 |
I have resolved my issue with the Windows binary with the help of forum post [OpenFOAM.com] error while loading shared libraries: mingw.
It seems that just double-clicking on the installer causes it to be installed under the user's Roaming AppData folder and does not install MS-MPI. Correct installation requires Right-click > Install as Admin so that MS-MPI can be installed, with the OpenFOAM installation folder placed by default under Program Files instead of Roaming. |