Intel i9 13900K with 8 memory channels would be a game changer for CFD |
January 6, 2023, 15:12 |
|
#1 |
Member
Join Date: Dec 2016
Posts: 44
Rep Power: 10 |
Test case: Fluent, 2.2 million cells.
The AMD 7900X falls behind; Intel's E-cores work for CFD. The 13900K has only 8 P-cores (performance cores), yet it speeds up when more cores are selected in the setup. https://www.dropbox.com/s/on91aqe5zi...luent.jpg?dl=0 |
|
January 6, 2023, 17:47 |
|
#2 |
Member
Join Date: Oct 2019
Posts: 65
Rep Power: 7 |
Hi, something is going wrong with your benchmark:
On 8 cores the 13900k result should be at least twice as fast as the ancient 2695 v2 regardless of the memory bandwidth limit! |
|
January 6, 2023, 19:00 |
|
#3 | |
Member
Join Date: Dec 2016
Posts: 44
Rep Power: 10 |
Quote:
No, the 13900K has only two memory channels. The memory bandwidth of the 2695 v2, with eight memory channels and 110 GB/s, is higher than the 85 GB/s of the 13900K. The 13900K has too little memory bandwidth. |
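The bandwidth comparison in this post can be sanity-checked with the usual rule of thumb: theoretical peak bandwidth is channels times transfer rate times bytes per transfer. The sketch below is only an illustration; the memory speeds (DDR3-1866 and DDR5-5600) are assumptions that the thread does not state, and the theoretical peaks come out somewhat above the 110 GB/s and 85 GB/s quoted in the post.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0


# Assumed memory speeds (not stated in the thread):
# two sockets x four channels of DDR3-1866 for the dual Xeon system,
# two channels of DDR5-5600 for the desktop CPU.
dual_xeon = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=1866)
desktop = peak_bandwidth_gbs(channels=2, mega_transfers_per_s=5600)

print(f"8 channels DDR3-1866: {dual_xeon:.1f} GB/s")
print(f"2 channels DDR5-5600: {desktop:.1f} GB/s")
```

Measured bandwidth is always below these theoretical values, so the numbers are best read as upper bounds for comparing platforms.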
||
January 7, 2023, 06:11 |
|
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Please disable the E-cores in the BIOS, disable Hyper-Threading, and run the test again with 8 threads.
|
|
January 7, 2023, 10:40 |
|
#5 |
Member
Join Date: Dec 2016
Posts: 44
Rep Power: 10 |
SMT has little effect on solution time: about 5% to 10%. But Fluent behaves erratically when too many cores are selected for too little memory bandwidth. The solver can slow down a lot during a run, with and without SMT. When I repeated the runs, the results with 12 and 16 selected cores were the same as before. Only the run with 8 selected cores had been affected by this bug. I now think the solution time with 8, 12 and 16 selected cores is the same. The differences between 291, 322 and 286 seconds are probably within measurement tolerance. The 13900K has too few memory channels to find out how well E-cores perform on CFD. I think it is possible that they work.
SMT on, E-cores on: https://www.dropbox.com/s/xl8np06bcs...uent1.jpg?dl=0 https://www.dropbox.com/s/euthm0u6mz...uent2.jpg?dl=0 https://www.dropbox.com/s/iik0mavptq...uent3.jpg?dl=0 |
|
January 7, 2023, 12:56 |
|
#6 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
The reason I brought this up: I strongly suspect that the results you got initially were mostly influenced by scheduler issues, i.e. the operating system not being clever enough to pin the threads exclusively to performance cores.
Now you could try to manually pin threads and monitor how that goes... or, much easier, just disable SMT and E-cores. If I am reading this right, your second batch of results confirms my suspicion, since you were able to get pretty much maximum performance on 8 threads. |
|
January 7, 2023, 14:02 |
|
#7 |
Member
Join Date: Dec 2016
Posts: 44
Rep Power: 10 |
Turning SMT and E-cores off has no effect.
SMT and E-cores on:
4 cores: 353 seconds
6 cores: 316 seconds
8 cores: 291 seconds
SMT and E-cores off:
4 cores: 380 seconds
6 cores: 325 seconds
8 cores: 270 seconds |
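To make the two sets of timings easier to compare, the speedup and parallel efficiency relative to the 4-core run can be computed directly. This is just a small sketch using the numbers from this post; the helper name `scaling` is made up for illustration.

```python
# Solution times in seconds, copied from the post above.
timings = {
    "SMT and E-cores on": {4: 353, 6: 316, 8: 291},
    "SMT and E-cores off": {4: 380, 6: 325, 8: 270},
}


def scaling(times: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {cores: (speedup, efficiency)} relative to the smallest core count."""
    base_cores = min(times)
    base_time = times[base_cores]
    result = {}
    for cores, t in sorted(times.items()):
        speedup = base_time / t          # how much faster than the baseline run
        ideal = cores / base_cores       # speedup expected with perfect scaling
        result[cores] = (speedup, speedup / ideal)
    return result


for label, times in timings.items():
    print(label)
    for cores, (speedup, eff) in scaling(times).items():
        print(f"  {cores} cores: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Doubling the core count from 4 to 8 gives well under a 2x speedup in both configurations, which is what one would expect from a run that is limited by memory bandwidth rather than by core count.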
|
January 7, 2023, 14:25 |
|
#8 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Then what happened with the results in the first post here?
Anyway, I think the hypothesis stated in the first post -about E-cores in current-gen desktop CPUs being useful for CFD- has been thoroughly debunked. Not that I had much doubt about that, but it can't hurt to check from time to time. |
|
January 7, 2023, 15:51 |
|
#9 |
Member
Join Date: Dec 2016
Posts: 44
Rep Power: 10 |
With only one P-core:
The E-cores run at only 4300 MHz (P-cores: 5500 MHz).
SMT on, all E-cores on, seven P-cores deactivated:
4 cores: 600 seconds
6 cores: 579 seconds
8 cores: 440 seconds
7900X, DDR5 3600 MHz:
8 cores: 443 seconds
2x E5-2695 v2:
4 cores: 967 seconds
6 cores: 743 seconds
8 cores: 534 seconds
E-cores work for CFD. They are slower (only about 15% performance lost; compare 5500 / 4300 MHz or the 7900X), but with very high efficiency and very low power consumption. |
|
January 7, 2023, 20:06 |
|
#10 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Why did you not run the 7900x at DDR5-5200?
|
|
January 7, 2023, 21:10 |
|
#11 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
Quote:
Comparing it to a knee-capped, different CPU is not the point. The point about E-cores being useless for CFD is this: You can not run a simulation across both P- and E-cores. It will limit execution speed to whatever the slower E-cores can handle. Now you could start to get creative with load balancing, but: the 8 P-cores already provide enough FP performance to saturate the memory subsystem. It gets even worse when per-core licenses are involved. |
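The argument that a run spread across P- and E-cores is limited by the slower cores can be illustrated with a toy model. The sketch below assumes equal-sized partitions, a synchronization point after every step, and relative core speeds taken from the clock ratio 4300 / 5500 mentioned earlier in the thread. That ratio is only a crude proxy, and the model deliberately ignores the memory bandwidth limit, which makes things worse in practice.

```python
def step_time(work: float, speeds: list[float]) -> float:
    """Time per step with equal partitions: every rank waits for the slowest one."""
    share = work / len(speeds)
    return max(share / s for s in speeds)


# Relative core speeds (assumption: clock ratio as a stand-in for real throughput).
p_speed = 1.0
e_speed = 4300 / 5500

mixed = [p_speed] * 8 + [e_speed] * 16   # 8 P-cores plus 16 E-cores

t_mixed = step_time(1.0, mixed)          # equal partitions, slowest rank decides
t_balanced = 1.0 / sum(mixed)            # perfect load balancing, no memory limit

print(f"equal partitions:   {t_mixed:.4f}")
print(f"perfectly balanced: {t_balanced:.4f}")
print(f"P-core idle share:  {1 - e_speed:.0%}")
```

In this model each P-core sits idle for roughly a fifth of every step while it waits for the E-cores, which is the load-balancing problem the post refers to.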
||
August 7, 2023, 12:31 |
|
#12 |
New Member
Join Date: Aug 2023
Posts: 3
Rep Power: 3 |
I have an i5-12600KF CPU running Win11 and I'm having problems with simulations using ANSYS (both CFX and Fluent).
First of all, I found that my PC reaches maximum performance when I run on all cores (P + E) with HT turned on. In my case, that is 16 processes. Second, almost every time I get random solver crashes for no apparent reason. I ran the same task on different PCs with i7 11th-gen CPUs without problems. I tried various combinations of process counts and BIOS settings, such as HT on/off, E-cores on/off, C-states on/off, 6/10/12/16 processes, and intelmpi/msmpi, with no luck at all. I turned off MS Defender and the firewall, did a clean Win11 install, tried Win10, etc. I tried different versions of ANSYS, starting from 2020R2. I tried various affinity commands and environment variables. I believe something is wrong with MPI or Windows on these hybrid 12th/13th gen Intel CPUs. Or maybe I have faulty hardware... Can anybody share their experience of using 12th/13th gen CPUs on Windows with ANSYS CFD? |
|
August 9, 2023, 16:37 |
|
#13 |
Senior Member
Lucky
Join Date: Apr 2011
Location: Orlando, FL USA
Posts: 5,762
Rep Power: 66 |
I run on a 12th gen i9-12950HX (with E+P HT cores enabled by the brilliant IT folks) and have no issues. Mine is an (8+8)+8 = (16)/24 configuration.
User-specific hardware problems are not uncommon. I used to have issues with a previous PC with an unlocked CPU that came with a lot of factory-overclocked settings, and I also had random crashes when the PC was heavily loaded. It was eventually resolved by running everything at base speeds and base multiplier for the CPU, RAM, and the RTX 2070 as well. |
|
August 12, 2023, 13:26 |
|
#14 |
New Member
Join Date: Aug 2023
Posts: 3
Rep Power: 3 |
So you have 24 logical cores. How many cores do you select to run Fluent? What is your Windows version? Your Fluent version? And which MPI do you use?
Last edited by rocket_science; August 13, 2023 at 10:26. |
|
September 15, 2023, 16:51 |
|
#15 | |
Senior Member
Join Date: Jun 2011
Posts: 208
Rep Power: 16 |
Quote:
As a very first step in your case, I would reseat all the RAM modules. Also, with some motherboards the stray capacitance between the chassis plate and the RAM traces on the motherboard affects the performance/stability of the memory. For example, a couple of years ago, the ASRock support team recommended that I lift their motherboard a couple of mm above the chassis, or cut off the chassis plate beneath the RAM slots. The latter fixed the memory errors I was getting. Last edited by CFDfan; September 16, 2023 at 05:11. |
||
April 16, 2024, 06:30 |
|
#16 | |
New Member
Davide
Join Date: Jun 2019
Posts: 3
Rep Power: 7 |
Quote:
I tested the 3D cavity with 1M cells (https://develop.openfoam.com/committ...oFoam/cavity3D).
The results are quite strange/interesting:
8 cores: ExecutionTime = 24.07 s ClockTime = 24 s
24 cores: ExecutionTime = 22.33 s ClockTime = 23 s
The processor has 8 performance cores and 16 efficient cores. It seems like the efficient cores are not contributing to the simulation, but according to the lscpu log they are enabled. Code:
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 si 5300,0000 800,0000 800.000
1 0 0 0 0:0:0:0 si 5300,0000 800,0000 800.000
2 0 0 1 4:4:1:0 si 5300,0000 800,0000 800.000
3 0 0 1 4:4:1:0 si 5300,0000 800,0000 800.000
4 0 0 2 8:8:2:0 si 5300,0000 800,0000 800.000
5 0 0 2 8:8:2:0 si 5300,0000 800,0000 900.007
6 0 0 3 12:12:3:0 si 5300,0000 800,0000 800.000
7 0 0 3 12:12:3:0 si 5300,0000 800,0000 800.000
8 0 0 4 16:16:4:0 si 5600,0000 800,0000 800.000
9 0 0 4 16:16:4:0 si 5600,0000 800,0000 800.000
10 0 0 5 20:20:5:0 si 5600,0000 800,0000 800.000
11 0 0 5 20:20:5:0 si 5600,0000 800,0000 800.000
12 0 0 6 24:24:6:0 si 5300,0000 800,0000 800.000
13 0 0 6 24:24:6:0 si 5300,0000 800,0000 800.000
14 0 0 7 28:28:7:0 si 5300,0000 800,0000 800.000
15 0 0 7 28:28:7:0 si 5300,0000 800,0000 800.000
16 0 0 8 32:32:8:0 si 4200,0000 800,0000 800.000
17 0 0 9 33:33:8:0 si 4200,0000 800,0000 800.000
18 0 0 10 34:34:8:0 si 4200,0000 800,0000 800.000
19 0 0 11 35:35:8:0 si 4200,0000 800,0000 800.000
20 0 0 12 36:36:9:0 si 4200,0000 800,0000 800.000
21 0 0 13 37:37:9:0 si 4200,0000 800,0000 799.876
22 0 0 14 38:38:9:0 si 4200,0000 800,0000 800.000
23 0 0 15 39:39:9:0 si 4200,0000 800,0000 800.000
24 0 0 16 40:40:10:0 si 4200,0000 800,0000 800.000
25 0 0 17 41:41:10:0 si 4200,0000 800,0000 800.000
26 0 0 18 42:42:10:0 si 4200,0000 800,0000 800.000
27 0 0 19 43:43:10:0 si 4200,0000 800,0000 800.000
28 0 0 20 44:44:11:0 si 4200,0000 800,0000 800.000
29 0 0 21 45:45:11:0 si 4200,0000 800,0000 800.000
30 0 0 22 46:46:11:0 si 4200,0000 800,0000 800.000
31 0 0 23 47:47:11:0 si 4200,0000 800,0000 800.000 |
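The lscpu listing above already shows which logical CPUs belong together: logical CPUs that share a CORE id are the two hardware threads of one core, while a CORE id that appears only once has a single hardware thread. A minimal sketch of that grouping follows. It assumes whitespace-separated columns as in the listing, uses only a shortened excerpt of the rows, and the helper name `group_by_core` is made up for illustration. On this machine the two-thread cores are the P-cores, but that mapping only holds while Hyper-Threading is enabled.

```python
from collections import defaultdict


def group_by_core(lscpu_text: str) -> dict[int, list[int]]:
    """Map each physical CORE id to the logical CPU ids that share it."""
    lines = [ln.split() for ln in lscpu_text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    cpu_col, core_col = header.index("CPU"), header.index("CORE")
    cores = defaultdict(list)
    for row in rows:
        cores[int(row[core_col])].append(int(row[cpu_col]))
    return dict(cores)


# Shortened excerpt of the listing above (six of the 32 rows).
sample = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 si 5300,0000 800,0000 800.000
1 0 0 0 0:0:0:0 si 5300,0000 800,0000 800.000
2 0 0 1 4:4:1:0 si 5300,0000 800,0000 800.000
3 0 0 1 4:4:1:0 si 5300,0000 800,0000 800.000
16 0 0 8 32:32:8:0 si 4200,0000 800,0000 800.000
17 0 0 9 33:33:8:0 si 4200,0000 800,0000 800.000
"""

cores = group_by_core(sample)
smt_cores = {c: cpus for c, cpus in cores.items() if len(cpus) > 1}
single_cores = {c: cpus for c, cpus in cores.items() if len(cpus) == 1}

print("cores with two hardware threads:", smt_cores)
print("cores with one hardware thread: ", single_cores)
```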
||
April 17, 2024, 18:01 |
|
#17 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
8 cores: ExecutionTime = 9.62 s
16 cores: ExecutionTime = 6.54 s
24 cores: ExecutionTime = 5.88 s
This workstation also shows reduced benefit from additional cores due to the memory bandwidth bottleneck. On your machine, there is the second factor of the additional cores having lower performance. To sort out the different factors, it will be necessary to run a few more cases. Before you do that, you should check your memory speed with "sudo dmidecode -t 17" if you can, or involve the IT department in obtaining the maximum DDR5 memory speed (probably 7200 MT/s). Your CFD performance is essentially proportional to the memory speed. This should be done anyway, because your results are a lot slower than this workstation's. That is not normal. This workstation has a similar performance to your system with properly tuned memory.
Keep in mind that two threads on a P-core are slower than two threads on two E-cores. Since you have no access to the BIOS, you may use an OpenMPI configuration that distributes threads over P- and E-cores, one to each core (so that you don't use P-cores for two threads). It is important to give the operating system freedom to choose which core gets which thread. This will cause P-cores to become available for slow E-core threads as soon as they finish their work. |
||
April 18, 2024, 05:26 |
|
#18 | |
New Member
Davide
Join Date: Jun 2019
Posts: 3
Rep Power: 7 |
Quote:
Code:
Array Handle: 0x000C
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: Controller1-DIMM1
Bank Locator: BANK 0
Type: DDR5
Type Detail: Synchronous
Speed: 4800 MT/s
Manufacturer: Crucial Technology
Serial Number: E7EC59E5
Asset Tag: 9876543210
Part Number: CT32G48C40U5.C16A1
Rank: 2
Configured Memory Speed: 4800 MT/s
Minimum Voltage: 1.1 V
Maximum Voltage: 1.1 V
Configured Voltage: 1.1 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Not Specified
Module Manufacturer ID: Bank 6, Hex 0x9B
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Regarding OpenMPI: do you have any suggestions? Doing some tests with bigger meshes and different solvers, I also found:
1. Going from 8 to 24 cores, the trend is the same, with a 30% speed-up limited by the memory.
2. Going from 8 to 16 cores, there is no speed-up.
My only thought is that this is caused by the E-cores acting as a bottleneck, so a different OpenMPI configuration could be useful. Thanks! Last edited by dshado; April 18, 2024 at 14:30. |
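The two fields that matter in this dmidecode record are "Speed" (what the module is rated for) and "Configured Memory Speed" (what it actually runs at). The sketch below pulls both out of a shortened excerpt of the record and, under the assumption stated earlier in the thread that CFD performance scales roughly with memory speed, estimates an upper bound on the gain from the 7200 MT/s figure that was suggested. The helper name `memory_speeds` is made up for illustration, and whether this module and board can reach that speed is not something the record shows.

```python
import re


def memory_speeds(dmidecode_text: str) -> dict[str, int]:
    """Extract rated and configured speeds (MT/s) from one Memory Device record."""
    speeds = {}
    for line in dmidecode_text.splitlines():
        m = re.match(r"\s*(Speed|Configured Memory Speed):\s*(\d+)\s*MT/s", line)
        if m:
            speeds[m.group(1)] = int(m.group(2))
    return speeds


# Shortened excerpt of the record above.
record = """\
Type: DDR5
Speed: 4800 MT/s
Part Number: CT32G48C40U5.C16A1
Configured Memory Speed: 4800 MT/s
"""

speeds = memory_speeds(record)
target = 7200  # MT/s, the figure suggested earlier in the thread (an assumption here)
ratio = target / speeds["Configured Memory Speed"]

print(speeds)
print(f"bandwidth-bound upper limit on speedup: {ratio:.2f}x")
```

Here the rated and configured speeds agree, so the memory is not running below its specification; any gain would have to come from faster modules or a different memory profile.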
||
April 19, 2024, 19:34 |
|
#19 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
The 13900K should be run with DDR5-7200. That should be perfectly stable. The CFD jobs should speed up by a ratio of ~7200/4800. That is why you should have your tech support reconfigure your memory for the higher speed. It is not expensive and really increases your productivity.
For the Ryzen 7700X you should do the same (if you care about the performance of that CPU). See the discussion here: //www.cfd-online.com/Forums/hardware/255589-g-skill-release-ddr5-8400-cl40-kit
The correct use of P-cores and E-cores has only a marginal effect on performance. I am not sure exactly how you do a parallel run in Fluent. If it is run in parallel through OpenMPI, you would call something like
"mpirun --cpu-set 0,2,4,6,8,10,12,14,16-31 -np 24 Fluent"
This should run a single thread on each of the cores without binding them, so that they can move to a faster core when one is available. I am not sure I got this exactly right, because I have not been fiddling with OpenMPI settings recently. The eventual settings can be put in a config file or remain on the command line. Note that when np < 24, the threads will execute preferentially on the performance cores, so the cpu-set should also work for lower thread counts. |
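The cpu-set string in this post picks one logical CPU from every physical core: the first hardware thread of each P-core plus every E-core. A small sketch of how such a list could be derived from a core-to-CPU mapping is shown below. The mapping is written out by hand to match the lscpu listing earlier in the thread, and the helper names `one_cpu_per_core` and `compress` are made up for illustration.

```python
def one_cpu_per_core(core_to_cpus: dict[int, list[int]]) -> list[int]:
    """Pick the lowest-numbered logical CPU of every physical core."""
    return sorted(min(cpus) for cpus in core_to_cpus.values())


def compress(cpus: list[int]) -> str:
    """Write a sorted CPU list compactly, e.g. [0, 2, 4, 5, 6] -> '0,2,4-6'."""
    parts = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu == prev + 1:      # still inside a consecutive run
            prev = cpu
            continue
        parts.append((start, prev))
        start = prev = cpu
    parts.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in parts)


# Topology as in the lscpu listing: cores 0-7 have two logical CPUs each,
# cores 8-23 have one logical CPU each (logical CPUs 16-31).
core_to_cpus = {core: [2 * core, 2 * core + 1] for core in range(8)}
core_to_cpus.update({8 + i: [16 + i] for i in range(16)})

cpu_set = compress(one_cpu_per_core(core_to_cpus))
print(cpu_set)
```

The resulting string is the value passed to `--cpu-set` in the command quoted above; how the solver itself is launched is left exactly as uncertain as it is in the post.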
|
May 4, 2024, 09:35 |
|
#20 |
New Member
WeiHeming
Join Date: Feb 2024
Posts: 2
Rep Power: 0 |
I have a similar problem (a 7700 with dual memory modules at 6000 MHz, and my laptop, a 7845HX with dual memory modules at 4800 MHz). I disabled PBO and tried Intel MPI and MS-MPI. They all crash, but with different error messages.
Intel MPI probably has some bugs when the Internet connection doesn't work. |
|