CFD Online (www.cfd-online.com), Hardware forum

Intel i9 13900K with 8 channels would be a game changer for CFD

#1 | Duke711 (Member) | January 6, 2023, 15:12
Test case: Fluent, 2.2 million cells.

The AMD 7900X "sucks".

Intel's E-cores work for CFD. The 13900K has only 8 P-cores (performance cores), yet it still speeds up when more cores are selected in the setup.

https://www.dropbox.com/s/on91aqe5zi...luent.jpg?dl=0

#2 | Habib-CFD (Member) | January 6, 2023, 17:47
Hi, something is going wrong with your benchmark:
On 8 cores the 13900K result should be at least twice as fast as the ancient 2695 v2, regardless of the memory bandwidth limit!

#3 | Duke711 (Member) | January 6, 2023, 19:00
Quote:
Originally Posted by Habib-CFD
Hi, something is going wrong with your benchmark:
On 8 cores the 13900K result should be at least twice as fast as the ancient 2695 v2, regardless of the memory bandwidth limit!

No, the 13900K has only two memory channels. The memory bandwidth of the dual 2695 v2 system, with eight memory channels and 110 GB/s, is higher than the 85 GB/s of the 13900K. The 13900K has too little memory bandwidth.
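To put these figures in context: theoretical peak DRAM bandwidth is the number of 64-bit channels times 8 bytes per transfer times the transfer rate. The Python sketch below assumes DDR3-1866 for the dual E5-2695 v2 (2 sockets x 4 channels) and DDR5-5600 for the 13900K; the memory speeds actually used in this benchmark are not stated in the thread.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes).

    Each 64-bit memory channel moves 8 bytes per transfer, so
    bandwidth = channels * 8 B * transfer rate.
    """
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Assumed configurations (not stated in the post):
# 2x E5-2695 v2: 2 sockets x 4 channels of DDR3-1866.
# i9-13900K: 2 channels of DDR5-5600.
dual_xeon = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=1866)
i9_13900k = peak_bandwidth_gbs(channels=2, mega_transfers_per_s=5600)

print(f"2x E5-2695 v2: {dual_xeon:.1f} GB/s peak")
print(f"i9-13900K:     {i9_13900k:.1f} GB/s peak")
```

Under these assumptions the peaks are about 119.4 GB/s and 89.6 GB/s; figures such as the 110 and 85 GB/s quoted above would normally sit somewhat below the theoretical peak.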

#4 | flotus1 (Alex, Super Moderator) | January 7, 2023, 06:11
Please disable the E-cores in the BIOS, disable Hyper-Threading, and run the test again with 8 threads.

#5 | Duke711 (Member) | January 7, 2023, 10:40
SMT has very little effect on solution time, about 5% to 10%. But Fluent is very buggy when too many cores are selected for too little memory bandwidth: the solution can slow down a lot during solving, with and without SMT. When I repeated the runs, the results with 12 and 16 selected cores were the same; only the 8-core run was affected by this bug. Now I think the solution times with 8, 12 and 16 selected cores are the same; the differences between 291, 322 and 286 seconds are probably within measurement tolerance. The 13900K has too few memory channels to show whether E-cores can perform well in CFD, but I think it is possible.

SMT on, E-cores on:


https://www.dropbox.com/s/xl8np06bcs...uent1.jpg?dl=0
https://www.dropbox.com/s/euthm0u6mz...uent2.jpg?dl=0
https://www.dropbox.com/s/iik0mavptq...uent3.jpg?dl=0

#6 | flotus1 (Alex, Super Moderator) | January 7, 2023, 12:56
The reason I brought this up: I strongly suspect that the results you got initially were mostly influenced by scheduler issues, i.e. the operating system not being clever enough to pin the threads exclusively to performance cores.
Now you could try to pin threads manually and monitor how that goes... or, much easier, just disable SMT and E-cores.
If I am reading this right, your second batch of results confirms my suspicion, since you were able to get pretty much maximum performance on 8 threads.
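For the manual-pinning route, a minimal Linux-only sketch using Python's `os.sched_setaffinity`. It restricts the calling process to a set of logical CPUs; processes it starts afterwards inherit the same mask. This is not a Fluent or MPI setting, and which logical CPUs belong to P-cores must be checked on the actual machine (e.g. with `lscpu -e`).

```python
import os


def pin_current_process(cpus):
    """Restrict the calling process to the given logical CPUs (Linux only).

    Processes started afterwards from this one inherit the same affinity mask.
    CPUs that this process is not currently allowed to use are ignored.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    target = set(cpus) & os.sched_getaffinity(0)
    if not target:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)
```

For example, `pin_current_process(range(0, 16, 2))` would keep a launcher, and whatever it starts, on one hardware thread of each P-core, assuming the CPU numbering of the i9-13900 `lscpu` listing in post #16 (a hypothetical choice for illustration). MPI launchers typically apply their own process binding, which may override or interact with this mask.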

#7 | Duke711 (Member) | January 7, 2023, 14:02
Turning SMT and E-cores off has no effect:

SMT and E-cores on:

4 cores -> 353 seconds
6 cores -> 316 seconds
8 cores -> 291 seconds

SMT and E-cores off:

4 cores -> 380 seconds
6 cores -> 325 seconds
8 cores -> 270 seconds
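One way to read these timings is as speedup and parallel efficiency relative to the 4-core run. A small Python sketch using the numbers above:

```python
# Wall-clock times in seconds from the post above, keyed by selected core count.
runs = {
    "SMT + E-cores on": {4: 353, 6: 316, 8: 291},
    "SMT + E-cores off": {4: 380, 6: 325, 8: 270},
}


def scaling(times):
    """Speedup and parallel efficiency relative to the smallest core count."""
    base_n = min(times)
    base_t = times[base_n]
    table = {}
    for n, t in sorted(times.items()):
        speedup = base_t / t
        efficiency = speedup / (n / base_n)   # 1.0 would be perfect scaling
        table[n] = (round(speedup, 2), round(efficiency, 2))
    return table


for label, times in runs.items():
    print(label, scaling(times))
```

Going from 4 to 8 cores gives a speedup of about 1.21 with SMT and E-cores on and 1.41 with them off, i.e. parallel efficiencies of roughly 0.61 and 0.70. Both are far from linear, which is what a memory-bandwidth-bound case would be expected to show.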

#8 | flotus1 (Alex, Super Moderator) | January 7, 2023, 14:25
Then what happened with the results in the first post here?

Anyway, I think the hypothesis stated in the first post, that E-cores in current-gen desktop CPUs are useful for CFD, has been thoroughly debunked. Not that I had much doubt about that, but it can't hurt to check from time to time.

#9 | Duke711 (Member) | January 7, 2023, 15:51
With only one P-core:

The E-cores only run at 4300 MHz (P-cores: 5500 MHz).

SMT on, all E-cores on, seven P-cores deactivated:

4 cores -> 600 seconds
6 cores -> 579 seconds
8 cores -> 440 seconds

7900X, DDR5-3600:

8 cores -> 443 seconds

2x E5-2695 v2:

4 cores -> 967 seconds
6 cores -> 743 seconds
8 cores -> 534 seconds

E-cores work for CFD. They are slower (only -15% performance loss, cf. 5500 / 4300 MHz or the 7900X), but with very high efficiency and very low power consumption.

#10 | wkernkamp (Will Kernkamp, Senior Member) | January 7, 2023, 20:06
Why did you not run the 7900X at DDR5-5200?

#11 | flotus1 (Alex, Super Moderator) | January 7, 2023, 21:10
Quote:
E-cores work for CFD. They are slower (only -15% performance loss, cf. 5500 / 4300 MHz or the 7900X), but with very high efficiency and very low power consumption.
Let's stick with comparing apples to apples. According to your results, the P-cores on the i9-13900K are 63% faster than its E-cores.
Comparing it to a kneecapped, different CPU is not the point.
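The 63% figure can be reproduced from the 8-core timings earlier in the thread: 440 s with seven P-cores deactivated (so the run lands mostly on E-cores) versus 270 s with SMT and E-cores off (P-cores only). A short Python check, which also prints the plain clock ratio 5500/4300 MHz for comparison:

```python
def percent_faster(slow_time_s, fast_time_s):
    """Throughput advantage of the faster run, in percent."""
    return (slow_time_s / fast_time_s - 1.0) * 100.0


e_core_run_s = 440  # post #9: 8 selected cores, seven P-cores deactivated (mostly E-cores)
p_core_run_s = 270  # post #7: 8 selected cores, SMT and E-cores off (P-cores only)

print(f"P-cores vs. E-cores, measured: {percent_faster(e_core_run_s, p_core_run_s):.0f}% faster")
print(f"Clock ratio 5500/4300 MHz:     {(5500 / 4300 - 1.0) * 100.0:.0f}% higher")
```

The measured gap (about 63%) is considerably larger than the clock ratio alone (about 28%).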

The point about E-cores being useless for CFD is this: you cannot usefully run a simulation across both P- and E-cores, because execution speed will be limited to whatever the slower E-cores can handle. Now, you could start to get creative with load balancing, but the 8 P-cores already provide enough FP performance to saturate the memory subsystem. It gets even worse when per-core licenses are involved.
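The load-balancing argument can be illustrated with a toy model (an assumption-laden sketch, not how Fluent actually distributes work): in each bulk-synchronous iteration every rank waits for the slowest one, and the whole job is additionally capped by a shared memory-bandwidth limit. The relative core speeds are hypothetical, with P = 1.63 x E taken from the 63% figure above.

```python
def step_time(cells_per_rank, rates, bandwidth_cap=float("inf")):
    """Toy model of one bulk-synchronous solver iteration.

    Each rank needs cells / rate time units and all ranks wait for the slowest
    one. Independently, the whole job cannot process cells faster than a shared
    memory-bandwidth cap (a crude roofline-style assumption).
    """
    compute_bound = max(c / r for c, r in zip(cells_per_rank, rates))
    bandwidth_bound = sum(cells_per_rank) / bandwidth_cap
    return max(compute_bound, bandwidth_bound)


def equal_split(total_cells, n_ranks):
    """Same number of cells on every rank (the usual default decomposition)."""
    return [total_cells / n_ranks] * n_ranks


def weighted_split(total_cells, rates):
    """Cells proportional to each rank's speed (manual load balancing)."""
    total_rate = sum(rates)
    return [total_cells * r / total_rate for r in rates]


# Hypothetical relative speeds: P-core = 1.63 x E-core.
P, E = 1.63, 1.0
CELLS = 2_200_000
p_only = [P] * 8
hybrid = [P] * 8 + [E] * 16

for cap_label, cap in [("no bandwidth cap", float("inf")),
                       ("8 P-cores saturate memory", 8 * P)]:
    t_p = step_time(equal_split(CELLS, 8), p_only, cap)
    t_eq = step_time(equal_split(CELLS, 24), hybrid, cap)
    t_w = step_time(weighted_split(CELLS, hybrid), hybrid, cap)
    print(f"{cap_label}: hybrid equal-split {t_eq / t_p:.2f}, "
          f"hybrid speed-weighted {t_w / t_p:.2f} (relative to 8 P-cores only)")
```

Without a bandwidth cap, the equal split runs every rank at E-core pace (about 0.54 of the P-only time, whereas a speed-weighted split would reach about 0.45). With the cap set so that the 8 P-cores alone saturate memory, all three configurations take the same time, so the extra cores buy nothing.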

#12 | rocket_science (New Member) | August 7, 2023, 12:31
I have an i5-12600KF CPU running Win11, and I'm having problems with simulations in ANSYS (both CFX and Fluent).

First, I found that my PC reaches maximum performance when I run on all cores (P + E) with HT turned on; in my case, 16 processes.

Second, I get random solver crashes almost every time, for no apparent reason. I run the same task on a different PC with an 11th-gen i7 CPU without problems.

I tried various combinations of processes and BIOS settings, like HT on/off, E-cores on/off, C-states on/off, 6/10/12/16 processes, and Intel MPI/MS-MPI, with no luck at all. I turned off MS Defender and the firewall, did a clean Win11 install, tried Win10, etc. I tried different versions of ANSYS starting from 2020R2, and various affinity commands and environment variables.

I believe something is wrong with MPI or Windows on these hybrid 12th/13th gen Intel CPUs. Or maybe I have faulty hardware...

Can anybody share their experience using 12th/13th gen CPUs on Windows with ANSYS CFD?

#13 | LuckyTran (Lucky, Senior Member) | August 9, 2023, 16:37
I run on a 12th-gen i9-12950HX (with E-cores, P-cores and HT all enabled by the brilliant IT folks) and have no issues. Mine is an (8+8)+8 configuration: 16 cores, 24 threads.

User-specific hardware problems are not uncommon. I used to have issues with a previous PC that had an unlocked CPU with a lot of factory-overclocked settings, and I also had random crashes when the PC was heavily loaded. It was eventually resolved by running everything at base speeds and the base multiplier: CPU, RAM, and the RTX 2070 as well.

#14 | rocket_science (New Member) | August 12, 2023, 13:26
So you have 24 logical cores. How many cores do you select to run Fluent? What are your Windows and Fluent versions, and which MPI do you use?

Last edited by rocket_science; August 13, 2023 at 10:26.

#15 | CFDfan (Senior Member) | September 15, 2023, 16:51
Quote:
Originally Posted by rocket_science
I believe something is wrong with MPI or Windows on these hybrid 12th/13th gen Intel CPUs. Or maybe I have faulty hardware...
There are free tools for checking hardware stability (CPU, memory, GPU, hard drives, etc.), like OCCT, AIDA64, IntelBurnTest, Prime95, memtest, FurMark and BurnInTest. You could also run the built-in Windows Memory Diagnostic, which tests your RAM. With such tools you might be able to identify the failing component and replace it. Overclocking the CPU and/or the memory doesn't help with crashes either.
As a very first step in your case, I would reseat all the RAM modules. Also, with some motherboards the stray capacitance between the chassis plate and the RAM traces on the motherboard affects the performance/stability of the memory. For example, a couple of years ago, the ASRock support team recommended that I lift their motherboard a couple of mm above the chassis, or cut out the chassis plate beneath the RAM slots. The latter fixed the memory errors I was getting.

Last edited by CFDfan; September 16, 2023 at 05:11.

#16 | dshado (Davide, New Member) | April 16, 2024, 06:30
Quote:
Originally Posted by flotus1
Let's stick with comparing apples to apples. The P-cores on the I9-13900k are 63% faster than its E-cores, according to your results.
Comparing it to a knee-capped, different CPU is not the point.

The point about E-cores being useless for CFD is this: You can not run a simulation across both P- and E-cores. It will limit execution speed to whatever the slower E-cores can handle. Now you could start to get creative with load balancing, but: the 8 P-cores already provide enough FP performance to saturate the memory subsystem. It gets even worse when per-core licenses are involved.
Actually, I'm testing an Intel i9-13900 with OpenFOAM v2312 on Ubuntu 22.04, with efficiency cores and HT enabled by default as far as I know (I can't access the BIOS).

I tested the 3D cavity case with 1M cells (https://develop.openfoam.com/committ...oFoam/cavity3D)

The results are quite strange/interesting:

8 cores: ExecutionTime = 24.07 s ClockTime = 24 s
24 cores: ExecutionTime = 22.33 s ClockTime = 23 s

The processor has 8 performance cores and 16 efficiency cores. It seems the efficiency cores are not contributing to the simulation, but the lscpu log suggests they are enabled.

Code:
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ     MHZ
  0    0      0    0 0:0:0:0           si 5300,0000 800,0000 800.000
  1    0      0    0 0:0:0:0           si 5300,0000 800,0000 800.000
  2    0      0    1 4:4:1:0           si 5300,0000 800,0000 800.000
  3    0      0    1 4:4:1:0           si 5300,0000 800,0000 800.000
  4    0      0    2 8:8:2:0           si 5300,0000 800,0000 800.000
  5    0      0    2 8:8:2:0           si 5300,0000 800,0000 900.007
  6    0      0    3 12:12:3:0         si 5300,0000 800,0000 800.000
  7    0      0    3 12:12:3:0         si 5300,0000 800,0000 800.000
  8    0      0    4 16:16:4:0         si 5600,0000 800,0000 800.000
  9    0      0    4 16:16:4:0         si 5600,0000 800,0000 800.000
 10    0      0    5 20:20:5:0         si 5600,0000 800,0000 800.000
 11    0      0    5 20:20:5:0         si 5600,0000 800,0000 800.000
 12    0      0    6 24:24:6:0         si 5300,0000 800,0000 800.000
 13    0      0    6 24:24:6:0         si 5300,0000 800,0000 800.000
 14    0      0    7 28:28:7:0         si 5300,0000 800,0000 800.000
 15    0      0    7 28:28:7:0         si 5300,0000 800,0000 800.000
 16    0      0    8 32:32:8:0         si 4200,0000 800,0000 800.000
 17    0      0    9 33:33:8:0         si 4200,0000 800,0000 800.000
 18    0      0   10 34:34:8:0         si 4200,0000 800,0000 800.000
 19    0      0   11 35:35:8:0         si 4200,0000 800,0000 800.000
 20    0      0   12 36:36:9:0         si 4200,0000 800,0000 800.000
 21    0      0   13 37:37:9:0         si 4200,0000 800,0000 799.876
 22    0      0   14 38:38:9:0         si 4200,0000 800,0000 800.000
 23    0      0   15 39:39:9:0         si 4200,0000 800,0000 800.000
 24    0      0   16 40:40:10:0        si 4200,0000 800,0000 800.000
 25    0      0   17 41:41:10:0        si 4200,0000 800,0000 800.000
 26    0      0   18 42:42:10:0        si 4200,0000 800,0000 800.000
 27    0      0   19 43:43:10:0        si 4200,0000 800,0000 800.000
 28    0      0   20 44:44:11:0        si 4200,0000 800,0000 800.000
 29    0      0   21 45:45:11:0        si 4200,0000 800,0000 800.000
 30    0      0   22 46:46:11:0        si 4200,0000 800,0000 800.000
 31    0      0   23 47:47:11:0        si 4200,0000 800,0000 800.000
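In the listing above, logical CPUs 0-15 come in pairs sharing a CORE id (8 P-cores with Hyper-Threading, MAXMHZ 5300/5600), while CPUs 16-31 each have their own CORE id (16 E-cores, MAXMHZ 4200). A minimal Python sketch that makes this split from `lscpu -e` style text, assuming that only P-cores have SMT siblings:

```python
from collections import defaultdict


def classify_cores(lscpu_e_output):
    """Split logical CPUs into (P-core CPUs, E-core CPUs) from `lscpu -e` text.

    Assumption: on these hybrid Intel parts only P-cores have Hyper-Threading,
    so a CORE id shared by two logical CPUs marks a P-core and a CORE id with
    a single logical CPU marks an E-core. This breaks if HT is disabled.
    """
    cpus_by_core = defaultdict(list)
    lines = lscpu_e_output.strip().splitlines()
    for line in lines[1:]:                # first line is the column header
        fields = line.split()
        cpu, core = int(fields[0]), int(fields[3])
        cpus_by_core[core].append(cpu)
    p_cpus = sorted(c for cpus in cpus_by_core.values() if len(cpus) > 1 for c in cpus)
    e_cpus = sorted(c for cpus in cpus_by_core.values() if len(cpus) == 1 for c in cpus)
    return p_cpus, e_cpus


# A few rows copied from the i9-13900 listing above.
sample = """
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ     MHZ
  0    0      0    0 0:0:0:0           si 5300,0000 800,0000 800.000
  1    0      0    0 0:0:0:0           si 5300,0000 800,0000 800.000
  2    0      0    1 4:4:1:0           si 5300,0000 800,0000 800.000
  3    0      0    1 4:4:1:0           si 5300,0000 800,0000 800.000
 16    0      0    8 32:32:8:0         si 4200,0000 800,0000 800.000
 17    0      0    9 33:33:8:0         si 4200,0000 800,0000 800.000
"""

p_cpus, e_cpus = classify_cores(sample)
print("P-core logical CPUs:", p_cpus)
print("E-core logical CPUs:", e_cpus)
```

Applied to the full listing, this would give CPUs 0-15 as P-core threads and 16-31 as E-cores, which is the information needed to decide where MPI ranks should go.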

#17 | wkernkamp (Will Kernkamp, Senior Member) | April 17, 2024, 18:01
Quote:
Originally Posted by dshado
Actually, I'm testing an Intel i9-13900 with OpenFOAM v2312 on Ubuntu 22.04, with efficiency cores and HT enabled by default as far as I know (I can't access the BIOS).

I tested the 3D cavity case with 1M cells (https://develop.openfoam.com/committ...oFoam/cavity3D)

The results are quite strange/interesting:

8 cores: ExecutionTime = 24.07 s ClockTime = 24 s
24 cores: ExecutionTime = 22.33 s ClockTime = 23 s
For comparison, I ran this case on a dual E5-2697 v2 (24 cores and 48 threads total) with 8-channel (2x 4-channel) memory at 1866 MT/s. Total memory is 128 GB.

8 cores: ExecutionTime = 9.62 s
16 cores: ExecutionTime = 6.54 s
24 cores: ExecutionTime = 5.88 s

This workstation also shows a reduced benefit from additional cores, due to the memory bandwidth bottleneck. On your machine there is a second factor: the additional cores have lower performance. To sort out the different factors, it will be necessary to run a few more cases. Before you do that, you should check your memory speed with "sudo dmidecode -t 17" if you can, or involve the IT department in getting the maximum DDR5 memory speed (probably 7200 MT/s). Your CFD performance is essentially proportional to the memory speed. This should be done anyway, because your results are a lot slower than this workstation's. That is not normal: with properly tuned memory, your system should perform similarly to this workstation.

Keep in mind that two threads on one P-core are slower than two threads on two E-cores. Since you have no access to the BIOS, you may use an Open MPI configuration that distributes threads over P- and E-cores, one per core (so that no P-core runs two threads). It is important to give the operating system the freedom to choose which core gets which thread. This way, P-cores become available for slow E-core threads as soon as they finish their own work.

#18 | dshado (Davide, New Member) | April 18, 2024, 05:26
Quote:
Originally Posted by wkernkamp
Before you do that, you should check your memory speed with "sudo dmidecode -t 17" if you can, or involve the IT department in getting the maximum DDR5 memory speed (probably 7200 MT/s). [...]
Since you have no access to the BIOS, you may use an Open MPI configuration that distributes threads over P- and E-cores, one per core (so that no P-core runs two threads). [...]
That's the dmidecode output for one of the RAM modules. The speed is 4800 MT/s.

Code:
	Array Handle: 0x000C
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: Controller1-DIMM1
	Bank Locator: BANK 0
	Type: DDR5
	Type Detail: Synchronous
	Speed: 4800 MT/s
	Manufacturer: Crucial Technology
	Serial Number: E7EC59E5
	Asset Tag: 9876543210
	Part Number: CT32G48C40U5.C16A1  
	Rank: 2
	Configured Memory Speed: 4800 MT/s
	Minimum Voltage: 1.1 V
	Maximum Voltage: 1.1 V
	Configured Voltage: 1.1 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: Not Specified
	Module Manufacturer ID: Bank 6, Hex 0x9B
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None
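With four DIMMs installed, it is worth checking that every module reports the same configured speed. A hedged Python sketch that extracts size and configured speed from `dmidecode -t 17` text; the sample below is an abbreviated, hypothetical excerpt in the same format as the output above, and in practice the text would come from running `sudo dmidecode -t 17`.

```python
import re


def dimm_summary(dmidecode_text):
    """Return (size_gb, configured_speed) for each populated DIMM in `dmidecode -t 17` text.

    Empty slots (e.g. "Size: No Module Installed") are skipped. Some older
    dmidecode versions label the field "Configured Clock Speed" and use MHz,
    so both spellings are accepted.
    """
    dimms = []
    for block in dmidecode_text.split("Memory Device")[1:]:
        size = re.search(r"^\s*Size:\s*(\d+)\s*GB", block, re.MULTILINE)
        speed = re.search(
            r"^\s*Configured (?:Memory|Clock) Speed:\s*(\d+)\s*(?:MT/s|MHz)",
            block, re.MULTILINE)
        if size and speed:
            dimms.append((int(size.group(1)), int(speed.group(1))))
    return dimms


# Abbreviated, hypothetical sample: one populated slot and one empty slot.
sample = """Memory Device
\tSize: 32 GB
\tLocator: Controller1-DIMM1
\tSpeed: 4800 MT/s
\tConfigured Memory Speed: 4800 MT/s
Memory Device
\tSize: No Module Installed
\tLocator: Controller1-DIMM2
"""

dimms = dimm_summary(sample)
print(f"{len(dimms)} DIMM(s), {sum(s for s, _ in dimms)} GB total, "
      f"slowest configured speed {min(m for _, m in dimms)} MT/s")
```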
This system has 4x 32 GB of memory. Is there something IT could do about that?

Regarding Open MPI: do you have any suggestions?
Doing some tests with bigger meshes and different solvers, I also found:

1. Going from 8 to 24 cores, the trend is the same, with a 30% speed-up limited by the memory.
2. Going from 8 to 16 cores, there is no speed-up. My only thought is that this is caused by a bottleneck from the E-cores, so a different Open MPI configuration could be useful.


Thanks!

Last edited by dshado; April 18, 2024 at 14:30.

#19 | wkernkamp (Will Kernkamp, Senior Member) | April 19, 2024, 19:34
The 13900K should be run with DDR5-7200. That should be perfectly stable. CFD jobs should speed up by a ratio of ~7200/4800. That is why you should have your tech support reconfigure your memory for the higher speed. It is not expensive and really increases your productivity.
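The expected gain follows from the assumption stated above that CFD runtime scales inversely with the memory transfer rate. A minimal Python sketch applying it to dshado's cavity3D timings at DDR5-4800; this is the ideal, fully bandwidth-bound case, so it is best read as an upper bound on the gain.

```python
def bandwidth_bound_time(measured_s, current_mts, target_mts):
    """Projected runtime if the job were purely memory-bandwidth bound.

    Assumes runtime is inversely proportional to the memory transfer rate,
    which is an idealisation: real CFD runs are only partly bandwidth bound.
    """
    return measured_s * current_mts / target_mts


# dshado's cavity3D timings at DDR5-4800 (posts #16 and #18), projected to DDR5-7200.
for cores, t in [(8, 24.07), (24, 22.33)]:
    print(f"{cores} cores: {t:.2f} s -> {bandwidth_bound_time(t, 4800, 7200):.2f} s")
```

Under this assumption the 24-core run would drop from 22.33 s to about 14.9 s, and the 8-core run from 24.07 s to about 16.0 s.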

For the Ryzen 7700X you should do the same (if you care about the performance of that CPU). See the discussion here: //www.cfd-online.com/Forums/hardware/255589-g-skill-release-ddr5-8400-cl40-kit

The correct use of P-cores and E-cores has only a marginal effect on performance. I'm not sure exactly how you do a parallel run in Fluent. If it runs in parallel through Open MPI, you would call something like "mpirun --cpu-set 0,2,4,6,8,10,12,14,16-31 -np 24 Fluent". This should run a single thread on each of the cores without binding them, so they can move to a faster core when one becomes available. I'm not sure I got this exactly right, because I have not been fiddling with Open MPI settings recently. The final settings can be put in a config file or stay on the command line.

Note that when np < 24, the threads will preferentially execute on the performance cores, so the cpu-set should also work for lower thread counts.
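The cpu-set list above picks one logical CPU per physical core: the first hardware thread of each P-core (0, 2, ..., 14) plus all E-cores (16-31), using the i9-13900 numbering from the `lscpu` output in post #16. A small Python sketch that derives such a list from a CPU-to-core mapping; it only builds the string and makes no claim about how a particular `mpirun` version interprets it.

```python
def one_cpu_per_core(cpu_to_core):
    """Pick the lowest-numbered logical CPU of each physical core."""
    first = {}
    for cpu, core in sorted(cpu_to_core.items()):
        first.setdefault(core, cpu)
    return sorted(first.values())


def to_range_list(cpus):
    """Compress a sorted CPU list such as [0, 2, 16, 17, 18] into '0,2,16-18'."""
    parts = []
    start = prev = cpus[0]
    for c in cpus[1:] + [None]:          # None flushes the last run
        if c is not None and c == prev + 1:
            prev = c
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        if c is not None:
            start = prev = c
    return ",".join(parts)


# CPU -> CORE mapping of the i9-13900 from the lscpu output in post #16:
# logical CPUs 0-15 are SMT pairs on P-cores 0-7, CPUs 16-31 are E-cores 8-23.
mapping = {cpu: cpu // 2 for cpu in range(16)}
mapping.update({cpu: cpu - 8 for cpu in range(16, 32)})

print(to_range_list(one_cpu_per_core(mapping)))
```

On this mapping the sketch reproduces the list used in the mpirun example, `0,2,4,6,8,10,12,14,16-31`.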

#20 | mithraLa (WeiHeming, New Member) | May 4, 2024, 09:35
I have a similar problem (Ryzen 7700 with two memory modules at 6000 MT/s, and my laptop's 7845HX with two memory modules at 4800 MT/s). I disabled PBO and tried both Intel MPI and MS-MPI; they all crash, just with different error messages.
Intel MPI probably has some bugs when the Internet connection is down.

All times are GMT -4.