Intel i9-13900K with 8 memory channels would be a game changer for CFD

Duke711 · January 6, 2023, 15:12

Test case: Fluent, 2.2 million cells.

AMD 7900X "sucks".

Intel's E-cores work for CFD. The 13900K has only 8 P-cores (performance cores), yet it speeds up when more cores are selected in the setup.

https://www.dropbox.com/s/on91aqe5zi...luent.jpg?dl=0

Habib-CFD · January 6, 2023, 17:47

Hi, something is going wrong with your benchmark:
On 8 cores the 13900k result should be at least twice as fast as the ancient 2695 v2 regardless of the memory bandwidth limit!

Duke711 · January 6, 2023, 19:00

Quote:

Originally Posted by Habib-CFD

Hi, something is going wrong with your benchmark:
On 8 cores the 13900k result should be at least twice as fast as the ancient 2695 v2 regardless of the memory bandwidth limit!

No, the 13900K has only two memory channels. The memory bandwidth of the 2695 v2 system, with eight memory channels and 110 GB/s, is higher than the 85 GB/s of the 13900K. The 13900K has too little memory bandwidth.
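
The bandwidth comparison can be sanity-checked with a back-of-envelope estimate: theoretical peak bandwidth is roughly channels x 8 bytes x transfer rate. The sketch below is only an illustration. The memory speeds (DDR3-1866 for the dual Xeon, DDR5-5600 for the 13900K) are assumptions that the post does not state, and the 110 GB/s and 85 GB/s figures quoted above may be measured values rather than theoretical peaks.

```python
def peak_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s for 64-bit (8-byte) memory channels."""
    return channels * bus_bytes * mt_per_s / 1000.0


# Assumed configurations (memory speeds are NOT given in the thread):
xeon_dual = peak_bandwidth_gbs(channels=8, mt_per_s=1866)  # 2x E5-2695 v2, DDR3-1866
i9_13900k = peak_bandwidth_gbs(channels=2, mt_per_s=5600)  # i9-13900K, DDR5-5600

print(f"2x E5-2695 v2: {xeon_dual:.1f} GB/s")
print(f"i9-13900K:     {i9_13900k:.1f} GB/s")
```

Under these assumptions the old dual-socket system still comes out ahead on paper, which is consistent with the point being made here.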

flotus1 · January 7, 2023, 06:11

Please disable the E-cores in Bios, disable Hyperthreading, and run the test again with 8 threads.

Duke711 · January 7, 2023, 10:40

SMT has little effect on solution time, about 5% to 10%. But Fluent behaves very erratically when too many cores are selected for too little memory bandwidth. The solver can slow down considerably during a run, with and without SMT. When I repeated the runs, the results with 12 and 16 selected cores were the same; only the run with 8 selected cores had been hit by this bug. I now think the solution time with 8, 12 and 16 selected cores is the same. The differences between 291, 322 and 286 seconds are probably within measurement tolerance. The 13900K has too few memory channels to find out whether the E-cores can perform in CFD. I think it is possible.

SMT on, E-Cores on

https://www.dropbox.com/s/xl8np06bcs...uent1.jpg?dl=0
https://www.dropbox.com/s/euthm0u6mz...uent2.jpg?dl=0
https://www.dropbox.com/s/iik0mavptq...uent3.jpg?dl=0

flotus1 · January 7, 2023, 12:56

The reason I brought this up: I highly suspect that the results you got initially were mostly influenced by scheduler issues. I.e. the operating system not being clever enough to pin the threads exclusively to performance cores.
Now you could try to manually pin threads and monitor how that goes... or much easier, just disable SMT and E-cores.
If I am reading this right, your second batch of results confirms my suspicion. Since you were able to get pretty much maximum performance on 8 threads.

Duke711 · January 7, 2023, 14:02

Turning SMT and E-cores off has no effect.

SMT and E-cores on:

4 cores: 353 seconds
6 cores: 316 seconds
8 cores: 291 seconds

SMT and E-cores off:

4 cores: 380 seconds
6 cores: 325 seconds
8 cores: 270 seconds
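
To put these timings in perspective, here is a small sketch that computes speedup and parallel efficiency relative to the 4-core run of each series. The function name `scaling` is made up for illustration; the numbers are the ones listed above.

```python
timings = {
    "SMT and E-cores on": {4: 353, 6: 316, 8: 291},
    "SMT and E-cores off": {4: 380, 6: 325, 8: 270},
}


def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the smallest core count."""
    base_cores = min(times)
    base_time = times[base_cores]
    result = {}
    for cores, seconds in sorted(times.items()):
        speedup = base_time / seconds
        efficiency = speedup / (cores / base_cores)
        result[cores] = (speedup, efficiency)
    return result


for label, times in timings.items():
    print(label)
    for cores, (speedup, efficiency) in scaling(times).items():
        print(f"  {cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Doubling the core count from 4 to 8 gives well under a 2x speedup in both series, which is what one would expect from a memory-bound case.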

flotus1 · January 7, 2023, 14:25

Then what happened with the results in the first post here?

Anyway, I think the hypothesis stated in the first post -about E-cores in current-gen desktop CPUs being useful for CFD- has been thoroughly debunked. Not that I had much doubt about that, but it can't hurt to check from time to time.

Duke711 · January 7, 2023, 15:51

With only one P-core:

E-cores run at only 4300 MHz (P-cores: 5500 MHz).

SMT on, all E-cores on, seven P-cores deactivated:

4 cores: 600 seconds
6 cores: 579 seconds
8 cores: 440 seconds

7900X, DDR5 3600 MHz:

8 cores: 443 seconds

2x E5-2695 v2:

4 cores: 967 seconds
6 cores: 743 seconds
8 cores: 534 seconds

E-cores work for CFD. They are slower (only about 15% performance lost; compare 5500 vs. 4300 MHz, or the 7900X) but offer very high efficiency and very low power consumption.

wkernkamp · January 7, 2023, 20:06

Why did you not run the 7900x at DDR5-5200?

flotus1 · January 7, 2023, 21:10

Quote:

E-cores work for CFD. They are slower (only about 15% performance lost; compare 5500 vs. 4300 MHz, or the 7900X) but offer very high efficiency and very low power consumption.

Let's stick with comparing apples to apples. The P-cores on the I9-13900k are 63% faster than its E-cores, according to your results.
Comparing it to a knee-capped, different CPU is not the point.

The point about E-cores being useless for CFD is this: You can not run a simulation across both P- and E-cores. It will limit execution speed to whatever the slower E-cores can handle. Now you could start to get creative with load balancing, but: the 8 P-cores already provide enough FP performance to saturate the memory subsystem. It gets even worse when per-core licenses are involved.
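
The 63% figure and the "limited by the slower cores" argument can be illustrated with a toy model. This is only a sketch under strong assumptions: equal-sized partitions, one synchronisation point per iteration, and no memory bandwidth limit (which the post argues is the real constraint). The 440 s run also still had one P-core active, so the ratio is approximate. All names are made up for illustration.

```python
# Timings taken from the 8-thread runs reported earlier in the thread (seconds).
t_mostly_e_cores = 440.0  # SMT on, all E-cores on, seven P-cores deactivated
t_p_cores_only = 270.0    # SMT and E-cores off

# Relative speed of a P-core if an E-core is defined as 1.0.
p_over_e = t_mostly_e_cores / t_p_cores_only


def iteration_time(work_per_rank, speeds):
    """With a sync point every iteration, the slowest rank sets the pace."""
    return max(work_per_rank / speed for speed in speeds)


# 8 P-cores plus 16 E-cores, every rank gets the same amount of work.
speeds = [p_over_e] * 8 + [1.0] * 16
t_iter = iteration_time(1.0, speeds)

# Fraction of each iteration a P-core spends waiting for the E-cores.
p_idle = 1.0 - (1.0 / p_over_e) / t_iter

print(f"P-core / E-core speed ratio: {p_over_e:.2f}")
print(f"P-cores idle for {p_idle:.0%} of each iteration")
```

In this model every rank effectively runs at E-core speed, so the P-cores wait for a large part of each iteration unless the partitions are weighted by core speed.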

rocket_science · August 7, 2023, 12:31

I have an i5-12600KF CPU running Win11 and I'm having problems with simulations using ANSYS (both CFX and Fluent).

First, I found that my PC performs best when I run on all cores (P + E) with HT turned on; in my case, 16 processes.

Second, almost every time I get random solver crashes for no apparent reason. I ran the same task on different PCs with 11th-gen i7 CPUs without problems.

I tried various combinations of process counts and BIOS settings, like HT on/off, E-cores on/off, C-states on/off, 6/10/12/16 processes, intelmpi/msmpi, with no luck at all. I turned off MS Defender and the firewall, did a clean Win11 install, tried Win10, etc. I tried different versions of ANSYS starting from 2020R2. I tried various affinity commands and environment variables.

I believe something is wrong with MPI or Windows for these hybrid 12th/13th-gen Intel CPUs. Or maybe I have faulty hardware...

Can anybody share their experience using 12th/13th-gen CPUs on Windows with ANSYS CFD?

LuckyTran · August 9, 2023, 16:37

I run on a 12th-gen i9-12950HX (with E+P HT cores enabled by the brilliant IT folks) and have no issues. Mine is an (8+8)+8 = (16)/24 configuration.

User-specific hardware problems are not uncommon. I used to have issues with a previous PC with an unlocked CPU that came with a lot of factory-overclocked settings, and I also had random crashes when the PC was heavily loaded. It was eventually resolved by running everything at base speeds and base multiplier for the CPU, RAM, and the RTX 2070 as well.

rocket_science · August 12, 2023, 13:26

So you have 24 logical cores. How many cores do you select to run Fluent? What are your Windows and Fluent versions? And which MPI do you use?

CFDfan · September 15, 2023, 16:51

Quote:

Originally Posted by rocket_science

I have an i5-12600KF CPU running Win11 and I'm having problems with simulations using ANSYS (both CFX and Fluent).

First, I found that my PC performs best when I run on all cores (P + E) with HT turned on; in my case, 16 processes.

Second, almost every time I get random solver crashes for no apparent reason. I ran the same task on different PCs with 11th-gen i7 CPUs without problems.

I tried various combinations of process counts and BIOS settings, like HT on/off, E-cores on/off, C-states on/off, 6/10/12/16 processes, intelmpi/msmpi, with no luck at all. I turned off MS Defender and the firewall, did a clean Win11 install, tried Win10, etc. I tried different versions of ANSYS starting from 2020R2. I tried various affinity commands and environment variables.

I believe something is wrong with MPI or Windows for these hybrid 12th/13th-gen Intel CPUs. Or maybe I have faulty hardware...

Can anybody share their experience using 12th/13th-gen CPUs on Windows with ANSYS CFD?

There are free tools for checking the stability of the hardware (CPU, memory, GPU, hard drives, etc.), like OCCT, Aida64, IntelBurnTest, Prime95, memtest, FurMark, and BurnInTest. You could also run the "Windows Memory Diagnostic" tool built into Windows, which will test your RAM. By running such tools you might be able to identify the hardware component that is failing and replace it. Overclocking the CPU and/or the memory doesn't help with the crashes either.
As a very first step in your case I would reseat all the RAM modules. Also, with some motherboards the stray capacitance between the chassis plate and the RAM traces on the motherboard affects the performance/stability of the memory. For example, a couple of years ago the ASRock support team recommended that I lift their motherboard a couple of mm above the chassis, or cut off the chassis plate beneath the RAM slots. The latter fixed the memory errors I was getting.

dshado · April 16, 2024, 06:30

Quote:

Originally Posted by flotus1

Let's stick with comparing apples to apples. The P-cores on the I9-13900k are 63% faster than its E-cores, according to your results.
Comparing it to a knee-capped, different CPU is not the point.

The point about E-cores being useless for CFD is this: You can not run a simulation across both P- and E-cores. It will limit execution speed to whatever the slower E-cores can handle. Now you could start to get creative with load balancing, but: the 8 P-cores already provide enough FP performance to saturate the memory subsystem. It gets even worse when per-core licenses are involved.

I'm currently testing an Intel i9-13900 with OpenFOAM v2312 on Ubuntu 22.04. As far as I know, the efficiency cores and HT are enabled by default (I can't access the BIOS).

I tested the 3D cavity with 1M cells (https://develop.openfoam.com/committ...oFoam/cavity3D).

The results are quite strange/interesting:

8 cores: ExecutionTime = 24.07 s ClockTime = 24 s
24 cores: ExecutionTime = 22.33 s ClockTime = 23 s

The processor has 8 performance cores and 16 efficiency cores. It seems like the efficiency cores are not contributing to the simulation, but according to the lscpu output they are enabled.

Code:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ     MHZ
  0    0      0    0 0:0:0:0           si 5300,0000 800,0000 800.000
  1    0      0    0 0:0:0:0           si 5300,0000 800,0000 800.000
  2    0      0    1 4:4:1:0           si 5300,0000 800,0000 800.000
  3    0      0    1 4:4:1:0           si 5300,0000 800,0000 800.000
  4    0      0    2 8:8:2:0           si 5300,0000 800,0000 800.000
  5    0      0    2 8:8:2:0           si 5300,0000 800,0000 900.007
  6    0      0    3 12:12:3:0         si 5300,0000 800,0000 800.000
  7    0      0    3 12:12:3:0         si 5300,0000 800,0000 800.000
  8    0      0    4 16:16:4:0         si 5600,0000 800,0000 800.000
  9    0      0    4 16:16:4:0         si 5600,0000 800,0000 800.000
 10    0      0    5 20:20:5:0         si 5600,0000 800,0000 800.000
 11    0      0    5 20:20:5:0         si 5600,0000 800,0000 800.000
 12    0      0    6 24:24:6:0         si 5300,0000 800,0000 800.000
 13    0      0    6 24:24:6:0         si 5300,0000 800,0000 800.000
 14    0      0    7 28:28:7:0         si 5300,0000 800,0000 800.000
 15    0      0    7 28:28:7:0         si 5300,0000 800,0000 800.000
 16    0      0    8 32:32:8:0         si 4200,0000 800,0000 800.000
 17    0      0    9 33:33:8:0         si 4200,0000 800,0000 800.000
 18    0      0   10 34:34:8:0         si 4200,0000 800,0000 800.000
 19    0      0   11 35:35:8:0         si 4200,0000 800,0000 800.000
 20    0      0   12 36:36:9:0         si 4200,0000 800,0000 800.000
 21    0      0   13 37:37:9:0         si 4200,0000 800,0000 799.876
 22    0      0   14 38:38:9:0         si 4200,0000 800,0000 800.000
 23    0      0   15 39:39:9:0         si 4200,0000 800,0000 800.000
 24    0      0   16 40:40:10:0        si 4200,0000 800,0000 800.000
 25    0      0   17 41:41:10:0        si 4200,0000 800,0000 800.000
 26    0      0   18 42:42:10:0        si 4200,0000 800,0000 800.000
 27    0      0   19 43:43:10:0        si 4200,0000 800,0000 800.000
 28    0      0   20 44:44:11:0        si 4200,0000 800,0000 800.000
 29    0      0   21 45:45:11:0        si 4200,0000 800,0000 800.000
 30    0      0   22 46:46:11:0        si 4200,0000 800,0000 800.000
 31    0      0   23 47:47:11:0        si 4200,0000 800,0000 800.000
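
A listing like this can be split into P-cores and E-cores programmatically. The sketch below groups logical CPUs by the CORE column and treats cores with two logical CPUs as hyperthreaded P-cores and cores with one logical CPU as E-cores. That heuristic is an assumption that only holds while HT is enabled; the function names and the shortened sample are made up for illustration.

```python
from collections import defaultdict

# Shortened excerpt of the listing above (same column layout).
SAMPLE = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ     MHZ
  0    0      0    0 0:0:0:0           si 5300,0000 800,0000 800.000
  1    0      0    0 0:0:0:0           si 5300,0000 800,0000 800.000
 16    0      0    8 32:32:8:0         si 4200,0000 800,0000 800.000
 17    0      0    9 33:33:8:0         si 4200,0000 800,0000 800.000
"""


def cpus_per_core(lscpu_text):
    """Map each physical core id to the list of its logical CPU ids."""
    header, *rows = lscpu_text.strip().splitlines()
    columns = header.split()
    cpu_idx, core_idx = columns.index("CPU"), columns.index("CORE")
    cores = defaultdict(list)
    for row in rows:
        fields = row.split()
        cores[int(fields[core_idx])].append(int(fields[cpu_idx]))
    return dict(cores)


def split_p_and_e(cores):
    """Heuristic: two logical CPUs per core means P-core, one means E-core."""
    p_cores = {c: cpus for c, cpus in cores.items() if len(cpus) > 1}
    e_cores = {c: cpus for c, cpus in cores.items() if len(cpus) == 1}
    return p_cores, e_cores


p_cores, e_cores = split_p_and_e(cpus_per_core(SAMPLE))
print("P-cores:", p_cores)
print("E-cores:", e_cores)
```

Applied to the full listing, this would give 8 P-cores (logical CPUs 0 to 15) and 16 E-cores (logical CPUs 16 to 31), matching the MAXMHZ column.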

wkernkamp · April 17, 2024, 18:01

Quote:

Originally Posted by dshado

I'm currently testing an Intel i9-13900 with OpenFOAM v2312 on Ubuntu 22.04. As far as I know, the efficiency cores and HT are enabled by default (I can't access the BIOS).

I tested the 3D cavity with 1M cells (https://develop.openfoam.com/committ...oFoam/cavity3D).

The results are quite strange/interesting:

8 cores: ExecutionTime = 24.07 s ClockTime = 24 s
24 cores: ExecutionTime = 22.33 s ClockTime = 23 s

For comparison, I ran this case on Dual E5-2697 v2, 24 cores and 48 threads total, with 8-channel (2x4-channel) memory at 1866 MT/s. Total memory is 128 GB.

8 core: ExecutionTime = 9.62 s
16 core: ExecutionTime = 6.54 s
24 core: ExecutionTime = 5.88 s

This workstation also shows reduced benefit from additional cores due to the memory bandwidth bottleneck. On your machine, there is a second factor: the additional cores have lower performance. To sort out the different factors, it will be necessary to run a few more cases. Before you do that, you should check your memory speed with "sudo dmidecode -t 17" if you can, or involve the IT department in obtaining the maximum DDR5 memory speed (probably 7200 MT/s). Your CFD performance is essentially proportional to the memory speed. This should be done anyway, because your results are a lot slower than this workstation's. That is not normal. With properly tuned memory, your system should perform similarly to this workstation.

Keep in mind that two threads on a P-core are slower than two threads on two E-cores. Since you have no access to the BIOS, you may use an openmpi configuration that distributes threads over P- and E-cores, one to each core (so that you don't use P-cores for two threads). It is important to give the operating system freedom to choose which core gets which thread. This will cause P-cores to become available for slow E-core threads as soon as they finish their work.

dshado · April 18, 2024, 05:26

Quote:

Originally Posted by wkernkamp

For comparison, I ran this case on Dual E5-2697 v2, 24 cores and 48 threads total, with 8-channel (2x4-channel) memory at 1866 MT/s. Total memory is 128 GB.

8 core: ExecutionTime = 9.62 s
16 core: ExecutionTime = 6.54 s
24 core: ExecutionTime = 5.88 s

This workstation also shows reduced benefit from additional cores due to the memory bandwidth bottleneck. On your machine, there is a second factor: the additional cores have lower performance. To sort out the different factors, it will be necessary to run a few more cases. Before you do that, you should check your memory speed with "sudo dmidecode -t 17" if you can, or involve the IT department in obtaining the maximum DDR5 memory speed (probably 7200 MT/s). Your CFD performance is essentially proportional to the memory speed. This should be done anyway, because your results are a lot slower than this workstation's. That is not normal. With properly tuned memory, your system should perform similarly to this workstation.

Keep in mind that two threads on a P-core are slower than two threads on two E-cores. Since you have no access to the BIOS, you may use an openmpi configuration that distributes threads over P- and E-cores, one to each core (so that you don't use P-cores for two threads). It is important to give the operating system freedom to choose which core gets which thread. This will cause P-cores to become available for slow E-core threads as soon as they finish their work.

Here is the dmidecode output for one of the RAM modules. The speed is 4800 MT/s.

Code:

	Array Handle: 0x000C
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: Controller1-DIMM1
	Bank Locator: BANK 0
	Type: DDR5
	Type Detail: Synchronous
	Speed: 4800 MT/s
	Manufacturer: Crucial Technology
	Serial Number: E7EC59E5
	Asset Tag: 9876543210
	Part Number: CT32G48C40U5.C16A1  
	Rank: 2
	Configured Memory Speed: 4800 MT/s
	Minimum Voltage: 1.1 V
	Maximum Voltage: 1.1 V
	Configured Voltage: 1.1 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: Not Specified
	Module Manufacturer ID: Bank 6, Hex 0x9B
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None
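
A quick way to read such output is to compare the "Speed" field (the module's rated speed) with "Configured Memory Speed" (the speed it actually runs at). The sketch below parses a shortened copy of the listing above; the helper names are made up for illustration.

```python
# Shortened excerpt of the dmidecode output above.
SAMPLE = """\
    Size: 32 GB
    Type: DDR5
    Speed: 4800 MT/s
    Part Number: CT32G48C40U5.C16A1
    Configured Memory Speed: 4800 MT/s
"""


def parse_fields(text):
    """Turn 'Key: value' lines into a dictionary, skipping anything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def mts(value):
    """Extract the integer transfer rate from a string like '4800 MT/s'."""
    return int(value.split()[0])


fields = parse_fields(SAMPLE)
rated = mts(fields["Speed"])
configured = mts(fields["Configured Memory Speed"])
headroom = rated - configured

print(f"rated {rated} MT/s, configured {configured} MT/s, headroom {headroom} MT/s")
```

Here both values are 4800 MT/s, so this module is already running at its rated speed.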

This system has 4x32 GB of memory. Is there something IT could do about that?

Regarding OpenMPI: do you have any suggestions?
Doing some tests with bigger meshes and different solvers, I also found:

1. Going from 8 to 24 cores, the trend is the same, with a 30% speed-up limited by the memory.
2. Going from 8 to 16 cores, there is no speed-up. My only thought is that this is caused by the E-cores acting as a bottleneck, so a different OpenMPI configuration could be useful.

Thanks!

wkernkamp · April 19, 2024, 19:34

The 13900K should be run with DDR5-7200. That should be perfectly stable. The CFD jobs should speed up by a factor of about 7200/4800. That is why you should have your tech support reconfigure your memory for the higher speed. It is not expensive and really increases your productivity.
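
As a rough worked example of that ratio, the sketch below scales the 8-core cavity3D time reported earlier (24.07 s) from 4800 MT/s to 7200 MT/s. It assumes the run is purely memory-bandwidth bound, so the result is a best-case estimate, not a prediction; the function name is made up for illustration.

```python
def scaled_time(time_s, current_mts, target_mts):
    """Best-case run time if performance scales linearly with memory speed."""
    return time_s * current_mts / target_mts


ratio = 7200 / 4800
estimate = scaled_time(24.07, current_mts=4800, target_mts=7200)

print(f"speed-up factor: {ratio:.2f}")
print(f"estimated 8-core time at 7200 MT/s: {estimate:.2f} s")
```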

For the Ryzen 7700X you should do the same. (If you care about the performance of that CPU). See the discussion here //www.cfd-online.com/Forums/hardware/255589-g-skill-release-ddr5-8400-cl40-kit

The correct use of P-cores and E-cores has only a marginal effect on performance. Not sure how you do a parallel run in Fluent exactly. If it is run in parallel through openmpi you would call something like "mpirun --cpu-set 0,2,4,6,8,10,12,14,16-31 -np 24 Fluent" This should run single threads on each of the cores without binding them to let them move to a faster core when available. Not sure if I did this exactly right because I have not recently been fiddling with openmpi settings. The eventual settings to be used can be put in a config file or remain on the command line.

Note that when np < 24, the threads will execute preferentially on the performance cores. So the cpu-set should work for lower thread counts.
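
The CPU list in that command can be derived from the core layout instead of typed by hand. The sketch below takes a mapping of physical cores to logical CPUs (here hard-coded to match the lscpu listing earlier in the thread), picks the first logical CPU of every core, and compresses consecutive ids into ranges. The `--cpu-set` option is taken from the post above; the helper names and the `solver` placeholder are hypothetical, and the command is only assembled, not executed.

```python
def first_cpu_per_core(cores):
    """One logical CPU per physical core, so no P-core runs two threads."""
    return sorted(min(cpus) for cpus in cores.values())


def compress(cpus):
    """Turn a sorted list such as [0, 2, 16, 17, 18] into '0,2,16-18'."""
    if not cpus:
        return ""
    parts = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu == prev + 1:
            prev = cpu
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = cpu
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


# Layout assumed from the lscpu listing: 8 P-cores with 2 threads, 16 E-cores.
cores = {c: [2 * c, 2 * c + 1] for c in range(8)}
cores.update({8 + i: [16 + i] for i in range(16)})

cpu_set = compress(first_cpu_per_core(cores))
solver = "solver_command"  # hypothetical placeholder for the actual solver call
command = ["mpirun", "--cpu-set", cpu_set, "-np", "24", solver]

print(" ".join(command))
```

For this layout the generated list is the same one used in the command above.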

mithraLa · May 4, 2024, 09:35

I have a similar problem (a 7700 with dual-channel memory at 6000 MHz, and my laptop with a 7845HX and dual-channel memory at 4800 MHz). I disabled PBO and tried Intel MPI and MS-MPI. They all crash, just with different error messages.
Intel MPI probably has some bugs when the Internet connection is down.

