April 13, 2023, 09:53 |
my EPYC's low performance in fluent
|
#1 |
New Member
BaiYu
Join Date: Apr 2023
Posts: 1
Rep Power: 0 |
I recently built a 7T83 workstation with 2x64 cores to run Fluent, but its performance is not as good as my previous 2x48-core 7R32 system. After several cases are opened, the CPU usage of the 7T83 is abnormally high while the memory usage is very low, and the computing speed becomes very low once CPU usage reaches 80%. Is this normal? Is there any way to improve its performance (mainly to run more cases at once while maintaining a reasonable speed in Fluent)?
|
|
April 13, 2023, 11:23 |
|
#2 |
Member
Matt
Join Date: May 2011
Posts: 44
Rep Power: 15 |
At least for DDR4, CFD solvers are starved for memory bandwidth once there are more than four cores per memory channel (EPYC Milan has 8 memory channels, so that would be around 32 cores). Clock speeds go down as core counts go up, which is why you are seeing worse performance with 64 cores vs. 48 (and a 32-core Milan CPU may even be faster than either for CFD, especially Milan-X).
DDR5 and 3D cache are different stories; OpenFOAM benchmarks with EPYC Genoa show good speedup up to 64 cores, and near-linear speedup up to 48 cores (Genoa has 12 memory channels of 50% faster DDR5, so 225% of Milan's memory bandwidth overall). Perhaps Genoa-X (3D cache) may even be able to benefit from the max 96 cores (as Milan-X was able to do with 64 cores).
Since the 7T83 is an OEM CPU, I assume it's vendor-locked. Thus you may not be able to fetch a good price on the used market for those chips. Similarly, if it's an OEM-proprietary motherboard, you may not be able to upgrade to Milan-X CPUs (7773X, for instance). Although if you are running simulations with tens of millions of cells, you may not see much benefit from 3D cache anyway. So you may just have to chalk it up to a lesson learned and do a better job matching the core count to the available memory bandwidth next upgrade cycle. |
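For anyone who wants to redo the arithmetic behind the "four cores per memory channel" rule of thumb and the 225% figure, here is a rough Python sketch. It assumes 64-bit channels (8 bytes per transfer), DDR4-3200 on Milan and DDR5-4800 on Genoa; the function names are made up for illustration, and these are theoretical peak numbers per socket, not measured bandwidth.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per socket in GB/s.

    Assumes 64-bit wide channels, i.e. 8 bytes moved per transfer.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


def cores_per_channel(cores, channels):
    """How many cores have to share one memory channel."""
    return cores / channels


# Assumed memory speeds: DDR4-3200 for Milan, DDR5-4800 for Genoa.
milan_gbs = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
genoa_gbs = peak_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)

if __name__ == "__main__":
    print(f"Milan peak per socket: {milan_gbs:.1f} GB/s")
    print(f"Genoa peak per socket: {genoa_gbs:.1f} GB/s")
    print(f"Genoa / Milan ratio:   {genoa_gbs / milan_gbs:.2f}")
    # Cores per channel for the Milan-generation core counts in this thread
    for cores in (32, 48, 64):
        print(f"{cores} cores on 8 channels: "
              f"{cores_per_channel(cores, 8):.0f} cores per channel")
```

By this estimate a 64-core Milan CPU puts 8 cores on each channel, twice the rule of thumb, while a 32-core part sits right at 4.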
|
April 13, 2023, 12:00 |
|
#3 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,186
Rep Power: 23 |
I'm wondering if it's just a little lower and explainable by the previous post, or is something misconfigured with the system?
How much worse are we talking here? Are all 8 memory channels populated evenly on both CPUs? |
|
April 13, 2023, 13:22 |
|
#4 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Quote:
One thread of a Fluent solver run will show 100% utilization for the core it is running on. I.e., if you start simulations with a total of 64 threads, you should see half of all 128 cores running at 100% utilization, with the other cores being mostly idle. htop would be a good tool to check that on Linux.
There are a few things you need to check.
1) Hardware/memory: you need 16 identical DIMMs to get the best performance from these CPUs. If your motherboard has more than 16 DIMM slots, you also need to check that they are in the correct slots.
2) BIOS settings: memory needs to be set to 3200 MT/s, if your DIMMs support that. Memory interleaving needs to be set to NPS=4. CPU turbo has to be enabled. And SMT has to be disabled.
3) Thermals: there are a lot of things that can overheat in such a setup. The CPUs themselves are pretty low on that list. More likely: CPU VRMs, the memory modules themselves, and the memory VRMs. Check the temperatures of these components. At the very least, check that CPU core frequency is in a reasonable range.
4) Core binding: especially when running several solver instances at the same time, you need to make absolutely sure that each thread gets pinned to its own core. Again, htop can give you a first indication of where the threads are actually running.
And just to state the obvious: no oversubscribing. Running solver instances with more than 128 threads total is guaranteed to tank performance.
Side note: I had to make a lot of assumptions here. So feel free to give us more information about your setup. Motherboard, memory, case/cooling, operating system...
Edit, forgot one important part: check that all memory channels are present; if they are not, re-seat the CPUs.
Last edited by flotus1; April 14, 2023 at 07:47. |
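To make the core binding and oversubscription points concrete, here is a minimal Python sketch of how one might plan disjoint core ranges for several solver instances. It assumes 2x64 cores with SMT disabled and NPS=4, which gives 8 NUMA nodes of 16 cores each, and it assumes cores are numbered contiguously per NUMA node (check with `lscpu` or `numactl --hardware`, this is not guaranteed). The function names are hypothetical. It only produces CPU lists, for example for `taskset -c`; how to hand them to Fluent or its MPI launcher is not covered here.

```python
TOTAL_CORES = 128         # assumption: 2 x 64 physical cores, SMT disabled
CORES_PER_NUMA_NODE = 16  # assumption: NPS=4 on a 64-core socket


def plan_core_ranges(thread_counts, total_cores=TOTAL_CORES,
                     node_size=CORES_PER_NUMA_NODE):
    """Assign each job a disjoint, contiguous core range.

    Each job starts on a NUMA-node boundary so that two jobs never
    share a node. Raises ValueError when the jobs would oversubscribe
    the machine or no longer fit after alignment.
    """
    if any(n <= 0 for n in thread_counts):
        raise ValueError("every job needs at least one thread")
    if sum(thread_counts) > total_cores:
        raise ValueError("oversubscribed: more threads than physical cores")

    ranges = []
    next_free = 0
    for n in thread_counts:
        # Round the start up to the next NUMA-node boundary.
        start = -(-next_free // node_size) * node_size
        end = start + n - 1
        if end >= total_cores:
            raise ValueError("jobs do not fit when aligned to NUMA nodes")
        ranges.append((start, end))
        next_free = end + 1
    return ranges


def to_cpu_list(core_range):
    """Format a (first, last) core range as a CPU list such as '0-31'."""
    first, last = core_range
    return f"{first}-{last}"


if __name__ == "__main__":
    for job, rng in enumerate(plan_core_ranges([32, 32, 64])):
        print(f"job {job}: cores {to_cpu_list(rng)}")
```

Aligning to node boundaries wastes a few cores when a job's thread count is not a multiple of 16 (two 24-thread jobs end up on cores 0-23 and 32-55), but it keeps each job's memory traffic on its own NUMA nodes.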
||
Tags |
cfd, epyc, performance analysis, performance testing, solve time |