Updating CFD server E5-2697AV4 to something faster (x2/x3 the speed)
October 11, 2024, 04:31
#41
Senior Member
andy
Join Date: May 2009
Posts: 306
Rep Power: 18
Quote:
What tends to matter most for implicit CFD simulations is the number of memory channels. Your new system has either 8 or 16, depending on whether you have 1 or 2 processors. Your old dual Xeon system has 8 channels. So if your new system has only 1 processor, I would expect implicit CFD performance to be in the same ballpark with the same memory. But information on parallel efficiency would help to pin down where the bottleneck lies. PS: If you are only using 8 memory chips with 2 processors, that is very likely the problem, since there are 16 memory channels. Adding 8 more memory chips is likely to double the performance for runs using higher numbers of cores.
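A rough way to see why the channel count matters: theoretical peak DRAM bandwidth scales with the number of populated channels times the transfer rate times 8 bytes per transfer (one 64-bit DDR4 channel). The sketch below assumes one DIMM per channel and uses the 2133 and 3200 MT/s figures that come up in this thread; the configuration labels are illustrative, and sustained bandwidth in practice is noticeably lower than this peak.

```python
def peak_bandwidth_gbs(populated_channels: int, transfer_rate_mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes a 64-bit (8-byte) DDR4 channel per populated channel; real
    sustained bandwidth (e.g. as measured by STREAM) is lower.
    """
    bytes_per_transfer = 8
    return populated_channels * transfer_rate_mts * bytes_per_transfer / 1000.0


# Hypothetical dual-socket configurations (assumed, for illustration only).
configs = {
    "8 DIMMs (4 channels/socket) @ 2133": (8, 2133),
    "16 DIMMs (8 channels/socket) @ 2133": (16, 2133),
    "16 DIMMs (8 channels/socket) @ 3200": (16, 3200),
}

for name, (channels, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, rate):.1f} GB/s")
```

Doubling the populated channels doubles the peak figure, which is the basis for the "likely to double the performance" estimate for memory-bound runs.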
October 11, 2024, 06:44
#42
Senior Member
Tobias
Join Date: May 2016
Location: Germany
Posts: 295
Rep Power: 11
Thanks for your input, Andy. Memory size is not an issue; it's probably the speed.
New RAM wasn't in the budget, unfortunately, so we are working on the DIMM population order at the moment. According to the manual (https://www.hpe.com/psnow/doc/a00038346enw), we have placed the 8 DIMMs in all the white slots of channels C, D, G and H of both processors, so they should run in quad-channel mode. There is just this footnote (***) for 8 DIMMs: "Recommended only with processors that have 128 MB L3 cache or less." Well, the 7F52 has 256 MB, so is this an issue? Should we then also use more than 8 DIMMs? Previously we had the 8 DIMMs in two channels per processor, and strangely enough, that was faster?
Edit: Found the error, one DIMM wasn't seated correctly.
October 11, 2024, 07:27
#43
Senior Member
andy
Join Date: May 2009
Posts: 306
Rep Power: 18
As far as I can see, your system is behaving as expected for one populated with too few memory chips. When the budget allows, buying the extra memory chips will likely double the performance at higher core counts for an implicit CFD problem of reasonable size, and hence bring the performance in line with what you had hoped.
I have no experience with configuring too few memory chips, because it is not something I have ever considered doing, given that it substantially reduces the effective number of cores available for largish CFD runs. If you run a reasonably large implicit CFD benchmark on 1, 2, 4, 8, 16, 32 and 64 cores and plot the parallel efficiency, it will likely show that there is currently little to be gained by using more than around 16 cores. If the machine is also used for other types of simulation, these may run with better parallel efficiency.
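The core-count sweep described above reduces to speedup and parallel-efficiency numbers with a few lines of code. A minimal sketch, assuming you record one wall-clock time per core count for the same case; the timings below are placeholders, not measurements from this thread. Speedup is S(p) = T(1)/T(p) and efficiency is E(p) = S(p)/p.

```python
def scaling_table(walltimes: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the 1-core run."""
    t1 = walltimes[1]
    rows = []
    for cores in sorted(walltimes):
        speedup = t1 / walltimes[cores]
        rows.append((cores, speedup, speedup / cores))
    return rows


# Placeholder timings in seconds (hypothetical, for illustration only).
example = {1: 1000.0, 2: 510.0, 4: 260.0, 8: 140.0, 16: 85.0, 32: 70.0, 64: 68.0}

for cores, s, e in scaling_table(example):
    print(f"{cores:3d} cores: speedup {s:5.1f}, efficiency {e:5.1%}")
```

A speedup curve that flattens beyond some core count, as in the placeholder numbers, is the pattern one would expect if memory bandwidth rather than core count is the limit.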
October 11, 2024, 07:33
#44
Senior Member
andy
Join Date: May 2009
Posts: 306
Rep Power: 18
October 11, 2024, 08:28
#45
Senior Member
Tobias
Join Date: May 2016
Location: Germany
Posts: 295
Rep Power: 11
Compared to the old server, we now have a speedup of +64% (tested with a flowbench simulation). Both configurations used 60 cores with HT on. 30 cores without HT gives a speedup of +36%; the wrong DIMM configuration with 60 cores gave only +10%, and the one with only 7 effective DIMMs was even slower, at -14%.
October 11, 2024, 10:24
#46
Senior Member
andy
Join Date: May 2009
Posts: 306
Rep Power: 18
Thanks, but do you have any information on parallel efficiency versus core count for a single, reasonably large implicit CFD test case? I am asking because it would be interesting to know how much performance is lost by not fully populating the memory slots. I had assumed that when memory is the bottleneck the loss would be pretty much linear, but if you can configure the CPU-to-memory connections, perhaps this is not the case?
|
|
October 11, 2024, 17:02
#47
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
Quote:
October 11, 2024, 18:20
#48
Senior Member
andy
Join Date: May 2009
Posts: 306
Rep Power: 18
Quote:
In this case we have a lot of unknowns, and it may not be possible to sort out quite what is going on without a widely used and well-understood CFD benchmark. The one pinned at the top of this forum has lots of results, although I had to fiddle a bit to get it to run, which I guess is going to put people off. The NAS Parallel Benchmarks were really useful for understanding this sort of thing, but they didn't seem to catch on, possibly because they produced a range of plots rather than a single number, and later versions became rather supercomputer-oriented.
October 12, 2024, 10:37
#49
Senior Member
Tobias
Join Date: May 2016
Location: Germany
Posts: 295
Rep Power: 11
Hi,
my test case is a relatively small flowbench simulation with up to 600,000 cells. I ran exactly the same case several times with different configurations, and the calculated speedup is very similar whether I consider a) the overall runtime or b) the average walltime per timestep. By RAM limitation I don't mean the overall amount (the case needs less than 16 GB), but rather the slow 2133 MHz RAM, whereas the new system supports up to 3200 MHz. And I am sorry, I won't run any benchmarks with different software, as I can't spend time on non-project-related work in my job. But I will rerun a full-cycle simulation and compare the results as well. Those are cases with up to 1.5 million cells, including detailed chemistry etc.
October 13, 2024, 05:24
#50
Senior Member
andy
Join Date: May 2009
Posts: 306
Rep Power: 18
Thanks for the clarification. It will be interesting to see how much the 64% changes with a larger, more representative simulation, but without more information it looks like we will have to speculate about what might be going on and what will or will not bring improvements.
|
|
October 15, 2024, 05:44
#51
Senior Member
Tobias
Join Date: May 2016
Location: Germany
Posts: 295
Rep Power: 11
I have some numbers, but of course I didn't rerun full-cycle simulations with various numbers of cores. My impression is that the average walltime per timestep is reduced by 34-37%, which corresponds to a speedup of roughly 52-58% for full-cycle simulations.
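The conversion between "walltime reduced by r" and "speedup of s" used above is s = 1/(1 - r) - 1, since speedup is the ratio of old to new runtime. A minimal sketch; the function names are illustrative:

```python
def speedup_from_reduction(reduction: float) -> float:
    """Fractional speedup (old/new - 1) from a fractional walltime reduction."""
    return 1.0 / (1.0 - reduction) - 1.0


def reduction_from_speedup(speedup: float) -> float:
    """Inverse: fractional walltime reduction implied by a fractional speedup."""
    return 1.0 - 1.0 / (1.0 + speedup)


for r in (0.34, 0.37):
    print(f"{r:.0%} less walltime -> {speedup_from_reduction(r):.1%} speedup")

# The +64% flowbench result from earlier in the thread, expressed as a runtime reduction.
print(f"+64% speedup -> {reduction_from_speedup(0.64):.1%} less walltime")
```

With these inputs the sketch gives about 51.5% and 58.7%, consistent with the quoted 52-58%, and the +64% flowbench result corresponds to roughly 39% less walltime.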
We are happy with that, considering we only made the following changes:
October 15, 2024, 05:55
#52
Senior Member
Tobias
Join Date: May 2016
Location: Germany
Posts: 295
Rep Power: 11
I ran a small test, 20°CA of an engine simulation, and analyzed the walltimes. HT was activated throughout.
See the figure below. We see an improvement up to 32 cores; however, HT works very well on this machine, as 60 cores give a further 35% improvement compared to 32 cores.
October 15, 2024, 10:19
#53
Senior Member
andy
Join Date: May 2009
Posts: 306
Rep Power: 18
That does not seem to be scaling as I would expect for a typical distributed-memory implicit CFD code with a reasonably large grid running on a shared-memory machine (e.g. the pinned OpenFOAM benchmark). Do you know how the solver in your program works and how it is parallelised? Can you provide a link to it? Googling "flowbench simulation" hasn't turned up anything obvious.
(And we are back to wanting to run something like the NAS Parallel Benchmarks in order to understand the performance.)
October 15, 2024, 16:01
#54
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14
Quote:
I agree with andy_. If you configure your memory properly, with more DIMMs at the appropriate speed, I would expect roughly linear scaling up to 16 cores. Beyond that it will level off, and performance may even go down once you exceed 32 cores.
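One simple way to picture this is a bandwidth-saturation model: each core demands some amount of memory bandwidth and the sockets can deliver at most a fixed total, so speedup follows min(p, B_total / b_core). This is an assumed, idealized model; it produces a plateau rather than the decline past 32 cores mentioned above, which would need additional effects such as contention or communication overhead. The per-core demand figure is a hypothetical placeholder.

```python
def bandwidth_limited_speedup(cores: int, total_bw_gbs: float, per_core_bw_gbs: float) -> float:
    """Idealized speedup when each core needs per_core_bw_gbs of memory bandwidth.

    Scaling is linear until aggregate demand reaches total_bw_gbs, then flat.
    """
    saturation_cores = total_bw_gbs / per_core_bw_gbs
    return min(float(cores), saturation_cores)


# Peak bandwidths: 8 or 16 channels x 3200 MT/s x 8 bytes.
# The 12 GB/s per-core demand is an assumed placeholder, not a measurement.
for label, total in (("8 channels", 204.8), ("16 channels", 409.6)):
    curve = [bandwidth_limited_speedup(p, total, 12.0) for p in (1, 8, 16, 32, 64)]
    print(label, [round(s, 1) for s in curve])
```

In this model, doubling the populated channels doubles the core count at which scaling saturates, which matches the intuition behind adding DIMMs.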
October 16, 2024, 03:15
#55
Senior Member
Tobias
Join Date: May 2016
Location: Germany
Posts: 295
Rep Power: 11
Quote:
Since I am only a user, I cannot install OpenFOAM (I have never worked with it) on my own and run a benchmark there, even if that would be very interesting. Furthermore, we are on Windows Server 2016, not Linux. If you can guide me on how to run that benchmark, I may be able to convince our IT to install and configure OpenFOAM, although I don't really care about performance relative to other CFD servers. We see a notable improvement over our old server, which may be improved further with e.g. 16x8GB 3200 MHz DIMMs. I don't need more RAM, since I have never exceeded 100 GB of usage.
October 16, 2024, 04:54
#56
Senior Member
andy
Join Date: May 2009
Posts: 306
Rep Power: 18
Quote:
If you are using a single commercial code, then running it with representative models is likely to be the most relevant benchmark. If you use other codes, perhaps to check how efficiently your current commercial code is implemented, they will need to perform the same simulation, or at least the same type of simulation. I am not familiar with the details of the CONVERGE code, but given the size of the efficiency improvements reported for version 4, it is likely still in the process of maturing. This is perhaps to be expected for a newish code (assuming it is newish) and should improve with time if the company is competently run and profitable.
October 16, 2024, 05:41
#57
Senior Member
Tobias
Join Date: May 2016
Location: Germany
Posts: 295
Rep Power: 11
Quote:
The CONVERGE code isn't that new (it is more than 15 to 20 years old) and should be more than profitable (80% of engine developers worldwide use it). I had another look at the results and have to make some corrections. I was using the reported runtime for the speedup calculations, but a closer look showed that at the short runtimes of 4 to 80 minutes (which I had for the core-count variations), the time for simulation setup and writing output carries disproportionate weight. When the speedup is calculated from the reported time for solving the transport equations only, it shows the same profile, but higher. E.g.:
16 cores: speedup 12.1
32 cores: speedup 22.1
48 cores: speedup 22.2
60 cores: speedup 29.7
64 cores: speedup 28.2
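The effect described above, fixed setup and output time masking the solver's speedup on short runs, can be illustrated with a simple model in which only the solve phase scales with core count. A minimal sketch with placeholder times (not CONVERGE measurements); it also turns the reported solver-only speedups into parallel efficiency, assuming they are relative to a single-core run, which the post does not state explicitly.

```python
def observed_speedup(solve_1core_s: float, overhead_s: float, cores: int) -> float:
    """Speedup of total runtime when only the solve phase scales with cores.

    overhead_s models setup and output writing, assumed not to parallelise.
    """
    t1 = solve_1core_s + overhead_s
    tp = solve_1core_s / cores + overhead_s
    return t1 / tp


# Placeholder: 4 h of single-core solver time plus 2 min fixed overhead (hypothetical).
solve, overhead = 4 * 3600.0, 120.0
for p in (16, 32, 64):
    print(f"{p} cores: ideal solver-only {p}, total-runtime {observed_speedup(solve, overhead, p):.1f}")

# Reported solver-only speedups from the post above; efficiency = speedup / cores,
# assuming the baseline is a single-core run.
reported = {16: 12.1, 32: 22.1, 48: 22.2, 60: 29.7, 64: 28.2}
for cores, s in reported.items():
    print(f"{cores} cores: efficiency {s / cores:.0%}")
```

The shorter the run, the larger the share of non-scaling overhead, which is why the total-runtime speedup lags the solver-only speedup.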