|
May 3, 2018, 08:31 |
|
#41 | |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 15 |
Quote:
The CPU cores run solidly at 3.9 GHz during all the benchmarks. I've run the trial version of AIDA64 and Cinebench R15, and the benchmark results are as expected when compared to other published Skylake-SP results.

As for interleaving options in the BIOS, the only interleaving-related parameter I can change is memory "node interleaving", where the default is "disabled" (which means NUMA is turned on). This is the Dell default and recommended setting when using NUMA-aware OS/applications.

I think the "problem" I'm seeing is that the single-thread memory bandwidth of Skylake-SP is actually pretty poor: lower than previous-generation Xeons and far lower than Epyc processors. See the table of results here: https://www.anandtech.com/show/11544...-the-decade/12

As a comparison, if I run the Fluent benchmark on my laptop (Skylake mobile Xeon), I get the following results:

System
CPU: 1x Intel Xeon E3-1535M v5
RAM: 4 x 16GB DDR4-2133 non-ECC
OS: Windows 10 Pro
Fluent: 19.0

1) External Flow Over an Aircraft Wing (aircraft_2m), single precision
INTEL Single Node, 1 core, 10 iterations: 202 s

So my Skylake laptop is actually 14 % faster on a single core than the Xeon Gold 6146.

Last edited by SLC; May 4, 2018 at 05:01.
||
May 3, 2018, 09:42 |
|
#42 |
Senior Member
Micael
Join Date: Mar 2009
Location: Canada
Posts: 157
Rep Power: 18 |
System
CPU: i7-4960X, 6 cores, OC to 4.6 GHz
RAM: 64 GB DDR3-2133 (8 x 8GB)
OS: Windows 7
Fluent: 19.0

External Flow Over an Aircraft Wing (aircraft_2m), single precision
1 core, 10 iterations: 135 s
4 cores, 100 iterations: 380 s
|
May 3, 2018, 09:45 |
|
#43 |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 15 |
Re-ran the benchmarks after a fresh system reboot as a sanity check (no settings changed), and got a better result for the aircraft_2m, dual node 32 core test (from 87 to 77 seconds). Also tested aircraft_14m on 24 cores on a single node, as well as 36 cores on dual nodes (seeing as this is now the number of cores I can use with 2 HPC packs).
System
CPU: 2x Intel Xeon Gold 6146 (12 cores, 3.9 GHz all-core turbo, 4.2 GHz single-core turbo)
RAM: 12 x 8GB DDR4-2666 ECC (single rank)
Interconnect: 10 GbE
OS: Windows 10 Pro
Fluent: 19.0

1) External Flow Over an Aircraft Wing (aircraft_2m), single precision
INTEL Single Node, 1 core, 10 iterations: 234 s
INTEL Single Node, 24 cores, 100 iterations: 107 s
INTEL Dual Node, 32 cores, 100 iterations: 77 s
INTEL Dual Node, 36 cores, 100 iterations: 68 s

2) External Flow Over an Aircraft Wing (aircraft_14m), double precision
INTEL Single Node, 24 cores, 10 iterations: 141 s
INTEL Dual Node, 24 cores, 10 iterations: 101 s
INTEL Dual Node, 32 cores, 10 iterations: 84 s
INTEL Dual Node, 36 cores, 10 iterations: 77 s

INCORRECT BENCHMARKS, SEE UPDATED POST: AMD Epyc CFD benchmarks with Ansys Fluent

The time of 141 s for the aircraft_14m single-node 24-core run is a little disconcerting: it compares to the 118.2 s that you got, flotus1, on your Intel system...

Edit: stupid question perhaps, but how are you guys actually running the benchmarks? Just opening a Fluent session manually, opening the case file, initializing, and then running the set number of iterations? Or are you running via batch/script? And what are you reporting as the benchmark time?

Last edited by SLC; May 4, 2018 at 07:46.
|
May 4, 2018, 05:55 |
|
#44 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Quote:
parallel timer reset
(iterate 10)
---wait for the simulation to finish---
parallel timer usage

I reported the total wall clock time.

Edit: note that I did not initialize the case, as it would overwrite the data from the benchmark file. But since we have different operating systems and Fluent versions, comparing results should be done with caution... if at all.
||
May 4, 2018, 07:45 |
|
#45 |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 15 |
OK, so that changed things. I had previously been initializing before running the iterations.
So, the procedure for others in case it isn't clear:
- Open Fluent manually and load the benchmark case and data (do not initialize).
- Then in the TUI: parallel timer reset
- (iterate N), i.e. run the prescribed number of iterations
- parallel timer usage
- Report the total wall clock time.
(A scripted version of this procedure is sketched at the end of this post.)

Updated results:

System
CPU: 2x Intel Xeon Gold 6146 (12 cores, 3.9 GHz all-core turbo, 4.2 GHz single-core turbo)
RAM: 12 x 8GB DDR4-2666 ECC (single rank)
Interconnect: 10 GbE
OS: Windows 10 Pro
Fluent: 19.0

1) External Flow Over an Aircraft Wing (aircraft_2m), single precision
INTEL Single Node, 1 core, 10 iterations: 165.6 s
INTEL Single Node, 24 cores, 100 iterations: 95.8 s
INTEL Dual Node, 32 cores, 100 iterations: 67.3 s
INTEL Dual Node, 36 cores, 100 iterations: 60.4 s

2) External Flow Over an Aircraft Wing (aircraft_14m), double precision
INTEL Single Node, 24 cores, 10 iterations: 108.0 s
INTEL Dual Node, 24 cores, 10 iterations: 79.2 s
INTEL Dual Node, 32 cores, 10 iterations: 66.2 s
INTEL Dual Node, 36 cores, 10 iterations: 61.7 s
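For anyone who prefers to script this rather than type into the TUI, here is a rough sketch of how the manual procedure above could be automated. This is not the official ANSYS benchmark script; the case file name, core count and launcher flags are assumptions from memory, so verify them against your own installation (e.g. with fluent -help) before relying on it.
Code:
import subprocess

N_ITER = 100                   # 100 iterations for aircraft_2m, 10 for aircraft_14m
CASE = "aircraft_2m.cas.gz"    # placeholder: the actual benchmark file name may differ
CORES = 36                     # total number of Fluent processes

# The journal mirrors the TUI steps above: load case+data (no initialization!),
# reset the parallel timers, iterate, then print the timer usage.
journal = f"""/file/read-case-data {CASE}
/parallel/timer/reset
/solve/iterate {N_ITER}
/parallel/timer/usage
exit
yes
"""

with open("bench.jou", "w") as f:
    f.write(journal)

# "3d" for the single-precision aircraft_2m case; use "3ddp" for the double-precision
# aircraft_14m runs. For dual-node runs a hosts file is passed in addition (e.g. -cnf=),
# and on Windows the -wait flag may be needed so the call blocks until Fluent exits.
# These options are from memory, so double-check them locally.
subprocess.run(["fluent", "3d", f"-t{CORES}", "-g", "-i", "bench.jou"], check=True)

The reported time is then the total wall clock from the timer usage output, exactly as in the manual procedure.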
|
May 4, 2018, 08:01 |
|
#46 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Glad we got that out of the way. So the difference up to this point was that you initialized the simulation again after loading data from benchmark files?
Now to find out how much of a bottleneck 10G Ethernet is, I would run a simulation on a single machine with 18 cores and on both machines with 36 cores. Scaling should be nearly linear (i.e. execution times cut in half) if the case is large enough and the interconnect is not slowing things down. |
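To make the criterion explicit: the two timings reduce to a speedup and a parallel efficiency. A trivial sketch, with placeholder numbers only (substitute your own "parallel timer usage" wall clock times):
Code:
# Node-scaling check: 18 cores on one machine vs. 36 cores across both machines.
t_single_18 = 120.0   # s, placeholder for the single-node 18-core wall clock time
t_dual_36 = 62.0      # s, placeholder for the dual-node 36-core wall clock time

speedup = t_single_18 / t_dual_36       # ideally 2.0 when doubling the core count
efficiency = 100.0 * speedup / 2.0      # percent of perfect linear scaling

print(f"speedup: {speedup:.2f}x, parallel efficiency: {efficiency:.1f} %")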
|
May 4, 2018, 09:08 |
|
#47 | |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 15 |
Quote:
I've run through the "official" benchmark script for Fluent; here are the results (the benchmark is single precision, with 25 timed iterations after first running 5 untimed iterations):

Aircraft_wing_14m

Note the negative scaling when running on more than 20 cores on one machine (i.e. more than 10 cores per CPU).

"Node scaling" in going from 18 cores on one node to 36 cores on two nodes is 1.96 using the 10 GbE interconnect and Intel MPI; in other words, 2 % away from perfectly linear scaling. Out of interest I disabled the 10 GbE connection and ran over a 1 GbE link, and performance dropped by only 0.5 % for the 36-core run. So there is not a big difference between 1 GbE and 10 GbE for just two nodes. Do you think I would have gotten perfectly linear scaling with InfiniBand?

We can compare my results to the results ANSYS has published: https://www.ansys.com/solutions/solu...craft-wing-14m

Their 2x Epyc 7601 32C results are as follows:
Code:
#Test: aircraft_wing_14m
#Application: Fluent 18.1.0
#Platform-Short: amd-epyc_7601,2200
#Platform-Long: AMD white box, EPYC 7601, 64 cores, 2.2 GHz
#Vendor-File: amd-epyc_7601,2200.txt
#Details: 128GB_RAM
#Processes  Machines  Core_Solver_Rating  Core_Solver_Speedup  Core_Solver_Efficiency
16          1          422.5455           16.000               100.00%
32          1          639.1714           24.203                75.63%
64          1          840.6714           31.833                49.74%
128         2         1635.5892           61.933                48.39%

Solver rating comparison:

16 cores
A single Intel Xeon Gold 6146 node: 368.3
A single Epyc 7601 node: 422.5 (Epyc is 14.7 % faster)

32 cores
Dual Intel Xeon Gold 6146 nodes: 724.7
A single Epyc 7601 node: 639.2 (Intel is 13.4 % faster)

I suspect I'm paying a lot of money for that 13.4 % improvement in 32-core performance!!
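If anyone wants to reproduce those percentages, or plug in other published ratings, here is a small sketch. The note on what a rating means reflects my understanding of the ANSYS convention; the numbers themselves are simply the ones quoted above.
Code:
# As far as I understand ANSYS's convention, a core solver rating is the number of
# benchmark jobs a system could complete in 24 h (86400 / solver wall clock time),
# so higher is better.

def percent_faster(rating_a: float, rating_b: float) -> float:
    """How much faster A is than B, in percent, based on solver ratings."""
    return (rating_a / rating_b - 1.0) * 100.0

# Ratings quoted in this post for aircraft_wing_14m:
gold_6146_single_16c = 368.3   # one node, 2x Xeon Gold 6146, 16 cores
gold_6146_dual_32c = 724.7     # two nodes, 32 cores
epyc_7601_single_16c = 422.5   # one node, 2x Epyc 7601, 16 cores
epyc_7601_single_32c = 639.2   # one node, 32 cores

print(f"16 cores: Epyc ahead by {percent_faster(epyc_7601_single_16c, gold_6146_single_16c):.1f} %")
print(f"32 cores: Intel ahead by {percent_faster(gold_6146_dual_32c, epyc_7601_single_32c):.1f} %")
# Prints roughly 14.7 % and 13.4 %, matching the comparison above.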
||
May 4, 2018, 09:45 |
|
#48 | ||
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Nice writeup!
Quote:
Quote:
Although I must say that I had anticipated a slightly higher advantage for dual-node Intel. Now if only AMD had bothered to release a 16-core variant with higher clock speeds. |
|||
May 30, 2018, 09:42 |
|
#49 |
Senior Member
Micael
Join Date: Mar 2009
Location: Canada
Posts: 157
Rep Power: 18 |
Some more interesting benchmark results.
Dual Xeon 6150 (18-core, 2.7 GHz), 12 x 16GB DDR4-2666
OS: CentOS 7
CPU governor: performance
SMT/Hyperthreading: off

As the new licensing rules now allow adding 4 cores on top of an HPC pack, I ran on all 36 cores as well.

1) External Flow Over an Aircraft Wing (aircraft_2m), single precision
FLUENT R182, 32 cores, 100 iterations: 75.4 s
FLUENT R190, 24 cores, 100 iterations: 88.1 s
FLUENT R190, 32 cores, 100 iterations: 74.3 s
FLUENT R190, 36 cores, 100 iterations: 67.3 s

2) External Flow Over an Aircraft Wing (aircraft_14m), double precision
FLUENT R182, 32 cores, 10 iterations: 73.7 s
FLUENT R190, 24 cores, 10 iterations: 85.3 s
FLUENT R190, 32 cores, 10 iterations: 73.4 s
FLUENT R190, 36 cores, 10 iterations: 70.3 s
|
June 23, 2018, 06:44 |
|
#50 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
I know this is a forum about CFD, but would it be possible to run some FEA benchmark comparisons between AMD and Intel CPUs?
|
|
June 25, 2018, 18:21 |
|
#51 |
Senior Member
Robert
Join Date: Jun 2010
Posts: 117
Rep Power: 17 |
AnandTech made some interesting statements about the effect of the cache searching scheme on OpenFOAM performance; the difference was reported to be around 20%.
https://www.anandtech.com/show/11544...f-the-decade/5 Has anyone else tried this? |
|
July 8, 2018, 13:44 |
|
#52 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
AMD has made some serious steps forward and Intel is indeed in a very bad situation right now!
But I think that buying a first-generation Epyc at this point is not the best possible decision, unless someone needs a modern system ASAP. Second-generation Epyc is coming in 2019, based on the new 7nm "Rome" architecture, and Infinity Fabric improvements in Gen 2 may make AMD the only viable option for server customers. And even if 2nd-gen Epyc is too expensive when it launches, by then you should be able to buy 1st-gen Epyc at a noticeably lower price than today.
|
July 8, 2018, 16:28 |
|
#53 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Waiting for a scheduled AMD release in 2019? Sounds like a bit of a stretch. I learned my lesson while waiting for 1st-gen Epyc availability.
There is always something new and shiny on the hardware horizon, so the waiting game could be played forever, and I usually advise against it. But I would not wait for an AMD release in particular. Currently it is not the CPUs that make a CFD workstation expensive: 2x 16-core Epyc 7301 is around $1800, while 16 x 16GB DDR4 is around $3000. And RAM prices probably won't come down in the foreseeable future.
|
July 8, 2018, 16:50 |
|
#54 |
Member
Join Date: Jun 2010
Posts: 77
Rep Power: 16 |
You're right that the waiting game in high tech products is endless, but if Infinity Fabric is indeed improved in Gen2 Epyc then maybe the wait will be worth it.
As I am in the market for a new system and am still not 100% convinced that Epyc really beats Intel (even on price/performance, given that you can source relatively cheap refurbished Xeons), would it be possible to send you an Ansys Mechanical benchmark file for a comparison between Epyc and E5 v4? If you can do this, please send me a PM.
|
July 9, 2018, 07:42 |
|
#55 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Unfortunately, I don't have an Ansys license anymore.
|
|
November 12, 2018, 06:33 |
|
#56 |
Member
Osman
Join Date: Oct 2012
Location: Japan
Posts: 53
Rep Power: 14 |
Hi Flotus1,
Would you recommend the Ryzen Threadripper 2950X or the i9-9900K for CFD using ANSYS? I would greatly appreciate your reply and recommendations. Thanks in advance.
|
|
|