AMD Ryzen Threadripper 1920X vs. Intel Core i7 7820X

October 25, 2017, 06:22
#1
Member
benoit paillard
Join Date: Mar 2010
Posts: 96
Rep Power: 16

Hi all,

After all the talk about these two new CPU families, I had the opportunity to build two new workstations, one with each:

AMD Ryzen Threadripper 1920X, 3.5 GHz, 12 cores, 24 threads, 658.25€ in France http://www.amd.com/fr/products/cpu/a...adripper-1920x
Intel Core i7 7820X, 3.6 GHz, 8 cores, 16 threads, 541.58€ in France https://www.intel.fr/content/www/fr/.../i7-7820x.html

The motherboard for AMD is 38 euros more expensive, the cooling is 30 euros more expensive, and the power supply has to be bigger, so 15 euros more expensive. Let's therefore take the overall cost for AMD as 741.25€.

Both CPUs were tested with hyperthreading enabled. They have the exact same memory fitted, Corsair Vengeance LPX DDR4, 32 GB kit (4x 8 GB), 3200 MHz, CL16, which was more than enough for all cases tested, and the exact same drives. No overclocking was used.

The OpenFOAM results:

motorBike, simpleFoam (OF v5.0), on 6 cores
AMD: ExecutionTime = 153.31 s, ClockTime = 155 s
Intel: ExecutionTime = 148.6 s, ClockTime = 155 s

DTCHull, interDyMFoam (OF v1706), on 8 cores
AMD: ExecutionTime = 56577.9 s, ClockTime = 56665 s
Intel: ExecutionTime = 52854.7 s, ClockTime = 52888 s

If you compute the "euros * time / core" index you get:
AMD: 3500244
Intel: 3580385
(spelled out in the sketch at the end of this post)

So it is very close, but AMD is still a good choice.

I'd like to add that AMD temperature sensing was messy, with lm-sensors not reading it. But after managing to see the temperatures during the runs, Intel reached 70 °C while AMD stayed at only 50 °C.

Last edited by bennn; October 25, 2017 at 09:28. Reason: Changed DIMM and temperature info; added platform cost
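Spelled out, the index is reproduced exactly by price(€) * DTCHull ClockTime(s) / cores used, assuming the 741.25€ platform-adjusted cost for AMD and the bare 541.58€ CPU price for Intel (that breakdown is inferred from the figures above, not stated explicitly):
Code:
$ echo "741.25 * 56665 / 12" | bc -l   # AMD   -> 3500244.27
$ echo "541.58 * 52888 / 8" | bc -l    # Intel -> 3580385.38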

October 25, 2017, 06:41
#2
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49

Thanks for sharing your results.

However, I am not quite convinced by your metric. So far, the Intel chip (let alone the platform) costs less and is faster. I would be more interested in a comparison running with the maximum number of physical cores available.

Which exact memory are you using? Did both cases fit in memory?

October 25, 2017, 09:14
#3
Member
benoit paillard
Join Date: Mar 2010
Posts: 96
Rep Power: 16

Well, my understanding is that, thinking in non-hyperthreaded terms, AMD can do one and a half DTCHull cases in 56,000 s, while Intel can do one of those in 52,000 s. Compared to the price paid, I think AMD is at least as efficient.

Oh, and by the way, the motherboard is now 38 euros more expensive for AMD; I should indeed add that. I've updated my initial post with the answers regarding the DIMMs. I'm open to any feedback or test that you think makes sense.

October 25, 2017, 09:29
#4
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49

Quote:
Because it still has 4 cores left idling? That seems like quite a daring extrapolation. Go ahead and try it, you might be surprised: CFD performance usually does not scale linearly with the number of cores. That's why I would be more interested in a comparison with the full number of physical cores, 12 for AMD and 8 for Intel.

October 25, 2017, 09:35
#5
Member
benoit paillard
Join Date: Mar 2010
Posts: 96
Rep Power: 16

You understand, though, that I can't just increase the number of parallel domains for one chip only, otherwise the results are biased, right?

Would it be OK with you if I launch two instances of the same motorBike case concurrently on 8 cores on Intel, and three on AMD?

October 25, 2017, 09:45
#6
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49

Biased in which sense? Higher communication overhead due to a larger number of smaller domains? That is exactly why I always prefer a smaller number of faster cores over a larger number of slower cores.

Running several cases concurrently, the results will also be "biased" due to a lack of total memory bandwidth. Plus, you need 50% more memory in total if you want to run 50% more cases simultaneously, which increases the hardware cost.

When I need a result, I am interested in how fast my computer can provide it. Avoiding biases caused by parallel efficiencies <100% is usually the least of my worries and sounds more like cherry-picking to me.

October 26, 2017, 11:52
#7
New Member
Join Date: Apr 2016
Posts: 12
Rep Power: 10

I'm also interested in some results for these chips with some specific settings:

1. Hyperthreading turned off.
2. All cores used on both CPUs, but only one job per CPU.
3. The parallel processes run with affinity set (mpirun -np (number of cores) -bind-to hwthread); see the sketch at the end of this post.

As I have read on this forum many times, and have experienced myself too, hyperthreading is most of the time useless for CFD. I think that all cores should be used if possible. Of course the comparison will be biased in some way, but you won't buy hardware with 12 cores to leave 4 of them idling.

One last thing: affinity will most likely help the AMD CPU, since due to its architecture it acts like multiple CPUs (considering the higher-latency communication between the different CCXes). Also, I don't know whether the different available instruction sets (AVX2 vs. AVX512) influence the results, but it's possible that they do.
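A minimal sketch of the affinity setting in point 3, in Open MPI syntax; -np 12 is just the Threadripper core count as an example, and --report-bindings is an extra flag (not mentioned above) that prints where each rank lands:
Code:
# One rank per hardware thread, as suggested in point 3:
$ mpirun -np 12 --bind-to hwthread simpleFoam -parallel
# Same run, printing the resulting bindings for verification:
$ mpirun -np 12 --bind-to hwthread --report-bindings simpleFoam -parallel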

October 26, 2017, 12:57
#8
Member
benoit paillard
Join Date: Mar 2010
Posts: 96
Rep Power: 16

Hi all, latest tests:

motorBike on all cores:
AMD: 113 s
Intel: 135 s

Now, this is counter-intuitive to me, but using --bind-to hwthread actually makes the computation take twice as long on AMD and 1.5 times as long on Intel. Using --bind-to none solves the issue, and is the way to go for several single-threaded jobs; see the sketch at the end of this post.
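For the several-jobs scenario, a hypothetical sketch (case1..case3 are placeholder directories): without -bind-to none, concurrent mpirun invocations can each bind to the same first cores by default, so the jobs pile up instead of spreading out.
Code:
# Three independent single-rank jobs running side by side:
$ (cd case1 && mpirun -np 1 -bind-to none simpleFoam > log.1 2>&1) &
$ (cd case2 && mpirun -np 1 -bind-to none simpleFoam > log.2 2>&1) &
$ (cd case3 && mpirun -np 1 -bind-to none simpleFoam > log.3 2>&1) &
$ wait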

October 27, 2017, 07:25
#9
Senior Member
Robert
Join Date: Jun 2010
Posts: 117
Rep Power: 17

Perhaps a stupid question, but since you appear to have hyperthreading on: did you lock the processes to only the physical cores?

If it is half as fast, it looks like you might have locked to both the physical and the hyperthreaded logical cores and left half of the physical cores unused. IIRC (and I may not) you need to lock to every other logical core: 0, 2, and so on. We always found core locking worked better on the Xeons (admittedly dual-processor systems), where a thread being pushed to another core would cause a major loss in cache efficiency.
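A hedged sketch of that locking with Open MPI, assuming the 8-core 7820X; whether logical CPUs 0, 2, 4, ... really map to distinct physical cores depends on the machine, so check the numbering first:
Code:
# One rank per physical core even with HT on; a rank may use both
# hardware threads of its core, but no two ranks share a core:
$ mpirun -np 8 --bind-to core --map-by core simpleFoam -parallel
# Explicit "every other logical core" pinning, as suggested above:
$ mpirun -np 8 --cpu-set 0,2,4,6,8,10,12,14 simpleFoam -parallel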

October 30, 2017, 09:17
#10
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 539
Rep Power: 20

Hi Benoit,

we ran the motorBike case on a Xeon E5-1650 v3 (6-core processor) with hyperthreading turned off, on 6 cores, and got:

ExecutionTime = 167.03 s
ClockTime = 169 s

How does this compare to your machines with HT disabled?

Thanks
Jörn

October 30, 2017, 11:44
#11
New Member
Join Date: Apr 2016
Posts: 12
Rep Power: 10

Quote:
On my WS the results (ClockTime, motorBike case, OF v5):

73 s (with -bind-to hwthread)
110 s (without it)

The machine is:
Dual Xeon E5-2673 v3 (all-core turbo 2.7 GHz, 12 cores/CPU)
8x 8 GB single-rank DIMMs
HT off

November 2, 2017, 04:28
#12
Member
benoit paillard
Join Date: Mar 2010
Posts: 96
Rep Power: 16

OK, so the results with HT off are exactly the same. With HT on, running on 8 or 16 cores for the Intel chip and on 12 or 24 cores for the AMD chip all give the same results as well.

No improvement with any bind-to setting for now. I'll test multiple single-CPU jobs in the next few days.

November 24, 2017, 11:51
#13
Senior Member
Join Date: May 2012
Posts: 552
Rep Power: 16

Quote:

November 27, 2017, 08:33
#14
New Member
Join Date: Apr 2016
Posts: 12
Rep Power: 10

I have used the same default hierarchical decomposition (with n = (6 4 1)) and the same number of domains. So yes, it shows the 'performance' of process binding.
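For reference, a minimal system/decomposeParDict sketch matching that decomposition (6 x 4 x 1 = 24 subdomains; the delta value is the usual tutorial default, not taken from this thread):
Code:
numberOfSubdomains 24;

method          hierarchical;

hierarchicalCoeffs
{
    n           (6 4 1);
    delta       0.001;
    order       xyz;
}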

November 27, 2017, 09:25
#15
Senior Member
Join Date: May 2012
Posts: 552
Rep Power: 16

So do you time the simpleFoam execution, or everything in the Allrun script?

Using 14 threads on a 7940X (HT enabled), with a (7 2 1) decomposition, I have done some benchmarks. Assuming you time simpleFoam only:
Code:
$ time mpirun -np 14 -bind-to none simpleFoam -parallel
Code:
$ time mpirun -np 14 -bind-to hwthread simpleFoam -parallel
And a simple:
Code:
$ time ./Allrun

November 27, 2017, 09:36
#16
New Member
Join Date: Apr 2016
Posts: 12
Rep Power: 10

If you use -bind-to hwthread with HT turned on, I guess the processes will be bound to the 'real' and the 'HT' hardware threads alike, so it may be better to bind to cores. I only timed the simpleFoam execution, by the way.
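One way to check that guess on a given machine (my suggestion, not from the thread) is to list which logical CPUs share a physical core:
Code:
# The CORE column shows the physical core behind each logical CPU;
# two logical CPUs with the same CORE id are HT siblings:
$ lscpu --extended=CPU,CORE,SOCKET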

January 23, 2018, 23:07
#17
New Member
Join Date: Jan 2018
Posts: 7
Rep Power: 8

Hi, and thanks for this and other similar conversations; buying kit can be a pain without some information beforehand, and this forum eases that pain quite significantly.

I'd like to add the overclocking capabilities of Skylake-X to this conversation. I recently purchased a 7820X and am running OpenFOAM with it quite successfully. My chip (and pretty much all of them) will run 4.5 GHz on all cores on air cooling with ease. This is of course true (with some limitations) of the i9 chips as well, and the results improve beyond their AMD counterparts.

With 32 GB of 3200 MHz memory, I can run the simpleFoam part of the motorBike tutorial in 121 seconds on 8 threads, which in my mind makes Skylake-X better value than Threadripper, for OF use at least, when considering the disparity in motherboard and cooling costs.

Cheers

January 24, 2018, 02:01
#18
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 539
Rep Power: 20

Thanks for sharing the results. We usually run this benchmark on 6 cores, so the results are easier to compare.

It would be interesting to see some results from an Epyc for this benchmark.

January 24, 2018, 04:37
#19
Senior Member
Join Date: May 2012
Posts: 552
Rep Power: 16

Thank you for sharing the OC results. Was it with Allrun or with just the solver?

January 24, 2018, 15:57
#20
New Member
Join Date: Jan 2018
Posts: 7
Rep Power: 8

6 cores run in 134 seconds.

Both results are for just the solver, with:
Code:
time mpirun -bind-to none -np 6 simpleFoam -parallel