AMD Ryzen Threadripper 1920X vs. Intel Core i7 7820X

bennn · October 25, 2017, 06:22

Hi all,

After all the talk about these two new CPU families, I had the opportunity to build two new workstations, one with each.

AMD Ryzen Threadripper 1920X, 3.5 GHz, 12 cores, 24 threads, 658.25€ in France
http://www.amd.com/fr/products/cpu/a...adripper-1920x

Intel Core i7 7820X, 3.6 GHz, 8 cores, 16 threads, 541.58€ in France
https://www.intel.fr/content/www/fr/.../i7-7820x.html

The motherboard for AMD is 38 euros more expensive, the cooling is 30 euros more expensive, and the power supply is bigger and therefore 15 euros more expensive. So let's assume the overall cost is 741.25€ for AMD.

Both CPUs were tested with hyperthreading enabled.

They have the exact same memory fitted:
Corsair Vengeance LPX PC memory - DDR4 - 32 GB kit (4x 8 GB) - 3200 MHz - CL16
The memory was more than enough for all cases tested.

And the exact same drives. No overclocking was used.

The results on OpenFOAM are

motorBike, simpleFoam (OF v5.0), on 6 cores
AMD : ExecutionTime = 153.31 s ClockTime = 155 s
Intel : ExecutionTime = 148.6 s ClockTime = 155 s

DTCHull, interDyMFoam (OF v1706), on 8 cores
AMD : ExecutionTime = 56577.9 s ClockTime = 56665 s
Intel : ExecutionTime = 52854.7 s ClockTime = 52888 s
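
For anyone who wants to pull these numbers out of a solver log automatically, here is a minimal Python sketch. It assumes the log contains timing lines in the same `ExecutionTime = ... s  ClockTime = ... s` form as quoted above; the function name `last_timing` is only an illustrative choice.

```python
import re

# Matches timing lines such as "ExecutionTime = 153.31 s  ClockTime = 155 s".
TIMING_RE = re.compile(
    r"ExecutionTime\s*=\s*([\d.]+)\s*s\s+ClockTime\s*=\s*([\d.]+)\s*s"
)


def last_timing(log_text):
    """Return (execution_time, clock_time) in seconds from the last timing line, or None."""
    matches = TIMING_RE.findall(log_text)
    if not matches:
        return None
    execution, clock = matches[-1]
    return float(execution), float(clock)


# The AMD motorBike line from the post above.
print(last_timing("ExecutionTime = 153.31 s ClockTime = 155 s"))
```

Taking the last match gives the cumulative time at the end of the run, which is the figure being compared here.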

If you compute the "euros * time / core" index, you get:

AMD : 3500244
Intel : 3580385

So it is very close, but AMD is still a good choice.
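
The index can be reproduced with a few lines of Python. This is a sketch under an assumption: the inputs below are the ones that happen to reproduce the two figures above, namely the DTCHull ClockTime, the platform-adjusted cost for AMD (741.25€) against the bare CPU price for Intel (541.58€), and the total number of physical cores (12 and 8) and not the 8 cores actually used in the run. The function name is illustrative.

```python
def cost_time_index(cost_eur, clock_time_s, cores):
    """Euros * seconds / core; lower is better under this metric."""
    return cost_eur * clock_time_s / cores


amd = cost_time_index(741.25, 56665, 12)
intel = cost_time_index(541.58, 52888, 8)
print(round(amd), round(intel))  # 3500244 3580385
```

Because the time is divided by the total core count while the run only used 8 cores, the index implicitly assumes the idle AMD cores could deliver proportional extra throughput.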

I'd like to add that AMD temperature sensing was messy, with lm-sensors not reading it. But once I managed to read the temperatures during the runs, Intel reached 70 °C while AMD stayed at only 50 °C.

flotus1 · October 25, 2017, 06:41

Thanks for sharing your results.
However, I am not quite convinced by your metric. So far, the Intel chip (let alone the platform) costs less and is faster. I would be more interested in a comparison running with the maximum number of physical cores available.
Which exact memory are you using? Did both cases fit in the memory?

bennn · October 25, 2017, 09:14

Well, my understanding is that, counting physical cores only (no hyperthreading), AMD can do one and a half DTC hull cases in 56000 s, while Intel can do one of those in 52000 s. Compared to the price paid, I think AMD is at least as efficient.

Oh, and by the way, the motherboard is 38 euros more expensive for AMD now. I should indeed add that.

I've updated my initial post with answers about the DIMMs.

I'm open to any feedback or test that you think makes sense.

flotus1 · October 25, 2017, 09:29

Quote:

Originally Posted by bennn

AMD can do one and a half DTC hull cases in 56000 s, while Intel can do one of those in 52000 s. Compared to the price paid, I think AMD is at least as efficient.

Because it still has 4 cores left idling? That seems like quite a daring extrapolation. Go ahead and try it, you might be surprised. CFD performance usually does not scale linearly with the number of cores. That's why I would be more interested in a comparison with the full number of physical cores: 12 for AMD, 8 for Intel.

bennn · October 25, 2017, 09:35

You understand, though, that I can't increase the number of parallel domains for just one chip; otherwise the results are biased, right?

Would it be OK for you if I launch 2 instances of the same motorbike case concurrently on 8 cores on Intel, and 3 on AMD?

flotus1 · October 25, 2017, 09:45

Biased in which sense? Higher communication overhead due to a larger number of smaller domains? That is exactly why I always prefer a smaller number of faster cores over a larger number of slower cores.
Running several cases concurrently, the results will also be "biased" due to a lack of total memory bandwidth. Plus you need 50% more memory in total if you want to run 50% more cases simultaneously. Which increases the hardware cost.
When I need a result, I am interested in how fast my computer can provide it. Avoiding biases caused by parallel efficiencies <100% is usually the least of my worries and sounds more like cherry-picking to me.

lac · October 26, 2017, 11:52

I'm also interested in some results for these chips with some specific settings:
1. Hyperthreading turned off.
2. All cores are used on both cpus, but for only one job/CPU.
3. Run the parallel threads with affinity set (mpirun -np (number of cores) -bind-to hwthread)

As I have read on this forum many times, and experienced myself, hyperthreading is useless for CFD most of the time.
I think that all cores should be used if possible. Of course it will be biased in some way, but you won't buy hardware with 12 cores to have 4 idling.
Lastly, affinity will most likely help the AMD CPU, since its architecture makes it behave like multiple CPUs (considering the higher-latency communication between the different CCXs).
Also, I don't know whether the different available instruction sets (AVX2 vs AVX-512) influence the results, but it's possible that they do.

bennn · October 26, 2017, 12:57

Hi all, latest tests :

motorBike on all CPUs :
AMD : 113s
Intel : 135s

Now, this is counter-intuitive to me, but using --bind-to hwthread actually makes the computation time twice as long for AMD and 1.5 times as long for Intel. Using --bind-to none solves the issue, and is the way to go for several single-threaded jobs.
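
To put the all-core numbers in perspective against the earlier 6-core motorBike runs (ClockTime 155 s on both chips), here is a rough Python sketch of the speedup and parallel efficiency. It assumes that "all CPUs" means 12 processes on AMD and 8 on Intel, and that 113 s and 135 s are clock times comparable to the 6-core figures; neither is stated explicitly, so treat the result as indicative only.

```python
def speedup_and_efficiency(time_few_s, cores_few, time_many_s, cores_many):
    """Return (measured speedup, ideal speedup, efficiency) when going from few to many cores."""
    measured = time_few_s / time_many_s
    ideal = cores_many / cores_few
    return measured, ideal, measured / ideal


# (name, 6-core time, all-core time, assumed all-core process count)
for name, t6, t_all, n_all in [("AMD", 155, 113, 12), ("Intel", 155, 135, 8)]:
    measured, ideal, eff = speedup_and_efficiency(t6, 6, t_all, n_all)
    print(f"{name}: speedup {measured:.2f} (ideal {ideal:.2f}), efficiency {eff:.0%}")
```

Under these assumptions, doubling the core count on AMD gives roughly a 1.37x speedup, and going from 6 to 8 cores on Intel gives about 1.15x. That is consistent with the point made earlier in the thread that CFD performance does not scale linearly with core count.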

RobertB · October 27, 2017, 07:25

Perhaps a stupid question, but since you appear to have hyperthreading on, did you core lock to only the physical cores?

If it is half as fast, it looks like you might have locked to both the physical and hyperthreaded core and left half the cores unused.

IIRC (and I may not), you need to lock to every other core: 0, 2, and so on.

We always found core locking worked better on the Xeons, admittedly on dual-processor systems, where a thread being pushed to the other core would cause a major loss in cache efficiency.
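
Whether "every other core" is the right pattern depends on how the OS numbers the logical CPUs. A hedged Python sketch of picking one logical CPU per physical core is given here. It assumes Linux, where each logical CPU exposes its hyperthread siblings in `/sys/devices/system/cpu/cpu<N>/topology/thread_siblings_list`; the function name and the two example layouts are hypothetical and not taken from the machines in this thread.

```python
def one_cpu_per_core(siblings_by_cpu):
    """Pick the lowest-numbered logical CPU from each group of hyperthread siblings.

    siblings_by_cpu maps a logical CPU id to the text of its
    thread_siblings_list file, e.g. "0,4" or "0-1".
    """
    chosen = set()
    for text in siblings_by_cpu.values():
        ids = []
        for part in text.strip().split(","):
            if "-" in part:
                lo, hi = part.split("-")
                ids.extend(range(int(lo), int(hi) + 1))
            else:
                ids.append(int(part))
        chosen.add(min(ids))
    return sorted(chosen)


# Hypothetical 4-core/8-thread layouts, for illustration only.
adjacent = {0: "0-1", 1: "0-1", 2: "2-3", 3: "2-3", 4: "4-5", 5: "4-5", 6: "6-7", 7: "6-7"}
split = {0: "0,4", 1: "1,5", 2: "2,6", 3: "3,7", 4: "0,4", 5: "1,5", 6: "2,6", 7: "3,7"}
print(one_cpu_per_core(adjacent))  # [0, 2, 4, 6]
print(one_cpu_per_core(split))     # [0, 1, 2, 3]
```

With adjacent sibling numbering, the physical cores are indeed every other logical CPU; with the split numbering they are simply the first half, so it is worth checking the actual layout before pinning.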

JBeilke · October 30, 2017, 09:17

Hi Benoit,

we ran the Motorbike case on a Xeon E5-1650 v3 (6 core processor) with hyperthreading turned off on 6 cores and got:

ExecutionTime = 167.03 s ClockTime = 169 s

How does this compare to your machines, with HT disabled?

Thanks
Jörn

lac · October 30, 2017, 11:44

Quote:

Originally Posted by RobertB

Perhaps a stupid question but since you appear to have hyperthreading on did you core lock to only the physical cores?

You can try to run it with -bind-to core if HT was turned on. That would explain the slowdown you saw.
On my workstation, the results (ClockTime, motorBike case, OF v5) are:
73 s (with -bind-to hwthread)
110 s (without it)
The machine is:
Dual Xeon E5-2673 v3 (all-core turbo 2.7 GHz, 12 cores/CPU)
8x 8 GB single-rank DIMMs
HT off

bennn · November 2, 2017, 04:28

OK, so the results with HT off are exactly the same. With HT on, running with 8 or 16 cores on the Intel chip, and 12 or 24 cores on the AMD chip, all give the same results as well.

No improvement with any bind-to setting for now.

I'll be testing multiple single-CPU jobs in the next few days.

Simbelmynë · November 24, 2017, 11:51

Quote:

Originally Posted by lac

You can try to run it with -bind-to core if HT was turned on. That would explain the slowdown you saw.
On my workstation, the results (ClockTime, motorBike case, OF v5) are:
73 s (with -bind-to hwthread)
110 s (without it)
The machine is:
Dual Xeon E5-2673 v3 (all-core turbo 2.7 GHz, 12 cores/CPU)
8x 8 GB single-rank DIMMs
HT off

Just curious. When you make comparisons using different decompositions of the motorbike case, how do you know that you are decomposing the domain similarly? Or is this just an indication of the performance of -bind-to hwthread?

lac · November 27, 2017, 08:33

I have used the same default hierarchical decomposition (with n = (6 4 1)) with the same number of domains. So yes, it shows the 'performance' of process binding.
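
As a quick sanity check on a hierarchical decomposition, the product of the three entries of n has to equal the number of subdomains. A minimal Python sketch follows, assuming 24 subdomains for n = (6 4 1), i.e. one per core of the dual 12-core machine described earlier. The post does not state the count explicitly, so that figure is a guess, and the function name is illustrative.

```python
from math import prod


def check_hierarchical(n, number_of_subdomains):
    """True if the split n = (nx, ny, nz) yields exactly the requested number of subdomains."""
    return prod(n) == number_of_subdomains


print(check_hierarchical((6, 4, 1), 24))  # True: 6 * 4 * 1 = 24
print(check_hierarchical((6, 4, 1), 12))  # False: n would need changing for 12 domains
```

Two runs with the same n and the same subdomain count use the same split, which is what makes the binding comparison like-for-like.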

Simbelmynë · November 27, 2017, 09:25

So do you time the simpleFoam execution or is it everything in the Allrun script file?

Using 14 threads on a 7940X (HT enabled), with decomposition (7 2 1), I have done some benchmarks.

Assuming you time simpleFoam only, then:

Code:

$ time mpirun -np 14 -bind-to none simpleFoam -parallel

Gives a real time of 117s.

Code:

$ time mpirun -np 14 -bind-to hwthread simpleFoam -parallel

Yields a real time of 150s.

A simple

Code:

$ time ./Allrun

Results in 157s of real time. (this is without -bind-to hwthread)

lac · November 27, 2017, 09:36

If you use -bind-to hwthread with HT turned on, I guess processes will be bound to both the 'real' and the 'HT' cores. So it may be better to use -bind-to core. I only timed the simpleFoam execution, by the way.
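
To compare the binding options side by side, a small Python helper can build the command lines. This is only a sketch: it assumes Open MPI's `mpirun`, uses the double-dash spelling `--bind-to`, and adds `--report-bindings` so that the actual placement of each rank is printed. The helper name `mpirun_command` and its restriction to the three targets discussed in this thread are illustrative choices.

```python
import shlex


def mpirun_command(n_procs, bind_to, solver="simpleFoam", report=True):
    """Build an Open MPI style command line for a parallel OpenFOAM solver run."""
    if bind_to not in {"none", "core", "hwthread"}:
        raise ValueError(f"binding target not handled by this sketch: {bind_to}")
    cmd = ["mpirun", "-np", str(n_procs), "--bind-to", bind_to]
    if report:
        cmd.append("--report-bindings")  # ask Open MPI to print where each rank was bound
    cmd += [solver, "-parallel"]
    return cmd


print(shlex.join(mpirun_command(8, "core")))
# mpirun -np 8 --bind-to core --report-bindings simpleFoam -parallel
```

The returned list can be passed to `subprocess.run` unchanged; looking at the reported bindings is the quickest way to see whether two ranks ended up sharing one physical core.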

The_Sle · January 23, 2018, 23:07

Hi, and thanks for this and other similar conversations. Buying kit can be a pain without some information beforehand, and this forum eases that pain quite significantly.

I'd like to add the overclocking capabilities of Skylake-X to this conversation. I recently purchased a 7820X and am running OpenFOAM on it, quite successfully. My chip (and pretty much all of them) will run at 4.5 GHz on all cores on air cooling with ease. This is of course true (with some limitations) of the i9 chips as well, and the results improve beyond their AMD counterparts.

With 32 GB of 3200 MHz memory, I can run the simpleFoam part of the motorBike tutorial in 121 seconds on 8 threads, which in my mind makes Skylake-X look like better value than Threadripper, for OF use at least, when considering the disparity in motherboard and cooling costs.

Cheers

JBeilke · January 24, 2018, 02:01

Thanks for sharing the results. We usually use 6 cores for this benchmark, which makes it easier to compare the results.

It would be interesting to see some results from the Epyc for this benchmark.

Simbelmynë · January 24, 2018, 04:37

Thank you for sharing the OC results. Was it with Allrun or with just the solver?

The_Sle · January 24, 2018, 15:57

On 6 cores, it runs in 134 seconds.

Both results are for just the solver, with

Code:

time mpirun -bind-to none -np 6 simpleFoam -parallel
