Full Population of ram memory slots vs half population, dual channel |
January 6, 2019, 13:45 |
|
#1 |
Member
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8 |
Hi guys, I have an i7 7700 processor (quad-core, 2 memory channels).
I performed two tests:

1) 3 memory modules (2 x 16GB + 1 x 8GB), all 2400 MHz. I got the following results:

# cores   Wall time (s)
------------------------
1         751.25
2         567.25
3         539.83
4         529.66
5         536.26
6         538.96
7         548.8
8         549.27

2) 2 memory modules (2 x 16GB):

# cores   Wall time (s)
------------------------
1         637.4
2         385.94
3         335.91
4         318.02
5         331.21
6         324.82
7         324.22
8         322.47

The difference in the times is certainly due to the fact that dual channel was not exploited with 3 modules. However, I have noticed that there is no significant reduction in calculation time when going to 3 and 4 cores. If I bought another 2 x 16GB of 2400 MHz RAM to completely populate the slots, could I reduce the calculation time with 3 and 4 cores?
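For reference, the scaling in the two tests can be expressed as speedup and parallel efficiency relative to the 1-core run. The following is a minimal sketch that uses only the wall times posted above, restricted to the 4 physical cores; the function and variable names are illustrative.

```python
# Wall times (s) from the two tests above, keyed by number of cores used.
three_modules = {1: 751.25, 2: 567.25, 3: 539.83, 4: 529.66}
two_modules = {1: 637.4, 2: 385.94, 3: 335.91, 4: 318.02}


def scaling(wall_times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core run."""
    t1 = wall_times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in wall_times.items()}


for label, times in (("3 modules", three_modules), ("2 modules", two_modules)):
    print(label)
    for n, (speedup, eff) in scaling(times).items():
        print(f"  {n} cores: speedup {speedup:.2f}, efficiency {eff:.0%}")
```

With 2 modules, 4 cores give a speedup of roughly 2 over 1 core, i.e. about 50% parallel efficiency.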
|
January 6, 2019, 14:49 |
|
#2 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Quote:
|
||
January 6, 2019, 16:02 |
|
#3 |
Member
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8 |
In which situations would populating all the memory slots give more performance?
|
|
January 6, 2019, 16:29 |
|
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
As long as you have more DIMM slots than memory channels: never.
|
|
January 6, 2019, 16:51 |
|
#5 |
Member
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8 |
Thank you for the answers.
My motherboard has 4 DIMM slots, but only 2 are occupied, and there are 2 memory channels. Sorry, I don't understand this. Could you please explain the concept in simple words? Why would using 4 memory sticks be worse than using 2?
|
January 7, 2019, 04:40 |
|
#6 |
Senior Member
Join Date: May 2012
Posts: 551
Rep Power: 16 |
First, your available bandwidth is calculated from the number of channels you have, the width of the bus and the speed (frequency) of the memory.
If you have 4 memory slots, then you have 2 pairs that will work in dual-channel mode. Here it is important to pair the memory correctly (usually trivial since most motherboards have color-coded slots, but it can be good to check in the manual which pair is the dominant one).

With 4 memory slots populated there will be more strain on the memory controller. This means that, to maintain stability, the voltage to the memory controller might need to increase, the memory timings might have to be loosened, or the memory frequency might need to decrease. Some motherboards actually state the maximum stable frequency for both 2 slots and 4 slots if they differ.

To put it another way: at best you will get the same performance that you have with 2 memory slots populated (with some rare exceptions).
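The bandwidth calculation mentioned at the start of this post can be written out as follows. This is only a sketch under stated assumptions: a 64-bit (8-byte) data bus per channel, as on standard DDR4 DIMMs, and the "2400 MHz" rating read as 2400 million transfers per second. The function name is made up for illustration.

```python
def peak_bandwidth_gbs(channels, mega_transfers, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second).

    channels:        number of populated memory channels
    mega_transfers:  memory speed in millions of transfers per second
    bus_width_bits:  data bus width per channel (assumed 64 bits)
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers * 1e6 * bytes_per_transfer / 1e9


single = peak_bandwidth_gbs(1, 2400)
dual = peak_bandwidth_gbs(2, 2400)
print(f"DDR4-2400, single channel: {single:.1f} GB/s")
print(f"DDR4-2400, dual channel:   {dual:.1f} GB/s")
```

These are theoretical upper limits; the bandwidth a solver actually sees is lower.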
|
January 7, 2019, 17:07 |
|
#7 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
You can see this in effect with the memory speeds that are officially supported by AMD Ryzen CPUs: https://www.pugetsystems.com/blog/20...M-Speeds-1175/
The more slots are populated (and the more ranks the memory modules have), the lower the maximum supported memory frequency. If you can push the memory speed to the same frequency with all slots populated, you end up with about the same performance. If you are lucky you could get slightly better results due to rank interleaving, but more likely overall performance will be worse because the motherboard applies slower secondary and tertiary timings to keep the high frequency stable with many ranks. Overclocking results will definitely be worse with 4 slots populated on a dual-channel platform.

This is in general less of an issue with current Intel CPUs because the memory controllers used here are more mature tech.

There is a reason why memory frequency world records are achieved using only one DIMM: it puts less strain on the memory controller.

Memory slots vs. memory channels:
A memory channel can only sustain a certain bandwidth based on the memory speed and the bus width. This bandwidth is saturated by using a single DIMM in one slot that belongs to it. Adding a second DIMM on the same channel does not increase the maximum bandwidth because the channel is the bottleneck here, not the individual DIMMs.
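The slots-versus-channels point can be illustrated with a small sketch. It assumes a dual-channel board whose channels are labelled "A" and "B" (hypothetical labels), DDR4-2400 on a 64-bit channel, and it deliberately ignores the frequency, timing, and rank-interleaving effects described above.

```python
# Peak bandwidth of one DDR4-2400 channel with an assumed 64-bit bus: 19.2 GB/s.
PER_CHANNEL_GBS = 2400e6 * 8 / 1e9


def population_bandwidth_gbs(dimms_per_channel):
    """Peak bandwidth for a slot population given as {channel: DIMM count}.

    Only the number of populated channels counts. Extra DIMMs on an
    already populated channel add capacity, not bandwidth.
    """
    populated = sum(1 for count in dimms_per_channel.values() if count > 0)
    return populated * PER_CHANNEL_GBS


layouts = {
    "1 DIMM": {"A": 1, "B": 0},
    "2 DIMMs, same channel": {"A": 2, "B": 0},
    "2 DIMMs, one per channel": {"A": 1, "B": 1},
    "4 DIMMs, two per channel": {"A": 2, "B": 2},
}
for name, layout in layouts.items():
    print(f"{name}: {population_bandwidth_gbs(layout):.1f} GB/s")
```

In this simplified model, 2 DIMMs spread over both channels and 4 DIMMs filling every slot give the same peak figure, which is the reason the answer to "when does populating all slots help" is "never" on this platform.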
|
January 8, 2019, 09:31 |
|
#8 |
Member
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8 |
Thank you very much, Simbelmynė and flotus1, for your really thorough answers. I appreciate them very much!
I'll decide what to do to increase performance. Thank you very much.
|
January 23, 2019, 15:48 |
|
#9 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
I just wanted to say that this looks normal to me and agrees with what I saw in my testing. As you add more cores, you start running out of memory bandwidth, and then memory bandwidth becomes the bottleneck.
I saw perfectly linear scaling up to 3 cores using my 4930K with quad-channel memory, then sub-linear scaling at 4 cores, getting worse after that. You already have non-linear scaling at 2 cores, which would agree with my testing. If you wanted better performance you would have to use higher-frequency RAM, but it can be a pain to go past the "officially supported" frequencies.

My testing (I tested many computers, but my personal cluster is built from):
4930K @ 4.2 GHz
2133 MHz quad-channel RAM
Double precision ON
Hyper-threading OFF
ANSYS CFX

My benchmark results are posted on this site somewhere in the hardware forum.
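To put rough numbers on "running out of memory bandwidth", the sketch below divides the theoretical peak bandwidth of the two systems mentioned in this thread evenly among the active cores. The 64-bit bus width per channel and the even split are simplifying assumptions, and the function name is illustrative; real solvers reach only a fraction of these figures.

```python
BYTES_PER_TRANSFER = 8  # assumes a 64-bit data bus per channel


def per_core_share_gbs(channels, mega_transfers, active_cores):
    """Peak bandwidth in GB/s divided evenly among the active cores."""
    total = channels * mega_transfers * 1e6 * BYTES_PER_TRANSFER / 1e9
    return total / active_cores


systems = {
    "i7 7700, dual channel 2400": (2, 2400),
    "4930K, quad channel 2133": (4, 2133),
}
for name, (channels, speed) in systems.items():
    shares = [per_core_share_gbs(channels, speed, n) for n in range(1, 5)]
    print(name, [f"{s:.1f}" for s in shares])
```

The share per core shrinks with every core added, and the quad-channel system starts from a higher total, so it has more headroom before bandwidth becomes the limit.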
|
January 24, 2019, 08:08 |
|
#10 |
Member
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8 |
Thank you very much, evcelica, for the post and the suggestion.
As an alternative, I was thinking of building a 2-node cluster (with a PC identical to the one I have) and putting two RAM modules in that one too. That way I could run the test on 4 cores (2 on one PC and 2 on the other) and see if the performance is better than 4 cores on a single PC. The connection would be gigabit Ethernet. In your opinion, could this be an alternative solution (apart from the cost)?
|
January 24, 2019, 09:56 |
|
#11 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
Yes, with 2 nodes you should get almost perfectly linear scaling, even with just a gigabit connection.
*(As long as the problem is not really small, which makes distributing it less efficient.)

So if your 2-core benchmark was 386 s, you should expect 193 s using 2+2.
*(If this is the solver time only. I never count the pre- and post-processing of a simulation, as that can skew your results.) Does the wall time include the solver setup etc.? Try to compare only the actual calculation times.

In my testing using a gigabit connection and nproc = 4 vs 4+4, I saw 99.7% speedup.

This may be obvious, but it is of course best to have identical hardware in both nodes. There is no reason to make one faster, because you will only be as fast as your slowest node. (I see you stated "identical", so you know this; I'm just reiterating.) Of course, things that do not affect calculation speed could be different: you could use a very cheap video card, or none at all, in one of them, different hard drives, PSU, etc.

Last edited by evcelica; February 5, 2019 at 12:03.
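The 2+2 estimate above is just ideal linear scaling applied to the measured 2-core time. A minimal sketch, using the wall times posted earlier in the thread (helper names are illustrative, and the result for 2+2 is a prediction under the ideal-scaling assumption, not a measurement):

```python
def ideal_time(measured_time_s, measured_cores, target_cores):
    """Wall time expected if the run scaled perfectly linearly."""
    return measured_time_s * measured_cores / target_cores


def parallel_efficiency(base_time_s, base_cores, new_time_s, new_cores):
    """Fraction of the ideal speedup achieved when going from base to new."""
    return (base_time_s * base_cores) / (new_time_s * new_cores)


# Measured on one PC with 2 modules: 385.94 s on 2 cores, 318.02 s on 4 cores.
t_2plus2 = ideal_time(385.94, 2, 4)
eff_single_node = parallel_efficiency(385.94, 2, 318.02, 4)

print(f"Ideal 2+2 time: {t_2plus2:.1f} s")
print(f"Single node, 2 -> 4 cores: {eff_single_node:.0%} of ideal")
```

Comparing the ideal 2+2 figure with the measured single-node 4-core time shows how much the second node could gain if the network is not a bottleneck.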
|
January 26, 2019, 08:09 |
|
#12 |
Member
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8 |
OK, thank you very much, evcelica, for your reply.
I'll try it and see what happens!
|
February 21, 2019, 15:50 |
|
#13 |
Member
Andrew
Join Date: Mar 2018
Posts: 82
Rep Power: 8 |
Hi evcelica,
I have a question about the last post you wrote in this thread. If I built a 2-node cluster with two PCs that have exactly the same CPU and RAM modules but different motherboards, could this affect or limit the computation time? In particular: Dell Precision T3620 + HP Z240 (both with an Intel i7 7700 and 2400 MHz RAM). Thank you in advance.