Full Population of ram memory slots vs half population, dual channel

#1, January 6, 2019, 13:45
Astan (Andrew), Member
Join Date: Mar 2018, Posts: 82
Hi guys, I have an i7 7700 processor (quad-core, 2 memory channels).
I performed two tests:
1) 3 memory modules (2 x 16 GB + 1 x 8 GB), all 2400 MHz.
I got the following results:
# cores   Wall time (s)
------------------------
1         751.25
2         567.25
3         539.83
4         529.66
5         536.26
6         538.96
7         548.8
8         549.27

2) 2 memory modules (2 x 16 GB)

# cores   Wall time (s)
------------------------
1         637.4
2         385.94
3         335.91
4         318.02
5         331.21
6         324.82
7         324.22
8         322.47

The difference in the times is certainly because dual-channel mode was not fully exploited with 3 modules.

However, I noticed that going to 3 and 4 cores does not reduce the calculation time significantly.

If I bought another 2 x 16 GB modules of 2400 MHz RAM to populate all the slots, could I reduce the calculation time with 3 and 4 cores?
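One way to quantify the flattening described above is to convert the wall times into speedup and parallel efficiency relative to the single-process run. The sketch below uses only the numbers from the two tables; the function name `scaling` is illustrative. Note that the i7 7700 has 4 physical cores, so the rows for 5 to 8 rely on Hyper-Threading, and the efficiency there is computed per process rather than per physical core.

```python
# Wall times (s) copied from the two tests in the post above.
cores = [1, 2, 3, 4, 5, 6, 7, 8]
wall_3_modules = [751.25, 567.25, 539.83, 529.66, 536.26, 538.96, 548.8, 549.27]
wall_2_modules = [637.4, 385.94, 335.91, 318.02, 331.21, 324.82, 324.22, 322.47]


def scaling(times):
    """Return (speedup, efficiency) lists relative to the 1-process run."""
    t1 = times[0]
    speedup = [t1 / t for t in times]
    efficiency = [s / n for s, n in zip(speedup, cores)]
    return speedup, efficiency


for label, times in (("3 modules", wall_3_modules), ("2 modules", wall_2_modules)):
    speedup, efficiency = scaling(times)
    print(label)
    for n, s, e in zip(cores, speedup, efficiency):
        print(f"  {n} procs: speedup {s:.2f}, efficiency {e:.0%}")
```

With 2 modules, the speedup at 4 processes is only about 2.0, which is the kind of saturation one would expect when something other than core count limits the run.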

#2, January 6, 2019, 14:49
flotus1 (Alex), Super Moderator
Join Date: Jun 2012, Location: Germany, Posts: 3,427
Quote:
If I bought another 2 x 16 GB modules of 2400 MHz RAM to populate all the slots, could I reduce the calculation time with 3 and 4 cores?
No. You already have the best memory performance for your CPU with 2 slots populated.

#3, January 6, 2019, 16:02
Astan (Andrew), Member
Join Date: Mar 2018, Posts: 82
In which situations does populating all the memory slots improve performance?

#4, January 6, 2019, 16:29
flotus1 (Alex), Super Moderator
Join Date: Jun 2012, Location: Germany, Posts: 3,427
As long as you have more DIMM slots than memory channels: never.

#5, January 6, 2019, 16:51
Astan (Andrew), Member
Join Date: Mar 2018, Posts: 82
Thank you for the answers.

My motherboard has 4 DIMM slots, but only 2 are occupied.

There are 2 memory channels.

Sorry, I don't understand this. Could you please explain the concept in simple words? Why would using 4 memory sticks be worse than using 2?

#6, January 7, 2019, 04:40
Simbelmynė, Senior Member
Join Date: May 2012, Posts: 552
First, your available bandwidth is calculated from the number of channels you have, the width of the bus and the speed (frequency) of the memory.
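As a minimal sketch of that calculation, the theoretical peak bandwidth is the product of the channel count, the bus width in bytes, and the transfer rate. The example assumes the "2400mhz" modules in this thread are DDR4-2400 (2400 million transfers per second) on a standard 64-bit channel; the function name is illustrative, and real sustained bandwidth is lower than this peak.

```python
def peak_bandwidth_gb_s(channels, bus_width_bits, transfers_per_second):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * transfers_per_second / 1e9


# Assumption: DDR4-2400 means 2400 million transfers per second
# on a 64-bit wide channel.
single = peak_bandwidth_gb_s(1, 64, 2400e6)
dual = peak_bandwidth_gb_s(2, 64, 2400e6)
print(f"single channel: {single:.1f} GB/s, dual channel: {dual:.1f} GB/s")
# prints "single channel: 19.2 GB/s, dual channel: 38.4 GB/s"
```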



If you have 4 memory slots, then you have 2 pairs that can each work in dual-channel mode. Here it is important to pair the modules correctly (usually trivial since most motherboards have color-coded slots, but it can be good to check in the manual which pair is the primary one).


With 4 memory slots populated there is more strain on the memory controller. To remain stable, the voltage to the memory controller might need to increase, the memory timings might have to increase, or the memory frequency might need to decrease.


Some motherboards actually state the maximum stable frequency for both 2 slots and 4 slots if they differ.


To put it another way: at best you will get the same performance that you have with 2 memory slots populated (with some rare exceptions).

#7, January 7, 2019, 17:07
flotus1 (Alex), Super Moderator
Join Date: Jun 2012, Location: Germany, Posts: 3,427
You can see this in effect with the memory speeds that are officially supported by AMD Ryzen CPUs: https://www.pugetsystems.com/blog/20...M-Speeds-1175/
The more slots are populated (and the more ranks the memory modules have), the lower the maximum supported memory frequency.

If you can push the memory speed to the same frequency with all slots populated, you end up with about the same performance. If you are lucky you could get slightly better results due to rank interleaving, but more likely overall performance will be worse because the motherboard applies slower secondary and tertiary timings to keep the high frequency stable with many ranks. Overclocking results will definitely be worse with 4 slots populated on a dual-channel platform.
This is in general less of an issue with current Intel CPUs because the memory controllers used here are more mature tech.
There is a reason why memory frequency world records are achieved using only one DIMM: it puts less strain on the memory controller.

Memory slots vs. memory channels:
A memory channel can only sustain a certain bandwidth based on the memory speed and the bus width. This bandwidth is saturated by using a single DIMM in one slot that belongs to it. Adding a second DIMM on the same channel does not increase the maximum bandwidth because the channel is the bottleneck here, not the individual DIMMs.
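The slot-versus-channel argument can be illustrated with a deliberately simplified model in which peak bandwidth depends only on how many channels have at least one DIMM. The slot names and the 19.2 GB/s per-channel figure (DDR4-2400 on a 64-bit channel) are assumptions for illustration, and the model ignores the frequency, timing, and rank-interleaving effects described above.

```python
# Hypothetical slot layout for a dual-channel board with 4 DIMM slots:
# two slots share each channel. Slot names are illustrative only.
SLOT_TO_CHANNEL = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
PER_CHANNEL_GB_S = 19.2  # assumed: DDR4-2400 on a 64-bit channel


def peak_bandwidth(populated_slots):
    """Peak bandwidth is set by how many channels hold at least one DIMM."""
    active_channels = {SLOT_TO_CHANNEL[slot] for slot in populated_slots}
    return len(active_channels) * PER_CHANNEL_GB_S


for slots in (["A1"], ["A1", "A2"], ["A1", "B1"], ["A1", "A2", "B1", "B2"]):
    print(f"{len(slots)} DIMM(s) in {slots}: {peak_bandwidth(slots):.1f} GB/s")
```

In this model, two DIMMs on the same channel give the same peak as one DIMM, and four DIMMs give the same peak as two DIMMs spread over both channels.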

#8, January 8, 2019, 09:31
Astan (Andrew), Member
Join Date: Mar 2018, Posts: 82
Thank you very much, Simbelmynė and flotus1, for your really thorough answers. I appreciate them very much!

I'll decide what to do to increase performance!

Thank you very much.

#9, January 23, 2019, 15:48
evcelica (Erik), Senior Member
Join Date: Feb 2011, Location: Earth (Land portion), Posts: 1,188
I just wanted to say that this looks normal to me and agrees with what I saw in my testing. As you add more cores, you start running out of memory bandwidth, and then memory bandwidth becomes the bottleneck.
I saw perfectly linear scaling up to 3 cores using my 4930K with quad-channel memory, then sublinear scaling at 4 cores, getting worse after that. You already have nonlinear scaling at 2 cores, which agrees with my testing. If you want better performance, you would have to use higher-frequency RAM, but going past the "officially supported" frequencies can be a pain.
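A toy model of this bottleneck, not a measurement: assume every core demands a fixed memory bandwidth, so speedup grows linearly until the combined demand reaches the platform's peak and stays flat afterwards. The per-core demand of 15 GB/s below is a made-up placeholder, and 38.4 GB/s assumes dual-channel DDR4-2400. Real scaling curves bend more gradually than this.

```python
def modeled_speedup(n_cores, peak_bw_gb_s, per_core_demand_gb_s):
    """Linear speedup until the cores together demand more than peak bandwidth."""
    saturation_cores = peak_bw_gb_s / per_core_demand_gb_s
    return min(n_cores, saturation_cores)


# Placeholder numbers, not measurements: 38.4 GB/s peak, 15 GB/s per core.
for n in range(1, 5):
    print(n, round(modeled_speedup(n, 38.4, 15.0), 2))
```

Doubling the number of channels (or raising the memory frequency) moves the saturation point to a higher core count in this model, which is consistent with a quad-channel platform scaling linearly further than a dual-channel one.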



My testing (I tested many computers, but my personal cluster is built from these):


4930K @ 4.2 GHz
2133 MHz quad-channel RAM
Double precision ON
Hyper-Threading OFF
ANSYS CFX
My benchmark results are posted somewhere in the hardware forum on this site.

#10, January 24, 2019, 08:08
Astan (Andrew), Member
Join Date: Mar 2018, Posts: 82
Thank you very much, evcelica, for the post and the suggestion.
As an alternative, I was thinking of building a 2-node cluster (with a PC identical to the one I have) and putting two RAM modules in that one too.

That way I could run the test on 4 cores (2 on one PC and 2 on the other) and see whether the performance is better than with 4 cores on a single PC.
(The connection would be gigabit Ethernet.)

In your opinion, could this be an alternative solution (apart from the cost)?

#11, January 24, 2019, 09:56
evcelica (Erik), Senior Member
Join Date: Feb 2011, Location: Earth (Land portion), Posts: 1,188
Yes, with 2 nodes you should get almost perfectly linear scaling, even with just a gigabit connection.

*(As long as the problem is not really small, since a small problem makes distributing it less efficient.)


So if your 2-core benchmark was 386 s, you should expect about 193 s using 2+2.

*(If this is the solver time only. I never count the pre- and post-processing of a simulation, as that can skew your results.) Your wall time may include the solver setup, etc. Try to compare only the actual calculation times.
In my testing with a gigabit connection and nproc = 4 vs. 4+4, I saw a 99.7% speedup.
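The 2+2 estimate is simply the single-node time divided by the number of nodes, optionally scaled by a parallel efficiency for the interconnect. A rough sketch, where `estimated_time` and the 95% efficiency value are illustrative assumptions rather than measured figures:

```python
def estimated_time(single_node_time_s, nodes, efficiency=1.0):
    """Estimated wall time when the same per-node core count runs on several nodes."""
    return single_node_time_s / (nodes * efficiency)


t_2_cores = 385.94  # measured 2-core time with 2 modules, from post #1
t_4_cores = 318.02  # measured 4-core time on a single PC, from post #1

print(f"ideal 2+2: {estimated_time(t_2_cores, 2):.1f} s")
print(f"2+2 at an assumed 95% efficiency: {estimated_time(t_2_cores, 2, 0.95):.1f} s")
print(f"4 cores on one PC (measured): {t_4_cores:.1f} s")
```

This only holds if the measured times are solver times, as noted above; setup time that does not parallelize would make the real 2+2 result slower than the estimate.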


This may be obvious, but it is of course best to have identical hardware in all nodes. There is no reason to make one faster, because you will only be as fast as your slowest node. (I see you stated "identical", so you know this; I'm just reiterating.) Of course, things that do not affect calculation speed can differ: you could use a very cheap video card, or none at all, in one node, and different hard drives, PSUs, etc.

Last edited by evcelica; February 5, 2019 at 12:03.

#12, January 26, 2019, 08:09
Astan (Andrew), Member
Join Date: Mar 2018, Posts: 82
OK, thank you very much for your reply, evcelica.

I'll try it and see what happens!

#13, February 21, 2019, 15:50
Astan (Andrew), Member
Join Date: Mar 2018, Posts: 82
Hi evcelica,

I have a question about the last post you wrote in this thread.

If I built a 2-node cluster from two PCs that have exactly the same CPU and RAM modules but different motherboards, could this affect or limit the computation time?

In particular, a Dell Precision T3620 + HP Z240 (both with an Intel i7 7700 and 2400 MHz RAM).

Thank you in advance.