The next best CFD processor might be a laptop CPU |
August 19, 2013, 15:47 |
|
#1 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
Intel is releasing a chip that has 128 MB of RAM on the CPU, ostensibly to give its integrated graphics some fast memory. Fortunately for us, this 128 MB isn't actually graphics memory. It's just a gigantic L4 cache that can be used for anything. With this much cache and an efficiently stored mesh, you could probably eliminate 80%+ of the traffic over the system memory bus.
Unfortunately for us, Intel is only releasing this as a laptop processor. Worse yet, they may be limiting this CPU to "Ultrabooks," so even if some manufacturer wanted to slap this thing on an ITX/mATX motherboard with PCI-e, Intel wouldn't let them. This is probably some scheme to avoid cannibalizing their high-margin Xeon business. So there is a 47 W quad-core processor out there that trounces everything other than the six-core 130 W CPUs, but Intel won't sell it to you. Tech Report did the only CFD benchmark I can find. It's at the bottom of the page here. The laptop chip is the i7-4950HQ. Notice it barely loses to the $1000 six-core i7.
Last edited by kyle; August 20, 2013 at 11:40. |
|
August 21, 2013, 23:07 |
|
#2 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,188
Rep Power: 23 |
That's interesting; the L4 cache seems to help a great deal with memory-bandwidth-intensive applications. Check this Intel ARK page, which lists the maximum memory bandwidth at 76.8 GB/s, even though it's only dual-channel. That is exactly 1.5 times the SB-E Xeons' 51.2 GB/s with four memory channels. That must have something to do with the L4 cache, since the channel math alone doesn't add up anymore. I wonder if Haswell-E will feature an L4 cache? Doubtful?
http://ark.intel.com/products/76085/...up-to-3_60-GHz |
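The channel arithmetic in the post above can be checked with the usual peak-bandwidth formula: channels × transfer rate × bus width. This is a minimal sketch, assuming DDR3-1600 on both platforms and 64-bit (8-byte) channels; neither assumption is stated in the thread, and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Assumes each channel is `bus_bytes` wide (8 bytes = 64 bits).
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Assumed memory speed (DDR3-1600) for both platforms.
quad_channel = peak_bandwidth_gb_s(4, 1600)   # SB-E Xeon, four channels
dual_channel = peak_bandwidth_gb_s(2, 1600)   # laptop chip, two channels

listed = 76.8                                 # figure quoted in the post
unexplained = listed - dual_channel           # part not covered by DRAM channels

print(f"quad-channel DDR3-1600: {quad_channel:.1f} GB/s")
print(f"dual-channel DDR3-1600: {dual_channel:.1f} GB/s")
print(f"listed minus dual-channel: {unexplained:.1f} GB/s")
```

Under these assumptions the four-channel figure reproduces the 51.2 GB/s quoted for the Xeon, while two channels alone fall well short of 76.8 GB/s. That gap is consistent with the guess that the listed number includes something other than the DRAM channels.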
|
August 22, 2013, 00:13 |
|
#3 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
If I know Intel, outside of laptops this feature will be exclusive to multi-thousand dollar Xeon CPUs.
|
|
August 22, 2013, 07:01 |
|
#4 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Perhaps Apple will jump on this for their next generation of MacMinis.
Because no matter which hardware they launch this CPU on, I'm guessing it'll be the next craze, like the PS3 clusters we saw some years ago. Or even low-budget consoles... I wonder if Valve will pick these up for their own Steam-based console. But still, 128 MB of RAM should require some special attention on the CFD software side, to schedule memory accesses properly.
Last edited by wyldckat; August 24, 2013 at 15:30. |
|
August 24, 2013, 15:21 |
|
#5 |
Member
Join Date: Jul 2011
Posts: 59
Rep Power: 15 |
http://www.anandtech.com/show/6993/i...950hq-tested/3
The first page has some nice latency and bandwidth numbers. For large data sizes the bandwidth falls in line with the other results. Only in a small range of data sizes (larger than the L3 cache, smaller than 128 MB) does this chip show an improvement. It would be interesting to see some real number-crunching data, though. Most of the article is focused on gaming, which seems to be why they introduced this extra cache anyway. This is an OEM-only part, though, and it seems like it will be a while before they try introducing this on desktop/server parts, if they decide to do that at all. It also seems like DDR4 won't show up until after Broadwell, the die shrink of Haswell, comes out. What I would like to see most is an increase in memory channels. One channel per core would be nice on the larger Xeons. The new 12-core Xeon E5s have only four channels. While that probably isn't necessary for most of their market, the HPC market needs more bandwidth. A larger cache isn't going to fix this problem when you're running programs with GB footprints. |
|
August 25, 2013, 15:19 |
|
#6 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
Just because the cache isn't multiple GB doesn't mean it isn't a huge benefit for cases that are that large. Most of the data travelling over the memory bus is accessed multiple times, and a cache this large could almost eliminate redundant memory accesses. The first time you look up some variable for a cell, the lookup would be limited by memory speed, but as long as your mesh is well organized, all subsequent lookups would run at cache speed.
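The argument above can be illustrated with a toy cache simulation. This is only a sketch under loud assumptions: a 64 x 64 structured grid, a 5-point neighbour stencil, and an idealized LRU cache that holds 256 cell values. None of this models the real L4 cache or any real solver; the function names are made up for illustration.

```python
import random
from collections import OrderedDict


def lru_miss_rate(accesses, capacity):
    """Fraction of accesses that miss in an LRU cache holding `capacity` entries."""
    cache = OrderedDict()
    misses = total = 0
    for key in accesses:
        total += 1
        if key in cache:
            cache.move_to_end(key)          # mark as most recently used
        else:
            misses += 1
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used entry
    return misses / total


def sweep(n, order):
    """Visit cells in storage order, reading each cell and its 4 grid neighbours.

    order[s] is the grid cell (row-major id) stored at position s.
    Yields the storage position of every value read.
    """
    position = {cell: s for s, cell in enumerate(order)}
    for cell in order:
        i, j = cell % n, cell // n
        for di, dj in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < n and 0 <= jj < n:
                yield position[jj * n + ii]


n = 64            # toy grid: 64 x 64 = 4096 cells
capacity = 256    # toy cache: room for 256 cell values

ordered = list(range(n * n))           # neighbours stored close together
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)     # neighbours scattered through storage

ordered_rate = lru_miss_rate(sweep(n, ordered), capacity)
shuffled_rate = lru_miss_rate(sweep(n, shuffled), capacity)
print(f"ordered numbering:  {ordered_rate:.1%} misses")
print(f"shuffled numbering: {shuffled_rate:.1%} misses")
```

With the ordered numbering, the working set of a sweep (about three grid rows) fits in the toy cache, so each value misses only on its first touch and every later read is a hit. With the shuffled numbering, most reads miss, because neighbours are rarely still in the cache when they are needed again.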
|
|
December 17, 2013, 06:02 |
|
#7 |
Senior Member
Mr CFD
Join Date: Jun 2012
Location: Britain
Posts: 361
Rep Power: 15 |
I'm sorry to bump this thread. This CPU interests me.
I want a new laptop for light-to-medium CFD work (using Ansys CFX). I don't care if it's an ultrabook or whatever-book. Would there be any issues running a laptop with this chip continuously for 3 to 5 days?
Thank you |
|
December 28, 2013, 17:33 |
|
#8 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quote:
|
||
January 16, 2014, 14:40 |
|
#9 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
There actually is a Steambox coming out with a similar chip, the i7-4770R. There are conflicting reports on whether it has 64 MB or 128 MB of cache.
Actually, it just showed up on Newegg... http://www.newegg.com/Product/Produc...56164012&nm_mc So who is going to buy one and benchmark it? Unfortunately there's no PCI-e slot, so no InfiniBand. |
|
January 16, 2014, 15:51 |
|
#10 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Well... if USB3 and SATA 6Gbps or iSATA are available, I'd say that some sort of hack could be done, to at least handle MPI in a different way...
Wait, wait... mini-PCIe? Quote:
|
||
January 17, 2014, 06:41 |
|
#11 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
Yeah, I saw that. Mini PCIe is just x1, or 4 Gbit/s max. Most InfiniBand cards use x8. You might be able to use an adapter and get it to work at a reduced speed, which would still be better than Ethernet.
And I don't think anyone has ever got RDMA working over USB or SATA. |
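The "4 gigabit" figure above matches the usable rate of a single PCIe 2.0 lane: 5 GT/s raw, less the 8b/10b encoding overhead. This is a minimal sketch of that arithmetic; which PCIe generation the Mini PCIe slot in that box actually runs at is an assumption, and the table and function names are made up for illustration.

```python
from fractions import Fraction

# Per-lane raw rate in GT/s and line-encoding efficiency, per PCIe generation.
PCIE_GENERATIONS = {
    "1.x": (Fraction(5, 2), Fraction(8, 10)),     # 2.5 GT/s, 8b/10b encoding
    "2.0": (Fraction(5), Fraction(8, 10)),        # 5 GT/s, 8b/10b encoding
    "3.0": (Fraction(8), Fraction(128, 130)),     # 8 GT/s, 128b/130b encoding
}


def pcie_gbit_s(generation, lanes):
    """Usable bandwidth per direction in Gbit/s, before protocol overhead."""
    raw_rate, efficiency = PCIE_GENERATIONS[generation]
    return float(raw_rate * efficiency * lanes)


gigabit_ethernet = 1.0                    # Gbit/s, for comparison
mini_pcie_x1 = pcie_gbit_s("2.0", 1)      # assumed: slot runs at PCIe 2.0
card_x8 = pcie_gbit_s("2.0", 8)           # a typical x8 card at the same generation

print(f"PCIe 2.0 x1: {mini_pcie_x1:.0f} Gbit/s")
print(f"PCIe 2.0 x8: {card_x8:.0f} Gbit/s")
print(f"x1 vs gigabit Ethernet: {mini_pcie_x1 / gigabit_ethernet:.0f}x")
```

Under that assumption a single lane gives an eighth of what an x8 card expects, but still several times the raw rate of gigabit Ethernet, which is the point made in the post.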
|
January 17, 2014, 17:01 |
|
#12 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Quote:
A quick search on eBay indicates that PCIe x4 IB cards are available... older than that, it's PCI-X for servers, which is a special edition of the old PCI protocol (from before PCIe was invented).
It would have to be an old-school MPI system, namely a virtualized file-based system. |
||
January 29, 2014, 04:44 |
|
#13 | ||
New Member
CFD
Join Date: Jan 2013
Posts: 23
Rep Power: 13 |
Quote:
Quote:
|
|||
April 2, 2014, 02:57 |
|
#14 |
New Member
Join Date: Oct 2013
Posts: 2
Rep Power: 0 |
Anyone crunched some numbers on the 4770R yet??
|
|
April 2, 2014, 13:25 |
|
#15 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
I actually bought one but I've been so swamped I haven't had time to properly test it. So far it seems like maybe a 10%-20% speedup over a regular 4770.
|
|
April 3, 2014, 06:24 |
|
#16 |
New Member
Join Date: Oct 2013
Posts: 2
Rep Power: 0 |
||
April 3, 2014, 11:14 |
|
#17 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
I actually got 1866 MHz.
|
|
July 18, 2014, 00:56 |
|
#18 |
Member
Shawn
Join Date: Oct 2011
Posts: 56
Rep Power: 15 |
Any word yet on the speedup for the 4770R? For what type of simulation, and how big is the mesh? I suppose any other relevant information would be good too (memory speed, latency, etc.)
Thanks |
|
July 18, 2014, 12:08 |
|
#19 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
I didn't get around to doing any rigorous testing. I did a quick comparison to an i7-2600K with the OpenFOAM motorBike example at various cell counts, and it seemed to be around 15%-20% faster. That isn't very promising, seeing as the 2600K is two generations older and runs at about the same frequency.
It may be necessary to toy around with the cell ordering to really make use of this L4 cache. This could be something as simple as decomposing the mesh into chunks that fit in the cache and then merging them back together. Even if there is some way to optimize it, this almost certainly doesn't make sense as it stands today. You can only get this chip in an iMac or the Gigabyte barebones system, both of which are a lot more expensive than a comparable system with an i7-4770K. Plus you're limited to two channels of 1866 MHz DDR3 laptop memory. We are weeks from having access to systems with four-channel DDR4 at 3000 MHz+, which would blow any current system with Iris Pro out of the water. |
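For the chunking idea above, a rough sizing sketch shows how many pieces a mesh would need to be split into so that each piece fits in the L4 cache. The memory per cell (1000 bytes) and the share of the cache a chunk may occupy (50%) are placeholder assumptions, not measured values, and the function name is made up for illustration.

```python
import math


def chunks_for_cache(n_cells, bytes_per_cell, cache_bytes, fill_fraction=0.5):
    """Return (number of chunks, cells per chunk) so each chunk fits in the cache.

    fill_fraction is the share of the cache a chunk may occupy, leaving the
    rest for everything else that competes for cache space.
    """
    cells_per_chunk = int(cache_bytes * fill_fraction // bytes_per_cell)
    if cells_per_chunk < 1:
        raise ValueError("a single cell does not fit in the allowed cache share")
    return math.ceil(n_cells / cells_per_chunk), cells_per_chunk


L4_BYTES = 128 * 1024 * 1024     # 128 MB cache, taken here as 128 MiB

# Placeholder inputs: a 2-million-cell case at an assumed 1000 bytes per cell.
n_chunks, cells_per_chunk = chunks_for_cache(
    n_cells=2_000_000, bytes_per_cell=1000, cache_bytes=L4_BYTES
)
print(f"{n_chunks} chunks of up to {cells_per_chunk} cells each")
```

With these placeholder numbers, the case splits into a few dozen chunks. The real footprint per cell depends on the solver and the number of fields, so it would need to be measured before choosing a chunk count.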
|
July 18, 2014, 12:15 |
|
#20 |
Member
Shawn
Join Date: Oct 2011
Posts: 56
Rep Power: 15 |
Hi Kyle,
Thanks for the info. 10%-20% over a 2600k doesn't seem like a huge bump. I suppose if the job isn't sufficiently large you could get a decent performance improvement (like the E3D case run at TR mentioned above) but once the jobs get bigger there's no substitute for more cores and bandwidth. Thanks |
|
|
|