November 6, 2011, 04:54 |
4 cpu motherboard for CFD
|
#1 |
New Member
Join Date: Oct 2011
Posts: 5
Rep Power: 15 |
Has anyone used 4-CPU motherboards for CFD? With a 4-CPU motherboard and four Opteron 6174 processors you can build a compact 48-core machine. For example, this motherboard:
http://www.supermicro.com/Aplus/moth...x0/H8QGi-F.cfm
Which setup would be faster?
Hardware 1: a cluster of two computers, each with a 2-CPU motherboard and two Opteron 6174 processors
Hardware 2: one computer with a 4-CPU motherboard and four Opteron 6174 processors |
|
November 7, 2011, 16:06 |
|
#2 | |
Senior Member
Vieri Abolaffio
Join Date: Jul 2010
Location: Always on the move.
Posts: 308
Rep Power: 17 |
Quote:
|
||
November 7, 2011, 18:02 |
|
#3 |
Senior Member
Robert
Join Date: Jun 2010
Posts: 117
Rep Power: 17 |
Aren't the two-processor-per-board and four-processor-per-board parts different part numbers? For Opterons they are now the 2000 and 8000 series, I believe. Typically the four-processor-per-board models are significantly more expensive.
I would think that, with a decent interconnect, the two-board solution is probably faster and cheaper. You will probably get more memory bandwidth per core with the two-board solution. |
|
November 8, 2011, 05:05 |
|
#4 |
Senior Member
Markus Rehm
Join Date: Mar 2009
Location: Erlangen (Germany)
Posts: 184
Rep Power: 17 |
Hi,
the 4-processor solution together with the 12-core Opterons (6100 series, aka Magny-Cours) and the soon-to-be-available 6200 series (aka Interlagos), which should fit into the same board, is really very popular at the moment in the HPC community because it offers a very good price/performance ratio. The memory bandwidth is also quite nice. For a benchmark you might read here: http://www.anandtech.com/show/3894/s...clash-dellr815
If you can wait: the prices of Interlagos should be even more competitive, but the first benchmarks of the already available desktop FX series indicate that you need some compiler tuning to get full performance: http://www.phoronix.com/scan.php?pag...ompilers&num=1
Regards, Markus. |
|
November 9, 2011, 10:47 |
|
#5 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
CFD performance with unstructured grids on AMD's multi-socket boards is extremely poor. This article from AnandTech tries to investigate why. I am assuming that Interlagos won't fix this entirely.
The best price/performance for CFD available now is, far and away, Intel's desktop chips. Four i5 2400 machines, which you can build for as little as $300 each, would blow your two choices out of the water. With just four machines you can get away with just a gigabit Ethernet network. Or you could wait a week and get the new Intel Sandy Bridge E chips, which have six cores and an absolutely ridiculous amount of memory bandwidth. These machines would cost a little more than ones using the current Sandy Bridge chips, but the performance should be significantly higher as well. It would definitely be way cheaper, and way faster, than buying server-class hardware from AMD. |
|
November 10, 2011, 04:49 |
|
#6 |
Senior Member
Markus Rehm
Join Date: Mar 2009
Location: Erlangen (Germany)
Posts: 184
Rep Power: 17 |
I doubt that Euler3d results are representative of general CFD performance. On this system http://www.cfd-online.com/Forums/ope...tml#post314891 the speedup was almost linear. Also, Gigabit Ethernet interconnects are not a good choice if you want top performance.
From my point of view you are better off with Intel chips at the moment if the licensing model of your CFD code is per core. If this doesn't matter, Opterons are often the better alternative. But as we saw before, this is not generally valid, so it is best to run benchmarks of your code before buying.
Regards, Markus. |
|
November 10, 2011, 11:25 |
|
#7 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
Gigabit Ethernet is good enough for very small clusters. I had a four-node cluster with Gigabit Ethernet that scaled from one to four nodes at 90% efficiency. InfiniBand would take that up to what, 93%? For the money I could just buy another node and get ~20% speedup instead of ~3%.
AMD just is not competitive right now. With traditional CFD on unstructured grids, performance is dominated by memory bandwidth, memory latency and caching, all of which are areas where Intel has a significant advantage. Clock speed doesn't really matter: I overclocked my machines from 3.4 GHz to 4.0 GHz and only saw a tiny speedup. Regardless of per-core licensing issues, if you have a fixed amount of money to spend, then buying Intel systems will give you the fastest cluster. All of this only holds true for traditional CFD on unstructured meshes. If you are using structured meshes or a Lattice Boltzmann code like Exa, then AMD likely DOES make sense. |
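The interconnect-versus-extra-node trade-off above can be put into numbers. This is only a minimal sketch: the 90% figure is the measured four-node efficiency from the post, the 93% is the guess from the post, and the assumption that a fifth node would still scale at 90% is mine, not something that was measured.

```python
def speedup(nodes: int, efficiency: float) -> float:
    """Speedup relative to one node at a given parallel efficiency."""
    return nodes * efficiency


# Figures from the post: four nodes at 90% over Gigabit Ethernet,
# and a guessed 93% with InfiniBand.
gige_4 = speedup(4, 0.90)
ib_4 = speedup(4, 0.93)

# Assumption (not measured): a fifth Gigabit Ethernet node still scales at 90%.
gige_5 = speedup(5, 0.90)

gain_from_infiniband = ib_4 / gige_4 - 1
gain_from_fifth_node = gige_5 / gige_4 - 1

# Five-node efficiency that would give exactly the ~20% quoted in the post.
efficiency_for_20_percent = 1.20 * gige_4 / 5

print(f"InfiniBand on 4 nodes:      {gain_from_infiniband:+.1%}")
print(f"Fifth GigE node:            {gain_from_fifth_node:+.1%}")
print(f"5-node efficiency for +20%: {efficiency_for_20_percent:.1%}")
```

Under these assumptions InfiniBand buys about 3%, while a fifth node buys about 25% if efficiency holds. The ~20% quoted in the post corresponds to five-node efficiency dropping to roughly 86%.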
|
November 11, 2011, 16:46 |
|
#8 |
Senior Member
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22 |
Another point to consider is energy consumption. My privately owned AMD CPU is slower than the Xeon in my workstation and needs more energy. This is no issue as long as it doesn't run for a long time, but when it's up 24/7 and under full load, it makes a huge difference. In Germany, it makes a difference of 50 bucks on the electricity bill per node in just a year. And the AMD would also need to run at least 20% longer to get the same results.
It's a shame, as I don't like the total market control and pricing policy of Intel, but at least at the moment AMD can't compete with the power and efficiency of Intel CPUs. |
|
December 18, 2011, 13:10 |
|
#9 |
New Member
Ulrich Siller
Join Date: Dec 2011
Location: Germany
Posts: 2
Rep Power: 0 |
I recently had the chance to run a little benchmark comparing a two-socket Xeon X5675 (24 cores, 3.06 GHz) and the new AMD Opteron 6274 (32 cores, 2.1 GHz). I ran the DLR turbomachinery solver TRACE on a multi-block mesh of an axial compressor stage. The OS was openSUSE 12.1 in both cases, with Open MPI used for parallelization.
The results at a glance:
machine | numberJobs | numberCores | timesteps/minute (over all jobs) |
XeonX5675 | 3 | 4 | 30.57 |
XeonX5675 | 3 | 8 | 33.93 |
XeonX5675 | 4 | 6 | 34.09 |
Opt6274 | 4 | 4 | 26.79 |
Opt6274 | 4 | 8 | 37.57 |
The main conclusions (from my perspective):
- Hyperthreading on the Xeon is only effective in the case of imperfect load balancing, at least for this number-crunching-intensive code.
- The sharing of one FPU between two cores on the Opteron system is the better deal for CFD: the test with 4*8 cores is about 40% faster than the one with 4*4 cores (one FPU per process).
- Opteron is the better deal, especially for a four-socket system with InfiniBand interconnection, resulting in much lower hardware costs. |
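The percentages in the conclusions can be checked directly against the measured throughputs. A minimal sketch, assuming that numberCores means cores per job and that the throughput column is the sum over all concurrent jobs; the helper names are made up for illustration.

```python
# (machine, number of jobs, cores per job, timesteps/minute over all jobs),
# copied from the benchmark results above.
RUNS = [
    ("XeonX5675", 3, 4, 30.57),
    ("XeonX5675", 3, 8, 33.93),
    ("XeonX5675", 4, 6, 34.09),
    ("Opt6274", 4, 4, 26.79),
    ("Opt6274", 4, 8, 37.57),
]


def throughput(machine: str, jobs: int, cores: int) -> float:
    """Look up the measured throughput for one configuration."""
    for m, j, c, rate in RUNS:
        if (m, j, c) == (machine, jobs, cores):
            return rate
    raise KeyError((machine, jobs, cores))


def relative_gain(new: float, old: float) -> float:
    """Fractional improvement of `new` over `old`."""
    return new / old - 1.0


# Opteron: two processes per FPU (4*8) versus one process per FPU (4*4).
opteron_gain = relative_gain(throughput("Opt6274", 4, 8), throughput("Opt6274", 4, 4))

# Xeon: 24 processes (hyperthreading in use) versus 12 processes.
xeon_ht_gain = relative_gain(throughput("XeonX5675", 3, 8), throughput("XeonX5675", 3, 4))

# Best configuration of each machine.
best = {}
for machine, _, _, rate in RUNS:
    best[machine] = max(best.get(machine, 0.0), rate)
opteron_vs_xeon = relative_gain(best["Opt6274"], best["XeonX5675"])

print(f"Opteron 4*8 vs 4*4:        {opteron_gain:+.1%}")
print(f"Xeon 3*8 vs 3*4:           {xeon_ht_gain:+.1%}")
print(f"Best Opteron vs best Xeon: {opteron_vs_xeon:+.1%}")
```

With these numbers the Opteron gains about 40% from filling all 32 cores, the Xeon gains about 11% from hyperthreading, and the best Opteron run is about 10% ahead of the best Xeon run in total throughput.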
|
April 11, 2012, 06:14 |
|
#10 | |
New Member
Join Date: Mar 2009
Posts: 5
Rep Power: 17 |
Dear Mr. Siller
just to make sure I understand your benchmark correctly: you ran three or four distinct cases utilizing all cores available to the system. Could it be that, if you use all cores for one job (and make sure that no processor switches happen, emptying the INT/CMD/FPU pipelines), the results may look different? (And yes, I agree, HT is not relevant for CFD.)
I am asking because I have to make the decision between Opteron 62XX and E5-26YY, and there are different aspects to consider. From the benchmarks at http://www.amd.com/de/products/serve...t-servers.aspx ROMS and WRFv3 are interesting for CFD applications, while from http://investors.ansys.com/releaseDe...leaseID=662929 it seems to me that the 6174 processor can only win in certain rather artificial situations. If anything, you need to consider the 6276 as a direct E5-26YY competitor.
Best regards, George Skillas
Quote:
|
||
April 16, 2012, 09:08 |
|
#11 |
New Member
Ulrich Siller
Join Date: Dec 2011
Location: Germany
Posts: 2
Rep Power: 0 |
Hi Mr. Skillas,
You are right: I started the same computation n times on the machine and measured the time to finish a specific number of timesteps. While for the Interlagos and the Xeon without HT all runs finished at almost the same time, the HT-on case had very different running times (differing by up to 10%). My little benchmark is far from answering even the most important questions in the matrix of issues relevant for parallel computing.
We had the following strategy to answer the question:
- We have no core-based licensing issue with our CFD solver, which simplifies a lot.
- Comparing the hardware costs of a Xeon-based 2-socket server and an Interlagos 4-socket server (both with IB interconnection), we came up with approx. half the hardware costs per core for the AMD system. The lower clock speed of the AMD is already included.
Last week we received our HPC cluster from Delta Computer GmbH (Hamburg) and we are now looking forward to testing again in-house.
Best regards, Ulrich Siller |
|
April 16, 2012, 17:49 |
|
#12 |
Senior Member
Charles
Join Date: Apr 2009
Posts: 185
Rep Power: 18 |
Ulrich, it would be great if you could keep us informed about what you find. I am particularly interested in seeing how well your application scales on a node compared to how well it scales across nodes. There seems to be quite a lot of uncertainty about whether it is really better to run with many cores on a motherboard (call it pure shared memory), or if it is faster to have more nodes, but not so many cores per motherboard.