|
May 31, 2007, 18:27 |
CFX11 scaling on quad core Clovertown cpus?
|
#1 |
Guest
Posts: n/a
|
CFX11 scales very well on Woodcrest dual core CPUs. As does Fluent.
However, Fluent scales very poorly from 2 to 4 cores within a single MCP / single socket Clovertown. Does anyone have scaling data for CFX11 on Clovertowns or Xeon 32xx chips? |
|
May 31, 2007, 21:16 |
Re: CFX11 scaling on quad core Clovertown cpus?
|
#2 |
Guest
Posts: n/a
|
I will run some comparisons for you on my QX6700 (non-Xeon). Do you want me to play with the application's processor affinity, or just let the XP32 OS deal with it?
Off the top of my head, a sim I did the other day took 1 hr 47 min on 4 cores and 2 hr 8 min on 2 cores. I think the problem is a memory bottleneck, as there is no on-chip memory controller. I will get back to you with hard figures. Stu |
|
June 1, 2007, 06:19 |
Re: CFX11 scaling on quad core Clovertown cpus?
|
#3 |
Guest
Posts: n/a
|
Much obliged. Just what I needed.
"Do you want me to play with the application's processor affinity, or just let the XP32 OS deal with it?" Letting the OS deal with it would be fine. Also, could you let me know what motherboard chipset and memory settings (FSB, memory timings, ...) you are using? Have you ever overclocked the FSB? As the FSB is theoretically the bottleneck, with the two dual-core dies fighting for memory access, a mild FSB overclock (e.g. upping the FSB 10-20%) could have significant benefits.
If I could suggest a geometry: a simple inlet/outlet tube with 500k cells per partition, e.g. 2E6 total. This would eliminate partitioning as a potential issue.
The reason I am asking is that on July 22 quad core pricing is going to become very affordable, suggesting a possible application within clusters. However, this suitability will largely depend on whether the FSB contention issue is manageable. |
|
June 2, 2007, 08:39 |
Re: CFX11 scaling on quad core Clovertown cpus?
|
#4 |
Guest
Posts: n/a
|
How about this, send me an email to cfdstu@gmail.com with a .def file that you would like tested, and I will run it for you.
Stu |
|
June 2, 2007, 11:19 |
Re: CFX11 scaling on quad core Clovertown cpus?
|
#5 |
Guest
Posts: n/a
|
I could but it really just needs to be trivially simple e.g.
Tube: D 25 mm x 250 mm long.
2e6 uniformly distributed cells.
Steady state, SST turbulence model.
Water.
Inlet: Fully developed pipe flow, 1 m/s uniform inlet velocity.
Outlet: Average pressure.
etc. etc. i.e. the simplest possible case.
You only need to run each case for ~20 iterations to get a feel for the compute time per iteration. Thanks again. Looking forward to your results! |
|
June 2, 2007, 17:01 |
Re: CFX11 scaling on quad core Clovertown cpus?
|
#6 |
Guest
Posts: n/a
|
I would suggest performing a calculation for an unsteady flow. We have a benchmark with 500k nodes. It is just a cylinder filled with water with a hot bottom and a cold top. Select buoyancy, 100 iterations, convergence criterion 1e-5 and off you go.
Gert-Jan www.bunova.nl |
|
June 3, 2007, 05:28 |
Re: CFX11 scaling on quad core Clovertown cpus?
|
#7 |
Guest
Posts: n/a
|
I think many of us would appreciate it if you could also post the results to this forum when you have completed this test. Like, I suspect, a lot of people in CFD, I'm seriously considering getting some quad-core hardware soon, but there is some doubt whether the performance on the Intel quad core hardware scales well enough to justify the additional parallel software license expense. I suspect that the scaling on AMD Barcelona will be much better, when that finally becomes available, but we don't actually KNOW. So any hard information on this topic will be very welcome right now!
|
|
June 3, 2007, 09:47 |
Results
|
#8 |
Guest
Posts: n/a
|
I have a Dell Precision 390 with a QX6700 (2.66 GHz) quad core CPU, 4 GB of 667 MHz RAM, running XP32 SP2 with the 3 GB switch enabled, and CFX 11 with update 1.
Performed a multiphase transient sim with 306K elements:

Run        CPU        Wall
Single     39m 26s    39m 49s
MPICH 2    21m 16s    23m 01s
MPICH 3    15m 45s    17m 53s   (Note 1)
MPICH 4    15m 26s    17m 22s

Note 1: The MPICH 3 result seems strange, but I was doing other things at the time.
I hope this helps. Stu |
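For anyone who wants to turn the wall-clock times above into speedup and parallel efficiency, here is a minimal Python sketch. The function and variable names are made up for illustration. The only data are the four wall times as posted, with the single-process run taken as the baseline.

```python
# Wall-clock times from the post above, as (minutes, seconds) per process count.
WALL_TIMES = {1: (39, 49), 2: (23, 1), 3: (17, 53), 4: (17, 22)}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to seconds."""
    return 60 * minutes + seconds


def scaling_table(wall_times):
    """Return {n: (speedup, efficiency)} relative to the single-process run."""
    t1 = to_seconds(*wall_times[1])
    table = {}
    for n, (minutes, seconds) in sorted(wall_times.items()):
        speedup = t1 / to_seconds(minutes, seconds)
        table[n] = (speedup, speedup / n)  # efficiency = speedup per process
    return table


if __name__ == "__main__":
    for n, (speedup, efficiency) in scaling_table(WALL_TIMES).items():
        print(f"{n} process(es): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

On these figures the speedup comes out at roughly 1.7x on 2 processes, 2.2x on 3 and 2.3x on 4, so the fourth core adds very little. Bear in mind the 3-process run was disturbed by other activity on the machine.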
|
June 3, 2007, 10:06 |
Re: Results
|
#9 |
Guest
Posts: n/a
|
Thanks. This directly confirms the poor Intel quad core scaling seen under Fluent.
In practice, you only get to utilise 3 of the 4 cores effectively. AMD's upcoming Barcelona will probably show better quad core scaling due to its native quad core design and shared L3 cache. We will be sticking to dual core Intel CPUs in our cluster. |
|
June 4, 2007, 08:01 |
Re: Results
|
#10 |
Guest
Posts: n/a
|
Thanks Stu. Can you provide more details? Number of nodes (instead of elements), type of mesh, memory usage in the run, number of iterations, CPU time and wall clock time.
Thanks in advance, Gert-Jan |
|
June 4, 2007, 08:46 |
Re: Results
|
#11 |
Guest
Posts: n/a
|
No probs, I will do the best I can
Domain Name : Default Domain
Total Number of Nodes = 56596
Total Number of Elements = 306824
Total Number of Tetrahedrons = 306824
Total Number of Faces = 15938
The mesh was a CFX mesh, therefore pure tets. Iterations: 10 (just for benchmarking). Memory usage: I didn't notice, but it did not spool to disk, and all at 100% CPU usage. The times given above are CPU time and then wall time; it is just that the format of this forum predates DOS 6.22, hence it did not display correctly, even after previewing. Stu |
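Combining the mesh statistics above with the wall times posted earlier in the thread gives a rough feel for time per iteration and work per core. The sketch below is only illustrative: the names are invented, the "Iterations 10" figure is taken at face value, and the even split of nodes across partitions is an assumption (real partitions also carry overlap nodes).

```python
# Figures as posted in this thread; the names below are illustrative only.
NODES = 56596
ITERATIONS = 10
WALL_SECONDS = {1: 39 * 60 + 49, 2: 23 * 60 + 1, 3: 17 * 60 + 53, 4: 17 * 60 + 22}


def seconds_per_iteration(wall_seconds, iterations=ITERATIONS):
    """Upper bound: wall time also includes solver start-up and partitioning."""
    return wall_seconds / iterations


def nodes_per_partition(partitions, nodes=NODES):
    """Assumes an even split and ignores overlap nodes between partitions."""
    return nodes / partitions


if __name__ == "__main__":
    for n, wall in sorted(WALL_SECONDS.items()):
        print(f"{n} partition(s): {seconds_per_iteration(wall):.1f} s/iteration, "
              f"~{nodes_per_partition(n):.0f} nodes per partition")
```

With 4 partitions this works out to about 14,000 nodes each, which is a fairly small workload per core, so a larger case may well scale differently.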
|
June 4, 2007, 11:44 |
Re: Results
|
#12 |
Guest
Posts: n/a
|
Hmmm. As I said, we use a benchmark which is quite large. It requires 1G of memory, 100 iterations. This is to check floating point operations. You can run it if you have time available. Shall I share this benchmark?
Gert-Jan |
|
June 4, 2007, 19:43 |
Re: Results
|
#13 |
Guest
Posts: n/a
|
Yeah, no probs, send it to cfdstu@gmail.com and I will run it when I get time, just tell me what you want as a performance result, the .out file?
Stu |
|