|
August 22, 2005, 14:50 |
Hi, I just test the parallel p
|
#1 |
New Member
Ho Hsing
Join Date: Mar 2009
Posts: 13
Rep Power: 17 |
Hi, I just tested the parallel performance of the solver, icoFoam, on a cluster. In single-CPU mode it takes about 31 hours, while the 2-CPU mode takes 26 hours, so the efficiency is around 60%. Is that reasonable?
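For reference, that 60% follows from the usual definition of parallel efficiency (serial time divided by the number of processors times the parallel time):

E = \frac{T_1}{N \, T_N} = \frac{31\ \text{h}}{2 \times 26\ \text{h}} \approx 0.60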
|
|
August 22, 2005, 14:54 |
It depends on the speed of the
|
#2 |
Senior Member
Join Date: Mar 2009
Posts: 854
Rep Power: 22 |
It depends on the speed of the interconnect, the size of the case and the parallel comms settings you have specified in .OpenFoam-1.1/controlDict.
|
|
August 22, 2005, 15:11 |
The size of the case is quite
|
#3 |
New Member
Ho Hsing
Join Date: Mar 2009
Posts: 13
Rep Power: 17 |
The size of the case is quite large: there are
262392 cells in the computational domain. I have no idea how to edit the file .OpenFoam-1.1/controlDict; actually, I have not changed it at all. It takes the form:

InfoSwitches
{
    writeJobInfo 0;
    FoamXwriteComments 1;
}

OptimisationSwitches
{
    fileModificationSkew 10;
    scheduledTransfer 1;
    floatTransfer 0;
    nProcsSimpleSum 16;
} |
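For readers wondering what those comms-related switches control, here is the same OptimisationSwitches block annotated with my understanding of the OpenFOAM-1.x settings; the comments are my interpretation rather than quoted documentation:

OptimisationSwitches
{
    // allowed skew (in seconds) of file modification times when
    // checking whether runtime-modifiable dictionaries have changed
    fileModificationSkew 10;

    // 1 = use scheduled inter-processor transfers rather than the
    // default blocking transfers
    scheduledTransfer 1;

    // 1 = transfer field data as single-precision floats to halve the
    // communication volume; 0 = keep full double precision
    floatTransfer 0;

    // number of processors up to which a simple linear sum is used for
    // global reductions (a hierarchical sum is used above this)
    nProcsSimpleSum 16;
}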
|
August 22, 2005, 15:14 |
and the speed of the interconn
|
#4 |
Senior Member
Join Date: Mar 2009
Posts: 854
Rep Power: 22 |
and the speed of the interconnect?
|
|
August 22, 2005, 16:37 |
Every node of my cluster is du
|
#5 |
New Member
Ho Hsing
Join Date: Mar 2009
Posts: 13
Rep Power: 17 |
Every node of my cluster is a dual-CPU system, so the two-CPU run was actually running inside a single node. The interconnection between the machine nodes is 1 Gbyte.
|
|
August 22, 2005, 16:44 |
I assume from your results tha
|
#6 |
Senior Member
Join Date: Mar 2009
Posts: 854
Rep Power: 22 |
I assume from your results that the two CPUs are sharing the memory bus in each of your nodes and you are only getting 60% efficiency because the memory bus is saturated. Try running the case between two nodes.
|
|
August 22, 2005, 16:52 |
Thanks Henry,
I will try. And
|
#7 |
New Member
Ho Hsing
Join Date: Mar 2009
Posts: 13
Rep Power: 17 |
Thanks Henry,
I will try. And there is a typo in my previous post: the interconnection is 1 Gbit. |
|
August 24, 2005, 16:25 |
I have run the code in two CPU
|
#8 |
New Member
Ho Hsing
Join Date: Mar 2009
Posts: 13
Rep Power: 17 |
I have run the code on two CPUs, but on two different nodes. Now the efficiency seems to be higher than 100%! That means the bottleneck is the bus speed in my cluster and I'd better upgrade the motherboards?
BTW, the running time given by icoFoam is CPU time, instead of wall-clock time, right? |
|
August 24, 2005, 16:47 |
Recent multi-CPU motherboards
|
#9 |
Senior Member
Join Date: Mar 2009
Posts: 854
Rep Power: 22 |
Recent multi-CPU motherboards like the Tyan dual and quad Opteron boards (and I am guessing the recent Xeon boards as well) have a separate memory bus for each CPU. The AMD-based boards have HyperTransport buses between the CPUs as well, but I don't know if there is an equivalent for Xeon processors. This arrangement is far preferable to the old shared-memory multi-CPU machines, because CPU speeds outstrip memory access, which means that memory-access-intensive codes like CFD become memory-access limited unless each CPU has its own memory.
All the OpenFOAM applications print CPU time, but you can easily add a print for the wall-clock time using the clockTime() member function in the same way as cpuTime() is used. |
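A minimal sketch of such an addition (assuming the clockTime class and its elapsedTime() member; the exact names may differ slightly between OpenFOAM versions):

#include "clockTime.H"

// constructed once before the time loop, alongside the existing timing
clockTime wallClock;

// inside the time loop, next to the existing ExecutionTime output
Info<< "ExecutionTime = " << runTime.elapsedCpuTime() << " s"
    << "  ClockTime = " << wallClock.elapsedTime() << " s"
    << nl << endl;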
|
August 24, 2005, 16:57 |
>(and I am guessing the recent
|
#10 |
Senior Member
Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 25 |
>(and I am guessing the recent Xeon boards as well)
I am pretty sure this is not correct. The Nocona Xeon dual CPU motherboards still use a shared memory bus. Based on our experience, these systems are not a good target platform for the current incarnation of OpenFOAM. |
|
August 24, 2005, 17:03 |
Current CPU performance far ou
|
#11 |
Senior Member
Join Date: Mar 2009
Posts: 854
Rep Power: 22 |
Current CPU performance far outstrips memory-access performance, and it doesn't look like this situation will improve anytime soon, which means that all codes that rely on rapid memory access to large amounts of data (that is, all CFD codes, not just the current incarnation of OpenFOAM) will benefit from each CPU having its own memory bus.
|
|
August 25, 2005, 10:02 |
Does this mean that the Dual-C
|
#12 |
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51 |
Does this mean that the dual-core Opterons are not good for CFD computations? If I interpret the block diagrams correctly, both cores share the same memory interface (leading to a problem similar to the Xeon motherboards discussed above).
Does anyone have experience with OpenFOAM on dual-cores?
__________________
Note: I don't use "Friend"-feature on this forum out of principle. Ah. And by the way: I'm not on Facebook either. So don't be offended if I don't accept your invitation/friend request |
|
August 25, 2005, 10:08 |
I would expect dual-core CPUs
|
#13 |
Senior Member
Join Date: Mar 2009
Posts: 854
Rep Power: 22 |
I would expect dual-core CPUs to suffer from the same problem because they share the same memory bus.
|
|
August 29, 2005, 04:54 |
At dual-Opterons (Athlon X2) e
|
#14 |
Guest
Posts: n/a
|
On dual Opterons (Athlon X2) each CPU has its own RAM channel; the dual DDR-RAM bus is divided into a single channel for each CPU. The performance of one such CPU (Socket 939/940) decreases by approx. 8% compared to a Socket 754 CPU.
|
|
August 29, 2005, 05:25 |
I don't have such a CPU - the
|
#15 |
Guest
Posts: n/a
|
I don't have such a CPU - the 8% above is just the difference for a Socket 939 CPU with/without dual DDR-RAM! There is a crossbar switch between the CPU and the RAM, which should act in that way. Graphics, hard disk and Ethernet use one (Athlon) to three (Opteron) HT links with 3.2 GB/s each.
At Tomshardware.de the difference between two single Opterons and one dual-core Opteron is negligible. But they didn't use CFD for the comparisons! |
|
August 30, 2005, 15:30 |
Thanks for all of you guys' i
|
#16 |
New Member
Ho Hsing
Join Date: Mar 2009
Posts: 13
Rep Power: 17 |
Thanks for all of your ideas about the parallel performance.
Now I have a question about the CPU time. The CPU time provided by the elapsedCpuTime() function counts only the main node's CPU time, instead of all of the parallel nodes', right? Another question is why every machine only uses a portion of the CPU resource, as I am quite sure nobody else is using the cluster. Here is the output of my top command:

PID   USER  PRI NI SIZE  RSS SHARE STAT %CPU %MEM TIME  CPU COMMAND
9397  hsing  25  0 18996 18M  8400 R    69.9  0.4 1000m   0 hsingFlow |
|
August 30, 2005, 15:38 |
Each node calculates it's own
|
#17 |
Senior Member
Join Date: Mar 2009
Posts: 854
Rep Power: 22 |
Each node calculates its own CPU time, but only the master writes to the log via the Info statement. If you want to see the CPU time for all the nodes, replace Info with Sout or Serr.
Only a fraction of the CPU is being used because the rest of the time it is probably waiting for data communication between the nodes. |
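For example, a minimal sketch of the change (Sout writes from every processor without a prefix, so the lines from the different nodes will appear interleaved in the output):

// master-only output, as the solvers do by default
Info<< "ExecutionTime = " << runTime.elapsedCpuTime() << " s" << nl << endl;

// per-processor output: every node prints its own CPU time
Sout<< "ExecutionTime = " << runTime.elapsedCpuTime() << " s" << nl << endl;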
|
|