Scale-Up Study in Parallel Processing with OpenFoam |
April 8, 2010, 17:12 |
Scale-Up Study in Parallel Processing with OpenFoam
|
#1 |
Senior Member
Seyyed Ali H.M.
Join Date: Nov 2009
Location: Utah
Posts: 107
Rep Power: 17 |
Hello everyone, what's the plan?
Recently I ran one case on different numbers of processors to check parallel performance in OpenFOAM, but I got a strange result. I set up a simple case and used the Scotch decomposition method, which behaves much like the Metis method. Then I used the cluster in our lab (CMTL at UIC) to solve these cases. The cluster has 4 nodes with 8 processors on each node. The case is a 3D backward-facing step with 748K cells and Re=500, and the solver is rhoPisoFoam. I used 2, 4, 6, 8, 16, 24 and 32 processors to solve the same case, but the 32-processor run turned out slower than the 16-processor run. The attached file shows the solution speed (time steps per minute) versus the number of processors. Can anyone tell me what I did wrong, or whether they have had this problem before? Also, can anyone explain the time that OpenFOAM reports in the log file? |
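On the last question: in a typical OpenFOAM solver log, each step starts with a `Time = ...` line (the simulated time) and ends with a line of the form `ExecutionTime = ... s  ClockTime = ... s`. ExecutionTime is the cumulative CPU time and ClockTime the cumulative wall-clock time since the solver started. A minimal sketch of turning such a log into the "time steps per minute" figure used above follows. The function name `steps_per_minute` is hypothetical, and it assumes the log follows that standard layout.

```shell
#!/bin/sh
# steps_per_minute LOGFILE   (hypothetical helper, not part of OpenFOAM)
# Counts solver steps (lines starting with "Time = ") and divides by the
# last cumulative ClockTime (wall-clock seconds) found in the log.
steps_per_minute() {
    awk '
        /^Time = /     { steps++ }
        /ClockTime = / {
            for (i = 1; i <= NF; i++)
                if ($i == "ClockTime") clock = $(i + 2)
        }
        END {
            if (clock > 0) printf "%.2f\n", steps * 60 / clock
            else print "n/a"
        }
    ' "$1"
}
```

Usage would be something like `steps_per_minute log`, where `log` is whatever file the solver output was redirected to. Using ClockTime rather than ExecutionTime matters for a scaling study, because time spent waiting on communication shows up in wall-clock time.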
|
April 9, 2010, 01:57 |
|
#2 |
Member
Andrew King
Join Date: Mar 2009
Location: Perth, Western Australia, Australia
Posts: 82
Rep Power: 17 |
Hi Sahm,
Your case is too small to stress your cluster: with 32 procs you only have about 23k cells per proc. Your cluster is probably spending most of its time communicating. Try a larger mesh (or a faster interconnect).
Cheers,
Andrew
__________________
Dr Andrew King Fluid Dynamics Research Group Curtin University |
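The cells-per-processor figure quoted above is simple integer division. A small sketch, assuming "748K" means 748000 cells and using a hypothetical helper name `cells_per_core`, tabulates it for every processor count tried in the first post:

```shell
#!/bin/sh
# cells_per_core TOTAL_CELLS CORES   (hypothetical helper)
# Integer estimate of the mesh size each processor receives.
cells_per_core() {
    echo $(( $1 / $2 ))
}

# 748000 is an assumed exact value for the "748K" mesh in the first post.
for n in 2 4 6 8 16 24 32; do
    printf '%2d cores: %d cells per core\n' "$n" "$(cells_per_core 748000 "$n")"
done
```

For 32 cores this gives 23375 cells per core, which is the "23k" mentioned in the reply.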
|
April 14, 2010, 17:43 |
With Metis Method
|
#3 |
Senior Member
Seyyed Ali H.M.
Join Date: Nov 2009
Location: Utah
Posts: 107
Rep Power: 17 |
I changed the decomposition method to Metis, and my solution speeds increased. The attached file shows the speed-up.
|
|
April 19, 2010, 09:46 |
|
#4 |
Member
Fábio César Canesin
Join Date: Mar 2010
Location: Florianópolis
Posts: 67
Rep Power: 16 |
Can you upload it in pdf or open-document format ?? Running linux, so no xlsx for me =(
|
|
April 20, 2010, 16:22 |
New Results.
|
#5 |
Senior Member
Seyyed Ali H.M.
Join Date: Nov 2009
Location: Utah
Posts: 107
Rep Power: 17 |
Sorry about that.
This file covers 2 cases: one is the backstep flow with the rhoPisoFoam solver, at 750k and 1.5M cells; the other is the cavity case with 1M cells. Any comments on this are appreciated. |
|
April 22, 2010, 15:59 |
|
#6 |
Senior Member
Seyyed Ali H.M.
Join Date: Nov 2009
Location: Utah
Posts: 107
Rep Power: 17 |
Is anybody going to help? Is anybody not going to help?
Last edited by sahm; April 26, 2010 at 16:16. Reason: 2 Admin: If you can please delete this message. |
|
April 23, 2010, 05:50 |
|
#7 |
New Member
Simon Rees
Join Date: Mar 2009
Posts: 12
Rep Power: 17 |
What is your question??
I would say you should get better scalability than this but you will have to say more about the system - cores per node, interconnect type etc. |
|
April 23, 2010, 18:08 |
|
#8 |
Senior Member
Seyyed Ali H.M.
Join Date: Nov 2009
Location: Utah
Posts: 107
Rep Power: 17 |
OK,
The system I'm working with is a cluster with 4 processing nodes and 1 head node. Each node has 8 processors, with an InfiniBand connection between nodes. I don't know how much RAM each node has, but I haven't had any problem with that. The problem with this scale-up study is that when I increase the number of processors, the calculation speed decreases. I also tried this case with 3M cells, but the problem is still there. Do you have any idea why this is happening and how I can improve it? |
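One way to put a number on "the speed decreases" is parallel efficiency relative to the smallest run: the measured speed-up divided by the ideal speed-up. The sketch below assumes speed is measured in time steps per minute, as in the attached sheets, and that the 2-processor run is the baseline. The function name `parallel_efficiency` is hypothetical.

```shell
#!/bin/sh
# parallel_efficiency BASE_CORES BASE_RATE CORES RATE   (hypothetical helper)
# Efficiency = (RATE / BASE_RATE) / (CORES / BASE_CORES);
# 1.00 means ideal scaling relative to the baseline run.
parallel_efficiency() {
    awk -v p0="$1" -v r0="$2" -v p="$3" -v r="$4" 'BEGIN {
        if (p0 <= 0 || r0 <= 0 || p <= 0) { print "n/a"; exit }
        printf "%.2f\n", (r / r0) / (p / p0)
    }'
}
```

Called as `parallel_efficiency 2 <rate on 2 cores> 32 <rate on 32 cores>`, a value well below 1 at high core counts is consistent with communication dominating the run time.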
|
April 23, 2010, 19:30 |
|
#9 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings SAHM,
It seems you have an "isolate and conquer" problem! 4 machines, 8 cores each, means that you should first test in a ramification method. In other words:
I suggest these tests because, from what I've seen in the first graph, it feels like there is some sort of inertia-like problem with the machines! In other words, by analysing the first graph, it seems that:
Additionally, don't forget that your InfiniBand interconnect might have options to configure for jumbo packets (useful for gigantic data transfers between nodes, which would be the case with 50-100 million cells) or for very low latency (useful for constant communication between nodes). Another thing that could hinder scalability is how frequently you save a case snapshot. In other words, does this case store:
Sooo... to conclude... be careful what you wish for! You just might get it ... or so I would think :P
Edit ----------------------------
I didn't see the other three graphs before I posted. But my train of thought still applies. And I should add another thing to the test list:
Best regards, Bruno
__________________
Last edited by wyldckat; April 23, 2010 at 19:58. Reason: added a missing thought process... and then I saw the other 3 graphs... |
|
April 26, 2010, 16:12 |
Wow, Still there is a question.
|
#10 |
Senior Member
Seyyed Ali H.M.
Join Date: Nov 2009
Location: Utah
Posts: 107
Rep Power: 17 |
Thanks Bruno. I forgot to erase the first sheet, since that case was not well defined, but the other cases have the same problem. About your suggestion to run them with different parallel configurations, I have a question.
Actually, I don't know how to assign a particular sub-domain (of a decomposed domain) to a particular processor. To run your tests, I need to assign a specific processor to a specific sub-domain, or at least define how many cores of each node I'm going to use. For example, I have to define how to use 4 processors as 2 cores per node on 2 nodes. Can you tell me how to do this?
Since this cluster is shared between people in the lab, I need to ask for their permission whenever I use more than 8 cores, so running your tests might take a long time. Besides, we use software called Lava to queue jobs on the cluster, and I should use it to submit my jobs; otherwise other people might not see that I'm running something. Can you tell me how to run a job on specific cores of different nodes? I'd also appreciate it if you could tell me how to do it with Lava, if you know this software.
I have an idea that I would like to discuss with you. I think that if I assign sub-domains to specific cores, I can minimize the communication between nodes. I mean, if the connection between cores of one node is faster than the connection between nodes, assigning neighboring sub-domains to the cores of a single node might help, since this reduces the data transferred between nodes (through the slower connection). I would like to know what you think about this concept. Thanks again for your comments. |
|
April 26, 2010, 18:37 |
|
#11 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings SAHM,
Quote:
Code:
cat machinefile.morab
morab001
morab001
morab002
morab002
Code:
foamJob -p -s icoFoam
So, for more information about the MPI application you are using, you should consult its manual. You could also tweak the foamJob script (run "which foamJob" to find where it is) to better suit your needs!
Quote:
Not-so-quick answer: See --> this <-- short tutorial and the links it points to. But from what I can briefly see, it seems that it can operate as a wrapper for mpirun. So, my guess is that you could make a symbolic link from mpirun to Lava's mpirun and use foamJob as it is now!
Quote:
Best regards, Bruno
__________________
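The machine file shown above lists each host once per core to be used, so "2 cores per node on 2 nodes" becomes four lines. A minimal sketch of generating such a file for any layout follows. The function name `make_machinefile` is hypothetical, and the host names are the ones from the example above.

```shell
#!/bin/sh
# make_machinefile SLOTS_PER_NODE NODE...   (hypothetical helper)
# Prints each node name SLOTS_PER_NODE times, one per line, which is the
# plain "one host per rank" machine file layout shown in the post above.
make_machinefile() {
    slots=$1
    shift
    for node in "$@"; do
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "$node"
            i=$((i + 1))
        done
    done
}
```

For instance, `make_machinefile 2 morab001 morab002 > machinefile.morab` reproduces the file above. Assuming the MPI in use is Open MPI (the thread does not say), it could then be passed along the lines of `mpirun -np 4 -hostfile machinefile.morab rhoPisoFoam -parallel`. How ranks are then placed on each host depends on the MPI implementation's mapping policy, so it is worth checking its manual before relying on neighbouring sub-domains sharing a node.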