|
Parallel processing of OpenFOAM cases on multicore processor??? |
|
November 20, 2013, 21:06

#21
New Member
Join Date: Dec 2012
Posts: 19
Rep Power: 13
I'm not sure why you are seeing a longer computation time, but I have a guess:
your longer run time could be due to a poor choice of the number of decomposition subdomains. When the domain is decomposed, the communication between the processes during the parallel run also takes time, which adds to the overall cost. In your case, if you decompose the domain into 3 or 5 parts instead of 4, you should see a different run time, because the communication overhead between the processes will decrease or increase. It is not always more efficient to split the domain into more parts.
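For reference, the number of subdomains is set in system/decomposeParDict; a minimal sketch of the relevant entries (the values here are only an example, adjust them to your case):
Code:
numberOfSubdomains 4;      // try e.g. 3, 4 and 5 and compare the run times

method          simple;

simpleCoeffs
{
    n           (2 2 1);   // the product of these must equal numberOfSubdomains
    delta       0.001;
}
After changing it, re-run decomposePar and then launch the solver with mpirun -np <N> <solver> -parallel.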

May 30, 2015, 04:15

#22
Member
Ali Shamooni
Join Date: Oct 2010
Posts: 44
Rep Power: 16
Quote:
Dear Edmund and Bruno, it seems that the Open MPI rank file cannot address hardware threads; I mean that when you have cores with HT enabled, a rankfile can only list the physical processors. Is there any solution?

Regards,
Ali
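For context, the rankfile format in question looks roughly like this (the host name and slot numbers are just placeholders), where, as far as I understand, the slot=socket:core entries refer to physical cores:
Code:
rank 0=node1 slot=0:0
rank 1=node1 slot=0:1
rank 2=node1 slot=1:0
rank 3=node1 slot=1:1
It is passed to mpirun with the --rankfile option.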

May 30, 2015, 09:20

#23
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Quote:
Beyond that, a very quick search led me to this answer: http://stackoverflow.com/a/11761943
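In case it is useful: with more recent Open MPI versions, an alternative to listing hardware threads in a rankfile is to let the launcher treat them as slots and bind to them directly. A sketch (the exact flags depend on the Open MPI version):
Code:
mpirun --use-hwthread-cpus --bind-to hwthread -np 8 <solver> -parallel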

May 31, 2015, 08:35

#24
Member
Ali Shamooni
Join Date: Oct 2010
Posts: 44
Rep Power: 16
Quote:
Thanks for the quick response, it was helpful. I know that the maximum speedup from HT would be around 10-30% in some cases, when some processors become idle, e.g. in combustion problems; I refer you to the paper "An Empirical Study of Hyper-Threading in High Performance Computing Clusters".

OK, let's forget HT for the moment. I have another question: is there any report of OpenFOAM scalability above 32 processors, like "https://www.hpc.ntnu.no/display/hpc/...mance+on+Vilje", but without InfiniBand communication, i.e. with Ethernet communication among the nodes? The question may seem weird, but let me describe it in more detail. I'm not a pro in computer science, so excuse me for any mistakes.

We have 3 Supermicro servers, each with 2 Intel Xeon E5-2690 CPUs (2*10 cores). I connected them via Ethernet with Cat6 cables and a high-speed switch. The problem is that I can't reproduce the results of that page for the 1M-cell cavity case using 32 processors. The solution on 1 node scales well; however, when increasing to 2 and 3 nodes (40 and 60 processors respectively) there is no substantial speedup.

When I change the problem to a combustion case (PDE + ODE solutions), an interesting behaviour appears: the scalability of the ODE solution part is linear, but the PDE solution time stays the same as in the cavity case. So it occurred to me that this might be a problem of communication among the nodes, since the ODE part doesn't need any synchronisation while the PDEs do.

The conclusion: since the only major difference between my setup and the cluster in that report is the type of interconnect (Ethernet vs InfiniBand), it seems that this is the source of the lack of scalability under otherwise similar conditions. Is that true? Is there any report of a significant speedup using Ethernet communication among the nodes of a cluster?

Regards,
Ali
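For completeness, the runs across the nodes are launched roughly like this (the host names are placeholders for the three servers):
Code:
# "machines" hostfile, one line per node
node1 slots=20
node2 slots=20
node3 slots=20
followed by something like mpirun -np 60 --hostfile machines <solver> -parallel from the case directory.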

May 31, 2015, 18:58

#25
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Ali,
Quote:
http://www.cfd-online.com/Forums/har...tml#post518234 - post #8
Your cluster already falls within the details given in the image, namely that a 1 Gbps connection is not enough to support so many processors.

Best regards,
Bruno
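If you want to confirm what the Ethernet link actually delivers between two of the nodes, a quick check with iperf3 is one option (assuming it is installed on both machines; the host name is a placeholder):
Code:
# on the first node
iperf3 -s

# on the second node
iperf3 -c node1
For parallel CFD, the latency of the interconnect usually matters at least as much as the raw bandwidth, which is where InfiniBand makes the biggest difference.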

October 29, 2015, 07:31

#26
Senior Member
Join Date: Mar 2015
Posts: 250
Rep Power: 12
Quote:
Would you mind explaining this part of your quote in more detail? How can you tell, then, whether it is a CPU cache problem? What should be held in the cache? I can't imagine that even the L3 cache is big enough to hold the whole mesh.

Do you know of a tutorial or description of how to use the hierarchical decomposition method? I searched the user guide and the forum but didn't get a clue.

Best regards,
Kate

October 31, 2015, 09:44

#27
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Kate,
Quote:
Quote:
Best regards,
Bruno

November 2, 2015, 05:28

#28
Senior Member
Join Date: Mar 2015
Posts: 250
Rep Power: 12
Hi Bruno,
I understand your thought process, but what does this mean for a real simulation? The problem is that you can't actually see what is slowing down your parallel simulation, can you? My current procedure on a 2-socket machine, each socket having 6 cores and 3 memory channels, is the following:
1) Run the case in serial to have a reference
2) Run 2 processes on different sockets, core-bound
3) Run 4 processes, 2 on each socket, core-bound
4) The same with 6, 8, 10 and 12 processes
I run these test cases for 10 iterations each (is that enough?), see which one finishes fastest, and go with that configuration for this case. Is there any other method?

Regarding the hierarchical decomposition method: not really. I don't understand what it is supposed to do. A quick example:
Code:
hierarchicalCoeffs
{
    n       ( 3 1 2 );
    delta   0.001;
    order   xyz;
}
which, as far as I understand, should split the domain like this:
Code:
----------------------
I      I      I      I
----------------------
I      I      I      I
----------------------
How does the order of splitting affect the outcome?

Best regards,
Kate

Last edited by KateEisenhower; November 2, 2015 at 05:40. Reason: clarification
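For what it's worth, one way to compare the test runs afterwards is to look at the ExecutionTime line that OpenFOAM solvers print in the log at every time step (assuming each run writes its own log file; the file names here are made up):
Code:
grep "ExecutionTime" log.serial | tail -1
grep "ExecutionTime" log.np02   | tail -1
grep "ExecutionTime" log.np12   | tail -1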

November 2, 2015, 18:34

#29
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128
Hi Kate,
Quote:
Quote:
Quote:
Keep in mind that OpenFOAM technically uses boundary conditions of type "processor" for communicating the data between subdomains. And since small changes in a boundary condition can affect the solution, more or fewer iterations might be needed to reach convergence. These can be either iterations at the level of the matrix solvers (e.g. GAMG) or outer iterations of the application solver (e.g. simpleFoam).
Quote:
To a lesser extent, the other objective is to have the simulation solved in the most efficient way possible, and simultaneously if possible. This can be tested by modifying the "incompressible/icoFoam/cavity" tutorial case to be 3D and then testing the various orders of decomposition. In theory, if all of the subdomains work through their equation matrices in the exact same order in parallel, this should be the optimal way to process the data.

From your ASCII drawing, the efficient way would be to have all 6 processes work from left to right, then one line down and left to right again, within their own subdomains, so that they are working side by side on the same parts of the matrices, at least for each pair of processes. I'm oversimplifying this, but it should become more apparent when testing a 3D cavity case with a uniform mesh and a uniform distribution of cells between the processes. Translating this to a real simulation isn't as straightforward, but it can at least help you reduce the number of tests you need to do when looking for the best decomposition.

For more complex meshes, the usual decomposition to go with is Scotch or Metis, since they use graph partitioning to try to minimise the number of faces needed for communication between subdomains.

Best regards,
Bruno
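For reference, the Scotch method needs nothing more than this in system/decomposeParDict (a sketch; the coefficients block is optional):
Code:
numberOfSubdomains 12;

method          scotch;

// optional, e.g. to weight subdomains on heterogeneous hardware:
// scotchCoeffs
// {
//     processorWeights ( 1 1 1 1 1 1 1 1 1 1 1 1 );
// }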

October 8, 2017, 01:08
Can you help me? Errors appear when I run in parallel, with the command "reconstructPar"

#30
Member
ESI
Join Date: Sep 2017
Posts: 49
Rep Power: 9
Hi, everyone.
I am running a case in parallel in OpenFOAM. When I run the command "reconstructPar -latestTime", errors appear. First, some of the face coordinates in the polyMesh contain a "word" where a number should be. Second, in the p file, symbols such as "^, $, &" appear inside the numbers, as shown here. I hope someone can help me.
thanh.jpg

October 8, 2017, 14:18

#31
New Member
Join Date: Dec 2012
Posts: 19
Rep Power: 13
Quote:
What solver did you use? It appears to me that your mesh has changed during the run; in that case you need to reconstruct the mesh first, and then reconstruct the fields.
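The usual sequence for that is along these lines (a sketch; check the -help output of both utilities, since the available options vary a bit between OpenFOAM versions):
Code:
reconstructParMesh -latestTime   # rebuild the mesh from the processor* directories first
reconstructPar -latestTime       # then rebuild the fields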

November 1, 2017, 10:25

#32
New Member
Join Date: Jul 2017
Posts: 10
Rep Power: 9
Quote:
Hi Edmund,
I tried to do a parallel calculation on two networked PCs, but the simulation does not run any further; it gets stuck as shown below. Please help me find my mistake.
[15:18][tec0683@rue-l020:/disk1/krishna/EinfacheRohre/bendtubeparalle/bendingtube]$ mpirun -np 8 -hostfile machines simpleFoam -parallel
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.1.1                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 2.1.1-221db2718bbb
Exec   : simpleFoam -parallel
Date   : Nov 01 2017
Time   : 15:18:49
Host   : "linxuman"
PID    : 13714

With regards,
Anna