Parallel Processing

July 25, 2003, 20:52	Parallel Processing	#1
bostandoust (Guest)

Hi, I have a cluster consisting of 9 Pentium 4 (2.4 GHz) machines connected by a 100 Mbit/s network. I developed a 2D parallel Navier-Stokes solver with PETSc and METIS, but for some reason I could not gain any performance. I want to know whether it is possible to gain performance with implicit methods over a 100 Mbit/s network. I wonder if you could help me with this matter (for example, by pointing me to some papers on it). This is my email: mbostandoust@yahoo.com. Bye.

July 27, 2003, 11:49	Re: Parallel Processing	#2
andy (Guest)

Yes, it is (probably) possible, although you have not said what algorithm or what size of grid you are using. Both have a big influence on efficiency. To get an appreciation of likely performance, look at the results of the NAS Parallel Benchmarks for similarish machines. Some points:

* Big grids parallelise better than small grids because of their larger volume-to-surface-area ratio. You stand no chance of running a small 2D grid efficiently on such a machine. (However, 15 years ago on a system of 9 Inmos transputers one could achieve over 90% efficiency for implicit ADI line sweeps for the flow in 2D ducts on modest-sized grids. Progress?)

* To get the best efficiency one needs to tune the number and size of messages for the hardware, and this usually requires (mild) algorithmic changes. By using an off-the-shelf solver you are limiting your options somewhat, but PETSc is a big package and may have such parameters (I have never used it).

* As delivered, Ethernet is usually not optimised for low latency. Check with your NIC manufacturer for parameters that improve performance in this respect. Less than 40 µs is something to aim for. Cheap switches and cheap NICs can be a performance problem. Some NICs perform poorly with certain motherboards (checking out your PC hardware performance/compatibility is another disappointing aspect of current computing).

* If you are really keen (desperate?), one can improve latency further by using OS-bypass software. This is reported to get down to 10 µs or so (when it works well), but I have no direct experience.
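
The grid-size and latency points in this reply can be put into rough numbers. The sketch below is a back-of-envelope model, not a measurement. It assumes an n x n grid split into p square subdomains, one halo exchange with four neighbours per sweep, and no overlap of communication with computation. The function name `parallel_efficiency` and every default value (time per cell, unknowns per cell, 8-byte values) are assumptions made for illustration; only the 100 Mbit/s bandwidth and the 40 µs and 10 µs latencies come from the thread. A real implicit solver also needs global reductions per iteration, which this model ignores.

```python
import math

def parallel_efficiency(n, p, t_cell=1e-7, latency=40e-6,
                        bandwidth=100e6 / 8, values_per_cell=3,
                        bytes_per_value=8):
    """Estimated efficiency of one sweep over an n x n grid on p processes.

    All defaults are illustrative assumptions, not measured values.
    bandwidth is in bytes per second (100 Mbit/s = 12.5e6 bytes/s).
    """
    side = n / math.sqrt(p)                  # cells along one subdomain edge
    t_compute = side * side * t_cell         # work grows with subdomain area
    halo_bytes = side * values_per_cell * bytes_per_value
    # Four neighbours, each message costs latency plus transfer time.
    t_comm = 4 * (latency + halo_bytes / bandwidth)
    # Efficiency = T_serial / (p * T_parallel), which reduces to this ratio.
    return t_compute / (t_compute + t_comm)

for n in (100, 300, 1000, 3000):
    e40 = parallel_efficiency(n, 9)
    e10 = parallel_efficiency(n, 9, latency=10e-6)
    print(f"n={n:5d}  efficiency at 40 us: {e40:.2f}  at 10 us: {e10:.2f}")
```

Under these assumptions the communication term grows with the edge length of a subdomain while the compute term grows with its area, so small grids are dominated by message cost and large grids are not. Lower latency helps most where the messages are short.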

July 27, 2003, 13:25	Re: Parallel Processing	#3
bostandoust (Guest)

Hi, thanks for your comments. I developed the code using finite elements with SUPG/PSPG equal-order elements (like collocated methods in finite volume). The most time-consuming part of the code is solving the system of equations in parallel. I tried different solvers, but I reached the point where I have to either solve the system with a direct method or precondition it with LU and use GMRES or BiCGSTAB for the iterative solution (because the system of equations is unsymmetric). Currently there are two good parallel direct solvers, SuperLU_DIST and MUMPS. I used SuperLU_DIST and could not obtain good performance with it. Next week I will try MUMPS. I would welcome more comments or suggestions on this. Bye.
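
For readers unfamiliar with the combination described in this post, here is a minimal serial sketch of a preconditioned Krylov solve on an unsymmetric system. It is not the poster's code and does not use PETSc, SuperLU_DIST or MUMPS. Several things are swapped in as assumptions: an incomplete factorisation, ILU(0), stands in for the full LU preconditioner, a 5-point convection-diffusion stencil stands in for the SUPG/PSPG finite element matrix, and all function names are hypothetical.

```python
import math

def build_matrix(m, c=0.5):
    """Unsymmetric 5-point stencil on an m x m grid, rows as {column: value}."""
    rows = []
    for j in range(m):
        for i in range(m):
            k = j * m + i
            row = {k: 4.0}
            if i > 0:
                row[k - 1] = -1.0 - c      # west neighbour
            if i < m - 1:
                row[k + 1] = -1.0 + c      # east neighbour
            if j > 0:
                row[k - m] = -1.0
            if j < m - 1:
                row[k + m] = -1.0
            rows.append(row)
    return rows

def matvec(rows, x):
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ilu0(rows):
    """Incomplete LU with no fill-in; L (unit diagonal) and U share A's pattern."""
    f = [dict(row) for row in rows]
    for i in range(len(f)):
        for k in sorted(j for j in f[i] if j < i):
            f[i][k] /= f[k][k]
            for j, ukj in f[k].items():
                if j > k and j in f[i]:
                    f[i][j] -= f[i][k] * ukj
    return f

def ilu_solve(f, r):
    """Solve L U z = r by forward then backward substitution."""
    n = len(f)
    y = list(r)
    for i in range(n):
        y[i] -= sum(v * y[j] for j, v in f[i].items() if j < i)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(v * z[j] for j, v in f[i].items() if j > i)) / f[i][i]
    return z

def bicgstab(rows, b, precond=None, tol=1e-8, max_iter=500):
    """Preconditioned BiCGSTAB from a zero initial guess; returns (x, iterations)."""
    apply_m = precond if precond else (lambda r: list(r))
    n = len(b)
    x, r = [0.0] * n, list(b)
    r_hat = list(r)
    rho = alpha = omega = 1.0
    v, p = [0.0] * n, [0.0] * n
    target = tol * math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = apply_m(p)
        v = matvec(rows, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= target:   # converged at the half step
            return [xi + alpha * yi for xi, yi in zip(x, y)], it
        z = apply_m(s)
        t = matvec(rows, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= target:
            return x, it
        rho = rho_new
    return x, max_iter

A = build_matrix(12)
b = [1.0] * len(A)
x_plain, it_plain = bicgstab(A, b)
factors = ilu0(A)
x_ilu, it_ilu = bicgstab(A, b, precond=lambda r: ilu_solve(factors, r))
print("iterations without preconditioner:", it_plain)
print("iterations with ILU(0):", it_ilu)
```

The triangular solves inside the preconditioner are inherently sequential, which is one reason an LU-type preconditioner that works well in serial can be hard to run efficiently across a slow network.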

July 27, 2003, 14:35	Re: Parallel Processing	#4
andy (Guest)

What performance are you referring to? Parallel efficiency or something else?

I have no experience of the parallel efficiency of off-the-shelf solvers, but I would suggest talking to the authors, who are often keen for the results of their labours to be useful to others.
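
The distinction asked about here is easy to state in code. The sketch below uses the usual definitions, speedup as serial time over parallel time and parallel efficiency as speedup divided by the number of processes. The function name and the two timings are made up for illustration and are not results from this thread.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, parallel efficiency) from two wall-clock times."""
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs

# Hypothetical timings in seconds for the same problem on 1 and 9 processes.
s, e = speedup_and_efficiency(120.0, 40.0, 9)
print(f"speedup {s:.2f}, efficiency {e:.0%}")
```

A run can be faster than the serial one and still have poor parallel efficiency, so it helps to say which of the two is meant.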
