MPI parallelisation

April 27, 2007, 10:24 (Andrew)

I'm a beginner at this MPI programming business, and I'm finding it really difficult to understand how to make MPI do anything at all.

Let's say, for example, I have a Fortran loop like this:

```fortran
DO NP=1,NMAX
   IF(yptc(NP) .LT. 0.000) THEN
      S1ptc(NP) = 1000.0
      S3ptc(NP) = 0.0
      S2ptc(NP) = 0.0
   ELSE
      S2ptc(NP) = 8500.0
      S1ptc(NP) = 0.0
      S3ptc(NP) = 0.0
   ENDIF
ENDDO
```

Here NMAX is a large number, around 32 million.

I want to parallelise this using MPI. My code is all running on the master node, so no other processes have any information when this loop is reached. What is the best way to parallelise this loop, and what arguments do I need in my MPI_Bcast (or MPI_Scatter) and MPI_Gather subroutine calls?
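Each pass through this loop reads only yptc(NP) and writes only element NP of S1ptc, S2ptc and S3ptc, so the iterations are independent and any contiguous block of particles can be processed on its own. A minimal C sketch of the loop body written that way (the function name classify_particles is hypothetical, and float is assumed to match the Fortran default REAL):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C version of the loop body above. It handles n consecutive
 * particles starting at y[0], so the same function works on the whole array
 * or on one process's block (pass pointers offset to the block start). */
void classify_particles(const float *y, float *s1, float *s2, float *s3,
                        size_t n)
{
    assert(n == 0 || (y && s1 && s2 && s3));
    for (size_t i = 0; i < n; ++i) {
        if (y[i] < 0.0f) {          /* yptc(NP) .LT. 0.0 */
            s1[i] = 1000.0f;
            s2[i] = 0.0f;
            s3[i] = 0.0f;
        } else {                    /* yptc(NP) >= 0.0, including exactly 0 */
            s1[i] = 0.0f;
            s2[i] = 8500.0f;
            s3[i] = 0.0f;
        }
    }
}
```

Because nothing carries over between iterations, splitting the particles into one block per process needs no communication inside the loop; data only has to move before it (to distribute yptc) and after it (to collect the results).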

April 27, 2007, 13:00 (newt)

People still use MPI ????

April 27, 2007, 14:26 (Dominic)

I don't think it's obsolete. There are lots of codes out there that use MPI effectively, especially parallel linear solvers. What do you use?

April 27, 2007, 16:48 (CFD-GURU)

Many research labs are now using PVM. PVM seems to be the better choice.

April 27, 2007, 17:50 (Mani)

PVM has probably been around as long as MPI, if not longer. Both are used extensively, but none of this answers Andrew's question.

Your intention to "parallelize a loop" makes me suspect that you are thinking of parallelization in terms of OpenMP directives rather than MPI/PVM-style data decomposition. Maybe you should provide some more detail on your application, or you might only get useless comments on PVM vs. MPI, or generic programming advice like this: rather than "parallelizing" a serial code, try to design your program with a philosophy of parallel computing from the ground up. As little work as possible should be performed exclusively by the master, and the processes should be as homogeneous as possible.

April 27, 2007, 17:54 (O.)

If I am correctly informed, PVM has been around since the late eighties, so it is not exactly new.

What makes it the better choice in your opinion?

MPI is newer and the only one of the two that I have ever used. Still, from what I have read, both systems have their advantages and disadvantages. I don't know if this has changed, but as far as I know PVM would not allow a non-blocking send operation.

April 28, 2007, 02:41 (Dominic)

I'm not well versed in Fortran, but here is roughly how I would do it.

Let's assume you have a local quantity called NMAX_LOC, defined as the last (global) index of the array that each process handles. For example, say NMAX = 12 and you use 3 processes (np = 3):

process 0: NMAX_LOC = 4, index range 1:4
process 1: NMAX_LOC = 8, index range 5:8
process 2: NMAX_LOC = 12, index range 9:12
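The index ranges in this example come from first = rank*NMAX/np + 1 and last = (rank+1)*NMAX/np, the same bounds used in the loop below. A small C sketch of that formula (block_range is a hypothetical helper; the ranges are 1-based to match the Fortran arrays):

```c
#include <assert.h>

/* Hypothetical helper: 1-based index range [*first, *last] owned by `rank`
 * out of `np` processes, using first = rank*NMAX/np + 1 and
 * last = (rank+1)*NMAX/np with integer division.
 * The products are formed in long long so they cannot overflow 32 bits. */
void block_range(long long nmax, int np, int rank,
                 long long *first, long long *last)
{
    assert(np > 0 && rank >= 0 && rank < np && nmax >= 0);
    *first = (long long)rank * nmax / np + 1;
    *last  = (long long)(rank + 1) * nmax / np;
}
```

With integer division the blocks differ in size by at most one when NMAX is not a multiple of np; for NMAX = 10 and np = 3 this gives 1:3, 4:6 and 7:10. The 64-bit product matters here: with NMAX around 32 million, (rank+1)*NMAX no longer fits in a 32-bit integer once np exceeds 67.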

This piece of code must be executed by all processes (it is 'collective'), and the arrays S1ptc, S2ptc, S3ptc, etc. are local to each process. Declare additional arrays S1ptc_GLOB, etc. on all processes to hold the combined result; only the copy on process 0 actually receives the gathered data.

This also assumes that yptc is known to all processes, i.e. each process holds the entire yptc array.
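In Andrew's setting, where only the master holds the data, this assumption means broadcasting yptc. Assuming yptc holds N ≈ 32 × 10^6 default REAL values of 4 bytes each (the kind is an assumption), a rough per-process comparison of receiving the whole array with MPI_Bcast versus one block with MPI_Scatter on p processes (p = 4 chosen only for illustration):

```latex
\begin{aligned}
\text{MPI\_Bcast:}\quad   & 4N\ \text{bytes} = 4 \times 32\times10^{6} = 1.28\times10^{8}\ \text{bytes} \approx 128\ \text{MB},\\
\text{MPI\_Scatter:}\quad & \frac{4N}{p}\ \text{bytes} = \frac{1.28\times10^{8}}{4} = 3.2\times10^{7}\ \text{bytes} \approx 32\ \text{MB} \quad (p = 4).
\end{aligned}
```

The same factor of p applies to the amount of data each non-root process has to receive.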

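```fortran
! assumes: use mpi (or include 'mpif.h'); integer :: rank, np, n, nloc, NMAX_LOC, ierr
! np here is the same name as the loop index NP in the original loop (Fortran is
! case-insensitive); rename one of them if both live in the same routine
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)   ! rank of this process (0-based)
call MPI_Comm_size(MPI_COMM_WORLD, np, ierr)     ! number of processes
NMAX_LOC = (rank+1)*NMAX/np                      ! last global index owned by this rank
nloc = 1                                         ! local index into S1ptc, S2ptc, S3ptc
do n = rank*NMAX/np + 1, NMAX_LOC
   IF(yptc(n) .LT. 0.0) THEN
      S1ptc(nloc) = 1000.0
      S2ptc(nloc) = 0.0
      S3ptc(nloc) = 0.0
   ELSE
      S1ptc(nloc) = 0.0
      S2ptc(nloc) = 8500.0
      S3ptc(nloc) = 0.0
   ENDIF
   nloc = nloc + 1                               ! advance in both branches
ENDDO
! Equal counts on every rank: only valid if NMAX is divisible by np.
! MPI_REAL matches default REAL arrays (MPI_FLOAT is the C equivalent).
call MPI_Gather(S1ptc, NMAX/np, MPI_REAL, S1ptc_GLOB, NMAX/np, MPI_REAL, 0, MPI_COMM_WORLD, ierr)
```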

Do the same for S2ptc and S3ptc. This gathers the local S1ptc arrays into the single array S1ptc_GLOB on process 0. The above may not be complete in every respect, but it should give you an idea of how to proceed in detail.

-Dominic
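The MPI_Gather call above passes the same count, NMAX/np, from every process, which only works when NMAX is a multiple of np. When it is not, MPI_Gatherv (and MPI_Scatterv for distributing yptc) take an array of per-process counts plus an array of displacements into the global buffer. A C sketch that builds both arrays from the same block formula (block_counts is a hypothetical helper; displacements are 0-based element offsets, as MPI expects):

```c
#include <assert.h>

/* Hypothetical helper: fill counts[] and displs[] for np ranks so that rank r
 * owns elements displs[r] .. displs[r] + counts[r] - 1 (0-based) of the
 * global array, using the block formula r*nmax/np with integer division. */
void block_counts(int nmax, int np, int *counts, int *displs)
{
    assert(nmax >= 0 && np > 0);
    for (int r = 0; r < np; ++r) {
        long long lo = (long long)r * nmax / np;        /* first element */
        long long hi = (long long)(r + 1) * nmax / np;  /* one past the last */
        displs[r] = (int)lo;
        counts[r] = (int)(hi - lo);
    }
}

/* Typical use on each rank (not compiled here; needs mpi.h):
 *   MPI_Scatterv(y_glob, counts, displs, MPI_FLOAT,
 *                y_loc, counts[rank], MPI_FLOAT, 0, MPI_COMM_WORLD);
 *   MPI_Gatherv(s1_loc, counts[rank], MPI_FLOAT,
 *               s1_glob, counts, displs, MPI_FLOAT, 0, MPI_COMM_WORLD);
 */
```

Scattering yptc this way also means each process only ever stores its own block; in Fortran the datatype would be MPI_REAL rather than MPI_FLOAT.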

April 28, 2007, 08:37 (Pagey)

Alas, I have neither the time nor the resources to design a completely new Large Eddy Simulation code from the ground up, so I am making do with what I have.

I am performing a Monte Carlo simulation for Filtered Density Function (FDF) closure. With the FDF incorporated, the computational cost per time step increases by an order of magnitude, as I have some 30 million particles floating around in the computational domain. It is therefore important that I parallelise these particular routines using MPI, as the computational resources available to me do not support SMP-based parallelism.

April 28, 2007, 11:41 (Andrew)

Thanks for your help, I'll get round to trying this out next week!

April 30, 2007, 10:25 (Mani)

Yup, some applications are more readily parallelized than others. I suppose Monte Carlo must be a dream application for any parallel programmer. Not to say it's easy...

April 30, 2007, 11:59 (jojo)

According to some users, PVM is preferable to MPI on a highly heterogeneous cluster; otherwise, MPI is more powerful. I have never tested this myself.

May 3, 2007, 20:04 (Andrew)

Well, after a few days of head-scratching and, to be honest, swearing, I've just about managed to get my code to work in parallel with MPI.

I went for a pseudo domain decomposition in the end, and I get a speed-up of almost 4x when using 4 processors, so I'm reasonably happy with the outcome. Now to iron out the final few bugs...

Thanks for your help, everyone.

