#1
New Member
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 8
We have a serial 3D unstructured-grid code which we are trying to parallelize using MPI. For cases involving shocks we have convergence problems when using a 2nd-order upwind scheme: the code blows up.
We have tried communicating the higher-order reconstructed values at the interface faces, but the parallel code shows oscillations along the shock that are not present in the serial code. Can anyone suggest a possible solution to this issue? Thanks.
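For reference, the kind of exchange we mean is sketched below. This is only an illustration, not our actual code; the buffer layout, counts and routine names are placeholders.

```c
/* Illustrative halo exchange of reconstructed interface-face states.
 * Buffer layout, counts and names are placeholders, not the actual code. */
#include <mpi.h>
#include <stdlib.h>

/* send_buf[p]/recv_buf[p]: packed face states shared with neighbour rank p */
void exchange_face_states(int n_neighbors, const int *nbr_rank,
                          const int *n_send, double **send_buf,
                          const int *n_recv, double **recv_buf,
                          int n_vars, MPI_Comm comm)
{
    MPI_Request *req = malloc(2 * n_neighbors * sizeof(MPI_Request));

    /* post all receives first, then the matching sends */
    for (int p = 0; p < n_neighbors; ++p)
        MPI_Irecv(recv_buf[p], n_recv[p] * n_vars, MPI_DOUBLE,
                  nbr_rank[p], 0, comm, &req[p]);
    for (int p = 0; p < n_neighbors; ++p)
        MPI_Isend(send_buf[p], n_send[p] * n_vars, MPI_DOUBLE,
                  nbr_rank[p], 0, comm, &req[n_neighbors + p]);

    MPI_Waitall(2 * n_neighbors, req, MPI_STATUSES_IGNORE);
    free(req);
}
```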
#2
Senior Member
Selig
Join Date: Jul 2016
Posts: 213
Rep Power: 11
When you say "blow up", are you referring to numerical blow-up or to your simulation crashing due to memory leaks? If it is numerical blow-up, I would run the serial code under the exact same conditions to make sure it is not an implementation issue. For debugging MPI codes I would use TotalView; it is pretty much the standard.
#3
New Member
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 8
The blow-up is a numerical blow-up. The parallel code converges for a few time-step iterations and then diverges. We have set the same conditions in both cases. The first-order accurate simulations converge well; the issue is with the implementation of the 2nd-order accurate convective scheme.
#4
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17
Did you first check, at first order, that you get exactly the same results (up to machine accuracy) between the serial and parallel codes? You did not mention it.
If so, then on to second order. You mention that you have a communication point for the limited reconstructed states, I guess prior to the Riemann solver. How do you compute the reconstructed states? Assuming that you first compute the gradients with Green-Gauss or least-squares, both methods need average values from a neighbourhood. Did you communicate these average values prior to the gradient computation? Depending on your implementation, the same remark may hold for the limiter; for example, the BJ (Barth-Jespersen) limiter needs the min and max over a neighbourhood.
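In other words, the usual ordering within one second-order residual evaluation is roughly the following. This is a compile-only sketch; all routine and field names are placeholders, not a specific code's API.

```c
/* Ordering of halo exchanges within one second-order residual evaluation.
 * All names are placeholders; only the ordering of the calls matters. */
typedef struct Field Field;            /* opaque handle for a cell-based field */

void exchange_ghost_cells(Field *f);   /* MPI halo exchange for one field      */
void compute_gradients(void);          /* Green-Gauss or least-squares         */
void compute_limiter_min_max(void);    /* min/max over the full stencil        */
void reconstruct_face_states(void);    /* limited extrapolation to faces       */
void riemann_fluxes(void);             /* upwind fluxes, incl. partition faces */

void second_order_residual(Field *avg, Field *grad, Field *lim)
{
    exchange_ghost_cells(avg);   /* 1. cell averages into ghost cells          */
    compute_gradients();         /* 2. gradients need the neighbour averages   */
    exchange_ghost_cells(grad);  /* 3. ghost-cell gradients                    */
    compute_limiter_min_max();   /* 4. limiter min/max, incl. off-rank cells   */
    exchange_ghost_cells(lim);   /* 5. ghost-cell limiter values               */
    reconstruct_face_states();   /* 6. limited reconstruction at the faces     */
    riemann_fluxes();            /* 7. Riemann solver on interior + interface  */
}
```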
#5
Senior Member
Join Date: Dec 2017
Posts: 153
Rep Power: 8
Hi,
As people here are saying, you have a bug in the parallel code. Trust me, I have done the same job with my code, and the results of the two codes must be the same. Since for second order you need the upwind and the downwind cells, have you checked that the ghost nodes are exchanged correctly? Naturally this is just a guess!
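A quick way to check this is to exchange a field whose correct ghost values are known in advance, for example the global cell IDs. The sketch below assumes an exchange routine and connectivity arrays whose names are purely illustrative.

```c
/* Sanity check for the ghost-cell exchange: fill a test field of the owned
 * cells with their global IDs, run the usual exchange, and verify that every
 * ghost cell received the ID of the cell it mirrors. Names are placeholders. */
#include <math.h>
#include <stdio.h>

extern int     n_owned, n_ghost;      /* local cell counts                     */
extern long   *global_id;             /* global ID of each owned cell          */
extern long   *ghost_global_id;       /* global ID each ghost cell should hold */
extern double *test_field;            /* size n_owned + n_ghost                */
void exchange_ghost_cells_double(double *f);

int check_ghost_exchange(void)
{
    for (int c = 0; c < n_owned; ++c)
        test_field[c] = (double)global_id[c];

    exchange_ghost_cells_double(test_field);

    int n_bad = 0;
    for (int g = 0; g < n_ghost; ++g) {
        if (fabs(test_field[n_owned + g] - (double)ghost_global_id[g]) > 0.5) {
            printf("ghost %d wrong: got %g, expected %ld\n",
                   g, test_field[n_owned + g], ghost_global_id[g]);
            ++n_bad;
        }
    }
    return n_bad;   /* 0 means every ghost maps to the right owner cell */
}
```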
#6
New Member
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 8
Thanks for the replies.
I would like to clarify that we get a maximum error of about 3.47 x 10^-3 percent in the first-order calculations when using 16 processors. I have attached images of the results for both the 1st-order and 2nd-order cases. Is this fine? The code uses implicit Jacobians solved with the LU-SGS method, and we have not modified the Jacobian part for parallel computations so far.
Regarding the communication of reconstructed values: once we obtain the solution at a new time step we communicate the solution at the interface cells, and use these values to form the averages of the variables. These are then used to calculate the Green-Gauss gradient at all cell centres. We then calculate the limited reconstructed values at the faces, which are communicated prior to solving for the solution at the next time step. We use the Venkatakrishnan limiter, and we calculate the minimum and maximum across the neighbourhood including the neighbour cells in other zones as well, which does not change the result.
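For completeness, the limiter function we refer to is the standard Venkatakrishnan form from the literature; a stand-alone sketch of the scalar function for one cell/face pair is shown below (the loop over faces and the neighbourhood min/max are not shown, and the names are illustrative).

```c
/* Standard Venkatakrishnan limiter function for one cell/face pair.
 * d1 = u_max - u_i when d2 > 0, or u_min - u_i when d2 < 0
 * d2 = grad(u) . (x_face - x_cell), the unlimited extrapolation increment
 * h  = characteristic cell size, K = Venkatakrishnan constant            */
#include <math.h>

double venkat_phi(double d1, double d2, double h, double K)
{
    if (d2 == 0.0)
        return 1.0;                   /* no extrapolation, nothing to limit  */

    double eps2 = pow(K * h, 3.0);    /* smoothing threshold eps^2 = (K h)^3 */
    double num  = (d1 * d1 + eps2) * d2 + 2.0 * d2 * d2 * d1;
    double den  = d1 * d1 + 2.0 * d2 * d2 + d1 * d2 + eps2;
    return (num / den) / d2;
}
```

The limiter of a cell is then the minimum of this value over all its faces, with u_max and u_min taken over a neighbourhood that includes ghost cells owned by other ranks, as described above.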
#7
Senior Member
Join Date: Dec 2017
Posts: 153
Rep Power: 8
Hi,
Unfortunately, a bug in MPI code is very tedious to find. In my case, for example, I plotted a contour of the central coefficient of the matrices for the serial and the parallel versions, and did the same thing for the RHS. Then I looked at these results together. With this method you can see immediately whether your equations are the same, and I mean exactly the same to the 16th decimal digit.
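One simple way to set up such a comparison is to dump, for every owned cell, its global ID together with the diagonal coefficient and the RHS to a per-rank file, then compare against the serial run. The sketch below uses placeholder array names.

```c
/* Dump global cell ID, central (diagonal) coefficient and RHS per owned cell
 * to a per-rank text file for serial/parallel comparison. Names are placeholders. */
#include <stdio.h>
#include <mpi.h>

void dump_diag_and_rhs(int n_owned, const long *global_id,
                       const double *diag, const double *rhs)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char name[64];
    snprintf(name, sizeof(name), "diag_rhs_rank%04d.txt", rank);

    FILE *fp = fopen(name, "w");
    for (int c = 0; c < n_owned; ++c)
        fprintf(fp, "%ld %.16e %.16e\n", global_id[c], diag[c], rhs[c]);
    fclose(fp);
}
```

Concatenating the per-rank files and sorting them by global ID gives a file that can be diffed directly against the serial dump.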
#8
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17
Your first-order results are in my opinion NOT fine. Even if they look almost the same on the graphs, the error is much higher than machine precision.
So before switching to second order, you should first fix this at first order. Can you switch your temporal scheme to a simple explicit time integrator, like a 1st-order Euler scheme, for testing purposes? If that works, then add the new ingredients one at a time: implicit, 2nd order, etc.
Edit: you say you fill the ghost cells at the END of the time step. Is the initial solution communicated before the first time step?
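To make the last point concrete: a bare-bones explicit Euler driver needs a halo exchange before the very first residual evaluation, something like the skeleton below (routine and array names are placeholders).

```c
/* Minimal explicit-Euler driver skeleton. The key point is the exchange
 * before the first residual evaluation: ghosts must hold the initial
 * solution, not uninitialised memory. All names are placeholders. */
void exchange_ghost_solution(void);   /* halo exchange of conserved variables */
void compute_residual(double *res);   /* fluxes -> residual per owned cell    */

void explicit_euler(double *u, double *res, const double *dt_over_vol,
                    int n_owned, int n_vars, int n_steps)
{
    exchange_ghost_solution();                      /* ghosts for step 1       */

    for (int step = 0; step < n_steps; ++step) {
        compute_residual(res);
        for (int c = 0; c < n_owned; ++c)
            for (int v = 0; v < n_vars; ++v)        /* u^{n+1} = u^n - dt/V R  */
                u[c * n_vars + v] -= dt_over_vol[c] * res[c * n_vars + v];
        exchange_ghost_solution();                  /* ghosts now hold u^{n+1} */
    }
}
```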
#9
New Member
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 8
Dear naffrancois,
As per your suggestion, I tried running the same case using explicit Euler time stepping, and it gives an exact match (13 decimal places) for the first-order calculations. Please see the attached results. But once I use the 2nd-order limiter, it gives a result similar to the implicit code, even with explicit Euler time stepping. I also ran the explicit code with the limiter fixed at a value of 1.0, which gives a match up to 11-12 decimal places, suggesting that the error is not in the gradient calculations.
#10
New Member
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 8
We have finally fixed a bug in the 2nd-order convective flux limiter calculation, so the serial and parallel results with explicit Euler time stepping now match to machine precision.
Now we have a slight difference between the serial and parallel results for implicit time stepping with the LU-SGS solver for the same case. Could this be because of a mismatch in the LU-SGS calculation caused by the decomposition of the computational domain in the parallel code? Kindly suggest a way to fix it.
#11
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17
Well, you have made a big step in finding the bug in the explicit solver. I am not familiar with LU-SGS time stepping, but clearly extra care has to be taken. Not long ago I parallelized an unstructured implicit scheme using a 1st-order-in-space Jacobian with GMRES, and I did it this way: each CPU owns a submatrix of size Nelem x Nelem_overlap, like an augmented matrix. In Krylov linear solvers, the typical operations are matrix-vector products with the Jacobian of the system Jac*x = b, and scalar products. Communication of the x vector is then necessary prior to the matrix-vector product, and a sum reduction (MPI_ALLREDUCE) is needed for the scalar products.
I don't know how this applies to LU-SGS; maybe you forgot contributions coming from ghost cells in the upper part? By the way, I assume you are testing LU-SGS at 1st order.
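Those two parallel kernels look roughly like the sketch below on a partitioned mesh; the CSR-type storage and the exchange routine are assumptions, not anyone's actual code.

```c
/* Parallel kernels for a Krylov solver on a partitioned mesh: a matrix-vector
 * product preceded by a halo exchange of x, and a global scalar product via
 * MPI_Allreduce. Storage layout and names are placeholders. */
#include <mpi.h>

void exchange_ghost_values(double *x);   /* fills x[n_owned .. n_owned+n_ghost-1] */

/* y = A*x for the rows owned by this rank; columns may refer to ghost entries. */
void parallel_matvec(int n_owned, const int *row_ptr, const int *col_idx,
                     const double *val, double *x, double *y)
{
    exchange_ghost_values(x);            /* neighbours' x values first */
    for (int i = 0; i < n_owned; ++i) {
        double s = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            s += val[k] * x[col_idx[k]];
        y[i] = s;
    }
}

/* Global dot product over owned entries only (ghosts excluded to avoid
 * double counting), summed across all ranks. */
double parallel_dot(int n_owned, const double *a, const double *b)
{
    double local = 0.0, global = 0.0;
    for (int i = 0; i < n_owned; ++i)
        local += a[i] * b[i];
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;
}
```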
#12
New Member
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 8
Right now we are not using any MPI functions inside the LU-SGS solver. We are working out a possible way to account for the missing neighbour-cell information in the LU-SGS matrix on each process.
#13
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17
I would say that as long as you have off-diagonal blocks in your matrix, i.e. it is not a block-diagonal matrix, you have neighbour information for each element. Then at some point in your LU-SGS solver you will need MPI calls to synchronize values between the partitions. Any time you need a value coming from a neighbour element, you have to ask yourself whether that value may belong to a different CPU.
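As a minimal illustration of that principle (not of LU-SGS itself), here is a scalar Jacobi-type relaxation on a partitioned mesh, where every sweep first refreshes the off-rank neighbour unknowns; the LU-SGS forward and backward sweeps need the same kind of synchronization wherever an off-diagonal block points at a cell owned by another rank. All names and the storage layout are assumptions.

```c
/* Scalar Jacobi relaxation on a partitioned mesh: before each sweep the ghost
 * tail of x is refreshed so that off-diagonal terms referring to cells owned
 * by other ranks use up-to-date values. Names are placeholders. */
void exchange_ghost_values(double *x);   /* fills x[n_owned .. n_owned+n_ghost-1] */

/* Approximately solve diag_i*x_i + sum_j offdiag_ij*x_j = b_i by Jacobi sweeps.
 * x carries a ghost tail, so col_idx may point at cells owned by other ranks. */
void jacobi_sweeps(int n_owned, int n_sweeps,
                   const int *row_ptr, const int *col_idx,   /* off-diagonal CSR */
                   const double *offdiag, const double *diag,
                   const double *b, double *x, double *x_new)
{
    for (int s = 0; s < n_sweeps; ++s) {
        exchange_ghost_values(x);             /* neighbour unknowns, incl. off-rank */
        for (int i = 0; i < n_owned; ++i) {
            double r = b[i];
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                r -= offdiag[k] * x[col_idx[k]];   /* col_idx may be a ghost index */
            x_new[i] = r / diag[i];
        }
        for (int i = 0; i < n_owned; ++i)
            x[i] = x_new[i];
    }
}
```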
#14
Senior Member
Join Date: Dec 2017
Posts: 153
Rep Power: 8
Quote: