Ram, cache and cpu upgrade help

January 23, 2007, 11:22 (zonexo, #1)

Hi,

I'm writing my own NS code and simulating 2D unsteady flow on grids of about 200-350 x 80-160 points. A simulation takes half a day or more, depending on the number of grid points used. I'm considering upgrading my current AMD Athlon 2400+ @ 2.2 GHz to the new Intel Core 2 Duo. However, I have a couple of questions...

1. How can I estimate the amount of RAM needed? I wonder if 1 GB is enough, or do I need 2 GB?

2. I read in the forum people saying that L2 cache is important in matrix computation. Is that so? I ask because I'm thinking of getting a cheaper CPU with 2 MB instead of 4 MB of L2 cache.

3. Will I be able to take advantage of the dual-core CPU? I'm now using MPI to parallelize my code.

4. If I get the E4300 or E6300, should I expect my run time to reduce by half or more?

Thanks!

January 23, 2007, 14:25 (rt, #2)

Estimating the required memory is very simple: if we take 200 bytes per grid point (an approximate upper bound), you need 350*160*200 bytes = 11 MB. That is low; in fact, for 2D we usually don't need much memory. If your OS takes 200 MB, 500 MB of RAM is sufficient.
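The estimate above can be reproduced with a few lines of Python. This is only a rough sketch: the function name `estimate_memory_mb` is made up for illustration, and the 200 bytes per grid point is the upper-bound guess from the post, not a measured figure.

```python
def estimate_memory_mb(nx, ny, bytes_per_point=200):
    """Rough solver memory in MB (10**6 bytes) for an nx-by-ny grid.

    bytes_per_point=200 is the assumed upper bound quoted in the post.
    """
    return nx * ny * bytes_per_point / 1e6


# Smallest and largest grids mentioned in the thread.
for nx, ny in [(200, 80), (350, 160)]:
    print(f"{nx} x {ny}: {estimate_memory_mb(nx, ny):.1f} MB")
```

Even the largest grid stays far below 1 GB, so under this assumption RAM size is not the limiting factor here.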

In my opinion, upgrading your system won't have a considerable effect for you (your system is state-of-the-art).

AMD vs. Intel: note that L2 matters, but only after L1, and AMD CPUs usually have a larger L1 (128 KB). A bigger L2 helps cache coherency and reduces cache misses, but if you use a cache-aware algorithm you can get better performance. For more on cache issues, read the special issue of Int. J. High Perf. Computing, which fortunately is freely available online: http://hpc.sagepub.com/content/vol18/issue1/

or read this: http://casper.cs.yale.edu/mgnet/www/...u/thesis.ps.gz

>Will I be able to take advantage of the dual core cpu?
>I'm now using MPI to parallelize my code

Yes, if you run at least 2 jobs. Also, if you run your code in parallel on your currently available CPU, I guess you will get better performance, due to multithreading and fewer cache misses (try this).

Finally, to reduce CPU time I suggest you concentrate on your code (improving the linear solver, or converting from explicit to implicit). Half a day is very long for a grid of about 40K points, especially if your grid is structured.

Hope this helps.

January 23, 2007, 20:29 (zonexo, #3)

Hi,

thanks for your comments rt.

I don't think my system is state of the art any more, and the thing is that I need to run many different cases, hence the need to improve my computing speed.

I need to determine how much improvement I can get for the money spent.

Btw, I am running a moving airfoil with a deforming grid at around Re~10^4. If CFL=1, my time step is 0.8 to 1e-4. Hence I need many time steps. My current linear solver is PETSc, which uses Krylov methods. It should be one of the fastest linear solvers around. An implicit method is used. Lastly, I'm now running on a single processor of a 3.06 GHz Xeon on my school's server.

So is it still considered slow?

Thanks

January 23, 2007, 21:55 (Renato, #4)

If you've been using PETSc, you could try an inexact Newton-Krylov solver and enlarge the linear tolerance. I'm almost sure that PETSc has inexact Newton solvers. In inexact-Newton-type methods you can use large linear tolerances (up to 0.99) and let the algorithm adapt the tolerance according to the solution convergence.
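To make the idea of an adaptive linear tolerance concrete, here is a minimal Python sketch of one common rule of this kind (an Eisenstat-Walker-style forcing term). It is an illustration only, not PETSc's implementation: the function name and the default values of `gamma`, `alpha` and `eta_min` are assumptions, and only the 0.99 upper limit comes from the post.

```python
def next_linear_tolerance(res_norm, prev_res_norm, gamma=0.9, alpha=2.0,
                          eta_min=1e-4, eta_max=0.99):
    """Relative tolerance for the next linear solve.

    res_norm and prev_res_norm are the nonlinear residual norms of the
    current and previous outer iterations (prev_res_norm must be > 0).
    Slow outer convergence gives a loose tolerance, fast convergence a
    tight one; the result is clipped to [eta_min, eta_max].
    """
    eta = gamma * (res_norm / prev_res_norm) ** alpha
    return min(eta_max, max(eta_min, eta))


# Hypothetical residual history of the outer (nonlinear) iteration.
history = [1.0, 0.9, 0.5, 0.05]
for prev, cur in zip(history, history[1:]):
    print(f"residual {cur:g}: next linear tolerance {next_linear_tolerance(cur, prev):.3g}")
```

The point of the rule is to avoid over-solving the linear system while the outer iteration is still far from convergence.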

By the way, is your solver fully coupled (velocity and pressure solved in the same linear system)? In that case, if you're using GMRES, you could enlarge the number of Krylov vectors (let's say 25, 35, 45).

I hope this helps.

Regards

Renato.

January 24, 2007, 02:10 (rt, #5)

What type of grid do you have? What is your solver (which Krylov method)? Does it have nonlinear iterations (inner-outer iteration)?

>My current linear solver is PETSc, It should be one of the fastest linear solvers around

That is not correct. PETSc is a general-purpose solver suited to large-scale problems on massively parallel clusters, not to your very small problem on a single CPU.

Note that your problem is large-scale in time, not in space, so parallelism (based on domain decomposition) won't help you very much.

Especially if your grid is structured, PETSc is a poor choice; on a structured grid it is possible to implement a faster Krylov or multigrid solver.

January 24, 2007, 03:57 (zonexo, #6)

Is that so?

Well, I'm quite a novice with regard to which solver to use.

Yes, I'm using structured grid (c-grid). I've thought of using a multigrid solver but I can't find a freely available and suitable one.

I'm now using either BiCGSTAB (biconjugate gradient stabilized) or GMRES with a preconditioner. In the past, I've used some other solver packages such as SPARSKIT and NSPCG, but they were even slower than PETSc.

In that case, do you know of any faster Krylov or multigrid solver which I can use? I wouldn't want to go into learning and writing a multigrid solver from scratch.

Thanks rt!

January 24, 2007, 05:11 (andy, #7)

The big question to ask when selecting hardware for CFD simulation is how much memory the solver uses for your real CFD problems; ignore the large amount of marketing based on the performance of small in-cache test cases. The reason can be seen in the linpack plots here:

http://techreport.com/reviews/2003q2...2/index.x?pg=3

where the large effect of overflowing the cache (or two caches) can be seen. Note that most real CFD problems will have typical matrix sizes off the plot to the right. Your 2D case may not.

In addition, features on the motherboards can sometimes double or, in some cases, quadruple the performance of accessing main memory. Finding out about this and paying the extra for the motherboard can be wise.

Note that clock speed has little effect on performance when operating with large matrices which is good news because low clock speed chips are usually much cheaper.

It is possible to reorganise the way solvers work to limit the damage caused by overflowing the cache. This is not usually done in general-purpose Fortran/C/C++ solvers, but it is often done in optimised low-level matrix libraries.

January 24, 2007, 05:44 (rt, #8)

Note that general-purpose sparse solvers assume a general sparsity pattern (suitable for unstructured grids), but for a structured grid the sparsity pattern is completely known. I guess your stencil is five-point, so the locations of the neighbours are completely known. Implementing a solver for such a known, regular sparsity pattern is very simple, and it is much more efficient than the general ones. In my experience, ICCG was 10-20 times faster than SPARSKIT.
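The following Python sketch illustrates what "known sparsity pattern" means in practice: the matrix-vector product for a five-point stencil can be written without storing the matrix or any index arrays, because each neighbour sits at a fixed offset. It assumes a model operator (negative Laplacian, unit grid spacing, zero Dirichlet boundaries) and a hypothetical function name; it is not the poster's solver.

```python
def apply_five_point(u, nx, ny):
    """Return A*u for the 5-point negative Laplacian on an nx-by-ny grid.

    u is a flat list of length nx*ny with index k = i + j*nx.
    Neighbours outside the grid are treated as zero (Dirichlet).
    """
    out = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            k = i + j * nx
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - 1]       # west neighbour
            if i < nx - 1:
                s -= u[k + 1]       # east neighbour
            if j > 0:
                s -= u[k - nx]      # south neighbour
            if j < ny - 1:
                s -= u[k + nx]      # north neighbour
            out[k] = s
    return out


print(apply_five_point([1.0] * 9, 3, 3))
```

A general sparse format would instead look up column indices for every nonzero, which costs extra memory traffic; a nine-point stencil only adds the four diagonal offsets `k - nx - 1`, `k - nx + 1`, `k + nx - 1` and `k + nx + 1`.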

You don't want to implement it yourself, so I suggest using Peric's codes, which are freely available from the Springer FTP site. They include a directory of solvers (in FORTRAN 77) for structured grids (from GS, SIP and BiCGSTAB to MG and MG with CG preconditioning). They are very easy to learn and use.

FAS multigrid is also useful and its implementation is very simple: e.g. solve the problem completely on the coarse grid, then interpolate to the finer grid (as an initial guess), and continue the sequence.
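The coarse-to-fine start described above (often called nested iteration) can be sketched in Python on a 1D model problem, -u'' = 1 on (0, 1) with u(0) = u(1) = 0. This is not a full FAS cycle and not the poster's code: the helper names, the Gauss-Seidel smoother and the sweep counts are all assumptions chosen for illustration.

```python
def gauss_seidel(u, h, sweeps):
    """In-place Gauss-Seidel sweeps for -u'' = 1 with fixed end values."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h)
    return u


def prolong(coarse):
    """Linear interpolation from n intervals to 2n intervals."""
    fine = [0.0] * (2 * (len(coarse) - 1) + 1)
    for i, value in enumerate(coarse):
        fine[2 * i] = value                 # copy coincident points
    for i in range(1, len(fine), 2):
        fine[i] = 0.5 * (fine[i - 1] + fine[i + 1])  # fill midpoints
    return fine


def nested_iteration(n_coarse, levels, sweeps_per_level):
    """Solve on the coarsest grid, then refine `levels` times."""
    n = n_coarse
    u = gauss_seidel([0.0] * (n + 1), 1.0 / n, 200)  # coarse solve to convergence
    for _ in range(levels):
        u = prolong(u)                               # initial guess on finer grid
        n *= 2
        gauss_seidel(u, 1.0 / n, sweeps_per_level)
    return u


def max_error(u):
    """Maximum nodal error against the exact solution x*(1 - x)/2."""
    n = len(u) - 1
    return max(abs(u[i] - 0.5 * (i / n) * (1.0 - i / n)) for i in range(n + 1))


nested = nested_iteration(n_coarse=4, levels=2, sweeps_per_level=20)
cold = gauss_seidel([0.0] * 17, 1.0 / 16, 20)        # same fine grid, zero start
print(max_error(nested), max_error(cold))
```

With the same number of sweeps on the finest grid, the interpolated initial guess ends up much closer to the exact solution than the zero start, because the smooth part of the solution has already been captured on the cheap coarse grids.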

But what is your system (pressure Poisson, velocity, or coupled pressure-velocity in block format)? Do you use nonlinear iterations? And what is your solution method?

January 24, 2007, 07:40 (zonexo, #9)

Thanks rt!

I am using the pressure correction method, which involves solving a momentum eqn followed by a Poisson eqn to ensure continuity.

The eqns are linearized, so it's basically a system of linear eqns.

Btw, my stencil is 9 pt and that's why I can't use most of the available solvers. Moreover, the diagonal arrangement is not valid at the cell locations where the faces meet (c-grid intersection).

In PETSc, the eqns are solved using BiCGSTAB or GMRES.

Thank you

January 24, 2007, 08:15 (rt, #10)

Do you have incompressible flow?

Your CFL is 1 and Re=10^4. I suggest decreasing the CFL to 0.5 and treating the momentum equations explicitly, with only the pressure equation treated implicitly. Also, don't perform any nonlinear iterations. As the pressure equation is usually SPD, conjugate gradient with an incomplete Cholesky preconditioner is the best choice.
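As an illustration of the conjugate gradient iteration for an SPD system, here is a minimal pure-Python sketch on a model problem. Two simplifications are assumed and should be kept in mind: the incomplete Cholesky preconditioner is left out (this is plain CG, not ICCG), and the operator is a five-point Dirichlet Poisson stencil with unit spacing rather than the poster's actual pressure equation. All function names are hypothetical.

```python
def apply_poisson(u, nx, ny):
    """A*u for the 5-point negative Laplacian (zero Dirichlet boundaries)."""
    out = []
    for j in range(ny):
        for i in range(nx):
            k = i + j * nx
            s = 4.0 * u[k]
            s -= u[k - 1] if i > 0 else 0.0
            s -= u[k + 1] if i < nx - 1 else 0.0
            s -= u[k - nx] if j > 0 else 0.0
            s -= u[k + nx] if j < ny - 1 else 0.0
            out.append(s)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(b, nx, ny, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    p = list(r)                      # first search direction
    rr = dot(r, r)
    b_norm = rr ** 0.5
    if b_norm == 0.0:
        return x, 0
    for it in range(1, max_iter + 1):
        ap = apply_poisson(p, nx, ny)
        alpha = rr / dot(p, ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        if rr_new ** 0.5 <= tol * b_norm:
            return x, it
        beta = rr_new / rr           # makes the next direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


nx = ny = 8
x_true = [float(k % 5) for k in range(nx * ny)]
b = apply_poisson(x_true, nx, ny)
x, iterations = conjugate_gradient(b, nx, ny)
print(iterations, max(abs(a - c) for a, c in zip(x, x_true)))
```

A preconditioner such as incomplete Cholesky would be applied to the residual before each direction update; it reduces the iteration count but does not change the structure of the loop.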

Also, adapting 5-point stencil solvers to a 9-point stencil is simple.

>>the diagonal arrangement is not valid

What do you mean by "diagonal arrangement"?

January 24, 2007, 08:35 (rt, #11)

Regarding: >>the diagonal arrangement is not valid

OK, this limits the use of a structured solver, but I suggest a cure (I have not tried it myself).

You can treat the row of cells that falls below/above the branch cut as a Dirichlet BC and move them to the RHS, but update them every few iterations (e.g. every iteration). In this manner you solve a sequence of linear systems to reach a converged pressure field, but each of them is solved to a loose tolerance (in effect, the solution of a weakly nonlinear equation).

I think it is cheaper than using an unstructured solver.

January 24, 2007, 08:52 (raju, #12)

>In my experience with ICCG it was 10-20 times faster than using SPARSKIT

Can ICCG be used for solving Navier-Stokes problems? Usually the linear systems of equations are unsymmetric because of boundary conditions. I guess ICCG is used for symmetric systems only (am I correct?).

I would like to know more details about this implementation. Is it applied to a fully coupled velocity-pressure system or to a segregated approach (like SIMPLE etc.)?

January 24, 2007, 09:00 (rt, #13)

>>I guess ICCG is used for symmetric systems only (am I correct?)

You are right.

My method was fractional step (sometimes called two-step projection): explicit treatment of momentum, and only the pressure Poisson equation solved implicitly to enforce incompressibility. As the pressure equation is symmetric, ICCG can be applied.
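For readers unfamiliar with the fractional step idea, one common textbook form is sketched below. It assumes constant density and first-order forward Euler time stepping, which the post does not specify, so treat it as an illustration of the two-step structure rather than the poster's exact scheme.

```latex
% Step 1: explicit predictor (convection and diffusion at time level n)
\mathbf{u}^{*} = \mathbf{u}^{n}
  + \Delta t \left[ -(\mathbf{u}^{n} \cdot \nabla)\,\mathbf{u}^{n}
  + \nu \nabla^{2} \mathbf{u}^{n} \right]

% Step 2: implicit pressure Poisson equation (the only linear solve)
\nabla^{2} p^{n+1} = \frac{\rho}{\Delta t}\, \nabla \cdot \mathbf{u}^{*}

% Step 3: projection onto a divergence-free velocity field
\mathbf{u}^{n+1} = \mathbf{u}^{*} - \frac{\Delta t}{\rho}\, \nabla p^{n+1}
```

Taking the divergence of step 3 and requiring that the new velocity be divergence-free gives step 2, which is why only the symmetric pressure system has to be solved each time step.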

January 24, 2007, 10:01 (zonexo, #14)

Hi rt,

Are you using explicit treatment for both convection and diffusion? I'm using implicit treatment for both. Maybe that's also one of the reasons why my solver seems slow to you. I tried explicit treatment for convection, but I had to lower my CFL number. Moreover, it was not as stable.

However, I'll try to look at the recommendation you have given. I believe as you said, a structured solver will be faster.

January 24, 2007, 10:05 (rt, #15)

Both my diffusion and convection terms are treated explicitly (your Reynolds number is high, so the viscous term has a negligible effect and is (probably) stable whenever the convection term is stable).
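Whether the viscous term really stays within its stability limit can be checked with a quick estimate. The Python sketch below compares the usual convective limit, dt <= CFL*dx/|u|, with the forward Euler diffusive limit for a uniform 2D grid, dt <= dx^2/(4*nu). The function name, the nondimensional values (U = 1, L = 1, so nu = 1/Re = 1e-4) and the two cell sizes are assumptions for illustration; a real deforming c-grid has non-uniform cells.

```python
def explicit_time_step(dx, velocity, nu, cfl=0.5):
    """Return (dt, dt_conv, dt_diff) for a uniform 2D grid with dx == dy.

    dt_conv: convective limit cfl * dx / |velocity| (velocity must be nonzero)
    dt_diff: forward Euler diffusive limit dx**2 / (4 * nu)
    dt:      the more restrictive of the two
    """
    dt_conv = cfl * dx / abs(velocity)
    dt_diff = dx * dx / (4.0 * nu)
    return min(dt_conv, dt_diff), dt_conv, dt_diff


nu = 1.0 / 1e4                      # nondimensional viscosity for Re = 10^4
for dx in (1e-2, 1e-4):             # a coarse cell and a fine near-wall cell
    dt, dt_conv, dt_diff = explicit_time_step(dx, velocity=1.0, nu=nu)
    limiter = "convection" if dt_conv <= dt_diff else "diffusion"
    print(f"dx={dx:g}: dt_conv={dt_conv:.3g}, dt_diff={dt_diff:.3g}, limited by {limiter}")
```

Under these assumptions the convective limit governs on the coarse cell, while on the very fine cell the diffusive limit becomes the tighter one, so the claim holds as long as the smallest cells are not too small.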
