|
April 20, 2009, 12:27 |
star 4.06 memory on linux cluster
|
#1 |
New Member
john mck
Join Date: Apr 2009
Posts: 3
Rep Power: 17 |
I'm trying to run Star 4.06 on a Linux cluster with PBS, on 900,000 cells modelling incompressible transient flow. Each node of the cluster has two processors with 4 cores and 8 GB of shared memory. The model is partitioned using METIS.
Each processor is an Intel(R) Xeon(R) CPU E5430 @ 2.66GHz. uname -a gives: Linux 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 26 14:14:47 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux. The compiler is Absoft 9.0 EP, 64 bit.

My question is this: if I use 8 processes on each of 2 nodes, i.e. 16 processes in total, each process takes 860 MB of virtual memory (mostly data and stack). If I use 8 processes on each of 4 nodes, i.e. 32 processes in total, each process takes 820 MB. Why does each process not use proportionally less memory when I use more processes? I would have expected the total memory used to stay almost constant. As it stands I can't use many more cells before the nodes run out of memory and start to swap.

Any advice appreciated. My apologies if I've missed something obvious, like an option.

Regards,
John Mck. |
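If it helps to cross-check what PBS and top are reporting, the resident memory of the solver processes on a node can be summed straight from /proc. A minimal sketch in Python, assuming a Linux node and that the solver processes can be picked out by a name fragment (the name "star" below is an assumption; substitute whatever ps shows for your runs):

Code:
import os

def total_rss_mb(name="star"):
    """Sum resident memory (VmRSS) over processes whose name contains `name`."""
    total_kb = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            fields = {}
            with open("/proc/%s/status" % pid) as f:
                for line in f:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
            # "Name" and "VmRSS" are standard fields in /proc/<pid>/status
            if name in fields.get("Name", "") and "VmRSS" in fields:
                total_kb += int(fields["VmRSS"].split()[0])  # value is in kB
        except (IOError, OSError):
            pass  # process exited between listdir() and open()
    return total_kb / 1024.0

if __name__ == "__main__":
    print("total resident memory of matching processes: %.0f MB" % total_rss_mb())

Run on a node while the job is going, this gives the per-node total rather than the per-process figure, which is what matters for whether the node starts to swap.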
|
April 20, 2009, 12:49 |
|
#2 |
Senior Member
Aroon
Join Date: Apr 2009
Location: Racine WI
Posts: 148
Rep Power: 17 |
The memory used is not only related to the problem size. As you use more processors, the communication overhead between them increases, so you won't see a linear decrease in memory usage per process. At some point, using more processors may even result in slower performance because of that communication overhead.
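One way to picture it: per-process memory behaves roughly like a fixed overhead (executable, solver tables, message-passing buffers) plus a share proportional to the cells that process owns, and only the second part shrinks as processes are added. A small illustrative sketch in Python; the 500 MB and 1.5 kB/cell figures are made-up assumptions chosen to show the shape of the curve, not STAR numbers:

Code:
# Illustrative model only: per-process memory = fixed overhead + per-cell share.
# The constants below are assumptions for illustration, not measured values.
FIXED_MB = 500.0      # executable, solver tables, communication buffers (assumed)
KB_PER_CELL = 1.5     # memory per cell (assumed)
TOTAL_CELLS = 900000

for nprocs in (1, 2, 4, 8, 16, 32):
    per_proc = FIXED_MB + TOTAL_CELLS / float(nprocs) * KB_PER_CELL / 1024.0
    total = nprocs * per_proc
    print("%2d processes: %6.0f MB/process, %6.0f MB total" % (nprocs, per_proc, total))

The per-process figure levels off at the fixed overhead while the total keeps growing, which is the pattern described above.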
|
|
April 20, 2009, 16:17 |
yes a tradeoff - but not yet?
|
#3 |
New Member
john mck
Join Date: Apr 2009
Posts: 3
Rep Power: 17 |
Yes, I'd agree that eventually there is a trade-off, when using more nodes adds a greater communications overhead than the computational benefit they bring.
But I didn't think I'd reached that point yet. I'm finding that I can't run 1,000,000 cells on two nodes, each with 8 GB and 8 processors. Other threads indicate that I should be able to do this on a single processor with 2 GB of memory. Any ideas?

Regards,
John |
|
April 20, 2009, 17:25 |
|
#4 |
Senior Member
Aroon
Join Date: Apr 2009
Location: Racine WI
Posts: 148
Rep Power: 17 |
That was my initial thought too. I use similar machines; mine shows the same specification as yours except for a different Linux version, and I frequently run meshes of around 1 million cells on 1 processor.
|
|
April 20, 2009, 19:15 |
|
#5 |
Senior Member
Join Date: Apr 2009
Posts: 159
Rep Power: 17 |
johnmck,
Just out of curiosity, have you benchmarked your quad-cores? I was advised to go with dual-cores instead of quad-cores because of the inherent performance loss when using all 4 cores (which I confirmed on my head-node with Star-CCM+). What is your "speedup" going from 7 to 8 cores on one of your nodes?

Thanks,
f-w |
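For anyone wanting to quote that speedup number: run the same case for a fixed number of iterations on 1, 7 and 8 cores and compare elapsed wall-clock times. A small sketch; the timings in the dictionary are placeholders, not measurements:

Code:
# Speedup and parallel efficiency from measured elapsed times.
# The values below are placeholders: substitute your own wall-clock
# timings (same case, same iteration count, only the core count varies).
elapsed = {1: 1000.0, 7: 170.0, 8: 160.0}   # seconds (placeholder values)

t_serial = elapsed[1]
for cores in sorted(elapsed):
    speedup = t_serial / elapsed[cores]
    efficiency = speedup / cores
    print("%d cores: speedup %.2f, efficiency %.0f%%"
          % (cores, speedup, 100.0 * efficiency))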
|
April 21, 2009, 03:48 |
|
#6 |
Senior Member
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,715
Rep Power: 40 |
Quote:
With the changes in memory access in the Nehalem CPUs, the impact of the memory bottleneck should become less significant in the future ... it may even be better already in the current generation of AMD CPUs. |
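A crude way to see the shared front-side-bus effect on these pre-Nehalem Xeons is to run the same memory-heavy loop in 1, 2, 4 and 8 processes at once and watch whether the aggregate rate keeps scaling. A rough sketch, assuming NumPy is available; the array size and repeat count are arbitrary choices and this is only a probe, not a proper STREAM benchmark:

Code:
import time
import numpy as np
from multiprocessing import Pool

N = 10 * 1000 * 1000       # about 80 MB per array (float64)
REPEAT = 10

def sweep(_):
    a = np.zeros(N)
    b = np.ones(N)
    t0 = time.time()
    for _ in range(REPEAT):
        a += b                               # read a, read b, write a
    elapsed = time.time() - t0
    bytes_moved = 3.0 * 8 * N * REPEAT       # three arrays of traffic per pass
    return bytes_moved / elapsed / 1e9       # GB/s for this one process

if __name__ == "__main__":
    for nprocs in (1, 2, 4, 8):
        pool = Pool(nprocs)
        rates = pool.map(sweep, range(nprocs))
        pool.close()
        pool.join()
        print("%d concurrent copies: %.1f GB/s aggregate" % (nprocs, sum(rates)))

If the aggregate figure flattens well before 8 copies, the extra cores are mostly waiting on memory rather than computing.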
|
April 21, 2009, 08:58 |
More Results
|
#7 |
New Member
john mck
Join Date: Apr 2009
Posts: 3
Rep Power: 17 |
I ran some more tests (mesh 96x99x96 = 912,384 cells), and yes, we are reaching a trade-off.

Nodes x processes per node:
1x1 = 1 (i.e. serial) uses 1870 MB/process
1x2 = 2 uses 1340
1x4 = 4 uses 1060
1x8 = 8 uses 940
2x4 = 8 uses 930
2x8 = 16 uses 860
4x8 = 32 uses 820

For our work using 32 licences, the memory per process doesn't fall much below half the serial requirement. So for big jobs we'll have to only partly populate the nodes in order to have enough memory on them. The memory overhead due to parallel working seems surprisingly high, to me at least.

Many thanks.

Regards,
John mck |
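For what it's worth, those numbers sit close to the "fixed overhead plus per-cell share" picture mentioned earlier in the thread. A back-of-the-envelope fit in Python, using only the serial and 32-way points to solve for the two unknowns, so the intermediate points are genuine predictions:

Code:
# Fit the posted figures to: mem_per_process = fixed + (cells / nprocs) * per_cell
cells = 912384
measured = {1: 1870, 2: 1340, 4: 1060, 8: 940, 16: 860, 32: 820}   # MB/process

# Two equations (1 and 32 processes), two unknowns.
per_cell = (measured[1] - measured[32]) / (cells - cells / 32.0)   # MB per cell
fixed = measured[1] - cells * per_cell                             # MB

print("estimated fixed overhead: %.0f MB/process" % fixed)
print("estimated per-cell cost:  %.2f kB/cell" % (per_cell * 1024))
for nprocs in sorted(measured):
    predicted = fixed + cells / float(nprocs) * per_cell
    print("%2d processes: measured %4d MB, model %4.0f MB"
          % (nprocs, measured[nprocs], predicted))

On that reading roughly 800 MB of every process is overhead duplicated in each partition, which would explain why halving the cells per process saves far less than half the memory.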
|
April 22, 2009, 14:02 |
|
#8 |
Member
Join Date: Mar 2009
Posts: 44
Rep Power: 17 |
Your model is too small for that conclusion to be valid. At 32 cores you only have about 28,000 cells on each core, which is a very small number. At that size, the overhead of all the "halo" cells (the cells that exist at the boundaries between two domains) is just not going to decrease any further. If you run a much larger model (an order of magnitude larger, say), you will see the memory scaling you are looking for.
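To put a rough number on that for John's 96x99x96 case: even with a simple slab decomposition, each partition carries up to two extra layers of halo cells, and that fixed-size halo becomes a large fraction of the partition as the partitions shrink. A sketch, assuming slab partitions for simplicity (METIS produces irregular partitions, so the real counts will differ somewhat):

Code:
# Halo-cell fraction for a 96 x 99 x 96 mesh cut into slabs along one direction.
# Slab decomposition is assumed for simplicity; METIS partitions are irregular,
# so actual halo counts will differ.
NI, NJ, NK = 96, 99, 96
layer = NJ * NK                    # cells in one halo layer (one i-plane)
total = NI * NJ * NK

for nparts in (2, 4, 8, 16, 32):
    owned = total / float(nparts)             # cells a partition actually owns
    halo = layer * min(2, nparts - 1)         # interior slabs see two neighbours
    print("%2d partitions: ~%6.0f owned cells, up to %5d halo cells (%.0f%% extra)"
          % (nparts, owned, halo, 100.0 * halo / owned))

At 32 partitions the halo alone is around two-thirds the size of the owned cells, so the per-partition memory has little room left to fall.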
|
|
Tags |
linux, memory, parallel, star, xeon |
|
|